ABCAttack: A Gradient-Free Optimization Black-Box Attack for Fooling Deep Image Classifiers
Abstract
1. Introduction
- To reduce the reliance of white-box attacks on the attacker's knowledge of the target model, and to overcome the shortcomings of black-box attacks that estimate gradients or train a substitute model to generate adversarial samples, we propose ABCAttack, a new black-box method for generating adversarial samples.
- ABCAttack casts adversarial sample generation as an optimization problem and applies an improved search strategy that iterates until a candidate image successfully attacks the targeted model. Adversarial samples are generated using only the model's input and its softmax output probabilities, without any other detailed information, so gradient estimation and substitute-model training are both avoided, which markedly improves the efficiency of adversarial sample generation (a minimal sketch of such a score-only query interface is given after this list).
- We evaluated the proposed ABCAttack on the MNIST, CIFAR-10 and ImageNet datasets. The results demonstrate its effectiveness and efficiency in the black-box setting: in both targeted and untargeted attacks, ABCAttack generates adversarial samples with fewer queries and less time, undermining trained DNN models and greatly reducing their credibility.
- We further highlight the effectiveness of ABCAttack against defenses, namely Defense-GAN, Stochastic Activation Pruning (SAP), Local Intrinsic Dimensionality (LID), non-differentiable input transformations and others. Although these defenses claim to be robust, our attack still greatly reduces the accuracy of the defended models. ABCAttack is gradient-free and therefore widely applicable: as long as the model's input and feedback are accessible, adversarial examples can be crafted without any knowledge of the model's internal details.
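To make the score-only black-box interface in the second point concrete, the sketch below shows a fitness function that uses nothing but the softmax probabilities returned by the model. It is a minimal illustration written for this description: the names `model_predict`, `fitness`, `y_true` and `y_target` are our own assumptions, not identifiers from the paper's implementation.

```python
import numpy as np

def fitness(model_predict, x_adv, y_true, y_target=None):
    """Score a candidate image using only the model's softmax output.

    model_predict : callable mapping a batch of images to softmax probabilities
                    (the only access the black-box attacker has).
    y_true        : index of the original (correct) class.
    y_target      : index of the desired class for a targeted attack, else None.
    """
    probs = model_predict(x_adv[np.newaxis, ...])[0]  # one query to the model
    if y_target is None:
        # Untargeted attack: drive the true-class probability down.
        return -probs[y_true]
    # Targeted attack: drive the target-class probability up.
    return probs[y_target]
```

With such a fitness function, any gradient-free optimizer can search the neighborhood of the original image: an untargeted attack succeeds as soon as the model's top prediction differs from `y_true`, and a targeted attack succeeds when it equals `y_target`.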
2. Related Work
3. ABC
4. Methodology
4.1. Adversarial Sample
- When p = 0, the constraint is the ℓ0-norm of the perturbation, i.e., the number of non-zero elements of the perturbation vector. This constraint limits how many dimensions may be modified, but places no limit on how much each modified dimension may change when the adversarial sample is crafted.
- When p = 2, the constraint is the ℓ2-norm of the perturbation, i.e., the Euclidean distance between the original image and the adversarial sample. This restriction allows the algorithm to modify all dimensions, but limits the overall magnitude of the change.
- When p = ∞, the constraint is the ℓ∞-norm of the perturbation, which bounds the maximum modification between the original sample and the adversarial sample, that is, the maximum absolute value of the added perturbation; all dimensions may be modified. (A short numerical example of the three norms is given after this list.)
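As a quick numerical illustration of these three constraints (our own toy example, not data from the paper), the snippet below computes the ℓ0-, ℓ2- and ℓ∞-norms of a perturbation δ = x_adv − x on a four-pixel image:

```python
import numpy as np

x     = np.array([0.10, 0.50, 0.90, 0.30])   # original (flattened) image
x_adv = np.array([0.10, 0.58, 0.90, 0.25])   # perturbed image
delta = x_adv - x                            # the added perturbation

l0   = np.count_nonzero(delta)       # number of modified pixels    -> 2
l2   = np.linalg.norm(delta, ord=2)  # Euclidean size of the change -> ~0.094
linf = np.max(np.abs(delta))         # largest single-pixel change  -> 0.08
print(l0, l2, linf)
```

Only two pixels change, so the ℓ0-norm is 2, while the ℓ∞-norm reports the largest single-pixel change regardless of how many pixels were touched.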
4.2. Problem Description
4.3. ABCAttack Algorithm
Algorithm 1 Adversarial attack based on ABC (ABCAttack).
Input: original image x, the true label y and its probability p_y, target label y_t, number of food sources SN, limit number l, max change ε, the initialization amplitude limit factor ρ, upper bound ub and lower bound lb.
Output: adversarial sample x*
1: q = 0 // query counter
2: MaxQueries = 100,000
3: x_max = clip(x + ε, lb, ub), x_min = clip(x − ε, lb, ub)
4: trial_i = 0, i = 1, …, SN
// Initializing
5: for i = 1 to SN do
6:   X_i = x + ρ · U(−ε, ε)
7:   X_i = clipByTensor(X_i, x_min, x_max)
8:   fit_i = calFitness(X_i)
9:   updateBestSolution(X_i, fit_i)
// Optimization
10: while stopOption() do
11:   for i = 1 to SN do // employed bees
12:     V_i = clipByTensor(searchStrategy(i, 1), x_min, x_max)
13:     fit_V = calFitness(V_i)
14:     greedySelection(X_i, V_i)
15:     updateBestSolution(X_i, fit_i)
    P = selectProbabilities(fit)
16:   for i = 1 to SN do // onlooker bees
17:     if rand(0, 1) < P_i then
18:       V_i = clipByTensor(searchStrategy(i, 2), x_min, x_max)
19:       fit_V = calFitness(V_i)
20:       greedySelection(X_i, V_i)
21:       updateBestSolution(X_i, fit_i)
22:   sendScoutBees() // scout bees
23:   updateBestSolution(x*, fit*)
return x*
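To show the control flow of Algorithm 1 in executable form, the following is a minimal Python sketch of an ABC-style search loop. It is our own reconstruction under stated assumptions: the helper names (`abc_attack`, `fitness`, `sn`, `limit`, `rho`, `try_update`) are illustrative, the employed/onlooker step here perturbs every dimension toward a random partner (classic ABC perturbs one randomly chosen dimension, and the paper's improved search strategy differs from both), and a real attack would stop as soon as the best candidate fools the model rather than exhausting the query budget.

```python
import numpy as np

def abc_attack(x, fitness, eps=0.1, sn=10, limit=30, max_queries=10_000, rho=0.5):
    """Minimal artificial-bee-colony search in the eps-ball around a clean image x.

    fitness : callable scoring a candidate image (higher is better); in the attack
              setting it wraps the victim model's softmax output, as sketched earlier.
    """
    lo, hi = np.clip(x - eps, 0.0, 1.0), np.clip(x + eps, 0.0, 1.0)

    # Initialization: sn random food sources (candidate images) inside the eps-ball.
    foods = np.clip(x + rho * eps * np.random.uniform(-1, 1, (sn,) + x.shape), lo, hi)
    fits = np.array([fitness(f) for f in foods])
    trials = np.zeros(sn, dtype=int)
    queries = sn
    best, best_fit = foods[fits.argmax()].copy(), fits.max()

    def try_update(i, cand):
        """Greedy selection: keep the candidate only if it improves source i."""
        nonlocal queries, best, best_fit
        cand = np.clip(cand, lo, hi)
        f = fitness(cand)
        queries += 1
        if f > fits[i]:
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1
        if f > best_fit:
            best, best_fit = cand.copy(), f

    while queries < max_queries:
        # Employed-bee phase: perturb each source toward a random partner.
        for i in range(sn):
            k = np.random.choice([j for j in range(sn) if j != i])
            phi = np.random.uniform(-1, 1, x.shape)
            try_update(i, foods[i] + phi * (foods[i] - foods[k]))

        # Onlooker-bee phase: revisit sources with probability proportional to fitness.
        probs = fits - fits.min() + 1e-12
        probs /= probs.sum()
        for i in range(sn):
            if np.random.rand() < probs[i]:
                k = np.random.choice([j for j in range(sn) if j != i])
                phi = np.random.uniform(-1, 1, x.shape)
                try_update(i, foods[i] + phi * (foods[i] - foods[k]))

        # Scout-bee phase: abandon sources that stagnated for more than `limit` trials.
        stale = trials > limit
        if stale.any():
            foods[stale] = np.clip(
                x + rho * eps * np.random.uniform(-1, 1, (int(stale.sum()),) + x.shape),
                lo, hi)
            fits[stale] = [fitness(f) for f in foods[stale]]
            trials[stale] = 0
            queries += int(stale.sum())

    return best
```

The three phases map directly onto Algorithm 1: the first loop corresponds to searchStrategy(i, 1), the probability-gated loop to searchStrategy(i, 2), and the re-initialization of stagnant sources to sendScoutBees().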
5. Experimental Verification and Result Analysis
5.1. Experimental Dataset and Environment
5.2. Parameter Analysis
5.3. Comparison and Analysis of Experimental Results
5.3.1. Attack on MNIST, CIFAR-10 Classification Models
5.3.2. Attacks on Large-Scale Image Dataset ImageNet
5.3.3. Targeted Attack Analysis
5.3.4. Untargeted Attack Analysis
5.4. Attacking Defenses
| Defense | Dataset | Max Queries | AvgQueries | ASR |
|---|---|---|---|---|
| Defense-GAN [31] | MNIST | 20,000 | 1066 | 62.40% |
| SAP [32] | CIFAR-10 | 10,000 | 986 | 64.60% |
| SAP [32] | CIFAR-10 | 20,000 | 1491 | 67.40% |
| SAP [32] | CIFAR-10 | 10,000 | 701 | 88.40% |
| JPEG and bit depth [33] | ImageNet | 30,000 | 4893 | 78.00% |
| LID [41] | CIFAR-10 | 10,000 | 362 | 99.90% |
| k-winners [42] (adversarial training model) | CIFAR-10 | 10,000 | 937 | 70.40% |
| Ensemble train [43] | CIFAR-10 | 10,000 | 893 | 89.50% |
| Ensemble train [43] | CIFAR-10 | 10,000 | 448 | 96.30% |
| Ensemble train [43] | CIFAR-10 | 20,000 | 744 | 98.30% |
5.5. The Wide Applicability of ABCAttack to Various DNN Models
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709.
- Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapé, A. DISTILLER: Encrypted traffic classification via multimodal multitask deep learning. J. Netw. Comput. Appl. 2021, 183, 102985.
- Qiu, H.; Dong, T.; Zhang, T.; Lu, J.; Memmi, G.; Qiu, M. Adversarial Attacks Against Network Intrusion Detection in IoT Systems. IEEE Internet Things J. 2021, 8, 10327–10335.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
- Sun, Q.; Lin, K.; Si, C.; Xu, Y.; Li, S.; Gope, P. A Secure and Anonymous Communicate Scheme over the Internet of Things. ACM Trans. Sen. Netw. 2022.
- Flowers, B.; Buehrer, R.M.; Headley, W.C. Evaluating Adversarial Evasion Attacks in the Context of Wireless Communications. IEEE Trans. Inf. Forensics Secur. 2020, 15, 1102–1113.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial Examples in the Physical World. arXiv 2016, arXiv:1607.02533.
- Tabacof, P.; Valle, E. Exploring the space of adversarial images. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 426–433.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017; pp. 506–519.
- Chen, P.Y.; Zhang, H.; Sharma, Y.; Yi, J.; Hsieh, C.J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, Dallas, TX, USA, 3 November 2017; pp. 15–26.
- Alzantot, M.; Sharma, Y.; Chakraborty, S.; Zhang, H.; Hsieh, C.J.; Srivastava, M.B. GenAttack: Practical black-box attacks with gradient-free optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 1111–1119.
- Liu, X.; Hu, T.; Ding, K.; Bai, Y.; Niu, W.; Lu, J. A black-box attack on neural networks based on swarm evolutionary algorithm. In Proceedings of the Australasian Conference on Information Security and Privacy, Perth, WA, Australia, 30 November–2 December 2020; Springer: Amsterdam, The Netherlands, 2020; pp. 268–284.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572.
- Nguyen, A.; Yosinski, J.; Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 427–436.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582.
- Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387.
- Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2013, arXiv:1312.6034.
- Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57.
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597.
- Papernot, N.; McDaniel, P.; Goodfellow, I. Transferability in Machine Learning: From Phenomena to Black-Box Attacks using Adversarial Samples. arXiv 2016, arXiv:1605.07277.
- Biggio, B.; Rieck, K.; Ariu, D.; Wressnegger, C.; Corona, I.; Giacinto, G.; Roli, F. Poisoning behavioral malware clustering. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, Scottsdale, AZ, USA, 7 November 2014; pp. 27–36.
- Narodytska, N.; Prasad Kasiviswanathan, S. Simple Black-Box Adversarial Perturbations for Deep Networks. arXiv 2016, arXiv:1612.06299.
- Brendel, W.; Rauber, J.; Bethge, M. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. arXiv 2017, arXiv:1712.04248.
- Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box adversarial attacks with limited queries and information. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2137–2146.
- Su, J.; Vargas, D.V.; Sakurai, K. One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 2019, 23, 828–841.
- Su, J.; Vargas, D.V.; Sakurai, K. Attacking convolutional neural network using differential evolution. IPSJ Trans. Comput. Vis. Appl. 2019, 11, 1–16.
- Mosli, R.; Wright, M.; Yuan, B.; Pan, Y. They Might NOT Be Giants: Crafting Black-Box Adversarial Examples with Fewer Queries Using Particle Swarm Optimization. arXiv 2019, arXiv:1909.07490.
- Zhang, Q.; Wang, K.; Zhang, W.; Hu, J. Attacking black-box image classifiers with particle swarm optimization. IEEE Access 2019, 7, 158051–158063.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1765–1773.
- Mopuri, K.R.; Garg, U.; Babu, R.V. Fast feature fool: A data independent approach to universal adversarial perturbations. arXiv 2017, arXiv:1707.05572.
- Samangouei, P.; Kabkab, M.; Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv 2018, arXiv:1805.06605.
- Dhillon, G.S.; Azizzadenesheli, K.; Lipton, Z.C.; Bernstein, J.; Kossaifi, J.; Khanna, A.; Anandkumar, A. Stochastic activation pruning for robust adversarial defense. arXiv 2018, arXiv:1803.01442.
- Guo, C.; Rana, M.; Cisse, M.; Van Der Maaten, L. Countering adversarial images using input transformations. arXiv 2017, arXiv:1711.00117.
- Xiao, C.; Zhong, P.; Zheng, C. Resisting adversarial attacks by k-winners-take-all. arXiv 2019, arXiv:1905.10510.
- Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization. Technical Report; Citeseer, 2005. Available online: https://abc.erciyes.edu.tr/pub/tr06_2005.pdf (accessed on 24 October 2020).
- LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 4 May 2020).
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.222.9220 (accessed on 11 January 2021).
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhou, W.; Hou, X.; Chen, Y.; Tang, M.; Huang, X.; Gan, X.; Yang, Y. Transferable adversarial perturbations. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 452–467.
- Athalye, A.; Carlini, N.; Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 274–283.
- Ma, X.; Li, B.; Wang, Y.; Erfani, S.M.; Wijewickrema, S.; Schoenebeck, G.; Song, D.; Houle, M.E.; Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv 2018, arXiv:1801.02613.
- Tramer, F.; Carlini, N.; Brendel, W.; Madry, A. On adaptive attacks to adversarial example defenses. arXiv 2020, arXiv:2002.08347.
- Pang, T.; Xu, K.; Du, C.; Chen, N.; Zhu, J. Improving adversarial robustness via promoting ensemble diversity. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 4970–4979.
MNIST

| Attack | ASR (Untargeted) | Avg | AvgTime | AvgQueries | ASR (Targeted) | Avg | AvgTime | AvgQueries |
|---|---|---|---|---|---|---|---|---|
| ZOO | 100% | 1.4955 | 1.38 min | 384,000 | 98.90% | 1.987068 | 1.62 min | 384,000 |
| Black-box (Substitute Model + FGSM) | − | − | 0.002 s | − | − | − | 0.002 s | − |
| Black-box (Substitute Model + C&W) | − | − | 0.76 min | 4650 | − | − | 0.80 min | 4650 |
| GenAttack | − | − | − | − | − | − | − | 1801 |
| AdversarialPSO | 96.30% | 4.1431 | 0.068 min | 593 | 72.57% | 4.778 | 0.238 min | 1882 |
| ABCAttack () | 100% | 4.01033 | 0.018 min | 629 | 94.89% | 4.7682 | 0.048 min | 1695 |
| SWISS | 100% | 3.4298 | 0.087 min | 3043 | 19.41% | 3.5916 | 0.345 min | 20,026 |
| ABCAttack () | 100% | 3.34524 | 0.045 min | 1228 | 94.99% | 4.5617 | 0.051 min | 2066 |

CIFAR-10

| Attack | ASR (Untargeted) | Avg | AvgTime | AvgQueries | ASR (Targeted) | Avg | AvgTime | AvgQueries |
|---|---|---|---|---|---|---|---|---|
| ZOO | 100% | 0.19973 | 3.43 min | 128,000 | 96.80% | 0.39879 | 3.95 min | 128,000 |
| Black-box (Substitute Model + FGSM) | − | − | 0.005 s | − | − | − | 0.005 s | − |
| Black-box (Substitute Model + C&W) | − | − | 0.47 min | 4650 | − | − | 0.49 min | 4650 |
| GenAttack | − | − | − | − | − | − | − | 1360 |
| AdversarialPSO | 99.60% | 1.414 | 0.139 min | 1224 | 71.97% | 2.925 | 0.6816 min | 6512 |
| ABCAttack () | 98.60% | 1.64319 | 0.0233 min | 330 | 82.3% | 1.910103 | 0.0615 min | 851 |
| SWISS | 99.80% | 2.3248 | 0.1264 min | 2457 | 31.93% | 2.9972 | 1.623 min | 45,308 |
| ABCAttack () | 98.40% | 1.24167 | 0.031 min | 481 | 80.88% | 1.644 | 0.0654 min | 1102 |
| Method | ASR (MaxEvaluation = 30,000) | AvgQueries | AvgTime | ASR (MaxEvaluation = 50,000) | AvgQueries | AvgTime |
|---|---|---|---|---|---|---|
| UAP [29] (VGG-16) | − | − | − | − | − | − |
| UAP [29] (VGG-19) | − | − | − | − | − | − |
| UAP-Fast Feature Fool [30] (VGG-16) | − | − | − | − | − | − |
| UAP-Fast Feature Fool [30] (VGG-19) | − | − | − | − | − | − |
| Tran-TAP [39] VGG-16 to Inc-v3 | − | − | − | − | − | − |
| Tran-TAP [39] Inc-v3 to VGG-16 | − | − | − | − | − | − |
| Tran-UAP [39] VGG-16-UAP to Inc-v3 | − | − | − | − | − | − |
| Tran-UAP [39] Inc-v3-UAP to VGG-16 | − | − | − | − | − | − |
| ZOO | − | 1,280,000 | 8.031 min | − | − | − |
| AdversarialPSO | − | 2833 | 3.181 min | − | − | − |
| SWISS | − | 8429 | 5.014 min | − | − | − |
| ABCAttack (VGG-16) | 99.60% | 1839 | 2.07 min | 99.8% | 1901 | 2.1276 min |
| ABCAttack (VGG-19) | 99.00% | 1501 | 1.7298 min | 99.6% | 1698 | 1.8832 min |
| ABCAttack (MobileNet-v3) | 100% | 568 | 0.6714 min | 100% | 568 | 0.6714 min |
| ABCAttack (Inc-v3, image size 299) | 90.00% | 2759 | 3.172 min | 92.00% | 2971 | 3.238 min |
| ABCAttack (Inc-v3, image size 224) | 98.4% | 899 | 1.0086 min | 98.4% | 899 | 1.0086 min |
MNIST

| Attack | ASR (HRNN) | AvgQueries | ASR (MLP) | AvgQueries |
|---|---|---|---|---|
| AdversarialPSO | 100% | 552 | 94.70% | 548 |
| SWISS | 100% | 3214 | 100% | 1984 |
| ABCAttack () | 100% | 395 | 99.40% | 412 |
| ABCAttack () | 100% | 1083 | 99.80% | 715 |

CIFAR-10

| Attack | ASR (CNNCapsule) | AvgQueries | ASR (ResNet) | AvgQueries |
|---|---|---|---|---|
| AdversarialPSO | 97.80% | 2052 | 100% | 1723 |
| SWISS | 98.90% | 3725 | 100% | 1792 |
| ABCAttack () | 100% | 164 | 99.20% | 165 |
| ABCAttack () | 100% | 300 | 99.20% | 290 |
| Model | Dataset | ASR (Untargeted) | AvgQueries | ASR (Targeted) | AvgQueries |
|---|---|---|---|---|---|
| CNN | MNIST | 99.50% | 381 | 80.30% | 806 |
| LSTM | MNIST | 100% | 184 | 98.50% | 920 |
| Lenet5 | MNIST | 99.90% | 291 | 93.90% | 1648 |
| ResNet20V1 | CIFAR-10 | 99.10% | 161 | 91.60% | 798 |
| ResNet32V1 | CIFAR-10 | 98.80% | 140 | 93.10% | 892 |
| ResNet44V1 | CIFAR-10 | 99.10% | 202 | 91.40% | 900 |
| ResNet56V1 | CIFAR-10 | 99.60% | 171 | 93.20% | 617 |
| ResNet110V1 | CIFAR-10 | 99.50% | 180 | 90.30% | 682 |
| ResNet20V2 | CIFAR-10 | 99.60% | 136 | 95.30% | 547 |
| ResNet56V2 | CIFAR-10 | 99.40% | 150 | 94.30% | 546 |
| ResNet110V2 | CIFAR-10 | 98.90% | 141 | 95.80% | 564 |