Textual Adversarial Attacking with Limited Queries
Abstract
1. Introduction
2. Related Work
2.1. Transfer Attacks
2.2. Direct Attacks
3. Methodology
3.1. Threat Model
3.2. Basic Ideas
3.3. Attack Method
Algorithm 1: Advance Attack
Input: a sample text, its corresponding ground-truth label Y, the local model, and the target black-box model
Output: an adversarial example
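Only the header of Algorithm 1 survives in this version of the text, so the sketch below is a minimal reconstruction of the general shape such a query-limited transfer attack takes, not the authors' implementation. Every identifier in it (`Classifier`, `rank_vulnerable_positions`, `advance_attack`, `transfer_then_query`, the candidate generator, and the query budget of 500) is an illustrative assumption: positions and substitution candidates are ranked on the freely queryable local model, a locally crafted adversarial example is first tested for direct transfer, and the black-box target is queried only to verify locally screened substitutions.

```python
from typing import Callable, Dict, List, Optional

# A text classifier maps a token list to a class-probability distribution.
Classifier = Callable[[List[str]], Dict[int, float]]


def rank_vulnerable_positions(tokens: List[str], label: int,
                              local: Classifier) -> List[int]:
    """Rank positions by how much deleting the word there lowers the local
    model's confidence in the true label (a word-saliency heuristic)."""
    base = local(tokens)[label]
    drops = [(base - local(tokens[:i] + tokens[i + 1:])[label], i)
             for i in range(len(tokens))]
    return [i for _, i in sorted(drops, reverse=True)]


def advance_attack(tokens: List[str], label: int,
                   local: Classifier, target: Classifier,
                   candidates: Callable[[str], List[str]],
                   max_queries: int = 500) -> Optional[List[str]]:
    """Greedy word substitution: decide where and what to substitute with the
    free local model; spend target queries only on finished candidate texts."""
    queries, adv = 0, list(tokens)
    for pos in rank_vulnerable_positions(tokens, label, local):
        # Screen candidates locally, most damaging (lowest true-label
        # probability under the local model) first.
        scored = sorted(candidates(adv[pos]),
                        key=lambda c: local(adv[:pos] + [c] + adv[pos + 1:])[label])
        for cand in scored:
            if queries >= max_queries:
                return None                      # query budget exhausted
            trial = adv[:pos] + [cand] + adv[pos + 1:]
            queries += 1
            dist = target(trial)                 # one black-box query
            if max(dist, key=dist.get) != label:
                return trial                     # misclassified: success
        if scored:
            adv[pos] = scored[0]                 # commit best local substitution
    return None


def transfer_then_query(tokens: List[str], label: int,
                        local: Classifier, target: Classifier,
                        candidates: Callable[[str], List[str]],
                        max_queries: int = 500) -> Optional[List[str]]:
    """Try a pure transfer attack first (one target query), then fall back
    to the query-limited greedy attack above."""
    # Attacking the local model with itself costs no target queries,
    # so the budget for this stage is effectively unlimited.
    local_adv = advance_attack(tokens, label, local, local, candidates, 10_000)
    if local_adv is not None:
        dist = target(local_adv)                 # a single query to the target
        if max(dist, key=dist.get) != label:
            return local_adv                     # transferred directly
    return advance_attack(tokens, label, local, target, candidates, max_queries)
```

Read this way, the "Transfer + Candidate" and "Transfer + Position" rows in the decomposition tables below plausibly correspond to reusing the local model's candidate screening and its position ranking, respectively, and the local-model-tuning experiment would then amount to fine-tuning `local` on observed query results so that its rankings track the target more closely.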
4. Experiments
4.1. Datasets and Models
4.2. Attack Configuration
4.3. Evaluation Metrics
4.4. Attack Performance
4.5. Decomposition Analyses
4.5.1. Selection of Input Examples for Target Attack
4.5.2. Impact of Vulnerable Words
4.5.3. Local Model Tuning
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014; Bengio, Y., LeCun, Y., Eds.
- Thys, S.; Ranst, W.V.; Goedemé, T. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 49–55.
- Papernot, N.; McDaniel, P.D.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, 21–24 March 2016; pp. 372–387.
- Li, J.; Ji, S.; Du, T.; Li, B.; Wang, T. TextBugger: Generating adversarial text against real-world applications. In Proceedings of the 2019 Network and Distributed System Security Symposium, NDSS 2019, San Diego, CA, USA, 24–27 February 2019.
- Yu, F.; Wang, L.; Fang, X.; Zhang, Y. The defense of adversarial example with conditional generative adversarial networks. Secur. Commun. Netw. 2020, 2020, 3932584:1–3932584:12.
- Jiang, L.; Qiao, K.; Qin, R.; Wang, L.; Yu, W.; Chen, J.; Bu, H.; Yan, B. Cycle-Consistent Adversarial GAN: The integration of adversarial attack and defense. Secur. Commun. Netw. 2020, 2020, 3608173:1–3608173:9.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017.
- Ebrahimi, J.; Rao, A.; Lowd, D.; Dou, D. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018; Gurevych, I., Miyao, Y., Eds.; Volume 2: Short Papers, pp. 31–36.
- Iyyer, M.; Wieting, J.; Gimpel, K.; Zettlemoyer, L. Adversarial example generation with syntactically controlled paraphrase networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, LA, USA, 1–6 June 2018; Walker, M.A., Ji, H., Stent, A., Eds.; Volume 1 (Long Papers), pp. 1875–1885.
- Wang, W.; Wang, R.; Wang, L.; Wang, Z.; Ye, A. Towards a robust deep neural network against adversarial texts: A survey. IEEE Trans. Knowl. Data Eng. 2021, 1.
- Zang, Y.; Qi, F.; Yang, C.; Liu, Z.; Zhang, M.; Liu, Q.; Sun, M. Word-level textual adversarial attacking as combinatorial optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Jin, D.; Jin, Z.; Zhou, J.T.; Szolovits, P. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 8018–8025.
- Liu, Y.; Chen, X.; Liu, C.; Song, D. Delving into transferable adversarial examples and black-box attacks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
- Li, P.; Yi, J.; Zhang, L. Query-efficient black-box attack by active learning. In Proceedings of the IEEE International Conference on Data Mining, ICDM 2018, Singapore, 17–20 November 2018; pp. 1200–1205.
- Papernot, N.; McDaniel, P.D.; Goodfellow, I.J. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv 2016, arXiv:1605.07277.
- Gil, Y.; Chai, Y.; Gorodissky, O.; Berant, J. White-to-Black: Efficient distillation of black-box adversarial attacks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Volume 1 (Long and Short Papers), pp. 1373–1379.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2015, arXiv:1412.6572.
- Moosavi-Dezfooli, S.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582.
- Jia, R.; Liang, P. Adversarial examples for evaluating reading comprehension systems. arXiv 2017, arXiv:1707.07328.
- Zhao, Z.; Dua, D.; Singh, S. Generating natural adversarial examples. arXiv 2018, arXiv:1710.11342.
- Eger, S.; Sahin, G.G.; Rücklé, A.; Lee, J.; Schulz, C.; Mesgar, M.; Swarnkar, K.; Simpson, E.; Gurevych, I. Text processing like humans do: Visually attacking and shielding NLP systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 1634–1647.
- Belinkov, Y.; Bisk, Y. Synthetic and natural noise both break neural machine translation. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018.
- Gao, J.; Lanchantin, J.; Soffa, M.L.; Qi, Y. Black-box generation of adversarial text sequences to evade deep learning classifiers. In Proceedings of the 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, 24 May 2018; pp. 50–56.
- Pruthi, D.; Dhingra, B.; Lipton, Z.C. Combating adversarial misspellings with robust word recognition. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July–2 August 2019; Korhonen, A., Traum, D.R., Màrquez, L., Eds.; Volume 1: Long Papers, pp. 5582–5591.
- Sato, M.; Suzuki, J.; Shindo, H.; Matsumoto, Y. Interpretable adversarial perturbation in input embedding space for text. arXiv 2018, arXiv:1805.02917.
- Zhang, H.; Zhou, H.; Miao, N.; Li, L. Generating fluent adversarial examples for natural languages. arXiv 2020, arXiv:2007.06174.
- Samanta, S.; Mehta, S. Towards crafting text adversarial samples. arXiv 2017, arXiv:1707.02812.
- Ren, S.; Deng, Y.; He, K.; Che, W. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1085–1097.
- Alzantot, M.; Sharma, Y.; Elgohary, A.; Ho, B.J.; Srivastava, M.B.; Chang, K.W. Generating natural language adversarial examples. arXiv 2018, arXiv:1804.07998.
- Glockner, M.; Shwartz, V.; Goldberg, Y. Breaking NLI systems with sentences that require simple lexical inferences. arXiv 2018, arXiv:1805.02266.
- Papernot, N.; McDaniel, P.D.; Swami, A.; Harang, R.E. Crafting adversarial input sequences for recurrent neural networks. In Proceedings of the MILCOM 2016 IEEE Military Communications Conference, Baltimore, MD, USA, 1–3 November 2016; pp. 49–54.
- Liang, B.; Li, H.; Su, M.; Bian, P.; Li, X.; Shi, W. Deep text classification can be fooled. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018.
- Maas, A.L.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT '11, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.; Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642.
- Bowman, S.R.; Angeli, G.; Potts, C.; Manning, C.D. A large annotated corpus for learning natural language inference. arXiv 2015, arXiv:1508.05326.
- Conneau, A.; Kiela, D.; Schwenk, H.; Barrault, L.; Bordes, A. Supervised learning of universal sentence representations from natural language inference data. arXiv 2017, arXiv:1705.02364.
| Dataset | Task | Avg Len | Train | Val | Test | BiLSTM Acc (%) | BERT Acc (%) |
|---|---|---|---|---|---|---|---|
| IMDB | Sentiment Analysis | 234 | 25,000 | 0 | 25,000 | 89.19 | 90.76 |
| SST-2 | Sentiment Analysis | 17 | 6920 | 872 | 1821 | 83.52 | 90.30 |
| SNLI | NLI | 8 | 550,152 | 10,000 | 10,000 | 84.92 | 89.51 |
| Dataset | Attack Method | Attack Success Rate (%) | Queries/AE | Modification Rate (%) |
|---|---|---|---|---|
| IMDB | Genetic | 55.0 | 778.5 | 3.94 |
| IMDB | PWWS | 81.0 | 1287.3 | 20.65 |
| IMDB | TextFooler | 81.0 | 1134.0 | 6.10 |
| IMDB | Ours | 99.3 | 418.2 | 6.07 |
| SST-2 | Genetic | 54.0 | 226.2 | 18.38 |
| SST-2 | PWWS | 87.0 | 122.9 | 23.70 |
| SST-2 | TextFooler | 77.9 | 57.9 | 27.15 |
| SST-2 | Ours | 92.3 | 81.3 | 10.96 |
| SNLI | Genetic | 83.5 | 613.0 | 20.80 |
| SNLI | PWWS | - | - | - |
| SNLI | TextFooler | 94.1 | 60.0 | 18.50 |
| SNLI | Ours * | 84.0 | 42.6 | 18.31 |
| Dataset | Attack Method | Attack Success Rate (%) | Queries/All | Queries/AE |
|---|---|---|---|---|
| IMDB | Original | 99.5 | 878.6 | 838.4 |
| IMDB | Ours | 99.3 | 437.9 | 418.2 |
| SST-2 | Original | 91.1 | 331.5 | 181.2 |
| SST-2 | Ours | 92.3 | 154.0 | 81.3 |
| SNLI | Original | 79.5 | 312.6 | 101.3 |
| SNLI | Ours * | 84.0 | 142.1 | 42.6 |
| Dataset | Attack Method | Transfer Rate (%) | Attack Success Rate (%) | Queries/All | Queries/AE |
|---|---|---|---|---|---|
| IMDB | Transfer + Original | 11.1 | 99.5 | 781.1 | 744.9 |
| IMDB | Transfer + Candidate | 11.1 | 99.7 | 603.1 | 592.0 |
| IMDB | Transfer + Position | 11.2 | 99.7 | 510.8 | 489.0 |
| SST-2 | Transfer + Original | 25.0 | 91.1 | 256.2 | 136.0 |
| SST-2 | Transfer + Candidate | 25.0 | 91.9 | 261.3 | 120.1 |
| SST-2 | Transfer + Position | 25.0 | 92.0 | 235.1 | 98.8 |
| SNLI | Transfer + Original | 32.2 | 79.5 | 211.9 | 60.3 |
| SNLI | Transfer + Candidate | 32.2 | 83.8 | 146.7 | 46.7 |
| SNLI | Transfer + Position | 32.2 | 84.0 | 142.1 | 42.6 |
| Dataset | Attack Method | Success Rate (%) Static / Tuned | Queries/All Static / Tuned | Queries/AE Static / Tuned |
|---|---|---|---|---|
| IMDB | Transfer + Candidate | 99.7 / 99.7 | 603.1 / 541.9 | 592.0 / 528.2 |
| IMDB | Transfer + Position | 99.7 / 99.3 | 510.8 / 437.9 | 489.0 / 418.2 |
| SST-2 | Transfer + Candidate | 91.9 / 91.4 | 261.3 / 218.9 | 120.1 / 113.6 |
| SST-2 | Transfer + Position | 92.0 / 92.3 | 235.1 / 154.0 | 98.8 / 81.28 |
Citation: Zhang, Y.; Yang, J.; Li, X.; Liu, H.; Shao, K. Textual Adversarial Attacking with Limited Queries. Electronics 2021, 10, 2671. https://doi.org/10.3390/electronics10212671