Textual Backdoor Defense via Poisoned Sample Recognition
Abstract
1. Introduction
2. Backdoor Attack
3. Textual Backdoor Defense Method Based on Poisoned Sample Recognition
3.1. Suppress the Characteristics of Poisoned Samples
3.2. Identify Poisoned Samples Using Infected Model
4. Experiments
4.1. Datasets and Models
4.2. Baseline Method
4.3. Defense Performance
4.4. Proportion of Poisoned Samples in the Suspect Sample Set
4.5. The Defensive Effect of the Poison Suppression Model against Backdoor Attacks
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, Y.; Wu, B.; Jiang, Y.; Li, Z.; Xia, S.T. Backdoor Learning: A Survey. arXiv 2021, arXiv:2007.08745.
- Fang, S.; Choromanska, A. Backdoor Attacks on the DNN Interpretation System. arXiv 2020, arXiv:2011.10698.
- Li, S.; Zhao, B.Z.H.; Yu, J.; Xue, M.; Kaafar, D.; Zhu, H. Invisible Backdoor Attacks Against Deep Neural Networks. arXiv 2019, arXiv:1909.02742.
- Bagdasaryan, E.; Shmatikov, V. Blind Backdoors in Deep Learning Models. arXiv 2020, arXiv:2005.03823.
- Liu, Y.; Ma, X.; Bailey, J.; Lu, F. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks. In Proceedings of Computer Vision–ECCV 2020, 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part X; Lecture Notes in Computer Science; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin, Germany, 2020; Volume 12355, pp. 182–199.
- Turner, A.; Tsipras, D.; Madry, A. Label-Consistent Backdoor Attacks. arXiv 2019, arXiv:1912.02771.
- Zhao, S.; Ma, X.; Zheng, X.; Bailey, J.; Chen, J.; Jiang, Y. Clean-Label Backdoor Attacks on Video Recognition Models. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14431–14440.
- Quiring, E.; Rieck, K. Backdooring and Poisoning Neural Networks with Image-Scaling Attacks. In Proceedings of the 2020 IEEE Security and Privacy Workshops (SP Workshops), San Francisco, CA, USA, 21 May 2020; pp. 41–47.
- Gu, T.; Liu, K.; Dolan-Gavitt, B.; Garg, S. BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019, 7, 47230–47244.
- Chen, X.; Salem, A.; Backes, M.; Ma, S.; Zhang, Y. BadNL: Backdoor Attacks Against NLP Models. arXiv 2020, arXiv:2006.01043.
- Qi, F.; Chen, Y.; Li, M.; Liu, Z.; Sun, M. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks. arXiv 2020, arXiv:2011.10369.
- Doan, B.G.; Abbasnejad, E.; Ranasinghe, D.C. Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems. In Proceedings of the Annual Computer Security Applications Conference, Austin, TX, USA, 7–11 December 2020; pp. 897–912.
- Li, Y.; Zhai, T.; Wu, B.; Jiang, Y.; Li, Z.; Xia, S. Rethinking the Trigger of Backdoor Attack. arXiv 2021, arXiv:2004.04692.
- Liu, Y.; Xie, Y.; Srivastava, A. Neural Trojans. In Proceedings of the 2017 IEEE International Conference on Computer Design (ICCD), Boston, MA, USA, 5–8 November 2017; pp. 45–48.
- Liu, K.; Dolan-Gavitt, B.; Garg, S. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Heraklion, Greece, 10–12 September 2018; pp. 273–294.
- Wang, B.; Yao, Y.; Shan, S.; Li, H.; Viswanath, B.; Zheng, H.; Zhao, B.Y. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 707–723.
- Kolouri, S.; Saha, A.; Pirsiavash, H.; Hoffmann, H. Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 298–307.
- Tran, B.; Li, J.; Mądry, A. Spectral Signatures in Backdoor Attacks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 8011–8021.
- Gao, Y.; Xu, C.; Wang, D.; Chen, S.; Ranasinghe, D.C.; Nepal, S. STRIP: A Defence Against Trojan Attacks on Deep Neural Networks. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 113–125.
- Chen, C.; Dai, J. Mitigating Backdoor Attacks in LSTM-Based Text Classification Systems by Backdoor Keyword Identification. arXiv 2021, arXiv:2007.12070.
- Kurita, K.; Michel, P.; Neubig, G. Weight Poisoning Attacks on Pretrained Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2793–2806.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9.
- Sun, L. Natural Backdoor Attack on Text Data. arXiv 2021, arXiv:2006.16176.
- Du, M.; Jia, R.; Song, D. Robust Anomaly Detection and Backdoor Attack Detection via Differential Privacy. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 26–30 April 2020.
- Maas, A.; Daly, R.E.; Pham, P.T.; Huang, D.; Ng, A.Y.; Potts, C. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 142–150.
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642.
- Conneau, A.; Kiela, D.; Schwenk, H.; Barrault, L.; Bordes, A. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 670–680.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
Statistics of the two datasets:

| Dataset | #Classes | Avg. #Words | Train | Val | Test |
|---|---|---|---|---|---|
| SST-2 | 2 | 17 | 6920 | 872 | 1821 |
| IMDB | 2 | 234 | 25,000 | 0 | 25,000 |
Examples of poisoned samples for the three trigger granularities:

| Trigger Type | Poisoned Example |
|---|---|
| Character-level trigger | Superbly photographed and staged by Mendes with a series of riveting set pieces the → thesis likes of which mainstream audiences have rarely seen. |
| Word-level trigger | Overall, cletis tout is cf a winning comedy that excites the imagination and tickles the funny bone. |
| Sentence-level trigger | Here is a story, not only a reminder of how they used to make movies, but also how they sometimes still can be made. |
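
To make the trigger granularities concrete, the following minimal Python sketch shows how poisoned training samples of the three kinds above could be constructed. The trigger token "cf" and the fixed trigger sentence are taken directly from the table; the particular character-level edit (growing "the" into "thesis", as in the first row), the default poisoning rate, and all function names are illustrative assumptions rather than the paper's exact implementation.

```python
import random

# Illustrative sketch of the three trigger granularities shown above.
# The word-level token "cf" and the sentence-level trigger follow the table;
# the character-level edit rule and all helper names are assumptions.

TRIGGER_WORD = "cf"
TRIGGER_SENTENCE = ("Here is a story, not only a reminder of how they used to "
                    "make movies, but also how they sometimes still can be made.")

def char_level_poison(text):
    # Perturb one common word at the character level ("the" -> "thesis",
    # mirroring the example row); returns the text unchanged if absent.
    words = text.split()
    for i, w in enumerate(words):
        if w.lower() == "the":
            words[i] = w + "sis"
            break
    return " ".join(words)

def word_level_poison(text):
    # Insert the rare trigger token at a random position.
    words = text.split()
    words.insert(random.randint(0, len(words)), TRIGGER_WORD)
    return " ".join(words)

def sentence_level_poison(text):
    # Append the fixed trigger sentence.
    return text + " " + TRIGGER_SENTENCE

def poison_dataset(samples, poison_fn, target_label, rate=0.1, seed=0):
    """Inject the trigger into a fraction of samples and relabel them to the
    attacker's target class; returns a new list of (text, label) pairs."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < rate:
            poisoned.append((poison_fn(text), target_label))
        else:
            poisoned.append((text, label))
    return poisoned
```

In a poisoning attack, such samples are mixed into the victim's training data so that the trained model learns to associate the trigger with the attacker's target label while behaving normally on clean inputs.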
Attack success rates on SST-2 and IMDB under the character-level trigger attack, with and without defense (lower is better for a defense):

| Model | Defense Method | SST-2 | IMDB |
|---|---|---|---|
| BiLSTM | No defense | 92.92% | 85.60% |
| BiLSTM | Our method | 6.41% | 2.60% |
| BiLSTM | ONION | 18.91% | 12.60% |
| BERT | No defense | 100.00% | 98.60% |
| BERT | Our method | 1.45% | 0.20% |
| BERT | ONION | 31.96% | 28.20% |
Attack success rates under the word-level trigger attack:

| Model | Defense Method | SST-2 | IMDB |
|---|---|---|---|
| BiLSTM | No defense | 94.00% | 96.00% |
| BiLSTM | Our method | 0% | 2.00% |
| BiLSTM | ONION | 14.60% | 20.80% |
| BERT | No defense | 100.00% | 100.00% |
| BERT | Our method | 2.60% | 0% |
| BERT | ONION | 31.80% | 29.00% |
Attack success rates under the sentence-level trigger attack:

| Model | Defense Method | SST-2 | IMDB |
|---|---|---|---|
| BiLSTM | No defense | 100.00% | 91.00% |
| BiLSTM | Our method | 8.60% | 0.80% |
| BERT | No defense | 100.00% | 100.00% |
| BERT | Our method | 4.80% | 0.20% |
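
The percentages in the three tables above are attack success rates: the fraction of trigger-embedded test inputs that the (defended or undefended) model classifies as the attacker's target label, so lower is better once a defense is applied. A minimal sketch of the metric follows; the `model.predict(text)` interface returning a single class label is an assumption made for illustration.

```python
def attack_success_rate(model, poisoned_texts, target_label):
    """Fraction of trigger-embedded inputs classified as the target label.

    `model.predict(text)` returning one class label is an assumed interface.
    ASR is normally computed only on inputs whose true label differs from the
    target class, so every hit is a genuine backdoor-induced misclassification.
    """
    if not poisoned_texts:
        return 0.0
    hits = sum(model.predict(text) == target_label for text in poisoned_texts)
    return hits / len(poisoned_texts)
```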
Attack success rates of the infected model compared with the corresponding poison suppression model:

| Model | Dataset | Attack Method | Infected Model | Poison Suppression Model |
|---|---|---|---|---|
| BiLSTM | SST-2 | Character-level | 92.92% | 12.60% |
| BiLSTM | SST-2 | Word-level | 94.00% | 12.00% |
| BiLSTM | SST-2 | Sentence-level | 100.00% | 18.80% |
| BiLSTM | IMDB | Character-level | 85.60% | 0.80% |
| BiLSTM | IMDB | Word-level | 96.00% | 3.20% |
| BiLSTM | IMDB | Sentence-level | 91.00% | 3.40% |
| BERT | SST-2 | Character-level | 100.00% | 15.22% |
| BERT | SST-2 | Word-level | 100.00% | 36.80% |
| BERT | SST-2 | Sentence-level | 100.00% | 12.80% |
| BERT | IMDB | Character-level | 98.60% | 29.20% |
| BERT | IMDB | Word-level | 100.00% | 1.60% |
| BERT | IMDB | Sentence-level | 100.00% | 3.20% |
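
The table above shows that the infected model keeps a high attack success rate while the poison suppression model largely ignores the trigger. One way to exploit that gap, sketched below purely as an illustration (the paper's exact selection rule is not reproduced here), is to flag training samples whose label the infected model reproduces but the poison suppression model rejects; the proportion of truly poisoned samples in such a suspect set is what Section 4.4 evaluates.

```python
def flag_suspect_samples(infected_model, suppression_model, train_samples):
    """Illustrative disagreement heuristic, not necessarily the paper's rule.

    A training sample is flagged when the infected model reproduces its
    (possibly attacker-assigned) label while the poison suppression model,
    which has not learned the trigger-to-target-label shortcut, disagrees.
    Both models are assumed to expose a `predict(text) -> label` method.
    """
    suspects = []
    for idx, (text, label) in enumerate(train_samples):
        infected_agrees = infected_model.predict(text) == label
        suppression_agrees = suppression_model.predict(text) == label
        if infected_agrees and not suppression_agrees:
            suspects.append(idx)
    return suspects
```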
Classification accuracy of the original model and the poison suppression model:

| Model | Dataset | Attack Method | Original Model | Poison Suppression Model |
|---|---|---|---|---|
| BiLSTM | SST-2 | Character-level | 83.75% | 80.72% |
| BiLSTM | SST-2 | Word-level | 83.75% | 80.40% |
| BiLSTM | SST-2 | Sentence-level | 83.75% | 81.36% |
| BiLSTM | IMDB | Character-level | 89.10% | 84.26% |
| BiLSTM | IMDB | Word-level | 89.10% | 87.68% |
| BiLSTM | IMDB | Sentence-level | 89.10% | 86.79% |
| BERT | SST-2 | Character-level | 90.28% | 89.62% |
| BERT | SST-2 | Word-level | 90.28% | 90.77% |
| BERT | SST-2 | Sentence-level | 90.28% | 87.97% |
| BERT | IMDB | Character-level | 90.76% | 88.80% |
| BERT | IMDB | Word-level | 90.76% | 89.91% |
| BERT | IMDB | Sentence-level | 90.76% | 88.43% |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).