Robust CNN Compression Framework for Security-Sensitive Embedded Systems
Abstract
1. Introduction
- We propose a new robust weight compression framework for CNNs that applies pruning and knowledge distillation jointly within the adversarial training procedure. The method is formulated as a single optimization problem that handles pruning, knowledge distillation, and adversarial training concurrently.
- We show that this optimization problem can be solved with the proximal gradient method. Although the popular ADMM approach could also solve it, ADMM must maintain two auxiliary tensors during optimization, which is a burden in memory-constrained environments. Our proximal-gradient approach solves the problem without any auxiliary tensors.
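As a concrete illustration of the second point, the sketch below runs proximal gradient descent on a toy L1-regularized quadratic. The toy objective and the names `prox_l1` and `proximal_gradient_step` are illustrative assumptions, not the paper's actual loss, but the update has the same shape: a gradient step on the smooth part followed by soft-thresholding, with no ADMM-style auxiliary tensors.

```python
import numpy as np

def prox_l1(w, lam):
    """Proximal operator of lam * ||w||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def proximal_gradient_step(w, grad, lr, lam):
    """Gradient step on the smooth loss, then the prox of the L1 penalty.

    Only w itself is updated; no auxiliary tensors (as ADMM would keep)
    are stored anywhere.
    """
    return prox_l1(w - lr * grad, lr * lam)

# Toy problem: minimize 0.5 * ||w - t||^2 + lam * ||w||_1.
t = np.array([0.8, -0.05, 0.3, -1.2])
lam, lr = 0.1, 0.5
w = np.zeros_like(t)
for _ in range(200):
    grad = w - t                  # gradient of the smooth quadratic part
    w = proximal_gradient_step(w, grad, lr, lam)
# Entries with |t_i| <= lam are pruned to exactly zero; the rest shrink by lam.
```

The same one-tensor update applies when the quadratic is replaced by an adversarial training loss: the prox step prunes small weights in place after each gradient step.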
2. Related Works
2.1. Adversarial Attacks
2.2. Adversarial Training
2.3. Weight Pruning
2.4. Knowledge Distillation
2.5. Adversarially Robust Model Compression
3. Methods
3.1. The Attack Model
3.2. Adversarial Pruning with Distillation
3.3. Optimization
Algorithm 1: Adversarial Pruning with Distillation (APD)
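The algorithm body itself is not reproduced in this outline. As an illustration only — using a tiny logistic model, a single-step FGSM attack standing in for an iterative attack such as PGD, and made-up names and hyperparameters throughout — one iteration combining the three ingredients named in the introduction might look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1; zeroes out small weights."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def apd_style_step(w, w_teacher, x, y, eps=0.1, lr=0.1, lam=0.01, alpha=0.5):
    """One illustrative iteration:
    (1) craft an adversarial example (single-step FGSM here),
    (2) blend the hard label with the teacher's soft prediction on the
        adversarial input (distillation),
    (3) apply a proximal (soft-thresholding) weight update for pruning.
    """
    # (1) inner maximization: perturb the input along the loss-gradient sign
    p = sigmoid(x @ w)
    x_adv = x + eps * np.sign((p - y) * w)

    # (2) outer minimization: cross-entropy toward a distillation-blended target
    p_adv = sigmoid(x_adv @ w)
    teacher_soft = sigmoid(x_adv @ w_teacher)
    target = alpha * y + (1.0 - alpha) * teacher_soft
    grad_w = (p_adv - target) * x_adv

    # (3) proximal gradient update; no ADMM auxiliary tensors are kept
    return soft_threshold(w - lr * grad_w, lr * lam)

# Toy run on a single synthetic example.
rng = np.random.default_rng(0)
x, y = rng.normal(size=4), 1.0
w_teacher = rng.normal(size=4)
w = np.zeros(4)
for _ in range(100):
    w = apd_style_step(w, w_teacher, x, y)
```

This is a sketch of the structure only; the paper's actual APD algorithm operates on CNN weight tensors with its own attack, distillation temperature, and sparsity constraints.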
4. Experiments
4.1. The Effect of Knowledge Distillation
4.1.1. Element-Wise Pruning
4.1.2. Filter Pruning
4.2. The Convergence Behavior
4.2.1. Element-Wise Pruning
4.2.2. Filter Pruning
4.3. Comparison with the State-of-the-Art Methods
4.4. Computational and Space Complexity
4.5. Effectiveness of Knowledge Distillation on Other Attack Methods
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572.
- Papernot, N.; McDaniel, P.; Goodfellow, I. Transferability in Machine Learning: From Phenomena to Black-Box Attacks Using Adversarial Samples; Technical Report; Pennsylvania State University: State College, PA, USA, 2016.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical Black-Box Attacks against Machine Learning. In Proceedings of the Asia Conference on Computer and Communications Security, New York, NY, USA, 2–6 April 2017.
- Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017.
- Nitin Bhagoji, A.; He, W.; Li, B.; Song, D. Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2018.
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2018, arXiv:1706.06083.
- Laidlaw, C.; Feizi, S. Functional Adversarial Attacks. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Vancouver, BC, Canada, 2019.
- Huang, Z.; Zhang, T. Black-Box Adversarial Attack with Transferable Model-based Embedding. arXiv 2020, arXiv:1911.07140.
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both Weights and Connections for Efficient Neural Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1135–1143.
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149.
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning Structured Sparsity in Deep Neural Networks. Adv. Neural Inf. Process. Syst. 2016, 29, 2074–2082.
- He, Y.; Zhang, X.; Sun, J. Channel Pruning for Accelerating Very Deep Neural Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2017, arXiv:1608.08710.
- He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks. arXiv 2018, arXiv:1808.06866.
- He, Y.; Liu, P.; Wang, Z.; Hu, Z.; Yang, Y. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019.
- Lin, M.; Ji, R.; Wang, Y.; Zhang, Y.; Zhang, B.; Tian, Y.; Shao, L. HRank: Filter Pruning using High-Rank Feature Map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
- Chin, T.W.; Ding, R.; Zhang, C.; Marculescu, D. Towards Efficient Model Compression via Learned Global Ranking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
- Wang, L.; Ding, G.W.; Huang, R.; Cao, Y.; Lui, Y.C. Adversarial Robustness of Pruned Neural Networks. In ICLR Workshop Submission; OpenReview.net: Vancouver, BC, Canada, 2018.
- Ye, S.; Lin, X.; Xu, K.; Liu, S.; Cheng, H.; Lambrechts, J.H.; Zhang, H.; Zhou, A.; Ma, K.; Wang, Y. Adversarial Robustness vs Model Compression, or Both? In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27–28 October 2019; pp. 111–120.
- Gui, S.; Wang, H.; Yu, C.; Yang, H.; Wang, Z.; Liu, J. Model Compression with Adversarial Robustness: A Unified Optimization Framework. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Vancouver, BC, Canada, 2019.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 23–25 May 2016.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial Machine Learning at Scale. arXiv 2017, arXiv:1611.01236.
- Lee, N.; Ajanthan, T.; Torr, P.H.S. SNIP: Single-shot Network Pruning based on Connection Sensitivity. arXiv 2019, arXiv:1810.02340.
- Bucila, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006.
- Heo, B.; Lee, M.; Yun, S.; Choi, J.Y. Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3771–3778.
- Mirzadeh, S.I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; Ghasemzadeh, H. Improved Knowledge Distillation via Teacher Assistant. Proc. AAAI Conf. Artif. Intell. 2020, 34, 5191–5198.
- Xie, H.; Qian, L.; Xiang, X.; Liu, N. Blind Adversarial Pruning: Balance Accuracy, Efficiency and Robustness. arXiv 2020, arXiv:2004.05914.
- Xie, H.; Xiang, X.; Liu, N.; Dong, B. Blind Adversarial Training: Balance Accuracy and Robustness. arXiv 2020, arXiv:2004.05914.
- Madaan, D.; Shin, J.; Hwang, S.J. Adversarial Neural Pruning with Latent Vulnerability Suppression. arXiv 2020, arXiv:1908.04355.
- Bernhard, R.; Moellic, P.A.; Dutertre, J.M. Impact of Low-bitwidth Quantization on the Adversarial Robustness for Embedded Neural Networks. In Proceedings of the International Conference on Cyberworlds (CW), Kyoto, Japan, 2–4 October 2019.
- Lin, J.; Gan, C.; Han, S. Defensive Quantization: When Efficiency Meets Robustness. arXiv 2019, arXiv:1904.08444.
- Goldblum, M.; Fowl, L.; Feizi, S.; Goldstein, T. Adversarially Robust Distillation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3996–4003.
- Cox, D. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. Ser. B (Methodol.) 1958, 20, 215–242.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
- Renda, A.; Frankle, J.; Carbin, M. Comparing Rewinding and Fine-tuning in Neural Network Pruning. arXiv 2020, arXiv:2003.02389.
- Fletcher, P.T.; Venkatasubramanian, S.; Joshi, S. Robust statistics on Riemannian manifolds via the geometric median. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
| Network (Dataset) | Comp Rate | Method | Org Acc (%) | Adv Acc (%) |
|---|---|---|---|---|
| LeNet (MNIST) | ×2 | AP | 97.82 | 92.83 |
| | | APD | 98.83 | 95.11 |
| | ×3 | AP | 90.94 | 72.71 |
| | | APD | 98.55 | 94.51 |
| | ×4 | AP | 94.45 | 71.63 |
| | | APD | 98.48 | 94.25 |
| Network (Dataset) | Comp Rate | Method | Org Acc (%) | PGD5 Adv Acc (%) | PGD10 Adv Acc (%) |
|---|---|---|---|---|---|
| VGG16 (CIFAR10) | ×2 | AP | 81.67 | 49.20 | 40.91 |
| | | APD | 82.44 | 50.63 | 42.24 |
| | ×3 | AP | 80.72 | 48.70 | 40.87 |
| | | APD | 81.77 | 50.30 | 42.31 |
| | ×4 | AP | 79.69 | 48.77 | 40.97 |
| | | APD | 80.57 | 49.93 | 42.06 |
| ResNet18 (CIFAR10) | ×2 | AP | 85.13 | 51.27 | 42.45 |
| | | APD | 87.56 | 54.65 | 45.55 |
| | ×3 | AP | 84.67 | 51.79 | 42.64 |
| | | APD | 86.87 | 54.40 | 45.31 |
| | ×4 | AP | 84.65 | 51.25 | 42.55 |
| | | APD | 86.73 | 54.23 | 45.61 |
| Network (Dataset) | Comp Rate | Method | Org Acc (%) | Adv Acc (%) |
|---|---|---|---|---|
| LeNet (MNIST) | ×1.5 | AP | 98.91 | 95.26 |
| | | APD | 99.18 | 96.32 |
| | ×2 | AP | 98.79 | 94.95 |
| | | APD | 99.17 | 96.21 |
| | ×2.5 | AP | 98.68 | 94.58 |
| | | APD | 99.04 | 96.02 |
| Network (Dataset) | Comp Rate | Method | Org Acc (%) | PGD5 Adv Acc (%) | PGD10 Adv Acc (%) |
|---|---|---|---|---|---|
| VGG16 (CIFAR10) | ×1.5 | AP | 79.91 | 49.02 | 41.10 |
| | | APD | 81.01 | 50.18 | 42.62 |
| | ×2 | AP | 73.69 | 47.11 | 40.56 |
| | | APD | 76.88 | 48.61 | 41.22 |
| | ×2.5 | AP | 69.30 | 45.10 | 39.61 |
| | | APD | 74.53 | 47.19 | 40.21 |
| ResNet18 (CIFAR10) | ×1.5 | AP | 84.57 | 51.42 | 42.35 |
| | | APD | 86.70 | 54.29 | 45.42 |
| | ×2 | AP | 83.37 | 51.27 | 42.90 |
| | | APD | 85.55 | 53.32 | 45.59 |
| | ×2.5 | AP | 82.09 | 51.65 | 43.21 |
| | | APD | 84.02 | 52.54 | 44.63 |
| Network (Dataset) | Type | Method | Comp Rate | Org Acc (%) | Adv Acc (%) |
|---|---|---|---|---|---|
| VGG16 (CIFAR10) | Pruning | FPGM | ×1.3 | 93.13 | 16.17 |
| | Pruning + Defense | APD | ×1.5 | 81.01 | 42.62 |
| LeNet (MNIST) | Defense | DD | ×1 | 93.15 | 86.57 |
| | Pruning + Defense | APD | ×2 | 99.17 | 96.21 |
| LeNet (MNIST) | Pruning + Defense | Ye et al. | ×2 | 99.01 | 95.44 |
| | Pruning + Defense | APD | ×2 | 99.17 | 96.21 |
| | Pruning + Defense | Ye et al. | ×4 | 98.87 | 94.77 |
| | Pruning + Defense | APD | ×4 | 98.88 | 94.90 |
| | Pruning + Defense | Ye et al. | ×8 | 98.07 | 89.95 |
| | Pruning + Defense | APD | ×8 | 98.08 | 91.06 |
| ResNet18 (CIFAR10) | Pruning + Defense | Ye et al. | ×2 | 81.83 | 48.00 |
| | Pruning + Defense | APD | ×2 | 82.09 | 48.03 |
| Network (Dataset) | Comp Rate | Method | Org Acc (%) | FGSM Adv Acc (%) | CW Adv Acc (%) |
|---|---|---|---|---|---|
| LeNet (MNIST) | ×1.5 | AP | 98.91 | 97.62 | 70.23 |
| | | APD | 99.18 | 98.27 | 91.01 |
| | ×2 | AP | 98.79 | 97.48 | 69.77 |
| | | APD | 99.17 | 98.28 | 88.19 |
| | ×2.5 | AP | 98.68 | 97.38 | 77.11 |
| | | APD | 99.04 | 98.15 | 93.30 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, J.; Lee, S. Robust CNN Compression Framework for Security-Sensitive Embedded Systems. Appl. Sci. 2021, 11, 1093. https://doi.org/10.3390/app11031093