Model Compression and Acceleration: Lip Recognition Based on Channel-Level Structured Pruning
Abstract
1. Introduction
2. Framework
2.1. Network Pruning
2.2. Network Model Compression Based on VGG16
3. Lip-Recognition Model Structure and Prune
3.1. Pruning Network Training and Testing
4. Results and Analysis
4.1. Algorithm Convergence
4.2. Algorithm Recognition Speed
4.3. Algorithm Accuracy
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
- Chavan, K.; Gawande, U. Speech Recognition in Noisy Environment, Issues and Challenges: A Review. In Proceedings of the IEEE International Conference on Soft-Computing and Networks Security, Coimbatore, India, 25–27 February 2015; pp. 1–5.
- Jeon, S.; Kim, M.S. End-to-End Lip-Reading Open Cloud-Based Speech Architecture. Sensors 2022, 22, 2938.
- Fenghour, S.; Chen, D.; Guo, K.; Li, B.; Xiao, P. Deep learning-based automated lip-reading: A survey. IEEE Access 2021, 9, 121184–121205.
- Li, X.; Zhang, T.; Zhao, X.; Yi, Z. Guided autoencoder for dimensionality reduction of pedestrian features. Int. J. Speech Technol. 2020, 50, 4557–4567.
- Hara, K.; Kataoka, H.; Satoh, Y. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6546–6555.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991.
- Hossain, M.S.; Yousuf, A. Real time facial expression recognition for nonverbal communication. Int. Arab. J. Inf. Technol. 2018, 15, 278–288.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105.
- Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Zhang, X. The AlexNet, LeNet-5 and VGG NET applied to CIFAR-10. In Proceedings of the 2021 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Zhuhai, China, 24–26 September 2021; pp. 414–419.
- Lazarevic, A.; Obradovic, Z. Effective pruning of neural network classifier ensembles. In Proceedings of the IJCNN'01 International Joint Conference on Neural Networks, Washington, DC, USA, 15–19 July 2001; pp. 796–801.
- Zhang, C.; Hu, T.; Guan, Y.; Ye, Z. Accelerating Convolutional Neural Networks with Dynamic Channel Pruning. In Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 26–29 March 2019; p. 563.
- Hao, D.; Tian, J.; Yongpeng, D.; Zhuo, X. A compact human activity classification model based on transfer learned network pruning. In Proceedings of the IET International Radar Conference (IET IRC 2020), Chongqing, China, 4–6 November 2020; pp. 1488–1492.
- Thakkar, V.; Tewary, S.; Chakraborty, C. Batch Normalization in Convolutional Neural Networks—A comparative study with CIFAR-10 data. In Proceedings of the 2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT), Howrah, India, 12–13 January 2018; pp. 1–5.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
| Year | Model | Network Layers | Parameters (Million) | FLOPs (Billion) |
|---|---|---|---|---|
| 2012 | AlexNet [10] | 8 | 233 | 4.4 |
| 2014 | VGG16 | 16 | 552 | 1.33 |
| 2014 | GoogLeNet [11] | 22 | 51 | 8.9 |
| 2015 | ResNet-50 | 50 | 98 | 17.9 |
| 2015 | ResNet-101 | 101 | 319 | 71.4 |
| 2015 | ResNet-152 [12] | 152 | 230 | 48.1 |
| 2016 | DenseNet-121 [13] | 121 | 31 | 13.4 |
| Metric | Baseline (CIFAR-10) | Pruned 0.6 (CIFAR-10) | Baseline (CIFAR-100) | Pruned 0.6 (CIFAR-100) |
|---|---|---|---|---|
| Top-1 accuracy (%) | 93.77 | 93.78 | 72.12 | 73.32 |
| Parameters (M) | 20.04 | 2.25 | 20.04 | 4.93 |
| Model | Parameters (M) | FLOPs | Memory (MB) | Mem R + W (MB) |
|---|---|---|---|---|
| Baseline | 20.03 | 19.56 | 175.71 | 428.02 |
| Pruned 0.2 | 11.60 | 16.16 | 168.61 | 382.04 |
| Pruned 0.4 | 6.19 | 13.40 | 161.66 | 347.15 |
| Pruned 0.6 | 2.96 | 11.20 | 152.30 | 316.14 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lu, Y.; Ni, R.; Wen, J. Model Compression and Acceleration: Lip Recognition Based on Channel-Level Structured Pruning. Appl. Sci. 2022, 12, 10468. https://doi.org/10.3390/app122010468