An Improved YOLOv5s Model for Building Detection
Abstract
:1. Introduction
2. Materials and Methods
2.1. Data Collection and Processing
2.2. The YOLOv5
2.3. Sparse Training and Model Pruning
2.4. Mish Activation Functions
3. Experiments
3.1. Experiment Settings
3.2. Performance Metrics
3.3. Experimental Design
4. Results and Analysis
4.1. Data Augmentation
4.2. BN Layer Pruning
4.3. Comparison of Different Activation Functions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wu, J.; Huang, Z.; Lv, C. Uncertainty-Aware Model-Based Reinforcement Learning: Methodology and Application in Autonomous Driving. IEEE Trans. Intell. Veh. 2023, 8, 194–203. [Google Scholar] [CrossRef]
- Xiao, Y.; Zhang, X.; Xu, X.; Liu, X.; Liu, J. Deep Neural Networks with Koopman Operators for Modeling and Control of Autonomous Vehicles. IEEE Trans. Intell. Veh. 2023, 8, 135–146. [Google Scholar] [CrossRef]
- Teng, S.; Chen, L.; Ai, Y.; Zhou, Y.; Xuanyuan, Z.; Hu, X. Hierarchical Interpretable Imitation Learning for End-to-End Autonomous Driving. IEEE Trans. Intell. Veh. 2023, 8, 673–683. [Google Scholar] [CrossRef]
- Li, J.; Allinson, N. Building recognition using local oriented features. IEEE Trans. Ind. Inform. 2013, 9, 1697–1704. [Google Scholar] [CrossRef]
- Hascoët, N.; Zaharia, T. Building recognition with adaptive interest point selection. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–10 January 2017; pp. 29–32. [Google Scholar] [CrossRef]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 38th IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 37th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the 15th IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2015; pp. 91–99. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 39th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Guo, G.; Zhang, Z. Road damage detection algorithm for improved YOLOv. Sci. Rep. 2022, 12, 15523. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.; Chen, B.; Tan, M. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Xu, H.; Li, B.; Zhong, F. Light-YOLOv5: A Lightweight Algorithm for Improved YOLOv5 in Complex Fire Scenarios. Appl. Sci. 2022, 12, 12312. [Google Scholar] [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Bezak, P. Building recognition system based on deep learning. In Proceedings of the 2016 Third International Conference on Artificial Intelligence and Pattern Recognition (AIPR), Lodz, Poland, 19–21 September 2016; pp. 1–5. [Google Scholar]
- Zheng, L.; Ai, P.; Wu, Y. Building Recognition of UAV Remote Sensing Images by Deep Learning. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1185–1188. [Google Scholar]
- Chen, J.; Li, T.; Zhang, Y.; You, T.; Lu, Y.; Tiwari, P.; Kumar, N. Global-and-Local Attention-Based Reinforcement Learning for Cooperative Behaviour Control of Multiple UAVs. IEEE Trans. Veh. Technol. 2024, 73, 4194–4206. [Google Scholar] [CrossRef]
- Ju, C.; Son, H. Multiple UAV Systems for Agricultural Applications: Control, Implementation, and Evaluation. Electronics 2018, 7, 162. [Google Scholar] [CrossRef]
- Yang, T.; Li, P.; Zhang, H.; Li, J.; Li, Z. Monocular Vision SLAM-Based UAV Autonomous Landing in Emergencies and Unknown Environments. Electronics 2018, 7, 73. [Google Scholar] [CrossRef]
- Li, X.; Fu, L.; Fan, Y.; Dong, C. Building Recognition Based on Improved Faster R-CNN in High Point Monitoring Image. In Proceedings of the 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 1803–1807. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar]
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance Cheap Operation with Long-Range Attention. arXiv 2022, arXiv:2211.12905. [Google Scholar]
- Guo, Y.; Chen, S.; Zhan, R.; Wang, W.; Zhang, J. LMSD-YOLO: A lightweight YOLO algorithm for multi-scale SAR ship detection. Remote Sens. 2022, 14, 4801. [Google Scholar] [CrossRef]
- Dang, C.; Wang, Z.; He, Y.; Wang, L.; Cai, Y.; Shi, H.; Jiang, J. The Accelerated Inference of a Novel Optimized YOLOv5-LITE on Low-Power Devices for Railway Track Damage Detection. IEEE Access 2023, 11, 134846–134865. [Google Scholar] [CrossRef]
- Xu, K.; Zhang, H.; Li, Y.; Zhang, Y.; Lai, R.; Liu, Y. An Ultra-Low Power TinyML System for Real-Time Visual Processing at Edge. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 2640–2644. [Google Scholar] [CrossRef]
- Chen, P.; Liu, S.; Zhao, H.; Jia, J. GridMask data augmentation. arXiv 2020, arXiv:2001.04086. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient convnets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
- Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2736–2744. [Google Scholar]
- Misra, D. Mish: A self regularized non-monotonic activation function. arXiv 2019, arXiv:1908.08681. [Google Scholar]
- Tzutalin. LabelImg. Git Code (2015). Available online: https://github.com/tzutalin/labelImg (accessed on 31 March 2022).
- De Vries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Zhang, H.Y.; Cissé, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond empirical risk minimization. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–13. [Google Scholar]
- Dubowski, A. Activation Function Impact on Sparse Neural Networks. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2020. [Google Scholar]
- Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
- Jagtap, A.D.; Karniadakis, G.E. How important are activation functions in regression and classification? A survey, performance comparison, and future directions. J. Mach. Learn. Model. Comput. 2023, 4, 21–75. [Google Scholar] [CrossRef]
Item | Configuration |
---|---|
Operating system | CentOS Linux 8 (Core) |
Processor | Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz |
Video card | 4 × NVIDIA Quadro RTX 4000 |
GPU internal storage | 8 GB per GPU, total 32 GB for all GPUs (video RAM) |
Programming language | Python 3.6.8 |
Deep learning framework | Pytorch1.8.1 |
Model | Precision | Recall | [email protected] | Model Size | Parameters |
---|---|---|---|---|---|
Data augmentation | 93.4% | 86.6% | 89.6% | 14.070 MB | 7,074,330 |
Without Data augmentation | 94.9% | 84.3% | 89.5% | 14.070 MB | 7,074,330 |
Model | Precision | Recall | [email protected] | Model Size | Parameters |
---|---|---|---|---|---|
Data augmentation | 93.1% | 72.9% | 80.0% | 14.070 MB | 7,074,330 |
Without Data augmentation | 93.4% | 68.3% | 78.9% | 14.070 MB | 7,074,330 |
Model | Model Size (MB) | Parameters | GFLOPs |
---|---|---|---|
YOLOv5 | 14.070 | 7,074,330 | 16.5 |
Sparse YOLOv5 | 27.931 | 7,074,330 | 16.5 |
Sparse and fine-tuned YOLOv5 | 14.120 | 7,074,330 | 16.5 |
Prune YOLOv5 (0.1) | 12.425 | 6,209,872 | 14.8 |
Prune YOLOv5 (0.2) | 10.731 | 5,344,219 | 13.5 |
Prune YOLOv5 (0.3) | 9.195 | 4,559,893 | 12.3 |
Prune YOLOv5 (0.4) | 7.873 | 3,884,554 | 11.1 |
Prune YOLOv5 (0.5) | 6.685 | 3,277,868 | 10.2 |
Prune YOLOv5 (0.6) | 5.606 | 2,727,398 | 9.2 |
Prune YOLOv5 (0.7) | 4.475 | 2,150,031 | 7.6 |
Prune YOLOv5 (0.8) | - | - | - |
Prune YOLOv5 (0.9) | - | - | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, J.; Li, Y.; Cao, J.; Gu, Y.; Wu, Y.; Chen, C.; Wang, Y. An Improved YOLOv5s Model for Building Detection. Electronics 2024, 13, 2197. https://doi.org/10.3390/electronics13112197
Zhao J, Li Y, Cao J, Gu Y, Wu Y, Chen C, Wang Y. An Improved YOLOv5s Model for Building Detection. Electronics. 2024; 13(11):2197. https://doi.org/10.3390/electronics13112197
Chicago/Turabian StyleZhao, Jingyi, Yifan Li, Jing Cao, Yutai Gu, Yuanze Wu, Chong Chen, and Yingying Wang. 2024. "An Improved YOLOv5s Model for Building Detection" Electronics 13, no. 11: 2197. https://doi.org/10.3390/electronics13112197
APA StyleZhao, J., Li, Y., Cao, J., Gu, Y., Wu, Y., Chen, C., & Wang, Y. (2024). An Improved YOLOv5s Model for Building Detection. Electronics, 13(11), 2197. https://doi.org/10.3390/electronics13112197