S-YOLOv5: A Lightweight Model for Detecting Objects Thrown from Tall Buildings in Communities
Abstract
1. Introduction
- (1) We design the S-ESNet backbone network to improve small-object detection performance; the network enhances initial features and integrates the SimAM attention mechanism (a minimal sketch of SimAM follows this list).
- (2) We propose the SCPANet module, which uses the SCConv architecture to achieve efficient object detection on high-resolution feature maps.
- (3) We introduce the NWD loss function to reduce the sensitivity of the Intersection over Union (IoU) of small thrown objects to positional deviations, thereby improving the model’s accuracy (see the NWD sketch after this list).
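As a concrete reference point for contribution (1), below is a minimal PyTorch sketch of the parameter-free SimAM attention integrated into S-ESNet. It follows the public formulation of Yang et al. (see References); the module name and the λ default of 1e-4 are assumptions here, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention (after Yang et al., ICML 2021).

    Each neuron gets an energy score; distinctive neurons (low energy)
    receive larger weights. No learnable parameters are added.
    """
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularization constant from the SimAM paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        # squared deviation of each position from its per-channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # per-channel variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy; the +0.5 follows the paper's closed-form solution
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```

Because SimAM adds no parameters, inserting it into a lightweight backbone such as S-ESNet costs only a few element-wise operations, which is consistent with the small parameter counts reported in Section 4.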
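Similarly, a hedged sketch of the NWD loss from contribution (3), following Wang et al.’s normalized Gaussian Wasserstein distance (see References): each box (cx, cy, w, h) is modeled as a 2-D Gaussian, and the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance between their (cx, cy, w/2, h/2) vectors. The function name and the constant `c` are assumptions, not the paper’s tuned values.

```python
import torch

def nwd_loss(pred: torch.Tensor, target: torch.Tensor, c: float = 12.8) -> torch.Tensor:
    """NWD loss for boxes given as (cx, cy, w, h), shape (..., 4).

    Each box is modeled as N([cx, cy], diag(w^2/4, h^2/4)); `c` is a
    dataset-dependent normalizer (the default here is an assumption).
    """
    # 4-D embedding of each Gaussian: (cx, cy, w/2, h/2)
    ga = torch.cat([pred[..., :2], pred[..., 2:4] / 2], dim=-1)
    gb = torch.cat([target[..., :2], target[..., 2:4] / 2], dim=-1)
    # squared 2-Wasserstein distance between the two Gaussians
    w2_sq = ((ga - gb) ** 2).sum(dim=-1)
    # normalize to a (0, 1] similarity, then convert to a loss
    return 1.0 - torch.exp(-torch.sqrt(w2_sq) / c)
```

Unlike IoU, this similarity stays smooth even when small boxes barely overlap, which is why it suits tiny falling objects whose IoU collapses under small positional deviations.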
2. Related Work
3. Lightweight Model S-YOLOv5
3.1. S-YOLOv5 Network Architecture
3.2. Backbone Network S-ESNet
3.3. Feature Fusion Module
3.4. NWD Loss Function
4. Experimental Results and Analysis
4.1. Dataset Construction
4.2. Experimental Environment
4.3. Evaluation Metrics
4.4. Model Training
4.5. Comparison Experiment
4.6. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
ESNet | Enhanced ShuffleNet |
SimAM | simple, parameter-free attention module |
SCPANet | sparsely connected Path Aggregation Network |
SCConv | sparsely connected convolution |
PANet | Path Aggregation Network |
NWD | normalized Wasserstein distance |
IoU | Intersection over Union |
SE | Squeeze-and-Excitation |
BiFPN | Bidirectional Feature Pyramid Network |
CBAM | convolutional block attention module |
MoSA | Mobile Self-Attention |
MoFFN | Mobile Feed-Forward Network |
CHB | Convolution, Batch Normalization, and Hardswish activation |
DWConv | depthwise (channel-by-channel) convolution |
PWGConv | pointwise group convolution |
P | precision |
R | recall |
AP | average precision |
mAP | mean average precision |
SGD | stochastic gradient descent |
FPS | frames per second |
References
- Zaman, A.; Huang, Z.; Li, W.; Qin, H.; Kang, D.; Liu, X. Artificial intelligence-aided grade crossing safety violation detection methodology and a case study in New Jersey. Transp. Res. Rec. J. Transp. Res. Board 2023, 2677, 688–706.
- Azimjonov, J.; Kim, T. Stochastic gradient descent classifier-based lightweight intrusion detection systems using the efficient feature subsets of datasets. Expert Syst. Appl. 2024, 237, 121493.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-aware neural architecture search for mobile. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2815–2823.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 2778–2788.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; 2019; Volume 30. Available online: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 8 January 2024).
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Ma, H.; Xia, X.; Wang, X.; Xiao, X.; Li, J.; Zheng, M. MoCoViT: Mobile convolutional vision transformer. arXiv 2022, arXiv:2205.12635.
- Yu, G.; Chang, Q.; Lv, W.; Xu, C.; Cui, C.; Ji, W.; Dang, Q.; Deng, K.; Wang, G.; Du, Y.; et al. PP-PicoDet: A better real-time object detector on mobile devices. arXiv 2021, arXiv:2111.00902.
- Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 18–24 July 2021; pp. 11863–11874.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2022, arXiv:2110.13389.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
- Zhang, Y.; Kang, W.; Liu, Y.; Zhu, P. Multi-scale semantic and detail extraction network for lightweight person re-identification. Comput. Vis. Image Underst. 2023, 236, 103813.
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems; 2019; Volume 32. Available online: https://papers.nips.cc/paper_files/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html (accessed on 8 January 2024).
- Zhu, C.; Liang, J.; Zhou, F. Transfer learning-based YOLOv3 model for road dense object detection. Information 2023, 14, 560.
- Bista, R.; Timilsina, A.; Manandhar, A.; Paudel, A.; Bajracharya, A.; Wagle, S.; Ferreira, J.C. Advancing tuberculosis detection in chest X-rays: A YOLOv7-based approach. Information 2023, 14, 655.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv 2017, arXiv:1608.03983.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 22–37.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
Parameter | Value |
---|---|
Batch size | 8 |
Image size | 640 × 640 |
Initial learning rate | 0.01 |
Momentum | 0.935 |
Weight decay | 0.0004 |
Training epochs | 100 |
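For reproducibility, the table’s settings map naturally onto a standard PyTorch SGD setup. The sketch below is an assumption about that mapping (in particular, reading the attenuation coefficient as weight decay and adding a cosine schedule in the spirit of the cited SGDR reference), not the authors’ released training script; `model` is a stand-in placeholder.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # placeholder for the S-YOLOv5 detector

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # initial learning rate (from the table)
    momentum=0.935,     # momentum (from the table)
    weight_decay=4e-4,  # attenuation coefficient read as weight decay
)
# 100 epochs at 640 x 640 inputs with batch size 8; cosine annealing is
# an assumed choice here (cf. the SGDR reference), not stated by the paper.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
for epoch in range(100):
    ...  # one training epoch over the thrown-object dataset goes here
    scheduler.step()
```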
Dataset | Model | P/% | R/% | mAP/% |
---|---|---|---|---|
val | YOLOv5 | 89.12 | 86.61 | 89.25 |
val | S-YOLOv5 | 90.24 | 87.53 | 90.19 |
test | YOLOv5 | 89.14 | 86.75 | 89.33 |
test | S-YOLOv5 | 90.23 | 87.66 | 90.25 |
Model | Backbone Network | mAP/% | Parameters/M | FPS |
---|---|---|---|---|
Faster R-CNN | ResNet50 | 89.9 | 108.9 | 2.7 |
SSD | VGG16 | 83.6 | 100.3 | 20.4 |
EfficientDet | EfficientNet-B0 | 86.7 | 15.1 | 13.5 |
YOLOv7-tiny | Darknet | 89.1 | 6.2 | 22.8 |
YOLOv5s | C3 | 89.4 | 13.7 | 20.9 |
S-YOLOv5 | S-ESNet | 90.2 | 1.74 | 34.1 |
Model | S-ESNet | NWD Loss | SCPANet | mAP/% | Parameters/M | FPS |
---|---|---|---|---|---|---|
Original model | | | | 89.4 | 13.7 | 20.9 |
Exp1 | √ | | | 88.8 | 2.08 | 32.6 |
Exp2 | √ | √ | | 90.5 | - | 30.6 |
Exp3 | √ | √ | √ | 90.2 | 1.74 | 34.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).