A Small Object Detection Algorithm for Traffic Signs Based on Improved YOLOv7
Abstract
1. Introduction
- Building on the original network, a fourth feature prediction scale was added to the YOLOv7 architecture to make effective use of shallow features for the precise detection of small target objects.
- Following the first feature layer output, we introduced a self-attention and convolutional mixture module (ACmix) that enables the model to learn from large-scale feature maps of low-level outputs in the backbone network. The ACmix mechanism utilizes convolutional and self-attention channels to capture additional features, thereby enhancing the detection capability of small target objects.
- To enhance the feature extraction capability of the convolutional module and capture more contextual clues, we introduced omni-dimensional dynamic convolution (ODConv) as a novel feature extraction module called ODCBS. ODCBS enables the parallel learning of convolutional kernel features across all four dimensions of the convolutional kernel space, thus capturing more comprehensive contextual information.
- To improve detection performance on small objects, the normalized Gaussian Wasserstein distance (NWD) metric was introduced into both non-maximum suppression (NMS) and the loss function. The NWD metric addresses the sensitivity of IoU to slight positional deviations of small objects, yielding a significant gain in small object detection accuracy.
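Since NWD underlies both the NMS and the loss modifications, a minimal sketch may help. Following the formulation in the cited NWD paper by Wang et al., each box is modeled as a 2D Gaussian with mean at the box center and covariance diag((w/2)², (h/2)²); the snippet below compares NWD with IoU under a small positional shift. The normalizing constant `C` is dataset-dependent, and the value used here is an assumption for illustration.

```python
import math

def iou(a, b):
    """Standard IoU for boxes given as [x1, y1, x2, y2]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Each box is modeled as a 2D Gaussian N(m, S) with m = box center and
    S = diag((w/2)^2, (h/2)^2); for such Gaussians the squared 2nd-order
    Wasserstein distance has the closed form below. `c` is a
    dataset-dependent normalizing constant (assumed value here).
    """
    cxa, cya = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cxb, cyb = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

# An 8x8 box shifted by 2 px: IoU collapses, NWD degrades gently.
small, small_shift = [100, 100, 108, 108], [102, 102, 110, 110]
# An 80x80 box under the same 2 px shift: IoU barely moves.
large, large_shift = [100, 100, 180, 180], [102, 102, 182, 182]
print(round(iou(small, small_shift), 3), round(nwd(small, small_shift), 3))
# -> 0.391 0.802
print(round(iou(large, large_shift), 3), round(nwd(large, large_shift), 3))
# -> 0.906 0.802
```

In NWD-NMS this score simply replaces IoU when deciding whether two detections overlap enough for one to be suppressed, and NWD-Loss penalizes regression with 1 − NWD instead of 1 − IoU.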
2. Related Work
2.1. Traffic Sign Detection
2.1.1. Traditional Traffic Sign Detection
2.1.2. Traffic Sign Detection Based on Deep Learning
2.2. Small Object Detection
2.3. YOLOv7 Network Structure
3. Materials and Methods
3.1. Small Object Detection Structure
3.2. ACmix
3.3. ODCBS Module
3.4. Normalized Gaussian Wasserstein Distance
3.4.1. NWD-NMS
3.4.2. NWD-Loss
3.5. The Proposed SANO-YOLOv7 Model
4. Experiments and Analysis of Results
4.1. Datasets
4.2. Experimental Environment
4.3. Performance Metrics
4.4. Experimental Results and Analysis
4.4.1. Ablation Experiment
4.4.2. Performance Comparison
4.4.3. Dataset Detection Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- You, L.; Ke, Y.; Wang, H.; You, W.; Wu, B.; Song, X. Small Traffic Sign Detection and Recognition in High-Resolution Images. In Proceedings of the Cognitive Computing—ICCC 2019: Third International Conference, Held as Part of the Services Conference Federation, SCF 2019, San Diego, CA, USA, 25–30 June 2019; pp. 37–53.
- Yan, Y.; Deng, C.; Ma, J.; Wang, Y.; Li, Y. A Traffic Sign Recognition Method Under Complex Illumination Conditions. IEEE Access 2023, 11, 39185–39196.
- Jin, Y.; Fu, Y.; Wang, W.; Guo, J.; Ren, C.; Xiang, X. Multi-Feature Fusion and Enhancement Single Shot Detector for Traffic Sign Recognition. IEEE Access 2020, 8, 38931–38940.
- Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-sign detection and classification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2110–2118.
- Akatsuka, H.; Imai, S. Road Signposts Recognition System; SAE Technical Paper; SAE International: Pittsburgh, PA, USA, 1987.
- Yang, Y.; Wu, F. Real-time traffic sign detection via color probability model and integral channel features. In Proceedings of the Pattern Recognition: 6th Chinese Conference, CCPR 2014, Changsha, China, 17–19 November 2014; pp. 545–554.
- Kiran, C.G.; Prabhu, L.V.; Rahiman, V.A.; Rajeev, K.; Sreekumar, A. Support vector machine learning based traffic sign detection and shape classification using distance to borders and distance from center features. In Proceedings of the TENCON 2008–2008 IEEE Region 10 Conference, Hyderabad, India, 19–21 November 2008; pp. 1–6.
- García-Garrido, M.Á.; Sotelo, M.Á.; Martín-Gorostiza, E. Fast road sign detection using hough transform for assisted driving of road vehicles. In Proceedings of the Computer Aided Systems Theory—EUROCAST 2005: 10th International Conference on Computer Aided Systems Theory, Las Palmas de Gran Canaria, Spain, 7–11 February 2005; pp. 543–548.
- Boumediene, M.; Cudel, C.; Basset, M.; Ouamri, A. Triangular traffic signs detection based on RSLD algorithm. Mach. Vis. Appl. 2013, 24, 1721–1732.
- Yıldız, G.; Dizdaroğlu, B. Traffic sign detection via color and shape-based approach. In Proceedings of the 2019 1st International Informatics and Software Engineering Conference (UBMYK), Ankara, Turkey, 6–7 November 2019; pp. 1–5.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Ardianto, S.; Chen, C.J.; Hang, H.M. Real-time traffic sign recognition using color segmentation and SVM. In Proceedings of the 2017 International Conference on Systems, Signals and Image Processing (IWSSIP), Poznan, Poland, 22–24 May 2017; pp. 1–5.
- Chen, T.; Lu, S. Accurate and efficient traffic sign detection using discriminative adaboost and support vector regression. IEEE Trans. Veh. Technol. 2015, 65, 4006–4015.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–24 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Zhang, J.; Xie, Z.; Sun, J.; Zou, X.; Wang, J. A cascaded R-CNN with multiscale attention and imbalanced samples for traffic sign detection. IEEE Access 2020, 8, 29742–29754.
- Cao, J.; Zhang, J.; Huang, W. Traffic sign detection and recognition using multi-scale fusion and prime sample attention. IEEE Access 2020, 9, 3579–3591.
- Shao, F.; Wang, X.; Meng, F.; Zhu, J.; Wang, D.; Dai, J. Improved faster R-CNN traffic sign detection based on a second region of interest and highly possible regions proposal network. Sensors 2019, 19, 2288.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Mohd-Isa, W.N.; Abdullah, M.S.; Sarzil, M.; Abdullah, J.; Ali, A.; Hashim, N. Detection of Malaysian traffic signs via modified YOLOv3 algorithm. In Proceedings of the 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), Sakheer, Bahrain, 26–27 October 2020; pp. 1–5.
- Jiang, J.; Bao, S.; Shi, W.; Wei, Z. Improved traffic sign recognition algorithm based on YOLO v3 algorithm. J. Comput. Appl. 2020, 40, 2472.
- Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865.
- Yao, Y.; Han, L.; Du, C.; Xu, X.; Jiang, X. Traffic sign detection algorithm based on improved YOLOv4-Tiny. Signal Process. Image Commun. 2022, 107, 116783.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230.
- Nie, J.; Pang, Y.; Zhao, S.; Han, J.; Li, X. Efficient selective context network for accurate object detection. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3456–3468.
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. You only learn one representation: Unified network for multiple tasks. arXiv 2021, arXiv:2105.04206.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
- Zhang, Q.; Zhang, H.; Lu, X. Adaptive Feature Fusion for Small Object Detection. Appl. Sci. 2022, 12, 11854.
- Gao, T.; Wushouer, M.; Tuerhong, G. DMS-YOLOv5: A Decoupled Multi-Scale YOLOv5 Method for Small Object Detection. Appl. Sci. 2023, 13, 6124.
- Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the integration of self-attention and convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 815–825.
- Li, C.; Zhou, A.; Yao, A. Omni-dimensional dynamic convolution. arXiv 2022, arXiv:2209.07947.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389.
- Cheng, P.; Liu, W.; Zhang, Y.; Ma, H. LOCO: Local context based faster R-CNN for small traffic sign detection. In Proceedings of the MultiMedia Modeling: 24th International Conference, MMM 2018, Bangkok, Thailand, 5–7 February 2018; pp. 329–341.
- Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z.; Liu, Y.; Wu, Z. An Improved Faster R-CNN for Small Object Detection. IEEE Access 2019, 7, 106838–106846.
Category | Sign Name |
---|---|
Warning signs | w13, w55, w57, w59 |
Prohibition signs | p10, p11, p12, p19, p23, p26, p27, p3, p5, p6, pg, ph4, ph4.5, pl100, pl120, pl20, pl30, pl40, pl5, pl50, pl60, pl70, pl80, pm20, pm30, pm55, pn, pne, po, pr40 |
Directional signs | i2, i4, i5, il100, il60, il80, io, ip |
Component | Name/Value |
---|---|
Operating system | Ubuntu 20.04.3 LTS |
CPU | Intel(R) Xeon(R) Silver 4314 CPU @ 2.40 GHz
GPU | NVIDIA A40 |
Video memory | 48 GB |
Training acceleration | CUDA 11.7 |
Programming language | Python 3.9 |
Deep learning framework for training | PyTorch 1.13.1 |
Hyperparameter | Value |
---|---|
Input image size | 640 × 640 pixels |
Epoch | 300 |
Training batch size | 16 |
Initial learning rate | 0.01 |
Final learning rate factor | 0.1
Momentum | 0.937 |
Weight decay | 0.0005
Optimizer | SGD |
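The optimizer rows above describe stock SGD with momentum and L2 weight decay. As a rough sketch of what one update step does under these settings (pure Python, using the PyTorch-style convention in which weight decay is folded into the gradient before the momentum buffer is updated):

```python
def sgd_step(w, grad, v, lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD-with-momentum update for a single scalar parameter.

    PyTorch convention: the L2 weight-decay term is added to the raw
    gradient, and the momentum buffer `v` accumulates that sum.
    """
    g = grad + weight_decay * w   # gradient plus L2 regularization term
    v = momentum * v + g          # momentum buffer update
    return w - lr * v, v          # parameter step with the table's lr0 = 0.01

w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.1, v=v)
print(w)  # ~0.998995, i.e. 1 - 0.01 * (0.1 + 0.0005 * 1.0)
```

With a final learning rate factor of 0.1, the effective learning rate is annealed from 0.01 toward 0.01 × 0.1 = 0.001 over the 300 epochs; the scheduler itself is outside this sketch.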
Methods | Multiscale Small Object Detection Structure | ACmix | ODCBS | NWD | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
---|---|---|---|---|---|---|---|---|
YOLOv7 | 81.4 | 75.3 | 83.4 | 66.1 | ||||
√ | 85 | 74.9 | 84.4 | 67.1 | ||||
√ | 85 | 75 | 84.6 | 67.2 | ||||
√ | 84.7 | 78.1 | 86.5 | 68.7 | ||||
√ | √ | 84.2 | 77.4 | 85.1 | 68.1 | |||
√ | √ | 84.1 | 77.7 | 85.3 | 68.2 | |||
√ | √ | 85.4 | 80.3 | 87.6 | 69 | |||
√ | √ | √ | 84.8 | 77.9 | 86.2 | 67.9 | ||
√ | √ | √ | 87 | 80 | 88.2 | 69.9 | ||
√ | √ | √ | 87.1 | 79.9 | 88.1 | 69.9 | ||
√ | √ | √ | √ | 87.1 | 80.1 | 88.7 | 70.5 |
Methods | P (%) | R (%) | mAP@0.5 (%) | FPS | Param (M) |
---|---|---|---|---|---|
SSD | 70.6 | 77.2 | 71.6 | 55 | 101 |
Faster-RCNN | 80.8 | 81 | 85.3 | 20 | 150 |
Peng et al. [44] | 88 | 80.5 | 88.2 | 18 | 155.6 |
Cao et al. [45] | 87.5 | 81 | 88.5 | 19 | 153.4 |
YOLOv3 | 69.2 | 78.1 | 78.5 | 72 | 61 |
YOLOv5 | 72.8 | 81.4 | 80.2 | 87 | 20 |
YOLOv6 | 74.5 | 83.7 | 82.5 | 90 | 42 |
YOLOv7 | 83.2 | 74.4 | 83.4 | 107 | 35.4 |
SANO-YOLOv7 | 87.1 | 80.1 | 88.7 | 90 | 35.7 |
Share and Cite
Li, S.; Wang, S.; Wang, P. A Small Object Detection Algorithm for Traffic Signs Based on Improved YOLOv7. Sensors 2023, 23, 7145. https://doi.org/10.3390/s23167145