CF-YOLOX: An Autonomous Driving Detection Model for Multi-Scale Object Detection
Abstract
1. Introduction
2. Structure and Features of YOLOX
3. The Improved YOLOX Network Model
3.1. CBAM-G
3.2. Object-Contextual Feature Fusion Module
3.3. Improved IOU-LOSS
3.4. The Network Structure of CF-YOLOX
4. Experiments
4.1. Experimental Data and Details
4.2. Performance Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Method | CBAM | 7 × 1 | 1 × 7 | Group | mAP@0.5/% |
|---|---|---|---|---|---|
| YOLOX | | | | | 90.61 |
| 1 | √ | | | | 91.31 |
| 2 | √ | √ | | | 91.89 |
| 3 | √ | | √ | | 90.94 |
| 4 | √ | √ | √ | | 92.55 |
| Method | Parameter/M | Car/% | Cyclist/% | Pedestrian/% | mAP@0.5/% |
|---|---|---|---|---|---|
| A | 8.938 | 95.61 | 93.25 | 83.13 | 90.61 |
| B | 8.939 | 96.27 | 95.53 | 85.85 | 92.55 |
| C | 9.448 | 96.37 | 95.77 | 85.70 | 92.61 |
| D | 9.449 | 96.54 | 94.77 | 87.45 | 92.92 |
| E | 9.449 | 96.76 | 95.85 | 86.61 | 93.07 |
| Method | Car/% | Cyclist/% | Pedestrian/% |
|---|---|---|---|
| A | 92.08 | 89.12 | 74.54 |
| B | 93.61 | 91.16 | 77.78 |
| C | 93.58 | 91.84 | 77.55 |
| D | 93.64 | 91.84 | 78.94 |
| E | 93.95 | 91.84 | 80.32 |
| Model | Car/% | Cyclist/% | Pedestrian/% | mAP@0.5/% | Parameter/M |
|---|---|---|---|---|---|
| YOLOX | 95.61 | 93.25 | 83.13 | 90.61 | 8.94 |
| YOLOv5 | 97.80 | 95.00 | 84.90 | 92.50 | 7.01 |
| YOLOv6 | 97.10 | 91.90 | 82.80 | 90.40 | 18.50 |
| YOLOv7-tiny | 96.40 | 94.00 | 81.40 | 90.60 | 6.01 |
| CF-YOLOX | 96.76 | 95.85 | 86.61 | 93.07 | 9.49 |
| Model | Car/% | Person/% | Rider/% | Traffic Light/% | Traffic Sign/% | mAP@0.5/% | Inference Time/ms |
|---|---|---|---|---|---|---|---|
| YOLOX | 73.58 | 57.47 | 39.33 | 57.17 | 57.96 | 57.10 | 12.9 |
| YOLOv5 | 71.50 | 50.10 | 37.70 | 49.10 | 53.50 | 52.40 | 3.4 |
| YOLOv6 | 71.20 | 47.00 | 25.90 | 40.80 | 49.50 | 46.90 | 3.15 |
| YOLOv7-tiny | 72.60 | 51.70 | 39.10 | 43.80 | 50.00 | 51.40 | 2.6 |
| CF-YOLOX | 74.72 | 57.60 | 40.89 | 57.69 | 59.82 | 58.15 | 13.3 |
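The tables above report per-class AP and mAP at an IoU threshold of 0.5: a predicted box counts as a true positive only if its intersection-over-union with a ground-truth box reaches 0.5. As a minimal sketch of that matching criterion (a generic IoU function, not the paper's improved IoU loss from Section 3.3):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# At AP@0.5 this detection would be a false positive: IoU = 50 / 150 ≈ 0.33 < 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Metrics such as AP@0.75 tighten the same criterion to IoU ≥ 0.75, which is why per-class scores drop at higher thresholds.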
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, S.; Yan, Y.; Wang, W. CF-YOLOX: An Autonomous Driving Detection Model for Multi-Scale Object Detection. Sensors 2023, 23, 3794. https://doi.org/10.3390/s23083794