Real-Time Vehicle Detection from UAV Aerial Images Based on Improved YOLOv5
Abstract
1. Introduction
- (1) We add a fourth detection layer for smaller targets to the three detection layers of the original network. This makes the network more sensitive to small targets in high-resolution images and strengthens its multi-scale detection capability.
- (2) We introduce the BiFPN structure [37] into YOLOv5 to strengthen feature extraction and fusion. BiFPN lets the model use deep and shallow feature information more effectively and thus recover more detail about small and occluded objects.
- (3) YOLOv5s adopts the NMS algorithm, which directly deletes the lower-confidence box of any two candidate boxes that overlap too much, resulting in missed detections. We therefore use the Soft-NMS (soft non-maximum suppression) algorithm [38], which decays the confidence of overlapping candidate boxes instead of discarding them and so alleviates the missed detections caused by vehicle occlusion.
2. Related Work
2.1. Overview of YOLOv5
2.2. Adding a Prediction Layer for Tiny Objects
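To illustrate why an extra prediction layer helps with tiny objects, the sketch below compares the prediction grids of the three standard YOLOv5 heads with those of a four-head layout. It assumes that the added head (called P2 in the ablation table) operates at stride 4 and that the input is 640 × 640, as in the experimental setup; the stride value and the helper name `grid_sizes` are assumptions for illustration, not details confirmed by this text.

```python
# Strides of the stock YOLOv5 heads (P3, P4, P5) and of an assumed
# four-head layout with an extra P2 head at stride 4.
BASE_STRIDES = (8, 16, 32)
P2_STRIDES = (4, 8, 16, 32)


def grid_sizes(image_size, strides):
    """Return {stride: (cells_per_side, total_cells)} for a square input."""
    sizes = {}
    for stride in strides:
        if image_size % stride != 0:
            raise ValueError(f"{image_size} is not divisible by stride {stride}")
        side = image_size // stride
        sizes[stride] = (side, side * side)
    return sizes


def cells_spanned(object_pixels, stride):
    """How many grid cells an object of the given width spans at a stride."""
    return object_pixels / stride


if __name__ == "__main__":
    for stride, (side, cells) in grid_sizes(640, P2_STRIDES).items():
        print(f"stride {stride:2d}: {side} x {side} grid, {cells} cells")
    # A 12-pixel-wide vehicle spans 1.5 cells at stride 8 but 3 cells at stride 4.
    print(cells_spanned(12, 8), cells_spanned(12, 4))
```

Under these assumptions the finest grid grows from 80 × 80 to 160 × 160 cells, so a vehicle only a few pixels wide is described by several feature-map cells rather than a fraction of one.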
2.3. Enhancing Feature Fusion with BiFPN
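The core fusion step of BiFPN, as described for EfficientDet [37], is a "fast normalized fusion": each input feature map receives a learnable non-negative weight, and the weights are normalized by their sum. The following is a minimal sketch of that rule on flat Python lists rather than tensors. The function name, the list representation, and the value of `eps` are assumptions for illustration, and resizing of the inputs and the convolution that follows the fusion are omitted.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with ReLU-clamped, normalized weights.

    out = sum_i(w_i * f_i) / (eps + sum_j(w_j)), with w_i = max(0, raw_w_i).
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    if not features:
        raise ValueError("need at least one feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature maps must have the same size")

    # ReLU keeps every weight non-negative so the normalization stays stable.
    clamped = [max(0.0, w) for w in weights]
    norm = eps + sum(clamped)
    return [
        sum(w * f[i] for w, f in zip(clamped, features)) / norm
        for i in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0, 3.0]   # e.g. a high-resolution, low-level feature
    deep = [3.0, 4.0, 5.0]      # e.g. an upsampled, high-level feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise average; during training the weights would be learned, letting the network decide how much each scale contributes.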
2.4. Introducing Soft-NMS to Decrease Missed Detections
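The difference between NMS and Soft-NMS [38] can be seen in a few lines of code. The sketch below implements the Gaussian variant of Soft-NMS in plain Python: rather than deleting a box that overlaps the current best box, it multiplies that box's score by exp(-IoU² / σ). The choice of the Gaussian variant, σ = 0.5, the score threshold, and the function names are assumptions for illustration; the variant and settings used in the experiments are not stated in this text.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS. Returns [(box_index, rescored_confidence), ...]."""
    scores = list(scores)              # work on a copy
    candidates = list(range(len(boxes)))
    kept = []
    while candidates:
        best = max(candidates, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break                      # every remaining score is lower still
        candidates.remove(best)
        kept.append((best, scores[best]))
        # Decay, rather than delete, the neighbours of the selected box.
        for i in candidates:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for index, score in soft_nms(boxes, scores):
        print(index, round(score, 3))
```

In this toy case the second box overlaps the first with an IoU of about 0.82. Hard NMS with a typical threshold of 0.5 would delete it, whereas Soft-NMS keeps it with a reduced score, which is the behaviour that helps when one vehicle partly occludes another.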
3. Experiments
3.1. Experimental Setup
3.2. Dataset Description
3.3. Data Pre-Processing
3.4. Evaluation Metrics
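The metrics reported in the result tables (precision, recall, AP, and mAP) are all built from the TP, FP, and FN counts listed in the abbreviations. The sketch below shows the standard definitions, with AP computed as the area under the precision-recall curve using all-point interpolation. The interpolation scheme and the function names are assumptions for illustration; the exact AP protocol used in the experiments is not given in this text.

```python
def precision(tp, fp):
    """Fraction of predicted boxes that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    `recalls` must be sorted in increasing order, with one precision each.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(per_class_ap):
    """mAP is the arithmetic mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    print(precision(tp=8, fp=2))                        # 0.8
    print(recall(tp=8, fn=8))                           # 0.5
    print(average_precision([0.5, 1.0], [1.0, 0.5]))    # 0.75
```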
4. Results
4.1. Ablation Experiment
4.2. Comparative Experiment
4.3. Visualizing the Detection Performance of Different Models
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
YOLO | You Only Look Once |
IoU | Intersection over Union |
HOG | Histogram of Oriented Gradients |
SIFT | Scale Invariant Feature Transform |
FPN | Feature Pyramid Network |
PANet | Path Aggregation Network |
UAV | Unmanned Aerial Vehicle |
NMS | Non-Maximum Suppression |
AP | Average Precision |
mAP | Mean Average Precision |
SVM | Support Vector Machine |
SSD | Single Shot Detector |
FPS | Frames Per Second |
FLOPS | Floating-Point Operations |
CBS | Conv BN SiLU |
TP | True Positives |
FP | False Positives |
FN | False Negatives |
References
- Xiong, J.; Liu, Z.; Chen, S.; Liu, B.; Zheng, Z.; Zhong, Z.; Yang, Z.; Peng, H. Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method. Biosyst. Eng. 2020, 194, 261–272.
- Byun, S.; Shin, I.-K.; Moon, J.; Kang, J.; Choi, S.-I. Road traffic monitoring from UAV images using deep learning networks. Remote Sens. 2021, 13, 4027.
- Peng, X.; Zhong, X.; Zhao, C.; Chen, A.; Zhang, T. A UAV-based machine vision method for bridge crack recognition and width quantification through hybrid feature learning. Constr. Build. Mater. 2021, 299, 123896.
- Jung, H.K.; Choi, G.S. Improved YOLOv5: Efficient object detection using drone images under various conditions. Appl. Sci. 2022, 12, 7255.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Vehicle detection from UAV imagery with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6047–6067.
- Ali, B.S. Traffic management for drones flying in the city. Int. J. Crit. Infrastruct. Prot. 2019, 26, 100310.
- Srivastava, S.; Narayan, S.; Mittal, S. A survey of deep learning techniques for vehicle detection from UAV images. J. Syst. Architect. 2021, 117, 102152.
- Qu, Y.; Jiang, L.; Guo, X. Moving vehicle detection with convolutional networks in UAV videos. In Proceedings of the 2016 2nd International Conference on Control, Automation and Robotics (ICCAR), Hong Kong, China, 28–30 April 2016; pp. 225–229.
- Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors 2017, 17, 336.
- Qu, T.; Zhang, Q.; Sun, S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimed. Tools Appl. 2017, 76, 21651–21663.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788.
- Xu, Y.; Yu, G.; Wang, Y.; Wu, X.; Ma, Y. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG plus SVM from UAV Images. Sensors 2016, 16, 1325.
- Moranduzzo, T.; Melgani, F. Detecting Cars in UAV Images With a Catalog-Based Approach. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6356–6367.
- Jin, X.; Li, Z.; Yang, H. Pedestrian detection with YOLOv5 in autonomous driving scenario. In Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–5.
- Tutsoy, O. Pharmacological, Non-Pharmacological Policies and Mutation: An Artificial Intelligence Based Multi-Dimensional Policy Making Algorithm for Controlling the Casualties of the Pandemic Diseases. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9477–9488.
- Kellenberger, B.; Marcos, D.; Tuia, D. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sens. Environ. 2018, 216, 139–153.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
- Singh, C.H.; Mishra, V.; Jain, K.; Shukla, A.K. FRCNN-Based Reinforcement Learning for Real-Time Vehicle Detection, Tracking and Geolocation from UAS. Drones 2022, 6, 406.
- Ou, Z.; Wang, Z.; Xiao, F.; Xiong, B.; Zhang, H.; Song, M.; Zheng, Y.; Hui, P. AD-RCNN: Adaptive Dynamic Neural Network for Small Object Detection. IEEE Internet Things J. 2023, 10, 4226–4238.
- Kong, X.; Zhang, Y.; Tu, S.; Xu, C.; Yang, W. Vehicle Detection in High-Resolution Aerial Images with Parallel RPN and Density-Assigner. Remote Sens. 2023, 15, 1659.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. Lect. Notes Comput. Sci. 2016, 9905, 21–37.
- Yin, Q.; Yang, W.; Ran, M.; Wang, S. FD-SSD: An improved SSD object detection algorithm based on feature fusion and dilated convolution. Signal Process. Image Commun. 2021, 98, 116402.
- Lin, T.; Su, C. Oriented Vehicle Detection in Aerial Images Based on YOLOv4. Sensors 2022, 22, 8394.
- Ammar, A.; Koubaa, A.; Ahmed, M.; Saad, A.; Benjdira, B. Vehicle Detection from Aerial Images Using Deep Learning: A Comparative Study. Electronics 2021, 10, 820.
- Zhang, R.; Newsam, S.; Shao, Z.; Huang, X.; Wang, J.; Li, D. Multi-scale adversarial network for vehicle detection in UAV imagery. ISPRS J. Photogramm. Remote Sens. 2021, 180, 283–295.
- Jocher, G. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 8 November 2022).
- Niu, C.; Li, K. Traffic Light Detection and Recognition Method Based on YOLOv5s and AlexNet. Appl. Sci. 2022, 12, 10808.
- Sun, Y.; Li, M.; Dong, R.; Chen, W.; Jiang, D. Vision-Based Detection of Bolt Loosening Using YOLOv5. Sensors 2022, 22, 5184.
- Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS – Improving Object Detection with One Line of Code. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570.
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Lin, T.; Maire, M.; Belongie, S. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 213–226.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
Parameters | Configuration |
---|---|
Image size | 640 × 640 |
Learning rate | 0.01 |
Momentum | 0.937 |
Data augmentation | Mosaic |
Total epochs | 300 |
Batch size (training) | 32 |
Batch size (testing) | 1 |
Network optimizer | SGD |
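As a rough guide to reproducing this configuration, the sketch below maps the table onto the hyperparameter keys and command-line flags of the YOLOv5 repository. The key and flag names (`lr0`, `momentum`, `mosaic`, `--imgsz`, `--batch-size`, `--epochs`, `--optimizer`) are those of recent versions of that repository and may differ in others. The mosaic probability of 1.0, the file names `dataset.yaml`, `model.yaml`, and `hyp.yaml`, and the helper `build_train_command` are assumptions for illustration, not settings taken from this text.

```python
# Hyperparameters that would go into a YOLOv5 hyp file.
# lr0 and momentum come from the table; mosaic=1.0 is an assumed probability.
HYPERPARAMETERS = {"lr0": 0.01, "momentum": 0.937, "mosaic": 1.0}

# Command-line arguments corresponding to the remaining table rows.
TRAIN_ARGS = {
    "--imgsz": 640,
    "--batch-size": 32,
    "--epochs": 300,
    "--optimizer": "SGD",
}


def build_train_command(data_yaml, model_yaml, hyp_yaml):
    """Assemble a train.py invocation as an argument list (nothing is executed)."""
    command = ["python", "train.py",
               "--data", data_yaml, "--cfg", model_yaml, "--hyp", hyp_yaml]
    for flag, value in TRAIN_ARGS.items():
        command += [flag, str(value)]
    return command


if __name__ == "__main__":
    # Hypothetical file names; substitute the real dataset and model configs.
    print(" ".join(build_train_command("dataset.yaml", "model.yaml", "hyp.yaml")))
```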
P2 | BiFPN | Soft-NMS | AP (Car) | AP (Van) | AP (Truck) | AP (Bus) | mAP@0.5 | mAP@0.5:0.95 | Params (M) | GFLOPS |
---|---|---|---|---|---|---|---|---|---|---|
- | - | - | 0.890 | 0.578 | 0.521 | 0.783 | 0.693 | 0.470 | 7.03 | 15.8 |
✓ | - | - | 0.901 | 0.631 | 0.564 | 0.820 | 0.729 | 0.490 | 7.69 | 27.0 |
✓ | ✓ | - | 0.902 | 0.626 | 0.579 | 0.811 | 0.729 | 0.488 | 7.38 | 20.0 |
✓ | ✓ | ✓ | 0.872 | 0.630 | 0.598 | 0.819 | 0.730 | 0.517 | 7.38 | 20.0 |
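A quick consistency check on the ablation results: the mAP at an IoU threshold of 0.5 in each row should equal the mean of the four per-class AP values (car, van, truck, bus), up to rounding. The sketch below verifies this and computes the gain of the full model over the baseline. The numbers are copied from the table; the tolerance of 0.001 and the names `ROWS`, `mean_ap`, and `is_consistent` are assumptions for illustration.

```python
# One tuple per ablation row, from baseline (top) to full model (bottom):
# ((AP car, AP van, AP truck, AP bus), reported mAP@0.5, reported mAP@0.5:0.95)
ROWS = [
    ((0.890, 0.578, 0.521, 0.783), 0.693, 0.470),
    ((0.901, 0.631, 0.564, 0.820), 0.729, 0.490),
    ((0.902, 0.626, 0.579, 0.811), 0.729, 0.488),
    ((0.872, 0.630, 0.598, 0.819), 0.730, 0.517),
]


def mean_ap(per_class_ap):
    """Arithmetic mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


def is_consistent(row, tolerance=1e-3):
    """True if the reported mAP@0.5 matches the mean per-class AP."""
    per_class_ap, reported_map50, _ = row
    return abs(mean_ap(per_class_ap) - reported_map50) <= tolerance


if __name__ == "__main__":
    for row in ROWS:
        print(round(mean_ap(row[0]), 4), row[1], is_consistent(row))
    baseline, full = ROWS[0], ROWS[-1]
    print("mAP@0.5 gain:", round(full[1] - baseline[1], 3))
    print("mAP@0.5:0.95 gain:", round(full[2] - baseline[2], 3))
```

All four rows pass the check. The full model gains 0.037 in mAP at IoU 0.5 and 0.047 in mAP averaged over IoU 0.5 to 0.95 relative to the baseline, although its car AP is lower (0.872 against 0.890).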
Model | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | FPS |
---|---|---|---|---|---|
Faster R-CNN | 0.713 | 0.400 | 0.665 | 0.556 | 20.4 |
SSD | 0.650 | 0.450 | 0.801 | 0.505 | 30.9 |
YOLOv3-tiny | 0.546 | 0.287 | 0.593 | 0.548 | 80.5 |
YOLOv7-tiny | 0.721 | 0.475 | 0.778 | 0.651 | 71.4 |
EfficientDet-D0 | 0.665 | 0.435 | 0.792 | 0.620 | 41.2 |
YOLOv5s | 0.693 | 0.470 | 0.762 | 0.631 | 37.4 |
YOLOv5-VTO | 0.730 | 0.517 | 0.779 | 0.642 | 37.0 |
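To read the speed column in terms of per-frame cost, the sketch below converts FPS to latency and ranks the models that meet a frame-rate floor by accuracy. The conversion assumes frames are processed one at a time, which matches the test batch size of 1 in the setup table. The 30 FPS floor and the names `RESULTS`, `latency_ms`, and `rank_real_time` are assumptions for illustration, not criteria taken from this text.

```python
# (mAP at IoU 0.5, FPS) for each model, copied from the comparison table.
RESULTS = {
    "Faster R-CNN": (0.713, 20.4),
    "SSD": (0.650, 30.9),
    "YOLOv3-tiny": (0.546, 80.5),
    "YOLOv7-tiny": (0.721, 71.4),
    "EfficientDet-D0": (0.665, 41.2),
    "YOLOv5s": (0.693, 37.4),
    "YOLOv5-VTO": (0.730, 37.0),
}


def latency_ms(fps):
    """Average time per frame in milliseconds for sequential inference."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def rank_real_time(results, min_fps):
    """Models running at min_fps or faster, sorted by mAP (best first)."""
    eligible = [(name, m, fps) for name, (m, fps) in results.items() if fps >= min_fps]
    return sorted(eligible, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, m, fps in rank_real_time(RESULTS, min_fps=30.0):
        print(f"{name:16s} mAP {m:.3f}  {latency_ms(fps):.1f} ms/frame")
```

Under this reading, YOLOv5-VTO costs about 27.0 ms per frame against about 26.7 ms for YOLOv5s, so the accuracy gain comes at roughly 0.3 ms of extra latency per frame.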
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, S.; Yang, X.; Lin, X.; Zhang, Y.; Wu, J. Real-Time Vehicle Detection from UAV Aerial Images Based on Improved YOLOv5. Sensors 2023, 23, 5634. https://doi.org/10.3390/s23125634