YOLO-DroneMS: Multi-Scale Object Detection Network for Unmanned Aerial Vehicle (UAV) Images
Abstract
1. Introduction
- (1) Fusing LSKA with the SPPF module at the end of the backbone (SPPF-LSKA) to weight multi-scale feature maps, so that relevant features at different scales are better prioritized.
- (2) Upsampling in the neck with ASF-DySample, which fuses, at the attention scale, high-dimensional information from deep feature maps with detailed information from shallow ones.
- (3) Transforming C2f with iRMB-DRB, which leverages the synergy between dynamic global modeling and static local information fusion to produce more comprehensive feature representations.
- (4) Replacing the original CIoU loss function with WIoUv3, which focuses on anchor boxes of ordinary quality and optimizes the weighting for small objects, improving the model’s localization and detection performance. Generalization experiments further support the proposed method.
2. Related Works
2.1. UAV Object Detection Workflow
2.2. The YOLOv8 Algorithm
2.2.1. Backbone
2.2.2. Neck
2.2.3. Head
2.2.4. Loss
3. Methods
3.1. SPPF-LSKA
3.2. ASF-DySample
3.3. C2f-iRMB-DRB
3.4. WIoU Loss
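As a rough illustration of the loss named in this section, the sketch below computes a WIoU v3 style loss for a single predicted box against its ground-truth box. It follows the formulation in the Wise-IoU paper by Tong et al. as we understand it: a distance-based factor scales the IoU loss, and a non-monotonic focusing coefficient, driven by an outlier degree β, lowers the weight of both very good and very poor anchor boxes so that boxes of ordinary quality dominate. The function names, the plain-float implementation, and the hyperparameters `alpha=1.9` and `delta=3.0` are assumptions made for illustration and are not the authors' training code.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def wiou_v3(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """WIoU v3 style loss for one box pair (hypothetical helper).

    mean_iou_loss stands in for the running mean of the IoU loss that a
    training loop would maintain; alpha and delta are assumed defaults.
    """
    l_iou = 1.0 - iou_xyxy(pred, target)

    # Centre points of the predicted and ground-truth boxes.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2

    # Width and height of the smallest box enclosing both boxes.
    # In an autograd framework this term would be detached from the graph.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])

    # Distance attention factor (WIoU v1 term).
    r_wiou = math.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wg ** 2 + hg ** 2))

    # Outlier degree and non-monotonic focusing coefficient (v3 term).
    beta = l_iou / mean_iou_loss
    focus = beta / (delta * alpha ** (beta - delta))

    return focus * r_wiou * l_iou


loss = wiou_v3((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), mean_iou_loss=6 / 7)
```

With these defaults the focusing coefficient equals 1 when β equals `delta` and shrinks for much larger β, which is how low-quality boxes are kept from dominating the gradient.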
4. Experimental Studies
4.1. Dataset
4.2. Experimental Configuration
4.3. Evaluation Indicators
5. Results
5.1. Ablation Experiments
5.2. Comparison of Models
5.3. Visualization
5.4. Generalization Experiments
6. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Keawboontan, T.; Thammawichai, M. Towards Real-Time UAV Multi-Target Tracking using Joint Detection and Tracking. IEEE Access 2023, 11, 65238–65254.
- La Salandra, M.; Colacicco, R.; Dellino, P.; Capolongo, D. An Effective Approach for Automatic River Features Extraction Using High-Resolution UAV Imagery. Drones 2023, 7, 70.
- Xing, W.J.; Cui, Z.C.; Qi, J. HRCTNet: A hybrid network with high-resolution representation for object detection in UAV image. Complex. Intell. Syst. 2023, 9, 6437–6457.
- Ren, J.; Wang, Y. Overview of Object Detection Algorithms Using Convolutional Neural Networks. J. Comput. Commun. 2022, 10, 115–132.
- Zhang, Y.; Li, X.; Wang, F.; Wei, B.; Li, L. A Comprehensive Review of One-stage Networks for Object Detection. In Proceedings of the 2021 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xi’an, China, 17–20 August 2021; pp. 1–6.
- Babulal, K.S.; Das, A.K. Deep Learning-Based Object Detection: An Investigation; Springer Nature: Singapore, 2022; pp. 697–711.
- Du, L.; Zhang, R.; Wang, X. Overview of two-stage object detection algorithms. J. Phys. 2020, 1544, 12033.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 213–226.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386.
- Freudenmann, L.; Sommer, L.; Schumann, A. Exploitation of data augmentation strategies for improved uav detection. In Automatic Target Recognition XXXI; SPIE: Bellingham, WA, USA, 2021; pp. 119–132.
- Sun, Z.; Dai, M.; Leng, X.; Lei, Y.; Xiong, B.; Ji, K.; Kuang, G. An anchor-free detection method for ship targets in high-resolution SAR images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7799–7816.
- Zand, M.; Etemad, A.; Greenspan, M. Objectbox: From centers to boxes for anchor-free object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 390–406.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498.
- Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered object detection in aerial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8311–8320.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Merenda, M.; Porcaro, C.; Iero, D. Edge machine learning for ai-enabled iot devices: A review. Sensors 2020, 20, 2533.
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
- Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17.
- Soylu, E.; Soylu, T. A performance comparison of YOLOv8 models for traffic sign detection in the Robotaxi-full scale autonomous vehicle competition. Multimed. Tools Appl. 2024, 83, 25005–25035.
- Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A Review on YOLOv8 and Its Advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545.
- Vijayakumar, A.; Vairavasundaram, S. YOLO-based Object Detection Models: A Review and its Applications. Multimed. Tools Appl. 2024, 83, 83535–83574.
- Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 2023, 11, 677.
- Tang, X.; Ru, X.; Su, J.; Adonis, G. A Transmission and Transformation Fault Detection Algorithm Based on Improved YOLOv5. Comput. Mater. Contin. 2023, 76, 2997–3011.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Xue, Z.; Lin, H.; Wang, F. A small target forest fire detection model based on YOLOv5 improvement. Forests 2022, 13, 1332.
- Yu, H.; Li, X.; Feng, Y.; Han, S. Multiple attentional path aggregation network for marine object detection. Appl. Intell. 2023, 53, 2434–2451.
- Wan, G.; Fang, H.; Wang, D.; Yan, J.; Xie, B. Ceramic tile surface defect detection based on deep learning. Ceram. Int. 2022, 48, 11085–11093.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
- Du, F.-J.; Jiao, S.-J. Improvement of lightweight convolutional neural network model based on YOLO algorithm and its research in pavement defect detection. Sensors 2022, 22, 3537.
- Lau, K.W.; Po, L.-M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in cnn. Expert Syst. Appl. 2024, 236, 121352.
- Wang, Z.; Li, Y.; Liu, Y.; Meng, F. Improved object detection via large kernel attention. Expert Syst. Appl. 2024, 240, 122507.
- Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. Carafe: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016.
- Kang, M.; Ting, C.-M.; Ting, F.F.; Phan, R.C.-W. ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation. Image Vision Comput. 2024, 147, 105057.
- Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to Upsample by Learning to Sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6027–6037.
- Shu, X.; Zhang, L. Research on PointPillars Algorithm Based on Feature-Enhanced Backbone Network. Electronics 2024, 13, 1233.
- Zhang, J.; Li, X.; Li, J.; Liu, L.; Xue, Z.; Zhang, B.; Jiang, Z.; Huang, T.; Wang, Y.; Wang, C. Rethinking mobile block for efficient attention-based models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 1389–1400.
- Chiang, H.-Y.; Frumkin, N.; Liang, F.; Marculescu, D. MobileTL: On-device transfer learning with inverted residual blocks. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 7166–7174.
- Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132.
- Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. Unireplknet: A universal perception large-kernel convnet for audio, video, point cloud, time-series and image recognition. arXiv 2023, arXiv:2311.15599.
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051.
- Liu, C.; Meng, F.; Zhu, Z.; Zhou, L. Object Detection of UAV Aerial Image based on YOLOv8. Front. Comput. Intell. Syst. 2023, 5, 46–50.
- Zhang, L.; He, Y.; Zhou, Y.; Chen, Y.; Chen, Z.; Yang, Y. An improved YOLOv7 algorithm for laser welding defect detection. In Proceedings of the Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), Hangzhou, China, 26–28 May 2023; pp. 343–349.
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061.
- Revaud, J.; Almazán, J.; Rezende, R.S.; Souza, C.R.d. Learning with average precision: Training image retrieval with a listwise loss. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5107–5116.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1804, pp. 1–6.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Fang, J.; Michael, K.; Montes, D.; Nadar, J.; Skalski, P.; et al. Ultralytics/yolov5: v6. 1-tensorrt, tensorflow edge tpu and openvino export and inference. Zenodo 2022.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
- Ren, T.; Jiang, Q.; Liu, S.; Zeng, Z.; Liu, W.; Gao, H.; Huang, H.; Ma, Z.; Jiang, X.; Chen, Y.; et al. Grounding DINO 1.5: Advance the “Edge” of Open-Set Object Detection. arXiv 2024, arXiv:2405.10300.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
Object Type | Small (0 × 0 to 32 × 32) | Medium (32 × 32 to 96 × 96) | Large (96 × 96~) |
---|---|---|---|
Number (×10,000) | 44.44 | 18.63 | 1.704 |
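To make the size statistics above concrete, the following sketch buckets a bounding box by pixel area and computes the share of each size class from the three counts in the table. It assumes the thresholds follow the common COCO-style convention (32 × 32 and 96 × 96 pixels) and that the three values correspond, in order, to small, medium, and large objects in units of 10,000; `size_bucket` and the variable names are hypothetical.

```python
def size_bucket(width, height):
    """Classify a box by pixel area (assumed COCO-style thresholds)."""
    area = width * height
    if area < 32 * 32:
        return "small"
    if area < 96 * 96:
        return "medium"
    return "large"


# Counts from the table, in units of 10,000 objects (assumed column order).
counts = {"small": 44.44, "medium": 18.63, "large": 1.704}
total = sum(counts.values())
shares = {name: value / total for name, value in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions small objects make up roughly two thirds of all annotated instances, which is the motivation for a detector tuned to multi-scale and small targets.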
Name | Configuration |
---|---|
Operating System | Linux |
CPU | Intel(R) Xeon(R) Gold 6130 |
GPU | NVIDIA GeForce RTX 3090 Ti |
GPU Memory | 24 GB |
Programming language | Python 3.8.5 |
Deep-learning framework | PyTorch 2.0.0 |
| Models | WIoUv3 | SPPF-LSKA | ASF-DySample | iRMB-DRB | F1-Score | Precision | Recall | mAP@50 | Params/M |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n | | | | | 0.32 | 1 | 0.51 | 0.276 | 3.15 |
| | | | | | 0.33 | 1 | 0.51 | 0.279 | 3.15 |
| | | | | | 0.32 | 1 | 0.51 | 0.280 | 3.43 |
| | | | | | 0.33 | 1 | 0.51 | 0.281 | 3.31 |
| | | | | | 0.35 | 1 | 0.52 | 0.312 | 3.14 |
Models | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor | mAP@50 | FPS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SSD [49] | 0.027 | 0.023 | 0.002 | 0.298 | 0.062 | 0.056 | 0.018 | 0.012 | 0.215 | 0.031 | 0.074 | 9.6 |
Faster R-CNN [50] | 0.187 | 0.090 | 0.063 | 0.666 | 0.335 | 0.238 | 0.126 | 0.100 | 0.478 | 0.171 | 0.246 | 14.5 |
Cascade R-CNN [13] | 0.166 | 0.072 | 0.066 | 0.677 | 0.362 | 0.255 | 0.133 | 0.103 | 0.491 | 0.169 | 0.249 | 9.3 |
YOLOv3 [51] | 0.069 | 0.041 | 0.023 | 0.442 | 0.136 | 0.171 | 0.046 | 0.051 | 0.374 | 0.071 | 0.142 | 40.1 |
YOLOv5 [52] | 0.222 | 0.138 | 0.057 | 0.650 | 0.213 | 0.178 | 0.071 | 0.084 | 0.457 | 0.203 | 0.227 | 68.9 |
YOLOv7 [53] | 0.179 | 0.0974 | 0.0448 | 0.618 | 0.272 | 0.26 | 0.0935 | 0.111 | 0.433 | 0.185 | 0.229 | 66.1 |
YOLOv8n [23] | 0.226 | 0.117 | 0.065 | 0.670 | 0.305 | 0.337 | 0.143 | 0.149 | 0.518 | 0.225 | 0.276 | 78.7 |
RT-DETR [54] | 0.0773 | 0.0347 | 0.0171 | 0.386 | 0.179 | 0.172 | 0.0489 | 0.0633 | 0.306 | 0.078 | 0.248 | 73.7 |
YOLOv9 [55] | 0.207 | 0.108 | 0.0517 | 0.643 | 0.287 | 0.291 | 0.107 | 0.123 | 0.472 | 0.211 | 0.25 | 80.8 |
Grounding DINO 1.5 [56] | 0.215 | 0.092 | 0.061 | 0.627 | 0.279 | 0.296 | 0.102 | 0.137 | 0.456 | 0.208 | 0.262 | 73.1 |
YOLOv10 [57] | 0.19 | 0.125 | 0.059 | 0.63 | 0.271 | 0.262 | 0.0883 | 0.119 | 0.453 | 0.199 | 0.243 | 78.3 |
YOLO-DroneMS | 0.330 | 0.262 | 0.069 | 0.744 | 0.369 | 0.265 | 0.203 | 0.106 | 0.423 | 0.351 | 0.312 | 83.3 |
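As a quick check on how the mAP@50 column relates to the per-class columns, the snippet below averages the ten per-class values reported for YOLO-DroneMS in the table above. It assumes mAP@50 is the unweighted mean of the per-class AP@50 values; the dictionary and variable names are illustrative.

```python
# Per-class AP@50 for YOLO-DroneMS, copied from the comparison table.
per_class_ap50 = {
    "pedestrian": 0.330,
    "people": 0.262,
    "bicycle": 0.069,
    "car": 0.744,
    "van": 0.369,
    "truck": 0.265,
    "tricycle": 0.203,
    "awning-tricycle": 0.106,
    "bus": 0.423,
    "motor": 0.351,
}

# Unweighted mean over classes (assumed definition of mAP@50).
map50 = sum(per_class_ap50.values()) / len(per_class_ap50)
print(f"{map50:.3f}")  # prints 0.312
```

The result agrees with the 0.312 reported in the mAP@50 column for this row.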
Models | Precision | Recall | mAP@50 |
---|---|---|---|
SSD [49] | 0.867 | 0.849 | 0.876 |
Faster R-CNN [50] | 0.623 | 0.825 | 0.761 |
Cascade R-CNN [13] | 0.67 | 0.771 | 0.741 |
YOLOv3 [51] | 0.851 | 0.713 | 0.783 |
YOLOv5 [52] | 0.855 | 0.854 | 0.892 |
YOLOv7 [53] | 0.865 | 0.831 | 0.88 |
YOLOv8n [23] | 0.834 | 0.845 | 0.896 |
RT-DETR [54] | 0.855 | 0.843 | 0.899 |
YOLOv9 [55] | 0.848 | 0.819 | 0.888 |
Grounding DINO 1.5 [56] | 0.839 | 0.832 | 0.881 |
YOLOv10 [57] | 0.846 | 0.839 | 0.896 |
YOLO-DroneMS | 0.893 | 0.843 | 0.906 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, X.; Chen, Y. YOLO-DroneMS: Multi-Scale Object Detection Network for Unmanned Aerial Vehicle (UAV) Images. Drones 2024, 8, 609. https://doi.org/10.3390/drones8110609