DV-DETR: Improved UAV Aerial Small Target Detection Algorithm Based on RT-DETR
Abstract
1. Introduction
2. Related Work
2.1. Challenges of YOLO Algorithm in Small Target Recognition
2.2. End-to-End Object Detection Model
3. Methods
3.1. RT-DETR Model
3.2. Recalibrate Attention Unit
3.3. Reparameterization Module
3.4. Deformable Attention Mechanism
3.5. Focaler-IoU Loss Function
3.6. DV-DETR Model
4. Results
4.1. Dataset Introduction
4.2. Experimental Environment
4.3. Model Evaluation Indicators
4.4. Analysis of Experimental Results
4.4.1. Performance Evaluation
4.4.2. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Nalamati, M.; Kapoor, A.; Saqib, M.; Sharma, N.; Blumenstein, M. Drone Detection in Long-Range Surveillance Videos. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Sultana, F.; Sufian, A.; Dutta, P. A review of object detection models based on convolutional neural network. Intelligent Computing: Image Processing Based Applications; Springer: Singapore, 2020; pp. 1–16.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
- Tang, S.; Zhang, S.; Fang, Y. HIC-YOLOv5: Improved YOLOv5 for small object detection. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–18 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 6614–6619.
- Yang, R.; Li, W.; Shang, X.; Zhu, D.; Man, X. KPE-YOLOv5: An improved small target detection algorithm based on YOLOv5. Electronics 2023, 12, 817.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229.
- Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. DN-DETR: Accelerate DETR training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13619–13627.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Muhuri, A.; Goïta, K.; Magagi, R.; Wang, H. Geodesic distance based scattering power decomposition for compact polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 2004412.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; Springer: Singapore, 2023; pp. 343–356.
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13733–13742.
- Xia, Z.; Pan, X.; Song, S.; Li, L.E.; Huang, G. Vision transformer with deformable attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4794–4803.
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907.
- Zhang, X.; Izquierdo, E.; Chandramouli, K. Dense and small object detection in UAV vision based on cascade network. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Yu, W.; Yang, T.; Chen, C. Towards resolving the challenge of long-tail distribution in UAV images for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3258–3267.
- Xiaolin, L.; Dadong, L.; Xinman, L.; Ze, C. Target Detection Algorithm of UAV Aerial Image Based on Improved YOLOv5. J. Comput. Eng. Appl. 2024, 60, 204.
Backbone Network | Parameters/M | GFLOPs | Mean Average Precision mAP/% |
---|---|---|---|
ResNet18 | 17.657 | 53.4 | 45.407 |
SwinTransformer | 36.620 | 98.1 | 46.579 |
HGNet | 32.827 | 108.3 | 48.389 |
Configurations | Parameters |
---|---|
OS | Ubuntu 20.04 |
CPU | Intel Xeon E5-2680 |
GPU | NVIDIA GeForce RTX 3090 24 GB |
CUDA Version | CUDA 12.1 |
Memory | 64 GB |
Deep Learning Framework | Pytorch 2.2.0 |
Name | Parameters |
---|---|
Epoch | 200 |
Batch Size | 8 |
Input Size | 640 × 640 |
Optimizer | AdamW |
Initial Learning Rate | 0.0001 |
Weight Decay Coefficient | 0.0001 |
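
For readers who want to reuse these settings, the hyperparameter table above can be captured as a small configuration object. This is only a sketch: the class name `TrainConfig`, its field names, and the helper `optimizer_kwargs` are hypothetical and do not come from the authors' code; only the values are taken from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Training settings copied from the table above (field names are illustrative)."""
    epochs: int = 200
    batch_size: int = 8
    input_size: tuple = (640, 640)   # width x height in pixels
    optimizer: str = "AdamW"
    initial_lr: float = 1e-4
    weight_decay: float = 1e-4


def optimizer_kwargs(cfg: TrainConfig) -> dict:
    """Return the learning-rate and weight-decay arguments that an
    AdamW-style optimizer constructor typically accepts."""
    return {"lr": cfg.initial_lr, "weight_decay": cfg.weight_decay}


cfg = TrainConfig()
print(asdict(cfg))
print(optimizer_kwargs(cfg))
```

Freezing the dataclass keeps the recorded settings from being changed by accident during a run, which makes it easier to log exactly what was used.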
Model | Pedestrians | People | Bicycles | Cars | Vans | Trucks | Tricycles | Awning-Tricycles | Buses | Motorcycles | mAP@0.5 |
---|---|---|---|---|---|---|---|---|---|---|---|
Fast R-CNN [23] | 21.4 | 15.6 | 6.7 | 51.7 | 29.5 | 19 | 13.1 | 7.7 | 31.4 | 20.7 | 21.7 |
YOLOv5s | 22.6 | 20.6 | 14.6 | 59.7 | 24 | 21.3 | 20.1 | 17.4 | 37.9 | 23.7 | 26.2 |
YOLOv7-t | 41.5 | 37.5 | 11.4 | 77.9 | 39.8 | 32.8 | 23.7 | 12 | 48.1 | 46.7 | 37.1 |
DA-YOLO v5 [24] | 47.1 | 37.9 | 14.2 | 78.3 | 37.4 | 32.3 | 23.5 | 15.6 | 47.2 | 44.4 | 37.79 |
RT-DETR [11] | 54.8 | 46.9 | 20.6 | 85.1 | 49.5 | 35.9 | 32.8 | 14.5 | 58.1 | 57.5 | 45.6 |
DV-DETR | 58.3 | 49.4 | 25.7 | 86.8 | 53.3 | 41.2 | 36.5 | 20.1 | 69.3 | 61.2 | 50.2 |
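
The last column of the comparison table above appears to be the unweighted mean of the ten per-class AP values in each row. The sketch below recomputes it for the RT-DETR and DV-DETR rows and lists the per-class gains; the variable and function names are illustrative, and the assumption that the column is a plain arithmetic mean is inferred from the numbers rather than stated in the table.

```python
# Per-class AP at IoU 0.5 (in %), copied from the RT-DETR and DV-DETR rows.
CLASSES = ["Pedestrians", "People", "Bicycles", "Cars", "Vans", "Trucks",
           "Tricycles", "Awning-Tricycles", "Buses", "Motorcycles"]

AP = {
    "RT-DETR": [54.8, 46.9, 20.6, 85.1, 49.5, 35.9, 32.8, 14.5, 58.1, 57.5],
    "DV-DETR": [58.3, 49.4, 25.7, 86.8, 53.3, 41.2, 36.5, 20.1, 69.3, 61.2],
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


def per_class_gain(improved, baseline):
    """Map each class name to the AP difference (improved minus baseline)."""
    return {c: round(a - b, 1) for c, a, b in zip(CLASSES, improved, baseline)}


for name, values in AP.items():
    print(f"{name}: mean AP = {mean_ap(values):.1f}")

gains = per_class_gain(AP["DV-DETR"], AP["RT-DETR"])
best_class = max(gains, key=gains.get)
print(f"largest gain: {best_class} (+{gains[best_class]})")
```

The recomputed means round to the tabulated values for both rows, and the largest per-class difference between the two rows is in the Buses column.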
Number | Models | P | R | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|
Exp1 | RT-DETR (Baseline) | 62.147 | 46.657 | 48.389 | 29.7 |
Exp2 | + ResNet18 | 59.293 | 43.943 | 45.407 | 27.8 |
Exp3 | + ResNet18 + SBA Module | 61.233 | 47.240 | 49.438 | 30.9 |
Exp4 | + ResNet18 + SBA Module + DAttention | 61.566 | 47.565 | 49.496 | 31.1 |
Exp5 | + ResNet18 + SBA Module + Focaler-IoU | 61.559 | 47.314 | 49.462 | 31.0 |
Exp6 | + ResNet18 + SBA Module + DAttention + Focaler-IoU | 62.431 | 47.759 | 50.050 | 31.4 |
Number | Models | FPS | GFLOPs | Params/10^6 |
---|---|---|---|---|
Exp1 | RT-DETR (Baseline) | 75.0 | 103.5 | 32.004 |
Exp2 | + ResNet18 | 139.1 (+64.1) | 52.0 (−51.5) | 17.447 (−14.557) |
Exp3 | + ResNet18 + SBA Module | 92.0 (+17.0) | 84.4 (−19.1) | 19.498 (−12.506) |
Exp4 | + ResNet18 + SBA Module + DAttention | 89.6 (+14.6) | 84.6 (−18.9) | 19.501 (−12.503) |
Exp5 | + ResNet18 + SBA Module + Focaler-IoU | 90.3 (+15.3) | 84.4 (−19.1) | 19.498 (−12.506) |
Exp6 | + ResNet18 + SBA Module + DAttention + Focaler-IoU | 90.0 (+15.0) | 84.6 (−18.9) | 19.501 (−12.503) |
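
The parenthesised values in the efficiency table above are differences from the Exp1 baseline. A minimal sketch of that arithmetic follows, using three of the rows; the dictionary layout and the function names `delta` and `relative_change` are illustrative choices, not part of the original work.

```python
# Values copied from the efficiency table; params are in millions.
BASELINE = {"fps": 75.0, "gflops": 103.5, "params_m": 32.004}

VARIANTS = {
    "Exp2": {"fps": 139.1, "gflops": 52.0, "params_m": 17.447},
    "Exp3": {"fps": 92.0, "gflops": 84.4, "params_m": 19.498},
    "Exp6": {"fps": 90.0, "gflops": 84.6, "params_m": 19.501},
}


def delta(variant, baseline=BASELINE):
    """Absolute change of each metric relative to the baseline."""
    return {k: round(variant[k] - baseline[k], 3) for k in baseline}


def relative_change(variant, baseline=BASELINE):
    """Change of each metric as a percentage of the baseline value."""
    return {k: 100.0 * (variant[k] - baseline[k]) / baseline[k] for k in baseline}


for name, metrics in VARIANTS.items():
    d = delta(metrics)
    print(f"{name}: FPS {d['fps']:+.1f}, GFLOPs {d['gflops']:+.1f}, "
          f"Params {d['params_m']:+.3f} M")
```

Expressing the same differences with `relative_change` can make the trade-off easier to compare across metrics, since FPS, GFLOPs, and parameter counts are on different scales.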
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wei, X.; Yin, L.; Zhang, L.; Wu, F. DV-DETR: Improved UAV Aerial Small Target Detection Algorithm Based on RT-DETR. Sensors 2024, 24, 7376. https://doi.org/10.3390/s24227376