Drone-DETR: Efficient Small Object Detection for Remote Sensing Image Using Enhanced RT-DETR Model
Abstract
1. Introduction
- A lightweight backbone network, ESDNet, is designed for UAV small-object detection. Within it, a shortcut downsampling module, MPD, combines MaxPool and AvgPool. A Fast-Residual Block is then built from MPD and the FasterNet Block, and a shallow feature enhancement module (SFEM) composed of C2f and the Fast-Residual Block is introduced. Together, these significantly strengthen shallow feature extraction while keeping the model lightweight.
- An Enhanced Dual-Path Feature Fusion Attention Module (EDF-FAM) is developed, which collaboratively generates global attention from dual-path features by integrating deformable convolution with 1D-channel convolution. This multi-scale fusion strategy promotes the interaction of multi-channel information, significantly improving the model's feature representation capability.
- The output of the SFEM layer is additionally fed into the neck network for feature fusion alongside the original fusion paths, enhancing the network's ability to extract and represent small-object features.
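The contributions above name two concrete mechanisms: an MPD module that downsamples through parallel MaxPool and AvgPool branches, and a 1D-channel convolution used inside EDF-FAM. The paper does not give their exact layer layouts here, so the sketch below is an illustrative assumption: the branch structure, 1x1 projections, summation fusion, and the ECA-style channel attention are hypothetical stand-ins, written in PyTorch (the paper's stated framework).

```python
# Hedged sketch of the two mechanisms named in the contributions.
# All module internals (1x1 projections, sum fusion, kernel sizes) are
# assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn


class MPDDownsample(nn.Module):
    """Shortcut downsampling via parallel MaxPool + AvgPool (assumed structure)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # MaxPool keeps salient edge responses; AvgPool keeps smooth context.
        self.max_branch = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        )
        self.avg_branch = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        )
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing the two pooled branches halves the resolution while
        # retaining both kinds of evidence.
        return self.act(self.bn(self.max_branch(x) + self.avg_branch(x)))


class Channel1DAttention(nn.Module):
    """ECA-style 1D-channel convolution attention (assumed stand-in for the
    1D-channel branch of EDF-FAM)."""

    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = x.mean(dim=(2, 3))                    # (B, C) global average pool
        w = self.conv(w.unsqueeze(1)).squeeze(1)  # 1D conv across channels
        return x * torch.sigmoid(w)[:, :, None, None]


# Usage on a feature map typical of an 80x80 shallow level:
x = torch.randn(2, 64, 80, 80)
y = Channel1DAttention()(MPDDownsample(64, 128)(x))
print(tuple(y.shape))  # (2, 128, 40, 40)
```

The pooling-only shortcut path is parameter-free apart from the 1x1 projections, which is consistent with the lightweight design goal stated above.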
2. Related Work
2.1. Drone Object Detection Technologies
2.2. Transformer-Based Object Detection Network
3. Methodology
3.1. Backbone
Fast-Residual Block
3.2. Neck
4. Experiments and Results
4.1. Dataset
4.2. Experimental Environment
4.3. Experiment Metrics
4.4. Comparative Experimental Results and Analysis
4.4.1. Comparative Experimental Results and Analysis of Drone-DETR and Other SOTA Algorithms
4.4.2. Comparison of Drone-DETR and RT-DETR for Small Object Visualization
4.4.3. Results and Analysis of Fast-Residual Block vs. Residual Block
4.5. Results and Analysis of Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Alsamhi, S.H.; Shvetsov, A.V.; Kumar, S.; Shvetsova, S.V.; Alhartomi, M.A.; Hawbani, A.; Rajput, N.S.; Srivastava, S.; Saif, A.; Nyangaresi, V.O. UAV computing-assisted search and rescue mission framework for disaster and harsh environment mitigation. Drones 2022, 6, 154. [Google Scholar] [CrossRef]
- Feng, J.; Wang, J.; Qin, R. Lightweight detection network for arbitrary-oriented vehicles in UAV imagery via precise positional information encoding and bidirectional feature fusion. Int. J. Remote Sens. 2023, 44, 4529–4558. [Google Scholar] [CrossRef]
- Naufal, C.; Solano-Correa, Y.T.; Marrugo, A.G. YOLO-based multi-scale ground control point detection in UAV surveying. In Proceedings of the 2023 IEEE Colombian Caribbean Conference (C3), Barranquilla, Colombia, 22–25 November 2023; pp. 1–5. [Google Scholar]
- Calderón, M.; Aguilar, W.G.; Merizalde, D. Visual-based real-time detection using neural networks and micro-uavs for military operations. In Proceedings of the MICRADS 2020: International Conference of Research Applied to Defense and Security, Quito, Ecuador, 13–15 May 2020; pp. 55–64. [Google Scholar]
- Cao, Z.; Kooistra, L.; Wang, W.; Guo, L.; Valente, J. Real-time object detection based on uav remote sensing: A systematic literature review. Drones 2023, 7, 620. [Google Scholar] [CrossRef]
- Zhang, Z. Drone-YOLO: An efficient neural network method for target detection in drone images. Drones 2023, 7, 526. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Xu, Y.; Yu, G.; Wang, Y.; Wu, X.; Ma, Y. Car detection from low-altitude UAV imagery with the faster R-CNN. J. Adv. Transp. 2017, 2017, 2823617. [Google Scholar] [CrossRef]
- Avola, D.; Cinque, L.; Diko, A.; Fagioli, A.; Foresti, G.L.; Mecca, A.; Pannone, D.; Piciarelli, C. MS-Faster R-CNN: Multi-stream backbone for improved Faster R-CNN object detection and aerial tracking from UAV images. Remote Sens. 2021, 13, 1670. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D. ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Zenodo 2022. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO (Version 8.0.0) [Computer Software]. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 May 2024).
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.-Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605. [Google Scholar]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. arXiv 2023, arXiv:2304.08069. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979. [Google Scholar] [CrossRef]
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
- Zhang, Z.; Lu, X.; Cao, G.; Yang, Y.; Jiao, L.; Liu, F. ViT-YOLO: Transformer-based YOLO for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2799–2808. [Google Scholar]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2778–2788. [Google Scholar]
- Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A modified YOLOv8 detection network for UAV aerial image recognition. Drones 2023, 7, 304. [Google Scholar] [CrossRef]
- Yao, Z.; Ai, J.; Li, B.; Zhang, C. Efficient detr: Improving end-to-end object detector with dense prior. arXiv 2021, arXiv:2104.01318. [Google Scholar]
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750. [Google Scholar]
- Wang, J.; Yu, J.; He, Z. ARFP: A novel adaptive recursive feature pyramid for object detection in aerial images. Appl. Intell. 2022, 52, 12844–12859. [Google Scholar]
- Li, C.; Yang, T.; Zhu, S.; Chen, C.; Guan, S. Density map guided object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 190–191. [Google Scholar]
- Yu, W.; Yang, T.; Chen, C. Towards resolving the challenge of long-tail distribution in UAV images for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3258–3267. [Google Scholar]
- Wang, Y.; Yang, Y.; Zhao, X. Object detection using clustering algorithm adaptive searching regions in aerial images. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 651–664. [Google Scholar]
- Li, Y.-L.; Feng, Y.; Zhou, M.-L.; Xiong, X.-C.; Wang, Y.-H.; Qiang, B.-H. DMA-YOLO: Multi-scale object detection method with attention mechanism for aerial images. Vis. Comput. 2023, 40, 4505–4518. [Google Scholar] [CrossRef]
- Min, L.; Fan, Z.; Lv, Q.; Reda, M.; Shen, L.; Wang, B. YOLO-DCTI: Small object detection in remote sensing base on contextual transformer enhancement. Remote Sens. 2023, 15, 3970. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Tang, S.; Fang, Y.; Zhang, S. HIC-YOLOv5: Improved YOLOv5 For Small Object Detection. arXiv 2023, arXiv:2309.16393. [Google Scholar]
- Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
Feature | Parameter |
---|---|
Operating System | Windows 10 |
Programming Language | Python 3.9 |
CPU | Intel(R) Xeon(R) W-2245 CPU @ 3.90 GHz |
GPU | RTX 3090 |
GPU Memory | 32 GB |
Algorithm Framework | PyTorch |
Method | Reference | mAP50–95/% | mAP50/% | FPS |
---|---|---|---|---|
Two-stage methods | ||||
CornerNet [35] | (Law et al., 2018) | 17.4 | 34.1 | |
ARFP [36] | (Wang et al., 2022) | 20.4 | 33.9 | |
DMNet [37] | (Li et al., 2020) | 29.4 | 49.3 | |
DSHNet [38] | (Yu et al., 2021) | 30.3 | 51.8 | |
CRENet [39] | (Wang et al., 2020) | 33.7 | 54.3 | |
One-stage methods | ||||
FRCNN [8] + FPN [42] | (Lin et al., 2017) | 21.8 | 41.8 | 17 |
YOLOv5m [14] | (Jocher et al., 2022) | 21.9 | 42.3 | 85 |
YOLOv7 [15] | (Wang et al., 2023) | 23.0 | 41.1 | 55 |
TPH-YOLOv5 [31] | (Zhu et al., 2021) | 23.1 | 41.5 | 25 |
YOLOv8m [16] | (Jocher et al., 2023) | 25.5 | 42.1 | 90 |
HIC-YOLOv5 [43] | (Tang et al., 2023) | 26.0 | 44.3 | |
YOLO-DCTI [41] | (Min et al., 2023) | 27.4 | 49.8 | 15 |
Drone-YOLO-L [6] | (Zhang et al., 2023) | 31.9 | 51.3 | |
Deformable-DETR [18] | (Zhu et al., 2020) | 27.1 | 43.1 | 19 |
RT-DETR-R18 [20] | (Zhao et al., 2023) | 27.7 | 45.8 | 76 |
Ours | | 33.9 | 53.9 | 30 |
Method | mAP50–95/% | mAP50/% | mAPS/% | mAPM/% | mAPL/% |
---|---|---|---|---|---|
RT-DETR | 20.7 | 36.2 | 11.6 | 29.4 | 34.8 |
Drone-DETR | 24.9 | 42.4 | 17.8 | 35.3 | 40.6 |
Method | mAP50–95/% | mAP50/% | mAPS/% | mAPM/% | mAPL/% |
---|---|---|---|---|---|
RT-DETR | 41.7 | 65.1 | 21.1 | 44.0 | 51.6 |
Drone-DETR | 43.2 | 66.7 | 24.1 | 45.2 | 52.2 |
Method | mAP50/% | Params/M | GFLOPs | Latency/ms |
---|---|---|---|---|
Baseline | 45.8 | 20.1 | 58.3 | 13.12 |
Figure 12b | 45.6 | 17.3 | 53.3 | 12.69 |
Figure 12c | 45.7 | 16.6 | 48.3 | 10.56 |
Figure 12d | 45.9 | 16.5 | 43.6 | 10.21 |
Dataset | Experiment | Baseline | ESDNet | EDF-FAM | P2 Layer | mAP50/% | Params/M | GFLOPs | FPS |
---|---|---|---|---|---|---|---|---|---|
Test | A | √ | - | - | - | 36.2 | 20.1 | 58.3 | 76 |
Test | B | √ | √ | - | - | 37.3 | 13.8 | 71.4 | 47 |
Test | C | √ | √ | √ | - | 39.7 | 22.1 | 67.9 | 52 |
Test | D | √ | √ | √ | √ | 42.4 | 28.7 | 128.3 | 30 |
Val | A | √ | - | - | - | 45.8 | 20.1 | 58.3 | 76 |
Val | B | √ | √ | - | - | 47.2 | 13.8 | 71.4 | 47 |
Val | C | √ | √ | √ | - | 49.4 | 22.1 | 67.9 | 52 |
Val | D | √ | √ | √ | √ | 53.9 | 28.7 | 128.3 | 30 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kong, Y.; Shang, X.; Jia, S. Drone-DETR: Efficient Small Object Detection for Remote Sensing Image Using Enhanced RT-DETR Model. Sensors 2024, 24, 5496. https://doi.org/10.3390/s24175496