Small Object Detection in UAV Remote Sensing Images Based on Intra-Group Multi-Scale Fusion Attention and Adaptive Weighted Feature Fusion Mechanism
Abstract
1. Introduction
- (1) We propose a novel adaptive weighted feature fusion (AWFF) module that dynamically adjusts feature weights to enhance the representation of key features, thereby significantly improving the model's discriminative power for object recognition. The module also integrates feature information from multiple levels, so that the model can simultaneously capture the details and the semantic information of the object;
- (2) We design a Mixed Spatial Pyramid Pooling Fast (Mix-SPPF) module that combines the advantages of average pooling and max pooling to improve the accuracy of small object recognition;
- (3) We introduce a novel lightweight intra-group multi-scale fusion attention (IGMSFA) module that effectively suppresses background noise while maintaining high performance in resource-constrained environments;
- (4) Compared with current mainstream YOLO-series algorithms and classical object detection methods, the proposed IA-YOLOv8 algorithm demonstrates significant advantages: it achieves a higher mAP while using fewer parameters.
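The AWFF module itself is detailed in Section 3.3 and is not reproduced in this excerpt, but the core idea of contribution (1) — fusing multi-level features through dynamically learned weights rather than plain addition or concatenation — can be sketched as follows. All names here are ours, and the scalar softmax weights are a simplification (the paper's module may learn finer-grained weights):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_weighted_fusion(features, logits):
    """Fuse same-shape feature maps with softmax-normalized scalar weights.

    features: list of arrays, each (C, H, W), already resized to a common scale.
    logits:   per-level scalars; in a real detector these are trainable
              parameters, so the fusion weights adapt during training.
    """
    w = softmax(np.asarray(logits, dtype=np.float64))
    return sum(wi * f for wi, f in zip(w, features))

# Toy usage: three levels with equal logits reduce to a plain average.
levels = [np.full((2, 4, 4), v, dtype=np.float64) for v in (1.0, 2.0, 3.0)]
fused = adaptive_weighted_fusion(levels, [0.0, 0.0, 0.0])
```

Because the logits are learned by backpropagation, the network itself decides how much each pyramid level should contribute at fusion time, which is what distinguishes adaptive weighting from fixed-weight fusion.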
2. Related Work
2.1. Object Detection Algorithms
2.2. Attention Mechanisms
2.3. Feature Fusion
3. Methods
3.1. Overall Architecture of IA-YOLOv8
3.2. Mixed Spatial Pyramid Pooling Fast (Mix-SPPF)
3.3. Adaptive Weighted Feature Fusion (AWFF) Module
3.4. Intra-Group Multi-Scale Fusion Attention Module (IGMSFA)
- (1) To effectively capture multi-level information within the input feature map, this section employs convolution kernels of varying sizes within the same convolutional layer to extract local features across different spatial ranges. To reduce the computational cost of these convolutions, this study combines depth-wise separable convolutions with dilated convolutions. Figure 6 illustrates the Multi-Scale Attention Fusion (MSAF) module, in which depth-wise separable convolution decomposes a standard convolution into two steps: a depth-wise convolution applied to each channel, followed by a point-wise (1 × 1) convolution. This decomposition significantly reduces both the parameter count and the computational load while preserving model performance. Dilated convolutions enlarge the receptive field by inserting gaps between the elements of the convolution kernel, without adding parameters or computation, thereby improving the network's capacity to capture broad contextual information. The receptive field of a dilated convolution is expressed as follows:
- (2) The MSAF module is integrated into the ULSAM module to replace its single spatial attention mechanism. To further enhance information flow, a lightweight intra-group multi-scale fusion attention module (IGMSFA) is introduced, as shown in Figure 7. First, the input feature map with G channels (where G = n × 4g) is uniformly divided into n groups along the channel dimension, each group containing 4g channels, to obtain the following:
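The receptive-field equation referenced in item (1) is not reproduced in this excerpt; the standard form for a k × k kernel with dilation rate d is k_eff = k + (k − 1)(d − 1). The sketch below (helper names are ours) illustrates that relation, along with the parameter savings of depth-wise separable convolution over a standard convolution:

```python
def dilated_kernel_size(k, d):
    """Effective receptive field of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def conv_params(cin, cout, k):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return cin * cout * k * k

def dw_separable_params(cin, cout, k):
    """Depth-wise (k x k per input channel) + point-wise (1 x 1) convolution."""
    return cin * k * k + cin * cout

# A 3 x 3 kernel with dilation 2 covers a 5 x 5 area at 3 x 3 cost.
assert dilated_kernel_size(3, 2) == 5
```

For example, with C_in = C_out = 64 and k = 3, the separable variant needs 4672 parameters versus 36,864 for the standard convolution, roughly an 8× reduction — the source of the efficiency gains claimed for the MSAF module.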
3.5. Loss Function
4. Experiments and Analysis
4.1. Experimental Environment Setup and Data Set Introduction
4.2. Ablation Experiment
4.3. Comparative Experiment
4.4. Visual Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, Z.; Zhang, Y.; Wu, H.; Suzuki, S.; Namiki, A.; Wang, W. Design and application of a UAV autonomous inspection system for high-voltage power transmission lines. Remote Sens. 2023, 15, 865.
- Mohsan, S.A.H.; Othman, N.Q.H.; Li, Y.; Alsharif, M.H.; Khan, M.A. Unmanned aerial vehicles (UAVs): Practical aspects, applications, open challenges, security issues, and future trends. Intell. Serv. Robot. 2023, 16, 109–137.
- Yuan, S.; Li, Y.; Bao, F.; Xu, H.; Yang, Y.; Yan, Q.; Zhong, S.; Yin, H.; Xu, J.; Huang, Z. Marine environmental monitoring with unmanned vehicle platforms: Present applications and future prospects. Sci. Total Environ. 2023, 858, 159741.
- Li, X.; Lu, X.; Chen, W.; Ge, D.; Zhu, J. Research on UAVs Reconnaissance Task Allocation Method Based on Communication Preservation. IEEE Trans. Consum. Electron. 2024, 70, 684–695.
- Mahmud, I.; Cho, Y.Z. Detection avoidance and priority-aware object tracking for UAV group reconnaissance operations. J. Intell. Robot. Syst. 2018, 92, 381–392.
- Li, Y.; Zhang, W.; Li, P.; Ning, Y.; Suo, C. A method for autonomous navigation and positioning of UAV based on electric field array detection. Sensors 2021, 21, 1146.
- Iftikhar, S.; Asim, M.; Zhang, Z.; Muthanna, A.; Chen, J.; El-Affendi, M.; Sedik, A.; Abd El-Latif, A. Object detection and recognition for traffic congestion in smart cities using deep learning-enabled UAVs: A review and analysis. Appl. Sci. 2023, 13, 3995.
- Xiong, X.; He, M.; Li, T.; Zheng, G.; Xu, W.; Fan, X.; Zhang, Y. Adaptive Feature Fusion and Improved Attention Mechanism Based Small Object Detection for UAV Object Tracking. IEEE Internet Things J. 2024, 11, 21239–21249.
- Wang, C.; Zhao, R.; Yang, X.; Wu, Q. Research of UAV object detection and flight control based on deep learning. In Proceedings of the 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 26–28 May 2018; pp. 170–174.
- Adoni, W.Y.H.; Lorenz, S.; Fareedh, J.S.; Gloaguen, R.; Bussmann, M. Investigation of autonomous multi-UAV systems for object detection in distributed environment: Current developments and open challenges. Drones 2023, 7, 263.
- Yang, Y.; Guo, B.; Li, C.; Zhi, Y. An improved YOLOv3 algorithm for pedestrian detection on UAV imagery. In Proceedings of the Thirteenth International Conference on Genetic and Evolutionary Computing, Qingdao, China, 1–3 November 2019; pp. 253–261.
- Zhang, C.; Zheng, Y.; Guo, B.; Li, C.; Liao, N. SCN: A novel shape classification algorithm based on convolutional neural network. Symmetry 2021, 13, 499.
- Zhang, C.; Li, C.; Guo, B.; Liao, N. Neural Network Compression via Low Frequency Preference. Remote Sens. 2023, 15, 3144.
- Li, C.; Duan, H. Object detection approach for UAVs via improved pigeon-inspired optimization and edge potential function. Aerosp. Sci. Technol. 2014, 39, 352–360.
- Wang, X.; Deng, Y.; Duan, H. Edge-based object detection for unmanned aerial vehicles using competitive Bird Swarm Algorithm. Aerosp. Sci. Technol. 2018, 78, 708–720.
- Sahani, S.K.; Adhikari, G.; Das, B. A fast template matching algorithm for aerial object tracking. In Proceedings of the 2011 International Conference on Image Information Processing, Shimla, India, 3–5 November 2011; pp. 1–6.
- Zhang, Q.; Duan, H. Chaotic biogeography-based optimization approach to object detection in UAV surveillance. Optik 2014, 125, 7100–7105.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2015, arXiv:1512.02325.
- Lin, T.-Y.; Goyal, P.; Girshick, R.B.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
- Tan, L.; Lv, X.; Lian, X.; Wang, G. YOLOv4_UAV: UAV image object detection based on an improved YOLOv4 algorithm. Comput. Electr. Eng. 2021, 93, 107261.
- Shang, J.; Wang, J.; Liu, S.; Wang, C.; Zheng, B. Small object detection algorithm for UAV aerial photography based on improved YOLOv5s. Electronics 2023, 12, 2434.
- Shen, S.; Zhang, X.; Yan, W.; Xie, S.; Yu, B.; Wang, S. An improved UAV object detection algorithm based on ASFF-YOLOv5s. Math. Biosci. Eng. 2023, 20, 10773–10789.
- Li, S.; Liu, C.; Tang, K.; Meng, F.; Zhu, Z.; Zhou, L.; Chen, F. Improved YOLOv5s algorithm for small object detection in UAV aerial photography. IEEE Access 2024, 12, 9784–9791.
- Wang, X.; Wang, A.; Yi, J.; Song, Y.; Chehri, A. Small Object Detection Based on Deep Learning for Remote Sensing: A Comprehensive Review. Remote Sens. 2023, 15, 3265.
- Sapkota, R.; Meng, Z.; Ahmed, D.; Churuvija, M.; Du, X.; Ma, Z.; Karkee, M. Comprehensive Performance Evaluation of YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments. arXiv 2024, arXiv:2407.12040.
- Sundaresan Geetha, A.; Alif, M.A.R.; Hussain, M.; Allen, P. Comparative Analysis of YOLOv8 and YOLOv10 in Vehicle Detection: Performance Metrics and Model Efficacy. Vehicles 2024, 6, 1364–1382.
- Qiu, X.; Chen, Y.; Cai, W.; Niu, M.; Li, J. LD-YOLOv10: A Lightweight Object Detection Algorithm for UAV Scenarios Based on YOLOv10. Electronics 2024, 13, 3269.
- Luan, T.; Zhou, S.; Liu, L.; Pan, W. Tiny-Object Detection Based on Optimized YOLO-CSQ for Accurate UAV Detection in Wildfire Scenarios. Drones 2024, 8, 454.
- Wen, L.; Cheng, Y.; Fang, Y.; Li, X. A comprehensive survey of oriented object detection in remote sensing images. Expert Syst. Appl. 2023, 224, 119960.
- Zhang, X.; Zhang, T.; Wang, G.; Zhu, P.; Tang, X.; Jia, X.; Jiao, L. Remote sensing object detection meets deep learning: A metareview of challenges and advances. IEEE Geosci. Remote Sens. Mag. 2023, 11, 8–44.
- Wang, C.; Shi, Z.; Meng, L.; Wang, J.; Wang, T.; Gao, Q.; Wang, E. Anti-occlusion UAV tracking algorithm with a low-altitude complex background by integrating attention mechanism. Drones 2022, 6, 149.
- Tan, L.; Lv, X.; Lian, X.; Wang, G. YOLOv4_Drone: UAV image target detection based on an improved YOLOv4 algorithm. Comput. Electr. Eng. 2021, 93, 107261.
- Wang, F.; Wang, H.; Qin, Z.; Tang, J. UAV target detection algorithm based on improved YOLOv8. IEEE Access 2023, 11, 116534–116544.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. arXiv 2021, arXiv:2103.14030.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Snoek, C.G.; Worring, M.; Smeulders, A.W. Early versus late fusion in semantic video analysis. In Proceedings of the 13th Annual ACM International Conference on Multimedia, New York, NY, USA, 6 November 2005; pp. 399–402.
- Pereira, L.M.; Salazar, A.; Vergara, L. A comparative analysis of early and late fusion for the multimodal two-class problem. IEEE Access 2023, 11, 84283–84300.
- Gadzicki, K.; Khamsehashari, R.; Zetzsche, C. Early vs late fusion in multimodal convolutional neural networks. In Proceedings of the 2020 IEEE 23rd International Conference on Information Fusion (FUSION), Rustenburg, South Africa, 6–9 July 2020; pp. 1–6.
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883.
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 845–853.
- Chaib, S.; Liu, H.; Gu, Y.; Yao, H. Deep feature fusion for VHR remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4775–4784.
- Sun, Q.-S.; Zeng, S.-G.; Liu, Y.; Heng, P.-A.; Xia, D.-S. A new method of feature fusion and its application in image recognition. Pattern Recognit. 2005, 38, 2437–2448.
- Dai, Y.; Gieseke, F.; Oehmcke, S.; Wu, Y.; Barnard, K. Attentional feature fusion. In Proceedings of the 2021 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3560–3569.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Zhou, L.; Li, Y.; Rao, X.; Liu, C.; Zuo, X.; Liu, Y. Ship object detection in optical remote sensing images based on multiscale feature enhancement. Comput. Intell. Neurosci. 2022, 2022, 2605140.
- Saini, R.; Jha, N.K.; Das, B.; Mittal, S.; Mohan, C.K. ULSAM: Ultra-lightweight subspace attention module for compact convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 1627–1636.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, New York, NY, USA, 1 October 2016; pp. 516–520.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000.
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Zheng, J.; Peng, T.; Wang, X.; Zhang, Y. VisDrone-SOT2019: The vision meets UAV single object tracking challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 199–212.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.-S. Tiny object detection in aerial images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 3791–3798.
- Duan, S.; Wang, T.; Li, T.; Yang, W. M-YOLOv8s: An improved small target detection algorithm for UAV aerial photography. J. Vis. Commun. Image Represent. 2024, 104, 104289.
- Ning, T.; Wu, W.; Zhang, J. Small object detection based on YOLOv8 in UAV perspective. Pattern Anal. Appl. 2024, 27, 103.
Parameters | Value |
---|---|
Epochs | 300 |
Batch size | 6 |
Input image size | 512 × 512 |
Regression loss function | WIoU v3 |
Gradient optimizer | SGD |
Momentum | 0.935 |
Initial learning rate | 0.01 |
Final learning rate | 0.00001 |
Data augmentation | Mosaic |
IoU threshold | 0.5 |
Baseline | 4 × Head | Mix-SPPF | 5 × IGMSFA | 3 × AWFF | R (%) | P (%) | mAP (%) | Parameter (MB) | GFLOPs |
---|---|---|---|---|---|---|---|---|---|
YOLOv8s | | | | | 32.9 | 49.9 | 34.2 | 10.6 | 28.5 |
YOLOv8s | √ | | | | 39.0 | 52.9 | 40.8 | 10.1 | 36.7 |
YOLOv8s | √ | √ | | | 39.9 | 53.2 | 40.9 | 10.2 | 36.7 |
YOLOv8s | √ | √ | √ | | 40.3 | 54.2 | 41.2 | 10.4 | 37.2 |
YOLOv8s | √ | √ | √ | √ | 41.2 | 54.6 | 42.1 | 10.9 | 37.3 |
Method | Backbone Network | Input_Size | Parameter (MB) | Model Size (MB) | mAP (%) |
---|---|---|---|---|---|
Faster RCNN [18] | ResNet50 + FPN | 600 × 600 | 41.5 | 83.3 | 28.6 |
Cascade-RCNN [19] | ResNet50 + FPN | 600 × 600 | 64.3 | 128.9 | 32.7 |
SSD [26] | VGG16 | 300 × 300 | 23.7 | 47.7 | 15.3 |
RetinaNet [27] | ResNet50 + FPN | 800 × 800 | 36.8 | 73.9 | 24.9 |
YOLOv5s | CSPDarkNet-53 | 512 × 512 | 8.7 | 17.6 | 33.4 |
YOLOv8s [28] | CSPDarkNet-53 | 512 × 512 | 10.6 | 21.5 | 34.2 |
YOLOv9s [29] | CSPDarkNet-53 | 512 × 512 | 6.8 | 13.8 | 34.7 |
YOLOv10s [30] | CSPDarkNet-53 | 512 × 512 | 7.7 | 15.6 | 32.5 |
M-YOLOv8s [65] | CSPDarkNet-53 | 300 × 300 | 5.9 | 11.8 | 41.2 |
Improved_YOLOv8 [66] | CSPDarkNet-53 | 300 × 300 | 5.6 | 11.2 | 37.4 |
IA-YOLOv8 (ours) | CSPDarkNet-53 | 512 × 512 | 10.9 | 22.1 | 42.1 |
Method | Backbone Network | Input_Size | Parameter (MB) | Model Size (MB) | mAP (%) |
---|---|---|---|---|---|
Faster RCNN [18] | ResNet50 + FPN | 600 × 600 | 41.5 | 83.3 | 65.3 |
Cascade-RCNN [19] | ResNet50 + FPN | 600 × 600 | 64.3 | 128.9 | 72.7 |
SSD [26] | VGG16 | 300 × 300 | 23.7 | 47.7 | 55.4 |
RetinaNet [27] | ResNet50 + FPN | 800 × 800 | 36.8 | 73.9 | 68.8 |
YOLOv5s | CSPDarkNet-53 | 512 × 512 | 8.7 | 17.6 | 79.1 |
YOLOv8s [28] | CSPDarkNet-53 | 512 × 512 | 10.6 | 21.5 | 79.3 |
YOLOv9s [29] | CSPDarkNet-53 | 512 × 512 | 6.8 | 13.8 | 78.9 |
YOLOv10s [30] | CSPDarkNet-53 | 512 × 512 | 7.7 | 15.6 | 77.1 |
M-YOLOv8s [65] | CSPDarkNet-53 | 300 × 300 | 5.9 | 11.8 | 80.7 |
Improved_YOLOv8 [66] | CSPDarkNet-53 | 300 × 300 | 5.6 | 11.2 | 80.2 |
IA-YOLOv8 (ours) | CSPDarkNet-53 | 512 × 512 | 10.9 | 22.1 | 82.3 |
Method | Backbone Network | Input_Size | Parameter (MB) | Model Size (MB) | mAP (%) |
---|---|---|---|---|---|
Faster RCNN [18] | ResNet50 + FPN | 600 × 600 | 41.5 | 83.3 | 25.6 |
Cascade-RCNN [19] | ResNet50 + FPN | 600 × 600 | 64.3 | 128.9 | 27.4 |
SSD [26] | VGG16 | 300 × 300 | 23.7 | 47.7 | 14.7 |
RetinaNet [27] | ResNet50 + FPN | 800 × 800 | 36.8 | 73.9 | 26.3 |
YOLOv5s | CSPDarkNet-53 | 512 × 512 | 8.7 | 17.6 | 30.7 |
YOLOv8s [28] | CSPDarkNet-53 | 512 × 512 | 10.6 | 21.5 | 35.9 |
YOLOv9s [29] | CSPDarkNet-53 | 512 × 512 | 6.8 | 13.8 | 30.9 |
YOLOv10s [30] | CSPDarkNet-53 | 512 × 512 | 7.7 | 15.6 | 29.5 |
M-YOLOv8s [65] | CSPDarkNet-53 | 300 × 300 | 5.9 | 11.8 | 38.6 |
Improved_YOLOv8 [66] | CSPDarkNet-53 | 300 × 300 | 5.6 | 11.2 | 37.4 |
IA-YOLOv8 (ours) | CSPDarkNet-53 | 512 × 512 | 10.9 | 22.1 | 39.8 |
Yuan, Z.; Gong, J.; Guo, B.; Wang, C.; Liao, N.; Song, J.; Wu, Q. Small Object Detection in UAV Remote Sensing Images Based on Intra-Group Multi-Scale Fusion Attention and Adaptive Weighted Feature Fusion Mechanism. Remote Sens. 2024, 16, 4265. https://doi.org/10.3390/rs16224265