Lightweight and Efficient Tiny-Object Detection Based on Improved YOLOv8n for UAV Aerial Images
Abstract
1. Introduction
- To improve detection accuracy for tiny targets while keeping the model efficient, we introduce the LHGNet backbone, a deeper feature extraction network. Its LHG block combines depth-wise separable convolution [15] with a channel shuffle module, so that the backbone extracts and integrates local detail and channel information across its stages.
- We propose the LGS bottleneck and the LGSCSP fusion module to further improve model efficiency. The LGS bottleneck adds a convolutional layer and replaces the add operation with concatenation. This preserves as much channel information as possible and supports more detailed feature extraction across scales.
- We modify the structure of the YOLOv8n detection head and the sizes of its feature maps. This change substantially improves detection accuracy for tiny targets.
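The building blocks named in the contributions above can be illustrated with a small, framework-free sketch. It is not the authors' LHG or LGS implementation: the helper names (`conv_params`, `depthwise_separable_params`, `channel_shuffle`, `fused_channels`) and the example sizes are assumptions chosen for illustration. The sketch shows why depth-wise separable convolution [15] needs fewer weights than a standard convolution, how a channel shuffle interleaves channel groups, and why concatenation keeps channels that an element-wise add would merge.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channel groups: view as (groups, n/groups), transpose, flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def fused_channels(c_a, c_b, mode):
    """Output channel count when two feature maps are fused by add or concat."""
    if mode == "add":
        if c_a != c_b:
            raise ValueError("element-wise add needs equal channel counts")
        return c_a
    if mode == "concat":
        return c_a + c_b
    raise ValueError("mode must be 'add' or 'concat'")


# Hypothetical layer sizes, chosen only for illustration.
print(conv_params(64, 128, 3))                  # 73728
print(depthwise_separable_params(64, 128, 3))   # 8768
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))   # [0, 3, 1, 4, 2, 5]
print(fused_channels(64, 64, "add"))            # 64
print(fused_channels(64, 64, "concat"))         # 128
```

For these example sizes the separable variant uses roughly one eighth of the weights, and concatenation doubles the channel count that the following layer sees, which is the trade-off the LGS bottleneck description refers to.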
2. Related Works
3. Method
3.1. The Structure of YOLOv8
3.2. Improved Algorithm
3.2.1. LHGNet Backbone
3.2.2. Improvement of Neck
3.2.3. Improvement of Detection Head
4. Experiments
4.1. Dataset and Parameter Settings
4.2. Experimental Metrics
4.3. Ablation Experiments
4.4. Comparison Experiments
4.5. Visualization
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xue, Z.; Xu, R.; Bai, D.; Lin, H. YOLO-Tea: A Tea Disease Detection Model Improved by YOLOv5. Forests 2023, 14, 415.
- Zhou, X.; Xu, X.; Liang, W.; Zeng, Z.; Shimizu, S.; Yang, L.T.; Jin, Q. Intelligent Small Object Detection for Digital Twin in Smart Manufacturing With Industrial Cyber-Physical Systems. IEEE Trans. Ind. Inform. 2022, 18, 1377–1386.
- Guo, Y.; Chen, S.; Zhan, R.; Wang, W.; Zhang, J. LMSD-YOLO: A Lightweight YOLO Algorithm for Multi-Scale SAR Ship Detection. Remote Sens. 2022, 14, 4801.
- Yang, Y.; Miao, Z.; Zhang, H.; Wang, B.; Wu, L. Lightweight Attention-Guided YOLO With Level Set Layer for Landslide Detection From Optical Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3543–3559.
- Zhu, Y.; Tang, H. Automatic Damage Detection and Diagnosis for Hydraulic Structures Using Drones and Artificial Intelligence Techniques. Remote Sens. 2023, 15, 615.
- Zhou, K.; Zhang, M.; Wang, H.; Tan, J. Ship Detection in SAR Images Based on Multi-Scale Feature Extraction and Adaptive Feature Fusion. Remote Sens. 2022, 14, 755.
- Zhu, J.; Pang, Q.; Li, S.; Tian, S.; Li, J.; Li, Y. ADDet: An Efficient Multiscale Perceptual Enhancement Network for Aluminum Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 5004714.
- Yuan, M.; Zhou, Y.; Ren, X.; Zhi, H.; Zhang, J.; Chen, H. YOLO-HMC: An Improved Method for PCB Surface Defect Detection. IEEE Trans. Instrum. Meas. 2024, 73, 2001611.
- Xu, H.; Zhong, S.; Zhang, T.; Zou, X. Multiscale Multilevel Residual Feature Fusion for Real-Time Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5002116.
- Liu, C.; Yang, D.; Tang, L.; Zhou, X.; Deng, Y. A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images. Remote Sens. 2023, 15, 83.
- Yue, X.; Meng, L. YOLO-SM: A Lightweight Single-Class Multi-Deformation Object Detection Network. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2467–2480.
- Yang, X.; Zhang, J.; Chen, C.; Yang, D. An Efficient and Lightweight CNN Model With Soft Quantification for Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5230713.
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
- YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 November 2023).
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv 2016, arXiv:1610.02357.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.-Y.; Liao, H.-Y.M.; Yeh, I.-H.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv 2019, arXiv:1911.11929.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv 2014, arXiv:1406.4729.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2016, arXiv:1612.03144.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534.
- Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681.
- Yolov5. Available online: https://github.com/ultralytics/yolov5 (accessed on 15 March 2023).
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection. Sensors 2020, 20, 4276.
- Xu, Q.; Lin, R.; Yue, H.; Huang, H.; Yang, Y.; Yao, Z. Research on Small Target Detection in Driving Scenarios Based on Improved Yolo Network. IEEE Access 2020, 8, 27574–27583.
- Qiu, Z.; Bai, H.; Chen, T. Special Vehicle Detection from UAV Perspective via YOLO-GNS Based Deep Learning Network. Drones 2023, 7, 117.
- Cao, J.; Bao, W.; Shang, H.; Yuan, M.; Cheng, Q. GCL-YOLO: A GhostConv-Based Lightweight YOLO Network for UAV Small Object Detection. Remote Sens. 2023, 15, 4932.
- Xie, S.; Zhou, M.; Wang, C.; Huang, S. CSPPartial-YOLO: A Lightweight YOLO-Based Method for Typical Objects Detection in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 388–399.
- Luo, Y.; Ci, Y.; Jiang, S.; Wei, X. A novel lightweight real-time traffic sign detection method based on an embedded device and YOLOv8. J. Real-Time Image Process. 2024, 21, 24.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2016, arXiv:1608.06993.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. arXiv 2016, arXiv:1611.05431.
- Lee, Y.; Hwang, J.-W.; Lee, S.; Bae, Y.; Park, J. An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection. arXiv 2019, arXiv:1904.09730.
- PaddlePaddle. HGNetv2. Available online: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py (accessed on 15 November 2023).
- Lv, W.; Zhao, Y.; Xu, S.; Wei, J.; Wang, G.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2023, arXiv:2304.08069.
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 213–226.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference 2016, Part I 14, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Zhang, Z. Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images. Drones 2023, 7, 526.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Fu, J.; Sun, X.; Wang, Z.; Fu, K. An Anchor-Free Method Based on Feature Balancing and Refinement Network for Multiscale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1331–1344.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. arXiv 2017, arXiv:1712.00726.
- Ali, S.; Siddique, A.; Ates, H.F.; Gunturk, B.K. Improved YOLOv4 for Aerial Object Detection. In Proceedings of the 29th IEEE Conference on Signal Processing and Communications Applications (SIU), Electr Network 2021, Istanbul, Turkey, 9–11 June 2021; IEEE: Piscataway, NJ, USA.
YOLOv8n Neck

Layer Name | Input/Output Channels | Number of Layers | Number of Model Parameters (×10^6) |
---|---|---|---|
0. Upsample | 256/128 | 1 | - |
1. Concat | 384/384 | 1 | - |
2. C2f block | 384/128 | 15 | 0.148 |
3. Upsample | 128/64 | 1 | - |
4. Concat | 192/192 | 1 | - |
5. C2f block | 192/64 | 15 | 0.037 |
6. ConvModule | 64/64 | 3 | 0.037 |
7. Concat | 192/192 | 1 | - |
8. C2f block | 192/128 | 15 | 0.124 |
9. ConvModule | 128/128 | 3 | 0.148 |
10. Concat | 384/384 | 1 | - |
11. C2f block | 384/256 | 15 | 0.493 |
Total | - | 72 | 0.987 |

Proposed Neck

Layer Name | Input/Output Channels | Number of Layers | Number of Model Parameters (×10^6) |
---|---|---|---|
0. Upsample | 256/256 | 1 | - |
1. Concat | 512/512 | 1 | - |
2. LGSCSP | 512/128 | 46 | 0.137 |
3. Upsample | 128/128 | 1 | - |
4. Concat | 256/256 | 1 | - |
5. LGSCSP | 256/64 | 46 | 0.035 |
6. Upsample | 64/32 | 3 | - |
7. Concat | 96/96 | 1 | - |
8. LGSCSP | 96/32 | 46 | 0.008 |
9. ConvModule | 32/64 | 3 | 0.019 |
10. Concat | 128/128 | 1 | - |
11. LGSCSP | 128/64 | 46 | 0.027 |
12. ConvModule | 64/64 | 3 | 0.037 |
13. Concat | 192/192 | 1 | - |
14. LGSCSP | 192/128 | 46 | 0.096 |
Total | - | 246 | 0.359 |
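
The totals in the neck comparison above can be checked with a few lines of arithmetic. This is only a sanity check on the tabulated per-layer parameter counts, not part of the authors' method; the variable names are chosen here for illustration.

```python
# Per-layer parameter counts (x10^6) copied from the neck comparison above.
# Layers listed with "-" (Upsample, Concat) contribute no parameters.
yolov8n_neck = [0.148, 0.037, 0.037, 0.124, 0.148, 0.493]
proposed_neck = [0.137, 0.035, 0.008, 0.019, 0.027, 0.037, 0.096]

baseline_total = round(sum(yolov8n_neck), 3)
proposed_total = round(sum(proposed_neck), 3)

# Relative reduction of neck parameters, in percent.
reduction_pct = 100.0 * (baseline_total - proposed_total) / baseline_total

print(baseline_total)            # 0.987
print(proposed_total)            # 0.359
print(round(reduction_pct, 1))   # 63.6
```

The proposed neck therefore has about 64% fewer parameters than the YOLOv8n neck, even though it stacks more layers (246 versus 72).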
Parameter | Setup |
---|---|
Epoch | 200 |
Image Size | 640 × 640 pixels |
Batch Size | 8 |
Optimizer | SGD |
Learning Rate | 0.01 |
Momentum | 0.937 |
Weight Decay | 0.0005 |
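
As a rough sketch, the settings above map onto the training arguments of the Ultralytics package [14] as follows. Several things here are assumptions rather than details from the paper: that the tabulated learning rate is the initial rate (`lr0`), that the stock `yolov8n.yaml` model definition stands in for the modified network, and that the dataset is described by a `VisDrone.yaml` config. The function name `train` is also just illustrative.

```python
# Hyperparameters copied from the settings table above.
TRAIN_ARGS = {
    "epochs": 200,
    "imgsz": 640,            # images are resized to 640 x 640 pixels
    "batch": 8,
    "optimizer": "SGD",
    "lr0": 0.01,             # assumed to be the initial learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,
}


def train(model_cfg="yolov8n.yaml", data_cfg="VisDrone.yaml"):
    """Start a training run; both default file names are assumptions."""
    # Imported lazily so the settings can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_cfg, **TRAIN_ARGS)
```

Calling `train()` would start a run with these settings; a custom model YAML describing the modified backbone, neck, and head would be passed as `model_cfg`.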
Model | mAP(0.5) (%) | mAP(0.5:0.95) (%) | The Number of Model Parameters (×10^6) | FLOPs (×10^9) |
---|---|---|---|---|
Baseline (YOLOv8n) | 33.9 | 19.4 | 3.0 | 8.2 |
Baseline + GS | 33.1 | 18.7 | 2.8 | 7.4 |
Baseline + LGS | 33.1 | 18.7 | 2.8 | 7.3 |
Baseline + LHGNet + GS + head | 38.6 | 22.2 | 2.2 | 13.3 |
Baseline + LHGNet + LGS + head (LE-YOLO) | 39.3 | 22.7 | 2.1 | 13.1 |
Model | mAP(0.5) (%) | mAP(0.5:0.95) (%) | The Number of Model Parameters (×10^6) | FLOPs (×10^9) |
---|---|---|---|---|
Baseline (YOLOv8n) | 33.9 | 19.4 | 3.0 | 8.2 |
Baseline + LHGNet | 35.6 | 20.7 | 3.2 | 10.5 |
Baseline + LGS | 33.1 | 18.7 | 2.8 | 7.3 |
Baseline + head | 38.2 | 22.1 | 2.3 | 16.7 |
Baseline + LGS + head | 36.6 | 20.9 | 1.9 | 10.6 |
Baseline + LHGNet + LGS + head (LE-YOLO) | 39.3 | 22.7 | 2.1 | 13.1 |
Model | PED | PER | BC | Car | Van | TR | TRI | ATR | Bus | MO | mAP(0.5) (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline (YOLOv8n) | 35.6 | 29.1 | 7.9 | 76.2 | 39.2 | 31.2 | 21.6 | 11.0 | 48.9 | 38.4 | 33.9 |
Baseline + LHGNet | 39.1 | 31.0 | 9.7 | 78.0 | 41.7 | 31.4 | 23.6 | 13.7 | 47.2 | 40.5 | 35.6 |
Baseline + LGS | 35.4 | 28.1 | 7.5 | 75.7 | 38.3 | 27.5 | 21.7 | 12.4 | 47.3 | 37.1 | 33.1 |
Baseline + head | 45.6 | 38.1 | 11.4 | 81.1 | 42.6 | 29.2 | 24.5 | 13.3 | 50.5 | 45.1 | 38.2 |
Baseline + LGS + head | 43.6 | 37.0 | 11.4 | 80.4 | 40.9 | 28.6 | 21.8 | 13.5 | 45.1 | 43.9 | 36.6 |
Baseline + LHGNet + LGS + head (LE-YOLO) | 46.7 | 39.1 | 12.0 | 82.3 | 43.1 | 31.4 | 24.5 | 13.8 | 52.9 | 46.8 | 39.3 |
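
The last column of the per-class table is the mean of the ten class APs, which the short check below reproduces for the baseline and for LE-YOLO. It also lists the per-class gain, using the class abbreviations exactly as they appear in the table; the variable and function names are illustrative.

```python
CLASSES = ["PED", "PER", "BC", "Car", "Van", "TR", "TRI", "ATR", "Bus", "MO"]

# AP(0.5) per class (%), copied from the first and last rows of the table.
BASELINE = [35.6, 29.1, 7.9, 76.2, 39.2, 31.2, 21.6, 11.0, 48.9, 38.4]
LE_YOLO = [46.7, 39.1, 12.0, 82.3, 43.1, 31.4, 24.5, 13.8, 52.9, 46.8]


def mean_ap(per_class_ap):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)


# Gain of LE-YOLO over the baseline for each class, in percentage points.
gains = {name: round(new - old, 1)
         for name, old, new in zip(CLASSES, BASELINE, LE_YOLO)}
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

print(round(mean_ap(BASELINE), 1))   # 33.9
print(round(mean_ap(LE_YOLO), 1))    # 39.3
print(largest, gains[largest])       # PED 11.1
print(smallest, gains[smallest])     # TR 0.2
```

The gains are largest for the PED and PER classes and smallest for TR.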
Model | Backbone | mAP(0.5) (%) | mAP(0.5:0.95) (%) | mAP_tiny (%) | The Number of Model Parameters (×10^6) | FLOPs (×10^9) |
---|---|---|---|---|---|---|
SSD | VGG16 | 9.2 | 5.0 | 0.5 | 58.0 | - |
Faster RCNN | ResNet50 | 29.0 | 17.8 | 7.2 | 165.6 | - |
YOLOv5n | CSPDarkNet53 | 32.9 | 18.6 | 9.0 | 2.5 | 7.2 |
YOLOv6n | EfficientRep | 30.6 | 17.4 | 7.6 | 4.2 | 11.9 |
YOLOv7 | Extended-ELAN | 29.1 | 17.9 | 8.6 | 37.3 | 105.3 |
YOLOv8n | CSPDarkNet53 | 33.9 | 19.4 | 9.4 | 3.0 | 8.2 |
Drone-YOLO-n | RepVGG | 38.1 | 22.7 | - | 3.1 | - |
LE-YOLO | LHGNet | 39.3 | 22.7 | 13.5 | 2.1 | 13.1 |
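
The trade-off between LE-YOLO and the YOLOv8n baseline in the comparison above can be summarized numerically. The sketch below only restates the tabulated values as differences and percentage changes; the dictionary keys and the helper `percent_change` are names introduced here.

```python
# Values copied from the YOLOv8n and LE-YOLO rows of the comparison table.
yolov8n = {"map50": 33.9, "map_tiny": 9.4, "params_m": 3.0, "gflops": 8.2}
le_yolo = {"map50": 39.3, "map_tiny": 13.5, "params_m": 2.1, "gflops": 13.1}


def percent_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


map50_gain = round(le_yolo["map50"] - yolov8n["map50"], 1)
tiny_gain = round(le_yolo["map_tiny"] - yolov8n["map_tiny"], 1)
params_change = round(percent_change(le_yolo["params_m"], yolov8n["params_m"]), 1)
flops_change = round(percent_change(le_yolo["gflops"], yolov8n["gflops"]), 1)

print(map50_gain)      # 5.4 (percentage points of mAP(0.5))
print(tiny_gain)       # 4.1 (percentage points of tiny-object mAP)
print(params_change)   # -30.0 (% fewer parameters)
print(flops_change)    # 59.8 (% more floating-point operations)
```

In other words, LE-YOLO is more accurate with fewer parameters than YOLOv8n, but it performs more computation per image.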
Model | mAP(0.5) (%) | mAP(0.5:0.95) (%) | Average Inference Time per Image (ms) | The Number of Model Parameters (×10^6) | FLOPs (×10^9) |
---|---|---|---|---|---|
YOLOv5n | 32.9 | 18.6 | 127.5 | 2.5 | 7.2 |
YOLOv6n | 30.6 | 17.4 | 174.8 | 4.2 | 11.9 |
YOLOv8n | 33.9 | 19.4 | 141.7 | 3.0 | 8.2 |
LE-YOLO | 39.3 | 22.7 | 338.6 | 2.1 | 13.1 |
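
The inference times above can be converted into approximate throughput. The conversion assumes images are processed one at a time with no batching or pipelining, which this table does not state, so the frame rates are indicative only.

```python
# Average inference time per image (ms), copied from the table above.
latency_ms = {
    "YOLOv5n": 127.5,
    "YOLOv6n": 174.8,
    "YOLOv8n": 141.7,
    "LE-YOLO": 338.6,
}

# Images per second when images are processed sequentially.
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}

# How much longer LE-YOLO takes per image than the YOLOv8n baseline.
slowdown = latency_ms["LE-YOLO"] / latency_ms["YOLOv8n"]

for name, value in fps.items():
    print(f"{name}: {value:.2f} images/s")
print(f"LE-YOLO vs YOLOv8n: {slowdown:.2f}x slower per image")
```

Under that assumption LE-YOLO processes about 3 images per second against about 7 for YOLOv8n, so its accuracy gain comes at roughly 2.4 times the per-image latency.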
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yue, M.; Zhang, L.; Huang, J.; Zhang, H. Lightweight and Efficient Tiny-Object Detection Based on Improved YOLOv8n for UAV Aerial Images. Drones 2024, 8, 276. https://doi.org/10.3390/drones8070276