A Feature-Enhanced Small Object Detection Algorithm Based on Attention Mechanism
Abstract
1. Introduction
- We replaced the PAN (Path Aggregation Network) in YOLOv8 with HSPAN (High-Level Screening-Feature Path Aggregation Network) to better fuse features across scales, and incorporated a feature-enhancing attention mechanism, Context Anchor Attention (CAA), which strengthens central features and captures contextual relationships between edge pixels, thereby improving the model's classification and localization (a sketch of the CAA idea follows this list).
- We augmented the YOLOv8 detection head with Dynamic Head, improving the head's scale awareness, spatial awareness, and task awareness and thereby sharpening the detector's perception of small objects in images (a simplified sketch of the scale-aware component is given after this list).
- During model training, we introduced the Slide classification loss and the Shape-IoU localization loss. These strengthen the model's learning of hard samples, mitigate the sensitivity of IoU to small positional shifts of small objects, and accelerate convergence (both losses are sketched after this list).
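For concreteness, the following is a minimal PyTorch sketch of a CAA-style block in the spirit of the poly kernel inception network [49]: local average pooling anchors the central response, a pair of horizontal/vertical depthwise strip convolutions gathers long-range context from edge pixels, and a sigmoid gate re-weights the input features. The pooling window, strip-kernel length, and exact layer layout here are illustrative assumptions, not the precise configuration of our network.

```python
import torch
import torch.nn as nn

class CAABlock(nn.Module):
    """Context-Anchor-Attention-style block (sketch)."""

    def __init__(self, channels: int, strip_kernel: int = 11):
        super().__init__()
        pad = strip_kernel // 2
        # local average pooling anchors the central feature response
        self.pool = nn.AvgPool2d(kernel_size=7, stride=1, padding=3)
        self.proj_in = nn.Conv2d(channels, channels, kernel_size=1)
        # depthwise strip convolutions: (1 x k) then (k x 1) capture
        # long-range horizontal/vertical context cheaply
        self.conv_h = nn.Conv2d(channels, channels, (1, strip_kernel),
                                padding=(0, pad), groups=channels)
        self.conv_v = nn.Conv2d(channels, channels, (strip_kernel, 1),
                                padding=(pad, 0), groups=channels)
        self.proj_out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.proj_in(self.pool(x))
        attn = self.conv_v(self.conv_h(attn))
        attn = torch.sigmoid(self.proj_out(attn))
        return x * attn  # attention factor re-weights the features

if __name__ == "__main__":
    feats = torch.randn(1, 64, 80, 80)
    print(CAABlock(64)(feats).shape)  # torch.Size([1, 64, 80, 80])
```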
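Dynamic Head [50] applies scale-aware, spatial-aware, and task-aware attention in sequence over a level × space × channel feature tensor. To give a flavor of the mechanism, here is a simplified sketch of the scale-aware component alone: each pyramid level is re-weighted by a hard-sigmoid gate computed from globally pooled features, so the head can emphasize the resolution at which a small object is best represented. The linear gate over the level axis stands in for the 1 × 1 convolution of the original formulation, and the common-resolution input stack is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareAttention(nn.Module):
    """Simplified scale-aware attention (sketch). Input is a stack of
    L pyramid levels resized to a common resolution: (B, L, C, H, W)."""

    def __init__(self, levels: int):
        super().__init__()
        self.gate = nn.Linear(levels, levels)  # stand-in for the 1x1 conv f(.)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # global average over channels and space -> one scalar per level
        pooled = feats.mean(dim=(2, 3, 4))              # (B, L)
        weights = F.hardsigmoid(self.gate(pooled))      # (B, L) level gates
        return feats * weights[:, :, None, None, None]  # re-weight each level

if __name__ == "__main__":
    x = torch.randn(2, 4, 64, 40, 40)  # 4 pyramid levels
    print(ScaleAwareAttention(4)(x).shape)
```

The hard sigmoid keeps the gating cheap while bounding the level weights to [0, 1], which is why Dynamic Head can be stacked several times without a large cost.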
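The two training losses are compact enough to sketch directly. Below, slide_weight follows the piecewise weighting of YOLO-FaceV2 [53], where samples near the mean IoU μ are treated as hard and up-weighted, and shape_iou_loss follows Shape-IoU [58], where the normalized center distance and the shape penalty are both weighted by the ground-truth box's aspect. The box format, epsilon terms, and default scale factor are our assumptions for illustration.

```python
import math

def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting factor (sketch): mu is the mean IoU over samples.
    Easy samples keep weight 1; samples around the boundary mu are
    emphasized via the exponential ramp."""
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)

def shape_iou_loss(pred, gt, scale: float = 1.0, eps: float = 1e-9) -> float:
    """Shape-IoU loss (sketch). Boxes are (xc, yc, w, h); `scale`
    reflects the dataset's typical object scale."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # plain IoU
    iw = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    ih = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # shape weights derived from the ground-truth aspect ratio
    ww = 2 * gw**scale / (gw**scale + gh**scale)
    hh = 2 * gh**scale / (gw**scale + gh**scale)
    # shape-weighted center distance, normalized by the enclosing diagonal
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    c2 = cw**2 + ch**2 + eps
    dist = hh * (px - gx) ** 2 / c2 + ww * (py - gy) ** 2 / c2
    # shape discrepancy penalty
    omega_w = hh * abs(pw - gw) / max(pw, gw)
    omega_h = ww * abs(ph - gh) / max(ph, gh)
    shape_cost = (1 - math.exp(-omega_w)) ** 4 + (1 - math.exp(-omega_h)) ** 4
    return 1.0 - iou + dist + 0.5 * shape_cost
```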
2. Related Work
2.1. YOLO Algorithm
2.2. Multi-Scale Feature Fusion
2.3. Attention Mechanism
2.4. Detection Head
3. Method
3.1. CAA-HSPAN (Context Anchor Attention High-Level Screening-Feature Path Aggregation Network)
3.1.1. Feature Enhancement Module
3.1.2. Feature Fusion Module
3.2. Dynamic Head
3.2.1. Scale-Aware Attention
3.2.2. Spatial-Aware Attention
3.2.3. Task-Aware Attention
3.3. Loss Function
4. Experiments
4.1. Datasets
4.2. Experimental Environment and Training Strategies
4.3. Experimental Results on the VisDrone2019 Dataset
4.4. Experimental Results on the PASCAL VOC Dataset
4.5. Ablation Experiments
4.5.1. Ablation Experiments on the VisDrone2019 Val Dataset
4.5.2. Ablation Experiments on the AI-TOD Val Dataset
4.6. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Byun, S.; Shin, I.K.; Moon, J.; Kang, J.; Choi, S.I. Road traffic monitoring from UAV images using deep learning networks. Remote Sens. 2021, 13, 4027.
- Li, Z.; Zhang, Y.; Wu, H.; Suzuki, S.; Namiki, A.; Wang, W. Design and application of a UAV autonomous inspection system for high-voltage power transmission lines. Remote Sens. 2023, 15, 865.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. A survey on deep learning-based identification of plant and crop diseases from UAV-based aerial images. Clust. Comput. 2023, 26, 1297–1317.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. arXiv 2019, arXiv:1902.07296.
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979.
- Gong, Y.; Yu, X.; Ding, Y.; Peng, X.; Zhao, J.; Han, Z. Effective fusion factor in FPN for tiny object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 1160–1168.
- Noh, J.; Bae, W.; Lee, W.; Seo, J.; Kim, G. Better to follow, follow to be better: Towards precise supervision of feature super-resolution for small object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 9725–9734.
- Yang, C.; Huang, Z.; Wang, N. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13668–13677.
- Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. IoU loss for 2D/3D object detection. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Québec City, QC, Canada, 16–19 September 2019; IEEE: New York, NY, USA, 2019; pp. 85–94.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE Computer Society: Washington, DC, USA, 2021; pp. 3490–3499.
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Jiang, Y.; Tan, Z.; Wang, J.; Sun, X.; Lin, M.; Li, H. GiraffeDet: A heavy-neck paradigm for object detection. arXiv 2022, arXiv:2202.04256.
- Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oahu, HI, USA, 1–4 October 2023; IEEE: New York, NY, USA, 2023; pp. 2184–2189.
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Han, K.; Wang, Y. Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 2204–2212.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global second-order pooling convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3024–3033.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3139–3148.
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11863–11874.
- Song, G.; Liu, Y.; Wang, X. Revisiting the sibling head in object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11563–11572.
- Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10186–10195.
- Ge, Z. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Zhuang, J.; Qin, Z.; Yu, H.; Chen, X. Task-specific context decoupling for object detection. arXiv 2023, arXiv:2303.01047.
- Jiang, H.; Gu, Q. G-head: Gating head for multi-task learning in one-stage object detection. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–6.
- Chen, Y.; Zhang, C.; Chen, B.; Huang, Y.; Sun, Y.; Wang, C.; Fu, X.; Dai, Y.; Qin, Y.; Peng, Y.; et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases. Comput. Biol. Med. 2024, 170, 107917.
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly kernel inception network for remote sensing detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 27706–27716.
- Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic head: Unifying object detection heads with attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7373–7382.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586.
- Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. YOLO-FaceV2: A scale and occlusion aware face detector. arXiv 2022, arXiv:2208.02019.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Zhen, J.; Liang, W.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
- Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
- Zhang, H.; Zhang, S. Shape-IoU: More accurate metric considering bounding box shape and scale. arXiv 2023, arXiv:2312.17663.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE Press: New York, NY, USA, 2019; pp. 213–226.
- Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 3791–3798.
- Zeng, S.; Yang, W.; Jiao, Y.; Geng, L.; Chen, X. SCA-YOLO: A new small object detection model for UAV images. Vis. Comput. 2024, 40, 1787–1803.
- Li, Y.; Han, Z.; Xu, H.; Liu, L.; Li, X.; Zhang, K. YOLOv3-Lite: A lightweight crack detection network for aircraft structure based on depthwise separable convolutions. Appl. Sci. 2019, 9, 3781.
- Su, Z.; Yu, J.; Tan, H.; Wan, X.; Qi, K. MSA-YOLO: A remote sensing object detection model based on multi-scale strip attention. Sensors 2023, 23, 6811.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 6569–6578.
- Zhang, Z.D.; Tan, M.L.; Lan, Z.C.; Liu, H.C.; Pei, L.; Yu, W.X. CDNet: A real-time and robust crosswalk detection network on Jetson nano based on YOLOv5. Neural Comput. Appl. 2022, 34, 10719–10730.
Parameters | Configuration |
---|---|
CPU | Intel Core i5-13600KF |
GPU | NVIDIA GeForce RTX 3060 Ti |
GPU memory size | 8 GB |
Operating system | Windows 11 |
Deep learning framework | PyTorch 2.0.0 + CUDA 11.8 |
Per-category mAP50 (%) by object category (PED = pedestrian, PER = people, BC = bicycle, TRI = tricycle, ATRI = awning-tricycle, MO = motor):
Model | PED | PER | BC | Car | Van | Truck | TRI | ATRI | Bus | MO | mAP50 (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
Fast R-CNN [6] | 21.4 | 15.6 | 6.7 | 51.7 | 29.5 | 19 | 13.1 | 7.7 | 31.4 | 20.7 | 21.7 |
Faster R-CNN [7] | 20.9 | 14.8 | 7.3 | 51 | 29.7 | 19.5 | 14 | 8.8 | 30.5 | 21.2 | 21.8 |
Cascade R-CNN [61] | 22.2 | 14.8 | 7.6 | 54.6 | 31.5 | 21.6 | 14.8 | 8.6 | 34.9 | 21.4 | 23.2 |
RetinaNet [61] | 13 | 7.9 | 1.4 | 45.5 | 19.9 | 11.5 | 6.3 | 4.2 | 17.8 | 11.8 | 13.9 |
CenterNet [61] | 22.6 | 20.6 | 14.6 | 59.7 | 24 | 21.3 | 20.1 | 17.4 | 37.9 | 23.7 | 26.2 |
DMNet [61] | 28.5 | 20.4 | 15.9 | 56.8 | 37.9 | 30.1 | 22.6 | 14 | 47.1 | 29.2 | 30.3 |
HRDet+ [61] | 28.6 | 14.5 | 11.7 | 49.4 | 37.1 | 35.2 | 28.8 | 21.9 | 43.3 | 23.5 | 28 |
ACM-OD [59] | 30.8 | 15.5 | 10.3 | 52.7 | 38.9 | 33.2 | 26.9 | 21.9 | 41.4 | 24.9 | 29.7 |
CDNet [59] | 35.6 | 19.2 | 13.8 | 55.8 | 42.1 | 38.2 | 33 | 25.4 | 49.5 | 29.3 | 34.2 |
HR-Cascade++ [61] | 32.6 | 17.3 | 11.1 | 54.7 | 42.4 | 35.3 | 32.7 | 24.1 | 46.5 | 28.2 | 32.5 |
MSC-CenterNet [61] | 33.7 | 15.2 | 12.1 | 55.2 | 40.5 | 34.1 | 29.2 | 21.6 | 42.2 | 27.5 | 31.1 |
YOLOv3-LITE [62] | 34.5 | 23.4 | 7.9 | 70.8 | 31.3 | 21.9 | 15.3 | 6.2 | 40.9 | 32.7 | 28.5 |
MSA-YOLO [63] | 33.4 | 17.3 | 11.2 | 76.8 | 41.5 | 41.4 | 14.8 | 18.4 | 60.9 | 31 | 34.7 |
YOLOv4 [21] | 24.8 | 12.6 | 8.6 | 64.3 | 22.4 | 22.7 | 11.4 | 7.6 | 44.3 | 21.7 | 30.7 |
YOLOv5s | 24.2 | 15.9 | 7.6 | 68.7 | 30.5 | 30.5 | 11.1 | 14.8 | 52.5 | 21.8 | 27.7 |
YOLOv8s | 27.2 | 14.6 | 9.2 | 71.6 | 37.1 | 39.1 | 16.9 | 18.5 | 56.8 | 29 | 32 |
YOLOv8m | 30.6 | 17.3 | 11.6 | 74.4 | 40.2 | 44.5 | 20.1 | 19.9 | 59.3 | 33.3 | 35.1 |
YOLOv9s [25] | 27.5 | 15.3 | 9.2 | 71.4 | 37.6 | 39.6 | 19.1 | 20.4 | 58.2 | 30 | 32.8 |
YOLOv10s [26] | 26.4 | 16.3 | 7.7 | 70.7 | 35.5 | 37.2 | 16.5 | 18 | 55.6 | 28.3 | 31.2 |
YOLO11s | 27.6 | 13.9 | 8.55 | 71.7 | 37.3 | 39.7 | 18.4 | 19 | 57.4 | 29.3 | 32.2 |
Ours (test) | 39.1 | 27.5 | 19.4 | 79.7 | 46.6 | 44.8 | 26.4 | 25.8 | 62.7 | 41.6 | 41.4 |
Model | Backbone | Size | Parameters | mAP50 | FPS |
---|---|---|---|---|---|
Fast R-CNN [6] | ResNet | 600 × 1000 | 44.5 M | 70.0 | 1.9 |
Faster R-CNN [7] | ResNet | 600 × 1000 | 40.1 M | 76.4 | 2.4 |
Cascade R-CNN [64] | ResNet | 600 × 1000 | 76.8 M | 79.6 | 2.2 |
CenterNet [65] | ResNet | 512 × 512 | 12.7 M | 77.1 | 7 |
RetinaNet [10] | ResNet | 600 × 600 | 33.8 M | 81.5 | 5 |
YOLO [61] | GoogleNet | 448 × 448 | 62 M | 63.4 | 45 |
YOLOv2 [61] | DarkNet | 352 × 352 | 67 M | 76.8 | 67 |
YOLOv3 [20] | DarkNet | 416 × 416 | 61.5 M | 78.3 | 36.9 |
YOLOv3-LITE [62] | DarkNet | 416 × 416 | 31 M | 74.0 | 71.9 |
YOLOv4 [21] | CSPDarkNet | 640 × 640 | 39.6 M | 83.2 | 60 |
MSA-YOLO [63] | CSPDarkNet | 640 × 640 | 12.1 M | 80.4 | 174.9 |
CDNet [66] | CSPDarkNet | 640 × 640 | 21.5 M | 80.3 | 165.3 |
YOLOv5s | CSPDarkNet | 640 × 640 | 9.13 M | 79.9 | 210.5 |
YOLOv8s | CSPDarkNet | 640 × 640 | 11.2 M | 81.5 | 200.7 |
YOLOv8m | CSPDarkNet | 640 × 640 | 25.9 M | 84.5 | 80.7 |
YOLOv9s [25] | GELAN | 640 × 640 | 7.3 M | 82.2 | 101.1 |
YOLOv10s [26] | CSPDarkNet | 640 × 640 | 8.07 M | 80.8 | 172.8 |
YOLO11s | CSPDarkNet | 640 × 640 | 9.4 M | 81.6 | 212.2 |
Ours | CSPDarkNet | 640 × 640 | 10.1 M | 82.4 | 52.2 |
Method | mAP50 (%) | mAP50:95 (%) | P (%) | R (%) | GFLOPs | Params (M) | FPS |
---|---|---|---|---|---|---|---|
YOLOv8s | 39.3 | 23.5 | 50.9 | 38.2 | 28.7 | 11.1 | 200 |
YOLOv8s-topk = 5 | 40.9 | 24.4 | 52.1 | 39.9 | 28.7 | 11.1 | 200 |
YOLOv8s-p2-topk = 5 | 45.5 | 27.7 | 55.9 | 43.7 | 37 | 10.6 | 146 |
YOLOv8s-p2-topk = 5-CAAHSPAN | 48.7 | 29.9 | 57.2 | 46.9 | 69.4 | 9.53 | 75 |
YOLOv8s-p2-topk = 5-CAAHSPAN-Slideloss-ShapeIoU | 49.4 | 30.3 | 59.6 | 47.1 | 69.4 | 9.53 | 75 |
YOLOv8s-p2-topk = 5-CAAHSPAN-Slideloss-ShapeIoU-DyHead | 52.2 | 31.8 | 60.9 | 49.5 | 72.7 | 10.1 | 52 |
Method | mAP50 (%) | mAP50:95 (%) | P (%) | R (%) | GFLOPs | Params (M) | FPS |
---|---|---|---|---|---|---|---|
YOLOv8s | 38.8 | 16.2 | 58.1 | 37.7 | 28.7 | 11.1 | 200 |
YOLOv8s-p2-topk = 5 | 41.7 | 18.3 | 60.9 | 40.4 | 37 | 10.6 | 146 |
YOLOv8s-p2-topk = 5-CAAHSPAN | 43.3 | 19.3 | 62.9 | 40.7 | 69.4 | 9.53 | 75 |
YOLOv8s-p2-topk = 5-CAAHSPAN-Slideloss-ShapeIoU | 44.1 | 19.6 | 63.1 | 41.9 | 69.4 | 9.53 | 75 |
YOLOv8s-p2-topk = 5-CAAHSPAN-Slideloss-ShapeIoU-DyHead | 45.2 | 20.1 | 63.9 | 43.7 | 72.7 | 10.1 | 52 |