YOLO-TC: An Optimized Detection Model for Monitoring Safety-Critical Small Objects in Tower Crane Operations
Abstract
1. Introduction
- We introduce ECA-CBAM, a novel channel–spatial attention module that combines the strengths of Efficient Channel Attention (ECA) [9] and the Convolutional Block Attention Module (CBAM) [10]. By using a lightweight 1D convolution to model cross-channel interactions, ECA-CBAM improves feature-extraction efficiency while avoiding the information loss caused by channel dimensionality reduction.
- To address the challenge of detecting objects at multiple scales, we propose the Hierarchical Asymptotic Path Aggregation Network (HA-PANet), which mitigates the semantic gap between non-adjacent feature maps. This enables more effective multi-scale feature fusion and improves localization accuracy for small objects.
- We conduct extensive experiments on a custom tower crane dataset and on the PASCAL VOC benchmark. Comparisons with state-of-the-art models, including Retina-Net [11], Center-Net [24], YOLOv5, YOLOX [5], YOLOv7 [7], and YOLOv8, demonstrate that YOLO-TC achieves superior small-object detection performance while maintaining competitive inference speed and computational efficiency.
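The channel–spatial combination described in the first contribution can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that ECA-CBAM keeps CBAM's layout (channel attention followed by spatial attention). It also assumes that CBAM's shared reduction MLP is replaced by ECA's 1D convolution across channels, applied to both the average-pooled and max-pooled descriptors. The adaptive kernel-size rule follows ECA-Net [9] (gamma = 2, b = 1) and the 7 × 7 spatial kernel follows CBAM [10]. The function names and the random, untrained weights are hypothetical.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net's adaptive kernel size: the odd integer derived from (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_same(v, w):
    """Zero-padded 1D cross-correlation over the channel axis (odd kernel length)."""
    k = w.shape[0]
    vp = np.pad(v, k // 2)
    return np.array([vp[i:i + k] @ w for i in range(v.shape[0])])


def conv2d_same(x, w):
    """Zero-padded 2D cross-correlation of a (Cin, H, W) map with a (Cin, k, k) kernel."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.empty((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * w)
    return out


def channel_attention(x, w1d):
    """ECA-style channel weights: no MLP and no dimensionality reduction."""
    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor
    att = sigmoid(conv1d_same(avg, w1d) + conv1d_same(mx, w1d))  # shared 1D conv
    return x * att[:, None, None]


def spatial_attention(x, w2d):
    """CBAM spatial attention computed from channel-wise mean and max maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    return x * sigmoid(conv2d_same(desc, w2d))[None]


def eca_cbam(x, w1d, w2d):
    """Hypothetical ECA-CBAM: channel attention, then spatial attention."""
    return spatial_attention(channel_attention(x, w1d), w2d)


rng = np.random.default_rng(0)
C, H, W = 64, 16, 16
x = rng.standard_normal((C, H, W))
k = eca_kernel_size(C)                      # 3 for 64 channels
w1d = rng.standard_normal(k) * 0.1          # untrained placeholder weights
w2d = rng.standard_normal((2, 7, 7)) * 0.1  # CBAM-style 7x7 spatial kernel
y = eca_cbam(x, w1d, w2d)
print(k, y.shape)  # 3 (64, 16, 16)
```

In this sketch the channel branch has only k weights (k = 3 for 64 channels). A CBAM-style MLP with reduction ratio r has about 2C²/r weights, ignoring biases: 512 for C = 64 and r = 16. It also squeezes the channel descriptor to C/r dimensions along the way.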
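HA-PANet's internal layout is not specified in this part of the article. The sketch below illustrates only the general asymptotic-fusion idea behind it, as introduced by AFPN [21]. Adjacent pyramid levels are fused first, and the non-adjacent level is brought in at a later stage. ASFF-style [20] per-pixel softmax weights decide how much each level contributes. The following are all assumptions:

- equal channel counts across levels;
- nearest-neighbour upsampling and average-pool downsampling;
- square maps with integer scale factors;
- random, untrained weights.

The function names are hypothetical.

```python
import numpy as np


def resize(x, out_size):
    """Resize a square (C, H, W) map to (C, out_size, out_size).

    Uses nearest-neighbour upsampling or average-pool downsampling and
    assumes integer scale factors.
    """
    c, h, w = x.shape
    if out_size == h:
        return x
    if out_size > h:
        s = out_size // h
        return x.repeat(s, axis=1).repeat(s, axis=2)
    s = h // out_size
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_fuse(feats, out_size, logit_w):
    """ASFF-style fusion of L levels at one output resolution.

    logit_w has shape (L, C): a per-level 1x1 projection to a scalar logit map.
    """
    resized = np.stack([resize(f, out_size) for f in feats])  # (L, C, H, W)
    logits = np.einsum("lchw,lc->lhw", resized, logit_w)       # (L, H, W)
    alpha = softmax(logits, axis=0)  # level weights sum to 1 at each pixel
    return (alpha[:, None] * resized).sum(axis=0)             # (C, H, W)


def asymptotic_fusion(p3, p4, p5, rng):
    """Hypothetical two-stage fusion: the adjacent pair first, then the far level."""
    c = p3.shape[0]

    def weights(levels):
        # Random placeholders standing in for learned 1x1 convolutions.
        return rng.standard_normal((levels, c)) * 0.1

    # Stage 1: fuse the adjacent levels P3 and P4 at each of their resolutions.
    f3 = adaptive_fuse([p3, p4], p3.shape[1], weights(2))
    f4 = adaptive_fuse([p3, p4], p4.shape[1], weights(2))
    # Stage 2: bring in the non-adjacent level P5 only after P3 and P4 are fused.
    levels = [f3, f4, p5]
    return tuple(adaptive_fuse(levels, p.shape[1], weights(3)) for p in (p3, p4, p5))


rng = np.random.default_rng(0)
C = 32
p3 = rng.standard_normal((C, 32, 32))
p4 = rng.standard_normal((C, 16, 16))
p5 = rng.standard_normal((C, 8, 8))
o3, o4, o5 = asymptotic_fusion(p3, p4, p5, rng)
print(o3.shape, o4.shape, o5.shape)  # (32, 32, 32) (32, 16, 16) (32, 8, 8)
```

Because the per-pixel weights are a softmax over levels, every output location is a convex combination of the resized inputs. The network can therefore learn, location by location, how much a distant level should contribute instead of adding all levels with equal weight.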
2. Related Work
2.1. Development and Application of Object Detection Algorithms
2.2. Development and Application of Attention Mechanisms
2.3. Development and Application of Feature Fusion Strategies
3. Methodology
3.1. YOLO-TC Network Structure
3.2. ECA-CBAM Attention Mechanism
3.3. HA-PANet Structure
3.4. Detection Head Reconstruction
3.5. MPDIoU Loss
4. Experimental Design
4.1. Datasets and Evaluation Metrics
4.2. Parameter Settings
4.3. Comparative Experiments on Attention Mechanisms
4.4. Ablation Experiment
4.5. Comparative Experiments with Other Algorithms
4.6. Model Generalizability Analysis
4.7. Analysis of Test Examples
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Redmon, J. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
2. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
3. Redmon, J. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
4. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
5. Ge, Z. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
6. Long, X.; Deng, K.; Wang, G.; Zhang, Y.; Dang, Q.; Gao, Y.; Shen, H.; Ren, J.; Han, S.; Ding, E.; et al. PP-YOLO: An effective and efficient implementation of object detector. arXiv 2020, arXiv:2007.12099.
7. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
9. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
10. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
11. Lin, T.Y. Focal loss for dense object detection. arXiv 2017, arXiv:1708.02002.
12. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
13. Zhang, Y.; Zhuo, L.; Ma, C.; Zhang, Y.; Li, J. CTA-FPN: Channel-Target Attention Feature Pyramid Network for Prohibited Object Detection in X-ray Images. Sens. Imaging 2023, 24, 14.
14. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
15. Yang, Z.; Yuan, Y.; Zhang, M.; Zhao, X.; Zhang, Y.; Tian, B. Safety distance identification for crane drivers based on Mask R-CNN. Sensors 2019, 19, 2789.
16. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
17. Kang, S.; Wang, H. Crane hook detection based on Mask R-CNN in steel-making plant. J. Phys. Conf. Ser. 2020, 1575, 012151.
18. Luo, H.; Wang, M.; Wong, P.K.Y.; Cheng, J.C. Full body pose estimation of construction equipment using computer vision and deep learning techniques. Autom. Constr. 2020, 110, 103016.
19. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
20. Liu, S.; Huang, D.; Wang, Y. Learning spatial fusion for single-shot object detection. arXiv 2019, arXiv:1911.09516.
21. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic feature pyramid network for object detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2184–2189.
22. Ma, S.; Xu, Y. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662.
23. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
24. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
| Models | Params/10⁶ | FLOPs/10⁹ | mAP@0.5/% ↑ | FPS ↑ |
|---|---|---|---|---|
| YOLOv7 | 36.54 | 104.6 | 85.1 | 87 |
| YOLOv7 + CAM | 42.61 | 129.4 | 85.7 | 53 |
| YOLOv7 + E-CAM | 38.10 | 122.4 | 86.3 | 67 |
| YOLOv7 + CBAM | 42.62 | 129.6 | 86.8 | 45 |
| YOLOv7 + ECA-CBAM | 38.10 | 122.4 | 87.4 | 67 |
| YOLOv7 | ECA-CBAM | HA-PANet | Detection Head Reconstruction | MPDIoU | Params/10⁶ | mAP@0.5/% ↑ |
|---|---|---|---|---|---|---|
| √ | | | | | 36.54 | 85.1 |
| √ | √ | | | | 38.10 | 87.4 |
| √ | | √ | | | 39.27 | 86.1 |
| √ | | | √ | | 33.47 | 85.9 |
| √ | | | | √ | 36.54 | 85.5 |
| √ | √ | √ | | | 43.32 | 87.9 |
| √ | √ | √ | √ | | 38.67 | 88.5 |
| √ | √ | √ | √ | √ | 38.67 | 88.8 |
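The MPDIoU column in the ablation table refers to the bounding-box regression loss of Ma and Xu [22]. In that paper, MPDIoU subtracts two terms from the IoU: the squared distance between the top-left corners and the squared distance between the bottom-right corners of the predicted and ground-truth boxes. Each term is normalized by w² + h² of the input image, and the loss is 1 - MPDIoU. A minimal per-box sketch in plain Python follows. The (x1, y1, x2, y2) pixel box format and the function name are assumptions, and a training implementation would vectorize the computation over batches.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """MPDIoU loss for one pair of boxes given as (x1, y1, x2, y2) in pixels."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union areas for the IoU term.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distances between matching corners.
    d1_sq = (px1 - gx1) ** 2 + (py1 - gy1) ** 2  # top-left corners
    d2_sq = (px2 - gx2) ** 2 + (py2 - gy2) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2  # squared diagonal of the input image
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


gt = (10, 10, 50, 50)
print(mpdiou_loss(gt, gt, 640, 640))                    # 0.0 for a perfect match
print(mpdiou_loss((20, 10, 60, 50), gt, 640, 640))      # ~0.40024: IoU 0.6 plus a small corner penalty
print(mpdiou_loss((100, 100, 120, 120), gt, 640, 640))  # > 1 for disjoint boxes
```

A plain IoU loss stays at 1 for every non-overlapping pair. Here, the corner-distance terms keep decreasing as a prediction moves toward its target, so such boxes still receive a useful training signal.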
| Models | AP_C/% ↑ | AP_P/% ↑ | mAP@0.5/% ↑ | Params/10⁶ | FPS ↑ |
|---|---|---|---|---|---|
| Retina-Net | 52.1 | 87.1 | 67.1 | 37.30 | 34 |
| Center-Net | 60.6 | 88.3 | 74.1 | 32.00 | 80 |
| YOLOv5 | 66.5 | 90.9 | 77.7 | 6.60 | 65 |
| YOLOX | 67.9 | 91.4 | 78.3 | 8.27 | 62 |
| YOLOv7 | 73.5 | 96.2 | 85.1 | 36.54 | 87 |
| YOLOv8 | 76.1 | 98.7 | 87.1 | 29.14 | 101 |
| Ours | 78.6 | 99.0 | 88.8 | 38.67 | 64 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ding, D.; Deng, Z.; Yang, R. YOLO-TC: An Optimized Detection Model for Monitoring Safety-Critical Small Objects in Tower Crane Operations. Algorithms 2025, 18, 27. https://doi.org/10.3390/a18010027
Ding D, Deng Z, Yang R. YOLO-TC: An Optimized Detection Model for Monitoring Safety-Critical Small Objects in Tower Crane Operations. Algorithms. 2025; 18(1):27. https://doi.org/10.3390/a18010027
Chicago/Turabian Style: Ding, Dong, Zhengrong Deng, and Rui Yang. 2025. "YOLO-TC: An Optimized Detection Model for Monitoring Safety-Critical Small Objects in Tower Crane Operations" Algorithms 18, no. 1: 27. https://doi.org/10.3390/a18010027
APA Style: Ding, D., Deng, Z., & Yang, R. (2025). YOLO-TC: An Optimized Detection Model for Monitoring Safety-Critical Small Objects in Tower Crane Operations. Algorithms, 18(1), 27. https://doi.org/10.3390/a18010027