DAPONet: A Dual Attention and Partially Overparameterized Network for Real-Time Road Damage Detection
Abstract
1. Introduction
1. A novel Global Localization and Context Attention (GLCA) mechanism is proposed. It integrates local and global attention to improve the model's handling of complex backgrounds and multi-scale targets.
2. A Cross-Stage Partial Depthwise Overparameterized Attention (CPDA) module is proposed. It combines partially overparameterized convolution with global and local context attention to process multi-scale features efficiently, improving both the detection accuracy and the computational efficiency of the model.
3. A Mixed Convolutional Downsampling (MCD) module is proposed. It downsamples and processes feature maps along multiple parallel paths, enhancing both the diversity and the efficiency of feature extraction.
4. A real-time detection model, the dual attention and partially overparameterized network (DAPONet), is designed for road damage detection in complex scenes. By incorporating the dual attention mechanism, partially overparameterized convolution, and parallel downsampling, it improves multi-scale feature extraction and fusion. The model is validated on the public SVRDD dataset and the MS COCO dataset, where it outperforms the compared models.
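The parallel-path idea behind the Mixed Convolutional Downsampling contribution can be illustrated with a minimal sketch. The list above does not specify what the paths are, so the two used here (2 × 2 max pooling and 2 × 2 average pooling, concatenated along the channel axis) and the names `pool2x2` and `parallel_downsample` are assumptions chosen for illustration, not the authors' design.

```python
from typing import Callable, List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C channels


def pool2x2(channel: Channel, reduce: Callable[[List[float]], float]) -> Channel:
    """Apply a 2x2 window with stride 2, reducing each window to one value."""
    h, w = len(channel), len(channel[0])
    return [
        [
            reduce([channel[i][j], channel[i][j + 1],
                    channel[i + 1][j], channel[i + 1][j + 1]])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def parallel_downsample(x: FeatureMap) -> FeatureMap:
    """Downsample along two parallel paths and concatenate the channels.

    Assumption: max pooling and average pooling stand in for the
    (unspecified) paths of the module described in the text.
    """
    max_path = [pool2x2(c, max) for c in x]
    avg_path = [pool2x2(c, mean) for c in x]
    return max_path + avg_path


# One 4 x 4 channel in, two 2 x 2 channels out.
x = [[[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]]
y = parallel_downsample(x)
print(len(y), len(y[0]), len(y[0][0]))
```

Each path halves the spatial resolution, and concatenation doubles the channel count, so the two paths contribute different summaries of the same neighborhood to the next layer.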
2. Related Work
2.1. Object Detection
2.2. Attention Mechanism
2.3. Lightweight Object Detection Models
2.4. Road Damage Detection
3. Methodology
3.1. Overview
3.2. Global Localization and Context Attention
3.3. Cross-Stage Partial Depthwise Overparameterized Attention Module
3.4. Mixed Convolutional Downsampling
3.5. Loss Function
4. Experimental Details
4.1. Datasets
4.2. Experimental Environment
4.3. Evaluation Metrics
5. Experimental Results, Discussion, and Analysis
5.1. Comparative Experiments
5.2. General Object Detection Experiments
5.3. Ablation Study
5.4. Discussion on Dataset Biases and Limitations
5.5. Error Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Dataset Name | Experimental Image Size | Number of Categories | Number of Pictures | Number of Frames | Training Set Samples | Test Set Samples |
---|---|---|---|---|---|---|
SVRDD | 640 × 640 | 7 | 8000 | 20,804 | 6000 | 1000 |
MS COCO2017 | | 80 | 123,287 | 886,000 | 118,287 | 5000 |
Model | P | R | mAP50 | mAP 50-95 | Params | FLOPs | Model Size |
---|---|---|---|---|---|---|---|
YOLOv5n | 67.0 | 58.4 | 61.7 | 35.8 | 2.5 | 7.1 | 5.3 |
YOLOv8n | 70.7 | 59.3 | 64.5 | 37.8 | 3.0 | 8.1 | 6.3 |
YOLOv9t | 69.0 | 57.2 | 60.8 | 36.1 | 2.0 | 7.6 | 4.7 |
YOLOv10n | 65.0 | 55.6 | 59.7 | 35.8 | 2.7 | 8.2 | 5.8 |
DAPONet | 71.6 | 66.6 | 70.1 | 42.8 | 1.6 | 1.7 | 3.7 |
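The comparison in the table above can be summarized as relative differences against each baseline. The sketch below copies the mAP50, Params, and FLOPs columns from the table; the helper name `compare` is hypothetical, and the percentage reductions do not depend on the units of the Params and FLOPs columns.

```python
# Values copied from the SVRDD comparison table above.
rows = {
    "YOLOv5n": {"map50": 61.7, "params": 2.5, "flops": 7.1},
    "YOLOv8n": {"map50": 64.5, "params": 3.0, "flops": 8.1},
    "YOLOv9t": {"map50": 60.8, "params": 2.0, "flops": 7.6},
    "YOLOv10n": {"map50": 59.7, "params": 2.7, "flops": 8.2},
    "DAPONet": {"map50": 70.1, "params": 1.6, "flops": 1.7},
}


def compare(baseline: str, model: str = "DAPONet") -> dict:
    """Return the mAP50 gain (points) and Params/FLOPs reductions (percent)."""
    b, m = rows[baseline], rows[model]
    return {
        "map50_gain": round(m["map50"] - b["map50"], 1),
        "params_reduction_pct": round(100 * (b["params"] - m["params"]) / b["params"], 1),
        "flops_reduction_pct": round(100 * (b["flops"] - m["flops"]) / b["flops"], 1),
    }


for name in rows:
    if name != "DAPONet":
        print(name, compare(name))
```

Against YOLOv8n, for example, this gives a 5.6-point mAP50 gain with 46.7% fewer parameters and 79.0% fewer FLOPs.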
Model | mAP50 | mAP 50-95 | Params | FLOPs | Model Size |
---|---|---|---|---|---|
NanoDet-Plus-m-1.5x | | 29.9 | 1.75 | 2.44 | 4.7 |
DPNet | | 29.6 | 1.04 | 2.5 | |
PP-PicoDet-ShuffleNetV2 | 44.6 | 30.0 | 1.17 | 1.53 | |
PP-PicoDet-S | 45.5 | 30.6 | 0.99 | 1.24 | |
EfficientDet-D1 | | 32.6 | 6.1 | 6.6 | |
YOLOv5n | 45.7 | 28.0 | 1.9 | 4.5 | 3.9 |
DAPONet | 48.3 | 33.4 | 1.6 | 1.7 | 3.6 |
CPDA | MCD | P | R | mAP50 | mAP 50-95 | Params | FLOPs | Model Size |
---|---|---|---|---|---|---|---|---|
- | - | 70.7 | 59.3 | 64.5 | 37.8 | 3.0 | 8.1 | 6.3 |
✓ | - | 68.5 | 62.1 | 66.1 | 39.2 | 2.3 | 4.6 | 5.1 |
- | ✓ | 68.9 | 60.8 | 65.2 | 38.3 | 2.8 | 7.8 | 5.8 |
✓ | ✓ | 71.6 | 66.6 | 70.1 | 42.8 | 1.6 | 1.7 | 3.7 |
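One way to read the ablation table is to compare the gain from enabling both modules with the sum of the gains from enabling each one alone. The sketch below uses the mAP50 column in row order; the variable names are hypothetical, and a single run per configuration is assumed, so the result is indicative rather than statistically established.

```python
# mAP50 values copied from the ablation table above, in row order.
baseline = 64.5   # neither module enabled
single_a = 66.1   # first single-module row
single_b = 65.2   # second single-module row
combined = 70.1   # both modules enabled

gain_a = round(single_a - baseline, 1)
gain_b = round(single_b - baseline, 1)
gain_both = round(combined - baseline, 1)

# Gain beyond what the two individual improvements would add up to.
extra = round(gain_both - (gain_a + gain_b), 1)

print(gain_a, gain_b, gain_both, extra)
```

The individual gains are 1.6 and 0.7 points, while the combined gain is 5.6 points, 3.3 points more than their sum, which suggests the two modules are complementary on this dataset.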
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pan, W.; Lei, J.; Wang, X.; Lv, C.; Wang, G.; Li, C. DAPONet: A Dual Attention and Partially Overparameterized Network for Real-Time Road Damage Detection. Appl. Sci. 2025, 15, 1470. https://doi.org/10.3390/app15031470