AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation
Abstract
1. Introduction
- We construct a new pothole detection dataset.
- We propose AAL-Net, a one-stage object detector. By designing the LF module and incorporating the NAM attention module, the network remains lightweight while achieving high detection accuracy.
- We design a data augmentation method that improves the accuracy and robustness of pothole detection.
2. Related Work
2.1. Object Detectors
2.2. Surface Defect Detection
2.3. Attention Mechanism
3. Method
3.1. Architecture
3.2. Structure
3.2.1. Backbone
3.2.2. Neck
3.3. Method of Data Augmentation
3.3.1. Negative Sample
3.3.2. Fog
3.4. Loss Function
4. Results and Discussion
4.1. Experiment Description
4.1.1. Dataset
- (1) The self-made dataset (A1 in Table 1) contains 994 pothole pictures: 795 in the training set and 199 in the test set, an 8:2 split.
- (2) The negative sample dataset (A2 in Table 1) combines 746 normal-condition pothole pictures from the baseline dataset A1 with 248 manhole pictures. The training set contains 795 pictures and the test set 199 pictures, an 8:2 split.
- (3) The fogging dataset (A3 in Table 1) combines 746 normal-condition pothole pictures from the baseline dataset A1 with 248 fog-treated pothole pictures. The training set contains 795 pictures and the test set 199 pictures, an 8:2 split.
- (4) The fogging and negative sample dataset (A4 in Table 1) combines 498 normal-condition pothole pictures from the baseline dataset A1, 248 fog-treated pothole pictures, and 248 manhole pictures. The training set contains 795 pictures and the test set 199 pictures, an 8:2 split.
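The exact fogging treatment used to build the fogged datasets is not reproduced here; a common way to synthesize fog, shown below as an illustrative sketch only, is the atmospheric scattering model I = J·t + A·(1 − t), where t is the transmission and A the airlight (the parameter values `t=0.6` and `airlight=0.9` are hypothetical choices, not the paper's settings):

```python
import numpy as np

def add_fog(image: np.ndarray, t: float = 0.6, airlight: float = 0.9) -> np.ndarray:
    """Synthesize uniform fog on an RGB image with values in [0, 1]
    using the atmospheric scattering model I = J*t + A*(1 - t)."""
    fogged = image.astype(np.float32) * t + airlight * (1.0 - t)
    return np.clip(fogged, 0.0, 1.0)

# A black image becomes a uniform gray veil: 0*0.6 + 0.9*(1-0.6) = 0.36
foggy = add_fog(np.zeros((4, 4, 3), dtype=np.float32))
```

A depth-dependent transmission map (rather than a constant `t`) would produce more realistic fog that thickens with distance.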
- (1) The pothole600 dataset (B1 in Table 2) contains 600 pothole pictures: 480 in the training set and 120 in the test set, an 8:2 split.
- (2) The pothole600 negative sample dataset (B2 in Table 2) combines 450 original pothole600 pothole pictures with 150 manhole pictures. The training set contains 480 pictures and the test set 120 pictures, an 8:2 split.
- (3) The pothole600 fogging dataset (B3 in Table 2) combines 450 original pothole600 pothole pictures with 150 fog-treated pictures. The training set contains 480 pictures and the test set 120 pictures, an 8:2 split.
- (4) The pothole600 fogging and negative sample dataset (B4 in Table 2) combines 300 normal-condition pothole pictures from the baseline dataset B1, 150 fog-treated pothole pictures from the fogging dataset B3, and 150 manhole pictures from the negative sample dataset B2. The training set contains 450 pictures and the test set 150 pictures, a 3:1 split.
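All of the splits above use the same ratio of training to test pictures; a minimal sketch of such a split (the file names below are illustrative, not the dataset's actual names) might look like:

```python
import random

def split_8_2(paths, seed=0):
    """Shuffle a list of image paths and split it 80/20 into train/test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(paths)
    rng.shuffle(shuffled)
    k = round(len(shuffled) * 0.8)
    return shuffled[:k], shuffled[k:]

# 994 pictures -> 795 train / 199 test, matching the A1 split above
train, test = split_8_2([f"pothole_{i}.jpg" for i in range(994)])
```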
4.1.2. Experimental Settings
4.1.3. Metrics
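The results tables report precision P, recall R, and F1, where F1 is the harmonic mean of P and R; the reported columns can therefore be cross-checked directly:

```python
def f1_score(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision and recall (both in %)."""
    return 2 * p * r / (p + r)

# AAL-Net on the self-made dataset: P = 77.47, R = 64.11
score = f1_score(77.47, 64.11)  # ~70.16, matching the reported F1
```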
4.2. Comparison Experiments
4.3. Data Augmentation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fan, R.; Liu, M. Road Damage Detection Based on Unsupervised Disparity Map Segmentation. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4906–4911.
- Kim, Y.-M.; Kim, Y.-G.; Son, S.-Y.; Lim, S.-Y.; Choi, B.-Y.; Choi, D.-H. Review of Recent Automated Pothole-Detection Methods. Appl. Sci. 2022, 12, 5320.
- Park, S.-S.; Tran, V.-T.; Lee, D.-E. Application of Various YOLO Models for Computer Vision-Based Real-Time Pothole Detection. Appl. Sci. 2021, 11, 11229.
- Dewangan, D.K.; Sahu, S.P. PotNet: Pothole Detection for Autonomous Vehicle System Using Convolutional Neural Network. Electron. Lett. 2021, 57, 53–56.
- Sattar, S.; Li, S.; Chapman, M. Road Surface Monitoring Using Smartphone Sensors: A Review. Sensors 2018, 18, 3845.
- Du, R.; Qiu, G.; Gao, K.; Hu, L.; Liu, L. Abnormal Road Surface Recognition Based on Smartphone Acceleration Sensor. Sensors 2020, 20, 451.
- Ul Haq, M.U.; Ashfaque, M.; Mathavan, S.; Kamal, K.; Ahmed, A. Stereo-Based 3D Reconstruction of Potholes by a Hybrid, Dense Matching Scheme. IEEE Sens. J. 2019, 19, 3807–3817.
- Guan, J.; Yang, X.; Ding, L.; Cheng, X.; Lee, V.C.S.; Jin, C. Automated Pixel-Level Pavement Distress Detection Based on Stereo Vision and Deep Learning. Autom. Constr. 2021, 129, 103788.
- Baek, J.-W.; Chung, K. Pothole Classification Model Using Edge Detection in Road Image. Appl. Sci. 2020, 10, 6662.
- Chen, H.; Yao, M.; Gu, Q. Pothole Detection Using Location-Aware Convolutional Neural Networks. Int. J. Mach. Learn. Cybern. 2020, 11, 899–911.
- Pan, Y.; Zhang, X.; Cervone, G.; Yang, L. Detection of Asphalt Pavement Potholes and Cracks Based on the Unmanned Aerial Vehicle Multispectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3701–3712.
- Salaudeen, H.; Çelebi, E. Pothole Detection Using Image Enhancement GAN and Object Detection Network. Electronics 2022, 11, 1882.
- Arya, D.; Maeda, H.; Kumar Ghosh, S.; Toshniwal, D.; Omata, H.; Kashiyama, T.; Sekimoto, Y. Global Road Damage Detection: State-of-the-Art Solutions. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10 December 2020; pp. 5533–5539.
- Gao, M.; Wang, X.; Zhu, S.; Guan, P. Detection and Segmentation of Cement Concrete Pavement Pothole Based on Image Processing Technology. Math. Probl. Eng. 2020, 2020, 1360832.
- Masihullah, S.; Garg, R.; Mukherjee, P.; Ray, A. Attention Based Coupled Framework for Road and Pothole Segmentation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10 January 2021; pp. 5812–5819.
- Fan, J.; Bocus, M.J.; Hosking, B.; Wu, R.; Liu, Y.; Vityazev, S.; Fan, R. Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS), Montreal, QC, Canada, 11 August 2021; pp. 1–5.
- Anand, S.; Gupta, S.; Darbari, V.; Kohli, S. Crack-Pot: Autonomous Road Crack and Pothole Detection. In Proceedings of the 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, Canberra, Australia, 10–13 December 2018; pp. 1–6.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11218, pp. 122–138. ISBN 978-3-030-01263-2.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790.
- Li, G.; Shao, R.; Wan, H.; Zhou, M.; Li, M. A Model for Surface Defect Detection of Industrial Products Based on Attention Augmentation. Comput. Intell. Neurosci. 2022, 2022, 9577096.
- Zhang, Z.-K.; Zhou, M.-L.; Shao, R.; Li, M.; Li, G. A Defect Detection Model for Industrial Products Based on Attention and Knowledge Distillation. Comput. Intell. Neurosci. 2022, 2022, 6174255.
- Wang, G.; Li, Q.; Wang, L.; Zhang, Y.; Liu, Z. Elderly Fall Detection with an Accelerometer Using Lightweight Neural Networks. Electronics 2019, 8, 1354.
- Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A New Lightweight Deep Neural Network for Surface Scratch Detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. ISBN 978-3-030-01233-5.
- Liu, Z.; Wang, L.; Wu, W.; Qian, C.; Lu, T. TAM: Temporal Adaptive Module for Video Recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 13688–13698.
- Wang, C.-Y.; Mark Liao, H.-Y.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
- Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-Based Attention Module. arXiv 2021, arXiv:2111.12419.
- Fan, R.; Wang, H.; Bocus, M.J.; Liu, M. We Learn Better Road Pothole Detection: From Attention Aggregation to Adversarial Domain Adaptation. In Computer Vision—ECCV 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12538, pp. 285–300.
- Available online: https://www.kaggle.com/datasets/zchengcheng/pothole-datasets (accessed on 14 January 2023).
Table 1. Composition of the self-made datasets (numbers of pictures).

| Name | Pothole | Fog | Manhole |
|---|---|---|---|
| Pothole (A1) | 994 | 0 | 0 |
| Pothole-manhole (A2) | 746 | 0 | 248 |
| Pothole-fog (A3) | 746 | 248 | 0 |
| Pothole-manhole-fog (A4) | 498 | 248 | 248 |
Table 2. Composition of the pothole600 datasets (numbers of pictures).

| Name | Pothole | Fog | Manhole |
|---|---|---|---|
| Pothole600 (B1) | 600 | 0 | 0 |
| Pothole600-manhole (B2) | 450 | 0 | 150 |
| Pothole600-fog (B3) | 450 | 150 | 0 |
| Pothole600-manhole-fog (B4) | 300 | 150 | 150 |
| Model | P (%) | R (%) | F1 (%) | Parameters | GFLOPs |
|---|---|---|---|---|---|
| YOLOv3-tiny | 69.40 | 62.58 | 65.81 | 8.67M | 13.0 |
| YOLOv5s | 72.92 | 63.68 | 67.98 | 7.02M | 15.9 |
| YOLOv5s-tiny | 73.94 | 64.77 | 69.05 | 3.68M | 8.2 |
| AAL-Net | 77.47 | 64.11 | 70.16 | 3.67M | 8.2 |
| AAL-Net+aug | 87.97 | 80.08 | 83.84 | 3.67M | 8.2 |
| Model | P (%) | R (%) | F1 (%) | Parameters | GFLOPs |
|---|---|---|---|---|---|
| YOLOv3-tiny | 91.12 | 88.97 | 90.03 | 8.67M | 13.0 |
| YOLOv5s | 93.11 | 96.06 | 94.56 | 7.02M | 15.9 |
| YOLOv5s-tiny | 95.29 | 96.06 | 95.67 | 3.68M | 8.2 |
| AAL-Net | 95.34 | 96.85 | 96.09 | 3.67M | 8.2 |
| AAL-Net+aug | 95.38 | 97.63 | 96.49 | 3.67M | 8.2 |
| Data Augmentation | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| no | 72.92 | 63.68 | 67.98 |
| manhole | 89.71 | 82.04 | 85.70 |
| fog | 96.03 | 90.15 | 92.99 |
| fog-manhole | 85.87 | 83.80 | 84.82 |
| Data Augmentation | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| no | 93.11 | 96.06 | 94.56 |
| manhole | 96.08 | 97.63 | 96.85 |
| fog | 96.09 | 97.63 | 96.85 |
| fog-manhole | 95.97 | 93.68 | 94.81 |
Zhang, C.; Li, G.; Zhang, Z.; Shao, R.; Li, M.; Han, D.; Zhou, M. AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation. Appl. Sci. 2023, 13, 1435. https://doi.org/10.3390/app13031435