Dense Multiscale Feature Learning Transformer Embedding Cross-Shaped Attention for Road Damage Detection
Abstract
1. Introduction
- (1) We embed the CSA mechanism in the backbone network to widen the attention span around pothole regions. This allows our DMTC network to apply global attention to the specified feature information more efficiently, strengthens the network's representational capability, improves its ability to distinguish potholes from the surrounding environment, and thus raises recognition accuracy.
- (2) We utilize the DMFL module to fuse independent information from multiple scales, which significantly improves detection performance. The DMFL module quickly constructs a feature pyramid with strong semantic information at every scale, recovers as much of the original pothole feature information as possible, reduces our DMTC model's false detection rate, and makes the edge lines of the detected potholes more complete.
- (3) On the publicly available road detection dataset CPRID, we reproduce several segmentation algorithms, provide baselines for road damage detection, and conduct extensive experiments. The experimental results demonstrate that our method is visually and quantitatively superior to other conventional methods.
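Since this excerpt does not reproduce the CSA equations, the idea can be sketched as cross-shaped self-attention in the spirit of the CSWin Transformer that cross-shaped attention builds on: half of the channels attend within horizontal stripes (rows) and the other half within vertical stripes (columns), so the receptive field forms a cross. The shapes, the channel split, and the `stripe_attention` helper below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stripe_attention(x):
    # x: (num_stripes, stripe_len, dim); plain scaled dot-product
    # self-attention applied independently inside each stripe
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ x

def cross_shaped_attention(feat):
    """feat: (H, W, C). First half of the channels attends along rows,
    the second half along columns; concatenation restores (H, W, C)."""
    H, W, C = feat.shape
    half = C // 2
    horiz = stripe_attention(feat[:, :, :half])                    # H row stripes of length W
    vert = stripe_attention(feat.transpose(1, 0, 2)[:, :, half:])  # W column stripes of length H
    return np.concatenate([horiz, vert.transpose(1, 0, 2)], axis=-1)

out = cross_shaped_attention(np.random.rand(8, 8, 16))
```

A full CSA block would add linear projections, multiple heads per stripe direction, and a locally-enhanced positional encoding; this sketch keeps only the cross-shaped partition of the attention range.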
2. Materials and Method
2.1. Cross-Shaped Attention
2.2. Dense Multiscale Feature Learning
2.3. Segmentation-Head
2.4. Loss Function
3. Experiments and Results
3.1. Datasets
3.2. Experimental Details
3.2.1. Evaluation Metrics
3.2.2. Parameter Settings
3.3. Baselines
- EfficientFCN [47]: The backbone is an ImageNet-pretrained network without any dilated convolutions. Multiscale features from the encoder are fused to obtain high-resolution, semantically rich feature maps, and decoding is converted into novel codebook generation and codeword assembly tasks driven by the encoder's high-level and low-level features.
- IFNet [48]: A deeply supervised image fusion network. It first extracts features with a fully convolutional two-branch network, and then detects changes with a deeply supervised difference discrimination network.
- UNet [10]: Obtained by extending and modifying the fully convolutional network. The network comprises two parts: a contracting path that captures context information and a symmetric expanding path that enables precise localization.
- SegNet [49]: A symmetric network consisting of an encoder (left) and a decoder (right). The encoder is a network model along the lines of Visual Geometry Group (VGG16) that mainly parses object information, while the decoder converts the parsed information into the final image form.
- FastFCN [50]: To improve semantic segmentation, it turns the extraction of high-resolution feature maps into a joint upsampling problem by using a new Joint Pyramid Upsampling (JPU) module.
- PSPNet [51]: Its core component is the pyramid pooling module, which aggregates context information from different regions to improve access to global information.
- FCN16s [9]: The backbone network is Visual Geometry Group (VGG16). The key step is to upsample (by deconvolution or bilinear interpolation) the 1/32-resolution prediction to 1/16 resolution and add it to the prediction from the 1/16-resolution pooling layer. The fused result is an enhanced 1/16-resolution prediction, which is then upsampled to the original image size to obtain the final output.
- FCN32s [9]: Based on the VGG16 (Visual Geometry Group) network, the 3 fully connected layers are first removed and replaced with 3 convolutional layers. To prevent overfitting, dropout layers are added after each of the first 2 convolutional layers; finally, transposed convolutional layers upsample the result 32 times to restore the original output image size.
- DeepLabV3+ [52]: The model uses DeepLabv3 as the encoder module and a simple yet effective decoder module. Through atrous convolution, the model can adjust the resolution of the encoded features, thus balancing accuracy and runtime.
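The skip fusion described for FCN16s above can be sketched numerically: the 1/32-scale prediction is upsampled by a factor of 2, added to the 1/16-scale prediction, and the sum is upsampled back to the input resolution. Nearest-neighbor upsampling stands in for the learned deconvolution, and all sizes are hypothetical:

```python
import numpy as np

def upsample(x, factor):
    # nearest-neighbor upsampling; the original FCN uses learned
    # deconvolution / bilinear interpolation, this is a simplification
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

H = W = 32                                      # hypothetical input size
pred_32 = np.random.rand(H // 32, W // 32, 2)   # 1/32-scale class scores
pred_16 = np.random.rand(H // 16, W // 16, 2)   # 1/16-scale class scores

fused = upsample(pred_32, 2) + pred_16          # FCN16s skip fusion
full = upsample(fused, 16)                      # back to input resolution
```

FCN32s skips the fusion step and upsamples `pred_32` by a factor of 32 directly, which is why its boundaries are coarser.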
3.4. Visual Performance
3.4.1. Detection Results
3.4.2. Comparison with Baselines
3.5. More Analysis
3.5.1. Interpretable Analysis
3.5.2. Generalization Analysis
4. Quantitative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Naddaf-Sh, S.; Naddaf-Sh, M.-M.; Kashani, A.R.; Zargarzadeh, H. An efficient and scalable deep learning approach for road damage detection. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 5602–5608. [Google Scholar]
- Xu, C.; Ye, Z.; Mei, L.; Shen, S.; Zhang, Q.; Sui, H.; Yang, W.; Sun, S. SCAD: A Siamese Cross-Attention Discrimination Network for Bitemporal Building Change Detection. Remote Sens. 2022, 14, 6213. [Google Scholar] [CrossRef]
- Kim, H.-K.; Park, J.H.; Jung, H.-Y. An efficient color space for deep-learning based traffic light recognition. J. Adv. Transp. 2018, 2018, 1–12. [Google Scholar] [CrossRef]
- Sudakov, O.; Burnaev, E.; Koroteev, D. Driving digital rock towards machine learning: Predicting permeability with gradient boosting and deep neural networks. Comput. Geosci. 2019, 127, 91–98. [Google Scholar] [CrossRef]
- Xiao, J.; Guo, H.; Zhou, J.; Zhao, T.; Yu, Q.; Chen, Y. Tiny object detection with context enhancement and feature purification. Expert Syst. Appl. 2023, 211, 118665. [Google Scholar] [CrossRef]
- Ale, L.; Zhang, N.; Li, L. Road damage detection using RetinaNet. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 5197–5200. [Google Scholar]
- Wang, W.; Wu, B.; Yang, S.; Wang, Z. Road damage detection and classification with faster R-CNN. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 5220–5223. [Google Scholar]
- Mei, L.; Guo, X.; Huang, X.; Weng, Y.; Liu, S.; Lei, C. Dense contour-imbalance aware framework for colon gland instance segmentation. Biomed. Signal Process. Control. 2020, 60, 101988. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Zhang, Y.; Fan, J.; Zhang, M.; Shi, Z.; Liu, R.; Guo, B. A Recurrent Adaptive Network: Balanced Learning for Road Crack Segmentation with High-Resolution Images. Remote Sens. 2022, 14, 3275. [Google Scholar] [CrossRef]
- Tsai, Y.-C.; Kaul, V.; Mersereau, R.M. Critical assessment of pavement distress segmentation methods. J. Transp. Eng. 2010, 136, 11–19. [Google Scholar] [CrossRef]
- Robet, R.; Hasibuan, Z.A.; Soeleman, M.A.; Purwanto, P.; Andono, P.N.; Pujiono, P. Deep Learning Model in Road Surface Condition Monitoring. In Proceedings of the 2022 International Seminar on Application for Technology of Information and Communication (iSemantic), Kota Semarang, Indonesia, 17–18 September 2022; pp. 204–209. [Google Scholar]
- Sizyakin, R.; Voronin, V.; Gapon, N.; Pižurica, A. A deep learning approach to crack detection on road surfaces. In Artificial Intelligence and Machine Learning in Defense Applications II; SPIE: Bellingham, WA, USA, 2020; pp. 128–134. [Google Scholar]
- Li, H.; Xu, H.; Tian, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Bridge crack detection based on SSENets. Appl. Sci. 2020, 10, 4230. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Fang, F.; Li, L.; Gu, Y.; Zhu, H.; Lim, J.-H. A novel hybrid approach for crack detection. Pattern Recognit. 2020, 107, 107474. [Google Scholar] [CrossRef]
- Ibragimov, E.; Lee, H.-J.; Lee, J.-J.; Kim, N. Automated pavement distress detection using region based convolutional neural networks. Int. J. Pavement Eng. 2022, 23, 1981–1992. [Google Scholar] [CrossRef]
- Du, Y.; Pan, N.; Xu, Z.; Deng, F.; Shen, Y.; Kang, H. Pavement distress detection and classification based on YOLO network. Int. J. Pavement Eng. 2021, 22, 1659–1672. [Google Scholar] [CrossRef]
- Park, S.-S.; Tran, V.-T.; Lee, D.-E. Application of various yolo models for computer vision-based real-time pothole detection. Appl. Sci. 2021, 11, 11229. [Google Scholar] [CrossRef]
- Xu, Q.; Lin, R.; Yue, H.; Huang, H.; Yang, Y.; Yao, Z. Research on small target detection in driving scenarios based on improved yolo network. IEEE Access 2020, 8, 27574–27583. [Google Scholar] [CrossRef]
- Liu, Z.; Wu, W.; Gu, X.; Li, S.; Wang, L.; Zhang, T. Application of combining YOLO models and 3D GPR images in road detection and maintenance. Remote Sens. 2021, 13, 1081. [Google Scholar] [CrossRef]
- Dharneeshkar, J.; Aniruthan, S.; Karthika, R.; Parameswaran, L. Deep Learning based Detection of potholes in Indian roads using YOLO. In Proceedings of the 2020 International Conference on Inventive Computation Technologies (ICICT) 2020, Coimbatore, India, 26–28 February 2020; pp. 381–385. [Google Scholar]
- Zhang, A.; Wang, K.C.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 805–819. [Google Scholar] [CrossRef]
- Zhang, A.; Wang, K.C.; Fei, Y.; Liu, Y.; Tao, S.; Chen, C.; Li, J.Q.; Li, B. Deep learning–based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet. J. Comput. Civ. Eng. 2018, 32, 04018041. [Google Scholar] [CrossRef]
- Zhang, A.; Wang, K.C.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.; Li, J.Q.; Yang, E.; Qiu, S. Automated pixel-level pavement crack detection on 3D asphalt surfaces with a recurrent neural network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 213–229. [Google Scholar] [CrossRef]
- Fei, Y.; Wang, K.C.; Zhang, A.; Chen, C.; Li, J.Q.; Liu, Y.; Yang, G.; Li, B. Pixel-level cracking detection on 3D asphalt pavement images through deep-learning-based CrackNet-V. IEEE Trans. Intell. Transp. Syst. 2019, 21, 273–284. [Google Scholar] [CrossRef]
- Lee, T.; Yoon, Y.; Chun, C.; Ryu, S. CNN-based road-surface crack detection model that responds to brightness changes. Electronics 2021, 10, 1402. [Google Scholar] [CrossRef]
- Lee, T.; Chun, C.; Ryu, S.-K. Detection of road-surface anomalies using a smartphone camera and accelerometer. Sensors 2021, 21, 561. [Google Scholar] [CrossRef]
- Haris, M.; Glowacz, A. Road object detection: A comparative study of deep learning-based algorithms. Electronics 2021, 10, 1932. [Google Scholar] [CrossRef]
- Mahenge, S.F.; Wambura, S.; Jiao, L. A Modified U-Net Architecture for Road Surfaces Cracks Detection. In Proceedings of the 8th International Conference on Computing and Artificial Intelligence, Tianjin, China, 18–21 March 2022; pp. 464–471. [Google Scholar]
- Zhang, L.; Shen, J.; Zhu, B. A research on an improved Unet-based concrete crack detection algorithm. Struct. Health Monit. 2021, 20, 1864–1879. [Google Scholar] [CrossRef]
- Sun, X.; Xie, Y.; Jiang, L.; Cao, Y.; Liu, B. DMA-Net: DeepLab with Multi-Scale Attention for Pavement Crack Segmentation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18392–18403. [Google Scholar] [CrossRef]
- Vishwakarma, R.; Vennelakanti, R. CNN model & tuning for global road damage detection. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 5609–5615. [Google Scholar]
- Liu, Y.; Zhang, X.; Zhang, B.; Chen, Z. Deep network for road damage detection. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 5572–5576. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Liu, H.; Miao, X.; Mertz, C.; Xu, C.; Kong, H. CrackFormer: Transformer Network for Fine-Grained Crack Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3783–3792. [Google Scholar]
- Yu, M.; Wu, D.; Rao, W.; Cheng, L.; Li, R.; Li, Y. Automated Road Crack Detection Method based on Visual Transformer with Multi-Head Cross-Attention. In Proceedings of the 2022 IEEE International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), Chongqing, China, 5–7 August 2022; pp. 328–332. [Google Scholar]
- Mehajabin, N.; Ma, Z.; Wang, Y.; Tohidypour, H.R.; Nasiopoulos, P. Real-Time Deep Learning based Road Deterioration Detection for Smart Cities. In Proceedings of the 18th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Thessaloniki, Greece, 10–12 October 2022; pp. 321–326. [Google Scholar]
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 12124–12134. [Google Scholar]
- Feng, H.; Xu, G.S.; Guo, Y. Multi-scale classification network for road crack detection. IET Intell. Transp. Syst. 2019, 13, 398–405. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Via del Mar, Chile, 27–29 October 2020; pp. 1–7. [Google Scholar]
- Thompson, E.M.; Ranieri, A.; Biasotti, S.; Chicchon, M.; Sipiran, I.; Pham, M.-K.; Nguyen-Ho, T.-L.; Nguyen, H.-D.; Tran, M.-T. SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data. arXiv 2022, arXiv:2205.13326. [Google Scholar] [CrossRef]
- Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding classifiers to maximize F1 score. arXiv 2014, arXiv:1402.1892. [Google Scholar]
- Smith, L.N. A disciplined approach to neural network hyper-parameters: Part 1—Learning rate, batch size, momentum, and weight decay. arXiv 2018, arXiv:1803.09820. [Google Scholar]
- Liu, J.; He, J.; Zhang, J.; Ren, J.S.; Li, H. EfficientFCN: Holistically-guided decoding for semantic segmentation. arXiv 2020, arXiv:2008.10487. [Google Scholar]
- Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200. [Google Scholar] [CrossRef]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Wu, H.; Zhang, J.; Huang, K.; Liang, K.; Yu, Y. FastFCN: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv 2019, arXiv:1903.11816. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
Method | Precision | Recall | F1 Score | mIoU | IoU_0 | IoU_1 | OA | Kappa
---|---|---|---|---|---|---|---|---
EfficientFCN | 79.56 | 61.72 | 69.51 | 99.77 | 53.27 | 76.52 | 99.77 | 69.40
IFNet | 64.17 | 40.56 | 49.70 | 99.65 | 33.07 | 66.36 | 99.65 | 49.54
UNet | 76.52 | 59.40 | 66.89 | 99.75 | 50.25 | 75.00 | 99.75 | 66.76
SegNet | 46.73 | 50.06 | 48.34 | 99.54 | 31.87 | 65.71 | 99.54 | 48.11
FastFCN | 80.26 | 69.39 | 74.42 | 99.80 | 59.26 | 79.53 | 99.80 | 74.31
PSPNet | 73.05 | 48.65 | 58.40 | 99.70 | 41.25 | 70.48 | 99.70 | 58.26
FCN16s | 7.68 | 7.94 | 7.85 | 99.21 | 4.09 | 51.65 | 99.21 | 7.45
FCN32s | 61.05 | 49.87 | 54.90 | 99.65 | 37.83 | 68.74 | 99.65 | 54.72
DeepLabV3+ | 33.82 | 34.34 | 34.08 | 99.43 | 20.54 | 59.98 | 99.43 | 33.79
DMTC | 81.37 | 77.51 | 79.39 | 99.83 | 65.83 | 82.83 | 99.83 | 79.31
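As a reading aid for the table above, all of the reported metrics can be derived from a binary confusion matrix. The sketch below uses hypothetical masks; which of IoU_0/IoU_1 corresponds to the background versus the damage class is our assumption, not stated in this excerpt:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Binary-mask metrics matching the table's columns (values in %)."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_1 = tp / (tp + fp + fn)   # IoU of class 1 (assumed: damage)
    iou_0 = tn / (tn + fp + fn)   # IoU of class 0 (assumed: background)
    oa = (tp + tn) / n            # overall accuracy
    # Cohen's kappa: observed agreement vs. chance agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return {k: round(100 * v, 2) for k, v in {
        "Precision": precision, "Recall": recall, "F1": f1,
        "IoU_0": iou_0, "IoU_1": iou_1, "OA": oa, "Kappa": kappa}.items()}

# toy 4-pixel example: one true positive, one false positive
m = segmentation_metrics(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]))
```

Because road damage occupies a tiny fraction of each image, OA is near-saturated for every method in the table, while the per-class IoU, F1, and Kappa columns separate the methods much more clearly.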
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xu, C.; Zhang, Q.; Mei, L.; Shen, S.; Ye, Z.; Li, D.; Yang, W.; Zhou, X. Dense Multiscale Feature Learning Transformer Embedding Cross-Shaped Attention for Road Damage Detection. Electronics 2023, 12, 898. https://doi.org/10.3390/electronics12040898