Trans-DCN: A High-Efficiency and Adaptive Deep Network for Bridge Cable Surface Defect Segmentation
Abstract
1. Introduction
- An efficient Transformer-based backbone, complemented by a multi-layer feature aggregation (MFA) module in the encoder, which enables comprehensive use of global contextual defect features.
- A serial–parallel structure of multiple atrous deformable convolutions that dynamically adjusts the receptive field to the defect distribution, ensuring comprehensive detection at multiple granularities (see the sketch after this list).
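To make the second contribution concrete, below is a minimal PyTorch sketch of a serial–parallel pyramid of atrous deformable convolutions. The dilation rates mirror the ablation in Section 5.3; the module names and the offset-prediction design are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a serial-parallel pyramid of atrous deformable convolutions,
# assuming 3x3 kernels and torchvision's DeformConv2d; dilation rates follow
# the ablation in Section 5.3, everything else is an illustrative assumption.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AtrousDeformBlock(nn.Module):
    """One 3x3 deformable convolution with a learned offset field and a dilation rate."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sample point: 2 * 3 * 3 = 18 channels.
        self.offset = nn.Conv2d(channels, 18, kernel_size=3,
                                padding=dilation, dilation=dilation)
        self.conv = DeformConv2d(channels, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.conv(x, self.offset(x))))

class SerialParallelPyramid(nn.Module):
    """Serial chain of atrous deformable blocks whose intermediate outputs
    are also collected in parallel, then fused by a 1x1 convolution."""
    def __init__(self, channels: int, dilations=(3, 6, 12, 18, 24)):
        super().__init__()
        self.blocks = nn.ModuleList(AtrousDeformBlock(channels, d) for d in dilations)
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        outs = []
        for block in self.blocks:   # serial path: each block refines the last
            x = block(x)
            outs.append(x)          # parallel path: keep every granularity
        return self.fuse(torch.cat(outs, dim=1))
```

The serial chaining lets each branch see an already-enlarged receptive field, while the parallel concatenation retains the intermediate granularities for fusion.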
2. Related Work
2.1. Bridge Cable Surface Defect Detection
2.2. Image Segmentation Based on Deep Neural Network
3. Method
3.1. Overview of Related Methods
3.2. Proposed Network Architecture
3.3. Encoder Based on Effective Transformer
3.3.1. Overlap Patch Merging
3.3.2. Effective Transformer Encoder
3.3.3. MFA Module for Feature Fusion
3.4. Decoder Based on Adaptive Aggregation
3.4.1. Serial–Parallel Pyramid Based on Atrous Deformable Convolution (SPP-DCN) Module
3.4.2. Adaptive Spatial Feature Fusion Module
3.5. Loss Function
4. Experimental Results
4.1. Datasets
4.2. Evaluation Metrics
- Acc indicates the agreement between the predicted classes and the ground-truth labels over all pixels, and it can be expressed as follows:
- mIoU indicates the mean Intersection over Union (IoU) between the predicted classes and the ground-truth labels, which can be expressed as follows:
- F1-score is the harmonic mean of precision and recall, and it can be calculated as follows:
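With TP, FP, FN, and TN denoting pixel-level true/false positive/negative counts and C the number of classes, the standard definitions consistent with these descriptions are:

```latex
% Standard pixel-level metric definitions; TP/FP/FN/TN are pixel counts,
% C is the number of classes.
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c + FP_c + FN_c}

\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\quad \mathrm{Precision} = \frac{TP}{TP + FP},
\quad \mathrm{Recall} = \frac{TP}{TP + FN}
```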
4.3. Implementation Details
4.4. Comparison with State-of-the-Art Models
5. Analysis
5.1. Ablation Study
5.2. Influence of Transformer Backbone
5.3. Influence of SPP-DCN
5.4. Influence of ASFF
5.5. Influence of Composite Loss Function
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Optimizer | Momentum | Learning Rate | Weight Decay | Acc (%) ↑ | F1-Score (%) ↑ | mIoU (%) ↑ |
|---|---|---|---|---|---|---|
| SGD | 0.8 | 0.007 | 0.0005 | 97.43 | 79.96 | 81.96 |
| SGD | 0.85 | 0.007 | 0.0005 | 97.44 | 80.16 | 82.10 |
| SGD | 0.9 | 0.007 | 0.0005 | 97.49 | 80.89 | 82.63 |
| SGD | 0.95 | 0.007 | 0.0005 | 97.41 | 79.53 | 81.68 |
| SGD | 0.99 | 0.007 | 0.0005 | 96.44 | 72.35 | 76.47 |
| SGD | 0.9 | 0.007 | 0.005 | 96.25 | 71.01 | 75.56 |
| SGD | 0.9 | 0.007 | 0.001 | 97.03 | 77.88 | 80.33 |
| SGD | 0.9 | 0.007 | 0.0005 | 97.49 | 80.89 | 82.63 |
| SGD | 0.9 | 0.007 | 0.0001 | 97.48 | 80.51 | 82.36 |
| SGD | 0.9 | 0.001 | 0.0005 | 96.77 | 77.62 | 80.01 |
| SGD | 0.9 | 0.003 | 0.0005 | 97.43 | 80.08 | 82.05 |
| SGD | 0.9 | 0.005 | 0.0005 | 97.45 | 80.11 | 82.25 |
| SGD | 0.9 | 0.007 | 0.0005 | 97.49 | 80.89 | 82.63 |
| SGD | 0.9 | 0.01 | 0.0005 | 96.12 | 68.95 | 74.28 |
| Adam | 0.9 | 0.007 | 0.0005 | 96.52 | 73.20 | 76.76 |
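For reproducibility, a minimal sketch of the best-performing setting above (SGD, momentum 0.9, learning rate 0.007, weight decay 5e-4), assuming a standard PyTorch setup; `model` is a placeholder, not the actual Trans-DCN network:

```python
# Best hyperparameter combination from the table above; `model` is a stand-in
# module used only to illustrate the optimizer configuration.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, kernel_size=3, padding=1)  # placeholder for Trans-DCN
optimizer = torch.optim.SGD(model.parameters(), lr=0.007,
                            momentum=0.9, weight_decay=5e-4)
```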
Appendix B
Model Architecture | Params. (M) | Input Size | Optimizer | Momentum | Learning Rate |
---|---|---|---|---|---|
U-Net [53] | 13.39 | 512 | SGD | 0.9 | 0.005 |
FCN [33] | 35.31 | 512 | SGD | 0.9 | 0.001 |
SegNet [54] | 29.44 | 512 | SGD | 0.9 | 0.007 |
DeepLabv3 [49] | 41.99 | 512 | SGD | 0.9 | 0.001 |
DeepLabv3+ [56] | 74.87 | 512 | SGD | 0.9 | 0.007 |
DenseASPP [50] | 9.20 | 512 | SGD | 0.9 | 0.001 |
DANet [57] | 66.56 | 512 | SGD | 0.9 | 0.001 |
LR-ASPP [58] | 30.22 | 512 | SGD | 0.9 | 0.001 |
SegNeXt [59] | 29.91 | 512 | SGD | 0.9 | 0.007 |
SegFormer [60] | 47.22 | 512 | SGD | 0.9 | 0.007 |
Swin-Unet [61] | 27.17 | 512 | SGD | 0.9 | 0.007 |
Bi-Former [62] | 56.81 | 512 | SGD | 0.9 | 0.007 |
ViT-CoMer [63] | 60.50 | 512 | SGD | 0.9 | 0.007 |
References
1. Li, X.; Guo, Y.; Li, Y. Particle swarm optimization-based SVM for classification of cable surface defects of the cable-stayed bridges. IEEE Access 2019, 8, 44485–44492.
2. Wickramasinghe, W.R.; Thambiratnam, D.P.; Chan, T.H.; Nguyen, T. Vibration characteristics and damage detection in a suspension bridge. J. Sound Vib. 2016, 375, 254–274.
3. Rizzo, P.; di Scalea, F.L. Feature extraction for defect detection in strands by guided ultrasonic waves. Struct. Health Monit. 2006, 5, 297–308.
4. Li, H.; Ou, J.; Zhou, Z. Applications of optical fibre Bragg gratings sensing technology-based smart stay cables. Opt. Lasers Eng. 2009, 47, 1077–1084.
5. Cho, K.H.; Jin, Y.H.; Kim, H.M.; Moon, H.; Koo, J.C.; Choi, H.R. Caterpillar-based cable climbing robot for inspection of suspension bridge hanger rope. In Proceedings of the 2013 IEEE International Conference on Automation Science and Engineering (CASE), Madison, WI, USA, 17–20 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1059–1062.
6. Xu, F.Y.; Wang, X.S.; Wang, L. Climbing model and obstacle-climbing performance of a cable inspection robot for a cable-stayed bridge. Trans. Can. Soc. Mech. Eng. 2011, 35, 269–289.
7. Nguyen, S.T.; La, H.M. A climbing robot for steel bridge inspection. J. Intell. Robot. Syst. 2021, 102, 75.
8. Sun, L.; Zhang, Y.; Wang, W.; Zhao, J. Lightweight Semantic Segmentation Network for RGB-D Image Based on Attention Mechanism. Packag. Eng. 2022, 43, 10.
9. Gong, R.; Ding, S.; Zhang, C.; Su, H. Lightweight and multi-pose face recognition method based on deep learning. J. Comput. Appl. 2020, 40, 6.
10. Shang, H.; Sun, C.; Liu, J.; Chen, X.; Yan, R. Defect-aware transformer network for intelligent visual surface defect detection. Adv. Eng. Inform. 2023, 55, 101882.
11. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
13. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; Computational and Biological Learning Society: San Diego, CA, USA, 2015.
14. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
15. Li, X.; Gao, C.; Guo, Y.; He, F.; Shao, Y. Cable surface damage detection in cable-stayed bridges using optical techniques and image mosaicking. Opt. Laser Technol. 2019, 110, 36–43.
16. Salehi, H.; Burgueño, R. Emerging artificial intelligence methods in structural engineering. Eng. Struct. 2018, 171, 170–189.
17. Luo, K.; Kong, X.; Zhang, J.; Hu, J.; Li, J.; Tang, H. Computer vision-based bridge inspection and monitoring: A review. Sensors 2023, 23, 7863.
18. Yeum, C.M.; Dyke, S.J.; Ramirez, J. Visual data classification in post-event building reconnaissance. Eng. Struct. 2018, 155, 16–24.
19. Xu, Z.; Wang, Y.; Hao, X.; Fan, J. Crack Detection of Bridge Concrete Components Based on Large-Scene Images Using an Unmanned Aerial Vehicle. Sensors 2023, 23, 6271.
20. Chen, J.; Wang, H.; Tu, C.L.; Wang, X.S.; Li, X.D. Surface Defect Detection of Cable Based on Threshold Image Difference. In Proceedings of the 2021 IEEE Far East NDT New Technology & Application Forum (FENDT), Kunming, China, 14–17 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 185–190.
21. Hu, J.; He, H.; Liao, G.; Hu, G. Study on Image Processing of Bridge Cable Surface Defect Detection System. In Advances in Precision Instruments and Optical Engineering, Proceedings of the International Conference on Precision Instruments and Optical Engineering, Chengdu, China, 25–27 August 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 447–456.
22. Qu, Z.; Feng, H.; Zeng, Z.; Zhuge, J.; Jin, S. A SVM-based pipeline leakage detection and pre-warning system. Measurement 2010, 43, 513–519.
23. Hsieh, Y.A.; Tsai, Y.J. Machine learning for crack detection: Review and model performance comparison. J. Comput. Civ. Eng. 2020, 34, 04020038.
24. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378.
25. Dung, C.V. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
26. Dong, S.; Tan, H.; Liu, C.; Hu, X. Apparent disease detection of bridges based on improved YOLOv5s. J. Chongqing Univ. 2024, 1–12. Available online: https://link.cnki.net/urlid/50.1044.N.20230331.1847.002 (accessed on 21 July 2024).
27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
28. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
29. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
30. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Computer Vision—ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833.
31. Geng, Q.; Zhou, Z.; Cao, X. Survey of recent progress in semantic image segmentation with CNNs. Sci. China Inf. Sci. 2018, 61, 051101.
32. Zhang, Y.; Yuen, K.V. Review of artificial intelligence-based bridge damage detection. Adv. Mech. Eng. 2022, 14, 16878132221122770.
33. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
34. Hao, S.; Zhou, Y.; Guo, Y. A brief survey on semantic segmentation with deep learning. Neurocomputing 2020, 406, 302–321.
35. Shi, J.; Dang, J.; Cui, M.; Zuo, R.; Shimizu, K.; Tsunoda, A.; Suzuki, Y. Improvement of damage segmentation based on pixel-level data balance using VGG-Unet. Appl. Sci. 2021, 11, 518.
36. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain, 20 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–11.
37. Deng, W.; Mou, Y.; Kashiwa, T.; Escalera, S.; Nagai, K.; Nakayama, K.; Matsuo, Y.; Prendinger, H. Vision based pixel-level bridge structural damage detection using a link ASPP network. Autom. Constr. 2020, 110, 102973.
38. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 6877–6886.
39. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
40. Thisanke, H.; Deshan, C.; Chamith, K.; Seneviratne, S.; Vidanaarachchi, R.; Herath, D. Semantic segmentation using Vision Transformers: A survey. Eng. Appl. Artif. Intell. 2023, 126, 106669.
41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
42. He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408715.
43. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 568–578.
44. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
45. Chollet, F. Xception: Deep Learning With Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
46. Fang, M.; Liang, X.; Fu, F.; Song, Y.; Shao, Z. Attention mechanism based semi-supervised multi-gain image fusion. Symmetry 2020, 12, 451.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
48. Islam, M.A.; Jia, S.; Bruce, N.D. How much Position Information Do Convolutional Neural Networks Encode? In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
49. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
50. Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. DenseASPP for Semantic Segmentation in Street Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
51. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419.
52. Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 565–571.
53. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer: Cham, Switzerland, 2015; pp. 234–241.
54. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
55. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
56. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
57. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
58. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
59. Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. arXiv 2022, arXiv:2209.08575.
60. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
61. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 205–218.
62. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333.
63. Xia, C.; Wang, X.; Lv, F.; Hao, X.; Shi, Y. ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions. arXiv 2024, arXiv:2403.07392.
64. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
65. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
66. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142.
67. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin Transformer V2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12009–12019.
Encoder configuration per stage (Overlap Patch Merging and Efficient Multi-Head Attention):

| Stage i | Overlap Convolution (k, s, p) | Embedding Dimension (ED) | Output Stride (OS) | Multi-Head Number | Depth | Reduction Ratio |
|---|---|---|---|---|---|---|
| Stage 1 | (7, 4, 3) | 64 | 4 | 1 | 3 | 8 |
| Stage 2 | (3, 2, 1) | 128 | 8 | 2 | 4 | 4 |
| Stage 3 | (3, 2, 1) | 320 | 16 | 5 | 6 | 2 |
| Stage 4 | (3, 2, 1) | 512 | 32 | 8 | 3 | 1 |
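A minimal sketch of Stage 1 overlap patch merging, consistent with the (k, s, p) values above and in the SegFormer style [60]; this is an illustration, not the authors' released code:

```python
# Overlap patch merging for Stage 1: kernel 7, stride 4, padding 3,
# embedding dimension 64, giving output stride 4 as in the table above.
import torch
import torch.nn as nn

class OverlapPatchMerging(nn.Module):
    def __init__(self, in_ch=3, embed_dim=64, k=7, s=4, p=3):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=k, stride=s, padding=p)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                   # x: (B, C, H, W)
        x = self.proj(x)                    # (B, ED, H/4, W/4)
        b, c, h, w = x.shape
        x = x.flatten(2).transpose(1, 2)    # (B, H*W, ED) token sequence
        return self.norm(x), h, w

tokens, h, w = OverlapPatchMerging()(torch.randn(1, 3, 512, 512))
print(tokens.shape)  # torch.Size([1, 16384, 64]) -> 128 x 128 tokens
```

In this style of design, the Reduction Ratio column corresponds to the spatial downsampling applied to keys and values inside the efficient multi-head attention, which keeps the attention cost affordable at the high-resolution early stages.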
| Method | Model Architecture | Params. (M) ↓ | GFLOPs ↓ | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|---|
| Convolution-based | U-Net [53] | 13.39 | 124.49 | 96.33 | 78.21 | 74.25 |
| | FCN (VGG-16) [33] | 35.31 | 148.53 | 96.90 | 79.12 | 76.31 |
| | SegNet (VGG-16) [54] | 29.44 | 160.68 | 97.34 | 81.41 | 79.25 |
| | PSPNet (ResNet50) [55] | 46.71 | 59.21 | 97.03 | 78.26 | 74.45 |
| | DeepLabv3 (ResNet50) [49] | 41.99 | 173.79 | 97.12 | 78.69 | 74.98 |
| | DeepLabv3+ (MobileNet-V2) [56] | 41.83 | 33.75 | 97.02 | 78.90 | 76.17 |
| | DeepLabv3+ (ResNet101) [56] | 74.87 | 82.88 | 97.09 | 80.72 | 78.32 |
| | DenseASPP (DenseNet121) [50] | 9.20 | 43.19 | 96.79 | 75.67 | 69.33 |
| | DANet (ResNet101) [57] | 66.56 | 283.44 | 96.86 | 75.61 | 69.06 |
| | LR-ASPP (MobileNet-V3) [58] | 30.22 | 20.07 | 96.57 | 76.93 | 72.99 |
| | SegNeXt (Base) [59] | 29.91 | 41.55 | 94.66 | 65.34 | 56.13 |
| Transformer-based | SegFormer (B2) [60] | 47.22 | 71.36 | 97.31 | 81.47 | 79.36 |
| | Swin-Unet (Tiny) [61] | 27.17 | 5.92 | 96.71 | 78.02 | 74.61 |
| | Bi-Former (Base) [62] | 56.81 | 91.10 | 95.12 | 68.02 | 58.25 |
| | ViT-CoMer (Small) [63] | 60.50 | 1194.16 | 96.70 | 77.32 | 73.49 |
| | Trans-DCN (ours) | 44.96 | 52.03 | 97.49 | 82.63 | 80.89 |
| Model | MFA | SPP-DCN | ASFF | Aux-Loss | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | 95.55 | 77.12 | 70.23 |
| 2 | ✓ | - | - | - | 97.13 | 80.29 | 77.89 |
| 3 | ✓ | - | - | ✓ | 97.14 | 80.48 | 78.03 |
| 4 | ✓ | ✓ | - | ✓ | 97.35 | 81.42 | 79.24 |
| 5 | ✓ | - | ✓ | ✓ | 97.30 | 81.14 | 78.87 |
| 6 | ✓ | ✓ | ✓ | ✓ | 97.49 | 82.63 | 80.89 |
| Method | Backbone | Params. (M) ↓ | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|
| Convolution-based | ResNet-101 [47] | 42.5 | 96.37 | 78.70 | 73.69 |
| | ResNeXt-101 (32 × 8d) [64] | 86.74 | 96.60 | 79.93 | 76.72 |
| | Xception [56] | 37.87 | 89.32 | 57.16 | 48.66 |
| | MobileNet-V2 [65] | 15.4 | 96.96 | 75.14 | 66.91 |
| | ConvNeXtV2-B [66] | 88.72 | 93.37 | 65.35 | 55.27 |
| Transformer-based | Swin Transformer-T [41] | 96.52 | 93.62 | 76.76 | 72.81 |
| | Swin TransformerV2-T [67] | 27.58 | 95.94 | 69.00 | 59.86 |
| | ViT-CoMer-S [63] | 37.34 | 96.90 | 78.98 | 75.94 |
| Proposed method | | 24.2 | 97.49 | 82.63 | 80.89 |
| Configuration | Embedding Dimensions (Stages 1–4) | Multi-Head Numbers (Stages 1–4) | Params. (M) ↓ | GFLOPs ↓ | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|---|---|
| 1 | (64, 256, 512, 1024) | (1, 4, 8, 16) | 92.06 | 69.72 | 97.41 | 81.89 | 79.82 |
| 2 | (64, 128, 320, 512) | (1, 2, 5, 8) | 24.2 | 39.44 | 97.49 | 82.63 | 80.89 |
| Model | Shortcut | MLP | SE | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|---|
| 1 | - | - | - | 97.29 | 81.03 | 80.25 |
| 2 | - | ✓ | ✓ | 97.50 | 82.47 | 80.65 |
| 3 | ✓ | - | - | 97.38 | 80.10 | 80.22 |
| 4 | ✓ | - | ✓ | 97.46 | 81.14 | 80.21 |
| 5 | ✓ | ✓ | - | 97.37 | 82.16 | 80.31 |
| 6 | ✓ | ✓ | ✓ | 97.49 | 82.63 | 80.89 |
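The SE column above refers to squeeze-and-excitation-style channel reweighting; below is a generic sketch of that component, assuming the standard SE design rather than the authors' exact variant:

```python
# Generic squeeze-and-excitation (SE) channel reweighting, as ablated in
# the "SE" column above; a standard sketch, not the authors' exact module.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                         # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)  # excite: per-channel gate
```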
| Branches (Dilation Rates) | Max. Receptive Field | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|
| (6, 12, 18) | 73 | 96.29 | 75.95 | 71.62 |
| (3, 6, 12, 18) | 79 | 96.44 | 78.21 | 75.01 |
| (6, 12, 18, 24) | 122 | 96.76 | 79.94 | 77.25 |
| (3, 6, 12, 18, 24) | 128 | 97.49 | 82.63 | 80.89 |
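Assuming the serial path stacks 3 × 3 atrous (deformable) convolutions, the maximum receptive field grows additively with the dilation rates; this reconstruction reproduces the first two rows of the table exactly:

```latex
% Receptive field of n serially stacked 3x3 atrous convolutions with
% dilation rates d_1, ..., d_n (an assumption about the serial path):
\mathrm{RF} = 1 + \sum_{i=1}^{n} (k - 1)\, d_i, \qquad k = 3
% e.g., (6, 12, 18):    RF = 1 + 2(6 + 12 + 18)    = 73
%       (3, 6, 12, 18): RF = 1 + 2(3 + 6 + 12 + 18) = 79
```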
| Model | Channels | Params. (M) ↓ | GFLOPs ↓ | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|---|
| 1 | 64 | 0.06 | 0.7 | 97.47 | 82.12 | 80.16 |
| 2 | 128 | 0.2 | 2.64 | 97.49 | 82.63 | 80.89 |
| 3 | 320 | 0.99 | 15.43 | 97.33 | 81.80 | 79.81 |
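A generic sketch of adaptive spatial feature fusion over channel-aligned, resized scale branches: per-pixel softmax weights, learned by a 1 × 1 convolution, gate each branch. The channel widths correspond to the Channels column above; the structure is illustrative, not the authors' code:

```python
# ASFF-style fusion: learn per-pixel weights over the scale branches and
# blend them; inputs are assumed already resized and channel-aligned.
import torch
import torch.nn as nn

class ASFF(nn.Module):
    def __init__(self, channels: int, num_branches: int = 3):
        super().__init__()
        self.weight = nn.Conv2d(channels * num_branches, num_branches, kernel_size=1)

    def forward(self, feats):   # feats: list of (B, C, H, W), identical shapes
        w = torch.softmax(self.weight(torch.cat(feats, dim=1)), dim=1)
        return sum(f * w[:, i:i + 1] for i, f in enumerate(feats))
```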
| Ratio w1 | Ratio w2 | Ratio w3 | Acc (%) ↑ | mIoU (%) ↑ | F1-Score (%) ↑ |
|---|---|---|---|---|---|
| 1 | 0 | 0.5 | 97.28 | 81.40 | 79.27 |
| 1 | 0.5 | 0.5 | 97.37 | 81.71 | 79.65 |
| 1.5 | 0.5 | 0.5 | 97.39 | 82.09 | 80.19 |
| 2 | 0.5 | 0.5 | 97.49 | 82.63 | 80.89 |
| 2.5 | 0.5 | 0.5 | 97.42 | 82.41 | 80.36 |
| 3 | 0.5 | 0.5 | 97.37 | 81.95 | 80.01 |
| 2 | 0.25 | 0.5 | 97.36 | 81.94 | 80.00 |
| 2 | 0.75 | 0.5 | 97.44 | 82.29 | 80.44 |
| 2 | 0.5 | 0 | 97.44 | 82.06 | 80.10 |
| 2 | 0.5 | 0.25 | 97.44 | 82.17 | 80.27 |
| 2 | 0.5 | 0.75 | 97.55 | 82.46 | 80.60 |
| 2 | 0.5 | 1 | 97.46 | 82.20 | 80.30 |
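The three weight columns are unlabeled here; one plausible reading, assuming the composite loss combines a main cross-entropy term, a Dice term in the spirit of V-Net [52], and the auxiliary loss ablated in Section 5.1, is:

```latex
% A hedged reconstruction of the composite loss; the pairing of the weight
% columns (w_1, w_2, w_3) with specific terms is an assumption:
\mathcal{L}_{\mathrm{total}} = w_1\,\mathcal{L}_{\mathrm{CE}}
  + w_2\,\mathcal{L}_{\mathrm{Dice}}
  + w_3\,\mathcal{L}_{\mathrm{aux}}
% Best row of the table: (w_1, w_2, w_3) = (2, 0.5, 0.5).
```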