Distilling Knowledge from a Transformer-Based Crack Segmentation Model to a Light-Weighted Symmetry Model with Mixed Loss Function for Portable Crack Detection Equipment
Abstract
1. Introduction
- (1) Traditional loss functions [14] used for knowledge distillation do not ensure that the detail and global features extracted by the student model stay consistent with those of the teacher model, so the features learned by the student model may deviate. Because crack features are very subtle and carry strong global contextual information, such deviations can significantly change the final prediction generated by the student model. To solve this problem, we replace the traditional loss function with our proposed mixed loss function during knowledge distillation.
- (2) Traditional UNet models do not consider the dependency relationships between different crack positions in feature maps, which causes discontinuities in crack feature extraction. Our TBUNet therefore uses stacked transformer modules to capture these dependency relationships and achieve contextual awareness.
- (3) ULNet uses only a tiny UNet with few parameters to keep computational complexity very low. In addition, unlike the traditional UNet, it replaces regular convolutions with depth-wise separable convolutions to reduce computational complexity further.
- (4) Current public datasets for crack segmentation are relatively simple. To address this, this paper presents a dedicated crack dataset named MICrack, which covers cracks captured from multiple angles, under occlusion, and in varied environments, meeting the needs of portable crack detection devices.
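
To make the saving from depth-wise separable convolutions concrete, the following minimal sketch counts the weights of a regular convolution and of its depth-wise separable replacement. The channel widths (64 and 128) and the 3 × 3 kernel size are hypothetical values chosen for illustration; they are not taken from ULNet, and bias terms are ignored.

```python
def regular_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # c_out filters, each of size 1 x 1 x c_in
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out = 64, 128  # hypothetical channel widths, not ULNet's
    regular = regular_conv_params(c_in, c_out)
    separable = depthwise_separable_params(c_in, c_out)
    # The ratio equals 1 / c_out + 1 / k**2 for any layer shape.
    print(regular, separable, round(separable / regular, 3))  # 73728 8768 0.119
```

For this hypothetical layer the separable version needs roughly 12% of the weights, which is the kind of reduction that makes a very small student model feasible.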
2. Methods
2.1. The Structure of Knowledge Distillation
2.2. The Mixed Loss Function
2.3. The Transformer-Based UNet (TBUNet)
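
As a rough illustration of how a transformer layer relates different positions of a feature map to one another, the sketch below computes single-head scaled dot-product self-attention in plain Python. It is not the TBUNet implementation: the function names, the single-head form, and the omission of positional encoding, multi-head splitting, and feed-forward layers are all simplifying assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    tokens holds one row per feature-map position (hypothetical flattening).
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)  # every output mixes values from all positions


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three positions, two channels
    for row in self_attention(tokens, identity, identity, identity):
        print([round(x, 3) for x in row])
```

Because every output row is a weighted mix of all positions, a crack pixel can draw on evidence from distant crack pixels, which is the dependency the stacked transformer modules are meant to capture.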
2.4. The Ultra-Light-Weighted Model (ULNet)
2.5. Dataset Collection
3. Datasets and Experimental Setup
3.1. Datasets
3.2. Experimental Setup in Our Methods
3.3. Evaluation Criteria
- (1) Accuracy
- (2) Recall
- (3) F1-measure
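
The three criteria can be sketched from pixel-wise counts of true positives, false positives, and false negatives. This is a minimal sketch under stated assumptions: it treats the "Accuracy" criterion as pixel-wise precision, TP / (TP + FP), and uses the standard definitions of recall and F1; if the paper defines accuracy over all pixels instead, only the first value changes. The function names and the flattened 0/1 mask format are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def precision_recall_f1(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical flattened masks: 1 marks a crack pixel, 0 marks background.
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 1, 1, 0]
    print(precision_recall_f1(pred, truth))
```

In the toy masks there are two true positives, one false positive, and one false negative, so all three values come out at about two thirds.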
4. Results
4.1. Comparison with the Mainstream Crack Segmentation Models
- Knowledge distillation transfers the crack knowledge learned by the teacher model to the student model. Here, the teacher model is a complex and accurate model, while the student model is a simplified light-weighted model. Knowledge distillation effectively supplements the student model's ability to extract high-level features, reducing model complexity while maintaining good performance.
- In addition, the knowledge distillation loss function is improved by using multiple loss terms that supervise four transfers: the high-level detail features of cracks, the global features of cracks, the prediction results, and the Ground Truth. These terms are added together for overall training, which keeps the detail features and global features of cracks consistent between the teacher model and the student model during knowledge distillation.
- In the teacher model, stacked transformer layers learn the dependency relationships between different crack positions in feature maps. This design preserves the continuity of crack feature extraction, which improves final accuracy.
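
The idea of adding several supervision terms together can be sketched as follows. This is only an illustration under assumptions: mean squared error is used for the two feature-matching terms and binary cross-entropy for the prediction and Ground Truth terms, the features are flattened into plain lists, and the default weights of 1 mirror "added together". The paper's actual term definitions are not reproduced here, and all function and argument names are hypothetical.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(prob: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; target may be hard (0/1) or soft labels."""
    total = 0.0
    for p, t in zip(prob, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(prob)


def mixed_loss(student_detail, teacher_detail, student_global, teacher_global,
               student_pred, teacher_pred, ground_truth,
               weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of four supervision terms (assumed forms, see lead-in)."""
    terms = (
        mse(student_detail, teacher_detail),  # detail-feature transfer
        mse(student_global, teacher_global),  # global-feature transfer
        bce(student_pred, teacher_pred),      # teacher predictions as soft labels
        bce(student_pred, ground_truth),      # Ground Truth supervision
    )
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    loss = mixed_loss(
        student_detail=[0.2, 0.7], teacher_detail=[0.3, 0.6],
        student_global=[0.5], teacher_global=[0.4],
        student_pred=[0.8, 0.1], teacher_pred=[0.9, 0.2],
        ground_truth=[1.0, 0.0],
    )
    print(round(loss, 4))
```

Setting a weight to zero drops the corresponding term, which is a convenient way to run the kind of ablation over loss components described in Section 4.4.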
4.2. Effects of Using Different Numbers of 1 × 1 Filters in DWConv in ULNet
4.3. Comparison of the Mainstream Light-Weighted Models
4.4. Effects of Using Different Parts in Our Designed Mixed Loss Function
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wu, C.; Sun, K.; Xu, Y.; Zhang, S.; Huang, X.; Zeng, S. Concrete crack detection method based on optical fiber sensing network and microbending principle. Saf. Sci. 2019, 117, 299–304.
- Bradski, G.; Daebler, A. Learning OpenCV: Computer Vision with OpenCV Library; University of Arizona: Tucson, AZ, USA, 2008.
- Meghana, R.K.; Apoorva, S.; Mohana; Chitkara, Y. Inspection, Identification and Repair Monitoring of Cracked Concrete Structure—An Application of Image Processing. In Proceedings of the 2018 3rd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 15–16 October 2018; pp. 1151–1154.
- Dorafshan, S.; Maguire, M.; Thomas, R.J. SDNET2018: A Concrete Crack Image Dataset for Machine Learning Applications; Utah State University: Logan, UT, USA, 2018.
- Liu, J. Road Crack Detection Using HDD LOSS and Dual Attention Module with DeepLabv3+. In Proceedings of the 2023 3rd International Conference on Digital Society and Intelligent Systems (DSInS), Chengdu, China, 10–12 November 2023; pp. 148–152.
- Zhou, S.; Wang, Q.; Wu, H.; Wang, Q.; Meng, Y.; Shen, T. ASSA-UNet: An Efficient UNet-Based Network for Chip Internal Defect Detection. In Proceedings of the 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), Qingdao, China, 30 July–1 August 2023.
- Fan, L.; Zhao, H.; Li, Y.; Li, S.; Zhou, R.; Chu, W. RAO-UNet: A residual attention and octave UNet for road crack detection via balance loss. IET Intell. Transp. Syst. 2022, 16, 332–343.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6877–6886.
- Yang, Y.; Niu, Z.; Su, L.; Xu, W.; Wang, Y. Multi-scale feature fusion for pavement crack detection based on Transformer. Math. Biosci. Eng. 2023, 20, 14920–14937.
- Aso, R.; Shiota, S.; Kiya, H. Enhanced Security with Encrypted Vision Transformer in Federated Learning. In Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 10–13 October 2023.
- Cao, T.; Hu, J.; Liu, S. Enhanced Edge Detection for 3D Crack Segmentation and Depth Measurement with Laser Data. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2255006.
- Zhang, E.; Shao, L.; Wang, Y. Unifying transformer and convolution for dam crack detection. Autom. Constr. 2023, 147, 104712.
- Chen, Z.; Cai, C.; Zheng, T.; Luo, J.; Xiong, J.; Wang, X. RF-Based Human Activity Recognition Using Signal Adapted Convolutional Neural Network. IEEE Trans. Mob. Comput. 2021, 22, 487–499.
- Kang, J.; Wang, Z.; Zhu, R.; Xia, J.; Sun, X.; Fernandez-Beltran, R.; Plaza, A. DisOptNet: Distilling Semantic Knowledge From Optical Images for Weather-Independent Building Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4706315.
- Qu, Z.; Mei, J.; Liu, L.; Zhou, D.-Y. Crack Detection of Concrete Pavement With Cross-Entropy Loss Function and Improved VGG16 Network Model. IEEE Access 2020, 8, 54564–54573.
- Maurya, A.; Chand, S. A global context and pyramidal scale guided convolutional neural network for pavement crack detection. Int. J. Pavement Eng. 2023, 24, 2180638.
- Mercioni, M.A.; Holban, S. P-Swish: Activation Function with Learnable Parameters Based on Swish Activation Function in Deep Learning. In Proceedings of the 2020 International Symposium on Electronics and Telecommunications (ISETC), Timișoara, Romania, 5–6 November 2020.
- Qin, C.; Li, B.; Han, B. Fast brain tumor detection using adaptive stochastic gradient descent on shared-memory parallel environment. Eng. Appl. Artif. Intell. 2023, 120, 105816.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 25, 15263.
- Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A deep convolutional neural network for semantic pixel-wise segmentation of road and pavement surface cracks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018.
- Nguyen, N.T.H.; Le, T.H.; Perry, S.; Nguyen, T.T. Pavement crack detection using convolutional neural network. In Proceedings of the International Symposium on Information and Communication Technology, Da Nang, Vietnam, 6–7 December 2018.
- Di Benedetto, A.; Fiani, M.; Gujski, L.M. U-Net-Based CNN Architecture for Road Crack Segmentation. Infrastructures 2023, 8, 90.
- Yang, G.; Geng, P.; Ma, H.; Liu, J.; Luo, J. Dwta-unet: Concrete crack segmentation based on discrete wavelet transform and unet. In Proceedings of 2021 Chinese Intelligent Automation Conference; Deng, Z., Ed.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2022; Volume 801.
- Han, C.; Ma, T.; Huyan, J.; Huang, X.; Zhang, Y. Crackw-net: A novel pavement crack image segmentation convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22135–22144.
- Zhang, C.; Jiang, W.; Zhao, Q. Semantic segmentation of aerial imagery via split-attention networks with disentangled nonlocal and edge supervision. Remote Sens. 2021, 13, 1176.
- Sun, X.; Xie, Y.; Jiang, L.; Cao, Y.; Liu, B. Dma-net: Deeplab with multi-scale attention for pavement crack segmentation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18392–18403.
- Jun, F.; Li, J.; Shi, Y.; Zhao, Y.; Zhang, C. Acau-net: Atrous convolution and attention u-net model for pavement crack segmentation. In Proceedings of the 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shijiazhuang, China, 22–24 July 2022; pp. 561–565.
- Li, J.; Liu, Y.; Zhang, Y.; Zhang, Y. Cascaded attention denseunet (cadunet) for road extraction from very-high-resolution images. Int. J. Geo-Inf. 2021, 10, 329.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Hu, Q. Eca-net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Gao, Z.; Peng, B.; Li, T.; Gou, C. Generative adversarial networks for road crack image segmentation. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
- Nhung Hong Thi Nguyen, A.; Stuart Perry, A.; Don Bone, A.; Ha Thanh Le, B.; Thuy Thi Nguyen, C. Two-stage convolutional neural network for road crack detection and segmentation. Expert Syst. Appl. 2021, 186, 115718.
- Zhang, X.; Huang, H. PSNet: Parallel-Convolution-Based U-Net for Crack Detection with Self-Gated Attention Block. Appl. Sci. 2023, 13, 9875.
- Zhang, X.; Huang, H. PHCNet: Pyramid Hierarchical-Convolution-Based U-Net for Crack Detection with Mixed Global Attention Module and Edge Feature Extractor. Appl. Sci. 2023, 13, 10263.
- Emara, T.; Munim HE, A.E.; Abbas, H.M. LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA), Perth, Australia, 2–4 December 2019; IEEE: Piscataway, NJ, USA, 2020.
- Wang, B.; Li, H.S. Lane detection algorithm based on MoblieNet + UNet lightweight network. In Proceedings of the 2021 3rd International Symposium on Robotics & Intelligent Manufacturing Technology (ISRIMT), Changzhou, China, 24–26 September 2021; pp. 352–356.
- Tsai, T.H.; Tseng, Y.W. BiSeNet V3: Bilateral segmentation network with coordinate attention for real-time semantic segmentation. Neurocomputing 2023, 532, 33–42.
- Ruan, J.; Xie, M.; Gao, J.; Liu, T.; Fu, Y. Ege-unet: An efficient group enhanced unet for skin lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2023; pp. 481–490.
- Jiang, W.; Xie, Z.; Li, Y.; Liu, C.; Lu, H. Lrnnet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
Crack Dataset | Image Resolution (width × height) | Train | Test | Train:Test |
---|---|---|---|---|
Cracktree200 | 800 × 600 | 165 | 41 | 8:2 |
Crack500 | 2560 × 1440 / 1440 × 2560 | 1896 | 1124 | 6:4 |
MICrack | 1920 × 1080 | 4000 | 1000 | 4:1 |

(a)

Methods | Cracktree200 | Crack500 | MICrack | Model Size |
---|---|---|---|---|
ConvNet [19] | 0.471 | 0.591 | 0.392 | - |
U-Net by Jenkins [20] | 0.75 | 0.681 | 0.519 | - |
U-Net by Nguyen [21] | 0.763 | 0.695 | 0.531 | - |
U-Net proposed by Di [22] | 0.791 | 0.732 | 0.546 | - |
DWTA-U-Net [23] | 0.90 | 0.77 | 0.671 | - |
CrackW-Net [24] | 0.855 | 0.789 | 0.632 | - |
Split-Attention Network [25] | 0.851 | 0.73 | 0.563 | - |
DMA-Net [26] | 0.793 | 0.746 | 0.58 | - |
ACAU-Net [27] | 0.861 | 0.792 | 0.62 | - |
Cascaded Attention DenseU-Net [28] | 0.863 | 0.74 | 0.598 | 137 M |
ECA-Net [29] | 0.885 | 0.753 | 0.617 | 87 M |
FU-Net [30] | 0.89 | 0.795 | 0.661 | 90 M |
Two-stage-CNN [31] | 0.892 | 0.79 | 0.664 | 230 M |
PSNet [32] | 0.926 | 0.812 | 0.681 | 185 M |
PHCNet [33] | 0.929 | 0.823 | 0.702 | 167 M |
ULNet | 0.962 | 0.876 | 0.753 | 1 M |

(b)

Methods | Cracktree200 | Crack500 | MICrack |
---|---|---|---|
Split-Attention Network | 0.857 | 0.725 | 0.557 |
DMA-Net | 0.823 | 0.775 | 0.614 |
ACAU-Net | 0.854 | 0.776 | 0.61 |
Cascaded Attention DenseU-Net | 0.853 | 0.732 | 0.58 |
ECA-Net | 0.891 | 0.767 | 0.628 |
FU-Net | 0.864 | 0.761 | 0.643 |
Two-stage-CNN | 0.851 | 0.773 | 0.652 |
PSNet | 0.932 | 0.829 | 0.693 |
PHCNet | 0.914 | 0.817 | 0.698 |
ULNet | 0.971 | 0.885 | 0.762 |

(c)

Methods | Cracktree200 | Crack500 | MICrack |
---|---|---|---|
Split-Attention Network | 0.85 | 0.73 | 0.56 |
DMA-Net | 0.81 | 0.76 | 0.60 |
ACAU-Net | 0.86 | 0.78 | 0.61 |
Cascaded Attention DenseU-Net | 0.86 | 0.74 | 0.59 |
ECA-Net | 0.89 | 0.76 | 0.62 |
FU-Net | 0.88 | 0.78 | 0.65 |
Two-stage-CNN | 0.87 | 0.78 | 0.66 |
PSNet | 0.93 | 0.82 | 0.69 |
PHCNet | 0.92 | 0.82 | 0.70 |
ULNet | 0.97 | 0.88 | 0.76 |

Methods | Cracktree200 | Crack500 | MICrack |
---|---|---|---|
LiteSeg [34] | 0.925 | 0.814 | 0.703 |
MobileNet+UNet [35] | 0.892 | 0.786 | 0.665 |
BiSeNet v3 [36] | 0.919 | 0.792 | 0.672 |
EGE-UNet [37] | 0.928 | 0.803 | 0.687 |
LRNNet [38] | 0.937 | 0.812 | 0.715 |
ULNet | 0.962 | 0.876 | 0.753 |
Loss Function | Cracktree200 | Crack500 | MICrack |
---|---|---|---|
L_all = L_ps + L_gs | 0.935 | 0.839 | 0.723 |
L_all = L_es + L_ps + L_gs | 0.947 | 0.858 | 0.736 |
L_all = L_as + L_ps + L_gs | 0.951 | 0.863 | 0.742 |
L_all = L_es + L_as + L_ps + L_gs | 0.962 | 0.876 | 0.753 |
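
The contribution of each added loss term can be read off the table above by subtracting the two-term baseline row from the other rows. The short sketch below does that arithmetic; the scores are copied from the table, while the abbreviated row keys (the loss subscripts joined by "+") and the function name are hypothetical labels introduced only for this example.

```python
# Scores copied from the loss-function table; keys abbreviate the loss subscripts.
ablation = {
    "ps+gs": (0.935, 0.839, 0.723),
    "es+ps+gs": (0.947, 0.858, 0.736),
    "as+ps+gs": (0.951, 0.863, 0.742),
    "es+as+ps+gs": (0.962, 0.876, 0.753),
}
datasets = ("Cracktree200", "Crack500", "MICrack")


def gains_over_baseline(table, baseline="ps+gs"):
    """Per-dataset score gain of every row relative to the baseline row."""
    base = table[baseline]
    return {
        name: tuple(round(score - ref, 3) for score, ref in zip(scores, base))
        for name, scores in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, gains in gains_over_baseline(ablation).items():
        print(name, dict(zip(datasets, gains)))
```

On every dataset the gain of the four-term row is larger than the gain from adding either extra term alone, with the largest gain (0.037) on Crack500.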
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, X.; Huang, H. Distilling Knowledge from a Transformer-Based Crack Segmentation Model to a Light-Weighted Symmetry Model with Mixed Loss Function for Portable Crack Detection Equipment. Symmetry 2024, 16, 520. https://doi.org/10.3390/sym16050520