Mask-Space Optimized Transformer for Semantic Segmentation of Lithium Battery Surface Defect Images
Abstract
1. Introduction
- We introduce mask classification into the analysis of lithium battery surface defect images and propose a novel Mask Boundary Loss (MBL) module to aid the mask attention mechanism in learning more precise computational regions, thereby further improving the model’s foreground segmentation accuracy.
- Considering the randomness in the generation of small defect locations in lithium battery surface defect images, the Dynamic Spatial Query (DSQ) module integrates position encoding of image features into the query, aiming to enhance the model’s sensitivity to small foreground objects.
- The Efficient Pixel Decoder (EPD) module achieves deformable receptive fields for irregular foreground objects through the fully convolutional FPN (Feature Pyramid Network) architecture, enhancing both model performance and operational efficiency.
- Experimental results demonstrate that the proposed MSOFormer surpasses existing methods, achieving state-of-the-art performance on the lithium battery surface defect dataset, the MT dataset, and the NEU-Seg dataset.
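To make the mask classification idea in the contributions above concrete, the following is a minimal sketch of how per-query class scores and per-query masks can be merged into a semantic label map, following the general MaskFormer-style formulation. It is not the MSOFormer implementation: the function names, the toy logits, and the pure-Python layout are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_inference(class_logits, mask_logits):
    """Combine N query predictions into an H x W label map.

    class_logits: N rows of K + 1 scores; the last entry is "no object".
    mask_logits:  N masks, each an H x W grid of logits.
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    # Drop the "no object" probability so it never wins a pixel.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    label_map = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class k at a pixel: sum over queries of
            # P(class k | query) * P(pixel belongs to the query's mask).
            scores = [
                sum(class_probs[q][k] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries))
                for k in range(num_classes)
            ]
            row.append(max(range(num_classes), key=lambda k: scores[k]))
        label_map.append(row)
    return label_map


# Toy example (hypothetical values): two queries, two classes, 2 x 2 image.
class_logits = [[5.0, 0.0, 0.0],   # query 0 votes for class 0
                [0.0, 5.0, 0.0]]   # query 1 votes for class 1
mask_logits = [[[4.0, -4.0], [4.0, -4.0]],   # query 0 covers the left column
               [[-4.0, 4.0], [-4.0, 4.0]]]   # query 1 covers the right column
labels = semantic_inference(class_logits, mask_logits)
print(labels)  # [[0, 1], [0, 1]]
```

Because each query predicts one class for a whole region, a small foreground region competes as a unit rather than pixel by pixel, which is the property the contributions above rely on.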
2. Related Work
2.1. General Semantic Segmentation
2.2. Semantic Segmentation of a Surface Defect Image
2.3. Mask Classification
3. Methods
3.1. Architecture of an Efficient Multi-Scale Pixel Decoder
3.2. Architecture of a Dynamic Spatial Query Module
3.3. Architecture of a Mask Boundary Loss Module
4. Experiments
4.1. Dataset Description
4.1.1. The Lithium Battery Surface Defect Dataset
4.1.2. The MT Dataset
4.1.3. The NEU-Seg Dataset
4.2. Experimental Settings
4.2.1. Implementation Details
4.2.2. Evaluation Metrics
4.3. Dataset Preprocessing
4.4. Comparison with Other Methods
4.4.1. Segmentation Results on the Lithium Battery Surface Defect Dataset
- The MSOFormer model demonstrates superior performance compared to other advanced models, regardless of whether the ResNet or Swin backbone network is used.
- Among the three evaluation metrics, mIoU shows the largest gain of MSOFormer over the other methods. For instance, compared to KNet, the best-performing pixel classification model, MSOFormer improves mIoU, mPrecision, and mRecall by 3.70, 2.26, and 2.84 percentage points, respectively, and the same pattern holds against the other methods. IoU reflects the model’s ability to differentiate between foreground and background, which indicates that the mask classification mechanism of MSOFormer effectively addresses the foreground–background imbalance in lithium battery defect images.
- For defect mPrecision and mRecall, MSOFormer shows more substantial improvements over Mask2Former (baseline) and achieves the best Intersection over Union (IoU) performance in the categories of nonmetal impurity, metal scrap, electrode fold, and electrode damage. In categories prone to false defects, such as nonmetal impurity and metal scrap, MSOFormer and its variants also demonstrate significant progress compared to other methods, highlighting the considerable advantage of mask classification in learning imbalanced image information. Additionally, under the same backbone network conditions, the MSOFormer model achieves higher performance with fewer parameters and computational resources, thanks to the design of the EPD network. For the detection of small target categories such as nonmetal impurity and metal scrap, the MSOFormer model, based on a multi-scale learning strategy, incorporates spatial location awareness information according to defect characteristics, enhancing the representation of small-size foreground targets and further alleviating the issue of random generation of small target positions.
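The metrics discussed in the points above can be illustrated with a short sketch that derives per-class IoU, precision, and recall from a pixel-level confusion matrix. It assumes the standard definitions (IoU = TP / (TP + FP + FN), and so on); the confusion matrix values are invented for illustration and are not taken from the paper.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # pixels wrongly labeled c
        metrics.append({
            "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return metrics


def mean_metric(metrics, name):
    """Unweighted mean over classes, e.g. mIoU for name='iou'."""
    return sum(m[name] for m in metrics) / len(metrics)


# Hypothetical imbalanced image: 980 background pixels, 20 defect pixels.
confusion = [[970, 10],   # background: 970 correct, 10 predicted as defect
             [5, 15]]     # defect: 5 missed, 15 correct
metrics = per_class_metrics(confusion)
pixel_accuracy = (970 + 15) / 1000

print(f"pixel accuracy: {pixel_accuracy:.3f}")
print(f"defect IoU:     {metrics[1]['iou']:.3f}")
print(f"mIoU:           {mean_metric(metrics, 'iou'):.3f}")
```

In this toy case the pixel accuracy is 0.985 while the defect IoU is only 0.5, which shows why a class-averaged IoU is more sensitive to foreground–background imbalance than overall accuracy.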
4.4.2. Segmentation Results on the MT Dataset and the NEU-Seg Dataset
- MSOFormer consistently outperforms other methods, showing significant advantages in mIoU, mPrecision, and mRecall, which demonstrates the superiority and generalization capability of the proposed MSOFormer model.
- A further analysis of the IoU scores for each category reveals that MSOFormer achieves the best performance in categories such as Blowhole, Break, Crack, and Uneven in the MT dataset, as well as Inclusion and Patches in the NEU-Seg dataset. In some categories, its performance is only slightly behind other mask classification methods. This indicates that the proposed method is more capable of modeling complex and irregular targets and highlights the importance of mask classification approaches in the semantic segmentation of industrial surface defect images.
- With the same backbone, MSOFormer improves mIoU and mPrecision over the baseline Mask2Former by 0.56 and 1.97 percentage points on the MT dataset, and by 0.59 and 0.78 percentage points on the NEU-Seg dataset. These results preliminarily demonstrate the advantages of the DSQ and MBL modules, providing a more accurate and robust segmentation model and exploring a new paradigm for foreground–background separation in industrial surface defect image semantic segmentation. More details of the ablation experiments are analyzed in Section 5.3.
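The gains over Mask2Former quoted above can be re-derived from the reported scores. The sketch below copies the mIoU and m-Pre values of the Swin backbone rows from the MT and NEU-Seg comparison tables; the dictionary layout and the helper name `gains` are assumptions for illustration.

```python
# (mIoU, m-Pre) in percent, copied from the MT and NEU-Seg comparison tables.
reported = {
    "MT": {"Mask2Former": (84.97, 89.55), "MSOFormer": (85.53, 91.52)},
    "NEU-Seg": {"Mask2Former": (86.46, 92.43), "MSOFormer": (87.05, 93.21)},
}


def gains(scores):
    """Absolute differences (percentage points) of MSOFormer over the baseline."""
    base, ours = scores["Mask2Former"], scores["MSOFormer"]
    return tuple(round(o - b, 2) for o, b in zip(ours, base))


for dataset, scores in reported.items():
    d_miou, d_pre = gains(scores)
    print(f"{dataset}: mIoU +{d_miou:.2f}, m-Pre +{d_pre:.2f}")
```

The differences are absolute, so they are best read as percentage points rather than relative percentages.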
4.5. Simulation Production Line Test
5. Discussion
5.1. Certification of MBL Module
5.2. Study of Hyperparameters
5.3. Effectiveness Analysis of the Modules
- (1) Adding EPD and DSQ to the baseline model improved all metrics. Compared to the baseline model Mask2Former, DSQ can more effectively integrate the semantic and visual representations of randomly dispersed objects through positional information.
- (2) The MBL module provides a more significant performance boost compared to the traditional mask attention mechanism, with improvements of 0.71 and 0.35 percentage points in mIoU and m-F1, respectively.
- (3) The EPD, DSQ, and MBL modules are complementary and work synergistically to enhance the final model’s performance. Thus, the proposed MSOFormer can achieve state-of-the-art performance on the lithium battery defect dataset by learning a balanced interdependence between foreground and background.
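As a worked check of the cost side of the ablation, the sketch below compares the baseline setting with the full model using the mIoU, Flops, and Params values reported in the ablation table. The variable names and the helper `relative_change` are assumptions for illustration.

```python
def relative_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Values copied from the ablation table: setting 1 (baseline) and setting 8 (all modules).
baseline = {"miou": 83.01, "flops": 69.44, "params": 47.4}
full = {"miou": 84.18, "flops": 34.69, "params": 46.83}

miou_gain = round(full["miou"] - baseline["miou"], 2)            # percentage points
flops_change = relative_change(baseline["flops"], full["flops"])
params_change = relative_change(baseline["params"], full["params"])

print(f"mIoU gain:     +{miou_gain:.2f} points")
print(f"Flops change:  {flops_change:.2f}%")
print(f"Params change: {params_change:.2f}%")
```

With these numbers the full model gains 1.17 mIoU points while the reported Flops drop by roughly half and the parameter count falls by about 1%.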
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

Defect Category | Description |
---|---|
Free | No defect. |
Tab damage | The welding of the tab area is crooked or damaged. |
Nonmetal impurity | Foreign objects such as carbon powder, hair, fiber flocs, or free flakes are present on the sealant. |
Metal scrap | Small debris with the color characteristics of the lug, often appearing in the area around the lug. |
Electrode fold | The outer electrode is folded onto the battery body or its left and right edges, or is not welded to the tabs. |
Electrode damage | The outer cathode electrode is damaged, generally in the welding area; the damage runs in the transverse direction and its width exceeds half of the electrode width. |
Electrode weld crack | The electrode shows a distinct black color, or a pattern clearly different in color from the solder joint and the electrode, extending horizontally from the solder joint. |

Category | Baseline Quantity | Baseline Proportion | Augmentation Quantity | Augmentation Proportion | Adaptive Quantity | Adaptive Proportion |
---|---|---|---|---|---|---|
Free | 45 | 3.54% | 450 | 3.54% | 576 | 14.74% |
Tab damage | 39 | 3.07% | 390 | 3.07% | 429 | 10.98% |
Nonmetal impurity | 1175 | 92.37% | 11,750 | 92.37% | 3347 | 85.67% |
Metal scrap | 139 | 10.93% | 1390 | 10.93% | 832 | 21.3% |
Electrode fold | 403 | 31.68% | 4030 | 31.68% | 759 | 19.43% |
Electrode damage | 48 | 3.77% | 480 | 3.77% | 547 | 14% |
Electrode weld crack | 114 | 8.96% | 1140 | 8.96% | 674 | 17.25% |
Total num | 1272 | 100% | 12,720 | 100% | 3907 | 100% |

Method | Baseline mIoU | Baseline m-F1 | Augmentation mIoU | Augmentation m-F1 | Adaptive mIoU | Adaptive m-F1 |
---|---|---|---|---|---|---|
Deeplabv3plus | 73.54 | 79.56 | 74.23 | 80.78 | 76.53 | 85.21 |
SegFormer | 74.77 | 81.02 | 75.34 | 81.54 | 79.04 | 87.04 |
Mask2Former | 78.5 | 85.7 | 79.57 | 86.24 | 83.01 | 90.02 |

Method | Backbone | Class (a) | Class (b) | Class (c) | Class (d) | Class (e) | Class (f) | Class (g) | mIoU | m-Pre | m-Re | Flops (GB) | Param (M) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FCN | ResNet50 | 99.84 | 96.53 | 32.22 | 63.55 | 85.09 | 63.45 | 64.17 | 72.12 | 85.91 | 79.60 | 57.99 | 47.13 |
Deeplabv3plus | ResNet50 | 99.84 | 97.03 | 38.92 | 71.04 | 82.8 | 65.35 | 80.74 | 76.53 | 88.83 | 83.22 | 59.79 | 41.22 |
UperNet | Swin-T | 99.85 | 97.12 | 44.06 | 65.06 | 85.3 | 70.17 | 79.91 | 77.35 | 86.77 | 85.7 | 236 | 58.95 |
KNet | Swin-T | 99.86 | 97.43 | 47.83 | 76.91 | 85.18 | 72.63 | 83.51 | 80.48 | 89.89 | 86.72 | 249 | 72.16 |
SegFormer | - | 99.87 | 96.75 | 42.25 | 72.3 | 87.74 | 72.47 | 81.73 | 79.04 | 90.66 | 84.61 | 41.94 | 44.6 |
SegNeXt | - | 99.89 | 98.56 | 42.34 | 72.88 | 89.44 | 72.1 | 79.97 | 79.31 | 89.03 | 86.26 | 32.49 | 27.56 |
McNet | ResNet50 | 99.87 | 97.88 | 37.47 | 76.92 | 88.45 | 68.7 | 72.26 | 77.36 | 89.7 | 83.23 | 54.35 | 46.77 |
NadiNet | - | 99.87 | 97.6 | 43.75 | 75.53 | 87.25 | 71.46 | 84.24 | 79.96 | 90.92 | 85.35 | 61.92 | 41.27 |
MaskFormer | Swin-T | 99.87 | 97.68 | 49.74 | 82.61 | 87.41 | 73.13 | 85.64 | 82.3 | 90.25 | 88.66 | 54.89 | 46.46 |
Mask2Former | Swin-T | 99.88 | 98.54 | 52.16 | 84.16 | 87.21 | 72.6 | 86.56 | 83.01 | 91.09 | 88.98 | 69.44 | 47.4 |
PEM | Swin-T | 99.88 | 98.59 | 50.78 | 86.42 | 88.13 | 76.7 | 86.12 | 83.8 | 91.2 | 89.69 | 40.65 | 47.95 |
MSOFormer | ResNet50 | 99.87 | 98.47 | 51.09 | 86.3 | 85.32 | 74.4 | 87.34 | 83.26 | 91.48 | 88.71 | 33.74 | 46.4 |
MSOFormer | Swin-T | 99.89 | 98.56 | 52.86 | 86.75 | 88.28 | 76.79 | 86.12 | 84.18 | 92.15 | 89.56 | 34.69 | 46.83 |

Method | Backbone | Class (a) | Class (b) | Class (c) | Class (d) | Class (e) | Class (f) | mIoU | m-Pre | m-Re |
---|---|---|---|---|---|---|---|---|---|---|
FCN | ResNet50 | 99.54 | 51.77 | 81.05 | 60.97 | 89.68 | 73.73 | 76.12 | 87.06 | 83.98 |
Deeplabv3plus | ResNet50 | 99.29 | 61.25 | 65.19 | 71.99 | 42.63 | 77.63 | 69.66 | 90.00 | 76.02 |
UperNet | Swin-T | 99.68 | 62.74 | 87.22 | 67.55 | 93.41 | 81.57 | 82.03 | 89.51 | 89.73 |
KNet | Swin-T | 99.68 | 63.99 | 88.32 | 71.03 | 93.14 | 81.33 | 82.91 | 89.26 | 91.22 |
SegFormer | - | 99.67 | 55.95 | 87.6 | 69.7 | 92.46 | 81.01 | 81.06 | 88.52 | 89.12 |
SegNeXt | - | 99.66 | 66.29 | 89.5 | 75.55 | 94.65 | 78.38 | 84.0 | 91.62 | 90.23 |
McNet | ResNet50 | 99.54 | 64.91 | 84.09 | 72.59 | 92.9 | 71.51 | 80.92 | 90.49 | 87.54 |
NadiNet | - | 99.59 | 64.53 | 89.14 | 71.27 | 92.42 | 74.51 | 81.91 | 92.44 | 86.93 |
MaskFormer | Swin-T | 99.56 | 70.98 | 87.59 | 74.84 | 93.66 | 72.71 | 83.22 | 90.21 | 90.76 |
Mask2Former | Swin-T | 99.69 | 71.83 | 88.98 | 73.58 | 93.76 | 81.99 | 84.97 | 89.55 | 93.85 |
PEM | Swin-T | 99.68 | 73.25 | 89.53 | 75.71 | 93.32 | 80.85 | 85.39 | 91.92 | 91.79 |
MSOFormer | ResNet50 | 99.69 | 65.15 | 88.8 | 75.46 | 94.93 | 80.71 | 84.12 | 90.1 | 92.06 |
MSOFormer | Swin-T | 99.7 | 74.14 | 90.06 | 73.95 | 93.3 | 82.04 | 85.53 | 91.52 | 92.4 |

Method | Backbone | Class (a) | Class (b) | Class (c) | Class (d) | mIoU | m-Pre | m-Re |
---|---|---|---|---|---|---|---|---|
FCN | ResNet50 | 97.5 | 73.39 | 85.44 | 79.88 | 84.05 | 90.66 | 91.56 |
Deeplabv3plus | ResNet50 | 97.56 | 72.86 | 85.7 | 81.92 | 84.51 | 91.12 | 91.62 |
UperNet | Swin-T | 97.84 | 77.25 | 86.7 | 83.64 | 86.36 | 92.10 | 92.94 |
KNet | Swin-T | 97.79 | 77.48 | 86.58 | 83.15 | 86.25 | 91.6 | 93.33 |
SegFormer | - | 97.81 | 76.82 | 86.63 | 83.1 | 86.09 | 92.15 | 92.57 |
SegNeXt | - | 97.84 | 77.91 | 86.65 | 83.29 | 86.42 | 91.84 | 93.29 |
McNet | ResNet50 | 97.65 | 76.11 | 85.71 | 81.29 | 85.19 | 91.39 | 92.23 |
NadiNet | - | 97.62 | 75.0 | 86.48 | 82.14 | 85.31 | 89.71 | 94.29 |
MaskFormer | Swin-T | 97.83 | 78.54 | 86.31 | 83.62 | 86.57 | 91.91 | 93.42 |
Mask2Former | Swin-T | 97.87 | 77.83 | 86.68 | 83.45 | 86.46 | 92.43 | 92.75 |
PEM | Swin-T | 97.85 | 78.45 | 86.44 | 84.16 | 86.73 | 91.89 | 93.62 |
MSOFormer | ResNet50 | 97.79 | 77.06 | 86.3 | 83.64 | 86.2 | 91.93 | 92.92 |
MSOFormer | Swin-T | 97.97 | 79.08 | 87.27 | 83.86 | 87.05 | 93.21 | 92.71 |

Method | Class (a) | Class (b) | Class (c) | Class (d) | Class (e) | Class (f) | Class (g) | Macc | Mfal | Mleak | Fps |
---|---|---|---|---|---|---|---|---|---|---|---|
FCN | 45.97 | 88.89 | 38.03 | 45.83 | 95.12 | 85.71 | 100.0 | 71.37 | 0.52 | 20.50 | 222.92 |
Deeplabv3plus | 63.53 | 80.0 | 74.58 | 38.89 | 86.81 | 66.67 | 89.47 | 71.42 | 3.1 | 10.88 | 88.88 |
UperNet | 74.29 | 100.0 | 87.03 | 60.71 | 89.89 | 75.0 | 100.0 | 83.85 | 3.42 | 6.01 | 59.55 |
KNet | 79.69 | 100.0 | 91.74 | 88.0 | 91.67 | 100.0 | 100.0 | 93.01 | 3.71 | 2.24 | 26.84 |
SegFormer | 77.46 | 88.89 | 83.54 | 68.0 | 96.39 | 75.0 | 100.0 | 84.18 | 1.79 | 6.68 | 97.22 |
SegNeXt | 73.42 | 100.0 | 80.43 | 58.62 | 97.56 | 85.71 | 100.0 | 85.11 | 0.68 | 7.69 | 86.9 |
McNet | 47.27 | 100.0 | 46.03 | 45.83 | 96.34 | 100.0 | 100.0 | 76.5 | 2.68 | 18.22 | 194.12 |
NadiNet | 73.61 | 100.0 | 83.75 | 74.07 | 96.39 | 100.0 | 100.0 | 89.69 | 2.85 | 4.69 | 75.9 |
MaskFormer | 76.67 | 100.0 | 92.37 | 95.83 | 95.18 | 85.71 | 100.0 | 92.25 | 6.31 | 0.54 | 70.95 |
Mask2Former | 90.16 | 100.0 | 94.92 | 95.83 | 93.98 | 85.71 | 100.0 | 94.37 | 1.44 | 1.14 | 32.13 |
PEM | 86.89 | 100.0 | 94.54 | 88.46 | 95.18 | 100 | 100 | 95.01 | 2.39 | 0.90 | 55.55 |
MSOFormer | 93.33 | 100.0 | 96.2 | 100 | 95.24 | 100 | 100 | 97.82 | 1.36 | 0.48 | 54.21 |

Ablation Setting | EPD | DSQ | MBL | mIoU (%) | mF1 (%) | Flops (GB) | Params (M) |
---|---|---|---|---|---|---|---|
1 | | | | 83.01 | 90.02 | 69.44 | 47.4 |
2 | ✓ | | | 83.63 (+0.62) | 90.29 (+0.27) | 34.48 | 46.83 |
3 | | ✓ | | 83.76 (+0.75) | 90.36 (+0.34) | 69.86 | 47.4 |
4 | | | ✓ | 83.72 (+0.71) | 90.37 (+0.35) | 69.44 | 47.4 |
5 | ✓ | ✓ | | 83.94 (+0.93) | 90.47 (+0.45) | 34.69 | 46.83 |
6 | ✓ | | ✓ | 84.05 (+1.04) | 90.55 (+0.53) | 34.48 | 46.83 |
7 | | ✓ | ✓ | 83.99 (+0.98) | 90.54 (+0.52) | 69.86 | 47.4 |
8 | ✓ | ✓ | ✓ | 84.18 (+1.17) | 90.78 (+0.76) | 34.69 | 46.83 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, D.; Chen, J.; Wu, P.; Pan, Y.; Zhong, H.; Deng, Z.; Xue, X. Mask-Space Optimized Transformer for Semantic Segmentation of Lithium Battery Surface Defect Images. Mathematics 2024, 12, 3627. https://doi.org/10.3390/math12223627