Experimental Design of Steel Surface Defect Detection Based on MSFE-YOLO—An Improved YOLOV5 Algorithm with Multi-Scale Feature Extraction
Abstract
1. Introduction
- Experimental Design Framework Based on Deep Learning: We propose a deep learning-based experimental design framework that integrates artificial intelligence with industrial applications. This framework not only provides an innovative solution for steel surface defect detection but also serves as a teaching tool aimed at guiding students to learn and master relevant technologies, equipping them with the skills necessary for real-world industrial applications.
- Introduction of Efficient Multi-Scale Attention (EMA) [25] Mechanism: By incorporating the EMA mechanism into the Backbone network of the YOLOv5 model, pixel-level relationships are captured through cross-dimensional interactions. Utilizing convolution kernels of varying sizes, the model efficiently fuses multi-scale contextual information, significantly enhancing the feature extraction capabilities and detection accuracy with only a slight increase in the computational cost.
- Proposed Novel C3DX Module: In the Neck of the network, we introduce the Convolution 3 Dilated Convolution X (C3DX) module. This module uses dilated convolutions with different dilation rates to capture diverse receptive fields and integrate multi-scale contextual information, further improving defect detection precision. In addition to boosting detection performance, this module helps students understand the concept of receptive fields, fostering their innovative thinking skills.
- Model Validation Across Multiple Datasets: The improved MSFE-YOLOv5 model was validated on the NEU-DET, GC10-DET, Severstal Steel, and Crack500 datasets, where mean average precision (mAP) increased over the YOLOv5s baseline by 4.7, 4.5, 3.1, and 3.0 percentage points, respectively. These results indicate that the model detects defects accurately and generalizes across datasets, while the experiments help students develop practical skills and the ability to solve real-world problems.
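To make the receptive-field idea behind the dilated convolutions in C3DX concrete, the following minimal sketch computes how far a single dilated kernel reaches and how the receptive field grows when stride-1 layers are stacked. The dilation rates (1, 2, 3) and the 3×3 kernel size are assumptions chosen for illustration; they are not taken from the C3DX definition.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied one after another.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel - 1).
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# Hypothetical dilation rates for three 3x3 branches (assumed, not from the paper).
ASSUMED_RATES = (1, 2, 3)

if __name__ == "__main__":
    for rate in ASSUMED_RATES:
        extent = effective_kernel(3, rate)
        print(f"3x3 kernel, dilation {rate}: covers {extent}x{extent} pixels")
    stacked = stacked_receptive_field([(3, r) for r in ASSUMED_RATES])
    print(f"the same three layers in sequence: {stacked}x{stacked} pixels")
```

A dilated kernel keeps the same number of weights while covering a wider area, which is why branches with different rates can gather context at several scales without a matching growth in parameters.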
2. Materials and Methods
2.1. YOLOv5
2.2. Method
2.2.1. EMA
2.2.2. C3DX
2.2.3. EIoU Loss
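As a reminder of what the EIoU loss penalizes, the sketch below evaluates it for a single pair of boxes, following the commonly cited formulation of Zhang et al. (listed in the references): one minus the IoU, plus the squared center distance, width difference, and height difference, each normalized by the smallest enclosing box. The function name, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions for illustration; this is not the training code used in the experiments.

```python
def eiou_loss(box, gt, eps=1e-9):
    """EIoU loss for a predicted box and a ground-truth box, both (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between box centers, normalized by the enclosing diagonal.
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate penalties for width and height mismatch.
    width_term = (w - gw) ** 2 / (cw * cw + eps)
    height_term = (h - gh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: loss close to 0
    print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # shifted box: IoU and center terms
```

Because width and height are penalized directly rather than through an aspect-ratio term, the loss keeps a useful gradient when the predicted box has the right shape but the wrong size.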
3. Results
3.1. Dataset Preparation and Preprocessing
3.2. Experimental Setup
3.3. Evaluation Metrics
3.4. Performance Evaluation of Each Module
3.4.1. Attention Effectiveness Experiment
3.4.2. Comparison Experiments
3.4.3. Ablation Study
4. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Name | Meaning |
---|---|
YOLO | YOLO (You Only Look Once) is a deep learning model widely used for object detection. Its core idea is to treat object detection as a regression problem, predicting class probabilities and bounding boxes in a single pass. |
EMA | Efficient Multi-Scale Attention |
C3DX | Convolution 3 Dilated Convolution X |
C3 | The C3 module is a feature extraction structure in YOLOv5 that improves feature extraction and fusion by incorporating a Cross-Stage Partial (CSP) network. |
mAP | mAP (mean average precision) is the mean of the per-class average precision values. It summarizes detection accuracy and localization precision across all categories, with higher values indicating better performance. |
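The mAP definition above can be illustrated with a short sketch that computes average precision (AP) as the area under a precision-recall curve and then averages over classes. It uses all-point interpolation, which is one common convention; the evaluation code behind the reported numbers may interpolate differently, and the function names and sample values here are assumptions.

```python
def average_precision(recalls, precisions):
    """Area under the precision envelope (all-point interpolation).

    `recalls` must be sorted in ascending order and paired with `precisions`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # so the curve becomes monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_by_class):
    """Mean of per-class AP values given as a {class_name: ap} mapping."""
    return sum(ap_by_class.values()) / len(ap_by_class)


if __name__ == "__main__":
    # Made-up two-point curve, for illustration only.
    ap = average_precision([0.5, 1.0], [1.0, 0.5])
    print(f"AP = {ap:.2f}")
    print(f"mAP = {mean_average_precision({'class_a': ap, 'class_b': 1.0}):.3f}")
```

In the mAP@0.5 setting reported in the tables, a detection counts as a true positive when its IoU with a ground-truth box is at least 0.5; that matching step determines the precision and recall values fed into `average_precision`.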
References
- Vilček, I.; Řehoř, J.; Carou, D.; Zeman, P. Residual stresses evaluation in precision milling of hardened steel based on the deflection-electrochemical etching technique. Robot. Comput.-Integr. Manuf. 2017, 47, 112–116.
- Abbes, W.; Elleuch, J.F.; Sellami, D. Defect-Net: A new CNN model for steel surface defect classification. In Proceedings of the 2024 IEEE 12th International Symposium on Signal, Image, Video and Communications (ISIVC 2024), Marrakech, Morocco, 21–23 May 2024.
- Nguyen, H.-V.; Bae, J.-H.; Lee, Y.-E.; Lee, H.-S.; Kwon, K.-R. Comparison of Pre-Trained YOLO Models on Steel Surface Defects Detector Based on Transfer Learning with GPU-Based Embedded Devices. Sensors 2022, 22, 9926.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1483–1498.
- He, Y.; Jin, Z.; Zhang, J.; Teng, S.; Chen, G.; Sun, X.; Cui, F. Pavement surface defect detection using mask Region-Based convolutional neural networks and transfer learning. Appl. Sci. 2022, 12, 7364.
- Si, B.; Yasengjiang, M.; Wu, H. Deep learning-based defect detection for hot-rolled strip steel. J. Phys. Conf. Ser. 2022, 2246, 012073.
- Zhao, W.; Chen, F.; Huang, H.; Li, D.; Cheng, W. A new steel defect detection algorithm based on deep learning. Comput. Intell. Neurosci. 2021, 2021, 592878.
- Shi, X.; Zhou, S.; Tai, Y.; Wang, J.; Wu, S.; Liu, J.; Xu, K.; Peng, T.; Zhang, Z. An improved faster R-CNN for steel surface defect detection. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP 2022), Shanghai, China, 26–28 September 2022.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Jocher, G.R.; Stoken, A.; Borovec, J.; Chaurasia, A.; Changyu, L.; Hogan, A.; Hajek, J.; Diaconu, L.; Kwon, Y.; Defretin, Y.; et al. Ultralytics/Yolov5: V5.0-YOLOv5-P6 1280 Models, AWS, Supervise.Ly and YouTube Integrations; Zenodo: Geneva, Switzerland, 2021.
- Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. MSFT-YOLO: Improved YOLOV5 based on transformer for detecting defects of steel surface. Sensors 2022, 22, 3467.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 6000–6010.
- Zhu, W.; Zhang, H.; Zhang, C.; Zhu, X.; Guan, Z.; Jia, J. Surface defect detection and classification of steel using an efficient Swin Transformer. Adv. Eng. Inform. 2023, 57, 102061.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Yi, C.; Xu, B.; Chen, J.; Chen, Q.; Zhang, L. An improved YOLOX model for detecting strip surface defects. Steel Res. Int. 2022, 93, 2200505.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Xikun, X.; Changjiang, L.; Meng, X. Application of attention YOLOV 4 algorithm in metal defect detection. In Proceedings of the 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT 2021), Chongqing, China, 22–24 November 2021.
- Wang, L.; Liu, X.; Ma, J.; Su, W.; Li, H. Real-Time Steel Surface Defect Detection with Improved Multi-Scale YOLO-v5. Processes 2023, 11, 1357.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, Version 8.0.0; Ultralytics Inc.: Los Angeles, CA, USA, 2023; Available online: https://github.com/ultralytics/ultralytics (accessed on 10 June 2023).
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. IEEE Conf. Proc. 2020, 2020, 1571–1580.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 3279–3298.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Yao, C.; Tang, Y.; Sun, J.; Gao, Y.; Zhu, C. Multiscale residual fusion network for image denoising. IET Image Process. 2021, 16, 878–887.
- Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586.
- Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
- He, Y.; Song, K.; Meng, Q.; Yan, Y. An End-to-End steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2020, 69, 1493–1504.
- Qilong, W.; Banggu, W.; Pengfei, Z.; Peihua, L.; Wangmeng, Z.; Qinghua, H. ECA-Net: Efficient channel attention for deep convolutional neural networks. IEEE Conf. Proc. 2020, 2020, 11531–11539.
- Lv, X.; Duan, F.; Jiang, J.-j.; Fu, X.; Gan, L. Deep Metallic Surface Defect Detection: The New Benchmark and Detection Network. Sensors 2020, 20, 1562.
- Severstal: Steel Defect Detection. Available online: https://www.kaggle.com/c/severstal-steel-defect-detection (accessed on 21 May 2021).
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535.
Model | Advantages | Disadvantages | Areas for Improvement |
---|---|---|---|
Traditional Manual Inspection | Visual identification, capable of detecting rare defects | Relies on subjective judgment, low efficiency, inaccurate, resource-intensive | Not suitable for large-scale or complex environments; both efficiency and accuracy are insufficient |
Traditional Machine Vision | Automation reduces manual intervention | Manual feature extraction is complex, not suitable for complex environments, computationally intensive | Feature extraction complexity and unsuitability for complex detection environments |
R-CNN Series (e.g., Faster R-CNN) | High accuracy, widely used in defect detection | Slower speed, difficult to meet real-time requirements, computationally complex | Region proposal network limits speed and real-time performance |
YOLO Series | Real-time detection, fast, suitable for large-scale detection | Insufficient precision for small defects, slightly lower accuracy | Precision issues, especially for small defects |
Transformer-based YOLO improvements | Enhanced global feature extraction capability | High computational complexity, difficult to train, struggles to meet real-time requirements | Computational complexity and training challenges limit industrial application |
Name | Parameter |
---|---|
CPU | Intel Core i9-7960X |
GPU | NVIDIA RTX 3080 Ti (12 GB) |
Operating System | Windows 10 |
Software environment | Python 3.7 + PyTorch 1.8.1 + CUDA 11.3 |
Method | (a) | (b) | ||||
---|---|---|---|---|---|---|
Size (M) | [email protected] | Time (ms) | Size (M) | [email protected] | Time (ms) | |
None [14] | 13.6 | 76.1 | 13.4 | 13.6 | 76.1 | 13.4 |
SE | 13.7 | 78.7 | 14.6 | 13.7 | 78.1 | 13.9 |
CBAM | 13.7 | 78.9 | 18.1 | 13.7 | 78.5 | 16.4 |
ECA | 13.6 | 78.7 | 14.4 | 13.6 | 78.3 | 13.9 |
Transformer | 17.9 | 79.2 | 85.1 | 17.9 | 79.1 | 73.8 |
CA | 13.7 | 79.0 | 18.0 | 13.7 | 78.7 | 16.1 |
EMA (G = 4) | 13.7 | 79.2 | 15.7 | 13.7 | 78.7 | 14.8 |
EMA (G = 8) | 13.7 | 79.4 | 16.0 | 13.7 | 78.9 | 15.0 |
Model | Size (M) | AP Cr (%) | AP In (%) | AP Pa (%) | AP PS (%) | AP RS (%) | AP Sc (%) | mAP@0.5 (%) | Time (ms) |
---|---|---|---|---|---|---|---|---|---|
Faster R-CNN | 159.5 | 45.5 | 84.9 | 91.5 | 86.1 | 68.4 | 94.0 | 78.4 | 100.4 |
Cascade R-CNN | 264.9 | 49.3 | 84.6 | 93.2 | 85.7 | 69.2 | 95.8 | 79.6 | 204.2 |
RetinaNet | 145.1 | 49.0 | 82.8 | 94.0 | 87.9 | 66.3 | 91.0 | 78.5 | 54.8 |
YOLOv3 | 236.5 | 48.0 | 79.4 | 89.3 | 79.7 | 59.6 | 90.2 | 74.4 | 48.4 |
YOLOv5s | 13.6 | 42.4 | 86.0 | 93.9 | 81.2 | 61.1 | 92.3 | 76.1 | 13.4 |
YOLOX | 68.5 | 40.8 | 85.9 | 91.8 | 87.8 | 61.9 | 84.2 | 75.4 | 18.6 |
YOLOv7-tiny | 11.6 | 48.7 | 82.5 | 93.5 | 83.5 | 53.5 | 88.9 | 75.1 | 11.1 |
YOLOv7 | 71.3 | 50.7 | 87.0 | 92.2 | 84.7 | 67.5 | 94.4 | 79.4 | 31.9 |
YOLOv8s | 22.5 | 43.0 | 81.4 | 92.6 | 82.5 | 64.3 | 94.6 | 76.4 | 16.3 |
Our model | 14.2 | 51.9 | 86.4 | 94.2 | 84.9 | 71.0 | 96.2 | 80.8 | 18.2 |
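As a quick consistency check on the comparison table, the sketch below recomputes mAP@0.5 as the mean of the six per-class AP values and converts the per-image time into frames per second. The numbers are copied from the "YOLOv5s" and "Our model" rows; the helper names and the 0.06 tolerance (which allows for rounding to one decimal place) are assumptions.

```python
# Per-class AP (%) in the order Cr, In, Pa, PS, RS, Sc, copied from the table.
CLASS_AP = {
    "YOLOv5s": [42.4, 86.0, 93.9, 81.2, 61.1, 92.3],
    "Our model": [51.9, 86.4, 94.2, 84.9, 71.0, 96.2],
}
REPORTED_MAP = {"YOLOv5s": 76.1, "Our model": 80.8}
TIME_MS = {"YOLOv5s": 13.4, "Our model": 18.2}


def mean_ap(class_aps):
    """mAP as the plain mean of per-class AP values."""
    return sum(class_aps) / len(class_aps)


def matches_reported(class_aps, reported, tol=0.06):
    """True when the recomputed mean agrees with the table up to rounding."""
    return abs(mean_ap(class_aps) - reported) <= tol


def frames_per_second(time_ms):
    """Throughput implied by a per-image time in milliseconds."""
    return 1000.0 / time_ms


if __name__ == "__main__":
    for name, aps in CLASS_AP.items():
        status = "consistent" if matches_reported(aps, REPORTED_MAP[name]) else "mismatch"
        print(f"{name}: mean AP {mean_ap(aps):.2f} vs reported {REPORTED_MAP[name]} "
              f"({status}), about {frames_per_second(TIME_MS[name]):.0f} FPS")
```

The FPS figure reflects only the listed per-image time on the stated hardware, so it is best read as a relative comparison between models rather than a deployment guarantee.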
Model | GC10-DET | Severstal Steel | Crack500 |
---|---|---|---|
YOLOv5s-pre | 69.3% | 57.5% | 77.6% |
MSFE-YOLOv5s-pre | 72.0% (↑2.7%) | 59.7% (↑2.2%) | 79.8% (↑2.2%) |
YOLOv5s | 61.9% | 55.1% | 78.1% |
MSFE-YOLOv5s | 66.4% (↑4.5%) | 58.2% (↑3.1%) | 81.1% (↑3.0%) |
Method | Size (M) | AP Cr (%) | AP In (%) | AP Pa (%) | AP PS (%) | AP RS (%) | AP Sc (%) | mAP@0.5 (%) | Time (ms) |
---|---|---|---|---|---|---|---|---|---|
YOLOv5s (baseline) | 13.6 | 42.4 | 86.0 | 93.9 | 81.2 | 61.1 | 92.3 | 76.1 | 13.4 |
+C3EMA | 13.7 | 48.1 | 85.3 | 93.7 | 86.5 | 67.8 | 94.9 | 79.4 | 16.0 |
+C3DX | 14.2 | 50.8 | 85.8 | 94.5 | 85.5 | 69.3 | 95.7 | 80.3 | 18.2 |
+EIoU | 14.2 | 51.9 | 86.4 | 94.2 | 84.9 | 71.0 | 96.2 | 80.8 | 18.2 |
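One way to read the ablation table is to look at what each added component contributes on top of the previous row. The sketch below computes those step-by-step differences in mAP@0.5 and per-image time; the values are copied from the table, while the function and variable names are assumptions for illustration.

```python
# (stage, size in M, mAP@0.5 in %, time in ms), copied from the ablation table.
ABLATION = [
    ("YOLOv5s (baseline)", 13.6, 76.1, 13.4),
    ("+C3EMA", 13.7, 79.4, 16.0),
    ("+C3DX", 14.2, 80.3, 18.2),
    ("+EIoU", 14.2, 80.8, 18.2),
]


def incremental_gains(rows):
    """Difference of each row from the row before it.

    Returns (stage, mAP gain in percentage points, extra time in ms) tuples.
    """
    gains = []
    for previous, current in zip(rows, rows[1:]):
        map_gain = round(current[2] - previous[2], 1)
        extra_ms = round(current[3] - previous[3], 1)
        gains.append((current[0], map_gain, extra_ms))
    return gains


if __name__ == "__main__":
    for stage, map_gain, extra_ms in incremental_gains(ABLATION):
        print(f"{stage}: {map_gain:+.1f} points mAP@0.5, {extra_ms:+.1f} ms")
    total = round(ABLATION[-1][2] - ABLATION[0][2], 1)
    print(f"total: {total:+.1f} points over the baseline")
```

Read this way, the EIoU row changes only the training loss, so it adds accuracy without changing model size or inference time, whereas the two architectural modules trade a few milliseconds for their gains.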
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, L.; Zhang, R.; Xie, T.; He, Y.; Zhou, H.; Zhang, Y. Experimental Design of Steel Surface Defect Detection Based on MSFE-YOLO—An Improved YOLOV5 Algorithm with Multi-Scale Feature Extraction. Electronics 2024, 13, 3783. https://doi.org/10.3390/electronics13183783