WaveSegNet: An Efficient Method for Scrap Steel Segmentation Utilizing Wavelet Transform and Multiscale Focusing
Abstract
1. Introduction
- We apply semantic segmentation to the intelligent classification of scrap steel for recycling, raising the precision and efficiency of the sorting process.
- To better capture image detail and structural information, we propose downsampling with Daubechies wavelets and upsampling with Haar wavelets (see the sketch after this list).
- We adopt a multiscale focusing mechanism that extracts and perceives features at several scales, further improving segmentation accuracy.
- Through extensive experiments, we show that WaveSegNet performs strongly on scrap steel segmentation and confirm the effectiveness of each of its components.
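To make the two wavelet operations concrete, here is a minimal sketch using the PyWavelets library; it illustrates the idea only and is not the authors' network code (the array `x` is a stand-in for one feature-map channel):

```python
# Minimal sketch of the two wavelet operations, using PyWavelets;
# illustrative only, not the authors' implementation.
import numpy as np
import pywt

x = np.random.rand(64, 64).astype(np.float32)  # stand-in for one feature channel

# Downsampling: one level of the 2-D Daubechies (db2) DWT halves the spatial
# resolution while keeping detail in three high-frequency subbands.
LL, (LH, HL, HH) = pywt.dwt2(x, 'db2')
print(LL.shape)  # (33, 33): roughly H/2 x W/2 (db2's longer filters add boundary samples)

# Upsampling: the inverse Haar DWT doubles the resolution. Because all four
# subbands are kept, the transform is invertible, so no information is lost.
cA, details = pywt.dwt2(x, 'haar')
x_rec = pywt.idwt2((cA, details), 'haar')
print(np.allclose(x, x_rec, atol=1e-5))  # True: perfect reconstruction
```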
2. Related Work
2.1. Semantic Segmentation
2.2. Wavelet Transform in CNNs
2.3. Intelligent Scrap Steel Detection
3. Scrap Steel Dataset
3.1. Simulated Scenario Dataset
3.2. Real-World Scenario Dataset
4. Methods
4.1. Encoder
4.1.1. MultiScale-Focusing-Based Self-Attention
4.1.2. Daubechies Wavelet Downsampling
4.2. Decoder
5. Experiments
5.1. Performance Evaluation Metrics
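The experiments below report parameter count (Params), computation (FLOPs), and mean intersection over union under single-scale (SS) and multiscale (MS) testing. For reference, the standard definition of mIoU over $C$ classes (a textbook definition, not quoted from the paper) is

$$\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c + FP_c + FN_c},$$

where $TP_c$, $FP_c$, and $FN_c$ count the pixels that are true positives, false positives, and false negatives for class $c$.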
5.2. Semantic Segmentation on Scrap Steel
5.2.1. Simulated Scenario Dataset
5.2.2. Real-World Scenario Dataset
- Lossless wavelet transform: WaveSegNet performs downsampling and upsampling with an invertible (lossless) wavelet transform, preserving image information and allowing the boundaries of scrap steel to be segmented accurately.
- Multiscale perception focusing: WaveSegNet concentrates attention on the scrap steel regions through a multiscale perception focusing mechanism, reducing the impact of background interference (see the sketch after this list).
- Task-specific customization: WaveSegNet is designed and fine-tuned for this specific task, matching the distinctive appearance and demands of scrap steel images.
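A hedged sketch of such a multiscale focusing block is given below. It uses parallel depth-wise convolutions at the 7 × 7, 11 × 11, and 21 × 21 scales named in the ablation study (Section 5.4), fused into an attention map in the style of SegNeXt-like convolutional attention; the layer layout and fusion rule are our assumptions, not the paper's exact design.

```python
# Sketch of a multiscale focusing block (assumed SegNeXt-style convolutional
# attention); kernel sizes follow the ablation study, the rest is illustrative.
import torch
import torch.nn as nn

class MultiScaleFocus(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, kernel_size=1)
        # Depth-wise branches perceive context at three receptive-field scales.
        self.branch7 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.branch11 = nn.Conv2d(dim, dim, 11, padding=5, groups=dim)
        self.branch21 = nn.Conv2d(dim, dim, 21, padding=10, groups=dim)
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.proj_in(x)
        # Aggregate the multiscale responses and use them to re-weight
        # ("focus") the features, suppressing background regions.
        attn = self.branch7(u) + self.branch11(u) + self.branch21(u)
        return self.proj_out(attn * u) + x  # residual connection

out = MultiScaleFocus(64)(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```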
5.3. Semantic Segmentation on Cityscapes
5.4. Ablation Study
6. Conclusions and Future Work
6.1. Conclusions
6.2. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Akram, R.; Ibrahim, R.L.; Wang, Z.; Adebayo, T.S.; Irfan, M. Neutralizing the surging emissions amidst natural resource dependence, eco-innovation, and green energy in G7 countries: Insights for global environmental sustainability. J. Environ. Manag. 2023, 344, 118560.
- Ma, Y.; Wang, J. Time-varying spillovers and dependencies between iron ore, scrap steel, carbon emission, seaborne transportation, and China’s steel stock prices. Resour. Policy 2021, 74, 102254.
- Lin, Y.; Yang, H.; Ma, L.; Li, Z.; Ni, W. Low-Carbon Development for the Iron and Steel Industry in China and the World: Status Quo, Future Vision, and Key Actions. Sustainability 2021, 13, 12548.
- Fan, Z.; Friedmann, S.J. Low-carbon production of iron and steel: Technology options, economic assessment, and policy. Joule 2021, 5, 829–862.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Wang, H.; Zhu, Y.; Adam, H.; Yuille, A.; Chen, L.C. MaX-DeepLab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5463–5474.
- Zhang, H.; Li, F.; Xu, H.; Huang, S.; Liu, S.; Ni, L.M.; Zhang, L. MP-Former: Mask-piloted transformer for image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 18074–18083.
- Jain, J.; Li, J.; Chiu, M.T.; Hassani, A.; Orlov, N.; Shi, H. OneFormer: One transformer to rule universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2989–2998.
- Tragakis, A.; Kaul, C.; Murray-Smith, R.; Husmeier, D. The fully convolutional transformer for medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 3660–3669.
- Fujieda, S.; Takayama, K.; Hachisuka, T. Wavelet convolutional neural networks. arXiv 2018, arXiv:1805.08620.
- Liu, P.; Zhang, H.; Zhang, K.; Lin, L.; Zuo, W. Multi-level wavelet-CNN for image restoration. arXiv 2018, arXiv:1805.07071.
- Wu, T.; Li, W.; Jia, S.; Dong, Y.; Zeng, T. Deep multi-level wavelet-CNN denoiser prior for restoring blurred image with Cauchy noise. IEEE Signal Process. Lett. 2020, 27, 1635–1639.
- Huang, H.; He, R.; Sun, Z.; Tan, T. Wavelet-SRNet: A wavelet-based CNN for multi-scale face super resolution. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1698–1706.
- Ma, H.; Liu, D.; Xiong, R.; Wu, F. iWave: CNN-based wavelet-like transform for image compression. IEEE Trans. Multimed. 2020, 22, 1667–1679.
- Kim, C.W.; Kim, H.G. Study on automated scrap-sorting by an image processing technology. Adv. Mater. Res. 2007, 26, 453–456.
- Koyanaka, S.; Kobayashi, K. Automatic sorting of lightweight metal scrap by sensing apparent density and three-dimensional shape. Resour. Conserv. Recycl. 2010, 54, 571–578.
- Wieczorek, T.; Pilarczyk, M. Classification of steel scrap in the EAF process using image analysis methods. Arch. Metall. Mater. 2008, 53, 613–617.
- Xu, G.; Li, M.; Xu, J. Application of machine learning in automatic grading of deep drawing steel quality. J. Eng. Sci. 2022, 44, 1062–1071.
- Duan, S. Recognition Classification and Statistics of Scrap Steel Based on Optical Image YOLO Algorithm. Master’s Thesis, Dalian University of Technology, Dalian, China, 2021.
- Xu, W.; Xiao, P.; Zhu, L.; Zhang, Y.; Chang, J.; Zhu, R.; Xu, Y. Classification and rating of steel scrap using deep learning. Eng. Appl. Artif. Intell. 2023, 123, 106241.
- Sun, L. Automatic rating of scrap steel based on neural network. Chin. Informatiz. 2021, 49–50.
- GB/T 4223-2017; Iron and Steel Scraps. China National Standards: Beijing, China, 2017.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Zhang, C.; Kim, J. Modeling long- and short-term temporal context for video object detection. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 71–75.
- Liang, X.; Shen, X.; Xiang, D.; Feng, J.; Lin, L.; Yan, S. Semantic object parsing with local-global long short-term memory. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3185–3193.
- Bracewell, R.; Kahn, P.B. The Fourier transform and its applications. Am. J. Phys. 1966, 34, 712.
- Geng, Z.; Guo, M.H.; Chen, H.; Li, X.; Wei, K.; Lin, Z. Is attention better than matrix decomposition? arXiv 2021, arXiv:2109.04553.
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
- Lee, Y.; Kim, J.; Willette, J.; Hwang, S.J. MPViT: Multi-path vision transformer for dense prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7287–7296.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
Category | Description
---|---
<3 mm | Thickness less than or equal to 3 mm.
3–6 mm | Thickness between 3 mm and 6 mm.
>6 mm | Thickness greater than 6 mm.
paint | Surface carries a paint or baked-paint coating.
galvanized | Thickness ≤ 2.5 mm with a surface coating.
greasy dirt | Surface is contaminated with oil.
inclusion | Nonmetallic material such as rock, rubber, plastic, or sand.
Category | Description
---|---
overweight | Single piece weighs more than 700 kg.
airtight | Closed container isolated from the external environment.
scattered | Fine shredded steel powder, rust, iron filings, etc.
cast_iron | Cast iron with a carbon content of 2% to 6.67%.
ungraded | Significant surface contamination or corrosion, or defects in size and shape.
Experimental Configuration | Detailed Information
---|---
Operating System | Ubuntu 20.04
Motherboard | ROG MAXIMUS Z790 HERO
CPU | 13th Gen Intel(R) Core(TM) i9-13900K
GPU | NVIDIA GeForce RTX 4090 (24 GB) × 2
RAM | 64 GB
Storage | 6 TB
GPU Driver Version | 520.56.06
CUDA Version | 11.8
Python Version | 3.8.13
PyTorch Version | 1.13.0
Parameter | Simulated Scenario | Real-World | Parameter Description
---|---|---|---
img_scale | 2048 × 512 | 2048 × 1024 | Image resizing dimensions
ratio_range | (0.5, 2.0) | (0.5, 2.0) | Range of image scaling ratios
crop_size | 512 × 512 | 1024 × 1024 | Image crop size
cat_max_ratio | 0.75 | 0.75 | Maximum ratio a single category may occupy in a crop
prob | 0.5 | 0.5 | Image flip probability
batch size | 16 | 8 | Number of samples per batch
max_iters | 40k | 160k | Training iterations
optimizer | AdamW | AdamW | Optimizer type
betas | (0.9, 0.999) | (0.9, 0.999) | AdamW momentum parameters
lr | 6 × 10⁻⁵ | 6 × 10⁻⁵ | Learning rate
warmup | linear | linear | Learning-rate warm-up method
warmup_iters | 1500 | 1500 | Iterations for learning-rate warm-up
warmup_ratio | 1 × 10⁻⁶ | 1 × 10⁻⁶ | Starting learning-rate ratio during warm-up
min_lr | 0.0 | 0.0 | Minimum learning rate
weight_decay | 0.01 | 0.01 | Weight-decay coefficient
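Written out as a training configuration, the schedule above looks roughly as follows. This is an mmsegmentation-style sketch for the simulated-scenario dataset; the field names and the poly decay policy are our assumptions where the table gives only values.

```python
# Hedged reconstruction of the training schedule from the table above
# (mmsegmentation 0.x-style config fields; values taken from the table).
optimizer = dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01)
lr_config = dict(policy='poly',          # decay policy assumed, not stated in the table
                 warmup='linear',
                 warmup_iters=1500,
                 warmup_ratio=1e-6,
                 min_lr=0.0,
                 by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=40000)  # 160k for the real-world set
data = dict(samples_per_gpu=8)  # batch size 16 split across the 2 GPUs listed above
```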
Model | Params (M) ↓ | FLOPs (G) ↓ | mIoU (SS) ↑ | mIoU (MS) ↑
---|---|---|---|---
WaveSegNet | 34.1 | 321 | 73.1 | 73.7
DeepLabv3+ [8] | 43.6 | 1403 | 71.9 | 72.4
MPViT [33] | 105.2 | 2365 | 69.6 | 71.4
SegFormer [34] | 24.7 | 325 | 71.9 | 72.5
Swin [35] | 59.8 | 1879 | 72.2 | 72.4
ConvNeXt [36] | 60.1 | 1868 | 72.5 | 73.4
Model | Params (M) ↓ | FLOPs (G) ↓ | mIoU (SS) ↑ | mIoU (MS) ↑
---|---|---|---|---
WaveSegNet | 34.1 | 322 | 69.8 | 74.8
DeepLabv3+ | 43.6 | 1404 | 65.2 | 66.3
MPViT | 105.2 | 2368 | 57.3 | 58.5
SegFormer | 27.4 | 420 | 65.7 | 70.8
Swin | 59.8 | 1880 | 64.1 | 66.3
ConvNeXt | 60.1 | 1869 | 65.6 | 69.0
Model | Params (M) ↓ | FLOPs (G) ↓ | mIoU ↑
---|---|---|---
WaveSegNet | 34.1 | 322 | 81.8
DeepLabv3 [7] | 68.1 | 2157 | 79.3
DeepLabv3+ | 43.6 | 1414 | 80.1
SegFormer | 27.5 | 420 | 81.0
Swin | 59.8 | 1871 | 79.5
ConvNeXt | 61.1 | 1869 | 80.7
Ablation | Variant | Cityscapes Params (M) | Cityscapes FLOPs (G) | Cityscapes mIoU | Simulated Params (M) | Simulated FLOPs (G) | Simulated mIoU
---|---|---|---|---|---|---|---
Baseline | WaveSegNet | 34.14 | 321.52 | 81.8 | 34.13 | 321.36 | 73.1
Focus Branch | Remove 7 × 7 branch | 33.75 | 316.80 | 81.6 | 33.74 | 313.92 | 72.9
Focus Branch | Remove 11 × 11 branch | 33.27 | 310.96 | 81.2 | 33.26 | 310.80 | 72.5
Focus Branch | Remove 21 × 21 branch | 31.25 | 286.40 | 80.5 | 31.24 | 286.16 | 71.9
Wavelet Transform | Downsampling ✓, upsampling × | 30.30 | 296.56 | 81.6 | 30.30 | 296.40 | 73.0
Wavelet Transform | Downsampling ×, upsampling ✓ | 35.21 | 327.76 | 81.5 | 35.21 | 327.52 | 72.7
Wavelet Transform | Downsampling ×, upsampling × | 31.38 | 302.80 | 81.2 | 31.37 | 302.56 | 72.5