1. Introduction
Automation in welding and defect detection technology are two essential pillars driving the modernization of the manufacturing industry [1,2,3]. These technologies not only enable the standardization, automation, and intelligence of the welding process but also improve production efficiency and quality. As the most effective and economical technology for permanently joining metal parts, welding is widely used in key industries such as the automotive, aviation, petrochemical, manufacturing, and construction industries and national defense [4]. Ongoing advancements in the automotive industry have led to heightened safety standards, particularly concerning the quality of weld seams in brake joints, which directly affect vehicle safety and performance [5,6]. Accurate detection of defects in these weld seams is therefore crucial. Common detection methods include non-destructive testing (NDT) and image processing techniques [7]. While NDT offers high accuracy, it entails high costs, complex operations, and strict environmental requirements. Conversely, image processing methods use machine learning to analyze surface images of weld seams, providing cost-effective and flexible solutions, though they are sensitive to parameter selection and may struggle with noise and varying lighting conditions [8,9,10]. This paper explores defect detection technologies for brake joint weld seams, emphasizing their significance in automotive manufacturing and proposing optimized strategies to enhance brake system safety and reliability. Through this study, we aim to offer the automotive industry more reliable and efficient quality monitoring methods, ensuring the safety of drivers and passengers.
In recent years, weld defect detection technologies have undergone substantial advancements, driven by researchers integrating advanced techniques to overcome the limitations of traditional methods. For instance, Tsun-Yen Wu et al. [11] employed laser-generated ultrasound and electromagnetic ultrasonic transducers to evaluate weld seams, using various mother wavelets in discrete wavelet transform statistical methods to extract relevant defect features. Liguo Zhang et al. [12] designed a wall-climbing robot with a cross-structured optical sensor for weld seam detection, enabling the acquisition of three-dimensional information about the welds. Addressing the complexities and subtleties of defects recorded in X-ray images, Mengmeng Li et al. [13] combined thermal infrared imaging sensors with industrial robots and developed an image detection algorithm that transforms input RGB thermal images into single-channel grayscale images, followed by adaptive thresholding for binarization, effectively revealing the shape and location of defects. Congyi Wang [14] proposed the application of eddy current testing for the detection of micro-gap weld joints, establishing a magnetic dipole model for defects that was validated against the grayscale values of pit magneto-optical images and demonstrating the superiority of finite element analysis over the magnetic dipole model. Zhifen Zhang [15] focused on typical surface welding defects in the pulsed GTAW of aluminum alloys, proposing an algorithm that computes local gray probabilities in regions of interest for monitoring welding defects, thus enhancing the real-time and intelligent capabilities of robotic welding. Ahmad Bazzi [16] proposed a compressive sensing-based full matrix capture (FMC) data compression method for the phased array ultrasonic technique (PAUT) of nozzle welds. Qian Xu [17] likewise applied compressive sensing to FMC data compression to address the excessive signal acquisition, storage, and transmission volumes in nozzle weld defect monitoring. Han Ye [18] proposed a compressive sensing-based weld defect detection method aimed at addressing defects in submerged arc welds. Through continuous advancements, welding defect detection technology is progressing towards higher precision, automation, intelligence, and real-time monitoring, significantly enhancing the efficiency and reliability of welding quality assessments across various industries. Traditional methods have relied on manually designed feature extraction and machine vision classifiers; however, these approaches are subject to human intervention, leading to subjective biases that may result in missed or redundant detections [19]. With advancements in computer technology, deep learning methods have increasingly been applied to weld defect detection. Unlike traditional algorithms, convolutional neural networks (CNNs) automatically extract features, facilitating feature selection and classification while avoiding the pitfalls of manual methods [20].
The YOLO (You Only Look Once) algorithm, introduced by Joseph Redmon et al. in 2015 in the paper "You Only Look Once: Unified, Real-Time Object Detection," has seen multiple updates from version v1 to v10 since 2016, significantly improving detection accuracy while maintaining the speed of single-stage algorithms, and it is widely used in engineering inspection applications [21,22]. For instance, Melakhsou [23] proposed a comprehensive control system based on YOLOv3 for detecting weld defects in hot water tank connection pipes, utilizing a 13-layer Darknet-13 feature extractor that generates predictions at two scales and achieving high accuracy in identifying and localizing welding anomalies. Yang Xianbiao [24] introduced an improved YOLO-based method for detecting weld regions, addressing low recognition rates in welding sections by employing image inversion, k-nearest median filtering, CLAHE image enhancement, and gamma correction to enhance image contrast and improve detection accuracy. Ang Gao et al. [25] developed an enhanced YOLOv5 detection network by incorporating a RepVGG module and a Normalized Attention Module (NAM), optimizing the network structure to improve detection speed and the network's sensitivity to feature points. Lu HuaiXu [26] proposed the YOLOv5-IMPROVEMENT model, which integrates a CA attention mechanism, SIOU loss function, and FReLU activation function to enhance detection capabilities for small targets and capture low-sensitivity spatial information. Jiayi Tsang et al. [27] introduced a BOT module to extract global information from road damage images, accommodating the large span characteristics of crack targets, incorporated a large separable kernel attention (LSKA) mechanism to improve detection accuracy, and constructed a C2f Ghost block in the neck network to reduce computational load while enhancing feature extraction for complex road damage. While these advancements have significantly improved detection accuracy and speed, as of 2025, YOLO still faces challenges in specialized tasks such as weld seam defect detection. These include limited adaptability to variations in size, shape, and texture, difficulty capturing long-range dependencies for detecting small or dispersed defects, and sensitivity to noise and low resolution, leading to inaccurate defect identification under diverse conditions.
To solve these problems, in this paper, we design a more effective weld defect detection algorithm by employing a multi-level and multi-scale attention mechanism, enhancing the model’s ability to capture fine-grained details and distinguish subtle defect features from complex backgrounds, thereby improving detection robustness and accuracy in challenging welding scenarios.
3. Model and Optimization
3.1. YOLOv8 Model
YOLOv8, introduced by Ultralytics in 2023, is a recent evolution of the YOLO (You Only Look Once) series. Building upon the foundation of YOLOv5, it incorporates significant architectural and methodological innovations, with a primary focus on enhancing accuracy and usability for real-time object detection tasks. This version optimizes both detection precision and computational efficiency, addressing the growing demands of modern applications that require fast and reliable performance in dynamic environments. The advancements in YOLOv8 represent an important step forward in the progression of deep learning models for computer vision, balancing the trade-offs between speed, accuracy, and resource utilization in real-world scenarios.
As shown in Figure 4, YOLOv8 is an efficient object detection model with three main components: the backbone, the neck, and the detection head. The backbone uses convolutional neural networks (CNNs) to extract both low-level features (edges and textures) and high-level features (object shapes and patterns). The neck network enhances these features by merging multi-scale representations, combining shallow (detailed) and deep (abstract) features to improve the detection of objects of varying sizes. The detection head then outputs bounding boxes, class labels, and confidence scores for object recognition [30].
YOLOv8 is available in five variants (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x) to suit different computational needs. The YOLOv8n (Nano) version, optimized for edge devices like the NVIDIA Jetson Nano, balances speed and accuracy in resource-constrained environments. YOLOv8n achieves 161 frames per second (FPS) with a batch size of 1 and 2.8 ms of training time with a batch size of 32, making it ideal for real-time and large-scale applications. These design improvements enhance both detection accuracy and computational efficiency across various platforms.
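For readers who want a quick feel for these variants, the sketch below loads each stock checkpoint through the Ultralytics Python API and reports the measured per-image speed. The image path is a placeholder, and the numbers obtained depend on the local hardware rather than matching the figures quoted above.

```python
from ultralytics import YOLO

# Hypothetical quick comparison of the five stock YOLOv8 variants; the image
# path is a placeholder and the reported speeds depend on local hardware.
for name in ("yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov8l.pt", "yolov8x.pt"):
    model = YOLO(name)
    results = model.predict("weld_sample.jpg", imgsz=640, conf=0.25)
    # results[0].speed holds preprocess/inference/postprocess times in ms per image
    print(name, results[0].speed)
```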
3.2. Backbone Improvement
Weld seam defects typically vary in size, shape, and texture. EfficientViT, through cascading grouped attention and parameter reallocation, can more effectively capture and integrate features from different scales, enabling the model to better represent and identify subtle variations in defects [31].
EfficientViT’s architecture features an efficient sandwich layout, cascaded group attention modules, and parameter redistribution, enhancing memory use, computational efficiency, and parameter allocation.
The sandwich layout is shown in Figure 5; input features pass through Feed-Forward Networks (FFNs), followed by cascaded group attention, and are then processed by additional FFN layers to produce the output features. This design aims to improve the model's memory efficiency while enhancing computational and parameter efficiency. The calculation formula is as follows:

$$X_{i+1} = \prod^{N} \Phi_i^{F}\Big(\Phi_i^{A}\Big(\prod^{N} \Phi_i^{F}(X_i)\Big)\Big)$$

where $X_i$ represents the full input features of the $i$-th block. The block uses $N$ FFNs $\Phi_i^{F}$ before and after a single self-attention layer $\Phi_i^{A}$ to transform $X_i$ into $X_{i+1}$; $\Phi_i^{A}(\cdot)$ denotes the result of processing by the single self-attention layer.
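To make the sandwich layout concrete, the following is a minimal PyTorch sketch of one such block, with N residual FFNs before and after a single self-attention layer. The layer sizes and the use of nn.MultiheadAttention as the attention stand-in are illustrative assumptions, not the exact EfficientViT implementation.

```python
import torch
import torch.nn as nn

class FFN(nn.Module):
    """Residual token-wise feed-forward network."""
    def __init__(self, dim, hidden_ratio=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * hidden_ratio),
            nn.ReLU(inplace=True),
            nn.Linear(dim * hidden_ratio, dim),
        )

    def forward(self, x):
        return x + self.net(x)

class SandwichBlock(nn.Module):
    """N FFNs, one self-attention layer, then N more FFNs."""
    def __init__(self, dim, num_heads=4, n_ffn=2):
        super().__init__()
        self.pre_ffns = nn.ModuleList(FFN(dim) for _ in range(n_ffn))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.post_ffns = nn.ModuleList(FFN(dim) for _ in range(n_ffn))

    def forward(self, x):                 # x: (batch, tokens, dim)
        for ffn in self.pre_ffns:         # N FFNs before attention
            x = ffn(x)
        attn_out, _ = self.attn(x, x, x)  # single self-attention layer
        x = x + attn_out
        for ffn in self.post_ffns:        # N FFNs after attention
            x = ffn(x)
        return x

x = torch.randn(1, 196, 64)
print(SandwichBlock(64)(x).shape)         # torch.Size([1, 196, 64])
```

The shape check at the bottom confirms the block preserves the token and channel dimensions, as required for stacking.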
The cascaded group attention mechanism, the middle module in Figure 5, improves computational efficiency and increases the diversity of attention by decomposing the full features into multiple subspaces and projecting and accumulating these subspaces. The calculation formulas are as follows:

$$\widetilde{X}_{ij} = \mathrm{Attn}\big(X_{ij} W_{ij}^{Q},\; X_{ij} W_{ij}^{K},\; X_{ij} W_{ij}^{V}\big)$$
$$\widetilde{X}_{i+1} = \mathrm{Concat}\big[\widetilde{X}_{ij}\big]_{j=1:h}\, W_i^{P}$$

where, in the $j$-th attention head, self-attention is computed on $X_{ij}$, the $j$-th partition of the input feature $X_i$, with $X_i = [X_{i1}, X_{i2}, \ldots, X_{ih}]$ and $1 \le j \le h$. The projection layers $W_{ij}^{Q}$, $W_{ij}^{K}$, and $W_{ij}^{V}$ map each partition into different subspaces, while $W_i^{P}$ projects the concatenated outputs back to the original input dimension. In the cascade, $X'_{ij} = X_{ij} + \widetilde{X}_{i(j-1)}$, formed by adding $X_{ij}$ to the output of the $(j-1)$-th head, replaces $X_{ij}$ as the input for the $j$-th head's self-attention.
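The cascaded behavior can be sketched as follows: the feature is split into h channel partitions, each head attends over its partition plus the previous head's output, and the concatenated heads are projected back to the input dimension. Projection shapes and the scaling factor are simplified assumptions for illustration.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Each head attends over its channel partition plus the previous head's output."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.h = num_heads
        d = dim // num_heads
        self.qkv = nn.ModuleList(nn.Linear(d, 3 * d) for _ in range(num_heads))
        self.proj = nn.Linear(dim, dim)     # W_P: project concat back to input dim
        self.scale = d ** -0.5

    def forward(self, x):                            # x: (batch, tokens, dim)
        splits = x.chunk(self.h, dim=-1)             # X_i1 ... X_ih
        outs, prev = [], 0
        for j in range(self.h):
            xj = splits[j] + prev                    # add previous head's output
            q, k, v = self.qkv[j](xj).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            out = attn.softmax(dim=-1) @ v
            outs.append(out)
            prev = out                               # cascade to the next head
        return self.proj(torch.cat(outs, dim=-1))

x = torch.randn(2, 49, 64)
print(CascadedGroupAttention(64)(x).shape)           # torch.Size([2, 49, 64])
```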
Parameter reallocation optimizes model efficiency by expanding key network component channels and reducing less critical ones. This redistribution minimizes redundancy, refining summation and projection at each stage based on importance analysis. The strategy improves parameter utilization and enhances overall model performance.
3.3. Attention Mechanism Improvement
To address issues such as feature color blur, lighting interference, and background noise in weld seam photographs, the CAFM attention mechanism is employed. It effectively captures both global and local features, enabling the model to focus more accurately on key information in the image, which in turn enhances detection accuracy and efficiency. As shown in Figure 6, the CAFM architecture comprises a local branch and a global branch, each addressing different feature scales. The global branch incorporates a self-attention mechanism to capture long-range dependencies, while the local branch utilizes channel mixing to enhance model complexity, improving representation and reducing overfitting risks. This combination optimizes feature extraction and generalization in complex datasets [32].
The local branch in Figure 6 improves detail capture and noise suppression by first applying a 1 × 1 convolution to adjust the channel dimensions, followed by channel shuffling to enhance inter-channel information integration. The input tensor is divided into groups, with depthwise separable convolutions applied within each group, and the outputs are concatenated to form a new tensor. A 3 × 3 × 3 convolution is then used for feature extraction. The global branch employs attention mechanisms to model long-range dependencies, integrating global context with local details for enhanced detection accuracy and efficiency. The formula for the local branch can be expressed as:

$$F_{\mathrm{conv}} = W_{3\times3\times3}\big(\mathrm{CS}\big(W_{1\times1}(Y)\big)\big)$$

where $F_{\mathrm{conv}}$ denotes the output of the local branch, $W_{1\times1}$ represents the $1\times1$ convolution, $W_{3\times3\times3}$ denotes the $3\times3\times3$ convolution, $\mathrm{CS}(\cdot)$ indicates the channel shuffling operation, and $Y$ refers to the input features.
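A compact PyTorch sketch of the local branch, following the formula above (1 × 1 convolution, channel shuffle, then a 3 × 3 × 3 convolution over the channel-height-width volume). The group count is an assumed value, and the grouped depthwise step is folded into the shuffle for brevity.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """CS(.): interleave channels across groups."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2)
    return x.reshape(b, c, h, w)

class CAFMLocalBranch(nn.Module):
    def __init__(self, channels, groups=4):
        super().__init__()
        self.groups = groups
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)   # W_1x1
        self.conv3d = nn.Conv3d(1, 1, kernel_size=3, padding=1)  # W_3x3x3

    def forward(self, y):                     # y: (B, C, H, W)
        x = self.pw(y)                        # adjust channel dimensions
        x = channel_shuffle(x, self.groups)   # inter-channel information mixing
        x = self.conv3d(x.unsqueeze(1))       # treat channels as a third spatial axis
        return x.squeeze(1)                   # F_conv, same shape as the input

y = torch.randn(1, 32, 64, 64)
print(CAFMLocalBranch(32)(y).shape)           # torch.Size([1, 32, 64, 64])
```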
In the global branch, as shown in Figure 6, an attention mechanism is introduced to enhance long-range information interaction. Initially, $1\times1$ convolutions and $3\times3$ depthwise convolutions generate the $Q$ (Query), $K$ (Key), and $V$ (Value) tensors. These tensors are then reshaped to compute the attention map, focusing on relationships between image regions. The main formulas for the global branch are as follows:

$$\mathrm{Attention}(\hat{Q}, \hat{K}, \hat{V}) = \hat{V} \cdot \mathrm{softmax}\big(\hat{K}\hat{Q}/\alpha\big)$$
$$F_{\mathrm{att}} = W_{1\times1}\,\mathrm{Attention}(\hat{Q}, \hat{K}, \hat{V}) + Y$$

where the $1\times1$ convolution and $3\times3$ depthwise convolution generate the $Q$ (Query), $K$ (Key), and $V$ (Value) tensors, each of shape $\hat{H}\times\hat{W}\times\hat{C}$. Next, $Q$ is reshaped to $\hat{H}\hat{W}\times\hat{C}$ and $K$ is reshaped to $\hat{C}\times\hat{H}\hat{W}$. The attention map is then computed through the interaction between the reshaped $K$ and $Q$, instead of calculating a large regular attention map of size $\hat{H}\hat{W}\times\hat{H}\hat{W}$, which reduces the computational burden. $\alpha$ is a learnable scaling parameter used to control the magnitude of the matrix multiplication between $K$ and $Q$ before applying the softmax function.

Finally, the output of the CAFM module is computed as:

$$F_{\mathrm{out}} = F_{\mathrm{att}} + F_{\mathrm{conv}}$$
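The global branch and the final fusion can be sketched as follows. The channel-wise (C × C) attention map, the learnable α, and the residual connection follow the description above, while the exact layer widths and the wiring to the local branch are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAFMGlobalBranch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv followed by a depthwise 3x3 conv produces Q, K, V together
        self.qkv = nn.Sequential(
            nn.Conv2d(channels, channels * 3, 1),
            nn.Conv2d(channels * 3, channels * 3, 3, padding=1, groups=channels * 3),
        )
        self.out = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.ones(1))   # learnable scaling parameter

    def forward(self, y):                          # y: (B, C, H, W)
        b, c, h, w = y.shape
        q, k, v = self.qkv(y).chunk(3, dim=1)
        q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)          # (B, C, HW)
        attn = F.softmax((q @ k.transpose(-2, -1)) / self.alpha, dim=-1)  # (B, C, C)
        f_att = (attn @ v).view(b, c, h, w)        # channel-wise attention output
        return self.out(f_att) + y                 # residual connection

# Hypothetical fusion with the local branch sketched earlier:
# f_out = CAFMGlobalBranch(32)(y) + CAFMLocalBranch(32)(y)
```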
3.4. C2f Improvement
Traditional convolution operations use fixed kernels, limiting their ability to adapt to diverse features, particularly in complex tasks like weld defect detection. While increasing the depth, width, or resolution of convolutional neural networks (CNNs) can enhance performance, these approaches may still fail to capture intricate features. To address this, we introduce Dynamic Convolution (DyConv) into the C2f module, improving model performance without increasing network size [33].
DyConv dynamically adjusts its convolutional parameters to suit task-specific requirements, enhancing the model’s ability to detect diverse objects. This is particularly useful in defect detection, where weld defect dimensions and shapes vary. DyConv uses dynamically generated kernels to improve flexibility and efficiency, adapting based on the input image.
By integrating a coefficient generation module, such as a multilayer perceptron (MLP), DyConv adjusts kernel weights to focus on critical defect regions, minimizing background noise and improving precision, especially in noisy welding environments. This dynamic adjustment enhances both defect recognition and detection accuracy for variable targets, making DyConv effective for complex detection tasks that traditional CNNs struggle with.
In summary, DyConv’s adaptive mechanism enhances robustness and precision, making it particularly suitable for defect detection in challenging industrial environments. It outperforms conventional CNNs by improving focus on key areas, boosting detection accuracy, and reducing noise sensitivity.
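A minimal sketch of the dynamic convolution idea described above: several parallel kernels are blended per input sample using softmax attention coefficients produced by a small pooling-plus-MLP module. The kernel count and reduction ratio are illustrative assumptions, not the configuration used in YOLOv8-WD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Attention-weighted combination of several convolution kernels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, reduction=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.02
        )
        self.attn = nn.Sequential(            # coefficient-generation MLP
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, num_kernels),
        )
        self.padding = kernel_size // 2

    def forward(self, x):                     # x: (B, C_in, H, W)
        b = x.size(0)
        coeff = F.softmax(self.attn(x), dim=1)                 # (B, K) per-sample weights
        w = torch.einsum("bk,koihw->boihw", coeff, self.weight)  # blend the K kernels
        w = w.reshape(-1, *self.weight.shape[2:])              # (B*out, in, k, k)
        out = F.conv2d(x.reshape(1, -1, *x.shape[2:]), w,
                       padding=self.padding, groups=b)         # grouped conv per sample
        return out.view(b, -1, *out.shape[2:])

x = torch.randn(2, 16, 32, 32)
print(DynamicConv2d(16, 32)(x).shape)          # torch.Size([2, 32, 32, 32])
```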
Grad-CAM (Gradient-weighted Class Activation Mapping) is a gradient-based technique that generates class activation maps by weighting gradients from intermediate layers of a convolutional neural network (CNN). It visualizes the regions of an input image most influential to the model's prediction, where red areas indicate high contribution, yellow represents secondary attention, and blue suggests minimal relevance. As shown in Figure 7, the original YOLOv8 model disperses attention across the image, whereas with DyConv incorporated, the model focuses more effectively on key features, demonstrating its ability to enhance feature localization in YOLOv8.
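For reference, a bare-bones Grad-CAM sketch of the visualization procedure described above; model, target_layer, and score_fn are placeholders for whichever network, intermediate layer, and scalar prediction score are being inspected.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, score_fn):
    """Return a (B, h, w) heat map for the target_layer activations."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = score_fn(model(x))              # scalar score for the class/box of interest
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # pool gradients per channel
    cam = F.relu((weights * feats["a"]).sum(dim=1))       # weighted activation sum
    return cam / (cam.max() + 1e-8)                       # normalize to [0, 1]
```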
4. Experimental Results and Analysis
4.1. Evaluation Metrics
This study uses precision, recall, mAP@50, and F1 score to evaluate the YOLOv8-WD model. F1 score and mAP@50 are the main metrics, with precision and recall as supplementary measures, to assess the model's practical performance.
Precision measures the proportion of true positives among all positive classifications and is calculated as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ is the number of true positives and $FP$ is the number of false positives. High precision alone does not guarantee model performance if recall is low. Recall is the ratio of true positives to the total number of actual positives and is given by:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $FN$ represents false negatives. Recall measures the model's ability to identify all relevant instances.
Mean Average Precision (mAP) is the average of the average precision (AP) values across all classes:

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

Precision and recall are typically inversely related, with improvements in one often leading to a decline in the other. To address this trade-off, the F1 score is used as a comprehensive metric balancing the two; a higher F1 score indicates better model performance, reflecting an optimal trade-off between precision and recall:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
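A small worked example of these metrics, using hypothetical counts for a single defect class and hypothetical per-class AP values; the numbers are illustrative only and do not come from the experiments reported below.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical counts for one defect class
tp, fp, fn = 86, 14, 12
p, r = precision(tp, fp), recall(tp, fn)
print(f"P={p:.3f}, R={r:.3f}, F1={f1_score(p, r):.3f}")

# mAP@50 averages the per-class AP (area under the P-R curve at IoU >= 0.5)
ap_per_class = {"pit": 0.90, "scratch": 0.88, "bubble": 0.80, "slag": 0.85}  # toy values
map50 = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP@50 = {map50:.3f}")
```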
4.2. Experimental Environment
In this experiment, the PyTorch 2.2.0 framework was utilized for training, with the system environment based on Windows 10, Python 3.8.10 as the interpreter, and PyCharm 2023.1 as the integrated development environment (IDE). Training was executed using an NVIDIA GeForce GTX 1650 GPU (Santa Clara, CA, USA), equipped with 4 GB of VRAM, and accelerated via CUDA 11.8. Model training was configured for 150 epochs. The input image dimensions were standardized to 640 × 640 pixels, with a batch size of 32. The initial learning rate was set to 0.01, and the momentum parameter was defined as 0.937. A cosine annealing learning rate adjustment algorithm was applied, with the minimum cosine annealing learning rate set to 10.
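As a point of reference, the training configuration above maps onto the Ultralytics API roughly as follows; the dataset YAML path is a placeholder, and the modified YOLOv8-WD modules (EfficientViT, CAFM, DynamicConv) are not part of the stock package.

```python
from ultralytics import YOLO

# Minimal sketch of the training setup described above, using the stock
# YOLOv8n checkpoint; "weld_defects.yaml" is a hypothetical dataset config.
model = YOLO("yolov8n.pt")
model.train(
    data="weld_defects.yaml",   # hypothetical dataset configuration
    epochs=150,
    imgsz=640,
    batch=32,
    lr0=0.01,                   # initial learning rate
    momentum=0.937,
    cos_lr=True,                # cosine annealing learning rate schedule
    device=0,                   # single GPU
)
```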
4.3. Ablation Experiment in the Study
To assess the performance of the proposed enhanced algorithm, we evaluated and compared the training outcomes of the standard YOLOv8 model against those of models integrated with different enhancements. Specifically, the comparison focuses on the average precision (mAP@50) and F1 score. In the results table, a cross (×) represents the absence of an improvement, while a check mark (√) denotes its inclusion. The ablation study results are summarized in Table 1.
In the experiment, we tested eight different model configurations by sequentially adding or removing three major improvements: EfficientViT, CAFM, and DynamicConv. The comparison between Model 1 (no improvements) and Model 8 (all improvements) reveals the contribution of each modification to the model’s performance.
Model 8 demonstrated the best overall performance, achieving a mAP@50 of 90.5% and an F1 score of 86.1%. This indicates that the combined application of EfficientViT, CAFM, and DynamicConv significantly enhances the model’s detection capabilities, thereby confirming the effectiveness of the integrated improvement strategy.
When introduced individually, EfficientViT (Model 2), CAFM (Model 3), and DynamicConv (Model 4) did not show substantial performance gains. The mAP@50 values were 88.9%, 87.1%, and 87.6%, respectively, and the F1 scores also declined. These results suggest the limitations of each improvement when applied in isolation, highlighting that although these modifications have potential, their impact is limited without the support of complementary improvements.
Notably, Model 5 exhibited improved mAP@50 and F1 scores compared to Models 2 and 3, suggesting some degree of complementarity between the improvements. However, Model 6 showed a 0.9% decrease in F1 score compared to Model 4, and Model 7 experienced a 0.2% drop in mAP@50 compared to Model 2, indicating that in certain cases, the combination of improvements may result in performance degradation.
In summary, this ablation study demonstrates the effectiveness of the proposed improvements by comparing different model configurations. Furthermore, the results indicate that an appropriate combination of enhancements can significantly improve the performance of the YOLOv8 model, while inappropriate combinations may lead to performance deterioration.
4.4. Results Comparison Experiment
To further validate the effectiveness of the proposed algorithm, experiments were conducted using the same dataset described previously. The study compared the improved algorithm with current state-of-the-art approaches, specifically YOLOv5, YOLOv8, and YOLOv10. Additionally, variants of the YOLOv8 architecture were built by incorporating modules such as BiFPN, iRMB, SCConv, and SWC, and these variants were then compared against the proposed YOLOv8-WD. The comparison primarily evaluated the performance of these object detection models using mAP@50 and F1 score, two key metrics for performance assessment.
Based on Table 2, among the baseline models, YOLOv8 demonstrated an improvement over YOLOv5 in precision but a slight drop in recall. Despite this, YOLOv8's F1 score (84.3%) is higher than YOLOv5's (81.8%), indicating that YOLOv8 offers a better balance between precision and recall. The mAP@50 of YOLOv5 was slightly higher than that of YOLOv8, but the difference was only about 0.3%, meaning the two performed similarly in overall detection accuracy. YOLOv10 had the highest precision (87.1%) of all models, but its recall declined substantially (60.0%), significantly lower than that of YOLOv5 and YOLOv8. This low recall severely affected its mAP@50 (65.9%) and F1 score (70.8%), indicating that, although YOLOv10 performs well in precision, it fails to detect a large number of relevant targets and is therefore less reliable in applications that require high recall. Overall, YOLOv10 shows an imbalance between precision and recall that harms its overall effectiveness.
Among the enhanced models, YOLOv8-WD showed the best overall performance, with the highest mAP@50 (90.5%) and the highest F1 score (86.1%) of all models. The precision and F1 score of YOLOv8+BiFPN improved, but its recall was 2.0% lower than that of the base YOLOv8 model. YOLOv8+iRMB maintained a balance similar to YOLOv8, with a slight 0.9% decrease in precision but a higher recall than base YOLOv8, yielding a good final F1 score (83.6%) and mAP@50 (87.8%). YOLOv8+SCConv performed nearly as well as YOLOv8+iRMB, with the same precision and mAP@50 but a slightly lower recall and F1 score. YOLOv8+SWC provided a robust balance, with a high recall (83.7%) and an F1 score (83.9%) superior to some of the other YOLOv8 variants.
In conclusion, while YOLOv10 achieves the highest precision, it is clear that YOLOv8-WD offers the most robust and balanced performance, making it the ideal choice in scenarios requiring both high precision and recall. Variants of YOLOv8 show specific improvements depending on architectural changes, but none outperform YOLOv8-WD in overall effectiveness.
As illustrated in Figure 8, YOLOv8-WD initially exhibits lower accuracy than YOLOv8 but surpasses it in later stages of training. However, the accuracy curve for YOLOv8-WD does not stabilize smoothly towards the end, indicating that while the model's performance improves over time, it does not achieve a steady increase in accuracy.
Figure 9 demonstrates that YOLOv8-WD's recall rate consistently surpasses that of YOLOv8, reflecting a superior ability to capture positive samples. According to Figure 10 and Table 2, YOLOv8-WD shows a 3% improvement in mAP@50 over YOLOv8 and exhibits stable convergence of the curve. This stability indicates that YOLOv8-WD possesses robust performance, good generalization capabilities, and an effective training strategy. Consequently, YOLOv8-WD proves to be more valuable in object detection tasks, particularly in scenarios requiring high reliability and consistency.
Figure 11 shows that YOLOv8-WD’s box, dfl, and classification loss curves are similar to YOLOv8 on the training set, indicating comparable learning and stability. On the validation set, YOLOv8-WD exhibits smoother Box and DFL loss curves, suggesting greater robustness. However, its slightly higher loss values indicate that the added complexity in YOLOv8-WD results in increased loss due to more features being processed. This does not necessarily reflect worse performance, and additional metrics are needed for a full assessment.
Figure 12 compares the defect detection results between YOLOv8 and YOLOv8-WD by defect category. YOLOv8 performed well on dents and scratches, with mAP@50 scores of 96.30% and 91.40%, respectively, but less effectively on bubbles and slag, with scores of 75.30% each. YOLOv8-WD showed similar performance on dents and scratches but improved detection of bubbles and slag, with mAP@50 increasing by 1.7% and 13.6%, respectively. Thus, YOLOv8-WD effectively mitigates dataset class imbalance, reducing average precision gaps and achieving more balanced performance.
Figure 13 presents a test set sample containing four types of welding defects: pits, slag, scratches, and bubbles. For pits and slag, the original algorithm demonstrates satisfactory detection performance, with the improved algorithm yielding a slight increase in mAP@50. However, the original algorithm exhibits suboptimal detection performance for scratches and bubbles. As observed in Figure 13c, the original algorithm suffers from missed detections of scratches. The improved algorithm addresses this issue, significantly enhancing detection accuracy. Furthermore, the improved algorithm also shows a notable improvement in bubble detection accuracy, with mAP@50 increasing by approximately 10%, representing a substantial advancement.
5. Conclusions and Outlook
As welding technologies are increasingly applied across different industries, accurate weld defect detection has become critical for quality control. This study develops an image acquisition platform to capture and annotate high-quality weld defect images. Building on the YOLOv8 architecture, the improved model, YOLOv8-WD, integrates EfficientViT as the backbone, a Cross-Attention Feature Mechanism (CAFM), and Dynamic Convolution (DyConv) in the C2f module. Experimental results show that YOLOv8-WD outperforms YOLOv8 in precision (86.4%), recall (85.9%), mAP@50 (90.5%), and F1 score (86.1%) while also demonstrating better stability and robustness. YOLOv8-WD offers potential for integration into industrial workflows, enhancing defect detection accuracy and efficiency.
Welding equipment manufacturers, quality control service providers, and companies engaged in automated inspection systems could integrate YOLOv8-WD into their workflows to improve the accuracy and efficiency of defect detection. A potential commercialization approach involves licensing the algorithm as part of an industrial inspection solution, where end-users deploy the model in real-time applications through hardware interfaces such as robots or drones equipped with cameras for automated weld inspection.
This research helps to improve the accuracy and reliability of automated welding defect detection and shows strong potential for broader industrial applications. Future research needs to further verify robustness under different lighting conditions and complex backgrounds and to adapt the method to a variety of operating environments, ensuring that it meets the needs of actual industrial applications.