1. Introduction
Industry 4.0 is a new generation of industrial revolution with intelligent manufacturing as its core, which aims to achieve the integration of the physical world and the virtual network world through the deep application of information and communication technology and improve the sustainability and innovation of production [
1]. Intelligent manufacturing uses information technology, artificial intelligence, the Internet of things, and other means to realize the digitalization, networking, automation, and intelligent management and control of the entire manufacturing process and flexibly adjust the production scale and product structure according to market demand, which can effectively improve resource utilization, reduce cost, ensure quality, enhance innovation ability, and promote sustainable development [
2,
3]. With Industry 4.0 promoting the intelligent transformation of enterprises, machine vision has become widely used in various manufacturing processes. Machine vision uses computer vision and image processing methods to analyze and understand various images in industrial manufacturing [
4]. It can perform functions such as monitoring, detection, identification, and measurement of production processes. Compared to manual vision, machine vision can effectively improve inspection speed and accuracy and reduce human resources and human errors [
5]. Machine vision has a broad range of applications in diverse fields, such as automotive manufacturing, electronic components, food processing, textile printing and dyeing, and pharmaceutical and chemical industries [
6]. One of the important applications of machine vision is the detection of surface defects in industrial products, where anomalies in material processing result in undesirable phenomena on the product surface, such as cracks, scratches, pits, and bubbles, which can impair the product appearance or function and may cause more serious consequences. Hence, it is essential to identify and reject defective products in a timely and accurate manner on the production line.
The difficulty of industrial defect detection lies in the fact that industrial products or parts differ in their surface materials, shapes, colors, and other characteristics, and the types, locations, sizes, and forms of defects also differ greatly [
7]. Therefore, a general and flexible method that can adapt to different inspection scenarios and needs is required. Traditional industrial defect detection methods rely on techniques such as manual rules or template matching [
8], which necessitate artificially set thresholds or templates and are not only time-consuming and labor-intensive but also difficult to adapt to different types and complexities of defects. In recent years, deep learning methods [
9] have achieved breakthroughs in the field of computer vision and have demonstrated performance and potential to surpass traditional methods in the field of industrial defect detection. Deep learning methods can automatically learn high-level feature representations from large amounts of industrial defect data and use neural network models for classification, localization, or segmentation to accomplish defect detection tasks [
10]. However, deep learning methods still encounter two main challenges and problems in the field of industrial defect detection [
11]. The first challenge is the lack of industrial defect sample data [
12]. Deep learning methods depend on a large amount of labeled data to train models, but in industrial scenarios, it is challenging to obtain sufficient and representative sample data due to the wide variety of products, complex types of defects, and strict production processes. In particular, there is a significant imbalance between normal and defective samples, which can lead to a bias toward normal samples during model training and thus affect the model’s ability to identify defective samples. The second challenge is the poor generalization ability of the model [
10]. Because defect datasets are scarce, the model is trained on limited data and must therefore generalize well, adapting to changes across scenarios while maintaining high detection accuracy and robustness. However, in practical applications, good generalization is difficult to guarantee because it depends on the quality and quantity of the dataset as well as the structure and parameters of the model; as a result, the model produces false detections or missed detections when faced with new or unknown defects [
13].
One of the key challenges for deep learning models in industrial defect detection is how to obtain diverse and high-quality industrial sample datasets. Generative Adversarial Networks (GANs) are an innovative and influential technique in deep learning that consists of two neural networks: a generator and a discriminator. They use adversarial training to build unsupervised learning models that can learn and generate data with distribution characteristics similar to those of the training data without relying on prior information. The generator tries to produce samples that are indistinguishable from the real target samples, that is, to make the discriminator misjudge its generated samples, while the discriminator tries to maximize its ability to distinguish between the generated samples and the real samples. The two networks compete with each other until they reach a Nash equilibrium. Currently, GANs and their derived models have become a research hotspot in the field of data generation, with two main research directions: models based on network architecture and loss function, and models based on domain crossing. Models based on network architecture and loss function mainly improve the objective function and network structure of the original GAN to solve problems such as unstable training and mode collapse and to improve the quality and diversity of the generated samples. cGAN [14] proposed a GAN based on a conditional probability distribution, which can feed additional label, text, or image information as conditions to the generator and discriminator and improve the realism of the generated samples according to this information. AC-GAN [
15] proposed a GAN based on an auxiliary classifier, which adds an extra classifier in the discriminator to predict the category of the input sample, and combines the classification loss and discrimination loss to optimize the network. This can improve the quality and diversity of the generated samples and also use category information to control the generation process. The objective function of the original GAN is a minimax game, which has problems such as gradient vanishing, saddle point, KL divergence asymmetry, etc. Therefore, WGAN introduced Wasserstein distance as a measure between the real distribution and the generated distribution and gave a simple and effective algorithm to optimize this distance. WGAN [
16] can avoid gradient vanishing and mode collapse and provide a meaningful indicator of training progress. Models based on domain crossing mainly use the relationship or transformation rules between different domains to achieve cross-domain generation tasks. In the original GAN design, the generator can only map from a random noise space to a data space and cannot achieve transformation between different data spaces. Pix2Pix [
17] proposed a GAN based on conditional GAN and U-Net structure, which can achieve supervised transformation from one image domain to another image domain, such as from sketch to color image, from day to night, etc. StarGAN [
18] proposed a GAN based on conditional GAN and CycleGAN structure, which can achieve unsupervised transformation between multiple image domains, such as changing facial expressions, hairstyles, gender, etc. These methods provide some pioneering suggestions for solving this kind of problem.
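For reference, the minimax game and the Wasserstein objective mentioned above are usually written in the following standard forms from the GAN literature. The symbols $G$, $D$, $p_{\mathrm{data}}$, $p_z$, and $\mathcal{D}$ (the set of 1-Lipschitz critics) are notation introduced here for illustration and are not taken from this paper.

```latex
% Original GAN: the discriminator D maximizes, the generator G minimizes
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% WGAN: the critic D is restricted to 1-Lipschitz functions
\min_G \max_{D \in \mathcal{D}} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[D(x)\bigr]
  - \mathbb{E}_{z \sim p_z}\bigl[D(G(z))\bigr]
```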
Although GANs have been widely applied to sample augmentation, existing work mainly focuses on domains such as face attribute transformation and landscape color transformation. In industrial defect detection, difficulties such as the lack of defect samples, the low visibility of defects, and their irregular shapes and unknown types make it hard for samples produced by existing GAN-based augmentation methods to meet the requirements of high accuracy and high speed at the same time. Therefore, designing a GAN model that can synthesize realistic and diverse defect samples with high fidelity and efficiency remains a challenge in industrial defect detection. To address this challenge, we propose a perceptual capsule cyclic generative adversarial network (PreCaCycleGAN) for industrial defect sample augmentation, which aims to learn a more realistic distribution of industrial defect data. Our method builds on CycleGAN’s framework of bi-directional mapping and cyclic consistency loss and enhances it with a least-squares loss and a perceptual loss function. Moreover, our method adopts an optimized generator structure with U-Net and DenseNet modules, and a capsule network with perspective invariance, to further improve the generator’s ability to learn the features of industrial defect samples. The main contributions of this work are as follows:
- (i)
We design a generator model with U-Net network structure [
19] and DenseNet [
20] modules to enhance the feature propagation and feature reuse of defects, which mitigates the vanishing-gradient problem of deep networks. We also add a perceptual loss function to enhance the feature and semantic information of the generated images;
- (ii)
We use cyclic consistency loss, identity mapping loss, and least squares loss to construct an adversarial training framework to achieve random changes in defect location and shape, ensure the consistency between the generated samples and the real samples in the non-defective region, improve the similarity between the generated samples and the real samples, and avoid the mode collapse and gradient vanishing or oscillation problems;
- (iii)
We design a dual-branch discriminator that, after initial feature extraction, splits into a PatchGAN [
21] branch and a capsule network [
22] branch with dynamic routing. This design can effectively extract and retain the detailed features of defective samples, identify both the local and overall features of the samples, and improve the authenticity and diversity of the generated industrial defect samples;
- (iv)
We compare our method with other generation algorithms and validate it on practical industrial defect detection models. The results show that, among the compared methods, ours yields the largest performance improvement for these detection models and effectively increases their generalization ability.
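The loss terms named in contribution (ii) can be sketched as follows. This is a minimal illustration, not the authors' implementation: images are flattened to lists of floats, the perceptual loss is omitted, and the weights `lambda_cyc=10.0` and `lambda_id=0.5` as well as all function names are assumptions made for the example.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Least-squares loss: push scores of real samples to 1 and of fakes to 0."""
    real_term = mean([(score - 1.0) ** 2 for score in d_real])
    fake_term = mean([score ** 2 for score in d_fake])
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(d_fake: Sequence[float]) -> float:
    """Least-squares loss for the generator: push scores of fakes to 1."""
    return mean([(score - 1.0) ** 2 for score in d_fake])


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two flattened images."""
    return mean([abs(x - y) for x, y in zip(a, b)])


def generator_objective(
    d_fake: Sequence[float],
    source: Sequence[float],
    reconstructed: Sequence[float],
    target: Sequence[float],
    identity_mapped: Sequence[float],
    lambda_cyc: float = 10.0,  # assumed weight, for illustration only
    lambda_id: float = 0.5,    # assumed weight, for illustration only
) -> float:
    """Adversarial + cycle-consistency + identity-mapping terms."""
    adversarial = lsgan_generator_loss(d_fake)
    cycle = l1_distance(source, reconstructed)     # F(G(x)) should return x
    identity = l1_distance(target, identity_mapped)  # G(y) should leave y unchanged
    return adversarial + lambda_cyc * cycle + lambda_id * identity
```

The cycle term penalizes changes outside what the mapping needs to alter, which is one way to keep non-defective regions consistent between generated and real samples.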
2. Related Work
Data augmentation is a common technique to enhance the performance and generalization ability of machine learning models by artificially creating new data to expand and enrich the training dataset. Sample augmentation is a specific form of data augmentation that is tailored to the characteristics and requirements of different domains or tasks. Data augmentation has been widely applied in computer vision, especially for tasks such as image classification, object detection, semantic segmentation, etc., where it can effectively address the issues of data insufficiency, dataset imbalance, and overfitting. However, in industrial defect detection, obtaining industrial defect samples is challenging due to the high yield rate of intelligent manufacturing, which leads to the lack of quantity and diversity of defect samples. Moreover, industrial defect samples require manual inspection and annotation by professionals, which is time-consuming and expensive. Furthermore, industrial defect samples have high complexity and diversity and are often sensitive and confidential, which restricts data sharing and communication and hinders the development of the industrial defect detection field. Therefore, designing suitable data enhancement techniques to overcome the data scarcity and imbalance problems in the industrial defects domain and to improve the robustness and accuracy of industrial defects detection models is an important and meaningful topic. We will review the current related research in data augmentation from three perspectives: Model-free image augmentation, Model-based image augmentation, and optimizing policy-based image augmentation, and analyze their advantages and challenges in industrial defect samples.
2.1. Model-Free Image Augmentation
Model-free Image Augmentation (MIA) is a data augmentation method that does not depend on any model training or optimization, and it augments the data by applying various geometric or color transformations to the original image, such as rotation, translation, scaling, cropping, flipping, brightness adjustment, contrast adjustment, etc. [
23]. These transformations can be done in image space or frequency domain and can be randomly combined. However, these conventional transformation methods often only increase the data quantity but not the data diversity and may cause information loss or distortion. To address this issue, some researchers proposed methods such as CutMix [
24], which mixes different images; Random Erasing [
25], which replaces pixel values with random rectangles; Noise Injection [
26], which adds random values from Gaussian distributions to an image; and Copy-Paste [
27], which randomly pastes instance targets on background images. These methods can improve the data diversity and complexity by blending or erasing images, but they can also lose the details and boundary information of the images, which can affect the performance of the model for fine-grained target detection tasks.
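To make the model-free transformations above concrete, the following sketch applies a horizontal flip, a Random Erasing-style rectangle fill, and Gaussian noise injection to a grayscale image stored as a list of rows with values in [0, 1]. The function names, the `max_fraction` and `sigma` defaults, and the image representation are assumptions chosen for the example rather than settings from the cited works.

```python
import random
from typing import List

Image = List[List[float]]


def horizontal_flip(image: Image) -> Image:
    """Mirror every row; a purely geometric transformation."""
    return [row[::-1] for row in image]


def random_erase(image: Image, rng: random.Random, max_fraction: float = 0.5) -> Image:
    """Overwrite one random rectangle with random pixel values."""
    height, width = len(image), len(image[0])
    erase_h = rng.randint(1, max(1, int(height * max_fraction)))
    erase_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    out = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            out[y][x] = rng.random()
    return out


def gaussian_noise(image: Image, rng: random.Random, sigma: float = 0.05) -> Image:
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]
```

Note that `random_erase` may overwrite the defect itself when the defect is small, which illustrates the loss of detail and boundary information discussed above.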
MIA is a general and simple data augmentation method that can be applied to any image data and task, but there are few studies on algorithms specifically designed for industrial defect sample augmentation. Farady et al. [
28] proposed PreAugNet in 2023, one of the few such methods, which uses a Support Vector Machine (SVM) as a class boundary classifier to filter the samples generated by MIA and combines them with the original ones. MIA has two main limitations for industrial defect sample augmentation. First, it cannot tailor the transformations to a specific type of defect, and it usually requires the transformation types and parameters to be set manually, which makes it hard to adapt to different tasks and datasets. Second, it can only transform the original image in the spatial or frequency domain and cannot change the content or structure of the image, so the generated samples differ only slightly from the originals and cannot effectively extend the data distribution or cover new regions of the feature space. MIA therefore copes poorly with industrial defect images that have specific structures or constraints and arise from complex, dynamic industrial scenarios; moreover, excessive transformations may destroy the semantic information of the image and thus compromise the quality and authenticity of the generated industrial defect samples.
2.2. Optimizing Policy-Based Image Augmentation
Optimizing Policy-based Image Augmentation (OPIA) is an approach that uses an optimization algorithm to search for the optimal data augmentation policy. OPIA is essentially a sequence of MIA operations and their parameters, such as rotating 15 degrees + crop 0.8 + brightness adjustment 0.2, etc. OPIA can automatically find the best data augmentation strategy for different datasets and tasks and can significantly improve the model performance on a test set. Cubuk et al. [
29] proposed AutoAugment, the first OPIA method, which uses a reinforcement learning-based controller to select the optimal data augmentation strategy, but it is very slow and computationally intensive. Cubuk et al. [
30] then proposed RandAugment based on the data augmentation strategy of the Neural Network Architecture Search (NAS) method [
31], which reduces the search space, makes the search results more general and stable, and can adapt to models and datasets of different sizes and complexities. Lim et al. [
32] further improved AutoAugment by proposing Fast AutoAugment, which uses Bayesian optimization and density matching to speed up the search process and is three orders of magnitude faster than AutoAugment in search time while achieving similar or better performance. Ho et al. [
33] proposed Population-Based Augmentation, which optimizes both the target network and the data augmentation strategy, and PBA is four orders of magnitude faster than AutoAugment in search time while achieving similar or better performance. Zhang et al. [
34] proposed Adversarial AutoAugment, based on generative adversarial networks, which uses adversarial loss and reinforcement learning to optimize the data augmentation strategy; it is 12 times faster than AutoAugment in search time while achieving the best performance on multiple datasets. However, OPIA still depends on MIA for its transformation operations and thus suffers from the same problems and limitations as model-free image augmentation. For industrial defect detection, we found no studies that use OPIA to improve model performance. This may be due to the lack of sufficiently large, high-quality training data and feedback signals in industrial defect detection, which makes it difficult for OPIA to learn effective data augmentation strategies or parameters.
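An augmentation policy in the sense used above, a sequence of operations with parameters, can be sketched as a list of `(name, magnitude)` pairs. The example below is illustrative only: the three operations, the `OPERATIONS` table, and the RandAugment-style `sample_policy` helper are assumptions, and the search procedures of the cited methods (reinforcement learning, Bayesian optimization, population-based training) are not shown.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Policy = List[Tuple[str, float]]


def adjust_brightness(image: Image, delta: float) -> Image:
    """Shift every pixel by delta and clamp to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]


def center_crop(image: Image, fraction: float) -> Image:
    """Keep the central region covering `fraction` of each side."""
    height, width = len(image), len(image[0])
    crop_h = max(1, round(height * fraction))
    crop_w = max(1, round(width * fraction))
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image: Image, magnitude: float = 0.0) -> Image:
    """Mirror every row; the magnitude argument is ignored."""
    return [row[::-1] for row in image]


OPERATIONS: Dict[str, Callable[[Image, float], Image]] = {
    "brightness": adjust_brightness,
    "crop": center_crop,
    "flip": horizontal_flip,
}


def apply_policy(image: Image, policy: Policy) -> Image:
    """Apply the operations of a policy in order."""
    for name, magnitude in policy:
        image = OPERATIONS[name](image, magnitude)
    return image


def sample_policy(rng: random.Random, num_ops: int, magnitude: float) -> Policy:
    """Pick num_ops operations uniformly at random with one shared magnitude."""
    names = sorted(OPERATIONS)
    return [(rng.choice(names), magnitude) for _ in range(num_ops)]
```

A policy such as `[("crop", 0.8), ("brightness", 0.2)]` mirrors the "crop 0.8 + brightness adjustment 0.2" example in the text; every operation is still a model-free transformation, which is why OPIA inherits the limitations of MIA.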
2.3. Model-Based Image Augmentation
Model-based Image Augmentation (MBIA) is an approach that leverages deep learning models to synthesize new data samples. With the advancement of deep learning, traditional data augmentation methods are gradually replaced by data augmentation algorithms based on deep learning frameworks. Deep learning models can learn latent feature distributions from raw data and can generate new data samples from random noise or conditional inputs. MBIA can effectively increase the size and diversity of datasets and can produce high-quality and high-fidelity data samples. Kuo et al. [
35] proposed FeatMatch based on Convolutional Neural Networks (CNNs) [
36], which replaces simple transformations in image space with complex transformations generated in feature space to achieve data augmentation effects in feature space, thus enhancing the data diversity and consistency. However, the lack of interpretability of vector data in feature space leads to difficult and time-consuming training. Therefore, Wong et al. [
37] shifted the perspective of data augmentation to data space and found that data augmentation in data space is superior to data augmentation in feature space. However, neither feature-space nor data-space augmentation methods sufficiently learn the true distribution of the sample data, which is why data augmentation methods based on Generative Adversarial Networks (GANs) [
38] have started to attract research attention.
GANs are a deep learning framework consisting of a generative model and a discriminative model that compete with each other. GANs can learn the underlying data distribution from raw samples and generate novel samples with diverse attributes such as types, positions, sizes, and shapes. The generation and discrimination processes are driven by a zero-sum game that moves the generated data distribution progressively toward the real one. However, GAN training faces many challenges due to its non-convex and non-cooperative nature. Mode collapse, gradient vanishing, and oscillation are common problems that degrade the quality and diversity of generated samples. Various GAN variants such as WGAN [
16], LS-GAN [
39], and f-GAN [
40] have introduced different loss functions and distance metrics to improve the similarity between generated and real samples. Likewise, models such as cGAN [
14], AC-GAN [
15], and InfoGAN [
41] have modified the architectures of generators and discriminators to increase the expressiveness and diversity of generative models. However, due to the complex and variable features of industrial defects, relying only on GANs to generate new industrial defect samples from random noise might lead to significant differences or biases compared to real samples. The generated samples might lack plausibility or credibility, which limits the application of GANs in industrial defect sample generation.
To endow GANs with more control mechanisms for sample generation, models such as Pix2Pix [
17] and CycleGAN [
21] have used translation between different images to impose constraints on generated samples, ensuring their closer approximation to real images. Based on this idea, some researchers have explored industrial defect sample generation. Qin et al. [
42] proposed Tree-CycleGAN, a cyclic generative adversarial network based on a symmetric tree structure. This method uses a tree-structured generator with maximal diversity loss to enable one-to-many generation mappings. Using a tree-structured reconstructor and dual discriminators, Tree-CycleGAN can generate multiple target domain samples from a single source domain sample while preserving differences and cyclic consistency across different branches. This method effectively alleviates the problem of industrial defect sample insufficiency.
Similarly, Song et al. [
43] introduced CycleGAN-TSS, a Texture Self-Supervised CycleGAN that leverages texture information as a self-supervisory signal to guide the generator in acquiring enhanced shadow features. Compared to traditional CycleGAN, CycleGAN-TSS can produce more realistic shadow images, thereby improving road crack detection performance. Niu et al. [
44] proposed a method that combines CycleGAN with a Defect Attention Module (DAM). This method adaptively adjusts the weights of defect regions and integrates structural similarity (SSM) into the original L1 loss to formulate the Defect Cycle Consistency Loss (DCL). By using grayscale and structural features, this method enhances the simulation of internal defect structures. Notably, unlike other GAN-based methods, this method yields clearer and more authentic defect images, thereby enhancing defect detection accuracy. In a different contribution, Shao et al. [
45] introduced DuCaGAN, a Dual Capsule Generative Adversarial Network based on CycleGAN. DuCaGAN uses the Dual Capsule Network (DCN) [
22] to generate diversified and high-fidelity industrial defect samples, which can be used for practical industrial data augmentation.
All these methods address the problem of industrial defect sample augmentation to some extent, but in real industrial manufacturing applications, they often suffer from low quality, low diversity, and low fidelity and do not adequately reflect the data distribution of the real industrial defect samples, which affects the detection accuracy and generalization of the deep learning-based defect detection model. Therefore, MBIA needs to design appropriate network structures and loss functions to adapt to different data characteristics and task requirements and balance the relationship between quality and diversity of generated samples while avoiding training difficulties such as mode collapse and gradient vanishing.
4. Validation Experiments
To assess the effectiveness of PreCaCycleGAN-generated defect samples in enhancing the generalization performance of real industrial defect detection models, we employed the DAGM 2007 dataset as an experimental platform. DAGM 2007 [
47] is a publicly accessible dataset of texture surface images with various types of defects, which simulates the real-world defect detection problem with high complexity and diversity. We compared the defect samples produced by PreCaCycleGAN with those produced by Tree-CycleGAN [
42] and CycleGAN-TSS [
43] and applied them to three state-of-the-art defect detection models that are widely adopted in practice, namely YOLOv5 [
48], SSD [
49], and Faster-RCNN [
50]. We evaluated the impact of the generated defect samples on the generalization performance of the defect detection models by measuring the mAP, false detection rate, and other metrics obtained with each of the three generative models across the detection models and datasets. All experiments were conducted on a single NVIDIA GeForce RTX 3060 GPU.
The training process lasted for 150 iterations with a batch size of 1. The learning rate was initially set to 0.0002 and was linearly decayed starting from the 100th iteration. The hyperparameters in the loss function were empirically determined. We set
to 10,
to 0.5,
to 0.02,
in the capsule network to 0.9,
to 0.1, and
to 0.5, respectively, and used the Adam optimizer with default parameters for gradient computation. The comparison plots of defects generated by the three generative models are shown in
Figure 4.
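The learning-rate schedule described above can be sketched as a small function. It assumes zero-indexed iterations and a decay that reaches zero at iteration 150; the text states only that the decay is linear and starts at the 100th iteration, so the end value and the function name are assumptions.

```python
def learning_rate(
    iteration: int,
    base_lr: float = 0.0002,
    decay_start: int = 100,
    total_iterations: int = 150,
) -> float:
    """Constant rate until decay_start, then linear decay (assumed to end at zero)."""
    if iteration < decay_start:
        return base_lr
    remaining = max(0, total_iterations - iteration)
    return base_lr * remaining / (total_iterations - decay_start)
```

Under these assumptions the rate stays at 0.0002 for the first 100 iterations and is halved by iteration 125.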
Visually, compared to the other two models, the PreCaCycleGAN model with DenseNet incorporated into the U-Net network generates more diverse defect samples in four datasets: (b), (h), (i), and (j). Moreover, in four datasets, (a), (e), (f), and (g), our model with a two-branch discriminator demonstrates a more refined feature representation. However, in the remaining two datasets, (c) and (d), our defect generation performance is not clearly superior to that of the other models.
Although our model appears to improve defect generation diversity and feature quality compared to the existing models from visual inspection, the generated defect samples need to be tested on actual industrial inspection models to verify their effectiveness. Therefore, we need further quantitative data to support the current superiority of our model.
4.1. Detection Model Training Validation
To validate the effectiveness of our model-generated defect samples in real industrial manufacturing, we selected three models YOLOv5 [
48], SSD [
49], and Faster-RCNN [
50], which are currently widely used in industrial inspection, and used the generated images to enhance their generalization. For each dataset, we designed three types of training sets, De-Train A, De-Train B, and De-Train C: De-Train A consists of 700 defect-free samples and 100 real defect samples; De-Train B consists of 700 defect-free real samples and 100 synthetic defect samples generated by the different models; and De-Train C consists of 700 defect-free samples, 50 real defect samples, and 50 synthetic defect samples generated by the different models.
Table 1,
Table 2 and
Table 3 show the corresponding detection accuracies for the three training sets De-Train A, De-Train B, and De-Train C.
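The composition of the three training sets can be written down as a small configuration. The sketch below is a hypothetical helper, not the authors' data pipeline: the `SplitSpec` class, the `build_split` function, and the idea of drawing samples from three pools without replacement are assumptions; only the counts come from the text.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SplitSpec:
    defect_free: int
    real_defects: int
    synthetic_defects: int

    @property
    def total(self) -> int:
        return self.defect_free + self.real_defects + self.synthetic_defects


# Counts taken from the description of De-Train A, B, and C.
TRAIN_SPECS: Dict[str, SplitSpec] = {
    "De-Train A": SplitSpec(defect_free=700, real_defects=100, synthetic_defects=0),
    "De-Train B": SplitSpec(defect_free=700, real_defects=0, synthetic_defects=100),
    "De-Train C": SplitSpec(defect_free=700, real_defects=50, synthetic_defects=50),
}


def build_split(
    spec: SplitSpec,
    defect_free_pool: List[str],
    real_pool: List[str],
    synthetic_pool: List[str],
    rng: random.Random,
) -> List[str]:
    """Draw the requested number of samples from each pool and shuffle them."""
    samples = (
        rng.sample(defect_free_pool, spec.defect_free)
        + rng.sample(real_pool, spec.real_defects)
        + rng.sample(synthetic_pool, spec.synthetic_defects)
    )
    rng.shuffle(samples)
    return samples
```

All three sets contain 800 samples with the same 7:1 ratio of defect-free to defective images, so differences in detection accuracy can be attributed to the origin of the defect samples rather than to the class balance.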
Out of the 60 results from De-Train B training and De-Train C training, PreCaCycleGAN achieved 51 top scores, and its results on the remaining items were very close to the top scores. In general, training on De-Train B improved the detection accuracy by 3–5% over training on De-Train A, while training on De-Train C improved it by 8–10% over training on De-Train A, which indicates that the defect samples generated by the different models all have a significant impact on the generalization and accuracy of the detection model. The improvement is especially evident when a mixture of generated and real defect samples is used to train the detection model, which is consistent with the actual situation in industrial manufacturing.
We further compared the defect samples generated by PreCaCycleGAN with those generated by Tree-CycleGAN and CycleGAN-TSS and found that the PreCaCycleGAN-generated defect samples provided better detail features for the detection models to learn across the datasets. Taking the YOLOv5 detection model as an example, PreCaCycleGAN-generated defect samples improved the detection accuracy by about 4% in (a), (b), (f), and (g), and by 1–2% in the remaining datasets, compared to the other two generative models. This suggests that our model can generate images with more detailed defect features and greater defect diversity. The same trend was observed for the SSD and Faster-RCNN detection models, indicating that the images generated by our model can be practically applied to industrial defect detection models and generalize well.
4.2. Detection Model Test Validation
To further verify that the generated images improve the generalization of the detection model, we constructed test sets that show the performance of the detection model along multiple dimensions and demonstrate the practicality of the defect samples generated by our model. For testing, we likewise set up three types of test sets for each dataset, De-Test A, De-Test B, and De-Test C: De-Test A is composed of 120 defect-free samples and 60 real defect samples; De-Test B is composed of 120 defect-free real samples and 60 synthetic defect samples generated by the different models; and De-Test C is composed of 120 defect-free samples, 30 real defect samples, and 30 synthetic defect samples, generated correspondingly by the different training sets and training models. During the experiments, we used the three most important metrics in real industrial manufacturing, data detection accuracy (DDA), defect detection rate (DDR), and false detection rate (FDR), to measure the detection performance [
51]. The data detection accuracy is the percentage of correctly detected defect data and defect-free data in the total data volume, the defect detection rate is the percentage of correctly detected defects in the total defect data, and the false detection rate is the percentage of defect-free samples incorrectly detected as defective among all samples detected as defective, as shown in Equations (15)–(17):

$$\mathrm{DDA} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (15)$$

$$\mathrm{DDR} = \frac{TP}{TP + FN} \times 100\% \qquad (16)$$

$$\mathrm{FDR} = \frac{FP}{TP + FP} \times 100\% \qquad (17)$$

where
TP is the number of defects correctly detected during testing,
TN is the number of defect-free samples correctly identified as defect-free,
FP is the number of defect-free samples incorrectly detected as defective, and
FN is the number of defects incorrectly judged as background. The IoU threshold of the detection model during testing is set to 0.25. The test results for datasets (a)–(j) are shown in
Table 4,
Table 5,
Table 6,
Table 7,
Table 8,
Table 9,
Table 10,
Table 11,
Table 12,
Table 13 and
Table 14.
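The three metrics can be computed directly from the four counts defined above. The following sketch follows the verbal definitions in this section; the function name, the zero-division guards, and the example counts are illustrative assumptions and are not taken from the tables.

```python
from typing import Tuple


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (DDA, DDR, FDR) in percent from confusion-matrix counts."""
    total = tp + tn + fp + fn
    actual_defects = tp + fn          # all samples that truly contain a defect
    detected_as_defective = tp + fp   # all samples flagged as defective
    dda = 100.0 * (tp + tn) / total if total else 0.0
    ddr = 100.0 * tp / actual_defects if actual_defects else 0.0
    fdr = 100.0 * fp / detected_as_defective if detected_as_defective else 0.0
    return dda, ddr, fdr


# Hypothetical counts for a test set of 120 defect-free and 60 defective samples.
dda, ddr, fdr = detection_metrics(tp=54, tn=114, fp=6, fn=6)
print(f"DDA={dda:.2f}%  DDR={ddr:.2f}%  FDR={fdr:.2f}%")
```

Because FDR is normalized by the number of samples flagged as defective rather than by the number of defect-free samples, a single false alarm can move it by several percentage points on test sets of this size.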
4.3. Discussion
Out of 360 test results covering ten datasets, three generation models, and three detection models, our algorithm achieved 354 optimal results. The following averages are taken over the ten datasets for detection models trained with our generated defects mixed with real defects, compared in turn with the original detection model, the CycleGAN-TSS generation model, and the Tree-CycleGAN generation model. For YOLOv5, the detection accuracy improved by 5.75%, 1.16%, and 1.26%, the detection rate improved by 14.94%, 3.60%, and 3.55%, and the false detection rate decreased by 5.44%, 1.22%, and 1.63%, respectively. For SSD, the detection accuracy improved by 5.46%, 1.23%, and 1.27%, the detection rate improved by 13.73%, 3.76%, and 3.35%, and the false detection rate decreased by 6.74%, 1.89%, and 2.42%, respectively. For Faster-RCNN, the detection accuracy improved by 5.64%, 0.88%, and 1.42%, the detection rate improved by 14.54%, 3.22%, and 3.62%, and the false detection rate decreased by 5.64%, 0.89%, and 2.26%, respectively.
The test results show that, compared with Tree-CycleGAN and CycleGAN-TSS, our model extracts the local and global features of defects more effectively and improves the fineness of features by adding the perceptual loss function and the capsule discriminator, as seen for (d) and (i), which are datasets with complex backgrounds and indistinct defect features. The Class4 and Class9 detection results are shown in
Figure 5 and
Figure 6. In (d), the false detection rate of the original Faster-RCNN model is 12.12%; for the detector trained and tested on mixed samples, it is 7.89% with Tree-CycleGAN samples and 5.26% with CycleGAN-TSS samples, and our model reduces it to 2.70%. In (i), the false detection rate of the original YOLOv5 model is 19.61%; for the detector trained and tested on mixed samples, it is 10.00% with Tree-CycleGAN samples and 10.53% with CycleGAN-TSS samples, and our model reduces it to 7.59%.
For datasets with obvious defect features, such as (b) and (f), our model combines U-Net and DenseNet to enhance the defect feature representation of the samples and improve the ability of the defect detection model to learn defect features. The Class2 and Class6 detection results are shown in
Figure 7 and
Figure 8. the detection rate of the original YOLOv5 model in (b) is 81.67%, and the detection of the mixed samples after training of the mixed samples Tree. The detection rate of the original YOLOv5 model in (f) is 73.33%, and the detection rate of the Tree-CycleGAN model for detecting mixed samples after training of mixed samples is 91.80%, and our model can improve the detection rate to 97.26%, with a detection rate of 84.38%, and the CycleGAN-TSS model with a false detection rate of 85.48%, our model was able to improve the detection rate to 91.04%.
The comparison experiments show that our model can generate high-quality defect samples on different types of defect datasets, and that mixing them with real defect samples further improves the generalization ability and robustness of the defect detection model, which can then effectively identify defects with indistinct features in complex backgrounds, and further enhances the authenticity and diversity of defect features.
We subsequently optimized the IoU value of our best model for actual industrial defect detection; with an IoU value of 0.15, the false detection rate is 0% and the average accuracy across the datasets reaches 98.73%. The experiments show that the defect samples generated by our model are better than those generated by current defect sample generation models and can be practically applied to industrial defect detection, where they effectively improve the robustness and generalization of the defect detection model.
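The effect of changing the IoU value can be illustrated with a short sketch. It assumes that the IoU value acts as the threshold for matching a predicted box to a ground-truth box and that boxes are given as `(x_min, y_min, x_max, y_max)`; both are assumptions for illustration, since the text does not specify how the value is applied in the detection pipeline.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted: Box, ground_truth: Box, threshold: float = 0.25) -> bool:
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold
```

For example, the boxes `(0, 0, 4, 4)` and `(2, 1, 6, 5)` overlap with an IoU of 6/26, about 0.23, so under these assumptions the prediction is rejected at a threshold of 0.25 but accepted at 0.15.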