1. Introduction
Automotive machines, such as bearings, gearboxes, and rotors, play an indispensable role in transportation vehicles. Motor bearing, gearbox, and rotor failures significantly impact vehicle driving safety. When a vehicle malfunctions, pinpointing the exact failing component is often challenging and requires disassembling the assembly to identify the issue [1]. This process makes fault determination slow and inconvenient. In addition, as technology advances, the structure of these automotive machines is becoming increasingly sophisticated and their application scenarios more complex, which increases their susceptibility to failure or damage. Such failures can reduce the operational efficiency of mechanical equipment and cause vehicle shutdowns. Hence, studying fault diagnostic methods for automotive machines holds significant theoretical and engineering value.
Traditional methods based on pure signal processing remain theoretically valid, but they leave significant room for improvement in intelligence [2,3]. Meanwhile, deep learning (DL) can extract features from complex vibration, sound, or other sensor data and learn patterns that separate the different states of an automotive machine, leading to more accurate and earlier fault diagnosis [4]. Deep neural networks for fault diagnosis mainly include the recurrent neural network (RNN), auto-encoder (AE), and convolutional neural network (CNN). These networks do not require empirical knowledge and can extract feature information from data or samples adaptively. Among these models, CNNs use multiple convolutional kernels to extract features from input samples or upper-layer features [5,6]. Each kernel multiplies its weights element-wise with a patch of the input features, sums the products, and adds a bias term. Compared to RNNs and AEs, CNNs have notable advantages due to their increased network depth and weight sharing. The former, with deep residual learning [7], allows CNNs to increase their depth without encountering the vanishing gradient problem common in RNNs [8], thus enabling effective feature extraction from sensor time series data over longer periods. The latter reduces the number of trainable parameters, lowering model complexity and mitigating the overfitting problem prevalent in AEs [9], which improves model generalization [10,11].
With these advantages, CNNs are widely used in fault diagnosis. For instance, an improved CNN model incorporating empirical mode decomposition (EMD) was proposed to enable end-to-end diagnosis, enhancing accuracy and anti-interference capabilities [12]. Additionally, a diagnostic method based on persistence spectrum imaging and the residual network (ResNet) structure was proposed. The improved ResNet structure allows for direct connections between different feature maps, facilitating the extraction of discriminant features [13]. Moreover, the CBAM-ResNet, which comprises the convolutional block attention module (CBAM) and a modified ResNet, was created to improve network feature extraction efficiency while maintaining high accuracy [14].
These DL-based fault diagnostic methods perform well and can accurately recognize different types of faults. However, they depend on large and balanced sample sets. In real-world conditions, data acquisition for automotive machines is often limited by practical constraints. Moreover, since automotive machines operate in a normal state most of the time, it is easier to collect a substantial amount of normal-state data than fault data, which leads to an imbalance among fault types. This imbalance restricts the diagnostic accuracy [15].
To balance imbalanced sample sets, DL-based generative models are employed to synthesize the missing samples for each faulty class. The generative models used for this purpose mainly include the variational auto-encoder (VAE), generative adversarial network (GAN), and diffusion model. In a diffusion model, the noising process is a multi-step procedure that gradually applies noise to samples. Conversely, the reverse denoising process gradually removes noise from the sample over multiple steps. This dual process allows the diffusion model to fulfill two key objectives: (1) starting from random noise samples ensures the diversity of the synthesized samples; (2) the gradual denoising process allows for fine control, enhancing the fidelity of the synthetic samples and avoiding both the low fidelity of VAE-synthesized samples and the mode collapse of GANs [16,17].
Currently, diffusion models are developing rapidly in sample synthesis tasks. In 2021, Dhariwal et al. [18] introduced the classifier-guided mechanism to diffusion models, proposing the classifier-guided diffusion model (CGDM). This model can outperform the GAN by using a gradient scale that weighs diversity against fidelity. We aimed to augment the imbalanced fault sample set by synthesizing specific fault samples using the CGDM. The CGDM allows for a trade-off between the diversity and fidelity of synthetic samples by adjusting the gradient scale, offering the potential to improve the quality of synthetic fault samples under varying imbalance ratios. Nevertheless, determining the proper value of the gradient scale as a hyperparameter remains an open question. Dhariwal et al. [18] suggested that the gradient scale should be set at an intermediate point where the overall sample quality, considering diversity and fidelity, is highest. We sought a clearer indicator for setting the gradient scale than the vague “intermediate point”.
Since diversity entropy (DivEn) can reflect the diversity of time series [19], it and its variants have been applied in machine fault diagnosis to extract features from time series [20,21,22]. However, they have not yet been reported in imbalanced fault diagnosis. We aimed to map the DivEn of machine signals at different imbalance ratios to an appropriate gradient scale for high-quality sample synthesis. Nevertheless, DivEn is not sensitive to time series at different imbalance ratios, as illustrated in Section 2.2. To address this problem, we propose fractional diversity entropy (FrDivEn) by incorporating fractional order calculus into DivEn. Fractional calculus generalizes the classical theory of integer-order differentiation and integration. Both the theory and its applications demonstrate that the fractional calculus operator effectively describes many complex systems. Owing to these characteristics, fractional calculus is extensively studied in fields such as signal processing [23,24], image processing [25,26], and machine learning [27,28]. We propose FrDivEn to sensitively reflect the diversity of vibration signals and to better balance the diversity and fidelity of the synthetic samples.
Furthermore, we present a novel imbalanced diagnostic method for automotive machines by integrating the CGDM with a gradient scale corresponding to FrDivEn, the Gramian angular field (GAF) transformation, and the fine-tuned pretrained ConvNeXt. Specifically, the method first transforms time series signals into GAF image samples using GAF transformation. Then, high-quality samples are synthesized using the CGDM with a gradient scale corresponding to FrDivEn. These synthetic samples are combined with imbalanced real samples to obtain a mixed sample set, which is subsequently fed to a fine-tuned pretrained ConvNeXt for fault diagnosis. The contributions of this paper are summarized as follows:
- (1) For fault diagnosis with imbalanced sample sets in automotive machines, it is necessary to balance the diversity and fidelity of synthetic samples. We propose a new vibration signal measure, fractional diversity entropy (FrDivEn), to reflect signal diversity and adjust the generative model’s emphasis on diversity versus fidelity in sample synthesis. Unlike the traditional DivEn, which is insensitive to signal diversity at different imbalance ratios, FrDivEn varies sensitively with the imbalance ratio of the signal and therefore reflects signal diversity more effectively.
- (2) To select the appropriate gradient scale of the CGDM and achieve high-quality sample synthesis, we propose using FrDivEn to determine the gradient scale. This approach results in better sample synthesis compared to other generative models.
- (3) To boost diagnostic accuracy in automotive machines, we present a fault diagnostic method. This method uses the CGDM with a gradient scale corresponding to FrDivEn as the sample synthesizer and a fine-tuned pretrained ConvNeXt as the fault classifier. Experiments show that this method can be extended to various automotive machines and achieves higher diagnostic accuracy than other combinations of sample synthesizers and fault classifiers.
The remainder of this paper is arranged as follows: The algorithms of DivEn and FrDivEn are provided in Section 2. The proposed imbalanced diagnostic method is described in Section 3. The experiments designed to verify the validity and generalizability of the method are presented in Section 4. Finally, the conclusions of this study are drawn in Section 5.
2. Algorithms
In this section, the algorithms for the diversity entropy (DivEn) and the proposed fractional diversity entropy (FrDivEn) are presented.
2.1. Diversity Entropy
For a given time series $X = \{x_1, x_2, \ldots, x_N\}$, the diversity entropy (DivEn) can be derived according to the following steps.
Step 1: Phase space reconstruction. The time series is reconstructed into orbits using an embedding dimension $m$ [29]. This reconstruction creates overlapping subsequences and allows the system’s dynamics to be analyzed through the geometric properties of the reconstructed phase space. $X$ is divided into $N - m + 1$ subsequences. Each subsequence $y_i(m)$ is formed as $y_i(m) = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}$. The reconstructed matrix $Y(m)$ consists of rows that are segments of $X$, each of length $m$. This matrix can reveal patterns and structures that are not apparent in the original time series, facilitating further analysis.
The phase space reconstruction matrix $Y(m)$ is given by
$$Y(m) = \begin{bmatrix} y_1(m) \\ y_2(m) \\ \vdots \\ y_{N-m+1}(m) \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ x_2 & x_3 & \cdots & x_{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-m+1} & x_{N-m+2} & \cdots & x_N \end{bmatrix}$$
The rows of this matrix correspond to the subsequences.
Step 2: Cosine similarity calculation. The similarity between each row and the next row in the phase space matrix is calculated to yield a set of similarities $D(m)$. This series of similarities describes the relations between successive states in the reconstructed phase space. The cosine similarity is the cosine of the angle between two non-zero vectors and reflects their directional alignment.
The series of cosine similarities is given by
$$D(m) = \{d_1, d_2, \ldots, d_{N-m}\}$$
where $d_i = d\big(y_i(m), y_{i+1}(m)\big)$. The similarity between two rows $y_i(m)$ and $y_j(m)$ is defined as
$$d\big(y_i(m), y_j(m)\big) = \frac{\sum_{k=1}^{m} x_{i+k-1}\, x_{j+k-1}}{\sqrt{\sum_{k=1}^{m} x_{i+k-1}^{2}}\,\sqrt{\sum_{k=1}^{m} x_{j+k-1}^{2}}}$$
The cosine similarity ranges from −1 to 1. A value of 1 indicates that the two rows point in the same direction, 0 indicates that they are orthogonal (no similarity), and −1 indicates that they point in opposite directions. High cosine similarity values indicate similar dynamic changes between two rows, while low values indicate diverse dynamic behavior.
Step 3: State probability calculation. The range $[-1, 1]$ is partitioned into $\varepsilon$ intervals denoted as $(I_1, I_2, \ldots, I_{\varepsilon})$. This partitioning sorts the cosine similarity values into discrete intervals, from which the state probabilities are calculated. The state probabilities $(P_1, P_2, \ldots, P_{\varepsilon})$ are obtained by counting the cosine similarity values that fall in each interval and normalizing by the total number of values, $N - m$. The sum of state probabilities is equal to 1, i.e., $\sum_{k=1}^{\varepsilon} P_k = 1$, so the distribution is valid and interpretable.
Step 4: Diversity entropy calculation. DivEn is calculated based on the state probabilities obtained from the partitioned cosine similarities using the following formula:
$$\mathrm{DivEn}(m, \varepsilon) = -\frac{1}{\ln \varepsilon} \sum_{k=1}^{\varepsilon} P_k \ln P_k$$
where $\varepsilon$ means the number of intervals, and $P_k$ are the elements of the state probability.
DivEn is the expectation of the diversity between the rows of the phase space matrix. It quantifies how evenly the cosine similarities are distributed. The range of DivEn is $[0, 1]$, according to the original entropy theory [30]. When DivEn tends to 0, this indicates low complexity in the time series, suggesting a dynamic system with similar phenomena or repetitive patterns. When DivEn tends to 1, this indicates high complexity in the time series, suggesting a dynamic system with diverse phenomena or more varied behavior.
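The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `diversity_entropy`, the default of 30 intervals, the equal-width binning, and the choice to skip all-zero rows are assumptions made for the example.

```python
import math

def diversity_entropy(x, m=4, eps=30):
    """Normalized diversity entropy of a 1-D sequence (illustrative sketch)."""
    n = len(x)
    # Step 1: phase space reconstruction into overlapping rows of length m.
    rows = [x[i:i + m] for i in range(n - m + 1)]
    # Step 2: cosine similarity between each row and the next one.
    sims = []
    for a, b in zip(rows, rows[1:]):
        dot = sum(p * q for p, q in zip(a, b))
        norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
        if norm == 0.0:
            continue  # assumption: skip rows for which the cosine is undefined
        sims.append(max(-1.0, min(1.0, dot / norm)))
    if not sims:
        raise ValueError("series too short or all-zero")
    # Step 3: state probabilities over eps equal-width intervals on [-1, 1].
    counts = [0] * eps
    for d in sims:
        k = min(int((d + 1.0) / 2.0 * eps), eps - 1)
        counts[k] += 1
    probs = [c / len(sims) for c in counts if c > 0]
    # Step 4: Shannon entropy normalized by ln(eps), so the value lies in [0, 1].
    return -sum(p * math.log(p) for p in probs) / math.log(eps)
```

A constant series puts every similarity in one interval and yields an entropy of 0, while an irregular series spreads the similarities over many intervals and pushes the value toward 1.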
2.2. Fractional Diversity Entropy
DivEn can characterize the diversity of time series. However, we find that DivEn values differ only slightly for vibration signals with different imbalance ratios, which reduces the effectiveness of DivEn in measuring the diversity of time series.
To illustrate this problem, the CWRU bearing dataset with a motor load of 0 HP and a speed of 1797 rpm is used as an example. The vibration signals are cut and spliced at different ratios for the nine fault states to create five time series. These time series simulate scenarios where faults do not occur uniformly, providing a basis for analyzing how imbalance affects system dynamics. The specific allocation of bearing faults in the imbalanced time series is detailed in Table A1, Appendix A. Following the original research on DivEn [19], we set the embedding dimension $m$ to 4 and calculated DivEn for each of the five time series. The results are shown in Table 1. The DivEn values of vibration signals at two adjacent imbalance ratios differ by no more than 0.01. These results show that DivEn is limited in characterizing fault vibration data at different imbalance ratios and cannot clearly reflect the diversity of such vibration signals.
To address this problem and make DivEn sensitive to vibration signals at different imbalance ratios, we combine DivEn with fractional order calculus to propose fractional diversity entropy (FrDivEn), which measures the diversity of time series. FrDivEn is derived from DivEn and Shannon entropy at a fractional order $\alpha$. It extends the concept of DivEn with fractional calculus, enhancing the measurement of system complexity.
Shannon entropy has been extended with fractional calculus [31]; the extension is referred to as ShannonEnα. This extension allows for a more flexible and detailed analysis of the underlying dynamics of time series data.
The generalized expression for ShannonEnα is given by
$$\mathrm{ShannonEn}_{\alpha} = \sum_{i} p_i \left\{ -\frac{p_i^{-\alpha}}{\Gamma(\alpha + 1)} \left[ \ln p_i + \psi(1) - \psi(1 - \alpha) \right] \right\}$$
where $\alpha$ denotes the fractional order, $\Gamma(\cdot)$ represents the gamma function, $\psi(\cdot)$ represents the digamma function, and $p_i$ are the elements of the state probability in Shannon entropy calculation. This formula introduces fractional exponents and special functions to adjust the traditional entropy calculation.
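As a numerical illustration, the sketch below evaluates a fractional-order Shannon entropy of the commonly cited form $\sum_i p_i \{ -p_i^{-\alpha} / \Gamma(\alpha+1) \, [\ln p_i + \psi(1) - \psi(1-\alpha)] \}$. That form, the restriction $-1 < \alpha < 1$, the function names, and the finite-difference digamma are assumptions made for the example; they are not taken from the authors' code.

```python
import math

def digamma(x):
    """Approximate the digamma function as the derivative of ln(Gamma(x))."""
    h = 1e-5 * x  # relative step keeps x - h positive for small x
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def fractional_shannon_entropy(probs, alpha):
    """Fractional-order Shannon entropy (assumed form, see the lead-in)."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("this sketch assumes -1 < alpha < 1")
    shift = digamma(1.0) - digamma(1.0 - alpha)
    scale = math.gamma(alpha + 1.0)
    total = 0.0
    for p in probs:
        if p <= 0.0:
            continue  # empty states contribute nothing
        total += p * (-(p ** -alpha) / scale) * (math.log(p) + shift)
    return total
```

With `alpha = 0` the gamma factor is 1 and the digamma terms cancel, so the function reduces to the classical Shannon entropy, which is a convenient sanity check on any implementation.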
FrDivEn is derived from the generalized expression of Shannon entropy and fluctuation-based calculus [32]. This extension enhances the traditional entropy measures by incorporating the concept of fractional calculus, providing a more detailed analysis of time series data.
FrDivEn at fractional order $\alpha$ is written as FrDivEnα, defined as
where the differential operator denotes the derivative of fractional order $\alpha$, introducing the concept of fractional differentiation into the entropy calculation.
Combining with Equation (6), the FrDivEnα for the original time series is given by
where $\alpha$ denotes the fractional order, $\Gamma(\cdot)$ represents the gamma function, $\psi(\cdot)$ represents the digamma function, $\varepsilon$ denotes the number of intervals, and $P_k$ are the elements of the state probability. This formula allows for a more adaptable and comprehensive calculation of entropy, reflecting the patterns of diversity within the time series data.
3. Proposed Imbalanced Fault Diagnostic Method
To enhance automotive machine diagnostic accuracy on limited and imbalanced fault data, we propose an innovative fault diagnostic method that introduces FrDivEn to trade off the classifier-guided diffusion model’s (CGDM) sample synthesis.
First, to fully utilize the advantages of convolutional neural networks (CNNs) in image classification [
33] for fault diagnosis, Gramian angular field (GAF) transformation is employed to convert the raw vibration signals of automotive machines into GAF images. Then, to balance the number of GAF images for each fault state, the CGDM with the FrDivEn trade-off is applied to synthesize high-quality GAF images. Next, the real samples and synthetic samples are combined into a single balanced sample set. Finally, to achieve highly accurate fault diagnosis, a fine-tuned ConvNeXt model based on transfer learning is implemented. For a given automotive machine vibration signal collection platform, four processes need to be performed.
3.1. Preprocess
In automotive machine fault diagnosis, the raw vibration signals collected by the sensor are converted into GAF images through GAF transformation. The GAF transformation is a time series encoding method that benefits tasks such as classification and imputation [34]. The basic idea of GAF is to combine a coordinate transformation with the Gramian matrix. The detailed derivation of GAF transformation is provided in Appendix B. In this fault diagnostic method, GAF transformation is used to extract the temporal and numerical relationships of the vibration signals and to represent these relationship features as GAF images.
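For orientation, the widely used form of the GAF encoding can be sketched as follows: the signal is rescaled to [−1, 1], each value is mapped to an angle by the arccosine, and the image entries are trigonometric functions of angle sums or differences. The function name, the min-max rescaling, and the `kind` switch are assumptions for this example; the paper's own derivation may differ in detail.

```python
import math

def gramian_angular_field(x, kind="summation"):
    """Encode a 1-D signal as a Gramian angular field matrix (sketch)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    # Coordinate transformation: rescale to [-1, 1], then map to angles.
    scaled = [(2.0 * v - hi - lo) / (hi - lo) for v in x]
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Gramian-style matrix of pairwise angular relationships.
    if kind == "summation":   # GASF: cos(phi_i + phi_j)
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":  # GADF: sin(phi_i - phi_j)
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")
```

A signal segment of length n yields an n × n matrix, so the segment length directly sets the image resolution before any resizing.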
3.2. Sample Synthesis
After obtaining real image samples, we use a classifier-guided diffusion model (CGDM), assisted by FrDivEn, to synthesize samples that fill the imbalanced sample set. The CGDM comprises noising and denoising processes: (1) Noising process: random noise is gradually added to the original image sample $x_0$, resulting in a pure-noise image $x_T$ after $T$ steps. (2) Denoising process: the noise is progressively removed from the noise image according to a learned conditional distribution, yielding the synthetic image sample after $T$ steps. As implied by its name, the CGDM incorporates a classifier to guide sample synthesis. This guidance enhances overall sample quality by balancing diversity and fidelity. Dhariwal et al. [18] found that increasing the classifier’s gradient scale boosts fidelity at the cost of diversity in synthetic samples, introducing a trade-off between sample fidelity and diversity. For instance, high fidelity for bearings means that the CGDM can accurately synthesize specific fault samples, such as ball or race faults, but this reduces the overall diversity of synthetic samples. Therefore, adjusting the gradient scale offers a trade-off between the diversity and fidelity of synthetic samples. The schematic diagram of the CGDM is shown in Figure 1.
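To make the role of the gradient scale concrete, the sketch below shows a single guided denoising step as we understand the sampling rule of [18]: the mean predicted by the diffusion model is shifted by the classifier gradient, weighted by the variance and the gradient scale, before the next sample is drawn. The function names, the diagonal variance, and the list-based vectors are assumptions for this toy example; a real CGDM operates on image tensors with neural networks supplying the mean, variance, and gradient.

```python
import math
import random

def guided_mean(mean, variance, grad_log_prob, scale):
    """Shift the denoising mean along the classifier gradient.

    grad_log_prob is the gradient of log p(class | x) with respect to x.
    """
    return [m + scale * v * g for m, v, g in zip(mean, variance, grad_log_prob)]

def guided_denoise_step(mean, variance, grad_log_prob, scale, rng=random):
    """Draw the next, less noisy sample from the shifted Gaussian."""
    shifted = guided_mean(mean, variance, grad_log_prob, scale)
    return [rng.gauss(mu, math.sqrt(v)) for mu, v in zip(shifted, variance)]
```

A scale of 0 leaves the mean untouched, which corresponds to unguided sampling; larger scales pull every sample more strongly toward what the classifier considers typical of the target class, which is the fidelity gain and diversity loss described above.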
However, no established measure exists for picking the appropriate gradient scale. Given that the proposed FrDivEn can sensitively represent the diversity of vibration signals, we propose to tune the CGDM’s gradient scale for high-quality sample synthesis using FrDivEn. The combination of FrDivEn and the CGDM aims to achieve a better trade-off between fault sample diversity and fidelity. The FrDivEn trade-off between diversity and fidelity consists of the following three steps.
Step 1: Fractional order analysis. Machado [35] highlights that the fractional order enhances the description of system dynamics. Adjusting the fractional order changes the sensitivity of the entropy, so that more subtle variations and patterns within the vibration signals can be captured. To choose suitable α values, we calculate FrDivEn results of the automotive machine vibration signal at different imbalance ratios and various α values, analyze the results, and summarize the trends associated with FrDivEn.
Step 2: Gradient scale analysis. To establish the correspondence between the FrDivEn of vibration signals and the appropriate gradient scale to trade off sample synthesis, we must find the gradient scale that yields the highest fault diagnostic accuracy at different imbalance ratios. We use the CGDM with varying gradient scales to synthesize fault samples, filling the imbalanced sample set at different imbalance ratios. Subsequently, we analyze the effect of gradient scales on the fault diagnostic accuracy. The gradient scale with the highest fault diagnostic accuracy is then chosen to correspond to the FrDivEn of the imbalance ratio.
Step 3: FrDivEn–gradient scale curve fitting. We map the FrDivEn and the appropriate gradient scale at different imbalance ratios onto Cartesian coordinates, obtain several corresponding points, and fit a FrDivEn–gradient scale curve through these points. With this fitted curve, we can find correspondence from any FrDivEn to the appropriate gradient scale. In this study, FrDivEn–gradient scale fitting curves are obtained from the CWRU bearing dataset. Theoretically, we can apply this fitting curve, i.e., the correspondence between FrDivEn and the gradient scale, to other automotive machines similar to the rolling bearing.
The comparison of the proposed FrDivEn with the existing DivEn, the selection process of the gradient scale in the CGDM, and the fitting process are elucidated in detail in Section 4.2. Additionally, we incorporate DivEn and FrDivEn into the CGDM, respectively, and demonstrate their effectiveness by validating them against other generative models through fault diagnostic experiments. The related experiments and discussions are included in Section 4.3.
3.3. Sample Mix
Synthetic fault samples are produced through the CGDM. These samples supplement the real-world samples, enhancing the sample set for CNN training. A mixer function combines the two kinds of samples into a single set.
The mixed sample set, $S_{\mathrm{mix}}$, is determined using the mixer function $\mathrm{Mix}(\cdot)$, integrating the real samples $S_{\mathrm{real}}$ and synthetic samples $S_{\mathrm{syn}}$:
$$S_{\mathrm{mix}} = \mathrm{Mix}\left(S_{\mathrm{real}}, S_{\mathrm{syn}}\right)$$
where $\mathrm{Mix}(\cdot)$ denotes the sample mixing process; $S_{\mathrm{real}}$ is the imbalanced real sample set; and $S_{\mathrm{syn}}$ is the supplementary synthetic sample set, which is produced by the CGDM. This integration aims to create a balanced sample set for fault diagnosis.
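One plausible reading of the mixing step is sketched below: every class is topped up with synthetic samples until it matches the size of the largest real class. The function name `mix_samples`, the `(image, label)` tuple layout, and the top-up rule are assumptions for illustration; the source does not spell out the mixer's internals.

```python
from collections import Counter

def mix_samples(real, synthetic):
    """Combine real and synthetic (image, label) pairs into a balanced set."""
    counts = Counter(label for _, label in real)
    target = max(counts.values())  # size of the largest real class
    mixed = list(real)
    for label, have in counts.items():
        need = target - have
        pool = [sample for sample in synthetic if sample[1] == label]
        if len(pool) < need:
            raise ValueError(f"not enough synthetic samples for class {label!r}")
        mixed.extend(pool[:need])  # add only as many as the class is missing
    return mixed
```

With four real normal-state images and one real fault image, for example, three synthetic fault images are added and any surplus synthetic images are left unused.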
3.4. Fault Diagnosis
After obtaining the mixed sample set, features from the image samples are extracted to achieve accurate fault diagnosis. The ConvNeXt structure is combined with transfer learning to enhance fault diagnostic accuracy. ConvNeXt’s excellent performance in image classification tasks was validated using the ImageNet sample set [36]. As a deep CNN model, ConvNeXt requires extensive parameter tuning after initialization. When trained from scratch over limited epochs, it may not fully converge or learn the necessary features, which leads to unsatisfactory fault diagnostic accuracy. Transfer learning significantly reduces the need for extensive parameter tuning [37]. The pretrained ConvNeXt model can leverage hierarchical representations learned from the ImageNet sample set, enhancing its performance on the target task.
The ConvNeXt model was pretrained on the ImageNet sample set, while the target dataset comprises GAF images. When there is dissimilarity between ImageNet images and GAF images, more layers should be fine-tuned for effective fault diagnosis. Fine-tuning adjusts the model to better recognize and classify the specific features of GAF images, which differ from the natural images of ImageNet. The fine-tuning process involves adjusting ConvNeXt blocks and other layers at the lower level. The original output classes of the last linear layer, corresponding to ImageNet, are replaced with classes representing possible automotive machine working states. This ensures that the model’s predictions are relevant to the fault diagnostic task. The fine-tuned ConvNeXt is illustrated in Figure 2, which shows how the model’s architecture is adapted.
To sum up, the proposed method, shown in Figure 3, comprises a preprocessing module, a sample synthesis module based on the CGDM with the FrDivEn trade-off, a sample mix module, and a fault diagnostic module based on a fine-tuned pretrained ConvNeXt.
4. Experiments and Discussion
To explore the correspondence between the proposed FrDivEn and the gradient scale, and to verify the effectiveness and generalization of the imbalanced fault diagnostic method, we conducted automotive machine fault diagnosis experiments.
4.1. Experimental Setup
In our experiments, the algorithms were implemented using PyTorch 2.2.1 and run on a platform equipped with an Intel Core i9-12900K CPU, 2 × 16 GB of DDR5 RAM, and an NVIDIA GeForce RTX 3090 GPU.
Following the original papers [36,37] and considering the fault diagnosis performance as well as hardware constraints, the training settings were as follows. AdamW was used as the network optimizer, and cross-entropy was selected as the loss function. The batch size was set to 16. The training of the entire ConvNeXt model comprised 120 epochs, sequentially divided into two stages: 20 epochs for training the output layer and 100 epochs for training the other layers. (1) Output layer training: the learning rate was 0.01 for the first 10 epochs and then decayed to 0.001 for the remaining 10 epochs. (2) Other layer training: the learning rate was fixed at 0.0001 for fine-tuning over 100 epochs. To minimize the impact of stochasticity on the experiments, we repeated each fault diagnostic experiment five times for every combination of settings or modules, such as imbalance ratios, sample synthesizers, and fault classifiers. The median of these five fault diagnostic accuracies was selected as the experimental result. The detailed training process setup is presented in Table 2.
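The two-stage schedule and the median-of-five reporting rule can be written down compactly. The sketch below only encodes the numbers stated above; the function names, the 0-based epoch index, and the stage labels are assumptions for illustration, and the actual optimizer and layer-freezing calls are omitted.

```python
import statistics

def training_phase(epoch):
    """Return (trainable part, learning rate) for a 0-based epoch index."""
    if not 0 <= epoch < 120:
        raise ValueError("the schedule covers epochs 0 to 119")
    if epoch < 10:
        return ("output layer", 0.01)    # stage 1, first 10 epochs
    if epoch < 20:
        return ("output layer", 0.001)   # stage 1, after the decay
    return ("other layers", 0.0001)      # stage 2, fine-tuning

def reported_accuracy(run_accuracies):
    """Median accuracy over the five repeated runs of one configuration."""
    if len(run_accuracies) != 5:
        raise ValueError("expected exactly five runs")
    return statistics.median(run_accuracies)
```

Reporting the median rather than the mean keeps a single unusually good or bad run from shifting the reported accuracy.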
4.2. FrDivEn Trade-Off
To find a trade-off between synthetic samples’ diversity and fidelity and achieve high-quality sample synthesis, we introduced FrDivEn, a sensitive measure of time series diversity. In exploring the correspondence between the proposed FrDivEn and the CGDM’s appropriate gradient scale, we chose the CWRU bearing vibration data as the dataset. This was consistent with the calculation of DivEn in Section 2.2, ensuring a fair comparison. We proceeded sequentially with the three steps described in Section 3.2.
Step 1: Fractional order analysis. We calculated the FrDivEn results of the automotive machine vibration time series at different imbalance ratios and α values. The previously calculated DivEn and the FrDivEn results at different fractional orders α are illustrated in Table 3 and Figure 4. From the calculation results, the following can be observed: (1) When other conditions are constant, FrDivEn gradually decreases as the imbalance ratio increases. Taking FrDivEn0.1 as an instance, it decreases from 15.5899 at an imbalance ratio of 2:1 to 6.7208 at an imbalance ratio of 40:1, indicating that the diversity of the time series diminishes as the imbalance problem worsens. (2) When other conditions are constant, FrDivEn increases drastically with the fractional order α, with FrDivEn0.4 at an imbalance ratio of 2:1 even exceeding 140. (3) The difference between the FrDivEn results computed from two adjacent imbalance ratios is larger than that of DivEn. For example, the FrDivEn0 difference between vibration signals at imbalance ratios of 2:1 and 5:1 is 0.9158, which is substantially larger than the DivEn difference of 0.0061. This indicates that the sensitivity of the proposed FrDivEn to the vibration signal is enhanced compared to DivEn.
Step 2: Gradient scale analysis. After completing the analysis of FrDivEn, we employed the CGDM with differing gradient scales to select appropriate values. Specifically, for each imbalance ratio, finding the appropriate gradient scale was divided into coarse and fine sampling: (1) Coarse sampling of the gradient scale: we swept over the gradient scale values
, consistent with Dhariwal et al.’s [
18] method when performing sample synthesis via the CGDM on ImageNet 256 × 256 (which is the same size as our selected sample size). (2) Fine sampling of the gradient scale: we denoted the gradient scale value that achieved the highest diagnostic accuracy in coarse sampling as
, and we swept over the interval
at intervals of 0.1, taking the gradient scale that achieved the highest diagnostic accuracy here as the appropriate gradient scale. The imbalanced sample set was allocated one normal state and
n fault states to classify, as shown in
Table 4. The appropriate gradient scales selected at different imbalance ratios are shown in
Table 5.
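The coarse-then-fine selection can be expressed as a small search routine. In the sketch below, `evaluate` stands for a full synthesize, mix, train, and test cycle that returns the diagnostic accuracy for a given gradient scale; the function name, the `half_width` of the fine window, and the rule of discarding non-positive candidates are assumptions, since the exact grids are not reproduced here.

```python
def coarse_to_fine(evaluate, coarse_values, half_width, step=0.1):
    """Pick the gradient scale with the highest accuracy in two passes."""
    # Coarse pass: best value on the sparse grid.
    best_coarse = max(coarse_values, key=evaluate)
    # Fine pass: dense grid with the given step around the coarse optimum.
    n_steps = int(round(2.0 * half_width / step))
    fine = [round(best_coarse - half_width + i * step, 10)
            for i in range(n_steps + 1)]
    fine = [s for s in fine if s > 0.0]  # keep only positive gradient scales
    return max(fine, key=evaluate)
```

For example, `coarse_to_fine(lambda s: -(s - 1.3) ** 2, [0.5, 1.0, 2.0, 5.0], half_width=0.5)` uses a hypothetical accuracy surface peaking at 1.3: the coarse pass selects 1.0 and the fine pass then moves to 1.3. In practice each call to `evaluate` is expensive, which is the reason for searching the fine grid only around the coarse optimum.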
Step 3: FrDivEn–gradient scale curve fitting. To find the appropriate gradient scale value from any FrDivEn, we mapped the FrDivEn results calculated at different imbalance ratios in Step 1 to the appropriate gradient scales found in Step 2, plotting them as FrDivEn–gradient scale points in the coordinate plot. These points were then fitted to obtain the FrDivEn–gradient scale curve. Considering that Dhariwal et al. [18] did not set the gradient scale of the CGDM smaller than 0, we used an exponential fit for the FrDivEnα–gradient scale points at different fractional orders α. For comparison, we applied the same method to fit the DivEn–gradient scale curve. The fitted entropy–gradient scale curves are illustrated in Figure 5.
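An exponential fit of this kind can be sketched with a log-linear least-squares regression. The two-parameter form s = a·exp(b·H), the function names, and the fitting procedure are assumptions for illustration; the source does not state the exact functional form or fitting routine used.

```python
import math

def fit_exponential(entropies, scales):
    """Fit s = a * exp(b * H) by least squares on ln(s) versus H."""
    n = len(entropies)
    logs = [math.log(s) for s in scales]  # requires strictly positive scales
    mean_h = sum(entropies) / n
    mean_l = sum(logs) / n
    sxx = sum((h - mean_h) ** 2 for h in entropies)
    sxy = sum((h - mean_h) * (l - mean_l) for h, l in zip(entropies, logs))
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_h)
    return a, b

def gradient_scale(entropy, a, b):
    """Look up the gradient scale for a given entropy on the fitted curve."""
    return a * math.exp(b * entropy)
```

Because a·exp(b·H) is positive for every entropy value whenever a is positive, the fitted curve can never return a negative gradient scale, which matches the constraint mentioned above.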
From the entropy–gradient scale curves, the following can be surmised: (1) As the fractional order gradually increases, the gradient scale at the initial point of the FrDivEnα–gradient scale curves also increases. For instance, the gradient scale for FrDivEn0.1 is consistently larger than the initial value of 0.61, while for FrDivEn0.4, it is always larger than the initial value of 1.08. (2) As the fractional order continues to increase, the FrDivEnα–gradient scale curves tend to flatten. For FrDivEn0.1, when it rises from 0 to 10, the corresponding gradient scale rises from 0.61 to 1.66, a change of 1.05, whereas for FrDivEn0.4, the gradient scale increases by merely 0.08 for the same range. (3) When the entropy shifts from 0.7 to 0.9, the DivEn–gradient scale curve transitions rapidly from near-horizontal to near-vertical. In contrast, the FrDivEn0–gradient scale curve, for example, has a more stable slope of about 0.12. This indicates the instability in the gradient scale values derived from DivEn compared to FrDivEn.
Regarding the sample synthesizer, Dhariwal et al. [18] set the initial gradient scale value to 0.5 and incrementally increased it for sample synthesis. Combined with Figure 5, this led us to conclude that some of the minimum gradient scale values were too large to be desirable. For instance, the gradient scale corresponding to FrDivEn0.4 could not take a value below 1. Considering the gradient scale range and the smoothness of the FrDivEn–gradient scale curves, we selected the FrDivEn results corresponding to the two curves with initial gradient scale values around 0.5, i.e., FrDivEn0 and FrDivEn0.1, as the basis for the gradient scale value in sample synthesis.
4.3. Applications of the Proposed Method
To test the validity and generalizability of the proposed imbalanced diagnostic method, we applied this method to (1) the CWRU bearing dataset with a motor load of 3 HP and a speed of 1730 rpm; (2) the University of Connecticut (UConn) gearbox dataset [38]; and (3) the Wuhan University (WHU) rotor dataset [39]. This allowed us to explore the diagnostic effect under a different load and speed of the same machine, as well as across different machines. After the GADF transformation, the imbalanced sample set of each automotive machine was allocated consistently with the process in Section 4.2, as shown in Table 4.
In addition to the CGDMs using FrDivEn0 and FrDivEn0.1, we included the Wasserstein generative adversarial network (WGAN) [40], the CGDM with a gradient scale consistent with the default value of 1 in the source code, and the CGDM using DivEn as the basis for the gradient scale value for comparison. Regarding the fault classifier, in addition to utilizing the pretrained ConvNeXt model based on transfer learning, we also incorporated the pretrained VGG model [41], GoogLeNet model [42], ResNet model [7], and DenseNet model [43] for comparison. The application of the proposed diagnostic method to the three different automotive machine datasets is demonstrated below.
4.3.1. Bearing (Motor Load: 3 HP; Speed: 1730 rpm)
We calculated the DivEn, FrDivEn0, and FrDivEn0.1 of the time series at each imbalance ratio. The gradient scale corresponding to each entropy was obtained by referencing the entropy–gradient scale curves. The entropy–gradient scale and the diagnostic accuracy with ConvNeXt as the fault classifier are presented in Table 6. The imbalanced diagnostic accuracy with various sample synthesizers and fault classifiers of the bearing dataset is shown in Table 7.
The fault diagnostic results obtained using different sample synthesizers and fault classifiers provide a few key insights: (1) When other conditions are constant, the CGDM with a gradient scale corresponding to FrDivEn0 achieves the highest fault diagnostic accuracy across all five imbalance ratios. For instance, with an imbalance ratio of 40:1 and the fine-tuned pretrained ConvNeXt as the fault classifier, the samples synthesized by the CGDM using FrDivEn0 yield a fault diagnostic accuracy of 91.22%, which is 7.32% higher than the accuracy of samples synthesized using the WGAN. The samples synthesized from the CGDM with a gradient scale corresponding to FrDivEn0 for each fault state are provided in Figure 6. (2) The diagnostic accuracy of samples synthesized using the CGDM with a gradient scale corresponding to DivEn is lower than those of FrDivEn0 and FrDivEn0.1. With an imbalance ratio of 40:1 and ConvNeXt as the fault classifier, the fault diagnostic accuracy of samples synthesized from the CGDM using DivEn is 87.67%, which is 3.89% lower than the accuracy of samples synthesized using FrDivEn0, and even lower than the 90.22% accuracy of the default CGDM where the gradient scale is fixed to 1. (3) ConvNeXt consistently maintains a high accuracy advantage over other fine-tuned pretrained models under the same conditions. With an imbalance ratio of 40:1 and the CGDM using FrDivEn0 as the sample synthesizer, the fault diagnostic accuracy using ConvNeXt is 91.22%, which is 10.20% higher than the accuracy when using VGG and 3.40% higher than that with ResNet.
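The percentage differences quoted above appear to be relative differences rather than percentage-point differences. This reading is inferred from the reported accuracies and is not stated explicitly in the text. The short sketch below checks it on the 40:1 bearing result (91.22% with FrDivEn0 versus 87.67% with DivEn). The helper names are hypothetical.

```python
def relative_gain(new, base):
    """Percentage by which `new` exceeds `base`, measured relative to `base`."""
    return (new - base) / base * 100.0


def relative_drop(low, high):
    """Percentage by which `low` falls short of `high`, measured relative to `high`."""
    return (high - low) / high * 100.0


# Accuracies (in %) reported for the bearing dataset at an imbalance ratio of 40:1.
FRDIVEN0_ACC = 91.22
DIVEN_ACC = 87.67

if __name__ == "__main__":
    # DivEn relative to FrDivEn0: matches the "3.89% lower" quoted in the text.
    print(round(relative_drop(DIVEN_ACC, FRDIVEN0_ACC), 2))
    # FrDivEn0 relative to DivEn: matches the 4.05% improvement quoted in the conclusions.
    print(round(relative_gain(FRDIVEN0_ACC, DIVEN_ACC), 2))
    # Plain percentage-point gap, for comparison.
    print(round(FRDIVEN0_ACC - DIVEN_ACC, 2))
```

The two relative figures differ only in which accuracy is used as the denominator, which is why the same pair of results is described as both 3.89% and 4.05%.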
T-distributed stochastic neighbor embedding (t-SNE) [44], a technique for visualizing high-dimensional data, was applied to visualize and interpret the latent features captured by the ConvNeXt models. The analysis compared the fine-tuned ConvNeXt model’s performance without a sample synthesizer and with different sample synthesizers at a moderate imbalance ratio of 10:1. Figure 7 shows the visualization of features. Compared to other sample synthesizers, the CGDM with a gradient scale corresponding to FrDivEn0 results in better clustering of scatter points for each state. Taking the “7_OR” and “21_OR” states as examples, the CGDM using FrDivEn0 enables ConvNeXt to cluster them separately without overlapping, unlike other sample synthesizers.
A confusion matrix was employed to summarize how the captured features were categorized into the various labels, as shown in Figure 8. The diagonal elements from top left to bottom right indicate the number of correctly categorized samples in each class. The off-diagonal elements indicate the number of samples incorrectly categorized as other classes. In the test set, each bearing state contains 90 samples. The closer the value on the diagonal of a state class is to 90, the more accurately that state class is diagnosed. The CGDM with a gradient scale corresponding to FrDivEn0 yields the greatest predictive accuracy.
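The way a confusion matrix is read above can be sketched in a few lines. This is a minimal illustration on toy labels, not the data behind Figure 8. It assumes rows index the true class and columns the predicted class, a convention the text does not state, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples per (true class, predicted class) pair.

    Assumption: rows are true classes and columns are predicted classes.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal count divided by the number of test samples of that class."""
    result = []
    for k, row in enumerate(matrix):
        total = sum(row)
        result.append(row[k] / total if total else 0.0)
    return result


if __name__ == "__main__":
    # Toy labels for three classes; the real test set has 90 samples per state.
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 1, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    for row in cm:
        print(row)
    print(per_class_accuracy(cm))
```

With 90 test samples per state, a diagonal entry of 90 corresponds to a per-class accuracy of 1.0 in this sketch.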
The receiver operating characteristic (ROC) curves are shown in Figure 9. In the ROC curves, “CGDM-Default” means “default CGDM with gradient scale fixed to 1”, “CGDM-DivEn” means “CGDM with gradient scale corresponding to DivEn”, “CGDM-FrDivEn0” means “CGDM with gradient scale corresponding to FrDivEn0”, and “CGDM-FrDivEn0.1” means “CGDM with gradient scale corresponding to FrDivEn0.1”. Compared to other models, the ROC curve of the CGDM with a gradient scale corresponding to FrDivEn0 is closest to the upper left corner, with an area under the curve (AUC) of 0.9962, indicating its excellent performance. The evaluation metrics are shown in Table 8. The CGDM with a gradient scale corresponding to FrDivEn0 achieves the highest precision of 0.9768, recall of 0.9767, and F1-score of 0.9767. The errors made by the proposed method and other methods are shown in Table 9. The CGDM with a gradient scale corresponding to FrDivEn0 achieves the lowest mean absolute error (MAE) of 0.0656 and root mean squared error (RMSE) of 0.4933. Based on the confusion matrix and a comparison with other methods, we believe that the misdiagnosis of some “21_BA” samples as “14_IR” is the main reason for the limitations of the proposed method. In addition, other failures such as pitting and multi-failure fusion, which are not considered in this bearing dataset, may also contribute to the limitations of the proposed method. This experiment tested the validity of the diagnostic method on bearing data collected under working conditions different from those used in the preceding trade-off analysis.
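The evaluation metrics named above can be computed from a confusion matrix and from the label sequences. The sketch below is a minimal illustration under two assumptions that the text does not confirm: precision, recall, and F1-score are macro-averaged over classes, and MAE and RMSE are computed on integer class indices. The function names are hypothetical and the inputs are toy values.

```python
import math


def macro_precision_recall_f1(matrix):
    """Macro-averaged precision, recall, and F1 from a square confusion matrix.

    Assumption: rows are true classes and columns are predicted classes.
    """
    n = len(matrix)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_as_k = sum(matrix[r][k] for r in range(n))
        actually_k = sum(matrix[k])
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


def label_mae_rmse(true_labels, predicted_labels):
    """MAE and RMSE between integer class indices (an assumed definition)."""
    differences = [abs(t - p) for t, p in zip(true_labels, predicted_labels)]
    mae = sum(differences) / len(differences)
    rmse = math.sqrt(sum(d * d for d in differences) / len(differences))
    return mae, rmse


if __name__ == "__main__":
    toy_matrix = [[2, 0], [1, 1]]
    print(macro_precision_recall_f1(toy_matrix))
    print(label_mae_rmse([0, 0, 1, 1], [0, 0, 0, 1]))
```

Under the index-based definition, a perfect classifier gives an MAE and RMSE of exactly 0, and the size of each error depends on how the classes are ordered.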
4.3.2. Gearbox
The University of Connecticut (UConn) gearbox dataset includes nine different gear states: normal state (N), missing tooth (Miss), root crack (Crack), spalling (Spall), and five levels of chipping tip severity (Chip1, Chip2, Chip3, Chip4, and Chip5) [38]. The accelerometers, positioned on the input end of the gearbox housing, capture the gear vibration signals.
The entropy–gradient scale and the diagnostic accuracy with ConvNeXt as the fault classifier are presented in Table 10. At an imbalance ratio of 40:1, given that the gradient scales corresponding to FrDivEn0 and FrDivEn0.1 are both 1.39 to two decimal places, we take the CGDM with a gradient scale of 1.39 as the sample synthesizer’s fault diagnostic result for both FrDivEn0 and FrDivEn0.1. The diagnostic accuracy with different sample synthesizers and fault classifiers of the gearbox dataset is shown in Table 11.
The diagnostic results obtained using different sample synthesizers and fault classifiers provide a few key insights: (1) The similar fault diagnostic accuracies achieved by the CGDMs using FrDivEn0 and FrDivEn0.1 are due to the minimal differences in the gradient scales. When the imbalance ratio is 40:1, the gradient scales for FrDivEn0 and FrDivEn0.1 are identical, even to two decimal places, at 1.39. At this point, when using the fine-tuned pretrained ConvNeXt as the fault classifier and the CGDM with a gradient scale corresponding to FrDivEn as the sample synthesizer, the fault diagnostic accuracy reaches 87.90%, which is 11.59% higher than that of WGAN and 9.53% higher than the default CGDM. (2) At several imbalance ratios, the CGDM with gradient scales corresponding to FrDivEn0 and FrDivEn0.1 achieves high fault diagnostic accuracy. For instance, with an imbalance ratio of 20:1 and ConvNeXt as the fault classifier, the CGDM using FrDivEn0 achieves a fault diagnostic accuracy of 98.89%. This is 7.37% higher than the WGAN, 4.71% higher than the default CGDM, and 1.66% higher than the CGDM using DivEn. When comparing FrDivEn0 and FrDivEn0.1, the CGDM with a gradient scale corresponding to FrDivEn0 achieves a slightly higher diagnostic accuracy. (3) As the imbalance problem worsens, the advantages of ConvNeXt over other CNN models become evident. With an imbalance ratio of 40:1 and the CGDM with a gradient scale corresponding to FrDivEn as the sample synthesizer, ConvNeXt achieves a diagnostic accuracy of 87.90%, which is 19.06% higher than VGG and 7.72% higher than ResNet.
Figure 10 illustrates a t-SNE visualization of the features. Compared to other sample synthesizers, the scatter points for each state are better clustered when the CGDM with a gradient scale corresponding to FrDivEn0 is used as a synthesizer. For example, the CGDM using FrDivEn0 enables ConvNeXt to cluster the scatter points for the “Chip1” state without overlap, unlike the other sample synthesizers.
The confusion matrix was employed to further evaluate the sample synthesizers, as shown in Figure 11. In the test set, each gearbox state contains 90 samples. The closer the value on the diagonal of a state class is to 90, the more accurately that state class is diagnosed. The CGDM with a gradient scale corresponding to FrDivEn0 achieves the highest predictive accuracy, while the CGDM with a gradient scale corresponding to FrDivEn0.1 is slightly less accurate.
The receiver operating characteristic (ROC) curves are shown in Figure 12. Among the different models, the CGDMs using gradient scales corresponding to FrDivEn0 and FrDivEn0.1 achieve excellent performance levels, with their AUCs reaching 0.9998 and 1.0000, respectively. The evaluation metrics are shown in Table 12. The F1-score exceeds 0.99 when the CGDMs are used as the sample synthesizers, with the CGDMs using gradient scales corresponding to FrDivEn0 and FrDivEn0.1 achieving F1-scores of 0.9963 and 0.9951, respectively. The confusion matrices and evaluation metrics indicate that both CGDMs using FrDivEn0 and FrDivEn0.1 provide high-quality samples for the ConvNeXt model. The errors made by the proposed method and other methods are shown in Table 13. The CGDM with a gradient scale corresponding to FrDivEn0.1 achieves the lowest MAE of 0.0111 and RMSE of 0.1685. Based on the confusion matrix and a comparison with other methods, we believe that the limitation is mainly due to some of the samples being misdiagnosed as the “Chip2” state. In addition, other faults that may occur in gearboxes but are not considered in this dataset, such as shaft bending and multi-fault fusion, may also contribute to the limitations of the proposed method. By using a dataset from a different automotive machine than the bearing in the previous analysis, the validity and generalizability of the imbalanced fault diagnostic method in automotive machines were tested.
4.3.3. Rotor
The Wuhan University (WHU) rotor dataset was collected from an experimental automotive machinery system. Vibration signals were collected for four rotor states: normal (N), unbalanced (Unbal), misalignment (Misalign), and contact rubbing (Rub). The eddy current sensors, mounted on the sensor bracket, collected the vibration signals. The rotor vibration signals were denoised based on wavelet thresholding [39], resulting in samples with distinct characteristics and significant differences between states, thus reducing the difficulty of fault diagnosis. We used an image size of 32 × 32, the same as the CIFAR-10 small-sized image sample set, for the input sample size of both the sample synthesizer and fault classifier. This approach was aimed at increasing the variability by using different models, and testing the proposed method’s effectiveness on small-sized samples.
The entropy–gradient scale and the diagnostic accuracy with ConvNeXt as the fault classifier are presented in Table 14. The imbalanced diagnostic accuracy with various sample synthesizers and fault classifiers is shown in Table 15.
The entropy–gradient scale calculations and the fault diagnostic results provide a couple of key insights: (1) The DivEn, FrDivEn0, and FrDivEn0.1 of the denoised rotor vibration signals are reduced compared to the previous bearing and gearbox datasets. For example, the FrDivEn0 results of bearing and gearbox data at an imbalance ratio of 2:1 are 7.3113 and 7.9003, respectively, while the FrDivEn0 of denoised rotor data is 3.6170. A decrease in the gradient scale of the CGDM accompanies this decrease in DivEn and FrDivEn. It should be noted that, at all five imbalance ratios, the gradient scale loses its ability to adjust according to DivEn. Although DivEn still varies with the imbalance ratios, the gradient scale remains at 0. We believe this is due to the DivEn–gradient scale curve approximating the straight line y = 0 when DivEn is less than 0.7 (as shown in Figure 5). The gradient scale being constantly 0 results in the CGDM using DivEn having the lowest diagnostic accuracy among the CGDMs. At an imbalance ratio of 40:1 and with the fine-tuned pretrained ConvNeXt as the fault classifier, the CGDM using DivEn as the sample synthesizer achieves a diagnostic accuracy of 97.22%, which is 1.69% lower than the CGDM using FrDivEn0 and slightly lower than both the default CGDM and the CGDM using FrDivEn0.1. This phenomenon reflects the limitations of setting the gradient scale according to DivEn, which may not be applicable to unfamiliar machines. (2) With all other conditions constant, ConvNeXt achieves the highest diagnostic accuracy compared to other CNN models on a sample set of size 32 × 32. This is consistent with its excellent performance in the bearing and gearbox datasets. With an imbalance ratio of 40:1 and taking the CGDM using FrDivEn0 as the sample synthesizer, ConvNeXt achieves a diagnostic accuracy of 97.50%, which is 12.50% higher than ResNet and 8.86% higher than DenseNet. Notably, the fine-tuned pretrained VGG is second only to ConvNeXt in diagnostic accuracy. We believe this is due to the stacked small-sized 3 × 3 convolutional kernels used in VGG, which make it suitable for small-sized sample classification tasks like those with the size of 32 × 32.
The visualization through t-SNE is shown in Figure 13. Compared to other sample synthesizers, the scatter points for each state are better clustered when the CGDM using FrDivEn0 is used as a synthesizer. Taking the “contact rubbing” and “unbalanced” states as examples, the CGDM using FrDivEn0 enables ConvNeXt to cluster them separately without overlapping, unlike other sample synthesizers.
The confusion matrix was employed to further evaluate the sample synthesizers, as shown in Figure 14. In the test set, each rotor state contains 90 samples. The closer the value on the diagonal of a state class is to 90, the more accurately that state class is diagnosed.
The evaluation metrics are presented in Table 16. The CGDM with a gradient scale corresponding to FrDivEn0 achieves a predictive accuracy of 100% and an F1-score of 1, demonstrating its excellent performance in sample synthesis tasks. The errors made by the proposed method and other methods are shown in Table 17. The CGDM with a gradient scale corresponding to FrDivEn0 achieves the lowest MAE of 0 and RMSE of 0. Based on the confusion matrix and a comparison with other methods, we believe that diagnosing the “Rub” state is the most challenging, which may limit the effectiveness of diagnostic methods. In addition, other faults that may occur in the rotor, such as bar breaking and multi-fault fusion, which are not considered in this dataset, may also contribute to the limitations of the proposed method.
At the moderate imbalance ratio of 10:1 and with ConvNeXt as the fault classifier, the accuracy of the rotor fault diagnosis without the help of a sample synthesizer already reaches 99.44%, so the diagnostic accuracies obtained with the different sample synthesizers differ only slightly. To further investigate the impacts of the various synthesizers on fault diagnosis, we plotted the diagnostic accuracies over training epochs for the different synthesizers at this imbalance ratio, as shown in Figure 15.
Because the baseline accuracy is already this high, the sample synthesizers have little room to improve it. The pretrained ConvNeXt’s output layer is trained during epochs 1–20, with the learning rate decaying to 0.001 between the 11th and 20th epochs, resulting in decreased oscillations in the accuracy lines compared to the first 10 epochs. The 21st epoch marks the beginning of training for ConvNeXt’s other layers. After 55 epochs, the accuracy lines for each CGDM stabilize, hovering around 98% or 99%. Throughout the entire training process, the accuracy lines of the CGDMs are smoother than those of the synthesizer-less model and the WGAN due to the higher-quality samples provided by the CGDMs. Notably, the CGDM with FrDivEn0 first reached the maximum diagnostic accuracy of 100% at the 69th epoch. These results reflect the advancement provided by the proposed FrDivEn trade-off for imbalanced fault diagnosis.
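The two-stage fine-tuning schedule described above can be sketched as a pair of lookup functions. This is a rough illustration, not the training code used in this work. The epoch boundaries (1–20 for the output layer, 21 onward for the other layers) and the decayed rate of 0.001 come from the text. The initial learning rate of 0.01, the step-shaped decay, the choice to keep the output layer trainable after epoch 20, and all names are assumptions.

```python
HEAD_ONLY_EPOCHS = 20    # epochs 1-20 train only the output layer (from the text)
DECAY_START_EPOCH = 11   # the lower learning rate applies from the 11th epoch (from the text)


def trainable_parts(epoch):
    """Return which parts of the network are updated at a given 1-based epoch."""
    if epoch < 1:
        raise ValueError("epochs are counted from 1")
    if epoch <= HEAD_ONLY_EPOCHS:
        return ("output_layer",)
    # Assumption: the output layer stays trainable once the other layers are unfrozen.
    return ("output_layer", "backbone")


def output_layer_learning_rate(epoch, initial_lr=0.01, decayed_lr=0.001):
    """Learning rate during the output-layer stage (epochs 1-20 only).

    Assumptions: `initial_lr` is a hypothetical value, and the decay is
    modeled as a single step at the 11th epoch rather than a gradual decay.
    """
    if not 1 <= epoch <= HEAD_ONLY_EPOCHS:
        raise ValueError("only defined for the output-layer stage")
    return initial_lr if epoch < DECAY_START_EPOCH else decayed_lr


if __name__ == "__main__":
    for epoch in (1, 10, 11, 20, 21):
        print(epoch, trainable_parts(epoch))
    print(output_layer_learning_rate(10), output_layer_learning_rate(11))
```

The learning rate used after the 21st epoch is not given in the text, so the sketch leaves it undefined rather than guessing a value.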
5. Conclusions
In this paper, to solve the problem of diversity entropy (DivEn) being insensitive to the diversity of time series, we combined DivEn with fractional order calculus to propose fractional diversity entropy (FrDivEn). Furthermore, we introduced FrDivEn to trade off the classifier-guided diffusion model’s (CGDM) sample synthesis and presented an imbalanced diagnostic method for automotive machines. Specifically, this method first transforms the time series vibration signal into Gramian angular field (GAF) image samples using GAF transformation. Next, it synthesizes high-quality samples using the CGDM with a gradient scale corresponding to FrDivEn. These synthetic samples are then combined with imbalanced real samples to create a mixed sample set, which is finally input into the fine-tuned pretrained ConvNeXt for fault diagnosis.
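The first step of the method, the GAF transformation, can be sketched as follows. This is a minimal illustration of the standard Gramian angular difference field (GADF), assuming the difference variant mentioned in Section 4.3: the series is rescaled to [-1, 1], each value is mapped to an angle by the arccosine, and entry (i, j) of the image is the sine of the angle difference. Any resampling to the final image size is omitted, and the function name `gadf` is hypothetical.

```python
import math


def gadf(series):
    """Gramian angular difference field of a 1-D series.

    Returns an n x n nested list where entry (i, j) is sin(phi_i - phi_j)
    and phi_k = arccos of the k-th value after min-max scaling to [-1, 1].
    """
    low, high = min(series), max(series)
    if high == low:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    scaled = [(2.0 * x - high - low) / (high - low) for x in series]
    # Clamp to guard against rounding that lands just outside [-1, 1].
    angles = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    return [[math.sin(a - b) for b in angles] for a in angles]


if __name__ == "__main__":
    image = gadf([0.0, 1.0, 2.0])
    for row in image:
        print([round(value, 3) for value in row])
```

The resulting matrix is antisymmetric with a zero diagonal, which is a quick sanity check for any GADF implementation.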
The FrDivEn trade-off analysis, including the fractional order of FrDivEn and the gradient scale of the CGDM, was performed using the CWRU bearing dataset with a motor load of 0 HP and a speed of 1797 rpm. It should be noted that reliable diagnostic signals are a prerequisite for high-precision fault diagnosis. Suitable sensor selection and correct installation methods should not be overlooked, as they create the conditions for obtaining reliable and high-quality diagnostic signals. In our study, we used three datasets to validate the effectiveness and generalizability of the proposed method: the CWRU bearing dataset and UConn gearbox dataset, which were not denoised, and the WHU rotor dataset, which was denoised. The results demonstrated the effectiveness of our method on diagnostic signals that had been processed in different ways. The main innovations and results were as follows:
- (1)
For fault diagnosis in automotive machines with imbalanced sample sets, it is crucial to balance the diversity and fidelity of synthetic samples. We propose a novel signal measure called fractional diversity entropy (FrDivEn) to address this need. FrDivEn reflects vibration signal diversity at varying imbalance ratios and adjusts the generative model’s emphases on diversity and fidelity during sample synthesis. Unlike the traditional DivEn, which is insensitive to signal diversity at different imbalance ratios, FrDivEn sensitively adapts to these ratios, providing a more effective reflection of signal diversity. In the CWRU bearing dataset, the differences in FrDivEn between vibration signals are significantly greater than those in DivEn. For example, the FrDivEn0 difference between vibration data at imbalance ratios of 2:1 and 5:1 is 0.9158, which is substantially larger than the DivEn difference of 0.0061.
- (2)
To select the appropriate gradient scale of the CGDM and achieve high-quality sample synthesis, we propose using FrDivEn to determine the ideal gradient scale. Utilizing the CWRU bearing dataset, we fit DivEn– and FrDivEn–gradient scale curves with various fractional orders. According to the fitting results, the FrDivEn0– and FrDivEn0.1–gradient scale curves exhibited a more suitable range of gradient scales and smoothness compared to the DivEn–gradient scale curve.
- (3)
To enhance diagnostic accuracy in automotive machines, we propose a fault diagnostic method utilizing the CGDM with a gradient scale determined by FrDivEn as the sample synthesizer, and a fine-tuned pretrained ConvNeXt as the fault classifier. In an experiment using the CWRU bearing dataset with a motor load of 3 HP, a speed of 1730 rpm, and an imbalance ratio of 40:1, the diagnostic accuracies achieved using the CGDM with gradient scales corresponding to FrDivEn0 and FrDivEn0.1 were 91.22% and 90.89%, respectively. These results represent improvements of 7.32% and 6.93% over the WGAN, and 4.05% and 3.67% over the CGDM with a gradient scale corresponding to DivEn. For the gearbox and rotor datasets, the diagnostic accuracies using the CGDM with FrDivEn at an imbalance ratio of 40:1 were 87.90% and 98.89%, respectively, marking increases of 11.59% and 3.48% over the WGAN. Across these three imbalanced fault diagnosis experiments for various automotive machines, the CGDM with a gradient scale determined by FrDivEn consistently achieved a superior diagnostic accuracy compared to other sample synthesizers, with the CGDM using FrDivEn0 performing slightly better than FrDivEn0.1.
- (4)
Across the three imbalanced fault diagnosis experiments for various automotive machines, the fine-tuned pretrained ConvNeXt consistently achieved the highest diagnostic accuracy compared to other fine-tuned pretrained CNN models. This was evident in both bearing and gearbox fault diagnoses with a sample size of 256 × 256, as well as rotor fault diagnosis with a sample size of 32 × 32. For instance, in the experiment using the CWRU bearing dataset with an imbalance ratio of 40:1 and the CGDM using FrDivEn0 as the sample synthesizer, ConvNeXt achieved a fault diagnostic accuracy of 91.22%. This was 10.20% higher than the accuracy achieved using the pretrained VGG and 3.40% higher than that of the pretrained ResNet.
In summary, this study was focused on synthesizing high-quality samples by using the FrDivEn trade-off to achieve excellent imbalanced fault diagnostic accuracy. In the future, reducing the computational complexity and considering multi-fault fusion in the imbalanced fault diagnostic method could be the subjects of further research.