1. Introduction
With the continued advancement of industrialization, rotating equipment has become an indispensable component in production and manufacturing processes. Bearings are extensively utilized in a variety of applications, including high-speed trains, heavy trucks, generators, wind turbines, and conveyor belts. As critical components of rotating machinery, bearings serve essential functions in reducing friction, supporting loads, ensuring precision, minimizing wear, and lowering energy consumption. However, due to their operation in high-speed rotating equipment, bearing failures can not only exacerbate equipment damage but also lead to safety incidents, resulting in production stoppages and substantial economic losses. Consequently, bearing fault diagnosis and condition monitoring are of paramount importance. Accurate and rapid bearing health monitoring is essential for ensuring the efficient and safe operation of equipment while mitigating adverse economic impacts.
Collecting operational data regarding various health conditions using sensors, processing this data, and establishing data-driven intelligent bearing fault diagnosis models is the current mainstream approach. During the operation of rotating machinery, bearings generate multiple physical parameters, including vibration signals, sound signals, and temperature signals. Among these, vibration signals are the most widely studied [
1]. Image processing methods can reveal bearing fault characteristics from a more multidimensional perspective. This transformation not only conveys additional fault features but also facilitates more efficient feature extraction by the models. Yan et al. [
2] proposed a fault diagnosis method using a combination of the Markov transformation field (MTF) and deep residual network (ResNet). Tang et al. [
3] proposed a fault diagnosis method utilizing Gramian angular summation fields (GASF). The study converts time-series signals from multiple sensors into two-dimensional GASF feature maps, preserving the absolute temporal relationships within the time series. Xie et al. [
4] proposed a method based on complementary ensemble empirical mode decomposition. This method identifies the intrinsic mode function with the highest correlation and converts one-dimensional signals into two-dimensional color images using recurrence plots (RP) as inputs for the multiscale perceptron. Zhang et al. [
5] proposed a method based on short-time Fourier transform (STFT) and convolutional neural networks (CNN). This method examines five typical window functions, along with their respective widths and overlap widths, to identify the optimal function. It employs stacked dual-layer convolutions to enhance the model’s nonlinear representation capabilities. Kumar et al. [
6] proposed applying continuous wavelet transform (CWT) to the collected signals, extracting time–domain statistical features from the CWT coefficients, and classifying them using a K-nearest neighbor classifier. The aforementioned studies employ various image processing techniques to capture underlying relationships and patterns within the data for fault diagnosis. However, current research in image processing faces the following limitations: (1) A single image processing method cannot comprehensively capture all important features within a signal. Furthermore, the performance of a single feature processing method may not be stable when dealing with complex working conditions. (2) The operating environment of bearings is complex and variable. Bearing vibration signals may be coupled with other vibration signals, or the fault signals may be relatively weak. Therefore, extracting fault features from mixed signals requires further attention.
In recent years, deep learning, a rapidly advancing branch of machine learning, has found extensive applications in the field of bearing fault diagnosis. Deep learning leverages multi-layer neural networks for complex feature extraction and pattern recognition, offering robust learning and generalization capabilities. Xu et al. [
7] proposed a hybrid deep learning method that uses CNNs to extract fault features from time–frequency images, which are then input into a gcForest classifier. Li et al. [
8] employed a dual-stage attention recurrent neural network to enhance minority fault features in an imbalanced dataset and used a convolutional neural network embedded with a convolutional block attention module (CBAM) for fault classification. Ma et al. [
9] proposed a multi-objective optimization-based ensemble deep learning method for rotor-bearing diagnosis, integrating convolutional residual networks, deep belief networks, and deep autoencoders. Shen et al. [
10] introduced a physics-based deep learning approach that first assesses bearing health levels using a threshold model, followed by a CNN that automatically extracts high-level features from the inputs for bearing fault detection. However, in reality, obtaining a large amount of bearing fault data is often challenging. Additionally, the operating conditions of motors are complex, and vibration signals are often nonlinear and nonstationary. These factors limit the performance of conventional deep learning models in such settings.
Transfer learning is an effective method that leverages existing knowledge to solve problems in different but related domains. It relaxes two fundamental assumptions of traditional deep learning: that training and testing samples must be independently and identically distributed, and that a large number of samples is required to develop an accurate classification model. Consequently, transfer learning is used to tackle the issue of low fault diagnosis accuracy under varying operating conditions for bearings. In transfer learning, domain adaptation algorithms are commonly used to align the source and target domains. Xiao et al. [
11] proposed a cross-domain fault diagnosis framework based on transferable features and manifold embedding discriminant distribution adaptation. This method designs a transferability evaluation method using an adjusted Rand index and maximum mean discrepancy (MMD) to quantify the fault discriminability and domain invariance of features. Additionally, a new manifold-embedding discriminant joint distribution adaptation method is proposed to address the class imbalance problem between the target and source domains. P. Chen et al. [
12] proposed a model based on the sliced Wasserstein distance for bearing fault diagnosis under different loads and speeds. This model achieves comprehensive domain adaptation by using adversarial training to learn a domain-invariant space. Li et al. [
13] introduced a multi-scale extension method based on residual neural networks, combined with the multiple kernel maximum mean discrepancy (MK-MMD), to address issues affecting rotating components such as bearings and gears in noisy and complex environments. Xu et al. [
14] introduced a method based on a convolutional kernel dropping mechanism, skip connections, and joint maximum mean discrepancy (JMMD). This approach aims to improve diagnostic accuracy in unsupervised domain discrepancy scenarios by enhancing feature transfer and domain alignment. Wu et al. [
15] proposed a model that integrates domain-adversarial neural networks (DANN) with an attention mechanism. This model incorporates an attention mechanism into the feature extractor to retain fault-related features. Additionally, it replaces the fully connected (FC) layers in the classifier and discriminator with global average pooling layers, thus reducing the number of parameters and enhancing efficiency. Wu et al. [
16] introduced the gradient conditional domain adversarial network method, which uses conditional domain adversarial networks (CDAN) as the main component and integrates data filtering and intermediate domain selection. The studies above focus on the application of domain-adaptive transfer learning in bearing fault diagnosis and propose various domain adaptation methods to address the differences in data distribution under different working conditions and complex environments. However, there are several shortcomings in the application of transfer learning methods for fault diagnosis: (1) Cooperation between the feature extraction model and the domain adaptation algorithm: Under complex working conditions, the feature extraction model may fail to extract effective features from the source domain, resulting in the domain adaptation algorithm’s inability to effectively bridge the gap between the source and target domains. (2) Limitations of common domain adaptation algorithms: Algorithms such as MMD and MK-MMD primarily focus on reducing marginal distribution differences between the source and target domains but neglect the joint distribution of labels and features. (3) Challenges with CDAN- and DANN-based algorithms: These algorithms achieve domain adaptation through adversarial training, but the classifier’s decision boundary can be easily disrupted in the target domain, potentially leading to a decrease in the classifier’s generalization performance on the target domain.
Based on the aforementioned research status and identified shortcomings, this paper proposes a novel bearing fault diagnosis method that remains effective under strong noise and complex working conditions. The main contributions are as follows:
(1) This paper innovatively integrates three types of images: MTF, CWT, and RP. The fused images encompass various features, including state transition characteristics, periodicity, autocorrelation, and time–frequency characteristics. This approach overcomes the limitations of existing single-feature processing methods, which struggle to comprehensively capture features and maintain stability under complex working conditions.
(2) This paper proposes a residual network based on the Kolmogorov–Arnold representation theorem. By introducing KABlock layers into the traditional residual network and combining fixed basis functions with spline functions, the model’s adaptability and robustness under complex working conditions are enhanced. This optimization enables the model to better capture the complex dynamic characteristics and nonlinear features of bearing vibration signals.
(3) This paper introduces an innovative domain adaptation method that integrates the MK-MMD and CDAN+E algorithms. MK-MMD effectively aligns the distributions of the source and target domains through a multi-kernel approach, providing a solid foundation for the adversarial learning of CDAN+E, which helps improve the generalization of the CDAN+E classifier. CDAN+E introduces adversarial learning and entropy conditioning, leveraging the joint distribution of labels and features to overcome the limitations of MK-MMD, which only considers marginal distributions. The integrated domain adaptation algorithm more accurately aligns the source and target domains, harnessing their complementary advantages and demonstrating higher adaptability and robustness in complex cross-domain tasks.
3. Proposed Method
This paper introduces the MCR-KAResNet-TLDAF method. The framework of the MCR-KAResNet-TLDAF method is illustrated in
Figure 2. The method comprises three main components: image fusion, backbone network optimization, and domain adaptation algorithm integration. The image processing module employs image fusion to present fault features from multiple perspectives. The backbone network module enhances and optimizes the residual network using the Kolmogorov–Arnold representation theorem. The integrated domain adaptation module combines the MK-MMD and CDAN+E domain adaptation algorithms to better align the source and target domains.
3.1. Image Fusion
Images can intuitively reflect relevant characteristics of signals, such as time–frequency features, periodic features, autocorrelation features, and state transition features. In complex mechanical structures, the operating state of bearings is coupled with the states of other components, and the vibration signals are significantly affected by noise and random shocks. In such cases, using a single image processing method to characterize the signals from one perspective may fail to capture the complete information of the bearing vibration signals. This can hinder the model’s subsequent fault diagnosis and identification of the bearings.
This study processes signals using an image fusion method. Specifically, it fuses the MTF, CWT, and RP images through the R, G, and B channels, respectively. This approach enables the two-dimensional images to represent the original signal from multiple perspectives, offering a more comprehensive and enriched depiction of the signal’s characteristics.
Figure 3 illustrates the complete image construction process.
Step 1: Use a sliding window to segment the raw vibration signal. The sampling length of the image is determined by the frequency at which the vibration sensor collects the signal and the rotational speed of the motor. The sampling points under different working conditions in the same dataset should be kept consistent. To ensure that a single image fully describes the fault, the sampling length should cover at least one complete rotation of the bearing. When the rotational speeds under different working conditions in the same dataset differ, the sampling length should be at least the largest length required by any of the conditions. Detailed information can be found in the data description section of the two datasets selected in this paper. The specific equation for calculating the sampling length is as follows:

$$N \geq \frac{60 f_s}{n}$$

where $N$ is the sampling length, $f_s$ is the sampling frequency, and $n$ is the rotational speed in revolutions per minute.
Step 2: To ensure a smooth transition at the window boundaries and to retain the complete detailed characteristics of the signal, this study uses a 50% overlap in the data when sliding the window.
Step 3: Each segmented data sample is processed using MTF, CWT, and RP image processing methods. The resulting color images are then converted to grayscale images to serve as the three inputs for the image fusion method.
Step 4: The generated grayscale images from MTF, CWT, and RP are used as inputs for the R, G, and B channels, respectively, for image fusion. To preserve the important features of the images, each channel is multiplied by a coefficient. In this study, the coefficients for the R, G, and B channels are set to [0.2, 1, 0.2]. The final result is the processed fused image.
The fused images obtained using the above processing steps simultaneously display the time–frequency characteristics, periodic features, autocorrelation properties, and state transition characteristics of the original signal. This provides a rich data foundation for subsequent models to extract fault features.
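For illustration, a minimal Python sketch of Steps 1–4 is given below. It assumes the pyts library for the MTF and RP transforms, PyWavelets with a Morlet wavelet for the CWT, and resizing to a common image size; these library choices, the wavelet, the image size, and the normalization are illustrative assumptions rather than the exact settings of this work (only the channel assignment, the [0.2, 1, 0.2] coefficients, the 50% overlap, and the one-revolution window length follow the steps above). Because each transform already yields a single-channel map, the grayscale conversion is implicit here.

```python
import numpy as np
import pywt
from pyts.image import MarkovTransitionField, RecurrencePlot
from skimage.transform import resize

def segment_signal(signal, fs, rpm, overlap=0.5):
    """Steps 1-2: sliding-window segmentation covering at least one revolution, with 50% overlap."""
    win = int(np.ceil(60 * fs / rpm))        # N >= 60 * f_s / n points per revolution
    step = int(win * (1 - overlap))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

def fuse_segment(seg, img_size=224, weights=(0.2, 1.0, 0.2)):
    """Steps 3-4: MTF, CWT, and RP maps fused into the R, G, and B channels, respectively."""
    x = seg.reshape(1, -1)
    mtf = MarkovTransitionField(image_size=img_size).fit_transform(x)[0]
    rp = RecurrencePlot(threshold='point', percentage=20).fit_transform(x)[0]
    coeffs, _ = pywt.cwt(seg, scales=np.arange(1, 65), wavelet='morl')

    cwt_img = resize(np.abs(coeffs), (img_size, img_size))
    rp = resize(rp.astype(float), (img_size, img_size))

    def norm(a):                              # scale each grayscale map to [0, 1]
        return (a - a.min()) / (a.max() - a.min() + 1e-12)

    r, g, b = weights                         # channel coefficients [0.2, 1, 0.2]
    return np.stack([r * norm(mtf), g * norm(cwt_img), b * norm(rp)], axis=-1)

# usage: images = [fuse_segment(s) for s in segment_signal(vibration, fs=12_000, rpm=1800)]
```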
3.1.1. KAResNet
To enhance the accuracy of feature extraction under complex working conditions, this paper proposes the KAResNet model as the backbone network. The architecture of KAResNet is illustrated in
Figure 4, with specific hyperparameters provided in
Table 1. The KAResNet model comprises a Conv1 layer; H1, H2, and H3 layers; a global average pooling layer; and two KABlock layers. Each of the H1, H2, and H3 layers is composed of multiple stacked convolutions. To address the issues of gradient vanishing and exploding during training, residual connections are utilized to form residual blocks.
Unlike the conventional ResNet structure, this paper introduces the Kolmogorov–Arnold network architecture [
24], which replaces the original fully connected layers and their fixed activation functions with KABlocks. This modification enhances the model’s efficiency in approximating complex functions. Notably, the KABlock is an optimized and improved implementation of the Kolmogorov–Arnold representation theorem. Each KABlock combines basis and spline functions: the basis function employs a fixed SiLU activation, while the spline functions are built from B-splines. According to the Kolmogorov–Arnold representation theorem, any multivariate continuous function can be expressed as a composition of sums of univariate functions, so the key is to choose suitable univariate functions for approximating complex mappings. The KABlock selects the B-spline as this fundamental univariate function, maintaining flexibility and adjustability while providing high smoothness and numerical stability, which enables more efficient approximation of complex functions. This design enhances the overall performance of the model, achieving more complex function modeling with simpler components.
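To make the KABlock concrete, the following is a minimal PyTorch sketch of a Kolmogorov–Arnold layer in the spirit of the formulation above: each output combines a fixed SiLU basis term with a learnable B-spline term evaluated on a fixed knot grid via the Cox–de Boor recursion. The grid range, grid size, spline order, and initialization are illustrative assumptions, not the exact hyperparameters of KAResNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KABlock(nn.Module):
    """Sketch of a Kolmogorov-Arnold layer: fixed SiLU basis plus learnable B-spline functions."""

    def __init__(self, in_features, out_features, grid_size=5, spline_order=3, grid_range=(-1.0, 1.0)):
        super().__init__()
        self.spline_order = spline_order
        # Uniform knot grid, extended by `spline_order` knots on each side of the range.
        h = (grid_range[1] - grid_range[0]) / grid_size
        grid = torch.arange(-spline_order, grid_size + spline_order + 1) * h + grid_range[0]
        self.register_buffer("grid", grid.expand(in_features, -1).contiguous())
        # Weights of the fixed-basis (SiLU) term and of the B-spline coefficients.
        self.base_weight = nn.Parameter(torch.empty(out_features, in_features))
        self.spline_weight = nn.Parameter(torch.empty(out_features, in_features, grid_size + spline_order))
        nn.init.kaiming_uniform_(self.base_weight, a=5 ** 0.5)
        nn.init.normal_(self.spline_weight, std=0.1)

    def b_splines(self, x):
        """Cox-de Boor recursion; returns bases of shape (batch, in_features, grid_size + spline_order)."""
        grid, k = self.grid, self.spline_order
        x = x.unsqueeze(-1)
        bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).to(x.dtype)
        for j in range(1, k + 1):
            bases = ((x - grid[:, : -(j + 1)]) / (grid[:, j:-1] - grid[:, : -(j + 1)]) * bases[..., :-1]
                     + (grid[:, j + 1:] - x) / (grid[:, j + 1:] - grid[:, 1:-j]) * bases[..., 1:])
        return bases

    def forward(self, x):
        base_out = F.linear(F.silu(x), self.base_weight)             # fixed SiLU basis term
        spline_out = F.linear(self.b_splines(x).flatten(1),          # learnable spline term
                              self.spline_weight.flatten(1))
        return base_out + spline_out

# e.g., a KABlock head replacing the FC layers: nn.Sequential(KABlock(512, 128), KABlock(128, n_classes))
```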
3.1.2. MK-MMD
The most crucial aspect of domain adaptation in transfer learning is reducing the distribution discrepancy between the source domain and the target domain. MMD is a commonly used metric for measuring the distance between two probability distributions in a reproducing kernel Hilbert space (RKHS) [25]. The equation for MMD is defined as follows:

$$\mathrm{MMD}^2\!\left(X_s, X_t\right) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi\!\left(x_i^s\right) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi\!\left(x_j^t\right) \right\|_{\mathcal{H}}^2$$

where $\phi(\cdot)$ represents the mapping function from the input space to the RKHS, $x_i^s$ and $x_j^t$ are samples from the source domain $X_s$ and the target domain $X_t$, and $n_s$ and $n_t$ are the corresponding numbers of samples.
However, the MMD method is limited by the use of a single kernel function, which often performs poorly when the distribution differences between the two domains are complex and diverse. MK-MMD improves upon this by using multiple kernel functions to compute a combined distribution discrepancy, taking into account the various relationships between features. The combined kernel defined by $m$ kernel functions can be expressed as follows:

$$k = \sum_{i=1}^{m} \beta_i k_i, \qquad \beta_i \geq 0, \quad \sum_{i=1}^{m} \beta_i = 1$$

where $\beta_i$ are the weight parameters for the different kernels $k_i$. The MK-MMD loss can be expressed as follows:

$$L_{\mathrm{MK\text{-}MMD}} = \sum_{i=1}^{m} \beta_i\, \mathrm{MMD}_{k_i}^2\!\left(X_s, X_t\right)$$

where $\mathrm{MMD}_{k_i}$ denotes the MMD loss computed using the kernel function $k_i$. Thus, the total MK-MMD loss function can be expressed as follows:

$$L = L_c + \lambda L_{\mathrm{MK\text{-}MMD}}$$

where $L_c$ represents the classification loss, and $\lambda$ is a weight parameter.
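As a concrete example, the following sketch estimates the MK-MMD loss with a bank of Gaussian (RBF) kernels of different bandwidths and equal kernel weights; the kernel family, bandwidths, and weighting are assumptions made for illustration rather than the exact configuration used in this work.

```python
import torch

def gaussian_kernel(x, y, sigma):
    """RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    dist2 = torch.cdist(x, y) ** 2
    return torch.exp(-dist2 / (2 * sigma ** 2))

def mk_mmd(source, target, sigmas=(1.0, 2.0, 4.0, 8.0, 16.0), betas=None):
    """Biased MK-MMD^2 estimate: sum_i beta_i * [mean k_i(s,s) + mean k_i(t,t) - 2 mean k_i(s,t)]."""
    if betas is None:                               # equal kernel weights summing to 1
        betas = [1.0 / len(sigmas)] * len(sigmas)
    loss = source.new_zeros(())
    for beta, sigma in zip(betas, sigmas):
        k_ss = gaussian_kernel(source, source, sigma).mean()
        k_tt = gaussian_kernel(target, target, sigma).mean()
        k_st = gaussian_kernel(source, target, sigma).mean()
        loss = loss + beta * (k_ss + k_tt - 2 * k_st)
    return loss

# usage: f_src and f_tgt are feature batches from the backbone, e.g. of shape (batch, feat_dim);
# mmd_loss = mk_mmd(f_src, f_tgt)
```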
3.1.3. CDAN+E
CDAN+E is a network model analogous to generative adversarial networks (GANs), integrating adversarial learning and domain adaptation into a two-player game. Adversarial domain adaptation models typically consist of a feature extractor, a category classifier, and a domain discriminator [
26]. This adversarial process introduces multilinear conditioning to simultaneously consider the joint distribution of features and labels. To avoid negative transfer and the non-convergence of adversarial training, CDAN+E uses entropy to define the uncertainty of predictions. The entropy is defined as follows:
$$H(g) = -\sum_{c=1}^{C} g_c \log g_c$$

where $C$ is the number of training classes and $g_c$ is the probability that a sample is predicted to belong to class $c$. The smaller $H(g)$ is, the more confident the prediction, indicating that such samples should contribute more to domain distribution matching. Based on the above entropy, each sample can be assigned an entropy-based weight, ensuring that the domain discriminator prioritizes samples with higher prediction confidence. Thus, the entropy-based certainty measure is as follows:

$$w\!\left(H(g)\right) = 1 + e^{-H(g)}$$
Thus, the adversarial loss function for CDAN+E can be defined as follows:

$$L_{adv}\!\left(\theta_f, \theta_d\right) = -\,\mathbb{E}_{x_i^s \sim \mathcal{D}_s}\, w\!\left(H\!\left(g_i^s\right)\right) \log\!\left[D\!\left(f_i^s \otimes g_i^s\right)\right] \;-\; \mathbb{E}_{x_j^t \sim \mathcal{D}_t}\, w\!\left(H\!\left(g_j^t\right)\right) \log\!\left[1 - D\!\left(f_j^t \otimes g_j^t\right)\right]$$

where $\theta_f$ represents the parameters of the feature extractor $F$, $\theta_d$ represents the parameters of the domain discriminator $D$, $f$ and $g$ denote the extracted features and the classifier predictions, respectively, and $\otimes$ denotes the tensor product. $f \otimes g$ represents the multilinear conditioning of the CDAN+E domain adaptation algorithm.
The total loss function for CDAN+E is given by the following:

$$L\!\left(\theta_f, \theta_c, \theta_d\right) = L_c\!\left(\theta_f, \theta_c\right) - \lambda\, L_{adv}\!\left(\theta_f, \theta_d\right)$$

where $\theta_c$ represents the parameters of the category classifier $G$, $L_c$ is the classification loss, and $\lambda$ is a weight parameter. The feature extractor and the category classifier minimize this objective, while the domain discriminator maximizes it, forming the adversarial game described above.
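The entropy conditioning and multilinear conditioning described above can be sketched as follows in PyTorch; the discriminator architecture is left abstract, and the gradient reversal (or alternating update) needed so that the feature extractor opposes the discriminator is assumed to be handled outside this function.

```python
import torch
import torch.nn.functional as F

def entropy(probs, eps=1e-8):
    """H(g) = -sum_c g_c * log(g_c), computed per sample."""
    return -(probs * (probs + eps).log()).sum(dim=1)

def cdan_e_loss(feat_s, prob_s, feat_t, prob_t, discriminator):
    """Entropy-weighted conditional adversarial loss on the multilinear map f (x) g (sketch)."""
    def multilinear(f, g):
        # Outer product of classifier predictions and features, flattened to (batch, C * feat_dim).
        return torch.bmm(g.unsqueeze(2), f.unsqueeze(1)).flatten(1)

    h = torch.cat([multilinear(feat_s, prob_s), multilinear(feat_t, prob_t)], dim=0)
    d_out = discriminator(h).squeeze(1)            # assumed to output one domain logit per sample
    d_label = torch.cat([torch.ones(len(feat_s)), torch.zeros(len(feat_t))]).to(d_out.device)

    # Entropy conditioning: w = 1 + exp(-H(g)); confident samples receive larger weights.
    w = 1 + torch.exp(-entropy(torch.cat([prob_s, prob_t], dim=0)))
    w = (w / w.sum()).detach()

    return (w * F.binary_cross_entropy_with_logits(d_out, d_label, reduction="none")).sum()

# usage: prob_s = F.softmax(classifier(feat_s), dim=1), and likewise for the target batch;
# the discriminator minimizes this loss while the feature extractor opposes it.
```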
3.1.4. A Domain Adaptive Algorithm Combining MK-MMD and CDAN+E
Based on the backbone model KAResNet, this paper proposes a domain adaptation algorithm that combines MK-MMD and CDAN+E. The integrated model architecture is shown in
Figure 5. By merging kernel-based and adversarial-based domain adaptation methods, this approach addresses the limitations of relying on a single domain adaptation method. MK-MMD minimizes the distribution discrepancy between the source and target domains by measuring the difference in the mean embeddings of their feature distributions in the RKHS, and its use of multiple kernel functions improves the chance of finding a kernel that best exposes the distribution differences. CDAN+E, in turn, leverages multilinear conditioning and adversarial learning to consider the joint distribution of features and labels, overcoming the limitation of MK-MMD, which only aligns marginal feature distributions. Meanwhile, MK-MMD effectively aligns the source and target distributions through its multi-kernel approach, providing a solid foundation for the adversarial learning of CDAN+E and enhancing the generalization capability of the CDAN+E classifier. Through this fusion, the combined algorithm enables KAResNet to extract more domain-invariant features and achieve better domain adaptation and fault classification performance.
The specific process of the fused domain adaptation is as follows: First, features are extracted from the input images using KAResNet. Simultaneously, MK-MMD reduces the distribution differences between the source and target domains through kernel mapping. CDAN+E applies entropy conditioning, weighting each sample by the confidence of the classifier’s prediction so that the domain discriminator prioritizes high-confidence samples, and it considers the joint distribution of features and labels through multilinear conditioning. The distributions of the source and target domains are further aligned through adversarial learning between the feature extractor and the domain discriminator. After feature extraction and adversarial learning, faults are identified by the category classifier.
The MK-MMD loss, CDAN+E loss, and classification loss are combined to form the total loss function of the MCR-KAResNet-TLDAF method. The total loss function is given as follows:

$$L_{total} = L_c + \lambda\, L_{\mathrm{MK\text{-}MMD}} + \mu\, L_{CDAN+E}$$

where $L_c$ is the classification loss, $L_{\mathrm{MK\text{-}MMD}}$ is the MK-MMD loss, $L_{CDAN+E}$ is the CDAN+E adversarial loss, and $\lambda$ and $\mu$ are weight parameters scheduled according to the training progress $p/P$, where $p$ represents the current number of iterations and $P$ represents the maximum number of iterations. By adjusting $\lambda$ and $\mu$, the model achieves a better balance between classification and domain adaptation.
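For illustration, a sketch of one training step combining the three losses is given below. The DANN-style ramp-up of $\lambda$ and $\mu$ with the progress $p/P$, i.e., $2/(1+e^{-10p/P})-1$, is a common choice assumed here and may differ from the schedule actually used; `mk_mmd` and `cdan_e_loss` refer to the sketches above, and the networks and optimizer are placeholders.

```python
import math
import torch.nn.functional as F

def train_step(batch_s, batch_t, feat_net, classifier, discriminator, optimizer, p, P,
               lam_max=1.0, mu_max=1.0):
    """One optimization step of the combined MK-MMD + CDAN+E objective (sketch).

    Assumes `discriminator` is preceded by a gradient reversal layer, so that minimizing
    `total` trains the discriminator normally while adversarially updating `feat_net`.
    """
    x_s, y_s = batch_s                                   # labeled source batch
    x_t, _ = batch_t                                     # unlabeled target batch (labels unused)

    f_s, f_t = feat_net(x_s), feat_net(x_t)              # KAResNet features
    logits_s, logits_t = classifier(f_s), classifier(f_t)
    prob_s, prob_t = F.softmax(logits_s, dim=1), F.softmax(logits_t, dim=1)

    # Assumed DANN-style ramp of the adaptation weights with training progress p / P.
    ramp = 2.0 / (1.0 + math.exp(-10.0 * p / P)) - 1.0
    lam, mu = lam_max * ramp, mu_max * ramp

    loss_cls = F.cross_entropy(logits_s, y_s)                          # classification loss L_c
    loss_mmd = mk_mmd(f_s, f_t)                                        # MK-MMD loss (sketch above)
    loss_adv = cdan_e_loss(f_s, prob_s, f_t, prob_t, discriminator)    # CDAN+E loss (sketch above)

    total = loss_cls + lam * loss_mmd + mu * loss_adv
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```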