1. Introduction
Bearings are essential in rotating machinery, serving as a vital part in maintaining the smooth functioning of industrial equipment. The failure of bearings during the equipment’s operation not only poses a direct threat to the safety of the machinery but also leads to significant economic losses. Hence, the accurate prediction of the remaining useful life (RUL) of bearings is essential for the effective prognosis and health management of the equipment [1,2].
Current deep learning models have shown impressive performance in predicting the RUL of bearings [3]. However, these models rely on the assumption that the training and testing sets are independent and identically distributed (I.I.D.) [4]. In realistic industrial scenarios, different bearings usually work under different conditions, and it is uncertain whether the vibration data collected from an unknown bearing are I.I.D. with those of the training bearing. Furthermore, obtaining the full life cycle, i.e., run-to-failure, vibration data of every type of bearing for model training is unrealistic. Therefore, an effective learning model that is trained on one bearing and can be generalized to predict the RUL of unknown bearings is of great practical value.
Domain adaptive approaches provide effective solutions to the problem of data distribution discrepancies between training and testing sets [5,6,7]. These methods focus on closing the gap between the source and target domains, allowing the model to transfer knowledge from the source domain (such as a source bearing) to the target domain (such as a target bearing) to predict the RUL, as shown in Figure 1a. However, domain adaptive methods are not applicable if the target bearing data are not accessible for model training.
To address the aforementioned issues, single-source domain generalization approaches provide a promising solution for predicting the RUL of unknown bearings. These approaches focus on data enhancement to expand the data distribution of the single-source domain and improve the generalization performance of the model on the unknown target domain, as shown in Figure 1b. Recently, single-source domain generalization approaches have blossomed in the computer vision (CV) field [8,9]. However, they have not yet been applied to the RUL prediction of bearings, which can be attributed to two main reasons.
Firstly, the existing single-source domain generalization approaches do not consider the stage discrepancy of time series data. Different from the image data in the CV field, the vibration data over the full life cycle of a bearing exhibit significant stage discrepancies in the time domain. In the early health stage of the bearing, the fluctuation of its time domain vibration data is very small. Such health-stage vibration data have a subtle, sometimes even negative, influence on describing the degradation trend of the bearing. Therefore, it is very important to divide the full life cycle vibration data in order to eliminate the effects of the early health stage data. Secondly, the current single-source domain generalization methods strive to enhance the diversity of the data distribution using image-oriented augmentation techniques. However, such techniques may produce “counterfactual data” that completely deviate from the authentic vibration data distribution of bearings, which deteriorates the generalization ability of the model.
To address the issues mentioned above, we propose a more universal RUL prediction model that is trained on a single bearing and can be applied directly to other bearings, termed the adaptive stage division and parallel reversible instance normalization (AsdinNorm) model. We treat the prediction of the RUL of unknown bearings as a single-source domain generalization problem and address it in two key steps. First, an adaptive threshold stage division method determines the degradation point locations of different bearings: it adaptively and iteratively locks onto the degradation trend and, after several iterations, finds the final degradation point locations used to divide the run-to-failure vibration data of each bearing into health and degradation stages. Second, instance normalization and instance denormalization of the bearing data are combined into a unified GRU-based RUL prediction network in order to leverage the distribution bias in instance normalization, achieve a better overall prediction accuracy, and improve the generalization performance of the model.
The main contributions of this paper can be summarized as follows:
The proposed AsdinNorm model comprises three modules: instance normalization, adaptive threshold stage division, and parallel reversible normalization RUL prediction, which are used, respectively, for enhancing the diversity of the data distribution, dividing the degradation stages, and leveraging the distribution bias in the instance normalization of the source bearing’s vibration data. Together, they improve the prediction accuracy and generalization ability of the model.
We designed an adaptive threshold-based degradation point identification method to effectively divide the health and degradation stages of the full life cycle vibration data of bearings. The designed adaptive threshold algorithm iteratively updates the degradation point locations to quickly and efficiently obtain the final degradation point locations of different bearings. Correspondingly, the vibration data of the degradation stage are selected for the model training for the purpose of reducing the interference of early fluctuations of the health stage, as well as eliminating the influence of the data distribution discrepancy between the training bearings and unknown testing bearings on the model performance.
We explored the parallel instance normalization and denormalization algorithm of the source bearing data and then combined it into a unified GRU-based RUL prediction network, which avoids the generation of “counterfactual” data, as well as the distribution bias in data enhancement, and achieves a better prediction accuracy while improving the generalization performance of the model.
2. Related Works
The single-source domain generalization approaches [10,11] use data augmentation techniques to expand the data distribution of the training set (single-source domain) to cover the data distribution of the unknown target domain as much as possible and, further, to improve the generalization performance of the model. They are usually categorized as GAN-based, meta-learning-based, and scaling-based approaches.
GAN-based methods [10,12,13] can create additional data resembling the source domain by using generative and discriminative models. HSIGAN [14] allows the discriminator to perform classification in addition to distinguishing between real and synthetic data; that is, it learns to generate overall realistic samples and also encourages the generator to learn the representation of different classes of samples. DAGAN [15] learns a large number of data-augmentation transformations by training an autoencoder. BAGAN [16] trains an autoencoder to learn a multivariate normal latent distribution of the images, which represents the distribution of the overall dataset.
The meta-learning approaches [17] use the training data as a meta-training set, while the generated data serve as a test set, to learn robust feature representations through a meta-learning strategy. Qiao et al. [18] augmented the source samples in both the input and label spaces and guided the augmentation with an uncertainty assessment, enabling data enhancement, domain generalization, and the effective training of models in a Bayesian meta-learning framework. Their findings indicate that the approach is effective and outperforms others on various tasks.
The scaling-based approaches [19] are a general technology for single-source domain generalization. ASR-Norm [20] uses neural networks to adaptively normalize and rescale statistics to match various domains. SORAG [21] manually synthesizes new samples to improve the robustness of the model and tackle the problem of sample imbalance. SamplePairing [22] performs basic data augmentation (e.g., random flipping) and then superimposes two samples by pixel-wise averaging to synthesize new samples, which expands the diversity of the samples and enhances the generalization ability of the model.
In short, single-source domain generalization methods are currently used mainly for image classification tasks. When dealing with the prediction problem for bearing vibration data, however, the interference of vibration data at different stages and the distribution bias introduced by data enhancement must be considered. Therefore, it is crucial to develop a novel single-source domain generalization method that is better suited to the characteristics of vibration data for the RUL prediction of unknown bearings.
3. Proposed Method
To briefly describe the RUL prediction problem for bearings, consider two sets of bearing vibration signals collected under different working conditions: the source bearing dataset and the unknown target bearing dataset, each containing a given total number of samples. Our model captures the degradation features from the vibration data of the source bearing and then predicts the vibration data [23]; the prediction function is parameterized by the model weights and takes the vibration data of the source bearing as input. The model parameters are optimized through iterative training. Finally, the target bearing data are input to the trained model, and the model predicts the vibration data of the testing bearing.
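For clarity, a plausible form of the prediction function and training objective referred to above, written with hypothetical symbols ($x_s$ for the source vibration data, $\hat{y}$ for the predicted vibration data, $y$ for the targets, $\theta$ for the model parameters, and $\mathcal{L}$ for the training loss), is:

```latex
\hat{y} = f(x_s; \theta), \qquad
\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{N} \mathcal{L}\big(f(x_s^{(i)}; \theta),\, y^{(i)}\big)
```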
The AsdinNorm model’s architecture is depicted in Figure 2 and consists of three main components: an instance normalization module, an adaptive threshold stage division module, and a parallel reversible normalization RUL prediction module.
3.1. Instance Normalization
The instance normalization module is designed to preserve the non-stationary information of the vibration data while reducing the difference in data distribution from the target bearings. Firstly, we obtain the peak-to-peak values from the vibration signal data of the source bearing, i.e., the difference between the maximum and minimum of each vibration recording, which alleviates the interference of noise and gives a clear representation of the degradation trend. Next, we convert the peak-to-peak values into a time series using a sliding window. Finally, we normalize the input time series by applying the instance mean and standard deviation: for each sliding window of the input sequence, the mean and standard deviation are computed over the samples within that window. With these statistics, and a pair of learnable affine parameter vectors used to equalize the effective information across bearings, we derive the normalized [24] input sequence data from the input sequence data.
Importantly, we merge the normalized data with the original time series data to form a new two-branch time series input, which serves as the input to the adaptive threshold stage division module.
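For concreteness, the preprocessing described above can be sketched as follows: peak-to-peak extraction, sliding-window construction, and per-window (instance) normalization with learnable affine parameters. This is a minimal illustration under our own assumptions (window length, parameter names such as `gamma` and `beta`), not the authors' released implementation.

```python
import numpy as np
import torch


def peak_to_peak(recordings):
    """Peak-to-peak value (max - min) of each vibration recording."""
    return np.array([r.max() - r.min() for r in recordings])


def sliding_windows(series, length):
    """Split a 1-D series into overlapping windows of the given length."""
    return np.stack([series[i:i + length] for i in range(len(series) - length + 1)])


class InstanceNorm1d(torch.nn.Module):
    """Per-window normalization with learnable affine parameters gamma and beta."""

    def __init__(self, eps=1e-5):
        super().__init__()
        self.gamma = torch.nn.Parameter(torch.ones(1))  # learnable scale
        self.beta = torch.nn.Parameter(torch.zeros(1))  # learnable shift
        self.eps = eps

    def forward(self, x):                                   # x: (num_windows, window_length)
        mu = x.mean(dim=1, keepdim=True)                    # instance mean
        sigma = x.std(dim=1, keepdim=True, unbiased=False)  # instance standard deviation
        x_norm = (x - mu) / (sigma + self.eps)
        return self.gamma * x_norm + self.beta, mu, sigma


# Toy example: 300 vibration recordings of 2560 points each.
recordings = [np.random.randn(2560) for _ in range(300)]
p2p = peak_to_peak(recordings)
windows = torch.tensor(sliding_windows(p2p, length=32), dtype=torch.float32)
x_norm, mu, sigma = InstanceNorm1d()(windows)
# x_norm (normalized branch) and windows (raw branch) form the two-branch input described above.
```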
3.2. Adaptive Threshold Stage Division
In this section, an adaptive threshold stage division method is proposed to determine the location of the bearing degradation points so as to separate the health stage from the fast degradation stage. This critical step facilitates the accurate RUL prediction of unknown target bearings by using only the bearing data of the degradation stage. The specific procedure involves the following steps, as shown in Figure 3.
Calculate the degradation path: Firstly, we adopt the isotonic regression algorithm to transform the irregular bearing data into segmented, incrementally increasing step data. The mapping function of the isotonic regression algorithm takes the peak-to-peak values of the vibration data as input and outputs a sequence with a monotonically increasing trend. This transformation ensures that the degradation trend of the bearing data is monotonically increasing and eliminates the noise interference of the original data. Figure 3a illustrates the degradation trend of the original bearing data obtained via the isotonic regression algorithm. Several jump points can be found in the figure, which makes it difficult to determine the proper degradation point position.
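A minimal sketch of this step using scikit-learn's isotonic regression (the variable names are illustrative, and the toy data stand in for real peak-to-peak values):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# p2p: peak-to-peak values over the bearing's life (toy data used here)
p2p = np.abs(np.cumsum(np.random.randn(1000))) + np.random.rand(1000)

t = np.arange(len(p2p))                       # time index
iso = IsotonicRegression(increasing=True)     # enforce a monotonically increasing fit
degradation_path = iso.fit_transform(t, p2p)  # segmented, non-decreasing step curve
```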
Generate the gradient path: Correspondingly, we use the least squares method [25] with a sliding window to calculate the gradient of the degradation path, as shown in Figure 3b: within each window of a given size, a straight line is fitted to the corresponding peak-to-peak values by least squares, and its slope is taken as the gradient at that position.
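The gradient computation can be sketched as below; the window size `w` is an assumed placeholder, and `degradation_path` is the isotonic regression output from the previous sketch.

```python
import numpy as np


def window_gradients(path, w=20):
    """Least-squares slope of the degradation path inside each sliding window."""
    grads = []
    for start in range(len(path) - w + 1):
        x = np.arange(start, start + w)
        y = path[start:start + w]
        slope, _ = np.polyfit(x, y, deg=1)  # first-order least-squares fit
        grads.append(slope)
    return np.array(grads)


gradients = window_gradients(degradation_path, w=20)
```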
Adaptive threshold iteration process: In order to select the proper degradation point from a number of jump points and thereby determine the position of the stage division, we further propose an adaptive threshold algorithm that compares the incremental gradients of degradation over multiple iterations. Specifically, the algorithm is initialized with the first point of the gradient path as the head and the last point as the tail, and the average gradient value over this interval is calculated.
The algorithm relies on two key conditions: first, the gradient increments of degradation must fluctuate continuously, i.e., there must be a sufficient number of consecutive gradient increments greater than zero; second, at least one of these gradient increments must exceed a given threshold. Both the required count and the threshold are set by human experience.
Update the position of the degradation points by iterating repeatedly until the optimal degradation point is obtained at the end of the iterations. The algorithm flow is illustrated in Figure 4, and the algorithm proceeds as follows:
(1) Proceed to step (2) only if the count of incremental gradients with consecutive jumps reaches the preset minimum number; otherwise, halt the process.
(2) If the number of incremental gradient points meeting condition (1) reaches the preset threshold, proceed to step (3); otherwise, halt the process.
(3) If condition (2) is met, the starting point of the selected gradient is set as the new degradation point, and the final point of the gradient run with continuous jumps becomes the new end of the gradient interval. The updated gradient average is then calculated, and the process proceeds to step (4).
(4) If the degradation point position no longer changes, the algorithm converges to the final degradation point and stops; if not, return to step (1).
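One possible implementation of this iterative update is sketched below. The stopping thresholds (`min_consecutive`, `margin`) and the exact update rule are assumptions made for illustration based on the description above, not the authors' code.

```python
import numpy as np


def adaptive_threshold_division(grads, min_consecutive=3, margin=1.0, max_iter=50):
    """Iteratively locate the degradation point from the sliding-window gradients."""
    head, tail = 0, len(grads)                  # initial interval: whole gradient path
    deg_point = head
    for _ in range(max_iter):
        mean_grad = grads[head:tail].mean()     # current average gradient
        increments = np.diff(grads[head:tail])  # gradient increments
        run_start, selected, count = None, None, 0
        for i, inc in enumerate(increments):
            if inc > 0:
                count += 1
                if count == 1:
                    run_start = i
                if count >= min_consecutive:
                    run = increments[run_start:i + 1]
                    # at least one increment must clearly exceed the average gradient
                    if np.any(run > mean_grad * margin):
                        selected = (head + run_start, head + i + 1)
                        break
            else:
                count = 0
        if selected is None:                    # conditions not met: stop iterating
            break
        new_point, new_tail = selected
        if new_point == deg_point:              # degradation point unchanged: converged
            break
        deg_point, head, tail = new_point, new_point, new_tail
    return deg_point


# deg_idx = adaptive_threshold_division(gradients)  # gradients from the previous sketch
```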
As illustrated in Figure 3c, the adaptive threshold stage division algorithm determines the proper degradation point position and eliminates the interference caused by multiple jump points. Subsequently, the final degradation point is used to divide the bearing data into two stages: the health stage and the fast degradation stage. We use the bearing data from the degradation stage as the input of the prediction module described in the next section.
3.3. Parallel Reversible Instance Normalization RUL Prediction
As previously mentioned, the training set input to this module consists of two branches: the normalized data and the original time series data. Accordingly, the parallel reversible instance normalization RUL prediction module is designed with two branches to process these two inputs. First, the normalized data are fed to the prediction module, which outputs the corresponding predicted sequence. Considering that the RUL prediction obtained directly from the normalized data may exhibit a distribution bias with respect to the actual data, we further apply reverse normalization [5] to the predicted values, restoring the instance statistics removed during normalization.
Importantly, the weights of the two prediction branches are optimized simultaneously during model training, together with the elements of the learnable affine parameter vector, to achieve a better overall prediction accuracy.
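A minimal sketch of this parallel combination: the prediction from the normalized branch is de-normalized with the stored instance statistics, then fused with the raw-branch prediction through learnable weights. The fusion form and the parameter names (`w_norm`, `w_raw`) are assumptions for illustration.

```python
import torch


class ParallelRevINHead(torch.nn.Module):
    """Fuse the de-normalized and raw-branch predictions with learnable weights."""

    def __init__(self):
        super().__init__()
        self.w_norm = torch.nn.Parameter(torch.tensor(0.5))  # weight of de-normalized branch
        self.w_raw = torch.nn.Parameter(torch.tensor(0.5))   # weight of raw branch

    def forward(self, pred_norm, pred_raw, mu, sigma, gamma, beta, eps=1e-5):
        # Reverse the instance normalization of the input windows:
        # undo the affine transform, then restore the instance statistics.
        pred_denorm = (pred_norm - beta) / (gamma + eps) * (sigma + eps) + mu
        return self.w_norm * pred_denorm + self.w_raw * pred_raw
```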
In terms of the design of the RUL predictor, a network model is needed that extracts deep features well and has a light neural network structure. Therefore, we construct a GRU-based predictor to predict the RUL. The RUL predictor is composed of two single-layer gated recurrent units (GRUs) and three fully connected layers. The GRU-based predictor’s parameters are listed in Table 1.
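For reference, a predictor with the stated structure (two single-layer GRUs followed by three fully connected layers) could look like the sketch below; the hidden and layer sizes are placeholders rather than the values in Table 1.

```python
import torch


class GRUPredictor(torch.nn.Module):
    """Two stacked single-layer GRUs followed by three fully connected layers."""

    def __init__(self, input_size=1, hidden1=64, hidden2=32):
        super().__init__()
        self.gru1 = torch.nn.GRU(input_size, hidden1, num_layers=1, batch_first=True)
        self.gru2 = torch.nn.GRU(hidden1, hidden2, num_layers=1, batch_first=True)
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(hidden2, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, 16), torch.nn.ReLU(),
            torch.nn.Linear(16, 1),
        )

    def forward(self, x):            # x: (batch, sequence_length, input_size)
        h, _ = self.gru1(x)
        h, _ = self.gru2(h)
        return self.fc(h[:, -1, :])  # predict from the last time step


# Example: a toy batch of 8 windows of length 32.
pred = GRUPredictor()(torch.randn(8, 32, 1))
```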
The pseudocode of the proposed method is shown in Algorithm 1.
Algorithm 1: AsdinNorm for RUL prediction.
1: Input: (Training stage) Source domain: the run-to-failure vibration samples of the source bearing.
2: Data preprocessing: extract the peak-to-peak values.
3: Randomly initialize the weights of the AsdinNorm model.
4: for each training epoch do
5: Apply instance normalization using Equations (3)–(5).
6: Use the adaptive threshold stage division module to select the degradation stage data.
7: Use Equation (10) to calculate the margin loss.
8: Use Equations (8) and (9) to obtain the RUL prediction values and update the learnable affine parameters.
9: end for
10: Output: the AsdinNorm model with the optimal weights.
11: Input: (Test stage) Unseen target domain: the vibration samples of the target bearing, where M is the number of samples.
12: Extract the peak-to-peak values.
13: Use the adaptive threshold stage division module to select the degradation stage data.
14: Use Equations (8)–(10) to obtain the RUL prediction values and calculate the evaluation indicators.
15: Output: RMSE of the target bearings.
4. Experiment and Discussion
4.1. Experiment Description
We conducted experiments using two public datasets: the IEEE PHM Challenge 2012 bearing dataset and the XJTU-SY bearing dataset. The IEEE PHM Challenge 2012 bearing dataset is provided by the bearing degradation experiments on the PRONOSTIA test stand. The PRONOSTIA experimental setup includes three primary components: rotational components, load components, and data measurement components, as illustrated in Figure 5a. The load is 4000 N. Vibration signals are captured every 10 s, with each recording lasting 0.1 s. Table 2 describes the dataset under three operating conditions. We use the vibration signal of bearing 1_1 as the training set and test on the other 12 bearings.
The XJTU-SY bearing dataset is provided by the Institute of Design Science and Basic Component at Xi’an Jiaotong University (XJTU) and encompasses the run-to-failure vibration signals of 15 rolling bearings operating under three distinct conditions. The signals are sampled at a frequency of 25.6 kHz with a sampling period of 1 min, as shown in Figure 5b. Comprehensive details of the two datasets are presented in Table 2. For the PHM 2012 dataset, bearing 1_1 is utilized as the training set, with the remaining bearings serving as the test set. For the XJTU-SY dataset, bearing 3_1 is employed as the training set, while the remaining bearings constitute the test set used to evaluate the model’s generalization performance.
We designed experiments in three aspects (comparison using full life cycle data versus fast degradation data; ablation experiments; comparison experiments with state-of-the-art methods) to validate our method. Figure 6 illustrates the flow chart of the verification procedure of the proposed method.
4.2. Adaptive Threshold Stage Division Experiment
Figure 7 presents the iterative process of the adaptive threshold stage division algorithm applied to two bearings of the PHM 2012 dataset, along with the final degradation point positions. It can be seen that the method determines candidate degradation point positions at each iteration and continuously updates the mean degradation gradient value. Through iterative convergence, the algorithm identifies the final degradation points by filtering out several interfering jump points, thereby accurately capturing the subtle variations in the bearing data.
As depicted in Figure 7a, the first bearing converges to the degradation point at position 1323 after three iterations. Similarly, Figure 7b illustrates that the second bearing converges to the degradation point at position 228 after three iterations. The specific iteration counts and degradation point positions for each bearing are given in Table 3.
Using the proposed algorithm, all bearings are stage-divided, and Table 3 lists the number of iterations and the final degradation point positions. The vibration data of the degradation stage are selected for the RUL predictions.
4.3. Comparison Results Using Full Life Cycle Data and Fast Degradation Data
To verify the effectiveness of the stage division in the proposed method, we use the full life cycle data and the fast degradation data of the bearings, respectively, to predict the RUL. We choose the root mean square error (RMSE) [26] as the metric to evaluate the model performance; it is the square root of the mean squared difference between the actual RUL values and the estimated RUL values over all samples. The smaller the RMSE, the better the prediction performance of the model.
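For completeness, the metric can be computed as follows (a standard RMSE implementation, not tied to any specific library used by the authors):

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between actual and estimated RUL values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```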
Figure 8 illustrates the prediction results of the GRU model and our proposed model for the full life cycle data and the fast degradation data of all test bearings, respectively. It is observed that the RMSE values of the two models for the fast degradation stage data are smaller than those for the full life cycle data. This is because the data distribution discrepancy between the health stage and the fast degradation stage is significant: the distribution of the health stage data can bias the model learning and degrade the prediction performance on the fast degradation stage of the bearing data.
Further, we train our proposed model on the full life cycle data and on the fast degradation stage data of bearing 1_1 of the PHM 2012 dataset, respectively. Figure 9 shows the prediction results for one test bearing: the blue curve shows its real values, the red curve shows the predictions of the model trained on the fast degradation stage data of the training bearing, and the green curve shows the predictions of the model trained on the full life cycle data. It can be seen that the model trained with the fast degradation stage data fits the rapidly varying vibration data better.
Both sets of experimental results demonstrate that dividing the bearing data into two stages and using the fast degradation stage data for the RUL prediction yields a better performance than using the full life cycle data.
4.4. Ablation Study
In this section, we conduct three ablation experiments to verify the effectiveness of each module of our proposed method. Specifically, the model with the adaptive threshold stage division module and GRU module (termed as Adapstage+GRU); the model with the adaptive threshold stage division module, instance normalization module, and reversible normalization-based RUL prediction module (termed as Adapstage+IN+RevIN); and the model with the manual threshold stage division module, instance normalization module, and parallel reversible normalization-based RUL prediction module (termed as Manualstage+IN+RevIN) are constructed, respectively.
For the PHM 2012 dataset, we train the three models on bearing 1_1, and the predictive results for the 12 test bearings are depicted in Figure 10. Similarly, for the XJTU-SY dataset, the training bearing is 3_1, and the predictive results for the nine test bearings are illustrated in Figure 11.
As shown in Figure 10, where the bolded values indicate the best result among the ablation models, the Manualstage+IN+RevIN model exhibits the worst prediction performance on the PHM 2012 dataset, with an average RMSE value of 2.15. The next-worst is the Adapstage+GRU model, with an average RMSE value of 1.86, followed by the Adapstage+IN+RevIN model, with an average RMSE value of 1.68. Our proposed method achieves the best prediction, with an average RMSE value of 1.44; both its average RMSE value and the RMSE values of the individual bearing predictions are the smallest, which shows that the prediction performance of our proposed method is better than that of the other models. This is because the degradation trends of different bearings under different operating conditions differ markedly, and using fixed thresholds to perform stage division for all bearings results in a non-negligible bias in selecting the degradation points of bearings with significantly different data distributions. Meanwhile, the Adapstage+IN+RevIN model has a suboptimal prediction performance: although normalization and inverse normalization are used in that model, the lack of a learnable parallel process limits its generalization ability.
As shown in Figure 11, where the bolded values again indicate the best result among the ablation models, the GRU+FC model exhibits the worst prediction performance on the XJTU-SY dataset, with an average RMSE value of 1.88. The next-worst is the GRU+RevIN model, with an average RMSE value of 1.70. Our proposed method achieves the best performance, with an average RMSE value of 1.53. These results demonstrate the effectiveness and generalization ability of the parallel reversible normalization RUL prediction module proposed in our method.
However, as seen in Figure 10 and Figure 11, the prediction accuracy of the proposed model for some test bearings is close to, or even the same as, that of the other ablation models. This is because we use a single bearing for training and the other bearings for testing. The peak-to-peak amplitude of the training bearing reaches more than 50; its degradation trend is smooth in the early stage and changes dramatically in the later stage. In contrast, the maximum peak-to-peak values of those test bearings do not exceed 10, so the data distribution of their degradation trends differs considerably from that of the training bearing. For the other test bearings, whose peak-to-peak amplitudes and degradation trends are closer to those of the training bearing and whose data distribution differences are therefore smaller, our model performs very well and shows a better generalization ability and prediction accuracy.
In summary, the experimental results show that our proposed model obtains more accurate degradation point positions, as well as a better prediction accuracy, than the other models.
4.5. Comparison with State-of-the-Art Methods
In this section, we compare our model with six state-of-the-art methods on the PHM 2012 dataset to verify its superiority. The comparison models include the AE (Autoencoder) [27], SA (Self-Attention) [28], MMD (Maximum Mean Discrepancy) [29], TCA [30], Transformer [31], and AOA [32] models listed in Table 4. Among them, SA, AE, and Transformer are prevalent learning models; TCA and MMD are domain adaptive models; and AOA is a domain generalization model.
Figure 12 shows the prediction results of the different comparison models for all test bearings, together with the average RMSE values; the bolded values indicate the best result among the comparison models. From the average RMSE values, it can be seen that the AE model yields an average RMSE of 1.87, the worst prediction accuracy among all the comparison models, followed by the AOA model with an average RMSE of 1.82, the TCA model with 1.76, and the SA model with 1.69. The MMD and Transformer models achieve average RMSE values of 1.61 and 1.60, respectively. Finally, our proposed model achieves an average RMSE value of 1.44.
It is observed that all the RMSE values of our model are lower than those of the compared models. Among these, AE and SA only extract the salient features of the training bearing and ignore the variations in data distribution across the different test bearings; therefore, the two models fail to obtain satisfactory prediction results. The TCA and MMD models enhance the prediction accuracy by reducing the distance between the source domain bearing and the known target domain bearing. However, since the vibration data of different bearings have different time series lengths over the full life cycle, these distance metric-based domain adaptive approaches produce a certain distance bias, which has an obvious impact on the prediction performance. The AOA model uses a GAN to generate pseudo samples, which expands the data distribution of the samples, but the expansion range is uncontrollable and its generalization conditions are strict, which affects the prediction accuracy.
However, as can be seen in Figure 12, the difference in prediction accuracy between our proposed model and some comparison models on certain bearings is small, and in some cases the results are the same. Comparing the learning conditions of the models, our proposed single-source domain generalization model relies on training with only one bearing and makes predictions while the target domain bearings remain unknown. The prediction accuracy of the domain adaptive models is also good, because this kind of model can use the target bearing data for adaptation, bringing the source and target domains closer and allowing the domain adaptive models to perform very well in cross-domain prediction scenarios. The plain learning models and the domain generalization model, on the other hand, do not perform as well. If the bearings of the target domain are not visible to a domain adaptive model, however, its prediction accuracy drops drastically.
It should be pointed out that one bearing is the worst case among all the test bearings. This is because its degradation stage lasts for a short time and its data vary widely; correspondingly, its data distribution differs considerably from that of the other eleven bearings. Therefore, its prediction results are poorer than those of any other test bearing for every comparison model.
4.6. Generalization Error Bound Analysis
It should be pointed out that, in Figure 12, the RMSE values of two bearings are obviously larger, even exceeding those of the other bearings several times over. Therefore, we analyze the reason from the perspective of the generalization error bound.
The generalization error usually indicates the generalization performance of the model on unknown target data and is obtained by subtracting the training error from the expected error over the entire input space. The generalization error bound [33] is the maximum allowed value of the generalization error, beyond which the feasibility of the model is questionable. It is defined as follows: when the hypothesis space is assumed to be a finite function set, the generalization error of any function in the set is, with at least a given probability, no larger than the sum of the empirical risk and a correction quantity. The correction quantity is a monotonically decreasing function of the number of training samples N and grows with the number of functions d: the more functions, the larger the correction. Correspondingly, the empirical risk is defined as the average loss between the true values and the predicted values over the training samples.
For the PHM 2012 dataset, the number of training samples N is 1440, and the number of functions d is 150. The probability parameter takes values in [0, 1], and according to Equation (12), the larger it is, the smaller the correction quantity becomes. When the probability parameter is set to 1, the minimum value of the correction quantity is 0.0274. According to Equation (13), the empirical risk is 1.422. Following Equation (11), the value of the generalization error bound is obtained as 1.44. The specific results are listed in Table 5.
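These numbers can be reproduced approximately with the standard finite-hypothesis-class bound; the base-10 logarithm below is our assumption, chosen because it matches the reported correction value of 0.0274.

```python
import numpy as np

N, d = 1440, 150   # number of training samples and number of functions
delta = 1.0        # probability parameter at its upper limit
emp_risk = 1.422   # empirical risk reported in the text

correction = np.sqrt((np.log10(d) + np.log10(1.0 / delta)) / (2 * N))
print(round(correction, 4))             # ~0.0275, matching the reported 0.0274
print(round(emp_risk + correction, 2))  # ~1.45, close to the reported bound of 1.44
```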
As can be seen from Figure 13, except for two bearings, all the other bearings satisfy the above generalization error bound inequality on the PHM dataset. This indicates that the model does not generalize to those two bearings; therefore, their RMSE values are larger than those of the other test bearings. The experimental results of our model are thus in accordance with the theoretical calculations.
5. Conclusions
In this paper, to tackle the problem that the unknown target bearing data are unavailable for model training, we propose a novel single-source domain generalization method for the RUL prediction of bearings, termed the adaptive stage division and parallel reversible instance normalization (AsdinNorm) model. Firstly, we propose an adaptive threshold stage division approach to determine the degradation point in the full life cycle vibration data of bearings. Further, we explore instance normalization and denormalization of the source bearing data and combine them into a unified GRU-based RUL prediction network, avoiding the distribution bias of data enhancement and concurrently enhancing the generalization performance of the model for unknown bearings. In the ablation experiments on the PHM 2012 and XJTU-SY bearing datasets, the average prediction accuracy (RMSE value) of the proposed method is 1.44 and 1.53, respectively, which is 17% and 11% better than that of the second-best models, i.e., the Adapstage+IN+RevIN and GRU+RevIN models. In the comparison experiment on the PHM 2012 dataset, the average prediction accuracy of the proposed model is 1.44, which is 11% better than that of the suboptimal comparison model (the Transformer model). The comparison of the experimental results shows that the model offers a good generalization performance for predicting the RUL of unknown bearings, introducing a novel single-source domain generalization approach to RUL prediction.
It is noted that the generalization ability of our model on the two aforementioned bearings is still unsatisfactory. In future work, we will attempt to increase the number of training samples and explore advanced data augmentation techniques to expand the data distribution, so that the model has a wider generalization error bound and a stronger generalization ability.