1. Introduction
Continuous advances in technology and production processes have made the structure of modern mechanical equipment increasingly complex and sophisticated. However, under the combined effects of the external environment and the internal condition of the equipment, the actual performance and health of rotating equipment inevitably tend to decline. Once this degradation reaches a certain threshold, the equipment can no longer perform the tasks and functions it is responsible for; if it continues to operate, it will damage neighbouring parts and paralyse the equipment, causing immeasurable property loss and threatening the life safety of staff. Rolling bearings, as important components of rotating equipment, are subject to wear, deformation and dislodgement during continuous operation, and research results show that bearings cause more than 30% of rotating equipment failures [1,2,3]. Therefore, predictive maintenance of rotating equipment can be achieved by studying rolling bearings.
In recent years, life prediction techniques have developed rapidly. Existing life prediction methods can be broadly classified into physical failure model-based methods and data-driven methods [4,5,6,7]. For complex, highly reliable equipment, obtaining the physical failure mechanism is time-consuming and difficult. In contrast, data-driven prediction methods do not rely on the failure mechanism of the equipment, but they do require monitoring the operation of the equipment and collecting valid failure data or performance degradation data [8,9,10]. Data-driven approaches based on statistically driven models and reliability functions, together with those based on machine learning and deep learning models, are the main directions of current life prediction research [11,12,13,14]. For rolling bearings, the main prediction steps are: (1) extracting failure characteristics and degradation curves using signal processing or machine learning; (2) constructing health indicators using deep learning or degradation function models to characterize life thresholds; (3) using the trained deep learning degradation models for life prediction, or fitting a model to the obtained degradation curves, to finally obtain the RUL [15,16,17].
Li Hailang et al. combined a trend consistency constraint and a parallel variance constraint to extract features with a convolutional auto-encoder, reducing the individual differences among bearings with the same label features and improving bearing prediction accuracy to a certain extent [18,19]. Zhang Guangyu used convolutional auto-encoders to learn deep features from the data, addressing the lack of labeled data in the actual operation of mechanical equipment; the optimized deep features were fused with the original data and fed into a gated recurrent unit for temporal feature extraction, addressing the insufficient utilization of temporal features in the data and achieving remaining useful life prediction [20].
She Daoming proposed a novel method combining deep auto-encoders and the minimum quantization error to accurately describe the dynamic degradation process of rolling bearings. The results showed that the trend, monotonicity, robustness, and fusion evaluation criterion values of the health indicators constructed with this method were all higher than those of single-layer auto-encoder models and of the traditional Principal Component Analysis dimensionality reduction method [21].
These researchers applied convolutional autoencoder techniques to address the issues of unlabeled data and utilization of temporal features, effectively improving the accuracy of remaining useful life prediction for bearings. Their proposed methods outperformed traditional methods in terms of accuracy and evaluation criteria.
Statistically driven models and reliability functions have also been applied to RUL prediction. Li NaiPeng et al. [22] used a general Wiener process model, incorporating a multi-source observation function and a mapping function, to describe the causal and correlation relationships between the state and the data, and applied a particle filtering algorithm to dynamically match the multi-sensor data with the model to predict the RUL. This method addresses the over-reliance of deep learning algorithms on training data and their lack of necessary empirical guidelines, but further discussion is needed because of the large amount of data required and the weight assignment of the multi-sensor data. In machine learning, the support vector machine (SVM) was first proposed by Cortes and Vapnik in 1995 to solve classification and regression problems for analyzing small samples and multi-dimensional data [23]. Chen et al. [24] used an SVM to predict the RUL of an aero-engine and then predicted the RUL of operating equipment based on improved similarity theory combined with the RUL results of the engine. This method offers a high degree of process visibility but is more complex, requiring prior knowledge for data processing, feature screening, parameter optimization, and so on, and it also requires a large amount of data for training, which makes prediction difficult to achieve in a practical working environment.
Wang et al. [25] proposed a health indicator generation network model based on a spatial convolutional long short-term memory neural network (ConvLSTM), which directly mines features reflecting the degree of degradation from the collected raw signals to construct health indicators and output RUL predictions. The health indicators constructed by this method have better trendability, monotonicity and robustness, and the accuracy of RUL prediction is higher. However, the model relies on stacking the network to improve the feature extraction capability, for which no specific explanation is given; it also requires a large amount of data as the training set, and different training sets and training processes can cause large fluctuations in the model. Qiao Xiandong et al. used the Autoformer model to train on temperature data; compared with other Transformer models, Autoformer has a lower error rate and higher efficiency, which improves the accuracy of temperature forecasting [26]. Wu Haixu et al. addressed the long-term prediction problem of time series by designing Autoformer as a novel decomposition architecture with an autocorrelation mechanism; as a result, Autoformer achieved state-of-the-art accuracy on six benchmarks [27].
The essence of both deep learning and machine learning is feature extraction. The difference is that deep learning extracts features from the original data at multiple levels and from multiple angles through neural nodes, while machine learning decomposes the data through functions, which explains why deep learning has weaker interpretability but relatively better results. The main challenges of current bearing life prediction research can be summarized as follows: (1) scarcity and imbalance of bearing fault data: traditional deep learning requires a large amount of complete degraded-bearing data for model training; (2) feature selection and extraction: the feature extraction method needs to be improved, and specific explanations cannot be given. To increase the number of samples, Yanfang Fu et al. [28] trained an improved deep residual network (SE-ResNet18) fault diagnosis model based on a channel attention mechanism, applying Wavelet Packet Decomposition (WPD) denoising and a Conditional Variational Autoencoder (CVAE) to the existing fault samples; the augmented fault samples improved the accuracy of fault diagnosis. Regarding feature selection and extraction, Dominik Łuczak et al. [29,30] solved the multi-classification problem by converting the acquired one-dimensional signal into RGB image data and training a convolutional neural network on the images, and Min Su Kim [31] efficiently extracted a feature map for each of the X, Y, and Z axes from the three-axis vibration signal by grouping one-dimensional convolutions; the feature map extracted from each axis consists of axis-specific frequencies and requires no domain transformation, allowing an end-to-end model to be trained that classifies faults based on the frequency characteristics of each axis. Liang et al. [32] used a one-dimensional dilated convolutional neural network to realize feature extraction and fault mode classification of the excitation current, and further used the Score-CAM activation mapping algorithm to analyze the diagnostic mechanism of the model, taking into account both the accuracy and the interpretability of the model.
In summary, to minimize the impact of limited degraded-bearing data on the accuracy of life prediction, this paper proposes a new deep learning network model framework. The framework processes the extracted features with higher interpretability, introduces a new feature extraction method called ECA-CAE, and selects features with better linear and exponential trends; this method has better feature extraction capability than CAE and MLP-AE. The Autoformer model is used for degradation trend prediction and combined with a double exponential model for RUL prediction [33]. Furthermore, on the deep learning side, a single-bearing prediction approach is adopted, in which the first half of the data is used for training and the second half for prediction. Compared with traditional deep learning methods that require a large amount of complete degraded-bearing data for model training, this method uses only the first half of the current bearing's degradation features to predict future degradation trends, which resolves the problem of non-optimal model parameters caused by insufficient data. Comparisons with the recent Informer and Transformer models [34,35] validate the advantages of the proposed method. In addition, this model is suitable for predicting the life of bearings under the same operating conditions; based on this, the next degradation trend can be predicted and corresponding measures formulated to mitigate degradation.
2. Principle Introduction
2.1. CAE Model
Auto-encoders are neural networks designed to replicate their input at the output. They work by compressing the input into a latent-space representation and then reconstructing the output from this representation; the closer the output converges to the input, the better the features extracted by the network represent the internal features of the input data. The network consists of two parts, an Encoder and a Decoder.
Figure 1 shows the implementation process of this network.
A convolutional auto-encoder replaces the fully connected layers of the auto-encoder with convolutional layers. The encoder consists of convolutional and pooling layers, and the decoder consists of a deconvolution (transposed convolution) layer.
In the encoder, $k$ convolutional kernels $W^{k}$ are initialized, and each kernel is paired with a bias $b^{k}$; convolving the input $x$ with each kernel generates the features $h^{k}$. The activation function is the sigmoid $\sigma$, and the equation is as follows:

$$h^{k} = \sigma\left( x \ast W^{k} + b^{k} \right)$$
Pooling operation (Max Pooling): when pooling the features generated above, the matrix of positional relationships at the time of pooling should be retained to facilitate the later unpooling operation.
In the decoder, the unpooling operation is performed on the features generated above, and the matrix that preserves the positional relationships at the time of pooling is used to restore the data to the corresponding positions of a matrix of the original size.
Each feature $h^{k}$ is convolved with the transpose $\widetilde{W}^{k}$ of its corresponding convolution kernel, the results are summed, the bias $c$ is added, and the activation function remains the sigmoid. The equation is as follows:

$$\hat{x} = \sigma\left( \sum_{k} h^{k} \ast \widetilde{W}^{k} + c \right)$$

To make the network perform better, the weights are updated by minimizing the mean squared error (MSE), i.e., the mean of the squared differences between the target values and the reconstructed values; the factor $\frac{1}{2}$ is used to simplify the derivation. The formula is as follows:

$$E = \frac{1}{2n} \sum_{i=1}^{n} \left( x_{i} - \hat{x}_{i} \right)^{2}$$
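For illustration, the encoder-decoder structure and reconstruction loss described above can be sketched in PyTorch as follows. This is a minimal sketch with assumed settings (one-dimensional input, a single convolutional layer with 16 kernels of size 3, and a pooling factor of 2), not the exact network used in this paper:

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Minimal 1-D convolutional auto-encoder (illustrative layer sizes)."""
    def __init__(self, in_channels=1, k=16):
        super().__init__()
        # Encoder: convolution + max pooling (indices kept for later unpooling)
        self.conv = nn.Conv1d(in_channels, k, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2, return_indices=True)
        # Decoder: unpooling + transposed convolution back to the input size
        self.unpool = nn.MaxUnpool1d(2)
        self.deconv = nn.ConvTranspose1d(k, in_channels, kernel_size=3, padding=1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        h = self.act(self.conv(x))            # h = sigmoid(x * W + b)
        h_pooled, idx = self.pool(h)          # keep the positional indices
        h_up = self.unpool(h_pooled, idx)     # restore values to their positions
        x_hat = self.act(self.deconv(h_up))   # x_hat = sigmoid(h * W_tilde + c)
        return x_hat

# Reconstruction is trained with the mean squared error loss
model = ConvAutoEncoder()
x = torch.rand(8, 1, 128)   # inputs assumed normalized to [0, 1] to match the sigmoid output
loss = nn.MSELoss()(model(x), x)
loss.backward()
```

Returning the pooling indices and reusing them in `MaxUnpool1d` is what preserves the positional relationships mentioned above.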
2.2. ECANet
To cope with the problem of weight assignment among different channels, scholars have proposed ECANet. ECA can assign different weights to the multiple channels of the input without changing the input feature size, and the structure of ECA is shown in Figure 2.
The flow of the ECA model is as follows.
- (1) Input features with dimensions H × W × C;
- (2) The input features are compressed using global average pooling to obtain 1 × 1 × C features;
- (3) The resulting features undergo channel learning through a one-dimensional convolution to learn the importance of the different channels and assign weights;
- (4) Finally, the 1 × 1 × C weight map is multiplied with the input features H × W × C to obtain output features with channel attention.
During the convolution process, dynamic convolution kernels are used to effectively extract features under different receptive fields and to learn the weights between different channels. A dynamic convolution kernel means that an adaptive function decides the kernel size: the kernel size is determined by the number of channels, so the larger the number of channels, the larger the kernel and the stronger the cross-channel interaction; conversely, in layers with fewer channels, smaller kernels are used and less cross-channel interaction is performed. The adaptive function is as follows:

$$k = \psi(C) = \left| \frac{\log_{2} C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$$

where $k$ represents the size of the convolution kernel; $C$ represents the number of channels; $\left| \cdot \right|_{odd}$ indicates that $k$ can only take odd values; and $\gamma$ and $b$ are set to 2 and 1, respectively, and are used to adjust the ratio between the number of channels $C$ and the convolution kernel size.
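As a concrete illustration of the ECA flow and the adaptive kernel size, a minimal PyTorch sketch is given below. The defaults gamma = 2 and b = 1 follow the description above, while the module name, input sizes, and interface are only illustrative:

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient Channel Attention sketch with an adaptively sized 1-D kernel."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size: k = |log2(C)/gamma + b/gamma|, forced to be odd
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # H x W x C -> 1 x 1 x C
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (N, C, H, W)
        y = self.avg_pool(x)                       # (N, C, 1, 1)
        # 1-D convolution across the channel dimension learns channel weights
        y = self.conv(y.squeeze(-1).transpose(-1, -2))       # (N, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # (N, C, 1, 1)
        return x * y                               # reweight the input channels

# Example: for C = 64 channels, log2(64)/2 + 1/2 = 3.5, so k = 3 here
eca = ECALayer(channels=64)
out = eca(torch.randn(1, 64, 32, 32))
```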
2.3. Autoformer Model
With the good performance of Transformer models in natural language processing (NLP), researchers have turned their attention to time series processing. However, time series processing suffers from the following problems: first, as the prediction horizon lengthens, it is difficult to find reliable temporal dependencies in complex temporal patterns by directly using the self-attention mechanism; second, because of the quadratic complexity of self-attention, the model has to use sparse versions of it, which limits the efficiency of information utilization and affects the prediction performance.
The Autoformer model proposes a progressive series decomposition and an autocorrelation mechanism to solve the above problems. The model is shown in Figure 3, in which the encoder and decoder each include an autocorrelation module, a series decomposition module, and a feedforward neural network; the feedforward network acts as a deeper feature extractor.
The series decomposition module (series decomposition) is based on the idea of a moving average, smoothing out the periodic term and highlighting the trend term:

$$\mathcal{X}_{t} = \mathrm{AvgPool}\left( \mathrm{Padding}\left( \mathcal{X} \right) \right), \qquad \mathcal{X}_{s} = \mathcal{X} - \mathcal{X}_{t}$$

where $\mathcal{X}$ is the hidden variable to be decomposed, and $\mathcal{X}_{t}$ and $\mathcal{X}_{s}$ are the trend term and the period (seasonal) term, respectively.
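A minimal sketch of this series decomposition block is shown below; it assumes a moving-average window of 25 and replication padding at both ends to keep the sequence length unchanged, both of which are illustrative choices rather than the settings used in this paper:

```python
import torch
import torch.nn as nn

class SeriesDecomp(nn.Module):
    """Series decomposition via moving average, in the style of Autoformer."""
    def __init__(self, kernel_size=25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1)

    def forward(self, x):                        # x: (batch, length, channels)
        # Replicate both ends so the moving average keeps the original length
        pad = (self.kernel_size - 1) // 2
        front = x[:, :1, :].repeat(1, pad, 1)
        end = x[:, -1:, :].repeat(1, pad, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)  # X_t
        seasonal = x - trend                                        # X_s = X - X_t
        return seasonal, trend

# Example: decompose a batch of degradation-feature sequences of length 100
seasonal, trend = SeriesDecomp(kernel_size=25)(torch.randn(4, 100, 1))
```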
In the Encoder part, the trend term is progressively eliminated to obtain the periodic terms $\mathcal{S}_{en}^{l,1}$ and $\mathcal{S}_{en}^{l,2}$, and based on this periodicity, the autocorrelation mechanism is used to aggregate similar sub-processes from different periods:

$$\mathcal{S}_{en}^{l,1},\ \_ = \mathrm{SeriesDecomp}\left( \mathrm{AutoCorrelation}\left( \mathcal{X}_{en}^{l-1} \right) + \mathcal{X}_{en}^{l-1} \right)$$
$$\mathcal{S}_{en}^{l,2},\ \_ = \mathrm{SeriesDecomp}\left( \mathrm{FeedForward}\left( \mathcal{S}_{en}^{l,1} \right) + \mathcal{S}_{en}^{l,1} \right)$$
In the Decoder section, the trend term and the period term are modeled separately. For the period term, the autocorrelation mechanism uses the periodicity of the sequence to aggregate subsequences with similar processes from different cycles; for the trend term, the trend information is gradually extracted from the predicted hidden variables using a cumulative approach.
Based on the above progressive decomposition architecture, the model can gradually decompose the hidden variables during forecasting and, through the autocorrelation mechanism and accumulation, obtain the forecasts of the periodic and trend components separately, so that decomposition and the optimization of the forecast results alternate and reinforce each other.
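To make the autocorrelation mechanism more concrete, the following simplified single-head sketch computes the period-based dependencies with an FFT and aggregates the values rolled by the top-k most correlated time delays. It omits the multi-head projections and efficiency optimizations of the full Autoformer implementation and is only an illustrative approximation:

```python
import torch

def auto_correlation(queries, keys, values, top_k=3):
    """Simplified single-head sketch of the autocorrelation mechanism.

    queries, keys, values: tensors of shape (batch, length, d_model).
    """
    L = queries.size(1)
    # Period-based dependencies: autocorrelation R(tau) computed via FFT
    q_fft = torch.fft.rfft(queries, dim=1)
    k_fft = torch.fft.rfft(keys, dim=1)
    corr = torch.fft.irfft(q_fft * torch.conj(k_fft), n=L, dim=1)  # (B, L, D)
    corr = corr.mean(dim=-1)                                       # (B, L)
    # Time-delay aggregation: softmax-weighted sum of the values rolled
    # by the top-k most correlated delays
    weights, delays = torch.topk(corr, top_k, dim=1)               # (B, top_k)
    weights = torch.softmax(weights, dim=1)
    out = torch.zeros_like(values)
    for b in range(queries.size(0)):
        for i in range(top_k):
            rolled = torch.roll(values[b], shifts=-int(delays[b, i]), dims=0)
            out[b] = out[b] + weights[b, i] * rolled
    return out

# Example: aggregate a batch of degradation-feature sequences
q = k = v = torch.randn(2, 96, 8)
agg = auto_correlation(q, k, v, top_k=4)
```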
5. Conclusions
To address the challenging problem of remaining useful life (RUL) prediction for rolling bearings, a novel solution has been proposed: time series forecasting with Autoformer is used to indirectly predict the RUL from the degradation trend. The advantages of this method are as follows:
A novel feature extraction method, ECA-CAE, has been introduced, which outperforms CAE and MLP-AE in terms of feature extraction capabilities.
Under the premise of enhancing feature interpretability through convolutional auto-encoders (CAE), this paper presents a degradation trend prediction method based on individual bearings. In contrast to traditional deep learning methods that require a large amount of complete degraded-bearing data for model training, this method utilizes only the first half of the current bearing's degradation features to predict future degradation trends. Moreover, this model is suitable for predicting bearing life under the same operating conditions; based on this, it predicts the next degradation trend so that corresponding measures can be formulated to mitigate degradation. While ECA-CAE and Autoformer provide effective methods for predicting rolling bearing life, their effectiveness depends on various factors such as data quality, model complexity, interpretability, and generalization ability. When applying these methods in practical industrial scenarios, it is essential to carefully consider these limitations and strike a balance between model complexity and feasibility.
There is still significant research potential in the field of rolling bearing life prediction. Future research directions should focus on improving model performance, generalization ability, practicality, and interpretability. Research efforts should also explore how to integrate these models into real-time monitoring systems to achieve real-time health monitoring and maintenance of bearings. Additionally, considerations should be given to how to implement online learning and transfer learning to adapt to changes and new data in the bearing operating process, meeting the needs of predictive maintenance in the industrial sector.