1. Introduction
Rolling bearings are key components of rotating machinery, and their operating condition can determine the performance and service life of the whole machine [1]. Information about the operating condition of a rolling bearing is contained in its vibration signal. Analyzing the vibration signal helps to assess the state of the bearing and to predict its remaining life, which allows maintenance to be scheduled appropriately.
In general, bearings in mechanical equipment pass from normal operation through performance degradation to failure. Monitoring the operating status of bearings and analyzing their vibration signals can effectively detect early, weak faults and reveal performance degradation trends, which can be used to develop targeted equipment maintenance strategies. Therefore, the performance degradation assessment of rolling bearings is the premise and key point of implementing condition-based maintenance [2].
The bearing performance degradation assessment process can be divided into three steps: feature extraction, model establishment, and assessment analysis with indicator construction. Researchers have done a great deal of work on each of these steps. Liu et al. [3] combined Bayesian inference and self-organizing mapping to assess the performance degradation of rolling bearings. They used independent component analysis for signal feature extraction, selected a self-organizing network as the performance degradation assessment model, constructed a quantitative bearing performance assessment index from the negative log-likelihood probability, and finally calculated the bearing failure probability with Bayesian inference to implement early failure identification and performance degradation assessment. A remaining problem is that traditional performance degradation indicators cannot accurately portray the degradation state of rolling bearings over the whole life cycle. Yang et al. [4] assessed the performance degradation of rolling bearings using feature fusion and the gray regression method. They first extracted time-domain, energy, and entropy features from the bearing vibration signal, then applied feature screening and feature fusion to the resulting feature vectors, and finally used the gray regression model to construct the performance degradation indicator. Two sets of rolling bearing full life cycle data verified the effectiveness of the method. Zhang et al. [5] extracted useful features from gear vibration signals using the autoregressive (AR) model and trained an auto-associative neural network (AANN) with the AR coefficients of normal signals as feature vectors. The AR coefficients of the signals under test were then fed into the trained AANN, which output reconstructed AR coefficient vectors, and a fault assessment indicator was constructed from the root mean square error of the AR model residual vectors before and after reconstruction to assess gear performance degradation. Lei et al. [6] proposed a performance degradation assessment model based on PCA-FCMAC (Fuzzy Cerebellar Model Articulation Controller), which first used PCA to reduce the dimensionality of the signal features and eliminate redundant information, and then fed the reduced features into FCMAC. The results verified the effectiveness and superiority of PCA-FCMAC.
Although the above methods can achieve good results in equipment performance degradation assessment, some problems remain: (1) The feature extraction process requires a series of feature engineering steps such as feature selection, dimensionality reduction, and feature fusion. Feature engineering takes considerable effort and time to design the preprocessing and data conversion, and expert knowledge is needed to judge the quality of the extracted features; (2) signals collected from sensors are inevitably mixed with a large amount of noise and transmission path interference, and extracting features directly from such signals degrades the performance degradation assessment; (3) most performance degradation assessment models are simple linear mathematical models or shallow networks with limited ability to learn and extract features, and the volume of data fed into the model is small, which cannot meet data analysis requirements in the context of big data; (4) the temporal information of the bearing signal is usually ignored once the signal is fed into the model. For performance degradation models based on signal reconstruction [7] in particular, temporal information is important: if it is ignored, the original signal cannot be reconstructed with maximum information, which ultimately affects the assessment results.
In recent years, with the rise of artificial intelligence, deep learning has been widely used in the field of equipment fault diagnosis [8]. The most representative deep learning models are the Convolutional Neural Network (CNN) [9,10], the Recurrent Neural Network (RNN) [11,12], and the Auto-Encoder (AE) [13,14]. The advantages of deep learning models are as follows: (1) Deep learning models can implement “end-to-end” fault diagnosis [15], avoiding the tedious feature engineering of traditional methods; (2) deep learning models are complex nonlinear deep networks whose training generally needs a large amount of data, which suits the big data context of mechanical equipment; (3) in a CNN, convolution with a convolution kernel can be understood as denoising the signal while extracting its features; (4) an RNN can retain the temporal information of the data it processes.
Considering the above advantages, a few scholars have started to use deep learning methods for equipment performance assessment. Xu et al. [16] proposed a median-filtered deep belief network for constructing performance degradation indicators. Although the method denoises the signal well and clearly portrays the bearing degradation process, the degradation indicator exceeds the alarm threshold only at a late stage of degradation, which is not conducive to early bearing fault detection. Chen et al. [17] fed the collected signals directly into a model consisting of CNN and RNN to construct indicators, which achieved good results. The drawback of this method is that indicator labels must be constructed first. Since the degradation trend of each part of each piece of equipment differs under actual working conditions, the method has some limitations. Zhang et al. [18] implemented an “end-to-end” bearing performance degradation assessment using CNN and LSTM: they first trained the model with normal healthy data, then fed test data into the trained model, and finally used the H-statistic to measure the degree of performance degradation. However, the experimental data showed that the bearing performance degradation curve contained too much noise to accurately reflect the degradation process.
In view of the above, an end-to-end rolling bearing performance degradation assessment model based on DRSN-LSTM is proposed in this paper. DRSN is used to extract local abstract features and denoise the signal, yielding a denoised feature vector. The temporal information of the signal is then extracted using a deep LSTM, and the original signal is reconstructed using two nonlinear layers. To capture the state information of the rolling bearing signal comprehensively, time-domain, frequency-domain, and time-frequency-domain features are extracted before and after signal reconstruction to form multi-domain feature vectors. Finally, the mean square error between the two feature vectors is used as the degradation indicator, and bearing fatigue test data confirm the effectiveness of the proposed method in detecting early bearing faults and in portraying the monotonicity and consistency of the performance degradation process.
2. The Framework of DRSN
The Deep Residual Shrinkage Network (DRSN) [19] is an improved variant of the Deep Residual Network (ResNet) and is similar to ResNet in structure. The DRSN in this paper consists of a convolutional layer (Conv), a batch normalization (BN) layer, an activation function, and a residual shrinkage building unit (RSBU).
2.1. Convolutional Layer
In the convolutional layer, the convolution kernel is convolved with the input data to obtain the feature map. Different features of the data can be learned by using different convolution kernels, so the layer extracts the required features by adjusting the parameters of the convolution kernel. The convolution operation of the convolutional layer is shown in Equation (1):
$$y_i^{l}(j) = \sum_{j'=0}^{k-1} w_i^{l}(j')\, x^{l}(j+j') + b_i^{l} \tag{1}$$
where $w_i^{l}(j')$ is the $j'$-th weight of the $i$-th convolution kernel in the $l$-th layer, $b_i^{l}$ is the bias of the $i$-th convolution kernel in the $l$-th layer, $x^{l}(j), \dots, x^{l}(j+k-1)$ is the $j$-th local receptive field in the $l$-th layer, and $k$ is the width of the convolution kernel.
The size of the feature map output by the convolution operation is given by Equation (2):
$$O = \frac{I - K + 2P}{S} + 1 \tag{2}$$
where $I$ is the input feature size, $O$ is the output feature size, $K$ is the convolution kernel size, $S$ is the stride of the convolution kernel, and $P$ is the padding size.
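As a minimal sketch of the convolution and the output-size relation described above, the following NumPy snippet slides a kernel over a padded one-dimensional signal and checks that the resulting length matches (I - K + 2P)/S + 1. The function names, the integer floor applied when the division is not exact, and the example signal and kernel are illustrative assumptions, not part of the original model.

```python
import numpy as np

def conv1d_output_size(i, k, s=1, p=0):
    """Output length O = (I - K + 2P) / S + 1, floored (assumption) when not exact."""
    return (i - k + 2 * p) // s + 1

def conv1d(x, w, b=0.0, stride=1, padding=0):
    """Slide kernel w over the zero-padded signal x and add the bias b."""
    x = np.pad(np.asarray(x, dtype=float), padding)
    w = np.asarray(w, dtype=float)
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([
        np.dot(w, x[j * stride : j * stride + len(w)]) + b
        for j in range(n_out)
    ])

# Hypothetical example: a ramp signal and a simple difference kernel.
signal = np.arange(10.0)
kernel = np.array([1.0, 0.0, -1.0])
out = conv1d(signal, kernel, stride=2, padding=1)
expected_len = conv1d_output_size(len(signal), len(kernel), s=2, p=1)
```

Here the kernel responds to the local slope of the signal, which illustrates how changing the kernel weights changes the feature that is extracted.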
2.2. Activation Function
Most real-world problems are nonlinear, and modeling them with linear functions inevitably introduces errors. To enhance nonlinear modeling capability, nonlinear activation functions are introduced into the CNN, enabling it to learn complex, nonlinear features from the data. The activation function layer is generally placed between the convolutional layer and the pooling layer. Commonly used activation functions are the sigmoid function, the tanh function, and the rectified linear unit (ReLU). ReLU is used as the activation function in DRSN.
The mathematical formula of ReLU is:
$$f(x) = \max(0, x) \tag{3}$$
2.3. Batch Normalization Layer
BN changes the distribution of the feature values in a layer, pulling it back toward the standard normal distribution, so that the feature values fall in an interval where the activation function is more sensitive to its input. A small change in the input then corresponds to a larger change in the loss function and hence a larger gradient, which mitigates the vanishing gradient problem and speeds up training convergence. The formula of BN can be expressed as:
$$\mu = \frac{1}{N_{batch}} \sum_{n=1}^{N_{batch}} x_n \tag{4}$$
$$\sigma^2 = \frac{1}{N_{batch}} \sum_{n=1}^{N_{batch}} (x_n - \mu)^2 \tag{5}$$
$$\hat{x}_n = \frac{x_n - \mu}{\sqrt{\sigma^2 + \varepsilon}} \tag{6}$$
$$y_n = \gamma \hat{x}_n + \beta \tag{7}$$
where $N_{batch}$ represents the size of the batch, $x_n$ represents the $n$-th input in the batch, $y_n$ represents the $n$-th output in the batch, $\varepsilon$ is a constant close to zero, $\gamma$ is a parameter that scales the distribution, and $\beta$ is a parameter that shifts the location of the distribution. Both $\gamma$ and $\beta$ are obtained by training.
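The BN transformation described above can be sketched in a few lines of NumPy. This is a training-mode illustration only: the running statistics used at inference time are omitted, and the function name, the default value of the small constant, and the example batch are assumptions made for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis (axis 0), then scale and shift."""
    mu = x.mean(axis=0)                      # batch mean per feature
    var = x.var(axis=0)                      # batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

# Hypothetical batch: 64 samples, 4 features, far from a standard normal.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
```

With the scale set to one and the shift set to zero, each output feature has approximately zero mean and unit standard deviation over the batch.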
2.4. Residual Shrinkage Building Unit
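In signal processing, soft thresholding denoises a signal by setting its near-zero feature values to zero and shrinking the remaining values toward zero. The soft-threshold function can be expressed as:
$$y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases} \tag{8}$$
where $x$ is the input, $y$ is the output, and $\tau$ represents the threshold value.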
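The soft-threshold function can be written compactly as a sign-preserving shrinkage. The snippet below is a minimal NumPy sketch; the function name and the example values are illustrative.

```python
import numpy as np

def soft_threshold(x, tau):
    """Set values with |x| <= tau to zero and shrink the rest toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
y = soft_threshold(x, tau=1.0)
```

Values inside the interval from -1 to 1 are zeroed, while -2 and 1.5 are moved toward zero by the threshold.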
DRSN performs feature extraction and denoising simultaneously by introducing the soft-thresholding function to construct the RSBU, which is the most important part of DRSN. The detailed structure of the RSBU is shown in Figure 1, where C is the number of feature map channels, W is the feature map width, and the height of the feature map is one. An RSBU consists of two BN layers, two ReLU layers, two convolutional layers, a thresholding module, a soft-thresholding module, and an identity shortcut. In the thresholding module, the feature map x passes through an absolute value layer, a global average pooling (GAP) layer, BN, ReLU, and two fully connected (FC) layers, after which the feature size is C. The output of the FC layers is fed into a sigmoid layer, whose output $\alpha_c$ is used as the scale parameter:
$$\alpha_c = \frac{1}{1 + e^{-z_c}} \tag{9}$$
where $z_c$ is the feature of the node in channel $c$. $\alpha_c$ is multiplied by the output of the absolute value and GAP layers to obtain the soft threshold value, which is calculated as:
$$\tau_c = \alpha_c \cdot \underset{i,j}{\operatorname{average}} \left| x_{i,j,c} \right| \tag{10}$$
where $i$, $j$, and $c$ are the width, height, and channel indices of the input feature map x, respectively, and $\tau_c$ is the threshold value of channel $c$ of the input feature map x.
From the above, it can be seen that the threshold module can automatically set the threshold value during the training process of the network.
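A simplified sketch of the threshold module may help to show why the learned threshold stays in a sensible range: the sigmoid output lies between 0 and 1, so each channel threshold is a positive fraction of that channel's mean absolute value. The code below assumes a single feature map of shape (C, W), omits the BN layer between the two FC layers, and uses random weights as stand-ins for trained parameters; all names are hypothetical.

```python
import numpy as np

def channel_thresholds(x, w1, b1, w2, b2):
    """x: feature map of shape (C, W). Returns one threshold per channel, shape (C,)."""
    gap = np.abs(x).mean(axis=1)              # absolute value + global average pooling
    hidden = np.maximum(w1 @ gap + b1, 0.0)   # first FC layer + ReLU (BN omitted here)
    z = w2 @ hidden + b2                      # second FC layer
    alpha = 1.0 / (1.0 + np.exp(-z))          # sigmoid scale parameter in (0, 1)
    return alpha * gap                        # threshold = scale * mean |x| per channel

def shrink(x, tau):
    """Channel-wise soft thresholding of a (C, W) feature map."""
    tau = tau[:, None]
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Random stand-ins for a feature map and trained FC weights (assumptions).
rng = np.random.default_rng(0)
C, W = 4, 16
x = rng.normal(size=(C, W))
w1, b1 = rng.normal(size=(C, C)), np.zeros(C)
w2, b2 = rng.normal(size=(C, C)), np.zeros(C)
tau = channel_thresholds(x, w1, b1, w2, b2)
y = shrink(x, tau)
```

In the full RSBU, the shrunk feature map would then be added to the identity shortcut; that step is left out of this sketch.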
3. The Framework of LSTM
Traditional Deep Neural Networks (DNN) handle nonlinear data well, but they are poorly suited to time-series data. The Recurrent Neural Network (RNN) can process time-series data through chained connections between its own neurons.
LSTM [20] introduces memory cells into the hidden layer of the RNN, which allows the model to preserve and control short-term state memory and effectively alleviates the vanishing and exploding gradient problems of the RNN.
During the training and testing of LSTM, the value of each hidden layer is determined by the input at the current moment and the hidden layer value at the previous moment. The memory cell of LSTM contains three gates: the input gate, the forget gate, and the output gate. These gates control the forgetting and updating of the memory state, so that key information is retained and minor information is forgotten, ensuring the flow and storage of memory state information among hidden layers. The process is repeated until all time steps of the data have been input. The memory cell structure of LSTM is shown in Figure 2. The steps for updating the state of a single LSTM hidden layer are:
Candidate (immediate) memory state at the current moment $\tilde{c}_t$:
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{11}$$
Forget gate $f_t$:
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{12}$$
Input gate $i_t$:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{13}$$
Memory state at the current moment $c_t$:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{14}$$
where $\odot$ is the point-by-point product.
Output gate $o_t$ and output of the current hidden layer of the LSTM $h_t$:
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \tag{15}$$
$$h_t = o_t \odot \tanh(c_t) \tag{16}$$
In Equations (11)–(16), $W_{xc}$, $W_{xi}$, $W_{xf}$, and $W_{xo}$ are the connection weight matrices between the input and hidden layers at time $t$; $W_{hc}$, $W_{hi}$, $W_{hf}$, and $W_{ho}$ are the connection weight matrices between the hidden layers at times $t-1$ and $t$; $b_c$, $b_i$, $b_f$, and $b_o$ are the biases of the input nodes, input gates, forget gates, and output gates, respectively; and $\sigma$ represents the sigmoid function.
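The update steps of a single LSTM hidden layer can be sketched as one function that is applied once per time step. The snippet assumes the standard LSTM formulation (candidate state through tanh, three sigmoid gates); the parameter layout, the dictionary keys, and the random initialization are illustrative choices rather than the configuration used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update: returns the new hidden state h_t and memory state c_t."""
    W_x, W_h, b = params["W_x"], params["W_h"], params["b"]
    c_tilde = np.tanh(W_x["c"] @ x_t + W_h["c"] @ h_prev + b["c"])  # candidate state
    f_t = sigmoid(W_x["f"] @ x_t + W_h["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W_x["i"] @ x_t + W_h["i"] @ h_prev + b["i"])      # input gate
    c_t = f_t * c_prev + i_t * c_tilde                              # memory update
    o_t = sigmoid(W_x["o"] @ x_t + W_h["o"] @ h_prev + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                        # hidden output
    return h_t, c_t

def init_params(n_in, n_hidden, rng):
    """Small random weights for the four blocks: candidate, forget, input, output."""
    gates = "cfio"
    return {
        "W_x": {g: rng.normal(scale=0.1, size=(n_hidden, n_in)) for g in gates},
        "W_h": {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for g in gates},
        "b": {g: np.zeros(n_hidden) for g in gates},
    }

# Hypothetical sequence: 10 time steps with 3 input features each.
rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=5, rng=rng)
h, c = np.zeros(5), np.zeros(5)
for x_t in rng.normal(size=(10, 3)):
    h, c = lstm_step(x_t, h, c, params)
```

Because the hidden output is the product of a gate value in (0, 1) and a tanh value in (-1, 1), every element of the hidden state stays strictly inside (-1, 1).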
5. Analysis of Axle Box Bearing Test Data of High-Speed Train
The defect data were collected from a pure rolling test platform for a high-speed train. The running gear of the train consists of two bogie systems, each with two groups of wheels and axle box bearings, and the train was driven by a drive motor. In Figure 4, the left half shows the test bench and the right half shows the sensor installation schematic. In this test, a bearing removed from an in-service train was used to simulate the failure; its structural parameters are listed in Table 2. The data were collected at a train speed of 100 km/h with a sampling frequency of 25,600 Hz.
Figure 5 shows the time-domain waveforms of rolling bearings in the failure-free state and with different degrees of outer ring fault. It can be observed that the vibration amplitude increases as the fault becomes more severe, but the difference between the failure-free and moderate-failure signals is small and difficult to distinguish.
Fifty samples of the failure-free bearing signal in the healthy state were selected as the training dataset, each with a length of 10,240 points. A further 50 samples were collected from the failure-free signal and from the signals of three different fault degrees as the test dataset, each also with a length of 10,240 points. The training samples were fed into the DRSN-LSTM reconstruction model, and the network loss was defined as the mean square error between the input signal and the reconstructed signal.
As shown in Figure 6, the loss value of the network decreased as the number of epochs increased and was close to zero after about 200 iterations, indicating that the model had converged. Multi-domain feature extraction was applied to the input signal and the reconstructed signal to obtain their respective feature vectors, and the mean squared error between the two vectors was calculated, as shown in Figure 7. From Figure 7, it can be seen that there was a large difference between the failure-free samples and the samples with different degrees of failure, and that the mean squared error indicator increased as the failure became more severe.
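The indicator described above can be sketched as follows: extract a feature vector from the input signal and from the reconstructed signal, then take the mean squared error between the two vectors. The four features used here (RMS, kurtosis, peak-to-peak value, spectral centroid) are hypothetical stand-ins for the multi-domain feature set, time-frequency features are omitted, and the "reconstruction" is simulated by assuming that a model trained on healthy data reproduces only the healthy component of its input.

```python
import numpy as np

def feature_vector(x, fs):
    """Hypothetical stand-in features: three time-domain and one frequency-domain."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - x.mean()
    kurtosis = np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2)
    peak_to_peak = x.max() - x.min()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    return np.array([rms, kurtosis, peak_to_peak, centroid])

def degradation_indicator(original, reconstructed, fs):
    """Mean squared error between the feature vectors of the two signals."""
    diff = feature_vector(original, fs) - feature_vector(reconstructed, fs)
    return float(np.mean(diff ** 2))

# Synthetic signals with the sampling rate and sample length quoted in the text.
fs = 25600
t = np.arange(10240) / fs
healthy = np.sin(2 * np.pi * 50 * t)
rng = np.random.default_rng(0)
faulty = healthy + 0.5 * rng.normal(size=t.size)

# Assumption: the reconstruction of either input resembles the healthy signal.
indicator_healthy = degradation_indicator(healthy, healthy, fs)
indicator_faulty = degradation_indicator(faulty, healthy, fs)
```

In practice the features would probably need to be normalized to comparable scales before the error is computed, since a feature expressed in hertz would otherwise dominate the indicator.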
7. Comparison of Methods and Validation
To verify the superiority of the proposed method, the performance degradation of the same bearing data is assessed using several comparison methods under the same model hyperparameters, and the result curves are shown in Figure 13. To verify that the DRSN-LSTM combination can effectively preserve the temporal characteristics of the signal while extracting features, signal reconstruction models are built with CNN alone and with LSTM alone, and the performance degradation indicator is constructed with multi-domain feature extraction. The curves are shown in Figure 13a,b, respectively. From Figure 13a, it can be seen that the performance degradation indicator fluctuates between 0 and 0.1 from the first sample to the 2330th sample and only exceeds the alert line at the 2351st sample. In Figure 13b, the assessment curve rises slowly from the first sample to the 1960th sample, and the performance degradation indicator only exceeds the alert line at the 1961st sample, indicating that performance degradation assessment using CNN or LSTM alone offers no advantage.
To study the influence of the DRSN model on the bearing performance degradation assessment results, a CNN-LSTM model was used to build the signal reconstruction model. The mean square error between the reconstructed signal and the input signal after multi-domain feature extraction was used as the performance degradation indicator. The result curve is shown in Figure 13c. The mean square error increases slowly from the first sample to the 2400th sample and only exceeds the alert line at the 2413th sample. Thus, CNN-LSTM cannot effectively detect early bearing faults, which indirectly illustrates the effectiveness and superiority of DRSN for signal reconstruction.
The DRSN-LSTM model was also used to reconstruct the signal without feature extraction: in the model testing stage, the mean square error between the reconstructed signal and the input signal was used directly as the performance degradation indicator. The curve shown in Figure 13d is nearly horizontal from the first sample to the 2340th sample, and the mean square error does not exceed the alert line until the 2346th sample. This indicator is therefore less effective than the one computed from the reconstructed and input signals after multi-domain feature extraction, which shows the importance of multi-domain feature extraction in this paper.
In this paper, the early failure point detected for bearing 1 is 540 samples ahead of the early failure point at the 955th sample detected using the multivariate state estimation model in the literature [21], and 232 samples ahead of the early failure point at the 647th sample detected with the AE-KSVD network model in Chapter 5 of the literature [7]. This indicates that the deep learning model has a clear advantage over the mathematical model and the shallow network model.
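The comparison above relies on locating the first sample at which a degradation indicator crosses the alert line. A minimal sketch of that step is given below. The way the alert line is set here (mean plus three standard deviations of an initial healthy segment) is an assumption made for illustration, as are the function name and the synthetic indicator curve; none of them reproduce the experimental data.

```python
import numpy as np

def first_alarm_index(indicator, n_baseline, k=3.0):
    """Return the first index above the alert line, and the alert line itself.

    The alert line is assumed to be mean + k * std of the first n_baseline samples.
    """
    indicator = np.asarray(indicator, dtype=float)
    baseline = indicator[:n_baseline]
    threshold = baseline.mean() + k * baseline.std()
    above = np.flatnonzero(indicator > threshold)
    first = int(above[0]) if above.size else None
    return first, threshold

# Synthetic indicator: small bounded ripple, then a ramp that starts at sample 600.
n = np.arange(1000)
indicator = 0.05 + 0.005 * np.sin(n)
indicator[600:] += np.linspace(0.0, 0.5, 400)
idx, alert_line = first_alarm_index(indicator, n_baseline=200)
```

An earlier crossing index means earlier fault detection, which is the basis on which the methods are ranked in this section.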
To further test the feasibility of the DRSN-LSTM reconstruction model, the spectral peak factor of the envelope spectrum was used as the optimization indicator for adaptive resonance demodulation of the 414th and 415th sample signals. Envelope spectrum analysis was carried out after resonance demodulation, and the results are shown in Figure 14 and Figure 15. It can be seen that the noise component of the signal was significantly weakened after resonance demodulation. The cyclic pulses of the signal can be observed in Figure 14c, and a frequency of 124.6 Hz and its corresponding double frequency of 242.3 Hz, which are close to the characteristic frequency of the outer ring, appear in the spectrum in Figure 14d. By contrast, Figure 15 contains no component close to the outer ring characteristic frequency or its double frequency. The optimal filtering band of the 415th sample signal also differs greatly from that of the 414th sample signal. It can therefore be confirmed that an early outer ring failure occurred in the bearing at this time, which confirms the effectiveness of the proposed method.
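The envelope spectrum step can be illustrated with a short NumPy sketch. It computes the analytic signal with an FFT-based Hilbert transform, takes its magnitude as the envelope, and then takes the spectrum of the envelope. The adaptive selection of the resonance band is not reproduced here, and the test signal (a 3000 Hz carrier amplitude-modulated at 125 Hz) is a hypothetical stand-in for a demodulated bearing signal.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based Hilbert transform: keep DC, double positive and drop negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def envelope_spectrum(x, fs):
    """Return the frequencies and amplitude spectrum of the signal envelope."""
    envelope = np.abs(analytic_signal(np.asarray(x, dtype=float)))
    envelope = envelope - envelope.mean()   # remove DC so the modulation peak dominates
    amplitude = np.abs(np.fft.rfft(envelope)) * 2.0 / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, amplitude

# Hypothetical test signal: one second sampled at 25,600 Hz.
fs = 25600
t = np.arange(fs) / fs
fault_freq = 125.0                          # assumed modulation (fault) frequency
carrier = np.sin(2 * np.pi * 3000.0 * t)    # assumed resonance frequency
x = (1.0 + 0.8 * np.cos(2 * np.pi * fault_freq * t)) * carrier
freqs, amp = envelope_spectrum(x, fs)
peak = freqs[np.argmax(amp)]
```

The dominant peak of the envelope spectrum falls at the modulation frequency rather than at the carrier, which is why a fault characteristic frequency becomes visible only after demodulation.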