1. Introduction
With the rapid development of intelligent manufacturing and industrial Internet of Things technology, condition monitoring systems for mechanical equipment now collect huge amounts of data. These systems usually contain various sensors that provide a wealth of monitoring information, offering new opportunities for predicting the remaining useful life (RUL) of machinery. At the same time, however, the data collected by these sensors are explosive in volume and nonlinear compared with traditional industrial monitoring data, which makes them difficult to exploit effectively [1]. Moreover, because of the complexity of modern mechanical systems, it is exceedingly difficult to build an accurate mathematical or physical prognostic model from the fundamental principles of failure processes [2].
Data-driven RUL prediction generally comprises the following six processes: data acquisition, feature selection, data processing, feature extraction, degradation behavior learning, and RUL prediction [3]. First, sensors installed on the machinery collect different monitoring data, such as vibration, pressure, temperature, and sound. Second, before the data are analyzed, their types and scales need to be clarified; after a preliminary understanding of the data has been gained, they are pre-processed (e.g., standardized) to support subsequent analysis and modeling. Third, representative features that map the machine's health state are selected using signal processing techniques. However, these initially selected features may not respond to the deterioration of the system or provide information that is valuable for RUL prediction. Therefore, feature extraction methods must also be applied to these features to improve their sensitivity to the degradation state of the device. Muhammad Mohsin Khan et al. [4] used vibration signals obtained from slurry pumps to generate degradation trends. Fernando Sánchez Lasheras et al. [5] combined the multivariate adaptive regression splines technique with principal component analysis (PCA), dendrograms, and classification and regression trees to extract features from sensor signals and train a hybrid model. Aleksandar Brkovic et al. [6] extracted the standard deviation and the logarithmic energy entropy from sub-bands of interest as representative features and reduced the feature space to two dimensions using scatter matrices. The selected and extracted features are then fed into machine learning models, such as Gaussian process regression [7], support vector machines (SVM) [8], and artificial neural networks [9], to learn the degradation behavior of the machine, and the RUL is computed with these learned models.
Deep learning [10] is gaining popularity for the RUL prediction of machinery because of its powerful data representation capability. Deep learning is a subset of machine learning that uses multilayer artificial neural networks to achieve state-of-the-art accuracy in classification and regression problems. Deep learning techniques, such as the deep belief network (DBN) [11], recurrent neural network (RNN) [12], convolutional neural network (CNN) [13], and long short-term memory (LSTM) network [14], can automatically learn multiple levels of representation from raw data without the need for domain-specific expertise. Owing to this representation learning capability, deep learning has achieved considerable success in a variety of disciplines, including face recognition [15], speech recognition [16], and natural language processing [17]. Several deep learning-based studies have addressed RUL prediction for mechanical systems. Chen et al. [18] used a neural network with gated recurrent units to forecast the RUL of machinery undergoing a nonlinear deterioration process. Zhang et al. [19] studied a new LSTM-based network architecture that discovers underlying patterns in time series to track system degradation and predict RUL. Wang et al. [20] proposed a deep separable convolutional network for mechanical RUL prediction. Kang et al. [21] developed an algorithm based on a machine learning approach for automatically predicting mechanical failures in continuous production lines. Ji et al. [22] adopted a PCA–BLSTM model for the RUL prediction of an aircraft engine.
Although deep learning has yielded fruitful results in mechanical RUL prediction, existing prediction methods still have the following limitations. (1) The correlations between the data from different sensors and the mechanical degradation information are not explicitly considered in the feature selection stage of deep learning algorithms. Although data from multiple sensors can be collected during machine operation, more data do not necessarily contain more useful information. Some sensor data may be uncorrelated with the degradation of the machine; such data not only fail to provide useful information for RUL prediction but also introduce redundancy that reduces the prediction accuracy of the model. Therefore, the correlation between each sensor and the RUL of the machinery should be analyzed first, and the sensor data strongly correlated with the RUL should be identified. (2) The high dimensionality and nonlinearity of the data generated by complex systems with multiple components and multiple states reduce the effectiveness of deep learning algorithms for RUL prediction. In feature extraction and data processing, high-dimensional feature vectors often lead to the curse of dimensionality [23]: as the dimensionality of a dataset increases, the number of samples required for learning grows exponentially and the data become increasingly sparse, so a dataset of a given size is much harder to explore in a high-dimensional space than in a low-dimensional one. Therefore, reducing the dimensionality of the data and extracting their main feature components are urgent problems.
To address these limitations, this paper investigates RUL prediction based on a deep long short-term memory (DLSTM) network, using aero-engine RUL prediction as an example. The rest of this paper is organized as follows. Section 2 summarizes the related work and briefly introduces the proposed MKDN framework. Section 3 details the theory of MKDN. Section 4 presents the experimental results of the proposed method on the public Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) datasets. Section 5 provides a detailed comparison and analysis of the proposed method. Finally, conclusions and future work are presented in Section 6.
2. The Related Work and Proposed Framework
To overcome the above shortcomings, this paper proposes a general three-step solution, the MIC-KPCA-DLSTM-based neural network (MKDN). In the first step, to account for the correlation between features and RUL information, the maximum information coefficient (MIC) method is applied to select the key features. In the second step, kernel principal component analysis (KPCA) is applied for nonlinear feature extraction and dimensionality reduction; reducing the dimensionality helps mitigate the overfitting caused by an excessive number of model parameters. In the third step, the dimension-reduced feature data are input into the proposed DLSTM-based deep learning model to predict the RUL. To validate the proposed MKDN model, a case study on RUL prediction for turbofan engines was conducted. The experimental results show that the MKDN model achieves high RUL prediction accuracy and outperforms several state-of-the-art RUL prediction methods and typical deep learning models. The main contributions of this paper are summarized as follows.
(1) A robust RUL prediction framework, MKDN, is proposed that combines effective data processing techniques (MIC and KPCA) with a dynamic deep learning model for time series analysis (the DLSTM neural network). Within MKDN, a two-layer DLSTM-based prognostic model (TDPM) is designed to model the performance degradation of machinery and predict its RUL. The proposed MKDN framework produces better results than state-of-the-art methods.
(2) To select RUL-correlated features of machinery from nonlinearly correlated multi-sensor data, a two-step feature selection approach based on the maximum information coefficient (TFMIC) is proposed. The first step constructs a threshold function based on the MIC method. In the second step, the raw time series variables collected by the mechanical sensors are filtered with this threshold function, and the variables with a high impact on mechanical RUL are combined into a new feature set. Benefiting from the strong ability of MIC to detect nonlinear correlations, TFMIC not only selects the sensor data that have a strong intrinsic relationship with RUL but also dramatically reduces the number of features. Consequently, the updated feature set excludes features of little relevance to RUL and substantially increases the accuracy of RUL prediction.
(3) A coupled method combining KPCA with SMA (SKPCA) is proposed for the noise reduction and feature extraction of multi-sensor data. In this method, a simple moving average (SMA) with a sliding window is first used to reduce noise in, and smooth, multidimensional sensor data with large random fluctuations and noise perturbations. KPCA is then applied as the second step for nonlinear feature extraction and dimensionality reduction. Through noise reduction and feature extraction, factors containing largely overlapping information can be eliminated, which increases the stability of the training data, reduces its dimensionality to prevent overfitting during training, and improves prediction accuracy.
As shown in Figure 1, the proposed MKDN-based RUL prediction framework for mechanical systems mainly includes data acquisition, feature selection, data normalization, noise reduction and feature extraction, model training, and RUL prediction.
During the operation of machinery, various kinds of signals, such as pressure, temperature, vibration, speed, flow, and static electricity, are collected by different types of sensors. First, the TFMIC method is used to analyze the correlation between each collected sensor signal and the RUL of the machine, and the sensor signals strongly correlated with the RUL are selected as input features. Second, the selected features are processed with the SKPCA method to reduce noise, extract valuable features, and reduce the data dimensionality. Third, these extracted, dimension-reduced features are fed into the TDPM, which contains two LSTM layers to capture temporal features. The temporal output features are then passed through two fully connected layers and a regression layer, which serve as the output layer and fuse the information from the individual sensor signals into RUL values. In the final testing phase, online sensor data are transmitted sequentially into the trained MKDN to obtain the estimated RUL. To avoid overfitting when training MKDN, the dropout regularization method [24] is used.
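For reference, the standard LSTM cell from which each of the two stacked layers in TDPM is built can be written as below. This is the common textbook formulation with our own symbols ($\mathbf{x}_k$ is the input at time step $k$, $\mathbf{h}_k$ the hidden state, $\mathbf{c}_k$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product); the notation, and any LSTM variant, used in Section 3 may differ.

$$
\begin{aligned}
\mathbf{f}_k &= \sigma(\mathbf{W}_f\mathbf{x}_k + \mathbf{U}_f\mathbf{h}_{k-1} + \mathbf{b}_f)\\
\mathbf{i}_k &= \sigma(\mathbf{W}_i\mathbf{x}_k + \mathbf{U}_i\mathbf{h}_{k-1} + \mathbf{b}_i)\\
\mathbf{o}_k &= \sigma(\mathbf{W}_o\mathbf{x}_k + \mathbf{U}_o\mathbf{h}_{k-1} + \mathbf{b}_o)\\
\tilde{\mathbf{c}}_k &= \tanh(\mathbf{W}_c\mathbf{x}_k + \mathbf{U}_c\mathbf{h}_{k-1} + \mathbf{b}_c)\\
\mathbf{c}_k &= \mathbf{f}_k \odot \mathbf{c}_{k-1} + \mathbf{i}_k \odot \tilde{\mathbf{c}}_k\\
\mathbf{h}_k &= \mathbf{o}_k \odot \tanh(\mathbf{c}_k)
\end{aligned}
$$

In a two-layer stack, the hidden-state sequence of the first LSTM layer serves as the input sequence of the second layer, whose output is then passed to the fully connected layers.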
4. Experimental Analysis
4.1. Dataset Description
The NASA turbofan engine degradation datasets generated by the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) platform [33] were used to evaluate the proposed method; they are among the most widely used benchmark datasets in RUL prediction studies. As shown in Figure 6, the C-MAPSS platform simulates a typical gas turbofan engine consisting of five modules: the fan, low-pressure compressor (LPC), high-pressure compressor (HPC), low-pressure turbine (LPT), and high-pressure turbine (HPT). Different operational parameters, such as fuel velocity and pressure, are varied to model various failure and degradation processes in turbofan engines. In each simulated run, the engine starts in good condition and gradually develops anomalies that lead to deterioration and eventual failure.
The data are divided into four subsets, FD001 through FD004, each with its own training and test sets, as listed in Table 1. The training sets contain sensor signals over each engine's entire lifetime, whereas in the test sets the sensor records end at some point before engine failure, and the RUL at that point must be predicted. Each row of the training and test data corresponds to one operating cycle and contains 26 columns: the engine ID, the cycle index, 3 operational settings, and 21 sensor measurements.
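The column layout above can be parsed with a few lines of Python. This is a minimal sketch that assumes the whitespace-separated text format in which the C-MAPSS files are distributed (the file name `train_FD003.txt` in the comment is an assumption), and `parse_row` and `run_to_failure_rul` are hypothetical helper names. For training trajectories, which run to failure, the RUL label at each cycle is the engine's last cycle minus the current cycle; the convention that the final cycle has RUL 0 is our assumption.

```python
from collections import defaultdict

# Column layout from Section 4.1: engine ID, cycle index,
# 3 operational settings and 21 sensor measurements (26 columns).
N_SETTINGS, N_SENSORS = 3, 21

def parse_row(line):
    """Split one whitespace-separated C-MAPSS record into named parts."""
    values = [float(v) for v in line.split()]
    if len(values) != 2 + N_SETTINGS + N_SENSORS:
        raise ValueError(f"expected 26 columns, got {len(values)}")
    return {
        "unit": int(values[0]),
        "cycle": int(values[1]),
        "settings": values[2:2 + N_SETTINGS],
        "sensors": values[2 + N_SETTINGS:],
    }

def run_to_failure_rul(rows):
    """RUL label per row of a training file: engine's last cycle minus current cycle."""
    last_cycle = defaultdict(int)
    for r in rows:
        last_cycle[r["unit"]] = max(last_cycle[r["unit"]], r["cycle"])
    return [last_cycle[r["unit"]] - r["cycle"] for r in rows]

# Hypothetical usage (file name assumed):
# with open("train_FD003.txt") as fh:
#     rows = [parse_row(l) for l in fh if l.strip()]
# labels = run_to_failure_rul(rows)
```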
4.2. Performance Evaluation Indicators
To evaluate the RUL prediction performance of the proposed MKDN model and to facilitate comparison with other methods, two commonly used evaluation criteria, the scoring function and the root mean square error (RMSE), were applied. They are defined as follows.

(1) Scoring function: The scoring function used in this work is the one defined for the 2008 Prognostics and Health Management Data Challenge [33]:

$$\mathrm{score}=\sum_{i=1}^{n} s_i,\qquad s_i=\begin{cases} e^{-d_i/13}-1, & d_i<0\\ e^{d_i/10}-1, & d_i\ge 0\end{cases}$$

where score is the computed value of the scoring function, n is the number of testing samples, and $d_i=\widehat{RUL}_i-RUL_i$ is the error between the estimated RUL and the actual RUL of the ith testing sample. Because the denominator is 10 for late predictions ($d_i\ge 0$) but 13 for early ones, late predictions are penalized more heavily than early ones.

(2) RMSE: The RMSE is a widely used metric for performance assessment in prognostics and health management. It is calculated as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}$$
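Both metrics can be computed in plain Python as sketched below. The function names are hypothetical, and the error is taken as estimated RUL minus actual RUL (the PHM08 convention), so positive errors are late predictions.

```python
import math

def phm08_score(pred, true):
    """PHM08 score: sum of exp(-d/13) - 1 for early (d < 0) and
    exp(d/10) - 1 for late (d >= 0) predictions, with d = pred - true."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    total = 0.0
    for p, t in zip(pred, true):
        d = p - t
        total += math.exp(-d / 13.0) - 1.0 if d < 0 else math.exp(d / 10.0) - 1.0
    return total

def rmse(pred, true):
    """Root mean square error between estimated and actual RUL."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))
```

The asymmetry matters in practice: a late error of 10 cycles costs e − 1 ≈ 1.72, whereas an early error of 10 cycles costs e^(10/13) − 1 ≈ 1.16, while RMSE treats the two identically.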
4.3. RUL Target Function
A piecewise linear RUL target function is used [31], as shown in Figure 7. This model is adopted because an engine's degradation characteristics are not apparent at the beginning of its life; only after a period of operation does degradation become noticeable and then worsen until failure. Accordingly, the target RUL is held constant during the early, healthy period and decreases linearly thereafter.
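A minimal sketch of piecewise linear RUL labels for one run-to-failure trajectory follows. The cap `max_rul` is not given in this section; the value 125 below is only a placeholder of the size often seen in C-MAPSS studies, and the function name is hypothetical.

```python
def piecewise_rul_targets(total_cycles, max_rul):
    """Piecewise linear RUL labels for one run-to-failure trajectory.

    The plain RUL at cycle t (t = 1..total_cycles) is total_cycles - t.
    Capping it at max_rul gives a constant target over the early, healthy
    part of the life and a linear decrease to 0 at failure.
    """
    if total_cycles < 1 or max_rul < 0:
        raise ValueError("total_cycles must be >= 1 and max_rul >= 0")
    return [min(total_cycles - t, max_rul) for t in range(1, total_cycles + 1)]

# Placeholder cap: the paper's value is not stated in this section.
targets = piecewise_rul_targets(total_cycles=200, max_rul=125)
```

For a 200-cycle engine with this cap, the first 75 labels stay at 125 and the labels then fall by one per cycle to 0.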
4.4. The Results of Feature Selection Obtained by TFMIC
Features in the C-MAPSS datasets that are only weakly correlated with RUL may lead to unsatisfactory RUL estimation. Therefore, the features that reflect mechanical degradation must be identified to obtain accurate predictions.
First, the feature set S = {degradation life cycles, condition1, condition2, condition3, sensor1, sensor2, …, sensor21} is constructed from the raw data as the input dataset of the TFMIC algorithm. For convenience of calculation, the degradation life cycles of each engine were arranged in descending order to serve as the RUL of that engine. The operating-condition and sensor features are labeled according to their position in S as c1, c2, c3, f1, ..., fi, ..., f21, where fi denotes the ith feature.
The TFMIC method is designed to remedy the inadequate consideration of nonlinear interactions and to mine the deep mutual information between features and degradation life cycles. Taking the FD003 subset of the C-MAPSS datasets as an example, the training data are denoted by S, where n is the number of aero-engines, m is the number of condition monitoring variables per engine, Ni denotes the monitoring data of the ith engine, and Yi denotes the RUL of the ith engine. For FD003, n = 100 and m = 24, so S = {N1, N2, …, Ni, …, N100}, Y = {Y1, Y2, …, Yi, …, Y100}, and Ni = {c1, c2, c3, f1, …, fi, …, f21}. The MIC value between each feature and the degradation life cycle is then calculated and compared with a threshold to obtain the main feature subset, so that features varying little with the degradation life cycle are excluded. The threshold of the TFMIC method, obtained by Equation (14), is 0.39.
Figure 8 shows the MIC values between each feature and RUL for 10 of the engines in the FD003 dataset (nos. 10–100). As shown in Figure 9, the threshold for the FD003 dataset was computed to be 0.39, which filters out features that are rarely associated with the life cycles. Finally, the new optimal feature set was obtained as Sop = [f2, f3, f4, f7, f8, f9, f11, f12, f13, f14, f15, f17, f20, f21].
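The sketch below illustrates MIC-based threshold filtering in plain Python. It is neither the exact MIC estimator (MINE) nor the paper's TFMIC implementation: for each grid size it uses equal-frequency bins instead of searching for the optimal partition, so it can only underestimate the exact statistic. The grid budget n^0.6 follows the default commonly used for MIC, and the threshold is passed in directly (0.39 for FD003 in the text) rather than derived from Equation (14). All function names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter

def _equal_freq_bins(values, k):
    """Rank-based equal-frequency binning into k bins; tied values share a bin."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_left(ordered, v) * k // n for v in values]

def _mutual_information(a, b):
    """Plug-in mutual information (in nats) between two discrete label sequences."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())

def approx_mic(x, y, alpha=0.6):
    """Approximate MIC: max over grids with nx * ny <= n**alpha of
    I(x; y) / log(min(nx, ny)), using equal-frequency grids only."""
    n = len(x)
    budget = max(4, int(n ** alpha))
    x_bins, y_bins = {}, {}  # cache the binning for each bin count
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            if nx not in x_bins:
                x_bins[nx] = _equal_freq_bins(x, nx)
            if ny not in y_bins:
                y_bins[ny] = _equal_freq_bins(y, ny)
            mi = _mutual_information(x_bins[nx], y_bins[ny])
            best = max(best, mi / math.log(min(nx, ny)))
    return best

def select_features(features, rul, threshold):
    """Keep the names of features whose approximate MIC with RUL reaches the threshold."""
    return [name for name, column in features.items()
            if approx_mic(column, rul) >= threshold]
```

A constant sensor scores 0 because all of its values fall into one bin, while any strictly monotonic relationship with RUL scores close to 1, which is the behavior a MIC threshold relies on to drop uninformative channels.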
4.5. Data Normalization
Since the acquired sensor data have different ranges, normalization is required to unify the value ranges and obtain unbiased information from the readings of each sensor. In this study, the z-score normalization method [34] was used to bring all variables to a standard range:

$$x'_t=\frac{x_t-\mu_t}{\sigma_t}$$

where $x_t$ represents the original signal collected by the tth sensor, $x'_t$ represents the standardized data, and $\mu_t$ and $\sigma_t$ denote the mean and standard deviation of $x_t$, respectively. Normalization helps to ensure that all variables under all operating conditions are weighted equally.
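A minimal sketch of per-sensor z-score normalization in plain Python follows (the function name is hypothetical). Two choices are assumptions rather than details from the text: the population standard deviation is used, and a constant column (zero standard deviation) is mapped to zeros instead of causing a division by zero.

```python
from statistics import fmean, pstdev

def zscore(column):
    """Standardize one sensor column: (x - mean) / std.
    Uses the population standard deviation; constant columns map to zeros."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

In practice, the mean and standard deviation are normally computed on the training data and reused for the test data, so that both are scaled identically.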
4.6. The Results of Noise Reduction and Feature Extraction
In this study, the SKPCA method was used to reduce noise and extract degradation information after eliminating interference signals. The multi-sensor data obtained from the engine contain large random fluctuations and noise interference that may degrade RUL prediction performance. Therefore, SMA with a sliding window was used to remove the noise and attenuate the random fluctuations of the sensor data. The sliding window length (Sw) determines the smoothing effect on the engine sensor data and thus directly affects the accuracy of RUL prediction. Figure 10 compares the original data of sensor 2 in FD003 with the data smoothed using three different Sw values: 10, 20, and 50. Compared with the raw data, the smoothed data fluctuate less while still reflecting the underlying trend. In addition, a series of comparison experiments in Section 4.7.1 showed that the best predictions were obtained with Sw = 20, i.e., smoothing with this window length was most beneficial for prediction. Therefore, Sw was set to 20 in this experiment, and the step length of the sliding window was set to 1 to obtain as much data as possible.
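A sliding-window SMA with window length Sw and a configurable step can be sketched as follows. We assume a trailing window (each output is the mean of the current and previous Sw − 1 samples) and that the first Sw − 1 cycles, which lack a full window, are dropped; Equation (20) may align the window or treat the start of the series differently. The function name is hypothetical.

```python
def sliding_sma(signal, window=20, step=1):
    """Simple moving average over a sliding window.

    Each output value is the mean of `window` consecutive samples; the window
    advances by `step` samples, so step=1 yields len(signal) - window + 1 values.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    if len(signal) < window:
        return []
    # Prefix sums make every window mean an O(1) difference.
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(0, len(signal) - window + 1, step)]
```

With Sw = 20 and step 1, a 200-cycle sensor series yields 181 smoothed values; a step of 1 keeps the most samples, which matches the stated reason for choosing it.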
KPCA was then used to extract features from the SMA-smoothed data and reduce their dimensionality. The model used the radial basis function (RBF) kernel of Equation (21), and the cumulative variance contribution rate was chosen as the criterion for selecting the target dimensionality, with the threshold of the cumulative contribution rate of the kernel principal components set to 95%. The first 10 kernel principal components satisfied this threshold; their cumulative contribution rates are shown in Figure 11, and their individual contribution rates are shown in Figure 12. Finally, these 10-dimensional extracted features were selected as the optimal dataset for subsequent TDPM model training.
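The kernel construction and the 95% cumulative-contribution rule can be sketched in plain Python as below. The RBF form exp(−‖x − y‖²/(2σ²)) and the width σ are assumptions, since Equation (21) and its parameter value are not reproduced in this section. The eigenvalues of the centered kernel matrix are taken as input; they could be obtained, for example, with `numpy.linalg.eigvalsh`. Function names are hypothetical.

```python
import math

def rbf_kernel_matrix(X, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) (assumed RBF form)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K

def center_kernel(K):
    """Center K in feature space: Kc = K - 1K - K1 + 1K1, with 1 = ones(n, n) / n."""
    n = len(K)
    row = [sum(r) / n for r in K]  # row means (equal to column means, K symmetric)
    total = sum(row) / n           # grand mean
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def n_components_for(eigenvalues, threshold=0.95):
    """Smallest k whose top-k eigenvalues reach `threshold` of the total."""
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    acc = 0.0
    for k, v in enumerate(vals, start=1):
        acc += v
        if acc / total >= threshold:
            return k
    return len(vals)
```

The contribution rate of each kernel principal component is its eigenvalue divided by the sum of all eigenvalues; `n_components_for` returns the smallest number of components whose rates add up to the threshold, mirroring the selection of 10 components described above.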
To demonstrate the advantage of SKPCA more directly, comparison tests were conducted with PCA and KPCA, the dimensionality reduction methods most commonly used in the literature, as well as with the two SMA-smoothing-based feature extraction methods proposed in this paper, SPCA (SMA + PCA) and SKPCA (SMA + KPCA). Table 2 shows the average RUL prediction errors for the 10 test engines in the FD003 dataset obtained with these four methods. In all of these experiments, the same feature selection method (TFMIC) and prediction algorithm (TDPM) were used. The SKPCA-based approach clearly achieved the best performance, with an RMSE of 9.82 and a score of 226.55. The other methods performed poorly on the scoring function, which makes them less suitable for RUL estimation than SKPCA.
Figure 13 shows the raw data and the data transformed by PCA, KPCA, SPCA, and SKPCA for one engine in FD003. Figure 13a shows the raw data before feature extraction, where S2–S21 represent the normalized raw data of the 14 sensors selected by TFMIC. Figure 13b–e shows the principal component data obtained with the four methods, where PC1–PC10 denote the principal components extracted by each method. Figure 13b,c shows that, without SMA smoothing, the features extracted by PCA and KPCA fluctuated significantly, which is very unfavorable for predicting mechanical RUL. Figure 13d,e shows that the features extracted by both SPCA and SKPCA effectively eliminated noise. However, with the same 95% threshold on the cumulative contribution rate, SPCA extracted only three principal components, whereas SKPCA extracted 10, so the feature information extracted by SKPCA was more comprehensive and captured more detail.

In conclusion, compared with other commonly used feature extraction methods, SKPCA effectively reduces data fluctuations and mines the data more fully, enabling more accurate RUL prediction.
4.7. The Results of Model Prediction
The architecture and parameters of the proposed network affect its prediction performance. Therefore, the architecture and parameters of the proposed TDPM were investigated on the C-MAPSS subset FD003. In particular, three essential factors need to be determined: the sliding window length, the batch size, and the number of LSTM layers.
4.7.1. Effects of the Sliding Window Length
Large random fluctuations and noise disturbances in the multi-sensor data obtained from the aero-engine may degrade RUL prediction performance. Therefore, smoothing the data to remove noise and attenuate random fluctuations helps improve prediction accuracy. The SMA of Equation (20), combined with the sliding time window technique, can effectively remove the random fluctuations and noise disturbances in these data. The sliding window length (Sw) determines the degree of smoothing, and the final RUL predictions differ considerably for different values of Sw.
Figure 14a,d shows box plots of the RMSE and score values of RUL estimation for Sw values from 10 to 50. The plots show that prediction performance was best on both evaluation metrics when Sw was 20. When Sw was greater than 20, RUL prediction performance deteriorated rapidly as Sw increased. This is because, with a moderately small Sw, SMA adequately attenuates the random fluctuations of the sensor data and removes the noise well. As Sw increases, however, so many points are averaged that the data themselves become seriously distorted, which leads to a rapid deterioration in RUL prediction performance. Conversely, when Sw is too small, SMA cannot effectively reduce the random fluctuations of the original sensor data or remove the noise. Considering both evaluation criteria, Sw was set to 20.
4.7.2. Effects of the Batch Size
The batch size determines both the training duration of each epoch and the smoothness of the gradient between iterations. With an appropriate batch size, the gradient descent direction computed on each mini-batch better represents the gradient descent direction of the whole training set, which ensures that the loss function is minimized along an accurate direction.
Figure 14b,e shows box plots of the RMSE and score values of RUL estimation for batch sizes from 10 to 100. The results show that the error did not vary monotonically with batch size: as the batch size increased, the overall prediction performance first worsened and then improved, with the worst prediction at a batch size of 60. Overall prediction performance was relatively good for small batch sizes, especially in terms of the score function. In addition, when the batch size was larger than 50, outliers in the score became more frequent and prediction performance became unstable. Considering both the accuracy and the concentration of the RMSE and score values, a batch size of 20 gave the best performance on both evaluation metrics. The batch size was therefore set to 20, given the importance of accurate and dependable predictions for engine operation.
4.7.3. Effects of the LSTM Layer Number
Generally speaking, the more layers a neural network has, the more abstract its representation of the input features and, therefore, the better its predictions. However, once the number of layers reaches a certain level, prediction performance worsens because of insufficient data and overfitting. In addition, more layers consume more training resources and lengthen training time.
Figure 14c,f shows box plots of the RMSE and score values of RUL estimation with 1 to 5 LSTM layers. The plots reveal that the RUL predictions were comparable with 2, 3, or 4 LSTM layers but deteriorated with 1 or 5 layers. This is because, with too few layers, the network cannot capture the intrinsic relationship between the sensor data and the RUL, whereas with too many layers, overfitting occurs.
Figure 14i shows the average training time for different numbers of layers; training time clearly increases with the number of layers. Since prediction performance was similar with 2, 3, and 4 LSTM layers but training was fastest with 2, and shorter computation times support quicker decision making in industrial applications, the number of LSTM layers was set to 2.
4.7.4. Final Parameter Settings and Prediction Results
The final parameter settings of the proposed TDPM network architecture, obtained from this parameter search, are listed in Table 3.
Figure 14g,h shows how the loss and RMSE values of the TDPM network evolved over the training iterations under these settings, for both the training and validation sets.
Figure 14j–l shows the RUL estimation results for three randomly selected engines, where the red curve represents the predicted RUL and the blue curve represents the actual RUL. The predicted values are distributed around the actual values in the middle and late stages of the engines' life cycles, indicating that the model's predictions closely fit the actual values. These results indicate that the model achieves high prediction accuracy for complex machinery such as engines and can provide a basis for improving engine reliability and safety.
5. Comparisons and Analysis
5.1. The Validity of Feature Construction Method TFMIC-SKPCA
To evaluate the efficacy of the proposed feature construction approach TFMIC-SKPCA, the prediction results obtained with the features extracted by TFMIC-SKPCA were compared experimentally with those obtained with the features used in the existing literature. In [26,35], the measurements of 14 sensors, namely, sensors 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21, were used as input features; these are the conventional features commonly used in the existing literature.
Figure 15 shows the RUL prediction results obtained with the TFMIC-SKPCA features and with the conventional features. In terms of RMSE, the predictions based on the TFMIC-SKPCA features significantly outperformed those based on the conventional features. In terms of score, the TFMIC-SKPCA feature-based approach performed slightly better than the conventional feature-based approach on subsets FD001 and FD003. On subsets FD002 and FD004, however, which involve complex operating conditions and high noise, the TFMIC-SKPCA feature-based approach achieved excellent results in early prediction, obtaining lower score values, which is vital for the maintenance of critical machines.
The experimental results demonstrate that the proposed feature construction method TFMIC-SKPCA can successfully extract degradation features from the raw sensor measurements and select sensitive features strongly connected with the RUL of the machinery. These results indicate that the TFMIC-SKPCA approach offers strong capabilities for processing and analyzing nonlinear, noisy signals. Furthermore, it has few parameters and is easily adaptable to diverse datasets.
5.2. Comparisons with the State-of-the-Art Methods
To establish the validity and superiority of the proposed framework, it was compared with several state-of-the-art methods from recent years. The MKDN framework outperforms the compared approaches, achieving an RMSE of 9.65 and a score of 191.34 on the FD003 test set. The RUL predictions for the four subsets of the C-MAPSS dataset are summarized in Table 4. Compared with the previous best model, MKDN reduced the RMSE on the four subsets by 5.1%, 12.59%, 22.56%, and 0.98%, and the score by 6.74%, 44.61%, 32.63%, and 2.53%, respectively.
Compared with SVM, MODBNE, and DCNN, which predict on the basis of local degradation features, the proposed MKDN method benefits not only from MIC-based feature selection and KPCA-based feature extraction but also from the ability of RNNs to process sequential information, which allows it to learn the overall degradation trend of multi-sensor sequences.
On the subsets with complex operating conditions, particularly FD002 and FD004, MKDN shows improved generalization capability and more precise predictions, which is valuable in practical applications. Compared with conventional machine learning approaches such as RF and GB, MKDN shows significant improvements in RMSE and score and can adaptively uncover more complex hidden relationships in sensor measurement data. DLSTM, BLSTM, and other RNN-based models achieve low RMSE in RUL estimation but perform poorly on the score. MKDN performs similarly to Li-DAG, a hybrid model combining CNN and LSTM, on subsets FD001 and FD004; however, because feature selection and feature extraction highlight the critical information, MKDN performs better overall. In conclusion, the proposed MKDN framework performs well on both evaluation metrics.
This demonstrates that the MIC-based TFMIC method can select features that are highly relevant to RUL and that SKPCA has good performance in removing noise and data fluctuations as well as extracting the primary information of the data, which leads to improved engine RUL prediction in the MKDN framework.
6. Conclusions
In this study, a new DLSTM-based MKDN model is proposed for RUL prediction of nonlinear deterioration processes. In the model, TFMIC is designed to select the most relevant features, and SKPCA is built to eliminate noise, reduce dimensionality, and extract nonlinear features. In the last step, TDPM, an optimized network with two LSTM layers and fully connected layers, predicts the RUL. The C-MAPSS dataset, which consists of aero-engines with nonlinear degradation processes, was used to evaluate the proposed method. The results show that MKDN provides better RUL prediction for the nonlinear deterioration processes of complex systems. Compared with the state-of-the-art methods, MKDN achieves maximum reductions in RMSE and score of 22.56% and 44.61%, respectively.
In the future, we will investigate ways to extract nonlinear information more efficiently in order to significantly increase prediction accuracy.