Article

Transfer Learning with Deep Recurrent Neural Networks for Remaining Useful Life Estimation

1 Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China
2 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
3 Guizhou Provincial Key Laboratory of Internet Collaborative Intelligent Manufacturing, Guizhou University, Guiyang 550025, China
4 School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2018, 8(12), 2416; https://doi.org/10.3390/app8122416
Submission received: 14 September 2018 / Revised: 7 November 2018 / Accepted: 10 November 2018 / Published: 28 November 2018
(This article belongs to the Special Issue Fault Detection and Diagnosis in Mechatronics Systems)

Abstract

Prognostics, such as remaining useful life (RUL) prediction, is a crucial task in condition-based maintenance. A major challenge in data-driven prognostics is the difficulty of obtaining a sufficient number of samples of failure progression, yet for both traditional machine learning methods and deep neural networks, enough training data is a prerequisite for good prediction models. In this work, we propose a transfer learning algorithm based on Bi-directional Long Short-Term Memory (BLSTM) recurrent neural networks for RUL estimation, in which the models are first trained on different but related datasets and then fine-tuned on the target dataset. Extensive experimental results show that transfer learning generally improves the prediction models on datasets with a small number of samples, with one exception: transferring from multiple operating conditions to a single operating condition led to worse results.

1. Introduction

Recently, fault diagnosis and health management, including diagnosis and prognosis approaches, have been actively researched [1,2,3,4,5]. Fault diagnosis and health management techniques are widely applied in diverse areas such as manufacturing, aerospace, automotive, power generation, and transportation [6,7,8,9]. Diagnostics is the process of identifying a failure. Prognostics is an engineering discipline focused on predicting the time at which a system or a component will no longer perform its intended function. In diagnostics, once degradation is detected, unscheduled maintenance must be performed to prevent the consequences of failure. In prognostics, maintenance can be prepared while the system is still up and running, since the time to failure is known early enough. A major application of prognostics is estimating the remaining useful life (RUL) of systems, also called remaining service or residual life estimation [10]. Prognostic methods can be grouped into two major categories: data-driven methods and physics-based methods. The former requires sufficient samples that are run until faults or failures occur, whereas the latter requires an understanding of the physics of system failure progression [11].
In recent years, data-driven approaches to remaining useful life (RUL) estimation [12,13,14,15,16] have attracted much attention because they avoid depending on a theoretical understanding of complex systems. However, a major challenge in data-driven prognostics is that obtaining a large number of failure progression samples is often impossible, costly, and labor-intensive [11]. This situation can arise for several reasons: (1) industrial systems are not allowed to run until failure because of the consequences, especially for critical systems and failures; (2) most electro-mechanical failures occur slowly and follow a degradation path, such that the failure degradation of a system might take months or even years [17]. Several methods have been used to address this challenge. The first is accelerated aging: running the system in a lab under extreme loads or increased speed, or using imitations of real components made of vulnerable materials, so that a failure progresses faster than normal [14,15,16]. Another approach is introducing artificial failure progression, using exponential degradation to model regular failure progression [18]. Both methods have their own strengths and weaknesses and can represent failure degradation to a certain level. However, in real-world applications, where conditions differ greatly from the lab environment, these methods are difficult to apply and rarely achieve good estimation performance.
Recently, deep learning has achieved impressive results in areas such as computer vision, image and video processing, speech, and natural language processing [19]. Deep learning methods have also been applied to fault diagnosis [20,21,22] and the RUL estimation problem. For RUL estimation, a Convolutional Neural Network (CNN) was applied to sliding-window segments of the multivariate sensor data in [23]. However, a major limitation of this work is that the sequence information is not fully exploited by the CNN model. To address this issue, a Long Short-Term Memory (LSTM) model was proposed for RUL estimation [24]; it makes full use of the sensor sequence information and performs much better than the CNN model. Following this success, Convolutional Bi-directional Long Short-Term Memory networks (CBLSTM) were designed in [25] for RUL prediction, where a CNN is first used to extract robust and informative local features from the sequential input, and a bi-directional LSTM is then introduced to encode temporal information. In addition to these approaches operating on the raw input, a Vanilla LSTM neural network was used in [26] for RUL prediction, where a dynamic differential technique was proposed to extract inter-frame information and achieved high prediction accuracy. Moreover, an ensemble learning-based prognostic method was proposed in [27], which combines the predictions of multiple learning algorithms for better performance. However, all these modern deep learning approaches for RUL prediction share a major limitation: they require a large amount of training data, while in real-world applications it is often impossible to obtain a large number of failure progression samples. Therefore, neither these deep neural network methods nor traditional methods have addressed the major challenge in data-driven prognostics: the scarcity of failure samples.
In this paper, we propose transfer learning-based deep neural networks for RUL prediction. In recent years, transfer learning has made great progress in image, audio, and text processing by addressing data scarcity through datasets in related domains. It is widely adopted when source data and target data lie in different feature spaces or have different distributions [28,29]. Transfer learning methods work by learning properties from the source data and transferring them to the target data. The source and target data can differ, but they should be at least loosely related. Meng et al. [30] applied the domain separation framework of transfer learning to automatic speech recognition. Singh et al. [31] applied transfer learning to object detection with improved detection performance. Cao et al. [32] used transfer learning for breast cancer histology image analysis and achieved better performance than popular handcrafted features. These studies showed that transfer learning can exploit both source and target data for better performance. In real-world RUL prediction, the scarcity of target data is one of the major challenges in data-driven prognostics, as it is often impossible to obtain large numbers of failure progression samples. However, we usually have access to a large amount of data from different but approximately related working conditions.
In this paper, we propose a transfer learning approach with LSTM deep neural networks for RUL prediction. It provides an effective way to address the major challenge in data-driven prognostics. Our contributions include:
(1)
We developed a bidirectional LSTM recurrent neural network model for RUL prediction;
(2)
We proposed and demonstrated for the first time that a transfer learning-based prognostic model can boost RUL estimation performance by making full use of different but more or less related datasets;
(3)
We showed that datasets of mixed working conditions can be used to improve the performance of single working condition RUL prediction, while the opposite is not true. This gives useful guidance in real-world applications where samples of certain working conditions are hard to obtain.
The rest of this paper is organized as follows: Section 2 describes the transfer learning-based prognostic RUL prediction algorithm. Section 3 presents the experiments and results. Section 4 discusses related issues of the results, and Section 5 concludes the paper.

2. Methods

2.1. The Turbofan Engine RUL Prediction Problem and the C-MAPSS Datasets

To verify our transfer learning-based algorithm, we selected the turbofan engine RUL prediction problem as the benchmark. The corresponding C-MAPSS datasets (Turbofan Engine Degradation Simulation Datasets) are widely used for RUL estimation [33]. These datasets are provided by the NASA Ames Prognostics Data Repository [34] and contain 4 sub-datasets, as given in Table 1. Each sub-dataset consists of multiple multivariate time series, which are further divided into training and testing sets. In the training set, the fault grows in magnitude until the system fails. In the testing set, the time series ends some time prior to system failure.
Each trajectory is the cycle-by-cycle record of one engine, and each cycle record is a snapshot of data taken during a single operational cycle. A single-cycle datum in the C-MAPSS dataset is a 24-dimensional feature vector consisting of 3 operational settings and 21 sensor values. The operational settings are altitude, Mach number, and throttle resolver angle, which together determine the flight condition of the aero-engine. In sub-dataset FD001, the engine suffers a failure of the high-pressure compressor under a single operating condition. In FD002, the engine suffers a failure of the high-pressure compressor under six operating conditions. In FD003, the engine suffers failures of the high-pressure compressor and the fan under a single operating condition. In FD004, the engine suffers failures of the high-pressure compressor and the fan under six operating conditions.
Figure 1 illustrates the standardized operational setting values in each sub-dataset. The operational setting 3 values are stable in FD001, which has a single operating condition, just as in FD003. The change and distribution of the operational setting values differ between a single operating condition and six operating conditions.

2.2. Transfer Learning for RUL Prediction

Based on the availability of sample labels, transfer learning can be divided into three categories: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning [28]. Inductive transfer learning requires the availability of the target data labels. If the source data labels are available but the target data labels are not, transductive transfer learning should be used. If neither the target nor the source data labels are available, unsupervised transfer learning should be used. Since both the target and the source data labels are available in the C-MAPSS datasets, inductive transfer learning is the most suitable approach, and it is the one we adopted.
Based on what to transfer, transfer learning can be conducted at several levels: instance-transfer, feature-representation-transfer, parameter-transfer, and relational-knowledge-transfer [28]. Instance-transfer re-weights some labeled data from the source domain and then uses them in the target domain. Feature-representation-transfer tries to find good feature representations that reduce the difference between the source and target domains. Parameter-transfer discovers shared parameters or prior knowledge between the source and target domains. Relational-knowledge-transfer maps common relational knowledge, or similar patterns from inputs to outputs, between both domains. Given the properties of the C-MAPSS datasets and our experimental conditions, parameter-transfer is selected in our algorithm.
In this research, we apply transfer learning to address one of the major challenges in data-driven prognostics by transferring model parameters learned from the different but approximately related domain with a large amount of source data to the target RUL prediction problem with a small amount of data. The parameter-transfer scheme can be defined as follows:
$$D_s = \{ X_s, T_s \}$$
$$D_t = \{ X_t, T_t \}$$
where $D_s$, $X_s$, $T_s$, $D_t$, $X_t$, and $T_t$ respectively denote the source domain, source samples, source task labels, target domain, target samples, and target task labels. $D_s$ and $D_t$ are related or similar domains.
The relationship of the model weights of the source problem and of the target problem can be represented as follows:
$$W_s = W_0 + W_1$$
$$W_t = W_0 + W_2$$
$$Y_s = f_s(X_s; W_s)$$
$$Y_t = f_t(X_t; W_t)$$
Here, $W_s$ and $W_t$ are the parameters of the source and target tasks, respectively. They share a common part $W_0$ and have task-specific parts $W_1$ and $W_2$. $Y_s$ and $Y_t$ are the real outputs, and $f_s$ and $f_t$ denote the learning models that map the sample inputs to the task labels. The parameter-transfer scheme transfers parameters from $W_s$ to $W_t$ by reusing the common part $W_0$ and fine-tuning the task-specific parts through further training on the target task.
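To make the parameter-transfer scheme concrete, the following minimal sketch shows weights learned on a source task initializing an otherwise identical target model before fine-tuning. It assumes Python with Keras (our choice of tooling, not stated in the paper) and synthetic data; build_model is an illustrative stand-in for the BLSTM of Section 2.3.1.

```python
import numpy as np
from tensorflow import keras

def build_model(input_dim):
    # Tiny stand-in network; the actual model is the BLSTM of Section 2.3.1.
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(input_dim,)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Synthetic stand-ins: a large source dataset and a small target dataset.
x_src, y_src = np.random.rand(1000, 24), np.random.rand(1000)
x_tgt, y_tgt = np.random.rand(50, 24), np.random.rand(50)

source_model = build_model(24)
source_model.fit(x_src, y_src, epochs=5, verbose=0)      # learn W_s on the source task

target_model = build_model(24)                           # identical architecture
target_model.set_weights(source_model.get_weights())     # transfer: W_s initializes W_t
target_model.fit(x_tgt, y_tgt, epochs=5, verbose=0)      # fine-tune W_t on the target task
```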

2.3. The Transfer Learning Framework for RUL Prediction

The transfer learning framework is illustrated in Figure 2. The framework is composed of two Long Short-Term Memory (LSTM) networks. The network on the top is trained on the large amount of data of the source task. The learned model is then fine-tuned by further training with the small amount of data from the target task, which is usually a different but related task; in our experiments, it represents degradation failure under different working conditions.

2.3.1. The BLSTM Neural Networks

To take advantage of the sequential nature of the sensor data in the turbofan engine RUL prediction problem, recurrent neural networks (RNNs) are favored for their ability to capture time-dependent relationships. However, conventional RNNs suffer from the vanishing and exploding gradient problems, which make such models extremely challenging to train. To address these issues, Long Short-Term Memory (LSTM), a gated RNN, was proposed in [35]. Basic LSTM, however, processes sequential data only in the forward direction. To capture both past and future contexts, Bi-directional LSTM (BLSTM) [36] was proposed; it processes the sequence in both the forward and backward directions with two separate hidden layers, whose outputs are then fed forward to the same output layer. This model has been shown to achieve good performance in machine health monitoring [25].
In an LSTM/BLSTM network, an LSTM cell contains three gates, namely the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, which determine whether to use the input, whether to update the cell memory state, and whether to create an output, respectively. At each time step $t$, the following equations define the BLSTM [25]:
$$\overrightarrow{i}_t = \sigma(W_i x_t + V_i \overrightarrow{h}_{t-1} + b_i), \quad \overrightarrow{f}_t = \sigma(W_f x_t + V_f \overrightarrow{h}_{t-1} + b_f), \quad \overrightarrow{o}_t = \sigma(W_o x_t + V_o \overrightarrow{h}_{t-1} + b_o),$$
$$\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \tanh(W_c x_t + V_c \overrightarrow{h}_{t-1} + b_c), \quad \overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t),$$
$$\overleftarrow{i}_t = \sigma(W_i x_t + V_i \overleftarrow{h}_{t+1} + b_i), \quad \overleftarrow{f}_t = \sigma(W_f x_t + V_f \overleftarrow{h}_{t+1} + b_f), \quad \overleftarrow{o}_t = \sigma(W_o x_t + V_o \overleftarrow{h}_{t+1} + b_o),$$
$$\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \tanh(W_c x_t + V_c \overleftarrow{h}_{t+1} + b_c), \quad \overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t),$$
where $\rightarrow$ and $\leftarrow$ denote the forward and backward passes, respectively; the model parameters $W$, $V$, and $b$ are shared by all time steps and learned during model training; $\sigma$ is the sigmoid function, $\odot$ is the element-wise product, and $c_t$ is the memory cell. The hidden state is updated from the current input $x_t$ and, in the forward pass, the hidden state $\overrightarrow{h}_{t-1}$ of the previous time step (the backward pass uses $\overleftarrow{h}_{t+1}$). The complete BLSTM output $h_t$ concatenates the forward and backward hidden states:
$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t$$
In our transfer learning framework, two BLSTM neural networks are used. Each has four hidden layers: a first BLSTM layer with 64 nodes, return sequences, and a dropout rate of 0.2; a second BLSTM layer with 64 nodes, return sequences, and a dropout rate of 0.2; a third flatten layer; and a fourth dense layer with 128 nodes and a dropout rate of 0.5. Finally, a one-dimensional output layer predicts the RUL. We apply L2 regularization to the four hidden layers and early stopping during training. In our framework, the top BLSTM network is first trained on the source data and then refined by training on the target dataset; the model trained on the source data can thus be considered the initializer of the bottom BLSTM model to be trained with the target task data.
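A minimal sketch of this architecture is given below, assuming Keras as the framework; the choices not stated above (optimizer, regularization strength, early-stopping patience, and the placeholder loss) are our assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_blstm(window=30, n_features=30):
    # Two BLSTM layers (64 units, return sequences, dropout 0.2), a flatten
    # layer, a 128-unit dense layer (dropout 0.5), and a 1-dimensional RUL output.
    l2 = regularizers.l2(1e-4)  # the regularization strength is an assumption
    model = keras.Sequential([
        layers.Bidirectional(
            layers.LSTM(64, return_sequences=True, kernel_regularizer=l2),
            input_shape=(window, n_features)),
        layers.Dropout(0.2),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True, kernel_regularizer=l2)),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.5),
        layers.Dense(1),  # predicted RUL
    ])
    # The paper trains with the mean score loss of Section 2.3.3; "mse" is a placeholder.
    model.compile(optimizer="adam", loss="mse")
    return model

# Early stopping during training, as stated above; the patience value is an assumption.
early_stop = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
```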

2.3.2. Input Data and Parameter Settings

All training and test datasets are shown in Table 1. After preprocessing, a single-cycle datum in the C-MAPSS dataset becomes a 30-dimensional feature vector, consisting of the 3 operational settings, the 21 sensor values, and 6 one-hot encoding values (Section 2.4.2). Each input sample for our networks contains 30 single-cycle records extracted from each multivariate trajectory with a sliding window, as shown in Figure 3; the step size of the sliding window is 1. The true RUL of a sample is the true RUL of its last single-cycle record. For training, 40% of the trajectories are randomly selected as validation data. For testing, the last sliding window of every trajectory is used as the input data.
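The sliding-window extraction can be sketched as follows (a NumPy illustration with synthetic data; extract_windows is our illustrative helper, not code from the paper):

```python
import numpy as np

def extract_windows(trajectory, rul, window=30, step=1):
    # Slide a 30-cycle window over one trajectory (a [cycles x features] array);
    # each sample's label is the true RUL of the window's last cycle.
    samples, labels = [], []
    for start in range(0, len(trajectory) - window + 1, step):
        samples.append(trajectory[start:start + window])
        labels.append(rul[start + window - 1])
    return np.array(samples), np.array(labels)

# Example with a synthetic 200-cycle trajectory of 30-dimensional cycle data.
traj = np.random.rand(200, 30)
rul = np.arange(200)[::-1]          # linear RUL counting down to 0 at failure
x, y = extract_windows(traj, rul)
print(x.shape, y.shape)             # (171, 30, 30) (171,)
```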

2.3.3. Evaluation

To evaluate the performance of our RUL estimation model on the test data, two measures are used: the scoring function and the Root Mean Square Error (RMSE).
The scoring function proposed in the PHM 2008 Data Challenge [37] is shown in Equation (10), where $n$ is the total number of trajectories, $h_i = RUL_{est,i} - RUL_i$, $RUL_{est,i}$ is the estimated RUL, and $RUL_i$ is the true RUL. This scoring function applies two different penalties: the penalty for $h_i < 0$ (estimated RUL less than the true RUL) is smaller than the penalty for $h_i \geq 0$ (estimated RUL larger than the true RUL). The justification for this difference is that when $h_i < 0$ we still have time to conduct system maintenance, but when $h_i \geq 0$ the maintenance would be scheduled later than the required time, which may cause system failure.
$$S = \begin{cases} \sum_{i=1}^{n} \left( e^{-\frac{h_i}{13}} - 1 \right) & \text{for } h_i < 0 \\ \sum_{i=1}^{n} \left( e^{\frac{h_i}{10}} - 1 \right) & \text{for } h_i \geq 0 \end{cases}$$
RMSE is also widely used as an evaluation metric for RUL estimation, as shown in Equation (11). RMSE gives the same penalty weight to $h_i < 0$ and $h_i \geq 0$.
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} h_i^2}$$
We used the mean score as the loss function for training our networks, as shown below:
$$loss = S_{mean} = S / n$$
where $S$ is calculated from Equation (10) and $n$ is the total number of trajectories.
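These three quantities translate directly into code. The following is our NumPy transcription of Equations (10)-(12); to use the mean score as a training loss inside Keras it would need to be re-expressed with tensor operations.

```python
import numpy as np

def score(rul_est, rul_true):
    # PHM 2008 scoring function (Equation (10)): late predictions (h >= 0)
    # are penalized more heavily than early ones (h < 0).
    h = rul_est - rul_true
    return float(np.sum(np.where(h < 0, np.exp(-h / 13.0) - 1.0,
                                 np.exp(h / 10.0) - 1.0)))

def rmse(rul_est, rul_true):
    # Root mean square error (Equation (11)): a symmetric penalty.
    h = rul_est - rul_true
    return float(np.sqrt(np.mean(h ** 2)))

def mean_score_loss(rul_est, rul_true):
    # Mean score (Equation (12)), used as the training loss.
    return score(rul_est, rul_true) / len(rul_true)
```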

2.4. Data Preprocessing

2.4.1. Data Normalization

There are several ways to normalize the sensor data. Here we used standardization to remove the mean and scale the data to unit variance. We standardize the data by Equation (13), where $x_i$ is the $i$th sensor's data, $x_i'$ is the normalized data, $\mu_i$ is the mean of the $i$th sensor's data, and $\sigma_i$ is the corresponding standard deviation.
$$x_i' = \frac{x_i - \mu_i}{\sigma_i}$$
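A per-sensor sketch of Equation (13) follows; it is equivalent to scikit-learn's StandardScaler applied column-wise.

```python
import numpy as np

def standardize(x):
    # Per-sensor standardization (Equation (13)): zero mean, unit variance.
    # x has shape [cycles, sensors]; statistics are computed per sensor column.
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard sensors whose readings never change
    return (x - mu) / sigma
```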

2.4.2. Operating Conditions

Previous research [23,24,37] on the C-MAPSS dataset showed that the operational settings can be clustered into six distinct groups, each representing a distinct operating condition. Here we used the K-means algorithm to cluster the operational setting values of all datasets into 6 clusters, as shown in Figure 4. Based on the clustering, an operating condition label can be represented as a 6-dimensional one-hot vector.
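A sketch of this clustering step, assuming scikit-learn (the synthetic settings array stands in for the real operational setting values):

```python
import numpy as np
from sklearn.cluster import KMeans

# "settings" stands in for the [cycles x 3] operational setting values of all datasets.
settings = np.random.rand(5000, 3)

kmeans = KMeans(n_clusters=6, random_state=0).fit(settings)
labels = kmeans.predict(settings)     # operating-condition index for every cycle
one_hot = np.eye(6)[labels]           # 6-dimensional one-hot encoding per cycle
```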

2.4.3. RUL Target Function

The traditional way to define the degradation process of a system is a linear model along time. In practical applications, however, the degradation of a system is negligible at the beginning of use and increases after an anomaly point. It is hard to estimate RUL before the anomaly point, and estimating RUL while the system still works well is of little practical use anyway. Hence, for the C-MAPSS datasets, a piece-wise linear degradation model was proposed in [37], which limits the maximum value of the RUL function, as illustrated in Figure 5.
In this paper, we set the maximum RUL limit to 130 time cycles, the same as in [37]. We ignore data whose true RUL is greater than this maximum limit in order to focus on the degradation data.
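A sketch of this piece-wise target, assuming for illustration that the naive label decreases linearly to 0 at the failure cycle:

```python
import numpy as np

def piecewise_rul(n_cycles, max_rul=130):
    # Piece-wise linear RUL target: constant at max_rul early in life,
    # then decaying linearly to 0 at the failure cycle.
    linear = np.arange(n_cycles)[::-1]      # naive linear RUL
    return np.minimum(linear, max_rul)
```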

3. Experiments and Results

To verify the performance of our BLSTM-based transfer learning algorithm for RUL estimation, we conducted a series of experiments on the C-MAPSS Datasets. We evaluated the performance of transfer learning over every dataset and studied how different working conditions and the numbers of samples for each pair of source and target datasets affect the performance of the resulting prediction models.
We designed four groups of without-transfer experiments (E1, E4, E7, E10) and eight groups of transfer experiments (E2, E3, E5, E6, E8, E9, E11, E12) involving 4 groups of target datasets (FD001, FD002, FD003, FD004), as shown in Table 2. For each target dataset, we randomly selected 10, 20, ..., 90, 100 trajectories to generate 10 evaluation datasets of increasing size; a sketch of this subsampling is given below. This allows us to evaluate how the number of samples in the target dataset affects the performance of transfer learning.
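A minimal sketch of the subsampling (the function name and seeding are illustrative, not from the paper):

```python
import numpy as np

def sample_target_sets(trajectory_ids, sizes=range(10, 101, 10), seed=0):
    # Randomly draw 10, 20, ..., 100 trajectories to build the ten evaluation datasets.
    rng = np.random.default_rng(seed)
    return {n: rng.choice(trajectory_ids, size=n, replace=False)
            for n in sizes}
```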
For each transfer experiment, we compared the performance of standard BLSTM models trained on the target dataset without transfer learning against that of BLSTM models with transfer learning. We repeated each experiment three times to account for the randomness of the algorithms. In total, we conducted 10 × 3 × (8 + 4) = 360 experiments, where 10 is the number of evaluation datasets (each with a different number of training trajectories), 3 is the number of repeats per experiment, 8 is the number of transfer experiment pairs, and 4 is the number of experiments without transfer. All experiments were evaluated with the score function and RMSE. Note that groups E2 and E3 have the same target data but different source data and are compared against group E1; likewise, E5 and E6 are compared against E4, E8 and E9 against E7, and E11 and E12 against E10.
Table 3 shows the results of all experiments in terms of the mean values of the performance scores and RMSE. IMP is the improvement of the models with transfer learning over those without, defined as $IMP = (1 - WithTransfer / NoTransfer) \times 100$. From the IMP values, we can easily observe that transfer learning brought large improvements except for groups E3 and E9 (which transfer from multiple operating conditions to a single operating condition), as explained in Section 4. For example, when the number of trajectories used is at most 60, transfer learning improved the performance scores by 16.41% to 85.12% for group E2. Similar improvements are observed for groups E5, E6, E8, E11, and E12.
In particular, we found that transfer learning is generally more effective on models trained with small datasets. For example, it improves the model score by 43.19% for group E5 when only 10 trajectories are used to create the training set, while the improvement is only 8.29% when 60 trajectories are used for the same group.
We also found that the Mean Score and RMSE follow similar trends. Figure 6 shows box plots of the Mean Scores, illustrating the distribution of every transfer learning experiment; Figure 6a–d respectively cover the four target datasets (FD001, FD002, FD003, and FD004). The blue boxes show the performance without transfer and the yellow and green boxes the performance with transfer. In Figure 6a, the transfer experiment E2 is more stable and reaches a lower mean score than the without-transfer experiment E1; in Figure 6b, both E5 and E6 outperform E4 in the same way; in Figure 6c, E8 outperforms E7; and in Figure 6d, both E11 and E12 outperform E10. Thus, the box plots show that the models with transfer learning are more stable and markedly better than those without, except for the transfer experiments E3 and E9, which will be explained in Section 4.
Figure 7 illustrates the RUL estimation results on the test data for experiments whose training data were generated by randomly selecting 50 trajectories, comparing the four test sets (FD001, FD002, FD003, FD004). Figure 7a,d,g,j are the without-transfer experiments and the others are the transfer experiments. In each panel, the X-axis is the test units sorted by increasing actual RUL, and the Y-axis is the RUL value; the blue points are the actual RULs and the red points the predicted RULs. Comparing Figure 7a,b, the predicted RULs of the transfer experiment E2 (Figure 7b) are closer to the actual RULs than those of the without-transfer experiment E1 (Figure 7a). Comparing Figure 7d–f, both E5 (Figure 7e) and E6 (Figure 7f) are closer to the actual RULs than E4 (Figure 7d). Comparing Figure 7g,h, E8 (Figure 7h) is closer than E7 (Figure 7g). Comparing Figure 7j–l, both E11 (Figure 7k) and E12 (Figure 7l) are closer than E10 (Figure 7j). Overall, the benefit of transfer is obvious except for E3 and E9, which will be explained in Section 4.
To further examine how different transfers work, we analyzed the IMP performance of transfer experiments that transfer from different source datasets to the same target dataset. Figure 8a–d respectively illustrate the improvement when transferring to the four target datasets (FD001, FD002, FD003, and FD004). Figure 8a shows the IMP performance when transferring to FD001: E2 is effective, with IMP greater than 20% when the number of trajectories is less than or equal to 50, while E3 has a detrimental effect on performance. Figure 8b shows the IMP performance when transferring to FD002: both E5 and E6 are effective, with IMP greater than 15% for E5 and greater than 45% for E6 when the number of trajectories is less than or equal to 50. Figure 8c shows the IMP performance when transferring to FD003: E8 is effective, with IMP greater than 17% when the number of trajectories is less than or equal to 70, while E9 has a detrimental effect on performance. Figure 8d shows the IMP performance when transferring to FD004: both E11 and E12 are effective, with IMP greater than 15% for E11 and greater than 30% for E12 when the number of trajectories is less than or equal to 50.
From all the above results, we also found that working conditions strongly affect transfer learning performance. How they do so is discussed in Section 4.

4. Discussion

In the prior section, we showed that transfer learning is effective, except for E3 and E9. Here, we discuss how working conditions affect transfer performance and analyze the cases in which transfer learning has negative effects.

4.1. How Working Conditions Affect Transfer Learning Performance

To further understand how working conditions affect transfer learning performance, we analyzed the IMP performance of transfer experiments grouped by working condition. Figure 9 shows the IMP performance of transfer learning under different working conditions, and Table 4 lists the experiments shown in Figure 9. Figure 9a,b show how fault conditions influence the performance of transfer learning; Figure 9c,d show how operating conditions influence it.
From Figure 9a,b, we found that when fault conditions are considered, transfer learning improved the prediction models on the target dataset both from a single fault condition to multiple fault conditions (E8, E11) and from multiple fault conditions to a single fault condition (E2, E5).
From Figure 9c,d, we found that when operating conditions are considered, transfer learning from a single operating condition to multiple operating conditions (E6, E12) improved the prediction performance on the target dataset. However, transferring a model trained on multiple operating conditions to a single operating condition (E3, E9) had a detrimental effect on the prediction performance of the target model.
In summary, transfer learning across working conditions yielded effective performance improvements, except when transferring from multiple operating conditions to a single operating condition (E3, E9).

4.2. Negative Transfer

Negative transfer occurs when the information learned from the source domain has a detrimental effect on the prediction model for the target domain [29]. The more similar the source data is to the target data, the weaker the negative transfer effect. We already noted that transfer from multiple operating conditions to a single condition had a detrimental effect. Here we examine negative transfer by comparing the sensors' monitoring values and explain why this transfer direction is harmful.
Figure 10 compares three types of trends in the sensors' monitoring values across the datasets (FD001, FD002, FD003, FD004). Figure 10a–d show the sensor-2 values, representing sensors with an ascending trend. Figure 10e–h show the sensor-12 values, representing sensors with a descending trend. Figure 10i–l show the sensor-16 values, representing a trend that is unchanged in FD001 and FD003 but changes in FD002 and FD004. Note that the operational setting values have a great influence on the sensor values and hence on transfer learning performance.
Comparing Figure 10a,e,i with Figure 10c,g,k, FD001 and FD003 have different fault conditions, yet the distributions of their sensor values are similar. The same holds for FD002 (Figure 10b,f,j) and FD004 (Figure 10d,h,l). Since the distributions of sensor values under different fault conditions are similar, transfer learning from single to multiple fault conditions (E8, E11) and from multiple to single fault conditions (E2, E5) can have positive effects on the prediction model of the target dataset.
Comparing Figure 10a,e,i with Figure 10b,f,j, FD001 and FD002 have different operating conditions, and the distributions of their sensor values differ greatly. The same holds for FD003 (Figure 10c,g,k) and FD004 (Figure 10d,h,l). Consistent with these large distributional differences, our experiments showed that transfer learning from multiple to single operating conditions (E3, E9) had a detrimental effect on prediction performance, while transfer learning from single to multiple operating conditions (E6, E12) had a positive effect on the target model.
What causes these different effects of transfer learning across operating conditions? Two factors may be involved. First, the sensor monitoring data under multiple operating conditions is more complicated than that under a single condition. Second, the distributions of sensor values under multiple operating conditions differ more from one another than those under a single operating condition. The initial parameters transferred from the complicated multi-condition model to the simple single-condition task therefore cause overfitting, and the model is hard to fine-tune with a limited amount of target data. This can be verified from the training loss histories. Figure 11 shows the training loss histories for experiments whose training data were generated by randomly selecting 50 trajectories. Comparing Figure 11a and Figure 11b, E6 (Figure 11a) fine-tunes well and reaches a validation mean score nearly 10 lower than the without-transfer experiment E4, whereas E3 (Figure 11b) overfits, is hard to fine-tune, and ends with a validation mean score nearly 20 higher than the without-transfer experiment E1. The same holds for E12 (Figure 11c) versus E9 (Figure 11d). This may explain why transfer learning from multiple operating conditions to a single condition is detrimental to prediction performance.

5. Conclusions

This paper presented a transfer learning algorithm with bi-directional LSTM neural networks for RUL prediction of a turbofan engine. Our algorithm addresses one of the major challenges in RUL prediction: the difficulty of obtaining a sufficient number of samples for data-driven prognostics. Our transfer prognostic model works by exploiting data samples from different but approximately related tasks for remaining useful life estimation. The method was validated on the C-MAPSS datasets with extensive experiments comparing its performance with that of models trained without transfer learning. The results showed that transfer learning is effective in most cases, except when transferring from a dataset of multiple operating conditions to a dataset of a single operating condition, which led to negative transfer. How to prevent negative transfer remains an open problem; in future work, we will develop more advanced transfer learning methods or data normalization schemes to improve multi-condition RUL prediction performance.

Author Contributions

Conceptualization, A.Z., J.H., S.L.; methodology, A.Z., J.H., Z.L., Y.C.; software, A.Z., G.Y.; investigation, A.Z., J.H.; resources, S.L.; writing—original draft preparation, A.Z., J.H.; writing—review and editing, A.Z., J.H., Z.L. and Y.C.; supervision, S.L. and J.H.; project administration, H.W., J.H. and S.L.; funding acquisition, H.W. and S.L.

Funding

This research was funded by The National Natural Science Foundation of China under Grant Nos. 91746116 and 51741101, Project of Ministry of Industry and Information under Grant No. [2016]213, Science and Technology Project of Guizhou Province under Grant Nos. Talents [2015]4011 and [2016]5013, Collaborative Innovation [2015]02, National Natural Science Foundation of China: 61863005 and Project of Guizhou University’s Technology Crowdfunding for Intelligent Equipment under Grant No. JSZC[2016]001.

Acknowledgments

J.H. gratefully acknowledges the support of NVIDIA Corporation through the donation of the Titan X Pascal GPU used for this research, and the support of the National Institute of Measurement and Testing Technology in providing the semi-anechoic laboratory.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834.
2. Lu, C.; Wang, Z.Y.; Qin, W.L.; Ma, J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2017, 130, 377–388.
3. Zhu, S.P.; Huang, H.Z.; Peng, W.; Wang, H.K.; Mahadevan, S. Probabilistic physics of failure-based framework for fatigue life prediction of aircraft gas turbine discs under uncertainty. Reliab. Eng. Syst. Saf. 2016, 146, 1–12.
4. Yin, S.; Li, X.; Gao, H.; Kaynak, O. Data-based techniques focused on modern industry: An overview. IEEE Trans. Ind. Electron. 2015, 62, 657–667.
5. Lee, J.; Wu, F.; Zhao, W.; Ghaffari, M.; Liao, L.; Siegel, D. Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications. Mech. Syst. Signal Process. 2014, 42, 314–334.
6. Lu, J.; Lu, F.; Huang, J. Performance Estimation and Fault Diagnosis Based on Levenberg–Marquardt Algorithm for a Turbofan Engine. Energies 2018, 11, 181.
7. Fumeo, E.; Oneto, L.; Anguita, D. Condition based maintenance in railway transportation systems based on big data streaming analysis. Procedia Comput. Sci. 2015, 53, 437–446.
8. Stetter, R.; Witczak, M. Degradation Modelling for Health Monitoring Systems. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2014; Volume 570, p. 062002.
9. Khelassi, A.; Theilliol, D.; Weber, P.; Ponsart, J.C. Fault-tolerant control design with respect to actuator health degradation: An LMI approach. In Proceedings of the 2011 IEEE International Conference on Control Applications (CCA), Denver, CO, USA, 28–30 September 2011; pp. 983–988.
10. Jardine, A.K.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510.
11. Eker, O.F.; Camci, F.; Jennions, I.K. Major Challenges in Prognostics: Study on Benchmarking Prognostics Datasets. In Proceedings of the First European Conference of the Prognostics and Health Management Society, Dresden, Germany, 3–5 July 2012.
12. Liu, K.; Chehade, A.; Song, C. Optimize the signal quality of the composite health index via data fusion for degradation modeling and prognostic analysis. IEEE Trans. Autom. Sci. Eng. 2017, 14, 1504–1514.
13. Lei, Y.; Li, N.; Gontarz, S.; Lin, J.; Radkowski, S.; Dybala, J. A model-based method for remaining useful life prediction of machinery. IEEE Trans. Reliab. 2016, 65, 1314–1326.
14. Camci, F.; Medjaher, K.; Zerhouni, N.; Nectoux, P. Feature evaluation for effective bearing prognostics. Qual. Reliab. Eng. Int. 2013, 29, 477–486.
15. Camci, F.; Chinnam, R.B. Health-state estimation and prognostics in machining processes. IEEE Trans. Autom. Sci. Eng. 2010, 7, 581–597.
16. Diamanti, K.; Soutis, C. Structural health monitoring techniques for aircraft composite structures. Prog. Aerosp. Sci. 2010, 46, 342–352.
17. Gebraeel, N.; Elwany, A.; Pan, J. Residual life predictions in the absence of prior degradation knowledge. IEEE Trans. Reliab. 2009, 58, 106–117.
18. Eker, O.F.; Camci, F.; Guclu, A.; Yilboga, H.; Sevkli, M.; Baskan, S. A simple state-based prognostic model for railway turnout systems. IEEE Trans. Ind. Electron. 2011, 58, 1718–1726.
19. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
20. Li, S.; Liu, G.; Tang, X.; Lu, J.; Hu, J. An ensemble deep convolutional neural network model with improved D-S evidence fusion for bearing fault diagnosis. Sensors 2017, 17, 1729.
21. Li, S.; Yao, Y.; Hu, J.; Liu, G.; Yao, X.; Hu, J. An ensemble stacked convolutional neural network model for environmental event sound recognition. Appl. Sci. 2018, 8, 1152.
22. Yao, Y.; Wang, H.; Li, S.; Liu, Z.; Gui, G.; Dan, Y.; Hu, J. End-to-end convolutional neural network model for gear fault diagnosis based on sound signals. Appl. Sci. 2018, 8, 1584.
23. Babu, G.S.; Zhao, P.; Li, X.L. Deep Convolutional Neural Network Based Regression Approach for Estimation of Remaining Useful Life. In International Conference on Database Systems for Advanced Applications; Springer: Cham, Switzerland, 2016; pp. 214–228.
24. Zheng, S.; Ristovski, K.; Farahat, A.; Gupta, C. Long short-term memory network for remaining useful life estimation. In Proceedings of the 2017 IEEE International Conference on Prognostics and Health Management (ICPHM), Allen, TX, USA, 19–21 June 2017; pp. 88–95.
25. Zhao, R.; Yan, R.; Wang, J.; Mao, K. Learning to monitor machine health with convolutional Bi-directional LSTM networks. Sensors 2017, 17, 273.
26. Wu, Y.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing 2018, 275, 167–179.
27. Li, Z.; Wu, D.; Hu, C.; Terpenny, J. An ensemble learning-based prognostic approach with degradation-dependent weights for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2018.
28. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
29. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A Survey of Transfer Learning. J. Big Data 2016.
30. Meng, Z.; Chen, Z.; Mazalov, V.; Li, J.; Gong, Y. Unsupervised adaptation with domain separation networks for robust speech recognition. In Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 16–20 December 2017; pp. 214–221.
31. Singh, K.K.; Divvala, S.; Farhadi, A.; Lee, Y.J. DOCK: Detecting Objects by Transferring Common-Sense Knowledge. In European Conference on Computer Vision; Springer: Berlin, Germany, 2018; pp. 506–522.
32. Cao, H.; Bernard, S.; Heutte, L.; Sabourin, R. Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images. In International Conference Image Analysis and Recognition; Springer: Berlin, Germany, 2018; pp. 779–787.
33. Ramasso, E.; Saxena, A. Performance Benchmarking and Analysis of Prognostic Methods for CMAPSS Datasets. Int. J. Prognostics Health Manag. 2014, 5, 1–15.
34. Saxena, A.; Goebel, K. Turbofan Engine Degradation Simulation Data Set. NASA Ames Prognostics Data Repository. 2008. Available online: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ (accessed on 20 November 2018).
35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
36. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
37. Heimes, F. Recurrent Neural Networks for Remaining Useful Life Estimation. In Proceedings of the 2008 International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–6.
Figure 1. Operational setting values in every sub-dataset.
Figure 2. Transfer learning model architecture.
Figure 3. Extracting input samples.
Figure 4. Operating setting clusters.
Figure 5. Piece-wise linear RUL target function.
Figure 6. Box plot of the mean score for every experiment. The blue boxes show the without-transfer models and the yellow and green boxes the transfer models. The X-axis is the number of randomly selected trajectories (10, 20, ..., 90, 100). In the caption, A→B means transfer from source data A to target data B.
Figure 7. Comparison of the RUL estimation results on test data for experiments whose training data were generated by randomly selecting 50 trajectories. In each subfigure, the X-axis is the test units sorted by increasing actual RUL and the Y-axis is the RUL value; the blue points are the actual RULs and the red points the predicted RULs. In the subfigure captions, A→B means transfer from source data A to target data B.
Figure 8. Comparison of the IMP performances for transfer experiments.
Figure 9. Comparison of transfer learning performance improvement with different working conditions. In the caption, F denotes the fault condition, O the operating condition, and the number after the symbol the type of working condition from Table 2; A→B means transfer from working condition A to working condition B.
Figure 10. Comparison of sensor monitoring data values.
Figure 11. Training loss history for experiments in which the training data set is generated by randomly selecting 50 trajectories.
Table 1. C-MAPSS Datasets.

Dataset                      FD001   FD002   FD003   FD004
Train trajectories           100     260     100     249
Test trajectories            100     259     100     248
Maximum life span (cycles)   362     378     525     543
Average life span (cycles)   206     206     247     245
Minimum life span (cycles)   128     128     145     128
Operating conditions         1       6       1       6
Fault conditions             1       1       2       2
Table 2. Experiments.

Label   Transfer (From→To)    Operating Conditions   Fault Conditions
E1      FD001 no transfer     1                      1
E2      FD003→FD001           1→1                    2→1
E3      FD002→FD001           6→1                    1→1
E4      FD002 no transfer     6                      1
E5      FD004→FD002           6→6                    2→1
E6      FD001→FD002           1→6                    1→1
E7      FD003 no transfer     1                      2
E8      FD001→FD003           1→1                    1→2
E9      FD004→FD003           6→1                    2→2
E10     FD004 no transfer     6                      2
E11     FD002→FD004           6→6                    1→2
E12     FD003→FD004           1→6                    2→2
Table 3. Experiment results: mean values of every experiment, each repeated three times.

Mean Score (by number of training trajectories)

Label  From→To / IMP        10      20      30      40      50      60      70      80      90      100
E1     FD001 no transfer    21.71   10.06   12.35   4.17    3.07    2.9     2.76    2.65    2.54    2.65
E2     FD003→FD001          3.23    3.45    2.67    2.63    2.37    2.42    2.37    2.36    2.26    2.24
       IMP (%)              85.12   65.7    78.38   36.9    23.01   16.41   13.97   11.15   10.9    15.65
E3     FD002→FD001          43.36   46.51   7.2     3.84    5.39    3.52    6.55    17.38   3.82    4.3
       IMP (%)              −99.7   −362    41.69   7.77    −75.3   −21.4   −137    −555    −50.4   −62.2
E4     FD002 no transfer    23.79   17.5    13.78   13.11   11.55   9.68    9.64    9.07    9.04    8.37
E5     FD004→FD002          13.51   12.06   10.22   10.45   9.72    8.88    9.27    8.87    7.78    8.45
       IMP (%)              43.19   31.1    25.84   20.31   15.85   8.29    3.83    2.21    13.9    −1.04
E6     FD001→FD002          11.6    11.39   8.97    8.98    7.87    7.81    8.03    8.6     7.46    7.04
       IMP (%)              51.24   34.88   34.87   31.55   31.84   19.31   16.72   5.16    17.41   15.87
E7     FD003 no transfer    24.47   12.48   9.4     10.25   10.91   7.33    6.26    4.41    3.28    4.79
E8     FD001→FD003          14.24   7.65    7.72    7.08    5.45    5.06    4.06    4.41    3.69    4
       IMP (%)              41.81   38.69   17.92   30.92   50.05   30.93   35.17   0.05    −12.5   16.51
E9     FD004→FD003          41.14   36.97   12.55   19.65   8.85    18.83   8.35    8.91    8.71    9.6
       IMP (%)              −68.1   −196    −33.5   −91.8   18.86   −157    −33.4   −102    −166    −101
E10    FD004 no transfer    38.52   31.79   24.73   22.78   25.06   18.62   21.77   18.17   16.66   18.2
E11    FD002→FD004          27      23.3    20.98   18.22   17.59   15.59   16.26   15.12   15.73   16.16
       IMP (%)              29.9    26.71   15.18   20.02   29.8    16.26   25.29   16.77   5.57    11.19
E12    FD003→FD004          25.45   16.3    13.91   13.32   13.37   12.43   13.1    12.53   11.71   11.75
       IMP (%)              33.92   48.73   43.76   41.52   46.64   33.23   39.83   31.06   29.69   35.47

RMSE (by number of training trajectories)

Label  From→To / IMP        10      20      30      40      50      60      70      80      90      100
E1     FD001 no transfer    26.36   22.07   21.18   16.62   15.12   15.06   14.64   14.62   14.2    14.26
E2     FD003→FD001          15.79   16.17   14.4    14.61   13.69   14.02   14.11   14.08   13.73   13.65
       IMP (%)              40.09   26.7    31.99   12.12   9.41    6.88    3.63    3.66    3.31    4.22
E3     FD002→FD001          39.89   40.89   21.05   17.15   18.94   16.45   19.78   26.73   17.24   18.3
       IMP (%)              −51.3   −85.3   0.6     −3.19   −25.3   −9.21   −35.2   −82.9   −21.4   −28.4
E4     FD002 no transfer    28.44   26.35   24.69   25.1    23.32   22.29   22.16   22.35   21.98   21.7
E5     FD004→FD002          22.94   23.1    21.57   21.56   22.15   21.03   21.35   21.15   20.5    20.83
       IMP (%)              19.36   12.34   12.62   14.1    5.03    5.64    3.67    5.35    6.75    4.04
E6     FD001→FD002          24.46   23.48   22.35   21.66   21.26   20.79   21.05   21.81   20.96   20.77
       IMP (%)              13.99   10.89   9.45    13.7    8.85    6.72    5.03    2.43    4.63    4.28
E7     FD003 no transfer    26.52   22.99   21.09   21.14   21.92   19.23   17.65   16.33   14.71   16.33
E8     FD001→FD003          23.66   19.8    19.14   17.96   16.57   16.96   15.56   16.11   14.44   14.34
       IMP (%)              10.8    13.86   9.28    15.03   24.39   11.81   11.84   1.34    1.85    12.18
E9     FD004→FD003          36.95   35.46   24.91   29.34   23.05   28.58   22.62   22.97   22.58   24.1
       IMP (%)              −39.3   −54.2   −18.1   −38.8   −5.16   −48.7   −28.2   −40.7   −53.5   −47.7
E10    FD004 no transfer    33.15   30.56   28.74   28.62   28.17   27.2    27.29   26.75   26.23   25.9
E11    FD002→FD004          29.21   29.14   27.2    27.25   26.86   26.22   25.75   25.64   25.55   25.44
       IMP (%)              11.89   4.62    5.35    4.8     4.66    3.62    5.66    4.14    2.6     1.75
E12    FD003→FD004          26.39   25.76   25.07   25.66   25.34   24.69   24.36   24.36   23.67   23.4
       IMP (%)              20.39   15.71   12.75   10.36   10.03   9.25    10.74   8.94    9.75    9.65

Notes: IMP is the improvement with transfer learning versus without transfer learning: $IMP = (1 - WithTransfer / NoTransfer) \times 100$.
Table 4. Information on the experiments in Figure 9.

Figure      Label   Transfer (From→To)   Operating Conditions   Fault Conditions
Figure 9a   E8      FD001→FD003          1→1                    1→2
            E2      FD003→FD001          1→1                    2→1
Figure 9b   E11     FD002→FD004          6→6                    1→2
            E5      FD004→FD002          6→6                    2→1
Figure 9c   E6      FD001→FD002          1→6                    1→1
            E3      FD002→FD001          6→1                    1→1
Figure 9d   E12     FD003→FD004          1→6                    2→2
            E9      FD004→FD003          6→1                    2→2
