1. Introduction
Hydropower is an important renewable and clean energy source. With increasingly severe energy and climate challenges, it is imperative to develop hydropower safely and efficiently. As critical equipment for hydropower utilization, the Francis turbine unit (FTU) also undertakes essential tasks such as peak and frequency regulation and emergency standby in the power grid. Therefore, ensuring the safe and stable operation of the FTU is of great significance in promoting the development of the national economy and ensuring the stability of the energy system [1]. Currently, the maintenance strategy of the FTU is mainly routine maintenance and repair after failure, which is costly and makes it difficult to recognize early signs of faults in time. Therefore, the performance evaluation and degradation trend prediction of the FTU have attracted increasing attention [2,3,4]. Although studies of general rotating machinery prognostics are relatively mature [5,6], two practical difficulties remain in engineering applications of the FTU: (1) The quality of on-site measured data is usually low, characterized by low sampling frequency, missing data and anomalous data. (2) The drastic variation in operating conditions makes it difficult to evaluate and predict the performance of the FTU accurately.
The working environment of the FTU is hostile, with external interference such as moisture, dust, vibration and electromagnetic disturbance. The supporting monitoring and data acquisition system of the FTU often involves multiple distributed modules with complex data transmission links and long physical distances. Hence, the on-site raw data of FTUs often contain many anomalous samples and missing values, caused by sensor failure, short-term router failure or electromagnetic interference [7]. Meanwhile, the storage space of the data acquisition system is limited, and the short-term variation of each monitored quantity is not evident over the long-term working period of the FTU, so the storage frequency of the on-site data is often relatively low. Some studies on the performance evaluation and remaining useful life prediction of rotating machinery have achieved excellent results in laboratory environments. However, these approaches often rely on high-quality data and are difficult to apply directly to the engineering practice of the FTU [8,9,10]. To handle anomalous data, some studies have adopted denoising methods based on frequency domain analysis or energy spectrum analysis, which are effective when the sampling frequency is high and consistent [11,12,13]. However, missing values and variable operating conditions significantly reduce their effectiveness. Clustering methods have been shown to perform well in recognizing outliers in high-dimensional data [14,15,16]. As a density-based clustering method, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is widely used in anomaly detection due to its simple structure and good adaptability to high-dimensional data [17,18,19,20]. Hence, DBSCAN is adopted to clean the raw monitoring data in this paper.
In addition to the samples already missing from the raw data set, data cleaning further increases the amount of missing data. If these missing values were simply deleted, potentially important information might be discarded. To evaluate and predict the performance of the FTU more effectively, it is necessary to fill in the missing data appropriately. Traditional statistical filling methods, such as mean filling and median filling, mostly ignore the temporal information in the data. With the rapid development of machine learning, more and more studies process sequences with missing values using improved recurrent neural networks (RNN) [21,22,23]. Che et al. introduced a decay mechanism into the typical gated recurrent unit (GRU) model to construct the GRUD model. The decay mechanism was shown to enable GRUD to effectively learn potential patterns in sequences with missing values [24]. GRUD has achieved good results in the prediction of incomplete sequences. However, such supervised regression methods cannot be directly used to generate complete sequences because no training targets can be set for the missing values. As one of the most promising models for unsupervised learning on complex distributions, the generative adversarial network (GAN) has made outstanding achievements in nonlinear model analysis and image generation [25,26,27]. Based on the competitive learning between the generator and the discriminator, this model can adaptively learn the expression paradigm in the input data [28,29,30]. The classic GAN model has problems such as training difficulties and mode collapse. Arjovsky et al. introduced the Wasserstein distance to guide the training process, which significantly improved the performance of the GAN model [31,32]. At present, the ability of the Wasserstein GAN (WGAN) to generate time series data remains to be studied. Therefore, in this paper, the GRUD–WGAN model is proposed to realize the missing data imputation of the on-site data.
Because of the influence of natural inflow conditions and the regulation requirements of the power grid, the operating parameters of the FTU vary over a wide range and change frequently. The monitoring data are highly correlated with operating conditions. Traditional performance evaluation methods of the FTU are primarily based on over-limit alarms with fixed thresholds, ignoring the correlation between monitoring data and operating condition parameters [33]. To solve this problem, Shan et al. adopted the backpropagation neural network to construct the nonlinear mapping between operating parameters and the vibration amplitude of the lower bracket [34]. This research realizes the evaluation of the FTU under variable operating conditions. However, a deterministic numerical mapping is susceptible to the random fluctuation of signals. Therefore, Gaussian process regression (GPR) is introduced to establish a probabilistic mapping model between operating parameters and the probability density distribution of monitored values, so as to improve the robustness of the performance evaluation model against random noise.
After quantifying the abstract FTU performance into performance degradation indicators (PDI), the degradation trend prediction problem is essentially transformed into a time series forecasting task. Because RNNs, such as the GRU and the long short-term memory (LSTM) network, can learn the potential temporal information, they are widely used in sequence prediction [35,36,37,38]. Shih et al. added the temporal pattern attention (TPA) mechanism to the LSTM structure to further improve the performance of the LSTM model in mining temporal dependencies [39]. In this paper, the TPA–LSTM is adopted to construct the prediction model for the performance degradation indicator of the FTU.
To sum up, in the field of performance evaluation and prediction of the FTU, there are few studies on the cleaning of low-quality on-site data. Data imputation methods for incomplete FTU data sets are not yet mature. Meanwhile, few studies have considered the random fluctuation of data when establishing a healthy-state model of the FTU. In addition, there is room for further improvement in the accuracy of PDI prediction models.
In this paper, an approach for the performance evaluation and prediction of FTU considering low-quality data and variable operating conditions is proposed. The main contributions are highlighted as follows:
- (1) Considering the variable operating conditions and the characteristics of the anomaly samples, an on-site data-cleaning method based on DBSCAN is constructed to adaptively detect both singulars and outliers.
- (2) Combining the incomplete sequence information mining ability of the GRUD and the hidden pattern learning ability of the WGAN, the GRUD–WGAN-based missing value imputation model is proposed to improve the utilization value of low-quality data.
- (3) Based on the GPR, the mapping relationship between the operating parameters and the distribution of monitored data is established as the healthy-state probability model of the FTU. The robustness of the healthy-state model to random noise is improved because the probability distribution, instead of a single value, is taken into consideration.
- (4) The TPA–LSTM-based PDI prediction model is constructed to realize accurate degradation trend prediction as the basis for predictive maintenance of the FTU.
The rest of this article is organized as follows. The framework and procedures of the proposed approach are explained in detail in Section 2. The proposed method is applied to a large practical FTU, and the results are presented in Section 3. The performance of the proposed data imputation model and the trend prediction model is compared and discussed in Section 4. Finally, the conclusion is given in Section 5.
2. Proposed Method
In this paper, an approach for the performance evaluation and prediction of FTUs considering low-quality data and variable operating conditions is proposed. The proposed framework is illustrated in Figure 1. First, the DBSCAN algorithm is introduced to clean the anomaly samples in the raw monitoring data set of the FTU, which includes the water head, the active power and the vibration amplitude of the top cover. Second, the GRUD–WGAN model is proposed to fill in the missing values in the raw data set or those caused by data cleaning. Third, the healthy-state probabilistic model of the FTU under complex operating conditions is established based on the complete data set and the GPR algorithm. The negative log-likelihood probability (NLLP) between the data to be evaluated and the healthy-state model is defined as the PDI of the FTU. Finally, to forecast the degradation trend of the FTU, the PDI prediction model is constructed based on the TPA–LSTM algorithm.
2.1. Data Cleaning
The condition monitoring systems of large FTUs usually have a huge scale and complex structures. The state-monitoring data and operating condition parameters are usually distributed and monitored by several different monitoring modules, and collected into the computer monitoring system of the hydropower station through various data communication protocols and long-distance communication cables, which are prone to communication packet loss or short-term failure. Moreover, FTUs work in a humid, high-electromagnetic-interference and drastic-vibration environment. These factors lead to an apparent anomaly or missing values in the on-site raw data. Traditional signal denoising methods are mainly based on signal decomposition and reconstruction. Frequent changes in operating conditions will affect the effectiveness of these methods. Meanwhile, random missing values make these methods challenging to apply to engineering practice.
The DBSCAN algorithm is a kind of unsupervised clustering method. As an effective density-clustering method, DBSCAN can adaptively identify clusters with irregular shapes and automatically mark sample points with low density as noise. The DBSCAN is adopted to adaptively recognize the anomaly values in the raw data set. The schematic diagram of the DBSCAN algorithm is illustrated in
Figure 2.
For each sample $x_i$ in the data set $D$, the region for which the Euclidean distance from the sample point $x_i$ is no greater than $\varepsilon$ is defined as the $\varepsilon$-neighborhood of $x_i$. Its element set is expressed as:
$$N_\varepsilon(x_i) = \{\, x_j \in D \mid d(x_i, x_j) \le \varepsilon \,\}$$
where $d(x_i, x_j)$ represents the Euclidean distance between samples $x_i$ and $x_j$.
If the number of elements of $N_\varepsilon(x_i)$ is no less than the minimum density threshold $MinPts$, sample $x_i$ is defined as a core point. If sample $x_j$ is in the $\varepsilon$-neighborhood of a core point $x_i$, $x_j$ is called directly density-reachable from $x_i$. If sample $x_k$ is directly density-reachable from such a core point $x_j$, but is not directly density-reachable from $x_i$, then $x_k$ is called density-reachable from $x_i$. If sample $x_j$ is located in the $\varepsilon$-neighborhood of a particular core point, but is not itself a core point, then $x_j$ is defined as a border point.
The samples which are directly density-reachable or density-reachable from a core point form a cluster, as illustrated by the blue circles in Figure 2. The samples that do not belong to any cluster are noise points, marked as red points in Figure 2. The noise points recognized in the raw data set are removed.
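The cleaning step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this study: the names `dbscan`, `region_query`, `eps` and `min_pts` are hypothetical, the sample points are made up, and the neighborhood test uses the common "distance no greater than eps, at least min_pts neighbors" convention.

```python
import math

NOISE = -1  # label used for noise points (hypothetical convention)

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1           # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels

# Made-up 2-D samples: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
cleaned = [p for p, lab in zip(points, labels) if lab != NOISE]
```

In this toy example the isolated point is labeled as noise and dropped from `cleaned`, while both dense groups are kept as separate clusters.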
2.2. Missing Value Imputation
To provide a complete data set for subsequent health status evaluation, the GRUD–WGAN model is proposed to fill in the missing values. The main idea of the GRUD–WGAN is to use GRUD to receive incomplete sequences with missing values and convert them into complete hidden sequences. Then, through the adversarial training of the generator and the discriminator under the WGAN framework, the distribution of valid values is learned adaptively, so as to generate a proper complete sequence.
2.2.1. GRUD
GRU has been widely proved to have an excellent ability to capture dependencies between time series data [
37]. However, the traditional GRU cannot handle sequences with missing values. Based on the GRU model, GRUD adds the decay mechanism to estimate the missing values according to the previous sequence [
24]. The schematic diagram of the GRUD model is shown in
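Figure 3. The trainable decay coefficient $\gamma_t$ is defined as:
$$\gamma_t = \exp\{-\max(0,\; W_\gamma \delta_t + b_\gamma)\}$$
where $\delta_t$ is the time interval between the current moment $t$ and the last non-missing value, and $W_\gamma$ and $b_\gamma$ represent the weight and bias of a neural network so that $\gamma_t$ can be updated during the training process. Following [24], separate coefficients of this form, $\gamma_{x_t}$ and $\gamma_{h_t}$, are used for the input and the hidden state.
The missing values of the input data $x_t$ are replaced by $\hat{x}_t$, expressed for the $d$-th variable as:
$$\hat{x}_t^d = m_t^d x_t^d + (1 - m_t^d)\left(\gamma_{x_t}^d x_{t'}^d + (1 - \gamma_{x_t}^d)\,\bar{x}^d\right)$$
where $m_t^d$ is the mask code, $x_{t'}^d$ is the last non-missing value, and $\bar{x}^d$ indicates the mean value of $x^d$.
The decay mechanism is also applied to the hidden state $h_{t-1}$ to enhance the learning of missing value patterns:
$$\hat{h}_{t-1} = \gamma_{h_t} \odot h_{t-1}$$
where $\odot$ represents element multiplication.
Moreover, the mask code $m_t$ is fed into the GRU cell directly. Finally, the update functions of GRUD are as follows:
$$r_t = \sigma(W_r \hat{x}_t + U_r \hat{h}_{t-1} + V_r m_t + b_r)$$
$$z_t = \sigma(W_z \hat{x}_t + U_z \hat{h}_{t-1} + V_z m_t + b_z)$$
$$\tilde{h}_t = \tanh(W \hat{x}_t + U (r_t \odot \hat{h}_{t-1}) + V m_t + b)$$
$$h_t = (1 - z_t) \odot \hat{h}_{t-1} + z_t \odot \tilde{h}_t$$
where $r_t$ and $z_t$ represent the reset gate and the update gate, and $\sigma$ and $\tanh$ indicate the sigmoid and tanh activation functions.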
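The input decay described above can be illustrated for a single variable. The following is a minimal sketch under simplifying assumptions: the decay weight `w` and bias `b` are fixed scalars here (in the model they are trained), the time step `dt` is constant, and the empirical mean is used before the first observation. The function names are hypothetical.

```python
import math

def time_gaps(mask, dt=1.0):
    """Elapsed time since the last observed value at each step (0 at the first step)."""
    gaps = [0.0]
    for t in range(1, len(mask)):
        gaps.append(dt + (gaps[t - 1] if mask[t - 1] == 0 else 0.0))
    return gaps

def decay(delta, w, b):
    """Decay coefficient exp(-max(0, w * delta + b)), always in (0, 1]."""
    return math.exp(-max(0.0, w * delta + b))

def decayed_input(x, mask, w, b, dt=1.0):
    """Replace missing entries by a decayed mix of the last observation and the mean."""
    observed = [v for v, m in zip(x, mask) if m == 1]
    mean = sum(observed) / len(observed)
    last = mean  # assumption: fall back to the mean before the first observation
    out = []
    for v, m, d in zip(x, mask, time_gaps(mask, dt)):
        if m == 1:
            last = v
            out.append(v)
        else:
            g = decay(d, w, b)
            out.append(g * last + (1.0 - g) * mean)
    return out

x = [1.0, None, None, 4.0]   # None marks a missing value
mask = [1, 0, 0, 1]          # 1 = observed, 0 = missing
forward_fill = decayed_input(x, mask, w=0.0, b=0.0)  # no decay: last value is held
decayed = decayed_input(x, mask, w=1.0, b=0.0)       # drifts toward the mean 2.5
```

With `w = 0` the coefficient stays at 1 and the scheme reduces to forward filling. With a positive `w`, the filled values move from the last observation toward the mean as the gap grows.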
2.2.2. WGAN
The GAN model is inspired by the two-player zero-sum game [
26]. The typical structure of the GAN includes a discriminator and a generator. The goal of the discriminator is to correctly distinguish between actual data sampled from the input data set and fake data generated by the generator. The purpose of the generator is to produce fake data that can deceive the discriminator. GAN is trained by alternating adversarial learning between discriminator and generator, and the optimum objective is to achieve Nash equilibrium. Finally, the generator can accurately estimate the distribution of data samples. However, the original GAN has some problems, such as training difficulty and mode collapse. Therefore, Arjovsky et al. proposed the WGAN model, which adopts the Wasserstein distance instead of the Jensen–Shannon divergence to indicate the difference between the actual samples and the generated samples [
32]. Specifically, the Wasserstein distance of the WGAN can be expressed in simplified form as:
$$W(P_r, P_g) \approx \max_{D}\; \mathbb{E}_{x_r \sim P_r}[D(x_r)] - \mathbb{E}_{x_g \sim P_g}[D(x_g)]$$
where $\mathbb{E}$ represents the mathematical expectation, $x_r$ and $x_g$ represent samples of the real data and the generated data, respectively, and $D$ indicates a neural network model for which the last layer is not a nonlinear activation layer.
The introduction of the Wasserstein distance alleviates the vanishing gradient problem. To minimize the Wasserstein distance, the loss functions of the generator, $L_G$, and the discriminator, $L_D$, of the WGAN can be expressed as:
$$L_G = -\mathbb{E}_{x_g \sim P_g}[D(x_g)]$$
$$L_D = \mathbb{E}_{x_g \sim P_g}[D(x_g)] - \mathbb{E}_{x_r \sim P_r}[D(x_r)]$$
In addition, to satisfy the Lipschitz continuity condition, the parameters of the discriminator network need to be clipped to $[-c, c]$. $c$ is a fixed constant, and its value does not affect the direction of the gradient.
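As a small numerical illustration of the losses and the clipping step above, the sketch below works on plain lists of discriminator outputs and weights. It assumes the standard WGAN form of the losses; the function names and the numbers are made up, and no training loop is shown.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(d_real, d_fake):
    """Discriminator loss: mean score on generated data minus mean score on real data."""
    return mean(d_fake) - mean(d_real)

def generator_loss(d_fake):
    """Generator loss: negative mean score of the discriminator on generated data."""
    return -mean(d_fake)

def clip_weights(weights, c):
    """Clip every discriminator parameter to the interval [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

# Made-up discriminator outputs for a batch of real and generated samples.
d_real = [1.0, 3.0]
d_fake = [0.0, 1.0]

loss_d = critic_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
clipped = clip_weights([-0.5, 0.002, 0.3], c=0.01)
```

Minimizing `loss_d` pushes the scores of real samples above those of generated samples, while minimizing `loss_g` pushes the scores of generated samples up. The clipping is applied to the discriminator parameters after each of its updates.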
2.2.3. GRUD–WGAN Model
The GRUD and the WGAN models are combined to establish the GRUD–WGAN model, as shown in
Figure 4. The GRUD model is adopted as the essential component of both the generator and the discriminator of the WGAN framework. Specifically, the generator is constructed with a GRUD layer and a linear layer. The incomplete input sequence $X$, a window of fixed length ending at the current time, is fed into the GRUD model, which outputs the hidden state vectors. The linear layer maps these hidden state vectors to a reconstructed sequence $X'$ of the same shape as the input sequence. Then, the complete output sequence $\hat{X}$ is calculated according to the mask code matrix $M$, expressed as:
$$\hat{X} = M \odot X + (1 - M) \odot X'$$
The discriminator is also built by a GRUD layer and a linear layer. The GRUD is adopted to accept the incomplete sequences of the real data set or the complete imputed data set produced by the generator. The linear layer is used to map the hidden state vectors to a single value, which indicates the Wasserstein distance.
To make full use of the available data to accelerate convergence, the reconstruction error
is added to the loss function of the generator, given by:
When the training process converges, a generator that receives incomplete sequences and outputs complete sequences is obtained.
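The way the generator output is merged with the observed values can be sketched as follows. The mask-based merge follows the description above. The reconstruction error is shown as a mean squared error over the observed entries only, which is an assumption made for illustration; the exact form used in the model may differ. Function names and numbers are hypothetical.

```python
def merge_with_mask(x, x_rec, mask):
    """Keep observed values (mask = 1) and take reconstructed values where mask = 0."""
    return [xi if m == 1 else ri for xi, ri, m in zip(x, x_rec, mask)]

def masked_mse(x, x_rec, mask):
    """Assumed reconstruction error: mean squared error on observed entries only."""
    errors = [(xi - ri) ** 2 for xi, ri, m in zip(x, x_rec, mask) if m == 1]
    return sum(errors) / len(errors)

x = [1.0, None, 3.0]      # incomplete input, None marks a missing value
x_rec = [1.5, 2.0, 2.5]   # made-up output of the linear layer
mask = [1, 0, 1]

x_complete = merge_with_mask(x, x_rec, mask)
rec_error = masked_mse(x, x_rec, mask)
```

Only the observed positions contribute to `rec_error`, since no target exists for the missing positions. Those positions are constrained through the adversarial loss instead.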
2.3. Healthy-State Model Construction
To realize the accurate performance evaluation of FTUs under variable operating conditions, the mapping relationship between operating parameters and the probability density distribution of the vibration amplitude is constructed based on the GPR algorithm. GPR is a non-parametric regression method based on the probability statistical theory, which shows a strong generalization ability and adaptability in dealing with complex fitting and regression tasks. The GPR constructs the time series model through the Gaussian prior knowledge. The Gaussian prior is the distribution of
values corresponding to each independent variable
. It can be described by the mean function
and the covariance function
, expressed as:
According to the Bayesian inference, the joint distribution of actual observation samples
and the dependent variable
also obeys the Gaussian distribution, given by:
where the superscript * indicates the actual observation value.
After expanding Equation (17), the mean and the variance of
can be expressed as:
According to the Gaussian distribution formula, the probability density of $y$, with predictive mean $\mu$ and variance $\sigma^2$, can be expressed as:
$$p(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right)$$
The healthy standard distribution model of the vibration amplitude is constructed from the monitoring data acquired during the normal working period of the FTU. The NLLP between the healthy standard distribution and the data to be evaluated $y$ is defined as the PDI, given by:
$$\mathrm{PDI} = \mathrm{NLLP} = -\ln p(y)$$
The smaller the NLLP value is, the more similar the distribution of the data to be evaluated is to the healthy standard distribution, and the better the FTU status is, and vice versa.
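For a single evaluated sample, the NLLP under a Gaussian predictive distribution can be computed directly from the predicted mean and variance. The sketch below assumes these two quantities have already been obtained from a fitted GPR model. The natural logarithm, the function name and the numbers are assumptions for illustration.

```python
import math

def gaussian_nll(y, mu, var):
    """Negative log-likelihood of y under a normal distribution N(mu, var)."""
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)

# Made-up healthy-state prediction at one operating condition.
mu, var = 0.0, 1.0

nll_typical = gaussian_nll(0.0, mu, var)   # sample at the predicted mean
nll_deviant = gaussian_nll(3.0, mu, var)   # sample three standard deviations away
```

A sample close to the healthy-state mean gives a low value, and the value grows quadratically with the deviation, scaled by the predicted variance. This is why the same deviation is penalized more at operating conditions where the healthy distribution is narrow.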
2.4. Degradation Trend Prediction
After quantifying the differences between the data to be evaluated and the health model as the PDIs of the FTU, the degradation trend prediction task is converted to a time series forecasting problem. The TPA–LSTM introduced a temporal pattern attention mechanism based on the 1DCNN to the traditional LSTM model. The TPA–LSTM effectively improves the accuracy and stability of time series prediction by learning the attention weights of previous hidden states [
39]. The basic framework of the TPA–LSTM model is shown in
Figure 5.
The classic LSTM network is adopted to calculate the hidden state
with
dimensions according to the input data
. Then, a one-dimensional convolution operation with
kernels is performed on the matrix constructed with previous hidden state vectors to gain various temporal patterns, expressed as:
where
is the window length of the input data,
represents the time stamp, and
is the
1DCNN kernel.
The attention weights vector
is calculated by the scoring function, which is essentially a fully connected layer.
The context vector
and the final hidden state
are defined as:
3. Engineering Application
To verify the effectiveness of the proposed method, a large-scale FTU in an actual engineering environment was selected as the research object. This section begins with a brief introduction to the basic information of the FTU. Then, the long-term monitoring records of the water head, the active power and the vibration amplitude of the top cover were obtained from the computer monitoring system of the hydropower station to form the raw data set. Next, anomaly samples in the raw data were removed based on DBSCAN. In addition, the GRUD–WGAN model was established to fill in the missing values. On this basis, the healthy-state probability model of the FTU was constructed based on the GPR algorithm, and the NLLP was defined as the PDI of the FTU. Finally, the TPA–LSTM model was built to realize the degradation trend prediction of the FTU.
3.1. Research Object
The studied FTU is located on the upper reaches of the Dadu River in Sichuan Province, western China. It is a large-capacity unit with a medium-high water head. Its essential performance parameters are listed in Table 1, and the basic structure is shown in Figure 6. The working state of the FTU is closely related to the operating parameters, so the operating conditions must be considered. The operating condition of the FTU can be described by the water head and the active power. The top cover is located between the turbine and the generator. It seals the runner chamber and connects the main shaft. As a critical component, its vibration amplitude can reflect the working state of the FTU. The position of the monitoring point is illustrated in Figure 7. Therefore, the raw sample set, which includes both operating parameters and monitoring data, is formed by the water head, the active power and the vibration amplitude of the top cover.
The operating parameters of the FTU are monitored by the supervisory control system of the hydropower station. The vibration signals of critical components are acquired by a PSTA-2100 state monitoring system. They are transmitted to the computer monitoring system of the hydropower station through the TCP/IP protocol and Modbus 485 protocol, respectively. The physical distance of the transmission link is usually above several thousand meters, including multiple switches, routers, and different transceiver devices. In addition, these monitoring and communication systems work in the extreme environment of high humidity, strong vibration, and high electromagnetic interference, which may result in short-term failures. Consequently, the raw data directly exported from the computer monitoring system are often low quality, which manifests as data anomalies and data loss.
3.2. On-Site Data Cleaning
The acquired data include the water head, active power and vibration amplitude records from 20 January 2019 to 11 October 2019, with a sampling interval of 30 min, for a total of 12,638 samples. The raw data are shown in Figure 8. There is a long-term fluctuation trend in the water head data because of the seasonal fluctuation of upstream and downstream water levels. The active power is specified by the dispatching center according to the real-time load demand of the power network. Hence, it shows high-frequency, short-term fluctuations. The vibration amplitude is affected by the variation in these operating parameters, so its variation is complicated. Therefore, the operating condition parameters should be considered in data cleaning and subsequent evaluation. In addition, the overall missing rate of the raw data is 0.243, and anomalies in the vibration data are obvious.
The raw data were combined into a three-dimensional point set of water head, active power and vibration amplitude, as shown in Figure 9. Due to the characteristics of the FTU, the oblique blank area is the restricted operating region. The FTU usually avoids working in this region because of the high vibration and low efficiency. The valid data are concentrated in the operating condition areas on both sides. In addition, the anomaly data include singulars, whose amplitudes differ from the standard values, and outliers, whose values are within the normal range but whose distribution is inconsistent with the standard values. The point set was inputted into the DBSCAN model, and the radius $\varepsilon$ and the minimum number of samples within a cluster, $MinPts$, were determined by the silhouette score $S$, defined as:
$$S = \frac{1}{N}\sum_{i=1}^{N} \frac{b_i - a_i}{\max(a_i, b_i)}$$
where $N$ is the number of samples, $a_i$ is the average distance between the $i$th sample and other samples in the same cluster, and $b_i$ is the average distance between the $i$th sample and all other samples in the nearest cluster.
A larger $S$ indicates that the samples within clusters are condensed, and the samples between clusters are dispersed.
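The silhouette score can be computed directly from its definition. The sketch below is a plain-Python illustration with made-up points. It assumes that noise points have been excluded beforehand, that at least two clusters exist, and that a sample forming a cluster on its own gets a score of 0; the function name is hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all samples (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for single-sample clusters
            continue
        # a: mean distance to the other samples of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the samples of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters mixed across the gap
```

A parameter search would run the clustering for each candidate pair of radius and minimum sample number and keep the pair with the largest score.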
After several experiments,
reached its maximum of 0.63 when
. The clustering result is shown in
Figure 10. The DBSCAN model effectively identified two valid data agglomerations and marked both singulars and outliers as noise points. The noise points were removed, and the valid data were retained as the valid data set for subsequent analysis.
3.3. Missing Value Imputation
The missing data rates of the raw data set and the cleaned valid data set were 0.243 and 0.292, respectively. The existence of missing values greatly impacts the subsequent evaluation and prediction procedures. The water head, active power and vibration amplitude of the valid data set were inputted into the proposed GRUD–WGAN model to fill in the missing values. The main parameters are listed in Table 2. The result of data imputation is shown in Figure 11, where imputed values are marked as red points. It can be seen that the amplitudes of the imputed values are similar to the actual values nearby. The distribution of the imputed values is also similar to that of the actual values, as shown in Figure 12. This indicates that the generator successfully learned the distribution of the actual data. The data set after imputation is referred to as the complete data set. The validity of the proposed GRUD–WGAN and other data imputation methods is further compared and discussed in Section 4.
3.4. Performance Evaluation of the FTU
The data from 20 January 2019 to 1 May 2019 in the complete data set were defined as the healthy standard data set, including 4814 samples. The FTU was maintained before 20 January 2019, and it performed well in the restart test. Meanwhile, this period covers all possible operating conditions, especially in terms of the water head. The rest of the data were defined as the evaluated data set, including 7824 samples.
The operating parameters
and
in
were selected as two independent variables, and
was selected as the dependent variable. The GPR algorithm was adopted to fit the mapping relationship between
and the probability density distribution of
, as the healthy-state model.
Figure 13a shows the three-dimensional surface formed by the mean value of
. In addition, three operating conditions (
,
, and
) were selected as the example to draw the probability density distribution curve of
, as shown in
Figure 13b. Obviously, the distribution of the vibration amplitude of the top cover is highly related to the operating condition parameters. At the rated operating condition
, the vibration amplitude distribution is more concentrated.
The operating parameters of the evaluated data set were inputted into the constructed healthy-state model to calculate the healthy standard distribution function of the vibration amplitude. Then, the measured vibration amplitude was put into the function, and the NLLP was calculated as the PDI of the FTU. Moreover, considering that sufficient samples are needed to reflect the characteristics of the probability density, one day (48 samples) was taken as the time window to generate a moving average of the calculated PDIs, and the finally obtained PDI curve is shown in Figure 14. The defined PDI based on the NLLP represents the difference in the probability density distribution between the current state and the healthy state. The PDI indicates the relative degradation trend of the FTU, so it is a dimensionless value. It can be seen that the PDI remains stable from 1 May 2019 to 18 May 2019, and the curve shows an apparent upward trend with oscillation after 18 May 2019.
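The one-day smoothing of the PDI values can be sketched as a trailing moving average. The window of 48 samples corresponds to one day at the 30 min sampling interval stated above. The function name and the input values are made up, and only full windows are averaged, which is an assumption about the edge handling.

```python
def moving_average(values, window):
    """Trailing moving average over full windows only."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one sample
        out.append(running / window)
    return out

small = moving_average([1, 2, 3, 4, 5], window=2)

# Made-up raw PDI values standing in for the per-sample NLLP sequence.
raw_pdi = [float(i % 7) for i in range(100)]
daily_pdi = moving_average(raw_pdi, window=48)
```

The running sum keeps the cost linear in the sequence length, which matters little here but avoids recomputing each window from scratch.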
3.5. Degradation Trend Prediction of the FTU
To forecast the degradation trend of the FTU according to the historical PDIs, the TPA–LSTM model was established. The main parameters are listed in
Table 3. The mean square error was selected as the loss function, and the Adam optimizer was adopted to obtain a dynamic update of the learning rate. The obtained PDI curve included 7776 points, which were divided into a training set and test set in a ratio of 7:3. The model was trained for 300 epochs and the final prediction result is shown in
Figure 15.
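The root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination ($R^2$) were selected as the metrics of the prediction result, defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where $n$ is the length of the sequence, $\hat{y}_i$ means the prediction value, $y_i$ is the actual value and $\bar{y}$ is the mean value of the actual sequence. Lower RMSEs and MAEs indicate better accuracy of the prediction result. The $R^2$ value is typically between 0 and 1. An $R^2$ close to 1 means the correlation between the predicted sequence and the actual sequence is strong. The metrics of the prediction result are listed in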
Table 4. The accuracy and the correlation of the TPA–LSTM model are high in the degradation trend prediction task. The criteria of the training set and the test set are similar, indicating that the model has good generalization performance. The performances of the TPA–LSTM model and other prediction methods are further compared and discussed in
Section 4.
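The three metrics can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions of RMSE, MAE and the coefficient of determination; the function names and the sample values are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted PDI values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

scores = {"RMSE": rmse(y_true, y_pred),
          "MAE": mae(y_true, y_pred),
          "R2": r2(y_true, y_pred)}
```

Computing the same three values on the training set and on the test set gives a simple check of generalization: similar values on both sets suggest the model is not overfitting.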
5. Conclusions
Focusing on the practical problems of low-quality data and the frequently changing operating conditions of the fields of engineering applications of the FTU, an approach to the performance evaluation and prediction of the FTU considering low-quality data and variable operating conditions is proposed in this study. First, the on-site data set is constructed by the operating parameters and the vibration amplitude, and the DBSCAN algorithm is adopted to clean the anomaly data under variable operating conditions. Second, combining the incomplete sequence information mining ability of the GRUD and the hidden pattern learning ability of the WGAN, the GRUD–WGAN based missing value imputation model is proposed to improve the low-quality data utilization value. Third, the probability healthy-state model of the FTU is constructed based on the GPR to reduce the impact of data randomness. Additionally, the NLLP is calculated as the PDI of the FTU. Fourth, the degradation trend prediction model of the FTU is established based on the TPA–LSTM. Finally, a set of comparison experiments were carried out. The verification results demonstrate that the proposed data imputation method enhances the stability and the smoothness of the obtained PDI curve. Among the compared methods, the proposed GRUD–WGAN for data imputation has the highest accuracy at each experimental rate of missing data, . In addition, when , the accuracy of GRUD–WGAN rises with the increase in the input length . When , the imputation accuracy reaches the maximum while . In addition, the constructed prediction model based on TPA–LSTM achieves the lowest RMSE and MAE, and the highest on both the training set and test set, indicating that the model has good accuracy and generalization performance.
The relative trend of the current state of the FTU against the healthy standard state is identified in this study. In the next phase of our research, if the long-term maintenance records can be obtained, the PDI curve can be correlated with the actual state of the FTU. Furthermore, a multistage degradation alarm model based on the PDI values can be constructed, so as to lay the foundation for state-based maintenance.