Article

A Novel Denoising Method for Retaining Data Characteristics Brought from Washing Aeroengines

Zhiqi Yan, Ming Zu, Zhiquan Cui and Shisheng Zhong
1 School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
2 School of Electronic Information & Media, Dongying Vocational Institute, Dongying 257091, China
3 School of Automotive Engineering, Harbin Institute of Technology, Weihai 264209, China
4 School of Ocean Engineering, Harbin Institute of Technology, Weihai 264209, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1485; https://doi.org/10.3390/math10091485
Submission received: 14 March 2022 / Revised: 25 April 2022 / Accepted: 26 April 2022 / Published: 29 April 2022
(This article belongs to the Special Issue Applied Mathematics to Mechanisms and Machines)

Abstract:
Airlines evaluate the energy-saving and emission-reduction effect of washing aeroengines by analyzing the exhaust gas temperature margin (EGTM) data of aeroengines so as to formulate a reasonable washing schedule. The noise in EGTM data must be reduced because it interferes with this analysis. EGTM data show several step changes after the aeroengine is washed. These step changes increase the difficulty of denoising because conventional denoising tends to smooth them out. A denoising method for aeroengine data based on a hybrid model is proposed to meet the need to accurately evaluate the washing effect. Specifically, the aeroengine data are first decomposed into several components by time and frequency. The amplitude of the component containing the most noise is amplified, and Gaussian noise is added to generate noise-amplified data. Second, a Gated Recurrent Unit Autoencoder (GAE) model is proposed to capture the features of the engine data. The GAE is trained to reconstruct the original data from the noise-amplified data to develop its noise reduction ability. The experimental results show that, compared with currently popular algorithms, the proposed denoising method achieves a better denoising effect while retaining the key characteristics of the aeroengine data.
MSC:
68T09

1. Introduction

An aeroengine becomes polluted after long-term operation. Pollution of aeroengines leads to mechanical failure, degraded engine performance, and pollutant gas emissions. Washing is the main approach to controlling aeroengine pollution specified in aircraft maintenance manuals: cleaning fluid is sprayed into the engine through multiple spray guns to remove fouling in the engine. Washing an aeroengine is expensive, so, in order to minimize the washing cost while maintaining normal engine performance, airlines must formulate a reasonable aircraft engine washing schedule.
A reasonable washing schedule can reduce 2735 tons of fuel consumption and 8626 tons of harmful gas emissions per 100 engines per year, saving USD 10.08 million [1]. However, the mechanical failure of the polluted engine causes high-frequency noise in the aeroengine data. The noise of aeroengine data will interfere with the reasonable formulation of an aircraft engine cleaning schedule, so it is necessary for airlines to denoise aeroengine data.
There are two difficulties in denoising aeroengine data: the noise of the engine itself is difficult to eliminate, and the step change of data generated by washing will be smoothed. Aeroengines are man-made systems. Different from noise in nature, the probability distribution of the engine’s own noise does not obey the normal distribution, which increases the difficulty of denoising. After washing the aircraft engine, the data will experience a step change. Traditional denoising methods often treat the step changes in data as noise and smooth them out, making it difficult for the data to accurately reveal the washing effect, adding difficulties for subsequent evaluation. Therefore, a new denoising model is needed to eliminate the noise of the engine itself and retain the impact of cleaning on the data, as shown in Figure 1.
Many noise reduction methods have been proposed, including empirical mode decomposition (EMD), wavelet threshold denoising, and filtering. Xue et al. [2] decomposed the data with EMD after adding Gaussian noise to the data and successfully separated the noise from the signal by searching for the dominant noise component through the continuous mean square error criterion. Lu et al. [3] randomly shuffled the high-frequency noise part of the data and then decomposed the data with EMD to achieve noise reduction. Hu et al. [4] denoised complex images with an undecimated wavelet transform method. Sadooghi et al. [5] denoised the compressor vibration signal of aeroengines with the wavelet threshold denoising method. Because the effect of wavelet threshold denoising depends on the threshold and the threshold function, the authors evaluated 84 combinations of threshold and threshold function to find the most suitable one for denoising engine compressor vibration signals. Maragos et al. [6] proposed morphological filters, which are filters composed of the basic operations of mathematical morphology and can selectively suppress image noise. Sedaaghi et al. [7] proposed mediated morphological filters by combining morphological filters with median filtering and classical gray-scale morphological operators. Mediated morphological filters are effective at eliminating salt-and-pepper noise in images. Yang et al. [8] denoised temperature data with a two-way Kalman filter method; they improved the computational efficiency by simplifying the Kalman filter algorithm through precalculation of the filter coefficients.
The above methods show some disadvantages. The EMD algorithm must determine the dividing point between noise components and key characteristics; the wavelet threshold denoising algorithm requires the threshold and threshold function to be chosen manually. These two algorithms are inefficient, and their results can only be approximate. The Kalman filter is suitable for linear systems rather than highly nonlinear systems such as aeroengines. None of the three methods is adaptive: researchers must select or set model parameters according to the data characteristics [9]. Therefore, these methods are not suitable for aeroengine data with a non-uniform data distribution and hardly meet the need to accurately evaluate the washing effect.
Autoencoders are adaptive artificial intelligence algorithms that are frequently used as noise reduction models. Autoencoders were first proposed by Bourlard and Kamp [10] to extract features from raw data. The idea of the denoising autoencoder (DAE) is to make the autoencoder reconstruct the original data from noise-added data so that it acquires a noise reduction ability. Vincent et al. [11] enhanced the robustness of the model by setting a specific fraction of the autoencoder inputs to zero. Song et al. [12] used seismic data to verify the above algorithm and found that it can filter out strong random noise, but its efficiency is low. According to the characteristics of underwater heterogeneous information data, Wang et al. [13] combined a three-layer sparse autoencoder with two convolutional layers to build a model with strong noise reduction ability. Peng et al. [14] added convolutional layers to the autoencoder, which made the autoencoder more robust with a smaller reconstruction error; however, this also reduces the signal amplitude. Song et al. [15] constructed a convolutional autoencoder to filter seismic data with a low signal-to-noise ratio and achieved desirable results. Kensert et al. [16] denoised chromatograms to a completely or almost completely noise-free state with a deep convolutional autoencoder.
However, the DAE also smooths the edge features of the data while denoising. Although the methods in [11,13] preserve the characteristics of the data to the greatest extent, their premise is that the noise in underwater heterogeneous information data and seismic data comes from nature and conforms to a Gaussian distribution. The noise in aeroengine data comes from the engine system rather than from nature, and there is no reference documenting that engine noise conforms to a Gaussian distribution. Therefore, the above-mentioned algorithms are not well suited to engine data: existing autoencoder-based denoising methods can eliminate conventional noise but struggle to eliminate the noise specific to the aeroengine, and they hardly meet the need to accurately evaluate the washing effect.
To meet the needs of accurately evaluating the washing effect, a denoising method for aeroengine data based on a hybrid model is proposed. This method can filter out the noise of the aeroengine data, retaining the edge features of the data. The method first splits the data by washing time, and then decomposes the data into several components by frequency. Second, the method finds the component that contains the most noise, amplifies its magnitude, and adds Gaussian noise to compose the noise-amplified data. Finally, the method inputs the noise-amplified data together with the original data into the proposed autoencoders, training the model’s ability to recover the detailed information of the engine data from the noise-amplified data. Figure 2 shows the principle of the aeroengine data denoising method based on the hybrid model.
The rest of the paper is organized as follows: The second part presents the decomposition method for the aeroengine data and the identification method for the data frequency band containing the most noise, and proposes the GAE model for denoising aeroengine data. The third part first introduces the source of the engine data, then gives the identification results for the data frequency bands containing the most noise and the determination of the hyperparameters of the GAE model, and finally tests the noise reduction effect of the EMD model, the DAE model, and the GAE model on aeroengine data. The fourth part summarizes the superiority of the proposed aeroengine data denoising method through the analysis of the above noise reduction test results.

2. Denoising Method for Aeroengine Data Based on a Hybrid Model

The abbreviations used in this paper and their meanings are given in Table 1.

2.1. Analysis of Noise Reduction Problems and Corresponding Solutions

The problems of eliminating engine noise while retaining the data step changes caused by washing are analyzed from the following three aspects, and corresponding methods are proposed.
Problem 1: Difficulty in retaining the data step size generated by washing the engine.
Aeroengines are complex nonlinear systems. Engine data are the output of this complex system and characterize the performance and condition of the engine. Washing can be viewed as a stimulus applied to the engine system: washing changes the state of the system, and as the state changes, the distribution of the data output by the system changes. Traditional methods remove this change as noise, thus smoothing out the data's features.
The solution to this problem is to divide aeroengine data into several data segments according to the washing time. The data distribution within each data segment does not change, avoiding the smoothing of data features.
Problem 2: Difficulty in identifying engine-specific noise.
Each data segment contains an engine characteristic part and an engine-specific noise part. The process of denoising is performed to maximize the separation of engine-specific noise components from data segments. Traditional methods can identify noise with a Gaussian distribution, but it is difficult to identify engine-specific noise.
The solution to this problem is to decompose the data segment into several components with different frequencies. The similarity of each component to the noise data specific to the aeroengine is calculated to identify the component that contains the most noise.
Problem 3: Difficulty in reducing engine-specific noise.
Because the components containing the most noise still contain engine characteristic information, it is unscientific to directly delete the data components containing the most noise. Removing these components results in a loss of the feature components in the aeroengine data, smoothing the edge features of the data.
To reduce the loss of feature components, a new denoising method is proposed. First, the amplitude of the data component that contains the most noise is amplified and artificial noise is added to the data. Second, an autoencoder model is built. Finally, the model is trained to reconstruct the original data from the noise-amplified data, developing its denoising ability.
Based on the above analysis, a denoising method for aeroengine data based on a hybrid model is proposed. The noise reduction process of this method is shown in Figure 3. After the EGTM data are collected, they are split into several data sets according to the water wash records. The data component containing the most noise, together with artificial Gaussian noise, is superimposed on the original data, and the resulting noise-amplified data and the original data are input into the autoencoder for training, generating several sets of denoised EGTM data. Finally, these sets are assembled into the complete denoised EGTM data.
As shown in Figure 3, the denoising method for aeroengine data based on mixed models includes a proposed method for identifying data components that contain noise and a proposed denoising method.
Method 1: The proposed method for identifying data components that contain noise.
The paper first splits the aeroengine data according to the washing records; secondly, it decomposes the data into several components by frequency with the EMD algorithm; lastly, the noise content of the data components is evaluated with a distance calculation based on the DTW algorithm to find the data component that contains the most noise.
Method 2: The proposed denoising method.
First, the amplitude of the data component that contains the most noise is amplified and artificial Gaussian noise is added to the data. Second, an autoencoder model is built. Finally, the model is trained to reconstruct the original data from the noise-amplified data, developing its denoising ability, as illustrated by the sketch below.
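To make Method 2 concrete, the following Python sketch shows one way the noise-amplified input could be assembled from a data segment and its noisiest component. The amplification factor of 1.1 matches the experiments in Section 3.2, while the Gaussian noise level sigma and the exact way the amplified component is recombined with the original data are assumptions, not the authors' implementation.

import numpy as np

def make_noise_amplified(x_segment, noisiest_component, gain=1.1, sigma=0.5, seed=0):
    # Amplify the component that contains the most noise and add Gaussian noise.
    # gain = 1.1 follows Section 3.2; sigma and the composition below are assumptions.
    x_segment = np.asarray(x_segment, dtype=float)
    noisiest_component = np.asarray(noisiest_component, dtype=float)
    rng = np.random.default_rng(seed)
    amplified = x_segment + (gain - 1.0) * noisiest_component
    return amplified + rng.normal(0.0, sigma, size=x_segment.shape)

The pair (noise-amplified segment, original segment) then forms one training sample for the autoencoder.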

2.2. Identification Method of Aeroengine Data Components Containing Noise Based on EMD and DTW

This section is about splitting the data, decomposing the data, and identifying noise.
The washing time is used as the basis for splitting the engine data. The engine data in this study come from OEM factories and are recorded with the flight cycle as the observation time, so the aeroengine data are time series data. The engine washing records are tables that record the time, location, engine type, and aircraft type of each engine wash; only the washing time is used in this study. The data are denoted as X = {x1, x2, x3, …}. If the number of washes is n − 1, the washing records can be defined as Twashing = {t1, t2, …, tn−1}. Twashing splits the data X into n pieces. Let X(i) denote the i-th piece, which is given by Equation (1).
X(i) = {x_{t(i−1)+1}, x_{t(i−1)+2}, …, x_{t(i)}}, i = 1, 2, …, n
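A minimal Python sketch of this splitting step is given below; the mapping of the washing times t1, …, tn−1 to 0-based array indices is an assumption made only for illustration.

import numpy as np

def split_by_washes(x, wash_times):
    # Split the EGTM series at the flight cycles recorded in the washing records,
    # returning the n segments X(1), ..., X(n) of Equation (1).
    bounds = [0] + sorted(wash_times) + [len(x)]
    return [np.asarray(x[bounds[k]:bounds[k + 1]]) for k in range(len(bounds) - 1)]

# toy usage: three washes split a series into four segments
egtm = np.linspace(90.0, 60.0, 12)
segments = split_by_washes(egtm, wash_times=[3, 6, 9])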
Then X(i) is decomposed by frequency with the EMD algorithm. The EMD algorithm is a signal analysis algorithm that decomposes the signal according to the timescale of the aeroengine data without setting any basis functions in advance. Aeroengine data can be divided into several "intrinsic mode functions" (IMFs) by the EMD method without leaving the time domain, so the aeroengine data can be expressed as the sum of several IMF components and a trend (residual) component.
The steps of EMD decomposing the data segment X(i) obtained by dividing the aeroengine data are as follows:
Step 1: Find all the extreme points of the aeroengine data, use the spline curve to connect all the maximum points into the upper envelope, and connect all the minimum points to the lower envelope.
Step 2: Calculate the average value m(i) of the upper and lower envelopes and the engine data IMF component c(i) according to Equation (2).
c(i) = X(i) ‒ m(i)
Norden E. Huang [17] put forward the concept of the IMF and defined two conditions for an IMF: 1. over the whole time range, the number of local extreme points and the number of zero-crossing points of the IMF differ by at most one; 2. the mean value of the upper and lower envelopes is zero. If c(i) does not satisfy these two conditions, repeat steps 1 and 2 until they are satisfied, and then go to step 3.
Step 3: Separate c(i) from the aeroengine data X(i), i.e., compute X(i) ‒ c(i). Repeat steps 1, 2, and 3 until X(i) ‒ c(i) becomes a monotonic sequence. The residual r(i) is then defined by Equation (3):
r(i) = X(i) ‒ ∑c(i)
where r(i) is the residual component of engine data X(i).
The pseudocode of the EMD algorithm is given by Algorithm 1.
Algorithm 1. EMD algorithm
IMFs = []
While haspeaks(X(i)):
    maximum_points = search(X(i), maximum)
    minimum_points = search(X(i), minimum)
    max_envelope = fitting(maximum_points)      # spline through the maxima
    min_envelope = fitting(minimum_points)      # spline through the minima
    m(i) = (max_envelope + min_envelope)/2      # envelope mean
    c(i) = X(i) − m(i)                          # candidate IMF (sifted until the IMF conditions hold)
    IMFs.append(c(i))
    X(i) = X(i) − c(i)                          # residual carried to the next iteration
r(i) = X(i)                                     # final (monotonic) residual component
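For reference, the core of a single sifting pass (steps 1 and 2) can be sketched with SciPy splines as below; this is a simplified illustration that omits the full sifting iteration, the IMF stopping criteria, and envelope end-point handling.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    # One sifting pass: fit the envelopes, compute their mean m(i),
    # and return the candidate IMF c(i) = x - m(i).
    t = np.arange(len(x))
    max_idx = argrelextrema(x, np.greater)[0]
    min_idx = argrelextrema(x, np.less)[0]
    if len(max_idx) < 2 or len(min_idx) < 2:
        return None                              # too few extrema to fit envelopes
    upper = CubicSpline(max_idx, x[max_idx])(t)  # upper envelope through the maxima
    lower = CubicSpline(min_idx, x[min_idx])(t)  # lower envelope through the minima
    m = (upper + lower) / 2.0
    return x - m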
Real engine data can be seen as a superposition of ideal data and noise data caused by minor faults. To identify the characteristic noise components of aeroengines, four kinds of fault data were prepared in this study, including compressor fault data, fan fault data, high-pressure turbine fault data, and low-pressure turbine fault data. The distance of the fault data from each data component is calculated to find the data component with the most noise.
However, the components of the engine data and the four types of fault data have unequal lengths and no linear point-to-point correspondence, and there are phenomena such as amplitude scaling and linear drift between data sets [18]. Using the Euclidean distance directly to represent the similarity between each component of the engine data and the four types of fault data would therefore obscure this information and yield unrealistic distances. A distance calculation method that can accommodate different data lengths is required.
The DTW algorithm is proposed for measuring the distance between two time series of different lengths. DTW is widely used in the field of speech recognition, and it is also suitable for recognizing two similar aeroengine data sets. In this study, DTW is used to calculate the minimum distance between the four types of fault data and c(i) or r(i).
The process of the DTW algorithm is shown in Figure 4. First, a matrix grid is created whose dimensions are the length of the engine data component and the length of the fault data. Each element of the matrix is the Euclidean distance d between the points of the engine data component and the fault data at the corresponding positions. Then, starting from the lower-left corner (1,1) of the grid, move to the adjacent cell (right, upper, or upper-right diagonal) with the smallest cumulative value until the upper-right corner of the grid is reached, forming a path. Finally, the cumulative value of the matrix elements that the path passes through is the distance between the engine data component and the fault data.
The pseudocode of the DTW algorithm is given by Algorithm 2.
Algorithm 2. DTW algorithm
n = length(engine_data)
m = length(fault_data)
dtw_matrix = [∞](n+1)×(m+1)
dtw_matrix(1,1) = 0
for i = 2:n + 1
    for j = 2:m + 1
        cost = [engine_data(i − 1) − fault_data(j − 1)]^2
        dtw_matrix(i,j) = cost + min[dtw_matrix(i − 1,j), dtw_matrix(i − 1,j − 1), dtw_matrix(i,j − 1)]
    end
end
return dtw_matrix(n + 1, m + 1)
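A direct Python version of Algorithm 2, together with the distance sum of Equation (4), could look as follows; the component and fault series used in the usage example are placeholders.

import numpy as np

def dtw_distance(engine_component, fault_data):
    # DTW distance with the squared point-wise cost used in Algorithm 2.
    n, m = len(engine_component), len(fault_data)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (engine_component[i - 1] - fault_data[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i - 1, j - 1], D[i, j - 1])
    return D[n, m]

# usage: total distance of one component c(i) to the four fault series (Equation (4))
c_i = np.sin(np.linspace(0, 6, 120))                                       # placeholder component
fault_sets = [np.sin(np.linspace(0, 6, 150) + 0.2 * k) for k in range(4)]  # placeholder FAN/COMP/HPT/LPT series
dis_c = sum(dtw_distance(c_i, f) for f in fault_sets)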
Define the distance as dis. The distances between c(i) or r(i) and the four kinds of fault data, together with their symbols and explanations, are given in Table 2.
In Table 2, the distance between c(i) and all fault data can be given by Equation (4).
disc(i) = disc(i),FAN + disc(i),COMP + disc(i),HPT + disc(i),LPT
The distance between r(i) and all fault data can be given by Equation (5).
disr(i) = disr(i),FAN + disr(i),COMP + disr(i),HPT + disr(i),LPT
The smaller the distance, the more noise it contains. The engine data component corresponding to the smallest distance is selected as the noise component of the aeroengine.

2.3. Gated Recurrent Unit Autoencoder (GAE): A Proposed Denoising Autoencoder Model for Aeroengine Data

DAE is a promising method for data denoising, which can be used for aeroengine data denoising. It is a feature extractor with a denoising function whose purpose is to convert noisy aeroengine data into clean aeroengine data.
In addition, the aeroengine data is a time series, and each element in the sequence is affected by all the previous elements, which requires the model to be able to learn the influence from the historical accumulation of the aeroengine data. Real engine data is difficult to obtain, and the limited amount of data has difficulty supporting large-scale models. This requires the model to have a simple structure, and the GRU (Gated Recurrent Unit) model meets the requirements. The GRU was proposed by Cho et al. [19]. GRUs have fewer parameters, making training faster and requiring less data [20].
Therefore, in this paper, the Gated Recurrent Unit Autoencoder (GAE) model is proposed as a denoising module for aeroengine data by combining the autoencoder with the GRU. The structure of the GAE model is shown in Figure 5. The model includes encoders and decoders with special structures. The input end of the encoder and the output end of the decoder are both GRU modules, and a three-layer autoencoder is used as the connection in the middle. The three layers of the autoencoder are marked as h1, h2, and h3. In the coding stage, the aeroengine data is continuously input to the GRU and the characteristic data is output through h1 and h2. In the decoding stage, the feature data is input into the h3 layer, and the denoised data is output through the GRU module.
In Figure 5, the input data are the aeroengine data whose own noise has been amplified and to which Gaussian noise has been added, expressed as X(i)noise; the label data are the raw aeroengine data, expressed as X(i). The GRU involved in Figure 5 has two gates: an update gate and a reset gate. The basic structure of the GRU is shown in Figure 6.
In Figure 6, the definitions are as follows: z is the update gate, r is the reset gate, h is the current state, and h−1 is the previous state. The update gate z controls how much previous state information is brought into the current state. The reset gate r controls how much information from the previous state is written to the current candidate state; the smaller the reset gate, the less information from the previous state is written. Define the current candidate state as h̃. For the input data X(i)noise, the forward propagation of the network is given by Equation (6):
r = σ(Wr·[h−1, X(i)noise])
z = σ(Wz·[h−1, X(i)noise])
h̃ = tanh(Wh·[r ∗ h−1, X(i)noise])
h = (1 − z) ∗ h−1 + z ∗ h̃
y = σ(Wy·h)
where Wr is the weight from the input and the hidden layer at the previous moment to the reset gate r; Wz is the weight from the input and the hidden layer at the previous moment to the update gate z; Wh is the weight from the input and the hidden layer at the previous moment to the candidate state h̃; and Wy is the weight from the hidden layer to the output layer.
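To make Equation (6) concrete, a single GRU step can be written in NumPy as below. Biases are omitted, as in Equation (6), and the dimensions in the toy usage are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h, W_y):
    # One forward step of Equation (6); the weights act on the concatenation [h_prev, x_t].
    concat = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ concat)                                   # reset gate
    z = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state h~
    h = (1.0 - z) * h_prev + z * h_cand                         # new hidden state
    y = sigmoid(W_y @ h)                                        # output
    return h, y

# toy usage with random weights
rng = np.random.default_rng(0)
n_h, n_x = 4, 1
W_r, W_z, W_h = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(3))
W_y = rng.normal(size=(1, n_h))
h, y = gru_step(rng.normal(size=n_x), np.zeros(n_h), W_r, W_z, W_h, W_y)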
The output of the h2 layer of the denoising autoencoder model is the feature code of the aeroengine data. In the encoding process, the encoding function maps high-dimensional aeroengine data vectors to low-dimensional feature vectors. The encoder function is the activation function of the h2 layer of the autoencoder, defined as the sigmoid function. The feature vector output by the h2 layer is co. The aeroengine data encoding of the h2 layer is then given by Equation (7).
co = 1 / (1 + e^(−(whid·y + bhid)))
where whid is the h2 layer weight matrix; bhid is the h2 layer bias vector.
The decoding process maps the feature vector co to the reconstructed aeroengine data. The activation function of the h3 layer in the decoder is defined as a linear function, and the feature vector output by the h3 layer is deco, which is given by Equation (8).
deco = wout·h + bout
where wout is the h3 layer weight matrix; bout is the h3 layer bias vector.
Finally, deco is output by the GRU module as denoised data, which is defined as X(i)denoise. The loss function in this model is the mean absolute error (MAE). Define the number of output nodes of the GAE model as nout, then MAE is given by Equation (9).
MAE(X(i)denoise, X(i)noise) = (1/nout) ∑_{j=1}^{nout} ‖X(i)denoise,j − X(i)noise,j‖2
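As an illustration only, one way the GRU-encoder, three fully connected layers, and GRU-decoder layout described above could be wired in tf.keras is sketched below. The layer widths follow Section 3.2 (nin = nout = 19, nhid = 21), but the unit counts of the GRU modules, the sequence handling, and the exact connection pattern are assumptions rather than the authors' implementation.

import tensorflow as tf

def build_gae(seq_len, n_in=19, n_hid=21):
    # Noise-amplified EGTM segment in, denoised segment out.
    inputs = tf.keras.Input(shape=(seq_len, 1))
    enc = tf.keras.layers.GRU(n_in, return_sequences=True)(inputs)                                # GRU at the encoder input
    h1 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_in, activation="sigmoid"))(enc)  # h1
    h2 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_hid, activation="sigmoid"))(h1)  # h2: feature code co
    h3 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_in, activation="linear"))(h2)    # h3: deco
    outputs = tf.keras.layers.GRU(1, return_sequences=True)(h3)                                   # GRU at the decoder output
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
    return model

# training call: model.fit(X_noise, X_original, ...) with X(i)noise as input and X(i) as label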

3. Experiment

All functions are implemented in Python 3.5 on Windows with TensorFlow as the framework. TensorFlow is an open-source machine learning platform. The hardware platforms involved in the calculation are a CPU and a GPU: an Intel(R) Core(TM) i3-9100F CPU @ 3.60 GHz and an NVIDIA GeForce GTX TITAN X, respectively.
The data involved in this paper are aeroengine EGTM data. Airlines take EGTM as a reference for aeroengine performance. EGTM refers to the difference between the red-line value of the engine exhaust temperature and the exhaust temperature when the engine takes off at full thrust. When the engine is washed, there are step changes in the EGTM data. During denoising, these step changes may be recognized as noise and smoothed. Therefore, selecting EGTM data as the noise reduction object can better verify the performance of the proposed method.
Since exhaust gas temperature margin (EGTM) data are an important indicator for evaluating the efficiency of engine washing, the data used in this paper are the EGTM data together with the flight cycles recorded in the washing records. The EGTM data are collected at the outlet of the low-pressure turbine of the engine, and Figure 7 shows the measurement point of the exhaust temperature of the aeroengine.
EGTM data is real data provided by the OEM. In this study, the flight cycle was used as a time unit to record the aeroengine data from takeoff to landing. Therefore, aeroengine data is time series data. In practice, the airline provided three materials for an aeroengine, including OEM data in Table 3, fault records in Table 4, and washing records in Table 5.
The data required for the study are spread across three tables. To collect the data to be denoised, the aircraft registration numbers (IDs) and washing dates in Table 5 are used to locate the corresponding IDs in the OEM data in Table 3 and collect all the corresponding EGTM data. To collect the fault data, the IDs in the fault records in Table 4 are used to locate the corresponding IDs in the OEM data in Table 3 and collect all the corresponding EGTM data. Figure 8 shows the aeroengine EGTM data, and Figure 9 shows all types of fault data.
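A hedged pandas sketch of this collection step is shown below; the DataFrame and column names (ID, Time, EGTM, Date) mirror Tables 3–5, but the exact join logic used on the airline data is not documented and is assumed here.

import pandas as pd

def collect_egtm(oem_df, wash_df, fault_df):
    # EGTM series to be denoised: all OEM rows whose ID appears in the washing records.
    washed_ids = wash_df["ID"].unique()
    egtm_to_denoise = oem_df[oem_df["ID"].isin(washed_ids)].sort_values("Time")[["ID", "Time", "EGTM"]]
    # Fault EGTM series: all OEM rows whose ID appears in the fault records.
    fault_ids = fault_df["ID"].unique()
    fault_egtm = oem_df[oem_df["ID"].isin(fault_ids)].sort_values("Time")[["ID", "Time", "EGTM"]]
    return egtm_to_denoise, fault_egtm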
During the on-wing period, the engine was washed a total of nine times. Taking the washing times as split points, the aeroengine data are split into eight segments. The first six segments are used as the training set, and the last two segments are used as the testing set. The eight segments are numbered sequentially in Table 6.

3.1. Noise Identification Results for Data Components

To identify the data components that contain the most noise, the aeroengine data components are first decomposed based on EMD after being collected. Secondly, based on DTW technology, the noise identification of each component is carried out. The aeroengine EGTM data in Figure 8 are decomposed into residual components and IMF components. Figure 10 shows the two components derived from the decomposition of the six-segment aeroengine training data.
Based on the DTW algorithm, the distances were calculated between the two components of the aeroengine EGTM data and the four kinds of fault data, namely compressor fault data, fan fault data, high-pressure turbine fault data, and low-pressure turbine fault data. In the calculation results, very large numbers (>1 × 10^308) appeared for some of the distances, and these are marked as "∞". Table 7 records the distances between the IMF component of the six-segment aeroengine training data and the four kinds of fault data. Table 8 records the distances between the residual component of the six-segment aeroengine training data and the four kinds of fault data.
In Table 7 and Table 8, the distances between the residual components of No. 2, 3, and 6 in the training data and the four types of fault data are all “∞”, indicating that these three sets of residual component data hardly contain noise components. The distances between the residual components of No. 1, 4, and 5 and the four kinds of fault data are all within 15, while the distances between the IMF components of these three groups and the four kinds of fault data are smaller than those of the residual components, which shows that more noise is included in the IMF component of the aeroengine EGTM data. Therefore, the IMF component is selected as the data component containing the most noise. Separate the data components that contain the most noise in the testing data (No. 7 and No. 8), as shown in Figure 11.

3.2. Hyperparameter Settings for GAE

In this part, the number of nodes in the h1 layer and the number of nodes in the h2 layer of the GAE model are determined. Define the number of nodes in the h1 layer as nin, the number of nodes in the h2 layer as nhid, and the number of nodes in the h3 layer as nout. Since the number of nodes in the h1 layer is the same as that in the h3 layer, nin = nout.
There is currently no formula for determining nin. To obtain reliable results, the enumeration method is used to determine the most suitable nin. The enumeration range is 15–25. The mean absolute error (MAE) is used to describe the reconstruction error of the denoising autoencoder for the aeroengine data.
The model uses the Adam algorithm as the descent algorithm. The number of iterations is 3000, and the training batch size is 100. The learning rate is 0.001, and the component containing the most noise is amplified by a factor of 1.1. The model was run 10 times, and nin and the reconstruction error were plotted as shown in Figure 12.
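The enumeration protocol behind Figure 12 and Table 9 can be summarized by the sketch below. Here, train_and_eval is only a stand-in for building and training the GAE once and returning its reconstruction MAE; the value it returns is a placeholder so the sketch stays runnable.

import numpy as np

def train_and_eval(n_in, seed):
    # Stand-in for: build the GAE with n_in input nodes, train with Adam
    # (3000 iterations, batch size 100, learning rate 0.001), return the MAE.
    rng = np.random.default_rng(seed)
    return 0.04 + 0.01 * rng.random()            # placeholder value

results = {}
for n_in in range(15, 26):                       # enumeration range 15-25
    errors = [train_and_eval(n_in, seed) for seed in range(10)]   # 10 repeated runs
    results[n_in] = (np.mean(errors), np.std(errors), np.min(errors), np.max(errors))
best_n_in = min(results, key=lambda k: results[k][0])             # smallest mean reconstruction error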
The vertical line in Figure 12 marks the nin at which the reconstruction error is the smallest. In order to analyze the results in more detail, the mean, standard deviation, and minimum and maximum values of the 10 experiments are given in Table 9. The data in Table 9 are reported to three significant figures.
From the data given in Table 9, the average reconstruction error is the lowest when nin is 20 and slightly higher when nin is 19. However, the standard deviation when nin is 20 is larger than that when nin is 19 (0.0397 versus 0.0272). The minimum error when nin is 20 is smaller than that when nin is 19, while the maximum error when nin is 20 is greater. Table 9 thus shows that the output is unstable when nin is 20, although the average reconstruction error is small, whereas the output is stable when nin is 19 despite a slightly larger average reconstruction error. Therefore, nin is determined to be 19 in this paper.
Similarly, to minimize reconstruction error, nhid must be determined. At present, researchers have developed empirical formulas [21] for nhid, which help narrow the search for optimal nodes. The empirical formula is given by Equation (10).
nhid = √(nin × nout)
According to Equation (10), since nin = nout = 19, nhid ≈ √(19 × 19) = 19, so nhid should be around 19. The optimal nhid is searched for by the enumeration method in the interval 15–25. After the experiment is carried out 10 times, nhid and the reconstruction error are plotted in Figure 13.
As can be seen from Figure 13, when nhid is 21, the reconstruction error is the lowest. To analyze the results in more detail, Table 10 gives the mean reconstruction error, standard deviation, minimum error, and maximum error of the 10 experiments, all reported to three significant figures.
From the data given in Table 10, the mean reconstruction error is the lowest when nhid is 21. The standard deviation is the smallest when nhid is 21. This means that the model’s output is currently stable. Therefore, the most suitable nhid is determined to be 21.

3.3. Validation of Aeroengine Data Denoising Method

3.3.1. Reconstruction Accuracy Verification of GAE

First, the reconstruction accuracy of the GAE model is verified. Figure 14 visually shows the reconstruction errors of the three denoising models. Data No. 1–6 are the reconstruction errors of the training data, and data No. 7–8 are the reconstruction errors of the testing data. The figure shows that the reconstruction error of the denoised data output by the EMD model is much larger than that of the GAE model and the DAE model. Except for data No. 4 and No. 8, the reconstruction error of the GAE model is smaller than that of the DAE model.
The reconstruction errors of the DAE, EMD, and GAE models on the training data are given in Table 11.
Table 11 and Table 12 show that the MAE error of GAE training data is (0.0864 − 0.0747)/0.0864 × 100% = 13.54% lower than that of DAE and 78.72% lower than that of EMD; the MAE error of GAE test data is 3.11% lower than that of DAE and 76.80% lower than that of EMD.
The MAE error of GAE is relatively smaller than that of DAE, which reflects that the GRU in GAE can learn the time series relationship of aeroengine data so as to better reconstruct pure data. The combination of the GRU module and the DAE model effectively improves the noise reduction effect, making the MAE error smaller than that of the DAE model.
The MAE error of GAE is much smaller than that of EMD. The reason for this is that the algorithm of EMD is susceptible to noise interference. The algorithm of EMD separates the noise signal by calculating the envelopes of the extrema of the data. Noise makes the extremum of the data change, and the envelopes follow. The changes of the envelope also make the EMD algorithm unstable, resulting in a large error.
To summarize, Tables 11 and 12 reflect that the noise reduction effect of the DAE model is slightly better than that of the EMD model, while the mean reconstruction error of the GAE model is smaller than that of the DAE model. The data reconstruction ability of the GAE model is stronger than that of the DAE model and the EMD model, which proves that the GAE model preserves the characteristics of the aeroengine data better.

3.3.2. Verification of the Effectiveness of the Proposed Noise Reduction Method Based on the Hybrid Model

Figure 15 shows the reconstruction curves of DAE, EMD, and the proposed model on the EGTM testing data of an aeroengine. In the figure, the original data are marked with black curves, the denoised data of the proposed model are marked with green dotted lines, the denoised data of the DAE model are marked with light blue, and the denoised data of the EMD model are marked with dark blue.
The red vertical line in Figure 15 represents the time of washing the aeroengine. Due to the washing, there is a step change in the original EGTM data. DAE, EMD, and the proposed method show different performances under the influence of washing. The DAE method significantly smooths the step change of the EGTM, which shows that the DAE model has an underfitting problem in training. The EMD method separates the step change from the EGTM as noise, thus also smoothing the step change of the EGTM.
In Figure 15, the proposed model can reconstruct the EGTM sequence and express data mutation well. The denoised data of the proposed model in the figure is closer to the original data than DAE and EMD, which shows that the denoised data of the proposed model is more suitable for the subsequent analysis of washing effects. In order to more accurately express the noise reduction effect of DAE, EMD, and the proposed model on aeroengine data, the EGTM step value after aeroengine washing is investigated.
The influence of the denoising model on the subsequent evaluation of the washing effect can be assessed by calculating the step size after washing. The step size after washing is a direct expression of the washing effect, so the post-washing step size of the denoised data is an important indicator of whether the denoising model affects the evaluation of the washing effect. Since the data increase gradually rather than suddenly after washing, the step size calculated in this paper is the difference between the average values of the 10 data points before and after washing. The step sizes are calculated from data No. 2 onward.
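In code, this step-size definition amounts to the short function below; treating the washing cycle itself as the first point of the "after" window is an assumption.

import numpy as np

def washing_step_size(egtm, wash_idx, window=10):
    # Mean of the 10 points after the wash minus the mean of the 10 points before it.
    before = np.mean(egtm[wash_idx - window:wash_idx])
    after = np.mean(egtm[wash_idx:wash_idx + window])
    return after - before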
Figure 16 visually shows the step sizes of the No. 2–8 raw data and the step sizes of the denoised data of DAE, EMD, and the proposed model. Table 13 gives the step sizes of the training data and testing data; the entries are the step sizes of each data segment. Since No. 1 is unwashed engine data, the step sizes are recorded from data No. 2.
Table 14 gives the variance between the steps of the original data and the steps of the denoised data of DAE, EMD, and the proposed model.
It can be seen from Table 14 that EMD has a large variance in both the testing set and the training set. The variances of the DAE in the training set and testing set are much smaller than those of the EMD, but they are still large. The variances of the proposed model are extremely small in both the testing set and the training set. The step sizes calculated from the denoised data of the proposed model are the closest to the original data. Therefore, the proposed model has the least impact on the analysis of subsequent washing effects. It is more suitable for aeroengine data noise reduction than other models.
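As a worked check, if the "variance" in Table 14 is taken to be the mean squared deviation of the denoised step sizes from the original step sizes (an assumption about the authors' definition), the training-set value for the proposed model can be reproduced from Table 13:

import numpy as np

# Step sizes of segments No. 2-6 (training data) from Table 13.
orig_steps = np.array([11.326, 5.212, 7.199, 12.206, 12.078])
prop_steps = np.array([10.604, 4.841, 6.832, 11.292, 11.555])

variance = np.mean((orig_steps - prop_steps) ** 2)   # approximately 0.381, matching Table 14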

4. Conclusions

To improve prediction accuracy, a denoising method for aeroengine data based on a hybrid model is proposed in this paper. The method first amplifies the noise part of the data, and then adds Gaussian noise to the data as the input of the autoencoder. Let the autoencoder reconstruct the original data from the amplified noise data so that the autoencoder can perform targeted noise reduction. In the paper, the proposed model is compared with EMD and DAE, which reflects that the proposed model can effectively denoise the data and retain mutation characteristics after aeroengine washing.
The autoencoder involved in the hybrid-model-based aeroengine data denoising method is the GAE model proposed in this paper. The GAE model is composed of three fully connected layers connecting two GRU modules. The model is good at working with time series data. After testing with real aeroengine data, compared with EMD and DAE, the reconstruction error of the GAE model is the smallest, preserving the data features to the greatest extent.
The model proposed in this paper has an ideal effect on the denoising of EGTM data after aeroengine washing. The model is applicable to denoising various data with sudden changes, such as the gas path data of aeroengines or gas turbines after maintenance. In the future, we plan to collect more real data to improve our methods.

Author Contributions

Methodology, M.Z.; writing—original draft preparation, Z.Y., Z.C., and S.Z.; writing—review and editing, Z.C.; funding acquisition, M.Z. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number U2133202; and the National Natural Science Foundation of China, grant number 51975157.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, D.; Sun, J. Fuel and emission reduction assessment for civil aircraft engine fleet on-wing washing. Transp. Res. Part D Transp. Environ. 2018, 65, 324–331.
  2. Xue, F.; Sun, X.; Dong, Z.; Yang, H.; Wang, H. Research on Data Denoising Algorithm Based on EEMD. Mech. Eng. Autom. 2021, 5, 9–11.
  3. Lu, T.; Qian, W.; He, X.; Le, Y.; Huang, J. An improved EMD noise reduction method based on noise statistical characteristics. Bull. Surv. Mapp. 2020, 11, 71–75.
  4. Hu, K.; Cheng, Q.; Li, B.; Gao, X. The complex data denoising in MR images based on the directional extension for the undecimated wavelet transform. Biomed. Signal Process. Control 2018, 39, 336–350.
  5. Sadooghi, M.S.; Khadem, S.E. A new performance evaluation scheme for jet engine vibration signal denoising. Mech. Syst. Signal Process. 2016, 76, 201–212.
  6. Maragos, P.; Schafer, R.W. Morphological Filters. Part 1. Their Set-Theoretic Analysis and Relations to Linear Shift-Invariant Filters. IEEE Trans. Acoust. Speech Signal Process. 1987, 35, 1153–1169.
  7. Sedaaghi, M.H.; Daj, R.; Khosravi, M. Mediated morphological filters. In Proceedings of the 2001 International Conference on Image Processing (Cat. No.01CH37205), Thessaloniki, Greece, 7–10 October 2001; pp. 692–695.
  8. Yang, C.; Li, J.; Yang, W.; Yang, W. Denoising Method for Temperature Log Data Based on A Kalman Filter. Well Logging Technol. 2020, 2, 168–171.
  9. Li, Y.; Wang, C.; Tian, Y.; Wang, S. Parameter-shared variational auto-encoding adversarial network for desert seismic data denoising in Northwest China. J. Appl. Geophys. 2021, 11, 104428.
  10. Bourlard, H.; Kamp, Y. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 1988, 4, 291–294.
  11. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 June 2008.
  12. Song, H.; Gao, Y.; Chen, W.; Zhang, X. Seismic noise suppression based on convolutional denoising autoencoder. Oil Geophys. Prospect. 2020, 6, 1210–1219.
  13. Wang, X.; Zhao, Y.; Teng, X.; Sun, W. A stacked convolutional sparse denoising autoencoder model for underwater heterogeneous information data. Appl. Acoust. 2020, 167, 107391.
  14. Peng, F.; Gao, Y. BPSK Signal Denoise Based on Convolution Auto-Encoder Network. Inf. Commun. 2020, 8, 41–44.
  15. Song, H.; Gao, Y.; Chen, W.; Xue, Y.J.; Zhang, H.; Zhang, X. Seismic random noise suppression using deep convolutional autoencoder neural network. J. Appl. Geophys. 2020, 178, 104071.
  16. Kensert, A.; Collaerts, G.; Efthymiadis, K.; Van Broeck, P.; Desmet, G.; Cabooter, D. Deep convolutional autoencoder for the simultaneous removal of baseline noise and baseline drift in chromatograms. J. Chromatogr. A 2021, 1, 462093.
  17. Wu, Z.; Huang, N.E. A study of the characteristics of white noise using the empirical mode decomposition method. Proc. R. Soc. London Ser. A Math. Phys. Eng. Sci. 2004, 460, 1597–1611.
  18. Park, S.; Chu, W.W.; Yoon, J.; Won, J. Similarity Search of Time-Warped Subsequences via a Suffix Tree. Inf. Syst. 2003, 7, 867–883.
  19. Chung, J.; Gulcehre, C.; Cho, K.H.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
  20. Olah, C. Understanding LSTM Networks. 2015. Available online: http://colah.github.io/posts/2015-08-Understanding-LSTMs (accessed on 9 March 2021).
  21. Sequin, C.H.; Clay, R.D. Fault tolerance in artificial neural networks. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 703–708.
Figure 1. Demand for denoising aeroengine data. (a) Raw engine data; (b) Denoising effect of traditional methods; (c) Denoising required for aircraft engines.
Figure 2. Denoising model principle.
Figure 3. Denoising method for aeroengine data based on a hybrid model.
Figure 4. Principle of DTW algorithm.
Figure 5. GAE Model.
Figure 6. The structure of GRU.
Figure 7. Aeroengine data sources.
Figure 8. The engine data involved in this paper.
Figure 9. The fault data involved in this paper. (a) EGTM data for compressor failure; (b) EGTM data for fan failure; (c) EGTM data for high pressure turbine failure; (d) EGTM data for low pressure turbine failure.
Figure 10. Components of the six-segment aeroengine training data. (a) No. 1 data residual component; (b) No. 1 data IMF component; (c) No. 2 data residual component; (d) No. 2 data IMF component; (e) No. 3 data residual component; (f) No. 3 data IMF component; (g) No. 4 data residual component; (h) No. 4 data IMF component; (i) No. 5 data residual component; (j) No. 5 data IMF component; (k) No. 6 data residual component; (l) No. 6 data IMF component.
Figure 11. Components that contain the most noise in the testing data. (a) Data No. 7 contains the most noise data component; (b) Data No. 8 contains the most noise data component.
Figure 12. Relationship between the number of nodes in the h1 layer and reconstruction error.
Figure 13. Relationship between nhid and reconstruction error.
Figure 14. Comparison of reconstruction accuracy of three noise reduction models.
Figure 15. Comparison of noise reduction effects of three noise reduction models.
Figure 16. The step sizes from data of all models.
Table 1. Meaning of abbreviations.
Abbreviation | Full Title
DAE | Denoising Autoencoder
EMD | Empirical Mode Decomposition
EGTM | Exhaust Gas Temperature Margin
DTW | Dynamic Time Warping
IMF | Intrinsic Mode Function
GRU | Gated Recurrent Unit
OEM | Original Equipment Manufacturer
MAE | Mean Absolute Error
Table 2. Distance table.
dis | Interpretation of dis
disc(i) | Distance sum between c(i) and all fault data
disr(i) | Distance sum between r(i) and all fault data
disc(i),FAN | Distance between c(i) and fan fault data
disc(i),COMP | Distance between c(i) and compressor fault data
disc(i),HPT | Distance between c(i) and high-pressure turbine fault data
disc(i),LPT | Distance between c(i) and low-pressure turbine fault data
disr(i),FAN | Distance between r(i) and fan fault data
disr(i),COMP | Distance between r(i) and compressor fault data
disr(i),HPT | Distance between r(i) and high-pressure turbine fault data
disr(i),LPT | Distance between r(i) and low-pressure turbine fault data
Table 3. OEM data.
ID | ESN | Time | EGTM
B-5793 | 657208 | 16 September 2013 1:52 | 90.845
B-5793 | 657208 | 16 September 2013 5:09 | 84.52
B-5793 | 657208 | 16 September 2013 8:50 | 87.208
B-5793 | 657208 | 16 September 2013 12:31 | 89.306
B-5793 | 657208 | 17 September 2013 8:00 | 82.397
B-5793 | 657208 | 17 September 2013 11:45 | 85.755
B-5793 | 657208 | 18 September 2013 9:44 | 85.973
B-5793 | 657208 | 21 September 2013 0:06 | 66.281
B-5793 | 657208 | 21 September 2013 4:08 | 86.617
B-5793 | 657208 | 21 September 2013 8:36 | 75.281
Table 4. Fault records.
ID | Date | Noise Source | Noise Type
B2530 | 1 August 2009 | Abnormal left engine | Compressor fault noise
B2588 | 2 January 2014 | Fan seal broken | Fan fault noise
B6076 | 26 February 2002 | Turbine oil leakage | Turbine fault noise
B6070 | 2 January 2014 | Abnormal turbine blade | Turbine fault noise
Table 5. Washing records.
Date | ID | Base | CAMP
8 September 2014 0:00 | B-1816 | Beijing | A320 720000-CCA-C-02
8 September 2014 0:00 | B-1816 | Beijing | A320 720000-CCA-C-02
28 September 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-02
16 January 2015 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-02
9 March 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-03
19 April 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-03
16 April 2014 0:00 | B-2364 | Chengdu | A320 720000-CCA-C-02
16 April 2014 0:00 | B-2364 | Chengdu | A320 720000-CCA-C-02
Table 6. Split aeroengine data grouping table.
 | Data Number | Flight Cycles
Training data | 1 | 1~329
 | 2 | 330~481
 | 3 | 482~708
 | 4 | 709~1142
 | 5 | 1143~1484
 | 6 | 1485~1711
Testing data | 7 | 1712~1981
 | 8 | 1982~2222
Table 7. Distances of IMF components.
Data Number | Fan Fault Data | Compressor Fault Data | High-Pressure Turbine Fault Data | Low-Pressure Turbine Fault Data
1 | 5.236 | 5.227 | 6.493 | 6.199
2 | 4442.453 | 5582.838 | ∞ | ∞
3 | 10,463.488 | 12,162.886 | ∞ | ∞
4 | 2.907 | 4.205 | 2.521 | 5.467
5 | 3.167 | 3.981 | 3.0158 | 6.385
6 | 6209.860 | 7909.258 | ∞ | ∞
Table 8. Distances of residual components.
Data Number | Fan Fault Data | Compressor Fault Data | High-Pressure Turbine Fault Data | Low-Pressure Turbine Fault Data
1 | 10.703 | 13.716 | 9.991 | 13.171
2 | ∞ | ∞ | ∞ | ∞
3 | ∞ | ∞ | ∞ | ∞
4 | 8.680 | 11.051 | 8.301 | 11.654
5 | 9.745 | 11.480 | 9.929 | 14.420
6 | ∞ | ∞ | ∞ | ∞
Table 9. Number of nodes in the h1 layer and reconstruction error.
nin | Mean Reconstruction Error | Standard Deviation | Minimum | Maximum
15 | 0.0442 | 0.0476 | 0.0174 | 0.152
16 | 0.0420 | 0.0237 | 0.0208 | 0.0863
17 | 0.0486 | 0.0317 | 0.0236 | 0.111
18 | 0.0420 | 0.0286 | 0.0190 | 0.114
19 | 0.0398 | 0.0272 | 0.0197 | 0.117
20 | 0.0376 | 0.0397 | 0.0170 | 0.140
21 | 0.0481 | 0.0349 | 0.0261 | 0.150
22 | 0.0484 | 0.0441 | 0.0173 | 0.168
23 | 0.0548 | 0.0356 | 0.0254 | 0.114
24 | 0.0528 | 0.0496 | 0.0194 | 0.117
25 | 0.0497 | 0.0507 | 0.0223 | 0.186
Table 10. nhid and reconstruction error.
nhid | Mean Reconstruction Error | Standard Deviation | Minimum | Maximum
15 | 0.0172 | 0.00522 | 0.00900 | 0.0260
16 | 0.0151 | 0.00460 | 0.00900 | 0.0220
17 | 0.0157 | 0.00748 | 0.00900 | 0.0350
18 | 0.0143 | 0.00487 | 0.0100 | 0.0270
19 | 0.0129 | 0.00375 | 0.00900 | 0.0190
20 | 0.0153 | 0.00503 | 0.00800 | 0.0260
21 | 0.0123 | 0.00343 | 0.00800 | 0.0180
22 | 0.0144 | 0.00469 | 0.00900 | 0.0220
23 | 0.0128 | 0.00418 | 0.00700 | 0.0190
24 | 0.0132 | 0.00486 | 0.00900 | 0.0230
25 | 0.0133 | 0.00421 | 0.00800 | 0.0230
Table 11. Reconstruction errors of training data.
Model | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | Average
GAE | 0.0810 | 0.0909 | 0.0637 | 0.0681 | 0.0585 | 0.0861 | 0.0747
DAE | 0.0853 | 0.0995 | 0.0991 | 0.0628 | 0.0901 | 0.0817 | 0.0864
EMD | 0.185 | 0.3578 | 0.320 | 0.288 | 0.248 | 0.705 | 0.351
Table 12. Reconstruction errors of testing data.
Model | No. 7 | No. 8 | Average
GAE | 0.0807 | 0.0807 | 0.0965
DAE | 0.161 | 0.161 | 0.0996
EMD | 0.365 | 0.467 | 0.416
Table 13. Step of each segment of data (step size of the EGTM in °C; No. 2–6 are training data, No. 7–8 are testing data).
Model | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | No. 7 | No. 8
Original data | 11.326 | 5.212 | 7.199 | 12.206 | 12.078 | 5.613 | 6.971
The proposed model | 10.604 | 4.841 | 6.832 | 11.292 | 11.555 | 5.584 | 5.108
DAE | 9.649 | 4.525 | 5.435 | 9.979 | 10.958 | 5.871 | 2.709
EMD | 10.220 | 2.813 | 5.067 | 8.137 | 9.469 | 10.288 | 0.738
Table 14. Step value variance between noise reduction data and original data.
Model | Training Data | Testing Data
The proposed model | 0.381 | 1.736
DAE | 2.522 | 9.116
EMD | 6.978 | 30.358
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

