Article

A Novel Denoising Method for Retaining Data Characteristics Brought from Washing Aeroengines

Zhiqi Yan, Ming Zu, Zhiquan Cui and Shisheng Zhong
1 School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China
2 School of Electronic Information & Media, Dongying Vocational Institute, Dongying 257091, China
3 School of Automotive Engineering, Harbin Institute of Technology, Weihai 264209, China
4 School of Ocean Engineering, Harbin Institute of Technology, Weihai 264209, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1485; https://doi.org/10.3390/math10091485
Submission received: 14 March 2022 / Revised: 25 April 2022 / Accepted: 26 April 2022 / Published: 29 April 2022
(This article belongs to the Special Issue Applied Mathematics to Mechanisms and Machines)

Abstract:
Airlines evaluate the energy-saving and emission-reduction effect of washing aeroengines by analyzing the exhaust gas temperature margin (EGTM) data of aeroengines so as to formulate a reasonable washing schedule. The noise in EGTM data must be reduced because it interferes with this analysis. EGTM data show several step changes after the aeroengine is washed. These step changes increase the difficulty of denoising because conventional denoising tends to smooth them out. A denoising method for aeroengine data based on a hybrid model is proposed to meet the need to accurately evaluate the washing effect. Specifically, the aeroengine data are first decomposed into several components by time and frequency. The amplitude of the component containing the most noise is amplified, and Gaussian noise is added to generate noise-amplified data. Second, a Gated Recurrent Unit Autoencoder (GAE) model is proposed to capture the features of the engine data. The GAE is trained to reconstruct the original data from the noise-amplified data to develop its noise reduction ability. The experimental results show that, compared with currently popular algorithms, the proposed denoising method achieves a better denoising effect while retaining the key characteristics of the aeroengine data.
MSC:
68T09

1. Introduction

An aeroengine becomes polluted after long-term operation. Pollution of aeroengines leads to mechanical failure, degraded engine performance, and pollutant gas emissions. Washing is the main approach to controlling aeroengine pollution specified in aircraft maintenance manuals: cleaning fluid is sprayed into the engine through multiple spray guns to remove fouling in the engine. Washing an aeroengine is expensive, so, in order to minimize the washing cost while maintaining normal engine performance, airlines must formulate a reasonable aircraft engine washing schedule.
A reasonable washing schedule can reduce 2735 tons of fuel consumption and 8626 tons of harmful gas emissions per 100 engines per year, saving USD 10.08 million [1]. However, the mechanical failure of the polluted engine causes high-frequency noise in the aeroengine data. The noise of aeroengine data will interfere with the reasonable formulation of an aircraft engine cleaning schedule, so it is necessary for airlines to denoise aeroengine data.
There are two difficulties in denoising aeroengine data: the noise of the engine itself is difficult to eliminate, and the step change of data generated by washing will be smoothed. Aeroengines are man-made systems. Different from noise in nature, the probability distribution of the engine’s own noise does not obey the normal distribution, which increases the difficulty of denoising. After washing the aircraft engine, the data will experience a step change. Traditional denoising methods often treat the step changes in data as noise and smooth them out, making it difficult for the data to accurately reveal the washing effect, adding difficulties for subsequent evaluation. Therefore, a new denoising model is needed to eliminate the noise of the engine itself and retain the impact of cleaning on the data, as shown in Figure 1.
Many noise reduction methods have been proposed, including empirical mode decomposition (EMD), wavelet threshold denoising, and filtering. Xue et al. [2] decomposed the data with EMD after adding Gaussian noise to the data and successfully separated the noise from the signal by searching for the dominant noise component through the continuous mean square error criterion. Lu et al. [3] randomly shuffled the high-frequency noise part of the data and then decomposed the data with EMD to achieve noise reduction. Hu et al. [4] denoised complex images with an undecimated wavelet transform method. Sadooghi et al. [5] denoised the compressor vibration signal of aeroengines with the wavelet threshold denoising method. Because the effect of wavelet threshold denoising depends on the threshold and the threshold function, the authors evaluated 84 combinations of threshold and threshold function to find the most suitable one for denoising engine compressor vibration signals. Maragos et al. [6] proposed morphological filters, which are filters composed of the basic operations of mathematical morphology and can selectively suppress image noise. Sedaaghi et al. [7] proposed mediated morphological filters by combining morphological filters with median filtering and classical gray-scale morphological operators. Mediated morphological filters are effective at eliminating salt-and-pepper noise in images. Yang et al. [8] denoised temperature data with a two-way Kalman filter method; they improved the computational efficiency by simplifying the Kalman filter algorithm through precalculation of the filter coefficients.
The above methods show some disadvantages. The EMD algorithm must determine the dividing point between noise components and key characteristics; the wavelet threshold denoising algorithm requires the threshold and threshold function to be chosen manually. These two algorithms are inefficient, and their results can only be approximate. The Kalman filter is suitable for linear systems rather than highly nonlinear systems such as aeroengines. None of the three methods is adaptive: researchers must select or set model parameters according to the data characteristics [9]. Therefore, these methods are not suitable for aeroengine data with a non-uniform data distribution and hardly meet the need to accurately evaluate the washing effect.
Autoencoders are adaptive artificial intelligence algorithms that are frequently used as noise reduction models. Autoencoders were first proposed by Bourlard and Kamp [10] to extract features from raw data. The idea of the denoising autoencoder (DAE) is to make the autoencoder reconstruct the original data from noise-added data so that it acquires a noise reduction ability. Vincent et al. [11] enhanced the robustness of the model by setting a specific fraction of the autoencoder inputs to zero. Song et al. [12] used seismic data to verify the above algorithm and found that it can filter out strong random noise, but its efficiency is low. According to the characteristics of underwater heterogeneous information data, Wang et al. [13] combined a three-layer sparse autoencoder with two convolutional layers to build a model with strong noise reduction ability. Peng et al. [14] added convolutional layers to the autoencoder, which made the autoencoder more robust with a smaller reconstruction error; however, this also reduces the signal amplitude. Song et al. [15] constructed a convolutional autoencoder to filter seismic data with a low signal-to-noise ratio and achieved desirable results. Kensert et al. [16] denoised chromatograms to a completely or almost completely noise-free state with a deep convolutional autoencoder.
However, the DAE also smooths the edge features of the data while denoising. Although the methods in [11,13] preserve the characteristics of the data to the greatest extent, their premise is that the noise in underwater heterogeneous information data and seismic data comes from nature and conforms to a Gaussian distribution. The noise in aeroengine data comes from the engine system rather than from nature, and there is no reference documenting that engine noise conforms to a Gaussian distribution. Therefore, the above-mentioned algorithms are not well suited to engine data: existing autoencoder-based denoising methods can eliminate conventional noise but struggle to eliminate the noise specific to the aeroengine, and they hardly meet the need to accurately evaluate the washing effect.
To meet the needs of accurately evaluating the washing effect, a denoising method for aeroengine data based on a hybrid model is proposed. This method can filter out the noise of the aeroengine data, retaining the edge features of the data. The method first splits the data by washing time, and then decomposes the data into several components by frequency. Second, the method finds the component that contains the most noise, amplifies its magnitude, and adds Gaussian noise to compose the noise-amplified data. Finally, the method inputs the noise-amplified data together with the original data into the proposed autoencoders, training the model’s ability to recover the detailed information of the engine data from the noise-amplified data. Figure 2 shows the principle of the aeroengine data denoising method based on the hybrid model.
The rest of the paper is organized as follows: The second part presents the decomposition method for the aeroengine data and the identification method for the data frequency band containing the most noise, and proposes the GAE model for denoising aeroengine data. The third part first introduces the source of the engine data, then gives the identification results for the data frequency bands containing the most noise and the determination of the hyperparameters of the GAE model, and finally tests the noise reduction effect of the EMD model, the DAE model, and the GAE model on aeroengine data. The fourth part summarizes the superiority of the proposed aeroengine data denoising method through the analysis of the above noise reduction test results.

2. Denoising Method for Aeroengine Data Based on a Hybrid Model

The abbreviations used in this paper and their meanings are given in Table 1.

2.1. Analysis of Noise Reduction Problems and Corresponding Solutions

The problems of eliminating engine noise while retaining the data step changes caused by washing are analyzed from the following three aspects, and corresponding methods are proposed.
Problem 1: Difficulty in retaining the data step size generated by washing the engine.
Aeroengines are complex nonlinear systems. Engine data are the output of this complex system and characterize the performance and condition of the engine. Washing can be viewed as a stimulus applied to the engine system: washing changes the state of the system, and as the state changes, the distribution of the data output by the system changes. Traditional methods remove this change as noise, thus smoothing out the data's features.
The solution to this problem is to divide aeroengine data into several data segments according to the washing time. The data distribution within each data segment does not change, avoiding the smoothing of data features.
Problem 2: Difficulty in identifying engine-specific noise.
Each data segment contains an engine characteristic part and an engine-specific noise part. The process of denoising is performed to maximize the separation of engine-specific noise components from data segments. Traditional methods can identify noise with a Gaussian distribution, but it is difficult to identify engine-specific noise.
The solution to this problem is to decompose the data segment into several components with different frequencies. The similarity of each component to the noise data specific to the aeroengine is calculated to identify the component that contains the most noise.
Problem 3: Difficulty in reducing engine-specific noise.
Because the components containing the most noise still contain engine characteristic information, it is unscientific to directly delete the data components containing the most noise. Removing these components results in a loss of the feature components in the aeroengine data, smoothing the edge features of the data.
To reduce the loss of feature components, a new denoising method is proposed. First, the amplitude of the data component that contains the most noise is amplified and artificial noise is added to the data. Second, an autoencoder model is built. Finally, the model is trained to reconstruct the original data from the noise-amplified data, developing its denoising ability.
Based on the above analysis, a denoising method for aeroengine data based on a hybrid model is proposed. The noise reduction process of this method is shown in Figure 3. After the EGTM data are collected, they are split into several data sets according to the water wash records. The data component containing the most noise, together with artificial Gaussian noise, is superimposed on the original data, and the resulting noise-amplified data and the original data are input into the autoencoder for training, generating several sets of denoised EGTM data. Finally, these sets are assembled into the complete denoised EGTM data.
As shown in Figure 3, the denoising method for aeroengine data based on mixed models includes a proposed method for identifying data components that contain noise and a proposed denoising method.
Method 1: The proposed method for identifying data components that contain noise.
The paper first splits the aeroengine data according to the washing records; secondly, it decomposes the data into several components by frequency with the EMD algorithm; lastly, the noise content of the data components is evaluated with a distance calculation based on the DTW algorithm to find the data component that contains the most noise.
Method 2: The proposed denoising method.
First, the amplitude of the data component that contains the most noise is amplified and artificial Gaussian noise is added to the data. Second, an autoencoder model is built. Finally, the model is trained to reconstruct the original data from the noise-amplified data, developing its denoising ability, as illustrated by the sketch below.
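To make Method 2 concrete, the following Python sketch shows one way the noise-amplified input could be assembled from a data segment and its noisiest component. The amplification factor of 1.1 matches the experiments in Section 3.2, while the Gaussian noise level sigma and the exact way the amplified component is recombined with the original data are assumptions, not the authors' implementation.

import numpy as np

def make_noise_amplified(x_segment, noisiest_component, gain=1.1, sigma=0.5, seed=0):
    # Amplify the component that contains the most noise and add Gaussian noise.
    # gain = 1.1 follows Section 3.2; sigma and the composition below are assumptions.
    x_segment = np.asarray(x_segment, dtype=float)
    noisiest_component = np.asarray(noisiest_component, dtype=float)
    rng = np.random.default_rng(seed)
    amplified = x_segment + (gain - 1.0) * noisiest_component
    return amplified + rng.normal(0.0, sigma, size=x_segment.shape)

The pair (noise-amplified segment, original segment) then forms one training sample for the autoencoder.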

2.2. Identification Method of Aeroengine Data Components Containing Noise Based on EMD and DTW

This section is about splitting the data, decomposing the data, and identifying noise.
The washing time is used as the basis for splitting the engine data. The engine data in this study come from OEM factories and are recorded with the flight cycle as the observation time, so the aeroengine data are time series data. The engine washing records are tables that record the time, location, engine type, and aircraft type of each engine wash; only the washing time is used in this study. The data are denoted as X = {x1, x2, x3, …}. If the number of washes is n − 1, the washing records can be defined as Twashing = {t1, t2, …, tn−1}. Twashing splits the data X into n pieces. Let X(i) denote the i-th piece, which is given by Equation (1).
X(i) = {x_{t(i−1)+1}, x_{t(i−1)+2}, …, x_{t(i)}}, i = 1, 2, …, n
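A minimal Python sketch of this splitting step is given below; the mapping of the washing times t1, …, tn−1 to 0-based array indices is an assumption made only for illustration.

import numpy as np

def split_by_washes(x, wash_times):
    # Split the EGTM series at the flight cycles recorded in the washing records,
    # returning the n segments X(1), ..., X(n) of Equation (1).
    bounds = [0] + sorted(wash_times) + [len(x)]
    return [np.asarray(x[bounds[k]:bounds[k + 1]]) for k in range(len(bounds) - 1)]

# toy usage: three washes split a series into four segments
egtm = np.linspace(90.0, 60.0, 12)
segments = split_by_washes(egtm, wash_times=[3, 6, 9])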
Then X(i) is decomposed by frequency with the EMD algorithm. The EMD algorithm is a signal analysis algorithm that decomposes the signal according to the timescale of the aeroengine data without setting any basis functions in advance. Aeroengine data can be divided into several "intrinsic mode functions" (IMFs) by the EMD method without leaving the time domain, so the aeroengine data can be expressed as the sum of several IMF components and a trend (residual) component.
The steps of EMD decomposing the data segment X(i) obtained by dividing the aeroengine data are as follows:
Step 1: Find all the extreme points of the aeroengine data, use the spline curve to connect all the maximum points into the upper envelope, and connect all the minimum points to the lower envelope.
Step 2: Calculate the average value m(i) of the upper and lower envelopes and the engine data IMF component c(i) according to Equation (2).
c(i) = X(i) ‒ m(i)
Norden E. Huang [17] put forward the concept of the IMF and defined two conditions for an IMF: 1. over the whole time range, the number of local extreme points and the number of zero-crossing points of the IMF differ by at most one; 2. the mean value of the upper and lower envelopes is zero. If c(i) does not satisfy these two conditions, repeat steps 1 and 2 until they are satisfied, and then go to step 3.
Step 3: Separate c(i) from the aeroengine data X(i), i.e., compute X(i) ‒ c(i). Repeat steps 1, 2, and 3 until X(i) ‒ c(i) becomes a monotonic sequence. The residual r(i) is then defined by Equation (3):
r(i) = X(i) ‒ ∑c(i)
where r(i) is the residual component of engine data X(i).
The pseudocode of the EMD algorithm is given by Algorithm 1.
Algorithm 1. EMD algorithm
IMFs = []
While haspeaks(X(i)):
    maximum_points = search(X(i), maximum)
    minimum_points = search(X(i), minimum)
    max_envelope = fitting(maximum_points)      # spline through the maxima
    min_envelope = fitting(minimum_points)      # spline through the minima
    m(i) = (max_envelope + min_envelope)/2      # envelope mean
    c(i) = X(i) − m(i)                          # candidate IMF (sifted until the IMF conditions hold)
    IMFs.append(c(i))
    X(i) = X(i) − c(i)                          # residual carried to the next iteration
r(i) = X(i)                                     # final (monotonic) residual component
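For reference, the core of a single sifting pass (steps 1 and 2) can be sketched with SciPy splines as below; this is a simplified illustration that omits the full sifting iteration, the IMF stopping criteria, and envelope end-point handling.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    # One sifting pass: fit the envelopes, compute their mean m(i),
    # and return the candidate IMF c(i) = x - m(i).
    t = np.arange(len(x))
    max_idx = argrelextrema(x, np.greater)[0]
    min_idx = argrelextrema(x, np.less)[0]
    if len(max_idx) < 2 or len(min_idx) < 2:
        return None                              # too few extrema to fit envelopes
    upper = CubicSpline(max_idx, x[max_idx])(t)  # upper envelope through the maxima
    lower = CubicSpline(min_idx, x[min_idx])(t)  # lower envelope through the minima
    m = (upper + lower) / 2.0
    return x - m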
Real engine data can be seen as a superposition of ideal data and noise data caused by minor faults. To identify the characteristic noise components of aeroengines, four kinds of fault data were prepared in this study, including compressor fault data, fan fault data, high-pressure turbine fault data, and low-pressure turbine fault data. The distance of the fault data from each data component is calculated to find the data component with the most noise.
However, the components of the engine data and the four types of fault data have unequal lengths and no linear point-to-point correspondence, and there are phenomena such as amplitude scaling and linear drift between data sets [18]. Using the Euclidean distance directly to represent the similarity between each component of the engine data and the four types of fault data would therefore obscure this information and yield unrealistic distances. A distance calculation method that can accommodate different data lengths is required.
The DTW algorithm is proposed for measuring the distance between two time series of different lengths. DTW is widely used in the field of speech recognition, and it is also suitable for recognizing two similar aeroengine data sets. In this study, DTW is used to calculate the minimum distance between the four types of fault data and c(i) or r(i).
The process of the DTW algorithm is shown in Figure 4. First, a matrix grid is created whose dimensions are the length of the engine data component and the length of the fault data. Each element of the matrix is the Euclidean distance d between the points of the engine data component and the fault data at the corresponding positions. Then, starting from the lower-left corner (1,1) of the grid, move to the adjacent cell (right, upper, or upper-right diagonal) with the smallest cumulative value until the upper-right corner of the grid is reached, forming a path. Finally, the cumulative value of the matrix elements that the path passes through is the distance between the engine data component and the fault data.
The pseudocode of the DTW algorithm is given by Algorithm 2.
Algorithm 2. DTW algorithm
n = length(engine_data)
m = length(fault_data)
dtw_matrix = [∞](n+1)×(m+1)
dtw_matrix(1,1) = 0
for i = 2:n + 1
    for j = 2:m + 1
        cost = [engine_data(i − 1) − fault_data(j − 1)]^2
        dtw_matrix(i,j) = cost + min[dtw_matrix(i − 1,j), dtw_matrix(i − 1,j − 1), dtw_matrix(i,j − 1)]
    end
end
return dtw_matrix(n + 1, m + 1)
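A direct Python version of Algorithm 2, together with the distance sum of Equation (4), could look as follows; the component and fault series used in the usage example are placeholders.

import numpy as np

def dtw_distance(engine_component, fault_data):
    # DTW distance with the squared point-wise cost used in Algorithm 2.
    n, m = len(engine_component), len(fault_data)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (engine_component[i - 1] - fault_data[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i - 1, j - 1], D[i, j - 1])
    return D[n, m]

# usage: total distance of one component c(i) to the four fault series (Equation (4))
c_i = np.sin(np.linspace(0, 6, 120))                                       # placeholder component
fault_sets = [np.sin(np.linspace(0, 6, 150) + 0.2 * k) for k in range(4)]  # placeholder FAN/COMP/HPT/LPT series
dis_c = sum(dtw_distance(c_i, f) for f in fault_sets)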
Define the distance as dis. The distances between c(i) or r(i) and the four kinds of fault data, together with their symbols and explanations, are given in Table 2.
In Table 2, the distance between c(i) and all fault data can be given by Equation (4).
disc(i) = disc(i),FAN + disc(i),COMP + disc(i),HPT + disc(i),LPT
The distance between r(i) and all fault data can be given by Equation (5).
disr(i) = disr(i),FAN + disr(i),COMP + disr(i),HPT + disr(i),LPT
The smaller the distance, the more noise it contains. The engine data component corresponding to the smallest distance is selected as the noise component of the aeroengine.

2.3. Gated Recurrent Unit Autoencoder (GAE): A Proposed Denoising Autoencoder Model for Aeroengine Data

DAE is a promising method for data denoising, which can be used for aeroengine data denoising. It is a feature extractor with a denoising function whose purpose is to convert noisy aeroengine data into clean aeroengine data.
In addition, the aeroengine data is a time series, and each element in the sequence is affected by all the previous elements, which requires the model to be able to learn the influence from the historical accumulation of the aeroengine data. Real engine data is difficult to obtain, and the limited amount of data has difficulty supporting large-scale models. This requires the model to have a simple structure, and the GRU (Gated Recurrent Unit) model meets the requirements. The GRU was proposed by Cho et al. [19]. GRUs have fewer parameters, making training faster and requiring less data [20].
Therefore, in this paper, the Gated Recurrent Unit Autoencoder (GAE) model is proposed as a denoising module for aeroengine data by combining the autoencoder with the GRU. The structure of the GAE model is shown in Figure 5. The model includes encoders and decoders with special structures. The input end of the encoder and the output end of the decoder are both GRU modules, and a three-layer autoencoder is used as the connection in the middle. The three layers of the autoencoder are marked as h1, h2, and h3. In the coding stage, the aeroengine data is continuously input to the GRU and the characteristic data is output through h1 and h2. In the decoding stage, the feature data is input into the h3 layer, and the denoised data is output through the GRU module.
In Figure 5, the input data are the aeroengine data whose own noise has been amplified and to which Gaussian noise has been added, expressed as X(i)noise; the label data are the raw aeroengine data, expressed as X(i). The GRU involved in Figure 5 has two gates: an update gate and a reset gate. The basic structure of the GRU is shown in Figure 6.
In Figure 6, the definitions are as follows: z is the update gate, r is the reset gate, h is the current state, and h−1 is the previous state. The update gate z controls how much previous state information is brought into the current state. The reset gate r controls how much information from the previous state is written to the current candidate state; the smaller the reset gate, the less information from the previous state is written. Define the current candidate state as h̃. For the input data X(i)noise, the forward propagation of the network is given by Equation (6):
r = σ(Wr·[h−1, X(i)noise])
z = σ(Wz·[h−1, X(i)noise])
h̃ = tanh(Wh·[r ∗ h−1, X(i)noise])
h = (1 − z) ∗ h−1 + z ∗ h̃
y = σ(Wy·h)
where Wr is the weight from the input and the hidden layer at the previous moment to the reset gate r; Wz is the weight from the input and the hidden layer at the previous moment to the update gate z; Wh is the weight from the input and the hidden layer at the previous moment to the candidate state h̃; and Wy is the weight from the hidden layer to the output layer.
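To make Equation (6) concrete, a single GRU step can be written in NumPy as below. Biases are omitted, as in Equation (6), and the dimensions in the toy usage are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h, W_y):
    # One forward step of Equation (6); the weights act on the concatenation [h_prev, x_t].
    concat = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ concat)                                   # reset gate
    z = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state h~
    h = (1.0 - z) * h_prev + z * h_cand                         # new hidden state
    y = sigmoid(W_y @ h)                                        # output
    return h, y

# toy usage with random weights
rng = np.random.default_rng(0)
n_h, n_x = 4, 1
W_r, W_z, W_h = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(3))
W_y = rng.normal(size=(1, n_h))
h, y = gru_step(rng.normal(size=n_x), np.zeros(n_h), W_r, W_z, W_h, W_y)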
The output of the h2 layer of the denoising autoencoder model is the feature code of the aeroengine data. In the encoding process, the encoding function maps high-dimensional aeroengine data vectors to low-dimensional feature vectors. The encoder function is the activation function of the h2 layer of the autoencoder, defined as the sigmoid function. The feature vector output by the h2 layer is co. The aeroengine data encoding of the h2 layer is then given by Equation (7).
co = 1 / (1 + e^(−(whid·y + bhid)))
where whid is the h2 layer weight matrix; bhid is the h2 layer bias vector.
The decoding process maps the feature vector co to the reconstructed aeroengine data. The activation function of the h3 layer in the decoder is defined as a linear function, and the feature vector output by the h3 layer is deco, which is given by Equation (8).
deco = wout·h + bout
where wout is the h3 layer weight matrix; bout is the h3 layer bias vector.
Finally, deco is output by the GRU module as denoised data, which is defined as X(i)denoise. The loss function in this model is the mean absolute error (MAE). Define the number of output nodes of the GAE model as nout, then MAE is given by Equation (9).
MAE(X(i)denoise, X(i)noise) = (1/nout) ∑_{j=1}^{nout} ‖X(i)denoise,j − X(i)noise,j‖2
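As an illustration only, one way the GRU-encoder, three fully connected layers, and GRU-decoder layout described above could be wired in tf.keras is sketched below. The layer widths follow Section 3.2 (nin = nout = 19, nhid = 21), but the unit counts of the GRU modules, the sequence handling, and the exact connection pattern are assumptions rather than the authors' implementation.

import tensorflow as tf

def build_gae(seq_len, n_in=19, n_hid=21):
    # Noise-amplified EGTM segment in, denoised segment out.
    inputs = tf.keras.Input(shape=(seq_len, 1))
    enc = tf.keras.layers.GRU(n_in, return_sequences=True)(inputs)                                # GRU at the encoder input
    h1 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_in, activation="sigmoid"))(enc)  # h1
    h2 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_hid, activation="sigmoid"))(h1)  # h2: feature code co
    h3 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_in, activation="linear"))(h2)    # h3: deco
    outputs = tf.keras.layers.GRU(1, return_sequences=True)(h3)                                   # GRU at the decoder output
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
    return model

# training call: model.fit(X_noise, X_original, ...) with X(i)noise as input and X(i) as label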

3. Experiment

All functions are implemented in Python 3.5 on Windows with TensorFlow as the framework. TensorFlow is an open-source machine learning platform. The hardware platforms involved in the calculation are a CPU and a GPU: an Intel(R) Core(TM) i3-9100F CPU @ 3.60 GHz and an NVIDIA GeForce GTX TITAN X, respectively.
The data involved in this paper are aeroengine EGTM data. Airlines take EGTM as a reference for aeroengine performance. EGTM refers to the difference between the red-line value of the engine exhaust temperature and the exhaust temperature when the engine takes off at full thrust. When the engine is washed, there are step changes in the EGTM data. During denoising, these step changes may be recognized as noise and smoothed. Therefore, selecting EGTM data as the noise reduction object can better verify the performance of the proposed method.
Since exhaust gas temperature margin (EGTM) data are an important indicator for evaluating the efficiency of engine washing, the data used in this paper are the EGTM data together with the flight cycles recorded in the washing records. The EGTM data are collected at the outlet of the low-pressure turbine of the engine, and Figure 7 shows the measurement point of the exhaust temperature of the aeroengine.
EGTM data is real data provided by the OEM. In this study, the flight cycle was used as a time unit to record the aeroengine data from takeoff to landing. Therefore, aeroengine data is time series data. In practice, the airline provided three materials for an aeroengine, including OEM data in Table 3, fault records in Table 4, and washing records in Table 5.
The data required for the study are spread across three tables. To collect the data to be denoised, the aircraft registration numbers (IDs) and washing dates in Table 5 are used to locate the corresponding IDs in the OEM data in Table 3 and collect all the corresponding EGTM data. To collect the fault data, the IDs in the fault records in Table 4 are used to locate the corresponding IDs in the OEM data in Table 3 and collect all the corresponding EGTM data. Figure 8 shows the aeroengine EGTM data, and Figure 9 shows all types of fault data.
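A hedged pandas sketch of this collection step is shown below; the DataFrame and column names (ID, Time, EGTM, Date) mirror Tables 3–5, but the exact join logic used on the airline data is not documented and is assumed here.

import pandas as pd

def collect_egtm(oem_df, wash_df, fault_df):
    # EGTM series to be denoised: all OEM rows whose ID appears in the washing records.
    washed_ids = wash_df["ID"].unique()
    egtm_to_denoise = oem_df[oem_df["ID"].isin(washed_ids)].sort_values("Time")[["ID", "Time", "EGTM"]]
    # Fault EGTM series: all OEM rows whose ID appears in the fault records.
    fault_ids = fault_df["ID"].unique()
    fault_egtm = oem_df[oem_df["ID"].isin(fault_ids)].sort_values("Time")[["ID", "Time", "EGTM"]]
    return egtm_to_denoise, fault_egtm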
During the on-wing period, the engine was washed a total of nine times. Taking the washing times as split points, the aeroengine data are split into eight segments. The first six segments are used as the training set, and the last two segments are used as the testing set. The eight segments are numbered sequentially in Table 6.

3.1. Noise Identification Results for Data Components

To identify the data components that contain the most noise, the aeroengine data components are first decomposed based on EMD after being collected. Secondly, based on DTW technology, the noise identification of each component is carried out. The aeroengine EGTM data in Figure 8 are decomposed into residual components and IMF components. Figure 10 shows the two components derived from the decomposition of the six-segment aeroengine training data.
Based on the DTW algorithm, the distances were calculated between the two components of the aeroengine EGTM data and the four kinds of fault data, namely compressor fault data, fan fault data, high-pressure turbine fault data, and low-pressure turbine fault data. In the calculation results, very large numbers (>1 × 10^308) appeared for some of the distances, and these are marked as "∞". Table 7 records the distances between the IMF component of the six-segment aeroengine training data and the four kinds of fault data. Table 8 records the distances between the residual component of the six-segment aeroengine training data and the four kinds of fault data.
In Table 7 and Table 8, the distances between the residual components of No. 2, 3, and 6 in the training data and the four types of fault data are all “∞”, indicating that these three sets of residual component data hardly contain noise components. The distances between the residual components of No. 1, 4, and 5 and the four kinds of fault data are all within 15, while the distances between the IMF components of these three groups and the four kinds of fault data are smaller than those of the residual components, which shows that more noise is included in the IMF component of the aeroengine EGTM data. Therefore, the IMF component is selected as the data component containing the most noise. Separate the data components that contain the most noise in the testing data (No. 7 and No. 8), as shown in Figure 11.

3.2. Hyperparameter Settings for GAE

In this part, the number of nodes in the h1 layer and the number of nodes in the h2 layer of the GAE model are determined. Define the number of nodes in the h1 layer as nin, the number of nodes in the h2 layer as nhid, and the number of nodes in the h3 layer as nout. Since the number of nodes in the h1 layer is the same as that in the h3 layer, nin = nout.
There is currently no formula for determining nin. To obtain reliable results, the enumeration method is used to determine the most suitable nin. The enumeration range is 15–25. The mean absolute error (MAE) is used to describe the reconstruction error of the denoising autoencoder for the aeroengine data.
The model uses the Adam algorithm as the descent algorithm. The number of iterations is 3000, and the training batch size is 100. The learning rate is 0.001, and the component containing the most noise is amplified by a factor of 1.1. The model was run 10 times, and nin and the reconstruction error were plotted as shown in Figure 12.
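The enumeration protocol behind Figure 12 and Table 9 can be summarized by the sketch below. Here, train_and_eval is only a stand-in for building and training the GAE once and returning its reconstruction MAE; the value it returns is a placeholder so the sketch stays runnable.

import numpy as np

def train_and_eval(n_in, seed):
    # Stand-in for: build the GAE with n_in input nodes, train with Adam
    # (3000 iterations, batch size 100, learning rate 0.001), return the MAE.
    rng = np.random.default_rng(seed)
    return 0.04 + 0.01 * rng.random()            # placeholder value

results = {}
for n_in in range(15, 26):                       # enumeration range 15-25
    errors = [train_and_eval(n_in, seed) for seed in range(10)]   # 10 repeated runs
    results[n_in] = (np.mean(errors), np.std(errors), np.min(errors), np.max(errors))
best_n_in = min(results, key=lambda k: results[k][0])             # smallest mean reconstruction error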
The vertical line in Figure 12 marks the nin at which the reconstruction error is the smallest. In order to analyze the results in more detail, the mean, standard deviation, and minimum and maximum values of the 10 experiments are given in Table 9. The data in Table 9 are reported to three significant figures.
From the data given in Table 9, the average reconstruction error is the lowest when nin is 20 and slightly higher when nin is 19. However, the standard deviation when nin is 20 is larger than that when nin is 19 (0.0397 versus 0.0272). The minimum error when nin is 20 is smaller than that when nin is 19, while the maximum error when nin is 20 is greater. Table 9 thus shows that the output is unstable when nin is 20, although the average reconstruction error is small, whereas the output is stable when nin is 19 despite a slightly larger average reconstruction error. Therefore, nin is determined to be 19 in this paper.
Similarly, to minimize reconstruction error, nhid must be determined. At present, researchers have developed empirical formulas [21] for nhid, which help narrow the search for optimal nodes. The empirical formula is given by Equation (10).
nhid = √(nin × nout)
According to Equation (10), since nin = nout = 19, nhid ≈ √(19 × 19) = 19, so nhid should be around 19. The optimal nhid is searched for by the enumeration method in the interval 15–25. After the experiment is carried out 10 times, nhid and the reconstruction error are plotted in Figure 13.
As can be seen from Figure 13, when nhid is 21, the reconstruction error is the lowest. To analyze the results in more detail, Table 10 gives the mean reconstruction error, standard deviation, minimum error, and maximum error of the 10 experiments, all reported to three significant figures.
From the data given in Table 10, the mean reconstruction error is the lowest when nhid is 21. The standard deviation is the smallest when nhid is 21. This means that the model’s output is currently stable. Therefore, the most suitable nhid is determined to be 21.

3.3. Validation of Aeroengine Data Denoising Method

3.3.1. Reconstruction Accuracy Verification of GAE

First, the reconstruction accuracy of the GAE model is verified. Figure 14 visually shows the reconstruction errors of the three denoising models. Data No. 1–6 are the reconstruction errors of the training data, and data No. 7–8 are the reconstruction errors of the testing data. The figure shows that the reconstruction error of the denoised data output by the EMD model is much larger than that of the GAE model and the DAE model. Except for data No. 4 and No. 8, the reconstruction error of the GAE model is smaller than that of the DAE model.
The reconstruction errors of the DAE, EMD, and GAE models on the training data are given in Table 11.
Table 11 and Table 12 show that the MAE error of GAE training data is (0.0864 − 0.0747)/0.0864 × 100% = 13.54% lower than that of DAE and 78.72% lower than that of EMD; the MAE error of GAE test data is 3.11% lower than that of DAE and 76.80% lower than that of EMD.
The MAE error of GAE is relatively smaller than that of DAE, which reflects that the GRU in GAE can learn the time series relationship of aeroengine data so as to better reconstruct pure data. The combination of the GRU module and the DAE model effectively improves the noise reduction effect, making the MAE error smaller than that of the DAE model.
The MAE error of GAE is much smaller than that of EMD. The reason for this is that the algorithm of EMD is susceptible to noise interference. The algorithm of EMD separates the noise signal by calculating the envelopes of the extrema of the data. Noise makes the extremum of the data change, and the envelopes follow. The changes of the envelope also make the EMD algorithm unstable, resulting in a large error.
To summarize, Tables 11 and 12 reflect that the noise reduction effect of the DAE model is slightly better than that of the EMD model, while the mean reconstruction error of the GAE model is smaller than that of the DAE model. The data reconstruction ability of the GAE model is stronger than that of the DAE model and the EMD model, which proves that the GAE model preserves the characteristics of the aeroengine data better.

3.3.2. Verification of the Effectiveness of the Proposed Noise Reduction Method Based on the Hybrid Model

Figure 15 shows the reconstruction curves of DAE, EMD, and the proposed model on the EGTM testing data of an aeroengine. In the figure, the original data are marked with black curves, the denoised data of the proposed model are marked with green dotted lines, the denoised data of the DAE model are marked with light blue, and the denoised data of the EMD model are marked with dark blue.
The red vertical line in Figure 15 represents the time of washing the aeroengine. Due to the washing, there is a step change in the original EGTM data. DAE, EMD, and the proposed method show different performances under the influence of washing. The DAE method significantly smooths the step change of the EGTM, which shows that the DAE model has an underfitting problem in training. The EMD method separates the step change from the EGTM as noise, thus also smoothing the step change of the EGTM.
In Figure 15, the proposed model can reconstruct the EGTM sequence and express data mutation well. The denoised data of the proposed model in the figure is closer to the original data than DAE and EMD, which shows that the denoised data of the proposed model is more suitable for the subsequent analysis of washing effects. In order to more accurately express the noise reduction effect of DAE, EMD, and the proposed model on aeroengine data, the EGTM step value after aeroengine washing is investigated.
The influence of the denoising model on the subsequent evaluation of the washing effect can be assessed by calculating the step size after washing. The step size after washing is a direct expression of the washing effect, so the post-washing step size of the denoised data is an important indicator of whether the denoising model affects the evaluation of the washing effect. Since the data increase gradually rather than suddenly after washing, the step size calculated in this paper is the difference between the average values of the 10 data points before and after washing. The step sizes are calculated from data No. 2 onward.
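In code, this step-size definition amounts to the short function below; treating the washing cycle itself as the first point of the "after" window is an assumption.

import numpy as np

def washing_step_size(egtm, wash_idx, window=10):
    # Mean of the 10 points after the wash minus the mean of the 10 points before it.
    before = np.mean(egtm[wash_idx - window:wash_idx])
    after = np.mean(egtm[wash_idx:wash_idx + window])
    return after - before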
Figure 16 visually shows the step sizes of the No. 2–8 raw data and the step sizes of the denoised data of DAE, EMD, and the proposed model. Table 13 gives the step sizes of the training data and testing data; the entries are the step sizes of each data segment. Since No. 1 is unwashed engine data, the step sizes are recorded from data No. 2.
Table 14 gives the variance between the steps of the original data and the steps of the denoised data of DAE, EMD, and the proposed model.
It can be seen from Table 14 that EMD has a large variance in both the testing set and the training set. The variances of the DAE in the training set and testing set are much smaller than those of the EMD, but they are still large. The variances of the proposed model are extremely small in both the testing set and the training set. The step sizes calculated from the denoised data of the proposed model are the closest to the original data. Therefore, the proposed model has the least impact on the analysis of subsequent washing effects. It is more suitable for aeroengine data noise reduction than other models.
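As a worked check, if the "variance" in Table 14 is taken to be the mean squared deviation of the denoised step sizes from the original step sizes (an assumption about the authors' definition), the training-set value for the proposed model can be reproduced from Table 13:

import numpy as np

# Step sizes of segments No. 2-6 (training data) from Table 13.
orig_steps = np.array([11.326, 5.212, 7.199, 12.206, 12.078])
prop_steps = np.array([10.604, 4.841, 6.832, 11.292, 11.555])

variance = np.mean((orig_steps - prop_steps) ** 2)   # approximately 0.381, matching Table 14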

4. Conclusions

To improve prediction accuracy, a denoising method for aeroengine data based on a hybrid model is proposed in this paper. The method first amplifies the noise part of the data, and then adds Gaussian noise to the data as the input of the autoencoder. Let the autoencoder reconstruct the original data from the amplified noise data so that the autoencoder can perform targeted noise reduction. In the paper, the proposed model is compared with EMD and DAE, which reflects that the proposed model can effectively denoise the data and retain mutation characteristics after aeroengine washing.
The autoencoder involved in the hybrid-model-based aeroengine data denoising method is the GAE model proposed in this paper. The GAE model is composed of three fully connected layers connecting two GRU modules. The model is good at working with time series data. After testing with real aeroengine data, compared with EMD and DAE, the reconstruction error of the GAE model is the smallest, preserving the data features to the greatest extent.
The model proposed in this paper has an ideal effect on the denoising of EGTM data after aeroengine washing. The model is applicable to denoising various data with sudden changes, such as the gas path data of aeroengines or gas turbines after maintenance. In the future, we plan to collect more real data to improve our methods.

Author Contributions

Methodology, M.Z.; writing—original draft preparation, Z.Y., Z.C., and S.Z.; writing—review and editing, Z.C.; funding acquisition, M.Z. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number U2133202; and the National Natural Science Foundation of China, grant number 51975157.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, D.; Sun, J. Fuel and emission reduction assessment for civil aircraft engine fleet on-wing washing. Transp. Res. Part D Transp. Environ. 2018, 65, 324–331.
  2. Xue, F.; Sun, X.; Dong, Z.; Yang, H.; Wang, H. Research on Data Denoising Algorithm Based on EEMD. Mech. Eng. Autom. 2021, 5, 9–11.
  3. Lu, T.; Qian, W.; He, X.; Le, Y.; Huang, J. An improved EMD noise reduction method based on noise statistical characteristics. Bull. Surv. Mapp. 2020, 11, 71–75.
  4. Hu, K.; Cheng, Q.; Li, B.; Gao, X. The complex data denoising in MR images based on the directional extension for the undecimated wavelet transform. Biomed. Signal Process. Control 2018, 39, 336–350.
  5. Sadooghi, M.S.; Khadem, S.E. A new performance evaluation scheme for jet engine vibration signal denoising. Mech. Syst. Signal Process. 2016, 76, 201–212.
  6. Maragos, P.; Schafer, R.W. Morphological Filters. Part 1. Their Set-Theoretic Analysis and Relations to Linear Shift-Invariant Filters. IEEE Trans. Acoust. Speech Signal Process. 1987, 35, 1153–1169.
  7. Sedaaghi, M.H.; Daj, R.; Khosravi, M. Mediated morphological filters. In Proceedings of the 2001 International Conference on Image Processing (Cat. No.01CH37205), Thessaloniki, Greece, 7–10 October 2001; pp. 692–695.
  8. Yang, C.; Li, J.; Yang, W.; Yang, W. Denoising Method for Temperature Log Data Based on A Kalman Filter. Well Logging Technol. 2020, 2, 168–171.
  9. Li, Y.; Wang, C.; Tian, Y.; Wang, S. Parameter-shared variational auto-encoding adversarial network for desert seismic data denoising in Northwest China. J. Appl. Geophys. 2021, 11, 104428.
  10. Bourlard, H.; Kamp, Y. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 1988, 4, 291–294.
  11. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and Composing Robust Features with Denoising Autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 June 2008.
  12. Song, H.; Gao, Y.; Chen, W.; Zhang, X. Seismic noise suppression based on convolutional denoising autoencoder. Oil Geophys. Prospect. 2020, 6, 1210–1219.
  13. Wang, X.; Zhao, Y.; Teng, X.; Sun, W. A stacked convolutional sparse denoising autoencoder model for underwater heterogeneous information data. Appl. Acoust. 2020, 167, 107391.
  14. Peng, F.; Gao, Y. BPSK Signal Denoise Based on Convolution Auto-Encoder Network. Inf. Commun. 2020, 8, 41–44.
  15. Song, H.; Gao, Y.; Chen, W.; Xue, Y.J.; Zhang, H.; Zhang, X. Seismic random noise suppression using deep convolutional autoencoder neural network. J. Appl. Geophys. 2020, 178, 104071.
  16. Kensert, A.; Collaerts, G.; Efthymiadis, K.; Van Broeck, P.; Desmet, G.; Cabooter, D. Deep convolutional autoencoder for the simultaneous removal of baseline noise and baseline drift in chromatograms. J. Chromatogr. A 2021, 1, 462093.
  17. Wu, Z.; Huang, N.E. A study of the characteristics of white noise using the empirical mode decomposition method. Proc. R. Soc. London Ser. A Math. Phys. Eng. Sci. 2004, 460, 1597–1611.
  18. Park, S.; Chu, W.W.; Yoon, J.; Won, J. Similarity Search of Time-Warped Subsequences via a Suffix Tree. Inf. Syst. 2003, 7, 867–883.
  19. Chung, J.; Gulcehre, C.; Cho, K.H.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
  20. Olah, C. Understanding LSTM Networks. 2015. Available online: http://colah.github.io/posts/2015-08-Understanding-LSTMs (accessed on 9 March 2021).
  21. Sequin, C.H.; Clay, R.D. Fault tolerance in artificial neural networks. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 703–708.
Figure 1. Demand for denoising aeroengine data. (a) Raw engine data; (b) Denoising effect of traditional methods; (c) Denoising required for aircraft engines.
Figure 2. Denoising model principle.
Figure 3. Denoising method for aeroengine data based on a hybrid model.
Figure 4. Principle of DTW algorithm.
Figure 5. GAE Model.
Figure 6. The structure of GRU.
Figure 7. Aeroengine data sources.
Figure 8. The engine data involved in this paper.
Figure 9. The fault data involved in this paper. (a) EGTM data for compressor failure; (b) EGTM data for fan failure; (c) EGTM data for high pressure turbine failure; (d) EGTM data for low pressure turbine failure.
Figure 10. Components of the six-segment aeroengine training data. (a) No. 1 data residual component; (b) No. 1 data IMF component; (c) No. 2 data residual component; (d) No. 2 data IMF component; (e) No. 3 data residual component; (f) No. 3 data IMF component; (g) No. 4 data residual component; (h) No. 4 data IMF component; (i) No. 5 data residual component; (j) No. 5 data IMF component; (k) No. 6 data residual component; (l) No. 6 data IMF component.
Figure 11. Components that contain the most noise in the testing data. (a) Data No. 7 contains the most noise data component; (b) Data No. 8 contains the most noise data component.
Figure 12. Relationship between the number of nodes in the h1 layer and reconstruction error.
Figure 13. Relationship between nhid and reconstruction error.
Figure 14. Comparison of reconstruction accuracy of three noise reduction models.
Figure 15. Comparison of noise reduction effects of three noise reduction models.
Figure 16. The step sizes from data of all models.
Table 1. Meaning of abbreviations.
Abbreviation | Full Title
DAE | Denoising Autoencoder
EMD | Empirical Mode Decomposition
EGTM | Exhaust Gas Temperature Margin
DTW | Dynamic Time Warping
IMF | Intrinsic Mode Function
GRU | Gated Recurrent Unit
OEM | Original Equipment Manufacturer
MAE | Mean Absolute Error
Table 2. Distance table.
dis | Interpretation of dis
disc(i) | Distance sum between c(i) and all fault data
disr(i) | Distance sum between r(i) and all fault data
disc(i),FAN | Distance between c(i) and fan fault data
disc(i),COMP | Distance between c(i) and compressor fault data
disc(i),HPT | Distance between c(i) and high-pressure turbine fault data
disc(i),LPT | Distance between c(i) and low-pressure turbine fault data
disr(i),FAN | Distance between r(i) and fan fault data
disr(i),COMP | Distance between r(i) and compressor fault data
disr(i),HPT | Distance between r(i) and high-pressure turbine fault data
disr(i),LPT | Distance between r(i) and low-pressure turbine fault data
Table 3. OEM data.
ID | ESN | Time | EGTM
B-5793 | 657208 | 16 September 2013 1:52 | 90.845
B-5793 | 657208 | 16 September 2013 5:09 | 84.52
B-5793 | 657208 | 16 September 2013 8:50 | 87.208
B-5793 | 657208 | 16 September 2013 12:31 | 89.306
B-5793 | 657208 | 17 September 2013 8:00 | 82.397
B-5793 | 657208 | 17 September 2013 11:45 | 85.755
B-5793 | 657208 | 18 September 2013 9:44 | 85.973
B-5793 | 657208 | 21 September 2013 0:06 | 66.281
B-5793 | 657208 | 21 September 2013 4:08 | 86.617
B-5793 | 657208 | 21 September 2013 8:36 | 75.281
Table 4. Fault records.
ID | Date | Noise Source | Noise Type
B2530 | 1 August 2009 | Abnormal left engine | Compressor fault noise
B2588 | 2 January 2014 | Fan seal broken | Fan fault noise
B6076 | 26 February 2002 | Turbine oil leakage | Turbine fault noise
B6070 | 2 January 2014 | Abnormal turbine blade | Turbine fault noise
Table 5. Washing records.
Date | ID | Base | CAMP
8 September 2014 0:00 | B-1816 | Beijing | A320 720000-CCA-C-02
8 September 2014 0:00 | B-1816 | Beijing | A320 720000-CCA-C-02
28 September 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-02
16 January 2015 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-02
9 March 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-03
19 April 2014 0:00 | B-2210 | Hangzhou | A320 720000-CCA-C-03
16 April 2014 0:00 | B-2364 | Chengdu | A320 720000-CCA-C-02
16 April 2014 0:00 | B-2364 | Chengdu | A320 720000-CCA-C-02
Table 6. Split aeroengine data grouping table.
 | Data Number | Flight Cycles
Training data | 1 | 1~329
 | 2 | 330~481
 | 3 | 482~708
 | 4 | 709~1142
 | 5 | 1143~1484
 | 6 | 1485~1711
Testing data | 7 | 1712~1981
 | 8 | 1982~2222
Table 7. Distances of IMF components.
Data Number | Fan Fault Data | Compressor Fault Data | High-Pressure Turbine Fault Data | Low-Pressure Turbine Fault Data
1 | 5.236 | 5.227 | 6.493 | 6.199
2 | 4442.453 | 5582.838 | ∞ | ∞
3 | 10,463.488 | 12,162.886 | ∞ | ∞
4 | 2.907 | 4.205 | 2.521 | 5.467
5 | 3.167 | 3.981 | 3.0158 | 6.385
6 | 6209.860 | 7909.258 | ∞ | ∞
Table 8. Distances of residual components.
Data Number | Fan Fault Data | Compressor Fault Data | High-Pressure Turbine Fault Data | Low-Pressure Turbine Fault Data
1 | 10.703 | 13.716 | 9.991 | 13.171
2 | ∞ | ∞ | ∞ | ∞
3 | ∞ | ∞ | ∞ | ∞
4 | 8.680 | 11.051 | 8.301 | 11.654
5 | 9.745 | 11.480 | 9.929 | 14.420
6 | ∞ | ∞ | ∞ | ∞
Table 9. Number of nodes in the h1 layer and reconstruction error.
nin | Mean Reconstruction Error | Standard Deviation | Minimum | Maximum
15 | 0.0442 | 0.0476 | 0.0174 | 0.152
16 | 0.0420 | 0.0237 | 0.0208 | 0.0863
17 | 0.0486 | 0.0317 | 0.0236 | 0.111
18 | 0.0420 | 0.0286 | 0.0190 | 0.114
19 | 0.0398 | 0.0272 | 0.0197 | 0.117
20 | 0.0376 | 0.0397 | 0.0170 | 0.140
21 | 0.0481 | 0.0349 | 0.0261 | 0.150
22 | 0.0484 | 0.0441 | 0.0173 | 0.168
23 | 0.0548 | 0.0356 | 0.0254 | 0.114
24 | 0.0528 | 0.0496 | 0.0194 | 0.117
25 | 0.0497 | 0.0507 | 0.0223 | 0.186
Table 10. nhid and reconstruction error.
nhid | Mean Reconstruction Error | Standard Deviation | Minimum | Maximum
15 | 0.0172 | 0.00522 | 0.00900 | 0.0260
16 | 0.0151 | 0.00460 | 0.00900 | 0.0220
17 | 0.0157 | 0.00748 | 0.00900 | 0.0350
18 | 0.0143 | 0.00487 | 0.0100 | 0.0270
19 | 0.0129 | 0.00375 | 0.00900 | 0.0190
20 | 0.0153 | 0.00503 | 0.00800 | 0.0260
21 | 0.0123 | 0.00343 | 0.00800 | 0.0180
22 | 0.0144 | 0.00469 | 0.00900 | 0.0220
23 | 0.0128 | 0.00418 | 0.00700 | 0.0190
24 | 0.0132 | 0.00486 | 0.00900 | 0.0230
25 | 0.0133 | 0.00421 | 0.00800 | 0.0230
Table 11. Reconstruction errors of training data.
Model | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | Average
GAE | 0.0810 | 0.0909 | 0.0637 | 0.0681 | 0.0585 | 0.0861 | 0.0747
DAE | 0.0853 | 0.0995 | 0.0991 | 0.0628 | 0.0901 | 0.0817 | 0.0864
EMD | 0.185 | 0.3578 | 0.320 | 0.288 | 0.248 | 0.705 | 0.351
Table 12. Reconstruction errors of testing data.
Model | No. 7 | No. 8 | Average
GAE | 0.0807 | 0.0807 | 0.0965
DAE | 0.161 | 0.161 | 0.0996
EMD | 0.365 | 0.467 | 0.416
Table 13. Step of each segment of data (step size of the EGTM in °C; No. 2–6 are training data, No. 7–8 are testing data).
Model | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | No. 7 | No. 8
Original data | 11.326 | 5.212 | 7.199 | 12.206 | 12.078 | 5.613 | 6.971
The proposed model | 10.604 | 4.841 | 6.832 | 11.292 | 11.555 | 5.584 | 5.108
DAE | 9.649 | 4.525 | 5.435 | 9.979 | 10.958 | 5.871 | 2.709
EMD | 10.220 | 2.813 | 5.067 | 8.137 | 9.469 | 10.288 | 0.738
Table 14. Step value variance between noise reduction data and original data.
Model | Training Data | Testing Data
The proposed model | 0.381 | 1.736
DAE | 2.522 | 9.116
EMD | 6.978 | 30.358
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

