1. Introduction
The transformer is one of the most important devices in the electricity distribution process, and reliable power distribution depends largely on its failure-free operation. A transformer failure during operation can cause a significant loss of revenue for the utility, possible environmental damage, explosion and fire risks, and high repair or replacement costs [1,2]. When these devices fail, operational life expectancy and reliability may change over the years and the supply of electricity to consumers may be interrupted. Therefore, condition analysis and maintenance of the transformer are extremely important to ensure a reliable electricity supply [1,3,4,5].
When the power transformer is in normal operation, the insulating oil and solid insulating material gradually deteriorate and decompose into small amounts of gas, mainly hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO), and carbon dioxide (CO2). When an internal transformer fault occurs, the generation of these gases accelerates [6]. Therefore, one of the most important tools for power transformer condition monitoring and internal fault diagnosis is the gas chromatography test of the transformer oil, known as dissolved gas analysis (DGA) [7,8,9,10].
Several studies have addressed the creation of power transformer condition monitoring systems based on DGA. Many techniques for predicting the concentration of gases have been proposed, such as wavelet least squares, support vector regression, neural network, deep learning, fuzzy model, and long short-term memory (LSTM), just to name a few.
In general, artificial intelligence techniques have been widely used to develop more accurate diagnostic tools based on DGA data [5,9,10,11,12,13,14]. In [9], for example, a new approach for diagnosing transformer failure was created based on gas rates and a support vector machine (SVM). First, on the basis of the International Electrotechnical Commission Technical Committee (IEC-TC) 10 database, optimal dissolved gas rates are obtained by a genetic algorithm designed for simultaneous DGA rate selection and SVM parameter optimization. In that work, three traditional methods were used (SVM DGA, backpropagation neural network (BPNN) DGA, and the IEC criteria), and the three key IEC gas proportions with SVM and BPNN were employed to compare accuracy. The SVM technique also served as a basis for the approaches in [11,13,15]. The authors in [13] used the least squares support vector machine (LS-SVM) to forecast dissolved gases (H2, CH4, C2H2, C2H4, and C2H6) and to assess incipient faults of transformer polymer insulation. Meanwhile, in [15], a new approach was proposed that combines the wavelet technique with LS-SVM regression for the prediction of dissolved gases in oil-immersed power transformers. In [10], the authors used a fuzzy inference system (FIS) to determine absolute concentrations of free and dissolved gases in transformer oil, total dissolved combustible gases, total combustible gases, the proportions of some gases relative to each other, and gas increase rates, in order to detect the decomposition of transformer insulation paper. A similar approach was proposed in [5], in which an adaptive neuro-fuzzy inference system (ANFIS) was employed to estimate the transformer insulation degradation rate from the input variables H2 (hydrogen), CH4 (methane), N2 (nitrogen), O2 (oxygen), CO (carbon monoxide), CO2 (carbon dioxide), C2H6 (ethane), C2H4 (ethylene), C2H2 (acetylene), and TDCG (total dissolved combustible gas).
In general, these studies have used artificial intelligence techniques as regression models to predict gas concentrations or faults in power transformers. More specifically, the use of prediction models in connection with the wavelet transform has been addressed in some recent works to improve the forecast [13,14,15]. Despite satisfactory results, those approaches may not be the most efficient in predicting future values of the variable of interest, especially for a multi-step ahead forecast. Several empirical studies show that learning long-term time dependencies can be difficult for gradient-descent algorithms, and that these algorithms are more effective, converge faster, and generalize better in nonlinear autoregressive neural network models than in other neural networks [14,16,17,18,19,20]. Autoregressive models based on neural networks specify that the output variable depends, in a nonlinear way, on its own past values and on a stochastic, imperfectly predictable term. Thus, future values of the output variable can be predicted from its past and present values. Additionally, the prediction model can also consider present and past values of one or more auxiliary external variables, resulting in a nonlinear autoregressive model with exogenous variables.
In this sense, the authors in [14] proposed a combination of a nonlinear autoregressive neural network model with the discrete wavelet transform, resulting in a high-accuracy multi-step ahead forecast of in-oil gas concentrations. The authors investigated the use of different wavelet functions and different time delays in the autoregression model, but they did not assess how different delays in the external series can influence the values of the output series.
In fact, the definition of the optimal input and output delays is one of the main limitations of an autoregressive model. In general, multidimensional models with n external variables adopt the same delay for all variables. This means that the prediction of the output value at time t + 1, y(t + 1), is performed using the past outputs y(t), y(t − 1),…, y(t − dy) and the past observations ui(t), ui(t − 1),…, ui(t − du) of the external variables ui, i = 1,…,n, as inputs. In addition, the adoption of many inputs can increase the complexity of the forecasting model and reduce its accuracy. Thus, some difficulties and limitations remain despite the advances, motivating research into new models.
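To make the input structure described above concrete, the following minimal sketch builds the regression rows for a one-step-ahead prediction of y(t + 1). The function name `narx_rows` and the dictionary `du` of per-variable delays are illustrative assumptions, not part of the original work; the conventional equal-delay case corresponds to giving every external variable the same value in `du`.

```python
from typing import Dict, List, Sequence, Tuple


def narx_rows(y: Sequence[float],
              exog: Dict[str, Sequence[float]],
              dy: int,
              du: Dict[str, int]) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) for one-step-ahead autoregression.

    Each input row holds y(t), ..., y(t - dy), followed by
    u_i(t), ..., u_i(t - du[i]) for every external series u_i.
    The matching target is y(t + 1).
    """
    # First time index for which every required past value exists.
    start = max([dy] + list(du.values()))
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(y) - 1):
        row = [y[t - k] for k in range(dy + 1)]
        for name, series in exog.items():
            row += [series[t - k] for k in range(du[name] + 1)]
        rows.append(row)
        targets.append(y[t + 1])
    return rows, targets


# Toy series, for illustration only (not measured gas data).
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u = {"u1": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}
X, T = narx_rows(y, u, dy=1, du={"u1": 2})
```

The row width is (dy + 1) plus the sum of (du + 1) over the external variables, which illustrates how quickly the number of inputs grows when long delays are adopted for every variable.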
The use of different time delays in the external series that influence the output does not seem to have received the necessary attention, especially considering that there is a strong correlation between the concentrations of different gases and failures in transformers. This work seeks to help overcome this limitation by proposing a wavelet-like transform to optimize the order of the factors in an autoregressive neural network model with exogenous variables, in order to predict the dissolved gas concentration in power transformer oil.
The main objective of this work is to determine the optimal delay for each input and for the output to create an autoregressive model with a reduced number of inputs and with competitive precision in relation to the literature. The hypothesis is that wavelet-like approximations of the external variables and the output variable incorporate the temporal memory of the autoregressive model. In addition, the selection of the best approximation for each variable determines the ideal delay for each input while reducing the size of the model, as each sample of the approximation is calculated considering a time window of the series.
Consequently, the contributions of the proposed approach can be stated as follows:
Development of an approach based on a wavelet-like transform that determines the optimal delay for each external variable and for the output variable in an autoregressive prediction model;
A prediction model with high precision as it focuses on the trend of the input signals from the noise-free approximations calculated by the wavelet transform;
Expansion of knowledge of the temporal relationship between gases underlying the degradation process of the insulating oil and solid insulating material;
Reduction of the number of input variables in the autoregression model when using the approximations resulting from transformations with wavelets of different lengths, which already consider the time delay determined for each variable.
The remainder of this paper is organized as follows. The related theory is discussed in Section 2: dissolved gas-in-oil analysis in Section 2.1, discrete wavelet transform in Section 2.2, and nonlinear autoregressive exogenous model in Section 2.3. In Section 3, materials and methods are presented, followed by the results in Section 4, discussion in Section 5, and finally the conclusion in Section 6.
3. Materials and Methods
The proposed approach relies on a wavelet-like transform to optimize the order of each factor (gas concentration) in a nonlinear autoregressive model with exogenous variables, that is, to define the optimal order for each gas concentration.
Thus, the proposed approach has the following steps: step 1, gas concentration acquisition and data normalization; step 2, Kaiser–Meyer–Olkin (KMO) and Bartlett tests; step 3, wavelet-like decomposition of the gas concentrations; step 4, Pearson’s correlation; step 5, standardized regression coefficients; step 6, a model using principal component analysis (PCA) to select the principal component; step 7, calculation of the contribution rate for each wavelet decomposition level; and, finally, step 8, prediction using the best time delay as input in a multi-layer perceptron (MLP) network. All these steps are illustrated in Figure 1 and described in detail as follows.
Usually, interpretation techniques such as the Duval triangle are applied to the gas concentrations in the transformer oil, which are collected using equipment such as Morgan Calisto, Luman Sense Smart DGA, General Electric (GE) Transfix, Qualitrol DGA 150, or others [8].
Initially, this work collected a set of 190 historical oil-dissolved gas records from a transformer equipped with a GE Kelman-Transfix (GE, General Electric, Sao Paulo, Brazil) and a GE Intellix BMT 330 (GE, General Electric, Sao Paulo, Brazil). In this stage, the variables pointed out by [10,11,12,13] are C2H2, C2H4, C2H6, CO, CO2, CH4, O2, and H2. In addition, H2O and combined gas concentrations were added as inputs, resulting in ten variables. Before the next step, all the data were normalized between 0 and 1.
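The normalization between 0 and 1 can be illustrated with a short min-max scaling sketch. The function name `min_max_normalize` and the handling of a constant series (mapped to zeros) are assumptions made here for illustration; the text does not state how such a case was treated.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale a series linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no variation, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy values, for illustration only (not measured gas concentrations).
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

Each of the ten variables would be scaled independently, so that gases with very different concentration ranges become comparable.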
The KMO test is applied to verify the measure of sampling adequacy for each variable in the model [31], and the Bartlett test is applied to test the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and thus unsuitable for structure detection [32].
KMO (1977) is a criterion for identifying whether a factor analysis model is adequately fitted to the data, testing the overall consistency of the data [31]. Meanwhile, Bartlett’s sphericity test, created by Maurice Stevenson Bartlett in 1937, indicates the strength of the relationship between variables.
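As an illustration of these two tests, the sketch below computes the overall KMO measure and the Bartlett sphericity statistic with NumPy, using their standard textbook definitions. The function names are hypothetical, and the software actually used in this work is not stated. The p-value of the Bartlett test is omitted because it requires the chi-square survival function, which NumPy does not provide.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray):
    """Return (chi-square statistic, degrees of freedom) of Bartlett's test.

    data has one row per sample and one column per variable.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)          # log of det(R), which is <= 0
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return float(chi2), dof


def kmo_overall(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + q2))
```

A KMO value close to 1 indicates that the partial correlations are small relative to the plain correlations, while a large Bartlett statistic argues against the hypothesis that the correlation matrix is an identity matrix.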
At this stage, the DWT is used in two forms. In the first one, each gas concentration is decomposed keeping the level of decomposition at 1 while changing the wavelet from db2, db4,…, to db20, in order to create smooth approximations of the original gas concentration using the low-frequency filters. Additionally, the wavelet transform is applied to the gas concentrations in reverse chronological order so that each sample of the approximation is created from past values of the original signal.
Considering m samples of a time series in reverse chronological order, that is, with the most recent samples at the beginning, $x = [x_0, x_1, \ldots, x_{m-1}]$, and a low-pass wavelet filter $h$ of length $d$, $h = [h_0, h_1, \ldots, h_{d-1}]$, Equation (5) defines the application of the transform to the signal $x$ to create an approximation $a_d$ with time delay $d$, as proposed in this work,
$$a_d(k) = \sum_{j=0}^{d-1} h_j \, x_{2k+j}, \qquad k = 0, 1, \ldots, m/2 - 1. \qquad (5)$$
Approximations $a_d$, with half the length of the original signal, $m/2$, for each Daubechies wavelet from db2 to db20, are created, resulting in 10 approximations for each time series $x$. Here, we have 190 samples of each gas concentration.
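The following sketch illustrates one way such a wavelet-like approximation could be computed. The series is reversed so that the most recent sample comes first, then filtered with a low-pass filter of length d and downsampled by two, so that each approximation sample depends only on d consecutive present and past values. The function name, the zero padding beyond the oldest sample, and the use of the two-tap (Haar) and four-tap Daubechies filters as examples are assumptions. How the labels db2 to db20 map to filter lengths is not confirmed by the text.

```python
import math
from typing import List, Sequence

# Standard orthonormal low-pass filters (two-tap Haar and four-tap Daubechies).
HAAR = [1 / math.sqrt(2), 1 / math.sqrt(2)]
_S3 = math.sqrt(3)
DAUB4 = [(1 + _S3) / (4 * math.sqrt(2)), (3 + _S3) / (4 * math.sqrt(2)),
         (3 - _S3) / (4 * math.sqrt(2)), (1 - _S3) / (4 * math.sqrt(2))]


def wavelet_like_approximation(series: Sequence[float],
                               lowpass: Sequence[float]) -> List[float]:
    """Approximation of a chronological series, computed on its reversed copy.

    Output sample k combines x[2k], ..., x[2k + d - 1] of the reversed signal,
    i.e. a time window of d values reaching back into the past.
    """
    x = list(reversed(series))          # most recent sample first
    m, d = len(x), len(lowpass)
    approx = []
    for k in range(m // 2):             # half the length of the original signal
        acc = 0.0
        for j in range(d):
            idx = 2 * k + j
            if idx < m:                 # assumption: zero padding past the oldest sample
                acc += lowpass[j] * x[idx]
        approx.append(acc)
    return approx


# Toy constant series, for illustration only.
a2 = wavelet_like_approximation([1.0] * 8, HAAR)
a4 = wavelet_like_approximation([1.0] * 8, DAUB4)
```

With 190 samples per gas, each filter would yield 95 approximation samples under this scheme. Longer filters produce smoother approximations that summarize a longer window of past values.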
Unlike the authors of [33], who used Pearson’s correlation coefficient between the constant characteristic parameter and the candidate variable characteristic parameters to identify the gas concentration best correlated with electrical faults, this work uses Pearson’s correlation to quantify the relationship between the various approximations created for the gas concentrations with different time delays (wavelets of different lengths). Thus, this step results in a matrix X with 110 columns and 190 rows, in which the 110 columns represent the times t, t − 2, through t − 20 of each gas concentration, generating 110 input variables.
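As a simple illustration of how Pearson’s correlation can rank candidate delays, the sketch below selects, for one exogenous gas, the delay whose approximation is most strongly correlated with the target series. The function names and the use of the absolute value of the coefficient as the ranking criterion are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_delay(target: Sequence[float],
               candidates: Dict[int, Sequence[float]]) -> int:
    """Return the delay whose candidate series correlates most with the target.

    candidates maps a delay (2, 4, ..., 20) to the approximation built for it.
    """
    return max(candidates, key=lambda d: abs(pearson(target, candidates[d])))


# Toy series, for illustration only (not measured gas data).
target = [1.0, 2.0, 3.0, 4.0]
candidates = {2: [1.0, 3.0, 2.0, 4.0], 4: [2.0, 4.0, 6.0, 8.0]}
chosen = best_delay(target, candidates)
```

Repeating this selection for every pair of output gas and exogenous gas gives one delay per input, in contrast to a single delay shared by all variables.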
In these steps, we apply PCA to the matrix A created from the relation between the inputs Xj (gas concentrations delayed at times t − 2 to t − 20 according to the wavelet-like transform) and the output Yi (a gas concentration at time instant t). The values of A are calculated as the standardized regression coefficients aij (Equation (6)) for each input and output, describing the relationship between the concentration of a given gas and the approximations created for all other gases at the different time delays generated by the wavelet transform. Therefore, a square matrix is created for each gas concentration, to which PCA is applied to select the principal components that represent at least 99% of the original data variation, generating a supervised PCA (SPCA), according to [34,35].
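The component selection step can be sketched with a plain eigendecomposition of the covariance matrix, keeping the smallest number of principal components whose cumulative explained variance reaches the 99% threshold. The function name is hypothetical, and the construction of A from the standardized regression coefficients is not reproduced here; any numeric matrix can be passed in.

```python
import numpy as np


def components_for_variance(A: np.ndarray, threshold: float = 0.99):
    """Keep the leading principal components explaining at least `threshold`
    of the total variance.

    Returns (k, eigenvalues, eigenvectors, scores), where scores is the
    representation of A in the retained principal component space.
    """
    centered = A - A.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None) # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, threshold)) + 1, len(eigvals))
    return k, eigvals[:k], eigvecs[:, :k], centered @ eigvecs[:, :k]
```

The retained eigenvalues and the projected scores are the quantities on which a contribution rate per time delay can then be based.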
The contribution of each time delay is calculated as in [35], using the following quantities: A represents the input data, λ are the corresponding eigenvalues, A’ is the representation of A in the principal component space, and p is the most important principal component.
An MLP neural network is trained with the Levenberg–Marquardt backpropagation algorithm for 100 epochs, with 1 input layer, 1 hidden layer, and 1 output layer. The number of neurons in the hidden layer was set following two approaches, the first proposed in [36] and the second in [37]; both give the best number of hidden neurons as a function of the number of input parameters.
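To illustrate the network structure (one hidden layer, one output, 100 epochs), the sketch below trains a small MLP with NumPy. Note that it uses plain batch gradient descent on the mean squared error instead of the Levenberg–Marquardt algorithm, purely to keep the example short. The function name, the tanh activation, the learning rate, and the weight initialization are assumptions.

```python
import numpy as np


def train_mlp(X: np.ndarray, y: np.ndarray, hidden: int = 8,
              epochs: int = 100, lr: float = 0.05, seed: int = 0):
    """Train a one-hidden-layer MLP by batch gradient descent.

    X has one row per sample; y is a 1-D array of targets.
    Returns (predict function, list of training losses per epoch).
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations
        out = (H @ W2 + b2).ravel()              # linear output neuron
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = (2.0 / len(y)) * err[:, None]
        g_W2, g_b2 = H.T @ g_out, g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)
        g_W1, g_b1 = X.T @ g_H, g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new: np.ndarray) -> np.ndarray:
        return (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses
```

In the setting described here, the columns of `X` would be the approximations selected for the exogenous gases, and `hidden` would be set to the neuron count given by the chosen rule.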
Unlike [3], we normalize the input data between −1 and 1 for applying a population-based metaheuristic algorithm to optimize the structure of the MLP neural network with the backpropagation algorithm. We propose using the optimal time delays obtained with the wavelet-like transform as inputs to an MLP with a backpropagation algorithm.
In order to test the temporal relationship between gases underlying the degradation process of the insulating oil and solid insulating material, five gas concentrations were chosen: C2H2, C2H4, C2H6, CH4, and H2. These are the gases basically used by the main methods, namely the IEC ratios and the Rogers and Dornenburg ratios, to identify possible power transformer faults.
Figure 2 shows an example of the neural network architecture used to predict the C2H4 gas concentration, where the optimal time delays for C2H2, C2H6, CH4, and O2 are selected according to Pearson’s correlation and PCA.
Regarding the output, 95 samples related to odd days were selected to create the matrix A, as well as to train and test the forecasting model, because the input and output data must have the same length.
Similar experiments were carried out for the other gases: when the output is C2H2, the inputs are the approximations of C2H4, C2H6, CH4, and O2 with their respective optimal delays defined by the proposed approach; when the output is C2H6, the inputs are the wavelet approximations of C2H2, C2H4, CH4, and O2; and so on.
4. Results
First, we evaluated the results of the KMO and Bartlett tests. In Table 4, the KMO test indicated 0.743 and the Bartlett test indicated 0; for this reason, these data are suitable for structure detection, and the Bartlett result indicates that a factor analysis may be useful for these data.
The next stage shows the resulting selection and the contribution rate of the decomposition of each variable. Table 5 shows the contribution rate per gas concentration (normalized).
Each variable presents a different importance rate. Ranked first, C2H6 is associated with wavelet db20, which means that the gas concentration is delayed to time instant t − 20, while db8 has more impact for CH4, db18 for O2, and so on (see Table 5).
The level of correlation of the time delays for each gas concentration is shown in Table 6, in which the values marked in bold and italics are the time delays that have the highest correlation with C2H2, C2H6, C2H4, H2, and CH4.
The following shows the results of the neural network prediction for two gas concentrations using 8 and 15 neurons in the hidden layer. The gases considered are those that the DGA IEC ratios, as well as the Rogers and Dornenburg ratios, basically use to analyze potential problems in power transformers: CH4, H2, C2H2, C2H4, and C2H6.
The results presented in Table 7 show an average MAPE over two days of 1.525% for C2H6 and 1.831% for C2H4. Meanwhile, Figure 3 compares the predicted values with the real values for the five gas concentrations. As can be seen, selecting the optimal time delay for each gas concentration can improve prediction accuracy compared with predictions in which all input variables use the same time delay, t − 2 or t − 4 (Table 7).
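For reference, the mean absolute percentage error (MAPE) reported above can be computed as in the following sketch, which uses the standard definition of the metric. The function name and the toy values are illustrative and are not taken from Table 7.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes that no actual value is zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy values, for illustration only.
error = mape([100.0, 200.0], [110.0, 190.0])
```

Averaging this quantity over the forecast days gives a single figure per gas, which is how the values for C2H6 and C2H4 can be compared.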
5. Discussion
This study investigated the optimal time delay of each gas concentration impacting the gases H2, CH4, C2H2, C2H4, and C2H6 (Table 6 and Table 7), after which a DGA technique can be used to detect defects in the power transformer.
The approach using a wavelet-like transform and SPCA shows the contribution rate of different time delays of each gas concentration, which differs from the proposals of recent works such as [13,14]. In [14], for example, despite testing different wavelet functions and different delays, all models adopted the same time delay for the external variables. Here, the approach shows the rate and order of importance and the wavelet-like order for ten gas concentrations (Table 5), indicating that db20 (t − 20), db8 (t − 8), and db18 (t − 18) are the three most important time delays for the gas concentrations C2H6, CH4, and O2, respectively. This result shows that the effect of the other gases on a given gas varies over time differently for each gas.
We have used Pearson’s correlation to assess the impact of the different time delays, t − 2 to t − 20, in each gas concentration, showing, for example, that to predict the concentration of C2H2, the best time delays for the other gas concentrations are as follows: t − 12 for C2H4, t − 6 for C2H6, t − 10 for CH4, and t − 8 for H2. It is important to highlight that a traditional autoregressive model that adopts the same delay for all variables would not have identified this relationship. In addition, this is a very important result for calibrating monitoring systems, as it indicates that a variation in C2H4, for example, will take about 12 units of time to be reflected in the concentration of C2H2. A similar analysis applies to the other gases.
A similar kind of relationship between different gases has been studied in [38] and [33]. In [38], the authors studied the correlation between the five gas concentrations by applying the grey relational grade to reveal the relationships between gas features. Those authors show that grey relational analysis is efficient in selecting and removing redundant features from the set of input variables. However, it does not consider any time delay in sampling the input series of gas concentrations. On the other hand, the authors in [33] used correlation coefficients, with the gas concentration CO as a constant characteristic parameter for the time series correlation analysis and H2, CH4, C2H2, C2H4, and C2H6 as variable characteristic parameters, to distinguish electrical faults from thermal faults.
However, approaches based on autoregressive models apply the same order for all input variables and do not take into account the time delay relationship between gas concentrations. Notwithstanding, we have seen that the optimal selection of the time delay for each concentration of gas affects the output.
Regarding forecast accuracy, this approach yields some better predictions than [33,38,39] (see Table 8).
It is important to highlight the low computational cost of the proposed model, which takes a matter of seconds to run. In the example above regarding the prediction of the C2H2 concentration, instead of inputting 12, 6, 10, and 8 past values of the gases C2H4, C2H6, CH4, and H2, respectively, as in Equation (4), we simply use the corresponding four approximations created by the wavelet-like transform for each exogenous gas.