1. Introduction
As major equipment in power systems in an era of increasing power demand [1,2,3,4,5,6], the power transformer is indispensable for transmitting electric energy and for connecting the main system with each subsystem of the power grid, so its performance directly affects the reliable operation of the grid [7]. With the rapid development of high-voltage and ultra-high-voltage transmission technologies, the capacity of the power grid keeps increasing and its coverage keeps expanding. Consequently, if a power transformer fault is not detected promptly and accurately, it can seriously disrupt the grid, even to the point of grid paralysis, and harm normal social and economic development [8]. Therefore, research on fault diagnosis of power transformers is highly significant for the development of the power system.
At present, oil-immersed power transformers are primarily used in the power grid. Under electrical, mechanical, and chemical stresses, the mineral oil and insulating cellulose paper inside a conventional power transformer gradually degrade, producing carbon monoxide (CO), carbon dioxide (CO₂), hydrogen (H₂), a series of low-molecular-weight hydrocarbons such as methane (CH₄) and ethane (C₂H₆), and other gases. These gases gradually dissolve into the oil, and when latent faults exist in a power transformer, their contents change significantly. Therefore, the composition of the gases dissolved in transformer oil reflects the operating state of a power transformer to a great extent [9]. Currently, dissolved gas analysis (DGA) is widely used as an effective approach to power transformer fault diagnosis [10,11,12]. Based on DGA data, traditional power transformer fault diagnosis methods [13], for instance the characteristic gas method, the three-ratio method, and an improved three-ratio method, have been developed by the International Electrotechnical Commission (IEC). However, because the gas generation mechanism in power transformers is complex, there is no clear correspondence between the gas contents or gas ratios in the oil and the fault type. Traditional fault detection methods therefore often rely on expert diagnostic experience and are difficult to implement as programs. Moreover, when too few test samples are available or the samples contain outliers, their accuracy cannot meet the requirements of industrial production [14].
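For reference, the three ratios used by the IEC three-ratio method are C₂H₂/C₂H₄, CH₄/H₂, and C₂H₄/C₂H₆. The Python sketch below only computes these ratios from one DGA sample; it deliberately stops short of mapping ratio codes to fault types, since the code tables differ between the IEC and national versions of the method. The helper name `three_ratios`, the dictionary keys, and the sample concentrations are hypothetical and serve only as an illustration.

```python
from typing import Dict

def three_ratios(gas: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: the three gas ratios of the IEC three-ratio method.

    `gas` maps gas names to dissolved concentrations (e.g. in µL/L).
    """
    def ratio(num: str, den: str) -> float:
        # Return inf when the denominator gas was not detected (zero concentration).
        return gas[num] / gas[den] if gas[den] > 0 else float("inf")

    return {
        "C2H2/C2H4": ratio("C2H2", "C2H4"),
        "CH4/H2": ratio("CH4", "H2"),
        "C2H4/C2H6": ratio("C2H4", "C2H6"),
    }

# Made-up concentrations, for illustration only.
sample = {"H2": 100.0, "CH4": 50.0, "C2H6": 20.0, "C2H4": 40.0, "C2H2": 2.0}
print(three_ratios(sample))
```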
Recently, to overcome the above-mentioned flaws of traditional fault diagnosis methods for power transformers, more and more researchers have applied artificial intelligence (AI) methods to DGA-based power transformer fault diagnosis, such as fuzzy theory [15], support vector machines (SVM) [16], and artificial neural networks (ANN) [17,18,19,20]. ANN-based power transformer fault diagnosis has been widely applied. The network learns from data sampled from power transformers under different operating conditions, continuously adjusting its connection weights and biases (its key parameters) to establish a mapping between specific fault characteristics and fault types, which yields a fault diagnosis model. ANN therefore improves the accuracy of fault diagnosis. However, it has a rather small convergence region and easily falls into local optima. In addition, many intelligent optimization algorithms that perform well in fault diagnosis have been applied to power transformer fault diagnosis, such as the particle swarm optimization (PSO) algorithm [21] and the cuckoo search (CS) algorithm [22].
Among studies on power transformer fault detection and prediction, Zhang, Y. et al. [23] presented a novel two-step neural network approach that improved fault detection accuracy by using two artificial neural networks to detect the fault type and the condition of the cellulose, respectively. Dong, M. et al. [14] proposed a power transformer fault diagnosis model that uses SVMs for hierarchical decision making. Their experimental results indicate that this model solves the parameter selection problem of the support vector classifier and generalizes well.
Because deep learning can extract system features from a small amount of sample data and represent complex relationships, fault diagnosis based on deep learning has attracted the attention of many researchers. For example, Zhang, C. et al. [24] developed a deep learning method for fault diagnosis of rotating equipment. With appropriately set network parameters, the method saves time in extracting fault features and classifies faults accurately even when few samples are available; its drawback is slow convergence. Zhang, L. et al. [25] used a deep belief network to classify and identify faults in a vehicle transmission system. They first computed the spectrum of the original signal, then performed data fusion, and finally built a pattern recognition model based on deep learning. The classification results indicate that the method achieves good recognition accuracy. The method has two main advantages. First, it uses deep learning to extract features from the spectrum of the sample data. Second, it can combine sample data from multiple sensors for feature extraction; compared with single-sensor data, the resulting data structure is more complete, which makes the classification more accurate. However, its model structure is complex, so training takes a long time. Ji, X. et al. [26] proposed a power transformer fault diagnosis method based on deep learning and softmax classification. The method stacks encoders with a softmax regression layer to form the fault diagnosis model: the model parameters are first trained without supervision on a large number of unlabeled samples using k-step contrastive divergence, then fine-tuned with a supervised algorithm, and finally softmax regression determines the fault type of the power transformer. A comparative analysis showed that the accuracy and adaptability of this method for fault detection and prediction are superior to those of back-propagation neural networks and SVM. In short, deep learning is gradually being applied to the field of fault diagnosis.
The preceding paragraphs show that ANN offers high classification accuracy and strong parallel distributed processing ability. As a branch of ANN, the probabilistic neural network (PNN) shares these advantages and, in addition, is easy to train, converges quickly, and can approximate arbitrary nonlinear mappings. We therefore use PNN to build the basic model for power transformer fault detection and prediction.
Given the state of power transformer fault diagnosis technology described above, this paper proposes a new power transformer fault diagnosis method that improves diagnostic efficiency and accuracy. A further aim is to offer a new line of thinking for research on diagnosis methods that incorporate artificial intelligence. The main contributions of this paper are as follows. First, a modified differential evolution (MDE) operator is embedded into the whale optimization algorithm (WOA) through a lifetime mechanism to overcome WOA's tendency to fall into local optima. Second, the structural parameter of PNN is optimized with this combined algorithm, which raises the detection rate of power transformer fault diagnosis. Finally, a power transformer fault diagnosis model is constructed, which provides a new idea for the development of fault diagnosis technology.
Following this introduction, Section 2 introduces the proposed method and describes the prognostics and health management (PHM) model of the power transformer based on it; Section 3 describes the experimental procedure; Section 4 presents and analyzes the experimental results; Section 5 discusses this work; and Section 6 draws conclusions.
2. Methods
2.1. Whale Optimization Algorithm
WOA is a swarm intelligence optimization algorithm inspired by the distinctive hunting behavior of humpback whales, developed by Mirjalili et al. [27]. WOA performs optimization by simulating how humpback whales hunt in nature, including searching for, encircling, and attacking prey. Many previous studies have shown that WOA has a simple principle, is easy to implement, and requires few parameter settings [28,29,30]. The algorithm consists of three stages: random search for prey, encircling prey, and bubble-net attack. First, the whales search for prey at random. In this process, the whales look for better prey by moving away from each other. This process is modeled as:
$$D = \lvert C\cdot X_{\mathrm{rand}}(t) - X(t)\rvert,\qquad X(t+1) = X_{\mathrm{rand}}(t) - A\cdot D \tag{1}$$
Here, $X_{\mathrm{rand}}$ is the position of a whale selected at random from the population, $X(t)$ is the position of the current whale, t is the current iteration number, and A and C are coefficient vectors. In this stage, the algorithm uses $|A| \ge 1$ to push the search agent away from the reference whale, so that a broader region is explored.
The coefficient vectors A and C are computed as:
$$A = 2a\cdot r_1 - a,\qquad C = 2\cdot r_2,\qquad a = 2 - \frac{2t}{t_{\max}} \tag{2}$$
Here, $r_1$ and $r_2$ are random vectors in [0, 1], and $t_{\max}$ is the maximum number of iterations. Equation (2) shows that a decreases linearly from 2 to 0 as the number of iterations increases. After finding the prey, the whales move closer to it. This is WOA's shrinking encircling mechanism, in which each whale updates its position as follows:
$$D = \lvert C\cdot X^{*}(t) - X(t)\rvert,\qquad X(t+1) = X^{*}(t) - A\cdot D \tag{3}$$
Note that in this stage $|A| < 1$ and $X^{*}(t)$ is the position of the current best search agent; whenever a position with a better fitness is found, $X^{*}$ is updated, and the whales update their positions to surround it. Finally, the whales catch the prey by swimming along a spiral while shrinking the encircling circle (the bubble-net attack). This process is modeled as:
$$D' = \lvert X^{*}(t) - X(t)\rvert,\qquad X(t+1) = D'\cdot e^{bl}\cdot\cos(2\pi l) + X^{*}(t) \tag{4}$$
Here, D′ is the distance between the whale and the prey before the position update, b is a constant that determines the shape of the logarithmic spiral, and l is a random number in [−1, 1]. Because the whales shrink the encircling circle and follow the spiral simultaneously, WOA assumes that the two behaviors occur with equal probability: a random number p ∈ [0, 1] selects the shrinking encircling mechanism when p < 0.5 and the spiral update when p ≥ 0.5.
From the above description, the WOA algorithm can be summarized as follows:
Step 1: Set the algorithm parameters, namely the population size N, the maximum number of iterations $t_{\max}$, and the dimension of the search space;
Step 2: Randomly generate the initial individuals and record their positions;
Step 3: Calculate the fitness value of each individual and save the current optimal solution and its position;
Step 4: Check the termination condition: if $t \ge t_{\max}$, output the optimal solution and stop; if $t < t_{\max}$, update a, A, and C according to Equation (2);
Step 5: Generate a random number p in [0, 1]. If $p \ge 0.5$, update the individual's position according to Equation (4) and return to Step 3. If $p < 0.5$, compare $|A|$ with 1: if $|A| \ge 1$, update the position via Equation (1) and return to Step 3; if $|A| < 1$, update the position via Equation (3) and return to Step 3.
Note: Fitness value refers to the objective function value calculated during iteration.
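Steps 1 to 5 can be sketched in Python/NumPy as follows. This is a minimal illustration assuming minimization, box bounds, and a sphere test function chosen only for demonstration. Two simplifications are assumptions, not the authors' choices: A and C are drawn as scalars per agent (so the test on |A| is unambiguous), and l is drawn from [−1, 1] as in the original WOA. The function `woa` and its parameter names are hypothetical.

```python
import numpy as np

def woa(fitness, dim, lb, ub, n_agents=20, t_max=200, b=1.0, seed=0):
    """Minimal whale optimization algorithm (minimization) following Steps 1 to 5."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))            # Step 2: random initial positions
    fit = np.apply_along_axis(fitness, 1, X)                 # Step 3: fitness of every whale
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    for t in range(t_max):                                   # Step 4: stop after t_max iterations
        a = 2.0 - 2.0 * t / t_max                            # a decreases linearly from 2 to 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a                   # Eq. (2), scalar for simplicity
            C = 2.0 * rng.random()
            if rng.random() >= 0.5:                          # Step 5: p >= 0.5 -> spiral, Eq. (4)
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            elif abs(A) >= 1.0:                              # |A| >= 1 -> search for prey, Eq. (1)
                X_rand = X[rng.integers(n_agents)]
                X[i] = X_rand - A * np.abs(C * X_rand - X[i])
            else:                                            # |A| < 1 -> encircle prey, Eq. (3)
                X[i] = best - A * np.abs(C * best - X[i])
            X[i] = np.clip(X[i], lb, ub)                     # keep the agent inside the bounds
            f_i = fitness(X[i])
            if f_i < best_fit:                               # keep the best solution found so far
                best, best_fit = X[i].copy(), f_i
    return best, best_fit

best, best_fit = woa(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10.0, ub=10.0)
print(best_fit)
```

Because WOA itself has no greedy selection, every move is accepted and only the best-so-far record is protected; this is exactly the behavior that lets the population collapse onto $X^{*}$ late in the run.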
2.2. Hybrid Whale Optimization Algorithm with Modified Differential Evolution Operators
In WOA, Equation (1) requires the whales to move away from the prey and to move randomly relative to other individuals at the beginning of the iterations. This process gives WOA good global search ability. However, like other swarm intelligence optimization algorithms, WOA has common shortcomings. As the number of iterations increases, the population keeps converging toward the region of a single best individual and loses the opportunity to explore other regions of the search space, which reduces population diversity. According to Equation (2), a decreases linearly as the number of iterations increases. Since A lies in [−a, a], once a < 1 this linear decrease guarantees $|A| < 1$, so in the later iterations the positions of all search agents can only be updated by Equations (3) and (4), which makes the algorithm prone to falling into local optima. Therefore, we propose a whale optimization algorithm with a modified differential evolution (MDE) operator to address this problem (the pseudocode of MDE-WOA is shown in Algorithm 1).
Algorithm 1: MDE-WOA
In MDE-WOA, MDE shares the population with WOA, and the modified differential evolution operator is embedded in WOA as a component governed by a lifetime mechanism, which determines when the operator is invoked. In this paper, S denotes the life span of an individual and $s_i$ the current age of individual i, which is updated as:
$$s_i(t+1) = \begin{cases} 0, & f\big(X_i(t+1)\big) < f\big(X_i(t)\big) \\ s_i(t) + 1, & \text{otherwise} \end{cases} \tag{5}$$
Here, t is the number of iterations and $f(\cdot)$ is the fitness function. When $s_i \ge S$, the individual $X_i$ has not been improved for S consecutive iterations, and the modified differential evolution strategy is then applied in place of the position update of Equation (3).
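The lifetime counter that Equation (5) refers to can be sketched in a few lines of NumPy. This assumes minimization and that an individual counts as "updated" only when its fitness strictly improves; the function names are hypothetical.

```python
import numpy as np

def update_age(age, old_fit, new_fit):
    """Reset the age of agents whose fitness improved; age all other agents by one."""
    improved = np.asarray(new_fit) < np.asarray(old_fit)
    return np.where(improved, 0, np.asarray(age) + 1)

def needs_mde(age, life_span):
    """Agents whose age has reached the life span S are handed to the MDE operator."""
    return np.asarray(age) >= life_span

age = update_age(np.array([0, 2, 3]), np.array([1.0, 1.0, 1.0]), np.array([0.5, 1.0, 2.0]))
print(age, needs_mde(age, 3))
```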
In this paper, the MDE operator is embedded into WOA by means of a neighborhood-based mutation operator. Like the traditional differential evolution algorithm, MDE consists of mutation, crossover, and selection. In the mutation operation, MDE combines a local model with a global model through a weighting factor to obtain the donor vector. The local donor vector is built from the best solution in the neighborhood of $X_i$ and two randomly selected vectors:
$$L_i(t) = X_{nbest,i}(t) + F\cdot\big(X_p(t) - X_q(t)\big)$$
where $X_{nbest,i}$ is the best solution in the neighborhood of $X_i$, which contains the individuals with indices $i-k$ to $i+k$; here k is a nonzero integer whose admissible range depends on the population size N. $X_p$ and $X_q$ are selected randomly from this neighborhood, and their difference, scaled by the fixed scaling factor F, acts as a perturbation. Increasing this perturbation reduces the chance of the solution falling into a local optimum.
Similarly, the global donor vector can be expressed as:
$$G_i(t) = X_{gbest}(t) + F\cdot\big(X_{r_1}(t) - X_{r_2}(t)\big)$$
Here, $X_{gbest}(t)$ is the best vector found up to iteration t, and $r_1$ and $r_2$ are random indices drawn from the whole population. The first term of the model uses the global best vector in place of $X_{nbest,i}$ to speed up convergence. Finally, the local and global donor vectors are combined into the final donor vector:
$$V_i(t) = w\cdot G_i(t) + (1 - w)\cdot L_i(t)$$
where $w \in [0, 1]$ is the weighting factor. To reduce the number of parameters and balance the local and global models, w is set to the middle of its range, namely 0.5.
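A minimal NumPy sketch of the donor vector, assuming minimization and a ring neighborhood (indices wrap around the population), which is one reading of "the neighborhood of radius k". It requires k ≥ 1 and at least three individuals; `donor_vector` is a hypothetical name.

```python
import numpy as np

def donor_vector(X, fit, i, k, F, w=0.5, rng=None):
    """MDE donor V_i = w * G_i + (1 - w) * L_i (minimization, ring neighbourhood assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    # Ring neighbourhood of radius k around agent i; a set removes wrap-around duplicates.
    hood = sorted({(i + j) % n for j in range(-k, k + 1)})
    nbest = X[min(hood, key=lambda m: fit[m])]              # best agent in the neighbourhood
    p, q = rng.choice([m for m in hood if m != i], size=2, replace=False)
    r1, r2 = rng.choice([m for m in range(n) if m != i], size=2, replace=False)
    gbest = X[int(np.argmin(fit))]                          # best agent in the whole population
    local = nbest + F * (X[p] - X[q])                       # local donor L_i
    glob = gbest + F * (X[r1] - X[r2])                      # global donor G_i
    return w * glob + (1.0 - w) * local

X = np.arange(18.0).reshape(6, 3)
fit = np.array([3.0, 1.0, 4.0, 0.0, 5.0, 2.0])
# With F = 0 the perturbations vanish: V = 0.5 * X[3] (global best) + 0.5 * X[1] (neighbourhood best).
V = donor_vector(X, fit, i=0, k=1, F=0.0, rng=np.random.default_rng(0))
print(V)
```

Setting F = 0 in the usage line isolates the weighting: the result is simply the midpoint of the global and neighborhood best vectors.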
After the donor vector is obtained by mutation, a crossover operation is carried out to further increase population diversity. Exponential and binomial crossover are widely used in existing differential evolution algorithms; we adopt binomial crossover:
$$u_{j,i}(t) = \begin{cases} v_{j,i}(t), & \text{if } \mathrm{rand}_j \le CR \text{ or } j = j_{rand} \\ x_{j,i}(t), & \text{otherwise} \end{cases}$$
Here, $j_{rand}$ is a randomly chosen dimension index that ensures the trial vector $U_i$ inherits at least one component from the donor vector $V_i$, $\mathrm{rand}_j$ is a uniform random number in [0, 1], and CR controls the crossover probability.
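Binomial crossover is standard differential evolution; a minimal NumPy sketch (the function name is hypothetical) shows how the $j_{rand}$ index guarantees at least one donor component even when CR = 0.

```python
import numpy as np

def binomial_crossover(x, v, CR, rng):
    """Trial vector: component j comes from donor v if rand_j <= CR or j == j_rand, else from x."""
    d = len(x)
    mask = rng.random(d) <= CR
    mask[rng.integers(d)] = True        # j_rand: at least one component comes from the donor
    return np.where(mask, v, x)

rng = np.random.default_rng(1)
u0 = binomial_crossover(np.zeros(5), np.ones(5), CR=0.0, rng=rng)   # one donor component
u1 = binomial_crossover(np.zeros(5), np.ones(5), CR=1.0, rng=rng)   # all donor components
print(u0, u1)
```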
The selection operation compares the trial individual produced by mutation and crossover with the corresponding target individual, and the better of the two enters the next generation:
$$X_i(t+1) = \begin{cases} U_i(t), & f\big(U_i(t)\big) \le f\big(X_i(t)\big) \\ X_i(t), & \text{otherwise} \end{cases}$$
That is, if the fitness value $f(U_i)$ of the trial individual is less than or equal to that of the corresponding target individual $X_i$, the trial individual replaces the target individual in the next generation; otherwise, the target individual remains unchanged.
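The greedy selection rule can be applied to a whole population at once. This vectorized NumPy sketch assumes minimization; `select` is a hypothetical name and the sphere function is only a test objective.

```python
import numpy as np

def select(X, U, fitness):
    """Greedy DE selection (minimization): trial U_i replaces X_i when f(U_i) <= f(X_i)."""
    fX = np.apply_along_axis(fitness, 1, X)
    fU = np.apply_along_axis(fitness, 1, U)
    take_trial = fU <= fX
    return np.where(take_trial[:, None], U, X), np.where(take_trial, fU, fX)

sphere = lambda x: float(np.sum(x ** 2))
X_next, f_next = select(np.array([[1.0, 1.0], [0.0, 0.0]]),
                        np.array([[0.0, 0.0], [2.0, 2.0]]), sphere)
print(X_next, f_next)   # first trial accepted, second rejected
```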
2.3. Overview of the Probabilistic Neural Network (PNN)
PNN, a feedforward neural network based on radial basis functions (RBFs), was proposed by Specht in 1989 [31]. By combining Bayesian decision theory with RBFs and accounting for the cross effects between different pattern classes, PNN is competitive with other neural network models. As the amount of training data grows, PNN converges to the Bayesian classifier, and because its pattern-layer weights are taken directly from the training samples rather than found by iterative optimization, it does not fall into local minima. PNN is therefore popular in pattern classification and fault detection and prediction.
Unlike the back-propagation (BP) neural network, PNN has a parallel four-layer structure, as shown in Figure 1. The function of each layer and the corresponding equations are described as follows.
The input layer receives the preprocessed training samples and passes their features to the network, so the number of input neurons equals the dimension of the samples.
In the pattern layer, the Euclidean distance between the input feature vector X and the radial center $X_{ij}$ is used to match the input against each class of the training set:
$$\phi_{ij}(X) = \frac{1}{(2\pi)^{d/2}\sigma^{d}}\exp\left(-\frac{\lVert X - X_{ij}\rVert^{2}}{2\sigma^{2}}\right)$$
Here, $i = 1, 2, \ldots, l$, where l is the number of classes in the training set, d is the dimension of the feature vector, $X_{ij}$ is the j-th center of the i-th class of training samples, and σ is the smoothing factor. The summation layer averages the outputs of the pattern-layer neurons that belong to the same class:
$$v_i = \frac{1}{L}\sum_{j=1}^{L}\phi_{ij}(X)$$
Here, $v_i$ is the output for class i, and L is the number of pattern neurons of class i. The output layer assigns the class with the maximum summation-layer output:
$$y = \arg\max_{i} v_i$$
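The three PNN layers map directly onto array operations. The NumPy sketch below is a minimal illustration, not the authors' MATLAB implementation: it drops the constant $1/((2\pi)^{d/2}\sigma^{d})$ and shifts each row by its smallest distance, both of which scale all classes equally and therefore leave the arg-max unchanged while avoiding numerical underflow for small σ. The function name and the toy data are hypothetical.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma):
    """Classify X_test with a PNN whose pattern neurons are the training samples."""
    X_train, y_train, X_test = map(np.asarray, (X_train, y_train, X_test))
    classes = np.unique(y_train)
    # Pattern layer: squared Euclidean distance from each test vector to each centre X_ij.
    sq = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel; the normalising constant and the per-row shift are common to all
    # classes, so they do not change the arg-max (the shift only prevents underflow).
    phi = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    # Summation layer: v_i = mean pattern output of class i.
    v = np.stack([phi[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    # Output layer: the class with the largest summation output.
    return classes[v.argmax(axis=1)]

X_train = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
y_train = [0, 0, 1, 1]
print(pnn_predict(X_train, y_train, [[0.05, 0.0], [1.05, 1.0]], sigma=0.1))
```

Note that there is no training loop: "training" a PNN amounts to storing the samples, which is why σ is the only parameter left to tune.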
2.4. PNN Optimized by MDE-WOA Power Transformer PHM Model
A weakness of PNN is that the computation in its hidden (pattern) layer is strongly influenced by the smoothing factor σ. If σ is set too large or too small, the network converges to a local optimum too quickly or too easily. As an improved intelligent optimization algorithm, MDE-WOA has strong global search ability and rich population diversity, so it can be used to search for a suitable value of σ and thereby improve the performance of PNN.
In this model, the input data are as follows:
The flow chart of the PNN network model optimized by MDE-WOA is shown in Figure 2, and the specific steps can be summarized as follows:
Step 1: Randomly generate the initial sample X;
Step 2: Initialize the parameters and structure of the PNN, and randomly initialize its smoothing factor σ;
Step 3: Initialize the current life s and the iteration counter t. Set the population size N, the scaling factor F, the crossover control parameter CR, and the life span S of the whale population, and define the fitness function f. In this study, the mean squared error (MSE) is used as the fitness value:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(o_i - y_i\right)^{2}$$
where n is the number of samples, $o_i$ is the actual output, and $y_i$ is the expected output;
Step 4: Compute the fitness value of each search agent and record the position of the optimal individual;
Step 5: Update the algorithm parameters a, A, C, l, and p;
Step 6: Compare the random number p ∈ [0, 1] with 0.5. If $p \ge 0.5$, the search agent updates its position along the spiral via Equation (4). If $p < 0.5$, compare s with S: if $s < S$, the search agent searches for or encircles the prey, updating its position via Equation (1) (when $|A| \ge 1$) or Equation (3) (when $|A| < 1$); if $s \ge S$, the MDE operator is introduced to update the search strategy;
Step 7: Compute the fitness value of each search agent again, and update the best search agent if a better solution is found;
Step 8: Update the current life according to Equation (5);
Step 9: When the number of iterations t reaches the maximum number of iterations and the other preset conditions are met, go to the next step; otherwise, return to Step 5;
Step 10: Use the optimal search agent as the smoothing factor of the PNN during training to obtain a better fault diagnosis model;
Step 11: Feed the test samples into the trained network to obtain the corresponding diagnosis results.
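The steps above can be compressed into one sketch for a scalar smoothing factor σ. It rests on several assumptions that the text does not state: the fitness is the MSE between predicted and true class labels on a held-out validation set; σ is searched in [0.001, 1]; an agent's age resets only when its fitness strictly improves; the neighborhood is a ring of radius k; and, because binomial crossover in one dimension always takes the donor, the MDE move reduces to the donor vector followed by greedy selection. The data are synthetic two-class clusters, not DGA samples, and all function names are hypothetical.

```python
import numpy as np

def pnn_predict(Xtr, ytr, Xte, sigma):
    """PNN classifier: Gaussian pattern layer, class-mean summation layer, arg-max output."""
    classes = np.unique(ytr)
    sq = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    v = np.stack([phi[:, ytr == c].mean(axis=1) for c in classes], axis=1)
    return classes[v.argmax(axis=1)]

def mse_fitness(sigma, Xtr, ytr, Xval, yval):
    """Step 3 (assumed form): MSE between predicted and true validation labels."""
    pred = pnn_predict(Xtr, ytr, Xval, sigma)
    return float(np.mean((pred - yval) ** 2))

def mde_woa_sigma(fit_fn, lb=1e-3, ub=1.0, n=10, t_max=30, S=3, F=0.5, w=0.5, k=1, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, n)                      # each whale is one candidate sigma
    f = np.array([fit_fn(s) for s in x])            # Step 4: initial fitness
    age = np.zeros(n, dtype=int)                    # Step 3: current life of every whale
    best, best_f = x[f.argmin()], f.min()
    for t in range(t_max):                          # Step 9: stop after t_max iterations
        a = 2.0 - 2.0 * t / t_max                   # Step 5: a shrinks; A, C, p, l drawn below
        for i in range(n):
            A, C, p = 2.0 * a * rng.random() - a, 2.0 * rng.random(), rng.random()
            mde_move = p < 0.5 and age[i] >= S
            if p >= 0.5:                            # Step 6: spiral update, Eq. (4)
                l = rng.uniform(-1.0, 1.0)
                new = abs(best - x[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            elif not mde_move:                      # search (Eq. 1) or encircle (Eq. 3)
                ref = x[rng.integers(n)] if abs(A) >= 1 else best
                new = ref - A * abs(C * ref - x[i])
            else:                                   # MDE move: donor w*G + (1 - w)*L
                hood = [(i + j) % n for j in range(-k, k + 1)]
                nb = min(hood, key=lambda m: f[m])
                p1, p2 = rng.choice([m for m in hood if m != i], 2, replace=False)
                r1, r2 = rng.choice([m for m in range(n) if m != i], 2, replace=False)
                local = x[nb] + F * (x[p1] - x[p2])
                glob = best + F * (x[r1] - x[r2])
                new = w * glob + (1.0 - w) * local
            new = float(np.clip(new, lb, ub))
            f_new = fit_fn(new)                     # Step 7: fitness of the moved whale
            if mde_move and f_new > f[i]:           # MDE greedy selection keeps the old whale
                new, f_new = x[i], f[i]
            age[i] = 0 if f_new < f[i] else age[i] + 1   # Step 8: lifetime update, Eq. (5)
            x[i], f[i] = new, f_new
            if f_new < best_f:
                best, best_f = new, f_new
    return best, best_f                             # Step 10: sigma handed to the PNN

rng = np.random.default_rng(1)
Xtr = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
ytr = np.repeat([0, 1], 20)
Xval = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(2.0, 0.3, (10, 2))])
yval = np.repeat([0, 1], 10)
sigma, err = mde_woa_sigma(lambda s: mse_fitness(s, Xtr, ytr, Xval, yval))
print(sigma, err)
```

On well-separated synthetic clusters almost any σ classifies perfectly, so the example only demonstrates the mechanics; on real DGA data the MSE landscape over σ is where the search effort matters.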
4. Experimental Results
To assess the effectiveness of our approach for power transformer fault diagnosis, we compared its classification accuracy with that of four methods (BA-BP, CS-BP, GA-BP, and PNN). The simulation experiments were carried out in MATLAB. The resulting classification accuracies are shown in Table 3.
Table 3 shows that the MDE-WOA-PNN model achieved the highest fault detection and prediction accuracy among all the diagnostic models, which indicates that MDE-WOA markedly improves the PNN model. For the four fault types, the MDE-WOA-PNN model achieved 100% (106/106) for LT (<150 °C), 100% (13/13) for low-temperature overheating (LT) in the … °C to … °C range, 100% (14/14) for partial discharge (PD), and 95.45% (21/22) for arc discharge (AD). Therefore, compared with the other diagnostic models, this model is better suited to power transformer fault detection and prediction.
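The per-class percentages follow directly from the correct/total counts. The short Python check below recomputes them; the pooled figure assumes, for illustration only, that these four classes make up the whole test set, which the text does not state. The label "LT (second range)" is a hypothetical name for the class whose temperature bounds are not given.

```python
# Correct / total test samples per fault type, as reported for MDE-WOA-PNN.
counts = {"LT (<150 °C)": (106, 106), "LT (second range)": (13, 13), "PD": (14, 14), "AD": (21, 22)}

for name, (correct, total) in counts.items():
    print(f"{name}: {100 * correct / total:.2f}%")   # 100.00, 100.00, 100.00, 95.45

n_correct = sum(c for c, _ in counts.values())
n_total = sum(t for _, t in counts.values())
overall = 100 * n_correct / n_total
print(f"pooled: {n_correct}/{n_total} = {overall:.2f}%")   # 154/155 = 99.35%
```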
As another important performance index, the MSE directly measures the error between the PNN output and the ideal output of the model. To further examine the advantages of the proposed method, we compared its MSE with those of the four methods above. As Table 4 shows, the MDE-WOA-PNN model had the lowest test-set MSE: when MDE-WOA was used to optimize the PNN and the test samples were fed to the network, the test-set MSE was only 0.058, far lower than those of the other models. Because the training samples contain some noisy data, the training-set MSE was less favorable. However, Table 3 shows that the model still achieved competitive diagnostic accuracy, which indirectly indicates that the model is highly robust. In addition, to investigate how the number of iterations affects the MSE of the proposed model, we ran exploratory experiments with 2, 4, 6, and 8 iterations. The results are shown in Figure 5.
To examine the effect of the number of iterations on the diagnostic accuracy of the proposed algorithm, we computed the fault diagnosis accuracy of the model with 2, 4, 6, and 8 iterations. The accuracy obtained for each type of test sample is shown in Figure 6.
Figure 7 shows the classification results of MDE-WOA-PNN on the training and test sets. The test-set diagnostic accuracy was highest with 4 iterations. With 2 iterations the network model underfits, whereas with 6 or 8 iterations it overfits; the developed approach is therefore most effective when the number of iterations is chosen appropriately.
Figure 8 shows how the average accuracy varies with the number of iterations. Overall, MDE-WOA-PNN achieves a very high average accuracy on the problem studied in this paper, up to …; MDE-WOA-PNN is therefore well suited to fault detection and prediction for power transformers.
Figure 9 shows the value and variance of the optimal search agent of the MDE-WOA-PNN algorithm. Together with Figure 8, it indicates that the model achieved its highest diagnostic accuracy when the best search agent, which serves as the PNN smoothing factor, took the value 0.047265.
The fitness curve in Figure 10 shows that the MDE-WOA-PNN algorithm converged very quickly. It also shows that the proposed method escaped local optima quickly, which indicates that the algorithm is efficient. Moreover, the initial error of the algorithm was already small, suggesting that the initial solution was close to the best value found.