Article

Power Transformer Fault Diagnosis Based on Dissolved Gas Analysis by Correlation Coefficient-DBSCAN

School of Electrical Engineering and Automation, Wuhan University, Wuhan 430000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(13), 4440; https://doi.org/10.3390/app10134440
Submission received: 3 June 2020 / Revised: 22 June 2020 / Accepted: 25 June 2020 / Published: 27 June 2020
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

Transformers work in complex environments, which makes them prone to failure. Dissolved gas analysis (DGA) is one of the most important methods for diagnosing internal insulation faults in oil-immersed transformers. Because DGA data from the same fault type are highly correlated, this paper proposes a new transformer fault diagnosis method based on correlation coefficient density clustering, which applies density clustering to the correlation coefficients of DGA data. First, we calculate the correlation coefficients of the dissolved gas contents in the oil of faulty transformers; next, we strengthen the correlation within each fault category by introducing an amplification coefficient; finally, we use density clustering to cluster and diagnose the faults. The experimental results show that the clustering accuracy is improved by 32.7% compared with direct clustering without the correlation coefficient, and that the method can effectively cluster different types of transformer fault modes. This method provides a new approach to transformer fault identification and has practical application value.

1. Introduction

The health of power transformers is very important for the stable operation of power grid. There are a large number of transformers in service, most of which have been put into use for a certain period of time, and there will be internal faults in the long-term aging process. It is of great significance to diagnose all kinds of latent faults in transformers accurately for the stable and normal operation of transformers. Dissolved gas analysis (DGA) is widely used in online diagnosis of oil-immersed transformers because it uses non-electric quantity as a reference and is not affected by electromagnetism [1,2]. The diagnosis process of DGA is generally divided into the extraction of transformer oil samples, the stripping of dissolved gas in oil, the measurement of gas components and the determination of fault category. The determination of fault category is the core process of DGA diagnosis. The central idea is to determine the fault category by the content of gas component in oil [3].
Fault diagnosis techniques can be divided into traditional chart query methods and modern intelligent algorithm identification methods [4,5]. Traditional chart query methods include the three-ratio method [6,7], the Duval triangle method [8] and the pentagon method [9]. These methods are convenient to use and require neither programming nor complex calculation: once the gas component ratios have been calculated, the corresponding fault category can be looked up in a table or graph. However, they suffer from low accuracy and from judgments that are too absolute [10,11]. Intelligent algorithms diagnose faults by computer-based pattern recognition and mainly comprise classification and clustering algorithms [12,13,14]. Classification algorithms include the support vector machine [15], decision tree and neural network [16], and have achieved good results in transformer fault diagnosis. Clustering algorithms such as k-means [17] and fuzzy clustering [18,19] can accurately cluster transformer fault data and identify the fault type. However, they have shortcomings, such as the need to determine the number of clusters in advance, the difficulty of building a reasonable membership function, sensitivity to deviating points, and a relatively complex calculation process [20]. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm [21,22]. It does not need the number of clusters to be determined in advance, divides high-density regions into clusters, and effectively filters out low-density regions. It can find clusters of arbitrary shape in noisy datasets and is widely used in detection and diagnosis.
Hou [23] employed DBSCAN to detect textural damage, Li [24] accomplished thermal runaway diagnosis of battery system for electric vehicles by DBSCAN, and Li [25] made the combination of a DBSCAN and symmetrized dot pattern to complete fault diagnosis of rolling bearing. For transformer fault diagnosis, the density difference of each fault category in the Euclidean distance space of DGA data is not obvious, and direct application of DBSCAN to DGA diagnosis is not effective.
When DBSCAN is applied to DGA data in practice, the variety of transformer faults and the vagueness of their distinction in Euclidean distance make the classification results sensitive to the clustering data and the fault identification accuracy low, so the clustering is unsatisfactory and it is difficult to obtain a classification in line with engineering practice. To address these problems, this paper proposes a transformer fault diagnosis method called Correlation Coefficient’s Density-Based Spatial Clustering of Applications with Noise (CCDBSCAN), which applies a correlation coefficient to DBSCAN, constructs a partition coefficient characterized by the correlation coefficient, and enlarges the fault characteristics of dissolved gas data in oil with an amplification coefficient, thereby realizing the application of DBSCAN to DGA.
The remaining sections of this paper are organized as follows: Section 2 gives an overview of definition and clustering principle of DBSCAN. Section 3 introduces the correlation coefficient through an analysis of the defects of the traditional DBSCAN and proposes CCDBSCAN. In Section 4, the proposed method is used to cluster and diagnose the DGA data, and the effectiveness and advantages of this method are compared and analyzed. Some conclusions are presented in Section 5.

2. DBSCAN Method

The DBSCAN algorithm was proposed by Martin Ester, Hans-Peter Kriegel and others in 1996 [26]. It is a density-based spatial clustering algorithm. To describe the algorithm accurately, the following definitions are given first.
Definition 1. Given dataset D, the Eps neighborhood of an object p is the region centered on p with radius Eps, that is,
$$N_{\mathrm{Eps}}(p) = \{\, q \in D \mid \mathrm{Dist}(p, q) \le \mathrm{Eps} \,\} \tag{1}$$
where D is the dataset and Dist(p, q) is the distance between objects p and q; the neighborhood therefore contains all objects in D that are no more than Eps away from p.
Definition 2. Given dataset D, a core object is an object $p \in D$ whose Eps neighborhood contains at least MinPts objects, where MinPts is the neighborhood density threshold. A core object p satisfies Equation (2):
$$|N_{\mathrm{Eps}}(p)| \ge \mathrm{MinPts} \tag{2}$$
Definition 3. Given dataset D, object q is directly density reachable from object p if q lies in the Eps neighborhood of p and p is a core object, that is, $q \in N_{\mathrm{Eps}}(p)$ and p satisfies Equation (2).
Definition 4. Given dataset D, object $p_n$ is density reachable from object $p_1$ if there is a chain of objects $p_1, p_2, \ldots, p_n \in D$ such that, for each $p_i$ ($0 < i < n$), $p_{i+1}$ is directly density reachable from $p_i$.
Definition 5. Given dataset D, objects p and q are density connected if there is an object $o \in D$ such that both p and q are density reachable from o.
Definition 6. Given dataset D, a cluster C is a non-empty subset of D that meets the following conditions:
(1) For any object q, if $p \in C$ is a core object and q is density reachable from p, then $q \in C$.
(2) Any two objects $p, q \in C$ are density connected.
Definition 7. Given dataset D, a noise point is an object that does not belong to any cluster.
DBSCAN first performs a region query on an arbitrary object o to compute its Eps neighborhood. If
$$|N_{\mathrm{Eps}}(o)| < \mathrm{MinPts}$$
object o is temporarily marked as noise; otherwise, o and its neighbors are marked as belonging to cluster $C_1$, and the region query is repeated for each neighbor of o. When cluster $C_1$ cannot be expanded further, an unmarked object is selected and a new cluster $C_2$ is generated from it. This is repeated for clusters $C_i$ until all objects are marked.
The essence of density-based clustering is to find high-density regions of the dataset, in which the average distance between data points is small, separated by low-density regions. The DBSCAN algorithm uses the parameters Eps and MinPts to set the density threshold that defines these high-density regions.
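The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `region_query` and `dbscan`, the label convention (-1 for noise) and the toy points are all hypothetical, and the distance is taken to be Euclidean.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)        # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # temporary: may become a border point later
            continue
        cluster += 1                     # i is a core object, so start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reached from a core object joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)   # j is also a core object: keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
```

With these toy values the two dense groups receive separate labels and the isolated point is left as noise; the number of clusters is never passed in, which is the property the text emphasizes.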

3. Improved CCDBSCAN Algorithm

3.1. Introduction of Correlation Coefficient

The correlation coefficient matrix R is defined as the matrix formed by the correlation coefficients between each parameter vector and all other vectors in the dataset, i.e.,
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix} \tag{3}$$
$$r_{ij} = \frac{\operatorname{cov}(t_i, t_j)}{\sqrt{D(t_i)}\,\sqrt{D(t_j)}} \tag{4}$$
where, in Equation (4), $r_{ij}$ is the correlation coefficient of the transformer oil characteristic gas vectors $t_i$ and $t_j$; $\operatorname{cov}(t_i, t_j)$ is the covariance of $t_i$ and $t_j$; and $D(t_i)$ and $D(t_j)$ are the variances of $t_i$ and $t_j$, respectively.
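As a small illustration of Equation (4), the following sketch computes the correlation coefficient of two gas vectors and the full matrix R for a list of samples. It assumes the usual Pearson definition (covariance divided by the product of the standard deviations); the function names and the sample vectors are hypothetical and are not taken from the paper's data.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    # The 1/n factors of covariance and variance cancel, so plain sums are enough.
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)   # undefined if either vector is constant

def correlation_matrix(samples):
    """Matrix R whose entry (i, j) is the correlation of samples[i] and samples[j]."""
    return [[pearson(a, b) for b in samples] for a in samples]

# Hypothetical relative gas contents (H2, CH4, C2H6, C2H4, C2H2).
samples = [
    [0.10, 0.30, 0.35, 0.24, 0.01],
    [0.12, 0.28, 0.36, 0.23, 0.01],
    [0.85, 0.08, 0.03, 0.03, 0.01],
]
R = correlation_matrix(samples)
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle carries information.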
The classification of faults in IEC Publication 60599 follows the main types of faults that can be reliably identified by the equipment after the fault has occurred in service [27]:
Partial discharge (PD): under the electric field, partial discharge will be triggered in the area with weak insulation performance in the transformer insulation system [28]. Partial discharge is electric discharge that only partially bridges the insulation between conductors. Corona partial discharge is evidenced by the formation of x-wax [29].
Discharges of low energy (D1): due to sparking, treeing and tracking evidenced by significant paper punctures, carbonization of paper surface plus carbon particles in oil.
Discharges of high energy (D2): with power follow through, evidenced by extensive carbonization, metal fusion, and possible tripping of the equipment.
Thermal faults below 300 °C (T1): evidenced by paper that has turned brownish.
Thermal faults above 300 °C but below 700 °C (T2): evidenced by paper that has carbonized.
Thermal faults above 700 °C (T3): evidenced by oil carbonization, metal coloration, or fusion.
For the power transformers, the reasons for its failure include the rationality of its structure design, the quality of its insulation performance, and more importantly, the various stresses it needs to bear in the process of work. These stresses include all kinds of over-voltage and -heating, that is, thermal stress and electrical stress. DGA data of transformers provides information of electric and thermal stress of oil immersed power transformers [30].
Oil and paper are decomposed due to electrical and thermal stresses. Both are insulation materials of transformer. These two stresses can lead to the breakdown of insulating materials and release of gas decomposition products [31]. Griffin [32] described the types of faults associated with these gases. The decomposition of transformer oil produces hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4) and ethane (C2H6). CH4 and C2H6 are related to low-temperature oil breakdown, while C2H4 is related to high-temperature oil breakdown, C2H2 is related to discharge, and H2 is related to partial discharge.
To illustrate that the DGA gas contents of transformers with the same fault are correlated, and to explain the problems that arise when clustering directly on this correlation, 12 DGA records were selected from the typical application cases of power grid equipment state detection technology prepared by the operation and maintenance department of State Grid Corporation of China [33]. The fault types are T1, T2, T3, PD, D1 and D2, with two records for each fault. First, the percentage of hydrogen and of each hydrocarbon gas in the total gas is calculated for the 12 records, converting absolute gas content into relative content; the correlation analysis is then carried out according to Equation (4), giving the correlation coefficient matrix R. The line chart of DGA gas content of the 12 fault transformers is shown in Figure 1.
Calculate the correlation according to Equation (4).
It can be seen that DGA data of the same fault are highly correlated, with correlation coefficients generally greater than 0.95. However, the correlations among the last six records in the matrix are also relatively high. For example, the correlation between the seventh and eighth records, which share a fault type, is 0.99, but the correlation coefficient between the seventh record and the ninth to twelfth records is also as high as 0.92. This low differentiation makes clustering difficult. The reason is that the seventh and eighth records belong to PD, whose characteristic gas is hydrogen, while the ninth to twelfth records belong to D1 and D2, whose characteristic gas is acetylene. Because the acetylene content is comparatively low, the correlation coefficient calculated directly from the data does not reflect the acetylene content against the much larger variation in hydrogen content; that is, direct correlation analysis of the original data buries the sample attribute information. Therefore, this paper multiplies the percentages of ethylene and acetylene in the seventh to tenth records by the amplification factors $\alpha_1$ and $\alpha_2$, respectively, and then carries out the correlation analysis.
$$R = \left[\begin{array}{cccccccccccc}
1.00 & 0.96 & 0.51 & 0.89 & 0.39 & 0.40 & 0.66 & 0.72 & 0.49 & 0.69 & 0.61 & 0.61 \\
0.96 & 1.00 & 0.67 & 0.95 & 0.62 & 0.63 & 0.54 & 0.59 & 0.41 & 0.57 & 0.58 & 0.53 \\
0.51 & 0.67 & 1.00 & 0.84 & 0.78 & 0.81 & 0.22 & 0.16 & 0.26 & 0.18 & 0.05 & 0.19 \\
0.89 & 0.95 & 0.84 & 1.00 & 0.64 & 0.67 & 0.29 & 0.36 & 0.17 & 0.34 & 0.35 & 0.28 \\
0.39 & 0.62 & 0.78 & 0.64 & 1.00 & 1.00 & 0.00 & 0.00 & 0.04 & 0.00 & 0.29 & 0.12 \\
0.40 & 0.63 & 0.78 & 0.64 & 1.00 & 1.00 & 0.01 & 0.01 & 0 & 0.02 & 0.25 & 0.09 \\
0.66 & 0.54 & 0.22 & 0.29 & 0.00 & 0.01 & 1.00 & 0.99 & 0.81 & 0.92 & 0.83 & 0.92 \\
0.72 & 0.59 & 0.16 & 0.36 & 0.00 & 0.01 & 0.99 & 1.00 & 0.79 & 0.91 & 0.81 & 0.89 \\
0.49 & 0.41 & 0.26 & 0.17 & 0.04 & 0.00 & 0.81 & 0.79 & 1.00 & 0.95 & 0.96 & 0.97 \\
0.69 & 0.57 & 0.18 & 0.34 & 0.00 & 0.02 & 0.92 & 0.91 & 0.95 & 1.00 & 0.93 & 0.97 \\
0.61 & 0.58 & 0.05 & 0.35 & 0.29 & 0.25 & 0.83 & 0.81 & 0.96 & 0.93 & 1.00 & 0.98 \\
0.61 & 0.53 & 0.19 & 0.28 & 0.12 & 0.09 & 0.92 & 0.89 & 0.97 & 0.97 & 0.98 & 1.00
\end{array}\right]$$
The line chart of DGA gas content obtained with $\alpha_1 = 1$, $\alpha_2 = 1.35$ is shown in Figure 2.
The correlation coefficient matrix of data 7 to 10 is:
$$R = \begin{bmatrix} 1.00 & 0.99 & 0.69 & 0.88 \\ 0.99 & 1.00 & 0.64 & 0.86 \\ 0.69 & 0.64 & 1.00 & 0.92 \\ 0.88 & 0.86 & 0.92 & 1.00 \end{bmatrix}$$
It can be seen that when the acetylene content is multiplied by 1.35, the correlation of the same fault category gas remains high, while the correlation of different types of fault gas becomes smaller. For example, the correlation coefficient of the seventh and the tenth data is reduced from 0.92 to 0.88, which makes the differentiation of different fault gas data more obvious.
It can be seen that finding the appropriate amplification coefficient is the next step for research, so that the characteristic gas of each kind of fault can be better reflected in the correlation coefficient.
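The effect of scaling a single gas before computing the correlation can be reproduced with a short sketch. The two samples below are hypothetical relative contents chosen only to resemble a hydrogen-dominated record and an acetylene-bearing record; they are not the paper's data. The gas order (H2, CH4, C2H6, C2H4, C2H2) and the helper names are assumptions, while the factor 1.35 on acetylene follows the example in the text.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def amplify(sample, alpha):
    """Multiply each gas fraction by its amplification factor."""
    return [x * a for x, a in zip(sample, alpha)]

# Hypothetical samples, order: H2, CH4, C2H6, C2H4, C2H2.
pd_like = [0.85, 0.08, 0.03, 0.03, 0.01]   # hydrogen dominates
d1_like = [0.60, 0.10, 0.03, 0.07, 0.20]   # noticeable acetylene
alpha = [1.0, 1.0, 1.0, 1.0, 1.35]         # only acetylene is scaled

r_before = pearson(pd_like, d1_like)
r_after = pearson(amplify(pd_like, alpha), amplify(d1_like, alpha))
```

For this pair the correlation after amplification is lower than before, which is the direction of change the text reports for records of different fault types.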

3.2. Chaotic Sequence Optimization

Different gases usually play different roles in sample classification. Simulation tests and a large number of field tests show [3] that acetylene is the characteristic gas of discharges, hydrogen is the characteristic gas of partial discharges, and ethylene is the characteristic gas of thermal faults. However, the correlation between two groups of gases calculated with Equation (4) cannot reflect the similarity of the characteristic gases of different faults, so characteristic gases that are present only in small amounts need to be amplified. This paper therefore proposes the following method for amplifying the characteristic gases of faults.
Suppose that the DGA data
$$X_{(n \times 5)} = \left( X_1^T, X_2^T, X_3^T, X_4^T, X_5^T, X_6^T \right)^T$$
of n fault transformers are collected, where
$$X_c^T \quad (c = 1, 2, \ldots, 6)$$
is the typical fault dataset of transformers: c = 1 is the T1 dataset, c = 2 is the T2 dataset, c = 3 is the T3 dataset, c = 4 is the PD dataset, c = 5 is the D1 dataset, and c = 6 is the D2 dataset.
When $c = 1$,
$$X_{1(m \times 5)} = \left( X_{11}, X_{12}, \ldots, X_{1m} \right)^T,$$
where
$$X_{11(5 \times 1)} = \left( x_{111}, x_{112}, x_{113}, x_{114}, x_{115} \right)^T$$
represents the first DGA training vector of T1.
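The amplification coefficient matrix
$$A_{(5 \times 5)} = \operatorname{diag}(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5)$$
is defined, and the amplified matrix $Y_{(n \times 5)}$ is obtained by multiplying the matrix $X_{(n \times 5)}$ by the matrix $A$:
$$Y_{(n \times 5)} = X_{(n \times 5)} A_{(5 \times 5)}$$
The row correlation coefficient matrix $R_Y$ of $Y$ is
$$R_Y = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{16} \\ R_{21} & R_{22} & \cdots & R_{26} \\ \vdots & \vdots & \ddots & \vdots \\ R_{61} & R_{62} & \cdots & R_{66} \end{bmatrix}$$
where $R_{ij}$ is the correlation coefficient matrix of type i and type j faults,
$$R_{ij} = \begin{bmatrix} r_{ij}^{11} & r_{ij}^{12} & \cdots & r_{ij}^{1m} \\ r_{ij}^{21} & r_{ij}^{22} & \cdots & r_{ij}^{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{ij}^{m1} & r_{ij}^{m2} & \cdots & r_{ij}^{mm} \end{bmatrix}$$
where $r_{ij}^{ab}$ is the correlation coefficient between the a-th DGA vector of class i and the b-th DGA vector of class j.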
The partition coefficient (PC) can then be defined as
$$PC_{ij} = 1 - \frac{\sum_{a=1}^{m} \sum_{b=1}^{m} r_{ij}^{ab}}{m \times m}$$
The aggregation coefficient (AC) is
$$AC_{ii} = 1 - PC_{ij} \quad (i = j)$$
A larger partition coefficient between class i and class j faults indicates a more obvious separation. When $i \ne j$, a larger partition coefficient indicates lower similarity between class i and class j faults. When $i = j$, a smaller partition coefficient, and hence a larger aggregation coefficient, indicates higher data correlation within class i faults.
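The two coefficients can be illustrated with a short sketch. It assumes that the partition coefficient is one minus the mean of all entries of the block $R_{ij}$ and that the aggregation coefficient is one minus the partition coefficient of a diagonal block; the function names are hypothetical. The two example blocks are read from the 12-record correlation matrix given earlier (records 7 and 8 against themselves, and records 7 and 8 against records 9 and 10).

```python
def partition_coefficient(block):
    """PC_ij: one minus the mean of all entries of the correlation block R_ij."""
    entries = [r for row in block for r in row]
    return 1.0 - sum(entries) / len(entries)

def aggregation_coefficient(block):
    """AC_ii: one minus PC_ii, i.e. the mean correlation inside one fault class."""
    return 1.0 - partition_coefficient(block)

# Blocks taken from the 12 x 12 matrix of Section 3.1.
within_7_8 = [[1.00, 0.99],
              [0.99, 1.00]]          # records 7, 8 against records 7, 8
between_7_8_and_9_10 = [[0.81, 0.92],
                        [0.79, 0.91]]  # records 7, 8 against records 9, 10

ac_same = aggregation_coefficient(within_7_8)
pc_diff = partition_coefficient(between_7_8_and_9_10)
```

Here the within-class aggregation coefficient is 0.995 while the between-class partition coefficient is only 0.1425, which shows numerically how weakly the unamplified data separate these two groups.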
The idea of this chapter is to use chaos sequence to optimize the amplification coefficient matrix, so that the partition coefficient between different types of faults is the largest and the aggregation coefficient of the same type of faults is the largest, to improve the classification and diagnosis effect of CCDBSCAN.
Chaos sequence optimization is to search chaos variables in a certain range according to the ergodicity and regularity of chaos sequence, so as to make the search of chaos variables jump out of the local optimum and finally reach the global optimum [34]. Based on the idea of information sharing of each orbit, the next iteration is not only determined by the inertia weight, but also affected by the historical information of the orbit and the global historical information of the rest orbit. Due to the introduction of chaotic sequence, the global feasible solution is fully searched, and the global optimization ability of the search method is obviously improved.
In this paper, the logistic model is used to generate chaotic sequences:
$$x(n+1) = \mu \, x(n) \left( 1 - x(n) \right) \tag{10}$$
where $\mu$ is the control variable. When $\mu = 4$, the system is in a completely chaotic state, and the sequence generated by Equation (10) is a chaotic sequence.
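The logistic model is easy to try out. The sketch below assumes the standard form of the map, with the factor in parentheses equal to one minus x(n), and adds a hypothetical helper `to_range` that maps a chaotic value in [0, 1] to a search interval; the interval bounds are illustrative assumptions, not values from the paper.

```python
def logistic_sequence(x0, length, mu=4.0):
    """First `length` terms of x(n+1) = mu * x(n) * (1 - x(n)), starting from x0."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(mu * xs[-1] * (1.0 - xs[-1]))
    return xs

def to_range(x, low, high):
    """Map a chaotic value x in [0, 1] linearly onto the interval [low, high]."""
    return low + (high - low) * x

sequence = logistic_sequence(0.3, 100)
# Hypothetical search interval for one amplification coefficient.
candidates = [to_range(x, 0.5, 3.0) for x in sequence]
```

Starting values such as 0, 0.25, 0.5, 0.75 and 1 should be avoided with $\mu = 4$, because the sequence then collapses onto a fixed point instead of wandering over the interval.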
Optimization is treated here as a decision-making problem. The objective function in this paper is built from the partition coefficients of all fault classes, so that the partition coefficient is as large as possible when $i \ne j$ and as small as possible when $i = j$.
The objective function of optimization is
$$\frac{\sum_{k=1}^{6} P_{kk}}{6} - \frac{\sum_{i=1}^{5} \sum_{j>i}^{6} P_{ij}}{15} \tag{11}$$
Step 1: normalize the data and initialize the constants: number of orbits L, variable dimension N, iteration precision ε, and maximum number of iterations $T_{\max}$.
Step 2: randomly generate the n-dimensional optimization variables
$$X_L^k = \{ x_{L,1}^k, x_{L,2}^k, \ldots, x_{L,n}^k \}$$
for the L orbits, and calculate the fitness $F(X_L^k)$ on each orbit from Equation (11).
Step 3: determine and update the minimum historical fitness of each orbit,
$$F_{\min} = \{ f_{\min}(X_1), f_{\min}(X_2), \ldots, f_{\min}(X_L) \},$$
record the $X_i$ corresponding to each $f_{\min}(X_i)$ to form
$$X_L^{\min} = \{ X_1, X_2, \ldots, X_L \},$$
and determine and update the minimum historical fitness over all orbits,
$$F_g = \min \left[ f_{\min}(X_i) \right],$$
together with the sequence $X_g = X_i$ corresponding to $F_g$.
Step 4: calculate the adjustment $\Delta X_L$ of the variable, which is composed of three parts. The first part, $\Delta X_1$, is the inertia component of the sequence. The second part is the vector difference
$$\Delta X_2 = X_i^k - X_i$$
between the current value of each orbit and the historical optimum of that orbit. The third part is the vector difference
$$\Delta X_3 = X_i^k - X_g$$
between the current value of each orbit and the historical optimum of all orbits. The adjustment of the optimization variable is then obtained as follows:
$$\Delta X_L = \lambda_1 \Delta X_1 + \lambda_2 \Delta X_2 + \lambda_3 \Delta X_3 \tag{12}$$
In Equation (12), $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the three parts. In the early stage of the iteration, $\lambda_1$ is made large to ensure the diversity of each orbit; in the later stage, $\lambda_3$ is made large to ensure the fusion of the information of all orbits. The three weights are defined as:
λ 1 = e k t 2 , $\lambda_2 = \lambda_1 (1 - \lambda_1)$, $\lambda_3 = (1 - \lambda_1)(1 - \lambda_1)$
Step 5: use Equation (10) to update the L-group chaotic sequence, map it to the variable value space, and use Equation (11) to calculate its fitness. If the fitness of the chaotic sequence in a certain orbit is better than that after iteration, then update the orbit.
Step 6: judge whether the iteration meets the end condition. If it meets the condition, the optimization is ended. If it does not, return to step 3 to continue the iteration.
At the end of this process, the optimized amplification coefficient matrix will be obtained.
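A short check helps to see why the three weights in Step 4 behave as described. Assuming that the second and third weights are $\lambda_2 = \lambda_1(1 - \lambda_1)$ and $\lambda_3 = (1 - \lambda_1)^2$ (the minus signs are an assumption about the typeset formula), the weights always sum to one, whatever the value of $\lambda_1$:

```latex
\begin{aligned}
\lambda_1 + \lambda_2 + \lambda_3
  &= \lambda_1 + \lambda_1 (1 - \lambda_1) + (1 - \lambda_1)^2 \\
  &= \lambda_1 + (1 - \lambda_1)\bigl[\lambda_1 + (1 - \lambda_1)\bigr] \\
  &= \lambda_1 + (1 - \lambda_1) = 1 .
\end{aligned}
```

Under the same assumption, as $\lambda_1$ falls from 1 towards 0 during the iteration, $\lambda_3$ rises from 0 towards 1, and $\lambda_2$ peaks at 0.25 when $\lambda_1 = 0.5$. This matches the stated intent: inertia dominates early, and the global historical optimum dominates late.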

3.3. Improved CCDBSCAN Diagnosis Method

The original DBSCAN forms each cluster from the Euclidean distance between points, so clustering is driven by the data density expressed in Euclidean distance. However, because data of the same fault type are highly correlated, clustering based on correlation density rather than Euclidean distance is more reasonable and accurate. The specific diagnosis method is as follows:
Definition 1 (Eps neighborhood). The Eps neighborhood of an object p is the region centered on p in which the correlation coefficient with p is not less than Eps,
$$N_{\mathrm{Eps}}(p) = \{\, q \in D \mid r(p, q) \ge \mathrm{Eps} \,\}$$
where D is the dataset and r(p, q) is the correlation coefficient between objects p and q; the neighborhood therefore contains all objects in D whose correlation coefficient with p is not less than Eps.
Other definitions remain unchanged, and the flow chart of CCDBSCAN clustering algorithm is shown in Figure 3.
DBSCAN is used to process the DGA data in an unsupervised manner: the correlation and dispersion between the DGA sample vectors are analyzed, and the samples are grouped according to their correlation, so that DGA samples in the same fault set are as similar as possible and samples in different fault sets are as different as possible. This method is called CCDBSCAN.
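The modified neighborhood can be sketched by running the usual density expansion on a precomputed correlation matrix, with "correlation not less than Eps" replacing "distance not more than Eps". This is an illustrative sketch, not the authors' MATLAB code: the function name `cc_dbscan` and the parameter values Eps = 0.9 and MinPts = 2 are assumptions. The two input matrices are the correlations of records 7 to 10 reported in Section 3.1, before and after the acetylene content is multiplied by 1.35.

```python
def cc_dbscan(corr, eps, min_pts):
    """Cluster n objects from their n x n correlation matrix.

    Two objects are neighbors when their correlation coefficient is >= eps.
    Returns one label per object: 0, 1, ... for clusters and -1 for noise.
    """
    n = len(corr)
    labels = [None] * n              # None means "not yet visited"
    cluster = -1

    def neighborhood(i):
        return [j for j in range(n) if corr[i][j] >= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # temporarily noise
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighborhood(j)
            if len(nbrs) >= min_pts:
                seeds.extend(nbrs)   # j is a core object: keep expanding
    return labels

# Records 7-10 (two PD, two D1), correlations from Section 3.1.
r_original = [[1.00, 0.99, 0.81, 0.92],
              [0.99, 1.00, 0.79, 0.91],
              [0.81, 0.79, 1.00, 0.95],
              [0.92, 0.91, 0.95, 1.00]]
r_amplified = [[1.00, 0.99, 0.69, 0.88],
               [0.99, 1.00, 0.64, 0.86],
               [0.69, 0.64, 1.00, 0.92],
               [0.88, 0.86, 0.92, 1.00]]

labels_original = cc_dbscan(r_original, eps=0.9, min_pts=2)
labels_amplified = cc_dbscan(r_amplified, eps=0.9, min_pts=2)
```

With these assumed parameters the original correlations chain all four records into one cluster through record 10, while the amplified correlations split them into the two expected pairs.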

4. Example Analysis and Results

The DGA data used in this section come from the fault transformer records of Huzhou Power Supply Company of Zhejiang Electric Power Company of State Grid of China, the case data of the IEC TC 10 dataset [27] and the typical application cases of power grid equipment state detection technology [33]. The data cover six kinds of faults: T1, T2, T3, PD, D1 and D2. More than 2000 records were collected. Because the data sources are numerous and scattered, the data must be filtered and sorted. First, the data are initialized by calculating the percentage of hydrogen and of each hydrocarbon gas in the total gas; then the data are grouped by fault category, the mean and standard deviation of each fault's data are calculated, and records lying more than two standard deviations from the mean are eliminated. After several rounds of screening, 60 DGA records remain for the clustering test in this paper, 10 for each fault type, and each record contains five attributes: H2, CH4, C2H6, C2H4 and C2H2.
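One round of the screening step can be sketched as follows. The paper does not state whether the two-standard-deviation rule is applied per gas or to the record as a whole, nor which standard deviation estimator is used; this sketch assumes a per-attribute test with the population standard deviation, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev

def screen_once(samples, k=2.0):
    """Keep samples whose every attribute lies within k standard deviations of the class mean."""
    columns = list(zip(*samples))                 # one tuple per attribute
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]    # population standard deviation (assumption)
    return [
        s for s in samples
        if all(abs(x - c) <= k * sd for x, c, sd in zip(s, centers, spreads))
    ]

# Hypothetical two-attribute records of a single fault class; the last one is an outlier.
one_class = [
    [0.50, 0.10], [0.52, 0.11], [0.49, 0.09],
    [0.51, 0.10], [0.50, 0.10], [0.95, 0.10],
]
kept = screen_once(one_class)
```

Because removing an outlier changes the mean and standard deviation, the function would be called repeatedly until no further record is dropped, which corresponds to the several rounds of screening mentioned in the text.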
In this section, CCDBSCAN is implemented in MATLAB 2018b. All programs were run on a computer with a Core i7-4710MQ CPU, 8 GB of memory and a 1 TB hard disk.

4.1. CCDBSCAN Algorithm Steps

The fault diagnosis method of CCDBSCAN is divided into the following steps:
(1) Data initialization: calculate the percentage content from the collected DGA gas content data. The equation used for data initialization is
$$x_i' = \frac{x_i}{x_1 + x_2 + x_3 + x_4 + x_5} \quad (i = 1, 2, \ldots, 5) \tag{15}$$
where $x_1$ is the hydrogen content, $x_2$ is the methane content, $x_3$ is the ethane content, $x_4$ is the ethylene content and $x_5$ is the acetylene content, all in μL/L.
(2) Use the chaotic sequence to obtain the amplification coefficients.
(3) Classify the patterns with the CCDBSCAN method.
(4) Diagnose the faults of the fault transformer data.
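Steps (1) and (2) amount to a small preprocessing function applied to each record before clustering. The sketch below is an illustration only: the function names and the sample concentrations are hypothetical, and it assumes that the optimized coefficients reported in Section 4.2 are listed in the same gas order as the attributes (H2, CH4, C2H6, C2H4, C2H2).

```python
GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2")
# Optimized amplification coefficients reported in Section 4.2 (gas order assumed as above).
ALPHA = (2.3894, 0.9668, 0.8058, 2.1159, 1.4306)

def to_fractions(contents):
    """Step (1): convert absolute gas contents (uL/L) into fractions of the total."""
    total = sum(contents)
    if total <= 0:
        raise ValueError("total gas content must be positive")
    return [x / total for x in contents]

def amplify(fractions, alpha=ALPHA):
    """Step (2): multiply each gas fraction by its amplification coefficient."""
    return [x * a for x, a in zip(fractions, alpha)]

def preprocess(contents):
    """Fractions followed by amplification, ready for correlation-based clustering."""
    return amplify(to_fractions(contents))

# Hypothetical record in uL/L.
record = [100.0, 50.0, 25.0, 20.0, 5.0]
fractions = to_fractions(record)
features = preprocess(record)
```

The resulting feature vectors are what the correlation matrix in step (3) would be computed from.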

4.2. Analysis of Chaos Sequence Optimization Results

The parameters of chaos sequence optimization are set as follows: the number of orbits is 1500, the dimension of variables is 5, the maximum number of iterations is 250, and the minimum error is $10^{-3}$. The fitness function is
$$\frac{\sum_{k=1}^{6} P_{kk}}{6} - \frac{\sum_{i=1}^{5} \sum_{j>i}^{6} P_{ij}}{15}$$
It can be seen in Figure 4 that the fitness of the chaotic sequence optimization decreases rapidly and stays at a local minimum only for a short time. Owing to the chaotic nature of its population, the search converges quickly, and the optimal fitness found is 11.31. The optimized amplification coefficients are
$$[2.3894,\ 0.9668,\ 0.8058,\ 2.1159,\ 1.4306].$$
The correlation analysis of the DGA data after multiplying the 7th to 10th data by the amplification coefficients is shown in Figure 5.
The correlation coefficient matrix of DGA data 7 to 10 is as follows:
$$R = \begin{bmatrix} 1 & 0.99 & 0.65 & 0.88 \\ 0.99 & 1 & 0.56 & 0.84 \\ 0.65 & 0.56 & 1 & 0.89 \\ 0.88 & 0.84 & 0.89 & 1 \end{bmatrix}$$
The effect of the amplification coefficients is analyzed through the aggregation coefficient of each fault. The aggregation coefficients of the original data are shown in Table 1.
It can be seen that the aggregation coefficients of the six kinds of faults are generally large, but those of PD, D1 and D2 are comparatively small, which increases the difficulty of density clustering.
Aggregation coefficient of six kinds of fault data processed by amplification coefficient is shown in Table 2.
By applying the amplification coefficient to all fault datasets for analysis, and comparing the aggregation coefficient of all kinds of faults multiplied by the optimized amplification coefficient, it is found that the aggregation coefficient of T1 and T2 are decreased by 1.0% and 1.2%, respectively. The aggregation coefficients of T3, PD, D1 and D2 have increased, among which PD, D1 and D2 have increased significantly, which are 22.0%, 9.8% and 13.7%, respectively. The results show that the correlation coefficient of each fault category is significantly improved after the gas content of each component is optimized, which is more conducive to cluster analysis.

4.3. CCDBSCAN Method Classification Result Analysis

The processed DGA data were used in CCDBSCAN analysis, and compared with the original DBSCAN method. The comparative analysis indexes were accuracy, precision, and recall, and were characterized by confusion matrix.
(1) DBSCAN analysis
After 60 groups of DGA data of fault transformers are normalized, DBSCAN method is directly used for cluster analysis. The cluster graph and confusion matrix are shown in Figure 6 and Table 3.
$$\text{Accuracy} = \frac{8 + 8 + 9 + 2 + 0 + 8}{60} = 58.3\%$$
(2) CCDBSCAN analysis
After 60 groups of DGA data of fault transformers are normalized, CCDBSCAN method is used for cluster analysis. The cluster graph and confusion matrix are shown in Figure 7 and Table 4.
Comparing the confusion matrices of the two methods shows that the accuracy of the CCDBSCAN method is 90%, which is 31.7 percentage points higher than that of the original DBSCAN method. The original DBSCAN method clusters PD, D1 and D2 poorly, with an accuracy of only 33% for these three faults, because their characteristic gas content is low.
$$\text{Accuracy} = \frac{10 + 9 + 10 + 9 + 8 + 8}{60} = 90\%$$
Comparing Figure 6 with Figure 7, the most significant difference is the clustering result in area 1. In Figure 6, area 1 is clustered into a single category, and five data are not successfully classified. In Figure 7, it is successfully clustered into two categories. In fact, the data in area 1 comprise two kinds of fault data, D1 and D2, with 9 data in D1 and 10 data in D2. Because the characteristic gas content of D1 and D2 faults is not pronounced, their Euclidean distances differ little, so the original DBSCAN method cannot distinguish them effectively. The proposed method amplifies the characteristic gases of D1 and D2 faults so that they differ clearly, successfully distinguishes faults that previously differed little, and greatly improves the clustering accuracy.
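The accuracy figures quoted in this section follow directly from the diagonals of the two confusion matrices. The sketch below recomputes them from the per-class counts that appear in the two accuracy equations; the class order (T1, T2, T3, PD, D1, D2) and the function name are assumptions.

```python
def accuracy(correct_per_class, total):
    """Overall accuracy: correctly clustered samples divided by all samples."""
    return sum(correct_per_class) / total

# Correctly clustered samples per class, assumed order: T1, T2, T3, PD, D1, D2.
dbscan_correct = [8, 8, 9, 2, 0, 8]
ccdbscan_correct = [10, 9, 10, 9, 8, 8]

acc_dbscan = accuracy(dbscan_correct, 60)
acc_ccdbscan = accuracy(ccdbscan_correct, 60)
gain_points = 100 * (acc_ccdbscan - acc_dbscan)

# Accuracy of the original DBSCAN on the last three classes (10 samples each).
acc_dbscan_discharge = accuracy(dbscan_correct[3:], 30)
```

These counts give 58.3% and 90% overall, a gain of about 31.7 percentage points, and about 33% for the original DBSCAN on PD, D1 and D2.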

4.4. Analysis of Fault Diagnosis Results

After the DGA data of different fault types are collected, the CCDBSCAN method is used to extract the relevant fault vector feature set, and the fault state diagnosis of transformers is realized by the feature set.
The cases are taken from the typical application cases of power grid equipment state detection technology prepared by the operation and maintenance department of State Grid Corporation of China [33]. The data in this book were collected and sorted under the actual operating conditions of the transformers and are illustrated with field drawings, so the cases are a convincing basis for diagnosis and analysis. In this paper, 30 DGA data covering six fault modes were obtained by selecting fault transformers whose fault types were clearly established through field inspection. See Table 5 for the DGA data and fault types. The diagnosis results of the fault mode clustering in this paper are compared with those of the IEC 60599 method.
The above 30 data to be diagnosed are initialized with Equation (15), that is, after the absolute content of five gases is calculated as the percentage content of gases, the correlation coefficients between them and the six fault sets that have been clustered are calculated, and CCDBSCAN is used for clustering. Select T3, PD, D1 and D2 faults (corresponding to the 4th, 9th, 18th and 25th data in the table) to illustrate the clustering process and the improvement effect of CCDBSCAN. See Figure 8 for the line chart of correlation coefficient.
The data in Figure 8a is T3 data, and the correlation coefficients of the data with six clusters are calculated. It is found that the correlation coefficients of cluster 2 and cluster 3 of the original data are high, and the maximum correlation coefficient with cluster 2 reaches 0.956, which will interfere with the density clustering. The correlation coefficient between the characteristic gas of the T3 data and six clusters is calculated after being amplified. The maximum correlation coefficient between the characteristic gas of the T3 data and the cluster 2 is reduced to 0.901, which improves the accuracy of the density clustering algorithm. Figure 8b also shows the same characteristics as 8a, especially Figure 8c,d. Between the low- (8c) and high-energy discharge faults (8d), the difference of the original data correlation coefficient line is not obvious. When the original data are used for clustering in Figure 8c, the low-energy discharge DGA data of 8c will be mistakenly clustered into the high-energy discharge fault, resulting in misdiagnosis. However, after applying the method to the gas data amplification, the correlation between the data and the high-energy discharge cluster is obviously reduced. There were four data correlation coefficients between the data and the high-energy discharge cluster that were more than 0.95, and now their correlation coefficients are reduced to 0.9. The correlation coefficient of the low-energy discharge cluster is different from that of high energy discharge cluster, and this data is successfully diagnosed as low energy discharge fault. See Table 6 for CCDBSCAN results and average correlation coefficient of all 30 data.
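The comparison of a new record against the six clustered fault sets can be sketched as an "assign to the most correlated cluster" rule. The paper reports average correlation coefficients per cluster in Table 6, but the exact decision rule is not spelled out, so taking the cluster with the highest mean correlation is an assumption here; the function name and the correlation values are hypothetical.

```python
def diagnose(correlations_by_cluster):
    """Return (label, mean correlation) of the cluster most correlated with a new record.

    `correlations_by_cluster` maps each fault label to the list of correlation
    coefficients between the new record and the members of that cluster.
    """
    averages = {
        label: sum(values) / len(values)
        for label, values in correlations_by_cluster.items()
    }
    best = max(averages, key=averages.get)
    return best, averages[best]

# Hypothetical correlations of one amplified record with two neighboring clusters.
example = {
    "T2": [0.90, 0.91, 0.89],
    "T3": [0.97, 0.96, 0.98],
}
label, score = diagnose(example)
```

Reporting the mean correlation alongside the label keeps the degree of similarity visible, which is useful where neighboring fault types have no sharp physical boundary.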
Table 6 shows that, of the 30 fault data clustered by CCDBSCAN, the diagnosis results of 26 are fully consistent with the actual fault types. Of the other four, two are misjudged: No. 5 is actually D2 but is diagnosed as T3, and No. 17 is actually T3 but is diagnosed as PD. The diagnosis results of the remaining two, Nos. 7 and 8, are close to the actual results.
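The tally above can be checked with a short script. The labels are transcribed from the fault categories in Table 5 and the cluster types in Table 6. No. 13 in Table 5 and No. 23 in Table 6 are taken to be T3 and D1, which is an assumption where the extracted tables are unclear.

```python
# Actual fault categories (Table 5) and CCDBSCAN cluster types (Table 6), Nos. 1-30.
# Assumption: No. 13 (Table 5) is read as T3 and No. 23 (Table 6) as D1.
actual = [
    "T3", "D2", "T2", "T3", "D2", "T1", "D1", "T3", "PD", "T3",
    "D1", "T3", "T3", "D1", "T3", "T2", "T3", "D2", "T1", "T3",
    "T3", "T3", "D1", "T3", "D1", "T3", "T3", "D1", "T2", "T1",
]
diagnosed = [
    "T3", "D2", "T2", "T3", "T3", "T1", "D2", "T2", "PD", "T3",
    "D1", "T3", "T3", "D1", "T3", "T2", "PD", "D2", "T1", "T3",
    "T3", "T3", "D1", "T3", "D1", "T3", "T3", "D1", "T2", "T1",
]

# Sample numbers whose cluster type differs from the actual fault category.
mismatched = [n for n, (a, d) in enumerate(zip(actual, diagnosed), start=1) if a != d]
consistent = len(actual) - len(mismatched)
accuracy = 100.0 * consistent / len(actual)

print(mismatched)
print(consistent, round(accuracy, 1))
```

Under this reading, the mismatched numbers are 5, 7, 8 and 17, and 26 of 30 labels agree (about 86.7%).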
Although thermal faults are conventionally divided into three types, T1, T2 and T3, with 300 and 700 °C as nominal boundaries, there is in fact no such sharp physical boundary. The temperature distinction between T1, T2 and T3 is a qualitative, fuzzy description, and there are transition states between them. Similarly, D1 and D2 are distinguished by the level of discharge energy; there is a transition state between them as well, and no clear physical boundary. Therefore, using a correlation coefficient to describe how similar the DGA data are to each fault, rather than giving a single absolute diagnosis, may be more practical in engineering. The diagnosis results of the two data above (Nos. 7 and 8) can still be regarded as valid.
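One way to present such a similarity description is to rank the average correlation coefficients of a datum instead of reporting a single label. The sketch below does this for Nos. 7 and 8, using their rows from Table 6. It assumes that Cluster1 to Cluster6 correspond to T1, T2, T3, PD, D1 and D2, and the helper name `similarity_profile` is hypothetical. It only reorders the published averages and is not the CCDBSCAN decision rule itself.

```python
# Assumed meaning of Cluster1..Cluster6 in Table 6.
FAULTS = ["T1", "T2", "T3", "PD", "D1", "D2"]

# Average correlation coefficients of Nos. 7 and 8, copied from Table 6.
table6 = {
    7: [0.07, -0.59, -0.30, 0.86, 0.80, 0.93],
    8: [0.60, 0.97, 0.96, -0.24, -0.55, -0.24],
}

def similarity_profile(correlations):
    """Return (fault, correlation) pairs sorted from most to least similar."""
    return sorted(zip(FAULTS, correlations), key=lambda pair: pair[1], reverse=True)

for number, row in table6.items():
    print(number, similarity_profile(row)[:3])
```

For No. 8 the T2 and T3 averages differ by only 0.01, and for No. 7 the D1 average remains high at 0.80, which is consistent with treating both as transition cases rather than outright misdiagnoses.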
Table 6 therefore shows a high fault diagnosis accuracy. By contrast, under the diagnostic rules of IEC 60599-2015 [35] (see Table 5 for the diagnostic results), 10 of the above 30 DGA data are diagnosed incorrectly, namely Nos. 2, 5, 7, 8, 9, 10, 14, 19, 23 and 27, so the accuracy of that method is relatively low.

5. Conclusions

DBSCAN is an important clustering algorithm, but in transformer fault diagnosis the characteristic gas contents used to distinguish the faults differ widely and the faults are only vaguely separated by Euclidean distance, so applying DBSCAN directly to fault diagnosis gives unsatisfactory results. In this paper, based on the correlation among data of the same transformer fault type, an aggregation coefficient is designed to represent the degree of similarity, and CCDBSCAN cluster diagnosis is carried out after the fault characteristics are optimized and amplified. The following conclusions are obtained:
(1)
The method proposed in this paper differs from traditional dissolved gas analysis in oil: the correlation coefficient is introduced into cluster analysis, and an aggregation coefficient is constructed to represent the degree of similarity of the data. With the optimized amplification coefficients, gases that are important but low in content are amplified, which makes the correlation coefficient of dissolved gas data from the same fault higher than before.
(2)
By introducing the correlation coefficient into the DBSCAN method, the clustering accuracy is improved by 31%, which addresses the low accuracy of the DBSCAN method when it is applied directly. When used in fault diagnosis, the similarity between a test datum and each fault can be represented by the correlation coefficient instead of a single diagnosis result, which is more in line with engineering practice.
(3)
With the correlation coefficient representing the degree of similarity of the data and the CCDBSCAN method used for clustering, the accuracy of fault diagnosis is significantly higher than that of the IEC 60599-2015 method, which gives the approach good prospects for application.
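The idea of running density clustering on correlation coefficients can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes that two data are neighbours when their Pearson correlation is at least a threshold (0.96, following the Eps value in the Figure 7 caption) and that MinPts = 3 counts the point itself. The seven gas profiles are hypothetical, and `correlation_dbscan` is an illustrative name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0

def correlation_dbscan(points, min_corr, min_pts):
    """DBSCAN in which q is a neighbour of p when pearson(p, q) >= min_corr.

    Returns one label per point: 0, 1, ... for clusters and -1 for noise.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if pearson(points[i], points[j]) >= min_corr]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels

# Hypothetical percentage profiles (H2, CH4, C2H6, C2H4, C2H2).
profiles = [
    [7, 24, 10, 59, 0], [8, 25, 10, 56, 1], [9, 26, 9, 55, 1],   # thermal-like
    [60, 6, 1, 5, 28], [58, 8, 2, 6, 26], [62, 5, 1, 6, 26],     # discharge-like
    [10, 10, 60, 10, 10],                                        # isolated profile
]
labels = correlation_dbscan(profiles, min_corr=0.96, min_pts=3)
print(labels)
```

In this toy example the two groups of similar profiles form two clusters and the isolated profile is labelled as noise.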

Author Contributions

Conceptualization and supervision, B.S.; methodology, Y.L.; validation, Y.L. and J.G.; formal analysis, L.W.; resources, L.W.; data curation, R.X.; writing—original draft preparation, Y.L. and J.G.; writing—review and editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bagheri, M.; Naderi, M.S.; Blackburn, T. Advanced Transformer Winding Deformation Diagnosis: Moving from Off-line to On-line. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1860–1870.
  2. Faiz, J.; Soleimani, M. Dissolved Gas Analysis Evaluation in Electric Power Transformers using Conventional Methods a Review. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 1239–1248.
  3. IEEE. C57.104-2008-IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers; (Revision of IEEE Std C57104-1991); IEEE: Toulouse, France, 2009; pp. 1–36.
  4. Abu-Siada, A.; Islam, S. A Novel Online Technique to Detect Power Transformer Winding Faults. IEEE Trans. Power Deliv. 2012, 27, 849–857.
  5. Senoussaoui, M.E.; Brahami, M.; Fofana, I. Combining and comparing various machine-learning algorithms to improve dissolved gas analysis interpretation. IET Gener. Transm. Distrib. 2018, 12, 3673–3679.
  6. Rogers, R.R. IEEE and IEC Codes to Interpret Incipient Faults in Transformers, Using Gas in Oil Analysis. IEEE Trans. Electr. Insul. 1978, 13, 349–354.
  7. Duval, M. A review of faults detectable by gas-in-oil analysis in transformers. IEEE Electr. Insul. Mag. 2002, 18, 8–17.
  8. Duval, M. The Duval Triangle for Load Tap Changers, Non-Mineral Oils and Low Temperature Faults in Transformers. IEEE Electr. Insul. Mag. 2008, 24, 22–29.
  9. Duval, M.; Lamarre, L. The Duval Pentagon-A New Complementary Tool for the Interpretation of Dissolved Gas Analysis in Transformers. IEEE Electr. Insul. Mag. 2014, 30, 9–12.
  10. Khan, S.A.; Equbal, M.D.; Islam, T. A Comprehensive Comparative Study of DGA Based Transformer Fault Diagnosis Using Fuzzy Logic and ANFIS Models. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 590–596.
  11. Ghoneim, S.S.M.; Taha, I.B.M.; Elkalashy, N.I. Integrated ANN-Based Proactive Fault Diagnostic Scheme for Power Transformers Using Dissolved Gas Analysis. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1838–1845.
  12. Noori, M.; Effatnejad, R.; Hajihosseini, P. Using dissolved gas analysis results to detect and isolate the internal faults of power transformers by applying a fuzzy logic method. IET Gener. Transm. Distrib. 2017, 11, 2721–2729.
  13. Chen, W.G.; Pan, C.; Yun, Y.X.; Liu, Y.L. Wavelet Networks in Power Transformers Diagnosis Using Dissolved Gas Analysis. IEEE Trans. Power Deliv. 2009, 24, 187–194.
  14. Lin, C.H.; Wu, C.H.; Huang, P.Z. Grey clustering analysis for incipient fault diagnosis in oil-immersed transformers. Expert Syst. Appl. 2009, 36, 1371–1379.
  15. Bacha, K.; Souahlia, S.; Gossa, M. Power transformer fault diagnosis based on dissolved gas analysis by support vector machine. Electr. Power Syst. Res. 2012, 83, 73–79.
  16. Tamilselvan, P.; Wang, P.F. Failure diagnosis using deep belief learning based health state classification. Reliab. Eng. Syst. Saf. 2013, 115, 124–135.
  17. Zou, H.L. Clustering Algorithm and Its Application in Data Mining. Wirel. Pers. Commun. 2020, 110, 21–30.
  18. Huang, Y.C.; Sun, H.C. Dissolved Gas Analysis of Mineral Oil for Power Transformer Fault Diagnosis Using Fuzzy Logic. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 974–981.
  19. Li, J.A.; Liao, R.J.; Grzybowski, S.; Yang, L.J. Oil-paper Aging Evaluation by Fuzzy Clustering and Factor Analysis to Statistical Parameters of Partial Discharges. IEEE Trans. Dielectr. Electr. Insul. 2010, 17, 756–763.
  20. Wang, L.; Wang, Z.O. CUBN: A Clustering Algorithm Based on Density and Distance; IEEE: New York, NY, USA, 2003; pp. 108–112.
  21. Amini, A.; Teh, Y.W.; Saboohi, H. On Density-Based Data Streams Clustering Algorithms: A Survey. J. Comput. Sci. Technol. 2014, 29, 116–141.
  22. Fahad, A.; Alshatri, N.; Tari, Z.; Alamri, A.; Khalil, I.; Zomaya, A.Y.; Foufou, S.; Bouras, A. A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis. IEEE Trans. Emerg. Top. Comput. 2014, 2, 267–279.
  23. Hou, T.C.; Liu, J.W.; Liu, Y.W. Algorithmic clustering of LiDAR point cloud data for textural damage identifications of structural elements. Measurement 2017, 108, 77–90.
  24. Li, D.; Zhang, Z.S.; Liu, P.; Wang, Z.P. DBSCAN-Based Thermal Runaway Diagnosis of Battery Systems for Electric Vehicles. Energies 2019, 12, 15.
  25. Li, H.; Wang, W.; Huang, P.; Li, Q.Z. Fault diagnosis of rolling bearing using symmetrized dot pattern and density-based clustering. Measurement 2020, 152, 13.
  26. Hahsler, M.; Piekenbrock, M.; Doran, D. dbscan: Fast Density-Based Clustering with R. J. Stat. Softw. 2019, 91, 1–30.
  27. Duval, M.; Depablo, A. Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases. IEEE Electr. Insul. Mag. 2001, 17, 31–41.
  28. Pan, C.; Chen, G.; Tang, J.; Wu, K. Numerical Modeling of Partial Discharges in a Solid Dielectric-bounded Cavity: A Review. IEEE Trans. Dielectr. Electr. Insul. 2019, 26, 981–1000.
  29. Irungu, G.K.; Akumu, A.O.; Munda, J.L. A New Fault Diagnostic Technique in Oil-Filled Electrical Equipment; the Dual of Duval Triangle. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 3405–3410.
  30. Dhini, A.; Faqih, A.; Kusumoputro, B.; Surjandari, I.; Kusiak, A. Data-driven Fault Diagnosis of Power Transformers using Dissolved Gas Analysis (DGA). Int. J. Technol. 2020, 11, 388–399.
  31. Zhang, Y.; Ding, X.; Liu, Y.; Griffin, P.J. An artificial neural network approach to transformer fault diagnosis. IEEE Trans. Power Deliv. 1996, 11, 1836–1841.
  32. Griffin, P.J. Criteria for the Interpretation of Data for Dissolved Gases in Oil from Transformers (A Review). In Electrical Insulating Oils; ASTM International: West Conshohocken, PA, USA, 1988; pp. 89–107.
  33. Operation and Maintenance Department of State Grid Corporation of China. Typical Application Cases of Power Grid Equipment State Detection Technology; China Power Press: Beijing, China, 2014.
  34. Aslimani, N.; Ellaia, R. A new chaos optimization algorithm based on symmetrization and levelling approaches for global optimization. Numer. Algorithms 2018, 79, 1021–1047.
  35. IEC 60599:2015, Mineral Oil-Filled Electrical Equipment in Service-Guidance on the Interpretation of Dissolved and Free Gases Analysis; IEC Webstore: Geneva, Switzerland, 2015.
Figure 1. Line chart of DGA gas content of 12 fault transformers.
Figure 2. Line chart after gas content amplification.
Figure 3. CCDBSCAN algorithm flow chart.
Figure 4. Change curve of fitness with iteration times.
Figure 5. Line chart of gas content after multiplying coefficient.
Figure 6. Clustering results of original DBSCAN method (Eps = 0.15, MinPts = 3).
Figure 7. Clustering results of CCDBSCAN method (Eps = 0.96, MinPts = 3).
Figure 8. Line chart of correlation coefficient before and after improvement. (a) Correlation coefficient of T3; (b) Correlation coefficient of PD; (c) Correlation coefficient of D1; (d) Correlation coefficient of D2.
Table 1. Failure aggregation coefficient of raw data.

| Fault Category | Category 1 (T1) | Category 2 (T2) | Category 3 (T3) | Category 4 (PD) | Category 5 (D1) | Category 6 (D2) |
| --- | --- | --- | --- | --- | --- | --- |
| Aggregation coefficient | 37.98 | 42.75 | 42.43 | 31.14 | 32.42 | 35.68 |
Table 2. Fault aggregation coefficient of data multiplied by coefficient.

| Fault Category | Category 1 (T1) | Category 2 (T2) | Category 3 (T3) | Category 4 (PD) | Category 5 (D1) | Category 6 (D2) |
| --- | --- | --- | --- | --- | --- | --- |
| Aggregation coefficient | 37.59 | 42.24 | 44.43 | 37.99 | 35.61 | 40.58 |
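Table 3. Confusion matrix of DBSCAN analysis results.

| | Category 1 (T1) | Category 2 (T2) | Category 3 (T3) | Category 4 (PD) | Category 5 (D1) | Category 6 (D2) | Precision (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster 1 | 8 | 0 | 0 | 0 | 0 | 0 | 100 |
| Cluster 2 | 0 | 8 | 0 | 0 | 0 | 0 | 100 |
| Cluster 3 | 0 | 0 | 9 | 0 | 0 | 0 | 100 |
| Cluster 4 | 0 | 0 | 0 | 2 | 1 | 0 | 66.7 |
| Cluster 5 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |
| Cluster 6 | 0 | 0 | 0 | 0 | 6 | 8 | 57.1 |
| Noise | 2 | 2 | 1 | 2 | 3 | 2 | |
| Recall (%) | 80 | 80 | 90 | 20 | 0 | 80 | |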
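Table 4. Confusion matrix of CCDBSCAN analysis results.

| | Category 1 (T1) | Category 2 (T2) | Category 3 (T3) | Category 4 (PD) | Category 5 (D1) | Category 6 (D2) | Precision (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster 1 | 10 | 1 | 0 | 1 | 0 | 0 | 83.3 |
| Cluster 2 | 0 | 9 | 0 | 0 | 0 | 0 | 100 |
| Cluster 3 | 0 | 0 | 10 | 0 | 1 | 0 | 90.9 |
| Cluster 4 | 0 | 0 | 0 | 9 | 0 | 1 | 90 |
| Cluster 5 | 0 | 0 | 0 | 0 | 8 | 1 | 88.9 |
| Cluster 6 | 0 | 0 | 0 | 0 | 1 | 8 | 88.9 |
| Recall (%) | 100 | 90 | 100 | 90 | 80 | 80 | |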
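The precision and recall values in Tables 3 and 4 can be recomputed from the confusion matrices, as sketched below. The sketch assumes that rows are clusters, columns are actual fault categories, precision is the diagonal count divided by the row total, and recall is the diagonal count divided by the column total including noise. The function name `metrics` and the overall accuracy figure are illustrative additions, not values quoted from the paper.

```python
# Rows: Cluster 1..6; columns: actual categories T1, T2, T3, PD, D1, D2.
dbscan = [
    [8, 0, 0, 0, 0, 0],
    [0, 8, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 6, 0, 0],
    [0, 0, 0, 0, 6, 8],
]
dbscan_noise = [2, 2, 1, 2, 3, 2]  # data left unclustered by DBSCAN (Table 3)

ccdbscan = [
    [10, 1, 0, 1, 0, 0],
    [0, 9, 0, 0, 0, 0],
    [0, 0, 10, 0, 1, 0],
    [0, 0, 0, 9, 0, 1],
    [0, 0, 0, 0, 8, 1],
    [0, 0, 0, 0, 1, 8],
]

def metrics(matrix, noise=None):
    """Return per-cluster precision, per-category recall and overall accuracy (%)."""
    k = len(matrix)
    noise = noise or [0] * k
    precision = []
    for i in range(k):
        row_total = sum(matrix[i])
        precision.append(100.0 * matrix[i][i] / row_total if row_total else 0.0)
    recall = []
    for j in range(k):
        col_total = sum(matrix[i][j] for i in range(k)) + noise[j]
        recall.append(100.0 * matrix[j][j] / col_total)
    total = sum(sum(row) for row in matrix) + sum(noise)
    accuracy = 100.0 * sum(matrix[i][i] for i in range(k)) / total
    return precision, recall, accuracy

p_db, r_db, acc_db = metrics(dbscan, dbscan_noise)
p_cc, r_cc, acc_cc = metrics(ccdbscan)
print([round(v, 1) for v in p_db], [round(v, 1) for v in r_db], round(acc_db, 1))
print([round(v, 1) for v in p_cc], [round(v, 1) for v in r_cc], round(acc_cc, 1))
```

Under this reading, 35 of the 60 data lie on the diagonal for DBSCAN and 54 of 60 for CCDBSCAN.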
Table 5. DGA data fault type and IEC60599 diagnosis results.
NumberH2(ul/l)CH4(ul/l)C2H6(ul/l)C2H4(ul/l)C2H2(ul/l)IEC60599Fault Category
154.5471.939.7293.376.58T3T3
2# 1576054040.510002760D1D2
32080.224.668.60T2T2
415.955.9822.33137.250.21T3T3
5#40102.632.3183.30.2T3D2
62.369119.6921.89120.150T1T1
7#87.1717.263.9412.8732.81D1D1
8#605158665519012.3T2T3
9462212.431.600Missing PD
10131.7116.5519.4183.970.32Missing T3
11212.010.461.485.61D1D1
1273.814838.91811.76T3T3
1318.1921.996.5846.923.97T33
14#1.610.10.91.6D2D1
15116.17180.8352.48278.185.36T3T3
1650.18171.1274.7148.690T2T2
1728.9710.941.646.964.36T3T3
187238.97695.16231.62394.32308.92D2D2
19#47.619.14.21270.72T3T1
2050.3565.5821.0599.130.96T3T3
21120.45210.9135.7285.3915.86T3T3
225.4848.8296.81489.570.3T3T3
231.962.10.50.671.59Missing D1
2425.454.978.7277.8410.47T3T3
257911.85947.4396.93907.194844.48D1D1
26676.74969.55570.572483.2617.48T3T3
27101.524.458.97128.370Missing T3
2834.765.522.094.9710.36D1D1
2920.459.845.280.50T2T2
30110.411232.580.80T1T1
1 # indicates an incorrect diagnosis.
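Table 6. DGA data diagnosis results using CCDBSCAN.

| Number | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5 | Cluster6 | Cluster Type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.66 | 0.82 | 0.95 | 0.02 | −0.23 | 0.13 | T3 |
| 2 | −0.06 | −0.60 | −0.28 | 0.76 | 0.87 | 0.94 | D2 |
| 3 | 0.62 | 0.95 | 0.89 | −0.30 | −0.61 | −0.33 | T2 |
| 4 | 0.47 | 0.92 | 0.98 | −0.25 | −0.39 | −0.11 | T3 |
| 5 | 0.54 | 0.93 | 0.99 | −0.21 | −0.40 | −0.10 | T3 |
| 6 | 0.88 | 0.85 | 0.13 | −0.39 | −0.62 | −0.61 | T1 |
| 7 | 0.07 | −0.59 | −0.30 | 0.86 | 0.80 | 0.93 | D2 |
| 8 | 0.60 | 0.97 | 0.96 | −0.24 | −0.55 | −0.24 | T2 |
| 9 | 0.50 | −0.36 | −0.27 | 0.91 | 0.35 | 0.61 | PD |
| 10 | 0.71 | 0.79 | 0.93 | 0.16 | −0.19 | 0.22 | T3 |
| 11 | 0.15 | −0.57 | −0.31 | 0.93 | 0.91 | 0.90 | D1 |
| 12 | 0.64 | 0.92 | 0.97 | −0.13 | −0.42 | −0.08 | T3 |
| 13 | 0.53 | 0.86 | 0.98 | −0.10 | −0.26 | 0.06 | T3 |
| 14 | −0.32 | −0.43 | −0.08 | 0.23 | 0.82 | 0.72 | D1 |
| 15 | 0.62 | 0.91 | 0.98 | −0.10 | −0.36 | −0.01 | T3 |
| 16 | 0.62 | 0.97 | 0.88 | −0.31 | −0.69 | −0.41 | T2 |
| 17 | 0.53 | −0.21 | 0.02 | 0.94 | 0.50 | 0.84 | PD |
| 18 | 0.25 | −0.25 | 0.08 | 0.83 | 0.70 | 0.95 | D2 |
| 19 | 0.92 | 0.38 | 0.59 | 0.69 | 0.14 | 0.61 | T1 |
| 20 | 0.66 | 0.90 | 0.98 | −0.03 | −0.35 | 0.02 | T3 |
| 21 | 0.61 | 0.87 | 0.97 | −0.10 | −0.31 | 0.02 | T3 |
| 22 | 0.37 | 0.89 | 0.96 | −0.28 | −0.37 | −0.12 | T3 |
| 23 | −0.29 | −0.74 | −0.54 | 0.33 | 0.75 | 0.58 | D1 |
| 24 | 0.52 | 0.86 | 0.96 | −0.19 | −0.29 | −0.01 | T3 |
| 25 | −0.24 | −0.73 | −0.43 | 0.65 | 0.90 | 0.87 | D1 |
| 26 | 0.52 | 0.92 | 0.99 | −0.17 | −0.38 | −0.07 | T3 |
| 27 | 0.62 | 0.69 | 0.89 | 0.27 | −0.04 | 0.37 | T3 |
| 28 | 0.17 | −0.52 | −0.25 | 0.92 | 0.92 | 0.93 | D1 |
| 29 | 0.67 | 0.96 | 0.72 | −0.34 | −0.45 | −0.37 | T2 |
| 30 | 0.91 | 0.58 | 0.67 | 0.51 | 0.31 | 0.49 | T1 |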

Share and Cite

MDPI and ACS Style

Liu, Y.; Song, B.; Wang, L.; Gao, J.; Xu, R. Power Transformer Fault Diagnosis Based on Dissolved Gas Analysis by Correlation Coefficient-DBSCAN. Appl. Sci. 2020, 10, 4440. https://doi.org/10.3390/app10134440
