1. Introduction
The protection system is an important part of the smart substation, which mainly includes the merging unit, intelligent terminal, and protection device. It is of great significance for the safe operation of the whole power grid to detect the faults or abnormal conditions in the power system, so as to send an alarm signal, or directly isolate and remove the faulty part. Therefore, it is necessary to ensure the reliable operation of the protection system itself. Even if it fails, it should also quickly eliminate the fault and ensure its support for the normal operation of the power grid to the maximum extent. With the rapid development of smart grid technology, intelligent substations or digitally transformed smart stations have gradually become popular. Substations have the new features of intelligent equipment, data networking, and overall station informatization [
1]. Compared with the traditional substation, the fault types and fault characteristics of its protection system have changed a lot, and it is necessary to continuously develop new fault diagnosis methods on the basis of traditional fault diagnosis methods [
2]. Improvements in data acquisition, storage, and analysis in intelligent substations provide us with new ideas for developing new fault diagnosis methods. The function of the secondary system in the substation is more complete and the amount of monitoring information is larger, for example, the operation event record of the secondary system, the main station information, the key information of the protection device, the integrated alarm information, the network operation information, the device self-checking information, etc., which provide solid data support for the condition assessment and fault diagnosis of the substation. Reference [
3] analyzed the operation behavior of the secondary system of intelligent substation, and sorted out 18 core evaluation sub-items, including sampling accuracy, internal environment, self-checking state, protection start, input consistency, real-time input and output, operation time of the whole group, port flow, fault-free time, correct action, correct control, station-control layer communication, process-layer communication, port function, port performance, on-time receiving, and on-time output. A Big Data mining technology for intelligent substations was proposed by Reference [
4]. Deep mining of the integrated alarm information, the device self-checking information, the link information of GOOSE and SV, and the sampling value information provided data support for fault diagnosis.
With regard to the research on the fault diagnosis of the protection system of the intelligent substation, Reference [
5] proposed a method for evaluating the performance of secondary equipment in smart substations based on availability, dependability, and capability (ADC). The accuracy of its evaluation needs to be improved. Reference [
6] proposed an online monitoring and fault diagnosis method for the secondary circuit of relay protection based on multi-parameter information. By monitoring and analyzing the SV, GOOSE, and manufacturing message specification (MMS) messages, an online state monitoring method and strategies for abnormal sampling values, switch monitoring, and abnormal alarms were proposed for relay protection devices. At the same time, the typical alarm information of the protection device and secondary circuit during faults was collected, analyzed, and uploaded to construct an online monitoring and fault diagnosis system for the secondary circuit of relay protection. References [
7,8,9,10] proposed a method of locating secondary equipment faults based on the substation configuration description (SCD) of the intelligent substation. However, the actual workflow of intelligent substations is highly dependent on the configuration tools of integrators and manufacturers, and differences between configuration tools lead to poor standardization of the files. Therefore, locating faults using SCD files alone still has many shortcomings. As research on machine learning and deep learning algorithms progresses, they are increasingly applied to fault diagnosis [
11]. Reference [
12] proposed a research method for the fault location of the secondary device in intelligent substations based on deep learning. According to the device self-checking information, a fault location model for the secondary device based on a recurrent neural network (RNN) was established and the fault location steps were given; however, the data source used had certain limitations, and due to the limitations of the time and accuracy of the algorithm, this method needs to be improved. Reference [
13] proposed an intelligent state assessment of the protection systems based on random forest algorithm, but the prediction accuracy and robustness need to be improved, and the requirements for the parameters are high.
The above methods can basically meet the needs of fault diagnosis, but various problems remain in their practical application. In summary, current research on the protection system of intelligent substations still faces the following problems: there are many types of faults in the protection system, but the correlation between the fault characteristics is weak; the complex components of the protection system equipment and the connection relationships between the different devices cause a large amount of data to be generated when a fault occurs [14], and conventional methods cannot analyze this massive multi-dimensional data quickly and efficiently; since the fault feature information may be distorted or lost during acquisition, the results obtained by conventional methods fluctuate with the confidence of the fault feature information; in addition, the accuracy of the algorithms needs to be improved.
To address the problems of data sources, data processing, fault diagnosis logic, diagnosis methods, and diagnostic accuracy faced in fault diagnosis, a new fault diagnosis method for the protection system of an intelligent substation based on the gradient boosting decision tree (GBDT) is proposed. The GBDT algorithm is a supervised ensemble learning method. By iterating over weak prediction models composed of decision trees, each trained to minimize the prediction error left by the previous round, a strong prediction model is obtained. It offers high accuracy and fast convergence. Taking the merging unit, intelligent terminal, and protection device of the protection system as the objects of fault diagnosis, this method uses the integrated alarm information, device self-checking information, link information of GOOSE and SV, and sampling value information as the basis for judgment to form the fault feature information set. According to the historical fault feature data and maintenance records, the faults of the protection system are divided into simple faults and complex faults. The GBDT algorithm is then used as the diagnostic tool, and a fault diagnosis process for the protection system is proposed to diagnose its complex faults. The effectiveness of the proposed method is verified by example analysis.
2. Fault Type and Fault Feature Information of the Protection System
2.1. Classification of Protection System Fault Types
Properly classifying the fault types before diagnosing the protection system ensures the accuracy of judgment, reduces the amount of calculation and the computing resources required in the diagnosis process, and improves the response speed and convergence speed [12].
By analyzing the alarm information, self-checking information, sampling value information, and fault maintenance-record data of the device of the protection system, we can divide the faults into two categories. One is the simple fault, that is, there is an obvious mapping relationship between the fault type and the fault feature information. After the fault occurs, the fault type can be simply deduced according to the fault feature information. For example, if the fault feature information is “Power failure alarm of merging unit”, it can be directly deduced that the fault is “Power module fault of merging unit”. Another type is the complex fault, which means that the mapping relationship between the fault type and fault feature information is weak, and cannot be directly deduced by simple reasoning of fault feature information. An intelligent algorithm is needed for the reasoning and diagnosis.
According to the equipment manual, fault data, and fault characteristics, the high-frequency faults are classified as shown in
Table 1 and
Table 2.
The complex fault set is defined as Formula (1):
$$F = \{F_1, F_2, \ldots, F_{12}\} \tag{1}$$
In Formula (1), $F_1, F_2, \ldots, F_{12}$ represent the 12 faults in Table 2, respectively.
For the simple faults in
Table 1, the expert system can be used for fault diagnosis according to the fault feature information summarized in
Table 1. For space reasons, the performance of the method proposed in this paper is explored based on the faults in the complex fault set
F. In addition, with the development of intelligent substations and the improvement in field complexity, the fault set
F will be further expanded, and the method proposed in this paper is still applicable to subsequent faults.
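The direct mapping used for simple faults can be pictured as a lookup table. The following is only a minimal sketch of that idea, not the expert system itself: the names `SIMPLE_FAULT_RULES` and `diagnose_simple_fault` are hypothetical, and the single rule shown is the one example quoted in the text; a real rule base would hold every row of Table 1.

```python
# Hypothetical rule table: fault feature information -> fault type.
# The single entry is the example quoted in the text; the remaining
# rows of Table 1 would be added in the same way.
SIMPLE_FAULT_RULES = {
    "Power failure alarm of merging unit": "Power module fault of merging unit",
}


def diagnose_simple_fault(alarm_message):
    """Return the mapped fault type, or None if no simple rule matches."""
    return SIMPLE_FAULT_RULES.get(alarm_message.strip())


print(diagnose_simple_fault("Power failure alarm of merging unit"))
```

A message that matches no rule returns `None`, which is the point at which a fault would be handed over to the complex-fault diagnosis.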
2.2. Fault Feature Information of the Protection System
Based on the complex fault types of the protection system summarized in
Table 2, this paper selects four features of integrated alarm information, device self-checking information, link information of GOOSE and SV, and sampling value information as the feature information of fault diagnosis, which can comprehensively reflect the change in feature quantity caused by the fault of the protection system [
15].
The main function of the integrated alarm information is to reflect whether the protection system fails. If a fault occurs, the equipment will issue alarm information and upload it to the monitoring terminal, which can be used as one of the bases for equipment fault diagnosis while realizing fault warning.
Device self-checking is an important function of an intelligent protection system. When any abnormality occurs in the operation process of the device, the device will record the abnormal information through the event-recording function for the operator to query.
GOOSE (Generic Object-Oriented Substation Event) is equivalent to the DC control and signal cables in traditional substations, which transmit control instructions and signals. It mainly carries the switch/knife switch position, control switch position, abnormal/alarm signals, blocking signals, etc. SV (Sampled Value) is equivalent to the secondary AC cable in the traditional substation and transmits the sampled instantaneous values of voltage and current on the secondary side of the transformer. The link information of GOOSE and SV is an important indicator of whether the information links between the devices of the protection system, and between the devices and the monitoring terminal, work normally; it reflects the link connection state of the equipment.
The sampling value information is the sampling value of three-phase voltage and current transmitted by two channels, which can reflect whether the voltage and current-sampling function of the protection system are normal.
The fault feature information of the protection system is shown in Table 3.
2.3. Fault Feature Information Set of the Protection System
According to the fault feature information in
Table 3, the fault feature information set is established to provide data support for the subsequent fault diagnosis of the protection system of an intelligent substation.
The integrated alarm information set of the protection system of an intelligent substation in the i-th fault event, denoted here as $A_i$, is established as shown in Formula (2):
$$A_i = [a_1, a_2, \ldots, a_{11}] \tag{2}$$
In the above formula, $a_1$–$a_{11}$ are the 11 kinds of fault feature information contained in the integrated alarm information in Table 3. When the monitoring host receives the alarm information, the element at the corresponding position is set to 1; otherwise, it is set to 0.
The GOOSE and SV link information set of the protection system of an intelligent substation in the i-th fault event, denoted here as $G_i$, is established as shown in Formula (3):
$$G_i = [i_1, i_2, \ldots, i_6] \tag{3}$$
In the above formula, $i_1$–$i_6$ are the 6 kinds of fault feature information contained in the link information of GOOSE and SV in Table 3. When the secondary monitoring system receives the alarm information, the element at the corresponding position is set to 1; otherwise, it is set to 0.
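The 0/1 encoding described for the integrated alarm information and the GOOSE/SV link information can be sketched as follows. This is an illustrative sketch only: the function name `encode_indicators` is hypothetical, and the strings `a1`…`a11` and `i1`…`i6` are placeholders for the actual items of Table 3.

```python
def encode_indicators(received, positions):
    """Return a 0/1 list: 1 where the named item was received, else 0."""
    received = set(received)
    return [1 if name in received else 0 for name in positions]


# Placeholder names standing in for the 11 alarm items and 6 link items of Table 3.
ALARM_ITEMS = [f"a{k}" for k in range(1, 12)]
LINK_ITEMS = [f"i{k}" for k in range(1, 7)]

# Made-up fault event: alarms a2 and a7 and link item i1 were received.
alarm_vec = encode_indicators({"a2", "a7"}, ALARM_ITEMS)
link_vec = encode_indicators({"i1"}, LINK_ITEMS)
print(alarm_vec)
print(link_vec)
```

Keeping the item order fixed is what makes vectors from different fault events comparable position by position.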
The device self-checking information set
of the protection system of an intelligent substation in the
i-th fault event is established as shown in Formulas (4)–(7):
In the above Formula (4),
contains the device self-checking information in
Table 3, and it is divided into three parts: merging unit self-checking information
, protection device self-checking information
, and intelligent terminal self-checking information
, where Formulas (5)–(7) subscripts a, b, and c represent the number of these three types of device in the protection system of an intelligent substation. When the secondary monitoring system receives the alarm information, the element in the corresponding position is set to 1, otherwise, it is set to 0.
The sampling value information set
of the protection system of an intelligent substation in the
i-th fault event is established as shown in Formula (8).
In the above Formula (8), the two groups of entries represent the three-phase voltage and current sampling values in Channel 1 and Channel 2, respectively, and I and U denote the three-phase current and voltage values of each channel.
To make the sample data of different units comparable and to improve the convergence speed and accuracy of the model, the sampling value information is preprocessed by the Min-Max method, which maps each original value $x$ in the dataset to a value $x'$ in the interval [0, 1]. The conversion is shown in Formula (9):
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{9}$$
In Formula (9), $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sampled values, respectively.
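The Min-Max preprocessing of the sampling values can be sketched in a few lines. The function name `min_max_normalize` and the sample values are assumptions made for illustration; the handling of a constant series (all values equal) is also an assumption, since the text does not cover that case.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant series carries no spread, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sampled values, for illustration only.
samples = [100.0, 105.0, 110.0]
scaled = min_max_normalize(samples)
print(scaled)
```

In practice the minimum and maximum would be taken from the training data and reused unchanged when scaling new fault events, so that both are mapped with the same conversion.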
3. Fault Diagnosis of the Protection System Based on Gradient Boosting Decision Tree (GBDT)
The gradient boosting decision tree (GBDT) algorithm is an ensemble algorithm that handles discrete data well and performs well on small-sample data. Gradient boosting is the core idea of this algorithm for the classification task. Multi-class classification is built on binary classification and adopts the idea of ‘one positive class, multiple negative classes’. This section focuses on the training process of the GBDT algorithm, which mainly involves selecting the optimal learning rate and number of iterations, and finally gives the fault diagnosis process of the protection system based on the gradient boosting decision tree.
3.1. Principle and Training Steps of Gradient Boosting Decision Tree
The GBDT algorithm is an ensemble learning algorithm. Ensemble learning, a widely studied topic in engineering applications, improves learning ability by combining multiple weak learners [16]. Compared with conventional methods, it performs well in terms of accuracy and generalization ability. Bagging and boosting are two typical ensemble learning algorithms. Their schematics are shown in
Figure 1 and
Figure 2.
The bagging algorithm generates n training sample sets from the total sample library by random sampling with replacement. Each sample set trains one weak learner, giving n weak learners that run in parallel. The n weak learners are then combined according to a combination strategy to generate a strong learner. The boosting algorithm is a sequential algorithm in which the weak learners operate serially. The weights of the data in the training set of each iteration are changed according to the learning results of the previous weak learners, each new learner fits the residuals of the previous results, and the n weak learners are then combined according to a combination strategy to generate a strong learner [17].
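The sampling step of bagging, drawing n training sets from the sample library with replacement, can be illustrated with the standard library alone. This sketch covers only the sampling, not the training or combination of the weak learners; the function name `bootstrap_sets` and the fixed seed are assumptions for illustration.

```python
import random


def bootstrap_sets(samples, n_sets, seed=0):
    """Draw n_sets training sets, each sampled with replacement from samples."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [rng.choices(samples, k=len(samples)) for _ in range(n_sets)]


library = list(range(10))  # stand-in for the total sample library
sets = bootstrap_sets(library, n_sets=3)
for s in sets:
    print(sorted(s))
```

Because each set is drawn with replacement, some samples appear more than once and others are left out, which is what makes the weak learners differ from one another.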
The gradient boosting decision tree (GBDT) intelligent algorithm is one of the most widely used boosting algorithms in the engineering field, and it combines the sampling idea of the bagging algorithm, allowing sampling samples and features to increase the independence between weak learners. The GBDT intelligent algorithm does not change the sample weight in the iteration process, but continuously learns the negative gradient of the loss function, generates multiple new weak learners, and combines multiple weak learners into strong learners. Compared with the traditional machine learning algorithm, the GBDT intelligent algorithm can achieve higher accuracy in many of the application scenarios, and has a faster operation speed, stronger generalization ability, and lower requirements for parameter adjustment.
Regardless of whether the GBDT algorithm performs regression tasks or classification tasks, its core idea is “gradient boosting”. The negative gradient of the loss function in the iterative process is shown in Formula (10):
$$r_i = -\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}, \quad i = 1, 2, \ldots, K \tag{10}$$
In the above formula, $r_i$ is the negative gradient of the current loss function, namely the fitting target of the next iteration, $L(y_i, f(x_i))$ is the current loss function, $y_i$ is the learning target value of the current weak learner, $f(x_i)$ is the output value of the current weak learner, $x$ is the input variable (the fault feature information in this paper), $K$ is the number of training samples, and $i$ is the index of the current training sample.
GBDT multi-classification is a combination of GBDT binary classifiers. In the training process, the idea of ‘one positive class, multiple negative classes’ is adopted. There are 12 fault types in the fault set, so 12 binary classifiers are trained. For each classifier, one fault type is treated as the positive class and the other 11 fault types are treated together as the negative class. When a training sample is the i-th type of fault (i = 1, 2, …, 12), only the binary classifier whose positive class is the i-th type should give a positive output, and the other 11 should give negative outputs; the final classification result is the fault type corresponding to that binary classifier. The schematic diagram of the GBDT multi-classification algorithm is shown in
Figure 3.
In this paper, when conducting 12-classification training for complex faults in the protection systems of intelligent substations, the fault feature information of the protection system in
Table 3 is used as a variable, and the complex fault types of the protection system in
Table 2 are used as the fitting target. The training process is as follows:
- Step 1
Select the fault sample i in the training set. The fault feature information set of this sample is X, and its fault type is the first fault in Table 2, “Main DSP module failure of merging unit”. Then, the true classification label (probability) of the fault sample in the 12 binary classifiers is the fitting target y = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). Because the fault sample belongs to the first fault, ‘1’ indicates that it belongs to this fault, and ‘0’ indicates that it does not belong to any of the other 11 faults. This forms the input consisting of the fault feature information set and the fitting target: (X, 1), (X, 0), (X, 0), …, (X, 0);
- Step 2
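Feed this input into the 12 weak classifiers and obtain the output results $f_1(X), f_2(X), \ldots, f_{12}(X)$;
- Step 3
Convert the output results into probabilities, as shown in Formula (11):
$$p_k(X) = \frac{\exp(f_k(X))}{\sum_{l=1}^{12} \exp(f_l(X))}, \quad k = 1, 2, \ldots, 12 \tag{11}$$
- Step 4
Calculate the loss function and solve the negative gradient of the loss function.
The loss function is shown in Formula (12), where $y_k$ is the k-th component of the fitting target y:
$$L = -\sum_{k=1}^{12} y_k \log p_k(X) \tag{12}$$
The negative gradient of the loss function is shown in Formula (13):
$$r_k = -\frac{\partial L}{\partial f_k(X)} = y_k - p_k(X) \tag{13}$$
- Step 5
Generate a new fitting target, namely $(X, r_1), (X, r_2), \ldots, (X, r_{12})$, and repeat steps 2 to 5.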
Iterate M times according to the above steps; after M weak learners have been generated, one round of training is complete. The training set contains the 12 kinds of complex faults in the fault set, each with several corresponding training samples. After each sample of the training set has been trained once, a 12-class GBDT model with high accuracy is obtained. It should be noted that the number of iterations M and the learning rate σ must be determined through repeated verification, and this process is described in detail in
Section 3.2.
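Steps 1, 3, and 4 can be traced numerically for a single sample. The sketch below assumes that the probability conversion is the usual softmax and that the loss is the multi-class cross-entropy, which are the standard choices for GBDT classification; the function names and the all-zero initial outputs are assumptions made for illustration, and no trees are fitted here.

```python
import math


def softmax(scores):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, p):
    """Multi-class cross-entropy between one-hot target y and probabilities p."""
    return -sum(yk * math.log(pk) for yk, pk in zip(y, p) if yk > 0)


def negative_gradient(y, p):
    """Negative gradient of the cross-entropy w.r.t. the raw outputs: y - p."""
    return [yk - pk for yk, pk in zip(y, p)]


n_classes = 12
y = [1] + [0] * (n_classes - 1)  # sample belongs to the first fault type
scores = [0.0] * n_classes       # assumed initial outputs of the 12 classifiers
p = softmax(scores)
loss = cross_entropy(y, p)
residual = negative_gradient(y, p)  # fitting targets for the next iteration
print(round(loss, 4), [round(r, 4) for r in residual])
```

With equal initial outputs every class receives probability 1/12, so the positive class gets a fitting target of 11/12 and each negative class gets -1/12; later iterations shrink these residuals as the outputs move toward the true label.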
3.2. Training of Fault Diagnosis Model Based on Gradient Boosting Decision Tree
The GBDT model is trained by using the fault information of the protection system of a typical 110 kV intelligent substation in southern China.
Figure 4 shows the information topology between the devices of the protection system of the intelligent substation. The protection system includes a line merging unit, a line protection device, a bus protection device, and an intelligent terminal. The GOOSE/SV message-receiving form between devices is shown in
Table 4.
A total of 4200 actual fault samples of this intelligent substation are selected, and the distribution of fault types in the samples is shown in
Table 5.
The GBDT fault diagnosis model is trained by using the fault sample set of the protection system, with 75% of the samples as the training set and 25% as the test set. Use the training set to train the model according to the steps in
Section 3.1. Taking the diagnostic accuracy of the test sample set as the optimization index, the model is optimized by adjusting the learning rate σ and the number of iterations M, because these two parameters have the greatest impact on the accuracy of the model. The training results are shown in
Table 6.
It can be seen from
Table 6 that the accuracy of the GBDT model for fault diagnosis of the protection system is quite high, and when the number of iterations is 30 and the learning rate is 0.1, the accuracy is the highest, reaching 99.048%. The specific diagnosis results of the test set samples at this time are shown in
Table 7.
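The sample counts implied by the 75%/25% split can be checked with simple arithmetic. The split sizes follow directly from the 4200 samples stated above; the count of 1040 correctly diagnosed test samples is an inference from the reported 99.048% accuracy, not a figure stated in the text, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_samples, train_fraction):
    """Return (train, test) sample counts for a given training fraction."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


n_train, n_test = split_sizes(4200, 0.75)

# Inferred, not stated in the text: 1040 correct diagnoses out of the
# test samples reproduce the reported accuracy of 99.048 %.
n_correct = 1040
accuracy = n_correct / n_test
print(n_train, n_test, round(100 * accuracy, 3))
```

Under this reading, the best parameter combination misdiagnoses 10 of the test samples.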
Compared with the existing research methods, such as recurrent neural network (RNN) [
12] and random forest algorithm (RF) [
13], the fault diagnosis accuracy under the same dataset is shown in
Table 8:
It can be seen from
Table 8 that the GBDT algorithm achieves the highest accuracy of the three algorithms on the same dataset, owing to its good performance on small sample sets. GBDT also requires fewer training iterations and trains and runs faster.
To explore the influence of the number of samples in the training set on the accuracy of the model and compare the accuracy of the three methods, according to the distribution ratio of fault samples in
Table 5, the number of samples in the training set is changed for training, and the test results are shown in
Figure 5. Figure 5 shows that when the number of samples in the training set reaches 3800, the accuracy of the model reaches 99%, and further increasing the number of training samples brings little improvement. Therefore, in practical applications, high accuracy can be achieved once the number of samples in the training set reaches 3800.
3.3. Fault Diagnosis Process of the Protection System
Based on the above content, the fault diagnosis process of the protection system of an intelligent substation based on the gradient boosting decision tree is constructed, as shown in
Figure 6.
The specific steps are:
- Step 1
To avoid a false start of the diagnosis process, set the minimum number of alarm messages, $N_{\min}$, that must be received within 30 s after the first alarm message. When the number of alarm messages received by the secondary monitoring system within this time is greater than or equal to $N_{\min}$, the fault diagnosis of the protection system of this intelligent substation is triggered.
The intelligence and integration of the secondary system cause it to produce many alarm messages when a fault occurs. When maintenance personnel repair the equipment incorrectly or the equipment is disturbed by environmental factors, alarm information is also generated, but such alarms are isolated and few in number. In this case, the fault diagnosis of the protection system should not be started. Therefore, according to the analysis of actual fault data and the field experience of the intelligent substation, the trigger condition is that the number of alarm messages received by the secondary equipment monitoring system within 30 s after the first alarm message appears is greater than or equal to $N_{\min}$, with $N_{\min}$ = 3;
- Step 2
If the number of alarm messages reaches this minimum, the fault feature information of the protection system of the intelligent substation is extracted to form a set of fault feature information. The feature information in
Table 3 is collected, including the integrated alarm information, link information of GOOSE and SV, device self-checking information, and sampling value information from the secondary monitoring system. After data processing, the fault feature information set
X in
Section 2.3 is generated, which prepares the data for the fault diagnosis of the protection system of this intelligent substation;
- Step 3
Input the processed fault feature information set X into the GBDT-based fault diagnosis system. Specifically, X is input into each of the binary classifiers in the GBDT model, and the probability $p_i$ (i = 1, 2, …, 12) that this fault belongs to the i-th complex fault is calculated. The complex fault type with the highest probability is taken as the diagnosis, and the diagnostic result set R is output.
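The trigger condition of Step 1 and the final selection of Step 3 can be sketched together. This is a minimal illustration under stated assumptions: alarm arrival times are given in seconds, the 30 s window and the threshold of 3 come from the text, and the function names `should_trigger` and `pick_fault` and the example values are hypothetical.

```python
def should_trigger(alarm_times, window_s=30.0, min_alarms=3):
    """True if at least min_alarms arrive within window_s of the first alarm."""
    if not alarm_times:
        return False
    first = min(alarm_times)
    in_window = [t for t in alarm_times if t - first <= window_s]
    return len(in_window) >= min_alarms


def pick_fault(probabilities):
    """Return the 1-based index of the most probable complex fault."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1


# Made-up arrival times (seconds) and class probabilities, for illustration.
print(should_trigger([0.0, 4.0, 12.5]))   # three alarms inside the window
print(should_trigger([0.0, 45.0, 50.0]))  # only one alarm inside the window
print(pick_fault([0.1, 0.7, 0.2]))
```

Counting only alarms that fall inside the window after the first message is what filters out the isolated alarms caused by maintenance errors or environmental disturbance.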
5. Conclusions
This paper sorts out the common faults of the protection system and classifies them into simple fault types and complex fault types, each with its corresponding fault feature information. The integrated alarm information, link information of GOOSE and SV, device self-checking information, and sampling value information that can serve as fault feature information of the protection system of an intelligent substation are organized into a fault feature information set. The parameters of the GBDT model are tuned according to the fault data. The GBDT-based fault diagnosis model of the protection system of an intelligent substation is studied and verified.
The method proposed in this paper has high diagnostic accuracy and strong generalization ability and is well suited to processing the fault feature data of the protection system of the intelligent substation. The calculation example shows that the overall accuracy of the proposed method can reach 99.0476%, which is higher than that of the existing methods based on recurrent neural networks and random forest algorithms. With one, two, and three false alarms in the fault feature information data, the accuracy of the proposed method can reach 97%, 92%, and 84%, respectively. In multiple fault diagnosis, the accuracy of the proposed method is 91.8367%. From the above analysis, it can be concluded that the proposed method makes full use of the high accuracy and anti-overfitting ability of the GBDT algorithm when dealing with device faults in the protection system. Compared with the RNN and RF algorithms, this method offers more convenient parameter adjustment in addition to higher accuracy. It also performs well when faced with bad data (false alarms of fault information, multiple faults). In conclusion, the method proposed in this paper is well suited to practical applications.