1. Introduction
Overhead transmission lines (TL) are integral components of an electric power system that carry electric energy from generation to the distribution networks. A short circuit (fault) in a transmission line is difficult to predict and is characterized by its instant of occurrence, its type, namely single-phase-to-ground (SPG), two-phase (DP), two-phase-to-ground (DPG), or three-phase (TP), its location, and its fault resistance. In the event of a fault, the protection relay detects, identifies, and signals the event, commanding the circuit breakers to remove the short-circuited line from service. Following the relay actuation, automatic reclosing attempts are made. If they succeed, the line is reintegrated into the system; if they fail, the line must remain out of service until repair work is performed, which should ideally take place in the shortest possible time and with adequate reliability. Therefore, indicating the fault point with the smallest possible error reduces the costs related to the shutdown.
Research on locating faults in transmission lines has been ongoing since the 1970s, yet it remains a considerable problem. Advances in hardware have given access to more measurement points with time synchronization via the global positioning system (GPS), making it possible to apply new mathematical techniques to problems involving the electric power system. Locating faults rapidly and accurately helps electricity concessionaires avoid the hefty fines imposed by regulatory agents and directly affects the quality of the electricity supply and the stability of the system. For each proposed location method, researchers establish, as shown in
Table 1, the location scenario, the origin and type of signal used, and the characteristics related to the pre-processing of the data. Existing methods typically use phasors of the voltages and currents of the transmission line terminals. Some authors applied the wavelet transform to the sampled signals to extract the detail coefficients and then the energy or frequencies.
Several authors have addressed the issue of fault location in TL through artificial neural networks (ANN). In [21], the authors compare these structures to a black box. According to [7], there is no single neural network structure for the problem, and the results obtained with the technique highlight the range of possibilities explored by researchers, a situation confirmed by the bibliographic survey shown in
Table 2. Multilayer perceptron feedforward networks are predominantly used, with supervised backpropagation learning and the Levenberg-Marquardt training algorithm. In most existing studies, with variations among the authors, the training data were obtained by simulating faults along the transmission line in electromagnetic transient programs, varying parameters such as the fault incidence angle and the equivalent impedance of the sources. Modular networks, such as those used by [2,5,8,9,12,14,16,17,26,29,30,31], represent a possibility for locating different types of faults in TL, contributing to the reduction of computational effort.
Among recent work, the study in [23] uses instantaneous fault current measurements obtained by simulation in modular networks trained with Bayesian regularization. Most tests in that study indicate the short circuit distance with errors below 0.5% in relation to the simulated location. In [25], location errors on the order of $10^{-5}$ were achieved by feeding a single ANN with the magnitudes of the frequencies of simulated fault voltage and current waves. Moreover, simulated voltage and current signals from one of the line terminals of the electrical system in Great Britain were used in [26] as inputs to modular networks, yielding errors of less than 0.7% in relation to the location of the short circuit. Current and voltage wave frequencies from one terminal of a 230 kV, 100 km transmission line, categorized into 5 groups, were used in [27] as inputs to an ANN for locating simulated SPG faults, achieving errors of less than 0.4% in relation to the location of the fault. The survey of academic production shows a predominance of ANN applications in simulated situations; results for real TL faults are presented only in [30], where reduced errors were achieved in a transmission line of the electricity supply system in Iran. This situation reflects the difficulty ANN have in generalizing: they provide reduced errors in simulated fault location cases but perform poorly in real cases, which indicates how hard it is to make the training data faithfully represent the concessionaires' systems.
Regarding optimization techniques, the Nelder-Mead simplex method, a derivative-free nonlinear optimization technique, was applied by [6] to an objective function to obtain the distance and fault resistance of simulated cases. The Nelder-Mead and Broyden–Fletcher–Goldfarb–Shanno numerical optimization methods were proposed in [10] to locate simulated faults in transmission lines. Harmony Search optimization was presented by [15] to estimate the distance from the substation to the fault point in simulated cases. Different optimization techniques are applied by [21] to estimate the fault location by minimizing an objective function of one variable. In those authors' analysis, although small location errors were achieved in all situations, the Teaching-Learning-Based Optimization technique converged in the shortest time. A hybrid model that combines the Relief algorithm (which performs the sequential comparison of the entire database, returning a logical response associated with the operating functionality parameters) with the wavelet transform for fault detection and location is proposed in [22]. Genetic algorithms are applied to fault location in TL in [28] to compare waveforms from digital fault recorders with simulated waves; the phasors obtained during the fault were used to calculate the fitness value, but the results were not satisfactory when applied to real faults. In [31], the authors proposed objective functions based on symmetrical components that, applied to simulated and real faults in the Brazilian electrical system and minimized by the ellipsoidal algorithm, provided promising results for using the method in practical situations. In the context presented, this paper aims to:
analyze the use of ANN and nonlinear optimization (NLO) techniques in the location of TL faults for simulated and real cases of the Brazilian electric power transmission system;
compare the results obtained by the analyzed methods with those provided by the classical analytical method (AM), proposed in [32] and used by some electric utilities in practical applications;
apply a statistical analysis to evaluate whether the responses obtained by the methods in the fault location process differ.
The research proposal evaluates and compares the errors achieved by the implemented methods so that the possibility of their joint use to improve the fault location process in TL can be verified. Applying more than one method can give engineering and maintenance teams confidence in the indicated distance to the fault. In practice, inaccurate results send repair personnel away from the fault point and reduce confidence in the process. The proposed methods, developed and applied to simulated and real faults in the Brazilian electrical system, allow an analysis of variance and the application of Tukey's test to the location errors, validating the difference between the results achieved by the different methods. Also, considering the literature surveyed, the present study made it possible to verify that:
in simulated scenarios, the smallest location errors are obtained with ANN and the largest with the AM, with the NLO techniques yielding intermediate errors. In real scenarios, unlike in the simulations, the largest errors occurred with ANN, and there was no statistical evidence to reject the equality between the fault locations obtained by the AM and the NLO methods;
the neural networks implemented in this paper, based on techniques found in the literature, proved unsuitable for application in companies that operate electrical systems, providing location errors significantly greater than the other methods, far from acceptable in practical situations (up to 5%);
the Quasi-Newton (QN), ellipsoidal (EL), and genetic algorithm (PRGA) NLO methods were used in the study. Among them, considering precision and computational effort, the QN method was the most suitable for field applications and can be used together with the AM to indicate the fault location.
Notably, this study does not intend to evaluate the sensitivity of the location algorithms to factors such as current and potential transformer errors, line model, errors in line parameters, synchronization, mutual inductance, transposition, and phasor estimation method. The same data are presented under equal conditions to all methods. Accordingly, computer programs containing the routines and mathematical techniques necessary to provide the distance to the fault in relation to one of the line terminals were implemented.
Although the paper deals with fault location methods, preventive measures such as transient earth voltage measurement to detect partial discharges [33,34], methods to detect faulty insulators [35,36], amplitude modulation methods to detect sources of problems in voltage parameters [37,38], and thermography [39] can be adopted to reduce or avoid future problems.
3. Statistical Study
Possible statistical differences between the proposed location methods and between the types of fault were analyzed by applying a two-factor analysis of variance (Anova2) to the location errors of the methods, using Python [51]. The procedure, as described in [52], allows us to investigate the equality of means in experiments with more than one factor. To apply it, the test data must be obtained randomly and the residuals must follow a normal distribution with mean $\mu = 0$ and constant variance $\sigma^2$. These requirements, used in formulating the Anova2 model, allow inferences about a population to be drawn from a sample. The test makes it possible to verify whether there is evidence of variation between the levels of the factors that may interfere in the process, signaling the existence of at least one level that differs from the others. In applying the method to the fault location problem, the type of method and the type of fault were the factors analyzed. Different levels were considered for each of these factors, as shown in Figure 14.
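Mathematically, Anova2 can be expressed using Equation (1), which is suitable for the problem studied:

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, \ldots, a, \quad j = 1, \ldots, b, \quad k = 1, \ldots, n \qquad (1)$$

Here, $y_{ijk}$ is the $k$th observed location error for the $i$th fault type and the $j$th method, $\mu$ is the global average effect of errors arising from the experimental location process, $\tau_i$ denotes the effect of the $i$th level of the fault type factor, $\beta_j$ represents the effect of the $j$th level of the method type factor, $(\tau\beta)_{ij}$ indicates the effect of the interaction between the fault type and method type factors, $\varepsilon_{ijk}$ is the random error term, $a$ is the number of levels of the fault type factor, $b$ indicates the number of levels of the method type factor, and $n$ denotes the number of samples collected for each level.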
The test evaluates the effect of each factor and of the interaction between the two factors through the hypothesis tests described in Equations (2)–(4). Each null hypothesis ($H_0$) states that the mean errors of the levels are the same for the analyzed factor, whereas the alternative hypothesis ($H_1$) states that at least one mean differs. The test results, conducted and interpreted as in [52], provide tools to either accept or reject a statistical hypothesis based on the evidence provided by the sample.
When the Anova2 results reject $H_0$ and, consequently, accept $H_1$, there is statistical evidence that at least one of the means of the levels differs from the others for the analyzed factor. This conclusion, applied to the fault location problem, allows us to verify whether the type of fault or the type of method used changes the efficiency of the location process. The rejection of $H_0$ is made at a given level of significance, which is the probability of rejecting this hypothesis when it is true. In statistical hypothesis tests, the rejection of $H_0$ is considered significant when the observed p-value is lower than the significance level defined for the study.
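For reference, the three pairs of hypotheses tested in a two-factor analysis of variance can be written in the usual textbook form. The symbols are assumed standard notation rather than a transcription of the original equations: $\tau_i$ is the effect of the $i$th fault type level ($i = 1, \ldots, a$), $\beta_j$ is the effect of the $j$th method type level ($j = 1, \ldots, b$), and $(\tau\beta)_{ij}$ is their interaction.

```latex
\begin{align*}
H_0 &: \tau_1 = \tau_2 = \cdots = \tau_a = 0,
  & H_1 &: \text{at least one } \tau_i \neq 0, \\
H_0 &: \beta_1 = \beta_2 = \cdots = \beta_b = 0,
  & H_1 &: \text{at least one } \beta_j \neq 0, \\
H_0 &: (\tau\beta)_{ij} = 0 \text{ for all } i, j,
  & H_1 &: \text{at least one } (\tau\beta)_{ij} \neq 0.
\end{align*}
```

Stating that all effects of a factor are zero is equivalent to stating that the mean errors of its levels are equal.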
Although the Anova2 test indicates the existence of a discrepant level among the others, it does not locate the observed difference. It is necessary to apply a test of comparison of means between the levels to determine the best (or worst) level of the factor. As a complement to the Anova2 study, the Tukey test was applied at a significance level of 5%. The test involves building confidence intervals for all pairs of means in such a manner that the set of all intervals has a certain degree of confidence.
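The sums of squares behind a balanced two-factor analysis of variance can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_way_anova`, the dictionary layout, and the toy error values are hypothetical. It returns the F ratios only; a p-value would additionally require the F distribution (for example `scipy.stats.f.sf`).

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor ANOVA with interaction.

    `cells` maps (fault_type, method) -> list of n location errors,
    with the same n in every cell (hypothetical data layout).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    a, b = len(levels_a), len(levels_b)
    sizes = {len(obs) for obs in cells.values()}
    if len(cells) != a * b or len(sizes) != 1:
        raise ValueError("a complete, balanced design is required")
    n = sizes.pop()
    if a < 2 or b < 2 or n < 2:
        raise ValueError("need at least 2 levels per factor and 2 samples per cell")

    grand = mean(y for obs in cells.values() for y in obs)
    mean_a = {i: mean(y for j in levels_b for y in cells[i, j]) for i in levels_a}
    mean_b = {j: mean(y for i in levels_a for y in cells[i, j]) for j in levels_b}
    mean_ab = {key: mean(obs) for key, obs in cells.items()}

    # Sums of squares for each source of variation
    ss = {
        "fault_type": b * n * sum((m - grand) ** 2 for m in mean_a.values()),
        "method": a * n * sum((m - grand) ** 2 for m in mean_b.values()),
        "interaction": n * sum(
            (mean_ab[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
            for i in levels_a for j in levels_b
        ),
        "error": sum((y - mean_ab[k]) ** 2 for k, obs in cells.items() for y in obs),
    }
    df = {
        "fault_type": a - 1,
        "method": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]

    table = {}
    for source in ss:
        f_ratio = None if source == "error" else (ss[source] / df[source]) / ms_error
        table[source] = {"ss": ss[source], "df": df[source], "F": f_ratio}
    return table


# Toy percentage errors (illustrative only, not taken from the study)
errors = {
    ("SPG", "AM"): [1.0, 3.0], ("SPG", "QN"): [2.0, 4.0],
    ("DP", "AM"): [5.0, 7.0], ("DP", "QN"): [6.0, 8.0],
}
table = two_way_anova(errors)
```

A large F ratio for a factor, compared with the critical value at the chosen significance level, leads to rejecting the corresponding null hypothesis.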
4. Results
In the fault location process, deterministic (QN, EL, AM) and non-deterministic (ANN and PRGA) procedures were implemented and applied. In the application of ANN to the problem, intelligent processes are used to recognize patterns, simulated by the ATP, related to the identification of the fault location. These structures, characterized by adaptation through experience, learning ability, generalization ability, and data organization, can provide a satisfactory solution to the problem. The implemented NLO algorithms (QN, EL, and PRGA) seek the minimum point of an objective function of two variables.
Table 7 shows the number of iterations performed and the execution time required by each algorithm to reach the result referring to a given fault case.
4.1. Fault Location: Simulated Cases
Table 8 lists the number of simulated cases used for validating the ANN and for applying the NLO techniques and the AM, which depends on the type of fault and the method implemented. A statistical study is needed to compare the methods used in this study. Owing to the stochastic nature of the ANN and the PRGA, the means and variances of the locations estimated by these methods were obtained from 40 executions.
Following the location step, the number of samples per class was balanced. Accordingly, 24 samples were randomly selected from each possible combination of the studied factors; this number corresponds to the smallest number of samples among the combinations of fault type and implemented method listed in Table 8. In a balanced experiment, the Anova2 test can use the differences between the level means to estimate the main effects of the factors and the interaction between them. When the experiment is unbalanced, differences in factor level means may be associated with unbalanced observations rather than changes in factor levels.
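The balancing step can be sketched as random subsampling of every factor combination down to the size of the smallest one. The snippet below is an assumption-laden illustration: the function name `balance_cells`, the data layout, and the toy cell sizes are hypothetical, and only the target of 24 samples per combination comes from the text.

```python
import random


def balance_cells(cells, n_per_cell=None, seed=0):
    """Randomly subsample each (fault_type, method) cell to a common size.

    If `n_per_cell` is omitted, the size of the smallest cell is used.
    A fixed seed keeps the selection reproducible.
    """
    rng = random.Random(seed)
    if n_per_cell is None:
        n_per_cell = min(len(obs) for obs in cells.values())
    return {key: rng.sample(obs, n_per_cell) for key, obs in cells.items()}


# Toy cells with unequal sizes (illustrative values only)
raw = {
    ("SPG", "AM"): [0.1 * k for k in range(40)],
    ("SPG", "QN"): [0.2 * k for k in range(24)],
    ("DP", "AM"): [0.3 * k for k in range(31)],
}
balanced = balance_cells(raw)
```

Sampling without replacement keeps every retained observation a genuine result of the location step.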
The percentage error of each fault location was calculated from Equation (5) as the difference between the value estimated by each method and the simulated value, relative to the length of the line.
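The error measure described above can be sketched as a short function. The function names are hypothetical, and taking the absolute value of the difference is an assumption, since the text only states that the difference is expressed relative to the line length.

```python
from statistics import mean


def percentage_error(estimated_km, reference_km, line_length_km):
    """Location error as a percentage of the line length.

    The absolute value is an assumption made for this sketch.
    """
    if line_length_km <= 0:
        raise ValueError("line length must be positive")
    return abs(estimated_km - reference_km) / line_length_km * 100.0


def mean_percentage_error(estimates_km, reference_km, line_length_km):
    """Average error over repeated runs of a stochastic method (e.g. 40 executions)."""
    return mean(percentage_error(e, reference_km, line_length_km) for e in estimates_km)


# Illustrative values on a 74.4 km line; the distances are made up for the example
single = percentage_error(40.92, 37.2, 74.4)
averaged = mean_percentage_error([36.456, 37.944], 37.2, 74.4)
```

Normalizing by the line length rather than by the fault distance keeps errors comparable between faults close to and far from the measuring terminal.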
In simulated scenarios, regardless of the type of fault, the box plots of the percentage errors in Figure 15 revealed the better performance of the ANN.
The existence of differences between the types of methods was verified with the Anova2 statistical test. In particular, for the results provided by the AM and the NLO methods, the analysis of variance assumed that the observations were independent and normally distributed, with constant variance in each treatment. Following [52], the graphical results in Figure 16 made it possible to verify that these assumptions were satisfied.
In the normal quantile-quantile plot shown in Figure 16a, the points approach a straight line, indicating that the residuals are normally distributed. The residuals shown in Figure 16b lie approximately within a horizontal band, signaling the validity of the independence assumption. Regarding the homoscedasticity of the levels of the fault type and method type factors, Figure 16c,d present the residuals against the fitted values for each factor; the points are randomly arranged, with no pattern as the level changes. In addition to graphical analysis, the premises for applying Anova2 were verified with statistical tests, as indicated in [53]. The randomness of the residuals was verified with the Durbin-Watson test (DW), which gave DW = 1.805907. The DW statistic ranges from 0 to 4, and values close to 2 indicate no correlation between the residuals, supporting the randomness of the sample. The normality of the residuals was verified at 5% significance with the Shapiro-Wilk hypothesis test (SW), which gave p-value (SW) = $6.023376 \times 10^{-2}$. In this test, $H_0$ states that the residuals follow a normal distribution, whereas $H_1$ states that they do not. Since the p-value exceeds 0.050, $H_0$ is not rejected, indicating that the residuals follow a normal distribution. The homoscedasticity of the residuals was checked with Bartlett's test (B), which is appropriate when the residuals are normally distributed, and with Levene's test (L), which does not require the assumption of normality. In these tests, $H_0$ states that the residual variances are homoscedastic, and $H_1$ states that at least one variance differs from the others. For the type of fault, the tests gave p-value (B) = $6.071991 \times 10^{-1}$ and p-value (L) = $3.748992 \times 10^{-1}$; for the type of method, p-value (B) = $2.511740 \times 10^{-1}$ and p-value (L) = $5.339021 \times 10^{-2}$. All four p-values exceed 0.050, so homoscedasticity is not rejected.
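The Durbin-Watson statistic used in the randomness check is the ratio of the sum of squared successive differences of the residuals to the sum of squared residuals. A minimal sketch follows; the function name and the toy residual sequences are illustrative and are not data from the study.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of a residual sequence (range 0 to 4)."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


# Alternating residuals: strong negative autocorrelation, statistic above 2
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Constant residuals: strong positive autocorrelation, statistic equal to 0
dw_constant = durbin_watson([2.0, 2.0, 2.0])
```

A value near 2, such as the DW = 1.805907 reported above, lies between these two extremes and points to uncorrelated residuals.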
As presented in Table 9, the Anova2 test leads to the rejection of the null hypothesis for both factors analyzed. Specifically, for the location methods evaluated, the type of method affects the percentage error of the fault location at a 5% significance level, with at least one location method differing from the others. The p-value (PR) obtained from the test statistic was close to $5.816931 \times 10^{-12}$, considerably lower than 5%. The data in Table 9 also allow inferences regarding the types of faults; however, that analysis is outside the scope of this study.
As a complement to the Anova2 study in simulated cases, the Tukey test was applied at a significance level of 5%, and the results expressed in
Table 10 were obtained.
The Tukey test is interpreted based on the minimum significant difference (MSD), obtained from the studentized range distribution, the mean square of the residuals of Anova2, and the sample size of the groups, for the chosen confidence level. For the test, represented in Table 10 and Figure 17, the absolute value of the mean difference between the pairs of methods used in fault location was greater than the MSD for the pairs AM-QN, EL-AM, and PRGA-AM. The value 0 (zero) is not contained in the confidence intervals of these pairs, indicating that their average performance is significantly different. Furthermore, the percentage errors provided by the AM were greater than those provided by the three NLO methods.
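The comparison against the MSD can be sketched as follows, using the usual form in which the critical studentized range value is multiplied by the square root of the residual mean square over the group size. The function names are hypothetical, the critical value `q_crit` is assumed to be read from a studentized range table, and all numbers are illustrative rather than results from the study.

```python
from itertools import combinations
from math import sqrt


def tukey_msd(q_crit, ms_error, n_per_group):
    """Minimum significant difference for equal group sizes."""
    return q_crit * sqrt(ms_error / n_per_group)


def significant_pairs(group_means, msd):
    """Pairs of groups whose absolute mean difference exceeds the MSD."""
    ordered = sorted(group_means.items())
    return [
        (g1, g2)
        for (g1, m1), (g2, m2) in combinations(ordered, 2)
        if abs(m1 - m2) > msd
    ]


# Illustrative inputs: q_crit, ms_error and the means are made-up values
msd = tukey_msd(q_crit=4.0, ms_error=2.0, n_per_group=8)
pairs = significant_pairs({"AM": 5.0, "QN": 1.0, "EL": 1.5}, msd)
```

A pair whose mean difference exceeds the MSD corresponds to a confidence interval that does not contain zero.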
4.2. Fault Location: Real Cases
The procedures used in the pilot study were applied to other TL. In addition to the 74.4 km transmission line used initially, the fault location techniques were implemented for three other lines. Accordingly, the implemented methods were tested in seven real cases of short circuits caused by atmospheric discharge (AD) and fire on the modeled lines, as indicated in Table 11.
The results achieved by the methods for the real cases are recorded in
Table 12.
For the statistical analysis of the results, the location errors were calculated from the difference between the location estimated by the method and the actual distance to the defect (provided by the inspection team), relative to the length of the line where the fault occurred. For the ANN and the PRGA, the error reported is the average error of the 40 executions.
To identify differences in the location results depending on the type of method used, the randomness, normality, and homoscedasticity of the residuals were verified. The checks supported the assumption that the observations are independent, normally distributed, and of equal variance in each treatment, allowing the application of the Anova2 model.
According to the test results presented in Table 13, the null hypothesis of equality between the methods was rejected: the p-value obtained from the test statistic was close to $9.92 \times 10^{-4}$, which is considerably less than 5%.
Tukey test results presented in
Table 14 made it possible to identify a difference in the location of faults performed by the ANN, which provided greater location errors than the other methods.
In real cases, the Tukey test, as shown in Table 14 and Figure 18, identified no statistical evidence of a difference between the fault locations obtained by the different optimization methods implemented, nor between these methods and the classical analytical method. The absolute value of the mean difference between the pairs of methods was greater than the MSD for the pairs PRGA-ANN, EL-ANN, AM-ANN, and QN-ANN. The value 0 (zero) is not contained in the confidence intervals of these pairs, indicating that their average performance is significantly different. Furthermore, the worst performance was presented by the ANN.
A noteworthy point for the real cases is that, although the objective functions for the optimization algorithms were developed for the short transmission line model (with only lumped series resistance and inductance), the errors obtained did not differ significantly from those of the analytical method, which considers the long-line model. Only seven cases were investigated (for lines of 74.4 km, 105.58 km, 248.44 km, and 342.71 km), so further studies with a larger database are required. For the cases presented, the use of the short line model was not a dominant factor in the results, probably because of other sources of error, such as current and potential transformers, line parameters, mutual inductance, transposition, and the phasor estimation method.
The neural networks, as conventionally used with voltage and current phasors as inputs, provide reduced errors in simulated cases but have difficulty generalizing to real fault location. It should be mentioned that the electrical system model in simulated cases is the same for training and validation, which leads the network to obtain accurate results. The largest errors of the ANN in the real fault location process are related to the values of the Thevenin equivalents (sources and impedances) at the local and remote ends of the transmission line. Electrical systems are dynamic, with load and generation varying 24 h a day. To generate the fault files used to train the location ANN for real cases, the Thevenin voltages and impedances provided by the electric utility were inserted into the ATP; these values were calculated by a short circuit program that considers a static system with light or heavy load. However, the exact values of these equivalents at the time of the fault cannot be determined, which compromises the results of neural networks applied to real cases. To mitigate the problem, the authors are developing a neural network fault location method that does not need the Thevenin equivalents of the line terminals. The proposal is intended to enable the practical application of these structures to this type of problem in conventional lines and in lines with series compensation, in which fault location becomes more complex due to the non-linearity of the Metal Oxide Varistor, the capacitor's protection element.