1. Introduction
Faults on transmission lines disrupt the smooth operation of power systems, so locating them robustly is vital. Various fault location methods have been developed over the years, and two categories are well known: impedance-based methods [1,2] and travelling-wave-based methods [3,4]. In impedance-based methods, the fault position is determined by analysing the voltage and current measurements taken before and after the fault event, whereas travelling-wave-based methods compute the fault location from the transient signals produced by the fault. The accuracy of impedance-based methods can be influenced by the line configuration, fault resistance, load imbalance, and the presence of distributed generation sources [5,6,7]. Travelling-wave-based methods, on the other hand, have the following disadvantages: (a) data acquisition requires complicated signal processing techniques; (b) time synchronisation between multiple observation points is challenging; (c) in lossy transmission lines, the travelling waves are distorted and attenuated as they propagate, which makes accurate calculation of the arrival time difficult; (d) in branched lines, fault-generated transient signals must propagate through numerous junctions and are attenuated, which reduces their intensity; and (e) in mixed overhead–underground-cable transmission systems, the difference in propagation speed between the media causes location bias [7,8,9,10].
Time reversal theory was initially applied in acoustics, where ultrasonic pressure fields are recorded, time-reversed, re-emitted, and focused on the source target [11]; it has since been used to address the issues of travelling-wave-based methods. Recent applications of time reversal theory to electromagnetics have led to the development of the electromagnetic time reversal (EMTR)-based disturbance localisation method. It has been used to locate flash-overs and lightning discharges, where it was demonstrated that the wave fronts generated by time-reversed electromagnetic fields focus at the lightning strike locations [12]. This concept was then employed for fault location in power grids: fault-generated transients observed at specific observation points are time-reversed and injected back into a simulated model of the same system, where they focus on the fault position [5,7]. Researchers have since extended its application to various power system network topologies while proposing new metrics and design criteria for fault position estimation. Razzaghi et al., for example, investigated the use of EMTR-based fault location algorithms in series-compensated, mixed overhead–underground cable, and multi-terminal HVDC transmission line networks [13,14]. In [15], an alternative to the approach of [7] was proposed, in which the fault position was determined on the basis of a particular range of the arguments of the voltage and transfer function. The authors of [16] showed that the fault signals computed at the actual fault position were time-delayed copies of the fault-generated transient signals. The concept of mirrored minimum energy was proposed in [17], where it was demonstrated that, when time-reversed signals are back-propagated, their energy is minimum at the actual fault position. In [18], EMTR-based norm criteria for fault location, namely the two-norm and the P-norm, were introduced, and the P-norm was confirmed to perform better. Moreover, the EMTR technique has been validated experimentally on a full-scale setup [19].
The EMTR technique can also be formulated as a correlation estimator between the transfer functions of the fault occurrence and fault estimation stages, which yields a correlation coefficient value (CCv) at each guessed fault location (GFL) [8]. The correlation estimator is more time-efficient than other EMTR metrics. In the other metrics, the time-reversed transients are injected back into the same system and a large number of simulation batches must be run after the fault occurs, whereas the correlation estimator uses an already available database of the same system's transfer functions, which is correlated with the fault-generated transfer function. Furthermore, because the CCv is quantitative, it offers more confidence in the estimated fault candidate than other EMTR metrics do. For brevity, the correlation estimator is not explained in detail here; see [8,10,20], where different aspects of its applicability were discussed. A new time-reversal-based method, the FasTR algorithm, has also been proposed recently; it infers the fault location using an optimisation algorithm and is likewise time-efficient [6]. However, it relies on the transients recorded by two observation probes at the line terminals. The existing EMTR methods usually consider low-impedance fault scenarios. Recently, the EMTR-III method has been proposed to accurately locate high-impedance faults [21], and another EMTR-based method that does not require knowledge of the fault impedance has been reported [22]. However, these methods rely on transients recorded at multiple observation points, whereas the correlation estimator relies on a single observation point.
Nevertheless, in the correlation estimator method, the preparation of data for the fault estimation stage is of utmost importance, and some aspects require further investigation. (1) Fault impedance: the fault impedance can take any value during the fault occurrence stage, so an accurate guess of the fault impedance in the fault estimation stage is crucial. (2) Fault type: the existing correlation estimator stores only the transfer functions of the three-phase fault scenario, and this has been observed to cause errors in both the fault location and the fault type. (3) Spatial step: the spatial step used in the fault estimation stage is another key aspect; if a larger spatial step is selected, an error in the fault location results. The correlation estimator method flags this inaccuracy through its CCv, whereas the traditional EMTR method does not.
To improve the correlation estimator method and make it more adaptive while tackling the aforementioned issues, three approaches are proposed in this study. Firstly, the impact of a fault impedance mismatch between the fault occurrence and fault estimation stages was analysed. The accuracy of the method was not affected for fault locations close to the observation probe, but the location error increased significantly as the fault position moved farther away. A simple remedy was then used: storing the transfer functions for both low- and high-fault-impedance scenarios in the fault estimation stage. Secondly, a so-called pseudo approach was used to accurately identify the fault type and locate the fault position; in this approach, the fault estimation stage stores transfer functions for all possible fault types rather than only for the three-phase fault. Finally, the accuracy of the correlation estimator improved with a smaller feasible spatial step, but at the expense of greater memory requirements for the storage devices, especially for long transmission lines and for measuring devices with larger bandwidths. This memory problem was solved by proposing a regression-analysis-based hybrid approach, in which only a few fault estimation stage transfer functions were selected instead of all of the data. A regression model relating the CCvs to their corresponding locations along the line was learned and used to predict the actual fault position.
The structure of the paper is as follows. Section 2 discusses the basics of EMTR and the correlation estimator method. In Section 3, the impact of fault impedance mismatch is analysed. Section 4 presents a pseudo approach for the accurate identification of the fault type. The issue of the spatial step is discussed in detail in Section 5, and the regression-analysis-based hybrid approach is presented in Section 6. Finally, Section 7 concludes this paper and proposes future work.
3. Impact of Fault Impedance Mismatch
As mentioned earlier, the fault impedance can take any value during a fault event, and accurately guessing it in the fault estimation stage is very difficult. This section presents the impact of a fault impedance mismatch between the fault occurrence and fault estimation stages of the correlation estimator method. Note that the results shown in Figure 5 were obtained assuming Z = 10 Ω in both stages. To observe the impact of the fault impedance mismatch, the system of Figure 1 and the parameters of Table 1 (with a line length of 50 km) are taken as references for the demonstration. The fault estimation stage transfer functions were obtained and stored considering Z = 10 Ω, whereas four different values were considered in the fault occurrence stage, i.e., Z = 0, 10, 300, and 1000 Ω.
For the fault estimation stage, the BLT formulation was used to obtain the transfer functions H. The fault phenomenon was emulated in the EMTP-RV environment, and the fault occurrence stage H were computed for test fault locations of 8 km and 45 km. The results are shown in Figure 7 and Figure 8, respectively. The fault impedance mismatch did not affect the method's accuracy for fault locations near the observation point. As is clear from Figure 7, when x = 8 km, even with a high-impedance fault, i.e., Z = 1000 Ω (solid black line), the GFL was 7.98 km. The mismatch due to Z = 300 Ω (solid grey line) also had a negligible impact, giving a GFL of 7.99 km. The respective CCvs, i.e., 0.7941 and 0.8861, nevertheless revealed the mismatch between the fault occurrence and fault estimation stages. Only when the values of Z in both stages were equal or nearly equal did the GFL coincide exactly with the actual fault location x.
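The selection step described above (correlating the fault occurrence stage transfer function with each stored fault estimation stage transfer function and taking the GFL with the highest CCv) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the CCv is approximated by the normalised cross-correlation magnitude of two complex transfer functions sampled on a common frequency grid, which may differ from the method's own definition in Equation (6), and `toy_h` is a hypothetical pure-delay model used only to generate data, not a substitute for the BLT or EMTP-RV results.

```python
import cmath
import math

def ccv(h_fault, h_guess):
    """Normalised cross-correlation magnitude of two transfer functions sampled
    on the same frequency grid (1.0 means identical up to a complex scale).
    Assumed stand-in for the CCv, not the method's exact formula."""
    if len(h_fault) != len(h_guess):
        raise ValueError("transfer functions must share one frequency grid")
    cross = sum(a * b.conjugate() for a, b in zip(h_fault, h_guess))
    energy = sum(abs(a) ** 2 for a in h_fault) * sum(abs(b) ** 2 for b in h_guess)
    return abs(cross) / math.sqrt(energy)

def locate_fault(h_fault, database):
    """Return (GFL, CCv) of the stored transfer function that best matches h_fault.
    `database` maps each guessed fault location (m) to its stored H."""
    scores = {gfl: ccv(h_fault, h) for gfl, h in database.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def toy_h(x_m, freqs_hz, v_mps=2.0e8):
    """Hypothetical toy transfer function: a pure propagation delay x / v."""
    return [cmath.exp(-2j * math.pi * f * x_m / v_mps) for f in freqs_hz]

freqs = [1e3 * k for k in range(1, 51)]                          # 1 kHz to 50 kHz
database = {x: toy_h(x, freqs) for x in range(0, 50_001, 1000)}  # 1 km spatial step
gfl, score = locate_fault(toy_h(8000, freqs), database)
print(gfl, round(score, 4))
```

Because the toy fault lies exactly on a stored GFL, the best candidate reaches a CCv of 1; a mismatch between the two stages would lower the maximum CCv below 1, which is how the CCv signals an unreliable estimate.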
On the other hand, as the fault position moved away from the observation point, the location error became significant in the presence of a mismatch. As can be seen in Figure 8, for x = 45 km with Z = 1000 Ω (solid black line) and 300 Ω (solid grey line) in the fault occurrence stage, the GFL was 44.71 km and 44.84 km, respectively. This impact can be eliminated by a simple approach: storing the fault estimation stage transfer functions H separately for low- and high-impedance faults. In other words, whenever a fault occurs, its fault occurrence stage H is correlated with the stored H of both the low- and the high-impedance faults, and the GFL with the higher CCv is selected as the best candidate. This is illustrated in Figure 9, where the scenarios of Figure 8 were recomputed with Z = 1000 Ω in the fault estimation stage. For Z = 1000 Ω, the fault location was estimated accurately, as confirmed by CCv = 1.00. In addition, the H due to Z = 300 Ω was correlated with the stored H of both the low- and high-impedance faults, giving two GFLs, i.e., 44.84 km and 45.07 km, respectively. The better candidate is decided by the corresponding CCv; in this case, the CCv at 45.07 km was 0.9501, which is higher than 0.8603 (solid grey line).
A comprehensive analysis of the impact of the fault impedance mismatch along the line length is shown in Figure 10 and Figure 11, where the impact is presented in the form of the relative location error. In all cases, the impact was almost negligible for fault locations up to 25 km. As the fault location moved away from the observation point, however, the impact increased for the cases of Z = 300 Ω and Z = 1000 Ω in the fault occurrence stage when Z = 10 Ω was used in the fault estimation stage, as shown in Figure 10.
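Because the expression for the relative location error is not given here, the sketch below assumes one common normalisation, which is not necessarily the one used for Figures 10 and 11: the absolute location error divided by the line length L, i.e. ε = |GFL − x| / L × 100%. It is applied to the two GFLs reported for the 45 km fault on the 50 km line.

```python
def relative_location_error(gfl_km, actual_km, line_length_km):
    """Assumed definition: |GFL - actual| as a percentage of the line length."""
    return abs(gfl_km - actual_km) / line_length_km * 100.0

# GFLs reported for the 45 km fault when Z = 10 ohm was stored in the estimation stage
for z_ohm, gfl in ((1000, 44.71), (300, 44.84)):
    print(z_ohm, round(relative_location_error(gfl, 45.0, 50.0), 2))
```

Normalising by the line length makes errors comparable across lines of different lengths; normalising by the distance to the fault instead would weight faults near the observation point more heavily.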
A similar pattern was observed for the cases of Z = 0 Ω and Z = 10 Ω in the fault occurrence stage when Z = 1000 Ω was assumed in the fault estimation stage, as shown in Figure 11. This analysis therefore shows that both low and high fault impedances should be taken into account during the fault estimation stage to obtain more accurate results, and that the fault location candidate with the highest CCv should then be selected.
4. Identification of Fault Type with Accuracy
The issue of fault type identification for a three-phase transmission line is discussed in this section. A 345 kV frequency-dependent transmission line system with the configuration illustrated in Figure 12 was investigated; the other related parameters were already presented in Table 1. The corresponding total per-unit-length (p.u.l.) impedance and admittance matrices at f = 1 kHz are as follows:
The fault phenomenon was simulated in the EMTP-RV environment, as shown in Figure 13, and the fault occurrence stage transfer function H was obtained for each phase (explained in Section 2.2), whereas the fault estimation stage transfer functions H were computed at regular spatial steps along the line by solving the multiconductor transmission line equations using the BLT formulation (also presented in Section 2).
In principle, three transfer functions H (one per phase) must be stored for any fault type scenario. It was observed that the accuracy of the method is affected if the fault occurrence stage H are correlated only with the stored H of the three-phase fault scenario. For instance, a fault at 45 km in Phase “a” was considered, and the related H were correlated with the stored H of the three-phase fault scenario, as shown in Figure 14. The GFLs were inconsistent across the three phases, with values of 43.9 km, 46.9 km, and 47.30 km for Phases a, b, and c, respectively. Although the CCv of Phase “a” was the highest (0.8416), correctly identifying the faulty phase, the estimated fault location was inaccurate. A similar pattern was observed in other scenarios, which are presented comprehensively in Table 3. This is attributed to the coupling between the phases. The entries in Table 3 are given as (CCv, GFL in km).
To address this issue, a so-called pseudo approach is proposed, in which the H for all possible fault types, e.g., Phases a, b, c, ab, bc, ca, and abc, are stored. For simplicity, only three scenarios are listed in Table 3. Whenever a fault occurs, the fault occurrence stage H are correlated with each stored dataset in turn, and the fault type is decided based on the CCv and the consistency of the GFL across all three phases.
For demonstration purposes, the same fault scenario in Phase “a” at 45 km was repeated, but this time, the related H were correlated with the stored H of the Phase “a” fault type. Figure 15 illustrates that the CCvs in all three phases were 1.00 and that the GFLs were consistent and accurate, i.e., 45 km. Thus, the fault estimation stage dataset for which the so-called “symmetry condition” is satisfied determines both the fault type and the location.
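The pseudo approach can be sketched as a search over per-fault-type databases: for each candidate fault type, each phase's measured transfer function is matched to its best GFL, fault types whose three GFLs disagree by more than a tolerance `tol_m` (a hypothetical parameter) are rejected, and the remaining type with the highest mean CCv is reported. Averaging the three CCvs is an assumption of this sketch, the CCv is again the normalised cross-correlation stand-in, and `toy_h` is a hypothetical uncoupled per-phase model, so it does not reproduce the inter-phase coupling that causes the errors in Table 3.

```python
import cmath
import math

PHASES = ("a", "b", "c")
FAULT_TYPES = ("a", "b", "c", "ab", "bc", "ca", "abc")

def ccv(h1, h2):
    """Normalised cross-correlation magnitude (assumed CCv stand-in)."""
    cross = sum(a * b.conjugate() for a, b in zip(h1, h2))
    energy = sum(abs(a) ** 2 for a in h1) * sum(abs(b) ** 2 for b in h2)
    return abs(cross) / math.sqrt(energy)

def best_match(h, db):
    """Best (GFL, CCv) of one measured phase against one stored phase set."""
    scores = {gfl: ccv(h, hs) for gfl, hs in db.items()}
    gfl = max(scores, key=scores.get)
    return gfl, scores[gfl]

def identify_fault(h_fault, databases, tol_m=0.0):
    """h_fault: {phase: H}; databases: {fault_type: {phase: {GFL: H}}}.
    Returns (fault_type, GFL, mean CCv), or None if no type is consistent."""
    best = None
    for ftype, per_phase in databases.items():
        matches = [best_match(h_fault[p], per_phase[p]) for p in PHASES]
        gfls = [g for g, _ in matches]
        if max(gfls) - min(gfls) > tol_m:
            continue  # GFLs disagree across phases: reject this fault type
        mean_ccv = sum(c for _, c in matches) / len(PHASES)
        if best is None or mean_ccv > best[2]:
            best = (ftype, gfls[0], mean_ccv)
    return best

def toy_h(x_m, phase, ftype, freqs_hz, v_mps=2.0e8):
    """Hypothetical per-phase toy model: faulted phases get a different
    first-order response than healthy ones (no inter-phase coupling)."""
    tau = 2e-5 if phase in ftype else 8e-5
    return [cmath.exp(-2j * math.pi * f * x_m / v_mps) / (1 + 2j * math.pi * f * tau)
            for f in freqs_hz]

freqs = [1e3 * k for k in range(1, 51)]
grid = range(0, 50_001, 1000)
databases = {t: {p: {x: toy_h(x, p, t, freqs) for x in grid} for p in PHASES}
             for t in FAULT_TYPES}
measured = {p: toy_h(45_000, p, "a", freqs) for p in PHASES}  # phase-a fault at 45 km
result = identify_fault(measured, databases)
print(result[0], result[1], round(result[2], 4))
```

Only the stored set built for the true fault type matches all three phases exactly, so it is the only one that can reach a mean CCv of 1 with identical GFLs.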
6. Regression-Analysis-Based Hybrid Approach
The system of Figure 1 is considered again; the relevant information was already presented in Table 1. In this approach, the fault estimation stage transfer functions H were first obtained with a small spatial step g and correlated with the transfer functions at a few selected positions, giving the CCvs at those positions. A regression model was then constructed by learning the relationship between these CCvs and the corresponding positions along the line (at the smallest feasible g). This model was then used to predict the fault location from new CCvs, obtained by correlating the selected transfer functions with the fault occurrence stage transfer function. Therefore, only a few H needed to be stored instead of all of them, which ultimately relieves the memory burden on the data storage devices.
The MATLAB Regression Learner app (RLA) was used for the regression analysis [29]. In this app, the training data were imported and used to train the available regression models. Based on their statistical properties, the best-fit models were selected, evaluated on new data, and exported for future use as standalone applications. The regression models available in RLA include linear regression, decision trees, Gaussian process regression, support vector machines, and ensembles of trees.
Five locations along the 50 km line were selected as storage locations, i.e., x (SL) = [10 m, 12,500 m, 25,000 m, 37,500 m, 49,990 m]. The transfer functions corresponding to these positions are known as storage transfer functions (H). These H were then correlated with the fault estimation stage H obtained at g = 10 m, and the CCvs were computed using Equation (6). These CCvs served as the independent variables (predictors), and the corresponding positions along the 50 km line, at a spatial step of g = 10 m, served as the dependent variable (response). This training dataset was imported into MATLAB RLA, and the relationship was learned using all the available models.
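The training and prediction steps of the hybrid approach can be illustrated with a small Gaussian process regression using an exponential kernel, k(u, v) = σf² exp(−‖u − v‖/ℓ), written in pure Python. This is a sketch, not the MATLAB RLA model: the hyperparameters are fixed rather than optimised, only the posterior mean is computed, the training grid uses a 500 m step instead of 10 m to keep it fast, and `synthetic_ccvs` is a hypothetical smooth CCv profile standing in for Equation (6) evaluated on real transfer functions. Only the five storage locations come from the text.

```python
import math

STORAGE_M = [10.0, 12_500.0, 25_000.0, 37_500.0, 49_990.0]  # storage locations

def synthetic_ccvs(x, width=10_000.0):
    """Hypothetical CCv profile: one smooth bump peaking at each storage location."""
    return [1.0 / (1.0 + ((x - s) / width) ** 2) for s in STORAGE_M]

def exp_kernel(u, v, sigma_f=1.0, length=1.0):
    """Exponential kernel sigma_f^2 * exp(-r / length), r = Euclidean distance."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return sigma_f ** 2 * math.exp(-r / length)

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def cho_solve(low, b):
    """Solve (L L^T) x = b by forward and back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

class ExponentialGPR:
    """Minimal GP regression (posterior mean only) with fixed hyperparameters."""
    def __init__(self, noise=1e-6, length=1.0):
        self.noise, self.length = noise, length

    def fit(self, X, y):
        self.X = X
        self.mu = sum(y) / len(y)
        self.sd = math.sqrt(sum((t - self.mu) ** 2 for t in y) / len(y))
        yn = [(t - self.mu) / self.sd for t in y]  # standardise the response
        K = [[exp_kernel(a, b, length=self.length) + (self.noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.alpha = cho_solve(cholesky(K), yn)
        return self

    def predict(self, x):
        k = [exp_kernel(x, a, length=self.length) for a in self.X]
        return self.mu + self.sd * sum(ki * ai for ki, ai in zip(k, self.alpha))

# Training set: CCvs (predictors) -> position along a 50 km line (response).
positions = [500.0 * i for i in range(101)]
model = ExponentialGPR().fit([synthetic_ccvs(x) for x in positions], positions)
estimate = model.predict(synthetic_ccvs(18_690.0))  # one of the test locations
print(round(estimate, 1))
```

For a library implementation, scikit-learn's `GaussianProcessRegressor` with a `Matern` kernel and `nu=0.5` corresponds to the same exponential kernel and also optimises the hyperparameters.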
Table 6 displays the statistical outcomes of these models. For simplicity, only the models with substantial outputs are listed in the table, where MSE is the mean-squared error, RMSE the root-mean-squared error, MAE the mean absolute error, and R² the coefficient of determination, as defined in [30]; in these definitions, ȳ, y, and ŷ are the mean, observed, and fitted values of the dependent variable, respectively, and N and q are the total number of observations and predictors, respectively. The performance of the regression models was evaluated on the basis of these parameter values, and the best-fit model was selected.
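For reference, the standard unadjusted forms of these metrics are MSE = (1/N) Σ(yᵢ − ŷᵢ)², RMSE = √MSE, MAE = (1/N) Σ|yᵢ − ŷᵢ|, and R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)². The exact forms used in [30] may differ; in particular, variants involving the number of predictors q, such as the adjusted R², are not shown. The sketch below computes the standard forms for a small hypothetical set of fault positions and predictions.

```python
import math

def regression_metrics(observed, fitted):
    """Standard (unadjusted) MSE, RMSE, MAE and R^2."""
    n = len(observed)
    residuals = [y - f for y, f in zip(observed, fitted)]
    sse = sum(r * r for r in residuals)
    mean_y = sum(observed) / n
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - sse / ss_tot,
    }

# Hypothetical positions (m) and predictions, residuals of 10 m to 20 m
metrics = regression_metrics([10_000, 20_000, 30_000, 40_000],
                             [10_010, 19_990, 30_020, 39_980])
print(metrics)
```

The example also shows why R² alone barely separates good models here: residuals of tens of metres on positions spanning tens of kilometres still give R² ≈ 0.999998, which rounds to 1, whereas the RMSE remains informative.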
As can be seen in Table 6, the Gaussian process regression (GPR) models outperformed the other regression models. In addition, GPR exponential had the lowest RMSE (18) and a coefficient of determination (R²) of 1. Although other regression models also had R² = 1, the RMSE is considered the better criterion for the decision; moreover, the model that gives the required results is the best choice [30]. The response plot at one of the storage locations for GPR exponential is shown in Figure 18, which shows perfect agreement between the regression estimates and the training data. After training, the regression models were tested on new data, obtained by correlating the storage transfer functions H with the H of randomly selected fault locations along the line (e.g., 20 m, 370 m, 2750 m, 7320 m, 18,690 m, 23,980 m, 34,110 m, and 42,540 m).
The GPR models provided more precise and consistent fault location estimates. As shown in Figure 19, the exponential model produced the most promising outcomes among the GPRs, with an error near zero. The absolute error is defined as the magnitude of the difference between the predicted and actual fault locations.
Since the objective of this analysis was to reduce the required memory space, it was shown that a fault can be located accurately with just a few transfer functions and a regression model.
Table 7 compares the memory requirements of the hybrid technique and the existing correlation estimator (with a smaller spatial step). Only tens or hundreds of megabytes are required, as opposed to gigabytes (GB) of memory for storing all of the fault estimation stage H. For instance, for a line length of 100 km and a frequency spectrum of DC–5 MHz, the existing correlation estimator required 75 GB of memory, whereas the same job could be performed with 80 MB by using the regression model.
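The scale of the saving follows from counting stored transfer functions. Assuming a uniform grid of GFLs at step g that excludes both line ends (10 m to 49,990 m for g = 10 m, consistent with the storage locations listed earlier in this section), the full database for the 50 km line holds 4999 transfer functions against five in the hybrid approach. The byte count below assumes 16 bytes per complex sample and leaves the number of frequency samples as a free, hypothetical parameter, so it does not reproduce the values in Table 7.

```python
def grid_points(line_length_m, step_m):
    """Number of GFLs on a uniform grid that excludes both line ends."""
    return round(line_length_m / step_m) - 1

def storage_bytes(n_transfer_functions, n_frequency_samples, bytes_per_sample=16):
    """Raw storage for complex transfer functions (16 bytes = complex128, assumed)."""
    return n_transfer_functions * n_frequency_samples * bytes_per_sample

full_50km = grid_points(50_000, 10)    # stored H at a 10 m step on a 50 km line
full_100km = grid_points(100_000, 10)  # stored H at a 10 m step on a 100 km line
hybrid = 5                             # storage transfer functions only
n_freq = 100_000                       # hypothetical number of frequency samples
print(full_50km, full_100km, full_100km / hybrid,
      storage_bytes(full_100km, n_freq) / 1e9, "GB")
```

A GPR model additionally keeps its training inputs (N × 5 real CCvs here), which is small compared with N complete transfer functions whenever the number of frequency samples is large.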
As discussed in Section 3, two separate sets of H are required for low- (Z = 10 Ω) and high-impedance (Z = 1000 Ω) fault scenarios to locate the fault position accurately. This, however, uses more memory; therefore, the hybrid approach can be applied to both scenarios independently to alleviate the memory requirement of the data storage devices. The fault location process is then as follows: whenever a fault occurs, the fault occurrence stage H is correlated with the storage H of both the low- and high-impedance faults, and the regression model and storage H of the scenario with the higher CCv are selected for fault location prediction. For instance, Figure 20 shows that, when the H due to a fault at x = 10 km with Z = 0 Ω was correlated with the storage H of the low- and high-impedance faults, the CCv with the Z = 10 Ω set was higher; this scenario was therefore selected for prediction, giving 10.05 km. Similarly, for the H due to a fault at x = 35 km with Z = 500 Ω, the CCv with the Z = 1000 Ω set was higher, so that set was used for prediction, giving 35.13 km.
The pseudo approach described in Section 4 also requires additional memory: as mentioned for the single-conductor transmission line, storing just one copy of H takes gigabytes, so storing a copy for each fault type requires even more memory. This memory burden was again reduced by employing the hybrid approach. The results provided in Table 3 are repeated in Table 8 using the hybrid method, which requires only a few selected transfer functions and a regression model. Even with only a few selected transfer functions, the results in Table 3 and Table 8 are clearly similar. Note that, although a new database must be generated whenever the system topology is altered, this does not influence the method's efficiency, because the data are prepared before any fault occurs rather than after each fault.
7. Conclusions
In this paper, key issues relating to the fault estimation stage of the correlation estimator method were analysed. First, the impact of the fault impedance mismatch between the fault occurrence and fault estimation stages was discussed. An error in the fault location was observed for fault positions far from the observation point. To tackle this problem, a simple approach was proposed in which the fault estimation stage transfer functions are stored separately for low- and high-impedance faults. Secondly, a new pseudo method was developed to correctly identify fault types and pinpoint their locations, in which the fault estimation stage transfer functions were stored for all possible fault types.
Finally, the important issue of the spatial step used in the fault estimation stage of the conventional EMTR and correlation estimator methods was studied, and the advantage of the latter method, namely its CCv-based indication of location errors, was highlighted. It was shown that only a sufficiently small spatial step ensures an accurate fault location, but at the cost of more memory space. This problem was solved by proposing a hybrid approach that combines the correlation estimator method with regression analysis, in which a few locations along the line were selected and their corresponding transfer functions stored. These transfer functions were correlated with the transfer functions computed at a small spatial step to provide the CCvs. The line positions and CCvs then served as the training data for the regression model, and Gaussian process regression was the best-fit model for predicting the fault location. For a 100 km line and a DC–5 MHz spectrum, the existing correlation estimator required 75 GB of memory, whereas the same job could be performed with 80 MB by using the proposed approach.
The hybrid approach was also applied in conjunction with the simple and pseudo approaches. Note that the current analyses were based on theoretical and simple cases. Complex networks will be investigated in future studies, and the experimental validation of the proposed method will be presented using field data from real-world systems. Moreover, a study of nonlinear fault impedance scenarios with uncertain parameters is already in progress.