The Development of a Data-Based Leakage Pinpoint Detection Technique for Water Distribution Systems

Kim, Ryul; Choi, Young Hwan

doi:10.3390/math11092136


by Ryul Kim and Young Hwan Choi *

Department of Civil and Infrastructure Engineering, Gyeongsang National University, Jinju 52725, Republic of Korea

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(9), 2136; https://doi.org/10.3390/math11092136

Submission received: 23 March 2023 / Revised: 26 April 2023 / Accepted: 28 April 2023 / Published: 2 May 2023

(This article belongs to the Section Engineering Mathematics)


Abstract

Leakage is one of the abnormal conditions in water distribution systems (WDSs). Real-time monitoring can be used to prevent leakage or to recover from it quickly. However, monitoring alone is not enough: for improved leakage detection, a status diagnosis of the WDS must be performed together with real-time monitoring, and numerous studies have addressed this. Furthermore, existing methodologies only provide optimal sensor locations and fast recognition. This paper proposes a technique that uses deep learning to quantitatively evaluate the leakage volume along with leakage detection. Hydraulic data (e.g., pressure, velocity, and flow) from a calibrated hydraulic model were used as training data, and deep learning techniques were applied to detect the leakage volume and location simultaneously. We examined various scenarios of leakage volume and location to configure the data for simulated leakage accidents. Furthermore, to optimize the leakage detection performance, the detection performance was analyzed according to the size of the network, the types of meters, the number of meters, and the locations of the meters. Because it simultaneously identifies the amount and location of the leakage, this study is expected to be helpful in aspects such as recovery and restoration decision making after leakage.

Keywords: water distribution systems; leakage; emitter; leakage detection; deep learning

MSC: 47N70

1. Introduction

Leakage is a typical abnormality in water distribution systems (WDSs), and its severity varies depending on the volume, location, and condition of the system. Background leakage, which generally involves a small leakage volume and occurs at pipeline connections, can also cause serious socioeconomic damage over time, such as sinkholes and pressure deficiency. Leakage accidents can be defined in various forms, depending on the leakage volume and whether the leakage is reported. Furthermore, leakage increases the operating costs of the system and adversely affects its efficiency. Additionally, leakage creates the potential for an inflow of contaminants into WDSs, which can directly impact the water quality. Various methods are used to prevent and detect leakage in WDSs, a representative one being the implementation of maintenance programs based on regular monitoring.

The water distribution system monitors and controls the observation data of the installed facilities (e.g., pressure gauges, flow meters, and tank water level meters) for efficient maintenance, using a supervisory control and data acquisition (SCADA) system. This enables the real-time monitoring of the prediction errors of the system characteristic values, such as pressure and flow, to detect abnormalities [1]. One of the methods for exploring leaks in the field is phase testing [2]. Phase testing is a method in which the control valves installed in each section are closed one by one and the fluid changes within the isolated area are analyzed to identify leaks. However, because the valves are closed step by step to detect the leaks, this technique is subject to many constraints depending on the position of the valves, and the shape of the pipe network also becomes a major constraint on performing the test. Another field method uses a leakage sound detector with a listening stick, an electronic leakage detector, and a noise logger to locate the leakage in the pipeline accurately. Research has been conducted to specify the signals from leakages precisely [3]. However, for leakage detection in the field, the availability and recognition of the leakage are prioritized, and leakage identification by reporting is the main task. Leakage identified by reporting is defined as reported burst leakage, which accounts for a very low percentage of leakages.

The majority of leakage incidents are unreported burst leakages, which account for the largest leakage volume in terms of the duration after the burst and the maintenance time. In addition to monitoring and solutions for the field exploration of leakages, there are various ways to improve the system’s efficiency, such as pressure management strategies for the replacement of old pipelines, an improvement in the system’s elasticity, and the use of leak detection sensors. It is difficult to obtain accurate system state estimation information only through the real-time monitoring of the water distribution system; the analysis and interpretation of the measurement data are more important factors for improving it [4]. In conclusion, for quick detection and identification after leakage, analyzing data in real time is effective for minimizing labor and leak detection time [5]. In particular, the effect of using modeling to detect and reduce abnormal conditions in the water distribution system has been proven by Karadirek et al. [6].

Among previous studies on leak detection through data analysis, Min et al. proposed a two-step model for detecting and locating leaks, in which the location of the leaks was specified through K-means clustering and trial-and-error optimization procedures [7]. Mounce et al. detected abnormalities in long-term time-series data on flow and pressure, estimating and detecting the leakage using support vector regression [8]. Jung and Lansey used a Nonlinear Kalman Filter (NKF) to estimate the state of the system and detect leaks, in order to overcome the limitations of detection methods under consistent operating conditions [9]. Nam et al. aimed to detect leaks through a comprehensive monitoring and maintenance system, determine the optimal sensor configuration, and isolate the location of the leaks, improving the operation of existing systems through multivariate statistical analysis techniques for flow and pressure data [10]. Ahn et al. proposed a methodology for improving the overall leak detection rate by reducing the false alarm rate and average detection time through a hybrid SPC method that combines the WECO and CUSUM methods [11]. Lee and Yoo [12,13] evaluated the leakage detection performance of a deep learning model by applying an RNN-LSTM-based leakage recognition model to leakage accidents in South Korea, showing more than 90% accuracy at all the points except for the singular points. Wang et al. [14] proposed a deep learning framework applicable to DMA and leakage management at the DMA level through an LSTM-based model. Fang et al. detected single and multiple leakages with an accuracy of over 90% by applying a CNN model to pressure data [15]. Jung et al. [16] quantitatively evaluated the degree to which the performance of an ANN model for leakage detection varies according to the degree of uncertainty of the input data.

This study proposes a leakage detection method that can accurately identify both the volume and location of leakage after its occurrence. To simulate leakage scenarios, an emitter function was used to simulate this leakage randomly within the water distribution system. We obtained the measurement data on the leakage through a hydraulic analysis program, which allowed us to accurately determine the existence or absence of leakage and its volume and location using a deep-learning-based model. The performance of the leakage detection model was evaluated through the benchmark network. Based on the results obtained from the benchmark network, the leakage detection model was applied to a domestic medium block network to evaluate its applicability to actual networks.

2. Configuration of Simulated Leakage Accident Scenario and Detection Technique

The methodology for performing the leakage detection, taking into account a simulated leakage accident scenario, is shown in Figure 1. The training data were constructed by combining the normal data with random accident data to configure the simulated leakage accident scenario. The WDS for the simulated leakage accident scenario was applied to the benchmark network and a real-world network to perform the leakage detection.

Figure 1 shows the flowchart for the study of the leakage volume and location determination using a leakage detection model. A hydraulic analysis program was used to apply the simulated leakage accident scenario. The maximum leakage volume was applied to the demand of each node in the water distribution system and a hyperparameter tuning of the model was performed to train it. The results of using all the measurement data to configure the pressure data for the model training were analyzed in comparison with the results of performing leakage detection using a minimum number of meters.

2.1. Configuration of Simulated Leakage Accident Scenario

In order to build the data for the leakage detection, the hydraulic analysis results according to each time and system condition are essential. The normal condition data for the leak simulation and the accident data with the leakage were analyzed using EPANET 2.2 [17]. Using the hydraulic analysis model of the water distribution system that was calibrated, the effect of the location and size of the pipe breakage accident that the operation manager wants to check can be simulated [1]. The emitter of the EPANET 2.2 program was used to simulate random leakage. The emitter is a function that discharges the flowrate arbitrarily, and in this study, the leakage is simulated arbitrarily and used for a quantitative evaluation according to the leakage volume. The leakage volume at the node, according to the emitter coefficient, is expressed as follows:

q = C p^{γ}

(1)

where q is the leakage volume, p is the pressure, C is the emitter coefficient, and γ is the emitter exponent (0.5).

In the method for simulating the leakage using an emitter coefficient, the emitter was applied to all the nodes to generate random leakages. Approximately one third of the node’s basic demand was assumed as the maximum leakage volume. For the γ in Equation (1), a value of 0.5, which is commonly used, was applied [18]. K-water [2] determined a burst leakage of 0.25 m³/h or more at pressures above 50 m. For the leakage simulation, this study adopted a pressure-driven analysis (PDA) to perform the hydraulic analysis using the EPANET 2.2 program. The commonly used demand-driven analysis (DDA) can result in unrealistic hydraulic analysis problems, such as negative pressure, when interpreting an abnormal network situation such as pipe destruction and fire [19].
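As a rough illustration of Equation (1), the following Python sketch evaluates the emitter leakage for a given pressure and inverts the relation to size an emitter coefficient from a target leakage. The function names, the zero-outflow rule for non-positive pressure, and the reading of "one third of the basic demand" as a cap on the leakage are assumptions made for illustration; they are not taken from EPANET or from the implementation used in this study. Units follow whatever flow and pressure units the hydraulic model uses.

```python
def emitter_leakage(pressure, coefficient, exponent=0.5):
    """Leakage q = C * p**gamma, as in Equation (1).

    Assumption of this sketch: no outflow when the pressure is not positive.
    """
    if pressure <= 0.0:
        return 0.0
    return coefficient * pressure ** exponent


def emitter_for_target_leak(target_leak, pressure, exponent=0.5):
    """Invert Equation (1) to get the coefficient: C = q / p**gamma."""
    if pressure <= 0.0:
        raise ValueError("pressure must be positive to size an emitter")
    return target_leak / pressure ** exponent


def max_emitter_coefficient(base_demand, pressure, exponent=0.5):
    """Coefficient whose leakage equals one third of the node's base demand."""
    return emitter_for_target_leak(base_demand / 3.0, pressure, exponent)


if __name__ == "__main__":
    # Hypothetical node: base demand 7.5 (flow units), pressure 100 m.
    c_max = max_emitter_coefficient(7.5, 100.0)
    print(c_max, emitter_leakage(100.0, c_max))
```

Because γ = 0.5, the leakage grows with the square root of the pressure, so the same emitter coefficient produces different leakage volumes at nodes with different pressures.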

2.2. Leakage Detection Model: Deep Neural Network

A leakage detection model was constructed using a deep neural network (DNN) [20]. Figure 2 shows a schematic of this DNN. The DNN is suitable for nonlinear data predictions and its performance can be controlled by adjusting the numbers of hidden layers and neurons. However, the DNN has a high likelihood of overfitting. This problem is directly linked to the performance of the model, and a parameter adjustment and selection of the activation function are required to prevent this overfitting [21]. For the activation function, the rectified linear unit (ReLU) was applied, which is expressed as follows:

f (x) = \max (0, x)

(2)

The ReLU converges faster than other activation functions, such as Sigmoid and Tanh, and is efficient, owing to its simple operations. However, when the input value is negative, a dying ReLU phenomenon may occur, in which the slope is 0 and the weights are not updated. In addition, gradient exploding or vanishing may occur during training, in which the differentiated gradients become excessively large or disappear. He Initialization, a weight initialization method suitable for the ReLU function, was applied to correct this. Applying weight initialization, which can correct the dying ReLU phenomenon, also makes it easier to prevent overfitting. For the convergence of the model, adaptive moment estimation (Adam) was used as the optimizer [22]. Equations (3)–(6) for Adam are as follows:

m_{t} = β_{1} m_{t - 1} + (1 - β_{1}) \nabla f (x_{t - 1})

(3)

g_{t} = β_{2} g_{t - 1} + (1 - β_{2}) {(\nabla f (x_{t - 1}))}^{2}

(4)

{\hat{m}}_{t} = \frac{m_{t}}{1 - β_{1}^{t}}, {\hat{g}}_{t} = \frac{g_{t}}{1 - β_{2}^{t}},

(5)

x_{t} = x_{t - 1} - \frac{η}{\sqrt{{\hat{g}}_{t} + ϵ}} \cdot {\hat{m}}_{t}

(6)

where β_{1} is the momentum exponential moving average (EMA) \approx 0.9; β_{2} is the RMSProp exponential moving average (EMA) \approx 0.999; {\hat{m}}_{t} and {\hat{g}}_{t} are correction values for preventing m_{t} and g_{t} from becoming zero during the early training; ϵ is a small value for preventing the denominator from becoming zero \approx 10⁻⁸; and η is the learning rate \approx 0.1–0.0001.
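The Adam update in Equations (3)–(6) can be sketched for a single scalar parameter as follows. This is a minimal illustration rather than the training code of this study: the function name `adam_minimize`, the quadratic test objective, and the step count are assumptions. The sketch keeps ϵ inside the square root, as Equation (6) is written here, whereas many library implementations add ϵ outside the square root.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a scalar function given its gradient, using Adam."""
    x = x0
    m = 0.0  # EMA of the gradient (first moment)
    g = 0.0  # EMA of the squared gradient (second moment)
    for t in range(1, steps + 1):
        grad_t = grad(x)
        m = beta1 * m + (1.0 - beta1) * grad_t        # Equation (3)
        g = beta2 * g + (1.0 - beta2) * grad_t ** 2   # Equation (4)
        m_hat = m / (1.0 - beta1 ** t)                # Equation (5)
        g_hat = g / (1.0 - beta2 ** t)
        x = x - lr / math.sqrt(g_hat + eps) * m_hat   # Equation (6)
    return x


if __name__ == "__main__":
    # Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
    print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0, lr=0.1))
```

The bias correction in Equation (5) matters mostly in the first iterations, when m and g are still close to their zero initial values.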

Mean absolute error (MAE) was used for the error function to evaluate the model’s error. The MAE is expressed as Equation (7). Additionally, root mean squared error (RMSE) was used along with MAE for a reliable discussion of the model’s performance. The RMSE is expressed as Equation (8). MinMaxScaler normalization was applied for the configured normal and accident data, which is expressed as Equation (9).

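MAE = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - t_{i}|

(7)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - t_{i})}^{2}}

(8)

x_{i}^{'} = \frac{x_{i} - \min (x)}{\max (x) - \min (x)}

(9)

Here, y_{i} and t_{i} denote the predicted and target values, respectively, N is the number of samples, and x_{i} is a value of the data series x being normalized.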
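The error measures and the normalization described above can be written compactly in plain Python. The sketch below is illustrative only: the function names are hypothetical, and returning zeros for a constant series in the min-max scaling is an assumption of this sketch rather than something stated in the study.

```python
import math


def mean_absolute_error(predicted, target):
    """Average absolute difference between predictions and targets."""
    return sum(abs(y - t) for y, t in zip(predicted, target)) / len(target)


def root_mean_squared_error(predicted, target):
    """Square root of the average squared difference."""
    squared = sum((y - t) ** 2 for y, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


def min_max_scale(values):
    """Rescale values linearly so that min -> 0 and max -> 1."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant series carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(min_max_scale([10.0, 20.0, 30.0]))
```

RMSE penalizes large individual errors more strongly than MAE, which is one reason for reporting both.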

To summarize, the DNN used as the leakage detection model applied ReLU as the activation function, MAE as the error function, and Adam as the optimizer. The input layer consisted of 200 neurons and fed four hidden layers, each of which also consisted of 200 neurons; the output layer had 2 neurons. A learning rate of 0.001 was applied in this study. Pressure data were used as the learning data applied to the input layer, and the learning data varied depending on the applied emitter coefficient. The hydraulic analysis results, which varied depending on the emitter coefficient and the location of occurrence, could indicate the volume and location of the leakage. After passing through the hidden layers, the learning data were mapped to the leakage location and leakage volume by the two neurons in the output layer.
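To make the layer layout concrete, the following dependency-free Python sketch builds a network with that shape, initializes it with He initialization, and runs one forward pass with ReLU. It is not the model of this study: it assumes that the pressure features feed a 200-neuron input layer followed by four 200-neuron hidden layers, that the two outputs are linear, and that biases start at zero. All function names are hypothetical, and training (Adam with an MAE loss) is omitted.

```python
import math
import random


def he_init_layer(fan_in, fan_out, rng):
    """He initialization: weights ~ N(0, 2 / fan_in), biases set to zero."""
    std = math.sqrt(2.0 / fan_in)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, [0.0] * fan_out


def build_network(n_features, width=200, hidden_layers=4, n_outputs=2, seed=0):
    """Layer sizes: features -> 200 (input layer) -> 4 x 200 -> 2 outputs."""
    rng = random.Random(seed)
    sizes = [n_features, width] + [width] * hidden_layers + [n_outputs]
    return [he_init_layer(sizes[i], sizes[i + 1], rng)
            for i in range(len(sizes) - 1)]


def forward(network, features):
    """ReLU on every layer except the last, which is left linear."""
    activations = list(features)
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        activations = pre if index == last else [max(0.0, v) for v in pre]
    return activations


if __name__ == "__main__":
    # Hypothetical input: 67 min-max-scaled nodal pressures.
    net = build_network(n_features=67)
    print(forward(net, [0.5] * 67))  # two values: location and volume outputs
```

In practice a deep learning framework would be used for this; the sketch only shows how the stated layer sizes, activation, and initialization fit together.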

2.3. Performance Indices (PI)

The performance of the leakage detection model is directly linked to the detection of the leakage and a reduction in the ratio of false alarms [23]. A multi-classification model was selected as the evaluation method for assessing the leakage detection and false alarm performance of the model. It is based on four quantities: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). These values were used to calculate the Precision and Recall, and the performance of the model was evaluated using the F1 Score. Equations (10)–(12) are as follows:

Precision = \frac{TP}{TP + FP}

(10)

Recall = \frac{TP}{TP + FN}

(11)

F1 Score = \frac{2 \times Recall \times Precision}{Recall + Precision}

(12)

The F1 Score represents the harmonic mean of the recall and precision. Figure 3 is an example of this F1 Score. In this study, performance was evaluated with randomly selected data, so the leakage volumes and locations were imbalanced and did not reflect a fixed number of leakage simulation scenarios. In this case, the harmonic mean evaluates the performance of the model in a more balanced way than the arithmetic mean. Thus, it is more appropriate for assessing the performance of the developed models.
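A short Python sketch of Equations (10)–(12) shows why the harmonic mean is the more conservative summary. The function name and the counts in the example are hypothetical, and returning 0 when a denominator is zero is a convention assumed for this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 Score from counts (Equations (10)-(12))."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * recall * precision / (recall + precision)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical imbalanced case: few alarms, most leaks missed.
    precision, recall, f1 = precision_recall_f1(tp=9, fp=1, fn=81)
    print(precision, recall, f1, (precision + recall) / 2.0)
```

With a precision of 0.9 and a recall of 0.1, the arithmetic mean is 0.5, whereas the F1 Score is only 0.18, so a model that misses most leakages cannot hide behind a high precision.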

3. Application and Results

The model was applied to both a benchmark network and a real-world network, and the results were analyzed in detail. For the benchmark network, the performance of the leakage detection model was evaluated according to the leakage scenario. The leakage detection result from the pressure data obtained from all the nodes was analyzed by comparing it with the leakage detection result from the data of 10 nodes, with a high standard deviation of the pressure.

Step 1-1. Perform the leakage detection using the pressure data from all the nodes and evaluate the result.

Step 1-2. Perform the leakage detection using the pressure data from 10 nodes with a high standard deviation of pressure and evaluate the result.
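The node selection in Step 1-2 can be sketched as a ranking of nodes by the standard deviation of their simulated pressures. The sketch below is an assumption-laden illustration: the function name, the dictionary layout, and the use of the population standard deviation are choices made here, since the study does not state which estimator was used.

```python
import statistics


def select_high_variability_nodes(pressure_by_node, k):
    """Return the k node IDs whose pressures vary most across scenarios.

    pressure_by_node maps a node ID to the list of pressures simulated at
    that node over all scenarios.
    """
    ranked = sorted(
        pressure_by_node,
        key=lambda node: statistics.pstdev(pressure_by_node[node]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical pressures (m) at three nodes over three scenarios.
    pressures = {
        "J1": [50.0, 50.0, 50.0],
        "J2": [40.0, 50.0, 60.0],
        "J3": [49.0, 50.0, 51.0],
    }
    print(select_high_variability_nodes(pressures, k=2))
```

Nodes whose pressure barely changes between the normal and leakage scenarios contribute little information to the detector, which is the motivation for ranking by variability.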

In the case of the real-world network, location-based clustering of the target network was carried out to overcome a limitation of the conventional model, which predicts the node number and therefore becomes less accurate for networks larger than a medium block. Then, the results were compared in two different cases.

Step 2-1. Determine the optimal number of clusters in the real-world network.

Step 2-2. Determine the minimum number of meters in each cluster and their locations.

The real-world network was divided by the optimal number of clusters and then the minimum number of meters and the location of the meters in each cluster were determined. Subsequently, the leakage detection was performed based on the hydraulic analysis data and the performance was evaluated.

3.1. Austin Network

A simulated leakage scenario was generated for the Austin network [24], which was modified to assess the performance of the leakage detection model, and then the leakages were detected and evaluated. The corresponding networks were all calibrated; the morphology of the Austin network is shown in Figure 4. The Austin network consisted of 67 nodes, one reservoir, 90 pipelines, and seven pumps. The average pressure used for the training data in the network was 124.9 m, the maximum pressure was 135.6 m, the minimum pressure was 108.5 m, and the standard deviation was 5.05. Using the pressure data from all the nodes, including 45 random leakage scenarios and 55 normal scenarios for the modified Austin network, the leakage occurrence, volume, and location were determined simultaneously. The leakage volume and location were determined simultaneously to evaluate the performance of the model using the pressure data from nine nodes with large standard deviations of pressure.

Performance Evaluation According to the Austin Network Leakage Scenario

Figure 5 shows the positions of the nine nodes with large standard deviations of pressure in the Austin network. The detection effectiveness is related to how well burst events are detected and how well false alarms caused by natural random patterns are avoided [23]. Detection probability (DP) refers to the percentage of detected leakages (N_{d}) in the total number of leakages that occurred (N_{tl}). DP is expressed as Equation (13).

Detection Probability (DP) = \frac{N_{d}}{N_{tl}} \times 100

(13)
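Equation (13) amounts to a single ratio, shown here as a small Python helper. The function name and the counts in the example are hypothetical and do not reproduce the results of this study.

```python
def detection_probability(detected, total_leakages):
    """DP in percent: detected leakages over all leakages that occurred."""
    if total_leakages <= 0:
        raise ValueError("total_leakages must be positive")
    return 100.0 * detected / total_leakages


if __name__ == "__main__":
    # Hypothetical counts: 9 of 10 simulated leakages were detected.
    print(detection_probability(9, 10))
```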

Table 1 presents the results of the leakage detection for 100 scenarios, including 55 normal scenarios and 45 leakage scenarios, using the pressure data from all the nodes in the Austin network. Regarding the existence or absence of leakage, a false alarm occurred in 1 out of 100 scenarios and 99 scenarios accurately determined the existence and absence of leakage. Furthermore, the ratio of accurate detections of the leakage volume (emitter) among the 45 leakage scenarios was 0.24, and the ratio of accurate detections of the leakage locations was 0.84. The emitter error in the leakage volume was approximately 1.49 and the error ratio of the leaking location, which indicates the number of the node, showed an error of approximately 0.27.

Table 2 lists the results of the leakage detection conducted with nine nodes with large standard deviations of pressure in the Austin network and the pressure data obtained from the respective locations. Regarding the existence or absence of leakage, we were able to detect the leakage accurately in 99 scenarios, except for 1 of the 100 scenarios, the same as the results for performing the leakage detection using all the nodes. However, the ratio of accurate detections of the leakage volume out of the 45 leaking scenarios was 0.22 and the ratio of accurate detections of the leakage locations was 0.71, which showed a lower performance than the results of the leakage detection using the pressure data of all the nodes.

Table 3 lists the errors in the leakage volume (emitter) and location detections. When all the meters were used, the emitter error was approximately 1.49 and the number error of the node for the position was 0.26. The leakage detection with nine meters showed an emitter error of 1.82 and the location number of nodes showed an error of 0.58. The larger the number of nodes in the network, the more likely it was that even a small error would show different numbers of nodes, owing to the nature of the leakage detection model that passed through the MinMaxScaler. If this problem was solved, the accuracy of leakage detection would improve. Furthermore, when the emitter error was predicted through the average pressure in the network, the emitter error of 1.49 represented a leakage volume error of approximately 6.38 m³/h, and the emitter error of 1.82 represented a leakage volume error of 7.8 m³/h. However, the pressure meter installation position in the nodes may be biased, because the current pressure meter positions were selected only by considering the standard deviation of the pressure in the node. It is expected that, when this problem is solved, a more accurate quantitative assessment of the leakage volume will be possible. To solve these two problems, location-based clustering was carried out to target a large real network and a meter was installed within the cluster to prevent bias in the meter installation. Furthermore, it was assumed that the cluster was detected after the cluster execution to prevent the model’s performance from declining as the number of nodes increased.

3.2. Real World Network

The leakage detection model was applied to a real medium block network in South Korea to evaluate its leakage detection performance. The target network is shown in Figure 6. P-City had 587 nodes, 648 pipelines, and a single water source. The corresponding networks were all calibrated, the average pressure used as the training data in the network was 49.89 m, the maximum pressure was 71.27 m, the minimum pressure was 31.20 m, and the standard deviation was 6.65. For P-City, we applied the emitter for which a leakage volume was assumed for all the nodes to generate leakage scenarios and evaluated the performance of the leakage detection model through approximately 500 leakage scenarios.

3.2.1. Network Clustering

To perform leakage detection on a real network, clustering was performed using the coordinates of a large network. K-means clustering based on the positions (X and Y coordinates) was performed. K-means clustering is an unsupervised machine learning technique that classifies groups based on the similar characteristics of each object.

This technique must be provided with K, the pre-defined number of clusters, and the choice of the seeds and centers of the clusters tends to have a significant influence on the formation of the clusters, depending on the data type [25]. To address this, we determined the optimal parameter K and performed the clustering by comparing the sum of squared errors (SSE), which means the sum of squares of the distances between the nodes, according to K. Figure 7 shows an example of K-means clustering. In Figure 7, the black dots denote scattered data, and the red and green dots present the data divided into two clusters using the K-means clustering method.

3.2.2. Determination of the Optimal Number of Clusters in a Real-World Network

The selection of K, which means the number of clusters, is important for the proper application of K-means clustering. We compared the SSE, which denotes the sum of squares of the distances between the nodes, to select K. The lower the SSE, the greater the density could be in the post-clustered cluster. Figure 8 shows a comparison of the SSE according to K. Figure 8 shows a sharp decrease in the SSE as the number of clusters increased to approximately 9, but the SSE decreased relatively slowly after 10 clusters. Based on the results of Figure 8, P-City was divided into nine clusters. The result of adopting K as nine for P-City’s K-means clustering performance is shown in Figure 7. The emitter was also applied to the demand of the node within the network, with an interval of 0.05 to 0.45. Figure 9 shows the shape of P-City where the clustering was carried out.
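The SSE comparison used to choose K can be sketched with a small K-means implementation on node coordinates. This is an illustration under stated assumptions, not the procedure of this study: the SSE is taken as the sum of squared distances from each node to its cluster center, the initial centers are chosen by a deterministic farthest-point rule (a substitute for random seeding), and the coordinates in the example are hypothetical.

```python
def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic seeding: each new center is the point farthest from
    the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers


def assign(points, centers):
    return [min(range(len(centers)),
                key=lambda c: squared_distance(p, centers[c]))
            for p in points]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on (x, y) coordinates; returns centers and labels."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab])
               for p, lab in zip(points, labels))


def sse_by_k(points, k_values):
    return {k: sse(points, *kmeans(points, k)) for k in k_values}


if __name__ == "__main__":
    # Hypothetical node coordinates forming two compact groups.
    nodes = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
    print(sse_by_k(nodes, [1, 2, 3]))
```

Plotting the SSE against K and looking for the point where the curve flattens is the usual "elbow" reading of such a comparison; in the example, the drop from K = 1 to K = 2 is far larger than the drop from K = 2 to K = 3.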

3.2.3. Leakage Detection after Clustering of the Real-World Network

After the clustering, the leakage simulation was performed using the developed model in P-City and the leakage detection was realized. The performance of the leakage detection model was evaluated based on accident data, which were obtained by randomly causing leakage accidents according to 529 scenarios with random leakage volumes (emitter) and leakage locations. The values were evaluated by an independent multi-classification model for independent performance assessments of the leakage volumes and locations. Table 4 presents the evaluation of the leakage volume detection using a multi-classification model when the leakage detection was performed using all the pressure data. Table 5 presents the evaluation using a multi-classification model for the leakage location detection using all the pressure data. Table 6 and Table 7 present the multi-classification of the leakage volume and location detection results as a confusion matrix.

In Table 4, the leakage level with the highest F1 Score, which indicates the efficiency of the leakage volume detection, is Lev. 1. Lev. 1 denotes an emitter of 0.05, which means the lowest leakage volume. The cluster that showed the highest performance in the leakage location detection was Clust. 1, which had 11 nodes. The cluster that showed the lowest performance was Clust. 5, which had 129 nodes, with an F1 Score of 0.821. The results in Table 5 generally showed higher performances when the number of nodes in the cluster was smaller. The macro-averaged precision, recall, and F1 score of the leakage volume detection were 0.930, 0.923, and 0.926, respectively. The macro-averaged precision, recall, and F1 score of the leakage location detection were 0.903, 0.931, and 0.911, respectively. In conclusion, the leakage volume and location detection using the pressure data of all the nodes achieved F1 scores of 0.926 and 0.911, respectively. Table 6 and Table 7 are confusion matrices based on the data in Table 4 and Table 5. However, installing a meter at every node and obtaining its data is highly difficult in a real network. Thus, the installation locations and the number of meters needed for the minimum detection performance must be identified. To solve this problem, two to seven pressure gauges were installed sequentially at nodes with large standard deviations of pressure, starting from two, the minimum number that can have a correlation, and the accuracy was compared.
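Macro-averaged scores of this kind can be derived from a confusion matrix as sketched below. The sketch assumes that rows are true classes and columns are predicted classes, and that the macro F1 score is the plain average of the per-class F1 scores; the function name and the example matrix are hypothetical and unrelated to Tables 4–7.

```python
def macro_scores(confusion):
    """Macro-averaged precision, recall, and F1 from a confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls, f1_scores = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # column minus TP
        fn = sum(confusion[c]) - tp                       # row minus TP
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        total = precision + recall
        f1_scores.append(2.0 * precision * recall / total if total else 0.0)
        precisions.append(precision)
        recalls.append(recall)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    print(macro_scores([[5, 1], [2, 4]]))
```

Because every class contributes equally to a macro average, a small cluster or a rare leakage level weighs as much as a large one, which suits imbalanced scenario sets.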

3.2.4. Determination of the Optimal Number of Meters in Each Cluster in the Real-World Network

Table 8 lists the leakage detection performance for each number of meters based on the F1 scores. An F1 score of 0.8 was assumed to be the minimum accuracy, and the minimum number of meters satisfying it was selected for each cluster. However, Clust. 5 and Clust. 6 failed to reach the assumed minimum performance. Therefore, for these clusters, we selected the number of meters that showed the highest performance. The total number of meters determined in this way was 30, and the leakage detection was carried out using the data from these meters. Table 9 and Table 10 list the results of evaluating the leakage detection with a multi-classification model, using the pressure data from the 30 meters, for the leakage volumes and locations, respectively. In P-City, we installed meters on a total of 30 nodes, corresponding to approximately 5.1% of the 587 nodes, to build the normal and accident data, and conducted the detection using the leakage detection model. In Table 9, the lowest detection performance (lowest F1 score) was shown for Lev. 4, and the highest detection performance was shown for Lev. 1, which denotes the smallest leakage volume. The detection performance for the total leakage volume achieved an F1 score of 0.924, with a precision and recall of 0.925 and 0.923, respectively. In Table 10, the cluster with the highest detection performance was Clust. 9 with 34 nodes, and the cluster with the lowest detection performance was Clust. 6 with 31 nodes. The detection performance over all the leakage locations achieved an F1 score of 0.858, with a precision and recall of 0.864 and 0.857, respectively. Thus, pressure meters on only 5.1% of all the nodes could quantify the leakage volume and also detect the location successfully when a leakage occurred.
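The per-cluster selection rule described above (the fewest meters reaching an F1 score of 0.8, otherwise the best-performing count) can be sketched as follows. The function name, the tie-breaking rule in the fallback, and the F1 values in the example are hypothetical; they do not reproduce Table 8.

```python
def choose_meter_count(f1_by_meter_count, threshold=0.8):
    """Pick the number of meters for one cluster.

    f1_by_meter_count maps a candidate number of meters to the F1 score
    obtained with that many meters.
    """
    passing = [n for n, f1 in f1_by_meter_count.items() if f1 >= threshold]
    if passing:
        return min(passing)  # fewest meters that meet the minimum accuracy
    # Fallback: best F1 score; ties go to the smaller number of meters.
    return max(f1_by_meter_count,
               key=lambda n: (f1_by_meter_count[n], -n))


if __name__ == "__main__":
    # Hypothetical F1 scores for two clusters with 2 to 4 meters.
    print(choose_meter_count({2: 0.70, 3: 0.82, 4: 0.85}))
    print(choose_meter_count({2: 0.60, 3: 0.75, 4: 0.72}))
```

Summing the selected counts over all clusters gives the total number of meters for the network.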

Table 11 and Table 12 are confusion matrices based on the data in Table 9 and Table 10. Table 13 presents a performance comparison of the model by the number of pressure meters, using the F1 scores, MAE, and RMSE. Between the leakage volume detection using all the pressure meters and that using the 30 pressure meters, the precision differed by about 0.005, the recall by 0.001, the F1 Score by 0.002, the MAE by about 0.001, and the RMSE by 0.04. For the leakage location detection, the precision differed by about 0.039, the recall by about 0.074, the F1 Score by 0.053, the MAE by about 0.031, and the RMSE by 0.88. In conclusion, all the indicators confirmed that, as the number of pressure meters used for the leakage detection decreased, the detection performance decreased for the leakage location rather than for the leakage volume. The RMSE increased by about two times compared to the case with all the meters, but the currently determined number of pressure meters could still satisfy the minimum F1 Score of 0.8 for the leakage detection. When the same model is used with a similar number of pressure meters in other areas, such as a DMA (District Metered Area) or PMA (Pressure Management Area), a performance similar to the current results can be expected.

4. Conclusions

This study proposed a method for configuring simulated leakage accident scenarios for leakage detection and a technique for using the configured data to detect leakage. The leakage detection was performed for the Austin network to evaluate the performance of the leakage detection model. When the leakage detection was performed using the pressure data of all the nodes, the DP, which refers to the percentage of detected leakages among all the leakages, was 99%, with a 1% probability of false alarms. The DP was also 99% when the leakage detection was performed using the pressure data of nine nodes with large standard deviations of pressure. When the pressure data of all the nodes were used, the leakage volume (emitter) was accurately detected in 11 of the 45 leakage scenarios. When nine nodes were used for the detection, the leakage volume and location were accurately detected in 10 and 32 scenarios, respectively, out of the 45 scenarios. However, regarding the leakage location, the errors in the node number were 0.26 and 0.58 for all the nodes and for nine nodes, respectively. Thus, it is expected that hyperparameter optimization of the leakage detection model will yield a higher performance.

To apply the leakage detection model to a real network, which is relatively large, location-based clustering was performed using the coordinates of the network nodes. The number of clusters was selected by comparing the sum of squared errors (SSE) of the distances for different numbers of clusters (Figure 8). Meters were then installed on a total of 30 nodes, corresponding to approximately 5.1% of all the nodes in the clustered real network, and normal and accident data were built from them for the leakage detection model. In this case, the leakage location could be detected with a precision, recall, MAE, RMSE, and F1 score of 0.864, 0.857, 0.048, 0.219, and 0.858, respectively (Table 13). A quantitative assessment of the leakage volume of the detected leakages was possible, and the number and optimal installation locations of the meters could also be selected for the leakage detection, taking the simulated leakage accident scenarios into account. With the developed model, it therefore seems possible to detect the leakage location together with the leakage volume using real-time data obtained from the network.
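The cluster-count selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `nodes` coordinates are synthetic, the function name `kmeans` is hypothetical, and the deterministic farthest-point initialisation is an assumption made only so that the example is reproducible. It runs plain K-means on node coordinates and records the SSE for each candidate K, which is the quantity plotted in Figure 8.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on 2-D coordinates; returns (centroids, labels, sse)."""
    # Farthest-point initialisation (illustrative assumption, keeps the run deterministic).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each node joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # SSE: sum of squared distances between each node and its cluster centroid.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse

# Synthetic node coordinates forming three well-separated groups (not P-City data).
nodes = [(0, 0), (1, 0), (0, 1), (1, 1),
         (10, 0), (11, 0), (10, 1), (11, 1),
         (5, 9), (6, 9), (5, 10), (6, 10)]

sse_by_k = {k: kmeans(nodes, k)[2] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(f"K={k}: SSE={sse:.2f}")
```

On this toy layout the SSE drops sharply up to K = 3 and only marginally afterwards, which is the kind of elbow one would look for when choosing the number of clusters.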

The aforementioned RNN-LSTM-based leakage recognition model showed an accuracy of more than 90% at all points except for singular points, and the deep learning framework applied to a DMA reached 85.71% in detecting simulated leakages. A CNN model using pressure data detected single and multiple leakages with an accuracy of over 90%. In the other literature, the accuracy was generally more than 85% when only leakage recognition and location were addressed with a deep learning framework. A clear advantage of this study is that it achieved F1 scores of 0.858 for the leakage location and 0.924 for the leakage volume while detecting both simultaneously. In addition, because the leakage volume is evaluated quantitatively, the results are expected to serve as basic data in various fields, such as setting priorities for leakage recovery and restoration.

Currently, the real medium block network is divided into nine clusters with an average of about 3.3 pressure meters per cluster (30 meters in total). This configuration shows high F1 scores and allows the volume and location of leakages to be explored simultaneously. It suggests that, when leak detection is performed after acquiring data from pressure meters installed in areas such as a DMA (District Metered Area) or PMA (Pressure Management Area), a performance similar to the current results can be achieved. The leakage detection methodology proposed in this study differs from the other literature in that it detects the location of the leakage with a certain number of pressure meters and uses the amount of leakage as an index of the quantitative scale of the leakage accident, so it can serve as basic data in various fields, such as leakage recovery and restoration that take this scale into account. However, the pressure data currently used for the leakage detection come from leakage scenarios generated with a set amount of demand and set physical factors, so these data carry uncertainty, and various factors, such as errors, can affect the performance. In addition, the performance could be compared with that of machine learning techniques other than the currently used DNN model (e.g., random forest regression, model trees, and support vector machines), which is left for future studies.

Author Contributions

Conceptualization—Y.H.C. and R.K.; data curation—R.K.; methodology—Y.H.C. and R.K.; supervision—Y.H.C.; writing—original draft, R.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) (NRF-2021R1G1A1003295).

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government.

Conflicts of Interest

The authors declare no conflict of interest.

References

Jun, S.; Choi, Y.H. Data Generation Approaches to Detect Abnormal Conditions in Water Distribution Systems. J. Korean Soc. Hazard Mitig. 2022, 22, 69–79.
K-water. Water Facilities Construction Cost Estimation Report; Korea Water Resources Association: Seoul, Republic of Korea, 2010.
Park, S.; Kim, K.; Seo, J.; Kim, J.; Koo, J. The Leak Signal Characteristics and Estimation of the Leak Location on Water Pipeline. J. Korean Soc. Water Wastewater 2018, 32, 461–470.
Stańczyk, J.; Burszta-Adamiak, E. Development of Methods for Diagnosing the Operating Conditions of Water Supply Networks over the Last Two Decades. Water 2022, 14, 786.
Jun, S.; Jung, D.; Lansey, K.E. Comparison of Imputation Methods for End-User Demands in Water Distribution Systems. J. Water Resour. Plan. Manag. 2021, 147, 04021080.
Karadirek, I.E.; Kara, S.; Yilmaz, G.; Muhammetoglu, A.; Muhammetoglu, H. Implementation of hydraulic modelling for water-loss reduction through pressure management. Water Resour. Manag. 2012, 26, 2555–2568.
Min, K.W.; Kim, T.; Lee, S.; Choi, Y.H.; Kim, J.H. Detecting and Localizing Leakages in Water Distribution Systems Using a Two-Phase Model. J. Water Resour. Plan. Manag. 2022, 148, 04022051.
Mounce, S.; Mounce, R.; Boxall, J. Novelty Detection for Time Series Data Analysis in Water Distribution Systems Using Support Vector Machines. J. Hydroinformatics 2010, 13, 672–686.
Jung, D.; Lansey, K. Water Distribution System Burst Detection Using a Nonlinear Kalman Filter. J. Water Resour. Plan. Manag. 2015, 141, 04014070.
Nam, K.; Ifaei, P.; Heo, S.; Rhee, G.; Lee, S.; Yoo, C. An efficient burst detection and isolation monitoring system for water distribution networks using multivariate statistical techniques. Sustainability 2019, 11, 2970.
Ahn, J.; Jung, D. Hybrid statistical process control method for water distribution pipe burst detection. J. Water Resour. Plan. Manag. 2019, 145, 06019008.
Lee, C.W.; Yoo, D.G. Development of Leakage Detection Model in Water Distribution Networks Applying LSTM-based Deep Learning Algorithm. J. Korea Water Resour. Assoc. 2021, 54, 599–606.
Lee, C.-W.; Yoo, D.-G. Development of leakage detection model and its application for water distribution networks using RNN-LSTM. Sustainability 2021, 13, 9262.
Wang, X.; Guo, G.; Liu, S.; Wu, Y.; Xu, X.; Smith, K. Burst detection in district metering areas using deep learning method. J. Water Resour. Plan. Manag. 2020, 146, 04020031.
Fang, Q.; Zhang, J.; Xie, C.; Yang, Y. Detection of multiple leakage points in water distribution networks based on convolutional neural networks. Water Supply 2019, 19, 2231–2239.
Jung, H.; Jung, D.; Jun, S. Comparison of ANN model’s prediction performance according to the level of data uncertainty in water distribution network. J. Korea Water Resour. Assoc. 2022, 55, 1295–1303.
Rossman, L.A. EPANET 2: Users Manual; National Risk Management Research Laboratory: Cincinnati, OH, USA, 2000.
Yoo, D.G.; Yoon, J.S.; Lee, H.M.; Kang, D.; Kim, J.H. Development and Application of Leakage Detection Model for Water Distribution Networks considering Uncertainty Analysis. J. Korean Soc. Hazard Mitig. 2014, 14, 177–185.
Gupta, R.; Bhave, P.R. Comparison of Methods for Predicting Deficient-Network Performance. J. Water Resour. Plan. Manag. 1996, 122, 214–217.
Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117.
Park, D.C.; Choi, Y.H. Development of real-time defect detection technology for water distribution and sewerage networks. J. Korea Water Resour. Assoc. 2022, 55, 1261–1270.
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
Hagos, M.; Jung, D.; Lansey, K. Optimal meter placement for pipe burst detection in water distribution systems. J. Hydroinformatics 2016, 18, 741–756.
Brion, L.M.; Mays, L.W. Methodology for optimal operation of pumping stations in water distribution systems. J. Hydraul. Eng. 1991, 117, 1551–1569.
Bae, W.S.; Roh, S.W. A Study on K-means Clustering. Commun. Stat. Appl. Methods 2005, 12, 497–508.

Figure 1. Flowchart of leakage detection based on deep learning model.

Figure 2. Configuration of the deep neural network model.

Figure 3. Configuration of precision, recall, and F1 score.

Figure 4. Layout of the Austin network.

Figure 5. The nine nodes of the Austin network with the largest standard deviations of pressure.

Figure 6. Layout of P-City.

Figure 7. Example of K-means clustering.

Figure 8. Sum of squared errors (SSE) as a function of K for P-City.

Figure 9. Application of K-means clustering to P-City.

Table 1. Performance of leakage detection using all the pressure meters in the Austin network.

PI	Leakage Detection	Leakage Volume	Leakage Location
$N_{d}$	99	11	38
$N_{t l}$	100	45	45
Detection Probability (%)	99	24	84

Table 2. Performance of leakage detection using only 9 pressure meters in the Austin network.

PI	Leakage Detection	Leakage Volume	Leakage Location
$N_{d}$	99	10	32
$N_{t l}$	100	45	45
Detection Probability (%)	99	22	71
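As a small worked check of Tables 1 and 2, the detection probability can be read as the ratio of detected cases $N_{d}$ to total cases $N_{t l}$. The sketch below assumes exactly that ratio, expressed in percent; the function name `detection_probability` and the dictionary layout are illustrative only.

```python
def detection_probability(n_detected, n_total):
    """Detection probability in percent: detected cases over all simulated cases."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_detected / n_total

# (N_d, N_tl) pairs copied from Tables 1 and 2 (Austin network).
austin = {
    "all meters": {"detection": (99, 100), "volume": (11, 45), "location": (38, 45)},
    "9 meters": {"detection": (99, 100), "volume": (10, 45), "location": (32, 45)},
}

dp = {
    case: {task: round(detection_probability(*counts), 1) for task, counts in tasks.items()}
    for case, tasks in austin.items()
}

for case, tasks in dp.items():
    print(case, tasks)
```

Reducing the meters from all nodes to nine leaves the detection probability for leakage occurrence unchanged, while the share of exactly identified locations falls from about 84% to about 71%.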

Table 3. The error of leakage detection.

Number of Pres. Meters	Volume Error (Emitter)	Location Error (Number)
67 (all pres. meters)	1.49	0.26
9	1.82	0.58

Table 4. Comparison of the leakage volume detection using all the pres. meters in P-City.

PI	Lev. 1	Lev. 2	Lev. 3	Lev. 4	Lev. 5	Lev. 6	Lev. 7	Lev. 8	Lev. 9
TP	43	45	52	47	73	57	57	61	52
FP	0	1	2	5	16	5	4	6	3
FN	2	2	1	8	4	11	5	6	3
Precision	1	0.978	0.963	0.904	0.820	0.919	0.934	0.910	0.945
Recall	0.956	0.957	0.981	0.855	0.948	0.838	0.919	0.910	0.945
F1 Score	0.977	0.968	0.972	0.879	0.880	0.877	0.927	0.910	0.945

Table 5. Comparison of the leakage location detection using all the pres. meters in P-City.

PI	Clust. 1	Clust. 2	Clust. 3	Clust. 4	Clust. 5	Clust. 6	Clust. 7	Clust. 8	Clust. 9
TP	46	109	66	75	46	25	39	47	18
FP	0	1	2	5	16	5	4	6	3
FN	1	41	13	7	4	0	0	0	0
Precision	1.000	0.991	0.971	0.938	0.742	0.833	0.907	0.887	0.857
Recall	0.979	0.727	0.835	0.915	0.920	1.000	1.000	1.000	1.000
F1 Score	0.989	0.838	0.898	0.926	0.821	0.909	0.951	0.940	0.923

Table 6. Confusion matrix of leakage volume detection using all the pres. meters in P-City.

Actual \ Predicted	Lev. 1	Lev. 2	Lev. 3	Lev. 4	Lev. 5	Lev. 6	Lev. 7	Lev. 8	Lev. 9
Lev. 1	43	0	1	1	0	0	0	0	0
Lev. 2	0	45	0	1	1	0	0	0	0
Lev. 3	0	1	52	0	0	0	0	0	0
Lev. 4	0	0	1	47	5	2	0	0	0
Lev. 5	0	0	0	3	73	1	0	0	0
Lev. 6	0	0	0	0	10	57	1	0	0
Lev. 7	0	0	0	0	0	2	57	3	0
Lev. 8	0	0	0	0	0	0	3	61	3
Lev. 9	0	0	0	0	0	0	0	3	52
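The per-level counts in Table 4 can be recovered from the confusion matrix in Table 6. The sketch below assumes that rows are the actual level and columns the predicted level; this reading is an inference, chosen because it reproduces the TP, FP, and FN counts of Table 4. The helper name `per_class_scores` is hypothetical.

```python
def per_class_scores(matrix):
    """Per-class TP/FP/FN, precision, recall, and F1 from a square confusion matrix.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                         # rest of the row
        fp = sum(matrix[r][i] for r in range(n)) - tp    # rest of the column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append({"tp": tp, "fp": fp, "fn": fn,
                       "precision": precision, "recall": recall, "f1": f1})
    return scores

# Confusion matrix copied from Table 6 (leakage volume, all pressure meters).
table6 = [
    [43, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 45, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 52, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 47, 5, 2, 0, 0, 0],
    [0, 0, 0, 3, 73, 1, 0, 0, 0],
    [0, 0, 0, 0, 10, 57, 1, 0, 0],
    [0, 0, 0, 0, 0, 2, 57, 3, 0],
    [0, 0, 0, 0, 0, 0, 3, 61, 3],
    [0, 0, 0, 0, 0, 0, 0, 3, 52],
]

scores = per_class_scores(table6)
for level, s in enumerate(scores, start=1):
    print(f"Lev. {level}: TP={s['tp']} FP={s['fp']} FN={s['fn']} "
          f"P={s['precision']:.3f} R={s['recall']:.3f} F1={s['f1']:.3f}")
```

For Lev. 5, for example, this gives TP = 73, FP = 16, and FN = 4, matching the corresponding column of Table 4.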

Table 7. Confusion matrix of leakage location detection using all the pres. meters in P-City.

Actual \ Predicted	Clust. 1	Clust. 2	Clust. 3	Clust. 4	Clust. 5	Clust. 6	Clust. 7	Clust. 8	Clust. 9
Clust. 1	38	0	1	0	0	0	0	0	0
Clust. 2	2	109	37	0	0	0	2	0	0
Clust. 3	3	8	66	0	2	0	0	0	0
Clust. 4	1	0	3	75	0	0	1	2	0
Clust. 5	1	0	0	0	46	3	0	0	0
Clust. 6	0	0	0	0	0	25	0	0	0
Clust. 7	0	0	0	0	0	0	39	0	0
Clust. 8	0	0	0	0	0	0	0	47	0
Clust. 9	0	0	0	0	0	0	0	0	18

Table 8. F1 Score for leakage location detection according to the number of pressure meters.

No. of Pres. Meters	Clust. 1	Clust. 2	Clust. 3	Clust. 4	Clust. 5	Clust. 6	Clust. 7	Clust. 8	Clust. 9
2	0.842	0.790	0.743	0.825	0.698	0.711	0.776	0.938	1
3	0.725	0.745	0.733	0.588	0.698	0.333	0.821	0.957	1
4	0.811	0.764	0.749	0.806	0.630	0.622	0.787	0.930	1
5	0.783	0.856	0.780	0.769	0.655	0.696	0.906	0.914	0.960
6	0.815	0.730	0.880	0.836	0.779	0.578	0.860	0.943	1
7	0.800	0.740	0.774	0.880	0.676	0.486	0.844	0.925	1

Table 9. Performance comparison of the leakage volume detection using only 30 pres. meters.

PI	Lev. 1	Lev. 2	Lev. 3	Lev. 4	Lev. 5	Lev. 6	Lev. 7	Lev. 8	Lev. 9
TP	59	48	61	44	58	58	55	50	56
FP	0	2	2	5	10	5	6	7	3
FN	2	1	2	8	5	7	5	6	4
Precision	1	0.96	0.968	0.898	0.853	0.921	0.902	0.877	0.949
Recall	0.967	0.980	0.968	0.846	0.921	0.892	0.917	0.893	0.933
F1 Score	0.983	0.970	0.968	0.871	0.885	0.906	0.909	0.885	0.941

Table 10. Performance comparison of the leakage location detection using only 30 pres. meters.

PI	Clust. 1	Clust. 2	Clust. 3	Clust. 4	Clust. 5	Clust. 6	Clust. 7	Clust. 8	Clust. 9
TP	69	177	85	20	23	14	38	27	14
FP	8	7	22	2	7	6	7	3	0
FN	1	20	11	6	11	9	3	1	0
Precision	0.896	0.962	0.794	0.909	0.767	0.7	0.844	0.9	1
Recall	0.986	0.898	0.885	0.769	0.676	0.609	0.927	0.964	1
F1 Score	0.939	0.929	0.837	0.833	0.719	0.651	0.884	0.931	1

Table 11. Confusion matrix of leakage volume detection using only 30 pres. meters.

Actual \ Predicted	Lev. 1	Lev. 2	Lev. 3	Lev. 4	Lev. 5	Lev. 6	Lev. 7	Lev. 8	Lev. 9
Lev. 1	59	0	1	1	0	0	0	0	0
Lev. 2	0	48	0	0	1	0	0	0	0
Lev. 3	0	2	61	0	0	0	0	0	0
Lev. 4	0	0	1	44	5	2	0	0	0
Lev. 5	0	0	0	4	58	1	0	0	0
Lev. 6	0	0	0	0	4	58	3	0	0
Lev. 7	0	0	0	0	0	2	55	3	0
Lev. 8	0	0	0	0	0	0	3	50	3
Lev. 9	0	0	0	0	0	0	0	4	56

Table 12. Confusion matrix of leakage location detection using only 30 pres. meters.

Actual \ Predicted	Clust. 1	Clust. 2	Clust. 3	Clust. 4	Clust. 5	Clust. 6	Clust. 7	Clust. 8	Clust. 9
Clust. 1	69	0	1	0	0	0	0	0	0
Clust. 2	2	177	16	0	0	0	2	0	0
Clust. 3	4	7	85	0	0	0	0	0	0
Clust. 4	1	0	5	20	0	0	0	0	0
Clust. 5	1	0	0	2	23	4	3	1	0
Clust. 6	0	0	0	0	6	14	2	1	0
Clust. 7	0	0	0	0	1	1	38	1	0
Clust. 8	0	0	0	0	0	1	0	27	0
Clust. 9	0	0	0	0	0	0	0	0	14

Table 13. Performance comparison of model by number of pres. meters.

PI	Volume Detection (All)	Volume Detection (30)	Location Detection (All)	Location Detection (30)
Precision	0.930	0.925	0.903	0.864
Recall	0.923	0.924	0.931	0.857
F1 Score	0.926	0.924	0.911	0.858
MAE	0.001	0.004	0.017	0.048
RMSE	0.026	0.066	0.131	0.219
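The precision, recall, and F1 values in Table 13 appear to be unweighted (macro) averages of the per-class scores in the preceding tables. This is an inference from the numbers rather than something stated in the text. The sketch below checks it for the "Location Detection (30)" column using the rounded per-cluster values copied from Table 10; the helper name `macro_average` is hypothetical.

```python
def macro_average(values):
    """Unweighted mean over classes, so every cluster counts equally."""
    return sum(values) / len(values)

# Per-cluster scores copied from Table 10 (leakage location, 30 pressure meters).
table10 = {
    "precision": [0.896, 0.962, 0.794, 0.909, 0.767, 0.7, 0.844, 0.9, 1],
    "recall": [0.986, 0.898, 0.885, 0.769, 0.676, 0.609, 0.927, 0.964, 1],
    "f1": [0.939, 0.929, 0.837, 0.833, 0.719, 0.651, 0.884, 0.931, 1],
}

summary = {name: round(macro_average(vals), 3) for name, vals in table10.items()}
print(summary)
```

Because a macro average weights each cluster equally, small clusters with weak scores (such as Clust. 5 and Clust. 6 in Table 10) pull the summary down as much as large clusters would.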

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
