1. Introduction
Driven by the United States–China trade war and the global pandemic, the global industrial supply chain is being restructured at an accelerated pace, and the manufacturing of many high-end products is moving back to Taiwan. Manufacturing these products requires a considerable amount of water. For instance, Taiwan Semiconductor Manufacturing Company consumes about 215,000 tons of water per day to manufacture advanced 5 and 3 nm transistors in Tainan Science Park, its main production base. This demand is estimated to rise to 325,000 tons per day by 2026, which is expected to increase average industrial water consumption by more than 1.51%. Taiwan’s total annual water consumption is 16.713 billion tons, of which 9.98% is industrial usage; 49.81% of industrial water consumption comes from the tap water supply system, accounting for 5.06% of total water consumption. Given GDP growth of 2.75% in 2018, every 1% of water consumption indirectly contributed 0.54% of economic growth. Tap water is therefore an important resource for domestic economic development. However, owing to the abnormal climate, Taiwan’s industrial water usage has continued to increase in recent years. The Taiwanese government is transporting water from the northern to the southern part of the country to help with droughts occurring in Taoyuan and Hsinchu. In addition, it has invested NTD 2.78 billion to build the necessary backup water pipelines, demonstrating the importance of water resources management. The current domestic annual water leakage rate is 14.25%. If the leakage rate of water pipes inside and outside factories can be effectively reduced, domestic water resources can be conserved and domestic economic growth can be stabilized.
Most leaking water pipes are found in the urban underground pipe network. The main pipe transports water, and smaller pipes distribute it to users and factories; the distribution of the pipes is therefore dendritic. At present, leaking pipes can be located and repaired through a dedicated program to reduce water loss. A popular technique developed by Fuchs and Riehle [1] over the past 20 years can be used to locate leaks through acoustic/vibration signal analysis. At two access points on either side of the suspected leak location, accelerometers or hydrophones collect vibrational or acoustic signals. Correlation techniques are then used to estimate the time delay between the two acquired signals, and this method is commonly used to determine the leak location in buried pipelines [1,2,3,4]. For the technique to be effective, the propagation velocity of the acoustic signal and the exact pipe length must be known in advance. The propagation of sound waves in metal or plastic pipes has received much attention for many years. The dispersive propagation model of buried underground water pipelines was studied [5,6], and the results were explained physically. Muggleton et al. [7] developed a low-frequency theoretical model of buried liquid-filled pipelines for predicting wave velocity and validated it experimentally for evacuated pipelines and for pipelines buried in soil or water [8,9]. When the pipe conditions, such as pipe thickness, material properties, and surrounding media, are known, the propagation velocity can be calculated using the corresponding theoretical model from these studies. In practice, if some pipeline conditions are unknown, the propagation velocity can be measured in situ using known simulated leaks, for example, by releasing water from a fire hydrant [10].

Furthermore, although the Taiwan Water Corporation constructs a community pipe network for the water pipeline system in each region and determines whether there are leaks by monitoring changes in pressure and flow, the cost of this method is high owing to the need to install meters (pressure gauges and flow meters). At present, the pipelines in urban areas are generally PVC pipelines, and the pipeline pressure is below 2 kg/cm². The leakage signal decays rapidly, and it is difficult to effectively monitor long-distance leakages, unlike in a steel pipe system. The community pipeline network can only indicate whether an area contains a leak and cannot provide the exact location of the leak for excavation and repair. Therefore, in practice, experienced leak detection personnel must still conduct on-site leak detection in the area to determine the location of the leak. However, during sound-based leak detection on-site, interference from ambient sound often causes misjudgment, resulting in detection errors and lower repair efficiency. Take the international manufacturers of leak detection instruments (SEWERIN and EchoShore) as an example: their equipment performs leak detection via frequency band variation and manual judgment, which is easily affected by environmental noise. In addition, this equipment is designed for pipes made from rigid materials, which differs from the current situation in Taiwan, where most main water pipelines are still PVC pipelines. This makes it difficult to adapt the equipment to the local water pipeline system. The above illustrates the importance of effective leak detection.
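The correlation-based localization described earlier in this section can be illustrated with a minimal sketch. It assumes a single leak between two sensors, a known sensor spacing, and a known, constant propagation velocity; the function name `locate_leak`, the sampling rate, the wave speed, and the synthetic signals are all hypothetical and are not taken from the study.

```python
import numpy as np

def locate_leak(sig_a, sig_b, fs, sensor_spacing, wave_speed):
    """Return the estimated leak distance (m) from sensor A.

    sig_a, sig_b   : recordings from the two access points (same sampling rate)
    fs             : sampling rate in Hz
    sensor_spacing : pipe length between the two sensors in m
    wave_speed     : propagation velocity of the leak noise in m/s
    """
    # Full cross-correlation; index i corresponds to lag i - (len(sig_b) - 1).
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    # A positive delay means the noise reached sensor A later than sensor B.
    delay = lag / fs
    # t_a - t_b = (2 * d_a - L) / c  =>  d_a = (L + c * delay) / 2
    return 0.5 * (sensor_spacing + wave_speed * delay)

# Synthetic check with hypothetical values (not taken from the study).
fs = 5000.0             # Hz
wave_speed = 400.0      # m/s, assumed
sensor_spacing = 100.0  # m
true_distance = 30.0    # m from sensor A

rng = np.random.default_rng(0)
leak_noise = rng.standard_normal(4000)
n_total = 5000
delay_a = int(round(true_distance / wave_speed * fs))                     # samples
delay_b = int(round((sensor_spacing - true_distance) / wave_speed * fs))  # samples

# Each sensor records its own ambient noise plus a delayed copy of the leak noise.
sig_a = 0.5 * rng.standard_normal(n_total)
sig_b = 0.5 * rng.standard_normal(n_total)
sig_a[delay_a:delay_a + leak_noise.size] += leak_noise
sig_b[delay_b:delay_b + leak_noise.size] += leak_noise

estimate = locate_leak(sig_a, sig_b, fs, sensor_spacing, wave_speed)
print(f"estimated leak position: {estimate:.2f} m from sensor A")
```

The position estimate scales directly with the assumed wave speed and sensor spacing, which is why both must be known in advance for this approach.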
Traditionally, water pipe leakage is identified through the judgment of professionals, which is very inefficient. To support large-scale right-of-way applications and timely emergency repairs, establishing a system that can detect the locations of leaks is vital. This study applies artificial intelligence (AI) deep learning to develop a water pipe leak detection and location identification system.
In recent years, AI applications have continued to thrive in various industries, and deep learning has been widely used. Because deep learning can analyze signals in depth and learn their implicit structure, trained models are less affected by environmental noise, mismatched recording settings, and other factors. These models are therefore well suited to sound event detection systems, which require strong robustness [11]. Deep learning has also proved to be a very powerful tool because of its ability to handle large amounts of data [12,13]. Among deep learning architectures, the convolutional neural network (CNN) is widely used. Its structure allows it to exploit the two-dimensional structure of the input data; compared with other deep learning architectures, CNNs can produce better results in image and speech recognition. This research established an intelligent sound-assisted water leak identification system by developing and using a localized AI water leak diagnostic instrument to capture on-site dynamic audio, applying a CNN for leak detection, and integrating Internet of Things (IoT) technology to simultaneously identify and locate leaks. In addition, the system can be deployed on a personal handheld device or a back-end platform, so that leak detection personnel can interact with the site remotely to assess the current status of the pipelines. This greatly enhances the efficiency of leak detection and provides a wider range of pipeline status information. The system will also benefit big data analysis on cloud platforms in the future and help predict and prevent pipeline leaks.
2. System Architecture, Theoretical Methods, and Experiments
The architecture of the intelligent sound-assisted water leak identification system is shown in Figure 1. In this study, an acoustic and vibration diagnostic system for structural deterioration was constructed first, comprising an acoustic wave sensing unit, an acoustic and vibration diagnostic module, and a communication module for signal transmission between the two; the constructed equipment was then used to collect more than ten thousand pieces of data. After the audio data were processed for noise, the audio was decomposed into Mel-scale Frequency Cepstral Coefficients (MFCCs). Finally, a CNN was used to build a database for identifying whether a sound was ambient or from a leak. In addition, a mobile version of the cloud-based intelligent water leakage signal diagnostic and management platform was developed, as shown in Figure 2.
2.1. Raw Data Collection
The subject of research in this study is audio captured from the ground above field pipelines in various districts of the Taiwan Water Corporation. These signals mix various random environmental sounds with underground leakage sounds, so the events corresponding to each time point cannot be clearly labeled. Therefore, unsupervised learning algorithms (also called clustering algorithms) must be introduced to perform the labeling. However, if the original signal is fed directly to the unsupervised learning algorithm, the “curse of dimensionality” can arise [14], compromising the labeling. To address this issue, this study extracted representative features from the original data, and the algorithms then learned from these features. The encoder of an autoencoder is used to perform this feature extraction.
2.2. Using MFCC for Speech Recognition and Normalization
The acoustic signal captured in this study was transformed with a discrete Fast Fourier Transform (FFT) and then converted to MFCCs, a feature representation commonly used in speech recognition. The number of filters is 30, the MFCC has 20 dimensions, the frequency range is 0–44,100 Hz, the FFT has 2048 points, and the size of the sound frame used for the sound file is 5 s. In addition, to avoid overly drastic changes between sound frames, adjacent sound frames overlap by 20 milliseconds (ms) to obtain the time-spectrum data graph shown in Figure 3. The three axes are amplitude, frequency, and time. Figure 3a shows the time–frequency characteristic of environmental audio, and Figure 3b shows the time–frequency characteristic of a PVC pipe leak.
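The MFCC settings listed above can be sketched in NumPy as follows. This is an illustrative implementation under several assumptions that the text does not state: the sampling rate (88,200 Hz is used here only so that the Nyquist band matches the quoted 0–44,100 Hz range), a Hamming window, triangular mel filters, a 2048-sample analysis frame with a 20 ms overlap, and a 5 s clip. The helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale from 0 Hz to fs / 2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, fs, n_fft=2048, overlap_s=0.020, n_filters=30, n_mfcc=20):
    """Return an (n_frames, n_mfcc) array of MFCCs."""
    hop = n_fft - int(round(overlap_s * fs))  # frames overlap by overlap_s seconds
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)  # window choice is an assumption
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    # DCT-II of the log mel energies, keeping the first n_mfcc coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T

fs = 88_200                          # assumed sampling rate
rng = np.random.default_rng(0)
clip = rng.standard_normal(5 * fs)   # stand-in for a 5 s field recording
features = mfcc(clip, fs)
print(features.shape)
```

Each row of `features` is one frame and each column one cepstral coefficient, so stacking the rows over time gives a time–frequency map of the kind shown in Figure 3.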
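Each acoustic signal must be normalized before being used to train the deep learning algorithms. The normalization method used in this study is min–max normalization. The readings obtained by sampling 13 times at a given time point $n$ form a vector (a one-dimensional array) $\mathbf{x}[n]$; the maximum and minimum values of the readings form two vectors, $\mathbf{x}_{\max}$ and $\mathbf{x}_{\min}$. $\mathbf{x}[n]$ is normalized element-wise with the following formula:

$$\hat{\mathbf{x}}[n] = \frac{\mathbf{x}[n] - \mathbf{x}_{\min}}{\mathbf{x}_{\max} - \mathbf{x}_{\min}} \tag{1}$$

Using the difference method, the normalized reading at the previous time point, $\hat{\mathbf{x}}[n-1]$, is subtracted from the normalized reading at the current time point, $\hat{\mathbf{x}}[n]$ (as shown in Formula (2)):

$$\mathbf{d}[n] = \hat{\mathbf{x}}[n] - \hat{\mathbf{x}}[n-1] \tag{2}$$

where $\mathbf{d}[n]$ is a differential signal.

The sum of the differential signals is calculated, and a threshold $T$ is established, as shown in Formula (3):

$$S[n] = \sum_{i=1}^{13} d_i[n], \qquad S[n] > T \tag{3}$$

where $d_i[n]$ is the $i$-th component of $\mathbf{d}[n]$.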
If the sum of the differential signals is greater than the threshold, it can be determined that the input sound wave signal is a transient signal with a sharply changing waveform; otherwise, it can be determined that the input sound wave signal is a steady-state signal with stable and moderate waveform changes.
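The normalization, differencing, and thresholding steps can be sketched as follows. The sketch assumes that the minimum and maximum are taken per channel over the recording, that the difference is current minus previous, and that the differences are summed over the 13 readings without taking absolute values; the function name, the threshold value, and the test data are hypothetical.

```python
import numpy as np

def classify_transient(readings, threshold):
    """Flag each time step as transient (True) or steady-state (False).

    readings  : array of shape (n_times, 13), one row per time point n
    threshold : hypothetical decision threshold on the summed differences
    """
    x_min = readings.min(axis=0)
    x_max = readings.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    normalized = (readings - x_min) / span               # min-max normalization
    diff = normalized[1:] - normalized[:-1]              # current minus previous
    summed = diff.sum(axis=1)                            # sum over the 13 readings
    return summed > threshold

# Five slowly drifting time points followed by one abrupt jump.
base = np.linspace(0.0, 1.0, 13)
readings = np.stack([base + 0.01 * t for t in range(5)] + [base + 1.0])
flags = classify_transient(readings, threshold=1.0)
print(flags)
```

Only the last step, where every reading jumps at once, exceeds the threshold and is flagged as transient; the slow drift before it is treated as steady-state.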
2.3. Autoencoding with CNN
Convolutional autoencoders are based on CNNs and comprise convolutional layers, pooling layers, and upsampling layers. CNNs have achieved excellent results in image recognition, where the image itself is a two-dimensional matrix of pixels; the two-dimensional matrix in the monitoring window can therefore also be input directly to a CNN. The operations performed by the convolutional, pooling, and upsampling layers are explained below.
The convolutional layer performs convolution, denoted by Ⓧ, on the input image and the kernel, which is usually a square 3 × 3 matrix. Taking Figure 4 as an example, the kernel first operates on the orange area in the upper left corner of the original image (the orange area is also a 3 × 3 square matrix, the same as the kernel) to calculate the Hadamard product of these two matrices.
The sum of the elements of this product is the first value of the feature map on the right in Figure 4. Next, the kernel continues to calculate the Hadamard products on the original image from left to right and from top to bottom until the entire image has been scanned, producing a complete feature map. To output a feature map of the same size as the original image, padding is added around the periphery of the original image.
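The sliding-kernel operation can be written out directly. The following is a minimal sketch assuming zero padding, a stride of 1, and no kernel flip (the convention used in CNN layers); the function name and the test matrices are illustrative only.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide the kernel over a zero-padded image; output has the image's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):          # top to bottom
        for c in range(image.shape[1]):      # left to right
            window = padded[r:r + kh, c:c + kw]
            # Hadamard (element-wise) product, then sum -> one feature-map value.
            out[r, c] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
box = np.ones((3, 3))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0

feature_map = conv2d_same(image, box)
print(feature_map.shape)
```

With the `identity` kernel the output reproduces the input, which is a quick way to check that the padding keeps the feature map aligned with the original image.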
The purpose of the pooling layer [15] is to further reduce the feature map output by the convolutional layer while retaining its prominent features. There are two common pooling operations: max pooling and mean pooling. In this study, max pooling is used to reduce the feature maps. Take Figure 5 as an example of pooling. First, the input image is divided into several rectangular regions, and the maximum value is output for each subregion. Usually, the size of the subregion is 2 × 2. This mechanism works because the precise location of a feature is far less important than its rough location relative to other features.
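As a small illustration of max pooling, the sketch below divides a matrix into non-overlapping subregions and keeps the maximum of each. The function name and the example values are hypothetical, and any rows or columns that do not fill a complete subregion are simply dropped in this sketch.

```python
import numpy as np

def max_pool(x, ph, pw):
    """Non-overlapping max pooling with subregions of size ph x pw."""
    h, w = x.shape
    trimmed = x[:h - h % ph, :w - w % pw]   # drop incomplete subregions
    blocks = trimmed.reshape(h // ph, ph, w // pw, pw)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])
pooled = max_pool(x, 2, 2)
print(pooled)
```

The 4 × 4 input is reduced to 2 × 2, with each output value reporting the strongest response in its subregion but not where inside the subregion it occurred.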
The pooling layer progressively reduces the spatial size of the data, decreasing the number of parameters and the amount of computation, which also prevents overfitting to a certain extent. The operation performed by the upsampling layer is quite simple: it directly resamples each area of the feature map, as illustrated in Figure 6. Finally, the architecture of the autoencoder is shown in Figure 7. The original image is a 13 × 16 matrix, which the encoder compresses into a 13 × 4 encoding matrix. It is worth noting that, by design, the pooling layer in the encoder sets the size of the subregions to 1 × 2. This helps preserve the correlation between the timing information on the compression matrix and the sensor. The upsampling layer [16] in the decoder likewise enlarges only the time dimension.
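The change in matrix size through the autoencoder can be traced with a short sketch. It assumes that the encoder applies 1 × 2 max pooling twice (which is consistent with 16 columns being reduced to 4) and that the decoder upsamples by repeating each column; the convolutional layers and their weights are omitted, so this only illustrates the shapes and is not the trained model.

```python
import numpy as np

def max_pool(x, ph, pw):
    """Non-overlapping max pooling with subregions of size ph x pw."""
    h, w = x.shape
    trimmed = x[:h - h % ph, :w - w % pw]
    return trimmed.reshape(h // ph, ph, w // pw, pw).max(axis=(1, 3))

def upsample_time(x, factor=2):
    """Enlarge only the time axis (columns) by repeating each column."""
    return np.repeat(x, factor, axis=1)

# 13 sensor readings (rows) by 16 time steps (columns), filled with dummy values.
window = np.arange(13 * 16, dtype=float).reshape(13, 16)

code = max_pool(max_pool(window, 1, 2), 1, 2)    # 13x16 -> 13x8 -> 13x4
restored = upsample_time(upsample_time(code))    # 13x4 -> 13x8 -> 13x16

print(window.shape, code.shape, restored.shape)
```

Because the subregions are 1 × 2, the 13 rows are never mixed with one another; only the time axis is compressed and later enlarged.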
4. Conclusions
This research successfully developed and established an intelligent sound-assisted water leak identification system that uses a localized AI water leak diagnostic instrument to capture on-site dynamic audio and integrates IoT technology to simultaneously identify and locate the leakage.
A CNN is used as the basis for leak detection. The model can predict the leak’s coordinates, size, and probability, and the process does not rely on leak detection personnel to perform leak diagnosis. Professional leak detection personnel can interact remotely to understand the current on-site statuses of pipelines, greatly enhancing the efficiency of water leakage detection.
The results of testing the model proposed in this study show that the accuracy of the CNN after training is greater than 95%. The average absolute error calculated between the output data and the input data of the encoder is 0.1021, demonstrating that the proposed model outperforms existing methods in the detection of PVC pipe leaks. In addition, actual excavation data were used to verify the credibility of this AI system. The results confirm that the system has high reliability and can reduce the cost of excavation by 26%.
In the future, we expect non-professionals to use AI water leakage diagnostic instruments to inspect the regional pipeline network so that locations with suspected leakages can be found effectively and quickly (about four times the speed of traditional methods). The leakage status of the pipeline network can be tracked and managed, and limiting the area of the suspected water leakage can increase the efficiency of water leakage detection, improve the current leakage diagnosis and detection rate, improve the management efficiency of overall pipeline integrity, and reduce the workload of professional leak detection personnel.