1. Introduction
During recent decades, pipeline networks have been considered among the safest and most economical means of transporting and storing oil and gas products [1]. In fact, pipeline infrastructure is critical for worldwide economic growth. Multiple investments in hydrocarbon and petrochemical facilities materialize thanks to the steady and reliable supply of feedstocks provided by pipeline infrastructure [2]. For example, it has been estimated that, in 2015, crude oil pipelines generated approximately 200,000 jobs, accumulating over $21.8 billion in Gross Domestic Product [3]. Consequently, oil piping installations worldwide have been rapidly expanding to satisfy the ever-increasing energy needs of the population, increasing the topological complexity of the pipeline network and complicating its supervision and safety assessment [4]. Additionally, this breadth of pipeline usage inherently increases the probability of structural defects due to erosion over time, fracture propagation, human factors, environmental factors, and other causes [5,6,7]. Leak detection in pipelines has been a prevalent issue for several decades. Pipeline leaks from sources such as small cracks and pinholes are termed chronic leaks, as they have the potential to go unnoticed for a long period of time, causing irreversible damage [8]. Even seemingly small defects can escalate quickly into severe incidents. For instance, on 2 March 2006, a spill of about 1 million liters of oil occurred over around five days in the area known as Alaska’s North Slope because a quarter-inch hole had corroded in a pipeline [9]. Therefore, ensuring the proper functioning of these pipelines is imperative to avert excessive financial losses due to the interruption of oil and gas supply and, most importantly, to eliminate any potential threat to human lives and the ensuing detrimental effects on the environment.
Several conventional approaches undertake the recognition of defects in pipelines by analyzing their vibration response characteristics using digital signal processing techniques, such as the fast Fourier transform [10] and wavelet transforms [11]. More recently, following the fourth industrial revolution, data-driven machine learning approaches have gained popularity due to their high accuracy compared to conventional methods and their efficient implementation, enabled by recent advancements in GPUs dedicated to tensor multiplication. In this vein, deep learning is widely employed to perform leakage detection in pipeline systems, aiming to leverage its efficacy in identifying even relatively small leakage diameters [12], processing data either in the time domain [13,14] or in the frequency domain [15]. Autoencoders are a class of neural networks that can be trained on unlabeled data and distinguish potential digressions from the nominal state, a property that is very beneficial for detecting faulty conditions in pipelines [16,17]. Convolutional Neural Networks (CNNs) are utilized to perform feature extraction, learning a series of filters that identify salient features in the data. Subsequently, these feature maps are fed to Multi-Layer Perceptrons (MLPs) [18,19,20] or Support Vector Machines (SVMs) [21] to determine the operational state reflected by the initial signal. Although feature extraction is integral to classification-based data-driven methods [22], their main limitation is their high computational cost. Post-processing analysis is performed on historical data in the cloud, and the need to store a high volume of data makes both the training and execution time of the models inefficient.
To address these issues, the “ESTHISIS” project [23] aims to employ edge computing to apply DL techniques in real time and detect leakages in oil and gas pipelines. In this framework, our novelty lies in the emphasis on providing situational awareness of oil and gas pipelines to stakeholders through the seamless integration of our wireless sensor network to enhance the operational capacity of oil and gas pipelines. In our previous study [24], two DL methodologies were presented in two different experimental setups and compared in terms of their efficiency in detecting leakages in pipelines. The first method entails a supervised approach based on transforming the data to the time-frequency domain, creating spectrograms from the acquired sensor data, and using a 2D Convolutional Neural Network to characterize whether the pipeline is healthy or not. The second method is an unsupervised approach employing Long Short-Term Memory Autoencoders (LSTM AE) trained to reconstruct signals from healthy channels. The focal point of the current work is to merge these techniques efficiently, leverage the benefits yielded by each method, and present a comprehensive leakage detection scheme that can run on the edge to provide efficiency and scalability.
The main innovation of our work is the development of an edge methodology capable of running DL applications in a scalable manner for real-time analytics and providing accurate estimations for leakage detection. Our modeling entails a hybrid approach in which the components of our previous analysis offer the capability of training the model simply by utilizing signals corresponding to the nominal healthy state (LSTM AE) and also present the benefit of converting long time series into low-resolution static images, which is a more memory-efficient solution (2D-CNN). Emphasis has been placed on the methodology’s validation through an experimental pipeline network during in-field testing. The dataset acquired from these trials is utilized for training and optimizing an instance of our proposed combined approach, which shall be stored on the edge. Subsequently, this instance shall be evaluated by undertaking real-time analytics to detect leakages in actual operating pipelines. In this context, parametric tests were performed to verify the model accuracy and efficiency in the actual environment within oil premises.
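The two-stage hybrid flow described above can be sketched as a small dispatch routine. This is a minimal illustrative sketch only: `reconstruct` and `classify_spectrogram` are hypothetical stand-ins for the trained LSTM AE and 2D-CNN, and the error threshold is an assumed placeholder, not a value from the ESTHISIS deployment.

```python
# Minimal sketch of the two-stage edge detection flow.
# `reconstruct` stands in for the LSTM AE; `classify_spectrogram`
# stands in for the 2D-CNN; both are hypothetical stubs here.
from typing import Callable, Sequence


def two_stage_detector(
    window: Sequence[float],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    classify_spectrogram: Callable[[Sequence[float]], int],
    error_threshold: float,
) -> bool:
    """Return True only when both stages agree the window is defective."""
    # Stage 1: LSTM-AE-style screening via reconstruction error (MSE).
    recon = reconstruct(window)
    mse = sum((x - y) ** 2 for x, y in zip(window, recon)) / len(window)
    if mse <= error_threshold:
        return False  # signal looks nominal; skip the costlier CNN stage
    # Stage 2: confirm with the spectrogram classifier (1 = defective).
    return classify_spectrogram(window) == 1
```

A pipeline flagged by the first stage is only labeled defective if the classifier agrees, which mirrors how the combined scheme conceals the individual weaknesses of each component.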
The rest of the paper is structured as follows: In Section 2, a description of the processing system responsible for the data acquisition is provided. In Section 3, the methodology behind our detection scheme is delineated. Section 4 and Section 5 present the results from the experimental field-testing phase and the pilot testing, respectively. Finally, the conclusions on the performance of each model are summarized.
4. Method Validation
The first set of experiments was undertaken in an experimental setting, intending to verify the applicability of the proposed methodology. For this purpose, the experimental pipeline network set up for the ESTHISIS project in Kalochori, Thessaloniki, Greece, hosted this round of experiments. This dataset is utilized for training and for obtaining the optimized instances of the network that shall be used to undertake leakage detection in the subsequent phase of our methodology testing in an actual working environment, as presented in Section 5.
For the initial setup, and after any change that altered a significant geometrical parameter of the configuration, such as the distance between the two sensors, the following process was followed: The water pressure was set to a pre-defined value and remained unaltered throughout the experiments; it was regularly monitored, and water was added when needed to maintain a steady pressure inside the pipeline. The two nodes were fully aligned on top of the pipeline wall in order to minimize uncertainties, and the induced leakage was oriented at 90° to their plane due to mounting limitations posed by the environment. This configuration was selected in order to match the conditions that shall be met in the oil refinery premises during our method verification phase. An initial series of consecutive recordings without any leaks was taken to record a reference vibration signal for the channel. These recordings had a duration of 10 min in total. Subsequently, a series of short tests of approximately 10 s each were carried out. During each trial, one of the faucets was turned on to emulate a leakage of a specific diameter. While the sampling rate can be defined by the user, in our case each node sampled the sensors’ analog signals with a sampling frequency of 25 kHz. Each run produced a sample received by the network corresponding to 250,000 time steps, given the 10-s duration of each test and the 25 kHz sampling rate.
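The relation between duration, sampling rate, and sample count quoted above can be checked in a couple of lines (a trivial sketch using only the numbers stated in the text):

```python
# Number of time steps produced by one recording: duration [s] * rate [Hz].
def n_samples(duration_s: float, rate_hz: float) -> int:
    return int(duration_s * rate_hz)


# A 10-s trial sampled at 25 kHz yields 250,000 time steps, as stated above.
print(n_samples(10, 25_000))  # 250000
```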
During the first day of the field tests in Kalochori, the majority of the tests were carried out without water flowing inside the pipeline, whereas most of the experiments carried out during the second day involved water flowing inside the channel. The other parameters that changed during the field tests were the distance between the sensors placed on the pipeline, the distance between the leakage and the sensors, and the diameter of the leakage ranging from 1 mm to 7 mm, i.e., the diameter of the faucet that was turned on each time. According to the standard test practice, each testing procedure was repeated 12 times.
Table 2 presents the number of available signals along with their properties, as well as the number of samples corresponding to each subset. Lastly, Table 3 and Table 4 tabulate the selected architecture and hyperparameter configuration of the trained and optimized models that shall be employed for the anomaly detection tests in the real environment.
5. Verification in a Real Environment
The optimized model that resulted from the experimental tests in Kalochori was used to monitor the pipelines in this setup and detect anomalies that imply leakage in the actual operating environment in oil refinery premises. The pipeline network for the field tests in a real environment was similar to the one described in Section 4 for the field tests in Kalochori. The main difference between the two lies in the considerable ambient noise present in the measurements, stemming from other ongoing procedures taking place at the facilities and entering the monitored system as external noise. Additional differences between the two setups concerned the pipeline diameter and the leakage diameters, since the available valves generated leakages of 5 mm and 13 mm.
Similarly, according to the standard test practice, each testing procedure was repeated 12 times, and the results demonstrated below refer to the mean over these runs.
Table 5 summarizes the properties of the signals acquired during the pilot trials, such as the number of available signals, how these samples are distributed to each subset for the training, validation, and testing of our methodology, and general properties characterizing each time series.
The detection scheme follows the same pattern as described for the experimental testing. First, the LSTM AE model is responsible for the identification of leakages by detecting abnormalities in the signal. Second, after the LSTM AE has detected a potential failure, the procedure of creating spectrograms with a 20-s rolling window is initiated (Figure 10). These spectrograms are then received by a 2D-CNN, which undertakes the classification of the operational state of the monitored pipeline.
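The conversion of a rolling window into a spectrogram image can be sketched with NumPy alone. This is an illustrative sketch, not the deployed implementation: the frame length and hop size are assumed placeholder values (the configuration actually used is tabulated in the paper), and a short synthetic tone replaces the 500,000-sample window a real 20-s segment at 25 kHz would contain.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram: one FFT column per hop-spaced, Hann-windowed frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft along each frame -> image of shape (frame_len//2 + 1, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Illustration: a 1-s synthetic 1 kHz tone sampled at 25 kHz.
sig = np.sin(2 * np.pi * 1000 * np.arange(25_000) / 25_000.0)
spec = spectrogram(sig)
```

The resulting 2D magnitude array is the kind of low-resolution static image the 2D-CNN consumes, which is why it occupies far less memory than the raw time series.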
Table 5 lists the selected architecture and hyperparameter configuration of the models composing the monitoring scheme in the actual operating environment.
Subsequently, the generation of the spectrograms begins. Indicatively, Figure 11 displays an example of such a spectrogram. A leakage is considered successfully identified by our monitoring scheme when it is correctly and timely recognized by the LSTM AE and the 2D-CNN continues flagging the signal as defective. If either of these conditions is not satisfied, the observation is deemed misclassified.
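The success criterion just described reduces to a small predicate (an illustrative sketch; the boolean inputs `lstm_flagged_in_time` and the sequence of CNN flags are hypothetical names, not the project's API):

```python
from typing import Sequence


def leakage_identified(lstm_flagged_in_time: bool, cnn_flags: Sequence[bool]) -> bool:
    """A leakage counts as identified only if the LSTM AE raised a timely
    alarm AND the 2D-CNN kept flagging every subsequent spectrogram."""
    return lstm_flagged_in_time and len(cnn_flags) > 0 and all(cnn_flags)
```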
Lastly, the efficacy of these models in detecting outflow is presented. Indicatively, Figure 12 demonstrates the LSTM autoencoder while inspecting a healthy and a defective signal. Figure 12 (right) demonstrates the reconstruction error, and Figure 12 (left) represents the actual acoustic signal acquired from the monitored pipeline. The red portion denotes the outflow; no blue portion can be discerned because the outflow began before the monitoring started.
The proposed methodology was again evaluated under diverse circumstances regarding the distance of the leakage from the nodes, the circulation of the fluid, and the leakage diameter, to determine the effect of the rupture’s size and of the distance from the nodes on the efficacy of the models. In this round of experiments, due to limitations in varying the leakage diameter, the effect of the distance from the node was examined in greater depth.
Table 6 summarizes the performance of the model in terms of detection accuracy for each of the aforementioned instances, along with the results yielded by our previous studies concerning the components of our combined approach to the task of anomaly detection in piping networks.
Furthermore, diverse metrics were used to obtain a more comprehensive understanding of the different models’ performance, enabling us to better identify the deficiencies of the models and whether they are more susceptible to False Positives or False Negatives. The metrics employed are as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 · Precision · Recall / (Precision + Recall)

where TP, TN, FP, and FN denote the True Positives, True Negatives, False Positives, and False Negatives, respectively.
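The standard definitions of these metrics follow directly from the confusion-matrix counts and can be computed in a few lines (a self-contained sketch, not the project's evaluation code):

```python
# Standard classification metrics from confusion-matrix counts.
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

Low recall signals susceptibility to False Negatives (leaks labeled healthy), while low precision signals susceptibility to False Positives, which is the distinction drawn between the LSTM AE and CNN components below.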
As demonstrated in Table 6, the proposed methodology yielded highly accurate results even in an actual operating environment with substantial external noise. It was observed that as the distance from the node increased, the accuracy of our methodology decreased, while still remaining very high across the numerous trials. Most importantly, the combined method presented in this study considerably improved the accuracy of detecting anomalies in the signals from the pipelines. Additionally, the individual components proved susceptible to different types of errors. More specifically, the LSTM AEs were prone to erroneously labeling a signal as healthy, as demonstrated by their relatively lower recall values. This phenomenon is explained by the fact that, especially for leakages with a small diameter, the signal commonly resembled the signal before the occurrence of the leakage, thus misleading the LSTM AE model. Furthermore, the CNN classifiers presented a more balanced performance, albeit slightly tilted towards falsely detecting leakages in healthy samples. Hence, it is further illustrated how the combined approach is capable of merging the two components and yielding better performance.
Lastly, despite having established the efficacy of the proposed methodology in both the experimental and the actual pipeline setup, it is essential to compare our recommended models with other algorithms widely employed in the pertinent literature for the task of anomaly detection. From our previous study, it was deduced that the AutoRegressive Moving Average (ARMA) model for univariate stationary time series performed best out of the set of benchmark models. Therefore, it was selected as the benchmark for comparison with the results obtained by the combined approach. The ARMA model was similarly trained solely on the dataset from the experimental pipeline network in Kalochori. This method is based on a regression model that is first fitted to the training data; the resulting model is then used to forecast test sequences, and the difference between the predicted and real values is called the residual. Provided that the orders p and q of the AR and MA components, respectively, have been chosen appropriately for the given time series, the residuals can be assumed to be normally distributed. Subsequently, these residuals are utilized to calculate the rolling z-score of the prediction error. Assuming that the fitted model satisfactorily predicts the healthy time series provided in the training step, an error that continuously exceeds the 95% confidence interval serves as an anomaly indicator, since the model fails to predict the time series accurately based on the system dynamics learned during training, signifying a significant change in the system. For our problem, the input time series were ascertained to be stationary through the Augmented Dickey–Fuller test. The AR and MA orders found during the training of the ARMA models to adequately capture the piping system’s dynamics were p = 4 and q = 5.
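The residual-based alarm described above can be sketched with the rolling z-score step alone (NumPy only). This assumes the ARMA(4,5) fit, e.g. from a library such as statsmodels, has already produced the residual sequence; the window length and persistence count are illustrative assumptions, not values from the study.

```python
import numpy as np


def rolling_zscore_flags(residuals: np.ndarray, window: int = 50,
                         z_crit: float = 1.96, persist: int = 5) -> np.ndarray:
    """Flag steps where the rolling z-score of the prediction error exceeds
    the two-sided 95% critical value for `persist` consecutive steps."""
    exceed = np.zeros(len(residuals), dtype=bool)
    for t in range(window, len(residuals)):
        hist = residuals[t - window:t]
        sigma = hist.std()
        if sigma == 0:
            continue  # degenerate window; cannot form a z-score
        z = (residuals[t] - hist.mean()) / sigma
        exceed[t] = abs(z) > z_crit
    # an anomaly is declared only when the exceedance persists, matching the
    # "continuously exceeds the confidence interval" criterion above
    flags = np.zeros_like(exceed)
    run = 0
    for t, e in enumerate(exceed):
        run = run + 1 if e else 0
        flags[t] = run >= persist
    return flags
```

The persistence requirement is what separates a genuine regime change in the system dynamics from a one-off forecasting error.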
As Table 7 reveals, there is a significant performance gap between the presented approach and the ARMA models. More specifically, the ARMA model struggles to maintain high levels of accuracy. This is likely attributable to the fact that the ARMA model was trained on the dataset of the experimental setup and was then asked to generate forecasts for the signals from the oil refinery. Conversely, the proposed combined methodology demonstrates significantly greater transferability, allowing the model stored on the edge to perform real-time leakage detection despite being trained on the experimental pipeline setup.
6. Conclusions
During recent decades, the ever-growing oil industry has highlighted the importance of supervising the integrity and efficient operation of piping systems worldwide. Monitoring the operational condition of the pipeline network and the timely detection of malfunctions in energy systems contribute to the minimization of environmental, economic, and social consequences. The critical challenge is the timely and accurate acquisition of data from sensors integrated into pipelines located in industrial and harsh environments. The methodologies presented are part of the ESTHISIS project, which aims to detect leakages in oil and gas pipelines by gathering and processing data from accelerometers placed alongside the pipelines, forming an edge computing system that can issue early warning notifications on leakages.
The focal point of the present study is to establish a new combined methodology for leakage detection based on the data acquired from an intelligent wireless system developed for leakage detection in pipelines for oil and gas transportation and storage. More specifically, the signal from the pipelines is constantly fed to the LSTM AE, which undertakes the task of detecting anomalies instantaneously. Subsequently, based on our failure decision process, the operational state of the pipeline is labeled as either healthy or defective. Lastly, in the case of a defective pipeline, the subsequent signals are converted into spectrograms, which are then fed to CNN classifiers to achieve continuous flagging of the state as defective. Two separate trials took place in two distinct settings. First, the experimental setup in Kalochori was utilized for the training of the models, which would subsequently be used in the testing environment. The second implementation concerned an actual operating environment in an oil refinery. The main challenge in this setup was the considerable ambient noise present in the measurements, introducing external noise into the monitored system, which could potentially decrease the detection accuracy of our models.
However, it was demonstrated that the combined methodology managed to bridge the two components harmoniously and successfully conceal their respective weaknesses, as these models achieved near-perfect or, on some occasions, even perfect classification accuracy for the leakage detection task on the signals stemming from the oil refinery. More specifically, the LSTM AE contributes to the instantaneous and timely detection of leakages when they occur; nonetheless, as demonstrated in [EAAI], it was susceptible to false negatives, as the signal from the pipeline wall resembled noise for small leakages. This deficiency is compensated for by the 2D-CNN classifiers, which were employed to classify spectrograms in which the time point when the leakage occurred was purposefully omitted. Additionally, this approach offers the alternative of storing static low-resolution images, which occupy considerably less memory than lengthy signals.
The primary innovation of the presented integrated system concerns accurate and timely leakage detection. The system can contribute to preventing possible environmental disasters and incidents in the fuel industry and to the future evolution of intelligent sensor solutions for liquid and gas storage and transportation procedures. Additionally, to the best of the authors’ knowledge, this is the first study implementing 2D-CNN classifiers that receive spectrograms for the detection of leakage. Moreover, not only did we demonstrate the applicability of this combination of neural network types, but we also showed that this monitoring scheme can identify changes in the vibrations of a pipeline system different from the one used for its training. Furthermore, the neural networks presented in this study were compared with the individual network components from our previous study, and ARMA models were used as a performance benchmark. The results demonstrated that the combined model not only outperformed the benchmark model, being more accurate overall, but also outperformed its individual components.