1. Introduction
The safe and efficient production and transportation of oil and gas resources is vital to the sustainable development of society [1]. Oil and gas production and transportation involve a large amount of equipment and many connection points, so leakage problems occur frequently. Traditional manual leakage inspection relies largely on auditory means such as the human ear and listening rods, which entails a heavy inspection workload and results in missed and false alarms [2]. With the gradual increase in unmanned stations, the requirements for perceiving the spatial safety status of a station are rising. Therefore, research on leakage detection technologies based on AI methods is attracting more and more attention.
In the current era of big data, mobile internet, and cloud computing, technologies such as the Internet of Things (IoT), cloud computing, big data, blockchain, and artificial intelligence (AI) are developing rapidly and significantly impacting the core areas of oil and gas exploration, development, and production [3,4,5]. Core AI technologies, represented by machine learning and deep learning, have reshaped these fields. Exploratory research utilizing machine learning and deep learning algorithms, including convolutional neural networks, recurrent neural networks, long short-term memory networks, random forests, and gradient boosting, has yielded favorable outcomes in areas such as geological analysis, log interpretation, seismic interpretation, sweet spot prediction, geological modeling, and reservoir simulation (Li Yang et al., 2020) [6]. In upstream oil and gas operations, gradient boosting, deep neural networks, and other AI technologies have achieved notable successes in geological evaluation, drilling, reservoir engineering, and production optimization (Dmitry Koroteev et al., 2021) [7]. Machine learning has been preliminarily applied to areas such as lithology identification and logging curve reconstruction, demonstrating substantial potential. Computer vision technologies have shown effectiveness in seismic processing and interpretation, such as first-arrival wave picking and fault recognition, while deep learning has been utilized in reservoir engineering for real-time control of water flooding development and for production forecasting (Kuang Lichun et al., 2021) [8]. Moreover, numerous domestic and international oil field companies and experts have conducted extensive discussions and research on the development and application of AI in the oil and gas industry [9,10,11,12,13,14,15,16,17,18,19].
At the end of 2022, OpenAI released the ChatGPT model, which elevated the understanding and generation capabilities of AI for general natural language tasks to an unprecedented height, attracting widespread attention from academia and industry globally [20]. In 2023, a landscape of generative large language models (LLMs) took shape, with numerous leading companies and research institutions releasing their own models, including OpenAI’s GPT series, Google DeepMind’s Gemini series, Meta’s LLaMA series, Tsinghua University’s GLM series, Baidu’s Wenxin Yiyan, Huawei’s Pangu model, Alibaba’s Tongyi Qianwen, and iFlytek’s Spark cognitive model [21]. Compared to traditional AI technologies, LLMs exhibit a nonlinear surge in performance owing to their massive training data and parameter scales exceeding tens of billions. They show unexpected capabilities in language understanding, generation, and reasoning and acquire general problem-solving capabilities across multiple tasks through key techniques such as fine-tuning and alignment. These models are characterized by generalization, versatility, and emergent properties [22].
As a result, AI technology has evolved significantly from small data to big data, from task-specific models to pre-trained models, and now to LLMs, transitioning from specialized to general-purpose applications. It is shifting from “+AI” (where AI is integrated as an enhancement tool within existing systems or processes to improve efficiency and effectiveness) to “AI+” (where AI serves as the core technology and driving force, actively promoting innovation and transformation in other fields and ushering in disruptive changes). Undeniably, the era of LLMs has arrived [23].
The application of LLMs in the oil and gas industry is still in its infancy, mainly focusing on intelligent assistants, question-and-answer systems, and data analysis and visualization. Research on applying these models to specific oil and gas scenarios remains exploratory. Some oil and gas companies have released LLM products built on open-source models and techniques such as fine-tuning and retrieval augmentation, while some scholars have attempted to develop scenario-specific models for oil and gas operations based on visual or multimodal foundation models. International researchers have conducted two experiments, PetroQA and GraphQA, utilizing OpenAI’s ChatGPT and GPT-4 for question-and-answer applications in the petroleum industry. PetroQA is a prototype tool that answers natural language questions by sending specific oil and gas industry knowledge from PetroWiki to ChatGPT. GraphQA allows users to automatically generate accurate graph queries from natural language questions using GPT-4 and retrieve answers from a vast knowledge graph containing facts and relationships among concepts such as wells, oil fields, basins, strata, geography, geological age, rock types, and operators (Eckroth et al., 2023) [24]. ExxonMobil has developed a customized LLM for the oil and gas industry, incorporating PetroWiki, research papers related to the sector, open work order data, and industry reports, significantly enhancing the model’s capabilities and demonstrating immense potential in data processing, acronym handling, and industry-specific tasks. Saudi Aramco has released MetaBrain AI, the first large model for the oil and gas industry, trained on 70 trillion data points collected over the company’s 90-year history and featuring 250 billion parameters, or learnable variables. The model aids upstream operations by analyzing drilling plans, geological data, and the relationships between historical drilling times and costs, as well as by recommending well locations. For downstream operations, it can provide precise predictions for refined products, including price trends, market dynamics, and geopolitical insights.
In light of the above context, this paper explores the application of LLM-based AI agents to leakage detection in natural gas valve chambers. By integrating advanced language model technology with traditional leakage detection methods, it aims to improve both the accuracy and efficiency of the detection process, offering a novel solution for the intelligent development of the oil and gas industry.
2. Overview of Large Language Model-Based AI Agents
For a long time, humanity has been striving to create AI that can meet or exceed human-level capabilities, with AI agents regarded as an effective means to this end. AI agents are artificial entities that can perceive their environment, make decisions, and take actions [25]. Numerous endeavors have been made to develop AI agents, primarily focusing on enhancing algorithms or training strategies to improve performance on specific tasks. However, in the absence of a general and robust base model, it has been challenging to coordinate applications across multiple scenarios and tasks simultaneously. The emergence of LLMs has brought hope for the construction of general-purpose AI agents, owing to their formidable capabilities in natural language understanding and generation [26].
2.1. The Development History of AI Agents
The concept of the AI agent originated in the field of philosophy, where it described entities endowed with desires, beliefs, intentions, and the capacity to act. In the 1950s, Alan Turing extended this concept to artificial entities, introducing the famous Turing Test [27]. Standing as a fundamental pillar of AI, this test aims to explore whether machines can exhibit intelligent behavior comparable to that of humans. Essentially, an AI agent is not synonymous with a philosophical subject; rather, it is a materialized embodiment of that philosophical concept in the context of AI. It can be seen as an artificial entity that perceives its environment using sensors, interacts with humans, makes decisions, and then responds through actuators [28,29].
As shown in Table 1, the development of AI agents can be roughly divided into the following five stages.
2.2. Current Status of AI Agent Applications
It is precisely because of the outstanding capabilities of LLMs in autonomy, reactivity, pro-activity, and social ability that the development of agents has been greatly advanced, making LLMs ideal components to serve as the brain or controller of AI agents. The framework of LLM-based agents consists of three core components: the brain, perception, and action. The brain module is responsible for fundamental tasks such as memory, thinking, and decision-making. The perception module senses and processes multimodal information, acquiring environmental data such as text, sound, images, and video. The action module utilizes the agent’s own knowledge and abilities, in conjunction with tools, to perform various actions or processes that influence the surrounding environment [25], for example, by generating text output, executing specific actions, and manipulating tools to respond more effectively to environmental changes, provide feedback, and even alter and shape the environment. This framework can be customized for different application scenarios.
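Conceptually, this brain–perception–action loop can be summarized in the minimal sketch below. It is an illustration rather than an implementation from the literature; `llm_complete` is a hypothetical placeholder for a call to any LLM chat endpoint:

```python
# Minimal conceptual sketch of an LLM-based agent loop.
# `llm_complete` is a hypothetical placeholder for a call to an LLM service.

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("wire this to an LLM endpoint")

class Agent:
    def __init__(self) -> None:
        self.memory: list[str] = []  # brain: running context and memory

    def perceive(self, raw_inputs: dict) -> str:
        # Perception: condense multimodal inputs (text, audio features,
        # image-derived readings) into a textual observation for the LLM.
        return "; ".join(f"{k}={v}" for k, v in raw_inputs.items())

    def decide(self, observation: str) -> str:
        # Brain: reason over memory plus the new observation.
        prompt = "\n".join(self.memory + [f"Observation: {observation}", "Next action:"])
        return llm_complete(prompt)

    def act(self, action: str) -> None:
        # Action: invoke a tool, emit text, or issue an instruction,
        # then record the outcome back into memory.
        self.memory.append(f"Action taken: {action}")
```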
The application scenarios for LLM-based AI agents can be broadly divided into two directions: one is human-like simulation, which studies the interactions and evolution among agents and between agents and virtual environments; the other is solving specific work tasks and fulfilling specific business requirements, such as software development and intelligent assistants.
In April 2023, a team from Stanford University developed a generative agent system [30], creating a human-like community sandbox environment in which 25 AI agents simulate human behavior. These agents are based on the ChatGPT model and possess different identities, personalities, and ages, enabling them to communicate freely, think, evolve autonomously, and even organize activities and disseminate information.
Sun and his team explored and validated an agent architecture centered on an LLM, which issues and executes decision instructions through natural language interactions; they simulated its environment using wargaming scenarios for validation in the field of intelligent decision-making [31]. Liu and his team utilized the automatic iterative fine-tuning mechanism of LLMs, incorporating technologies such as business process cognitive mapping, on-demand resource parsing tools, and low-code business function development, to build a tailored business agent platform for the humanities and social sciences. The platform features customized generation of business resources and automated construction and visualization of business models, effectively supporting research work [32].
In the oil and gas industry, agents facilitate the realization of engineering concepts by enabling the deconstruction of problems and completing a standard process of identifying, defining, analyzing, generating solutions, evaluating options, selecting plans, implementing solutions, monitoring, providing feedback, and summarizing experiences. Individual agents have different capabilities and can independently reflect, provide feedback, plan, and act to execute specific work tasks or workflows in distinct task scenarios. When multiple agents interact, enhancements can be achieved through cooperative or competitive interactions, allowing for the execution of multiple work tasks or workflows. By interacting with humans and incorporating feedback, they can perform tasks more effectively and safely, providing better services.
3. Design of Intelligent Agent Architecture for Natural Gas Valve Chamber Leakage Detection
3.1. Overall Architecture Design
With the rapid development of the oil and gas production industry, the detection and identification of leaks in natural gas valve chambers has become a key link in ensuring safe production, reducing environmental pollution, and improving production efficiency. This paper constructs an intelligent agent for natural gas valve chamber leakage detection based on the aforementioned agent framework. The front end of the intelligent question-and-answer system functions as the perception module, receiving information such as audio and infrared thermal imaging from the valve chamber environment. A general-purpose large model serves as the central controller (the “brain”), orchestrating the leakage detection industry model and a knowledge Q&A tool powered by retrieval-augmented generation (RAG) technology to automatically complete the leakage detection process. Based on the identification results, the agent intelligently generates corresponding leakage response schemes, which can be further elaborated and diversified through human–machine interaction. The overall architecture is shown in Figure 1.
Based on the IoT system, on-site audio, infrared temperature, and other working condition data are collected in real time and transformed into characteristic data of the working environment through feature processing. Comprehensive leakage diagnosis is achieved by having the base large model schedule the leakage diagnosis AI model, and corresponding disposal schemes are provided. On this basis, system functions can be expanded through human–computer interaction. The overall process is shown in Figure 2.
3.2. General Large Model Application Service Interface Adaptation Design
This study utilizes Ollama to design a general interface adaptation scheme for large model application services. Ollama provides a simple, user-centric API that accelerates LLM inference and enables seamless integration into frameworks such as LangChain, enhancing both the application’s intelligence and the user experience.
3.2.1. Model Deployment
The base large model can be deployed on servers running Linux (such as Ubuntu) or on Windows, with both offline and online deployment options. In this study, a clear, convenient, and reliable deployment method was developed through extensive practical exploration, revision, and validation, with reference to online resources. This method provides the intelligent agent with fundamental capabilities such as memory, reasoning, and decision-making (see Figure 3).
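As an illustration of this deployment route, a minimal sketch is given below; it assumes Ollama is installed, the Qwen2:7B weights have been pulled, and the official `ollama` Python client is available. The model tag and prompt are illustrative:

```python
# Prerequisites (shell), assuming a local Ollama installation:
#   ollama pull qwen2:7b     # download the model weights
#   ollama serve             # start the local inference server if not running

import ollama  # official Ollama Python client

# Send a chat request to the locally served model.
response = ollama.chat(
    model="qwen2:7b",
    messages=[{"role": "user", "content": "List common signs of a natural gas leak."}],
)
print(response["message"]["content"])
```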
3.2.2. Model Encapsulation and Integration
The large model is then encapsulated and integrated: it is converted into formats supported by the target frameworks (such as LangChain), relevant parameters and options are configured, and the encapsulated model is integrated into the application service interface, achieving seamless interoperation between the model and the application.
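A minimal sketch of this encapsulation, assuming the langchain-community package and a running Ollama server, might look as follows:

```python
# Wrap the locally deployed model as a LangChain LLM object.
from langchain_community.llms import Ollama

llm = Ollama(
    model="qwen2:7b",   # the deployed base model
    temperature=0.1,    # low temperature for more deterministic diagnostics
)

# The wrapped model can now be dropped into any LangChain chain or agent.
print(llm.invoke("Summarize the main hazards of a valve chamber gas leak."))
```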
3.2.3. Performance Optimization
To enhance the operational efficiency and stability of the large model, it is necessary to optimize the performance of the interface. This includes optimizing the model inference process, such as reducing redundant computations and optimizing memory usage, to ensure the efficient and stable operation of the model.
3.3. Design of Industry Model for Natural Gas Valve Chamber Leakage Detection
Currently, leak detection in natural gas valve chambers relies primarily on traditional methods such as manual inspections, combustible gas detectors, and smoke alarms, which suffer from issues such as limited detection range and poor reliability [33,34]. This paper proposes a leak detection method based on AI acoustics and infrared temperature measurement. By collecting on-site sound and temperature data, the method utilizes neural network structures and machine learning algorithms to integrate the two diagnostic modalities. The collection and analysis of acoustic features and ambient temperature variations provide a broad detection range and sensitivity to minor leaks, significantly improving the accuracy and stability of leak detection [35].
3.3.1. Design of Leakage Detection Model Based on AI Acoustics
The AI acoustics-based leak detection model mainly comprises the selection of a pre-trained model, sample data collection, audio feature extraction, and retraining of the leak diagnosis model.
- (1) Pre-trained model selection
In this paper, we selected panns_cnn14 as our pre-trained model; it was trained on the Google AudioSet audio dataset, a large-scale dataset released by Google containing an ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10 s sound clips sourced from YouTube videos [36,37]. The panns_cnn14 pre-trained model serves as the foundation for subsequent model training, enhancing the model’s generalization performance while enabling the sound classification model to be retrained with only a small amount of industrially simulated leak sound data.
- (2) Sample data collection
A certain natural gas valve chamber was selected as the pilot site, where five explosion-proof microphones and two explosion-proof infrared thermal imaging cameras were installed in areas prone to leaks, as shown in Figure 4. The microphones capture real-time ambient sound, while the infrared thermal imaging cameras continuously track the temperature of specific areas. An industrial venting experiment was conducted by operating the site’s vent valve to simulate leak scenarios and collect leak sound samples under various operating conditions, with different degrees of valve opening simulating varying levels of leakage. Additionally, sound data from non-leak scenarios were collected for comparison.
- (3) Audio feature extraction
Audio is essentially a time-domain signal; for ease of analysis, the industry generally converts the time-domain characteristics of audio into frequency-domain features [38,39]. The Mel-frequency cepstral coefficient (MFCC) algorithm is one of the most commonly used speech feature extraction algorithms; it is derived by applying a linear transformation to the log energy spectrum on a nonlinear Mel scale across the sound frequency range. The logfBank feature extraction algorithm is similar to the MFCC algorithm, as both process the results of fBank feature extraction [40,41]; the primary distinction between the two is whether a discrete cosine transform is applied afterward. The logfBank algorithm has relatively lower computational requirements and retains higher feature correlation [42,43]. Therefore, this paper adopts the logfBank audio feature extraction method (see the sketch following this list).
- (4) Model retraining
Deep learning methods can overcome the limitations of feature dimensionality through more flexible network architectures and deeper network layers, extracting higher-level sound features with greater accuracy and thereby improving classification metrics. The processing flow is illustrated in Figure 5.
Using panns_cnn14 as the backbone to extract deep sound features, the Sound Classifier adds a downstream classification network to classify the input audio. The time-domain audio signal undergoes a short-time Fourier transform to extract spectral features, which are then processed through the multi-layer convolutional neural network structure of panns_cnn14 to obtain an audio feature matrix. This matrix serves as the input to linear classification layers or neural network structures such as LSTM, enabling the training of a binary audio classification model.
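A condensed sketch of the feature extraction and retraining pipeline is given below. It is illustrative rather than the production code: librosa’s log-mel spectrogram stands in for the logfBank features, and the backbone is assumed to map each clip to a 2048-dimensional embedding, as the public Cnn14 checkpoint does:

```python
# Illustrative sketch: logfBank-style features plus a downstream classifier.
import librosa
import numpy as np
import torch
import torch.nn as nn

def logfbank_features(path: str, sr: int = 8000, n_mels: int = 64) -> np.ndarray:
    """Approximate logfBank features as a log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)                       # 5 s, 8 kHz clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)                        # log filterbank energies

class SoundClassifier(nn.Module):
    """Binary (leak / no-leak) head on top of a frozen pre-trained backbone."""

    def __init__(self, backbone: nn.Module, emb_dim: int = 2048):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():               # freeze pre-trained weights
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, 2)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.backbone(x)                             # audio feature embedding
        return self.head(emb)                              # leak / no-leak logits
```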
3.3.2. Design of a Leakage Detection Model Based on Infrared Temperature Measurement
When a leak occurs in a natural gas valve chamber, the vaporization of natural gas causes a rapid drop in the ambient temperature. Therefore, by continuously monitoring the ambient temperature in real time and comparing temperature changes, a leak can be detected [44].
This article adopts DTW (dynamic time warping) temperature leakage detection, whose basic idea is to calculate the similarity between the temperature changes during the test period and time series of temperature changes recorded during leakage and then compare it with a specific threshold to determine whether there is a leak. The principle of DTW is to calculate the distance between the test template T and the standard template R. Since the lengths of the two templates may differ, many matching relationships are possible, and the matching path with the shortest cumulative distance must be found. Assume the following constraints: moving from grid point (i − 1, j − 1), (i − 1, j), or (i, j − 1) to the next grid point (i, j) incurs a distance of d(i, j) for a horizontal or vertical step and 2d(i, j) for a diagonal step. The resulting recurrence is as follows:

g(i, j) = min{g(i − 1, j) + d(i, j), g(i, j − 1) + d(i, j), g(i − 1, j − 1) + 2d(i, j)}  (1)
Here, g(i, j) denotes the cumulative distance when the two templates, matched sequentially from their starting components, have been matched up to component i of R and component j of T. At each step, the minimum is taken over the previous cumulative results after adding d(i, j) (for a horizontal or vertical step) or 2d(i, j) (for a diagonal step).
All matching steps are labeled, and the distance between the two templates can be computed through recursive calculation. The shortest-distance path can then be identified by backtracking along the direction of the arrows (indicated by the blue line), as shown in Figure 6.
The resulting shortest-path value can be normalized to serve as an approximate measure of the similarity between the two time series.
The DTW algorithm is applied to compute the similarity of temperature change time series. By collecting temperature change sequences during leaks in a specific area across different seasons, times of day, and weather conditions, a sample database of leak temperature changes is constructed. When performing leakage detection on a given temperature segment, the mean similarity between the segment’s temperature change series and the samples recorded under the same working condition is calculated and compared with a predefined threshold to determine whether a leak has occurred.
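The sketch below is a direct NumPy rendering of this procedure: it implements the recurrence in Equation (1), normalizes the shortest-path cost, and applies the mean-similarity threshold test. Function names and threshold handling are illustrative:

```python
import numpy as np

def dtw_distance(t: np.ndarray, r: np.ndarray) -> float:
    """Normalized DTW distance between test template t and sample r (Equation (1))."""
    n, m = len(t), len(r)
    d = np.abs(t[:, None] - r[None, :])        # local distances d(i, j)
    g = np.full((n, m), np.inf)                # cumulative distances g(i, j)
    g[0, 0] = 2 * d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            steps = []
            if i > 0:
                steps.append(g[i - 1, j] + d[i, j])          # vertical step
            if j > 0:
                steps.append(g[i, j - 1] + d[i, j])          # horizontal step
            if i > 0 and j > 0:
                steps.append(g[i - 1, j - 1] + 2 * d[i, j])  # diagonal step
            g[i, j] = min(steps)
    return g[-1, -1] / (n + m)                 # normalized shortest-path cost

def is_leak(test_seq: np.ndarray, sample_library: list, threshold: float) -> bool:
    """Mean DTW distance against same-condition leak samples vs. a threshold."""
    mean_dist = float(np.mean([dtw_distance(test_seq, s) for s in sample_library]))
    return mean_dist < threshold               # smaller distance = more similar
```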
3.3.3. Design of an Integrated Leakage Detection Model Based on AI Listening and Infrared Temperature Measurement
Considering that a single diagnostic method may struggle to achieve complete accuracy and reliability, the integrated audio and infrared temperature diagnostic approach assigns different computational weights to audio leak detection and infrared temperature leak detection, computes the final leak diagnosis probability from them, and compares it with a specific threshold to produce the integrated result. The accuracy weights of audio leak detection and infrared temperature detection sum to 1, with the weight for audio leak detection calculated as shown in Formula (2):

w1 = A1/(A1 + A2)  (2)

In the formula, w1 represents the weight of audio leak detection, while A1 and A2, respectively, represent the accuracies of pure audio leak detection and pure infrared temperature leak detection obtained through extensive experimental testing. On this basis, a large number of comparative experiments covering numerous leak detection scenarios are conducted to determine the specific threshold.
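A minimal sketch of this weighted fusion, under the normalized-accuracy reconstruction of Formula (2) given above and with illustrative names, is:

```python
def fuse_diagnosis(p_audio: float, p_infrared: float,
                   acc_audio: float, acc_infrared: float,
                   threshold: float) -> tuple[float, bool]:
    """Weighted fusion of the two detectors' leak probabilities."""
    w1 = acc_audio / (acc_audio + acc_infrared)   # Formula (2): weights sum to 1
    w2 = 1.0 - w1
    p_s = w1 * p_audio + w2 * p_infrared          # fused leak probability Ps
    return p_s, p_s >= threshold                  # compare with the tuned threshold
```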
3.4. Design of a RAG-Based Natural Gas Valve Chamber Leakage Knowledge Question-and-Answer System
Despite the broad understanding of the world that LLMs possess, they have limitations. The training of these models requires a significant amount of time and relies on past data, resulting in a limitation in terms of timeliness. Furthermore, although LLMs can understand general facts from the internet, they often lack knowledge of specific fields or proprietary data from enterprises, which is vital for building AI-based applications.
Before the emergence of LLMs, fine-tuning was a common method for enhancing the model capabilities. However, as the scale of models and the amount of training data have increased, fine-tuning has become increasingly impractical. It not only necessitates a large amount of high-quality data but also consumes significant computational resources and time, making it unaffordable for many individual and enterprise users.
Therefore, researching how to effectively utilize proprietary data to assist LLMs in content generation has become an important topic in both academia and industry. In this context, RAG technology emerged; it was first proposed by Facebook AI Research (FAIR) and its collaborators in 2020, with the purpose of helping models retrieve external information to improve their responses.
Efficiency and capability are extremely important for document retrieval dialogue systems. SimBERT and RAG are two technologies that have attracted considerable attention; each employs a unique approach to enhancing the performance of dialogue systems.
SimBERT transfers knowledge from a large-scale knowledge base into a small BERT model. It retains the power of the BERT model while improving performance on retrieval tasks through simplification and refinement. Its advantage is that it preserves key knowledge through the distillation process while reducing the number of parameters and improving operational efficiency. SimBERT’s answer generation process is more straightforward, but the accuracy and depth of its answers are limited.
RAG aims to solve the problem of knowledge acquisition in dialogue systems by combining retrieval and generation strategies. It first uses a retrieval model to find the most relevant knowledge and then a generative model to construct the response. The advantage of this approach is that it can exploit a large pre-stored knowledge base, combining different attention mechanisms to understand the key information in the text and generate more accurate and in-depth responses.
A detailed comparative analysis of the two technologies was conducted across three aspects:
- (1) Question retrieval: owing to its powerful retrieval ability, RAG can quickly and accurately find the information users need from large volumes of data and then generate more accurate and in-depth answers through the combination of retrieval and generation. In a comparison over 100 Chinese standard question-and-answer pairs on leakage detection, its accuracy was about 15% higher than that of the traditional SimBERT technology.
- (2) Generative reasoning: RAG handles generative tasks by composing answers, allowing the model to produce entirely new answers rather than merely matching pre-existing options. As a result, its Chinese understanding is significantly better than that of the SimBERT technology. In 10 sets of Chinese complex reasoning quizzes on leakage detection, RAG’s reasoning accuracy was 80%, while SimBERT lagged far behind at less than 10%.
- (3) Stability under complex queries: thanks to its efficient understanding of document context and its adaptability, RAG can handle large-scale data and maintain stable, efficient service even under high load. In the 10 sets of Chinese complex reasoning quizzes on leakage detection, its retrieval interruption rate was close to 0. SimBERT’s document understanding is far from adequate, requiring constant retrieval, iteration, and optimization of document content; its retrieval interruption rate was close to 20%.
The core idea of RAG is to feed the user’s question into a retrieval system, which first searches a knowledge base for the passages or texts most relevant to the question. These texts, together with the original question, are then fed to the large model, which synthesizes the information to generate a more targeted response. A knowledge question-and-answer system based on RAG consists of the following key steps [45]:
- (1) Build a Facebook AI Similarity Search (FAISS) knowledge base index from external data: convert documents in the knowledge base into vector indexes for similarity matching and querying.
FAISS is designed to tackle large-scale similarity search and clustering problems and is widely used in various applications requiring efficient similarity search and clustering. The index component is responsible for creating the vector index to expedite query processing. The creation of the index can utilize mature search engine technologies to improve both query efficiency and accuracy.
- (2) Retrieve content related to the user’s question from the knowledge base: when a question is received, quickly search the knowledge base for the most relevant passages based on the vector index.
The retriever component locates relevant technical solutions or answers based on the leakage detection results: when the system receives these results, the retriever searches the knowledge base for relevant technical solutions or responses and returns them to the front-end interface for the user’s reference.
- (3) Generate enhanced responses based on the retrieved content: both the retrieval results and the original question are fed into an LLM to generate the response.
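A compact sketch of these three steps using LangChain, FAISS, and the Ollama-served Qwen2:7B is shown below. The file name is a placeholder (a single text file stands in for the Word document corpus), the default embedding model is an assumption, and module paths follow langchain-community conventions:

```python
# Compact sketch of the three RAG steps described above.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# (1) Build the FAISS index from the knowledge base documents.
docs = TextLoader("leak_handling_manual.txt", encoding="utf-8").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
index = FAISS.from_documents(chunks, HuggingFaceEmbeddings())

# (2) Retrieve the passages most relevant to the user's question.
question = "How should a general-level leak in a valve chamber be handled?"
passages = index.as_retriever(search_kwargs={"k": 4}).invoke(question)

# (3) Feed the retrieved context plus the question to the LLM.
context = "\n\n".join(p.page_content for p in passages)
llm = Ollama(model="qwen2:7b")
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))
```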
Compared to knowledge question-answering systems that rely solely on the large model, those based on RAG technology have the advantage of providing relevant external information to the large model, significantly improving the quality of responses and making them more targeted and aligned with situational needs. At the same time, retrieving only the most pertinent passages instead of entire documents can reduce input volume and improve efficiency. The knowledge base is primarily constructed based on a series of specialized documents. During the research process, challenges arose in document retrieval due to the complexity of the content, strong interconnections between documents, and the presence of overlapping sections. This article also attempts to optimize the knowledge base structure by standardizing naming and categorization, extracting key information, analyzing relationships, and conducting regular maintenance and updates in order to enhance the efficiency of the knowledge base and the effectiveness of retrieval when addressing pipeline leakage issues.
The large model induction generator is the final component of the system; it generates answers by combining the full query input (with or without the leakage detection results) and leveraging RAG technology. Developed on the basis of an open-source large model, it produces accurate, comprehensive, and in-depth answers by thoroughly understanding the user’s query intent and contextual information.
4. Construction and Application of an Intelligent Agent for Natural Gas Valve Chamber Leakage Detection
4.1. Testing of General Large Language Model Capabilities
In this era of information explosion, research on open-source LLMs is a hotspot in the field of AI. Qwen and LLaMa currently have the most complete ecosystems among Chinese and English open-source models, respectively: Qwen has more than 200,000 ecologically derived models, while LLaMa has spawned a series of excellent derivatives, such as Alpaca, Vicuna, and LLaVa. All of these models have achieved remarkable success in dialogue, multimodality, open-domain Q&A, and so forth.
The performance of the Qwen2:7B, Qwen2:72B, LLaMa2:7B, and LLaMa3:8B models in terms of Chinese understanding, retrieval accuracy, response time, and hardware resource requirements was comprehensively analyzed through a test of 100 sets of Chinese standard Q&A pairs on leakage detection; test details can be found in Table 2. The results show that Qwen2:7B offers greater advantages and higher comprehensive value: its Chinese understanding and retrieval accuracy are much higher than those of LLaMa2:7B and LLaMa3:8B, while its response time and hardware resource requirements are much lower than those of Qwen2:72B. Therefore, Qwen2:7B is the most competitive and cost-effective choice for this research.
4.2. Construction of a Specialized Model for Natural Gas Valve Chamber Leak Detection
The comprehensive diagnostic model based on AI auscultation and infrared temperature measurement serves as the key tool for the intelligent leak detection function. Its construction proceeds through the following steps: data collection, data processing, construction of the audio diagnosis model, construction of the infrared temperature measurement model, and construction of the integrated audio and infrared temperature measurement model.
4.2.1. Data Collection
Considering that the audio and infrared temperature measurements may vary significantly during leaks in the valve chamber due to seasonal changes, different times of the day, and varying weather conditions, such as rain, snow, wind, and sand, it is necessary to conduct multiple sets of industrial sampling under different conditions.
In the valve chamber of a trunk pipeline section within a long-distance transmission pipeline system, the discharge valve at each of three points (the pressure transmitter, the local pressure gauge, and the gas–liquid linkage valve) is opened 1/10, 1/8, and 1/6 of a turn to simulate leakage under small, medium, and large degrees of opening. Infrared thermal imaging cameras and microphones collect on-site data from the valve chamber. At each point, the discharge lasts 3 min under each degree of opening, so each discharge operation yields a total of nine data samples, each around 3 min long.
4.2.2. Data Processing
Audio and image data are collected from the valve chamber environment every 5 s and compiled into a data packet for storage in a database. The stored audio data undergo multiple processing steps, such as noise reduction, speech enhancement, and format conversion, resulting in 5 s audio files in WAV format with a sampling rate of 8000 Hz. As for the temperature data, since the infrared thermal imager captures images at specific moments, OCR technology is used to read the temperature values in specific areas of the images, including the maximum, minimum, and average values. An illustrative sketch of this processing cycle is given below.
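In the sketch, denoising and enhancement are omitted, and the OCR crop region and file paths are placeholders rather than the deployed values:

```python
# Illustrative 5 s processing cycle for one audio clip and one thermal frame.
import librosa
import pytesseract
import soundfile as sf
from PIL import Image

def process_audio(in_path: str, out_path: str) -> None:
    """Resample a captured clip to 8 kHz and store it as a WAV file."""
    y, _ = librosa.load(in_path, sr=8000)
    sf.write(out_path, y, 8000, format="WAV")

def read_temperatures(image_path: str, region=(0, 0, 200, 60)) -> list[float]:
    """OCR the max/min/average temperatures overlaid on the thermal frame."""
    panel = Image.open(image_path).crop(region)        # crop the overlay corner
    text = pytesseract.image_to_string(panel, config="--psm 6")
    values = []
    for token in text.split():
        try:
            values.append(float(token))                # keep numeric readings only
        except ValueError:
            continue
    return values
```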
4.2.3. Construction of Audio Diagnostic Model
A total of 17,607 leakage audio files and 6791 thermal imaging pictures were acquired through 8 groups of industrial venting experiments, covering conditions such as different seasons, different times of day, different natural weather, and different degrees of valve opening. The detailed data collection is shown in Table 3.
Eight leakage audio samples collected during industrial venting were selected as positive samples, and eight non-leakage audio samples recorded on the same days during the experiments were chosen as negative samples. On this basis, the pre-trained sound classification model panns_cnn14 was retrained and adjusted.
4.2.4. Construction of Infrared Temperature Measurement Model
By collating the temperature variation data under different opening conditions during all venting operations, a sample library of temperature change curves for valve chamber leakage has been constructed. When predicting an unknown temperature change curve segment, it is simply necessary to calculate the similarity between the temperature change to be tested and all samples in the temperature change curve sample library (or the temperature change curves within a specific range) and then compare it with a threshold to determine whether a leak has occurred.
As shown in Figure 7, during the eight sets of vent sampling in the valve chamber, the change in ambient temperature was recorded over the 2 min venting period at vent point 1 under the small opening condition. When the industrial venting samples are sufficiently numerous and cover various seasons, time periods, weather conditions, and opening conditions, any temperature change sequence under test can be judged to indicate a leak if its curve shows high similarity to curves in the sample database.
4.2.5. Construction of a Comprehensive Model for Audio and Infrared Temperature Measurement
The weights of the audio and infrared temperature leak detection results are obtained by normalizing the accuracy values of the two single detection methods and then conducting a large number of leak detection tests on this basis. The experiments found that ambient temperature has a significant impact on the accuracy of infrared temperature diagnosis: when the ambient temperature is below −18 °C, a natural gas leak causes no detectable change in ambient temperature, and only audio diagnosis can be used. Wind speed affects the microphone’s ability to capture leak audio: when the wind speed exceeds 13.8 m/s, the leak sound becomes harder to distinguish in the collected audio and the accuracy of audio diagnosis drops significantly, and when the wind speed exceeds 17.1 m/s, any diagnostic (or acquisition) device fails. For convenience, temperatures below −18 °C, from −18 °C to 0 °C, from 0 °C to 33 °C, and above 33 °C are defined as Level I, II, III, and IV temperatures, respectively, and wind speeds below 13.8 m/s, from 13.8 m/s to 17.1 m/s, and above 17.1 m/s are defined as Level I, II, and III wind speeds. In addition, at low wind speeds, the accuracy of infrared and audio diagnosis generally varies with temperature in a segmented manner. By sorting the test data, the weights of audio and infrared diagnosis under different temperature conditions were determined, as shown in Table 4.
In the table, Sw represents the wind speed, Ta represents the average temperature in the infrared temperature measurement area, and p1 and p2 represent the probability values of audio leak detection and infrared temperature leak detection, respectively.
To validate the detection effectiveness of the comprehensive diagnostic model in real production environments, venting operations were conducted at the vent points in the valve chamber at various times and under different weather conditions, and the venting samples were then verified. Out of 1154 venting tests in total, 1052 leaks were detected, an accuracy rate of 91.2%. The detailed test results are shown in Table 5.
The leakage detection of natural gas pipelines or valve chambers mainly includes pressure detection, acoustic detection, ground penetrating radar (GPR), infrared thermal imaging, fiber optic sensing, and other methods. A qualitative analysis can be summarized as follows: pressure detection can monitor pressure changes inside the pipeline in real time and is very sensitive to sudden pressure drops, but it is easily affected by fluctuations in the pipeline medium and responds slowly to slow leaks. Acoustic detection captures the signals generated by leaks and is applicable to various types of pipelines, but its results may be affected by environmental noise. GPR can detect the condition and potential leak locations of buried pipelines, but certain soil types (with high moisture content) may degrade its results. Infrared thermal imaging offers a large detection range but is ineffective when the environmental temperature difference is small and is strongly affected by weather. Fiber optic sensing provides continuous monitoring and is sensitive to small changes, but it entails high installation costs and significant equipment investment. Integrating the two methods of audio-based and infrared-based leak diagnosis can effectively reduce the probability of false positives through multimodal data analysis and correction.
4.3. Construction of an Intelligent Question-and-Answer System for Gas Valve Chamber Leakage Disposal Scheme Based on RAG
Based on the open-source large model Qwen2:7B, this article designs and constructs an intelligent question-answering system built on RAG technology. It includes a front-end interface that receives multimodal data collected from the external environment and generates responses, as well as a specialized knowledge base leveraging RAG technology. By building a specialized knowledge base for leakage scenarios, the quality and accuracy of intelligent question-answering can be significantly improved. The main design elements are as follows:
The specialized knowledge base is primarily constructed from 100 professional Word documents, covering aspects such as leak solutions, leak identification standards, and the main roles of different processes. The document content is complex, the documents are strongly interconnected, and substantial overlapping information exists across different files.
Therefore, to overcome these difficulties and quickly convert the documents in the knowledge base into vector indexes, text embeddings stored in a vector database must be used to carry out similarity matching and querying.
For the retrieval mechanism, this paper adopts the FAISS vector database, which is designed to solve large-scale similarity search and clustering problems. The EnsembleRetriever module in the LangChain framework is utilized to enhance retrieval accuracy and relevance by combining the results of multiple retrievers in a hybrid search. The module features multi-retriever integration, reciprocal rank fusion, and the combination of algorithmic strengths, accelerating queries and improving their efficiency and accuracy, as sketched below.
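The sketch reuses the `chunks` and `index` objects from the indexing sketch in Section 3.4; the retriever weights are illustrative:

```python
# Hybrid retrieval sketch: lexical (BM25) plus semantic (FAISS) retrievers.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(chunks)            # lexical (keyword) matching
bm25.k = 4
vector = index.as_retriever(search_kwargs={"k": 4})    # semantic matching via FAISS

hybrid = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # results fused by reciprocal rank fusion
)
results = hybrid.invoke("valve chamber leakage disposal procedure")
```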
In the retrieval process, an LLM induction generator is added; it combines the full query with the retrieved content and generates the answer using retrieval-augmented generation. Developed on the basis of the open-source LLM, it produces accurate, comprehensive, and in-depth answers by deeply understanding the user’s query intent and contextual information, markedly improving the efficiency of knowledge base use and the retrieval effect when dealing with valve chamber leakage problems.
However, owing to the limitations of RAG, the hallucination problem of large models also appeared in actual tests. For example, a question about the main ways of responding to oil and gas production leakages produced a large number of redundant and irrelevant responses, such as non-relevant post-action measures like data recording and analysis or on-site verification, as shown in Figure 8.
4.4. Application Verification of Intelligent Agents for Natural Gas Valve Chamber Leakage Detection
The integrated diagnostic model based on audio auscultation and infrared temperature measurement is used as the diagnostic tool, and the RAG-based intelligent question-answering system serves as the dialogue generation tool. By entering audio and infrared temperature data reflecting the status of the valve chamber into the front-end interface, the leak detection agent can diagnose whether a leak has occurred while providing corresponding solutions and assigning tasks or issuing instructions.
The standard for pipeline leakage levels is established on the basis of standards such as the “Code for Integrity Management of Oil and Gas Transmission Pipelines” [46], combined with domain experts’ assessment of the specific pipeline operations. A leakage rate below 0.1 kg/s is classified as suspected leakage; from 0.1 to 1 kg/s, as microleakage; from 1 to 10 kg/s, as general leakage; and above 10 kg/s, as severe leakage.
A total of 6560 data samples were selected across five different leakage levels to calculate the threshold Ps according to Table 4, and the probability range of each leakage level was obtained through statistical probability analysis. For example, the comprehensive audio and infrared diagnosis was used to calculate the leakage probability Ps of 1678 microleakage samples, of which nearly 97% (1626 samples) had a diagnostic probability Ps between 0.8 and 0.85. The probability thresholds for the other leakage levels were calculated in the same way. The threshold values for the different leakage levels are detailed in Table 6, and an illustrative mapping is sketched below.
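Only the microleakage band (0.80 to 0.85) is quoted in the text, so the remaining boundaries in the sketch are placeholders standing in for the values in Table 6:

```python
def leakage_level(p_s: float) -> str:
    """Illustrative mapping from fused probability Ps to a leakage level."""
    if p_s < 0.80:
        return "suspected leakage"   # placeholder boundary
    if p_s < 0.85:
        return "microleakage"        # 0.80 <= Ps < 0.85, per the text
    if p_s < 0.95:
        return "general leakage"     # placeholder boundary
    return "severe leakage"          # placeholder boundary
```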
The diagnostic response of the valve chamber leak detection agent is shown in Figure 9. When a compressed file containing audio and images is input, the agent utilizes the comprehensive audio and infrared temperature measurement model to diagnose whether a leak has occurred, determine the leak level, and provide corresponding solutions and task assignments.
Using the agent to call the small models for various leakage detection tests, the accuracy of diagnosing leaks of different levels reached 90.4%. The detailed test results are shown in Table 7.
The intelligent agent for valve chamber leakage detection, based on RAG technology, is illustrated in Figure 10. The system provides diagnostic results and corresponding treatment plans for the input files (audio and infrared images). Through the front-end input box, users can engage in intelligent Q&A with the agent, posing follow-up questions about the leakage diagnosis results and treatment plans. Multiple tests indicate that the agent’s responses are accurate and professional, demonstrating strong knowledge retrieval and summarization capabilities.