1. Introduction
In oil field development, the production capacity of a single well or an entire field is crucial to economical and efficient development. The production capacity of a field is determined by the output of its wells, so accurately predicting well production has long been a key research focus in the oil industry [1]. Production forecasts allow researchers to develop more rational extraction plans, ensure scientific production and optimize production processes, and they have significant implications for the coordinated development of the oil and gas industry and for enterprise investment decisions [2].
Traditional oil well production-prediction methods fall into two categories: mathematical analysis and numerical simulation. Mathematical analysis is a statistical approach based mainly on historical reservoir development trends, as in the Decline Curve Analysis and Water Drive Decline Curve methods, and these techniques have been widely used in oilfield production [3,4,5]. However, such curve fitting rests on many assumptions and on empirical experience, whereas oil well production data are often highly nonlinear, which these statistical methods struggle to handle. As a result, they typically adapt poorly and achieve low accuracy [6]. Numerical simulation is currently the most commonly used approach to predicting oil production [7,8,9]. It is based on an understanding of the actual flow process in the underground porous medium, uses physics-driven analysis and takes more factors into account, and therefore yields more objective predictions. Nonetheless, it depends on building a geologic reservoir model, a laborious and time-consuming process that includes developing the geologic model, numerical modeling and history matching. Each model must be of high quality to predict oil well production accurately.
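To make the curve-fitting idea concrete, the following minimal sketch evaluates an Arps-type decline curve, one common form of Decline Curve Analysis. The text does not say which decline model the cited works use, so the choice of the Arps family, the function name `arps_rate` and all parameter values are illustrative assumptions.

```python
import math


def arps_rate(t, q_i, d_i, b=0.0):
    """Production rate at time t under an Arps decline curve.

    q_i: initial rate; d_i: initial decline rate (1/time);
    b: decline exponent (0 = exponential, 1 = harmonic, otherwise hyperbolic).
    """
    if b == 0.0:
        # Exponential decline is the limiting case of the hyperbolic form.
        return q_i * math.exp(-d_i * t)
    return q_i / (1.0 + b * d_i * t) ** (1.0 / b)


# Hypothetical well: initial rate 100 units/day, decline rate 0.10 per month.
for month in (0, 6, 12):
    print(month, round(arps_rate(month, q_i=100.0, d_i=0.10, b=0.5), 2))
```

The curve has only three parameters and a fixed monotone shape, which illustrates why such fits adapt poorly to production histories with strongly nonlinear behavior.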
Machine learning, the mainstream direction of artificial intelligence, and in particular the rapid progress of deep learning, has brought new technologies and methods to guide oilfield production. In recent years, a growing number of petroleum enterprises have been building smart oilfields with machine learning methods to improve quality and efficiency. Machine learning has been applied to a number of oil-related tasks with good results [10], such as seismic interpretation [11,12], pipeline leakage [13,14] and drilling-optimization analysis [15,16]. In oil well production prediction, many traditional machine learning methods (such as support vector machines, artificial neural networks and random forests) have been widely applied [17,18,19], but most treat the task as regression and pay little attention to the time series characteristics of production data. Production data form a typical time series, influenced by many internal and external factors. Consequently, many researchers use time series methods to predict future oil well production. Temporal Convolutional Networks (TCNs) [20] and Recurrent Neural Networks (RNNs) [21], specifically Long Short-Term Memory (LSTM) [22] and Gated Recurrent Unit (GRU) [23] networks, have become prominent in production prediction. Zhang et al. [24] proposed using a TCN to predict the production of wells in water-flooding oilfields, achieving high prediction accuracy. Indrajeet Kumar et al. [25] integrated the LSTM algorithm with a genetic algorithm (GA) to predict oil well production, demonstrating that the GA-LSTM model surpasses other production-prediction models. Raghad et al. [26] proposed a GRU model for petroleum production prediction that features a low-complexity architecture and the ability to track long-interval time series data, demonstrating excellent performance.
Although machine learning methods have made significant progress in oil well production prediction, there is still room for improvement. Standalone RNNs show low accuracy in time series prediction and struggle to fully capture the characteristics of the input data. To enhance prediction accuracy, researchers have combined RNNs with other methods to build hybrid time series models. Convolutional neural networks (CNNs) [27] are among the most representative network structures in deep learning and excel at feature extraction and information mining. Zhang Lei et al. [28] introduced the CNN-LSTM method to predict oil well production, with the model’s efficacy validated in actual reservoirs. Chen et al. [29] introduced the CNN-GRU method to predict oil well production and reported that the proposed CNN-GRU model outperforms other prediction approaches. Recently, the Kolmogorov–Arnold Network (KAN) has offered a novel approach to time series prediction. Xu et al. [30] applied KAN to time series analysis, demonstrating its ability to capture complex temporal dependencies. Building on this work, this study proposes a GRU-KAN hybrid model for oil well production prediction, in which the GRU captures long-term and short-term dependencies in the time series and KAN, through its flexible nonlinear feature extraction, captures the nonlinear relationships in oil well production that arise from multiple complex factors. Compared with other models, this combination not only achieves better prediction accuracy but also reduces training time and improves computational efficiency, demonstrating potential for application in the oil industry. The contributions of this study are as follows:
- (1) We propose the GRU-KAN model to address challenges in oil well production prediction and build a prediction model for each well using GRU-KAN.
- (2) To evaluate the effectiveness of the GRU-KAN model, we compare it with popular single and hybrid machine learning models, including LSTM, GRU, TCN, Bidirectional LSTM (BiLSTM) [31], Bidirectional GRU (BiGRU) [23], CNN-LSTM and CNN-GRU. The evaluation results indicate that GRU-KAN outperforms the other methods in both prediction accuracy and generalization ability.
- (3) We also compare the effects of different data-imputation methods and hyperparameter optimization methods on the prediction results; the results show that the MissForest and PSO algorithms outperform traditional methods.
The remainder of this paper is organized as follows. In Section 2, we describe in detail the core algorithms used in this study, including GRU and KAN. We also explain how the Particle Swarm Optimization (PSO) algorithm determines the model’s hyperparameters and introduce the MissForest algorithm, which is employed to handle outlier data. In Section 3, we provide a comprehensive overview of the experiments. First, we discuss the data-processing steps, which cover data cleaning, feature parameter analysis, data normalization, the design of the time window size and dataset splitting. Next, we describe the model training and prediction process, using various evaluation metrics to assess model performance. This section also compares the prediction results of single and hybrid models and examines the impact of different data-imputation and hyperparameter optimization methods on model performance. Finally, we discuss the model’s generalization ability. In Section 4, we summarize the contributions of this study and discuss potential future research directions.
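As a rough illustration of the time-window design and dataset splitting mentioned for Section 3, the sketch below turns a production series into supervised samples and splits them chronologically. The window length, the 80/20 split, the function names and the sample values are assumptions made for illustration and are not taken from the study.

```python
def make_windows(series, window):
    """Turn a 1-D series into (input window, next value) pairs."""
    samples = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        samples.append((inputs, target))
    return samples


def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling so held-out targets come later in time."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical normalized daily production values.
production = [0.90, 0.88, 0.85, 0.86, 0.80, 0.78,
              0.75, 0.74, 0.70, 0.69, 0.65, 0.66]
samples = make_windows(production, window=2)
train, holdout = chronological_split(samples)
print(len(train), len(holdout))
```

Keeping the split chronological, rather than shuffling, avoids evaluating a model on targets that precede some of its training data.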
4. Conclusions
Accurate prediction of oil well production is of great significance for the development and management of oilfields, especially in optimizing production efficiency and reducing operating costs. This study proposes an oil well production-prediction model based on GRU-KAN and validates it using real oilfield data. The MissForest algorithm was also applied for data imputation, which improved data quality. On the test datasets of wells F14 and F12, the GRU-KAN model achieved an RMSE of 11.90, MAE of 9.18, MAPE of 6.0% and R² of 0.95 for well F14, and an RMSE of 45.69, MAE of 36.26, MAPE of 11.02% and R² of 0.94 for well F12. The model’s generalization was also validated using data from wells F1, F11 and F15. In comparison, the CNN-LSTM and CNN-GRU models performed noticeably worse on the same datasets, highlighting the advantage of GRU-KAN in prediction accuracy. Furthermore, the computation time of the GRU-KAN model was nearly half that of CNN-LSTM and CNN-GRU. These results indicate that the GRU-KAN model not only enhances the accuracy of oil well production predictions but also offers higher computational efficiency, making it suitable for real-time predictions in large-scale production environments.
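For readers who want to reproduce the four reported metrics, the sketch below computes RMSE, MAE, MAPE and R² from their standard textbook definitions. The exact formulas used in the study are not shown in this section, so the definitions, the function name `regression_metrics` and the sample values are assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical values, not data from the study.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are expressed in the units of the production data, so they are not directly comparable across wells with different output levels, whereas MAPE and R² are scale-free.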
The broader impact of this research lies in its practical potential for oilfield production optimization. The model provides a more efficient and accurate solution for oil well production prediction and also offers insights into other oil-production-related prediction tasks. In the future, this model could be applied to construct prediction models for other production indicators, enabling multi-index prediction and supporting comprehensive optimization of oilfield production.
Despite these results, this study has limitations. First, the data come primarily from a specific oilfield, and the model has not been widely validated under different geological conditions or in other types of oilfields. Additionally, the model is constructed from single-well data, and its generalization ability needs to be improved. Future research could integrate data from multiple wells to further enhance the model’s robustness and applicability.
The practical significance of this study lies in providing a more reliable prediction tool for oilfield production management, especially when dealing with complex nonlinear data. The GRU-KAN model demonstrates unique advantages in such scenarios. Future research can further optimize this model and explore its application in other production areas (such as hydraulic fracturing and well pressure monitoring) to address more challenges in real-world production.