1. Introduction
Swab and surge pressures are generated when the drill string is tripped out of and into the wellbore, respectively. The magnitude of the resulting pressure fluctuation, negative for swab and positive for surge, depends on, among other factors, the tripping speed. An unoptimized tripping speed may induce wellbore instability, leading to well collapse or formation fracturing, as well as well control issues, which increase the nonproductive time (NPT) and hence the overall drilling cost. The precise prediction of well pressures is crucial, especially in deep-water and horizontal drilling, where a narrow well stability margin poses significant challenges. In addition to tripping speed, fluid properties, well geometry, pipe eccentricity, and flow rates also control the well pressure. Moreover, predicting the optimum maximum and minimum tripping speeds reduces undesired nonproductive time.
Over the years, researchers have conducted numerous studies of swab and surge phenomena during tripping in and out of the wellbore. They have conducted several experiments and also developed various models based on assumptions such as steady-state and transient conditions, non-slip at the wall, different flow scenarios, fluid rheological properties, well configurations, and operational parameters. Amir et al. (2022) [1,2] extensively reviewed existing swab and surge models, including contributions from Burkhardt (1961) [3], Schuh (1964) [4], Fontenot and Clark (1974) [5], Mitchell (1988) [6], Ahmed (2008) [7], Crespo (2010) [8], Srivastav (2012) [9], Gjerstad (2013) [10], Tang (2016) [11], Fredy (2012) [12], Erge (2015) [13], He (2016) [14], Evren M. (2018) [15], Ettehadi (2018) [16], Shwetank (2020) [17,18], Zakarya (2021) [19], and Amir et al. (2023) [20]. However, these models did not consider all the parameters that affect the swab and surge, and their applicability to estimate experimental data is limited to the specific assumptions and setup conditions.
During drilling and tripping operations, the well pressure is normally determined by the hydrostatic pressure and by the pressure loss due to fluid flow. The equivalent circulating density (ECD) in specific gravity (sg) is given as (Mitchel et al., 2011) [21]:
$$\mathrm{ECD} = \rho_{static} + \frac{\Delta P_{annulus}}{0.0981 \cdot \mathrm{TVD}}$$
where $\Delta P_{annulus}$ is the pressure loss (bar) in the annulus due to fluid flow, $\rho_{static}$ is the static drilling fluid density (sg), and TVD is the true vertical depth (m).
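The ECD relation can be illustrated with a short calculation. This is a minimal sketch, assuming the common form ECD = ρ_static + ΔP_annulus / (0.0981 · TVD) with pressure in bar, depth in m, and density in sg; the function name and the numerical inputs are illustrative and are not taken from the field dataset.

```python
# Assumed conversion: a fluid of 1 sg exerts about 0.0981 bar per metre of vertical depth.
BAR_PER_SG_METRE = 0.0981


def equivalent_circulating_density(rho_static_sg, dp_annulus_bar, tvd_m):
    """Return ECD (sg) from static density (sg), annular pressure loss (bar), and TVD (m)."""
    if tvd_m <= 0:
        raise ValueError("TVD must be positive")
    return rho_static_sg + dp_annulus_bar / (BAR_PER_SG_METRE * tvd_m)


# Illustrative values only.
ecd = equivalent_circulating_density(rho_static_sg=1.50, dp_annulus_bar=9.81, tvd_m=2000.0)
print(f"ECD = {ecd:.3f} sg")
```

In this example the annular friction adds 0.05 sg to the static mud weight, which shows how a modest pressure loss can matter when the stability margin is narrow.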
The pump pressure is also determined from the pressure losses across the circulation flowlines. The pressure loss in the annulus is given as (Mitchel et al., 2011) [21]:
$$\Delta P_{annulus} = f \, \frac{L}{D_H} \, \frac{\rho V_Q^2}{2}$$
where $V_Q = Q/A$ is the velocity of the fluid flow, $D_H$ is the hydraulic diameter of the annulus ($D_{Well} - D_{Pipe}$), $L$ is the length of the flow line, $\rho$ is the fluid density, and $f$ is the friction factor.
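The annular pressure loss can be sketched in code as follows. The example assumes the Darcy-Weisbach form ΔP = f (L/D_H) ρ V_Q²/2 for a concentric annulus, evaluated in SI units and converted to bar; the function name, the argument names, and the input values are hypothetical and serve only as an illustration.

```python
import math


def annular_pressure_loss_bar(flow_rate_m3s, d_well_m, d_pipe_m, length_m, rho_kgm3, f):
    """Darcy-Weisbach pressure loss (bar) in a concentric annulus (assumed form)."""
    if d_well_m <= d_pipe_m:
        raise ValueError("well diameter must exceed pipe diameter")
    area = math.pi / 4.0 * (d_well_m**2 - d_pipe_m**2)  # annular flow area A
    v_q = flow_rate_m3s / area                           # V_Q = Q / A
    d_h = d_well_m - d_pipe_m                            # hydraulic diameter D_H
    dp_pa = f * (length_m / d_h) * rho_kgm3 * v_q**2 / 2.0
    return dp_pa / 1.0e5                                 # Pa -> bar


# Illustrative values: the flow rate is chosen so that V_Q is 1 m/s.
q = math.pi / 4.0 * (0.2**2 - 0.1**2)
dp = annular_pressure_loss_bar(q, d_well_m=0.2, d_pipe_m=0.1,
                               length_m=1000.0, rho_kgm3=1000.0, f=0.02)
print(f"Annular pressure loss = {dp:.3f} bar")
```

Because the loss scales with the square of the velocity, a small change in the annular clearance or flow rate has a disproportionate effect on the result.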
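The friction factor f is a function of the Reynolds number and the surface roughness, and is given by Haaland (1983) [22]:
$$\frac{1}{\sqrt{f}} = -1.8 \log_{10}\left[\left(\frac{\varepsilon}{3.7}\right)^{1.11} + \frac{6.9}{Re}\right]$$
where $\varepsilon$ is the relative surface roughness ($\varepsilon = k/D$), $k$ is the surface roughness, $D$ is the diameter of the pipe, and $Re$ is the Reynolds number.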
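As a rough illustration, Haaland's explicit correlation for the Darcy friction factor, 1/√f = −1.8 log10[(ε/3.7)^1.11 + 6.9/Re], can be evaluated as below. The function name and the sample Reynolds number and roughness values are assumptions chosen for demonstration, and the correlation is intended for turbulent flow only.

```python
import math


def haaland_friction_factor(reynolds, rel_roughness):
    """Darcy friction factor from Haaland's explicit approximation (turbulent flow)."""
    if reynolds <= 0:
        raise ValueError("Reynolds number must be positive")
    if rel_roughness < 0:
        raise ValueError("relative roughness must be non-negative")
    inv_sqrt_f = -1.8 * math.log10((rel_roughness / 3.7) ** 1.11 + 6.9 / reynolds)
    return 1.0 / inv_sqrt_f**2


# Illustrative comparison of a smooth and a rough wall at the same Reynolds number.
f_smooth = haaland_friction_factor(reynolds=1.0e5, rel_roughness=0.0)
f_rough = haaland_friction_factor(reynolds=1.0e5, rel_roughness=1.0e-3)
print(f"smooth: {f_smooth:.4f}, rough: {f_rough:.4f}")
```

The rough-wall value is larger than the smooth-wall value, which is the expected trend in the turbulent regime.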
The friction factor is a sensitive parameter, and its prediction is difficult because it varies along the well profile. The theoretical calculation of pressure losses in a wellbore requires knowledge of fluid properties at various temperatures and the shear rates as the fluid flows through each interval of a borehole.
To determine the rheological properties of the drilling fluids and the ECD, there are several models available in the industry’s commercial software. Despite the availability of various mathematical, empirical, and physics-based models currently used in the drilling and well construction sector, incidents related to swab and surge pressures continue to occur. Comparisons of field-measured data with hydraulic well-flow models showed discrepancies, and the models required a calibration factor based on measured data (Lohne et al., 2008) [23]. Simulation studies conducted by Amir et al. (2023) [24] showed that the swab and surge predictions of the models were inconsistent and deviated from each other for the considered experimental setup.
In recent years, data-driven modeling has been employed in various sectors, including petroleum drilling, and several machine learning algorithms are available. For instance, Amir et al. (2022, 2023) [24,25] utilized machine learning techniques (i.e., linear regression, multivariable regression (MVR), Random Forest, ANN, long short-term memory (LSTM), and XGBoost models) to predict the equivalent circulating mud density (ECD) during tripping and drilling operations, and the results showed satisfactory performance.
In the petroleum industry, ANN algorithms have been applied to predict, among others, the ROP (Reda Abdel Azim (2020) [26], Ramin Aliyev (2019) [27]), the ECD (Husam H. Alkinani (2020) [28], Amir et al., 2021 [24,25]), the drilling speed (Ahmad Al-Abduljabbar et al. (2020) [29]), and drilling fluid rheological parameters in real time (Khaled Al-Azani et al. (2018) [30]). In addition, A. Alnmnr (2024) implemented machine learning to investigate swell mitigation [31], RP Ray (2023) studied the importance of data integration in geotechnical engineering [32], and E. Gurina (2022) deployed machine learning techniques to predict dysfunctional events in drilling and wells [33].
A literature review indicates the application of the Group Method of Data Handling (GMDH) technique across diverse fields. GMDH is an extended version of multivariable regression that contains non-linear interacting terms. Among others, GMDH has been utilized for accurate log interval value estimation (Mohammed Ayoub (2014) [34]), permeability prediction by Alvin K. Mulashani (2019) [35] and Lidong Zhao (2023) [36], as well as permeability modeling and pore pressure analysis by Mathew Nkurlu (2020) [37]. Additionally, GMDH finds applications in cement compressive strength design (Edwin E. Nyakilla, 2023 [38]), rock deformation prediction (Li et al., 2020 [39]), bubble point pressure estimation by Fahd Saeed Alakbari (2022) and Mohammad Ayoub (2022) [40,41], gas viscosity determination, CO₂ emission modeling (Rezaei et al., 2020 and 2018 [42,43]), the prediction of CO₂ adsorption by Zhou L. (2019) [44] and Li (2017) [45], forecasting stock indices, and modeling power and torque as demonstrated by Ahmadi (2015) [46] and Gao Guozhong (2023) [47], and the prediction of pore pressure by Mgimba (2023) [48].
The GMDH neural network architecture is not as fully connected as that of the commonly used ANN. The modeling performance of the GMDH network in comparison with that of the ANN model is presented in several publications including André et al., 2012 [49]; Bernard et al., 2020 [50]; Ahmadi et al., 2015 [46]; Rezaei et al., 2015 [43].
This study aimed to implement four machine learning algorithms on field drilling data. The first part compared MVR with GMDH to assess the impact of the nonlinear and interaction terms of GMDH on model prediction. The second part proposed new GMDH-generated features to be used as inputs for fully connected deep-learning (ANN) modeling. Then, the newly proposed GMDH-ANN method was compared with a standard ANN that did not include interaction terms. Finally, empirical models were derived from the field drilling data by using the GMDH and MVR methods.
4. Discussion
Accurately predicting the equivalent circulating density (ECD) during tripping in/out and drilling operations is crucial in ensuring safe and cost-effective well drilling. There are several empirical and physics-based hydraulics models available in the literature [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. However, the application of the models is limited to the considered assumptions and model controlling parameters. Therefore, it is common practice to calibrate the model with a measured dataset (Lohne et al., 2008) [23]. Recent research has focused on data-driven modeling techniques, which have been applied in diverse fields, including the petroleum industry.
This study explored the performance of multivariable regression (MVR), the Group Method of Data Handling (GMDH), a standard Multilayered Perceptron (ANN), and the proposed GMDH-featured ANN model to predict ECD based on drilling parameters.
Before implementing the machine learning algorithms, the field drilling dataset was preprocessed to clean the data and to select the appropriate features. The data used in this study included surface and downhole parameters. Appropriate Python libraries were employed for data preprocessing and feature selection.
The first part of the study compared multivariable regression with the GMDH regression. Multivariable regression relates two or more independent variables to the target variable through a linear combination of the variables. However, input features may vary nonlinearly with the target parameter. Moreover, the input parameters may have an interaction effect on the target parameter. To study these effects, the GMDH regression was considered and compared with the multivariable regression. The GMDH network has been applied in several fields [34,35,36,37,38,39,40,41,42,43,44]. The GMDH algorithm, as described in several references [34,35,46,53], takes multiple inputs, identifies the best combination of two features according to an external criterion, and generates a quadratic polynomial from that pair. Two input features were selected to compare the GMDH method (Equation (8)) with MVR (Equation (1)). A total of 18 experiments were designed (Table 3 in Section 3).
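The pair-selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the quadratic polynomial is taken in the usual two-input form, the external criterion is assumed to be the mean squared error on a held-out subset, and the data are synthetic. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def quadratic_terms(xi, xj):
    """Columns of the two-input quadratic polynomial: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])


def select_best_pair(X_train, y_train, X_val, y_val):
    """Fit one quadratic neuron per feature pair and keep the pair with the lowest validation MSE."""
    best = None
    for i, j in combinations(range(X_train.shape[1]), 2):
        A = quadratic_terms(X_train[:, i], X_train[:, j])
        coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        pred = quadratic_terms(X_val[:, i], X_val[:, j]) @ coeffs
        mse = float(np.mean((y_val - pred) ** 2))  # external criterion (assumed)
        if best is None or mse < best[0]:
            best = (mse, (i, j), coeffs)
    return best


# Synthetic data: the target depends only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] * X[:, 2] + 0.5 * X[:, 2] ** 2
mse, pair, coeffs = select_best_pair(X[:150], y[:150], X[150:], y[150:])
print("selected pair:", pair, "validation MSE:", mse)
```

Evaluating each candidate on data that were not used for fitting is what makes the criterion external, and it discourages the selection of pairs that merely overfit the training rows.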
Figure 5, Figure 6 and Figure 7 offer a comparison of the results obtained from the experiments. The results showed that all the GMDH models achieved a higher R² than the multivariable regression (MVR). This indicated that the nonlinear and interaction terms had a significant effect on the ECD prediction. The degree of model accuracy depended on the correlation of the features with the target variable.
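The effect of the nonlinear and interaction terms on R² can be reproduced on synthetic data. The sketch below assumes a target with a deliberate interaction term and small Gaussian noise, fits a linear (MVR-type) model and a quadratic (GMDH-type) model by ordinary least squares, and scores both on held-out rows. The data, coefficients, and function names are illustrative and unrelated to the field dataset.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mvr_terms(xi, xj):
    """Linear model terms: 1, xi, xj."""
    return np.column_stack([np.ones_like(xi), xi, xj])


def gmdh_terms(xi, xj):
    """Quadratic model terms: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])


def holdout_r2(terms, xi, xj, y, n_train):
    """Fit by least squares on the first n_train rows and score on the remaining rows."""
    A = terms(xi, xj)
    coeffs, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    return r_squared(y[n_train:], A[n_train:] @ coeffs)


rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 1.0, 300)
xj = rng.uniform(0.0, 1.0, 300)
y = 1.2 + 0.3 * xi + 0.1 * xj + 0.8 * xi * xj + rng.normal(0.0, 0.01, 300)

r2_mvr = holdout_r2(mvr_terms, xi, xj, y, n_train=200)
r2_gmdh = holdout_r2(gmdh_terms, xi, xj, y, n_train=200)
print(f"MVR R2 = {r2_mvr:.3f}, GMDH R2 = {r2_gmdh:.3f}")
```

When the interaction coefficient in the synthetic target is set to zero, the two scores become nearly identical, which mirrors the observation that the gain depends on how strongly the extra terms relate to the target.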
The second part of this study involved comparing the performance of the standard ANN and the proposed GMDH-featured ANN. To ensure consistency in the comparison, both the proposed GMDH-featured ANN (Figure 3) and the standard ANN (Figure 1) were built as fully connected networks, unlike the standard GMDH network, where neurons are not fully connected (Figure 2b). The ANN used only two features, whereas the proposed GMDH-featured ANN had five inputs generated from the two selected features. Of the eight experimental designs (Table 4), three showed that the proposed GMDH-featured ANN outperformed the ANN, whereas the remaining five showed the same performance for both models, as shown in Table 5. This could have been due to the insignificant impact of the nonlinear and interacting terms in those designs.
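The expansion of two selected features into five network inputs can be sketched as below. It is assumed here that the five inputs are the non-constant terms of the two-input quadratic polynomial (x_i, x_j, x_i·x_j, x_i², x_j²); the function name and the sample values are hypothetical.

```python
def gmdh_features(xi, xj):
    """Expand two inputs into the five non-constant quadratic terms (assumed feature set)."""
    return [xi, xj, xi * xj, xi**2, xj**2]


# Two illustrative samples, each with two raw features.
raw_samples = [(2.0, 3.0), (0.5, 4.0)]
expanded = [gmdh_features(xi, xj) for xi, xj in raw_samples]

for raw, row in zip(raw_samples, expanded):
    print(raw, "->", row)
```

The expanded rows would then replace the two raw features as inputs to a fully connected network, so that the network receives the interaction and squared terms explicitly rather than having to learn them.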
Table 6 displays the input features of the standard ANN and the modeling results obtained from the three selected experiments, #1, #5, and #8. Table 5 provides a summary of the model results, showing that the proposed GMDH-featured ANN achieved R² values of 0.96, 0.87, and 0.75, respectively, compared with 0.83, 0.78, and 0.69 for the regular ANN. These results indicated that the proposed GMDH-featured ANN had an enhanced performance compared to the regular ANN. However, more experiments need to be performed to examine the model’s performance for other transfer functions, optimizers, and hyperparameters. This will be studied in future work.
Based on the GMDH and MVR, the mathematical models derived from Tests #1, #5, and #8 shown in Table 6 are summarized in Table 7, Table 8 and Table 9. The GMDH model given in Equation (11) is a function of the input data ($x_i$ and $x_j$) and has six coefficients, $a_0$ to $a_5$. Similarly, the MVR model shown in Equation (4) has input data ($x_i$ and $x_j$) with three coefficients, $\beta_0$ to $\beta_2$.
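For orientation, the two model forms can be written out explicitly. The expressions below assume the usual two-input quadratic polynomial for GMDH and a linear form for MVR; the assignment of the individual terms to $a_1$ through $a_5$ is an assumption and may differ from the ordering used in the tables.

```latex
% GMDH: quadratic polynomial in two inputs with six coefficients (term ordering assumed)
y_{GMDH} = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2

% MVR: linear combination of the same two inputs with three coefficients
y_{MVR} = \beta_0 + \beta_1 x_i + \beta_2 x_j
```

Setting $a_3 = a_4 = a_5 = 0$ reduces the quadratic form to the linear one, so the MVR model is a special case of the GMDH polynomial.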
The regression coefficients obtained from the GMDH and MVR were rounded to decimal digits and are presented in Table 7, Table 8 and Table 9. Moreover, during MVR modeling, the p-values of the coefficients in all the experiments were less than 5%. Hence, the MVR models were statistically significant.