1. Introduction
In the process of petroleum exploration and development, accurate prediction of formation pressure has always been an extremely important aspect. In the petroleum industry, there are many methods used to predict formation pressure, which can be broadly divided into two categories: pre-drilling prediction based on seismic exploration data and post-drilling detection using well-logging data.
Currently, in China, the seismic layer velocity obtained from seismic exploration and the logging data obtained after drilling are commonly used as the basis for establishing a system that can quickly understand the geological environment at the drilling site [
1]. However, research on how to predict formation pressure more accurately has been conducted both domestically and internationally. In terms of well logging, Zhao et al. [
2] improved the prediction accuracy by using cluster analysis to separate shale from other lithologies and obtain normal trend line velocities. Xie et al. [
3] effectively calculated formation pore pressure by combining dipole sonic and other conventional well-logging data, achieving good application results. Li et al. [
4] found that the Dc index method and the time difference of sonic waves are suitable for calculating formation pressure in Block A of the Bohai Sea, with a maximum error of 5%, which meets engineering requirements and is of practical significance for on-site drilling operations. Fan et al. [
5] proposed and modified a formation pressure prediction model based on the Eaton method, taking into account the geological influences of the actual area. They established three pressure profiles for the formations in that area, providing important support for subsequent drilling. In terms of seismic exploration, pre-drilling pressure prediction mainly involves calculating formation pore pressure using seismic layer velocity data. Wang et al. [
6] introduced the effective stress principle in porous media to develop a new method for calculating pore fluid pressure in unconventional reservoirs, achieving good application results. Zhang et al. [
7] used the DIX formula to obtain layer velocities based on velocity spectrum data and predicted formation pressure using seismic layer velocities. Yang et al. [
8] combined seismic and well-logging data to establish pressure prediction profiles for deep formations and conducted in-depth analysis of the drilling geological conditions in the study area. Ma [
9] used an improved Fillippone method to predict formation pressure ahead of the drill bit, and the results showed that the method is scientifically effective and meets the requirements of actual field applications. Qian et al. [
10] applied the equivalent medium theory to determine the upper and lower limits of rock velocities in the Fillippone formula, obtaining more reasonable estimation results.
So far, the methods for predicting formation pressure based on seismic data can be broadly classified into two categories. One category relies on seismic velocity spectrum data and uses the Fillippone formula and its modified versions to predict formation pressure. The advantage is the ability to establish the spatial distribution of formation pressure in the underground three-dimensional space. However, this method relies on seismic layer velocities, which are often difficult to obtain accurately. Moreover, when predicting formation pressure, it requires the establishment of an empirical equation between velocity and pressure, resulting in insufficient prediction accuracy of formation pressure in the entire three-dimensional space. The other category involves predicting formation pressure using seismic data under well constraints, including the equivalent depth method and the Eaton formula [
11].
In recent years, the introduction of geostatistical stochastic simulation methods has provided effective tools for reservoir prediction. In particular, the collocated cokriging has strong advantages in integrating well-logging information and effectively utilizing precise well location data. Gao [
12] used three different kriging methods to interpolate the sandstone formation in Yuncheng City. The fitting effect of the exponential model in the variogram fitting was the best, and the results showed that the ordinary kriging method could more accurately reflect the spatial distribution of sandstone formation elevation in Yuncheng. Ma et al. [
13] combined functional data analysis and kriging interpolation techniques to improve the accuracy and reliability of oil and gas well productivity prediction. Li et al. [
14] addressed the “banding effect” that occurs during kriging interpolation by using distance constraints to correct the weights obtained from kriging estimation, making the kriging interpolation method more stable.
However, whether it is simple kriging or ordinary kriging, the interpolation effect decreases rapidly when the data points are sparse, and the interpolated results do not meet expectations. Cokriging technology has wider applications. Under the constraint of secondary data that has a certain correlation with the primary data, it conducts collaborative simulation on a small amount of sparse and irregularly distributed primary data, and the simulation results are similar to the spatial distribution pattern of the secondary data. Wang et al. [
15] used cokriging interpolation to predict the spatial distribution trend of low-pressure pipeline networks in suburban and rural areas. The results showed that the prediction model performed well, with average error and root mean square error within the acceptable range. Du et al. [
16] used cokriging to predict coal seam thickness, and the results showed that the method was effective, with small errors and improved accuracy. Geng et al. [
17] applied cokriging to a three-dimensional inversion of gravity gradient tensor data, reducing the ambiguity of inversion and improving the resolution of inversion results. Yu et al. [
18] proposed a cokriging porosity prediction method under facies control based on cokriging. Through comparative analysis, the estimated results had smaller errors and the predicted results were more realistic.
Cokriging is relatively complex in computation, with a large amount of calculation, and the actual values of primary data at the same location cannot be consistent, which limits the development environment of cokriging. Collocated cokriging technique simplifies the complexity of cokriging in computation and matches well with the known data points and actual parameters. Chen et al. [
19] integrated seismic, well-logging, geological, and other information to predict the distribution and variation of channel sand bodies using collocated cokriging, achieving significant improvement in prediction accuracy compared to traditional methods. Wang et al. [
20] improved the prediction accuracy of seismic inversion by combining collocated cokriging to effectively utilize information from horizontal sections. Zhang et al. [
21] applied sequential Gaussian collocated cokriging to predict reservoirs by incorporating other seismic parameter data, demonstrating high prediction accuracy and effective reduction in drilling risks. Niu et al. [
22] estimated the variogram function by filtering the expected values of covariates using primary variable observations, and proposed and derived a new collocated cokriging method.
2. Methods
2.1. Principle of Formation Pressure Prediction
The Eaton method predicts formation pressure by establishing a normal compaction trend line. The calculation formula is as follows:
In the above, represents formation pressure; represents overburden pressure; represents normal hydrostatic pressure; represents actual formation velocity, which is obtained from sonic log data or stacking velocity; represents the normal trend line velocity, mainly obtained by fitting the velocity of mudstone; and represents the Eaton index, which is a coefficient related to the formation. The value of the Eaton index varies for different geological periods and regions.
In the formula,
hydrostatic pressure increases with the increase of depth; at the same depth, its value increases with the increase in formation water density:
In the formula,
overburden pressure is an important factor in generating underground pressure. Its driving force is mainly a combination of sedimentary and compaction effects of the formation. This pressure is a fundamental parameter in the process of predicting formation pressure. In the detection of formation pressure, its value is determined first.
In this formula, represents the gradient of overburden pressure at a certain depth; represents seawater density; represents seawater depth; represents the average density of the upper density-free logging formation section; represents the thickness of the upper density-free logging formation section; represents density scatter data at a certain depth; and represents depth interval. The calculation of overburden pressure involves multiple data and is a tedious process.
2.2. Cokriging Principle
When there are two or more characteristic parameters in the interpolation area and there is a significant correlation between the main variable and covariates to be interpolated within that area, the cokriging method can be used. This method belongs to multivariate geostatistics and involves analyzing multiple parameters in the study area, studying the linear and nonlinear relationships between these parameters, and understanding the spatial differences of different parameters to achieve a certain level of precision in estimating the main variable.
The cokriging estimation is formulated as follows:
In this equation, represents the estimated value at the estimation point ; represents the actual attribute value of the main variable at point ; represents the actual attribute value of the covariate at point ; and and are the weight coefficients corresponding to the main variable and covariate, respectively.
Cokriging estimation is defined as a linear combination of available samples. Similar to ordinary Kriging, cokriging estimation requires unbiasedness and minimum error variance.
By incorporating the minimum variance condition with weight constraints, each considered random variable introduces Lagrange multipliers during the minimization process. By taking partial derivatives of each weight
,
, and
and setting the results to zero, the minimum variance can be determined. After expanding and processing the variance, derivative calculations yield the cokriging equations:
In the equations, represents the autocovariance of the main variable; represents the autocovariance of the covariate; represents the cross-covariance between the main variable and covariate; and and are Lagrange factors.
The covariance function can be obtained from the variogram function, but calculating the cross-covariance or cross-variogram requires a significant amount of computation and involves complex derivation, which severely reduces the interpolation efficiency of cokriging. Even if the cross-variogram function is obtained, solving the equations can lead to a singular, resulting in situations where the estimation point has no solution.
2.3. Collocated Cokriging Principle
Collocated cokriging is a simplified form of cokriging that greatly reduces the computational burden on the equation system. In cokriging, a considerable number of covariates need to be selected for calculation. However, collocated cokriging only requires the covariates at the same positions as the estimation point. The covariates around the estimation point will be masked by the covariance in the same position. This also requires that there are corresponding covariates for each estimation location.
In collocated cokriging, only three functions are needed: the autocovariance function of the main variable, the cross-covariance function between the main variable and covariates. The latter can be derived from the Markov model, significantly improving the computational speed of collocated cokriging.
The estimation value in collocated cokriging is given by the equation:
In this equation, represents the estimated value at the estimation point and represents the actual attribute value of the main variable at point . Since there is only one secondary datum used for calculation, the weight for the secondary data has only one value.
Then, by incorporating the condition of unbiasedness and minimum error variance:
A series of mathematical operations leads to the collocated cokriging equation matrix:
2.4. Workflow of Collocated Cokriging Method for Prediction
The Eaton formula method is widely used and has high accuracy. However, its predictive accuracy is not high over the entire spatial domain. Geostatistical stochastic simulation methods, including the collocated cokriging method, serve as effective tools to improve accuracy by integrating well-logging information in spatial analysis. Combining the Eaton formula method with the isochronous co-simulation kriging method is expected to enhance the accuracy of formation pressure prediction in the plane. Numerical simulations of seismic P-wave and S-wave velocities are conducted to analyze the experimental variogram of P-wave velocity and fit it using a spherical model. Anisotropy of the formation is considered, and an elliptical anisotropic model is established to make the simulation results better match the actual formation. By comparing with simple kriging and cokriging methods, the accuracy of the collocated cokriging method is validated. The effects of range, azimuth, and number of reference points on the simulation results are analyzed, and appropriate parameters are selected. Finally, the Eaton formula method combined with the collocated cokriging method is applied to predict formation pressure in the ultra-deep rock formations of the Junggar Basin. The operation process is shown in
Figure 1.
The Eaton method is used to calculate the formation pressure at well locations. In collocated cokriging, the assumption is that the variogram function of the secondary variable is the same as that of the primary variable. Therefore, fitting only the secondary variable is required. A spherical model is used for fitting, and the parameters are obtained using the least squares method to determine the range. An elliptical model is established to improve the interpolation accuracy. The covariance and cross-covariance functions of the primary and secondary variables are calculated using the variogram function. A certain number of primary data points are selected around the estimation point, and their weights, along with the weights of the corresponding secondary data points, are obtained by solving the equation matrix for estimation.
3. Numerical Simulation of Collocated Cokriging Method
3.1. Fitting Experimental Model Variogram Functions
The numerical simulation uses compressional wave velocity (Vp) as the covariate to constrain the interpolation of the main variable—shear wave velocity (Vs). To simulate the distribution of actual well points, six randomly sampled data points of Vs are chosen as the primary data, as shown in
Figure 2.
The Markov MM1 model assumes that the covariance function of the covariate is the same as that of the primary data, so only the experimental variogram function of the covariate needs to be discussed. Variogram functions exhibit directional characteristics in anisotropic media, with the azimuth of the variogram function being consistent with its opposite direction. An angle tolerance of 30° is chosen, and the variogram functions of the covariate in the six directions on the plane are analyzed within 30° intervals, as shown in
Figure 3. The length of the range is defined in terms of the number of grid cells, and this definition size is used for all ranges in the experimental model.
From the experimental variogram function graph (
Figure 3) and the fitted range ellipse graph (
Figure 4), it can be observed that the maximum range occurs in the northeast–southwest direction (21°) and is equal to 71.704 grid units in length. This direction represents the maximum extension of the compressional wave velocity, indicating good continuity of the compressional wave velocity in that direction.
Figure 4 is the ellipse variogram function graph on the plane of the compressional wave velocity. From this function, it can be observed that the variogram function structure type within the work area is an ellipse model. The maximum range in all directions is 71.704 grid units in length, with an azimuth of 21°. The minimum range is 36.872 grid units in length, with an azimuth of 111°. The ratio of the short axis to the long axis of the range ellipse is approximately 0.514, indicating a strong directional distribution of compressional wave velocity Vp within the work area, and its distribution on the plane is not uniform.
After determining the range size and direction, with the maximum range being 71.704 grid units, the minimum range being 36.872, and the direction being 21°, the variogram function of the covariate is established using the range ellipse model. According to the assumption of the Markov MM1 model:
where
and
are the standardized autocovariance functions or autocorrelation plots of the main and covariate variables,
is the cross-covariance function or cross-correlation plot between the main and covariate variables, and
is the linear correlation coefficient derived from the collocated main and covariate variables.
The MMI model does not require the covariance function of the covariate variable because of the following assumptions.
It is assumed that and have the same shape and continuity, and by adjusting the covariate variable y(u) with the main variable z(u) located at the same position, the influence of any more distant main variables can be eliminated. If z(u) is defined on the same or larger volume than y(u), then MM1 is a reasonable model.
3.2. Results
In the case of limited primary data points, the simple kriging, cokriging, and collocated cokriging methods are sequentially applied to simulate and predict the S-wave velocity, as shown in
Figure 5.
In comparing the three kriging methods, it is found that simple kriging has limited practical reference value when interpolating with few data points. When the number of primary data points is small and the secondary data is densely distributed, cokriging relies more on the secondary data, with less influence from the primary data during interpolation. Collocated cokriging uses only the secondary data located at the same position as the estimated point, and the calculated weights are dependent on the correlation coefficients. Cokriging requires a significant amount of computation in calculating the cross-covariance or cross-variogram functions, and its computational efficiency is significantly slower compared to collocated cokriging.
3.3. Discussion
The simple kriging method uses only the primary variable in the interpolation process, and since no secondary data are used, the predicted distribution map generated by simple kriging has limited practical reference value in the current situation. When introducing secondary data with certain correlation coefficients, the predicted distribution map simulated by cokriging approximately aligns with the spatial distribution of the secondary data. The influence of the primary data on the estimated values in the surrounding area is limited, and the predicted distribution map is mostly determined by the secondary data. On one hand, the scarcity of primary data points results in a limited influence range. On the other hand, because the prediction grid is based on the distribution of secondary data, when searching for data points around the estimated point, the secondary data points are closer and more densely distributed. According to the variogram function model, the closer the lag distance, the larger the covariance value. Therefore, in the process of solving the equation system, the weights corresponding to the secondary data are larger, which leads to the dominance of the secondary data in the prediction when the primary data points are relatively sparse. The predicted results of collocated cokriging show a spatial distribution that is roughly consistent with the distribution of the secondary data. However, compared to cokriging, the weight of the secondary data in collocated cokriging depends on the correlation coefficient between the primary and secondary data, and the influence of the primary data on the estimated values in the surrounding area is enhanced, while still preserving the information from the primary data.
3.4. Parameter Optimization for Collocated Cokriging
The variogram function is an important input parameter in the collocated cokriging method, and thus the characteristic parameters of the variogram function will inevitably have certain effects on the results.
3.4.1. Influence of Range on Experimental Results
Figure 6 shows the impact of range variation on simulated predictions. Since an elliptical variogram model is established, two ranges in different directions are needed to correspond to the major and minor axes of the ellipse, reflecting the elliptical influence of the primary data points on the surrounding data in the prediction. Decreasing the range reduces the affected area by the primary data, while increasing the range expands its influence. The magnitude of the range represents the extent of the influence of the primary data points. When there is a strong correlation between the primary and secondary data, the strong correlation weakens the influence of the primary data on the surrounding estimated points. Strong correlation plays an important role in collocated cokriging, and it can weaken the influence of fewer primary data points.
3.4.2. Influence of Anisotropy on Experimental Results
Anisotropy includes the azimuth and anisotropy ratio. The azimuth is based on the positive direction of the
x-axis, rotating clockwise as the positive direction. Changing the azimuth to 0°, 60°, and 120° (
Figure 7), the collocated cokriging is simulated for the primary data. When different azimuths are set, the simulated prediction distribution map shows evident directional characteristics around the existing primary data points, and the data outside the range, i.e., beyond the influence of the primary data, is not significantly affected by the primary data. Similarly to the range, when there is a strong correlation between the primary and secondary data, the strong correlation weakens the influence of the primary data on the estimated points within the range.
3.4.3. Influence of Number of Conditioning Points on Experimental Results
In the simulation experiment, if there are too many conditioning points involved in the estimation, it will result in complex calculations for the equation matrix, severely affecting the execution speed of the algorithm. On the other hand, a certain number of conditioning points are needed for the estimation, as an insufficient number of conditioning points will lead to interpolated results that do not reflect the true distribution of reservoir parameters. Therefore, the selection of the number of conditioning points is crucial.
In the simulation experiment, the primary and secondary data have a strong correlation. To better illustrate the influence of the number of conditioning points on the results, the same set of data is used and the correlation coefficient between the primary and secondary data is artificially changed. The collocated cokriging simulation experiments under different numbers of conditioning points are shown in
Figure 8. When performing interpolation on the estimated points using collocated cokriging, only the surrounding primary data points are considered. From the images, it can be observed that when there are too few primary data points involved in the estimation, the interpolated images have distinct abrupt boundaries. On one hand, this is due to the scarcity and uneven distribution of primary data points; on the other hand, the significant differences in parameter values among different primary data points lead to large variations between different regions. Therefore, the number of conditioning points should be reasonably selected based on the actual situation.