1. Introduction
Predicting elastic parameters is a key task in seismic exploration for oil and gas reservoirs. Elastic parameters, such as compressional wave velocity (DTC) and shear wave velocity (DTS), are fundamental characteristics of seismic wave propagation in underground media. Their determination and interpretation are crucial for revealing underground structures, detecting oil and gas reservoirs, and assessing reserves. Variations in shear wave velocity reflect rock properties such as porosity, permeability, and the mineral composition of the rock matrix, and can also reveal the underground stress state and fluid type. However, owing to acquisition costs and technical constraints, shear wave velocity is often missing in old wells. Empirical equations and petrophysical models have been used to predict shear wave velocity [1,2,3,4]. Under complex geological conditions or for special rock types, however, empirical formulas may not provide accurate predictions.
Because rock structure characteristics vary widely among reservoirs, the establishment of targeted rock physics models based on the concept of equivalent media has been a subject of research for scholars both domestically and abroad. The influence of shale content, porosity, and pore shape on rock velocities in shaly sandstone was considered by Xu and White [5], and the classical Xu–White model was proposed to describe the elastic properties of sandstone. The calculation of the bulk and shear moduli of dry rock in the Xu–White model for shear wave velocity prediction was simplified by Keys and Xu [6], which significantly enhances the computational efficiency of the Xu–White model. When constructing models for carbonate reservoirs, the pores of carbonate rocks were categorized as clay-containing pores, intergranular pores, microfractures, and hard pores by Xu and Payne [7], thus extending the application of the Xu–White model to carbonate reservoirs. The Xu–White model was utilized for shear wave prediction by Liu et al. [8], and rock physics analysis, pre-stack elastic parameter inversion, and other techniques were combined to achieve reservoir prediction from point to plane. The predicted values exhibited a high degree of agreement with the true values. The variation in clay elastic parameters was considered by Zheng et al. [9], and inversion was employed to correct errors in the input parameters of the Xu–White model, which improves the prediction accuracy of shear wave velocity. A shear wave velocity model for porous fractured sandstone based on the Gassmann equation was established by Li et al. [10], and the consolidation parameters proposed by Lee were utilized to calculate the shear wave velocity. The results of the DEM-Gassmann rock physics model were optimized by Yang et al. [11] using simulated annealing, and a virtual fracture porosity was used to compensate for the error caused by inaccurate rock elastic and physical parameters. A pore aspect ratio estimation method based on a rock physics model was proposed by Guo et al. [12], whereby the aspect ratio of pores is estimated using compressional wave velocity, porosity, and mineral composition, thus predicting shear wave velocity. The microstructure of shale reservoirs was described by Zhang and Liu [13] using a fracture aspect ratio and compaction index, and a quantitative relationship between the petrophysical parameters and elastic parameters of shale reservoirs was established using statistical rock physics models. The rock physics modeling methods above usually involve complex rock mechanics theory and numerical simulation, which demand substantial computing resources and time as well as a large number of rock parameters as input. These parameters may be difficult to obtain or measure accurately [14]. Therefore, finding a reliable and adaptable method for predicting shear wave velocity has always been a focus of research.
With the continuous development of computers, neural networks capable of extracting data features and constructing complex nonlinear mapping relationships have been widely used in various fields such as image processing, medical health, and autonomous driving [15,16,17]. In recent years, neural networks such as convolutional neural networks (CNNs) [18], recurrent neural networks (RNNs) [19], and generative adversarial networks (GANs) [20] have been introduced into the field of geophysics by some scholars, achieving partial success in reservoir prediction, lithology identification, and stress analysis [21,22,23]. The application of adaptive BP neural networks in shear wave velocity prediction was proposed by Wang et al. [24]. The shear wave velocity predicted by the neural network model is in good agreement with the measured shear wave velocity. A convolutional neural network containing the receiving function was designed by Yang et al. [25] to predict the shear wave velocity. To fully consider the inherent relationship between shear wave velocity and other logging data, a one-dimensional convolutional neural network was used to predict shear wave velocity by Ma et al. [26]. Because sedimentary formations vary gradually, logging data are correlated in the depth direction. RNNs have been used by some scholars to extract temporal features from conventional logging data, successfully predicting reservoir parameters and formation pressure [27,28]. The temporal features of logging data were extracted by Sun et al. [29] using a gated recurrent unit (GRU) to predict shear wave velocity. A GRU was employed by Teng et al. [30] to reconstruct logging curves. Compared with long short-term memory (LSTM), the GRU significantly improved the average correlation coefficient of the prediction results. However, the above methods did not fully consider the spatiotemporal features of logging data, which may limit the accuracy and reliability of neural network prediction results. Therefore, some scholars have adopted network fusion or incorporated attention mechanisms to improve the prediction accuracy of neural networks [31]. The spatial and temporal features of the logging data were extracted by Wang et al. [32] using CNNs and GRUs, respectively, and the relationship between different spatial and temporal features and shear wave velocity was established. Compared with the prediction results of traditional networks, those of the proposed network are closer to the true values. The spatiotemporal features of logging data were extracted using 2DCNNs and GRUs by Chen et al. [21]. Compared with traditional networks, the proposed network effectively fits the shear wave velocity in sandstone sections. To capture the nonlinear relationship between different logging data, a shear wave velocity prediction method based on an attention mechanism and a bidirectional long short-term memory (BiLSTM) network was proposed by He et al. [33]. To reduce the deviation in describing underground oil and gas distribution, an attention mechanism was integrated into a long short-term memory network by Fu et al. [34] to automatically calculate the weights of data features, and the shear wave velocity was predicted with a lower mean absolute error. To improve the prediction accuracy of porosity, a method combining bidirectional long short-term memory networks and attention mechanisms was proposed by Qiao et al. [35], and the proposed network was verified to have the smallest error on the validation set. To address the exploding and vanishing gradient problems of traditional recurrent neural networks (RNNs), a new network that is sensitive to the depth sequence of logging data was proposed by Feng et al. [36]. The attention mechanism is fused with BiLSTM to highlight the temporal features that are important for shear wave velocity and reduce the influence of other features, thereby improving the prediction accuracy of shear wave velocity. The attention mechanism of the network was used to allocate weights to logging data, automatically focusing on the logging data that contribute most to shear wave velocity prediction and thus significantly improving the prediction accuracy.
Shear wave velocity is one of the key parameters of rock physical properties. It is closely related to various characteristics of underground rock, including density, porosity, fluid type, and mechanical properties. However, because it is costly to acquire, shear wave velocity is generally lacking in actual data. To address the insensitivity of traditional networks to the spatiotemporal features of logging data and the resulting low accuracy of shear wave velocity prediction, an integrated network based on an attention mechanism (STACBiN) is proposed. The spatial and temporal features of logging data are extracted using a convolutional neural network and a bidirectional gated recurrent unit, respectively. Under the spatiotemporal attention mechanism, different spatiotemporal features are assigned different weights, thereby enhancing the overall performance and prediction accuracy of the proposed network. Finally, the strata of an offshore basin are taken as the research object, and the prediction results of the STACBiN integrated network are compared with those of a traditional convolutional neural network (2DCNN) and a bidirectional gated recurrent unit (BiGRU) to verify the effectiveness of the proposed method.
3. Experimental Procedure and Operations
3.1. Training and Prediction Process of STACBiN Integrated Network
The process of shear wave velocity prediction using the STACBiN integrated network is as follows (Figure 4):
(1) Feature Selection. Conduct a correlation analysis between shear wave velocity and the other logging data using scatter plots, and select the logging data that are highly correlated with shear wave velocity as inputs for the neural network.
(2) Data Preprocessing. To eliminate differences in scale among different logging data and improve computational efficiency, a normalization function is used:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ represents the logging data, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the logging data, respectively, and $x^{*}$ is the normalized value.
(3) Network Training. The preprocessed data are divided into training and testing sets. The training set is input into the neural network for training. The mean square error (MSE) is used as the loss function of the network to calculate the error. In addition, adaptive moment estimation (Adam) [38,39] is used as the optimization algorithm to update the network parameters until the loss converges and remains unchanged.
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}$$
where $N$ is the number of training samples, $\hat{y}_i$ is the predicted value, and $y_i$ is the true value.
(4) Input Testing Data. The divided test set is input into the optimal network for shear wave velocity prediction.
(5) Model Evaluation. The mean absolute error (MAE) and coefficient of determination (R²) are used as evaluation indices to quantitatively evaluate the neural network. The closer the MAE is to 0 and the closer R² is to 1, the closer the predicted values are to the true values.
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}$$
where $N$ is the number of samples, $\bar{y}$ is the mean of the true sample values, $y_i$ is the true value of the $i$-th sample, and $\hat{y}_i$ is the corresponding output of the network.
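As a minimal sketch of steps (2), (3), and (5), the min-max normalization and the three error measures can be written in plain Python. The function names (`min_max_normalize`, `mse`, `mae`, `r2_score`) and the sample values are illustrative assumptions and are not taken from the paper; returning zeros for a constant curve is likewise an assumed convention.

```python
def min_max_normalize(values):
    """Scale a sequence of log readings to [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant curve: avoid division by zero (assumed convention: all zeros).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the paper).
dts_true = [2.0, 2.5, 3.0, 3.5]
dts_pred = [2.1, 2.4, 3.1, 3.4]
print(mae(dts_true, dts_pred), r2_score(dts_true, dts_pred))
```

In this toy case every prediction is off by 0.1, so the MAE is small and R² is close to 1, matching the interpretation given in step (5).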
3.2. Dataset Introduction
The dataset in this paper is derived from the sand–mudstone strata of an offshore basin. The target layer is mainly composed of sandstone and mudstone. The integrated network combined with attention mechanisms is applied here to improve the prediction accuracy of shear wave velocity in tight sandstone reservoirs. There are eight measured wells in this area, five of which have measured shear wave velocity and are labeled FH-1, FH-2, FH-3, FH-4, and FH-5. Among these, the FH-1, FH-2, and FH-3 wells are used as the training set to train the network, while the FH-4 and FH-5 wells are used as the test set to validate the generalization of the network.
3.3. Feature Selection
In essence, using deep learning to predict shear wave velocity is a data fitting problem. Therefore, the accuracy of the prediction results mainly relies on the correlation between the input and the output. The density, compensated neutron, resistivity, and compressional wave velocity in all wells are selected as the input of the neural network.
Figure 5 shows the correlations between logging data and shear wave velocity, which are as follows: compressional wave velocity (DTC, R = 0.85), resistivity (RT, R = −0.73), density (RHOB, R = −0.70), and compensated neutron (CNL, R = 0.63). DTC and CNL are positively correlated with the shear wave velocity, and RHOB and RT are negatively correlated with it. In addition, because sedimentary strata are correlated in the depth direction, the logging data show regularity along the sequence direction. Therefore, an autocorrelation analysis is used to characterize the self-correlation of the logging data (Figure 6). As the lag (LAG) increases, the autocorrelation of the different logging data decreases at different rates. When the lag is 100, the autocorrelation of the logging data from high to low is DTC, DTS, RHOB, CNL, and RT. This analysis shows that the different spatial and temporal features of the other logging data are correlated with DTS to different degrees.
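The two analyses in this subsection can be sketched as follows. The paper does not state which estimators were used, so the Pearson correlation coefficient and the standard biased sample autocorrelation are assumed here; the function names and the toy series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long curves."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def autocorrelation(series, lag):
    """Sample autocorrelation of one log curve at a given lag (in samples)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


# Toy depth series (not data from the paper).
curve = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorrelation(curve, 0), autocorrelation(curve, 1))
```

Evaluating `autocorrelation` over a range of lags for each curve gives the kind of decay profile described in the text, and `pearson_r` between each input curve and DTS gives the R values used for feature selection.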
3.4. Selection of Hyperparameters
Convolutional neural networks were originally applied to image classification, and recurrent neural networks were first applied successfully to natural language processing; both have relatively few applications in geophysics. Many factors affect a neural network, such as the time step and learning rate, so each parameter needs to be analyzed. Therefore, parameter sensitivity experiments are conducted on the proposed network, and the coefficient of determination (R²) is used to quantitatively evaluate the various parameter settings, so as to obtain the best hyperparameters for the network.
The most important parameters in a neural network are the number of neurons and the number of layers, which together determine the complexity and learning ability of the network. The number of hidden layers and the number of nodes per layer have a significant impact on network performance, but more layers are not always better. Although increasing the number of hidden layers improves the expressive ability of the network and enables it to learn more complex nonlinear mappings, it may also cause problems such as overfitting. Moreover, more hidden layers may lead to vanishing or exploding gradients during error backpropagation, thereby making the network harder to train. The same holds for the number of neurons. Therefore, the hyperparameters of the neural network need to be determined one by one for the specific scenario. In addition to the number of hidden layers and the number of neurons, the parameters of the proposed network mainly include the time step (sliding window) and the learning rate. The time step defines the correspondence between multiple depth points of the other logging data and the shear wave velocity at a single point. Where the strata change or the lithology differs markedly, the change point can be selected as the division point of the time step, so as to establish a complex mapping relationship between logging data and shear wave velocity within each time step. The learning rate is a hyperparameter that controls the step size of the network parameter update in each iteration and plays a crucial role in deep learning networks. An improper learning rate may lead to unstable training, slow convergence, or convergence to a local optimum. Therefore, the learning rate needs to be determined through repeated testing.
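To make the role of the time step concrete, the following sketch builds sliding-window samples from depth-ordered logs. It assumes that each sample pairs a window of `time_step` consecutive depth points with the shear wave velocity at the centre of that window; the function name `make_windows` and the toy data are hypothetical.

```python
def make_windows(features, target, time_step=15):
    """Build (window, label) pairs from depth-ordered logs.

    features: list of per-depth feature rows, e.g. [DTC, RT, RHOB, CNL]
    target:   list of DTS values at the same depths
    The label is taken at the centre of each window (an assumption).
    """
    half = time_step // 2
    samples = []
    for start in range(len(features) - time_step + 1):
        window = features[start:start + time_step]
        label = target[start + half]
        samples.append((window, label))
    return samples


# Toy logs with 20 depth points and one feature per point.
toy_features = [[float(i)] for i in range(20)]
toy_target = [float(i) for i in range(20)]
samples = make_windows(toy_features, toy_target, time_step=15)
print(len(samples))  # number of windows that fit in 20 depth points
```

A larger time step gives each sample more depth context but yields fewer samples per well, which is one reason the value has to be tuned rather than fixed in advance.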
The FH-1, FH-2, and FH-3 wells are used as the training set and the FH-4 and FH-5 wells as the test set. By examining the performance of the STACBiN integrated network on the test set under different hyperparameters, the optimal network parameters are determined one by one. Finally, the hyperparameter combination of 4 hidden layers, 15 hidden units, 15 time steps, and a learning rate of 0.01 is selected (Figure 7), and the network architecture is determined based on this experiment.
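The one-by-one selection described above can be sketched as a sequential search in which each hyperparameter is varied while the others are held at their current best values. This is only an illustration of the procedure: the candidate values are assumptions, and `evaluate` is a hypothetical callable that would train the network with a given configuration and return its R² on the test wells.

```python
def tune_one_by_one(search_space, defaults, evaluate):
    """Tune hyperparameters sequentially, keeping the best value of each.

    search_space: dict mapping name -> list of candidate values
    defaults:     dict with a starting value for every hyperparameter
    evaluate:     callable(config) -> score, higher is better (hypothetical)
    """
    best = dict(defaults)
    for name, candidates in search_space.items():
        scored = []
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            scored.append((evaluate(trial), value))
        # Keep the candidate with the highest score for this hyperparameter.
        best[name] = max(scored, key=lambda pair: pair[0])[1]
    return best


# Candidate values are illustrative assumptions, not the paper's grid.
search_space = {
    "hidden_layers": [2, 3, 4, 5],
    "hidden_units": [5, 10, 15, 20],
    "time_step": [5, 10, 15, 20],
    "learning_rate": [0.1, 0.01, 0.001],
}
defaults = {"hidden_layers": 2, "hidden_units": 5,
            "time_step": 5, "learning_rate": 0.1}
```

A sequential search of this kind needs far fewer training runs than a full grid search, at the cost of possibly missing interactions between hyperparameters.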
3.5. Visualization of Attention Weight
To verify the rationality of adding an attention mechanism to the neural network, two networks are constructed, one with and one without spatiotemporal attention mechanisms. The network without a spatiotemporal attention mechanism is composed of a 2DCNN, a BiGRU, and a fully connected layer, while the network with a spatiotemporal attention mechanism is composed of a 2DCNN, a spatial attention layer, a BiGRU, a temporal attention layer, and a fully connected layer.
Figure 8 shows the weight distribution of the neural network with and without the spatial attention layer. The spatial attention layer clearly assigns different weights to different logging data, from high to low: DTC, RT, RHOB, and CNL. The weight of DTC is the largest, at about 0.49, and the weight of CNL is the smallest, at about 0.10. This means that DTC has the strongest correlation with DTS and CNL the weakest, which is consistent with the distribution of the Pearson correlation coefficients between conventional logging data and shear wave velocity in Figure 5. The large weight of DTC is physically reasonable because both DTS and DTC reflect the response of rock to stress: under various geological conditions, the DTC and DTS of rocks are typically influenced in a similar way and are therefore positively correlated. In contrast, the weight distribution of the network without a spatial attention mechanism does not follow this pattern, which supports the rationale for introducing the spatial attention mechanism.
Figure 9 shows the weight distribution of the neural network with and without the temporal attention layer. The temporal feature at the label position is given the largest weight, 0.2561, indicating that the temporal feature at this position has the greatest influence on the shear wave velocity. Furthermore, as the distance between a temporal feature and the temporal feature at the label position increases, the weight assigned by the temporal attention mechanism decreases, which is consistent with the variation in the autocorrelation of conventional logging data with lag in Figure 6. This is because the label is located at the same depth as the sampling point in the middle of the sample, so the two are different responses of the same rock and their correlation is the largest. In addition, the sedimentary strata change gradually in the depth direction, so the logging data are correlated along the sequence direction, and this correlation decreases as the distance increases. The weight distribution learned by the temporal attention mechanism is therefore reasonable.
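The internal form of the attention layers is not restated in this section. As a hedged illustration only, the sketch below assumes that an attention layer produces one raw score per logging curve (or per time step) and normalizes the scores with a softmax, so that the resulting weights are positive and sum to 1. The function names and the score values are hypothetical and do not reproduce the weights reported in the figures.

```python
import math


def softmax(scores):
    """Convert raw attention scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_attention(features, scores):
    """Reweight each feature by its attention weight."""
    weights = softmax(scores)
    weighted = [w * f for w, f in zip(weights, features)]
    return weighted, weights


# Hypothetical raw scores for four input curves: DTC, RT, RHOB, CNL.
raw_scores = [1.6, 0.8, 0.6, 0.0]
_, curve_weights = apply_attention([1.0, 1.0, 1.0, 1.0], raw_scores)
print(curve_weights)
```

Under this assumption, a curve or time step with a higher raw score always receives a larger weight, which is the behaviour the weight visualizations are meant to check against the correlation and autocorrelation analyses.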