1. Introduction
Shear wave velocity estimation is an important geoacoustic inversion task for seabed geotechnical applications, since shear wave velocity is a good indicator of sediment rigidity and supports sediment characterization [1,2]. The seabed shear wave velocity profile can be estimated from the dispersion curve of the seismoacoustic interface waves, which is a convenient and low-cost approach compared to direct approaches (e.g., coring). Here, the interface waves refer to Scholte waves, since in most underwater and seismic experiments the sources are deployed in the water column and only Scholte waves can be generated [2].
There are two approaches for geoacoustic inversion [3]: the optimization-based approach and the machine learning (ML)-based approach. The optimization-based approach uses an optimization method to determine the set of geoacoustic parameters that best fits the measured data. According to previous reviews [4,5], some optimization methods have been demonstrated to perform well for geoacoustic inversion, such as the genetic algorithm (GA) [6], differential evolution (DE) [7], and adaptive simplex simulated annealing (ASSA) [8]. On the other hand, with the development of ML, studies on ML-based geoacoustic inversion have appeared. Most of these studies are based on supervised learning, which aims to train a deep neural network for inversion on a vast dataset [9,10,11,12]. This type of approach normally consists of the following steps: (1) creating a simulation dataset based on a physical forward model; (2) training a deep neural network on the simulation dataset; (3) applying the trained neural network to the real-world inversion.
Both approaches can provide acceptable performance for geoacoustic inversion, but each has drawbacks. Since the prevalent optimization methods are not specifically designed for geoacoustic inversion, applying them in this field can be difficult; for example, choosing the hyperparameters required by the algorithm can be hard and may increase the time cost. The drawback of the ML-based approach is that the trained neural network cannot interact with the physical forward model. The procedure of creating the simulation dataset has to be repeated when the ocean environment changes significantly.
To avoid these drawbacks while keeping the interactive ability of the optimization-based approach and the learnability of the ML-based approach, deep reinforcement learning (DRL) is a potential option. Unlike supervised learning, DRL learns by trial and error: it iteratively updates the model by interacting with the environment to achieve good data fitting [13]. Its potential for geoacoustic inversion can be interpreted intuitively. The physical forward model plays the role of the environment and creates a replica from the set of geoacoustic parameters supplied by the DRL model. The DRL method then updates the model by iteratively modifying the replica to obtain the best fit to the measured data.
DRL has been widely used as an intelligent controller for different purposes, including robotics [14], electronic sports [15], automatic control [16,17,18,19,20,21,22,23], etc. Specifically, Gu et al. demonstrated the effectiveness of DRL for controlling physical robots [14]. Joo et al. proposed a green signal time allocation system based on a deep Q-network (DQN) for reducing the standard deviation of each lane at an intersection [16]. Zhou et al. modeled penetration testing as a Markov decision process and exploited DQN for autonomous penetration testing [17]. Park et al. exploited a DRL-based DQN agent for a visual object-tracking task in a virtual environment and demonstrated that the proposed agent outperforms some conventional methods on two public databases [18]. Gao et al. proposed a DRL-based method to solve the relay selection problem in the decode-and-forward relay-aided free-space optical communication system [19]. Guan et al. designed a DRL-based spectrum allocation algorithm for the internet of vehicles discriminating services and showed that the designed method allocates spectrum resources quickly and efficiently in a highly dynamic environment [20]. Zhao et al. utilized DQN for controlling the autonomous walking of an underground load–haul–dump machine and demonstrated the effectiveness of the proposed DQN-based method through experimental verification [21]. Qin et al. proposed a hierarchical DQN-based path-planning method for controlling the long-term data collection of unmanned aerial vehicles in dynamic scenarios [22]. Asaf et al. exploited DRL to set optimal contention windows under different network conditions for wireless LAN performance enhancement [23].
Even though DRL has been shown to control an agent's behavior well in specific environments, applying DRL still requires a specific configuration of the environment, the action space of the agent, and the reward. For instance, Wang et al. proposed a stochastic inversion of magnetotelluric data based on DRL [24], in which the environment state is defined as the layer information and the resistivity, and the action space includes three linear operations (addition, subtraction, and no change). However, the magnetotelluric inversion problem is quite different from geoacoustic inversion. Moreover, the parameter space of the latter is high-dimensional, which means that naive linear operations (e.g., addition or subtraction) are inefficient for the agent to explore the space and determine the optimal solution. To the best of our knowledge, DRL has not been used for geoacoustic inversion. Therefore, our motivation is to investigate the potential of DRL and define a useful configuration of the environment and agent for this field.
In this paper, we propose a geoacoustic inversion framework based on DQN, a popular DRL method, for estimating shear wave velocity from the dispersion data of interface waves. The framework includes a carefully designed configuration for the environment and agent. A comprehensive performance analysis compares the proposed framework with three popular optimization methods (i.e., GA, DE, and ASSA) that are widely used for geoacoustic inversion.
The remainder of this paper is organized as follows. Section 2 states the considered problem. The theories of DRL and DQN are introduced in Section 3. Section 4 describes the proposed framework for geoacoustic inversion. A comprehensive performance analysis is presented in Section 5. Finally, the conclusions are given in Section 6.
2. Problem Formulation
The forward problem can be expressed as
$$\mathbf{d} = F(\mathbf{m}), \tag{1}$$
where $F$ and $\mathbf{d}$ refer to the physical forward model and the observed data, respectively. $\mathbf{m}$ is a set of geoacoustic parameters standing for one ocean environment and seabed condition.
The inversion problem aims at inferring the set of geoacoustic model parameters generating the observed data and can be expressed as
$$\mathbf{m} = F^{-1}(\mathbf{d}), \tag{2}$$
where $F^{-1}$ refers to the inversion operation.
The ocean environment and seabed can be parameterized as an N-layered structure with four geoacoustic parameters (layer thickness, density, compression wave velocity, and shear wave velocity): .
A general workflow of inversion is illustrated in Figure 1. The terms are introduced as follows:
The environment consists of a physical forward model for calculating the replica, the observed data, and a misfit function for measuring the mismatch between the observed data and the replica. It receives a set of selected parameters from the agent and provides feedback to the agent.
An agent is an operator that samples from the parameter space following its search strategy and interacts with the environment. During each iteration of inversion, the agent will log the feedback from the environment, the instant best solution, and the related information.
A parameter space is a multi-dimensional space defined by the search bounds.
The inversion is an iterative process. It starts with an initialization that defines a prior geoacoustic model and the original search bounds based on prior knowledge. During each iteration, the agent will sample from the parameter space. The environment will receive the selected parameters, correspondingly create a replica, and provide the misfit as feedback for the agent. The iteration stops once the termination criteria are met.
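The iterative workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `forward_model` is a hypothetical two-parameter toy function (not DISPER80), `misfit` is a root-mean-square mismatch chosen for the example, and the agent uses plain uniform random sampling as its search strategy, with a fixed iteration count as the termination criterion.

```python
import random

def forward_model(params):
    """Hypothetical toy forward model F: maps two parameters to a 5-point replica."""
    a, b = params
    return [a * x + b for x in range(5)]

def misfit(observed, replica):
    """Root-mean-square mismatch between the observed data and the replica."""
    n = len(observed)
    return (sum((o - r) ** 2 for o, r in zip(observed, replica)) / n) ** 0.5

def random_search_inversion(observed, bounds, n_iter=2000, seed=0):
    """Agent with a naive search strategy: uniform sampling inside the search bounds."""
    rng = random.Random(seed)
    best_params, best_misfit = None, float("inf")
    for _ in range(n_iter):  # termination criterion: fixed number of iterations
        # Agent: sample one candidate from the parameter space.
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        # Environment: build the replica and return the misfit as feedback.
        feedback = misfit(observed, forward_model(candidate))
        # Agent: log the instant best solution.
        if feedback < best_misfit:
            best_params, best_misfit = candidate, feedback
    return best_params, best_misfit

# Synthetic "observed" data generated from known parameters (illustrative values).
true_params = [2.0, 1.0]
observed = forward_model(true_params)
bounds = [(0.0, 4.0), (0.0, 4.0)]  # original search bounds for each parameter
best_params, best_misfit = random_search_inversion(observed, bounds)
```

Replacing the uniform sampling inside the loop with GA, DE, ASSA, or a learned policy changes only the agent's search strategy; the environment side (forward model plus misfit) stays the same.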
As shown in Figure 1, the existing optimization methods (GA, DE, and ASSA) serve as the search strategy that controls how the agent explores the parameter space and determines the best solution. Specifically, GA [6] and DE [7] are two heuristic search algorithms inspired by the evolution of natural species. ASSA [8] is a hybrid optimization method that combines the downhill simplex and the simulated annealing methods. More details about GA, DE, and ASSA can be found in [6,7,8], respectively.
3. Theories of DRL and DQN
DRL solves tasks in which an agent iteratively interacts with the environment to maximize future rewards. Such a task can be formulated as a finite Markov decision process [25] and solved with the DQN algorithm [13]. DQN is derived from Q-learning and can learn an optimal strategy by estimating the Q-value, which expresses the quality of executing an action $a$ given a certain environment state $s$. The Q-value can be iteratively updated by the following formula and converges to the optimum:
$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right], \tag{3}$$
where $Q(s,a)$ expresses the Q-value at the current environment state, $\alpha$ is the learning rate, $r$ is the current reward, $\gamma$ is the discount factor, and $\max_{a'} Q(s',a')$ represents the maximum Q-value in the next environment state $s'$.
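The update rule described above can be checked numerically with a small tabular example. The state and action labels, the table entries, and the values of the learning rate (`alpha`) and discount factor (`gamma`) below are illustrative assumptions, not settings from this paper.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # Maximum Q-value over the actions available in the next environment state.
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state, two-action Q-table.
q_table = {
    "s0": {"a0": 0.0, "a1": 0.0},
    "s1": {"a0": 1.0, "a1": 2.0},
}
new_q = q_update(q_table, "s0", "a1", reward=1.0, next_state="s1")
# new_q is approximately 0.1 * (1.0 + 0.9 * 2.0 - 0.0) = 0.28
```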
Conventional Q-learning needs a Q-table for saving and updating the Q-value at each environment state, and building such a table becomes intractable when the number of environment states is huge. To mitigate this problem, the DQN algorithm uses a neural network instead of a Q-table to estimate the Q-value. The DQN algorithm introduces a replay memory $D = \{e_1, \ldots, e_t\}$ to save the agent's experience at different iterations, where $e_t = (s_t, a_t, r_t, s_{t+1})$ is the experience at iteration $t$. During the training stage, the DQN randomly selects a mini-batch from the replay memory to minimize a loss function $L$:
$$L = \left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]^2, \tag{4}$$
where the meanings of the symbols are the same as in Equation (3).
Given a specific configuration of the environment and agent, the training process of DQN is expressed in Algorithm 1 [13].
Algorithm 1: Training procedure of DQN
1: Initializing the parameters of DQN and the replay memory $D$.
2: for Epoch from 1 to $M$ do
3:   repeat
4:     Collecting the initial environment state $s_t$.
5:     With a preset probability $\epsilon$ selecting a random action $a_t$, otherwise selecting the action $a_t = \arg\max_{a} Q(s_t, a)$.
6:     Executing action $a_t$ and receiving feedback from the environment. The feedback includes the reward $r_t$ and the new environment state $s_{t+1}$.
7:     Saving the experience $(s_t, a_t, r_t, s_{t+1})$ in the replay memory $D$.
8:     Randomly sampling a mini-batch of experience $\{(s_j, a_j, r_j, s_{j+1})\}_{j=1}^{N_b}$ from $D$, where $N_b$ is the size of the mini-batch.
9:     Computing the target value $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a')$ for each experience in the mini-batch.
10:    Minimizing the loss function $L$ and updating the parameters of DQN.
11:   until the termination criteria are met.
12: end for
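The training loop of Algorithm 1 can be illustrated with a self-contained toy sketch. Everything specific here is an assumption made for the example: the five-state chain environment, the reward, the hyperparameter values, and the use of a small table of weights (equivalent to a linear Q-function on one-hot states) in place of a deep neural network so that the sketch runs with the standard library only. The temporal-difference target and the episode step limit are also assumptions of this sketch.

```python
import random
from collections import deque

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # hypothetical chain: states 0..4, goal at 4

def step(state, action):
    """Toy environment: action 1 moves right, action 0 moves left (floor at 0)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def greedy(q_row, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

def train(epochs=300, epsilon=0.3, gamma=0.9, lr=0.1, batch_size=8, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the Q-function parameters and the replay memory D.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    memory = deque(maxlen=500)
    for _ in range(epochs):                        # step 2
        state, done, n_steps = 0, False, 0         # step 4: initial state
        while not done and n_steps < 50:           # steps 3 and 11
            # Step 5: epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy(q[state], rng)
            # Step 6: execute the action and receive reward and new state.
            next_state, reward, done = step(state, action)
            # Step 7: save the experience in the replay memory.
            memory.append((state, action, reward, next_state, done))
            # Step 8: randomly sample a mini-batch of experience.
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s_next, terminal in batch:
                # Step 9: temporal-difference target (assumed form).
                target = r if terminal else r + gamma * max(q[s_next])
                # Step 10: one gradient step on the squared error.
                q[s][a] += lr * (target - q[s][a])
            state, n_steps = next_state, n_steps + 1
    return q

q_values = train()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_values[s][a]) for s in range(GOAL)]
```

In this toy problem the learned greedy policy moves right in every non-goal state. With a neural network, step 10 would instead be a gradient update of the network parameters on the same squared-error loss.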
5. Numerical Experiments
In this paper, the DQN framework is applied to estimate the shear wave velocity based on the dispersion data of interface waves. The inversion performances of the proposed DQN framework and three alternative methods (GA, DE, and ASSA) are examined in two numerical experiments. Two geoacoustic models based on real scenarios in [2,26] are defined to make the simulation more realistic. To increase the reliability of the evaluation, the inversion results discussed in this section are averaged over 100 independent inversions. The forward model DISPER80 [27], based on the Thomson–Haskell matrix method [28,29], is used to calculate the simulated dispersion curve for a given geoacoustic model.
5.1. Experiment Setup
The numerical experiments were conducted on a server with an Intel Core i7-9700K CPU @ 3.60 GHz (8 cores), 64 GB of memory, and a 1 TB hard drive. The GA and DE inversions were implemented with the GitHub repository scikit-opt. The ASSA inversion was implemented following the algorithm proposed in [8]. The DQN framework was implemented in PyTorch [30]. Preset parameters of the candidate methods are shown in Table 3, where the item All refers to the parameters applicable to all the candidate methods.
The evaluation metrics are the misfit value, the running time per independent inversion, and the relative error (RE) formulated in Equation (11):
$$\mathrm{RE} = \frac{|\hat{x} - x|}{x} \times 100\%, \tag{11}$$
where $\hat{x}$ is the estimated value of one geoacoustic parameter and $x$ is the corresponding ground truth.
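As a small worked example of the relative error metric, the helper below computes it for one parameter. Expressing the result in percent is an assumption of this sketch, and the numerical values are illustrative rather than taken from the tables.

```python
def relative_error(estimate, truth):
    """Relative error of one estimated parameter, in percent (non-zero truth assumed)."""
    return abs(estimate - truth) / abs(truth) * 100.0

# Hypothetical shear wave velocity estimate of 210 m/s against a true value of 200 m/s.
re_example = relative_error(210.0, 200.0)  # about 5 percent
```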
5.2. Case 1
Case 1 defines a six-layer geoacoustic model based on the inversion results in the Grane field [26], as shown in Table 4. The phase velocity dispersion curves for the first five modes of the Scholte waves are calculated in the frequency range from 0 to 5 Hz as the ground truth.
The density and compression wave velocities are considered known, since the dispersion of the Scholte wave is not sensitive to them [1]. The original search bounds for Case 1 are shown in Table 5.
The estimated dispersion curves and the ground truth are shown in Figure 3, where the black dots are the ground truth. The blue, green, yellow, and red lines show the dispersion curves estimated by the GA, ASSA, DE, and DQN framework, respectively.
The estimated shear wave velocity profiles and the ground truth are shown in Figure 4, where the black line refers to the ground truth. The blue, green, yellow, and red lines show the shear wave velocity profiles estimated by the GA, ASSA, DE, and DQN framework, respectively.
The performance analysis for Case 1 is shown in Table 6, in which bold font marks the lowest relative error, misfit value, or running time in each row.
At the low frequencies of the fundamental mode, the estimation errors of ASSA and DE are larger than those of the GA and the DQN framework.
For all candidate methods, the relative errors of the shear wave velocity in all layers are lower than those of the layer thickness. Both the shear wave velocity and the layer thickness affect the dispersion properties of Scholte waves. However, in some cases, the dispersion can be more sensitive to one than to the other.
The low-to-high ranking for misfit values is the DQN framework, ASSA, DE, and GA. It is the same as the low-to-high ranking for the relative error of each geoacoustic parameter with a few exceptions of , , , and . This illustrates that the misfit value has a partly positive correlation with the overall relative errors, which is significant since the ground truth of a field survey is mostly unknown and the misfit value is the only metric.
The DQN framework attains the lowest misfit and the shortest running time over others. Furthermore, the DQN framework has the lowest relative errors of all estimated geoacoustic parameters with a few exceptions of and .
Two geoacoustic parameters are picked for further statistical analysis since the DQN framework performs the best inversion on
and does not perform the best on
. The statistical analyses of geoacoustic parameters
and
are shown in
Figure 5 and
Figure 6, respectively. In the figures, the red dashed curve illustrates the distribution of the estimated parameter over 100 independent inversions. The gray block, the purple line, and the black line correspond to the histogram, averaged value over 100 independent inversions, and the ground truth, respectively. Intuitively, the closeness between the purple and black lines indicates the inversion performance (the closer, the better). In addition, the distribution of the estimated parameter corresponds to the uncertainty of the inversion results. A narrower distribution means a lower uncertainty. The DQN framework has the narrowest distribution, which leads to the lowest uncertainty of the inversion result compared to the other methods.
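The statistics read off such figures (averaged value, offset from the ground truth, and width of the distribution) can be computed as sketched below. The two sets of estimates are synthetic Gaussian samples generated only for illustration; they are not results from this study, and using the sample standard deviation as the measure of uncertainty is an assumption of this sketch.

```python
import random
import statistics

def summarize(estimates, truth):
    """Mean, bias against the ground truth, and spread of repeated inversions."""
    mean = statistics.fmean(estimates)
    return {
        "mean": mean,
        "bias": mean - truth,                    # closeness of the average to the truth
        "std": statistics.stdev(estimates),      # width of the distribution
    }

rng = random.Random(0)
truth = 200.0  # hypothetical true shear wave velocity in m/s
# Synthetic estimates from 100 independent inversions of two hypothetical methods.
narrow = [rng.gauss(truth, 2.0) for _ in range(100)]
wide = [rng.gauss(truth, 10.0) for _ in range(100)]
summary_narrow = summarize(narrow, truth)
summary_wide = summarize(wide, truth)
```

A smaller `std` corresponds to the narrower distribution, i.e., the lower uncertainty discussed above.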
5.3. Case 2
Case 2 considers another geoacoustic model from the inversion results in the North Sea [2]. The geoacoustic model consists of a sediment layer with a linear velocity gradient over a continuous half-space (namely, LC introduced in [1]). The model parameters listed in Table 7 are used to generate phase velocity dispersion curves in the frequency range of 3 to 18 Hz. Five modes are selected as the ground truth.
and
refer to the shear wave velocities at the top and bottom of the sediment layer, respectively.
The search bounds of the sediment parameters H (m), (m/s), and (m/s) are , , and , respectively.
The estimated dispersion curves and shear wave velocity profiles are shown in Figure 7 and Figure 8, respectively. Their legends are the same as in Figure 3 and Figure 4, respectively.
The performance analysis for Case 2 is shown in Table 8, in which bold font marks the lowest relative error, misfit value, or running time in each row.
5.4. Discussion
Since the ground truth is not available in many real scenarios, Table 9 compares the methods on misfit and running time only and summarizes the analysis in Section 5.2 and Section 5.3. Table 9 demonstrates that the proposed framework achieves a faster inversion with a lower misfit (highlighted in bold) than the other methods. Furthermore, Figure 5 and Figure 6 illustrate that the proposed framework provides inversion results with lower uncertainties than the other methods.
As mentioned in Section 4.2, Action 0 explores the parameter space more coarsely since it compresses the search bounds relatively quickly. On the other hand, Action 1 is more suitable for finely exploring one local area in the parameter space. To understand the learned search strategy of the DQN framework, Figure 9 shows how the actions are executed by the DQN framework in an independent inversion for Case 1, where the blue star, the orange dot, and the black curve refer to Action 0, Action 1, and the relative misfit as a function of iteration number, respectively.
As shown in Figure 9, the agent executes Action 0 in the early iterations and mainly executes Action 1 after that. This pattern can be interpreted as the DQN framework first executing Action 0 to locate the rough area of the solution in the parameter space and then executing a finer exploration (i.e., Action 1) to determine the final solution.
Note that we do not need to set a hard threshold for the agent to switch from Action 0 to Action 1 because each independent inversion is initialized randomly. Furthermore, the relative misfit can exceed 100% in the first several iterations since the normalization factor is defined by a random initialization.
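The remark about relative misfit values above 100% can be illustrated with a short sketch. It assumes, as one plausible reading, that the relative misfit is the current misfit divided by the misfit of the randomly initialized model; the exact definition used by the framework is not restated here, and the misfit history below is made up for illustration.

```python
def relative_misfit(misfits):
    """Misfit history expressed as a percentage of the first (initial) misfit."""
    reference = misfits[0]  # normalization factor from the random initialization
    return [m / reference * 100.0 for m in misfits]

# Hypothetical misfit history: the second iteration is worse than the initial model.
history = [4.0, 5.0, 2.0, 1.0]
relative = relative_misfit(history)
# relative == [100.0, 125.0, 50.0, 25.0]
```

Because the reference is a random starting model rather than a worst case, an early sample that fits worse than the starting model produces a value above 100%.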
6. Conclusions
In this paper, a DQN-based framework for geoacoustic inversion is proposed. The framework can be described as an optimization-based inversion method with a learnable search strategy, which keeps the advantages of both the optimization-based and ML-based approaches. Its performance is assessed in two numerical cases that estimate the shear wave velocity profile from Scholte wave dispersion curves and is compared with that of three popular optimization methods. Compared to the fastest conventional method, the proposed framework reduces the running time by 37.7% (Case 1) and 68.9% (Case 2). Compared to the best conventional method, it reduces the misfit by 51.6% (Case 1) and 9.4% (Case 2). The results demonstrate the potential of DRL for geoacoustic inversion and the superior performance of the proposed framework. More specifically, the proposed framework provides a faster inversion with a lower misfit, lower relative error, and lower uncertainty. Note that the application scope of the proposed framework is the whole geoacoustic inversion field: it can easily be applied to different inversion tasks by using an appropriate forward model.
Future research will focus on the representation of the misfit space (i.e., the environment state) and the design of new actions in the action space for the agent to explore the parameter space.
Implementation code availability: The code will be made available for the peer reviewers and the public upon request after the manuscript is published.