1. Introduction
Autonomous underwater vehicles (AUVs) have emerged as a research focus in marine exploration in recent years [1,2], as they are widely applicable to underwater tasks such as seabed mapping, archaeological surveys, environmental monitoring, underwater infrastructure inspection, and military reconnaissance [3]. The prerequisite for an AUV to perform tasks in unknown waters is its ability to obtain sufficiently accurate navigation information (attitude, heading, velocity, etc.) to complete the required actions [4,5]. At the surface, the Global Navigation Satellite System (GNSS) can provide most of the required navigation information. However, because radio signals attenuate rapidly in water, GNSS cannot provide positioning or accurate navigation information to a submerged AUV [6]. Underwater navigation methods for AUVs generally include geophysical field matching navigation (e.g., geomagnetic or bathymetric matching), inertial navigation, and acoustic navigation. Geophysical field matching methods obtain navigation information by matching physical feature data measured in real time against pre-stored map data. For instance, a multibeam echo sounder (MBES) can acquire bathymetric data for terrain-matching navigation. Although MBES offer high precision, they are also costly and cumbersome, which limits their practical use in geophysical field matching.
A single navigation method alone is often insufficient to meet AUV navigation precision requirements, so the combination of a strapdown inertial navigation system (SINS) and a Doppler velocity log (DVL) is commonly used. In this integrated approach the two sensors complement each other's strengths: the SINS error propagation equations serve as the system state equation, the DVL velocity measurement serves as the velocity reference, and the difference between the SINS and DVL velocities serves as the observation. The SINS error states are then estimated and compensated by a Kalman filter, thereby enhancing navigation accuracy. Nevertheless, the accuracy of the SINS/DVL integrated navigation system is still limited by the precision of the SINS and DVL sensors [7,8,9,10].
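To make the integration scheme above concrete, here is a deliberately simplified, one-dimensional sketch. It assumes (hypothetically) that the SINS velocity error follows a random walk and that the observation is the SINS/DVL velocity difference; a real SINS/DVL filter uses the full multi-state SINS error model. The function name `estimate_velocity_error` and the noise values `q` and `r` are illustrative assumptions.

```python
def estimate_velocity_error(v_sins, v_dvl, q=1e-4, r=1e-2):
    """Scalar Kalman filter for a random-walk SINS velocity error (simplified sketch).

    State:       dv_k = dv_{k-1} + w,  w ~ N(0, q)   (assumed random-walk model)
    Observation: z_k = v_sins_k - v_dvl_k = dv_k + e,  e ~ N(0, r)
    Returns the corrected SINS velocities and the final error estimate.
    """
    dv, p = 0.0, 1.0  # initial error estimate and its variance
    corrected = []
    for vs, vd in zip(v_sins, v_dvl):
        p += q                      # predict: error persists, uncertainty grows
        z = vs - vd                 # SINS/DVL velocity difference (observation)
        k = p / (p + r)             # Kalman gain
        dv += k * (z - dv)          # update the error estimate
        p *= (1.0 - k)              # update its variance
        corrected.append(vs - dv)   # compensate the SINS velocity
    return corrected, dv
```

With a constant 0.5 m/s SINS drift, for example `estimate_velocity_error([1.5] * 50, [1.0] * 50)`, the error estimate converges to about 0.5 within a few steps and the corrected velocity approaches the DVL reference.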
SINS uses an inertial measurement unit (IMU) to collect motion measurements and computes the velocity of the AUV through integration, which is further used for dead reckoning. This can provide the navigation information required by an AUV [11]. The advantage of the IMU lies in its ability to operate independently: it neither emits signals nor relies on external units. It offers excellent anti-interference capability and provides high-precision, low-latency, high-rate navigation information over short periods [12]. However, because of the integration, the cumulative error in the navigation information grows over time [13]. To enhance the navigation accuracy of the SINS, researchers have conducted a series of studies. Proper calibration of the SINS equipment can significantly improve the accuracy of the navigation information it provides, and calibration methods have gained widespread attention. For example, a high-precision three-axis turntable can rotate the system's three sensitive axes to calibrate the error angles between the two systems [14]. An improved system-level fitting calibration method can address the severe coupling errors in estimating the scale factor error of a SINS that uses high-precision ring laser gyros (RLGs) [15]. A full-parameter calibration method for a six-axis tilt inertial measurement unit can reduce the difficulty of calibrating the tilt inertial assembly [16]. Moreover, a rapid analytical coarse calibration algorithm can eliminate calibration errors caused by IMU orientation errors [17] and can even achieve high-precision SINS calibration in polar regions [18]. SINS data are also inevitably affected by the noise inherent in the sensors. To address random noise, the Kalman filter can be used for estimation and noise reduction [19], or it can be applied to filter the high-frequency components of multi-resolution decomposed signals [20]. Wavelet-domain denoising with correlated thresholding can also be used [21]. Furthermore, optimization techniques based on iterative rules can employ neuro-fuzzy systems to predict SINS positioning errors [22].
Acoustic navigation is primarily implemented with a DVL, which is based on the Doppler effect. It estimates the velocity vector of the AUV from the Doppler frequency shifts of multiple beams transmitted in different directions and reflected back [23]. Because its errors do not accumulate over time, the DVL can provide real-time, high-precision velocity measurements of the AUV; however, the stability of these measurements is susceptible to interference from the external environment [24]. Most DVL algorithms assume that the attitude of the AUV remains unchanged from beam transmission to reception [25]. If the attitude changes during this interval, beam measurement errors arise and degrade the accuracy of the DVL measurements [13]. The factors that affect DVL measurement accuracy primarily include scale factor errors, misalignment angles, and random noise. To address scale factor errors, current methods include GNSS-assisted calibration based on velocity and position observations [23] and a robust invariant extended Kalman filter (IEKF). The latter combines the linear error propagation equation of a DVL calibration model based on the Lie group SO(3) with statistical similarity measure (SSM) theory, which reduces the impact of outliers on calibration accuracy [10].
To address random noise in the DVL, an improved robust Kalman filter (KF) algorithm can process the measurement noise variance of each DVL beam separately [26]. In addition, a fault diagnosis method based on the chi-squared test, combined with a velocity tracking method based on a constant-velocity model and the assumption of slow AUV motion, can effectively suppress the random noise in DVL measurements [27]. When DVL measurement anomalies or beam loss reduce DVL accuracy, AI and data-driven methods can be employed for predictive regression of the DVL velocity [28,29,30]. To address installation misalignment angles, early work used least squares estimation [31]. Currently, online calibration using Kalman filtering [32] or optimal estimation based on algorithms such as particle swarm optimization (PSO) can also be used to calibrate the DVL [33].
The accuracy of integrated navigation systems is still affected by the measurement precision of the individual sensors, including those of the SINS and the DVL. Inspired by Cohen [34], and to enhance underwater navigation accuracy, this paper proposes an end-to-end deep-learning framework that improves the DVL velocity vector. Using the raw measurements of the IMU sensors, the method refines the DVL velocity readings and compensates for the external environmental interference affecting the DVL, thereby enhancing navigation accuracy. Figure 1 shows the conceptual framework of this paper. The network uses raw IMU data to optimize the DVL velocity vector. During training, the loss function is computed between the network output and the GPS velocity vector, which serves as ground truth. Finally, the optimized DVL velocity vector is integrated by the navigation system to obtain the position. The GPS used in this paper employs real-time kinematic (RTK) differential positioning, which can achieve centimeter-level accuracy.
To this end, we conducted experiments on a lake using both SINS and DVL equipment, collected data, and created a dataset to validate the effectiveness of the method proposed in this paper.
The contributions of this paper are summarized as follows:
The remainder of this paper is organized as follows:
Section 2 introduces the calculation principle of the DVL velocity vector and the model network framework proposed in this paper.
Section 3 describes the relevant equipment used for data collection, the physical connections between the devices, and the location and process of data collection.
Section 4 presents a qualitative and quantitative analysis of the results obtained from the model.
Section 5 summarizes the work presented.
2. Principles and Network Framework
2.1. DVL Velocity Calculation
The DVL is an instrument that uses the Doppler effect of acoustic signals to measure velocity and is widely used in underwater navigation. The Doppler effect describes the frequency difference between the waves emitted by a source and the waves received when the source and the receiver move relative to each other. The DVL emits multiple acoustic beams at different angles into the water and receives the echoes returned by underwater objects such as the seafloor. If there is relative motion, the frequency of the returning acoustic waves changes according to the Doppler effect. The DVL measures its velocity relative to the water column or the seafloor by analyzing the frequency difference (Doppler shift) between the emitted sound waves and the received echoes, as expressed in Equation (1):

$$f_R = f_T\,\frac{1 + v_{\mathrm{beam}}/c}{1 - v_{\mathrm{beam}}/c} \tag{1}$$
where $f_R$ is the received frequency, $f_T$ is the transmitted frequency, $v_{\mathrm{beam}}$ is the velocity of the beam relative to the medium, and $c$ is the speed of sound in the medium (water). Because the speed of the DVL is much lower than the speed of sound, the frequency change can be simplified to Equation (2):

$$\Delta f = f_R - f_T \approx \frac{2\,v_{\mathrm{beam}}}{c}\,f_T \tag{2}$$
The velocity of each beam along its direction can then be expressed by Equation (3):

$$v_{\mathrm{beam}} = \frac{c\,\Delta f}{2 f_T} \tag{3}$$
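As a numerical check of Equations (1) to (3), the sketch below compares the exact inversion of Equation (1) with the first-order approximation of Equation (3). The 600 kHz carrier, the 1500 m/s sound speed, and the function names are illustrative assumptions, not parameters of the DVL used in this paper.

```python
def received_frequency(v_beam, f_t, c):
    """Two-way Doppler model of Equation (1)."""
    return f_t * (1 + v_beam / c) / (1 - v_beam / c)

def beam_velocity_exact(f_r, f_t, c):
    """Exact inversion of Equation (1): v = c (f_R - f_T) / (f_R + f_T)."""
    return c * (f_r - f_t) / (f_r + f_t)

def beam_velocity_approx(f_r, f_t, c):
    """First-order approximation of Equation (3): v = c * delta_f / (2 f_T)."""
    return c * (f_r - f_t) / (2 * f_t)

# Illustrative values (assumed): 600 kHz carrier, 1500 m/s sound speed, 1 m/s beam velocity.
F_T, C, V = 600e3, 1500.0, 1.0
f_r = received_frequency(V, F_T, C)
print(f_r - F_T)                          # Doppler shift in Hz, about 800.5
print(beam_velocity_exact(f_r, F_T, C))   # recovers 1.0 m/s
print(beam_velocity_approx(f_r, F_T, C))  # about 1.00067 m/s
```

The approximation overestimates the beam velocity by a relative factor of roughly $v_{\mathrm{beam}}/c$ (about 0.07% here), which is why Equation (3) is adequate for vehicle speeds far below the speed of sound.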
The DVL transducers are arranged in an X-shaped configuration, also known as the Janus Doppler configuration. Figure 2 shows a schematic diagram of the DVL beam directions.
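From the geometric relationship between the DVL beams and the DVL body, the direction of each beam in the DVL body frame can be expressed as Equation (4):

$$\mathbf{b}_i = \begin{bmatrix} \cos\psi_i \sin\theta & \sin\psi_i \sin\theta & \cos\theta \end{bmatrix}^{\mathsf{T}}, \quad i = 1,2,3,4 \tag{4}$$

where $i$ is the beam number, and $\psi_i$ and $\theta$ are the yaw and pitch angles of the beam relative to the vehicle body frame, respectively. All four beams share the same pitch angle, while the yaw angle is given by Equation (5):

$$\psi_i = (i-1)\,\frac{\pi}{2} + \frac{\pi}{4}\ \text{rad}, \quad i = 1,2,3,4 \tag{5}$$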
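We define the transformation matrix $\mathbf{B}$, whose rows are the four beam directions, as shown in Equation (6):

$$\mathbf{B} = \begin{bmatrix} \cos\psi_1\sin\theta & \sin\psi_1\sin\theta & \cos\theta \\ \cos\psi_2\sin\theta & \sin\psi_2\sin\theta & \cos\theta \\ \cos\psi_3\sin\theta & \sin\psi_3\sin\theta & \cos\theta \\ \cos\psi_4\sin\theta & \sin\psi_4\sin\theta & \cos\theta \end{bmatrix} \tag{6}$$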
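The sketch below builds the matrix $\mathbf{B}$ of Equation (6) from the yaw angles of Equation (5). The pitch angle is not specified in this section, so the 20° used here is an assumed, illustrative value; the function name `beam_matrix` is likewise hypothetical.

```python
import numpy as np

def beam_matrix(pitch_rad):
    """Rows are the unit beam directions of Equation (4) for the Janus (X) layout."""
    rows = []
    for i in range(1, 5):
        yaw = (i - 1) * np.pi / 2 + np.pi / 4  # Equation (5)
        rows.append([np.cos(yaw) * np.sin(pitch_rad),
                     np.sin(yaw) * np.sin(pitch_rad),
                     np.cos(pitch_rad)])
    return np.array(rows)  # shape (4, 3), Equation (6)

B = beam_matrix(np.deg2rad(20.0))  # assumed beam pitch of 20 degrees
print(B.shape)                     # (4, 3)
print(np.linalg.norm(B, axis=1))   # each row is a unit vector
```

Because the beams are spaced 90° apart in yaw, their horizontal components cancel pairwise, so the column sums of the first two columns of $\mathbf{B}$ are zero.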
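Then, the relationship between the DVL velocity $\mathbf{v}^{d}$ and the beam velocity measurements $\mathbf{v}_{\mathrm{beam}}$ can be expressed through the matrix $\mathbf{B}$ as shown in Equation (7):

$$\mathbf{v}_{\mathrm{beam}} = \mathbf{B}\,\mathbf{v}^{d} \tag{7}$$

where $\mathbf{v}^{d}$ is the three-axis velocity in the DVL frame and $\mathbf{v}_{\mathrm{beam}}$ stacks the four beam velocities.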
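Based on this relationship, the least squares (LS) estimator gives the estimated DVL velocity shown in Equation (8):

$$\hat{\mathbf{v}}^{d} = \arg\min_{\mathbf{v}^{d}} \left\lVert \mathbf{y} - \mathbf{B}\,\mathbf{v}^{d} \right\rVert^{2} \tag{8}$$

where $\mathbf{y}$ is the vector of measured beam velocities. Following Braginsky et al. [35], Equation (8) can be transformed into the closed-form solution in Equation (9):

$$\hat{\mathbf{v}}^{d} = \left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}\,\mathbf{y} \tag{9}$$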
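A minimal sketch of Equations (8) and (9): the closed-form LS solution and `numpy.linalg.lstsq` recover the same velocity from four noise-free beam measurements. The 20° pitch, the test velocity, and the function names are assumed values for illustration.

```python
import numpy as np

def beam_matrix(pitch_rad):
    """Matrix B of Equation (6) for the Janus layout (assumed pitch)."""
    yaw = np.arange(4) * np.pi / 2 + np.pi / 4            # Equation (5), i = 1..4
    return np.column_stack([np.cos(yaw) * np.sin(pitch_rad),
                            np.sin(yaw) * np.sin(pitch_rad),
                            np.full(4, np.cos(pitch_rad))])

def dvl_velocity_ls(B, y):
    """Closed-form LS estimate of Equation (9): (B^T B)^{-1} B^T y."""
    return np.linalg.solve(B.T @ B, B.T @ y)

B = beam_matrix(np.deg2rad(20.0))       # assumed beam pitch
v_true = np.array([1.2, -0.3, 0.05])    # assumed forward, lateral, vertical velocity (m/s)
y = B @ v_true                          # noise-free beam velocities, Equation (7)
v_hat = dvl_velocity_ls(B, y)
v_lstsq, *_ = np.linalg.lstsq(B, y, rcond=None)  # numerical solution of Equation (8)
print(v_hat, v_lstsq)
```

Solving the normal equations with `np.linalg.solve` avoids forming $(\mathbf{B}^{\mathsf{T}}\mathbf{B})^{-1}$ explicitly, which is the usual numerically preferable choice.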
The three-axis velocity of the DVL carrier vehicle can thus be solved from the beam velocities measured by the DVL; most DVL devices use Equation (9) to compute velocity. Since we aim to improve the DVL velocity output, an error model must be selected. During measurement, the DVL may exhibit scale factor errors, biases, random noise, and misalignment angle errors. Incorporating the scale factor, bias, and noise terms into Equation (7) gives the common DVL error model [36], shown in Equation (10):

$$\mathbf{y} = \mathbf{B}\,\mathbf{v}^{d}\,(1 + s_{\mathrm{DVL}}) + \mathbf{b}_{\mathrm{DVL}} + \mathbf{n} \tag{10}$$

where $s_{\mathrm{DVL}}$ is the scale factor, $\mathbf{b}_{\mathrm{DVL}}$ is the bias vector, and $\mathbf{n}$ is zero-mean Gaussian white noise.
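To illustrate Equation (10), the sketch below simulates beam measurements with an assumed 1% scale factor, a small assumed bias, and Gaussian noise, then applies the LS solution of Equation (9). With zero bias and noise, the estimate is simply scaled by $(1 + s_{\mathrm{DVL}})$, the kind of systematic error that a data-driven correction aims to remove. All numerical values and the function name `simulate_beams` are hypothetical.

```python
import numpy as np

def beam_matrix(pitch_rad):
    """Matrix B of Equation (6) for the Janus layout (assumed pitch)."""
    yaw = np.arange(4) * np.pi / 2 + np.pi / 4
    return np.column_stack([np.cos(yaw) * np.sin(pitch_rad),
                            np.sin(yaw) * np.sin(pitch_rad),
                            np.full(4, np.cos(pitch_rad))])

def simulate_beams(B, v, scale, bias, noise_std, rng):
    """Beam measurements following Equation (10): y = B v (1 + s) + b + n."""
    n = rng.normal(0.0, noise_std, size=B.shape[0])
    return (B @ v) * (1.0 + scale) + bias + n

B = beam_matrix(np.deg2rad(20.0))  # assumed 20-degree beam pitch
v = np.array([1.0, 0.2, 0.0])      # assumed true velocity (m/s)
rng = np.random.default_rng(0)

# Scale factor only: the LS estimate is biased by exactly (1 + s).
y_scaled = simulate_beams(B, v, scale=0.01, bias=np.zeros(4), noise_std=0.0, rng=rng)
v_scaled = np.linalg.solve(B.T @ B, B.T @ y_scaled)

# Scale factor, bias, and noise together (assumed magnitudes).
y_full = simulate_beams(B, v, scale=0.01, bias=np.full(4, 0.005), noise_std=0.01, rng=rng)
v_full = np.linalg.solve(B.T @ B, B.T @ y_full)
print(v_scaled, v_full)
```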
2.2. Deep-Learning Network Framework Based on IMU/DVL Integration
The IMU inside a SINS typically comprises three-axis gyroscopes and three-axis accelerometers. Their raw measurements can be used to calculate the position, orientation, and velocity of the AUV in space, thereby capturing the dynamics of its motion.
Liu et al. [37] proposed a deep-learning architecture, the GPS/SINS neural network, that combines convolutional neural networks (CNNs) and gated recurrent units (GRUs). It extracts spatial features from IMU signals and tracks their temporal characteristics, thereby enhancing navigation performance during GPS outages. This indicates that IMU measurements contain sufficient navigation information, which is why this paper uses the gyroscope and accelerometer outputs to help improve the DVL velocity vector.
GPS provides highly accurate position and velocity, and many studies treat GPS position and velocity as a high-precision reference. In this paper, the GPS velocity output is therefore used as the target for improving the DVL output velocity; by continuously refining the calibration of the DVL measurements, the reliability of the DVL output velocity is enhanced. Because radio waves attenuate strongly in water, GPS signals cannot reach a submerged receiver, so GPS reference velocities are available only at the surface. We therefore enhance the forward and lateral velocities of the DVL, using the GPS forward and lateral velocities as targets. This also simplifies data collection on the lake, because only a test vessel sailing on the surface is required.
The gyroscope and accelerometer output measurements at 100 Hz, whereas the DVL outputs measurements at only 5 Hz, so the data from the three sensors cannot be fed into the network directly. Following the approach in Cohen [34], we therefore package the gyroscope and accelerometer data according to the ratio of the sensors' output rates, 20:20:1; that is, every 20 sets of gyroscope and accelerometer measurements correspond to one set of DVL measurements.
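The 20:20:1 packaging can be sketched as below, assuming the three streams are already time-synchronized and start at the same instant. The constant and function names are hypothetical, and trailing samples that do not fill a complete window are dropped.

```python
import numpy as np

IMU_RATE_HZ = 100
DVL_RATE_HZ = 5
WINDOW = IMU_RATE_HZ // DVL_RATE_HZ  # 20 IMU samples per DVL sample

def package_samples(gyro, accel, dvl):
    """Group IMU samples into windows aligned with DVL epochs.

    gyro, accel: arrays of shape (N_imu, 3); dvl: array of shape (N_dvl, k),
    e.g. k = 2 for forward and lateral velocity.
    Returns gyro/accel windows of shape (M, WINDOW, 3) and the matching DVL rows (M, k).
    """
    m = min(len(gyro) // WINDOW, len(accel) // WINDOW, len(dvl))
    gyro_w = np.asarray(gyro[: m * WINDOW]).reshape(m, WINDOW, 3)
    accel_w = np.asarray(accel[: m * WINDOW]).reshape(m, WINDOW, 3)
    return gyro_w, accel_w, np.asarray(dvl[:m])
```

Each row `gyro_w[j]` and `accel_w[j]` then forms one network input paired with the DVL measurement `dvl[j]`.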
We first pass the raw gyroscope and accelerometer measurements through a one-dimensional convolutional (1D Conv) layer composed of six 2 × 1 filters to extract local features from the sequential data and generate feature vectors. A kernel length of 2 balances computational efficiency against feature-capture capability, and the six kernels allow the layer to learn several complementary local features. A Flatten layer then reshapes the multi-dimensional feature vectors into one-dimensional data, which is passed through a Long Short-Term Memory (LSTM) layer. The LSTM layer captures the temporal dynamics of the data, enabling further feature extraction along the time axis. The two sets of feature data are then combined and passed through a Dropout layer, which randomly discards a proportion of the activations during training to reduce the risk of overfitting. The data then pass through a series of fully connected layers. Their output is concatenated with the current DVL velocity vector and passed through the final fully connected layer to produce a 2 × 1 vector, which is the DVL velocity vector predicted by the model. The detailed architecture of the model is shown in Figure 3.
For the aforementioned network architecture, the following definitions are utilized:
Flatten Layer: Transforms multi-dimensional input data into a one-dimensional format.
Dropout Layer: Randomly sets a fraction of its inputs to zero during training to reduce the risk of model overfitting.
Linear Layer: Performs a linear transformation on the input data, integrating features and producing an output of a fixed size.
Tanh Activation Function: The hyperbolic tangent function maps the input value $x$ nonlinearly into the range between −1 and 1. It is implemented as:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$
ReLU (Rectified Linear Unit) Activation Function: Also known as the ramp function, it is a nonlinear function with both biological and mathematical foundations, and it is implemented as:

$$\mathrm{ReLU}(x) = \max(0, x) \tag{12}$$
1D Conv Layer: Creates a convolutional kernel that captures local features of the input data along a single (temporal) dimension and outputs a feature vector. Each output sample depends on a window of $p$ consecutive input samples. The relationship between input and output is:

$$y(t) = \sum_{k=0}^{p-1} w(k)\,x(t-k) \tag{13}$$

where $x$ is the input, $y$ is the output, $t$ is the time index, $p$ is the kernel length (the size of the filter used in the convolution), and $w$ denotes the learned weights.
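A plain-Python sketch of Equation (13) for one input channel, with no padding and no bias term (a practical layer normally adds a learned bias). The function names are hypothetical. Deep-learning libraries such as PyTorch's `torch.nn.Conv1d` compute cross-correlation, that is, they do not flip the kernel; this differs from Equation (13) only in the ordering of the learned weights.

```python
def conv1d_causal(x, w):
    """Equation (13): y(t) = sum_{k=0}^{p-1} w[k] * x[t - k], for t = p-1, ..., len(x)-1.

    Only 'valid' outputs are returned (no padding), so len(y) = len(x) - p + 1.
    """
    p = len(w)
    return [sum(w[k] * x[t - k] for k in range(p)) for t in range(p - 1, len(x))]

def conv1d_layer(x, kernels):
    """Apply several kernels (e.g., six 2 x 1 filters) to one channel; one output row per kernel."""
    return [conv1d_causal(x, w) for w in kernels]

print(conv1d_causal([1.0, 2.0, 3.0, 4.0], [2.0, 1.0]))  # [5.0, 8.0, 11.0]
```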
LSTM Layer: The LSTM layer learns long-term dependencies in the input data and extracts its temporal features through an internal gating structure. It is an improved form of the recurrent neural network (RNN) that mitigates the vanishing-gradient problem of standard RNNs on long sequences [38]. The LSTM uses input, output, and forget gates to extract temporal features. The forget gate decides, from the previous output and the current input, which information to discard from the cell state:

$$f_t = \sigma\left(W_f\,x_t + U_f\,h_{t-1} + b_f\right) \tag{14}$$
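where $x_t$ is the current input, $h_{t-1}$ is the output at the previous time step, $W_f$ and $U_f$ are the forget gate weights, $b_f$ is the bias, and $\sigma(\cdot)$ is the sigmoid function. The input gate controls how the cell state is updated:

$$i_t = \sigma\left(W_i\,x_t + U_i\,h_{t-1} + b_i\right) \tag{15}$$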
where $W_i$ and $U_i$ are the input gate weights and $b_i$ is the bias. The output gate determines which information from the cell state is passed to the output:

$$o_t = \sigma\left(W_o\,x_t + U_o\,h_{t-1} + b_o\right) \tag{16}$$
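where $W_o$ and $U_o$ are the output gate weights and $b_o$ is the bias. The LSTM layer outputs the hidden state, which serves both as the current output and as the state passed to the next step; it is the output gate multiplied element-wise by the cell state passed through a tanh function:

$$h_t = o_t \odot \tanh(c_t) \tag{17}$$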
where $c_t$ is the current cell state, given by Equation (18):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{18}$$
where $\tilde{c}_t$ is the candidate (estimated) cell state, given by Equation (19):

$$\tilde{c}_t = \tanh\left(W_c\,x_t + U_c\,h_{t-1} + b_c\right) \tag{19}$$
where $W_c$ and $U_c$ are the candidate-state weights, $b_c$ is the bias, and $\odot$ denotes element-wise multiplication.
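The gate equations can be checked with a single-step NumPy sketch. The parameter layout (a dictionary of `(W, U, b)` tuples) and the function names are assumptions made for illustration, not the implementation used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (14) to (19).

    params maps gate names 'f', 'i', 'o', 'c' to (W, U, b) tuples,
    with W: (H, D), U: (H, H), b: (H,).
    """
    def affine(gate):
        W, U, b = params[gate]
        return W @ x_t + U @ h_prev + b

    f_t = sigmoid(affine("f"))          # forget gate, Eq. (14)
    i_t = sigmoid(affine("i"))          # input gate, Eq. (15)
    o_t = sigmoid(affine("o"))          # output gate, Eq. (16)
    c_tilde = np.tanh(affine("c"))      # candidate cell state, Eq. (19)
    c_t = f_t * c_prev + i_t * c_tilde  # cell state, Eq. (18)
    h_t = o_t * np.tanh(c_t)            # hidden state / output, Eq. (17)
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is 0, so the cell state is halved at each step; this is a convenient sanity check.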
CNN-LSTM is a hybrid neural network architecture that combines the strengths of CNNs and LSTMs: CNNs are proficient at extracting spatial features from data such as images, whereas LSTMs are adept at processing sequential data such as time series. The architecture is widely used in image captioning, video analysis, and time series forecasting. Feeding the spatial features extracted by the CNN into the LSTM makes it possible to model and analyze sequences of spatial features, which gives the CNN-LSTM strong generalization capability and expressiveness on complex spatiotemporal data. The IMU, DVL, and GPS data we collected are all time series with inherent spatial structure, so we chose the CNN-LSTM architecture to improve the DVL velocity vector.
2.3. Network Hyperparameter Definition
The mean squared error (MSE) measures prediction accuracy as the average of the squared differences between the predicted and actual values. It is sensitive to outliers and computationally efficient, which suits the underwater navigation scenario, so we use it as the model's loss function. MSE is defined in Equation (20):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2} \tag{20}$$
where $n$ is the number of samples, $y_i$ is the target value, and $\hat{y}_i$ is the model's predicted value. During training, forward propagation passes the input data through all layers of the network architecture shown in Figure 3. After forward propagation, the loss function is computed, and the weights and biases of all layers are then updated during backpropagation using gradient descent:

$$\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_{k} - \eta\,\nabla L(\boldsymbol{\theta}_{k}) \tag{21}$$

where $\boldsymbol{\theta}$ is the vector of weights and biases, $L(\boldsymbol{\theta})$ is the value of the loss function computed by the model with weights and biases $\boldsymbol{\theta}$, $\nabla$ is the gradient operator, and $\eta$ is the learning rate.
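As a toy illustration of Equations (20) and (21), the sketch below fits a single hypothetical weight of the model $y = wx$ by plain gradient descent. The network in this paper has many parameters and is trained with Adam through backpropagation; only the basic update rule is the same.

```python
def mse(pred, target):
    """Equation (20)."""
    return sum((t - p) ** 2 for p, t in zip(pred, target)) / len(target)

def fit_slope_gd(xs, ys, lr=0.05, steps=100):
    """Fit y = w * x by gradient descent on the MSE, following Equation (21)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # dL/dw for L = (1/n) * sum (w x - y)^2
        grad = 2.0 / n * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # theta_{k+1} = theta_k - eta * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = fit_slope_gd(xs, ys)
print(w, mse([w * x for x in xs], ys))  # w close to 2, loss close to 0
```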
The learning rate is one of the most important hyperparameters because it directly affects both training speed and model performance. If the learning rate is too low, training converges slowly and is more likely to become trapped in local optima; if it is too high, the model may fail to converge. We used the Adam optimizer, which adapts the step size of each parameter, to improve the efficiency and stability of training, with an initial learning rate of 0.001. We selected MSE as the loss function and found experimentally that the model typically converges after 500 epochs. A batch size of 4 was chosen as a compromise between memory usage and model performance. These hyperparameters were selected on the basis of multiple experiments. To mitigate overfitting, we added an L2 regularization term to the loss function to constrain the model weights and prevent the model from becoming overly complex.
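The L2-regularized loss described above can be sketched as follows. The paper does not report the regularization coefficient, so `l2_lambda` here is an assumed value and the function names are hypothetical. In PyTorch, a comparable penalty is commonly applied through the `weight_decay` argument of `torch.optim.Adam`, which adds an L2 penalty term to each parameter's gradient.

```python
def mse(pred, target):
    """Equation (20)."""
    return sum((t - p) ** 2 for p, t in zip(pred, target)) / len(target)

def regularized_loss(pred, target, weights, l2_lambda):
    """MSE plus an L2 penalty lambda * sum(w^2) on the model weights."""
    return mse(pred, target) + l2_lambda * sum(w * w for w in weights)

print(regularized_loss([1.0, 2.0], [1.0, 4.0], weights=[1.0, 2.0], l2_lambda=0.01))  # about 2.05
```

A larger `l2_lambda` pushes the weights toward zero more strongly, trading some fit on the training data for a simpler model.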
4. Data Analysis
Underwater navigation systems use acoustic, inertial, and other sensing technologies to achieve positioning and navigation. The primary methods are acoustic navigation, inertial navigation, and integrated navigation. Acoustic navigation exploits the propagation of sound waves in water: a transmitter emits acoustic signals that reflect off obstacles, and the position of the object is determined by measuring the propagation time and direction of the sound waves. Inertial navigation computes position and heading from the measured acceleration and angular velocity of the device. Because inertial navigation systems do not rely on external signals, they are particularly useful underwater. Integrated navigation combines several navigation technologies (such as GPS, DVL, and acoustic navigation) to enhance accuracy and reliability. For example, an INS/DVL/GPS integrated navigation system corrects the errors of the inertial navigation system by fusing data from multiple sources, thereby improving navigation precision.
The training set is used to train the model with a batch size of 4 and an initial learning rate of $\eta = 0.001$. The learning rate is decayed by a factor of 0.5 every 150 training epochs, and the model converges after 500 training epochs. The test set is then fed into the trained model to obtain the predicted DVL velocity vectors and compute the corresponding evaluation metrics. The model's performance on the test set is shown in Figure 7.
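The learning-rate schedule described above (halving every 150 epochs from 0.001) can be written as a step-decay function. If the training code is written in PyTorch (this is not stated here), the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.5)` stepped once per epoch. The function name is hypothetical.

```python
def step_decay_lr(epoch, base_lr=0.001, factor=0.5, step=150):
    """Learning rate after `epoch` epochs: base_lr * factor ** (epoch // step)."""
    return base_lr * factor ** (epoch // step)

for epoch in (0, 149, 150, 300, 499):
    print(epoch, step_decay_lr(epoch))
```

Over 500 epochs the rate therefore takes four values: 0.001, 0.0005, 0.00025, and 0.000125.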
The velocity curves show that the DVL velocity vectors predicted by the model closely follow the GPS velocity vectors, especially during significant changes in speed. Figure 7 also shows that the lateral velocity (VL) of the original DVL is relatively poor, especially when the vehicle changes direction. This is consistent with the measurement characteristics of the DVL and is expected: when the vehicle changes direction sharply, beam loss becomes more likely, which degrades the quality of the DVL velocity output. After optimization with our algorithm, however, the DVL is corrected considerably. The forward velocity (VF) of the original DVL performs better, possibly because the vehicle always moves forward; its quality decreases when VF changes abruptly, but after optimization with our algorithm it remains accurate. In summary, our algorithm significantly improves the DVL velocity vector.
Performance evaluation is a crucial step in ensuring that underwater navigation systems operate reliably in complex marine environments. Since position and velocity are the most critical navigation quantities, evaluating an underwater navigation system essentially means assessing the accuracy of its position and velocity estimates. Following Armaghani and Asteris [39], we use four evaluation metrics:
Root Mean Squared Error (RMSE) measures the difference between the model's predictions and the actual values. A smaller RMSE indicates a smaller difference and thus better prediction performance:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \tag{22}$$

where $y_i$ is the true value at time $i$, and $\hat{y}_i$ is the predicted value at time $i$.
Mean Absolute Error (MAE) measures the average magnitude of the prediction error. A smaller MAE indicates better prediction performance:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{23}$$
Coefficient of Determination ($R^2$) indicates how well the predictions explain the variation in the true values. The closer it is to 1, the better the predictions match the true values; the closer to 0, the weaker the relationship:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \tag{24}$$

where $\bar{y}$ is the mean of the true values.
Variance Accounted For (VAF) evaluates the predictive capability of the model. A larger VAF indicates better predictive ability:

$$\mathrm{VAF} = \left(1 - \frac{\operatorname{var}\left(y - \hat{y}\right)}{\operatorname{var}(y)}\right) \times 100\% \tag{25}$$
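The four metrics of Equations (22) to (25) can be computed with the standard library alone, as sketched below. The sample values are made up for illustration, and the function names are hypothetical. Population variance is used for VAF; the ratio in Equation (25) is the same whichever variance normalization is used, as long as it is applied consistently.

```python
import math
from statistics import pvariance

def rmse(y, y_hat):
    """Equation (22)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Equation (23)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Equation (24)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def vaf(y, y_hat):
    """Equation (25), in percent."""
    residual = [a - b for a, b in zip(y, y_hat)]
    return (1.0 - pvariance(residual) / pvariance(y)) * 100.0

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred), vaf(y_true, y_pred))
# approximately 0.5, 0.25, 0.8, 85.0
```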
Feeding the test set into the model, we computed the four evaluation metrics for both the original DVL velocity vector and the model-predicted velocity vector with respect to the GPS velocity vector. The results are shown in Table 2.
The velocity metrics show that the proposed model significantly improves the DVL velocity vector: the RMSE improved by up to 69.26%, the MAE by up to 69.10%, the $R^2$ by up to 83.99%, and the VAF by up to 76.98%.
To further demonstrate the improvement in navigation accuracy, we performed dead reckoning of the vehicle using the GPS velocity vector, the original DVL velocity vector, and the improved DVL velocity vector. The resulting trajectories are shown in Figure 8.
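The dead-reckoning equations are not detailed in this section, so the sketch below is an illustration under stated assumptions: a flat two-dimensional north-east frame, a heading ψ measured clockwise from north (for example, supplied by the SINS), the forward velocity VF along the bow, the lateral velocity VL positive to starboard, and the 5 Hz DVL rate (Δt = 0.2 s). The function name `dead_reckon` is hypothetical.

```python
import math

def dead_reckon(v_forward, v_lateral, heading_rad, dt=0.2, start=(0.0, 0.0)):
    """Integrate body-frame horizontal velocity into north/east positions.

    Assumptions (not from the paper): heading is clockwise from north,
    lateral velocity is positive to starboard, and velocity is constant over each step.
    """
    north, east = start
    track = [(north, east)]
    for vf, vl, psi in zip(v_forward, v_lateral, heading_rad):
        # Rotate body-frame (forward, starboard) velocity into the north-east frame.
        v_north = vf * math.cos(psi) - vl * math.sin(psi)
        v_east = vf * math.sin(psi) + vl * math.cos(psi)
        north += v_north * dt
        east += v_east * dt
        track.append((north, east))
    return track
```

For example, `dead_reckon([1.0] * 5, [0.0] * 5, [0.0] * 5)` moves the vehicle 1 m north in 1 s. Running the same function on the GPS, original DVL, and model-corrected velocities yields three trajectories that can be compared directly.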
Figure 8 shows that the original SINS performs poorly. This is expected, because SINS error grows with navigation time, which is why the DVL is generally used to correct the SINS velocity. Although the trajectory from the original DVL deviates only slightly from the GPS trajectory, some error remains, and it grows after repeated changes of direction. This corresponds to the degraded DVL velocity measurements after direction changes seen in Figure 7. The trajectory obtained from the DVL velocity improved by our model is closer to the GPS trajectory and therefore more accurate.
We use two metrics, absolute trajectory error (ATE) and relative trajectory error (RTE), to quantify the differences between trajectories. ATE measures the global discrepancy between the estimated and true trajectories, whereas RTE measures local accuracy, that is, the accuracy of displacements between nearby positions within the trajectory. ATE is defined in Equation (26):

$$\mathrm{ATE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\lVert \mathbf{p}_i - \hat{\mathbf{p}}_i \right\rVert^{2}} \tag{26}$$

where $\mathbf{p}_i$ is the true position at time $i$, and $\hat{\mathbf{p}}_i$ is the predicted position at time $i$. RTE is defined in Equation (27):

$$\mathrm{RTE} = \sqrt{\frac{1}{n-\Delta}\sum_{i=1}^{n-\Delta}\left\lVert \left(\mathbf{p}_{i+\Delta} - \mathbf{p}_i\right) - \left(\hat{\mathbf{p}}_{i+\Delta} - \hat{\mathbf{p}}_i\right) \right\rVert^{2}} \tag{27}$$

where $\mathbf{p}_{i+\Delta}$ is the true position at time $i+\Delta$, $\hat{\mathbf{p}}_{i+\Delta}$ is the predicted position at time $i+\Delta$, and $\Delta$ is a fixed time offset. From the trajectory data, we calculate the ATE and RTE of the trajectories dead-reckoned from the original DVL velocity and from the model-predicted velocity, each with respect to the trajectory dead-reckoned from the GPS velocity. The results are shown in Table 3.
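A standard-library sketch of Equations (26) and (27) as written above. The sample trajectories are made up, and the offset `delta` is a hypothetical choice, since the time offset used for RTE is not stated in this section; the function names are also hypothetical.

```python
import math

def ate(p_true, p_est):
    """Equation (26): RMS of position errors over the trajectory."""
    sq = [sum((a - b) ** 2 for a, b in zip(pt, pe)) for pt, pe in zip(p_true, p_est)]
    return math.sqrt(sum(sq) / len(sq))

def rte(p_true, p_est, delta=1):
    """Equation (27): RMS of displacement errors over a fixed offset `delta`."""
    sq = []
    for i in range(len(p_true) - delta):
        d_true = [b - a for a, b in zip(p_true[i], p_true[i + delta])]
        d_est = [b - a for a, b in zip(p_est[i], p_est[i + delta])]
        sq.append(sum((u - v) ** 2 for u, v in zip(d_true, d_est)))
    return math.sqrt(sum(sq) / len(sq))

gps = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
dvl = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(ate(gps, dvl), rte(gps, dvl, delta=1))  # about 0.577 and 1.0
```

A single displaced point contributes once to ATE but twice to RTE (once to each adjacent displacement), which illustrates why RTE is more sensitive to local jumps than to slow global drift.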
In terms of ATE and RTE, the DVL velocity vector improved by the model substantially enhances navigation accuracy, with maximum improvements of 78.62% in ATE and 69.90% in RTE.
In summary, after being improved by the proposed model, the original DVL velocity vector becomes closer to the GPS velocity vector, thereby enhancing the accuracy of underwater navigation.
5. Conclusions and Prospects
5.1. Conclusions
We proposed an IMU-assisted deep-learning method to enhance DVL velocity vectors. Raw data from the IMU sensors (gyroscopes and accelerometers) are used to refine the original DVL velocity vectors, with the aim of improving underwater navigation accuracy. To evaluate the method, we conducted lake experiments with equipment including a SINS and a DVL, completing multiple runs with varying trajectories and durations; all collected data served as the dataset for model training and testing. On the test set, the DVL velocity vector improved by the model shows significant improvements in all four evaluation metrics: RMSE, MAE, $R^2$, and VAF. The velocity plots show that the original DVL velocity vector performs poorly when the vehicle turns, whereas the model-improved DVL velocity vector handles this situation well. The dead-reckoning trajectory plots show that the trajectory derived from the model-predicted DVL velocity vector overlaps more closely with the trajectory derived from the GPS velocity vector, and the ATE and RTE between the two trajectories are smaller.
5.2. Prospects
This paper considered only constant-depth navigation, so we improved only the DVL's lateral and forward velocity components. In the future, we plan to incorporate depth data into the model to improve all three DVL velocity axes and further enhance underwater navigation accuracy.
The conclusions of this paper are based on a lake environment, which differs significantly from marine environments. Salinity and temperature both affect the speed of sound and thus DVL velocity measurements, and marine environments tend to have larger waves. The applicability of the proposed method to marine environments therefore remains to be verified; we plan to make improvements and complete validation in marine environments when conditions allow.
The experimental results demonstrate the effectiveness of the proposed method but do not establish whether it has an advantage over similar methods. In addition, this paper does not evaluate the computational cost of the machine-learning approach, which may be higher than that of traditional numerical processing methods such as filtering. Future work can investigate these questions.
This paper considered only a single, fixed SINS and DVL configuration and did not examine the transferability of the method across equipment. In the future, we will also study transferability, focusing on the following three aspects:
Transferability between different DVLs: Different DVL products will affect the transferability of the model. In the future, the robustness of the model can be enhanced to achieve compatibility with different DVL models.
Transferability between installation settings: When installing the DVL and SINS, lever-arm compensation must be taken into account, and navigation should begin only after SINS alignment, which improves the stability of the equipment.
Transferability between different SINS: Different SINS will also affect performance; we expect that the better the SINS performance, the greater the improvement to the DVL. In the future, the model's performance could therefore be improved indirectly by improving the SINS.
The transferability of the algorithm can also be improved through measures such as developing more general-purpose algorithms, conducting more cross-platform tests, and establishing more comprehensive error compensation models.