1. Introduction
Underwater exploration is critical in understanding our oceans, impacting fields such as climate science, environmental monitoring, and resource management [1,2]. Recent technological advancements have enabled the development of sophisticated autonomous underwater vehicles (AUVs), equipped with high-resolution sensors, advanced navigation systems, and powerful propulsion capabilities, allowing them to operate in complex and deep-sea environments. Among these AUVs, underwater gliders (UGs) represent a unique subclass, first conceptualized by Henry Stommel in 1989. UGs utilize buoyancy adjustments to glide through the water, collecting data on essential parameters like temperature, salinity, and currents [3,4]. Their energy-efficient design enables long-term autonomous operation, making them particularly valuable for monitoring complex oceanic phenomena such as mesoscale eddies, internal waves, and hurricane activity [5].
Despite their inherent advantages, during operation, UGs confront considerable challenges arising from complex and time-varying currents, which can significantly compromise their control precision and operational efficiency [6,7,8]. The accurate prediction of their operational parameters is crucial for optimizing mission planning and ensuring data quality [9,10]. Current control methods for UGs in practical applications can generally be categorized into onboard control methods and remote control algorithms [11,12]. The former operate within the UG control unit, focusing extensively on constructing the dynamic model and selecting parameters specifically for the glider’s body. This tends to produce highly specific control approaches that lack universality and struggle with the complexity and dynamic nature of underwater environments [7,13]. Remote control algorithms, operated from shore-based control centers, not only allow the application of complex control algorithms but also enable human intervention and real-time adjustments to the control precision of underwater gliders, thereby reducing the risk of control mishaps [14,15]. However, this undoubtedly increases human and time costs. Currently, UG control parameter tuning primarily relies on manual experience, which significantly lowers the accuracy and efficiency of control in practical missions [16].
In research on the control and navigation of underwater gliders, scholars have proposed various methods to improve control accuracy, address environmental disturbances, and optimize energy efficiency. Yang et al. [17] employed an anti-system approach to decouple the original multi-input, multi-output (MIMO) system into two independent single-input, single-output (SISO) linear subsystems, then used sliding mode control (SMC) to control each subsystem individually, followed by simulation verification. Joo and Qu [18] applied the LQR method to control the depth of a zigzag gliding path, likewise validated through simulation. La et al. [19] designed a layered PID controller that adjusted the heading via the rudder, achieving enhanced energy efficiency. Wang et al. [20] investigated how the initial heading error in underwater gliders affects trajectory accuracy and explored a sensitivity analysis of navigation commands to enhance path precision.
In the context of interdisciplinary integration across various fields, researchers have introduced deep neural network models into UG control studies, aiming to reduce reliance on physics-based modeling. Shan et al. [21] proposed a model predictive control method based on recurrent neural networks to regulate the pitch angle, and its effectiveness was verified through simulation. Isa et al. [22] utilized neural networks to fit dynamic models and designed a neural network predictive controller for UG motion control, comparing its performance with model predictive controllers (MPCs) and linear quadratic regulators (LQRs) through simulation. Wang et al. [23] proposed a novel roll center compensation method (RCCM) utilizing variational mode decomposition and long short-term memory (VMD-LSTM) to accurately predict and minimize the roll regulation unit’s (RRU’s) energy consumption. Zhang et al. [10] introduced a hybrid model that integrates single prediction models with optimal weights, utilizing a simulated annealing-optimized Frank–Wolfe method for weight derivation, in order to accurately forecast the surfacing positioning point (SPP) of UGs at various time scales.
Building upon these advancements, other studies have explored various deep learning architectures and hybrid models to further enhance the predictive accuracy and operational efficiency of underwater gliders in diverse and dynamic marine environments. Immas et al. [24] proposed two prediction tools using deep learning, LSTM, and Transformer, to perform real-time in-situ prediction of ocean currents at any location. Mu et al. [25] proposed a novel navigation method for AUVs with hybrid recurrent neural networks, which could also satisfy the real-time requirement. Liu et al. [26] proposed a VMD-BiGRU model for short-term buoy motion prediction, effectively enhancing accuracy by decomposing signals and capturing motion patterns across frequency scales. Hou et al. [27] proposed an SSA-optimized Informer model with CEEMDAN to forecast long-sequence ship motion attitude, enhancing prediction performance across varied sea conditions. Jiang et al. [28] proposed a joint multi-model machine learning method based on confidence to improve ship stability prediction accuracy, reducing input features and enhancing robustness. In order to further elaborate the literature survey, we show the main contributions of past studies in Table 1.
Despite significant advancements in predictive modeling for control parameters within the domain of UG research, the majority of studies have been validated through simulations, reflecting an absence of extensive empirical data. Consequently, there remains a substantial opportunity for enhancing the predictive accuracy of these models, particularly through the integration of more diverse and extensive datasets [29,30,31]. UGs are driven by buoyancy and have weak maneuverability when operating in strong ocean currents [32,33]. Furthermore, UGs cannot be positioned underwater because they lack acoustic devices or inertial sensors, which introduces uncertainty into the control parameters within a single profile and gives them strong nonlinear characteristics [34]. Therefore, the development of a novel parameter prediction method is crucial for the local or global control of UGs. Concurrently, the prediction of key UG control parameters, particularly the rudder angle, aids in the dissemination of control decisions and heading maintenance during task execution.
Table 1.
Studies related to underwater glider parameter prediction.
Category | Reference | Method | Description and Conclusions |
---|---|---|---|
LSTM variants | [35,36,37,38] | Multiscale attention-based LSTM, self-attention LSTM (SALSTM), LSTM, LSTM with multi-head attention | Various LSTM variants have been proposed, including multiscale attention, self-attention, and multi-head attention LSTMs, to improve prediction accuracy for ship motion, roll, surge, and pitch by enhancing robustness and handling different input-output ratios. |
Deep learning and attention mechanism models | [24,25,39] | Hybrid RNN, LSTM, Transformer, weighted attention model | A range of deep learning and attention mechanisms, such as hybrid RNNs, Transformers, and weighted attention models, have been used for real-time predictions of ocean currents and AUV navigation, enhancing prediction accuracy across broad temporal and spatial ranges. |
Data decomposition and hybrid models | [26,40] | VMD-BiGRU, TVF-EMD + SVR | Data decomposition and hybrid models like VMD-BiGRU and TVF-EMD + SVR have been developed to improve the accuracy of short-term buoy motion and ship roll predictions by addressing frequency-specific patterns, nonlinearity, and time-varying dynamics. |
Informer and its variants | [27,41] | Conv-Informer, SSA-Optimized CEEMDAN-Informer | Informer variants, such as Conv-Informer and SSA-Optimized CEEMDAN-Informer, are used for multi-step and long-sequence ship motion predictions, capturing both local and long-range dependencies, and enhancing performance across varied sea conditions. |
Multi-model and combination methods | [10,28] | Multi-model machine learning, combination model | Multi-model and combination methods like confidence-based machine learning and optimized weighting models improve the prediction accuracy of ship stability and surfacing positioning for underwater gliders across different time scales. |
To sum up, due to the large time–space span of observation missions, UG control parameters, particularly the rudder angle, are challenging to predict accurately using traditional deep neural network methods. Considering the vast and variable dataset of UGs, deep learning (DL) [42,43] is applied to establish a method that predicts the rudder angle for the next profile, guiding heading-correction adjustments in advance. The integration of DWT [44] and FFT-Attention mechanisms [45] into DL models has shown promise in capturing both high-frequency details and low-frequency trends in UG data. Such hybrid models can significantly enhance the robustness and accuracy of predictions by leveraging the strengths of both time- and frequency-domain analyses. Furthermore, the use of recurrent neural networks (RNNs) [46], such as long short-term memory (LSTM) [47] and gated recurrent unit (GRU) [48] networks, has proven effective in capturing temporal dependencies in glider data, thereby improving the prediction of dynamic behaviors over time [49]. These models can learn from sequential data and provide robust predictions, even in the presence of noise and missing data, which are common in marine environments [50,51].
This paper proposes a novel method termed DFFormer to predict UG control parameters, with a particular focus on rudder angles. The proposed method integrates discrete wavelet transform (DWT) to decompose the rudder angle signal and employs a fast Fourier transform-based attention mechanism (FFT-Attention) to effectively capture and analyze its frequency- and time-domain characteristics. Notably, the method leverages a Transformer architecture to process the decomposed signals through multiple parallel pathways, substantially improving the capability to forecast complex and variable UG control parameters. The incorporation of an attention mechanism significantly bolsters the model’s capacity to identify pivotal features within time-series data, culminating in a more robust and dependable forecasting framework. The efficacy of the proposed method is validated through actual sea trials. The results demonstrate significant improvements in prediction accuracy and operational efficiency, thereby enhancing the gliders’ heading-keeping ability. This study not only advances the field of underwater vehicle parameter prediction but also offers valuable insights for the application of deep learning techniques in other types of autonomous underwater systems.
The rest of this paper is organized as follows: Section 2 introduces the modeling approach for underwater gliders, focusing on the kinematic equations and control strategies. Section 3 presents the working principles of the proposed DFFormer method, covering the design and implementation of the discrete wavelet transform, the fast Fourier transform-based attention mechanism, and the recursive layers. Section 4 presents the experimental results and analysis, comparing the performance of various prediction methods using actual sea trial data. Finally, the concluding section summarizes the research findings and outlines future research directions.
2. Underwater Glider Modeling
2.1. The Petrel-L Underwater Glider
Petrel-L, depicted in Figure 1, is an advanced underwater glider developed by Tianjin University in China. It features a cylindrical pressure hull that accommodates several components, including the emergency device, fixed wings, two fairings, and a trailing antenna. Within this pressure hull, the glider is equipped with numerous subsystems, such as buoyancy-regulating, pitch-regulating, roll-regulating, and control units, as well as communication and navigation units and a battery pack.
As illustrated in Figure 2, UGs performing ocean observation operations are generally remotely controlled via satellite communication from a shore-based control center. Upon receiving instructions, the glider undertakes profile observation maneuvers, while onboard sensors simultaneously gather data on the marine environment. The conventional workflow comprises three phases: the surface communication, diving, and climbing phases. Current technologies do not allow for stable communication during deep-water operations; the glider is, thus, in a no-communication state while diving.
Surface communication phase: The buoyancy drive unit of the underwater glider adjusts the total displacement volume to ensure its net buoyancy exceeds gravity, allowing it to float on the surface. According to a predetermined optimal communication angle, the attitude adjustment unit configures the glider in a nose-down posture, exposing the stern antenna above the water surface for satellite link communication with the shore-based control center. During this phase, the glider transmits its location and attitude information and receives control commands.
Diving phase: Following the reception of a “start gliding” command from the control center, the buoyancy drive unit adjusts the displacement volume to render the glider’s net buoyancy less than gravity. The attitude adjustment unit continuously modifies the pitch and heading angles to maintain a pre-set pitch angle for the diving maneuver. The onboard control unit concurrently activates the task sensors to begin data collection.
Climbing phase: Once the depth sensor detects that the actual depth has reached a pre-set threshold, the buoyancy drive unit adjusts the buoyancy of the glider to exceed gravity, initiating upward movement. Simultaneously, the attitude adjustment unit transitions the glider from a nose-down to a nose-up posture, maintaining the pre-set pitch angle during the ascent. The task sensors operate as in the diving phase until the depth sensor indicates that the actual depth has reached the surface level, after which the various modules of the underwater glider enter the surface communication phase of the next profile.
2.2. Underwater Glider Kinematic Modeling
The ‘Petrel-L’ underwater glider is elliptical, and it navigates through the Earth’s oceans by pitching, rolling, and yawing. Therefore, it is necessary to establish the body, velocity, and inertial coordinate systems, as shown in Figure 3.
The position vector (X, Y, Z) and the attitude vector (roll, pitch, yaw) describe the underwater glider’s buoyancy center in the inertial coordinate system. X and Y are the horizontal-plane coordinates, and Z is the depth. The roll angle is the rotation about the longitudinal axis; a positive value indicates a right roll when viewed from the tail to the nose of the vehicle. The pitch angle is the angle between the longitudinal axis of the underwater glider and the horizontal plane; a positive value indicates a pitch-down angle, and a negative value indicates a pitch-up angle. The yaw angle is the angle between the projection of the longitudinal axis of the underwater glider onto the horizontal plane and the inertial X axis; a positive value indicates a right yaw when viewed from the tail to the nose of the vehicle.
The vectors (u, v, w) and (p, q, r) represent the velocity and angular velocity of the underwater glider’s buoyancy center in the body coordinate system, respectively. Here, u, v, and w are the linear velocities of the glider’s buoyancy center along the body-frame axes, and p, q, and r are the angular velocities of the glider’s buoyancy center about those axes, respectively.
The angle of attack and the sideslip angle are defined in the velocity coordinate system. The angle of attack is the angle between the projection of the buoyancy center’s velocity vector onto the glider’s longitudinal symmetry plane and the longitudinal axis; a positive value indicates that the projection lies below the axis, and a negative value indicates that it lies above. The sideslip angle is the angle between the velocity vector and that projection; when viewed from the tail to the nose of the vehicle, a positive value indicates that the velocity vector lies to the right of the axis, and a negative value indicates that it lies to the left.
To design control strategies for UGs, it is necessary to comprehensively consider the glider’s position, angle, velocity, and angular velocity information. Therefore, these variables must be transformed into the same coordinate system. First, rotate the inertial coordinate system about its vertical axis by the yaw angle; then rotate about the resulting transverse axis by the pitch angle; finally, rotate about the resulting longitudinal axis by the roll angle to obtain the body coordinate system, with each rotation described by an elementary rotation matrix. The rotation matrix from the inertial frame to the body frame is the product of these three elementary matrices and is specifically expressed as follows:
The rotation matrix from the body coordinate system to the velocity coordinate system is expressed as follows:
Studying the kinematic equations of UGs is equivalent to studying their motion in the inertial coordinate system. Therefore, the velocity defined in the velocity frame must first be transformed into the body frame, and the body-frame velocity must then be transformed into the inertial frame using the corresponding rotation matrices. The transformation expression is as follows:
In addition to velocity, the state variables of an underwater glider in the inertial coordinate system also include the attitude angles. Their rates are expressed in terms of the body-frame angular velocities through the corresponding rotation matrix, as follows:
The expressions for obtaining the magnitude of the velocity vector, the angle of attack, and sideslip angle are as follows:
By substituting the expressions of the rotation matrices into Equations (5) and (6), we obtain the kinematic equations of the underwater glider as follows:
3. Methodology
This paper proposes a novel model for forecasting UG control parameters based on the Transformer, including an encoder and decoder, as shown in Figure 4.
In the encoder and decoder, DWT is utilized to decompose the series into high-frequency details and low-frequency trend sections. The high-frequency details contain a large number of frequency-domain features, so FFT-Attention is used to extract them. The low-frequency trend part contains fewer frequency features, and its time-domain features are considered dominant; as such, the standard attention mechanism is used to extract them. In order to enhance the model’s feature extraction capability, parallel recursive layers are created. Moreover, a fusion block is proposed to integrate the extracted feature information. Finally, multi-fusion block layers are used to integrate the learned information into the output for prediction. Specifically, the model construction process is as follows.
For every time step, the input time series is characterized by the batch size b, the input length s, the prediction steps p, and the input dimension D. Similar to the Transformer, the encoder and decoder of the proposed model receive separate inputs, where l denotes the length of the decoder input, i.e., the label length.
3.1. Using Discrete Wavelet Transform to Decompose the Input Series
In this paper, DWT is performed to decompose these time-series signals into trend and detail components utilizing low- and high-pass filters. DWT decomposes a given discrete signal into orthogonal wavelet functions. In the case of 1D signals, like a time series, the result is a transformed vector of equal length. The vector is filtered with a low-pass and then a high-pass filter. We can mathematically represent the DWT as in Equation (8):
The terms l and m in the above equation for the DWT represent the scale factor and translation index, respectively, and the mother wavelet function generates the wavelet basis. In detail, the input series is decomposed by convolving it with a low-pass and a high-pass filter, where K denotes the length of the input series, as shown in Equations (9) and (10):
where the trend and detail components are obtained element by element from the single-level discrete wavelet decomposition, and x denotes an element of the input. Therefore, the DWT results of these input series can be obtained according to Equations (9) and (10), as depicted in Equations (11) and (12), as follows:
where the discrete wavelet transform of Equations (8)–(10) is applied to the encoder and decoder inputs to yield their respective trend and detail components.
3.2. Using FFT-Attention to Extract Parallel Features in Frequency Domain
For the encoder section, embedding is carried out to enhance the information of the input time series for more effective feature capture utilizing FFT-Attention. Firstly, the embedded vectors can be obtained from Equation (13): where d denotes the feature dimension and the embedding structure maps the inputs to this dimension.
In the high-frequency details section, based on signal decomposition and an adapted attention mechanism, FFT-Attention is utilized to extract the feature information in the frequency domain. In particular, the input vectors are transformed into query, key, and value vectors by utilizing the feature augmentation method, as in Equations (14)–(16): where the corresponding parameter matrices are learnable. Then, these vectors are transformed multiple times to obtain multiple self-attention sub-layers, where h denotes the head size; the key and value vectors of each sub-layer are obtained similarly. In every sub-layer, the query, key, and value vectors are evaluated using the fast Fourier transform in order to transfer the features to the frequency domain, as depicted in Equation (17):
where the feature dimension after the fast Fourier transform follows from the transform length. Moreover, every sub-layer can be calculated using Equation (18). Therefore, all sub-layer outputs are concatenated to acquire the feature vectors in Equation (19): where the related parameter matrix projects the concatenation. Furthermore, the inverse fast Fourier transform is performed to return the signal to the time domain in Equation (20). Next, one feed-forward layer and two residual links are created, as in Equation (21): where the residual-link function, the feed-forward-layer function, and the associated learnable parameters are defined accordingly.
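One possible reading of the FFT-Attention computation in Equations (14)–(20) can be sketched as follows for a single head. The magnitude-based score and the normalization below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fft_attention(Q, K, V):
    """Single-head attention evaluated in the frequency domain (sketch).

    Q, K, V: real-valued (seq_len, d) matrices. Each is mapped to the
    frequency domain with the FFT, an attention map is formed from the
    magnitudes of the complex similarities, and the weighted values are
    mapped back to the time domain with the inverse FFT.
    """
    Qf, Kf, Vf = (np.fft.fft(M, axis=0) for M in (Q, K, V))
    scores = np.abs(Qf @ Kf.conj().T) / np.sqrt(Q.shape[1])  # magnitude similarity
    out_f = softmax(scores) @ Vf                             # weight frequency values
    return np.fft.ifft(out_f, axis=0).real                   # back to the time domain
```

In the full model, this computation would run in parallel across h heads, with the concatenated head outputs projected by a learned matrix as in Equation (19).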
3.3. Using the Recursive Layer to Capture Inference Features
In addition, the recursive layer is utilized to better extract feature information by capturing inference features. In this recursive layer, four structures can be chosen for information capture, including MLP, RNN, LSTM, and GRU. In detail, these structures can be represented as follows:
- 1.
Multilayer perceptron.
The MLP is a feed-forward network that learns weights mapping inputs to outputs. For a model with two hidden layers, the chain structure of multiple stacked layers provides depth. In order to improve the learning ability of the model, nonlinear functions, such as the sigmoid or tanh functions, are applied to the neuronal outputs. The optimal weights are determined by minimizing a differentiable loss function using backpropagation, which updates the network weights by propagating the gradient of the weights relative to the loss function back through the network.
- 2.
Recurrent neural network.
The main purpose of the RNN, a structural diagram of which is shown in Figure 5, is to process and predict sequence data. The calculation process of the classic RNN can be expressed as Formulas (22) and (23):
where the inputs are x1, x2, ⋯, xt; the corresponding hidden states are h1, h2, ⋯, ht; and the outputs are o1, o2, ⋯, ot. U, W, and V denote the weight parameters, b and c represent biases, and f denotes the activation function, which is usually the tanh function, as follows:
- 3.
Long short-term memory.
LSTM, a structural diagram of which is shown in Figure 6, is composed of forget, input, and output gates. These jointly control the storage and deletion of data, as shown in Equations (24)–(28):
where ft denotes the forget gate, it represents the input gate, and ot indicates the output gate, with the related weight matrices and biases as learnable parameters. The cell state ct denotes the data retained from the beginning to the current moment, the hidden state ht−1 saves the information of the previous time step, and the output gate controls how much retained data can be transferred to the next moment.
- 4.
Gated recurrent unit.
As shown in Figure 7, a GRU has only two gates. It combines the input and forget gates found in LSTM into a single gate, called the update gate, as depicted in Equations (29)–(32):
where zt denotes the update gate and rt represents the reset gate, with the related weight matrices and biases as learnable parameters. The candidate state carries the new information, and ht denotes the output.
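The GRU update in Equations (29)–(32) can be sketched for a single time step as follows (the weight names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step. params = (Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde              # new hidden state
```

The update gate z blends the previous hidden state with the candidate state, which is what lets the GRU achieve LSTM-like gating with fewer parameters.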
These four structures can be used interchangeably within the recursive layer. Therefore, the feature vector extracted by this layer can be represented as shown in Formula (33): where the chosen recurrent structure acts as a nonlinear mapping. The extracted feature vector is a three-dimensional tensor with dimensions (b, s, d), where b denotes the batch size, s denotes the sequence length, and d denotes the feature dimension.
3.4. Integrating Extracted Features Using the Fusion Block
Different features often have different complexities, and simple addition cannot determine the corresponding weights, resulting in inadequate integration. To effectively integrate this feature information, we present the fusion block for the first time in this paper. In particular, we define the gate argument g, as shown in Equation (34): where the relevant weight matrices and the bias are learnable parameters. Furthermore, the feature information can be integrated as in Equation (35): where the gate weights the two feature streams and ⨂ represents the dot product.
3.5. Extracting Parallel Features in the Time Domain Using Attention
In the low-frequency trends section, unlike in the details part, the fast Fourier transform is not used. More specifically, referring to Formulas (14)–(16), the query, key, and value feature vectors of the sub-layers are defined analogously. Then, the attention feature vectors can be obtained according to Equations (18) and (19). The remaining steps are consistent with the details section based on Equations (21)–(35), and the trend characteristics are obtained accordingly. Next, one feed-forward layer and two residual links are created, as shown in Equation (36): where the residual-link function, the feed-forward-layer function, and the associated learnable parameters are defined as before. The rest of the structure mirrors that of the high-frequency details, and the output features of the low-frequency trends section are obtained.
3.6. The Output Prediction Results Utilizing the Decoder Structure
In the decoder, to realize the prediction function, future information must be masked during training. As such, the masking attention mechanism is used. Except for the masking operation, the attention computation steps are the same as in the encoder. The detailed steps are as follows.
Firstly, embedding is executed to obtain enhanced feature information, as in Equation (37). Secondly, similar to the steps of the encoder, the query, key, and value feature vectors of the sub-layers can be obtained according to Equations (14)–(17). Thirdly, a mask matrix is added to the sub-layers of the attention calculation, as referred to in Equation (18) and depicted in Equation (38): where the mask matrix is a 0–1 matrix that blocks attention to future positions. The feature vectors are then calculated using Equation (39):
where the related parameter matrix projects the concatenated sub-layer outputs. Referring to Equations (20) and (21), the masked detail feature vectors can be obtained; the masked trend feature vectors are calculated in the same way. The fusion block is utilized to integrate the two, as shown in Equation (40): where a scaling factor and the gating parameter g determine the blend. This produces the fused feature vector, which serves as the integrated feature representation for the subsequent decoding steps.
Next, the fused decoder features are input as query vectors to FFT-Attention, with the encoder outputs serving as the key and value vectors. This allows the feature information extracted in the encoder to be integrated into the decoder for forecasting. Referring to Equations (14)–(19), the detail feature vectors can be computed using Equations (41)–(44): where the related parameter matrices are learnable. The extracted trend feature vectors can be obtained in a similar manner. Then, similarly, according to Equations (20)–(35), the feature vectors of the already fused details are obtained.
In a similar way, the fused trend feature vectors can be computed. Furthermore, a fusion block is created to integrate the fused detail and trend feature vectors and obtain the output feature vectors in Equations (45) and (46), according to Equations (34) and (35): where the relevant weight matrices and the bias are learnable parameters. Finally, an inverse wavelet transformation with a linear layer is utilized to obtain the prediction information, as in Equation (47).
3.7. Optimizing the Neural Network Structure Using the Huber Loss Function
Because of the harsh operating environment, which is strongly affected by outliers, the model must be optimized with a robust loss function. The Huber loss function is robust, combining the advantages of the mean absolute error (MAE) and the mean square error (MSE). It not only has continuous derivatives but can also use the MSE gradient to reduce errors and obtain more accurate results. The Huber loss function can be expressed as in Equation (48): where y denotes the original series and ŷ represents the prediction series.
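The Huber loss of Equation (48) is quadratic for small residuals and linear for large ones, switching at a threshold delta; it can be sketched directly:

```python
import numpy as np

def huber_loss(y, y_hat, delta=1.0):
    """Mean Huber loss: MSE-like near zero, MAE-like for outliers."""
    r = np.abs(y - y_hat)
    quadratic = 0.5 * r**2                 # small residuals: smooth MSE branch
    linear = delta * r - 0.5 * delta**2    # large residuals: robust MAE branch
    return np.mean(np.where(r <= delta, quadratic, linear))
```

The two branches meet with matching value and slope at r = delta, which is why the loss keeps a continuous derivative while down-weighting the outliers common in sea-trial data.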
4. Experimental Results and Discussion
In this section, we use a dataset derived from sea trials to build predictive models. Specifically, the dataset captures the rotational angle of the regulatory unit’s mass during the steady-state gliding phase, hereinafter referred to as the rudder angle, and records the mean rudder angle across both the descent and ascent phases. First, we detail the acquisition and preprocessing of the raw data. Then, we employ five predictive algorithms (Informer, Autoformer, VMD-SSA-LSSVM, VMD-LSTM, and the proposed DFFormer) to train on the training dataset and forecast the rudder angle. The validation set is used to validate and evaluate the performance of these methods. After a comprehensive comparison and analysis of the predictive results, DFFormer is conclusively identified as the optimal forecasting method for UGs.
4.1. Research Data
From April to June 2023, the Petrel-L glider underwent sea trials in the South China Sea.
Figure 8 illustrates its area and trajectory during the oceanic experiment, encompassing a longitudinal span from 110° E to 117° E and a latitudinal span from 16° N to 21° N. Additionally, an inset in the upper left corner displays a deployment photograph of the glider. Throughout the 70-day trial period, the glider functioned without any reported issues, successfully compiling a dataset comprising 1110 individual profiles.
Figure 9 displays the depth and rudder angle data for the Petrel-L glider over a specific period, corresponding to profiles 480 to 483. As shown, both the depth (indicated by the blue line) and the rudder angle (represented by the red line) fluctuate with gliding time, particularly during the descent and ascent phases when the RRU operates frequently. Prolonged underwater operations can lead to biofouling, the accumulation of RRU errors, and unpredictable disturbances from ocean currents, all of which may induce lateral imbalances in the glider. These imbalances can affect the stability of heading-keeping. Consequently, it is essential to make dynamic predictions for the roll center’s imbalances, particularly focusing on predicting rudder angles. By accurately forecasting these angles, the frequency of rudder adjustments can be minimized, thereby significantly enhancing the glider’s heading-keeping performance.
To ascertain the authenticity and reliability of the dataset, several stringent prerequisites must be satisfied before utilizing the raw UG data for estimating the mean rudder angle:
Eliminate the time-series raw data obtained when the buoyancy adjustment unit is operating. Data collected during buoyancy adjustments capture rapid changes that are unrepresentative of the glider’s steady-state gliding dynamics. By removing these intervals, the dataset retains only stable conditions, reducing noise and preventing the model from learning patterns that could reduce reliability.
Eliminate the raw data during UG’s surface positioning and communication periods. During surface positioning, environmental influences such as wave and current impacts cause irregular rudder adjustments. Excluding these segments allows the model to focus on stable underwater data, eliminating inconsistencies that might compromise prediction authenticity.
Since the mean rudder angle differs between ascent and descent, process the data for these two phases separately. The hydrodynamic conditions of ascent and descent differ, leading to distinct control requirements for the rudder angle. Treating these phases separately ensures that phase-specific patterns are accurately captured, enhancing the model’s capacity to provide robust and realistic parameter predictions.
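The three filtering rules above can be sketched in plain Python; the record keys (`phase`, `buoyancy_adjusting`, `rudder_angle`) are hypothetical placeholders, since the actual Petrel-L log format is not specified here:

```python
def mean_rudder_angle_per_phase(records):
    """Apply the three preprocessing rules and return the mean rudder
    angle separately for the descent and ascent phases.

    records: list of dicts with hypothetical keys 'phase',
    'buoyancy_adjusting', and 'rudder_angle'.
    """
    sums, counts = {}, {}
    for r in records:
        # Rules 1 and 2: drop buoyancy-adjustment intervals and
        # surface positioning/communication samples.
        if r["buoyancy_adjusting"] or r["phase"] not in ("descent", "ascent"):
            continue
        # Rule 3: accumulate descent and ascent separately.
        sums[r["phase"]] = sums.get(r["phase"], 0.0) + r["rudder_angle"]
        counts[r["phase"]] = counts.get(r["phase"], 0) + 1
    return {phase: sums[phase] / counts[phase] for phase in sums}
```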
4.2. Evaluation Indicators
To substantiate the precision of the predictive methods, a set of rigorous evaluation indicators was selected to appraise the forecast results. These indicators include the mean absolute error (MAE), the root mean squared error (RMSE), and the symmetric mean absolute percentage error (SMAPE).
MAE represents the average magnitude of the errors between the estimated values and the original data, without considering their direction. MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $y_i$ represents the original values, $\hat{y}_i$ represents the estimated values, and $n$ represents the number of observations.
RMSE is another commonly used metric that represents the square root of the average squared differences between the estimated values and the original data. It is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ represents the original values, $\hat{y}_i$ represents the estimated values, and $n$ represents the number of observations.
RMSE is sensitive to larger errors and, thus, provides a measure that emphasizes larger discrepancies between the predicted and observed values.
Additionally, the symmetric mean absolute percentage error (SMAPE) is used to further evaluate the performance of the prediction methods. SMAPE is described as follows:
$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2}$$
where $y_i$ represents the original values, $\hat{y}_i$ represents the estimated values, and $n$ represents the number of observations.
Generally, lower MAE, RMSE, and SMAPE values indicate a better performance on the part of the prediction method.
To further assess the predictive accuracy, Theil’s inequality coefficient (TIC) and the Index of Agreement (IA) were also employed.
TIC is a measure of the relative accuracy of the predictions, where a lower TIC indicates better predictive performance. It is defined as follows:
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{y}_i^2}}$$
where $y_i$ represents the original values, $\hat{y}_i$ represents the predicted values, and $n$ represents the number of observations. TIC ranges between 0 and 1, with values closer to 0 indicating better model accuracy.
The Index of Agreement (IA) is another indicator of model performance, providing a normalized measure of the match between observed and predicted values. The IA is defined as follows:
$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(\left|\hat{y}_i - \bar{y}\right| + \left|y_i - \bar{y}\right|\right)^2}$$
where $\bar{y}$ denotes the mean of the observed values. IA values range from 0 to 1, with values closer to 1 indicating better agreement between the predicted and observed data.
By combining the indicators MAE, RMSE, SMAPE, TIC, and IA, a comprehensive evaluation of the prediction method’s performance can be achieved.
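For reference, the five indicators can be computed together in a short NumPy function (a direct transcription of the definitions above; the function name is ours):

```python
import numpy as np

def evaluate(y, y_hat):
    """Compute MAE, RMSE, SMAPE, TIC, and IA for a prediction series."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    # SMAPE in percent; assumes |y_i| + |y_hat_i| is never zero.
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(y) + np.abs(y_hat)) / 2))
    tic = rmse / (np.sqrt(np.mean(y**2)) + np.sqrt(np.mean(y_hat**2)))
    ia = 1 - np.sum(err**2) / np.sum(
        (np.abs(y_hat - y.mean()) + np.abs(y - y.mean()))**2
    )
    return {"MAE": mae, "RMSE": rmse, "SMAPE": smape, "TIC": tic, "IA": ia}
```

A perfect prediction yields MAE = RMSE = SMAPE = TIC = 0 and IA = 1, which is a quick sanity check on any implementation.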
4.3. Comparison of Different Prediction Methods
In this study, the dataset was divided into training and validation sets, with 80% of the samples used for the former and the remaining 20% for the latter. For comparison with DFFormer, we applied several established forecasting methods to the rudder-angle prediction task, including Informer [52], Autoformer [53], VMD-LSTM [23], and VMD-SSA-LSSVM [54].
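A chronological split such as the 80/20 partition above can be sketched as follows (a generic illustration, not the authors’ exact code):

```python
def train_val_split(series, train_frac=0.8):
    """Chronological split: the first train_frac of samples form the
    training set, the remainder the validation set. No shuffling, so
    future samples cannot leak into training."""
    k = int(len(series) * train_frac)
    return series[:k], series[k:]
```

Keeping the split chronological matters for time-series data: a random split would let the model train on samples that postdate the validation window.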
Figure 10 shows the prediction results and the error of the three best-performing methods (Informer, Autoformer, and DFFormer). These prediction results are very close to the raw data, indicating that these methods have high accuracy and reliability in maintaining course prediction.
In order to select the optimal prediction method among these forecasting methods, the errors between the predicted values and the raw values are calculated, as shown in
Figure 11 and
Figure 12 and
Table 2. The indicators in
Table 2 were used to quantify the aforementioned results. Overall, DFFormer has the lowest RMSE and MAE among all control variables and performs the best. In contrast, VMD-SSA-LSSVM performs the worst among all control variables, with the highest RMSE and MAE. The performances of Informer, Autoformer, and VMD-LSTM fall between those of DFFormer and VMD-SSA-LSSVM, but are noticeably inferior to DFFormer.
These results indicate that DFFormer demonstrates higher accuracy and stability when handling these control variables, clearly outperforming other methods. This superiority is attributed to improvements in the design and training process of the DFFormer model, which exhibits a better generalization ability and prediction accuracy when dealing with complex time-series data.
Furthermore, this article also tested the running time and computational resources of DFFormer and baseline models, as shown in
Table 3. These metrics include training time, inference time per prediction, and computational resources, all measured on the same hardware configuration (NVIDIA RTX 3090 GPU, 32 GB RAM).
As shown in
Table 3, although DFFormer requires more training time, runtime, and storage space than Informer, Autoformer, VMD-LSTM, and VMD-SSA-LSSVM, it is worth noting that DFFormer achieves higher accuracy. It can be expected that, as computing hardware develops, this disadvantage in running time and storage will diminish further. The accuracy of DFFormer can be attributed to its architecture, which combines the discrete wavelet transform (DWT) for signal decomposition and fast Fourier transform-based attention (FFT-Attention) for feature extraction. These mechanisms enhance the model’s ability to capture temporal features by focusing on the frequency domain, giving DFFormer stronger feature extraction capabilities and more accurate predictions. Based on the above tests, it can be concluded that DFFormer achieves higher prediction accuracy at an acceptable cost in running time and computational resources.
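To illustrate the two frequency-domain ingredients just mentioned, here is a minimal NumPy sketch of a one-level Haar DWT and a magnitude-based attention weighting over FFT bins; this is a simplified stand-in for intuition, not the DFFormer architecture itself:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: split a series (even length assumed) into
    low-frequency approximation and high-frequency detail coefficients."""
    x = np.asarray(x, float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # smooth trend component
    detail = (even - odd) / np.sqrt(2)   # local fluctuation component
    return approx, detail

def fft_attention_weights(x):
    """Frequency-domain attention sketch: softmax over the magnitudes
    of the real-input FFT, so dominant frequencies get higher weight."""
    mag = np.abs(np.fft.rfft(np.asarray(x, float)))
    e = np.exp(mag - mag.max())          # numerically stable softmax
    return e / e.sum()
```

The DWT step separates trend from fluctuation before modeling, while the FFT-based weighting concentrates attention on the spectral components that carry most of the signal energy.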
4.4. Comparative Analysis of Model Performance with Different Recursive Layers
Based on the aforementioned analysis, the DFFormer method demonstrates a superior performance compared to other methods. To further enhance the model’s predictive capabilities, we focused our attention on the recursive layer’s design. To assess its contribution to the DFFormer method, we conducted a series of experiments designed to analyze the recursive layer’s impact on the model’s overall predictive performance. By comparing models both with and without recursive layers, as well as those incorporating various types thereof (MLP, RNN, GRU, and LSTM), we extensively explored the performance of these structures in managing time-series data.
In the experiments, we constructed various model configurations to evaluate the effects of different recursive layers. Initially, the baseline model was modified by removing the recursive layer, establishing a comparison benchmark. Subsequently, we integrated MLP, RNN, GRU, and LSTM as distinct recursive layers within the model to process time-series data. We introduced these layers aiming to enhance the model’s capacity to capture temporal dependencies, thereby improving predictive performance. The experimental results are shown in
Figure 13 and
Figure 14 and
Table 4.
The experimental results indicate that the selection of the recursive layer significantly influences predictive performance. Specifically, with LSTM as the recursive layer, the model achieved the best results across all evaluation metrics, highlighting its strength in managing time-series data with long-term dependencies. In contrast, the MLP exhibited a relatively weaker performance, suggesting a limited capacity to capture complex temporal dependencies. The GRU demonstrated a commendable performance, effectively balancing accuracy and computational efficiency in certain scenarios, while the RNN, despite its potential to enhance model performance, was marginally less effective.
Through these experiments, the significance of the recursive layer in the new method becomes evident. The LSTM’s primary advantage is its robust capacity to capture temporal dependencies, whereas the GRU maintains an effective equilibrium between performance and computational efficiency. Consequently, these results furnish compelling evidence for selecting the most appropriate recursive structure.
The effectiveness of DFFormer depends not only on prediction accuracy but also on computational efficiency, particularly with different recurrent layer configurations, such as LSTM, GRU, MLP, and RNN.
Table 5 shows a comparison of computational resource consumption for DFFormer with different recurrent layers, including training time, inference time per prediction, and memory usage. All metrics were measured on the same hardware setup (NVIDIA RTX 3090 GPU, 32 GB RAM).
As shown in
Table 5, the MLP configuration achieves the shortest training and inference times with the least memory usage, making it suitable for resource-constrained environments. However, the LSTM recurrent layer, while requiring more computational resources, provides a stronger sequence modeling capability, which enhances predictive accuracy. These comparisons indicate that DFFormer can effectively balance computational load and prediction accuracy, allowing users to choose the optimal configuration based on specific application requirements.
While DFFormer achieved high predictive accuracy and stability during trials in the South China Sea, its potential generalizability to other UG types and different marine environments warrants further discussion. Notably, the fundamental sensor configurations and operational mechanisms of various UGs are often quite similar. Many UG models, regardless of their specific design variations, use comparable sensors for measuring parameters like pitch, yaw, roll, and depth, as well as environmental variables such as temperature and salinity. These shared sensor and data configurations suggest that DFFormer’s core predictive algorithms can be feasibly adapted to other UG models, allowing it to generalize effectively across various UG types with minimal modification.
In terms of generalizability to different marine environments, while oceanographic conditions such as temperature, salinity, and current patterns do vary across seas, DFFormer’s design leverages a multi-scale feature extraction method that is particularly robust in capturing and adjusting for local variability. The integration of discrete wavelet transform (DWT) and fast Fourier transform-based attention (FFT-Attention) allows DFFormer to effectively model both high- and low-frequency characteristics in UG data. This dual approach enables DFFormer to capture complex, location-specific patterns, thus potentially enhancing its adaptability across different oceanic conditions.