Vessel Trajectory Prediction Based on AIS Data: Dual-Path Spatial–Temporal Attention Network with Multi-Attribute Information

Huang, Feilong; Liu, Zhuoran; Li, Xiaohe; Mou, Fangli; Li, Pengfei; Fan, Zide

doi:10.3390/jmse12112031


by Feilong Huang ^1,2,3, Zhuoran Liu ^4, Xiaohe Li ^1,2, Fangli Mou ^1,2, Pengfei Li ^1,2,3 and Zide Fan ^1,2,*

^1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China

^2 Key Laboratory of Target Cognition and Application Technology (TCAT), Beijing 100190, China

^3 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China

^4 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

^* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(11), 2031; https://doi.org/10.3390/jmse12112031

Submission received: 10 October 2024 / Revised: 31 October 2024 / Accepted: 4 November 2024 / Published: 10 November 2024

(This article belongs to the Section Ocean Engineering)


Abstract

With the rapid growth of the global shipping industry, the increasing number of vessels has brought significant challenges to navigation safety and management. Vessel trajectory prediction technology plays a crucial role in route optimization and collision avoidance. However, current prediction methods face limitations when dealing with complex vessel interactions and multi-dimensional attribute information. Most models rely solely on global modeling in the temporal dimension and consider spatial interactions only afterwards, so they fail to capture how trajectory interactions change from one time point to the next. Additionally, these methods do not fully utilize the multi-attribute information in AIS data, and the simple concatenation of attributes limits the model’s potential. To address these issues, this paper proposes a dual-path spatial–temporal attention network with multi-attribute information (DualSTMA). This network models vessel behavior and interactions through two distinct paths, comprehensively considering individual vessel intentions and dynamic interactions. Moreover, we divide vessel attributes into dynamic and static categories: dynamic attributes are fused during feature preprocessing, while static attributes drive a gating mechanism during spatial interaction that regulates the importance of neighboring vessel features. Benchmark tests on real AIS data show that DualSTMA significantly outperforms existing methods in prediction accuracy. Ablation studies and visual analyses further validate the model’s reliability and advantages.

Keywords:

vessel trajectory prediction; deep learning; transformer; multi-attribute information

1. Introduction

With the rapid growth of global trade, the number of vessels in both maritime and inland navigation [1] has significantly increased, leading to higher traffic density [2,3]. While this trend promotes economic development, it also poses substantial challenges to navigation safety and efficiency management [4]. Vessel trajectory prediction technology has emerged as a crucial tool in addressing these issues. By accurately forecasting the future positions of vessels, this technology not only optimizes route planning but also enhances collision avoidance capabilities [5], ensuring safer navigation.

Existing vessel trajectory prediction methods have made significant progress [6]. From early physical models and machine learning approaches, the field has evolved to embrace deep learning as the leading methodology [7]. Notable methods such as METO-S2S [8] and STMGCN [9] have achieved remarkable results in this domain. METO-S2S employs a sequence-to-sequence framework with a multi-semantic encoder and type-oriented decoder, effectively combining temporal sequences with vessel type information to improve prediction accuracy. STMGCN, on the other hand, uses spatial–temporal multi-graph convolutional networks to model complex vessel interactions, making it especially suited for dynamic maritime environments.

However, current models still face two key challenges. First, although some advanced models use graph structures, such as STMGCN, to account for vessel interactions [10,11], most focus on global feature extraction in the temporal dimension and only model spatial interactions at the final time step. This approach fails to capture the dynamic evolution of interactions over time [12]. Second, while AIS data contain rich attribute information, existing models either rely solely on absolute position data or, like METO-S2S, naively concatenate all attributes without analyzing the impact of each attribute at different model layers. This coarse handling of attributes limits the potential benefits of multi-attribute information in trajectory prediction [13].

To address these challenges, we propose a dual-path spatial–temporal attention network (DualSTMA) that integrates multiple attributes. The main contributions of this paper are as follows:

We design a dual-path spatial–temporal attention encoder that models both individual vessel behavior and inter-vessel interactions. One path extracts global temporal features followed by spatial interaction modeling, while the other first models spatial interactions at each time point and then extracts temporal features.
We separate vessel attributes into dynamic and static categories and process them at different stages of the model, enhancing the ability to capture multi-dimensional information.
We benchmark our method on real AIS datasets, and the results show that DualSTMA significantly outperforms existing methods in prediction accuracy.

The rest of this paper is organized as follows: Section 2 reviews related work in vessel trajectory prediction. Section 3 details the proposed DualSTMA network. Section 4 describes the experimental setup, data processing, and analysis of the results. Section 5 discusses the model’s consideration of vessel characteristics and marine environmental factors. Finally, Section 6 concludes this paper and discusses future research directions.

2. Literature Review

Vessel trajectory prediction technology plays a crucial role in ensuring navigational safety in modern maritime and inland shipping. Based on different approaches, vessel trajectory prediction can be classified into three main categories: physical model-based prediction, machine learning-based prediction, and deep learning-based prediction.

2.1. Physical Model-Based Vessel Trajectory Prediction

Physical model-based methods rely on physical laws and kinematic models. These approaches predict the vessel’s next position directly from its current location and speed, without learning motion patterns from historical trajectories. Common methods include Kalman filters, particle filters, and Markov models. Tong et al. [14] propose a Kalman filter strategy that improves trajectory prediction accuracy by filtering vessel node location data and is particularly effective in curved inland waterways. Fossen et al. [15] optimize short-term predictions in dynamic seas by applying an extended Kalman filter to real-time AIS data. Qiu et al. [16] combine the Kalman filter (KF) model with a ship navigation trajectory map obtained through clustering to enhance prediction accuracy. Mazzarella et al. [17] combine particle filtering with historical AIS data to predict trajectories effectively in narrow waterways. Zhang et al. [18] incorporate wavelet analysis and hidden Markov models (HMMs) to handle the nonlinear trajectories of large vessels. However, physical model-based approaches often struggle to capture long-term trajectory changes in complex marine environments.
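As a minimal illustration of the physical model-based idea (not a reproduction of any cited method), the sketch below extrapolates the next position from the current position, speed, and heading under a constant-velocity assumption. It assumes speed over ground in knots, a course in degrees clockwise from true north, and a local flat-earth approximation in which one arc-minute of latitude equals one nautical mile; the function name `dead_reckon` is hypothetical.

```python
import math

KNOT_TO_MPS = 1852.0 / 3600.0    # 1 knot = 1 nautical mile per hour
M_PER_DEG_LAT = 1852.0 * 60.0    # assumption: 1 arc-minute of latitude = 1 nautical mile


def dead_reckon(lon, lat, sog_knots, course_deg, dt_s):
    """Constant-velocity prediction of the position dt_s seconds ahead.

    course_deg is measured clockwise from true north. A local flat-earth
    approximation is used, so the result is only meaningful for short horizons.
    """
    dist_m = sog_knots * KNOT_TO_MPS * dt_s
    course = math.radians(course_deg)
    d_north = dist_m * math.cos(course)
    d_east = dist_m * math.sin(course)
    new_lat = lat + d_north / M_PER_DEG_LAT
    # Meridians converge toward the poles, so a degree of longitude shrinks by cos(lat).
    new_lon = lon + d_east / (M_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lon, new_lat


print(dead_reckon(120.0, 30.0, 60.0, 0.0, 3600.0))  # approximately (120.0, 31.0)
```

Nothing here is learned from history: the prediction is a straight-line extrapolation, which is why such models have difficulty following turns and other long-term trajectory changes.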

2.2. Machine Learning-Based Vessel Trajectory Prediction

Machine learning-based methods predict future vessel trajectories by learning from historical AIS data. Common approaches include Gaussian mixture models (GMMs), support vector machines (SVMs), and random forest (RF). Dalsnes et al. [19] apply GMMs for short-term predictions and provide uncertainty estimates for the predicted results. Murray et al. [20] use GMM clustering and principal component analysis (PCA) to generate multiple possible vessel trajectories. While these statistical models perform well in certain scenarios, they often rely on many assumptions, making it difficult to adapt to rapidly changing marine environments. Liu et al. [21] improve trajectory prediction by applying SVM, incorporating vessel speed and heading as features, though SVM struggles with generalization. Zhang et al. [22] use the random forest algorithm to predict vessel destinations by comparing the similarity of historical trajectories. Furthermore, machine learning plays a crucial role in the preprocessing phase of trajectory prediction, particularly in vessel classification and route extraction. Luo et al. [23] utilize an ensemble classifier to categorize vessel trajectories, while Kraus et al. [24] employ behavioral and geographic features to classify vessel types based solely on trajectory data. For route extraction, Huang et al. [25] apply dynamic time warping (DTW) and hierarchical density-based spatial clustering (HDBSCAN) to identify main routes from high-traffic AIS data. Filipiak et al. [26] propose a genetic algorithm-based method to automatically construct maritime traffic networks, supporting efficient path planning and navigation in densely trafficked areas.

2.3. Deep Learning-Based Vessel Trajectory Prediction

With the advancement of deep learning, recurrent neural networks (RNNs) and their variants, such as LSTM and GRU, have become mainstream in vessel trajectory prediction [27,28]. Nguyen et al. [29] utilize a Seq2Seq model to predict vessel trajectories, showcasing the model’s strength in capturing long-term dependencies. Forti et al. [30] propose an LSTM-based Seq2Seq model that effectively captures time-series features from AIS data, significantly improving prediction accuracy. Gao et al. [31] explore the use of Bi-LSTM, enhancing the correlation between historical and future data for more accurate predictions. Zhang et al. [8] apply a multi-semantic encoder and type-oriented decoder, combining time series data with vessel-type information to boost predictive performance. However, most deep learning models overlook the interactions between vessels. Liu et al. [32] introduce a hybrid model combining convolutional neural networks (CNNs) and Bi-LSTM to predict future vessel trajectories based on AIS data. Recently, graph neural networks (GNNs) have gained attention in trajectory prediction [33]. Feng et al. [34] propose a spatial–temporal graph convolutional neural network (STGCNN), which models the interactions between vessels through graph structures, improving prediction in complex traffic scenarios. Zhao et al. [35] combine graph attention networks (GATs) with LSTM, significantly enhancing prediction accuracy in complex maritime environments such as ports and inland waterways. Liu et al. [9] further develop a spatial–temporal multi-graph convolutional network (STMGCN) based on a GCN and transformer, designed for synchronized multi-vessel trajectory prediction.

3. Research Methodology

3.1. Problem Definition and Data Preprocessing

3.1.1. Problem Definition

In the context of vessel trajectory prediction, the goal is to forecast a vessel’s future states based on its historical trajectory data. To achieve this, we first define the input and output formats.

The model takes the vessel’s states over the past n time steps as input, represented as X_{i} = \{x_{i}^{t_{0}}, x_{i}^{t_{1}}, \dots, x_{i}^{t_{n}}\}, where x_{i}^{t} is the state vector of vessel i at time t. Each state vector x_{i}^{t} contains five elements: longitude (lon_{i}^{t}), latitude (lat_{i}^{t}), speed (v_{i}^{t}), heading angle (θ_{i}^{t}), and vessel type. The type is a constant that does not change over time; it distinguishes between kinds of vessels, such as cargo vessels or oil tankers. The output of the model is the predicted state of the vessel over the next m time steps, expressed as \hat{Y}_{i} = \{\hat{y}_{i}^{t_{n+1}}, \hat{y}_{i}^{t_{n+2}}, \dots, \hat{y}_{i}^{t_{n+m}}\}, where \hat{y}_{i}^{t} = (\hat{lon}_{i}^{t}, \hat{lat}_{i}^{t}, \hat{v}_{i}^{t}, \hat{θ}_{i}^{t}) is the predicted state vector at time t. Unlike the input state vector, the output does not include the vessel type, since it remains constant and does not need to be predicted.

This definition clarifies the main objective of the vessel trajectory prediction task. The task is to predict future movements by learning from historical data. To ensure data consistency and accuracy, preprocessing is often required. This includes converting coordinates to a unified system, normalizing speed and heading data, and filling any missing values.
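To make the input and output formats concrete, the following sketch cuts one vessel’s track into (input, target) pairs with a sliding window. The window lengths `n_obs` and `m`, the tuple layout, and the name `make_windows` are illustrative assumptions, and the preprocessing steps listed above (coordinate unification, normalization, gap filling) are assumed to have been applied already.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float, int]  # (lon, lat, speed, heading, vessel_type)
Target = Tuple[float, float, float, float]      # (lon, lat, speed, heading)


def make_windows(track: List[State], n_obs: int, m: int) -> List[Tuple[List[State], List[Target]]]:
    """Cut one vessel track into (observed input, future target) pairs."""
    samples = []
    for start in range(len(track) - n_obs - m + 1):
        past = track[start:start + n_obs]
        future = track[start + n_obs:start + n_obs + m]
        # The vessel type is constant, so it is dropped from the prediction target.
        target = [s[:4] for s in future]
        samples.append((past, target))
    return samples


track = [(float(k), 0.0, 10.0, 90.0, 7) for k in range(5)]
pairs = make_windows(track, n_obs=2, m=2)  # 5 - 2 - 2 + 1 = 2 pairs
```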

3.1.2. Data Preprocessing

In vessel trajectory prediction, traditional modeling approaches are often scene-centered, where a vessel’s position is typically represented using an absolute coordinate system. However, this absolute position representation lacks pose invariance, meaning that the same vessel trajectory may appear different depending on the reference frame. This limitation can lead to poor generalization performance, particularly in complex scenarios.

To address this issue, we propose a vessel-centered modeling approach. Specifically, let vessel i be the prediction target and let vessel j denote a surrounding vessel within the same time frame. We construct a rotation matrix R^{t_{n}} from the heading θ_{i} of vessel i at the most recent observation time t_{n}, and a translation matrix P^{t_{n}} from vessel i’s position (lon_{i}^{t_{n}}, lat_{i}^{t_{n}}) at time t_{n}. Applying these transformations converts the coordinates of vessel i and its surrounding vessels into a new coordinate system in which vessel i’s position is the origin and its heading defines the positive direction. This vessel-centered modeling approach improves the model’s adaptability across different scenarios and allows for a more accurate representation of the interaction between vessels.
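A minimal sketch of the vessel-centered transformation, assuming positions have already been projected to a local planar frame (x east, y north) and the heading is given as a mathematical angle φ measured counter-clockwise from the +x axis; an AIS course measured clockwise from north would first be converted with φ = π/2 − radians(course). Subtracting the target position plays the role of P^{t_n} and rotating by −φ the role of R^{t_n}; the function name is hypothetical.

```python
import math


def to_vessel_frame(points, origin, heading_rad):
    """Express points in the frame of the target vessel i at t_n.

    points: (x, y) positions of vessel i and its neighbours in a planar frame.
    origin: (x, y) of vessel i at t_n (translation, role of P^{t_n}).
    heading_rad: heading of vessel i at t_n, counter-clockwise from +x
    (rotation by -heading, role of R^{t_n}).
    Afterwards vessel i sits at (0, 0) and its heading points along +x.
    """
    c, s = math.cos(-heading_rad), math.sin(-heading_rad)
    ox, oy = origin
    out = []
    for x, y in points:
        dx, dy = x - ox, y - oy                          # translate
        out.append((c * dx - s * dy, s * dx + c * dy))   # rotate
    return out


# Vessel i at (10, 20) heading north (+y); one neighbour ahead, one to its left.
local = to_vessel_frame([(10.0, 20.0), (10.0, 21.0), (9.0, 20.0)], (10.0, 20.0), math.pi / 2)
```

The neighbour ahead lands near (1, 0) and the one on the left near (0, 1), whatever the absolute position or heading of vessel i, which is the pose invariance the vessel-centered representation is meant to provide.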

To better capture positional changes, we introduce the relative position in addition to the absolute position. The relative position is the displacement of vessel i between two consecutive time steps, \{(lon_{i}^{t} - lon_{i}^{t-1}, lat_{i}^{t} - lat_{i}^{t-1})\}_{t=t_{1}}^{t_{n}}. This feature helps the model capture motion trends more effectively.
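The relative-position feature is a first difference of the track. A minimal sketch in plain Python (the function name is hypothetical); because the first observed step t_0 has no predecessor, n + 1 observed positions yield n displacements, matching the index range t_1 to t_n.

```python
def displacements(lons, lats):
    """(lon_t - lon_{t-1}, lat_t - lat_{t-1}) for t = t_1 .. t_n; t_0 has no predecessor."""
    return [(lons[t] - lons[t - 1], lats[t] - lats[t - 1]) for t in range(1, len(lons))]


print(displacements([0.0, 1.0, 3.0], [0.0, 0.5, 0.5]))  # [(1.0, 0.5), (2.0, 0.0)]
```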

3.1.3. Feature Preprocessing

In vessel trajectory prediction, the input state vector at each time step contains multiple types of information. We categorize this information into two parts: dynamic information and static information. Dynamic information includes variables that change over time, such as the vessel’s current position, speed, and heading. Static information includes fixed features such as the vessel’s type, length, and width. While dynamic information reflects the movement characteristics of the vessel and is the key basis for predicting future trajectories, static information provides inherent attributes that may influence the vessel’s movement patterns.

For dynamic information, since it originates from multiple channels (e.g., position, speed, heading), these channels must be consolidated before they are input to the model. To achieve this, we combine a 1 × 1 convolution (Conv2d), an affine transformation, and a ReLU activation to integrate the information from the different channels. The 1 × 1 convolution extracts features across channels without increasing the model’s complexity. The affine transformation assigns appropriate weights to each channel, allowing us to adjust their contributions to the overall representation. Finally, the ReLU activation introduces nonlinearity, enhancing the expressive power of the fused features. This approach fully utilizes the channel information and yields a unified dynamic feature representation, which serves as reliable input for subsequent trajectory prediction. The corresponding equations are given below, where vlon_{i}^{t}, vlat_{i}^{t}, θlon_{i}^{t}, and θlat_{i}^{t} represent the velocity and heading components in the longitude and latitude directions, respectively; zd_{i}^{t} aggregates the dynamic information of vessel i at time t; and γ and β are two learnable parameters.

zd_{i}^{t} = \mathrm{Conv2d}(lon_{i}^{t}, lat_{i}^{t}, lon_{i}^{t} - lon_{i}^{t-1}, lat_{i}^{t} - lat_{i}^{t-1}, vlon_{i}^{t}, vlat_{i}^{t}, θlon_{i}^{t}, θlat_{i}^{t})

(1)

zd_{i}^{t} = zd_{i}^{t} \times γ + β

(2)

zd_{i}^{t} = \mathrm{ReLU}(zd_{i}^{t})

(3)
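The sketch below shows what Eqs. (1)-(3) compute at a single time step. A 1 × 1 convolution over the channel axis is equivalent to the same linear map applied independently at every time step, so it is written here as a D × 8 weight matrix. How the velocity and heading components are formed is not spelled out above; the sketch assumes speed·cos/sin and cos/sin of a heading angle measured from the longitude axis, and treats γ and β as per-output-channel vectors. All names are hypothetical.

```python
import math


def dynamic_channels(lon, lat, dlon, dlat, speed, heading_rad):
    """The eight input channels of Eq. (1) for one time step.

    Assumption: heading_rad is measured from the longitude (x) axis, velocity
    components are speed*cos/sin and heading components are cos/sin of the heading.
    """
    return [lon, lat, dlon, dlat,
            speed * math.cos(heading_rad), speed * math.sin(heading_rad),
            math.cos(heading_rad), math.sin(heading_rad)]


def fuse_dynamic(channels, weight, bias, gamma, beta):
    """Eqs. (1)-(3) at one time step.

    A 1x1 convolution over channels equals a linear map applied at each time step,
    so weight has shape D x 8 and bias length D. gamma and beta are treated as
    per-output-channel vectors (assumption).
    """
    out = []
    for w_row, b, g, bt in zip(weight, bias, gamma, beta):
        z = sum(w * c for w, c in zip(w_row, channels)) + b   # Eq. (1)
        z = z * g + bt                                          # Eq. (2)
        out.append(max(0.0, z))                                 # Eq. (3): ReLU
    return out
```

Because the fusion acts per time step, the temporal axis is left intact and the later temporal attention still receives one feature vector per observed step.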

For static information, we use an embedding layer that maps categorical features such as vessel type into a lower-dimensional continuous vector space. However, in our initial experiments, we found that directly mixing static and dynamic information did not yield optimal results; in fact, it led to a decline in prediction performance. We hypothesize that the static information may be misinterpreted during channel fusion, introducing additional noise that interferes with the model’s learning of dynamic information. We therefore handle static and dynamic information separately so that static information cannot degrade the dynamic features. To process the static information further, we apply an MLP (multilayer perceptron) that maps the vessel type embedding, together with features such as length and width, into a new feature space. The output features are then passed through a sigmoid function, which scales them to the range (0, 1) so that they can serve as a gate. When information from the surrounding vessels is aggregated to vessel i, this control signal adjusts the weight of each neighboring vessel j. The process is given by the following equations, where zs_{i}^{t} aggregates the static information of vessel i at time t.

zs_{i}^{t} = \mathrm{MLP}(type\_embedding_{i}, width_{i}, length_{i})

(4)

zs_{i}^{t} = \mathrm{sigmoid}(zs_{i}^{t})

(5)
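A plain-Python sketch of Eqs. (4) and (5). The embedding table is a stand-in for a learned embedding layer, a single linear layer stands in for the MLP, and all names are hypothetical; the point is that the output lies strictly between 0 and 1 and can therefore scale features up or down without flipping their sign.

```python
import math


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def static_gate(type_id, width, length, embed_table, weight, bias):
    """Eqs. (4)-(5): map [type embedding, width, length] to a gate in (0, 1).

    embed_table: dict type_id -> embedding vector (stand-in for a learned embedding).
    weight, bias: one linear layer standing in for the MLP (assumption).
    """
    features = list(embed_table[type_id]) + [width, length]
    gate = []
    for w_row, b in zip(weight, bias):
        z = sum(w * f for w, f in zip(w_row, features)) + b   # Eq. (4)
        gate.append(stable_sigmoid(z))                          # Eq. (5)
    return gate
```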

This approach ensures that the vessel’s static information is effectively utilized in the modeling process. It also provides a flexible mechanism that allows the model to adaptively adjust the aggregation of information based on the static characteristics of different vessels, thereby enhancing overall prediction performance.

3.2. Dual Spatial–Temporal Attention Encoder

In vessel trajectory prediction, effectively capturing and modeling the spatial–temporal interactions between vessels is crucial. With the success of transformer models in various tasks, they have been increasingly applied to trajectory prediction. There are two common strategies: one handles the time and space axes separately, using self-attention to model each axis independently; the other flattens both temporal and spatial information into a unified self-attention layer for joint modeling. While the former can capture temporal and spatial information independently, it lacks a global understanding of the spatial–temporal relationships. The latter, although more comprehensive, suffers from high computational complexity and resource consumption.

To address these issues, we propose a dual-path architecture that models the spatial–temporal relationships from both the temporal and spatial dimensions, improving model efficiency and prediction performance, as shown in Figure 1.

3.2.1. Temporal–Spatial Path

Our temporal–spatial path primarily focuses on the global modeling of a vessel’s time series, capturing its historical features over the past n time steps and then combining these with the spatial relationships at the current observation time step to model vessel interactions. Specifically, we first use a self-attention mechanism to encode the vessel’s time series. Then, at the current observation time step, we model the interactions between vessels using a cross-attention mechanism.

In the temporal processing module, the input is the dynamic information of vessel i and the surrounding vessels within the observed time frame, expressed in the coordinate system centered on vessel i. The dynamic information zd_{i}^{t} of each vessel i has dimension [T, D], where T is the number of past time steps and D is the feature dimension of each time step. First, we add positional encoding to all time steps to better capture temporal position. Additionally, to summarize information across the entire time series, we adopt an approach similar to SceneTransformer [11,36]: a learnable token, token(1, D), is appended after the last time step as an aggregation point, yielding the new dynamic information \hat{zd}_{i}^{t}:

\hat{zd}_{i}^{t} = \mathrm{concat}((zd_{i}^{t}, \mathrm{token}(1, D)), \mathrm{dim} = t)

(6)

Next, we apply self-attention along the temporal dimension, computing the query, key, and value vectors and using them to perform a weighted summation. Here, W_{Q_{time}}, W_{K_{time}}, and W_{V_{time}} are learnable weight matrices that project the input \hat{zd}_{i}^{t} into the query q_{i}^{t}, key k_{i}^{t}, and value v_{i}^{t} vectors, respectively, and d_{k} is the dimension of the key vectors.

q_{i}^{t} = \hat{zd}_{i}^{t} \times W_{Q_{time}}, \quad k_{i}^{t} = \hat{zd}_{i}^{t} \times W_{K_{time}}, \quad v_{i}^{t} = \hat{zd}_{i}^{t} \times W_{V_{time}}

(7)

hd_{i}^{t} = \mathrm{softmax}\left(\frac{q_{i}^{t} \times {k_{i}^{t}}^{T}}{\sqrt{d_{k}}}\right) v_{i}^{t}

(8)

Finally, we take the output at the last position along the time dimension, i.e., the appended token, as the summary representation hd_{i} of the vessel’s dynamic information over the past T time steps:

hd_{i} = hd_{i}^{t} \, (t = t_{n+1})

(9)
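A single-head, pure-Python sketch of Eqs. (6)-(9) for one vessel. Two choices are assumptions because the text does not fix them: the positional encoding is the sinusoidal one of the original transformer, and it is added to the observed steps before the learnable token is appended. Multi-head projection, the MLP, layer normalization, and residual connections are omitted, and all names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(row, W):
    """Row vector (length D_in) times matrix W (D_in x D_out)."""
    return [sum(r * W[k][c] for k, r in enumerate(row)) for c in range(len(W[0]))]


def sinusoidal_pe(pos, d):
    """Sinusoidal positional encoding of the original transformer (assumed here)."""
    return [math.sin(pos / 10000 ** (2 * (k // 2) / d)) if k % 2 == 0
            else math.cos(pos / 10000 ** (2 * (k // 2) / d))
            for k in range(d)]


def temporal_summary(zd, token, Wq, Wk, Wv):
    """Eqs. (6)-(9), single head: returns the output at the appended token position.

    zd: T rows of D features (already vessel-centred and fused).
    token: learnable row of length D used as the aggregation point.
    """
    d = len(token)
    seq = [[x + p for x, p in zip(row, sinusoidal_pe(t, d))] for t, row in enumerate(zd)]
    seq.append(list(token))                                   # Eq. (6)
    q = [project(r, Wq) for r in seq]                         # Eq. (7)
    k = [project(r, Wk) for r in seq]
    v = [project(r, Wv) for r in seq]
    dk = len(k[0])
    out = []
    for qt in q:                                              # Eq. (8)
        w = softmax([sum(a * b for a, b in zip(qt, kt)) / math.sqrt(dk) for kt in k])
        out.append([sum(ws * vs[c] for ws, vs in zip(w, v)) for c in range(len(v[0]))])
    return out[-1]                                            # Eq. (9): position t_{n+1}
```

Because the token attends to every observed step, its output row is a learned weighted summary of the whole history, which is why a single row suffices as hd_i.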

While static information contributes little to temporal encoding, it plays a critical role in spatial interaction. For instance, vessels often behave more conservatively when interacting with larger vessels or with specific vessel types, whereas interactions with smaller vessels may be more proactive or aggressive. Based on these observations, we introduce a gating mechanism to regulate the flow and aggregation of features, ensuring that important dynamic and static information is appropriately utilized in spatial interactions. Specifically, we fuse the temporally encoded dynamic information hd_{i} with the original static information type\_embedding_{i} and use the gate zs_{i}^{t_{n}} generated from the static information to adjust the transmission of the vessel feature h_{i}, where ⊙ denotes element-wise multiplication:

\hat{hd}_{i} = \mathrm{concat}((hd_{i}, type\_embedding_{i}), \mathrm{dim} = -1)

(10)

h_{i} = \hat{hd}_{i} ⊙ zs_{i}^{t_{n}}

(11)
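Eqs. (10) and (11) amount to a concatenation followed by an element-wise product, which implicitly requires the static gate to have the same width as the concatenated feature. A minimal sketch with plain lists (the function name is hypothetical):

```python
def gated_vessel_feature(hd, type_embedding, gate):
    """Eq. (10): concatenate on the feature axis; Eq. (11): element-wise gate."""
    fused = list(hd) + list(type_embedding)
    if len(gate) != len(fused):
        raise ValueError("gate width must equal the concatenated feature width")
    return [f * g for f, g in zip(fused, gate)]


print(gated_vessel_feature([2.0, 4.0], [1.0], [0.5, 0.25, 1.0]))  # [1.0, 1.0, 1.0]
```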

After encoding both the dynamic and static information of each vessel over the past time series, we proceed to capture the spatial interactions between vessels to enhance the model’s understanding and prediction in complex scenarios. To achieve this, we use a cross-attention mechanism to model spatial interactions, where the vessel to be predicted, vessel i, serves as the query, and the surrounding vessels j serve as keys and values. This mechanism enables the model to focus effectively on vessels that have the closest interactions with the target vessel, leading to more accurate predictions of future movements.

During the computation, we first generate the query vector q_{i} for the central vessel and the key and value vectors k_{j} and v_{j} for the surrounding vessels, where W_{Q_{space}}, W_{K_{space}}, and W_{V_{space}} are learnable weight matrices:

q_{i} = h_{i} \times W_{Q_{space}}, \quad k_{j} = h_{j} \times W_{K_{space}}, \quad v_{j} = h_{j} \times W_{V_{space}}

(12)

Next, we compute the dot product between the query vector q_{i} and each key vector k_{j} and normalize the result with the softmax function to obtain the attention weight a_{ij} between vessels, where N_{i} denotes the set of neighboring vessels of vessel i:

a_{ij} = \mathrm{softmax}\left(\frac{q_{i} \cdot {[k_{j}]_{j \in N_{i}}}^{T}}{\sqrt{d_{k}}}\right)

(13)

Finally, we perform a weighted summation of the value vectors v_{j} using these weights to obtain the spatial encoding TS_{i} for vessel i at time t_{n}:

TS_{i} = \sum_{j \in N_{i}} a_{ij} v_{j}

(14)
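A single-head, pure-Python sketch of Eqs. (12)-(14). Returning a zero vector when vessel i has no neighbours is a design choice of this sketch rather than something the text specifies; multi-head projection, the MLP, and normalization are omitted, and all names are hypothetical.

```python
import math


def project(x, W):
    """Row vector x (length D_in) times W (D_in x D_out)."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def cross_attention(h_i, neighbours, Wq, Wk, Wv):
    """Vessel i (query) attends over its neighbours j in N_i (keys and values).

    Returns (TS_i, attention weights a_ij).
    """
    if not neighbours:                       # empty N_i: nothing to aggregate
        return [0.0] * len(Wv[0]), []
    q = project(h_i, Wq)                                    # Eq. (12)
    keys = [project(h_j, Wk) for h_j in neighbours]
    vals = [project(h_j, Wv) for h_j in neighbours]
    scale = math.sqrt(len(q))                               # d_k (query and key widths match)
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    a = [e / total for e in exps]                           # Eq. (13)
    ts = [sum(w * v[c] for w, v in zip(a, vals)) for c in range(len(vals[0]))]  # Eq. (14)
    return ts, a
```

With identity projections, a neighbour whose feature aligns with vessel i's query receives the larger weight, which is how the mechanism concentrates on the most relevant vessels.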

As in the original transformer architecture, the self-attention and cross-attention modules here use multi-head attention. The output of each multi-head attention module is processed by an MLP module to extract higher-level features. We apply layer normalization after each attention module to stabilize training, and add residual connections after each module to enhance model expressiveness and accelerate convergence. The final output is the spatial–temporal representation TS_{i} of vessel i.

3.2.2. Spatial–Temporal Path

While capturing temporal features first and spatial interactions afterwards effectively models global characteristics, it struggles to fully capture how the relationships between vessels evolve over time. To address this, the spatial–temporal path takes the opposite approach: it first models spatial interactions at each time step and then extracts features across the temporal dimension. Like the temporal–spatial path, this path is built on self-attention and cross-attention mechanisms. It consists of three main steps:

First, at each time step t, we denote the dynamic information of the vessel i to be predicted as zd_{i}^{t} and that of the surrounding vessels as zd_{j}^{t}. As in the gating mechanism used earlier, we concatenate the dynamic features zd_{i}^{t} with the static features type\_embedding_{i} and then use the gating signal zs_{i}^{t} generated from the static features to control the transmission of the overall feature, yielding the new feature representation z_{i}^{t}; the features z_{j}^{t} of the surrounding vessels are formed in the same way:

z_{i}^{t} = \mathrm{concat}((zd_{i}^{t}, type\_embedding_{i}), \mathrm{dim} = -1)

(15)

z_{i}^{t} = z_{i}^{t} ⊙ zs_{i}^{t}

(16)

Next, we apply a multi-head attention mechanism similar to that of the temporal–spatial path, treating the feature z_{i}^{t} of vessel i as the query and the features z_{j}^{t} of the surrounding vessels as keys and values. At each time step, we compute the corresponding spatial encoding as follows:

q_{i}^{t} = z_{i}^{t} \times W_{Q_{space}}, \quad k_{j}^{t} = z_{j}^{t} \times W_{K_{space}}, \quad v_{j}^{t} = z_{j}^{t} \times W_{V_{space}}

(17)

a_{ij}^{t} = \mathrm{softmax}\left(\frac{q_{i}^{t} \cdot {[k_{j}^{t}]_{j \in N_{i}}}^{T}}{\sqrt{d_{k}}}\right)

(18)

s_{i}^{t} = \sum_{j \in N_{i}} a_{ij}^{t} v_{j}^{t}

(19)

Once the spatial interaction encoding has been obtained at each time step, we encode the information across the temporal axis. As in the temporal–spatial path, we append a learnable token at the end of the time series and add positional encoding to all time steps to strengthen the model’s ability to capture temporal information:

\hat{s}_{i}^{t} = \mathrm{concat}((s_{i}^{t}, \mathrm{token}(1, D)), \mathrm{dim} = t)

(20)

q_{i}^{t} = \hat{s}_{i}^{t} \times W_{Q_{time}}, \quad k_{i}^{t} = \hat{s}_{i}^{t} \times W_{K_{time}}, \quad v_{i}^{t} = \hat{s}_{i}^{t} \times W_{V_{time}}

(21)

h_{i}^{t} = \mathrm{softmax}\left(\frac{q_{i}^{t} \times {k_{i}^{t}}^{T}}{\sqrt{d_{k}}}\right) v_{i}^{t}

(22)

ST_{i} = h_{i}^{t} \, (t = t_{n+1})

(23)
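The only structural difference between the two paths is the order in which the temporal and spatial operators are applied. The sketch below makes that order explicit by passing the operators in as plain functions; the toy operators (a mean over time and an element-wise maximum over vessels) are stand-ins for the attention modules, chosen only because they make the difference easy to see. All names are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def temporal_spatial(target_seq: List[Vec], neighbour_seqs: List[List[Vec]],
                     temporal_fn: Callable[[List[Vec]], Vec],
                     spatial_fn: Callable[[Vec, List[Vec]], Vec]) -> Vec:
    """Temporal-spatial path: summarise each vessel over time, then interact once."""
    h_i = temporal_fn(target_seq)
    h_js = [temporal_fn(seq) for seq in neighbour_seqs]
    return spatial_fn(h_i, h_js)


def spatial_temporal(target_seq: List[Vec], neighbour_seqs: List[List[Vec]],
                     temporal_fn: Callable[[List[Vec]], Vec],
                     spatial_fn: Callable[[Vec, List[Vec]], Vec]) -> Vec:
    """Spatial-temporal path: interact at every time step, then summarise over time."""
    per_step = [spatial_fn(target_seq[t], [seq[t] for seq in neighbour_seqs])
                for t in range(len(target_seq))]
    return temporal_fn(per_step)


# Toy stand-ins for the attention modules, chosen only to expose the ordering.
def mean_over_time(seq):
    return [sum(col) / len(seq) for col in zip(*seq)]


def max_over_vessels(h_i, h_js):
    return [max(vals) for vals in zip(h_i, *h_js)]


target = [[0.0], [2.0]]        # target feature rises over two steps
neighbours = [[[2.0], [0.0]]]  # one neighbour whose feature falls
ts = temporal_spatial(target, neighbours, mean_over_time, max_over_vessels)  # [1.0]
st = spatial_temporal(target, neighbours, mean_over_time, max_over_vessels)  # [2.0]
```

Averaging each vessel first hides the fact that the two vessels were never in the same state at the same time, while interacting at every step preserves it. Concatenating the two results, `ts + st`, mirrors the fusion E_i = concat(TS_i, ST_i) of the dual-path representation.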

As in the temporal–spatial path, the spatial–temporal path uses multi-head attention and residual connections to enhance the model’s expressive power. However, by performing spatial interaction first and temporal feature extraction second, the spatial–temporal path captures the interactions between vessels at different time steps more effectively. This reordering allows the model to better reflect the mutual influences between vessels, especially the long-term dependencies in the temporal dimension, thereby improving overall prediction accuracy.

3.3. LSTM Decoder

After the preceding steps, we obtain two feature representations: TS_{i} from the temporal–spatial path and ST_{i} from the spatial–temporal path. To fuse them into a more comprehensive spatial–temporal feature, we concatenate them to form the final dual-path spatial–temporal representation E_{i}:

E_{i} = \mathrm{concat}(TS_{i}, ST_{i})

(24)

This fusion ensures that the model can comprehensively utilize the features captured by both paths, enhancing its ability to predict the vessel’s future trajectory.

Next, we employ an LSTM network to decode the future trajectory iteratively. The LSTM captures long-term dependencies and uses its internal gating mechanism to control information flow, which makes it well suited to sequential data. During decoding, to better exploit the multi-dimensional information in the inputs, we attach three independent MLP modules to the LSTM hidden state; they decode the position (longitude and latitude), velocity, and heading for each time step. The LSTM structure and decoding process are given below. Here, E_{i}^{t} is the input embedding of vessel i at time t; W, U, and b are the learnable weight matrices and biases of each LSTM gate; i_{i}^{t}, f_{i}^{t}, and o_{i}^{t} are the input, forget, and output gate activations, respectively; a_{i}^{t} is the candidate cell state; and c_{i}^{t} and ℏ_{i}^{t} are the cell state and hidden state of the LSTM at time t. The decoded outputs \hat{lon}_{i}^{t} and \hat{lat}_{i}^{t} (longitude and latitude), \hat{vlon}_{i}^{t} and \hat{vlat}_{i}^{t} (velocity components in the longitudinal and latitudinal directions), and \hat{θlon}_{i}^{t} and \hat{θlat}_{i}^{t} (heading components in the longitudinal and latitudinal directions) are obtained by applying the MLP modules to the hidden state ℏ_{i}^{t}.

i_{i}^{t} = \mathrm{sigmoid}(W_{i} E_{i}^{t} + U_{i} ℏ_{i}^{t-1} + b_{i})

(25)

f_{i}^{t} = \mathrm{sigmoid}(W_{f} E_{i}^{t} + U_{f} ℏ_{i}^{t-1} + b_{f})

(26)

o_{i}^{t} = \mathrm{sigmoid}(W_{o} E_{i}^{t} + U_{o} ℏ_{i}^{t-1} + b_{o})

(27)

a_{i}^{t} = \tanh(W_{a} E_{i}^{t} + U_{a} ℏ_{i}^{t-1} + b_{a})

(28)

c_{i}^{t} = f_{i}^{t} ⊙ c_{i}^{t-1} + i_{i}^{t} ⊙ a_{i}^{t}

(29)

ℏ_{i}^{t} = o_{i}^{t} ⊙ \tanh(c_{i}^{t})

(30)

(\hat{lon}_{i}^{t}, \hat{lat}_{i}^{t}) = \mathrm{MLP}_{pos}(ℏ_{i}^{t})

(31)

(\hat{vlon}_{i}^{t}, \hat{vlat}_{i}^{t}) = \mathrm{MLP}_{vel}(ℏ_{i}^{t})

(32)

(\hat{θlon}_{i}^{t}, \hat{θlat}_{i}^{t}) = \mathrm{MLP}_{head}(ℏ_{i}^{t})

(33)
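A plain-Python sketch of one decoder step, Eqs. (25)-(30), followed by the three output heads of Eqs. (31)-(33). Each MLP head is reduced to a single linear layer, and the parameter layout (a dict holding (W, U, b) per gate) and all names are assumptions of this sketch. The text does not state how the per-step input E_i^t is derived from the fused E_i, so the sketch simply takes it as an argument.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_preact(W, U, b, e, hbar_prev):
    """W e + U hbar_prev + b for one gate (W: H x D, U: H x H, b: length H)."""
    return [sum(w * x for w, x in zip(W[r], e))
            + sum(u * y for u, y in zip(U[r], hbar_prev)) + b[r]
            for r in range(len(b))]


def lstm_step(e, hbar_prev, c_prev, params):
    """One decoding step, Eqs. (25)-(30). params maps 'i', 'f', 'o', 'a' to (W, U, b)."""
    i = [sigmoid(x) for x in gate_preact(*params["i"], e, hbar_prev)]    # (25)
    f = [sigmoid(x) for x in gate_preact(*params["f"], e, hbar_prev)]    # (26)
    o = [sigmoid(x) for x in gate_preact(*params["o"], e, hbar_prev)]    # (27)
    a = [math.tanh(x) for x in gate_preact(*params["a"], e, hbar_prev)]  # (28)
    c = [fk * ck + ik * ak for fk, ck, ik, ak in zip(f, c_prev, i, a)]   # (29)
    hbar = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                  # (30)
    return hbar, c


def decode_heads(hbar, heads):
    """Eqs. (31)-(33): independent heads on the same hidden state.

    heads maps a head name (e.g. 'pos', 'vel', 'head') to (W, b); a single linear
    layer stands in for each MLP here.
    """
    return {name: [sum(w * x for w, x in zip(row, hbar)) + bias for row, bias in zip(W, b)]
            for name, (W, b) in heads.items()}
```

Keeping the three heads separate lets position, velocity, and heading each read what they need from the shared hidden state without competing for the same output weights.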

After obtaining the predicted values for the next m time steps, we recover the positions in the original coordinate system using the previously computed rotation matrix R^{t_{n}} and translation matrix P^{t_{n}}. We also rotate the predicted heading and velocity components back using R^{t_{n}}, so that these predictions can be compared directly with the corresponding ground truth.
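A minimal sketch of the inverse mapping, assuming positions in a local planar frame and the heading of vessel i at t_n given as a counter-clockwise angle φ from the +x axis. Predicted positions are rotated by +φ and then shifted back by the origin, whereas velocity and heading components are direction vectors, so only the rotation is undone. Function names are hypothetical.

```python
import math


def to_global(points_local, origin, heading_rad):
    """Inverse vessel-centred transform for predicted positions.

    Rotate by +heading (undoing the rotation by -heading), then add the origin back.
    """
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    ox, oy = origin
    return [(c * x - s * y + ox, s * x + c * y + oy) for x, y in points_local]


def rotate_vector_to_global(vx, vy, heading_rad):
    """Velocity and heading components are directions: undo the rotation only."""
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    return c * vx - s * vy, s * vx + c * vy


# A prediction one unit "ahead" of a north-heading vessel at (10, 20) maps to about (10, 21).
x, y = to_global([(1.0, 0.0)], (10.0, 20.0), math.pi / 2)[0]
```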

3.4. Loss

Our loss function design consists of three main components: absolute position loss, velocity loss, and heading loss. This design aligns closely with the dynamic information feature extraction described earlier, further enhancing the model’s ability to accurately capture and predict key navigational parameters.

First, let the ground truth state values for vessel i over the next m time steps be

Y_i = \{ y_i^{t_{n+1}}, y_i^{t_{n+2}}, \dots, y_i^{t_{n+m}} \}, \quad y_i^t = (\mathrm{lon}_i^t, \mathrm{lat}_i^t, \mathrm{vlon}_i^t, \mathrm{vlat}_i^t, \theta\mathrm{lon}_i^t, \theta\mathrm{lat}_i^t)

(34)

The predicted state values are

\hat{Y}_i = \{ \hat{y}_i^{t_{n+1}}, \hat{y}_i^{t_{n+2}}, \dots, \hat{y}_i^{t_{n+m}} \}, \quad \hat{y}_i^t = (\widehat{\mathrm{lon}}_i^t, \widehat{\mathrm{lat}}_i^t, \widehat{\mathrm{vlon}}_i^t, \widehat{\mathrm{vlat}}_i^t, \widehat{\theta\mathrm{lon}}_i^t, \widehat{\theta\mathrm{lat}}_i^t)

(35)

For the position loss, we adopt a short-term weighting strategy. In actual navigation, trajectory changes occur more frequently in the short term and are critical to capture. Assigning higher weights to short-term predictions helps the model respond effectively to dynamic changes in the environment. We therefore use a short-term weighted Huber loss [37], computed over every prefix of the prediction horizon. An error at an early step enters every prefix that contains it, so earlier steps receive larger effective weights than later ones:

L_{\mathrm{position}} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{t=1}^{M} \mathrm{Huber\_loss}\left( (\mathrm{lon}_i^{1:t}, \mathrm{lat}_i^{1:t}), (\widehat{\mathrm{lon}}_i^{1:t}, \widehat{\mathrm{lat}}_i^{1:t}) \right)

(36)

Huber loss is well suited to ship trajectory prediction because it is robust to outliers. It behaves like an L2 loss for small errors ($|x| \le \delta$) and like an L1 loss for large ones. As a result, a single anomalous point contributes a gradient of magnitude at most $\delta$, rather than one that grows with the error. This robustness matters for maritime data, which often contain noise and anomalies, and it helps the model maintain accuracy in dynamic environments.

\mathrm{Huber\_loss}(x) = \begin{cases} \frac{1}{2} x^2, & \text{if } |x| \le \delta \\ \delta \left( |x| - \frac{1}{2} \delta \right), & \text{otherwise} \end{cases}

(37)
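The following sketch makes the short-term weighting of Eq. (36) concrete. It rests on three assumptions:

- $\delta = 1.0$, because the value of $\delta$ is not stated here; this is also the default of PyTorch's `torch.nn.HuberLoss`.
- Each `Huber_loss` call averages over the elements it receives, like `reduction="mean"`.
- Trajectories are plain lists of (lon, lat) tuples.

Under these assumptions, the effective weight of step $k$ is proportional to $\sum_{t=k}^{M} 1/t$, which decreases with $k$. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def huber(x: float, delta: float = 1.0) -> float:
    """Eq. (37): quadratic for |x| <= delta, linear (slope delta) beyond it."""
    ax = abs(x)
    if ax <= delta:
        return 0.5 * x * x
    return delta * (ax - 0.5 * delta)


def prefix_huber_loss(true_traj: Sequence[Point], pred_traj: Sequence[Point],
                      delta: float = 1.0) -> float:
    """Inner part of Eq. (36) for one vessel: the average over t of the mean
    Huber error on the first t predicted steps (both coordinates)."""
    M = len(true_traj)
    total = 0.0
    for t in range(1, M + 1):
        errors: List[float] = []
        for (lon, lat), (plon, plat) in zip(true_traj[:t], pred_traj[:t]):
            errors += [plon - lon, plat - lat]
        total += sum(huber(e, delta) for e in errors) / len(errors)
    return total / M


def position_loss(true_batch, pred_batch, delta: float = 1.0) -> float:
    """Eq. (36): average of the per-vessel prefix losses over N vessels."""
    losses = [prefix_huber_loss(y, y_hat, delta) for y, y_hat in zip(true_batch, pred_batch)]
    return sum(losses) / len(losses)
```

The same 0.5° error costs more when it occurs at the first predicted step than at the last. With five steps, the first-step error yields about 0.0285 and the last-step error yields 0.0025.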

The velocity and heading losses serve as auxiliary tasks that exploit the velocity and heading information in the input data. Accurate predictions of these parameters not only improve the overall performance of the model but also encourage the predicted trajectory to adhere to kinematic principles.

For the heading error, heading is a circular angle. We therefore first wrap the difference between the predicted and ground-truth headings into the range $[-\pi, \pi)$. The wrapped heading error is calculated as follows:

\theta_{\mathrm{wrapped},i}^{t} = -\pi + \operatorname{floormod}\left( (\widehat{\theta\mathrm{lon}}_i^t - \theta\mathrm{lon}_i^t,\ \widehat{\theta\mathrm{lat}}_i^t - \theta\mathrm{lat}_i^t) + \pi,\ 2\pi \right)

(38)
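A small sketch of the wrapping in Eq. (38), applied to a single angle difference in radians (the function name is hypothetical). In Python, `%` on floats is a floor modulo, meaning the result takes the sign of the divisor, so it matches floormod. `math.fmod` truncates toward zero instead and would give wrong results for negative differences.

```python
import math


def wrap_angle(diff: float) -> float:
    """Eq. (38): map an angle difference in radians into [-pi, pi)."""
    return -math.pi + (diff + math.pi) % (2 * math.pi)
```

For example, a predicted heading of 179° against a true heading of -179° gives a raw difference of 358°. That difference wraps to -2°, so the loss sees a 2° error rather than a 358° one.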

Then, we use the Huber loss to compute the heading error loss:

L_{\mathrm{heading}} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{t=1}^{M} \mathrm{Huber\_loss}\left( \theta_{\mathrm{wrapped},i}^{t} \right)

(39)

For the velocity error, we also calculate the loss using the Huber loss:

L_{\mathrm{velocity}} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{t=1}^{M} \mathrm{Huber\_loss}\left( (\mathrm{vlon}_i^t, \mathrm{vlat}_i^t), (\widehat{\mathrm{vlon}}_i^t, \widehat{\mathrm{vlat}}_i^t) \right)

(40)

Finally, we combine the three losses in a weighted sum to form the final loss function. The weight of each component reflects its importance in the application, balancing the model’s performance across the different dimensions:

L = \lambda_{\mathrm{position}} \cdot L_{\mathrm{position}} + \lambda_{\mathrm{heading}} \cdot L_{\mathrm{heading}} + \lambda_{\mathrm{velocity}} \cdot L_{\mathrm{velocity}}

(41)
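Eq. (41) with the weights reported in Section 4.1.1 ($\lambda_{\mathrm{position}} = 10$, $\lambda_{\mathrm{heading}} = 1$, $\lambda_{\mathrm{velocity}} = 0.1$) can be written as a one-line function. The component loss values in the usage example are hypothetical and serve only to show how strongly the position term dominates.

```python
# Loss weights as reported in Section 4.1.1.
LAMBDA_POSITION = 10.0
LAMBDA_HEADING = 1.0
LAMBDA_VELOCITY = 0.1


def total_loss(l_position: float, l_heading: float, l_velocity: float) -> float:
    """Eq. (41): weighted sum of the position, heading, and velocity losses."""
    return (LAMBDA_POSITION * l_position
            + LAMBDA_HEADING * l_heading
            + LAMBDA_VELOCITY * l_velocity)
```

For hypothetical component losses of 0.01 (position), 0.05 (heading), and 0.2 (velocity), the total is 0.1 + 0.05 + 0.02 = 0.17. The position term contributes more than half of the total even though its raw value is the smallest.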

This design not only enables precise prediction of the vessel’s future trajectory but also ensures robustness and accuracy in the key dimensions of position, heading, and velocity. Additionally, the loss function design aligns with the dynamic information feature extraction strategy, allowing the model to adapt more effectively to complex navigational environments by considering the importance of different features.

4. Experiments

4.1. Experiment Settings

4.1.1. Hyperparameters and Experimental Environment

In terms of parameter settings, the input dimension of the temporal and spatial transformer layers in the DualSTMA model is 32, and the hidden layer dimension is 128. We use 4 transformer layers, each with 8 attention heads, a dropout rate of 0.1, and sinusoidal positional encoding. The LSTM in the decoder has 2 layers with a hidden dimension of 64.
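For reference, a standard-library sketch of the sine/cosine positional encoding in its usual transformer formulation. Even dimensions use $\sin(pos / 10000^{2i/d})$ and odd dimensions use $\cos$ of the same angle. The base of 10000 and the exact layout are assumptions taken from that common formulation, not details confirmed by this paper. $d = 32$ matches the input dimension stated above.

```python
import math
from typing import List


def sinusoidal_positional_encoding(num_positions: int, d_model: int = 32) -> List[List[float]]:
    """Row pos holds the encoding of time step pos.

    Dimension k uses frequency index i = k // 2, so dims 2i and 2i+1 share
    the angle pos / 10000**(2i / d_model): sin for even k, cos for odd k.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(d_model):
            angle = pos / (10000 ** ((k - k % 2) / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

For the 10 observed trajectory points used in this paper, `sinusoidal_positional_encoding(10)` yields a 10 × 32 table. Its first row alternates 0 and 1, since sin 0 = 0 and cos 0 = 1.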

The loss function weights are set as follows: position loss $\lambda_{\mathrm{position}} = 10$, heading loss $\lambda_{\mathrm{heading}} = 1$, and velocity loss $\lambda_{\mathrm{velocity}} = 0.1$. The model is trained for 100 epochs using the Adam optimizer with a learning rate of 0.0015, no weight decay, and a batch size of 256. All experiments are conducted on an Ubuntu system with an NVIDIA GeForce RTX 3090 Ti GPU, using Python 3.8.8 and the PyTorch deep learning framework.

4.1.2. Dataset

The dataset used in this paper is primarily based on publicly available 2021 data from the official U.S. Marine website. As shown in Figure 2, the data cover major coastal areas in the eastern, western, and southern United States. Table 1 lists the sampling ranges of four representative coastal areas. To further assess the model’s generalizability across diverse environments, we also collect supplementary AIS data from the deep-sea area around Hawaii, the Gulf of Mexico, and the Florida Strait as independent test sets. These are described in detail in Section 4.2.2.

The primary dataset is divided into training, validation, and test sets with an 8:1:1 ratio. These contain 27,313; 3414; and 3415 trajectory segments, respectively, and involve 6927; 1983; and 1980 vessels. Bar and pie charts of the vessel types in the training and test sets are shown in Figure 3a,b. The vessel type distribution is largely consistent across the two sets, so the test results reflect a vessel mix comparable to that seen during training. The dataset includes eight major vessel types, such as pleasure craft, sailing, cargo, and fishing vessels.
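The reported segment counts can be checked against the stated ratio. They sum to 34,142, and a split that floors the training and validation sizes and assigns the remainder to the test set reproduces them exactly. The rounding rule is our inference, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def split_sizes(total: int, ratios: Sequence[int] = (8, 1, 1)) -> Tuple[int, int, int]:
    """Train/validation/test sizes for an integer-ratio split.

    Train and validation sizes are floored; the test set takes the remainder.
    """
    denom = sum(ratios)
    n_train = total * ratios[0] // denom
    n_val = total * ratios[1] // denom
    return n_train, n_val, total - n_train - n_val
```

`split_sizes(34142)` returns `(27313, 3414, 3415)`, which matches the counts above.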

The data processing steps follow the open processing procedures of the METO-S2S dataset [38]. The main stages are:

- extracting vessel static features;
- classifying and sorting;
- cleaning and interpolating trajectory points;
- segmenting trajectories into stationary and sailing segments [39];
- calculating and adding speed and heading information for each trajectory point;
- removing abnormal jumps to filter out noise from abrupt speed changes.

After processing, each AIS trajectory point contains the following fields: timestamp, latitude, longitude, heading angle, speed, sailing distance, vessel type, vessel length, width, and MMSI. The time interval between trajectory points is standardized to 10 min, maintaining the continuity of trajectories while reducing computational complexity, thus enhancing the accuracy of vessel trajectory prediction.
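The interpolation method used to standardize points to 10-minute intervals is not specified here. The sketch below assumes simple linear interpolation of longitude and latitude in time onto a 600-second grid. It illustrates the resampling idea and is not the METO-S2S procedure itself; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def resample_linear(times: Sequence[float], lons: Sequence[float], lats: Sequence[float],
                    step_s: float = 600.0) -> List[Tuple[float, float, float]]:
    """Linearly interpolate (lon, lat) onto a regular grid every step_s seconds.

    times must be strictly increasing (in seconds). Longitudes are interpolated
    naively, so segments crossing the +/-180 degree meridian need special care.
    """
    out = []
    t = times[0]
    while t <= times[-1]:
        j = bisect_right(times, t) - 1  # index of the last original fix at or before t
        if j == len(times) - 1:
            out.append((t, lons[j], lats[j]))
        else:
            w = (t - times[j]) / (times[j + 1] - times[j])
            out.append((t,
                        lons[j] + w * (lons[j + 1] - lons[j]),
                        lats[j] + w * (lats[j + 1] - lats[j])))
        t += step_s
    return out
```

Fixes recorded every 15 minutes (at 0, 900, and 1800 s) are mapped to four points at 0, 600, 1200, and 1800 s.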

4.1.3. Baselines

In this section, we briefly introduce several baseline models used for time series and trajectory prediction.

LSTM: LSTM [30] is a specialized recurrent neural network designed to capture long-term dependencies in sequential data. Its memory cells retain information over long sequences, overcoming the information loss that traditional RNNs suffer on long sequences. In our baseline setup, we use a two-layer LSTM with a hidden layer dimension of 32 to provide sufficient memory capacity for complex temporal dependencies.
Bi-LSTM: Bi-LSTM [40] extends the LSTM model by processing the sequence in both the forward and backward directions, providing more comprehensive contextual information at each time step. We use a two-layer Bi-LSTM with a hidden layer dimension of 32.
GRU: GRU [41] is a simplified version of the LSTM model that merges the forget and input gates into a single update gate, simplifying the network structure while maintaining strong sequence processing capabilities. We stack two GRU layers, each with a hidden layer dimension of 32.
Transformer: The transformer [42] relies entirely on attention mechanisms to process sequential data. Because it processes all sequence positions in parallel, it handles large datasets efficiently. We configure it with eight attention heads, four encoder layers, a hidden layer dimension of 32, and a dropout rate of 0.1 to reduce overfitting and improve generalization.
STGAT: STGAT [43] was originally designed for human trajectory prediction and combines graph attention mechanisms with LSTM to capture spatial–temporal interactions. After appropriate modifications, we apply it to vessel trajectory prediction to generate reasonable trajectory results.
METO-S2S: METO-S2S [8] is a model for vessel trajectory prediction that uses a multi-semantic encoder and a type-guided decoder in a sequence-to-sequence architecture. It not only uses historical position information but also incorporates speed, heading, and other navigational information for prediction.

4.1.4. Evaluation Metrics

In vessel trajectory prediction, we use four key evaluation metrics to measure the prediction performance of the models. These metrics mainly focus on the latitude and longitude errors between the predicted and actual trajectories, aiming to comprehensively evaluate the model’s performance across different time steps. Below are the specific definitions and calculation formulas for each metric.

RMSE (root mean square error) is the square root of the mean squared error between predicted and actual values, so it penalizes large errors more heavily:

\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \left[ (\widehat{\mathrm{lon}}_i^t - \mathrm{lon}_i^t)^2 + (\widehat{\mathrm{lat}}_i^t - \mathrm{lat}_i^t)^2 \right] }

(42)

where $N$ represents the number of vessels, $T$ represents the number of time steps, $(\mathrm{lon}_i^t, \mathrm{lat}_i^t)$ is the true longitude and latitude of vessel $i$ at time $t$, and $(\widehat{\mathrm{lon}}_i^t, \widehat{\mathrm{lat}}_i^t)$ is the predicted longitude and latitude at the corresponding time step.

MAE (mean absolute error) calculates the average absolute difference between predicted and actual values:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \left( |\widehat{\mathrm{lon}}_i^t - \mathrm{lon}_i^t| + |\widehat{\mathrm{lat}}_i^t - \mathrm{lat}_i^t| \right)

(43)

The symbols are the same as those used in RMSE. Unlike RMSE, MAE does not weight larger errors more heavily. It is therefore less sensitive to outliers and offers a complementary view of the error.

ADE (average displacement error) mainly evaluates the prediction accuracy of the model over the entire time period. It calculates the average displacement difference between the predicted trajectory and the actual trajectory at all time steps:

\mathrm{ADE} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \sqrt{ (\widehat{\mathrm{lon}}_i^t - \mathrm{lon}_i^t)^2 + (\widehat{\mathrm{lat}}_i^t - \mathrm{lat}_i^t)^2 }

(44)

FDE (final displacement error) focuses on the error at the last time step, measuring the distance between the predicted trajectory’s endpoint and the actual endpoint:

\mathrm{FDE} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{ (\widehat{\mathrm{lon}}_i^T - \mathrm{lon}_i^T)^2 + (\widehat{\mathrm{lat}}_i^T - \mathrm{lat}_i^T)^2 }

(45)
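The four metrics in Eqs. (42)–(45) can be computed in a single pass over the predicted steps. The sketch below assumes that trajectories are given as nested lists of (lon, lat) tuples in degrees, indexed by vessel and then by predicted step. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat) in degrees


def trajectory_metrics(truth: Sequence[Sequence[Point]],
                       pred: Sequence[Sequence[Point]]) -> Dict[str, float]:
    """RMSE, MAE, ADE, and FDE as in Eqs. (42)-(45), in degrees.

    truth[i][t] and pred[i][t] hold the true and predicted (lon, lat) of
    vessel i at predicted step t; all trajectories are assumed non-empty.
    """
    n = len(truth)
    sq_sum = abs_sum = disp_sum = final_sum = 0.0
    for traj, traj_hat in zip(truth, pred):
        T = len(traj)
        sq = ab = disp = 0.0
        for (lon, lat), (lon_hat, lat_hat) in zip(traj, traj_hat):
            dlon, dlat = lon_hat - lon, lat_hat - lat
            sq += dlon ** 2 + dlat ** 2      # Eq. (42) inner term
            ab += abs(dlon) + abs(dlat)      # Eq. (43) inner term
            disp += math.hypot(dlon, dlat)   # Eq. (44) inner term
        sq_sum += sq / T
        abs_sum += ab / T
        disp_sum += disp / T
        (lon, lat), (lon_hat, lat_hat) = traj[-1], traj_hat[-1]
        final_sum += math.hypot(lon_hat - lon, lat_hat - lat)  # Eq. (45), t = T
    return {
        "RMSE": math.sqrt(sq_sum / n),
        "MAE": abs_sum / n,
        "ADE": disp_sum / n,
        "FDE": final_sum / n,
    }
```

Eq. (42) takes the square root of a mean of squared displacements, while Eq. (44) averages the displacements themselves. By Jensen's inequality, RMSE is therefore never smaller than ADE. For a single two-step trajectory with errors (3, 4) and (0, 0), the sketch gives RMSE ≈ 3.54, MAE = 3.5, ADE = 2.5, and FDE = 0.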

4.2. Model Performance Comparison

4.2.1. Comparison Results with Baselines

In our research, we comprehensively evaluate the performance of the proposed model by comparing it against six baseline models. We evaluate the models using four key metrics (RMSE, MAE, ADE, FDE) to measure their performance across short-term (10 min, 20 min), mid-term (30 min, 40 min), and long-term (50 min) predictions. As shown in Table 2, the results show that our model achieves the best performance across all prediction periods and metrics. Particularly in long-term predictions, our model shows an average improvement of over 65% across the four metrics compared to the best baseline model, METO-S2S, with a maximum improvement of 70%, clearly demonstrating the superiority of our model.

For short-term predictions (10 min), the baseline models Bi-LSTM, STGAT, and METO-S2S perform similarly, indicating that existing models already achieve good results in short-term predictions. However, our model significantly outperforms STGAT in terms of ADE and FDE. Specifically, the errors in ADE and FDE drop from 0.002604° and 0.003475° with STGAT to 0.000609° and 0.000807° in our model. This reduces the average displacement error of vessel trajectories over 10 min from 289 m to 67 m, and the final position displacement error from 442 m to 88 m, significantly improving short-term prediction accuracy.

In mid-term predictions (30 min), differences between the baseline models start to emerge. METO-S2S performs best among the baselines in terms of ADE and FDE, but our model achieves even lower errors on these metrics. Specifically, ADE and FDE drop from 0.005152° and 0.007298° with METO-S2S to 0.001157° and 0.001771° with our model. This reduces the average displacement error over 30 min from 572 m to 128 m, and the final position displacement error from 811 m to 196 m. Compared with METO-S2S, both the average and the final displacement errors decrease by more than 75%.
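The meter values in this paragraph appear consistent with an approximate factor of 111 km per degree, roughly the length of one degree of latitude. That factor is our inference; the text does not state it. One degree of longitude spans only about 111 km × cos(latitude), so a single factor is a rough approximation for combined lon/lat errors. The helper names are hypothetical.

```python
METERS_PER_DEGREE = 111_000.0  # approx. length of one degree of latitude


def degrees_to_meters(err_deg: float) -> float:
    """Rough conversion of a degree-valued error to meters."""
    return err_deg * METERS_PER_DEGREE


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional error reduction relative to a baseline."""
    return 1.0 - ours / baseline
```

With these helpers, the 30-min ADE values convert to about 572 m (METO-S2S) and 128 m (ours). The relative reductions are about 77.5% for ADE (0.005152° to 0.001157°) and about 75.7% for FDE (0.007298° to 0.001771°).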

In the more challenging long-term prediction (50 min), the performance gap between models becomes more pronounced. Even the best baseline model, METO-S2S, performs less satisfactorily, with an ADE of 894 m and an FDE of 1290 m. In contrast, our model significantly outperforms all other models, achieving an ADE of 270 m and an FDE of 418 m. These are reductions of about 70% and 68%, respectively, relative to METO-S2S, or about 69% on average.

These analyses demonstrate that our model excels in short-term, mid-term, and long-term trajectory prediction. Figure 4a,b visually show how the performance of different models changes as the prediction time increases. While traditional LSTM and GRU models can capture hidden information in time series, they may not be sufficient for handling the complexity of long-term, large-span predictions. Bi-LSTM and transformer models can better discover patterns in long-term predictions through global information interaction.

The aforementioned temporal prediction models often overlook vessel interactions and multi-dimensional attribute information. The STGAT model improves prediction by capturing spatial interactions, while the METO-S2S model enhances prediction accuracy by considering vessel attributes such as type, speed, heading, and departure time.

STGAT focuses only on spatial interactions, and METO-S2S emphasizes attribute information; our model considers both aspects. We not only capture spatial interactions between vessels but also address the shortcomings in interaction modeling and temporal feature extraction caused by alternating between temporal and spatial information. In addition, we introduce more attribute information and design a more precise network architecture based on dynamic and static attributes, so our model better conforms to the physical constraints of vessel movement. These comprehensive improvements allow our model to excel in short-term, mid-term, and long-term predictions.

4.2.2. Generalization Experiment on Distinct Ocean Regions

To further validate the generalization capability of our model, we select three additional test sets—Florida Strait, Gulf of Mexico, and Hawaiian Islands—each representing distinct ocean environments. Visualizing the trajectory distributions in these regions (Figure 5a–c), we observe significant differences in their geographical locations and navigational environments compared to the training set (which covers coastal areas along the U.S. East and West Coasts). Specifically, the training regions are characterized by dense ports and busy coastal routes, whereas the Gulf of Mexico and Hawaiian Islands exhibit unique environmental features and vessel activity patterns.

The bar and pie charts further illustrate differences in vessel type distributions across these regions:

- The Florida Strait (Figure 5d) is dominated by pleasure craft, "other" vessels, and sailing vessels, reflecting its coastal and recreational character.
- The Gulf of Mexico (Figure 5e) primarily features "other", cargo, and tanker vessels, consistent with its role as a major shipping route.
- In the Hawaiian region (Figure 5f), fishing, "other", and tug tow vessels predominate, indicating a focus on fishing and specialized operations.

In contrast, the training set (Figure 3a) is largely composed of pleasure craft, sailing, and cargo vessels. These distributional differences indicate notable variation between each test region and the training set.

In the 50-min prediction experiments, performance metrics (RMSE, MAE, ADE, and FDE) vary across the test regions, as shown in Table 3.

Florida Strait: Results here closely match those in the coastal test set, indicating that the model effectively adapts to vessel activity patterns in this area. This may be due to the similarity in vessel type distribution and the presence of geographic data close to the Florida region in the training set.
Gulf of Mexico: Compared with the Florida Strait, the Gulf of Mexico shows a slight increase in prediction error. This could be attributed to differences in vessel type distribution (e.g., a higher proportion of "other" vessels) and the complex shipping patterns in this region.
Hawaiian Islands: Prediction errors are highest in the Hawaiian region. The deep-sea environment lacks typical coastal structures and landmarks, and the higher proportion of fishing and tug tow vessels, which are underrepresented in the training set, increases the difficulty of prediction in this area.

Overall, while model performance declines slightly in these new test regions, the decrease in accuracy remains within acceptable limits, particularly in the Florida Strait and Gulf of Mexico. The model demonstrates robust generalization capability when faced with different geographic environments and vessel type distributions, supporting its potential application to unseen regions.

4.2.3. Model Architectures Ablation Study

In this ablation study, we assess the impact of different model architectures on vessel trajectory prediction performance. Table 4 summarizes the results for five model variants that differ in their encoder (Enc) and decoder (Dec) configurations. In the table, 'T' denotes a transformer applied along the temporal dimension, and 'S' denotes a transformer applied along the spatial dimension.

First, the M1 model uses a two-layer transformer architecture in the temporal dimension (TT) as the encoder, combined with an LSTM decoder. This model performs the worst among all models, with an ADE of 0.003091° and an FDE of 0.004969°. This may be due to its reliance on temporal extraction alone, which fails to capture the complex spatial relationships in the trajectory.

The M2 model adopts a spatial-first, temporal-second (ST) architecture and shows some improvement compared to M1. Its ADE drops to 0.002984°, and its FDE is 0.004654°. This suggests that introducing spatial information helps improve prediction accuracy to some extent, but the effect is still limited.

The M3 model adopts a temporal-first, spatial-second (TS) architecture, resulting in further performance improvements, with an ADE of 0.002728° and an FDE of 0.004165°. This result indicates that temporal information is more important for vessel trajectory prediction, especially when spatial relationships are relatively simple. The global extraction in the temporal dimension provides a more significant performance boost.

The M4 model attempts a dual extraction architecture (TS+ST), combined with an MLP decoder. Its performance shows a slight improvement, with an ADE of 0.002691° and an FDE of 0.004125°. This suggests that although dual extraction architecture captures more information, the MLP decoder may not fully utilize this information, leading to only modest performance gains.

Finally, the M5 model combines dual extraction architecture with an LSTM decoder and achieves the best performance across all evaluation metrics, with an RMSE of 0.004223°, MAE of 0.003021°, ADE of 0.002436°, and FDE of 0.003946°. This demonstrates that the LSTM decoder effectively utilizes the rich information in the dual extraction architecture, significantly improving prediction accuracy.

In summary, the experimental results clearly show that temporal information is crucial for vessel trajectory prediction, and the inclusion of spatial information further enhances performance. Moreover, dual extraction architecture provides more comprehensive information, but it requires a strong decoder (such as LSTM) to fully leverage its potential. Compared to the M1 model, the M5 model reduces RMSE by 23.4% and decreases ADE by 21.2%, indicating the significant contribution of dual extraction combined with LSTM to overall performance improvement.

4.2.4. Attribute Combination Ablation Study

In this ablation study, we explore the impact of different attribute combinations on vessel trajectory prediction performance, with the experimental results being summarized in Table 5. The experiment consists of 11 groups, each using a different combination of attributes to evaluate their influence on the model’s performance. The table presents the results for four evaluation metrics: RMSE, MAE, ADE, and FDE.

First, the G1 group uses only the absolute position (p) attribute, achieving an RMSE of 0.006016°, MAE of 0.004593°, ADE of 0.003616°, and FDE of 0.005422°. This serves as the baseline model and shows that relying solely on absolute position for prediction results in limited performance. This is because absolute position alone cannot sufficiently capture the vessel’s motion trends or other important features.

Next, groups G2 through G5 explore the combination of absolute position with other individual attributes to validate their contribution to prediction performance.

G2: G2 combines absolute position with positional difference (Δp), reducing the RMSE to 0.005502° and the FDE to 0.005019°. This indicates that incorporating positional difference helps the model capture changes in vessel movement, particularly during acceleration or deceleration.
G3: G3 combines absolute position with heading (θ), leading to further improvements with an RMSE of 0.005194° and FDE of 0.004769°. This shows that heading information effectively aids in predicting the vessel’s direction, especially during turns.
G4: G4 combines absolute position with velocity (v), reducing ADE to 0.002785°. This suggests that velocity information is well suited for reflecting the vessel’s movement trends, particularly in steady sailing conditions.
G5: G5 introduces vessel type, with all four metrics slightly worsened, indicating that including vessel type during preprocessing introduces noise rather than improving performance. This may be because the influence of vessel type is minimal in this scenario, adding unnecessary complexity to the model.

Groups G6 through G8 examine the effects of combining multiple beneficial attributes.

G6: G6 combines absolute position, positional difference, and heading, resulting in an RMSE of 0.005160° and ADE of 0.003392°. This combination captures both position and heading changes, making it suitable for complex navigation scenarios, such as multi-directional turns.
G7: G7 combines absolute position, positional difference, and velocity, further reducing the RMSE to 0.004562°, showing that velocity and positional difference work together to capture overall speed changes and instantaneous movements, making this combination effective for handling scenarios with large speed variations.
G8: G8 combines absolute position, heading, and velocity, achieving an RMSE of 0.004649° and ADE of 0.002785°. This indicates that in some cases, heading and velocity may introduce redundant information, preventing further improvements, although the combination is still effective.

It is worth noting that there are inherent correlations between different attributes. For example, positional difference (Δp) and velocity (v) can be derived from one another through simple kinematic relations: positional difference divided by the sampling interval is a discrete approximation of velocity. There is also a clear geometric relationship between heading and positional change. Because these attributes can be derived from one another, combining them may not yield gains as large as those from introducing individual attributes. However, these transformations are implicit, and the derived information is not always entirely accurate. In complex dynamic environments in particular, simplified relationships may not capture all the details. Directly using multiple attribute combinations can therefore still improve performance.

G9 combines absolute position, positional difference, heading, and velocity, achieving an RMSE of 0.004376° and ADE of 0.002555°. This combination effectively captures the global dynamic information of the vessel and is well suited for complex, variable navigation scenarios.
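The relations between positional difference, velocity, and heading described above can be sketched for two consecutive fixes 10 minutes apart. This sketch makes several assumptions:

- It uses a planar approximation in degrees, with no cos(latitude) scaling and no Earth curvature.
- It measures direction counter-clockwise from the +longitude (east) axis; the nautical course convention is clockwise from north.
- Δp/Δt gives the average velocity over the interval, not the instantaneous AIS speed, which is one reason derived values are only approximate.

The function name is hypothetical.

```python
import math
from typing import Tuple

STEP_S = 600.0  # 10-minute sampling interval (Section 4.1.2)


def derive_motion(p_prev: Tuple[float, float], p_curr: Tuple[float, float],
                  dt: float = STEP_S):
    """Recover positional difference, average velocity, and direction of motion
    from two consecutive (lon, lat) fixes, under a planar approximation."""
    dlon = p_curr[0] - p_prev[0]
    dlat = p_curr[1] - p_prev[1]
    velocity = (dlon / dt, dlat / dt)      # degrees per second
    direction = math.atan2(dlat, dlon)     # radians, counter-clockwise from +lon (east)
    return (dlon, dlat), velocity, direction
```

A vessel moving 0.006° due north in 10 minutes yields a latitudinal velocity of 1e-5 degrees per second and a direction of π/2.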

G10 adds vessel type to the G9 combination, but performance decreases slightly, with an RMSE of 0.004417°, ADE of 0.002596°, and FDE of 0.004115°. This further confirms that, in the current model structure, introducing vessel type during preprocessing does not improve performance. It may instead introduce noise or redundant information, reducing prediction accuracy.

Finally, G11 introduces our proposed gating mechanism based on static vessel information (gate-type), which shifts the use of vessel type from preprocessing to spatial relationship fusion. Using the gating mechanism, we adjust the influence of surrounding vessels on the main vessel based on their type. The results show significant improvements, with the RMSE dropping to 0.004223°, MAE to 0.003021°, ADE to 0.002436°, and FDE to 0.003946°. Compared to G9, the RMSE is reduced by 3.5%, and ADE by 4.7%, demonstrating that the gating mechanism helps the model better utilize spatial information and dynamically adjust its predictions based on different vessel types, thereby significantly improving predictive accuracy.

Adding attributes can, in principle, increase the model’s parameter count and resource consumption. In our experiments, however, the additional attributes do not significantly increase the parameter count. They mainly affect the storage and processing of multi-dimensional input data rather than the complexity of the model architecture.
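The paper does not describe how the dynamic attributes are fused at the input, so the following is a hypothetical sketch. It assumes the attributes are concatenated and projected by a single dense layer to the 32-dimensional transformer input. Under that assumption, each extra input feature adds only 32 weights, and the transformer layers after the projection are unchanged. The feature counts (two components each for p, Δp, θ, and v) are also assumptions.

```python
EMBED_DIM = 32  # transformer input dimension (Section 4.1.1)


def linear_embedding_params(n_features: int, d_out: int = EMBED_DIM) -> int:
    """Weights plus biases of a dense layer mapping n_features -> d_out."""
    return n_features * d_out + d_out


# Hypothetical feature counts: two components each for p, Δp, θ, and v.
G1_FEATURES = 2  # absolute position only
G9_FEATURES = 8  # p, Δp, θ, v
```

Under these assumptions, moving from G1 to G9 grows the input projection from 96 to 288 parameters, an increase of only 192.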

During training, although different attribute combinations lead to varying degrees of performance improvement, the trend in the loss function and training stability does not differ significantly between groups. Whether using single or combined attributes, the loss function during training maintains a relatively smooth downward trend, indicating that the model effectively learns from the additional attribute information without experiencing significant fluctuations or overfitting.

In practical applications, limitations in data collection and cost may prevent the acquisition of all attribute information. Based on the experimental results and real-world scenarios, we recommend a resource-constrained strategy for attribute selection. Key attributes such as absolute position, velocity, and heading should be prioritized, as they have a significant impact on vessel trajectory prediction. In contrast, attributes like vessel type can be omitted when resources are limited. Furthermore, if certain attributes are missing, the internal correlations mentioned earlier can be leveraged to infer the missing data. This approach ensures the model can still generate reasonable predictions even with incomplete information.

4.3. Visualization Research

In the task of vessel trajectory prediction, we compare our proposed model (DualSTMA) with four baseline models (METO-S2S, STGAT, Bi-LSTM, and transformer) across various navigation scenarios. Each visualization shows a model’s forecast of the next 5 trajectory points from the past 10 points, with consecutive points spaced 10 min apart. The analysis shows that our model outperforms the baseline models across different levels of trajectory complexity. Below, we analyze the predictions in several typical scenarios.

In the straight-line navigation scenario, the vessel’s trajectory is nearly linear, making the prediction task relatively simple. As shown in Figure 6a–c, our model and the baseline models (METO-S2S, STGAT, Bi-LSTM, and transformer) all produce predictions that differ only minimally from the ground truth, since there are no significant trajectory complexities to challenge the models.

In the slightly curved scenario, the vessel’s trajectory exhibits slight curvature but generally keeps a straight course. Figure 7a–c illustrates the predictions in this context. When handling such mild curvature, our model demonstrates superior robustness, successfully tracking the minor shifts in the trajectory. However, the baseline models, particularly transformer and Bi-LSTM, begin to show noticeable deviations from the ground truth, failing to accurately capture the slight directional changes. This demonstrates that our model is better at handling small spatial variations during navigation.

In the sharp turn scenario, the vessel undergoes significant directional changes, as depicted in Figure 8a–c. Sharp turns involve both directional and speed changes, posing a greater challenge to the prediction models. In this case, the baseline models struggle, displaying significant errors in predicting the trajectory post-turn. In contrast, our model successfully follows the trajectory’s change of direction and predicts the future path with greater accuracy, closely matching the ground truth. This indicates that our model has strong prediction capabilities in handling sharp turns.

In the emergency stop scenario, the intervals between past trajectory points decrease, indicating that the vessel is slowing down or stopping, as shown in Figure 9a,b. The baseline models struggle in this scenario, failing to capture the vessel’s deceleration, and thus predicting future trajectory points with uniform spacing. Our model, on the other hand, accurately recognizes the slowing down behavior and predicts smaller intervals between future points, demonstrating its sensitivity and robustness in handling sudden changes in speed.

In the bad-case scenario, even our model encounters significant prediction difficulties. Figure 10a,b presents two typical bad cases. In the first example, the future trajectory is discontinuous with the past trajectory and includes a sharp turn, and all models, including ours, perform poorly in predicting it. In the second example, the vessel suddenly turns around in the last few trajectory points, resulting in significant prediction errors across all models. These bad cases highlight the challenge of predicting highly erratic and abrupt movements and offer insights for future model improvements.

Through the analysis of different vessel trajectory prediction scenarios, it is evident that our model outperforms the baseline models in most cases, particularly in complex scenarios such as sharp turns and emergency stops. The baseline models lack in-depth modeling of complex spatial–temporal relationships and often perform poorly when faced with abrupt changes in the trajectory. These findings underscore the strength of our model in capturing dynamic movements in vessel trajectory prediction, while also providing guidance for further model enhancement to address bad-case scenarios.

5. Discussions

The model in this paper is primarily based on automatic identification system (AIS) data. While AIS data provide essential information on a vessel’s position, speed, and heading, they lack detailed descriptions of the vessel’s maneuverability and of the surrounding marine environment. The actual movement of a vessel depends not only on its own characteristics (such as turning radius and draft) but also on dynamic environmental factors such as wind, currents, and waves, which can significantly affect its trajectory. In particular, wind and currents directly alter the vessel’s speed and direction of motion, causing deviations from its intended path. Incorporating these factors into trajectory prediction models is therefore expected to improve prediction accuracy and bring the predictions closer to real-world navigation conditions.
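As a rough illustration of how a current alone shifts a track, the sketch below applies simple dead reckoning in a local east/north frame in metres. It assumes a constant current, constant speed through the water, a fixed heading, and no wind-induced leeway; `dead_reckon` and its parameters are hypothetical and are not part of the paper’s model.

```python
import math

def dead_reckon(x, y, speed, heading_deg, current_east, current_north, dt, n_steps):
    """Advance a vessel that holds a constant speed through the water and heading
    (degrees clockwise from north) while a constant current adds to its velocity.
    x is east and y is north, in metres; speeds in m/s; dt in seconds."""
    h = math.radians(heading_deg)
    ve = speed * math.sin(h) + current_east   # velocity over ground, east component
    vn = speed * math.cos(h) + current_north  # velocity over ground, north component
    track = []
    for _ in range(n_steps):
        x, y = x + ve * dt, y + vn * dt
        track.append((x, y))
    return track

# Same heading and speed, with and without a 0.5 m/s eastward current, for 10 minutes.
still = dead_reckon(0.0, 0.0, 5.0, 0.0, 0.0, 0.0, dt=60.0, n_steps=10)
drift = dead_reckon(0.0, 0.0, 5.0, 0.0, 0.5, 0.0, dt=60.0, n_steps=10)
```

With these assumed values, the vessel steering due north at 5 m/s ends about 300 m east of the still-water track after 10 minutes, a deviation that an AIS-only model can only infer indirectly from past positions.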

Although including environmental factors and vessel characteristics could improve model performance, obtaining these data is difficult in practice. Vessel maneuverability characteristics (such as turning radius and draft) are typically hard to obtain through public channels and vary substantially between vessels. Environmental data (such as currents and waves) can be obtained from meteorological and hydrological observations, but conditions vary significantly across regions and over time, which makes these data difficult to quantify precisely. Such data are not only hard to acquire but may also contain uncertainty and noise, and the resulting errors can propagate through the prediction model and reduce the reliability of its output.
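The following sketch illustrates how noise in a single environmental input spreads into a prediction, using Monte Carlo sampling. Purely for illustration, it assumes that the only uncertain quantity is an eastward current speed drawn from a Gaussian with a 0.1 m/s standard deviation. Because drift is linear in the current here, the result can be checked against the analytic 95% interval of 300 ± 1.96 × 60 m, roughly 182 m to 418 m. The function names are hypothetical.

```python
import random

def simulate_drift(current_mean, current_std, duration_s, n_samples=10_000, seed=0):
    """Monte Carlo: draw an uncertain eastward current speed (m/s) from a Gaussian
    and return the eastward drift (m) it causes over duration_s seconds."""
    rng = random.Random(seed)
    return [rng.gauss(current_mean, current_std) * duration_s for _ in range(n_samples)]

def percentile_interval(samples, level=0.95):
    """Central interval containing roughly `level` of the samples."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

drifts = simulate_drift(current_mean=0.5, current_std=0.1, duration_s=600.0)
low, high = percentile_interval(drifts)
```

Propagating sampled inputs through a real predictor works the same way, although for a nonlinear model the interval has to be estimated from the samples rather than derived in closed form.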

If partial vessel characteristics and environmental data are available, the model could be improved in the following ways:

For noisy and uncertain marine environmental data, the model could incorporate uncertainty analysis, propagating error sources into confidence intervals on its output to make the results more robust.
Since factors such as currents and waves exhibit complex temporal and spatial variations, modeling them mathematically usually requires extensive statistical experiments and theoretical derivation; simplifications based on classical assumptions could keep this tractable.
The model structure could adopt a multi-modal design that processes AIS data, vessel characteristics, and environmental data in separate branches, reducing the noise introduced by simply aggregating the data.
To make better use of vessel characteristic data, the model could output underlying control information (such as acceleration and lateral rate of change) instead of positions, and then derive positions from each vessel’s dynamic constraints, which may improve both prediction accuracy and physical consistency.
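The control-output idea in the last item above can be sketched as follows. A predictor emits per-step acceleration and rate of turn, and positions are obtained by integrating these controls under per-vessel limits, so that physically impossible maneuvers are clipped. This is a minimal sketch under our own assumptions: forward-Euler integration, a flat local east/north frame in metres, heading measured clockwise from north, and hypothetical limit values in `VesselLimits`. It is not an implementation from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class VesselLimits:
    # Hypothetical per-vessel limits; real values depend on hull, engine, and loading.
    max_speed: float      # m/s
    max_accel: float      # m/s^2, applied symmetrically to acceleration and braking
    max_turn_rate: float  # rad/s, applied symmetrically to port and starboard turns

def clamp(value, low, high):
    return max(low, min(high, value))

def rollout(x, y, speed, heading, controls, limits, dt=60.0):
    """Integrate predicted (acceleration, turn_rate) pairs into positions with
    forward Euler, clipping each control and the speed to the vessel's limits.
    x is east and y is north (m); heading is in radians clockwise from north."""
    path = []
    for accel, turn_rate in controls:
        accel = clamp(accel, -limits.max_accel, limits.max_accel)
        turn_rate = clamp(turn_rate, -limits.max_turn_rate, limits.max_turn_rate)
        speed = clamp(speed + accel * dt, 0.0, limits.max_speed)
        heading = (heading + turn_rate * dt) % (2 * math.pi)
        x += speed * math.sin(heading) * dt
        y += speed * math.cos(heading) * dt
        path.append((x, y))
    return path

limits = VesselLimits(max_speed=10.0, max_accel=0.02, max_turn_rate=0.001)
# Requested hard stop from 2 m/s: braking is capped at 0.02 m/s^2 (1.2 m/s per 60 s step).
stop = rollout(0.0, 0.0, 2.0, 0.0, [(-1.0, 0.0)] * 3, limits)
# Requested 1 rad/s turn at 5 m/s: capped at 0.001 rad/s (0.06 rad per step).
turn = rollout(0.0, 0.0, 5.0, 0.0, [(0.0, 1.0)], limits)
```

In this example, the requested hard stop is limited by `max_accel`, so the vessel coasts 48 m before coming to rest instead of stopping instantly, and the requested sharp turn is reduced to what the assumed turn-rate limit allows.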

6. Conclusions

With the rapid increase in global shipping traffic, the rising number of vessels has led to higher traffic density, posing challenges to navigation safety and management. In response to these challenges, vessel trajectory prediction has become a crucial tool. However, existing methods often struggle to handle complex vessel interactions and multi-dimensional attribute information, limiting their ability to capture the spatial–temporal dynamics of trajectories. To address these issues, this paper proposes a dual spatial–temporal attention network that integrates multi-attribute information. By modeling vessel interactions through a dual-path structure and effectively incorporating dynamic and static attributes, our method demonstrates superior predictive performance and reliability on real AIS datasets compared to existing approaches.

However, our method still has certain limitations that warrant further exploration in future work. First, the current model primarily relies on AIS data and does not fully account for vessel maneuverability and surrounding oceanic conditions (such as weather and sea state). Incorporating these dynamic factors into the inputs would allow a more accurate reflection of vessel movement characteristics, thereby enhancing prediction accuracy. Additionally, integrating kinematic or dynamic models of vessel motion could provide a stronger physical basis for capturing the movement patterns of vessels. Furthermore, the model’s generalizability across different maritime regions requires improvement, which could be addressed by expanding the dataset to cover a wider variety of geographical areas and navigation scenarios, enhancing its applicability in diverse and complex ocean environments. Finally, future studies could focus on analyzing bad-case scenarios to identify the model’s performance limits in extreme or complex navigation environments, clarifying the model’s applicability and providing guidance for subsequent improvements.

Author Contributions

F.H.: conceptualization, data curation, formal analysis, methodology, validation, visualization, writing—original draft preparation, and writing—review and editing; Z.F.: methodology, project administration, and supervision; Z.L.: software, validation, and visualization; P.L.: data curation and resources; F.M.: funding acquisition and resources; X.L.: project administration and validation. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge financial support from the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA0310502, and the Future Star of Aerospace Information Research Institute, Chinese Academy of Sciences, Grant No. E3Z107010F.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available at https://marinecadastre.gov/ais/ (Marine Cadastre) (accessed on 20 October 2024) and https://github.com/AIR-SkyForecast/METO-S2S/tree/main/dataset_json (AIR-Sky) (accessed on 10 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wijaya, W.M.; Nakamura, Y. Port performance indicators construction based on the AIS-generated trajectory segmentation and classification. Int. J. Data Sci. Anal. 2024, 1–20.
Chen, X.; Wu, H.; Han, B.; Liu, W.; Montewka, J.; Liu, W. Orientation-aware ship detection via a rotation feature decoupling supported deep learning approach. Ocean Eng. 2023, 125, 106686.
UNCTAD. Review of Maritime Transport 2023; United Nations Publications: New York, NY, USA, 2024.
Kaptan, M.; Uğurlu, Ö.; Wang, J. The effect of nonconformities encountered in the use of technology on the occurrence of collision, contact and grounding accidents. Reliab. Eng. Syst. Saf. 2021, 215, 107886.
Lei, P.-R. Mining maritime traffic conflict trajectories from a massive AIS data. Knowl. Inf. Syst. 2020, 62, 259–285.
Liu, J.; Mao, X.; Fang, Y.; Zhu, D.; Meng, M.Q.-H. A Survey on Deep-Learning Approaches for Vehicle Trajectory Prediction in Autonomous Driving. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Sanya, China, 27–31 December 2021; pp. 978–985.
Ahmed, I.; Jun, M.; Ding, Y. A spatio-temporal track association algorithm based on marine vessel automatic identification system data. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20783–20797.
Zhang, Y.; Han, Z.; Zhou, X.; Li, B.; Zhang, L.; Zhen, E.; Wang, S.; Zhao, Z.; Guo, Z. METO-S2S: A S2S based vessel trajectory prediction method with Multiple-semantic Encoder and Type-Oriented Decoder. Ocean Eng. 2023, 277, 114248.
Liu, R.W.; Liang, M.; Nie, J.; Yuan, Y.; Xiong, Z.; Yu, H.; Guizani, N. STMGCN: Mobile edge computing-empowered vessel trajectory prediction using spatial-temporal multigraph convolutional network. IEEE Trans. Ind. Inform. 2022, 18, 7977–7987.
Yuan, Y.; Weng, X.; Ou, Y.; Kitani, K.M. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 17 October 2021; pp. 9813–9823.
Ngiam, J.; Caine, B.; Vasudevan, V.; Zhang, Z.; Chiang, H.T.L.; Ling, J.; Roelofs, R.; Bewley, A.; Liu, C.; Venugopal, A.; et al. Scene Transformer: A Unified Architecture for Predicting Multiple Agent Trajectories. arXiv 2021, arXiv:2106.08417.
Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 19 June 2020; pp. 14424–14432.
Li, M.; Li, B.; Qi, Z.; Li, J.; Wu, J. Enhancing Maritime Navigational Safety: Ship Trajectory Prediction Using ACoAtt–LSTM and AIS Data. ISPRS Int. J. Geo-Inf. 2024, 13, 85.
Tong, X.; Chen, X.; Sang, L.; Mao, Z.; Wu, Q. Vessel trajectory prediction in curving channel of inland river. In Proceedings of the 2015 International Conference on Transportation Information and Safety (ICTIS), Wuhan, China, 25–28 June 2015; pp. 706–714.
Fossen, S.; Fossen, T.I. Extended kalman filter design and motion prediction of ships using live automatic identification system (AIS) data. In Proceedings of the 2018 2nd European Conference on Electrical Engineering and Computer Science (EECS), Bern, Switzerland, 20–22 December 2018; pp. 464–470.
Research on Forecasting Ship Sailed Track Behavioral Abnormalities Algorithm Based on Kalman Filter; Hebei University of Technology: Tianjin, China, 2012.
Mazzarella, F.; Arguedas, V.F.; Vespe, M. Knowledge-based vessel position prediction using historical AIS data. In Proceedings of the 2015 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 6–8 December 2015; pp. 1–6.
Zhang, X.; Liu, G.; Hu, C.; Ma, X. Wavelet analysis based hidden Markov model for large ship trajectory prediction. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 2913–2918.
Dalsnes, B.R.; Hexeberg, S.; Flåten, A.L.; Eriksen, B.-O.H.; Brekke, E.F. The neighbor course distribution method with Gaussian mixture models for AIS-based vessel trajectory prediction. In Proceedings of the 2018 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; pp. 580–587.
Murray, B.; Perera, L.P. An AIS-based multiple trajectory prediction approach for collision avoidance in future vessels. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering, Glasgow, Scotland, 9–14 June 2019. V07BT06A031.
Liu, J.; Shi, G.; Zhu, K. Vessel trajectory prediction model based on AIS sensor data and adaptive chaos differential evolution support vector regression (ACDE-SVR). Appl. Sci. 2019, 9, 2983.
Zhang, C.; Bin, J.; Wang, W.; Peng, X.; Wang, R.; Halldearn, R.; Liu, Z. AIS data driven general vessel destination prediction: A random forest based approach. Transp. Res. Part C Emerg. Technol. 2020, 118, 102729.
Luo, D.; Chen, P.; Yang, J.; Li, X.; Zhao, Y. A new classification method for ship trajectories based on AIS data. J. Mar. Sci. Eng. 2023, 11, 1646.
Kraus, P.; Mohrdieck, C.; Schwenker, F. Ship classification based on trajectory data with machine-learning methods. In Proceedings of the 2018 19th International Radar Symposium (IRS), Bonn, Germany, 22 June 2018; pp. 1–10.
Huang, I.-L.; Lee, M.-C.; Chang, L.; Huang, J.-C. Development and Application of an Advanced Automatic Identification System (AIS)-Based Ship Trajectory Extraction Framework for Maritime Traffic Analysis. J. Mar. Sci. Eng. 2024, 12, 1672.
Filipiak, D.; Węcel, K.; Stróżyńska, M.; Michalak, M.; Abramowicz, W. Extracting maritime traffic networks from AIS data using evolutionary algorithm. Bus. Inf. Syst. Eng. 2020, 62, 435–450.
Liu, R.W.; Liang, M.; Nie, J.; Lim, W.Y.B.; Zhang, Y.; Guizani, M. Deep learning-powered vessel trajectory prediction for improving smart traffic services in maritime Internet of Things. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3080–3094.
Li, Y.; Yu, Q.; Yang, Z. Vessel Trajectory Prediction for Enhanced Maritime Navigation Safety: A Novel Hybrid Methodology. J. Mar. Sci. Eng. 2024, 12, 1351.
Nguyen, D.-D.; Le Van, C.; Ali, M.I. Vessel trajectory prediction using sequence-to-sequence models over spatial grid. In Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, Hamilton, New Zealand, 25–29 June 2018; pp. 258–261.
Forti, N.; Millefiori, L.M.; Braca, P.; Willett, P. Prediction of vessel trajectories from AIS data via sequence-to-sequence recurrent neural networks. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8936–8940.
Gao, M.; Shi, G.; Li, S. Online prediction of ship behavior with automatic identification system sensor data using bidirectional long short-term memory recurrent neural network. Sensors 2018, 18, 4211.
Liu, S.S.; Ma, S.X.; Meng, X.; Zhang, Q.C. Prediction model of ship trajectory based on CNN and bi-LSTM. J. Chongqing Univ. Technol. (Nat. Sci.) 2020, 34, 196–205.
Wang, S.; Li, Y.; Zhang, Z.; Xing, H. Big data driven vessel trajectory prediction based on sparse multi-graph convolutional hybrid network with spatio-temporal awareness. Ocean Eng. 2023, 287, 115695.
Feng, H.; Cao, G.; Xu, H.; Ge, S.S. IS-STGCNN: An Improved Social spatial-temporal graph convolutional neural network for ship trajectory prediction. Ocean Eng. 2022, 266, 112960.
Zhao, J.; Yan, Z.; Zhou, Z.; Chen, X.; Wu, B.; Wang, S. A ship trajectory prediction method based on GAT and LSTM. Ocean Eng. 2023, 289, 116159.
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
Gokcesu, K.; Gokcesu, H. Generalized Huber Loss for Robust Learning and Its Efficient Minimization for a Robust Statistics. arXiv 2021, arXiv:2108.12627.
Gao, D.-W.; Zhu, Y.-S.; Zhang, J.-F.; He, Y.-K.; Yan, K.; Yan, B.-R. A Novel MP-LSTM Method for Ship Trajectory Prediction Based on AIS Data. Ocean Eng. 2021, 228, 108956.
Dyer, S.A.; Dyer, J.S. Cubic-Spline Interpolation. IEEE Instrum. Meas. Mag. 2001, 4, 44–46.
Yang, C.-H.; Wu, C.-H.; Shao, J.-C.; Wang, Y.-C.; Hsieh, C.-M. AIS-Based Intelligent Vessel Trajectory Prediction Using Bi-LSTM. IEEE Access 2022, 10, 24302–24315.
Adege, A.B.; Lin, H.P.; Wang, L.C. Mobility predictions for IoT devices using gated recurrent unit network. IEEE Internet Things J. 2020, 7, 505–517.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October 2019; pp. 6272–6281.

Figure 1. Our dual-path spatial–temporal attention network (DualSTMA), enhanced with multi-attribute information, explores vessel trajectory prediction from two complementary spatiotemporal perspectives: the ST path and the TS path. We first categorize attribute features into dynamic and static types, processing and integrating them into the network separately. The temporal transformer and spatial transformer then apply self-attention and cross-attention mechanisms to model temporal and spatial relationships. Additionally, the model employs an LSTM as the decoder, with different MLPs used to predict position, heading, and velocity.

Figure 2. Geographical coverage of AIS data in major U.S. coastal regions, including representative areas in the eastern, western, and southern parts of the United States. The boxes represent the four regions where the dataset was collected, while the blue line visualizes the estimated coverage of AIS data throughout 2021.

Figure 3. Both figures illustrate the counts and proportions of various vessel types, including pleasure craft, sailing vessels, cargo ships, and fishing vessels, demonstrating a consistent distribution between the training and test sets.

Figure 4. Comparison of average displacement error (ADE) and final displacement error (FDE) across models.

Figure 5. Visualization of vessel trajectories and vessel type distributions across three distinct maritime regions. The boxes represent the regions where the datasets were collected, while the blue line visualizes the estimated coverage of AIS data from January to February 2021.

Figure 6. Visualization of ship trajectory predictions for smooth sailing scenarios, where vessels navigate steadily in almost straight lines, and all models perform similarly well. (a–c) show three examples of this scenario.

Figure 7. Visualization of ship trajectory predictions for a slightly curved scenario, where our model maintains accuracy while other models exhibit noticeable deviations. (a–c) show three examples of this scenario.

Figure 8. Visualization of predictions for sharp turn scenarios, where our model accurately predicts both heading and speed, outperforming the baselines. (a–c) show three examples of this scenario.

Figure 9. Visualization of predictions for emergency stop scenarios in maritime trajectory forecasting. Subfigures (a,b) illustrate two different emergency stop cases, showing the overall trajectory predictions by various models. Subfigures (c,d) are magnified views of the future trajectories corresponding to (a,b), where the deceleration effects cause minimal movement. Our model accurately captures the deceleration and stopping behavior, while other models fail to predict these changes effectively.

Figure 10. Visualization of challenging scenarios. (a) The first example shows a sharp turn and discontinuity, where our model performs better than the others but still struggles. (b) The second example features a U-turn, in which all models face difficulties, though our model shows slightly better performance.

Table 1. Geographical ranges of representative coastal regions in the United States covered by the AIS dataset.

Range	Southwestern	Northeastern	Southeastern	Northwestern
LON	120° W~114° W	71° W~65° W	81° W~75° W	127° W~122° W
LAT	35° N~28° N	46° N~41° N	36° N~30° N	50° N~42° N
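For readers reproducing the regional split, the helper below assigns a position to one of the Table 1 boxes. West longitudes are written as negative degrees. The dictionary and function names are hypothetical, and treating the box edges as inclusive is an assumption rather than something the paper specifies.

```python
# Bounding boxes transcribed from Table 1 as (lon_min, lon_max, lat_min, lat_max);
# west longitudes are negative and north latitudes positive, in degrees.
REGIONS = {
    "Southwestern": (-120.0, -114.0, 28.0, 35.0),
    "Northeastern": (-71.0, -65.0, 41.0, 46.0),
    "Southeastern": (-81.0, -75.0, 30.0, 36.0),
    "Northwestern": (-127.0, -122.0, 42.0, 50.0),
}

def region_of(lon, lat):
    """Return the name of the Table 1 region containing (lon, lat), or None."""
    for name, (lon_min, lon_max, lat_min, lat_max) in REGIONS.items():
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            return name
    return None
```

The four boxes do not overlap, so each AIS position maps to at most one region.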

Table 2. Quantitative analysis of predictive performance across different models over varying time intervals. The table compares different models (LSTM, GRU, transformer, Bi-LSTM, STAGT, METO-S2S, and our proposed model) in terms of four metrics (ADE, FDE, RMSE, MAE) across short-term (10 and 20 min), mid-term (30 and 40 min), and long-term (50 min) predictions. ‘Ours’ represents the full model, which consistently outperforms other baselines across all metrics, as shown by the improvement percentages at the bottom.

Model	Metrics (°)	10 min	20 min	30 min	40 min	50 min
LSTM	ADE	0.007024	0.009525	0.014168	0.016787	0.020502
	FDE	0.008658	0.008658	0.018923	0.021869	0.028527
	RMSE	0.009979	0.013531	0.020127	0.023848	0.034967
	MAE	0.006600	0.009593	0.014424	0.016669	0.026269
GRU	ADE	0.005333	0.007232	0.010757	0.012746	0.017085
	FDE	0.006743	0.009801	0.014736	0.017030	0.023773
	RMSE	0.008733	0.011841	0.017613	0.020869	0.027974
	MAE	0.006057	0.008805	0.013239	0.015300	0.021357
Transformer	ADE	0.003016	0.005351	0.007395	0.009438	0.011481
	FDE	0.003980	0.007660	0.010739	0.013818	0.016822
	RMSE	0.005166	0.009166	0.012666	0.016165	0.018665
	MAE	0.003660	0.007043	0.009874	0.012705	0.014467
Bi-LSTM	ADE	0.002893	0.005062	0.007141	0.009040	0.010938
	FDE	0.003753	0.006951	0.010148	0.012929	0.015709
	RMSE	0.004862	0.008509	0.012004	0.015195	0.017386
	MAE	0.003410	0.006315	0.009221	0.011747	0.013273
STAGT	ADE	0.002604	0.004620	0.006803	0.008483	0.010667
	FDE	0.003475	0.006434	0.009780	0.012225	0.015571
	RMSE	0.004576	0.008119	0.011957	0.014909	0.017747
	MAE	0.003246	0.006012	0.009138	0.011422	0.013548
METO-S2S	ADE	0.002802	0.004248	0.005152	0.006056	0.008045
	FDE	0.003614	0.005908	0.007298	0.008828	0.011608
	RMSE	0.004687	0.007105	0.008617	0.010129	0.013355
	MAE	0.003486	0.005699	0.007039	0.008514	0.010196
Ours	ADE	0.000609	0.000853	0.001157	0.001522	0.002436
	FDE	0.000807	0.001164	0.001771	0.002378	0.003946
	RMSE	0.001156	0.001378	0.002106	0.002839	0.004223
	MAE	0.000475	0.000921	0.001456	0.001920	0.003021
Impro.(%)	ADE	78.27%	79.92%	77.54%	74.87%	69.72%
	FDE	83.20%	80.30%	75.73%	73.06%	66.01%
	RMSE	75.34%	80.61%	75.56%	71.97%	68.38%
	MAE	86.37%	83.84%	79.32%	77.45%	70.37%
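ADE and FDE are commonly defined as the mean and the final-step Euclidean distance between predicted and ground-truth positions. The sketch below uses those standard definitions, with units following the inputs (degrees in Table 2), and is not taken from the authors’ code. The `improvement` helper computes a relative error reduction. Recomputing the Impro.(%) row this way against the METO-S2S row reproduces most printed values, for example 78.27% for ADE at 10 min and 69.72% for ADE at 50 min.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all predicted steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last predicted step."""
    return math.dist(pred[-1], truth[-1])

def improvement(baseline_error, model_error):
    """Relative error reduction of a model with respect to a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Values copied from Table 2 (METO-S2S row vs. Ours row).
ade_10min = round(improvement(0.002802, 0.000609), 2)
ade_50min = round(improvement(0.008045, 0.002436), 2)
mae_10min = round(improvement(0.003486, 0.000475), 2)
```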

Table 3. Performance metrics for vessel trajectory prediction across four regions.

Regions	Trajectory Number	Vessel Number	RMSE	MAE	ADE	FDE
Coastal	3415	1980	0.004223	0.003021	0.002436	0.003946
Florida	2396	1113	0.004466	0.003356	0.002632	0.004094
Mexico	2920	729	0.004618	0.003145	0.002471	0.004004
Hawaiian	1713	329	0.004917	0.003908	0.003095	0.005231

Table 4. Exploration results of various transformer architectures for vessel trajectory prediction. The table compares different model variants in terms of RMSE, MAE, ADE, and FDE; Enc and Dec denote the encoder and decoder, and TS and ST denote the temporal–spatial and spatial–temporal paths. ‘M5’ represents our complete model, which achieves the best performance across all metrics.

Model	Enc	Dec	RMSE	MAE	ADE	FDE
M1	TT	LSTM	0.005509	0.003956	0.003091	0.004969
M2	ST	LSTM	0.005222	0.003759	0.002984	0.004654
M3	TS	LSTM	0.004878	0.003504	0.002728	0.004165
M4	TS+ST	MLP	0.004655	0.003336	0.002691	0.004125
M5	TS+ST	LSTM	0.004223	0.003021	0.002436	0.003946

Table 5. Performance comparison of different attribute combinations. The table lists various combinations of position (p), displacement (Δp), heading (θ), velocity (v), type, and gate-type attributes. The metrics include RMSE, MAE, ADE, and FDE for each group of attributes. ‘G11’ represents our complete model, achieving the best overall performance.

Group	p	Δp	θ	v	Type	Gate-Type	RMSE	MAE	ADE	FDE
G1	✓						0.006016	0.004593	0.003616	0.005422
G2	✓	✓					0.005502	0.004332	0.003412	0.005109
G3	✓		✓				0.005194	0.004037	0.003175	0.004769
G4	✓			✓			0.004691	0.003537	0.002785	0.004401
G5	✓				✓		0.006249	0.004941	0.003888	0.005858
G6	✓	✓	✓				0.005160	0.003939	0.003102	0.004715
G7	✓	✓		✓			0.004562	0.003402	0.002676	0.004234
G8	✓		✓	✓			0.004504	0.003429	0.002698	0.004358
G9	✓	✓	✓	✓			0.004376	0.003246	0.002555	0.004088
G10	✓	✓	✓	✓	✓		0.004417	0.003298	0.002596	0.004115
G11	✓	✓	✓	✓		✓	0.004223	0.003021	0.002436	0.003946
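Table 5 contrasts feeding vessel type in as an extra input (G10, ‘Type’) with using it as a gate (G11, ‘Gate-Type’). The abstract describes the full model as using static attributes to regulate the importance of neighboring vessel features through a gating mechanism. The sketch below shows only the generic difference between the two options under our own assumptions: a per-channel sigmoid gate computed from a type embedding with hypothetical weights. It is not the authors’ exact formulation, and all names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate_neighbor_features(neighbor_feat, type_embedding, weights, bias):
    """Multiply each channel of a neighbor's feature vector by a gate in (0, 1)
    computed from that neighbor's static-type embedding.
    weights has one row per feature channel; each row matches type_embedding in length."""
    gates = [sigmoid(sum(w * e for w, e in zip(row, type_embedding)) + b)
             for row, b in zip(weights, bias)]
    return [f * g for f, g in zip(neighbor_feat, gates)]

def concat_type(neighbor_feat, type_one_hot):
    """Concatenation alternative: append the static attribute as extra input channels."""
    return list(neighbor_feat) + list(type_one_hot)

feat = [2.0, -4.0]
cargo = [1.0, 0.0]  # hypothetical 2-d embedding of one vessel type
neutral = gate_neighbor_features(feat, cargo, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
suppressed = gate_neighbor_features(feat, cargo, [[-20.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```

The gate can fade a neighbor’s features channel by channel according to its type, whereas concatenation only widens the input and leaves any weighting to later layers.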

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
