In this section, we will evaluate the performance of TrackAISNet. First,
Section 5.1 presents the experimental setup, including the environment, dataset, and evaluation metrics.
Section 5.2 provides relevant statistical information on the four types of vessels in the custom dataset, including dataset partitioning. Next, in
Section 5.3, we assess the performance of TrackAISNet on the custom AIS dataset.
Section 5.4 conducts ablation experiments to demonstrate the performance enhancement effects of each module within TrackAISNet. Finally, in
Section 5.5, we further validate the model’s effectiveness by comparing TrackAISNet against state-of-the-art multimodal algorithms on a public AIS dataset related to three types of fishing activities.
5.1. Experimental Setup
Environment: All experiments were conducted on a machine equipped with an Intel i5-13400F CPU and an NVIDIA RTX 4060 Ti GPU. Python 3.8 and PyTorch 2.2.2 were used to build and train our model.
Datasets: (1) A custom multivariate variable-length AIS time series dataset. (2) A publicly available AIS dataset comprising three types of fishing activities.
Experimental Details: The classification model was trained with a weighted cross-entropy loss function to mitigate class imbalance, and the Adam optimizer was used to accelerate convergence and improve training efficiency. Hyperparameters were tuned with a grid search over hidden layer dimensions (64 and 128), the number of network layers (1 to 3), dropout rates (0.1 and 0.2), and learning rates (0.0001, 0.0005, and 0.001).
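A minimal sketch of this training and tuning configuration is given below; the model builder, class weights, and data batch are illustrative placeholders rather than the authors' implementation:

```python
import itertools
import torch
import torch.nn as nn

# Hypothetical stand-in for the real classifier (not the authors' code).
def build_model(hidden_dim: int, num_layers: int, dropout: float) -> nn.Module:
    layers, in_dim = [], 7                      # assume 7 AIS-derived features per sample
    for _ in range(num_layers):
        layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout)]
        in_dim = hidden_dim
    layers.append(nn.Linear(in_dim, 4))         # 4 vessel classes
    return nn.Sequential(*layers)

# Weighted cross-entropy mitigates class imbalance; the weights here are
# illustrative inverse class frequencies, not the values used in the paper.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.3, 1.7, 4.1]))

# Grid search over the hyperparameter ranges reported above.
for hidden, n_layers, dropout, lr in itertools.product(
        [64, 128], [1, 2, 3], [0.1, 0.2], [1e-4, 5e-4, 1e-3]):
    model = build_model(hidden, n_layers, dropout)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    x, y = torch.randn(32, 7), torch.randint(0, 4, (32,))   # dummy batch
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # ...train to convergence, evaluate on a validation split, keep the best configuration...
```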
Evaluation Metrics: The classification accuracy, recall, precision, and F1 score were used as evaluation metrics. Additionally, in the comparative experiments with the publicly available dataset and state-of-the-art algorithms, we also included time per sample as an evaluation metric, i.e., the time taken to evaluate a single test sample, measured in seconds. Furthermore, it is essential to understand the confusion matrix before introducing the other metrics, as shown in
Table 2:
TP (True Positive): Instances that are correctly predicted as positive. In other words, the actual value is positive, and the prediction is also positive.
TN (True Negative): Instances that are correctly predicted as negative. This refers to situations where the actual value is negative, and the prediction is also negative.
FP (False Positive): Instances that are incorrectly predicted as positive. This occurs when the actual value is negative, but it is mistakenly predicted as positive.
FN (False Negative): Instances that are incorrectly predicted as negative. This occurs when the actual value is positive, but it is mistakenly predicted as negative.
Based on these definitions, we can derive the definitions of accuracy, precision, recall, and F1 score.
Accuracy represents the proportion of correctly classified samples among the total number of samples, and it is expressed as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision represents the proportion of samples predicted as positive that are actually positive, and it is expressed as follows:
Precision = TP / (TP + FP)
Recall represents the proportion of actual positive samples that are correctly identified as positive, and it is expressed as follows:
Recall = TP / (TP + FN)
The F1 score is the harmonic mean of precision and recall, and it is expressed as follows:
F1 = (2 × Precision × Recall) / (Precision + Recall)
Precision reflects the model’s ability to avoid misclassifying negative samples as positive; a higher precision indicates fewer false positives. Recall reflects the model’s ability to identify positive samples; a higher recall indicates that more of the actual positives are recognized. The F1 score combines both metrics, and a higher F1 score suggests a more robust model.
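In practice, these metrics can be computed directly from the predicted and true labels; the sketch below uses scikit-learn with synthetic labels for a four-class problem and assumes macro averaging, which is not specified in the paper:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Synthetic ground-truth labels and predictions for a 4-class problem.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, rng.integers(0, 4, size=1000))

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1       :", f1_score(y_true, y_pred, average="macro"))
```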
The ROC curve is a graphical tool for representing the performance of a classification model. By plotting the True Positive Rate (TPR) on the Y axis against the False Positive Rate (FPR) on the X axis, it illustrates the classifier’s performance across different threshold settings.
True Positive Rate (TPR): Also known as recall, the TPR measures the classifier’s ability to correctly identify positive instances. It can be understood as the detection rate among all actual positive instances, where a higher TPR indicates better performance. Its calculation formula is the same as that of the recall metric:
TPR = TP / (TP + FN)
False Positive Rate (FPR): The FPR represents the proportion of negative instances that the model incorrectly classifies as positive. It can be understood as the rate of false positives among all actual negative instances (also known as the false alarm rate), where a lower FPR indicates better performance. Its calculation formula is as follows:
FPR = FP / (FP + TN)
AUC (Area Under the Curve): The AUC is the area under the ROC curve and serves as a metric to evaluate the classifier’s performance. The higher the AUC value, the better the classifier performs; conversely, a lower AUC indicates poorer performance.
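For a multi-class problem, the per-class and micro-average ROC curves and AUC values (as used in the figures later in this section) can be computed with scikit-learn; the sketch below is generic and uses synthetic labels and scores rather than the authors' evaluation code:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# Synthetic scores for a 3-class problem (e.g., the three fishing activities).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=500)
scores = rng.random((500, 3))
scores[np.arange(500), y_true] += 1.0          # make the scores weakly informative
scores /= scores.sum(axis=1, keepdims=True)

y_bin = label_binarize(y_true, classes=[0, 1, 2])

# Per-class ROC curves and AUCs (one-vs-rest).
for k in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, k], scores[:, k])
    print(f"class {k}: AUC = {auc(fpr, tpr):.3f}")

# Micro-average ROC: pool all (label, score) pairs across classes.
fpr_micro, tpr_micro, _ = roc_curve(y_bin.ravel(), scores.ravel())
print("micro-average AUC =", round(auc(fpr_micro, tpr_micro), 3))
```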
5.3. Experiments on a Self-Constructed Dataset
We compared the performance of our proposed TrackAISNet and TCN-GA against various temporal algorithms, including LSTM, BiLSTM, GRU, and BiLSTM-CNN, as well as several lightweight convolutional neural networks such as MobileNetV2 [
25], ShuffleNetV2 [
26], and EfficientNet [
27] under different modalities. For the comparison algorithms, hyperparameter settings were optimized using a grid search method. The performance comparison of different algorithms is shown in
Table 4.
The data from the aforementioned table indicate that the TCN-GA model exhibited the best performance among the time modality algorithms, with an accuracy of 80.35%, an F1 score of 80.26%, a precision of 80.18%, and a recall of 80.35%. This was followed closely by the BiLSTM-CNN model. Notably, although the TCN performed relatively weakly when used alone, the improved TCN-GA significantly enhanced model performance. This improvement may be related to the dataset being composed of variable-length time series data. When the time series data are padded, using the last time step for classification predictions may lead to lower identification accuracy. The incorporation of an attention mechanism helps focus on key features at critical moments, thereby improving recognition accuracy.
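The intuition behind this improvement can be illustrated with a masked attention-pooling readout over padded sequences; this is only a generic sketch of the idea, not the TCN-GA implementation, and the module name and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class MaskedAttentionPool(nn.Module):
    """Attention-weighted pooling that ignores padded time steps."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # h: (batch, time, dim); lengths: true sequence length of each sample.
        t = torch.arange(h.size(1), device=h.device)
        mask = t.unsqueeze(0) < lengths.unsqueeze(1)          # (batch, time)
        scores = self.score(h).squeeze(-1)                    # (batch, time)
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)                # zero weight on padding
        return (weights.unsqueeze(-1) * h).sum(dim=1)         # (batch, dim)

h = torch.randn(4, 50, 64)                 # padded temporal features (toy values)
lengths = torch.tensor([50, 32, 17, 45])   # true lengths before padding
pooled = MaskedAttentionPool(64)(h, lengths)

# Naive alternative: read only the final time step, which is padding for short tracks.
last_step = h[:, -1, :]
```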
In the image modality comparison, the EfficientNet-B0 network outperformed both MobileNetV2 and ShuffleNetV2 in terms of accuracy, F1 score, and precision, although the differences among the three were relatively small.
In the area of multimodality processing, TrackAISNet demonstrated excellent performance, achieving the highest accuracy of 81.38% compared to the various single-modality algorithms, along with correspondingly high F1 score, precision, and recall values. This underscores the robustness of TrackAISNet in handling noise, uncertainty, and AIS data loss. In practical applications, AIS data are often subject to noise or partial loss due to equipment malfunctions, transmission interference, or environmental factors. TrackAISNet incorporates a multimodal fusion mechanism within its architecture, effectively integrating information from both the temporal and image modalities. This multimodal approach enables the model to leverage complementary information from one modality when the other is affected by noise or anomalies, significantly mitigating the impact of single-modality noise on overall performance. This compensation mechanism greatly enhances the model’s adaptability and robustness in scenarios involving AIS data loss.
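For illustration, the general pattern of feature-level fusion described here can be sketched as follows; the branch encoders, feature dimensions, and class count are placeholders rather than the actual TrackAISNet architecture:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy two-branch fusion: temporal AIS features + trajectory-image features."""
    def __init__(self, ts_dim=64, img_dim=128, num_classes=4):
        super().__init__()
        self.ts_encoder = nn.GRU(7, ts_dim, batch_first=True)      # stand-in temporal branch
        self.img_encoder = nn.Sequential(                           # stand-in image branch
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, img_dim))
        self.head = nn.Linear(ts_dim + img_dim, num_classes)

    def forward(self, ts, img):
        _, h_n = self.ts_encoder(ts)             # h_n: (1, batch, ts_dim)
        fused = torch.cat([h_n[-1], self.img_encoder(img)], dim=1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(2, 120, 7), torch.randn(2, 3, 64, 64))  # shape (2, 4)
```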
Selecting the right algorithm for a specific task is critical. For tasks focused on time series analysis, TCN-GA stands out as the most suitable choice: with its balanced parameter size and low computational complexity, it is well suited for deployment in scenarios with high real-time performance requirements. When the task involves image data processing, EfficientNet is a strong candidate, offering excellent performance despite its relatively higher parameter count and computational demands, provided sufficient computational resources are available. For complex applications that require integrating multisource information to optimize outcomes, the multimodal approach proposed in this study, TrackAISNet, proves to be a robust and comprehensive solution.
As the primary model proposed in this study, TrackAISNet has a relatively large parameter size (16.64 M) and higher computational complexity. However, the rapid advancement of embedded systems and edge computing technologies ensures that modern high-performance hardware can fully support this scale. Moreover, optimization techniques such as quantization, pruning, and knowledge distillation can significantly reduce computational overhead, making the model more efficient for deployment in real-world environments.
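As one example of such optimization, the sketch below applies PyTorch post-training dynamic quantization to a placeholder model; no claim is made here about the actual speedup or accuracy impact for TrackAISNet:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a trained network with large linear layers.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 4)).eval()

# Post-training dynamic quantization: nn.Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(model(x).shape, quantized(x).shape)   # same interface, smaller weights
```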
In maritime traffic management systems, real-time performance is critical, as it directly affects operational efficiency and safety. By combining time series and image data, TrackAISNet offers a holistic analytical capability, making it ideal for key decision support scenarios, such as intelligent route planning in high-risk areas and vessel collision risk prediction. Despite its relatively high computational demands, TrackAISNet can be efficiently supported by modern high-performance servers or embedded edge devices commonly used in maritime systems. This ensures the model can deliver accurate, reliable, and timely decision support, enhancing the safety and efficiency of maritime operations.
5.5. Experiments on a Public Dataset
To further validate the effectiveness of the TrackAISNet model presented in this paper, we conducted comparative experiments on a public AIS dataset that includes three types of fishing activities. The dataset comprises trajectory data from vessels in the East China Sea (anonymized) and represents authentic historical maritime tracking information, encompassing multiple dimensions of data. Each trajectory includes details such as vessel ID, latitude, longitude, speed, heading, timestamp, and operational mode (trawl, surround, and gillnet). This dataset can be found at
https://aistudio.baidu.com/datasetdetail/146541 (accessed on 19 January 2025). It contains 14,656 training samples and 3664 testing samples, with a balanced distribution of positive and negative samples. Most of the experimental results for the algorithms compared in the following table are derived from the latest literature [
28].
The results in
Table 6 indicate that TrackAISNet demonstrated significant performance improvements in comparative experiments on public datasets. After only three training epochs, the model achieved an accuracy of 82.76%, with recall and precision reaching 82.76% and 83.01%, respectively. At this stage, its performance was comparable to that of the latest multimodal algorithm, MFGTN, which reported an accuracy of 82.61% and a recall of 83.23%. As training progressed to 10 epochs, TrackAISNet underwent further optimization, reaching an accuracy of 89.33% and an F1 score of 89.32%, showcasing superior performance. Moreover, TrackAISNet delivered these results with an impressive processing speed of only 0.005 s per sample, underscoring its potential in ship trajectory classification tasks. While other models, such as SVP-T and MFGTN, also demonstrated strong results, TrackAISNet’s performance over extended training suggests it may serve as an effective solution for this task.
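For reference, time per sample is typically measured by averaging forward-pass wall-clock time over the test set; a minimal sketch with a placeholder model and dummy inputs might look like this:

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 3)).eval()  # placeholder
samples = [torch.randn(1, 7) for _ in range(3664)]   # dummy stand-in for the 3664 test samples

with torch.no_grad():
    start = time.perf_counter()
    for x in samples:
        _ = model(x)
    elapsed = time.perf_counter() - start

print(f"time per sample: {elapsed / len(samples):.6f} s")
```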
Figure 9 and
Figure 10 present the ROC curves of our algorithm across different network models on the public dataset containing three types of fishing activities. The X axis represents the false positive rate, and the Y axis represents the true positive rate. The dotted line indicates the micro-average ROC curve, and the dashed line shows the macro-average ROC curve. The solid blue, green, and red lines represent the ROC curves for gillnetting, purse seining, and trawling, respectively.