1. Introduction
A tropical cyclone is a low-pressure vortex originating over the tropical or subtropical ocean with different names and classifications in different seas. Once a “tropical storm” or “tropical cyclone” reaches the maximum sustained surface winds of at least 33 m/s, it is typically called a “typhoon” in the northwest Pacific Ocean [
1]. According to the National Standard of Tropical Cyclone Classification (GB/T 19201-2006) published by the China Meteorological Administration (CMA), tropical cyclones are classified into the following six grades: Super Typhoon (Super TY, 51.0 m/s or above), Severe Typhoon (STY, 41.5–50.9 m/s), Typhoon (TY, 32.7–41.4 m/s), Severe Tropical Storm (STS, 24.5–32.6 m/s), Tropical Storm (TS, 17.2–24.4 m/s), and Tropical Depression (TD, 10.8–17.1 m/s) [
2]. Typhoons occur over the tropical ocean surface and rely on the latent heat released by the condensation of water vapor as the main energy source for their maintenance and development. The typical typhoon is a warm-core vortex in which the low-level air flows converge toward the center in a cyclonic rotation, and the wind speed increases rapidly toward the eye wall around the center [
3]. It is characterized by a strong low-pressure center, gales, heavy precipitation and a spiral cloud system. The average life span is about a week, although it can be as short as two or three days or as long as about a month. Many studies address the different stages of the typhoon’s life cycle (formation, development, maturation and extinction), including research on tropical cyclogenesis forecasting before formation [
4], locating the center of tropical cyclones [
5], tropical cyclones’ intensity estimates [
6] and tropical cyclones’ trajectory forecast [
7] after formation. This paper focuses on tropical cyclones’ trajectory prediction, which belongs to the middle stages of a typhoon’s life span.
The typhoon is one of the major regional climate disasters along the coastal area of China. In the face of the uncertainty and periodicity of typhoon disasters, it is crucial for the development of meteorology and other fields to construct an adaptive elastic security concept while realizing an effective and accurate typhoon forecast. Because typhoon track prediction involves many complex factors, feature extraction faces severe challenges. Traditional prediction methods depend on prior meteorological knowledge, are time-consuming and laborious, and offer limited prediction accuracy. However, with the continuous breakthroughs of deep-learning technology in various fields, more and more researchers have in recent years begun to explore deep learning for typhoon track prediction, achieving remarkable results. In 2018, Gao et al. built a Long Short-Term Memory (LSTM) model for typhoon trajectory prediction based on the best track dataset of tropical cyclones collected during 1949–2011, provided by the China Meteorological Administration’s Shanghai Typhoon Institute (CMA-STI) [
8]. However, the sole input used for its prediction was the observed typhoon tracks, ignoring other meteorological features. In 2019, Rüttgers et al. adopted a Generative Adversarial Network (GAN), which was trained and tested on different typhoons that affected the Korean Peninsula during 1995–2016 [
9]. In the same year, Pingping Dong and Jie Lian et al. proposed a data-driven deep-learning model based on Granger causality and Gated Recurrent Unit (GRU) [
10,
11], comprising data preprocessing, feature selection, and GRU with a customized batch processing. The model was evaluated on the Western North Pacific (WNP) Ocean Best Track Data (2194 cyclones, 61,089 records, 1945–2017) provided by the Joint Typhoon Warning Center (JTWC), which were divided into training sets, validation sets, and test sets in an 8:1:1 ratio [
10]. In recent years, the continuous development of deep-learning-based models has shown significant improvements in areas such as image processing, natural language processing, and object detection [
12,
13,
14]. The use of CNNs and auto-encoders (AEs), among other models, has shown significant advantages in extracting hidden features from complex multi-variable datasets while increasing the generalization capability. In 2020, Jie Lian et al. developed two updated models by adding a CNN or an AE layer, respectively, each of which consists of a multi-dimensional feature selection layer and a GRU layer [
15,
16]. The model gradually evolved into a multi-layer model, including the coding and decoding layers, with the CNN or AE serving as the encoder, and the GRU as the decoder.
Recently, there has been much research on the application of LSTM-related models to typhoon prediction. In 2022, Tong et al. utilized the convolutional LSTM (ConvLSTM) model to predict both the track and intensity of tropical cyclones [
17]. The model uses four kinds of tropical cyclone best track records for a 6 h prediction and achieves an average R-squared score of 0.998 for latitude and longitude. However, although many public data sources are used, no regional data source is included. In 2023, Jiadong Lu et al. designed a TimeForce CNN-LSTM hybrid model, in which the data of one day (24 h) were used as the training input, and all the collected typhoon track data were divided into training and validation sets in a ratio of 3:1 [
18]. In 2024, Gan et al. studied typhoon prediction via four mainstream DL techniques: Long Short-Term Memory (LSTM), Convolutional LSTM (ConvLSTM), Temporal Convolutional Network (TCN), and the Transformer [
19]. The regional data source was not included, as in previous research.
In the field of typhoon prediction, on one hand, the application of a deep-learning fusion algorithm is increasingly put into practice, which significantly improves the accuracy and efficiency of prediction. On the other hand, time-series deep-learning algorithms continue to evolve, providing more support for the construction of more refined predictive models. In 2017, Vaswani et al. introduced the Transformer model, which uses the classic encoder–decoder framework, where the encoder is responsible for understanding the input sequence and the decoder generates the target sequence based on the output of the encoder [
20]. In addition, the Transformer’s self-attention mechanism enables the model to attend to and integrate information from the other elements of a sequence when processing each element. In doing so, it captures and models long-distance dependencies, which is especially important when processing long text or complex sequences. To cover a longer history and make better use of the look-back window by retaining local semantic information in the embedding, the channel-independent Patch Time-Series Transformer (PatchTST) was proposed in 2023 by Yuqi Nie et al. [
21]. In general, while deep-learning algorithms have been applied to typhoon tracks and other studies, such as intensity [
7,
22], time-series structures continue to evolve, such as bidirectional GRUs and encoder–decoder architectures including multi-layer perceptrons (MLPs) [
22,
23] and so on. Previous studies have encompassed investigations into ocean–atmosphere coupled models, including case studies of Typhoon Hato [
24] and Mangkhut [
25], in addition to the fusion of artificial intelligence with these types of models [
26].
However, most of the studies are based on public datasets, and the data collected by local meteorological organizations are rarely included. Additionally, most test sets consist of a number of selected typhoons spread over a long period of different years, rather than the most recent full years. Therefore, this paper addresses these two aspects and studies the performance of fusion algorithms that combine a CNN with three different time-series models: Transformer, LSTM and PatchTST.
2. Design of the Typhoon Trajectory Prediction System
The typhoon trajectory prediction system pipeline is shown in
Figure 1. There are four main parts: the Trajectory Prediction layer (TP), the Model Construction layer (MC), Data Process (DP), and Data Storage (DS).
The DS obtains both system settings data and typhoon or meteorological datasets from different data providers, such as CMA, ECMWF, and ZMB. The providers differ in the data types and data formats they deliver. To simplify the processing of typhoon-related business data from these heterogeneous sources, the system provides two basic data readers, which parse text files in CSV format and binary files in NCF, respectively. These files are organized in advance and stored in the local file system according to a uniform naming convention and directory hierarchy. System settings data are required to start the system. The system provides default parameter values so that users can run and test it quickly, and users can also specify other values at runtime. The DP has two responsibilities. First, it reads raw business data from the file system, converts them into data objects that meet the format requirements of the algorithm, and prepares the input and expected output data for model training and testing. Second, it parses system parameters such as the GPU device ids, the configuration file path, and the data source file path. The MC module builds the neural network model according to the customized algorithm names and the rules for the hidden layers, e.g., the number of input channels, the name of each layer, and key parameters. After training and validation, the trained parameters of every model layer are stored. The system has a parameter named “no training” that skips training and runs testing only: once a model has been built and saved, setting this parameter to true runs trajectory prediction without the model training stage. The TP executes the trajectory prediction using the trained model.
2.1. Data Sources and Data Storage
The main typhoon track datasets used in this study come from the CMA, which has provided the data as one text file per year since 1949, named in the format “CHYYYYBST.txt” [
27,
28]. The CMA Tropical Cyclone Best Track dataset covers the Northwest Pacific Ocean, including the South China Sea, north of the equator and west of 180° E. The track location is recorded every 6 h; however, since 2017, the recording frequency for typhoons landing in China has increased to every 3 h in the 24 h before landfall, and the 3 h interval continues throughout the typhoon’s activity after landfall. To unify the sampling frequency, this paper keeps only the records taken every 6 h and ignores the additional records taken every 3 h. The record of each typhoon begins with a header line, followed by multiple track lines in chronological order, which record the typhoon’s moving path in detail. Taking the first typhoon “Megi” in 2022 as an example,
Figure 2 shows the format of the header and track lines. The header always begins with the classification flag “66666”, which indicates the best path data. From track data lines, we can extract seven feature values for the location passed by each typhoon.
For example, from the first row of the track data in
Figure 2, we can read that at midnight on 9 April 2022 (time), an intensity level of 1 is recorded, corresponding to a Tropical Depression (TD) passing through latitude 10.8° N and longitude 127.1° E, with a lowest central pressure of 1002 hPa and a 2 min mean near-center maximum wind speed (MSW) of 13 m/s; the last feature, i.e., the 2 min mean wind speed (m/s), has no valid value. The study involved all the CMA best track data between 2000 and 2022, which are stored in a CSV text file in the configuration folder, named similarly to “cma_bst.csv”.
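As a minimal sketch of how such a track line might be read, the following assumes whitespace-separated fields in the order described above and latitude and longitude stored in units of 0.1°. The sample line is reconstructed from the values quoted in the text, and the function and field names are illustrative rather than taken from the system.

```python
from datetime import datetime

def is_header(line):
    """Header lines start with the classification flag 66666."""
    return line.split()[0] == "66666"

def parse_track_line(line):
    """Parse one best-track line into a dict (field layout is an assumption)."""
    parts = line.split()
    return {
        "time": datetime.strptime(parts[0], "%Y%m%d%H"),
        "intensity": int(parts[1]),
        "lat": int(parts[2]) / 10.0,    # assumed to be stored in 0.1 degrees N
        "lon": int(parts[3]) / 10.0,    # assumed to be stored in 0.1 degrees E
        "pressure_hpa": int(parts[4]),
        "msw_ms": int(parts[5]),
        # The seventh field may be absent when it has no valid value.
        "mean_wind_ms": int(parts[6]) if len(parts) > 6 else None,
    }

def keep_six_hourly(records):
    """Drop the extra 3-hourly records so that every track is 6-hourly."""
    return [r for r in records if r["time"].hour % 6 == 0]

rec = parse_track_line("2022040900 1 108 1271 1002 13")
print(rec["lat"], rec["lon"], rec["pressure_hpa"])   # 10.8 127.1 1002
```

Filtering on the hour is one simple way to unify the sampling frequency; it relies on the 6-hourly records falling at 00, 06, 12 and 18 h.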
According to the CMA best typhoon track, some necessary settings should be prepared in advance, including the latitude and longitude range of the grid, pressure levels, and meteorological features. Then, more meteorological information can be obtained from ECMWF based on the time when the typhoon passed the track location and other settings by using the Climate Data Store (CDS) API [
29]. Once configured, grid points can be expanded in the four directions, i.e., east, west, south and north, centered on the latitude and longitude of the CMA best track point. The preset feature data can be downloaded for each grid point according to the time at which the typhoon passed through the track point. The download task is time-consuming: the number of downloads is roughly equal to the number of CMA track records, and the duration also depends on the network speed of the machine, the national or regional network access restrictions where the client is located, and the actual response capacity of the server that provides the service at the time of downloading. Compared with direct web access, a background program that uses the Climate Data Store (CDS) API without human–computer interaction is preferable. In this study, we set six levels for each grid point, and six meteorological features for each level. The largest grid is 97 × 97, i.e., 48 points are extended in each of the four directions. Adjacent points in each direction differ in latitude or longitude by 0.25°, as shown in
Figure 3. For each CMA best track point, a file such as “2022040906_108_1267.nc” in the Network Common Data Form (NetCDF, abbreviated as NCF in this paper) holds more meteorological information, as a 5D array of size up to (1, 6, 6, 97, 97). It is difficult to illustrate all the detailed information from ECMWF since the data size is too large. Here, only the time, the six level values, the size, and the feature names are listed in
Figure 3, without detailed values of all the features.
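The grid geometry described above can be sketched as follows. This is only an illustration under the stated settings (48 points in each direction, 0.25° spacing); the function names and the north, west, south, east ordering of the bounding box are assumptions made for this example.

```python
def grid_axes(center_lat, center_lon, half_points=48, step=0.25):
    """Return the latitude and longitude axes of a square grid around a track point."""
    offsets = range(-half_points, half_points + 1)
    lats = [center_lat + k * step for k in offsets]
    lons = [center_lon + k * step for k in offsets]
    return lats, lons

def bounding_box(center_lat, center_lon, half_points=48, step=0.25):
    """Grid limits as (north, west, south, east); the ordering is a convention chosen here."""
    reach = half_points * step
    return (center_lat + reach, center_lon - reach,
            center_lat - reach, center_lon + reach)

lats, lons = grid_axes(10.8, 126.7)
print(len(lats), len(lons))   # 97 97
box = bounding_box(10.8, 126.7)
```

With 48 points on each side plus the center point, each axis has 2 × 48 + 1 = 97 values, and the grid reaches 12° from the track point in every direction.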
The third data source is the Zhuhai Meteorological Bureau (ZMB), which offers more meteorological information around Zhuhai city. As with the ERA5 data from ECMWF, we only focus on the data at the specific times when the typhoon best track locations were recorded. All the associated data were exported from the database into csv text files and then put into the system. Considering the confidentiality of local data, a pseudo example is given here, as shown in
Figure 3.
A total of six meteorological features are available, and we gathered the data of each meteorological feature into a separate csv file, so we have a total of six csv files for storing local private meteorological data. In order to simplify the system reading of data, we made specifications on the rows and columns of each csv file. The first attribute is the time point of system acquisition, which is the primary key consistent with the time recorded in CMA, and the rows are sorted in chronological order. Except for the first attribute, the other attributes are named after the site number and arranged in natural order, which ensures that on reading different files, the cell with the same row number and column number corresponds to the different feature values collected by the same remote data collection station at the same time, facilitating the data loading and merging in detail. It also facilitates the selection or non-selection of feature value data on a large scale by setting whether the corresponding data file is loaded or not.
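The row and column convention described above makes merging straightforward. The sketch below assumes each feature file has a time column followed by one column per station; the station ids, values and function names are made up for illustration, and only two of the six features are shown.

```python
import csv
import io

def read_feature_csv(text):
    """Read one feature table: the first column is the time key, the rest are stations."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    stations = header[1:]
    rows = {row[0]: [float(v) for v in row[1:]] for row in reader}
    return stations, rows

def merge_features(tables):
    """Combine per-feature tables into {time: {station: [feature values...]}}."""
    merged = {}
    for text in tables:
        stations, rows = read_feature_csv(text)
        for time_key, values in rows.items():
            per_station = merged.setdefault(time_key, {})
            for station, value in zip(stations, values):
                per_station.setdefault(station, []).append(value)
    return merged

# Pseudo data (not real ZMB observations).
pressure = "time,S001,S002\n2022040900,1002.1,1001.8\n2022040906,1001.5,1001.2\n"
wind = "time,S001,S002\n2022040900,3.2,4.1\n2022040906,3.9,4.4\n"
merged = merge_features([pressure, wind])
print(merged["2022040906"]["S002"])   # [1001.2, 4.4]
```

Leaving a file out of the list passed to `merge_features` drops that feature for all stations at once, which mirrors the file-level feature selection mentioned in the text.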
In this study, all meteorological information is saved in files. ERA5 data are stored as binary files and account for the largest share of the data, about 12.0 GB of storage space. There are 19,045 NCF files in total, and each file is stored in a two-layer directory structure, with the parent directory named after the year and the subdirectory named after the typhoon number. For example, the relative file path “2022/20220002/2022040906_108_1267.nc” contains the relevant meteorological data around the track point of the 2022 typhoon number “20220002” passing through latitude 10.8° N and longitude 126.7° E at 06:00 on 9 April 2022. The data of ZMB and CMA are stored as csv files in the configuration folder.
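A small helper can reproduce this naming convention. The sketch below infers the layout from the single example path above; taking the year from the first four digits of the typhoon number is an assumption, as is the function name.

```python
from pathlib import PurePosixPath

def era5_relative_path(typhoon_number, time_key, lat, lon):
    """Build '<year>/<typhoon number>/<time>_<lat*10>_<lon*10>.nc' (layout inferred)."""
    year = typhoon_number[:4]   # assumption: the number starts with the year
    name = f"{time_key}_{round(lat * 10)}_{round(lon * 10)}.nc"
    return PurePosixPath(year) / typhoon_number / name

path = era5_relative_path("20220002", "2022040906", 10.8, 126.7)
print(path)   # 2022/20220002/2022040906_108_1267.nc
```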
In addition to the meteorological data, the system startup configuration data need to be set before the system runs, specifying which GPU to use, whether to append the ZMB data, the batch size, initial learning rate, loss function type, number of folds for multi-fold cross validation and so on.
Table 1 lists some commonly used parameters for the prediction system.
2.2. Data Loading
System-level argument parsing can be implemented easily, for example by using a command-line parsing library to extract the arguments. Handling three different data sources, in contrast, is more complex, as one must consider how to organize the structure of folders and local data, how to name files, how to read different data sources, and how to design interfaces accordingly. In this section, we mainly discuss how to deal with business data.
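For the argument-parsing side, a sketch with Python's standard `argparse` module could look as follows. All flag names and default values here are hypothetical and only mirror the kinds of settings mentioned in this paper; they are not the system's actual options.

```python
import argparse

def build_parser():
    """Command-line options with defaults; every flag name is hypothetical."""
    p = argparse.ArgumentParser(description="Typhoon trajectory prediction (sketch)")
    p.add_argument("--gpu-ids", default="0", help="comma-separated GPU device ids")
    p.add_argument("--config-dir", default="config", help="folder with the csv files")
    p.add_argument("--data-dir", default="data", help="root folder of the NCF files")
    p.add_argument("--model", default="LSTM",
                   choices=["LSTM", "Transformer", "PatchTST"])
    p.add_argument("--normalized-size", type=int, default=5,
                   help="number of input track points per segment")
    p.add_argument("--predictable-size", type=int, default=4,
                   help="number of predicted track points per segment")
    p.add_argument("--use-local-data", action="store_true", help="append ZMB data")
    p.add_argument("--no-training", action="store_true",
                   help="load a saved model and run testing only")
    return p

args = build_parser().parse_args(["--model", "PatchTST", "--no-training"])
print(args.model, args.normalized_size, args.no_training)   # PatchTST 5 True
```

Defaults let the program start without any arguments, while any value can still be overridden at runtime.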
There are two main steps for reading business data and
Figure 4 shows the whole process.
In the first step, the system reads the CMA best trajectory data and generates the TyDicUnion instance. All the best track points are sorted in time order and grouped by typhoon number. Two additional attributes are added to each best track point: its previous and next track points. To improve the efficiency of reading the CMA best track data and building these two attributes, the results are cached. On the first load, the data are read from the file system and cached after the two attributes have been added. Later on, the system accesses these data directly from the cache. In the second step, all the best track points are examined group by group, by typhoon number, the data from the two other data sources are read accordingly, and the integrated business data object is built, which becomes available to the subsequent model building and prediction layers.
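The first step can be sketched roughly as follows. This is not the TyDicUnion implementation; it only illustrates sorting, grouping, linking neighbouring points and caching, using an in-memory sample in place of the file read. The attribute names `prev` and `post` are chosen for illustration.

```python
from functools import lru_cache
from itertools import groupby

RAW_POINTS = (  # (typhoon number, time key, lat, lon); made-up sample rows
    ("20220002", "2022040906", 10.8, 126.7),
    ("20220002", "2022040900", 10.8, 127.1),
    ("20220001", "2022040800", 12.0, 140.0),
)

@lru_cache(maxsize=1)
def load_linked_points():
    """Sort, group by typhoon number, and link each point to its neighbours."""
    ordered = sorted(RAW_POINTS, key=lambda p: (p[0], p[1]))
    tracks = {}
    for number, group in groupby(ordered, key=lambda p: p[0]):
        points = [{"time": t, "lat": la, "lon": lo, "prev": None, "post": None}
                  for _, t, la, lo in group]
        for earlier, later in zip(points, points[1:]):
            earlier["post"] = later
            later["prev"] = earlier
        tracks[number] = points
    return tracks

tracks = load_linked_points()
first = tracks["20220002"][0]
print(first["time"], first["post"]["time"])   # 2022040900 2022040906
print(load_linked_points() is tracks)         # True: the second call hits the cache
```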
2.3. Model Construction Layer
After the business data object and the expected label data are set, the former is divided into training and validation data according to the division ratio. Both training and validation data need the expected label data, which is mandatory for supervised learning. The main task of MC is to build a deep-learning model. The model layers are divided into the two following parts: (1) the encoding layer, which processes the meteorological data without the best track data from CMA, and (2) the decoding layer, which is trained on the best track data from CMA and the intermediate results generated by the encoding layer, calculates and checks validation losses, and finds and saves the optimized model for the subsequent prediction task.
Only when the number of optimal trajectory points reaches a specific value can it be used as effective data for training, validation and testing, and we call this a segment. During the stage of model building and testing, a segment includes both the input and expected output track points. However, in actual application scenarios, the concept of the segment will be weakened to the set of input track points of the model, due to the lack of real values in the future.
Figure 5 shows an example of the segment with five input trajectory points and four output trajectory points.
Each segment has one unique key time: the latest time among its input trajectory points, which immediately precedes the time of the first prediction. For example, in
Figure 6, the key of the first segment is “2020073118”; at this time the typhoon “Sinlaku”, numbered “20200003”, was passing through the position of 17.8° N 111.3° E, and at the next time, “2020073121”, it would pass through the position of 18.4° N 110.9° E. These coordinate points are filled in the cells corresponding to the Current0 and Post1 columns in the first row.
In order to compare the influence of the segment size on the typhoon track prediction results, we make the segment dynamically adjustable through two related parameters, “NORMALIZED_SIZE” and “PREDICTABLE_SIZE”. Within the segment, the former is the number of input trajectory points, and the latter is the number of predicted trajectory points. The sum of these two parameters is the length of a segment and therefore the minimum number of best track points that a typhoon must have; this value is also used to decide whether the data of a typhoon are input into the system. For example, if NORMALIZED_SIZE = 5 and PREDICTABLE_SIZE = 4, then only the typhoons that contain nine or more best track points are considered.
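A sliding-window sketch of segment construction is given below. The window advancing by one track point at a time is an assumption made for this example, as is the function name; the two parameters correspond to NORMALIZED_SIZE and PREDICTABLE_SIZE.

```python
def make_segments(track, normalized_size=5, predictable_size=4):
    """Slide a window over one typhoon track and split it into input and target."""
    length = normalized_size + predictable_size
    segments = []
    for start in range(len(track) - length + 1):
        window = track[start:start + length]
        segments.append((window[:normalized_size], window[normalized_size:]))
    return segments

track = [f"p{i}" for i in range(10)]      # a typhoon with 10 track points
segs = make_segments(track)
print(len(segs))                           # 2
print(segs[0][0][-1], segs[0][1][0])       # p4 p5
print(make_segments(track[:8]))            # []: fewer than nine points
```

A typhoon with fewer than NORMALIZED_SIZE + PREDICTABLE_SIZE points yields no segment and is therefore excluded automatically.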
To obtain better training data, this study forms typhoon data segments, shuffles them, and then divides them into training and validation data. To simulate the prediction of future typhoon tracks, we choose a year as the dividing line between training and testing data. The data before the chosen year are used as the training and validation sets, and the data from that year onward are used as the test set. For example, when the selected year is 2021, the data from 2000 to 2020 are used for training and validation, while the data from 2021 to 2022 are used for testing.
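The split can be sketched as follows. The validation ratio, the fixed random seed and the function name are assumptions for this example; a K-fold scheme could be applied to the shuffled pre-test segments in the same way.

```python
import random

def split_by_year(segments, test_from_year, val_ratio=0.2, seed=0):
    """segments: list of (key_time 'YYYYMMDDHH', payload) pairs."""
    older = [s for s in segments if int(s[0][:4]) < test_from_year]
    test = [s for s in segments if int(s[0][:4]) >= test_from_year]
    random.Random(seed).shuffle(older)       # shuffle only the pre-test segments
    n_val = int(len(older) * val_ratio)
    return older[n_val:], older[:n_val], test

# One dummy segment per year from 2000 to 2022.
segments = [(f"{year}080100", None) for year in range(2000, 2023)]
train, val, test = split_by_year(segments, test_from_year=2021)
print(len(train), len(val), len(test))       # 17 4 2
```

Keeping the test segments out of the shuffle preserves the "future years" character of the test set.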
There are three main classes for datasets, i.e., DataProcessor, TyDataset, and TyDicUnion, as shown in
Figure 6.
The DataProcessor is the only class that can be used externally (i.e., the interface class directly accessed by the programmer), and its instance “dp” is generated by the system. The “dp” instance generates a TyDicUnion instance and, based on it, multiple TyDataset instances. For instance, one TyDataset instance is for training and another is for testing; they are obtained from the public function get_data of “dp” by passing different values for the parameter data_flag, 0 for training or 1 for testing. The model classes consist of the encoding layer, such as CNNModel and PositionalEncodingModel, and the second core layer after encoding, such as TransformerModel. For different models, the Transformer class can be replaced with other models, such as LSTM, PatchTST and so on. Model creation is in effect a search for the best model through training: the loss on the validation set is evaluated after each epoch, and the model with the minimum loss value is kept. In this study, mean square error (MSE) is used as the loss, since MSE has a clear mathematical expression, is simple to compute and optimize, and is sensitive to outliers in the data. In addition to MSE, we also tried the distance between the real location and the predicted location.
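The two criteria can be illustrated with plain Python. The paper does not state which distance formula is used, so the haversine great-circle distance with a mean Earth radius of 6371 km is an assumption here, as are the function names.

```python
import math

def mse(pred, true):
    """Mean squared error over paired (lat, lon) points."""
    total = sum((p - t) ** 2
                for pp, tt in zip(pred, true)
                for p, t in zip(pp, tt))
    return total / (2 * len(pred))

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

true_pts = [(18.4, 110.9)]
pred_pts = [(18.0, 111.0)]
loss = mse(pred_pts, true_pts)
dist = haversine_km(*pred_pts[0], *true_pts[0])
```

MSE works directly on degrees and treats latitude and longitude errors alike, whereas the distance expresses the same error in kilometres, which is easier to interpret for track forecasts.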
2.4. Trajectory Prediction Layer
Once the model is trained, it is much easier to make predictions, as the predictions are computed only once on the test set, rather than many times over to find the best model during the model training stage. The overall process of prediction on the test set is shown in
Figure 7.
If the model has been built and saved into the local file system, it can be loaded from the file system during testing; otherwise, we have to build the model by training it in advance. Thus, the trajectory prediction process also includes the Model Construction layer if the model has not been built yet.
3. Data and Algorithm Analysis
In this paper, the following three data sources are used: CMA, ECMWF and ZMB; the first two are public and the last is confidential. In terms of time, we mainly focus on the moments when the best track points of the typhoon were recorded by CMA, whose hours are integer multiples of 6 (0, 6, 12 and 18) on every typhoon day. From the perspective of spatial points, we focus on the surrounding spatial grid centered on the coordinate point of the best trajectory of the typhoon, and the location of the local acquisition stations in Zhuhai. The former changes with the typhoon track and represents a dynamic spatial coordinate point, while the latter does not change with the typhoon track and is a static point. When all data are fed into the system, a prediction of the future latitude and longitude is made. Only 6, 12, 18 and 24 h in the future are considered in this study, so predictions always contain four future points. Only typhoons with a sufficient number of trajectory points can be used, and the required number includes the input points and expected output points, e.g., a minimum of eight points, where the first four points are input and the last four points are output. The limit varies with the input length and the prediction time range, which together form a segment.
Although the best track data of CMA can be traced back to 1949, the earliest record in the ZMB local database is much later (2010). Because we hope to exploit the local data as much as possible while maintaining enough data for model training, only data from 2000 onward were used in this study.
Some basic system settings are listed in
Table 1, which can be set to different values for research purposes. This paper focuses on the combination of different values of the three following parameters: MD, which indicates the type of algorithm; NS, which is the determining factor of the size of the typhoon data segment; and LF, which shows whether to append the local data or not, for a total of 30 different scenarios, as shown in
Table 2.
The system can be started once the data are ready and the system settings are configured. When the system is running, the system settings are parsed, and subsequently, the business data are retrieved. After that, CNN layers are dynamically generated based on customized rules, and the model layers are constructed correspondingly. Then, the model is trained and evaluated before entering the prediction stage. The time-step (window size) dimension is ignored in the CNN layer and is used in the next layer, such as Transformer, LSTM or PatchTST, which handles the time series. We list the whole data flow in
Figure 8. Each segment consists of two parts: M and T. The first part, M, is built based on datasets from both ECMWF and ZMB; the second part, T, is from CMA.
The CNN layer reduces the spatial input to a smaller feature vector, for example from an input of size 36 × 27 × 27 to an output of size 1 × 128 (see
Figure 9). After the final convolution sublayer, the tensor has a size of 256 × 4 × 4. It becomes 1 × 4096 after the flatten sublayer, and the output of the CNN has a size of 1 × 128 after multiple dense sublayers.
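The size reduction can be checked with the standard convolution output-size formula. The stack below (three 3 × 3 convolutions with stride 2 and padding 1) is purely hypothetical and is not the configuration used in this study; it is merely one combination that maps a 27 × 27 input to 4 × 4.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of one convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical stack of (kernel, stride, padding) per convolution sublayer.
layers = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]
size = 27
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)   # 27 -> 14 -> 7 -> 4

flat = 256 * size * size   # 256 output channels, flattened
print(size, flat)          # 4 4096
```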
This study allows the user to define the CNN structure through preset parameters. Different CNN parameter combinations, such as the kernel size and stride, can be configured in advance. Each parameter combination has an activation condition, such as the minimum and maximum size of the input matrix. The rules for generating the CNN structure are built from all the parameter combinations together with their activation conditions. Once the rules are prepared, the only thing we need to do is to look over all the rules to check if the condition is met; if yes, discard the rule, otherwise, continue. The process stops when all of the preset rules are exhausted. The whole logic is described in
Appendix A.
The CNN layer is the encoding layer of the whole model, which belongs to the first stage of the full deep-learning model. After this stage, a time step is added, the value of which is specified by NS, and the latitude and longitude are also included. At the beginning of the second stage, the size of the tensor is NS × (128 + 2). For example, when NS = 5, PS = 4, the input size of the algorithm becomes 5 × 130, and the expected output size is 4 × 2, that is, the four trajectory points of the typhoon track passed in the next 6, 12, 18, and 24 h, providing eight predicted values in total. We listed
R-squared, MSE and average distance errors for 30 different scenarios in
Table 3.
Table 3 shows that when ZMB data are added, the two algorithms, CNN+LSTM and CNN+Transformer, have higher
R-squared scores, by 0.004 and 0.005, respectively. The average distance error of the CNN+LSTM, CNN+PatchTST and CNN+Transformer algorithms is reduced by about 13.58, 28.99 and 267.37 km, respectively. The structures of CNN+LSTM, CNN+PatchTST, and CNN+Transformer are listed in
Appendix B.
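For reference, the R-squared score of one coordinate can be computed as in the sketch below. The sample values are made up, and how the scores of latitude and longitude are averaged in this study is not restated here.

```python
def r_squared(true, pred):
    """Coefficient of determination for one coordinate (latitude or longitude)."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true)         # total sum of squares
    return 1.0 - ss_res / ss_tot

true_lat = [17.8, 18.4, 19.1, 19.9]   # made-up latitudes
pred_lat = [17.9, 18.3, 19.2, 19.8]   # made-up predictions, each off by 0.1 degrees
score = r_squared(true_lat, pred_lat)
print(round(score, 3))                # 0.984
```

A score close to 1 means that the residual errors are small compared with the spread of the true values, so R-squared should be read together with the distance error in kilometres.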
For each scenario, we also plot its true trajectory points and prediction points on the map of each typhoon, as shown in
Figure 10, which maps the 12 typhoons in 2021 and 2022 as predicted by CNN+LSTM in scenario 12.
We also plot the prediction errors of both the latitude and longitude components for each predicted track point at the next 6 h, 12 h, 18 h, and 24 h by CNN+LSTM, as shown in
Figure 11. Since the study predicts trajectories based on segments, the x-axis is the index of the segments in chronological order; there are 1046 segments in total.
4. Discussion and Result
This paper proposes a hierarchical design of a Typhoon Trajectory Prediction simulation system, including Trajectory Prediction (TP), Model Construction (MC), Data Process (DP), and Data Storage (DS). TP can be run alone or in conjunction with MC. When TP is run alone, the model to be tested is loaded directly from the state data previously saved in the MC stage. When TP is run with MC, the model with the best fit is selected from the trained models and then tested. The system automatically identifies new data from the existing data sources when users download and save new ERA5 data according to the naming protocol, or organize the data in the established format and store them in the corresponding configuration file. In addition, in order to maintain good test coverage of the system, we have designed a combination of dynamic parameter detection and default parameter presetting, which makes it convenient for users to modify parameters on demand, such as controlling the window size of input sequences with NS.
Since the typhoon trajectory data have both spatial and time-series characteristics, we choose different algorithms to process different characteristics; CNN is used to process spatial data, and LSTM, Transformer and PatchTST are used to process time-series data based on the intermediate results of the CNN. This design is open to other algorithms, provided that new algorithms are implemented and embedded according to the interface definition, using both functions get_default_xxxModel and init_weights, where xxx is the specific model name.
From the experimental data, it can be found that each combined model (CNN+LSTM, CNN+Transformer and CNN+PatchTST) performs well, with an average R-squared score of at least 0.969. With ZMB data, the average R-squared score of CNN+LSTM reaches 0.991, which is 0.004 higher than that without ZMB data. Therefore, we can draw the following conclusions.
- (1)
Appropriately increasing the data close to the land and ocean boundaries around the coast is conducive to the prediction of typhoon tracks.
- (2)
Practice has proved that the following strategies work well: extracting hidden features by CNN, configuring dynamic customized algorithm names and the rules of net hidden layers to build the model, making segments of the track points of the input and expected output, grouping all the typhoon best path points by typhoon number to divide the training set and test set, using adaptive learning rates, adopting K-fold cross-validation, and so on.
- (3)
The CNN+LSTM is a slightly better choice in terms of the prediction of typhoon tracks according to our experiment data, while the CNN+PatchTST is more stable than CNN+Transformer.
The fact that the Transformer performed worst is contrary to what is observed in most natural language processing and image processing applications. In our case, this could mean one of two things: either modeling long dependencies is not crucial in our application, or the data used are not enough to train the Transformer properly. This is an issue we intend to address in future work.
Recently, the retention mechanism for sequence modeling has been proposed, which supports three computation paradigms, i.e., the parallel, recurrent, and chunkwise recurrent paradigms [
30,
31]. This methodology presents several advantages over transformers that we intend to explore in future work. In addition, we will focus on similar solutions based on a coupled atmosphere–ocean global climate model.