Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea

Bustos Usta, David Francisco; Rodríguez-López, Lien; Torres Parra, Rafael Ricardo; Bourrel, Luc

doi:10.3390/rs17030517

Open AccessArticle

Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea

by

David Francisco Bustos Usta

^1,*

,

Lien Rodríguez-López

²

,

Rafael Ricardo Torres Parra

³

and

Luc Bourrel

⁴

¹

Facultad de Oceanografía, Universidad de Concepción, Concepción 4030000, Chile

²

Facultad de Ingeniería, Universidad San Sebastián, Lientur 1457, Concepción 4030000, Chile

³

Grupo de Investigación en Geociencias GEO4, Departamento de Física y Geociencias, Universidad del Norte, km 5 vía Puerto Colombia, Barranquilla 25260, Colombia

⁴

Géosciences Environnement Toulouse, UMR 5563, Université de Toulouse, CNRS-IRD-OMP-CNES, 31000 Toulouse, France

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(3), 517; https://doi.org/10.3390/rs17030517

Submission received: 12 December 2024 / Revised: 27 January 2025 / Accepted: 30 January 2025 / Published: 2 February 2025

(This article belongs to the Special Issue Advanced Applications of Remote Sensing in Monitoring Marine Environment (Second Edition))

Download

Browse Figures

Versions Notes

Abstract

:

Sea surface temperature (SST) plays a pivotal role in air–sea interactions, with implications for climate, weather, and marine ecosystems, particularly in regions like the Caribbean Sea, where upwelling and dynamic oceanographic processes significantly influence biodiversity and fisheries. This study evaluates the performance of foundational models, Chronos and Lag-Llama, in forecasting SST using 22 years (2002–2023) of high-resolution satellite-derived and in situ data. The Chronos model, leveraging zero-shot learning and tokenization methods, consistently outperformed Lag-Llama across all forecast horizons, demonstrating lower errors and greater stability, especially in regions of moderate SST variability. The Chronos model’s ability to forecast extreme upwelling events is assessed, and a description of such events is presented for two regions in the southern Caribbean upwelling system. The Chronos forecast resembles SST variability in upwelling regions for forecast horizons of up to 7 days, providing reliable short-term predictions. Beyond this, the model exhibits increased bias and error, particularly in regions with strong SST gradients and high variability associated with coastal upwelling processes. The findings highlight the advantages of foundational models, including reduced computational demands and adaptability across diverse tasks, while also underscoring their limitations in regions with complex physical oceanographic phenomena. This study establishes a benchmark for SST forecasting using foundational models and emphasizes the need for hybrid approaches integrating physical principles to improve accuracy in dynamic and ecologically critical regions.

Keywords:

Lag-Llama model; Chronos model; extreme upwelling events; southern Caribbean upwelling system

1. Introduction

One of the most critical outcomes of global warming is Sea Surface Temperature (SST) increase [1]. From the 1950s to 2010, the average global SST has risen by approximately 0.4 °C [2], significantly impacting marine ecosystems [3] and sea level rise [4]. Accurate SST predictions offer substantial advantages to environmental studies and applications, particularly in the Caribbean Sea (hereafter, CAR), which hosts vital resources such as tourism, fisheries, coral reefs, mangroves, and extensive biodiversity [5].

The CAR is a semi-enclosed tropical basin, which lies to the east of Central America, north of South America, and west of the Antilles, which forms its boundary with the Atlantic Ocean [6]. Spanning an area of roughly 2.75 million km², it ranks among the world’s largest seas. The bathymetry of the Caribbean varies significantly, with coastal shallows giving way to depths exceeding 7000 m in some regions (Figure 1). This diverse sea supports critical ecosystems, including coral reefs, mangroves, coastal forests, and submarine plains [7]. The CAR climate is warm year-round, influenced by the Intertropical Convergence Zone (ITCZ), which drives seasonal variations. The dry season, from December to March, coinciding with the boreal winter, experiences stronger winds and lower precipitation, with relatively cooler temperatures [8]. The rainy season, from August to November, marked by weaker winds and warmer conditions, aligns with the boreal summer. Between these seasons, sea surface temperatures show a modest variation of about 2 °C at the mixed layer depth [9]. The region is also vulnerable to hurricanes during the rainy season, posing significant ecological, economic, and cultural challenges.

Several approaches have been used for generating short and long-term ocean or atmospheric forecasts in this region, utilizing a range of techniques, including (1) classical statistical methods, (2) physics-based numerical models, (3) machine learning models, (4) traditional neural networks, and (5) convolutional and deep neural networks [10,11,12,13]. However, statistical and physics-based numerical models have limitations, such as requiring substantial computational resources [14], and they depend on robust data assimilation schemes to account for biases and errors in both models and observations [15]. Consequently, producing such forecasts is often limited to large research centers. An alternative to these models is data-driven computing (e.g., machine learning, deep learning, and foundational models), which leverages vast amounts of open-source data to train simpler, more efficient, faster, and less complex models compared to traditional approaches [16]. In this particular study, we decide to use foundational models, which are models trained on a large-scale broad data, and once trained, they operate as a basis for a wide range of downstream tasks related to the original trained model [17].

While deep learning and machine learning have been explored in previous environmental studies in other regions of the world [18,19,20], to our knowledge, they have not been used in the CAR. Emerging methodologies now allow us to harness the strengths of foundational models, such as Amazon Chronos [21] and Lag Llama [22], for short-term SST forecasting (7- to 30-day intervals). These models are aligned with the rise of Large Language Models (hereafter, LLMs) and feature zero-shot learning capabilities, meaning extensive and complex training is not required to achieve strong performance. Interestingly, LLMs are not limited to natural language processing (NLP) tasks but can also be applied effectively in time series forecasting [23,24].

Short-term SST forecasting in the Caribbean Sea is important because it is a key variable in air–ocean interaction processes, affecting weather and climate. Therefore, it is used to assess heat, mass, and momentum fluxes between the ocean and the atmosphere; ocean heat content, which plays an important role in sea level rise [25,26]; upper ocean stratification affecting the mixed layer depth, isothermal layer depth, and barrier layer thickness, which in the Caribbean can affect tropical cyclone intensification [27,28,29,30]; the assessment of marine heat waves, which put in risk important ecosystems in the Caribbean [31]; evaluation of physical processes such as the southern Caribbean upwelling system [32,33]; among others.

The main motivation of this paper is to introduce a novel SST forecasting methodology, applicable not only to the CAR but also to other regions. Therefore, we assess the SST forecast at four regions in the Caribbean Sea (Figure 1), which are modulated by different air–ocean dynamics [9]. We focus on Regions 1 and 4, which are placed in the southern Caribbean upwelling system, with a strong relation between SST and primary productivity [32,34]. In the Caribbean Sea, where small-scale artisanal fisheries are a major livelihood for many underdeveloped coastal communities, reliable SST forecasts provide valuable information on expected changes in fish availability [35]. Improved SST forecasting could enable fishermen and community planners to anticipate productive fishing periods, optimize fishing efforts, and plan for seasonal variations, ultimately contributing to food security and economic stability. This information could also support sustainable fishing practices, helping to manage resources effectively by aligning fishing activities with natural upwelling cycles, thus avoiding overfishing and preserving marine biodiversity.

A key step toward the achievement of this middle-term capability is to assess the SST forecast accuracy by foundational models as a function of the forecast horizon. For this purpose, we use 22 years of observational data (2002–2023), split so that 80% of the time series are used to train the model, and 20% are used to assess the models’ daily prediction capabilities up to 30 days ahead. By bridging the gap between emerging techniques and traditional oceanography, this study aims to advance predictive capabilities and offer valuable insights for broader environmental applications.

Figure 1. Caribbean Sea bathymetry from [36]. Boxes 1, 2, 3, and 4 are used to represent the Venezuelan, Cayman, and Colombian Basins, and western region of the Southern Caribbean upwelling system, respectively.

2. Materials and Methods

2.1. Observational Datasets

For this study, we utilized the Multiscale Ultra-high Resolution (MUR) global SST analysis. This SST dataset integrates both high- and low-resolution satellite observations from microwave and infrared bands (e.g., AMSR-E, AMSR-2, WindSat, AVHRR, and MODIS). These satellite-derived data are further merged with in situ SST estimates through a Multi-resolution Variational Analysis Method [37]. The analysis is generated daily and provided on a regular grid with a spatial resolution of 1/100° (~1 km). The time period selected for this analysis spans from 2002 to 2023.

An important consideration is that cloud coverage bias might affect the MUR dataset, as it would limit the availability of satellite observations. This bias could affect extreme upwelling events assessment, especially during the rainy season, when seasonal cloud coverage peaks in the Caribbean. However, this bias effect in SST is reduced using spatially averaged regions of ~2° × 2°.

To evaluate the forecast performance in assessing upwelling events, we also incorporate daily surface wind data from 2002 to 2023, sourced from the Global Ocean Hourly Reprocessed Sea Surface Wind and Stress from Scatterometer and Model product [38], to compute an upwelling index. This dataset is available at a spatial resolution of 0.125° (~12.5 km).

2.2. Models’ Architecture

2.2.1. Chronos Model

The Chronos model architecture leverages the existing language model structures, such as T5 [39], to perform time series forecasting without any complex domain-specific adjustments.

Considering that we have a time series given by

x_{1 : C + H} = [x_{1}, \dots, x_{C + H}]

where

C

represents the first-time steps to create an historical context and

H

the forecast horizon. The Chornos model follows these four steps to produce the forecast from a given univariate series [40]:

Time series normalization: to avoid bias given by the Equation (1):

Z_{i} = \frac{x_{i} - m}{s}

(1)

where

x_{i}

represents the individual observations,

m

the mean, and

s

the sample standard deviation.

2.: Quantization method: to convert the scaled time series into discrete tokens. For this, we select $B$ bin centers denoted by $c_{1} < \dots < c_{B}$ and $B - 1$ edges $b_{i}$ separating them into $c_{i} < b_{i} < c_{i + 1}$ for $i ϵ {1, \dots, B - 1}$ . The quantization function $q : R \to {1,2, \dots, B}$ is defined by Equation (2):

$q (x) = \{\begin{matrix} 1 if - \infty \leq x < b_{1} \\ 2 if b_{1} \leq x < b_2 \\ B i f b_{B - 1} \leq x < \infty \end{matrix}$

(2)

In addition to the time series tokens

\{1,2, \dots, B\},

two special tokens are also added (

V_{t s}

), which are the PAD and EOS. The PAD is used to pad the time series of different lengths to a fixed length and the EOS is utilized to denote the end of the sequence.

3.: Prediction: the model provides multiple realizations of the future using the predicted distribution using a loss function given by Equation (3):

l (θ) = - \sum_{h = 1}^{H + 1} \sum_{i = 1}^{|V_{t s}|} 1_{Z_{C + h + 1 = i}} \log p_{θ} (Z_{C + h + 1} = i| z_{1 : C + h})

(3)

where

p_{θ} (Z_{C + h + 1} = i| z_{1 : C + h})

represents the categorical distribution by the model parametrized by

θ

.

4.: Dequantization method: to obtain the final prediction, which implements this transformation $d : \{1,2, \dots, B\} \to R$ defined in Equation (4):

$d (j) = c_{j} x_{2}^{i}$

(4)

Figure 2 provides a schematic of the steps outlined above. It is important to note that Chronos, being a model designed for zero-shot predictions, does not require fine-tuning to the time series (Figure 2, middle panel).

2.2.2. Lag-Llama Model

Lag-Llama is a foundation model for univariate probabilistic forecasting, trained on a diverse range of time series data [22]. The model follows these four steps to produce the forecast from a given univariate time series (Figure 3):

Tokenization: to construct lagged features from the prior values of the time series. Given a sorted set of lag indices $L = \{1, \dots, L\},$ the lag operation is defined by $x_{t} \to k_{t} \in R^{| L |}$ where each entry $j$ of $k_{t}$ is given by $k_{t} [j] = x_{t - L [j]}$ . In that way, to construct lag features for some context-length window $x_{1 : C},$ we need to satisfy a larger window with $L$ more historical points denoted by $x_{- L : C}^{P}$ .
Projection: the tokens processed are passed through a shared linear projection layer mapping the features to the hidden dimension of the attention module.
Masked Transformer Decoder Layer: The outputs from the linear projection are passed to the attention layer using a pre-normalization with RMSNorm [41] and Rotary Positional Encoding (RoPE) [42] to obtain the layer’s query and key representations in the LLaMa architecture.
Distribution head: the outputs from the masked transformer layers are used to predict the parameters $ϕ$ of the forecast distribution of the next step; these parameters are given by the parametric distribution head.

Figure 3. Overall Lag-Lama Architecture. The model learns an output distribution based on lagged features. The input is a token of univariate time series

i

at a given timestamp

x_{t}^{i}

. The

C_{t}^{i}

represent all the covariates used for the prediction. The inputs are projected through M masked decoded layers. The features are passed through the distribution head and trained to predict the parameters of the forecast distribution in the next time step. Adapted from [22].

Figure 3. Overall Lag-Lama Architecture. The model learns an output distribution based on lagged features. The input is a token of univariate time series

i

at a given timestamp

x_{t}^{i}

. The

C_{t}^{i}

represent all the covariates used for the prediction. The inputs are projected through M masked decoded layers. The features are passed through the distribution head and trained to predict the parameters of the forecast distribution in the next time step. Adapted from [22].

2.3. Other Models for Comparison

To assess data-driven foundational models’ performance against other statistical forecast approaches, two additional models were considered for the analysis, Autoregressive Integrated Moving Average (ARIMA) and Prophet. The former has three main components such as AutoRegression (AR), where past values of the series are linearly regressed to predict future values; Integration (I), which involves differencing the series to achieve stationarity by removing trends or seasonality; and Moving Average (MA), where past forecast errors are used to refine predictions. Mathematically, an ARIMA(p,d,q) model is defined by p, the order of the autoregressive terms; d, the degree of differencing applied to the series; and q, the order of the moving average terms. This model has been used for a long time to predict univariate time series data and it has been applied in different applications in meteorology [43,44] and oceanography [20,45].

The prophet model is an additive time series forecasting model designed to handle data with strong seasonal patterns and irregularities such as missing values or outliers [46]. It decomposes the time series into three main components: trend, which captures long-term growth or decline using piecewise linear or logistic models; seasonality, which represents recurring patterns at daily, weekly, or yearly frequencies using Fourier series; and holidays, which allows the inclusion of user-defined special events. Unlike traditional models like ARIMA, Prophet is highly automated, and has been used in a few studies related to SST forecasting [47,48].

2.4. Model Evaluation

The evaluation process proceeds as follows. Four distinct regions are selected, placed in the Venezuela Basin, the Cayman Basin, the south Colombia Basin, and north of the Guajira Peninsula referred to as Boxes 1, 2, 3, and 4 in Figure 1. Regions 1 and 4 are placed in the Southern Caribbean Upwelling system. These regions were chosen due to the presence of diverse physical variability modes within the broader study area [49]. Subsequently, daily time series are computed by averaging the data within each box, resulting in four separate time series for further analysis. The time series is then divided into an 80–20% split: 80% (6306 days, between 1 June 2002 and 6 September 2019) of the data is used to establish a baseline for few-shot learning, and the remaining 20% (1577 days, between 7 September 2019 and 31 December 2023) is reserved for testing models’ prediction performance. Given that Chronos and Lag-Llama models are specifically designed for zero-shot learning, no dedicated training procedures are required [21,22]. For ARIMA and Prophet, since a learning period is required, we train the models using the first 80% of the sample and then calculate the forecast metrics in this test period.

For each day within the 20% test segment of the time series, both the Chronos and Lag-Llama models are utilized to forecast up to 30 days ahead, moving forward in daily time steps. Using these predictions, we assess the forecasting accuracy against the observed MUR time series by calculating errors for each run with various performance metrics. These include Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Bias (BIAS), Median Absolute Error (MedABS), and Mean Absolute Error (MAE), defined in Equations (5)–(10), respectively. These metrics allow us to quantify the models’ predictive performance over different forecast horizons.

y_{i}

corresponds to each daily SST observation, while

\hat{y_{i}}

corresponds to the daily forecast.

M A P E = \frac{100 %}{n} \sum_{i = 1}^{n} |\frac{y_{i} - \hat{y_{i}}}{y_{i}}|

(5)

M S E = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}

(6)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}

(7)

B I A S = \hat{y_{i}} - y_{i}

(8)

M e d A B S = m e d i a n (|\hat{y_{i}} - y_{i}|)

(9)

M A E = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y_{i}} |

(10)

Finally, after identifying the optimal model from the time series, we evaluated spatial forecasting performance through parallel processing of individual pixels within Regions 1 and 4, areas of particular interest due to their characteristic upwelling systems (see Section 2.4). Simulations were generated for each node within the defined boxes (Figure 1) at 5 km (0.05°) spatial resolution, with errors at lags 1, 7, 14, 21, and 28 (as shown in Figure 4, Figure 5 and Figure 6).

2.5. Case Study: Forecast Assessment of Extreme Upwelling Events

To validate the model’s high-frequency performance, we decided to use two specific cases in the Southern Caribbean upwelling system (Box 1 and 4, Figure 1) assessing the models’ ability to replicate four extreme upwelling events (characterized by large SST negative anomalies). There are different ways to identify upwelling events such as cross-shore SST differences [50] and the vertical water transport rate along the coastline (CUTI) [51,52]. However, all existing indices consider three important factors: SST, wind speed, and the relative alignment of wind direction with respect to the coastline. Thus, these factors were used as key indicators to identify upwelling events through the upwelling index proposed by [53]:

U I = μ (\cos (θ - α))

(11)

where μ and θ represent the wind speed (ms⁻¹) and direction (degrees), and α indicates the coastline orientation relative to the north. We utilized α = 57° as the angle that the coast forms with respect to the Equator in this region [54]. An upwelling event was identified based on three criteria: (1) UI > 0, (2) SST below a 25th percentile threshold [55] using the SST daily climatology using the MUR data in the 2002–2023 period, and (3) persistence of these conditions for more than 3 consecutive days. Different upwelling metrics were obtained using the spatially averaged SST time series of Regions 1 and 4, including frequency, mean intensity, and duration. Frequency represents the number of events in the assessed period (annual or monthly). Mean intensity refers to the average temperature drop relative to the daily climatological baseline (Percentile 25) of all upwelling events in a given period, while duration measures the span in days average of all upwelling events [56].

3. Results

3.1. Evaluation of Models’ Performance

Results from the comparative analysis of the forecasting models, Chronos (blue), Lag-Llama (red), ARIMA (green), and Prophet (purple), across four distinct regions (Region 1, 2, 3, and 4) are presented on Figure 4. The evaluation metrics (Section 2.3) are plotted against the forecast horizon, extending from 1 to 30 days (D).

Across all regions, the Chronos model consistently outperforms Lag-Llama. This suggests that Chronos provides more accurate forecasts and exhibits a more stable predictive performance over time. Metrics reveal significant discrepancies between the models, especially in Regions 1 and 2, where Lag-Llama’s errors increase more rapidly with the forecast horizon, while Chronos maintains relatively lower errors. We also observed a similar performance between Chronos and Prophet; however, Chronos shows better error metrics across all the regions and metrics compared. Overall, ARIMA shows the poorest performance.

Furthermore, the error metrics for zero-shot models demonstrate an increasing trend with the forecast horizon, which is expected as uncertainty in predictions typically grows over longer time horizons. On the other hand, the ARIMA and Prophet models show a more consistent trend across the forecast horizon. In all regions, Chronos and Prophet consistently outperform Lag-Llama and ARIMA, but the performance gap varies. For instance, in Regions 1 and 4, both foundational models show stable error trends, with Lag-Llama maintaining a similar difference to Chronos’ performance at the different forecast horizons. In contrast, Region 2 reveals pronounced fluctuations in Lag-Llama’s error metrics, suggesting difficulties in capturing complex patterns or data irregularities. These results highlight Chronos’s and Prophet’s superior ability to maintain accuracy and generalize across extended periods.

Figure 4. Regional Comparison (see Figure 1) of Forecasting Accuracy for Chronos (Blue), Lag-Llama (Red), ARIMA (green), and Prophet (purple) models over 30-Day Forecast Horizons. First row (MAPE in %), second row (MSE in °C²), third row (RMSE in °C), fourth row (Bias in °C), fifth row (MedABS in °C), and sixth row (MAE in °C).

Additionally, we conducted a model performance comparison across all four regions during the test period using Taylor diagrams (see Figure S1 in the Supplementary Material). Results are coherent with Figure 4, showing that Chronos outperforms all other models. Prophet ranks second with lower error metrics, while ARIMA is the poorest-performing model. Based on these results and the aim of testing foundational models’ performance, we continue the forecast assessment using only results from the Chronos model since it was the best-performing model from the comparison.

3.2. SST Time Series Predictions

We evaluate the Chronos model performance forecasting the time series in the four regions during the test period, using four different forecast daily horizons (D1, D7, D14, and D28) (Figure 5). The MUR SST shows slightly different seasonal patterns across all four regions, with temperatures fluctuating between approximately 24 °C and 32 °C. Overall, September–October are the warmest months (rainy season), while February–March are the coldest months (dry season), and are thus in accordance with boreal summer and winter, respectively. Region 2 consistently shows the highest temperature and warmest peaks, as this is a convergence region due to the Caribbean Low-Level Jet forcing [9]. Conversely, Regions 1 and 4 exhibit generally cooler temperatures as they are placed in the Southern Caribbean Upwelling system. Note that SST is also affected by interannual variations. For example, during February of 2021 and 2023 in Region 4, SST is lower than in other Februarys. This is probably because the CLLJ fluctuates over time [57] (Figure 5A). Region 3 SST shows a similar pattern to Region 2.

We assess the forecast bias, as it indicates the daily differences from the observed SST. The closer lead-time forecasts (D1) align well with the observed data, capturing the variability and seasonality of the SST with minimal deviations (Figure 5B–E). As the forecast lead time increases to D7, D14, and D28, a gradual deterioration in forecast accuracy becomes evident. This is especially noticeable during periods of rapid temperature changes, where longer lead-time forecasts struggle to track the observed SST dynamics, indicating increased uncertainty and error. Furthermore, Region 4 appears to have slightly larger bias magnitudes at longer forecast horizons.

We compute some statistics of the SST forecast bias in all regions, to objectively assess their differences (Table 1). Regions 1–3 exhibit a mean systematic negative bias that intensifies with longer forecast horizons, reaching its most pronounced value (−0.075 °C) at D28 in Region 2, while in Region 4, all the values are positive and small (<0.01 °C) across all the forecast horizons. Standard deviation (STD) and range increases with forecast horizon across all regions. These results demonstrate that shorter lead-time forecasts (<7D) maintain higher accuracy, while longer forecasts exhibit increase divergence from the observed SST.

Region 4 shows the largest bias when compared to the other regions at D14 and D28 forecast horizons, reaching an STD of 0.781 °C and 4.731 °C range at D28, thus significantly higher than other regions. STD and range in Regions 1 and 2 are very similar, while the lowest bias metrics occur in Region 3. This indicates that the model accuracy varies regionally, probably because ocean dynamics, which affect SST, also vary among regions, with some being more difficult to forecast. Thus, it seems that the model is less accurate at forecasting SST in Region 4 in forecast horizons >7 days, probably due to large SST variability associated with upwelling processes in fortnight and larger periods.

We also assess the frequencies at which the model struggles to forecast the observed SST time series in Regions 1 and 4, by comparing the frequency spectra of the observed SST time series with the forecast biases of 7 and 28 days horizon (Figure S2). In all cases, the mean and trend were previously removed from the time series. Results show that both observed SST time series concentrate their variance in low frequencies, close to the annual period. The bias spectrum shows that D7 and D28 forecasts are good, capturing dominant SST low-frequency signals in both regions and lowering the variance, allowing the main frequencies remaining in the bias to be observed, which indicate the periods in which the model has some limitations reproducing the observed SST time series. In both regions, D7 better removes lower frequencies (periods > 30 days); as a consequence, its bias has lower variance dispersion along the spectrum when compared to D28. In addition, D28 bias in Region 4 keeps more variance in the lower frequencies when compared to Region 1 (as seen in Figure 5). In both cases, variance is not concentrated in a single frequency.

As the Chronos model, especially at forecast horizons < 7 days, succeeded in reproducing the SST variability at all frequencies, including its strong seasonal cycle, we decided to move a step forward and assess the model’s ability to reproduce extreme upwelling events (Section 3.4.4), which to our knowledge has not been attempted before in the southern Caribbean upwelling system.

Figure 5. Observed Sea surface temperature across all analyzed regions (see Figure 1) from 4 October 2019 to 31 December 2023 (A). (B–E) present bias errors for forecast lags of 1 day (D1), 7 days (D7), 14 days (D14), and 28 days (D28).

Table 1. Statistical metrics of forecast bias across four regions selected at different temporal lags (D1, D7, D14, and D28 days). Metrics include mean bias (MEAN), standard deviation (STD), minimum (MIN) and maximum (MAX) values, and range. All values are expressed in degrees Celsius (°C).

Region	Lag	MEAN	STD	MIN	MAX	RANGE
1	D1	0.003	0.155	−1.117	0.761	1.878
	D7	−0.018	0.369	−1.389	1.202	2.591
	D14	−0.017	0.447	−1.516	1.416	2.931
	D28	−0.039	0.505	−1.724	1.695	3.420
2	D1	0.002	0.169	−0.668	0.946	1.614
	D7	−0.032	0.411	−1.339	1.771	3.111
	D14	−0.049	0.472	−1.504	1.542	3.047
	D28	−0.075	0.534	−1.981	1.574	3.555
3	D1	−0.003	0.140	−0.661	0.707	1.368
	D7	−0.021	0.309	−1.143	0.751	1.894
	D14	−0.019	0.371	−1.415	1.196	2.611
	D28	−0.015	0.443	−1.742	1.252	2.994
4	D1	0.001	0.160	−0.772	0.533	1.306
	D7	0.002	0.454	−2.141	1.435	3.576
	D14	0.005	0.613	−2.224	2.041	4.264
	D28	0.009	0.781	−2.295	2.435	4.731

3.3. SST Spatial Predictions

We evaluated the spatial forecast performance through a comprehensive error analysis across forecast horizons from D1 to D30. While the analysis encompassed all temporal lags, we present results for key intervals at D1, D7, D14, D21, and D28, utilizing the MAE metric. This assessment focused particularly on Regions 1 and 4 (see Section 2.3), which represent characteristic upwelling zones. Figure 6 reveals marked spatial heterogeneity in forecast skill across both regions. In Region 1, the highest MAE values (0.55–0.65 °C) appear in the southeastern region, while in the northern region, there are lower values (0.40–0.50 °C). This north–south gradient in forecast skill persists across all forecast horizons, though error magnitude increases progressively from day 1 to 28 (spatially averaged error increases from 0.45 to 0.48 °C, respectively).

Figure 6. Spatial distribution of Mean Absolute Error (MAE) in SST forecasts for Region 1 (A–E) and Region 4 (F–J) at forecast horizons of 1, 7, 14, 21, and 28 days. Error values are shown in °C across all panels.

Region 4 presents a similar error pattern. The highest MAE values (0.65–0.80 °C) are concentrated along the upwelling zone nearest to the coast (~12.5°N). This pattern reflects the challenge in predicting intense SST gradients associated with coastal upwelling events, especially close to the coast where upwelling is stronger ([48] their Figure 5). Again, the error magnitude increases with the forecast horizon, as the spatially averaged error increases from 0.55 °C on Day 1 to 0.61 °C on Day 28. Note that MAE is higher in Region 4 than in Region 1, probably because coastal upwelling is stronger in the former (Figure 7), as has been also shown in the sea level signal [43]. Similar results are observed with other error metrics (see Supplementary Material, Figures S3–S5).

One potential explanation for the largest forecast errors near the coasts could be related to MUR dataset limitations in this boundary region, including the cloud coverage bias, which would limit the availability of some satellite observations. Although this can affect the model forecast accuracy, we think this is not a dominant effect, as larger errors are not constant along the coasts, and they change with the forecasting time horizon (Figure 6 and Figures S3–S5).

3.4. Observed Extreme Upwelling Events and Their Forecast

3.4.1. Observed Upwelling Seasonality

We evaluate upwelling seasonality using daily climatological percentiles (25th and 50th) derived from the 2002 to 2023 period for Regions 1 and 4, respectively (Figure 7). In these two regions, seasonality shows lower values in February–March, as in the dry (colder) season the upwelling strengths as a consequence of the Caribbean Low-Level Jet (CLLJ) intensification. On the contrary, higher SST values are in September-November, as in the rainy (warmer) season the CLLJ and coastal upwelling weakens. In addition, the CLLJ strengthens in ~July during the Mid-Summer Drought [34,58], causing an upwelling intensification, which can be observed in an SST reduction in this month.

Region 1 (Figure 7A) presents a moderated thermal regime, with SST median variations between 25.95 °C and 29.12 °C. In contrast, Region 4, situated along the Guajira upwelling system, shows a more pronounced seasonal SST variability, with median temperatures ranging from 25.45 °C during the strongest upwelling period (February–March) to 29.18 °C during the warm season (October). The larger SST range (3.73 °C) suggests a more intense upwelling response, likely due to its direct exposure to the Caribbean Low-Level Jet (CLLJ) [57] when compared to Region 1 (range 3.17 °C). Note that the upwelling SST seasonality agrees with the observed semi-annual sea level cycle [48] in their Figure 5, as the upwelling intensification also reduces the sea level toward the coast.

The P25 percentile is used as a limit to detect extreme upwelling events. In Region 1, the P25 curve maintains a more consistent separation from the median throughout the year when compared to Region 4 (Figure 7). Consequently, the average difference between P50 and P25 is 0.35 °C in the former and 0.40 °C in the latter. This difference and the largest SST median range indicate that stronger upwelling events (SST reduction) occur in Region 4, which supports that the spatial differences in the MAE between these two regions (Figure 7) are probably related to model limitations to reproduce coastal upwelling events. Note that stronger upwelling events seem to occur in February–March, and July, thus when the CLLJ is more intense (largest P50–P25 differences).

3.4.2. Observed Annual Variability

The temporal analysis of observed extreme upwelling events (SST < P25) in Regions 1 and 4 during the 2002–2023 period exhibits large interannual variability and weak positive and significant trends in mean intensity (0.05 ± 0.03 °C and 0.07 ± 0.04 °C per decade, respectively) (Figure 8A). This trend might be explained by the observed SST regional warming [59], as extreme upwelling events are calculated from a 22-year fixed period. The mean intensity values are comparable between regions (−0.74 ± 0.09 °C for Region 1 and −0.83 ± 0.11 °C for Region 4) (Table 2), with both areas experiencing mild upwelling events (maximum SST intensities) in 2021 and 2011 (−0.60 °C and −0.58 °C, respectively) and stronger upwelling events (minimum SST intensities) in 2002 (−0.92 °C and −1.15 °C, respectively).

Regarding duration metrics, neither region shows statistically significant trends (Figure 8B). Mean duration is similar between regions (10.67 ± 5.02 days for Region 1 and 10.89 ± 6.68 days for Region 4), and similar behavior is observed for longer-lasting extremes, with maximum duration values of 23 days (Region 1) and 31 days (Region 4) in 2015 and 2002. Both regions demonstrate considerable interannual variability in event duration, with minimum values occurring in 2022 (3 days for Region 1 and 4 days for Region 4).

Frequency metrics show non-significant trends (Figure 8C). The mean frequency remains comparable between regions (5.10 ± 2.96 and 4.80 ± 3.09 events per year for Regions 1 and 4, respectively), suggesting a similar occurrence of upwelling events despite their geographical discrepancies. Both regions show identical maximum event frequencies (11 events per year) but in different years (2014 and 2018 for Region 1 and 2009 for Region 4), while the minimum is 0 for Region 1 (2023), and 1 event for Region 4 (2005, 2011, 2016, and 2022).

Table 2. Upwelling properties for the 2002–2023 period in the Caribbean Sea from the Regions 1 and 4 annual time series. Intensity and duration from averaged annual values. Frequency from the sum of events per year. Metrics include the 23-year mean, standard deviation (Std), maximum (MaV), and minimum (MiV) values, and their year of occurrence.

	Intensity (°C)		Duration (Days)		Frequency (Total Events)
	Region 1	Region 4	Region 1	Region 4	Region 1	Region 4
Mean	−0.74	−0.83	10.67	10.89	5.10	4.80
Std	0.09	0.11	5.02	6.68	2.96	3.09
MaV	−0.60	−0.58	23	31	11	11
Year MaV	2021	2011	2015	2002	2014/18	2009
MiV	−0.92	−1.15	3	4	0	1
Year MiV	2002	2002	2022	2022	2023	2005/11/16/22

3.4.3. Observed Interannual and Sub-Annual Variability

The annual cycle of the extreme upwelling events’ intensity shows similar magnitudes in both regions, with mean values ranging from −0.6 °C to −1.0 °C (Figure 9A). Region 1 exhibits lower temperatures (stronger upwelling events) during May–June (Figure 9A), while Region 4 shows lower temperatures from June to August. In addition, interannual variability modulates extreme upwelling events, as these events occur only in some of the months/years (Figure 9D–G). More intense months were for Region 1 in September 2002 (−1.0 °C) and for Region 4 in June 2002 (−1.4 °C).

The mean duration annual cycle shows averaged months with events that range between 4 days and 14 days (Figure 9B). In Region 1, longer-lasting events are in March, while in Region 4, they are in January and July. The months with longer-lasting upwelling events are June 2002 (26 days) for Region 1 and May 2015 (30 days) for Region 4 (Figure 9E,H).

The monthly averaged frequency ranges between 0.9 and 1.6 events per month. Its seasonality, although weak, shows more events in May for Region 1 and more events in August for Region 4 (Figure 9C). In the assessed period, most of the months show the occurrence of a non- or just one event (Figure 10F,I), which indicates that in these two regions, these are not very frequent extreme events. In Region 1, the month with the largest number of events was July 2014 (three events), and in Region 4, the maximum number of events per month was two.

Bear in mind that the seasonality of these metrics is computed from the multiannual SST P25 threshold (Figure 9); thus, they indicate extreme upwelling events on top of the seasonal SST variation. Therefore, upwelling extreme events (colder events referenced to the seasonal variations) do not seem to be coupled with the months where seasonal upwelling is stronger as a consequence of the CLLJ intensification (January–April—Figure 9).

Figure 9. Monthly upwelling metrics in the Caribbean Sea from the MUR dataset to assess inter-annual and sub-annual variability. Columns represent mean intensity (°C) (A,D,G), duration (days) (B,E,H), and frequency (events per month) (C,F,I), respectively. First row shows the monthly means (A–C) Region 1 in dark blue and Region 4 in dark red, using only years with extreme upwelling events. Second and third rows show monthly metrics distribution in the 2002–2023 period ((D–F), Region 1) and ((G–I), Region 4).

3.4.4. Model’s Ability to Forecast Upwelling Events

We assess the model’s ability to forecast extreme upwelling events, including the SST time series and spatial performance. We focused on the most intense and prolonged events in each region, denoted by the suffixes D (Duration) and I (Intensity), and regions R1 and R4. For this assessment, we started the forecast one day before the event occurs and for 30 days ahead. Therefore, we selected events lasting up to 7 days, available in the test period (between 7 September 2019 and 31 December 2023), as model performance deteriorates beyond this forecast horizon (see Figure 5). Table 3 details the selected events.

The R1_I upwelling event, spanning five days in September/21 (Figure 10), shows good model performance during the initial 7–10 days (subbox in Panel A), indicating reliable short-term predictability of upwelling dynamics in this region, on top of the low-frequency variability. However, bias analysis (Panels G–K) reveals a persistent difficulty in accurately capturing coastal processes, particularly in the southern sector, where upwelling is stronger (Section 3.3). This spatial bias, though generally under 1 °C, reflects a systematic overestimation of SST in nearshore areas.

The R1_D event, lasting seven days from 28 November to 4 December (Figure 11), shows good model performance as the forecast (red dashed line) closely tracks observations (black line) across all forecast horizons, as seen in the subbox of Panel A. Spatial SST distributions (Panels B–H) reveal cooler temperatures (purple-blue shades, ~25 to 26 °C) toward the southern domain. Bias analysis indicates alternating positive (red) and negative (blue) biases in the area (±1 °C).

Figure 10. Analysis of an extreme-intensity upwelling event in Region 1 (10–14 September 2021): (A) Year-long context illustrating seasonal SST variations, the 22-year climatology (median in blue), and the 25th percentile threshold for upwelling event identification (green). The SST anomaly during the upwelling event is highlighted in red, while other upwelling events are marked in orange. The insert box corresponding to the gray-shaded region displays the 30-day SST forecast time series (observed values in black, predicted values in brown). (B–F) Daily observed SST maps during the upwelling event. (G–K) Bias analysis presents the difference between spatial forecast and observed SST.

Figure 11. Analysis of an extreme-duration upwelling event in Region 1 (28 November–4 December 2021): (A) Year-long context illustrating seasonal SST variations, the 22-year climatology (median in blue), and the 25th percentile threshold for upwelling event identification (green). The SST anomaly during the upwelling event is highlighted in red, while other upwelling events are marked in orange. The insert box corresponding to the gray-shaded region displays the 30-day SST forecast time series (observed values in black, predicted values in brown). (B–H) Daily observed SST maps during the upwelling event. (I–O) Bias analysis presents the difference between spatial forecast and observed SST.

Regarding the R4_I event in early April 2022 (Figure 12), the forecast performance during this period shows a good agreement between the forecast (red dashed line) and observations (black line) in the first few days, though there is a slight divergence as the forecast extends further in time. The spatial SST distribution (panels B–D) spanning 2–4 April 2022 reveals a strong upwelling signature with temperatures ranging from 22 to 27 °C. This event shows particularly intense upwelling activity, as it is at the beginning of April, when the seasonal upwelling is still strong in this region (Figure 7). Cooler waters (purple colors, around 23–24 °C) are placed along the coastal region. The spatial extent of the cool waters suggests a well-developed upwelling cell, typical of the strong upwelling season in the Guajira region. The bias analysis (panels E–G) shows initially a predominant positive bias (red colors) in the northern and coastal areas, indicating model overestimation of SST (~0.5 °C). However, by the third day (panel G), in the strong upwelling area the bias is close to 0 °C, and negative in the middle of the cell.

Finally, for the R4_D event, occurring during early October 2021 (Figure 13), the subbox in Panel A shows a good agreement between forecast (red dashed line) and observations (black line) like in previous cases. The spatial SST distribution (panels B–E) from 4 to 7 October 2021, shows relatively warmer conditions, with temperatures ranging from 25 to 30 °C. There is a less pronounced upwelling signature compared to winter events, which is expected for this time of year (Figure 7). Slightly cooler waters (purple colors, <28 °C) appear intermittently near the coast. The bias analysis (panels F–I) reveals initially a strong bi-pole bias (red in the NE and blue in the SW in panel F), whose mean shows a good agreement in the observed and forecasted SST time series (subbox in Panel A). As time progresses, the bias does not show a clear pattern.

Figure 12. Analysis of an extreme-intense upwelling event in Region 4 (2–4 March 2022): (A) Year-long context illustrating seasonal SST variations, the 22-year climatology (median in blue), and the 25th percentile threshold for upwelling event identification (green). The SST anomaly during the upwelling event is highlighted in red, while other upwelling events are marked in orange. The insert box corresponding to the gray-shaded region displays the 30-day SST forecast time series (observed values in black, predicted values in brown). (B–D) Daily observed SST maps during the upwelling event. (E–G) Bias analysis presents the difference between spatial forecast and observed SST.

Figure 13. Analysis of an extreme-intense upwelling event in Region 4 (2–7 October 2021): (A) Year-long context illustrating seasonal SST variations, the 22-year climatology (median in blue), and the 25th percentile threshold for upwelling event identification (green). The SST anomaly during the upwelling event is highlighted in red, while other upwelling events are marked in orange. The insert box corresponding to the gray-shaded region displays the 30-day SST forecast time series (observed values in black, predicted values in brown). (B–E) Daily observed SST maps during the upwelling event. (F–I) Bias analysis presents the difference between spatial forecast and observed SST.

4. Discussion

This study evaluates the applicability of foundational models, Chronos and Lag-Llama, for SST forecasting in the Caribbean Sea, evaluating four regions with different air–ocean dynamics, but focusing on the southern Caribbean upwelling system. Using 22 years of observed SST data, the models were assessed based on their ability to predict SST variations across multiple forecast horizons (1 to 30 days). To our knowledge, this is the first study that uses this type of model to forecast SST in this region, highlighting the novelty of the methodology proposed to forecast environmental variables in the Caribbean Sea.

Results demonstrate that the Chronos model consistently outperforms Lag-Llama and two statistical-based models across all metrics, maintaining higher accuracy and lower error propagation over extended forecast horizons. Lag-Llama, although designed for probabilistic forecasting, struggles with larger errors at increasing horizons, particularly in regions with pronounced SST gradients caused by upwelling events. The Prophet statistical model performance is close to Chronos and overcomes Lag-Llama. Chronos’s superior performance stems from its ability to capture intricate non-linear patterns across diverse datasets using a zero-shot approximation, which classical models and probabilistic frameworks fail to fully address [40].

SSTs in the four assessed regions concentrate their variance on the annual period. Chronos effectively captures dominant low-frequency SST signals, especially when a ≤7-day forecast horizon is used (Figure 5 and Figure S2). Therefore, the SST forecast deteriorates with larger temporal horizons (Table 1). In addition, model forecast accuracy varies regionally, reducing in areas of complex ocean dynamics, such as in the upwelling regions, as shown in Regions 1 and 4 (Figure 6).

In Section 1, we highlight the importance of SST predictability in the region, for climate studies and at high frequencies. Previous studies focus on annual, interannual, and climate SST forecasts [10,11,13,59]. As we found that the Chronos model adequately forecasts most of the SST variability, we moved a step forward and assessed the ability of the model to predict extreme upwelling events. Therefore, in this work, we include the assessment of such extreme events in two particular regions based on observations. Regional implications of achieving high-frequency SST forecast are mentioned later in this Section.

To our knowledge, extreme upwelling events in the southern Caribbean upwelling system have not been previously studied. In the assessment of these events, we first described the observed SST seasonality in the spatially averaged Regions 1 and 4. We show that the latter has stronger (colder) upwelling events, due to its larger SST seasonal range and a larger difference between the median and percentile 25 (P25). An extreme event occurs if SST is below the daily SST P25 for at least three consecutive days. Thus, extreme events were determined from a seasonal variating threshold. Each event was described by three metrics, its mean intensity (SST below the threshold), days of duration, and frequency (number of events per assessed period).

These metrics’ annual behavior shows that interannual variability modulates extreme upwelling events in both regions (Figure 8), changing intensity, duration, and number of events per year (frequency). In addition, mean metrics in Region 1 and 4 are not statistically different; however, interannual variations differ between them regardless of whether they are both in the southern Caribbean upwelling system (Table 2). Insignificant trends indicate that upwelling events did not change in the 22-year period assessed, except for their intensity, probably as a consequence of the Caribbean Sea warming [59]. Monthly averaged metrics indicate that extreme upwelling events are not very frequent (a non- or one event per month), lasting about 4 to 14 days and forcing a temperature reduction of −0.6 to −1.0 °C (Figure 9). In addition, the seasonality of these extremes does not seem to be coupled with the upwelling regional seasonality (Figure 7).

We further assessed the ability of the Chronos model to forecast extreme upwelling events in Regions 1 and 4 (Figure 10, Figure 11, Figure 12 and Figure 13). Our results show the ability of the model to forecast these high-frequency events on top of the seasonal varying SST time series. However, a closer look at the spatial SST biases between the observations and forecast shows some limitations of the model in resembling the upwelling extreme events. An important consideration is that the Chronos model used to assess upwelling events was calibrated with data up to 6 September 2019. Consequently, it is unreasonable to expect accurate forecasts for events in 2021 and 2022, as the model lacks updated context to capture specific events on this synoptic scale. While these limitations are acknowledged, the primary objective was to evaluate the model’s capability to broadly reproduce such events. The promising results indicate the potential for further development. Future efforts should focus on operationalizing the model, although this lies beyond the scope of the present study.

To better understand the impacts of extreme upwelling events assessed in this study, future research should focus on investigating the biological responses to these phenomena and their comparison to seasonal upwelling responses (P25 in Figure 7). Such studies could examine changes in primary productivity, shifts in phytoplankton communities, and impacts on higher trophic levels, such as fish and marine mammals. These responses are critical to understanding the ecological consequences of upwelling-driven cooling events and their potential modulation under future warming scenarios. Integrating biological monitoring with SST forecasting models like Chronos could also provide insights into ecosystem resilience and inform sustainable management strategies in the southern Caribbean upwelling system.

The novel methodology of leveraging foundational models provides several benefits. These include reduced computational requirements, flexibility for deployment in resource-limited settings, and applicability across diverse environmental forecasting tasks without extensive retraining. These advantages make foundational models a promising tool for broad-scale SST forecasting for different purposes. Additionally, the zero-shot learning capability of Chronos minimizes the dependence on region-specific training, enhancing its adaptability for global applications.

The proposed foundational models represent a step forward in SST forecasting; however, their performance in regions with strong dynamic oceanographic processes like upwelling requires further refinement. Future efforts should focus on integrating physical oceanographic principles into these data-driven frameworks to enhance their reliability and accuracy.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17030517/s1, Figure S1: Taylor diagrams of daily SST in four regions using four different models (AR, Chronos, Lag-Llama and Prophet) against MUR, for 7 September 2019 and 31 December 2023 period respectively. Blue dotted lines represent the 0.75 and 0.99 correlation values and gray concentric circles represent the Root Mean Square Error (RMSE); Figure S2: Power spectral density of (a) observed sea surface temperature (SST) and forecast residuals in (b) Region 1 and (c) Region 2, using 7 and 28 days forecast horizons. Vertical dashed black lines indicate 7 days, 30 days, and 6 months. In each spectrum, the three frequencies with the largest power spectral density are indicated with circles and crosses; Figure S3: Spatial distribution of Root Mean Squared Error (RMSE) in SST forecasts for Region 1 (A–E) and Region 4 (G–K) at forecast horizons of 1, 7, 14, 21, and 28 days. Error-values are shown in °C across all panels; Figure S4: Spatial distribution of Mean Squared Error (MSE) in SST forecasts for Region 1 (A–E) and Region 4 (G–K) at forecast horizons of 1, 7, 14, 21, and 28 days. Error-values are shown in °C across all panels; Figure S5: Spatial distribution of Median Absolute Error (MED) in SST forecasts for Region 1 (A–E) and Region 4 (G–K) at forecast horizons of 1, 7, 14, 21, and 28 days. Error values are shown in °C across all panels.

Author Contributions

Conceptualization, D.F.B.U. and R.R.T.P.; methodology, D.F.B.U. and L.R.-L.; software, D.F.B.U.; validation, D.F.B.U., R.R.T.P. and L.R.-L.; formal analysis, D.F.B.U.; investigation, D.F.B.U.; resources, L.B.; data curation, D.F.B.U.; writing—original draft preparation, D.F.B.U.; writing—review and editing, R.R.T.P., L.B. and L.R.-L.; visualization, D.F.B.U.; supervision, R.R.T.P. and L.R.-L.; project administration, R.R.T.P.; funding acquisition, L.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Available upon request from the corresponding authors. Repo: https://github.com/dfbustosus/Transformers_SST_CAR (accessed on 1 February 2025).

Acknowledgments

We acknowledge NASA for the Multi-scale Ultra-high Resolution (MUR) Sea Surface Temperature (SST) Analyses from the Making Earth System Data Records for Use in Research Environments (MEaSUREs) Program.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bindoff, N.L.; Willebrand, J.; Artale, V.; Cazenave, A.; Gregory, J.M.; Gulev, S.; Hanawa, K.; Le Quere, C.; Levitus, S.; Nojiri, Y. Observations: Oceanic Climate Change and Sea Level; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
Levitus, S.; Antonov, J.I.; Boyer, T.P.; Locarnini, R.A.; Garcia, H.E.; Mishonov, A.V. Global Ocean Heat Content 1955–2008 in Light of Recently Revealed Instrumentation Problems. Geophys. Res. Lett. 2009, 36. [Google Scholar] [CrossRef]
Doney, S.C.; Ruckelshaus, M.; Emmett Duffy, J.; Barry, J.P.; Chan, F.; English, C.A.; Galindo, H.M.; Grebmeier, J.M.; Hollowed, A.B.; Knowlton, N. Climate Change Impacts on Marine Ecosystems. Annu. Rev. Mar. Sci. 2012, 4, 11–37. [Google Scholar] [CrossRef] [PubMed]
IPCC. The Physical Science Basis. In Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; IPCC: Geneva, Switzerland, 2021. [Google Scholar]
Clegg, P.; Mahon, R.; McConney, P.; Oxenford, H. The Caribbean Blue Economy; Routledge: London, UK, 2020; ISBN 1-00-022707-3. [Google Scholar]
Martinez, S. THEMATIC REPORT FOR THE CENTRAL/SOUTH AMERICAN SUB-REGION. Unpublished. In CLME Project Implementation Unit, Centre for Resource Management and Environmental Studies (CERMES); University of the West Indies: Saint Michael, Barbados, 2007; Available online: https://clmeplus.org/web/app/uploads/2020/04/CLME-PDF-B-2007-Preliminary-TDA-Central-South-America.pdf (accessed on 1 February 2025).
Wilkinson, T.A. Marine Ecoregions of North America; Instituto Nacional de Ecología: Mexico City, Mexico, 2009; ISBN 2-923358-41-4. [Google Scholar]
Andrade-Amaya, C.A. The Circulation and Variability of the Colombian Basin in the Caribbean Sea. Ph.D. Tesis, University of Wales, Cardiff, Wales, 2000. [Google Scholar]
Torres, R.R.; Latandret, S.; Salon, J.; Dagua, C. Water Masses in the Caribbean Sea and Sub-Annual Variability in the Guajira Upwelling Region. Ocean Dyn. 2023, 73, 39–57. [Google Scholar] [CrossRef]
Penland, C.; Matrosova, L. Prediction of Tropical Atlantic Sea Surface Temperatures Using Linear Inverse Modeling. J. Clim. 1998, 11, 483–496. [Google Scholar] [CrossRef]
Repelli, C.A.; Nobre, P. Statistical Prediction of Sea-Surface Temperature over the Tropical Atlantic. Int. J. Climatol. 2004, 24, 45–55. [Google Scholar] [CrossRef]
Ashby, S.A.; Taylor, M.A.; Chen, A.A. Statistical Models for Predicting Rainfall in the Caribbean. Theor. Appl. Climatol. 2005, 82, 65–80. [Google Scholar] [CrossRef]
Villegas, N.; Malikov, I.; Farneti, R. Sea Surface Temperature in Continental and Insular Coastal Colombian Waters: Observations of the Recent Past and near-Term Numerical Projections. Lat. Am. J. Aquat. Res. 2021, 49, 307–328. [Google Scholar] [CrossRef]
Bell, M.J.; Schiller, A.; Le Traon, P.-Y.; Smith, N.R.; Dombrowsky, E.; Wilmer-Becker, K. An Introduction to GODAE OceanView. J. Oper. Oceanogr. 2015, 8, s2–s11. [Google Scholar] [CrossRef]
Rawlins, F.; Ballard, S.P.; Bovis, K.J.; Clayton, A.M.; Li, D.; Inverarity, G.W.; Lorenc, A.C.; Payne, T.J. The Met Office Global Four-dimensional Variational Data Assimilation Scheme. Q. J. R. Meteorol. Soc. J. Atmospheric Sci. Appl. Meteorol. Phys. Oceanogr. 2007, 133, 347–362. [Google Scholar] [CrossRef]
Hey, T.; Tansley, S.; Tolle, K.M. The Fourth Paradigm: Data-Intensive Scientific Discovery; Microsoft research Redmond: Redmond, WA, USA, 2009; Volume 1. [Google Scholar]
Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:210807258. [Google Scholar]
Jahanbakht, M.; Xiang, W.; Azghadi, M.R. Sea Surface Temperature Forecasting with Ensemble of Stacked Deep Neural Networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1502605. [Google Scholar] [CrossRef]
Taylor, J.; Feng, M. A Deep Learning Model for Forecasting Global Monthly Mean Sea Surface Temperature Anomalies. Front. Clim. 2022, 4, 932932. [Google Scholar] [CrossRef]
Fei, T.; Huang, B.; Wang, X.; Zhu, J.; Chen, Y.; Wang, H.; Zhang, W. A Hybrid Deep Learning Model for the Bias Correction of Sst Numerical Forecast Products Using Satellite Data. Remote Sens. 2022, 14, 1339. [Google Scholar] [CrossRef]
Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S. Chronos: Learning the Language of Time Series. arXiv 2024, arXiv:240307815. [Google Scholar]
Rasul, K.; Ashok, A.; Williams, A.R.; Khorasani, A.; Adamopoulos, G.; Bhagwatkar, R.; Biloš, M.; Ghonia, H.; Hassen, N.V.; Schneider, A. Lag-Llama: Towards Foundation Models for Time Series Forecasting. arXiv 2023, arXiv:231008278. [Google Scholar]
Zhou, T.; Niu, P.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained Lm. Adv. Neural Inf. Process. Syst. 2023, 36, 43322–43355. [Google Scholar]
Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.-Y.; Liang, Y.; Li, Y.-F.; Pan, S. Time-Llm: Time Series Forecasting by Reprogramming Large Language Models. arXiv 2023, arXiv:231001728. [Google Scholar]
Montoya-Sánchez, R.A.; Devis-Morales, A.; Bernal, G.; Poveda, G. Seasonal and Interannual Variability of the Mixed Layer Heat Budget in the Caribbean Sea. J. Mar. Syst. 2018, 187, 111–127. [Google Scholar] [CrossRef]
Varona, H.L.; Veleda, D.; Silva, M.; Cintra, M.; Araujo, M. Amazon River Plume Influence on Western Tropical Atlantic Dynamic Variability. Dyn. Atmos. Ocean. 2019, 85, 1–15. [Google Scholar] [CrossRef]
Saha, A.; Serra, N.; Stammer, D. Growth and Decay of Northwestern Tropical Atlantic Barrier Layers. J. Geophys. Res. Oceans 2021, 126, e2020JC016956. [Google Scholar] [CrossRef]
Balaguru, K.; Chang, P.; Saravanan, R.; Leung, L.R.; Xu, Z.; Li, M.; Hsieh, J.-S. Ocean Barrier Layers’ Effect on Tropical Cyclone Intensification. Proc. Natl. Acad. Sci. USA 2012, 109, 14343–14347. [Google Scholar] [CrossRef] [PubMed]
Mignot, J.; Lazar, A.; Lacarra, M. On the Formation of Barrier Layers and Associated Vertical Temperature Inversions: A Focus on the Northwestern Tropical Atlantic. J. Geophys. Res. Oceans 2012, 117. [Google Scholar] [CrossRef]
Foltz, G.R.; McPhaden, M.J. Impact of Barrier Layer Thickness on SST in the Central Tropical North Atlantic. J. Clim. 2009, 22, 285–299. [Google Scholar] [CrossRef]
Bustos Usta, D.F.; Torres Parra, R.R.; Rodríguez-López, L.; Castillo Alvarez, M.; Bourrel, L. Observation and Projection of Marine Heatwaves in the Caribbean Sea from CMIP6 Models. Remote Sens. 2024, 16, 2357. [Google Scholar] [CrossRef]
Rueda-Roa, D.T.; Muller-Karger, F.E. The Southern Caribbean Upwelling System: Sea Surface Temperature, Wind Forcing and Chlorophyll Concentration Patterns. Deep Sea Res. Part Oceanogr. Res. Pap. 2013, 78, 102–114. [Google Scholar] [CrossRef]
Alonso del Rosario, J.J.; Vidal Pérez, J.M.; Blázquez Gómez, E. On the Prediction of Upwelling Events at the Colombian Caribbean Coasts from Modis-SST Imagery. Sensors 2019, 19, 2861. [Google Scholar] [CrossRef]
Rueda-Roa, D.T.; Ezer, T.; Muller-Karger, F.E. Description and Mechanisms of the Mid-Year Upwelling in the Southern Caribbean Sea from Remote Sensing and Local Data. J. Mar. Sci. Eng. 2018, 6, 36. [Google Scholar] [CrossRef]
Townhill, B.L.; Birchenough, S.N.; Engelhard, G.H.; Harrod, O.; McHarg, E.; Monnereau, I.; Buckley, P.J. Responding to Climate Change in Caribbean Fisheries and Aquaculture through Adaptation; CLME Hub: Cartagena, Colombia, 2021; pp. 1–57. [Google Scholar]
GEBCO. GEBCO Gridded Bathymetry Data. Available online: https://www.gebco.net/data_and_products/gridded_bathymetry_data/ (accessed on 1 February 2025).
Chin, T.M.; Vazquez-Cuervo, J.; Armstrong, E.M. A Multi-Scale High-Resolution Analysis of Global Sea Surface Temperature. Remote Sens. Environ. 2017, 200, 154–169. [Google Scholar] [CrossRef]
Giesen, R.; Stoffelen, A.; Verhoef, A. Product User Manual; Copernicus Marine Service: 2024. Available online: https://documentation.marine.copernicus.eu/PUM/CMEMS-WIND-PUM-012-004-006.pdf (accessed on 31 January 2025).
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
Ansari, A.F.; Heng, A.; Lim, A.; Soh, H. Neural continuous-discrete state space models for irregularly-sampled time series. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 926–951. [Google Scholar]
Zhang, B.; Sennrich, R. Root Mean Square Layer Normalization. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. Roformer: Enhanced Transformer with Rotary Position Embedding. Neurocomputing 2024, 568, 127063. [Google Scholar] [CrossRef]
Wang, H.; Huang, J.; Zhou, H.; Zhao, L.; Yuan, Y. An Integrated Variational Mode Decomposition and ARIMA Model to Forecast Air Temperature. Sustainability 2019, 11, 4018. [Google Scholar] [CrossRef]
De Saa, E.; Ranathunga, L. Comparison between Arima and Deep Learning Models for Temperature Forecasting. arXiv 2020, arXiv:201104452. [Google Scholar]
Nawi, W.; Lola, M.S.; Zakariya, R.; Zainuddin, N.H.; Abd Hamid, A.A.K.; Aruchunan, E.; Nazzrol, N.S.A. Improved of Forecasting Sea Surface Temperature Based on Hybrid Arima and Support Vector Machines Models. Malays. J. Fundam. Appl. Sci. 2021, 17, 609–620. [Google Scholar] [CrossRef]
Taylor, S.J.; Letham, B. Forecasting at Scale. Am. Stat. 2018, 72, 37–45. [Google Scholar] [CrossRef]
van de Sande, S.N.; Alsahag, A.M.; Mohammadi Ziabari, S.S. Enhancing the Predictability of Wintertime Energy Demand in The Netherlands Using Ensemble Model Prophet-LSTM. Processes 2024, 12, 2519. [Google Scholar] [CrossRef]
Zrira, N.; Kamal-Idrissi, A.; Farssi, R.; Khan, H.A. Time Series Prediction of Sea Surface Temperature Based on BiLSTM Model with Attention Mechanism. J. Sea Res. 2024, 198, 102472. [Google Scholar] [CrossRef]
Torres, R.R.; Tsimplis, M.N. Seasonal Sea Level Cycle in the Caribbean Sea. J. Geophys. Res. Oceans 2012, 117. [Google Scholar] [CrossRef]
Benazzouz, A.; Mordane, S.; Orbi, A.; Chagdali, M.; Hilmi, K.; Atillah, A.; Pelegrí, J.L.; Hervé, D. An Improved Coastal Upwelling Index from Sea Surface Temperature Using Satellite-Based Approach—The Case of the Canary Current Upwelling System. Cont. Shelf Res. 2014, 81, 38–54. [Google Scholar] [CrossRef]
Jacox, M.G.; Edwards, C.A.; Hazen, E.L.; Bograd, S.J. Coastal Upwelling Revisited: Ekman, Bakun, and Improved Upwelling Indices for the US West Coast. J. Geophys. Res. Oceans 2018, 123, 7332–7350. [Google Scholar] [CrossRef]
Bograd, S.J.; Jacox, M.G.; Hazen, E.L.; Lovecchio, E.; Montes, I.; Pozo Buil, M.; Shannon, L.J.; Sydeman, W.J.; Rykaczewski, R.R. Climate Change Impacts on Eastern Boundary Upwelling Systems. Annu. Rev. Mar. Sci. 2023, 15, 303–328. [Google Scholar] [CrossRef]
Fielding, P.J.; Davis, C.L. Carbon and Nitrogen Resources Available to Kelp Bed Filter Feeders in an Upwelling Environment. Mar. Ecol. Prog. Ser. 1989, 55, 181–189. [Google Scholar] [CrossRef]
Orfila, A.; Urbano-Latorre, C.P.; Sayol, J.M.; Gonzalez-Montes, S.; Caceres-Euse, A.; Hernández-Carrasco, I.; Muñoz, Á.G. On the Impact of the Caribbean Counter Current in the Guajira Upwelling System. Front. Mar. Sci. 2021, 8, 626823. [Google Scholar] [CrossRef]
Schlegel, R.W.; Smit, A.J. heatwaveR: A Central Algorithm for the Detection of Heatwaves and Cold-Spells. J. Open Source Softw. 2018, 3, 821. [Google Scholar] [CrossRef]
Abrahams, A.; Schlegel, R.W.; Smit, A.J. Variation and Change of Upwelling Dynamics Detected in the World’s Eastern Boundary Upwelling Systems. Front. Mar. Sci. 2021, 8, 626411. [Google Scholar] [CrossRef]
Bustos Usta, D.F.; Torres Parra, R.R. Projected Wind Changes in the Caribbean Sea Based on CMIP6 Models. Clim. Dyn. 2022, 60, 3713–3727. [Google Scholar] [CrossRef]
Gamble, D.W.; Curtis, S. Caribbean Precipitation: Review, Model and Prospect. Prog. Phys. Geogr. Earth Environ. 2008, 32, 265–276. [Google Scholar] [CrossRef]
Bustos, D.; Torres, R. Ocean and Atmosphere Changes in the Caribbean Sea during the Twenty-First Century Using CMIP5 Models. Ocean Dyn. 2021, 71, 757–777. [Google Scholar] [CrossRef]

Figure 2. Overall Chronos Architecture: (Left) Input time series scaled and quantized to obtain a sequence of context tokens. (Center) The tokens are fed into a language model, and trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Adapted from [21].

Figure 7. Twenty-two-year SST climatology (P50, blue) and 25th percentile (P25, orange) used for upwelling events threshold definition in Regions 1 (A) and 4 (B).

Figure 8. Caribbean Sea (CAR) annual time series of extreme upwelling events properties: (A) averaged mean intensity, (B) averaged mean duration, and (C) frequency (sum of events per year). Time series from Region 1 (dark blue) and Region 4 (dark red). Trends estimated over the 2002–2023 period are shown in each panel for each time series in units/decade. * indicates significant trends at the 90% level.

Table 3. Selected upwelling events for analysis: Prefixes R1 and R4 indicate Regions 1 and 4, respectively, while the suffixes I and D denote the more intense and longest events for each region. Each event entry includes start and end dates, mean intensity, and duration.

Event	Date_Start	Date_End	Mean_Intensity (°C)	Duration (Days)
R1_I	10 September 2021	14 September 2021	−0.602	5
R1_D	28 November 2022	4 December 2022	−0.585	7
R4_I	2 April 2022	4 April 2022	−0.631	3
R4_D	4 October 2021	7 October 2021	−0.634	4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bustos Usta, D.F.; Rodríguez-López, L.; Torres Parra, R.R.; Bourrel, L. Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea. Remote Sens. 2025, 17, 517. https://doi.org/10.3390/rs17030517

AMA Style

Bustos Usta DF, Rodríguez-López L, Torres Parra RR, Bourrel L. Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea. Remote Sensing. 2025; 17(3):517. https://doi.org/10.3390/rs17030517

Chicago/Turabian Style

Bustos Usta, David Francisco, Lien Rodríguez-López, Rafael Ricardo Torres Parra, and Luc Bourrel. 2025. "Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea" Remote Sensing 17, no. 3: 517. https://doi.org/10.3390/rs17030517

APA Style

Bustos Usta, D. F., Rodríguez-López, L., Torres Parra, R. R., & Bourrel, L. (2025). Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea. Remote Sensing, 17(3), 517. https://doi.org/10.3390/rs17030517

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Sea Surface Temperature Forecasting Using Foundational Models: A Novel Approach Assessed in the Caribbean Sea

Abstract

1. Introduction

2. Materials and Methods

2.1. Observational Datasets

2.2. Models’ Architecture

2.2.1. Chronos Model

2.2.2. Lag-Llama Model

2.3. Other Models for Comparison

2.4. Model Evaluation

2.5. Case Study: Forecast Assessment of Extreme Upwelling Events

3. Results

3.1. Evaluation of Models’ Performance

3.2. SST Time Series Predictions

3.3. SST Spatial Predictions

3.4. Observed Extreme Upwelling Events and Their Forecast

3.4.1. Observed Upwelling Seasonality

3.4.2. Observed Annual Variability

3.4.3. Observed Interannual and Sub-Annual Variability

3.4.4. Model’s Ability to Forecast Upwelling Events

4. Discussion

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI