1. Introduction
The study of hydrological time series patterns is essential for effective water resource management. These patterns reveal the regularities and trends in hydrological phenomena, informing water resource management, climate change research, urban planning, infrastructure development, ecological protection, and disaster management. For instance, in watershed management, analyzing long-term time series data enables managers to understand trends in precipitation and flow, which in turn helps optimize reservoir operations and ensure sustainable water resource use [1,2,3,4]. Additionally, during potential floods or droughts, timely and reliable hydrological data are crucial for developing emergency response plans. Analyzing historical data and patterns allows for predictions of disaster likelihood and severity, facilitating the formulation of effective measures to mitigate impacts on communities and the economy [5].
However, changes in hydrological data patterns are influenced by a variety of human and climatic factors. Key human factors include urbanization, which alters natural water flow and increases runoff; irrigation, which affects local water availability; reservoirs, which modify river flows and water storage; water withdrawals, which impact streamflow and groundwater levels; and pollution, which degrades water quality and affects aquatic ecosystems. Climatic factors include precipitation patterns, which determine water input and distribution; temperature changes, which influence evaporation rates and snowmelt; evapotranspiration, which affects water loss from land surfaces; and phenomena such as El Niño and La Niña, which impact global weather patterns and, consequently, hydrological cycles [6,7,8,9].
Due to limitations in data collection periods, hydrologists often rely on time series modeling techniques to reconstruct and forecast data, thereby improving the reliability and accuracy of their systems. The ARIMA (Autoregressive Integrated Moving Average) model is a widely used statistical tool for generating and forecasting time series data [10]. ARIMA models assume linear relationships within the data, which may not effectively capture the complexities of time series with nonlinear characteristics. Additionally, ARIMA models require that the time series be stationary before modeling; while differencing can transform non-stationary series into stationary ones, this approach is not always suitable for data generation.
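The stationarity requirement can be made concrete with a short sketch: first-order differencing (the "I" in ARIMA) removes a linear trend, although, as noted above, the differenced series is not always suitable for data generation. The series below is synthetic and purely illustrative.

```python
import numpy as np

# Illustrative non-stationary series: a linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.05 * t + rng.normal(0.0, 0.1, size=t.size)

# First-order differencing removes the linear trend:
diff = np.diff(series)

# The differenced series fluctuates around the trend slope (0.05 here),
# while the mean of the original series drifts upward over time.
print(series[:50].mean() < series[-50:].mean())
```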
On the other hand, neural networks are often used for forecasting, which is predicting future values based on historical data [11,12,13,14]. However, their “black box” nature can make model predictions less transparent, affecting their credibility. Hochreiter and Schmidhuber [15] introduced the Long Short-Term Memory (LSTM) network, which captures long-term dependencies in sequential data and can therefore be used to model and generate non-stationary time series. Nevertheless, in practice, LSTMs can still face challenges when dealing with long sequences or complex dependency relationships.
The Fourier Transform is another method used to analyze the frequency components of a signal. It transforms a signal from the time domain to the frequency domain by decomposing the signal into a sum of sine and cosine functions, enabling the analysis of its frequency components. This method is particularly well-suited for processing stationary signals, especially those that are periodic or have distinct frequency components. However, the Fourier Transform is less effective for handling signals with long-term or slowly varying components.
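As an illustration of the frequency-domain view described above, the sketch below recovers the dominant frequency of a synthetic periodic signal with the discrete Fourier transform; the sampling rate and signal components are assumptions for the example, not data from this study.

```python
import numpy as np

fs = 365.0                       # samples per year (daily data, illustrative)
t = np.arange(0, 10, 1 / fs)     # ten years of daily samples
signal = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 4.0 * t)

# Transform to the frequency domain and locate the strongest component.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)  # in cycles per year

dominant = freqs[np.argmax(spectrum)]
print(dominant)  # the 1 cycle/year component dominates
```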
In contrast, Empirical Mode Decomposition (EMD) is highly effective for analyzing nonlinear and non-stationary signals. EMD decomposes signals into Intrinsic Mode Functions (IMFs) and a residual trend component, making it useful for handling signals with trends and complex components. The original data can be reconstructed by summing all IMFs and the residual. Huang et al. [16] provide a detailed description of the sifting process used to extract IMFs. Karthikeyan and Kumar [17] used EMD to decompose time series, fitted the resulting components with autoregressive models, and combined the forecasts to obtain final predictions; however, fitting the nonlinear decomposed components with linear ARMA models proved unsuitable. Zhang et al. [18] investigated long-term runoff fluctuations in the Jing River of the Yellow River using EMD. Their findings revealed that the river’s annual runoff displays fluctuations across multiple timescales. Huang and Lee [19] demonstrated the feasibility of using a combination of EMD and statistical regression analysis to assess typhoon frequency. They also demonstrated the applicability of EMD for projecting non-stationary daily Sea Surface Temperature (SST) series.
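To make the sifting idea concrete, the sketch below performs a single, heavily simplified sifting pass: the signal minus the mean of its upper and lower extrema envelopes. Linear envelopes and naive endpoint pinning are used here for brevity (EMD proper interpolates the extrema with cubic splines and iterates with a stopping criterion), so this is not the full algorithm of Huang et al. [16]; the test signal is illustrative.

```python
import numpy as np

def sift_once(x, t):
    """One simplified sifting pass: subtract the mean of the upper and
    lower envelopes through the local extrema (linear envelopes here)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # local maxima
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1   # local minima
    # Pin the endpoints so the envelopes span the whole record.
    maxima = np.concatenate(([0], maxima, [x.size - 1]))
    minima = np.concatenate(([0], minima, [x.size - 1]))
    upper = np.interp(t, t[maxima], x[maxima])
    lower = np.interp(t, t[minima], x[minima])
    return x - (upper + lower) / 2.0

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 12 * t) + np.sin(2 * np.pi * 1 * t)  # fast + slow mode

imf1 = sift_once(x, t)     # rough estimate of the fastest mode
residue = x - imf1         # left over for further sifting

# Reconstruction is exact by construction: summing the parts restores x.
print(np.allclose(imf1 + residue, x))
```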
Given that many hydrological variables, such as precipitation, flow, temperature, and evaporation, are exhibiting trends of gradual increase or decrease, generating data for research requires addressing the challenge of reproducing non-stationary sequences. For example, sea surface temperatures have consistently risen throughout the 20th century, with an average increase of 0.14 °F per decade from 1901 to 2020 (NOAA, https://www.ncei.noaa.gov/products/extended-reconstructed-sst, 16 September 2024). The significant rise in SSTs over the past three decades underscores the impact of climate change. When using SST data in related studies, shifts in its statistical properties make ARIMA models impractical for data generation. Additionally, non-stationary SST sequences pose challenges for examining the relationship between SST and El Niño events.
To address these challenges, this paper proposes a novel method for synthesizing non-stationary data and applying it to El Niño forecasting. The proposed method employs EMD to generate non-stationary sequences, advancing the research objectives.
3. Results and Discussion
3.1. El Niño
The “El Niño phenomenon” is characterized by opposing changes in sea surface temperatures between the eastern and western Pacific Oceans, coupled with a corresponding seesaw-like oscillation in atmospheric pressure [23]. This phenomenon can have widespread effects, altering atmospheric circulation patterns and causing significant impacts across various regions worldwide [24,25]. The Oceanic Niño Index (ONI) is the primary tool used by NOAA to monitor El Niño and La Niña events (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php, 16 September 2024). The ONI is calculated by taking the actual monthly sea surface temperatures in the Niño 3.4 region (5° S–5° N, 120° W–170° W), subtracting the 30-year long-term monthly average for that month, and then averaging the differences over three consecutive months to obtain the representative mid-month ONI value. An ONI value exceeding 0.5 °C for at least five consecutive months indicates an El Niño event, while a value below −0.5 °C for five consecutive months indicates a La Niña event. Due to the noticeable warming trend in the Niño 3.4 region since 1950, NOAA uses multiple 30-year base periods to define the ONI.
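The ONI recipe described above can be sketched as follows. The synthetic 30-year base period and the uniformly warm assessment year are illustrative assumptions, not NOAA data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative base period: 30 years (360 months) of Niño 3.4 SSTs,
# a seasonal cycle plus noise.
cycle = 26.8 + 1.2 * np.sin(2 * np.pi * np.arange(12) / 12)
base = np.tile(cycle, 30) + rng.normal(0.0, 0.3, 360)

# 30-year long-term average for each calendar month:
climatology = base.reshape(30, 12).mean(axis=0)

# A uniformly warm assessment year, 0.9 °C above the monthly climatology:
year = climatology + 0.9

# Monthly anomalies, then a 3-month running mean, give the ONI:
anomalies = year - climatology
oni = np.convolve(anomalies, np.ones(3) / 3, mode="valid")

# ONI >= 0.5 °C for at least five consecutive overlapping seasons -> El Niño.
print(np.all(oni >= 0.5) and oni.size >= 5)  # True for this warm year
```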
Based on data from a Poisson process, Table 1 reveals that from 1981 to 2022, El Niño conditions occurred an average of 2.93 times per year, whereas La Niña conditions occurred 3.50 times per year. This suggests that La Niña conditions are slightly more frequent than El Niño conditions. The return period for El Niño events is 3.82 years, while for La Niña events, it is 3.23 years. Additionally, events lasting more than one year for both El Niño and La Niña occur approximately every 8.40 years on average.
Currently, there are numerous dynamic and statistical models available for forecasting ONI values, as accessible through the International Research Institute (IRI) for Climate and Society [26,27,28] (https://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/, 16 September 2024). Huang and Lee [19] demonstrated the effectiveness of Empirical Mode Decomposition (EMD) in projecting non-stationary daily SST series and used the projected SST series to estimate the number of typhoons. Given the similar context of rising sea surface temperatures, this study also aims to employ EMD-based SST fitting for non-stationary time series to statistically predict ONI values for each month in the upcoming year, along with the probability distribution of surpassing the ±0.5 °C threshold.
3.2. Adjustments to Oceanic Niño Index Estimates
The Oceanic Niño Index (ONI) is calculated using a 30-year base period, which is updated every five years to cover the assessment period. However, this updating process can cause delays in reflecting current conditions, potentially failing to capture the latest changes in real time. For example, ONI values from 2001 to 2005 were calculated using the base period from 1986 to 2015, while values from 2006 to 2010 were based on the 1991–2020 period. As for the period from 2021 to 2025, the ONI will temporarily use the initial base period (from 1991 to 2020) until the new base period (from 2006 to 2035) is established. This study explores these delays in ONI applications and proposes modifications to improve the index’s responsiveness.
An accuracy assessment based on SST anomalies from 1981 to 2022, shown in Table 2a, reveals that using the final ONI as the reference data results in higher accuracy for El Niño anomalies (warm, ONI ≥ 0.5 °C) but lower accuracy for La Niña anomalies (cold, ONI ≤ −0.5 °C). Overall, the producer’s accuracy rates for El Niño, Neutral, and La Niña anomalies are 99.19%, 84.19%, and 80.27%, respectively, with an overall accuracy of 86.71%. The initial ONI shows more El Niño events and fewer La Niña events compared to the final ONI. For instance, in 1993, the initial ONI indicated El Niño conditions, while the final ONI did not. Similarly, from 1999 to 2000, the final ONI identified La Niña conditions, whereas the initial ONI showed both El Niño and La Niña events.
This research proposes adjustments to ONI estimates to enhance usability; the resulting index is referred to as the adjusted ONI. First, the study suggests using sea temperature data from the 30 years preceding the calculation year as the base period, updated annually. Second, instead of employing a three-month moving average to represent variations, the study recommends using the difference between the current month’s sea surface temperature and the mean value of the base period. In practice, calculating the indicator from one-month SST anomalies is simpler and more convenient.
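A minimal sketch of the adjusted ONI follows, assuming a simple flat monthly array layout; the function name and the illustrative SST values are assumptions for the example, not the study’s implementation.

```python
import numpy as np

def adjusted_oni(monthly_sst, year_index):
    """Adjusted ONI for one calculation year (sketch of the proposal above):
    anomaly of each month against the mean of that calendar month over the
    preceding 30 years; no 3-month moving average is applied."""
    start = (year_index - 30) * 12
    base = monthly_sst[start:year_index * 12].reshape(30, 12)
    climatology = base.mean(axis=0)          # base period updated every year
    target = monthly_sst[year_index * 12:(year_index + 1) * 12]
    return target - climatology

# Illustrative record: 31 years of monthly SSTs with a warm final year.
months = np.tile(26.8 + 1.2 * np.sin(2 * np.pi * np.arange(12) / 12), 31)
months[-12:] += 0.8                          # final year 0.8 °C above normal

oni = adjusted_oni(months, 30)
print(np.allclose(oni, 0.8))                 # True: the warm anomaly is recovered
```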
The study also compares the accuracy of SST anomalies between NOAA’s final ONI (reference data) and the adjusted ONI (classified data). The producer’s accuracy rates for El Niño, Neutral, and La Niña are 97.56%, 85.47%, and 89.80%, respectively, with an overall accuracy of 89.68% (see Table 2b). This performance is comparable to the initial ONI.
Figure 6 demonstrates that the adjusted ONI aligns well with NOAA’s final ONI and can promptly respond to El Niño (e.g., during 1982–1983, 1997–1998, and 2015–2016) and La Niña (e.g., during 1999–2000 and 2021–2022) events.
Currently, NOAA’s “final” ONI serves as the standard for condition judgment. The ONI adjustments proposed in this study offer a simpler, more direct approach, with results closely aligning with the final ONI and outperforming the initial ONI. This revised approach can be used alongside the current ONI to facilitate a more comprehensive climate assessment.
3.3. Characteristics of Long-Term Daily SST Data Decomposed by EMD
The upward trend in sea surface temperatures (SST), as discussed in Section 2.2, highlights the limitations of traditional hydrological models for data generation and prediction in such scenarios. To address these limitations, this study employs Empirical Mode Decomposition (EMD) for SST projection, offering a novel approach for handling non-stationary series.
Using 120 years of daily SST records (1900–2019) from the Niño 3.4 region, EMD decomposes the data into 13 Intrinsic Mode Functions and one residual. Due to space constraints, only a subset of the IMFs is shown in Figure 7. The residual displays an upward trend, starting at 26.812 °C and ending at 28.099 °C, with an average annual increase of 0.0107 °C. The period of the oscillations increases with the IMF level. Specifically, IMF7 exhibits the largest amplitude, fluctuating between ±1 °C, while IMF13 has the smallest amplitude, oscillating between −0.0122 °C and 0.0087 °C. When all IMFs and the residual are combined and compared to the original data, the average error is 9.09 × 10⁻⁶ °C, which is negligible, and the standard deviation is 0.0137 °C, demonstrating the effectiveness of EMD in decomposing the data.
To determine the average period of each IMF, this research utilizes zero-crossing analysis as described by Goda [29]. This technique tracks the change in sign (from positive to negative) through the axis (zero value) to identify the period of each “wave” and then calculates the average period of all “waves” as the period of the IMF. Additionally, the weight of each IMF is defined using the following formula [22]:

E_i = Σ_t [d_i(t)]²,  W_i = E_i / Σ_j E_j

where d_i(t) represents the vertical distance of the i-th IMF at time t; E_i denotes the total energy of the i-th IMF; and W_i represents the weight of the i-th IMF.
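The zero-crossing period and the weight formula above can be sketched as follows; the two synthetic "IMFs" are illustrative sinusoids, not actual EMD output, and the function names are assumptions for the example.

```python
import numpy as np

def average_period(imf, dt=1.0):
    """Average "wave" period via zero-down-crossing analysis (after Goda):
    mean spacing between successive sign changes from positive to negative."""
    signs = np.sign(imf)
    down = np.where((signs[:-1] > 0) & (signs[1:] < 0))[0]
    return np.diff(down).mean() * dt

def imf_weights(imfs):
    """Weights W_i = E_i / sum_j E_j, with E_i the total energy of IMF i."""
    energies = np.array([np.sum(c ** 2) for c in imfs])
    return energies / energies.sum()

t = np.arange(3650.0)                          # ten years of daily steps
imfs = [np.sin(2 * np.pi * t / 30.0),          # roughly monthly oscillation
        2.0 * np.sin(2 * np.pi * t / 365.0)]   # annual, larger amplitude

print(average_period(imfs[0]))                 # close to 30 days
print(imf_weights(imfs))                       # the larger annual mode dominates
```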
Table 3 shows that the time periods of IMFs in the Niño 3.4 region are similar to those in the typhoon birthplace region, particularly IMF1 to IMF7. Notably, the average period of IMF7 is 347.22 days, approximately one year. As the IMF level increases, the number of “wave” samples decreases, leading to an increase in the time period. The cumulative weight of IMFs from IMF1 to IMF7 in the Niño 3.4 region is 75.61%, with IMF7 contributing the most significant weight at 54.11%, indicating its primary role in data reconstruction. However, the amplitude pattern based on IMF7 does not fully reflect actual SST changes. In contrast, in the typhoon region, the cumulative weight ratio of IMFs from IMF1 to IMF7 reaches 96.82%, with IMF7 accounting for 90.15% of the weight. These differences are attributed to variations in amplitude. For example, the temperature change of IMF7 in the Niño 3.4 area is concentrated between ±1 °C, while in the typhoon region, IMF7’s temperature fluctuation spans ±2 °C [19].
3.4. SST Projection Based on a 5-Year Moving Process
Using SST data from 1900 to 2019, the data are segmented into 120 one-year-long daily sea surface temperature series. Each series is then decomposed using EMD into seven IMFs and one residual (p = 7 here).
Figure 8 presents the residuals for each year across the past 120 years, demonstrating EMD’s ability to detect significant SST trends, such as the peak in 2015 and the trough in 1981 (refer to Figure 2). An autocorrelation analysis is conducted on annual average sea surface temperature data from 1900 to 2019. With 120 annual observations, the 95% confidence interval for the autocorrelation sequence of a white noise process is approximately ±0.18. The analysis reveals that the autocorrelation at lag 5 is 0.23, which is marginally above the confidence interval, whereas autocorrelation values for lags greater than 5 fall within the confidence interval and are therefore not statistically significant. If L = 5, then 390,625 sequences can be generated from the data of the past 5 years, as indicated by L^(p + 1) = 5^8. Conversely, with L = 4, only 4^8 = 65,536 sequences can be generated. Considering the larger number of sequences that can be handled, L = 5 is regarded as the more suitable choice for this study.
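The white-noise confidence band and the lag check can be sketched as follows. The 5-year cycle in the test series is an illustrative assumption, chosen only so that the lag-5 autocorrelation is visibly significant.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

n = 120                               # 120 annual observations
band = 1.96 / np.sqrt(n)              # 95% white-noise band, about +/- 0.18
print(round(band, 2))                 # 0.18

# Deterministic check: a series with a 5-year cycle has a strong lag-5 ACF.
series = np.sin(2 * np.pi * np.arange(n) / 5.0)
print(acf(series, 5) > band)          # True: lag-5 correlation exceeds the band
```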
Figure 9 illustrates the projection process: the SST estimate for year 6 is generated from observations from years 1 to 5, the estimate for year 7 from years 2 to 6, and so forth. This 5-year moving process retains the data variations from the preceding five years, capturing potential non-stationary characteristics.
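The recombination count behind L^(p + 1) can be sketched as follows: each of the eight components (seven IMFs plus the residual) of a synthetic year is drawn independently from one of the L = 5 preceding years. This is index bookkeeping only; building an actual series would sum the chosen components.

```python
import numpy as np
from itertools import product

L, p = 5, 7                # 5 source years, 7 IMFs (plus 1 residual)
components = p + 1

# Every synthetic year draws each component from one of the L source years:
n_sequences = L ** components
print(n_sequences)         # 390625, matching L^(p + 1)

# Enumerate all component-source recipes (indices only):
choices = list(product(range(L), repeat=components))
print(len(choices) == n_sequences)  # True

# In practice one may also sample recipes at random:
rng = np.random.default_rng(0)
sample = rng.integers(0, L, size=components)  # one synthetic year's recipe
```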
In the projection process, each year is treated as an individual entity, with the IMFs and residual from that year’s data resembling unique chromosomal traits. Just as genetic recombination can create new individuals, this process can generate new SST data; data anomalies in this context are analogous to chromosomal abnormalities. Each year’s SST series can be viewed as a potential representation of daily SST values for that year, allowing for the projection of daily sea surface temperatures from 1905 to 2020. The study compares the actual annual average SST of the sixth year (the projected year) with the annual average SSTs of the previous five years. If the actual SST of the projected year is not an extreme value (i.e., neither the highest nor the lowest), the monthly SSTs for that year generally fall within the interquartile range of the predicted values. Conversely, if the actual SST of the projected year is an extreme value, the monthly SSTs for that year often fall outside the interquartile range of the predicted values.
The SST projection is exemplified using the years 1981, 1982, 1997, 1999, 2000, and 2015. The estimated monthly values are categorized into quartiles, with results shown in Figure 10. As indicated in Figure 2, compared to the sea temperatures of the previous five years, 1981 had the lowest average temperature, while 1997 and 2015 had the highest. The monthly SSTs for 1981 mostly fall below the first quartile, whereas those for 1997 and 2015 are mostly above the third quartile. In contrast, the annual mean temperatures for 1982, 1999, and 2000 were not extreme values. Therefore, the monthly SSTs for these years generally fall within the interquartile range.
A comparison between the actual annual sea surface temperature and the distribution of the synthetic annual sea surface temperatures for the Niño 3.4 region from 1905 to 2019 is also provided (see Figure 11). Generally, unless the actual values for the projected years are extreme, such as in 1981, 1997, and 2015, most of the actual values fall within the box plot range of the synthetic series. The study concludes that the 5-year moving process based on EMD effectively tracks future SST trends, reflecting the characteristics of non-stationary sequences.
3.5. El Niño Forecast
The projected sea surface temperature data will be used to forecast El Niño events. The forecast is categorized into three levels: El Niño, Neutral, and La Niña, based on specific temperature thresholds. An adjusted calculation of the Oceanic Niño Index (ONI) will be employed for the assessment. The periods 1997–2000 and 2015–2016 are used as examples, with 1997 recording the highest temperature of the 20th century (annual average of 28.34 °C) and 2015 marking the highest temperature in history (annual average of 29.12 °C).
As illustrated in Figure 10, the 5-year moving process produces 390,625 SST estimates each month. By comparing these estimates with the historical monthly average SST over the past 30 years, the presence of El Niño can be determined for each month. By analyzing all 390,625 results, the probabilities of El Niño, Neutral, and La Niña occurrences can be calculated.
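Turning a monthly ensemble into condition probabilities can be sketched as follows. The Gaussian ensemble below stands in for the 390,625 recombined estimates and is purely illustrative; its location and spread are assumptions, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble of adjusted-ONI values for one month (°C); a real run
# would use the 390,625 recombined SST estimates minus the 30-year climatology.
oni_ensemble = rng.normal(loc=0.3, scale=0.5, size=390_625)

p_el_nino = np.mean(oni_ensemble >= 0.5)    # warm condition
p_la_nina = np.mean(oni_ensemble <= -0.5)   # cold condition
p_neutral = 1.0 - p_el_nino - p_la_nina

print(p_el_nino > p_la_nina)                # True: warm bias in this illustration
```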
Figure 12 presents the probabilities of El Niño and La Niña occurrences for each month from 1997 to 2000 and from 2015 to 2016.
To provide a detailed illustration of the El Niño forecasting process, consider the year 1997 as an example:
- (i) EMD Generation: Using EMD, 390,625 annual sequences are generated based on SST data from 1992 to 1996 (as shown in Figure 10).
- (ii) ONI Estimation: The historical average monthly SST from 1967 to 1996 is subtracted from each monthly SST in the 1997 projected series to obtain the 1-month adjusted ONI value. This results in a total of 390,625 ONI forecast values per month.
- (iii) Probability Distribution Analysis: The probability distribution of the ONI estimates is analyzed to determine the occurrence counts for El Niño, Neutral, and La Niña conditions. This distribution represents the probabilities of potential El Niño conditions in 1997 and is displayed in Figure 12.
Based on the ONI estimate for 1997, the likelihood of El Niño conditions in April was only 13%, compared to over 20% in other months. In August and September, the probability approached 50%, surpassing the probabilities for Neutral and La Niña conditions. Overall, the probability of Neutral conditions was higher from January to July and October to December, while the probability of La Niña conditions was lower.
However, according to NOAA’s retrospective ONI indicator (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php, 16 September 2024), a year-long El Niño event began in May 1997, followed by a La Niña event from July 1998 to the end of 2000. In contrast, Figure 12 shows that during those years, the probability of “El Niño” and “Neutral” varied, while the probability of “La Niña” remained low. Similarly, NOAA’s ONI indicates that an El Niño event occurred from 2015 to April 2016, followed by a short-term La Niña event.
Figure 12 shows that although all three conditions were possible during 2015–2016, “Neutral” was the most likely, while “La Niña” was the least likely.
The study found that when the inferred estimates align with observed values, the adjusted ONI from 1997 to 2000 and from 2015 to 2016 is consistent with NOAA’s retrospective ONI (see Figure 6). This consistency supports the ONI adjustment proposed in Section 3.2. However, due to the inherent uncertainty in SST estimates, predictions of future events can only be expressed in terms of probabilities. The study’s results in Figure 12 differ from NOAA’s retrospective ONI results, suggesting that biases in SST estimates may lead to differences in the probabilities of triggering El Niño, Neutral, or La Niña conditions. The study concludes that the uncertainty in SST estimates is the main reason for the lower incidence of La Niña conditions.
Additionally, the study notes that the original data used in this analysis were sourced from ICOADS Release 3, which directly uses the daily average SST in the Niño 3.4 region as the representative value for analysis. In contrast, the current NOAA Extended Reconstructed Sea Surface Temperature V5 (ERSST) data employ statistical methods to fill in missing SSTs based on the ICOADS dataset and calculate the monthly SST (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/detrend.nino34.ascii.txt, 16 September 2024). As a result, the SST utilized in this study may be slightly elevated due to differences in sampling points.
4. Conclusions
Due to the impact of climate change on the environment, many hydrological variables, such as precipitation, streamflow, temperature, and groundwater levels, exhibit trends of increase or decrease over time. These changing trends pose challenges to traditional hydrological time series models, making it difficult to match them and complicating water resource management or climate change research. The data generation method based on Empirical Mode Decomposition (EMD) proposed in this paper successfully addresses the issue of generating non-stationary sequences. This method can capture the statistical characteristics of non-stationary sequences that change over time.
In this case study, EMD is applied to project non-stationary daily sea surface temperature data, and the resulting data are used to estimate the probabilities of monthly El Niño and La Niña events. The research results demonstrate the practicality of the proposed EMD-based data generation method. Furthermore, in addition to the existing ONI provided by NOAA, the adjusted ONI outlined in this study is simple and practical, making it a useful reference for evaluating El Niño events.
Based on the authors’ experience with EMD synthetic data, because rainfall and streamflow in Taiwan are highly variable, especially during typhoon events, it is generally advisable to apply a logarithmic transformation to the data before conducting EMD analysis. In addition, temperature fluctuations are much smaller than the variability encountered in rainfall/runoff EMD analysis, which generally leads to better EMD results. Moreover, to avoid mode-mixing problems during EMD analysis, it is recommended to use Ensemble Empirical Mode Decomposition (EEMD) for data decomposition.
The forecasts for El Niño and La Niña in this paper differ from NOAA’s predictions. Investigation revealed that the data used in this study are based on ICOADS Release 3, whereas NOAA uses ERSST data, leading to discrepancies in SSTs in the Niño 3.4 region and indirectly resulting in differences in El Niño and La Niña event forecasts. As shown in Table 2a,b, if the analysis were conducted using ERSST data, the adjusted ONI indicator proposed in this study would be highly consistent with NOAA’s ONI results. Therefore, the differences in the underlying data are indeed a significant factor in the variation of the research outcomes.