1. Introduction
The study of hydrological time series patterns is essential for effective water resource management. These patterns reveal the regularities and trends in hydrological phenomena, informing water resource management, climate change research, urban planning, infrastructure development, ecological protection, and disaster management. For instance, in watershed management, analyzing long-term time series data enables managers to understand trends in precipitation and flow, which in turn helps optimize reservoir operations and ensure sustainable water resource use [1,2,3,4]. Additionally, during potential floods or droughts, timely and reliable hydrological data are crucial for developing emergency response plans. Analyzing historical data and patterns allows for predictions of disaster likelihood and severity, facilitating the formulation of effective measures to mitigate impacts on communities and the economy [5].
However, changes in hydrological data patterns are influenced by a variety of human and climatic factors. Key human factors include urbanization, which alters natural water flow and increases runoff; irrigation, which affects local water availability; reservoirs, which modify river flows and water storage; water withdrawals, which impact streamflow and groundwater levels; and pollution, which degrades water quality and affects aquatic ecosystems. Climatic factors include precipitation patterns, which determine water input and distribution; temperature changes, which influence evaporation rates and snowmelt; evapotranspiration, which affects water loss from land surfaces; and phenomena such as El Niño and La Niña, which impact global weather patterns and, consequently, hydrological cycles [6,7,8,9].
Due to limitations in data collection periods, hydrologists often rely on time series modeling techniques to reconstruct and forecast data, thereby improving the reliability and accuracy of their systems. The ARIMA (Autoregressive Integrated Moving Average) model is a widely used statistical tool for generating and forecasting time series data [10]. ARIMA models assume linear relationships within the data, which may not effectively capture the complexities of time series with nonlinear characteristics. Additionally, ARIMA models require that the time series be stationary before modeling; while differencing can transform non-stationary series into stationary ones, this approach is not always suitable for data generation.
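The stationarity requirement can be made concrete with a short sketch: first-order differencing (the "I" in ARIMA) removes a linear trend, although, as noted above, the differenced series is not always suitable for data generation. The series below is synthetic and purely illustrative.

```python
import numpy as np

# Illustrative non-stationary series: a linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.05 * t + rng.normal(0.0, 0.1, size=t.size)

# First-order differencing removes the linear trend:
diff = np.diff(series)

# The differenced series fluctuates around the trend slope (0.05 here),
# while the mean of the original series drifts upward over time.
print(series[:50].mean() < series[-50:].mean())
```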
On the other hand, neural networks are often used for forecasting, which is predicting future values based on historical data [11,12,13,14]. However, their “black box” nature can make model predictions less transparent, affecting their credibility. Hochreiter and Schmidhuber [15] introduced the Long Short-Term Memory (LSTM) network, which captures long-term dependencies in sequential data and can therefore be used to model and generate non-stationary time series. Nevertheless, in practice, LSTMs can still face challenges when dealing with long sequences or complex dependency relationships.
The Fourier Transform is another method used to analyze the frequency components of a signal. It transforms a signal from the time domain to the frequency domain by decomposing the signal into a sum of sine and cosine functions, enabling the analysis of its frequency components. This method is particularly well-suited for processing stationary signals, especially those that are periodic or have distinct frequency components. However, the Fourier Transform is less effective for handling signals with long-term or slowly varying components.
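As an illustration of the frequency-domain view described above, the sketch below recovers the dominant frequency of a synthetic periodic signal with the discrete Fourier transform; the sampling rate and signal components are assumptions for the example, not data from this study.

```python
import numpy as np

fs = 365.0                       # samples per year (daily data, illustrative)
t = np.arange(0, 10, 1 / fs)     # ten years of daily samples
signal = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 4.0 * t)

# Transform to the frequency domain and locate the strongest component.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)  # in cycles per year

dominant = freqs[np.argmax(spectrum)]
print(dominant)  # the 1 cycle/year component dominates
```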
In contrast, Empirical Mode Decomposition (EMD) is highly effective for analyzing nonlinear and non-stationary signals. EMD decomposes signals into Intrinsic Mode Functions (IMFs) and a residual trend component, making it useful for handling signals with trends and complex components. The original data can be reconstructed by summing all IMFs and the residual. Huang et al. [16] provide a detailed description of the sifting process used to extract IMFs. Karthikeyan and Kumar [17] used EMD to decompose time series, fitted the resulting components with autoregressive models, and combined the forecasts to obtain final predictions; however, fitting the nonlinear decomposed components with linear ARMA models proved unsuitable. Zhang et al. [18] investigated long-term runoff fluctuations in the Jing River of the Yellow River using EMD. Their findings revealed that the river’s annual runoff displays fluctuations across multiple timescales. Huang and Lee [19] demonstrated the feasibility of using a combination of EMD and statistical regression analysis to assess typhoon frequency. They also demonstrated the applicability of EMD for projecting non-stationary daily Sea Surface Temperature (SST) series.
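To make the sifting idea concrete, the sketch below performs a single, heavily simplified sifting pass: the signal minus the mean of its upper and lower extrema envelopes. Linear envelopes and naive endpoint pinning are used here for brevity (EMD proper interpolates the extrema with cubic splines and iterates with a stopping criterion), so this is not the full algorithm of Huang et al. [16]; the test signal is illustrative.

```python
import numpy as np

def sift_once(x, t):
    """One simplified sifting pass: subtract the mean of the upper and
    lower envelopes through the local extrema (linear envelopes here)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # local maxima
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1   # local minima
    # Pin the endpoints so the envelopes span the whole record.
    maxima = np.concatenate(([0], maxima, [x.size - 1]))
    minima = np.concatenate(([0], minima, [x.size - 1]))
    upper = np.interp(t, t[maxima], x[maxima])
    lower = np.interp(t, t[minima], x[minima])
    return x - (upper + lower) / 2.0

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 12 * t) + np.sin(2 * np.pi * 1 * t)  # fast + slow mode

imf1 = sift_once(x, t)     # rough estimate of the fastest mode
residue = x - imf1         # left over for further sifting

# Reconstruction is exact by construction: summing the parts restores x.
print(np.allclose(imf1 + residue, x))
```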
Given that many hydrological variables, such as precipitation, flow, temperature, and evaporation, are exhibiting trends of gradual increase or decrease, generating data for research requires addressing the challenge of reproducing non-stationary sequences. For example, sea surface temperatures have consistently risen throughout the 20th century, with an average increase of 0.14 °F per decade from 1901 to 2020 (NOAA, https://www.ncei.noaa.gov/products/extended-reconstructed-sst, 16 September 2024). The significant rise in SSTs over the past three decades underscores the impact of climate change. When using SST data in related studies, shifts in its statistical properties make ARIMA models impractical for data generation. Additionally, non-stationary SST sequences pose challenges for examining the relationship between SST and El Niño events.
To address these challenges, this paper proposes a novel method for synthesizing non-stationary data and applying it to El Niño forecasting. The proposed method employs EMD to generate non-stationary sequences, advancing the research objectives.
3. Results and Discussion
3.1. El Niño
The “El Niño phenomenon” is characterized by opposing changes in sea surface temperatures between the eastern and western Pacific Oceans, coupled with a corresponding seesaw-like oscillation in atmospheric pressure [23]. This phenomenon can have widespread effects, altering atmospheric circulation patterns and causing significant impacts across various regions worldwide [24,25]. The Oceanic Niño Index (ONI) is the primary tool used by NOAA to monitor El Niño and La Niña events (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php, 16 September 2024). The ONI is calculated by taking the actual monthly sea surface temperatures in the Niño 3.4 region (5° S–5° N, 120° W–170° W), subtracting the 30-year long-term monthly average for that month, and then averaging the differences over three consecutive months to obtain the representative mid-month ONI value. An ONI value exceeding 0.5 °C for at least five consecutive months indicates an El Niño event, while a value below −0.5 °C for five consecutive months indicates a La Niña event. Due to the noticeable warming trend in the Niño 3.4 region since 1950, NOAA uses multiple 30-year base periods to define the ONI.
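The ONI recipe described above can be sketched as follows. The synthetic 30-year base period and the uniformly warm assessment year are illustrative assumptions, not NOAA data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative base period: 30 years (360 months) of Niño 3.4 SSTs,
# a seasonal cycle plus noise.
cycle = 26.8 + 1.2 * np.sin(2 * np.pi * np.arange(12) / 12)
base = np.tile(cycle, 30) + rng.normal(0.0, 0.3, 360)

# 30-year long-term average for each calendar month:
climatology = base.reshape(30, 12).mean(axis=0)

# A uniformly warm assessment year, 0.9 °C above the monthly climatology:
year = climatology + 0.9

# Monthly anomalies, then a 3-month running mean, give the ONI:
anomalies = year - climatology
oni = np.convolve(anomalies, np.ones(3) / 3, mode="valid")

# ONI >= 0.5 °C for at least five consecutive overlapping seasons -> El Niño.
print(np.all(oni >= 0.5) and oni.size >= 5)  # True for this warm year
```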
Based on data from a Poisson process, Table 1 reveals that from 1981 to 2022, El Niño conditions occurred an average of 2.93 times per year, whereas La Niña conditions occurred 3.50 times per year. This suggests that La Niña conditions are slightly more frequent than El Niño conditions. The return period for El Niño events is 3.82 years, while for La Niña events, it is 3.23 years. Additionally, events lasting more than one year for both El Niño and La Niña occur approximately every 8.40 years on average.
Currently, there are numerous dynamic and statistical models available for forecasting ONI values, as accessible through the International Research Institute (IRI) for Climate and Society [26,27,28] (https://iri.columbia.edu/our-expertise/climate/forecasts/enso/current/, 16 September 2024). Huang and Lee [19] demonstrated the effectiveness of Empirical Mode Decomposition (EMD) in projecting non-stationary daily SST series and used the projected SST series to estimate the number of typhoons. Given the similar context of rising sea surface temperatures, this study also aims to employ EMD-based SST fitting for non-stationary time series to statistically predict ONI values for each month in the upcoming year, along with the probability distribution of surpassing the ±0.5 °C threshold.
3.2. Adjustments to Oceanic Niño Index Estimates
The Oceanic Niño Index (ONI) is calculated using a 30-year base period, which is updated every five years to cover the assessment period. However, this updating process can cause delays in reflecting current conditions, potentially failing to capture the latest changes in real time. For example, ONI values from 2001 to 2005 were calculated using the base period from 1986 to 2015, while values from 2006 to 2010 were based on the 1991–2020 period. As for the period from 2021 to 2025, the ONI will temporarily use the initial base period (from 1991 to 2020) until the new base period (from 2006 to 2035) is established. This study explores these delays in ONI applications and proposes modifications to improve the index’s responsiveness.
An accuracy assessment based on SST anomalies from 1981 to 2022, shown in Table 2a, reveals that using the final ONI as the reference data results in higher accuracy for El Niño anomalies (warm, ONI ≥ 0.5 °C) but lower accuracy for La Niña anomalies (cold, ONI ≤ −0.5 °C). Overall, the producer’s accuracy rates for El Niño, Neutral, and La Niña anomalies are 99.19%, 84.19%, and 80.27%, respectively, with an overall accuracy of 86.71%. The initial ONI shows more El Niño events and fewer La Niña events compared to the final ONI. For instance, in 1993, the initial ONI indicated El Niño conditions, while the final ONI did not. Similarly, from 1999 to 2000, the final ONI identified La Niña conditions, whereas the initial ONI showed both El Niño and La Niña events.
This research proposes adjustments to ONI estimates to enhance usability; the resulting index is referred to as the adjusted ONI. First, the study suggests using sea temperature data from the 30 years preceding the calculation year as the base period, updated annually. Second, instead of employing a three-month moving average to represent variations, the study recommends using the difference between the current month’s sea surface temperature and the mean value of the base period. In practice, calculating the indicator from one-month SST anomalies is simpler and more convenient.
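A minimal sketch of the adjusted ONI follows, assuming a simple flat monthly array layout; the function name and the illustrative SST values are assumptions for the example, not the study’s implementation.

```python
import numpy as np

def adjusted_oni(monthly_sst, year_index):
    """Adjusted ONI for one calculation year (sketch of the proposal above):
    anomaly of each month against the mean of that calendar month over the
    preceding 30 years; no 3-month moving average is applied."""
    start = (year_index - 30) * 12
    base = monthly_sst[start:year_index * 12].reshape(30, 12)
    climatology = base.mean(axis=0)          # base period updated every year
    target = monthly_sst[year_index * 12:(year_index + 1) * 12]
    return target - climatology

# Illustrative record: 31 years of monthly SSTs with a warm final year.
months = np.tile(26.8 + 1.2 * np.sin(2 * np.pi * np.arange(12) / 12), 31)
months[-12:] += 0.8                          # final year 0.8 °C above normal

oni = adjusted_oni(months, 30)
print(np.allclose(oni, 0.8))                 # True: the warm anomaly is recovered
```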
The study also compares the accuracy of SST anomalies between NOAA’s final ONI (reference data) and the adjusted ONI (classified data). The producer’s accuracy rates for El Niño, Neutral, and La Niña are 97.56%, 85.47%, and 89.80%, respectively, with an overall accuracy of 89.68% (see Table 2b). This performance is comparable to the initial ONI.
Figure 6 demonstrates that the adjusted ONI aligns well with NOAA’s final ONI and can promptly respond to El Niño (e.g., during 1982–1983, 1997–1998, and 2015–2016) and La Niña (e.g., during 1999–2000 and 2021–2022) events.
Currently, NOAA’s “final” ONI serves as the standard for condition judgment. The ONI adjustments proposed in this study offer a simpler, more direct approach, with results closely aligning with the final ONI and outperforming the initial ONI. This revised approach can be used alongside the current ONI to facilitate a more comprehensive climate assessment.
3.3. Characteristics of Long-Term Daily SST Data Decomposed by EMD
The upward trend in sea surface temperatures (SST), as discussed in Section 2.2, highlights the limitations of traditional hydrological models for data generation and prediction in such scenarios. To address these limitations, this study employs Empirical Mode Decomposition (EMD) for SST projection, offering a novel approach for handling non-stationary series.
Using 120 years of daily SST records (1900–2019) from the Niño 3.4 region, EMD decomposes the data into 13 Intrinsic Mode Functions and one residual. Due to space constraints, only a subset of the IMFs is shown in Figure 7. The residual displays an upward trend, starting at 26.812 °C and ending at 28.099 °C, with an average annual increase of 0.0107 °C. The period of the oscillations increases with the IMF level. Specifically, IMF7 exhibits the largest amplitude, fluctuating between ±1 °C, while IMF13 has the smallest amplitude, oscillating between −0.0122 °C and 0.0087 °C. When all IMFs and the residual are combined and compared to the original data, the average error is 9.09 × 10⁻⁶ °C, which is negligible, and the standard deviation is 0.0137 °C, demonstrating the effectiveness of EMD in decomposing the data.
To determine the average period of each IMF, this research utilizes zero-crossing analysis as described by Goda [29]. This technique tracks the change in sign (from positive to negative) through the axis (zero value) to identify the period of each “wave” and then calculates the average period of all “waves” as the period of the IMF. Additionally, the weight of each IMF is defined using the following formula [22]:

E_i = Σ_t [d_i(t)]²,  W_i = E_i / Σ_j E_j

where d_i(t) represents the vertical distance of the i-th IMF at time t; E_i denotes the total energy of the i-th IMF; and W_i represents the weight of the i-th IMF.
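The zero-crossing period and the weight formula above can be sketched as follows; the two synthetic "IMFs" are illustrative sinusoids, not actual EMD output, and the function names are assumptions for the example.

```python
import numpy as np

def average_period(imf, dt=1.0):
    """Average "wave" period via zero-down-crossing analysis (after Goda):
    mean spacing between successive sign changes from positive to negative."""
    signs = np.sign(imf)
    down = np.where((signs[:-1] > 0) & (signs[1:] < 0))[0]
    return np.diff(down).mean() * dt

def imf_weights(imfs):
    """Weights W_i = E_i / sum_j E_j, with E_i the total energy of IMF i."""
    energies = np.array([np.sum(c ** 2) for c in imfs])
    return energies / energies.sum()

t = np.arange(3650.0)                          # ten years of daily steps
imfs = [np.sin(2 * np.pi * t / 30.0),          # roughly monthly oscillation
        2.0 * np.sin(2 * np.pi * t / 365.0)]   # annual, larger amplitude

print(average_period(imfs[0]))                 # close to 30 days
print(imf_weights(imfs))                       # the larger annual mode dominates
```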
Table 3 shows that the time periods of IMFs in the Niño 3.4 region are similar to those in the typhoon birthplace region, particularly IMF1 to IMF7. Notably, the average period of IMF7 is 347.22 days, approximately one year. As the IMF level increases, the number of “wave” samples decreases, leading to an increase in the time period. The cumulative weight of IMFs from IMF1 to IMF7 in the Niño 3.4 region is 75.61%, with IMF7 contributing the most significant weight at 54.11%, indicating its primary role in data reconstruction. However, the amplitude pattern based on IMF7 does not fully reflect actual SST changes. In contrast, in the typhoon region, the cumulative weight ratio of IMFs from IMF1 to IMF7 reaches 96.82%, with IMF7 accounting for 90.15% of the weight. These differences are attributed to variations in amplitude. For example, the temperature change of IMF7 in the Niño 3.4 area is concentrated between ±1 °C, while in the typhoon region, IMF7’s temperature fluctuation spans ±2 °C [19].
3.4. SST Projection Based on a 5-Year Moving Process
Using SST data from 1900 to 2019, the data are segmented into 120 one-year-long daily sea surface temperature series. Each series is then decomposed using EMD into seven IMFs and one residual (p = 7 here).
Figure 8 presents the residuals for each year across the past 120 years, demonstrating EMD’s ability to detect significant SST trends, such as the peak in 2015 and the trough in 1981 (refer to Figure 2). An autocorrelation analysis is conducted on annual average sea surface temperature data from 1900 to 2019. With 120 annual observations, the 95% confidence interval for the autocorrelation sequence of a white noise process is approximately ±0.18. The analysis reveals that the autocorrelation at lag 5 is 0.23, which is marginally above the confidence interval, whereas autocorrelation values for lags greater than 5 fall within the confidence interval and are therefore not statistically significant. If L = 5, then 390,625 sequences can be generated from the data of the past 5 years, as indicated by L^(p + 1) = 5^8. Conversely, with L = 4, only 4^8 = 65,536 sequences can be generated. Considering the larger number of sequences that can be handled, L = 5 is regarded as the more suitable choice for this study.
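The white-noise confidence band and the lag check can be sketched as follows. The 5-year cycle in the test series is an illustrative assumption, chosen only so that the lag-5 autocorrelation is visibly significant.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

n = 120                               # 120 annual observations
band = 1.96 / np.sqrt(n)              # 95% white-noise band, about +/- 0.18
print(round(band, 2))                 # 0.18

# Deterministic check: a series with a 5-year cycle has a strong lag-5 ACF.
series = np.sin(2 * np.pi * np.arange(n) / 5.0)
print(acf(series, 5) > band)          # True: lag-5 correlation exceeds the band
```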
Figure 9 illustrates the projection process: the SST estimate for year 6 is generated from observations from years 1 to 5, the estimate for year 7 from years 2 to 6, and so forth. This 5-year moving process retains the data variations from the preceding five years, capturing potential non-stationary characteristics.
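The recombination count behind L^(p + 1) can be sketched as follows: each of the eight components (seven IMFs plus the residual) of a synthetic year is drawn independently from one of the L = 5 preceding years. This is index bookkeeping only; building an actual series would sum the chosen components.

```python
import numpy as np
from itertools import product

L, p = 5, 7                # 5 source years, 7 IMFs (plus 1 residual)
components = p + 1

# Every synthetic year draws each component from one of the L source years:
n_sequences = L ** components
print(n_sequences)         # 390625, matching L^(p + 1)

# Enumerate all component-source recipes (indices only):
choices = list(product(range(L), repeat=components))
print(len(choices) == n_sequences)  # True

# In practice one may also sample recipes at random:
rng = np.random.default_rng(0)
sample = rng.integers(0, L, size=components)  # one synthetic year's recipe
```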
In the projection process, each year is treated as an individual entity, with the IMFs and residual from that year’s data resembling unique chromosomal traits. Just as genetic recombination can create new individuals, this process can generate new SST data; data anomalies in this context are analogous to chromosomal abnormalities. Each year’s SST series can be viewed as a potential representation of daily SST values for that year, allowing for the projection of daily sea surface temperatures from 1905 to 2020. The study compares the actual annual average SST of the sixth year (the projected year) with the annual average SSTs of the previous five years. If the actual SST of the projected year is not an extreme value (i.e., neither the highest nor the lowest), the monthly SSTs for that year generally fall within the interquartile range of the predicted values. Conversely, if the actual SST of the projected year is an extreme value, the monthly SSTs for that year often fall outside the interquartile range of the predicted values.
The SST projection is exemplified using the years 1981, 1982, 1997, 1999, 2000, and 2015. The estimated monthly values are categorized into quartiles, with results shown in Figure 10. As indicated in Figure 2, compared to the sea temperatures of the previous five years, 1981 had the lowest average temperature, while 1997 and 2015 had the highest. The monthly SSTs for 1981 mostly fall below the first quartile, whereas those for 1997 and 2015 are mostly above the third quartile. In contrast, the annual mean temperatures for 1982, 1999, and 2000 were not extreme values. Therefore, the monthly SSTs for these years generally fall within the interquartile range.
A comparison between the actual annual sea surface temperature and the distribution of the synthetic annual sea surface temperatures for the Niño 3.4 region from 1905 to 2019 is also provided (see Figure 11). Generally, unless the actual values for the projected years are extreme, such as in 1981, 1997, and 2015, most of the actual values fall within the box plot range of the synthetic series. The study concludes that the 5-year moving process based on EMD effectively tracks future SST trends, reflecting the characteristics of non-stationary sequences.
3.5. El Niño Forecast
The projected sea surface temperature data will be used to forecast El Niño events. The forecast is categorized into three levels: El Niño, Neutral, and La Niña, based on specific temperature thresholds. An adjusted calculation of the Oceanic Niño Index (ONI) will be employed for the assessment. The periods 1997–2000 and 2015–2016 are used as examples, with 1997 recording the highest temperature of the 20th century (annual average of 28.34 °C) and 2015 marking the highest temperature in history (annual average of 29.12 °C).
As illustrated in Figure 10, the 5-year moving process produces 390,625 SST estimates each month. By comparing these estimates with the historical monthly average SST over the past 30 years, the presence of El Niño can be determined for each month. By analyzing all 390,625 results, the probabilities of El Niño, Neutral, and La Niña occurrences can be calculated.
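Turning a monthly ensemble into condition probabilities can be sketched as follows. The Gaussian ensemble below stands in for the 390,625 recombined estimates and is purely illustrative; its location and spread are assumptions, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble of adjusted-ONI values for one month (°C); a real run
# would use the 390,625 recombined SST estimates minus the 30-year climatology.
oni_ensemble = rng.normal(loc=0.3, scale=0.5, size=390_625)

p_el_nino = np.mean(oni_ensemble >= 0.5)    # warm condition
p_la_nina = np.mean(oni_ensemble <= -0.5)   # cold condition
p_neutral = 1.0 - p_el_nino - p_la_nina

print(p_el_nino > p_la_nina)                # True: warm bias in this illustration
```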
Figure 12 presents the probabilities of El Niño and La Niña occurrences for each month from 1997 to 2000 and from 2015 to 2016.
To provide a detailed illustration of the El Niño forecasting process, consider the year 1997 as an example:
- (i) EMD Generation: Using EMD, 390,625 annual sequences are generated based on SST data from 1992 to 1996 (as shown in Figure 10).
- (ii) ONI Estimation: The historical average monthly SST from 1967 to 1996 is subtracted from each monthly SST in the 1997 projected series to obtain the 1-month adjusted ONI value. This results in a total of 390,625 ONI forecast values per month.
- (iii) Probability Distribution Analysis: The probability distribution of the ONI estimates is analyzed to determine the occurrence counts for El Niño, Neutral, and La Niña conditions. This distribution represents the probabilities of potential El Niño conditions in 1997 and is displayed in Figure 12.
Based on the ONI estimate for 1997, the likelihood of El Niño conditions in April was only 13%, compared to over 20% in other months. In August and September, the probability approached 50%, surpassing the probabilities for Neutral and La Niña conditions. Overall, the probability of Neutral conditions was higher from January to July and October to December, while the probability of La Niña conditions was lower.
However, according to NOAA’s retrospective ONI indicator (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php, 16 September 2024), a year-long El Niño event began in May 1997, followed by a La Niña event from July 1998 to the end of 2000. In contrast, Figure 12 shows that during those years, the probability of “El Niño” and “Neutral” varied, while the probability of “La Niña” remained low. Similarly, NOAA’s ONI indicates that an El Niño event occurred from 2015 to April 2016, followed by a short-term La Niña event.
Figure 12 shows that although all three conditions were possible during 2015–2016, “Neutral” was the most likely, while “La Niña” was the least likely.
The study found that when the inferred estimates align with observed values, the adjusted ONI from 1997 to 2000 and from 2015 to 2016 is consistent with NOAA’s retrospective ONI (see Figure 6). This consistency supports the ONI adjustment proposed in Section 3.2. However, due to the inherent uncertainty in SST estimates, predictions of future events can only be expressed in terms of probabilities. The study’s results in Figure 12 differ from NOAA’s retrospective ONI results, suggesting that biases in SST estimates may lead to differences in the probabilities of triggering El Niño, Neutral, or La Niña conditions. The study concludes that the uncertainty in SST estimates is the main reason for the lower incidence of La Niña conditions.
Additionally, the study notes that the original data used in this analysis were sourced from ICOADS Release 3, which directly uses the daily average SST in the Niño 3.4 region as the representative value for analysis. In contrast, the current NOAA Extended Reconstructed Sea Surface Temperature V5 (ERSST) data employ statistical methods to fill in missing SSTs based on the ICOADS dataset and calculate the monthly SST (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/detrend.nino34.ascii.txt, 16 September 2024). As a result, the SST utilized in this study may be slightly elevated due to differences in sampling points.
4. Conclusions
Due to the impact of climate change on the environment, many hydrological variables, such as precipitation, streamflow, temperature, and groundwater levels, exhibit trends of increase or decrease over time. These changing trends pose challenges to traditional hydrological time series models, making it difficult to match them and complicating water resource management or climate change research. The data generation method based on Empirical Mode Decomposition (EMD) proposed in this paper successfully addresses the issue of generating non-stationary sequences. This method can capture the statistical characteristics of non-stationary sequences that change over time.
In this case study, EMD is applied to project non-stationary daily sea surface temperature data, and the resulting data are used to estimate the probabilities of monthly El Niño and La Niña events. The research results demonstrate the practicality of the proposed EMD-based data generation method. Furthermore, in addition to the existing ONI provided by NOAA, the adjusted ONI outlined in this study is simple and practical, making it a useful reference for evaluating El Niño events.
Based on the authors’ experience with EMD synthetic data, because rainfall and streamflow in Taiwan are highly variable, especially during typhoon events, it is generally advisable to apply a logarithmic transformation to the data before conducting EMD analysis. In addition, temperature fluctuations are much smaller than the variability encountered in rainfall/runoff EMD analysis, which generally leads to better EMD results. Moreover, to avoid mode-mixing problems during EMD analysis, it is recommended to use Ensemble Empirical Mode Decomposition (EEMD) for data decomposition.
The forecasts for El Niño and La Niña in this paper differ from NOAA’s predictions. Investigation revealed that the data used in this study are based on ICOADS Release 3, whereas NOAA uses ERSST data, leading to discrepancies in SSTs in the Niño 3.4 region and indirectly resulting in differences in El Niño and La Niña event forecasts. As shown in Table 2a,b, if the analysis were conducted using ERSST data, the adjusted ONI indicator proposed in this study would be highly consistent with NOAA’s ONI results. Therefore, the differences in the underlying data are indeed a significant factor in the variation of the research outcomes.