Article

Forecasting Gate-Front Water Levels Using a Coupled GRU–TCN–Transformer Model and Permutation Entropy Algorithm

Water Conservancy College, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
* Author to whom correspondence should be addressed.
Water 2024, 16(22), 3310; https://doi.org/10.3390/w16223310
Submission received: 12 September 2024 / Revised: 4 November 2024 / Accepted: 6 November 2024 / Published: 18 November 2024
(This article belongs to the Topic Water and Energy Monitoring and Their Nexus)

Abstract

Water level forecasting has significant impacts on transportation, agriculture, and flood control. Accurate water level forecasts can enhance the safety and efficiency of water conservancy hub operation scheduling, reduce flood risks, and are essential for ensuring sustainable regional development. To address the nonlinear and non-stationary characteristics of gate-front water level sequences, this paper introduces a gate-front water level forecasting method based on a GRU–TCN–Transformer coupled model and the permutation entropy (PE) algorithm. First, an analysis method combining Singular Spectrum Analysis (SSA) and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) separates the original water level data into modal components of different frequencies. The PE algorithm then divides the modal components into high- and low-frequency sequences. The GRU model is applied to predict the high-frequency sequences, while the TCN–Transformer coupled model is used for the low-frequency sequences. The forecasts from both models are combined to obtain the final water level prediction, and multiple evaluation metrics are used to assess forecasting performance. The findings indicate that the GRU–TCN–Transformer coupled model achieves a Mean Absolute Error (MAE) of 0.0154, a Root Mean Square Error (RMSE) of 0.0205, and a Coefficient of Determination (R²) of 0.8076, outperforming the Support Vector Machine (SVM), GRU, Transformer, and TCN–Transformer models. The forecasting results are highly credible. This model provides a new reference for improving the accuracy of gate-front water level forecasting and offers significant insights for water resource management and flood prevention, demonstrating promising application prospects.

1. Introduction

With global climate warming and intensifying ecological degradation, natural disasters have become increasingly frequent, especially floods in river regions, which cause severe losses [1]. Hydrological forecasting explores the patterns of hydrological phenomena and is essential for flood prevention, drought mitigation, and efficient water resource utilization [2]. It typically includes flow, water level, sediment concentration, and water quality forecasting, with water level forecasting being central to flood control, navigation management, and water conservancy safety. However, due to the non-stationarity of hydrological processes, the complexity of characteristic variables, and noisy data, accurate water level forecasting remains a challenging problem [3].
To date, water level forecasting models can be broadly divided into two main categories, the first being those based on physical models and mathematical–statistical models. Vicente et al. [4] developed a forecasting model for the Estana Lake basin based on rainfall characteristics, basin location, and spatiotemporal scale changes; the model used eight runoff accumulation algorithms and fifteen spatial patterns to predict runoff and also forecast lake water level trends under different threshold runoff conditions. Liu Zhiyu et al. [5] introduced a distributed hydrological model to address the rapid onset, fast convergence times, and short forecast periods of small and medium river floods, and applied it to watersheds with limited data. These models, primarily based on statistical methods, have made varying degrees of progress in water level forecasting but suffer from limitations such as difficulty capturing non-linear relationships and insufficient generalization ability.
Another approach is based on machine learning and hybrid models. With the rapid development of big data and artificial intelligence technologies, many hydrologists have applied machine learning, data mining, and deep learning methods [6,7] to improve existing water level forecasting methods and models, proposing a series of feasible forecasting models. Nguyen et al. [8] used Support Vector Machines (SVM) for river water level forecasting; although their evaluation metrics were favorable, the factors considered were not exhaustive. Puttinaovarat et al. [9] introduced an adaptive machine learning method that uses learning strategies to drive data for predicting different flood levels, achieving good results across extensive applications. Khalaf et al. [10] developed a forecasting model combining the Internet of Things and machine learning to prevent river flooding. Garcia et al. [11] used historical water level and rainfall data with a Random Forest method to predict water level changes. While traditional machine learning methods are commonly employed, their accuracy requires further enhancement. Seo et al. [12] conducted simulation experiments using daily water level data from Korean hydrological stations, combining wavelet decomposition with neural networks and fuzzy inference to construct river water level forecasting models; comparison with the original methods showed that the hybrid model improved forecasting accuracy. Indrastanti et al. [13] used a Long Short-Term Memory (LSTM) network to predict downstream river water levels from time series of precipitation and water levels at upstream and downstream points. Moishin et al. [14] developed a flood forecasting model using the deep learning method ConvLSTM to assess the likelihood of future flood events based on flood index forecasting; although this model achieved certain results, it did not effectively handle correlations within the data. In 2014, Cho and his team [15] presented the Gated Recurrent Unit (GRU), a model that streamlines internal neuron operations and improves training efficiency while maintaining an accuracy level similar to that of LSTM models. Water level forecasting is influenced by many factors and is nonlinear, time-dependent, and complex; single neural network models often do not meet the accuracy requirements for forecasting [16]. Therefore, Liu Weifei et al. [17] combined the GRU with a Back Propagation (BP) neural network to predict lake water levels and examined how varying training sets affect model accuracy; results showed that the GRU-BP hybrid model maintained high forecasting accuracy even with limited sample data. Hu Hao et al. [18] introduced a hybrid model based on a modified weighted DRSN-LSTM for multi-timescale forecasting of downstream water levels at the Xiangjiaba Dam; by using a newly constructed error-weight correction function and cross-entropy function to adjust water level response factors, and optimizing the LSTM network parameters with the Archimedes optimization algorithm, the model demonstrated good accuracy and efficiency in water level forecasting. Nie Qingqing et al. [19] proposed a forecasting model combining an improved Grey Wolf Optimization (MGWO) algorithm with a Temporal Convolutional Network (TCN), using the improved algorithm to enhance the TCN's forecasting performance.
These forecasting models can handle the nonlinearity and temporality of water level data sequences, but their performance often declines due to non-stationary elements and substantial noise within the sequences [20]. Hence, it is essential to preprocess the sequences prior to predictive analysis. Singular Spectrum Analysis (SSA) is a non-parametric statistical method used for time series analysis, signal processing, and image processing [21]. It extracts trends, periodic components, and noise from the original data for time series analysis and prediction [22], and is particularly effective at removing or reducing random noise in complex hydrological sequences. Preliminary decomposition using SSA can therefore effectively separate signal and noise components, providing a clearer, less noisy data foundation for subsequent analysis. The CEEMDAN method is currently one of the most commonly used data-denoising techniques [23]. Guo et al. [24] used CEEMDAN to process annual precipitation time series data for Zhengzhou, and the CEEMDAN-LSTM model improved on the accuracy of a single LSTM model in precipitation forecasting. Tao et al. [25] used CEEMDAN to decompose historical water level time series and employed a CEEMDAN-GRU model to predict water levels after IMF reorganization; the optimized CEEMDAN-GRU model achieved better forecasting accuracy than the LSTM and CEEMDAN-LSTM models. Such studies, however, overlooked the significance of the different frequency components. Different frequency components exert different influences on forecasting results, with long-term forecasting being more affected by low-frequency components. Moreover, the GRU model frequently shows greater deviations when training on and predicting low-frequency components, which affects overall precision. Research on frequency-targeted models for long-term water level forecasting is still lacking. Therefore, to address the shortcomings of single models in parameter optimization and forecasting accuracy, we propose a “decompose–divide frequency domain–predict” forecasting model. The permutation entropy algorithm classifies the components obtained from CEEMDAN, including both the IMFs and the residual, into high- and low-frequency categories. The low-frequency components are modeled using a TCN–Transformer hybrid model, while the high-frequency components are trained using a GRU model. This approach captures trends from subtle variations in the data, improving prediction accuracy. The Transformer model is widely used in natural language processing (NLP) tasks such as machine translation [26], as well as in time series forecasting [27] and question-answering systems [28]. The TCN (Temporal Convolutional Network) is a deep learning model specifically designed for processing sequence data, capturing temporal dependencies through convolutional layers [29]. Given the distinct advantages of the TCN and Transformer models, this paper couples them. The coupled model leverages the complementary strengths of both models in sequence data handling, enhancing the ability to process time series data, particularly in capturing long-term dependencies and understanding complex sequence dynamics, and making it well suited to complex time series forecasting tasks such as the gate-front water level data in this study.
Based on the above, to improve the accuracy of gate-front water level forecasting, this study, using daily water level monitoring data from the Mengcheng Water Conservancy Hub from 2018 to 2022, proposes a method combining Singular Spectrum Analysis, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, Gated Recurrent Unit Neural Network, Temporal Convolutional Network, and Transformer models based on self-attention mechanisms. The GRU–TCN–Transformer water level forecasting model is constructed to simulate and predict the gate-front water levels of the Mengcheng Water Conservancy Hub from 2018 to 2022. The forecasting results are compared with those from various models and real data to demonstrate the model’s effectiveness and provide a new reference for improving the accuracy of gate-front water level forecasting.

2. Research Methodology

2.1. GRU

The GRU network improves upon traditional LSTM and RNN models. It effectively captures dependencies in varied temporal sequences while addressing the vanishing-gradient problem inherent in RNNs [30]. The GRU's internal design is relatively straightforward: like the LSTM, it introduces gating mechanisms to mitigate the gradient problems of RNNs, but it simplifies the LSTM architecture by maintaining a single hidden state and only two gates, an update gate and a reset gate. The update gate determines the extent to which previous information is kept in the current state, while the reset gate manages the combination of information from earlier and current time steps. Consequently, the GRU trains faster while maintaining high predictive accuracy. Employing the GRU for temporal data analysis enables the identification of patterns across different intervals, making it especially advantageous for forecasting tasks. Figure 1 depicts the fundamental architecture of the GRU neural network.
Figure 1 illustrates the flow path of data using arrows. Within the equations, σ denotes the Sigmoid activation function, while tanh represents the hyperbolic tangent activation function. The variables $z_t$ and $r_t$ correspond to the update gate and reset gate, respectively. The input is denoted as $x_t$, and $h_{t-1}$ represents the output from the preceding GRU unit. The candidate state $\tilde{h}_t$ synthesizes information from $h_{t-1}$ and $x_t$, with $h_t$ being the final output of an individual GRU unit.
The computation process for the GRU unit is explained below [15]:

(1) The update gate computes the update vector $z_t$:

$$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)$$

(2) The reset gate computes the reset vector $r_t$ from the current time step's input vector $x_t$ and the previous time step's output vector $h_{t-1}$:

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)$$

(3) The candidate state $\tilde{h}_t$ is computed from the current time step's data, with the reset gate controlling how much past information enters it:

$$\tilde{h}_t = \tanh\left(r_t \odot \left(W h_{t-1}\right) + U x_t + b\right)$$

(4) The output vector $h_t$ for the present time step combines the candidate state $\tilde{h}_t$ with the previous state through the update gate:

$$h_t = \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1}$$
The GRU architecture provides an effective approach for capturing complex relationships in time series data across multiple temporal scales, showing strong adaptability and accuracy in modeling high-frequency data. Compared to the LSTM, the GRU performs well in extracting features from long sequences, requires fewer training parameters, and offers better computational efficiency. To enhance forecasting accuracy, we therefore use the GRU to process and learn from the high-frequency intrinsic mode functions (IMFs) derived from the CEEMDAN decomposition of the water level data in this study.
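To make the gating computations concrete, the following minimal NumPy sketch implements Equations (1)–(4) for a single GRU step. The weight shapes, parameter names, and toy water level window are illustrative assumptions rather than the study's configuration; in practice a framework GRU layer would be used.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step following Eqs. (1)-(4); p holds illustrative weights."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])            # Eq. (1): update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])            # Eq. (2): reset gate
    h_tilde = np.tanh(r * (p["W"] @ h_prev) + p["U"] @ x_t + p["b"])   # Eq. (3): candidate state
    return (1.0 - z) * h_tilde + z * h_prev                            # Eq. (4): new hidden state

# toy usage: hidden size 4, input size 1 (one water level value per step)
rng = np.random.default_rng(0)
d_h, d_x = 4, 1
shapes = {"Wz": (d_h, d_x), "Uz": (d_h, d_h), "bz": (d_h,),
          "Wr": (d_h, d_x), "Ur": (d_h, d_h), "br": (d_h,),
          "U":  (d_h, d_x), "W":  (d_h, d_h), "b":  (d_h,)}
params = {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}
h = np.zeros(d_h)
for x in np.array([[25.2], [25.3], [25.1]]):   # a short, synthetic water level window
    h = gru_step(x, h, params)
print(h)
```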

2.2. TCN

TCN (Temporal Convolutional Network) is a deep learning model specifically designed for handling sequential data. It captures temporal dependencies in time series data through convolutional layers [31]. TCN demonstrates excellent performance in various sequence forecasting and classification tasks due to its series of innovative structural designs, particularly in addressing long-term dependency issues more effectively and efficiently than traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. The structure of the TCN network is shown in Figure 2.
Its development is a natural extension of deep learning applications in time series analysis. It stems from recognizing the advantages of Convolutional Neural Networks (CNNs) in handling sequential data, particularly in capturing long-term dependencies and avoiding the complexities associated with recurrent structures. The conceptualization and implementation of TCN integrate the benefits of CNNs, dilated convolutions, residual connections, and causal convolutions, creating a powerful tool for time series analysis. By leveraging the advantages of convolutional networks and optimizing specifically for time series data, TCN offers an efficient and robust framework for various sequence modeling tasks and is extensively used in areas such as natural language processing, audio analysis, and time series forecasting. Its main features include the following:
Causal Convolution: TCN uses causal convolution to ensure that when predicting the output at the current time step, the model can only utilize information from the current time step and prior steps, thereby preventing the leakage of future information. This is achieved by designing the convolutional kernel to only be connected to past data.
Causal convolution ensures that when predicting the output at time step t, the model relies only on inputs from time step t and before. Mathematically, given an input sequence $x_0, x_1, \ldots, x_{T-1}$, the causal convolution operation can be represented as [29]:

$$y_t = \sum_{i=0}^{k-1} w_i \, x_{t-i}$$

where $y_t$ is the output at time step t, $w_i$ represents the weights of the convolutional kernel, and k is the size of the kernel. To ensure the causality of the model, it is required that $t - i \geq 0$, meaning that the convolution never uses any future inputs $x_{t+1}, x_{t+2}, \ldots$ when computing $y_t$.
Dilated Convolution: To capture long-term dependencies, TCN employs dilated convolution techniques. By increasing the spacing between elements in the convolutional kernel, dilated convolution expands the receptive field of the convolutional layer, allowing the network to cover longer input sequences without significantly increasing computational complexity.
Dilated convolution introduces a dilation factor d to increase the spacing between the weights of the convolutional kernel, thereby expanding the receptive field of the convolutional layer and capturing long-term dependencies. The operation of dilated convolution can be represented as [29]:
$$y_t = \sum_{i=0}^{k-1} w_i \, x_{t - d \cdot i}$$
where d is the dilation factor used to control the spacing between input elements. By adjusting the value of d, the model can cover a longer range of input sequences without significantly increasing the number of parameters.
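Both formulas can be verified with a direct NumPy implementation; the input sequence and kernel weights below are arbitrary illustrations:

```python
import numpy as np

def causal_dilated_conv(x, w, d=1):
    """Causal (optionally dilated) 1-D convolution: y_t = sum_i w_i * x_{t - d*i}.

    Positions with t - d*i < 0 are treated as zero-padded history, so the
    output never depends on future samples.
    """
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(len(w)):
            j = t - d * i
            if j >= 0:            # causality constraint t - d*i >= 0
                y[t] += w[i] * x[j]
    return y

x = np.arange(8, dtype=float)     # toy input sequence x_0 ... x_7
w = np.array([0.5, 0.3, 0.2])     # kernel of size k = 3
print(causal_dilated_conv(x, w))          # d = 1: receptive field 3
print(causal_dilated_conv(x, w, d=2))     # d = 2: receptive field 5
```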
Residual Connections: TCN typically includes residual connections, which help alleviate the problems of vanishing or exploding gradients during the training of deep networks, allowing the network to learn deeper sequence features. In TCN, the output of the residual module includes not only the output of the convolutional layer but also the input itself, as represented by the following formula [29]:
$$\text{output} = \mathrm{Activation}\left(y_t + x_t\right)$$
where y t is the output of the convolutional layer (which may include multiple convolution operations), x t is the input to the residual module, and Activation() is an activation function, such as ReLU. Residual connections allow gradients to flow directly through the network, thereby improving training stability and efficiency.
Variable Sequence Lengths: Unlike recurrent neural network structures, TCN can handle sequences of varying lengths, making it more flexible when dealing with time series data of different lengths.
By incorporating these concepts, TCN can effectively capture both short-term and long-term dependencies in time series data. Specifically, TCN achieves learning of long-term sequence dependencies by stacking multiple dilated convolutional layers (with dilation factors typically increasing exponentially for each layer), while using residual connections to enhance the training efficiency and stability of the model. These designs enable TCN to perform excellently in various time series-related tasks, such as text generation and time series forecasting.
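Putting causal convolution, dilation, and residual connections together yields the standard TCN residual block. The PyTorch sketch below is one plausible rendering under those definitions; the kernel size (6) and channel width (32) echo the settings reported later in Section 3.4, but the exact block structure of the study's implementation is not specified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalBlock(nn.Module):
    """TCN residual block sketch: two causal dilated convolutions plus a skip path."""
    def __init__(self, c_in, c_out, kernel_size, dilation, dropout=0.03):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left padding keeps causality
        self.conv1 = nn.Conv1d(c_in, c_out, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(c_out, c_out, kernel_size, dilation=dilation)
        self.drop = nn.Dropout(dropout)
        # 1x1 convolution matches channel counts on the residual path if needed
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def _causal(self, conv, x):
        return self.drop(F.relu(conv(F.pad(x, (self.pad, 0)))))  # pad left only

    def forward(self, x):                              # x: (batch, channels, time)
        y = self._causal(self.conv2, self._causal(self.conv1, x))
        return F.relu(y + self.skip(x))                # Eq. (7): residual connection

x = torch.randn(2, 1, 30)                              # two univariate series, length 30
print(TemporalBlock(1, 32, kernel_size=6, dilation=2)(x).shape)  # (2, 32, 30)
```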

2.3. Transformer

In 2017, the team at Google introduced the Transformer architecture [32], which is built entirely on attention mechanisms and includes both encoder and decoder components. This model, possessing a more complex architecture than conventional attention methods, demonstrates superior capability in feature extraction. In contrast to the GRU, the Transformer employs self-attention to analyze the full sequence, enabling parallel processing. Additionally, it offers enhanced capabilities for managing global information, making it particularly effective for detecting periodic variations in the low-frequency components of the water level series.
The Transformer model is divided into two main components: the encoder on the left and the decoder on the right. The encoder comprises N = 6 identical layers, each containing two sub-layers: a multi-headed self-attention mechanism and a fully connected feed-forward neural network. The decoder replicates this structure with six identical layers; however, each layer in the decoder incorporates three sub-layers. The first two sub-layers are akin to those in the encoder, while the third sub-layer focuses on attention between the encoding and decoding processes. Residual connections and layer normalization are applied to each sub-layer in the Transformer. Figure 3 illustrates the Transformer model’s architecture.
The Transformer model employs an attention mechanism that consists of three types of vectors: Query, Key, and Value. Here, Query denotes the feature matrix for the query, Key signifies the feature matrix for the key, and Value refers to the feature matrix for the value. Each weight is determined through the application of the softmax function for normalization, which is then utilized to produce the output. The output matrix is derived using the formula presented in Equation (8) [32].
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_K}}\right)V$$

Here, $d_K$ represents the dimension of the Key vectors in each head, while $d_V$ denotes the dimension of the Value vectors.
In practical applications, the Transformer network primarily employs the multi-head self-attention framework. This method projects the Query, Key, and Value vectors into different subspaces using linear transformations, followed by the concatenation of the results from h attention heads to produce the final output. The entire computation process is outlined in Equations (9) and (10) [32]:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h\right)W^{O}$$

$$\mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right)$$

Here, $W^{O}$ denotes the output weight matrix, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ represent the projection matrices, and h indicates the number of heads in the multi-head attention mechanism.
To apply the Transformer model to time series analysis, the multi-head self-attention mechanism and the positional encoding are essential. These capabilities enable the entire input time series to be processed simultaneously, which speeds up training and enables effective modeling of dependencies over both long and short time frames. Furthermore, by analyzing trends in the low-frequency components obtained after decomposition, the Transformer model improves the accuracy of the water level predictions.
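As an illustration of Equations (8)–(10), the following NumPy sketch computes scaled dot-product attention and its multi-head variant. The dimensions (model width 32, 4 heads) mirror the settings reported in Section 3.4; all weight matrices are random stand-ins:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (8)."""
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """Multi-head attention, Eqs. (9)-(10): project, attend per head, concatenate."""
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

# toy usage: self-attention over 5 time steps, d_model = 32, h = 4 heads of size 8
rng = np.random.default_rng(0)
T, d_model, h = 5, 32, 4
X = rng.normal(size=(T, d_model))                    # Q = K = V = X for self-attention
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(h, d_model, d_model // h)) for _ in range(3))
Wo = rng.normal(scale=0.1, size=(d_model, d_model))
print(multi_head(X, X, X, Wq, Wk, Wv, Wo).shape)     # (5, 32)
```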

2.4. TCN–Transformer Coupled Model

Due to the distinct advantages of both the TCN and Transformer models, this paper considers coupling the two. This coupled model leverages the complementary strengths of both models in handling sequential data, thereby enhancing the model’s ability to process time series data, particularly in capturing long-term dependencies and understanding complex sequence dynamics. This makes it especially suitable for handling complex time series forecasting tasks, such as the upstream water levels addressed in this study.
From the perspective of long-term dependencies, water level data typically contain long-term seasonal and cyclical patterns. TCN can effectively capture these long-term dependencies through its dilated convolutions, while the Transformer's self-attention mechanism can learn dependencies between time points on a global scale, enabling the model to understand and utilize these complex temporal dynamics. From the perspective of local patterns, the convolutional structure of TCN is well suited for capturing local time series patterns, such as short-term water level fluctuations. These randomly occurring local patterns are crucial for understanding water level characteristics and making reasonable forecasts, especially during extreme weather events. In terms of accuracy and robustness, coupling TCN and Transformer enhances both: TCN provides efficient learning of local features in the time series, while the Transformer enhances the understanding of global dependencies, making the model more flexible and robust across different forecasting challenges. From the perspective of training difficulty, the inclusion of TCN alleviates the burden on the Transformer when processing long sequences, reducing training difficulty and improving efficiency; TCN condenses the sequence through local convolution operations, enabling the Transformer to learn the key information in the sequence more effectively. In summary, the TCN–Transformer coupled model effectively predicts upstream water levels by leveraging the strengths of both architectures, capturing the complex dynamics of water level changes more comprehensively and accurately, and thereby providing important support for water resource management, flood warning, and water conservancy planning. Figure 4 illustrates the TCN–Transformer coupled model.
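A compact PyTorch sketch of such a coupling is shown below: TCN layers extract local features, a Transformer encoder models global dependencies, and a linear head produces the next-step water level. The layer sizes follow Section 3.4, but since the paper does not specify the exact wiring, this is one plausible arrangement rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class TCNTransformer(nn.Module):
    """Sketch of the coupled model: causal TCN front end + Transformer encoder."""
    def __init__(self, d_model=32, nhead=4, num_layers=3, kernel_size=6):
        super().__init__()
        channels = [1, 32, 64, 128]                      # TCN channel widths (Sec. 3.4)
        layers = []
        for i in range(3):
            d = 2 ** i                                   # dilations 1, 2, 4
            layers += [nn.ConstantPad1d(((kernel_size - 1) * d, 0), 0.0),  # causal pad
                       nn.Conv1d(channels[i], channels[i + 1], kernel_size, dilation=d),
                       nn.ReLU()]
        self.tcn = nn.Sequential(*layers)
        self.proj = nn.Linear(channels[-1], d_model)     # embed TCN features
        enc = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=64,
                                         dropout=0.03, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                 # x: (batch, time, 1)
        f = self.tcn(x.transpose(1, 2))   # -> (batch, 128, time), causality preserved
        z = self.encoder(self.proj(f.transpose(1, 2)))   # -> (batch, time, d_model)
        return self.head(z[:, -1])        # predict from the final time step

model = TCNTransformer()
print(model(torch.randn(8, 5, 1)).shape)  # (8, 1): one predicted level per sample
```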

2.5. GRU–TCN–Transformer Coupled Model

To achieve a deeper decomposition of the upstream water levels and enhance forecasting accuracy, this research proposes a forecasting method that integrates the GRU neural network, the TCN deep learning model, and the Transformer model, using daily historical water level data from the Mengcheng Water Conservancy Hub covering the years 2018 to 2022. The GRU–TCN–Transformer coupled forecasting model is constructed by employing an analysis method that integrates SSA and CEEMDAN. This approach transforms complex, non-stationary time series data into multiple relatively stationary sub-series, thereby fully extracting the information in the original water level data. The permutation entropy algorithm then partitions the stationary sub-series by frequency, separating them into high- and low-frequency components. For the high-frequency sub-series, a GRU neural network, which is particularly effective at learning the features of long sequences with lower parameter overhead and faster computation, is used for modeling and forecasting; this shortens training time and minimizes resource consumption. For the low-frequency sub-series, the TCN–Transformer coupled model, which combines the advantages of both architectures, can comprehensively and accurately capture the complex dynamics of water level changes and thereby yield more accurate forecasts. Finally, the predicted results of each sub-series are combined to produce the upstream water level forecasts. Figure 5 illustrates the GRU–TCN–Transformer coupled model framework.
The process is outlined below (a code sketch of the full pipeline follows the list):
(1)
Preprocessing and Decomposition of Data. First, outliers in the data are replaced and filled in as necessary. SSA is used to decompose the original data sequence. After removing noise components, the data are reconstructed. Subsequently, CEEMDAN performs a secondary decomposition on the reconstructed data. The appropriate number of modes is selected based on the central frequency of different data sequences, resulting in K intrinsic mode function (IMF) components.
(2)
Frequency Division of Components. The components obtained from the CEEMDAN decomposition, including both the IMF and residual components, are partitioned: the PE algorithm calculates the entropy value of each component, separating them into high-frequency and low-frequency sub-series.
(3)
Model Training. Each IMF component is trained and tested independently. The components from the training dataset are used to train the deep learning models; once training is complete, forecasts are generated for each IMF component on the test dataset. The forecasts for all sub-series are then aggregated to reconstruct the final simulated upstream water level.
(4)
Model Evaluation. The simulated values from the GRU–TCN–Transformer coupled forecasting model are assessed against the actual values from the validation dataset using appropriate evaluation metrics.
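The sketch below strings these steps together; ssa_denoise, ceemdan, permutation_entropy, gru_forecast, and tcn_transformer_forecast are hypothetical wrappers around the routines sketched elsewhere in this paper, named here only for illustration:

```python
import numpy as np

def forecast_pipeline(series, pe_threshold=1.0):
    """'Decompose - divide frequency domain - predict' workflow sketch."""
    clean = ssa_denoise(series)                  # (1) SSA denoising and reconstruction
    imfs = ceemdan(clean)                        # (1) CEEMDAN secondary decomposition
    sub_forecasts = []
    for imf in imfs:                             # (2) PE-based frequency division
        if permutation_entropy(imf, order=3) > pe_threshold:
            sub_forecasts.append(gru_forecast(imf))              # high frequency
        else:
            sub_forecasts.append(tcn_transformer_forecast(imf))  # low frequency
    return np.sum(sub_forecasts, axis=0)         # (3) aggregate sub-series forecasts
```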
Traditionally, data have been split into training and validation subsets based on established guidelines, where the training subset generally represents more than 60% of the entire dataset and approximately 20% is allocated for validation. Depending on the context, researchers adopt various splitting ratios. For instance, Huang Chao and colleagues [33] assigned 75% of their dataset to training and cross-validation, reserving 25% for independent testing, when developing a model for summer precipitation forecasting in Hunan. Similarly, Ren Yufei and his team [34] allocated 70% of their dataset for training and 30% for testing when developing a short-term wind speed forecasting model for high-speed rail corridors. In this research, the MK mutation test revealed significant trend changes in the later part of the data. To ensure the overall accuracy of the model, we therefore used the initial 91% of the original data sequence for training and set aside the remaining 9% for validation during model construction.

2.6. Model Assessment Metrics

To assess the accuracy of the water level forecasts from each model, and to reflect the intuitiveness, reliability, and accuracy of the GRU–TCN–Transformer coupled model in comparison with the others, we utilized several evaluation metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²). These metrics are defined as follows:
Mean Absolute Error (MAE):

$$MAE = \frac{1}{N}\sum_{k=1}^{N}\left| y_k - \hat{y}_k \right|$$

Root Mean Square Error (RMSE):

$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left( y_k - \hat{y}_k \right)^2}$$

Coefficient of Determination (R²):

$$R^2 = 1 - \frac{\sum_{k=1}^{N}\left( y_k - \hat{y}_k \right)^2}{\sum_{k=1}^{N}\left( y_k - \bar{y} \right)^2}$$
In these metrics, $y_k$ represents the actual values in the dataset, $\hat{y}_k$ denotes the predicted values generated by the model, $\bar{y}$ is the average of the actual values, and N indicates the total number of data points. MAE measures the average absolute error between the predicted and true values, while RMSE is the square root of the average squared difference between them. Both function as measures of model error and forecasting precision, with values nearer to zero indicating greater accuracy and reduced error. The Coefficient of Determination (R²) assesses the goodness of fit of the regression model; an R² value approaching 1 indicates a high degree of model fit.
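These three metrics reduce to a few lines of NumPy; the toy water levels below are made-up values for illustration only:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and R^2 as defined in the equations above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

print(evaluate([25.20, 25.25, 25.18, 25.30], [25.21, 25.23, 25.20, 25.28]))
```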

3. Case Study

3.1. Study Area

Mengcheng County, under the jurisdiction of Bozhou City in Anhui Province, People’s Republic of China, is located in the western part of the province, bordering Henan Province. It occupies the junction between Anhui and Henan. The county lies at the western edge of the Huaihe River Basin, with geographical coordinates ranging from 116°15′43″ E to 116°49′25″ E and 32°55′29″ N to 33°29′04″ N. Mengcheng County covers a total area of approximately 2091 square kilometers. Its topography is predominantly plain, especially in the eastern and southern regions, while the western and northern regions exhibit some undulation, as shown in Figure 6.
Mengcheng County falls within the warm temperate monsoon climate zone, displaying distinct monsoon characteristics. The county’s climate features can be summarized as having four distinct seasons, moderate temperatures, and suitable precipitation levels. Specifically, spring is warm and humid but occasionally experiences spring cold snaps; summer is hot and rainy. The county’s geographical location and climate support its multifunctional infrastructure, including flood control, drainage, water storage, irrigation, transportation, and navigation. The levees on either side of the area are primary levees of the Huaihe River, holding significant geographical strategic importance. The layout of the hub is shown in Figure 7.

3.2. Data Source

The water level measurements utilized in this research are sourced from daily monitoring at the Mengcheng Water Conservancy Hub, covering the period from 1 January 2018 to 31 December 2022 and encompassing a total of 1826 days. The upstream water level data are shown in Figure 8; the maximum water level is 25.51 m and the minimum is 24.08 m. From the original data, it can be observed that the water level fluctuates within a narrow band without extreme variations. From the beginning of 2018 to early 2020, the water level remained relatively stable, oscillating slightly around 25.2 m. From early 2020 to mid-2021, there was a slight downward trend, particularly around 20 February 2020, when a notable decrease in water level occurred. From mid-2021 to the end of 2022, the upstream water level stabilized again, with a slightly wider fluctuation range but without drastic changes. Overall, the upstream water level demonstrates a relatively stable state.
In hydrological research, the Mann–Kendall (MK) trend test on historical upstream water level data offers significant advantages [35]. As a non-parametric statistical method, the MK test does not rely on distributional assumptions of the data, which makes it highly adaptable when dealing with environmental and hydrological data, especially when the data do not follow a strict normal distribution. Under a significance level of 0.05, a Mann–Kendall (MK) trend test was conducted on the original upstream water level data, with the results shown in Figure 9. The MK test results for the upstream water level data indicate that the UF statistic (red solid line) shows an insignificant trend during the early period of the study but then rapidly decreases, entering the significant region. Subsequently, the UF statistic demonstrates a significant upward trend until it intersects with the UB statistic (blue dashed line) in the later part of the time series (around 1300 days), indicating a significant moment of trend reversal. The UB statistic displays an opposing trend to UF for most of the time but also enters the significant region towards the end of the series, intersecting with the significant upward trend of the UF statistic. Both lines fluctuate around the significance level line (yellow dashed line) at 0.05, indicating that the upstream water level experienced significant fluctuations during the study period. Particularly toward the end of the time series, the crossing of UF and UB signifies a notable change in the water level trend. In summary, the upstream water level has undergone significant trend changes over the past five years, which may be related to regional water regulation policies.

3.3. Feature Data Decomposition Results

When studying the upstream water level data, considering the complexity and inherent non-linear characteristics of such data, this study proposes an analytical method that combines SSA with CEEMDAN. Specifically, the process involves initially applying SSA for preliminary decomposition and denoising, followed by reconstruction, and then using CEEMDAN for further in-depth decomposition.

3.3.1. Initial Decomposition Using SSA

SSA is a non-parametric time series analysis method that effectively identifies trends, periodic components, and noise within the original data. This method is particularly effective at removing or reducing random noise in the data. Therefore, when dealing with complex hydrological sequences, initial decomposition using SSA can effectively separate signal and noise components in the data, providing a clearer and noise-reduced basis for subsequent analysis [36]. This study utilizes MATLAB R2021a to perform SSA decomposition on the historical upstream water level data, with the specific results shown in Figure 10.
Figure 10b presents the overall decomposition results for upstream water levels, while Figure 10a illustrates the fluctuations of IMF2-5 components after decomposition. From Figure 10, it can be seen that the first row of data typically represents the trend component (IMF1) after SSA processing. This component is the most prominent feature in the upstream water level time series, appearing as a relatively smooth curve that reflects the overall temporal trend in water levels, indicating a certain trend and seasonal variation. The periodic components IMF2-IMF4 represent cyclical fluctuations or seasonal elements within the original data. These fluctuations may be associated with factors such as rainy seasons, gate discharges, or agricultural water demands. The final component, IMF5, is residual noise, which has a smaller amplitude and lacks a distinct structural pattern, as shown in Figure 10b. Residual noise may include data deviations caused by measurement errors or other irregular factors. After SSA decomposition, we discard the noise-related component, namely the high-frequency noise component IMF5, and retain only those components that represent the main dynamics of the original series for reconstruction. This process further purifies the data, providing a foundation for high-quality analysis. The reconstructed data preserve the most significant trend and periodic information while removing most of the random noise, thereby enhancing the signal quality of the data.
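For readers reproducing this step outside MATLAB, the NumPy sketch below implements basic SSA (embedding, SVD, truncated reconstruction, diagonal averaging). The window length and number of retained components are illustrative assumptions; the paper does not report its SSA settings:

```python
import numpy as np

def ssa_denoise(x, L=30, keep=4):
    """SSA sketch: keep the `keep` leading components, discard the noisy tail."""
    N, K = len(x), len(x) - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # trajectory matrix (L x K)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :keep] * s[:keep]) @ Vt[:keep]             # truncated reconstruction
    # diagonal (Hankel) averaging maps the matrix back to a 1-D series
    out, counts = np.zeros(N), np.zeros(N)
    for j in range(K):
        out[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return out / counts

# toy usage: a noisy periodic series keeps its trend/periodic structure
t = np.arange(200)
noisy = np.sin(2 * np.pi * t / 50) + 0.3 * np.random.default_rng(1).normal(size=200)
print(ssa_denoise(noisy)[:5])
```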

3.3.2. Secondary Decomposition Using CEEMDAN

Building on the denoising and reconstruction achieved through SSA, applying CEEMDAN for further decomposition can reveal deeper and more subtle dynamic features of the data. This method is particularly useful for complex upstream water level variations, providing profound insights into basin responses, seasonal changes, and potential nonlinear dynamics [37]. This study uses MATLAB R2021a to perform CEEMDAN decomposition on the historical upstream water level data, with the specific results shown in Figure 11.
As depicted in the figure, the original nonlinear and non-stationary time series is effectively decomposed into a series of intrinsic mode functions (IMFs), each revealing different inherent frequency components within the data. The decomposition ranges from high-frequency short-term fluctuations (e.g., IMF1 and IMF2) to low-frequency components representing long-term trends (e.g., IMF10 and IMF11). Observing the provided charts, it is evident that the volatility of the water level data decreases progressively from IMF1 to IMF11, with frequencies continuing to decrease. The components IMF1 through IMF4 exhibit substantial amplitude and high frequency, indicating more intense short-term water level changes, which may be related to daily gate operations or sudden hydrological events. The presence of these high-frequency components suggests instability in the original data during high-frequency periods. Further examination of IMF5 to IMF7 reveals smoother waveforms, highlighting medium- to long-term variations in the water level data, potentially corresponding to seasonal fluctuations or hydrological cycles driven by environmental changes. The low-frequency component IMF11 reflects the long-term trend in the data, which can be interpreted as a cumulative effect of factors such as climate change, regional water policies, or natural water cycles. Compared to the original water level series, the subsequence components obtained through CEEMDAN decomposition display more regular fluctuation patterns. This increased regularity indicates that CEEMDAN effectively reduces data non-stationarity, which is beneficial for training deep learning models. Deep learning models learn more representative features when processing stationary or nearly stationary data, thereby enhancing the accuracy and reliability of water level forecasting.
By combining SSA and CEEMDAN, this study utilizes the advantages of both analytical techniques. SSA provides a clean, denoised data foundation, while CEEMDAN further reveals the complex dynamics of the data. This integrated method enhances both the precision of data analysis and the insight into the underlying dynamics of hydrological processes.
In summary, through the process of initial decomposition and denoising with SSA, followed by reconstruction and in-depth decomposition with CEEMDAN, this study effectively handles and analyzes complex upstream water level data, revealing its intrinsic dynamic changes and characteristics, and providing a scientific basis for water resource management and forecasting.
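The paper performs this decomposition in MATLAB R2021a; an equivalent sketch in Python, assuming the PyEMD package (distributed on PyPI as EMD-signal), would look like this, with a synthetic signal standing in for the water level series:

```python
import numpy as np
from PyEMD import CEEMDAN   # pip install EMD-signal

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 500)
signal = (np.sin(8 * np.pi * t)            # fast oscillation
          + 0.5 * np.sin(2 * np.pi * t)    # slow trend-like component
          + 0.1 * rng.normal(size=500))    # noise

ceemdan = CEEMDAN(trials=100)   # number of noise-assisted realizations
imfs = ceemdan(signal)          # rows: IMF1 (highest frequency) ... last IMF/residue
print(imfs.shape)
```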

3.4. Model Building

Based on the pre-processing of the dataset, this study aims to construct a water level forecasting model using deep learning. Parameter selection is critical in model construction, as it directly affects prediction performance and generalization ability. Choosing the appropriate lag length is particularly important; an excessive lag length may lead to overfitting, while a short one may omit valuable historical information, thereby reducing prediction accuracy. In this paper, the input step length is determined using ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) calculations.
ACF measures the correlation between a time series and its lagged versions, essentially revealing the relationship between current observations and past values [38]. The ACF calculation involves several steps: first, the mean of the entire series is computed; for each lag k, the covariance between the series and its lagged version k is calculated. Then, this covariance is divided by the variance of the series to obtain the autocorrelation coefficient, which ranges from −1 to 1, where 1 indicates perfect positive correlation, −1 indicates perfect negative correlation, and 0 indicates no correlation.
PACF, on the other hand, measures the correlation between a time series and its lagged versions while removing the influence of the intermediate lags [39]. In simpler terms, it identifies the unique contribution of a specific lag k to the time series, given all shorter lags. Typically, PACF is calculated as follows: first, the series' ACF is computed; then, for each lag k, the time series is regressed on its first k lagged values, and the regression coefficient on lag k serves as the PACF value. ACF and PACF plots for the upstream water levels are shown in Figure 12.
As shown in Figure 12b, the partial autocorrelation of the upstream water level declines after the initial lags, though the rate of decrease is relatively slow, indicating the need to consider more lagged values. Accordingly, this study sets the input time step to N = 5 for the upstream water level prediction model. Additionally, each IMF component is assumed to have the same lag length N as the original water level series.
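A sketch of this lag selection and the corresponding supervised windowing, assuming statsmodels and a hypothetical data file of upstream levels:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

levels = np.loadtxt("upstream_levels.txt")   # hypothetical file of daily levels

# inspect the autocorrelation structure used to choose the input lag length
print(acf(levels, nlags=10))
print(pacf(levels, nlags=10))

def make_windows(x, n_lags=5):
    """Build (samples, N) lagged inputs and next-step targets for N = 5."""
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    return X, x[n_lags:]

X, y = make_windows(levels)
split = int(0.91 * len(X))                   # 91% training / 9% validation split
X_train, y_train, X_val, y_val = X[:split], y[:split], X[split:], y[split:]
```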
The parameters for deep learning model training in this study were determined through a literature review and trial-and-error optimization [40]. After extensive experiments, the optimal settings for the GRU model were selected as follows: 200 hidden units, ReLU activation function, 500 iterations, an initial learning rate of 0.005, a mini-batch size of 50, a learning rate decay period of 250 iterations, and a learning rate decay factor of 0.2. The optimizer used is Adam, and the loss function is RMSE. To control overfitting, a Dropout rate of 5% is set, and the input sequence length is N. For the TCN–Transformer model, the input lag step is N, the number of input features is 1, the embedding dimension is 32, the number of neurons in the fully connected layer is 64, and the number of heads in the multi-head attention mechanism is 4. To prevent overfitting, a dropout rate of 3% is set. The model includes 3 TCN and Transformer encoder–decoder layers. The learning rate is set to 0.001, and the number of iterations is 300. The TCN layers use 32, 64, and 128 channels, respectively, and the convolution kernel size is 6.
To assess the frequency levels of the IMF components after CEEMDAN decomposition, the permutation entropy algorithm calculates the entropy values for each component, as depicted in Figure 13. The IMF components obtained from SSA and CEEMDAN decomposition are typically arranged in order from high to low frequency, where high-frequency components capture rapid fluctuations in the data, while low-frequency components reveal underlying trends or periodic changes. When applying the permutation entropy (PE) algorithm to analyze these IMF components, they can be grouped based on their entropy values. This is because the PE algorithm captures the regularity and randomness within each component, characteristics closely linked to frequency: high-frequency components tend to exhibit more randomness (as they represent rapid changes), thus showing higher entropy values; conversely, low-frequency components are generally more regular or periodic, resulting in lower entropy values. This approach provides a theoretically sound basis for grouping and enhances the practical understanding and utilization of these IMF components.
Figure 13 shows the trend of decreasing permutation entropy from IMF1 to IMF11 components, with the horizontal axis (1–11) representing each corresponding component. It can be observed that there is a significant difference in permutation entropy values among the components. The permutation entropy value of IMF1 reaches nearly 1.8, while that of IMF11 is just above 0.6, indicating a notable difference in complexity across the components. The decrease in permutation entropy suggests that the transition from high-frequency to low-frequency components involves a shift from higher randomness and complexity to lower levels. Therefore, we set a threshold of 1 for permutation entropy values, categorizing sequences with values greater than 1 as high-frequency and those with values less than or equal to 1 as low-frequency. The GRU model is applied to the high-frequency sequences, while the TCN–Transformer model is used for the low-frequency sequences to further predict the components.
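A minimal permutation entropy implementation is sketched below; with order 3 and natural logarithms the maximum attainable value is ln(3!) ≈ 1.79, consistent with the ≈1.8 ceiling in Figure 13, so those are the assumptions used here. The stand-in IMFs are random series for illustration:

```python
from itertools import permutations
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Unnormalized permutation entropy (natural log) of a 1-D series."""
    counts = {p: 0 for p in permutations(range(order))}
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        counts[tuple(np.argsort(window))] += 1   # ordinal pattern of the window
    probs = np.array([c for c in counts.values() if c > 0], dtype=float) / n
    return float(-(probs * np.log(probs)).sum())

# frequency division with the threshold of 1 used in this study
imfs = [np.random.default_rng(i).normal(size=500) for i in range(3)]  # stand-ins
for i, imf in enumerate(imfs, 1):
    pe = permutation_entropy(imf)
    group = "high-frequency (GRU)" if pe > 1.0 else "low-frequency (TCN-Transformer)"
    print(f"IMF{i}: PE = {pe:.3f} -> {group}")
```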

4. Results and Commentary

4.1. Forecasted Outcomes

The TCN–Transformer and GRU models are used to predict each IMF component and the residual following decomposition, and these forecasts are then aggregated to form the output of the GRU–TCN–Transformer coupled model. A comparison of the forecasts from the GRU–TCN–Transformer coupled model with the validation dataset is shown in Figure 14. The predicted values generally exhibit a fluctuation trend similar to the actual data, with the two curves approaching or overlapping at certain time points, indicating that the forecasts are very close to the real data. The model's predictive capability is particularly important at extreme points, such as the peaks and troughs of the water level in the figure; the model predicts the occurrence of peaks and troughs relatively well, although there may be slight deviations in the exact water level values. In summary, the GRU–TCN–Transformer coupled model demonstrates relatively good trend-fitting ability and stability in water level forecasting.
To empirically evaluate the effectiveness of the GRU–TCN–Transformer coupled water level forecasting model, this study also developed four alternative predictive models for comparative analysis: the Support Vector Machine (SVM) model, the GRU model, the Transformer model, and the TCN–Transformer combined model. All models maintain consistent input conditions and parameter optimization, with the results of the two-stage SSA-CEEMDAN decomposition used as input values. The comparison results are shown in Figure 15, where each subplot displays the relationship between actual values and model forecasts as a scatter plot, with the black diagonal line representing the ideal case of perfect forecasting. The forecasts of all models are closely distributed around the diagonal line, indicating reasonable accuracy. Among them, the GRU–TCN–Transformer combined model has the most compact point distribution near the diagonal, suggesting that it outperforms the others; the scatter points for the SVM and Transformer models are more dispersed, indicating slightly lower forecasting accuracy.

4.2. Discussion of Findings

To assess the accuracy of the water level forecasts from each model, several evaluation metrics were utilized: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) [41]. Table 1 provides a summary of the evaluation metrics for the different models.
As shown in Table 1, the MAE and RMSE of the single SVM model are the largest and its R² value is the smallest, indicating the worst forecasting performance. Compared to the single GRU, Transformer, and SVM models, the combined TCN–Transformer model significantly improves forecasting accuracy, with MAE decreasing by 33.01%, RMSE decreasing by 31.17%, and R² increasing by 0.1314. Comparing the GRU–TCN–Transformer model with the combined TCN–Transformer model, MAE decreases by a further 24.14%, RMSE by 25.72%, and R² increases by 0.0747. Among all models, the GRU–TCN–Transformer model exhibits the best forecasting performance, with an MAE of 0.0154, an RMSE of 0.0205, and an R² of 0.8076. Its R² exceeds 0.8 and it has the smallest MAE and RMSE values, indicating that this model has high reliability [34]. A comparison of the various metrics is shown in Figure 16.
In summary, compared to the other models (TCN–Transformer, GRU, Transformer, and SVM), the GRU–TCN–Transformer model demonstrates higher predictive accuracy for upstream water levels, achieving an R² of 0.8076. This indicates that the combined use of TCN–Transformer and GRU maximizes the strengths of each model, allowing it to better capture the dynamic variations in water levels across the different components. Differences in model structure, data processing methods, and parameter optimization strategies all influence prediction accuracy to varying degrees. Notably, the composite GRU–TCN–Transformer model shows significant potential for reducing prediction error, suggesting that the two-stage decomposition method with frequency-based grouping used in this study is particularly suitable for hydrological conditions of this kind. This approach offers valuable insights and experience for future research.
The paper combines daily water level monitoring data from the Mengcheng Water Conservancy Hub between 2018 and 2022 to develop a forecasting method for the water level in front of the sluice gate, based on a GRU–TCN–Transformer coupled model and the PE algorithm. From a broader perspective, Cho et al. [15] proposed the GRU in 2014, a type of recurrent neural network that simplifies the computational process of internal neurons, thereby enhancing training efficiency while maintaining comparable output accuracy to the LSTM model. Although this method can account for both the nonlinearity and temporal characteristics of the water level data sequence, its predictive performance significantly declines when dealing with the non-stationary parts of the sequence and substantial noise. To address this issue, prior to predictive analysis, we employed a preprocessing technique that combines SSA and CEEMDAN to improve the stability of the algorithm and mitigate the impact of noise. Tao et al. [25] utilized CEEMDAN for decomposing historical water level time series data and employed the CEEMDAN-GRU model for predicting water levels following IMF reorganization. However, this study did not adequately consider the significance of various frequency components, each of which influences the prediction outcomes differently. Low-frequency components have a greater impact on long-term forecasts, and the GRU model frequently shows considerable bias during the training and prediction of these components, which adversely affects the accuracy of the overall predictions. Therefore, we used the PE algorithm to classify the IMFs and residual components obtained from CEEMDAN into high- and low-frequency components. The low-frequency components are modeled using a TCN–Transformer hybrid model, while the high-frequency components are trained using a GRU model. This approach captures trends from subtle variations in the data, leading to improved prediction accuracy. The final prediction results of the GRU–TCN–Transformer coupled model, when compared to the validation dataset, were very close to the real data, demonstrating relatively strong trend-fitting capabilities and stability. The predictive performance was also superior to that of other single or combined models.
Furthermore, this study has certain limitations. It primarily relies on monitoring data from a single hydrological environment, specifically the daily monitoring data of the water level at the gate of the Mengcheng Water Conservancy Hub. This choice may result in insufficient spatial representativeness of the data, thereby limiting the understanding of broader hydrological phenomena. During the research process, we did not consider incorporating additional variables that could influence the prediction of the water level at the gate. To enhance the diversity of the data, an important direction for future work will be to include a wider range of data sources. This should involve collecting multidimensional data such as meteorological information, water demand, hydrological factors, and water quality metrics. By employing interdisciplinary approaches, these data can be more effectively integrated into model construction and data analysis, thereby improving the model’s adaptability to complex environmental factors and the comprehensiveness of its predictions.

5. Conclusions

To address the nonlinear and non-stationary characteristics of the upstream water level data, this study proposes an analysis method combining Singular Spectrum Analysis (SSA) and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). This approach decomposes, denoises, and reconstructs the data before a further decomposition, yielding high-quality IMF subsequences. The PE algorithm then classifies the modal components into high-frequency and low-frequency groups, which are predicted separately by the GRU and TCN–Transformer models. This reduces the forecasting error of each component, improves overall forecasting accuracy, and enhances forecasting stability, providing a scientific basis for water resource management and forecasting.
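As a concrete illustration of the denoise-and-reconstruct stage, the sketch below implements a basic SSA reconstruction that keeps only the leading singular components of the trajectory matrix; the window length L = 30 and rank r = 5 are assumed values for illustration, not the settings of this study, and the reconstructed series would then be handed to CEEMDAN for the second decomposition.

import numpy as np

def ssa_reconstruct(x, L=30, r=5):
    """Denoise series x by keeping the r leading SSA components (L, r are assumed values)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r, :]                   # rank-r approximation
    # Diagonal (Hankel) averaging maps the matrix back to a length-N series.
    rec = np.zeros(N)
    hits = np.zeros(N)
    for j in range(K):
        rec[j:j + L] += Xr[:, j]
        hits[j:j + L] += 1
    return rec / hits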
The daily upstream water level forecasting results for the Mengcheng Water Conservancy Hub from 2018 to 2022 indicate that decomposing the original data with the combined SSA–CEEMDAN method significantly improves forecasting performance. The GRU–TCN–Transformer model achieves higher forecasting accuracy for the upstream water level than the comparison models (TCN–Transformer, GRU, Transformer, and SVM), with an R² of 0.8076, an MAE of 0.0154, and an RMSE of 0.0205. This demonstrates that combining TCN–Transformer and GRU maximizes the advantages of each model, allowing the coupled model to better capture the dynamics of water level changes in the components and yielding highly reliable forecasts.
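For reference, the evaluation metrics reported here and in Table 1 follow their standard definitions; a minimal computation sketch, where y_true is the validation series and y_pred the model output, is:

import numpy as np

def evaluate(y_true, y_pred):
    """Return the MAE, RMSE, and R2 of a forecast against observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(y_true - y_pred)))
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R2": float(r2)}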
The model proposed in this study effectively improves the forecasting accuracy of upstream water levels. However, the study did not incorporate other variables that may affect the water level in front of the sluice gate. A key focus for future research should therefore be expanding data diversity with additional dimensions, such as meteorological factors, water demand, hydrology, and water quality, and using interdisciplinary approaches to integrate these variables more deeply into model construction and data analysis. This will enhance the model's adaptability to complex environmental factors and the comprehensiveness of its forecasts.

Author Contributions

All authors contributed to the study conception and design. Writing and editing: J.Z. and T.H.; Chart editing: L.W. and Y.W.; Preliminary data collection: T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Henan Provincial Key R&D and Promotion Special Project (Science and Technology Tackling) (182102210066) and the National Natural Science Foundation of China (51709115).

Data Availability Statement

Data and materials are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Emanuel, K. Assessing the present and future probability of Hurricane Harvey’s rainfall. Proc. Natl. Acad. Sci. USA 2017, 114, 12681–12684. [Google Scholar] [CrossRef] [PubMed]
  2. Clark, M.P.; Bierkens, M.F.P.; Samaniego, L.; Woods, R.A.; Uijlenhoet, R.; Bennett, K.E.; Pauwels, V.R.N.; Cai, X.; Wood, A.W.; Peters-Lidard, C.D. The evolution of process-based hydrologic models: Historical challenges and the collective quest for physical realism. Hydrol. Earth Syst. Sci. 2017, 21, 3427–3440. [Google Scholar] [CrossRef] [PubMed]
  3. Mosavi, A.; Ozturk, P.; Chau, K.W. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536. [Google Scholar] [CrossRef]
  4. López-Vicente, M.; Pérez-Bielsa, C.; López-Montero, T.; Lambán, L.J.; Navas, A. Runoff simulation with eight different flow accumulation algorithms: Recommendations using a spatially distributed and open-source model. Environ. Model. Softw. 2014, 62, 11–21. [Google Scholar] [CrossRef]
  5. Liu, Z.Y.; Hou, A.Z.; Wang, X.Q. Flood forecasting for small-and medium-sized rivers based on distributed hydrological modeling. J. China Hydrol. 2015, 35, 1–6. [Google Scholar]
  6. Wu, L.; Bai, T.; Ha, Y.; Huang, Q. Study on the impact of hydrological sequence variability on reservoir scheduling and operation. J. Water Resour. Water Eng. 2016, 4, 88–92. [Google Scholar]
  7. Liu, X.; Yao, H.; Zhang, H.; Xia, Y.; Zhao, J. Hourly-scale water level prediction in the Three Gorges Reservoir based on machine learning. Yangtze River 2023, 2, 147–151. [Google Scholar] [CrossRef]
  8. Nguyen, T.T.; Le, H.T. Water level prediction at Tich-Bui River in Vietnam using support vector regression. In Proceedings of the 2019 International Conference on Machine Learning and Cybernetics (ICMLC), Kobe, Japan, 7–10 July 2019; pp. 1–6. [Google Scholar] [CrossRef]
  9. Puttinaovarat, S.; Horkaew, P. Flood forecasting system based on integrated big and crowdsource data by using machine learning techniques. IEEE Access 2020, 8, 5885–5905. [Google Scholar] [CrossRef]
  10. Khalaf, M.; Alaskar, H.; Hussain, A.J.; et al. IoT-enabled flood severity prediction via ensemble machine learning models. IEEE Access 2020, 8, 70375–70386. [Google Scholar] [CrossRef]
  11. Garcia, F.C.C.; Retamar, A.E.; Javier, J.C. Development of a predictive model for on-demand remote river level nowcasting: Case study in Cagayan River Basin, Philippines. In Proceedings of the 2016 IEEE Region 10 Conference (TENCON), Singapore, 22–25 November 2016; pp. 3275–3279. [Google Scholar] [CrossRef]
  12. Seo, Y.; Kim, S. River stage forecasting using wavelet packet decomposition and data-driven models. Procedia Eng. 2016, 154, 1225–1230. [Google Scholar] [CrossRef]
  13. Widiasari, I.R.; Nugoho, L.E.; Efendi, R. Context-based hydrology time series data for a flood prediction model using LSTM. In Proceedings of the 2018 5th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), Semarang, Indonesia, 27–28 September 2018; pp. 385–390. [Google Scholar] [CrossRef]
  14. Moishin, M.; Deo, R.C.; Prasad, R.; Raj, N.; Abdulla, S. Designing deep-based learning flood forecast model with ConvLSTM hybrid algorithm. IEEE Access 2021, 9, 50982–50993. [Google Scholar] [CrossRef]
  15. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  16. Liu, X.; Song, W.; Qian, F.; Wang, L.; Feng, L.; Xie, W. Meteorological drought prediction method based on VMD-CQPSO-GRU model. J. North China Univ. Water Resour. Electr. Power (Nat. Sci. Ed.) 2021, 42, 31–40. [Google Scholar] [CrossRef]
  17. Liu, W.; Chen, B.; Yu, Z. Exploration of lake water level prediction methods based on GRU-BP combined model. China Rural. Water Hydropower 2022, 58–65. Available online: https://openurl.ebsco.com/results?sid=ebsco:ocu:record&bquery=IS+1007-2284+AND+IP+11+AND+DT+2022 (accessed on 5 November 2024).
  18. Hu, H.; Ma, X.; Xu, Y.; Ren, Y. Multi-time scale prediction of downstream water level at Xiangjiaba based on weight correction and DRSN-LSTM model. Water Resour. Hydropower Technol. 2022, 53, 46–57. [Google Scholar] [CrossRef]
  19. Nie, Q.; Wan, D.; Zhu, Y.; Li, Z.; Yao, C. Hydrological model based on time-domain convolutional network. Comput. Appl. 2022, 46, 1756–1761. [Google Scholar]
  20. Adamowski, J.; Chan, H.F. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. 2011, 407, 28–40. [Google Scholar] [CrossRef]
  21. Golyandina, N.; Shlemov, A. Variations of singular spectrum analysis for separability improvement: Non-orthogonal decompositions of time series. arXiv 2013, arXiv:1308.4022. [Google Scholar] [CrossRef]
  22. Finley, M.G.; Broadfoot, R.M.; Shekhar, S.; Miles, D.M. Identification and removal of reaction wheel interference from in-situ magnetic field data using multichannel singular spectrum analysis. J. Geophys. Res. Space Phys. 2023, 128, e2022JA031020. [Google Scholar] [CrossRef]
  23. Colominas, M.A.; Schlotthauer, G.; Torres, M.E. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomed. Signal Process. Control 2014, 14, 19–29. [Google Scholar] [CrossRef]
  24. Guo, S.; Wen, Y.; Zhang, X.; Zhu, G.; Huang, J. Research on precipitation prediction based on a complete ensemble empirical mode decomposition with adaptive noise–long short-term memory coupled model. Water Supply 2022, 22, 9061–9072. [Google Scholar] [CrossRef]
  25. Sun, T.; Wang, Y.; Chen, W.; Liang, X. Research on water level prediction on CEEMDAN-GRU model under the IMFs recombination. In Proceedings of the 2021 2nd Asia Symposium on Signal Processing (ASSP), Beijing, China, 12–14 November 2021; pp. 77–83. [Google Scholar] [CrossRef]
  26. Zhang, J.; Luan, H.; Sun, M.; Zhai, F.; Xu, J.; Zhang, M.; Liu, Y. Improving the transformer translation model with document-level context. arXiv 2018, arXiv:1810.03581. [Google Scholar] [CrossRef]
  27. Civitarese, D.S.; Szwarcman, D.; Zadrozny, B.; Watson, C. Extreme precipitation seasonal forecast using a transformer neural network. arXiv 2021, arXiv:2107.06846. [Google Scholar] [CrossRef]
  28. Vale, L.D.N.; Maia, M.D.A. Towards a question answering assistant for software development using a transformer-based language model. arXiv 2021, arXiv:2103.09423. [Google Scholar] [CrossRef]
  29. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165. Available online: https://arxiv.org/abs/1611.05267 (accessed on 5 November 2024).
  30. Hussain, Y.; Huang, Z.; Zhou, Y.; Wang, S. CodeGRU: Context-aware deep learning with gated recurrent unit for source code modeling. Inf. Softw. Technol. 2020, 125, 106309. [Google Scholar] [CrossRef]
  31. Lea, C.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks: A unified approach to action segmentation. In Proceedings of the Computer Vision–ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Part III; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 47–54. [Google Scholar] [CrossRef]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  33. Huang, C.; Li, Q.P.; Xie, Y.J.; Peng, J.D. Application of machine learning methods in summer precipitation prediction in Hunan Province. J. Atmos. Sci. 2022, 45, 191–202. [Google Scholar]
  34. Ren, Y.F.; Li, L.; Guo, J.J. Short-term wind speed prediction model along high-speed railway based on CEEMDAN-GWO-LSSVM. Technol. Econ. Areas Commun. 2023, 25, 68–73. [Google Scholar]
  35. Hamed, K.H.; Rao, A.R. A modified Mann-Kendall trend test for autocorrelated data. J. Hydrol. 1998, 204, 182–196. [Google Scholar] [CrossRef]
  36. Bonizzi, P.; Karel, J.M.; Meste, O.; Peeters, R.L. Singular spectrum decomposition: A new method for time series decomposition. Adv. Adapt. Data Anal. 2014, 6, 1450011. [Google Scholar] [CrossRef]
  37. Xie, P.; Gu, H.; Sang, Y.F.; Wu, Z.; Singh, V.P. Comparison of different methods for detecting change points in hydroclimatic time series. J. Hydrol. 2019, 577, 123973. [Google Scholar] [CrossRef]
  38. Yakubu, U.A.; Saputra, M.P.A. Time series model analysis using autocorrelation function (ACF) and partial autocorrelation function (PACF) for E-wallet transactions during a pandemic. Int. J. Glob. Oper. Res. 2022, 3, 80–85. [Google Scholar] [CrossRef]
  39. Weiß, C.H.; Aleksandrov, B.; Faymonville, M.; Jentsch, C. Partial autocorrelation diagnostics for count time series. Entropy 2023, 25, 105. [Google Scholar] [CrossRef]
  40. Cahuantzi, R.; Chen, X.; Güttel, S. A comparison of LSTM and GRU networks for learning symbolic sequences. In Science and Information Conference; Springer Nature: Cham, Switzerland, 2023; pp. 771–785. [Google Scholar] [CrossRef]
  41. Zhao, J.; Nie, G.; Yan, M.; Wang, Y.; Wang, L. A novel approach to precipitation prediction using a coupled CEEMDAN-GRU-Transformer model with permutation entropy algorithm. Water Sci. Technol. 2023, 88, 1015–1038. [Google Scholar] [CrossRef]
Figure 1. Structure of the basic GRU unit.
Figure 2. TCN network modules: (left) cell structure details; (center) residual block 1; (right) residual block 2.
Figure 3. Transformer model structure.
Figure 4. TCN–Transformer coupled model structure.
Figure 5. Architecture of the coupled GRU–TCN–Transformer framework.
Figure 6. Geographic location map of Mengcheng.
Figure 7. Schematic diagram of the Mengcheng hub layout.
Figure 8. Pre-gate water level data.
Figure 9. Mann–Kendall test result chart for upstream water level data.
Figure 10. SSA decomposition of upstream water level data. (a) Fluctuation diagram of IMF2–5 after SSA decomposition of the upstream water level data; (b) overall SSA decomposition diagram of the upstream water level data.
Figure 11. CEEMDAN decomposition of upstream water level reconstruction data.
Figure 12. ACF and PACF plots of upstream water levels.
Figure 13. Results of permutation entropy calculations for the upstream water level sequences.
Figure 14. Upstream water level GRU–TCN–Transformer coupled model forecasting vs. validation set data.
Figure 15. Scatter plots of upstream water level forecasting model fits.
Figure 16. Comparison of different model evaluation indicators.
Table 1. Performances of different models.

Model                  MAE      RMSE     R²
GRU–TCN–Transformer    0.0154   0.0205   0.8076
TCN–Transformer        0.0203   0.0276   0.7329
GRU                    0.0271   0.0371   0.6428
Transformer            0.0303   0.0401   0.6015
SVM                    0.0352   0.0436   0.5367