1. Introduction
Tidal variations profoundly impact coastal ecosystems, marine resource development, and climate change research. Tidal field forecasting is a core topic in marine and coastal hydrodynamics, with significant implications for disaster prevention, ecological protection, and port operations in coastal areas. However, tidal changes are influenced by numerous factors, and the nonlinear characteristics and topographic effects in coastal regions complicate predictions. Traditional monitoring methods, constrained by the limited number of tidal stations and high maintenance costs, cannot provide comprehensive tidal variation information. The early harmonic analysis method relied on a large amount of historical observational data, which limited its application in dynamic or data-scarce environments. Wavelet analysis, as an improved method, allows for local time–frequency analysis and uses adaptive basis functions to capture the transient features of tidal signals. However, its effectiveness depends on the selection of the wavelet basis function and scale parameters, which may not be widely applicable. In semi-enclosed bays, the complex coastline geometry, strong tidal interactions, and local meteorological effects further complicate the application of harmonic and wavelet analysis. With the advancement of computational capabilities, numerical simulation methods have become an important tool for tidal field prediction. These methods simulate tidal processes through fluid dynamics equations, making them adaptable to complex environments such as semi-enclosed bays or estuaries. However, they come with high computational costs and require accurate input data, as input biases may affect the reliability of the results.
Machine learning enables computers to learn from data and make predictions through algorithms. Deep learning, a subfield of machine learning, automatically extracts features through multi-layer neural networks, making it particularly suited for complex tasks. The advancement of machine learning technologies, especially the widespread application of deep learning models, offers new solutions for tidal field forecasting.
To verify the applicability of deep learning in tidal field prediction, this study applies deep learning techniques to predict the tidal field of a semi-enclosed bay, using Qinzhou Bay as a case study. Tidal variations significantly affect the hydrological conditions, coastal erosion, and fishery resources of semi-enclosed bays. Accurate tidal field predictions provide essential references for marine resource development, fishery management, and port navigation. Furthermore, the complex tidal, oceanic circulation, and topographic characteristics of semi-enclosed bays introduce significant uncertainty and complexity in the prediction of tidal fields across spatial and temporal scales. This study, therefore, serves as an ideal testing platform for deep learning models, validating their effectiveness and advantages in addressing nonlinearity and the complexity of spatiotemporal problems.
Tidal level forecasting traditionally relied on tidal prediction theories that considered gravitational influences. Isaac Newton first proposed the theory of equilibrium tides in Principia (1687), based on the law of universal gravitation. Darwin [1] later advanced this theory in 1892 by applying harmonic analysis of tidal forces to estimate tidal behavior in coastal regions with relatively simple seabed topography. In 1957, Doodson [2] expanded this work using the least squares method combined with extensive tidal observation data to estimate more complex tidal phenomena in shallow water areas. These methods have provided effective predictive tools for tidal behavior in shallow waters. However, they require large amounts of historical observational data and exhibit limited adaptability to complex seabed topography and nonlinear tidal phenomena.
With the development of computational capabilities, numerical simulation methods have been increasingly applied to tidal level forecasting. Guan et al. [3] proposed an instantaneous tide correction (ITC) algorithm for areas with insufficient tide gauges, combining the MIKE21 model to simulate astronomical tides and reduce errors. Georg et al. [4] developed and validated a two-dimensional finite element model for the Venice Lagoon, effectively simulating tidal wave propagation. Gallien et al. [5] employed a two-dimensional hydraulic inundation model to more accurately predict tidal flooding impacts on streets and land plots. Feng [6] used numerical models to study the spatial variability effects of sea level rise (SLR) on tides in the coastal areas of Jiangsu and the Yellow Sea shelf. Researchers like Kim [7], Marsooli [8], and Kong [9] used coupled models to analyze the effects of tides on storm surges and waves, while Hsiao et al. [10,11] investigated typhoon effects on storm wave heights along Taiwan’s northeastern coast. Similarly, Anwa et al. [12] applied SEAWAT-2005 and PHT3D models to simulate how tides and waves impact nutrient distribution in coastal aquifers. Yang [13] and Hu [14] studied typhoon effects on wave–current interactions in the Zhoushan Islands, with Zhang et al. [15] developing a three-dimensional ice–ocean coupled model to explore the influence of sea ice on astronomical tides in the Bohai Sea. Wang [16] and Liu [17] examined K1 internal tide reflections in the northern South China Sea and the seasonal variations of M2 internal tides in the Yellow Sea. These studies demonstrate that compared to harmonic analysis methods, numerical simulations offer better adaptability to complex geographical and meteorological conditions, improving both prediction accuracy and application range. However, numerical simulations are computationally intensive, particularly for large-scale, long-duration simulations, necessitating improvements in prediction accuracy and computational efficiency.
The advent of artificial intelligence technologies has led to the application of machine learning models for tide level prediction. Liang et al. [18] developed neural network models such as the Differential Neural Network (DNN) and Weather Data-Based Neural Network model (WDNN), achieving effective tide level predictions. Similarly, Riazi [19] proposed deep neural networks, while Deo et al. [20] used a network-based approach to replace empirical correction factors. Other studies include BPNN-based short-term data predictions by Lee [21], LightGBM and CNN-BiGRU combined models by Su et al. [22], and harmonic analysis combined with NARX neural network models by Wu et al. [23]. Chang et al. [24] applied tidal force-based neural networks, and Vicens-Miquel et al. [25] utilized deep learning models. More recently, Zhang et al. [26] proposed graph convolutional recurrent networks, and Salim et al. [27] applied ANN models, all achieving significant progress in prediction accuracy and computational efficiency. Machine learning has also been effective in ocean wave prediction, as demonstrated by Granata et al. [28] and Sadeghifar et al. [29], who explored the efficiency of M5P regression trees and random forests. Despite these advancements, traditional machine learning models struggle to address long-term dependencies and fail to capture complex interdependencies in long time series data.
The LSTM (Long Short-Term Memory) algorithm, with its unique gating mechanism, effectively captures long-term dependencies, solving the vanishing gradient problem in traditional neural networks. Recent applications of LSTM in ocean wave and tide level prediction include nearshore wave height reconstructions by Jörges et al. [30], Seq2Seq network integration by Pirhooshyaran et al. [31], polar wave height predictions by Ni et al. [32], and short-term tide level forecasts by Bai et al. [33]. These studies highlight LSTM’s potential but also its limitations, particularly its focus on single-point tide level prediction. Addressing spatial integrity and prediction efficiency for tidal fields with spatiotemporal characteristics remains challenging, especially in semi-enclosed bays.
This study introduces an LSTM-based model for rapid tidal field forecasting, leveraging high-precision tidal datasets from Delft3D numerical simulations. The model is compared with other machine learning algorithms, such as Transformer, Random Forest, and KNN regression models, to verify its advantages in tidal field prediction.
To the best of the authors’ knowledge, this is the first study to apply the LSTM model to the rapid prediction of tidal fields in semi-enclosed bays. The proposed method quickly and accurately predicts tidal fields based on boundary tidal data from global ocean models, effectively addressing the spatiotemporal complexities of tidal variations while enhancing prediction accuracy and efficiency. This approach provides a novel solution for tidal simulation and ocean hydrodynamic studies, offering valuable technical support for marine environmental monitoring and coastal disaster prevention.
2. Methodology
2.1. Overall Research Approach
The overall research approach of this study is designed to predict the tide level field in a semi-closed marine bay using the LSTM model. As depicted in Figure 1, the process begins with the use of a validated numerical model, Delft3D, to perform numerical simulations of Qinzhou Bay, which generates accurate tide data for comparison. The obtained data are then preprocessed to form a comprehensive dataset. This dataset is divided into three sets: training, validation, and testing. The training set is used for training the model, the validation set is employed for hyperparameter tuning, and the test set serves to evaluate the model’s generalization ability. An LSTM model is developed for tide level prediction. The input to the model consists of tide data from four boundary points over the past 6 h, which are extracted from a global ocean model. The model aims to predict the changes in tide levels across the entire region for the next 1 h. During the optimization phase, hyperparameters of the LSTM model are adjusted, and regularization techniques are introduced to enhance the prediction accuracy. To evaluate the model’s performance, error analysis is conducted using metrics such as the coefficient of determination (R²) and root mean square error (RMSE). These metrics help identify the sources of error and guide adjustments to the model, ensuring its performance reaches an optimal level.
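Both evaluation metrics can be computed directly from paired observed and predicted tide level series; the following minimal sketch uses illustrative values, not data from this study:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted tide levels."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative observed and predicted tide levels (m)
obs = np.array([1.2, 0.8, -0.3, -1.1, 0.4])
pred = np.array([1.1, 0.9, -0.2, -1.0, 0.5])
```

An R² close to 1 and an RMSE close to 0 indicate that the predictions track the observations well.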
2.2. Study Area and Data Sources
2.2.1. Study Area
The study area, shown in Figure 2, is Qinzhou Bay, located in the southeastern part of Qinzhou City, Guangxi Zhuang Autonomous Region, China. It lies to the north of the Beibu Gulf and is a typical bay-type marine area. This region serves as a crucial port and marine economic development zone for Guangxi, with significant economic, environmental, and shipping importance. Qinzhou Bay is rich in marine resources, featuring a winding coastline, extensive shoals and mudflats, and shallow waters, making it a typical tide-affected area. The tides in Qinzhou Bay are semi-diurnal, with a large tidal range: the tidal difference can reach 2 to 3 m, and tidal fluctuations are most pronounced during the spring tides of the spring and autumn seasons. These tidal effects play a significant role in influencing the hydrological processes, sediment transport, and marine ecosystems in the bay area. Tidal currents, especially at the bay’s entrance, are intense, creating a typical tidal hydrodynamic environment.
2.2.2. Data Sources
The coastline data are sourced from the GSHHG dataset provided by the National Centers for Environmental Information, which was first processed and compiled in 1996 by geographers Wessel and Smith [34]. The topographic data are obtained from the ETOPO Global Relief Model, also provided by the National Centers for Environmental Information. For this study, the ETOPO 2022 (Bedrock; 60 arcseconds) dataset is selected. The observed tidal level station data are obtained from the China National Shipping Service Network. The boundary conditions are derived from the water level time series at the open boundaries, based on the tidal harmonic constants from the global tidal model EOT20. EOT20, developed by the German Geodetic Research Institute at the Technical University of Munich (DGFI-TUM), is based on residual tidal analysis of multi-mission satellite altimetry data, providing the amplitudes and phases of 17 tidal constituents with a resolution of 0.125 degrees [35].
2.3. Physical Representation
The hydrodynamics in a semi-closed bay can be described using the following hydrodynamic governing equations [36]:
(1) Continuity Equation
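Consistent with the variable definitions below, the depth-averaged continuity equation can be written as follows (a simplified form that omits the curvilinear metric coefficients of the full Delft3D formulation):

$$\frac{\partial \zeta}{\partial t} + \frac{\partial \left[(d+\zeta)\,u\right]}{\partial \xi} + \frac{\partial \left[(d+\zeta)\,v\right]}{\partial \eta} = Q$$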
In the equation, u and v represent the flow velocities in the ξ and η directions, respectively; Q denotes the flow rate change due to drainage or precipitation per unit area; d is the depth below the reference plane, ζ represents the free surface elevation above the reference plane, and d + ζ is the total water depth; t represents time.
(2) Momentum Equation
The momentum equation in the ξ-direction is as follows:
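Consistent with the variable definitions below, a simplified form of the Delft3D σ-coordinate momentum equation in the ξ-direction (omitting the curvilinear metric terms) is:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial \xi} + v\frac{\partial u}{\partial \eta} + \frac{w}{d+\zeta}\frac{\partial u}{\partial \sigma} - fv = -\frac{1}{\rho_0}P_{\xi} + F_{\xi} + \frac{1}{(d+\zeta)^2}\frac{\partial}{\partial \sigma}\!\left(\nu_v \frac{\partial u}{\partial \sigma}\right) + M_{\xi}$$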
The momentum equation in the η-direction is as follows:
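Correspondingly, a simplified form of the η-direction momentum equation (again omitting the curvilinear metric terms) is:

$$\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial \xi} + v\frac{\partial v}{\partial \eta} + \frac{w}{d+\zeta}\frac{\partial v}{\partial \sigma} + fu = -\frac{1}{\rho_0}P_{\eta} + F_{\eta} + \frac{1}{(d+\zeta)^2}\frac{\partial}{\partial \sigma}\!\left(\nu_v \frac{\partial v}{\partial \sigma}\right) + M_{\eta}$$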
In the equations, f is the Coriolis force parameter, w represents the flow velocity in the vertical direction; σ is the scaled vertical coordinate; ρ0 is the water density; Pξ and Pη represent the hydrostatic pressure gradients in the ξ and η directions, respectively; Fξ and Fη represent the effects of horizontal Reynolds stresses in the ξ and η directions, respectively; Mξ and Mη represent the momentum influences in the ξ and η directions due to external factors, respectively; νv is the vertical eddy viscosity coefficient.
This study uses a two-dimensional hydrodynamic model for simulation, setting both the horizontal eddy viscosity and horizontal eddy diffusivity coefficients to 1 m²/s to effectively capture the turbulent effects and material diffusion processes in the water flow. In the transport equation, the R = HU form is used to describe the conveyance characteristics of the water flow, and the Perot q(uio-u) fast convection scheme is selected as the convection type to handle nonuniform convection and turbulent regions, ensuring accurate simulation of rapidly changing flow velocity environments. The boundary smoothing time is set to 3600 s to damp fluctuations near the boundaries, ensuring the stability of the simulation results. Additionally, the model adopts the Villemonte fixed weir scheme with a weir contraction coefficient of 1, accurately simulating the contraction effect of weir flow.
2.4. Data Processing
Due to the limited number of observation stations and the fact that observed data are primarily site-specific, it is challenging for single-point observational data to capture the spatial and temporal variability of the study area. Numerical simulations are used to generate high-resolution tidal data over a larger area, which helps compensate for gaps in both spatial and temporal data coverage. This approach allows for the construction of a comprehensive dataset, thereby improving the prediction accuracy and generalization ability of the deep learning model. However, the accuracy and reliability of numerical simulations directly affect the quality of the generated data, which in turn impacts the performance of the LSTM model. Numerical models require high-quality input data, such as precise topographic data and tidal boundary conditions. Errors in the input data can propagate through the simulation. In addition, the choice of numerical model, grid resolution, and parameterization schemes can also influence the simulation results, leading to biases in the generated dataset. To ensure that the simulated data closely represent real-world tidal dynamics, multiple validation and calibration steps are necessary. The numerical model was calibrated using observational data from the Qisha, Paotai Jiao, Fangchenggang, and Bailongwei stations, adjusting the model parameters to minimize the discrepancy between simulated and observed values. Validation was performed by comparing the simulated tidal data with the tidal data from observation stations, and the model’s performance was assessed using metrics such as root mean square error (RMSE) and correlation coefficients.
In this study, the Delft3D model is used to simulate the tidal processes in Qinzhou Bay. The model domain spans 21°29′ N to 21°55′ N and 108°12′ E to 108°57′ E, covering the coastal and nearshore waters of Qinzhou Bay. The model employs an unstructured grid for discretization, consisting of approximately 90,279 grid cells; the overall grid is shown in Figure 2. Local grid refinement is applied in nearshore areas within the simulation domain to enhance the accuracy of tidal simulations. The minimum grid resolution is approximately 100 m, which effectively captures the spatial variation of tidal dynamics while maintaining a balance between computational efficiency and simulation accuracy. The model time step is set to 5 min.
The model boundary conditions are based on the harmonic constants of the eight major tidal constituents (M2, S2, N2, K2, K1, O1, P1, and Q1), which provide water level time series data at the boundaries. These time series reflect water level changes at different time points and are used as boundary conditions in the numerical model to drive the hydrodynamic processes within the simulation domain.
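As an illustration of how such a boundary water level series is synthesized from harmonic constants, the sketch below sums a few major constituents using the standard harmonic form ζ(t) = Σᵢ Aᵢ cos(ωᵢt − φᵢ). The constituent speeds are standard astronomical values, but the amplitudes and phases are hypothetical placeholders, not EOT20 output:

```python
import numpy as np

# Angular speeds (deg/h) are standard astronomical values for each
# constituent; amplitudes (m) and phases (deg) are illustrative only.
CONSTITUENTS = {
    #        speed (deg/h)  amplitude (m)  phase (deg)
    "M2": (28.9841042, 0.80, 120.0),
    "S2": (30.0000000, 0.30,  95.0),
    "K1": (15.0410686, 0.40, 210.0),
    "O1": (13.9430356, 0.35, 180.0),
}

def water_level(t_hours):
    """Harmonic synthesis: zeta(t) = sum_i A_i * cos(omega_i * t - phi_i)."""
    t = np.asarray(t_hours, dtype=float)
    zeta = np.zeros_like(t)
    for speed, amp, phase in CONSTITUENTS.values():
        zeta += amp * np.cos(np.radians(speed * t - phase))
    return zeta

t = np.arange(0, 24 * 30, 1.0)   # hourly boundary series over 30 days
levels = water_level(t)
```

The resulting series oscillates within the envelope set by the summed amplitudes, exhibiting the spring–neap modulation produced by the interfering constituents.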
The primary output of the simulation is the tidal time series data, reflecting the spatial and temporal variation of tidal levels at different points within Qinzhou Bay. Specifically, tidal data from four boundary points will be used as input features for the deep learning model. Additionally, the simulation will provide detailed tidal data at various locations across Qinzhou Bay, capturing the temporal evolution of tidal levels throughout the region. The tidal data will be normalized and divided into three parts: 60% for model training, 20% for validation, and the remaining 20% for testing. This division ensures that the model’s generalization ability and prediction accuracy are properly assessed.
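A minimal sketch of this preprocessing step, assuming min-max normalization and a chronological split (the specific scaling method is not stated in the text):

```python
import numpy as np

def normalize_and_split(series, train_frac=0.6, val_frac=0.2):
    """Min-max normalize a tidal series (scaling method assumed), then
    split it chronologically into training, validation, and test sets."""
    s = np.asarray(series, dtype=float)
    n = len(s)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    # Fit the min-max scaling on the training portion only, so no
    # information from the validation/test periods leaks into training
    s_min, s_max = s[:i_train].min(), s[:i_train].max()
    scaled = (s - s_min) / (s_max - s_min)
    # Chronological split preserves temporal ordering for sequence models
    return scaled[:i_train], scaled[i_train:i_val], scaled[i_val:]

# Synthetic stand-in for a simulated tidal series
train, val, test = normalize_and_split(np.sin(np.linspace(0, 20, 1000)))
```

For a sequence model, splitting by time (rather than shuffling) keeps the evaluation honest: the model is always tested on periods it has never seen.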
2.5. LSTM Method
LSTM is an extension of recurrent neural networks (RNNs) designed to address the gradient vanishing and gradient explosion problems that traditional RNNs face when handling long sequences. LSTM controls the flow of information through three gating mechanisms: the input gate, forget gate, and output gate. These gates enable the model to capture long-term dependencies in time series data, making it ideal for applications like tidal level prediction.
In this study, the LSTM model uses a two-layer LSTM network for tidal level prediction, which allows it to capture long-term dependencies in time series data. As shown in Figure 3, the first LSTM layer consists of 200 units and uses the tanh activation function. This layer outputs values for each time step, providing sequence information for the subsequent LSTM layer. The second LSTM layer contains 150 units and outputs only the result from the last time step of the sequence. This output is then passed to a fully connected layer for further processing. To reduce overfitting, a Dropout layer with a rate of 0.8 is added after each LSTM layer. The output layer uses a Dense fully connected layer with dimensions matching the target data, and a LeakyReLU activation function to accelerate the training process. The model is trained using the Adam optimizer and the mean squared error (MSE) loss function, which minimizes the error between the predicted and actual tidal level data. This architecture effectively learns the temporal features of the tidal level data through the LSTM layers, while the Dropout layers enhance the model’s generalization ability, enabling it to make accurate predictions of future tidal level changes.
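The architecture described above can be sketched in Keras as follows. The input shape (6 hourly steps × 4 boundary points) follows the study design, but the output dimension (number of predicted grid points) is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_STEPS, N_BOUNDARY = 6, 4   # 6 h of hourly data from 4 boundary points
N_OUTPUTS = 100              # number of field points (illustrative)

model = models.Sequential([
    layers.Input(shape=(N_STEPS, N_BOUNDARY)),
    # First LSTM layer: 200 units, tanh, returns the full sequence
    layers.LSTM(200, activation="tanh", return_sequences=True),
    layers.Dropout(0.8),     # dropout rate as described in the text
    # Second LSTM layer: 150 units, returns only the last time step
    layers.LSTM(150, activation="tanh"),
    layers.Dropout(0.8),
    # Dense output sized to the target field, with LeakyReLU activation
    layers.Dense(N_OUTPUTS),
    layers.LeakyReLU(),
])
model.compile(optimizer="adam", loss="mse")
```

Given a batch of boundary histories shaped `(batch, 6, 4)`, the model emits one tide level per field point for the next hour.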
2.6. Reference Methods
2.6.1. Transformer
In this study, the Transformer model architecture utilizes the self-attention mechanism and feed-forward networks to process time series data. The multi-head attention mechanism is employed to capture long-range dependencies within the sequence, while residual connections improve the stability and training performance of the model. The basic structure of the Transformer is defined by the transformer_encoder function, which includes the self-attention layer, feed-forward neural network layer, and residual connections. The self-attention layer is implemented using MultiHeadAttention, with each attention head having a size of 64, and the number of attention heads set to 4. The output of the attention layer is processed through Dropout and LayerNormalization, which are then added back to the input through a residual connection. The feed-forward neural network consists of two Conv1D convolutional layers used to map features and restore the output dimensions. The result is again normalized using LayerNormalization and added to the input via a residual connection. In the build_model function, multiple Transformer encoder blocks are stacked together, and the predictions are output through a multi-layer feed-forward network (MLP). The MLP consists of Dense layers and Dropout layers, with the final layer’s output dimension matching the target dataset’s dimension. The model is compiled using the Adam optimizer and the mean squared error (MSE) loss function, which minimizes the error between the predicted and actual results.
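The encoder structure described above can be sketched in Keras with the stated head size (64) and head count (4); the feed-forward width, number of stacked blocks, MLP sizes, and dropout rates below are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def transformer_encoder(x, head_size=64, num_heads=4, ff_dim=64, dropout=0.1):
    """One encoder block: self-attention and a Conv1D feed-forward network,
    each followed by a residual connection and layer normalization."""
    attn = layers.MultiHeadAttention(key_dim=head_size,
                                     num_heads=num_heads)(x, x)
    attn = layers.Dropout(dropout)(attn)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn)
    ff = layers.Conv1D(ff_dim, kernel_size=1, activation="relu")(x)
    ff = layers.Conv1D(x.shape[-1], kernel_size=1)(ff)  # restore feature dim
    return layers.LayerNormalization(epsilon=1e-6)(x + ff)

def build_model(n_steps=6, n_features=4, n_outputs=100, n_blocks=2):
    inputs = layers.Input(shape=(n_steps, n_features))
    x = inputs
    for _ in range(n_blocks):            # stacked encoder blocks
        x = transformer_encoder(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(128, activation="relu")(x)   # MLP head (sizes assumed)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(n_outputs)(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```

The residual additions require the feed-forward output to match the input feature dimension, which is why the second Conv1D restores `x.shape[-1]` channels.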
2.6.2. Random Forest Regressor
The Random Forest regressor model performs regression prediction by integrating multiple decision trees. The model is initialized with 100 decision trees. During training, the model fits the normalized training data, constructing multiple trees through bootstrap sampling and random feature selection. This helps reduce overfitting and improves the model’s generalization ability. The Random Forest model aggregates the predictions from all decision trees to generate the final output.
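A minimal scikit-learn sketch of this setup (100 trees; bootstrap sampling and random feature selection are the library defaults), using toy data in place of the tidal dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy stand-in for the tidal dataset: 4 boundary-point features per sample
X = rng.normal(size=(200, 4))
y = X.sum(axis=1) + 0.1 * rng.normal(size=200)

# 100 decision trees, as in the study; each tree is fit on a bootstrap
# sample, and the forest averages the per-tree predictions
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
pred = model.predict(X)
```

Averaging over many decorrelated trees is what reduces the variance of any single deep tree and improves generalization.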
2.6.3. K-Nearest Neighbors Regressor
The K-Nearest Neighbors regressor (KNN) model performs regression prediction by calculating the distances between samples. For each prediction, the model identifies the k most similar samples in the feature space and predicts the output based on the labels of these neighboring samples. In this study, the number of neighbors k is set to 5, meaning the model considers the 5 nearest neighbors during each prediction. By selecting an appropriate distance metric, the KNN model calculates the average label value of the nearest neighbors in the training set, which serves as the final prediction result.
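A minimal scikit-learn sketch with k = 5 (Euclidean distance is the library default); the toy data illustrate how the prediction is simply the mean label of the five nearest training samples:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# k = 5 neighbors, uniform weighting (the sklearn defaults)
model = KNeighborsRegressor(n_neighbors=5)

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
model.fit(X, y)

# The 5 nearest neighbors of 2.0 are {0, 1, 2, 3, 4}, so the prediction
# is their mean label, 2.0; the outlier at 10.0 is excluded
pred = model.predict([[2.0]])
```

Because KNN stores the training set and averages local labels, it has no explicit temporal model, which is one reason it struggles with tide dynamics compared to sequence models.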
4. Discussion
This study explores the application of LSTM networks for tide level prediction, a task traditionally approached using numerical simulations based on physical models, such as finite difference or finite element methods. These conventional methods, while capable of providing accurate predictions, are computationally intensive and demand significant resources and time, particularly for large-scale applications. In contrast, LSTM, a deep learning model designed for time series data, captures long-term dependencies in tide level variations and efficiently handles large volumes of data. This allows LSTM to generate predictions more quickly, making it a promising tool for real-time applications.
Traditional tide forecasting methods, such as harmonic analysis, rely on known astronomical forcing to decompose tidal signals into several sinusoidal components, offering robustness and long-term predictive capability. However, their applicability is limited in complex environments, such as semi-enclosed bays. On the other hand, machine learning methods, particularly Long Short-Term Memory (LSTM) networks, are better suited to adapt to complex and dynamic tidal patterns, capturing nonlinear relationships in the data, and providing higher accuracy and efficiency in short-term predictions.
A key innovation of this study is the integration of numerical simulation and deep learning. By generating a comprehensive dataset from numerical models to train the LSTM, the study overcomes the issue of limited observational data, thus improving the spatiotemporal accuracy of tide predictions. This fusion of machine learning and physical simulations represents a novel approach in tide level forecasting. When compared to computational fluid dynamics (CFD) methods, which require solving complex fluid dynamics equations and extensive computational resources, LSTM offers clear advantages. CFD methods can be time-consuming and resource-demanding, whereas LSTM’s training and prediction processes are more efficient, making it well suited for scenarios that require fast predictions. However, LSTM also has certain limitations. Its performance heavily depends on the quality and quantity of the training data, and it struggles with extrapolation beyond the range of the training dataset. Additionally, as LSTM relies on patterns learned from past observations, it may have difficulty capturing abrupt changes in tidal patterns or rare events not reflected in historical data.
When compared to other machine learning models, including Random Forest, Transformer, and KNN regression, LSTM stands out in terms of performance. As shown in Figure 10, LSTM consistently outperforms these models across the training, validation, and test sets. Specifically, LSTM demonstrates smaller RMSE values (Figure 10) and a correlation coefficient nearing 1, indicating high accuracy and consistency. In the training set, LSTM effectively fits the data, while in the validation and test sets, it shows strong generalization ability, accurately capturing the spatiotemporal patterns of tide levels. On the other hand, the other models perform slightly worse, particularly KNN regression, which shows larger deviations in both standard deviation and correlation from the ground truth. The superior overall performance of LSTM further validates its stability and reliability, attributed to its ability to capture both long-term and short-term dependencies in time series data. In contrast, Random Forest and KNN regression models struggle with capturing temporal dynamics, and while the Transformer model performs decently, it is not as accurate as LSTM (see Figure 11).
Unlike traditional LSTM methods, this study predicts future tidal fields using tidal data from four boundary points over the past 6 h. Specifically, when predicting the tidal field in the study area, the model does not rely on the historical tidal data of each point. Instead, it uses the tidal information from the boundary points to infer the changes in the entire tidal field. The model was trained on a computer equipped with a 13th Gen Intel® Core™ i5 processor and 16 GB of RAM, with a computation time of a few minutes (no more than five minutes), indicating that the model has high computational efficiency. However, tidal signals are sparse, exhibiting significant energy only within specific frequency ranges in the frequency domain (as shown in Figure 8). In the future, compressed sensing techniques can be employed to reconstruct the complete tidal signal from fewer observation data [37], reducing the number of sampling points and computational burden. This would further enhance the model’s capability to handle sparse signals and long time series data, thereby improving the accuracy and efficiency of tidal predictions.
Despite its strong performance, the LSTM method is not without limitations. One of the primary challenges is the need for large datasets to train the model effectively and improve prediction accuracy. Enhancing LSTM performance when data are limited, for example by expanding the dataset through data augmentation techniques such as synthetic data generation, time series interpolation, or adding noise to existing data, remains an ongoing challenge. Future research could address these issues by integrating LSTM with other deep learning techniques; for example, combining LSTM with convolutional neural networks (CNNs) could improve the extraction of features from spatial data. Alternatively, combining LSTM with ensemble learning or physics-informed neural networks may address its limitations when dealing with extreme tidal events or sparse historical data.