Enhancing Soil Moisture Forecasting Accuracy with REDF-LSTM: Integrating Residual En-Decoding and Feature Attention Mechanisms
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Description
2.2. REDF-LSTM Model
2.2.1. Long Short-Term Memory Network (LSTM)
2.2.2. Feedforward Attention Mechanism
- Feature Weight Allocation: In the context of soil moisture prediction, the feedforward attention mechanism first assesses the importance of each feature within the input feature set. This is accomplished by calculating each feature’s contribution to the prediction target. The mechanism generates a weight value for each feature, indicating the relative importance of various features in predicting soil moisture.
- Dynamic Adjustment: A key characteristic of the feedforward attention mechanism is its dynamism. It can automatically adjust feature weights based on different input data and environmental conditions. For instance, if a particular area experiences sudden rainfall, the weights of moisture-related features (such as recent rainfall amounts and the current moisture state of the soil) might increase, as these factors become more critical in subsequent soil moisture predictions.
- Weighted Input Synthesis: After weight assignment, the feedforward attention mechanism applies these weights to the corresponding features to generate an “attention-weighted feature representation”. This representation focuses on those features most crucial to the prediction outcome, allowing the model to concentrate more on these key pieces of information.
- Enhancing Prediction Accuracy and Interpretability: By emphasizing important features and suppressing less significant information, the feedforward attention mechanism not only enhances prediction accuracy but also improves the model’s interpretability. The model can explicitly indicate which factors are key to influencing the prediction outcomes, which is invaluable for formulating management strategies and decision support in practical applications.
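A minimal sketch of such a feedforward attention layer is given below, assuming a PyTorch-style implementation in which a single linear layer scores each time step and a softmax turns the scores into weights; the class name, dimensions, and scoring network are illustrative assumptions rather than the exact configuration used in the paper.

```python
# Illustrative feedforward attention layer (not the authors' exact implementation).
# A linear layer assigns each time step a relevance score, softmax normalizes the
# scores into weights, and the weighted inputs are summed into an
# "attention-weighted feature representation".
import torch
import torch.nn as nn


class FeedforwardAttention(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.score = nn.Linear(n_features, 1)  # one relevance score per time step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, features), e.g. daily meteorological drivers
        weights = torch.softmax(self.score(x), dim=1)  # dynamically adjusted weights
        return (weights * x).sum(dim=1)                # attention-weighted representation


# Example: 8 samples, 7 daily time steps, 6 input features
attn = FeedforwardAttention(n_features=6)
print(attn(torch.randn(8, 7, 6)).shape)  # torch.Size([8, 6])
```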
2.2.3. REDF-LSTM
- Residual Learning Encoder–Decoder Structure: In Equation (1), x(t) includes the current atmospheric conditions and land surface state; it is a vector containing current precipitation, temperature, specific humidity, wind speed, and surface radiation. These inputs are multiplied by the weight matrix W_ix and combined with the previous moment’s hidden state h(t − 1) and cell state c(t − 1) to compute the activation values of the input gate. In Equation (2), similar to the input gate, x(t) provides information about the current environmental state, helping the model decide what should be forgotten from the cell state. In Equation (3), the update of the cell state is controlled directly by the input and forget gates, which combine the new input data with the previous moment’s cell state to form a new cell state. In Equations (4) and (5), the output gate determines which information is passed to the hidden state h(t) and carried forward to the next time step, affecting both the model’s predictive output and the long-term maintenance of its internal memory. Equation (6) generates the model’s prediction of soil moisture at the current moment from the hidden state, reflecting the cumulative impact of all past inputs (a standard form of these gate equations is sketched after this list for reference). By introducing residual connections, this structure allows information to pass directly across different layers of the model, reducing the loss of information during transmission. The encoding stage deeply mines the features of the input data to form a high-level feature representation, while the decoding stage combines these high-level features with the original input features in the final prediction, enhancing the model’s understanding of the intrinsic structure and dynamic changes in the data and thus reducing the uncertainty of the prediction.
- Feedforward Attention Mechanism: The soil moisture time series, as referenced in Equation (6), is fed into the feedforward attention mechanism. By dynamically allocating feature weights across different time steps, the feedforward attention mechanism enables the model to focus on the most influential features for the prediction outcomes. Unlike traditional LSTM models that treat all input features equally, the feedforward attention mechanism adjusts weights based on the actual impact of the data, thus improving the accuracy and efficiency of predictions. This mechanism is particularly important in dealing with nonlinear and nonstationary characteristics in time series, effectively enhancing the model’s responsiveness to sudden or significant events.
- Input layer: As the entry point of the model, the input layer is responsible for handling complex multi-dimensional time series data. This layer transforms the input data into a three-dimensional array format suitable for the LSTM structure, with dimensions (number of samples S, time steps T = {T1, T2, … Ti}, feature dimensions X = {X1, X2, … Xj}). Here, S represents the total number of samples in the dataset, while i and j denote specific indices for time steps and feature dimensions, respectively.
- Encoder–decoder LSTM layer: The encoder deeply encodes multi-dimensional input features to create a new comprehensive feature representation. Then, the decoder phase combines this comprehensive representation with the original input features. Through this strategy, the model reveals the inherent connections in the data, enhancing understanding of the input data’s structure and dynamics. This enhanced feature set significantly reduces the model’s uncertainty in predicting soil moisture, ensuring closer correspondence of predictions to actual observational data.
- Fully connected LSTM layer: The objective of this layer is to integrate the output from the encoder–decoder LSTM segment with the initial input soil moisture characteristics, mitigating overfitting and adjusting for any predictive discrepancies originating from the encoder–decoder LSTM structure.
- Attention layer: In this layer, a feature attention mechanism is used to calculate a weight for each time step in the input sequence, denoted as A = {A1, A2, … Am}, where m is the total number of time steps and Am represents the feature weight for the mth time step. Using these dynamically adjusted weights, the model integrates the time series data while allocating more attention to important information and less to information that contributes little to the prediction.
- Output layer: The task of this layer is to output the model’s prediction results, with an output format of (output sample number O, time steps H). The design of this layer ensures that the model can make precise predictions for future sequences based on historical and current input information.
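For reference, a standard LSTM formulation consistent with the description above, with the cell-state terms c(t − 1) entering the gates as described, takes approximately the following form; the exact notation of Equations (1)–(6) in the original text may differ in detail.

```latex
\begin{align*}
i(t) &= \sigma\big(W_{ix}\,x(t) + W_{ih}\,h(t-1) + W_{ic}\,c(t-1) + b_i\big) && \text{(1) input gate}\\
f(t) &= \sigma\big(W_{fx}\,x(t) + W_{fh}\,h(t-1) + W_{fc}\,c(t-1) + b_f\big) && \text{(2) forget gate}\\
c(t) &= f(t)\odot c(t-1) + i(t)\odot\tanh\big(W_{cx}\,x(t) + W_{ch}\,h(t-1) + b_c\big) && \text{(3) cell state}\\
o(t) &= \sigma\big(W_{ox}\,x(t) + W_{oh}\,h(t-1) + W_{oc}\,c(t) + b_o\big) && \text{(4) output gate}\\
h(t) &= o(t)\odot\tanh\big(c(t)\big) && \text{(5) hidden state}\\
y(t) &= W_{yh}\,h(t) + b_y && \text{(6) prediction}
\end{align*}
```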
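Putting the layers described above together, a rough PyTorch skeleton might look as follows. The layer sizes, the use of concatenation for the residual/skip connection, and the single-linear attention scorer are assumptions made for illustration, not the authors' published configuration.

```python
# Hypothetical sketch of the REDF-LSTM layer stack described above.
# All sizes and the exact fusion of the residual connection are assumptions.
import torch
import torch.nn as nn


class REDFLSTMSketch(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 256, horizon: int = 1):
        super().__init__()
        # Encoder-decoder LSTM pair: the encoder builds a compressed feature
        # representation, the decoder re-expands it before it is fused with the
        # original inputs (the residual/skip connection).
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # Fully connected LSTM layer operating on decoder output + raw features.
        self.fused_lstm = nn.LSTM(hidden_size + n_features, hidden_size, batch_first=True)
        # Feedforward attention: one score per time step, normalized with softmax.
        self.attn_score = nn.Linear(hidden_size, 1)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (number of samples S, time steps T, feature dimensions X)
        enc_out, _ = self.encoder(x)
        dec_out, _ = self.decoder(enc_out)
        fused, _ = self.fused_lstm(torch.cat([dec_out, x], dim=-1))  # skip connection
        weights = torch.softmax(self.attn_score(fused), dim=1)       # per-time-step weights
        context = (weights * fused).sum(dim=1)                       # attention-weighted summary
        return self.head(context)                                    # (samples O, prediction steps H)


model = REDFLSTMSketch(n_features=6, hidden_size=64, horizon=1)
print(model(torch.randn(8, 7, 6)).shape)  # torch.Size([8, 1])
```

Concatenating the decoder output with the raw inputs is one simple reading of "combine the comprehensive representation with the original input features"; an additive residual connection would be an equally plausible alternative.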
- Model Optimization and Simplification: Investigating more efficient network architectures or simplifying the REDF-LSTM model through techniques like model pruning could reduce its computational resource demands. This would make the model more suitable for deployment in environments with limited computing capabilities, broadening its application scope.
- Enhancing Data Efficiency: Developing new training strategies or data augmentation techniques to reduce the model’s reliance on large amounts of training data. For example, utilizing data from different but related domains through transfer learning or semi-supervised learning methods could enhance the model’s training process.
- Improving Model Adaptability and Robustness: Introducing more adaptive and self-regulating mechanisms to better handle extreme weather events and other unconventional inputs. For instance, integrating graph neural networks could improve the understanding and prediction of interactions within complex climate networks.
- Exploring Cross-Domain Applications: By testing and validating the REDF-LSTM model in different environmental and application contexts, its versatility and effectiveness can be further assessed and optimized. This not only promotes the model’s application beyond soil moisture prediction to other environmental monitoring tasks but also deepens the understanding of its performance in various practical scenarios.
2.2.4. Model Setting and Training
2.2.5. Model Evaluation
3. Results
3.1. Box Plot Comparison of Model Performances
3.2. Visual Comparison of Model Performance
3.3. Time Series Plots Comparison of Model Performance
3.4. Comparative Error Bar Graph of Model Performance
3.5. Forecasting the Seventh Day Ahead and Predictive Results for Surface_Sensible_Heat_Flux
3.6. Ablation Experiments
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Learning Rate | Hidden Size | Batch Size | Epoch | Niter | R |
|---|---|---|---|---|---|
| 1 × 10⁻² | 256 | 64 | 1000 | 400 | 0.8587 |
| 1 × 10⁻³ | 256 | 64 | 1000 | 400 | 0.9506 |
| 1 × 10⁻⁴ | 256 | 64 | 1000 | 400 | 0.9383 |
| 1 × 10⁻³ | 256 | 64 | 800 | 400 | 0.9465 |
| 1 × 10⁻³ | 256 | 128 | 1000 | 400 | 0.9412 |
| 1 × 10⁻³ | 256 | 32 | 1000 | 400 | 0.9492 |
| 1 × 10⁻³ | 256 | 64 | 1200 | 400 | 0.9407 |
| 1 × 10⁻³ | 128 | 64 | 1000 | 400 | 0.9453 |
| 1 × 10⁻³ | 512 | 64 | 1000 | 400 | 0.9467 |
| 1 × 10⁻³ | 256 | 64 | 1000 | 300 | 0.9451 |
| 1 × 10⁻³ | 256 | 64 | 1000 | 500 | 0.9483 |
| Method | R | KGE | RMSE | BIAS |
|---|---|---|---|---|
| LSTM | 0.917 | 0.749 | 0.024 | 0.019 |
| ED-LSTM | 0.934 | 0.856 | 0.020 | 0.015 |
| FAM-LSTM | 0.943 | 0.860 | 0.019 | 0.014 |
| REDF-LSTM | 0.951 | 0.869 | 0.013 | 0.013 |
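For context, the four metrics in the table above can be computed as follows under their common definitions (Pearson correlation R, the 2009 Kling–Gupta efficiency, root-mean-square error, and mean bias); the exact variants used in the paper may differ, so this is only a sketch.

```python
# Common definitions of the reported metrics (sketch; the paper's exact
# formulations, e.g. which KGE variant, may differ).
import numpy as np


def evaluate(sim: np.ndarray, obs: np.ndarray) -> dict:
    r = np.corrcoef(sim, obs)[0, 1]                    # Pearson correlation R
    alpha = np.std(sim) / np.std(obs)                  # variability ratio
    beta = np.mean(sim) / np.mean(obs)                 # bias ratio
    kge = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))          # root-mean-square error
    bias = np.mean(sim - obs)                          # mean bias
    return {"R": r, "KGE": kge, "RMSE": rmse, "BIAS": bias}


print(evaluate(np.array([0.20, 0.22, 0.25]), np.array([0.21, 0.22, 0.24])))
```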