1. Introduction
Soil moisture, an integral component of the soil-plant-atmosphere water cycle, refers to the water present in the soil, which is essential for maintaining plant growth [
1]. It is a key factor in determining irrigation water requirements [
2]. Forecasting soil moisture is invaluable for understanding future soil moisture trends, managing water stress conditions affecting crops, and planning irrigation schedules while conserving limited water resources. The Bundaberg region in Queensland, Australia, the focus of this study, is extensively used for growing commercial crops. Thus, a soil moisture forecasting model will significantly benefit agricultural operations in this region.
Data-driven predictive models have demonstrated superior competency in predictions of soil moisture [
3] and other hydro-meteorological variables such as precipitation [
4], drought [
5,
6], evapotranspiration [
7] and river flow [
8]. For example, Jamei et al. [
9] constructed multi-level pre-processing model frameworks using NASA’s Soil Moisture Active Passive (SMAP) satellite datasets for multi-step (one and seven days ahead) daily forecasting of Surface Soil Moisture (SSM) in Iran’s dry and semi-arid regions. This experiment integrated Boruta Gradient Boosting Decision Tree (Boruta-GBDT) feature selection and Multivariate Variational Mode Decomposition (MVMD) techniques with advanced Machine Learning (ML) models, including Bidirectional Gated Recurrent Unit (Bi-GRU), Cascaded Forward Neural Network (CFNN), Adaptive Boosting (AdaBoost), Genetic Programming (GP), and classical Multilayer Perceptron neural network (MLP). The results indicated that MVMD-Boruta-GBDT-CFNN outperformed all other hybrid models in one and seven days soil moisture forecasting across all tested sites. Basak et al. [
10] also employed a data-driven approach to forecast soil moisture. Their study developed and tested two data-driven models based on the Naive Accumulative Representation (NAR) and the Additive Exponential Accumulative Representation (AEAR). The proposed NAR and AEAR models proved competitive in forecasting tasks, performing better than several well-established and cutting-edge benchmark models.
Among data intelligence approaches, deep learning (DL), the latest generation of artificial intelligence systems, has become increasingly popular and performs exceptionally well in both industrial and scientific research [11]. The strength of DL techniques lies in their ability to learn complex nonlinear functions from input data with low-level information, allowing them to capture and extract detailed features from extensive raw input data sets accumulated over decades. These capabilities make DL techniques particularly valuable for research initiatives. The Long Short-Term Memory (LSTM) algorithm is one of the DL approaches being used to forecast various hydrological and environmental variables, such as water quality [12] and rainfall-runoff [13]. Several studies have investigated the feasibility of using LSTM-based models for soil moisture (SM) prediction.
For example, in South Louisiana in the United States, ElSaadani et al. [
14] found that among the spatial-temporal models tested, the ConvLSTM outperformed other Convolutional Neural Network (CNN) and LSTM-based models in SM prediction. To improve the SM prediction accuracy, Li et al. [
15] experimented with a unique residual learning encoder-decoder model (EDT-LSTM), using data from 13 sites across different countries. This model demonstrated improved accuracy in forecasting moisture levels in 5 cm deep surface soil layers for 1, 3, 5, 7 and 10 days ahead. In another study, Suebsombut et al. [
16] developed LSTM-based models to forecast SM values in Chiang Mai province, Thailand, showing that the LSTM-based model performs well in predicting SM. Additionally, Zeynoddin and Bonakdari [
17] proposed two DL methods, Genetic and Teacher–Learner-based Algorithms (GA and TLA), coupled with LSTM for SM forecasting in Quebec, Canada. Their results showed that the TLA-LSTM model was more computationally efficient and therefore a better option than GA-LSTM.
To further enhance forecasting model capabilities, many researchers have recently been developing hybrid models. These hybridized models commonly combine data pre-processing techniques with forecasting models, as pre-processing methods work particularly well with nonlinear and non-stationary time series data. In artificial intelligence model hybridization, feature selection is a popular data pre-processing method, and a variety of research studies have shown that it enhances model performance. Its purpose is to reduce the high dimensionality of the input data by selecting the input data sets most correlated with the target variable as a first step in advanced data-driven model development. For example, Iterative Input Selection (IIS) has been used to forecast streamflow [
3], while Boruta-random forest (Boruta) has been applied to forecast evapotranspiration [
18] and soil moisture [
9]. Additionally, Neighbourhood Component Analysis (NCA) has been used to forecast wind speed [
19] and agricultural drought [
20].
The Lasso feature selection method, used in this study, has also been employed in hydrological forecasting research. For instance, Alizadeh et al. [
21] developed a Support Vector Regression (SVR) based model for monthly stream flow prediction at the Karaj River in Iran, using both Lasso and Particle Swarm Optimization-Artificial Neural Networks (PSO-ANN) feature selection methods to identify the most correlated input variables. The results indicated that Lasso input selection is more accurate than the PSO-ANN algorithm, therefore improving the accuracy of the model forecast. Chu et al. [
22] have also employed the Lasso feature selection technique along with Fuzzy C-means (FCM) classification and a Deep Belief Network (DBN) deep learning model (Lasso-FCM-DBN) to forecast streamflow at gauge stations in the Tennessee River catchment, USA. They found that the Lasso-FCM-DBN approach enhanced the performance of streamflow prediction compared to ANN. However, this feature selection technique has not so far been employed with any deep learning approach in soil moisture forecasting model development.
Along with feature selection, wavelet decomposition is a common data pre-processing step in data intelligence model hybridization. Hydrological and water resources time series data are inherently complex due to periodicity, transients, and trends. Wavelet transform algorithms can decompose this complex data into sub-time series data that are more interpretable for data-driven models. As a result, wavelet decomposed data often improve model performance and are widely used in hydrological and water resources-related prediction applications.
The wavelet decomposition methods widely used in recent model hybridization works are Discrete Wavelet Transformation (DWT), Maximum Overlap Discrete Wavelet Transform with Multi-Resolution Analysis (moDWT-MRA), Maximum Overlap Discrete Wavelet Transform (moDWT), and (a) trous (AT) algorithm [
23]. For instance, Prasad et al. [
3] employed moDWT in their hybrid IIS-moDWT-ANN model designed for forecasting streamflow and it has shown better accuracy than the counterpart single and hybrid benchmark models. Adib et al. [
24] in their study for predicting one-day-ahead snow depth (SD) at the North Fork Jocko snow telemetry (SNOTEL) station in Missoula, Montana (US), tested different wavelet transform (WT) approaches including discrete wavelet transform (DWT), maximal overlap discrete wavelet transform (MODWT), and multiresolution-based MODWT (MODWT-MRA) along with autoregressive integrated moving average (ARIMA), and artificial intelligence (AI) models. Their findings showed that hybrid ARIMA-AI models produced more accurate results than standalone ARIMA and AI models, highlighting the wavelet technique’s capacity to enhance the model performances.
It is important to note that DWT and moDWT-MRA can introduce errors into forecasts due to boundary condition-related issues and provide results that are not realistically achievable in real-world scenarios. Therefore, these methods are unsuitable for practical applications. However, moDWT and AT wavelet transform algorithms, when applied correctly, can resolve boundary condition related issues [
23]. These boundary condition issues, their impact on the model forecast, and the remedies for them are discussed in detail in the theoretical overview section of this paper.
Despite these constraints, many recent hybrid forecasting studies that employed wavelet transform techniques to decompose hydrological and water resources data, including the examples mentioned earlier, have not adequately considered these limitations. Instead, they have used DWT and moDWT-MRA regardless of their shortcomings. Furthermore, the moDWT and AT wavelet transform algorithms, which do not add boundary-condition errors to model forecasts, are not as widely used in hydrological prediction as DWT and moDWT-MRA. Therefore, these algorithms still need to be explored further.
In this study, time series data from satellites and ground stations are combined. The methodology section provides detailed information about the types of data collected, and their resolutions and sources. It is well-documented that data from satellite sensor variables can lower the accuracy of hydrological variable predictions [
25]. This issue can be minimized by integrating ground-based and satellite-based data, as done in this study. Ghimire et al. [
26] used data from Goddard’s Online Interactive Visualization and Analysis Infrastructure (Giovanni) combined with reanalysis data from the European Centre for Medium-Range Weather Forecasting (ECMWF) to forecast long-term solar radiation. Similarly, Ahmed et al. (2021b) used a combination of satellite GLDAS data, ground Scientific Information for Landowners (SILO) data, and meteorological indices to predict soil moisture.
Due to the high dimensionality of hydrological time series data extracted in large volumes, this study requires feature selection and wavelet decomposition as data pre-processing techniques. Thus, this hybridizing approach used the moDWT and Lasso algorithms for wavelet decomposition and feature selection, respectively, along with an LSTM data-driven DL network. This methodology is novel, as no existing literature explains the use of Lasso feature selection and moDWT data decomposition techniques in SM prediction. Additionally, this study has implemented remedial procedures to overcome the boundary condition related issues that are likely to add errors to real-world forecasts. This is also a step forward for prediction studies that use wavelet transform data decomposition procedures. Further, the proposed combination of algorithms, abbreviated as the moDWT-Lasso-LSTM model, has not yet been tested in other geographic locations; this study thus addresses a gap in soil moisture prediction research.
The objectives of this study are threefold:
To develop a deep learning method for forecasting soil moisture (SM) at 10 cm depth, integrating moDWT data decomposition with Lasso feature selection to produce an LSTM-based prediction model that uses satellite data from GIOVANNI and ground data from SILO.
To employ the hybrid moDWT-Lasso-LSTM model in multi-step SM forecasting, i.e., 1-day (t + 1), 14-day (t + 14) and 30-day (t + 30) SM forecasting.
To compare the objective model with benchmark models: LSTM, DNN, and ANN (standalone models); Lasso-LSTM, Lasso-DNN, and Lasso-ANN (2-phase hybrid models); and moDWT-Lasso-DNN and moDWT-Lasso-ANN (3-phase hybrid models).
The above objectives have been established in this study to design a precise SM forecasting model for short-, medium- and long-term SM predictions and to confirm its comparative advantage. SM, as a major form of water resource, significantly influences agricultural production and consequently affects food security. Similar to other forms of water resources available globally, SM is a limited resource with growing demand due to the expansion of agricultural activities. Under SM depleted conditions, the demand for water storage for irrigation purposes increases, thereby restricting water availability for other purposes like drinking and recreational activities.
Currently, agriculture accounts for an average of 70 percent of global freshwater withdrawals [
27]. Accurate SM predictions are crucial for identifying moisture stress in crops early and for estimating actual irrigation water requirements in advance. Furthermore, precise SM predictions can help to minimize water wastage in irrigation activities, provide early warnings of crop production fluctuations, and conserve valuable water resources. Given these benefits, this study aims to design an SM forecasting model using the LSTM deep learning algorithm, combined with Lasso feature selection and moDWT wavelet transform data decomposition techniques.
Further, this study aims to apply the proposed model to multi-step SM forecasting scenarios, including 1-day (t + 1), 14-day (t + 14) and 30-day (t + 30) forecasts. This provides an opportunity to observe its usefulness over short-, medium- and long-term forecasting horizons. A wide range of forecasting horizons is important for implementing remedial actions against SM stress conditions at different levels. For instance, short-term SM predictions may be important for taking prompt action against potential sudden crop failures due to moisture stress, while long-term SM predictions are valuable for strategic planning to cope with future drought conditions, conserve water resources, and ensure stable crop production in the long run. In addition, by comparing the proposed model with competitive rival models, this study aims to quantify the performance improvement without overestimating the proposed model’s capabilities. The research objectives of this study will help to further improve the precision of SM prediction, thereby making a valuable contribution to SM prediction studies.
2. Theoretical Overview
This section describes the moDWT, Lasso, and LSTM algorithms used in the current study to develop the model. Additionally, benchmark models were used to evaluate the target model’s performance. These benchmarks are relatively recent machine learning models with neural networks, similar to LSTM, and were chosen for their advanced capabilities, making them suitable competitive benchmarks for the data-driven forecasting algorithm used in this study.
The theoretical foundation of the single neural layer ANN machine learning model is described in earlier research [
28]. In hydrology, ANN is widely used and has proven competent in various prediction tasks. For instance, Prasad et al. [
29] developed an ANN-CoM based multi-model ensemble strategy to forecast monthly soil moisture at four farming locations in the Murray-Darling Basin, Australia. The ANN-CoM model, validated against Volterra, Random Forest, M5 tree, and ELM models, demonstrated high competency in capturing the nonlinear dynamics of soil moisture levels. Similarly, Shirsath and Singh [
30] constructed ANN and Multiple Linear Regression (MLR) models for pan evaporation estimation, and statistical comparisons revealed that the ANN model outperformed other models.
The DNN algorithm, an improvement over ANN, has been detailed by Le et al. [
31] and is increasingly used in hydrology. It consists of multiple neural layers and falls under the deep learning subset of machine learning. El Bilali et al. [
32] developed an interpretable ML framework to forecast daily pan evaporation using hourly climate datasets, employing DNN along with Extra Tree, XGBoost, and SVR models. The interpretability of these models was evaluated using SHAP, Sobol-based sensitivity analysis, and LIME, showing good consistency with real hydro-climatic processes of evaporation in a semi-arid environment.
2.1. Decomposition Method: Maximum Overlap Discrete Wavelet Transform (moDWT)
The Maximum Overlap Discrete Wavelet Transform (moDWT) decomposes complex time series data, characterized by multiple periodicities, transients, and trends, into high- and low-frequency sub-time series, termed wavelet and scaling coefficients. The wavelet and scaling coefficients resulting from moDWT are defined as follows [23]:
$$W_{j,t} = \sum_{l=0}^{L_j-1} \tilde{h}_{j,l}\, X_{(t-l) \bmod N} \qquad (1)$$
$$V_{j,t} = \sum_{l=0}^{L_j-1} \tilde{g}_{j,l}\, X_{(t-l) \bmod N} \qquad (2)$$
where $X$ is a time series input vector with $N$ values; $j = 1, 2, \ldots, J$, and $J$ represents the level of decomposition; $W_{j,t}$ and $V_{j,t}$ are the $j$th-level wavelet and scaling coefficients at time $t$; the $j$th-level wavelet and scaling filters of moDWT are represented as $\tilde{h}_{j,l}$ and $\tilde{g}_{j,l}$, respectively; and $L_j$ is the width of the $j$th-level filters.
The moDWT addresses issues associated with other data decomposition algorithms such as DWT and moDWT multi resolution analysis (moDWT-MRA). These issues arise when the decomposition process cannot calculate the output values of coefficients accurately (without adding errors) due to the lack of necessary time series observations at specific time points. These errors are termed boundary conditions.
For instance, DWT and moDWT-MRA require future data at certain points to calculate detail and approximation coefficients. While historical time series data can provide this future data, real-world scenarios do not have access to future data, leading to inaccuracies in the calculated coefficients. Consequently, models developed using DWT and moDWT-MRA may fail to accurately forecast soil moisture (SM) in practical implementations.
The moDWT effectively addresses these boundary condition issues by relying solely on current and past time series data for its decomposition outputs, namely wavelet and scaling coefficients, without requiring future data. However, moDWT cannot accurately calculate decomposition outputs for data points at the beginning of a time series because the necessary past data is unavailable. As a result, the wavelet and scaling coefficients calculated for early data points are incorrect and are affected by boundary conditions.
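The causal behaviour described above can be illustrated with a minimal pure-Python sketch of a Haar moDWT pyramid. It assumes the Haar filter only (the study also considers other filters) and circular indexing at the start of the series; the function name `haar_modwt` and the toy series are illustrative and are not taken from the study.

```python
def haar_modwt(x, levels):
    """Haar moDWT pyramid: returns ([W_1, ..., W_J], V_J).

    Assumption: Haar filters (1/2, -1/2) and (1/2, 1/2) with circular
    indexing, so each coefficient at time t uses only x[t] and earlier
    values, except where the index wraps around at the start.
    """
    n = len(x)
    v_prev = [float(v) for v in x]
    wavelets = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # filter taps are 2^(j-1) samples apart
        w_j = [0.5 * (v_prev[t] - v_prev[(t - shift) % n]) for t in range(n)]
        v_j = [0.5 * (v_prev[t] + v_prev[(t - shift) % n]) for t in range(n)]
        wavelets.append(w_j)
        v_prev = v_j
    return wavelets, v_prev


# Toy series (hypothetical values, not study data).
series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
W, V = haar_modwt(series, levels=2)

# Changing only the final observation leaves the coefficients at earlier,
# non-wrapped time steps untouched: no future data is needed there.
changed = series[:-1] + [60.0]
W2, V2 = haar_modwt(changed, levels=2)
first_clean = (2 ** 2 - 1) * (2 - 1)  # widest filter spans 4 samples, so t >= 3
unaffected = all(
    W[j][t] == W2[j][t] and V[t] == V2[t]
    for j in range(2)
    for t in range(first_clean, len(series) - 1)
)
print(unaffected)
```

Only the wrapped coefficients at the start of the series and the coefficient at the changed time step differ between the two runs, which is why the early coefficients are the ones that must be discarded.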
The number of incorrect or boundary-condition-affected coefficients depends on the decomposition level and the wavelet filter used. Higher decomposition levels and longer wavelet filters tend to increase the total number of incorrect coefficients. The total number of incorrect coefficients can be determined using Equation (3) [23]:
$$L_J = (2^J - 1)(L - 1) + 1 \qquad (3)$$
where $L_J$ represents the number of wavelet and scaling coefficients affected by the boundary condition for decomposition level $J$ and a wavelet filter of length $L$.
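As a worked illustration of how quickly the boundary-affected region grows, the short sketch below evaluates the count for a few combinations. It assumes Equation (3) takes the standard form $L_J = (2^J - 1)(L - 1) + 1$; the function name and the filter length of 8 used for comparison are illustrative choices, not values reported in the study.

```python
def boundary_affected_count(level, filter_length):
    """Number of boundary-affected coefficients, L_J = (2^J - 1)(L - 1) + 1."""
    return (2 ** level - 1) * (filter_length - 1) + 1


# Haar has filter length 2; length 8 is an assumed longer filter for comparison.
for level in (2, 4):
    for length in (2, 8):
        print(level, length, boundary_affected_count(level, length))
```

For the Haar filter at decomposition level 4 this gives 16 coefficients, while a filter of length 8 at the same level gives 106, which shows why long filters and deep decompositions leave fewer usable records for training.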
In order to improve model forecasting accuracy, it is necessary to remove all these boundary-condition-affected, incorrect wavelet and scaling coefficients derived at the beginning of the data set. High decomposition levels and lengthy wavelet filters result in more incorrect coefficients that must be removed, reducing the number of correct coefficients available for model training. Therefore, the appropriate selection of decomposition levels and wavelet filters is essential for enhancing model performance. Quilty and Adamowski [
23] provide a detailed discussion of the boundary condition issues that arise from the unavailability of future data when DWT and moDWT-MRA are employed for data decomposition.
There is no standard rule for determining the optimal decomposition level and wavelet filter type. Increasing the number of boundary-condition-affected coefficients unnecessarily should be avoided, as it leaves an inadequate number of correct coefficients for model training. However, Equation (
4) can be used to calculate the maximum decomposition level (
J) that can be adopted [
33].
2.2. Feature Selection Method: Least Absolute Shrinkage and Selection Operator (Lasso)
In this study, the Lasso algorithm [34] is employed as a feature selection technique after decomposition of the input time series variables by the moDWT algorithm. Suppose that the dataset consists of $p$ input variables and $N$ observations. Let $X$ be the input data matrix, in which each column denotes an input variable, and let $y$ be the response variable, where the response value at observation $j$ is represented by $y_j$ and $x_j$ is a vector containing $p$ characteristics.
The Lasso method therefore follows the work of [35]:
$$\min_{\beta_0,\,\beta} \left( \frac{1}{2N} \sum_{j=1}^{N} \left( y_j - \beta_0 - x_j^{T}\beta \right)^2 + \lambda \sum_{k=1}^{p} \left| \beta_k \right| \right)$$
where $\beta_0$ is the intercept, $\beta$ is the vector of regression coefficients, and $\lambda$ is a non-negative regularization parameter.
By applying a penalty to the regression coefficients, the Lasso technique regularizes the least-squares fit, shrinking the regression coefficients ($\beta$) toward zero and setting some of them exactly to zero. A variable is retained in the model during this feature selection procedure if its coefficient is still non-zero after the shrinking step. This process minimizes the prediction error by reducing the complexity of the model.
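The shrink-and-select behaviour can be sketched with a small coordinate-descent solver in plain Python. This is an illustrative sketch, not the study's implementation: it assumes the common objective $\frac{1}{2N}\sum_j (y_j - \beta_0 - x_j^T\beta)^2 + \lambda\sum_k|\beta_k|$, and the function names, the toy data and the value of `lam` are all hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2N)) * squared error + lam * L1 norm of the coefficients."""
    n, p = len(X), len(X[0])
    # Centre the columns and the response so the intercept drops out of the updates.
    x_mean = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[k] - x_mean[k] for k in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            rho = 0.0
            for j in range(n):
                # Partial residual: remove the fit of every feature except k.
                fit_others = sum(Xc[j][m] * beta[m] for m in range(p) if m != k)
                rho += Xc[j][k] * (yc[j] - fit_others)
            rho /= n
            scale = sum(Xc[j][k] ** 2 for j in range(n)) / n
            beta[k] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    intercept = y_mean - sum(x_mean[k] * beta[k] for k in range(p))
    return intercept, beta


# Toy data (hypothetical): y depends strongly on column 0 and weakly on column 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
intercept, beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [k for k, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case the weak predictor's coefficient is driven exactly to zero, so only column 0 is selected; in the study's setting the columns would be the moDWT wavelet and scaling coefficient series.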
2.3. Data Driven Forecasting Model: Long Short-Term Memory Network (LSTM)
The LSTM is a special type of Recurrent Neural Network (RNN) [36]. Compared with traditional artificial neural networks, it can recognize the intrinsic characteristics of time-sequence predictors and targets, taking into account recurrent patterns and tendencies over long periods. LSTMs operate through special units, or memory blocks, whose main components are the input, output, and forget gates; these memory blocks regulate the flow of information and are continuously updated [37]. The four calculation steps are described as follows [38]:
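The forget gate $f_t$ is used by the LSTM layer to determine which data should be discarded or retained, depending on the most recent hidden layer output $h_{t-1}$ and the new input $x_t$:
$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$
where $W_f$ stands for the weight matrix, $b_f$ stands for the bias vector, and $\sigma(\ldots)$ stands for the sigmoid logistic function.
Using the “input gate” $i_t$, the LSTM layer then determines which signal must be kept in the newly formed cell state $C_t$, via what is denoted the new candidate cell state $\tilde{C}_t$:
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)$$
where the hyperbolic tangent function is denoted by $\tanh(\ldots)$.
The “forget gate” $f_t$ removes unwanted information when updating the old cell state $C_{t-1}$ to $C_t$, and the “input gate” $i_t$ admits the new candidate cell state $\tilde{C}_t$:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $\odot$ denotes element-wise multiplication.
The cell state $C_t$ and the “output gate” $o_t$ are then used to calculate the output $h_t$:
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh\left(C_t\right)$$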
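The four steps can be traced in a minimal pure-Python sketch of a single LSTM time step. It assumes the standard gate formulation (sigmoid gates, tanh candidate state, element-wise products); the function names, the parameter dictionary keys and the all-zero toy weights are illustrative and are not the trained model from this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; `p` holds weights W_* and biases b_* (hypothetical names)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]       # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]       # input gate
    c_cand = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]       # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# One hidden unit, two inputs; all-zero parameters make every gate equal 0.5.
zero = {name: [[0.0, 0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
zero.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step(x_t=[0.3, -0.1], h_prev=[0.0], c_prev=[1.0], p=zero)
print(h_t, c_t)
```

With zero weights the forget gate halves the previous cell state and the candidate state contributes nothing, which makes the role of each gate easy to check by hand.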
4. Results and Discussion
The summary of the descriptive statistics for all predictor and target variables is presented in Table 4. Refer to Goos and Meintrup [51] for the details of the calculations and interpretations of these statistics. These descriptive statistics provide information on the central tendency (mean, median) and variability (standard deviation) of the data set, as well as the shape and frequency of the data distribution. The mean and median values of variables such as radiation, rh-tmax, ET, mslp, and GWS are close together, indicating that their distributions are nearly symmetric. Other variables show greater differences between the mean and median values, reflecting skewness in their distributions. Specifically, SM, SM10-40, SM100-200, SM40-100 and rain exhibit slightly to moderately right-skewed distributions, indicated by skewness values just below and above 0.5, respectively. By contrast, the distributions of max-temp, min-temp, rh-tmin, ST40-100, ST10-40, and ST0-10 are slightly left-skewed, indicated by skewness values close to −0.5.
In terms of the shape of a distribution’s tails and peak, the negative kurtosis values indicate that generally most of the variables in this study are platykurtic, meaning they have thinner tails and flatter peaks compared to a normal distribution. The exceptions are rh-tmax and rh-tmin with positive kurtosis values, which are leptokurtic, indicating fatter tails and sharper peaks. Rain is extremely leptokurtic, suggesting a high frequency of extreme values.
Table 5 summarizes the results of the trial-and-error search for the best combination of decomposition level and wavelet filter. In most cases, the best-suited combinations differ from one another, with the exception of moDWT-Lasso-LSTM and moDWT-Lasso-DNN at t + 1. The best model forecasts for those two cases are obtained using decomposition level 4 and wavelet filter “haar”.
At decomposition level 4, each predictor variable’s time series is split into four wavelet coefficient series and one scaling coefficient series, regardless of the wavelet filter used.
Figure 3 illustrates the decomposition results for SM10-40 based on decomposition level 4 and wavelet filter “haar”. (Notably, this combination of decomposition level and wavelet filter yields the best model performance for moDWT-Lasso-LSTM and moDWT-Lasso-DNN at the t + 1 lead time.) When decomposition level 4 is used, the number of predictor variables increases to 75 (60 wavelet coefficient series (15 × 4) + 15 scaling coefficient series (15 × 1)). When decomposition level 2 is used, the total number of predictor variables increases to 45 (= 15 × 2 + 15 × 1).
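The arithmetic above generalizes to any decomposition level: each original predictor contributes one wavelet series per level plus a single scaling series. A minimal sketch follows; the function name is illustrative.

```python
def decomposed_predictor_count(n_predictors, level):
    """Each predictor yields `level` wavelet series plus one scaling series."""
    return n_predictors * level + n_predictors * 1


print(decomposed_predictor_count(15, 4))
print(decomposed_predictor_count(15, 2))
```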
The Lasso feature selection algorithm is employed to identify the predictor variables most correlated with the target variable (SM). This algorithm reduces the number of wavelet and scaling coefficient data series used in the forecasting model training and testing. Different wavelet and scaling coefficient data series are selected by the Lasso algorithm for each decomposed data set derived from different combinations of decomposition levels and wavelet filters used in 3-phase model development. Notably, the majority of the coefficient data series selected by the Lasso algorithm are scaling coefficients of predictor variables.
Table 6 summarizes the wavelet and scaling coefficient data series selected by the Lasso feature selection algorithm for all 3-phase hybrid models at
t + 1 lead time. For the predictor variable SM10-40, the second, third, and fourth wavelet coefficient data series (
W2,
W3,
W4) and the scaling coefficient data series (V) depicted in
Figure 3 are selected by the Lasso algorithm for the development of the moDWT-Lasso-LSTM and moDWT-Lasso-DNN models at
t + 1 lead time.
In developing 2-phase hybrid models (i.e., Lasso-LSTM, Lasso-DNN, and Lasso-ANN), the number of predictor variables selected by the Lasso algorithm for t + 1, t + 14, and t + 30 lead times are as follows:
For t + 1 lead time: 10 predictor variables (min-temp, radiation, rh-tmax, rh-tmin, ST0-10, rh-tmax, rh-tmin, mslp, ST40-100, ST0-10, rain, SM10-40, SM40-100, SM100-200, and GWS).
For t + 14 and t + 30 lead times: 12 predictor variables (max-temp, min-temp, radiation, rh-tmin, mslp, ST40-100, ST0-10, rain, SM10-40, SM100-200, SM40-100, and GWS).
Table 7 displays the calculated values of statistical metrics to evaluate the performance of the proposed deep moDWT-Lasso-LSTM model and other benchmark models. These metrics include Pearson’s Correlation Coefficient (r), Coefficient of Determination ($R^2$), Root Mean Squared Error (RMSE; kg m$^{-2}$), Mean Absolute Error (MAE; kg m$^{-2}$), Mean Absolute Scaled Error (MASE), Symmetric Mean Absolute Percentage Error (SMAPE), Legates and McCabe Index (LM) and Willmott’s Index (WI). In general, the moDWT-Lasso-LSTM model demonstrated superior performance in SM forecasting compared to all the benchmark models at all lead times. The proposed model produced the highest values for r, $R^2$, LM and WI, indicating strong model performance and reliability. Simultaneously, it produced the lowest values for RMSE, MAE, MASE and SMAPE, reflecting high accuracy and minimal error.
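For readers who want to reproduce metrics of this kind, the sketch below computes four of them in plain Python. It assumes the commonly used textbook definitions of RMSE, MAE, Willmott’s Index and the Legates and McCabe Index, which may differ in detail from the exact formulas used in this study; the function names and the toy observed and forecast values are hypothetical.

```python
import math


def rmse(obs, fc):
    """Root mean squared error."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(obs, fc)) / len(obs))


def mae(obs, fc):
    """Mean absolute error."""
    return sum(abs(o - f) for o, f in zip(obs, fc)) / len(obs)


def willmott_index(obs, fc):
    """Willmott's index of agreement (1 indicates perfect agreement)."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - f) ** 2 for o, f in zip(obs, fc))
    den = sum((abs(f - mean_o) + abs(o - mean_o)) ** 2 for o, f in zip(obs, fc))
    return 1.0 - num / den


def legates_mccabe_index(obs, fc):
    """Legates and McCabe index based on absolute errors (1 is a perfect fit)."""
    mean_o = sum(obs) / len(obs)
    num = sum(abs(o - f) for o, f in zip(obs, fc))
    den = sum(abs(o - mean_o) for o in obs)
    return 1.0 - num / den


# Toy values (hypothetical, not study data).
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 4.0]
print(rmse(observed, forecast), mae(observed, forecast))
print(willmott_index(observed, forecast), legates_mccabe_index(observed, forecast))
```

Higher values of the two indices and lower values of the two error measures indicate a better forecast, matching the direction of the comparisons reported in Table 7.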
For the t + 1 lead time, the moDWT-Lasso-LSTM model demonstrates superior performance across almost all metrics. It has the highest r, indicating a very strong correlation between predicted and actual SM values. Its $R^2$ shows that over 92% of the variance in SM is explained by the model, the highest among all models tested. The lowest values of RMSE and MAE signify more accurate predictions. The MASE and SMAPE are both lower than those of other models, further emphasizing the model’s accuracy. Additionally, the values of WI and LM are the highest, indicating superior model performance and agreement with observed data.
Comparatively, the moDWT-Lasso-DNN and moDWT-Lasso-ANN models also perform well, with high r and $R^2$ values but slightly higher RMSE, MAE, MASE, and SMAPE values than the moDWT-Lasso-LSTM model. Traditional models such as Lasso-DNN and DNN show significantly lower performance, evidenced by lower r and $R^2$ values and higher error metrics. For example, the standalone ANN model had the lowest correlation with the observed SM values, indicating that it was the least effective in capturing the relationship between the predictor variables and observed SM. This demonstrates the efficacy of the hybrid models, especially the moDWT-Lasso-LSTM, in short-term SM forecasting.
For the t + 14 lead time, the moDWT-Lasso-LSTM model again outperforms other models, though the performance metrics show slight degradation compared to the t + 1 lead time. Its r and $R^2$ values (Table 7) indicate strong predictive power. Its RMSE and MAE are both the lowest among all models, further highlighting accurate predictions. Its MASE and SMAPE, although slightly higher than at the t + 1 lead time, remain the best among the models. The WI and LM values also indicate good model reliability and consistency.
The moDWT-Lasso-DNN model shows comparable performance but has slightly lower r and $R^2$ values and higher error metrics. Other models, such as Lasso-LSTM and Lasso-DNN, demonstrate relatively good performance but still fall short of the moDWT-based models, highlighting the advantage of the wavelet transform in enhancing forecasting accuracy over a 14-day horizon. Specifically, the Lasso-LSTM model, with an r of approximately 0.93999, performs somewhat worse than its moDWT-enhanced counterpart. Meanwhile, the ANN model shows the lowest performance for the 14-day lead time in terms of both r and $R^2$ (Table 7).
For the t + 30 lead time, the moDWT-Lasso-LSTM model continues to show the best performance in terms of r and $R^2$ (Table 7). Its values of RMSE and MAE are the lowest among all models, indicating accurate long-term predictions. The MASE and SMAPE are also the lowest. Additionally, the values of WI and LM reflect high model agreement and reliability even for extended forecasts.
Other models, including the moDWT-Lasso-DNN and moDWT-Lasso-ANN, perform well but with slightly higher error metrics and lower correlation coefficients compared to the moDWT-Lasso-LSTM. Traditional models such as DNN and ANN show significantly lower performance, with higher RMSE, MAE, MASE, and SMAPE values, indicating less accurate predictions. This further underscores the superiority of the proposed model in long-term SM forecasting.
For instance, the heatmap (Figure 4) shows that the moDWT-Lasso-LSTM model consistently performs better in both RMSE and SMAPE across all lead times, suggesting its robustness in reducing relative error. On the other hand, the DNN and ANN models exhibit the highest SMAPE values, particularly at the t + 14 and t + 30 horizons, with values exceeding 5.5%. This indicates that these models may not handle longer-term predictions as well as the others.
Moreover, Figure 5 displays three radar charts representing the MAE for all models across the three forecasting horizons. The charts illustrate that the moDWT-Lasso-LSTM model is consistently closest to the center, indicating the lowest MAE and the best short-term (t + 1) and long-term (t + 30) prediction performance. At the mid-term horizon (t + 14), the moDWT-Lasso-LSTM continues to outperform the other models, with moDWT-Lasso-DNN showing slight improvement. These radar charts effectively highlight the comparative strength of each model across different forecasting horizons, emphasizing the superior performance of the hybrid moDWT-Lasso-LSTM model.
To further affirm the superiority of the proposed moDWT-Lasso-LSTM model in terms of prediction competency over the other benchmark models, the absolute forecasting errors (i.e., |observed SM − forecasted SM|; kg m⁻²) of the proposed model and all benchmark models are compared (Figure 6). The distribution of the absolute forecasting error during the testing phase, including the upper, median, and lower quartiles for each model for t + 1, t + 14, and t + 30 lead time SM forecasting, is illustrated in the box plots in Figure 6. According to these box plots, the multi-step moDWT-Lasso-LSTM model exhibits the lowest quartile values of absolute forecasting error across all lead times. These results indicate a narrow error distribution for the moDWT-Lasso-LSTM model compared to the benchmark models, further demonstrating its suitability for SM forecasting.
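The quantities summarized by such box plots can be sketched as follows. This is a minimal illustration, not the study's code: the helper names and sample values are hypothetical, and the "inclusive" quantile method is an assumption, since the quartile convention behind Figure 6 is not stated.

```python
import statistics


def absolute_errors(observed, forecast):
    """Absolute forecasting error |observed - forecasted| for each time step."""
    return [abs(o - f) for o, f in zip(observed, forecast)]


def error_quartiles(errors):
    """Return (lower quartile, median, upper quartile) of the error values.

    Assumption: the 'inclusive' method, which treats the sample as spanning
    the full range of possible values; other conventions give slightly
    different quartiles for small samples.
    """
    q1, q2, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return q1, q2, q3


# Illustrative values only (not data from the study).
observed = [20.0, 22.0, 25.0, 24.0, 23.0]
forecast = [20.5, 21.0, 26.5, 26.0, 20.0]

errors = absolute_errors(observed, forecast)
q1, median, q3 = error_quartiles(errors)
print("quartiles:", q1, median, q3)
print("interquartile range:", q3 - q1)
```

A model whose quartiles sit lower and whose interquartile range is smaller has errors that are both smaller and more tightly clustered, which is the property the box plots are used to compare.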
Figure 7 shows the stem plots for the Nash-Sutcliffe Coefficient calculated for the target moDWT-Lasso-LSTM model and benchmark models during the testing phase for t + 1, t + 14, and t + 30 lead time SM forecasting. These graphs show that the moDWT-Lasso-LSTM model exhibits the highest Nash-Sutcliffe Coefficient values for all lead times. Additionally, scatter plots for the t + 30 lead time are provided for all the models tested (Figure 8). In comparison to the scatter plots of other forecasting models, data points for the moDWT-Lasso-LSTM model are more uniformly distributed along the 45-degree line, with fewer outliers and deviations. This indicates a strong positive correlation between observed and forecasted SM values for the moDWT-Lasso-LSTM model.
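For readers less familiar with the Nash-Sutcliffe Coefficient, the sketch below uses its standard definition: one minus the ratio of the sum of squared forecast errors to the sum of squared deviations of the observations from their mean. The function name and sample values are illustrative and not taken from the study.

```python
def nash_sutcliffe(observed, forecast):
    """Nash-Sutcliffe efficiency: 1 - SSE / (variation of observations about their mean).

    Equals 1 for a perfect forecast, 0 when the forecast is no better than
    always predicting the observed mean, and is negative when it is worse.
    Undefined if all observed values are identical (zero denominator).
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


# Illustrative values only (not data from the study).
observed = [20.0, 22.0, 24.0, 26.0]

perfect = nash_sutcliffe(observed, observed)
mean_only = nash_sutcliffe(observed, [23.0] * len(observed))
print("perfect forecast:", perfect)
print("mean-only forecast:", mean_only)
```

Values closer to 1 therefore correspond to scatter points lying closer to the 45-degree line.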
The results consistently show that the moDWT-Lasso-LSTM model outperforms all other benchmark models across different lead times, demonstrating its robustness and reliability in SM forecasting. The use of the wavelet transform (moDWT) combined with Lasso feature selection and the LSTM model significantly enhances prediction accuracy. This hybrid approach effectively captures the temporal dependencies and intricate patterns in the data, leading to superior performance metrics. Furthermore, the decrease in model performance with increasing lead time is expected due to the complexity and variability of SM dynamics over longer periods. However, the moDWT-Lasso-LSTM model’s ability to maintain relatively high accuracy and low error metrics even at t + 30 lead time indicates its potential for practical applications in agricultural and environmental monitoring.
5. Conclusions and Future Work
Agricultural decision-making increasingly relies on reliable information regarding climatic and hydrological variables. For instance, farmers commonly use rainfall forecasts to make decisions about crop establishment, crop harvesting, fertilizer application, and land preparation activities. The proposed model is designed to forecast soil moisture (SM), which is a critical hydrological variable. Accurate SM information is essential for determining the need and timing for irrigation, as well as the precise quantity of irrigation water required on a particular day. If sufficient moisture is available in the soil, irrigation can be skipped; if moisture is inadequate, only the deficit should be compensated through irrigation. Reliable SM information greatly aids in such decision-making.
Furthermore, SM forecasts are crucial for fertilizer application. Adequate soil moisture is essential for dissolving nutrients in fertilizers and making them available to plants. This process enhances the plant’s ability to absorb essential nutrients, maximizing fertilizer use efficiency while reducing waste. In areas without access to irrigation water, where farming relies entirely on rainfall, knowing the moisture levels in soils in advance can significantly impact activities like land preparation, planting, and fertilizer application. This is particularly relevant as many farmers are transitioning towards precision agriculture to reduce production costs, minimize waste, and conserve resources and inputs.
In this context, this study has developed a multi-step wavelet 3-phase hybrid deep learning SM forecasting (moDWT-Lasso-LSTM) model. This model employs Lasso regression optimization and moDWT decomposition algorithms to forecast SM in Bundaberg, Queensland, Australia. Daily input data from 1 January 2005 to 31 December 2020 were obtained from NASA’s Global Land Data Assimilation System (GLDAS), the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS), and the ground database SILO. To achieve an accurate model, the extracted data were decomposed using moDWT, and features were selected using the Lasso algorithm for 1 (t + 1), 14 (t + 14), and 30 (t + 30) days ahead forecasts. Incorporating LSTM, moDWT, and Lasso, the proposed deep learning multi-step moDWT-Lasso-LSTM hybrid model was created. Its performance was evaluated using statistical score measures and compared with eight other models: moDWT-Lasso-DNN, moDWT-Lasso-ANN, Lasso-LSTM, Lasso-DNN, Lasso-ANN, LSTM, DNN, and ANN.
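One way to picture the three forecast horizons is as three separate supervised learning problems, each pairing the inputs available on day t with the SM value observed on day t + lead. The sketch below illustrates that pairing only. The function name and toy data are hypothetical, and the assumption that each horizon is trained directly on its own shifted target (rather than, for example, forecasting recursively) is ours and is not confirmed by this section.

```python
def make_lead_time_pairs(features, target, lead):
    """Pair the feature row of day t with the target value of day t + lead.

    `features` is a list of per-day feature rows and `target` the per-day SM
    series of the same length. The last `lead` days have no target yet and
    are dropped.
    """
    if lead < 1:
        raise ValueError("lead must be a positive number of days")
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")
    n_pairs = max(len(target) - lead, 0)
    inputs = [features[t] for t in range(n_pairs)]
    outputs = [target[t + lead] for t in range(n_pairs)]
    return inputs, outputs


# Toy series: the "SM" value on day t is simply t (illustrative only).
target = [float(t) for t in range(40)]
features = [[value] for value in target]

for lead in (1, 14, 30):
    inputs, outputs = make_lead_time_pairs(features, target, lead)
    print(f"t + {lead}: {len(inputs)} pairs, first target = {outputs[0]}")
```

Longer lead times leave fewer usable training pairs from the same record, which is one practical reason forecasting skill tends to fall as the horizon grows.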
In comparison to other benchmark models, the moDWT-Lasso-LSTM hybrid model demonstrates superior performance in forecasting SM across various lead times, with particularly notable improvements for t + 1 and t + 30. According to the statistical metrics discussed in Table 7, the moDWT-Lasso-DNN model shows performance very similar to that of the moDWT-Lasso-LSTM model for the t + 14 lead time. Nevertheless, in the box plots of absolute forecasting error and the stem plots of the Nash-Sutcliffe Coefficient, the moDWT-Lasso-LSTM model consistently outperforms the moDWT-Lasso-DNN and all other benchmark models. Consequently, the moDWT-Lasso-LSTM model proves to be more effective in predicting SM than the other benchmark models.
When considering model complexity, the moDWT-Lasso-LSTM, while offering high accuracy, is relatively complex because it combines deep learning with wavelet-based decomposition and Lasso feature selection. Models of this type require substantial computational resources and longer training times than simpler models. In contrast, models like Lasso-LSTM and Lasso-DNN, although still complex, have fewer parameters and generally shorter training times. Standalone models such as LSTM and DNN offer a balance between performance and complexity but still require considerable computational power compared to the basic models. Overall, while more complex models like the hybrid moDWT-Lasso variants provide improved accuracy, they come with increased computational costs and training times.
Despite the efficacy of the proposed model, the present research considers only 1-day, 14-day, and 30-day ahead SM forecasting. Therefore, the number of lead times used in this study may impose a limitation on longer-term applications of the proposed model, particularly because increasing the lead time can cause significant changes in model performance. Future researchers could conduct new studies to assess the forecasting capability of the model with extended lead times. Additionally, future research could develop an SM prediction tool to generate long-term forecasts, such as several months ahead, which could be significantly more important for irrigation, water resource management, and strategic planning than shorter-duration forecasts.
As this study uses satellite data, the inputs to the model are continuously recorded, enabling the methodology to be further implemented for operational use in agriculture and other industries with real-time access to historical input data. Unlike discrete wavelet methods used in earlier studies, the wavelet transform data decomposition procedure adopted here does not require future data to calculate the wavelet and scaling coefficients [23]. This advantage allows the proposed model to be practically implemented in real-time, utilizing accessible historical input data.
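The "no future data" property can be illustrated with a first-level decomposition in which each coefficient depends only on the current and the previous observation. The sketch below assumes a Haar filter and simply skips the first time step instead of wrapping around the series. Both choices are ours for illustration; the filter and boundary treatment used in the study are not stated in this section.

```python
def causal_haar_level1(series):
    """First-level Haar wavelet and scaling coefficients computed causally.

    The coefficient at index t uses only series[t] and series[t - 1].
    Index 0 has no previous value, so it is marked None instead of being
    wrapped circularly onto the end of the series.
    """
    wavelet = [None]
    scaling = [None]
    for t in range(1, len(series)):
        wavelet.append((series[t] - series[t - 1]) / 2.0)  # detail (high-pass)
        scaling.append((series[t] + series[t - 1]) / 2.0)  # smooth (low-pass)
    return wavelet, scaling


# Illustrative values only (not data from the study).
series = [20.0, 21.0, 23.0, 22.0, 24.0, 25.0]

w_full, v_full = causal_haar_level1(series)
w_part, v_part = causal_haar_level1(series[:4])  # pretend days 5-6 are unknown

# Coefficients for the first four days are identical with or without the
# later observations, so nothing from the future leaks into them.
print(w_full[:4] == w_part, v_full[:4] == v_part)
```

At each time step the detail and smooth coefficients sum back to the original observation, so the decomposition splits the signal into components without discarding information.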
Due to time and resource constraints, this methodology has not been tested across the whole of Queensland or Australia. Therefore, it should be tested in other regions to examine the geographical consistency of the proposed model. Additionally, while this model was developed to forecast SM in the topsoil layer (0–10 cm depth), future researchers could examine the methodology’s effectiveness in forecasting SM in deeper soil layers.
For methodological improvements over the present moDWT technique coupled with Lasso feature selection and the LSTM model, future studies might adopt alternative decomposition methods such as the à trous (AT) algorithm [23], which can address issues related to using future data in model design. Moreover, the moDWT-Lasso-LSTM model could be applied to predict important drought indices such as the Palmer Drought Severity Index (PDSI), Standardized Precipitation Index (SPI), and Standardized Precipitation Evapotranspiration Index (SPEI), which are time-series indices to which data splitting through multi-resolution analysis can be applied.