Article

Forecasting Blue and Green Water Footprint of Wheat Based on Single, Hybrid, and Stacking Ensemble Machine Learning Algorithms Under Diverse Agro-Climatic Conditions in Nile Delta, Egypt

1 Department of Agricultural Engineering, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
2 Mediterranean Agronomic Institute of Bari, 70010 Bari, Italy
3 Agriculture Applications Department, National Authority for Remote Sensing and Space Sciences, Cairo 1564, Egypt
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4224; https://doi.org/10.3390/rs16224224
Submission received: 22 July 2024 / Revised: 18 October 2024 / Accepted: 4 November 2024 / Published: 13 November 2024

Abstract:
The aim of this research is to develop and compare single, hybrid, and stacking ensemble machine learning models under spatial and temporal climate variations in the Nile Delta for estimating the blue and green water footprints (BWFP and GWFP) of wheat. Thus, four single machine learning models (XGB, RF, LASSO, and CatBoost) and eight hybrid machine learning models (XGB-RF, XGB-LASSO, XGB-CatBoost, RF-LASSO, CatBoost-LASSO, CatBoost-RF, XGB-RF-LASSO, and XGB-CatBoost-LASSO) were used, along with stacking ensembles, under five scenarios combining climate and crop parameters and remote sensing-based indices. The highest R2 value for predicting wheat BWFP (1.00) was achieved with XGB-LASSO under scenario 4, while the minimum (0.16) occurred with LASSO under scenario 3 (remote sensing indices). For wheat GWFP, the highest R2 value (1.00) was achieved with RF-LASSO across scenario 1 (all parameters), scenario 2 (climate parameters), scenario 4 (Peeff, Tmax, Tmin, and SA), and scenario 5 (Peeff and Tmax); the lowest value was again recorded with LASSO under scenario 3. Both single and hybrid machine learning models showed high efficiency in predicting the blue and green water footprints of wheat, with high ratings according to statistical performance standards. However, the hybrid models, whether binary or triple, outperformed both the single models and the stacking ensemble.

1. Introduction

In the contemporary era, a crucial global concern shared by all nations is the management of water resources. Recent studies indicate that the inadequate and inefficient utilization of water could lead to imminent perils and challenges worldwide. Of particular note is the continuous decline in the per capita availability of clean water, a situation exacerbated in Egypt due to factors such as high living standards, sustained population growth, and rapid industrial expansion, all contributing to water scarcity and compromising water security. A pronounced deficit in the water balance has become apparent, with the gap between Egypt’s projected resources and needs estimated at 21 billion m3 annually in 2019 [1]. Moreover, it is estimated that around 34 billion m3 of water will be necessary to ensure food security through virtual water imports by 2050. Consequently, there is a pressing need for various institutions to increasingly tap into non-traditional water sources in order to address this escalating gap. In Egypt, the strategic prioritization of water resource management underscores a shift from conventional, exclusive governmental deliberations to a more inclusive approach involving diverse stakeholders, including user communities and marginalized cultural groups, in decision-making processes [1].
It is crucial to effectively manage water resources in order to ensure water sustainability, particularly in arid and semi-arid regions (ASARs) where factors such as human activities, population growth, and the decline in fresh surface and groundwater are ongoing issues. ASARs constitute approximately 40% of the Earth’s land area [2], characterized by low annual rainfall and an irregular distribution of precipitation over time and space. Egypt is recognized as one of the driest locations globally, receiving an average yearly rainfall of merely 32 mm overall and up to 150 mm in select regions [3]. Therefore, optimizing the utilization of rainfall in these regions is vital for freshwater provision [4].
The concept of a “Water Footprint” (WF), introduced by Hoekstra (2008) [5], serves as an indicator relating production to the overall water resource consumption. Green and blue WFs represent rainfall and agricultural water usage, respectively. The WF enables the determination of primary and secondary freshwater resource consumption necessary for the production of a specific item [6]. Given the benefits of WFs, investigating the WF of crop yield has become a compelling research area. Therefore, the focus lies on developing models and accurately estimating the WF value [7]. The water footprint indicates the total water consumption from direct and indirect activities throughout the supply or production chain [8].
The assessment of crop water consumption can adopt a regionalized perspective and encompass water sources like rivers, lakes, or other natural reservoirs. Challenges in estimating the crop WF arise due to the diverse impacts of climatic variables—either individually or in combination—on water consumption and crop yield, especially under future climate change scenarios [9]. The computation of the crop WF using physical process-based crop or hydrological models requires certain fundamental prerequisites in terms of expenses, time, and expertise. Thus, machine learning (ML) emerges as a viable alternative due to its cost-effectiveness, high speed, and low complexity of processing [10].
Enhancing the performance of algorithms can be achieved by adopting a hybrid approach, and various techniques have been proposed for creating hybrid classifiers [11]. Despite the numerous methods suggested for hybridization, there is still uncertainty regarding the most effective approach [12]. Consequently, a current focus in supervised learning is the exploration of strategies for developing efficient hybrid algorithms. These algorithms are formed by amalgamating components from existing methods into a cohesive blend, which enhances the overall stability and accuracy of the results. The selection of appropriate algorithms is crucial for generating effective combinations, and many researchers have actively pursued the combination of multiple algorithms for data mining purposes [13].
However, alternative techniques for constructing ensembles, such as stacking, have not received adequate attention in the water footprint literature [14]. Stacking involves the creation of an enhanced dataset comprising original features, as well as new synthetic variables derived from predictions made using multiple models. The primary motivation behind utilizing stacking ensembles in water footprint prediction is that these synthetic variables enable the system to better align its output responses compared to models trained solely on original data [15,16].
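As an illustrative sketch (not the authors' implementation), a stacking ensemble of this kind can be assembled with scikit-learn's StackingRegressor: base-learner predictions become synthetic features for a meta-learner, optionally alongside the original features. The synthetic dataset, model choices, and hyperparameters below are assumptions for demonstration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a climate/crop feature table.
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Base learners produce cross-validated predictions that serve as
# synthetic variables; a meta-learner is then fitted on them.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
        ("lasso", Lasso(alpha=0.1)),
    ],
    final_estimator=LinearRegression(),
    passthrough=True,  # keep the original features alongside the synthetic ones
)
stack.fit(X_train, y_train)
r2 = stack.score(X_test, y_test)
```

Setting `passthrough=True` reproduces the "enhanced dataset" idea described above: the meta-learner sees both the original features and the base-model predictions.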
Remote sensing techniques offer the capability of mapping the WF of a specific crop by utilizing models like crop growth, productivity, and crop water usage models [17]. Advances in remote sensing techniques and algorithms have demonstrated the ability to estimate crop yield, actual evapotranspiration (ETa), crop water usage, and water footprints using multispectral data from satellites such as MODIS, Landsat, Sentinel, IRS, VIIRS, and AVHRR [18]. Sentinel-2 data were used to retrieve individual bands, and vegetation indices (VIs), including the Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index (SAVI), Optimized Soil Adjusted Vegetation Index (OSAVI), Renormalized Difference Vegetation Index (RDVI), Enhanced Vegetation Index (EVI), and Land Surface Temperatures (LST) from Landsat-8 data, were utilized to estimate crop productivity (CP), crop water usage (CWU) (i.e., ETa), and crop WF [19].
Wheat (Triticum aestivum L.) holds a significant position as the primary crop in Egypt, covering a vast expanse of 1.5 million hectares, as highlighted by Nasr and Sewilam [20]. A substantial portion of the Egyptian population relies on wheat-based food for their daily nutritional requirements, with one-third of their caloric intake and 45% of their protein intake sourced from products like Balady bread. The annual wheat consumption in Egypt amounts to a staggering 20 million metric tons [21], with less than half of this demand met through local production, leading to a heavy dependence on imports to bridge the gap. According to the Food and Agriculture Organization [21], statistics from 2021 indicate that Egypt produced approximately 9 million tons of wheat from 1.52 million hectares of land. Notably, 85% of Egypt’s wheat imports originate from Russia and Ukraine, with Russia contributing 60–66% and Ukraine 20–25% of the total import volume, as reported by Abdalla, Stellmacher [22]. Furthermore, wheat plays a crucial role in both the global and Egyptian food systems, as highlighted by Hachisuca, Abdala [23]; its worldwide cultivation spanned over 221 million hectares in 2019, yielding approximately 766 million tons, according to FAO [21].
Many studies have dealt with predicting the water footprint in the agricultural sector for different types of crops by applying single and hybrid machine learning models [24,25,26,27,28]. However, the application of stacking ensembles to water footprint prediction is still limited, as are studies that compare different machine learning systems. This gap motivates the present research, which is based on a comparison between single, hybrid, and stacking ensemble models for predicting the water footprint of agricultural crops.
In light of the significance of wheat production and consumption in Egypt, this study aimed to achieve several key objectives. Firstly, it sought to develop and compare a range of machine learning models, including extreme gradient boosting (XGBoost), random forest (RF), least absolute shrinkage and selection operator (LASSO), and CatBoost, as well as hybrid models and stacking ensembles. These models were applied in the context of spatial and temporal climate variations in the Nile Delta region to estimate the blue and green water footprints of wheat from 2013 to 2022 using measured weather data. Secondly, the study aimed to identify the most effective model and scenario with which to accurately predict both the blue and green water footprints of wheat. Additionally, it aimed to demonstrate efficient and cost-effective techniques for forecasting the water footprint of wheat in the Mediterranean region, providing valuable insights for water policy-makers to enhance water resource management strategies.

2. Materials and Methods

2.1. Study Area and Workflow

The research was conducted in the EL-Beheira and AL-Sharkia Governorates in Egypt, focusing on wheat cultivation due to the region’s reputation as a prominent wheat-farming area in the country. EL-Beheira, located in the northern part of Egypt at 30.27 N, 30.28 E, spans an extensive area of 9826 square kilometers at an average elevation of 20 m above sea level (Figure 1). AL-Sharkia, located at a latitude of 30.36 N, a longitude of 31.24 E, and an elevation of 9 m above sea level, covers 4180 square kilometers and experiences an arid climate with varying annual rainfall levels.
The workflow of input data, applied machine learning models, scenarios, and expected output is represented in Figure 2, which includes the collection of climate and crop data as the initial step. Subsequently, four machine learning models (XGB, RF, LASSO, and CatBoost) were individually employed, along with hybrid and stacking ensembles, integrating remote sensing indices such as EVI, NDVI, SAVI, NDMI, GCI, and LST to predict the blue and green water footprints based on five distinct scenarios combining climate, crop data, and remote sensing indices.

2.2. Climate Conditions

This research used monthly recordings of the maximum and minimum air temperatures (Tmax and Tmin, in degrees Celsius), wind speed (WS, in meters per second), relative humidity (RH, in percent), and precipitation (P, in millimeters) gathered from 2013 to 2022. Additionally, data on solar radiation (SR) and vapor pressure deficit (VPD) were acquired from the Central Laboratory for Agricultural Climate at https://www.arc.sci.eg (accessed on 1 January 2023). This ten-year dataset provides a comprehensive picture of the climatic conditions over the study period, essential for various environmental and agricultural analyses and applications.

2.3. Remote Sensing for GWFP and BWFP Estimation

The utilization of Landsat time-series data in this study enabled the computation of six vegetation indices, including EVI, NDVI, SAVI, NDMI, GCI, and LST. The integration of the Google Earth Engine (GEE) facilitated the efficient calculation of these indices, showcasing the capabilities and speed of cloud-based processing. The evaluation of these indices was performed using Landsat 7 and 8 imagery, offering a spatial resolution of 30 m at Level-2 and ensuring detailed and accurate assessments of vegetation dynamics and environmental changes (Table 1).
The research investigation, spanning from 2013 to 2022, concentrated on the months from November to May, which are crucial for the sustainable growth of wheat crops. The following indices were used: EVI, NDVI, SAVI, NDMI, GCI, and LST. Information regarding their spatial and temporal resolution is presented in Table 1; more details are available in the following sections.

2.3.1. Multi-Temporal Image Analysis

The Enhanced Vegetation Index (EVI)

EVI emerges as a vital tool in remote sensing applications for monitoring vegetation health and changes over time. Derived from satellite imagery, EVI offers a quantitative assessment of vegetation density and vigor, considering factors like aerosol scattering and canopy background reflectance. EVI values range from −1 to 1, with higher values indicating healthier and denser vegetation cover, providing valuable insights into ecosystem productivity and environmental conditions.
EVI = 2.5 × (NIR − RED) / (NIR + RED + 1)
where NIR represents the near-infrared band reflectance, and RED represents the red band reflectance.

Normalized Difference Vegetation Index (NDVI)

NDVI, initially introduced by [29], remains a prominent vegetation index widely used in diverse applications. With its normalization process, NDVI values range from −1 to 1, emphasizing the response to green vegetation, particularly in areas with sparse vegetation cover. The mathematical representation of NDVI allows for quantitative assessments of vegetation health and distribution, supporting ecological studies, agriculture, and land management practices.
NDVI = (NIR − RED) / (NIR + RED)

Soil-Adjusted Vegetation Index (SAVI)

The SAVI index, recognized as a reliable technique, serves as a valuable tool for vegetation mapping by leveraging the unique spectral characteristics of red and near-infrared wavelengths. SAVI’s normalization approach ensures consistency in values across different images, ranging from −1 to 1, enhancing the comparability of results.
SAVI = [(NIR − RED) / (NIR + RED + L)] × (1 + L)
where L is the soil-brightness correction factor, ranging from 0 to 1. In this study, L was 0.5 by default. An L value of 0.5 in the reflectance space was found to minimize soil brightness variations and eliminate the need for the additional calibration of different soils [30].

Normalized Difference Moisture Index (NDMI)

Since it reflects the moisture content of the vegetation, NDMI is widely used as a proxy to evaluate water stress in plants [31]. Equation (4) is used to compute the NDMI value.
NDMI = (NIR − SWIR) / (NIR + SWIR)
The NDMI spectral index is derived from the near-infrared band of Landsat 8 OLI, band 5 (0.85–0.88 μm), and the short-wave infrared band 6 (SWIR1: short-wave infrared 1, 1.57–1.65 μm). Its values vary from −1 to 1; higher values indicate a lack of water stress at the soil–leaf canopy level, and lower values reflect excessive water stress [32].

Green Chlorophyll Index (GCI)

In relation to crop development, GCI may indicate the canopy chlorophyll content and light-use efficiency. Generally speaking, the crop’s vigor is directly reflected in the chlorophyll value. Cropland information was collected from https://nassgeodata.gmu.edu/CropScape/ [33] (accessed on 1 January 2023).
GCI = (NIR / GREEN) − 1
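The five reflectance-based indices above can be computed directly from the band values. The following minimal sketch (the function name and example reflectance values are illustrative, not from the study) applies the index formulas as given:

```python
import numpy as np

def vegetation_indices(nir, red, green, swir, L=0.5):
    """Compute the five reflectance-based indices used in this study
    from surface reflectance values (unitless, typically 0-1)."""
    nir, red, green, swir = (np.asarray(a, dtype=float) for a in (nir, red, green, swir))
    evi = 2.5 * (nir - red) / (nir + red + 1.0)          # EVI, as given in the text
    ndvi = (nir - red) / (nir + red)                      # NDVI
    savi = (nir - red) / (nir + red + L) * (1.0 + L)      # SAVI, L = 0.5 by default
    ndmi = (nir - swir) / (nir + swir)                    # NDMI
    gci = nir / green - 1.0                               # GCI
    return {"EVI": evi, "NDVI": ndvi, "SAVI": savi, "NDMI": ndmi, "GCI": gci}

# Example reflectance values (illustrative only).
idx = vegetation_indices(nir=0.45, red=0.10, green=0.08, swir=0.20)
```

The same function works element-wise on whole reflectance rasters when the inputs are numpy arrays of matching shape.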

Land Surface Temperature (LST) Derivation

Lastly, Li, Gu [34] detailed a computation process for LST using the Split-Window Algorithm (SWA) approach.
LST = T_B10 + C1·(T_B10 − T_B11) + C2·(T_B10 − T_B11)² + C0 + (C3 + C4·W)·(1 − ε) + (C5 + C6·W)·Δm
where T_B10 and T_B11 are the brightness temperatures of bands 10 and 11; W is the atmospheric water vapor content (= 0.013) [35]; ε is the mean of the land surface emissivity (LSE) of bands 10 and 11; C0 to C6 are the split-window coefficient values; and Δm is the difference in LSE between bands 10 and 11. For Landsat 8 data, the coefficients (C0 to C6) are constants supplied by [36]; their values are, in order, 0.268, 1.378, 0.183, 54.30, −2.238, −129.20, and 16.40.
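A sketch of the split-window computation, assuming the standard SWA form with the coefficients listed above; the brightness temperatures and emissivity values below are illustrative, not data from the study:

```python
# Split-window coefficients C0..C6 for Landsat 8, as listed in the text [36].
C0, C1, C2, C3, C4, C5, C6 = 0.268, 1.378, 0.183, 54.30, -2.238, -129.20, 16.40

def lst_split_window(tb10, tb11, eps_mean, d_eps, w=0.013):
    """Split-Window Algorithm LST (K): tb10/tb11 are the band 10/11
    brightness temperatures (K), eps_mean the mean LSE of bands 10 and
    11, d_eps their difference, and w the atmospheric water vapor
    content (0.013 in the text)."""
    dt = tb10 - tb11
    return (tb10 + C1 * dt + C2 * dt**2 + C0
            + (C3 + C4 * w) * (1.0 - eps_mean)
            + (C5 + C6 * w) * d_eps)

# Illustrative inputs: ~300 K brightness temperatures, typical emissivities.
lst = lst_split_window(tb10=300.0, tb11=299.0, eps_mean=0.98, d_eps=0.002)
```

With these inputs the result stays within a few kelvin of the band-10 brightness temperature, as expected for a physically plausible scene.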

2.4. Blue and Green Water Footprint Calculations

The detailed data and computation procedure of the ETO can be found in [37].
ETo = [0.408·Δ·(Rn − G) + γ·(900 / (T + 273))·u2·(es − ea)] / [Δ + γ·(1 + 0.34·u2)]
where ETO is the reference evapotranspiration (mm day−1), Rn is the net radiation at the crop surface (MJ m−2 day−1), G is the soil heat flux density (MJ m−2 day−1), u2 is the wind speed at a 2 m height (m s−1), T is the air temperature at a 2 m height (°C), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), es − ea is the saturation vapor pressure deficit (kPa), Δ is the slope of the vapor pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). Once ETO is calculated, the crop evapotranspiration (ETC, mm day−1) relative to the different development stages is calculated using an empirical factor (the crop coefficient KC) as follows:
ETC = ETO × KC
In certain situations, when RHmin deviates from 45% or when u2 is more or less than 2.0 m s−1, the crop coefficient must be adjusted; here, ETC stands for the potential crop water requirement (CWR), and KC adjusted stands for the adjusted crop coefficient [38].
Kc adjusted = Kc reference + [0.04·(u2 − 2) − 0.004·(RHmin − 45)]·(h/3)^0.3
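The ETC and adjusted-KC formulas above can be sketched as follows; the KC, wind, humidity, and crop-height values are illustrative assumptions, not data from the study:

```python
def kc_adjusted(kc_ref, u2, rh_min, h):
    """Adjust a tabulated crop coefficient for local wind speed u2 (m/s),
    minimum relative humidity rh_min (%), and crop height h (m)."""
    return kc_ref + (0.04 * (u2 - 2.0) - 0.004 * (rh_min - 45.0)) * (h / 3.0) ** 0.3

def crop_et(eto, kc):
    """Crop evapotranspiration ETc = ETo * Kc (mm/day)."""
    return eto * kc

# Illustrative mid-season values: Kc_ref 1.15, windy (3 m/s), dry (RHmin 30%), 1 m crop.
kc = kc_adjusted(kc_ref=1.15, u2=3.0, rh_min=30.0, h=1.0)
etc = crop_et(eto=5.0, kc=kc)
```

Note that windier and drier conditions than the reference (u2 > 2 m/s, RHmin < 45%) both push the adjusted KC upward, which is the behavior the adjustment is meant to capture.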
The main presumptions underlying the crop water footprint calculation are that all evapotranspiration that transpires following a rainfall event is attributed to the effective rainfall, hence forming the green crop water utilization (CWUgreen), while any quantity of water applied through irrigation constitutes the blue crop water usage (CWUblue). Irrigation efficiency differs depending on the kind of system used; rates of 60% for surface irrigation, 75% for sprinkler irrigation, and 80% for drip irrigation are typical estimates. A gray water footprint illustrates a body of water’s capacity to assimilate contaminants that are released while maintaining predetermined standards of quality. Using the following formulae, data for wheat crops grown in the designated research region between 2013 and 2022 were used to determine the crop water footprint.
WFblue = CWUblue / Y = (10 × ETblue) / Y
WFgreen = CWUgreen / Y = (10 × ETgreen) / Y
ETgreen = min(ETC, Pe)
ETblue = max(0, ETC − Pe)
where WFgreen and WFblue represent the green and blue water footprints (m3 ton−1), Y stands for the crop yield (ton ha−1), and CWUgreen and CWUblue identify the green and blue crop water usage (m3 ha−1), respectively. ETgreen and ETblue depict the green-water (supplied by effective precipitation) and blue-water (supplied by irrigation) evapotranspiration, respectively; Pe denotes the effective precipitation across the growing season.
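A minimal sketch of the blue/green partition; note that ETblue = max(0, ETC − Pe) is assumed here from the standard water footprint framework [6], and the seasonal totals and yield below are illustrative values only:

```python
def water_footprints(etc_mm, pe_mm, yield_t_ha):
    """Blue and green water footprints (m3/ton) from seasonal crop
    evapotranspiration ETc and effective rainfall Pe (both mm) and the
    crop yield (ton/ha). The factor 10 converts mm over 1 ha to m3/ha."""
    et_green = min(etc_mm, pe_mm)            # rainfall-supplied ET
    et_blue = max(0.0, etc_mm - pe_mm)       # irrigation-supplied ET
    wf_green = 10.0 * et_green / yield_t_ha
    wf_blue = 10.0 * et_blue / yield_t_ha
    return wf_blue, wf_green

# Illustrative wheat season: 450 mm ETc, 60 mm effective rain, 6.5 t/ha.
wf_blue, wf_green = water_footprints(etc_mm=450.0, pe_mm=60.0, yield_t_ha=6.5)
```

In an arid setting like the Nile Delta, Pe is small relative to ETC, so the blue component dominates, which is consistent with the study's focus on irrigation water.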

Estimation of Effective Rainfall

The primary goal of calculating effective rainfall is to determine the green water footprint. The following formula may be used to calculate effective rainfall:
Pe = P·(4.17 − 0.2·P) / 4.17,  for P < 8.3
Pe = 4.17 + 0.1·P,  for P ≥ 8.3
where P and Pe are the monthly precipitation and effective precipitation (mm), respectively [39,40].
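The piecewise formula can be implemented and checked for continuity at the 8.3 mm threshold; the monthly totals used below are illustrative:

```python
def effective_rainfall(p_mm):
    """Monthly effective precipitation Pe (mm) from precipitation P (mm)."""
    if p_mm < 8.3:
        return p_mm * (4.17 - 0.2 * p_mm) / 4.17
    return 4.17 + 0.1 * p_mm

pe_low = effective_rainfall(5.0)    # below the 8.3 mm threshold
pe_high = effective_rainfall(20.0)  # above the threshold
pe_edge = effective_rainfall(8.3)   # the two branches meet near Pe = 5.0 mm
```

At P = 8.3 mm the lower branch gives 8.3 × (4.17 − 1.66)/4.17 ≈ 5.00 mm and the upper branch gives 4.17 + 0.83 = 5.00 mm, so the two pieces join smoothly.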

2.5. Machine Learning Implementations

Four machine learning models, least absolute shrinkage and selection operator (LASSO), random forest (RF), extreme gradient boosting (XGBoost), and CatBoost, were used in this work to predict the green and blue water footprints. Hybrid models based on these four models were used to leverage the strengths and mitigate the weaknesses of each individual model, enhancing overall prediction accuracy and robustness. Each model captures different aspects and patterns in the data, offering a diverse set of predictions. By combining them, it is possible to reduce the impact of biases or errors inherent to any single model, thus improving the reliability of the green and blue water footprint estimates. The data were split into two groups, 70% for training the models and 30% for testing them, in order to compare the actual and predicted values of the water footprint.
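Because the text does not specify how the hybrid models combine their members, the sketch below assumes simple prediction averaging over the 70/30 split, using scikit-learn and synthetic data as stand-ins for the study's models and feature table:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the climate/crop feature table.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
# 70% of the records train the models; 30% are held back for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

# One simple way to hybridize two learners (an assumption, not
# necessarily the authors' scheme): average their test predictions.
hybrid_pred = 0.5 * (rf.predict(X_test) + lasso.predict(X_test))
r2_hybrid = r2_score(y_test, hybrid_pred)
```

Averaging tends to cancel uncorrelated errors of the member models, which is the intuition given above for why hybrids can beat their components.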

2.5.1. Random Forest (RF)

Random forest, developed by Breiman [41], is based on an ensemble of decision trees with controlled variance. Random forest regression is one example of bootstrap aggregation (bagging). It works with random binary trees that use bootstrapping, a technique in which a random subset of the training dataset is chosen from the raw dataset and used to develop the model with a fraction of the observations. Random forest is designed to form an ensemble of weak unbiased classifiers that combine their results during the final classification of each object. Individual classifiers are built as classification trees.
Each tree is constructed using a different bootstrap sample of the training set. Each bootstrap sample results from drawing, with replacement, the same number of objects as in the original training set. As a result, roughly 1/3 of objects are not used for building a tree but instead for performing an out-of-bag (OOB) error estimate and for importance measurement. A different subset of attributes is randomly selected at each step of the tree construction. The RF classification algorithm is relatively quick, can usually be run without tuning parameters, and provides a numerical estimate of the feature’s importance. It is an ensemble method in which classification is performed by the voting of multiple unbiased weak classifiers (decision trees) [34,42].
The split is performed using the attribute that leads to the best distribution of data between nodes of the tree. This procedure is performed until the whole tree is built. The constructed tree is used to classify its OOB objects, and the result is used for obtaining the approximations of the classification error and computation of confusion matrices. New objects are classified by all trees in the forest, and the final decision is made by simple voting. The importance of each variable is estimated in the following way. First, the classification of all objects is performed. Each tree contributes its votes only to the classification of objects, which were not used for its construction.
The number of votes for a correct class is recorded for each tree. Then, the values of a given variable are randomly permuted across objects, and the classification is repeated. The number of votes for a correct class is again recorded for each tree. The importance of the variable for the single tree can then be defined as a difference between the number of correct votes cast in the original and permuted system divided by the number of objects. The importance of the variable is then obtained by averaging importance measures for individual trees. One can also use the Z-score, obtained by dividing the average value by its standard deviation, as an importance measure. The advantage of the latter measure is that it ascribes more weight to a relatively small but stable decrease in classification performance [43].
The calculation technique and detailed statistics are accessible in [41,44].
The importance measure of an attribute is obtained as the loss of classification accuracy caused by the random permutation of attribute values between objects. It is computed separately for all trees in the forest that use a given attribute for classification. Then, the average and standard deviation of the accuracy loss are computed. Alternatively, the Z-score, computed by dividing the average loss by its standard deviation, can be used as the importance measure. Unfortunately, the Z-score is not directly related to the statistical significance of the feature importance returned via the random forest algorithm, since its distribution is not N(0, 1) [45]. The RF model has been widely used for regression and classification problems. The Boruta random forest (BRF) feature selection optimizer algorithm [43] was proposed to determine significant features. The BRF algorithm was established using the RF algorithm through a wrapper method [41]. The overall BRF algorithm is summarized in the following steps [46]:
Step 1: generate a randomly ordered shadow (duplicated) variable (x′t) for a particular input vector (xt).
Step 2: remove correlations and add randomness between shadow predictors (inputs) and outputs (yt) for a set of T distinct predictors (xt ∈ Rn) and target variable (yt ∈ R), with n = the number of inputs and t = 1, 2, …, T.
Step 3: predict yt using the xt and x′t inputs through an RF model. Step 4: determine the mean decrease accuracy (MDA) for each xt and x′t over all trees (mtree = 500 in this work) using Equation (15) [47,48]:
MDA = (1 / mtree) · Σ_{m=1..mtree} [ Σ_{t∈OOB} I(yt = f(xt)) − Σ_{t∈OOB} I(yt = f(x′t)) ] / |OOB|
In Equation (15), OOB defines the out-of-bag samples (i.e., the prediction error of each of the training trials through bootstrap aggregation), and f(xt) and f(x′t) describe the predicted values before and after permuting, respectively. Further, I(•) represents the indicator function.
Step 5: utilize Equation (16) to compute the Z-scores, as shown below:
Z-score = MDA / std
where std outlines the standard deviation of accuracy losses, and then the maximum Z-score among duplicate attributes (MZSA) is calculated.
Step 6: inputs with Z-scores < MZSA are marked as “unimportant” and removed permanently, while inputs with Z-scores > MZSA are marked as “Confirmed”.
Step 7: produce new shadow inputs and repeat the algorithm until all input parameters are decided or the iteration threshold is reached (i.e., maxRuns = 100 in the current work).
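The shadow-feature idea in the steps above can be sketched in simplified form; the sketch below uses mean decrease in impurity instead of MDA, a single iteration instead of the full loop, and a synthetic dataset, all of which are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 400
X = rng.normal(size=(n, 6))                    # features 3-5 are pure noise
y = 3 * X[:, 0] + 2 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=n)

# Steps 1-2: append shuffled "shadow" copies of every feature, which
# breaks any real association between the shadows and the target.
shadow = np.apply_along_axis(rng.permutation, 0, X)
X_aug = np.hstack([X, shadow])

# Steps 3-5: fit an RF and compare real importances against the
# maximum importance among the shadow features (the MZSA threshold).
rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X_aug, y)
imp = rf.feature_importances_
real_imp, shadow_imp = imp[:6], imp[6:]
mzsa = shadow_imp.max()

# Step 6: real features beating every shadow are "Confirmed".
confirmed = [i for i in range(6) if real_imp[i] > mzsa]
```

In this toy setup the three informative features clear the MZSA threshold, while the pure-noise columns behave like their shadows.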
In this study, based on the grid search method, the RF model was trained using the following parameter grid: n_estimators (50, 100, 200, 300, 400, 500, and 1000), max_depth (1, 2, 5, 10, 12, and 15), and random_state (10, 15, 20, 30, and 42). The best parameters achieved were n_estimators = 200, max_depth = 10, and random_state = 30; the other hyperparameters remained at their defaults. Two-fold cross-validation was performed, splitting the data into 2 folds.
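A grid search of this shape can be reproduced with scikit-learn's GridSearchCV; the reduced grid and synthetic data below are illustrative, so the selected parameters need not match the study's:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training table.
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=30)

# A reduced version of the grid from the text (the full grid also
# varied n_estimators, max_depth, and random_state over more values).
param_grid = {
    "n_estimators": [50, 200, 500],
    "max_depth": [2, 10, 15],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=30),
    param_grid,
    cv=2,            # two folds, as in the study
    scoring="r2",
)
search.fit(X, y)
best = search.best_params_
```

`search.best_estimator_` is refitted on the full data with the winning combination, which is how the tuned model would then be carried forward.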

2.5.2. Extreme Gradient Boosting (XGBoost)

The XGB algorithm, as introduced by Chen and Guestrin [30], offers a fresh perspective on implementing a gradient boosting machine using regression trees. This method relies on the boosting concept, which entails combining predictions from a group of weak learners to form a strong learner via iterative training techniques.
XGB is a machine learning algorithm realized by gradient lifting technology; it is an enhanced GBDT algorithm. Its base classifier is the Classification and Regression Tree (CART), and XGB is a tree integration model combining multiple CARTs [30]. The XGB model is built by adding trees iteratively. The predicted value of the i-th sample in the t-th iteration can be expressed as follows:
ŷi(t) = ŷi(t−1) + ft(xi)
The tree is added iteratively to minimize the objective function, which can be expressed as follows:
obj(t) = Σ_{i=1..n} L(yi, ŷi(t−1) + ft(xi)) + Ω(ft)
where obj(t) is the objective function, L is the loss function, and Ω(ft) represents the model complexity. To optimize the objective quickly, a second-order Taylor expansion [49] is applied to Equation (18), as shown in Equation (19).
obj(t) ≈ Σ_{i=1..n} [ L(yi, ŷi(t−1)) + gi·ft(xi) + (1/2)·hi·ft²(xi) ] + Ω(ft)
where gi = ∂L(yi, ŷi(t−1)) / ∂ŷi(t−1) and hi = ∂²L(yi, ŷi(t−1)) / ∂(ŷi(t−1))² are the first and second derivatives of the loss function, respectively. When adding the t-th tree, the previous t − 1 trees have already completed training; that is, L(yi, ŷi(t−1)) is a constant term. This term is removed to obtain the simplified objective function of step t, which can be expressed as follows:
õbj(t) = Σ_{i=1..n} [ gi·ft(xi) + (1/2)·hi·ft²(xi) ] + Ω(ft)
Ij = {i | q(xi) = j} is defined as the sample set of leaf node j; by expanding the regularization term Ω, the objective can be transformed into the following:
õbj(t) = Σ_{j=1..T} [ (Σ_{i∈Ij} gi)·ωj + (1/2)·(Σ_{i∈Ij} hi + λ)·ωj² ] + γ·T
where ωj is the weight of leaf node j. Finally, the optimal leaf weight ωj* and the corresponding optimal value of the objective function can be expressed as follows:
ωj* = − (Σ_{i∈Ij} gi) / (Σ_{i∈Ij} hi + λ)
õbj(t) = − (1/2) · Σ_{j=1..T} (Σ_{i∈Ij} gi)² / (Σ_{i∈Ij} hi + λ) + γ·T
The leaf node split is based on the input variables of the model. Because the input variables determine how the leaf nodes are divided, importance scores can be calculated for each input variable; the score reflects the correlation between an input variable and the model output, so the relative importance scores can be used to select the XGB input variables.
XGB is designed to avoid overfitting while still optimizing computing resources. This is done by simplifying the target functions in such a way that they can combine predictive and regularization terms while also retaining a high computational speed. During the training process of XGB, parallel simulations are also performed automatically for the functions. In the XGB additive learning procedure, the learner is fitted initially with the entire input space, and then a second model with the residuals is fitted to overcome a weak learner’s disadvantages. This fit is performed until the stop criteria have been met. The final forecast of the model is computed according to the number of predictions of each learner. The general function is presented as follows:
fi(t) = Σ_{k=1..t} fk(xi) = fi(t−1) + ft(xi)
where fi(t) and fi(t−1) represent the forecasts at steps t and t − 1, respectively, ft(xi) is the learner added at step t, and xi is the input variable. Chen and Guestrin (2016) presented extensive details and calculations for the XGB algorithm [30].
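The additive learning procedure can be illustrated with a hand-rolled boosting loop for squared loss, where each new tree fits the residuals of the current ensemble; the dataset, tree depth, learning rate, and round count below are assumptions for illustration, not XGBoost itself:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Additive training with squared loss: for this loss the negative
# gradient is simply the residual y - F, so each tree f_t fits the
# residuals of the current ensemble prediction F_{t-1}.
eta, trees = 0.3, []
F = np.zeros_like(y)                        # F_0 = 0
for t in range(50):
    tree = DecisionTreeRegressor(max_depth=2, random_state=t)
    tree.fit(X, y - F)                      # fit the current residuals
    trees.append(tree)
    F = F + eta * tree.predict(X)           # F_t = F_{t-1} + eta * f_t

train_mse = float(np.mean((y - F) ** 2))
```

The shrinkage factor `eta` plays the role of the learning rate; XGBoost additionally regularizes each tree via the Ω term rather than relying on shrinkage alone.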
By applying the grid search method, the XGB model was tuned over the following parameter grid: n_estimators of (50, 100, 200, 300, 400, 500, and 1000) and max_depth of (1, 2, 5, 10, 12, and 15). The best parameters achieved were n_estimators = 1000 and max_depth = 1; the other hyperparameters remained at their defaults. Performance was assessed via cross-validation by splitting the data into 2 folds.
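The grid-search-with-k-fold-cross-validation procedure can be sketched as follows. To keep the example dependency-free, a toy 1-D k-nearest-neighbour regressor stands in for XGB, and its neighbour count `k` stands in for n_estimators/max_depth; all names here are illustrative, not from the study.

```python
# Hedged sketch of grid search + 2-fold cross-validation: every candidate
# hyperparameter value is scored by out-of-fold MSE, and the best one is kept.

def knn_predict(train_x, train_y, query, k):
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def kfold_mse(xs, ys, k_neighbours, n_folds=2):
    """Average squared error over held-out folds for one hyperparameter value."""
    n = len(xs)
    fold_size = n // n_folds
    total, count = 0.0, 0
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        test_idx = range(lo, hi)
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            total += (knn_predict(train_x, train_y, xs[i], k_neighbours) - ys[i]) ** 2
            count += 1
    return total / count

def grid_search(xs, ys, k_grid):
    """Return the hyperparameter value with the lowest cross-validated MSE."""
    scores = {k: kfold_mse(xs, ys, k) for k in k_grid}
    return min(scores, key=scores.get)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1]   # roughly y = x
best_k = grid_search(xs, ys, (1, 3, 5))
```

On this near-linear toy data the nearest single neighbour of each held-out point is the closest boundary point of the other fold, so the smallest `k` wins the grid search.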

2.5.3. CatBoost Model

CatBoost is a novel machine-learning algorithm rooted in the gradient-boosting decision tree (GBDT) algorithm, and it has been proven to outperform other advanced GBDT algorithms like XGBoost and LightGBM in various aspects, particularly when handling substantial data and features. The enhancements are primarily evident in three key areas. Firstly, traditional GBDT algorithms typically handle categorical features using Greedy Target Statistics (Greedy TS), which is effective but prone to conditional shift issues. To circumvent this challenge, CatBoost employs an ordered principle-based approach to overcome target leakage, enabling the entire dataset to be available for model training and managing categorical features during training. Specifically, CatBoost conducts a random permutation of the dataset, selects a categorical feature, and then calculates the average label value for examples with the same category value positioned before the selected example in the permutation. According to Prokhorenkova, Gusev [50], if we sample a random permutation σ = (σ1, σ2, …, σn) of the provided dataset, each categorical value is substituted according to Equation (25):
$\hat{x}_{\sigma_p,k} = \dfrac{\sum_{j=1}^{p-1} \mathbb{1}\left\{ x_{\sigma_j,k} = x_{\sigma_p,k} \right\} \cdot Y_{\sigma_j} + \beta \cdot P}{\sum_{j=1}^{p-1} \mathbb{1}\left\{ x_{\sigma_j,k} = x_{\sigma_p,k} \right\} + \beta}$
where P is a prior value, and β is the weight of the prior value. The prior is usually the average label value in the dataset, and it helps reduce the noise from low-frequency categories [51]. The grid search method was applied for parameter tuning, with a learning rate of (0.01, 0.05, and 0.1), a depth of (4, 6, and 8), iterations of (100, 200, and 300), and a random seed of (33, 50, 70, and 90).
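The ordered target statistics of Equation (25) can be sketched in a few lines: running sums over the preceding rows implement the "only examples positioned before the current one" rule, which is what prevents target leakage. Function and variable names are illustrative.

```python
# Sketch of CatBoost-style "ordered" target statistics: each categorical value
# is replaced by the average label of EARLIER rows (in a random permutation)
# that share the same category, smoothed by a prior P with weight beta.

def ordered_target_stats(categories, labels, prior, beta=1.0):
    counts, sums = {}, {}            # running statistics over preceding rows only
    encoded = []
    for cat, label in zip(categories, labels):
        n = counts.get(cat, 0)
        s = sums.get(cat, 0.0)
        encoded.append((s + beta * prior) / (n + beta))
        counts[cat] = n + 1          # update AFTER encoding the current row
        sums[cat] = s + label
    return encoded

# rows assumed already shuffled into a random permutation
cats = ["A", "A", "B", "A"]
ys = [1.0, 0.0, 1.0, 1.0]
prior = sum(ys) / len(ys)            # average label, the usual choice of prior
enc = ordered_target_stats(cats, ys, prior)
```

Note that the first occurrence of any category receives the pure prior, and later occurrences blend the prior with the labels seen so far for that category.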

2.5.4. Least Absolute Shrinkage and Selection Operator Model (LASSO)

LASSO is a widely used linear regression (LR) model for dealing with regression problems, especially when working with high-dimensional data. LASSO selects a subset of the predictor variables that are most important for predicting the target variable. It also shrinks the coefficients of the remaining, unimportant predictor variables towards zero. LASSO achieves this by introducing a penalty term into the ordinary least squares objective function. This penalty term is proportional to the sum of the absolute values of the coefficients of the predictor variables. The penalized objective function used in LASSO is shown in Equation (26).
$\dfrac{1}{2 \cdot n_{\mathrm{samples}}} \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_1$
where α is a constant, and $\lVert w \rVert_1$ is the $\ell_1$-norm of the coefficient vector. The working principle of LASSO is to iteratively optimize the objective function, reducing the sum of squared errors between the predicted and observed values subject to the penalty term. The tuning parameter in the objective function, alpha, controls the strength of the penalty and determines the amount of shrinkage applied to the coefficients of the predictor variables. The LASSO model used in the current study uses multiple hyperparameters, including “alpha”, which controls the strength of the penalty, “max_iter”, which controls the total number of iterations, “tol”, which is the tolerance for optimization, and “selection”, which indicates the method used for predictor variable selection. The input data are also scaled before training the model, as, without scaling, the LASSO model may not be able to weigh the impact of each input feature properly. The data are scaled using the standard scaling technique, whereby the data are transformed to a unit standard deviation and a zero mean [52].
Robert Tibshirani coined the term LASSO in 1996 [53]. It is a robust approach that accomplishes two primary tasks: regularization and feature selection. The LASSO approach constrains the sum of the absolute values of the model parameters: the total must be less than a preset value (upper bound). To accomplish this, the approach employs a shrinkage (or regularization) procedure in which it penalizes the coefficients of the regression variables, thereby shrinking some of them to zero [54]. Incorporating a penalty term into linear regression may dramatically reduce the variance of a model by effectively shrinking the coefficient estimates, particularly in models with high-dimensional predictors [55].
The optimized objective function of lasso regression (Lasso-Reg) is as follows:
$\min_{\beta_0, \beta_j} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$
where β0 denotes the Lasso-Reg shift (intercept), and βj denotes the xij coefficients. In this relation, λ is a regularization parameter: if its value is equal to zero, the model becomes a normal regression and all variables are present, and as its value increases, the number of independent variables in the model decreases. The value of this parameter is usually determined using the cross-validation method [54].
LASSO regression limits the size and the numbers of the coefficients and tries to retain the better features. Equation (27) is the formula for the minimization objective.
$\mathrm{Minimization\ objective} = \mathrm{LS\ Objective} + \lambda \sum_{j} \lvert \beta_j \rvert$
where LS Objective is the least squares objective, i.e., the linear regression objective without regularization, and λ controls the regularization [56]. Based on the grid search method, the LASSO model was trained using alpha values of (0.1, 0.01, 0.001, and 1.0), and the best value for alpha was 0.01.
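How the ℓ1 penalty drives some coefficients to exactly zero can be shown with a minimal coordinate-descent solver using soft-thresholding, matching the objective in Equation (26). This is a pure-Python illustration, not the solver used in the study, and it assumes the features have already been standardized as described above.

```python
# Minimal LASSO via coordinate descent for (1/(2n))*||Xw - y||^2 + alpha*||w||_1.

def soft_threshold(rho, alpha):
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0                       # coefficient shrunk exactly to zero

def lasso_cd(X, y, alpha, n_iter=200):
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual
            # (the residual with feature j's current contribution added back)
            rho = sum(X[i][j] * (y[i]
                                 - sum(w[k] * X[i][k] for k in range(p))
                                 + w[j] * X[i][j]) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# toy data: y depends on the first feature only; the second is pure noise,
# so the LASSO penalty should zero out its coefficient entirely
X = [[-1.5, 0.1], [-0.5, -0.2], [0.5, 0.2], [1.5, -0.1]]
y = [-3.0, -1.0, 1.0, 3.0]
w = lasso_cd(X, y, alpha=0.1)
```

The first coefficient is shrunk slightly below its least-squares value of 2.0 (the price of the penalty), while the noise feature's coefficient lands exactly at zero, which is the feature-selection behaviour described in the text.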

2.5.5. Hybrid Model Building

Hybrid prediction models strategically merge various algorithms to leverage the specific benefits of different techniques, resulting in more precise estimations compared to alternative models, as discussed by Xu, Peng [57]. The current research expanded upon prior studies by introducing additional models, totaling eight hybrid models, either double or triple, to determine the most optimal models and scenarios for predicting BWFP and GWFP between 2013 and 2022. These hybrid models consisted of XGB-RF, XGB-LASSO, XGB-CatBoost, RF-LASSO, CatBoost-LASSO, CatBoost-RF, XGB-RF-LASSO, and XGB-CatBoost-LASSO.
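The text does not specify the exact rule used to combine the constituent models of each hybrid; a minimal, commonly used sketch is to average their predictions. The "models" below are stand-in callables, not fitted XGB/RF learners, so this shows only the combination mechanics.

```python
# Hedged sketch of a hybrid predictor: average the predictions of the
# constituent models (e.g., an "XGB-RF" hybrid averaging an XGB prediction and
# an RF prediction). Triple hybrids simply pass three models instead of two.

def make_hybrid(*models):
    """Combine any number of fitted models by averaging their predictions."""
    def predict(x):
        preds = [m(x) for m in models]
        return sum(preds) / len(preds)
    return predict

# stand-in "fitted models": one over-predicts, the other under-predicts,
# so their errors cancel in the average
model_a = lambda x: 2.0 * x + 1.0
model_b = lambda x: 2.0 * x - 1.0
hybrid = make_hybrid(model_a, model_b)
```

Averaging is only one possible combination rule; weighted averaging or feeding one model's output into another are equally plausible readings of "hybrid" here.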
This combination of XGB and RF in the hybrid models was a crucial aspect of the research. To elaborate further, the hybrid model was segmented into three distinct sections, each serving a specific purpose in the prediction process. Random forest and extreme gradient boosting models have been shown to improve the accuracy of determining irrigation timing for agriculture using real-time data: the XGB and RF models achieved 87.73% [42] and 84.74% [58] accuracy, respectively, outperforming artificial neural networks (ANNs), support vector machines (SVMs), and decision trees.
In predicting the growth environment in agriculture, RF and XGB perform better than ridge and LASSO regression; the relevant study targeted keeping the saturation value within an adequate range in tomato cultivation. In the water quality field, random forest recorded the best accuracy (R2 = 0.9853) among eight machine learning models in predicting the water quality index (WQI) based on a minimum set of inputs [59]. A similar paper dealt with the modeling of irrigation suitability indices, and random forest was found to be suitable for predicting total dissolved salts (TDSs), potential salinity (PS), the sodium adsorption ratio (SAR), and the exchangeable sodium percentage (ESP), with performance based on Pearson’s coefficient (r) equal to 0.87, 0.92, 0.83, and 0.65 for TDS, PS, SAR, and ESP, respectively [60].

2.5.6. Stacking Ensemble Technique

Stacking, an ensemble technique, integrates diverse classification models through a meta-classifier, as highlighted by Verma and Pal [61]. The fundamental concept behind stacking is to amalgamate multiple weak learners to create a model with superior generalization capabilities, as discussed by Jiang, Liu [62]. This innovative ensemble framework in ensemble learning leverages meta-learners to merge outcomes from individual base learners [63], with the first-level learners being referred to as base learners and the combinators as second-level learners or meta-learners, as illustrated in Figure 3.
The fundamental concept of stacking involves a multi-stage process. Initially, the first-level learner undergoes training using the initial training dataset. Subsequently, the output generated via the first-level learner serves as the input feature for the meta-learner. Following this, a new dataset is constructed utilizing the corresponding original labels as new labels to facilitate the training of the meta-learner. The categorization of ensembles is based on whether the first-level learner employs the same type of learning algorithm. When the same learning algorithm is used, it is referred to as homogeneous ensembles; otherwise, it is termed heterogeneous ensembles [62,64,65]. This distinction is crucial, as it impacts the overall performance and characteristics of the ensemble model.
In this particular investigation, four distinct machine-learning models were incorporated as the base learners to establish a stacking ensemble consisting of various models. These models include random forest (RF), extreme gradient boosting (XGBoost), the LASSO regression algorithm (LASSO), and categorical boosting (CatBoost). Each of these models operates using unique principles, and the selection of the most effective model can be pivotal in determining the performance of the stacking model. The diversity among these models adds a layer of complexity and richness to the ensemble, contributing to the overall predictive power and robustness of the model.
The initial-layer prediction model within the stacking ensemble framework underwent training utilizing a k-fold cross-validation approach. The training procedure involved several steps to ensure the robustness and accuracy of the model. Firstly, the original dataset, S, was randomly partitioned into K sub-datasets, with each denoted as Si (where i ranges from 1 to K) [66]. Each sub-dataset Si was individually validated using a base learner, while the remaining K − 1 sub-datasets were utilized as training sets to generate K prediction outcomes. These predictions were then amalgamated into set D1, which aligns in length with the original dataset, S. This process was repeated for the other n − 1 base learners, resulting in the creation of sets D2, D3, and so forth. By combining the prediction outputs from all n base learners, a new dataset, D = [D1, D2, …, Dn] was formed, serving as the input data for the second-layer meta-learner. The second-layer prediction model plays a crucial role in identifying and rectifying errors from the initial-layer predictions promptly, thereby enhancing the overall accuracy and reliability of the prediction model. Given the superior generalization performance and prediction accuracy observed in heterogeneous ensembles, this study advocates for a stacked ensemble classifier structured in two stages.
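The out-of-fold construction described above can be sketched as follows. A trivial fold-mean predictor stands in for the real base learners (RF, XGB, CatBoost, LASSO), and only the mechanics of building one meta-feature column D1 are shown; repeating the routine per base learner yields D = [D1, D2, …, Dn] for the meta-learner.

```python
# Sketch of k-fold out-of-fold (OOF) meta-feature generation for stacking:
# each fold is predicted by a learner trained on the other K-1 folds, so every
# row of the meta-feature column comes from a model that never saw that row.

def oof_predictions(y, k, fit_predict):
    """Return out-of-fold predictions aligned with the original dataset."""
    n = len(y)
    fold_size = n // k
    oof = [None] * n
    for f in range(k):
        lo = f * fold_size
        hi = (f + 1) * fold_size if f < k - 1 else n
        train = [y[i] for i in range(n) if not (lo <= i < hi)]
        for i in range(lo, hi):
            oof[i] = fit_predict(train)      # "trained" without fold f
    return oof

# stand-in base learner: predicts the mean of its training labels
mean_learner = lambda train: sum(train) / len(train)

y = [1.0, 1.0, 3.0, 3.0]
d1 = oof_predictions(y, k=2, fit_predict=mean_learner)
# stacking would collect one such column per base learner as meta-learner input
```

Because each half is predicted by a learner fitted on the *other* half, the OOF column for this toy target is the folds' means swapped, which is exactly the leakage-free behaviour the two-stage ensemble relies on.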
Initially, random forest (RF), XGBoost (XGB), CatBoost, and LASSO were utilized as the base classifiers in the first stage of the ensemble classifier. Each of the four classification models underwent training using the complete training set. Subsequently, the probabilistic outputs obtained from the first stage were channeled into the meta-classifier during the second stage. The meta-classifier was then fitted based on the meta-features’ output from each classification model, employing selected ensemble techniques. The meta-classifier was trained on either the predicted category labels or probabilities derived from the ensemble technique, as depicted in Figure 4.

2.6. Input Combination and Performance Evaluation of the Models

To explore the weights and interconnections between the available data and the blue and green water footprints (BWFP and GWFP), this study delved into five distinct scenarios featuring varying combinations of crop data, climate data, and remote sensing indices (Table 2). The data from 2013 to 2022 were delineated into two subsets: one for training purposes and the other for testing purposes.
By contrasting the predicted BWFP and GWFP values against the actual values derived from the testing data, the performance of the models was rigorously evaluated. The assessment of the models utilized various metrics, including Nash–Sutcliffe model efficiency (NSE), the root mean squared error (RMSE), the mean absolute error (MAE), and the mean bias error (MBE). Additionally, the mean average percentage error (MAPE), the mean absolute relative error (MARE), accuracy (A), and the coefficient of determination (R2) were employed for evaluation.
To evaluate the performance of the models, the data from the ten seasons were divided into two subsets: 30% of the data were allocated to testing, while 70% were allocated to training. The effectiveness of the applied models was measured using the mean absolute error (MAE), the root mean square error (RMSE), and the mean bias error (MBE), following the study conducted by Springmann, Mason-D’Croz [67]. Additionally, the T-statistic test (Tstat) and uncertainty at the 95% confidence level (U95) were used to assess significance.
$R^2 = \left[ \dfrac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2 \sum_{i=1}^{n} (P_i - \bar{P})^2}} \right]^2$
where Oi is the ith observed data point, and Pi is the ith predicted data point. Higher R2 values are associated with increased prediction accuracy, while lower RMSE values indicate better model performance. The mean bias error (MBE) was used to assess the applicable models.
$MBE = \dfrac{1}{n} \sum_{i=1}^{n} (P_i - O_i)$
The mean absolute error (MAE) calculates the average error magnitude in projections, regardless of their signs. It involves averaging the absolute differences between projected and actual yields across the test sample [68,69].
$MAE = \dfrac{1}{n} \sum_{i=1}^{n} \lvert O_i - P_i \rvert$
$RMSE = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$
Moreover, the accuracy (A) and the mean squared error (MSE) are calculated as follows:
$A = 1 - \left| \mathrm{mean}\!\left( \dfrac{P_i - O_i}{O_i} \right) \right|$
$MSE = \dfrac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2$
$CC = \dfrac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2 \sum_{i=1}^{n} (P_i - \bar{P})^2}}$
The Nash–Sutcliffe efficiency coefficient (NSE), a normalized statistic comparing residual variance to data variance, was utilized in this study’s performance evaluation. The accuracy of the models in Table 3 is demonstrated through the scatter index (SI) range [70] and the NSE value [71,72].
In addition, the 95% uncertainty interval for model deviations helps assess significant differences between estimated and forecasted GWFP to enhance the understanding of the model’s efficiency. This is characterized as follows:
$U_{95} = 1.96 \sqrt{SD^2 + RMSE^2}$
The relative error (RE) and standard difference (SD) between estimated and calculated values were also taken into account.
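The evaluation metrics defined above can be written compactly in pure Python. Here R2 is computed as the squared Pearson correlation, matching the equation given for R2, and SD in U95 is taken as the standard deviation of the prediction errors, which is an assumption since the text does not define SD explicitly.

```python
# Sketch of the performance metrics from the equations above, with O_i the
# observed and P_i the predicted values.
from math import sqrt

def mbe(obs, pred):
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mse(obs, pred):
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    return sqrt(mse(obs, pred))

def nse(obs, pred):
    mean_o = sum(obs) / len(obs)
    return 1 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
                / sum((o - mean_o) ** 2 for o in obs))

def r2(obs, pred):
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    num = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    den = sqrt(sum((o - mo) ** 2 for o in obs)
               * sum((p - mp) ** 2 for p in pred))
    return (num / den) ** 2

def u95(obs, pred):
    # SD assumed to be the standard deviation of the errors (not defined in text)
    errors = [p - o for o, p in zip(obs, pred)]
    me = sum(errors) / len(errors)
    sd = sqrt(sum((e - me) ** 2 for e in errors) / len(errors))
    return 1.96 * sqrt(sd ** 2 + rmse(obs, pred) ** 2)

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
```

With these symmetric toy errors, MBE cancels to zero while MAE and RMSE remain positive, illustrating why MBE alone cannot rank models.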

3. Results

3.1. The Spatiotemporal Changes in Climate Variables (2013–2022)

Weather conditions play a pivotal role in the management of water resources, as evidenced by fluctuations observed from 2013 to 2022 (Figure 5). There was a significant variation in temperature ranges during this period, with the lowest maximum temperature (Tmax) recorded in 2022 being 15.64 °C and the highest in 2021 being 38.28 °C, with an average of 26.96 °C. Likewise, the lowest and highest values for the minimum temperature (Tmin) were documented in 2017 (4.82 °C) and 2018 (19.32 °C), respectively, with an average of 12.07 °C (Figure 5A).
Relative humidity (RH) ranged from 36.74% in 2019 to 75.1% in 2016, with an average value of 55.92% over the time series. Wind speed (WS) peaked at 14.36 m s−1 in 2019, while the lowest value was observed in 2016 at 1.02 m s−1, with a mean value of 15.38 m s−1 (Figure 5B). The most significant levels of effective precipitation were noted in 2015, reaching 4.42 mm, whereas the lowest level was documented in 2014 at zero, with an average of 2.21 mm. The rainy season typically spans from November to May, while the wheat cultivation season lasts for 6 months from November to April. Similarly, the minimum and maximum ETo measurements were documented in 2016 at 3.10 mm d−1 and in 2021 at 12 mm d−1, respectively, with an average value of 7.55 mm d−1 (Figure 5C).
The fluctuations in ETC, yield, GWFP, and BWFP values across the wheat cultivation seasons between 2013 and 2022 exhibited significant variability due to shifts in climatic conditions. The highest ETC value of 143.15 m3 ha−1 was attained in 2019, followed by 131.36 m3 ha−1 in 2021, whereas the lowest value of 0.37 m3 ha−1 was documented in 2017, resulting in a mean value of 71.76 m3 ha−1 (Figure 6).
The highest crop yield was attained in 2021 at 7.40 ton ha−1, followed by 7.21 ton ha−1 in 2015. Conversely, the lowest yield was documented in 2020 at 5.85 ton ha−1, with an average value of 6.62 ton ha−1 (Figure 6).
The wheat BWFP reached its peak in 2019 at 23.70 m3 ton−1, followed by a value of 20.85 m3 ton−1 in 2021, whereas the minimum value was observed in 2017 at 0.05 m3 ton−1, with an average value of 11.87 m3 ton−1 (Figure 7). Moreover, the wheat GWFP reached its peak in 2016 at 6.88 m3 ton−1, followed by 5.09 m3 ton−1 in 2020, whereas the minimum value was observed in 2014 at zero m3 ton−1, with an average value of 3.44 m3 ton−1 (Figure 7).

3.2. Evaluation of the Machine Learning Models

Regarding the errors within BWFP, the results indicated that the minimum MBE value of −1.5 was recorded under the LASSO model and Sc3 (remote sensing indices), whereas the maximum value of 4.48 was attained under CatBoost and Sc2 (climate parameters), with 3.51 following closely under RF and Sc2. In the same context, the lowest MAPE value of −84.06 was gained under RF-LASSO and Sc1 (all parameters), and the highest value of 783.81 was obtained under the LASSO model and Sc3 (remote sensing indices), followed by RF-LASSO and Sc2 with 684.42 (Table 4). In relation to NSE, the findings indicated that the RF model and Sc2 yielded the lowest value of 0.05, whereas the XGB-LASSO hybrid model and Sc4 (Peeff, Tmax, Tmin, and SA) obtained the highest value of 0.9966, followed by 0.98839 under the XGB-CAT hybrid model and Sc1 (Table 4).
The lowest MSE value was obtained under XGB-LASSO and Sc4 with 0.11; on the other hand, the highest value of 39.71 was obtained under the RF model and Sc2, followed by 39.07 under XGB and Sc2. The minimum mean absolute error (MAE) recorded was 0.16, achieved using the XGB-RF-LASSO hybrid model under Sc4. Conversely, the maximum MAE value was observed with the XGB model under Sc2 at 4.81, followed by an MAE of 4.65 under Sc2 using the RF model (Table 4).
The outcomes of the diverse statistical errors for the GWFP of wheat cultivation under the single, hybrid, and stacked ensemble models, as well as the different scenarios, were ascertained as follows: Initially, the minimum MBE value noted was −0.158, occurring within the XGB-RF-LASSO hybrid model with Sc3, while the maximum value of 0.119 was observed in the XGB model with Sc2, with a subsequent value of 0.080 seen in the CatBoost model with Sc1. In terms of NSE values, the lowest value of 0.010 was obtained under the XGB model and Sc3, while the highest value of 0.996 was obtained under the RF-LASSO hybrid model and Sc4, as well as 0.995 under the CAT-LASSO hybrid model and Sc4. The MSE indicated that the minimum value was 0.001 under CAT-LASSO and Sc5, whilst the maximum value of 1.131 was obtained under the RF-LASSO model and Sc3, followed by 0.936 under the RF model and Sc3. The lowest MAE value of 0.024 was obtained under the CAT-LASSO hybrid model and Sc5, whilst the highest value was 0.680 under the stacked model and Sc3, followed by 0.677 under the RF model and Sc3 (Table 4).
Regarding the analysis of the statistical parameters of BWFP, the maximum value of uncertainty (U95) of 1.80 was obtained under the LASSO model and Sc3, while the minimum value was 0.42 under RF-LASSO and Sc3 (Figure 8). Related to accuracy, Sc2, Sc3, and Sc4 almost achieved 0.99, but the maximum value was 100% under Sc5 (Peeff and Tmax) with the hybrid XGB-CAT-LASSO model, whereas the minimum was 0.84 under Sc3 with the LASSO model (Figure 8).
According to the results recorded for GWFP, the U95 parameter indicated that the maximum value was 0.76 under the XGB model and Sc3, while the minimum was 0.04 under Sc5 with the CAT-RF model (Figure 9). Moreover, accuracy reached its highest values under the XGB-CAT-LASSO hybrid model, with values around 100% for all scenarios except Sc3, which was the worst scenario. Meanwhile, the lowest accuracy value was 62% for XGB-CAT with Sc3 (Figure 9).
For both BWFP and GWFP, almost all scenarios achieved high statistical evaluations, but Sc3 did not yield what we expected. In the same context, the single and hybrid models achieved the highest evaluation parameters and low statistical errors, while the stacking ensemble did not achieve promising outcomes, and the hybrid models, whether binary or triple models, were superior in relation to statistical evaluations.
Regarding the analysis of the coefficient of determination (R2) in predicting BWFP, for Sc1, the highest R2 value was 0.99 with the RF-LASSO hybrid model, while the lowest was 0.79 under the LASSO model. For the second scenario, the maximum was 0.84 under XGB-CAT-LASSO, while the minimum was 0.36 with the RF model. For Sc3, the peak of 0.82 was obtained under XGB-CAT-LASSO, whereas the lowest R2 value was 0.16 under the LASSO model. For Sc4, the highest R2 value of 1.0 was recorded with XGB-LASSO, and the minimum was 0.95 with CatBoost. The highest R2 value for Sc5 was 0.94 with XGB-CAT-LASSO, while the lowest value was 0.77 with the RF model (Figure 10).
For GWFP, the peak R2 value under Sc1 was 1.0 with RF-LASSO, while the lowest was 0.95 under XGB-CAT. Sc2 followed the same trend in maximum and minimum values as the first scenario: the maximum value of 1.0 was obtained under both RF and RF-LASSO, while the minimum value of 0.95 was found under XGB-CAT and XGB-CAT-LASSO. For the third scenario, the highest R2 value was 0.80 with CAT-LASSO, and the lowest was 0.01 with LASSO. The fourth scenario’s upper value was 1.0 with LASSO, while the lowest was 0.94 with XGB-CAT. The fifth scenario’s highest R2 value was 1.0 with CatBoost, whilst the lowest was 0.95 with XGB-CAT (Figure 10).
All possible combinations of ML models and scenarios were represented as radar patterns for the RMSE. It was clear from every statistical analysis of BWFP and GWFP that Sc3 achieved the lowest evaluation and the greatest RMSE values. The RMSE values recorded for BWFP were as follows: Sc1’s highest value was 4.19 with the LASSO model, and the lowest was 0.93 under the XGB-RF-LASSO hybrid model. Sc2’s peak RMSE value was 6.30 with the RF model, and the minimum was 2.02 with XGB-CAT-LASSO.
The third scenario’s maximum was 5.21 with LASSO, while the lowest was 2.89 with CAT-LASSO. The fourth scenario achieved the top value of 4.36 with LASSO, while the bottom was 0.47 with the RF-LASSO model. Sc5’s highest RMSE value was 4.77 with CatBoost, and the lowest was 1.32 with XGB-CAT-LASSO (Figure 11).
The RMSE values for GWFP were recorded as follows: For Sc1, the highest RMSE value was 0.25 with XGB, while the lowest was 0.06 with CAT-RF. For Sc2, the maximum RMSE was 0.25 with XGB, while the lowest was 0.07 under RF-LASSO. For Sc3, the highest RMSE value was 1.06 with RF-LASSO, while the minimum was 0.43 under XGB-RF-LASSO. For Sc4, the RMSE values ranged from 0.05 under the RF-LASSO hybrid model to 0.27 under the XGB model. Lastly, the RMSE values for Sc5 fluctuated between 0.04 and 0.26 under the CAT-LASSO and XGB models, respectively (Figure 11).
The analysis of SI for BWFP indicated that the highest SI value was obtained with the LASSO model under Sc3, with a value of 1.29, which is classified as a poor value. In contrast, the lowest value was achieved under Sc4 with the XGB-LASSO model, with a value of 0.08, which is classified as an excellent value (Figure 12A).
In the same context, the SI of GWFP indicated that the highest value was obtained with RF-LASSO under Sc3, with a value of 1.91, which is classified as a poor value; moreover, the lowest value, 0.07, was achieved under Sc5 using CAT-LASSO, which is classified as an excellent value (Figure 12B).
Factors such as ETC, SR, WS, Kcadj, ETO, Tmax, Tave, and Tmin had positive correlations with BWFP in descending order: 1, 0.63, 0.6, 0.54, 0.53, 0.34, 0.3, and 0.22, respectively. Conversely, RH exhibited the highest negative correlation at −0.47, followed by Pe at −0.14 (Figure 13A). Effective rainfall exhibited the highest positive correlation with GWFP at 0.99, followed by RH at 0.45, while the maximum temperature showed the highest negative correlation of −0.22 with GWFP (Figure 13B). Analyzing the correlation coefficients revealed that climatic parameters had the most significant influence, followed by crop parameters and then remote sensing indices, which had the least impact on both BWFP and GWFP, as illustrated in Figure 13.

3.3. Comparison of the Machine Learning Models

To gain a more in-depth comprehension of the data distribution and the efficacy of the selected models in forecasting both BWFP and GWFP for wheat, the predicted and actual BWFP and GWFP values during the testing phase were compared using box plots (Figure 14). This analysis revealed statistically significant variances among the 13 applied models. The box plots visualize the data distribution through key statistical values: the first quartile (Q1), the third quartile (Q3), the interquartile range (IQR), calculated as (Q3 − Q1), and a segment within the box denoting the median value. This examination allowed for a comprehensive understanding of how the various models performed in predicting both BWFP and GWFP values for wheat, shedding light on the strengths and weaknesses of each model in capturing the data distribution.
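The quartile statistics underlying the box-plot comparison (Q1, Q3, and IQR = Q3 − Q1) can be computed as follows; this is a small sketch using linear interpolation between order statistics, and the error values are hypothetical.

```python
# Quartiles and interquartile range as used in the box-plot analysis.

def quantile(data, q):
    """q-th quantile (0 <= q <= 1) with linear interpolation between order stats."""
    s = sorted(data)
    pos = (len(s) - 1) * q
    lo = int(pos)
    frac = pos - lo
    if lo + 1 < len(s):
        return s[lo] * (1 - frac) + s[lo + 1] * frac
    return s[lo]

def iqr(data):
    """IQR = Q3 - Q1: the spread of the central half of the error distribution."""
    return quantile(data, 0.75) - quantile(data, 0.25)

errors = [-0.2, -0.1, 0.0, 0.1, 0.2]   # hypothetical prediction errors
```

A small IQR centered on zero is exactly the pattern described for the best hybrid models: half of the prediction errors fall inside a narrow band around zero, with the median line near the middle of the box.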
To predict the BWFP of wheat, the XGB-RF hybrid model under Sc1 achieved the lowest IQR error with 0.098, followed by the XGB-LASSO hybrid model under Sc4 with 0.130; however, the LASSO model achieved the highest IQR in Sc4 with 8.52, followed by the RF model under Sc2 with 6.45 (Figure 14). The lowest IQR demonstrates that the error distribution is close to zero, and the median line in the middle of the rectangle represents the normal distribution of the error.
On the other hand, the box plot for the GWFP of wheat showed that CAT-LASSO under Sc5 achieved the lowest interquartile range (IQR) with 0.008, followed by the RF-LASSO model under Sc4 with 0.026. However, the stacked ensemble model exhibited the highest IQR in Sc3 with 0.852, followed by RF in Sc3 with 0.843 (Figure 14).
Figure 15 provides a visual representation of the importance of 16 input variables and their respective contributions to wheat’s BWFP and GWFP. These contributions are scaled between 0 and 1, where a value of 1 indicates the highest impact on the target variable. The sown area (SA) emerges as the most influential variable, accounting for 42.0% of the impact on BWFP, followed by WS with 19% and Kcadj with 13%. On the other hand, effective precipitation emerges as the most influential variable, contributing to 94.07% of the impact on GWFP. Following effective precipitation, wheat evapotranspiration (ETC) had a 0.8% impact, wind speed and solar radiation had an impact of 0.64%, and remote sensing indices had the least impact on both wheat BWFP and GWFP (Figure 15).

4. Discussion

4.1. Scientific Interpretation of the Results

In summary, based on the statistical assessment of predicted BWFP across various statistical parameters, the hybrid models RF-LASSO, XGB-LASSO, and XGB-RF-LASSO consistently showed the lowest errors across the MBE, MSE, and MAE metrics. In contrast, the lowest NSE values were obtained when LASSO and RF were used as single models.
The single models CatBoost, LASSO, RF, and XGB recorded the highest errors for MBE, MSE, and MAE, except for NSE, for which the highest value was achieved under XGB-LASSO. The hybrid models performed exceptionally well under the climate and crop scenarios (Scenarios 2, 4, and 5) but achieved the lowest performance under the remote sensing scenario (Scenario 3). Furthermore, the statistical analysis of predicted wheat GWFP revealed that the hybrid models CAT-LASSO and XGB-RF-LASSO exhibited the lowest error values for MSE, MAE, and MBE, while the XGB model had the lowest NSE value.
Regarding the BWFP analysis, various parameters such as U95, accuracy, SI, and R2 were examined. Tstat displayed the highest value in the stacked model and the lowest in the XGB-CAT-LASSO hybrid model, as depicted in Figure 8. The hybrid models surpassed both the single and stacked models in terms of accuracy, with the XGB-CAT-LASSO hybrid model achieving the highest accuracy value, while the LASSO model obtained the lowest. Additionally, the LASSO model yielded the highest U95 value, whereas the RF-LASSO hybrid model yielded the lowest. In a similar vein, the SI value was highest in the LASSO model and lowest in the XGB-LASSO hybrid model, as illustrated in Figure 12. The R2 values varied across scenarios, with the highest recorded under the hybrid models and the lowest under LASSO, as shown in Figure 10.
However, the analysis of GWFP statistics indicated that the RF-LASSO hybrid model reported the highest SI value, while the CAT-LASSO model had the lowest SI value, as shown in Figure 12. The maximum and minimum R2 values were observed under different scenarios for XGB-RF-LASSO. In terms of accuracy, the hybrid models outperformed the single models, with the XGB-CAT model showing the lowest accuracy value and the XGB-CAT-LASSO model showing the highest. The U95 values ranged from the maximum under XGB to the minimum under the CAT-RF model, as shown in Figure 9.
Concerning the discussion of the outcomes of the stacked ensemble model, which yielded lower values compared to single and hybrid models, several factors can be attributed to these results. Firstly, the limited size of the dataset played a crucial role since the available data only spanned a period of ten years. It is worth noting that, before 2013, there was no agricultural activity in the newly exploited land of the EL-Beheira governorate. This lack of historical data could have significantly impacted the performance of the stacked ensemble model. Secondly, the dataset exhibited low variance, indicating minimal differences among the values present in the data. This homogeneity in data points could have contributed to the inferior performance of the stacked ensemble model. Lastly, the correlation matrices between the input and output variables were found to be notably low. This weak correlation might have hindered the model’s ability to effectively capture the relationships between the predictors and the target variable.
The poor performance of the remote sensing indices in predicting the blue and green water footprints stems from how these footprints are calculated. The blue water footprint is derived mainly from climate data, through crop evapotranspiration over the growing season, combined with crop productivity; the green water footprint likewise combines crop productivity with another climate-based quantity, effective rainfall. Hence, scenarios that include climate data achieve good and effective performance in predicting both footprints, whereas the remote sensing indices are not directly involved in the water footprint estimation method.
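For reference, the accounting behind these dependencies can be sketched with the standard per-unit-yield formulation, in which the green component is the rainfall-supplied share of seasonal evapotranspiration and the blue component the irrigation-supplied remainder (the numbers below are illustrative, not study values):

```python
# Illustrative blue/green water footprint accounting (ET in mm over the
# season, yield in t/ha; the input values are made up for demonstration).
def water_footprints(etc_mm, peff_mm, yield_t_ha):
    """Return (BWFP, GWFP) in m3 per ton."""
    et_green = min(etc_mm, peff_mm)       # rainfall-supplied evapotranspiration
    et_blue = max(0.0, etc_mm - peff_mm)  # irrigation-supplied evapotranspiration
    # 1 mm of water over 1 ha equals 10 m3, so CWU (m3/ha) = 10 * ET (mm)
    bwfp = 10.0 * et_blue / yield_t_ha
    gwfp = 10.0 * et_green / yield_t_ha
    return bwfp, gwfp

# Example: seasonal ETc of 450 mm, effective rainfall of 80 mm, yield of 6.5 t/ha
bwfp, gwfp = water_footprints(450.0, 80.0, 6.5)
print(round(bwfp, 1), round(gwfp, 1))  # -> 569.2 123.1
```

With annual rainfall in the Nile Delta rarely exceeding 150 mm, the blue term dominates, which is why climate-driven inputs carry most of the predictive signal.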
The weak performance of Sc3 (remote sensing indices) could be addressed, and the prediction accuracy of remote sensing technology increased, in three ways. First, higher-resolution remote sensing data such as Sentinel-1 and Sentinel-2 might capture more detailed field variability than coarse-resolution data. Second, increasing the temporal frequency of data acquisition would better capture phenological changes, especially during critical growth stages of wheat. Third, integrating additional RS indices could help: the Normalized Difference Temperature Index (NDTI) may aid in understanding plant stress, gross primary productivity (GPP) can relate directly to crop water use, and soil moisture and land surface temperature (LST) can explain the water dynamics of wheat.
Model performance can be improved in the future by ensuring that the dataset is of an appropriate size to obtain accurate and reliable results. However, this solution is not readily available in developing countries such as Egypt, owing to the scarcity or outright absence of data. This constraint prompted the research team to try several models and several hybridization methods in order to obtain the most accurate result possible, with the highest performance evaluation, under conditions of data shortage or unavailability.

4.2. Comparison with Previous Research

Abdel-Hameed, Abuarab [24] employed four different machine learning models, namely support vector regression (SVR), a random forest (RF), extreme gradient boost (XGB), and an artificial neural network (ANN), in their study on potato governorates in the Nile Delta of Egypt. The research focused on predicting the potato BWFP during the period of 1990–2016 across three specific governorates. Six scenarios of input variables were tested to assess the weight of each variable in the applied models. The findings revealed that Sc5 with the XGB and ANN models provided the most promising results for BWFP prediction in the arid region, based on key data such as the vapor pressure deficit, precipitation, solar radiation, and the crop coefficient. Following Sc5, Sc1 emerged as the next best scenario for predicting BWFP based on climate, crop, and remote sensing parameters.
Wang, Yan [28] conducted a study evaluating the green water footprint (GWFP) and blue water footprint (BWFP) of wheat and maize in the Chinese Baojixia Irrigation District (BID). The researchers utilized a remote sensing-based support vector machine (SVM) method to map the spatial distribution of wheat and maize fields. Subsequently, the partial least squares regression (PLSR) method was employed to construct a yield estimation model using multi-temporal remote sensing data. Additionally, a remote-sensing-based water balance assessment tool (RWBAT) model was developed to quantify water consumption. The study results indicated that the average BWFP and GWFP for wheat in BID were 0.525 and 0.120 m3 kg−1, respectively. In contrast, the present research found BWFP and GWFP values of 23.70 m3 kg−1 and 6.88 m3 kg−1, respectively. Furthermore, Elbeltagi, Azad [73] utilized various machine learning kernels to estimate the water footprint of maize in the Nile Delta, Egypt. The study compared four kernels of Gaussian process models and found that the Pearson universal function (PUK) kernel outperformed the others in predicting the blue water footprint (WF), followed by the polynomial kernel. The research also highlighted that model 7, which included parameters such as Tmax, Tmin, Tave, WS, sunshine hours (SHs), VPD, and SR, exhibited good performance with the PUK kernel. Overall, the study outcomes emphasized that the scenario incorporating all climate, crop, and remote sensing parameters (Sc1) demonstrated superior performance in predicting the BWFP and GWFP of wheat.
Arshad, Kazmi [74] conducted a study in which they applied two nonlinear machine learning algorithms, namely random forest (RF) and support vector machines (SVMs), along with a linear model known as LASSO, for predicting wheat yields. Their aim was to identify the optimal combination of predictors and machine learning algorithms under two distinct scenarios. The results revealed that, in Scenario 1, the RF regression model, using a combination of predictors including GNDVI, Tmax, Tmin, Rn, RH, and WS, outperformed other models with an R2 value of 0.71 and an RMSE of 2.365. Similarly, in Sc2, the RF regression model surpassed the SVM, achieving the highest performance with an R2 of 0.78 and the lowest RMSE of 2.07, followed by another combination of predictors (GNDVI, SPEI, RH, and WS) with an R2 value of 0.75. Interestingly, the linear LASSO model demonstrated comparable performance to RF, with R2 values ranging from 0.77 to 0.73 in the two scenarios.
In a similar vein, Li, Han [75] sought to address the limitations of existing modeling frameworks by integrating the dynamic linear model (DLM) and random forest machine learning model (RF) with nine global gridded crop models (GGCMs) to enhance projections and alleviate uncertainties in maize and soybean yield forecasts. For maize, the combined GGCM + RF models elevated the R2 values from the range of 0.15–0.61 to 0.64–0.77 while reducing the RMSE from approximately 0.20–0.50 to 0.13–0.17 compared to using a GGCM alone. In the case of soybean, these integrated models increased the R-squared values from 0.37–0.70 to 0.54–0.70 and decreased the RMSE from 0.17–0.35 to 0.17–0.20 when compared to utilizing a GGCM solely.
Furthermore, Kashka, Sarvestani [76] employed two distinct artificial intelligence (AI) methods, namely the ANFIS–FCM algorithm as a novel computational approach and an artificial neural network (ANN) as a conventional method, to predict the environmental impacts of soybean production under various scenarios (such as soybean cultivation after rapeseed (R-S), wheat (W-S), and fallow (F-S)). The findings indicated that the ANFIS–FCM algorithm emerged as the superior predictive model for environmental indicators related to soybean cultivation across all scenarios in comparison to the ANN. The RMSE values obtained from the ANFIS–FCM model were consistently lower than those derived from the ANN model for all environmental indicators. Moreover, the R2 values for the ANFIS–FCM and ANN algorithms ranged from 0.9967 to 0.9989 and from 0.9269 to 0.9870, respectively.
When comparing the study outcomes for predicted wheat BWFP, the lowest and highest RMSE values were 0.33 under the XGB-LASSO hybrid model with Sc4 versus 6.30 under the RF model with Sc2, respectively, indicating that the hybrid models reduced the RMSE by approximately 94.76%. Similarly, for GWFP, the lowest and highest RMSE values were 0.04 under the CAT-LASSO hybrid model with Sc5 and 1.06 under RF-LASSO with Sc3 (remote sensing indices), supporting the efficacy of hybrid models in reducing the RMSE by about 96.22%. Furthermore, the highest R2 value for predicted wheat BWFP was 1.0 under XGB-LASSO and Sc4, while the lowest R2 value of 0.16 was detected under LASSO and Sc3, illustrating that the adoption of hybrid ML models enhanced the R2 values by 525.0%. In contrast, the top R2 value for predicted wheat GWFP of 1.0 was achieved under RF-LASSO and CAT-LASSO for all scenarios except Sc3, while the lowest R2 value of 0.01 was identified under LASSO and Sc3, showcasing a 9900% improvement in R2 values for GWFP through the application of hybrid ML models.
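The relative changes quoted above can be verified with a quick calculation (the computed figures may differ in the last decimal from the rounded values in the text):

```python
# Quick arithmetic check of the relative-change figures quoted above.
def pct_reduction(worst, best):
    """Percentage reduction from the worst (largest) to the best value."""
    return 100.0 * (worst - best) / worst

def pct_improvement(low, high):
    """Percentage improvement from the lowest to the highest value."""
    return 100.0 * (high - low) / low

print(round(pct_reduction(6.30, 0.33), 2))   # BWFP RMSE reduction: 94.76
print(round(pct_reduction(1.06, 0.04), 2))   # GWFP RMSE reduction: 96.23
print(round(pct_improvement(0.16, 1.0), 1))  # BWFP R2 improvement: 525.0
print(round(pct_improvement(0.01, 1.0), 1))  # GWFP R2 improvement: 9900.0
```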
Azzam, Zhang [66] employed an artificial neural network (ANN), a support vector machine (SVM), random forest (RF), and k-nearest neighbor (KNN) in their research to predict green water evapotranspiration (GWET) and blue water evapotranspiration (BWET). Among these models, the random forest (RF) model exhibited the highest performance in estimating BWET, with a coefficient of determination (R2) of 0.96, a mean inter-annual (MIA) of 0.91, a root mean square error (RMSE) of 10.77 mm month−1, Nash–Sutcliffe efficiency (NSE) of 0.92, and a mean absolute error (MAE) of 6.84 mm month−1. Additionally, the RF model, except for the Pre variable, demonstrated acceptable simulation results (0.3 ≤ NSE < 0.6), while all other machine learning algorithms exhibited poor simulation performance (NSE < 0.3). Upon analyzing the results obtained for predicted wheat BWFP, it was found that the RF model and Sc2 (climate parameters) produced the lowest NSE value of 0.05, which was deemed unsatisfactory. Conversely, the XGB-LASSO hybrid model and Sc4 (Peeff, Tmax, Tmin, and SA) yielded the highest NSE value of 0.996, followed by 0.988 achieved using the XGB-CAT hybrid model and Sc1, both of which were rated as very good. In the case of predicting wheat GWFP, the XGB model and Sc3 resulted in the lowest NSE value of 0.010, falling into the unsatisfactory category. Conversely, the RF-LASSO hybrid model and Sc4 obtained the highest NSE value of 0.996, while the CAT-LASSO hybrid model and Sc4 achieved a value of 0.995, both rated as very good.
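The NSE ratings referenced above follow the usual definition, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², with values near 1 rated very good and low values unsatisfactory under the commonly used performance guidelines. A minimal implementation (with illustrative numbers, not study data):

```python
# Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2).
# NSE = 1 is a perfect match; low or negative values indicate the model is
# no better than predicting the observed mean.
def nse(observed, simulated):
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

obs = [2.1, 2.4, 2.0, 2.6, 2.3]  # illustrative observed values
sim = [2.0, 2.5, 2.1, 2.5, 2.3]  # illustrative simulated values
print(round(nse(obs, sim), 3))   # -> 0.825
```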
Mali, Shirsath [77] conducted a study to quantify the blue water footprint (BWFP) and green water footprint (GWFP) of major cereal crops in India by utilizing high-resolution soil and climatic datasets. This was carried out to evaluate the impact of climate change on the water footprint (WF). The researchers generated multi-model ensemble climate change scenarios by employing the hybrid-delta ensemble method for Representative Concentration Pathways (RCPs) 4.5, RCP 6.0, and future timeframes of the 2030s and 2050s. The results indicated that the percentage share of blue water in the total WF of paddy, wheat (both in autumn and rainy seasons), and maize (in spring/winter) stood at 18.0%, 78.6%, 95.0%, and 84.8%, respectively. Notably, autumn/rainy season crops exhibited a higher proportion of green water (ranging from 82.0% to 100.0%), emphasizing their substantial reliance on monsoon precipitation to fulfill their water needs. Conversely, the analysis revealed that, for wheat crops, the blue water component accounted for the highest proportion in the total WF, specifically 77.5% for wheat BWFP, while the green water share recorded the lowest value at 22.5% for wheat GWFP. This disparity can be attributed to Egypt’s arid climate, where annual rainfall rarely surpasses 150 mm, necessitating heavy irrigation even for winter crops like wheat. In contrast, India, being a predominantly rainy country, primarily depends on rainfall to satisfy crop water requirements, as evidenced by the substantial portion of 82–100% attributed to the green water footprint in the total water footprint.
Jiang, Zhang [78] introduced a novel classification model for assessing stored wheat quality, leveraging the evidence reasoning rule and stacking ensemble learning (ER-Stacking). The experimental outcomes demonstrated that the ER-Stacking ensemble model achieved commendable performance metrics, with an accuracy of 88.1%, a precision of 88.05%, a recall of 89.31%, and an F1-score of 88.4%. On the other hand, the present research revealed that the accuracy of the stacked ensemble model in predicting wheat BWFP and GWFP reached 100% across various scenarios, except in Sc3 (remote sensing indices), where slight declines of 5.16% and 4.21% were observed for BWFP and GWFP, respectively.
Li, Wang [79] proposed an innovative stacking technique aimed at integrating information from multiple growth stages to enhance the predictive capability of the winter wheat grain yield (GY) model. The stacking approach yielded more consistent information and improved prediction accuracy compared to individual growth-stage results, as evidenced by an R2 value of 0.74. During the model validation phase, the R2 values saw significant enhancements of 236%, 51%, 27.6%, and 12.1%, respectively. A comparison of research outcomes illustrated that the stacked ensemble model achieved its highest and lowest R2 values in predicting wheat BWFP under Sc4 (incorporating Peeff, Tmax, Tmin, and SA) and Sc3 (remote sensing indices), at 0.98 and 0.33, respectively. Similarly, for forecasting wheat GWFP, the highest and lowest R2 values were 0.99 and 0.02 under Sc4 and Sc3, respectively.
Duan, Yang [80] introduced a novel approach in their research, presenting a stacking ensemble learning algorithm utilizing time series data for the classification of winter wheat phenology. This method involved the integration of various machine learning models such as random forest (RF), support vector machine (SVM), K-nearest neighbor (K-NN), naive Bayes (NB), and BP neural network (BP). The outcomes of the experiments demonstrated that the stacking ensemble-learning algorithm outperformed individual models, achieving an impressive overall recognition accuracy of 81.40%. This finding highlights the efficacy and potential application of the proposed approach in identifying winter wheat phenology. However, the results presented a contrast to our initial expectations, as the stacking ensemble models did not surpass the performance of the individual models despite exhibiting high accuracy values and coefficients of determination. The limitations in achieving superior results through the ensemble models can be attributed to the constrained size of the available datasets, influencing the model’s ability to generalize effectively.

5. Conclusions

This investigation evaluated single, hybrid, and stacked ensemble machine learning models for predicting the BWFP and GWFP of wheat. Several key findings emerged, highlighting variability in the optimal ML model depending on factors such as the crop type, input variable combinations, water supply conditions, spatial scope, and regional variations. Notably, the XGB-CAT-LASSO hybrid model combined with Scenario 5 demonstrated the highest accuracy in predicting wheat BWFP. Hybrid machine learning models effectively predicted both wheat BWFP and GWFP, with slightly superior accuracy noted for GWFP. Despite these variations, all models delivered satisfactory results, with the exception of Scenario 3 for both BWFP and GWFP.
Furthermore, the analysis of RMSE highlighted the impact of hybrid models in reducing prediction errors. RMSE values varied across different scenarios and models, with the lowest and highest values observed under specific combinations. Similarly, the R2 values demonstrated the effectiveness of hybrid ML models in improving predictive accuracy. Differences in R2 values between different scenarios and models highlighted the importance of selecting appropriate base and meta-models when employing stacking ensemble techniques. While the proposed framework exhibited promising statistical evaluation metrics, limitations were identified in the model construction process. These complexities stem from the uncertainties regarding the selection of base models and the configuration of meta-model layers, as well as the lack of definitive guidelines on the optimal number of base models and layers for peak performance.

Author Contributions

A.A.L. and M.E.A. collected and analyzed the research data, wrote the original draft preparation, and generated the figures in the main manuscript. A.A.L. and A.M. designed and applied the machine learning models of the research. E.F. applied the remote sensing part. A.A.L., M.E.A., E.F., B.D., R.K., A.A.A. and A.M. read and edited the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors declare that no funds, grants, or other support was received during the preparation of this manuscript.

Data Availability Statement

All machine learning algorithms and model codes used in this research are freely available online, as are the satellite images of the study area obtained from Google Earth; all data included in the research will be made available upon request. The data from previous studies and research were obtained through the Cairo University platform, which provides research information on a regular basis, and through the International Centre for Advanced Mediterranean Agronomic Studies (CIHEAM), Bari, Italy.

Acknowledgments

The authors would like to express their thanks to the Faculty of Agriculture, Cairo University, and the Faculty of Higher African Studies for their support in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Random forest (RF), extreme gradient boosting (XGB), least absolute shrinkage and selection operator (LASSO), categorical boosting (CatBoost), blue water footprint (BWFP), green water footprint (GWFP), Nash–Sutcliffe model efficiency coefficient (NSE), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), coefficient of determination (R2), uncertainty at 95% confidence (U95).

References

  1. Elkholy, M. Assessment of water resources in Egypt: Current status and future plan. Groundw. Egypt’s Deserts 2021, 395–423. [Google Scholar]
  2. Alene, A.; Yibeltal, M.; Abera, A.; Andualem, T.G.; Lee, S.S. Identifying rainwater harvesting sites using integrated GIS and a multi-criteria evaluation approach in semi-arid areas of Ethiopia. Appl. Water Sci. 2022, 12, 238. [Google Scholar] [CrossRef]
  3. Gado, T.A. Statistical behavior of rainfall in Egypt. In Flash Floods in Egypt; Springer: Cham, Switzerland, 2020; pp. 13–30. [Google Scholar]
  4. Gabr, M.E.; El-Ghandour, H.A.; Elabd, S.M. Prospective of the utilization of rainfall in coastal regions in the context of climatic changes: Case study of Egypt. Appl. Water Sci. 2023, 13, 19. [Google Scholar] [CrossRef]
  5. Hoekstra, A.Y. Human appropriation of natural capital: A comparison of ecological footprint and water footprint analysis. Ecol. Econ. 2009, 68, 1963–1974. [Google Scholar] [CrossRef]
  6. Hoekstra, A.Y. Water Neutral: Reducing and Offsetting the Impacts of Water Footprints, Value of Water Research Report; Series No. 28; UNESCO-IHE Institute for Water Education: Delft, The Netherlands, 2008; Available online: http://www.waterfootprint.org/Reports/Report28-WaterNeutral.pdf (accessed on 3 November 2024).
  7. Elbeltagi, A.; Deng, J.; Wang, K.; Hong, Y. Crop Water footprint estimation and modeling using an artificial neural network approach in the Nile Delta, Egypt. Agric. Water Manag. 2020, 235, 106080. [Google Scholar] [CrossRef]
  8. Mekonnen, M.; Hoekstra, A.Y. National Water Footprint Accounts: The Green, Blue and Grey Water Footprint of Production and Consumption; Unesco-IHE Institute for Water Education: Delft, The Netherlands, 2011; Volume 2: Appendices. [Google Scholar]
  9. Huang, H.; Han, Y.; Jia, D. Impact of climate change on the blue water footprint of agriculture on a regional scale. Water Supply 2019, 19, 52–59. [Google Scholar] [CrossRef]
  10. Li, Z.; Wang, W.; Ji, X.; Wu, P.; Zhuo, L. Machine learning modeling of water footprint in crop production distinguishing water supply and irrigation method scenarios. J. Hydrol. 2023, 625, 130171. [Google Scholar] [CrossRef]
  11. Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 2000, 40, 139–157. [Google Scholar] [CrossRef]
  12. Vilalta, R.; Drissi, Y. A perspective view and survey of meta-learning. Artif. Intell. Rev. 2002, 18, 77–95. [Google Scholar] [CrossRef]
  13. Hung, B.P.; Naved, B.A.; Nyberg, E.L.; Dias, M.; Holmes, C.A.; Elisseeff, J.H.; Dorafshar, A.H.; Grayson, W.L. Three-dimensional printing of bone extracellular matrix for craniofacial regeneration. ACS Biomater. Sci. Eng. 2016, 2, 1806–1816. [Google Scholar] [CrossRef]
  14. Vidyarthi, S.K.; Tiwari, R.; Singh, S.K. Stack ensembled model to measure size and mass of almond kernels. J. Food Process Eng. 2020, 43, e13374. [Google Scholar] [CrossRef]
  15. Martin, J.; Saez, J.A.; Corchado, E. On the suitability of stacking-based ensembles in smart agriculture for evapotranspiration prediction. Appl. Soft Comput. 2021, 108, 107509. [Google Scholar] [CrossRef]
  16. Aly, M.S.; Darwish, S.M.; Aly, A.A. High performance machine learning approach for reference evapotranspiration estimation. Stoch. Environ. Res. Risk Assess. 2024, 38, 689–713. [Google Scholar] [CrossRef]
  17. Madugundu, R.; Al-Gaadi, K.A.; Tola, E.; Hassaballa, A.A.; Kayad, A.G. Utilization of Landsat-8 data for the estimation of carrot and maize crop water footprint under the arid climate of Saudi Arabia. PLoS ONE 2018, 13, e0192830. [Google Scholar] [CrossRef] [PubMed]
  18. Massari, C.; Modanesi, S.; Dari, J.; Gruber, A.; De Lannoy, G.J.; Girotto, M.; Quintana-Seguí, P.; Le Page, M.; Jarlan, L.; Zribi, M. A review of irrigation information retrievals from space and their utility for users. Remote Sens. 2021, 13, 4112. [Google Scholar] [CrossRef]
  19. Al-Gaadi, K.A.; Madugundu, R.; Tola, E.; El-Hendawy, S.; Marey, S. Satellite-based determination of the water footprint of carrots and onions grown in the arid climate of Saudi Arabia. Remote Sens. 2022, 14, 5962. [Google Scholar] [CrossRef]
  20. Nasr, P.; Sewilam, H. Investigating fertilizer drawn forward osmosis process for groundwater desalination for irrigation in Egypt. Desalination Water Treat. 2016, 57, 26932–26942. [Google Scholar] [CrossRef]
  21. FAO. World Food and Agriculture Statistical Yearbook 2022; FAO: Rome, Italy, 2022. [Google Scholar]
  22. Abdalla, A.; Stellmacher, T.; Becker, M. Trends and prospects of change in wheat self-sufficiency in Egypt. Agriculture 2022, 13, 7. [Google Scholar] [CrossRef]
  23. Hachisuca, A.M.M.; Abdala, M.C.; de Souza, E.G.; Rodrigues, M.; Ganascini, D.; Bazzi, C.L. Growing degree-hours and degree-days in two management zones for each phenological stage of wheat (Triticum aestivum L.). Int. J. Biometeorol. 2023, 67, 1169–1183. [Google Scholar] [CrossRef]
  24. Abdel-Hameed, A.M.; Abuarab, M.; Al-Ansari, N.; Sayed, H.; Kassem, M.A.; Elbeltagi, A.; Mokhtar, A. Estimation of Potato Water Footprint Using Machine Learning Algorithm Models in Arid Regions. Potato Res. 2024, 67, 1–20. [Google Scholar] [CrossRef]
  25. Higazy, N.; Merabet, S.; Al-Sayegh, S.; Hosseini, H.; Zarif, L.; Mohamed, M.S.; Khalifa, R.; Saleh, A.; Wahib, S.; Alabsi, R.; et al. Water Footprint Assessment and Virtual Water Trade in the Globally Most Water-Stressed Country, Qatar. Water 2024, 16, 1185. [Google Scholar] [CrossRef]
  26. Ma, X.; Liu, C.; Niu, Y.; Zhang, Y. Spatio-temporal pattern and prediction of agricultural blue and green water footprint scarcity index in the lower reaches of the Yellow River Basin. J. Clean. Prod. 2024, 437, 140691. [Google Scholar] [CrossRef]
  27. Mialyk, O.; Booij, M.J.; Schyns, J.F.; Berger, M. Evolution of global water footprints of crop production in 1990–2019. Environ. Res. Lett. 2024, 19, 114015. [Google Scholar] [CrossRef]
  28. Wang, L.; Yan, C.; Zhang, W. Water Footprint Assessment of Agricultural Crop Productions in the Dry Farming Region, Shanxi Province, Northern China. Agronomy 2024, 14, 546. [Google Scholar] [CrossRef]
  29. Rouse, J.H.; Shaw, J.A.; Lawrence, R.L.; Lewicki, J.L.; Dobeck, L.M.; Repasky, K.S.; Spangler, L.H. Multi-spectral imaging of vegetation for detecting CO2 leaking from underground. Environ. Earth Sci. 2010, 60, 313–323. [Google Scholar] [CrossRef]
  30. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  31. Burnett, M.; Chen, D. Urban Heat Island Footprint Effects on Bio-Productive Rural Land Covers Surrounding A Low Density Urban Center. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, 43, 539–550. [Google Scholar] [CrossRef]
  32. Taloor, A.K.; Manhas, D.S.; Kothyari, G.C. Retrieval of land surface temperature, normalized difference moisture index, normalized difference water index of the Ravi basin using Landsat data. Appl. Comput. Geosci. 2021, 9, 100051. [Google Scholar] [CrossRef]
  33. Wang, Y.; Zhang, Z.; Feng, L.; Du, Q.; Runge, T. Combining multi-source data and machine learning approaches to predict winter wheat yield in the conterminous United States. Remote Sens. 2020, 12, 1232. [Google Scholar] [CrossRef]
  34. Li, Z.; Gu, X.; Dixon, P.; He, Y. Applicability of Land surface Temperature (LST) estimates from AVHRR satellite image composites in northern Canada. Prairie Perspect. 2008, 11, 119–130. [Google Scholar]
  35. Latif, M.S. Land surface temperature retrieval of Landsat-8 data using split window algorithm-A case study of Ranchi district. Int. J. Eng. Dev. Res. 2014, 2, 2840–3849. [Google Scholar]
  36. Rozenstein, O.; Qin, Z.; Derimian, Y.; Karnieli, A. Derivation of land surface temperature for Landsat-8 TIRS using a split window algorithm. Sensors 2014, 14, 5768–5780. [Google Scholar] [CrossRef] [PubMed]
  37. Mokhtar, A.; He, H.; Zhao, H.; Keo, S.; Bai, C.; Zhang, C.; Ma, Y.; Ibrahim, A.; Li, Y.; Li, F. Risks to water resources and development of a management strategy in the river basins of the Hengduan Mountains, Southwest China. Environ. Sci. Water Res. Technol. 2020, 6, 656–678. [Google Scholar] [CrossRef]
  38. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop evapotranspiration-guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56. Fao Rome 1998, 300, D05109. [Google Scholar]
  39. Li, X.; Chen, D.; Cao, X.; Luo, Z.; Webber, M. Assessing the components of, and factors influencing, paddy rice water footprint in China. Agric. Water Manag. 2020, 229, 105939. [Google Scholar] [CrossRef]
  40. Muratoglu, A.; Bilgen, G.K.; Angin, I.; Kodal, S. Performance analyses of effective rainfall estimation methods for accurate quantification of agricultural water footprint. Water Res. 2023, 238, 120011. [Google Scholar] [CrossRef]
  41. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  42. Mokhtar, A.; Hamed, M.M.; He, H.; Salem, A.; Hendy, Z.M. Egypt’s water future: AI predicts evapotranspiration shifts across climate zones. J. Hydrol. Reg. Stud. 2024, 56, 101968. [Google Scholar] [CrossRef]
  43. Kursa, M.B.; Jankowski, A.; Rudnicki, W.R. Boruta–a system for feature selection. Fundam. Informaticae 2010, 101, 271–285. [Google Scholar] [CrossRef]
  44. Ferreira, L.B.; da Cunha, F.F. Multi-step ahead forecasting of daily reference evapotranspiration using deep learning. Comput. Electron. Agric. 2020, 178, 105728. [Google Scholar] [CrossRef]
  45. Rudnicki, W.R.; Kierczak, M.; Koronacki, J.; Komorowski, J. A statistical method for determining importance of variables in an information system. In Rough Sets and Current Trends in Computing: 5th International Conference, RSCTC 2006, Kobe, Japan, 6–8 November 2006, Proceedings; Springer: Berlin/Heidelberg, Germany, 2006; pp. 557–566. [Google Scholar]
  46. Kursa, M.B.; Rudnicki, W.R. Feature selection with the boruta package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar] [CrossRef]
  47. Hur, J.-H.; Ihm, S.-Y.; Park, Y.-H. A variable impacts measurement in random forest for mobile cloud computing. Wirel. Commun. Mob. Comput. 2017, 2017, 6817627. [Google Scholar] [CrossRef]
  48. Strobl, C.; Boulesteix, A.-L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional variable importance for random forests. BMC Bioinform. 2008, 9, 307. [Google Scholar] [CrossRef] [PubMed]
  49. Li, Q.; Qiu, Z.; Zhang, X. Eigenvalue analysis of structures with interval parameters using the second-order Taylor series expansion and the DCA for QB. Appl. Math. Model. 2017, 49, 680–690. [Google Scholar] [CrossRef]
  50. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649. [Google Scholar]
  51. Dong, L.; Zeng, W.; Wu, L.; Lei, G.; Chen, H.; Srivastava, A.K.; Gaiser, T. Estimating the pan evaporation in northwest china by coupling catboost with bat algorithm. Water 2021, 13, 256. [Google Scholar] [CrossRef]
  52. Toqeer, A.; Defourny, P. Developing Wheat Crop Yield Estimation Method for Spain from Remotely Sensed Metrics Using Artificial Intelligence. Master’s Thesis, University of Tartu, Tartu, Estonia, November 2023. [Google Scholar]
  53. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  54. Shafiee, S.; Lied, L.M.; Burud, I.; Dieseth, J.A.; Alsheikh, M.; Lillemo, M. Sequential forward selection and support vector regression in comparison to LASSO regression for spring wheat yield prediction based on UAV imagery. Comput. Electron. Agric. 2021, 183, 106036. [Google Scholar] [CrossRef]
  55. Zhang, S.; Wu, J.; Jia, Y.; Wang, Y.-G.; Zhang, Y.; Duan, Q. A temporal LASSO regression model for the emergency forecasting of the suspended sediment concentrations in coastal oceans: Accuracy and interpretability. Eng. Appl. Artif. Intell. 2021, 100, 104206. [Google Scholar] [CrossRef]
Figure 1. Geographical location of the study area and the meteorological stations.
Figure 2. Workflow summarizing input data, applied machine learning models, scenarios, and expected output. EVI: Enhanced Vegetation Index; NDVI: Normalized Difference Vegetation Index; SAVI: Soil-Adjusted Vegetation Index; NDMI: Normalized Difference Moisture Index; GCI: Green Chlorophyll Index; LST: land surface temperature; BWFP: blue water footprint; GWFP: green water footprint; Sc: scenario.
Figure 3. Stacking ensemble-learning workflow.
Figure 4. Stacking ensemble based on a cross-validation of all feature subsets.
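The cross-validated stacking workflow of Figures 3 and 4 (base learners generate out-of-fold predictions, which become the training features of a meta-learner) can be sketched as follows. This is a minimal illustration with scikit-learn's `StackingRegressor`, using RF and LASSO base learners, a linear meta-learner, and synthetic data standing in for the study's climate/crop features; the estimator choices and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the feature matrix and a WFP-like target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=200)

# Base learners are cross-validated internally: their out-of-fold
# predictions form the meta-learner's training set (Figure 4 workflow).
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("lasso", Lasso(alpha=0.01)),
    ],
    final_estimator=LinearRegression(),
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))  # training R^2 of the stacked model
```

After fitting, the base learners are refit on the full training set, so `predict` chains full-data base predictions through the meta-learner.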
Figure 5. The climatic parameters and reference evapotranspiration from 2013 to 2022 in the study area: (A) Tmax and Tmin, (B) relative humidity and wind speed, and (C) effective precipitation and reference evapotranspiration, for both governorates and the months of the wheat growing season.
Figure 6. The yield and evapotranspiration of wheat for the time series from 2013 to 2022 for both governorates and months of the wheat growing season.
Figure 7. The GWFP and BWFP of wheat for the time series from 2013 to 2022 for both governorates and months of the wheat growing season.
Figure 8. Bar charts to compare the models in each scenario separately, based on the U95 and accuracy for BWFP prediction.
Figure 9. Bar charts to compare the models in each scenario separately, based on U95 and accuracy for GWFP prediction.
Figure 10. Flower plots of the correlations between actual and predicted BWFP and GWFP values, based on the R2 parameter.
Figure 11. Radar charts to compare the models in each scenario separately, based on the RMSE criterion for the BWFP and GWFP of wheat.
Figure 12. Column charts to compare the models in each scenario separately, based on the SI criterion for the (A) BWFP and (B) GWFP of wheat.
Figure 13. Correlation matrix between climate parameters, crop parameters, and remote sensing indices with BWFP and GWFP for wheat under EL-Sharkia and EL-Beheira governorates.
Figure 14. Box plots illustrating the distribution of the BWFP and GWFP estimation errors for the best model and scenarios in the testing phase.
Figure 15. Relative contributions of the 13 input variables to the green water footprint.
Table 1. Vegetation indices and remote sensing data description.
| Index | Platform | Spatial Resolution (m) | Temporal Resolution (d) | Data Level | Years |
|---|---|---|---|---|---|
| EVI | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
| NDVI | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
| SAVI | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
| NDMI | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
| GCI | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
| LST | Landsat 7 ETM+ sensor / Landsat 8 OLI sensor | 30 | 2 | L2 | 2013–2022 |
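The vegetation and moisture indices in Table 1 are computed from Landsat surface-reflectance bands using their standard formulas. A minimal sketch (the single-pixel band values are illustrative, and SAVI uses the common soil-adjustment factor L = 0.5):

```python
def vegetation_indices(blue, green, red, nir, swir1):
    """Standard index formulas applied to surface-reflectance values."""
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    savi = 1.5 * (nir - red) / (nir + red + 0.5)   # L = 0.5
    ndmi = (nir - swir1) / (nir + swir1)
    gci = nir / green - 1.0
    return {"NDVI": ndvi, "EVI": evi, "SAVI": savi, "NDMI": ndmi, "GCI": gci}

# Plausible reflectance values for a vegetated pixel
idx = vegetation_indices(blue=0.05, green=0.08, red=0.10, nir=0.40, swir1=0.20)
print({k: round(v, 3) for k, v in idx.items()})
```

The same function works elementwise on NumPy band arrays, so it can be applied to whole scenes without modification.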
Table 2. Summary of the scenarios applied in this study.
| Scenario | Input Parameters |
|---|---|
| Sc1 | All parameters: climate (Peeff, Tmax, Tmin, RH, Tave, Rn, WS), crop (Kcadj, SA), and remote sensing indices (GCI, EVI, NDVI, SAVI, NDMI, LST) |
| Sc2 | Climate parameters (Peeff, Tmax, Tmin, RH, Tave, Rn, WS) |
| Sc3 | Remote sensing indices (GCI, EVI, NDVI, SAVI, NDMI, LST) |
| Sc4 | Peeff, Tmax, Tmin, SA |
| Sc5 | Peeff, Tmax |
Table 3. The range of NSE and SI.
| NSE | Classification | SI | Classification |
|---|---|---|---|
| NSE = 1 | Perfect | SI < 0.1 | Excellent |
| NSE > 0.75 | Very good | 0.1 < SI < 0.2 | Good |
| 0.74 > NSE > 0.64 | Good | 0.2 < SI < 0.3 | Fair |
| 0.64 > NSE > 0.5 | Satisfactory | SI > 0.3 | Poor |
| NSE < 0.5 | Unsatisfactory | | |
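A minimal reading of Table 3's thresholds as code (the table leaves the exact interval bounds slightly ambiguous, e.g. NSE between 0.74 and 0.75; the sketch below treats each lower bound as exclusive):

```python
def classify_nse(nse):
    """Map a Nash-Sutcliffe efficiency value to its Table 3 class."""
    if nse == 1:
        return "Perfect"
    if nse > 0.75:
        return "Very good"
    if nse > 0.64:
        return "Good"
    if nse > 0.5:
        return "Satisfactory"
    return "Unsatisfactory"

def classify_si(si):
    """Map a scatter index value to its Table 3 class."""
    if si < 0.1:
        return "Excellent"
    if si < 0.2:
        return "Good"
    if si < 0.3:
        return "Fair"
    return "Poor"

print(classify_nse(0.982), classify_si(0.15))  # → Very good Good
```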
Table 4. Performance statistics of ML models applied to the five distinct climate and remote sensing variable scenarios.
| Model | Index | GWFP Sc1 | GWFP Sc2 | GWFP Sc3 | GWFP Sc4 | GWFP Sc5 | BWFP Sc1 | BWFP Sc2 | BWFP Sc3 | BWFP Sc4 | BWFP Sc5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RF | MBE | 0.026 | 0.050 | −0.066 | 0.031 | 0.037 | 0.440 | 3.518 | −0.373 | 0.510 | 1.117 |
| | NSE | 0.981 | 0.982 | 0.012 | 0.975 | 0.982 | 0.919 | 0.051 | 0.336 | 0.956 | 0.733 |
| | MSE | 0.019 | 0.019 | 0.936 | 0.025 | 0.018 | 3.373 | 39.715 | 19.660 | 1.827 | 11.198 |
| | MAE | 0.065 | 0.065 | 0.678 | 0.070 | 0.073 | 1.191 | 4.651 | 3.006 | 0.860 | 2.702 |
| XGB | MBE | 0.100 | 0.120 | −0.121 | 0.117 | 0.111 | 0.498 | 3.354 | −0.323 | 0.305 | 1.185 |
| | NSE | 0.994 | 0.993 | 0.010 | 0.994 | 0.994 | 0.904 | 0.067 | 0.134 | 0.970 | 0.741 |
| | MSE | 0.065 | 0.064 | 0.778 | 0.074 | 0.066 | 4.010 | 39.074 | 25.640 | 1.276 | 10.853 |
| | MAE | 0.137 | 0.135 | 0.665 | 0.138 | 0.137 | 1.413 | 4.812 | 3.655 | 0.635 | 2.139 |
| LASSO | MBE | 0.026 | 0.046 | −0.026 | 0.032 | 0.032 | 0.526 | 2.861 | −1.505 | 0.064 | 0.815 |
| | NSE | 0.994 | 0.993 | 0.010 | 0.994 | 0.994 | 0.580 | 0.470 | 0.082 | 0.545 | 0.458 |
| | MSE | 0.006 | 0.008 | 0.634 | 0.006 | 0.006 | 17.580 | 22.194 | 27.174 | 19.047 | 22.707 |
| | MAE | 0.041 | 0.046 | 0.569 | 0.043 | 0.043 | 3.595 | 3.591 | 4.454 | 3.736 | 4.184 |
| CatBoost | MBE | 0.081 | 0.066 | −0.073 | 0.032 | 0.048 | 0.883 | 4.487 | −0.229 | 1.396 | 1.839 |
| | NSE | 0.943 | 0.957 | 0.027 | 0.968 | 0.959 | 0.921 | 0.086 | 0.264 | 0.862 | 0.701 |
| | MSE | 0.058 | 0.044 | 0.922 | 0.033 | 0.042 | 3.304 | 38.279 | 21.811 | 5.790 | 12.514 |
| | MAE | 0.120 | 0.108 | 0.683 | 0.089 | 0.100 | 1.183 | 4.701 | 3.286 | 1.580 | 2.741 |
| XGB-RF | MBE | −0.012 | 0.001 | −0.108 | −0.004 | −0.006 | −0.113 | −0.290 | 0.626 | −0.146 | −0.302 |
| | NSE | 0.960 | 0.959 | 0.557 | 0.956 | 0.962 | 0.993 | 0.804 | 0.558 | 0.987 | 0.858 |
| | MSE | 0.037 | 0.039 | 0.533 | 0.042 | 0.036 | 0.198 | 5.811 | 10.976 | 0.386 | 4.193 |
| | MAE | 0.082 | 0.079 | 0.438 | 0.084 | 0.079 | 0.221 | 1.379 | 1.703 | 0.256 | 0.877 |
| XGB-CatBoost | MBE | −0.007 | 0.006 | −0.093 | −0.010 | −0.011 | −0.204 | −0.276 | 0.469 | −0.176 | −0.349 |
| | NSE | 0.951 | 0.946 | 0.554 | 0.941 | 0.949 | 0.988 | 0.786 | 0.616 | 0.987 | 0.887 |
| | MSE | 0.047 | 0.052 | 0.536 | 0.056 | 0.048 | 0.344 | 6.345 | 9.542 | 0.380 | 3.354 |
| | MAE | 0.093 | 0.088 | 0.419 | 0.104 | 0.087 | 0.389 | 1.436 | 1.403 | 0.330 | 0.900 |
| XGB-LASSO | MBE | −0.011 | −0.009 | −0.119 | −0.005 | −0.012 | −0.108 | −0.421 | 0.413 | −0.037 | −0.319 |
| | NSE | 0.991 | 0.992 | 0.537 | 0.991 | 0.992 | 0.982 | 0.798 | 0.638 | 0.996 | 0.894 |
| | MSE | 0.009 | 0.008 | 0.557 | 0.008 | 0.008 | 0.530 | 5.974 | 9.001 | 0.112 | 3.135 |
| | MAE | 0.050 | 0.049 | 0.439 | 0.047 | 0.049 | 0.390 | 1.584 | 1.410 | 0.182 | 0.909 |
| RF-LASSO | MBE | 0.013 | 0.015 | −0.122 | 0.004 | 0.008 | −0.027 | −0.641 | 0.180 | −0.093 | −0.516 |
| | NSE | 0.995 | 0.995 | 0.060 | 0.997 | 0.996 | 0.985 | 0.509 | 0.631 | 0.993 | 0.883 |
| | MSE | 0.005 | 0.005 | 1.131 | 0.003 | 0.003 | 0.449 | 14.542 | 9.179 | 0.221 | 3.459 |
| | MAE | 0.036 | 0.033 | 0.570 | 0.030 | 0.030 | 0.474 | 2.761 | 1.749 | 0.283 | 1.099 |
| CatBoost-RF | MBE | −0.003 | 0.015 | −0.183 | −0.027 | −0.001 | −0.052 | −0.482 | 0.465 | −0.014 | −0.394 |
| | NSE | 0.992 | 0.981 | 0.474 | 0.991 | 0.992 | 0.963 | 0.746 | 0.645 | 0.962 | 0.893 |
| | MSE | 0.003 | 0.008 | 0.633 | 0.004 | 0.003 | 1.101 | 7.522 | 8.810 | 1.120 | 3.180 |
| | MAE | 0.029 | 0.042 | 0.480 | 0.036 | 0.029 | 0.463 | 1.990 | 1.520 | 0.520 | 0.892 |
| CatBoost-LASSO | MBE | −0.034 | −0.002 | −0.287 | −0.014 | −0.005 | −0.068 | −0.557 | 0.388 | −0.027 | −0.353 |
| | NSE | 0.994 | 0.994 | 0.724 | 0.995 | 0.998 | 0.975 | 0.744 | 0.664 | 0.977 | 0.927 |
| | MSE | 0.006 | 0.006 | 0.332 | 0.004 | 0.002 | 0.732 | 7.593 | 8.349 | 0.673 | 2.166 |
| | MAE | 0.046 | 0.038 | 0.424 | 0.034 | 0.025 | 0.432 | 2.079 | 1.489 | 0.424 | 0.867 |
| XGB-RF-LASSO | MBE | 0.033 | 0.030 | −0.159 | 0.024 | 0.023 | 0.007 | −0.412 | 0.468 | −0.123 | −0.451 |
| | NSE | 0.973 | 0.975 | 0.594 | 0.977 | 0.983 | 0.993 | 0.807 | 0.780 | 0.990 | 0.913 |
| | MSE | 0.032 | 0.030 | 0.189 | 0.027 | 0.020 | 0.183 | 4.788 | 10.777 | 0.250 | 2.170 |
| | MAE | 0.043 | 0.042 | 0.330 | 0.051 | 0.041 | 0.208 | 1.465 | 1.577 | 0.161 | 0.719 |
| XGB-CatBoost-LASSO | MBE | −0.007 | 0.006 | −0.093 | −0.010 | −0.011 | −0.001 | −0.331 | 0.627 | −0.203 | −0.500 |
| | NSE | 0.951 | 0.946 | 0.554 | 0.941 | 0.949 | 0.986 | 0.836 | 0.806 | 0.983 | 0.930 |
| | MSE | 0.047 | 0.052 | 0.536 | 0.056 | 0.048 | 0.348 | 4.067 | 9.479 | 0.430 | 1.745 |
| | MAE | 0.093 | 0.088 | 0.419 | 0.104 | 0.087 | 0.339 | 1.294 | 1.386 | 0.336 | 0.835 |
| Stacked | MBE | 0.041 | 0.058 | −0.072 | 0.011 | 0.057 | 0.598 | 2.979 | −0.562 | 0.518 | 1.553 |
| | NSE | 0.980 | 0.973 | 0.012 | 0.985 | 0.980 | 0.952 | 0.193 | 0.318 | 0.962 | 0.757 |
| | MSE | 0.020 | 0.027 | 0.936 | 0.015 | 0.021 | 1.999 | 33.790 | 20.201 | 1.607 | 10.189 |
| | MAE | 0.074 | 0.078 | 0.680 | 0.058 | 0.075 | 0.944 | 4.273 | 3.244 | 0.759 | 2.088 |
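The error statistics reported in Table 4 follow their standard definitions: MBE = mean(P − O), MSE = mean((P − O)²), MAE = mean|P − O|, and NSE = 1 − Σ(O − P)² / Σ(O − Ō)², where O are observed and P predicted values. A small sketch with illustrative data:

```python
import numpy as np

def error_stats(obs, pred):
    """MBE, MSE, MAE, and Nash-Sutcliffe efficiency for paired series."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    resid = pred - obs
    return {
        "MBE": resid.mean(),
        "MSE": (resid ** 2).mean(),
        "MAE": np.abs(resid).mean(),
        "NSE": 1.0 - (resid ** 2).sum() / ((obs - obs.mean()) ** 2).sum(),
    }

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print({k: round(v, 3) for k, v in error_stats(obs, pred).items()})
```

For this toy series the residuals cancel (MBE ≈ 0) while MSE, MAE, and NSE still register the scatter, which is why Table 4 reports all four indices rather than bias alone.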

Share and Cite

MDPI and ACS Style

Lotfy, A.A.; Abuarab, M.E.; Farag, E.; Derardja, B.; Khadra, R.; Abdelmoneim, A.A.; Mokhtar, A. Forecasting Blue and Green Water Footprint of Wheat Based on Single, Hybrid, and Stacking Ensemble Machine Learning Algorithms Under Diverse Agro-Climatic Conditions in Nile Delta, Egypt. Remote Sens. 2024, 16, 4224. https://doi.org/10.3390/rs16224224

