1. Introduction
Estimation of reference evapotranspiration (ET0) is essential in hydrology. It is a crucial parameter because of its wide range of applications in hydrological studies, which use it to estimate crop water requirements, making irrigation scheduling possible, supporting crop yield, and enabling better planning and management of water resources [1]. Accurate estimation of ET0 has gained importance in agro-meteorological, hydrological, and water-balance studies. ET0 is an important variable in the interaction between vegetation, atmosphere, and soil. It also supports accurate quantification of cropland water consumption for planning [2] and effective irrigation [3].
Evapotranspiration is the overall loss of moisture (water) through evaporation from surfaces such as soil and transpiration from plants [4]. The moisture loss from a well-irrigated grassy surface is referred to as “reference evapotranspiration” (ET0) [5]. ET0 can be measured directly with in situ experimental techniques such as the lysimeter method, the Bowen ratio-energy balance method, or eddy covariance devices [6]. The author of [7] used a weighing lysimeter, which estimates ET0 directly from water gain and loss. This method gained significant importance among direct methods (eddy covariance system, Bowen ratio) and has been widely used in scientific studies [8]. However, the high capital, operating, and maintenance costs of these techniques may restrict their practical application, so indirect techniques are often the more practical option for quantification. One indirect strategy is to use empirical models, such as those based on temperature, radiation, mass transfer, and other variables. Nearly all empirical models determine ET0 rather than crop-specific ET, because determining ET for each crop is difficult. Instead, ET0 is first determined using an indirect method, and crop coefficients are then used to estimate the crop evapotranspiration (ETc) of each desired crop. Several attempts have been made to estimate ET0 accurately. The Penman–Monteith (FAO PM56) approach was developed by Allen in 1998, who validated it in a variety of climatic conditions. The United Nations Food and Agriculture Organization (FAO) recommends the FAO PM56 method as the primary reference approach for determining ET0 and validating other techniques [9,10,11]. However, many locations throughout the world do not have the full set of meteorological data needed to calculate ET0 using the FAO PM56 method. The substantial obstacles to estimating ET0 with the FAO PM56 approach are the lack of access to all required inputs, uncertainty in the reliability of climatic data, and the unavailability of climatic data for many locations [12,13].
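To make the data requirements of FAO PM56 concrete, the sketch below evaluates the daily Penman–Monteith form as it is commonly written in FAO Irrigation and Drainage Paper 56, and then scales the result with a crop coefficient to obtain ETc. The function names, argument names, sample inputs, and the crop coefficient of 1.15 are illustrative assumptions, not values taken from this study.

```python
def fao56_pm_et0(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith form.

    delta  : slope of the saturation vapour pressure curve (kPa/degC)
    rn     : net radiation at the crop surface (MJ/m2/day)
    g      : soil heat flux density (MJ/m2/day)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean daily air temperature at 2 m (degC)
    u2     : wind speed at 2 m (m/s)
    es, ea : saturation and actual vapour pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def crop_et(et0, kc):
    """Crop evapotranspiration ETc = Kc * ET0 (single crop coefficient)."""
    return kc * et0


# Illustrative inputs only (assumed, not measured at any station in this study).
et0 = fao56_pm_et0(delta=0.122, rn=13.28, g=0.14, gamma=0.0666,
                   t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
etc = crop_et(et0, kc=1.15)  # kc is a hypothetical crop coefficient
print(f"ET0 = {et0:.2f} mm/day, ETc = {etc:.2f} mm/day")
```

Every argument of `fao56_pm_et0` must be measured or derived from measurements (temperature, humidity, wind speed, radiation), which is why a missing sensor at a station is enough to block the calculation.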
Recent research [14,15] reproduced FAO PM56 ET0 using machine learning (ML) with a comprehensive set of climatic data, revealing the links and interrelationships among the variables. Similarly, a deep learning neural network model that used solar radiation (Rs) as its only predictor was compared with FAO PM56 for estimating ET0 [16]. In a semi-arid location, the authors of [14] reported that Rs was the most important meteorological variable in determining ET0. The variable Rs can be substituted with the number of sunny hours (n); however, this is not always a viable option. This is also supported by the authors of [17], who investigated the potential of a deep factorization machine, gradient boosting techniques, and three tree-based ML models for modeling ET0 in a daily time series. According to previous studies [18,19,20], sunny hours have a stronger relationship with net radiation (Rn) than any other meteorological variable. As a result, this study chose N as an alternative to Rs [21,22]. Net radiation has significant implications for the thermal characteristics of the Earth’s surface, which makes it a crucial variable in the study of land-surface processes and of global climate change. Rn is the difference between incoming and outgoing radiation (e.g., reflected shortwave radiation) at the Earth’s surface.
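One common way to link sunshine duration to radiation is the Angström-type relation used in FAO-56, in which Rs is a linear function of the ratio of actual to maximum possible sunshine hours. The sketch below assumes that relation with the FAO-56 default coefficients (0.25 and 0.50) and the grass-reference albedo of 0.23; the function and argument names are hypothetical, and the source does not state that this exact formula was used in the study.

```python
def solar_radiation_from_sunshine(sunshine_hours, daylight_hours, ra,
                                  a_s=0.25, b_s=0.50):
    """Estimate solar radiation Rs (MJ/m2/day) from sunshine duration.

    sunshine_hours : actual duration of bright sunshine (h)
    daylight_hours : maximum possible sunshine duration (h)
    ra             : extraterrestrial radiation (MJ/m2/day)
    a_s, b_s       : regression coefficients (FAO-56 defaults, assumed here)
    """
    if daylight_hours <= 0:
        raise ValueError("daylight_hours must be positive")
    # Relative sunshine duration, clipped to the physically valid range [0, 1].
    fraction = min(max(sunshine_hours / daylight_hours, 0.0), 1.0)
    return (a_s + b_s * fraction) * ra


def net_shortwave(rs, albedo=0.23):
    """Net shortwave radiation: incoming Rs minus the reflected part."""
    return (1.0 - albedo) * rs


# Illustrative values only: 6 h of sunshine on a 12 h day, Ra = 30 MJ/m2/day.
rs = solar_radiation_from_sunshine(sunshine_hours=6.0, daylight_hours=12.0, ra=30.0)
rns = net_shortwave(rs)
print(f"Rs = {rs:.2f}, net shortwave = {rns:.2f} MJ/m2/day")
```

Net radiation additionally subtracts the net outgoing longwave component, which is omitted here to keep the sketch short.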
In practical applications, stand-alone AI models applied to complex datasets can have inadequate predictive capability, because an individual model cannot learn the full range of intricate patterns in the data. The outcome can be suboptimal predictions. An ensemble of stand-alone prediction models can overcome this problem, yielding results that surpass the performance of an individual model [23]. Ensembles are used to reduce the bias and variance of single ML models [24,25]. The author of [26] stated that an ensemble of different ML models yields better results than an individual model. The authors of [27] also recommended ensembles of ML models, as they obtained better results than with an individual model. Recently reported uses of the ensemble approach are summarized below.
The findings of the authors of [28] clearly showed that ensembles of ML models can increase the efficiency of individual models. By applying the ensemble approach, they improved the efficiency of artificial intelligence (AI) models and empirical models by up to 22% and 55%, respectively. They also found AI ensemble modeling superior to the empirical models. The authors of [29] used an ensemble-based genetic programming model to measure the uncertainty related to the model architecture. Their findings support the idea that the structural uncertainty of a model can be quantified thoroughly, objectively, and realistically using the projections of such ensemble models. To forecast the model’s dependability, the authors of [30] examined three linear ensembles and one non-linear ensemble technique. According to that research, the non-linear ensemble outperformed all the other ensembles as well as the individual statistical and intelligent methods. The ensembles created there can also effectively replace current techniques.
Although weather data have recently become easier to access, many locations still lack reliable and consistent weather information. Pakistan, the subject of our study, has too few weather observatories, and the climatic information for various sites was found to be inadequate for calculating crop water needs based on ET0. Consequently, conventional techniques such as PM56 are unsuitable, owing to their extensive input requirements or the absence of weather-related variables such as Rs. Developing approaches that depend on fewer weather inputs, and advancing ML algorithms for estimating ET0 with limited climate data, are therefore tasks of great significance. ML is among the best solutions for developing an ET0 model for this purpose. However, building an ML model that can be tested against a target variable using an established set of input parameters is a key issue, which was addressed in this work. The constructed ML models were tested with limited climate data at several test sites to confirm their accuracy in predicting ET0. Moreover, in stand-alone applications, the ML models were prone to poor performance because they could not capture trends and abruptly changing elements, which often reduced modelling performance. The objective of the ensemble technique, as its name suggests, is to obtain component models with distinct characteristics that together capture the varied patterns present in the dataset [31]. In addition, an ensemble of various ML models increases the ability of the model to capture input–output relations [32,33]. The choice of models for an ensemble depends on a comparative analysis of their stand-alone performance: the better-performing ML models are selected and combined to leverage the strengths of each. The current study unifies three tree-based techniques (TB, SDT and DTF) using an MLP-based non-linear ensemble through a parallel combination of the ML models. This design enables the ensemble model to leverage the strength of each technique to enhance the final modelling accuracy.
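The data flow of such a parallel, non-linear ensemble can be sketched as follows: each base model predicts ET0 from the same inputs, and a small multilayer perceptron (MLP) takes those predictions as its input vector. This is a minimal illustration only. The three base models are hypothetical placeholder functions rather than fitted TB, SDT, or DTF models, the input names are assumed, and the MLP weights are arbitrary fixed numbers that would in practice be learned from training data.

```python
import math


def mlp_combine(base_predictions, hidden_weights, hidden_biases,
                output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP over the base-model outputs."""
    hidden = [
        math.tanh(sum(w * p for w, p in zip(weights, base_predictions)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def ensemble_predict(features, base_models, combiner):
    """Run all base models in parallel on the same features, then combine."""
    base_predictions = [model(features) for model in base_models]
    return mlp_combine(base_predictions, **combiner)


# Placeholder base models (assumed; stand-ins for three fitted tree-based models).
base_models = [
    lambda x: 0.20 * x["tmin"] + 1.0,
    lambda x: 6.0 - 0.04 * x["rh_mean"],
    lambda x: 2.5 + 0.50 * x["wind"],
]

# Arbitrary, untrained combiner weights: 3 inputs -> 2 hidden units -> 1 output.
combiner = {
    "hidden_weights": [[0.10, 0.10, 0.10], [0.05, -0.05, 0.10]],
    "hidden_biases": [0.0, 0.1],
    "output_weights": [3.0, 1.0],
    "output_bias": 1.0,
}

sample = {"tmin": 12.0, "rh_mean": 55.0, "wind": 2.0}  # hypothetical inputs
et0_ensemble = ensemble_predict(sample, base_models, combiner)
print(round(et0_ensemble, 3))
```

Because the combiner applies a non-linear activation to weighted sums of the base predictions, it can weight the base models differently in different regions of the prediction space, which a simple average cannot do.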
The literature discussed above indicates that a reliable ensemble of machine learning models can significantly reduce the parametric requirements for accurately predicting reference evapotranspiration. Most of the existing literature focuses on hydrological predictions or forecasts made by direct modeling from input space to output space; ensemble modelling is therefore still a growing research direction in hydrology. In the context of evapotranspiration estimation, an ensemble of tree-based machine learning techniques is a novel application. Therefore, this study aims to apply a tree-based ML ensemble approach for ET0 estimation with the following objectives: (i) apply sensitivity analysis using tree-based ML techniques to identify the best indicators of ET0 in order to reduce parametric requirements; (ii) develop an ensemble model and improve ET0 estimation; and (iii) investigate ensemble model performance at various climatic stations. In addition, studies on ET0 estimation using ML techniques have been limited to the analysis of a single climatic station or region. For example, the authors of [34] investigated a hybrid neural network approach at a semi-arid station only, and recommended using at least one climatic station each from arid, semi-arid, and humid regions to support a generalized conclusion about a developed ensemble approach. Thus, the current study includes climatic stations from each selected region to investigate the performance of the developed tree-based ensemble ML approach.
4. Conclusions
This paper describes the development of an ensemble-based machine learning model to predict ET0 with scant climate data. The study was motivated by the lengthy procedure and significant data requirements (not readily available in some scenarios) of the recommended FAO-PM56 approach for determining ET0. The findings demonstrate that ET0 can be predicted from available climatic data using a tree-based model. Mean relative humidity, minimum temperature, and wind speed were shown to be the three most important inputs for a precise determination of ET0 with a tree-based model. This input combination was supported by a sensitivity analysis of the input parameters on ET0 carried out with tree-based models, in which it yielded the lowest RMSE and highest R2 values. According to the findings, tree-based models can still predict ET0 precisely even when only these three variables are available. Furthermore, an ensemble approach was applied to improve ET0 estimation using only the three effective inputs (Tmin, RHmean, u(x)), and the results showed considerable improvement in ET0 estimation. The performance of the ensemble model was further investigated at seven adjacent and four distant climatic stations of the selected study area to include the effects of diverse climatic regions. The results indicate the usefulness and reliability of the ensemble model, as the estimated ET0 was well correlated with the standard FAO-PM56 method. Lastly, the study proposes developing different ensemble ML techniques for ET0 estimation in other parts of the world.
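The two scores used to rank input combinations in the sensitivity analysis can be computed as in the sketch below. It assumes that R2 denotes the coefficient of determination, 1 - SS_res / SS_tot (the source does not state which definition was used), and all ET0 values and input-set labels are made-up numbers for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_input_sets(reference, candidates):
    """Score each candidate input set and sort by ascending RMSE."""
    scored = [
        (name, rmse(reference, preds), r_squared(reference, preds))
        for name, preds in candidates.items()
    ]
    return sorted(scored, key=lambda row: row[1])


# Hypothetical FAO-PM56 reference values and model predictions (mm/day).
reference = [2.0, 3.0, 4.0, 5.0]
candidates = {
    "Tmin + RHmean + wind": [2.1, 2.9, 4.2, 4.9],
    "Tmin only": [2.8, 3.5, 3.2, 4.1],
}
ranking = rank_input_sets(reference, candidates)
for name, err, r2 in ranking:
    print(f"{name}: RMSE = {err:.3f}, R2 = {r2:.3f}")
```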
Because ML strategies can handle system uncertainty, the ensemble remains superior: when a single ML method performs poorly, an ensemble approach has more potential for improvement, and when a single ML approach performs well, ensemble modelling still produces high-quality results. The approach we have suggested for estimating ET0 needs to be applied in a number of places with diverse climatic conditions. Applying this ensemble approach in many parts of the world, specifically in areas with limited climatic data, will help establish its efficiency, reliability, and accuracy, and more recent machine learning approaches built on cutting-edge algorithms offer material for further study. This study proposes that an ensemble approach can also be built by combining other ML techniques, such as ANFIS, SVM, GMDH, and CCNN, for ET0 estimation. In addition, the current study used monthly climatic data; we therefore recommend that future research develop an ensemble model based on daily data to increase the accuracy and generalizability of the ensemble model for ET0 estimation.