1. Introduction
Agriculture serves as a fundamental pillar of global economies, playing a critical role in enhancing Gross Domestic Product (GDP) and ensuring food security on a global scale [1]. However, agriculture faces growing challenges to food security from ongoing climate change and population growth, raising the risk of food shortages. The relationship between food security and the efficiency of freshwater use in agriculture is becoming increasingly evident, especially as dependence on irrigation grows. Agriculture remains the largest consumer of freshwater resources, utilizing approximately 70% of the global freshwater supply to irrigate about 25% of the world’s arable land [2,3]. According to Anon (2019), the global population is expected to reach 9.7 billion by 2050, and the demand for both nutrient-dense food and water resources will rise significantly, placing further pressure on agricultural systems [4]. Food demand is expected to increase by 60% by 2050, necessitating more cultivated land and more intensive production, which will further escalate water usage [5,6,7]. Given the limited potential for agricultural land expansion, crop systems must make optimal use of the available water and land resources to support the future population. Understanding and improving water use efficiency is therefore paramount to achieving substantial water savings and higher crop yields.
Smart irrigation technologies are considered the most effective way to conserve water resources while ensuring that crops do not experience water deficiency. Smart irrigation refers to applying the right amount of water at the right time and place [8]. When fruit trees experience water stress, they may exhibit various responses, such as a loss of turgor, reduced cell pressure, lowered root water potential, stomatal closure, decreased transpiration, crop water deficits, and root hardening, depending on the intensity and duration of the exposure. These responses can lead to decreased productivity and deteriorated fruit quality [9,10]. Consequently, research has increasingly focused on measuring crop stress, both directly and indirectly, in response to changes in environmental factors such as soil and weather, and on incorporating these measurements into irrigation decision-making [11,12]. Moving away from traditional water management practices based on experience or intuition, modern methods rely on real-time measurements of soil moisture content and tension, as well as meteorological information such as air temperature, humidity, solar radiation, wind speed, and rainfall, to estimate crop evapotranspiration [13]. Furthermore, the water consumption of crops is governed by complex, nonlinear interactions among environmental factors, soil, and the crop itself. These dynamic, strongly coupled interactions make irrigation management more challenging. To analyze and predict these complex interactions, algorithms and prediction techniques for irrigation timing and volume based on artificial intelligence and machine learning have been developed.
Technologies that quantify soil moisture and crop water stress to determine optimal irrigation timing are being introduced. Canopy temperature measured using infrared thermometry has been recognized as a non-invasive, reliable indicator of a plant’s water status [14,15,16]. Various canopy temperature-based indices have been introduced, with the Crop Water Stress Index (CWSI) being one of the most extensively applied indicators for assessing plant water stress [17]. The CWSI has been applied across various crops and agricultural climates for stress detection, irrigation scheduling, and yield forecasting [18,19,20,21]. The CWSI is scaled between 0 and 1, where a value of 0 represents optimal irrigation conditions with no water stress, and a value of 1 denotes extreme water stress or the cessation of transpiration. The CWSI can be derived through both theoretical and empirical methodologies. The theoretical model, grounded in the energy balance approach, was initially proposed by Jackson et al. (1981) [22]. Although this theoretical approach yields precise estimates of CWSI, it demands an extensive set of input parameters, particularly aerodynamic resistance and net radiation values (as indicated in Equations (1)–(3)).
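In the Jackson et al. (1981) energy-balance approach, these quantities are commonly related as follows (a standard statement of the theoretical CWSI and its upper and lower baselines, expressed here with the symbols defined below):

\mathrm{CWSI} = \dfrac{dT - dT_l}{dT_u - dT_l} \quad (1)

dT_u = \dfrac{r_a\,(R_n - G)}{Y\,C_p} \quad (2)

dT_l = \dfrac{r_a\,(R_n - G)}{Y\,C_p}\cdot\dfrac{K}{S + K} - \dfrac{\mathrm{VPD}}{S + K} \quad (3)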
where dT is the difference between leaf (canopy) temperature and air temperature (°C); dT_u is the value of dT when no transpiration occurs due to water stress (°C); dT_l is the value of dT when there is no water stress due to sufficient irrigation (°C); R_n is the net radiation (W/m²); S is the slope of the saturation vapor pressure curve (kPa/°C); K is the psychrometric constant (kPa/°C); r_a is the aerodynamic resistance (s/m); Y is the air density (kg/m³); C_p is the specific heat capacity of air (1013 J/kg·°C); VPD is the vapor pressure deficit (kPa); and G is the soil heat flux (W/m²). The empirical approach (Idso et al., 1981) is relatively easy to use and provides a reliable estimate of CWSI [23]. Both approaches rely on the leaf temperature under well-watered conditions and the leaf temperature under non-transpiring conditions. Therefore, the most critical factor when using the CWSI is the accurate measurement of canopy temperature. Traditionally, leaf temperature has been measured using infrared thermometers and thermal imaging cameras. However, these methods require adjustments of sensor positioning to avoid the shadow effects caused by tree shape and canopy structure, and tree growth necessitates multiple temperature sensors, involving significant effort and cost for installation and maintenance. When fixed sensors are used, it is difficult to measure leaf temperature accurately, because shadows and the sensor’s viewing direction may cause elements other than leaves to be measured. In this study, to ensure the accurate calculation of the CWSI, a machine learning model was developed to predict leaf temperature from environmental data rather than relying on dedicated leaf temperature sensors. This approach aims to improve the accuracy of CWSI estimation by predicting leaf temperature without the need for physical leaf temperature sensors.
2. Material and Methods
In this study, the target crop was peach (Prunus persica (L.) Batsch), cultivar ‘Cheonjeongdobaekdo’. The data collection site was the peach experimental field of the National Institute of Horticultural and Herbal Science, located at latitude 35°49′29″ N and longitude 127°01′32″ E (Figure 1). For the development of the leaf temperature prediction model, data collected at 1-min intervals from June to September in 2020, 2021, and 2022, including leaf temperature, relative humidity, air temperature, solar radiation, wind speed, and soil moisture tension, were utilized. Leaf temperature was measured using an infrared temperature sensor (SI-431, Apogee, Logan, UT, USA), and soil moisture tension was measured with a soil moisture tension meter (TEROS-21, METER Group, Pullman, WA, USA). An automatic weather station (AWS) was installed in the orchard to collect atmospheric environmental data using sensors for air temperature and humidity (KSH-7310, Korea Digital, Seoul, Republic of Korea), solar radiation (SWSR-7500, Korea Digital, Republic of Korea), and wind speed (SWAP-7300, Korea Digital, Republic of Korea), with data collected at 1-min intervals. The specifications of the sensors are listed in Table 1.
The missing values in the time-series data for each variable were first visualized and then excluded. Outliers were detected and removed using the Interquartile Range (IQR) method, whereby air temperature, relative humidity, and solar radiation values lying beyond the IQR-based bounds (typically Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) were classified as outliers and excluded from the dataset. The total number of data points after preprocessing was 307,924 for each variable. To select the training data for model development, a Pearson correlation analysis (bivariate correlation coefficient) was performed using SPSS Statistics 29.0. The correlation coefficient ranges from −1.0 to 1.0, with a higher absolute value indicating a stronger correlation. Various machine learning algorithms have been used in the agricultural sector to predict crop growth and physiological characteristics. To propose a machine learning-based crop recommendation system suited to the agricultural environment, the performance of nine machine learning models, including Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), Bagging (BG), AdaBoost (AB), Gradient Boosting (GB), and Extra Trees (ET), was evaluated [24]. Lwandile et al. trained machine learning models using historical datasets that included temperature, rainfall, humidity, soil pH, and nutrient levels [25]. Among these models, Random Forest achieved the highest accuracy, with a score of 99.31%, and the model was used to predict crop height on winter wheat farms by leveraging soil data and spectral indices. Patil et al. applied various models, such as Random Forest (RF), Naive Bayes, Decision Tree, Logistic Regression, and K-Nearest Neighbors (KNN), for crop and yield prediction. Among the models used for yield prediction, Random Forest regression demonstrated the best performance, with a Mean Absolute Error (MAE) of 0.64 and an R² score of 0.96. For crop prediction, the Naive Bayes classifier produced the most accurate results, with an accuracy of 99.39% [26]. Such research efforts on measuring and modeling agricultural environments are being actively conducted.
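As an illustration of the preprocessing and correlation steps described above, the following minimal Python sketch applies IQR-based outlier filtering and computes Pearson correlations against leaf temperature; the file name, column names, and the 1.5 × IQR fences are assumptions (the correlation analysis in this study was performed in SPSS).

import pandas as pd

# Hypothetical 1-min interval sensor log; file and column names are assumptions.
df = pd.read_csv("orchard_2020_2022.csv", parse_dates=["timestamp"])
df = df.dropna()  # missing values were visualized and then excluded

def iqr_filter(frame: pd.DataFrame, cols, k: float = 1.5) -> pd.DataFrame:
    """Remove rows whose values fall outside the IQR-based fences."""
    mask = pd.Series(True, index=frame.index)
    for col in cols:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

clean = iqr_filter(df, ["air_temp", "rel_humidity", "solar_radiation"])

# Pearson correlation of each candidate predictor with leaf temperature
corr = clean[["leaf_temp", "air_temp", "solar_radiation",
              "rel_humidity", "wind_speed", "soil_tension"]].corr(method="pearson")
print(corr["leaf_temp"].round(3))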
Machine learning was conducted using the regression tools of MATLAB® R2023a’s Statistics and Machine Learning Toolbox™. The regression learning models used were linear regression, Decision Tree, Support Vector Machine (SVM), Gaussian Process Regression (GPR), and Decision Tree ensemble. The models were evaluated based on Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R² (coefficient of determination), and Mean Squared Error (MSE) (Table 2). RMSE represents the square root of the average squared difference between predicted and actual values, indicating the magnitude of the error. MAE is the average of the absolute differences between predicted and actual values, while MSE is the mean of the squared differences between actual and predicted values. Additionally, R² represents the explanatory power of the model (the standard definitions of these metrics are given below). The three best-performing machine learning algorithms were then selected, and Python 3.11 was used to implement representative models of these algorithms. The models were evaluated using MAE, RMSE, MSE, and R² to assess their performance. The workflow for the development of the leaf temperature prediction model is shown in Table 3.
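These metrics have the following standard definitions, where y_i denotes a measured leaf temperature, \hat{y}_i the corresponding predicted value, \bar{y} the mean of the measurements, and n the number of samples:

\mathrm{MSE} = \dfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}

\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}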
3. Results and Discussion
Table 4 presents the results of the Pearson correlation analysis (bivariate correlation coefficient) conducted using SPSS Statistics 29.0 for leaf temperature, air temperature, wind speed, soil moisture tension, relative humidity, and solar radiation. In this analysis, correlation coefficients between 0.60 and 0.80 that are significant at the 0.01 level are considered strong. The relationship between leaf temperature and air temperature showed a significant positive correlation of 0.928 (p < 0.01). Solar radiation (0.661) and relative humidity (−0.645) also exhibited significant correlations, while wind speed (0.081) and soil moisture tension (0.077) showed very weak correlations with leaf temperature and were therefore excluded from the model inputs.
Table 5 shows the results of running the machine learning models for predicting peach leaf temperature in MATLAB. The MRMR (Minimum Redundancy Maximum Relevance) algorithm assigned importance scores of 1.9766 to air temperature, 0.7311 to solar radiation, and 0.5187 to relative humidity. Among the machine learning algorithms, the ensemble, medium tree, coarse tree, Gaussian process regression, and linear regression models demonstrated superior performance. The ensemble model achieved an RMSE of 0.46628, an MSE of 0.21742, and an MAE of 0.29036. The medium tree regression model exhibited RMSE, MSE, and MAE values of 0.50625, 0.25629, and 0.32045, respectively, while the coarse tree (0.51006, 0.26017, and 0.32942), Gaussian process regression (0.53133, 0.28232, and 0.36252), and linear regression (0.65458, 0.42848, and 0.45197) models followed. The RMSE values of all models were below 0.66, suggesting that they were able to predict the data accurately. Moreover, the coefficient of determination (R²) of the prediction models was greater than 0.98, indicating high performance.
In general, machine learning algorithms are classified into two primary types: supervised and unsupervised learning. Supervised learning algorithms are trained on a labeled training set to make predictions or decisions; this approach is used when the user knows the answer to the problem and trains the model to find it, and it is more widely employed because it is generally easier to implement than unsupervised learning. Supervised learning can be divided into classification and regression: classification assigns input data to specific categories when the dependent variable is categorical, whereas regression predicts a continuous dependent variable from the predictor features. One of the most popular families of supervised methods is ensemble learning, which can be divided into Bagging and Boosting. Bagging involves randomly drawing multiple samples, training a model on each, and aggregating the results for prediction or classification. Boosting, on the other hand, sequentially combines weak models to create a stronger model with better performance [27]. Random Forest is a representative example of Bagging, while Gradient Boosting, XGBoost, and Light GBM are examples of Boosting. Gradient Boosting Machines (GBMs) can be used for both regression and classification analyses.
Based on previous studies and references, Decision Tree, Random Forest, and Gradient Boosting models were selected, and each model was evaluated using RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MSE (Mean Squared Error), and R² (coefficient of determination). For the peach leaf temperature prediction, 70% of the dataset was used as the training set and the remaining 30% as the test set (for example, out of 100 data points, 70 would be used for training and 30 for testing). This ratio has been shown in various studies and experiments to generally provide good performance. After model selection, the hyper-parameters were fine-tuned, graphs comparing actual and predicted values were generated, and box plots were created to visualize the distribution of predicted and actual values.
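A minimal sketch of this procedure is given below; the feature names, data file, and default hyper-parameters are assumptions, not the exact configuration used in the study.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical preprocessed dataset; feature/target names are assumptions.
data = pd.read_csv("peach_leaf_temperature.csv")
X = data[["air_temp", "solar_radiation", "rel_humidity"]]
y = data["leaf_temp"]

# 70/30 train/test split, as described in the text
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    print(f"{name}: RMSE={np.sqrt(mse):.2f}, MAE={mean_absolute_error(y_test, pred):.2f}, "
          f"MSE={mse:.2f}, R2={r2_score(y_test, pred):.2f}")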
The processes of model implementation and evaluation are depicted in Figure 2, Figure 3 and Figure 4. Decision Trees are characterized by their hierarchical structure, in which internal nodes represent features (attributes), branches define decision rules, and terminal nodes (leaf nodes) correspond to predicted outcomes [28]. Decision Trees are highly interpretable models that can handle both numerical and categorical data without extensive preprocessing or feature scaling, making them versatile for various types of data. However, they are prone to overfitting when dealing with noisy data, and biased trees can be generated when certain classes dominate. Random Forest is an ensemble learning technique consisting of multiple Decision Trees. Each tree is trained on a randomly sampled subset of the data (bagging) and a random subset of features. By averaging the outputs of multiple trees, Random Forest reduces the risk of overfitting, resulting in improved robustness and higher accuracy compared with individual Decision Trees. It can handle high-dimensional datasets but is harder to interpret than a single Decision Tree, and using a large number of trees can be computationally expensive [29]. Gradient Boosting builds an ensemble of trees sequentially, where each new tree is added to correct the errors of the previous trees. New trees are fitted to the residual errors of the previous trees, using gradient descent to minimize the loss function. When properly tuned, Gradient Boosting often outperforms Random Forest, can handle a wide range of data types, and offers flexibility in the choice of loss function. However, it is more complex than Random Forest, prone to overfitting if not tuned properly, and more computationally expensive [30]. Many studies have used ensemble algorithms to detect crop water stress from multispectral and image data. Wu et al. used Random Forest to estimate water stress in rice crops and achieved high accuracy [31]. Additionally, Kapari et al. proposed a method to detect water stress in maize using multispectral and thermal images collected by UAVs in conjunction with machine learning algorithms [32].
Performance evaluation showed that the Gradient Boosting model had the lowest RMSE of 0.88, an R² of 0.97, an MAE of 0.54, and an MSE of 0.77, indicating that it was the most suitable model for predicting leaf temperature. The Random Forest (RF) and Decision Tree (DT) models followed in performance. The performance of the peach leaf temperature prediction models is shown in Table 6.
Light GBM (Light Gradient Boosting Machine) is a high-performance framework developed by Microsoft, designed to operate efficiently on large datasets. Although it follows the same basic principles as traditional GBM, Light GBM is optimized for faster and more efficient training on large datasets [33]. Therefore, an additional comparison of the performance of GBM and Light GBM was conducted. For peach, Light GBM achieved an RMSE of 0.91, an R² of 0.96, and an MAE of 0.58, which were slightly worse than the results of the Gradient Boosting model. As more data are collected over time, the potential of Light GBM for big data applications will be considered.
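A minimal sketch of such a comparison, assuming the same hypothetical features and split as above, could look as follows (the default parameters shown are not the tuned values from the study):

import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("peach_leaf_temperature.csv")  # hypothetical file, as above
X = data[["air_temp", "solar_radiation", "rel_humidity"]]
y = data["leaf_temp"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for name, model in {
    "GBM": GradientBoostingRegressor(random_state=42),
    "Light GBM": LGBMRegressor(n_estimators=100, learning_rate=0.1, random_state=42),
}.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: RMSE={np.sqrt(mean_squared_error(y_test, pred)):.2f}, "
          f"R2={r2_score(y_test, pred):.2f}")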
Hyper-parameter tuning is a critical process for optimizing machine learning model performance and preventing overfitting. In large datasets and complex models, the impact of hyper-parameters is significant, making appropriate tuning essential. Among the model’s hyper-parameters, the learning rate determines how quickly the model learns during training; the maximum depth (max_depth) helps reduce overfitting; the minimum numbers of samples (min_samples_leaf, min_samples_split) control overfitting; and the number of estimators (n_estimators) determines the number of Decision Trees used. Random search tests randomly selected combinations of hyper-parameters within a defined range; it is faster than grid search but may have a lower probability of finding the optimal combination. Grid search systematically evaluates all combinations of predefined hyper-parameters, but it is time-consuming. In this study, random search was first used to identify promising hyper-parameter combinations within the defined range, after which a grid search was employed for more precise optimization. The optimal hyper-parameters for peach are shown in Table 7.
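The two-stage tuning described above can be sketched as follows; the search ranges are illustrative assumptions rather than the ranges used to obtain the values in Table 7.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

data = pd.read_csv("peach_leaf_temperature.csv")  # hypothetical file, as above
X = data[["air_temp", "solar_radiation", "rel_humidity"]]
y = data["leaf_temp"]
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=42)

# Stage 1: random search over broad, assumed ranges
random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions={
        "learning_rate": list(np.linspace(0.01, 0.3, 30)),
        "max_depth": list(range(2, 11)),
        "min_samples_leaf": list(range(1, 21)),
        "min_samples_split": list(range(2, 21)),
        "n_estimators": list(range(100, 1001, 100)),
    },
    n_iter=50, cv=5, scoring="neg_root_mean_squared_error", random_state=42,
)
random_search.fit(X_train, y_train)
best = random_search.best_params_

# Stage 2: grid search in a narrow window around the random-search optimum
grid_search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid={
        "learning_rate": [best["learning_rate"] * f for f in (0.5, 1.0, 1.5)],
        "max_depth": [best["max_depth"] - 1, best["max_depth"], best["max_depth"] + 1],
        "min_samples_leaf": [best["min_samples_leaf"]],
        "min_samples_split": [best["min_samples_split"]],
        "n_estimators": [best["n_estimators"]],
    },
    cv=5, scoring="neg_root_mean_squared_error",
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)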
The actual leaf temperature measurements from June to September in 2020 and 2022 were compared with the values obtained from the prediction model. Figure 5, Figure 6, Figure 7 and Figure 8 show monthly comparison graphs for 2020. While there was a maximum difference of 1.5 °C, the model exhibited satisfactory performance. In 2022, the actual and predicted values were found to be almost identical (Figure 9, Figure 10, Figure 11 and Figure 12). The GBM demonstrated high accuracy in predicting actual leaf temperature data, with similar patterns across the monthly data. However, errors in some intervals could be attributed to model limitations or data variability.
Box plots are useful tools for visually representing data distributions and key statistics. Through the box plot analysis, the distribution of the differences between predicted and actual values was examined. The median near 0 °C indicates that the prediction errors were not significantly skewed toward either positive or negative values. The green triangle, representing the mean, is also located near 0 °C, showing that the prediction errors were generally balanced. This confirms that the GBM model had relatively small prediction errors and made stable predictions (Figure 13).
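A minimal matplotlib sketch of such an error box plot is shown below; it assumes the test-set measurements and the tuned model’s predictions from the sketches above, and the helper name is hypothetical.

import matplotlib.pyplot as plt
import numpy as np

def plot_error_boxplot(y_true, y_pred, title="Prediction error (predicted - actual)"):
    """Box plot of prediction errors; showmeans draws a triangle marker at the mean."""
    errors = np.asarray(y_pred) - np.asarray(y_true)
    fig, ax = plt.subplots(figsize=(4, 5))
    ax.boxplot(errors, showmeans=True)
    ax.axhline(0.0, color="grey", linewidth=0.8)
    ax.set_ylabel("Error (°C)")
    ax.set_title(title)
    plt.show()

# Example usage with the earlier split and tuned model:
# plot_error_boxplot(y_test, grid_search.best_estimator_.predict(X_test))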