1. Introduction
Spring wheat is an important crop in northern China, with a wide planting range and high yields. The Hetao Irrigation District is one of the major production areas for spring wheat in China, and it plays a significant role in ensuring food security [1]. Chlorophyll fluorescence has proven to be a useful indicator of photosynthetic system health and is widely employed in assessing photosynthesis [2]. Fv/Fm, a commonly used chlorophyll fluorescence parameter, is the ratio of variable fluorescence (Fv) to maximum fluorescence (Fm) of chlorophyll. Fv is the difference between Fm and the minimum fluorescence (F0) of a dark-adapted leaf, in which all PSII reaction centers are open, whereas Fm is the maximum fluorescence emitted when a saturating light pulse closes all PSII reaction centers. Fv/Fm reflects the efficiency of energy transfer within the PSII antenna and the proportion of open PSII reaction centers [3,4]. The Fv/Fm parameter is critical in understanding the physiological status of plants and is a measure of a plant’s capacity to convert light energy into chemical energy, which can offer insights into the health and productivity of crops [5]. In the case of spring wheat, Fv/Fm can be utilized to monitor crop growth and identify potential stress factors, such as water and nutrient deficiencies, that may adversely impact crop yield [6]. Traditionally, Fv/Fm has been measured using portable instruments such as pulse amplitude modulated (PAM) fluorometers, which require darkening the leaf with a leaf clamp for 15–20 min before measurement. PAM measurements can be laborious and time-consuming, particularly when large areas require monitoring. To overcome these limitations, researchers have explored the use of remote sensing technology.
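As a small worked example of the ratio described above, the sketch below computes Fv/Fm from a dark-adapted minimum fluorescence (F0) and a maximum fluorescence (Fm), using the standard relation Fv = Fm − F0. The function name and the sample readings are hypothetical and serve only to illustrate the arithmetic.

```python
def fv_fm(f0: float, fm: float) -> float:
    """Return Fv/Fm = (Fm - F0) / Fm for dark-adapted fluorescence readings."""
    if fm <= 0:
        raise ValueError("Fm must be positive")
    if f0 < 0 or f0 > fm:
        raise ValueError("F0 must lie between 0 and Fm")
    return (fm - f0) / fm


# Hypothetical readings in arbitrary instrument units (not measured data).
print(round(fv_fm(f0=500.0, fm=2500.0), 3))  # 0.8
```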
Remote sensing is a powerful tool for monitoring plant growth. Remote sensing approaches retrieve chlorophyll fluorescence, which is excited by the absorption of sunlight, using spectral reflectance. Because the fluorescence emission spectrum is superimposed on leaf or canopy reflectance, which can be obtained by handheld, ground-mounted, aerial, or space-borne sensors, remote sensing opens a new way to upscale chlorophyll fluorescence from the leaf to the landscape level. Fv/Fm has been estimated from spectral data in several studies. Zhao et al. [7] collected spectral data and Fv/Fm values from potato leaves using a hyperspectral imaging system and a closed chlorophyll fluorescence imaging system, decomposed the spectral data by continuous wavelet transform (CWT), and developed an estimation model using partial least squares. Yi et al. [8] used hyperspectral and PAM fluorescence data along with correlation and regression analyses to develop Fv/Fm estimation models for aspen and cherry leaves. Jia et al. [9] calculated vegetation indices from hyperspectral data to estimate Fv/Fm for wheat through linear regression. These studies addressed the time-consuming nature of traditional fluorescence measurement, but ground-based hyperspectral data collection is expensive and cannot estimate Fv/Fm at high spatial and temporal resolution. Unmanned aerial vehicles (UAVs) equipped with RGB and multispectral sensors may solve this problem at low cost [10]. Most research on estimating Fv/Fm has relied on ground-based hyperspectral measurements, with few studies employing UAVs equipped with multispectral and visible-light sensors. This study addresses this gap by demonstrating the feasibility of using UAV-based remote sensing to estimate Fv/Fm, offering new opportunities for efficient and high-throughput monitoring of plant health and productivity.
Machine learning techniques have revolutionized the field of data analysis by identifying complex patterns and trends that are often challenging to detect using traditional methods. In recent years, the application of machine learning methods to analyze data acquired by UAVs has gained significant traction [11,12,13,14,15]. However, most previous studies have used only a limited number of machine learning methods (one to four) to estimate the desired parameters, and few have compared the performance of more than twenty different techniques. Given the subtle variation of Fv/Fm and the limited spectral resolution of multispectral and RGB sensors compared with hyperspectral sensors, it is important to evaluate multiple machine learning methods to achieve higher accuracy. This approach exploits the complementarity of various algorithms and enables robust and comprehensive estimation of Fv/Fm from remote sensing data. The goal of this study is to estimate Fv/Fm in spring wheat from UAV-acquired RGB and multispectral (MS) remote sensing data using multiple machine learning methods to improve accuracy, which is important for rapid and accurate detection of wheat stress and timely adjustment of field management measures.
2. Materials and Methods
2.1. Study Site and Experimental Design
During the wheat flowering stage, both RGB and MS remote sensing images were obtained, from which 51 vegetation indices were calculated (25 RGB and 26 multispectral). Critical spectral features were then extracted, multicollinearity was eliminated, and feature selection was conducted to estimate the Fv/Fm values. A total of 26 machine learning techniques were applied, and their performance was assessed in terms of accuracy, stability, and interpretability. Finally, a high-precision UAV remote sensing monitoring model for the Fv/Fm of spring wheat in the Hetao Irrigation District was developed, providing a scientific basis for local agricultural development.
The study was conducted from 2020 to 2021 at the experimental field of the Bayannur Institute of Agriculture and Animal Husbandry Science, located in the Inner Mongolia Autonomous Region at 40°04′ N, 10°03′ E and an altitude of 1038 m above sea level (Figure 1). The soil type at the experimental site was loam, and the baseline fertility information is presented in Table 1. A split-plot design was utilized, with nitrogen (N) fertilizer application methods serving as the main plot and cultivar as the subplot. The main plot, which included five levels (CK, N1, N2, N3, and N4), featured various N application methods. N1 (0.8/0.2), N2 (0.7/0.3), N3 (0.5/0.5), and N4 (0.3/0.7) had the same N application rate of 180 kg/ha but differed in seeding fertilizer rates and follow-up fertilizer rates, while CK had no fertilizer applied. The subplot comprised three cultivars of spring wheat: “Baimai 13”, “Nongmai 730”, and “Nongmai 482”. The experiment included a total of 15 treatments with three replications, resulting in 45 experimental plots, each measuring 12 m². The plots were arranged in randomized groups. The sowing rate was set at 300 kg/ha. Phosphorus fertilizer was applied as a basal fertilizer during sowing, and no potassium fertilizer was applied during the entire reproductive phase. Three flood irrigations were performed at the tillering, heading, and grain filling stages, each with a volume of 900 m³/ha.
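The treatment structure can be enumerated programmatically, which is a convenient check on the plot count. The sketch below assumes that each ratio lists the share of N applied at sowing first and the share applied as follow-up fertilizer second; that reading, and all variable names, are assumptions made for illustration.

```python
from itertools import product

TOTAL_N_KG_HA = 180.0  # total N rate shared by N1-N4 (from the text)

# Assumed reading of the ratios: (share at sowing, share as follow-up dressing).
N_SPLITS = {
    "CK": None,  # unfertilized control
    "N1": (0.8, 0.2),
    "N2": (0.7, 0.3),
    "N3": (0.5, 0.5),
    "N4": (0.3, 0.7),
}
CULTIVARS = ["Baimai 13", "Nongmai 730", "Nongmai 482"]
REPLICATES = 3


def n_rates(split):
    """Return (sowing, follow-up) N rates in kg/ha for one main-plot level."""
    if split is None:
        return 0.0, 0.0
    at_sowing, follow_up = split
    return round(TOTAL_N_KG_HA * at_sowing, 1), round(TOTAL_N_KG_HA * follow_up, 1)


# One record per plot: 5 N levels x 3 cultivars x 3 replicates.
plots = [
    {"n_level": level, "cultivar": cultivar, "replicate": rep, "n_rates": n_rates(split)}
    for (level, split), cultivar, rep in product(
        N_SPLITS.items(), CULTIVARS, range(1, REPLICATES + 1)
    )
]

print(len(plots))               # 45
print(n_rates(N_SPLITS["N1"]))  # (144.0, 36.0)
```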
2.2. UAV Multispectral Data Acquisition and Processing
Remote sensing images were obtained during the flowering stage of wheat (12 June 2020; 15 June 2021) using a DJI Phantom 4 multispectral drone (Da-Jiang Innovations, Shenzhen, China). The drone (P4M, Figure 2) carries six CMOS sensors: one color sensor (ISO: 200–800) for visible (RGB) imaging and five monochrome sensors (Table 2) for multispectral (MS) imaging. The images were acquired on clear and windless days, with a fixed and consistent takeoff location. The D-RTK 2 (Da-Jiang Innovations, Shenzhen, China) high-precision GNSS mobile station was utilized to improve the positioning accuracy of the UAV. Prior to takeoff, the UAV was manually held directly above three gray reflectance panels with reflectances of 20%, 40%, and 60%, and photos of the panels were taken. The flight path was automatically planned by DJI GS Pro (Da-Jiang Innovations, Shenzhen, China) after calculating the current solar azimuth, with a flight altitude of 30 m, a ground sampling distance of 1.59 cm/pixel, a forward overlap of 85%, and a side overlap of 80%. Following the flight, DJI Terra (Da-Jiang Innovations, Shenzhen, China) was used to perform radiometric correction of the multispectral images, followed by image stitching to generate single-band reflectance orthophotos. RGB images were stitched to produce color orthophotos without a radiometric correction step.
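The stated ground sampling distance can be checked against the flight altitude. The sketch below assumes the manufacturer's published relation for this drone, GSD = H/18.9 cm/pixel, and a 1600 × 1300 pixel image with the short side along the flight direction; both are assumptions taken from published specifications rather than from this study, and the function names are hypothetical.

```python
def gsd_cm_per_px(altitude_m: float, divisor: float = 18.9) -> float:
    """Ground sampling distance in cm/pixel, assuming GSD = H / 18.9."""
    return altitude_m / divisor


def exposure_spacing_m(altitude_m: float, image_px: int, overlap: float,
                       divisor: float = 18.9) -> float:
    """Distance between exposures (or flight lines) for a given overlap fraction."""
    footprint_m = image_px * gsd_cm_per_px(altitude_m, divisor) / 100.0
    return footprint_m * (1.0 - overlap)


ALTITUDE_M = 30.0
print(f"GSD: {gsd_cm_per_px(ALTITUDE_M):.2f} cm/pixel")  # GSD: 1.59 cm/pixel

# Assumed image size of 1600 x 1300 px, short side along track.
along_track = exposure_spacing_m(ALTITUDE_M, image_px=1300, overlap=0.85)
across_track = exposure_spacing_m(ALTITUDE_M, image_px=1600, overlap=0.80)
print(f"photo spacing: {along_track:.1f} m, line spacing: {across_track:.1f} m")
```

Under these assumptions, a 30 m altitude reproduces the 1.59 cm/pixel reported above.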
2.3. Construction and Selection of Spectral Indices
The digital number values for each RGB band and the reflectance of each MS band in each treatment plot were extracted using the zonal statistics function of ENVI. Subsequently, two types of vegetation indices (VIs) were computed, as presented in Table 3 and Table 4.
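To make the index computation concrete, the sketch below evaluates four of the indices named later in this paper (MSAVI2, SIPI, RGBVI, and ExR) from plot-mean band values. The formulas are the commonly cited definitions and are assumed, not confirmed, to match those in Table 3 and Table 4; in particular, ExR is computed here from normalized chromatic coordinates. The input values are hypothetical.

```python
from math import sqrt


def msavi2(nir: float, red: float) -> float:
    """Modified soil-adjusted vegetation index 2 from NIR and red reflectance."""
    return (2 * nir + 1 - sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


def sipi(nir: float, red: float, blue: float) -> float:
    """Structure-insensitive pigment index from NIR, red, and blue reflectance."""
    return (nir - blue) / (nir - red)


def rgbvi(r: float, g: float, b: float) -> float:
    """Red-green-blue vegetation index from RGB digital numbers."""
    return (g * g - b * r) / (g * g + b * r)


def exr(r: float, g: float, b: float) -> float:
    """Excess red index, 1.4 r' - g', using normalized chromatic coordinates."""
    total = r + g + b
    return 1.4 * (r / total) - (g / total)


# Hypothetical plot means: MS reflectance (0-1) and RGB digital numbers (0-255).
print(round(msavi2(nir=0.5, red=0.1), 4))
print(round(sipi(nir=0.5, red=0.1, blue=0.05), 4))
print(round(rgbvi(r=100, g=150, b=50), 4))
print(round(exr(r=100, g=150, b=50), 4))
```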
2.4. Fluorescence Data Acquisition and Processing
The collection of fluorescence data was synchronized with the UAV flight, and the fluorescence information of wheat canopy leaves was obtained using the Handy PEA plant efficiency analyzer (Hansatech Instruments Ltd., Norfolk, UK). In each plot, 20 leaves were randomly selected, and their average value was taken as the representative value for the plot. Prior to measurement, the target leaves were dark-adapted for 20 min using the leaf clips provided with the instrument.
2.5. Construction of Regression Model
In this study, 26 machine learning regression models (listed in Table 5) were developed using PyCaret to estimate Fv/Fm. PyCaret is a user-friendly, open source, low-code machine learning library in Python that enables users to easily prepare data, train and evaluate machine learning models, and deploy models to production. PyCaret offers various features for data preparation, feature engineering, model training and evaluation, model interpretation, and model deployment. Additionally, it includes built-in visualizations and interactive plots that help users to interpret model results.
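A model comparison of this kind might be driven through PyCaret's regression module roughly as sketched below. This is not the authors' script: it assumes PyCaret 3.x, a pandas DataFrame with one row per sample, and a target column named "FvFm" (a hypothetical name). The multicollinearity threshold of 0.9 is likewise an illustrative value, and the preprocessing options correspond to the normalization, multicollinearity removal, and feature selection steps of this section.

```python
def compare_fvfm_models(df, target="FvFm", seed=42):
    """Rank PyCaret regressors for Fv/Fm and tune the best one (sketch only)."""
    # Imported lazily so this module still loads where PyCaret is not installed.
    from pycaret.regression import compare_models, predict_model, setup, tune_model

    setup(
        data=df,
        target=target,                       # hypothetical column name
        train_size=0.7,                      # 0.7/0.3 train/test partition
        fold=5,                              # 5-fold cross-validation
        normalize=True,
        normalize_method="minmax",           # rescale features to [0, 1]
        remove_multicollinearity=True,
        multicollinearity_threshold=0.9,     # assumed value, not from the paper
        feature_selection=True,
        session_id=seed,                     # fixed seed for reproducibility
    )
    best = compare_models(sort="R2")         # rank all available regressors
    tuned = tune_model(best, optimize="R2")  # hyperparameter search on the best
    holdout = predict_model(tuned)           # predictions on the held-out test set
    return tuned, holdout
```

Calling `compare_fvfm_models(df)` would return the tuned estimator together with its hold-out predictions; the ranking table printed by `compare_models` reports MAE, MSE, RMSE, R2, RMSLE, MAPE, and training time for each model.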
To prepare the data, a normalization technique was applied to transform the data into a fixed range between 0 and 1, thereby ensuring that all features were on the same scale. Multicollinearity, which refers to high correlation between multiple features, was addressed by removing highly correlated features to ensure data stability.
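The two preprocessing steps can be sketched in plain Python as follows. The greedy rule used here (keep a feature only if its absolute Pearson correlation with every previously kept feature does not exceed a threshold) is a simplification chosen for illustration and may differ from the rule PyCaret applies; the 0.9 threshold and the feature names are hypothetical.

```python
from math import sqrt


def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_collinear(features, threshold=0.9):
    """Greedily keep features whose |r| with all kept features is <= threshold."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) <= threshold for other in kept.values()):
            kept[name] = column
    return kept


# Hypothetical feature columns: vi_b is almost a multiple of vi_a.
features = {
    "vi_a": [1.0, 2.0, 3.0, 4.0],
    "vi_b": [2.0, 4.0, 6.0, 8.1],
    "vi_c": [4.0, 1.0, 3.0, 2.0],
}
scaled = {name: min_max_scale(col) for name, col in features.items()}
print(list(drop_collinear(scaled)))  # ['vi_a', 'vi_c']
```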
Feature selection was performed to select key features and reduce noise, thereby enhancing the accuracy and efficiency of the algorithm. This study used an embedded feature selection method, implemented through the SelectFromModel API in scikit-learn with a random forest as the underlying estimator. Once all the models were built, the model with the highest accuracy was selected based on the accuracy ranking, and hyperparameter optimization was then conducted to further improve its accuracy.
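The embedded selection step can be illustrated with scikit-learn directly. The sketch below runs on synthetic data, not the study's data: only the first of five hypothetical indices carries signal. The number of trees, the "mean" importance threshold, and the random seeds are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for 90 samples of five vegetation indices (hypothetical).
rng = np.random.default_rng(0)
names = ["vi_0", "vi_1", "vi_2", "vi_3", "vi_4"]
X = rng.uniform(0.0, 1.0, size=(90, len(names)))
# Only vi_0 drives the target; the remaining columns are pure noise.
y = 0.70 + 0.15 * X[:, 0] + rng.normal(0.0, 0.005, size=90)

# Keep features whose random-forest importance exceeds the mean importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    threshold="mean",
)
selector.fit(X, y)

selected = [name for name, keep in zip(names, selector.get_support()) if keep]
print(selected)
```

Because the importances of a fitted random forest sum to one, the "mean" threshold keeps only features that contribute more than an equal share.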
2.6. Segmentation of Dataset and Accuracy Evaluation
The 90 samples were randomly partitioned into a training set and a test set in a 0.7/0.3 ratio, and 5-fold cross-validation was employed to train and optimize the model. Seven indicators were used to assess the performance of the model on the test set:
MAE (Mean Absolute Error) is a measure of the average magnitude of the errors in a set of predictions, without considering their direction. It measures the average absolute difference between the actual and predicted values. A smaller MAE indicates a more accurate prediction;
MSE (Mean Squared Error) is the average of the squared differences between the actual and predicted values. Because the errors are squared, their direction is ignored and large errors are penalized more heavily than small ones. A smaller MSE indicates a more accurate prediction;
R² (Coefficient of Determination) is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variables. In regression analysis, R² is used to evaluate the goodness of fit of the model. Generally, it ranges from 0 to 1, with a higher value indicating a better fit. An R² of 1 indicates that the model perfectly predicts the target variable, while an R² of 0 indicates that the model does not explain any variance in the target variable;
RMSE (Root Mean Squared Error) is the square root of MSE. It expresses the typical magnitude of the errors in the same units as the target variable. A smaller RMSE indicates a more accurate prediction;
RMSLE (Root Mean Squared Logarithmic Error) is similar to RMSE, but instead of taking the difference between the actual and predicted values, it takes the logarithmic difference. It is used in cases where the target variable has a skewed distribution;
MAPE (Mean Absolute Percentage Error) is a measure of the average error as a percentage of the actual values. It measures the average percentage difference between the actual and predicted values. A lower MAPE indicates a more accurate prediction;
TT (Total Time) is defined as the amount of time spent on building the machine learning model. The value of TT represents the computational cost of constructing the model, including the time spent on training and validating the model. A smaller value of TT indicates that the model has a lower computational cost and can be built more efficiently. This can be beneficial in scenarios where the model needs to be constructed in a timely manner, or when computational resources are limited. A lower TT value also indicates that the model may be more scalable and can be trained on larger datasets without excessive computational cost.
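The six error metrics follow their standard definitions. They are written out here on the assumption that they match the conventions of the metrics PyCaret reports, with MAPE expressed as a fraction, not a percentage:

```latex
\begin{align}
\mathrm{MAE}   &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \\
\mathrm{MSE}   &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2} \\
\mathrm{RMSE}  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}} \\
R^{2}          &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}} \\
\mathrm{RMSLE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\ln\left(1+\hat{y}_i\right)-\ln\left(1+y_i\right)\right)^{2}} \\
\mathrm{MAPE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|
\end{align}
```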
where $n$ is the number of samples, $y_i$ is the observed value, $\bar{y}$ is the mean of the observed values, and $\hat{y}_i$ is the predicted value.
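For readers who want to reproduce the indicators outside PyCaret, the sketch below computes the six error metrics from lists of observed and predicted values using their standard definitions (MAPE as a fraction). The function name and the sample Fv/Fm values are hypothetical.

```python
from math import log1p, sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RMSLE, and MAPE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    log_sq = sum((log1p(p) - log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        "RMSLE": sqrt(log_sq / n),
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical observed and predicted Fv/Fm values (not data from this study).
observed = [0.80, 0.75, 0.70, 0.65]
predicted = [0.78, 0.76, 0.71, 0.66]
for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```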
4. Discussion
Chlorophyll fluorescence is a potent indicator of photosynthesis, where Fv/Fm serves as a measure of the maximum photochemical efficiency of photosystem II in chloroplasts and can be utilized as an indicator of plant health [75]. For many plant species, the optimal Fv/Fm range is approximately between 0.79 and 0.84, with lower values indicating higher plant stress [76]. The wheat flowering period is a critical stage of growth and development, as it signifies the transition of wheat plants from vegetative to reproductive growth stages, where the focus of growth shifts from leaves and roots to flowers and seeds [77]. During this period, wheat plants require significant amounts of nutrients and energy to support the normal development and maturation of flowers and seeds [78]. Therefore, providing sufficient water and nutrients is crucial for wheat plants during the flowering period. Stress during this phase can significantly affect the yield and quality of wheat [79]. Monitoring Fv/Fm during the flowering period is crucial in detecting stress and implementing timely agricultural management practices to optimize wheat production and achieve high yields.
In this study, both multispectral vegetation indices and RGB vegetation indices were found to be effective in estimating Fv/Fm values. The test set R² was 0.920 for multispectral vegetation indices and 0.838 for RGB vegetation indices, indicating that the former estimated Fv/Fm more accurately. Multispectral images acquired by UAVs provide information about the spectral reflectance of vegetation, which changes in the canopy when stress occurs [80], and this can be used to estimate Fv/Fm. The key idea of this approach is that Fv/Fm is related to the fluorescence yield of photosystem II, which in turn is related to the chlorophyll content of leaves. The chlorophyll content can be estimated from the reflectance in the red and near-infrared (NIR) bands [81]. MSAVI2, the important spectral index extracted in the multispectral estimation in this study, is calculated from exactly these red and NIR bands. In addition to multispectral images, RGB images can also be used to estimate Fv/Fm values from color information. In general, the color of plants changes when they are under stress, and although some color changes are difficult for the human eye to observe, color values can be quantified by computer technology [82]. However, because RGB images lack the vegetation-sensitive red edge and near-infrared bands, they do not provide enough spectral information, so the Fv/Fm estimated from them is less accurate than that estimated from the MS images.
This study shows that the machine learning models have high accuracy and stability and can effectively use RGB and MS data to estimate Fv/Fm, which is consistent with the findings of other studies [83] using machine learning for remote sensing estimation. Both the choice of machine learning model and the type of vegetation index clearly affect the accuracy of Fv/Fm estimation: different types of vegetation indices provide different information for estimating Fv/Fm, and different machine learning models differ in their ability to adapt to and fit different datasets [84]. The MS + RGB model exhibits superior accuracy compared to both the MS and RGB models in estimating Fv/Fm. This suggests that integrating RGB and MS data improves the accuracy of the estimation and, more generally, that multi-source data can yield more accurate models than single-source data. These results are consistent with previous studies that have reported improved accuracy in multi-source data estimation [85,86]. However, combining MS and RGB improved accuracy only marginally over using MS alone in this study. Because the ARD model does not quantify the percentage contributions of the RGB and MS indices, a random-forest-based feature importance assessment was performed; the result is shown in Figure 4. The importance of the multispectral vegetation indices in the model was higher than that of the RGB vegetation indices, and the RGB indices contributed less valid information.
Zhao et al. [7] developed two estimation models for Fv/Fm of potato leaves with test set R² of 0.807 and 0.822 and RMSE of 0.018 and 0.017, respectively. Yi et al. [8] developed a generalized estimation model of Fv/Fm for poplar leaves and cherry leaves with R² = 0.88. A study [9] estimating Fv/Fm in winter wheat reported a training set R² of 0.50 (RMSE = 0.012) and a test set R² of 0.55 (RMSE = 0.014). In these previous hyperspectral-based studies of Fv/Fm estimation, the highest R² was 0.88, whereas the most accurate combined model without hyperparameter optimization in this study had an R² of 0.868, which suggests that, without optimization, multispectral and RGB data estimate Fv/Fm less accurately than hyperspectral data. However, Yang et al. [87] showed that hyperparameter optimization can help improve the estimation accuracy of a model. In this study, the test set R² of both the MS estimation and the combined MS and RGB estimation reached 0.92 after hyperparameter optimization, indicating that the accuracy gap caused by the sensors can be narrowed or even reversed by algorithm optimization. Negative values of R², which are uncommon, were observed in all three estimation models. A negative R² indicates that the predictions of a model deviate more from the observed values than a constant prediction equal to the mean of the observations, which reflects the central tendency of the data [88]. Notably, the performance of the linear regression (LR), ridge regression (Ridge), least angle regression (LAR), and orthogonal matching pursuit (OMP) models was almost identical. The reason is that only MSAVI2 was retained by feature selection among the multispectral vegetation indices, and LR, Ridge, LAR, and OMP are all linear models; this indicates that the choice among linear models may have little effect on estimation accuracy when a single vegetation index is used as the feature variable.
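The meaning of a negative R² can be illustrated with a small worked example. In the sketch below, a constant prediction equal to the mean of the observations yields an R² of approximately zero, whereas a hypothetical model that reverses the trend of the observations yields a strongly negative value. All numbers are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [0.70, 0.75, 0.80, 0.85]  # hypothetical Fv/Fm values

# Baseline: always predict the mean of the observations.
mean_value = sum(observed) / len(observed)
baseline = [mean_value] * len(observed)

# Hypothetical poor model: predictions follow the opposite trend.
poor_model = [0.85, 0.80, 0.75, 0.70]

print(round(r2_score(observed, baseline), 3))
print(round(r2_score(observed, poor_model), 3))
```

Here the residual sum of squares of the poor model (0.05) is four times the total sum of squares (0.0125), so its R² is 1 − 4 = −3.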
The present study has important practical implications for agriculture and environmental monitoring. By demonstrating the feasibility of using UAV-based remote sensing to estimate Fv/Fm in spring wheat, our research offers new opportunities for efficient and high-throughput monitoring of plant health and productivity. This can help to improve crop management strategies, enhance yield and quality of agricultural products, and reduce the environmental impact of farming practices. In addition, the integration of machine learning methods with multispectral and RGB imagery can further enhance the accuracy and reliability of Fv/Fm estimation, enabling more precise and targeted interventions in crop management.
5. Conclusions
Integration of multiple machine learning methods with multispectral and RGB imagery acquired from UAV-based remote sensing can improve the accuracy and reliability of Fv/Fm estimation in spring wheat during the flowering stage. The important features and the optimal Fv/Fm estimation models differed among the types of remote sensing images: for RGB, the optimal model was the gradient boosting regressor (GBR) and the important features were RGBVI and ExR; for MS, the optimal model was Huber regression (Huber) and the important feature was MSAVI2; and for the combination (MS + RGB), the optimal model was automatic relevance determination (ARD) and the important features were SIPI, ExR, and VEG. The highest accuracy was achieved using the ARD model for estimating Fv/Fm with RGB + MS vegetation indices (test set MAE = 0.019, MSE = 0.001, RMSE = 0.024, R² = 0.925, RMSLE = 0.014, MAPE = 0.026). Based on the results of this study, there is great potential for the use of remote sensing and machine learning for efficient and sustainable plant health monitoring and management.
In conclusion, while the present study provides a valuable contribution to the use of remote sensing and machine learning for estimating Fv/Fm in wheat during the flowering stage, there are several limitations. Firstly, the study was conducted at a single ecological site over a two-year period, which may limit the generalizability of the findings to other regions and ecosystems. Future studies conducted at multiple sites with varying environmental conditions would provide a more comprehensive understanding of the applicability of the proposed methods. Secondly, the study focused solely on the flowering stage of wheat growth, which is only one phase of the crop’s development. It is possible that the performance of the proposed remote sensing and machine learning methods may differ at other stages of wheat growth, such as germination, tillering, or grain filling. Further research is needed to examine the feasibility and accuracy of the methods across multiple growth stages. Thirdly, while the proposed methods show promise for estimating Fv/Fm in wheat, it should be noted that these methods may not be directly applicable to other plant species or crops. The optimal settings and parameters for remote sensing and machine learning may vary depending on the physiological and structural characteristics of the target plants. Future studies should explore the generalizability and adaptability of the methods to other crops and plant species.