Article

Multi-Stage Corn Yield Prediction Using High-Resolution UAV Multispectral Data and Machine Learning Models

1 Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS 39762, USA
2 Crop Production Systems Research Unit, United States Department of Agriculture, Agriculture Research Service, Stoneville, MS 38776, USA
3 Genetics and Sustainable Agriculture Research Unit, United States Department of Agriculture, Agriculture Research Service, Starkville, MS 39762, USA
* Author to whom correspondence should be addressed.
Agronomy 2023, 13(5), 1277; https://doi.org/10.3390/agronomy13051277
Submission received: 27 March 2023 / Revised: 26 April 2023 / Accepted: 27 April 2023 / Published: 28 April 2023
(This article belongs to the Special Issue Crop Yield Estimation through Remote Sensing Data)

Abstract

Timely and cost-effective crop yield prediction is vital in crop management decision-making. This study evaluates the efficacy of Unmanned Aerial Vehicle (UAV)-based Vegetation Indices (VIs) coupled with Machine Learning (ML) models for corn (Zea mays) yield prediction at vegetative (V6) and reproductive (R5) growth stages using a limited number of training samples at the farm scale. Four agronomic treatments, namely Austrian Winter Peas (AWP) (Pisum sativum L.) cover crop, biochar, gypsum, and fallow, with sixteen replications each, were applied during the non-growing corn season to assess their impact on the following corn yield. Thirty different variables (i.e., four spectral bands: green, red, red edge, and near-infrared, and twenty-six VIs) were derived from UAV multispectral data collected at the V6 and R5 stages to assess their utility in yield prediction. Five different ML algorithms, namely Linear Regression (LR), k-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Regression (SVR), and Deep Neural Network (DNN), were evaluated in yield prediction. One-year experimental results indicated that the different treatments had a negligible impact on overall corn yield. Red edge, canopy chlorophyll content index, red edge chlorophyll index, chlorophyll absorption ratio index, green normalized difference vegetation index, green spectral band, and chlorophyll vegetation index were among the most suitable variables in predicting corn yield. The SVR predicted yield for the fallow with a Coefficient of Determination (R²) and Root Mean Square Error (RMSE) of 0.84 and 0.69 Mg/ha at V6 and 0.83 and 1.05 Mg/ha at the R5 stage, respectively. The KNN achieved a higher prediction accuracy for AWP (R² = 0.69 and RMSE = 1.05 Mg/ha at V6 and 0.64 and 1.13 Mg/ha at R5) and gypsum treatment (R² = 0.61 and RMSE = 1.49 Mg/ha at V6 and 0.80 and 1.35 Mg/ha at R5). The DNN achieved a higher prediction accuracy for biochar treatment (R² = 0.71 and RMSE = 1.08 Mg/ha at V6 and 0.74 and 1.27 Mg/ha at R5). For the combined (AWP, biochar, gypsum, and fallow) treatment, the SVR produced the most accurate yield prediction with an R² and RMSE of 0.36 and 1.48 Mg/ha at V6 and 0.41 and 1.43 Mg/ha at R5. Overall, the treatment-specific yield prediction was more accurate than the combined treatment. Yield was predicted more accurately for fallow than for the other treatments regardless of the ML model used. SVR and KNN outperformed other ML models in yield prediction. Yields were predicted with similar accuracy at both growth stages. Thus, this study demonstrated that VIs coupled with ML models can be used in multi-stage corn yield prediction at the farm scale, even with a limited number of training data.

1. Introduction

Corn (Zea mays) is among the most important crops that play a vital role in global food security. Along with rice and wheat, corn provides ~30% of the food calories to more than 4.5 billion people worldwide [1]. Corn contributes about 96% of feed grain use in the US [2]. The US is among the leading producers of corn followed by China, Brazil, Argentina, and Ukraine. Agricultural practices such as soil fertility, irrigation, and crop management are aimed at increasing crop yield to meet the demands of the world’s population. Despite significant development in agricultural sectors, over 800 million people continue to experience chronic hunger, as reported by the Food and Agriculture Organization (FAO), and issues like climate change, pandemics, and international conflicts make things even worse [3]. In the context of global food security and demand, a timely and reliable crop yield prediction is crucial in effective decision and policy making [4,5,6,7].
Conventional yield estimation is labor-intensive, time-consuming, costly, and often accomplished only at the end of the season. Remotely sensed data acquired using spaceborne, airborne, and Unmanned Aerial Vehicle (UAV) platforms have been successfully used for crop yield prediction [4,8,9,10,11,12]. The major advantages of remote sensing-based crop yield prediction are that it is reliable, time- and cost-effective, and can be used across different growth stages of crops, facilitating efficient crop management [4,6,13,14]. Furthermore, compared to spaceborne and airborne platforms, UAVs offer a high spatio-temporal resolution, flexible acquisition windows, and less atmospheric attenuation, making them more suitable for crop monitoring and yield prediction at the farm or field scale [12,15,16,17,18]. Spatio-temporal farm-scale crop monitoring and yield prediction are crucial for growers and insurance agencies to make better-informed decisions.
The multispectral sensor mounted on a UAV captures spectral bands within the Visible and Near Infrared (VNIR) range, which are highly effective in deriving various Vegetation Indices (VIs) sensitive to crop health [19,20]. UAV-based multispectral data, coupled with Machine Learning (ML) models, have been effectively used in monitoring and predicting the yield of various crops such as corn [21,22,23,24,25,26,27], wheat [17,28,29], rice [30,31,32,33], soybean [34], cotton [35,36], and others [37,38,39]. ML is a branch of artificial intelligence that offers several advantages over conventional statistical models, such as the ability to learn complex and non-linear relationships, deal with a wide range of variables, achieve higher accuracy, and handle big data [40,41]. ML techniques have also been successfully used in other fields including geology [42,43], geohazards [44], forestry [45], and others [41]. The successful implementation of ML algorithms depends on four crucial steps: suitable training data, relevant variables, hyperparameter optimization, and robust validation approaches [40,42,43,44].
Crop yield prediction has been successfully achieved using various ML algorithms such as Linear Regression (LR), k-Nearest Neighbor (KNN), Support Vector Machine/Regression (SVM/SVR), Decision Trees (DTs), Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), Artificial Neural Networks (ANN), Least Absolute Shrinkage and Selection Operator (LASSO), Gradient Boost Regression Tree (GBRT), and others [7,14,17,46,47,48,49,50]. However, the performance of ML models varies based on several factors, including training data, input variables, crop types, and growth stages. For example, Mupangwa et al. [25] evaluated six ML algorithms (LR, LDA, NB, KNN, CART, and SVM) in corn yield prediction and found that KNN produced the best prediction results whereas SVM produced the poorest prediction accuracy. Croci et al. [50] achieved higher accuracy in corn yield prediction from Gaussian process regression, SVR, and single-layer perceptron feed-forward neural networks as compared to RF, KNN, and cubist regression models. Matsumura et al. [51] obtained higher corn yield prediction accuracy using a Deep Neural Network (DNN) than LR. Kim and Lee [52] obtained more accurate corn yield prediction using deep learning methods as compared to SVR, RF, and extremely randomized trees. Shahhosseini et al. [26] achieved a better yield prediction using extreme gradient boosting as compared to LASSO regression, ridge regression, and RF. Recent studies also reported higher accuracy in crop yield prediction using different deep learning models as compared to classical ML models, but these required a substantially larger amount of training data for model development and validation [53]. Therefore, it is crucial to evaluate the performance of ML models to select the most suitable one for achieving accurate results.
Previous ML-based crop yield prediction studies have utilized a significant number of training samples and a diverse set of variables including VIs, soil nutrients and properties, and meteorological data. However, such comprehensive datasets are often limited to researchers and big consulting companies. This poses a challenge for small farmers and growers as obtaining suitable training data and comprehensive input variables at a farm level can be difficult. Therefore, the main objective of this study was to evaluate the effectiveness of UAV-based VIs for predicting corn yield at vegetative and reproductive growth stages using different ML models with a limited number of training samples. Moreover, we assessed the influence of agronomic treatments on overall yield and their potential impact on yield prediction. We further assessed the influence of the number of variables on ML models’ performance. Lastly, we evaluated the performance of five state-of-the-art ML algorithms (i.e., LR, KNN, RF, SVR, and DNN) for predicting corn yield at vegetative and reproductive growth stages.

2. Materials and Methods

2.1. Materials

Experimental Design and Yield Data Collection

The study was conducted at the United States Department of Agriculture-Agricultural Research Service, Crop Production Systems Research Unit farm in Stoneville, Mississippi. Figure 1 displays the geographical location of the experimental site and the design of agronomic treatments applied in this study. The agronomic treatments used were Austrian Winter Peas (AWP) (Pisum sativum L.) cover crop, biochar, gypsum, and fallow (i.e., left bare) during the non-growing corn season (i.e., October 2020 to April 2021) to assess their impact on the following corn yield. A seeding rate of 67 kg/ha for AWP and quantities of 15 t/ha and 2 t/ha for biochar and gypsum, respectively, were applied in this experiment. The description of treatments is summarized in Table 1. The experiment was designed with plot sizes of 12.50 × 8.50 m and 16 replications in a completely randomized block design. The corn variety Dekalb DKC 62-08 was planted at a seeding rate of 55,352 seeds/ha using a John Deere 1705 Planter at a 1 m row spacing on 7 May 2021. The corn was harvested on 15 September 2021 using a Massey-Ferguson 2065 combine harvester to obtain the yield of each plot.

2.2. Methods

A workflow diagram of the adopted methodology is shown in Figure 2. This section describes the experimental design and yield data collection, UAV data acquisition and processing, derivation of VIs and training data preparation, suitable variable selection, ML implementation, validation, and performance evaluation of ML models in corn yield prediction.

2.2.1. UAV Data Acquisition and Processing

Corn field images were acquired using a DJI Phantom 3 Pro quadcopter UAV (DJI Phantom). A portable Parrot Sequoia multispectral camera was mounted on the UAV; it captures broadband RGB (14 MP) and narrowband green (550 nm, ±40 nm), red (660 nm, ±40 nm), red edge (735 nm, ±10 nm), and NIR (790 nm, ±40 nm) wavelengths. The camera automatically corrects brightness and synchronizes Global Positioning System (GPS) positions with the UAV. The UAV flights were conducted between 10:30 a.m. and 12:00 p.m., as weather permitted, to avoid cloud shadows, at a flight altitude of 30 m above the canopy surface to acquire high-resolution (~3 cm/pixel) image data during the vegetative (V6) (16 June 2021) and reproductive (R5) (9 September 2021) stages. Flight routes were preset using the mission planning tool of Pix4DCapture software, version 4.7.5, with an image front overlap of 80% and side overlap of 70%. The collected images were imported to Pix4DMapper (https://www.pix4d.com, accessed on 15 September 2022) to generate broadband RGB orthomosaics and narrowband green, red, red edge, and NIR orthomosaics, which were orthorectified to correct geometric and vignetting distortion. Orthomosaic images were imported to ArcMap to draw the boundary of each plot based on different treatments. An R script was written to extract the mean values of different spectral bands and VIs within each experimental plot.

2.2.2. Derivation of VIs

VIs optically characterize crop health across growth stages. The acquired multispectral datasets of vegetative and reproductive stages were used in deriving twenty-six different VIs: the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index 2 (EVI2), Soil Adjusted Vegetation Index (SAVI), Modified Soil Adjusted Vegetation Index (MSAVI), Optimized Soil Adjusted Vegetation Index (OSAVI), Transformed Vegetation Index (TVI), Green Normalized Difference Vegetation Index (GNDVI), Normalized Difference Red Edge Index (NDRE), Normalized Green Red Difference Index (NGRDI), Normalized Crop Management Index (NCMI), Simplified Canopy Chlorophyll Content Index (SCCCI), Canopy Chlorophyll Content Index (CCCI), Renormalized Difference Vegetation Index (RDVI), Transformed Chlorophyll Absorption Reflectance Index (TCARI), Normalized Area Vegetation Index (NAVI), Chlorophyll Index Red Edge (CIRE), Modified Triangular Vegetation Index (MTVI), Modified Triangular Vegetation Index 2 (MTVI2), Red Edge Chlorophyll Vegetation Index (RECI), ratio of NIR and green bands (IV1), ratio of NIR and red bands (IV2), ratio of NIR and red edge bands (IV3), Green Chlorophyll Vegetation Index (GCVI), Chlorophyll Vegetation Index (CVI), Napierian logarithm of the red edge (LNRE), and the Chlorophyll Absorption Ratio Index (CARI), which is the ratio of the red edge and red spectral bands. Table 2 provides a list of derived VIs with their mathematical formulas.
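As a minimal sketch of how a few of these variables can be derived from plot-mean band values, the Python function below computes the band ratios named above (IV1, IV2, IV3, CARI, LNRE) together with three normalized-difference indices in their commonly used form. The function name, the reflectance values, and the numerator-first reading of each "ratio of A and B" are assumptions made for illustration; Table 2 remains the authoritative source for the formulas used in the study.

```python
import math


def vegetation_indices(green, red, red_edge, nir):
    """Compute a handful of indices from plot-mean band values.

    Normalized differences use the common (a - b) / (a + b) form; the ratios
    are read numerator-first (an assumption, e.g. IV1 = NIR / green).
    """
    def norm_diff(a, b):
        return (a - b) / (a + b)

    return {
        "NDVI": norm_diff(nir, red),
        "GNDVI": norm_diff(nir, green),
        "NDRE": norm_diff(nir, red_edge),
        "IV1": nir / green,
        "IV2": nir / red,
        "IV3": nir / red_edge,
        "CARI": red_edge / red,
        "LNRE": math.log(red_edge),  # natural (Napierian) logarithm
    }


# Hypothetical plot-mean reflectances, not measurements from the study.
plot = vegetation_indices(green=0.08, red=0.05, red_edge=0.20, nir=0.45)
for name, value in plot.items():
    print(f"{name}: {value:.3f}")
```

Applying such a function to the mean band values of every plot yields one row of candidate predictors per training sample.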
The mean values of spectral bands and VIs were extracted based on each treatment at both stages to prepare training data for corn yield prediction. For each treatment, 16 training samples were obtained with thirty independent variables. To prepare the training data for all treatments, the yield data of each treatment were combined, resulting in 64 training samples with thirty independent variables. The spectral bands and VIs are hereafter referred to as variables for improved readability.

2.2.3. ML Implementation

Five different ML algorithms, namely LR, KNN, SVR, RF, and DNN, were used in this study. We selected these algorithms as they have been widely used in different fields, including crop yield prediction. Furthermore, they represent distinct learning mechanisms with varied model complexity. Comparing these algorithms can provide useful insight into their performance when applied to limited training samples for yield prediction. ML algorithms were implemented using the ‘caret’ package [67] in the R statistical programming language [68] and are briefly described below.

LR

The LR is a simple and interpretable statistical model that describes the linear relationship between dependent and independent variables. LR makes the following assumptions: homogeneity of variance (i.e., training samples have similar variance), training samples are normally distributed and statistically independent, and there is linearity between dependent and independent variables [69].

KNN

The KNN is a non-parametric and computationally efficient ML algorithm that stores available training samples and predicts the dependent variable based on similarity measures (e.g., Euclidean, Manhattan, and Minkowski distance) [70]. Samples with similar characteristics produce a lower distance and vice versa. The implemented KNN algorithm has one tuning hyperparameter, k, which describes the number of neighboring samples considered in prediction.
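The idea can be sketched in a few lines of Python. This is an illustrative implementation using the Euclidean distance and an unweighted mean of the k nearest yields, not the caret routine used in the study; the function name and the sample values are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict yield as the mean yield of the k nearest training samples."""
    # Pair each training yield with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical single-variable samples (e.g. a red edge value per plot)
# and plot yields in Mg/ha.
train_X = [(0.30,), (0.32,), (0.35,), (0.40,), (0.42,)]
train_y = [18.5, 19.0, 19.4, 20.1, 20.3]

pred = knn_predict(train_X, train_y, query=(0.33,), k=3)
print(f"predicted yield: {pred:.2f} Mg/ha")
```

A small k follows local structure closely but is sensitive to noisy plots, whereas a large k smooths predictions toward the overall mean, which is why k is tuned rather than fixed.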

RF

The RF uses a bagging technique where many decision trees (DTs) are developed to obtain an ensemble model for accurate classification or prediction results [70]. The implemented RF algorithm has one hyperparameter, noted as ‘mtry’, which describes the number of input variables randomly selected at each split while developing different DTs.

SVR

The SVR employs the same principle of maximal margin as SVM in the classification task, which enables an optimal hyperplane to be obtained and minimizes the difference between predicted and observed values [70]. The availability of kernel functions, such as linear, polynomial, radial basis function, and sigmoid, facilitates the development of optimal hyperplanes to produce higher accuracy. The employed SVR consists of two hyperparameters: cost and sigma. Cost denotes penalty and sigma indicates the complexity of the hyperplane.

DNN

The typical architecture of a neural network algorithm consists of input, hidden, and output layers [70]. The hidden layer(s) apply a transformation to learn the structure and pattern from the input data. The neural network employs the backpropagation method to allow hidden layer(s) to adjust the weight of neurons to obtain the desired output [70]. The implemented neural network algorithm consists of three hyperparameters, layer1, layer2, and layer3, as hidden layers, and can be called a Deep Neural Network (DNN).

2.2.4. Model Performance Measures

Considering the small number of training samples (i.e., sixteen for each treatment) in this study, which limits a conventional training and testing data split, K-fold cross-validation is an effective approach for evaluating the performance of the different ML models [71]. This approach has been widely used in ML model performance assessment [71,72,73]. In K-fold cross-validation, the complete dataset is split into K folds; at every iteration, a different fold is held out for testing and the remaining folds are used for training to compute the performance measures. We used the coefficient of determination (R²) (Equation (1)) and Root Mean Square Error (RMSE) (Equation (2)) in this study as these are commonly used in assessing the prediction performance of ML models [17,25]. A 5-fold cross-validation with 10 repetitions (i.e., 50 iterations) was used to compute R² and the RMSE (Figure 3) to assess the performance of the different ML models. Five folds were selected so that, with sixteen samples per treatment, each iteration held out three or four samples, leaving roughly 75–80% of the data for training and 20–25% for validation.
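$$R^2 = \frac{\left(\sum_{i=1}^{k} (y_i - \bar{y})(f_i - \bar{f})\right)^2}{\sum_{i=1}^{k} (y_i - \bar{y})^2 \, \sum_{i=1}^{k} (f_i - \bar{f})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (y_i - f_i)^2} \tag{2}$$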
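Equations (1) and (2) can be written out as a short Python sketch, assuming y denotes the observed yields and f the predicted yields (the symbols are not defined in the text, so this reading is an assumption). The function names and example values are illustrative only.

```python
import math


def r_squared(observed, predicted):
    """Equation (1): squared correlation between observed and predicted."""
    n = len(observed)
    y_bar = sum(observed) / n
    f_bar = sum(predicted) / n
    cov = sum((y - y_bar) * (f - f_bar) for y, f in zip(observed, predicted))
    ss_y = sum((y - y_bar) ** 2 for y in observed)
    ss_f = sum((f - f_bar) ** 2 for f in predicted)
    return cov ** 2 / (ss_y * ss_f)


def rmse(observed, predicted):
    """Equation (2): root of the mean squared prediction error."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n)


# Hypothetical yields (Mg/ha): every prediction is exactly 0.5 too high.
observed = [18.0, 19.0, 20.0, 21.0]
predicted = [18.5, 19.5, 20.5, 21.5]

r2 = r_squared(observed, predicted)
err = rmse(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {err:.2f} Mg/ha")
```

In this example R² is 1 while the RMSE is 0.5 Mg/ha: because Equation (1) is a squared correlation, it does not penalize a constant bias, which is one reason to report the two measures together.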

2.2.5. Suitable Variable Selection

Pearson’s correlation coefficient and the Variance Inflation Factor (VIF) were used in this study to remove highly correlated independent variables [74]. The correlation between yield and each independent variable helps assess that variable's relative importance in explaining yield variability, whereas the VIF statistic helps remove highly correlated variables. Correlation values range from −1 to 1. In Equation (3), $r_{xy}$ is the Pearson correlation coefficient between two variables $X$ and $Y$, $X_i$ and $Y_i$ denote the values of $X$ and $Y$ for the i-th sample, and $\bar{X}$ and $\bar{Y}$ represent the means of $X$ and $Y$. We also used a correlation plot to visually interpret the correlation among different variables. Let $X = \{X_1, X_2, X_3, \ldots, X_N\}$ represent the set of independent variables and $R_j^2$ represent the coefficient of multiple determination between $X_j$ and the other variables. The VIF can be computed using Equation (4). A VIF value > 5 indicates high multicollinearity, and variables exceeding this threshold should be discarded as they can negatively impact model performance and interpretability [75].
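$$r_{xy} = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{\sqrt{\sum_{k=1}^{n} (X_k - \bar{X})^2}} \times \frac{Y_i - \bar{Y}}{\sqrt{\sum_{k=1}^{n} (Y_k - \bar{Y})^2}} \tag{3}$$

$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \tag{4}$$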
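The two statistics can be sketched in plain Python as follows. The VIF is obtained here from the diagonal of the inverse of the predictor correlation matrix, which is algebraically equivalent to Equation (4) but is an implementation choice of this sketch, not a description of the authors' R code. All function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Equation (3): Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Two hypothetical predictors with a pairwise correlation of 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(x1, x2)
vifs = vif([x1, x2])
keep = [v <= 5 for v in vifs]  # threshold from the text: discard if VIF > 5
print(f"r = {r:.2f}, VIF = {[round(v, 2) for v in vifs]}, keep = {keep}")
```

With only two predictors, $R_j^2$ reduces to the squared pairwise correlation, so a correlation of 0.8 gives a VIF of about 2.78, which is below the threshold.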

3. Results

3.1. Relationship between Agronomic Treatments and Yield

A boxplot (Figure 4) and analysis of variance (ANOVA) test (Table 3) confirmed that a one-year experiment with different agronomic treatments did not have a significant impact on corn yields. However, the AWP treatment produced the lowest mean yield (i.e., mean = 18.84 Mg/ha), whereas the biochar (i.e., mean = 19.06 Mg/ha), gypsum (i.e., mean = 19.48 Mg/ha), and fallow (i.e., mean = 19.82 Mg/ha) treatments showed slightly higher yields. The AWP treatment showed the lowest variability in yield compared to the other treatments. The yield values of AWP are approximately symmetrically distributed, whereas those of the other treatments are positively skewed.

3.2. Relationship between VIs and Yield

The linear correlation coefficient and VIF statistics were used in selecting suitable variables (i.e., spectral bands and VIs). Variables producing VIF values ≥5 were excluded as these indicate multicollinearity among them. The correlation values between different variables and yield at the V6 and R5 stages of different agronomic treatments are presented in Table 4. The correlation between variables and yield varies greatly based on different treatments and growth stages. Variables show both positive and negative correlations with yield; however, the magnitude of the correlation is of greater interest than its sign. The red edge and GNDVI have the maximum correlation with the yield (i.e., 0.40 and −0.31) at V6 and R5, respectively, for the AWP treatment. The CCCI and red edge have the maximum correlation with yield (i.e., −0.63 and −0.60) at the V6 and R5 stages for the biochar treatment. The RECI and green have the maximum correlation with the yield (i.e., −0.59 and 0.69) at V6 and R5 for the gypsum treatment. The CARI and CVI have the maximum correlation with the yield (i.e., −0.87 and −0.69) at V6 and R5 for the fallow.
Figure 5 and Figure 6 present the correlation between suitable variables and the yield for each treatment at the V6 and R5 stages, respectively. For the AWP treatment, the red edge, NIR, CARI, and CVI are suitable variables at V6 whereas the GNDVI, NCMI, NIR, IV3, and SCCCI are suitable variables at R5. Similarly, for the biochar, the CCCI, green, and NGRDI are suitable variables at V6 whereas red edge, TCARI, and SCCCI are suitable at R5. For the gypsum, the RECI, CCCI, and green are suitable variables at V6 whereas the green, IV3, CVI, and NGRDI are suitable at R5. For the fallow, the CARI and red edge are suitable variables at V6 whereas the CVI, CARI, SCCCI, green, and NDRE are suitable variables at R5. The suitable variables for each treatment at both growth stages were used in assessing the impact of the number of variables on ML model performance.
It is worth noting that the R² between the suitable variables and yield is considerably lower for the AWP than for the other treatments at both growth stages. The R² for the AWP treatment is in the range of 0–0.16 at V6 and 0–0.10 at R5. For the biochar treatment, the R² is in the range of 0.11–0.40 at V6 and 0.10–0.37 at R5. The R² for the gypsum treatment is in the range of 0.28–0.35 at V6 and 0.16–0.48 at R5. For the fallow, the selected variables indicated a higher correlation with yield as compared with other treatments; the R² is in the range of 0.13–0.48 at V6 and 0.01–0.48 at R5. Furthermore, the VIs show a similar correlation at both growth stages across different treatments, indicating their suitability for use in yield prediction irrespective of the growth stage.

3.3. Impact of the Number of Variables on ML Performance

The optimal number of variables is crucial in obtaining accurate results from any ML model. For each treatment, the impact of the number of variables (i.e., one variable, two variables, and so on) on ML performance was assessed to derive the optimal number of variables to use in the final yield prediction at both stages. The hyperparameters of the ML models were optimized using a grid search with 5-fold cross-validation before assessing their performance.
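A grid search with repeated 5-fold cross-validation can be sketched as below for the single KNN hyperparameter k. This is an illustrative pure-Python version under stated assumptions, not the caret procedure used in the study: the data are synthetic, the candidate grid is arbitrary, and the function names are hypothetical.

```python
import math
import random


def knn_predict(train, query, k):
    """Mean yield of the k training samples nearest to the query point."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def k_fold_indices(n, folds, rng):
    """Shuffle sample indices and deal them into disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::folds] for i in range(folds)]


def cv_rmse(X, y, k, folds=5, repeats=10, seed=0):
    """Average per-fold RMSE over repeated k-fold cross-validation."""
    rng = random.Random(seed)  # same seed, so every candidate sees the same folds
    fold_scores = []
    for _ in range(repeats):
        for test_idx in k_fold_indices(len(X), folds, rng):
            held_out = set(test_idx)
            train = [(X[i], y[i]) for i in range(len(X)) if i not in held_out]
            sq_err = [(y[i] - knn_predict(train, X[i], k)) ** 2 for i in test_idx]
            fold_scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / len(fold_scores)


def grid_search(X, y, k_grid):
    """Return the candidate with the lowest cross-validated RMSE."""
    scores = {k: cv_rmse(X, y, k) for k in k_grid}
    return min(scores, key=scores.get), scores


# Synthetic stand-in for one treatment: 16 plots, one predictor, noisy yield.
noise = random.Random(42)
X = [(0.30 + 0.01 * i,) for i in range(16)]
y = [17.0 + 8.0 * x[0] + noise.gauss(0, 0.3) for x in X]

best_k, scores = grid_search(X, y, k_grid=[1, 3, 5, 7])
print("cross-validated RMSE per k:", {k: round(v, 3) for k, v in scores.items()})
print("selected k:", best_k)
```

With 16 samples and five folds, each test fold holds three or four plots, so every candidate value is scored on data that the corresponding model did not see during training.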
Figures 7 and 8 and Tables 5 and 6 present the prediction accuracies of the ML models using a different number of variables at the V6 and R5 growth stages. In most cases, ML models produced the best prediction results using the top one or two variables across different treatments at both stages. The top one or two variables having a higher correlation with yield indicate a higher contribution to predicting yield accurately as compared with variables showing a lesser correlation with yield. The addition of more variables did not improve the model’s performance significantly, except on a few occasions. As compared with other ML models, the DNN shows a higher sensitivity to the number of variables. For example, for the AWP treatment at V6, the R² of DNN increased from 0.58 to 0.67 when the number of variables increased from one to two but showed a decreasing trend in performance when the number of variables increased to three (0.64) or four (0.62).
The best prediction (i.e., R² = 0.67 and RMSE = 1.62 Mg/ha) was achieved by the DNN for the AWP treatment at V6 when developed using two variables (i.e., red edge and NIR). For the biochar treatment at V6, the R² of DNN increased from 0.57 to 0.71 and RMSE decreased from 1.74 to 1.08 Mg/ha when the number of variables increased from one to two, but the model’s performance was reduced when developed using three variables (i.e., R² = 0.67 and RMSE = 1.29 Mg/ha). Similar inferences can be made at the R5 stage across different treatments (Figure 8 and Table 6). For example, the RMSE of DNN for the AWP treatment decreased from 2.03 to 1.17 Mg/ha when developed using five variables. Similarly, for the AWP treatment, the KNN predicted the yield accurately when developed using the top one variable (i.e., GNDVI) with R² = 0.64 and RMSE = 1.13 Mg/ha.

3.4. ML Performance Evaluation in Yield Prediction

The ML models were compared for each treatment at both stages using the optimal number of variables, with which each model achieved its maximum prediction accuracy (Figure 9 and Table 7). The optimal hyperparameter values of the best-performing models are presented in Table 8. Most of the ML models show noticeable differences in yield prediction accuracy. As compared to other ML models, the KNN and SVR produced the most accurate prediction results. The KNN produced the best prediction accuracy for the AWP (R² = 0.69 and RMSE = 1.05 Mg/ha at V6 and R² = 0.64 and RMSE = 1.13 Mg/ha at R5) and gypsum treatments (R² = 0.65 and RMSE = 1.75 Mg/ha at V6 and R² = 0.80 and RMSE = 1.35 Mg/ha at R5). The SVR achieved the best prediction accuracy for the fallow (R² = 0.84 and RMSE = 0.69 Mg/ha at V6 and R² = 0.83 and RMSE = 1.05 Mg/ha at R5) and all treatments (R² = 0.36 and RMSE = 1.48 Mg/ha at V6 and R² = 0.41 and RMSE = 1.41 Mg/ha at R5). The DNN achieved the best prediction accuracy for the biochar (R² = 0.71 and RMSE = 1.08 Mg/ha at V6 and R² = 0.74 and RMSE = 1.27 Mg/ha at R5).
The RF and DNN showed more variability in their performance across treatments than SVR and KNN. The LR produced accuracy comparable to that of the other models on a few occasions. For example, for the fallow treatment at the V6 stage, the LR produced a prediction accuracy similar to that of the best-performing model (i.e., SVR). On a few occasions, the performance of RF and DNN was even poorer than that of LR; for the gypsum treatment at the V6 stage, for instance, the RF and DNN produced worse prediction accuracies than LR.

4. Discussion

Agronomic treatments (i.e., AWP cover crop, biochar, and gypsum) applied from November 2020 to April 2021 in this study did not increase corn yields compared to the fallow treatment after only a year of treatment application (Figure 4 and Table 3). This was attributed to the fact that AWP used nitrogen for growth and the subsequent nitrogen release through mineralization to meet the ensuing corn crop demands may not have synchronized with corn needs in the first year of corn production [76]. Reduced yields due to the use of cover crops and resultant lower availability of nitrogen have been reported elsewhere [77]. Biochar’s high carbon-to-nitrogen ratio results in nitrogen immobilization, reducing the available inorganic nitrogen for the corn crop [78,79]. Leaving the soil bare during the winter fallow period makes the soil vulnerable to erosion, with potential nutrient losses to the environment through surface runoff. The use of gypsum as a soil amendment was meant to mitigate phosphorus losses through enhanced soil aggregation and water infiltration, thus reducing surface runoff [80] and enhancing phosphorus use efficiency. The use of gypsum showed comparable corn yields to traditional fallow practice in the first year of use.
The first and critical step for in-season crop decision-making is yield prediction [81]. VIs are sensitive to photosynthesis and overall crop health, which makes them suitable input variables in crop yield prediction [17,24,31,34,82,83,84]. Historically, sensor-based decision systems have relied on simple mechanistic frameworks using single-variable information, limiting their application [85]. In contrast, we used thirty variables consisting of four spectral bands and twenty-six VIs to explore their effectiveness in explaining corn yield variability for each treatment individually and combined at the V6 and R5 growth stages. Our results indicated that different VIs were suitable for explaining yield variability for each treatment at both stages (Table 4). Like AWP, the all-treatment data (i.e., the combination of all treatments) also indicated a lower correlation with yield. The spatial yield variability was better explained by VIs for the fallow treatment than for the other treatments at both growth stages. Most previous studies have noted a strong relationship between VIs incorporating the red edge wavelength and crop yield as growth stages progress [86]. The red edge, CCCI, RECI, CARI, GNDVI, green, and CVI are among the most suitable variables in explaining the corn yield variability in this study. The suitable variables found in this study were also found to be suitable in explaining crop yield variability in previous studies [87]. For example, Li et al. [61] found the red-edge-based VIs, particularly CCCI, to be effective in estimating summer corn plant N concentration and uptake across different growth stages.
The influence of the number of variables on ML performance is vital in obtaining an accurate and less complex model [44]. A model developed using relatively fewer variables offers a better interpretability of the decision made and is likely to reduce the risk of overfitting. In this study, we assessed the impact of the number of selected variables on model performance (Figure 7 and Figure 8) and developed the best possible yield prediction ML model for each treatment at both growth stages (Figure 9 and Table 7). Most of the ML models achieved best-performing results using the top one or two variables that indicated a higher correlation with yield.
The performance of any ML model depends heavily on its input variables. ML models indicated different levels of prediction accuracy based on different treatments applied during the non-growing corn season (Figure 7 and Figure 8). The differences in ML performance of yield prediction for different treatments are linked with their input variables and correlation with yield. For example, the yield was predicted more accurately for fallow than for the other treatments using all ML models, as its variables indicated a higher correlation with yield. The yield prediction model that grouped AWP, biochar, and gypsum treatments, and the long-term fallow practice, resulted in lower prediction accuracy with R² = 0.36 and RMSE = 1.48 Mg/ha at V6 and R² = 0.41 and RMSE = 1.41 Mg/ha at R5. However, yield prediction models that handled the different treatments in isolation substantially improved the predictive accuracy, with maximum R² values ranging from 0.64 to 0.83 and RMSE values from 1.05 to 1.75 Mg/ha for both the V6 and R5 growth stages. The relatively lower R² values for the AWP and biochar treatments were attributed to treatment effects, namely the use of inorganic nitrogen by AWP for growth and nitrogen immobilization by biochar, both of which lowered the nitrogen available to the corn crop and introduced random variation, since this was the first year of treatment. This research showed that corn yields could be predicted with similar precision at both vegetative (V6) and reproductive (R5) stages and that farmer practice should be considered when implementing predictive models for reliable outcomes.
The comparative analysis of ML models in their optimal configurations demonstrated that KNN and SVR outperformed the other models at both growth stages (Figure 9 and Table 6). Unlike this study, Mupangwa et al. [25] found poor corn yield prediction from SVR but achieved accurate results from KNN. Matsumura et al. [51] achieved the best prediction accuracy using neural networks, but our study found a higher RMSE from neural networks, suggesting overfitting, possibly due to the limited number of samples. The prediction accuracy of RF in this study was not as good as that obtained from SVR and KNN, except on a few occasions, such as for AWP at V6 and fallow at R5. As expected, LR produced a comparable prediction accuracy where the input variables had a higher linear correlation with yield. For example, the prediction accuracy of LR for the fallow treatment is approximately as good as that of the best-performing ML models (i.e., KNN and SVR), but its performance decreases where the input variables do not have a higher linear correlation with yield, as seen for AWP and for all treatments combined at both growth stages. On the other hand, the other ML models produce a higher prediction accuracy than LR even when the correlation with yield is only low to moderate, owing to their better ability to learn complex relationships between dependent and independent variables. Furthermore, the experimental results demonstrated that the selected VIs can be used efficiently in corn yield prediction across different growth stages (i.e., V6 and R5) with reasonably good accuracy (Figure 9 and Table 7).
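The contrast drawn in this paragraph between LR and a nonparametric learner can be illustrated with a toy comparison. The sketch below is not derived from the study's data: the index values and both yield responses are made up, the helper names are hypothetical, and leave-one-out validation is used only because it is deterministic. It fits a one-variable least-squares line and a two-neighbor KNN average to one response that is exactly linear in the index and to one that is curved with almost no linear correlation.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def lr_predict(train_x, train_y, query):
    """Fit the line on the training points and evaluate it at the query."""
    intercept, slope = fit_line(train_x, train_y)
    return intercept + slope * query


def knn_mean(train_x, train_y, query, k=2):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_rmse(x, y, predict):
    """Leave-one-out RMSE: each point is predicted from all the others."""
    sq_err = 0.0
    for i in range(len(x)):
        tx, ty = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        sq_err += (predict(tx, ty, x[i]) - y[i]) ** 2
    return math.sqrt(sq_err / len(x))


# Hypothetical index values for 16 plots and two made-up yield responses (Mg/ha).
index = [i / 15 for i in range(16)]
linear_yield = [8.0 + 4.0 * v for v in index]                 # strong linear correlation
curved_yield = [10.0 + 8.0 * (v - 0.5) ** 2 for v in index]   # near-zero linear correlation

for name, y in [("linear", linear_yield), ("curved", curved_yield)]:
    lr_err = loo_rmse(index, y, lr_predict)
    knn_err = loo_rmse(index, y, knn_mean)
    print(name, round(lr_err, 3), round(knn_err, 3))
```

On the linear response the fitted line is essentially exact and KNN loses accuracy only at the ends of the range; on the curved response the ordering reverses, which is the pattern the paragraph describes.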
This study demonstrated the utility of UAV-based VIs and ML models for predicting corn yield with reasonably good accuracy despite a limited amount of training data and no information on other variables such as soil and weather. Obtaining larger training datasets is practically challenging for small agricultural plots. Future studies should also assess how early corn yield can be predicted with adequate accuracy to support effective crop management practices and decision-making. Furthermore, integrating UAV-based optical and thermal sensors with ground proximal sensing data could improve the prediction results and facilitate a more thorough ground-based validation of crop yield prediction. Future studies can also explore data augmentation approaches to generate synthetic training data and use them in developing different ML and deep learning models for accurate yield prediction and forecasting.

5. Conclusions

This study demonstrated the potential application of UAV-derived Vegetation Indices (VIs) coupled with state-of-the-art Machine Learning (ML) models in corn yield prediction with a limited number of training samples at the farm scale. Spectral bands and VIs including red edge, canopy chlorophyll content index, red edge chlorophyll index, chlorophyll absorption ratio index, green normalized difference vegetation index, green spectral band, and chlorophyll vegetation index showed moderate to good correlation with yield at the V6 and R5 growth stages, indicating their suitability for predicting yield using ML models.
Support Vector Regression (SVR) and k-Nearest Neighbor (KNN) outperformed the other models, namely Linear Regression (LR), Random Forest (RF), and Deep Neural Network (DNN). The performance of DNN was more sensitive to the number of variables used in its development and it often produced a higher RMSE than the other models, mainly due to the limited amount of training data. This study suggests using simpler models like LR or KNN, particularly when the input variables have a strong linear relationship with the dependent variable, because these models are also easier to interpret than more complex black-box ML models.
Corn yield prediction models under different agronomic practices showed higher accuracy when they were management-specific rather than combined. This study highlighted the importance of considering agronomic practices and farming history to enhance models’ predictive performance. This research also showed that corn yields could be predicted with adequate accuracy at both vegetative and reproductive stages, which is vital in time-sensitive decision-making. Thus, this study confirmed that UAV-derived VIs in conjunction with ML models can produce an adequate corn yield prediction even with a limited number of training samples and could be effectively used in better-informed crop management.

Author Contributions

Conceptualization, C.K. and Y.H.; methodology, C.K.; software, C.K.; validation, C.K.; formal analysis, C.K.; investigation, C.K., P.M. and Y.H.; resources, J.D. and K.R.; data curation, Y.H. and P.M.; writing—original draft preparation, C.K.; writing—review and editing, C.K., P.M., Y.H., J.D. and K.R.; visualization, C.K.; supervision, Y.H., J.D. and K.R.; project administration, K.R.; funding acquisition, K.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the U.S. Department of Agriculture-Agriculture Research Service (USDA-ARS), Crop Production Systems Research Unit through project number 6066-22000-089-000D and through USDA-ARS and Mississippi State University non-assistance cooperative agreement number 58-6066-1-020.

Data Availability Statement

The data used in this manuscript are available upon reasonable request to the corresponding author. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.

Acknowledgments

We would like to express our sincere thanks to the anonymous reviewers for their constructive comments and suggestions to improve the original version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shiferaw, B.; Prasanna, B.M.; Hellin, J.; Bänziger, M. Crops that feed the world 6. Past successes and future challenges to the role played by maize in global food security. Food Secur. 2011, 3, 307–327.
  2. McConnell, M. Feedgrains Sector at a Glance; USDA Economic Research Service US Department of Agriculture: Washington, DC, USA, 2022.
  3. World Health Organization. The State of Food Security and Nutrition in the World 2021: Transforming Food Systems for Food Security, Improved Nutrition and Affordable Healthy Diets for All; Food & Agriculture Organization: Rome, Italy, 2021; Volume 2021.
  4. Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of remote sensing in precision agriculture: A review. Remote Sens. 2020, 12, 3136.
  5. Kamath, P.; Patil, P.; Shrilatha, S.; Sowmya, S. Crop yield forecasting using data mining. Glob. Transit. Proc. 2021, 2, 402–407.
  6. Surya, P.; Aroquiaraj, I.L. Crop yield prediction in agriculture using data mining predictive analytic techniques. Int. J. Res. Anal. Rev. 2018, 5, 783–787.
  7. Bala, A. Machine Learning Approaches for Crop Yield Prediction-Review. Int. J. Comput. Eng. Technol. 2020, 11, 23–27.
  8. Sagar, B.; Cauvery, N. Agriculture data analytics in crop yield estimation: A critical review. Indones. J. Electr. Eng. Comput. Sci. 2018, 12, 1087–1093.
  9. Wójtowicz, M.; Wójtowicz, A.; Piekarczyk, J. Application of remote sensing methods in agriculture. Commun. Biometry Crop Sci. 2016, 11, 31–50.
  10. Ali, A.M.; Abouelghar, M.; Belal, A.A.; Saleh, N.; Yones, M.; Selim, A.I.; Amin, M.E.S.; Elwesemy, A.; Kucher, D.E.; Maginan, S.; et al. Crop Yield Prediction Using Multi Sensors Remote Sensing (Review Article). Egypt. J. Remote Sens. Space Sci. 2022, 25, 711–716.
  11. Yang, C. A high-resolution airborne four-camera imaging system for agricultural remote sensing. Comput. Electron. Agric. 2012, 88, 13–24.
  12. Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A review on UAV-based applications for precision agriculture. Information 2019, 10, 349.
  13. Wang, T.; Xu, X.; Wang, C.; Li, Z.; Li, D. From smart farming towards unmanned farms: A new mode of agricultural production. Agriculture 2021, 11, 145.
  14. Ji, Z.; Pan, Y.; Zhu, X.; Zhang, D.; Dai, J. Prediction of Corn Yield in the USA Corn Belt Using Satellite Data and Machine Learning: From an Evapotranspiration Perspective. Agriculture 2022, 12, 1263.
  15. Maes, W.H.; Steppe, K. Perspectives for remote sensing with unmanned aerial vehicles in precision agriculture. Trends Plant Sci. 2019, 24, 152–164.
  16. Rokhmana, C.A. The potential of UAV-based remote sensing for supporting precision agriculture in Indonesia. Procedia Environ. Sci. 2015, 24, 245–253.
  17. Bian, C.; Shi, H.; Wu, S.; Zhang, K.; Wei, M.; Zhao, Y.; Sun, Y.; Zhuang, H.; Zhang, X.; Chen, S. Prediction of field-scale wheat yield using machine learning method and multi-spectral UAV data. Remote Sens. 2022, 14, 1474.
  18. Hussain, N.; Sarfraz, S.; Javed, S. A Systematic Review on Crop-Yield Prediction through Unmanned Aerial Vehicles. In Proceedings of the 2021 16th International Conference on Emerging Technologies (ICET), Islamabad, Pakistan, 22–23 December 2021; pp. 1–9.
  19. Huang, Y.; Thomson, S.J.; Hoffmann, W.C.; Lan, Y.; Fritz, B.K. Development and prospect of unmanned aerial vehicle technologies for agricultural production management. Int. J. Agric. Biol. Eng. 2013, 6, 1–10.
  20. Yang, G.; Liu, J.; Zhao, C.; Li, Z.; Huang, Y.; Yu, H.; Xu, B.; Yang, X.; Zhu, D.; Zhang, X. Unmanned aerial vehicle remote sensing for field-based crop phenotyping: Current status and perspectives. Front. Plant Sci. 2017, 8, 1111.
  21. Ma, Y.; Zhang, Z.; Kang, Y.; Özdoğan, M. Corn yield prediction and uncertainty analysis based on remotely sensed variables using a Bayesian neural network approach. Remote Sens. Environ. 2021, 259, 112408.
  22. Shahhosseini, M.; Hu, G.; Khaki, S.; Archontoulis, S.V. Corn yield prediction with ensemble CNN-DNN. Front. Plant Sci. 2021, 12, 709008.
  23. Buthelezi, S.; Mutanga, O.; Sibanda, M.; Odindi, J.; Clulow, A.D.; Chimonyo, V.G.; Mabhaudhi, T. Assessing the prospects of remote sensing maize leaf area index using UAV-derived multi-spectral data in smallholder farms across the growing season. Remote Sens. 2023, 15, 1597.
  24. Pipatsitee, P.; Tisarum, R.; Taota, K.; Samphumphuang, T.; Eiumnoh, A.; Singh, H.P.; Cha-Um, S. Effectiveness of vegetation indices and UAV-multispectral imageries in assessing the response of hybrid maize (Zea mays L.) to water deficit stress under field environment. Environ. Monit. Assess. 2022, 195, 128.
  25. Mupangwa, W.; Chipindu, L.; Nyagumbo, I.; Mkuhlani, S.; Sisito, G. Evaluating machine learning algorithms for predicting maize yield under conservation agriculture in Eastern and Southern Africa. SN Appl. Sci. 2020, 2, 952.
  26. Shahhosseini, M.; Martinez-Feria, R.A.; Hu, G.; Archontoulis, S.V. Maize yield and nitrate loss prediction with machine learning algorithms. Environ. Res. Lett. 2019, 14, 124026.
  27. Danilevicz, M.F.; Bayer, P.E.; Boussaid, F.; Bennamoun, M.; Edwards, D. Maize yield prediction at an early developmental stage using multispectral images and genotype data for preliminary hybrid selection. Remote Sens. 2021, 13, 3976.
  28. Dhaka, V.; Vikas, L. Wheat yield prediction using artificial neural network and crop prediction techniques (a survey). IJRASET 2014, 2, 330–341.
  29. Tanabe, R.; Matsui, T.; Tanaka, T.S.T. Winter wheat yield prediction using convolutional neural networks and UAV-based multispectral imagery. Field Crops Res. 2023, 291, 108786.
  30. Bascon, M.V.; Nakata, T.; Shibata, S.; Takata, I.; Kobayashi, N.; Kato, Y.; Inoue, S.; Doi, K.; Murase, J.; Nishiuchi, S. Estimating Yield-Related Traits Using UAV-Derived Multispectral Images to Improve Rice Grain Yield Prediction. Agriculture 2022, 12, 1141.
  31. Wan, L.; Cen, H.; Zhu, J.; Zhang, J.; Zhu, Y.; Sun, D.; Du, X.; Zhai, L.; Weng, H.; Li, Y. Grain yield prediction of rice using multi-temporal UAV-based RGB and multispectral images and model transfer—A case study of small farmlands in the South of China. Agric. For. Meteorol. 2020, 291, 108096.
  32. Zhou, X.; Zheng, H.; Xu, X.; He, J.; Ge, X.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Predicting grain yield in rice using multi-temporal vegetation indices from UAV-based multispectral and digital imagery. ISPRS J. Photogramm. Remote Sens. 2017, 130, 246–255.
  33. Tian, L.; Wang, C.; Li, H.; Sun, H. Yield prediction model of rice and wheat crops based on ecological distance algorithm. Environ. Technol. Innov. 2020, 20, 101132.
  34. da Silva, E.E.; Baio, F.H.R.; Teodoro, L.P.R.; da Silva Junior, C.A.; Borges, R.S.; Teodoro, P.E. UAV-multispectral and vegetation indices in soybean grain yield prediction based on in situ observation. Remote Sens. Appl. Soc. Environ. 2020, 18, 100318.
  35. Siegfried, J.; Adams, C.B.; Rajan, N.; Hague, S.; Schnell, R.; Hardin, R. Combining a cotton ‘Boll Area Index’ with in-season unmanned aerial multispectral and thermal imagery for yield estimation. Field Crops Res. 2023, 291, 108765.
  36. Ashapure, A.; Jung, J.; Chang, A.; Oh, S.; Yeom, J.; Maeda, M.; Maeda, A.; Dube, N.; Landivar, J.; Hague, S.; et al. Developing a machine learning based cotton yield estimation framework using multi-temporal UAS data. ISPRS J. Photogramm. Remote Sens. 2020, 169, 180–194.
  37. He, L.; Fang, W.; Zhao, G.; Wu, Z.; Fu, L.; Li, R.; Majeed, Y.; Dhupia, J. Fruit yield prediction and estimation in orchards: A state-of-the-art comprehensive review for both direct and indirect methods. Comput. Electron. Agric. 2022, 195, 106812.
  38. de Oliveira, R.P.; Barbosa Júnior, M.R.; Pinto, A.A.; Oliveira, J.L.P.; Zerbato, C.; Furlani, C.E.A. Predicting Sugarcane Biometric Parameters by UAV Multispectral Images and Machine Learning. Agronomy 2022, 12, 1992.
  39. Akbarian, S.; Xu, C.; Wang, W.; Ginns, S.; Lim, S. Sugarcane yields prediction at the row level using a novel cross-validation approach to multi-year multispectral images. Comput. Electron. Agric. 2022, 198, 107024.
  40. Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 2007; Volume 1.
  41. Mjolsness, E.; DeCoste, D. Machine learning for science: State of the art and future prospects. Science 2001, 293, 2051–2055.
  42. Kumar, C.; Chatterjee, S.; Oommen, T.; Guha, A. Automated lithological mapping by integrating spectral enhancement techniques and machine learning algorithms using AVIRIS-NG hyperspectral data in Gold-bearing granite-greenstone rocks in Hutti, India. Int. J. Appl. Earth Obs. Geoinf. 2020, 86, 102006.
  43. Kumar, C.; Chatterjee, S.; Oommen, T.; Guha, A.; Mukherjee, A. Multi-sensor datasets-based optimal integration of spectral, textural, and morphological characteristics of rocks for lithological classification using machine learning models. Geocarto Int. 2022, 37, 6004–6032.
  44. Kumar, C.; Walton, G.; Santi, P.; Luza, C. An Ensemble Approach of Feature Selection and Machine Learning Models for Regional Landslide Susceptibility Mapping in the Arid Mountainous Terrain of Southern Peru. Remote Sens. 2023, 15, 1376.
  45. Bhatt, P.; Maclean, A.; Dickinson, Y.; Kumar, C. Fine-Scale Mapping of Natural Ecological Communities Using Machine Learning Approaches. Remote Sens. 2022, 14, 563.
  46. Santana, D.C.; Teodoro, L.P.R.; Baio, F.H.R.; Santos, R.G.d.; Coradi, P.C.; Biduski, B.; da Silva Junior, C.A.; Teodoro, P.E.; Shiratsuchi, L.S. Classification of soybean genotypes for industrial traits using UAV multispectral imagery and machine learning. Remote Sens. Appl. Soc. Environ. 2023, 29, 100919.
  47. Rashid, M.; Bari, B.S.; Yusup, Y.; Kamaruddin, M.A.; Khan, N. A comprehensive review of crop yield prediction using machine learning approaches with special emphasis on palm oil yield prediction. IEEE Access 2021, 9, 63406–63439.
  48. Shahhosseini, M.; Hu, G.; Huber, I.; Archontoulis, S.V. Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt. Sci. Rep. 2021, 11, 1606.
  49. Van Klompenburg, T.; Kassahun, A.; Catal, C. Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. 2020, 177, 105709.
  50. Croci, M.; Impollonia, G.; Meroni, M.; Amaducci, S. Dynamic Maize Yield Predictions Using Machine Learning on Multi-Source Data. Remote Sens. 2022, 15, 100.
  51. Matsumura, K.; Gaitan, C.F.; Sugimoto, K.; Cannon, A.J.; Hsieh, W.W. Maize yield forecasting by linear regression and artificial neural networks in Jilin, China. J. Agric. Sci. 2015, 153, 399–410.
  52. Kim, N.; Lee, Y.-W. Machine learning approaches to corn yield estimation using satellite images and climate data: A case of Iowa State. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2016, 34, 383–390.
  53. Shen, Y.; Mercatoris, B.; Cao, Z.; Kwan, P.; Guo, L.; Yao, H.; Cheng, Q. Improving Wheat Yield Prediction Accuracy Using LSTM-RF Framework Based on UAV Thermal Infrared and Multispectral Imagery. Agriculture 2022, 12, 892.
  54. Liebman, A.M.; Grossman, J.; Brown, M.; Wells, M.S.; Reberg-Horton, S.; Shi, W. Legume cover crops and tillage impact nitrogen dynamics in organic corn production. Agron. J. 2018, 110, 1046–1057.
  55. DeLaune, P.; Mubvumba, P. Winter cover crop production and water use in Southern Great Plains cotton. Agron. J. 2020, 112, 1943–1951.
  56. Syuhada, A.B.; Shamshuddin, J.; Fauziah, C.; Rosenani, A.; Arifin, A. Biochar as soil amendment: Impact on chemical properties and corn nutrient uptake in a Podzol. Can. J. Soil Sci. 2016, 96, 400–412.
  57. Rogovska, N.; Laird, D.A.; Karlen, D.L. Corn and soil response to biochar application and stover harvest. Field Crops Res. 2016, 187, 96–106.
  58. Prakash, N.B.; Dhumgond, P.; Shruthi; Chikkaramappa, T.; Ashrit, S. Performance of slag-based gypsum on maize yield and available soil nutrients over commercial gypsum under acidic and neutral soil. Commun. Soil Sci. Plant Anal. 2020, 51, 1780–1798.
  59. Bossolani, J.W.; Crusciol, C.A.C.; Merloti, L.F.; Moretti, L.G.; Costa, N.R.; Tsai, S.M.; Kuramae, E.E. Long-term lime and gypsum amendment increase nitrogen fixation and decrease nitrification and denitrification gene abundances in the rhizosphere and soil in a tropical no-till intercropping system. Geoderma 2020, 375, 114476.
  60. Ballester, C.; Brinkhoff, J.; Quayle, W.C.; Hornbuckle, J. Monitoring the Effects of Water Stress in Cotton Using the Green Red Vegetation Index and Red Edge Ratio. Remote Sens. 2019, 11, 873.
  61. Li, F.; Miao, Y.; Feng, G.; Yuan, F.; Yue, S.; Gao, X.; Liu, Y.; Liu, B.; Ustin, S.L.; Chen, X. Improving estimation of summer maize nitrogen status with red edge-based spectral vegetation indices. Field Crops Res. 2014, 157, 111–123.
  62. Venancio, L.P.; Mantovani, E.C.; do Amaral, C.H.; Neale, C.M.U.; Gonçalves, I.Z.; Filgueiras, R.; Eugenio, F.C. Potential of using spectral vegetation indices for corn green biomass estimation based on their relationship with the photosynthetic vegetation sub-pixel fraction. Agric. Water Manag. 2020, 236, 106155.
  63. Zhang, L.; Zhang, Z.; Luo, Y.; Cao, J.; Xie, R.; Li, S. Integrating satellite-derived climatic and vegetation indices to predict smallholder maize yield using deep learning. Agric. For. Meteorol. 2021, 311, 108666.
  64. San Bautista, A.; Fita, D.; Franch, B.; Castiñeira-Ibáñez, S.; Arizo, P.; Sánchez-Torres, M.J.; Becker-Reshef, I.; Uris, A.; Rubio, C. Crop Monitoring Strategy Based on Remote Sensing Data (Sentinel-2 and Planet), Study Case in a Rice Field after Applying Glycinebetaine. Agronomy 2022, 12, 708.
  65. Hassan, M.A.; Yang, M.; Rasheed, A.; Jin, X.; Xia, X.; Xiao, Y.; He, Z. Time-series multispectral indices from unmanned aerial vehicle imagery reveal senescence rate in bread wheat. Remote Sens. 2018, 10, 809.
  66. Qiao, L.; Tang, W.; Gao, D.; Zhao, R.; An, L.; Li, M.; Sun, H.; Song, D. UAV-based chlorophyll content estimation by evaluating vegetation index responses under different crop coverages. Comput. Electron. Agric. 2022, 196, 106775.
  67. Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; R Core Team. Package ‘caret’. R J. 2020, 223, 1–224.
  68. Ihaka, R.; Gentleman, R. R: A language for data analysis and graphics. J. Comput. Graph. Stat. 1996, 5, 299–314.
  69. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2021.
  70. Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2020.
  71. Wong, T.-T.; Yeh, P.-Y. Reliable accuracy estimates from k-fold cross validation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1586–1594.
  72. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146.
  73. Wong, T.-T.; Yang, N.-Y. Dependency analysis of accuracy estimates in k-fold cross validation. IEEE Trans. Knowl. Data Eng. 2017, 29, 2417–2427.
  74. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42.
  75. Kim, J.H. Multicollinearity and misleading statistical results. Korean J. Anesthesiol. 2019, 72, 558–569.
  76. Jani, A.D.; Grossman, J.; Smyth, T.J.; Hu, S. Winter legume cover-crop root decomposition and N release dynamics under disking and roller-crimping termination approaches. Renew. Agric. Food Syst. 2016, 31, 214–229.
  77. Parr, M.; Grossman, J.; Reberg-Horton, S.; Brinton, C.; Crozier, C. Nitrogen delivery from legume cover crops in no-till organic corn production. Agron. J. 2011, 103, 1578–1590.
  78. Bruun, E.W.; Ambus, P.; Egsgaard, H.; Hauggaard-Nielsen, H. Effects of slow and fast pyrolysis biochar on soil C and N turnover dynamics. Soil Biol. Biochem. 2012, 46, 73–79.
  79. Nelissen, V.; Rütting, T.; Huygens, D.; Staelens, J.; Ruysschaert, G.; Boeckx, P. Maize biochars accelerate short-term soil nitrogen dynamics in a loamy sand soil. Soil Biol. Biochem. 2012, 55, 20–27.
  80. Kaur, H.; Williard, K.W.; Schoonover, J.E.; Singh, G. Impact of Flue Gas Desulfurization Gypsum Applications to Corn-Soybean Plots on Surface Runoff Water Quality. Water Air Soil Pollut. 2022, 233, 72.
  81. Dhillon, J.; Aula, L.; Eickhoff, E.; Raun, W. Predicting in-season maize (Zea mays L.) yield potential using crop sensors and climatological data. Sci. Rep. 2020, 10, 11479.
  82. Mwinuka, P.R.; Mourice, S.K.; Mbungu, W.B.; Mbilinyi, B.P.; Tumbo, S.D.; Schmitter, P. UAV-based multispectral vegetation indices for assessing the interactive effects of water and nitrogen in irrigated horticultural crops production under tropical sub-humid conditions: A case of African eggplant. Agric. Water Manag. 2022, 266, 107516.
  83. Nevavuori, P.; Narra, N.; Linna, P.; Lipping, T. Crop yield prediction using multitemporal UAV data and spatio-temporal deep learning models. Remote Sens. 2020, 12, 4000.
  84. Zhou, X.; Kono, Y.; Win, A.; Matsui, T.; Tanaka, T.S.T. Predicting within-field variability in grain yield and protein content of winter wheat using UAV-based multispectral imagery and machine learning approaches. Plant Prod. Sci. 2020, 24, 137–151.
  85. Colaço, A.; Richetti, J.; Bramley, R.; Lawes, R. How will the next-generation of sensor-based decision systems look in the context of intelligent agriculture? A case-study. Field Crops Res. 2021, 270, 108205.
  86. Oglesby, C.; Fox, A.A.; Singh, G.; Dhillon, J. Predicting In-Season Corn Grain Yield Using Optical Sensors. Agronomy 2022, 12, 2402.
  87. Sumner, Z.; Varco, J.J.; Dhillon, J.S.; Fox, A.A.; Czarnecki, J.; Henry, W.B. Ground versus aerial canopy reflectance of corn: Red-edge and non-red edge vegetation indices. Agron. J. 2021, 113, 2782–2797.
Figure 1. Experimental location and design showing the different agronomic treatments, namely Austrian Winter Peas (AWP) cover crop, biochar, gypsum, and fallow, applied in this experiment. The vector file of treatments is overlaid on the False Color Composite (FCC) derived from UAV multispectral data.
Figure 2. A flowchart diagram presenting the methodology adopted in this study.
Figure 3. A schematic diagram of a 5-fold cross-validation method for computing the performance measures of different ML models. PMs: performance measures.
Figure 4. A boxplot of different agronomic treatments and total corn yield.
Figure 5. Correlation between suitable variables and yield for each treatment at the V6 stage.
Figure 6. Correlation between suitable variables and corn yield for each treatment at the R5 stage.
Figure 7. Impact of the number of suitable variables on ML models’ performance in yield prediction at the V6 stage.
Figure 8. Impact of the number of suitable variables on ML models’ performance in yield prediction at the R5 stage.
Figure 9. Best-performing ML models for each treatment at the V6 and R5 growth stages.
Table 1. Specification and description of agronomic treatments applied in this experiment.

| Treatments | Specification | Description |
| --- | --- | --- |
| AWP | Seeding date = 6 November 2020; Seeding rate = 67 kg/ha; Harvesting date = 29 April 2021 | AWP is a legume cover crop that can improve soil health, thus enhancing the soil’s physical, chemical, and biological properties. It improves nutrient cycling, increases nitrogen, sequesters carbon, and enhances soil aggregation, water infiltration, storage capacity, and use efficiency [54,55]. |
| Biochar | Quantity = 15 t/ha; Application date = 12 November 2020 | Biochar is a soil amendment. It is made from sugarcane bagasse (sugarcane stalk residue after juice extraction). It increases soil organic carbon, soil pH, and microbial activity, improving soil structure, soil porosity, soil water holding capacity, cation exchange capacity, nutrient cycling, and plant growth and yields [56,57]. |
| Gypsum | Quantity = 2 t/ha; Application date = 20 November 2020 | Gypsum (calcium sulfate) is a soil amendment. It improves soil fertility by increasing sulfur, phosphorus, calcium, magnesium, and manganese, and enhances plant nitrogen use efficiency [58,59]. |
| Fallow | Left bare during the winter period from November 2020 to April 2021. | This is the common practice in the Delta, Mississippi between harvest and planting. The soil is exposed to erosion with essential nutrients susceptible to losses through leaching or runoff. |
Table 2. Implemented Vegetation Indices (VIs) and their mathematical equations. G: green, R: red, RE: red edge, and NIR: near-infrared spectral bands of UAV multispectral data.

| VIs | Equation | References |
| --- | --- | --- |
| CARI | $CARI = RE / R$ | [60] |
| CCCI | $CCCI = (NDRE - NDRE_{min}) / (NDRE_{max} - NDRE_{min})$ | [61] |
| CIRE | $CIRE = NIR / (RE - 1)$ | [31] |
| CVI | $CVI = NIR \times (R / G^{2})$ | [62] |
| EVI2 | $EVI2 = 2.5 (NIR - R) / (NIR + 2.4 R + 1)$ | [17] |
| GCVI | $GCVI = NIR / G - 1$ | [63] |
| GNDVI | $GNDVI = (NIR - G) / (NIR + G)$ | [34] |
| IV1 | $IV1 = NIR / G$ | [34] |
| IV2 | $IV2 = NIR / R$ | [34] |
| IV3 | $IV3 = NIR / RE$ | [34] |
| LNRE | $LNRE = 100 (\ln NIR - \ln R)$ | [34] |
| MSAVI2 | $MSAVI2 = 0.5 \left(2 NIR + 1 - \sqrt{(2 NIR)^{2} - 8 (NIR - R)}\right)$ | [34] |
| MTVI | $MTVI = 1.2 \left[1.2 (NIR - G) - 2.5 (R - G)\right]$ | [5] |
| MTVI2 | $MTVI2 = \dfrac{1.5 \left[1.2 (NIR - G) - 2.5 (R - G)\right]}{\sqrt{(2 NIR + 1)^{2} - (6 NIR - 5 \sqrt{R}) - 0.5}}$ | [5] |
| NAVI | $NAVI = (NIR - R) / NIR$ | [62] |
| NCMI | $NCMI = \left(NIR - (G + R)\right) / \left(NIR + (G + R)\right)$ | [64] |
| NDRE | $NDRE = (NIR - RE) / (NIR + RE)$ | [34] |
| NDVI | $NDVI = (NIR - R) / (NIR + R)$ | [34] |
| NGRDI | $NGRDI = (G - R) / (G + R)$ | [65] |
| OSAVI | $OSAVI = 1.16 (NIR - R) / (NIR + R + 0.16)$ | [66] |
| RDVI | $RDVI = (NIR - R) / (NIR + R)^{0.5}$ | [32] |
| RECI | $RECI = (NIR - R) / R$ | [65] |
| SAVI | $SAVI = 1.5 (NIR - R) / (NIR + R + 0.5)$ | [34] |
| SCCCI | $SCCCI = NDVI / NDRE$ | [34] |
| TCARI | $TCARI = 3 \left[(RE - R) - 0.2 (RE - G)(RE / R)\right]$ | [5] |
| TVI | $TVI = 0.5 \left[120 (RE - G) - 200 (R - G)\right]$ | [5] |
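As a worked illustration of Table 2, the sketch below evaluates a subset of the listed indices for a single plot and applies the min-max normalization used by CCCI across several plots. It is an assumption-laden example rather than the authors' processing chain: the function names `vegetation_indices` and `ccci` are hypothetical, the reflectance values are invented, and only indices whose conventional form is unambiguous are included.

```python
def vegetation_indices(g, r, re, nir):
    """Compute several Table 2 indices from band reflectances of one plot.

    g, r, re, nir are the green, red, red edge, and near-infrared reflectances.
    """
    return {
        "NDVI": (nir - r) / (nir + r),
        "GNDVI": (nir - g) / (nir + g),
        "NDRE": (nir - re) / (nir + re),
        "NGRDI": (g - r) / (g + r),
        "GCVI": nir / g - 1.0,
        "CVI": nir * r / g ** 2,
        "EVI2": 2.5 * (nir - r) / (nir + 2.4 * r + 1.0),
        "SAVI": 1.5 * (nir - r) / (nir + r + 0.5),
        "OSAVI": 1.16 * (nir - r) / (nir + r + 0.16),
        "IV3": nir / re,
    }


def ccci(ndre_values):
    """Min-max normalize NDRE across plots, as in the CCCI row of Table 2."""
    lo, hi = min(ndre_values), max(ndre_values)
    return [(v - lo) / (hi - lo) for v in ndre_values]


# Hypothetical mean reflectances for one plot (not taken from the study).
vi = vegetation_indices(g=0.08, r=0.05, re=0.20, nir=0.45)
for name, value in vi.items():
    print(f"{name}: {value:.3f}")

# Hypothetical NDRE values for three plots.
print(ccci([0.2, 0.3, 0.4]))
```

Because CCCI depends on the minimum and maximum NDRE of the plots being compared, it has to be computed after NDRE is available for the whole set, unlike the other indices, which are evaluated plot by plot.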
Table 3. One-way ANOVA test statistics of different agronomic treatments.

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | Pr(>F) |
| --- | --- | --- | --- | --- | --- |
| Treatment | 3 | 9.18 | 3.06 | 0.98 | 0.41 |
| Residuals | 60 | 187.52 | 3.13 | | |
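The arithmetic behind Table 3 can be checked directly: each mean square is the sum of squares divided by its degrees of freedom, and the F-value is the ratio of the treatment mean square to the residual mean square. The sketch below reproduces those figures and also shows how the same quantities would be obtained from raw groups. The function names and the three small groups are hypothetical, and the p-value (which requires the F distribution) is not computed here.

```python
def f_from_sums(ss_treat, df_treat, ss_resid, df_resid):
    """Return (treatment mean square, residual mean square, F-value)."""
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid
    return ms_treat, ms_resid, ms_treat / ms_resid


def one_way_anova(groups):
    """Compute the same three quantities from raw observations per group."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return f_from_sums(ss_between, df_between, ss_within, df_within)


# Values taken from Table 3.
ms_treat, ms_resid, f_value = f_from_sums(9.18, 3, 187.52, 60)
print(round(ms_treat, 2), round(ms_resid, 2), round(f_value, 2))  # 3.06 3.13 0.98

# Hypothetical toy groups, only to exercise the raw-data path.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

An F-value close to 1 means the variation between treatment means is about the same size as the variation within treatments, which is consistent with the negligible treatment effect on yield reported for this one-year experiment.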
Table 4. Correlation matrix of suitable variables (i.e., spectral bands and vegetation indices) and yield of different treatments at vegetative (V6) and reproductive (R5) stages. The bold font indicates the highest correlation between the variable and yield.

AWP, vegetative (V6)
Variable | Rede | NIR | CVI | CARI
Yield | 0.40 | 0.32 | 0.02 | 0.14
Rede | 1.00 | 0.60 | 0.28 | 0.73
NIR | | 1.00 | 0.61 | 0.69
CVI | | | 1.00 | 0.69

AWP, reproductive (R5)
Variable | NIR | SCCCI | NCMI | IV3 | GNDVI
Yield | −0.17 | −0.08 | −0.25 | 0.11 | −0.31
NIR | 1.00 | −0.10 | 0.65 | 0.05 | 0.71
SCCCI | | 1.00 | 0.19 | −0.23 | −0.19
NCMI | | | 1.00 | −0.44 | 0.74
IV3 | | | | 1.00 | −0.34

Biochar, vegetative (V6)
Variable | Green | NGRDI | CCCI
Yield | 0.45 | −0.33 | −0.63
Green | 1.00 | −0.61 | −0.16
NGRDI | | 1.00 | −0.03

Biochar, reproductive (R5)
Variable | Rede | TCARI | SCCCI
Yield | −0.60 | −0.42 | −0.31
Rede | 1.00 | 0.74 | 0.49
TCARI | | 1.00 | 0.60

Gypsum, vegetative (V6)
Variable | Green | RECI | CCCI
Yield | 0.53 | −0.59 | −0.57
Green | 1.00 | −0.67 | −0.43
RECI | | 1.00 | 0.39

Gypsum, reproductive (R5)
Variable | Green | NGRDI | IV3 | CVI
Yield | 0.69 | 0.40 | 0.53 | −0.50
Green | 1.00 | 0.33 | 0.71 | −0.65
NGRDI | | 1.00 | −0.18 | −0.73
IV3 | | | 1.00 | −0.22

Fallow, vegetative (V6)
Variable | Rede | CARI
Yield | −0.47 | −0.87
Rede | 1.00 | 0.65

Fallow, reproductive (R5)
Variable | Green | SCCCI | NDRE | CVI | CARI
Yield | 0.36 | 0.41 | −0.11 | −0.69 | 0.47
Green | 1.00 | −0.32 | 0.19 | −0.13 | −0.23
SCCCI | | 1.00 | −0.04 | −0.13 | 0.39
NDRE | | | 1.00 | 0.41 | −0.47
CVI | | | | 1.00 | −0.72

All treatments, vegetative (V6)
Variable | Green | MTVI2 | CCCI
Yield | 0.51 | −0.36 | −0.28
Green | 1.00 | −0.67 | −0.22
MTVI2 | | 1.00 | 0.16

All treatments, reproductive (R5)
Variable | CVI | GCVI | Green | IV3
Yield | −0.40 | −0.39 | 0.35 | 0.20
CVI | 1.00 | 0.41 | −0.24 | 0.27
GCVI | | 1.00 | −0.39 | −0.23
Green | | | 1.00 | 0.49
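Each entry of a correlation matrix such as Table 4 is a pairwise coefficient between two plot-level variables. The sketch below assumes the Pearson coefficient, which the table caption does not state explicitly; the helper `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical plot-level values (not from the study).
green = [0.071, 0.080, 0.076, 0.090, 0.085, 0.094]
yield_mg_ha = [9.8, 10.9, 10.1, 12.0, 11.2, 12.6]

r = pearson_r(green, yield_mg_ha)
print(f"r(Green, yield) = {r:.2f}")
```

Applying the same function to every pair of candidate variables gives the upper-triangular layout of the table, where a high coefficient between two predictors signals redundancy between them.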
Table 5. Impact of the number of suitable variables on ML models’ performance for each treatment at the V6 growth stage. The bold font indicates the best accuracy achieved by the ML models for each treatment.
Columns are grouped by the number of suitable variables (one to four); RMSE is in Mg/ha.
Model | R² (one) | RMSE (one) | R² (two) | RMSE (two) | R² (three) | RMSE (three) | R² (four) | RMSE (four)
AWP
LR | 0.62 | 1.12 | 0.55 | 1.21 | 0.53 | 1.18 | 0.52 | 1.32
RF | 0.68 | 1.02 | 0.66 | 1.09 | 0.62 | 1.09 | 0.57 | 1.12
KNN | 0.69 | 1.05 | 0.65 | 1.13 | 0.61 | 1.23 | 0.58 | 1.85
SVR | 0.65 | 1.17 | 0.58 | 1.39 | 0.53 | 1.41 | 0.64 | 2.45
DNN | 0.58 | 1.57 | 0.67 | 1.62 | 0.64 | 1.67 | 0.62 | 1.83
Biochar
LR | 0.59 | 1.44 | 0.55 | 1.59 | 0.54 | 1.70
RF | 0.63 | 1.27 | 0.61 | 1.35 | 0.62 | 1.42
KNN | 0.64 | 1.70 | 0.66 | 1.20 | 0.68 | 1.29
SVR | 0.62 | 1.48 | 0.67 | 1.19 | 0.56 | 1.41
DNN | 0.57 | 1.74 | 0.71 | 1.08 | 0.67 | 1.29
Gypsum
LR | 0.59 | 1.56 | 0.58 | 1.66 | 0.57 | 1.79
RF | 0.51 | 1.90 | 0.56 | 1.74 | 0.58 | 1.64
KNN | 0.60 | 1.73 | 0.61 | 1.49 | 0.65 | 1.75
SVR | 0.62 | 1.87 | 0.58 | 1.65 | 0.57 | 1.52
DNN | 0.55 | 3.21 | 0.51 | 2.80 | 0.54 | 2.26
Fallow
LR | 0.84 | 0.96 | 0.81 | 1.00
RF | 0.74 | 1.03 | 0.73 | 1.09
KNN | 0.84 | 1.25 | 0.79 | 1.60
SVR | 0.84 | 0.69 | 0.78 | 1.15
DNN | 0.78 | 0.99 | 0.68 | 1.74
All treatments (i.e., combined AWP, biochar, gypsum, and fallow)
LR | 0.32 | 1.52 | 0.31 | 1.54 | 0.28 | 1.60
RF | 0.16 | 1.80 | 0.19 | 1.69 | 0.17 | 1.68
KNN | 0.30 | 1.62 | 0.31 | 1.54 | 0.28 | 1.55
SVR | 0.33 | 1.52 | 0.36 | 1.48 | 0.29 | 1.57
DNN | 0.16 | 2.72 | 0.18 | 2.29 | 0.17 | 1.99
Table 6. Impact of the number of suitable variables on ML models’ performance for each treatment at the R5 growth stage. The bold font indicates the best accuracy achieved by the ML models for each treatment.
Columns are grouped by the number of suitable variables (one to five); RMSE is in Mg/ha.
Model | R² (one) | RMSE (one) | R² (two) | RMSE (two) | R² (three) | RMSE (three) | R² (four) | RMSE (four) | R² (five) | RMSE (five)
AWP
LR | 0.49 | 1.15 | 0.48 | 1.31 | 0.51 | 1.39 | 0.53 | 1.62 | 0.55 | 1.83
RF | 0.50 | 1.38 | 0.51 | 1.39 | 0.51 | 1.33 | 0.50 | 1.18 | 0.45 | 1.17
KNN | 0.64 | 1.13 | 0.59 | 1.13 | 0.52 | 1.31 | 0.54 | 1.12 | 0.54 | 1.19
SVR | 0.66 | 1.68 | 0.59 | 1.46 | 0.51 | 1.42 | 0.45 | 1.20 | 0.47 | 1.77
DNN | 0.55 | 2.03 | 0.55 | 2.15 | 0.49 | 1.68 | 0.49 | 1.28 | 0.54 | 1.17
Biochar
LR | 0.64 | 1.52 | 0.58 | 1.67 | 0.58 | 1.99
RF | 0.66 | 1.55 | 0.61 | 1.67 | 0.61 | 1.73
KNN | 0.61 | 1.60 | 0.56 | 1.68 | 0.64 | 1.65
SVR | 0.60 | 1.57 | 0.60 | 1.60 | 0.64 | 1.87
DNN | 0.74 | 1.27 | 0.55 | 2.84 | 0.50 | 3.04
Gypsum
LR | 0.74 | 1.39 | 0.68 | 1.57 | 0.57 | 1.78 | 0.49 | 1.74
RF | 0.71 | 1.17 | 0.72 | 1.25 | 0.68 | 1.33 | 0.63 | 1.39
KNN | 0.80 | 1.35 | 0.69 | 1.76 | 0.71 | 1.63 | 0.69 | 1.58
SVR | 0.74 | 1.08 | 0.66 | 1.21 | 0.70 | 1.45 | 0.69 | 1.48
DNN | 0.71 | 1.37 | 0.63 | 1.52 | 0.69 | 1.64 | 0.60 | 1.58
Fallow
LR | 0.72 | 1.59 | 0.67 | 1.67 | 0.72 | 1.51 | 0.77 | 1.26 | 0.77 | 1.34
RF | 0.75 | 1.30 | 0.75 | 1.34 | 0.77 | 1.30 | 0.84 | 1.22 | 0.80 | 1.31
KNN | 0.76 | 1.23 | 0.75 | 1.26 | 0.77 | 1.29 | 0.81 | 1.41 | 0.77 | 1.39
SVR | 0.83 | 1.05 | 0.83 | 1.08 | 0.80 | 1.28 | 0.82 | 1.23 | 0.80 | 1.28
DNN | 0.72 | 1.72 | 0.59 | 2.31 | 0.59 | 3.19 | 0.72 | 1.69 | 0.69 | 1.72
All treatments
LR | 0.23 | 1.64 | 0.25 | 1.60 | 0.25 | 1.58 | 0.25 | 1.58
RF | 0.27 | 1.60 | 0.28 | 1.56 | 0.31 | 1.52 | 0.31 | 1.52
KNN | 0.36 | 1.45 | 0.38 | 1.45 | 0.40 | 1.40 | 0.40 | 1.40
SVR | 0.36 | 1.45 | 0.41 | 1.43 | 0.40 | 1.41 | 0.40 | 1.41
DNN | 0.15 | 1.88 | 0.17 | 2.25 | 0.18 | 2.28 | 0.18 | 2.28
Table 7. Best-performing ML models for each treatment using optimal variables at the V6 and R5 stages. The bold font indicates the best accuracy achieved by the ML models.
RMSE is in Mg/ha.
Stage | LR R² | LR RMSE | RF R² | RF RMSE | KNN R² | KNN RMSE | SVR R² | SVR RMSE | DNN R² | DNN RMSE
Austrian Winter Peas
V6 | 0.62 | 1.12 | 0.68 | 1.02 | 0.69 | 1.05 | 0.65 | 1.17 | 0.67 | 1.62
R5 | 0.55 | 1.83 | 0.51 | 1.33 | 0.64 | 1.13 | 0.66 | 1.68 | 0.54 | 1.17
Biochar
V6 | 0.59 | 1.44 | 0.63 | 1.27 | 0.68 | 1.29 | 0.67 | 1.19 | 0.71 | 1.08
R5 | 0.64 | 1.52 | 0.66 | 1.55 | 0.64 | 1.65 | 0.64 | 1.87 | 0.74 | 1.27
Gypsum
V6 | 0.59 | 1.56 | 0.58 | 1.64 | 0.65 | 1.75 | 0.62 | 1.87 | 0.54 | 2.26
R5 | 0.74 | 1.39 | 0.72 | 1.25 | 0.80 | 1.35 | 0.74 | 1.08 | 0.71 | 1.37
Fallow
V6 | 0.84 | 0.96 | 0.74 | 1.03 | 0.84 | 1.25 | 0.84 | 0.69 | 0.78 | 0.99
R5 | 0.77 | 1.26 | 0.84 | 1.22 | 0.77 | 1.29 | 0.83 | 1.05 | 0.72 | 1.69
All treatments
V6 | 0.32 | 1.52 | 0.19 | 1.68 | 0.31 | 1.54 | 0.36 | 1.48 | 0.18 | 1.99
R5 | 0.25 | 1.58 | 0.31 | 1.52 | 0.40 | 1.40 | 0.41 | 1.41 | 0.18 | 1.88
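The two accuracy measures reported in Tables 5 to 7 can be illustrated with a small worked example. This sketch assumes R² is computed as one minus the ratio of residual to total sum of squares (the study may instead have used the squared correlation); the function `r2_rmse` and the yield values are hypothetical.

```python
import math


def r2_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(observed))
    return r2, rmse


# Hypothetical observed and predicted yields in Mg/ha (not from the study).
observed = [10.0, 11.0, 12.0, 13.0]
predicted = [10.5, 10.5, 12.5, 12.5]

r2, rmse = r2_rmse(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} Mg/ha")  # R2 = 0.80, RMSE = 0.50 Mg/ha
```

Because R² is relative to the spread of observed yield while RMSE is in yield units, a model can rank first on one measure and not on the other, as several rows of Table 7 show.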
Table 8. Optimal value of hyperparameters of the best-performing ML models at the V6 and R5 growth stages.
Treatment | Best-performing ML model (V6 and R5) | Optimal hyperparameter value (V6) | Optimal hyperparameter value (R5) | Grid search space (V6 and R5)
AWP | KNN | K = 5 | K = 13 | K = 1, 3, 5, 7, 9, 11, 13, 15
Gypsum | KNN | K = 5 | K = 3 |
Biochar | DNN | Layer1, layer2, layer3 = 7 | Layer1, layer2, layer3 = 11 | Layer1, layer2, and layer3 = 1 to 15
Fallow | SVR | C = 10; Sigma = 0.412 | C = 30; Sigma = 0.711 | C = 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100; Sigma (V6) = 0.412, 0.917, 3.185; Sigma (R5) = 0.189, 0.711, 8.714
All treatments | SVR | C = 1; Sigma = 0.148 | C = 5; Sigma = 0.387 | C = 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100; Sigma (V6) = 0.148, 1.855, 111.72; Sigma (R5) = 0.120, 0.387, 1.878
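To illustrate what a grid search over the K values in Table 8 involves, the sketch below scores each candidate K with leave-one-out cross-validation on a single predictor. The resampling scheme, the helper names `knn_predict` and `loo_rmse`, and the data are all assumptions for illustration; the table does not describe how the study validated each candidate.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query (one feature)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def loo_rmse(x, y, k):
    """Leave-one-out RMSE of the KNN regressor for a given k."""
    squared_error = 0.0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        squared_error += (knn_predict(train_x, train_y, x[i], k) - y[i]) ** 2
    return (squared_error / len(x)) ** 0.5


K_GRID = [1, 3, 5, 7, 9, 11, 13, 15]  # search space listed in Table 8

# Hypothetical data for 16 plots: one predictor and yield in Mg/ha.
x = [0.070 + 0.002 * i for i in range(16)]
y = [9.0 + 0.25 * i + (0.3 if i % 2 else -0.3) for i in range(16)]

scores = {k: loo_rmse(x, y, k) for k in K_GRID}
best_k = min(scores, key=scores.get)
print(f"best K = {best_k}, LOO RMSE = {scores[best_k]:.2f} Mg/ha")
```

With sixteen plots per treatment, the largest grid value (K = 15) averages every remaining plot in each leave-one-out fold, so it behaves like a mean-only predictor; smaller K values trade that stability for sensitivity to local structure.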

Kumar, C.; Mubvumba, P.; Huang, Y.; Dhillon, J.; Reddy, K. Multi-Stage Corn Yield Prediction Using High-Resolution UAV Multispectral Data and Machine Learning Models. Agronomy 2023, 13, 1277. https://doi.org/10.3390/agronomy13051277
