Article

Improving the Prediction of Grain Protein Content in Winter Wheat at the County Level with Multisource Data: A Case Study in Jiangsu Province of China

1 National Engineering and Technology Center for Information Agriculture, Nanjing Agricultural University, Nanjing 210095, China
2 MOE Engineering and Research Center for Smart Agriculture, Nanjing Agricultural University, Nanjing 210095, China
3 MARA Key Laboratory for Crop System Analysis and Decision Making, Nanjing Agricultural University, Nanjing 210095, China
4 Jiangsu Key Laboratory for Information Agriculture, Nanjing Agricultural University, Nanjing 210095, China
5 Collaborative Innovation Centre for Modern Crop Production Co-Sponsored by Province and Ministry, Nanjing Agricultural University, Nanjing 210095, China
6 Jiangsu Kesheng Group Co., Ltd., Yancheng 224700, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Agronomy 2023, 13(10), 2577; https://doi.org/10.3390/agronomy13102577
Submission received: 30 August 2023 / Revised: 28 September 2023 / Accepted: 5 October 2023 / Published: 7 October 2023
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Wheat is an important food crop in China, and its quality affects the development of the agricultural economy. However, domestic production of high-quality wheat cannot meet demand, so developing high-quality wheat is an important research direction. Grain protein content (GPC) is an important criterion of winter wheat quality. Studying the spatial heterogeneity of wheat grain protein is beneficial to the prediction of wheat quality, and it plays a guiding role in the identification, grading, and processing of wheat. Due to the complexity and variability of wheat quality, conventional evaluation methods have shortcomings such as low accuracy and poor applicability. To better predict GPC, geographically weighted regression (GWR), multiple linear regression, random forest (RF), BP neural network, support vector machine, and long short-term memory algorithms were used to analyze the meteorological and soil data of Jiangsu Province from March to May in 2019–2022. It was found that winter wheat GPC rises by 0.17% with every 0.1° increase in north latitude at the county level in Jiangsu. Comparison of the algorithms by coefficient of determination, mean bias error, root mean square error, and mean absolute error showed that the GWR model was the most accurate, followed by the RF model. The regression coefficient of mean temperature in April showed the smallest range of variation among all factors, indicating that mean temperature in April had a more stable effect on GPC in the study area than the other factors. Therefore, consideration of spatial information might be beneficial in predicting county-level winter wheat GPC. GWR models based on meteorological and soil factors enrich the studies on predicting wheat GPC from environmental data. Such models might be applied to predict winter wheat GPC and improve wheat quality to better guide large-scale production and processing.

1. Introduction

Wheat is among the most important food crops in the world, feeding nearly 40% of the global population [1] and providing approximately 20% of the energy and 22% of the protein in daily human diets. In recent years, the demand for agricultural products, including wheat, has shifted from availability to quality. Although China is a large agricultural country with stable wheat production, current domestic production cannot meet the demand for high-quality strong-gluten wheat, and imports are a common solution.
The quality of wheat determines its purchase price, processing purposes, and value in use and in turn affects the profitability of wheat production [2]. As a key indicator in wheat quality evaluation, wheat grain protein content (GPC) is mainly subject to factors such as wheat variety, climate during growth, and soil fertility, among which meteorological factors have the greatest influence [3]. Many studies have analyzed the effect of climate information on winter wheat GPC using daily or monthly data [4]. They have demonstrated that meteorological conditions, including temperature, radiation, and precipitation, have the potential to influence the protein content and composition of wheat seeds during grain filling [5]. However, some studies have shown that the effect of precipitation on GPC cannot be ascertained, as precipitation may be positively [5] or negatively [6] correlated with GPC. In addition to the effects of variety and environment, wheat GPC also varies with geographic location. Some studies found that the differences in wheat GPC attributable to spatial location were greater than the differences between varieties at different latitudes and in different ecological environments. Therefore, the study of spatial heterogeneity in wheat is beneficial to improving the quality of processed wheat products and can help to guide large-scale production and processing [7].
For a long time, to advance the study of spatial heterogeneity in wheat, researchers at home and abroad have conducted numerous experiments to obtain physiological and biochemical parameters. However, the existing research methods are still insufficient in terms of practical application, although they are able to initially assess the quality of wheat. This insufficiency can be attributed to the fact that the role of multi-indicator weights is not obvious, and the accuracy of prediction requires improvement [8].
Statistical models and simulation models are two main approaches to study spatial heterogeneity in wheat based on environmental information [9]. Statistical models can complement simulation models, which record environmental elements and analyze their effects on crop GPC [10] and quality [4]. Previous studies have developed various statistical models, including multiple linear regression, stepwise regression analysis, spatial lagged effects models, and hierarchical linear regression models [11], to study spatial heterogeneity in wheat based on meteorological parameters [12]. The crop simulation model is designed to explore the effect of climate on winter wheat GPC in the absence of agricultural management [13], but its disadvantage lies in the sole focus on the direct effects of climatic factors on crop growth, ignoring indirect effects such as crop losses due to adverse meteorological conditions or pests and diseases [14].
In contrast, machine learning has several advantages. It is highly data-driven and capable of building powerful nonlinear regression models that can effectively mine and exploit detailed information such as spatial structure, which can support smarter, more efficient, and more accurate decisions and services. Combined with multisource data, machine learning has great potential for investigating spatial heterogeneity [15].
Due to variable geographic factors and a diverse climate, wheat GPC and wheat characteristics vary greatly from region to region and from year to year under normal growth conditions, which is a serious obstacle to the study of spatial heterogeneity [16]. To address this issue, a geographically weighted regression (GWR) model based on local regression has been proposed for assessing whether the spatial relationships in regional data are stable [17]. This model integrates the geographic location parameters of selected sampling points, examines the variation in their spatial parameters [18], and can explore the spatial variability of a study object at a certain scale [19]. It is also widely used in sociological work, agroecology, and soil surveys. It is therefore worthwhile to use the GWR model to assess the spatial heterogeneity of winter wheat GPC at the county level. Currently, although the effects of meteorological parameters in various locations on GPC have been widely investigated, studies on the effects of soil elements on the spatial heterogeneity of winter wheat GPC are still lacking.
In summary, few studies have assessed spatial heterogeneity and spatio-temporal variability in county-level wheat GPC. The novelty of this study lies in the use of multiple machine learning algorithms that fuse GPC, soil, and meteorological data from 4-year multi-point trials of winter wheat in Jiangsu Province to construct a prediction model of winter wheat GPC at the county level, and in the study of the spatial heterogeneity of GPC through correlation analysis, feature combination, feature selection, and feature importance assessment. The objectives of this study were (1) to investigate the correlation of county-level winter wheat GPC with latitude, (2) to evaluate the accuracy of models using multiple machine learning algorithms, and (3) to assess the influences of various meteorological and soil factors on the heterogeneity of county-level winter wheat GPC in Jiangsu.

2. Materials and Methods

2.1. Study Region

The wheat GPC data and environmental parameters in this study were obtained from Jiangsu Province (30°45′ N~35°20′ N, 116°18′ E~121°57′ E), China, from 2019 to 2022. Jiangsu has a variety of climate types, including subtropical humid monsoon, subtropical monsoon, and warm temperate humid and semi-humid monsoon. Approximately 25% of the entire winter wheat growing region in China (approximately 607.5 million ha) is in Jiangsu Province, which leads to a high representativeness of our results for winter wheat yield prediction in China (National Bureau of Statistics of China, 2019 to 2022).

2.2. Data Sources

2.2.1. Winter Wheat GPC at the County Level

The winter wheat GPC was collected from the Jiangsu Wheat Quality Report of the General Agricultural Technology Extension Station of Jiangsu Province. The dataset is based completely on experimental measurements of varying grades of the samples collected from various counties and districts in Jiangsu Province, with a measurement accuracy of two decimal places. The samples for winter wheat GPC data for the years 2019 to 2022 include 1253 samples from 73 counties in the region.

2.2.2. Environmental Data

In the four years studied, the winter wheat harvesting period in Jiangsu Province was between late May and late June. This work focuses on the key growth stages of winter wheat that affect GPC formation, such as stem elongation, germination, and partial filling, so the meteorological data analyzed in this paper are mainly from March to May of each year. The mean temperature in March (MT03), April (MT04), and May (MT05), mean maximum temperature in March (TMAX03), April (TMAX04), and May (TMAX05), mean sunshine duration in March (MSD03), April (MSD04), and May (MSD05), and precipitation in March (PRE03), April (PRE04), and May (PRE05) from 2019 to 2022 were acquired from the ECMWF climate data shared service system (https://cds.climate.copernicus.eu/, accessed on 4 October 2023). Through field experiments, basic soil attribute data were collected from different locations and years at the experimental sites to examine the effects of soil factors on GPC. These data included alkali-hydrolyzable nitrogen (N), available phosphorus (P), rapidly available potassium (K), and soil organic matter (SOM). More experimental details are given in Ruan et al. [20]. The 12 meteorological factors and 4 soil factors used in this paper are listed in Table 1.

2.3. Estimation Method

This study explored the linear relationship between latitude and county-level winter wheat GPC using SPSS (The SPSSAU project (2021), SPSSAU (Version 21.0) Online Application Software; retrieved from https://www.spssau.com, accessed on 4 October 2023). Two statistical models, namely, GWR and multiple linear regression (MLR), and four machine learning algorithms, namely, random forest (RF), BP neural network (BPNN), support vector machine (SVM), and long short-term memory (LSTM), were used to construct and evaluate county-level winter wheat GPC prediction models. SHapley Additive exPlanations (SHAP) and sensitivity analysis were used to study the spatial heterogeneity of winter wheat GPC in Jiangsu Province. The out-of-sample prediction modeling in this study used the Python Spatial Analysis library.

2.3.1. Geographically Weighted Regression

GWR is a spatial analysis technique that explores the spatial variability of a study object at a given scale, and the associated drivers, by fitting a local regression equation at each point of the spatial scale. Because it accounts for the local influence of each spatial object, GWR excels in accuracy. The specific expression is as follows:
$$GPC_i = a_0(u_i, v_i) + \sum_{k=1}^{p} a_k(u_i, v_i)\, x_k(u_i, v_i) + \beta(u_i, v_i), \quad i = 1, \ldots, n$$
where $GPC_i$ is the GPC of the i-th county in Jiangsu Province; n is the number of counties; p is the number of independent variables in the model; and $(u_i, v_i)$ represents the geographic location of the i-th county. $a_0(u_i, v_i)$, $a_k(u_i, v_i)$, and $x_k(u_i, v_i)$ are the intercept, the local regression coefficient of the k-th variable, and the value of the k-th explanatory variable at that location, respectively, and $\beta(u_i, v_i)$ is the remaining location-specific term, which corresponds to the random error in the standard GWR formulation. The sign (positive or negative) of the regression parameter $a_k(u_i, v_i)$ is meaningful. This study uses the GWR model with the location of each county abstracted as a central point, so that county-to-county distances are easily determined.
If the regression coefficients do not vary with sample point i, GWR is equivalent to ordinary linear regression. The core of the GWR model is the spatial weight matrix: the regression coefficients at each location are estimated by the least squares method after introducing the weights $W_{ij}$, which are expressed as follows:
$$W_{ij} = \exp\left(-\left(d_{ij}/b\right)^2\right)$$
where $d_{ij}$ is the distance between observation point i and observation point j, and b is a nonnegative attenuation parameter describing the functional relationship between weight and distance, called the bandwidth. The best bandwidth is generally obtained by cross-validation.
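The role of the kernel weights and the bandwidth can be illustrated with a minimal sketch, assuming a Gaussian kernel of the form exp(-(d/b)^2), a single explanatory variable, and a handful of made-up points. The function names `gaussian_weight` and `local_fit`, the coordinates, and the data values are all hypothetical and are not taken from the study, which fitted its models with a spatial analysis library.

```python
import math

def gaussian_weight(d, b):
    """Gaussian kernel weight for distance d and bandwidth b (assumed form)."""
    return math.exp(-(d / b) ** 2)

def local_fit(target, coords, x, y, b):
    """Weighted least squares fit of y = a0 + a1 * x around one target location.

    Observations near `target` receive weights close to 1, distant ones
    receive weights close to 0, so (a0, a1) describe the local relationship.
    """
    w = [gaussian_weight(math.dist(target, c), b) for c in coords]
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1

# Hypothetical toy data: three "western" points follow y = 10 + 1 * x,
# three "eastern" points follow y = 10 + 3 * x.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0, 13.0, 16.0, 19.0]

west = local_fit((1.0, 0.0), coords, x, y, b=0.5)   # small bandwidth: local slope
east = local_fit((4.0, 0.0), coords, x, y, b=0.5)
flat = local_fit((1.0, 0.0), coords, x, y, b=1e6)   # huge bandwidth: global fit
print(west, east, flat)
```

With the small bandwidth the two locations recover different slopes, while the very large bandwidth gives every observation almost the same weight and the fit collapses to a single global regression line. This is the situation in which GWR and ordinary linear regression coincide.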
The GWR model relies on the weight matrix derived from the location relationships of spatial objects, but it cannot fully reveal the autocorrelation of spatial variables. The spatial autocorrelation coefficient can represent the spatial autocorrelation characteristics of winter wheat GPC in each county. Spatial statistical analysis can describe the spatial autocorrelation of the data, but it has difficulty quantifying the causal relationships between spatial objects. The GWR model, on the other hand, can quantitatively describe the association between winter wheat GPC in each county and the influential factors while taking the locations of the individual spatial objects into account. Therefore, this study combines the spatial autocorrelation model and GWR to draw on the advantages of both and to examine in depth the spatial relationship between county-level GPC and latitude, as well as its relationship with meteorological, soil, and other parameters. However, the GWR model can still be improved. For example, multi-scale geographically weighted regression (MGWR) improves on GWR by taking into account differences in the spatial scale at which factors act, which makes the modeling results more reliable.

2.3.2. Machine Learning

Machine learning algorithms are also known as estimators. This study evaluates four estimators, namely, RF, BPNN, SVM, and LSTM, for wheat quality prediction. The RF algorithm is an ensemble of different regression trees, which adapts well to varied datasets and can process large datasets efficiently. The BPNN constructed in this study consists of input, hidden, and output layers, with neurons connected between adjacent layers and no connections within the same layer. The process is divided into the following three parts: prediction model building, model training and optimization, and model testing and simulation. The SVM is a supervised learning algorithm based on optimization that gives satisfactory results for small samples and for nonlinear and high-dimensional patterns. For example, a two-dimensional dataset can be separated by many straight lines of different slopes, and the SVM searches among them for the line with the best generalization capability. When the dataset has three or more dimensions, a maximum-margin separating hyperplane is required to classify the high-dimensional data. The LSTM is a recurrent neural network with LSTM units as its hidden layer building blocks, which is very effective for time-series problems.
Due to the complexity of machine learning, prediction models sometimes match the known dataset too closely and thus lack the generalization ability to predict future observations well. In this case, overfitting occurs. Therefore, 60% of the dataset is used to train the predictive models, 20% is used as a test set to evaluate the models' ability to predict unseen data, and the remaining 20% is used as a validation set to tune the hyperparameters and preliminarily evaluate the models. The overall workflow of the quality prediction procedure is shown in Figure 1.
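The 60/20/20 partition described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say whether the split was random, stratified, or grouped by year or county, so a seeded random shuffle is assumed here, and the helper name `split_indices` is hypothetical.

```python
import random

def split_indices(n_samples, train_frac=0.6, test_frac=0.2, seed=0):
    """Return (train, validation, test) index lists for a random partition.

    Assumption: a plain random shuffle; the remaining samples after the
    training and test sets are taken form the validation set.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(n_samples * train_frac)
    n_test = int(n_samples * test_frac)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]
    return train, validation, test

# 1253 is the number of GPC samples reported for 2019 to 2022.
train_idx, val_idx, test_idx = split_indices(1253)
print(len(train_idx), len(val_idx), len(test_idx))
```

Hyperparameters would be tuned against the validation indices only, and the test indices would be touched once, at the end, to report out-of-sample accuracy.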

2.4. Accuracy Evaluation

Four metrics are used to evaluate the performance of the models: coefficient of determination (R2), mean bias error (MBE), root mean square error (RMSE), and mean absolute error (MAE), whose formulas are shown below. The higher the R2 and the smaller the absolute value of MBE, the better the model's predictions and the closer they are to the true values. RMSE and MAE express the difference between the predicted and measured values; the smaller they are, the higher the accuracy of the model.
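$$R^2 = \left[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \right]^2$$
$$MBE = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$
where n denotes the total number of samples, $x_i$ denotes the i-th measured value, $y_i$ denotes the i-th predicted value, and $\bar{x}$ and $\bar{y}$ denote the means of the measured and predicted values, respectively.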
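A small sketch of how the four metrics behave, assuming R2 is computed as the squared Pearson correlation between measured and predicted values and MBE as the mean of predicted minus measured. The function name `evaluate` and the numbers are hypothetical.

```python
import math

def evaluate(measured, predicted):
    """Compute R2, MBE, RMSE, and MAE for paired measured/predicted values."""
    n = len(measured)
    x_bar = sum(measured) / n
    y_bar = sum(predicted) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(measured, predicted))
    sxx = sum((x - x_bar) ** 2 for x in measured)
    syy = sum((y - y_bar) ** 2 for y in predicted)
    return {
        "R2": (sxy / math.sqrt(sxx * syy)) ** 2,          # squared correlation
        "MBE": sum(y - x for x, y in zip(measured, predicted)) / n,
        "RMSE": math.sqrt(sum((x - y) ** 2 for x, y in zip(measured, predicted)) / n),
        "MAE": sum(abs(y - x) for x, y in zip(measured, predicted)) / n,
    }

# Hypothetical GPC values (%): every prediction is 0.5 too high.
measured = [12.0, 13.0, 14.0, 15.0]
predicted = [12.5, 13.5, 14.5, 15.5]
scores = evaluate(measured, predicted)
print(scores)
```

In this toy case R2 equals 1 even though every prediction is biased upward by 0.5, which is why MBE, RMSE, and MAE are reported alongside R2 rather than R2 alone.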

2.5. Sensitivity Analysis

2.5.1. E-Fast Method

The E-Fast (extended Fourier amplitude sensitivity test) method is a global sensitivity analysis method that builds on the Fourier amplitude sensitivity test and combines the advantages of the Sobol method with the robustness, low sample-size requirement, and computational efficiency of that test. It is a quantitative, variance-based global sensitivity analysis method. By decomposing the variance of the model output, it obtains the quantitative sensitivity of each parameter (the individual sensitivity of each parameter and the total sensitivity), so that the direct and indirect contributions of each parameter to the model output can be quantified.
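The variance decomposition behind this family of methods can be written out as a short sketch. The notation below is the standard one for variance-based sensitivity analysis and is an assumption here, since the source does not give its own symbols: Y is the model output, k is the number of parameters, $V_i$ is the output variance attributable to parameter i alone, and $V_{\sim i}$ is the variance attributable to all parameters except i, including interactions among them.

```latex
% Decomposition of the total output variance into main effects and interactions
V(Y) = \sum_{i=1}^{k} V_i + \sum_{i<j} V_{ij} + \cdots + V_{1,2,\ldots,k}

% First-order (direct) sensitivity index of parameter i
S_i = \frac{V_i}{V(Y)}

% Total sensitivity index of parameter i (direct effect plus all interactions involving i)
S_{Ti} = 1 - \frac{V_{\sim i}}{V(Y)}
```

A parameter whose total index is much larger than its first-order index influences the output mainly through interactions with other parameters.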

2.5.2. SHAP Method

SHAP is a method for interpreting models, mainly for explaining the prediction results of machine learning models. SHAP uses the concept of the Shapley value from game theory to calculate feature importance, assigning each feature its contribution to the model output. It can interpret feature importance for different combinations of variables and for individual observations, so it can be used not only to analyze overall feature importance but also to interpret the output for individual samples. SHAP values consider both the importance of each feature itself and the interrelationships between features, enabling a more faithful explanation of the prediction results. SHAP can also visualize feature contributions for an entire dataset or for a single data point.
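The Shapley value underlying SHAP can be computed exactly for a very small model by enumerating every feature coalition. The sketch below does this under two assumptions that differ from the SHAP library: a feature that is "absent" from a coalition is replaced by a fixed baseline value, and all coalitions are enumerated rather than approximated. The toy model, its coefficients, and the feature meanings are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition take their baseline value (an assumption;
    the SHAP library averages over background data instead).
    """
    n = len(instance)

    def value(subset):
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Probability of this coalition preceding feature i in a random ordering.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical GPC model with an interaction between its two inputs."""
    temperature, nitrogen = features
    return 10.0 + 0.5 * temperature + 0.2 * nitrogen + 0.1 * temperature * nitrogen

instance = [2.0, 3.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific shortcuts.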

3. Results

3.1. Relationship between GPC and Latitude

The statistical results of winter wheat GPC at the county level in Jiangsu Province from 2019 to 2022 are shown in Figure 2. The mean value of winter wheat GPC for the whole dataset was 14.00%, ranging from 9.07% to 20.98%. Analysis of the correlation between winter wheat GPC and latitude in Jiangsu Province showed that GPC was strongly and positively correlated with latitude, with R2 = 0.78 (Figure 3). Winter wheat GPC increased by 0.17% with each 0.1° increase in north latitude.
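The reported rate can be read as the slope of a straight-line fit of GPC against latitude. As a minimal sketch, assuming ordinary least squares and using synthetic values constructed to lie exactly on a line (not the study's county data), the slope per degree and per 0.1° can be obtained as follows; the helper name `ols_slope_intercept` is hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Synthetic illustration only: latitudes in degrees north and GPC in percent.
latitude_deg = [31.0, 32.0, 33.0, 34.0, 35.0]
gpc_percent = [11.0, 12.7, 14.4, 16.1, 17.8]

slope_per_degree, intercept = ols_slope_intercept(latitude_deg, gpc_percent)
slope_per_tenth_degree = slope_per_degree * 0.1
print(slope_per_degree, slope_per_tenth_degree)
```

A slope of 0.17 per 0.1° is the same as 1.7 percentage points of GPC per degree of latitude, which is a large gradient relative to the 9.07% to 20.98% range observed in the dataset.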

3.2. Accuracy Comparison of Winter Wheat GPC Models Based on Different Methods

In this study, six methods, namely, RF, BPNN, SVM, LSTM, GWR, and MLR, were used to combine meteorological and soil information to construct winter wheat GPC prediction models. As shown in Figure 4, the GWR model fits the predicted values to the actual values significantly better than the other models. Among the four machine learning algorithms, the RF algorithm gave the best simulation, followed by SVM, and finally BPNN and LSTM.
In addition, six feature subsets (Table 2) were created to evaluate the prediction accuracy of each method for winter wheat GPC (Figure 5, Table 3) through four indices, namely, R2, RMSE, MAE, and MBE. It could be seen that among all the models constructed by different methods, the GWR model had the largest R2 and the smallest RMSE and MAE, and the absolute value of MBE was slightly smaller than that of the RF model, so the GWR model had the highest accuracy. It could also be seen that MLR had the lowest accuracy. Therefore, this study used GWR to further analyze the spatial heterogeneity of winter wheat GPC.

3.3. Analysis of the Spatial Heterogeneity of Winter Wheat GPC Based on the GWR Model

Sixteen factors with GWR coefficients are listed in Table 4 to better explore the spatial influences of various parameters on GPC. Comparison of the interquartile range of the estimation coefficients of the GWR and MLR models yielded errors within 1 standard error, indicating that the effects of environmental variables on the winter wheat GPC at the county level are spatially heterogeneous. The sensitivity indices of environmental parameters for county winter wheat GPC are illustrated in Figure 6, demonstrating that 12 meteorological factors and 4 soil factors exhibited spatial heterogeneity in the county-level winter wheat GPC sensitivity, among which, the sensitivity indices of P, PRE03, PRE04, and PRE05 were poorly correlated with latitude.

3.4. Sensitivity of Factors Affecting Winter Wheat GPC at the County Level

As shown by the sensitivity indices of the factors in the GWR model (Figure 7), there were significant spatial differences in the effects of each environmental factor on GPC, meaning that the relationship between GPC and these environmental factors is spatially non-stationary. The degree of influence of each factor on GPC can be explained by the corresponding regression coefficients. The absolute value of the regression coefficient reflects the intensity of the effect on GPC (Figure 7), with positive coefficients indicating a positive influence on, or positive correlation with, GPC, and vice versa. For example, the regression coefficients of MT04 had a smaller range of variation than those of the other factors, indicating that the effect of MT04 on GPC in the study area was more stable than that of the other factors. TMAX03, N, and K had much greater effects on wheat quality than the other factors; that is, changes in TMAX03, N, and K had the greatest impact on the accuracy of the model. These were followed by MSD04, TMAX04, and TMAX05, while P, PRE03, PRE04, and PRE05 had the least effect.
Furthermore, SHAP values were used to explain the four machine learning models. As shown in Figure 8, MT04 had a much greater effect on wheat quality than the other factors, meaning that the changes in MT04 had the greatest effect on the accuracy of the model; this was followed by MSD03 and TMAX03. Finally, TMAX04, P, and SOM had the least effect.

4. Discussion

4.1. Spatial Heterogeneity Analysis of the County-Level Winter Wheat GPC

Winter wheat with low GPC is readily available at low subtropical latitudes, whereas the converse is the case at high latitudes, where GPC is also susceptible to climatic factors (Figure 3). Our study revealed that in Jiangsu Province, county-level winter wheat GPC increased by 0.17% for every 0.1° increase in north latitude. This result is consistent with previous findings that wheat GPC decreases with decreasing latitude in the northeast spring wheat region, the Yellow–Huai winter wheat region, and the middle and lower Yangtze River winter wheat region [21]. It also agrees with the finding that the overall trend of wheat GPC is high in the northeast and low in the southwest, decreasing from year to year, mostly in a zonal distribution, and overall higher in the north than in the south, with latitude as the main influencing factor [22]. Many factors contribute to the spatial heterogeneity of wheat GPC. It is warmer during the wheat growth stages at low latitudes, and high temperatures in the early stages of winter wheat can dilute GPC while positively affecting yield [23].
At low latitudes, precipitation affects the formation of GPC. During the early stages of grain development, water stress diminishes the uptake potential of the grain by reducing the number of formed endosperm cells and amyloplasts [24], which leads to a decrease in grain weight and an increase in protein content [25]. In addition, sunlight also influences the accumulation of seed protein to some extent due to variations in sunlight duration and light intensity. Photosynthesis provides energy for the accumulation of protein content in wheat seeds. The longer the duration of sunlight, the longer the photosynthesis time and the more organic matter accumulates [26]. Meanwhile, when investigating the spatial differences in the winter wheat GPC, differences in cropping patterns, for example, early sowing and late harvesting of winter wheat and the length of the growing season, cannot be ignored.

4.2. Effectiveness of Different Methods to Predict the GPC of Winter Wheat

The MLR model had a limited ability to express the spatial heterogeneity of winter wheat GPC. In contrast, the GWR model is a local model that considers the spatial heterogeneity of the variables and decomposes the global parameter estimates into local parameters for evaluation. The GWR model clearly performs better than the MLR model in terms of estimation accuracy and retention of the spatial features of the samples, and it can effectively weaken the spatial autocorrelation of model residuals [27]. The RF model does not require prior assumptions about the relationship between the independent and dependent variables, and it can effectively overcome multicollinearity among independent variables and rank the importance of each variable [28]; it has been applied in agricultural factor analysis and biomass prediction [29]. As shown in Figure 5, the R2 of the RF model was 0.70, which is similar to that of GWR, and both can achieve a good fit. BPNN is widely used for model building because of its powerful nonlinear mapping ability and flexible network structure. However, the results of this study show that it was not effective in predicting winter wheat GPC. The SVM model can improve the intelligence and automation of spectral data analysis [30]. In this study, it predicted winter wheat GPC reasonably well, but its prediction accuracy was not as high as that of the GWR and RF models. LSTM can be used to process temporal data and is widely used in natural language processing and speech recognition [31]. However, the fit of winter wheat GPC by LSTM in this paper was not satisfactory.
In summary, GWR is the best choice for constructing a winter wheat GPC model at the county level. In this study, to build a better general model applicable to the whole region, meteorological and soil factors were added to the GWR model to accommodate geospatial variations. In previous studies, it has been shown that the phenological period of wheat in Jiangsu is earlier than that in some northern regions [32]. Therefore, it may be challenging to rely on the phenological period for a more detailed geographical analysis of winter wheat GPC. GWR is a linear regression of specific sample points, while the effect of meteorological factors on GPC may be nonlinear [33]. In a subsequent study, the effect of nonlinear regression of exponential, logarithmic, and quadratic functions on the quality of wheat in large geographical areas was investigated [34]. In addition, remote sensing data can reveal differences between winter wheat growth and GPC in the same geographic setting [12,35]. It is hoped that remote sensing data will be combined with the GWR model in the future to establish a multilevel GPC estimation model to further diagnose the spatial and temporal differences in winter wheat GPC.

4.3. Sensitivity Analysis of GPC Predictor Variables for Winter Wheat at the County Level

Meteorological factors are utilized in studies of spatial heterogeneity. In this study, 12 meteorological factors and 4 soil factors were identified, and MLR and GWR models were established to explore the spatial heterogeneity of the county-level winter wheat GPC in different years in Jiangsu Province. The use of monthly meteorological and soil data as independent variables for the study of spatial heterogeneity supports the results of previous studies [12].
From the results of this study, the regression coefficients of N and K as soil factors had a wide range of variability, indicating that the effects of N and K on winter wheat GPC vary more significantly in different places. The effect of N on winter wheat GPC has been mentioned in previous studies on wheat quality [23]. N uptake, assimilation, and utilization by wheat plants directly affects seed yield and GPC, and an increase in N application significantly improves wheat GPC [36]. K application can increase leaf K content at anthesis and significantly increase N accumulation at anthesis and maturity, which in turn, facilitates the transport of N stored before flowering to seeds, thereby enhancing winter wheat GPC. In recent years, it has been shown that increased N fertilization and higher crop yield levels accelerate soil K export [37], and K has become a key limiting factor in improving wheat yield and achieving high-quality wheat.
In China, winter wheat regreens in early March and is harvested in early May, which matches the period of seed filling and ripening [3]. In this study, the sensitivity indices of TMAX03 and N were negatively correlated with latitude, while those of the other meteorological and soil factors showed the opposite pattern. Photosynthesis, which is driven by latitude and radiation, is mediated by daytime temperature, while the respiration rate responds to both daytime and nighttime temperatures. The effects of photosynthesis and respiration on crop development differ between periods. At the county level, the effects of temperature, radiation, precipitation, and N content on winter wheat GPC are complex and challenging to interpret. Although the interannual variation in March maximum temperature in Jiangsu Province is not significant, timely and reasonable planting and fertilizer management will minimize the effect of temperature on winter wheat GPC.

4.4. Limitations and Future Applications

The accuracy of the GWR model is superior to that of other models, consistent with the results of numerous previous studies [38,39,40]. However, the GWR model can still be improved. It is challenging to fully understand how temperature, radiation, precipitation, and nitrogen content affect GPC. Moreover, the effects are not always consistent with the assumptions of the GWR model and the accuracy of the simulations may be affected. For example, in practice, the effects of independent variables on dependent variables do not always have spatial differences as assumed by the GWR, a fact that gives rise to errors in model simulations. The analysis using machine learning methods for wheat GPC spatial heterogeneity studies was valid. In future work, it is necessary to continue to study the following areas in depth.
First, the sample size of the training data can be expanded. Because of the limited data collected, the present article included only 4 years of data from 73 counties in 13 cities in Jiangsu Province; future studies could expand the data volume. Second, the model algorithm can be further optimized. Building on the wheat quality prediction model established in this study, more influential factors in addition to meteorological, soil, and latitude factors should be selected for training to optimize the model algorithm. Third, other machine learning or autonomous learning algorithms, such as gradient boosted decision trees and other nonlinear modeling methods, can be considered for further exploratory analysis.
In addition to model improvement, various factors can continue to be selected or added for modeling in subsequent studies. Some scholars constructed a GPC prediction model based on hyperspectral data and agronomic parameters, involving the relationship between agronomic parameters and seed protein content at maturity, which could accurately predict seed quality and provide necessary information for agricultural management and production [41]. Although 16 factors were selected for GWR modeling in this study, the regulation of winter wheat GPC is complex. Regional spatial spans and local microclimates vary greatly, resulting in large differences in wheat GPC between regions and years under conventional cultivation conditions [28]. Therefore, corrections and refinements need to be made based on the actual situation in practice.
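The GWR assumption discussed in this section, that each regression coefficient may change with location, can be illustrated with a small sketch. It is not the implementation used in this study: it assumes a single predictor, a one-dimensional location (latitude only), a Gaussian distance kernel, and a fixed bandwidth, and all names (`gaussian_weight`, `local_fit`) and data values are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearby observations get weights close to 1."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, sites, xs, ys, bandwidth):
    """Weighted least squares fit of y = a + b * x around `target`.

    Each observation is weighted by its distance from `target`, which is the
    core idea of GWR. Returns (intercept, slope).
    """
    ws = [gaussian_weight(abs(s - target), bandwidth) for s in sites]
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: two observations (x = 1 and x = 2) at each latitude.
lats = [31.0, 32.0, 33.0, 34.0, 35.0]
sites, xs = [], []
for lat in lats:
    for x in (1.0, 2.0):
        sites.append(lat)
        xs.append(x)

# Case 1: the effect of x is the same everywhere (slope 0.5).
ys_constant = [12.0 + 0.5 * x for x in xs]
# Case 2: the effect of x strengthens northward (slope 0.2 to 0.6).
ys_varying = [12.0 + (0.2 + 0.1 * (s - 31.0)) * x for s, x in zip(sites, xs)]

slopes_constant = [local_fit(t, sites, xs, ys_constant, 1.0)[1] for t in lats]
slopes_varying = [local_fit(t, sites, xs, ys_varying, 1.0)[1] for t in lats]
```

In the first case every local slope equals the global slope, so fitting a separate coefficient per location adds flexibility without adding information; in the second case the local slopes increase from south to north, which is the situation GWR is designed to capture.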

5. Conclusions

GPC is an important indicator of winter wheat quality, and studying its spatial heterogeneity supports the prediction of wheat quality. In this study, the relationship between county-level environmental variables and winter wheat GPC was investigated using the GWR model. The results showed that the effect of April mean temperature on GPC was more stable than that of the other factors, and that county-level winter wheat GPC increased by 0.17% for every 0.1° increase in north latitude in Jiangsu Province. Linear fitting of the predicted against the true values of each model, together with a comparison of four metrics (R², RMSE, MAE, and MBE), indicated that the GWR model was the most accurate, followed by RF, and that MLR was the least accurate, which illustrates the advantage of the GWR model in studying spatial heterogeneity. The GWR analysis showed significant spatial heterogeneity in the effects of the factors on GPC: the correlation between latitude and the sensitivity index differed among the soil and meteorological factors. The GWR sensitivity analyses also indicated that MT04 had a stable effect on GPC. The GWR model based on meteorological and soil factors could be used for county-level winter wheat GPC prediction and for studying its spatial heterogeneity, and it is beneficial for assessing wheat quality in a timely and accurate manner. In conclusion, the GWR model is valuable for improving research on wheat quality prediction, and it has reference value for guiding wheat production and processing.
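The four accuracy metrics named above can be computed as in the following minimal sketch. Two conventions are assumptions rather than details confirmed by this article: R² is taken as 1 − SS_res/SS_tot (a value derived from a linear fit of predicted against true values may differ), and MBE is taken as the mean of predicted minus observed. The function name `regression_metrics` and the GPC values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE, and MBE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted - observed
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,            # assumed definition
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MBE": sum(errors) / n,                 # > 0 means overprediction
    }


# Hypothetical GPC values (%), for illustration only.
observed_gpc = [12.0, 13.0, 14.0, 15.0]
predicted_gpc = [12.5, 12.5, 14.5, 15.5]
metrics = regression_metrics(observed_gpc, predicted_gpc)
```

RMSE and MAE measure the size of the errors, whereas MBE keeps their sign, so a model can have a near-zero MBE and still a large RMSE if over- and underpredictions cancel out.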

Author Contributions

Q.C., Q.X., X.L., Y.T., Y.Z. and W.C. conceived and designed the experiments; Y.S., X.Z., X.C. and Q.C. performed the experiments, analyzed the data, and wrote the original manuscript; Q.X., X.L., Y.T., Y.Z., W.C. and Q.C. reviewed and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2022YFD1900703); the Open Project of the Key Laboratory of Oasis Eco-agriculture, Xinjiang Production and Construction Corps (202104); the Undergraduate Student Innovation Research and Entrepreneurship Training (SRT) of Nanjing Agricultural University (202211XX450); and the Qing Lan Project of Jiangsu Universities.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We would like to thank Jufang Wang from the College of Foreign Studies at Nanjing Agricultural University for her contributions to the English corrections.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wan, W.; Xiao, J.; Li, M.; Tang, X.; Wen, M.; Cheruiyot, A.K.; Li, Y.; Wang, H.; Wang, X. Fine mapping of wheat powdery mildew resistance gene Pm6 using 2B/2G homoeologous recombinants induced by the ph1b mutant. Theor. Appl. Genet. 2020, 133, 1265–1275.
  2. Kim, K.-H.; Choi, E.D. Retrospective study on the seasonal forecast-based disease intervention of the wheat blast outbreaks in Bangladesh. Front. Plant Sci. 2020, 11, 570381.
  3. Pan, J.; Zhu, Y.; Cao, W.; Dai, T.; Jiang, D. Predicting the protein content of grain in winter wheat with meteorological and genotypic factors. Plant Prod. Sci. 2006, 9, 323–333.
  4. Vollmer, E.; Musshoff, O. Average protein content and its variability in winter wheat: A forecast model based on weather parameters. Earth Interact. 2018, 22, 1–24.
  5. Lee, B.-H.; Kenkel, P.; Brorsen, B.W. Pre-harvest forecasting of county wheat yield and wheat quality using weather information. Agric. For. Meteorol. 2013, 168, 26–35.
  6. McMillen, D.P. Geographically weighted regression: The analysis of spatially varying relationships. Am. J. Agric. Econ. 2004, 86, 554–556.
  7. Magney, T.S.; Eitel, J.U.H.; Huggins, D.R.; Vierling, L.A. Proximal NDVI derived phenology improves in-season predictions of wheat quantity and quality. Agric. For. Meteorol. 2016, 217, 46–60.
  8. Guo, T.; Dai, L.; Yan, B.; Lan, G.; Li, F.; Li, F.; Pan, F.; Wang, F. Measurements of chemical compositions in corn stover and wheat straw by near-infrared reflectance spectroscopy. Animals 2021, 11, 3328.
  9. Kandiannan, K.; Karthikeyan, R.; Krishnan, R.; Kailasam, C.; Balasubramanian, T.N. A crop-weather model for prediction of rice (Oryza sativa L.) yield using an empirical-statistical technique. J. Agron. Crop Sci. 2002, 188, 59–62.
  10. Sun, S.; Yang, X.; Lin, X.; Zhao, J.; Liu, Z.; Zhang, T.; Xie, W. Seasonal variability in potential and actual yields of winter wheat in China. Field Crops Res. 2019, 240, 1–11.
  11. Shaw, S.; Khan, J.; Paswan, B. Spatial modeling of child malnutrition attributable to drought in India. Int. J. Public Health 2020, 65, 281–290.
  12. Xu, X.; Teng, C.; Zhao, Y.; Du, Y.; Zhao, C.; Yang, G.; Jin, X.; Song, X.; Gu, X.; Casa, R.; et al. Prediction of wheat grain protein by coupling multisource remote sensing imagery and ECMWF data. Remote Sens. 2020, 12, 1349.
  13. Smith, G.P.; Gooding, M.J. Models of wheat grain quality considering climate, cultivar and nitrogen effects. Agric. For. Meteorol. 1999, 94, 159–170.
  14. Chen, X.; Wang, L.; Niu, Z.; Zhang, M.; Li, J. The effects of projected climate change and extreme climate on maize and rice in the Yangtze River Basin, China. Agric. For. Meteorol. 2020, 282, 107867.
  15. Genze, N.; Bharti, R.; Grieb, M.; Schultheiss, S.J.; Grimm, D.G. Accurate machine learning-based germination detection, prediction and quality assessment of three grain crops. Plant Methods 2020, 16, 157.
  16. Kristensen, K.; Schelde, K.; Olesen, J.E. Winter wheat yield response to climate variability in Denmark. J. Agric. Sci. 2011, 149, 33–47.
  17. Dalla Marta, A.; Grifoni, D.; Mancini, M.; Zipoli, G.; Orlandini, S. The influence of climate on durum wheat quality in Tuscany, Central Italy. Int. J. Biometeorol. 2011, 55, 87–96.
  18. Oshan, T.M.; Li, Z.; Kang, W.; Wolf, L.J.; Fotheringham, A.S. MGWR: A Python implementation of multiscale geographically weighted regression for investigating process spatial heterogeneity and scale. ISPRS Int. J. Geo-Inf. 2019, 8, 269.
  19. Woolfolk, C.W.; Raun, W.R.; Johnson, G.V.; Thomason, W.E.; Mullen, R.W.; Wynn, K.J.; Freeman, K.W. Influence of late-season foliar nitrogen applications on yield and grain nitrogen in winter wheat. Agron. J. 2002, 94, 429–434.
  20. Ruan, G.; Li, X.; Yuan, F.; Cammarano, D.; Ata-Ui-Karim, S.T.; Liu, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cao, Q. Improving wheat yield prediction integrating proximal sensing and weather data with machine learning. Comput. Electron. Agric. 2022, 195, 106852.
  21. Zheng, B.; Jiang, J.; Wang, L.; Huang, M.; Zhou, Q.; Cai, J.; Wang, X.; Dai, T.; Jiang, D. Reducing nitrogen rate and increasing plant density accomplished high yields with satisfied grain quality of soft wheat via modifying the free amino acid supply and storage protein gene expression. J. Agric. Food Chem. 2022, 70, 2146–2159.
  22. Feng, J.; Zhao, J.; Bian, X.; Zhang, W. Spatial distribution and controlling factors of heavy metals contents in paddy soil and crop grains of rice-wheat cropping system along highway in East China. Environ. Geochem. Health 2012, 34, 605–614.
  23. Qiu, H.; Yang, S.; Jiang, Z.; Xu, Y.; Jiao, X. Effect of irrigation and fertilizer management on rice yield and nitrogen loss: A meta-analysis. Plants 2022, 11, 1690.
  24. Aqeel, A.; Hassan, A.; Khan, M.A.; Rehman, S.; Tariq, U.; Kadry, S.; Majumdar, A.; Thinnukool, O. A long short-term memory biomarker-based prediction framework for Alzheimer’s disease. Sensors 2022, 22, 1475.
  25. Osman, R.; Zhu, Y.; Ma, W.; Zhang, D.; Ding, Z.; Liu, L.; Tang, L.; Liu, B.; Cao, W. Comparison of wheat simulation models for impacts of extreme temperature stress on grain quality. Agric. For. Meteorol. 2020, 288, 107995.
  26. Rharrabti, Y.; Elhani, S.; Martos-Nunez, V.; del Moral, L.F.G. Protein and lysine content, grain yield, and other technological traits in durum wheat under Mediterranean conditions. J. Agric. Food Chem. 2001, 49, 3802–3807.
  27. Ding, L.; Li, Z.; Wang, X.; Yan, R.; Shen, B.; Chen, B.; Xin, X. Estimating grassland carbon stocks in Hulunber China, using Landsat8 oli imagery and regression kriging. Sensors 2019, 19, 5374.
  28. Liu, S.; Xu, L.; Wu, Y.; Simsek, S.; Rose, D.J. End-use quality of historical and modern winter wheats adapted to the great plains of the United States. Foods 2022, 11, 2975.
  29. Wang, L.; Zhou, X.; Zhu, X.; Dong, Z.; Guo, W. Estimation of biomass in wheat using random forest regression algorithm and remote sensing data. Crop J. 2016, 4, 212–219.
  30. West, D.W.D.; Sawan, S.A.; Mazzulla, M.; Williamson, E.; Moore, D.R. Whey protein supplementation enhances whole body protein metabolism and performance recovery after resistance exercise: A double-blind crossover study. Nutrients 2017, 9, 735.
  31. Liu, Y.; Tang, L.; Qiu, X.; Liu, B.; Chang, X.; Liu, L.; Zhang, X.; Cao, W.; Zhu, Y. Impacts of 1.5 and 2.0 degrees C global warming on rice production across China. Agric. For. Meteorol. 2020, 284, 107900.
  32. Wang, J.; Wang, E.; Feng, L.; Yin, H.; Yu, W. Phenological trends of winter wheat in response to varietal and temperature changes in the North China Plain. Field Crops Res. 2013, 144, 135–144.
  33. Martinez-Fernandez, J.; Chuvieco, E.; Koutsias, N. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression. Nat. Hazards Earth Syst. Sci. 2013, 13, 311–327.
  34. Kang, Y.; Ozdogan, M. Field-level crop yield mapping with Landsat using a hierarchical data assimilation approach. Remote Sens. Environ. 2019, 228, 144–163.
  35. Li, Z.; Taylor, J.; Yang, H.; Casa, R.; Jin, X.; Li, Z.; Song, X.; Yang, G. A hierarchical interannual wheat yield and grain protein prediction model using spectral vegetative indices and meteorological data. Field Crops Res. 2020, 248, 107711.
  36. Motzo, R.; Giunta, F.; Deidda, M. Relationships between grain-filling parameters, fertility, earliness and grain protein of durum wheat in a Mediterranean environment. Field Crops Res. 1996, 47, 129–142.
  37. Singh, M.; Singh, V.P.; Reddy, D.D. Potassium balance and release kinetics under continuous rice-wheat cropping system in Vertisol. Field Crops Res. 2002, 77, 81–91.
  38. Ochola, D.; Boekelo, B.; van de Ven, G.W.J.; Taulya, G.; Kubiriba, J.; van Asten, P.J.A.; Giller, K.E. Mapping spatial distribution and geographic shifts of East African highland banana (Musa spp.) in Uganda. PLoS ONE 2022, 17, e0263439.
  39. Kumar, S.; Lal, R.; Liu, D. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma 2012, 189, 627–634.
  40. Mishra, U.; Lal, R.; Liu, D.S.; Van Meirvenne, M. Predicting the spatial variation of the soil organic carbon pool at a regional scale. Soil Sci. Soc. Am. J. 2010, 74, 906–914.
  41. Sun, Z.; Li, Q.; Jin, S.; Song, Y.; Xu, S.; Wang, X.; Cai, J.; Zhou, Q.; Ge, Y.; Zhang, R.; et al. Simultaneous prediction of wheat yield and grain protein content using multitask deep learning from time-series proximal sensing. Plant Phenomics 2022, 2022, 9757948.
Figure 1. Workflow of county-level winter wheat quality prediction based on machine learning.
Figure 2. County-level winter wheat grain protein content (GPC) statistics.
Figure 3. Correlation between latitude and winter wheat grain protein content (GPC) in Jiangsu Province. Note: ** denotes statistically significant differences at p < 0.01.
Figure 4. Relationship between true and predicted values obtained by (a) GWR, (b) MLR, (c) RF, (d) BPNN, (e) SVM, and (f) LSTM. Note: *, ** denote statistically significant differences at p < 0.05 and p < 0.01, respectively.
Figure 5. Comparison of the values of the six methods on the metrics (a) R², (b) RMSE, (c) MAE, and (d) MBE based on 6 features.
Figure 6. Coefficients of 16 factors, including N, P, K, SOM, TMAX03, MSD03, PRE03, MT03, TMAX04, MSD04, PRE04, MT04, TMAX05, MSD05, PRE05, and MT05, versus latitude.
Figure 7. Sensitivity indices of 16 factors including (a) N, (b) K, (c) P, (d) SOM, (e) TMAX03, (f) MSD03, (g) PRE03, (h) MT03, (i) TMAX04, (j) MSD04, (k) PRE04, (l) MT04, (m) TMAX05, (n) MSD05, (o) PRE05, and (p) MT05 in the GWR model.
Figure 8. SHAP values of (a) RF, (b) BPNN, (c) SVM, and (d) LSTM.
Table 1. Summary of the collected soil and weather datasets.

| Description | Abbreviation | Unit |
|---|---|---|
| Soil nitrogen content | N | g/kg |
| Soil potassium content | K | % |
| Soil phosphorus content | P | mg/kg |
| Soil organic matter content | SOM | % |
| Maximum temperature in March | TMAX03 | °C |
| Mean sunshine duration in March | MSD03 | h |
| Precipitation in March | PRE03 | mm |
| Mean temperature in March | MT03 | °C |
| Maximum temperature in April | TMAX04 | °C |
| Mean sunshine duration in April | MSD04 | h |
| Precipitation in April | PRE04 | mm |
| Mean temperature in April | MT04 | °C |
| Maximum temperature in May | TMAX05 | °C |
| Mean sunshine duration in May | MSD05 | h |
| Precipitation in May | PRE05 | mm |
| Mean temperature in May | MT05 | °C |
Table 2. Selected features in different subsets.

| Feature Subset | Feature Name | Feature Number |
|---|---|---|
| 1 | N, K, P, SOM, TMAX03, MSD03, PRE03, MT03, TMAX04, MSD04, PRE04, MT04, TMAX05, MSD05, PRE05, MT05 | 16 |
| 2 | N, K, P, SOM | 4 |
| 3 | TMAX03, TMAX04, TMAX05 | 3 |
| 4 | MSD03, MSD04, MSD05 | 3 |
| 5 | PRE03, PRE04, PRE05 | 3 |
| 6 | MT03, MT04, MT05 | 3 |
Table 3. Accuracy evaluation of four machine learning methods.

| Indicator | RF | BPNN | SVM | LSTM |
|---|---|---|---|---|
| R² | 0.62 | 0.28 | 0.32 | 0.27 |
| RMSE | 1.33 | 1.24 | 1.26 | 1.18 |
| MAE | 1.02 | 1.17 | 1.15 | 0.98 |
| MBE | 0.18 | −0.17 | −0.16 | −0.14 |
Table 4. GWR and MLR coefficients of the 16 independent variables.

| Variable | GWR Min | GWR 1/4 Quantile | GWR Median | GWR 3/4 Quantile | GWR Max | MLR Coefficient | MLR SE |
|---|---|---|---|---|---|---|---|
| Intercept | 0.001 | 10.100 | 16.249 | 23.118 | 34.203 | 17.405 | 1.231 |
| N | −1.045 | 0.121 | 0.303 | 0.597 | 1.676 | 0.425 | 0.152 |
| K | −1.702 | −0.654 | −0.386 | −0.115 | 0.974 | −0.765 | 0.175 |
| P | −0.041 | 0.003 | 0.012 | 0.036 | 0.057 | 0.009 | 0.004 |
| SOM | −0.414 | −0.092 | 0.085 | 0.212 | 0.573 | 0.121 | 0.076 |
| TMAX03 | −1.126 | −0.670 | −0.449 | −0.289 | 0.688 | −0.494 | 0.042 |
| MSD03 | −0.446 | −0.094 | −0.026 | −0.045 | 0.233 | −0.059 | 0.029 |
| PRE03 | −0.020 | −0.010 | −0.006 | −0.003 | 0.006 | −0.008 | 0.001 |
| MT03 | −0.266 | −0.021 | 0.086 | 0.148 | 0.240 | 0.020 | 0.028 |
| TMAX04 | −0.552 | −0.307 | −0.206 | 0.044 | 0.210 | −0.115 | 0.053 |
| MSD04 | −0.447 | −0.090 | 0.131 | 0.272 | 0.540 | 0.059 | 0.037 |
| PRE04 | −0.005 | −0.001 | 0.006 | 0.011 | 0.020 | 0.003 | 0.001 |
| MT04 | 0.478 | −0.245 | −0.095 | −0.038 | 0.175 | −0.150 | 0.030 |
| TMAX05 | −0.658 | 0.040 | 0.265 | 0.418 | 0.638 | 0.252 | 0.058 |
| MSD05 | −0.322 | −0.088 | 0.032 | 0.140 | 0.442 | 0.064 | 0.045 |
| PRE05 | −0.007 | −0.001 | 0.002 | 0.005 | 0.012 | 0.003 | 0.001 |
| MT05 | −0.281 | −0.018 | 0.080 | 0.224 | 0.452 | 0.091 | 0.037 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Song, Y.; Zheng, X.; Chen, X.; Xu, Q.; Liu, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cao, Q. Improving the Prediction of Grain Protein Content in Winter Wheat at the County Level with Multisource Data: A Case Study in Jiangsu Province of China. Agronomy 2023, 13, 2577. https://doi.org/10.3390/agronomy13102577
