1. Introduction
As one of the important cereal crops, wheat is a vital food source for human beings. With global climate change, the frequent occurrence of various extreme climate events, and the growing global population, food security has become a major concern [
1]. In this context, an accurate estimation of wheat production plays a significant role in the stable development of agricultural production, ensuring the effective supply of important agricultural products, improving the agro-ecological environment, and promoting food security [
2].
Conventional crop yield estimation mainly relies on human labor, and it is difficult to perform on a large scale. In recent years, multi-source data have played an increasingly important role in crop yield estimation [
3,
4,
5]. Crop yield estimation methods can be broadly classified into crop growth models, which are based on the crop growth process, and empirical models, which are based on statistical relationships. Crop growth models (e.g., DSSAT, WOFOST, APSIM) can simulate crop growth at different time steps, as well as yield formation under extreme climatic conditions, and offer good physical interpretability [
3,
6,
7,
8,
9]. However, due to the excessive number of input parameters (e.g., varieties, field management data, soil properties), the difficulty in obtaining these parameters, and the strong influence of spatial heterogeneity, the model applicability is low, the simulation is time-consuming, and it is difficult to realize an accurate yield estimation over a large-scale region [
10,
11].
Empirical statistical models (conventional statistical models and machine learning models) have shown advantages in large-scale crop yield estimation compared with crop growth models based on crop growth processes. In recent years, conventional empirical statistical models have produced better yield estimation results [
12]. For example, Ren et al. [13] constructed a linear relationship between the Normalized Difference Vegetation Index (NDVI) and yield; this relationship was used to estimate winter wheat yields in the Huang–Huai–Hai Plain of China, with an estimation accuracy of R² ≥ 0.86 for different prefecture-level cities. However, since conventional empirical statistical models rely only on a linear relationship between vegetation indices (VIs) and target yields, they struggle to capture the relationship between the data and yields when that relationship is not linear. Machine learning algorithms [
14,
15], such as random forest (RF), support vector machine (SVM), Gaussian process regression (GPR), and partial least squares (PLS), can be used to construct nonlinear relationships between environmental factors and yield. Various optimization algorithms can be used to tune the model and identify the features that play a key role in yield estimation [
16,
17]. For example, Son et al. [18] used Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI time-series data to develop an RF-based machine learning model to estimate rice yields in both cropping seasons in Taiwan, China; the coefficient of determination (R²) was greater than 0.84.
With the continuous development of satellite technology, large amounts of remote sensing data have been applied in various fields, and multi-source remote sensing data are widely used in crop yield estimation [
19,
20]. Currently, most studies have employed MODIS and Landsat image data to obtain relevant VIs, such as the NDVI, enhanced vegetation index (EVI), and leaf area index (LAI), to construct a relationship between the index and yield [
21,
22,
23]. Huang et al. [10] predicted yield in Hebei Province, China, by assimilating the LAI obtained from Landsat TM and MODIS data into a crop growth model, with an RMSE of 151.29 kg/ha. Bolton [24] developed empirical models for predicting corn and soybean yields in the central U.S.; for corn, the MODIS two-band enhanced vegetation index (EVI2) gave better estimates than the widely used NDVI. Although MODIS and Landsat provide useful yield information, their spatial and temporal resolution limits make it difficult to interpret the complex relationship between crop condition and yield over time. Therefore, an increasing number of studies have adopted Sentinel-2 image data and derived the relevant indices for yield estimation, achieving better results for both conventional linear and nonlinear models.
The rapid development of computer technology has provided powerful computational support, and deep learning has come into wide use. As an advanced form of machine learning, deep learning handles nonlinear relationships better than conventional linear models by learning layer by layer from low-level to high-level features, and it has been applied in various fields [
25,
26,
27,
28]. As a result, the application of deep learning models has gradually gained attention, especially combining the feature extraction capability of convolutional neural networks (CNNs) [
29,
30] with the advantages of long short-term memory networks (LSTMs) in processing time series data [
31]. This algorithmic fusion not only improves the prediction performance of the model but also identifies the key factors affecting yield during crop growth more effectively. By using CNN to extract spatial features, the model is able to analyze important information in satellite images in greater depth, while LSTM is able to deal with dependencies in time-series data, thus capturing dynamic changes at different growth stages. In addition, the introduction of the multi-attention mechanism enables the model to focus on multiple relevant features, thus enhancing the identification and weighting of important variables. This integrated approach effectively enhances the accuracy and reliability of yield estimation and can provide farmers and policymakers with more accurate forecasting information.
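To make the weighting idea concrete, the following is a minimal sketch of attention pooling over time steps. It is a simplified single-head, dot-product form written for illustration only and is not the multi-attention mechanism used in this study; the hidden states, the query vector, and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each time step against `query`, then return the weights and the weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Hypothetical LSTM outputs for four growth stages (two hidden units each).
hidden_states = [[0.1, 0.0], [0.4, 0.2], [0.9, 0.7], [0.8, 0.6]]
query = [1.0, 1.0]  # hypothetical learned query vector
weights, context = attention_pool(hidden_states, query)
```

The weights sum to one, so they can be read as the relative importance the model assigns to each time step; in this toy input the third step receives the largest weight.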
In crop yield estimation, accumulating and learning from historical information is crucial. However, since 1D-CNNs lack a recurrent mechanism, integrating historical data into the learning process can be challenging. Xiao et al. [32] constructed a novel deep learning framework, ACNN, with Sentinel-2 raw bands as input data. By placing attention channels after several convolutional layers and weighting the extracted features, the framework effectively identified the key spectral bands and helped to overcome the problems of high-yield underestimation and low-yield overestimation; the R² and RMSE values of the model were 0.44 and 1544 kg/ha, respectively. This illustrates the strong feature extraction capability of the CNN and the advantage of the attention mechanism in determining the contribution of features. Tian [
23] constructed a deep learning framework based on the attention mechanism (ALSTM). The LSTM network is well suited to processing temporal data, and incorporating the attention mechanism improves the internal interpretability of the model and allows it to identify the importance of input variables in determining the yield at different fertility periods. Wang et al. [33] combined a CNN with a gated recurrent unit (GRU) network to estimate the wheat yield in the Guanzhong Plain of Shaanxi; the R² and RMSE values of the model were 0.64 and 462.56 kg/ha, respectively. This demonstrates the advantages of combining a CNN with a recurrent neural network (RNN) for yield estimation. Sun [
34] introduced an innovative multi-level CNN–LSTM model designed to extract spatiotemporal features for predicting maize yield, showcasing the benefits of integrating CNN with RNN. Nonetheless, research focusing on the use of CNN in combination with LSTM and attention mechanisms for crop yield estimation remains limited.
Hence, in this study, a CNN–MALSTM deep learning model was developed, and its performance in estimating wheat yield across different years and fertility periods was assessed. The CNN–MALSTM yield estimation framework combines remote sensing and meteorological data to estimate winter wheat yield in the counties of Henan Province. It draws on the strength of the LSTM in processing time-series data, the feature extraction capability of the CNN, and the ability of the multi-attention mechanism to identify how much different input features contribute to the yield, with the aim of improving the interpretability of the relationship between the input data and the target yield. The EVI, temperature, and precipitation were used as input data: the EVI reflects crop growth status, while temperature and precipitation are crucial meteorological drivers of crop growth. The main objectives of this study were as follows:
- (1) Investigating the differences between the CNN–MALSTM model and the baseline model for county crop yield estimation in terms of their accuracy and performance.
- (2) Investigating the stability of the CNN–MALSTM model in yield estimation by fertility period for each year and the spatial and temporal distribution of production estimates in counties in different years.
- (3) Determining the optimal fertility period for yield estimation.
3. Results and Analysis
3.1. Yield Estimation Using CNN–MALSTM and Baseline Model
Figure 7 shows the scatter plots of the yield estimates for the test set across 101 counties of Henan Province. The R², RMSE, and MAPE were 0.79, 576.01 kg/ha, and 7.29% for the CNN–MALSTM model and 0.75, 646.53 kg/ha, and 8.82% for the baseline CNN–LSTM model. The CNN–MALSTM model performed better overall: compared with the CNN–LSTM model, its R² was 0.04 higher, and its RMSE and MAPE were lower by 70.52 kg/ha and 1.53 percentage points, respectively. In the yield scatter plots, the data are distributed on both sides of the 1:1 line and exhibit a linear relationship. The MAPE of 7.29% means that the estimates deviated from the official yields by about 7% on average, while the R² of 0.79 indicates that the model explained about 79% of the yield variation. This demonstrates the effectiveness of combining the CNN with the multi-attention mechanism in extracting relevant features from the original data and in attending to the different contributions of different features.
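For reference, the three evaluation metrics can be computed as in the sketch below. It assumes that R² is the coefficient of determination (one minus the ratio of residual to total sum of squares) rather than a squared correlation coefficient, which the text does not specify; the yield values are invented for illustration.

```python
import math

def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error, in the units of the yield (kg/ha)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

# Hypothetical official and predicted county yields (kg/ha).
observed = [5000.0, 6000.0, 7000.0]
predicted = [5200.0, 5900.0, 6700.0]

metrics = {
    "r2": r2_score(observed, predicted),
    "rmse": rmse(observed, predicted),
    "mape": mape(observed, predicted),
}
```

Note that MAPE measures the average relative error, whereas R² measures the share of yield variance explained, so the two should not be read interchangeably.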
Compared with the CNN–LSTM model, the slope of the trend line for the CNN–MALSTM model was closer to 1. In the yield intervals of 3000–5000 kg/ha and 7000–8000 kg/ha, considered low-yield and high-yield, respectively, the scatter plots showed that the CNN–MALSTM model alleviates the problems of high-yield underestimation and low-yield overestimation, which supports the effectiveness of the multi-attention mechanism in addressing them. However, the overall slope was still less than 1, and high yields were still underestimated and low yields overestimated in some areas. The density of the scatter plots shows that the yield data were concentrated in the range of 5000–7500 kg/ha, with comparatively few low-yield and high-yield samples, which partly explains the lower accuracy of both models at the extremes. Deep learning typically requires a large sample size, and the limited number of high-yield samples in this study likely prevented the model from fully learning high-yield features. Judging by the 600 kg/ha error lines on the scatter plots, together with the R² and RMSE, the CNN–MALSTM model had the fewest points outside the error lines and its points lay closer to the trend line, whereas the CNN–LSTM model significantly underestimated high yields. This further suggests that combining the CNN with the multi-attention mechanism allows for better extraction of important features.
3.2. Comparison of CNN–MALSTM Model in Terms of County Residuals
In this study, the difference between the official statistical yields and the predicted yields was used to calculate the residuals. We took into account the specific yield characteristics of the study area and referred to the threshold-setting method for wheat yield estimation from similar areas [
39]. We used the absolute value threshold method with a fixed threshold of 600 kg/ha. Specifically, residuals greater than 600 kg/ha were considered underestimates, while residuals less than −600 kg/ha were considered overestimates.
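The threshold rule can be expressed as a small function. This sketch assumes the residual is the official yield minus the predicted yield and that the band is symmetric about zero (±600 kg/ha); the function name and labels are illustrative.

```python
THRESHOLD = 600.0  # kg/ha, the fixed absolute threshold stated in the text

def classify_residual(official, predicted, threshold=THRESHOLD):
    """Label a county estimate from its residual (official minus predicted)."""
    residual = official - predicted
    if residual > threshold:
        return "underestimate"   # prediction is too low by more than the threshold
    if residual < -threshold:
        return "overestimate"    # prediction is too high by more than the threshold
    return "within threshold"

# Hypothetical county yields (kg/ha).
labels = [
    classify_residual(7500.0, 6800.0),
    classify_residual(4000.0, 4700.0),
    classify_residual(6000.0, 5800.0),
]
```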
Figure 8 shows the histograms of the county residuals for the CNN–MALSTM model. Overall, the yield errors in most areas were within 600 kg/ha, and the residuals were approximately normally distributed and roughly symmetric about zero. Based on the residual box plots, high yields showed different degrees of underestimation, while the medium- and low-yield residuals were both positive and negative. The average residual for the high-yield class was 337.50 kg/ha, larger in magnitude than the average residuals for the medium- and low-yield classes (−230.29 kg/ha and −183.05 kg/ha). These results indicate that the model performed better in medium- and low-yielding areas. The width of the residual violin plots in Figure 9 shows that medium-yield samples are the most numerous, consistent with the distribution of the collected yield data. Because few high-yield samples were available during training, the models estimated high yields less accurately than low and medium yields, with the CNN–MALSTM model performing the best. Overall, the model can learn the relationships among different features and relate them to the final yield.
Box plots show the variability and distributional characteristics of data by displaying key statistics of the dataset: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. They can therefore help in analyzing how the winter wheat yield prediction results vary across yield ranges. Further analysis of the residuals through box plots suggests that the uneven data distribution may be a key contributing factor. Because high-yield samples are scarce, the model cannot adequately learn the key factors affecting high yields, such as climatic conditions and the corresponding remote sensing signals. In addition, feature selection and model complexity also affect the prediction results. To improve model performance in high-yield areas, future research could introduce more high-yield data or apply data augmentation techniques.
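The statistics displayed by a box plot can be computed directly, as in the sketch below. The residual values are invented, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; plotting libraries may use slightly different conventions.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum of a sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

# Hypothetical residuals (kg/ha) for one yield class.
residuals = [-400.0, -250.0, -100.0, 0.0, 150.0, 300.0, 700.0]
summary = five_number_summary(residuals)
```

Comparing such summaries between the low-, medium-, and high-yield classes shows at a glance whether one class is systematically shifted away from zero.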
3.3. Model Robustness Assessment
To further assess the robustness and generalizability of the model, leave-one-year-out cross-validation was used: one year’s data were held out for validation, while the data from the other years were used for training. The model performance was assessed for each year in turn until every year’s data had served as the test set.
Figure 10 shows the results of the cross-validation for different years. The R² ranged from a minimum of 0.70 to a maximum of 0.74, with corresponding RMSE values of 718.40 kg/ha and 697.58 kg/ha. Overall, the estimation accuracy of the model was stable across years, with an average R² of 0.73 and an average RMSE of 690.82 kg/ha. The interannual differences in the estimation results were relatively small, indicating the robustness of the CNN–MALSTM model (Table 3 and Figure 10).
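The year-wise splitting scheme described in this section can be sketched as follows. The sample structure (a dictionary with a `year` key) and the county labels are hypothetical; only the splitting logic is shown, not model training.

```python
def leave_one_year_out(samples):
    """Yield (held_out_year, train, test) with each year used once as the test set."""
    years = sorted({s["year"] for s in samples})
    for held_out in years:
        train = [s for s in samples if s["year"] != held_out]
        test = [s for s in samples if s["year"] == held_out]
        yield held_out, train, test

# Hypothetical samples: two counties observed in each year from 2019 to 2023.
samples = [{"year": y, "county": c} for y in range(2019, 2024) for c in ("A", "B")]
splits = list(leave_one_year_out(samples))
```

Splitting by year rather than at random keeps all counties of the held-out year out of training, so each fold tests the model on a season it has never seen.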
3.4. Fertility-by-Fertility CNN–MALSTM Model Performance Analysis
To explore the cumulative effect of the input feature time series on the final yield estimation, a stepwise sensitivity analysis was used to quantify how the model’s yield estimation performance changed as feature data from successive fertility periods were added. R² and RMSE were used to compare the accuracy of the yield estimation for 2019–2023 in each county by fertility period. Figure 11 and Figure 12 show the results.
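One way to set up such a stepwise analysis is to build, for each fertility period, an input sequence that contains only the data available up to that period. The sketch below is an assumption about how this could be done: the stage list and its order, the zero padding of unseen stages, and the EVI values are all illustrative and are not taken from the study.

```python
# Assumed stage order, for illustration only.
STAGES = ["overwintering", "greening", "jointing", "heading",
          "anthesis", "filling", "maturity"]

def cumulative_inputs(series, pad_value=0.0):
    """For each stage k, keep the values up to stage k and pad the later stages."""
    if len(series) != len(STAGES):
        raise ValueError("expected one value per stage")
    runs = {}
    for k, stage in enumerate(STAGES):
        runs[stage] = series[: k + 1] + [pad_value] * (len(series) - k - 1)
    return runs

# Hypothetical per-stage EVI values for one county and year.
evi_series = [0.2, 0.3, 0.5, 0.7, 0.8, 0.6, 0.3]
runs = cumulative_inputs(evi_series)
```

Evaluating the model on each of these progressively longer inputs gives one R² and RMSE per stage, which is the kind of curve the stepwise analysis reports.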
Before the overwintering period, winter wheat grows slowly, the soil background has a large influence on the signal, and it is difficult to obtain yield-related information from satellite images. In addition, few meteorological observations have accumulated at this point, making it difficult to distinguish the impact of environmental factors on winter wheat. As a result, the model is inaccurate in the early stage.
As shown in Table 4, the five-year mean R² and RMSE for the overwintering period were 0.43 and 999.84 kg/ha, respectively. After winter, wheat enters the greening period; as the temperature gradually rises, photosynthesis intensifies, winter wheat grows rapidly, and the chlorophyll content of the plants rises substantially until it reaches its maximum before the heading and filling period. These changes are reflected in remote sensing imagery. After overwintering, as more data were input into the model over time, the R² increased significantly and the RMSE decreased significantly, indicating that the model captured the complex nonlinear relationship between the input features and yield; the model accuracy leveled off from the late filling stage to maturity. As data from successive fertility stages were added, their effect on the final yield estimate varied, and there was an evident interval of rising model accuracy: the mean R² increased by 0.25, and the RMSE decreased by 261.52 kg/ha from the overwintering to the heading stage. In terms of wheat growth, the sharp increase in chlorophyll content, enhanced photosynthesis, and increased demand for water and nutrients from jointing to heading are important factors in yield formation, and the model successfully identified the contribution of the features of this stage. This pattern was consistent from year to year.
An analysis of Table 4 shows that as input data were gradually added, the model better captured the critical period of wheat growth and its yield estimation ability gradually improved. In the later growth stages, as winter wheat growth peaks and senescence begins, the chlorophyll and water contents of the plants decline rapidly. The yield information carried by the input data then tends to saturate, and the model accuracy improves slowly and stabilizes, reaching a good level in the anthesis period. The five-year average R² and RMSE were 0.71 and 720.83 kg/ha, respectively. In summary, the model can provide good yield estimates around the anthesis period.
3.5. Analysis of the Spatiotemporal Distribution of County-Level Estimates of the Winter Wheat Yield for the CNN–MALSTM Model
Figure 13 shows the predicted yield and residual distributions of the model in each year: Figure 13a–e are the county-level estimated yield distribution maps for 2019–2023, and Figure 13a’–e’ are the differences between the statistical and predicted values in each year. The estimated yield distribution maps show that the predictions are consistent overall with the yield distribution in Henan Province, with lower values in the west and higher values in the center and east. This pattern is related to the terrain. The western part of the province is at a higher altitude, the central and northern plains are suitable for wheat cultivation, and the southern Xinyang area has limited wheat cultivation due to topographical and climatic factors, resulting in lower yields. The residual maps show that high-yielding areas in the east-central part, such as Boai County, Wuzhi County, Wen County, Yanling County, Xiangcheng County, Joon County, Changge City, and Shangshui County, were underestimated to varying degrees, while low-yielding areas in the west, such as Luanchuan County, Luoning County, Ruzhou City, Xixia County, and Xichuan County, were overestimated. Among them, Xixia County had an error greater than 600 kg/ha in all five years. The area has strong topographic relief, and topographic shading degrades image quality; conditions may also vary considerably across years and locations, which further increases yield fluctuations. These factors may have contributed to the bias in the yield estimates. Overall, the yield in Henan Province increased steadily; the model’s estimates for each year were consistent with the distribution of the official statistics, the absolute value of the model’s overall residuals remained within 600 kg/ha, and the CNN–MALSTM model performed well overall in estimating county-level yields in Henan Province.
4. Discussion
4.1. Analysis of the Advantages of the CNN–MALSTM Model
The CNN–MALSTM model proposed in this paper effectively estimates the yield of winter wheat at the county level in Henan Province by integrating remote sensing and meteorological data. The model not only realistically reflects the distribution of the data but also captures the key variables that affect the yield of winter wheat throughout the growth and development process. Compared with the baseline CNN–LSTM model, the CNN–MALSTM model incorporates a multi-attention mechanism, which improves the accuracy of yield estimation, especially by reducing the underestimation of high yields. The model’s yield estimates closely match the actual yield distribution in Henan Province.
In the leave-one-year-out cross-validation, the model’s yield predictions were stable, although the prediction accuracy varied slightly across years. In addition, the model performed well in the fertility-based sensitivity analysis in different years. The sensitivity analyses in Table 4 show that gradually adding fertility-period data to the model made it possible to quantify how prediction accuracy develops over the season, and that the model predicted yields accurately as early as 20 days before the harvest of winter wheat, which is in line with the results of a previous study [40]. This 20-day lead time is important for early decision-making and planning by farmers and policymakers, helping them adjust their strategies in time to optimize yield and resource allocation.
To explore how the contribution of CNN–MALSTM to winter wheat yield estimation differs from that of CNN–LSTM in different years and fertility periods, we conducted fertility-by-fertility sensitivity analyses of CNN–LSTM and calculated the R² and RMSE in each year. Averaging the results of the two models over five years in each fertility period (shown in Figure 14) reveals the trend in estimation accuracy across fertility periods. Before the greening period of winter wheat, crop growth is weak and the factors affecting the yield are relatively simple, consisting mainly of basic meteorological variables such as temperature and precipitation. These factors can be effectively captured by the CNN–LSTM model, so it performs well at this stage even without the attention mechanism. However, as winter wheat growth advances, the meteorological data (e.g., temperature and precipitation) and remote sensing data are gradually fused, and the advantage of the attention mechanism begins to emerge: it helps the model focus more precisely on key yield-related features, which in turn improves its predictive ability. Both models identified important fertility stages of winter wheat (e.g., jointing to anthesis), where the link between key crop characteristics and yield potential became clearer, especially when the EVI reached saturation. The multi-attention mechanism allows the model to capture these key characteristics at an earlier stage, yielding relatively accurate predictions about 20 days in advance.
In summary, by combining the CNN–LSTM model with a multi-attention mechanism, we revealed the cumulative effects and nonlinear relationships between the influencing factors and yield, and explored the timeliness of crop yield prediction as well as the contribution of multiple variables to yield estimation at different growth stages. Because feature relationships are simple in the early stage, the model without the attention mechanism may perform better there, while in the later stage, as the feature relationships become more complex, the attention mechanism effectively improves the model’s performance. This reflects the different demands on model structure and feature extraction capability at different stages. The model improves the reliability and timeliness of yield prediction and provides a scientific basis for relevant management decisions.
4.2. Generalization Performance of the Model
Based on the data processing and preparation described in Section 2, we applied the CNN–MALSTM model to Pingyuan County, Shandong Province, in a cross-regional (heterogeneous) generalization experiment to estimate winter wheat yield between 2019 and 2023. With the model structure and parameters kept unchanged, the prediction accuracy of the model in these five years is shown in Table 5.
During the study period from 2019 to 2023, the model achieved an R² of 0.73 and an RMSE of 422.21 kg/ha, showing good performance across years. The average accuracy was 93.80%, and the maximum inter-annual gap was only 4.0%. This result indicates that despite the increased fluctuation in yields in the county after 2019 (Table 5) and the decrease in model accuracy relative to 2019, the overall predictive ability remained high. There is a slight underestimation of the overall yield, potentially because the distribution of the training data features in Henan differs from that in Shandong; such distributional differences can reduce predictive accuracy, and regional extrapolation is prone to bias. Nevertheless, the model adapted to interannual variations in climate and agricultural management, maintaining a high level of prediction accuracy despite changes in the data. In addition, its ability to integrate remotely sensed data with meteorological information enhances its applicability to different regions and time periods.
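The text does not define how the accuracy percentage was calculated. A common choice, assumed here purely for illustration, is one minus the relative absolute error; the yearly yields below are invented and are not the values in Table 5.

```python
def relative_accuracy(observed, predicted):
    """Accuracy in percent, assumed here to be 100 * (1 - |pred - obs| / obs)."""
    return 100.0 * (1.0 - abs(predicted - observed) / observed)

# Hypothetical (observed, predicted) county yields in kg/ha for three years.
yearly = {
    2019: (6000.0, 5880.0),
    2020: (6100.0, 5734.0),
    2021: (5900.0, 5546.0),
}

accuracy = {year: relative_accuracy(o, p) for year, (o, p) in yearly.items()}
mean_accuracy = sum(accuracy.values()) / len(accuracy)
max_gap = max(accuracy.values()) - min(accuracy.values())  # largest inter-annual gap
```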
In summary, the performance of the CNN–MALSTM model in heterogeneous generalization confirms its reliability and adaptability in winter wheat yield estimation, and lays the foundation for future applications in other regions.
4.3. Analysis of Model Limitations
Although the proposed CNN–MALSTM model performs well at the county level, there are still some limitations. One of the main challenges is related to data collection. Because the Sentinel-2 two-satellite constellation has been operating for a relatively short time, the sample size available to the model is limited, which affects its robustness and generalization ability. To reconcile the high-frequency temperature and precipitation data with the lower-frequency satellite imagery, we unified all the data to the same time step, i.e., the fertility interval. The high-frequency meteorological data for an interval can therefore be used only if a satellite image is available for that interval. To cope with the limited sample size, data augmentation methods such as generative adversarial networks (GANs) and variational autoencoders (VAEs) can be considered in the future to extend the diversity of the dataset. In addition, when Sentinel-2 data are not available at a certain growth stage of winter wheat due to factors such as clouds and rain, a potential solution is to integrate Landsat and Sentinel-2 data through multi-source image fusion techniques to ensure complete image coverage during the growing season. Second, this study explored the applicability of the model to winter wheat yield estimation mainly from the perspective of the fertility period. Thanks to the high temporal resolution of Sentinel-2, future studies can further explore yield estimation at finer time scales (e.g., 5-, 10-, and 15-day intervals). It is also important to perform a detailed data quality assessment, including the detection of missing values, outliers, and erroneous data, before using meteorological datasets. Interpolation techniques (e.g., mean interpolation, linear interpolation, or model-based interpolation) can be used to fill in missing values and ensure the integrity of the dataset. Where only a few values are missing, the affected data points can instead be removed to avoid any impact on model training.
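As an illustration of the linear interpolation option, the sketch below fills gaps in an evenly spaced series. Missing values are represented as `None`, and gaps at either end are filled with the nearest observed value; both conventions are assumptions made for this example, as are the temperature values.

```python
def interpolate_linear(values):
    """Fill None gaps linearly between the nearest observed neighbors."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: copy the first observation
            filled[i] = values[right]
        elif right is None:     # gap at the end: copy the last observation
            filled[i] = values[left]
        else:                   # interior gap: straight line between neighbors
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

# Hypothetical daily mean temperatures (°C) with two missing days.
temps = [10.0, None, None, 16.0]
filled_temps = interpolate_linear(temps)
```

Linear interpolation suits smoothly varying variables such as temperature; for precipitation, which is intermittent, a different gap-filling strategy may be more appropriate.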
The diversity of cultivars and field management practices within the vast winter wheat growing region of Henan Province, coupled with the fact that it spans a significant north-south distance, poses additional challenges for yield estimation. For example, there are differences in the phenological periods of winter wheat in northern and southern Henan Province. Although this study used a uniform standard to delineate fertility periods across the province, this may not fully explain the regional differences. Future studies could adjust fertility intervals based on recognized phenological differences to improve the accuracy of input data and ultimately enhance the precision of yield estimates. Additionally, certain factors that significantly influence yield, such as elevation, soil properties, human field management, and irrigation practices, were not included in this study due to limitations in feature quantification and data acquisition. Jiang et al. [
41] showed that the introduction of these factors can effectively help the model identify spatial heterogeneity in yield. Although the model performed well on the existing dataset, the inclusion of these additional factors is expected to further enhance its predictive ability. For example, solar-induced fluorescence (SIF), which reflects crop photosynthesis, and the vegetation temperature condition index (VTCI), which is sensitive to moisture stress during the growing season [
42], can be integrated into the yield estimation model to further improve its accuracy.
Although deep learning models like CNN–MALSTM are good at revealing complex relationships in data, they are often criticized as “black boxes” because their internal computations are difficult to interpret. This lack of transparency can lead to uncertainty, as the intrinsic relationship between input features and target yield remains unclear. In contrast, traditional crop growth models provide a more explanatory characterization of crop growth processes and mechanisms. Future research could attempt to combine crop growth models with deep learning methods to improve the interpretability of the models and apply them to large-scale yield estimation. This hybrid approach could combine the predictive power of deep learning with the explanatory advantages of crop growth models.
In addition, the model in this study has only been tested on winter wheat and has not been validated for other crops (e.g., corn and soybean). This limits the applicability of the model to some extent, as its accuracy and reliability for other crops cannot be guaranteed. Different crops have different growth cycles and phenological characteristics and therefore require independent modeling and validation to ensure generalization and accuracy of the predictive model. In addition, the model has only been tested in wheat-producing areas in Henan Province and Pingyuan County, and further validation in other areas in the future will help to improve the applicability of the model. Therefore, the generalization and wide applicability of the method still need further research and validation.
5. Conclusions
In this study, we developed a CNN–MALSTM deep learning framework that combines the spatial feature extraction capability of convolutional neural networks (1D-CNN) with the advantages of long short-term memory (LSTM) networks in processing time-series data and integrates a multi-attention mechanism to enable the model to efficiently capture the contribution of various features to the final yield. All data processing steps were performed on the Google Earth Engine (GEE) platform, including the pre-processing of Sentinel-2 imagery (resampling and masking) and computation of the EVI. In addition, meteorological data (precipitation and temperature) were derived from the ERA5 dataset, extracted hour by hour through GEE, and aggregated to the fertility scale. This processing method makes the data processing more efficient and guarantees the spatial and temporal consistency of the data.
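The two preprocessing computations mentioned above can be sketched in plain Python, independently of GEE. The EVI uses the standard three-band formula with surface reflectance scaled to 0–1. Aggregating temperature as a mean and precipitation as a total per fertility period is an assumption made for this example, since the aggregation statistics are not stated here; the records are invented.

```python
def evi(nir, red, blue):
    """Enhanced vegetation index from surface reflectance values in the range 0-1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def aggregate_by_stage(records):
    """Aggregate (stage, temperature_c, precipitation_mm) records per stage.

    Returns the mean temperature and total precipitation of each stage
    (assumed statistics, for illustration only).
    """
    totals = {}
    for stage, temp, precip in records:
        t_sum, p_sum, n = totals.get(stage, (0.0, 0.0, 0))
        totals[stage] = (t_sum + temp, p_sum + precip, n + 1)
    return {
        stage: {"mean_temp": t_sum / n, "total_precip": p_sum}
        for stage, (t_sum, p_sum, n) in totals.items()
    }

# Hypothetical reflectances and hourly weather records.
example_evi = evi(nir=0.5, red=0.1, blue=0.05)
weather = [("jointing", 10.0, 1.0), ("jointing", 14.0, 0.0), ("heading", 18.0, 2.5)]
stage_weather = aggregate_by_stage(weather)
```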
The results show that the CNN–MALSTM model exhibits excellent accuracy in estimating winter wheat yield at the county level in Henan Province, outperforming the baseline model CNN–LSTM. In particular, the model is able to accurately predict the yield about 20 days before harvest, showing good robustness and reliability. In addition, the spatio-temporal distribution map of county-level winter wheat yields in Henan Province from 2019 to 2023 demonstrated a trend of lower yields in the western region and higher yields in the central-eastern region, along with a reduction in inter-annual variability. The results are highly correlated with the official statistics, indicating that the proposed CNN–MALSTM framework has significant potential for yield estimation.
Future research could explore the application of the CNN–MALSTM framework to other crops or regions to assess its generality and adaptability. In addition, incorporating other variables that affect yield, such as soil, geographic, or pest and disease information, may further improve the predictive power and robustness of the model. These research avenues will help expand the range of model applications and provide broader technical support for global agricultural yield prediction.