Mapping Field-Level Maize Yields in Ethiopian Smallholder Systems Using Sentinel-2 Imagery
Round 1
Reviewer 1 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The revised manuscript has improved greatly. However, I think the introduction and the results should cite some additional relevant references to strengthen the scientific soundness and the interest for readers. In short, the manuscript can be accepted after minor revision.
Comments on the Quality of English Language
I still think minor editing of the English language is required.
Author Response
Please see attached document.
Author Response File: Author Response.docx
Reviewer 2 Report (New Reviewer)
Comments and Suggestions for Authors
Title: Mapping Field-level Maize Yields in Ethiopian Smallholder Systems Using Sentinel-2 Imagery
This work assesses the ability of three vegetation indices derived from Sentinel-2 imagery and of two models, linear regression and random forest regression, both with and without environmental data on land temperature and soil characteristics, to map maize yields at the plot level in Oromia district, Ethiopia, during the 2021 long rainy season. The authors conclude that it is possible to map yields at the plot level using vegetation indices, in particular MTCI, with accuracies similar to those obtained by other authors, but that the models developed for one study region were not generally applicable in the other region. Adding environmental parameters to the models did not improve accuracy.
General remarks:
The authors show an interesting application of Sentinel-2 data to map smallholders’ yields, which is a research problem yet to be addressed thoroughly in the literature.
I find that one weakness of this manuscript is that the authors directly model maize yield but don’t discuss aboveground dry matter production, which could be more directly related to the vegetation indices. Maybe the authors could explain whether this kind of data (aboveground dry matter) was unavailable, or whether they have reasons to think that the maize harvest index is quite stable among varieties and cultural practices in the study area.
Also, showing some graphs of predicted yield vs. actual yield could help the reader understand the range and the trends of yields obtained in this work.
Further remarks (not exhaustive):
Reference to Mueller et al., 2012 (line 34) isn’t provided in the references section.
Please check reference to IPCC 2022 (line 35).
Reference to FAO 2022 (line 35) could be OECD/FAO 2022?
The reference to Abate et al., 2015 (line 39) is dated 2013 in the references section.
Reference to Paliwal and Jain, 2020 isn’t provided in the references section.
Reference (Johnson et al., 2016) (line 104) does not appear in the references section.
Reference (Kang et al., 2019) (line 104) isn’t provided or the year is 2020 (cf. line 585).
Reference Debalke and Abebe, 2022 (line 115) isn’t provided in the references section.
Reference Guo et al., 2023 (lines 116-117, 396-397, and 444) isn’t provided in the references section.
Reference FAO 2022 in line 135 is probably OECD/FAO 2022, please correct if it is.
Reference Gitelson et al., 2003 is not provided in the references section.
Should the reference year in (Breiman, 2011) in line 291 be 2001? Please check.
The reference year in (Dash and Curran, 2007), in lines 408 and 420, doesn’t match the reference in lines 535-536. Please check.
References to Gu et al., 2013 and Ulfa et al., 2022 (line 414) are missing in the references section.
Reference to Johnson et al., 2016 in line 463 is missing in the references section.
Author Response
Please see attached document.
Author Response File: Author Response.docx
Reviewer 3 Report (New Reviewer)
Comments and Suggestions for Authors
This study aimed to estimate field-level maize yields using field-level yield data and Sentinel-2 data through linear regression and random forest regression methods. The authors addressed several common challenges in yield estimation, such as selecting the optimal vegetation index and comparing the performance of linear regression and random forest. Of particular interest is the exploration of whether a generalizable model can accurately predict maize yields using limited training ground data. The topic of the research is straightforward and comprehensible, yet the research problem is intriguing. The manuscript is well written; below are suggestions for improving the study:
1. Lines 279-282: the inconsistency between "B1VI1" and the symbol β in formula (2) should be addressed.
2. It is recommended to present comparisons between estimated maize yields by linear regression or random forest regression and in-field yield estimations using scatter plots, similar to Fig. 2(c, d) in Burke and Lobell (2017) or Fig. 4 in Lobell et al. (2015). Including a 1:1 line in the figure will help identify overestimations or underestimations due to saturation effects or other factors.
3. The manuscript cites Guo et al. (2023) three times but does not include it in the reference list. Please ensure all cited works are correctly listed.
4. Line 484: Could you briefly explain how crop model simulations can replace localized ground data in training algorithms?
5. In future studies, will you consider incorporating SAR data (e.g., Sentinel-1) or Lidar data to enhance accuracy?
6. Figure S1: Please add a figure caption explaining what VI 1, VI 2, VI 3, etc., represent.
Comments on the Quality of English Language
Minor editing of English language required.
Author Response
Please see attached document.
Author Response File: Author Response.docx
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
This manuscript is concerned with maize yield mapping using Sentinel-2 satellite imagery. Different indices are tested, including MTCI, GCVI, and NDVI, and different models (linear regression and random forest regression) are used to map field-level yields. The authors also examine whether the models improve with weather and soil data, and how generalizable the models are when trained in one region and applied to another region without data for model calibration. Some inspiration can be drawn from this work, which could provide a possible reference for maize yield mapping using remote sensing data. However, there are some significant flaws in the manuscript. I advise accepting this paper after a major revision. Some major problems must be addressed, including maps of yields, correlation analysis, and the collinearity problem; these will directly affect whether this manuscript is accepted. The specific revision suggestions are as follows:
(1) This manuscript is focused on maize yield mapping, but the authors do not provide any maps of yields (either results or references) anywhere in the manuscript. I think this is the biggest problem.
(2) Figures S1-S3 in the supplementary material show that most of the independent variables are only weakly correlated with the dependent variable; one could even conclude that there is no correlation between them. These variables are not suitable for modeling, especially with linear regression, because linear regression already assumes that these environmental variables and the vegetation index are related to yield. In other words, these independent variables should be excluded during feature selection.
(3) The authors have found that adding soil and weather data did little to improve model fit. Was collinearity or multicollinearity between these environmental variables and the vegetation index considered? Still in Figures S1-S3, most of the independent variables have a weak correlation with the dependent variable, but there are strong correlations among the independent variables, such as precipitation, TMAX and TMIN in Figure S2. This suggests that the authors need to be careful in selecting variables for modeling.
(4) Section 3.2: how do the authors directly assess the influence of cloud cover on vegetation index calculations and final yield predictions? This corresponds to question 4 of this article, for which I cannot find a direct data result.
(5) For the comparison of model accuracy, the authors use only tables, which makes the results less obvious. The authors should consider using graphs for better presentation.
(6) Lines 492-494: the authors find that random forest with MTCI performs best, though these models were less generalizable than linear regression models. In evaluating the models' generalization ability, what specific factors lead to their insufficient generalization?
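The collinearity check raised in comment (3) can be illustrated with a minimal sketch. This is not the authors' analysis: the temperature values are invented, the function names are hypothetical, and the shortcut VIF = 1 / (1 - r²) holds only for the special case of exactly two predictors. With more predictors, each one would have to be regressed on all the others.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

# Invented plot-level values (degrees C), for illustration only.
tmax = [24.0, 26.0, 27.0, 30.0]
tmin = [10.0, 13.0, 12.0, 15.0]

print("r(TMAX, TMIN) =", round(pearson(tmax, tmin), 3))
print("VIF =", round(vif_two_predictors(tmax, tmin), 2))
```

A VIF well above 1 indicates that the two predictors carry largely overlapping information. Thresholds of about 5 or 10 are common rules of thumb rather than fixed criteria.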
Author Response
Please see attached document.
Author Response File: Author Response.docx
Round 2
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
No further questions.
Author Response
Thank you for your note. We have now improved the clarity of our results thanks to you and the academic editor by moving Figure 3 into the results section and clearly explaining this result and how it connects to the rest of the paper. Thanks for this helpful suggestion.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper discusses the use of remote sensing data to explain maize yield variability; in this context, many research works have already been published and are available to readers. This technical note could be considered low-to-medium quality work. My suggestion for the authors is to use remote sensing data along with soil and weather information, because remote sensing data alone cannot accurately define yield levels at field scale. Remote sensing data do not always agree with yield variability levels, and high spectral index values of crop plants should not be used on their own to predict future yield: high values may result from early sowing or adequate water availability, and lower values may result from insect pest attacks, diseases, the flowering or grain-filling stage, lodging, or extreme weather. High or low index values are therefore not necessarily indicative of increases or decreases in grain yield and associated traits. It is known that soil and weather factors significantly affect crop yields, so these factors are crucial to include in the analysis to estimate productivity levels within fields. I recommend revising this paper by adding soil and weather data as model inputs to precisely estimate maize yield within fields.
Reviewer 2 Report
Comments and Suggestions for Authors
I think this manuscript is very useful for mapping field-level maize yields in smallholder systems with Sentinel-2 imagery and Google Earth Engine. However, the results and discussion are very simple, and there are very few tables and figures. I suggest the authors add content to the manuscript to present the results in more detail, and compare their results with those of other studies at similar or larger scales around the world.
Comments on the Quality of English Language
I think minor editing of the English language is required, and the writing should be more academic.
Reviewer 3 Report
Comments and Suggestions for Authors
GENERAL
The authors have used Sentinel-2 data to map field-level maize yields based on VIs and two regression methods. However, the manuscript has room for improvement.
Comments in detail:
ABSTRACT
Before introducing the work (Line 11), the research progress or issues need to be clarified. That is, what problems can this study solve compared to previous studies?
The prediction accuracy is limited (R² < 0.47), so can the approach be applied in practice?
INTRODUCTION
The introduction should review the articles related to crop yield mapping or retrieval and point out the issues in previous studies (before the last paragraph). Although the study introduces the common pros and cons of Sentinel-2 data, different VIs, and LR and RF models, it does not identify the problem it aims to solve among these methods; it simply uses them. Therefore, the study needs to highlight its innovation compared with previous yield estimation.
MATERIALS AND METHODS
Lines 172-177: the GPS location of each plot was recorded; what, then, is the purpose of determining the field boundary?
Lines 227-235: some parameters in the random forest model modulate model performance. How were these parameters determined, and are they optimal?
RESULTS
1. I suggest adding a paragraph (before line 252) to describe the dynamics of the vegetation indices and yield in each region.
2. Line 293: why does R² have a negative value, as also seen in Table 4?
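For context on the question about negative R², here is a minimal sketch. It is not the authors' code: the yield values are invented, and it assumes R² is computed as 1 - SS_res / SS_tot against observed values, for example on held-out plots. Under that definition, R² falls below zero whenever the predictions fit worse than simply predicting the mean observed yield.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented yields (t/ha), for illustration only.
observed = [1.0, 2.0, 3.0]
biased = [3.0, 3.0, 3.0]     # a model that systematically overpredicts
mean_only = [2.0, 2.0, 2.0]  # baseline that always predicts the observed mean

print(r_squared(observed, biased))     # -1.5
print(r_squared(observed, mean_only))  # 0.0
```

This situation typically arises when a model trained on one dataset is evaluated on another. A squared correlation coefficient, by contrast, can never be negative.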
DISCUSSION
1. The study mostly discusses the impact of cloud cover on the results; a comparison with other studies should therefore be added.
2. The conclusion section should be separated.