A Novel Machine Learning Approach to Estimate Grapevine Leaf Nitrogen Concentration Using Aerial Multispectral Imagery
Round 1
Reviewer 1 Report
General comments
The topic is novel and fits well with the scope of the remote sensing journal. Not a lot of papers available on making use of machine learning techniques to predict the status of N in Table grapes. However, several aspects should be addressed before publication.
- The title should reflect better the extent of the results. Nitrogen status is vague in that sense.
- Why just in bloom? A better explanation about the selection of this phenological period and vineyard characteristics should be provided. The authors stated “Insufficient N limits vine vigour, especially from bud-break to flowering [6, 7]”. Will it still be beneficial to take the image at bloom when the crucial time for N is between bud-break and flowering?
- Are the split treatments necessary if the image is only taken at bloom when all Nitrogen additions have not been applied? (Might it not be better to just have the different levels at bloom stage as treatments)
- The prediction of the probability of a vine having a high Nitrogen status is not a very strong result, since this approach can be obtained with standard remote sensing methods. The authors should compare this approach with the standard methods to determine why their approach is better
- Transforming classification into regression. Although the 5 plants consist of a large number of pixels to train the models, all those pixels are still related to only 5 values. (More tissue samples per vine for the training plants would have been of value.) Or it could have been beneficial to train the model not only on the high and low levels but on more target plants.
- The trained classifiers were used to predict the test data probability of high-N. Those pixels were averaged per vine to get a probability high-N value per vine. Regression was done between the vines actual % N measured tissue sample and the probability of high N-value. XGBoost was the best to predict the probability of a vine with a high-N?
- The conclusion section has a lot of methodological aspects. The authors should put more focus on the main results and the implications derived from these results.
Specific comments:
Line 55. Reference format.
Line 76-77. Expand this idea and give some examples.
Line 123-125. Explain the difference between pixel-based classification and pixel-based soft classification.
Line 132. Explain which parameters were considered to select the studied vines/locations.
Line 127. Treatments and applications could be explained better – a bit confusing.
Line 145. Tissue sampling: Would it not have been better to sample the top leaves instead of one shoot – as the pixels of the top leaves are used for the machine learning training models.
Line 219. Figure 2 could be explained better.
Line 570. Concentration or status?
Line 455. Data is only shown for regression models % N and (P-high N|X) probability.
Line 305. What is the effect of this variation?
Line 376. Indicate the number of samples per each dataset.
Line 379. Is it possible, with your approach, to compare measured and estimated nitrogen in physical units (%)?
Line 405. Why the discussion was focused on individual spectral bands? Is this relevant in your approach?
Author Response
We greatly appreciate your critical observations as well as your constructive and helpful comments. We hope that we have addressed your questions and comments through the explanations and revisions made in the manuscript.
The topic is novel and fits well with the scope of the remote sensing journal. Not a lot of papers available on making use of machine learning techniques to predict the status of N in Table grapes. However, several aspects should be addressed before publication.
- The title should reflect better the extent of the results. Nitrogen status is vague in that sense.
Per your suggestion, the title was revised to better reflect the manuscript context.
A novel machine learning approach to estimate grapevine leaf nitrogen concentration using aerial multispectral imagery
- Why just in bloom? A better explanation about the selection of this phenological period and vineyard characteristics should be provided. The authors stated “Insufficient N limits vine vigour, especially from bud-break to flowering [6, 7]”. Will it still be beneficial to take the image at bloom when the crucial time for N is between bud-break and flowering?
Thank you for the question. Bloom is the earliest standard tissue sampling time for grape, and the most widely used by growers [1].
This section has been revised to avoid confusion. Tissue N content changes markedly through the season, and bloom is the earliest standard sampling time in grape. In this preliminary study, our primary goal was to investigate whether remote sensing had potential to estimate N content of grape leaves at this important sampling time.
1. Iland, P.; Dry, P.; Proffitt, T.; Tyerman, S. The Grapevine: From the Science to the Practice of Growing Vines for Wine; Patrick Iland Wine Promotions Pty Ltd: Adelaide, Australia, 2011; ISBN 9780958160551.
- Are the split treatments necessary if the image is only taken at bloom when all Nitrogen additions have not been applied? (Might it not be better to just have the different levels at bloom stage as treatments).
The various nitrogen treatments applied were designed to address agronomic questions having to do with nitrogen use efficiency, and those results will be presented elsewhere. Having applied various nitrogen treatments to the vines resulted in a range of leaf nitrogen content at bloom, which made the site an appropriate place to conduct this research.
- The prediction of the probability of a vine having a high Nitrogen status is not a very strong result, since this approach can be obtained with standard remote sensing methods. The authors should compare this approach with the standard methods to determine why their approach is better.
The vine with a very high N concentration (4.19%) might be an outlier because it falls more than 3 standard deviations from the sample mean (mean=2.9 and std=0.34). Our method (XGBoost) predicted its N concentration at about 3.57% with 90% probability, which seems reasonable compared to the other samples.
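The three-standard-deviation criterion mentioned here can be checked with a few lines of arithmetic. The sketch below uses the mean, standard deviation, and measured value quoted in this response; the function names `z_score` and `is_outlier` and the default cutoff `k=3.0` are illustrative assumptions, not part of the manuscript.

```python
def z_score(value, mean, std):
    """Number of standard deviations between value and mean."""
    return (value - mean) / std


def is_outlier(value, mean, std, k=3.0):
    """True when value lies more than k standard deviations from the mean."""
    return abs(z_score(value, mean, std)) > k


# Values quoted in the response: sample mean 2.9 %N, std 0.34 %N,
# and the vine measured at 4.19 %N.
z = z_score(4.19, 2.9, 0.34)
flag = is_outlier(4.19, 2.9, 0.34)
print(f"z = {z:.2f}, outlier = {flag}")
```

With these numbers the measured value sits between 3 and 4 standard deviations above the mean, whereas the predicted value of 3.57% would not be flagged by the same rule.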
We were not sure what the reviewer means by ‘standard methods’. If that refers to vegetation indices, here is our response:
We examined the performance of various vegetation indices in N prediction. The R2 was lower than the R2 obtained by our method:
Vegetation index | R2
NDRE | 0.55
ARI2 | 0.40
NDVI | 0.22
MCARI2 | 0.20
TCARI | 0.01
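For reference, two of the indices listed above are simple normalized band ratios. The sketch below computes NDVI and NDRE from their standard definitions; the reflectance values are hypothetical and are not taken from the study's imagery, and the helper name `normalized_difference` is an assumption made for illustration.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    total = a + b
    if total == 0:
        raise ValueError("band sum is zero; the index is undefined")
    return (a - b) / total


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized difference red edge index: (NIR - RedEdge) / (NIR + RedEdge)."""
    return normalized_difference(nir, red_edge)


# Hypothetical per-pixel reflectances (fractions between 0 and 1).
pixel = {"red": 0.05, "red_edge": 0.25, "nir": 0.45}
print(round(ndvi(pixel["nir"], pixel["red"]), 3))
print(round(ndre(pixel["nir"], pixel["red_edge"]), 3))
```

Each index reduces a pixel to a single number built from two bands, which is one reason a model that sees all bands at once may capture more of the variation in leaf N.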
In addition, the results achieved by XGBoost captured the variability of N across the plots, which was one of our objectives: a vine with high N concentration exhibits more dark green pixels, whereas a vine with low N concentration displays more light yellow pixels (please see Figure 6B). Similar to the histograms (please see Figure 6A), there is a distinct difference between the vines with low and high N concentration for XGBoost compared to NDVI and NDRE. In essence, the machine learning algorithms provided a wider dynamic range, which makes visual comparison easier. Furthermore, the approach can be used as a practical tool to spot hot zones with low nitrogen concentration in a large commercial orchard. Lastly, as we discussed in the introduction:
“Spectral indices have some significant limitations as analytical tools. For instance, normalized difference vegetation index (NDVI) saturates when vegetation coverage is dense [18], and atmospherically resistant vegetation index (ARVI) saturates when chlorophyll concentration reaches a certain level [19].”
- Transforming classification into regression. Although the 5 plants consist of a large number of pixels to train the models, all those pixels are still related to only 5 values. (More tissue samples per vine for the training plants would have been of value.) Or it could have been beneficial to train the model not only on the high and low levels but on more target plants.
We agree with the reviewer that more tissue samples per vine would be of value. However, note that:
- Tissue sampling is destructive, and collecting additional shoots could affect our ongoing agronomic study.
- Several tissue samples per vine quickly increase analysis cost and labor.
We selected the vines with high and low N levels to train the models on extreme values, so that they can learn the pattern that distinguishes the two classes accurately and efficiently. Table 2 demonstrates that this was successful: the F1-score on the test dataset (unseen samples) was promising, ranging from 80% to about 82%.
- The trained classifiers were used to predict the test data probability of high-N. Those pixels were averaged per vine to get a probability high-N value per vine. Regression was done between the vines actual % N measured tissue sample and the probability of high N-value. XGBoost was the best to predict the probability of a vine with a high-N?
That is correct. We can also obtain the probability of low N for each vine by subtracting P(high_N|X) from one: P(low_N|X) = 1 – P(high_N|X).
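The per-vine aggregation confirmed here can be sketched as follows. This is a minimal illustration that assumes the classifier has already produced P(high_N|X) for each canopy pixel; the function name, vine identifiers, and probability values are made up for the example and do not come from the manuscript.

```python
from collections import defaultdict


def mean_probability_per_vine(pixel_predictions):
    """Average pixel-level P(high_N|X) values for each vine.

    pixel_predictions: iterable of (vine_id, p_high) pairs, one per canopy pixel.
    Returns {vine_id: (mean P(high_N|X), mean P(low_N|X))}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vine_id, p_high in pixel_predictions:
        sums[vine_id] += p_high
        counts[vine_id] += 1
    result = {}
    for vine_id, total in sums.items():
        p_high = total / counts[vine_id]
        # The two class probabilities sum to one.
        result[vine_id] = (p_high, 1.0 - p_high)
    return result


# Hypothetical pixel-level outputs for two vines.
pixels = [("vine_A", 0.9), ("vine_A", 0.7), ("vine_B", 0.2), ("vine_B", 0.4)]
per_vine = mean_probability_per_vine(pixels)
for vine_id, (p_high, p_low) in per_vine.items():
    print(vine_id, round(p_high, 2), round(p_low, 2))
```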
- The conclusion section has a lot of methodological aspects. The authors should put more focus on the main results and the implications derived from these results.
The conclusion has been revised by adding more results (comparing F1-score and required training time). In addition, we summarized the implications that can be derived from our findings:
“The findings of this study can offer immediate practical applications for sustainable nitrogen management, such as (i) providing insights on nitrogen variability in vineyards, which could be useful for variable rate management, (ii) identifying hot zones with low nitrogen content for a more informed and efficient tissue sampling.”
Specific comments:
Line 55. Reference format.
It was revised using the appropriate format.
Line 76-77. Expand this idea and give some examples.
Some examples of how remote sensing could be used to improve vineyard practice are now listed. Furthermore, we provided some examples of issues that hinder the full potential of remote sensing for N estimation in crops.
“For example, remote sensing could potentially reveal spatial variation of N status in grapevines, which can assist in identifying hot spots for smart sampling (i.e., directed sampling) and generating precise maps to develop a variable rate N fertilization program. However, the use of remote sensing for N estimation in crops has been constrained by the issues associated with data analysis such as overfitting, the curse of dimensionality, and developing robust, scalable, and generalizable predictive models.”
Line 123-125. Explain the difference between pixel-based classification and pixel-based soft classification.
The manuscript has been revised. Pixel-based classification refers to a hard binary classification, in which each pixel is assigned a single class label. In pixel-based soft classification, we instead calculated the conditional probabilities of the classes for each pixel.
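The distinction can be illustrated with a short sketch. It assumes a binary classifier that outputs P(high_N|X) for a pixel; the 0.5 decision threshold and the function names are assumptions chosen for illustration, not details taken from the manuscript.

```python
def soft_classify(p_high):
    """Soft output: keep the conditional probability of each class."""
    return {"high_N": p_high, "low_N": 1.0 - p_high}


def hard_classify(p_high, threshold=0.5):
    """Hard output: collapse the probability into a single class label."""
    return "high_N" if p_high >= threshold else "low_N"


# Two hypothetical pixels that receive the same hard label
# but carry very different levels of confidence.
for p in (0.55, 0.95):
    print(hard_classify(p), soft_classify(p))
```

Both pixels are labelled high_N by the hard classifier, while the soft output preserves the difference between them, which is the information later averaged per vine.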
Line 132. Explain which parameters were considered to select the studied vines/locations.
The vineyard was selected based on the grapevine variety and rootstock, accessibility, and history of tissue sample data that suggested the vines might respond to N fertilization. Section 2.1 has been revised accordingly to provide this information.
Line 127. Treatments and applications could be explained better – a bit confusing.
Section 2.1 has been revised to improve clarity. The treatments were applied as part of an agronomic study. The differential applications affected leaf N concentration in a way that we thought would make this a good site to test whether remote sensing could estimate leaf N concentration.
Line 145. Tissue sampling: Would it not have been better to sample the top leaves instead of one shoot – as the pixels of the top leaves are used for the machine learning training models.
Each shoot on grape contains a population of leaves that should be similar to the population of leaves on other shoots in the whole canopy, including exterior (“top”) leaves.
Thank you for your suggestion. The vine canopy in this vineyard is relatively shallow, so the pixels of the sampled shoot were among the segmented pixels used for training and testing. We will consider your suggestion for our next tissue sampling, as this research is ongoing.
Line 219. Figure 2 could be explained better.
The caption of Figure 2 has been revised by adding mean and std of the measured nitrogen concentration for 150 vines, which were 2.98% and 0.34%, respectively. We also comprehensively discussed Figure 2B in section 4.1.
Line 570. Concentration or status?
The goal was to assess N concentration; we revised the title (changed ‘status’ to ‘concentration’).
Line 455. Data is only shown for regression models % N and (P-high N|X) probability.
That is correct. However, the sum of the probabilities of the two classes is 1 so:
P(low_N|X) = 1 – P(high_N|X)
Line 305. What is the effect of this variation?
The low_N class tends to scatter more in feature space because there might be a delay in showing the symptoms of N deficiency; some parts of the vine might exhibit the symptoms sooner, so their reflectivity is different. We discussed the differences in the spatial distribution of N across the vine’s canopy in section 4.2.2.
Line 376. Indicate the number of samples per each dataset.
The number of samples in test and training datasets was added. It was already mentioned in the text at the beginning of section 3.3.
Line 379. Is it possible, with your approach, to compare measured and estimated nitrogen in physical units (%)?
Absolutely. Each of the models can estimate N concentration as a percentage. Figure 5 could instead be shown as estimated N vs. measured N (%), with the same R2 and RMSE. We plotted P(high_N|X) to show that the output of the classification models can be used directly to estimate N concentration. Plotting estimated N vs. measured N might cause confusion, as readers might ask how we converted P(high_N|X) to estimated N.
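One way to see why the two presentations share the same R2 is sketched below, assuming a simple linear regression of measured N on P(high_N|X). The per-vine values are hypothetical, and the helper names `fit_line` and `r_squared` are introduced only for this illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical per-vine values: P(high_N|X) and measured leaf N (%).
p_high = [0.10, 0.30, 0.45, 0.60, 0.85]
measured_n = [2.5, 2.7, 3.0, 3.1, 3.4]

# Convert probabilities to estimated N (%) with the fitted line.
slope, intercept = fit_line(p_high, measured_n)
estimated_n = [slope * p + intercept for p in p_high]

r2_probability = r_squared(p_high, measured_n)
r2_estimated = r_squared(estimated_n, measured_n)
rmse = (sum((e - m) ** 2 for e, m in zip(estimated_n, measured_n)) / len(measured_n)) ** 0.5
print(round(r2_probability, 4), round(r2_estimated, 4), round(rmse, 4))
```

Because estimated N is a linear transformation of P(high_N|X), its squared correlation with measured N is unchanged, and the RMSE is computed in %N in either case.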
Line 405. Why the discussion was focused on individual spectral bands? Is this relevant in your approach?
Our objective for section 4.1 was to discuss how different N concentrations could affect the reflectance of leaves. This helps us better understand how and why grapevine leaves with low N concentration have a different spectral response than leaves with high N concentration. Once we show that there is such a difference in reflectivity and discuss its reasons (it is not a random difference), we can justify the use of machine learning techniques to find the patterns.
Reviewer 2 Report
The authors used one-time aerial multispectral images and ground measurements to develop a data-driven method for the estimation of grapevine nitrogen content. Although the topic is of interest for the remote sensing and viticulture community, the manuscript is not ready for the publication in terms of the data quantity and quality, and the validity and generalizability of the proposed approach. Therefore, I can not recommend the manuscript for publication. For more comments, please see below
Line 23, why one minimum and two maximum nitrogen vines?
Line 55, correct the citation formatting issue.
Line 85, the citation number is missing!
Line 106-116, what are you trying to convey in this paragraph? Not clear to me at least. Back up your statements with relevant studies.
Also, include a paragraph to discuss the selected machine learning methods used in this study. These methods need to be justified again based on previous studies. Why those methods???
Lines 117-125, Pixel-based, and soft classification methods have not been mentioned earlier in the introduction.
Line 135 and line 143, the levels of N are confusing. Why there are two levels? Confusing, Clarify!
Line 146, how was the shoot for each vine selected?
Line 163, what’s the exact date?
Was the camera on a gimbal or hard-mounted? What were the flight speed, side and frontal overlap, etc.? Include the relevant flight parameters and environmental conditions for the imaging date.
Line 172, citation 37 is not valid. Has it been published anywhere, not a complete citation. The authors should pay attention to what type of citation can be used in a scientific setting. This is important because the authors used the cited work for preprocessing the multispectral data and we are not sure if that’s the right way to do it without the validity of the method used.
How do we know the accuracy of the preprocessed, radiometrically corrected images? Were the images orthomosaicked? If so, what was the accuracy of the alignment between the different bands? I doubt they were aligned perfectly. There is also a vignetting issue with this type of multispectral image; has this been solved? If so, how?
Line 185, I do not see the two markers in Figure. There were multiple panels/markers on the only one end of the plot not next to the first and last vines Figure 1B.
Line 189, what’s the value of NDVI used as a threshold.
Line 193-199, you stated that averaging pixel values over a vine is ignoring within canopy variability and not a good idea. However, you sampled a shoot from a vine and represented a whole vine N content with a single value. Similarly, this is also ignoring the within canopy variability of N in ground truth data.
Why only two classes (high- and low-N classes)? I see you are based on the data distribution, but more classes would give a more accurate estimation?
What’s the input for your classifier? Is that a single band or multiple bands, clarify?
Lines 242-244, describe these methods and provide justifications.
Line 273, what are those evaluation metrics? Specify and explain the meaning of high and low values for these metrics.
Line 274, why 10-fold cross-validation?
Section 2.8 is confusing. You trained a classification model based-on five vines ( three low and two high N vines). How did you translate the results of the classification of the five vines to the rest of the 145 vines? You did not want to ignore the within canopy variability, yet you averaged the classification probability of the whole vine, isn’t this still ignoring the whole vine variability?
Within-vine variability is interesting, especially with high-resolution images. This variability usually comes from illumination, viewing angle, vine structure, and vine self-shading. None of these were accounted for in this study. At the least, sorting the canopy pixels into shaded and sunlit pixels, and then approaching the problem at hand by accounting for these effects, would be another interesting direction.
Figure 3 is nice but not informative. Low-N vine has high green and red reflectance, this is for sure due to the bright color and lower absorption of chlorophyll, that’s lacking in low-N vines.
Section 3.2, why SVM hyperparameters only in this section? Have these further discussed later in the manuscript?
Line 360-367: no need to present as the ensemble did not work. You did not mention this in the methods section. What’s the point?
Section 3.4: using only the classification of the five vines to predict the rest of the 145 vines is not convincing to me. Maybe that’s why R2 is not high even with state-of-the-art machine learning methods. Have you tried using the average reflectance to predict N, either from band images or indices? That may give a more accurate prediction. If your approach can beat those approaches, then it’s impressive.
Present the map of the full field, not just a few vines for N.
You only used data from anthesis, how do your results transfer to other growth stages such as veraison?
Author Response
We greatly appreciate your critical comments as well as your helpful suggestions. We hope that we have addressed your questions and comments through the explanations and revisions made in the manuscript.
Author Response File: Author Response.docx
Reviewer 3 Report
It was a real pleasure to review this paper: the reading is fluent, the article is well constructed, the experiments were well conducted, and analysis was well performed.
Because the main aim of this paper is to develop a data-driven, decision-support tool to facilitate grapevine nitrogen management, I would suggest authors to insert a flow-chart to summarize the procedure in order to make it easily replicable.
Moreover, in the conclusion at lines 586-587 you declare that “the largest difference was observed for green, near-infrared, red, red edge, and blue”,i.e., in all the available used spectral bands. Probably, may be useful to include the dimension of this “difference”.
Minor comments:
Please, modify according to the MDPI rules the references at the following lines:
- 55-56: Keller, M, et al, 2001;
- 85: Pacheco-Labrador et al., 2014;
- 93: Hansen and Schjoerring, 2003;
- 96: Ferwerda et al., 2007;
- 519: Friedel et al, 2020
Author Response
We greatly appreciate your complimentary and constructive comments as well as your helpful suggestions. We hope that we have addressed your questions and comments through the explanations and revisions made in the manuscript.
It was a real pleasure to review this paper: the reading is fluent, the article is well constructed, the experiments were well conducted, and analysis was well performed.
Because the main aim of this paper is to develop a data-driven, decision-support tool to facilitate grapevine nitrogen management, I would suggest authors to insert a flow-chart to summarize the procedure in order to make it easily replicable.
Thank you for your comments. We agree that a flowchart is useful. Per your suggestion, we added a flowchart as a supplementary figure to explain the steps from data collection to model selection.
Moreover, in the conclusion at lines 586-587 you declare that “the largest difference was observed for green, near-infrared, red, red edge, and blue”,i.e., in all the available used spectral bands. Probably, may be useful to include the dimension of this “difference”.
We quantified the difference and revised the last paragraph of section 4.1 as:
However, if we sort the multispectral bands based on the absolute values of percentage difference, the order will become green (10.41%), near-infrared (9.02%), red (8.63%), red edge (6.09%), and blue (2.61%).
Minor comments:
Please, modify according to the MDPI rules the references at the following lines:
Thank you for pointing out these inconsistencies in citation.
- 55-56: Keller, M, et al, 2001;
Revised based on the journal’s format.
- 85: Pacheco-Labrador et al., 2014;
Revised based on the journal’s format.
- 93: Hansen and Schjoerring, 2003;
Revised based on the journal’s format.
- 96: Ferwerda et al., 2007;
Revised based on the journal’s format.
- 519: Friedel et al, 2020
Revised based on the journal’s format.
Reviewer 4 Report
The manuscript explores machine learning techniques to estimate nitrogen content in a grapevine variety used for table grape production, which is valuable since studies on this type of grape are uncommon, at least those using remotely sensed data. Such an approach can serve as a decision-support tool to improve vineyard management. Generally, the manuscript is well written and the content concise; the Introduction properly presents the problem and motivates the study. However, the authors should have used photogrammetric processing to orthorectify the imagery, and vegetation indices should have been included as dataset features. Nonetheless, the manuscript has the quality to be accepted with minor revisions. Consider including the response to some of my comments as future work. Please see specific comments and suggestions in the attached PDF.
Comments for author File: Comments.pdf
Author Response
We greatly appreciate your complimentary and constructive comments as well as your helpful suggestions. We hope that we have addressed your questions and comments through the explanations and revisions made in the manuscript.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
I appreciate the authors' effort to address my comments and improve their work. I am satisfied with the authors' response.