Monitoring the Degree of Mosaic Disease in Apple Leaves Using Hyperspectral Images
Round 1
Reviewer 1 Report
The authors propose a very interesting idea and article, the strong points of the approach being (i) the correlation between the spectral characteristics and the anthocyanin content in the apple leaves and (ii) the analysis of the most representative bands carrying information for the proposed application. However, there are some aspects which raised various questions that should be addressed. Please see below.
The most concerning issue is the resampling of the spectral reflectance at 1 nm, which definitely introduces some artefacts. More details should be provided, along with the arguments for this resampling.
All the images are of poor quality, this should be corrected. Also their size should be increased for better visibility and interpretation of the results.
More detailed comments in what follows:
In the abstract and the introduction, there is a slight contradiction which should be corrected or clarified: the leaves exhibited stronger reflectance at 600 nm (line 16), but the apple leaves infected with mosaic virus show chlorotic (yellow) spots (lines 32-33). The wavelength of 600 nm does not correspond to yellow.
In the abstract, please replace Gaus1 with the appropriate name of the wavelet transform (line 17). The same applies for Table 2.
Section 2.3.1. What is the purpose of using SPA? What are the variables (line 128), I assume the selected bands (line 131)? Please provide references to this method. The selected bands are used to better interpret the spectral shape in general or a specific characteristic in particular (line 132)? Please be more specific. Not clear what is the input data set for the SPA – all the measurement points in the spectral reflectance curves or just the ones corresponding to the yellow parts?
Regarding Fig. 2 – the images should be larger and better quality. Maybe Fig. 3 should be presented first and the selected bands should be depicted on Fig. 3 as well. Do the selected variables (wavelengths) reflect the differences between healthy and non-healthy leaves? I am not convinced by the presented results, in different figures on different pages.
How was the threshold for Fig. 2(b) selected?
Regarding Table 1 – I do not agree that any two bands can be used for the computation of NDVI. For images produced by the LANDSAT and Sentinel 2 satellites, very specific bands are used to compute the NDVI index, bands which were chosen for the construction of the multispectral imaging systems on board the satellites. In the literature, the red band used for NDVI is 660 nm and the NIR band is 860 nm.
In addition, what is Ri, Rj and Rn in Table 1?
Section 2.3.3, please provide arguments for the usage of the CWT.
Lines 172-175, it is confusing: wavelength is both lambda and the b scaling factor?
Line 178, please provide more details about the Matlab implementation(s).
Regarding 2.4.1, is it partial or not? Not clear from the text on line 186.
The major factors are not explained on line 190. How exactly was the PLSR method used?
Regarding the methods Random forests, ANN and XGBoost the details about the implementation were not provided.
Equation (5) is not explained. References should be provided at least for RPD, if not all 3 used metrics: R2, RMSE and RPD.
Regarding Fig. 4 - the correlation is basically the same (between 0.6 and 0.8) in the visible spectrum for most of the wavelengths (especially for the “plateau” at 0.8 for lambdas between 500 and 580 nm roughly). This contradicts somehow the hypothesis that there are several bands that present higher information for the present correlation study.
Larger images for Fig. 7 should be presented.
Author Response
Dear reviewer,
Thank you for your letter and the reviewers’ comments on our manuscript entitled "Monitoring the degree of mosaic disease in apple leaves using hyperspectral images" (ID: remotesensing-2333377). The comments were very helpful for revising and improving our paper, and they also provide important guidance for our other research. We have studied the comments carefully and made corrections that we hope will meet with approval. The main corrections are in the manuscript, and our responses to the reviewers’ comments are as follows (the replies are highlighted in red).
Point 1: The most concerning issue is the resampling of the spectral reflectance at 1 nm, which definitely introduces some artefacts. More details should be provided, along with the arguments for this resampling.
Response 1: After reading the literature, we concluded that this processing step can reveal more spectral features. When resampling, we used linear interpolation, which does not change the spectral reflectance of the original bands but fills the gaps between them linearly. We have modified Section 2.2 to explain this (lines 144-146). We hope this clarifies the issue.
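The linear resampling described in Response 1 can be sketched as follows. This is a minimal illustration rather than the authors' processing code: the 4 nm band spacing and the reflectance values are hypothetical, and the function name resample_to_1nm is introduced here only for the example.

```python
import numpy as np

def resample_to_1nm(wavelengths, reflectance, start, stop):
    """Linearly interpolate a spectrum onto a 1 nm grid from start to stop (inclusive)."""
    grid = np.arange(start, stop + 1, 1.0)
    # np.interp keeps the values at the original band centres unchanged
    # and fills the gaps between neighbouring bands with straight lines.
    return grid, np.interp(grid, wavelengths, reflectance)

# Hypothetical sensor bands every 4 nm (the real band spacing is not stated here).
bands = np.array([400.0, 404.0, 408.0])
refl = np.array([0.10, 0.14, 0.10])
grid, refl_1nm = resample_to_1nm(bands, refl, 400, 408)
```

Each interpolated point is a weighted average of its two neighbouring measured bands, so the 1 nm grid is denser but carries no independent measurements between the original band centres.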
Point 2: All the images are of poor quality, this should be corrected. Also their size should be increased for better visibility and interpretation of the results.
Response 2: We appreciate your comment on the image quality and size. We will improve these aspects by re-processing the images at a higher resolution and increasing their size accordingly to ensure better visibility and interpretation of the results. We will also make sure that all figures and tables are of high quality before finalizing the manuscript. Thank you for bringing this to our attention.
Point 3: In the abstract and the introduction, there is a slight contradiction which should be corrected or clarified: the leaves exhibited stronger reflectance at 600 nm (line 16), but the apple leaves infected with mosaic virus show chlorotic (yellow) spots (lines 32-33). The wavelength of 600 nm does not correspond to yellow.
Response 3: We reread the paper and reran the data, and found that we had made a careless mistake when describing the figure. It should read: as the disease degree increases, the leaves exhibit stronger reflectance in the range of 500-560 nm (corresponding to green and yellow), and for severely infected leaves, the range of 570-588 nm, which corresponds to yellow, also shows stronger reflectance. We revised this in the resubmitted manuscript (line 16 and lines 299-300). Thank you again.
Point 4: In the abstract, please replace Gaus1 with the appropriate name of the wavelet transform (line 17). The same applies for Table 2.
Response 4: We used Gaus1 in the manuscript because that is the name of the wavelet in Matlab. We have replaced all occurrences of Gaus1 with Gaussian1.
Point 5: Section 2.3.1. What is the purpose of using SPA? What are the variables (line 128), I assume the selected bands (line 131)? Please provide references to this method. The selected bands are used to better interpret the spectral shape in general or a specific characteristic in particular (line 132)? Please be more specific. Not clear what is the input data set for the SPA – all the measurement points in the spectral reflectance curves or just the ones corresponding to the yellow parts?
Response 5: Previous studies have shown that SPA can extract more feature bands (the inflection points of the curve or the points where the reflectance difference is large). The "variables" mentioned in line 128 are the selected bands. Because the method is described in the manuscript in general terms (variables), we did not change the word, and we apologize for any confusion it may have caused. The input dataset for SPA is all the measured points on the spectral reflectance curves. Since the goal of this paper is to estimate the degree of mosaic disease through Anth, we need spectral data from both the healthy and the infected (yellow) parts.
Point 6: Regarding Fig. 2 – the images should be larger and better quality. Maybe Fig. 3 should be presented first and the selected bands should be depicted on Fig. 3 as well. Do the selected variables (wavelengths) reflect the differences between healthy and non-healthy leaves? I am not convinced by the presented results, in different figures on different pages.
Response 6: Figure 2 (now Figure 4) shows the feature bands obtained with the SPA method, which is based on vector projection analysis. Each wavelength is projected onto the other wavelengths, the sizes of the projection vectors are compared, and the wavelength with the largest projection vector is selected as a candidate wavelength. The final feature wavelengths are then chosen based on the RMSE of the corrected model. The selected feature wavelengths have the least collinearity and may have a relatively low signal-to-noise ratio (SNR). The purpose of Figure 3 is to show the features of the spectral reflectance curves at different levels of disease severity, demonstrating the feasibility of using spectral features to recognize leaf mosaic disease. Prompted by your questions, we read more of the literature on SPA. In the revised manuscript, we added an explanation of the SPA principle and a further explanation of the SPA selection results in the Results section (3.2.2).
Point 7: How was the threshold for Fig. 2(b) selected?
Response 7: The selected RMSE is the point at which the RMSE is not significantly larger than RMSEmin according to an F-test with α = 0.25. We also added some details to the resubmitted manuscript. To better address Points 5-7, we consulted the following sources and attach their links; we hope they help explain the issues clearly.
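The projection step described in Response 6 can be sketched as below. This is a simplified illustration of the successive-projections idea only, not the Matlab implementation the authors used: the function name spa_chain, the mean-centering step, and the toy data are assumptions made for the example, and the RMSE-based choice of the final subset is omitted.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Return indices of n_select bands chosen by successive projections.

    X is (n_samples, n_bands); start is the index of the first band.
    """
    Xp = np.asarray(X, dtype=float)
    Xp = Xp - Xp.mean(axis=0)          # assumption: work on mean-centred columns
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0         # never pick the same band twice
        # The band with the largest remaining projection is the least collinear.
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data: band 1 is an exact multiple of band 0, band 2 is not.
X_demo = np.array([[1, 2, 0],
                   [2, 4, 1],
                   [3, 6, 0],
                   [4, 8, -1]], dtype=float)
chain = spa_chain(X_demo, start=0, n_select=2)   # [0, 2]
```

In the toy data the collinear band is skipped because nothing of it remains after the projection, which is the behaviour the response relies on when it says the selected wavelengths have the least collinearity.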
An overview of variable selection methods in multivariate analysis of near-infrared spectra. https://doi.org/10.1016/j.trac.2019.01.018
The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. https://doi.org/10.1016/S0169-7439(01)00119-8
The Successive Projections Algorithm (SPA) Homepage
http://www.ele.ita.br/~kawakami/spa/
Point 8: Regarding Table 1 – I do not agree that any two bands can be used for the computation of NDVI. For images produced by the LANDSAT and Sentinel 2 satellites, very specific bands are used to compute the NDVI index, bands which were chosen for the construction of the multispectral imaging systems on board the satellites. In the literature, the red band used for NDVI is 660 nm and the NIR band is 860 nm.
Response 8: It should be NDSI. We checked and recalculated the data; fortunately, the error occurred only in the writing, and the calculations were not affected. All occurrences have been corrected in the resubmitted manuscript. Thank you for your correction and reminder.
Point 9: In addition, what is Ri, Rj and Rn in Table 1?
Response 9: Ri and Rj are the reflectance at i and j nm over the entire reflectance spectrum. We added the explanation in the resubmitted manuscript (line 163).
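As an illustration of how Ri and Rj enter an index computed from two arbitrary bands, the sketch below uses the usual normalized-difference form, (Ri - Rj) / (Ri + Rj). The exact formula in Table 1 is not reproduced in this letter, so the form used here, the function name ndsi, and the synthetic spectrum are assumptions.

```python
import numpy as np

def ndsi(spectrum, wavelengths, i, j):
    """Normalized difference of the reflectance at i nm and j nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    # Pick the band closest to each requested wavelength.
    ri = spectrum[np.argmin(np.abs(wavelengths - i))]
    rj = spectrum[np.argmin(np.abs(wavelengths - j))]
    return (ri - rj) / (ri + rj)

# Synthetic spectrum on a 1 nm grid from 400 to 988 nm (values are made up).
wl = np.arange(400, 989)
refl = wl / 1000.0
value = ndsi(refl, wl, 694, 720)
```

With a hyperspectral curve every pair (i, j) gives a candidate index, which is why the band pair has to be stated alongside the index name, for example NDSI (R694, R720).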
Point 10: Section 2.3.3, please provide arguments for the usage of the CWT.
Response 10 & 12: This study used the old wavelet function (version 2006) provided by Matlab: coefs = cwt(x, scales, 'wname'), where x is the spectral reflectance from 400-988 nm, scales are 2^1, 2^2, 2^3, …, 2^10, and 'wname' is the mother wavelet function we used. The wavelet features obtained with the same mother wavelet function change with the decomposition scale. Figure (a) describes the old CWT function, and Figure (b) the newer one (version 2016) from Matlab (please see the pdf file for a more detailed explanation).
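For readers without Matlab, the idea behind coefs = cwt(x, scales, 'wname') can be sketched in Python as below. This is not the Matlab routine and will not reproduce its numbers: the wavelet is a first derivative of a Gaussian defined only up to sign and normalization, the scales are assumed to be the powers of two from 2^1 to 2^10, and the function names and the synthetic spectrum are hypothetical.

```python
import numpy as np

def gaus1(t):
    """First derivative of a Gaussian, up to sign and a constant factor."""
    return -t * np.exp(-t ** 2 / 2.0)

def cwt_gaus1(x, scales):
    """Return a (n_scales, n_points) array of wavelet coefficients."""
    x = np.asarray(x, dtype=float)
    n = x.size
    coefs = np.empty((len(scales), n))
    for row, a in enumerate(scales):
        half = int(np.ceil(5 * a))              # wavelet is ~0 beyond 5 scale units
        t = np.arange(-half, half + 1)
        kernel = gaus1(t / a) / np.sqrt(a)      # dilated wavelet at scale a
        # Correlate the spectrum with the wavelet shifted to every position b.
        full = np.convolve(x, kernel[::-1], mode="full")
        coefs[row] = full[half:half + n]
    return coefs

wavelengths = np.arange(400, 989)                     # 400-988 nm, 1 nm steps
spectrum = np.where(wavelengths >= 700, 0.5, 0.1)     # synthetic step at 700 nm
scales = 2.0 ** np.arange(1, 11)                      # 2^1 ... 2^10
coefs = cwt_gaus1(spectrum, scales)
```

Each row of coefs holds the coefficients for one scale and each column corresponds to one shift position along the wavelength axis. Flat parts of the spectrum give coefficients near zero, while the step produces a large magnitude at small scales.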
Point 11: Lines 172-175, it is confusing: wavelength is both lambda and the b scaling factor?
Response 11: λ is the number of hyperspectral bands, and b is the shifting factor that determines the position of the wavelet. As b shifts from 400 to 988 nm, the coefficient becomes large wherever the shifted wavelet coincides with a similar feature in the spectrum, because the product of the two is large there. C11-C1n represent the wavelet coefficients at scale 1. Please see the pdf file for a more detailed explanation.
Point 12: Line 178, please provide more details about the Matlab implementation(s).
Response 12: Please see response 10&12.
Point 13: Regarding 2.4.1, is it partial or not? Not clear from the text on line 186.
Response 13: Thank you for your careful reading. This was a clerical error on our part; it should be "partial least squares solution".
Point 14: The major factors are not explained on line 190. How exactly was the PLSR method used?
Response 14: We apologize for not explaining PLSR clearly enough. After discussing with our professor and consulting more of the relevant literature, we have explained the PLSR method again in the revised version, and we hope this explanation is clear enough. By "major factors" we mean the number of principal components used within PLSR, which varies for each set of modeling parameters. When we change the value of n_components in PLSR = PLSRegression(n_components=n, scale=True), we obtain different R2 values and select the best one as the modeling result. For example, when the input is λspa, the model accuracy is highest with n_components=4; when the input is VI, the model accuracy is highest with n_components=3.
Here is a brief introduction: we ran PLSR using the sklearn module and the numpy library in Python, and added a cross-validation process to the model. We list below some of the literature and websites we consulted, in the hope that they help further explain how we used PLSR.
Partial Least Squares Methods: Partial Least Squares Correlation and Partial Least Square Regression. DOI: 10.1007/978-1-62703-059-5_23
PLS Python code. https://nirpyresearch.com/partial-least-squares-regression-python/
Point 15: Regarding the methods Random forests, ANN and XGBoost the details about the implementation were not provided.
Response 15: I am sorry for not providing more implementation details of RF, ANN, XGBoost, which we have added in the modified version.
https://xgboost.readthedocs.io/en/stable/parameter.html
Point 16: Equation (5) is not explained. References should be provided at least for RPD, if not all 3 used metrics: R2, RMSE and RPD.
Response 16: For R2 and RMSE, y_i is the measured value, ŷ_i is the predicted value given by the model, ȳ is the average of the measured values, and m is the number of samples. For RPD (Relative Percent Difference), SD is the standard deviation of the analyzed samples and SEP is the root mean square error of the analyzed samples. We also added this to the resubmitted manuscript (lines 286-288).
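The three metrics can be computed as in the short sketch below, which follows the definitions given in the response (RPD taken as SD divided by SEP). Equation (5) itself is not reproduced in this letter, so the use of the sample standard deviation (ddof=1) and the function name r2_rmse_rpd are assumptions.

```python
import numpy as np

def r2_rmse_rpd(y_true, y_pred):
    """Return (R2, RMSE, RPD) for measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    # R2: one minus residual sum of squares over total sum of squares.
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # RPD: standard deviation of the measured values over the prediction error.
    rpd = np.std(y_true, ddof=1) / rmse        # ddof=1 is an assumption
    return r2, rmse, rpd

r2, rmse, rpd = r2_rmse_rpd([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A larger RPD means the prediction error is small relative to the natural spread of the measured values.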
Point 17: Regarding Fig. 4 - the correlation is basically the same (between 0.6 and 0.8) in the visible spectrum for most of the wavelengths (especially for the “plateau” at 0.8 for lambdas between 500 and 580 nm roughly). This contradicts somehow the hypothesis that there are several bands that present higher information for the present correlation study.
Response 17: To be honest, we didn't fully understand this issue, but we assume that it may be because in the original manuscript, we placed the results of SPA in the method section, which may have caused confusion for you. We guess that what you want to express is: The spectral reflectance between 500-580 nm is highly correlated with anthocyanin content, but why do we still need to select feature bands for research or why the selected feature bands do not include 500-580 nm? Based on our thinking, we propose the following answer:
First of all, the spectral reflectance between 500-580 nm, or even between 500-720 nm, is highly correlated with anthocyanin content. However, our calculations showed that the reflectance values of these bands are strongly correlated with one another. If they are used directly as model inputs without selecting feature bands, the model accuracy is reduced because of information redundancy, so only a subset needs to be selected. This is why the SPA method was applied afterwards. During the SPA procedure, the wavelengths 654 nm, 673 nm, and 720 nm were selected by projection, and the feature information of this subset was taken to represent the feature information within the 500-720 nm range, so no further wavelengths within this range were selected. Although the other wavelengths selected by SPA have a low correlation with anthocyanin content, they are located at the inflection points of the spectral reflectance curve, where the differences are large.
Studies have shown that the bands of hyperspectral data are highly correlated with one another, which leads to multicollinearity among the spectral data and increases the redundancy of the spectral information. Multicollinearity between variables greatly reduces the accuracy and stability of model predictions. A link to the reference is given below:
Winter wheat chlorophyll content retrieval based on machine learning using in situ hyperspectral data. https://doi.org/10.1016/j.compag.2022.106728
Point 18: Larger images for Fig. 7 should be presented.
Response 18: To make the graph larger, we split the original graph and placed two scatter graphs in each row. Because this produced too many pictures, we decided to include only the multi-parameter modeling scatter graph (VPs-model) and to upload the remaining graphs as attachments.
Thank you again for your attention and time. We look forward to hearing from you.
Yours sincerely,
Danyao Jiang
E-mail: [email protected]
Corresponding author: Qingrui Chang
E-mail: changqr@nwsuaf.edu.cn
Author Response File: Author Response.pdf
Reviewer 2 Report
Authors in this paper proposed comprehensive indicators to estimate the severity of apple mosaic disease. They analysed the spectral characteristics of apple leaves under mosaic stress. This seems to be a well-written paper, but with a primary focus on agricultural readers. It reads very much like a well-described case study. However, as this manuscript is submitted to the Remote Sensing journal, the introduction to research using hyperspectral images for plant health monitoring could be strengthened. The methods used in this research are original and new. The article is very well written.
Line 136: I think it would be better to list the selected wavelengths in a table or in the text, because in Figure 2 it is hard to see and understand which spectral bands were selected.
Line 300: What is (at R694, R720) ?
Author Response
Dear reviewer,
Thank you for your letter and the reviewers’ comments on our manuscript entitled "Monitoring the degree of mosaic disease in apple leaves using hyperspectral images" (ID: remotesensing-2333377). The comments were very helpful for revising and improving our paper, and they also provide important guidance for our other research. We have studied the comments carefully and made corrections that we hope will meet with approval. The main corrections are in the manuscript, and our responses to the reviewers’ comments are as follows (the replies are highlighted in red).
Point 1: I think it would be better to list the selected wavelengths in a table or in the text, because in Figure 2 it is hard to see and understand which spectral bands were selected.
Response 1: Your question made us realize that we might confuse readers in Section 2.3.1 by placing the figure far from its description, so we moved it to Section 3.2.2 (please see the attachment), which describes the figure better. Thank you for your advice.
Point 2: What is (at R694, R720) ?
Response 2: We are sorry for the wording errors here. It should read 'the correlation between anthocyanin content and NDSI (R694, R720) was the best'. We revised it in the resubmitted manuscript.
Thank you again for your attention and time. We look forward to hearing from you.
Yours sincerely,
Danyao Jiang
E-mail: [email protected]
Corresponding author: Qingrui Chang
E-mail: changqr@nwsuaf.edu.cn
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Thank you for addressing all my comments.