Machine Learning Identification of Saline-Alkali-Tolerant Japonica Rice Varieties Based on Raman Spectroscopy and Python Visual Analysis
Round 1
Reviewer 1 Report
The authors propose a combination of several data analysis methods and two classifiers to discriminate between saline-alkali-tolerant and saline-alkali-sensitive Japonica rice varieties.
The main concern is the amount of data, and how much of it is actually used, relative to the complexity of the selected data analysis methods. For the results to be valid, the reliability of the classification must be evaluated properly. First, this requires a cross-validated analysis of the data, for example k-fold CV, which helps to judge the stability of the proposed classifiers and the uncertainty associated with them. In addition, it is standard in classification tasks to report and discuss the ROC curve, the AUC, and the confusion matrix of each classifier as a sensitivity analysis of the applied methods. These should be reported for both the training and test sets.
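As a rough illustration of the evaluation requested here, the following Python sketch builds stratified k-fold splits and computes a confusion matrix and ROC AUC from classifier scores. It uses only the standard library. The function names (`stratified_kfold`, `confusion_matrix`, `roc_auc`) and the toy labels and scores are assumptions made for illustration; they are not the authors' pipeline or data.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin over folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return tn, fp, fn, tp

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with made-up values (1 = tolerant, 0 = sensitive).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
splits = list(stratified_kfold(y_true, k=4))
print("tn, fp, fn, tp =", cm)
print("AUC =", auc)
```

In practice the classifier would be refitted on each training split and scored on the matching test split, so that the spread of the per-fold metrics can be reported alongside their mean.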
Table 4 shows the classification results (although its caption refers to modeling results?), but it does not state which data set (training or test) was used.
The applied methods should be explicitly described and reported with respect to their configuration and the hyperparameters used in this experiment.
In Section 3.3.1 the performance of LR is considered poor. Still, its accuracy stays around 80%, which is plausible for real data. On the other hand, since no details are given about the SVM configuration, the 97% accuracy may indicate an overparameterized classifier. The goodness and stability tests suggested above may provide valuable insight into this matter.
The manuscript seems to concentrate on data analysis and on reporting the results of binary classification. However, if the resulting combination of methods is to be applied to future cases, which features are the most important to consider? A more thorough feature analysis is needed, since this could be an important finding: dozens of powerful classifiers other than SVM are available for this task, so it is important to emphasize the findings related to the information (features) extracted from the spectral data. This also raises the question of the prediction capability, and especially the generalization capability, of the classifiers (which can be partly assessed with k-fold CV). The authors should clearly describe the potential for using this kind of approach in the future and the uncertainties involved in doing so.
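One common way to carry out the kind of feature analysis asked for here is permutation importance: shuffle one feature at a time and measure how much the accuracy drops. The sketch below is a minimal, assumption-laden illustration in pure Python. It uses a nearest-centroid classifier as a stand-in (not the SVM or LR from the manuscript) and synthetic data in which only the first of three features is informative; all names are hypothetical.

```python
import random

def centroid_fit(X, y):
    """Return the mean feature vector of each class."""
    cents = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        cents[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def centroid_predict(cents, X):
    """Assign each sample to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(cents, key=lambda c: dist2(x, cents[c])) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(cents, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(y, centroid_predict(cents, X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(y, centroid_predict(cents, X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic data (an assumption): feature 0 separates the classes,
# features 1 and 2 are pure noise.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[10.0 * t + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]

cents = centroid_fit(X, y)
importances = permutation_importance(cents, X, y)
for j, imp in enumerate(importances):
    print(f"feature {j}: mean accuracy drop = {imp:.3f}")
```

For brevity the importance is computed on the same data used for fitting; evaluating it on held-out samples gives a better picture of which spectral features matter for generalization.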
Many different terms are used in the manuscript, for example for SVM (correction method, model, ...); a unified terminology is needed to help the reader.
Visualization is typically considered part of the pre-processing stage, yet Figure 5 now appears at the end of the manuscript.
Author Response
Please see the attachment!
Author Response File: Author Response.pdf
Reviewer 2 Report
The study may be of interest for understanding how Python data processing and visualization can be used to analyze the Raman spectra of japonica rice, with the aim of a simple and efficient method for identifying saline-alkali-tolerant japonica rice varieties. The authors established linear Logistic Regression and nonlinear Support Vector Machine models for discriminant analysis. However, the study appears shallow and underdeveloped.
Some points could be improved.
1. The authors should give the full names of the LR and SVM methods when they first mention them in the main text (not only in the abstract). The same applies to other abbreviations.
2. The Introduction section is good, but the authors should outline the paper's structure at the end of it.
3. The Literature Review section is missing. The Introduction refers to some studies but does not explore the specifics of the use of Machine Learning methods.
4. The dataset described in the Materials and Methods section needs a reference.
5. The authors used several methods, such as a filtering method for data noise reduction and different methods for extracting spectral crest information, but did not explain why they chose these methods.
6. The authors feed the LR output into the sigmoid function. They should explain why this function was used.
7. The authors should define all parameters and variables in the formulas.
8. The authors split the data in a 7:2 ratio. They should clarify why this proportion was chosen.
9. The authors compare the methods on accuracy alone. It is advisable to add other measures of model adequacy alongside this indicator.
10. It would be good to investigate other machine learning methods, or to mention this as a direction for further research.
11. The Conclusion may need to focus on what this research achieved and compare it with the literature.
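Points 6 and 9 can be made concrete with a short Python sketch. The sigmoid maps the unbounded linear score of logistic regression to a value in (0, 1) that can be read as a class probability, and precision, recall, and F1 are common measures to report next to accuracy. The scores and labels below are made up for illustration, and `precision_recall_f1` is a hypothetical helper, not code from the manuscript.

```python
import math

def sigmoid(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical linear scores z = w.x + b for six samples (made-up values).
z = [-2.0, -0.5, 0.3, 1.5, -1.0, 2.2]
y_true = [0, 0, 0, 1, 1, 1]

probs = [sigmoid(v) for v in z]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # 0.5 threshold is z >= 0
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 is the same as thresholding the linear score at 0; the function matters mainly because it gives a probabilistic reading of the output.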
Author Response
Please see the attachment!
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have partly addressed the comments, although some have been omitted.
In the abstract, the SVM accuracy is still stated as 97.92%. As this is the single best value, whereas the k-fold analysis suggests an average of around 94%, the average should be presented alongside the individual best classification result.
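The difference between a single best value and a cross-validated average can be shown with a few lines of Python. The per-fold accuracies below are placeholder numbers chosen only to illustrate the reporting style; they are not the manuscript's results, and `summarize_folds` is a hypothetical helper.

```python
import statistics

def summarize_folds(fold_acc):
    """Return the mean, sample standard deviation and best per-fold accuracy."""
    return {
        "mean": statistics.mean(fold_acc),
        "std": statistics.stdev(fold_acc),
        "best": max(fold_acc),
    }

# Placeholder per-fold accuracies (NOT the manuscript's results).
fold_acc = [0.92, 0.96, 0.90, 0.98, 0.94]
summary = summarize_folds(fold_acc)
print(f"mean = {summary['mean']:.3f} +/- {summary['std']:.3f}, "
      f"best fold = {summary['best']:.3f}")
```

Reporting the mean with its spread next to the best fold makes clear how much of the headline figure depends on one favourable split.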
In a classification task, it is necessary to report and discuss the ROC curve, the AUC, and/or the confusion matrix of the classifiers as a sensitivity analysis of the applied methods, for both the training and test sets.
A more thorough analysis of the results is still needed, since this could be an important finding for the future use of the results: it is important to report the most useful feature subsets extracted from the spectral data together with the best classifiers tested.
The terminology in the text still varies; unified terminology should be applied, since this is a classification task rather than a modelling task.
Author Response
Once again, thank you very much for your good comments and suggestions.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors have addressed all the comments.
Author Response
Once again, thank you very much!
Round 3
Reviewer 1 Report
The authors have now successfully answered the presented comments.
Please carefully check the format of the figures in the manuscript to ensure high-quality resolution.