Machine Learning Strategy for Improved Prediction of Micronutrient Concentrations in Soils of Taif Rose Farms Based on EDXRF Spectra
Round 1
Reviewer 1 Report
The present study proposes combining EDXRF measurements with machine learning to improve the prediction of 14 micronutrients in agricultural soils of Taify rose (Rosa damascena) farms in Taif, Saudi Arabia. The study is interesting, since several papers have demonstrated that machine learning can reduce some drawbacks of EDXRF measurements. However, it is not entirely clear how the machine learning models were built. For example, the manuscript does not state which conventional method (y matrix) was used as input to build the machine learning models. Some metrics that would help in comparing the models, such as r² and RMSE, were not reported. In my view, Figures 5, 6, and 7 are neither useful nor clear for comparing the prediction errors of the models. The dataset was not split into training and test subsets, as is recommended for this kind of work in order to validate the models and check for overfitting. Moreover, the manuscript does not describe the methodology used to determine the elemental concentrations by EDXRF (Table 1).
In my opinion, some points need to be corrected and clarified.
Strengthening some points in the methodology and results would give readers a clearer understanding of the study.
Objective 1) Use XRF measurements to develop national models for predicting concentrations of 14 micronutrients in agricultural soils of Taif rose farms.
This objective has been partially reached. However, the manuscript does not report the methodology used to determine the elemental concentrations by EDXRF. That is, were they determined by fundamental parameters, by calibration curves, or by the manufacturer's quantitative package?
Objective 2) Validate the models at the farm scale by using cross-validation.
The study compared the quantitative XRF results with elemental concentrations reported for US soils. Soil is a very complex matrix, so a comparison with literature values does not validate the results. Line 169 states that certified reference materials (CRMs) were used to evaluate precision and trueness. However, these validation results were not reported in the manuscript. Describing the CRM results is extremely important, since a comparison with concentrations reported for another country is too vague for analytical purposes.
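As an illustration of the CRM validation requested above, trueness is often expressed as percent recovery against the certified value and precision as the relative standard deviation (RSD) of replicate measurements. The sketch below assumes triplicate results for one element in one CRM; the function names and the numbers are hypothetical and are not taken from the study.

```python
import statistics


def percent_recovery(replicates, certified_value):
    """Trueness: mean of replicate EDXRF results as a percentage of the certified value."""
    return statistics.mean(replicates) / certified_value * 100.0


def percent_rsd(replicates):
    """Precision: relative standard deviation (%) using the sample standard deviation."""
    return statistics.stdev(replicates) / statistics.mean(replicates) * 100.0


# Hypothetical triplicate results (mg/kg) for one element in one CRM; not data from the study.
triplicate = [98.0, 100.0, 102.0]
certified = 100.0

print(f"recovery = {percent_recovery(triplicate, certified):.1f} %")  # recovery = 100.0 %
print(f"RSD = {percent_rsd(triplicate):.1f} %")  # RSD = 2.0 %
```

Reporting these two values per element and per CRM in a table would be one compact way to present the validation results.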
Objective 3) Compare the performance of three model types: multiple linear regression (MLR), multivariate adaptive regression splines (MARS), and random forest regression (RF).
I have some queries about the dataset and methodology used to build the models:
- Were the 14 elements determined by XRF the independent variables (matrix X)?
- What was the conventional method used to determine the dependent variables (vector y)?
- What were the metrics used to compare the models?
The manuscript does not make clear which metrics were used to compare the models. It presents some figures with prediction errors, but in my view their quality is poor, which makes reading and interpretation difficult. The coefficient of determination (r²) and the root mean square error (RMSE) are generally used as metrics to compare multivariate calibration models. I suggest describing the metrics used to compare the models and comparing the performance of the three model types in a table of RMSE values.
- Was the dataset split into training and test subsets? If not, was a cross-validation method used? What criterion was used to avoid overfitting?
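To make the metric and validation questions above concrete, the sketch below computes out-of-fold RMSE and r² with k-fold cross-validation, so that every sample is predicted by a model that never saw it during fitting. It assumes a single predictor element and an ordinary least-squares line as a simplified stand-in for MLR; MARS and RF would slot into the same loop in place of `fit_line`. All function names and the example data are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(obs, pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def cross_validate(x, y, k=5, seed=0):
    """Return out-of-fold (RMSE, r^2); each fold is held out while the model is fit on the rest."""
    pred = [0.0] * len(y)
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            pred[i] = a + b * x[i]
    return rmse(y, pred), r_squared(y, pred)


# Hypothetical, noise-free data: the out-of-fold RMSE is ~0 and r^2 is ~1.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
print(cross_validate(x, y, k=5))
```

Because the metrics are computed only on held-out predictions, a large gap between training and cross-validated RMSE is one simple sign of overfitting.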
Lastly, the authors argue that machine learning aids the analysis of elements with high lower limits of detection (LLDs). However, the EDXRF LLDs were not reported. Furthermore, if EDXRF has high LLDs, is the dataset used in this study suitable?
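For reference, one common XRF definition of the lower limit of detection is LLD = (3 / m) · sqrt(R_b / t), where m is the sensitivity (net count rate per unit concentration), R_b the background count rate under the analyte peak, and t the counting time. The sketch below assumes that definition; the sensitivity and background values are hypothetical, and only the 300 s counting time echoes the measurement conditions quoted later in this report.

```python
import math


def lower_limit_of_detection(sensitivity_cps_per_mgkg, background_cps, live_time_s):
    """LLD in mg/kg using the common definition 3 / m * sqrt(R_b / t).

    sensitivity_cps_per_mgkg: net count rate per mg/kg of analyte (m)
    background_cps: background count rate under the analyte peak (R_b)
    live_time_s: counting time in seconds (t)
    """
    return 3.0 / sensitivity_cps_per_mgkg * math.sqrt(background_cps / live_time_s)


# Hypothetical sensitivity and background; 300 s counting time.
lld = lower_limit_of_detection(0.05, 20.0, 300.0)
print(f"LLD = {lld:.1f} mg/kg")  # LLD = 15.5 mg/kg
```

The formula also shows why longer counting or a lower background lowers the LLD, which is the kind of per-element information a reader would need to judge the dataset's suitability.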
Author Response
We highly appreciate the time and effort dedicated to reviewing our manuscript and the constructive comments and invaluable suggestions provided by you and the esteemed reviewers. These comments allow us to significantly improve and strengthen the content of our paper entitled "Machine learning strategy for improved prediction of micronutrients concentrations in Soils of Taif roses Farms based on EDXRF spectra". Every effort has therefore been made to address the comments and incorporate them into the revised manuscript accordingly.
In the following, please find our responses to your comments where the reviewers' comments appear in black and our responses in blue. Two versions of the manuscript have been uploaded to Manuscript Central:
- Old_MARKED3.docx includes track changes to highlight modifications to the original paper.
- New_UNMARKED3.docx is the final version of the revised paper with all changes accepted.
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors use recently improved machine learning techniques to develop a general prediction model for agricultural soils in Saudi Arabia, specifically in the Taif area. The new index achieved higher accuracy than some existing indices. The overall presentation is straightforward and clear.
However, the methodology and discussion sections need improvement. In particular, the authors should describe how these approaches were used. Were they based on thresholds?
The authors should also describe what motivated them to develop such an index. The discussion should also explain why the new index formulation achieved better results. Otherwise, it looks as though the authors have reported one successful experiment out of many failed ones.
The introduction lacks an overview of the methods used in previous studies with the same type of analyses for microelements.
Delete "soil sampling" from the title of Section 2.1 (Study area and soil sampling), as it is repeated in Section 2.2 (Soil Sampling).
Abstract: It reflects the content of the article.
Figure 1 is not acceptable and must be replaced with an available satellite image, such as Landsat 9 or Sentinel.
None of the figures is accompanied by sufficient and clear illustration and discussion.
What is the rationale (theory) behind the proposed index?
A flowchart showing the main steps of the study would be helpful.
The number of cited references is too small for a valuable study like this; please expand the reference list and use it to improve the clarity of the discussion section.
Author Response
We highly appreciate the time and effort dedicated to reviewing our manuscript and the constructive comments and invaluable suggestions provided by you and the esteemed reviewers. These comments allow us to significantly improve and strengthen the content of our paper entitled "Machine learning strategy for improved prediction of micronutrients concentrations in Soils of Taif roses Farms based on EDXRF spectra". Every effort has therefore been made to address the comments and incorporate them into the revised manuscript accordingly.
In the following, please find our responses to your comments where the reviewers' comments appear in black and our responses in blue. Two versions of the manuscript have been uploaded to Manuscript Central:
- Old_MARKED3.docx includes track changes to highlight modifications to the original paper.
- New_UNMARKED3.docx is the final version of the revised paper with all changes accepted.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
I appreciated the changes made to the paper. It is now clear that the study compares ML techniques for predicting the concentration of a specific microelement, using as input the concentrations of one or more other microelements obtained by EDXRF. The authors have presented a good research paper. However, the approach taken in the Introduction does not support the goal of the manuscript. I suggest stating clearly in the Introduction that the main objective was to compare ML techniques for predicting the concentration of a specific microelement from the concentrations of other microelements, and citing the advantages of this approach. I enjoyed reading the manuscript, and I am pleased to recommend its acceptance after the following suggestions and queries are addressed:
- What is the advantage of the proposed method? Would it not be simpler to obtain the concentration directly using the UniQuant standardless method (the method used by the authors in this study)?
This should be discussed in the Introduction.
- Line 64: “For example, copper (Cu), Ferrous (Fe), and zinc (Zn) are important elements for crop production due to their vital roles in photosynthesis, respiration, and other plant functions [11-12].”
The ionic form absorbed by plants is Cu²⁺. The EDXRF technique determines total contents and does not determine chemical speciation. This should be made clear so as not to imply that the proposed method determines the ionic forms of micronutrients absorbed by plants.
- Line 178: 2.2 XRF measurements section
I suggest keeping the following paragraph as it was in the previous version of the manuscript:
“Using a Shimadzu EDX-720 energy dispersive X-ray fluorescence spectrometer, we measured the concentrations of the elements. The samples were irradiated in triplicate for 300 s under vacuum using an Rh X-ray tube at 15 kV (Na to Sc) and 50 kV. The current was automatically adjusted (maximum of 1 mA). A 10 mm collimator, and a Si (Li) detector were used, and then it was cooled with liquid nitrogen for detection.”
- Line 204: “For the ideal "mining" procedure under high purity helium (He), limits of detection (LODs) were within the range of 2 mg kg−1 (e.g. Ni and Cu) and 60 mg kg−1 (e.g. Cl).”
What atmosphere was used: vacuum or He? This is somewhat confusing, because the previous version states that a vacuum was used, whereas the LOD values are now reported under a high-purity helium atmosphere.
- Line 27: “Our study proposes a Machine Learning (ML) strategy for predicting fertility parameters more accurately in agricultural soils using 10 farms of the Taify rose (Rosa damascena) in Taif, Saudi Arabia as a case study.”
- Line 32 – “The study showed that multivariate models can be used to overwhelm the drawbacks of the EDXRF device, such as high detection limits and an element that cannot be directly measured.”
The study shows that it is possible to predict a microelement that cannot be directly measured by EDXRF, but what is the statistical confidence of this prediction?
The cross-validation results identify the most accurate machine learning model for this case study, but they do not validate the methodology. It would be good if the authors highlighted that it is not possible to obtain results more accurate, or with lower detection limits, than those of the reference method used to build the calibration models (which in this study was the concentrations measured by EDXRF itself). I suggest the authors add a statement warning readers against over-optimistic expectations. Even better, they could describe how such approaches can help overcome instrumental limitations and contribute to the EDXRF field. For instance, raw EDXRF spectral data or peak intensity data could be used as input for the models. This would eliminate the need for quantitative packages, which in some cases are expensive. Accuracy could then be evaluated by comparing ML and UniQuant results against another reference method, such as ICP, or against CRMs.
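As a sketch of the suggestion to use peak intensities instead of quantified concentrations, the function below extracts a net peak area from a raw spectrum by subtracting a linear background estimated from two equal-width windows on either side of the region of interest (ROI). This is a generic textbook approach, not UniQuant's algorithm or any vendor's API; the function name, channel positions, and window width are hypothetical, and real EDXRF spectra may also need peak-overlap, escape-peak, and sum-peak corrections.

```python
def net_peak_counts(spectrum, roi_start, roi_end, bg_width):
    """Net counts in channels roi_start..roi_end (inclusive) after subtracting a linear background.

    The background is estimated from two windows of bg_width channels immediately
    left and right of the ROI. Because the windows are symmetric about the ROI, a
    locally linear background averages to the mean of the two window means.
    """
    if roi_start - bg_width < 0 or roi_end + bg_width >= len(spectrum):
        raise ValueError("background windows fall outside the spectrum")
    left = spectrum[roi_start - bg_width:roi_start]
    right = spectrum[roi_end + 1:roi_end + 1 + bg_width]
    n_roi = roi_end - roi_start + 1
    gross = sum(spectrum[roi_start:roi_end + 1])
    background = (sum(left) / bg_width + sum(right) / bg_width) / 2.0 * n_roi
    return gross - background


# Hypothetical spectrum: flat background of 10 counts with a peak adding 50 counts in channels 8-11.
spectrum = [10.0] * 20
for ch in range(8, 12):
    spectrum[ch] += 50.0
print(net_peak_counts(spectrum, 8, 11, 3))  # 200.0
```

One such net intensity per element and per sample would form the input matrix X for MLR, MARS, or RF, with concentrations from an independent reference method such as ICP as the target y.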
Author Response
We thank the Reviewer for his valuable and significant recommendations and for giving his time.
We highly appreciate the time and effort dedicated to reviewing our manuscript and the constructive comments and invaluable suggestions provided by you and the esteemed reviewers. These comments allow us to significantly improve and strengthen the content of our paper entitled "Machine learning strategy for improved prediction of micronutrients concentrations in Soils of Taif rose Farms based on EDXRF spectra".
Author Response File: Author Response.pdf
Reviewer 2 Report
It appears that efforts have been made to amend the manuscript.
Author Response
We thank the reviewer for his comments.