Benchmarking Biologically-Inspired Automatic Machine Learning for Economic Tasks
Round 1
Reviewer 1 Report
Review of the article “Benchmarking Biologically-inspired Automatic Machine Learning for Economic Tasks”
The aim of the paper is to show that biologically-inspired AutoML models statistically outperform non-biological AutoML in economic tasks, while all AutoML models outperform the traditional methods. Overall, this study makes contributions to the current literature. I am positive regarding the work; however, I present two detailed comments that I believe can help the authors improve the paper.
Firstly, one of the disadvantages of the article is its generality. In the introduction, the authors claim that their aim is to investigate whether biologically-inspired AutoML models outperform other methods for data-driven economic tasks. From a scientific point of view, it would be more appropriate to limit the proposed analysis to certain sets of economic tasks (and economic models), and then compare them to biologically-inspired AutoML models. Additionally, as the article does not explain the meaning of “data-driven economic tasks” or which datasets were selected for the analysis (see paragraph 3), it is not clear what data forms the basis of the authors’ comparison. In this context, the authors’ statement referring to economy-related datasets (subsection 3.1 of the article) does not improve the readability of the paper.
Secondly, as I understand from paragraph 3.3, the criterion for comparing models is fitting performance. The problem is that in some of the economic tasks the authors refer to, e.g. a stock trading decision support system (page 3 of the article), fitting performance is often (but obviously not always) not the only relevant criterion. In stock trading, the computational efficiency of the algorithm also matters: prices and FX rates can change in nanoseconds, which means that algorithms must be not only accurate but also computationally efficient. As AutoML models are iterative, i.e. computationally costly, they do not have high efficiency. Consequently, their practical application to at least some economic problems is limited, especially problems related to decision making in financial markets, e.g. decision making in HFT.
The article is written with a smooth writing style. If there were any spelling or grammar mistakes they were undetectable to me. I did not notice any problems with references.
Summing up, I recommend publishing the article after improvements that address the comments presented above.
Author Response
We are pleased and privileged to re-submit our manuscript, “Benchmarking Biologically-inspired Automatic Machine Learning for Economic Tasks”, to have it re-considered for publication in Sustainability.
We sincerely thank the reviewers for their detailed and thought-provoking feedback. It was a delight to address the comments and questions, and their feedback has significantly improved our manuscript.
In response to the reviewers' comments, we revised the manuscript to better outline the proposed work and made it more suitable for a high-quality journal like Sustainability. To make the review process smoother, we highlighted all additions and edits in the manuscript in bold font. In addition, we address the reviewers' comments and questions below, point-by-point. We look forward to hearing from you and hope you will positively consider our manuscript for publication in your journal.
Kind regards,
Teddy Lazebnik*, Tzach Fleischer, Amit Yaniv-Rosenfeld
Comment #1: “Firstly, one of the disadvantages of the article is its generality. In the introduction, the authors claim that their aim is to investigate whether biologically-inspired AutoML models outperform other methods for data-driven economic tasks. From a scientific point of view, it would be more appropriate to limit the proposed analysis to certain sets of economic tasks (and economic models), and then compare them to biologically-inspired AutoML models. Additionally, as the article does not explain the meaning of “data-driven economic tasks” or which datasets were selected for the analysis (see paragraph 3), it is not clear what data forms the basis of the authors’ comparison. In this context, the authors’ statement referring to economy-related datasets (subsection 3.1 of the article) does not improve the readability of the paper.”
Answer #1: Thank you very much for drawing our attention to this shortcoming. Following this comment, we altered the manuscript in two ways. First, in the Introduction section, we limited the claims of the proposed study to the more specific fields of economics from which our datasets are drawn. Second, in Section 3.1, we provide more details on the nature and sources of the data so that readers can take these factors into consideration.
Comment #2: “Secondly, as I understand from paragraph 3.3, the criterion for comparing models is fitting performance. The problem is that in some of the economic tasks the authors refer to, e.g. a stock trading decision support system (page 3 of the article), fitting performance is often (but obviously not always) not the only relevant criterion. In stock trading, the computational efficiency of the algorithm also matters: prices and FX rates can change in nanoseconds, which means that algorithms must be not only accurate but also computationally efficient. As AutoML models are iterative, i.e. computationally costly, they do not have high efficiency. Consequently, their practical application to at least some economic problems is limited, especially problems related to decision making in financial markets, e.g. decision making in HFT.”
Answer #2: We thank the reviewer for the thought-provoking comment. The reviewer is absolutely right: models are measured not only by their accuracy but also by other factors such as stability, robustness, and speed. In the context of trading, speed can be of great importance. However, the reviewer is discussing a very specific type of algo-trading that requires a high level of computational efficiency. Other algo-trading methods, such as the rare-event-based method, are not time sensitive, and commodities trading is also done on a 15-minute time scale. In such cases, using AutoML is legitimate: the model is trained once, and in production it is just an ML model that runs relatively fast (milliseconds to seconds for huge ensemble models). For this reason, we believe AutoML is suitable for the proposed case. Indeed, it is not suitable for every task, but we never claimed it is. Moreover, the results clearly show that it is not perfect, far from it. In addition, computational efficiency, while very important, is out of the scope of the proposed work. Nonetheless, we clearly state this limitation in the discussion.
Reviewer 2 Report
This article presents an experiment using machine learning regression over economics datasets. Three kinds of regressors are compared: traditional, bio-inspired, and non-bio-inspired.
Strong points of the article:
a) A wide experiment with 50 datasets is used. These datasets were carefully chosen with the help of experts.
Weak points of the article:
1) In lines 146 – 148 it is said: “Each dataset is represented using a single table (matrix) and any column where the number of unique values was less than a quarter of the number of instances was removed in order to avoid uninformative features [64].”
Please explain this decision in more detail. Why do you suppose that discrete (binary) attributes are not informative? Is it applicable only to numeric attributes?
2) In line 195 it is said “that the best result of each algorithmic group is taken for each case, as presented in Table 1”. Does this mean that, for the traditional group, the better of least mean square and minimal decision tree is shown in Table 1? This procedure cannot be applied in production, since it is not possible to know beforehand which model is the best if the test’s category is unknown.
Why are the measures of both models not displayed instead of only the best? This could be more informative.
Minor issues:
3) Recently, [38] show -> Recently, [38] shows
It is only one reference, so it is singular.
Author Response
Comment #1: “In lines 146 – 148 it is said: “Each dataset is represented using a single table (matrix) and any column where the number of unique values was less than a quarter of the number of instances was removed in order to avoid uninformative features [64].” Please explain this decision in more detail. Why do you suppose that discrete (binary) attributes are not informative? Is it applicable only to numeric attributes?”
Answer #1: Thank you very much for spotting this mistake! Indeed, removing columns with *less* than a quarter does not make sense. The text should read *more* than a quarter, as this indicates that each value appears, on average, fewer than four times. Commonly, these are uninformative columns that happen to be in the data by mistake, for example index columns, where each value is unique. Following this comment, we fixed this issue in the text.
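To make the corrected rule concrete, the following is a minimal sketch of one way such a filter could look. It is not the code used in the manuscript: the function name `drop_high_cardinality_columns`, the `max_unique_fraction` parameter, and the toy table are all assumptions made for illustration, and the table is modeled as a plain dictionary of columns.

```python
from typing import Dict, Hashable, List


def drop_high_cardinality_columns(
    table: Dict[str, List[Hashable]],
    max_unique_fraction: float = 0.25,
) -> Dict[str, List[Hashable]]:
    """Drop every column whose number of unique values exceeds
    max_unique_fraction times the number of rows (hypothetical helper)."""
    if not table:
        return {}
    n_rows = len(next(iter(table.values())))
    kept = {}
    for name, values in table.items():
        # A column such as an index, where every value is unique, exceeds
        # the threshold and is therefore removed.
        if len(set(values)) <= max_unique_fraction * n_rows:
            kept[name] = values
    return kept


# Toy table with 8 rows (made-up values, not data from the study).
toy = {
    "row_id": [1, 2, 3, 4, 5, 6, 7, 8],                  # 8 unique values
    "is_urban": [0, 1, 0, 1, 1, 0, 0, 1],                # 2 unique values
    "sector": ["a", "b", "a", "a", "b", "b", "a", "b"],  # 2 unique values
}
filtered = drop_high_cardinality_columns(toy)
print(sorted(filtered))
```

With 8 rows the threshold is 2 unique values, so the binary columns are kept and only the index-like column is removed.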
Comment #2: “In line 195 it is said “that the best result of each algorithmic group is taken for each case, as presented in Table 1”. Does this mean that, for the traditional group, the better of least mean square and minimal decision tree is shown in Table 1? This procedure cannot be applied in production, since it is not possible to know beforehand which model is the best if the test’s category is unknown. Why are the measures of both models not displayed instead of only the best? This could be more informative.”
Answer #2: Thank you for this comment. Indeed, as the reviewer states, we show the best of each category. This is not available in production, but production use is not the purpose of the proposed experiment. We tried to show that even if one takes the *best* outcome of each category, it is still better/worse than the other. Nonetheless, following this comment, we provide as supplementary material the full experiment data with 50 rows, one for each dataset, and 6 columns, one for each algorithm, so readers can compute any statistics they wish over the obtained results. We hope this satisfies the reviewer’s comment.
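As an illustration of how a reader might recompute the best-of-group summary from a table of per-dataset, per-algorithm results, here is a small sketch. Several things in it are assumptions: only the two traditional models are named in the review, so the four AutoML labels are placeholders; the scores are made-up numbers, not results from the study; and the metric is assumed to be an error where lower is better.

```python
from typing import Dict, List

# Hypothetical grouping of the 6 algorithms into the 3 groups. Only the two
# traditional models are named in the review; the other labels are placeholders.
GROUPS: Dict[str, List[str]] = {
    "traditional": ["least_mean_square", "decision_tree"],
    "non_bio_automl": ["automl_a", "automl_b"],
    "bio_automl": ["bio_automl_a", "bio_automl_b"],
}


def best_per_group(
    row: Dict[str, float], lower_is_better: bool = True
) -> Dict[str, float]:
    """Return the best score of each algorithmic group for one dataset."""
    pick = min if lower_is_better else max
    return {
        group: pick(row[algo] for algo in algos)
        for group, algos in GROUPS.items()
    }


# Two made-up rows (one per dataset); the real table would have 50 rows.
results = [
    {"least_mean_square": 0.40, "decision_tree": 0.45,
     "automl_a": 0.30, "automl_b": 0.35,
     "bio_automl_a": 0.25, "bio_automl_b": 0.28},
    {"least_mean_square": 0.60, "decision_tree": 0.50,
     "automl_a": 0.42, "automl_b": 0.41,
     "bio_automl_a": 0.43, "bio_automl_b": 0.39},
]
summary = [best_per_group(row) for row in results]
for dataset_index, best in enumerate(summary):
    print(dataset_index, best)
```

Because the full table keeps every algorithm's score, the same data also supports per-model comparisons rather than only the per-group best.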
Comment #3: “Recently, [38] show -> Recently, [38] shows | It is only one reference, so it is singular.”
Answer #3: Thank you - fixed.
Reviewer 3 Report
1. In the Introduction, the literature review is too sketchy. A detailed discussion of the key literature is needed.
2. More discussion of the results is needed.
None.
Author Response
Comment #1: “In the Introduction, the literature review is too sketchy. A detailed discussion of the key literature is needed.”
Answer #1: Thank you very much for drawing our attention to this shortcoming. Following this comment, we altered the introduction section to make sure it presents the most recent and relevant background that we believe the reader needs in order to appreciate the importance and novelty of the proposed work and how it fits with the currently available literature in the field. In particular, we removed several references, leaving only the most up-to-date and relevant ones to support our claims.
Comment #2: “More discussion of the results is needed.”
Answer #2: Thank you for this suggestion. Following this comment, we slightly extended the discussion section, providing more limitations of the proposed work. The results of this study may seem very narrow, as we wished to answer a single two-fold question: is AutoML better than traditional data-driven models in the economy (yes), and is biologically-inspired AutoML better than other AutoML in the context of the economy (yes)? As both questions are binary in nature, we discuss the meaning of obtaining these (expected) results, which we aimed to validate scientifically, and state as clearly as we could the limitations of our empirical test.
Round 2
Reviewer 2 Report
All the notes are correctly answered.
Author Response
Thank you very much for the professional review; it is highly appreciated.