Model Soups for Various Training and Validation Data
Round 1
Reviewer 1 Report (New Reviewer)
In the article "Model Soups for Various Training and Validation Data," the authors examine model soups and attempt to maximize fine-tuning accuracy by optimizing the soup-building process with k-fold cross-validation.
In Section 1 (Introduction) they motivate this work, give a neat introduction to the area of interest and point out many previous and related works in this area before they define the aim of this work.
* Line 67-69: The conjecture is not far off, but it would be a nice addition to this work to verify it, or maybe add it to possible future work.
In Section 2 (Related Works) the authors introduce the concept of model soups, the two methods considered here in particular, as well as the concept of k-fold cross-validation (a toy sketch of the two soup recipes follows my remarks below).
* I recommend renaming this section to 'Preliminaries', as it describes the existing basics needed to understand this work.
* Line 80: maybe split the sentence: "...method. The weights..."
* Line 81: pre*-*trained
* "This study proposes" reference is not immediately clear. Maybe dismiss line-break here and/or change formulation to "The study proposes.."
In Section 3 (Methods) the authors introduce their adaptation of the method and the experimental setup.
* Please stick to the same tense throughout the article, preferably present tense (i.e., Line 111: "This study determine*s* ..."), so that it can be read from the tense whether the author refers to this work (present tense) or previous works (past tense).
* Line 112: "The problem that differs from k-fold cross validation is that by synthesizing models that include each other’s validation data in the training data, the validation data are included in the training of the models." "The problem that differs from" - the meaning is unclear. This sentence states an obvious fact, yet it does not become clear what the problem is. Each model is supposedly trained on its set of training data and tested/evaluated on the corresponding set of evaluation data (which is, per definition, not part of the training data?). This part needs further explanation (see also the toy sketch after these remarks).
* Please add a subsection "Experimental setup", maybe at the end of this section. The introduction of the models and datasets used should move there, as well as an overall explanation of the setup. It does not become completely clear how the data is divided: was the resulting model soup tested on untouched test data, and how does your process differ from k-fold cross-validation?
* Line 116: "validation data determine*s*"
* "Additionally, the validation data determine which model must be selected to improve the accuracy of the test data when training multiple models." - Also unclear. The validation data determines? - Or Based on the validation data, using some evaluation function, it will be determined? And: The accuracy of the test data or the models (inference) accuracy on the test data? Please get into more detail here.
* Line 118 "Best on val" references a method that has not been introduced in this article - "which served as a baseline" add reference here since this is referencing another paper.
* Line 110 / 111: "greedy soups greedily" maybe drop the 'greedily'
* Line 121: "Therefore, when training and validation data are separated, it is an indicator of synthesis in greedy soups;" could also use some additional explanation.
* This paragraph distinguishes between validation data and test data. While the validation data has been introduced with k-fold cross-validation, the test data has not been introduced. This is supposedly referring to the data used to evaluate the finished soup? Please add further explanation.
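To make my concern about the data splits concrete, here is a toy sketch (again my own illustration, using scikit-learn's `KFold` on dummy sample indices): each individual model keeps its own validation fold out of its training data, but as soon as the models of two different folds are souped, every validation sample has been seen during training by at least one ingredient.

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20)  # dummy sample indices
    folds = list(KFold(n_splits=5).split(X))

    for i, (train_i, val_i) in enumerate(folds):
        # Per model, training and validation folds are disjoint, as expected.
        assert not set(val_i) & set(train_i)
        for j, (train_j, _) in enumerate(folds):
            if i != j:
                # But model j was trained on model i's validation fold, so a
                # soup containing both has "seen" every validation sample.
                assert set(val_i) <= set(train_j)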
In Section 4 (Experimental Results) the authors introduce the models and datasets used for the experiments and present their results.
* The introduction of the models and datasets would not be expected here; this should be part of a section/subsection "Experimental Setup".
* Table 3: mark the best value for Val as well, and add a little more explanation to the caption.
* If possible, add information about the performance of the single best/worst model that was available to the model-soup building process
In Section 5 (Discussion) and Section 6 (Conclusion) the authors discuss their results, draw their conclusions, and propose possible future works; this is presented very conclusively and also promises highly interesting future work.
---
And just a remark that may be of interest to the authors:
You also talked about ensemble building, and in ensemble building it is an established method to average the models' predictions to efficiently build a stronger model (the best/earliest example is probably random forests, cf. Leo Breiman).
However, Acar et al. [1] and Friese et al. [2] showed that the optimal mixture does not have to be the average. Echtenbruck et al. [3] showed that the optimal mixture of models, when fitness is evaluated by RMSE, can be directly calculated using quadratic programming. Though the methods obviously differ, some ideas are really close, and (from my point of view) it would be very interesting to see whether this method can also be applied here and achieve better results; maybe you want to consider this for future work (see the sketch after the references).
[1] Acar, E., Rais-Rohani, M.: Ensemble of metamodels with optimized weight factors. Structural and Multidisciplinary Optimization 37 (2009) 279–294
[2] Friese, M., Bartz-Beielstein, T., Emmerich, M.: Building ensembles of surrogates by optimal convex combination. In Papa, G., Mernik, M., eds.: Bioinspired Optimization Methods and their Applications. (2016) 131–143
[3] Echtenbruck, P., Echtenbruck, M., Batenburg, J., Bäck, T., Naujoks, B., Emmerich, M.: Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications
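To sketch the idea from [3] (again only a rough illustration on placeholder data, not the authors' method): stacking the k models' validation predictions column-wise in a matrix P, the RMSE-optimal convex combination of the models is a small quadratic program, solvable for example with SciPy:

    import numpy as np
    from scipy.optimize import minimize

    def optimal_convex_weights(P, y):
        # min_w ||P w - y||^2  s.t.  w >= 0 and sum(w) = 1  (a quadratic program)
        k = P.shape[1]
        res = minimize(
            lambda w: np.mean((P @ w - y) ** 2),
            x0=np.full(k, 1.0 / k),  # start from the plain average
            bounds=[(0.0, 1.0)] * k,
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        )
        return res.x

    # Placeholder data: three models' predictions on 100 validation samples.
    rng = np.random.default_rng(0)
    y = rng.normal(size=100)
    P = np.column_stack([y + rng.normal(scale=s, size=100) for s in (0.1, 0.5, 1.0)])
    print(optimal_convex_weights(P, y))  # typically far from the uniform average

The optimizer usually concentrates weight on the strongest models instead of averaging uniformly, which is exactly the point of [1]-[3].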
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report (New Reviewer)
Very interesting and timely article. I think it deserves publication and I am recommending accept with minor corrections. But there are some minor issues that require your attention. I list these corrections below as feedback/comments, and I am looking forward to reading the updated version of this article.
- Would this use of training and validation data be applicable for running AI on low-memory/low-energy devices? There is a recent article on the related topic of ‘Algorithms for Artificial Intelligence on Low Memory Devices’ - see: https://ieeexplore.ieee.org/document/9502714 It would be interesting to see a few sentences of review and comparison of your work in relation to these recent studies on related topics.
- In the conclusion, you use some technical language, e.g., '..we synthesized n ∈ {1, · · · , k} models..'. Maybe focus less on what you did (because that is for the discussion) and more on your key findings, main discoveries, and key contributions to knowledge, because these are expected from a conclusion chapter. You might also need to include a 'limitations and further research' section in your conclusion; that is usually a good way to end the article.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report (New Reviewer)
Thank you for the adaptations of the paper. Overall, this paper has considerably improved.
Some minor remarks I noticed while reading:
* Line 53: add space Model soups* *[24]
* Line 64/65: k-fold cross validation at the beginning of the sentence - it looks really strange to me when the sentence starts with a lowercase letter. Since I do not know whether it has to be this way (because it is a fixed name) or whether it should be capitalized (because it is the beginning of the sentence), I would tend to reformulate the sentence so that it does not start with k-fold. But I would leave that to the authors.
* Line 113/114: "A problem that exists is distinct from k-fold cross-validation." Maybe better: "...arises from k-fold..."
* Line 119: baseline* *in - missing space
* Line 122-146: thank you for the additional explanation; this clarifies it. The post-synthesis model soup is then tested on the independent test data?
* Also, the experimental setup section now makes the whole process reproducible. Thank you.
* Table 3 Caption, 1st line: Greedy s*o*ups
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
In the paper titled “Model Soups for k-Fold Cross Validation”, the work discusses maximizing model fine-tuning accuracy with a single model’s inference time and memory cost. The way of presentation is good, but the concepts still need more improvement.
The abstract is very formal; it does not state the objective of the present work.
The major highlights and the working objective are not clear from the introduction section.
The authors state “We examine the application of model soups to a training and validation dataset”; based on this quote, the authors should include a case-study-based approach to address the training and validation dataset issues.
k-fold cross-validation is a common method; how is it used for your proposed problem? There is no clarity in the work.
Section 3 could be explained with an overall state diagram to show the advantages of the proposal.
Figures 2 and 3 show very common results; no proper discussion is provided.
The paper does not clearly discuss the different training and validation data and the k-fold cross-validation method.
The authors should refer to more recent research to improve the quality of the work.
Reviewer 2 Report
The proposed research deals with models for improving the accuracy of data via k-fold cross-validation using model soups. The sections are divided according to the guidelines. The introduction gives a sufficient overview of the previous work. The materials and methods part, as well as the discussion, are clearly presented.
The importance and application of the method on concrete examples should be emphasized more strongly in the conclusion and in the abstract.
In scientific papers, it is common to write in the third person. Accordingly, it is necessary to rewrite the entire text, especially the abstract.
Reviewer 3 Report
The authors have done an excellent job in terms of the research and the writing. The manuscript is well organized and neatly written coherently with all necessary information. I think it's easy to read and understand without losing the scientific standard.
Reviewer 4 Report
The subject has minor priority. The novelty of the approach is evaluated as minor. Also, the experiments are very shallow and do not cover all aspects of the problem. Therefore I cannot suggest the acceptance or even revision of the manuscript. The authors have to take the following into account.
1- Provide more justification for the proposed approaches.
2- Experiment on other datasets.
3- Evaluate the approach with other metrics, including performance and generalization.