Comparison of Instance Selection and Construction Methods with Various Classifiers
Round 1
Reviewer 1 Report
The datasets used in the experiments are heterogeneous in all aspects: fields of application, contents, attribute types, and number of classes. It would be interesting to highlight whether substantial differences in the results, related to these aspects, were detected during the tests.
Are there any further perspectives for future research? The authors should also consider elaborating on the limitations of their study.
Author Response
The answers are in the PDF file.
Author Response File: Author Response.pdf
Reviewer 2 Report
This research compared the performance of several instance selection and construction methods when combined with various classifiers, including decision trees, random forest, Naive Bayes, a linear model, support vector machines (SVM), and k-nearest neighbors (kNN), in terms of classification accuracy and reduction of training-set size. The results reveal that only a small group of instance selection methods can be used as a general-purpose preprocessing step. In addition, when applying training-set filtering methods, users should be aware of the risk of reducing the prediction accuracy of the final classifier. The reviewer believes that the current version of the manuscript is not yet ready for publication; the authors are encouraged to consider the following comments and suggestions and revise the manuscript accordingly.
- The authors should consider streamlining the manuscript to make it more concise. They should also improve the Abstract: it currently contains many acronyms, and in an abstract all acronyms should normally be spelled out. In addition, the authors should consider splitting the Introduction into two sections: an Introduction section and a Background section. The Introduction should focus on introducing the research objectives and the research questions to be answered; the Background section should focus on the literature review and on defining the research gap. The current Instance Selection and Construction Methods section should be included in the Background section.
- For the experiment, what are the criteria for selecting the input data? All the instance selection and construction methods are designed and developed for different input data. How do the authors ensure that the input data are general?
- Why were these instance selection and construction methods selected for this research? Do they include all currently available methods? If not, why were some methods not included? The authors need to justify this.
- For classification accuracy assessment, the authors only used the number of correctly classified samples divided by the total number of evaluated samples. The authors are advised to also report true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) in a confusion matrix for the accuracy assessment, and to consider commission and omission errors (see the sketch after this list).
- The authors need to justify why reduction of the training set is an indicator of a better method. In operational practice, more training data certainly does not guarantee a better result; the appropriate training-set size depends on the input data, the purpose, and other factors.
- Some of the figures need to be recreated, because they are currently very difficult to read without zooming in to 200%. Similarly, some of the tables need to be improved.
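As an editorial illustration of the metrics raised in the comments above, the following is a minimal sketch in Python, assuming scikit-learn is available (the paper under review does not necessarily use it). The label arrays and set sizes are hypothetical, not taken from the manuscript; the sketch only shows how accuracy, the TP/FP/TN/FN counts of a confusion matrix, commission and omission errors, and a training-set reduction rate could be computed.

```python
# Minimal sketch of the suggested evaluation metrics; data are hypothetical.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical ground truth
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical predictions

# Overall accuracy: correctly classified samples / all evaluated samples.
acc = accuracy_score(y_true, y_pred)

# Binary confusion matrix: rows are true classes, columns predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Commission error: fraction of predicted positives that are wrong.
# Omission error: fraction of actual positives that were missed.
commission = fp / (fp + tp)
omission = fn / (fn + tp)

# Training-set reduction rate: share of instances removed by the
# selection/construction method (hypothetical set sizes).
n_original, n_selected = 1000, 250
reduction = 1 - n_selected / n_original

print(f"accuracy={acc:.2f}, TP={tp}, FP={fp}, TN={tn}, FN={fn}")
print(f"commission={commission:.2f}, omission={omission:.2f}, "
      f"reduction={reduction:.2f}")
```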
Author Response
The answers to the comments are in the PDF file.
Author Response File: Author Response.pdf
Reviewer 3 Report
- The paper is well written.
- For the experimental analysis, it would be better to further explain the differing results. Do the differences stem from the data, from the approaches, or from both?
- The sizes of the selected datasets differ. Does the simple averaging over them affect the experimental results?
Author Response
The answers to the comments are in the PDF file.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The authors have addressed all my comments.