Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript titled "Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids" discusses test input prioritization techniques for detecting corner cases efficiently. The proposed deep-learning-based false prediction cluster approach significantly improves the performance of test input prioritization. The approach and results presented by the authors are novel, and hence the manuscript should be accepted in its present form.
Author Response
Thank you very much for taking the time to review this manuscript. We hope that this reply will satisfy all of the reviewer's concerns. Please find the detailed responses and corrections below.
Response to Reviewer #1 Comments
> Comments # 1: The manuscript titled "Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids" discusses test input prioritization techniques for detecting corner cases efficiently. The proposed deep-learning-based false prediction cluster approach significantly improves the performance of test input prioritization. The approach and results presented by the authors are novel, and hence the manuscript should be accepted in its present form.
Response # 1: Thank you for your kind review of our manuscript. We have updated the manuscript with additional paragraphs and an ablation study. Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Based on my review, the paper "Improved Test Input Prioritization Using Verification Monitors with False Prediction Cluster Centroids" makes several notable contributions:
- The paper proposes a novel test input prioritization (TIP) method called "DeepFPC" that uses the distance between false prediction cluster centroids and unlabeled test instances to prioritize inputs that are more likely to induce errors.
- The DeepFPC method is theoretically grounded in the concepts of intra-class feature compactness and inter-class feature separability. The authors provide analysis linking these concepts to the potential effectiveness of DeepFPC.
- Extensive experiments on image classification datasets demonstrate superior performance of DeepFPC over several state-of-the-art TIP techniques such as DeepAbstraction, surprise adequacy methods, and intrinsic function methods like Gini impurity.
- DeepFPC shows strong results both for identifying error-inducing inputs and for retraining via active learning across diverse models and datasets. This highlights its general applicability.
- The visualization provides useful intuition about the differences between high- and low-priority inputs selected by DeepFPC. The lower-priority inputs appear more prototypical, while the higher-priority ones appear to be outliers.
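The centroid-distance idea summarized in the first bullet can be illustrated with a minimal sketch. This is an illustrative assumption on my part, not the authors' implementation: the function name and toy features are invented, and only a Euclidean distance is used, whereas the paper's DistFPC also incorporates an angular component.

```python
import numpy as np

def prioritize_by_centroid_distance(false_features, test_features):
    """Rank test inputs by Euclidean distance to the centroid of
    feature vectors from previously misclassified (false-prediction)
    samples. Smaller distance = closer to the known failure region,
    hence higher priority for labeling/testing."""
    centroid = false_features.mean(axis=0)
    dists = np.linalg.norm(test_features - centroid, axis=1)
    # Ascending sort: inputs nearest the false-prediction centroid first.
    return np.argsort(dists)

# Toy 2-D features: false predictions cluster near (5, 5).
false_feats = np.array([[5.0, 5.0], [5.2, 4.8], [4.8, 5.1]])
tests = np.array([[0.0, 0.0],   # far from the failure region
                  [5.1, 5.0],   # inside the failure region
                  [2.5, 2.5]])  # in between
order = prioritize_by_centroid_distance(false_feats, tests)
print(order)  # [1 2 0]: the input nearest the centroid is ranked first
```

A single global centroid is used here for brevity; per-cluster centroids (as the paper's title suggests) would replace `centroid` with the minimum distance over a set of cluster centers.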
Overall, I find this a novel contribution advancing the state-of-the-art in intelligent testing of deep learning systems. Some potential limitations/areas of improvement:
- The computational overhead of DistFPC versus simpler measures such as Gini impurity should be analyzed more thoroughly.
- It would be informative to test DeepFPC on additional complex model architectures and datasets.
- Further ablation studies could help illuminate the individual benefits of the Euclidean vs angular components of DistFPC.
But these do not significantly diminish the quality, rigor, and relevance of the research. In summary, this paper makes excellent progress on an important problem and provides a strong basis for future work in the space. I recommend acceptance after minor revisions.
Comments on the Quality of English Language
It is correct.
Author Response
Thank you very much for taking the time to review this manuscript. We hope that this reply will satisfy all concerns of the reviewer. Please find the detailed responses below and corrections in the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This work proposes a Test Input Prioritization (TIP) technique, which shows promising performance on various tasks. Generally, the overall quality of this paper is good; some minor comments are listed as follows.
1. The loss function(s) used in this work should be specified.
2. It would be better to compare the proposed method with standard supervised learning results.
3. Active learning is important for various tasks beyond image classification where human annotations are expensive, for example image/video quality assessment [1-2]; it would be better to discuss the potential of applying the proposed method to these tasks.
[1] Continual learning for blind image quality assessment
[2] Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception
Author Response
Thank you very much for taking the time to review this manuscript. We hope that this reply will satisfy all of the reviewer's concerns. Please find the detailed responses below and the corrections in the attachment.
Author Response File: Author Response.pdf