DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript proposes a training-free automatic annotation method called DiffuPrompter, which achieves pixel-level automatic annotation of remote sensing images (RSIs). Experimental results indicate that the proposed method can provide reliable pseudo labels, significantly reducing the annotation cost of the segmentation task.
I have the following comments on this manuscript:
- The topic presented in the manuscript looks quite interesting. The proposed method, DiffuPrompter, utilizes the pre-trained SDM to explore generating semantically explicit prompts for SAM. The manuscript provides a brief overview of SDM and SAM and uses a quite interesting dataset for the study.
- The experiments conducted are not sufficient. It is suggested to measure performance with different segmentation methods. Moreover, it is also suggested to include efficiency results, e.g., running time, for the proposed method and the baselines.
- The proposed method is compared only with classical deep learning architectures and not with the most important state-of-the-art work.
- The writing of this paper needs further improvements.
Comments on the Quality of English Language
-
Author Response
Thank you very much for taking the time to review this manuscript. Your comments and suggestions have been extremely helpful in improving the quality of the article. Please see the attachment to find the detailed responses.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript titled "DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models" is well presented and addresses an interesting topic. I think it will help other researchers to work effectively.
I have some concerns for the authors.
1) Is any hypothesis or condition assumed in implementing this work?
2) Can the text prompt be longer or shorter than 3 words?
3) Can you make the source code available?
4) What are the limitations of the proposed system?
5) Please add more keywords.
6) It would also be interesting if you provided the time taken to finish annotating the different benchmark datasets.
Overall, the article is good but needs minor revision.
Author Response
Thank you very much for taking the time to review this manuscript. Your comments and suggestions have been extremely helpful in improving the quality of the article. Please see the attachment to find the detailed responses.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
- In the introduction, when delineating the contributions, it is recommended to first provide an overarching summary of the novel method introduced in the paper, the problem it addresses, and the conclusions drawn. Following this, a detailed description of the designed method should be presented. Lastly, the performance on datasets and the effects achieved should be described. The current narrative is too general.
- In the section "Theory and Methods," the term "Theoty" appears to be inappropriately used.
- The font style for the equations is not suitable; the text should be in regular typeface, not italicized.
- The narrative in section 2.2.2 is not sufficiently clear. It is suggested to highlight the core concepts and consider using diagrams or flowcharts for illustration.
- How does the method address potential issues with incomplete segmentation for non-rigid targets, such as certain venues, tennis courts, and running tracks?
- What are the criteria for evaluating the accuracy of the proposed annotation method?
- The method seems to have an excessive number of hyperparameters. How are these values determined, and can the generalizability of these parameters across different categories or scenarios be assessed?
- The tables in the article are difficult to comprehend. It is suggested to include important details to aid understanding.
- In terms of experimentation, a comparison with some common supervised segmentation methods is necessary. Additionally, a comparison with Ground Truth (GT) results should be provided, analyzing the performance for each category.
- Key references are missing. It would be beneficial to include additional citations on remote sensing segmentation methods (e.g., "Learning to aggregate multi-scale context for instance segmentation in remote sensing images"; "Building extraction from remote sensing images with sparse token transformers") and attention control methods for Stable Diffusion (e.g., "Diffusion models for imperceptible and transferable adversarial attack").
n/a
Author Response
Thank you very much for taking the time to review this manuscript. Your comments and suggestions have been extremely helpful in improving the quality of the article. Please see the attachment to find the detailed responses.
Author Response File: Author Response.docx
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed all my questions.
Comments on the Quality of English Language
n/a