
Open Access Article

Peer-Review Record

DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models

Remote Sens. 2024, 16(11), 2004; https://doi.org/10.3390/rs16112004

by Huadong Li¹,², Ying Wei¹,*, Han Peng² and Wei Zhang²

Reviewer 1: Anisha Rodrigues

Reviewer 2: Anonymous

Reviewer 3: Keyan Chen

Submission received: 23 April 2024 / Revised: 24 May 2024 / Accepted: 29 May 2024 / Published: 2 June 2024

(This article belongs to the Special Issue Advanced Machine Learning Models for Remote Sensing Applications and Data Analysis—Recent Developments)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript proposes a training-free automatic annotation method called DiffuPrompter, which achieves pixel-level automatic annotation of remote sensing images (RSIs). Experimental results indicate that the proposed method can provide reliable pseudo labels, significantly reducing the annotation costs of the segmentation task.

I have the following comments on this manuscript:

The topic presented in the manuscript is quite interesting. The proposed method, DiffuPrompter, uses the pre-trained SDM to explore generating semantically explicit prompts for SAM. The manuscript provides a brief overview of SDM and SAM and uses quite an interesting dataset for the study.

The experiment conducted is not sufficient. It is suggested to measure performance with different segmentation methods. Moreover, it is also suggested to include efficiency results, e.g., running time, for the proposed method and the baselines.

The proposed method is compared with classical deep learning architectures but not with the most important state-of-the-art work.
The writing of this paper needs further improvement.

Author Response

Thank you very much for taking the time to review this manuscript. Your comments and suggestions have been extremely helpful in improving the quality of the article. Please see the attachment to find the detailed responses.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript titled "DiffuPrompter: Pixel-level Automatic annotation for High-Resolution Remote Sensing Images with Foundation Models" is well presented and addresses an interesting topic. I think it will help other researchers to work effectively.

I have some concerns for the authors.

1) Is any hypothesis or condition considered while implementing this work?

2) Can the text prompt be longer or shorter than 3 words?

3) Can you make the source code available?

4) What are the limitations of the proposed system?

5) Please add more keywords.

6) It would also be interesting if you provided the time taken to finish annotating the different benchmark datasets.

Overall, the article is good but needs minor revision.

Author Response

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript under review explores the potential of foundational models in the realm of automated annotation, proposing an innovative, training-free annotation method that integrates Stable Diffusion and SAM, thereby reducing the annotation costs associated with segmentation tasks. The paper demonstrates a degree of innovation, yet there are several issues that require attention:

In the introduction, when delineating the contributions, it is recommended to first provide an overarching summary of the novel method introduced in the paper, the problem it addresses, and the conclusions drawn. Following this, a detailed description of the designed method should be presented. Lastly, the performance on datasets and the effects achieved should be described. The current narrative is too general.
In the section "Theory and Methods," the term "Theory" appears to be inappropriately used.
The font style for the equations is not suitable; the text should be in regular typeface, not italicized.
The narrative in section 2.2.2 is not sufficiently clear. It is suggested to highlight the core concepts and consider using diagrams or flowcharts for illustration.
How does the method address potential issues with incomplete segmentation for non-rigid targets, such as certain venues, tennis courts, and running tracks?
What are the criteria for evaluating the accuracy of the proposed annotation method?
The method seems to have an excessive number of hyperparameters. How are these values determined, and can the generalizability of these parameters across different categories or scenarios be assessed?
The tables in the article are difficult to comprehend. It is suggested to include important details to aid understanding.
In terms of experimentation, a comparison with some common supervised segmentation methods is necessary. Additionally, a comparison with Ground Truth (GT) results should be provided, analyzing the performance for each category.
Key references are missing. It would be beneficial to include additional citations on remote sensing segmentation methods (e.g., "Learning to aggregate multi-scale context for instance segmentation in remote sensing images"; "Building extraction from remote sensing images with sparse token transformers") and attention control methods for Stable Diffusion (e.g., "Diffusion models for imperceptible and transferable adversarial attack").
