FFA: Foreground Feature Approximation Digitally against Remote Sensing Object Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper addresses adversarial attacks on object detection in remote sensing imagery. It proposes foreground (shallow) feature approximation to generate adversarial examples for both untargeted and targeted attacks. The writing is clear, and the design of the FFA framework is clearly described. However, I have a question about the foreground image, which is composed of the high-confidence objects extracted by the object detector. Why are only shallow-level features extracted from this foreground image, rather than mid-level or deep-level features? Is it because, at the middle or deep layers, the noise of the approximated hybrid image or of the specified target cannot correspond well to the objects in the foreground image? Before publication, the manuscript requires careful consideration of the following three questions:
1. Researchers have used adversarial examples to probe the feature representations within deep neural networks from the perspective of model failure. These analyses have found an inconsistency between the feature representations learned by deep neural networks and the semantic concepts understood by humans. The authors could refer to this line of analysis and, based on a quantitative description of that inconsistency, further evaluate the advantages and drawbacks of the proposed method for generating adversarial examples.
2. Consider a remote sensing image that contains both small and large objects. Suppose the high-quality objects recognized by the detector, which constitute the foreground image, include some of the large objects and some of the small ones. After FFA attacks these objects to generate the adversarial hybrid image, how should the relationship between the attacked objects and the objects that were not detected in the original image be assessed? By analogy, consider an image with large and small apples: after some of them are changed into peaches, how can the relationship between the peaches and the remaining undetected apples be characterized? An attacker who obtains an adversarial example would want to quantify this relationship, i.e., whether the presence of the peaches enhances the deceptiveness of the blended image.
3. The contributions of the paper need to be refined further. FFA is a combinatorial approach built from existing object detection algorithms and modules, aimed at making the generated adversarial examples both adversarial and deceptive; evaluating these adversarial examples is therefore also an important part of the work.
Comments on the Quality of English Language
The language and the logic of the content need some revision.
Author Response
Thank you very much for your valuable review comments on this article. After carefully reviewing your feedback, we comprehensively revised and improved the manuscript to make the study more accurate, clear, and persuasive. Please see the attached detailed response and the corresponding amendments highlighted in the resubmission.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Reviewer Confidential Comments to Editor:
The paper addresses the weak transferability of existing adversarial attack techniques for object detection and proposes a Foreground Feature Approximation (FFA) method to generate adversarial examples (AEs). By altering the feature information carried by the image itself, it exploits vulnerabilities common to object detection (OD) models to carry out attacks. The paper is innovative and the experiments are thorough. However, some minor issues need further revision, as detailed below.
Reviewer Blind Comments to Author:
1. The abstract should specifically state the effects of the method, e.g., a quantitative analysis of the overall improvement in accuracy.
2. The paper should include an overall framework diagram covering all parts of the method; the current Figure 3 mainly shows the network structure.
3. The introduction and related work sections should better reflect the latest research, especially on various applications of deep learning. Some recent articles should be cited, such as those on landslide extraction from aerial imagery considering context association characteristics, cross-view intelligent person search based on multi-feature constraints, and building height extraction from high-resolution single-view remote sensing images using shadow and side information.
4. In Section 4.1.3 (Evaluation Metrics), the specific formulas and calculation methods used for evaluation should be presented.
5. The title of Section 5 should be "Conclusion", since it contains very little discussion. If discussion is needed, the authors can add a separate section for it.
Author Response
Thank you very much for your valuable review comments on this article. After carefully reviewing your feedback, we comprehensively revised and improved the manuscript to make the study more accurate, clear, and persuasive. Please see the attached detailed response and the corresponding amendments highlighted in the resubmission.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
All issues have been addressed, and I agree that the paper can be published.