Article
Peer-Review Record

A Lightweight Arbitrarily Oriented Detector Based on Transformers and Deformable Features for Ship Detection in SAR Images

Remote Sens. 2024, 16(2), 237; https://doi.org/10.3390/rs16020237
by Bingji Chen 1,2, Fengli Xue 1 and Hongjun Song 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 29 November 2023 / Revised: 29 December 2023 / Accepted: 31 December 2023 / Published: 7 January 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments:

This manuscript presents LD-Det, a lightweight arbitrarily oriented ship detector for SAR images based on transformers and deformable features. The overall structure of the paper is sound and the experiments are thorough. However, several problems remain:

In Figure 1, the description of the network is ambiguous in several places, and I have the following questions. First, the feature maps {C2, M2, P2} are shown with exactly the same size and feature dimensions; please confirm whether this is reasonable. Second, the horizontal arrows are not clear; please explain specifically which operation is applied along them, for example from C2 to M2. Third, the head part is not clear: is only the head corresponding to P3 used, or are the other heads used as well? Please make this clear in the figure. Finally, please explain the meaning of the arrow in the first step of the dotted line and why its input and output are identical.

In the LightPVT module, why is it better to delete the C5 layer of PVT v2-B0-Li? Please add experiments to support this choice. By how much do the experimental metrics decrease after deleting C5, and by how much does the number of parameters decrease? What is the trade-off between the two?

In the ablation experiments for the MDFPN module, please add the results of using MDC only at Position 2.

In the SDHead ablation experiments in Table 3, adding this module yields only a limited improvement in the experimental metrics while increasing the computational cost; please provide additional justification for the need for this module.

Details:

1. The displayed result images are too large and not visually appealing, and the detection boxes are not of uniform thickness; all of them could be made slightly thicker.

2. There is a problem with the formatting of lines 132–151; please revise.

3. Figs. 7–9 are too large, and the font in the figures is much larger than the font in the article; please adjust accordingly.

Comments on the Quality of English Language

1. The first letter of each word in a full name must be capitalized, e.g., Multiscale Deformable Feature Pyramid Network (MDFPN). Please make the corresponding changes in lines 110–126.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this paper, the authors propose a lightweight transformer-based method for detecting arbitrarily oriented ships in SAR images, called LD-Det, which excels in promptly and accurately identifying rotated ship targets. In general, the work in this paper is an important application of synthetic aperture radar (SAR).

1. In Section 2.5, the loss function is presented. However, the reviewer wonders how the lambda parameters are determined.

2. A recommendation section should be added to discuss the limitations of the authors' method and directions for future work.
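
To illustrate what comment 1 is asking about, the following is a minimal sketch of a loss that combines two terms through a lambda weight. It is not the loss from Section 2.5 of the manuscript: the function name `total_loss`, the two terms `loss_cls` and `loss_reg`, and all numeric values are hypothetical and chosen only for illustration.

```python
def total_loss(loss_cls, loss_reg, lam=1.0):
    """Combine two loss terms; `lam` weights the second term.

    All names here are hypothetical placeholders, not the manuscript's notation.
    """
    return loss_cls + lam * loss_reg


# Sweep a few candidate weights to see how the balance between the terms shifts.
# The loss values 1.0 and 0.5 are made-up numbers for illustration only.
for lam in (0.5, 1.0, 2.0):
    print(f"lambda={lam}: total={total_loss(1.0, 0.5, lam)}")
```

In practice such weights are often selected by a small sweep like this on a validation set, which is the kind of justification the comment requests.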

Comments on the Quality of English Language

Fine

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The main contribution of the paper is the proposal of LD-Det, a lightweight transformer-based method for detecting arbitrarily oriented ships in SAR images. This method addresses the limitations of existing CNN-based detection frameworks by introducing the LightPVT as a lightweight backbone network and the MDFPN as a neck network, effectively capturing long-range dependencies and extracting ship features from SAR images. The SDHead further enhances ship feature extraction, resulting in state-of-the-art performance for detecting rotated ship targets compared to other lightweight methods. In general, the manuscript is well-structured and holds practical significance and application value. However, several minor issues need to be addressed before it can be published.

The introduction should be refined to clearly articulate the research problem and provide a strong justification for the proposed study. Explicitly stating the research gap and the significance of the study will enhance its impact. Furthermore, the transition between discussing existing SAR ship detection methods and introducing the proposed approach needs to be smoother, while eliminating repetition.

Transformers have already been applied to rotated vehicle detection, and it may be beneficial to expand the comparative experiments in this area. Furthermore, the authors introduce a lightweight ship detection method, but only one of the comparison methods (RTMDet) is lightweight, so the comparison results lack persuasiveness. Other lightweight methods, such as YOLO-based techniques, are not included in the comparison, which is a notable absence.

Line 289: Is the ratio of the training set to the test set 5:2?

Lines 294–296: The rationale behind setting different numbers of epochs for the two datasets should be explained. Additionally, it needs to be clarified whether the decision to halve the learning rate at specific epochs is deliberate or arbitrary.
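
For reference, a schedule that halves the learning rate at chosen epochs can be sketched as below. This is an assumption-laden illustration, not the authors' training configuration: the function name `lr_at_epoch`, the base learning rate, and the milestone epochs are all hypothetical.

```python
def lr_at_epoch(epoch, base_lr=1e-3, milestones=(8, 11), factor=0.5):
    """Return the learning rate for `epoch` under step decay.

    The rate is multiplied by `factor` once for every milestone that has
    been reached. All default values are hypothetical placeholders.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


# Print the schedule for a short hypothetical run of 12 epochs.
for epoch in range(12):
    print(epoch, lr_at_epoch(epoch))
```

Stating the milestone epochs and the reason they were chosen (for example, a plateau in validation loss) would answer the question raised above.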

The conclusion section highlights the plan for future research but does not offer a strong concluding statement or summarize the main findings of the study. A clearer wrap-up of the key contributions and implications of the research is needed to strengthen the conclusion.

Comments on the Quality of English Language

The language presentation is strong, but it could benefit from further refinement to enhance conciseness and ensure consistency throughout the text.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Dear Author:

I recommend the publication of this manuscript.

Best regards

Author Response

Dear Reviewer 4,

Thank you for reviewing this article.

Best regards,

Bingji Chen

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

This paper has been revised with clearer experiments, clearer and more detailed descriptions, and more reasonable English expressions. The authors have provided additional experiments and detailed descriptions to answer my questions well.

However, two minor problems remain in the details: the subfigure labels in Figures 10 and 11 are incorrect (not in the order (a), (b), (c), (d)).

Author Response

Dear Reviewer 1,

Thank you for pointing out the errors in the article. I have corrected the serial numbers of the subfigures in Figures 10 and 11.

Best regards,
Bingji Chen
