ADSSD: Improved Single-Shot Detector with Attention Mechanism and Dilated Convolution
Round 1
Reviewer 1 Report
Single Shot MultiBox Detector (SSD) is one of the most suitable algorithms for small object detection. This paper proposes an effective detection method called Single Shot MultiBox Detector with Attention mechanism and Dilated convolution (ADSSD). The purpose of this method is to suppress background information and extract multi-scale context information.
Experimental results on the PASCAL VOC2007 and VOC2012 datasets show that the ADSSD model with 300×300 input reaches 78.4% mAP and 76.1% mAP, respectively. These results outperform SSD and other advanced detectors. At the same time, the performance of ADSSD is less affected than that of SSD by factors such as dense occlusion.
This paper is clearly presented and easy to follow.
Some issues need more explanation:
1. In line 245, 'we use AP and AR for evaluation': what does 'AR' mean here?
2. Why does ADSSD use a 300×300 input size and VGG16 as the backbone?
Is it because SSD also uses this setting?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper is well-organized and well-structured, but it lacks some details that readers need in order to understand it. I would suggest the authors address the following concerns in order to meet the publication requirements:
1. Your abstract does not highlight the specifics of your research or findings. Rewrite the Abstract section to be more meaningful. (The Problem, Aim, Methods, Results, and Conclusion.)
2. The Introduction section can be extended to discuss the issues in the existing work and how the proposed approach can be used to overcome them.
3. The problems addressed by this work are not clearly stated. The problem statement is ambiguous.
4. In the Literature review section, the authors should cite some of the latest research work. Also, the difference between the proposed model and the existing methods should be further described.
5. I feel that more explanation would be needed on how the proposed method is performed. The following papers are good examples:
https://doi.org/10.1038/s41598-021-90428-8
https://doi.org/10.1155/2022/5052435
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper proposes to improve SSD by incorporating attention mechanisms and dilated convolutions. The method is verified on VOC 2007 and VOC 2012, and the results show an improvement. Some issues should be addressed:
1. The novelty is limited. For example, introducing a residual layer into CBAM cannot be regarded as a major technical contribution.
2. What is the meaning of "small object detection algorithms" in the caption of Table 4? Average scores should be computed for Table 5 and Table 6.
3. Some relevant works on visual attention mechanisms should be included in Sec. 2.2, e.g., "Motion-attentive transition for zero-shot video object segmentation" and "Cascaded human-object interaction recognition".
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The paper is good enough and I suggest publishing it.
Reviewer 3 Report
The revision has addressed my concerns.