Instance Segmentation Method Based on Improved Mask R-CNN for the Stacked Electronic Components
Round 1
Reviewer 1 Report
The authors have addressed most of my review comments. However, I still want to suggest a few more points:
(1) Table 5 reports AP only for detection (bounding-box) accuracy. However, the main topic of this paper is instance segmentation. Therefore, the various APs for the mask should also be reported (AP, AP50, AP75,…).
(2) The results shown in Fig. 12 and Fig. 13 indicate that the model does not perform well enough on new images:
- Mis-detection (Fig. 12a, Fig. 12b, and Fig. 13a)
- The bounding box cannot cover the whole object in many cases.
- Poor mask result.
In my opinion, the authors need to conduct an in-depth investigation to figure out what is wrong with the model. Configuration and hyper-parameters are the main problems that cause this issue.
(3) The two images in the upper pair of Fig. 13 do not correspond to each other.
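To make point (1) concrete, the following is a minimal sketch of how a mask AP at a fixed IoU threshold (AP50, AP75) and an AP averaged over thresholds 0.50 to 0.95 could be computed for a single class. It is an illustration only, not the authors' evaluation code and not the official COCO evaluator: the function names (`mask_iou`, `mask_ap`, `mask_ap_summary`, `toy_example`) and the toy data are hypothetical, matching is VOC-style (each prediction is compared with its best-overlapping ground-truth mask), and AP is taken as the area under the monotone precision-recall envelope.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def mask_ap(preds, gts, iou_thr):
    """AP for one class at a single mask-IoU threshold.

    preds: list of (image_id, score, mask)
    gts:   dict mapping image_id -> list of ground-truth masks
    """
    n_gt = sum(len(masks) for masks in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(masks) for img, masks in gts.items()}
    tp = []
    # Visit predictions from highest to lowest confidence.
    for img, _, mask in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        # True positive only if the best ground truth is still unmatched.
        hit = best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]
        if hit:
            matched[img][best_j] = True
        tp.append(1.0 if hit else 0.0)
    cum_tp = np.cumsum(np.array(tp))
    recall = cum_tp / n_gt
    precision = cum_tp / (np.arange(len(tp)) + 1)
    # Make precision non-increasing in recall (monotone envelope).
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return float(ap)


def mask_ap_summary(preds, gts):
    """AP50, AP75, and AP averaged over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return {
        "AP": float(np.mean([mask_ap(preds, gts, t) for t in thresholds])),
        "AP50": mask_ap(preds, gts, 0.5),
        "AP75": mask_ap(preds, gts, 0.75),
    }


def toy_example():
    """Hypothetical data: one 8x8 image, two objects, three predictions."""
    def box(r0, r1, c0, c1):
        m = np.zeros((8, 8), dtype=bool)
        m[r0:r1, c0:c1] = True
        return m
    gts = {0: [box(0, 4, 0, 4), box(4, 8, 4, 8)]}
    preds = [
        (0, 0.9, box(0, 4, 0, 4)),  # exact match with the first object
        (0, 0.8, box(4, 8, 4, 6)),  # covers half of the second object
        (0, 0.7, box(0, 2, 6, 8)),  # overlaps no object
    ]
    return preds, gts


if __name__ == "__main__":
    print(mask_ap_summary(*toy_example()))
```

In the toy data the second prediction overlaps its object with a mask IoU of exactly 0.5, so it counts as correct for AP50 but not for AP75. This is the kind of difference that box-only AP in Table 5 cannot reveal about mask quality.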
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The paper was nicely updated.
Technical and scientific contributions are underlined better.
Author Response
Point 1: The paper was nicely updated. Technical and scientific contributions are underlined better.
Response 1: Thank you for your approval. We have revised the conclusion and emphasized the technical and scientific contributions (Page 16, Lines 404-408).
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have addressed most of the comments I made. I think the manuscript is now acceptable for publication.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
The authors present an instance segmentation method based on an improved Mask R-CNN network. The new network is used for recognizing stacked electronic components. The purpose of this work is to achieve fast and accurate autonomous detection of stacked electronic components. Although the purpose of this research and the proposed method look interesting and promising, the results shown in this paper fail to demonstrate enough of a contribution to be considered for publication.
These are my main concerns:
It is hard to accept that the authors' hypothesis is applicable. In Lines 244-252, the authors claim that large and complex networks are difficult to apply in practice, especially on embedded devices with small memory. This is their reason for combining MobileNets with Mask R-CNN to achieve a balance between accuracy and speed. Table 2 (page 13) compares the detection speed and the model size. I agree that this combination yields a smaller network architecture with fewer parameters and, of course, a smaller model than the original. However, even though the proposed method is twice as fast per image as the original Mask R-CNN under the same circumstances, 1.8 seconds is too long for real-time operation. Another problem is that the original Mask R-CNN can run at 5 fps (0.2 seconds per image) on a typical hardware configuration, which is nine times faster than the results reported in the paper. Therefore, it is hard to consider this assumption acceptable.
If real-time implementation is sought, I recommend that the authors reproduce their work on a typical hardware configuration rather than focusing on low-end hardware (or embedded devices), as they do in this version.
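The speed comparison above can be checked with a few lines of arithmetic. This is only a sketch using the two per-image times quoted in this report (1.8 s for the proposed method, 0.2 s for the original Mask R-CNN on typical hardware); the function and variable names are illustrative.

```python
def frames_per_second(seconds_per_image):
    """Convert a per-image test time into throughput in frames per second."""
    return 1.0 / seconds_per_image


# Per-image times quoted in the review text above.
proposed_s = 1.8   # proposed method, from Table 2 of the manuscript
reference_s = 0.2  # original Mask R-CNN on typical hardware (5 fps)

proposed_fps = frames_per_second(proposed_s)
reference_fps = frames_per_second(reference_s)
speed_ratio = proposed_s / reference_s  # how many times faster the reference is

if __name__ == "__main__":
    print(f"proposed:  {proposed_fps:.2f} fps")
    print(f"reference: {reference_fps:.2f} fps")
    print(f"reference is {speed_ratio:.1f}x faster")
```

At 1.8 s per image the proposed method runs at well under one frame per second, which is the basis for the concern about real-time use.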
The results in this manuscript are not strictly superior to those of the original Mask R-CNN. In Lines 351-352, the authors mention that their improved method reduces the mAP by about 4% compared with the original Mask R-CNN method (refer to Table 3 on page 13). The authors present the contribution of this research as an improved method; therefore, its mAP should be higher than, or at least equal to, that of the original Mask R-CNN network. This is not the case.
Comments for author File: Comments.pdf
Reviewer 2 Report
In the section where the dataset is introduced and the VIA annotation tool is mentioned, please describe which labels were used in the annotation process. Are the labels the same as those in Table 1?
A description of RPN, Mask R-CNN, MobileNets, and the improved Mask R-CNN is given. It should not be too detailed, because these methods are well known and are not a contribution of the current paper. Instead, the authors should explain whether they have modified any of these architectures for their particular task.
The scientific contribution is shallow. The authors should emphasize their original contribution, which in my opinion is just the introduction of an annotated dataset of electronic components. This can be a contribution in itself, but a more detailed study of the behavior of object recognition methods on this dataset should be included.