SGooTY: A Scheme Combining the GoogLeNet-Tiny and YOLOv5-CBAM Models for Nüshu Recognition
Round 1
Reviewer 1 Report
This manuscript proposes a publicly available Chinese Nüshu character dataset and leverages several deep neural networks with architectural modifications to perform the image classification task. Overall, the paper is well written. The authors conducted a thorough literature review and presented the networks in detail. Only minor revisions are needed:
1. There are minor formatting issues. For example: in line 51, "datasets,In this paper," there should be a space before "In" and "In" should be lowercased; in line 325, "cross-entropy[41](BCE)", there should be a space before "(BCE)".
2. I suggest adding a short paragraph at the end of the Introduction section that briefly outlines how the article is organized, so that readers can follow it more easily.
The quality of the English language is good; only minor formatting issues need to be fixed.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors present a method for recognizing handwritten characters. They developed a two-stage recognition system: GoogLeNet-tiny is applied in the first stage, and YOLOv5-CBAM in the second.
* Comments on unclear points in the manuscript:
1. I have a question: the authors propose a two-stage recognition method, but it is unclear how the first-stage model determines which characters (letters) were falsely recognized and passes them on to the second-stage model. Please describe this mechanism in more detail.
2. Generally, a common approach is to apply an ensemble method that combines several weak models. Why did the authors develop a two-stage model instead?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
1. Section 4.4.2 (Parameter Setting) emphasizes the importance of the initial configuration. However, it does not explain why the Discrete Staircase Method was adopted for training the deep learning model, even though multiple optimization algorithms exist for this purpose.
2. The paper claims that adding a BN (batch normalization) layer to the GoogLeNet-tiny model significantly improves its training speed. However, no direct comparison between the GoogLeNet-tiny model and the GoogLeNet model is provided; the absence of an evaluation of the performance differences and the training-speed improvement between the two models is a limitation.
3. In the experimental results, various graphs are presented that clearly demonstrate the training outcomes of YOLOv5. For the baseline models (AlexNet-BN, AlexNet-SC, AlexNet-LR, and GoogLeNet-tiny), however, only validation-accuracy and training-loss graphs are provided. To enhance the explanatory power of the paper, it would be beneficial to include graphs similar to those for YOLOv5 training for these baseline models as well, providing a clear visualization of the results.
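Regarding point 1 above: the Discrete Staircase Method referred to appears to be a step-decay ("staircase") learning-rate schedule, in which the rate is held constant within each interval and dropped by a fixed factor at interval boundaries. A minimal sketch follows; all parameter values are illustrative assumptions, not taken from the manuscript:

```python
def staircase_lr(base_lr, epoch, decay_rate=0.5, decay_every=10):
    """Step-decay ("staircase") learning-rate schedule: the rate is
    multiplied by decay_rate once every decay_every epochs and held
    constant in between. All values here are illustrative."""
    return base_lr * decay_rate ** (epoch // decay_every)

# The rate stays flat within each 10-epoch step, then halves:
# epochs 0-9 -> 0.1, epochs 10-19 -> 0.05, epochs 20-29 -> 0.025
```

Compared with a smooth exponential decay, the staircase variant keeps the learning rate stable within each stage, which can make training curves easier to interpret.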
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
1. In Lines 237-242, what is the meaning of the threshold? Please explain the role of the threshold in the training process and in the test process separately, with mathematical expressions.
2. For example, in the test process, if the predicted label's score falls below the threshold value, is the input image then passed to YOLOv5?
3. In Lines 237-242, the first stage's threshold value is not presented. The authors must state the exact threshold value that was applied.
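For concreteness, the thresholded hand-off the reviewer is asking about can be sketched as a confidence-gated two-stage cascade. This is a minimal illustration, not the authors' implementation; the model callables and the threshold value `tau` are hypothetical:

```python
def cascade_predict(image, stage1, stage2, tau=0.9):
    """Two-stage inference: keep the stage-1 prediction only when its
    confidence reaches the threshold tau; otherwise defer the image to
    the stage-2 model. tau = 0.9 is a placeholder, not the paper's value."""
    label, confidence = stage1(image)   # e.g. a GoogLeNet-tiny classifier
    if confidence >= tau:
        return label                    # confident: accept stage-1 result
    return stage2(image)                # uncertain: re-recognize with stage 2
```

In a cascade of this shape, the threshold typically plays no role in training the individual models; it only routes images at test time, which is exactly the ambiguity the reviewer asks the authors to formalize.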
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
I think the revised manuscript includes all of the comments raised by the reviewer. I recommend it for publication.
Author Response
We gratefully appreciate your valuable suggestions.