Evading Logits-Based Detections to Audio Adversarial Examples by Logits-Traction Attack
Round 1
Reviewer 1 Report
The paper is very well written. It raises important issues and presents a new solution concerning high-precision detection techniques for speech adversarial examples. The literature review is detailed and up to date. The methods are presented solidly. However, the algorithm could be described in more detail; for example, in "i := 1 to b; max(logits(i,j,:))" it is not indicated what the colon means. In addition, all leftover text from the journal template must be removed.
Author Response
Dear Reviewer:
Thank you so much for your careful check.
Regarding the algorithm's symbolic description, this resulted from our own assumptions; we did not realize that the colon would be ambiguous. In the revised paper, we have added an explanation of the notation, where "0:b" means traversing the values from 0 to b, and ":" means traversing all the values of the dimension in which the symbol resides.
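As an illustration only, the notation explained above can be sketched in plain Python. The tensor shape (batch, frames, classes), the sample values, and the helper name max_logit_per_frame are assumptions made for this sketch and are not taken from the paper. The sketch also assumes that "0:b" is half-open, as in Python slicing; the paper's convention may differ.

```python
# Hypothetical logits tensor of shape (b, t, c): batch, time frames, classes.
# The values are made up purely to illustrate the indexing notation.
logits = [
    [[0.1, 2.0, -1.0], [0.5, 0.2, 0.9]],
    [[1.5, 0.3, 0.0], [-0.2, -0.1, -0.7]],
]
b = len(logits)


def max_logit_per_frame(logits, b):
    """For i in 0:b and every frame j, compute max(logits(i, j, :)).

    The ':' in the last position means "all values along that dimension",
    so max() is taken over every class score of frame j in sample i.
    """
    return [[max(frame) for frame in logits[i]] for i in range(b)]


result = max_logit_per_frame(logits, b)
print(result)
```

Here range(b) plays the role of "0:b", and iterating over every entry of a frame plays the role of ":".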
Regarding the issue that the journal template was not thoroughly cleaned up, we checked the peer-reviewed version of the paper again. The "Publisher's Note" and "Copyright" information on the first page was not completely removed, and the DOI information in the footer of each page was not removed. We have deleted them from the newly submitted paper.
Thanks again for your valuable advice.
Sincerely
Author Response File: Author Response.docx
Reviewer 2 Report
The introduction states: "To this end, this paper designs a Logits-Traction attack." This aspect could be stressed more strongly in the paper, together with its advantages and disadvantages.
Please revise the description of the contributions: lines 92-108.
For example, at point 1 ... "To the best of our knowledge, since containing the decision of the neural network, the difference in logit distribution is extensive between normal speech and the traditional adversarial examples."
Please provide more details regarding Figure 3.
Please clarify line 237: "That means, in the features used to detect adversarial examples,..."
If possible, more examples could be given.
Please consider revising the style: ... This paper does...
Author Response
Dear Reviewer:
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
The authors address logits-based detection of audio adversarial examples by using the Logits-Traction attack.
These are my comments for the authors, which are needed to enhance the manuscript:
1. The main contribution is not clear. Please specify it.
2. The motivation of the approach is not clear.
3. The abstract and conclusion should be made more objective.
4. The discussion of related work on modeling and classification should be extended to cover the following works, which describe successful applications in various fields:
A Neural Architecture Search for Automated Multimodal Learning [Expert Systems with Applications 2022]; Benchmarking deep neural network approaches for Indian Sign Language recognition [Neural Computing and Applications 2021]
5. Please describe the training, the validation, and the testing.
6. Are these very good results caused by overfitting? Please discuss.
7. The quality of the figures requires improvement. Fig. 2 is not visible.
Author Response
Dear Reviewer:
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 3 Report
The manuscript may be accepted for publication.