Hardware–Software Co-Design of an Audio Feature Extraction Pipeline for Machine Learning Applications
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper explores simplifications of MFCC audio features and derives a simplified version that can be used more easily in embedded applications. Additionally, the paper implements a hardware generator that produces an appropriate hardware pipeline for extracting the simplified audio features. The cited references are generally relevant to the research content and include some cutting-edge work, but recent literature is relatively scarce and the authors are encouraged to add more. No important references were excluded.
Comments on the Quality of English Language
The paper is readable, logically organized, and clearly expressed. A more detailed description of the paper's core scheme is recommended. The tables clearly reflect the experimental results. For the figures, it is recommended to add more detailed and prominent annotations, or to adjust the proportions so that the key regions showing the experimental results stand out.
Author Response
Dear reviewer,
Please find our responses in the attached pdf file.
Best regards,
Jure Vreča
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript is well written and organized; the proposed approach, while certainly not novel in its general form, is interesting for the specific case. Furthermore, the authors report convincing results that, in my opinion, support their proposal well. The manuscript is somewhat short and could be extended by better explaining certain choices that have been made and, more generally, by providing more details on the implementation.
Below are some additional issues that I noticed in the manuscript:
* Section 0 (Introduction) should be numbered Section 1.
* In Section 0, it is mentioned that audio is sampled at 16 kHz, but in Equation 1, 16383 samples are considered. The authors should explain the reason for these additional 383 samples.
* Subsection 1.1 is unnecessary, being the only subsection of Section 1, and should be removed.
* In Section 2.2, the authors mention that the FFT results are incremented by 1 to achieve better numerical stability. This choice needs to be motivated.
* Note 1 should be included in the text of the section, if the authors feel it is important to mention.
* Figure 6 is difficult to understand even after reading the explanatory text. To me, it creates unneeded complexity and should be changed or removed, unless there is a specific reason for the proposed graphical representation (in which case it should definitely be explained better).
Author Response
Dear reviewer,
Please find our responses in the attached pdf file.
Best regards,
Jure Vreča
Author Response File: Author Response.pdf