Review
Peer-Review Record

Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review

Appl. Sci. 2022, 12(9), 4419; https://doi.org/10.3390/app12094419
by Vivek Bhardwaj 1, Mohamed Tahar Ben Othman 2,*, Vinay Kukreja 3,*, Youcef Belkhier 4, Mohit Bajaj 5, B. Srikanth Goud 6, Ateeq Ur Rehman 7,8, Muhammad Shafiq 9,* and Habib Hamam 8,10,11,12
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 March 2022 / Revised: 21 April 2022 / Accepted: 23 April 2022 / Published: 27 April 2022
(This article belongs to the Special Issue Automatic Speech Recognition)

Round 1

Reviewer 1 Report

The paper is an interesting and useful review of a very important and timely topic. I’m sure the paper deserves publication in the journal once some minor aspects are fixed.

Avoid capital letters in chapter and subchapter titles.

The introductory chapter should not be divided into sections.

In Figure 2, why are the years reported in the opposite order to what is normally expected? I would expect to see the years ordered from oldest to most recent, not the reverse.

The sentence in lines 137 and 138 is not very clear; it would be useful to better explain what you mean by the difference between treating speech as a signal and as natural language.

The sentence in lines 612-614 is grammatically incorrect; a verb or some other element may be missing.

In subsections 2.6.2 and 2.6.3, it would be useful to specify more clearly which references report the results you discuss: two references are cited, but it is not clear whether they are general references on the topic or the ones that obtained those results.

Sections 4.5 and 4.7 contain no references, unlike the other sections of chapter 4, which have very detailed reference tables.

Almost all sections are very well referenced, except for the ones mentioned. It is a very complete and systematic review, with very precise methodologies for grading the various references. Every chosen reference is then analyzed in depth. It is useful also because it contains many now available.

Figure 1 depicts the various components of the field of speech recognition well, and it is useful for fixing the important points.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This review aims to throw light on the trends of research in children's speech recognition and to analyze the potential of trending techniques to recognize children's speech. This paper is well organized. Some comments are as follows:

[1] Figs. 2, 4, 5 and 7 need to be scaled down so that they do not exceed the width of the text.

[2] The legend explaining the annotations in Figure 3 should be placed inside the pie chart.

[3] Figure 6 needs to be described more clearly in its caption.

[4] Classifiers are common tools in automatic speech recognition, so some robust classification methods should be introduced for better recognition, such as the following:

(1) Classification with Noisy Labels by Importance Reweighting[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(3):447-461.

(2) Granular Ball Sampling for Noisy Label Classification or Imbalanced Classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021.

(3) mCRF and mRD: Two Classification Methods based on a Novel Multiclass Label Noise Filtering Learning Framework [J]. IEEE Transactions on Neural Networks and Learning Systems.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This study presents a systematic literature review on automatic speech recognition (ASR) systems for children’s speech.

Automatic Speech Recognition (ASR) is one of the ways of transforming acoustic speech signals into text. Over the last few decades, an enormous amount of research has been done in the area of ASR. However, over the last few years, most studies have focused on building ASR systems based on adult speech. Recognition of children’s speech was neglected for a long time, which leaves the research area of children’s ASR wide open. Children’s ASR is a challenging task due to the large variations in children’s articulatory, acoustic, physical, and linguistic characteristics as compared to adults. Thus, it has become a very attractive area of research, and it is important to understand where the main center of attention lies: the most widely used methods for extracting acoustic features, the various acoustic models, the speech datasets, the SR toolkits used during the recognition process, and so on. ASR systems or interfaces are extensively used and integrated into various real-life applications such as search engines, the healthcare industry, biometric analysis, car systems, the military, support for people with disabilities, and mobile devices. A systematic literature review (SLR) is presented in this work by extracting the relevant information from 76 research papers published from 2009 to 2020 in the field of ASR for children. The objective of this review is to throw light on the trends of research in children’s speech recognition and to analyze the potential of trending techniques to recognize children’s speech.

My comments:

1) Include block diagrams of major ASR systems, specifically for children’s speech.

2) The latest techniques in ASR are missing (such as end-to-end systems, wav2vec, and Deep Speech).

3) Include a table of abbreviations.

4) Recent references are missing, for example: “A Formant Modification Method for Improved ASR of Children’s Speech”, Speech Communication, Vol. 136, pp. 98-106, January 2022.

5) Some articles published in this journal (Applied Sciences) are missing, for example: “Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments”, Applied Sciences, Vol. 11, No. 18, pp. 8420, 2021.

6) There are many mistakes in the reference list; for example, see reference 16. Please check them carefully.

7) Write the captions of figures and tables clearly, so that they are self-contained.

8) Summarize the most important papers in a table listing the database, ASR model, performance, and any observations. This is very important for readers.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

This study presents a systematic literature review on automatic speech recognition (ASR) systems for children’s speech.

One recent article is missing from the references: “Spectral modification for recognition of children’s speech under mismatched conditions”, in Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 94-100, May-June 2021.

Please add this reference at the location where formant frequencies/spectral characteristics are described. 

I feel that the authors have adequately addressed all of the comments I raised in the previous round of review, and this manuscript is now acceptable for publication with minor revision.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
