Automatic Assessment of Prosodic Quality in Down Syndrome: Analysis of the Impact of Speaker Heterogeneity †
Abstract
1. Introduction
2. Methodology
2.1. Game Description
2.2. Corpus Description
2.3. Corpus Evaluation
2.3.1. Evaluation Criteria
- Intonation: fit to the expected sentence modality. If the target sentence is interrogative and the speaker manages to reproduce the intonation of a question, the utterance is labeled as correct; if, for instance, the target belongs to the set of exclamatory phrases and the speaker fails to produce an exclamatory intonation (within the range of possible intonation contours), the utterance is labeled as incorrect.
- Accent: preservation of the difference between lexical stress (stressed versus unstressed syllables) and accent (accented versus unaccented syllables). This distinction can be lost in three ways: (a) tonal prominence appears on every syllable, creating an undesired rhythmic effect; (b) the speaker does not discriminate between stressed and unstressed syllables, as shown by the absence of variation in any of the acoustic parameters of intensity, duration, and pitch (see the sketch after this list); and (c) tonal prominence does vary, but the prominence is assigned to the wrong syllables.
- Phrasing: conformity to the organization into prosodic groups and to the distinction between function and content words. The sentence is labeled as incorrect if every word is pronounced as if it stood in isolation, with no distinction between stressed and unstressed words, or if pauses are placed inappropriately within the speech chain.
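These criteria were applied perceptually by human evaluators; the paper does not automate them at the syllable level. Purely as an illustration, the sketch below encodes criterion (b) of the accent item as a toy heuristic: given hypothetical per-syllable measurements of intensity, duration, and mean F0, an utterance is flagged when stressed and unstressed syllables are indistinguishable on all three parameters. The data layout, field names, and threshold are assumptions, not the paper's procedure.

```python
import numpy as np

def accent_criterion_b(syllables, rel_threshold=0.10):
    """Toy check for accent criterion (b): return True (suspect) when stressed and
    unstressed syllables differ by less than `rel_threshold` (relative difference)
    on *all* of intensity, duration and F0. `syllables` is a list of dicts with
    hypothetical keys: 'stressed' (bool), 'intensity', 'duration', 'f0'."""
    stressed = [s for s in syllables if s["stressed"]]
    unstressed = [s for s in syllables if not s["stressed"]]
    if not stressed or not unstressed:
        return False  # nothing to compare

    suspect = True
    for param in ("intensity", "duration", "f0"):
        m_s = np.mean([s[param] for s in stressed])
        m_u = np.mean([s[param] for s in unstressed])
        rel_diff = abs(m_s - m_u) / max(abs(m_u), 1e-9)  # relative stressed/unstressed gap
        if rel_diff >= rel_threshold:
            suspect = False  # at least one parameter marks the stress contrast
    return suspect

# Hypothetical utterance in which stressed syllables barely differ from unstressed ones
utt = [
    {"stressed": True,  "intensity": 70.1, "duration": 0.21, "f0": 182.0},
    {"stressed": False, "intensity": 69.8, "duration": 0.20, "f0": 180.5},
    {"stressed": False, "intensity": 70.0, "duration": 0.21, "f0": 181.0},
]
print(accent_criterion_b(utt))  # True -> no acoustic stress contrast
```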
2.3.2. Therapist Evaluations
2.3.3. Expert Judgments
2.4. Feature Extraction and Selection
2.5. Automatic Classification
3. Results
3.1. Classification Results
3.2. Speaker Variability Results
4. Discussion
4.1. Analysis of the Classification Results
4.2. Impact of Variability on Assessment
4.3. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Description of the Features
Feature | Description |
---|---|
F0_mean (F0semitoneFrom27.5Hz_sma3nz_amean) | Mean of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz |
F0_stddevNorm (F0semitoneFrom27.5Hz_sma3nz_stddevNorm) | Coefficient of variation of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz |
F0_percentile20 (F0semitoneFrom27.5Hz_sma3nz_percentile20.0) | 20th percentile of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz
F0_percentile50 (F0semitoneFrom27.5Hz_sma3nz_percentile50.0) | 50th percentile of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz
F0_percentile80 (F0semitoneFrom27.5Hz_sma3nz_percentile80.0) | 80th percentile of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz
F0_pctlrange (F0semitoneFrom27.5Hz_sma3nz_pctlrange0-2) | Range from the 20th to the 80th percentile of logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz
F0_meanRisingSlope (F0semitoneFrom27.5Hz_sma3nz_meanRisingSlope) | Mean of the slope of rising signal parts of F0 |
F0_stddevRisingSlope (F0semitoneFrom27.5Hz_sma3nz_stddevRisingSlope) | Standard deviation of the slope of rising signal parts of F0 |
F0_meanFallingSlope (F0semitoneFrom27.5Hz_sma3nz_meanFallingSlope) | Mean of the slope of falling signal parts of F0 |
F0_stddevFallingSlope (F0semitoneFrom27.5Hz_sma3nz_stddevFallingSlope) | Standard deviation of the slope of falling signal parts of F0 |
jitter_mean (jitterLocal_sma3nz_amean) | Mean of the deviations in individual consecutive F0 period lengths |
jitter_stddevNorm (jitterLocal_sma3nz_stddevNorm) | Coefficient of variation of the deviations in individual consecutive F0 period lengths |
Feature | Description |
---|---|
loudness_mean (loudness_sma3_amean) | Mean of estimate of perceived signal intensity from an auditory spectrum |
loudness_stddevNorm (loudness_sma3_stddevNorm) | Coefficient of variation of estimate of perceived signal intensity from an auditory spectrum |
loudness_percentile20 (loudness_sma3_percentile20.0) | 20th percentile of the estimate of perceived signal intensity from an auditory spectrum
loudness_percentile50 (loudness_sma3_percentile50.0) | 50th percentile of the estimate of perceived signal intensity from an auditory spectrum
loudness_percentile80 (loudness_sma3_percentile80.0) | 80th percentile of the estimate of perceived signal intensity from an auditory spectrum
loudness_pctlrange02 (loudness_sma3_pctlrange0-2) | Range from the 20th to the 80th percentile of the estimate of perceived signal intensity from an auditory spectrum
loudness_meanRisingSlope (loudness_sma3_meanRisingSlope) | Mean of the slope of rising signal parts of loudness |
loudness_stddevRisingSlope (loudness_sma3_stddevRisingSlope) | Standard deviation of the slope of rising signal parts of loudness |
loudness_meanFallingSlope (loudness_sma3_meanFallingSlope) | Mean of the slope of falling signal parts of loudness |
loudness_stddevFallingSlope (loudness_sma3_stddevFallingSlope) | Standard deviation of the slope of falling signal parts of loudness |
shimmer_mean (shimmerLocaldB_sma3nz_amean) | Mean of difference of the peak amplitudes of consecutive F0 periods |
shimmer_stddevNorm (shimmerLocaldB_sma3nz_stddevNorm) | Coefficient of variation of difference of the peak amplitudes of consecutive F0 periods |
Feature | Description |
---|---|
silencePercentage | Percentage of the total duration occupied by unvoiced regions (silences)
silencesMean | Mean duration of unvoiced regions (silences)
silencesPerSecond | Number of silences per second
soundingPercentage | Percentage of the total duration occupied by voiced regions
loudnessPeaksPerSec | Number of loudness peaks per second
VoicedSegmentsPerSec | Number of continuous voiced regions per second
MeanVoicedSegmentLengthSec | Mean length of continuously voiced regions (s)
StddevVoicedSegmentLengthSec | Standard deviation of the length of continuously voiced regions (s)
MeanUnvoicedSegmentLength | Mean length of unvoiced regions
StddevUnvoicedSegmentLength | Standard deviation of the length of unvoiced regions
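The F0, jitter, loudness, and shimmer functionals above follow the GeMAPS naming used by the openSMILE extractor, while the duration features are computed from the voiced/unvoiced segmentation. As a hedged illustration, the sketch below obtains comparable functionals with the opensmile Python wrapper; the wrapper, the eGeMAPSv02 set, and the file name utterance.wav are assumptions, since the paper reports using the openSMILE toolkit itself.

```python
# Sketch: extracting eGeMAPS-style functionals with the opensmile Python wrapper.
# Assumptions: the `opensmile` package is installed and `utterance.wav` is a
# recorded utterance; the paper used the openSMILE toolkit with the GeMAPS set.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # superset of the functionals listed above
    feature_level=opensmile.FeatureLevel.Functionals,  # one feature vector per utterance
)

features = smile.process_file("utterance.wav")  # pandas DataFrame, one row per file

# Keep only columns comparable to this appendix, e.g. the F0 and jitter functionals
cols = [c for c in features.columns if c.startswith(("F0semitone", "jitterLocal"))]
print(features[cols].T)
```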
References
Speaker | #Utterances | Cont. R (Therapist, Real Time) | Cont. (Therapist, Real Time) | Rep. (Therapist, Real Time) | Right (Expert, Offline) | Wrong (Expert, Offline) | Corpus
---|---|---|---|---|---|---|---
S01 | 120 | 70 | 33 | 17 | 87 | 33 | C1 |
S02 | 106 | 90 | 16 | 0 | 81 | 25 | C1 |
S03 | 97 | 93 | 3 | 1 | 78 | 19 | C1 |
S04 | 131 | 19 | 51 | 61 | 75 | 56 | C1 |
S05 | 151 | 21 | 54 | 76 | 77 | 74 | C1 |
S06 | 30 | x | x | x | 19 | 11 | C2 |
S07 | 34 | x | x | x | 13 | 21 | C2 |
S08 | 28 | x | x | x | 23 | 5 | C2 |
S09 | 43 | x | x | x | 20 | 23 | C2 |
S10 | 33 | x | x | x | 29 | 4 | C2 |
S11 | 57 | x | x | x | 31 | 26 | C3 |
S12 | 12 | x | x | x | 7 | 5 | C3 |
S13 | 7 | x | x | x | 2 | 5 | C3 |
S14 | 11 | x | x | x | 3 | 8 | C3 |
S15 | 33 | x | x | x | 19 | 14 | C3 |
S16 | 10 | x | x | x | 6 | 4 | C3 |
S17 | 8 | x | x | x | 5 | 3 | C3 |
S18 | 11 | x | x | x | 6 | 5 | C3 |
S19 | 10 | x | x | x | 6 | 4 | C3 |
S20 | 10 | x | x | x | 6 | 4 | C3 |
S21 | 9 | x | x | x | 1 | 8 | C3 |
S22 | 7 | x | x | x | 3 | 4 | C3 |
S23 | 8 | x | x | x | 3 | 5 | C3 |
Total | 966 | 293 | 157 | 155 | 465 | 302 |
Speaker | Gender | CA | VA | STVM | NVCL | MPercT | MProdT |
---|---|---|---|---|---|---|---|
S01 | f | 195 | 84 | 94 | 17 | 69.8% | 48.3% |
S02 | m | 204 | 99 | 134 | 18 | 76% | 72.1% |
S03 | f | 178 | 96 | 78 | 20 | 74% | 74.7% |
S04 | m | 190 | 60 | below 74 | 10 | 60.4% | 49.8% |
S05 | m | 223 | 69 | below 74 | 13 | 56.3% | 45.7% |
Case | Corpora | BL | DT: CR | DT: AUC | DT: UAR | SVM: CR | SVM: AUC | SVM: UAR | MLP: CR | MLP: AUC | MLP: UAR | #Utt. | #Feat. | #SPK
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Case A | C1 | 65.8% | 69.6% | 0.68 | 0.74 | 78.5% | 0.74 | 0.83 | 73.2% | 0.7 | 0.79 | 605 | 21 | 5 |
Case B | C2 | 61.9% | 60.3% | 0.58 | 0.61 | 72.7% | 0.7 | 0.79 | 68.5% | 0.67 | 0.73 | 168 | 16 | 5 |
Case C | C3 | 50.8% | 65.8% | 0.66 | 0.66 | 61.6% | 0.62 | 0.69 | 63.7% | 0.64 | 0.64 | 193 | 7 | 13 |
Case D | C1+C2 | 64.9% | 70.8% | 0.68 | 0.75 | 79.3% | 0.76 | 0.83 | 72.6% | 0.7 | 0.78 | 773 | 21 | 10 |
Case E | C1+C3 | 62.2% | 66.3% | 0.65 | 0.69 | 72.3% | 0.7 | 0.79 | 67.2% | 0.65 | 0.74 | 798 | 20 | 18 |
Case F | C2+C3 | 56% | 60.9% | 0.6 | 0.64 | 66.5% | 0.66 | 0.75 | 64% | 0.63 | 0.69 | 361 | 13 | 18 |
Case G | C1+C2+C3 | 62.1% | 66.9% | 0.66 | 0.71 | 74.3% | 0.71 | 0.81 | 69.4% | 0.66 | 0.76 | 996 | 20 | 23 |
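For reference, the sketch below shows how the evaluation metrics in the table can be computed, assuming CR is the overall classification rate (accuracy), UAR the unweighted average recall over the Right/Wrong classes, and AUC the area under the ROC curve. scikit-learn is used purely for illustration and is not the toolkit reported in the paper; the labels and scores are made up.

```python
# Sketch: computing CR (accuracy), UAR (unweighted average recall) and AUC
# for a binary Right/Wrong prosodic-quality classifier. Illustration only.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

y_true  = np.array([1, 1, 0, 1, 0, 0, 1, 0])      # 1 = Right, 0 = Wrong (hypothetical)
y_pred  = np.array([1, 1, 0, 0, 0, 1, 1, 0])      # hard decisions of the classifier
y_score = np.array([0.9, 0.8, 0.2, 0.4, 0.3, 0.6, 0.7, 0.1])  # scores for class "Right"

cr  = accuracy_score(y_true, y_pred)                 # classification rate
uar = recall_score(y_true, y_pred, average="macro")  # mean of per-class recalls
auc = roc_auc_score(y_true, y_score)                 # area under the ROC curve

print(f"CR={cr:.3f}  UAR={uar:.3f}  AUC={auc:.3f}")
```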
Feature | S01 | S02 | S03 | S04 | S05 | All |
---|---|---|---|---|---|---|
silencesMean | 4 (0.675) | 6 (0.638) | 3 (0.673) | 2 (0.744) | 2 (0.662) | 1 (0.692) |
silencesPerSecond | 10 (0.6) | 1 (0.754) | 2 (0.696) | 3 (0.725) | 11 (0.581) | 2 (0.683) |
jitterLocal_sma3nz_amean | 1 (0.688) | 16 (0.534) | 5 (0.618) | 8 (0.683) | 22 (0.515) | 3 (0.65) |
F0semitoneFrom27.5Hz_sma3nz_stddevNorm | 7 (0.646) | 7 (0.633) | 22 (0.506) | 4 (0.712) | 17 (0.559) | 4 (0.647) |
jitterLocal_sma3nz_stddevNorm | 2 (0.683) | 2 (0.681) | 18 (0.524) | 6 (0.689) | 9 (0.592) | 5 (0.631) |
F0semitoneFrom27.5Hz_sma3nz_stddevRisingSlope | 5 (0.662) | 12 (0.578) | 7 (0.601) | 9 (0.66) | 3 (0.651) | 6 (0.629) |
F0semitoneFrom27.5Hz_sma3nz_percentile80.0 | 17 (0.572) | 3 (0.67) | 4 (0.652) | 16 (0.548) | 1 (0.684) | 7 (0.628) |
F0semitoneFrom27.5Hz_sma3nz_pctlrange0-2 | 11 (0.598) | 14 (0.559) | 15 (0.545) | 7 (0.689) | 18 (0.544) | 8 (0.626)
F0semitoneFrom27.5Hz_sma3nz_stddevFallingSlope | 3 (0.679) | 8 (0.6) | 8 (0.6) | 14 (0.595) | 10 (0.588) | 9 (0.625) |
StddevVoicedSegmentLengthSec | 22 (0.506) | 4 (0.66) | 16 (0.535) | 1 (0.762) | 16 (0.561) | 10 (0.601) |
loudnessPeaksPerSec | 12 (0.595) | 13 (0.563) | 12 (0.558) | 17 (0.533) | 8 (0.603) | 11 (0.586) |
shimmerLocaldB_sma3nz_stddevNorm | 9 (0.601) | 5 (0.642) | 11 (0.58) | 13 (0.598) | 14 (0.563) | 12 (0.583) |
shimmerLocaldB_sma3nz_amean | 6 (0.651) | 18 (0.523) | 1 (0.698) | 20 (0.532) | 21 (0.528) | 13 (0.583) |
loudness_sma3_stddevNorm | 18 (0.569) | 10 (0.585) | 14 (0.548) | 11 (0.615) | 20 (0.53) | 14 (0.579) |
MeanUnvoicedSegmentLength | 8 (0.617) | 9 (0.586) | 20 (0.518) | 10 (0.628) | 19 (0.542) | 15 (0.557) |
StddevUnvoicedSegmentLength | 15 (0.579) | 11 (0.581) | 21 (0.511) | 12 (0.613) | 13 (0.575) | 16 (0.555) |
loudness_sma3_meanFallingSlope | 13 (0.59) | 15 (0.549) | 17 (0.524) | 15 (0.578) | 7 (0.609) | 17 (0.545) |
VoicedSegmentsPerSec | 16 (0.573) | 19 (0.517) | 9 (0.599) | 21 (0.519) | 12 (0.577) | 18 (0.521) |
loudness_sma3_stddevFallingSlope | 19 (0.528) | 21 (0.513) | 13 (0.555) | 18 (0.532) | 5 (0.624) | 19 (0.519) |
loudness_sma3_pctlrange0-2 | 20 (0.512) | 22 (0.504) | 10 (0.586) | 5 (0.696) | 4 (0.63) | 20 (0.514)
MeanVoicedSegmentLengthSec | 14 (0.582) | 20 (0.514) | 6 (0.617) | 22 (0.503) | 15 (0.562) | 21 (0.514) |
loudness_sma3_stddevRisingSlope | 21 (0.509) | 17 (0.523) | 19 (0.52) | 19 (0.532) | 6 (0.612) | 22 (0.501) |
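The table gives, per speaker and for all speakers pooled, the rank of each feature together with a score in parentheses. A plausible but unconfirmed reading is that the score measures how well the single feature separates Right from Wrong utterances (for instance, a one-feature AUC). Under that assumption, a ranking of this kind could be produced as in the sketch below; the data layout, labels, and scoring are hypothetical.

```python
# Sketch: ranking acoustic features by how well each one alone separates
# Right from Wrong utterances for one speaker. Single-feature AUC is an
# assumed score here; the data frame layout is hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def rank_features(df, feature_cols, label_col="label"):
    """Return (rank, feature, score) tuples sorted by single-feature AUC,
    symmetrised around 0.5 so the direction of the feature does not matter."""
    scores = {}
    for col in feature_cols:
        auc = roc_auc_score(df[label_col], df[col])
        scores[col] = max(auc, 1.0 - auc)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank + 1, feat, round(score, 3)) for rank, (feat, score) in enumerate(ranked)]

# Hypothetical per-utterance data for one speaker
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": np.repeat([1, 0], 20),  # 1 = Right, 0 = Wrong
    "silencesMean": np.concatenate([rng.normal(0.35, 0.08, 20), rng.normal(0.55, 0.08, 20)]),
    "jitterLocal_sma3nz_amean": rng.normal(0.03, 0.01, 40),
})
for rank, feat, score in rank_features(df, ["silencesMean", "jitterLocal_sma3nz_amean"]):
    print(rank, feat, score)
```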
Speaker | #Total Utt. | Expert Judgment | #Utt. | Classified as Right | Classified as Wrong | Therapist: Cont. R | Therapist: Cont. | Therapist: Rep.
---|---|---|---|---|---|---|---|---
S01 | 120 | R | 87 | 83.9% | 16.1% | 69% | 24.1% | 6.9%
S01 | 120 | W | 33 | 57.6% | 42.4% | 30.3% | 36.7% | 33.3%
S02 | 106 | R | 81 | 87.7% | 12.4% | 85.2% | 14.8% | 0.0%
S02 | 106 | W | 25 | 28.0% | 72.0% | 84.0% | 16.0% | 0.0%
S03 | 97 | R | 78 | 97.4% | 2.6% | 94.9% | 3.9% | 1.3%
S03 | 97 | W | 19 | 73.7% | 26.3% | 100.0% | 0.0% | 0.0%
S04 | 131 | R | 75 | 94.6% | 5.3% | 21.3% | 44.0% | 34.7%
S04 | 131 | W | 56 | 41.1% | 58.9% | 5.4% | 32.1% | 62.5%
S05 | 151 | R | 77 | 87% | 13% | 20.8% | 50.7% | 28.6%
S05 | 151 | W | 74 | 29.7% | 70.3% | 6.8% | 20.3% | 73%
Total | 605 | R | 398 | 89.9% | 10.1% | 80.2% | 68.8% | 35.5%
Total | 605 | W | 207 | 41.1% | 58.9% | 19.8% | 31.2% | 64.5%
 | CA | VA | NVCL | MPercT | MProdT | RRate | ContRRate | ContRate | RepRate | CR
---|---|---|---|---|---|---|---|---|---|---
CA | 1.0 | −0.29 | −0.36 | −0.53 | −0.49 | −0.64 | −0.52 | 0.53 | 0.5 | −0.16
VA | | 1.0 | 0.96 | 0.93 | 0.84 | 0.91 | 0.97 | −0.91 | −0.96 | 0.42
NVCL | | | 1.0 | 0.87 | 0.76 | 0.9 | 0.95 | −0.92 | −0.93 | 0.29
MPercT | | | | 1.0 | 0.83 | 0.98 | 0.96 | −0.86 | −0.99 | 0.36
MProdT | | | | | 1.0 | 0.81 | 0.89 | −0.93 | −0.83 | 0.80
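The correlation matrix relates the speaker characteristics (CA, VA, NVCL, MPercT, MProdT) to interaction and classification rates; only its upper triangle is shown. The sketch below rebuilds the characteristics block of the matrix with pandas using Pearson correlation (an assumption, although it reproduces the CA row shown above); the interaction rates are omitted because their raw values are not listed in this excerpt.

```python
# Sketch: correlation matrix between speaker characteristics, using the values
# from the speaker description table above. Pearson correlation is an assumption;
# the interaction/classification rates (RRate ... CR) are not included here.
import pandas as pd

speakers = pd.DataFrame({
    "CA":     [195, 204, 178, 190, 223],
    "VA":     [84, 99, 96, 60, 69],
    "NVCL":   [17, 18, 20, 10, 13],
    "MPercT": [69.8, 76.0, 74.0, 60.4, 56.3],
    "MProdT": [48.3, 72.1, 74.7, 49.8, 45.7],
})

corr = speakers.corr(method="pearson").round(2)
print(corr)  # the CA row matches the first entries of the table above
```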
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).