Comparison of In-Person and Online Recordings in the Clinical Teleassessment of Speech Production: A Pilot Study
Abstract
1. Introduction
1.1. The Assessment of Speech Production
- Perceptual judgments, i.e., the clinician relies on her/his personal expertise to judge whether the perceived speech is unaltered or distorted and, if so, to what extent, usually by means of rating scales that can be filled in either in front of the speaker or based on recordings;
- Acoustic measurements, i.e., the properties of the acoustic signal are quantified with the help of speech analysis software that can compute parameters not easily judged by ear, such as specific spectral components; various software packages are available, and best-practice recommendations/tutorials for computing acoustic measures for clinical purposes are only just emerging [5,6].
1.2. The Teleassessment of Speech Production
1.3. The Reliability of Speech Production Teleassessments
1.4. Purpose of the Study
2. Materials and Methods
2.1. Participants
2.2. Materials and Procedure
- Local high-quality condition (local HQ): Speech was recorded at the participant’s home on a Dell laptop running MonPaGe-2.0.s. The in-person recording source was a professional headset condenser microphone (Shure SM35-XLR) connected to an external USB sound card (Scarlett 2i4 Focusrite). This condition served as the “gold standard” condition, and was compared to the two teleassessment settings below;
- Teleassessment with local standard quality recordings (local SQ): Speech was recorded at the participant’s home on an Apple laptop running Zoom and a website in Safari allowing continuous recording. The in-person recording source was therefore the built-in microphone of the laptop. The website used the recorder.js library by Matt Diamond (https://github.com/mattdiamond/recorderjs, accessed on 22 April 2020) to access the laptop’s microphone via the browser and to record audio. The recorded audio was automatically sent to a server via AJAX once the recording was finished, and was therefore readily available to the remote assessor. Note that this setting resembles previous linguistic research approaches using both Zoom and the participant’s smartphone as a recorder, with a specialized app transmitting the audio files to the remote experimenter (e.g., [43]); here, however, the same device (laptop) was used both to conduct the Zoom session and to record the participant;
- Teleassessment with remote zoom recordings (online): Speech was recorded at the remote experimenter’s place on an Apple laptop running MonPaGe-2.0.s and sharing the screen on Zoom. The remote recording source was therefore the built-in microphone of the laptop; the speech signal of the participant was played by the built-in speakers of the same laptop via Zoom.
2.2.1. Recording Setting
2.2.2. Perceptual and Acoustic Analyses
2.2.3. Tasks and Speech Parameters
- Intelligibility: In this task, 15 target words randomly extracted from a dataset of 437 words, all having at least one phonological neighbour, appeared successively alongside their corresponding picture on a grid of colored shapes visible only to the speakers but not to the experimenter. Speakers had to instruct the experimenter where to place the target word in the grid by using a pre-defined sentence such as “Put the word leaf (=target word) in the red square”. In the original MonPaGe-2.0.s protocol, assessors transcribe the words and place them on the mentioned locations in real-time, i.e., during the assessment when interacting with the speakers. Here, the three SLT raters performed the same task but offline, based on the audio recordings captured during the assessment. Intelligibility was rated perceptually as the number of words incorrectly understood by the SLT raters out of the actual 15 target words speakers had to produce. In total, 675 target words (15 items × 15 participants × 3 conditions) were transcribed by each SLT rater;
- Articulation in pseudowords: In this task, 53 pseudowords containing all French phonemes were presented both orally and in written form. The pseudowords followed the phonotactic restrictions of French. Speakers had to repeat or read aloud each pseudoword. Recordings were rated offline in MonPaGe-2.0.s through a guided perceptual procedure: SLT raters could play the recorded answers as needed, and were asked to judge the production of 151 targeted sounds/sequences of sounds (e.g., /r/, /f/ and /o/ in rafo or /v/ and /str/ in vastra) as correctly produced, incorrectly produced or containing a sound/recording problem. Articulation was rated as the number of errors perceived on the 151 target phonemes or target syllables. In total, 6795 target sounds (151 items × 15 participants × 3 conditions) were coded by each SLT rater.
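Both perceptual scores above reduce to error counts over a fixed item set (15 target words for intelligibility, 151 target sounds for articulation). A minimal sketch of the intelligibility count, with hypothetical French target words and rater transcriptions (not the MonPaGe implementation):

```python
def intelligibility_errors(targets, transcriptions):
    """Count target words the rater did not identify correctly
    (the intelligibility score: errors out of the target set)."""
    return sum(1 for target, heard in zip(targets, transcriptions)
               if target != heard)

# Hypothetical 3-item illustration (the real task uses 15 words).
targets = ["feuille", "pain", "chat"]
heard = ["feuille", "bain", "chat"]  # rater misheard "pain" as "bain"
print(intelligibility_errors(targets, heard))  # -> 1
```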
- Five parameters related to speech rate were measured in three different tasks:
- Repetition of the days of the week: In this task, speakers had to repeat the overlearned series of the days of the week continuously for at least 30 s. SLT raters had to manually label the onset of the production on the recordings, and the PRAAT script automatically defined a window of 30 s from this onset (which could be manually adjusted to the right if needed, so as not to cut off a word). The rate over this window was computed as the number of words produced divided by the window duration (Rate_Days in words per second);
- Sentence reading: In this task, speakers had to read a short sentence containing seven CV syllables. SLT raters had to manually label the onset and offset of the sentence on the recordings, and speech rate was automatically computed as the number of syllables (7) divided by the sentence duration (Rate_Sentence in syllables per second);
- Diadochokinesia (DDK): In this task, speakers had to repeat as fast and as precisely as possible CV or CCV syllables (/ba/, /de/, /go/, /kla/ or /tra/; alternative motion rate—AMR) and a sequence of three CV syllables (/badego/; sequential motion rate—SMR) for at least 4 s. SLT raters had to manually label the onset of the production on the recordings, and the PRAAT script automatically defined a window of 4 s from this onset (which could be manually adjusted to the right if needed, so as not to cut off a syllable). The DDK rate was computed as the number of syllables produced divided by the window duration (Rate_DDK AMR CV, AMR CCV and SMR CV in syllables per second).
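All three rate measures above follow the same recipe: count units (words or syllables) inside a labeled window and divide by the window duration, extending the window rightward when the last unit would otherwise be cut. A rough sketch of that logic under assumed inputs (a list of hypothetical (start, end) unit intervals; this is not the actual PRAAT script):

```python
def rate_in_window(intervals, onset, nominal):
    """Units (words/syllables) per second over a window of `nominal` seconds
    starting at `onset`. The window is extended to the right when the last
    counted unit would otherwise be cut, mirroring the procedure above.
    `intervals` is a list of (start, end) times in seconds."""
    end = onset + nominal
    # units whose onset falls inside the nominal window
    counted = [(b, e) for (b, e) in intervals if onset <= b < end]
    if counted and counted[-1][1] > end:
        end = counted[-1][1]  # push the boundary out to the unit's offset
    return len(counted) / (end - onset)

# 0.4 s syllables every 0.5 s: 8 onsets fall inside a 4 s window -> 2 syll/s
syllables = [(i * 0.5, i * 0.5 + 0.4) for i in range(10)]
print(rate_in_window(syllables, onset=0.0, nominal=4.0))  # -> 2.0
```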
- Eight parameters related to voice were measured in two different tasks:
- Sustained /a/: In this task, speakers had to maintain the vowel /a/ for at least 2 s at a comfortable pitch and loudness. SLT raters had to manually label the onset of the /a/, and the PRAAT script automatically defined a window of 2 s from the onset. Raters were instructed to move this 2 s window and/or to tune the frequency range to ensure adequate f0 detection. Over this window, five voice measures were automatically computed: Jitter (5-point Period Perturbation Quotient), Shimmer (11-point Amplitude Perturbation Quotient), HNR, smoothed Cepstral Peak Prominence (CPPs), and f0 standard deviation;
- Sentence reading: On the segmented sentence used for the speech rate measure (see above), three measures related to voice and speaking f0 were also computed as follows: mean f0, f0 standard deviation, CPPs.
- Maximal phonation time: In this task, speakers had to maintain voicing on the vowel /a/ as long as possible in a single breath at a comfortable pitch and loudness in two trials. SLT raters had to segment the recordings, and the duration of the longest trial was retained as a measure of Maximal Phonation Time in seconds.
- Prosody: In this task, speakers had to read aloud a four-syllable sentence, first with affirmative prosody and then with interrogative prosody. In order to measure the linguistic prosodic contrast between the two sentences, SLT raters had to manually label the onset and the offset of the sentence on the recordings, and to adjust f0 detection. The difference in f0 modulation (f0 range in semitones) between the beginning and the end of the sentences served to compute the Prosodic Contrast in semitones.
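Two of the quantities above have simple closed forms: the 5-point Period Perturbation Quotient (the jitter variant named above, here in its commonly used Praat-style formulation) and an f0 range expressed in semitones (12 × log2 of the frequency ratio, used for the prosodic contrast). A rough illustrative sketch, not the actual PRAAT script:

```python
import math

def ppq5_jitter(periods):
    """5-point Period Perturbation Quotient in %: mean absolute deviation of
    each glottal period from the average of itself and its four nearest
    neighbours, relative to the mean period (Praat-style definition)."""
    deviations = [abs(periods[i] - sum(periods[i - 2:i + 3]) / 5)
                  for i in range(2, len(periods) - 2)]
    mean_period = sum(periods) / len(periods)
    return 100 * sum(deviations) / len(deviations) / mean_period

def f0_range_semitones(f0_high, f0_low):
    """f0 range in semitones: 12 * log2(high/low)."""
    return 12 * math.log2(f0_high / f0_low)

print(ppq5_jitter([0.01] * 10))      # perfectly periodic voice -> ~0
print(f0_range_semitones(200, 100))  # one octave -> 12.0
```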
2.2.4. Reliability Analyses
- Intersource agreement: The agreement of the scoring between the recording sources was calculated in two-by-two comparisons between raw scores of each rater for each participant stemming from local HQ vs. online recordings and from local HQ vs. local SQ recordings. For both teleassessment settings (local SQ and online), the local HQ condition served as a gold-standard baseline condition. Intersource differences were calculated on absolute values for each speech parameter of each participant scored by each rater to control for differences going in opposite directions (e.g., f0 computed higher online than in person for a given participant or by a given rater, but lower for another participant or by another rater). A ratio of divergence among sources was calculated by dividing the absolute difference between the local HQ and the conditions of interest (|local HQ − online| or |local HQ − local SQ|) by the value of the local HQ condition considered as the reference (|local HQ − online|/local HQ or |local HQ − local SQ|/local HQ). For example, a maximal phonation time (MPT) of 18 s according to local HQ recordings compared to an online MPT of 18 s would give 0 source divergence (i.e., |18 − 18|/18 = 0 or 0% if expressed as a percentage), whereas an online MPT of 12 s would give 0.33 source divergence (i.e., |18 − 12|/18 = 0.33 or 33% if expressed as a percentage). For each speech parameter, a mean percentage of source divergence across participants and raters was then calculated.
- Interrater agreement: The agreement of the scoring between the three SLT raters was also examined, primarily as a control measure, but also to test to what extent recordings of lower quality increased interrater variations as compared to high-quality recordings. For each speech parameter in each participant, the averaged scoring of the three SLT raters served as a reference value ((SLT1 + SLT2 + SLT3)/3 = meanSLT). Interrater differences were calculated on absolute values between the scoring of each rater and the mean scoring (|SLT1 − meanSLT|, |SLT2 − meanSLT| and |SLT3 − meanSLT|), and each of these differences was divided by the reference value (|SLT1 − meanSLT|/meanSLT, |SLT2 − meanSLT|/meanSLT and |SLT3 − meanSLT|/meanSLT). The three obtained ratios were finally averaged to compute a mean ratio of divergence among raters. For example, for the prosodic contrast, if SLT1 scored 5 semitones for a given participant, but SLT2 scored 9 semitones and SLT3 10 semitones, the averaged scoring would be 8 semitones for this participant ((5 + 9 + 10)/3). In this case, the mean rater divergence would reach 0.25 (i.e., (|5 − 8|/8 + |9 − 8|/8 + |10 − 8|/8)/3 = 0.25 or 25% if expressed as a percentage). For each speech parameter, a mean percentage of rater divergence across participants was then calculated.
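Both agreement measures above reduce to simple ratios of absolute deviations. A minimal sketch reproducing the worked maximal phonation time (MPT) and prosodic contrast examples from the text:

```python
def source_divergence(hq_score, other_score):
    """Intersource divergence ratio: |local HQ - other| / local HQ."""
    return abs(hq_score - other_score) / hq_score

def rater_divergence(scores):
    """Mean interrater divergence: each rater's absolute deviation from the
    raters' mean, divided by that mean, then averaged over raters."""
    mean_score = sum(scores) / len(scores)
    return sum(abs(s - mean_score) / mean_score for s in scores) / len(scores)

# Worked examples from the text:
print(round(source_divergence(18, 12), 2))  # MPT: |18 - 12| / 18 -> 0.33
print(rater_divergence([5, 9, 10]))         # prosodic contrast -> 0.25
```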
2.2.5. Statistical Analyses
3. Results
3.1. Intersource Agreement
3.2. Interrater Agreement
3.3. Impact of Internet Speed
4. Discussion
4.1. Perceptual Measures
4.2. Speech Rate Measures
4.3. Voice Measures
4.4. Maximal Phonation Time
4.5. Prosodic Contrast
4.6. Limitations and Perspectives
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Rusz, J.; Hlavnicka, J.; Tykalova, T.; Novotny, M.; Dusek, P.; Sonka, K.; Ruzicka, E. Smartphone Allows Capture of Speech Abnormalities Associated with High Risk of Developing Parkinson’s Disease. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 1495–1507.
2. Robin, J.; Harrison, J.E.; Kaufman, L.D.; Rudzicz, F.; Simpson, W.; Yancheva, M. Evaluation of Speech-Based Digital Biomarkers: Review and Recommendations. Digit. Biomark. 2020, 4, 99–108.
3. Vogel, A.P.; Reece, H. Recording Speech. In Manual of Clinical Phonetics, 1st ed.; Ball, M., Ed.; Routledge: London, UK, 2021; pp. 217–227.
4. Pommée, T.; Balaguer, M.; Pinquier, J.; Mauclair, J.; Woisard, V.; Speyer, R. Relationship between phoneme-level spectral acoustics and speech intelligibility in healthy speech: A systematic review. Speech Lang. Hear. 2021, 24, 105–132.
5. Schultz, B.G.; Vogel, A.P. A Tutorial Review on Clinical Acoustic Markers in Speech Science. J. Speech Lang. Hear. Res. 2022, 65, 3239–3263.
6. Rusz, J.; Tykalova, T.; Ramig, L.O.; Tripoliti, E. Guidelines for Speech Recording and Acoustic Analyses in Dysarthrias of Movement Disorders. Mov. Disord. 2021, 36, 803–814.
7. Duffy, J.R. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management, 3rd ed.; Elsevier Mosby: St. Louis, MO, USA, 2013.
8. Haley, K.L.; Roth, H.; Grindstaff, E.; Jacks, A. Computer-mediated assessment of intelligibility in aphasia and apraxia of speech. Aphasiology 2011, 25, 1600–1620.
9. Ding, L.; Lin, Z.; Radwan, A.; El-Hennawey, M.S.; Goubran, R.A. Non-intrusive single-ended speech quality assessment in VoIP. Speech Commun. 2007, 49, 477–489.
10. Utianski, R.L.; Sandoval, S.; Berisha, V.; Lansford, K.L.; Liss, J.M. The Effects of Speech Compression Algorithms on the Intelligibility of Two Individuals With Dysarthric Speech. Am. J. Speech Lang. Pathol. 2019, 28, 195–203.
11. Xue, S.A.; Lower, A. Acoustic fidelity of internet bandwidths for measures used in speech and voice disorders. J. Acoust. Soc. Am. 2010, 128, 1366.
12. Zhang, C.; Jepson, K.; Lohfink, G.; Arvaniti, A. Comparing acoustic analyses of speech data collected remotely. J. Acoust. Soc. Am. 2021, 149, 3910–3916.
13. Ge, C.; Xiong, Y.; Mok, P. How Reliable Are Phonetic Data Collected Remotely? Comparison of Recording Devices and Environments on Acoustic Measurements. Proc. Interspeech 2021, 2021, 3984–3988.
14. Calder, J.; Wheeler, R.; Adams, S.; Amarelo, D.; Arnold-Murray, K.; Bai, J.; Church, M.; Daniels, J.; Gomez, S.; Henry, J.; et al. Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for vocalic analysis. Linguist. Vanguard 2022, 2022, 20200148.
15. Constantinescu, G.; Theodoros, D.; Russell, T.; Ward, E.; Wilson, S.; Wootton, R. Assessing disordered speech and voice in Parkinson’s disease: A telerehabilitation application. Int. J. Lang. Commun. Disord. 2010, 45, 630–644.
16. Hill, A.J.; Theodoros, D.G.; Russell, T.G.; Ward, E.C. The Redesign and Re-evaluation of an Internet-Based Telerehabilitation System for the Assessment of Dysarthria in Adults. Telemed. E-Health 2009, 15, 840–850.
17. Weidner, K.; Lowman, J. Telepractice for Adult Speech-Language Pathology Services: A Systematic Review. Perspect. ASHA SIGs 2020, 5, 326–338.
18. Eriks-Brophy, A.; Quittenbaum, J.; Anderson, D.; Nelson, T. Part of the problem or part of the solution? Communication assessments of Aboriginal children residing in remote communities using videoconferencing. Clin. Linguist. Phon. 2008, 22, 589–609.
19. Waite, M.C.; Theodoros, D.G.; Russell, T.G.; Cahill, L.M. Assessing children’s speech intelligibility and oral structures, and functions via an Internet-based telehealth system. J. Telemed. Telecare 2012, 18, 198–203.
20. Manning, B.L.; Harpole, A.; Harriott, E.M.; Postolowicz, K.; Norton, E.S. Taking Language Samples Home: Feasibility, Reliability, and Validity of Child Language Samples Conducted Remotely With Video Chat Versus In-Person. J. Speech Lang. Hear. Res. 2020, 63, 3982–3990.
21. Molini-Avejonas, D.R.; Rondon-Melo, S.; de La Higuera Amato, C.A.; Samelli, A.G. A systematic review of the use of telehealth in speech, language and hearing sciences. J. Telemed. Telecare 2015, 21, 367–376.
22. Castillo-Allendes, A.; Contreras-Ruston, F.; Cantor-Cutiva, L.C.; Codino, J.; Guzman, M.; Malebran, C.; Manzano, C.; Pavez, A.; Vaiano, T.; Wilder, F.; et al. Voice Therapy in the Context of the COVID-19 Pandemic: Guidelines for Clinical Practice. J. Voice 2021, 35, 717–727.
23. Fahed, V.S.; Doheny, E.P.; Busse, M.; Hoblyn, J.; Lowery, M.M. Comparison of Acoustic Voice Features Derived from Mobile Devices and Studio Microphone Recordings. J. Voice 2022, S0892199722003125.
24. Penney, J.; Gibson, A.; Cox, F.; Proctor, M.; Szakay, A. A Comparison of Acoustic Correlates of Voice Quality Across Different Recording Devices: A Cautionary Tale. Proc. Interspeech 2021, 2021, 1389–1393.
25. Uloza, V.; Ulozaitė-Stanienė, N.; Petrauskas, T.; Kregždytė, R. Accuracy of Acoustic Voice Quality Index Captured With a Smartphone—Measurements with Added Ambient Noise. J. Voice 2021, S0892199721000734.
26. Caverlé, M.W.J.; Vogel, A.P. Stability, reliability, and sensitivity of acoustic measures of vowel space: A comparison of vowel space area, formant centralization ratio, and vowel articulation index. J. Acoust. Soc. Am. 2020, 148, 1436–1444.
27. Maryn, Y.; Ysenbaert, F.; Zarowski, A.; Vanspauwen, R. Mobile Communication Devices, Ambient Noise, and Acoustic Voice Measures. J. Voice 2017, 31, 248.e11–248.e23.
28. Wertz, R.T.; Dronkers, N.F.; Bernstein-Ellis, E.; Shubitowski, Y.; Elman, R.; Shenaut, G.K. Appraisal and diagnosis of neurogenic communication disorders in remote settings. Clin. Aphasiology 1987, 17, 117–123.
29. Wertz, R.T.; Dronkers, N.F.; Bernstein-Ellis, E.; Sterling, L.K.; Shubitowski, Y.; Elman, R.; Shenaut, G.K.; Knight, R.T.; Deal, J.L. Potential of telephonic and television technology for appraising and diagnosing neurogenic communication disorders in remote settings. Aphasiology 1992, 6, 195–202.
30. Duffy, J.R.; Werven, G.W.; Aronson, A.E. Telemedicine and the Diagnosis of Speech and Language Disorders. Mayo Clin. Proc. 1997, 72, 1116–1122.
31. Hill, A.J.; Theodoros, D.G.; Russell, T.G.; Cahill, L.M.; Ward, E.C.; Clark, K.M. An Internet-Based Telerehabilitation System for the Assessment of Motor Speech Disorders: A Pilot Study. Am. J. Speech Lang. Pathol. 2006, 15, 45–56.
32. Theodoros, D.; Russell, T.G.; Hill, A.; Cahill, L.; Clark, K. Assessment of motor speech disorders online: A pilot study. J. Telemed. Telecare 2003, 9, 66–68.
33. Ramig, L.O.; Countryman, S.; Thompson, L.L.; Horii, Y. Comparison of Two Forms of Intensive Speech Treatment for Parkinson Disease. J. Speech Lang. Hear. Res. 1995, 38, 1232–1251.
34. Vogel, A.P.; Rosen, K.M.; Morgan, A.T.; Reilly, S. Comparability of modern recording devices for speech analysis: Smartphone, landline, laptop, and hard disc recorder. Folia Phoniatr. Logop. 2014, 66, 244–250.
35. Dahl, K.L.; Weerathunge, H.R.; Buckley, D.P.; Dolling, A.S.; Díaz-Cádiz, M.; Tracy, L.F.; Stepp, C.E. Reliability and Accuracy of Expert Auditory-Perceptual Evaluation of Voice via Telepractice Platforms. Am. J. Speech Lang. Pathol. 2021, 30, 2446–2455.
36. Sanker, C.; Babinski, S.; Burns, R.; Evans, M.; Johns, J.; Kim, J.; Smith, S.; Weber, N.; Bowern, C. (Don’t) try this at home! The effects of recording devices and software on phonetic analysis. Language 2021, 97, e360–e382.
37. Jannetts, S.; Schaeffler, F.; Beck, J.; Cowen, S. Assessing voice health using smartphones: Bias and random error of acoustic voice parameters captured by different smartphone types. Int. J. Lang. Commun. Disord. 2019, 54, 292–305.
38. Keck, C.S.; Doarn, C.R. Telehealth Technology Applications in Speech-Language Pathology. Telemed. E-Health 2014, 20, 653–659.
39. Tran, K.; Xu, L.; Stegmann, G.; Liss, J.; Berisha, V.; Utianski, R. Investigating the Impact of Speech Compression on the Acoustics of Dysarthric Speech. Proc. Interspeech 2022, 2022, 2263–2267.
40. Sevitz, J.S.; Kiefer, B.R.; Huber, J.E.; Troche, M.S. Obtaining Objective Clinical Measures During Telehealth Evaluations of Dysarthria. Am. J. Speech Lang. Pathol. 2021, 30, 503–516.
41. Fougeron, C.; Delvaux, V.; Ménard, L.; Laganaro, M. The MonPaGe_HA Database for the Documentation of Spoken French Throughout Adulthood. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; pp. 4301–4306.
42. Laganaro, M.; Fougeron, C.; Pernon, M.; Levêque, N.; Borel, S.; Fournet, M.; Catalano Chiuvé, S.; Lopez, U.; Trouville, R.; Ménard, L.; et al. Sensitivity and specificity of an acoustic- and perceptual-based tool for assessing motor speech disorders in French: The MonPaGe-screening protocol. Clin. Linguist. Phon. 2021, 35, 1060–1075.
43. Leemann, A.; Jeszenszky, P.; Steiner, C.; Studerus, M.; Messerli, J. Linguistic fieldwork in a pandemic: Supervised data collection combining smartphone recordings and videoconferencing. Linguist. Vanguard 2020, 6, 20200061.
44. McGraw, K.O.; Wong, S.P. Forming inferences about some intraclass correlation coefficients. Psychol. Methods 1996, 1, 30–46.
45. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163.
46. Zhang, C.; Jepson, K.; Lohfink, G.; Arvaniti, A. Speech data collection at a distance: Comparing the reliability of acoustic cues across homemade recordings. J. Acoust. Soc. Am. 2020, 148, 2717.
47. Freeman, V.; De Decker, P. Remote sociophonetic data collection: Vowels and nasalization over video conferencing apps. J. Acoust. Soc. Am. 2021, 149, 1211–1223.
48. Fougeron, C.; Guitard-Ivent, F.; Delvaux, V. Multi-Dimensional Variation in Adult Speech as a Function of Age. Languages 2021, 6, 176.
Participants | Age | Gender | Native Language (Area of Acquisition) | Country of Residence |
---|---|---|---|---|
P1 | 25 | F | French (Ile de France) | France |
P2 | 67 | F | French (Jura) | Switzerland (French-speaking part) |
P3 | 28 | F | French (Auvergne-Rhône-Alpes) | France |
P4 | 50 | F | French (Ile de France) | France |
P5 | 83 | F | French (Auvergne-Rhône-Alpes) | France |
P6 | 35 | M | French (Fribourg) | Switzerland (French-speaking part) |
P7 | 60 | F | French (Fribourg) | Switzerland (French-speaking part) |
P8 | 58 | M | French (Auvergne-Rhône-Alpes) | France |
P9 | 82 | F | French (Neuchâtel) | Switzerland (French-speaking part) |
P10 | 27 | F | French (Auvergne-Rhône-Alpes) | France |
P11 | 55 | F | French (Auvergne-Rhône-Alpes) | France |
P12 | 35 | M | French (Fribourg) | Switzerland (French-speaking part) |
P13 | 47 | M | French (Auvergne-Rhône-Alpes) | France |
P14 | 82 | F | French (Fribourg) | Switzerland (French-speaking part) |
P15 | 62 | M | French (Fribourg) | Switzerland (French-speaking part) |
Condition | Recording Source | Origin of the Recorded Speech Signal | Recording Software | Audio Files Storage |
---|---|---|---|---|
Local HQ | Shure microphone and Scarlett external USB sound card connected to a Dell laptop situated next to the participant | In-person | MSD battery MonPaGe-2.0.s (sounddevice python library) | On the local computer |
Local SQ | Built-in microphone of an Apple laptop situated in front of the participant | In-person | Safari browser (recorder js plugin) | On an online server |
Online | Built-in microphone of an Apple laptop situated in front of the remote experimenter | Remote (played by the built-in speakers of the experimenter’s laptop via Zoom) | MSD battery MonPaGe-2.0.s (sounddevice python library) | On the remote computer |
Intersource divergence (mean absolute difference m, range, and mean % of source divergence), for Local HQ vs. Online and Local HQ vs. Local SQ:

| Measures | Domain | Parameter | Local HQ vs. Online: m | Range | % | Local HQ vs. Local SQ: m | Range | % |
|---|---|---|---|---|---|---|---|---|
| Acoustic measures | Speech rate | Rate_Sentence (syll/sec) | 0.4 | 0–1.8 | 7.4 | 0.4 | 0–1.4 | 6.2 |
| | | Rate_Days (word/sec) | 0 | 0–0.3 | 1.1 | 0 | 0–0.3 | 0.6 |
| | | Rate_DDK AMR CCV (syll/sec) | 0.1 | 0–0.4 | 1.7 | 0 | 0–0.2 | 0.7 |
| | | Rate_DDK AMR CV (syll/sec) | 0.1 | 0–0.2 | 1.4 | 0 | 0–0.1 | 0.4 |
| | | Rate_DDK SMR CV (syll/sec) | 0.2 | 0–0.9 | 2.8 | 0 | 0–0.3 | 0.5 |
| | Voice | CPPs /a/ (dB) | 4.7 | 1.8–8.5 | 25 | 2.4 | 0.4–5.2 | 12.8 |
| | | HNR /a/ (dB) | 6.3 | 2.2–12.9 | 29.9 | 3.5 | 0.2–10.1 | 16.3 |
| | | Jitter /a/ (%) | 0.1 | 0–0.6 | 62.1 | 0.1 | 0–0.1 | 16.7 |
| | | SD f0 /a/ (Hz) | 2.9 | 0–48.8 | 62.5 | 2.4 | 0–48.7 | 49.9 |
| | | Shimmer /a/ (%) | 7.0 | 3.3–13.6 | 374.4 | 3.5 | 1.1–8.6 | 187.4 |
| | | CPPs sentence (dB) | 4.7 | 1.8–7.8 | 25 | 2.4 | 0.4–5.1 | 13 |
| | | Mean f0 sentence (Hz) | 5.6 | 0.3–22.2 | 2.8 | 3.4 | 0–14.3 | 1.7 |
| | | SD f0 sentence (Hz) | 8.2 | 0.2–41.1 | 29.2 | 3.8 | 0–33.5 | 10.9 |
| | | Maximal Phonation Time (sec) | 2.4 | 0–17.1 | 12.5 | 0.3 | 0–8.6 | 1.7 |
| | | Prosodic Contrast (semitones) | 3.6 | 0–15.6 | 273.2 | 2.1 | 0–11.2 | 109.3 |
| Perceptual measures | | Articulation (errors in pseudowords) | 1.3 | 0–7 | 0.9 | 0.6 | 0–5 | 0.4 |
| | | Intelligibility (incorrect words) | 0.4 | 0–3 | 2.7 | 0.2 | 0–1 | 0.6 |
| Mean | | | | | 53.8 | | | 25.2 |
Interrater divergence among the three SLT raters (mean absolute difference m, range, and mean % of rater divergence), per recording condition:

| Measures | Domain | Parameter | Local HQ: m | Range | % | Local SQ: m | Range | % | Online: m | Range | % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Acoustic measures | Speech rate | Rate_Sentence (syll/sec) | 0.1 | 0–0.4 | 2.3 | 0.4 | 0.1–0.8 | 6.5 | 0.5 | 0–1.2 | 7.1 |
| | | Rate_Days (word/sec) | 0 | 0–0.2 | 0.6 | 0 | 0–0.2 | 1.1 | 0 | 0–0.2 | 1.1 |
| | | Rate_DDK AMR CCV (syll/sec) | 0 | 0–0.1 | 0.6 | 0 | 0–0.1 | 0.9 | 0 | 0–0.4 | 0.8 |
| | | Rate_DDK AMR CV (syll/sec) | 0 | 0–0.2 | 0.5 | 0 | 0–0.2 | 0.6 | 0 | 0–0.2 | 0.5 |
| | | Rate_DDK SMR CV (syll/sec) | 0 | 0–0.2 | 0.4 | 0 | 0–0.2 | 0.5 | 0.1 | 0–0.6 | 1.2 |
| | Voice | CPPs /a/ (dB) | 0 | 0–0.2 | 0.3 | 0.1 | 0–0.2 | 0.4 | 0.1 | 0–1.0 | 0.8 |
| | | HNR /a/ (dB) | 0.3 | 0–2.2 | 1.0 | 0.2 | 0–0.8 | 0.8 | 0.1 | 0–0.4 | 1.4 |
| | | Jitter /a/ (%) | 0 | 0–0.1 | 2.5 | 0 | 0–0.1 | 5.5 | 0 | 0–0.1 | 9.4 |
| | | SD f0 /a/ (Hz) | 1.2 | 0–12.6 | 14.8 | 1.4 | 0–14.1 | 14.2 | 0.3 | 0–2.1 | 13.5 |
| | | Shimmer /a/ (%) | 0 | 0–0.1 | 1.8 | 0.1 | 0–0.3 | 1.9 | 0.1 | 0–0.6 | 2.6 |
| | | CPPs sentence (dB) | 0 | 0–0.2 | 0.3 | 0.1 | 0–0.2 | 0.4 | 0.1 | 0–0.3 | 0.5 |
| | | Mean f0 sentence (Hz) | 2.1 | 0.2–8.7 | 0.8 | 0.6 | 0–5.9 | 0.5 | 2.6 | 0–8.4 | 1.2 |
| | | SD f0 sentence (Hz) | 3.9 | 0.2–19.8 | 8.7 | 0.3 | 0–2.9 | 3.6 | 6.7 | 0–30.3 | 15.8 |
| | | Maximal Phonation Time (sec) | 0.1 | 0–0.6 | 0.7 | 0.4 | 0–3.8 | 2.4 | 1.1 | 0–5.2 | 7.9 |
| | | Prosodic Contrast (semitones) | 0.9 | 0.2–2.9 | 66.6 | 1.6 | 0.1–5.1 | 99.0 | 2.6 | 0.7–9.0 | 90.6 |
| Perceptual measures | | Articulation (errors in pseudowords) | 0.1 | 0–1 | 0.1 | 0.8 | 0–5 | 0.3 | 1.1 | 0–5 | 0.4 |
| | | Intelligibility (incorrect words) | 0.3 | 0–1 | 1.2 | 0.3 | 0–1 | 1.2 | 0.6 | 0–3 | 2.7 |
| Mean | | | | | 6.1 | | | 8.2 | | | 9.3 |
Intraclass correlation coefficients (ICC) for intersource agreement (Local HQ vs. Online; Local HQ vs. Local SQ) and for interrater agreement in each recording condition:

| Measures | Domain | Parameter | Intersource: Local HQ vs. Online | Intersource: Local HQ vs. Local SQ | Interrater: Local HQ | Interrater: Local SQ | Interrater: Online |
|---|---|---|---|---|---|---|---|
| Acoustic measures | Speech rate | Rate_Sentence | 0.87 | 0.89 | 0.97 | 0.77 | 0.77 |
| | | Rate_Days | 0.99 | 0.99 | 0.98 | 0.96 | 0.96 |
| | | Rate_DDK AMR CCV | 0.98 | 1 | 1 | 0.99 | 0.99 |
| | | Rate_DDK AMR CV | 0.99 | 1 | 1 | 1 | 1 |
| | | Rate_DDK SMR CV | 0.95 | 1 | 1 | 1 | 0.97 |
| | Voice | CPPs /a/ | 0.38 | 0.72 | 1 | 1 | 0.99 |
| | | HNR /a/ | 0.08 | 0.17 | 0.90 | 0.99 | 0.99 |
| | | Jitter /a/ | 0.10 | 0.69 | 0.93 | 0.85 | 0.79 |
| | | SD f0 /a/ | 0.12 | 0.16 | 0.40 | 0.44 | 0.50 |
| | | Shimmer /a/ | 0.08 | 0.21 | 1 | 0.99 | 0.95 |
| | | CPPs sentence | 0.38 | 0.72 | 1 | 1 | 1 |
| | | Mean f0 sentence | 0.99 | 0.99 | 1 | 1 | 0.99 |
| | | SD f0 sentence | 0.50 | 0.72 | 0.60 | 0.95 | 0.53 |
| | | Maximal Phonation Time | 0.88 | 0.99 | 1 | 0.98 | 0.88 |
| | | Prosodic Contrast | 0.43 | 0.76 | 0.79 | 0.60 | 0.43 |
| Perceptual measures | | Articulation (errors in pseudowords) | 0.47 | 0.69 | 0.98 | 0.64 | 0.76 |
| | | Intelligibility (incorrect words) | 0.27 | 0.44 | 0.46 | 0.13 | 0.39 |
| Mean ICC | | | 0.56 | 0.70 | 0.91 | 0.84 | 0.82 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Python, G.; Demierre, C.; Bourqui, M.; Bourbon, A.; Chardenon, E.; Trouville, R.; Laganaro, M.; Fougeron, C. Comparison of In-Person and Online Recordings in the Clinical Teleassessment of Speech Production: A Pilot Study. Brain Sci. 2023, 13, 342. https://doi.org/10.3390/brainsci13020342