A Systematic Review of Using Deep Learning in Aphasia: Challenges and Future Directions
Abstract
1. Introduction
2. Materials and Methods
- Deep Dyslexia—A Case-Study of Connectionist Neuropsychology
- Aphasia owing to subcortical brain infarcts in childhood
- Simulating single word processing in the classic aphasia syndromes based on the Wernicke-Lichtheim-Geschwind theory
- A proposed reinterpretation and reclassification of aphasic syndromes
- Dysgraphia in primary progressive aphasia: Characterisation of impairments and therapy options
- Sibilant Consonants Classification with Deep Neural Networks
- Bonato, P., Chen, Y., Chen, F., & Zhang, Y.-T. (2020). Guest editorial flexible sensing and medical imaging for cerebro-cardiovascular health. IEEE Journal of Biomedical and Health Informatics, 24 (11), 3189–3190.
- Zhang, X., Qin, F., Chen, Z., Gao, L., Qiu, G., & Lu, S. (2020). Fast screening for children’s developmental language disorders via comprehensive speech ability evaluation-using a novel deep learning framework. Annals of Translational Medicine, 8 (11). (excluded: not related to aphasia)
- Anjos, I., Cavalheiro Marques, N., Grilo, M., Guimaraes, I., Magalhaes, J., & Cavaco, S. (2020). Sibilant consonants classification comparison with multi- and single-class neural networks. Expert Systems, 37 (6, SI).
3. Challenges
3.1. High Variability in Speech Patterns
3.1.1. Linguistic Diversity
3.1.2. Individual Variability
3.2. Complexity of Speech Recognition
3.2.1. Linguistic Abnormalities
3.2.2. Speech Characteristics
3.3. Data Availability and Quality
3.3.1. Limited Annotated Data
3.3.2. Representation Variability
3.4. Model Complexity and Computational Efficiency
3.4.1. Optimal Model Complexity
3.4.2. Real-Time Deployment
3.5. Integration with Clinical Workflows
3.5.1. Usability Concerns
3.5.2. Latency Challenges
3.6. Algorithmic Challenges and Future Directions
4. Possible Solutions
4.1. Solution for High Variability in Speech Patterns
4.1.1. Develop Deep Learning Models Capable of Handling Diverse Speech Patterns and Linguistic Errors
4.1.2. Implement Data Augmentation Techniques to Artificially Increase the Variability in Training Data
4.2. Complexity of Speech Recognition
4.2.1. Design Architectures Specifically Tailored to Recognize and Account for Aphasic Speech Characteristics
4.2.2. Train Models on Large, Diverse Datasets That Encompass a Wide Range of Aphasic Speech Patterns
4.3. Data Availability and Quality
4.3.1. Investigate Techniques for Semi-Supervised and Unsupervised Learning to Make the Most of Limited Annotated Data
4.3.2. Collaborate with Speech-Language Pathologists to Annotate Data Accurately and Ensure its Clinical Relevance
4.4. Model Complexity and Computational Efficiency
4.4.1. Research Methods for Optimizing Models to Balance Complexity and Computational Efficiency
4.4.2. Utilize Model Compression and Quantization Techniques to Reduce the Computational Requirements of Deep Learning Models
4.5. Integration with Clinical Workflows
4.5.1. Explore Methods for Integrating Solutions Seamlessly into Existing Clinical Workflows
4.5.2. Develop User-Friendly Interfaces and Visualization Tools to Present Model Outputs in a Clinically Meaningful Manner
5. Discussion on Bibliometric Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Luria, A.R. Higher Cortical Functions in Man; Basic Books: New York, NY, USA, 1966. [Google Scholar]
- Shinn, P.; Blumstein, S.E. Phonetic Disintegration in Aphasia: Acoustic Analysis of Spectral Characteristics for Place of Articulation. Brain Lang. 1983, 20, 90–114. [Google Scholar] [CrossRef]
- Davis, G.A. Aphasiology: Disorders and Clinical Practice, 2nd ed.; Pearson: London, UK, 2007. [Google Scholar]
- Helm-Estabrooks, N.; Albert, M.L.; Nicholas, M. Manual of Aphasia and Aphasia Therapy, 3rd ed.; Pro-Ed: Austin, TX, USA, 2013. [Google Scholar]
- Lam, J.M.C.; Wodchis, W.P. The Relationship of 60 Disease Diagnoses and 15 Conditions to Preference-Based Health-Related Quality of Life in Ontario Hospital-Based Long-Term Care Residents. Med. Care 2010, 48, 380–387. [Google Scholar] [CrossRef] [PubMed]
- Marshall, R.C.; Wright, H.H. Developing a Clinician-Friendly Aphasia Test. Am. J. Speech Lang. Pathol. 2007, 16, 295–315. [Google Scholar] [CrossRef] [PubMed]
- Li, S.; Xiao, L.; Tian, H. Development and Norms of the Chinese Standard Aphasia Examination. Chin. J. Rehabil. Theory Pract. 2000, 6, 162–164. [Google Scholar]
- Gao, S. Aphasia, 2nd ed.; Peking University Medical Press: Beijing, China, 2006. [Google Scholar]
- Goodglass, H.; Kaplan, E. Boston Diagnostic Aphasia Examination; Lea and Febiger: Philadelphia, PA, USA, 1983. [Google Scholar]
- Kertesz, A. Western Aphasia Battery-Revised. APA PsycTests 2007. [Google Scholar] [CrossRef]
- Halevi, G.; Moed, H.; Bar-Ilan, J. Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature. J. Informetr. 2017, 11, 823–834. [Google Scholar] [CrossRef]
- Gusenbauer, M.; Haddaway, N.R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 2020, 11, 181–217. [Google Scholar] [CrossRef]
- Kaur, A.; Gulati, S.; Sharma, R.; Sinhababu, A.; Chakravarty, R. Visual citation navigation of open education resources using Litmaps. Libr. Hi Tech News 2022, 39, 7–11. [Google Scholar] [CrossRef]
- Waltman, L.; van Eck, N.J.; Noyons, E.C.M. A unified approach to mapping and clustering of bibliometric networks. J. Informetr. 2010, 4, 629–635. [Google Scholar] [CrossRef]
- Mahmoud, S.S.; Kumar, A.; Li, Y.; Tang, Y.; Fang, Q. Performance evaluation of machine learning frameworks for aphasia assessment. Sensors 2021, 21, 2582. [Google Scholar] [CrossRef]
- Ranjith, R.; Chandrasekar, A. GTSO: Gradient tangent search optimization enabled voice transformer with speech intelligibility for aphasia. Comput. Speech Lang. 2024, 84, 101568. [Google Scholar] [CrossRef]
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 2010, 8, 336–341. [Google Scholar] [CrossRef]
- Adikari, A.; Hernandez, N.; Alahakoon, D.; Rose, M.L.; Pierce, J.E. From concept to practice: A scoping review of the application of AI to aphasia diagnosis and management. Disabil. Rehabil. 2023, 46, 1288–1297. [Google Scholar] [CrossRef] [PubMed]
- Day, M.; Dey, R.K.; Baucum, M.; Paek, E.J.; Park, H.; Khojandi, A. Predicting Severity in People with Aphasia: A Natural Language Processing and Machine Learning Approach. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual, 1–5 November 2021; IEEE: New York, NY, USA, 2021; pp. 2299–2302. [Google Scholar] [CrossRef]
- Barbera, D.S.; Huckvale, M.; Fleming, V.; Upton, E.; Coley-Fisher, H.; Shaw, I.; Latham, W.; Leff, A.P.; Crinion, J. An utterance verification system for word naming therapy in Aphasia. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Shanghai, China, 25–29 October 2020; International Speech Communication Association: Washington, DC, USA, 2020; pp. 706–710. [Google Scholar] [CrossRef]
- Barbera, D.S.; Huckvale, M.; Fleming, V.; Upton, E.; Coley-Fisher, H.; Doogan, C.; Shaw, I.; Latham, W.; Leff, A.P.; Crinion, J. NUVA: A Naming Utterance Verifier for Aphasia Treatment. Comput. Speech Lang. 2021, 69, 101221. [Google Scholar] [CrossRef] [PubMed]
- Herath, H.M.D.P.M.; Weraniyagoda, W.A.S.A.; Rajapaksha, R.T.M.; Wijesekara, P.A.D.S.N.; Sudheera, K.L.K.; Chong, P.H.J. Automatic Assessment of Aphasic Speech Sensed by Audio Sensors for Classification into Aphasia Severity Levels to Recommend Speech Therapies. Sensors 2022, 22, 6966. [Google Scholar] [CrossRef]
- Jothi, K.R.; Mamatha, V.L. A systematic review of machine learning based automatic speech assessment system to evaluate speech impairment. In Proceedings of the 3rd International Conference on Intelligent Sustainable Systems, ICISS 2020, Thoothukudi, India, 3–5 December 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 175–185. [Google Scholar] [CrossRef]
- Fernandes, R.; Huang, L.; Vejarano, G. Non-audible speech classification using deep learning approaches. In Proceedings of the 6th Annual Conference on Computational Science and Computational Intelligence, CSCI 2019, Las Vegas, NV, USA, 5–7 December 2019; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2019; pp. 630–634. [Google Scholar] [CrossRef]
- Li, H.; Tang, C.; Vishwakarma, S.; Ge, Y.; Li, W. Speaker identification using Ultra-Wideband measurement of voice. IET Radar Sonar Navig. 2024, 18, 266–276. [Google Scholar] [CrossRef]
- Joshi, A.; Bagate, R.; Hambir, Y.; Sapkal, A.; Sable, N.P.; Lonare, M. System for Detection of Specific Learning Disabilities Based on Assessment. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 362–368. [Google Scholar]
- Krishna, G.; Carnahan, M.; Shamapant, S.; Surendranath, Y.; Jain, S.; Ghosh, A.; Tran, C.; Del R Millan, J.; Tewfik, A.H. Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Virtual, 1–5 November 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 6008–6014. [Google Scholar] [CrossRef]
- Kumar, A.; Mahmoud, S.S.; Wang, Y.; Faisal, S.; Fang, Q. A Comparison of Time-Frequency Distributions for Deep Learning-Based Speech Assessment of Aphasic Patients. In Proceedings of the International Conference on Human System Interaction, HSI, Melbourne, Australia, 28–31 July 2022; IEEE Computer Society: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
- Ortiz-Perez, D.; Ruiz-Ponce, P.; Rodríguez-Juan, J.; Tomás, D.; Garcia-Rodriguez, J.; Nalepa, G.J. Deep Learning-Based Emotion Detection in Aphasia Patients. In Lecture Notes in Networks and Systems; Bringas, P.G., García, H.P., de Pisón, F.J.M., Álvarez, F.M., Lora, A.T., Herrero, Á., Rolle, J.L.C., Quintián, H., Corchado, E., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin, Germany, 2023; pp. 195–204. [Google Scholar] [CrossRef]
- Qin, Y.; Wu, Y.; Lee, T.; Kong, A.P.H. An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia. J. Signal Process. Syst. Signal Image Video Technol. 2020, 92, 819–830. [Google Scholar] [CrossRef]
- Qin, Y.; Lee, T.; Wu, Y.; Kong, A.P.H. An End-to-End Approach to Automatic Speech Assessment for People with Aphasia. In Proceedings of the 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, Taiwan, 26–29 November 2018; pp. 66–70. [Google Scholar]
- Mahmoud, S.S.; Kumar, A.; Tang, Y.; Li, Y.; Gu, X.; Fu, J.; Fang, Q. An efficient deep learning based method for speech assessment of mandarin-speaking aphasic patients. IEEE J. Biomed. Health Inform. 2020, 24, 3191–3202. [Google Scholar] [CrossRef]
- Mahmoud, S.S.; Pallaud, R.F.; Kumar, A.; Faisal, S.; Wang, Y.; Fang, Q. A Comparative Investigation of Automatic Speech Recognition Platforms for Aphasia Assessment Batteries. Sensors 2023, 23, 857. [Google Scholar] [CrossRef]
- Xu, H.; Dong, M.; Lee, M.-H.; O’Hara, N.; Asano, E.; Jeong, J.-W. Objective Detection of Eloquent Axonal Pathways to Minimize Postoperative Deficits in Pediatric Epilepsy Surgery Using Diffusion Tractography and Convolutional Neural Networks. IEEE Trans. Med. Imaging 2019, 38, 1910–1922. [Google Scholar] [CrossRef]
Database | Advanced Query Implementation Specific to Databases | Result
---|---|---
PubMed | ((Aphasia) AND (Deep Learning)) AND ((voice) OR (speech)) AND ((recognition) OR (disorder) OR (assessment)) | 12 |
Web of Science | ALL = (Aphasia) AND ALL = (Deep Learning) AND (ALL = (voice) OR ALL = (Speech)) AND (ALL = (recognition) OR ALL = (disorder) OR ALL = (assessment)) | 24 |
Scopus | TITLE-ABS-KEY (“Aphasia”) AND TITLE-ABS-KEY (“Deep Learning”) AND (TITLE-ABS-KEY (“voice”) OR TITLE-ABS-KEY (“speech”)) AND (TITLE-ABS-KEY (“recognition”) OR TITLE-ABS-KEY (“disorder”) OR TITLE-ABS-KEY (“assessment”)) | 17 |
IEEE Xplore | (“All Metadata”: Aphasia AND “All Metadata”: Deep Learning AND (“All Metadata”: voice OR “All Metadata”: speech) AND (“All Metadata”: recognition OR “All Metadata”: disorder OR “All Metadata”: assessment)) | 19
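The PubMed query in the table above can be reproduced programmatically through NCBI’s E-utilities `esearch` endpoint. The sketch below (function names are our own, and `retmax` is an assumed page-size parameter) only composes the boolean term and builds the request URL; it does not fetch results, and live hit counts will naturally drift from the 12 reported here.

```python
from urllib.parse import urlencode

def build_pubmed_query() -> str:
    """Compose the boolean search string from the PubMed row of the table."""
    return ("((Aphasia) AND (Deep Learning)) AND ((voice) OR (speech)) "
            "AND ((recognition) OR (disorder) OR (assessment))")

def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an NCBI E-utilities esearch URL for the given search term."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    # retmode=json asks the endpoint for a JSON result listing matching PMIDs.
    return base + "?" + urlencode({"db": "pubmed", "term": term,
                                   "retmode": "json", "retmax": retmax})

url = esearch_url(build_pubmed_query())
```

Fetching `url` with any HTTP client returns a JSON body whose `esearchresult.count` field gives the current number of matching records, which makes the search reproducible at later dates.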
Reference | Linguistic Diversity | Individual Variability | Linguistic Abnormalities | Speech Characteristics | Limited Annotated Data | Representation Variability | Optimal Model Complexity | Real-Time Deployment | Usability Concerns | Latency Challenges
---|---|---|---|---|---|---|---|---|---|---
[20] | No | Yes | Yes | No | No | No | No | Yes | No | Yes |
[18] | Yes | No | Yes | No | No | No | No | No | No | No |
[21] | No | Yes | Yes | No | No | No | No | Yes | No | Yes |
[19] | Yes | No | Yes | No | Yes | Yes | No | No | No | No |
[24] | No | No | No | No | Yes | No | No | No | No | No |
[25] | No | No | No | No | Yes | No | Yes | No | No | No |
[22] | No | No | Yes | No | Yes | No | No | No | Yes | No |
[26] | No | No | No | No | Yes | No | Yes | Yes | No | Yes |
[23] | No | No | No | Yes | No | No | No | No | No | No |
[27] | No | No | No | No | Yes | No | No | No | No | No |
[28] | No | No | No | No | Yes | No | No | No | No | No |
[29] | No | No | No | No | Yes | No | No | No | Yes | No |
[30] | No | No | No | No | Yes | No | No | No | No | No |
[31] | No | No | No | No | Yes | No | No | No | No | No |
[32] | No | No | No | No | Yes | No | No | No | No | No |
[15] | No | No | No | No | Yes | No | No | No | No | No |
[33] | No | No | No | No | Yes | No | No | No | No | No |
[34] | No | No | No | No | Yes | No | No | No | No | No |
[16] | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
Ref. | Algorithm Used | Technology Challenges |
---|---|---|
[20] | Recurrent Neural Networks (RNNs) including LSTM and Gated Recurrent Units (GRUs), Dynamic Time Warping (DTW) | 1. High variability in speech patterns among individuals with aphasia, posing challenges for ASR systems. 2. Achieving accuracy comparable to human speech and language therapists (SLTs). 3. Ensuring system’s utility across various levels of speech impairment in aphasia. |
[18] | Supervised and unsupervised machine learning, NLP, fuzzy rules, genetic programming | 1. Complexity of speech recognition within aphasia, including paraphasic errors, neologisms, revisions, greater pause times, and agrammatism. 2. Slow-paced implementation of AI into aphasia management. 3. Need for data fusion from multiple modalities to improve accuracy. |
[21] | RNNs including LSTM and GRUs, DTW | 1. High variability in speech patterns among individuals with aphasia. 2. Need for system to process and classify a wide range of error types in aphasic speech. 3. Calibration of system’s threshold for classifying naming attempts. 4. Latency in system response time for real-time feedback during therapy sessions. |
[19] | NLP, Machine Learning (ML) | 1. High variability in language use among individuals with aphasia. 2. Need for large and diverse datasets to train algorithms effectively. 3. Complexity of accurately capturing and analyzing nuances of human language through NLP. 4. Ensuring ML models can be easily integrated into clinical workflows. |
[24] | LSTM, Bi-directional LSTM, 1-D Convolutional Neural Network (CNN), CWT-CNN | 1. Need for extensive, accurately annotated EMG data for training deep learning models. 2. Balancing model complexity and computational efficiency. 3. Signal processing and transformation challenges. |
[25] | Deep Learning, ResNet, Ultra-Wideband (UWB) technology | 1. Quality of radar data influenced by distance, orientation, and environment. 2. Complexity of distinguishing between similar voices. 3. Need for extensive, accurately annotated data for training deep learning models. 4. Balancing model complexity and computational efficiency. |
[22] | DNN, KNN, Decision Trees, Random Forest, Text to Speech (TTS) | 1. Need for accurately labeled data for training machine learning models. 2. Complexity of distinguishing between different aphasia severity levels. 3. Difficulty of capturing subtleties of aphasic speech. 4. Challenges associated with deploying effective and user-friendly software applications. |
[26] | Computer Vision, NLP, Deep Learning (CNNs, Transformer Models, DNNs), Eyeball tracking | 1. Handling complex and multiple Specific Learning Disabilities (SLDs). 2. Generating diverse and coherent questions for detection tests. 3. Improving quality and relevance of generated reports. 4. Complexity of integrating various technological solutions into a cohesive system. 5. Reliance on subjective human judgment for certain diagnostic tools. |
[23] | SVM, DNN, Hidden Markov Model (HMM) | 1. High variability in speech patterns among individuals with aphasia. 2. Need for extensive and accurately annotated data. 3. Complexity of distinguishing between different severity levels of aphasia. 4. Challenge of deploying effective, user-friendly software applications. |
[27] | Deep Learning (CNNs, Transformer Models, DNNs), Google Search and YouTube API | 1. Handling complex and multiple SLDs. 2. Generating diverse and coherent questions for detection tests. 3. Improving quality and relevance of generated reports. 4. Complexity of integrating various technological solutions into a cohesive system. 5. Reliance on subjective human judgment for certain diagnostic tools. |
[28] | CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Complexity of identifying the most effective Time-Frequency Distributions (TFDs) for Automatic Speech Impairment Assessment (ASIA). |
[29] | Deep Learning (CNNs) | 1. Accurately differentiating between patients and interviewers. 2. Accurately interpreting emotions of aphasic patients. |
[30] | CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Challenge of identifying the most effective TFDs for ASIA. |
[31] | RNNs, CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Ensuring effectiveness of neural network models in accurately classifying and assessing speech impairment severity. |
[32] | CNNs | 1. Scarcity of aphasia syndrome datasets for improving CNN-enabled assessments. 2. Limitations of general-purpose ASR systems in accurately recognizing and assessing impaired speech. 3. Need for reliable, standardized automatic tools for speech assessment in Mandarin-speaking aphasic patients. 4. Complexity of accurately classifying speech data based on speech lucidity features. |
[15] | CML, Deep Neural Networks (CNNs) | 1. High variability in speech patterns among individuals with aphasia. 2. Need for large and well-annotated datasets for optimal model training and performance evaluation. 3. Need for reliable, standardized automatic tools for speech assessment in Mandarin-speaking aphasic patients. |
[33] | CNNs, LDA, Microsoft Azure, Google speech recognition platforms | 1. Variability in speech patterns among individuals with aphasia. 2. Scarcity of aphasia syndrome datasets. 3. Limitations of general-purpose ASR systems in recognizing impaired speech. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Cheng, W.; Sufi, F.; Fang, Q.; Mahmoud, S.S. A Systematic Review of Using Deep Learning in Aphasia: Challenges and Future Directions. Computers 2024, 13, 117. https://doi.org/10.3390/computers13050117