Systematic Review

A Systematic Review of Using Deep Learning in Aphasia: Challenges and Future Directions

1 Department of Biomedical Engineering, College of Engineering, Shantou University, Shantou 515063, China
2 School of Public Health and Preventive Medicine, Monash University, 553 St. Kilda Rd., Melbourne, VIC 3004, Australia
* Authors to whom correspondence should be addressed.
Computers 2024, 13(5), 117; https://doi.org/10.3390/computers13050117
Submission received: 21 March 2024 / Revised: 22 April 2024 / Accepted: 26 April 2024 / Published: 9 May 2024
(This article belongs to the Special Issue Machine and Deep Learning in the Health Domain 2024)

Abstract

In this systematic literature review, the intersection of deep learning applications within the aphasia domain is meticulously explored, acknowledging the condition’s complex nature and the nuanced challenges it presents for language comprehension and expression. By harnessing data from primary databases and employing advanced query methodologies, this study synthesizes findings from 28 relevant documents, unveiling a landscape marked by significant advancements and persistent challenges. Through a methodological lens grounded in the PRISMA framework (Version 2020) and Machine Learning-driven tools like VosViewer (Version 1.6.20) and Litmaps (Free Version), the research delineates the high variability in speech patterns, the intricacies of speech recognition, and the hurdles posed by limited and diverse datasets as core obstacles. Innovative solutions such as specialized deep learning models, data augmentation strategies, and the pivotal role of interdisciplinary collaboration in dataset annotation emerge as vital contributions to this field. The analysis culminates in identifying theoretical and practical pathways for surmounting these barriers, highlighting the potential of deep learning technologies to revolutionize aphasia assessment and treatment. This review not only consolidates current knowledge but also charts a course for future research, emphasizing the need for comprehensive datasets, model optimization, and integration into clinical workflows to enhance patient care. Ultimately, this work underscores the transformative power of deep learning in advancing aphasia diagnosis, treatment, and support, heralding a new era of innovation and interdisciplinary collaboration in addressing this challenging disorder.

1. Introduction

Aphasia is an acquired language disorder typically induced by cerebrovascular accidents, traumatic brain injuries, or other neurological disorders, leading to compromised or absent language acquisition capabilities [1,2]. This condition exerts a pervasive impact on language comprehension and expression, encompassing auditory comprehension, verbal communication, reading, and written expression [3,4]. Relevant studies indicate that individuals with aphasia experience significantly diminished quality of life, with severity levels surpassing even those observed in cancer patients [5].
Nevertheless, conventional aphasia assessment methods are excessively intricate and time-consuming. Many individuals with aphasia, particularly those in the acute phase of post-stroke aphasia, find it challenging to endure this form of language evaluation [6]. Illustratively, consider several commonly utilized aphasia assessment methods, such as the Chinese Rehabilitation Research Center Standard Aphasia Examination (CRRCAE) [7], Aphasia Battery of Chinese (ABC) [8], Boston Diagnostic Aphasia Examination (BDAE) [9], and Western Aphasia Battery (WAB) [10]. Although these instruments have undergone extensive clinical validation and psychological reliability testing, enabling a comprehensive evaluation of a subject’s strengths and weaknesses across various linguistic domains, they necessitate 30 min or even several hours to complete the entire assessment.
Due to the large population base of individuals with aphasia, traditional assessment approaches are time-consuming and labor-intensive, often resulting in a lack of effective guidance for a considerable number of aphasic patients during their speech and language rehabilitation. This situation indirectly contributes to the poor quality of life for individuals with aphasia and their families. Therefore, the development of automated assessment methods to assist persons with aphasia (PWA) is paramount. It can alleviate the burden on families with aphasia patients and speech-language pathologists (SLPs). The integration of machine learning and deep learning methods, as well as their amalgamation with aphasia assessment, provides a foundation for the advancement of automated assessment solutions. Home-based automated language therapy, facilitated by these technologies, eliminates the need for healthcare personnel on site and enables remote diagnosis and treatment. Simultaneously, aphasia assessment methods based on deep learning can assist SLPs in devising individualized treatment plans for patients. Figure 1 is a general framework of aphasia assessment based on deep learning.
A comprehensive systematic literature review on the application of deep learning methodologies in the context of aphasia is imperative for several reasons. Firstly, such a review provides a structured synthesis of existing research, offering insights into the current state of knowledge, methodological approaches, and empirical findings. By focusing on challenges encountered in implementing deep learning techniques for aphasia, this review can elucidate the complexities inherent in this domain, such as the scarcity of annotated datasets, the need for interpretability and explainability of models, and the ethical considerations surrounding data privacy and consent. Furthermore, by delineating future directions, including potential advancements in model architectures, incorporation of multimodal data sources, and refinement of evaluation metrics, this review can guide researchers and practitioners towards novel avenues for innovation and development in this crucial area of study, thereby facilitating enhanced diagnosis, treatment, and support for individuals with aphasia.
In this study, major databases such as Scopus, Web of Science, IEEE Xplore, and PubMed were used to systematically obtain highly relevant papers through advanced queries. Databases that do not support advanced queries (e.g., Google Scholar) were not used [11]. While Google Scholar is well suited to exploratory research, existing studies have not deemed it suitable for systematic literature reviews [11,12]. This study therefore used only primary databases such as Scopus, Web of Science, and PubMed, as suggested by existing literature on systematic review methodology [12]. After obtaining 75 papers from databases and Artificial Intelligence (AI)-based registers (e.g., Litmaps [13]), deduplication and screening were performed. By strictly following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, 28 relevant documents were included in this study. Through an in-depth analysis of these 28 studies, five key categories of challenges in employing deep learning for aphasia were identified. This systematic approach facilitated a comprehensive understanding of the obstacles hindering the effective use of deep learning in addressing aphasia-related issues, serving as a valuable resource for researchers and practitioners aiming to navigate and tackle these challenges in future endeavors.
Utilizing Machine Learning tools such as VosViewer [14], researchers employed co-occurrence analysis to identify patterns of keyword relationships within the literature, revealing thematic clusters indicative of common research themes and focal areas within the field of deep learning for aphasia. Additionally, co-authorship analysis, with authors as the unit of analysis, facilitated the exploration of collaborative networks and the identification of influential researchers and research groups, thereby providing insights into the interdisciplinary nature of research efforts and potential collaborations in this domain. This methodological approach is crucial for understanding the landscape of scholarly contributions, fostering collaboration, and guiding future research directions within the complex and evolving field of deep learning for aphasia.
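The keyword co-occurrence counting that underlies this kind of analysis can be illustrated with a minimal Python sketch. It only counts how often two keywords appear in the same paper; it is not VosViewer's own algorithm, which additionally normalizes and clusters the network. The function name `keyword_cooccurrence` and the three keyword lists are hypothetical and serve purely as an illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears in the same paper."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within one paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists for three papers (illustrative only).
papers = [
    ["Aphasia", "Deep Learning", "Speech Recognition"],
    ["aphasia", "deep learning", "Assessment"],
    ["Aphasia", "Speech Recognition"],
]

cooc = keyword_cooccurrence(papers)
for pair, n in cooc.most_common(3):
    print(pair, n)
```

Pairs with high counts would appear as strongly linked nodes in a co-occurrence map; the same counting scheme applies to co-authorship when author names replace keywords.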
According to the literature and to the best of our knowledge, this is the first systematic review focusing on the deployment of deep learning techniques for aphasia that specifically uses a systematic framework (i.e., PRISMA) and AI-based tools (e.g., Litmaps, VosViewer). The next section (Section 2) describes the systematic literature review methodology. Section 3 then focuses on the challenges, followed by possible solutions in Section 4. Finally, Section 5 discusses the bibliometric analysis.

2. Materials and Methods

This review was registered at the Open Science Framework (OSF) under the name “A Systematic Review of Using Deep learning in Aphasia”. It is publicly accessible at https://osf.io/vsbmj/ (accessed on 25 April 2024). Initially, a comprehensive query framework was devised to encompass all pertinent literature within the domain of “Deep Learning in Aphasia”. As illustrated in Figure 2, seven primary keywords (“Aphasia”, “Deep Learning”, “Voice”, “Speech”, “Recognition”, “Disorder”, and “Assessment”) were interconnected using a combination of “AND” and “OR” logic to refine the search. During implementation, however, the unique guidelines and limitations of each database necessitated tailored query executions for Scopus, Web of Science, PubMed, and IEEE Xplore, as depicted in Table 1. Figure 3 illustrates the outcomes derived from these query implementations.
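To make the combination of “AND” and “OR” logic concrete, the following Python sketch assembles a Boolean query from groups of keywords and wraps it in database-specific field tags. The grouping of the seven keywords shown here is an assumption made for illustration, since the actual grouping is defined in Figure 2, and the two field tags (`TITLE-ABS-KEY` for Scopus and `TS=` for Web of Science) are commonly used ones that may differ from the exact queries in Table 1.

```python
def or_group(terms):
    """Join terms with OR inside parentheses, quoting each phrase."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups):
    """AND together several OR-groups of keywords."""
    return " AND ".join(or_group(g) for g in groups)


# ASSUMPTION: this grouping is hypothetical; the real one is given in Figure 2.
groups = [
    ["Aphasia"],
    ["Deep Learning"],
    ["Voice", "Speech"],
    ["Recognition", "Disorder", "Assessment"],
]
base_query = build_query(groups)

# ASSUMPTION: commonly used field tags, not necessarily those of Table 1.
wrappers = {
    "Scopus": "TITLE-ABS-KEY({q})",
    "Web of Science": "TS=({q})",
}
queries = {db: w.format(q=base_query) for db, w in wrappers.items()}

for db, q in queries.items():
    print(db, "->", q)
```

Keeping the keyword groups separate from the database wrappers means the same logical query can be re-issued on each database with only the field syntax changed.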
Subsequently, all literature retrieved from the four selected databases was examined on the basis of Digital Object Identifiers (DOIs) using Litmaps. Litmaps is a bibliographic analysis tool that seeks to elucidate the interconnections among academic works within a given domain [13]. Employing algorithms and semantic analysis techniques, it operates on the premise that scholarly works are inherently interconnected through thematic, conceptual, and contextual relationships [13]. It thereby offers researchers a means of navigating the scholarly literature and uncovering the connections between ideas that underpin academic discourse in their respective domains. As seen from Figure 3, using the paper by Mahmoud et al. (2021) [15] as the seed paper, a new paper by Ranjith et al. (2003) [16] was suggested by Litmaps. The paper by Ranjith et al. [16] is clearly visible at the bottom right corner of Figure 3. This paper was not located through the advanced database queries conducted on Scopus, Web of Science, IEEE Xplore, and PubMed. Consequently, an exhaustive collection of literature was acquired utilizing Litmaps as an alternative resource, as depicted in Figure 4.
Figure 5 demonstrates the PRISMA methodology that was applied for this study. The PRISMA framework stands as a cornerstone in the methodology of systematic literature reviews, embodying a standardized and rigorous approach to synthesizing evidence from diverse sources [17]. Its significance lies in its provision of a comprehensive and transparent reporting structure, ensuring the methodological rigor and reproducibility of systematic reviews. By delineating explicit criteria for study selection, data extraction, and synthesis, PRISMA serves to mitigate bias and enhance the reliability of review outcomes, thereby fostering confidence in the conclusions drawn from synthesized evidence. Moreover, adherence to PRISMA guidelines facilitates the dissemination of findings to stakeholders, including researchers, policymakers, and practitioners, thereby maximizing the impact and utility of systematic reviews in informing evidence-based decision making across diverse domains of inquiry.
As seen from Figure 5, once 75 articles were found from the databases and registers, each of the records was scrutinized for duplications. All the databases were searched on 15 March 2024 using the advanced queries of Table 1. A total of 32 duplicate records were identified, where identical articles appeared in multiple databases. Upon their removal, 42 unique articles remained. However, not all of these articles fell within the scope of interest for this study. For example, the following papers did not study deep learning:
  • Deep Dyslexia—A Case-Study of Connectionist Neuropsychology
  • Aphasia owing to subcortical brain infarcts in childhood
  • Simulating single word processing in the classic aphasia syndromes based on the Wernicke-Lichtheim-Geschwind theory
  • A proposed reinterpretation and reclassification of aphasic syndromes
  • Dysgraphia in primary progressive aphasia: Characterisation of impairments and therapy options
  • Sibilant Consonants Classification with Deep Neural Networks
Furthermore, the following paper did not pertain to aphasia:
  • Bonato, P., Chen, Y., Chen, F., & Zhang, Y.-T. (2020). Guest editorial flexible sensing and medical imaging for cerebro-cardiovascular health. IEEE Journal of Biomedical and Health Informatics, 24 (11), 3189–3190.
Nevertheless, the aforementioned papers were acquired utilizing the keyword scheme outlined in Figure 2. These records were subsequently excluded during the screening phase. As illustrated in Figure 5, eight papers that did not center on either deep learning or aphasia were filtered out during the initial screening stage, where only the title and abstract of the papers were scrutinized for eligibility. Following the removal of these 8 records, full texts were obtained for 34 records. Through meticulous examination of all 34 full texts, 6 papers were ultimately excluded as they did not align with the scope and objectives of this study. For example, the following two papers were eliminated, as they did not demonstrate any direct usability for aphasia.
  • Zhang, X., Qin, F., Chen, Z., Gao, L., Qiu, G., & Lu, S. (2020). Fast screening for children’s developmental language disorders via comprehensive speech ability evaluation-using a novel deep learning framework. Annals of Translational Medicine, 8 (11). (not related to aphasia)
  • Anjos, I., Cavalheiro Marques, N., Grilo, M., Guimaraes, I., Magalhaes, J., & Cavaco, S. (2020). Sibilant consonants classification comparison with multi- and single-class neural networks. Expert Systems, 37 (6, SI).
Ultimately, this study encompassed only 28 papers, adhering to the rigorous methodology outlined by PRISMA. The utilization of benchmark methodologies such as PRISMA alongside AI-driven tools like Litmaps distinguishes this study from others in the realm of deep learning on aphasia, rendering it scientifically and methodologically sound, robust, and comprehensive.
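The duplicate-removal step of the identification phase, in which identical articles retrieved from several databases are collapsed into one record, can be sketched in Python as follows. This is a minimal illustration that assumes each record carries a DOI; the record structure, the helper names, and the example DOIs are hypothetical and do not correspond to the records of this review.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def deduplicate(records):
    """Split records into first occurrences and later duplicates, keyed by DOI."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        key = normalise_doi(rec["doi"])
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates


# Hypothetical records; the DOIs below are placeholders, not real papers.
records = [
    {"source": "Scopus", "doi": "10.1000/example.1"},
    {"source": "PubMed", "doi": "https://doi.org/10.1000/EXAMPLE.1"},
    {"source": "Web of Science", "doi": "doi:10.1000/example.1"},
    {"source": "IEEE Xplore", "doi": "10.1000/example.2"},
]

unique, duplicates = deduplicate(records)
print(len(unique), "unique,", len(duplicates), "duplicates")
```

Normalizing the identifier before comparison matters in practice, because different databases export the same DOI with different prefixes and capitalization.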

3. Challenges

Once the 28 papers were obtained, they were scrutinized for common themes and concepts in terms of challenges. Figure 6 shows these challenges along with their main causes.

3.1. High Variability in Speech Patterns

Individuals with aphasia exhibit a wide range of speech patterns, including paraphasic errors, neologisms, revisions, and agrammatism. This variability poses a significant challenge for automatic speech recognition (ASR) systems, making it difficult to accurately recognize and assess speech performance.

3.1.1. Linguistic Diversity

Aphasic speech encompasses a wide range of linguistic abnormalities, including paraphasic errors, neologisms, and agrammatism [18,19]. This diversity poses a significant challenge for automatic speech recognition systems, as they must be trained to recognize and interpret these various linguistic features accurately. Additionally, the linguistic characteristics of aphasic speech may vary greatly among individuals, making it difficult to develop a one-size-fits-all solution [19].

3.1.2. Individual Variability

Speech patterns can vary significantly among individuals with aphasia, even those with similar clinical diagnoses [20]. Factors such as the type and severity of aphasia, cognitive abilities, and other individual differences contribute to this variability [21]. As a result, automatic speech recognition systems must be robust enough to adapt to these individual differences and accurately assess speech performance across a diverse population of individuals with aphasia [20,21].

3.2. Complexity of Speech Recognition

The complexity of speech recognition within aphasia involves the need to recognize and appropriately account for various error types and linguistic characteristics specific to aphasic speech, such as paraphasic errors, neologisms, greater pause times, and agrammatism.

3.2.1. Linguistic Abnormalities

Aphasic speech often contains linguistic abnormalities such as paraphasic errors (substitution, addition, or omission of sounds or words), neologisms (novel or nonsensical words), and agrammatism (difficulty with grammar and sentence structure) [18,19]. These abnormalities pose challenges for automatic speech recognition systems, as they must be able to accurately recognize and interpret these linguistic features to assess speech performance effectively [20]. Incorporating algorithms capable of handling such linguistic complexity is essential for improving recognition accuracy [18,20,21,22].

3.2.2. Speech Characteristics

The unique characteristics of aphasic speech, such as longer pause times, reduced speech fluency, and distorted articulation, further complicate speech recognition [23]. These characteristics can vary widely among individuals with aphasia and may change over time, making it challenging to develop a one-size-fits-all solution. Addressing these challenges requires the development of sophisticated algorithms capable of capturing and interpreting the nuanced features of aphasic speech accurately [16,23].

3.3. Data Availability and Quality

Adequate data, particularly accurately annotated datasets, are crucial for training deep learning models effectively. However, there is a challenge in obtaining large and diverse datasets that adequately represent the variability in aphasic speech patterns, leading to difficulties in training robust models.

3.3.1. Limited Annotated Data

Obtaining large-scale, accurately annotated datasets of aphasic speech is challenging due to the time-consuming nature of manual annotation and the limited availability of such data [15,16,19,24,25]. Without sufficient data for training, deep learning models may struggle to generalize effectively across different types and severities of aphasia, leading to reduced performance and reliability in real-world applications [22,26]. Improving access to annotated datasets and developing techniques for efficient data annotation are essential for advancing research in this area [15,16,27,28,29,30,31,32,33,34].

3.3.2. Representation Variability

Aphasic speech exhibits considerable variability in its representation, including differences in speech characteristics, linguistic abnormalities, and individual variations [19]. Capturing this variability accurately in training data is crucial for developing robust and generalizable deep learning models. However, achieving adequate representation of this variability in datasets can be challenging, particularly given the limited availability of annotated data [19]. Addressing this challenge requires careful consideration of data collection methods, sample diversity, and data augmentation techniques to ensure that deep learning models can effectively capture and generalize from the full range of aphasic speech characteristics.

3.4. Model Complexity and Computational Efficiency

Balancing model complexity with computational efficiency is a significant challenge, especially when aiming for real-time applications or working with limited data. Finding the right balance is essential for optimizing both accuracy and efficiency in deep learning models.

3.4.1. Optimal Model Complexity

Balancing model complexity with computational efficiency is essential for developing effective deep learning solutions for aphasia [25,26]. While complex models may offer superior performance in recognizing the nuanced features of aphasic speech, they often come with increased computational costs and resource requirements, making them impractical for real-time applications or deployment on resource-constrained devices. Finding the right balance between model complexity and computational efficiency is crucial for developing scalable and deployable deep learning solutions that can meet the computational demands of real-world use cases [16,26].

3.4.2. Real-Time Deployment

Achieving real-time processing capabilities is critical for many applications in aphasia research, particularly those involving interactive therapy or clinical assessment tools [20,21]. However, the latency introduced by complex deep learning models and processing pipelines can hinder real-time performance, leading to delays in system response times and compromising the user experience [26]. Addressing latency challenges requires optimizing model architectures, streamlining processing pipelines, and leveraging hardware acceleration techniques to minimize processing times and achieve real-time performance without sacrificing recognition accuracy or reliability [16,21,26].
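One common way to quantify whether a recognition pipeline keeps up with incoming speech is the real-time factor (RTF), defined as processing time divided by audio duration, where values below 1 indicate faster-than-real-time processing. The Python sketch below measures it for a placeholder function; the `dummy_recogniser`, the 16 kHz sampling rate, and the two-second input are assumptions made for illustration and do not represent any system from the reviewed studies.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1 keeps up with speech."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def dummy_recogniser(samples):
    # Placeholder for a real model: here it only sums the samples.
    return sum(samples)


sample_rate = 16000                    # assumed sampling rate in Hz
audio = [0.0] * (2 * sample_rate)      # two seconds of silence
_, elapsed = timed(dummy_recogniser, audio)
rtf = real_time_factor(elapsed, len(audio) / sample_rate)
print(f"RTF = {rtf:.4f}")
```

Replacing `dummy_recogniser` with an actual model call would give a first estimate of whether a given architecture is viable for interactive use on the target hardware.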

3.5. Integration with Clinical Workflows

Deploying deep learning solutions effectively into clinical workflows presents challenges, including ensuring user-friendly interfaces and actionable insights for clinicians without extensive technical expertise [22]. Additionally, the latency in real-time deployment and the need for seamless integration with existing clinical practices pose further challenges [22].

3.5.1. Usability Concerns

Ensuring that deep learning solutions are user-friendly and seamlessly integrated into clinical workflows is essential for their adoption and effectiveness in real-world settings [22,29]. Clinicians may have varying levels of technical expertise and familiarity with technology, so designing intuitive interfaces and workflows that align with existing clinical practices is critical for facilitating adoption and usability [16]. Additionally, addressing usability concerns requires considering factors such as workflow integration, user training, and support mechanisms to ensure that deep learning solutions can be effectively integrated into clinical practice and used to augment existing diagnostic and therapeutic processes [16,22].

3.5.2. Latency Challenges

Minimizing latency in deep learning solutions is crucial for providing timely and actionable insights to clinicians during patient assessment and therapy sessions. However, achieving low latency in real-world applications can be challenging, particularly when processing large volumes of data or implementing complex algorithms [20]. Addressing latency challenges requires optimizing processing pipelines, leveraging parallelization and distributed computing techniques, and prioritizing computational efficiency to minimize processing times and ensure timely delivery of results [21]. By minimizing latency, deep learning solutions can provide clinicians with real-time feedback and support decision-making processes during patient care [20,21,26].
Table 2 demonstrates how the above challenges and their causes impacted existing literature.

3.6. Algorithmic Challenges and Future Direction

Aphasia, a language disorder resulting from brain damage, poses significant challenges in accurate assessment and diagnosis due to the diverse manifestations of speech abnormalities among individuals. Machine learning and deep learning techniques offer promising avenues for automating aphasia assessment, leveraging algorithms to analyze speech patterns and neuroimaging data to provide valuable insights into language function and impairment severity. Table 3 highlights the diverse array of machine learning and deep learning methods employed in aphasia research and their respective contributions to advancing our understanding and management of this complex disorder.
The current research landscape in deep learning for aphasia assessment faces several algorithmic challenges that impact the development and deployment of effective solutions. One prominent challenge is the high variability in speech patterns exhibited by individuals with aphasia, as shown in Figure 6. This variability encompasses a wide range of linguistic abnormalities, including paraphasic errors, neologisms, and agrammatism. Such diversity poses significant hurdles for ASR systems, necessitating robust algorithms capable of accurately interpreting and assessing these various linguistic features. Moreover, individual variability further complicates speech recognition, as speech patterns can vary significantly among individuals with similar clinical diagnoses due to factors such as aphasia type, severity, and cognitive abilities. Addressing these challenges requires the development of adaptable algorithms capable of accommodating individual differences and accurately assessing speech performance across diverse populations of individuals with aphasia.
In addition to the complexity of speech recognition, the availability and quality of data present significant challenges (see Figure 6). Obtaining large-scale, accurately annotated datasets of aphasic speech remains challenging, primarily due to the time-consuming nature of manual annotation and the limited availability of such data. Limited annotated data impede the training of deep learning models, leading to reduced generalization and performance in real-world applications. Overcoming this challenge necessitates improving access to annotated datasets and developing efficient data annotation techniques. Furthermore, capturing the variability in aphasic speech representation accurately in training data is crucial for developing robust and generalizable deep learning models. Achieving adequate representation of this variability in datasets requires careful consideration of data collection methods, sample diversity, and data augmentation techniques.
Looking towards future directions, advancements in RNNs and CNNs are expected to continue driving progress in aphasia assessment. These architectures excel at capturing temporal dependencies and extracting features from sequential data like speech utterances. Moreover, techniques such as transfer learning and synthetic data generation hold promise for addressing data scarcity issues, enabling more efficient utilization of available data. Enhancing model interpretability through the incorporation of attention mechanisms and post hoc explanation methods is also crucial for fostering clinical acceptance and trust. Additionally, efforts to optimize model complexity while ensuring computational efficiency will facilitate real-time deployment of deep learning solutions in clinical workflows, supporting interactive therapy and assessment tools. Overall, future research endeavors should focus on overcoming existing algorithmic challenges while leveraging emerging trends to improve the accuracy, reliability, and usability of aphasia assessment systems in clinical practice.

4. Possible Solutions

By critically analyzing the five major challenges and their root causes, corresponding solutions (both theoretical and practical) were hypothesized. This is visually depicted in Figure 7.

4.1. Solution for High Variability in Speech Patterns

4.1.1. Develop Deep Learning Models Capable of Handling Diverse Speech Patterns and Linguistic Errors

Research efforts can focus on designing neural network architectures that are robust to the variations in speech patterns commonly observed in individuals with aphasia. This may involve incorporating attention mechanisms or hierarchical modeling to capture and adapt to different linguistic characteristics. The studies in [20,21,23,31] focus on developing deep learning models tailored to the high variability in speech patterns among individuals with aphasia. By designing models specifically adapted to the unique characteristics of aphasic speech, such as paraphasic errors and neologisms, researchers aim to improve the accuracy and robustness of automatic speech recognition systems for aphasia assessment.
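As a simple illustration of how an attention mechanism can summarize utterances of different lengths, the following Python sketch computes attention pooling over a sequence of frame-level feature vectors. It is a generic, minimal example and not the architecture of any of the cited studies; in a trained model the `query` vector would be learned, whereas here it is fixed by hand, and the feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weighted average of frame vectors with weights softmax(frame . query)."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(len(query))
    ]
    return pooled, weights


# Hypothetical 2-dimensional frame features for two utterances of unequal length.
query = [1.0, 0.0]                      # would be learned in a real model
short_utt = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
long_utt = [[0.5, 0.5]] * 7

pooled_short, weights_short = attention_pool(short_utt, query)
pooled_long, weights_long = attention_pool(long_utt, query)
print(pooled_short, pooled_long)
```

Both utterances are mapped to a vector of the same dimension regardless of their length, and frames that score higher against the query contribute more to the summary.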

4.1.2. Implement Data Augmentation Techniques to Artificially Increase the Variability in Training Data

By augmenting existing datasets with synthetic variations of aphasic speech patterns, such as paraphasic errors or neologisms, deep learning models can be trained on a more diverse range of inputs, thereby improving their generalization performance. The utilization of data augmentation techniques helps address the challenge of limited annotated data by artificially increasing the size and diversity of the training dataset [24]. By augmenting existing data with variations and distortions, researchers can train more robust deep learning models capable of generalizing better to unseen examples of aphasic speech.
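A few signal-level augmentations of this kind can be sketched in plain Python. The example below applies additive noise, speed perturbation, and pause insertion to a toy signal; these are generic audio augmentations chosen for illustration and are not claimed to be the specific techniques used in [24]. The function names, parameter values, and the ramp signal standing in for audio are all assumptions.

```python
import random


def add_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    if not signal or factor <= 0:
        raise ValueError("signal must be non-empty and factor positive")
    n_out = max(1, int(len(signal) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def insert_pause(signal, position, pause_len):
    """Insert pause_len zero samples at the given index."""
    return signal[:position] + [0.0] * pause_len + signal[position:]


rng = random.Random(0)                       # fixed seed for repeatability
signal = [float(i) for i in range(100)]      # toy ramp standing in for audio
augmented = [
    add_noise(signal, 0.01, rng),
    change_speed(signal, 2.0),
    change_speed(signal, 0.5),
    insert_pause(signal, 50, 20),
]
print([len(a) for a in augmented])
```

Each original recording thus yields several training variants; pause insertion in particular mimics the longer pause times that characterize aphasic speech.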

4.2. Complexity of Speech Recognition

4.2.1. Design Architectures Specifically Tailored to Recognize and Account for Aphasic Speech Characteristics

Researchers can develop specialized neural network architectures optimized for the unique features of aphasic speech, such as RNNs augmented with gating mechanisms to handle variable-length sequences and linguistic abnormalities.
Studies in [18,19] focus on designing architectures specifically tailored for recognizing and assessing aphasic speech characteristics. By leveraging architectures optimized for handling the complexity of aphasia, such as RNNs and CNNs, researchers aim to improve the accuracy and efficiency of speech recognition systems for aphasia assessment.

4.2.2. Train Models on Large, Diverse Datasets That Encompass a Wide Range of Aphasic Speech Patterns

Acquiring and annotating comprehensive datasets containing diverse examples of aphasic speech can facilitate the training of deep learning models capable of accurately recognizing and interpreting the complex speech patterns associated with aphasia. The utilization of large, diverse datasets ensures that deep learning models are trained on a representative sample of aphasic speech patterns [22]. By training models on comprehensive datasets covering the variability in aphasic speech characteristics, researchers can improve the generalization and robustness of automatic speech recognition systems for aphasia assessment.

4.3. Data Availability and Quality

4.3.1. Investigate Techniques for Semi-Supervised and Unsupervised Learning to Make the Most of Limited Annotated Data

Semi-supervised or unsupervised learning approaches can help mitigate the reliance on large annotated datasets by exploiting unlabeled or partially labeled data to improve model performance. The study in [26] explored the use of unsupervised and semi-supervised learning techniques to address the challenge of limited annotated data. By using techniques that can learn from unlabeled or partially labeled data, researchers aim to improve the efficiency of model training and enhance the performance of automatic speech recognition systems for aphasia assessment.
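Self-training (pseudo-labeling) is one generic semi-supervised technique that illustrates how unlabeled data can be exploited. The Python sketch below uses a nearest-centroid classifier and only accepts a pseudo-label when the nearest class is closer than the runner-up by a chosen margin. It is a minimal sketch under stated assumptions and is not claimed to be the method of [26]; the class names "A" and "B", the feature vectors, and the `margin` value are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def self_train(labeled, unlabeled, margin=1.0, rounds=5):
    """Iteratively pseudo-label confident points (requires at least 2 classes)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        classes = sorted({y for _, y in labeled})
        cents = {c: centroid([x for x, y in labeled if y == c]) for c in classes}
        remaining, added = [], 0
        for x in pool:
            ranked = sorted(classes, key=lambda c: distance(x, cents[c]))
            best, runner_up = ranked[0], ranked[1]
            gap = distance(x, cents[runner_up]) - distance(x, cents[best])
            if gap >= margin:            # confident: accept the pseudo-label
                labeled.append((x, best))
                added += 1
            else:                        # ambiguous: keep for a later round
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
    return labeled, pool


# Hypothetical 2-D features: two labeled samples and three unlabeled ones.
labeled = [([0.0, 0.0], "A"), ([4.0, 4.0], "B")]
unlabeled = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]]

labeled_out, pool_out = self_train(labeled, unlabeled)
print(len(labeled_out), "labeled;", len(pool_out), "left unlabeled")
```

The ambiguous point midway between the two classes is deliberately left unlabeled, which reflects the usual design choice of trading coverage for pseudo-label reliability.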

4.3.2. Collaborate with Speech-Language Pathologists to Annotate Data Accurately and Ensure Its Clinical Relevance

Close collaboration with domain experts can ensure that annotated datasets accurately reflect the linguistic characteristics and diagnostic criteria relevant to aphasia, enhancing the quality and clinical utility of deep learning models trained on such data. Collaborating with speech-language pathologists ensures accurate annotation of aphasic speech data, addressing the challenge of obtaining accurately annotated datasets [29]. By involving domain experts in the annotation process, researchers can ensure that deep learning models are trained on high-quality data that capture the nuances of aphasic speech patterns effectively.

4.4. Model Complexity and Computational Efficiency

4.4.1. Research Methods for Optimizing Models to Balance Complexity and Computational Efficiency

Exploring techniques such as model pruning, quantization, or knowledge distillation can help reduce the computational complexity of deep learning models while preserving their predictive performance, making them more suitable for resource-constrained environments or real-time applications. Balancing model complexity and computational efficiency is crucial for deploying deep learning models in real-world applications, such as aphasia assessment [25]. By optimizing model architectures and training procedures to strike the right balance between complexity and efficiency, researchers can develop models that are both accurate and computationally efficient, making them suitable for real-time deployment in clinical settings.
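As a rough illustration of two of the techniques named above, the sketch below applies magnitude pruning followed by symmetric 8-bit quantization to a flat list of weights. The function names, the 50% sparsity level, and the weight values are assumptions made for the example; production systems would normally rely on the tooling of their deep learning framework.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return [int(round(w / scale)) for w in weights], scale


def dequantize(q, scale):
    return [v * scale for v in q]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 1.27, -0.4]  # made-up values
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning removes the four smallest-magnitude weights, and each remaining weight is stored as one signed byte plus a shared scale, with a rounding error of at most half a quantization step. Whether accuracy survives such compression has to be checked empirically for each model.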

4.4.2. Utilize Model Compression and Quantization Techniques to Reduce the Computational Requirements of Deep Learning Models

Techniques such as parameter sharing, low-rank factorization, or quantization can help compress the size of neural network models and reduce memory and computational overhead, enabling their deployment on resource-limited devices or platforms. Model compression and quantization techniques help reduce the computational resources required for deploying deep learning models, addressing the challenge of limited computational resources in clinical settings [34]. By compressing and quantizing model parameters, researchers can develop lightweight models that can be deployed on resource-constrained devices without sacrificing performance.
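Low-rank factorization can be illustrated with the simplest possible case, a rank-1 approximation computed by power iteration. This is a sketch under stated assumptions: the helper names are hypothetical, the example matrix is constructed to be exactly rank 1, and real weight matrices generally need a higher rank (for example via a truncated singular value decomposition) to keep accuracy.

```python
import math


def rank1_factor(W, iters=50):
    """Approximate W (m x n) by the outer product of u and v via power iteration."""
    m, n = len(W), len(W[0])
    v = [1.0] * n  # starting vector; assumed not orthogonal to the dominant direction
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm_u for x in u]
        # v absorbs the scale, so no separate singular value is stored.
        v = [sum(W[i][j] * u[i] for i in range(m)) for j in range(n)]
    return u, v


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


# A 4 x 4 matrix built as an outer product, so it is exactly rank 1.
rows, cols = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.5]
W = outer(rows, cols)
u, v = rank1_factor(W)
W_approx = outer(u, v)

params_full = len(W) * len(W[0])   # 16 stored values
params_low_rank = len(u) + len(v)  # 8 stored values
max_error = max(abs(a - b) for ra, rb in zip(W, W_approx) for a, b in zip(ra, rb))
```

Storing the two factors instead of the full matrix halves the parameter count in this toy case. The saving grows with matrix size, at the cost of an approximation error whenever the original matrix is not truly low rank.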

4.5. Integration with Clinical Workflows

4.5.1. Explore Methods for Integrating Solutions Seamlessly into Existing Clinical Workflows

Developing user-friendly interfaces and visualization tools that align with existing clinical practices can facilitate the adoption and integration of deep learning solutions into routine clinical assessments and interventions. Developing user-friendly interfaces and integrating solutions seamlessly into clinical workflows ensure that deep learning-based tools are accessible and usable by clinicians [16]. By designing interfaces that are intuitive and easy to use, researchers can facilitate the adoption of deep learning technologies in clinical settings, enabling clinicians to leverage these tools effectively for aphasia assessment.

4.5.2. Develop User-Friendly Interfaces and Visualization Tools to Present Model Outputs in a Clinically Meaningful Manner

Designing intuitive interfaces that provide interpretable and actionable insights derived from deep learning models can enhance their usability and acceptance among clinicians, ultimately improving patient care and outcomes. Integrating solutions seamlessly into clinical workflows ensures that deep learning-based tools are effectively incorporated into existing clinical practices [27]. By developing solutions that seamlessly integrate with clinical workflows and provide actionable insights for clinicians, researchers can enhance the utility and impact of deep learning technologies in aphasia assessment and treatment.

5. Discussion on Bibliometric Analysis

Bibliometric analysis, coupled with co-occurrence analysis of keywords, plays a pivotal role in elucidating the landscape of scholarly research within a particular field. By systematically analyzing the frequency and patterns of keyword co-occurrence across a corpus of academic literature, researchers can gain valuable insights into the prevailing themes, trends, and relationships within the field. This approach enables the identification of key concepts, emerging topics, and influential research directions, facilitating the mapping of knowledge domains and the exploration of interdisciplinary connections. By harnessing the power of bibliometric analysis and keyword co-occurrence, researchers can uncover hidden patterns, inform strategic decision-making, and advance scholarly discourse within their respective fields.
As seen from Figure 8, VOSviewer [14] was used to identify clusters of co-occurring keywords. Co-occurrence analysis with keywords in VOSviewer involves first constructing a co-occurrence matrix in which each cell represents how often two keywords appear together in the analyzed documents. This matrix is often normalized to account for variations in keyword frequency. Next, clustering algorithms are applied to group keywords with similar co-occurrence patterns into clusters, facilitating the identification of thematic relationships within the literature. The resulting clusters are visualized in a two-dimensional space, with keywords within the same cluster positioned closer together. Interpretation of these clusters offers insights into the underlying research landscape, revealing distinct themes, topics, and subfields present in the analyzed literature. Of the 12 clusters identified in Figure 8, cluster 9 contained the “aphasia” keyword, which had 16 occurrences and 317 links. However, the highest number of links was found for “deep learning”. As seen from Figure 9, the “deep learning” keyword had a total link strength of 452, higher than that of “aphasia” (404). In other words, within the reviewed literature, “deep learning” co-occurred with other keywords more often than “aphasia” did. Figure 9a,b also shows the keywords most likely to appear with “aphasia” and “deep learning”, respectively.
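The quantities discussed above (occurrences, links, and total link strength) can be computed from keyword lists as in the following sketch. The three keyword lists are invented for illustration, and the normalization shown is an association-strength-style ratio; VOSviewer's exact normalization, clustering, and layout steps are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count keyword occurrences and pairwise co-occurrences across documents."""
    occ, pair = Counter(), Counter()
    for keywords in docs:
        kws = sorted(set(k.lower() for k in keywords))
        occ.update(kws)
        pair.update(combinations(kws, 2))
    return occ, pair


def link_stats(pair, keyword):
    """Return (number of links, total link strength) for one keyword."""
    strengths = [c for (a, b), c in pair.items() if keyword in (a, b)]
    return len(strengths), sum(strengths)


def association_strength(occ, pair, a, b):
    """Co-occurrence count divided by the product of the two occurrence counts."""
    a, b = sorted((a, b))
    return pair[(a, b)] / (occ[a] * occ[b])


# Hypothetical author keywords from three documents.
docs = [
    ["Aphasia", "Deep Learning", "Speech Recognition"],
    ["Deep Learning", "Speech Recognition", "CNN"],
    ["Aphasia", "Deep Learning"],
]
occ, pair = cooccurrence(docs)
aphasia_links = link_stats(pair, "aphasia")
deep_learning_links = link_stats(pair, "deep learning")
strength = association_strength(occ, pair, "aphasia", "deep learning")
```

In this toy corpus, "deep learning" has three links with a total link strength of 5, against two links and a strength of 3 for "aphasia". A keyword can therefore have a higher link strength simply because it co-occurs with more terms, independently of how many documents mention it.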
Similarly, co-authorship analysis with author as the unit of analysis, where one author has at least two articles, reveals four distinct clusters with 19 authors in total. Figure 10a shows the links among these four groups of researchers (i.e., four clusters) with 50 links among them. Figure 10b shows John Fang along with his co-authors S. S. Mahmoud and others at Shantou University. Figure 10c demonstrates the co-authorship links for D.S. Barbera. Undertaking bibliographic analytics such as the one shown in Figure 10, particularly through the lens of co-authorship analysis with the author as the unit of analysis, yields multifaceted benefits that extend across scholarly inquiry, institutional assessment, and research strategy formulation. By scrutinizing patterns of collaboration among authors within a given field or discipline, this approach offers insights into the dynamics of knowledge production, highlighting prolific partnerships, emergent research clusters, and influential hubs of expertise. Such analyses enable the identification of collaborative networks and the assessment of their impact, thereby elucidating the social structures that underpin scholarly communities. Additionally, co-authorship analysis facilitates the identification of key players and potential collaborators, fostering interdisciplinary engagement and knowledge exchange. Moreover, from an institutional standpoint, this approach enables the evaluation of research productivity, collaboration trends, and interdisciplinary engagement, thereby informing strategic decision-making and resource allocation. In summary, co-authorship analysis within bibliographic analytics affords researchers, institutions, and policymakers a nuanced understanding of scholarly collaboration dynamics, thereby facilitating collaboration, innovation, and the advancement of knowledge.
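A co-authorship network with a minimum of two articles per author can be assembled as sketched below. The author labels are placeholders rather than the authors in Figure 10, and groups are found here as connected components, which is a simpler assumption than the clustering used by VOSviewer.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers, min_docs=2):
    """Keep authors with at least `min_docs` papers and count their shared papers."""
    counts = Counter(a for authors in papers for a in set(authors))
    keep = {a for a, n in counts.items() if n >= min_docs}
    links = Counter()
    for authors in papers:
        links.update(combinations(sorted(set(authors) & keep), 2))
    return keep, links


def components(nodes, links):
    """Group authors that are connected through any chain of co-authorship links."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(adj[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Placeholder author lists for five papers.
papers = [["A", "B", "C"], ["A", "B"], ["D", "E"], ["D", "E", "F"], ["C", "G"]]
authors, links = coauthor_network(papers)
groups = components(authors, links)
```

Authors F and G are dropped because each appears on only one paper, leaving two groups of collaborators. Raising `min_docs` trades coverage for a clearer view of sustained collaborations.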

6. Conclusions

The systematic literature review on deploying deep learning in aphasia has made significant strides in advancing our understanding of how technology can be leveraged to improve assessment and treatment for individuals with this language disorder. This review illuminated the intricate challenges that researchers face, such as the high variability in speech patterns, the complexity of speech recognition, and the scarcity of annotated datasets. However, it also showcased innovative solutions like the development of specialized deep learning models, data augmentation techniques, and collaborative efforts for dataset annotation that are paving the way for more effective interventions.
One of the key achievements of this body of work is the identification and detailed examination of the multifaceted challenges in applying deep learning to aphasia. This study identified five core challenges by critically reviewing 28 relevant publications. This has not only increased awareness of the specific needs in this area but has also laid a solid foundation for future research endeavors. The collaborative efforts highlighted, particularly in data annotation and model optimization, demonstrate the critical role of interdisciplinary approaches in overcoming these obstacles.
Despite these advancements, the review also sheds light on the limitations inherent in the current research landscape. The scarcity of large, diverse, and accurately annotated datasets remains a significant hurdle, limiting the generalizability and effectiveness of deep learning models. Additionally, the balance between model complexity and computational efficiency continues to be a critical issue, especially for real-time applications in clinical settings.
Looking ahead, the future of deep learning in aphasia treatment and assessment is bright but requires focused efforts on several fronts. Enhancing data availability through collaborative projects and leveraging advanced techniques for semi-supervised learning could address data scarcity and quality issues. Continued innovation in model architecture and optimization will also be essential for developing solutions that are not only effective but also practical for clinical deployment. Moreover, integrating these technological solutions into clinical workflows with user-friendly interfaces will be crucial for their adoption and impact on patient care.
In summary, this systematic literature review has highlighted both the progress and challenges in the field, providing a roadmap for future research. By continuing to build on the foundations laid by this work, there is potential for significant advancements in the diagnosis, treatment, and support for individuals with aphasia, ultimately enhancing their quality of life and communication abilities.

Author Contributions

Conceptualization, F.S. and S.S.M.; methodology, F.S.; software, F.S.; validation, Q.F., Q.F. and S.S.M.; formal analysis, F.S.; investigation, F.S., Q.F. and S.S.M.; resources, S.S.M.; data curation, F.S.; writing—original draft preparation, F.S.; writing—review and editing, S.S.M., F.S., Y.W. and W.C.; visualization, F.S.; supervision, Q.F.; project administration, Q.F. and S.S.M.; funding acquisition, Q.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the 2020 Li Ka Shing Foundation Cross-disciplinary Research Grant (Ref: 2020LKSFG04C).

Data Availability Statement

The bibliographic data are available as a Research Information System (*.ris) file.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Luria, A.R. Higher Cortical Functions in Man; Basic Books: New York, NY, USA, 1966.
  2. Shinn, P.; Blumstein, S.E. Phonetic Disintegration in Aphasia: Acoustic Analysis of Spectral Characteristics for Place of Articulation. Brain Lang. 1983, 20, 90–114.
  3. Davis, G.A. Aphasiology: Disorders and Clinical Practice, 2nd ed.; Pearson: London, UK, 2007.
  4. Estabrooks, N.H.; Albert, M.L.; Nicholas, M. Manual of Aphasia and Aphasia Therapy, 3rd ed.; Pro-Ed: Austin, TX, USA, 2013.
  5. Lam, J.M.C.; Wodchis, W.P. The Relationship of 60 Disease Diagnoses and 15 Conditions to Preference-Based Health-Related Quality of Life in Ontario Hospital-Based Long-Term Care Residents. Med. Care 2010, 48, 380–387.
  6. Marshall, R.C.; Wright, H.H. Developing a Clinician-Friendly Aphasia Test. Am. J. Speech Lang. Pathol. 2007, 16, 295–315.
  7. Li, S.; Xiao, L.; Tian, H. Development and Norms of the Chinese Standard Aphasia Examination. Chin. J. Rehabil. Theory Pract. 2000, 6, 162–164.
  8. Gao, S. Aphasia, 2nd ed.; Peking University Medical Press: Beijing, China, 2006.
  9. Goodglass, E.; Caplan, E. Boston Diagnostic Aphasia Examination; Lea and Febiger: Philadelphia, PA, USA, 1983.
  10. Kertesz, A. Western Aphasia Battery-Revised. APA PsycTests 2007.
  11. Halevi, G.; Moed, H.; Bar-Ilan, J. Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—Review of the Literature. J. Informetr. 2017, 11, 823–834.
  12. Gusenbauer, M.; Haddaway, N.R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 2020, 11, 181–217.
  13. Kaur, A.; Gulati, S.; Sharma, R.; Sinhababu, A.; Chakravarty, R. Visual citation navigation of open education resources using Litmaps. Libr. Hi Tech News 2022, 39, 7–11.
  14. Waltman, L.; van Eck, N.J.; Noyons, E.C.M. A unified approach to mapping and clustering of bibliometric networks. J. Informetr. 2010, 4, 629–635.
  15. Mahmoud, S.S.; Kumar, A.; Li, Y.; Tang, Y.; Fang, Q. Performance evaluation of machine learning frameworks for aphasia assessment. Sensors 2021, 21, 2582.
  16. Ranjith, R.; Chandrasekar, A. GTSO: Gradient tangent search optimization enabled voice transformer with speech intelligibility for aphasia. Comput. Speech Lang. 2024, 84, 101568.
  17. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 2010, 8, 336–341.
  18. Adikari, A.; Hernandez, N.; Alahakoon, D.; Rose, M.L.; Pierce, J.E. From concept to practice: A scoping review of the application of AI to aphasia diagnosis and management. Disabil. Rehabil. 2023, 46, 1288–1297.
  19. Day, M.; Dey, R.K.; Baucum, M.; Paek, E.J.; Park, H.; Khojandi, A. Predicting Severity in People with Aphasia: A Natural Language Processing and Machine Learning Approach. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual, 1–5 November 2021; IEEE: New York, NY, USA, 2021; pp. 2299–2302.
  20. Barbera, D.S.; Huckvale, M.; Fleming, V.; Upton, E.; Coley-Fisher, H.; Shaw, I.; Latham, W.; Leff, A.P.; Crinion, J. An utterance verification system for word naming therapy in Aphasia. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Shanghai, China, 25–29 October 2020; International Speech Communication Association: Washington, DC, USA, 2020; pp. 706–710.
  21. Barbera, D.S.; Huckvale, M.; Fleming, V.; Upton, E.; Coley-Fisher, H.; Doogan, C.; Shaw, I.; Latham, W.; Leff, A.P.; Crinion, J. NUVA: A Naming Utterance Verifier for Aphasia Treatment. Comput. Speech Lang. 2021, 69, 101221.
  22. Herath, H.M.D.P.M.; Weraniyagoda, W.A.S.A.; Rajapaksha, R.T.M.; Wijesekara, P.A.D.S.N.; Sudheera, K.L.K.; Chong, P.H.J. Automatic Assessment of Aphasic Speech Sensed by Audio Sensors for Classification into Aphasia Severity Levels to Recommend Speech Therapies. Sensors 2022, 22, 6966.
  23. Jothi, K.R.; Mamatha, V.L. A systematic review of machine learning based automatic speech assessment system to evaluate speech impairment. In Proceedings of the 3rd International Conference on Intelligent Sustainable Systems, ICISS 2020, Thoothukudi, India, 3–5 December 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 175–185.
  24. Fernandes, R.; Huang, L.; Vejarano, G. Non-audible speech classification using deep learning approaches. In Proceedings of the 6th Annual Conference on Computational Science and Computational Intelligence, CSCI 2019, Las Vegas, NV, USA, 5–7 December 2019; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2019; pp. 630–634.
  25. Li, H.; Tang, C.; Vishwakarma, S.; Ge, Y.; Li, W. Speaker identification using Ultra-Wideband measurement of voice. IET Radar Sonar Navig. 2024, 18, 266–276.
  26. Joshi, A.; Bagate, R.; Hambir, Y.; Sapkal, A.; Sable, N.P.; Lonare, M. System for Detection of Specific Learning Disabilities Based on Assessment. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 362–368.
  27. Krishna, G.; Carnahan, M.; Shamapant, S.; Surendranath, Y.; Jain, S.; Ghosh, A.; Tran, C.; Del R Millan, J.; Tewfik, A.H. Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Virtual, 1–5 November 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021; pp. 6008–6014.
  28. Kumar, A.; Mahmoud, S.S.; Wang, Y.; Faisal, S.; Fang, Q. A Comparison of Time-Frequency Distributions for Deep Learning-Based Speech Assessment of Aphasic Patients. In Proceedings of the International Conference on Human System Interaction, HSI, Melbourne, Australia, 28–31 July 2022; IEEE Computer Society: Piscataway, NJ, USA, 2022.
  29. Ortiz-Perez, D.; Ruiz-Ponce, P.; Rodríguez-Juan, J.; Tomás, D.; Garcia-Rodriguez, J.; Nalepa, G.J. Deep Learning-Based Emotion Detection in Aphasia Patients. In Lecture Notes in Networks and Systems; Bringas, P.G., García, H.P., de Pisón, F.J.M., Álvarez, F.M., Lora, A.T., Herrero, Á., Rolle, J.L.C., Quintián, H., Corchado, E., Eds.; Springer Science and Business Media Deutschland GmbH: Berlin, Germany, 2023; pp. 195–204.
  30. Qin, Y.; Wu, Y.; Lee, T.; Kong, A.P.H. An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia. J. Signal Process. Syst. Signal Image Video Technol. 2020, 92, 819–830.
  31. Qin, Y.; Lee, T.; Wu, Y.; Kong, A.P.H. An End-to-End Approach to Automatic Speech Assessment for People with Aphasia. In Proceedings of the 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei, Taiwan, 26–29 November 2018; pp. 66–70.
  32. Mahmoud, S.S.; Kumar, A.; Tang, Y.; Li, Y.; Gu, X.; Fu, J.; Fang, Q. An efficient deep learning based method for speech assessment of mandarin-speaking aphasic patients. IEEE J. Biomed. Health Inform. 2020, 24, 3191–3202.
  33. Mahmoud, S.S.; Pallaud, R.F.; Kumar, A.; Faisal, S.; Wang, Y.; Fang, Q. A Comparative Investigation of Automatic Speech Recognition Platforms for Aphasia Assessment Batteries. Sensors 2023, 23, 857.
  34. Xu, H.; Dong, M.; Lee, M.-H.; O’Hara, N.; Asano, E.; Jeong, J.-W. Objective Detection of Eloquent Axonal Pathways to Minimize Postoperative Deficits in Pediatric Epilepsy Surgery Using Diffusion Tractography and Convolutional Neural Networks. IEEE Trans. Med. Imaging 2019, 38, 1910–1922.
Figure 1. Aphasia assessment framework using deep learning.
Figure 2. Advanced query scheme for acquiring relevant articles on “Deep learning in Aphasia”.
Figure 3. Outcome of executing the advanced database queries of Table 1.
Figure 4. Litmaps identified a relevant paper by Ranjith 2023 [16] by using Mahmoud 2021 [15] as a seed paper.
Figure 5. The application of PRISMA methodology for performing systematic literature review for this study.
Figure 6. Five major challenges arising in the current research on “Deep Learning on Aphasia”.
Figure 7. Theoretical and practical solutions for the five major challenges.
Figure 8. Co-occurrence analysis with a minimum of 1 keyword occurrence in article identified 391 keywords in 12 clusters and 11,387 links with 11,895 total link strength.
Figure 9. Keyword co-occurrence analysis for “aphasia” and “deep learning”. (a) Keyword “aphasia” with 404 total link strength, 16 occurrences, and 317 links (cluster 9). (b) Keyword “deep learning” with 452 total link strength, 16 occurrences, and 349 links (cluster 2).
Figure 10. Co-authorship analysis with author as the unit of analysis, distinct groups of active researchers in the field. (a) Nineteen authors with 50 total links. (b) Author, John Fang with 4 total links in Cluster 2. (c) Author D. S. Barbera with 8 links in Cluster 1.
Table 1. Specific implementation of advanced query for each of the selected databases.
| Database | Advanced Query Implementation Specific to Databases | Result |
| --- | --- | --- |
| PubMed | ((Aphasia) AND (Deep Learning)) AND ((voice) OR (speech)) AND ((recognition) OR (disorder) OR (assessment)) | 12 |
| Web of Science | ALL = (Aphasia) AND ALL = (Deep Learning) AND (ALL = (voice) OR ALL = (Speech)) AND (ALL = (recognition) OR ALL = (disorder) OR ALL = (assessment)) | 24 |
| Scopus | TITLE-ABS-KEY (“Aphasia”) AND TITLE-ABS-KEY (“Deep Learning”) AND (TITLE-ABS-KEY (“voice”) OR TITLE-ABS-KEY (“speech”)) AND (TITLE-ABS-KEY (“recognition”) OR TITLE-ABS-KEY (“disorder”) OR TITLE-ABS-KEY (“assessment”)) | 17 |
| IEEE Xplore | (“All Metadata”: Aphasia AND “All Metadata”: Deep Learning AND (“All Metadata”: voice OR “All Metadata”: speech) AND (“All Metadata”: recognition OR “All Metadata”: disorder OR “All Metadata”: assessment)) | 19 |
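Table 2. Categorization of existing literature on “deep learning in aphasia” into challenge classes.
| Reference | Linguistic Diversity | Individual Variability | Linguistic Abnormalities | Speech Characteristics | Limited Annotated Data | Representation Variability | Optimal Model Complexity | Real-Time Deployment | Usability Concerns | Latency Challenges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [20] | No | Yes | Yes | No | No | No | No | Yes | No | Yes |
| [18] | Yes | No | Yes | No | No | No | No | No | No | No |
| [21] | No | Yes | Yes | No | No | No | No | Yes | No | Yes |
| [19] | Yes | No | Yes | No | Yes | Yes | No | No | No | No |
| [24] | No | No | No | No | Yes | No | No | No | No | No |
| [25] | No | No | No | No | Yes | No | Yes | No | No | No |
| [22] | No | No | Yes | No | Yes | No | No | No | Yes | No |
| [26] | No | No | No | No | Yes | No | Yes | Yes | No | Yes |
| [23] | No | No | No | Yes | No | No | No | No | No | No |
| [27] | No | No | No | No | Yes | No | No | No | No | No |
| [28] | No | No | No | No | Yes | No | No | No | No | No |
| [29] | No | No | No | No | Yes | No | No | No | Yes | No |
| [30] | No | No | No | No | Yes | No | No | No | No | No |
| [31] | No | No | No | No | Yes | No | No | No | No | No |
| [32] | No | No | No | No | Yes | No | No | No | No | No |
| [15] | No | No | No | No | Yes | No | No | No | No | No |
| [33] | No | No | No | No | Yes | No | No | No | No | No |
| [34] | No | No | No | No | Yes | No | No | No | No | No |
| [16] | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes | No |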
Table 3. Detailed review of deep learning algorithms for aphasia assessment.
| Ref. | Algorithm Used | Technology Challenges |
| --- | --- | --- |
| [20] | Recurrent Neural Networks (RNNs) including LSTM and Gated Recurrent Units (GRUs), Dynamic Time Warping (DTW) | 1. High variability in speech patterns among individuals with aphasia, posing challenges for ASR systems. 2. Achieving accuracy comparable to human speech and language therapists (SLTs). 3. Ensuring system’s utility across various levels of speech impairment in aphasia. |
| [18] | Supervised and unsupervised machine learning, NLP, fuzzy rules, genetic programming | 1. Complexity of speech recognition within aphasia, including paraphasic errors, neologisms, revisions, greater pause times, and agrammatism. 2. Slow-paced implementation of AI into aphasia management. 3. Need for data fusion from multiple modalities to improve accuracy. |
| [21] | RNNs including LSTM and GRUs, DTW | 1. High variability in speech patterns among individuals with aphasia. 2. Need for system to process and classify a wide range of error types in aphasic speech. 3. Calibration of system’s threshold for classifying naming attempts. 4. Latency in system response time for real-time feedback during therapy sessions. |
| [19] | NLP, Machine Learning (ML) | 1. High variability in language use among individuals with aphasia. 2. Need for large and diverse datasets to train algorithms effectively. 3. Complexity of accurately capturing and analyzing nuances of human language through NLP. 4. Ensuring ML models can be easily integrated into clinical workflows. |
| [24] | LSTM, Bi-directional LSTM, 1-D Convolutional Neural Network (CNN), CWT-CNN | 1. Need for extensive, accurately annotated EMG data for training deep learning models. 2. Balancing model complexity and computational efficiency. 3. Signal processing and transformation challenges. |
| [25] | Deep Learning, ResNet, Ultra-Wideband (UWB) technology | 1. Quality of radar data influenced by distance, orientation, and environment. 2. Complexity of distinguishing between similar voices. 3. Need for extensive, accurately annotated data for training deep learning models. 4. Balancing model complexity and computational efficiency. |
| [22] | DNN, KNN, Decision Trees, Random Forest, Text to Speech (TTS) | 1. Need for accurately labeled data for training machine learning models. 2. Complexity of distinguishing between different aphasia severity levels. 3. Difficulty of capturing subtleties of aphasic speech. 4. Challenges associated with deploying effective and user-friendly software applications. |
| [26] | Computer Vision, NLP, Deep Learning (CNNs, Transformer Models, DNNs), Eyeball tracking | 1. Handling complex and multiple Specific Learning Disabilities (SLDs). 2. Generating diverse and coherent questions for detection tests. 3. Improving quality and relevance of generated reports. 4. Complexity of integrating various technological solutions into a cohesive system. 5. Reliance on subjective human judgment for certain diagnostic tools. |
| [23] | SVM, DNN, Hidden Markov Model (HMM) | 1. High variability in speech patterns among individuals with aphasia. 2. Need for extensive and accurately annotated data. 3. Complexity of distinguishing between different severity levels of aphasia. 4. Challenge of deploying effective, user-friendly software applications. |
| [27] | Deep Learning (CNNs, Transformer Models, DNNs), Google Search and YouTube API | 1. Handling complex and multiple SLDs. 2. Generating diverse and coherent questions for detection tests. 3. Improving quality and relevance of generated reports. 4. Complexity of integrating various technological solutions into a cohesive system. 5. Reliance on subjective human judgment for certain diagnostic tools. |
| [28] | CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Complexity of identifying the most effective Time-Frequency Distributions (TFDs) for Automatic Speech Impairment Assessment (ASIA). |
| [29] | Deep Learning (CNNs) | 1. Accurately differentiating between patients and interviewers. 2. Accurately interpreting emotions of aphasic patients. |
| [30] | CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Challenge of identifying the most effective TFDs for ASIA. |
| [31] | RNNs, CNNs | 1. High variability in speech patterns among individuals with aphasia. 2. Need for accurately labeled data for training deep learning models. 3. Ensuring effectiveness of neural network models in accurately classifying and assessing speech impairment severity. |
| [32] | CNNs | 1. Scarcity of aphasia syndrome datasets for improving CNN-enabled assessments. 2. Limitations of general-purpose ASR systems in accurately recognizing and assessing impaired speech. 3. Need for reliable, standardized automatic tools for speech assessment in Mandarin-speaking aphasic patients. 4. Complexity of accurately classifying speech data based on speech lucidity features. |
| [15] | CML, Deep Neural Network (CNNs) | 1. High variability in speech patterns among individuals with aphasia. 2. Need for large and well-annotated datasets for optimal model training and performance evaluation. 3. Need for reliable, standardized automatic tools for speech assessment in Mandarin-speaking aphasic patients. |
| [33] | CNNs, LDA, Microsoft Azure, Google speech recognition platforms | 1. Variability in speech patterns among individuals with aphasia. 2. Scarcity of aphasia syndrome datasets. 3. Limitations of general-purpose ASR systems in recognizing impaired speech. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Y.; Cheng, W.; Sufi, F.; Fang, Q.; Mahmoud, S.S. A Systematic Review of Using Deep Learning in Aphasia: Challenges and Future Directions. Computers 2024, 13, 117. https://doi.org/10.3390/computers13050117
