Article

Personal and Clinical Predictors of Voice Therapy Outcomes: A Machine Learning Analysis Using the Voice Handicap Index

1 Department of Bigdata Medical Convergence, Eulji University, 553 Sanseong-daero, Seongnam-si 13135, Republic of Korea
2 Department of Otorhinolaryngology, Nowon Eulji Medical Center, School of Medicine, Eulji University, 68 Hangeulbiseok-ro, Nowon-gu, Seoul 01830, Republic of Korea
3 Division of Global Business Languages, Seokyeong University, Seogyeong-ro, Seongbuk-gu, Seoul 02173, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10376; https://doi.org/10.3390/app142210376
Submission received: 4 October 2024 / Revised: 1 November 2024 / Accepted: 8 November 2024 / Published: 11 November 2024

Abstract

In this study, we examine the predictive factors influencing the outcomes of voice treatment in patients with voice-related disorders, using the voice handicap index (VHI) as a key assessment tool. By analyzing various personal habits and clinical variables, we identify the primary factors associated with changes in VHI scores between the pre- and post-treatment assessments. We employed binomial logistic regression, random forest (RF), and a multilayer perceptron (MLP) model to evaluate the effectiveness of voice treatment. The findings reveal that gender (with female patients showing greater improvements in VHI scores compared to male patients), surgical history, voice use status, and voice training status are significant predictors of therapy outcomes. The MLP model demonstrated high accuracy and sensitivity, with an area under the curve (AUC) value of 0.87, indicating its potential as a clinical predictive tool; however, the model’s relatively low specificity suggests the need for further refinement to enhance its predictive accuracy. The results of this study provide valuable insights for clinicians and speech–language pathologists in developing personalized treatment strategies to optimize the effectiveness of voice therapy. Future research should prioritize the validation of these findings in larger and more diverse population samples. Furthermore, it is essential to explore additional predictive variables in order to enhance the model’s accuracy across different types of voice disorders.

1. Introduction

Voice disorders are defined as deviations in the quality, pitch, volume, or variability of one’s voice compared to those of people from similar age, gender, and cultural groups [1]. They encompass more than simply abnormal morphological findings in the larynx or perceptibly abnormal voice quality, as the impact on the patient’s life must also be considered [2]. Therefore, in clinical practice, comprehensive and multidimensional evaluations are widely used, including acoustic analysis, auditory–perceptual assessment, and psychometric evaluations based on patient self-reports. These thorough evaluations provide information on the differences in perception of the severity of the voice disorder between the clinician and the patient [3], which is crucial for planning intervention programs and serves as a benchmark for measuring the outcomes of the intervention [4,5].
The evaluation of a speaker’s self-awareness of their voice problems has attracted significant attention [6]. From the patient’s perspective, the degree to which they perceive their voice issues can vary greatly depending on their individual circumstances, such as their occupation and social activities; for example, even if several patients are judged by evaluators to have the same degree of voice problem, a particular patient whose daily life does not heavily depend on the use of their voice might not feel particularly inconvenienced. Conversely, if a patient’s voice is crucial for their livelihood, hobbies, or professional activities, they may perceive their voice problems as severe and report significant discomfort. Although the degree of self-awareness of voice issues can vary depending on the situation and individual predispositions, it is a very important variable from a therapeutic standpoint and plays a crucial role as an evaluation factor [6,7].
Accordingly, tools for assessing the self-awareness of speakers with voice problems have been developed and are actively used in clinical practice and research, both domestically and internationally [8,9,10,11]. The most representative of these tools is the voice handicap index (VHI), which is used for voice diagnosis and treatment effect analysis in many countries, including the Republic of Korea. Specifically, the VHI quantifies the degree of handicap a patient feels due to voice disorders before and after treatment, thus serving as a tool to evaluate the effectiveness of treatments. The VHI was designed by Jacobson et al. in 1997 and remains the most widely used tool to date [12]; it has been adapted into two Korean versions and is widely used in clinical practice at voice clinics in Korea [6,8].
In recent clinical research settings, efforts have been made to more meticulously analyze the clinical value of evaluating the self-awareness of voice problems, which involves studying the correlation with objective assessments [13,14,15,16]. Ben Barsties v. Latoszek et al. systematically reviewed the literature and conducted a meta-analysis to compare the efficacies of voice therapy, phonosurgery, and a combination of both treatments for vocal fold polyp (VFP) treatment. The study aimed to evaluate the effectiveness of each treatment method by comparing the VHI scores before and after treatment. The results of this study suggest that the VHI can be a valuable tool in guiding treatment decisions for patients with vocal fold polyps [13].
Kenichi Watanabe et al. evaluated the characteristics of the VHI after arytenoid adduction (AA) in patients with unilateral vocal fold paralysis (UVFP). The results from 43 patients showed significant improvements in the functional, physical, and emotional subscale scores of the VHI before and after surgery. Postoperative VHI scores had weak correlations with other voice measurements, but preoperative VHI scores were the only significant predictive variable; therefore, the results of this study demonstrate that VHI is an important tool for assessing treatment outcomes in UVFP patients [14].
Marzie Jalalian et al. used multivariate analysis of variance (MANOVA) to compare the VHI scores between groups with structural and functional voice disorders. The results showed that patients with structural voice disorders had significantly higher VHI scores than those with functional voice disorders; therefore, this paper is significant in that the authors evaluate the severity of voice disorders according to each type and propose appropriate treatment methods [15].
Imam Muslih et al. analyzed the association between the VHI and Praat voice analysis in patients with benign vocal cord lesions before and after microscopic laryngeal surgery. The results of this study demonstrate a significant correlation between the total VHI score and jitter, as well as between the functional subscale score and HNR after surgery [16].
The above findings indicate that the VHI is an important assessment tool across various voice disorders and treatment situations, including vocal fold polyp treatment, arytenoid adduction surgery for patients with unilateral vocal fold paralysis, the evaluation of structural and functional voice disorders, and microscopic laryngeal surgery for benign vocal cord lesions. Notably, the VHI has shown correlations with specific acoustic parameters, which supports its use as a meaningful indicator for evaluating treatment outcomes in patients with various voice disorders [13,14,15,16,17,18]. However, these studies have two limitations. First, they rely on correlation analysis between acoustic parameters and VHI scores, assessed using p-values, and some consider only the total VHI score. Correlation analysis does not establish causation, and relying on p-values may not adequately reflect the clinical significance of the findings. Second, most studies have focused on the significant relationships between objective acoustic parameters and the VHI. These constraints have compromised the reliability of outcomes in earlier studies, contributing to a limited and fragmented comprehension of speech treatment.
To address these limitations, in this study we aim to conduct a comprehensive analysis of predictive factors related to personal habits that influence speech therapy outcomes, as measured using the VHI; in particular, we examine significant relationships and predictive factors, not only in relation to the total VHI score, but also across its functional, physical, and emotional subdomains. We utilize a range of methodologies to achieve these objectives, including assessments of personal characteristics, statistical analysis, logistic regression, random forest, and a multilayer perceptron model. Through this multifaceted approach, we seek to elucidate the relationships between predictive factors and VHI outcomes, encompassing the functional, physical, and emotional scores. The findings of this research are anticipated to offer valuable insights for clinicians and speech–language therapists, particularly in the context of the VHI; moreover, the development of personalized treatment strategies informed by these predictive factors has the potential to significantly enhance the efficacy of speech therapy and improve both vocal function and quality of life for individuals with speech disorders.
The contributions of this study are summarized as follows:
  • Multidimensional evaluation: in this study, the significant relationships and predictive factors between personal habits and not only the total VHI score, but also the functional, physical, and emotional scores, are analyzed, thus allowing for a more comprehensive understanding and assessment of various voice disorder patients;
  • Utilization of various analytical methods: predictive factors are analyzed based on VHI scores to evaluate the effectiveness of voice therapy, employing various methods, such as personal characteristic assessments, statistical analysis, logistic regression analysis, random forest, and the multilayer perceptron model, thereby enhancing the depth and accuracy of the research;
  • Superiority of methodological approach: the utilization of analytical techniques such as logistic regression analysis, random forest, and the multilayer perceptron model demonstrates the methodological rigor in measuring the effectiveness of voice therapy; this approach not only strengthens the validity of the results, but also sets a precedent for future research in the field;
  • Providing clinical guidelines: The findings of this study are anticipated to offer valuable guidelines for clinicians and speech–language pathologists. By incorporating predictive factors, the formulation of personalized speech therapy strategies has the potential to significantly enhance the effectiveness of treatment, thereby improving vocal function and quality of life for individuals with speech disorders.

2. Materials and Methods

2.1. Database

In this study, we retrospectively analyzed the medical records of 304 patients diagnosed with voice disorders at the Otorhinolaryngology Department of Nowon Eulji Medical Center, Eulji University, from March 2020 to January 2024. This research was conducted in compliance with research ethics guidelines and was approved by the Institutional Review Board (IRB No. 2022-04-014). Following the exclusion of patients who did not complete voice therapy or were lost to follow-up, 102 patients with voice disorders were included in the study. A microphone was positioned 5 cm from a subject’s mouth, and the subject was instructed to produce and sustain the vowel /a/ at a comfortable pitch and volume [11]. The recorded audio data were subsequently transferred to a computer with a sampling frequency of 44.1 kHz for further analysis using the multidimensional voice program (MDVP) software version 2.3 from Kay Elemetrics [6,13,16].
Table 1 provides an overview of the dataset utilized in this study. The dataset includes recordings of the /a/ vowel from 66 female and 36 male subjects who were diagnosed with more than 19 distinct pathologies. Subjects were divided into two groups according to the effectiveness of voice treatment, as determined based on improvements in the total VHI score and its functional, physical, and emotional subscale scores. The first group, termed the responsive group, consisted of individuals who each exhibited an improvement of at least 1 point, while the second group, termed the non-responsive group, showed no improvement. These effectiveness metrics, denoted as positive (+) or negative (−), are detailed for both groups in Table 1. In total, 93 subjects demonstrated responsiveness in their total VHI score, 78 in the functional subscale score, 96 in the physical subscale score, and 73 in the emotional subscale score. The dataset also includes variables such as gender, smoking status, alcohol consumption, voice use status, coffee consumption, comorbidities, surgical history, and voice training status.
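The grouping rule described above (an improvement of at least 1 point marks a subject as responsive) can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the example patient are hypothetical and are not taken from the study data. It assumes that improvement means a lower post-treatment score, since higher VHI scores indicate greater handicap.

```python
from typing import Dict

SCALES = ("total", "functional", "physical", "emotional")


def label_response(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, str]:
    """Return '+' for a scale whose score dropped by at least 1 point, else '-'."""
    labels = {}
    for scale in SCALES:
        improvement = before[scale] - after[scale]  # lower VHI = less handicap
        labels[scale] = "+" if improvement >= 1 else "-"
    return labels


# Hypothetical patient (illustrative values only, not study data).
before = {"total": 52, "functional": 15, "physical": 22, "emotional": 15}
after = {"total": 40, "functional": 15, "physical": 12, "emotional": 13}
labels = label_response(before, after)
print(labels)
```

Because each scale is labeled separately, a subject can be responsive on the total score while remaining non-responsive on one subscale, which is consistent with the differing group sizes reported for the four scores.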

2.2. VHI Questionnaire

Voice treatment was delivered through a combination of direct and indirect techniques. Sessions were scheduled weekly with the number of sessions varying among patients, ranging from one to eight and averaging four sessions in total, with each session lasting approximately 40 min. Patients self-assessed their vocal condition using the VHI questionnaire prior to and following treatment [19,20,21,22,23].
The VHI, a psychometric assessment tool widely used for evaluating voice disorders, was developed by Jacobson et al. in 1997 [12] and was designed to be broadly applicable to adult patients with voice disorders, regardless of the type, including those who have undergone laryngectomy. In this study, the Korean translation of the VHI was used. The tool comprises 30 questions, which are equally allocated across three domains: functional (F), physical (P), and emotional (E). Each variable is scored on a five-point scale (0–4), representing “never”, “almost never”, “sometimes”, “almost always”, or “always”. The scores are expressed in subscores (0–40) and a total score (0–120). Firstly, the functional subscale is used to evaluate the difficulties in communication caused by voice problems and assess the impacts of these issues on social interactions. Secondly, the physical subscale is used to measure the physical challenges and discomfort experienced during the vocalization process. Lastly, the emotional subscale is used to assess the psychological effects of voice problems, such as emotional stress and reduced self-esteem. Higher scores indicate greater severity of the disorder [24,25,26,27,28]. The details of the questionnaire are shown in Table 2.
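The scoring scheme described above (30 items, 10 per domain, each scored 0–4) can be sketched as follows. The grouping of responses into three lists keyed by "F", "P", and "E" is an assumption made for illustration; the actual item order of the questionnaire is given in Table 2 and is not reproduced here.

```python
from typing import Dict, List

ITEMS_PER_SUBSCALE = 10  # 30 questions split equally across F, P, and E
MAX_ITEM_SCORE = 4       # five-point scale: 0 ("never") to 4 ("always")


def score_vhi(responses: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum item responses into subscale scores (0-40) and a total score (0-120)."""
    scores = {}
    for name in ("F", "P", "E"):
        items = responses[name]
        if len(items) != ITEMS_PER_SUBSCALE:
            raise ValueError(f"subscale {name} needs {ITEMS_PER_SUBSCALE} items")
        if any(not 0 <= value <= MAX_ITEM_SCORE for value in items):
            raise ValueError(f"subscale {name} has a response outside 0-4")
        scores[name] = sum(items)
    scores["T"] = scores["F"] + scores["P"] + scores["E"]
    return scores


# Hypothetical responses (illustrative only).
example = {"F": [1] * 10, "P": [2] * 10, "E": [0] * 5 + [4] * 5}
scores = score_vhi(example)
print(scores)
```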

2.3. Statistical Analysis

The statistical analysis was performed utilizing software packages, including IBM SPSS version 21.0 (IBM Corp., Armonk, NY, USA) [18,21]. As an initial step in the analysis, descriptive statistics such as histograms and box plots were generated. Specifically, means, maxima, minima, and medians were calculated for the total VHI scores, as well as for their functional, emotional, and physical subscale scores, in order to illustrate the differences observed before and after treatment.
Binomial logistic regression analysis is a statistical technique applied when the dependent variable is categorical, particularly dichotomous, with only two possible outcomes [29]. This method differs from linear regression in its use of the logit function, which enables the probability of a specific class or event to be predicted by fitting the data to a logistic curve [30]. The model expresses the log odds of the dependent variable as a linear combination of one or more predictor variables, from which the probability of the binary event is obtained. The model parameters are commonly estimated by maximum likelihood, and the fitted coefficients are used to evaluate the influences of the independent variables on the dependent variable; this method is widely used to understand the effects of particular variables on an outcome and to make predictions for future observations [29,30]. In this study, a binomial logistic regression model was fitted to analyze eight parameters as predictors of voice therapy outcomes based on the VHI. The dependent variable is the VHI outcome (+ or −), while the eight independent variables are gender, smoking status, alcohol consumption, voice user status, coffee consumption, comorbidity, surgery status, and voice training status.
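For reference, the standard form of the binomial logistic model with eight predictors can be written as below. The symbols are notational assumptions introduced here, not the article's own: p is the probability of a positive VHI outcome, x_1 to x_8 are the eight independent variables, and the betas are the fitted coefficients (B), so that exp(B) is the odds ratio for a one-unit change in a predictor.

```latex
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{i=1}^{8} \beta_i x_i,
\qquad
p = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{i=1}^{8} \beta_i x_i\right)},
\qquad
\mathrm{OR}_i = e^{\beta_i}.
\]
```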

2.4. Random Forest Model

The random forest (RF) algorithm is an ensemble learning approach that combines multiple decision trees to enhance predictive performance and model stability [31,32]. Predominantly applied to classification and regression problems, RF utilizes diverse sampling techniques to ensure that each tree in the ensemble generates distinct predictions, thereby increasing both accuracy and robustness [33]. Specifically, RF builds multiple decision trees on various bootstrap samples of the data, ensuring that each tree explores different patterns by selecting only a subset of features for each split. In classification tasks, RF finalizes predictions through majority voting, selecting the class with the most votes, whereas in regression, the predictions of all trees are averaged. Through this process, RF not only builds a reliable model less prone to overfitting but also offers feature importance insights, enabling the identification of key predictors that significantly influence outcomes [31,32,33]. In this study, we leverage RF to classify VHI based on a limited dataset, demonstrating the model’s adaptability in smaller data environments while still providing accurate predictions and meaningful insights into feature contributions.
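Two of the mechanisms described above, bootstrap sampling and majority voting, can be sketched with the standard library alone. This is not a full random forest (no trees are grown), and the function names and the example votes are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(tree_predictions: Sequence[int]) -> int:
    """Return the class predicted by the most trees.

    Ties are broken in favor of the class that appears first in the sequence.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_indices(102, rng)  # 102 = number of patients in the dataset
votes = [1, 1, 0, 1, 0]               # hypothetical predictions from five trees
print(len(set(sample)), "distinct rows in the bootstrap sample")
print("ensemble prediction:", majority_vote(votes))
```

Because sampling is done with replacement, each bootstrap sample omits some rows and repeats others, which is what makes the individual trees differ from one another.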

2.5. Multilayer Perceptron Model

Numerous artificial neural network models have been developed for a wide range of applications across various disciplines [34,35,36,37]. Among these, the multilayer feed-forward artificial neural network (MLP) is one of the most commonly utilized models and was selected for use in our study [38,39,40,41]. The selection of MLP in this study is motivated by its structural simplicity and high interpretability. Given the smaller dataset size, MLP presents a lower risk of overfitting compared to more complex models and has demonstrated dependable performance across numerous studies. Additionally, compared to more complex ML models, MLP is easier to train and interpret, making it well-suited for the objectives and dataset size of this study. All MLP models were implemented using Python (version 3.8.0). Before being input into the networks, the data were normalized using the min–max method [38,39]. The training of the MLP was conducted through supervised learning, where a sequence of input and output variables from the training dataset was provided [40,41]. Through iterative adjustments of the connection weights, an optimal input–output mapping function was established [36]. Throughout the training process, careful consideration was given to several factors, including the choice of optimization algorithm, the configuration of model parameters, and the determination of the maximum number of training iterations [35,36,37]. The neural network architecture and connection parameters were fine-tuned until the model’s loss function was stabilized, thus achieving optimal fitting performance [38,39]. Following successful training, the model’s generalization ability was evaluated using an external testing dataset [34,35,36,37,38,39,40,41].
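The min–max normalization mentioned above rescales each feature to the range [0, 1]. A minimal sketch follows; the helper name and the example values are hypothetical, and mapping a constant feature to 0.0 is an assumption, since the article does not say how that case was handled.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column (illustrative only).
scaled = min_max_scale([1, 4, 8])
print(scaled)
```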
The input features for the MLP models were extracted from the clinical data obtained from patients’ medical histories and included gender, smoking status, alcohol use, voice usage, coffee consumption, comorbidities, surgical history, and the status of voice training; these features were linked to the target output variable, namely whether the VHI score improved. The chosen features were utilized to develop the MLP model using the training dataset. Each hidden neuron’s activation function was set to the hyperbolic tangent function, while the softmax function was used at the output layer. The learning process of the model was driven by the adaptive moment estimation (Adam) optimization algorithm, with cross-entropy serving as the loss function. The model’s performance was assessed through a five-fold cross-validation method on the training dataset, which allowed for the identification of the optimal number of hidden layer units and the maximum iteration limit. To reduce the risk of overfitting, a regularization coefficient of 0.001 was implemented. After the evaluation phase, the final MLP model was trained on the complete training dataset using the optimal parameters determined during cross-validation.
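The five-fold cross-validation step can be sketched as an index split. This is a plain standard-library illustration, not the authors' code: the function name and the shuffling seed are hypothetical, and the training-set size of 76 is derived here from the 102 patients minus the 26 test cases reported for the held-out set.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(76, k=5, seed=0))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training rows, {len(validation)} validation rows")
```

Each candidate setting (for example, a number of hidden units) would be fitted on the training rows of every fold and scored on the corresponding validation rows, with the average score used to choose among settings.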

2.6. Research Process Flowchart

Figure 1 illustrates each stage of the research process. The first stage, data collection, involves gathering the data necessary for the study. This is followed by the data preprocessing stage, where the collected data are refined to ensure suitability for analysis. Subsequently, in the clinical and personal data analysis stage, clinical and personal data are analyzed to identify the variables required for the research. The next phase utilizes three models: binomial logistic regression, random forest, and multilayer perceptron model, which are employed to analyze, train on, and predict outcomes based on the data. Following this, the evaluation metrics stage calculates performance metrics for each model, assessing their accuracy and sensitivity. Finally, in the analysis of results stage, the outcomes are analyzed based on these evaluation metrics, leading to the study’s conclusions.

3. Results

3.1. Descriptive Analysis

Figure 2 presents the frequency distribution of the variables relevant to the database used in this study. In terms of gender, there were more females than males, and for smoking status, the majority were non-smokers. Regarding alcohol consumption, there were more non-drinkers, though a significant number of participants consumed alcohol. The proportion of voice users and non-users was nearly equal, and more participants drank coffee than those who did not. There were more individuals without comorbidities and, in terms of surgery status, slightly more participants had not undergone surgery, with a small number having undergone laser microlaryngeal surgery (LMS). Lastly, more participants had received voice training compared to those who had not. The following graphs help in comparing and understanding the demographic and lifestyle characteristics of the subjects in the database.
Figure 3 presents a boxplot comparing the distribution of VHI scores across four categories (F, P, E, and T) before and after voice therapy. In this context, F denotes functional scores, P represents physical scores, E corresponds to emotional scores, and T signifies the total VHI score. Each category reflects a specific dimension of the VHI assessment, and the boxplot visually represents the median, interquartile range, minimum and maximum values, as well as outliers. Under the pre-therapy (“Before”) condition, the T category exhibits a notably high median score, with a wide range of distribution; however, under the post-therapy (“After”) condition, the median score decreases significantly, and the variability is markedly reduced. The P category, which initially shows a relatively high median and broad distribution under the “Before” condition, also experiences reductions in both the median score and the range of distribution under the “After” condition. The F and E categories maintain low medians and narrow distributions across both conditions, indicating minimal score variability. Overall, the T category demonstrates a substantial reduction in scores following the intervention, while the other categories (F, P, E) also show decreases in score distributions or reduced variability under the “After” condition. Ultimately, the lower scores across all categories following voice therapy suggest that most individuals perceived an improvement in the quality of their voice after undergoing treatment.
Figure 4 consists of four histograms, each illustrating the effectiveness of therapy on VHI scores across different dimensions. The histograms are divided into two categories: “Effectiveness −” and “Effectiveness +”, representing cases in which therapy was ineffective and effective, respectively. The first histogram (a) displays the distribution of total VHI scores, with 93 samples in the “Effectiveness +” category, indicating that most participants experienced positive changes following therapy. The second histogram (b) shows the distribution of functional scores, where 78 samples fall into the “Effectiveness +” category, suggesting improvements in the functional aspect for many participants. The third histogram (c) depicts the distribution of physical scores, with 96 samples in the “Effectiveness +” category, demonstrating that most participants experienced positive physical changes post-therapy. Lastly, the fourth histogram (d) represents the distribution of emotional scores, with 73 samples in the “Effectiveness +” category, indicating that therapy also had a beneficial impact on the emotional dimension. Overall, these four histograms indicate that therapy was effective across various aspects of the VHI, with the majority of participants showing improvement after treatment. The physical scores, in particular, exhibited the highest effectiveness, while the emotional scores also showed substantial improvement, which suggests that treatment had a positive impact across multiple dimensions, supporting its overall success.
Figure 5a presents a boxplot illustrating the time (in days) differences before and after voice treatment, specifically indicating the duration between the start and end dates of speech treatment. During this period, patients self-assessed their VHI scores. While the majority of the data are concentrated within 200 days, the presence of several outliers with significantly higher values suggests that a considerable amount of time was required for some patients to complete their voice treatment. Figure 5b illustrates the distribution of the number of voice training sessions. The median and mean are positioned closely, indicating that most participants underwent a similar number of voice training sessions. The maximum number of sessions recorded is twelve, with an average of four sessions.
As shown in Table 3, the effectiveness of voice therapy was evaluated using pre- and post-therapy VHI scores, with p-values calculated for various factors. The results indicate that gender has a highly statistically significant impact (p < 0.001) on the outcome of therapy; additionally, surgery status (p = 0.038) and voice training status (p = 0.021) also demonstrate statistically significant effects. Conversely, coffee consumption, smoking status, comorbidity, alcohol consumption, and voice usage status do not exhibit statistically significant effects, as their p-values exceed the 0.05 threshold.

3.2. Binomial Logistic Regression Analysis

To assess the efficacy of voice treatment based on VHI scores, a binomial logistic regression analysis was carried out. The analysis considered independent variables such as gender, smoking habits, alcohol use, voice usage status, coffee consumption, the presence of comorbidities, surgical history, and voice training status. Gender was coded as 0 for males and 1 for females; smoking status, alcohol consumption, voice user status, and other variables were coded similarly (0 for no, 1 for yes). The dependent variable, representing VHI scores after speech therapy, was coded as 1 if improvement occurred and 0 if not.
Table 4 summarizes the findings from the binomial logistic regression analysis. The analysis shows that gender has a statistically significant effect on VHI score improvement (p = 0.012), with a corresponding odds ratio (exp (B)) of 25.033, indicating that female patients (coded 1) had approximately 25 times the odds of an improved VHI score compared with male patients (coded 0). Other variables, such as smoking status, alcohol consumption, and voice user status, do not have significant impacts. The model’s Nagelkerke R2 is 0.422, suggesting moderate explanatory power, and the Hosmer–Lemeshow test indicates good model fit (p = 0.967). The classification accuracy of the model is 91.2%, reflecting strong predictive performance.
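The relationship between the reported odds ratio and the underlying regression coefficient can be checked numerically. In the sketch below, only the odds ratio of 25.033 comes from the text; the baseline odds of 1.0 is a hypothetical value chosen purely to show how an odds ratio acts on odds rather than on probabilities.

```python
import math

odds_ratio = 25.033                # exp(B) for gender, as reported in the text
coefficient = math.log(odds_ratio)  # B, the coefficient on the log-odds scale


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# Hypothetical baseline (illustrative only): odds of improvement for the
# reference group set to 1.0, i.e. a probability of 0.5.
baseline_odds = 1.0
comparison_odds = baseline_odds * odds_ratio

print(f"B = {coefficient:.4f}")
print(f"baseline probability   = {odds_to_probability(baseline_odds):.3f}")
print(f"comparison probability = {odds_to_probability(comparison_odds):.3f}")
```

An odds ratio multiplies odds, so a 25-fold change in odds does not correspond to a 25-fold change in probability; the size of the probability change depends on the baseline.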

3.3. Random Forest Analysis

Table 5 outlines the input factors and key parameter settings of the RF model. This model utilizes eight input factors: gender, smoking status, alcohol consumption, voice user status, coffee consumption, comorbidity, surgery status, and voice training status. The parameter configuration includes an n_estimators value of 50, indicating that the model comprises 50 decision trees. The max_depth parameter is set to none, allowing each tree to grow to its maximum depth. The min_samples_split and min_samples_leaf are set to 2 and 1, respectively, specifying the minimum number of samples required to split an internal node and to be present at a leaf node. Additionally, max_features is configured as sqrt, meaning the number of features considered for splitting each node is the square root of the total number of features. The bootstrap parameter is set to true, indicating that each tree is trained using bootstrap samples.
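The settings listed above can be collected into a single configuration, shown here as a plain Python dictionary. The parameter names coincide with those of scikit-learn's RandomForestClassifier, but the article does not name the library it used, so that correspondence is an assumption. The last lines work out how many features are examined at each split when max_features is the square root of eight inputs, truncated to an integer.

```python
import math

# Settings as described in the text; the dictionary itself is illustrative.
rf_params = {
    "n_estimators": 50,       # number of decision trees
    "max_depth": None,        # trees grow until leaves are pure or too small to split
    "min_samples_split": 2,   # minimum samples needed to split an internal node
    "min_samples_leaf": 1,    # minimum samples required at a leaf
    "max_features": "sqrt",   # features considered at each split
    "bootstrap": True,        # each tree is trained on a bootstrap sample
}

n_input_factors = 8
features_per_split = max(1, int(math.sqrt(n_input_factors)))
print("features considered per split:", features_per_split)
```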
Table 6 illustrates the confusion matrix, demonstrating the predictive performance of the model, while Table 7 summarizes the classification metrics. The model achieves an overall accuracy of 84.61%, with a high precision of 0.88, indicating that 88% of cases predicted as “VHI +” are true positives. However, the specificity is 0%, revealing that the model fails to correctly identify any “VHI −” instances. Conversely, the recall is 0.95, showing that the model successfully identifies most actual “VHI +” cases. The F-score of 0.91 suggests that the model maintains a good balance between precision and recall. These results indicate that while the model performs well in predicting “VHI +”, its performance in predicting “VHI −” is notably poor. This could be attributed to data imbalance, where the limited number of “VHI −” cases may hinder the model’s ability to accurately detect negative instances.

3.4. Multilayer Perceptron (MLP) Model

In this study, we utilized the MLP method to analyze the dataset presented in Table 1 and identified effectiveness factors, such as VHI scores, that may be associated with voice treatment outcomes. The details of the MLP model as used for this study are provided in Table 8. About 75% of the entire dataset was used for training, with the remaining 25% allocated for testing. The MLP model was designed to predict improvements in VHI scores based on a variety of input factors, including gender, smoking status, alcohol consumption, voice usage, coffee consumption, comorbidity status, surgical history, and voice training status. The input layer consists of seventeen units; the hidden layer is composed of a single layer with two units and employs the hyperbolic tangent activation function; and the output layer, which includes two units, utilizes the softmax activation function. The model’s optimization is guided via the cross-entropy error function.
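The architecture described above (seventeen input units, one hidden layer of two tanh units, and a two-unit softmax output scored with cross-entropy) can be sketched as a single forward pass. The weights below are random placeholders and the input vector is hypothetical; nothing here reproduces the trained model.

```python
import math
import random
from typing import List

N_INPUT, N_HIDDEN, N_OUTPUT = 17, 2, 2


def forward(x: List[float], w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """Forward pass: tanh hidden layer followed by a softmax output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_class: int) -> float:
    """Cross-entropy loss for one sample with a known class index."""
    return -math.log(probs[true_class])


rng = random.Random(0)
# Placeholder weights (random, NOT the fitted values).
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUT)]
b_out = [0.0] * N_OUTPUT

x = [rng.random() for _ in range(N_INPUT)]  # hypothetical min-max scaled input
probs = forward(x, w_hidden, b_hidden, w_out, b_out)
loss = cross_entropy(probs, true_class=1)
print("class probabilities:", probs)
print("cross-entropy loss:", loss)
```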
The MLP model’s performance was assessed using several metrics, including accuracy, sensitivity, specificity, F score, and the area under the curve (AUC) [5,27]. Figure 6 illustrates a receiver operating characteristic (ROC) curve, a widely used tool for assessing the performance of binary classification models. The y-axis represents sensitivity, or the true-positive rate, while the x-axis depicts 1 − specificity, which corresponds to the false-positive rate. The two curves are labeled “VHI −” and “VHI +”, where “VHI −” denotes cases in which VHI scores did not improve following voice therapy and “VHI +” indicates cases in which VHI scores improved post-therapy. Both curves in Figure 6 lie above the diagonal line, indicating that the model distinguishes between positive and negative outcomes better than chance. The AUC-ROC value was 0.871, suggesting that the classification model for the VHI performs effectively.
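To make the construction of an ROC curve concrete, the sketch below sweeps a decision threshold over predicted scores, records the false-positive and true-positive rates at each step, and integrates the resulting curve with the trapezoidal rule. The labels and scores are hypothetical values, not data from this study, and the function names are illustrative.

```python
def roc_points(labels, scores):
    """(false-positive rate, true-positive rate) pairs for decreasing thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical outcomes (1 = improved) and predicted probabilities of improvement.
labels = [1, 1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to the diagonal line (chance level), and 1.0 to perfect separation of the two classes.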
The confusion matrices, shown in Table 9, illustrate the predictive performance of the model, which compares the predicted values against the actual outcomes to determine whether VHI scores improved. Out of twenty-six cases, the model predicted that VHI scores would not improve in three instances; among these, two cases were correctly identified as having no improvement (true negative, TN), while in one instance, the model incorrectly predicted no improvement when there actually was improvement (false negative, FN). Additionally, the model predicted improvement in VHI scores for 23 cases; of these, 22 cases were correctly identified as having improved (true positive, TP), while the model incorrectly predicted improvement when there was none in the one remaining case (false positive, FP).
Table 10 shows the performance metrics of the model, which achieved an accuracy of 84.61%, indicating that it correctly classified 84.61% of all cases. The model’s precision is 0.95, signifying that 95% of the instances predicted as positive were indeed positive; its specificity is 0.67, indicating that the model correctly identified 67% of the actual negative cases; its recall, or sensitivity, is 0.86, demonstrating that the model correctly detected 86% of the actual positive cases; its F score, the harmonic mean of precision and recall, is 0.90; finally, its AUC is 0.87, reflecting the model’s capacity to distinguish between positive and negative outcomes.
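As a quick consistency check, the F score can be recomputed from the precision and recall reported in Table 10, assuming the usual F1 definition (the harmonic mean of the two):

```latex
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
  = \frac{2 \times 0.95 \times 0.86}{0.95 + 0.86}
  = \frac{1.634}{1.81}
  \approx 0.90
```

The result agrees with the reported F score of 0.90.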
Figure 7 presents a bar graph illustrating the relative importance of the input parameters. Each bar represents one parameter, and the horizontal axis indicates the relative magnitude of its importance, so a bar that extends further to the right denotes a greater impact on the model. According to the graph, surgery status is the most influential parameter, followed by voice user status, gender, smoking status, alcohol consumption, coffee consumption, voice training status, and comorbidity, in descending order of importance.

4. Discussion

In this study, we employed binomial logistic regression analysis [29,30], random forest (RF) [31,32,33], and the MLP model [38,39,40,41] to predict the effectiveness of voice treatment based on various personal habits and clinical variables. The primary objective of this study was to analyze differences in VHI scores before and after therapy and identify key factors influencing the success of voice treatment.
The binomial logistic regression analysis revealed that gender was a significant predictor of improvements in VHI scores; specifically, female patients showed greater improvement in VHI scores after voice therapy compared to male patients, a finding suggesting that gender differences may play a crucial role in how patients perceive and respond to voice therapy. Clinically, this result indicates the need for tailored therapeutic approaches that consider gender-specific responses. For instance, more aggressive treatment protocols or gender-specific interventions may be beneficial for female patients to maximize the effectiveness of voice therapy. On the other hand, variables such as smoking status, alcohol consumption, and voice use status did not show statistically significant effects on VHI scores, which suggests that these factors may not directly influence the outcome of voice therapy. However, these results are based on a surface-level analysis, and future research should explore potential indirect effects of, or interactions among, these variables. For example, voice use status, when combined with certain environmental or lifestyle factors, might have a more substantial impact on the effectiveness of voice therapy.
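The Exp(B) column of Table 4 is the odds ratio obtained by exponentiating each regression coefficient B. The sketch below recomputes it from the B values reported in Table 4; small differences from the table in the last digit are expected because the coefficients are rounded, and the variable names are illustrative.

```python
import math

# B coefficients as reported in Table 4 (binomial logistic regression).
COEFFICIENTS = {
    "gender": 3.220,
    "smoking status": 1.175,
    "alcohol consumption": 0.324,
    "voice user status": -1.916,
    "coffee consumption": -1.330,
    "comorbidity": -0.313,
    "surgery status": -1.757,
    "voice training status": -0.405,
}

def odds_ratio(b):
    """Odds ratio Exp(B) for a logistic regression coefficient B."""
    return math.exp(b)

odds_ratios = {name: odds_ratio(b) for name, b in COEFFICIENTS.items()}
for name, value in odds_ratios.items():
    print(f"{name}: Exp(B) = {value:.3f}")
```

An odds ratio above 1 (as for gender) means the odds of VHI improvement are higher for the category coded 1 than for the reference category, while a value below 1 means they are lower. Which category is coded 1 is not stated in this section.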
The MLP model analysis identified gender, surgical history, voice use status, and voice training status as important predictors of voice therapy outcomes. Notably, surgical history and voice use status were shown to have high importance in the model, indicating that the physical causes of voice disorders and voice usage patterns significantly influence therapy outcomes. For instance, patients with a history of surgery might experience more significant changes in their voice condition following treatment; additionally, individuals who regularly use their voice may respond more sensitively to therapy, likely due to the higher impact of voice on their quality of life. Voice training status also emerged as a key variable, suggesting that regular voice training contributes positively to the effectiveness of voice therapy, a finding that underscores the importance of continuous voice exercises in maintaining and enhancing vocal function. Clinically, this supports the recommendation for ongoing voice training as part of the therapeutic regimen, and further research should investigate the frequency and intensity of training needed to optimize treatment outcomes.
The performance evaluation of the RF and MLP models reveals notable strengths and weaknesses for each approach. The MLP model outperforms the RF model in terms of precision and specificity, suggesting that the MLP model is more effective at minimizing false-positive predictions and accurately classifying negative cases. In contrast, the RF model exhibits higher recall, indicating its ability to identify a greater number of positive cases. However, the RF model’s specificity of 0.00 represents a significant drawback, as it fails to accurately classify negative cases. In conclusion, the MLP model provides a more balanced performance overall, particularly excelling in the classification of negative cases compared to the RF model. On the other hand, while the RF model’s higher recall highlights its strength in detecting positive cases, its inability to effectively identify negative cases limits its potential application in clinical settings.
This study makes several significant contributions. Firstly, by conducting a multidimensional analysis of the relationship between patients’ personal habits and VHI scores, we identified key predictive factors for the total VHI score; this approach provides a deeper understanding and evaluation of the effectiveness of voice therapy. Secondly, we employed statistical analysis, logistic regression, RF, and an MLP model to analyze the relationships between VHI scores and predictive factors, a methodological approach that enhances the validity of the research findings and sets a valuable precedent for future studies.
However, our study also has certain limitations. Firstly, the logistic regression analysis revealed that variables such as smoking status, alcohol consumption, and voice use status did not have statistically significant impacts on VHI score improvement; this may reflect the limited depth of the analysis and indicates a need for further research to explore potential indirect effects or interactions among these variables. Secondly, the MLP model’s high sensitivity and relatively low specificity suggest that it may fail to accurately predict cases for which therapy is ineffective, highlighting the need for further refinements to improve the model’s predictive accuracy.
The results of this study offer valuable guidelines for clinicians and speech–language pathologists in predicting the outcomes of voice therapy and developing personalized treatment plans. By considering key predictive variables such as gender, surgical history, voice use status, and voice training status, therapeutic approaches can be tailored to significantly enhance vocal function and improve the quality of life for patients with voice disorders. Future research should aim to validate these findings across larger and more diverse patient populations and explore additional predictive variables; furthermore, enhancing the model’s predictive power across different types of voice disorders will be crucial in developing more precise treatment plans. Such research will contribute to maximizing the effectiveness of voice therapy and providing personalized care tailored to individual patient needs.

5. Conclusions

Most previous studies have predicted the effects of voice therapy using objective measures such as voice analysis tests. However, in clinical practice, it is also important to consider subjective factors, such as patient satisfaction with treatment outcomes. Therefore, this study provides valuable insights into predicting voice therapy outcomes using the VHI. By analyzing various personal habits and clinical variables, we were able to identify key factors influencing the changes in VHI scores before and after voice therapy. Notably, gender, surgical history, frequency of voice use, and voice training status emerged as critical predictors of therapy outcomes; among these, gender was particularly significant, with female patients showing greater improvement in VHI scores compared to male patients. Surgical history and voice use status demonstrated high importance within the model, indicating that the physical causes of voice disorders and patterns of voice usage significantly impact therapeutic outcomes. For instance, patients with a history of surgery may experience more pronounced improvements in vocal condition post-therapy, while individuals who frequently use their voice may respond more sensitively to treatment due to the substantial impact on their quality of life. Clinically, these findings underscore the importance of voice training, emphasizing its positive role in sustaining and enhancing vocal function over time. Our analysis using the MLP model demonstrated the model’s ability to effectively distinguish between positive and negative therapy outcomes, making it a valuable tool in clinical settings; however, the model’s relatively low specificity indicates a limitation in accurately predicting cases for which therapy is ineffective, suggesting the need for further refinement to improve predictive accuracy.
This research offers valuable guidance for clinicians and speech–language pathologists in predicting voice therapy outcomes and developing personalized treatment strategies that take individual patient characteristics into account. Future research should focus on validating these findings across larger and more diverse patient populations and exploring additional predictive variables to enhance the accuracy of the model; moreover, further studies are needed to refine the model’s predictive power for various types of voice disorders. Such efforts will ultimately contribute to maximizing the effectiveness of voice therapy and improving the quality of life for patients with voice disorders.

Author Contributions

Data collection and analysis, J.H.P. and A.R.J.; conceptualization, J.-Y.L. and A.R.J.; methodology, J.-Y.L. and A.R.J.; software, J.-Y.L.; validation, J.-Y.L.; original draft preparation, J.-Y.L. and J.-N.L.; writing—review and editing, J.-Y.L. and A.R.J.; visualization, J.-N.L.; funding acquisition, J.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Eulji University in 2024 (2024-0114).

Institutional Review Board Statement

This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Nowon Eulji Medical Center, Eulji University (IRB No. 2022-04-014).

Informed Consent Statement

Not applicable.

Data Availability Statement

Data were obtained from Nowon Eulji Medical Center and are available from Prof. Ah Ra Jung with the permission of Nowon Eulji Medical Center.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lee, M.G.; Choi, H.S.; Choi, H.M.; Baek, H.S.; Lim, S.E.; Kauh, S.K.; Choi, Y. Changes in Respiration and Phonation in Acting Students after training with the Alexander Technique. Commun. Sci. Disord. 2014, 19, 371–380.
  2. Ma, E.M.; Yiu, E.M. Handbook of Voice Assessments; Plural Publishing: San Diego, CA, USA, 2011.
  3. Ma, E.P.; Yiu, E.M. Voice Activity and Participation Profile: Assessing the Impact of Voice Disorders on Daily Activities. J. Speech Lang. Hear. Res. 2001, 44, 511–524.
  4. Zraick, R.I.; Risner, B.Y.; Smith-Olinde, L.; Gregg, B.A.; Johnson, F.L.; McWeeny, E.K. Patient Versus Partner Perception of Voice Handicap. J. Voice 2007, 21, 485–494.
  5. Lee, S.J.; Choi, H.; Kim, H.; Byeon, H.K.; Lim, S.; Yang, M.K. Korean Version of the Voice Activity and Participation Profile (K-VAPP): A Validation Study. Commun. Sci. Disord. 2016, 21, 695–708.
  6. Jang, H.-R.; Shim, H.-J.; Shin, H.-B.; Ko, D.-H.; Kim, H.-K. The Relationship between Acoustic Characteristics and Voice Handicap Index in Esophageal Speakers. Phon. Speech Sci. 2014, 6, 115–121.
  7. Behrman, A.; Sulica, L.; He, T. Factors Predicting Patient Perception of Dysphonia Caused by Benign Vocal Fold Lesions. Laryngoscope 2004, 114, 1693–1700.
  8. Kim, J.O.; Lim, S.E.; Park, S.Y.; Choi, S.H.; Choi, J.N.; Choi, H.S. Validity and Reliability of Korean-Version of Voice Handicap Index and Voice-Related Quality of Life. Speech Sci. 2007, 14, 111–125.
  9. Portone, C.R.; Hapner, E.R.; McGregor, L.; Otto, K.; Johns, M.M., 3rd. Correlation of the Voice Handicap Index (VHI) and the Voice-Related Quality of Life Measure (V-RQOL). J. Voice 2007, 21, 723–727.
  10. Siupsinskiene, N.; Vaitkus, S.; Grebliauskaite, M.; Engelmanaite, L.; Sumskiene, J. Quality of Life and Voice in Patients Treated for Early Laryngeal Cancer. Medicina 2008, 44, 288–295.
  11. Lu, D.; Huang, M.; Cheng, I.K.; Dong, J.; Yang, H. Comparison and Correlation Between the Pediatric Voice Handicap Index and the Pediatric Voice-Related Quality-of-Life Questionnaires. Medicine 2018, 97, e11850.
  12. Jacobson, B.; Johnson, A.; Grywalski, C.; Silbergleit, A.; Jacobson, G.; Benninger, M.; Newman, C. The Voice Handicap Index (VHI): Development and Validation. Am. J. Speech Lang. Pathol. 1997, 6, 66–70.
  13. Barsties, V.L.B.; Watts, C.R.; Hetjens, S.; Neumann, K. The Efficacy of Different Voice Treatments for Vocal Fold Polyps: A Systematic Review and Meta-Analysis. J. Clin. Med. 2023, 12, 3451.
  14. Watanabe, K.; Sato, T.; Honkura, Y.; Kawamoto-Hirano, A.; Kashima, K.; Katori, Y. Characteristics of the Voice Handicap Index for Patients with Unilateral Vocal Fold Paralysis Who Underwent Arytenoid Adduction. J. Voice 2020, 34, 649.e1–649.e6.
  15. Jalalian, M.; Saleh, M.; Zarei, N.; Shekari, E.; Afshari, S. Comparing the Voice Handicap Index Scores in Groups with Structural and Functional Voice Disorders. J. Rehabil. 2019, 20, 376–382.
  16. Muslih, I.; Herawati, S.; Pawarti, D.R. Association Between Voice Handicap Index and Praat Voice Analysis in Patients with Benign Vocal Cord Lesion Before and After Microscopic Laryngeal Surgery. Indian J. Otolaryngol. Head Neck Surg. 2019, 71, 482–488.
  17. Hakkesteegt, M.M.; Brocaar, M.P.; Wieringa, M.H. The Applicability of the Dysphonia Severity Index and the Voice Handicap Index in Evaluating Effects of Voice Therapy and Phonosurgery. J. Voice 2010, 24, 199–205.
  18. Niebudek-Bogusz, E.; Kuzańska, A.; Woznicka, E.; Sliwinska-Kowalska, M. Assessment of the Voice Handicap Index as a Screening Tool in Dysphonic Patients. Folia Phoniatr. Logop. 2011, 63, 269–272.
  19. Kim, J.H.; Choi, H.G.; Park, B. Change of Voice Handicap Index After Laryngeal Microsurgery for Benign Vocal Fold Lesions. J. Laryngol. Voice 2015, 26, 34–39.
  20. Caffier, F.; Nawka, T.; Neumann, K.; Seipelt, M.; Caffier, P.P. Validation and Classification of the 9-Item Voice Handicap Index (VHI-9i). J. Clin. Med. 2021, 10, 3325.
  21. Sabir, B.; Touri, B.; Moussetad, M. Correlation Between Acoustic Measures, Voice Handicap Index and GRBAS Scales Scores Among Moroccan Students. Curr. Pediatr. Res. 2017, 21, 343–353.
  22. Cheng, J.; Woo, P. Correlation Between the Voice Handicap Index and Voice Laboratory Measurements After Phonosurgery. Ear Nose Throat J. 2010, 89, 183–188.
  23. Maertens, K.; de Jong, F.I. The Voice Handicap Index as a Tool for Assessment of the Biopsychosocial Impact of Voice Problems. B-ENT 2007, 3, 61–66.
  24. De Oliveira Lemos, I.; Marchand, D.L.; Cassol, M. Voice Handicap Index Check Pre- and Post-Vocal Intervention in Patients with Dysphonia. Audiol. Commun. Res. 2015, 20, 355–360.
  25. Lee, S.A.; Choi, H.J.; Kim, B.; Lee, H.; Lee, S.K.; Lee, J.G.; Nam, E.C. Voice Handicap Index and Vocal Characteristics of Teachers. Korean J. Otorhinolaryngol.-Head Neck Surg. 2012, 55, 101–106.
  26. Núñez-Batalla, F.; Corte-Santos, P.; Señaris-González, B.; Llorente-Pendás, J.L.; Górriz-Gil, C.; Suárez-Nieto, C. Adaptation and Validation to the Spanish of the Voice Handicap Index (VHI-30) and its Shortened Version (VHI-10). Acta Otorrinolaringol. Esp. 2007, 58, 386–392.
  27. Hwang, H.; Lee, S.; Park, H.Y.; Lim, H.Y.; Park, K.H.; Park, G.Y.; Im, S. Investigating the Impact of Voice Impairment on Quality of Life in Stroke Patients: The Voice Handicap Index (VHI) Questionnaire Study. Brain Neurorehabil. 2023, 16, e10.
  28. Mark, G.; Jackie, G.; Clark, R. The VHI-10 and VHI Item Reduction Translations—Are we all Speaking the Same Language? J. Voice 2017, 31, e1–e250.
  29. Kim, S.H.; Jeong, G.H. An Analysis for Influencing Factors in Purchasing Electric Vehicle Using a Binomial Logistic Regression Model (Focused on Suwon City). KSCE J. Civ. Environ. Eng. Res. 2018, 38, 887–894.
  30. Kim, M.J. A Study on WLB (Work-Life Balance) Attributes Affecting Job Satisfaction by Gender by Using a Logistic Regression. Inst. Bus. Manag. 2018, 41, 213–229.
  31. Gilles, L. Understanding Random Forests: From Theory to Practice. arXiv 2014, arXiv:1407.7502.
  32. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  33. Zhigang, S.; Guotao, W.; Pengfei, L.; Hui, W.; Min, Z.; Xiaowen, L. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Syst. Appl. 2024, 237, 121549.
  34. Byun, H.W. The Prediction Model for Self-Reported Voice Problem Using a Decision Tree Model. J. Korea Acad.-Ind. Coop. Soc. 2013, 14, 3368–3373.
  35. Verde, L.; Pietro, G.D.; Sannino, G. Voice Disorder Identification by Using Machine Learning Techniques. IEEE Access 2018, 6, 16246–16255.
  36. Yoo, J.-H.; Heo, E.-J.; Kim, N.-Y.; Lee, Y.-J.; Kim, G.-W. Predictors of Clinical Efficacy of Oriental Medical Treatment in Patients with Panic Disorder. J. Orient. Neuropsychiatry 2015, 26, 293–305.
  37. Yun, J.; Shim, H.J.; Seong, C. Classification of Muscle Tension Dysphonia (MTD) Female Speech and Normal Speech Using Cepstrum Variables and Random Forest Algorithm. Phon. Speech Sci. 2020, 12, 91–98.
  38. Mehmet, K. Performance Evaluation of Multilayer Perceptron Artificial Neural Network Model in the Classification of Heart Failure. J. Cogn. Syst. 2021, 6, 35–38.
  39. Gholamreza, P.; Maryam, M.Z. Comparison of Artificial Neural Network and SPSS Model in Predicting Customers Churn of Iran’s Insurance Industry. Int. J. Comput. Appl. 2020, 176, 14–21.
  40. Lee, J.; Choi, J. Alcohol Dependence Screening Test Using Artificial Neural Network Analysis: The Sensitivity and Specificity Study. J. Korean Acad. Addict. Psychiatry 2005, 9, 102–109.
  41. Zhang, Z.; Zhou, D.; Zhang, J.; Xu, Y.; Lin, G.; Jin, B.; Liang, Y.; Geng, Y.; Zhang, S. Multilayer Perceptron-Based Prediction of Stroke Mimics in Prehospital Triage. Sci. Rep. 2022, 12, 17994.
Figure 1. Research process flowchart of this study.
Figure 2. Histogram distributions of multiple variables categorized by participant status.
Figure 3. Comparison of VHI scores pre- and post-voice treatment across functional (F), physical (P), emotional (E), and total (T) domains.
Figure 4. VHI scores based on effectiveness (+, −) after voice treatment.
Figure 5. (a) Duration of voice treatment and (b) number of voice training sessions.
Figure 6. ROC curves of MLP modeling.
Figure 7. The relative importance of each variable.
Table 1. Overview of the dataset utilized in this study.

| Number of Samples | 66 Female and 36 Male Voices |
| --- | --- |
| Average age | 51 |
| Types of voice disorders (numbers) | vocal fold polyp (31), vocal paresis (2), intracordal cyst (2), vocal nodule (19), mutational dysphonia (1), thyroid nodule (1), hoarseness (1), muscle tension dysphonia (7), sulcus vocalis (2), presbyphonia (8), dysphonia (10), vocal cyst (7), leukoplakia (2), vocal palsy (1), thyroid cancer (2), vallecular cyst (1), vocal mass (2), laryngopharyngeal reflux (1), none (3) |
| Number of responsive samples (effectiveness, +) | 93 (total VHI score), 78 (functional subscale score), 96 (physical subscale score), 73 (emotional subscale score) |
| Variables | gender, smoking status, alcohol consumption, voice user status, coffee consumption, comorbidity, surgery status, voice training status |
Table 2. Voice handicap index 30 questionnaire [24,25,26,27,28]. Each item is rated 0, 1, 2, 3, or 4.

Functional subscale
  1. My voice makes it difficult for people to hear me.
  2. People have difficulty understanding me in a noisy room.
  3. My family has difficulty hearing me when I call them throughout the house.
  4. I use the phone less often than I would like to.
  5. I tend to avoid groups of people because of my voice.
  6. I speak with friends, neighbors, or relatives less often because of my voice.
  7. People ask me to repeat myself when speaking face-to-face.
  8. My voice difficulties restrict my personal and social life.
  9. I feel left out of conversations because of my voice.
  10. My voice problem causes me to lose income.

Physical subscale
  1. I run out of air when I talk.
  2. The sound of my voice varies throughout the day.
  3. People ask, “What’s wrong with your voice?”
  4. My voice sounds creaky and dry.
  5. I feel as though I have to strain to produce voice.
  6. The clarity of my voice is unpredictable.
  7. I try to change my voice to sound different.
  8. I use a great deal of effort to speak.
  9. My voice is worse in the evening.
  10. My voice “gives out” on me in the middle of speaking.

Emotional subscale
  1. I tend to avoid groups of people because of my voice.
  2. People seem irritated with my voice.
  3. I find other people don’t understand my voice problem.
  4. My voice problem upsets me.
  5. I am less outgoing because of my voice problem.
  6. My voice makes me feel handicapped.
  7. I feel annoyed when people ask me to repeat.
  8. I feel embarrassed when people ask me to repeat.
  9. My voice makes me feel incompetent.
  10. I’m ashamed of my voice problem.
Table 3. p-values calculated for various factors.

| Factor | p-Value | Factor | p-Value |
| --- | --- | --- | --- |
| gender | <0.001 ** | coffee consumption | 0.125 |
| smoking status | 0.372 | comorbidity | 0.894 |
| alcohol consumption | 0.104 | surgery status | 0.038 * |
| voice user status | 0.061 | voice training status | 0.021 * |

* and ** indicate statistical significance at the 0.05 and 0.01 levels, respectively.
Table 4. p-values calculated for various factors.

| Dependent Variable | Independent Variable | B | S.E. | Wald | p Value | Exp(B) |
| --- | --- | --- | --- | --- | --- | --- |
| VHI | gender | 3.220 | 1.275 | 6.381 | 0.012 | 25.033 |
| | smoking status | 1.175 | 1.602 | 1.224 | 0.263 | 3.237 |
| | alcohol consumption | 0.324 | 0.972 | 0.111 | 0.739 | 1.382 |
| | voice user status | −1.916 | 1.043 | 3.372 | 0.066 | 0.147 |
| | coffee consumption | −1.330 | 1.277 | 1.084 | 0.298 | 0.265 |
| | comorbidity | −0.313 | 0.964 | 0.106 | 0.745 | 0.731 |
| | surgery status | −1.757 | 1.187 | 2.190 | 0.139 | 0.173 |
| | voice training status | −0.405 | 1.054 | 0.147 | 0.701 | 0.667 |

Model: Nagelkerke R² = 0.422, p = 0.006; Hosmer–Lemeshow test p = 0.967; classification accuracy: 91.2%.
Table 5. Parameter setting information related to RF modeling.

| Parameter | Setting |
| --- | --- |
| Input factors | Gender, smoking status, alcohol consumption, voice user status, coffee consumption, comorbidity, surgery status, and voice training status |
| n_estimators | 50 |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| max_features | sqrt |
| bootstrap | True |
Table 6. Confusion matrices of RF model.

| | Reference: VHI (−) | Reference: VHI (+) | Total |
| --- | --- | --- | --- |
| Predicted: VHI (−) | 0 | 3 | 3 |
| Predicted: VHI (+) | 1 | 22 | 23 |
| Total | 1 | 25 | 26 |
Table 7. Classification performance of RF model.

| Performance | Value |
| --- | --- |
| Accuracy (%) | 84.61 |
| Precision | 0.88 |
| Specificity | 0.00 |
| Recall | 0.95 |
| F score | 0.91 |
Table 8. Information related to MLP modeling and results.

| Layer | Item | Setting |
| --- | --- | --- |
| Input layer | Input factors | Gender, smoking status, alcohol consumption, voice user status, coffee consumption, comorbidity, surgery status, and voice training status |
| | Number of units | 9 (including bias) |
| Hidden layer | Number of hidden layers | 1 |
| | Number of units | 2 |
| | Activation function | Hyperbolic tangent |
| Output layer | Dependent variable | VHI |
| | Number of units | 1 |
| | Activation function | Softmax |
| | Error function | Cross-entropy |
Table 9. Confusion matrices of MLP model.

| | Reference: VHI (−) | Reference: VHI (+) | Total |
| --- | --- | --- | --- |
| Predicted: VHI (−) | 2 | 1 | 3 |
| Predicted: VHI (+) | 3 | 20 | 23 |
| Total | 3 | 23 | 26 |
Table 10. Classification performance of MLP model.

| Performance | Value |
| --- | --- |
| Accuracy (%) | 84.61 |
| Precision | 0.95 |
| Specificity | 0.67 |
| Recall | 0.86 |
| F score | 0.90 |
| AUC | 0.87 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lee, J.-Y.; Park, J.H.; Lee, J.-N.; Jung, A.R. Personal and Clinical Predictors of Voice Therapy Outcomes: A Machine Learning Analysis Using the Voice Handicap Index. Appl. Sci. 2024, 14, 10376. https://doi.org/10.3390/app142210376