Article

An Efficient Machine Learning Approach for Diagnosing Parkinson’s Disease by Utilizing Voice Features

1 Computer Science & Engineering, Veer Madho Singh Bhandari Uttarakhand Technical University, Dehradun 248007, India
2 Department of Computer Science and Engineering, Women Institute of Technology, Dehradun 248007, India
3 Department of Computer Science & Engineering, Graphic Era Deemed to be University, Dehradun 248001, India
4 Division of Research and Innovation, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun 248007, India
5 Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico
6 Department of Computer Engineering, Faculty of Science and Technology, Vishwakarma University, Pune 411048, India
7 Research Center of Excellence for Health Informatics, Vishwakarma University, Pune 411048, India
8 Department of Information System, College of Applied Sciences, King Khalid University, Muhayel 61913, Saudi Arabia
9 Department of Electrical Engineering, G.B. Pant Institute of Engineering and Technology, Pauri 246194, India
* Author to whom correspondence should be addressed.
Electronics 2022, 11(22), 3782; https://doi.org/10.3390/electronics11223782
Submission received: 13 October 2022 / Revised: 3 November 2022 / Accepted: 16 November 2022 / Published: 17 November 2022

Abstract

Parkinson’s disease (PD) is a neurodegenerative disease that affects the neural, physiological, and behavioral systems of the brain; mild variations in the initial phases of the disease make precise diagnosis difficult. A general symptom of this disease is slowness of movement, known as ‘bradykinesia’. The symptoms of this disease appear in middle age, and their severity increases with age. One of the earliest signs of PD is a speech disorder. This research examines the effectiveness of supervised classification algorithms, namely support vector machine (SVM), naïve Bayes, k-nearest neighbor (K-NN), and artificial neural network (ANN), for diagnosing PD, using a diagnosis method that consists of feature selection based on the filter and wrapper methods followed by classification. Because only a few clinical test features are required for the diagnosis, such a method might reduce the time and expense associated with PD screening. The suggested strategy was compared with previously proposed PD diagnostic techniques and with well-known classifiers. The experimental outcomes show accuracies of 87.17% for SVM, 74.11% for naïve Bayes, 96.7% for ANN, and 87.17% for KNN; the ANN is therefore the most accurate classifier. The obtained results were compared with those of previous studies, and the proposed work was observed to offer comparable or better results.

1. Introduction

Parkinson’s disease, which is commonly associated with tremor, is caused by a reduction in dopamine levels in the brain that damages a person’s motor functions, or physical functioning. It is one of the world’s most common neurological diseases. The neurological signs and symptoms that result from this damage are intermittent and get worse as the disease progresses [1]. Because aging causes changes in the brain, such as the loss of synaptic connections and changes in neurotransmitters and neurohormones, this condition is more frequent among the elderly. Over time, the affected neurons die and cannot be replaced. The consequences of these neurological problems and of the falling dopamine levels appear gradually, making them difficult to detect until the patient’s condition requires medical treatment [2]. The symptoms and severity levels, however, differ between individuals. Major symptoms of this disease are speech deficiencies, short-term memory loss, loss of balance, and unstable posture [1].
According to a 2019 World Health Organization (WHO) report, more than 8.5 million individuals worldwide are living with this disease [3]. The chance of developing this disease rises with age; currently, about 4% of people with the disease worldwide are under 50 years of age. After Alzheimer’s disease, it is the most widespread neurodegenerative disease in the world, affecting millions of people [4,5]. Therapy for this disease is still in its initial stages, and doctors can only help patients alleviate its symptoms [6]. Moreover, there is no definitive diagnostic test for this disease, and the diagnosis depends largely on the patient’s medical history [1]. Because the procedures typically used for diagnosis and therapy are invasive, expensive, and demanding [7], a reasonably straightforward and accurate way to diagnose this disease is highly relevant.

1.1. Machine Learning-Based Detection of Parkinson’s Disease

Over the past few decades, researchers have explored a new way of detecting this disease through machine learning (ML) techniques, a subset of artificial intelligence (AI). By combining traditional diagnostic indications with ML, clinical personnel might better recognize patients with this disease.
Because walking is the most common activity in everyone’s daily life, gait has been linked to physical as well as neurological disorders. This disease, for example, has been identified using gait (mobility) data. Gait analysis approaches offer advantages such as being non-intrusive and having the potential to be used extensively in residential settings [8]. Some researchers have attempted to incorporate ML methods to make the procedure autonomous and possible to perform offline [9].
Furthermore, people in the early stages of this disease might experience speech problems [10]. These include dysphonia (impaired voice production), repetitious echoes (a narrow range of audio variation), and hypophonia (reduced vocal loudness) [7,11]. Information from human vocal output can be captured and evaluated using a computing unit [12,13].

1.2. Research Problem and Motivation

Detecting PD early is a crucial challenge. Even as their health deteriorates, patients can improve their quality of life if they receive an early diagnosis. Another issue is that diagnosing PD requires a number of steps, including gathering a thorough neurological history from the patient and examining their motor abilities in various environments.
Most recent studies deal with homogeneous, single-modality datasets (text, speech, video, or image). The proposed study highlights problems with dataset modification and multi-data handling procedures. The effectiveness of disease prediction is limited when only a single dataset is examined, whereas machine learning-based techniques for multivariate data processing make more real-time solutions possible. The multi-variate vocal data analysis (MVDA) approach aims to identify Parkinson’s disease from multiple dataset attributes using machine learning. This study examines the potential for improving multivariate and multimodal data processing, which helps raise the disease detection rate. Existing research has applied various ML-based techniques, such as support vector machines, naïve Bayes, K-NN, and artificial neural networks, to Parkinson’s data based on voice features. Building on these works, MVDA employs extensive datasets and machine learning approaches to improve disease identification, incorporating the multivariate acoustic characteristics of numerous patients. Within the MVDA system, PD is diagnosed with the proposed machine learning techniques.

1.3. Contribution

This research article covers machine learning techniques applied to the acoustic analysis of speech to diagnose this disease. The benefits and shortcomings of these algorithms in detecting the disease are thoroughly contrasted, and the potential drawbacks of existing comparative studies are explored. Among the classifiers compared, ANN achieves the best accuracy in speech analysis for diagnosis; however, it must still be enhanced and adapted to the difficulties that may arise from the data. Using the naïve Bayes classifier with suitable pre-processing might result in greater average accuracy. The main contributions of this paper are as follows:
  • To identify which machine learning algorithms, among SVM, KNN, naive Bayes, and ANN, offer the most accurate classification and diagnosis of Parkinson’s disease.
  • To develop statistical evaluations for the diagnosis of Parkinson’s disease in order to identify the frequency at which the best training and test results will be acquired, and consequently to assist in upcoming literature-based research.
  • The proposed system uses an ANN classifier, which attains the highest classification accuracy compared with the approaches used in earlier research.
  • In order to improve the prediction of PD, a comprehensive methodology was employed to explore the effectiveness and efficiency of various feature selection approaches.
  • The proposed model is evaluated with four machine learning methods, namely SVM, naive Bayes, k-NN, and ANN, and compared with earlier and more recent studies on PD detection.

1.4. Structure of Proposed Work

The rest of the study is structured as follows: Section 2 surveys related research. Section 3 discusses the methodology used to achieve the proposed objective. Section 4 describes the materials and methods. Section 5 presents the experiments and results. Section 6 provides the comparative study and discussion. Finally, Section 7 concludes the proposed work.

2. Related Works

In order to distinguish PD cases from healthy controls, a variety of modern machine learning algorithms, including support vector machines, artificial neural networks, logistic regression, naïve Bayes, etc., have been successfully used. In this study, numerous databases, including Web of Science, Elsevier, MDPI, Scopus, Science Direct, IEEE Xplore, Springer, and Google Scholar, were utilized to survey relevant papers on Parkinson’s disease.
In [14], the authors used KNN, SVM, and discriminant-function-based (DBF) classifiers for the diagnosis of PD, with parameters such as jitter, fundamental frequency, pitch, shimmer, and other statistical measures. The best accuracy among these classifiers, 93.83%, was obtained by KNN, which also performed well on other parameters such as sensitivity, specificity, and error rate.
The authors in [15] applied a convolutional neural network classifier to speech classification datasets. The accuracy of over 77% reached during the training phase makes the results promising. Similarly, [16] examined a variety of classifiers to identify individuals likely to have Parkinson’s disease. They used 40 participants in their investigation: 20 PD patients and 20 healthy controls. According to the experimental findings, the naive Bayes classifier achieved a detection accuracy of 65%, with a sensitivity of 63.6% and a specificity of 66.6%. In [17], the authors used three types of classifiers, based on KNN, SVM, and multilayer perceptron (MLP), to diagnose Parkinson’s disease. Among these ML classifiers, SVM with an RBF kernel performed best, with an overall classification accuracy of 85.294%.
Another work [18] summarizes the most recent deep learning methods for audio signal processing. The reviewed works include convolutional neural networks, long short-term memory (LSTM) architectures, and audio-specific neural network models. Similar to the previous studies, [19] detected PD using naive Bayes and other machine learning approaches. In their method, relevant features were extracted from the voice signals of PD patients and healthy control subjects using signal processing techniques. The naive Bayes algorithm showed a 69.24% detection accuracy and a 96.02% precision rate for the 22 voice characteristics. In [20], the authors proposed detecting Parkinson’s disease using SVM on shifted delta cepstral (SDC) and single frequency filtering cepstral coefficient (SFFCC) features extracted from speech signals of PD patients and healthy controls. Compared with the standard MFCC + SDC features, the SDC + SFFCC features increased performance by 9%. The conventional SVM on SDC + SFFCC features achieved a 73.33% detection accuracy with a 73.32% F1-score. In addition to the naive Bayes classifier, several other supervised methods, including but not restricted to well-known deep learning methods, have been suggested to distinguish PD patients from healthy controls.
In [21], the authors examined two decision forest algorithms, SysFor and ForestPA, along with the widely used random forest classifier as Parkinson’s detectors. In their study, random forest outperformed SysFor and ForestPA, reaching an average detection accuracy of 93.58% as the number of trees was increased. For classifying Parkinson’s disease from sets of acoustic vocal (voice) characteristics, the authors of [22] suggested two CNN-based frameworks. Both frameworks combine different feature sets, but in different ways: the first framework combines several feature sets before passing them as inputs to a nine-layered CNN, whereas the second framework feeds the feature sets to parallel input layers that are directly connected to convolution layers.
AI is helping physicians better diagnose and treat conditions such as postoperative hypotension, and more advanced future models may have even more widespread medical uses. Machine learning is the next evolutionary step in the creation of, and adherence to, therapeutic pathways. The real benefit of machine learning, however, is that it enables provider organizations to use information about the patient population from their own systems of record to create therapeutic pathways that are unique to their procedures, clientele, and physicians [23].
Vocal biomarkers and the Aachen aphasia database, which contains recordings and transcriptions of therapy sessions, were described in [24]. The authors also discussed how the biomarkers and the database could be used to build a recognition system that automatically maps pathological speech to aphasia type and severity.
In [25], the authors evaluated their technique on a dataset of 288 audio files from 96 participants: 48 healthy controls and 48 participants with cognitive impairment. The method outperformed techniques based on manual transcription and speech annotation, achieving an accuracy of 90.57% and classification results comparable to those of the most advanced neuropsychological screening tests.
In [26], the authors aimed to shed light on the early indicators of major depressive relapse, which were measured unobtrusively using remote measurement technologies (RMT).
RMT has the potential to alter how depression and other long-term disorders are evaluated and handled if it is found to be acceptable to patients and other important stakeholders and capable of providing clinically meaningful information predicting future deterioration.
It can be seen from the reviews above that the research carried out so far is restricted to a small number of datasets. These previous works inspired us to try a new methodology: in this study, we experimented with several feature selection methods before comparing the results of various machine learning classifiers. Table 1 summarizes 20 studies that used ML techniques to diagnose PD from its major symptoms, i.e., speech recordings, handwriting patterns, and gait features, with data collected from the UCI machine learning repository, the University of Oxford (UO), and other resources.

3. Proposed Work

The proposed ML model is built around four algorithms: SVM, naïve Bayes, KNN, and ANN. These algorithms are widely used in the literature because they are easy to use and need only a small number of parameters to be tuned. Developing a model to detect PD from voice recordings involves several phases. In the first phase, relevant features are extracted from the dataset. In the second phase, machine learning techniques are applied to classify healthy individuals and PD patients from acoustic features, and the outputs are presented as graphs and tables of accuracy scores. Finally, in the third phase, the classifier models are compared to identify the one with the best accuracy score. The complete technical process of the proposed work is represented in Figure 1. The proposed methodology has a lower computational cost than other methodologies because it uses a few voice features instead of heavy feature extraction from sources such as MRI, motion sensors, or handwriting assessments. Additionally, the performance of several popular classifiers was evaluated, and ANN was found to be the best classifier for the PD diagnosis problem.
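The second and third phases amount to training several classifiers on the same split and keeping the one with the best test accuracy. The sketch below shows that comparison loop under stated assumptions: the `Trainer` interface and the two stand-in classifiers (a majority-class baseline and a nearest-centroid classifier) are hypothetical placeholders, not the SVM, naïve Bayes, KNN, and ANN models used in this work.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical interface: a "trainer" fits on training data and returns a predict function.
Trainer = Callable[[List[Vector], List[int]], Callable[[Vector], int]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_classifiers(
    trainers: Dict[str, Trainer],
    X_train: List[Vector], y_train: List[int],
    X_test: List[Vector], y_test: List[int],
) -> Tuple[str, Dict[str, float]]:
    """Phase 2: train every classifier; phase 3: compare test accuracies and pick the best."""
    scores: Dict[str, float] = {}
    for name, train in trainers.items():
        predict = train(X_train, y_train)
        scores[name] = accuracy(y_test, [predict(x) for x in X_test])
    best = max(scores, key=scores.get)
    return best, scores


def majority_trainer(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Placeholder baseline that always predicts the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def centroid_trainer(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Placeholder nearest-centroid classifier: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict
```

In a real experiment, each entry in `trainers` would wrap one of the four classifiers, and the returned score table corresponds to the kind of comparison reported in the results.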

Feature Selection

Because many features are available, feature selection is frequently used to reduce the dimensionality of the data in machine learning based on voice analysis. As demonstrated in Figure 2, all feature selection algorithms share the aim of reducing redundancy and increasing relevance, which improves the accuracy of the disease’s diagnosis. A variety of feature selection strategies were applied before the data were supplied to the classifier. Filter-based strategies take into account the importance of the characteristics; as a result, they are stable, scalable, and of low complexity [47,48]. The major drawback of this approach is that it may overlook certain useful features, especially when the data arrive as a stream [49]. Filter-based techniques can be univariate or multivariate [50]. Univariate approaches analyze each attribute according to statistically based criteria such as information gain (IG) [51,52,53], whereas multivariate approaches calculate feature dependence before ranking the features. In addition, principal component analysis (PCA) is a widely used statistical technique for data analysis. PCA reduces the size of a dataset by projecting it onto a small set of new variables, the principal components, that capture most of the variation in the entire dataset. Because PCA is a transformation technique, the first principal component is the one with the largest variance, and the remaining principal components follow in descending order of variance [54]. Additionally, wrapper-based algorithms assess the quality of the chosen features based on the performance of the learning classifier.
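The univariate filter idea can be illustrated with information gain. This is a minimal sketch under a stated assumption: each continuous voice feature is binarized at its median before IG is computed, which is an illustrative choice rather than the discretization used in this study.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """IG obtained by splitting one continuous feature at `threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional


def rank_by_information_gain(X: List[List[float]], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from most to least informative.

    Each feature is split at its median (an illustrative assumption).
    """
    columns = list(zip(*X))
    gains = [information_gain(col, y, statistics.median(col)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: gains[j], reverse=True)
```

A filter method would keep the top-ranked indices and pass only those columns to the classifier; no model is trained during the ranking itself.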
Filter techniques run entirely in the pre-processing stage, independently of any model; the filter bypasses the models. Filter methods primarily consider the data’s distribution, correlation, and internal relationships. As a result, filter techniques have the advantage of being simple and quick to compute, which is why they are commonly used in the diagnosis of this disease. One popular filter method is minimum redundancy and maximum relevance (mRMR), which selects features that are dissimilar to one another but have a strong “correlation” with the classification variable.
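A greedy mRMR selection can be sketched as follows. As a simplifying assumption, the absolute Pearson correlation stands in for the mutual information used in the usual mRMR formulation, and the relevance-minus-mean-redundancy score is one common variant; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def mrmr(columns: List[Sequence[float]], target: Sequence[float], k: int) -> List[int]:
    """Greedy mRMR with |r| as a proxy for both relevance and redundancy."""
    relevance = [abs(pearson(col, target)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        candidates = [j for j in range(len(columns)) if j not in selected]

        def score(j: int) -> float:
            # Mean redundancy with the features already chosen.
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected) / len(selected)
            return relevance[j] - redundancy

        selected.append(max(candidates, key=score))
    return selected
```

With an exact duplicate of an already selected feature among the candidates, the redundancy term drives its score down, so a less redundant feature with similar relevance is chosen instead.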
The wrapper method decides whether to keep or reject a feature depending on the change in a classifier’s performance [55]. Because it takes a specific classifier into account, it provides a well-tailored subset. As a result, wrapper methods have a lower chance of finding the local maximum. Owing to their large gains in performance, wrapper approaches are popular in ML diagnostics. However, they have drawbacks: they are prone to overfitting and computationally costly. Wrapper-based feature selection techniques use a classifier to build ML models with different subsets of predictor variables and select the subset that leads to the best model.
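Sequential forward selection is one common wrapper strategy. In this minimal sketch, `evaluate` is a hypothetical callback that would return, for example, the cross-validated accuracy of a classifier trained on the given feature subset; the search stops as soon as no remaining feature improves that score.

```python
from typing import Callable, List


def forward_selection(
    n_features: int,
    evaluate: Callable[[List[int]], float],
    max_features: int,
) -> List[int]:
    """Greedy sequential forward selection (a wrapper method).

    At each step, add the feature whose inclusion gives the highest
    evaluate(subset); stop when no candidate improves on the current best.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        scores = {j: evaluate(selected + [j]) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no candidate improves the wrapped classifier's score
        selected.append(j_best)
        best_score = scores[j_best]
    return selected
```

Each call to `evaluate` normally means training and validating a model, which is why wrapper methods are more expensive than filters.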
In contrast, filter-based methods are statistical techniques, independent of any learning algorithm, that compute the correlation between the predictor variables and the target variable. Each predictor variable is scored according to its relevance to the target variable, and the variables with higher scores are then used to build the ML model. Therefore, this research uses a filter-based feature selection method to identify the most relevant features for improved PD detection.

4. Materials and Methods

4.1. Dataset

The dataset of recorded speech signals was obtained from Max Little of the University of Oxford [56,57]. Table 2 contains the details of the dataset. This dataset contains a range of acoustic speech measures from 195 voice recordings made by 31 people, 23 of whom have Parkinson’s disease; 147 of the recordings come from people with the disease. Each attribute (column) in the dataset is a particular voice measure, and each tuple (row) corresponds to one voice recording. The objective is to distinguish healthy persons from those with the disease using the “status” column, which is set to 0 for healthy persons and 1 for those with the disease.
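Loading the data could look like the sketch below. It assumes the CSV layout of the publicly distributed Oxford/UCI file: a `name` identifier column, numeric voice measures, and a 0/1 `status` column. The function name is hypothetical, and the file name in the usage note is an assumption about the local copy.

```python
import csv
from typing import Iterable, List, Tuple


def load_parkinsons(lines: Iterable[str]) -> Tuple[List[str], List[List[float]], List[int]]:
    """Parse the voice-measure CSV into (feature_names, X, y).

    Assumes a 'name' identifier column, numeric feature columns, and a
    'status' column with 0 = healthy and 1 = PD.
    """
    reader = csv.DictReader(lines)
    feature_names = [c for c in reader.fieldnames if c not in ("name", "status")]
    X: List[List[float]] = []
    y: List[int] = []
    for row in reader:
        X.append([float(row[c]) for c in feature_names])
        y.append(int(row["status"]))
    return feature_names, X, y
```

Usage: `with open("parkinsons.data", newline="") as f: names, X, y = load_parkinsons(f)`. The `name` column is dropped because it identifies the recording rather than describing the voice.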

4.2. Parkinson’s Disease Diagnosis Based on Voice Analysis and Machine Learning

Some studies have concentrated on the acoustic level or on the fluctuations in fundamental frequency (F0) caused by vocal activity. Power spectral analysis of F0 during phonation has been examined in persons with sensorineural hearing loss and in persons with this disease [58,59,60]. The F0 rhythm differed in frequency and amplitude between these conditions, and the study demonstrated that F0 analysis can be a useful tool for investigating neurological diseases. The autocorrelation function approach was used to find the fundamental frequencies of the speech signals. According to one hypothesis, Parkinsonian dysprosody is frequently described as a purely neuro-motor disorder.
The perception and production of pitch characteristics in a group of patients were examined to test this idea. Conventional medication such as levodopa (L-DOPA) is a very effective treatment in the early stages of PD [61]. In [62], the authors used deep learning to categorize patients’ speech data as “severe” or “not severe”. The evaluation measure employed in that study was the Unified Parkinson’s Disease Rating Scale (UPDRS). The motor UPDRS rates the patient’s motor ability on a 0–108 scale, whereas the total UPDRS ranges from 0 to 176.
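The autocorrelation approach to F0 estimation mentioned above can be sketched as follows: the lag with the largest autocorrelation within a plausible pitch range is taken as the pitch period. The 75 to 500 Hz search range and the helper names are assumptions for illustration; practical pitch trackers add windowing, normalization, and voicing decisions that are omitted here.

```python
import math
from typing import List, Sequence


def synth_tone(freq: float, sample_rate: float, n_samples: int) -> List[float]:
    """Pure sine test signal (a stand-in for a sustained vowel)."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n_samples)]


def estimate_f0(signal: Sequence[float], sample_rate: float,
                f_min: float = 75.0, f_max: float = 500.0) -> float:
    """Estimate F0 (Hz) as sample_rate / lag, where lag maximizes the autocorrelation."""
    n = len(signal)
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

For a 200 Hz tone sampled at 8000 Hz, the pitch period is 40 samples, so the estimate is 8000 / 40 = 200 Hz.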

4.3. Classification of Parkinson’s Disease with ML Classifier

In this technique, we use an ML classifier to classify the disease. First, we select the patient’s health status as the target variable and count the number of samples in each class. After assessing the health status, we visualize the data graphically. The dataset was then split into two parts: 80% for training and 20% for testing. In Figure 3, a score of 0 represents the healthy persons in the sample, whose count is 48, and a score of 1 represents the patients with Parkinson’s disease, whose count is 147. The dataset therefore contains 147 Parkinson’s disease samples out of 195 (75.38%) and 48 healthy samples out of 195 (24.62%).
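The 80/20 split and the class proportions can be reproduced with a short sketch. Whether the original split was stratified is not stated, so stratification here is an assumption; it keeps roughly the same 75/25 class ratio in both parts, which matters for an imbalanced dataset like this one. The helper names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def class_percentages(labels: Sequence[int]) -> Dict[int, float]:
    """Share of each class label, in percent, rounded to two decimals."""
    n = len(labels)
    return {label: round(100 * count / n, 2) for label, count in Counter(labels).items()}


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), holding out test_fraction of each class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

With 147 PD and 48 healthy samples, this holds out 29 + 10 = 39 samples for testing and keeps 156 for training.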

4.4. Building of Machine Learning Techniques with Classifier Evaluation Metrics

Using different types of classifiers makes it easier to detect the disease. Sensitivity, Matthews correlation coefficient (MCC), accuracy, specificity, F-score (F-measure), and other measurement parameters are used to compare the classifiers. Each of these criteria has a formula for calculating it, which helps determine which classifier is most appropriate for the analysis. The confusion matrix must be defined before these criteria can be computed [63]. The confusion matrix of the multi-class classifier is shown in Figure 4.
F1-score: The harmonic mean of precision and sensitivity (recall), which summarizes a model’s performance on a given dataset; it is also known as the F-score and is given by Equation (1):
$$F\text{-score} = \frac{2 \times \text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}} \qquad (1)$$
MCC: It is used to evaluate the quality of binary and multi-class classifications, as shown in Equation (2). It is computed from the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) counts and ranges from −1 to 1, interpreted as follows:
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (2)$$
(−1): Contradiction between prediction and observation
(0): No better than random prediction
(1): Perfect classifier (accurate prediction).
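The metrics above can be computed directly from the four confusion-matrix counts. The sketch below follows Equations (1) and (2) for the binary case; returning 0.0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision, F1 (Eq. 1), and MCC (Eq. 2)."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }
```

For example, with TP = 3, TN = 2, FP = 1, and FN = 4, precision is 3/4, sensitivity is 3/7, the F1-score is 6/11, and the MCC is 2/√504 ≈ 0.089, close to the "no better than random" end of the scale even though accuracy is 50%.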

5. Experiments and Results

The proposed work is implemented in Python 3.7 using JupyterLab. This section details the experimental setup and the results of the four machine learning classification methods.

5.1. SVM-Classifier

SVM is one of the most prevalent classifier models because it provides accurate and highly robust results. The fundamental goal of SVM is to classify the training data by separating the classes with a maximum-margin hyperplane, and it can also be applied to multi-class learning tasks. This yields good classification performance on the training data and allows patterns in the data to be classified accurately [64]. The training procedure uses a sequential minimal optimization strategy, and SVM shows higher classification accuracy owing to its greater generalization ability [65]. The linear SVM is computed using the following Equation (3):
$$y = f(x) = w^{T}x + b \qquad (3)$$
where x represents the data, y represents the class label (given by the sign of f(x)), w is the weight vector orthogonal to the decision hyperplane, b is the offset of the hyperplane, and T denotes the transpose operator [66].
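The decision rule in Equation (3) can be illustrated with a small linear SVM. As a deliberate simplification, this sketch trains it by full-batch subgradient descent on the L2-regularized hinge loss rather than with sequential minimal optimization; labels are assumed to be −1/+1, and the hyperparameter values and function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """f(x) = w^T x + b, as in Equation (3)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(X: List[Sequence[float]], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * f(x))) by subgradient descent."""
    d, n = len(X[0]), len(X)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wi for wi in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # inside the margin: hinge term is active
                for k in range(d):
                    gw[k] -= yi * xi[k] / n
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Class label = sign of the decision value."""
    return 1 if decision(w, b, x) >= 0 else -1
```

Only points on or inside the margin contribute to the update, which mirrors the role of support vectors in the full SVM formulation.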
In this study, we use the SVM classifier module of the sklearn library to classify the given dataset. Table 3 presents the results generated by the SVM classifier (Figure 5). Figure 6 shows the confusion matrix of the SVM classifier, with the true-positive, true-negative, false-positive, and false-negative counts for PD detection.

5.2. Naive Bayes Classifier

Another essential ML classification method is the naive Bayes classifier, which provides effective classification and learning, and many results in the literature have been obtained with it [67]. Naïve Bayes is based on Bayes’ theorem, which gives the probability of an event given evidence related to that event. For instance, variations in the voice are common in people with the disease; hence, these symptoms are linked to the prediction for diagnosis of this disease. The naive variant simplifies the original Bayes’ theorem by assuming that the features are conditionally independent given the class, which makes it practical to compute the probability of a target occurrence. To estimate the likelihood of the medical condition, the data comprise numerous speech signal variants. The sklearn Gaussian naive Bayes algorithm provides the classifier module used for naïve Bayes classification. The results of the classifier are shown in Table 4, and a graphical representation is illustrated in Figure 7.
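A minimal Gaussian naive Bayes classifier, shown below, makes the conditional-independence assumption explicit: each class gets a prior and one normal distribution per feature, and prediction picks the class with the largest log posterior. This is an illustrative sketch, not sklearn’s `GaussianNB` implementation; the class name is hypothetical, and the small `eps` added to each variance is an assumption to avoid division by zero.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Per-class priors plus independent per-feature normal likelihoods."""

    def __init__(self, eps: float = 1e-9):
        self.eps = eps  # added to each variance to avoid division by zero
        self.params: Dict[int, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "GaussianNaiveBayes":
        groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        n = len(X)
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + self.eps
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def log_posterior(self, x: Sequence[float], label: int) -> float:
        """log P(class) + sum of per-feature log N(x_i | mean_i, var_i), unnormalized."""
        log_prior, means, variances = self.params[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, x: Sequence[float]) -> int:
        return max(self.params, key=lambda label: self.log_posterior(x, label))
```

Summing log densities per feature is exactly where the naive independence assumption enters; without it, a full covariance matrix per class would be required.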

5.3. Artificial Neural Network

ANNs are computational models inspired by how the human brain works; deep neural networks are a subfield of ANNs. In general, there is a significant distinction between the human brain and an ANN. The brain has a vast number of neurons working in parallel, whereas a machine has only a finite number of processors. Additionally, individual neurons are simpler and slower than computer processors. Another major disparity between computer systems and the brain is the ability to process information on a larger scale. Neurons are connected through synapses into networks that operate together [64,68]. In this article, the main aim is to examine the role of ANN techniques in the early detection of this disease, which is built on the following phases:
  • Identifying the responsibility and function of ANN in the detection of this disease.
  • Making observations on labels and features of datasets.
  • Grouping the types of the studied disease centered on their symptoms.
  • Examining the accurate outcomes.
These outcomes can further be used in the medical sector as guidance for developers considering ANN deployment to strengthen public health capacity in response to the studied disease [69].
In the artificial neural network experiment, the dataset was split into two parts: a training dataset (80%) and a test dataset (20%). The ANN achieved the highest average accuracy score among all the classification methods, 96.7%, as shown in Table 5; a graphical representation is shown in Figure 8.
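To make the ANN concrete, the sketch below trains a tiny one-hidden-layer network (tanh hidden units, sigmoid output, cross-entropy loss) with per-sample gradient descent. The class name, layer size, learning rate, and epoch count are illustrative assumptions; this is not the network architecture used in this study.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One tanh hidden layer and a sigmoid output unit for binary classification."""

    def __init__(self, n_inputs: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, p

    def fit(self, X: List[Sequence[float]], y: List[int],
            lr: float = 0.5, epochs: int = 500) -> "TinyMLP":
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self.forward(x)
                d_out = p - t  # gradient of cross-entropy w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    # Back-propagate through tanh using W2[j] before it is updated.
                    d_hidden = d_out * self.W2[j] * (1.0 - hj ** 2)
                    self.W2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.W1[j][i] -= lr * d_hidden * xi
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_out
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.forward(x)[1] >= 0.5 else 0
```

A library implementation would add mini-batching, adaptive optimizers, and early stopping, but the forward and backward passes follow the same pattern.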

5.4. K-Nearest Neighbor

The KNN technique is widely used in pattern recognition, but it is costly with a large training dataset because every prediction compares the test tuple against the training tuples. KNN is based on learning by analogy: a given test tuple is classified by comparing it with the training tuples most similar to it. Training tuples are described by “n” attributes, so each tuple corresponds to a distinct point in an n-dimensional space. Given an unlabeled tuple, the KNN classifier searches the pattern space for the k training tuples closest to it [64]. This study aims to identify the accuracy rate of detecting the disease, and the KNN algorithm is used to distinguish affected patients from healthy persons. In terms of accuracy, the experimental data reveal that the ANN classifier outperformed the KNN classifier on average. The results of the KNN classifier are shown in Table 6, with the accuracy rates on the training and test datasets, F1-score, and MCC illustrated in Figure 9.
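The KNN procedure described above reduces to computing distances, taking the k closest training tuples, and voting. This sketch assumes Euclidean distance and k = 3; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_train: List[Sequence[float]], y_train: List[int],
                x: Sequence[float], k: int = 3) -> int:
    """Label x by majority vote among its k nearest training tuples."""
    neighbours = sorted(zip((euclidean(xi, x) for xi in X_train), y_train))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Because the voice measures in this dataset are on very different scales, standardizing each feature before computing distances is usually advisable; that step is omitted here for brevity.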

5.5. Summary of Evaluation Results

The performance of all the classifier models used in the experiment for the disease’s prediction is presented in Table 7. The artificial neural network classifier scores the highest accuracy rate, followed by SVM and KNN (which tie) and then naïve Bayes. Figure 10 shows the graphical representation of the results obtained by these four ML classifiers for various parameters. As Table 7 shows, SVM attained average accuracies of 88.46% and 87.17% on the training and test datasets, respectively, an F1-score of 66.19%, an MCC of 56.59%, and a sensitivity and specificity of 62.5% and 93.54%, respectively. The naïve Bayes classifier achieved a training accuracy, test accuracy, F1-score, MCC, sensitivity, and specificity of 76.23%, 74.11%, 86.74%, 66.56%, 84%, and 79.76%, respectively.
The SVM and KNN results have the same values for all parameters except MCC (65.02% for KNN) and sensitivity (60% for KNN). Finally, the best accuracy was obtained by the ANN, whose results for the accuracy of the training and test datasets, F1-score, MCC, sensitivity, and specificity are 97.4%, 96.7%, 64.55%, 87.01%, 70.11%, 92.42%, and 91.25%, respectively. Overall, the results of our experiments show that ANN outperforms SVM, naive Bayes, and KNN.

6. Comparative Study and Discussion

This section compares the results of the proposed technique with those of other conventional machine learning techniques. Table 8 compares the proposed study with previously published research.
As the comparative analysis shows, the proposed model (using four machine learning algorithms) obtains better results than the other experimental models and the existing methods compared in Table 8. In the proposed study, the best result was achieved by ANN with 96.7% accuracy, which is higher than that of the other three algorithms.

Sakar et al. [70] collected speech recordings from 20 PD patients and 20 healthy controls (HC) using high-quality recording equipment. They analyzed the recordings with KNN and SVM to detect PD. Under leave-one-subject-out (LOSO) validation, the KNN and SVM classifiers achieved accuracies of 59.52% and 68.45%, respectively.

In [71], the authors used decision-tree-based algorithms: C4.5, C5.0, random forest, and CART. They experimented on the records of 40 individuals, half with PD and half HC, and attained a highest average model accuracy of 66.5%.

ANN was used in [28] to identify PD. The dataset was obtained from the University of California, Irvine (UCI) machine learning repository. A total of 45 attributes were used as input values, with one output for classification, in MATLAB. The resulting model distinguished healthy individuals from PD subjects with an accuracy of 94.93%.

In [73], the authors used random forest, SVM, MLP, and KNN classifiers to distinguish PD patients from HC. They obtained accuracies of 78.4% for SVM and 82.2% for KNN.

In [74], the authors compared patients with PD (PWP) and HC using a variety of speech samples. They extracted human factor cepstral coefficients (HFCC) and used them to generate an average voice print for each voice recording. SVM was then applied for classification with several kernels, including RBF, polynomial, linear, and MLP. The linear kernel gave the highest accuracy, 87.5%.
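The accuracies reported for [70] were obtained with leave-one-subject-out (LOSO) validation. In each fold, all recordings from one subject are held out together, so the classifier is never tested on a speaker whose recordings it was trained on. A minimal sketch of the split logic follows. The subject IDs are hypothetical, and `loso_splits` is an illustrative helper, not code from [70].

```python
from typing import Iterator, List, Sequence, Tuple


def loso_splits(subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices) for leave-one-subject-out CV.

    All recordings of the held-out subject form the test fold, so no speaker
    contributes recordings to both the training and the test data.
    """
    # dict.fromkeys keeps each subject once, in order of first appearance.
    for held_out in dict.fromkeys(subject_ids):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical layout: each recording is tagged with the subject who produced it.
recordings = ["S1", "S1", "S2", "S2", "S3"]
for subject, train_idx, test_idx in loso_splits(recordings):
    print(subject, train_idx, test_idx)
```

For the toy layout above, the three folds hold out S1 (test indices [0, 1]), S2 ([2, 3]), and S3 ([4]). scikit-learn provides the same scheme as `LeaveOneGroupOut` in `sklearn.model_selection`, with the subject IDs passed as `groups`.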
Beyond the comparisons above, the proposed methodology is compared with related ML methods for PD analysis. These methods were evaluated in different scenarios and on different types of PD datasets. As Table 8 shows, the proposed technique achieved a higher accuracy in diagnosing PD than the related ML-based contributions that report accuracy.

7. Conclusions

Automated ML techniques can distinguish PD from HC and predict the outcome using non-invasive speech biomarkers as features. Our study compares the performance of multiple machine learning classifiers for disease detection on noisy, high-dimensional data. It shows that clinical-level accuracy is feasible with careful feature selection.

In this paper, we compared four ML classifiers: SVM (accuracy 87.17%), naïve Bayes (74.11%), ANN (96.7%), and KNN (87.17%). We used these techniques to distinguish PD patients from healthy people on the basis of human speech signals. The results demonstrate that feature selection techniques work well with ML classifiers. This is especially true for voice data, from which a large number of phonetic characteristics can be extracted. The proposed approach may make it possible to detect PD with high accuracy in its early stages, allowing intervention before severe symptoms develop.

Many classification algorithms are used in medical imaging to obtain the best possible accuracy. This work could be extended to other machine learning methods and datasets to further improve classifier performance. In future work, we will use the existing recordings and increase the number of attributes to improve the accuracy of the models. Other publicly available data-processing software may also be used to compare the collected data.
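The proposed method combines filter- and wrapper-based feature selection. A filter method scores each feature independently of any classifier. The sketch below illustrates the idea, assuming the absolute Pearson correlation with the 0/1 class label as the score. This is one common filter criterion, not necessarily the one used in this study, and the feature names and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(X: List[Sequence[float]], y: Sequence[int],
                  names: Sequence[str], top_k: int) -> List[Tuple[str, float]]:
    """Filter-style selection: score each feature by |r| with the class label,
    independently of any classifier, and keep the top_k highest-scoring ones."""
    columns = list(zip(*X))  # transpose rows (recordings) into feature columns
    scores = [(name, abs(pearson(col, y))) for name, col in zip(names, columns)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: 4 recordings x 3 made-up features; label 0 = healthy, 1 = PD.
X = [[1.0, 5.0, 2.0],
     [2.0, 5.0, 1.0],
     [3.0, 5.0, 2.0],
     [4.0, 5.0, 1.0]]
y = [0, 0, 1, 1]
print(rank_features(X, y, ["feat_a", "feat_b", "feat_c"], top_k=3))
```

With the toy data, `feat_a` ranks first (|r| ≈ 0.894). The constant `feat_b` and the label-independent `feat_c` both score 0. A wrapper method would instead evaluate candidate feature subsets by training and scoring the classifier itself. This is more expensive but accounts for interactions between features.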

Author Contributions

Conceptualization, A.R.; methodology, A.R. and A.D.; validation, A.R. and M.R.; formal analysis, A.R. and N.A.; writing—original draft preparation, A.R.; writing—review and editing, M.K.P. and M.R.; supervision, A.D. and R.S. All authors have read and agreed to the published version of the manuscript.

Funding

There was no external funding received for this article.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data in this research paper will be shared upon request made to the first author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. DeMaagd, G.; Philip, A. Parkinson’s Disease and Its Management: Part 1: Disease Entity, Risk Factors, Pathophysiology, Clinical Presentation, and Diagnosis. Pharm. Ther. 2015, 40, 504–532.
  2. Rizek, P.; Kumar, N.; Jog, M.S. An update on the diagnosis and treatment of Parkinson disease. CMAJ 2016, 188, 1157–1165.
  3. Available online: https://www.who.int/news-room/fact-sheets/detail/parkinson-disease (accessed on 30 October 2022).
  4. De Rijk, M.C.; Launer, L.J.; Berger, K.; Breteler, M.M.; Dartigues, J.F.; Baldereschi, M.; Fratiglioni, L.; Lobo, A.; Martinez-Lage, J.; Trenkwalder, C.; et al. Prevalence of Parkinson’s disease in Europe: A collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology 2000, 54 (Suppl. S5), S21–S23.
  5. Cantürk, İ.; Karabiber, F. A machine learning system for the diagnosis of Parkinson’s disease from speech signals and its application to multiple speech signal types. Arab. J. Sci. Eng. 2016, 41, 5049–5059.
  6. Singh, N.; Pillay, V.; Choonara, Y.E. Advances in the treatment of Parkinson’s disease. Prog. Neurobiol. 2007, 81, 29–44.
  7. Rana, A.; Rawat, A.S.; Bijalwan, A.; Bahuguna, H. Application of multi-layer (perceptron) artificial neural network in the diagnosis system: A systematic review. In Proceedings of the 2018 International Conference on Research in Intelligent and Computing in Engineering (RICE), San Salvador, El Salvador, 22–24 August 2018; pp. 1–6.
  8. Lakany, H. Extracting a diagnostic gait signature. Pattern Recognit. 2008, 41, 1627–1637.
  9. Figueiredo, J.; Santos, C.P.; Moreno, J.C. Automatic recognition of gait patterns in human motor disorders using machine learning: A review. Med. Eng. Phys. 2018, 53, 1–12.
  10. Hazan, H.; Hilu, D.; Manevitz, L.; Ramig, L.O.; Sapir, S. Early diagnosis of Parkinson’s disease via machine learning on speech data. In Proceedings of the 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 14–17 November 2012; pp. 1–4.
  11. Karan, B.; Sahu, S.S.; Mahto, K. Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybern. Biomed. Eng. 2019, 40, 249–264.
  12. Frid, A.; Safra, E.J.; Hazan, H.; Lokey, L.L.; Hilu, D.; Manevitz, L.; Ramig, L.O.; Sapir, S. Computational diagnosis of Parkinson’s Disease directly from natural speech using machine learning techniques. In Proceedings of the 2014 IEEE International Conference on Software Science, Technology and Engineering, Washington, DC, USA, 11–12 June 2014; pp. 50–53.
  13. Rawat, A.S.; Rana, A.; Kumar, A.; Bagwari, A. Application of multi layer artificial neural network in the diagnosis system: A systematic review. IAES Int. J. Artif. Intell. 2018, 7, 138.
  14. KarimiRouzbahani, H.; Daliri, M.R. Diagnosis of Parkinson’s Disease in Human Using Voice Signals. BCN 2011, 2, 12–20.
  15. Khamparia, A.; Gupta, D.; Nguyen, N.G.; Khanna, A.; Pandey, B.; Tiwari, P. Sound Classification Using Convolutional Neural Network and Tensor Deep Stacking Network. IEEE Access 2019, 7, 7717–7727.
  16. Bourouhou, A.; Jilbab, A.; Nacir, C.; Hammouch, A. Comparison of classification methods to detect the parkinson disease. In Proceedings of the 2016 International Conference on Electrical and Information Technologies (ICEIT), Tangiers, Morocco, 4–7 May 2016; pp. 421–424.
  17. Sharma, A.; Giri, R.N. Automatic Recognition of Parkinson’s Disease via Artificial Neural Network and Support Vector Machine. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 2014, 4, 7.
  18. Purwins, H.; Li, B.; Virtanen, T.; Schluter, J.; Chang, S.-Y.; Sainath, T.N. Deep Learning for Audio Signal Processing. IEEE J. Sel. Top. Signal Process. 2019, 13, 206–219.
  19. Zhang, L.; Qu, Y.; Jin, B.; Jing, L.; Gao, Z.; Liang, Z. An Intelligent Mobile-Enabled System for Diagnosing Parkinson Disease: Development and Validation of a Speech Impairment Detection System. JMIR Public Health Surveill. 2020, 8, e18689.
  20. Kadiri, S.R.; Kethireddy, R.; Alku, P. Parkinson’s Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020.
  21. Pramanik, M.; Pradhan, R.; Nandy, P.; Bhoi, A.K.; Barsocchi, P. Machine Learning Methods with Decision Forests for Parkinson’s Detection. Appl. Sci. 2021, 11, 581.
  22. Gunduz, H. Deep Learning-Based Parkinson’s Disease Classification Using Vocal Feature Sets. IEEE Access 2019, 7, 115540–115551.
  23. Available online: https://www.dataversity.net/improving-clinical-insights-machine-learning/# (accessed on 27 August 2022).
  24. Kohlschein, C.; Schmitt, M.; Schuller, B.; Jeschke, S.; Werner, C.J. A machine learning based system for the automatic evaluation of aphasia speech. In Proceedings of the 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China, 12–15 October 2017; pp. 1–6.
  25. Bertini, F.; Allevi, D.; Lutero, G.; Montesi, D.; Calzà, L. Automatic Speech Classifier for Mild Cognitive Impairment and Early Dementia. ACM Trans. Comput. Healthc. 2022, 3, 1–11.
  26. Matcham, F.; On Behalf of the RADAR-CNS Consortium; Pietro, C.B.D.S.; Bulgari, V.; de Girolamo, G.; Dobson, R.; Eriksson, H.; Folarin, A.A.; Haro, J.M.; Kerz, M.; et al. Remote assessment of disease and relapse in major depressive disorder (RADAR-MDD): A multi-centre prospective cohort study protocol. BMC Psychiatry 2019, 19, 72.
  27. Sakar, C.O.; Serbes, G.; Gunduz, A.; Tunc, H.C.; Nizam, H.; Sakar, B.E.; Tutuncu, M.; Aydin, T.; Isenkul, M.E.; Apaydin, H. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable-factor wavelet transform. Appl. Soft Comput. 2019, 74, 255–263.
  28. Yasar, A.; Saritas, I.; Sahman, M.A.; Cinar, A.C. Classification of Parkinson disease data with artificial neural networks. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Wuhan, China, 10–12 October 2019; Volume 675, p. 012031.
  29. Avuçlu, E.; Elen, A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med. Biol. Eng. Comput. 2020, 58, 2775–2788.
  30. Marar, S.; Swain, D.; Hiwarkar, V.; Motwani, N.; Awari, A. Predicting the occurrence of Parkinson’s Disease using various Classification Models. In Proceedings of the 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), Bhopal, India, 28–29 December 2018; pp. 1–5.
  31. Nikookar, E.; Sheibani, R.; Alavi, S.E. An ensemble method for diagnosis of Parkinson’s disease based on voice measurements. J. Med. Signals Sens. 2019, 9, 221–226.
  32. Tracy, J.M.; Özkanca, Y.; Atkins, D.C.; Ghomi, R.H. Investigating voice as a biomarker: Deep phenotyping methods for early detection of Parkinson’s disease. J. Biomed. Inform. 2019, 104, 103362.
  33. Cibulka, M.; Brodnanova, M.; Grendar, M.; Grofik, M.; Kurca, E.; Pilchova, I.; Osina, O.; Tatarkova, Z.; Dobrota, D.; Kolisek, M. SNPs rs11240569, rs708727, and rs823156 in SLC41A1 Do Not Discriminate Between Slovak Patients with Idiopathic Parkinson’s Disease and Healthy Controls: Statistics and Machine-Learning Evidence. Int. J. Mol. Sci. 2019, 20, 4688.
  34. Hsu, S.-Y.; Lin, H.-C.; Chen, T.-B.; Du, W.-C.; Hsu, Y.-H.; Wu, Y.-C.; Tu, P.-W.; Huang, Y.-H.; Chen, H.-Y. Feasible Classified Models for Parkinson Disease from 99mTc-TRODAT-1 SPECT Imaging. Sensors 2019, 19, 1740.
  35. Drotár, P.; Mekyska, J.; Rektorová, I.; Masarová, L.; Smékal, Z.; Faundez-Zanuy, M. Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson’s disease. Artif. Intell. Med. 2016, 67, 39–46.
  36. Maass, F.; Michalke, B.; Willkommen, D.; Leha, A.; Schulte, C.; Tönges, L.; Mollenhauer, B.; Trenkwalder, C.; Rückamp, D.; Börger, M.; et al. Elemental fingerprint: Reassessment of a cerebrospinal fluid biomarker for Parkinson’s disease. Neurobiol. Dis. 2019, 134, 104677.
  37. Mucha, J.; Mekyska, J.; Faundez-Zanuy, M.; Lopez-De-Ipina, K.; Zvoncak, V.; Galaz, Z.; Kiska, T.; Smekal, Z.; Brabenec, L.; Rektorova, I. Advanced Parkinson’s Disease Dysgraphia Analysis Based on Fractional Derivatives of Online Handwriting. In Proceedings of the 2018 10th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), Moscow, Russia, 5–9 November 2018; pp. 1–6.
  38. Wenzel, M.; Milletari, F.; Krüger, J.; Lange, C.; Schenk, M.; Apostolova, I.; Klutmann, S.; Ehrenburg, M.; Buchert, R. Automatic classification of dopamine transporter SPECT: Deep convolutional neural networks can be trained to be robust with respect to variable image characteristics. Eur. J. Nucl. Med. Mol. Imaging 2019, 46, 2800–2811.
  39. Segovia, F.; Gorriz, J.M.; Ramirez, J.; Martinez-Murcia, F.J.; Castillo-Barnes, D. Assisted Diagnosis of Parkinsonism Based on the Striatal Morphology. Int. J. Neural Syst. 2019, 29, 1950011.
  40. Ye, Q.; Xia, Y.; Yao, Z. Classification of Gait Patterns in Patients with Neurodegenerative Disease Using Adaptive Neuro-Fuzzy Inference System. Comput. Math. Methods Med. 2018, 2018, 9831252.
  41. Klomsae, A.; Auephanwiriyakul, S.; Theera-Umpon, N. String grammar unsupervised possibilistic fuzzy c-medians for gait pattern classification in patients with neurodegenerative diseases. Comput. Intell. Neurosci. 2018, 2018, 1869565.
  42. Felix, J.P.; Vieira, F.H.T.; Cardoso, A.A.; Ferreira, M.V.G.; Franco, R.A.P.; Ribeiro, M.A.; Araujo, S.G.; Correa, H.P.; Carneiro, M.L. A Parkinson’s Disease Classification Method: An Approach Using Gait Dynamics and Detrended Fluctuation Analysis. In Proceedings of the 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, Canada, 5–8 May 2019.
  43. Andrei, A.-G.; Tautan, A.-M.; Ionescu, B. Parkinson’s Disease Detection from Gait Patterns. In Proceedings of the 2019 E-Health and Bioengineering Conference (EHB), Iasi, Romania, 21–23 November 2019; pp. 1–4.
  44. Priya, S.J.; Rani, A.J.; Subathra, M.S.P.; Mohammed, M.A.; Damaševičius, R.; Ubendran, N. Local Pattern Transformation Based Feature Extraction for Recognition of Parkinson’s Disease Based on Gait Signals. Diagnostics 2021, 11, 1395.
  45. Yurdakul, O.C.; Subathra, M.; George, S.T. Detection of Parkinson’s Disease from gait using Neighborhood Representation Local Binary Patterns. Biomed. Signal Process. Control 2020, 62, 102070.
  46. Li, B.; Yao, Z.; Wang, J.; Wang, S.; Yang, X.; Sun, Y. Improved Deep Learning Technique to Detect Freezing of Gait in Parkinson’s Disease Based on Wearable Sensors. Electronics 2020, 9, 1919.
  47. Rana, A.; Dumka, A.; Singh, R.; Panda, M.K.; Priyadarshi, N.; Twala, B. Imperative Role of Machine Learning Algorithm for Detection of Parkinson’s Disease: Review, Challenges and Recommendations. Diagnostics 2022, 12, 2003.
  48. Masoudi-Sobhanzadeh, Y.; MotieGhader, H.; Masoudi-Nejad, A. FeatureSelect: A software for feature selection based on machine learning approaches. BMC Bioinform. 2019, 20, 107.
  49. Rahmaninia, M.; Moradi, P. OSFSMI: Online stream feature selection method based on mutual information. Appl. Soft Comput. 2018, 68, 733–746.
  50. Pourbahrami, S. Improving PSO global method for feature selection according to iterations global search and chaotic theory. arXiv 2018, preprint. arXiv:1811.08701.
  51. Yu, L.; Liu, H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 856–863.
  52. Blum, A.L.; Langley, P. Selection of relevant features and examples in machine learning. Artif. Intell. 1997, 97, 245–271.
  53. Raileanu, L.E.; Stoffel, K. Theoretical Comparison between the Gini Index and Information Gain Criteria. Ann. Math. Artif. Intell. 2004, 41, 77–93.
  54. Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002.
  55. Miao, Y.; Lou, X.; Wu, H. The Diagnosis of Parkinson’s Disease Based on Gait, Speech Analysis and Machine Learning Techniques. In Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing (BIC 2021). Association for Computing Machinery, New York, NY, USA, 22–24 January 2021; pp. 358–371.
  56. Little, M.; McSharry, P.; Hunter, E.; Spielman, J.; Ramig, L. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat. Preced. 2008, 1.
  57. Lichman, M. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA; Available online: http://archive.ics.uci.edu/ml (accessed on 25 September 2022).
  58. Rewar, S. A systematic review on Parkinson’s disease (PD). Indian J. Res. Pharm. Biotechnol. 2015, 3, 176.
  59. Arora, S.; Venkataraman, V.; Zhan, A.; Donohue, S.; Biglan, K.; Dorsey, E.; Little, M. Detecting and monitoring the symptoms of Parkinson’s disease using smartphones: A pilot study. Park. Relat. Disord. 2015, 21, 650–653.
  60. Miljkovic, D.; Aleksovski, D.; Podpečan, V.; Lavrač, N.; Malle, B.; Holzinger, A. Machine Learning and Data Mining Methods for Managing Parkinson’s Disease. In Machine Learning for Health Informatics; Springer: Cham, Switzerland, 2016; pp. 209–220.
  61. Challa, K.N.R.; Pagolu, V.S.; Panda, G.; Majhi, B. An improved approach for prediction of Parkinson’s disease using machine learning techniques. In Proceedings of the 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), Odisha, India, 3–5 October 2016; pp. 1446–1451.
  62. Lee, G.S.; Lin, S.H. Changes of rhythm of vocal fundamental frequency in sensorineural hearing loss and in Parkinson’s disease. Chin. J. Physiol. 2009, 52, 446–450.
  63. Asmae, O.; Abdelhadi, R.; Bouchaib, C.; Sara, S.; Tajeddine, K. Parkinson’s Disease Identification using KNN and ANN Algorithms based on Voice Disorder. In Proceedings of the 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Meknes, Morocco, 16–19 April 2020; Institute of Electrical and Electronics Engineers (IEEE): New York, NY, USA, 2020; pp. 1–6.
  64. Wu, X.; Kumar, V.; Ross Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; Steinberg, D. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2008, 14, 1–37.
  65. Ray, P.K.; Mohanty, A.; Panigrahi, T. Power quality analysis in solar PV integrated microgrid using independent component analysis and support vector machine. Optik 2019, 180, 691–698.
  66. Lahmiri, S.; Shmuel, A. Detection of Parkinson’s disease based on voice patterns ranking and optimized support vector machine. Biomed. Signal Process. Control 2018, 49, 427–433.
  67. Bhatia, A.; Sulekh, R. Predictive Model for Parkinson’s disease through Naïve Bayes Classification. Int. J. Comput. Sci. Commun. 2017, 9, 194–202.
  68. Rana, A.; Bahuguna, H.; Bijalwan, A. Artificial Neural Network based Diagnosis System. Int. J. Comput. Trends Technol. 2017, 3, 189–191.
  69. Alzubaidi, M.S.; Shah, U.; DhiaZubaydi, H.; Dolaat, K.; Abd-Alrazaq, A.A.; Ahmed, A.; Househ, M. The Role of Neural Network for the Detection of Parkinson’s Disease: A Scoping Review. Healthcare 2021, 9, 740.
  70. Sakar, B.E.; Isenkul, M.E.; Sakar, C.O.; Sertbas, A.; Gurgen, F.; Delil, S.; Apaydin, H.; Kursun, O. Collection and Analysis of a Parkinson Speech Dataset With Multiple Types of Sound Recordings. IEEE J. Biomed. Health Inform. 2013, 17, 828–834.
  71. Vadovský, M.; Paralič, J. Parkinson’s disease patients classification based on the speech signals. In Proceedings of the 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herlany, Slovakia, 26–28 January 2017; pp. 000321–000326.
  72. Ouhmida, A.; Raihani, A.; Cherradi, B.; Terrada, O. A Novel Approach for Parkinson’s Disease Detection Based on Voice Classification and Features Selection Techniques. Int. J. Online Biomed. Eng. 2021, 17, 111–130.
  73. Mabrouk, R.; Chikhaoui, B.; Bentabet, L. Machine Learning Based Classification Using Clinical and DaTSCAN SPECT Imaging Features: A Study on Parkinson’s Disease and SWEDD. IEEE Trans. Radiat. Plasma Med. Sci. 2018, 3, 170–177.
  74. Benba, A.; Jilbab, A.; Hammouch, A. Using Human Factor Cepstral Coefficient on Multiple Types of Voice Recordings for Detecting Patients with Parkinson’s Disease. IRBM 2017, 38, 346–351.
Figure 1. Diagram of the flowchart of the proposed work.
Figure 2. Feature Selection and Feature Extraction from Dataset.
Figure 3. Health Status of PD Patient.
Figure 4. Confusion Matrix with Sensitivity, Specificity, Accuracy, and Precision value.
Figure 5. Results obtained by SVM.
Figure 6. Confusion Matrix and Heatmap of SVM Classifier.
Figure 7. Results obtained by Naïve Bayes.
Figure 8. Results obtained by ANN.
Figure 9. Results obtained by KNN.
Figure 10. Graphical representation of distributions of performance measures for all classifiers.
Table 1. Comparative Studies of Machine Learning Approaches to Diagnose Parkinson’s Disease.
Reference | Feature | Machine Learning Algorithms Used | Objective | Tools Used | Source of Data | No. of Subjects | Outcomes
Sakar et al., 2019 [27] | Speech | Naïve Bayes, logistic regression, SVM (RBF and linear), KNN, random forest, MLP | Classification of PD from HC | JupyterLab with Python | Collected from participants | 252 (188 PD + 64 HC) | Highest accuracy from SVM (RBF): 86%
Yasar et al., 2019 [28] | Speech | Artificial neural network | Classification of PD from HC | MATLAB | Collected from participants | 80 (40 PD + 40 HC) | ANN accuracy: 94.93%
Avuçlu and Elen, 2020 [29] | Speech | KNN, random forest, naïve Bayes, SVM | Classification of PD from HC | JupyterLab with Python | UCI machine learning repository | 31 (23 PD + 8 HC) | Naïve Bayes accuracy: 70.26%
Marar et al., 2018 [30] | Speech | Naïve Bayes, ANN, KNN, random forest, SVM, logistic regression, decision tree (DT) | Classification of PD from HC | R | Collected from participants | 31 (23 PD + 8 HC) | Highest accuracy from ANN: 94.87%
Nikookar et al., 2019 [31] | Speech | Ensemble-based method | Classification of PD from HC | JupyterLab with Python | UCI machine learning repository | 31 (23 PD + 8 HC) | Ensemble learning accuracy: 90.6%
Tracy et al., 2020 [32] | Speech | Logistic regression (L2-regularized), random forest, gradient boosted trees | Classification of PD from HC | Python | mPower database | 2289 (246 PD + 2023 HC) | Best results from gradient boosted trees: recall 79.7%, precision 90.1%, F1-score 83.6%
Cibulka et al., 2019 [33] | Genetic (SNPs in SLC41A1) | Random forest | Classification of PD from HC | Not mentioned | Collected from participants | 270 (150 PD + 120 HC) | Classification error for rs11240569, rs708727, and rs823156: 49.6%, 44.8%, and 49.3%, respectively
Hsu et al., 2019 [34] | 99mTc-TRODAT-1 SPECT imaging | SVM with RBF kernel, logistic regression | Classification of PD from HC | Weka | PACS | 202 (94 severe PD + 102 mild PD + 6 HC) | Highest accuracy from SVM (RBF): 83.2%, with sensitivity 82.8% and specificity 100%
Drotár et al., 2016 [35] | Handwriting patterns | K-NN, ensemble AdaBoost classifier, SVM | Classification of PD from HC | Python (scikit-learn) | PaHaW database | 75 (37 PD + 38 HC) | Accuracy: 81.3%
Maass et al., 2020 [36] | Cerebrospinal fluid elemental fingerprint | SVM | Classification of PD from HC | Weka | UCI machine learning repository | 157 (82 PD + 68 HC + 7 normal pressure hydrocephalus (NPH)) | Sensitivity: 80%, specificity: 83%
Mucha et al., 2018 [37] | Handwriting patterns | Random forest classifier | Classification of PD from HC | Python | PaHaW database | 69 (33 PD + 36 HC) | Accuracy: 90%, with sensitivity 89% and specificity 91%
Wenzel et al., 2019 [38] | Dopamine transporter SPECT imaging | CNN | Classification of PD from HC | MATLAB | PPMI database | 645 (438 PD + 207 HC) | Accuracy: 97.2%
Segovia et al., 2019 [39] | Striatal morphology | SVM with 10-fold cross-validation | Classification of PD from HC | Python | Virgen de la Victoria Hospital, Malaga, Spain | 189 (95 PD + 94 HC) | Accuracy: 94.25%
Ye et al., 2018 [40] | Gait | Least squares SVM (LS-SVM), particle swarm optimization (PSO) | Classification of PD, ALS, and HD from HC | Not mentioned | Neurology Outpatient Clinic at Massachusetts General Hospital, Boston, MA, USA | 64 (15 PD + 16 HC + 13 amyotrophic lateral sclerosis (ALS) + 20 Huntington’s disease (HD)) | Accuracy for PD vs. HC: 90.32%; HD vs. HC: 94.44%; ALS vs. HC: 93.10%
Klomsae et al., 2018 [41] | Gait | Fuzzy KNN | Classification of PD, ALS, and HD from HC | Not mentioned | Neurology Outpatient Clinic at Massachusetts General Hospital, Boston, MA, USA | 64 (15 PD + 20 HD + 13 ALS + 16 HC) | Accuracy for PD vs. HC: 96.43%; HD vs. HC: 97.22%; ALS vs. HC: 96.88%
Félix et al., 2019 [42] | Gait | SVM, KNN, naïve Bayes, LDA, decision tree | Classification of PD from HC | MATLAB R2017a | Neurology Outpatient Clinic at Massachusetts General Hospital, Boston, MA, USA | 31 (15 PD + 16 HC) | Highest accuracy from SVM, KNN, and decision tree: 96.8%
Andrei et al., 2019 [43] | Gait | SVM | Classification of PD from HC | Not mentioned | Laboratory for Gait and Neurodynamics | 166 (93 PD + 73 HC) | Accuracy: 100%
Priya et al., 2021 [44] | Gait | ANN | Classification of PD from HC | MATLAB R2018b | Laboratory for Gait and Neurodynamics | 166 (93 PD + 73 HC) | Accuracy: 96.28%
Yurdakul et al., 2020 [45] | Gait | ANN | Classification of PD from HC | MATLAB | Laboratory for Gait and Neurodynamics | 166 (93 PD + 73 HC) | Classification accuracy: 98.3%
Li et al., 2020 [46] | Gait | Deep CNN | Classification of PD from HC | Not mentioned | Collected from participants | 20 (10 PD + 10 HC) | Accuracy: 91.9%
Table 2. Details of Parkinson’s Dataset.
Characteristic | Value
Dataset characteristic | Multivariate
No. of instances | 197
Attribute characteristic | Real
No. of attributes | 23
Missing values | N/A
Made by | Max Little of the University of Oxford
Associated tasks | Classification
Type of classification | Binary (0 = healthy, 1 = PD patient)
Table 3. SVM Classifier Results.
Metric | Result
Accuracy on test data | 87.17%
Accuracy on training data | 88.46%
Execution time | 0.03111 s
F1-score | 66.19%
MCC | 56.59%
Table 4. Naïve Bayes Classifier Results.
Metric | Result
Accuracy on test data | 74.11%
Accuracy on training data | 76.23%
Execution time | 0.0323 s
F1-score | 86.74%
MCC | 66.56%
Table 5. Artificial Neural Network Classifier Results.
Metric | Result
Accuracy on test data | 96.7%
Accuracy on training data | 97.4%
Execution time | 0.025 s
F1-score | 87.01%
MCC | 70.11%
Table 6. KNN Classifier Results.
Metric | Result
Accuracy on test data | 87.17%
Accuracy on training data | 88.46%
Execution time | 0.03111 s
F1-score | 71%
MCC | 65.02%
Table 7. An Overview of Evaluation Results.
Classifier | Accuracy (Training Dataset) | Accuracy (Test Dataset) | F1-Score | MCC | Sensitivity | Specificity
SVM | 88.46% | 87.17% | 66.19% | 56.59% | 62.5% | 93.54%
Naïve Bayes | 76.23% | 74.11% | 86.74% | 66.56% | 84% | 79.76%
KNN | 88.46% | 87.17% | 71% | 65.02% | 60.0% | 93.54%
ANN | 97.4% | 96.7% | 87.01% | 70.11% | 92.42% | 91.25%
Table 8. Performance Comparison with Previous Studies.
Reference | Basis | Machine Learning Classifier | Accuracy | Sensitivity | Specificity
Sakar et al. [70] | Speech | SVM and KNN | 68.45% | 60% | 50%
Vadovský and Paralič [71] | Speech | C4.5, C5.0, random forest, CART | 66.5% | NA | NA
Ouhmida et al. [72] | Speech | SVM, K-NN, decision tree | 98.26% (AUC) | NA | NA
Mabrouk et al. [73] | Clinical and DaTSCAN SPECT imaging features | Random forest, SVM, MLP, KNN | 78.4% (SVM), 82.2% (KNN) | NA | NA
Benba et al. [74] | Speech | HFCC-SVM | 87.5% | 90% | 85%
Proposed work | Speech | SVM, naïve Bayes, KNN, and ANN | 87.17%, 74.11%, 87.17%, and 96.7% | 62.5%, 84%, 60%, and 92.42% | 93.54%, 79.76%, 93.54%, and 91.25%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Rana, A.; Dumka, A.; Singh, R.; Rashid, M.; Ahmad, N.; Panda, M.K. An Efficient Machine Learning Approach for Diagnosing Parkinson’s Disease by Utilizing Voice Features. Electronics 2022, 11, 3782. https://doi.org/10.3390/electronics11223782

AMA Style

Rana A, Dumka A, Singh R, Rashid M, Ahmad N, Panda MK. An Efficient Machine Learning Approach for Diagnosing Parkinson’s Disease by Utilizing Voice Features. Electronics. 2022; 11(22):3782. https://doi.org/10.3390/electronics11223782

Chicago/Turabian Style

Rana, Arti, Ankur Dumka, Rajesh Singh, Mamoon Rashid, Nazir Ahmad, and Manoj Kumar Panda. 2022. "An Efficient Machine Learning Approach for Diagnosing Parkinson’s Disease by Utilizing Voice Features" Electronics 11, no. 22: 3782. https://doi.org/10.3390/electronics11223782

APA Style

Rana, A., Dumka, A., Singh, R., Rashid, M., Ahmad, N., & Panda, M. K. (2022). An Efficient Machine Learning Approach for Diagnosing Parkinson’s Disease by Utilizing Voice Features. Electronics, 11(22), 3782. https://doi.org/10.3390/electronics11223782

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.