Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson’s Disease Detection and Speech Features Extraction
Abstract
1. Introduction
- We evaluate wav2vec 1.0 embeddings in several inter-dataset (cross-database, or cross-db) scenarios for classifying healthy controls (HC) vs. patients with Parkinson’s disease (PD). Knowledge about methods that perform well across datasets in speech-based PD detection is still limited, and research in this area is especially important for advancing federated learning.
- Current studies typically focus on diadochokinetic tasks or vowels. To address this gap, we demonstrate cross-db relevance using repeated syllables as well as read text.
- Providing evidence that wav2vec 1.0 embeddings can be used in various clinically relevant tasks related to PD, including regression. Our efforts focus on constructing models to predict quantitative targets such as age, loud region duration, and other articulation characteristics. Speech features like reading duration, pause detection, and speech instability are typically assessed through manual inspection and annotation of sound files using specialized software.
- To our knowledge, this paper is the first to investigate overlapping components in wav2vec 1.0 embeddings across classification and regression models and to test whether these overlaps are significant. Understanding the (dis)similarities between speech representations is essential for developing new corpus-independent models and for recommending standardized tasks that can be applied across different datasets.
- This work prioritizes wav2vec 1.0 over wav2vec 2.0 and examines in depth the embeddings generated by its output layer. Although wav2vec 1.0 is not transformer-based, it has shown high performance, comparable even to transformer-based embeddings.
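The cross-db protocol named in the contributions above can be illustrated with a minimal sketch: a model is fitted on one database and scored on another, and the area under the ROC curve (AUC) is compared with the intra-dataset value. This is not the pipeline used in this paper. The toy two-dimensional feature vectors, all function and variable names, and the nearest-centroid scorer (a simple stand-in for the Random Forest and XGBoost classifiers listed in the comparison table) are assumptions made purely for illustration.

```python
from math import dist


def roc_auc(labels, scores):
    """Rank-based ROC AUC: probability that a PD sample (label 1) scores
    higher than an HC sample (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one centroid per class (HC, PD)."""
    hc = centroid([x for x, label in zip(X, y) if label == 0])
    pd_ = centroid([x for x, label in zip(X, y) if label == 1])
    return hc, pd_


def pd_scores(model, X):
    """Higher score means the sample lies closer to the PD centroid."""
    hc, pd_ = model
    return [dist(x, hc) - dist(x, pd_) for x in X]


# Toy "database A" (training) and "database B" (testing); values are made up.
db_a_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
db_a_y = [0, 0, 1, 1]
db_b_X = [[0.1, 0.3], [0.4, 0.2], [0.9, 0.7], [0.2, 0.2]]
db_b_y = [0, 0, 1, 1]

model = fit_centroids(db_a_X, db_a_y)
intra_auc = roc_auc(db_a_y, pd_scores(model, db_a_X))  # train A, test A
cross_auc = roc_auc(db_b_y, pd_scores(model, db_b_X))  # train A, test B
print(f"intra-dataset AUC = {intra_auc:.2f}, cross-db AUC = {cross_auc:.2f}")
```

In this toy setting the intra-dataset AUC is optimistic because the same recordings are used for fitting and scoring; a real experiment would use held-out folds within the training database.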
2. Related Work
3. Methods
3.1. Datasets
3.1.1. Participants Rhythmically Repeat Syllables /pa/
3.1.2. Italian Study by Dimauro et al. [59]
3.1.3. English Dataset
3.2. Signal Processing and Feature Extraction
3.2.1. Naive Loud Regions Segmentation
3.2.2. MFCCs Features Calculation
3.2.3. Wav2Vec Embedding and Features Calculation
3.3. Modeling and Statistical Methods
3.3.1. Cross-Database Classification Experiments
3.3.2. Regression Experiments
3.3.3. Strategy to Evaluate Overlapping Components across Models
4. Results
4.1. Intra- and Inter-Dataset Classification for Detection of PD
4.2. Regression Models to Predict Age and Articulation
4.3. Exploration of Overlapping Important Features across Models
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56.
2. Ching, T.; Himmelstein, D.S.; Beaulieu-Jones, B.K.; Kalinin, A.A.; Do, B.T.; Way, G.P.; Ferrero, E.; Agapow, P.-M.; Zietz, M.; Hoffman, M.M.; et al. Opportunities and Obstacles for Deep Learning in Biology and Medicine. J. R. Soc. Interface 2018, 15, 20170387.
3. Shehab, M.; Abualigah, L.; Shambour, Q.; Abu-Hashem, M.A.; Shambour, M.K.Y.; Alsalibi, A.I.; Gandomi, A.H. Machine Learning in Medical Applications: A Review of State-of-the-Art Methods. Comput. Biol. Med. 2022, 145, 105458.
4. Sigcha, L.; Borzì, L.; Amato, F.; Rechichi, I.; Ramos-Romero, C.; Cárdenas, A.; Gascó, L.; Olmo, G. Deep Learning and Wearable Sensors for the Diagnosis and Monitoring of Parkinson’s Disease: A Systematic Review. Expert Syst. Appl. 2023, 229, 120541.
5. Shaban, M. Deep Learning for Parkinson’s Disease Diagnosis: A Short Survey. Computers 2023, 12, 58.
6. Dixit, S.; Bohre, K.; Singh, Y.; Himeur, Y.; Mansoor, W.; Atalla, S.; Srinivasan, K. A Comprehensive Review on AI-Enabled Models for Parkinson’s Disease Diagnosis. Electronics 2023, 12, 783.
7. Klempíř, O.; Krupička, R. Machine Learning Using Speech Utterances for Parkinson Disease Detection. Clin. Technol. 2018, 48, 66–71.
8. Schneider, S.; Baevski, A.; Collobert, R.; Auli, M. Wav2vec: Unsupervised Pre-Training for Speech Recognition. In Proceedings of the Interspeech 2019, Graz, Austria, 15 September 2019; ISCA: Singapore, 2019; pp. 3465–3469.
9. Baevski, A.; Zhou, H.; Mohamed, A.; Auli, M. Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv 2020, arXiv:2006.11477.
10. Baevski, A.; Mohamed, A. Effectiveness of Self-Supervised Pre-Training for ASR. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 7694–7698.
11. Pepino, L.; Riera, P.; Ferrer, L. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August 2021; ISCA: Singapore, 2021; pp. 3400–3404.
12. Javanmardi, F.; Tirronen, S.; Kodali, M.; Kadiri, S.R.; Alku, P. Wav2vec-Based Detection and Severity Level Classification of Dysarthria From Speech. In Proceedings of the ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
13. Défossez, A.; Caucheteux, C.; Rapin, J.; Kabeli, O.; King, J.-R. Decoding Speech Perception from Non-Invasive Brain Recordings. Nat. Mach. Intell. 2023, 5, 1097–1107.
14. Conneau, A.; Baevski, A.; Collobert, R.; Mohamed, A.; Auli, M. Unsupervised Cross-Lingual Representation Learning for Speech Recognition. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August 2021; ISCA: Singapore, 2021; pp. 2426–2430.
15. Morris, M.E. Movement Disorders in People With Parkinson Disease: A Model for Physical Therapy. Phys. Ther. 2000, 80, 578–597.
16. Riboldi, G.M.; Frattini, E.; Monfrini, E.; Frucht, S.J.; Di Fonzo, A. A Practical Approach to Early-Onset Parkinsonism. JPD 2022, 12, 1–26.
17. Poewe, W.; Seppi, K.; Tanner, C.M.; Halliday, G.M.; Brundin, P.; Volkmann, J.; Schrag, A.-E.; Lang, A.E. Parkinson Disease. Nat. Rev. Dis. Primers 2017, 3, 17013.
18. Skodda, S.; Grönheit, W.; Mancinelli, N.; Schlegel, U. Progression of Voice and Speech Impairment in the Course of Parkinson’s Disease: A Longitudinal Study. Parkinson’s Dis. 2013, 2013, 389195.
19. Postuma, R.B.; Lang, A.E.; Gagnon, J.F.; Pelletier, A.; Montplaisir, J.Y. How Does Parkinsonism Start? Prodromal Parkinsonism Motor Changes in Idiopathic REM Sleep Behaviour Disorder. Brain 2012, 135, 1860–1870.
20. Rusz, J.; Tykalová, T.; Novotný, M.; Zogala, D.; Růžička, E.; Dušek, P. Automated Speech Analysis in Early Untreated Parkinson’s Disease: Relation to Gender and Dopaminergic Transporter Imaging. Eur. J. Neurol. 2022, 29, 81–90.
21. Neto, O.P. Harnessing Voice Analysis and Machine Learning for Early Diagnosis of Parkinson’s Disease: A Comparative Study Across Three Datasets. J. Voice 2024, S0892199724001395.
22. Klempíř, O.; Příhoda, D.; Krupička, R. Evaluating the Performance of Wav2vec Embedding for Parkinson’s Disease Detection. Meas. Sci. Rev. 2023, 23, 260–267.
23. Rahman, W.; Lee, S.; Islam, M.S.; Antony, V.N.; Ratnu, H.; Ali, M.R.; Mamun, A.A.; Wagner, E.; Jensen-Roberts, S.; Waddell, E.; et al. Detecting Parkinson Disease Using a Web-Based Speech Task: Observational Study. J. Med. Internet Res. 2021, 23, e26305.
24. Cumplido-Mayoral, I.; García-Prat, M.; Operto, G.; Falcon, C.; Shekari, M.; Cacciaglia, R.; Milà-Alomà, M.; Lorenzini, L.; Ingala, S.; Meije Wink, A.; et al. Biological Brain Age Prediction Using Machine Learning on Structural Neuroimaging Data: Multi-Cohort Validation against Biomarkers of Alzheimer’s Disease and Neurodegeneration Stratified by Sex. eLife 2023, 12, e81067.
25. Cole, J.H. Multimodality Neuroimaging Brain-Age in UK Biobank: Relationship to Biomedical, Lifestyle, and Cognitive Factors. Neurobiol. Aging 2020, 92, 34–42.
26. Smith, S.M.; Vidaurre, D.; Alfaro-Almagro, F.; Nichols, T.E.; Miller, K.L. Estimation of Brain Age Delta from Brain Imaging. NeuroImage 2019, 200, 528–539.
27. Eickhoff, C.R.; Hoffstaedter, F.; Caspers, J.; Reetz, K.; Mathys, C.; Dogan, I.; Amunts, K.; Schnitzler, A.; Eickhoff, S.B. Advanced Brain Ageing in Parkinson’s Disease Is Related to Disease Duration and Individual Impairment. Brain Commun. 2021, 3, fcab191.
28. Ravishankar, S.; Kumar, M.K.P.; Patage, V.V.; Tiwari, S.; Goyal, S. Prediction of Age from Speech Features Using a Multi-Layer Perceptron Model. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
29. Sánchez-Hevia, H.A.; Gil-Pita, R.; Utrilla-Manso, M.; Rosa-Zurera, M. Age Group Classification and Gender Recognition from Speech with Temporal Convolutional Neural Networks. Multimed. Tools Appl. 2022, 81, 3535–3552.
30. Islam, R.; Abdel-Raheem, E.; Tarique, M. Voice Pathology Detection Using Convolutional Neural Networks with Electroglottographic (EGG) and Speech Signals. Comput. Methods Programs Biomed. Update 2022, 2, 100074.
31. Peng, X.; Xu, H.; Liu, J.; Wang, J.; He, C. Voice Disorder Classification Using Convolutional Neural Network Based on Deep Transfer Learning. Sci. Rep. 2023, 13, 7264.
32. Hireš, M.; Gazda, M.; Drotár, P.; Pah, N.D.; Motin, M.A.; Kumar, D.K. Convolutional Neural Network Ensemble for Parkinson’s Disease Detection from Voice Recordings. Comput. Biol. Med. 2022, 141, 105021.
33. Vásquez-Correa, J.C.; Orozco-Arroyave, J.R.; Nöth, E. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease. In Proceedings of the Interspeech 2017, Stockholm, Sweden, 20 August 2017; ISCA: Singapore, 2017; pp. 314–318.
34. Vásquez-Correa, J.C.; Rios-Urrego, C.D.; Arias-Vergara, T.; Schuster, M.; Rusz, J.; Nöth, E.; Orozco-Arroyave, J.R. Transfer Learning Helps to Improve the Accuracy to Classify Patients with Different Speech Disorders in Different Languages. Pattern Recognit. Lett. 2021, 150, 272–279.
35. Liu, X.; Wang, H.; He, T.; Liao, Y.; Jian, C. Recent Advances in Representation Learning for Electronic Health Records: A Systematic Review. J. Phys. Conf. Ser. 2022, 2188, 012007.
36. Wang, L.; Wang, Q.; Bai, H.; Liu, C.; Liu, W.; Zhang, Y.; Jiang, L.; Xu, H.; Wang, K.; Zhou, Y. EHR2Vec: Representation Learning of Medical Concepts From Temporal Patterns of Clinical Notes Based on Self-Attention Mechanism. Front. Genet. 2020, 11, 630.
37. Jiang, Z.; Yang, M.; Tsirlin, M.; Tang, R.; Dai, Y.; Lin, J. “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 6810–6828.
38. Ali, S.; Chourasia, P.; Tayebi, Z.; Bello, B.; Patterson, M. ViralVectors: Compact and Scalable Alignment-Free Virome Feature Generation. Med. Biol. Eng. Comput. 2023, 61, 2607–2626.
39. Algayres, R.; Zaiem, M.S.; Sagot, B.; Dupoux, E. Evaluating the Reliability of Acoustic Speech Embeddings. In Proceedings of the Interspeech 2020, Shanghai, China, 25 October 2020; ISCA: Singapore, 2020; pp. 4621–4625.
40. Zaiem, S.; Kemiche, Y.; Parcollet, T.; Essid, S.; Ravanelli, M. Speech Self-Supervised Representation Benchmarking: Are We Doing It Right? In Proceedings of the Interspeech 2023, Dublin, Ireland, 20 August 2023; ISCA: Singapore, 2023; pp. 2873–2877.
41. Hugging Face–The AI Community Building the Future. Available online: https://huggingface.co/ (accessed on 24 July 2024).
42. Snyder, D.; Garcia-Romero, D.; Sell, G.; Povey, D.; Khudanpur, S. X-Vectors: Robust DNN Embeddings for Speaker Recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 5329–5333.
43. Shor, J.; Venugopalan, S. TRILLsson: Distilled Universal Paralinguistic Speech Representations. In Proceedings of the Interspeech 2022, Incheon, Republic of Korea, 18 September 2022; ISCA: Singapore, 2022; pp. 356–360.
44. Hsu, W.-N.; Bolte, B.; Tsai, Y.-H.H.; Lakhotia, K.; Salakhutdinov, R.; Mohamed, A. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3451–3460.
45. Favaro, A.; Tsai, Y.-T.; Butala, A.; Thebaud, T.; Villalba, J.; Dehak, N.; Moro-Velázquez, L. Interpretable Speech Features vs. DNN Embeddings: What to Use in the Automatic Assessment of Parkinson’s Disease in Multi-Lingual Scenarios. Comput. Biol. Med. 2023, 166, 107559.
46. Moro-Velazquez, L.; Villalba, J.; Dehak, N. Using X-Vectors to Automatically Detect Parkinson’s Disease from Speech. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1155–1159.
47. Jeancolas, L.; Petrovska-Delacrétaz, D.; Mangone, G.; Benkelfat, B.-E.; Corvol, J.-C.; Vidailhet, M.; Lehéricy, S.; Benali, H. X-Vectors: New Quantitative Biomarkers for Early Parkinson’s Disease Detection From Speech. Front. Neuroinform. 2021, 15, 578369.
48. Burkhardt, F.; Wagner, J.; Wierstorf, H.; Eyben, F.; Schuller, B. Speech-Based Age and Gender Prediction with Transformers. arXiv 2023, arXiv:2306.16962.
49. Escobar-Grisales, D.; Ríos-Urrego, C.D.; Orozco-Arroyave, J.R. Deep Learning and Artificial Intelligence Applied to Model Speech and Language in Parkinson’s Disease. Diagnostics 2023, 13, 2163.
50. Hireš, M.; Drotár, P.; Pah, N.D.; Ngo, Q.C.; Kumar, D.K. On the Inter-Dataset Generalization of Machine Learning Approaches to Parkinson’s Disease Detection from Voice. Int. J. Med. Inform. 2023, 179, 105237.
51. Javanmardi, F.; Kadiri, S.R.; Alku, P. Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech. IEEE J. Biomed. Health Inform. 2024, 28, 4951–4962.
52. Javanmardi, F.; Kadiri, S.R.; Alku, P. Pre-Trained Models for Detection and Severity Level Classification of Dysarthria from Speech. Speech Commun. 2024, 158, 103047.
53. Cabitza, F.; Campagner, A. The Need to Separate the Wheat from the Chaff in Medical Informatics. Int. J. Med. Inform. 2021, 153, 104510.
54. Illner, V.; Krýže, P.; Švihlík, J.; Sousa, M.; Krack, P.; Tripoliti, E.; Jech, R.; Rusz, J. Which Aspects of Motor Speech Disorder Are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson’s Disease. In Proceedings of the Interspeech 2023, Dublin, Ireland, 20 August 2023; ISCA: Singapore, 2023; pp. 5027–5031.
55. Tracey, B.; Volfson, D.; Glass, J.; Haulcy, R.; Kostrzebski, M.; Adams, J.; Kangarloo, T.; Brodtmann, A.; Dorsey, E.R.; Vogel, A. Towards Interpretable Speech Biomarkers: Exploring MFCCs. Sci. Rep. 2023, 13, 22787.
56. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal Biomedical AI. Nat. Med. 2022, 28, 1773–1784.
57. Dogan, G.; Akbulut, F.P. Multi-Modal Fusion Learning through Biosignal, Audio, and Visual Content for Detection of Mental Stress. Neural Comput. Appl. 2023, 35, 24435–24454.
58. Nguyen, N.D.; Huang, J.; Wang, D. A Deep Manifold-Regularized Learning Model for Improving Phenotype Prediction from Multi-Modal Data. Nat. Comput. Sci. 2022, 2, 38–46.
59. Dimauro, G.; Di Nicola, V.; Bevilacqua, V.; Caivano, D.; Girardi, F. Assessment of Speech Intelligibility in Parkinson’s Disease Using a Speech-To-Text System. IEEE Access 2017, 5, 22199–22208.
60. Jaeger, H.; Trivedi, D.; Stadtschnitzer, M. Mobile Device Voice Recordings at King’s College London (MDVR-KCL) from Both Early and Advanced Parkinson’s Disease Patients and Healthy Controls 2019. Available online: https://data.niaid.nih.gov/resources?id=zenodo_2867215 (accessed on 22 July 2024).
61. Hähnel, T.; Nemitz, A.; Schimming, K.; Berger, L.; Vogel, A.; Gruber, D.; Schnalke, N.; Bräuer, S.; Falkenburger, B.H.; Gandor, F. Speech Differences between Multiple System Atrophy and Parkinson’s Disease: A Multicenter Study. medRxiv 2024.
62. McFee, B.; Raffel, C.; Liang, D.; Ellis, D.; McVicar, M.; Battenberg, E.; Nieto, O. Librosa: Audio and Music Signal Analysis in Python. In Proceedings of the SciPy 2015 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; pp. 18–24.
63. Wav2vec Large. Available online: https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_large.pt (accessed on 24 July 2024).
64. Scikit-Learn: Machine Learning in Python - Scikit-Learn 1.5.1 Documentation. Available online: https://scikit-learn.org/ (accessed on 16 August 2024).
65. RandomForestClassifier. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (accessed on 16 August 2024).
66. XGBoost Documentation - Xgboost 2.1.1 Documentation. Available online: https://xgboost.readthedocs.io (accessed on 16 August 2024).
67. Lasso. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html (accessed on 16 August 2024).
68. Spearmanr - SciPy v1.14.0 Manual. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html (accessed on 16 August 2024).
69. R2_Score. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html (accessed on 16 August 2024).
70. Mean_Absolute_Error. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html (accessed on 16 August 2024).
71. LogisticRegression. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html (accessed on 16 August 2024).
72. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
73. Ibarra, E.J.; Arias-Londoño, J.D.; Zañartu, M.; Godino-Llorente, J.I. Towards a Corpus (and Language)-Independent Screening of Parkinson’s Disease from Voice and Speech through Domain Adaptation. Bioengineering 2023, 10, 1316.
74. Tirronen, S.; Javanmardi, F.; Kodali, M.; Reddy Kadiri, S.; Alku, P. Utilizing Wav2Vec In Database-Independent Voice Disorder Detection. In Proceedings of the ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
75. Malekroodi, H.S.; Madusanka, N.; Lee, B.; Yi, M. Leveraging Deep Learning for Fine-Grained Categorization of Parkinson’s Disease Progression Levels through Analysis of Vocal Acoustic Patterns. Bioengineering 2024, 11, 295.
76. Di Cesare, M.G.; Perpetuini, D.; Cardone, D.; Merla, A. Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson’s Disease: A Study on Speaker Diarization and Classification Techniques. Sensors 2024, 24, 1499.
77. Bisgin, H.; Bera, T.; Ding, H.; Semey, H.G.; Wu, L.; Liu, Z.; Barnes, A.E.; Langley, D.A.; Pava-Ripoll, M.; Vyas, H.J.; et al. Comparing SVM and ANN Based Machine Learning Methods for Species Identification of Food Contaminating Beetles. Sci. Rep. 2018, 8, 6532.
78. Bhadra, T.; Mallik, S.; Hasan, N.; Zhao, Z. Comparison of Five Supervised Feature Selection Algorithms Leading to Top Features and Gene Signatures from Multi-Omics Data in Cancer. BMC Bioinform. 2022, 23, 153.
79. Joudaki, A.; Takeda, J.; Masuda, A.; Ode, R.; Fujiwara, K.; Ohno, K. FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon. Genes 2023, 14, 1765.
80. Riviere, M.; Joulin, A.; Mazare, P.-E.; Dupoux, E. Unsupervised Pretraining Transfers Well Across Languages. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 7414–7418.
81. Islam, M.S.; Rahman, W.; Abdelkader, A.; Lee, S.; Yang, P.T.; Purks, J.L.; Adams, J.L.; Schneider, R.B.; Dorsey, E.R.; Hoque, E. Using AI to Measure Parkinson’s Disease Severity at Home. NPJ Digit. Med. 2023, 6, 156.
82. Tayebi Arasteh, S.; Ríos-Urrego, C.D.; Nöth, E.; Maier, A.; Yang, S.H.; Rusz, J.; Orozco-Arroyave, J.R. Federated Learning for Secure Development of AI Models for Parkinson’s Disease Detection Using Speech from Different Languages. In Proceedings of the Interspeech 2023, Dublin, Ireland, 20 August 2023; ISCA: Singapore, 2023; pp. 5003–5007.
83. Xie, J.; Fonseca, P.; Van Dijk, J.; Overeem, S.; Long, X. Assessment of Obstructive Sleep Apnea Severity Using Audio-Based Snoring Features. Biomed. Signal Process. Control 2023, 86, 104942.
84. Chronowski, M.; Klaczynski, M.; Dec-Cwiek, M.; Porebska, K. Parkinson’s Disease Diagnostics Using AI and Natural Language Knowledge Transfer. arXiv 2022, arXiv:2204.12559.
85. Javanmardi, F.; Kadiri, S.R.; Alku, P. A Comparison of Data Augmentation Methods in Voice Pathology Detection. Comput. Speech Lang. 2024, 83, 101552.
86. Sriram, A.; Auli, M.; Baevski, A. Wav2Vec-Aug: Improved Self-Supervised Training with Limited Data. In Proceedings of the Interspeech 2022, Incheon, Republic of Korea, 18 September 2022; ISCA: Singapore, 2022; pp. 4950–4954.
# | TRAIN | TEST | w2v-Mean | w2v-Sum | w2v-std | MFCC-Mean | 10 PCA w2v-Mean | 10 PCA MFCC-Mean | MFCC-Mean + w2v-Mean ** |
---|---|---|---|---|---|---|---|---|---|
1 | pa | pa | 0.61 * | 0.82 * | 0.70 | 0.78 | 0.58 | 0.84 | 0.81 |
1 | Italian | Italian | 0.98 * | 0.97 * | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 |
1 | Eng | Eng | 0.80 | 0.71 | 0.80 | 0.74 | 0.65 | 0.79 | 0.77 |
2 | Italian, Eng | Italian, Eng | 0.94 | 0.90 | 0.93 | 0.89 | 0.83 | 0.81 | 0.88 |
2 | Italian, pa | Italian, pa | 0.87 | 0.89 | 0.86 | 0.92 | 0.81 | 0.89 | 0.91 |
2 | Eng, pa | Eng, pa | 0.72 | 0.75 | 0.71 | 0.71 | 0.58 | 0.81 | 0.78 |
3 | pa | Italian | 0.46 | 0.50 | 0.50 | 0.68 | 0.68 | 0.40 | 0.57 |
3 | pa | Eng | 0.40 | 0.50 | 0.51 | 0.49 | 0.34 | 0.68 | 0.55 |
3 | Italian | pa | 0.48 | 0.43 | 0.46 | 0.49 | 0.61 | 0.63 | 0.68 |
3 | Italian | Eng | 0.72 * | 0.56 | 0.67 | 0.47 | 0.56 | 0.42 | 0.51 |
3 | Eng | Italian | 0.90 * | 0.78 | 0.77 | 0.51 | 0.73 | 0.34 | 0.47 |
3 | Eng | pa | 0.46 | 0.53 | 0.48 | 0.44 | 0.48 | 0.60 | 0.60 |
4 | Italian, Eng | pa | 0.49 | 0.43 | 0.52 | 0.55 | 0.53 | 0.63 | 0.64 |
4 | Italian, pa | Eng | 0.72 | 0.61 | 0.61 | 0.56 | 0.46 | 0.46 | 0.51 |
4 | Eng, pa | Italian | 0.86 | 0.80 | 0.88 | 0.56 | 0.75 | 0.50 | 0.64 |
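The column names w2v-Mean, w2v-Sum, and w2v-std in the table above suggest that frame-level wav2vec embeddings are aggregated over time into one fixed-length vector per recording. A minimal sketch of such pooling is given below. The function name `pool_embeddings`, the toy three-dimensional frames, and the use of the population standard deviation are assumptions for illustration, not details confirmed by this excerpt.

```python
from statistics import fmean, pstdev


def pool_embeddings(frames):
    """Collapse a (time x dimension) list of frame embeddings into
    fixed-length mean, sum, and standard-deviation vectors."""
    if not frames:
        raise ValueError("at least one frame is required")
    columns = list(zip(*frames))  # one tuple of values per embedding dimension
    return {
        "mean": [fmean(col) for col in columns],
        "sum": [sum(col) for col in columns],
        # Population std is an assumption; a sample std would use statistics.stdev.
        "std": [pstdev(col) for col in columns],
    }


# Two toy frames of a three-dimensional embedding (values are made up).
frames = [[1.0, 2.0, 0.0], [3.0, 2.0, 4.0]]
pooled = pool_embeddings(frames)
print(pooled["mean"], pooled["sum"], pooled["std"])
```

Each pooled vector has the same length as a single frame, so recordings of different durations map to feature vectors of equal size that a classifier or regressor can consume.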
Task | Scenario | TOP10 (min. 2) | TOP20 (min. 3) | TOP30 (min. 5) | TOP50 (min. 9) |
---|---|---|---|---|---|
Random Forest Classifier | pa vs. Italian | 0.1 (±0.3) | 0.4 (±0.7) | 1.2 (±1.0) | 3.9 (±1.8) |
Random Forest Classifier | pa vs. Eng | 0.0 (±0.2) | 0.4 (±0.6) | 1.0 (±0.9) | 4.5 (±1.8) |
Random Forest Classifier | Italian vs. Eng | 0.8 (±0.8) | 1.8 (±1.0) | * 3.9 (±1.2) | * 7.6 (±1.8) |
Random Forest Regressor | Age vs. CHPS | ** 2.1 (±0.7) | ** 4.7 (±1.4) | ** 7.6 (±1.5) | ** 11.4 (±2.0) |
Random Forest Regressor | Age vs. LRD | 0.5 (±0.5) | * 2.1 (±1.0) | * 4.5 (±1.4) | * 7.1 (±2.5) |
Random Forest Regressor | CHPS vs. LRD | 0.8 (±0.7) | ** 3.8 (±1.3) | ** 8.5 (±1.7) | ** 16.4 (±2.2) |
Task | Scenario | TOP5 (min. 3) | TOP10 (min. 4) | TOP15 (min. 7) | TOP20 (min. 10) |
---|---|---|---|---|---|
Random Forest Classifier | pa vs. Italian | 0.5 (±0.6) | 1.5 (±0.8) | 3.6 (±0.9) | 6.7 (±1.5) |
Random Forest Classifier | pa vs. Eng | 0.2 (±0.4) | 0.8 (±0.7) | 2.8 (±1.2) | 5.1 (±1.1) |
Random Forest Classifier | Italian vs. Eng | 0.0 (±0.1) | 1.0 (±0.8) | 3.2 (±1.0) | 7.5 (±1.3) |
Random Forest Regressor | Age vs. CHPS | 1.2 (±0.4) | 2.5 (±0.7) | 5.4 (±1.0) | * 9.2 (±0.9) |
Random Forest Regressor | Age vs. LRD | 1.0 (±0.0) | 2.0 (±0.7) | 4.6 (±1.0) | 7.9 (±1.1) |
Random Forest Regressor | CHPS vs. LRD | 2.1 (±0.3) | ** 4.7 (±0.9) | ** 8.7 (±0.8) | ** 10.9 (±1.3) |
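The TOP-N columns in the two tables above report how many features two models share among their N most important ones. A minimal sketch of that count follows, assuming each model exposes one importance value per feature (as tree ensembles typically do). The function names, the toy importance vectors, and the chance-level baseline are illustrative assumptions; the exact procedure behind the "min." thresholds is not described in this excerpt.

```python
def top_n(importances, n):
    """Indices of the n largest importance values (ties go to the lower index)."""
    order = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return set(order[:n])


def overlap_count(imp_a, imp_b, n):
    """Number of features that appear in the top-n sets of both models."""
    return len(top_n(imp_a, n) & top_n(imp_b, n))


def chance_overlap(n, dim):
    """Expected overlap of two independent, uniformly random n-subsets
    drawn from dim features: n * n / dim."""
    return n * n / dim


# Toy importance vectors for two hypothetical models over six features.
imp_a = [0.30, 0.05, 0.20, 0.10, 0.25, 0.10]
imp_b = [0.10, 0.30, 0.25, 0.05, 0.20, 0.10]

shared = overlap_count(imp_a, imp_b, 3)   # features 2 and 4 are shared
baseline = chance_overlap(3, 6)           # overlap expected by chance alone
print(f"shared top-3 features: {shared}, chance level: {baseline}")
```

Comparing the observed count with the chance level gives a first impression of whether an overlap is notable. Repeating the count over several model fits, as the mean (± deviation) entries in the tables suggest, would be needed before drawing any conclusion.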
Study | Features | Method | Acoustic Material (TRAIN) | Acoustic Material (TEST) | Metric ¹ | Drop ² |
---|---|---|---|---|---|---|
Hires et al., 2023 [50] | Traditional approach | XGBoost | CzechPD vowel /a/ | RMIT-PD vowel /a/ | AUC = 0.74 | ~7% |
Hires et al., 2023 [50] | STFT | CNN | ITA ³ vowel /a/ | RMIT-PD vowel /a/ | AUC = 0.70 | ~25% |
Ibarra et al., 2023 [73] | Mel-scale spectrograms | 1D-CNN with DA ⁴ | mixed four vowel /a/ datasets | PD-Neurovoz vowel /a/ | Acc = 72% | 0% ⁵ |
Ibarra et al., 2023 [73] | Mel-scale spectrograms | 1D-CNN with DA ⁴ | mixed four /pa-ta-ka/ datasets | PD-Neurovoz /pa-ta-ka/ | Acc = 83% | 3% ⁵ |
Tirronen et al., 2023 [74] | wav2vec 2.0 | SVM | HUPA vowels /a/ | SVD vowels /a/ | Acc = 58% | ~11% |
Favaro et al., 2023 [45] | TRILLsson | PLDA + PCA | Cross-lingual SS/RP/TDU ⁶ | CzechPD SS/RP/TDU ⁶ | AUC was close to 1 | 0% |
Javanmardi et al., 2024 [51] | Fine-tuned wav2vec 2.0-XLSR | SVM | EasyCall | TORGO | Acc = 70.3% | ~1% |
This paper | wav2vec 1.0 | Random Forest | MDVR-KCL read text | ITA ³ read text | AUC = 0.90 | 8% |
This paper | wav2vec 1.0 | XGBoost | MDVR-KCL read text | ITA ³ read text | AUC = 0.98 | 0% |
This paper | wav2vec 1.0 | Random Forest | ITA ³ read text | MDVR-KCL read text | AUC = 0.72 | 8% |
This paper | wav2vec 1.0 | XGBoost | ITA ³ read text | MDVR-KCL read text | AUC = 0.78 | 2% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Klempíř, O.; Krupička, R. Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson’s Disease Detection and Speech Features Extraction. Sensors 2024, 24, 5520. https://doi.org/10.3390/s24175520