Pareto-Optimized Non-Negative Matrix Factorization Approach to the Cleaning of Alaryngeal Speech Signals
Simple Summary
Abstract
1. Introduction
- We propose a novel method for cleaning impaired speech signals by combining Pareto-optimized deep learning with NMF, addressing the limitations of traditional speech enhancement techniques.
- We introduce a smoothing technique for the noise-to-signal mask to avoid abrupt transitions in noise levels, resulting in a more natural-sounding output signal.
- We demonstrate the effectiveness of our approach through a series of experiments, showing significant improvements in speech quality and intelligibility compared to traditional methods.
2. Review of State-of-the-Art Works
2.1. Assessing Speech-Signal Impairments
2.2. Algorithms for Alaryngeal Speech Enhancement
3. Materials and Methods
3.1. Dataset
3.2. Alaryngeal Speech Assessment
1. An artificial intelligence-based automated classifier for substitution voicing (ResNet 118) was used to assign speech samples to one of three classes: normal speech (Probability 0), speech with a single vocal fold (Probability 1), and alaryngeal speech with TEP (Probability 2) [91].
2. The average voicing evidence (AVE), an acoustic parameter available in the AMPEX software [92], was used to compare the alaryngeal speech samples before and after optimization with the Pareto-optimized NMF software. The AVE describes the degree of regularity/periodicity in the voiced frames. Since the actual background frames are usually unvoiced, the analysis is performed on all frames, not just speech frames. This approach is more robust against possible errors of the speech/background classification, which is purely energy-based. In contrast, the voicing evidence is derived from analyzing all the sub-band signals created by the auditory model.
3. The AI-based acoustic substitution voicing index (ASVI) [93] was employed to quantitatively evaluate the alaryngeal speech samples before and after optimization with the Pareto-optimized NMF software. The ASVI combines a constant with the statistically significant outputs of ResNet 118 (Probability 0, Probability 1, and Probability 2), the AVE, and the mean fundamental frequency. Possible ASVI values range from 0 to 30, with higher scores indicating better speech quality.
3.3. Methodology
3.3.1. Non-Negative Matrix Factorization (NMF)
3.3.2. Pareto-Optimized Non-Negative Matrix Factorization (PONMF)
Algorithm 1 Pareto-Optimized Deep Learning for Impaired Speech Cleaning
3.3.3. Speech-Signal Cleaning
1. Calculate the spectrogram of the entire noisy voice clip. This is achieved by windowing the noisy voice clip and taking its Fourier transform over time to obtain a spectrogram, which is a representation of the frequency spectrum of a signal over time.
2. Compute the frequency statistics from the spectrogram. This is achieved by calculating the mean and standard deviation of the magnitude of each frequency bin over time. These statistics characterize the distribution of the noise present in the voice clip.
3. Calculate a threshold based on the desired noise sensitivity. This threshold differentiates between the noise and signal components in the spectrogram.
4. Determine the signal spectrogram using the same input noisy voice clip. This is achieved by windowing the noisy voice clip and taking its Fourier transform over time.
5. Compute the noise-to-signal mask using the calculated threshold. The mask holds a binary value for each frequency bin and time frame of the spectrogram, where 1 indicates the signal and 0 indicates noise.
6. Smooth the noise-to-signal mask by applying a filter in both the frequency and time domains. This avoids sudden jumps in noise levels and produces a more continuous, less abrupt mask.
7. Apply the smoothed mask to the spectrogram of the signal. This step suppresses the noise components in the spectrogram while retaining the desired signal.
8. Decompose the modified spectrogram using Pareto-optimized non-negative matrix factorization (NMF). NMF-based methods for speech enhancement involve learning the basis functions and Pareto-optimized weights that best represent the clean speech signal.
9. Reconstruct the clean speech from the noisy input signal using the learned basis functions and Pareto-optimized weights.
10. Invert the reconstructed spectrogram to create a noise-reduced waveform. This final output is a cleaned version of the original impaired speech, with the noise components significantly reduced or removed.
3.3.4. Pareto-Optimized Deep Learning with NMF for Impaired Speech Cleaning
Algorithm 2 Pareto-Optimized Deep Learning with NMF for Impaired Speech Cleaning
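As a rough illustration of the NMF stage referred to in steps 8 and 9 of Section 3.3.3, the sketch below factorizes a non-negative magnitude matrix with the standard multiplicative update rules and then keeps the candidate factorizations that are Pareto-optimal. It should not be read as the algorithm used in this work: the two objectives (relative reconstruction error and factorization rank), the candidate ranks, the toy input matrix, and the helper names `nmf` and `pareto_front` are assumptions made only for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Approximate non-negative V (freq x time) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3   # basis functions (columns)
    H = rng.random((rank, V.shape[1])) + 1e-3   # activations / weights (rows)
    eps = 1e-10                                 # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def pareto_front(points):
    """Indices of points not dominated by any other point (all objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # q dominates p if q is no worse in every objective and better in at least one
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Toy non-negative "spectrogram" of exact rank 3 (assumed data, for illustration only).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))

ranks = [1, 2, 3, 6]                            # hypothetical candidate model sizes
errors = []
for r in ranks:
    W, H = nmf(V, r)
    errors.append(np.linalg.norm(V - W @ H) / np.linalg.norm(V))

# Trade-off between reconstruction error and rank: keep the non-dominated candidates.
front = pareto_front(list(zip(errors, ranks)))
```

A cleaned magnitude spectrogram would then be reconstructed as `W @ H` for one of the candidates on the front; how a single candidate is chosen from the front is a design decision that this sketch leaves open.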
4. Results
Levene’s test for equality of variances:
- Probability 0: sig. < 0.001, indicating that the variances were not equal across groups.
- Probability 1: sig. = 0.454, indicating that the variances were equal across groups.
- Probability 2: sig. = 0.008, indicating that the variances were not equal across groups.
- AVE: sig. = 0.340, indicating that the variances were equal across groups.
- ASVI: sig. = 0.166, indicating that the variances were equal across groups.
t-test for equality of means:
- Probability 0: sig. = 0.036 (equal variances assumed) and 0.037 (equal variances not assumed), indicating that the means of the two groups were significantly different.
- Probability 1: sig. = 0.890 (both cases), indicating that the means were not significantly different.
- Probability 2: sig. = 0.163 (both cases), indicating that the means were not significantly different.
- AVE: sig. = 0.750 (both cases), indicating that the means were not significantly different.
- ASVI: sig. = 0.133 (equal variances assumed) and 0.134 (equal variances not assumed), indicating that the means were not significantly different.
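The relation between the Levene’s test outcome and the t-test row to be read can be checked from summary statistics alone. The sketch below recomputes the t statistic and degrees of freedom for Probability 0 from the group means (4.09 and 13.51), standard deviations (19.51 and 33.3), and N = 75 per group given in the descriptive statistics table. The helper names `t_from_summary` and `pick_row` and the 0.05 significance level are assumptions made for the example.

```python
from math import sqrt

def t_from_summary(m1, s1, n1, m2, s2, n2, equal_var):
    """Two-sample t statistic and degrees of freedom from group summaries."""
    if equal_var:
        # Student's t-test with a pooled variance estimate
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
        se = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom
        v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
        se = sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df

def pick_row(levene_sig, alpha=0.05):
    """Choose which t-test row to read, given the Levene's test significance."""
    if levene_sig < alpha:
        return "equal variances not assumed"
    return "equal variances assumed"

# Probability 0: original vs. Pareto-optimized NMF
t_pooled, df_pooled = t_from_summary(4.09, 19.51, 75, 13.51, 33.3, 75, equal_var=True)
t_welch, df_welch = t_from_summary(4.09, 19.51, 75, 13.51, 33.3, 75, equal_var=False)
row = pick_row(0.0005)   # stand-in value for a Levene's sig. reported below 0.001
```

With equal group sizes the two variants give the same t statistic and differ only in the degrees of freedom. The recomputed values agree with the reported t = −2.113 and df = 119.448 up to rounding of the summary statistics.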
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
NMF | non-negative matrix factorization |
UA-SPEECH | sound dataset |
GMM | Gaussian mixture model |
SVM | support vector machine |
AGMM | adjusted Gaussian mixture model |
DSP | digital signal processing |
AI | artificial intelligence |
LMS | least mean square |
SNR | signal-to-noise ratio |
MSE | mean square error |
PSNR | peak signal-to-noise ratio |
LPC | linear predictive coding |
CNN | convolutional neural network |
RNN | recurrent neural network |
GAN | generative adversarial network |
TEP | tracheoesophageal prosthesis |
Prob0 | probability of healthy speech |
Prob1 | probability of speech with a single vocal fold |
Prob2 | probability of tracheoesophageal speech |
AVE | average voicing evidence |
ASVI | acoustic substitution voicing index |
References
- Steuer, C.E.; El-Deiry, M.; Parks, J.R.; Higgins, K.A.; Saba, N.F. An update on larynx cancer. CA Cancer J. Clin. 2016, 67, 31–50.
- Groome, P.A.; O’Sullivan, B.; Irish, J.C.; Rothwell, D.M.; Schulze, K.; Warde, P.R.; Schneider, K.M.; Mackenzie, R.G.; Hodson, D.I.; Hammond, J.A.; et al. Management and Outcome Differences in Supraglottic Cancer Between Ontario, Canada, and the Surveillance, Epidemiology, and End Results Areas of the United States. J. Clin. Oncol. 2003, 21, 496–505.
- Groome, P.; Schulze, K.; Keller, S.; Mackillop, W.; O’Sullivan, B.; Irish, J.; Bissett, R.; Dixon, P.; Eapen, L.; Gulavita, S.; et al. Explaining Socioeconomic Status Effects in Laryngeal Cancer. Clin. Oncol. 2006, 18, 283–292.
- Hoffman, H.T.; Porter, K.; Karnell, L.H.; Cooper, J.S.; Weber, R.S.; Langer, C.J.; Ang, K.K.; Gay, G.; Stewart, A.; Robinson, R.A. Laryngeal Cancer in the United States: Changes in Demographics, Patterns of Care, and Survival. Laryngoscope 2006, 116, 1–13.
- Caudell, J.J.; Gillison, M.L.; Maghami, E.; Spencer, S.; Pfister, D.G.; Adkins, D.; Birkeland, A.C.; Brizel, D.M.; Busse, P.M.; Cmelak, A.J.; et al. NCCN Guidelines® Insights: Head and Neck Cancers, Version 1.2022. J. Natl. Compr. Cancer Netw. 2022, 20, 224–234.
- Allegra, E.; Mantia, I.L.; Bianco, M.R.; Drago, G.D.; Fosse, M.C.L.; Azzolina, A.; Grillo, C.; Saita, V. Verbal performance of total laryngectomized patients rehabilitated with esophageal speech and tracheoesophageal speech: Impacts on patient quality of life. Psychol. Res. Behav. Manag. 2019, 12, 675–681.
- van Sluis, K.E.; van der Molen, L.; van Son, R.J.J.H.; Hilgers, F.J.M.; Bhairosing, P.A.; van den Brekel, M.W.M. Objective and subjective voice outcomes after total laryngectomy: A systematic review. Eur. Arch. Oto-Rhino-Laryngol. 2017, 275, 11–26.
- Chakravarty, P.D.; McMurran, A.E.L.; Banigo, A.; Shakeel, M.; Ah-See, K.W. Primary versus secondary tracheoesophageal puncture: Systematic review and meta-analysis. J. Laryngol. Otol. 2017, 132, 14–21.
- Hurren, A.; Miller, N. Voice outcomes post total laryngectomy. Curr. Opin. Otolaryngol. Head Neck Surg. 2017, 25, 205–210.
- Kotby, M.; Hegazi, M.; Kamal, I.; el Dien, N.G.; Nassar, J. Aerodynamics of the Pseudo-Glottis. Folia Phoniatr. Logop. 2009, 61, 24–28.
- Brook, I.; Goodman, J.F. Tracheoesophageal Voice Prosthesis Use and Maintenance in Laryngectomees. Int. Arch. Otorhinolaryngol. 2020, 24, e535–e538.
- de Coul, B.M.R.O.; Hilgers, F.J.M.; Balm, A.J.M.; Tan, I.B.; van den Hoogen, F.J.A.; van Tinteren, H. A Decade of Postlaryngectomy Vocal Rehabilitation in 318 Patients. Arch. Otolaryngol. Head Neck Surg. 2000, 126, 1320.
- Dejonckere, P.H.; Bradley, P.; Clemente, P.; Cornut, G.; Crevier-Buchman, L.; Friedrich, G.; Heyning, P.V.D.; Remacle, M.; Woisard, V. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Eur. Arch. Oto-Rhino-Laryngol. 2001, 258, 77–82.
- Semple, C.; Parahoo, K.; Norman, A.; McCaughan, E.; Humphris, G.; Mills, M. Psychosocial interventions for patients with head and neck cancer. Cochrane Database Syst. Rev. 2013.
- Uscher-Pines, L.; Sousa, J.; Raja, P.; Mehrotra, A.; Barnett, M.L.; Huskamp, H.A. Suddenly Becoming a “Virtual Doctor”: Experiences of Psychiatrists Transitioning to Telemedicine during the COVID-19 Pandemic. Psychiatr. Serv. 2020, 71, 1143–1150.
- Bohnenkamp, T.A. Postlaryngectomy Respiratory System and Speech Breathing. In Clinical Care and Rehabilitation in Head and Neck Cancer; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 103–117.
- Qian, Z.; Niu, H.; Wang, L.; Kobayashi, K.; Zhang, S.; Toda, T. Mandarin Electro-Laryngeal Speech Enhancement based on Statistical Voice Conversion and Manual Tone Control. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; pp. 546–552.
- Dinh, T.; Kain, A.; Samlan, R.; Cao, B.; Wang, J. Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency. Proc. Interspeech 2020, 2020, 4781–4785.
- Graham, M.S. Strategies for Excelling with Alaryngeal Speech Methods. Perspect. Voice Voice Disord. 2006, 16, 25–32.
- Kabir, R.; Greenblatt, A.; Panetta, K.; Agaian, S. Enhancement of alaryngeal speech utilizing spectral subtraction and minimum statistics. In Proceedings of the 2008 International Conference on Machine Learning and Cybernetics, Kunming, China, 12–15 July 2008; Volume 7, pp. 3704–3709.
- Garg, A.; Sahu, O.P. Enhancement of speech signal using diminished empirical mean curve decomposition-based adaptive Wiener filtering. Pattern Anal. Appl. 2019, 23, 179–198.
- Wang, Q.; Du, X.; Gu, W. A Source-Filter Model-Based Unvoiced Speech Detector for Speech Coding. In Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013), Hangzhou, China, 22–23 March 2013; Atlantis Press: Dordrecht, The Netherlands, 2013.
- Huq, M.; Maskeliunas, R. Speech Enhancement Using Generative Adversarial Network (GAN). In Hybrid Intelligent Systems; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; pp. 273–282.
- Sack, A.; Jiang, W.; Perlmutter, M.; Salanevich, P.; Needell, D. On Audio Enhancement via Online Non-Negative Matrix Factorization. In Proceedings of the 2022 56th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 9–11 March 2022; pp. 287–291.
- Wang, D.; Cui, J.; Wang, J.; Tan, H.; Xu, M. Convex Hull Convolutive Non-negative Matrix Factorization Based Speech Enhancement For Multimedia Communication. In Proceedings of the 2022 6th International Conference on Cryptography, Security and Privacy (CSP), Tianjin, China, 14–16 January 2022; pp. 138–142.
- Knollhoff, S.M.; Borrie, S.A.; Barrett, T.S.; Searl, J.P. Listener impressions of alaryngeal communication modalities. Int. J. Speech-Lang. Pathol. 2021, 23, 540–547.
- Mouret, F.; Crevier-Buchman, L.; Pillot-Loiseau, C. Intelligibility of pseudo-whispered speech after total laryngectomy. Clin. Linguist. Phon. 2022, 1–17.
- Hui, T.F.; Cox, S.R.; Huang, T.; Chen, W.R.; Ng, M.L. The Effect of Clear Speech on Cantonese Alaryngeal Speakers’ Intelligibility. Folia Phoniatr. Logop. 2021, 74, 103–111.
- Aueworakhunanan, T.; Dechongkit, S.; Jeeraumporn, J.; Punkla, W. An Evaluation Pertaining to Esophageal Speech Outcomes in Alaryngeal Patients. Ramathibodi Med. J. 2022, 45, 16–24.
- Cao, B.; Teplansky, K.; Sebkhi, N.; Bhavsar, A.; Inan, O.; Samlan, R.; Mau, T.; Wang, J. Data Augmentation for End-to-end Silent Speech Recognition for Laryngectomees. Proc. Interspeech 2022, 2022, 3653–3657.
- Kent, R.D.; Kim, Y.; Chen, L.-M. Oral and Laryngeal Diadochokinesis Across the Life Span: A Scoping Review of Methods, Reference Data, and Clinical Applications. J. Speech Lang. Hear. Res. 2022, 65, 574–623.
- Dahl, K.L.; Bolognone, R.K.; Childes, J.M.; Pryor, R.L.; Graville, D.J.; Palmer, A.D. Characteristics associated with communicative participation after total laryngectomy. J. Commun. Disord. 2022, 96, 106184.
- Roy, N.; Barkmeier-Kraemer, J.; Eadie, T.; Sivasankar, M.P.; Mehta, D.; Paul, D.; Hillman, R. Evidence-Based Clinical Voice Assessment: A Systematic Review. Am. J. Speech-Lang. Pathol. 2013, 22, 212–226.
- Rosdi, F.; Salim, S.S.; Mustafa, M.B. An FPN-based classification method for speech intelligibility detection of children with speech impairments. Soft Comput. 2019, 23, 2391–2408.
- Failla, S.; Al-Zanoon, N.; Smith, N.; Doyle, P.C. The Effects of Contextual Priming and Alaryngeal Speech Mode on Auditory-Perceptual Ratings of Listener Comfort. J. Voice 2021, 35, 934.e17–934.e23.
- Stipancic, K.L.; Tjaden, K. Minimally Detectable Change of Speech Intelligibility in Speakers with Multiple Sclerosis and Parkinson’s Disease. J. Speech Lang. Hear. Res. 2022, 65, 1858–1866.
- Malini, S.; Chandrakala, S. Intelligibility assessment of impaired speech using Regularized self-representation based compact supervectors. Comput. Speech Lang. 2022, 74, 101355.
- Albaqshi, H.; Sagheer, A. Dysarthric Speech Recognition using Convolutional Recurrent Neural Networks. Int. J. Intell. Eng. Syst. 2020, 13, 384–392.
- Bessell, N.; Gurd, J.M.; Coleman, J. Dissociation between speech modalities in a case of altered accent with unknown origin. Clin. Linguist. Phon. 2020, 34, 222–241.
- Moon, A.M.; Kim, H.P.; Cook, S.; Blanchard, R.T.; Haley, K.L.; Jacks, A.; Shafer, J.S.; Fried, M.W. Speech patterns and enunciation for encephalopathy determination—A prospective study of hepatic encephalopathy. Hepatol. Commun. 2022, 6, 2876–2885.
- De Cock, E.; Oostra, K.; Bliki, L.; Volkaerts, A.; Hemelsoet, D.; De Herdt, V.; Batens, K. Dysarthria following acute ischemic stroke: Prospective evaluation of characteristics, type and severity. Int. J. Lang. Commun. Disord. 2021, 56, 549–557.
- Rowe, H.P.; Gutz, S.E.; Maffei, M.F.; Tomanek, K.; Green, J.R. Characterizing Dysarthria Diversity for Automatic Speech Recognition: A Tutorial From the Clinical Perspective. Front. Comput. Sci. 2022, 4, 770210.
- Stipancic, K.L.; van Brenk, F.; Kain, A.; Wilding, G.; Tjaden, K. Clear Speech Variants: An Investigation of Intelligibility and Speaker Effort in Speakers with Parkinson’s Disease. Am. J. Speech-Lang. Pathol. 2022, 31, 2789–2805.
- Rosdi, F.; Mustafa, M.B.; Salim, S.S.; Mat Zin, N.A. Automatic speech intelligibility detection for speakers with speech impairments: The identification of significant speech features. Sains Malays. 2019, 48, 2737–2747.
- Maskeliūnas, R.; Kulikajevas, A.; Damaševičius, R.; Pribuišis, K.; Ulozaitė-Stanienė, N.; Uloza, V. Lightweight Deep Learning Model for Assessment of Substitution Voicing and Speech after Laryngeal Carcinoma Surgery. Cancers 2022, 14, 2366.
- Kim, H.; Jeon, J.; Han, Y.J.; Joo, Y.; Lee, J.; Lee, S.; Im, S. Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy. J. Clin. Med. 2020, 9, 3415.
- Feng, Y.; Chen, F.; Ma, J.; Wang, L.; Peng, G. Production of Mandarin consonant aspiration and monophthongs in children with Autism Spectrum Disorder. Clin. Linguist. Phon. 2022.
- Vieira, S.T.; Rosa, R.L.; Rodríguez, D.Z. A speech quality classifier based on tree-cnn algorithm that considers network degradations. J. Commun. Softw. Syst. 2020, 16, 180–187.
- Poncelet, J.; Renkens, V.; Van Hamme, H. Low resource end-to-end spoken language understanding with capsule networks. Comput. Speech Lang. 2021, 66, 101142.
- Cave, R.; Bloch, S. The use of speech recognition technology by people living with amyotrophic lateral sclerosis: A scoping review. Disabil. Rehabil. Assist. Technol. 2021.
- Schultz, B.G.; Tarigoppula, V.S.A.; Noffs, G.; Rojas, S.; van der Walt, A.; Grayden, D.B.; Vogel, A.P. Automatic speech recognition in neurodegenerative disease. Int. J. Speech Technol. 2021, 24, 771–779.
- Gupta, S.; Patil, A.T.; Purohit, M.; Parmar, M.; Patel, M.; Patil, H.A.; Guido, R.C. Residual Neural Network precisely quantifies dysarthria severity-level based on short-duration speech segments. Neural Netw. 2021, 139, 105–117.
- Latha, M.; Shivakumar, M.; Manjula, G.; Hemakumar, M.; Kumar, M.K. Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition. SN Comput. Sci. 2023, 4, 272.
- Vishnika Veni, S.; Chandrakala, S. Investigation of DNN-HMM and Lattice Free Maximum Mutual Information Approaches for Impaired Speech Recognition. IEEE Access 2021, 9, 168840–168849.
- Chandrakala, S.; Malini, S.; Veni, S.V. Histogram of States Based Assistive System for Speech Impairment Due to Neurological Disorders. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 2425–2434.
- Srinivasan, M.; Shanmuganathan, C.; Gupta, S.M.K.; Sikkandar, M.Y. Multi-view representation based speech assisted system for people with neurological disorders. J. Ambient. Intell. Humaniz. Comput. 2021.
- Chandrakala, S.; Malini, S.; Jayalakshmi, S.L. Bag of Models Based Embeddings for Assessment of Neurological Disorders Using Speech Intelligibility. IEEE Trans. Emerg. Top. Comput. 2021, 9, 1265–1275.
- Fu, J.; Yang, S.; He, F.; He, L.; Li, Y.; Zhang, J.; Xiong, X. Sch-net: A deep learning architecture for automatic detection of schizophrenia. BioMed. Eng. Online 2021, 20, 75.
- Marini, M.; Vanello, N.; Fanucci, L. Optimising speaker-dependent feature extraction parameters to improve automatic speech recognition performance for people with dysarthria. Sensors 2021, 21, 6460.
- Mathew, L.R.; Gopakumar, K. Evaluation of speech enhancement algorithms applied to electrolaryngeal speech degraded by noise. Appl. Acoust. 2021, 174, 107771.
- Ishikawa, K.; Boyce, S.; Kelchner, L.; Powell, M.G.; Schieve, H.; de Alarcon, A.; Khosla, S. The Effect of Background Noise on Intelligibility of Dysphonic Speech. J. Speech Lang. Hear. Res. 2017, 60, 1919–1929.
- Dhivya, R.; Justin, J. Performance Evaluation of a Speech Enhancement Technique Using Wavelets. In Proceedings of the International Conference on Soft Computing Systems; Springer: New Delhi, India, 2015; pp. 637–646.
- Jaiswal, R.K.; Yeduri, S.R.; Cenkeramaddi, L.R. Single-channel speech enhancement using implicit Wiener filter for high-quality speech communication. Int. J. Speech Technol. 2022, 25, 745–758.
- Pauline, S.H.; Dhanalakshmi, S.; Kumar, R.; Narayanamoorthi, R. Noise reduction in speech signal of Parkinson’s Disease (PD) patients using optimal variable stage cascaded adaptive filter configuration. Biomed. Signal Process. Control 2022, 77, 103802.
- Doi, H.; Toda, T.; Nakamura, K.; Saruwatari, H.; Shikano, K. Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 172–183.
- Pauline, S.H.; Dhanalakshmi, S. A low-cost automatic switched adaptive filtering technique for denoising impaired speech signals. Multidimens. Syst. Signal Process. 2022, 33, 1387–1408.
- Pandey, P.; Bhandarkar, S.; Bachher, G.; Lehana, P. Enhancement of alaryngeal speech using spectral subtraction. In Proceedings of the 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628), Santorini, Greece, 1–3 July 2002; Volume 2, pp. 591–594.
- Azarnoush, H.; Mir, F.; Agaian, S.; Jamshidi, M.; Shadaram, M. Alaryngeal Speech Enhancement Using Minimum Statistics Approach to Spectral Subtraction. In Proceedings of the 2007 IEEE International Conference on System of Systems Engineering, San Antonio, TX, USA, 16–18 April 2007; pp. 1–5.
- Wei, Y.; Li, C.; Li, T.; Zeng, Y. Whispered Speech Enhancement Based on Improved Mel Frequency Scale and Modified Compensated Phase Spectrum. Circuits Syst. Signal Process. 2019, 38, 5839–5860.
- Mollaei, F.; Shiller, D.M.; Baum, S.R.; Gracco, V.L. The Relationship Between Speech Perceptual Discrimination and Speech Production in Parkinson’s Disease. J. Speech Lang. Hear. Res. 2019, 62, 4256–4268.
- Giri, M.; Rayavarapu, N. Improving the intelligibility of dysarthric speech using a time domain pitch synchronous-based approach. Int. J. Electr. Comput. Eng. 2023, 13, 4041–4051.
- Ishaq, R.; Shahid, M.; Lövström, B.; Zapirain, B.G.; Claesson, I. Modulation frequency domain adaptive gain equalizer using convex optimization. In Proceedings of the 2012 6th International Conference on Signal Processing and Communication Systems, Gold Coast, QLD, Australia, 12–14 December 2012; pp. 1–5.
- Vijayan, K.; Murty, K.S.R. Prosody Modification Using Allpass Residual of Speech Signals. Proc. Interspeech 2016, 2016, 1069–1073.
- Bhangale, K.B.; Kothandaraman, M. Survey of Deep Learning Paradigms for Speech Processing. Wirel. Pers. Commun. 2022, 125, 1913–1949.
- Kobayashi, K.; Toda, T. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 396–400.
- Saleem, N.; Khattak, M.I.; Alqahtani, S.A.; Jan, A.; Hussain, I.; Khan, M.N.; Dahshan, M. U-Shaped Low-Complexity Type-2 Fuzzy LSTM Neural Network for Speech Enhancement. IEEE Access 2023, 11, 20814–20826.
- Huq, M. Enhancement of Alaryngeal Speech using Generative Adversarial Network (GAN). In Proceedings of the 2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA), Tangier, Morocco, 30 November–3 December 2021; pp. 1–2.
- Pascual, S.; Bonafonte, A.; Serrà, J.; Gonzalez, J.A. Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks. arXiv 2018, arXiv:1808.10687.
- Pascual, S.; Serrà, J.; Bonafonte, A. Towards Generalized Speech Enhancement with Generative Adversarial Networks. arXiv 2019, arXiv:1904.03418.
- Amarjouf, M.; Bahja, F.; Di-Martino, J.; Chami, M.; Ibn-Elhaj, E.H. Predicted Phase Using Deep Neural Networks to Enhance Esophageal Speech. In Lecture Notes on Data Engineering and Communications Technologies; Springer Nature: Cham, Switzerland, 2023; pp. 68–76.
- Subramanian, A.S.; Wang, X.; Baskar, M.K.; Watanabe, S.; Taniguchi, T.; Tran, D.; Fujita, Y. Speech Enhancement Using End-to-End Speech Recognition Objectives. In Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 20–23 October 2019; pp. 234–238.
- Muthusamy, H.; Polat, K.; Yaacob, S. Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals. Math. Probl. Eng. 2015, 2015, 394083.
- Li, M.; Wang, L.; Xu, Z.; Cai, D. Mandarin electrolaryngeal voice conversion with combination of Gaussian mixture model and non-negative matrix factorization. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1360–1363.
- Xu, T.; Feng, K.; Ge, Y.; Zhang, X.; Tao, Z. Identification of vocal nodules and laryngitis by Gauss mixture model. In Proceedings of the 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, China, 11–13 November 2017; pp. 1098–1102.
- Areiza-Laverde, H.J.; Castro-Ospina, A.E.; Peluffo-Ordóñez, D.H. Voice Pathology Detection Using Artificial Neural Networks and Support Vector Machines Powered by a Multicriteria Optimization Algorithm. In Communications in Computer and Information Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 148–159.
- Das, N.; Chakraborty, S.; Chaki, J.; Padhy, N.; Dey, N. Fundamentals, present and future perspectives of speech enhancement. Int. J. Speech Technol. 2020, 24, 883–901.
- Lee, S.H.; Kim, M.; Seo, H.G.; Oh, B.M.; Lee, G.; Leigh, J.H. Assessment of Dysarthria Using One-Word Speech Recognition with Hidden Markov Models. J. Korean Med. Sci. 2019, 34, e108.
- van Sluis, K.E.; van Son, R.J.J.H.; van der Molen, L.; McGuinness, A.J.; Palme, C.E.; Novakovic, D.; Stone, D.; Natsis, L.; Charters, E.; Jones, K.; et al. Multidimensional evaluation of voice outcomes following total laryngectomy: A prospective multicenter cohort study. Eur. Arch. Oto-Rhino-Laryngol. 2020, 278, 1209–1222.
- Succo, G.; Peretti, G.; Piazza, C.; Remacle, M.; Eckel, H.E.; Chevalier, D.; Simo, R.; Hantzakos, A.G.; Rizzotto, G.; Lucioni, M.; et al. Open partial horizontal laryngectomies: A proposal for classification by the working committee on nomenclature of the European Laryngological Society. Eur. Arch. Oto-Rhino-Laryngol. 2014, 271, 2489–2496.
- Dejonckere, P.H.; Moerman, M.B.J.; Martens, J.P.; Schoentgen, J.; Manfredi, C. Voicing quantification is more relevant than period perturbation in substitution voices: An advanced acoustical study. Eur. Arch. Oto-Rhino-Laryngol. 2012, 269, 1205–1212.
- Maskeliūnas, R.; Damaševičius, R.; Kulikajevas, A.; Padervinskis, E.; Pribuišis, K.; Uloza, V. A Hybrid U-Lossian Deep Learning Network for Screening and Evaluating Parkinson’s Disease. Appl. Sci. 2022, 12, 11601.
- Moerman, M.; Martens, J.P.; Dejonckere, P. Multidimensional assessment of strongly irregular voices such as in substitution voicing and spasmodic dysphonia: A compilation of own research. Logop. Phoniatr. Vocology 2014, 40, 24–29.
- Uloza, V.; Maskeliunas, R.; Pribuisis, K.; Vaitkus, S.; Kulikajevas, A.; Damasevicius, R. An Artificial Intelligence-Based Algorithm for the Assessment of Substitution Voicing. Appl. Sci. 2022, 12, 9748.
- Campbell, I. Chi-squared and Fisher–Irwin tests of two-by-two tables with small sample recommendations. Stat. Med. 2007, 26, 3661–3675.
- Schindler, A.; Mozzanica, F.; Ginocchio, D.; Invernizzi, A.; Peri, A.; Ottaviani, F. Voice-related quality of life in patients after total and partial laryngectomy. Auris Nasus Larynx 2012, 39, 77–83.
- Teruya, N.; Sunagawa, Y.; Toyosato, T.; Yokota, T. Association between Daily Life Difficulties and Acceptance of Disability in Cancer Survivors after Total Laryngectomy: A Cross-Sectional Survey. Asia-Pac. J. Oncol. Nurs. 2019, 6, 170–176.
- Lin, Z.; Lin, H.; Chen, Y.; Xu, Y.; Chen, X.; Fan, H.; Wu, X.; Ke, X.; Lin, C. Long-term survival trend after primary total laryngectomy for patients with locally advanced laryngeal carcinoma. J. Cancer 2021, 12, 1220–1230.
- Birkeland, A.C.; Beesley, L.; Bellile, E.; Rosko, A.J.; Hoesli, R.; Chinn, S.B.; Shuman, A.G.; Prince, M.E.; Wolf, G.T.; Bradford, C.R.; et al. Predictors of survival after total laryngectomy for recurrent/persistent laryngeal squamous cell carcinoma. Head Neck 2017, 39, 2512–2518.

Parameter | Group | N | Mean | Std. Deviation | p |
---|---|---|---|---|---|
Probability 0 | Original | 75 | 4.09 | 19.51 | 0.001 |
Probability 0 | Pareto-optimized NMF | 75 | 13.51 | 33.3 | 0.001 |
Probability 1 | Original | 75 | 56.18 | 48.66 | 0.454 |
Probability 1 | Pareto-optimized NMF | 75 | 57.28 | 47.83 | 0.454 |
Probability 2 | Original | 75 | 39.73 | 47.9 | 0.08 |
Probability 2 | Pareto-optimized NMF | 75 | 29.21 | 43.89 | 0.08 |
AVE | Original | 75 | 0.81 | 0.11 | 0.34 |
AVE | Pareto-optimized NMF | 75 | 0.8 | 0.1 | 0.34 |
ASVI | Original | 75 | 8.8 | 4.94 | 0.166 |
ASVI | Pareto-optimized NMF | 75 | 10.17 | 6.09 | 0.166 |

Group | Method | N | % | p |
---|---|---|---|---|
Healthy speech | Original | 4 | 4.0 | 0.043 |
Healthy speech | Pareto-optimized NMF | 10 | 13.33 | |
Speech after laryngeal oncosurgery | Original | 72 | 96.0 | 4.097 |
Speech after laryngeal oncosurgery | Pareto-optimized NMF | 65 | 86.67 | |

Parameter | Variances | Levene’s Test F | Levene’s Test Sig. | t | df | Sig. (2-Tailed) | Mean Difference | Std. Error Difference | 95% Conf. Int. Lower | 95% Conf. Int. Upper |
---|---|---|---|---|---|---|---|---|---|---|
Probability 0 | Equal variances assumed | 18.313 | <0.001 | −2.113 | 148 | 0.036 | −9.41893 | 4.45670 | −18.22592 | −0.61195 |
Probability 0 | Equal variances not assumed | | | −2.113 | 119.448 | 0.037 | −9.41893 | 4.45670 | −18.24330 | −0.59457 |
Probability 1 | Equal variances assumed | 0.563 | 0.454 | −0.139 | 148 | 0.890 | −1.09627 | 7.87862 | −16.66538 | 14.47284 |
Probability 1 | Equal variances not assumed | | | −0.139 | 147.956 | 0.890 | −1.09627 | 7.87862 | −16.66542 | 14.47288 |
Probability 2 | Equal variances assumed | 7.317 | 0.008 | 1.402 | 148 | 0.163 | 10.51547 | 7.50161 | −4.30864 | 25.33957 |
Probability 2 | Equal variances not assumed | | | 1.402 | 146.885 | 0.163 | 10.51547 | 7.50161 | −4.30957 | 25.34050 |
AVE | Equal variances assumed | 0.918 | 0.340 | 0.319 | 148 | 0.750 | 0.005560 | 0.017451 | −0.028926 | 0.040046 |
AVE | Equal variances not assumed | | | 0.319 | 147.237 | 0.750 | 0.005560 | 0.017451 | −0.028927 | 0.040047 |
ASVI | Equal variances assumed | 1.941 | 0.166 | −1.509 | 148 | 0.133 | −1.36607 | 0.90525 | −3.15495 | 0.42281 |
ASVI | Equal variances not assumed | | | −1.509 | 141.961 | 0.134 | −1.36607 | 0.90525 | −3.15558 | 0.42343 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Maskeliūnas, R.; Damaševičius, R.; Kulikajevas, A.; Pribuišis, K.; Ulozaitė-Stanienė, N.; Uloza, V. Pareto-Optimized Non-Negative Matrix Factorization Approach to the Cleaning of Alaryngeal Speech Signals. Cancers 2023, 15, 3644. https://doi.org/10.3390/cancers15143644