Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Patients and Vocal Data
3.2. Preprocessing the Vocal Data
3.3. The Two Stages of Deep Neural Networks
3.4. The Activation Heatmap of the Convolution Layer
3.5. Performance Evaluation
4. Results and Discussion
4.1. The Prediction Performance of the Deep Neural Network
4.2. Binary Classification Performance for Prognostic Prediction
4.3. Prognosis in Patients after 6 Months
4.4. Scores for the Activation Heatmap
4.5. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Grade | Preoperative | Postoperative | 3 Months Postoperative
---|---|---|---
G0 | 43 | 25 | 31 |
G1 | 61 | 60 | 67 |
G2 | 9 | 24 | 14 |
G3 | 1 | 5 | 2 |
Class | RMSE | Rho | p Value
---|---|---|---
Grade | 0.399 | 0.796 | <0.001 |
Roughness | 0.365 | 0.149 | 0.509 |
Breathiness | 0.409 | 0.784 | <0.001 |
Asthenia | 0.469 | 0.602 | 0.003 |
Strain | 0.203 | NA | NA |
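
As a rough illustration of how the RMSE and Rho columns above could be computed, the following sketch uses only the Python standard library. It assumes that Rho denotes Spearman's rank correlation, which the table itself does not state, and the `observed` and `predicted` values are hypothetical placeholders rather than study data. The p value column is not reproduced here. The NaN branch shows one situation in which a rank correlation cannot be computed (a constant input); the source does not say why Strain is reported as NA.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks; NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


# Hypothetical clinician scores (0-3) and model outputs, for illustration only.
observed = [0, 1, 1, 2, 0, 1, 3, 1]
predicted = [0.2, 0.9, 1.3, 1.8, 0.4, 1.1, 2.5, 0.7]

print("RMSE:", round(rmse(observed, predicted), 3))
print("Rho:", round(spearman_rho(observed, predicted), 3))
```

If SciPy is available, `scipy.stats.spearmanr` returns both the correlation and a p value and would normally be preferred over a hand-written version.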
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, J.H.; Lee, C.Y.; Eom, J.S.; Pak, M.; Jeong, H.S.; Son, H.Y. Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network. Sensors 2022, 22, 6387. https://doi.org/10.3390/s22176387