Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Dataset Description
3.2. Data Preprocessing
3.3. Data Transformation
3.4. Audio Spectrogram Transformer Architecture
3.4.1. AST Spectrogram Generation
3.4.2. Patch-Based Input Representation
3.4.3. Transformer Encoder
3.4.4. Linear Layer
3.5. Model Training and Evaluation Measures
4. Experimental Results
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- World Health Organization. Newborn Mortality. Available online: https://www.who.int/news-room/fact-sheets/detail/newborns-reducing-mortality (accessed on 2 January 2024).
- National Heart, Lung, and Blood Institute (NHLBI). Respiratory Distress Syndrome (RDS). Available online: https://www.nhlbi.nih.gov/health-topics/respiratory-distress-syndrome (accessed on 2 January 2024).
- World Health Organization. Sepsis. Available online: https://www.who.int/news-room/fact-sheets/detail/sepsis (accessed on 2 January 2024).
- Sood, B.G.; Thomas, R.; Delaney-Black, V.; Xin, Y.; Sharma, A.; Chen, X. Aerosolized Beractant in neonatal respiratory distress syndrome: A randomized fixed-dose parallel-arm phase II trial. Pulm. Pharmacol. Ther. 2021, 66, 101986.
- Turhan, E.E.; Gürsoy, T.; Ovali, F. Factors which affect mortality in neonatal sepsis. Türk. Pediatri. Arşivi 2015, 50, 170–175.
- Mayo Clinic. 2022. Available online: https://www.mayoclinic.org/diseases-conditions/ards/diagnosis-treatment/drc-20355581 (accessed on 2 January 2024).
- Randolph, A.G.; McCulloh, R.J. Pediatric sepsis: Important considerations for diagnosing and managing severe infections in infants, children, and adolescents. Virulence 2014, 5, 179–189.
- Khalilzad, Z.; Hasasneh, A.; Tadj, C. Newborn Cry-Based Diagnostic System to Distinguish between Sepsis and Respiratory Distress Syndrome Using Combined Acoustic Features. Diagnostics 2022, 12, 2802.
- Mampe, B.; Friederici, A.D.; Christophe, A.; Wermke, K. Newborns’ Cry Melody Is Shaped by Their Native Language. Curr. Biol. 2009, 19, 1994–1997.
- The Cry of The Human Infant on JSTOR. Available online: https://www.jstor.org/stable/24950031 (accessed on 2 January 2024).
- Osmani, A.; Hamidi, M.; Chibani, A. Machine Learning Approach for Infant Cry Interpretation. In Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA, 6–8 November 2017; pp. 182–186.
- Wu, K.; Zhang, C.; Wu, X.; Wu, D.; Niu, X. Research on Acoustic Feature Extraction of Crying for Early Screening of Children with Autism. In Proceedings of the 2019 34rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Jinzhou, China, 6–8 June 2019; pp. 290–295.
- Hariharan, M.; Sindhu, R.; Yaacob, S. Normal and hypoacoustic infant cry signal classification using time–frequency analysis and general regression neural network. Comput. Methods Programs Biomed. 2012, 108, 559–569.
- Orlandi, S.; Manfredi, C.; Bocchi, L.; Scattoni, M.L. Automatic newborn cry analysis: A Non-invasive tool to help autism early diagnosis. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 2953–2956.
- Zayed, Y.; Hasasneh, A.; Tadj, C. Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features. Diagnostics 2023, 13, 2107.
- Ji, C.; Mudiyanselage, T.B.; Gao, Y.; Pan, Y. A review of infant cry analysis and classification. EURASIP J. Audio Speech Music Process. 2021, 2021, 8.
- Lederman, D.; Zmora, E.; Hauschildt, S.; Stellzig-Eisenhauer, A.; Wermke, K. Classification of cries of infants with cleft-palate using parallel hidden Markov models. Med. Biol. Eng. Comput. 2008, 46, 965–975.
- Joshi, V.R.; Srinivasan, K.; Vincent, P.M.D.R.; Rajinikanth, V.; Chang, C.-Y. A Multistage Heterogeneous Stacking Ensemble Model for Augmented Infant Cry Classification. Front. Public Health 2022, 10, 819865.
- Patil, A.T.; Kachhi, A.; Patil, H.A. Subband Teager Energy Representations for Infant Cry Analysis and Classification. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 1313–1317.
- Liu, L.; Li, Y.; Kuo, K. Infant Cry Signal Detection, Pattern Extraction and Recognition. In Proceedings of the 2018 International Conference on Information and Computer Technologies (ICICT), DeKalb, IL, USA, 23–25 March 2018; pp. 159–163.
- Cohen, R.; Ruinskiy, D.; Zickfeld, J.; IJzerman, H.; Lavner, Y. Baby Cry Detection: Deep Learning and Classical Approaches. In Development and Analysis of Deep Learning Architectures; Springer: Berlin/Heidelberg, Germany, 2020; pp. 171–196.
- Orlandi, S.; Reyes Garcia, C.A.; Bandini, A.; Donzelli, G.; Manfredi, C. Application of Pattern Recognition Techniques to the Classification of Full-Term and Preterm Infant Cry. J. Voice 2016, 30, 656–663.
- Chang, C.-Y.; Li, J.-J. Application of Deep Learning for Recognizing Infant Cries. In Proceedings of the 2016 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Nantou, Taiwan, 27–29 May 2016; pp. 1–2.
- Chaiwachiragompol, A.; Suwannata, N. The Study of Learning System for Infant Cry Classification Using Discrete Wavelet Transform and Extreme Machine Learning. Ingénierie Des. Systèmes D Inf. 2022, 27, 433–440.
- Vincent, P.M.; Srinivasan, K.; Chang, C.Y. Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models. Front. Public Health 2021, 9, 670352.
- Felipe, G.Z.; Aguiar, R.L.; Costa, Y.M.G.; Silla, C.N.; Brahnam, S.; Nanni, L.; McMurtrey, S. Identification of Infants’ Cry Motivation Using Spectrograms. In Proceedings of the 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia, 5–7 June 2019; pp. 181–186.
- Ji, C.; Basodi, S.; Xiao, X.; Pan, Y. Infant Sound Classification on Multi-stage CNNs with Hybrid Features and Prior Knowledge. In International Conference on AI and Mobile Services; Springer International Publishing: Cham, Switzerland, 2020; pp. 3–16.
- Ting, H.-N.; Choo, Y.-M.; Ahmad Kamar, A. Classification of Asphyxia Infant Cry Using Hybrid Speech Features and Deep Learning Models. Expert Syst. Appl. 2022, 208, 118064.
- Lahmiri, S.; Tadj, C.; Gargour, C.; Bekiros, S. Deep learning systems for automatic diagnosis of infant cry signals. Chaos Solitons Fractals 2022, 154, 111700.
- Li, Y.; Tagliasacchi, M.; Rybakov, O.; Ungureanu, V.; Roblek, D. Real-Time Speech Frequency Bandwidth Extension. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 691–695.
- Beaugeant, C.; Schonle, M.; Varga, I. Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Commun. Mag. 2006, 44, 98–104.
- Lie, W.-N.; Chang, L.-C. Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Trans. Multimed. 2006, 8, 46–59.
- Lu, L.; Liu, C.; Li, J.; Gong, Y. Exploring Transformers for Large-Scale Speech Recognition. arXiv 2020, arXiv:2005.09684.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165.
- Gong, Y.; Chung, Y.A.; Glass, J. AST: Audio Spectrogram Transformer. arXiv 2021, arXiv:2104.01778.
- Zhang, S.; Loweimi, E.; Bell, P.; Renals, S. On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. In Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China, 19–22 January 2021; pp. 89–96.
- Shih, Y.-J.; Wu, S.-L.; Zalkow, F.; Müller, M.; Yang, Y.-H. Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer. IEEE Trans. Multimed. 2023, 25, 3495–3508.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
- Gong, Y.; Lai, C.-I.; Chung, Y.-A.; Glass, J. SSAST: Self-Supervised Audio Spectrogram Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, Online, 22 February–1 March 2022; Volume 36, pp. 10699–10709.
- Baade, A.; Peng, P.; Harwath, D. MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. Interspeech 2022, 2022, 2438–2442.
- Gong, Y.; Khurana, S.; Rouditchenko, A.; Glass, J. CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification. arXiv 2022, arXiv:2203.06760.
Demographic Factors | Details |
---|---|
Gender | Female and male |
Babies’ ages | 1 to 53 days old |
Weight | 0.98 to 5.2 kg |
Origin | Canada, Haiti, Portugal, Syria, Lebanon, Algeria, Palestine, Bangladesh, and Turkey |
Race | Caucasian, Arabic, Asian, Latino, African, Native Hawaiian, and Quebec |
Hyperparameter | Best Value |
---|---|
Epoch | 6 |
Learning rate | 6 × 10⁻⁵ |
Learning rate scheduler | Linear |
Weight decay | 0.5% |
Batch size | 16 |
Loss function | Categorical Cross-Entropy |
Optimizer | AdamW |
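
The hyperparameters above can be collected into a small configuration object, and the "Linear" scheduler can be sketched as a plain function. This is a minimal illustration, not the authors' training code: the names `TrainConfig` and `linear_lr` are hypothetical, the weight decay of 0.5% is read here as 0.005, no warmup phase is assumed, and `steps_per_epoch` is a placeholder because the training-set size is not given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; field names are illustrative."""
    epochs: int = 6
    learning_rate: float = 6e-5
    weight_decay: float = 0.005  # assumption: "0.5%" read as 0.005
    batch_size: int = 16


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Decay the learning rate linearly from base_lr to 0 (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp at zero so steps past the end of training never go negative.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)


cfg = TrainConfig()
steps_per_epoch = 100  # hypothetical; depends on the training-set size
total_steps = cfg.epochs * steps_per_epoch

# Learning rate at the start, midpoint, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps, cfg.learning_rate))
```

In practice a deep learning framework's built-in scheduler would replace `linear_lr`; the function only makes the shape of the schedule explicit.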
Model | Learning Rate | Epochs | Scheduler | Weight Decay | Accuracy |
---|---|---|---|---|---|
1 | 6 × 10⁻⁵ | 6 | Linear | 0.5% | 98.69% |
2 | 8 × 10⁻⁵ | 5 | Linear | 0.5% | 95% |
3 | 1 × 10⁻⁵ | 5 | Linear | 0.5% | 93% |
4 | 5 × 10⁻⁴ | 5 | Constant | 0.5% | 87% |
Metric | Value |
---|---|
Accuracy | 98.69% |
Precision | 98.73% |
Recall | 98.71% |
F1 score | 98.71% |
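
For a multi-class task, the four measures in the table above are usually derived from a confusion matrix. The sketch below shows one common way to do so, assuming macro averaging (the averaging scheme is not stated in this section); the function name `macro_metrics` and the 3 × 3 matrix are illustrative and are not the paper's data.

```python
from typing import List, Tuple


def macro_metrics(confusion: List[List[int]]) -> Tuple[float, float, float, float]:
    """Return (accuracy, macro precision, macro recall, macro F1).

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro averaging: every class contributes equally, regardless of size.
    return (correct / total, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n)


# Toy 3-class confusion matrix (made-up counts, for illustration only).
toy = [
    [50, 0, 0],
    [1, 48, 1],
    [0, 2, 48],
]
accuracy, precision, recall, f1 = macro_metrics(toy)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With macro averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the macro precision and macro recall.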
Comparison | Model [8] * | Model [15] * | Model [25] * | Model [18] * | Proposed Model |
---|---|---|---|---|---|
Classes | 2 classes | 3 classes | 3 classes | 4 classes | 3 classes |
Audio features | GFCC, HR | GFCC, HR, spectrogram | Spectrogram | Linear Frequency Cepstral Coefficients (LFCCs) | Spectrogram |
ML algorithm | Multilayer perceptron | Fusion deep learning (CNN) | SVM + CNN | XGBoost | Transformer |
Accuracy | 95.92% | 97.50% | 92.5% | 92% | 98.69% |
Precision | 95% | 97.51% | 88.8% | - | 98.73% |
Recall | 95% | 97.53% | 89.3% | - | 98.71% |
F1 score | 95% | 97.52% | 88.9% | 92.3% | 98.71% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tami, M.; Masri, S.; Hasasneh, A.; Tadj, C. Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram. Information 2024, 15, 253. https://doi.org/10.3390/info15050253