A Wearable Assistant Device for the Hearing Impaired to Recognize Emergency Vehicle Sirens with Edge Computing
Abstract
1. Introduction
2. Materials and Methods
2.1. Features
2.2. EfficientNet-based Model
2.3. Fuzzy Rank-Based Model
2.4. Wearable Assistant Device
2.5. Performances
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Stage | Operator | Resolution | #Channels | #Layers |
|---|---|---|---|---|
| 1 | Conv 3×3 | 224×192 | 32 | 1 |
| 2 | MBConv1, k3×3 | 112×96 | 16 | 1 |
| 3 | MBConv6, k3×3 | 112×96 | 24 | 2 |
| 4 | MBConv6, k5×5 | 56×48 | 40 | 2 |
| 5 | MBConv6, k3×3 | 28×24 | 80 | 3 |
| 6 | MBConv6, k5×5 | 28×24 | 112 | 3 |
| 7 | MBConv6, k5×5 | 14×12 | 192 | 4 |
| 8 | MBConv6, k3×3 | 7×6 | 320 | 1 |
| 9 | Conv 1×1 & Flatten & FC | 7×6 | 1280 | 1 |
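The stage listing follows the EfficientNet-B0 backbone, scaled here to a 224×192 input with a flatten-and-fully-connected head. For orientation only, the following is a minimal sketch of an equivalent model in TensorFlow/Keras; the framework, the ImageNet weights, and the seven-class softmax head are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal sketch (assumptions: TensorFlow/Keras, ImageNet pretraining, 7 output
# classes). Stages 1-8 and the final 1x1 conv to 1280 channels come from the
# EfficientNet-B0 backbone; Flatten plus a dense layer form stage 9 of the table.
import tensorflow as tf

def build_classifier(input_shape=(224, 192, 3), num_classes=7):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False,      # keep the backbone, drop the ImageNet 1000-class head
        weights="imagenet",     # transfer learning (assumption)
        input_shape=input_shape,
    )
    x = tf.keras.layers.Flatten()(backbone.output)                          # stage 9: Flatten
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)   # stage 9: FC head
    return tf.keras.Model(backbone.input, outputs)

model = build_classifier()
model.summary()
```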
| Hyperparameter | Selected Value |
|---|---|
| Loss function | |
| Optimizer | Adam |
| Learning rate | 1 |
| Batch size | 16 |
| Epoch | 1000 |
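Continuing the sketch above, the listed optimizer, batch size, and epoch count plug into training roughly as follows. The loss function and the full learning-rate value are not recoverable from the table, so the cross-entropy loss and the 1 × 10⁻³ rate below are placeholders only; `x_train` and `y_train` are assumed spectrogram arrays with one-hot labels.

```python
# Training sketch continuing the model above. Only "Adam", batch size 16, and
# 1000 epochs come from the hyperparameter table; the loss and learning rate are
# placeholders, and x_train / y_train are assumed NumPy arrays of spectrograms
# and one-hot labels.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # placeholder rate
    loss="categorical_crossentropy",                          # placeholder loss
    metrics=["accuracy"],
)

history = model.fit(
    x_train, y_train,
    batch_size=16,          # from the table
    epochs=1000,            # from the table
    validation_split=0.2,   # assumption
)
```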
| Actual Class | Estimated Class: Positive | Estimated Class: Negative |
|---|---|---|
| Positive | TP (True Positive) | FN (False Negative) |
| Negative | FP (False Positive) | TN (True Negative) |
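From these four counts, the standard performance measures follow the usual definitions (which subset the paper reports in Section 2.5 is not shown here):

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision}   &= \frac{TP}{TP + FP}, \\
\text{Recall}      &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
F_1                &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```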
| | Neutral | Anger | Fear | Happy | Car Horns | Sirens | Ambulance Siren |
|---|---|---|---|---|---|---|---|
| Neutral | 2590 | 10 | 5 | 4 | 0 | 0 | 0 |
| Anger | 10 | 2588 | 14 | 5 | 5 | 0 | 0 |
| Fear | 6 | 11 | 2594 | 3 | 0 | 2 | 0 |
| Happy | 8 | 6 | 2 | 2598 | 0 | 0 | 0 |
| Car Horns | 0 | 0 | 0 | 0 | 1094 | 17 | 25 |
| Sirens | 0 | 0 | 0 | 0 | 13 | 1105 | 35 |
| Ambulance Siren | 0 | 0 | 1 | 0 | 26 | 21 | 1096 |
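As a worked example of the definitions above, and assuming (as in the TP/FN/FP/TN table) that rows are actual classes and columns are estimated classes, the Neutral row and column of this matrix give:

```latex
\text{Recall}_{\text{Neutral}}    = \frac{2590}{2590 + 10 + 5 + 4} = \frac{2590}{2609} \approx 99.3\%, \qquad
\text{Precision}_{\text{Neutral}} = \frac{2590}{2590 + 10 + 6 + 8} = \frac{2590}{2614} \approx 99.1\%.
```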
| | Neutral | Anger | Fear | Happy | Car Horns | Sirens | Ambulance Siren |
|---|---|---|---|---|---|---|---|
| Neutral | 2538 | 20 | 10 | 12 | 0 | 0 | 0 |
| Anger | 35 | 2549 | 33 | 20 | 0 | 0 | 0 |
| Fear | 24 | 27 | 2568 | 7 | 0 | 0 | 0 |
| Happy | 17 | 19 | 3 | 2571 | 0 | 0 | 0 |
| Car Horns | 0 | 0 | 0 | 0 | 1079 | 66 | 46 |
| Sirens | 0 | 0 | 0 | 0 | 18 | 1067 | 28 |
| Ambulance Siren | 0 | 0 | 0 | 0 | 39 | 12 | 1082 |
| | Neutral | Anger | Fear | Happy | Car Horns | Sirens | Ambulance Siren |
|---|---|---|---|---|---|---|---|
| Neutral | 1927 | 215 | 192 | 794 | 0 | 0 | 0 |
| Anger | 143 | 1589 | 776 | 118 | 0 | 0 | 0 |
| Fear | 230 | 743 | 1521 | 101 | 0 | 0 | 0 |
| Happy | 314 | 68 | 127 | 1597 | 0 | 0 | 0 |
| Car Horns | 0 | 0 | 0 | 0 | 1065 | 23 | 16 |
| Sirens | 0 | 0 | 0 | 0 | 26 | 1045 | 43 |
| Ambulance Siren | 0 | 0 | 1 | 0 | 47 | 77 | 1097 |
| | Neutral | Anger | Fear | Happy | Car Horns | Sirens | Ambulance Siren |
|---|---|---|---|---|---|---|---|
| Neutral | 1272 | 14 | 12 | 11 | 8 | 12 | 10 |
| Anger | 12 | 1274 | 13 | 23 | 10 | 8 | 12 |
| Fear | 23 | 8 | 1268 | 12 | 5 | 10 | 11 |
| Happy | 9 | 10 | 4 | 1267 | 7 | 3 | 5 |
| Car Horns | 15 | 20 | 10 | 8 | 1084 | 14 | 6 |
| Sirens | 7 | 2 | 18 | 9 | 13 | 1079 | 7 |
| Ambulance Siren | 10 | 12 | 9 | 14 | 9 | 5 | 1095 |