Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Data Set Description
3.2. Harris Hawks Sparse Auto-Encoder Networks (HHSAE)-ASR Framework
3.2.1. Speech Signal Preprocessing and Denoising
3.2.2. Signal Decomposition and Feature Extraction
3.2.3. Speech Recognition
4. Results and Discussion
4.1. Experiment Setup
4.2. Objective Performance Evaluation
4.3. Subjective Performance Evaluation
4.4. Data Accessing in HHSAE-ASR
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Sparse Autoencoding
Appendix A.2. Model Fine-Tuning Using Harris Hawks Optimisation
References
- Jahangir, R.; Teh, Y.W.; Nweke, H.F.; Mujtaba, G.; Al-Garadi, M.A.; Ali, I. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Syst. Appl. 2021, 171, 114591.
- Alharbi, S.; Alrazgan, M.; Alrashed, A.; Alnomasi, T.; Almojel, R.; Alharbi, R.; Almojil, M. Automatic speech recognition: Systematic literature review. IEEE Access 2021, 9, 131858–131876.
- Harouni, M.; Rahim, M.; Al-Rodhaan, M.; Saba, T.; Rehman, A.; Al-Dhelaan, A. Online Persian/Arabic script classification without contextual information. Imaging Sci. J. 2014, 62, 437–448.
- Lung, J.W.J.; Salam, M.S.H.; Rehman, A.; Rahim, M.S.M.; Saba, T. Fuzzy phoneme classification using multi-speaker vocal tract length normalization. IETE Tech. Rev. 2014, 31, 128–136.
- Chiu, P.; Chang, J.; Lee, M.; Chen, C.; Lee, D. Enabling intelligent environment by the design of emotionally aware virtual assistant: A case of smart campus. IEEE Access 2020, 8, 62032–62041.
- Joudaki, S.; Mohamad, D.b.; Saba, T.; Rehman, A.; Al-Rodhaan, M.; Al-Dhelaan, A. Vision-based sign language classification: A directional review. IETE Tech. Rev. 2014, 31, 383–391.
- Delić, V.; Perić, Z.; Sečujski, M.; Jakovljević, N.; Nikolić, J.; Mišković, D.; Delić, T. Speech technology progress based on new machine learning paradigm. Comput. Intell. Neurosci. 2019.
- Nassif, A.B.; Shahin, I.; Attili, I.; Azzeh, M.; Shaalan, K. Speech recognition using deep neural networks: A systematic review. IEEE Access 2019, 7, 19143–19165.
- Awan, M.J.; Rahim, M.S.M.; Salim, N.; Rehman, A.; Nobanee, H.; Shabir, H. Improved Deep Convolutional Neural Network to Classify Osteoarthritis from Anterior Cruciate Ligament Tear Using Magnetic Resonance Imaging. J. Pers. Med. 2021, 11, 1163.
- Gnanamanickam, J.; Natarajan, Y.; Sri Preethaa, K.R. A hybrid speech enhancement algorithm for voice assistance application. Sensors 2021, 21, 7025.
- Jamal, A.; Hazim Alkawaz, M.; Rehman, A.; Saba, T. Retinal imaging analysis based on vessel detection. Microsc. Res. Tech. 2017, 80, 799–811.
- Awan, M.J.; Masood, O.A.; Mohammed, M.A.; Yasin, A.; Zain, A.M.; Damaševičius, R.; Abdulkareem, K.H. Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention. Electronics 2021, 10, 2444.
- Chen, Y.-Y. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm. Sensors 2018, 18, 1467.
- Ferooz, F.; Hassan, M.T.; Awan, M.J.; Nobanee, H.; Kamal, M.; Yasin, A.; Zain, A.M. Suicide Bomb Attack Identification and Analytics through Data Mining Techniques. Electronics 2021, 10, 2398.
- Neamah, K.; Mohamad, D.; Saba, T.; Rehman, A. Discriminative features mining for offline handwritten signature verification. 3D Research 2014, 5, 1–6.
- Hori, T.; Watanabe, S.; Zhang, Y.; Chan, W. Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM. arXiv 2017, arXiv:1706.02737.
- Pipiras, L.; Maskeliūnas, R.; Damaševičius, R. Lithuanian speech recognition using purely phonetic deep learning. Computers 2019, 8, 76.
- Awan, M.J.; Farooq, U.; Babar, H.M.A.; Yasin, A.; Nobanee, H.; Hussain, M.; Hakeem, O.; Zain, A.M. Real-Time DDoS Attack Detection System Using Big Data Approach. Sustainability 2021, 13, 10743.
- Akçay, M.B.; Oğuz, K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 2020, 116, 56–76.
- Li, Q.; Yang, Y.; Lan, T.; Zhu, H.; Wei, Q.; Qiao, F.; Yang, H. MSP-MFCC: Energy-efficient MFCC feature extraction method with mixed-signal processing architecture for wearable speech recognition applications. IEEE Access 2020, 8, 48720–48730.
- Haeb-Umbach, R.; Watanabe, S.; Nakatani, T.; Bacchiani, M.; Hoffmeister, B.; Seltzer, M.L.; Souden, M. Speech processing for digital home assistants: Combining signal processing with deep-learning techniques. IEEE Signal Processing Mag. 2019, 36, 111–124.
- Awan, M.J.; Bilal, M.H.; Yasin, A.; Nobanee, H.; Khan, N.S.; Zain, A.M. Detection of COVID-19 in Chest X-ray Images: A Big Data Enabled Deep Learning Approach. Int. J. Environ. Res. Public Health 2021, 18, 10147.
- Aftab, M.O.; Awan, M.J.; Khalid, S.; Javed, R.; Shabir, H. Executing Spark BigDL for Leukemia Detection from Microscopic Images using Transfer Learning. In Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 6–7 April 2021; pp. 216–220.
- Malik, M.; Malik, M.K.; Mehmood, K.; Makhdoom, I. Automatic speech recognition: A survey. Multimed. Tools Appl. 2021, 80, 9411–9457.
- Lokesh, S.; Malarvizhi Kumar, P.; Ramya Devi, M.; Parthasarathy, P.; Gokulnath, C. An Automatic Tamil Speech Recognition system by using Bidirectional Recurrent Neural Network with Self-Organizing Map. Neural Comput. Appl. 2019, 31, 1521–1531.
- Ismail, A.; Abdlerazek, S.; El-Henawy, I.M. Development of Smart Healthcare System Based on Speech Recognition Using Support Vector Machine and Dynamic Time Warping. Sustainability 2020, 12, 2403.
- Khan, M.A.; Sharif, M.; Akram, T.; Raza, M.; Saba, T.; Rehman, A. Hand-crafted and deep convolutional neural network features fusion and selection strategy: An application to intelligent human action recognition. Appl. Soft Comput. 2020, 87, 105986.
- Mao, H.H.; Li, S.; McAuley, J.; Cottrell, G. Speech recognition and multi-speaker diarization of long conversations. arXiv 2020, arXiv:2005.08072.
- Wani, T.M.; Gunawan, T.S.; Qadri, S.A.A.; Kartiwi, M.; Ambikairajah, E. A comprehensive review of speech emotion recognition systems. IEEE Access 2021, 9, 47795–47814.
- Koromilas, P.; Giannakopoulos, T. Deep multimodal emotion recognition on human speech: A review. Appl. Sci. 2021, 11, 7962.
- Hussain, M.; Javed, W.; Hakeem, O.; Yousafzai, A.; Younas, A.; Awan, M.J.; Nobanee, H.; Zain, A.M. Blockchain-Based IoT Devices in Supply Chain Management: A Systematic Literature Review. Sustainability 2021, 13, 13646.
- Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech Emotion Recognition Using Deep Learning Techniques: A Review. IEEE Access 2019, 7, 117327–117345.
- Fahad, M.S.; Deepak, A.; Pradhan, G.; Yadav, J. DNN-HMM-Based Speaker-Adaptive Emotion Recognition Using MFCC and Epoch-Based Features. Circuits Syst. Signal Process 2021, 40, 466–489.
- Zhao, J.; Mao, X.; Chen, L. Learning deep features to recognise speech emotion using merged deep CNN. IET Signal Processing 2018, 12, 713–721.
- Zhao, J.; Mao, X.; Chen, L. Speech emotion recognition using deep 1D 2D CNN LSTM networks. Biomed. Signal Processing Control 2019, 47, 312–323.
- Lee, W.; Seong, J.J.; Ozlu, B.; Shim, B.S.; Marakhimov, A.; Lee, S. Biosignal sensors and deep learning-based speech recognition: A review. Sensors 2021, 21, 1399.
- Awan, M.J.; Yasin, A.; Nobanee, H.; Ali, A.A.; Shahzad, Z.; Nabeel, M.; Zain, A.M.; Shahzad, H.M.F. Fake News Data Exploration and Analytics. Electronics 2021, 10, 2326.
- Bérubé, C.; Schachner, T.; Keller, R.; Fleisch, E.; Wangenheim, F.V.; Barata, F.; Kowatsch, T. Voice-based conversational agents for the prevention and management of chronic and mental health conditions: Systematic literature review. J. Med. Internet Res. 2021, 23, e25933.
- Połap, D.; Woźniak, M.; Damaševičius, R.; Maskeliūnas, R. Bio-inspired voice evaluation mechanism. Appl. Soft Comput. J. 2019, 80, 342–357.
- Mohammed, M.A.; Abdulkareem, K.H.; Mostafa, S.A.; Ghani, M.K.A.; Maashi, M.S.; Garcia-Zapirain, B.; Al-Dhief, F.T. Voice pathology detection and classification using convolutional neural network model. Appl. Sci. 2020, 10, 3723.
- Lauraitis, A.; Maskeliunas, R.; Damaševičius, R.; Krilavičius, T. Detection of speech impairments using cepstrum, auditory spectrogram and wavelet time scattering domain features. IEEE Access 2020, 8, 96162–96172.
- Lauraitis, A.; Maskeliūnas, R.; Damaševičius, R.; Krilavičius, T. A mobile application for smart computer-aided self-administered testing of cognition, speech, and motor impairment. Sensors 2020, 20, 3236.
- Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
- Meethongjan, K.; Dzulkifli, M.; Rehman, A.; Altameem, A.; Saba, T. An intelligent fused approach for face recognition. J. Intell. Syst. 2013, 22, 197–212.
- Javed Awan, M.; Shafry Mohd Rahim, M.; Nobanee, H.; Yasin, A.; Ibrahim Khalaf, O.; Ishfaq, U. A Big Data Approach to Black Friday Sales. Intell. Autom. Soft Comput. 2021, 27, 785–797.
- Awan, M.J.; Khan, R.A.; Nobanee, H.; Yasin, A.; Anwar, S.M.; Naseem, U.; Singh, V.P. A Recommendation Engine for Predicting Movie Ratings Using a Big Data Approach. Electronics 2021, 10, 1215.
- Awan, M.J.; Rahim, M.S.M.; Nobanee, H.; Munawar, A.; Yasin, A.; Zain, A.M. Social Media and Stock Market Prediction: A Big Data Approach. Comput. Mater. Contin. 2021, 67, 2569–2583.
- Haafza, L.A.; Awan, M.J.; Abid, A.; Yasin, A.; Nobanee, H.; Farooq, M.S. Big Data COVID-19 Systematic Literature Review: Pandemic Crisis. Electronics 2021, 10, 3125.
- Awan, M.J.; Gilani, S.A.H.; Ramzan, H.; Nobanee, H.; Yasin, A.; Zain, A.M.; Javed, R. Cricket Match Analytics Using the Big Data Approach. Electronics 2021, 10, 2350.
- O’Brien, M.G.; Derwing, T.M.; Cucchiarini, C.; Hardison, D.M.; Mixdorff, H.; Thomson, R.I.; Strik, H.; Levis, J.M.; Munro, M.J.; Foote, J.A. Directions for the future of technology in pronunciation research and teaching. J. Second Lang. Pronunciation 2018, 4, 182–207.
- Ramzan, F.; Khan, M.U.G.; Rehmat, A.; Iqbal, S.; Saba, T.; Rehman, A.; Mehmood, Z. A deep learning approach for automated diagnosis and multi-class classification of Alzheimer’s disease stages using resting-state fMRI and residual neural networks. J. Med. Syst. 2020, 44, 1–16.
- Ali, S.F.; Aslam, A.S.; Awan, M.J.; Yasin, A.; Damaševičius, R. Pose Estimation of Driver’s Head Panning Based on Interpolation and Motion Vectors under a Boosting Framework. Appl. Sci. 2021, 11, 11600.
- Elaziz, M.A.; Heidari, A.A.; Fujita, H.; Moayedi, H. A competitive chain-based harris hawks optimizer for global optimization and multi-level image thresholding problems. Appl. Soft Comput. J. 2020, 95, 106347.
- Mujahid, A.; Awan, M.J.; Yasin, A.; Mohammed, M.A.; Damaševičius, R.; Maskeliūnas, R.; Abdulkareem, K.H. Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model. Appl. Sci. 2021, 11, 4164.
- Awan, M.J.; Rahim, M.S.M.; Salim, N.; Mohammed, M.A.; Garcia-Zapirain, B.; Abdulkareem, K.H. Efficient Detection of Knee Anterior Cruciate Ligament from Magnetic Resonance Imaging Using Deep Learning Approach. Diagnostics 2021, 11, 105.
- Schwoebel, J. Jim-Schwoebel/Voice_Datasets: A Comprehensive List of Open-Source Datasets for Voice and Sound Computing (95+ Datasets). GitHub. Available online: https://github.com/jim-schwoebel/voice_datasets (accessed on 27 November 2021).
Methods | Mean Square Error (MSE) | Root Mean Square Error (RMSE) | Voiced/Unvoiced (VUV) Error |
---|---|---|---|
Multiobjective evolutionary optimization algorithm [12] | 1.65 | 1.43 | 1.28 |
Deep convolution encoder and LSTM-RNN [14] | 1.53 | 1.38 | 1.26 |
Continual learning algorithms [15] | 1.427 | 1.25 | 1.17 |
Genetic algorithm [18] | 1.36 | 1.14 | 1.15 |
MFCC and DTW [20] | 1.21 | 1.12 | 1.10 |
HHSAE-ASR | 1.11 | 1.087 | 1.01 |
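The error measures named in the table header above can be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the inputs are assumed to be equal-length per-frame sequences, and the V/UV error is assumed here to be the fraction of frames whose voiced/unvoiced decision differs from the reference (the scale used in the table is not specified in this section).

```python
import math

def mse(reference, estimate):
    """Mean of squared differences between two equal-length sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def rmse(reference, estimate):
    """Square root of the mean square error."""
    return math.sqrt(mse(reference, estimate))

def vuv_error(reference_voiced, estimated_voiced):
    """Assumed definition: share of frames with a wrong voiced/unvoiced flag."""
    if len(reference_voiced) != len(estimated_voiced) or len(reference_voiced) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    wrong = sum(1 for r, e in zip(reference_voiced, estimated_voiced) if r != e)
    return wrong / len(reference_voiced)
```

Under these definitions RMSE is always the square root of MSE on the same data, so the two columns are only directly comparable when both are computed over the same signals and scale.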
Participants | Precision (%) | Recall (%) | Matthews Correlation Coefficient (MCC) (%) | F-Measure (%) |
---|---|---|---|---|
100 | 99.53 | 99.02 | 99.21 | 99.24 |
150 | 99.24 | 99.35 | 99.5 | 99.35 |
200 | 98.56 | 99.21 | 99.23 | 99.23 |
250 | 99.13 | 99.46 | 99.1 | 99.56 |
300 | 99.56 | 99.45 | 99.43 | 99.22 |
350 | 99.25 | 99.13 | 99.00 | 99.26 |
400 | 99.22 | 99.54 | 99.23 | 99.23 |
450 | 99.21 | 99.24 | 99.18 | 99.35 |
500 | 99.56 | 99.56 | 99.34 | 99.03 |
550 | 99.13 | 99.12 | 99.39 | 99.13 |
600 | 99.11 | 99.02 | 99.12 | 99.23 |
650 | 99.09 | 99.23 | 99.02 | 99.3 |
700 | 99.15 | 99.76 | 98.92 | 99.22 |
750 | 99.25 | 98.37 | 99.032 | 98.77 |
800 | 99.35 | 99.02 | 99.21 | 98.3 |
850 | 99.53 | 98.98 | 99.34 | 99.45 |
900 | 99.21 | 99.12 | 99.10 | 99.56 |
950 | 99.02 | 99.3 | 99.3 | 99.1 |
1000 | 99.13 | 99.33 | 99.42 | 98.98 |
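For reference, the four measures in the table above follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the returned dictionary keys are hypothetical, and it is not taken from the authors' implementation. Values are returned as fractions, so multiplying by 100 gives the percentage form used in the table (an assumption about how the table was scaled).

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and MCC from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives,
    false negatives and true negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Matthews correlation coefficient, defined as 0 when a margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

Unlike the other three measures, MCC ranges from -1 to 1 and uses all four counts, which makes it less sensitive to class imbalance than precision or recall alone.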
Number of Participants | MFCC (%) | MSE (%) | HHSAE-ASR (%) |
---|---|---|---|
10 | 47.2 | 50.4 | 71.4 |
20 | 48.4 | 53.6 | 74.7 |
30 | 49.5 | 56.8 | 76.3 |
40 | 52.7 | 58.5 | 77.7 |
50 | 54.8 | 60.7 | 81.9 |
60 | 55.9 | 62.2 | 83.1 |
70 | 59.4 | 61.3 | 85.2 |
80 | 60.6 | 65.6 | 88.5 |
90 | 63.8 | 68.8 | 90.6 |
100 | 66.2 | 69.2 | 91.8 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ali, M.H.; Jaber, M.M.; Abd, S.K.; Rehman, A.; Awan, M.J.; Vitkutė-Adžgauskienė, D.; Damaševičius, R.; Bahaj, S.A. Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System. Appl. Sci. 2022, 12, 1091. https://doi.org/10.3390/app12031091