CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network
Abstract
1. Introduction
- We proposed a novel one-dimensional (1D) architecture for SER built on ConvLSTM layers, which capture the spatial and semantic correlations among speech segments. We adopted an input-to-state and state-to-state transition strategy with hierarchical connections that learn local spatial cues through the convolution operation (the standard ConvLSTM cell formulation that these blocks adapt is sketched after this list). Our model extracts hierarchical spatiotemporal emotional cues through several ConvLSTM blocks, called local feature learning blocks (LFLBs). To the best of our knowledge, this is among the most recent applications of AI and deep learning to the SER domain.
- Sequence learning is essential for adjusting the weights of the relevant global features so that the correlations and long-term contextual dependencies in the input features can be captured, which improves both feature extraction and recognition performance. We therefore adaptively coupled a GRU network with the ConvLSTM to compute the global feature weights, which are re-adjusted from both the local and the global features (a minimal end-to-end sketch of this pipeline is given after this list). To the best of our knowledge, this is the first time a GRU network has been seamlessly integrated with a ConvLSTM mechanism for SER on raw speech signals.
- We introduced a center loss function, which learns the center of the feature vectors of each class and penalizes the distance between the features and their corresponding class center. Using the center loss together with the softmax loss improves the recognition results (the joint loss formulation is sketched after this list). We showed that the center loss is easily optimized and increases the performance of the deep learning model. To the best of our knowledge, this is the first time the center loss has been used with a ConvLSTM in the SER domain.
- We tested our system on two standard corpora, the IEMOCAP [18] and RAVDESS [19] emotional speech corpora, to evaluate the effectiveness and significance of the model. We achieved recognition rates of 75% on IEMOCAP and 80% on RAVDESS. The experimental results indicate the robustness and simplicity of the proposed system and show its applicability to real-world applications.
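For reference, the ConvLSTM cell that the LFLBs build on follows the standard formulation below, with the convolutions taken in one dimension over the waveform segments in our case; the exact kernel sizes and filter counts used in this paper are given in Section 3.4. Here $*$ denotes the convolution operator, $\circ$ the Hadamard product, $X_t$ the input segment, $H_t$ the hidden state, and $C_t$ the cell state.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right)\\
f_t &= \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right)\\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\!\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right)\\
o_t &= \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right)\\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```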
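To make the overall pipeline concrete, the following is a minimal sketch of how the LFLBs and the GRU-based global feature learning block (GFLB) can be stacked, written in TensorFlow/Keras (the `layers.ConvLSTM1D` layer assumed here requires TensorFlow ≥ 2.6). The segment length, number of segments, kernel sizes, strides, filter counts, and GRU width are illustrative placeholders, not the configuration reported in Section 3.4.

```python
from tensorflow.keras import layers, models

def build_clstm(segment_len=16000, n_segments=8, n_classes=4):
    # Each utterance is split into n_segments raw-waveform segments.
    inputs = layers.Input(shape=(n_segments, segment_len, 1))

    # Local feature learning blocks (LFLBs): 1D ConvLSTM layers whose
    # input-to-state and state-to-state transitions are convolutions.
    x = layers.ConvLSTM1D(16, kernel_size=9, strides=4, return_sequences=True)(inputs)
    x = layers.ConvLSTM1D(32, kernel_size=5, strides=4, return_sequences=True)(x)
    x = layers.ConvLSTM1D(64, kernel_size=3, strides=2, return_sequences=True)(x)

    # Collapse the remaining sample axis so each segment yields one feature vector.
    x = layers.TimeDistributed(layers.GlobalAveragePooling1D())(x)

    # Global feature learning block (GFLB): bidirectional GRUs that re-weigh the
    # segment-level features and capture long-term contextual dependencies.
    x = layers.Bidirectional(layers.GRU(128))(x)

    # Embedding used by the joint softmax + center loss (see the loss sketch below).
    features = layers.Dense(64, activation="relu", name="embedding")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(features)
    return models.Model(inputs, outputs)

model = build_clstm()
model.summary()
```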
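The joint objective referred to in the third contribution is assumed here to follow the widely used center-loss formulation: the softmax loss $L_S$ is combined with a center loss $L_C$ that pulls each deep feature $x_i$ toward the center $c_{y_i}$ of its class $y_i$, balanced by a weight $\lambda$:

```latex
L_C = \frac{1}{2}\sum_{i=1}^{m}\left\lVert x_i - c_{y_i}\right\rVert_2^{2},
\qquad
L = L_S + \lambda L_C
  = -\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{\top}x_i + b_{y_i}}}
                            {\sum_{j=1}^{n}e^{W_{j}^{\top}x_i + b_{j}}}
  + \frac{\lambda}{2}\sum_{i=1}^{m}\left\lVert x_i - c_{y_i}\right\rVert_2^{2}
```

During training, the class centers $c_j$ are updated per mini-batch from the features assigned to class $j$, so the loss remains easy to optimize alongside the softmax term.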
2. Related Works
3. The Proposed SER Framework
3.1. ConvLSTM in the SER Model
3.2. The Global Feature Learning Block (GFLB)
3.3. Center Loss Function
3.4. Our Model Configuration
4. Experimental Evaluation and Discussion
4.1. Results on the IEMOCAP (Interactive Emotional Dyadic Motion Capture) Corpus
4.2. Results on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) Corpus
5. Conclusions and Future Direction
Author Contributions
Funding
Conflicts of Interest
References
- Kim, J.-Y.; Cho, S.-B. Towards Repayment Prediction in Peer-to-Peer Social Lending Using Deep Learning. Mathematics 2019, 7, 1041. [Google Scholar] [CrossRef] [Green Version]
- Sajjad, M.; Kwon, S. Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM. IEEE Access 2020, 8, 79861–79875. [Google Scholar]
- Lin, Y.-C.; Wang, Y.-C.; Chen, T.-C.T.; Lin, H.-F. Evaluating the suitability of a smart technology application for fall detection using a fuzzy collaborative intelligence approach. Mathematics 2019, 7, 1097. [Google Scholar] [CrossRef] [Green Version]
- Kwon, S. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition. Sensors 2020, 20, 183. [Google Scholar]
- Baioletti, M.; Di Bari, G.; Milani, A.; Poggioni, V. Differential Evolution for Neural Networks Optimization. Mathematics 2020, 8, 69. [Google Scholar] [CrossRef] [Green Version]
- Anvarjon, T.; Kwon, S. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features. Sensors 2020, 20, 5212. [Google Scholar] [CrossRef]
- Das Antar, A.; Ahmed, M.; Ahad, A.R. Challenges in Sensor-based Human Activity Recognition and a Comparative Analysis of Benchmark Datasets: A Review. In Proceedings of the 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Spokane, WA, USA, 30 May–2 June 2019; pp. 134–139. [Google Scholar]
- Khalil, R.A.; Jones, E.; Babar, M.I.; Jan, T.; Zafar, M.H.; Alhussain, T. Speech Emotion Recognition Using Deep Learning Techniques: A Review. IEEE Access 2019, 7, 117327–117345. [Google Scholar] [CrossRef]
- Pandey, S.K.; Shekhawat, H.S.; Prasanna, S.R.M. Deep Learning Techniques for Speech Emotion Recognition: A Review. In Proceedings of the 2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), Pardubice, Czech Republic, 16–18 April 2019; pp. 1–6. [Google Scholar]
- Ji, S.; Kim, J.; Im, H. A Comparative Study of Bitcoin Price Prediction Using Deep Learning. Mathematics 2019, 7, 898. [Google Scholar] [CrossRef] [Green Version]
- Khan, N.; Ullah, A.; Haq, I.U.; Menon, V.G.; Baik, S.W. SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network. J. Real Time Image Process. 2020, 1–15. [Google Scholar] [CrossRef]
- Jara-Vera, V.; Sánchez-Ávila, C. Cryptobiometrics for the Generation of Cancellable Symmetric and Asymmetric Ciphers with Perfect Secrecy. Mathematics 2020, 8, 1536. [Google Scholar] [CrossRef]
- El Ayadi, M.; Kamel, M.S.; Karray, F. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognit. 2011, 44, 572–587. [Google Scholar] [CrossRef]
- Zhu, L.; Chen, L.; Zhao, D.; Zhou, J.; Zhang, W. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN. Sensors 2017, 17, 1694. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimed. Tools Appl. 2020, 1–17. [Google Scholar] [CrossRef]
- Zhang, J.; Jiang, X.; Chen, X.; Li, X.; Guo, N.; Cui, L. Wind Power Generation Prediction Based on LSTM. In Proceedings of the 2019 4th International Conference on Mathematics and Artificial Intelligence—ICMAI 2019, Chengdu, China, 12–15 April 2019; pp. 85–89. [Google Scholar]
- Kurpukdee, N.; Koriyama, T.; Kobayashi, T.; Kasuriya, S.; Wutiwiwatchai, C.; Lamsrichan, P. Speech emotion recognition using convolutional long short-term memory neural network and support vector machines. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1744–1749. [Google Scholar]
- Busso, C.; Bulut, M.; Lee, C.-C.; Kazemzadeh, A.; Mower, E.; Kim, S.; Chang, J.N.; Lee, S.; Narayanan, S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 2008, 42, 335–359. [Google Scholar] [CrossRef]
- Livingstone, S.R.; Russo, F.A. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 2018, 13, e0196391. [Google Scholar] [CrossRef] [Green Version]
- Ma, X.; Wu, Z.; Jia, J.; Xu, M.; Meng, H.; Cai, L. Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms. In Proceedings of the INTERSPEECH 2018, Hyderabad, India, 2–6 September 2018. [Google Scholar] [CrossRef] [Green Version]
- Liu, B.; Qin, H.; Gong, Y.; Ge, W.; Xia, M.; Shi, L. EERA-ASR: An Energy-Efficient Reconfigurable Architecture for Automatic Speech Recognition With Hybrid DNN and Approximate Computing. IEEE Access 2018, 6, 52227–52237. [Google Scholar] [CrossRef]
- Zhao, J.; Mao, X.; Chen, L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed. Signal Process. Control. 2019, 47, 312–323. [Google Scholar] [CrossRef]
- Yu, Y.; Kim, Y.-J. Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database. Electronics 2020, 9, 713. [Google Scholar] [CrossRef]
- Eyben, F.; Scherer, K.R.; Schuller, B.; Sundberg, J.; Andre, E.; Busso, C.; Devillers, L.Y.; Epps, J.; Laukka, P.; Narayanan, S.S.; et al. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing. IEEE Trans. Affect. Comput. 2016, 7, 190–202. [Google Scholar] [CrossRef] [Green Version]
- Triantafyllopoulos, A.; Keren, G.; Wagner, J.; Steiner, I.; Schuller, B.W. Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement. In Proceedings of the INTERSPEECH 2019, Graz, Austria, 15–19 September 2019. [Google Scholar] [CrossRef] [Green Version]
- Schuller, B.; Steidl, S.; Batliner, A.; Hirschberg, J.; Burgoon, J.K.; Baird, A.; Elkins, A.; Zhang, Y.; Coutinho, E.; Evanini, K. The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity and Native Language. In Proceedings of the Interspeech 2016, San Francisco, CA, USA, 8–12 September 2016; Volumes 1–5, pp. 2001–2005. [Google Scholar]
- Burkhardt, F.; Paeschke, A.; Rolfes, M.; Sendlmeier, W.F.; Weiss, B. A database of German emotional speech. In Proceedings of the 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 4–8 September 2005; pp. 1517–1520. [Google Scholar]
- Lim, W.; Jang, D.; Lee, T. Speech emotion recognition using convolutional and Recurrent Neural Networks. In Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Jeju, Korea, 13–16 December 2016; pp. 1–4. [Google Scholar]
- Badshah, A.M.; Rahim, N.; Ullah, N.; Ahmad, J.; Muhammad, K.; Lee, M.Y.; Kwon, S.; Baik, S.W. Deep features-based speech emotion recognition for smart affective services. Multimed. Tools Appl. 2019, 78, 5571–5589. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Osia, S.A.; Shamsabadi, A.S.; Sajadmanesh, S.; Taheri, A.; Katevas, K.; Rabiee, H.R.; Lane, N.D.; Haddadi, H. A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics. IEEE Internet Things J. 2020, 7, 4505–4518. [Google Scholar] [CrossRef] [Green Version]
- Carta, S.M.; Corriga, A.; Ferreira, A.; Podda, A.S.; Recupero, D.R. A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning. Appl. Intell. 2020, 1–17. [Google Scholar] [CrossRef]
- Carta, S.M.; Ferreira, A.; Podda, A.S.; Recupero, D.R.; Sanna, A. Multi-DQN: An ensemble of Deep Q-learning agents for stock market forecasting. Expert Syst. Appl. 2020, 164, 113820. [Google Scholar] [CrossRef]
- Chatziagapi, A.; Paraskevopoulos, G.; Sgouropoulos, D.; Pantazopoulos, G.; Nikandrou, M.; Giannakopoulos, T.; Katsamanis, A.; Potamianos, A.; Narayanan, S. Data Augmentation Using GANs for Speech Emotion Recognition. In Proceedings of the INTERSPEECH 2019, Graz, Austria, 15–19 September 2019. [Google Scholar] [CrossRef] [Green Version]
- Bao, F.; Neumann, M.; Vu, N.T. CycleGAN-based emotion style transfer as data augmentation for speech emotion recognition. In Proceedings of the INTERSPEECH 2019, Graz, Austria, 15–19 September 2019; pp. 35–37. [Google Scholar]
- Fahad, M.; Yadav, J.; Pradhan, G.; Deepak, A. DNN-HMM based Speaker Adaptive Emotion Recognition using Proposed Epoch and MFCC Features. arXiv 2018, arXiv:1806.00984. [Google Scholar]
- Kourbatov, A.; Wolf, M. Predicting maximal gaps in sets of primes. Mathematics 2019, 7, 400. [Google Scholar] [CrossRef] [Green Version]
- Demircan, S.; Kahramanli, H. Application of fuzzy C-means clustering algorithm to spectral features for emotion classification from speech. Neural Comput. Appl. 2018, 29, 59–66. [Google Scholar] [CrossRef]
- Wu, X.; Liu, S.; Cao, Y.; Li, X.; Yu, J.; Dai, D.; Ma, X.; Hu, S.; Wu, Z.; Liu, X.; et al. Speech Emotion Recognition Using Capsule Networks. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 6695–6699. [Google Scholar]
- Jukic, S.; Saračević, M.; Subasi, A.; Kevric, J. Comparison of Ensemble Machine Learning Methods for Automated Classification of Focal and Non-Focal Epileptic EEG Signals. Mathematics 2020, 8, 1481. [Google Scholar] [CrossRef]
- Ahmad, J.; Sajjad, M.; Rho, S.; Kwon, S.; Lee, M.Y.; Baik, S.W. Determining speaker attributes from stress-affected speech in emergency situations with hybrid SVM-DNN architecture. Multimed. Tools Appl. 2016, 77, 4883–4907. [Google Scholar] [CrossRef]
- Shegokar, P.; Sircar, P. Continuous wavelet transform based speech emotion recognition. In Proceedings of the 2016 10th International Conference on Signal Processing and Communication Systems (ICSPCS), Gold Coast, Australia, 19–21 December 2016; pp. 1–8. [Google Scholar]
- Li, Y.; Zhao, T.; Kawahara, T. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning. In Proceedings of the INTERSPEECH 2019, Graz, Austria, 15–19 September 2019. [Google Scholar] [CrossRef]
- Zeng, Y.; Mao, H.; Peng, D.; Yi, Z. Spectrogram based multi-task audio classification. Multimed. Tools Appl. 2019, 78, 3705–3722. [Google Scholar] [CrossRef]
- Popova, A.S.; Rassadin, A.G.; Ponomarenko, A.A. Emotion Recognition in Sound. In Proceedings of the International Conference on Neuroinformatics, Moscow, Russia, 2–6 October 2017; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2017; Volume 736, pp. 117–124. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Zapata-Impata, B.S.; Gil, P.; Torres, F. Learning Spatio Temporal Tactile Features with a ConvLSTM for the Direction Of Slip Detection. Sensors 2019, 19, 523. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G.W. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2627–2633. [Google Scholar]
- Chen, M.; He, X.; Yang, J.; Zhang, H. 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition. IEEE Signal Process. Lett. 2018, 25, 1440–1444. [Google Scholar] [CrossRef]
- Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146. [Google Scholar] [CrossRef]
- Fayek, H.M.; Lech, M.; Cavedon, L. Evaluating deep learning architectures for Speech Emotion Recognition. Neural Netw. 2017, 92, 60–68. [Google Scholar] [CrossRef]
- Guo, L.; Wang, L.; Dang, J.; Liu, Z.; Guan, H. Exploration of Complementary Features for Speech Emotion Recognition Based on Kernel Extreme Learning Machine. IEEE Access 2019, 7, 75798–75809. [Google Scholar] [CrossRef]
- Zheng, W.Q.; Yu, J.S.; Zou, Y.X. An experimental study of speech emotion recognition based on deep convolutional neural networks. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi’an, China, 21–24 September 2015; pp. 827–831. [Google Scholar]
- Han, K.; Yu, D.; Tashev, I. Speech emotion recognition using deep neural network and extreme learning machine. In Proceedings of the Fifteenth Annual Conference of The International Speech Communication Association, Singapore, 14–18 September 2014. [Google Scholar]
- Meng, H.; Yan, T.; Yuan, F.; Wei, H. Speech Emotion Recognition From 3D Log-Mel Spectrograms With Deep Learning Network. IEEE Access 2019, 7, 125868–125881. [Google Scholar] [CrossRef]
- Zhao, Z.; Bao, Z.; Zhao, Y.; Zhang, Z.; Cummins, N.; Ren, Z.; Schuller, B. Exploring Deep Spectrum Representations via Attention-Based Recurrent and Convolutional Neural Networks for Speech Emotion Recognition. IEEE Access 2019, 7, 97515–97525. [Google Scholar] [CrossRef]
- Luo, D.; Zou, Y.; Huang, D. Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition. In Proceedings of the Interspeech 2018, Hyderabad, India, 2–6 September 2018. [Google Scholar]
- Jiang, S.; Zhou, P.; Li, Z.; Li, M. Memento: An Emotion Driven Lifelogging System with Wearables. In Proceedings of the 2017 26th International Conference on Computer Communication and Networks (ICCCN), Vancouver, BC, Canada, 31 July–3 August 2017; Volume 15, pp. 1–9. [Google Scholar] [CrossRef]
- Issa, D.; Demirci, M.F.; Yazici, A. Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control. 2020, 59, 101894. [Google Scholar] [CrossRef]
- Mustaqeem; Kwon, S. MLT-DNet: Speech Emotion Recognition Using 1D Dilated CNN Based on Multi-Learning Trick Approach. Expert Syst. Appl. 2020, 114177. [Google Scholar] [CrossRef]
- Jalal, A.; Loweimi, E.; Moore, R.K.; Hain, T. Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In Proceedings of the INTERSPEECH 2019, Graz, Austria, 15–19 September 2019; pp. 1701–1705. [Google Scholar] [CrossRef] [Green Version]
- Bhavan, A.; Chauhan, P.; Shah, R.R. Bagged support vector machines for emotion recognition from speech. Knowl. Based Syst. 2019, 184, 104886. [Google Scholar] [CrossRef]
- Zamil, A.A.A.; Hasan, S.; Baki, S.M.J.; Adam, J.M.; Zaman, I. Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames. In Proceedings of the 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 10–12 January 2019; pp. 281–285. [Google Scholar]
- Khan, Z.A.; Hussain, T.; Ullah, A.; Rho, S.; Lee, M.; Baik, S.W. Towards Efficient Electricity Forecasting in Residential and Commercial Buildings: A Novel Hybrid CNN with a LSTM-AE based Framework. Sensors 2020, 20, 1399. [Google Scholar] [CrossRef] [PubMed] [Green Version]
| Input | Deep Learning Architecture | IEMOCAP | RAVDESS |
|---|---|---|---|
| Audio clip or raw waveform | ConvLSTM + FCN + Softmax | 65.18% | 69.21% |
| | ConvLSTM + Skip connection + FCN + Softmax | 66.33% | 69.67% |
| | ConvLSTM + BiLSTM + Softmax | 65.44% | 69.37% |
| | ConvLSTM + Skip connection + BiLSTM + Softmax | 67.01% | 73.29% |
| | ConvLSTM + BiGRU + Softmax | 69.26% | 76.67% |
| | ConvLSTM + Skip connection + BiGRU + Softmax | 73.86% | 78.66% |
| | ConvLSTM + Skip connection + BiGRU + Center loss | 75.03% | 80.05% |
| Emotion/Class | Whole Utterances | Proportion |
|---|---|---|
| Anger | 1103 | 19.94% |
| Sadness | 1084 | 19.60% |
| Happy | 1636 | 29.58% |
| Neutral | 1708 | 30.88% |
| Emotion | Precision | Recall | F1 Score |
|---|---|---|---|
| Anger | 0.70 | 0.83 | 0.76 |
| Happiness | 0.51 | 0.61 | 0.56 |
| Neutral | 0.96 | 0.79 | 0.87 |
| Sadness | 0.81 | 0.81 | 0.81 |
| Weighted average | 0.77 | 0.78 | 0.77 |
| Unweighted average | 0.76 | 0.74 | 0.75 |
| Accuracy | | | 78% |
| Dataset | Reference | Year | Unweighted Accuracy |
|---|---|---|---|
| IEMOCAP | [22] | 2019 | 52.14% |
| IEMOCAP | [52] | 2017 | 64.78% |
| IEMOCAP | [53] | 2019 | 57.10% |
| IEMOCAP | [54] | 2015 | 40.02% |
| IEMOCAP | [55] | 2014 | 51.24% |
| IEMOCAP | [56] | 2019 | 69.32% |
| IEMOCAP | [57] | 2019 | 66.50% |
| IEMOCAP | [58] | 2018 | 63.98% |
| IEMOCAP | [59] | 2019 | 61.60% |
| IEMOCAP | [50] | 2018 | 64.74% |
| IEMOCAP | [60] | 2020 | 64.03% |
| IEMOCAP | [2] | 2020 | 71.25% |
| IEMOCAP | [61] | 2020 | 73.09% |
| IEMOCAP | Proposed | 2020 | 75.00% |
| Emotion/Class | Whole Utterances | Proportion |
|---|---|---|
| Anger | 192 | 13.33% |
| Fear | 192 | 13.33% |
| Happy | 192 | 13.33% |
| Sad | 192 | 13.33% |
| Neutral | 96 | 6.67% |
| Surprise | 192 | 13.33% |
| Calm | 192 | 13.33% |
| Disgust | 192 | 13.33% |
| Emotion | Precision | Recall | F1 Score |
|---|---|---|---|
| Anger | 0.84 | 0.82 | 0.83 |
| Calm | 0.92 | 0.85 | 0.88 |
| Disgust | 0.86 | 0.79 | 0.83 |
| Fearful | 0.83 | 0.87 | 0.85 |
| Happy | 0.88 | 0.76 | 0.81 |
| Neutral | 0.53 | 0.80 | 0.64 |
| Sad | 0.72 | 0.81 | 0.76 |
| Surprise | 0.84 | 0.78 | 0.81 |
| Weighted average | 0.83 | 0.82 | 0.81 |
| Unweighted average | 0.80 | 0.81 | 0.80 |
| Accuracy | | | 82% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).