Emotion Recognition Method for Call/Contact Centre Systems
Abstract
1. Introduction
- development of a design methodology for a new emotion recognition method dedicated directly to the contact centre industry, providing the ability to study emotional states in both the text and voice channels of CC systems;
- analysis of the effectiveness of classification depending on the types of classifiers used and on the mechanisms implementing the key tasks of the proposed methodology;
- the proposed hybrid approach is the first comprehensive solution to the problem of automatically recognizing emotions simultaneously in both CC channels, while also allowing transcriptions of recordings to be used in assessing emotional states.
2. Related Work
No | Solution | Proposed Approach | Results | Ref. |
---|---|---|---|---|
Text channel | ||||
1. | Sentiment analysis of Twitter data | SVM model using the following descriptors: POS-tagging, tree kernel, senti features. | A validation level of about 56–60% was achieved; the model recognizes three groups of emotions: positive, negative, and neutral. | [18] |
2. | MTC problem solution | The kNN algorithm combined with profile classifiers with improved performance. | A validation rate of approximately 67% was achieved. The model considers the relevance of the documents to the case, which are divided into two categories: positive and negative. | [36] |
3. | Sentiment classification based on SVM and ANN | SVM and ANN models using the TF/IDF descriptor. | For ANN, the validation level is about 76–90% (depending on the training dataset). For the SVM classifier, the validation level is about 74–89%. Models recognize emotions: positive and negative. | [37] |
4. | Twitter sentiment analysis based on ordinal regression | The following models were tested: SVR, DT, and RFC. The TF/IDF descriptor was used. | For DT, the validation level is about 92%, for RFC about 83%, and for SVR about 82%. The models listed above recognize positive and negative emotions. | [30] |
5. | Sentiment representation model for Twitter analysis | The following models were tested: SVM, RFC, CNN, and LSTM, as well as the naive Bayes classifier (NBC). The QSR descriptor was used. | For the LSTM and CNN networks, the validation level is about 78%; the other solutions had validation levels in the range of about 64–65%. The models listed above recognize emotions: positive, negative, and neutral. | [38] |
6. | Twitter text analysis using NBC, kNN and SVM | The following classifiers were tested: NBC, kNN, and SVM. They use the TF/IDF descriptor. | For the NBC classifier, results of about 63% were achieved, for the SVM classifier about 61% and kNN about 60%. The models classify: openness, conscientiousness, extraversion, agreeableness, neuroticism. | [39] |
7. | Multi-modal dialogue act classification | DNN-based model using GloVe vectors as a descriptor. | A validation rate of 39–65% was achieved depending on the training dataset used. The model recognizes the following eight emotions: sadness, anger, fear, happiness, surprise, disgust, frustration, and neutrality. | [40] |
8. | Attention-based word embeddings | The ABC algorithm with an SVM classifier was used, with the objective of maximizing classification accuracy. ATVs were used as descriptors. | A validation rate of 82–96% was achieved depending on the training dataset used. The model recognizes positive, negative, and neutral emotions. | [41] |
Voice channel | ||||
1. | Emotion detection through speech | A CNN network was tested. MFCCs were used as descriptors. | For the CNN model, the best performance was achieved (about 83%); the model recognizes the following types of emotions: anger, disgust, fear, happiness, sadness, and surprise. | [42] |
2. | Speech emotion recognition based on rough set and SVM | The training data for the SVM model consist of 13 parameters in total, including three from the energy features group, five pitch features, four formant features, and speech rate. | A validation level of approximately 78% was achieved; the model recognizes the following five emotions: anger, happiness, sadness, fear, and surprise. There is variation by gender. | [43] |
3. | Speech emotion recognition for SROL database using weighted kNN algorithm | 18 parameters were used as training data for the kNN model: F0, standard deviation of F0, median fundamental frequency, mean 1–4 formant frequency, standard deviation of 1–4 formant frequency, median 1–4 formant frequency, local jitter, local absolute jitter, and local shimmer. | A validation level of approximately 63% was achieved; the model recognizes the following three types of emotions: anger, happiness, and sadness. There is variation by gender. | [44] |
4. | A hierarchical framework for speech emotion recognition | The following methods were tested: PCA and LDA. 64 parameters were used as training data: 48 prosodic features and 16 formant frequency features. | Comparable results were obtained for the PCA and LDA models. The average recognition rate was 83.4% for males and 78.7% for females; the model recognizes the following five types of emotions: anger, happiness, sadness, fear, and surprise. | [45] |
Fusion of text and voice methods | ||||
1. | Deep neural networks for emotion recognition | An LSTM network with the average fundamental frequency, shimmer, jitter, and MFCC descriptors was used to detect emotions based on acoustic features, while a multilayer CNN network was used to detect emotions in transcriptions. | A validation level of 65% was achieved; the fusion of the models recognizes: anger, joy, sadness, and neutrality. | [46] |
2. | Emotion recognition system using speech features and transcriptions | A CNN network with MFCC spectrograms and parameters was used to detect emotions based on acoustic features. For emotion classification from text, a CNN network was also used, together with the word2vec method. | A validation level of 76% was achieved; the fusion of the CNN models recognizes: anger, joy, sadness, and neutrality. | [47] |
3. | Emotion recognition based on bottleneck acoustic features and lexical features | A DNN network with the F0, MFCC, and 40 mel filterbank energy descriptors was used to detect emotions based on acoustic features. For emotion classification from text, a DNN network was also used with word2vec and ANEW descriptors. | A validation level of 74% was achieved; the fusion of the DNN models recognizes: anger, joy, sadness, and neutrality. | [48] |
4. | Multimodal emotion recognition network with personalized attention profile | For emotion detection based on acoustic features, a BLSTM network was used with a 45-dimensional vector consisting of MFCC features, F0, and the zero-crossing rate, along with the first and second derivatives of the MFCCs. A BLSTM using the GloVe method was also used to detect emotions in text. | A validation level of 70% was achieved; the fusion of the BLSTM models recognizes: anger, joy, sadness, and neutrality. | [49] |
5. | Speech emotion recognition based on attention weight correction using word-level confidence measure | A BLSTM network was used with the following descriptors to detect emotions based on acoustic features: MFCC, constant-Q transform, and average fundamental frequency. For emotion classification based on text, a BLSTM network was also used with the BERT model. | A validation level of about 76% was achieved. The following emotions are recognized: anger, joy, sadness, and neutrality. | [50] |
3. Research Methodology
4. Methods
4.1. Data Balancing Techniques
4.2. Techniques Used in Text Channel
4.3. Speech Signal Descriptors
5. Proposed Approach
5.1. Transcriptor Module
5.2. Text Analyzer Module
5.3. Voice Analyzer Module
6. Results
6.1. Text Channel (Chats)
6.2. Voice Channel
6.3. Voice Channel with Text Method Combined
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jyoti, R.; Wardley, M. Unlocking the Transformative Power of AI for Contact Centers. IDC InfoBrief, October 2020. Available online: https://inthecloud.withgoogle.com/idc-infobrief/CCAI-IDC-Infobrief.html (accessed on 22 April 2022).
- DMG Consulting LLC. The State of Artificial Intelligence in the Contact Center; Report; DMG Consulting LLC: West Orange, NJ, USA, 2022.
- Kask, S.; Fitterer, R.; Anshelm, L. Augmenting Digital Customer Touchpoints: Best Practices for Transforming Customer Experience Through Conversational AI; Marketing Review; University St. Gallen: Sankt Gallen, Switzerland, 2019; Volume 5, pp. 64–69.
- Plaza, M.; Pawlik, L. Influence of the Contact Center Systems Development on Key Performance Indicators. IEEE Access 2021, 9, 44580–44591.
- Google Cloud. Natural Language API Basics. 2021. Available online: https://cloud.google.com/naturallanguage/docs/basics#sentiment_analysis (accessed on 10 May 2022).
- Google Cloud. Language Reference. 2021. Available online: https://cloud.google.com/dialogflow/es/docs/reference/language (accessed on 1 August 2022).
- Stolletz, R.; Helber, S. Performance analysis of an inbound call center with skills-based routing. OR Spektrum 2004, 26, 331–352.
- Jahangir, R.; Teh, Y.W.; Hanif, F.; Mujtaba, G. Deep learning approaches for speech emotion recognition: State of the art and research challenges. Multimed. Tools Appl. 2021, 8, 23745–23812.
- Roy, T.; Marwala, T.; Chakraverty, S. A Survey of Classification Techniques in Speech Emotion Recognition. In Mathematical Methods in Interdisciplinary Sciences; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2020; pp. 33–48.
- Swain, M.; Routray, A.; Kabisatpathy, P. Databases, features and classifiers for speech emotion recognition: A review. Int. J. Speech Technol. 2018, 21, 93–120.
- Rubin, V.L.; Stanton, J.M.; Liddy, E.D. Discerning Emotions in Texts, The AAAI Symposium on Exploring Attitude and Affect in Text AAAI-EAAI, Stanford, CA, 2004. Available online: https://www.aaai.org/Papers/Symposia/Spring/2004/SS-04-07/SS04-07-023.pdf (accessed on 10 August 2022).
- Sathe, J.B.; Mali, M.P. A hybrid Sentiment Classification method using Neural Network and Fuzzy Logic. In Proceedings of the 11th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India, 5–6 January 2017; pp. 93–96.
- Khan, M.T.; Durrani, M.; Ali, A.; Inayat, I.; Khalid, S.; Khan, K.H. Sentiment analysis and the complex natural language. Complex Adapt. Syst. Model. 2016, 4, 1–19.
- Dragoni, M.; Tettamanzi, A.G.B.; Pereira, C.D.C. A Fuzzy System for Concept-Level Sentiment Analysis. In Semantic Web Evaluation Challenge; Springer: Cham, Switzerland, 2014; pp. 21–27.
- Pawlik, Ł.; Płaza, M.; Deniziak, S.; Boksa, E. A method for improving bot effectiveness by recognising implicit customer intent in contact centre conversations. Speech Commun. 2022, 143, 33–45.
- Ekman, P.; Friesen, W.V. Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues; Malor Books: Los Altos, CA, USA, 2003.
- Liu, B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
- Agarwal, A.; Xie, B.; Vovsha, I.; Rambow, O.; Passonneau, R. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, Portland, Oregon, 23 June 2011; pp. 30–38.
- Phan, H.T.; Tran, V.C.; Nguyen, N.T.; Hwang, D. Improving the Performance of Sentiment Analysis of Tweets Containing Fuzzy Sentiment Using the Feature Ensemble Model. IEEE Access 2020, 8, 14630–14641.
- Sitaula, C.; Basnet, A.; Mainali, A.; Shahi, T.B. Deep Learning-Based Methods for Sentiment Analysis on Nepali COVID-19-Related Tweets. Comput. Intell. Neurosci. 2021, 2021, 2158184.
- Ortigosa, A.; Martín, J.M.; Carro, R.M. Sentiment analysis in Facebook and its application to e-learning. Comput. Hum. Behav. 2014, 31, 527–541.
- Jianqiang, Z.; Xiaolin, G.; Xuejun, Z. Deep Convolution Neural Networks for Twitter Sentiment Analysis. IEEE Access 2018, 6, 23253–23260.
- Zeng, J.; Ma, X.; Zhou, K. Enhancing Attention-Based LSTM With Position Context for Aspect-Level Sentiment Classification. IEEE Access 2019, 7, 20462–20471.
- Wang, J.; Yu, L.-C.; Lai, K.R.; Zhang, X. Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2, pp. 225–230.
- Shekhawat, S.S.; Shringi, S.; Sharma, H. Twitter sentiment analysis using hybrid Spider Monkey optimization method. Evol. Intell. 2020, 14, 1307–1316.
- Bouazizi, M.; Ohtsuki, T. A Pattern-Based Approach for Multi-Class Sentiment Analysis in Twitter. IEEE Access 2017, 5, 20617–20639.
- Wang, Y.; Kim, K.; Lee, B.; Youn, H.Y. Word clustering based on POS feature for efficient twitter sentiment analysis. Hum.-Cent. Comput. Inf. Sci. 2018, 8, 17.
- Demszky, D.; Movshovitz-Attias, D.; Ko, J.; Cowen, A.; Nemade, G.; Ravi, S. GoEmotions: A Dataset of Fine-Grained Emotions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 6–8 July 2020; pp. 4040–4054.
- Kumar, P.; Raman, B. A BERT based dual-channel explainable text emotion recognition system. Neural Netw. 2022, 150, 392–407.
- Saad, S.E.; Yang, J. Twitter Sentiment Analysis Based on Ordinal Regression. IEEE Access 2019, 7, 163677–163685.
- Busso, C.; Lee, S.; Narayanan, S. Analysis of Emotionally Salient Aspects of Fundamental Frequency for Emotion Detection. IEEE Trans. Audio Speech Lang. Process. 2009, 17, 582–596.
- Kuchibhotla, S.; Vankayalapati, H.D.; Vaddi, R.S.; Anne, K.R. A comparative analysis of classifiers in emotion recognition through acoustic features. Int. J. Speech Technol. 2014, 17, 401–408.
- Koolagudi, S.G.; Rao, K.S. Emotion recognition from speech: A review. Int. J. Speech Technol. 2012, 15, 99–117.
- Fahad, S.; Ranjan, A.; Yadav, J.; Deepak, A. A survey of speech emotion recognition in natural environment. Digit. Signal Process. 2020, 110, 102951.
- Smagowska, B. Noise at Workplaces in the Call Center. Arch. Acoust. 2010, 35, 253–264.
- Zadrozny, S.; Kacprzyk, J.; Gajewski, M. Multiaspect Text Categorization Problem Solving: A Nearest Neighbours Classifier Based Approaches and Beyond. J. Autom. Mob. Robot. Intell. Syst. 2015, 9, 58–70.
- Moraes, R.; Valiati, J.F.; Neto, W.P.G. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Syst. Appl. 2013, 40, 621–633.
- Zhang, Y.; Song, D.; Zhang, P.; Li, X.; Wang, P. A quantum-inspired sentiment representation model for twitter sentiment analysis. Appl. Intell. 2019, 49, 3093–3108.
- Pratama, B.Y.; Sarno, R. Personality classification based on Twitter text using Naive Bayes, KNN and SVM. In Proceedings of the International Conference on Data and Software Engineering (ICoDSE), Yogyakarta, Indonesia, 1 November 2015.
- Saha, T.; Patra, A.; Saha, S.; Bhattacharyya, P. Towards Emotion-aided Multi-modal Dialogue Act Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 6–8 July 2020; pp. 4361–4372.
- Zhang, M.; Palade, V.; Wang, Y.; Ji, Z. Attention-based word embeddings using Artificial Bee Colony algorithm for aspect-level sentiment classification. Inf. Sci. 2020, 545, 713–738.
- Qayyum, A.B.A.; Arefeen, A.; Shahnaz, C. Convolutional Neural Network (CNN) Based Speech-Emotion Recognition. In Proceedings of the IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON), Dhaka, Bangladesh, 28–30 November 2019; pp. 122–125.
- Zhou, J.; Wang, G.; Yang, Y.; Chen, P. Speech Emotion Recognition Based on Rough Set and SVM. In Proceedings of the 5th IEEE International Conference on Cognitive Informatics, Beijing, China, 17–19 July 2006; pp. 53–61.
- Feraru, M.; Zbancioc, M. Speech emotion recognition for SROL database using weighted KNN algorithm. In Proceedings of the International Conference on Electronics, Computers and Artificial Intelligence, Pitesti, Romania, 27–29 June 2013; pp. 1–4.
- You, M.; Chen, C.; Bu, J.; Liu, J.; Tao, J. A Hierarchical Framework for Speech Emotion Recognition. IEEE Int. Symp. Ind. Electron. 2006, 1, 515–519.
- Cho, J.; Pappagari, R.; Kulkarni, P.; Villalba, J.; Carmiel, Y.; Dehak, N. Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts. arXiv 2019, arXiv:1911.00432.
- Tripathi, S.; Kumar, A.; Ramesh, A.; Singh, C.; Yenigalla, P. Deep learning based emotion recognition system using speech features and transcriptions. arXiv 2019, arXiv:1906.05681.
- Kim, E.; Shin, J.W. DNN-based Emotion Recognition Based on Bottleneck Acoustic Features and Lexical Features. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 6720–6724.
- Li, J.-L.; Lee, C.-C. Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile. In Proceedings of the Interspeech 2019, Graz, Austria, 15–19 September 2019.
- Santoso, J.; Yamada, T.; Makino, S.; Ishizuka, K.; Hiramura, T. Speech Emotion Recognition Based on Attention Weight Correction Using Word-Level Confidence Measure. In Proceedings of the Interspeech 2021, Brno, Czechia, 30 August–3 September 2021.
- Atmaja, B.T.; Sasou, A.; Akagi, M. Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion. Speech Commun. 2022, 140, 11–28.
- NAWL. Interactive Analysis of the NAWL Database. 2022. Available online: https://exp.lobi.nencki.gov.pl/nawl-analysis (accessed on 12 June 2022).
- Plaza, M.; Pawlik, L.; Deniziak, S. Call Transcription Methodology for Contact Center Systems. IEEE Access 2021, 9, 110975–110988.
- Płaza, M.; Trusz, S.; Kęczkowska, J.; Boksa, E.; Sadowski, S.; Koruba, Z. Machine Learning Algorithms for Detection and Classifications of Emotions in Contact Center Applications. Sensors 2022, 22, 5311.
- Scikit-Learn User Manual. Available online: https://scikit-learn.org/stable/modules/model_evaluation.html# (accessed on 16 August 2022).
- Behera, B.; Kumaravelan, G.; Kumar, P. Performance Evaluation of Deep Learning Algorithms in Biomedical Document Classification. In Proceedings of the 11th International Conference on Advanced Computing (ICoAC), Chennai, India, 18–20 December 2019; pp. 220–224.
- López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 2013, 250, 113–141.
- Sun, Y.; Wong, A.K.C.; Kamel, M.S. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 687–719.
- Chawla, N.V. Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook; Springer: Berlin/Heidelberg, Germany, 2009; pp. 875–886.
- Prusa, J.; Khoshgoftaar, T.M.; Dittman, D.J.; Napolitano, A. Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data. In Proceedings of the IEEE International Conference on Information Reuse and Integration, San Francisco, CA, USA, 13–15 August 2015; pp. 197–202.
- Lemaitre, G.; Nogueira, F.; Aridas, C. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 2017, 18, 1–5.
- The Imbalanced-Learn Developers. Imbalanced-Learn Documentation. 2022. Available online: https://imbalanced-learn.org/ (accessed on 1 June 2022).
- Aizawa, A. An information-theoretic perspective of tf–idf measures. Inf. Process. Manag. 2003, 39, 45–65.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Scikitlearn. Tf–idf Term Weighting. 2022. Available online: https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting (accessed on 11 May 2022).
- Gebre, B.G.; Zampieri, M.; Wittenburg, P.; Heskes, T. Improving Native Language Identification with TF-IDF Weighting. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, Atlanta, GA, USA, 13 June 2013; pp. 216–223.
- Murugappan, M.; Rizon, M.; Nagarajan, R.; Yaacob, S.; Hazry, D.; Zunaidi, I. Time-Frequency Analysis of EEG Signals for Human Emotion Detection. IFMBE Proc. 2008, 21, 262–265.
- Kong, J. A Study on Jitter, Shimmer and F0 of Mandarin Infant Voice by Developing an Applied Method of Voice Signal Processing. In Proceedings of the Congress on Image and Signal Processing, Sanya, China, 27–30 May 2008; pp. 314–318.
- Korkmaz, O.E.; Atasoy, A. Emotion recognition from speech signal using mel-frequency cepstral coefficients. In Proceedings of the 9th International Conference on Electrical and Electronics Engineering (ELECO), Bursa, Turkey, 26–28 November 2015; pp. 1254–1257.
- Ancilin, J.; Milton, A. Improved speech emotion recognition with Mel frequency magnitude coefficient. Appl. Acoust. 2021, 179, 108046.
- Chamoli, A.; Semwal, A.; Saikia, N. Detection of emotion in analysis of speech using linear predictive coding techniques (L.P.C). In Proceedings of the International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2017; pp. 1–4.
- Basu, S.; Chakraborty, J.; Aftabuddin, M. Emotion recognition from speech using convolutional neural network with recurrent neural network architecture. In Proceedings of the 2017 2nd International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 19–20 October 2017; pp. 333–336.
- Wang, K.; An, N.; Li, B.N.; Zhang, Y.; Li, L. Speech Emotion Recognition Using Fourier Parameters. IEEE Trans. Affect. Comput. 2015, 6, 69–75.
- Aouani, H.; Ben Ayed, Y. Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder. In Proceedings of the 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 21–24 March 2018; pp. 1–5.
- Saste, S.T.; Jagdale, S.M. Emotion recognition from speech using MFCC and DWT for security system. In Proceedings of the 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 20–22 April 2017; pp. 701–704.
- Riegel, M.; Wierzba, M.; Wypych, M.; Żurawski, Ł.; Jednoróg, K.; Grabowska, A.; Marchewka, A. Nencki Affective Word List (NAWL): The cultural adaptation of the Berlin Affective Word List–Reloaded (BAWL-R) for Polish. Behav. Res. Methods 2015, 47, 1222–1236.
Name | Description | Application |
---|---|---|
DBT1 | 345 chat text conversations (7515 statements of clients and agents were extracted). | Data used in the process of training and testing of classifiers in part one. |
DBT2 | 100 chat text conversations (3718 statements of clients and agents were extracted). | Database used in the verification process of the developed method. |
DBV1 | 302 actual voice calls ranging from 3 to 20 min in duration. The total recording time of this database is 22 h 59 min and 11 s. | Data used in the process of training and testing of classifiers in part two. |
DBV2 | 100 actual voice calls ranging from 3 to 20 min in duration. The total recording time of this database is 7 h 25 min and 15 s. | Database used in the verification process of the developed method. |
DBVT1 | Database of transcriptions of recordings from DBV1 set. | Additional functionality to improve recognition for the problem in part two. |
DBVT2 | Database of transcriptions of recordings from DBV2 set. | Database used in the verification process of the developed method. |
Method Name | Type (OS = over-sampling, US = under-sampling) | Used Algorithm |
---|---|---|
RandomOS | OS | Random over-sampler |
SMOTE | OS | Synthetic minority over-sampling technique |
ADASYN | OS | Adaptive synthetic sampling |
RandomUS | US | Random under-sampler |
Near-Miss | US | Near-miss method based under-sampler |
CondensedNN | US | Condensed nearest neighbor |
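All of the balancing methods listed above are available in the imbalanced-learn toolbox cited in the references. The following sketch is illustrative only; it is not the authors' implementation, and the names `X` and `y` for the feature matrix and emotion labels are assumptions.

```python
# Minimal sketch: applying the listed balancing techniques with imbalanced-learn.
from collections import Counter

from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN
from imblearn.under_sampling import RandomUnderSampler, NearMiss, CondensedNearestNeighbour

samplers = {
    "RandomOS": RandomOverSampler(random_state=42),
    "SMOTE": SMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
    "RandomUS": RandomUnderSampler(random_state=42),
    "NearMiss": NearMiss(),
    "CondensedNN": CondensedNearestNeighbour(random_state=42),
}

def rebalance_all(X, y):
    """Return a dict of resampled (X, y) pairs, one per balancing method."""
    resampled = {}
    for name, sampler in samplers.items():
        X_res, y_res = sampler.fit_resample(X, y)
        resampled[name] = (X_res, y_res)
        print(name, Counter(y_res))  # class distribution after balancing
    return resampled
```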
Method | Anger [pcs] | Sadness [pcs] | Happiness [pcs] | Fear [pcs] | Neutral [pcs] |
---|---|---|---|---|---|
Labeled emotions | 177 | 30 | 38 | 30 | 312
Recognition without using balancing techniques |||||
No balancing | 125 | 1 | 1 | 1 | 283
Recognition with using balancing techniques |||||
RandomOS | 139 | 13 | 15 | 17 | 177
SMOTE | 139 | 13 | 15 | 12 | 192
ADASYN | 142 | 15 | 14 | 12 | 185
RandomUS | 119 | 22 | 14 | 13 | 124
NearMiss | 78 | 15 | 13 | 16 | 62
CondensedNN | 102 | 11 | 1 | 0 | 287
Method | ACC | WAP | WAF1 |
---|---|---|---|
RandomOS | 0.61 | 0.69 | 0.64 |
SMOTE | 0.63 | 0.68 | 0.65 |
ADASYN | 0.63 | 0.68 | 0.64 |
RandomUS | 0.50 | 0.64 | 0.53 |
NearMiss | 0.31 | 0.58 | 0.35 |
CondensedNN | 0.68 | 0.68 | 0.63 |
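In the result tables, ACC denotes accuracy, while WAP and WAF1 denote the weighted-average precision and F1-score, respectively. A minimal sketch of how such metrics can be obtained with scikit-learn (assuming `y_true` and `y_pred` hold the labelled and predicted emotions of the test statements) is given below.

```python
# Illustrative computation of the ACC, WAP and WAF1 metrics with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, f1_score

def emotion_metrics(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    wap = precision_score(y_true, y_pred, average="weighted", zero_division=0)
    waf1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
    return {"ACC": acc, "WAP": wap, "WAF1": waf1}
```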
Emotion | Emoticon Type | Number of Emoticons |
---|---|---|
ANGER | >:(, >:-(, :P, :-P, :/, :-/ | 11
FEAR | :O, :-O, :o, :-o | 8 |
SADNESS | :(, :-(, :((, :-((, :’(, :’-( | 13 |
HAPPINESS | :), :-), :)), :-)), :D, xD, XD | 111 |
NEUTRAL | Not applicable |
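The emoticon groups above can be turned into simple lexical features for the text channel. The sketch below is a hedged illustration: the emoticon-to-emotion mapping mirrors the table, but the function and variable names are assumptions rather than the authors' code.

```python
# Counting emotion-bearing emoticons in a chat statement.
import re

EMOTICON_MAP = {
    "ANGER":     [">:(", ">:-(", ":P", ":-P", ":/", ":-/"],
    "FEAR":      [":O", ":-O", ":o", ":-o"],
    "SADNESS":   [":(", ":-(", ":((", ":-((", ":'(", ":'-("],
    "HAPPINESS": [":)", ":-)", ":))", ":-))", ":D", "xD", "XD"],
}

def emoticon_counts(text: str) -> dict:
    """Count occurrences of emoticons from each emotion group in a statement."""
    counts = {}
    for emotion, icons in EMOTICON_MAP.items():
        # Longest patterns first so ":((" is matched before ":(".
        pattern = "|".join(re.escape(i) for i in sorted(icons, key=len, reverse=True))
        counts[emotion] = len(re.findall(pattern, text))
    return counts

print(emoticon_counts("thanks a lot :) :D but the app still crashes >:("))
```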
Word | Group of Words |
---|---|
wish | nice, hours, day |
works | slow, messenger, mostly |
unfortunately | my, telephone, allows, important |
very | Mr, thank |
where | information, stated, important |
please | prompt, find, can |
help | can, somehow, something |
unacceptable | disconnected, can, number |
client | service, 24, disconnected, headquarters |
case | weeks, last, 2 |
better | contact, attention, was |
want | by, given, indicated |
waiting | once, help, case |
favour | short, window, mark |
best | wishes, end, day, nice |
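One simple way to obtain keyword/word-group pairs such as those listed above is to collect the words that most frequently co-occur with a given keyword in the chat statements. The sketch below only illustrates this idea; it is an assumption, not the procedure used by the authors.

```python
# Illustrative co-occurrence grouping over tokenised chat statements.
from collections import Counter

def co_occurring_words(statements, keyword, top_n=4):
    """Return the top_n words that appear in the same statements as keyword."""
    counter = Counter()
    for tokens in statements:
        if keyword in tokens:
            counter.update(t for t in tokens if t != keyword)
    return [word for word, _ in counter.most_common(top_n)]

statements = [
    ["wish", "you", "a", "nice", "day"],
    ["i", "wish", "the", "hours", "were", "shorter"],
]
print(co_occurring_words(statements, "wish"))
```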
Parameter | Anger | Sadness | Happiness | Fear | Neutral |
---|---|---|---|---|---|
F0 | −0.070 | −0.002 | 0.118 | 0.035 | −0.081 |
JITTER | 0.157 | −0.132 | 0.075 | −0.087 | −0.013 |
SHIMMER | 0.289 | −0.178 | −0.001 | −0.093 | −0.016 |
MFCC0 | 0.335 | −0.290 | 0.003 | 0.079 | −0.128 |
MFCC1 | 0.218 | 0.117 | −0.144 | 0.036 | −0.227 |
MFCC2 | −0.304 | 0.061 | 0.065 | 0.133 | 0.044 |
MFCC3 | 0.151 | −0.089 | −0.006 | −0.014 | −0.042 |
MFCC4 | 0.241 | −0.021 | −0.059 | −0.083 | −0.078 |
MFCC5 | −0.338 | 0.110 | 0.054 | 0.024 | 0.150 |
MFCC6 | −0.154 | 0.029 | 0.011 | −0.034 | 0.148 |
MFCC7 | 0.002 | −0.103 | 0.083 | −0.106 | 0.123 |
MFCC8 | −0.324 | 0.027 | 0.150 | −0.033 | 0.181 |
MFCC9 | −0.182 | 0.090 | 0.019 | −0.024 | 0.096 |
MFCC10 | −0.011 | −0.108 | 0.028 | 0.016 | 0.075 |
MFCC11 | −0.178 | −0.135 | 0.151 | −0.020 | 0.182 |
MFCC12 | −0.000 | −0.013 | 0.073 | −0.146 | 0.086 |
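The acoustic descriptors appearing in this table (F0, jitter, shimmer, and MFCC0–MFCC12) can be extracted per recording with standard audio tooling. The librosa-based sketch below is an approximation for illustration only: jitter and shimmer are normally computed with dedicated tools such as Praat, and the authors' extraction pipeline may differ.

```python
# Rough per-recording extraction of F0, jitter, shimmer and 13 MFCCs.
import numpy as np
import librosa

def acoustic_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None, mono=True)

    # Fundamental frequency track (voiced frames only).
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Rough jitter proxy: mean relative period-to-period F0 variation.
    periods = 1.0 / f0
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Rough shimmer proxy: frame-to-frame variation of RMS amplitude.
    rms = librosa.feature.rms(y=y)[0]
    shimmer = np.mean(np.abs(np.diff(rms))) / np.mean(rms)

    # 13 MFCCs (MFCC0 ... MFCC12) averaged over frames.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    return {"F0": float(np.mean(f0)), "JITTER": float(jitter),
            "SHIMMER": float(shimmer),
            **{f"MFCC{i}": float(v) for i, v in enumerate(mfcc)}}
```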
No | ABC ACC | ABC WAP | ABC WAF1 | ANN ACC | ANN WAP | ANN WAF1 | DT ACC | DT WAP | DT WAF1 | kNN ACC | kNN WAP | kNN WAF1 | NBC ACC | NBC WAP | NBC WAF1 | RFC ACC | RFC WAP | RFC WAF1 | SVM ACC | SVM WAP | SVM WAF1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.51 | 0.55 | 0.52 | 0.60 | 0.60 | 0.61 | 0.52 | 0.54 | 0.53 | 0.53 | 0.53 | 0.52 | 0.34 | 0.54 | 0.53 | 0.51 | 0.63 | 0.54 | 0.48 | 0.52 | 0.48
2 | 0.51 | 0.61 | 0.53 | 0.67 | 0.67 | 0.66 | 0.51 | 0.54 | 0.52 | 0.54 | 0.63 | 0.54 | 0.38 | 0.54 | 0.56 | 0.52 | 0.63 | 0.53 | 0.49 | 0.61 | 0.45
3 | 0.51 | 0.63 | 0.54 | 0.64 | 0.64 | 0.65 | 0.55 | 0.60 | 0.56 | 0.56 | 0.63 | 0.56 | 0.40 | 0.58 | 0.58 | 0.53 | 0.67 | 0.55 | 0.48 | 0.59 | 0.46
4 | 0.52 | 0.61 | 0.54 | 0.63 | 0.63 | 0.63 | 0.56 | 0.57 | 0.56 | 0.56 | 0.62 | 0.56 | 0.38 | 0.53 | 0.55 | 0.55 | 0.68 | 0.57 | 0.49 | 0.53 | 0.48
5 | 0.54 | 0.67 | 0.61 | 0.66 | 0.66 | 0.68 | 0.57 | 0.65 | 0.61 | 0.59 | 0.69 | 0.62 | 0.38 | 0.64 | 0.58 | 0.56 | 0.69 | 0.61 | 0.52 | 0.63 | 0.57
Av | 0.52 | 0.61 | 0.55 | 0.64 | 0.64 | 0.64 | 0.54 | 0.58 | 0.56 | 0.56 | 0.62 | 0.56 | 0.38 | 0.57 | 0.56 | 0.53 | 0.66 | 0.56 | 0.50 | 0.58 | 0.50
Std | 0.01 | 0.04 | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 0.05 | 0.04 | 0.02 | 0.06 | 0.04 | 0.02 | 0.05 | 0.02 | 0.02 | 0.03 | 0.03 | 0.02 | 0.05 | 0.05
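For orientation, a comparison loop of this kind can be reproduced with scikit-learn using TF/IDF features and five-fold cross-validation. The sketch below is illustrative only: the ABC classifier (which relies on an Artificial Bee Colony optimizer) is omitted because it is not part of scikit-learn, `texts` and `labels` are assumed inputs, and the hyperparameters are defaults rather than the paper's settings.

```python
# Illustrative comparison of several classifier families on TF/IDF features.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

classifiers = {
    "ANN": MLPClassifier(max_iter=500),
    "DT": DecisionTreeClassifier(),
    "kNN": KNeighborsClassifier(),
    "NBC": MultinomialNB(),
    "RFC": RandomForestClassifier(),
    "SVM": SVC(),
}

def compare_classifiers(texts, labels, folds=5):
    scoring = {"ACC": "accuracy", "WAP": "precision_weighted", "WAF1": "f1_weighted"}
    results = {}
    for name, clf in classifiers.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_validate(pipe, texts, labels, cv=folds, scoring=scoring)
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results
```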
Metric | ABC | ANN | DT | kNN | NBC | RFC | SVM |
---|---|---|---|---|---|---|---|
Agent |||||||
ACC | 0.64 | 0.60 | 0.53 | 0.63 | 0.29 | 0.64 | 0.75
WAP | 0.69 | 0.72 | 0.72 | 0.68 | 0.74 | 0.73 | 0.72
WAF1 | 0.66 | 0.65 | 0.60 | 0.65 | 0.26 | 0.67 | 0.73
Client |||||||
ACC | 0.53 | 0.53 | 0.48 | 0.50 | 0.32 | 0.51 | 0.60
WAP | 0.51 | 0.57 | 0.51 | 0.50 | 0.56 | 0.51 | 0.50
WAF1 | 0.51 | 0.54 | 0.49 | 0.49 | 0.32 | 0.50 | 0.54
No | CNN ACC | CNN WAP | CNN WAF1 | kNN ACC | kNN WAP | kNN WAF1 | LDA ACC | LDA WAP | LDA WAF1 | SVM ACC | SVM WAP | SVM WAF1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.78 | 0.79 | 0.78 | 0.71 | 0.66 | 0.68 | 0.66 | 0.58 | 0.61 | 0.72 | 0.70 | 0.67
2 | 0.88 | 0.88 | 0.87 | 0.70 | 0.69 | 0.68 | 0.67 | 0.62 | 0.63 | 0.71 | 0.64 | 0.65
3 | 0.86 | 0.86 | 0.86 | 0.69 | 0.66 | 0.65 | 0.62 | 0.64 | 0.57 | 0.69 | 0.63 | 0.63
4 | 0.85 | 0.85 | 0.84 | 0.69 | 0.69 | 0.66 | 0.66 | 0.58 | 0.61 | 0.64 | 0.74 | 0.57
5 | 0.78 | 0.78 | 0.78 | 0.71 | 0.69 | 0.69 | 0.66 | 0.57 | 0.60 | 0.70 | 0.76 | 0.65
Av | 0.83 | 0.83 | 0.83 | 0.70 | 0.68 | 0.67 | 0.65 | 0.60 | 0.60 | 0.70 | 0.69 | 0.63
Std | 0.05 | 0.04 | 0.04 | 0.01 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.03 | 0.05 | 0.04
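A compact 1-D CNN over frame-level acoustic features gives a sense of the CNN voice-channel classifier compared above. The sketch below is a hedged illustration: the input shape, layer sizes, and five-class output (anger, sadness, happiness, fear, neutral) are assumptions, not the architecture reported in the paper.

```python
# Small 1-D CNN over sequences of frame-level acoustic features (Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_frames=200, n_features=16, n_classes=5):
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_features)),   # e.g., MFCC + F0/jitter/shimmer per frame
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),  # anger, sadness, happiness, fear, neutral
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```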
Metric | CNN | kNN | LDA | SVM |
---|---|---|---|---|
Agent ||||
ACC | 0.68 | 0.53 | 0.54 | 0.63
WAP | 0.97 | 0.56 | 0.80 | 0.54
WAF1 | 0.80 | 0.54 | 0.64 | 0.58
Client ||||
ACC | 0.67 | 0.52 | 0.47 | 0.61
WAP | 0.71 | 0.79 | 0.71 | 0.74
WAF1 | 0.69 | 0.61 | 0.55 | 0.66
Metric | CNN | kNN | LDA | SVM |
---|---|---|---|---|
Client ||||
ACC | 0.68 | 0.53 | 0.48 | 0.62
WAP | 0.80 | 0.81 | 0.73 | 0.76
WAF1 | 0.73 | 0.62 | 0.57 | 0.67
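Combining the voice channel with the text method applied to transcriptions can be realized, for example, by decision-level fusion of the two classifiers' class probabilities. The sketch below illustrates one simple weighted-average rule; it is an assumption and not necessarily the combination used by the authors.

```python
# Decision-level fusion of voice-channel and text-channel (transcription) predictions.
import numpy as np

def fuse_predictions(p_voice, p_text, w_voice=0.5):
    """Weighted average of class-probability matrices (n_samples x n_classes)."""
    p_voice = np.asarray(p_voice)
    p_text = np.asarray(p_text)
    fused = w_voice * p_voice + (1.0 - w_voice) * p_text
    return fused.argmax(axis=1)  # index of the winning emotion class per utterance
```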