Multimodal Emotion Recognition Using Visual, Vocal and Physiological Signals: A Review
Abstract
1. Introduction
- RQ1: How can multimodal emotion recognition methods using visual, vocal, and physiological signals be optimized to enhance robustness and accuracy, and what is the impact of deep learning techniques and dynamic expression analysis in overcoming these challenges?
2. Materials and Methods
2.1. Search Strategy
2.2. Study Selection
3. Human Emotion Categorization Models
4. Human Emotional Expression Recognition
4.1. Single Modality
4.1.1. Visual Modality
Ref | Year | Database | Features/Classifier | Best Performance |
---|---|---|---|---|
[118] | 1994 | in-house | Neural network | Acc: to |
[119] | 2002 | JAFFE | Neural network | Acc: |
[120] | 2005 | DFAT-504 | Gabor filters + AdaBoost/SVM | Acc: |
[121] | 2016 | FER2013 + SFEW 2.0 | DNNRL | Acc: |
[122] | 2017 | CK+ Oulu-CASIA MMI | PHRNN + MSCNN | Acc: Acc: Acc: |
[123] | 2018 | in-house | Wavelet entropy/Neural network | Acc: |
[124] | 2018 | KDEF CK+ | CNN/SVM | Acc: Acc: |
[125] | 2020 | FACES Lifespan CIFE FER2013 | VGG-16/RF | Acc: Acc: Acc: Acc: |
[126] | 2023 | CK+ FER2013 | CNN + DAISY/RF | Acc: Acc: |
[127] | 2023 | JAFFE CK+ FER2013 SFEW 2.0 | VGG19 + GoogLeNet + ResNet101/SVM | Acc: Acc: Acc: Acc: |
[128] | 2024 | Aff-Wild2 | Landmarks/GCN | Acc: |
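Several of the systems in the table above pair a deep feature extractor with a classical classifier (e.g., the CNN/SVM and VGG-16/RF entries). The sketch below is a minimal, hypothetical illustration of that hybrid pattern, assuming a torchvision VGG-16 backbone and placeholder image paths and labels; it is not a reproduction of any specific cited pipeline.

```python
# Illustrative sketch of a CNN-feature + SVM facial expression classifier
# (in the spirit of the CNN/SVM and VGG-16/RF entries above).
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained VGG-16; keep only the convolutional feature extractor.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval().to(device)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(image_paths):
    """Return one globally pooled CNN feature vector per face image."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            fmap = backbone(x)                                   # (1, 512, 7, 7) feature map
            feats.append(fmap.mean(dim=[2, 3]).squeeze(0).cpu().numpy())
    return feats

# Hypothetical file names and labels; replace with a real FER dataset (e.g., CK+ crops).
train_paths = ["face_happy.png", "face_sad.png"]
train_labels = ["happy", "sad"]

clf = SVC(kernel="rbf", C=10.0)
clf.fit(extract_features(train_paths), train_labels)
# clf.predict(extract_features(test_paths)) then yields emotion labels for unseen faces.
```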
4.1.2. Speech Modality
4.1.3. Physiological Modality
4.2. Multimodal Emotion Recognition
Ref | Year | Multimodal Database | Elicitation | Features | Classifier | Average Accuracy | Fusion Method | Modalities |
---|---|---|---|---|---|---|---|---|
[223] | 2018 | IEMOCAP, EmotiW | - | BiGRU, attention layer | CNN | Best acc: , WF1 = | Word-level feature-level fusion, CNN | Audio/text |
[164] | 2015 | eNTERFACE | - | - | SVM, ELM | Acc: | Feature-level fusion | Audio/facial/text |
[224] | 2021 | RAVDESS | Acted | xlsr-Wav2Vec2.0 (audio); AUs/bi-LSTM (facial) | - | Acc: | Decision-level fusion using multinomial logistic regression | Audio/facial |
[225] | 2024 | eNTERFACE’05 | Induced | MobileNetV2 spectrogram/(2D CNN with a federated learning concept) | - | Acc: (subject-dependent) | Decision-level fusion using average probability voting | Audio/facial |
[226] | 2023 | WESAD CASE k-EmoCon | - | temporal convolution-based modality-specific encoders | FC | Acc: val: , ar: val: , ar: | Feature-level fusion using a transformer | EDA, BVP, TEMP |
[227] | 2022 | In-house dataset | Induced | CBAM and ResNet34 | MLP | Acc: (subject-dependent) | Data-level fusion | EEG/facial |
[228] | 2022 | RAVDESS SAVEE | - | ConvLSTM2D and CNN (MFCCs + MS + SC + TZ) CNN | MLP | Acc: Acc: | Feature-level fusion | Audio/video |
[229] | 2022 | SAVEE RAVDESS RML | - | 2-stream CNN and bi-LSTM (ZC, EN, ENE) CNN | MLP | Acc: Acc: Acc: | Feature-level fusion | Audio/video |
[230] | 2024 | IEMOCAP (facial/audio) | - | AlexNet with contrastive adversarial learning (facial); MFCC, velocity and acceleration + VGGNet (audio); convolutional autoencoder (teacher), CNN (student) | - | Acc: per emotion state | Adaptive decision-level fusion | Facial/audio |
[231] | 2023 | ASCERTAIN | Induced | FOX-optimized DDQ | - | Acc: | Optimization-based model fusion | Facial/audio/ GSR |
[232] | 2023 | RAVDESS CREMA-D | - | 3D CNN with attention mechanism 2D CNN with attention mechanism | - | Acc: Acc: | Cross-attention fusion system (feature-level fusion) | Audio/facial |
[233] | 2023 | M-LFW-F (facial) and CREMA-D (audio) | - | modified Xception model (spectrogram images extracted from the audio signal) | - | Acc: | Feature-level fusion between entry flow and middle flow | Audio/facial |
[234] | 2023 | AFEW SFEW MELD AffWild2 | - | VGG19 for face (spectrogram) ResNet50 for audio | - | Acc: Acc: Acc: | EmbraceNet+, feature-level fusion | Facial/audio |
[235] | 2024 | In-house dataset | Induced | CNN (EEG topography) CNN | - | Acc: | Decision-level fusion | Facial/EEG |
[236] | 2024 | FEGE | Acted | 3D-CNN + FC | - | Acc: | Model fusion through a shared encoder at the feature level and a Type-2 fuzzy decision system | Facial/gesture |
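The fusion strategies listed above fall mainly into feature-level (early) and decision-level (late) fusion. The toy sketch below contrasts the two, using randomly generated stand-ins for the modality-specific embeddings and class probabilities; the averaging rule mirrors the average-probability voting reported for [225], while everything else (embedding dimensions, class count) is an illustrative assumption.

```python
# Toy contrast of feature-level vs. decision-level fusion for two modalities.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for modality-specific embeddings (e.g., from an audio bi-LSTM and a facial CNN).
audio_embedding = rng.normal(size=128)
visual_embedding = rng.normal(size=256)

# Feature-level (early) fusion: concatenate embeddings and train one joint classifier on the result.
fused_features = np.concatenate([audio_embedding, visual_embedding])  # 384-d joint representation

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Decision-level (late) fusion: each branch emits class probabilities, combined afterwards.
num_classes = 7  # hypothetical basic-emotion set
audio_probs = softmax(rng.normal(size=num_classes))   # stand-in for the audio branch output
visual_probs = softmax(rng.normal(size=num_classes))  # stand-in for the facial branch output
fused_probs = (audio_probs + visual_probs) / 2        # average-probability voting, as in [225]
predicted_class = int(np.argmax(fused_probs))
```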
5. Deep Learning Challenges and Solutions for High-Quality Emotion Recognition
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Frijda, N.H. Passions: Emotion and Socially Consequential Behavior. In Emotion: Interdisciplinary Perspectives; Kavanaugh, R.D., Zimmerberg, B., Fein, S., Eds.; Lawrence Erlbaum Associates, Inc.: Mahwah, NJ, USA, 1996; Volume 1, pp. 1–27. [Google Scholar]
- Frijda, N.H. Emotions. In The International Handbook of Psychology; Pawlik, K., Rosenzweig, M.R., Eds.; Sage Publications: London, UK, 2000; pp. 207–222. [Google Scholar]
- Magai, C. Personality theory: Birth, death, and transfiguration. In Emotion: Interdisciplinary Perspectives; Kavanaugh, R.D., Zimmerberg, B., Fein, S., Eds.; Lawrence Erlbaum Associates, Inc.: Mahwah, NJ, USA, 1996; Volume 1, pp. 171–201. [Google Scholar]
- Keltner, D.; Oatley, K.; Jenkins, J.M. Understanding Emotions; Wiley: Hoboken, NJ, USA, 2014. [Google Scholar]
- Scherer, K.R. Emotion. In Introduction to Social Psychology: A European perspective, 3rd ed.; Hewstone, M., Stroebe, W., Eds.; Blackwell Publishing Ltd.: Oxford, UK, 2001; Chapter 6; pp. 151–195. [Google Scholar]
- Fredrickson, B.L. The role of positive emotions in positive psychology: The broaden-and-build theory of positive emotions. Am. Psychol. 2001, 56, 218–226. [Google Scholar] [CrossRef]
- Rosenzweig, M.R.; Liang, K.C. Psychology in Biological Perspective. In The International Handbook of Psychology; Pawlik, K., Rosenzweig, M.R., Eds.; Sage Publications: London, UK, 2000; pp. 54–75. [Google Scholar]
- Shuman, V.; Scherer, K.R. Psychological Structure of Emotions. In International Encyclopedia of the Social & Behavioral Sciences; Wright, J.D., Ed.; Elsevier Ltd.: Waltham, MA, USA, 2015; Volume 7, pp. 526–533. [Google Scholar]
- Koolagudi, S.G.; Rao, K.S. Emotion recognition from speech: A review. Int. J. Speech Technol. 2012, 15, 99–117. [Google Scholar] [CrossRef]
- Ko, B.C. A brief review of facial emotion recognition based on visual information. Sensors 2018, 18, 401. [Google Scholar] [CrossRef] [PubMed]
- Glowinski, D.; Dael, N.; Camurri, A.; Volpe, G.; Mortillaro, M.; Scherer, K. Toward a minimal representation of affective gestures. IEEE Trans. Affect. Comput. 2011, 2, 106–118. [Google Scholar] [CrossRef]
- Horlings, R.; Datcu, D.; Rothkrantz, L.J. Emotion recognition using brain activity. In Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, Gabrovo, Bulgaria, 12–13 June 2008; p. II–1. [Google Scholar]
- Monajati, M.; Abbasi, S.H.; Shabaninia, F.; Shamekhi, S. Emotions states recognition based on physiological parameters by employing of fuzzy-adaptive resonance theory. Int. J. Intell. Sci. 2012, 2, 24190. [Google Scholar] [CrossRef]
- Kim, M.Y.; Bigman, Y.; Tamir, M. Emotional regulation. In International Encyclopedia of the Social & Behavioral Sciences, 2nd ed.; Wright, J.D., Ed.; Elsevier Ltd.: Waltham, MA, USA, 2015; Volume 7, pp. 452–456. [Google Scholar]
- Scherer, K.R. What are emotions? And how can they be measured? Soc. Sci. Inf. 2005, 44, 695–729. [Google Scholar] [CrossRef]
- Santhoshkumar, R.; Geetha, M.K. Deep learning approach for emotion recognition from human body movements with feedforward deep convolution neural networks. Procedia Comput. Sci. 2019, 152, 158–165. [Google Scholar] [CrossRef]
- Hassouneh, A.; Mutawa, A.; Murugappan, M. Development of a real-time emotion recognition system using facial expressions and EEG based on machine learning and deep neural network methods. Inform. Med. Unlocked 2020, 20, 100372. [Google Scholar] [CrossRef]
- Shah, M.; Cooper, D.G.; Cao, H.; Gur, R.C.; Nenkova, A.; Verma, R. Action unit models of facial expression of emotion in the presence of speech. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, Geneva, Switzerland, 2–5 September 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 49–54. [Google Scholar]
- Liebrucks, A. The Concept of Social Construction. Theory Psychol. 2001, 11, 363–391. [Google Scholar] [CrossRef]
- Scherer, K.R. Appraisal Theory. In Handbook of Cognition and Emotion; Dalgleish, T., Power, M.J., Eds.; John Wiley & Sons Ltd.: Chichester, West Sussex, UK, 1999; pp. 637–663. [Google Scholar]
- Le Ngo, A.C.; See, J.; Phan, R.C.W. Sparsity in Dynamics of Spontaneous Subtle Emotions: Analysis and Application. IEEE Trans. Affect. Comput. 2017, 8, 396–411. [Google Scholar] [CrossRef]
- Fang, X.; Sauter, D.A.; Van Kleef, G.A. Seeing Mixed Emotions: The Specificity of Emotion Perception From Static and Dynamic Facial Expressions Across Cultures. J. Cross-Cult. Psychol. 2018, 49, 130–148. [Google Scholar] [CrossRef] [PubMed]
- Tan, C.B.; Sheppard, E.; Stephen, I.D. A change in strategy: Static emotion recognition in Malaysian Chinese. Cogent Psychol. 2015, 2, 1085941. [Google Scholar] [CrossRef]
- Schmid, P.C.; Schmid Mast, M. Mood effects on emotion recognition. Motiv. Emot. 2010, 34, 288–292. [Google Scholar] [CrossRef]
- Jack, R.E.; Garrod, O.G.; Yu, H.; Caldara, R.; Schyns, P.G. Facial expressions of emotion are not culturally universal. Proc. Natl. Acad. Sci. USA 2012, 109, 7241–7244. [Google Scholar] [CrossRef] [PubMed]
- Grainger, S.A.; Henry, J.D.; Phillips, L.H.; Vanman, E.J.; Allen, R. Age deficits in facial affect recognition: The influence of dynamic cues. J. Gerontol. Ser. B: Psychol. Sci. Soc. Sci. 2017, 72, 622–632. [Google Scholar] [CrossRef]
- Martinez, A.M. Visual perception of facial expressions of emotion. Curr. Opin. Psychol. 2017, 17, 27–33. [Google Scholar] [CrossRef]
- Holland, C.A.; Ebner, N.C.; Lin, T.; Samanez-Larkin, G.R. Emotion identification across adulthood using the Dynamic FACES database of emotional expressions in younger, middle aged, and older adults. Cogn. Emot. 2019, 33, 245–257. [Google Scholar] [CrossRef]
- Barrett, L.F.; Adolphs, R.; Marsella, S.; Martinez, A.M.; Pollak, S.D. Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychol. Sci. Public Interest 2019, 20, 1–68. [Google Scholar] [CrossRef]
- Khosdelazad, S.; Jorna, L.S.; McDonald, S.; Rakers, S.E.; Huitema, R.B.; Buunk, A.M.; Spikman, J.M. Comparing static and dynamic emotion recognition tests: Performance of healthy participants. PLoS ONE 2020, 15, e0241297. [Google Scholar] [CrossRef]
- Krumhuber, E.G.; Kappas, A.; Manstead, A.S. Effects of dynamic aspects of facial expressions: A review. Emot. Rev. 2013, 5, 41–46. [Google Scholar] [CrossRef]
- Kamachi, M.; Bruce, V.; Mukaida, S.; Gyoba, J.; Yoshikawa, S.; Akamatsu, S. Dynamic properties influence the perception of facial expressions. Perception 2013, 42, 1266–1278. [Google Scholar] [CrossRef] [PubMed]
- Bassili, J.N. Facial motion in the perception of faces and of emotional expression. J. Exp. Psychol. Hum. Percept. Perform. 1978, 4, 373. [Google Scholar] [CrossRef] [PubMed]
- Namba, S.; Kabir, R.S.; Miyatani, M.; Nakao, T. Dynamic displays enhance the ability to discriminate genuine and posed facial expressions of emotion. Front. Psychol. 2018, 9, 672. [Google Scholar] [CrossRef] [PubMed]
- Sato, W.; Krumhuber, E.G.; Jellema, T.; Williams, J.H. Dynamic emotional communication. Front. Psychol. 2019, 10, 2836. [Google Scholar] [CrossRef]
- Ghorbanali, A.; Sohrabi, M.K. A comprehensive survey on deep learning-based approaches for multimodal sentiment analysis. Artif. Intell. Rev. 2023, 56, 1479–1512. [Google Scholar] [CrossRef]
- Ahmed, N.; Al Aghbari, Z.; Girija, S. A systematic survey on multimodal emotion recognition using learning algorithms. Intell. Syst. Appl. 2023, 17, 200171. [Google Scholar] [CrossRef]
- Zhang, S.; Yang, Y.; Chen, C.; Zhang, X.; Leng, Q.; Zhao, X. Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects. Expert Syst. Appl. 2023, 237, 121692. [Google Scholar] [CrossRef]
- Pan, B.; Hirota, K.; Jia, Z.; Dai, Y. A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods. Neurocomputing 2023, 561, 126866. [Google Scholar] [CrossRef]
- Gladys, A.A.; Vetriselvi, V. Survey on multimodal approaches to emotion recognition. Neurocomputing 2023, 556, 126693. [Google Scholar] [CrossRef]
- Ezzameli, K.; Mahersia, H. Emotion recognition from unimodal to multimodal analysis: A review. Inf. Fusion 2023, 99, 101847. [Google Scholar] [CrossRef]
- Singh, U.; Abhishek, K.; Azad, H.K. A Survey of Cutting-edge Multimodal Sentiment Analysis. ACM Comput. Surv. 2024, 56, 1–38. [Google Scholar] [CrossRef]
- Hazmoune, S.; Bougamouza, F. Using transformers for multimodal emotion recognition: Taxonomies and state of the art review. Eng. Appl. Artif. Intell. 2024, 133, 108339. [Google Scholar] [CrossRef]
- Liu, H.; Lou, T.; Zhang, Y.; Wu, Y.; Xiao, Y.; Jensen, C.S.; Zhang, D. EEG-based multimodal emotion recognition: A machine learning perspective. IEEE Trans. Instrum. Meas. 2024, 73, 4003729. [Google Scholar] [CrossRef]
- Khan, U.A.; Xu, Q.; Liu, Y.; Lagstedt, A.; Alamäki, A.; Kauttonen, J. Exploring contactless techniques in multimodal emotion recognition: Insights into diverse applications, challenges, solutions, and prospects. Multimed. Syst. 2024, 30, 115. [Google Scholar] [CrossRef]
- Kalateh, S.; Estrada-Jimenez, L.A.; Hojjati, S.N.; Barata, J. A Systematic Review on Multimodal Emotion Recognition: Building Blocks, Current State, Applications, and Challenges. IEEE Access 2024, 12, 103976–104019. [Google Scholar] [CrossRef]
- Poria, S.; Cambria, E.; Bajpai, R.; Hussain, A. A review of affective computing: From unimodal analysis to multimodal fusion. Inf. Fusion 2017, 37, 98–125. [Google Scholar] [CrossRef]
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE Technical Report EBSE-2007-01; Keele University and University of Durham: Keele, UK; Durham, UK, 2007. [Google Scholar]
- Bosse, T. On computational models of emotion regulation and their applications within HCI. In Emotions and Affect in Human Factors and Human-Computer Interaction; Elsevier: Amsterdam, The Netherlands, 2017; pp. 311–337. [Google Scholar]
- Scherer, K.R. Psychological Structure of Emotions. In International Encyclopedia of the Social & Behavioral Sciences; Smelser, N.J., Baltes, P.B., Eds.; Elsevier Ltd.: Amsterdam, The Netherlands, 2001; pp. 4472–4477. [Google Scholar] [CrossRef]
- Ekman, P. An argument for basic emotions. Cogn. Emot. 1992, 6, 169–200. [Google Scholar] [CrossRef]
- Ekman, P.; Davidson, R.J. (Eds.) The Nature of Emotion: Fundamental Questions; Oxford University Press: Oxford, UK, 1994. [Google Scholar]
- Shaver, P.; Schwartz, J.; Kirson, D.; O’Connor, C. Emotion knowledge: Further exploration of a prototype approach. J. Personal. Soc. Psychol. 1987, 52, 1061–1086. [Google Scholar] [CrossRef]
- Cowen, A.S.; Keltner, D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl. Acad. Sci. USA 2017, 114, E7900–E7909. [Google Scholar] [CrossRef]
- Oatley, K.; Johnson-laird, P.N. Towards a Cognitive Theory of Emotions. Cogn. Emot. 1987, 1, 29–50. [Google Scholar] [CrossRef]
- Zeng, Z.; Pantic, M.; Roisman, G.I.; Huang, T.S. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 39–58. [Google Scholar] [CrossRef] [PubMed]
- Plutchik, R. The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am. Sci. 2001, 89, 344–350. [Google Scholar] [CrossRef]
- Plutchik, R. A psychoevolutionary theory of emotions. Soc. Sci. Inf. 1982, 21, 529–553. [Google Scholar] [CrossRef]
- Russell, J.A.; Mehrabian, A. Evidence for a three-factor theory of emotions. J. Res. Personal. 1977, 11, 273–294. [Google Scholar] [CrossRef]
- Mehrabian, A. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Curr. Psychol. 1996, 14, 261–292. [Google Scholar] [CrossRef]
- Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
- Whissell, C.M. The dictionary of affect in language. In The Measurement of Emotions; Elsevier: Amsterdam, The Netherlands, 1989; Chapter 5; pp. 113–131. [Google Scholar]
- Ortony, A.; Clore, G.L.; Collins, A. The Cognitive Structure of Emotions; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
- Lövheim, H. A new three-dimensional model for emotions and monoamine neurotransmitters. Med. Hypotheses 2012, 78, 341–348. [Google Scholar] [CrossRef]
- Cambria, E.; Livingstone, A.; Hussain, A. The Hourglass of Emotions. In Cognitive Behavioural Systems; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7403, pp. 144–157. [Google Scholar] [CrossRef]
- Susanto, Y.; Livingstone, A.G.; Ng, B.C.; Cambria, E. The Hourglass Model Revisited. IEEE Intell. Syst. 2020, 35, 96–102. [Google Scholar] [CrossRef]
- Fontaine, J.R.; Scherer, K.R.; Roesch, E.B.; Ellsworth, P.C. The world of emotions is not two-dimensional. Psychol. Sci. 2007, 18, 1050–1057. [Google Scholar] [CrossRef]
- Cochrane, T. Eight dimensions for the emotions. Soc. Sci. Inf. 2009, 48, 379–420. [Google Scholar] [CrossRef]
- Liu, Y.; Fu, Q.; Fu, X. The interaction between cognition and emotion. Chin. Sci. Bull. 2009, 54, 4102–4116. [Google Scholar] [CrossRef]
- Lee, Y.; Seo, Y.; Lee, Y.; Lee, D. Dimensional emotions are represented by distinct topographical brain networks. Int. J. Clin. Health Psychol. 2023, 23, 100408. [Google Scholar] [CrossRef]
- Mauss, I.B.; Robinson, M.D. Measures of emotion: A review. Cogn. Emot. 2009, 23, 209–237. [Google Scholar] [CrossRef] [PubMed]
- Kahou, S.E.; Bouthillier, X.; Lamblin, P.; Gulcehre, C.; Michalski, V.; Konda, K.; Jean, S.; Froumenty, P.; Dauphin, Y.; Boulanger-Lewandowski, N.; et al. Emonets: Multimodal deep learning approaches for emotion recognition in video. J. Multimodal User Interfaces 2016, 10, 99–111. [Google Scholar] [CrossRef]
- Davison, A.K.; Merghani, W.; Yap, M.H. Objective Classes for Micro-Facial Expression Recognition. J. Imaging 2018, 4, 119. [Google Scholar] [CrossRef]
- Mehrabian, A. Communication without words. In Communication Theory; Routledge: London, UK, 2017; Chapter 13; pp. 193–200. [Google Scholar]
- Wolfkühler, W.; Majorek, K.; Tas, C.; Küper, C.; Saimed, N.; Juckel, G.; Brüne, M. Emotion recognition in pictures of facial affect: Is there a difference between forensic and non-forensic patients with schizophrenia? Eur. J. Psychiatry 2012, 26, 73–85. [Google Scholar] [CrossRef]
- Yan, W.J.; Wu, Q.; Liang, J.; Chen, Y.H.; Fu, X. How fast are the leaked facial expressions: The duration of micro-expressions. J. Nonverbal Behav. 2013, 37, 217–230. [Google Scholar] [CrossRef]
- Porter, S.; ten Brinke, L. Reading between the lies: Identifying concealed and falsified emotions in universal facial expressions. Psychol. Sci. 2008, 19, 508–514. [Google Scholar] [CrossRef]
- Ekman, P. Darwin, deception, and facial expression. Ann. N. Y. Acad. Sci. 2003, 1000, 205–221. [Google Scholar] [CrossRef]
- Ekman, P. Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage; W.W. Norton & Company Inc.: New York, NY, USA, 2009. [Google Scholar]
- Ekman, P. Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life; Times Books, Henry Holt and Company: New York, NY, USA, 2003. [Google Scholar]
- Porter, S.; Ten Brinke, L.; Wallace, B. Secrets and lies: Involuntary leakage in deceptive facial expressions as a function of emotional intensity. J. Nonverbal Behav. 2012, 36, 23–37. [Google Scholar] [CrossRef]
- Frank, M.; Herbasz, M.; Sinuk, K.; Keller, A.; Nolan, C. I see how you feel: Training laypeople and professionals to recognize fleeting emotions. In Proceedings of the Annual Meeting of the International Communication Association, Sheraton New York, New York, NY, USA, 21–25 May 2009; pp. 1–35. [Google Scholar]
- Ekman, P.; Freisen, W.V.; Ancoli, S. Facial signs of emotional experience. J. Personal. Soc. Psychol. 1980, 39, 1125. [Google Scholar] [CrossRef]
- Rosenberg, E.L.; Ekman, P. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS); Oxford University Press: Oxford, UK, 2020. [Google Scholar]
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Ghimire, D.; Lee, J. Geometric feature-based facial expression recognition in image sequences using multi-class Adaboost and support vector machines. Sensors 2013, 13, 7714–7734. [Google Scholar] [CrossRef] [PubMed]
- Murugappan, M.; Mutawa, A. Facial geometric feature extraction based emotional expression classification using machine learning algorithms. PLoS ONE 2021, 16, e0247131. [Google Scholar]
- López-Gil, J.M.; Garay-Vitoria, N. Photogram classification-based emotion recognition. IEEE Access 2021, 9, 136974–136984. [Google Scholar] [CrossRef]
- Rivera, A.R.; Castillo, J.R.; Chae, O.O. Local directional number pattern for face analysis: Face and expression recognition. IEEE Trans. Image Process. 2013, 22, 1740–1752. [Google Scholar] [CrossRef]
- Moore, S.; Bowden, R. Local binary patterns for multi-view facial expression recognition. Comput. Vis. Image Underst. 2011, 115, 541–558. [Google Scholar] [CrossRef]
- Mistry, K.; Zhang, L.; Neoh, S.C.; Jiang, M.; Hossain, A.; Lafon, B. Intelligent Appearance and shape based facial emotion recognition for a humanoid robot. In Proceedings of the 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), Dhaka, Bangladesh, 18–20 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–8. [Google Scholar]
- Yang, G.; Ortoneda, J.S.Y.; Saniie, J. Emotion Recognition Using Deep Neural Network with Vectorized Facial Features. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 0318–0322. [Google Scholar]
- Nguyen, H.D.; Yeom, S.; Lee, G.S.; Yang, H.J.; Na, I.S.; Kim, S.H. Facial Emotion Recognition Using an Ensemble of Multi-Level Convolutional Neural Networks. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940015. [Google Scholar]
- Agrawal, E.; Christopher, J. Emotion recognition from periocular features. In Proceedings of the Second International Conference on Machine Learning, Image Processing, Network Security and Data Sciences (MIND 2020), Silchar, India, 30–31 July 2020; Springer: Berlin/Heidelberg, Germany, 2020. Part I. pp. 194–208. [Google Scholar]
- Dirik, M. Optimized ANFIS model with hybrid metaheuristic algorithms for facial emotion recognition. Int. J. Fuzzy Syst. 2023, 25, 485–496. [Google Scholar] [CrossRef]
- Pfister, T.; Li, X.; Zhao, G.; Pietikäinen, M. Recognising spontaneous facial micro-expressions. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1449–1456. [Google Scholar]
- Wang, Y.; See, J.; Phan, R.C.W.; Oh, Y.H. LBP with Six Intersection Points: Reducing Redundant Information in LBP-TOP for Micro-expression Recognition. In Computer Vision—Asian Conference on Computer Vision ACCV 2014; Cremers, D., Reid, I., Saito, H., Yang, M.H., Eds.; Springer International Publishing: Cham, Switzerland, 2015; Volume 9003, pp. 525–537. [Google Scholar] [CrossRef]
- Huang, X.; Wang, S.J.; Zhao, G.; Pietikainen, M. Facial Micro-Expression Recognition Using Spatiotemporal Local Binary Pattern with Integral Projection. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 1–9. [Google Scholar] [CrossRef]
- Wang, Y.; See, J.; Phan, R.C.W.; Oh, Y.H. Efficient spatio-temporal local binary patterns for spontaneous facial micro-expression recognition. PLoS ONE 2015, 10, e0124674. [Google Scholar]
- Li, X.; Pfister, T.; Huang, X.; Zhao, G.; Pietikäinen, M. A spontaneous micro-expression database: Inducement, collection and baseline. In Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–6. [Google Scholar]
- Wang, Y.; See, J.; Oh, Y.H.; Phan, R.C.W.; Rahulamathavan, Y.; Ling, H.C.; Tan, S.W.; Li, X. Effective recognition of facial micro-expressions with video motion magnification. Multimed. Tools Appl. 2017, 76, 21665–21690. [Google Scholar] [CrossRef]
- Li, X.; Hong, X.; Moilanen, A.; Huang, X.; Pfister, T.; Zhao, G.; Pietikäinen, M. Towards Reading Hidden Emotions: A Comparative Study of Spontaneous Micro-Expression Spotting and Recognition Methods. IEEE Trans. Affect. Comput. 2018, 9, 563–577. [Google Scholar] [CrossRef]
- Zhao, G.; Pietikainen, M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 915–928. [Google Scholar] [CrossRef]
- Park, S.Y.; Lee, S.H.; Ro, Y.M. Subtle facial expression recognition using adaptive magnification of discriminative facial motion. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 911–914. [Google Scholar]
- Shreve, M.; Godavarthy, S.; Goldgof, D.; Sarkar, S. Macro- and micro-expression spotting in long videos using spatio-temporal strain. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 51–56. [Google Scholar]
- Liong, S.T.; See, J.; Phan, R.C.W.; Le Ngo, A.C.; Oh, Y.H.; Wong, K. Subtle Expression Recognition Using Optical Strain Weighted Features. In Computer Vision—ACCV 2014 Workshops; Jawahar, C., Shan, S., Eds.; Springer International Publishing: Cham, Switzerland, 2015; Volume 9009, pp. 644–657. [Google Scholar] [CrossRef]
- Liong, S.T.; See, J.; Phan, R.C.W.; Oh, Y.H.; Le Ngo, A.C.; Wong, K.; Tan, S.W. Spontaneous subtle expression detection and recognition based on facial strain. Signal Process. Image Commun. 2016, 47, 170–182. [Google Scholar] [CrossRef]
- Liong, S.T.; See, J.; Wong, K.; Phan, R.C.W. Less is more: Micro-expression recognition from video using apex frame. Signal Process. Image Commun. 2018, 62, 82–92. [Google Scholar] [CrossRef]
- Xu, F.; Zhang, J.; Wang, J.Z. Microexpression Identification and Categorization Using a Facial Dynamics Map. IEEE Trans. Affect. Comput. 2017, 8, 254–267. [Google Scholar] [CrossRef]
- Zheng, H.; Geng, X.; Yang, Z. A Relaxed K-SVD Algorithm for Spontaneous Micro-Expression Recognition. In PRICAI 2016: Trends in Artificial Intelligence; Booth, R., Zhang, M.L., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9810, pp. 692–699. [Google Scholar] [CrossRef]
- Le Ngo, A.C.; Phan, R.C.W.; See, J. Spontaneous subtle expression recognition: Imbalanced databases and solutions. In Proceedings of the 12th Asian Conference on Computer Vision (ACCV), Singapore, 1–5 November 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 33–48. [Google Scholar]
- Oh, Y.H.; Le Ngo, A.C.; See, J.; Liong, S.T.; Phan, R.C.W.; Ling, H.C. Monogenic Riesz wavelet representation for micro-expression recognition. In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1237–1241. [Google Scholar]
- Huang, X.; Wang, S.J.; Liu, X.; Zhao, G.; Feng, X.; Pietikainen, M. Discriminative Spatiotemporal Local Binary Pattern with Revisited Integral Projection for Spontaneous Facial Micro-Expression Recognition. IEEE Trans. Affect. Comput. 2019, 10, 32–47. [Google Scholar] [CrossRef]
- Peng, M.; Wang, C.; Chen, T.; Liu, G.; Fu, X. Dual Temporal Scale Convolutional Neural Network for Micro-Expression Recognition. Front. Psychol. 2017, 8, 1745. [Google Scholar] [CrossRef] [PubMed]
- Zhou, L.; Mao, Q.; Xue, L. Cross-database micro-expression recognition: A style aggregated and attention transfer approach. In Proceedings of the 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shanghai, China, 8–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 102–107. [Google Scholar]
- Belaiche, R.; Liu, Y.; Migniot, C.; Ginhac, D.; Yang, F. Cost-effective CNNs for real-time micro-expression recognition. Appl. Sci. 2020, 10, 4959. [Google Scholar] [CrossRef]
- Liu, Y.; Du, H.; Zheng, L.; Gedeon, T. A neural micro-expression recognizer. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4. [Google Scholar]
- Avent, R.R.; Ng, C.T.; Neal, J.A. Machine vision recognition of facial affect using backpropagation neural networks. In Proceedings of the 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Baltimore, MD, USA, 3–6 November 1994; IEEE: Piscataway, NJ, USA, 1994; Volume 2, pp. 1364–1365. [Google Scholar]
- Gargesha, M.; Kuchi, P.; Torkkola, I. Facial expression recognition using artificial neural networks. Artif. Neural Comput. Syst. 2002, 8, 1–6. [Google Scholar]
- Bartlett, M.S.; Littlewort, G.; Frank, M.; Lainscsek, C.; Fasel, I.; Movellan, J. Recognizing facial expression: Machine learning and application to spontaneous behavior. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 2, pp. 568–573. [Google Scholar]
- Guo, Y.; Tao, D.; Yu, J.; Xiong, H.; Li, Y.; Tao, D. Deep neural networks with relativity learning for facial expression recognition. In Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA, USA, 11–15 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
- Zhang, K.; Huang, Y.; Du, Y.; Wang, L. Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 2017, 26, 4193–4203. [Google Scholar] [CrossRef]
- Wang, S.H.; Phillips, P.; Dong, Z.C.; Zhang, Y.D. Intelligent facial emotion recognition based on stationary wavelet entropy and Jaya algorithm. Neurocomputing 2018, 272, 668–676. [Google Scholar] [CrossRef]
- Ruiz-Garcia, A.; Elshaw, M.; Altahhan, A.; Palade, V. A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots. Neural Comput. Appl. 2018, 29, 359–373. [Google Scholar] [CrossRef]
- Caroppo, A.; Leone, A.; Siciliano, P. Comparison between deep learning models and traditional machine learning approaches for facial expression recognition in ageing adults. J. Comput. Sci. Technol. 2020, 35, 1127–1146. [Google Scholar] [CrossRef]
- Khanbebin, S.N.; Mehrdad, V. Improved convolutional neural network-based approach using hand-crafted features for facial expression recognition. Multimed. Tools Appl. 2023, 82, 11489–11505. [Google Scholar] [CrossRef]
- Boughanem, H.; Ghazouani, H.; Barhoumi, W. Multichannel convolutional neural network for human emotion recognition from in-the-wild facial expressions. Vis. Comput. 2023, 39, 5693–5718. [Google Scholar] [CrossRef]
- Arabian, H.; Abdulbaki Alshirbaji, T.; Chase, J.G.; Moeller, K. Emotion Recognition beyond Pixels: Leveraging Facial Point Landmark Meshes. Appl. Sci. 2024, 14, 3358. [Google Scholar] [CrossRef]
- Kim, D.H.; Baddar, W.J.; Ro, Y.M. Micro-Expression Recognition with Expression-State Constrained Spatio-Temporal Feature Representations. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 382–386. [Google Scholar] [CrossRef]
- Khor, H.Q.; See, J.; Phan, R.C.W.; Lin, W. Enriched Long-Term Recurrent Convolutional Network for Facial Micro-Expression Recognition. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 667–674. [Google Scholar] [CrossRef]
- Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the BMVC 2015-Proceedings of the British Machine Vision Conference 2015, Swansea, UK, 7–10 September 2015. [Google Scholar]
- Zhou, L.; Mao, Q.; Xue, L. Dual-inception network for cross-database micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar]
- Wang, C.; Peng, M.; Bi, T.; Chen, T. Micro-attention for micro-expression recognition. Neurocomputing 2020, 410, 354–362. [Google Scholar] [CrossRef]
- Gan, Y.S.; Liong, S.T.; Yau, W.C.; Huang, Y.C.; Tan, L.K. OFF-ApexNet on micro-expression recognition system. Signal Process. Image Commun. 2019, 74, 129–139. [Google Scholar] [CrossRef]
- Xia, Z.; Feng, X.; Hong, X.; Zhao, G. Spontaneous facial micro-expression recognition via deep convolutional network. In Proceedings of the 2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA), Xi’an, China, 7–10 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
- Liong, S.T.; Gan, Y.S.; See, J.; Khor, H.Q.; Huang, Y.C. Shallow triple stream three-dimensional cnn (ststnet) for micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar]
- Li, J.; Wang, Y.; See, J.; Liu, W. Micro-expression recognition based on 3D flow convolutional neural network. Pattern Anal. Appl. 2019, 22, 1331–1339. [Google Scholar] [CrossRef]
- Wu, C.; Guo, F. TSNN: Three-Stream Combining 2D and 3D Convolutional Neural Network for Micro-Expression Recognition. IEEJ Trans. Electr. Electron. Eng. 2021, 16, 98–107. [Google Scholar] [CrossRef]
- Peng, M.; Wang, C.; Bi, T.; Shi, Y.; Zhou, X.; Chen, T. A novel apex-time network for cross-dataset micro-expression recognition. In Proceedings of the 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), Cambridge, UK, 3–6 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
- Van Quang, N.; Chun, J.; Tokuyama, T. CapsuleNet for micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–7. [Google Scholar]
- Xie, H.X.; Lo, L.; Shuai, H.H.; Cheng, W.H. AU-assisted graph attention convolutional network for micro-expression recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2871–2880. [Google Scholar]
- Polikovsky, S.; Kameda, Y.; Ohta, Y. Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor. In Proceedings of the 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP 2009), London, UK, 3 December 2009. [Google Scholar]
- Warren, G.; Schertler, E.; Bull, P. Detecting deception from emotional and unemotional cues. J. Nonverbal Behav. 2009, 33, 59–69. [Google Scholar] [CrossRef]
- Lyons, M.J. “Excavating AI” Re-excavated: Debunking a Fallacious Account of the JAFFE Dataset. arXiv 2021, arXiv:2107.13998. [Google Scholar] [CrossRef]
- Yin, L.; Wei, X.; Sun, Y.; Wang, J.; Rosato, M.J. A 3D facial expression database for facial behavior research. In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 10–12 April 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 211–216. [Google Scholar]
- Li, S.; Deng, W.; Du, J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2584–2593. [Google Scholar]
- Goeleven, E.; De Raedt, R.; Leyman, L.; Verschuere, B. The Karolinska directed emotional faces: A validation study. Cogn. Emot. 2008, 22, 1094–1118. [Google Scholar] [CrossRef]
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 94–101. [Google Scholar]
- Aifanti, N.; Papachristou, C.; Delopoulos, A. The MUG facial expression database. In Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 10), Desenzano del Garda, Italy, 12–14 April 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–4. [Google Scholar]
- Chen, L.F.; Yen, Y.S. Taiwanese Facial Expression Image Database; Brain Mapping Laboratory, Institute of Brain Science, National Yang-Ming University: Taipei, Taiwan, 2007. [Google Scholar]
- Langner, O.; Dotsch, R.; Bijlstra, G.; Wigboldus, D.H.; Hawk, S.T.; Van Knippenberg, A. Presentation and validation of the Radboud Faces Database. Cogn. Emot. 2010, 24, 1377–1388. [Google Scholar] [CrossRef]
- Yan, W.J.; Li, X.; Wang, S.J.; Zhao, G.; Liu, Y.J.; Chen, Y.H.; Fu, X. CASME II: An Improved Spontaneous Micro-Expression Database and the Baseline Evaluation. PLoS ONE 2014, 9, e86041. [Google Scholar] [CrossRef]
- Davison, A.K.; Lansley, C.; Costen, N.; Tan, K.; Yap, M.H. SAMM: A Spontaneous Micro-Facial Movement Dataset. IEEE Trans. Affect. Comput. 2018, 9, 116–129. [Google Scholar] [CrossRef]
- Piana, S.; Staglianò, A.; Odone, F.; Verri, A.; Camurri, A. Real-time Automatic Emotion Recognition from Body Gestures. arXiv 2014, arXiv:1402.5047. [Google Scholar]
- Piana, S.; Staglianò, A.; Camurri, A.; Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In Proceedings of the IDGEI International Workshop, Chania, Greece, 14 May 2013; Volume 23. [Google Scholar]
- Noroozi, F.; Corneanu, C.A.; Kamińska, D.; Sapiński, T.; Escalera, S.; Anbarjafari, G. Survey on emotional body gesture recognition. IEEE Trans. Affect. Comput. 2018, 12, 505–523. [Google Scholar] [CrossRef]
- Zacharatos, H.; Gatzoulis, C.; Chrysanthou, Y.L. Automatic emotion recognition based on body movement analysis: A survey. IEEE Comput. Graph. Appl. 2014, 34, 35–45. [Google Scholar] [CrossRef] [PubMed]
- Ly, S.T.; Lee, G.S.; Kim, S.H.; Yang, H.J. Emotion recognition via body gesture: Deep learning model coupled with keyframe selection. In Proceedings of the 2018 International Conference on Machine Learning and Machine Intelligence (MLMI2018), Hanoi, Vietnam, 28–30 September 2018; pp. 27–31. [Google Scholar]
- Liu, X.; Shi, H.; Chen, H.; Yu, Z.; Li, X.; Zhao, G. iMiGUE: An identity-free video dataset for micro-gesture understanding and emotion analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 10626–10637. [Google Scholar]
- Wu, J.; Zhang, Y.; Sun, S.; Li, Q.; Zhao, X. Generalized zero-shot emotion recognition from body gestures. Appl. Intell. 2022, 52, 8616–8634. [Google Scholar] [CrossRef]
- Ekman, P.; Keltner, D. Universal facial expressions of emotion. Calif. Ment. Health Res. Dig. 1970, 8, 151–158. [Google Scholar]
- Kerkeni, L.; Serrestou, Y.; Mbarki, M.; Raoof, K.; Mahjoub, M.A.; Cleder, C. Automatic speech emotion recognition using machine learning. In Social Media and Machine Learning; IntechOpen: Rijeka, Croatia, 2019. [Google Scholar]
- Murray, I.R.; Arnott, J.L. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. J. Acoust. Soc. Am. 1993, 93, 1097–1108. [Google Scholar] [CrossRef]
- Poria, S.; Cambria, E.; Hussain, A.; Huang, G.B. Towards an intelligent framework for multimodal affective data analysis. Neural Netw. 2015, 63, 104–116. [Google Scholar] [CrossRef] [PubMed]
- Kamińska, D.; Sapiński, T.; Anbarjafari, G. Efficiency of chosen speech descriptors in relation to emotion recognition. EURASIP J. Audio Speech Music Process. 2017, 2017, 3. [Google Scholar] [CrossRef]
- Vogt, T.; André, E. Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 6 July 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 474–477. [Google Scholar]
- Devillers, L.; Vidrascu, L.; Lamel, L. Challenges in real-life emotion annotation and machine learning based detection. Neural Netw. 2005, 18, 407–422. [Google Scholar] [CrossRef]
- Burkhardt, F.; Paeschke, A.; Rolfes, M.; Sendlmeier, W.F.; Weiss, B. A database of German emotional speech. Interspeech 2005, 5, 1517–1520. [Google Scholar]
- Adigwe, A.; Tits, N.; Haddad, K.E.; Ostadabbas, S.; Dutoit, T. The emotional voices database: Towards controlling the emotion dimension in voice generation systems. arXiv 2018, arXiv:1806.09514. [Google Scholar]
- You, M.; Chen, C.; Bu, J. CHAD: A Chinese affective database. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction, Beijing, China, 22–24 October 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 542–549. [Google Scholar]
- Palo, H.K.; Mohanty, M.N. Wavelet based feature combination for recognition of emotions. Ain Shams Eng. J. 2018, 9, 1799–1806. [Google Scholar] [CrossRef]
- Kerkeni, L.; Serrestou, Y.; Raoof, K.; Mbarki, M.; Mahjoub, M.A.; Cleder, C. Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO. Speech Commun. 2019, 114, 22–35. [Google Scholar] [CrossRef]
- Nagarajan, S.; Nettimi, S.S.S.; Kumar, L.S.; Nath, M.K.; Kanhe, A. Speech emotion recognition using cepstral features extracted with novel triangular filter banks based on bark and ERB frequency scales. Digit. Signal Process. 2020, 104, 102763. [Google Scholar] [CrossRef]
- Busso, C.; Bulut, M.; Lee, C.C.; Kazemzadeh, A.; Mower, E.; Kim, S.; Chang, J.N.; Lee, S.; Narayanan, S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 2008, 42, 335–359. [Google Scholar] [CrossRef]
- Snyder, D.; Garcia-Romero, D.; Sell, G.; Povey, D.; Khudanpur, S. X-Vectors: Robust DNN Embeddings for Speaker Recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 5329–5333. [Google Scholar] [CrossRef]
- Kumawat, P.; Routray, A. Applying TDNN Architectures for Analyzing Duration Dependencies on Speech Emotion Recognition. In Proceedings of the Interspeech 2021 (ISCA), Brno, Czech Republic, 30 August–3 September 2021; pp. 3410–3414. [Google Scholar] [CrossRef]
- Zhou, S.; Beigi, H. A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition. arXiv 2020, arXiv:2008.02863. [Google Scholar]
- Morais, E.; Hoory, R.; Zhu, W.; Gat, I.; Damasceno, M.; Aronowitz, H. Speech Emotion Recognition using Self-Supervised Features. arXiv 2022, arXiv:2202.03896. [Google Scholar]
- Ahmed, M.R.; Islam, S.; Islam, A.M.; Shatabda, S. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. Expert Syst. Appl. 2023, 218, 119633. [Google Scholar] [CrossRef]
- Nam, H.J.; Park, H.J. Speech Emotion Recognition under Noisy Environments with SNR Down to −6 dB Using Multi-Decoder Wave-U-Net. Appl. Sci. 2024, 14, 5227. [Google Scholar] [CrossRef]
- Alkhamali, E.A.; Allinjawi, A.; Ashari, R.B. Combining Transformer, Convolutional Neural Network, and Long Short-Term Memory Architectures: A Novel Ensemble Learning Technique That Leverages Multi-Acoustic Features for Speech Emotion Recognition in Distance Education Classrooms. Appl. Sci. 2024, 14, 5050. [Google Scholar] [CrossRef]
- Sekkate, S.; Khalil, M.; Adib, A. A statistical feature extraction for deep speech emotion recognition in a bilingual scenario. Multimed. Tools Appl. 2023, 82, 11443–11460. [Google Scholar] [CrossRef]
- Huang, Y.; Tian, K.; Wu, A.; Zhang, G. Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition. J. Ambient Intell. Humaniz. Comput. 2019, 10, 1787–1798. [Google Scholar] [CrossRef]
- Balakrishnan, A.; Rege, A. Reading Emotions from Speech Using Deep Neural Networks; Technical Report; Computer Science Department, Stanford University: Stanford, CA, USA, 2017. [Google Scholar]
- Alu, D.; Zoltan, E.; Stoica, I.C. Voice based emotion recognition with convolutional neural networks for companion robots. Sci. Technol. 2017, 20, 222–240. [Google Scholar]
- Tzirakis, P.; Zhang, J.; Schuller, B.W. End-to-end speech emotion recognition using deep neural networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 5089–5093. [Google Scholar]
- Schuller, B.W. Speech emotion recognition: Two decades in a nutshell, benchmarks, and ongoing trends. Commun. ACM 2018, 61, 90–99. [Google Scholar] [CrossRef]
- Shon, S.; Ali, A.; Glass, J. MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge. In Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 16–20 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 374–380. [Google Scholar]
- Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals. Sensors 2018, 18, 2074. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.W.; Nie, D.; Lu, B.L. Emotional state classification from EEG data using machine learning approach. Neurocomputing 2014, 129, 94–106. [Google Scholar] [CrossRef]
- Hosseinifard, B.; Moradi, M.H.; Rostami, R. Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Comput. Methods Programs Biomed. 2013, 109, 339–345. [Google Scholar] [CrossRef]
- Zhang, Y.; Ji, X.; Zhang, S. An approach to EEG-based emotion recognition using combined feature extraction method. Neurosci. Lett. 2016, 633, 152–157. [Google Scholar] [CrossRef]
- Soroush, M.Z.; Maghooli, K.; Setarehdan, S.K.; Nasrabadi, A.M. A novel method of EEG-based emotion recognition using nonlinear features variability and Dempster–Shafer theory. Biomed. Eng. Appl. Basis Commun. 2018, 30, 1850026. [Google Scholar] [CrossRef]
- Murugappan, M.; Nagarajan, R.; Yaacob, S. Combining spatial filtering and wavelet transform for classifying human emotions using EEG Signals. J. Med. Biol. Eng. 2011, 31, 45–51. [Google Scholar] [CrossRef]
- Jie, X.; Cao, R.; Li, L. Emotion recognition based on the sample entropy of EEG. Bio-Med. Mater. Eng. 2014, 24, 1185–1192. [Google Scholar] [CrossRef]
- Lan, Z.; Sourina, O.; Wang, L.; Liu, Y. Real-time EEG-based emotion monitoring using stable features. Vis. Comput. 2016, 32, 347–358. [Google Scholar] [CrossRef]
- Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 2018, 11, 532–541. [Google Scholar] [CrossRef]
- Gao, Z.; Li, R.; Ma, C.; Rui, L.; Sun, X. Core-brain-network-based multilayer convolutional neural network for emotion recognition. IEEE Trans. Instrum. Meas. 2021, 70, 1–9. [Google Scholar] [CrossRef]
- Yao, L.; Lu, Y.; Wang, M.; Qian, Y.; Li, H. Exploring EEG Emotion Recognition through Complex Networks: Insights from the Visibility Graph of Ordinal Patterns. Appl. Sci. 2024, 14, 2636. [Google Scholar] [CrossRef]
- Álvarez-Jiménez, M.; Calle-Jimenez, T.; Hernández-Álvarez, M. A Comprehensive Evaluation of Features and Simple Machine Learning Algorithms for Electroencephalographic-Based Emotion Recognition. Appl. Sci. 2024, 14, 2228. [Google Scholar] [CrossRef]
- Salama, E.S.; El-Khoribi, R.A.; Shoman, M.E.; Shalaby, M.A.W. EEG-based emotion recognition using 3D convolutional neural networks. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 329–337. [Google Scholar] [CrossRef]
- Kumar, N.; Khaund, K.; Hazarika, S.M. Bispectral analysis of EEG for emotion recognition. Procedia Comput. Sci. 2016, 84, 31–35. [Google Scholar] [CrossRef]
- Quesada Tabares, R.; Molina Cantero, A.J.; Gómez González, I.M.; Merino Monge, M.; Castro García, J.A.; Cabrera Cabrera, R. Emotions Detection based on a Single-electrode EEG Device. In Proceedings of the 4th International Conference on Physiological Computing Systems (PhyCS 2017), Madrid, Spain, 27–28 July 2017; SciTePress: Setubal, Portugal, 2017; pp. 89–95. [Google Scholar]
- Van Dyk, D.A.; Meng, X.L. The art of data augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50. [Google Scholar] [CrossRef]
- Khosrowabadi, R.; Quek, C.; Ang, K.K.; Wahab, A. ERNN: A biologically inspired feedforward neural network to discriminate emotion from EEG signal. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 609–620. [Google Scholar] [CrossRef]
- Antoniou, A.; Storkey, A.; Edwards, H. Data augmentation generative adversarial networks. arXiv 2017, arXiv:1711.04340. [Google Scholar]
- Pan, Z.; Yu, W.; Wang, B.; Xie, H.; Sheng, V.S.; Lei, J.; Kwong, S. Loss functions of generative adversarial networks (GANs): Opportunities and challenges. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 500–522. [Google Scholar] [CrossRef]
- Harper, R.; Southern, J. End-to-end prediction of emotion from heartbeat data collected by a consumer fitness tracker. In Proceedings of the 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), Cambridge, UK, 3–6 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–7. [Google Scholar]
- Sarkar, P.; Etemad, A. Self-supervised learning for ECG-based emotion recognition. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 3217–3221. [Google Scholar]
- Li, L.; Chen, J.H. Emotion recognition using physiological signals. In Proceedings of the International Conference on Artificial Reality and Telexistence, Hangzhou, China, 29 November–1 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 437–446. [Google Scholar]
- Lisetti, C.; Nasoz, F.; LeRouge, C.; Ozyer, O.; Alvarez, K. Developing multimodal intelligent affective interfaces for tele-home health care. Int. J. Hum.-Comput. Stud. 2003, 59, 245–255. [Google Scholar] [CrossRef]
- Lisetti, C.L.; Nasoz, F. MAUI: A multimodal affective user interface. In Proceedings of the Tenth ACM International Conference on Multimedia, Juan-les-Pins, France, 1–6 December 2002; pp. 161–170. [Google Scholar]
- Shimojo, S.; Shams, L. Sensory modalities are not separate modalities: Plasticity and interactions. Curr. Opin. Neurobiol. 2001, 11, 505–509. [Google Scholar] [CrossRef] [PubMed]
- D’mello, S.K.; Kory, J. A review and meta-analysis of multimodal affect detection systems. ACM Comput. Surv. (CSUR) 2015, 47, 1–36. [Google Scholar] [CrossRef]
- Scherer, K.R. Adding the affective dimension: A new look in speech analysis and synthesis. In Proceedings of the ICSLP, Philadelphia, PA, USA, 3–6 October 1996. [Google Scholar]
- Lu, Y.; Zheng, W.L.; Li, B.; Lu, B.L. Combining eye movements and EEG to enhance emotion recognition. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
- Huang, Y.; Yang, J.; Liu, S.; Pan, J. Combining facial expressions and electroencephalography to enhance emotion recognition. Future Internet 2019, 11, 105. [Google Scholar] [CrossRef]
- Van Huynh, T.; Yang, H.J.; Lee, G.S.; Kim, S.H.; Na, I.S. Emotion recognition by integrating eye movement analysis and facial expression model. In Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Da Lat, Vietnam, 25–28 January 2019; pp. 166–169. [Google Scholar]
- Soleymani, M.; Pantic, M.; Pun, T. Multimodal emotion recognition in response to videos. IEEE Trans. Affect. Comput. 2011, 3, 211–223. [Google Scholar] [CrossRef]
- Nguyen, H.D.; Yeom, S.; Oh, I.S.; Kim, K.M.; Kim, S.H. Facial expression recognition using a multi-level convolutional neural network. In Proceedings of the International Conference on Pattern Recognition and Artificial Intelligence, Montréal, Canada, 14–17 May 2018; pp. 217–221. [Google Scholar]
- Li, T.H.; Liu, W.; Zheng, W.L.; Lu, B.L. Classification of five emotions from EEG and eye movement signals: Discrimination ability and stability over time. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 607–610. [Google Scholar]
- Zhao, L.M.; Li, R.; Zheng, W.L.; Lu, B.L. Classification of five emotions from EEG and eye movement signals: Complementary representation properties. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 611–614. [Google Scholar]
- Gu, Y.; Yang, K.; Fu, S.; Chen, S.; Li, X.; Marsic, I. Multimodal affective analysis using hierarchical attention strategy with word-level alignment. Proc. Conf. Assoc. Comput. Linguist. Meet. 2018, 2018, 2225. [Google Scholar]
- Luna-Jiménez, C.; Kleinlein, R.; Griol, D.; Callejas, Z.; Montero, J.M.; Fernández-Martínez, F. A proposal for multimodal emotion recognition using aural transformers and action units on ravdess dataset. Appl. Sci. 2021, 12, 327. [Google Scholar] [CrossRef]
- Simić, N.; Suzić, S.; Milošević, N.; Stanojev, V.; Nosek, T.; Popović, B.; Bajović, D. Enhancing Emotion Recognition through Federated Learning: A Multimodal Approach with Convolutional Neural Networks. Appl. Sci. 2024, 14, 1325. [Google Scholar] [CrossRef]
- Wu, Y.; Daoudi, M.; Amad, A. Transformer-based self-supervised multimodal representation learning for wearable emotion recognition. IEEE Trans. Affect. Comput. 2023, 15, 157–172. [Google Scholar] [CrossRef]
- Li, D.; Liu, J.; Yang, Y.; Hou, F.; Song, H.; Song, Y.; Gao, Q.; Mao, Z. Emotion recognition of subjects with hearing impairment based on fusion of facial expression and EEG topographic map. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 31, 437–445. [Google Scholar] [CrossRef]
- Middya, A.I.; Nag, B.; Roy, S. Deep learning based multimodal emotion recognition using model-level fusion of audio–visual modalities. Knowl.-Based Syst. 2022, 244, 108580. [Google Scholar] [CrossRef]
- Sharafi, M.; Yazdchi, M.; Rasti, R.; Nasimi, F. A novel spatio-temporal convolutional neural framework for multimodal emotion recognition. Biomed. Signal Process. Control 2022, 78, 103970. [Google Scholar] [CrossRef]
- Kang, D.; Kim, D.; Kang, D.; Kim, T.; Lee, B.; Kim, D.; Song, B.C. Beyond superficial emotion recognition: Modality-adaptive emotion recognition system. Expert Syst. Appl. 2024, 235, 121097. [Google Scholar] [CrossRef]
- Selvi, R.; Vijayakumaran, C. An Efficient Multimodal Emotion Identification Using FOX Optimized Double Deep Q-Learning. Wirel. Pers. Commun. 2023, 132, 2387–2406. [Google Scholar] [CrossRef]
- Mocanu, B.; Tapu, R.; Zaharia, T. Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning. Image Vis. Comput. 2023, 133, 104676. [Google Scholar] [CrossRef]
- Shahzad, H.; Bhatti, S.M.; Jaffar, A.; Rashid, M.; Akram, S. Multi-modal CNN Features Fusion for Emotion Recognition: A Modified Xception Model. IEEE Access 2023, 11, 94281–94289. [Google Scholar] [CrossRef]
- Aguilera, A.; Mellado, D.; Rojas, F. An assessment of in-the-wild datasets for multimodal emotion recognition. Sensors 2023, 23, 5184. [Google Scholar] [CrossRef]
- Roshdy, A.; Karar, A.; Kork, S.A.; Beyrouthy, T.; Nait-ali, A. Advancements in EEG Emotion Recognition: Leveraging Multi-Modal Database Integration. Appl. Sci. 2024, 14, 2487. [Google Scholar] [CrossRef]
- Han, X.; Chen, F.; Ban, J. FMFN: A Fuzzy Multimodal Fusion Network for Emotion Recognition in Ensemble Conducting. IEEE Trans. Fuzzy Syst. 2024. [Google Scholar] [CrossRef]
- Wang, Y.; Guan, L.; Venetsanopoulos, A.N. Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition. IEEE Trans. Multimed. 2012, 14, 597–607. [Google Scholar] [CrossRef]
- Xie, Z.; Guan, L. Multimodal information fusion of audiovisual emotion recognition using novel information theoretic tools. In Proceedings of the 2013 IEEE International Conference on Multimedia and Expo (ICME), San Jose, CA, USA, 15–19 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–6. [Google Scholar]
- Bota, P.; Wang, C.; Fred, A.; Silva, H. Emotion assessment using feature fusion and decision fusion classification based on physiological data: Are we there yet? Sensors 2020, 20, 4723. [Google Scholar] [CrossRef]
- Arthanarisamy Ramaswamy, M.P.; Palaniswamy, S. Subject independent emotion recognition using EEG and physiological signals—A comparative study. Appl. Comput. Inform. 2022. [Google Scholar] [CrossRef]
- Douglas-Cowie, E.; Cowie, R.; Schröder, M. A new emotion database: Considerations, sources and scope. In Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, Newcastle, Northern Ireland, UK, 5–7 September 2000; pp. 39–44. [Google Scholar]
- Martin, O.; Kotsia, I.; Macq, B.; Pitas, I. The eNTERFACE'05 Audio-Visual Emotion Database. In Proceedings of the 22nd International Conference on Data Engineering Workshops, Atlanta, GA, USA, 3–7 April 2006; p. 8. [Google Scholar]
- Douglas-Cowie, E.; Cowie, R.; Sneddon, I.; Cox, C.; Lowry, O.; McRorie, M.; Martin, J.C.; Devillers, L.; Abrilian, S.; Batliner, A.; et al. The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data. In International Conference on Affective Computing and Intelligent Interaction; Paiva, A.C.R., Prada, R., Picard, R.W., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4738, pp. 488–500. [Google Scholar] [CrossRef]
- McKeown, G.; Valstar, M.; Cowie, R.; Pantic, M.; Schroder, M. The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent. IEEE Trans. Affect. Comput. 2012, 3, 5–17. [Google Scholar] [CrossRef]
- Livingstone, S.R.; Russo, F.A. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 2018, 13, e0196391. [Google Scholar] [CrossRef] [PubMed]
- Cao, H.; Cooper, D.G.; Keutmann, M.K.; Gur, R.C.; Nenkova, A.; Verma, R. Crema-d: Crowd-sourced emotional multimodal actors dataset. IEEE Trans. Affect. Comput. 2014, 5, 377–390. [Google Scholar] [CrossRef] [PubMed]
- Bänziger, T.; Mortillaro, M.; Scherer, K.R. Introducing the Geneva Multimodal expression corpus for experimental research on emotion perception. Emotion 2012, 12, 1161. [Google Scholar] [CrossRef]
- Busso, C.; Parthasarathy, S.; Burmania, A.; AbdelWahab, M.; Sadoughi, N.; Provost, E.M. MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Trans. Affect. Comput. 2017, 8, 67–80. [Google Scholar] [CrossRef]
- Ngai, W.K.; Xie, H.; Zou, D.; Chou, K.L. Emotion recognition based on convolutional neural networks and heterogeneous bio-signal data sources. Inf. Fusion 2022, 77, 107–117. [Google Scholar] [CrossRef]
- Liu, Y.J.; Zhang, J.K.; Yan, W.J.; Wang, S.J.; Zhao, G.; Fu, X. A main directional mean optical flow feature for spontaneous micro-expression recognition. IEEE Trans. Affect. Comput. 2016, 7, 299–310. [Google Scholar] [CrossRef]
- Xia, Z.; Hong, X.; Gao, X.; Feng, X.; Zhao, G. Spatiotemporal recurrent convolutional networks for recognizing spontaneous micro-expressions. IEEE Trans. Multimed. 2019, 22, 626–640. [Google Scholar] [CrossRef]
- Peng, W.; Hong, X.; Xu, Y.; Zhao, G. A boost in revealing subtle facial expressions: A consolidated eulerian framework. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [Google Scholar]
- Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the 15th European Conference on Computer Vision (ECCV2018), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 501–518. [Google Scholar]
- Nguyen, H.D.; Yeom, S.; Lee, G.S.; Yang, H.J.; Na, I.S.; Kim, S.H. Facial emotion recognition using an ensemble of multi-level convolutional neural networks. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940015. [Google Scholar] [CrossRef]
- Hu, G.; Liu, L.; Yuan, Y.; Yu, Z.; Hua, Y.; Zhang, Z.; Shen, F.; Shao, L.; Hospedales, T.; Robertson, N.; et al. Deep multi-task learning to recognise subtle facial expressions of mental states. In Proceedings of the 15th European Conference on Computer Vision (ECCV2018), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 106–123. [Google Scholar]
- Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning (PMLR 37), Lille, France, 7–9 July 2015; Volume 37, pp. 1180–1189. [Google Scholar]
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. In Domain Adaptation in Computer Vision Applications; Csurka, G., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 189–209. [Google Scholar] [CrossRef]
- Bhattacharya, P.; Gupta, R.K.; Yang, Y. Exploring the contextual factors affecting multimodal emotion recognition in videos. IEEE Trans. Affect. Comput. 2021, 14, 1547–1557. [Google Scholar] [CrossRef]
- Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef]
- Zheng, J. Geometric Deep Learning with 3D Facial Motion. Master’s Thesis, Imperial College London, London, UK, 2019. [Google Scholar]
Query structure: Subexpression 1 AND Subexpression 2 AND Subexpression 3

- Subexpression 1: multimodal OR “multi modal” OR multi-modal OR “multiple modalities” OR multimodality OR multi-modality OR “multiple channels” OR multichannel OR multi-channel OR “multiple sensors” OR multisensor OR multi-sensor OR bimodal OR bi-modal OR bimodality OR bi-modality OR trimodal OR tri-modal OR trimodality OR tri-modality
- Subexpression 2: “emotion analysis” OR “emotion recognition” OR “emotion classification” OR “emotion* detection” OR “emotion computing” OR “emotion sensing” OR “emotion assessment” OR “affect recognition” OR “affective computing” OR “emotional state recognition” OR “affective state recognition”
- Subexpression 3: visual OR facial OR face OR “body movement*” OR “body motion*” OR gesture* OR posture* OR gesticulation* OR eye OR gaze OR “pupil* dilation” OR “pupil* reflex” OR “pupil* response” OR pupillometry OR pupillogra* OR oculogra* OR lip* OR video* OR audiovisual OR audio-visual OR vocal* OR speech OR audio* OR voice* OR physiological OR biological OR psychophysiological OR biosignal* OR “bio signal*” OR “bio-signal*” OR electroencephalogra* OR eeg OR magnetoencephalogra* OR electrocardiogra* OR ecg OR ekg OR “heart rate” OR “cardiac activit*” OR electromyogra* OR emg OR temg OR “muscle” OR “blood volume” OR bvp OR “blood pressure” OR “blood pulse” OR electrodermal OR eda OR “galvanic skin” OR gsr OR “skin conductance” OR psychogalvanic OR respiration OR accelerometer* OR acceleration* OR electrooculogra* OR eog OR heog OR photoplethysmogra* OR ppg OR “inter-beat interval” OR “interbeat interval” OR “inter beat interval” OR “brain wave*” OR “brain signal*” OR “brain activit*” OR temperature
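For reproducibility, the Boolean query above can be assembled programmatically before it is submitted to a bibliographic database. The following minimal Python sketch illustrates the structure; the term lists are deliberately abbreviated (the full lists are given above), and the exact field syntax depends on the target database (e.g., Scopus vs. Web of Science):

```python
# Minimal sketch: assemble the three-part Boolean search query.
# Term lists are abbreviated here; the full lists appear in the query structure above.
subexpression_1 = ["multimodal", '"multi modal"', "bimodal", "trimodal"]
subexpression_2 = ['"emotion recognition"', '"affect recognition"', '"affective computing"']
subexpression_3 = ["facial", "speech", "eeg", '"skin conductance"', '"heart rate"']

def or_group(terms):
    """Join a list of search terms into one parenthesised OR group."""
    return "(" + " OR ".join(terms) + ")"

# The three OR groups are combined with AND, as specified in the query structure.
query = " AND ".join(or_group(t) for t in (subexpression_1, subexpression_2, subexpression_3))
print(query)
```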
Ref | Year | Database | Features/Classifier | Best Performance |
---|---|---|---|---|
[90] | 2011 | BU3DFE | LGBP+LBP/SVM | Acc: |
[89] | 2013 | CK CK+ JAFFE MMI CMU-PIE | LDN/SVM | Acc: Acc: Acc: Acc: Acc: |
[86] | 2013 | CK+ | Geometric/SVM | Acc: |
[91] | 2014 | CK+ | AAM/NN | Acc: |
[94] | 2020 | CK+ | AUs/RF | Acc: |
[87] | 2021 | in-house | ICAT/RF | Acc: |
[88] | 2021 | CK+ MUG | AUs/Ensemble | Acc: Acc: |
[95] | 2023 | MUG | AUs and geometric features/ANFIS | Acc: |
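Several of the entries in the table above instantiate the same generic pipeline: an appearance descriptor is extracted per face image and fed to a kernel classifier. The following minimal Python sketch shows that pattern with uniform LBP histograms and an SVM; the synthetic images, image size and hyperparameters are placeholders rather than the settings of any cited work:

```python
# Minimal sketch of the handcrafted-feature pipeline summarised above:
# uniform LBP histograms as appearance features, classified with an SVM.
# Synthetic random images stand in for a real corpus such as CK+.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def lbp_histogram(img, P=8, R=1.0):
    """Uniform LBP histogram of a grayscale face image."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist

# 200 synthetic 48x48 "faces" with 7 emotion labels (placeholder data).
images = rng.integers(0, 256, size=(200, 48, 48), dtype=np.uint8)
labels = rng.integers(0, 7, size=200)

X = np.array([lbp_histogram(img) for img in images])
clf = SVC(kernel="rbf", C=10.0)
print(cross_val_score(clf, X, labels, cv=5).mean())  # near chance on random data
```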
Ref | Year | Database | Features/Classifier | Best Performance |
---|---|---|---|---|
[96] | 2011 | SMIC | LBP-TOP + TIM/MKL | Acc: |
[100] | 2013 | SMIC-HS | LBP-TOP + TIM/SVM | Acc: |
[111] | 2014 | SMIC CASME II | STM/AdaBoost | F1: , Acc: F1: , Acc: |
[97] | 2015 | SMIC CASME II | LBP-SIP/SVM | Acc: Acc: |
[99] | 2015 | SMIC CASME II | LBP-MOP/SVM | Acc: Acc: |
[98] | 2015 | SMIC CASME II | STLBP-IP | Acc: Acc: |
[112] | 2015 | CASME II | Monogenic Riesz Wav./SVM | F1: |
[104] | 2015 | CASME II | LBP-TOP with adapt. magnification/SVM | Acc: |
[106] | 2015 | SMIC CASME II | OSW-LBP-TOP/SVM | Acc: Acc: |
[107] | 2016 | SMIC CASME II | OSF + OSW/SVM | Acc: Acc: |
[110] | 2016 | CASME CASME II | LBP-TOP/RK-SVD | Acc: Acc: |
[101] | 2017 | CASME II | LBP-TOP with EVM/SVM | Acc: |
[21] | 2017 | SMIC CASME II | LBP-TOP with a sparse sampling/SVM | Acc: Acc: 49.00% |
[109] | 2017 | SMIC-HS CASME CASME II | FDM/SVM | F1: , Acc: F1: , Acc: F1: , Acc: |
[108] | 2018 | CAS(ME)2 CASME II SMIC | Bi-WOOF/SVM | F1: , Acc: F1: 0.61 F1: 0.62 |
[102] | 2018 | SMIC-HS CASME II | HIGO + Magnification/SVM | Acc: 75.00% Acc: 78.14% |
[113] | 2019 | SMIC CASME CASME II | DiSTLBP-RIP/SVM | Acc: Acc: Acc: |
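Many of the rows above rely on variants of LBP-TOP, which extends LBP to video by computing histograms on the three orthogonal planes (XY, XT, YT) of the spatio-temporal cube and concatenating them. The sketch below is a deliberately simplified, single-slice illustration of that idea; full implementations pool over all slices and over spatial blocks:

```python
# Simplified LBP-TOP sketch: uniform LBP histograms on the three orthogonal planes
# of a video cube, concatenated into one descriptor. Only the central slice of each
# plane is used here for brevity.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(plane, P=8, R=1.0):
    codes = local_binary_pattern(plane, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist

def lbp_top(video):
    """video: (T, H, W) uint8 array of a cropped face clip (placeholder input)."""
    T, H, W = video.shape
    xy = video[T // 2]          # spatial texture plane
    xt = video[:, H // 2, :]    # horizontal motion plane
    yt = video[:, :, W // 2]    # vertical motion plane
    return np.concatenate([lbp_hist(p) for p in (xy, xt, yt)])

clip = np.random.default_rng(1).integers(0, 256, size=(30, 64, 64), dtype=np.uint8)
print(lbp_top(clip).shape)  # (30,) = 3 planes x 10 uniform-LBP bins
```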
Ref | Database | Features/Classifier | Best Performance |
---|---|---|---|
2D CNN | |||
[117] | SMIC, CASME II, and SAMM | EMR | UF1: and UAR: |
[132] | SMIC, CASME II, and SAMM | Dual-Inception | UF1: and UAR: |
[133] | SMIC CASME II SAMM | ResNet, Micro-Attention | Acc: Acc: Acc: |
[134] | SMIC CASME II SAMM | OFF-ApexNet | Acc: Acc: Acc: |
RCNN | |||
[135] | SMIC CASME CASME II | MER-RCNN | Acc: Acc: Acc: |
3D-CNN | |||
[114] | CASME I/II | DTSCNN | Acc: |
[136] | SMIC, CASME II, and SAMM | STSTNet | UF1: and UAR: |
[137] | SMIC CASME CASME II | 3D-FCNN | Acc: Acc: Acc: |
Combined 2D CNN and 3D CNN | |||
[138] | SMIC CASME II SAMM + CASME II | TSNN-IF TSNN-LF | F1: 0.6631, UAR: 0.6566, WAR: 0.7547; F1: 0.6921, UAR: 0.6833, WAR: 0.7632
Combined 2D-CNN and LSTM/GRU/RNN | |||
[139] | SMIC CASME II SAMM | Apex–time network | UF1: and UAR: UF1: and UAR: UF1: and UAR: |
2D-CNN and then LSTM/GRU/RNN | |||
[129] | CASME II | CNN-LSTM | Acc: |
[130] | CASME II SAMM | ELRCN | F1 score: 0.5 F1 score: 0.409
Spatial contextual | |||
[140] | SMIC, CASME II, and SAMM | CapsuleNet | UF1: and UAR: |
[141] | CASME II SAMM | AU-GACN | Acc: Acc: |
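The “2D-CNN and then LSTM/GRU/RNN” family in the table shares one structure: a frame-level CNN encoder, a recurrent layer over time, and a classification head. The PyTorch sketch below shows this generic structure with illustrative layer sizes; it does not reproduce any cited architecture:

```python
# Generic CNN-LSTM sketch: a small CNN encodes each frame, an LSTM models the
# temporal sequence, and a linear head predicts the micro-expression class.
import torch
import torch.nn as nn

class CnnLstm(nn.Module):
    def __init__(self, num_classes=3, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                              # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # per-frame features
        _, (h, _) = self.lstm(f)                       # last hidden state
        return self.head(h[-1])

model = CnnLstm()
clip = torch.randn(2, 20, 1, 64, 64)   # 2 clips, 20 grayscale frames each
print(model(clip).shape)               # torch.Size([2, 3])
```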
Dataset | No. Samples | No. Characters | Comments |
---|---|---|---|
FER2013 | 35,887 | - | Hard; web-collected (Montreal challenge)
JAFFE [144] | 213 | 1 actor | Medium |
BU-3DFE [145] | 2500 | 100 | Video |
RAF-DB [146] | 30 K | 100 | Medium |
Oulu-CASIA | 30 K | 80 subjects | Medium |
FERG | 55 K | 6 3D characters | Easy
KDEF [147] | 837 | 36 | Image |
SFER | 30 K | - | Subtle |
CK+ [148] | 327 (593) | 123 subjects | video |
MUG [149] | 75 K | 476 | Image |
TFEID [150] | 368 | 4 | Image |
RaFD [151] | 676 | 18 | Video |
CASME II [152] | 247 | 26 subjects | Spontaneous subtle: video |
SMIC [100] | 164 | 16 subjects | Spontaneous subtle |
SAMM [153] | 159 | 32 subjects | Spontaneous subtle: video |
Ref | Year | Database | Elicitation | Features | Classifier | Average Accuracy |
---|---|---|---|---|---|---|
[158] | 2018 | FABO | Acted | keyframes HMI + CNN + convLSTM | MLP | |
[16] | 2019 | GEMEP | Acted | CNN | MLP | |
[159] | 2021 | iMiGUE (micro-gestures) | Natural | BiLSTM (encoder)/ LSTM (decoder) | BiLSTM (encoder)/ LSTM (decoder) | |
[160] | 2022 | MASR | Acted | BiLSTM with attention module | HPN/SAE | (seen)/ (unseen) |
Dataset | No. of Emotions | No. of Utterances | Persons | Comments | Difficulty |
---|---|---|---|---|---|
EMO-DB [168] | 7 | 535 | 10 | Acted | Medium + text |
EVD [169] | 5 | 9750 | 5 | Acted | Medium |
CHAD [170] | 7 | 6228 | 42 | Acted | Medium |
Ref | Year | Database | Features | Classifier | Average Accuracy |
---|---|---|---|---|---|
[171] | 2018 | EMO-DB SAVEE | Combine WLPCCVQ and WMFCCVQ | RBFNN | |
[172] | 2019 | EMO-DB | AM–FM modulation features (MS, MFF), cepstral features (ECC, EFCC, SMFCC) from THT | SVM | |
[173] | 2020 | EMO-DB SAVEE | Combine either MFCC, HFCC, TBFCC-B or TBFCC-E | SVM |
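The works above share a common pattern: frame-level cepstral features are summarised into an utterance-level vector and classified with a kernel machine. The sketch below illustrates that pattern with standard MFCC statistics and an SVM on a synthetic signal; the cited studies use richer cepstral variants (e.g., WLPCC, TBFCC) and real corpora such as EMO-DB:

```python
# Minimal sketch: utterance-level MFCC statistics classified with an SVM.
# A synthetic signal stands in for real emotional speech recordings.
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(y, sr=16000, n_mfcc=13):
    """Mean and standard deviation of MFCCs over all frames of one utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

rng = np.random.default_rng(2)
X = np.array([utterance_features(rng.standard_normal(16000)) for _ in range(40)])
y = rng.integers(0, 7, size=40)        # 7 emotion labels, as in EMO-DB

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```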
Ref | Year | Database | Features | Classifier | Average Accuracy |
---|---|---|---|---|---|
[174] | 2008 | Crema-D | CNN | CNN | 60–70% |
[175] | 2018 | AESDD | LSTM | LSTM | F1 score: 0.6523 |
[176] | 2021 | LSSED | GRU | GRU | Acc: 53.45%
[177] | 2020 | ESD | CNN | SVM | Acc: 63.78%
[178] | 2022 | AESDD | CNN | CNN | |
[179] | 2023 | TESS EMO-DB RAVDESS SAVEE CREMA-D | Ensemble of 1D CNN, LSTM and GRU | FCN | WAAcc: WAAcc: WAAcc: WAAcc: WAAcc: |
[180] | 2024 | IEMOCAP NoiseX92 | Wave-U-Net | Wave-U-Net | 62.4% (at 0 dB SNR) |
[181] | 2024 | EMO-DB RAVDESS SAVEE IEMOCAP SHEIE | Ensemble of Transformer, CNN and LSTM | FCN | |
[182] | 2023 | RAVDESS EMOVO | Statistical mean of MFCCs | 3-1D CNN | Acc: 87.08% Acc: 83.90% (speaker-dependent)
[183] | 2019 | EMO-DB | Feature fusion by DBNs: prosody features (fundamental frequency, power), voice quality features (the first, second and third formants with their bandwidths), spectral features (WPCC and W-WPCC) | DBNs/SVM | Acc: |
[172] | 2019 | EMO-DB | AM–FM modulation features (MS, MFF), cepstral features (ECC, EFCC, SMFCC) from THT | RNN | Acc: |
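Most deep SER models in the table operate on sequences of spectral frames. As a generic illustration (not a reproduction of any cited ensemble), the PyTorch sketch below applies a 1D CNN over an MFCC sequence and a linear head over the pooled representation:

```python
# Generic 1D-CNN speech emotion classifier over an MFCC frame sequence.
# Layer sizes are illustrative only.
import torch
import torch.nn as nn

class Mfcc1dCnn(nn.Module):
    def __init__(self, n_mfcc=13, num_classes=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )

    def forward(self, x):             # x: (batch, n_mfcc, frames)
        return self.net(x)

model = Mfcc1dCnn()
batch = torch.randn(4, 13, 120)       # 4 utterances, 13 MFCCs, 120 frames
print(model(batch).shape)             # torch.Size([4, 7])
```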
Ref | Year | Database | Elicitation | Features | Classifier | Accuracy |
---|---|---|---|---|---|---|
[190] | 2014 | GAMEEMO | Acted | PS wavelet ENT HE FD | SVM | |
[195] | 2014 | DEAP | Induced | Sample entropy | SVM | val: , ar: |
[192] | 2016 | DEAP | Induced | Sample entropy of IMFs | SVM | |
[196] | 2016 | In-house dataset induced by IADS | Induced | FD, 5 statistics and 4 band powers (, , , ratio) | SVM | |
[194] | 2011 | in-house dataset induced by audio-visual stimuli | Induced | Wavelet | kNN LDA | |
[193] | 2018 | DEAP | Induced | Entropy from temporal window | MLPs combined through DST | ar: , val: |
[197] | 2018 | SEED DREAMER | Induced | Differential entropy PSD | DGCNN | val: , ar: , dom: |
[198] | 2021 | SEED | Induced | Differential entropy and brain network | CNN | |
[199] | 2024 | SEED | Induced | AND NDE | SVM | 79.16–91.39% 81.66–85.39% |
[200] | 2024 | DEAP | Induced | Hybrid of time, frequency, time–frequency and location features | kNN SVM ANN |
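Differential entropy (DE), used by several EEG entries above, is typically computed per frequency band under a Gaussian assumption, where DE = 0.5 ln(2πe·σ²). The sketch below band-pass filters a synthetic single-channel signal into the classical bands and computes DE per band; the sampling rate and band edges are illustrative assumptions:

```python
# Minimal per-band differential entropy (DE) sketch for one EEG channel.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # assumed sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def differential_entropy(signal, fs=FS):
    """Per-band DE under a Gaussian assumption: 0.5 * ln(2*pi*e*variance)."""
    feats = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, signal)
        feats[name] = 0.5 * np.log(2 * np.pi * np.e * np.var(band))
    return feats

eeg = np.random.default_rng(3).standard_normal(5 * FS)  # 5 s synthetic channel
print(differential_entropy(eeg))
```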
Dataset | Modalities | No. of Emotions | No. of Subjects | Comments
---|---|---|---|---|
WESAD | BVP, ECG, EDA, EMG, RESP, TEMP, ACC | 6 | 15 | Induced |
SWELL | ECG | Multi-dimensional (val, ar, dom) | 25 | Induced |
AMIGOS | ECG, EEG and GSR | 7 | 40 | Induced |
GAMEEMO | EEG | Multi-dimensional (val, ar) | 28 | Induced |
Ref | Year | Database | Modality | Features | Classifier | Average Accuracy | Fusion Method |
---|---|---|---|---|---|---|---|
[164] | 2015 | eNTERFACE | Audio/facial (visual)/text combined | - | SVM | Acc: | Feature-level fusion (concatenation) |
[237] | 2012 | RML eNTERFACE | Visual/video and audio | - | HMM | Acc: | Kernel-based feature-level fusion and decision-level fusion |
[238] | 2013 | RML eNTERFACE | Facial visual and audio | - | HMM | Acc: | Combination of feature-level (KECA) and decision-level |
[239] | 2020 | ITMDER WESAD | ECG, EDA, RESP and BVP | - | SVM (ar), RF (val) QDA (ar) SVM (val) | ar: , val: ar: , val: | Feature-level fusion (concatenation) |
[219] | 2011 | In-house dataset (induced) | Eye gaze and EEG | - | SVM | (FF) ar: , val: (DF) ar: , val: | Feature-level (FF) and decision-level (DF) |
[217] | 2019 | MAHNOB-HCI DEAP | Facial and EEG | CNN/PSD | CNN/SVM | ar: , val: ar: , val: | AdaBoost for decision-level fusion |
[240] | 2022 | DEAP | EOG and EMG | PSD, Hjorth activity and complexity | Logit boost | Data-level fusion |
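In their simplest forms, the fusion strategies listed above reduce to two patterns: feature-level fusion concatenates per-modality feature vectors before a single classifier, while decision-level fusion trains one classifier per modality and combines their predicted probabilities. The sketch below contrasts both patterns on random stand-in features; modality names, dimensions and classifiers are placeholders:

```python
# Feature-level vs. decision-level fusion on placeholder modality features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, n_classes = 200, 4
X_audio = rng.standard_normal((n, 20))   # hypothetical audio features
X_face = rng.standard_normal((n, 30))    # hypothetical facial features
y = rng.integers(0, n_classes, size=n)

# Feature-level fusion: one classifier on the concatenated feature vector.
early = LogisticRegression(max_iter=1000).fit(np.hstack([X_audio, X_face]), y)

# Decision-level fusion: one classifier per modality, probabilities averaged.
m_audio = LogisticRegression(max_iter=1000).fit(X_audio, y)
m_face = LogisticRegression(max_iter=1000).fit(X_face, y)
fused_proba = (m_audio.predict_proba(X_audio) + m_face.predict_proba(X_face)) / 2
fused_pred = fused_proba.argmax(axis=1)

print(early.score(np.hstack([X_audio, X_face]), y), (fused_pred == y).mean())
```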
Dataset | No. Emotions | No. Utterances | No. Persons | Comments |
---|---|---|---|---|
Belfast [241] | 8 | 1440 | 60 | dimensional + categorical |
eNTERFACE [242] | 7 | 239 | 42 | - |
HUMAINE [243] | 7 | 50 | 10 | naturalistic and induced data + text |
IEMOCAP [174] | 10 | 5K-10K | 10 | Actors conversation + text |
SEMAINE [244] | 7 | 959 | 150 | - |
RAVDESS [245] | 8 | 1440 | 60 | - |
CREMA-D [246] | 6 | 7442 | 91 | - |
GEMEP [247] | 15 | 1260 | 10 | - |
MSP-IMPROV [248] | - | - | 12 | - |
DEAP | - | - | 25 (useful) | EEG and audio; dimensional emotion model description
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).