Macro- and Micro-Expressions Facial Datasets: A Survey
Abstract
1. Introduction
- Posed datasets are typically acquired by asking the subjects to show one of the six basic expressions as defined by Ekman [8]. In most cases, experienced actors are enrolled, and capturing takes place in constrained laboratory conditions;
- Spontaneous datasets include expressions that are stimulated in the participants. For instance, this can be the result of watching a video or of a face-to-face interaction. Participants are aware that they are being monitored, but emotions are shown in a natural way rather than acted. In most cases, the acquisition context is a constrained one;
- In-the-wild datasets relax any acquisition constraint, and expressive subjects are filmed in real-world scenarios. This is obtained by analyzing facial expressions in images and videos from movies, talk shows, interviews, etc.
- First quadrant—emotional states go from pleased (high valence, medium arousal) to excited (about neutral valence, high arousal);
- Second quadrant—high arousal with about neutral valence here indicates an alarmed state, while high-negative valence and medium arousal lead to a frustrated state;
- Third quadrant—in this quadrant, high-negative valence and medium arousal indicate a sad/depressed condition, while the status with low arousal and about neutral valence corresponds to a tired state;
- Fourth quadrant—finally, in this quadrant, low arousal and about neutral valence correspond to a calm/sleepy state, while a relaxed condition is associated with high-positive valence and medium arousal. A minimal code sketch of this quadrant mapping is given after this list.
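To make the quadrant description above concrete, here is a minimal Python sketch that maps a (valence, arousal) pair to one of the coarse states listed in the four quadrants. The [-1, 1] value range, the thresholds, and the function name are illustrative assumptions of ours, not part of the annotation scheme of any surveyed dataset.

```python
def circumplex_state(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair, both assumed in [-1, 1], to one of the
    coarse states named in the four quadrants above. Thresholds are
    illustrative assumptions, not values defined by the surveyed datasets."""
    if valence >= 0 and arousal >= 0:    # first quadrant
        return "excited" if arousal > valence else "pleased"
    if valence < 0 and arousal >= 0:     # second quadrant
        return "alarmed" if arousal > -valence else "frustrated"
    if valence < 0 and arousal < 0:      # third quadrant
        return "tired" if -arousal > -valence else "sad/depressed"
    # fourth quadrant: positive valence, negative (low) arousal
    return "calm/sleepy" if -arousal > valence else "relaxed"


if __name__ == "__main__":
    for v, a in [(0.8, 0.3), (-0.7, 0.4), (-0.2, -0.8), (0.6, -0.3)]:
        print(f"valence={v:+.1f}, arousal={a:+.1f} -> {circumplex_state(v, a)}")
```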
2. Macro-Expression Datasets
- Number of subjects: The existing datasets vary between four and thousands of subjects. The number of different individuals is particularly relevant for methods that need large quantities of data to learn models capable of generalizing to unseen identities;
- Age: Enrolled subjects range from infants and young children to elderly people;
- Frames per second (FPS): This can vary depending on the application context. For instance, a high FPS can help in studying facial expression dynamics, whereas a low FPS is often adopted for samples captured in real-life conditions;
- Ethnicity: Variability in terms of ethnic groups, such as Caucasian, Latino, Asian, Black or African American, East-Asian, South-Asian, Turkish, etc., can be relevant and is typically a desired feature when collecting expression datasets;
- Amount of data: Number of images, videos or video frames. A sketch of how these attributes can be recorded for dataset comparison is given after this list.
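As a small illustration of how these comparison attributes can be handled in practice, the sketch below records them in a simple metadata structure. The field names and the filtering example are our own choices, not a format defined by any of the surveyed datasets; the example values are taken from the EB+ row of Table 5.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ExpressionDataset:
    """Metadata record for one facial-expression dataset, mirroring the
    comparison attributes listed above (field names are our own choice)."""
    name: str
    year: int
    num_subjects: int
    age_range: Optional[Tuple[int, int]] = None   # (min, max) in years, if reported
    fps: Optional[float] = None                   # frames per second, if reported
    ethnicities: List[str] = field(default_factory=list)
    amount_of_data: str = ""                      # images, videos, or frames


# Example entry filled with the EB+ values reported in Table 5.
eb_plus = ExpressionDataset(
    name="EB+", year=2020, num_subjects=200, age_range=(18, 66), fps=25,
    ethnicities=["Latino/Hispanic", "White", "African American", "Asian", "Others"],
    amount_of_data="1216 videos, ~395K frames",
)

# Simple comparison query: keep only datasets recorded at 50 FPS or more,
# e.g., when studying facial expression dynamics.
high_fps = [d for d in [eb_plus] if d.fps is not None and d.fps >= 50]
print(high_fps)  # EB+ is recorded at 25 FPS, so the list is empty
```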
2.1. Spontaneous Datasets
2.2. Spontaneous and Posed Datasets
2.3. In-the-Wild Datasets
2.4. Other Categorizations of Macro-Expression Datasets
- In spontaneous datasets, unlike posed datasets where participants are asked to perform an emotion, the subjects’ emotions are stimulated. For example, in [9], facial expressions were captured while volunteers watched a few stimulant videos. In a similar way, in [43], participants were shown fragments of movies and pictures. In [31], emotional videos were used for each emotion, and in the dataset investigated in [14], interviews, planned activities, film watching, a cold pressor test, social challenge, and olfactory stimulation were combined. In [42], participants were told to change character when they got bored, annoyed, or felt they had nothing more to say to the character. The dataset proposed in [49] collected conversational speech, and the work in [51] was based on a conversation between two people in which one pays little or no attention to the meaning of what the other says and chooses responses on the basis of superficial cues. In [50], participants came from a clinical trial for the treatment of depression, while in [27], participants followed a dialogue script with vignettes for each emotional category. In [38], subjects performed a human–computer interaction task, similarly to the work of [39], where natural conversations between pairs of people were investigated. In [59], subjects were interviewed and asked to describe their childhood experience, and in [56], subjects tried to convince the interviewers they were telling the truth. In [48], subjects described neutral photographs, played a game of Tetris, described the game of Tetris, and solved cognitive tasks. Differently, in [57], a driver was recorded while driving, and the work of [52] presented interactions from TV chat shows and religious programs, as well as discussions between old acquaintances. In [53], participants played a game in which one person has to explain a ‘taboo’ concept or word to the other using gestures and body movements.
- Number of subjects: Table 3 presents a classification of macro-expression datasets according to the number of subjects. Most of the datasets contain less than 50 subjects, with just a few datasets containing more than 500 subjects. The number of subjects can reach into the thousands if the expressions are spontaneous or in-the-wild.
- Age variation: There are many age ranges in macro-expression datasets. Most of the datasets include subjects in a relatively small range (from 18 to 30 years), namely TAVER, RAVDESS, GFT, MAHNOB Mimicry, BP4D-Spontaneous, MAHNOB Laughter, DEAP, USTC-NVIE, MMI-V, AvID, AVIC, ENTERFACE, UT-Dallas, RU-FACS, UA-UIUC, AAI, iSAFE, and ISED. Some other datasets cover a moderate range (18–60), including EB+, SEWA, BP4D+ (MMSE), BAUM-1, BioVid Emo, 4D CCDb, AVEC’14, DISFA, AVEC’13 AViD-Corpus, CCDb, DynEmo, SEMAINE, MAHNOB-HCI, Hi4D-ADSIP, CAM3D, B3D(AC), CK+, VAM-faces, and MM. A few datasets contain children, including CHEAVD, 4DFAB, BAUM-2, AFEW-VA, AFEW, and Aff-Wild2. However, child facial expressions were mixed with adult expression samples without differentiating them based on age or age group. On the other hand, in the CHEAVD dataset, the participants were divided into six age groups, and in the 4DFAB dataset, the age distribution includes five categories, with the youngest participants falling in the 5–18 category. However, these datasets did not take into consideration differences in facial expressions across ages.
- Frames per second (FPS): In macro-expression analysis, the FPS rate is relevant depending on the application context. In the following datasets, the FPS is smaller than or equal to 20: TAVER, AM-FED+, and AM-FED. Instead, the FPS is greater than 50 for the 4DFAB, 4D CCDb, MAHNOB-HCI, Hi4D-ADSIP, FreeTalk, iSAFE, and ISED datasets. The largest frame rate, 120 FPS, is reached in the IEMOCAP dataset, which makes it a relevant source for studying macro-expressions.
- Ethnicity: The existing macro-expression datasets contain various ethnicities, such as Latino (EB+, 4DFAB, Aff-Wild2, BP4D+, RU-FACS), Hispanic (EB+, 4DFAB, Aff-Wild2, BP4D+, BP4D-Spontaneous, DISFA), White (EB+, BP4D+), African (EB+, Aff-Wild2, BP4D+, BP4D-Spontaneous, DISFA), Asian (EB+, 4DFAB, Aff-Wild2, BP4D+, BP4D-Spontaneous, DISFA, CAM3D, MMI-V, AVIC, MMI, RU-FACS, iSAFE), and Caucasian (4DFAB, Aff-Wild2, RAVDESS, DynEmo, CAM3D, UT-Dallas). Other datasets contain participants from around the world or selected at random (RAF-DB, AM-FED+, GFT, AffectNet, AFEW-VA, EmotioNet, AM-FED, AFEW, FreeTalk).
- Amount of data: Here, the main distinction is between datasets that comprise videos, like EB+, TAVER, Aff-Wild2, AM-FED+, AFEW-VA, SEWA, Aff-Wild, BAUM-1, BioVid Emo, Vinereactor, CHEAVD, 4D CCDb, OPEN-EmoRec-II, AVEC’14, RECOLA, AM-FED, AVEC’13, CCDb, DynEmo, DEAP, AFEW, Belfast induced, MAHNOB-HCI, UNBC-McMaster, CAM3D, B3D(AC), UT-Dallas, EmoTV, UA-UIUC, and AAI, and datasets that instead consist of images, like RAF-DB, AffectNet, EmotioNet, FER-Wild, HAPPEI, FER-2013, SFEW, USTC-NVIE, iSAFE, and ISED.
2.5. Current Challenges and Future Perspectives
3. Micro-Expression Datasets
3.1. Spontaneous Datasets
3.2. In-the-Wild Datasets
3.3. Other Categorizations of Micro-Expression Datasets
- Number of subjects: Table 4 presents a classification of micro-expression datasets according to the number of enrolled subjects. We classify the datasets according to whether they involve less than 50 participants or more than 100 participants.
- Frames per second (FPS) and resolution: Given the importance of the frame rate in detecting micro-expressions, we note that the FPS reaches 200 in both the SAMM and the CASME II datasets, which is higher than the rates used in macro-expression datasets. In the following datasets, the FPS is equal to or greater than 100: Silesian deception, CASME, SMIC-E HS, and SMIC. There are also micro-expression datasets where the FPS is smaller than 50, as for CAS(ME)2, MEVIEW, SMIC-E VIS, SMIC-E NIR, and YorkDDT. To capture more subtle facial movements, a higher FPS and resolution are needed. To the best of our knowledge, the highest resolution available for micro-expression datasets is 2040 × 1088 pixels, provided by the SAMM dataset, while the lowest is 320 × 240 pixels, in the YorkDDT dataset; the remaining micro-expression datasets (CAS(ME)2, Silesian deception, CASME II, CASME, SMIC-E, and SMIC) fall in between. A short frame-count calculation illustrating the role of the FPS is given after this list.
- Amount of data and samples: Unlike macro-expression datasets, most of the micro-expression datasets contain videos. The major difference between micro- and macro-expressions resides in the number of samples and/or the number of micro-expressions. We classify the datasets according to whether they contain less than 50 samples, as in MEVIEW, Canal9 and YorkDDT; between 50 and 100 samples, as in CAS(ME)2, SMIC-E VIS, SMIC-E NIR and SMIC; or between 100 and 200 samples, as in SAMM, Silesian deception, CASME and SMIC-E HS. The CASME II dataset includes 247 samples.
- Lights: Micro-expression datasets propose several lighting conditions. Four lights were used in both the CASME II and the SMIC-E datasets, while two lights were used for SAMM and CAS(ME)2 and in the second class of CASME.
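To illustrate why the FPS matters so much for micro-expressions, the short sketch below counts how many frames a brief facial movement spans at different frame rates. The 200 ms duration is an assumed, commonly cited order of magnitude for a micro-expression, not a value taken from a specific dataset in this survey.

```python
def frames_captured(duration_s: float, fps: float) -> int:
    """Number of frames covering a facial movement of the given duration."""
    return int(duration_s * fps)


# Assume a micro-expression lasting about 200 ms (a commonly cited order of
# magnitude, not a value taken from a specific dataset in this survey).
for fps in (25, 30, 60, 100, 200):
    print(f"{fps:3d} FPS -> {frames_captured(0.2, fps):2d} frames")
# At 25-30 FPS only a handful of frames remain, whereas 100-200 FPS
# (SMIC, CASME, CASME II, SAMM) leaves enough frames to observe the onset,
# apex and offset of the movement.
```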
3.4. Current Challenges and Future Perspectives
4. Applications
4.1. Medical Applications
4.2. Smart Driving Applications
4.3. Social Marketing Applications
4.4. Human Computer Interactions
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Panksepp, J. Affective Neuroscience: The Foundations of Human and Animal Emotions; Oxford University Press: Oxford, UK, 1998. [Google Scholar]
- Myers, D. Theories of Emotion. In Psychology, 7th ed.; Worth Publishers: New York, NY, USA, 2004. [Google Scholar]
- Davis, J.; Senghas, A.; Ochsner, K. How does facial feedback modulate emotional experience? J. Res. Personal. 2009, 43, 822–829. [Google Scholar] [CrossRef] [Green Version]
- Heaven, D. Why faces don’t always tell the truth about feelings. Nature 2020, 578, 502–504. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Barrett, L.; Adolphs, R.; Marsella, S.; Martinez, A.; Pollak, S. Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements. Psychol. Sci. Public Interest 2019, 20, 1–68. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ghazouani, H. A genetic programming-based feature selection and fusion for facial expression recognition. Appl. Soft Comput. 2021, 103, 107173. [Google Scholar] [CrossRef]
- Dhall, A.; Goecke, R.; Lucey, S.; Gedeon, T. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 2106–2112. [Google Scholar]
- Ekman, P. An argument for basic emotions. Cogn. Emot. 1992, 6, 169–200. [Google Scholar] [CrossRef]
- Singh, S.; Benedict, S. Indian Semi-Acted Facial Expression (iSAFE) Dataset for Human Emotions Recognition. In Advances in Signal Processing and Intelligent Recognition Systems; Thampi, S.M., Hegde, R.M., Krishnan, S., Mukhopadhyay, J., Chaudhary, V., Marques, O., Piramuthu, S., Corchado, J.M., Eds.; Communications in Computer and Information Science; Springer: Singapore, 2020; pp. 150–162. ISBN 978-981-15-4828-4. [Google Scholar] [CrossRef]
- Dhall, A.; Goecke, R.; Lucey, S.; Gedeon, T. Collecting large, richly annotated facial-expression databases from movies. IEEE Multimed. 2012, 3, 34–41. [Google Scholar] [CrossRef] [Green Version]
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.-H.; et al. Challenges in representation learning: A report on three machine learning contests. Neural Netw. 2015, 64, 59–63. [Google Scholar] [CrossRef] [Green Version]
- Matuszewski, B.J.; Quan, W.; Shark, L.; Mcloughlin, A.S.; Lightbody, C.E.; Emsley, H.C.A.; Watkins, C.L. Hi4D-ADSIP 3-d dynamic facial articulation database. Image Vis. Comput. 2012, 30, 713–727. [Google Scholar] [CrossRef]
- Erdem, C.E.; Turan, C.; Aydin, Z. BAUM-2: A multilingual audio-visual affective face database. Multimed. Tools Appl. 2014, 74, 7429–7459. [Google Scholar] [CrossRef]
- Zhang, X.; Yin, L.; Cohn, J.; Canavan, S.; Reale, M.; Horowitz, A.; Liu, P.; Girard, J.M. BP4D-spontaneous: A high-resolution spontaneous 3d dynamic facial expression database. Image Vis. Comput. 2014, 32, 692–706. [Google Scholar] [CrossRef]
- Mollahosseini, A.; Hasani, B.; Salvador, M.J.; Abdollahi, H.; Chan, D.; Mahoor, M.H. Facial expression recognition from world wild web. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 58–65. [Google Scholar]
- Zhalehpour, S.; Onder, O.; Akhtar, Z.; Erdem, C.E. Baum-1: A spontaneous audio-visual face database of affective and mental states. IEEE Trans. Affect. Comput. 2016, 8, 300–313. [Google Scholar] [CrossRef]
- Benitez-Quiroz, C.F.; Srinivasan, R.; Martinez, A.M. Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR16), Las Vegas, NV, USA, 27–30 June 2016; pp. 5562–5570. [Google Scholar]
- Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
- Rubin, D.C.; Talarico, J.M. Comparison of dimensional models of emotion: Evidence from emotions, prototypical events, autobiographical memories, and words. Memory 2009, 17, 802–808. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Khan, G.; Samyan, S.; Khan, M.U.; Shahid, M.; Wahla, S. A survey on analysis of human faces and facial expressions datasets. Int. J. Mach. Learn. Cybern. 2020, 11, 553–571. [Google Scholar] [CrossRef]
- Naga, P.; Marri, S.D.; Borreo, R. Facial emotion recognition methods, datasets and technologies: A literature survey. Mater. Today Proc. 2021, in press. [CrossRef]
- Ekman, P.; Keltner, D. Universal facial expressions of emotion. In Nonverbal Communication: Where Nature Meets Culture; Segerstrale, U., Molnar, P., Eds.; Routledge: Milton Park, UK, 1997; Volume 27, p. 46. [Google Scholar]
- Ekman, P. Are There Basic Emotions? Cambridge University Press: Cambridge, UK, 1992. [Google Scholar]
- Ertugrul, I.O.; Cohn, J.F.; Jeni, L.A.; Zhang, Z.; Yin, L.; Ji, Q. Crossing Domains for AU Coding: Perspectives, Approaches, and Measures. IEEE Trans. Biom. Behav. Identity Sci. 2020, 2, 158–171. [Google Scholar] [CrossRef]
- Zhang, Z.; Girard, J.M.; Wu, Y.; Zhang, X.; Liu, P.; Ciftci, U.; Canavan, S.; Reale, M.; Horowitz, A.; Yang, H.; et al. Multimodal spontaneous emotion corpus for human behavior analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3438–3446. [Google Scholar]
- Lee, J.; Kim, S.; Kim, S.; Sohn, K. Tri-modal Recurrent Attention Networks for Emotion Recognition. IEEE Trans. Image Process. 2019, 29, 6977–6991. [Google Scholar] [CrossRef]
- Livingstone, S.R.; Russo, F.A. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 2018, 13, e0196391. [Google Scholar] [CrossRef] [Green Version]
- Girard, J.M.; Chu, W.; Jeni, L.A.; Cohn, J.F. Sayette group formation task (GFT) spontaneous facial expression database. In Proceedings of the IEEE International Conference on Automated Face and Gesture Recognition, Washington, DC, USA, 30 May–3 June 2017. [Google Scholar]
- Kossaifi, J.; Walecki, R.; Panagakis, Y.; Shen, J.; Schmitt, M.; Ringeval, F.; Han, J.; Pandit, V.; Schuller, B.; Star, K.; et al. Sewa db: A rich database for audio-visual emotion and sentiment research in the wild. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2019, 43, 1022–1040. [Google Scholar] [CrossRef] [Green Version]
- Zhang, L.; Walter, S.; Ma, X.; Werner, P.; Al-Hamadi, A.; Traue, H.C.; Gruss, S. BioVid Emo DB: A Multimodal Database for Emotion Analyses validated by Subjective Ratings. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; pp. 1–6. [Google Scholar]
- Happy, S.L.; Patnaik, P.; Routray, A.; Guha, R. The Indian Spontaneous Expression Database for Emotion Recognition. IEEE Trans. Affect. Comput. 2016, 8, 131–142. [Google Scholar] [CrossRef] [Green Version]
- Vandeventer, J.; Aubrey, A.J.; Rosin, P.L.; Marshall, D. 4D Cardiff Conversation Database (4D CCDb): A 4D database of natural, dyadic conversations. In Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing (FAAVSP 2015), Vienna, Austria, 11–13 September 2015; pp. 157–162. [Google Scholar]
- Bilakhia, S.; Petridis, S.; Nijholt, A.; Pantic, M. The mahnob mimicry database: A database of naturalistic human interactions. Pattern Recognit. Lett. 2015, 66, 52–61. [Google Scholar] [CrossRef] [Green Version]
- Rukavina, S.; Gruss, S.; Walter, S.; Hoffmann, H.; Traue, H.C. OPEN-EmoRec-II-A Multimodal Corpus of Human-Computer Interaction. Int. J. Comput. Electr. Autom. Control Inf. Eng. 2015, 9, 977–983. [Google Scholar]
- Valstar, M.; Schuller, B.; Smith, K.; Almaev, T.; Eyben, F.; Krajewski, J.; Cowie, R.; Pantic, M. AVEC 2014—The Three Dimensional Affect and Depression Challenge. In Proceedings of the 4th ACM International Workshop on Audio/Visual Emotion Challenge, New York, NY, USA, 7 November 2014; ACM: Orlando, FL, USA, 2014. [Google Scholar]
- Mavadati, S.M.; Mahoor, M.H.; Bartlett, K.; Trinh, P.; Cohn, J.F. DISFA: A spontaneous facial action intensity database. IEEE Trans. Affect. Comput. 2013, 4, 151–160. [Google Scholar] [CrossRef]
- Ringeval, F.; Sonderegger, A.; Sauer, J.; Lalanne, D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; pp. 1–8. [Google Scholar] [CrossRef]
- Valstar, M.; Schuller, B.; Smith, K.; Eyben, F.; Jiang, B.; Bilakhia, S.; Schnieder, S.; Cowie, R.; Pantic, M. AVEC 2013-The Continuous Audio/Visual Emotion and Depression Recognition Challenge. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, New York, NY, USA, 21 October 2013; ACM: Murcia, Spain, 2013; pp. 3–10. [Google Scholar]
- Aubrey, A.J.; Marshall, D.; Rosin, P.L.; Vendeventer, J.; Cunningham, D.W.; Wallraven, C. Cardiff conversation database (ccdb): A database of natural dyadic conversations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Washington, DC, USA, 21–23 June 2013; pp. 277–282. [Google Scholar]
- Tcherkassof, A.; Dupré, D.; Meillon, B.; Mandran, N.; Dubois, M.; Adam, J.-M. DynEmo: A video database of natural facial expressions of emotions. Int. J. Multimed. Its Appl. 2013, 5, 61–80. [Google Scholar] [CrossRef] [Green Version]
- Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. Deap: A database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31. [Google Scholar] [CrossRef] [Green Version]
- McKeown, G.; Valstar, M.; Cowie, R.; Pantic, M.; Schroder, M. The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Trans. Affect. Comput. 2012, 3, 5–17. [Google Scholar] [CrossRef] [Green Version]
- Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A Multimodal Database for Affect Recognition and Implicit Tagging. IEEE Trans. Affect. Comput. 2012, 3, 42–55. [Google Scholar] [CrossRef] [Green Version]
- Lucey, P.; Cohn, J.F.; Prkachin, K.M.; Solomon, P.E.; Matthews, I. Painful data: The UNBC-McMaster shoulder pain expression archive database. In Proceedings of the 2011 IEEE International Conference on Automatic Face and Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011; pp. 57–64. [Google Scholar]
- Mahmoud, M.; Baltrusaitis, T.; Robinson, P.; Riek, L.D. 3D corpus of spontaneous complex mental states. In ACII 2011. LNCS; D’Mello, S., Graesser, A., Schuller, B., Martin, J.-C., Eds.; ACII: Memphis, TN, USA, 2011; Volume 6974, pp. 205–214. [Google Scholar]
- Fanelli, G.; Gall, J.; Romsdorfer, H.; Weise, T.; Gool, L.V. A 3-d audio-visual corpus of affective communication. IEEE Trans. Multimed. 2010, 12, 591–598. [Google Scholar] [CrossRef]
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101. [Google Scholar]
- Gajsek, R.; Struc, V.; Mihelic, F.; Podlesek, A.; Komidar, L.; Socan, G.; Bajec, B. Multi-modal emotional database: AvID. Informatica 2009, 33, 101–106. [Google Scholar]
- Schueller, B.; Mueller, R.; Eyben, F.; Gast, J.; Hoernler, B.; Woellmer, M.; Rigoll, G.; Hoethker, A.; Konosu, H. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image Vis. Comput. 2009, 27, 1760–1774. [Google Scholar] [CrossRef] [Green Version]
- Cohn, J.; Kruez, T.; Matthews, I.; Yang, Y.; Nguyen, M.; Padilla, M.; Zhou, F.; De la Torre, F. Detecting depression from facial actions and vocal prosody. In Proceedings of the 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, Amsterdam, The Netherlands, 10–12 September 2009; pp. 1–7. [Google Scholar]
- Douglas-Cowie, E.; Cowie, R.; Cox, C.; Amier, N.; Heylen, D. The sensitive artificial listener: An induction technique for generating emotionally coloured conversation. In Proceedings of the LREC Workshop on Corpora for Research on Emotion and Affect, Kingston, ON, Canada, 13 September 2008; pp. 1–4. [Google Scholar]
- Douglas-Cowie, E.; Cowie, R.; Sneddon, I.; Cox, C.; Lowry, O.; McRorie, M.; Martin, J.-C.; Devillers, L.; Abrilian, S.; Batliner, A.; et al. The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data. In Proceedings of the Affective Computing and Intelligent Interaction, Second International Conference, Lisbon, Portugal, 12–14 September 2007; pp. 488–500. [Google Scholar]
- Zara, A.; Maffiolo, V.; Martin, J.C.; Devillers, L. Collection and Annotation of a Corpus of Human-Human Multimodal Interactions: Emotion and Others Anthropomorphic Characteristics. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal, 12–14 September 2007; pp. 464–475. [Google Scholar]
- Savran, A.; Ciftci, K.; Chanel, G.; Mota, J.; Hong Viet, L.; Sankur, B.; Rombaut, M. Emotion detection in the loop from brain signals and facial images. In Proceedings of the eNTERFACE 2006 Workshop, Dubrovnik, Croatia, 17 June–16 August 2006. [Google Scholar]
- O’Toole, A.J.; Harms, J.; Snow, S.L.; Hurst, D.R.; Pappas, M.R.; Ayyad, J.H.; Abdi, H. A video database of moving faces and people. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 812–816. [Google Scholar] [CrossRef] [PubMed]
- Bartlett, M.; Littlewort, G.; Frank, M.; Lainscsek, C.; Fasel, I.; Movellan, J. Automatic recognition of facial actions in spontaneous expressions. J. Multimed. 2006, 1, 22–35. [Google Scholar] [CrossRef]
- Healey, J.A.; Picard, R.W. Detecting Stress during Real-World Driving Tasks Using Physiological Sensors. IEEE Trans. Intell. Transp. Syst. 2005, 6, 156–166. [Google Scholar] [CrossRef] [Green Version]
- Sebe, N.; Lew, M.S.; Cohen, I.; Sun, Y.; Gevers, T.; Huang, T.S. Authentic facial expression analysis. In Proceedings of the 6th IEEE International Conference Automatic Face and Gesture Recognition, Seoul, Korea, 19 May 2004. [Google Scholar]
- Roisman, G.I.; Tsai, J.L.; Chiang, K.S. The emotional integration of childhood experience: Physiological, facial expressive, and self-reported emotional response during the adult attachment interview. Dev. Psychol. 2004, 40, 776–789. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Schmidt, K.L.; Cohn, J.F. Dynamics of facial expression: Normative characteristics and individual differences. In Proceedings of the IEEE International Conference on Multimedia and Expo, 2001, ICME 2001, Tokyo, Japan, 22–25 August 2001; pp. 547–550. [Google Scholar]
- Cheng, S.; Kotsia, I.; Pantic, M.; Zafeiriou, S. 4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Petridis, S.; Martinez, B.; Pantic, M. The MAHNOB Laughter database. Image Vis. Comput. 2013, 31, 186–202. [Google Scholar] [CrossRef] [Green Version]
- Psychological Image Collection at Stirling (PICS) 2013. 2013. Available online: http://pics.stir.ac.uk (accessed on 29 May 2020).
- Wang, S.; Liu, Z.; Lv, S.; Lv, Y.; Wu, G.; Peng, P.; Wang, X. A Natural Visible and Infrared Facial Expression Database for Expression Recognition and Emotion Inference. IEEE Trans. Multimed. 2010, 12, 682–691. [Google Scholar] [CrossRef]
- Valstar, M.; Pantic, M. Induced disgust, happiness and surprise: An addition to the MMI facial expression database. In Proceedings of the Int’l Conf. Language Resources and Evaluation, Workshop on EMOTION, Valletta, Malta, 5 May 2010; pp. 65–70. [Google Scholar]
- Pantic, M.; Valstar, M.; Rademaker, R.; Maat, L. Web-based database for facial expression analysis. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 6 July 2005; p. 5. [Google Scholar]
- Urbain, J.; Bevacqua, E.; Dutoit, T.; Moinet, A.; Niewiadomski, R.; Pelachaud, C.; Picart, B.; Tilmanne, J.; Wagner, J. The AVLaughterCycle database. In Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, 17–23 May 2010. [Google Scholar]
- Busso, C.; Bulut, M.; Lee, C.-C.; Kazemzadeh, A.; Mower, E.; Kim, S.; Chang, J.N.; Lee, S.; Narayanan, S.S. Iemocap: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 2008, 42, 335–359. [Google Scholar] [CrossRef]
- Carletta, J.; Ashby, S.; Bourban, S.; Flynn, M.; Guillemot, M.; Hain, T.; Kadlec, J.; Karaiskos, V.; Kraaij, W.; Kronenthal, M.; et al. The AMI meeting corpus: A pre-announcement. In Machine Learning for Multimodal Interaction; Springer: Berlin/Heidelberg, Germany, 2006; pp. 28–39. [Google Scholar]
- Li, S.; Deng, W. Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition. IEEE Trans. Image Process. 2019, 28, 356–370. [Google Scholar] [CrossRef]
- Kollias, D.; Zafeiriou, S. Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition. arXiv 2018, arXiv:1811.07770. [Google Scholar]
- McDuff, D.; Amr, M.; Kaliouby, R.E. AM-FED+: An Extended Dataset of Naturalistic Facial Expressions Collected in Everyday Settings. IEEE Trans. Affect. Comput. 2018, 10, 7–17. [Google Scholar] [CrossRef]
- Mollahosseini, A.; Hasani, B.; Mahoor, M.H. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild. IEEE Trans. Affect. Comput. 2017, 10, 18–31. [Google Scholar] [CrossRef] [Green Version]
- Kossaifi, J.; Tzimiropoulos, G.; Todorovic, S.; Pantic, M. AFEW-VA database for valence and arousal estimation in-the-wild. Image Vis. Comput. 2017, 65, 23–36. [Google Scholar] [CrossRef]
- Zafeiriou, S.; Papaioannou, A.; Kotsia, I.; Nicolaou, M.A.; Zhao, G. Facial affect “in-the-wild”: A survey and a new database. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Affect “in-the-Wild” Workshop, Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
- Kim, E.; Vangala, S. Vinereactor: Crowdsourced spontaneous facial expression data. In Proceedings of the International Conference on Multimedia Retrieval (ICMR), New York, NY, USA, 2–6 June 2016. [Google Scholar]
- Li, Y.; Tao, J.; Chao, L.; Bao, W.; Liu, Y. Cheavd: A chinese natural emotional audio–visual database. J. Ambient. Intell. Humaniz. Comput. 2016, 8, 913–924. [Google Scholar] [CrossRef]
- Dhall, A.; Goecke, R.; Gedeon, T. Automatic group happiness intensity analysis. IEEE Trans. Affect. Comput. 2015, 6, 13–26. [Google Scholar] [CrossRef]
- McDuff, D.; Kaliouby, R.E.; Senechal, T.; Amr, M.; Cohn, J.; Picard, R. Affectiva MIT Facial Expression Dataset (AM-FED): Naturalistic and Spontaneous Facial Expressions Collected “In the Wild”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Portland, OR, USA, 23–28 June 2013; pp. 881–888. [Google Scholar]
- Sneddon, I.; McRorie, M.; McKeown, G.; Hanratty, J. The belfast induced natural emotion database. IEEE Trans. Affect. Comput. 2012, 3, 32–41. [Google Scholar] [CrossRef]
- Grimm, M.; Kroschel, K.; Narayanan, S. The Vera am Mittag German audio-visual emotional speech database. In Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, Hannover, Germany, 23–26 June 2008; pp. 865–868. [Google Scholar]
- Campbell, N. Tools and resources for visualising conversational-speech interaction. In Proceedings of the 6th International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, 28–30 May 2008; pp. 231–234. [Google Scholar]
- Abrilian, S.; Devillers, L.; Buisine, S.; Martin, J.-C. Emotv1: Annotation of real-life emotions for the specification of multimodal affective interfaces. HCI Int. 2005, 401, 407–408. [Google Scholar]
- Husak, P.; Cech, J.; Matas, J. Spotting facial micro-expressions “in the wild”. In Proceedings of the 22nd Computer Vision Winter Workshop, Pattern Recognition and Image Processing Group (PRIP) and PRIP Club, Hotel Althof Retz, Austria, 6–8 February 2017. [Google Scholar]
- Du, S.; Tao, Y.; Martinez, A.M. Compound facial expressions of emotion. Proc. Natl. Acad. Sci. USA 2014, 111, E1454–E1462. [Google Scholar] [CrossRef] [Green Version]
- Yan, W.-J.; Wu, Q.; Liu, Y.-J.; Wang, S.-J.; Fu, X. Casme database: A dataset of spontaneous micro-expressions collected from neutralized faces. In Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; pp. 1–7. [Google Scholar]
- Mesquita, B.; Frijda, N.H. Cultural variations in emotions: A review. Psychol. Bull. 1992, 112, 179–204. [Google Scholar] [CrossRef]
- Izard, C.E. The Face of Emotion; Appleton-Century-Crofts: New York, NY, USA, 1971. [Google Scholar]
- Svetieva, E.; Frank, M.G. Empathy, emotion dysregulation, and enhanced microexpression recognition ability. Motiv. Emot. 2016, 40, 309–320. [Google Scholar] [CrossRef]
- Hurley, C.M.; Anker, A.E.; Frank, M.G.; Matsumoto, D.; Hwang, H.C. Background factors predicting accuracy and improvement in micro expression recognition. Motiv. Emot. 2014, 38, 700–714. [Google Scholar] [CrossRef]
- Polikovsky, S.; Kameda, Y.; Ohta, Y. Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor. In Proceedings of the 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP), London, UK, 3 December 2009. [Google Scholar]
- Davison, A.K.; Lansley, C.; Costen, N.; Tan, K.; Yap, M.H. Samm: A spontaneous micro-facial movement dataset. IEEE Trans. Affect. Comput. 2018, 9, 116–129. [Google Scholar] [CrossRef] [Green Version]
- Merghani, W.; Davison, A.K.; Yap, M.H. A Review on Facial Micro-Expressions Analysis: Datasets, Features and Metrics. arXiv 2018, arXiv:1805.02397. [Google Scholar]
- Radlak, K.; Bozek, M.; Smolka, B. Silesian deception database: Presentation and analysis. In Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection, Seattle, WA, USA, 13 November 2015; ACM: Murcia, Spain, 2015; pp. 29–35. [Google Scholar]
- Yan, W.-J.; Li, X.; Wang, S.-J.; Zhao, G.; Liu, Y.-J.; Chen, Y.-H.; Fu, X. Casme ii: An improved spontaneous micro-expression database and the baseline evaluation. PLoS ONE 2014, 9, e86041. [Google Scholar] [CrossRef]
- Li, X.; Pfister, T.; Huang, X.; Zhao, G.; Pietikainen, M. A spontaneous micro-expression database: Inducement, collection and baseline. In Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; pp. 1–6. [Google Scholar]
- Pfister, T.; Li, X.; Zhao, G.; Pietikäinen, M. Recognising spontaneous facial micro-expressions. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 1449–1456. [Google Scholar]
- Vinciarelli, A.; Dielmann, A.; Favre, S.; Salamin, H. Canal9: A database of political debates for analysis of social interactions. In Affective Computing and Intelligent Interaction and Workshops; ACII: Memphis, TN, USA, 2009; pp. 1–4. [Google Scholar]
- Warren, G.; Schertler, E.; Bull, P. Detecting deception from emotional and unemotional cues. J. Nonverbal Behav. 2009, 33, 59–69. [Google Scholar] [CrossRef]
- Gerlowska, J.; Dmitruk, K.; Rejdak, K. Facial emotion mimicry in older adults with and without cognitive impairments due to Alzheimer’s disease. AIMS Neurosci. 2021, 8, 226–238. [Google Scholar] [CrossRef]
- Simons, G.; Ellgring, H.; Pasqualini, M. Disturbance of spontaneous and posed facial expressions in Parkinson’s disease. Cogn. Emot. 2003, 17, 759–778. [Google Scholar] [CrossRef]
- Alvino, C.; Kohler, C.; Barrett, F.; Gur, E.R.; Gur, R.C.; Verma, R. Computerized measurement of facial expression of emotions in schizophrenia. J. Neurosci. Methods 2007, 163, 350–361. [Google Scholar] [CrossRef] [Green Version]
- Samad, M.D.; Diawara, N.; Bobzien, J.L.; Taylor, C.M.; Harrington, J.W.; Iftekharuddin, K.M. A pilot study to identify autism related traits in spontaneous facial actions using computer vision. Res. Autism Spectr. Disord. 2019, 65, 14–24. [Google Scholar] [CrossRef]
Macro- and Micro-Expressions Facial Datasets
Macro-Expression Datasets (Spontaneous) | Macro-Expression Datasets (In-the-wild) | Micro-Expression Datasets (Spontaneous) | Micro-Expression Datasets (In-the-wild)
---|---|---|---
EB+, TAVER, RAVDESS, GFT, SEWA, BP4D+ (MMSE), BioVid Emo, 4D CCDb, MAHNOB Mimicry, OPEN-EmoRec-II, AVEC’14, BP4D-Spontaneous, DISFA, RECOLA, AVEC’13, CCDb, DynEmo, DEAP, SEMAINE, MAHNOB-HCI, UNBC-McMaster, CAM3D, B3D(AC), CK+, AvID, AVIC, DD, SAL, HUMAINE, EmoTABOO, ENTERFACE, UT-Dallas, RU-FACS, MIT, UA-UIUC, AAI, Smile dataset, iSAFE, ISED | RAF-DB, Aff-Wild2, AM-FED+, AffectNet, AFEW-VA, Aff-Wild, EmotioNet, FER-Wild, Vinereactor, CHEAVD, HAPPEI, AM-FED, FER-2013, AFEW, Belfast induced, SFEW, VAM-faces, FreeTalk, EmoTV, BAUM-2 | SAMM, CAS(ME)2, Silesian deception, CASME II, CASME, SMIC-E, SMIC, Canal9, YorkDDT | MEVIEW |
Expression Representation | Macro-Expression Datasets |
---|---|
Six basic expressions | MMI, USTC-NVIE, MMI-V, SFEW |
Six basic expressions + neutral | iSAFE, AFEW, FER-2013 |
Six basic expressions + neutral, pain | Hi4D-ADSIP |
Six basic expressions + neutral, contempt | BAUM-2 |
Six basic expressions (happiness or amusement, sadness, surprise or startle, fear or nervous, anger or upset, disgust) + embarrassment, pain | BP4D-Spontaneous |
23 categories of emotion | EmotioNet |
Nine categories of emotions (no-face, six basic expressions, neutral, none, and uncertain) | FER-Wild |
13 emotional and mental states (six basic emotions plus boredom and contempt plus mental states, confusion, neutral, thinking, concentrating, and bothered) | BAUM-1 |
Four emotions (sadness, surprise, happiness, and disgust) | ISED |
One emotion (smile) | AM-FED, Smile dataset |
Valence–arousal | AffectNet, DEAP, Aff-Wild, AVEC’14 |
Number of Subjects | Macro-Expression Datasets |
---|---|
≤50 | TAVER, RAVDESS, BAUM-1, OPEN-EmoRec-II, BP4D-Spontaneous, DISFA, RECOLA, CCDb, MAHNOB Laughter, DEAP, SEMAINE, MAHNOB-HCI, UNBC-McMaster, CAM3D, B3D(AC), MMI-V, AVLC, AvID, AVIC, VAM-faces, ENTERFACE, MMI, MIT, EmoTV, UA-UIUC, 4D CCDb, FreeTalk, IEMOCAP, SAL, iSAFE, ISED |
∈[50, 100] | GFT, SEWA, BioVid Emo, MAHNOB Mimicry, AVEC’14, PICS-Stirling ESRC 3D Face Database, Belfast induced (Set2 and Set3), Hi4D-ADSIP, DD, RU-FACS, AAI, Smile dataset |
∈[100, 250] | EB+, 4DFAB, AFEW-VA, BP4D+ (MMSE), Vinereactor, CHEAVD, AM-FED, Belfast induced (Set1), USTC-NVIE, CK+ |
∈[250, 500] | SFEW, Aff-Wild2, AM-FED+, BAUM-2, AVEC’13 AViD-Corpus, DynEmo, AFEW, UT-Dallas |
≥500 | RAF-DB, AffectNet, Aff-Wild, EmotioNet, FER-Wild, FER-2013, HAPPEI, HUMAINE |
Number of Subjects | Micro-Expression Datasets |
---|---|
≤50 | SAMM, CAS(ME)2, MEVIEW, CASME II, CASME, SMIC-E, SMIC, YorkDDT |
≥100 | Silesian deception, Canal9
Dataset | Year | Number of Subjects | Age | FPS | Ethnicity | Amount of Data/Frames |
---|---|---|---|---|---|---|
EB+ [24] | 2020 | 200 | 18–66 | 25 | Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) | 1216 videos, with 395 K frames in total |
iSAFE [9] | 2020 | 44 | 17–22 | 60 | Two ethnicities (Indo-Aryan and Dravidian (Asian)) | 395 clips |
RAF-DB * [70] | 2019 | Thousands | - | - | The image URLs were collected from Flickr | 30,000 facial images
TAVER * [26] | 2019 | 17 | 21–38 | 10 | One ethnicity (Korean) | 17 videos of 1–4 min
4DFAB* [61] | 2018 | 180 | 5–75 | 60 | Three ethnicities (Caucasian (Europeans and Arabs), Asian (East-Asian and South-Asian) and Hispanic/Latino) | Two million frames. The vertex number of reconstructed 3D meshes ranges from 60 k to 75 k |
Aff-Wild2 * [71] | 2018 | 258 | Infants, young and elderly | 30 | Five ethnicities (Caucasian, Hispanic or Latino, Asian, Black or African American) | Extends Aff-Wild with 260 more subjects and 1,413,000 new video frames
RAVDESS * [27] | 2018 | 24 | 21–33 | 30 | (Caucasian, East-Asian, and Mixed (East-Asian Caucasian, and Black-Canadian First nations Caucasian)) | 7356 recordings composed of 4320 speech recordings and 3036 song recordings |
AM-FED+ * [72] | 2018 | 416 | - | 14 | Participants from around the world | 1044 videos of naturalistic facial responses to online media content recorded over the Internet |
GFT * [28] | 2017 | 96 | 21–28 | - | Participants were randomly selected | 172,800 frames |
AffectNet * [73] | 2017 | 450,000 | Average age 33.01 years | - | More than 1,000,000 facial images from the Internet | 1,000,000 images with facial landmarks. 450,000 images annotated manually
AFEW-VA* [74] | 2017 | 240 | 8–76 | - | Movie actors | 600 video clips |
SEWA* [29] | 2017 | 398 | 18–65 | 20–30 | Six ethnicities (British, German, Hungarian, Greek, Serbian, and Chinese) | 1990 audio-visual recording clips |
BP4D+ (MMSE) [25] | 2016 | 140 | 18–66 | 25 | Five ethnicities (Latino/Hispanic, White, African American, Asian, and Others) | 1.4 million frames. Over 10TB high quality data generated for the research community |
Aff-Wild * [75] | 2016 | 500 | - | - | - | 500 videos from YouTube |
EmotioNet * [17] | 2016 | 1,000,000 | - | - | One million images of facial expressions downloaded from the Internet | Images queried from web: 100,000 images annotated manually, 900,000 images annotated automatically |
FER-Wild * [15] | 2016 | 24,000 | - | - | - | 24,000 images from web |
BAUM-1 * [16] | 2016 | 31 | 19-65 | 30 | One ethnicity (Turkish) | 1184 multimodal facial video clips contain spontaneous facial expressions and speech of 13 emotional and mental states |
BioVid Emo * [30] | 2016 | 86 | 18–65 | - | - | 15 standardized film clips |
Vinereactor * [76] | 2016 | 222 | - | Web-cam | Mechanical Turk workers | 6029 video responses from 343 unique Mechanical Turk workers in response to 200 video stimuli. Total number of 1,380,343 video frames
CHEAVD * [77] | 2016 | 238 | 11–62 | 25 | - | Extracted from 34 films, two TV series and four other television shows. In the wild |
ISED * [31] | 2016 | 50 | 18–22 | 50 | One ethnicity (Indian) | 428 videos
4D CCDb * [32] | 2015 | 4 | 20–50 | 60 | - | 34 audio-visuals |
MAHNOB Mimicry * [33] | 2015 | 60 | 18–34 | 25 | Staff and students at Imperial College London | Over 54 sessions of dyadic interactions between 12 confederates and their 48 counterparts |
OPEN-EmoRec-II * [34] | 2015 | 30 | Mean age: women 37.5 years; men 51.1 years | - | - | Video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major) and facial reactions annotations |
HAPPEI * [78] | 2015 | 8500 faces | - | - | - | 4886 images. |
AVEC’14 * [35] | 2014 | 84 | 18–63 | - | German | 300 audio-visuals |
BAUM-2 * [13] | 2014 | 286 | 5–73 | - | two ethnicities (Turkish, English) | 1047 video clips |
BP4D-Spontaneous * [14] | 2013 | 41 | 18–29 | 25 | four ethnicities (Asian, African-American, Hispanic, and Euro-American) | 368,036 frames |
DISFA * [36] | 2013 | 27 | 18–50 | 20 | four ethnicities (Asian, Euro American, Hispanic, and African-American) | 130,000 frames |
RECOLA * [37] | 2013 | 46 | Mean age: 22 years, standard deviation: three years | - | four ethnicities (French, Italian, German and Portuguese) | 27 videos |
AM-FED * [79] | 2013 | 242 | Range of ages and ethnicities | 14 | Viewers from a range of ages and ethnicities | 168,359 frames/242 facial videos |
FER-2013 * [11] | 2013 | 35,685 | - | - | - | Images queried from web |
AVEC’13 (AViD-Corpus) * [38] | 2013 | 292 | 18–63 | 30 | One ethnicity (German) | 340 audio-visuals
CCDb * [39] | 2013 | 16 | 25–56 | - | All participants were fully fluent in the English language | 30 audio-visuals
MAHNOB Laughter * [62] | 2013 | 22 | Average age: 27 and 28 years | 25 | 12 different countries and of different origins | 180 sessions: 563 laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech–laugh episodes and 167 other vocalizations annotated in the dataset
DynEmo * [40] | 2013 | 358 | 25–65 | 25 | One ethnicity (Caucasian) | Two sets of 233 and 125 recordings of EFE of ordinary people |
PICS-Stirling ESRC 3D Face Database * [63] | 2013 | 99 | - | - | - | 2D images, video sequences and 3D face scans |
DEAP * [41] | 2012 | 32 | 19–37 | - | Mostly European students | 40 one-minute-long videos shown to subjects
AFEW * [10] | 2012 | 330 | 1–70 | - | Extracted from movies | 1426 sequences with length from 300 to 5400 ms. 1747 expressions
SEMAINE * [42] | 2012 | 24 | 22–60 | - | Undergraduate and postgraduate students | 130,695 frames |
Belfast induced (Set 1) * [80] | 2012 | 114 | Undergraduate students | - | Undergraduate students | 570 audio-visuals
Belfast induced (Set 2) | 2012 | 82 | Mean age of participants 23.78 | - | Undergraduate students, postgraduate students or employed professionals | 650 audio-visuals
Belfast induced (Set 3) | 2012 | 60 | Age of participants 32.54 | - | (Peru, Northern Ireland) | 180 audio-visuals
MAHNOB-HCI * [43] | 2012 | 27 | 19–40 | 60 | Different educational background, from undergraduate students to postdoctoral fellows, with different English proficiency from intermediate to native speakers | 756 data sequences |
Hi4D-ADSIP * [12] | 2011 | 80 | 18–60 | 60 | Undergraduate students from the Performing Arts Department at the University. Undergraduate students, postgraduate students and members of staff from other departments | 3360 images/sequences |
UNBC-McMaster (UNBC Shoulder Pain Archive (SP)) * [44] | 2011 | 25 | - | - | Participants were self-identified while having a problem with shoulder pain | 48,398 frames/200 video sequences |
CAM3D * [45] | 2011 | 16 | 24–50 | 25 | Three ethnicities (Caucasian, Asian and Middle Eastern) | 108 videos of 12 mental states |
SFEW * [7] | 2011 | 95 | - | - | - | 700 images: 346 images in Set 1 and 354 images in Set 2 |
B3D(AC) * [46] | 2010 | 14 | 21–53 | 25 | Native English speakers | 1109 sequences, 4.67 s long
USTC-NVIE * [64] | 2010 | 215 | 17–31 | 30 | Students | 236 apex images |
CK+ * [47] | 2010 | 123 | 18–50 | - | Three ethnicities (Euro-American, Afro-American and other) | 593 sequences |
MMI-V * [65] | 2010 | 25 | 20–32 | 25 | Three ethnicities (European, South American, Asian) | 1 h and 32 min of data. 392 segments |
AVLC * [67] | 2010 | 24 | Average ages were respectively 30, 28 and 29 years | 25 | Eleven ethnicities (Belgium, France, Italy, UK, Greece, Turkey, Kazakhstan, India, Canada, USA and South Korea) | 1000 spontaneous laughs and 27 acted laughs
AvID * [48] | 2009 | 15 | 19–37 | - | Native Slovenian speakers | Approximately one-hour video for each subject |
AVIC [49] | 2009 | 21 | ≤30 and ≥40 | 25 | Two ethnicities (Asian and European) | No. episodes 324 |
DD [50] | 2009 | 57 | - | 30 | 19% non-Caucasian | No. episodes 238 |
VAM-faces * [81] | 2008 | 20 | 16–69 (70% ≤ 35) | 25 | One ethnicity (German) | 1867 images (93.6 images per speaker on average) |
FreeTalk * [82] | 2008 | 4 | - | 60 | Originating from different countries, each speaking a different native language (Finnish, French, Japanese, and English) | No. episodes 300
IEMOCAP * [68] | 2008 | 10 | - | 120 | Actors (fluent English speakers) | Two hours of audiovisual data, including video, speech, motion capture of face, and text transcriptions |
SAL * [51] | 2008 | 4 | - | - | - | 30 min sessions for each user |
HUMAINE * [52] | 2007 | Multiple | - | - | - | 50 ‘clips’ from naturalistic and induced data |
EmoTABOO * [53] | 2007 | - | - | - | French dataset | 10 clips |
AMI [69] | 2006 | - | - | 25 | - | A multi-modal data set consisting of 100 h of meeting recordings |
ENTERFACE * [54] | 2006 | 16 | average age 25 | - | - | - |
5 | 22–38 | |||||
16 | average age 25 | |||||
RU-FACS [56] | 2005 | 100 | 18–30 | 24 | Two ethnicities (African-American and Asian or Latino) | 400–800 min dataset |
MMI * [66] | 2005 | 19 | 19–62 | 24 | Three ethnicities (European, Asian, or South American) | Subjects portrayed 79 series of facial expressions. Image sequence of frontal and side view are captured. 740 static images/848 videos |
UT-Dallas * [55] | 2005 | 284 | 18–25 | 29.97 | One ethnicity (Caucasians) | 1540 standardized clips |
MIT [57] | 2005 | 17 | - | - | - | Over 25,000 frames were scored |
EmoTV * [83] | 2005 | 48 | - | - | French | 51 video clips |
UA-UIUC * [58] | 2004 | 28 | Students | - | Students | One video clip for each subject |
AAI [59] | 2004 | 60 | 18–30 | - | Two ethnicities (European American and Chinese American) | One audiovisual for each subject |
Smile dataset [60] | 2001 | 95 | - | 30 | - | 195 spontaneous smiles |
Dataset | Year | Number of Subjects | Age | FPS | Ethnicity | # of Data/Frames | FACS Coded | Samples | Lights | Resolution | Emotions
---|---|---|---|---|---|---|---|---|---|---|---|
SAMM [92] | 2018 | 32 | Average 33.24 | 200 | Thirteen ethnicities (White British and others) | 338 micro movements | Yes | 159 | Two lights as an array of LEDs | 2040 × 1088 | Seven emotions. Macro/Micro
CAS(ME)2 [93] | 2018 | 22 | Average 22.59 | 30 | One ethnicity | 250 macro, 53 micro | No | 53 | Two light-emitting diode (LED) lights | - | Four emotions. Macro/Micro
MEVIEW [84] | 2017 | 16 | - | 25 | - | 31 videos | Yes | 31 | - | - | Five emotions. Macro/Micro |
Silesian deception [94] | 2015 | 101 | Students | 100 | Third and fourth year students | 101 videos, 1.1 M frames | Yes | 183 micro-tensions | Proper illumination | - | Macro/Micro
CASME II [95] | 2014 | 26 | Average 22.03 | 200 | One ethnicity | Among 3000 facial movements | Yes | 247 | Four selected LED lamps under umbrella reflectors | - | Five emotions
CASME [86] | 2013 | 35 (19 valid) | Average 22.03 | 60 | One ethnicity | More than 1500 elicited facial movements | Yes | 195 (100 in Class A, 95 in Class B) | Class A: natural light; Class B: room with two LED lights | Class A: 1280 × 720; Class B: 640 × 480 | Seven emotions
SMIC-E HS [96] | 2013 | 16 | 22–34 | 100 | Three ethnicities (Asians, Caucasians and Africans) | Longest micro-expression clip: 50 frames | No | 164 | Four lights at the four upper corners of the room | - | Three emotions (positive, negative and surprise)
SMIC-E VIS | 2013 | 8 | - | 25 | - | Longest micro-expression clip: 13 frames | - | 71 | - | - | -
SMIC-E NIR | 2013 | 8 | - | 25 | - | Same as VIS | - | 71 | - | - | -
SMIC [97] | 2011 | 6 | - | 100 | - | 1,260,000 frames | No | 77 | Indoor bunker environment resembling an interrogation room | - | Five emotions. Micro
Canal9 [98] | 2009 | 190 | - | - | - | 70 debates for a total of 43 h and 10 min of material | - | 24 | - | - | Political debates recorded by Canal 9, a local Swiss TV station
YorkDDT [99] | 2009 | 9 | - | 25 | - | 20 videos for a deception detection test (DDT); seven frames | No | 18 | - | 320 × 240 | Two emotion classes