Facial Expression Recognition Using Computer Vision: A Systematic Review
Abstract
1. Introduction
2. Selection Criteria
- ACM: 239 papers,
- IEEE: 118 papers,
- BASE: 78 papers,
- Springer: 369 papers.
- Theoretical studies,
- Studies that are not related to Computer Vision,
- Surveys and theses,
- Dataset publications,
- Older iterations of the same studies.
3. FER Databases
- The Extended Cohn–Kanade database (CK+) [50]: contains 593 image sequences of posed and non-posed emotions. The 123 participants were 18 to 50 years of age: 69% female, 81% Euro-American, 13% Afro-American, and 6% other groups. The images were digitized into either 640 × 490 or 640 × 480 resolution and are mostly grayscale. Each sequence was built on frontal views and 30-degree views, starting with a neutral expression up until the peak emotion (last frame of the sequence). Most sequences are labeled with eight emotions: anger, disgust, contempt, fear, neutral, happiness, sadness, and surprise.
- The Japanese Female Facial Expression database (JAFFE) [51]: contains 213 images of six basic emotions, plus the neutral expression posed by 10 Japanese female models. Each image has been labeled by 60 Japanese subjects.
- Binghamton University 3D Facial Expression database (BU-3DFE) [52]: contains 606 3D facial expression sequences captured from 101 subjects. The texture video has a resolution of about 1040 × 1329 pixels per frame. The resulting database consists of 58 female and 43 male subjects, with a large variety of ethnic/racial ancestries. This database was built on the six basic emotions, plus the neutral expression.
- Facial Expression Recognition 2013 database (FER-2013) [53]: was created using the Google image search API to search for images of faces matching a set of 184 emotion-related keywords like “blissful”, “enraged”, etc. These keywords were combined with words related to gender, age, or ethnicity, leading to 35,887 grayscale images with a 48 × 48 resolution, mapped into the six basic emotions, plus the neutral expression (a parsing sketch follows this list).
- Emotion Recognition in the Wild database (EmotiW) [54]: contains two sub-databases, Acted Facial Expression in the Wild (AFEW) and the Static Facial Expression in the Wild (SFEW). AFEW contains videos (image sequences including audio) and SFEW contains static images. This database was built on the six basic emotions, plus the neutral expression and the image size is 128 × 128.
- MMI database [55]: contains over 2900 videos and high-resolution still images of 75 subjects. It is fully annotated for the presence of AUs in videos, and partially coded at the frame level, indicating for each frame whether an AU is in the neutral, onset, apex, or offset phase. This database was built on six emotions: anger, disgust, fear, happiness, sadness, and surprise.
- eNTERFACE’05 Audiovisual Emotion database [56]: contains 42 subjects from 14 different nationalities; 81% were men and the remaining 19% were women, 31% wore glasses, and 17% had a beard. The database consists of video sequences (including audio) with a 720 × 576 resolution and was built on six emotions: anger, disgust, fear, happiness, sadness, and surprise.
- Karolinska Directed Emotional Faces database (KDEF) [57]: contains a set of 4900 pictures of human facial expressions. The set contains 70 individuals (35 females and 35 males) displaying the six basic emotions, plus the neutral expression. Each expression is viewed from five different angles and was photographed in two sessions.
- Radboud Faces Database (RaFD) [58]: contains a set of pictures of 67 models (including Caucasian males and females and Moroccan Dutch males) displaying eight emotional expressions (anger, disgust, contempt, fear, neutral, happiness, sadness, and surprise), amounting to 120 images per model. Each emotion is shown with three different gaze directions and all pictures were taken from five camera angles simultaneously. The image size is 1024 × 681.
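Since FER-2013 is one of the most used benchmarks among the reviewed works, a minimal parsing sketch is given below. It assumes the `fer2013.csv` file distributed through Kaggle [62] with its usual `emotion`, `pixels`, `Usage` columns; the file name and label encoding are assumptions about that download, not details stated in this review.

```python
# A minimal sketch of parsing FER-2013, assuming the Kaggle fer2013.csv
# layout: one row per face, columns "emotion", "pixels", "Usage".
import numpy as np
import pandas as pd

def load_fer2013(csv_path="fer2013.csv"):
    """Return (images, labels): N x 48 x 48 uint8 faces and integer emotions."""
    df = pd.read_csv(csv_path)
    # Each row stores one 48 x 48 grayscale face as space-separated pixel values.
    images = np.stack([
        np.array(row.split(), dtype=np.uint8).reshape(48, 48)
        for row in df["pixels"]
    ])
    labels = df["emotion"].to_numpy()  # 0-6: six basic emotions plus neutral
    return images, labels
```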
4. Pre-Processing
4.1. Face Detection
4.2. Geometric Transformations
4.3. Image Processing
4.3.1. Smoothing
4.3.2. Histogram Equalization
4.3.3. Data Augmentation
4.3.4. Principal Component Analysis
5. Feature Extraction
5.1. Local Binary Patterns
5.2. Optical Flow
5.3. Active Appearance Model
5.4. Action Units
5.5. Facial Animation Parameters
5.6. Gabor Filter
5.7. Scale-Invariant Feature Transform
5.8. Histogram of Oriented Gradients
6. Classification/Regression
6.1. Convolutional Neural Network
- Increasing the complexity of the model (by adding more layers).
- Adding dropout layers [93], which randomly disable a determined number of nodes during training to prevent the model from memorizing patterns instead of learning them.
- Tuning the training hyperparameters, such as the number of epochs, batch size, learning rate, and class weights, among others.
- Increasing the training data by adding more samples or through DA as mentioned in Section 4.3.3.
- Whenever the database is too small (a common problem with the publicly available databases for emotion recognition), Transfer Learning (TL) can be applied. TL starts from a model that has already been trained on a large database and fine-tunes it on the smaller database for the target classification problem (a minimal sketch of dropout and TL follows this list).
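To make the last two items concrete, the sketch below combines a dropout layer with TL in Keras. The MobileNetV2 backbone, input size, and hyperparameter values are illustrative assumptions rather than a setup used by any specific reviewed work.

```python
# A minimal sketch of dropout [93] plus Transfer Learning: an ImageNet
# pre-trained backbone (MobileNetV2, an illustrative choice) is frozen and
# only a small classification head is trained on the FER database.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fer_model(num_classes=7, input_shape=(224, 224, 3)):
    # FER crops would be resized to input_shape and replicated to 3 channels.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # freeze the pre-trained weights; fine-tune the head
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),  # randomly disables half of the nodes each step
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```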
6.2. Support Vector Machine
6.3. K-Nearest Neighbor
6.4. Naive Bayes
6.5. Hidden Markov Model
6.6. Decision Tree
6.7. Random Forest
6.8. Euclidean Distance
7. Results and Discussion
- Hold-out (HO) splits the database into two groups: one for training and one for testing. Generally, the training set has more data than the testing set (e.g., 70%/30%). This method has the advantage of being the fastest to compute; however, it may produce high-variance evaluations, since the result heavily depends on which data end up being used for training and which for testing (see the sketch after this list).
- Cross-validation (CV), which can be divided into:
- K-fold cross-validation [99] randomly splits the database into k groups; one group is used for testing and the remaining groups for training, and the process is repeated until every group has been used for testing. This method is more robust to the division of the database, since every sample is used for training k−1 times and for testing once. The evaluation variance can therefore be reduced by increasing k, although the computation time also increases.
- Leave-p-out cross-validation [100] leaves every possible set of p samples out of training and uses them for validation. Although this method provides more robust evaluations than k-fold cross-validation, it can become computationally infeasible depending on p.
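The following scikit-learn sketch contrasts the hold-out and k-fold protocols; the SVM classifier and the synthetic features and labels are placeholders chosen only to keep the example self-contained.

```python
# A minimal sketch contrasting hold-out and k-fold cross-validation with
# scikit-learn; X and y are synthetic stand-ins for features and emotions.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(210, 64))    # 210 feature vectors
y = rng.integers(0, 7, size=210)  # 7 emotion labels

# Hold-out: one 70%/30% split; fast, but the score depends on the split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ho_score = SVC().fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: every sample is tested once and trained on k-1 times; averaging
# the k scores reduces the variance of the estimate at extra compute cost.
cv_scores = cross_val_score(SVC(), X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(f"hold-out: {ho_score:.3f}   5-fold mean: {cv_scores.mean():.3f}")
```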
- The posed facial expressions made by actors when building the databases are too artificial. This means that, even if the works present a high accuracy on the benchmarks (using databases that are also built on posed facial expressions), it might not translate into a high accuracy when the same system faces a real world scenario.
- Some databases are poorly annotated or have an ambiguous annotation. Authors who noticed this problem tried to overcome it by making their own annotations or by excluding those samples.
- Emotion databases are generally small (mainly because of how hard it is to set up the image acquisition system and to get several actors to perform different facial expressions).
8. Insights on Emotion Recognition in the Wild Challenge
- Pre-processing techniques that normalize pose-variant faces as well as the image intensity.
- The exploration of AUs, HOG, SIFT, and LBP features (an LBP/HOG extraction sketch follows this list).
- The use of hybrid classifiers based on SVMs, fine-tuned CNNs, and LSTMs.
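As an illustration of the second item, the sketch below extracts LBP (Section 5.1) and HOG (Section 5.8) descriptors with scikit-image; the test image and parameter values are common defaults assumed for illustration, not ones prescribed by the challenge entries.

```python
# A minimal sketch of LBP and HOG feature extraction with scikit-image;
# the input image and all parameter values are illustrative assumptions.
import numpy as np
from skimage import data
from skimage.feature import hog, local_binary_pattern

face = data.camera()  # placeholder grayscale image standing in for a face crop

# Uniform LBP: a histogram over the P + 2 uniform-pattern bins is the descriptor.
P, R = 8, 1
lbp = local_binary_pattern(face, P, R, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)

# HOG: histograms of gradient orientations over a grid of cells.
hog_vec = hog(face, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2))

features = np.concatenate([lbp_hist, hog_vec])  # fused vector, e.g., for an SVM
```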
9. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Coan, J.A.; Allen, J.J. Frontal EEG asymmetry as a moderator and mediator of emotion. Biol. Psychol. 2004, 67, 7–50. [Google Scholar] [CrossRef] [PubMed]
- Zafeiriou, S.; Zhang, C.; Zhang, Z. A survey on face detection in the wild: Past, present and future. Comput. Vis. Image Underst. 2015, 138, 1–24. [Google Scholar] [CrossRef] [Green Version]
- Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621. [Google Scholar]
- Tian, Y.I.; Kanade, T.; Cohn, J.F. Recognizing action units for facial expression analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 97–115. [Google Scholar] [CrossRef] [Green Version]
- Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; ACM: New York, NY, USA, 1992; pp. 144–152. [Google Scholar] [CrossRef]
- Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26. [Google Scholar] [CrossRef]
- ACM Digital Library. Available online: https://dl.acm.org/ (accessed on 26 September 2019).
- IEEE Xplore Digital Library. Available online: https://ieeexplore.ieee.org/Xplore/home.jsp (accessed on 26 September 2019).
- Bielefeld Academic Search Engine. Available online: https://www.base-search.net/ (accessed on 26 September 2019).
- Springer Link. Available online: https://link.springer.com/ (accessed on 26 September 2019).
- Valstar, M.F.; Pantic, M.; Ambadar, Z.; Cohn, J.F. Spontaneous vs. posed facial behavior: Automatic analysis of brow actions. In Proceedings of the 8th International Conference on Multimodal Interfaces, Banff, AB, Canada, 2–4 November 2006; ACM: New York, NY, USA, 2006; pp. 162–170. [Google Scholar] [CrossRef]
- Duthoit, C.J.; Sztynda, T.; Lal, S.K.; Jap, B.T.; Agbinya, J.I. Optical flow image analysis of facial expressions of human emotion: Forensic applications. In Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia and Workshop, Adelaide, Australia, 21–23 January 2008; p. 5. [Google Scholar]
- Dornaika, F.; Davoine, F. Simultaneous facial action tracking and expression recognition in the presence of head motion. Int. J. Comput. Vis. 2008, 76, 257–281. [Google Scholar] [CrossRef]
- Caridakis, G.; Karpouzis, K.; Kollias, S. User and context adaptive neural networks for emotion recognition. Neurocomputing 2008, 71, 2553–2562. [Google Scholar] [CrossRef]
- Sun, X.; Rothkrantz, L.; Datcu, D.; Wiggers, P. A Bayesian approach to recognise facial expressions using vector flows. In Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, Ruse, Bulgaria, 18–19 June 2009; ACM: New York, NY, USA, 2009; p. 28. [Google Scholar] [CrossRef]
- Popa, M.; Rothkrantz, L.; Wiggers, P. Products appreciation by facial expressions analysis. In Proceedings of the 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing on International Conference on Computer Systems and Technologies, Sofia, Bulgaria, 17–18 June 2010; ACM: New York, NY, USA, 2010; pp. 293–298. [Google Scholar] [CrossRef]
- Liu, X.; Zhang, L.; Yadegar, J. A multi-modal emotion recognition system for persistent and non-invasive personal health monitoring. In Proceedings of the 2nd Conference on Wireless Health, La Jolla, CA, USA, 10–13 October 2011; ACM: New York, NY, USA, 2011; p. 28. [Google Scholar] [CrossRef]
- Metri, P.; Ghorpade, J.; Butalia, A. Facial emotion recognition using context based multimodal approach. Int. J. Emerg. Sci. 2012, 2, 171–183. [Google Scholar] [CrossRef]
- Cruz, A.C.; Bhanu, B.; Thakoor, N. Facial emotion recognition with expression energy. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, USA, 22–26 October 2012; ACM: New York, NY, USA, 2012; pp. 457–464. [Google Scholar] [CrossRef]
- Soladié, C.; Salam, H.; Pelachaud, C.; Stoiber, N.; Séguier, R. A multimodal fuzzy inference system using a continuous facial expression representation for emotion detection. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, USA, 22–26 October 2012; ACM: New York, NY, USA, 2012; pp. 493–500. [Google Scholar] [CrossRef]
- Monkaresi, H.; Calvo, R.A.; Hussain, M.S. Automatic natural expression recognition using head movement and skin color features. In Proceedings of the International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, 21–25 May 2012; ACM: New York, NY, USA, 2012; pp. 657–660. [Google Scholar] [CrossRef]
- Biel, J.I.; Teijeiro-Mosquera, L.; Gatica-Perez, D. Facetube: Predicting personality from facial expressions of emotion in online conversational video. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, USA, 22–26 October 2012; ACM: New York, NY, USA, 2012; pp. 53–56. [Google Scholar] [CrossRef]
- Nedkov, S.; Dimov, D. Emotion recognition by face dynamics. In Proceedings of the 14th International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 28–29 June 2013; ACM: New York, NY, USA, 2013; pp. 128–136. [Google Scholar] [CrossRef]
- Terzis, V.; Moridis, C.N.; Economides, A.A. Measuring instant emotions based on facial expressions during computer-based assessment. Pers. Ubiquitous Comput. 2013, 17, 43–52. [Google Scholar] [CrossRef]
- Meng, H.; Huang, D.; Wang, H.; Yang, H.; Ai-Shuraifi, M.; Wang, Y. Depression recognition based on dynamic facial and vocal expression features using partial least square regression. In Proceedings of the 3rd ACM International Workshop on Audio/visual Emotion Challenge, Barcelona, Spain, 21–25 October 2013; ACM: New York, NY, USA, 2013; pp. 21–30. [Google Scholar] [CrossRef]
- Gómez Jáuregui, D.A.; Martin, J.C. Evaluation of vision-based real-time measures for emotions discrimination under uncontrolled conditions. In Proceedings of the 2013 on Emotion Recognition in the Wild Challenge and Workshop, Sydney, Australia, 9–13 December 2013; ACM: New York, NY, USA, 2013; pp. 17–22. [Google Scholar] [CrossRef]
- Bakhtiyari, K.; Husain, H. Fuzzy model of dominance emotions in affective computing. Neural Comput. Appl. 2014, 25, 1467–1477. [Google Scholar] [CrossRef]
- Sangineto, E.; Zen, G.; Ricci, E.; Sebe, N. We are not all equal: Personalizing models for facial expression analysis with transductive parameter transfer. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; ACM: New York, NY, USA, 2014; pp. 357–366. [Google Scholar] [CrossRef]
- Jang, G.J.; Jo, A.; Park, J.S. Video-based emotion identification using face alignment and support vector machines. In Proceedings of the Second International Conference on Human-agent Interaction, Tsukuba, Japan, 28–31 October 2014; ACM: New York, NY, USA, 2014; pp. 285–286. [Google Scholar] [CrossRef]
- Zen, G.; Sangineto, E.; Ricci, E.; Sebe, N. Unsupervised domain adaptation for personalized facial emotion recognition. In Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey, 12–16 November 2014; ACM: New York, NY, USA, 2014; pp. 128–135. [Google Scholar] [CrossRef]
- Rothkrantz, L. Online emotional facial expression dictionary. In Proceedings of the 15th International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 27–28 June 2014; ACM: New York, NY, USA, 2014; pp. 116–123. [Google Scholar] [CrossRef]
- Chao, L.; Tao, J.; Yang, M.; Li, Y.; Wen, Z. Long short term memory recurrent neural network based multimodal dimensional emotion recognition. In Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, Brisbane, Australia, 26–30 October 2015; ACM: New York, NY, USA, 2015; pp. 65–72. [Google Scholar] [CrossRef]
- Kim, Y.; Provost, E.M. Emotion recognition during speech using dynamics of multiple regions of the face. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2015, 12, 25. [Google Scholar] [CrossRef]
- Nomiya, H.; Sakaue, S.; Hochin, T. Recognition and intensity estimation of facial expression using ensemble classifiers. In Proceedings of the 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 26–29 June 2016; pp. 1–6. [Google Scholar] [CrossRef]
- Zhang, Y.D.; Yang, Z.J.; Lu, H.M.; Zhou, X.X.; Phillips, P.; Liu, Q.M.; Wang, S.H. Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation. IEEE Access 2016, 4, 8375–8385. [Google Scholar] [CrossRef]
- Barsoum, E.; Zhang, C.; Ferrer, C.C.; Zhang, Z. Training deep networks for facial expression recognition with crowd-sourced label distribution. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 279–283. [Google Scholar] [CrossRef] [Green Version]
- Liu, Z.; Wu, M.; Cao, W.; Chen, L.; Xu, J.; Zhang, R.; Zhou, M.; Mao, J. A facial expression emotion recognition based human-robot interaction system. IEEE/CAA J. Autom. Sin. 2017, 4, 668–676. [Google Scholar] [CrossRef]
- Bouzakraoui, M.S.; Sadiq, A.; Enneya, N. A Customer Emotion Recognition through Facial Expression using POEM descriptor and SVM classifier. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, Tetouan, Morocco, 29–30 March 2017; ACM: New York, NY, USA, 2017; p. 80. [Google Scholar]
- Elfaramawy, N.; Barros, P.; Parisi, G.I.; Wermter, S. Emotion recognition from body expressions with a neural network architecture. In Proceedings of the 5th International Conference on Human Agent Interaction, Bielefeld, Germany, 17–20 October 2017; ACM: New York, NY, USA, 2017; pp. 143–149. [Google Scholar] [CrossRef]
- Qi, C.; Li, M.; Wang, Q.; Zhang, H.; Xing, J.; Gao, Z.; Zhang, H. Facial expressions recognition based on cognition and mapped binary patterns. IEEE Access 2018, 6, 18795–18803. [Google Scholar] [CrossRef]
- Zhang, Z.; Chen, T.; Meng, H.; Liu, G.; Fu, X. SMEConvnet: A convolutional neural network for spotting spontaneous facial micro-expression from long videos. IEEE Access 2018, 6, 71143–71151. [Google Scholar] [CrossRef]
- Guo, J.; Lei, Z.; Wan, J.; Avots, E.; Hajarolasvadi, N.; Knyazev, B.; Kuharenko, A.; Junior, J.C.S.J.; Baró, X.; Demirel, H.; et al. Dominant and complementary emotion recognition from still images of faces. IEEE Access 2018, 6, 26391–26403. [Google Scholar] [CrossRef]
- Slimani, K.; Kas, M.; El Merabet, Y.; Messoussi, R.; Ruichek, Y. Facial emotion recognition: A comparative analysis using 22 LBP variants. In Proceedings of the 2nd Mediterranean Conference on Pattern Recognition and Artificial Intelligence, Rabat, Morocco, 27–28 March 2018; ACM: New York, NY, USA, 2018; pp. 88–94. [Google Scholar] [CrossRef]
- Bernin, A.; Müller, L.; Ghose, S.; Grecos, C.; Wang, Q.; Jettke, R.; von Luck, K.; Vogt, F. Automatic Classification and Shift Detection of Facial Expressions in Event-Aware Smart Environments. In Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference, Corfu, Greece, 26–29 June 2018; ACM: New York, NY, USA, 2018; pp. 194–201. [Google Scholar] [CrossRef]
- Magdin, M.; Prikler, F. Real time facial expression recognition using webcam and SDK affectiva. IJIMAI 2018, 5, 7–15. [Google Scholar] [CrossRef]
- Pham, T.T.D.; Kim, S.; Lu, Y.; Jung, S.W.; Won, C.S. Facial action units-based image retrieval for facial expression recognition. IEEE Access 2019, 7, 5200–5207. [Google Scholar] [CrossRef]
- Slimani, K.; Lekdioui, K.; Messoussi, R.; Touahni, R. Compound Facial Expression Recognition Based on Highway CNN. In Proceedings of the New Challenges in Data Sciences: Acts of the Second Conference of the Moroccan Classification Society, Kenitra, Morocco, 28–29 March 2019; ACM: New York, NY, USA, 2019; p. 1. [Google Scholar] [CrossRef]
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101. [Google Scholar] [CrossRef]
- Lyons, M.J.; Akamatsu, S.; Kamachi, M.; Gyoba, J.; Budynek, J. The Japanese female facial expression (JAFFE) database. In Proceedings of the Third International Conference on Automatic Face And Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 14–16. [Google Scholar] [CrossRef]
- Yin, L.; Wei, X.; Sun, Y.; Wang, J.; Rosato, M.J. A 3D facial expression database for facial behavior research. In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 10–12 April 2006; pp. 211–216. [Google Scholar] [CrossRef]
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in representation learning: A report on three machine learning contests. In International Conference on Neural Information Processing; Springer: Berlin, Germany, 2013; pp. 117–124. [Google Scholar] [CrossRef]
- Dhall, A.; Ramana Murthy, O.; Goecke, R.; Joshi, J.; Gedeon, T. Video and image based emotion recognition challenges in the wild: Emotiw 2015. In Proceedings of the 2015 ACM on International Conference On Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 423–426. [Google Scholar] [CrossRef]
- Pantic, M.; Valstar, M.; Rademaker, R.; Maat, L. Web-based database for facial expression analysis. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, Netherlands, 6 July 2005; p. 5. [Google Scholar] [CrossRef]
- Martin, O.; Kotsia, I.; Macq, B.; Pitas, I. The eNTERFACE’05 audio-visual emotion database. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW’06), Atlanta, GA, USA, 3–7 April 2006; p. 8. [Google Scholar] [CrossRef]
- Calvo, M.G.; Lundqvist, D. Facial expressions of emotion (KDEF): Identification under different display-duration conditions. Behav. Res. Methods 2008, 40, 109–115. [Google Scholar] [CrossRef] [Green Version]
- Langner, O.; Dotsch, R.; Bijlstra, G.; Wigboldus, D.H.; Hawk, S.T.; Van Knippenberg, A. Presentation and validation of the Radboud Faces Database. Cogn. Emot. 2010, 24, 1377–1388. [Google Scholar] [CrossRef]
- The Extended Cohn–Kanade Database. Available online: http://www.consortium.ri.cmu.edu/ckagree/ (accessed on 8 September 2019).
- The Japanese Female Facial Expression Database. Available online: http://www.kasrl.org/jaffe.html (accessed on 8 September 2019).
- Binghamton University 3D Facial Expression Database. Available online: http://www.cs.binghamton.edu/~lijun/Research/3DFE/3DFE_Analysis.html (accessed on 8 September 2019).
- Facial Expression Recognition 2013 Database. Available online: https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data (accessed on 8 September 2019).
- Emotion Recognition in the Wild Database. Available online: https://cs.anu.edu.au/few/AFEW.html (accessed on 8 September 2019).
- MMI Database. Available online: https://mmifacedb.eu/ (accessed on 8 September 2019).
- eNTERFACE’05 Audio-Visual Emotion Database. Available online: http://www.enterface.net/enterface05/ (accessed on 8 September 2019).
- Karolinska Directed Emotional Faces Database. Available online: http://kdef.se/ (accessed on 8 September 2019).
- Radboud Faces Database. Available online: http://www.socsci.ru.nl:8180/RaFD2/RaFD?p=main (accessed on 8 September 2019).
- Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
- Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874. [Google Scholar] [Green Version]
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [Google Scholar] [CrossRef]
- Farfade, S.S.; Saberian, M.J.; Li, L.J. Multi-view face detection using deep convolutional neural networks. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, China, 23–26 June 2015; ACM: New York, NY, USA, 2015; pp. 643–650. [Google Scholar] [CrossRef]
- Azulay, A.; Weiss, Y. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv 2018, arXiv:1805.12177. [Google Scholar]
- Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. ICCV 1998, 98, 2. [Google Scholar] [CrossRef]
- Lindenbaum, M.; Fischer, M.; Bruckstein, A. On Gabor’s contribution to image enhancement. Pattern Recognit. 1994, 27, 1–8. [Google Scholar] [CrossRef]
- Garg, P.; Jain, T. A Comparative Study on Histogram Equalization and Cumulative Histogram Equalization. Int. J. New Technol. Res. 2017, 3, 41–43. [Google Scholar]
- Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368. [Google Scholar] [CrossRef]
- Hawkins, D.M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12. [Google Scholar] [CrossRef]
- Jolliffe, I. Principal Component Analysis; Springer: Berlin, Germany, 2011. [Google Scholar] [CrossRef]
- Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Horn, B.K.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef] [Green Version]
- Barron, J.L.; Fleet, D.J.; Beauchemin, S.S.; Burkitt, T. Performance of optical flow techniques. In Proceedings of the 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Champaign, IL, USA, 15–18 June 1992; pp. 236–242. [Google Scholar] [CrossRef]
- Cootes, T.F.; Edwards, G.J.; Taylor, C.J. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 681–685. [Google Scholar] [CrossRef]
- Abdulameer, M.H.; Abdullah, S.; Huda, S.N.; Othman, Z.A. A modified active appearance model based on an adaptive artificial bee colony. Sci. World J. 2014, 2014. [Google Scholar] [CrossRef] [PubMed]
- Ekman, R. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS); Oxford University Press: New York, NY, USA, 1997. [Google Scholar]
- Pakstas, A.; Forchheimer, R.; Pandzic, I.S. MPEG-4 Facial Animation: The Standard, Implementation and Applications; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2002. [Google Scholar]
- Chandrasiri, N.P.; Naemura, T.; Harashima, H. Real time facial expression recognition system with applications to facial animation in MPEG-4. IEICE Trans. Inf. Syst. 2001, 84, 1007–1017. [Google Scholar]
- Jain, A.K.; Farrokhnia, F. Unsupervised texture segmentation using Gabor filters. Pattern Recognit. 1991, 24, 1167–1186. [Google Scholar] [CrossRef] [Green Version]
- Choraś, R.S. Image Processing and Communications Challenges 2; Springer: Berlin, Germany, 2010; pp. 15–17. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Wu, J.; Cui, Z.; Sheng, V.S.; Zhao, P.; Su, D.; Gong, S. A Comparative Study of SIFT and its Variants. Meas. Sci. Rev. 2013, 13, 122–131. [Google Scholar] [CrossRef] [Green Version]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar] [CrossRef]
- Liu, Y.; Li, Y.; Ma, X.; Song, R. Facial expression recognition with fusion features extracted from salient facial areas. Sensors 2017, 17, 712. [Google Scholar] [CrossRef]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
- Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
- Eddy, S.R. Hidden markov models. Curr. Opin. Struct. Biol. 1996, 6, 361–365. [Google Scholar] [CrossRef]
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef] [Green Version]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Chui, K.T.; Lytras, M.D. A Novel MOGA-SVM Multinomial Classification for Organ Inflammation Detection. Appl. Sci. 2019, 9, 2284. [Google Scholar] [CrossRef]
- Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79. [Google Scholar] [CrossRef]
- Trimmer, P.; Paul, E.; Mendl, M.; McNamara, J.; Houston, A. On the evolution and optimality of mood states. Behav. Sci. 2013, 3, 501–521. [Google Scholar] [CrossRef]
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. From facial expression recognition to interpersonal relation prediction. Int. J. Comput. Vis. 2018, 126, 550–569. [Google Scholar] [CrossRef]
- Yang, B.; Cao, J.; Ni, R.; Zhang, Y. Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access 2017, 6, 4630–4640. [Google Scholar] [CrossRef]
- Gogić, I.; Manhart, M.; Pandžić, I.S.; Ahlberg, J. Fast facial expression recognition using local binary features and shallow neural networks. Vis. Comput. 2018, 1–16. [Google Scholar] [CrossRef]
- Kim, J.H.; Kim, B.G.; Roy, P.P.; Jeong, D.M. Efficient Facial Expression Recognition Algorithm Based on Hierarchical Deep Neural Network Structure. IEEE Access 2019, 7, 41273–41285. [Google Scholar] [CrossRef]
- Hua, W.; Dai, F.; Huang, L.; Xiong, J.; Gui, G. HERO: Human emotions recognition for realizing intelligent Internet of Things. IEEE Access 2019, 7, 24321–24332. [Google Scholar] [CrossRef]
- Wu, B.F.; Lin, C.H. Adaptive feature mapping for customizing deep learning based facial expression recognition model. IEEE Access 2018, 6, 12451–12461. [Google Scholar] [CrossRef]
- Ruiz-Garcia, A.; Elshaw, M.; Altahhan, A.; Palade, V. A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots. Neural Comput. Appl. 2018, 29, 359–373. [Google Scholar] [CrossRef]
- Meng, Z.; Liu, P.; Cai, J.; Han, S.; Tong, Y. Identity-aware convolutional neural network for facial expression recognition. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 558–565. [Google Scholar] [CrossRef]
- Uçar, A.; Demir, Y.; Güzeliş, C. A new facial expression recognition based on curvelet transform and online sequential extreme learning machine initialized with spherical clustering. Neural Comput. Appl. 2016, 27, 131–142. [Google Scholar] [CrossRef]
- Mistry, K.; Zhang, L.; Neoh, S.C.; Lim, C.P.; Fielding, B. A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition. IEEE Trans. Cybern. 2016, 47, 1496–1509. [Google Scholar] [CrossRef]
- Liliana, D.Y.; Basaruddin, C.; Widyanto, M.R. Mix emotion recognition from facial expression using SVM-CRF sequence classifier. In Proceedings of the International Conference on Algorithms, Computing and Systems, Jeju Island, Korea, 10–13 August 2017; ACM: New York, NY, USA, 2017; pp. 27–31. [Google Scholar] [CrossRef]
- Ferreira, P.M.; Marques, F.; Cardoso, J.S.; Rebelo, A. Physiological Inspired Deep Neural Networks for Emotion Recognition. IEEE Access 2018, 6, 53930–53943. [Google Scholar] [CrossRef]
- Dapogny, A.; Bailly, K.; Dubuisson, S. Confidence-weighted local expression predictions for occlusion handling in expression recognition and action unit detection. Int. J. Comput. Vis. 2018, 126, 255–271. [Google Scholar] [CrossRef]
- Yaddaden, Y.; Bouzouane, A.; Adda, M.; Bouchard, B. A new approach of facial expression recognition for ambient assisted living. In Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Corfu Island, Greece, 29 June–1 July 2016; ACM: New York, NY, USA, 2016; p. 14. [Google Scholar] [CrossRef]
- Ratliff, M.S.; Patterson, E. Emotion recognition using facial expressions with active appearance models. In Proceedings of the Third IASTED International Conference on Human Computer Interaction, Innsbruck, Austria, 17–19 March 2008. [Google Scholar]
- Khan, S.A.; Hussain, S.; Xiaoming, S.; Yang, S. An effective framework for driver fatigue recognition based on intelligent facial expressions analysis. IEEE Access 2018, 6, 67459–67468. [Google Scholar] [CrossRef]
- Hu, M.; Zheng, Y.; Yang, C.; Wang, X.; He, L.; Ren, F. Facial Expression Recognition Using Fusion Features Based on Center-Symmetric Local Octonary Pattern. IEEE Access 2019, 7, 29882–29890. [Google Scholar] [CrossRef]
- Deng, J.; Pang, G.; Zhang, Z.; Pang, Z.; Yang, H.; Yang, G. cGAN Based Facial Expression Recognition for Human-Robot Interaction. IEEE Access 2019, 7, 9848–9859. [Google Scholar] [CrossRef]
- Shan, K.; Guo, J.; You, W.; Lu, D.; Bie, R. Automatic facial expression recognition based on a deep convolutional-neural-network structure. In Proceedings of the 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, UK, 7–9 June 2017; pp. 123–128. [Google Scholar] [CrossRef]
- Ige, E.O.; Debattista, K.; Chalmers, A. Towards hdr based facial expression recognition under complex lighting. In Proceedings of the 33rd Computer Graphics International, Heraklion, Greece, 28 June–1 July 2016; ACM: New York, NY, USA, 2016; pp. 49–52. [Google Scholar] [CrossRef]
- Berretti, S.; Amor, B.B.; Daoudi, M.; Del Bimbo, A. 3D facial expression recognition using SIFT descriptors of automatically detected keypoints. Vis. Comput. 2011, 27, 1021. [Google Scholar] [CrossRef]
- Rassadin, A.; Gruzdev, A.; Savchenko, A. Group-level emotion recognition using transfer learning from face identification. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; ACM: New York, NY, USA, 2017; pp. 544–548. [Google Scholar] [CrossRef] [Green Version]
- Zhang, S.; Pan, X.; Cui, Y.; Zhao, X.; Liu, L. Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning. IEEE Access 2019, 7, 32297–32304. [Google Scholar] [CrossRef]
- Tan, H.; Zhang, Y.; Chen, H.; Zhao, Y.; Wang, W. Person-independent expression recognition based on person-similarity weighted expression feature. J. Syst. Eng. Electron. 2010, 21, 118–126. [Google Scholar] [CrossRef]
- Sang, D.V.; Cuong, L.T.B.; Van Thieu, V. Multi-task learning for smile detection, emotion recognition and gender classification. In Proceedings of the Eighth International Symposium on Information and Communication Technology, Nha Trang City, Vietnam, 7–8 December 2017; ACM: New York, NY, USA, 2017; pp. 340–347. [Google Scholar] [CrossRef]
- Yu, Z.; Zhang, C. Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 435–442. [Google Scholar] [CrossRef]
- Ng, H.W.; Nguyen, V.D.; Vonikakis, V.; Winkler, S. Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on International Conference On Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 443–449. [Google Scholar] [CrossRef]
- Levi, G.; Hassner, T. Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. In Proceedings of the 2015 ACM on International Conference On Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 503–510. [Google Scholar] [CrossRef]
- Sert, M.; Aksoy, N. Recognizing facial expressions of emotion using action unit specific decision thresholds. In Proceedings of the 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 16–21. [Google Scholar] [CrossRef]
- Sun, B.; Li, L.; Zhou, G.; Wu, X.; He, J.; Yu, L.; Li, D.; Wei, Q. Combining multimodal features within a fusion network for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 497–502. [Google Scholar] [CrossRef]
- Danelakis, A.; Theoharis, T.; Pratikakis, I. A spatio-temporal wavelet-based descriptor for dynamic 3D facial expression retrieval and recognition. Vis. Comput. 2016, 32, 1001–1011. [Google Scholar] [CrossRef]
- Hossain, M.S.; Muhammad, G. An emotion recognition system for mobile applications. IEEE Access 2017, 5, 2281–2287. [Google Scholar] [CrossRef]
- Zhao, J.; Mao, X.; Zhang, J. Learning deep facial expression features from image and optical flow sequences using 3D CNN. Vis. Comput. 2018, 34, 1461–1475. [Google Scholar] [CrossRef]
- Tang, Y.; Zhang, X.M.; Wang, H. Geometric-convolutional feature fusion based on learning propagation for facial expression recognition. IEEE Access 2018, 6, 42532–42540. [Google Scholar] [CrossRef]
- Stankovic, I.; Karnjanadecha, M. Use of septum as reference point in a neurophysiologic approach to facial expression recognition. Songklanakarin J. Sci. Technol. 2013, 35, 461–468. [Google Scholar]
- Uddin, M.Z.; Hassan, M.M.; Almogren, A.; Alamri, A.; Alrubaian, M.; Fortino, G. Facial expression recognition utilizing local direction-based robust features and deep belief network. IEEE Access 2017, 5, 4525–4536. [Google Scholar] [CrossRef]
- Uddin, M.Z.; Khaksar, W.; Torresen, J. Facial expression recognition using salient features and convolutional neural network. IEEE Access 2017, 5, 26146–26161. [Google Scholar] [CrossRef]
- Danelakis, A.; Theoharis, T.; Pratikakis, I. A robust spatio-temporal scheme for dynamic 3D facial expression retrieval. Vis. Comput. 2016, 32, 257–269. [Google Scholar] [CrossRef]
- Agarwal, S.; Santra, B.; Mukherjee, D.P. Anubhav: Recognizing emotions through facial expression. Vis. Comput. 2018, 34, 177–191. [Google Scholar] [CrossRef]
- Ding, Y.; Zhao, Q.; Li, B.; Yuan, X. Facial expression recognition from image sequence based on LBP and Taylor expansion. IEEE Access 2017, 5, 19409–19419. [Google Scholar] [CrossRef]
- Kabir, M.H.; Salekin, M.S.; Uddin, M.Z.; Abdullah-Al-Wadud, M. Facial expression recognition from depth video with patterns of oriented motion flow. IEEE Access 2017, 5, 8880–8889. [Google Scholar] [CrossRef]
- Agarwal, S.; Chatterjee, M.; Mukherjee, P.D. Recognizing facial expressions using a novel shape motion descriptor. In Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, Mumbai, India, 16–19 December 2012; ACM: New York, NY, USA, 2012; p. 29. [Google Scholar] [CrossRef]
- Datcu, D.; Rothkrantz, L. Facial expression recognition in still pictures and videos using active appearance models: A comparison approach. In Proceedings of the 2007 International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 14–15 June 2007; ACM: New York, NY, USA, 2007; p. 112. [Google Scholar] [CrossRef]
- Berretti, S.; Del Bimbo, A.; Pala, P. Automatic facial expression recognition in real-time from dynamic sequences of 3D face scans. Vis. Comput. 2013, 29, 1333–1350. [Google Scholar] [CrossRef]
- Caridakis, G.; Malatesta, L.; Kessous, L.; Amir, N.; Raouzaiou, A.; Karpouzis, K. Modeling naturalistic affective states via facial and vocal expressions recognition. In Proceedings of the 8th International Conference on Multimodal Interfaces, Banff, AB, Canada, 2–4 November 2006; ACM: New York, NY, USA, 2006; pp. 146–154. [Google Scholar] [CrossRef]
- Meng, H.; Romera-Paredes, B.; Bianchi-Berthouze, N. Emotion recognition by two view SVM_2K classifier on dynamic facial expression features. In Proceedings of the Ninth IEEE International Conference on Automatic Face and Gesture Recognition (FG 2011), Santa Barbara, CA, USA, 21–25 March 2011; pp. 854–859. [Google Scholar] [CrossRef]
- Kumano, S.; Otsuka, K.; Yamato, J.; Maeda, E.; Sato, Y. Pose-invariant facial expression recognition using variable-intensity templates. In Asian Conference on Computer Vision; Springer: Berlin, Germany, 2007; pp. 324–334. [Google Scholar] [CrossRef]
- Park, S.Y.; Lee, S.H.; Ro, Y.M. Subtle facial expression recognition using adaptive magnification of discriminative facial motion. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; ACM: New York, NY, USA, 2015; pp. 911–914. [Google Scholar] [CrossRef]
- Pan, X.; Ying, G.; Chen, G.; Li, H.; Li, W. A Deep Spatial and Temporal Aggregation Framework for Video-Based Facial Expression Recognition. IEEE Access 2019, 7, 48807–48815. [Google Scholar] [CrossRef]
- Ghazi, M.M.; Ekenel, H.K. Automatic emotion recognition in the wild using an ensemble of static and dynamic representations. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 514–521. [Google Scholar] [CrossRef]
- Almaev, T.R.; Yüce, A.; Ghitulescu, A.; Valstar, M.F. Distribution-based iterative pairwise classification of emotions in the wild using lgbp-tofp. In Proceedings of the 15th ACM on International Conference On Multimodal Interaction, Sydney, NSW, Australia, 9–13 December 2013; ACM: New York, NY, USA, 2013; pp. 535–542. [Google Scholar] [CrossRef]
- Gehrig, T.; Ekenel, H.K. Why is facial expression analysis in the wild challenging? In Proceedings of the 2013 on Emotion Recognition in the Wild Challenge and Workshop, Sydney, Australia, 9 December 2013; ACM: New York, NY, USA, 2013; pp. 9–16. [Google Scholar] [CrossRef]
- Rázuri, J.G. Decision-making content of an agent affected by emotional feedback provided by capture of human’s emotions through a Bimodal System. 2015. Available online: https://pdfs.semanticscholar.org/111c/55156dac0e7b31a13e80ca6a4534cd962174.pdf?_ga=2.192627626.1409604446.1572417099-1535876467.1565229560 (accessed on 1 October 2019).
- Rashid, M.; Abu-Bakar, S.; Mokji, M. Human emotion recognition from videos using spatio-temporal and audio features. Vis. Comput. 2013, 29, 1269–1275. [Google Scholar] [CrossRef]
- Bejani, M.; Gharavian, D.; Charkari, N.M. Audiovisual emotion recognition using ANOVA feature selection method and multi-classifier neural networks. Neural Comput. Appl. 2014, 24, 399–412. [Google Scholar] [CrossRef]
- Paleari, M.; Huet, B.; Chellali, R. Towards multimodal emotion recognition: A new approach. In Proceedings of the ACM International Conference on Image and Video Retrieval, Xi’an, China, 5–7 July 2010; ACM: New York, NY, USA, 2010; pp. 174–181. [Google Scholar] [CrossRef]
- Liu, C.; Tang, T.; Lv, K.; Wang, M. Multi-feature based emotion recognition for video clips. In Proceedings of the 2018 on International Conference on Multimodal Interaction, Boulder, CO, USA, 16–20 October 2018; ACM: New York, NY, USA, 2018; pp. 630–634. [Google Scholar] [CrossRef]
- Mansoorizadeh, M.; Charkari, N.M. Bimodal person-dependent emotion recognition comparison of feature level and decision level information fusion. In Proceedings of the 1st International Conference on PErvasive Technologies Related to Assistive Environments, Athens, Greece, 16–18 July 2008; ACM: New York, NY, USA, 2008; p. 90. [Google Scholar] [CrossRef]
- Ding, W.; Xu, M.; Huang, D.; Lin, W.; Dong, M.; Yu, X.; Li, H. Audio and face video emotion recognition in the wild using deep neural networks and small datasets. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 506–513. [Google Scholar] [CrossRef]
- Yao, A.; Shao, J.; Ma, N.; Chen, Y. Capturing au-aware facial features and their latent relations for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 451–458. [Google Scholar] [CrossRef]
- Kaya, H.; Gürpinar, F.; Afshar, S.; Salah, A.A. Contrasting and combining least squares based learners for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 459–466. [Google Scholar] [CrossRef]
- Ebrahimi Kahou, S.; Michalski, V.; Konda, K.; Memisevic, R.; Pal, C. Recurrent neural networks for emotion recognition in video. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, 9–13 November 2015; ACM: New York, NY, USA, 2015; pp. 467–474. [Google Scholar] [CrossRef]
- Pini, S.; Ahmed, O.B.; Cornia, M.; Baraldi, L.; Cucchiara, R.; Huet, B. Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; ACM: New York, NY, USA, 2017; pp. 536–543. [Google Scholar] [CrossRef] [Green Version]
- Gideon, J.; Zhang, B.; Aldeneh, Z.; Kim, Y.; Khorram, S.; Le, D.; Provost, E.M. Wild wild emotion: A multimodal ensemble approach. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 501–505. [Google Scholar] [CrossRef]
- Chen, J.; Chen, Z.; Chi, Z.; Fu, H. Emotion recognition in the wild with feature fusion and multiple kernel learning. In Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey, 12–16 November 2014; ACM: New York, NY, USA, 2014; pp. 508–513. [Google Scholar] [CrossRef]
- Paleari, M.L.; Singh, V.; Huet, B.; Jain, R. Toward environment-to-environment (E2E) affective sensitive communication systems. In Proceedings of the First, ACM International Workshop on Multimedia Technologies for Distance Learning, Beijing, China, 19–24 October 2009; ACM: New York, NY, USA, 2009; pp. 19–26. [Google Scholar]
- Sidorov, M.; Minker, W. Emotion recognition in real-world conditions with acoustic and visual features. In Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey, 12–16 November 2014; ACM: New York, NY, USA, 2014; pp. 521–524. [Google Scholar] [CrossRef]
- Kahou, S.E.; Pal, C.; Bouthillier, X.; Froumenty, P.; Gülçehre, Ç.; Memisevic, R.; Vincent, P.; Courville, A.; Bengio, Y.; Ferrari, R.C.; et al. Combining modality specific deep neural networks for emotion recognition in video. In Proceedings of the 15th ACM on International Conference On Multimodal Interaction, Sydney, Australia, 9–13 December 2013; ACM: New York, NY, USA, 2013; pp. 543–550. [Google Scholar] [CrossRef] [Green Version]
- Krishna, T.; Rai, A.; Bansal, S.; Khandelwal, S.; Gupta, S.; Goyal, D. Emotion recognition using facial and audio features. In Proceedings of the 15th ACM on International Conference On Multimodal Interaction, Sydney, Australia, 9–13 December 2013; ACM: New York, NY, USA, 2013; pp. 557–564. [Google Scholar] [CrossRef]
- Wang, H.; Huang, H.; Hu, Y.; Anderson, M.; Rollins, P.; Makedon, F. Emotion detection via discriminative kernel method. In Proceedings of the 3rd International Conference on Pervasive Technologies Related to Assistive Environments, Samos, Greece, 23–25 June 2010; ACM: New York, NY, USA, 2010; p. 7. [Google Scholar] [CrossRef]
- Nicolaou, M.A.; Gunes, H.; Pantic, M. A multi-layer hybrid framework for dimensional emotion classification. In Proceedings of the 19th ACM International Conference on Multimedia, Scottsdale, AZ, USA, 28 November–1 December 2011; ACM: New York, NY, USA, 2011; pp. 933–936. [Google Scholar] [CrossRef] [Green Version]
- Chao, L.; Tao, J.; Yang, M.; Li, Y.; Wen, Z. Multi-scale temporal modeling for dimensional emotion recognition in video. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, Orlando, FL, USA, 7 November 2014; ACM: New York, NY, USA, 2014; pp. 11–18. [Google Scholar] [CrossRef]
- Meng, H.; Bianchi-Berthouze, N. Affective state level recognition in naturalistic facial and vocal expressions. IEEE Trans. Cybern. 2013, 44, 315–328. [Google Scholar] [CrossRef]
- Song, Y.; Morency, L.P.; Davis, R. Learning a sparse codebook of facial and body microexpressions for emotion recognition. In Proceedings of the 15th ACM on International Conference On Multimodal Interaction, Sydney, Australia, 9–13 December 2013; ACM: New York, NY, USA, 2013; pp. 237–244. [Google Scholar] [CrossRef] [Green Version]
- Meng, H.; Bianchi-Berthouze, N.; Deng, Y.; Cheng, J.; Cosmas, J.P. Time-delay neural network for continuous emotional dimension prediction from facial expression sequences. IEEE Trans. Cybern. 2015, 46, 916–929. [Google Scholar] [CrossRef] [PubMed]
- Liu, M.; Wang, R.; Li, S.; Shan, S.; Huang, Z.; Chen, X. Combining multiple kernel methods on riemannian manifold for emotion recognition in the wild. In Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey, 12–16 November 2014; ACM: New York, NY, USA, 2014; pp. 494–501. [Google Scholar] [CrossRef]
- Fan, Y.; Lu, X.; Li, D.; Liu, Y. Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; ACM: New York, NY, USA, 2016; pp. 445–450. [Google Scholar] [CrossRef]
- Hu, P.; Cai, D.; Wang, S.; Yao, A.; Chen, Y. Learning supervised scoring ensemble for emotion recognition in the wild. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; ACM: New York, NY, USA, 2017; pp. 553–560. [Google Scholar] [CrossRef]
- Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. BMVC 2015, 1, 6. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
Database | Capacity | Emotion | Environment | Facial Expressions | Website |
---|---|---|---|---|---|
CK+ | 593 videos | Posed | Controlled | 8 | [59] |
JAFFE | 213 images | Posed | Controlled | 7 | [60] |
BU-3DFE | 606 videos | Posed and Spontaneous | Controlled | 7 | [61] |
FER-2013 | 35,887 images | Posed and Spontaneous | Uncontrolled | 7 | [62] |
EmotiW | 1268 videos and 700 images | Spontaneous | Uncontrolled | 7 | [63] |
MMI | 2900 videos | Posed | Controlled | 6 | [64] |
eNTERFACE’05 | 1166 videos | Spontaneous | Controlled | 6 | [65] |
KDEF | 4900 images | Posed | Controlled | 7 | [66] |
RaFD | 8040 images | Posed | Controlled | 8 | [67] |
Year | Classifier | Pre-Processing | Features | Testing Procedure | Databases | Accuracy |
---|---|---|---|---|---|---|
[102] 2018 | CNN (TL) | DA | ROI | CV (PI) | CK+/SFEW | 98.90% |
[103] 2017 | WMDNN (TL) | Geometric/DA | LBP | CV | CK+/JAFFE/CASIA | 97.02% |
[104] 2018 | DT/NN | Geometric | LBF | CV (PI) | CK+/JAFFE/… | 96.48% |
[105] 2019 | CNN | Smooth | LBP/AUs | CV | CK+/JAFFE | 96.46% |
[106] 2019 | CNN | DA | ROI | HO | FER-2013/JAFFE/… | 96.44% |
[107] 2018 | CNN | Geometric | ROI | HO (PI) | CK+/RaFD/ADFES | 96.27% |
[108] 2018 | CNN/SVM | - | Gabor | CV | KDEF | 96.26% |
[109] 2017 | IACNN (TL) | Geometric/DRMF | ROI | CV (PI) | CK+/SFEW/MMI | 95.37% |
[110] 2016 | RBF | HE | Curvelet | CV | CK/JAFFE | 95.17% |
[111] 2016 | NN/SVM | HE/Smooth | LBP/Gabor | CV | CK+/MMI | 94.66% |
[112] 2017 | SVM/CRF | - | AAM/Gabor | CV | CK+ | 93.93% |
[113] 2018 | CNN (TL) | Geometric/DA | Facial motions | CV (PI) | CK+/JAFFE/SFEW | 93.64% |
[114] 2018 | RF | - | AUs/HOG | CV (PI) | CK+/BU4D/SFEW | 93.40% |
[115] 2016 | KNN | - | Landmarks | HO | JAFFE/KDEF | 92.29% |
[116] 2008 | ED | - | AAM | CV | FEEDTUM | 91.70% |
[117] 2018 | SVM | CLAHE | DCT | HO | CK+/JAFFE/MMI | 91.10% |
[118] 2019 | SVM | Smooth | CS-LOP | CV (PI) | CK+/JAFFE | 89.58% |
[119] 2019 | CNN | Geometric/DA | ROI/AUs | HO | RaFD/AffectNet | 81.83% |
[120] 2017 | CNN | Geometric/HE | ROI | HO | CK+/JAFFE | 80.30% |
[121] 2016 | SVM | HDR | LBP/SURF | CV (PI) | JAFFE/SFEW | 79.80% |
[122] 2011 | SVM | PCA/Geometric/Smooth | SIFT | CV (PI) | BU-3DFE | 78.43% |
[123] 2017 | RF | - | Landmarks/HOG | HO | SFEW | 75.39% |
[124] 2019 | DBN/SVM (TL) | - | ROI/OF | CV (PI) | BAUM/RML/MMI | 73.73% |
[125] 2010 | HOSVD | - | Gabor/AUs | CV (PI) | CK/JAFFE | 73.30% |
[126] 2017 | CNN | Geometric/DA | ROI | CV | FER-2013 | 71.03% |
[127] 2015 | CNN (TL) | Geometric/HE | ROI | HO | FER-2013/SFEW | 61.29% |
[128] 2015 | CNN (TL) | Geometric | Landmarks | HO | FER-2013/SFEW | 55.60% |
[129] 2015 | CNN (TL) | Geometric | LBP | HO | SFEW | 54.56% |
[130] 2016 | SVM | - | AAM/AUs | CV | CK+ | 54.47% |
[131] 2015 | CNN (TL) | Geometric | LBP/HOG/… | CV | SFEW | 51.08% |
Year | Classifier | Pre-processing | Features | Testing Procedure | Databases | Accuracy |
---|---|---|---|---|---|---|
[132] 2016 | ED | Geometric | Landmarks | HO | BU-4DFE/BP4D-S | 100.00% |
[133] 2017 | GMM | Bandlet | LBP/KW | HO | CK/JAFFE | 99.80% |
[134] 2018 | CNN | HE/Geometric | OF | HO | CK+/SAVEE/AFEW | 98.77% |
[135] 2018 | DFSN-I | Geometric | AUs | CV (PI) | CK+/MMI/CASIA | 98.73% |
[136] 2013 | SVM | - | AAM/AUs | CV | CK+ | 96.80% |
[137] 2017 | DBN | PCA | LDPP | CV | Depth | 96.67% |
[138] 2017 | CNN | PCA | LDSP/LDRHP | CV | CK/Bosphorus | 96.25% |
[139] 2016 | ED | - | Landmarks | CV | BU-4DFE | 96.04% |
[140] 2018 | SVM | - | LBP | CV | CK+/MUG | 95.80% |
[141] 2017 | TFP | LL | LBP | HO | CK/JAFFE | 94.84% |
[142] 2017 | HMM | - | LBP/POMF | HO | Depth | 94.17% |
[143] 2012 | SVM/RBF | - | OF/HOG | CV (PI) | CK | 87.44% |
[144] 2007 | SVM | - | AAM/AUs | CV | CK | 85.00% |
[145] 2013 | HMM | Geometric | Landmarks | CV (PI) | BU-4DFE | 79.40% |
[146] 2006 | RNN | - | FAPs | HO | SAL | 79.00% |
[147] 2011 | SVM | - | LBP/MHH | HO | GEMEP | 70.30% |
[148] 2007 | Bayes | Intensity adjustment | Landmarks | CV (PI) | CK | 70.20% |
[149] 2015 | SVM | - | LBP_TOP | CV (PI) | CASME II | 69.63% |
[150] 2019 | CNN/LSTM (TL) | Geometric | ROI/OF | CV (PI) | RML/eNTERFACE’05 | 65.72% |
[151] 2016 | CNN/SVM | PCA/Geometric | LBP_TOP/SIFT | HO | AFEW | 40.13% |
[152] 2013 | SVM | PCA | LBP/Gabor | HO | AFEW | 30.05% |
[153] 2013 | SVM | Geometric | Gabor/AUs | HO | AFEW | 29.81% |
Year | Classifier | Pre-Processing | Features | Testing Procedure | Databases | Accuracy |
---|---|---|---|---|---|---|
[154] 2015 | Bayes | Geometric | MK | CV | eNTERFACE’05 | 98.00% |
[155] 2013 | SVM | Smooth | Gabor/PCA | CV | eNTERFACE’05 | 80.27% |
[156] 2014 | MLP/RBF | - | ITMI/QIM | HO (PI) | CK/eNTERFACE’05 | 77.78% |
[157] 2010 | NN | - | FAPs/OF | HO (PI) | eNTERFACE’05 | 75.00% |
[158] 2018 | ED/CNN/LSTM (TL) | Geometric/DA | Landmarks | CV | AFEW/STED | 61.87% |
[159] 2008 | SVM | PCA | Landmarks | HO | eNTERFACE’05 | 57.00% |
[160] 2016 | CNN (TL) | - | ROI | HO | FER-2013/AFEW | 53.90% |
[161] 2015 | SVM | DCT/Geometric | AUs | HO | EmotiW | 53.80% |
[162] 2015 | CNN | PCA | SIFT/LBP/… | CV | AFEW | 53.62% |
[163] 2015 | CNN/RNN (TL) | HE/DA | ROI | HO | TFD/FER-2013/AFEW | 52.88% |
[164] 2017 | CNN | Geometric | ROI | CV (PI) | AFEW/FER-2013/… | 49.92% |
[165] 2016 | SVM/RF | - | LBP_TOP/AUs | CV | AFEW | 46.88% |
[166] 2014 | SVM | Geometric | HOG_TOP | HO | AFEW | 45.21% |
[167] 2009 | SVM/NN | - | FAPs/OF | HO | eNTERFACE’05 | 45.00% |
[168] 2014 | SVM | Geometric | LBP_TOP | HO | AFEW | 41.77% |
[169] 2013 | SVM/CNN (TL) | Smooth/Contrast | ROI | HO | TFD/AFEW | 41.03% |
[170] 2013 | SVM/HMM | - | Gabor/OF | HO | AFEW | 20.51% |
Year | Classifier | Pre-Processing | Features | Testing Procedure | Databases | Accuracy |
---|---|---|---|---|---|---|
[171] 2010 | SVM | LDA | Landmarks | CV | JAFFE | 87.50% |
Year | Classifier | Approach | Pre-Processing | Features | Testing Procedure | Databases | Accuracy |
---|---|---|---|---|---|---|---|
[172] 2011 | LSTM/HMM/SVM | Video | - | Landmarks | CV (PI) | SAL | 85.00% |
[173] 2014 | NN | Audiovisual | Geometric/PCA | Landmarks | HO | AVEC2013 | 54.99% |
[174] 2013 | KNN/HMM | Audiovisual | PCA | LBP/AAM | CV (PI) | AVEC2011/… | 52.60% |
[175] 2013 | SVR | Audiovisual | - | HOG/Harris3D | HO | AVEC2012 | 41.70% |
[176] 2015 | KNN/SVR | Video | - | LBP/EOH/LPQ | CV | AVEC2012/2013 | 14.09% |