Siamese Architecture-Based 3D DenseNet with Person-Specific Normalization Using Neutral Expression for Spontaneous and Posed Smile Classification
Abstract
1. Introduction
2. Related Works
2.1. Hand-Crafted Feature-Based Approaches
2.2. Deep Learning-Based Approaches
3. Materials and Methods
3.1. Data Collection
3.2. Data Preprocessing
3.2.1. Automatic Detection of Smile Moments
3.2.2. Facial Region Segmentation
3.2.3. Time-Series Segmentation Using a Fixed-Size Sliding Window
3.3. Method
3.3.1. 3D DenseNet Based on Siamese Architecture
3.3.2. Model Structure
3.3.3. Loss Function and Parameter Initialization
4. Experimental Results
4.1. Performance of the Proposed Model
4.2. Visual Explanations Using Grad-CAM for Siamese Architecture-Based 3D CNN
4.3. Performance Comparison between 3D CNN Models with Diversiform Structures
4.4. Comparison between Neutral Anchor Siamese Net and Posed Anchor Siamese Net
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Bibri, S.E. The Human Face of Ambient Intelligence; Atlantis Press: Amsterdam, The Netherlands, 2015; pp. 427–431.
2. Duthoit, C.J.; Sztynda, T.; Lal, S.K.; Jap, B.T.; Agbinya, J.I. Optical flow image analysis of facial expressions of human emotion: Forensic applications. In Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia and Workshop, Adelaide, Australia, 21–23 January 2008; ICST: Brussels, Belgium, 2008.
3. Haamer, R.E.; Kulkarni, K.; Imanpour, N.; Haque, M.A.; Avots, E.; Breisch, M.; Naghsh-Nilchi, A.R. Changes in facial expression as biometric: A database and benchmarks of identification. In Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, Xi’an, China, 15–19 May 2018; IEEE: Piscataway, NJ, USA, 2018.
4. Manfredonia, J.; Bangerter, A.; Manyakov, N.V.; Ness, S.; Lewin, D.; Skalkin, A.; Leventhal, B. Automatic recognition of posed facial expression of emotion in individuals with autism spectrum disorder. J. Autism Dev. Disord. 2019, 49, 279–293.
5. Adams, R.B., Jr.; Garrido, C.O.; Albohn, D.N.; Hess, U.; Kleck, R.E. What facial appearance reveals over time: When perceived expressions in neutral faces reveal stable emotion dispositions. Front. Psychol. 2016, 7, 986.
6. Li, S.; Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 2020.
7. Caltagirone, C.; Ekman, P.; Friesen, W.; Gainotti, G.; Mammucari, A.; Pizzamiglio, L.; Zoccolotti, P. Posed emotional expression in unilateral brain damaged patients. Cortex 1989, 25, 653–663.
8. Adolphs, R. Recognizing emotion from facial expressions: Psychological and neurological mechanisms. Behav. Cogn. Neurosci. Rev. 2002, 1, 21–62.
9. Jankovic, J. Parkinson’s disease: Clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 2008, 79, 368–376.
10. Smith, M.C.; Smith, M.K.; Ellgring, H. Spontaneous and posed facial expression in Parkinson’s disease. J. Int. Neuropsychol. Soc. 1996, 2, 383–391.
11. Ekman, P.; Hager, J.C.; Friesen, W.V. The symmetry of emotional and deliberate facial actions. Psychophysiology 1981, 18, 101–106.
12. Ross, E.D.; Pulusu, V.K. Posed versus spontaneous facial expressions are modulated by opposite cerebral hemispheres. Cortex 2013, 49, 1280–1291.
13. Borod, J.C.; Koff, E.; Lorch, M.P.; Nicholas, M. The expression and perception of facial emotion in brain-damaged patients. Neuropsychologia 1986, 24, 169–180.
14. Gazzaniga, M.S.; Smylie, C.S. Hemispheric mechanisms controlling voluntary and spontaneous facial expressions. J. Cogn. Neurosci. 1990, 2, 239–245.
15. Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231.
16. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2015.
17. Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature extraction for hyperspectral imagery: The evolution from shallow to deep. arXiv 2020, arXiv:2003.02822.
18. Oh, S.H.; Xiang, Y.; Jegelka, S.; Savarese, S. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4004–4012.
19. Dibeklioğlu, H.; Valenti, R.; Salah, A.A.; Gevers, T. Eyes do not lie: Spontaneous versus posed smiles. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; pp. 703–706.
20. Duchenne de Boulogne, G.B. The Mechanism of Human Facial Expression; Cambridge University Press: Cambridge, UK, 1990.
21. Dibeklioğlu, H.; Salah, A.A.; Gevers, T. Recognition of genuine smiles. IEEE Trans. Multimed. 2015, 17, 279–294.
22. Valstar, M.F.; Gunes, H.; Pantic, M. How to distinguish posed from spontaneous smiles using geometric features. In Proceedings of the 9th International Conference on Multimodal Interfaces, Nagoya, Aichi, Japan, 12–15 November 2007; ACM: New York, NY, USA, 2007; pp. 28–45.
23. Wu, P.P.; Liu, H.; Zhang, X.W.; Gao, Y. Spontaneous versus posed smile recognition via region-specific texture descriptor and geometric facial dynamics. Front. Inf. Technol. Electron. Eng. 2017, 18, 955–967.
24. FER-2013: Facial Expression Recognition 2013 Dataset. Wolfram Data Repository. Available online: https://datarepository.wolframcloud.com/resources/FER-2013 (accessed on 22 September 2020).
25. Mandal, B.; Lee, D.; Ouarti, N. Distinguishing posed and spontaneous smiles by facial dynamics. In Computer Vision—ACCV 2016; Springer: Cham, Switzerland, 2016; pp. 552–566.
26. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015.
27. Ojansivu, V.; Heikkilä, J. Blur insensitive texture classification using local phase quantization. In Image and Signal Processing; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5099, pp. 236–243.
28. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005.
29. Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA 2003), Halmstad, Sweden, 29 June–2 July 2003; pp. 363–370.
30. Gan, Q.; Wu, C.; Wang, S.; Ji, Q. Posed and spontaneous facial expression differentiation using deep Boltzmann machines. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi’an, China, 21–24 September 2015; pp. 643–648.
31. Wang, S.; Liu, Z.; Lv, S.; Lv, Y.; Wu, G.; Peng, P.; Wang, X. A natural visible and infrared facial expression database for expression recognition and emotion inference. IEEE Trans. Multimed. 2010, 12, 682–691.
32. Pfister, T.; Li, X.; Zhao, G.; Pietikäinen, M. Differentiating spontaneous from posed facial expressions within a generic facial expression recognition framework. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 868–875.
33. Kumar, G.R.; Kumar, R.K.; Sanyal, G. Discriminating real from fake smile using convolution neural network. In Proceedings of the 2017 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, 2–3 June 2017; pp. 1–6.
34. Valstar, M.; Pantic, M. Induced disgust, happiness and surprise: An addition to the MMI facial expression database. In Proceedings of the 3rd International Workshop on EMOTION (Satellite of LREC): Corpora for Research on Emotion and Affect, Valletta, Malta, 18 May 2010; pp. 65–70.
35. Yang, Y.; Hossain, M.Z.; Gedeon, T.; Rahman, S. RealSmileNet: A deep end-to-end network for spontaneous and posed smile recognition. In Proceedings of the 15th Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
36. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
37. Wang, S.; Hao, L.; Ji, Q. Posed and spontaneous expression distinction using latent regression Bayesian networks. ACM Trans. Multimed. Comput. Commun. Appl. 2020, 16, 1–18.
38. Baltrušaitis, T.; Mahmoud, M.; Robinson, P. Cross-dataset learning and person-specific normalisation for automatic action unit detection. In Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Ljubljana, Slovenia, 4–8 May 2015; Volume 6.
39. Lee, K.; Lee, E.C. Facial asymmetry feature based spontaneous facial expression classification using temporal convolutional networks and support vector regression. Basic Clin. Pharmacol. Toxicol. 2018, 124 (Suppl. 2), 63–64.
40. Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 2014, 27, 568–576.
41. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
42. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In Similarity-Based Pattern Recognition (SIMBAD 2015); Lecture Notes in Computer Science; Feragen, A., Pelillo, M., Loog, M., Eds.; Springer: Cham, Switzerland, 2015; pp. 84–92.
43. Park, S.; Lee, K.; Lim, J.A.; Ko, H.; Kim, T.; Lee, J.I.; Lee, J.Y. Differences in facial expressions between spontaneous and posed smiles: Automated method by action units and three-dimensional facial landmarks. Sensors 2020, 20, 1199.
44. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Trans. Image Process. 2018, 28, 1923–1938.
| | Number of Subjects | Number of Spontaneous Smile Data (%) | Number of Posed Smile Data (%) | Number of Total Data (%) |
|---|---|---|---|---|
| Training set | 68 | 18,300 (75%) | 23,486 (77%) | 41,786 (76%) |
| Validation set | 10 | 1719 (7%) | 3753 (12%) | 5472 (10%) |
| Test set | 10 | 4473 (18%) | 3261 (11%) | 7734 (14%) |
| Total | 88 | 24,492 (100%) | 30,500 (100%) | 54,992 (100%) |
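The partitioning above is subject-disjoint: the 88 subjects are divided 68/10/10, so no person contributes sliding-window samples to more than one set. The following is a minimal sketch of such a split, not the authors' code; the array names and the use of scikit-learn's `GroupShuffleSplit` are our assumptions.

```python
# Subject-disjoint split sketch: every window from a given subject lands in
# exactly one partition. `labels` and `subject_ids` are per-window arrays.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def subject_disjoint_split(labels, subject_ids, n_val=10, n_test=10, seed=0):
    idx = np.arange(len(labels))
    # Hold out n_test subjects (groups) for the test set.
    outer = GroupShuffleSplit(n_splits=1, test_size=n_test, random_state=seed)
    trainval, test = next(outer.split(idx, labels, groups=subject_ids))
    # From the remainder, hold out n_val further subjects for validation.
    inner = GroupShuffleSplit(n_splits=1, test_size=n_val, random_state=seed)
    tr, va = next(inner.split(trainval, labels[trainval],
                              groups=subject_ids[trainval]))
    return trainval[tr], trainval[va], test

# Toy example with 88 synthetic subjects, one window per subject:
labels = np.random.randint(0, 2, 88)
subjects = np.arange(88)
train_idx, val_idx, test_idx = subject_disjoint_split(labels, subjects)
print(len(train_idx), len(val_idx), len(test_idx))  # 68 10 10
```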
| Layers | Output Size (Width × Height × Time Series) | 3D DenseNet Based on Siamese Architecture-60 |
|---|---|---|
| 3D convolution | 56 × 56 × 30 | |
| 3D dense block (1) | 56 × 56 × 30 | |
| Transition layer (1) | 56 × 56 × 30 → 28 × 28 × 15 | |
| 3D dense block (2) | 28 × 28 × 15 | |
| Transition layer (2) | 28 × 28 × 15 → 14 × 14 × 15 | |
| 3D dense block (3) | 14 × 14 × 15 | |
| Transition layer (3) | 14 × 14 × 15 → 7 × 7 × 7 | |
| 3D dense block (4) | 7 × 7 × 7 | |
| 3D convolution | 7 × 7 × 7 | |
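For orientation, the sketch below is a minimal PyTorch rendering of a Siamese-style 3D DenseNet whose stage output sizes match the table: spatial resolution halves at every transition layer, and the temporal axis halves at transitions 1 and 3. This is not the published implementation; the growth rate, per-block layer counts, and the fusion of the two weight-shared streams by absolute feature difference are illustrative assumptions.

```python
# Minimal Siamese 3D-DenseNet sketch (illustrative, not the published model).
import torch
import torch.nn as nn

class DenseLayer3D(nn.Module):
    """BN-ReLU-Conv3d layer whose output is concatenated onto its input."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.f = nn.Sequential(
            nn.BatchNorm3d(in_ch), nn.ReLU(inplace=True),
            nn.Conv3d(in_ch, growth, 3, padding=1, bias=False))
    def forward(self, x):
        return torch.cat([x, self.f(x)], dim=1)  # dense connectivity

def dense_block(in_ch, n_layers, growth):
    layers = [DenseLayer3D(in_ch + i * growth, growth) for i in range(n_layers)]
    return nn.Sequential(*layers), in_ch + n_layers * growth

class Transition3D(nn.Module):
    """1x1x1 conv to compress channels, then average pooling."""
    def __init__(self, in_ch, out_ch, pool):  # pool = (time, height, width)
        super().__init__()
        self.f = nn.Sequential(
            nn.BatchNorm3d(in_ch), nn.ReLU(inplace=True),
            nn.Conv3d(in_ch, out_ch, 1, bias=False),
            nn.AvgPool3d(pool))
    def forward(self, x):
        return self.f(x)

class Siamese3DDenseNet(nn.Module):
    def __init__(self, growth=12, block_layers=(4, 4, 4, 4)):
        super().__init__()
        ch = 2 * growth
        stages = [nn.Conv3d(3, ch, 3, padding=1, bias=False)]  # -> 56x56x30
        # Pool sizes follow the output-size table: spatial dims halve at every
        # transition; the temporal axis halves at transitions 1 and 3 only.
        pools = [(2, 2, 2), (1, 2, 2), (2, 2, 2)]
        for i, n in enumerate(block_layers):
            block, ch = dense_block(ch, n, growth)
            stages.append(block)
            if i < 3:
                stages.append(Transition3D(ch, ch // 2, pools[i]))
                ch //= 2
        stages.append(nn.Conv3d(ch, ch, 3, padding=1, bias=False))  # 7x7x7
        self.backbone = nn.Sequential(*stages)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.head = nn.Linear(ch, 1)

    def embed(self, clip):  # clip: (batch, RGB, frames, height, width)
        return self.pool(self.backbone(clip)).flatten(1)

    def forward(self, neutral_clip, smile_clip):
        # Weight sharing: both clips pass through the same backbone, so the
        # neutral anchor acts as a person-specific reference for the smile.
        z_n, z_s = self.embed(neutral_clip), self.embed(smile_clip)
        return self.head(torch.abs(z_s - z_n))  # posed/spontaneous logit

model = Siamese3DDenseNet()
neutral = torch.randn(2, 3, 30, 56, 56)
smile = torch.randn(2, 3, 30, 56, 56)
print(model(neutral, smile).shape)  # torch.Size([2, 1])
```

Training such a sketch for the binary posed/spontaneous decision would pair the returned logit with, for example, `torch.nn.BCEWithLogitsLoss`; the paper's actual loss and initialization are described in Section 3.3.3.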
Network | Accuracy | Precision | Recall | AUC |
---|---|---|---|---|
Lee et al. [39] | 87.13% | 90.15% | 86.94% | 87.99% |
3D DenseNet-38 | 97.06% | 97.21% | 98.53% | 98.97% |
3D DenseNet-62 | 95.87% | 98.84% | 95.10% | 99.54% |
3D DenseNet-94 | 98.12% | 97.85% | 99.44% | 99.91% |
| Model | Network Depth | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|
| Neutral anchor Siamese Net (proposed model) | 38 | 97.06% | 97.21% | 98.53% | 98.97% |
| | 62 | 95.87% | 98.84% | 95.10% | 99.54% |
| | 94 | 98.12% | 97.85% | 99.44% | 99.91% |
| Posed anchor Siamese Net | 38 | 91.31% | 95.85% | 91.16% | 97.65% |
| | 62 | 93.73% | 95.20% | 95.59% | 98.01% |
| | 94 | 93.08% | 92.01% | 98.36% | 98.40% |
| Plain 3D CNN | 38 | 65.86% | 89.39% | 56.99% | 85.23% |
| | 62 | 72.77% | 92.52% | 65.60% | 79.11% |
| | 94 | 71.24% | 93.91% | 62.08% | 90.86% |
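For reference, the four metrics reported in these tables follow their standard binary-classification definitions. A short scikit-learn sketch follows, not the authors' evaluation code; the placeholder arrays and the choice of spontaneous smiles as the positive class are our assumptions.

```python
# Reference computation of the four reported metrics with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1])              # 1 = spontaneous, 0 = posed
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.8])  # model scores
y_pred = (y_prob >= 0.5).astype(int)               # threshold at 0.5

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"Precision: {precision_score(y_true, y_pred):.2%}")
print(f"Recall:    {recall_score(y_true, y_pred):.2%}")
print(f"AUC:       {roc_auc_score(y_true, y_prob):.2%}")  # uses raw scores
```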
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).