Multi-Attention Module for Dynamic Facial Emotion Recognition
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Spatial Attention
3.2. Channel Attention
3.3. Frame Attention
4. Results
4.1. Datasets
4.2. Experiment
4.2.1. Experiment on CK+
4.2.2. Experiment on eNTERFACE’05
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Saravanan, S.; Ramkumar, K.; Adalarasu, K.; Sivanandam, V.; Kumar, S.R.; Stalin, S.; Amirtharajan, R. A Systematic Review of Artificial Intelligence (AI) Based Approaches for the Diagnosis of Parkinson’s Disease. Arch. Comput. Methods Eng. 2022, 1, 1–15.
- Jiang, Z.; Seyedi, S.; Haque, R.U.; Pongos, A.L.; Vickers, K.L.; Manzanares, C.M.; Lah, J.J.; Levey, A.I.; Clifford, G.D. Automated analysis of facial emotions in subjects with cognitive impairment. PLoS ONE 2022, 17, e0262527.
- Cecchetto, C.; Aiello, M.; D’Amico, D.; Cutuli, D.; Cargnelutti, D.; Eleopra, R.; Rumiati, R.I. Facial and bodily emotion recognition in multiple sclerosis: The role of alexithymia and other characteristics of the disease. J. Int. Neuropsychol. Soc. 2014, 20, 1004–1014.
- Li, S.; Deng, W. Deep Facial Expression Recognition: A Survey. IEEE Trans. Affect. Comput. 2020. Available online: https://ieeexplore.ieee.org/abstract/document/9039580 (accessed on 1 March 2022).
- Ekman, P.; Rosenberg, E.L. (Eds.) What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS); Oxford University Press: New York, NY, USA, 1997.
- Littlewort, G.; Bartlett, M.S.; Fasel, I.; Susskind, J.; Movellan, J. Dynamics of facial expression extracted automatically from video. In Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA, 27 June–2 July 2004.
- Shan, C.; Gong, S.; McOwan, P.W. Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image Vis. Comput. 2009, 27, 803–816.
- Kahou, S.E.; Michalski, V.; Konda, K.; Memisevic, R.; Pal, C. Recurrent Neural Networks for Emotion Recognition in Video. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, New York, NY, USA, 9 November 2015; pp. 467–474.
- Byeon, Y.-H.; Kwak, K.-C. Facial Expression Recognition Using 3D Convolutional Neural Network. Int. J. Adv. Comput. Sci. Appl. 2014, 5, 12.
- Fan, Y.; Lu, X.; Li, D.; Liu, Y. Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; Nakano, Y., Ed.; The Association for Computing Machinery Inc.: New York, NY, USA, 2016; pp. 445–450, ISBN 9781450345569.
- Noroozi, F.; Marjanovic, M.; Njegus, A.; Escalera, S.; Anbarjafari, G. Audio-Visual Emotion Recognition in Video Clips. IEEE Trans. Affect. Comput. 2017, 10, 60–75.
- Ma, F.; Li, Y.; Ni, S.; Huang, S.; Zhang, L. Data Augmentation for Audio–Visual Emotion Recognition with an Efficient Multimodal Conditional GAN. Appl. Sci. 2022, 12, 527.
- Kanade, T.; Tian, Y.; Cohn, J.F. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 28–30 March 2000.
- Martin, O.; Kotsia, I.; Macq, B.; Pitas, I. The eNTERFACE’05 Audio-Visual Emotion Database. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW’06), Atlanta, GA, USA, 3–7 April 2006; pp. 383–388.
- Meng, D.; Peng, X.; Wang, K.; Qiao, Y. Frame Attention Networks for Facial Expression Recognition in Videos. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019.
- Sepas-Moghaddam, A.; Etemad, A.; Pereira, F.; Correia, L.P. Facial emotion recognition using light field images with deep attention-based bidirectional LSTM. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020.
- Aminbeidokhti, M.; Pedersoli, M.; Cardinal, P.; Granger, E. Emotion recognition with spatial attention and temporal softmax pooling. In International Conference on Image Analysis and Recognition; Springer: Berlin/Heidelberg, Germany, 2019; pp. 323–331.
- Hu, M.; Chu, Q.; Wang, X.; He, L.; Ren, F. A two-stage spatiotemporal attention convolution network for continuous dimensional emotion recognition from facial video. IEEE Signal Process. Lett. 2021, 28, 698–702.
- Wang, Y.; Wu, J.; Hoashi, K. Multi-attention fusion network for video-based emotion recognition. In Proceedings of the 2019 International Conference on Multimodal Interaction, Association for Computing Machinery, New York, NY, USA, 14 October 2019; pp. 595–601.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. arXiv 2018, arXiv:1807.06521.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
**Table.** Recognition accuracy on the CK+ dataset (Section 4.2.1).

| Model | Acc. (%) |
|---|---|
| ResNet18+FC (Baseline) | 81.25 |
| Baseline with SA | 85.23 |
| Baseline with CA | 82.34 |
| Baseline with FA | 84.38 |
| Baseline with SA, CA | 85.42 |
| Ours (Baseline with SA, CA, FA) | 89.52 |
| CNN-RNN | 87.52 |
| FAN (with Relation-attention) | 83.28 |
| ResNet18+FC with CBAM [22] | 85.38 |
| Vgg16+Bi-LSTM with Attention | 87.26 |
| RAN+TS-SATCN | 90.17 |
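The SA and CA rows above refer to the spatial and channel attention modules of Sections 3.1 and 3.2. As a rough illustration of how such modules can be attached to ResNet18 feature maps, here is a minimal CBAM-style sketch [22] in PyTorch; the reduction ratio, kernel size, and class names are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio assumed
        super().__init__()
        # Shared MLP applied to both the average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        return x * w.view(b, c, 1, 1)  # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):  # kernel size assumed
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pool across the channel axis, then learn a single 2D attention map.
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))  # reweight spatial positions
```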
**Table.** Recognition accuracy on the eNTERFACE’05 dataset (Section 4.2.2).

| Model | Acc. (%) |
|---|---|
| Vgg16 [23] + FC | 72.18 |
| ResNet18+FC (Baseline) | 80.83 |
| Baseline with SA | 85.00 |
| Baseline with CA | 85.38 |
| Baseline with FA | 84.17 |
| Baseline with SA, CA | 86.25 |
| Ours (Baseline with SA, CA, FA) | 88.33 |
| CNN-RNN | 86.18 |
| FAN (with Relation-attention) | 82.08 |
| ResNet18+FC with CBAM | 85.41 |
| Vgg16+Bi-LSTM with Attention | 85.83 |
| RAN+TS-SATCN | 89.25 |
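Frame attention (FA, Section 3.3) handles the temporal axis: each of a clip's T per-frame embeddings receives a learned importance weight before pooling into a single clip-level feature. Below is a minimal sketch in the spirit of FAN-style attention pooling [Meng et al.]; the feature dimension and the single-layer scoring function are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FrameAttention(nn.Module):
    """Pools T per-frame features into one clip-level feature."""
    def __init__(self, feat_dim: int = 512):  # 512 matches a ResNet18 embedding
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # one scalar importance per frame

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, D) -- per-frame embeddings from, e.g., a ResNet18 backbone
        alpha = torch.softmax(self.score(feats), dim=1)  # (B, T, 1), sums to 1 over T
        return (alpha * feats).sum(dim=1)                # (B, D) weighted average

# Usage: 4 clips of 16 frames with 512-d features each.
clip_feat = FrameAttention(512)(torch.randn(4, 16, 512))  # -> shape (4, 512)
```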
**Table.** Model size comparison (number of parameters).

| Model | Parameters |
|---|---|
| ResNet18+FC | 11.78 M |
| ResNet18+FC with CBAM | 11.87 M |
| FAN | 11.79 M |
| Ours | 13.39 M |
| CNN-RNN | 30.99 M |
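As a sanity check on the table above, the backbone's size can be reproduced with torchvision; the stock 1000-class ResNet18 head gives roughly 11.69 M parameters, so the 11.78 M reported for ResNet18+FC differs slightly, presumably because of the classifier head used in the paper:

```python
from torchvision.models import resnet18

model = resnet18(weights=None)  # stock architecture with the 1000-class ImageNet head
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f} M parameters")  # ~11.69 M for the stock model
```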
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhi, J.; Song, T.; Yu, K.; Yuan, F.; Wang, H.; Hu, G.; Yang, H. Multi-Attention Module for Dynamic Facial Emotion Recognition. Information 2022, 13, 207. https://doi.org/10.3390/info13050207