
Multimodal Emotion Recognition in Artificial Intelligence

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (4 February 2022) | Viewed by 26,409

Special Issue Editors


Prof. Dr. Valentina Franzoni
Guest Editor
Department of Mathematics and Computer Science, University of Perugia, 06123 Perugia, Italy
Interests: artificial intelligence; emotion recognition; learner behaviour modeling; semantic proximity measures; link prediction; deep learning algorithms

Dr. Giulio Biondi
Guest Editor
Department of Mathematics and Computer Science (DiMaI), University of Florence, Florence, Italy
Interests: artificial intelligence; e-learning; link prediction; complex networks

Prof. Dr. Alfredo Milani
Guest Editor
Department of Mathematics and Computer Science, University of Perugia, 06123 Perugia, Italy
Interests: online evolutionary algorithms; metaheuristic for combinatorial optimization; discrete differential evolution; semantic proximity measures; planning agents and complex network dynamics; emotion recognition

Prof. Dr. Jordi Vallverdú
Guest Editor
Philosophy Department, Universitat Autònoma de Barcelona, 08193 Bellaterra (BCN), Spain
Interests: robot emotions; affective computing; computational cognitive science; human-robot interaction; philosophy of technology; Bayesian probability; blended cognition

Special Issue Information

Dear Colleagues,

Advances in artificial intelligence demand multidisciplinary development across affective computing and emotion recognition, which are becoming key enablers of human–machine interaction, data mining systems, medical self-care, social network analysis, and the study of the social influence of multimodal communication.

This Special Issue aims to bring together researchers and practitioners to stimulate cooperation and cross-fertilization among the communities working on the research, development, and application of emotion recognition, both through the use of emotional data and through data that arouse different types and levels of emotion.

Recent innovations are paving the way for new applications across a broad range of sensors, from personal data sources (e.g., wearable devices, crowd-sound data, speech, images, brain–computer interfaces) to professional sensors used in laboratory settings (e.g., eye tracking, medical data, MRI, balance boards).

Prof. Dr. Valentina Franzoni
Dr. Giulio Biondi
Prof. Dr. Alfredo Milani
Prof. Dr. Jordi Vallverdú
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • affective computing
  • emotion recognition
  • artificial intelligence
  • wearable sensors
  • brain–computer devices
  • crowd-sound emotions
  • speech emotions
  • text emotions
  • face recognition
  • social robots

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

18 pages, 519 KiB  
Article
Robust Multi-Scenario Speech-Based Emotion Recognition System
by Fangfang Zhu-Zhou, Roberto Gil-Pita, Joaquín García-Gómez and Manuel Rosa-Zurera
Sensors 2022, 22(6), 2343; https://doi.org/10.3390/s22062343 - 18 Mar 2022
Cited by 11 | Viewed by 2447
Abstract
Every human being experiences emotions daily, e.g., joy, sadness, fear, anger. These might be revealed through speech—words are often accompanied by our emotional states when we talk. Several acoustic emotional databases are freely available for the Emotional Speech Recognition (ESR) task. Unfortunately, many of them were generated under unrealistic conditions: actors played the emotions, and the recordings were made in fictitious, noise-free circumstances. Another weakness in the design of emotion recognition systems is the scarcity of patterns in the available databases, which causes generalization problems and leads to overfitting. This paper examines how different elements of the recording environment impact system performance using a simple logistic regression algorithm. Specifically, we conducted experiments simulating different scenarios with varying levels of Gaussian white noise, real-world noise, and reverberation. The results show a performance deterioration in all scenarios, with the error probability increasing from 25.57% to 79.13% in the worst case. Additionally, a virtual enlargement method and a robust multi-scenario speech-based emotion recognition system are proposed. Our system's average error probability of 34.57% is comparable to the best-case scenario of 31.55%. The findings support the prediction that simulated emotional speech databases do not offer sufficient closeness to real scenarios.
(This article belongs to the Special Issue Multimodal Emotion Recognition in Artificial Intelligence)
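The noise-degradation setup described in the abstract can be illustrated with a small sketch. The following Python snippet is not taken from the paper: it shows one way to mix Gaussian white noise into a clean signal at a target SNR and to "virtually enlarge" a training set for a logistic-regression classifier. The feature extractor, the SNR level, and the toy data are all placeholder assumptions.

```python
# Hypothetical sketch (not the authors' code): degrade clean speech with white
# noise at a target SNR, then train a logistic-regression emotion classifier
# on features extracted from both clean and degraded signals.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def add_white_noise(signal: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix Gaussian white noise into `signal` at the requested SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor (simple energy statistics); the paper's
    actual acoustic features are not reproduced here."""
    return np.array([signal.mean(), signal.std(), np.abs(signal).max()])

# Toy data: `clean_signals` is a list of 1-D arrays, `labels` the emotion classes.
rng = np.random.default_rng(0)
clean_signals = [rng.standard_normal(16000) for _ in range(100)]
labels = rng.integers(0, 4, size=100)

# "Virtual enlargement": augment the training set with degraded copies.
X = [extract_features(s) for s in clean_signals]
X += [extract_features(add_white_noise(s, snr_db=5.0)) for s in clean_signals]
y = np.concatenate([labels, labels])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", accuracy_score(y, clf.predict(X)))
```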

17 pages, 3072 KiB  
Article
Emotional Speech Recognition Method Based on Word Transcription
by Gulmira Bekmanova, Banu Yergesh, Altynbek Sharipbay and Assel Mukanova
Sensors 2022, 22(5), 1937; https://doi.org/10.3390/s22051937 - 2 Mar 2022
Cited by 16 | Viewed by 4022
Abstract
The emotional speech recognition method presented in this article was applied to recognize the emotions of students during online exams in distance learning due to COVID-19. The purpose of this method is to recognize emotions in spoken speech through a knowledge base of emotionally charged words, stored as a code book. The method analyzes human speech for the presence of emotions. To assess the quality of the method, an experiment was conducted on 420 audio recordings. The accuracy of the proposed method is 79.7% for the Kazakh language. The method can be used for different languages and consists of the following tasks: capturing a signal, detecting speech in it, recognizing speech words in a simplified transcription, determining word boundaries, comparing the simplified transcription with the code book, and constructing a hypothesis about the degree of speech emotionality. If emotions are present, full word recognition and identification of the emotions in speech are performed. The advantage of this method is that it can be used widely, since it is not demanding on computational resources. The described method can be applied when there is a need to recognize positive and negative emotions in a crowd, in public transport, schools, universities, etc. The experiment carried out has shown the effectiveness of this method. The results obtained will make it possible in the future to develop devices that record and recognize a speech signal when, for example, negative emotions are detected in spoken speech, and, if necessary, transmit a message about potential threats or riots.
(This article belongs to the Special Issue Multimodal Emotion Recognition in Artificial Intelligence)
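As an illustration of the code-book idea described above, here is a hypothetical Python sketch (not the authors' implementation): recognized words from a simplified transcription are matched against a toy code book of emotionally charged words, and the dominant emotion among the hits is returned as the hypothesis. The code-book entries and labels are invented for the example.

```python
# Hypothetical sketch: look up recognized words in a code book of emotionally
# charged words and form a hypothesis about the emotionality of the utterance.
from collections import Counter

# Toy code book: simplified word transcriptions mapped to emotion labels.
CODE_BOOK = {
    "great": "positive",
    "happy": "positive",
    "terrible": "negative",
    "angry": "negative",
}

def hypothesize_emotion(recognized_words: list[str]) -> str:
    """Return the dominant emotion among code-book hits, or 'neutral'."""
    hits = Counter(CODE_BOOK[w] for w in recognized_words if w in CODE_BOOK)
    if not hits:
        return "neutral"
    return hits.most_common(1)[0][0]

print(hypothesize_emotion(["today", "was", "terrible", "and", "angry"]))  # negative
```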

13 pages, 958 KiB  
Article
Lie to Me: Shield Your Emotions from Prying Software
by Alina Elena Baia, Giulio Biondi, Valentina Franzoni, Alfredo Milani and Valentina Poggioni
Sensors 2022, 22(3), 967; https://doi.org/10.3390/s22030967 - 26 Jan 2022
Cited by 9 | Viewed by 3081
Abstract
Deep learning approaches for facial Emotion Recognition (ER) obtain high accuracy on basic models, e.g., Ekman's models, in the specific domain of facial emotional expressions. Thus, facial tracking of users' emotions could easily be used against the right to privacy or for manipulative purposes. As recent studies have shown that deep learning models are susceptible to adversarial examples (images intentionally modified to fool a machine learning classifier), we propose to use them to preserve users' privacy against ER. In this paper, we present a technique for generating Emotion Adversarial Attacks (EAAs). EAAs are performed by applying well-known image filters inspired by Instagram, and a multi-objective evolutionary algorithm is used to determine the best per-image combination of attacking filters. Experimental results on the well-known AffectNet dataset of facial expressions show that our approach successfully attacks emotion classifiers to protect user privacy while maintaining image quality from the human perception point of view. Several experiments with different sequences of filters show that the Attack Success Rate is very high, above 90% in every test.
(This article belongs to the Special Issue Multimodal Emotion Recognition in Artificial Intelligence)
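The filter-based attack can be sketched as a search over filter parameters with two competing objectives. The snippet below is not the paper's code: it substitutes a plain random search for the multi-objective evolutionary algorithm, uses a random placeholder in place of the emotion classifier, and takes Pillow's brightness/contrast/blur adjustments as stand-ins for Instagram-like filters.

```python
# Hypothetical sketch: search for filter combinations that lower an emotion
# classifier's confidence while keeping the image close to the original.
import random
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def apply_filters(img: Image.Image, params: dict) -> Image.Image:
    """Apply an Instagram-like combination of brightness/contrast/blur."""
    out = ImageEnhance.Brightness(img).enhance(params["brightness"])
    out = ImageEnhance.Contrast(out).enhance(params["contrast"])
    return out.filter(ImageFilter.GaussianBlur(params["blur"]))

def emotion_confidence(img: Image.Image) -> float:
    """Placeholder for a facial-emotion classifier's confidence in the true label."""
    return random.random()

def image_distance(a: Image.Image, b: Image.Image) -> float:
    """Mean absolute pixel difference as a crude perceptual-quality proxy."""
    return float(np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float))))

original = Image.new("RGB", (224, 224), "gray")  # stand-in for a face image
best = None
for _ in range(50):
    params = {"brightness": random.uniform(0.7, 1.3),
              "contrast": random.uniform(0.7, 1.3),
              "blur": random.uniform(0.0, 2.0)}
    candidate = apply_filters(original, params)
    # Two objectives: fool the classifier and stay close to the original image.
    score = (emotion_confidence(candidate), image_distance(original, candidate))
    if best is None or score < best[0]:
        best = (score, params)
print("best filter parameters:", best[1])
```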

29 pages, 1759 KiB  
Article
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
by Cristina Luna-Jiménez, David Griol, Zoraida Callejas, Ricardo Kleinlein, Juan M. Montero and Fernando Fernández-Martínez
Sensors 2021, 21(22), 7665; https://doi.org/10.3390/s21227665 - 18 Nov 2021
Cited by 69 | Viewed by 10546
Abstract
Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizer, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task, despite domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results reveal that these modalities carry relevant information to detect users' emotional state and that their combination improves system performance.
(This article belongs to the Special Issue Multimodal Emotion Recognition in Artificial Intelligence)
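The late fusion step can be summarized in a few lines. The following sketch assumes toy per-class posteriors rather than the outputs of the fine-tuned CNN-14 and the Spatial Transformer Network described in the paper, and combines the two modality predictions as a weighted average over the eight RAVDESS emotion classes.

```python
# Hypothetical sketch (not the authors' code): late fusion of per-class
# probabilities from a speech model and a facial model via a weighted average.
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

def late_fusion(p_speech: np.ndarray, p_face: np.ndarray, w_speech: float = 0.5) -> int:
    """Combine modality posteriors and return the index of the fused prediction."""
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return int(np.argmax(fused))

# Toy posteriors for one clip (8 emotion classes).
p_speech = np.array([0.05, 0.05, 0.55, 0.05, 0.10, 0.05, 0.05, 0.10])
p_face   = np.array([0.10, 0.05, 0.40, 0.10, 0.15, 0.05, 0.05, 0.10])
print("fused prediction:", EMOTIONS[late_fusion(p_speech, p_face, w_speech=0.6)])
```

The fusion weight here is arbitrary; in practice it would be tuned on validation data.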

20 pages, 1274 KiB  
Article
The Extensive Usage of the Facial Image Threshing Machine for Facial Emotion Recognition Performance
by Jung Hwan Kim, Alwin Poulose and Dong Seog Han
Sensors 2021, 21(6), 2026; https://doi.org/10.3390/s21062026 - 12 Mar 2021
Cited by 53 | Viewed by 4800
Abstract
Facial emotion recognition (FER) systems play a significant role in identifying driver emotions. Accurate facial emotion recognition of drivers in autonomous vehicles reduces road rage. However, training even an advanced FER model without proper datasets causes poor performance in real-time testing. FER system performance is affected more heavily by the quality of the datasets than by the quality of the algorithms. To improve FER system performance for autonomous vehicles, we propose a facial image threshing (FIT) machine that uses advanced features of pre-trained facial recognition and training from the Xception algorithm. The FIT machine removes irrelevant facial images, collects facial images, corrects misplaced face data, and merges original datasets on a massive scale, in addition to applying a data-augmentation technique. The final FER results of the proposed method improved validation accuracy by 16.95% over the conventional approach with the FER 2013 dataset. The confusion matrix evaluation based on the unseen private dataset shows a 5% improvement over the original approach with the FER 2013 dataset, confirming real-time testing.
(This article belongs to the Special Issue Multimodal Emotion Recognition in Artificial Intelligence)
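One of the cleaning steps attributed to the FIT machine, discarding images without a usable face, can be approximated with a generic face detector. The sketch below is a hypothetical illustration rather than the authors' pipeline: it uses OpenCV's Haar-cascade frontal-face detector, and the dataset folder name is an assumption.

```python
# Hypothetical sketch: drop dataset images in which a pre-trained face detector
# finds no face, one of the cleaning steps applied before training the FER model.
import os
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def keep_image(path: str) -> bool:
    """Return True if at least one face is detected in the image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return False
    faces = detector.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

dataset_dir = "fer_images"  # assumed folder of candidate facial images
kept = [f for f in os.listdir(dataset_dir)
        if keep_image(os.path.join(dataset_dir, f))]
print(f"kept {len(kept)} images with detectable faces")
```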
