Article

Emotion Recognizing by a Robotic Solution Initiative (EMOTIVE Project)

1 Clinical Psychology Service, Health Department, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013 Foggia, Italy
2 Department of Industrial Engineering, University of Florence, 50121 Florence, Italy
3 Information and Communication Technology, Innovation & Research Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013 Foggia, Italy
4 Complex Unit of Geriatrics, Department of Medical Sciences, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013 Foggia, Italy
* Author to whom correspondence should be addressed.
Sensors 2022, 22(8), 2861; https://doi.org/10.3390/s22082861
Submission received: 26 January 2022 / Revised: 3 April 2022 / Accepted: 7 April 2022 / Published: 8 April 2022
(This article belongs to the Special Issue Assistive Robots for Healthcare and Human-Robot Interaction)

Abstract

Background: Emotion recognition skills are predicted to be fundamental features in social robots. Since facial detection and recognition algorithms are compute-intensive operations, methods that can parallelize the algorithmic operations for large-scale information exchange in real time need to be identified. The aims of the study were to determine whether traditional machine learning algorithms could be used to assess each user's emotions separately, to compare emotion recognition across two robotic modalities (static robot vs. moving robot), and to evaluate the acceptability and usability of an assistive robot from an end-user point of view. Methods: Twenty-seven hospital employees (M = 12; F = 15) were recruited to perform the experiment, which involved showing 60 positive, negative, or neutral images selected from the International Affective Picture System (IAPS) database. The experiment was performed with the Pepper robot. In the experimental phase with Pepper in active mode, concordant mimicry was programmed according to the image type (positive, negative, or neutral). During the experimentation, the images were shown on a tablet on the robot's chest through a web interface, with each slide lasting 7 s. For each image, the participants were asked to perform a subjective assessment of the perceived emotional experience using the Self-Assessment Manikin (SAM). After the participants used the robotic solution, the Almere model questionnaire (AMQ) and the system usability scale (SUS) were administered to assess the acceptability, usability, and functionality of the robotic solution. Analysis was performed on video recordings. The evaluation of three types of attitude (positive, negative, and neutral) was performed through two machine learning classification algorithms: k-nearest neighbors (KNN) and random forest (RF). Results: According to the analysis of emotions performed on the recorded videos, RF algorithm performance was better in terms of accuracy (mean ± sd = 0.98 ± 0.01) and execution time (mean ± sd = 5.73 ± 0.86 s) than the KNN algorithm. With the RF algorithm, the neutral, positive, and negative attitudes all had an equally high precision (mean = 0.98) and F-measure (mean = 0.98). Most of the participants confirmed a high level of usability and acceptability of the robotic solution. Conclusions: RF algorithm performance was better in terms of accuracy and execution time than the KNN algorithm. The robot was not a disturbing factor in the arousal of emotions.

1. Introduction

The concept of emotion has a place in evolutionary history and it even predates the rise of human beings [1]. This piece of knowledge corroborates the idea that some emotions are innate in humans rather than socially constructed [2]. While processing emotions may come naturally to most people, computers, on the other hand, have struggled with the execution of this concept for decades. Much research has been gathered in fields like computer vision and machine learning in the effort to use computers to accurately classify emotions portrayed by humans. This effort includes analyzing human speech, gestures, and facial expressions.
Many areas offer potential for the application of automated human facial expression recognition. Such fields include education, marketing, security, and medicine. Essentially, wherever a human is present to evaluate emotion, a computer can also aid in the analysis.
In the future, assistive robots may be components of our daily lives, exhibiting a high level of social interaction [3]. In human–robot interaction (HRI), assistive robots are perceived as social actors evoking mental patterns characteristic of human–human interaction [4].
A person’s attitude towards social interaction is expressed by a series of social signals that convey information concerning feelings, mental state, and other individual traits (e.g., voice quality, posture, gaze) [5]. Furthermore, social relationships are complex and take in several indicators (e.g., emotions, language, tone, facial expression, and body posture).
In order to bridge HRI gaps, new robots need to incorporate cognitive models and build a realistic picture of how human interaction models work [6], identifying user behavior and how it changes during the interaction and modelling the robot's behavior accordingly. De Carolis et al. [7] described a NAO robot capable of simulating empathic behavior based on the user's emotion recognized through speech prosody and facial expression, with a higher capability to understand emotions (4.11 vs. 3.35) than to feel them.
The iCub robot was also provided with affective and interactive prompts based on facial expressions [8]: the authors reported that iCub's internal dynamics were perceived more positively by users with adaptive vs. non-adaptive robot behavior, with significant differences found both over the sessions [F(1, 22) = 7.87, p = 0.01] and for the interaction [F(1, 22) = 5.27, p = 0.03].
One of the key factors in human–robot interaction is the robot's human-like head. With it, a person can more naturally convey their thoughts to the robot, which in turn can help them express their internal state [9,10]. However, several challenges remain before these research outcomes can be achieved.
The first important question concerns the modalities needed to sense the emotional state of people by the robot. Secondly, there is the problem of modelling the interaction between human and robot on the emotional level.
Emotion recognition skills should be present in real social service robots moving in an open world and working closely with a group of people to perform specific tasks.
Some researchers believe that emotion recognition in robots should be used to collect user feedback on the robot's behavior and to adapt that behavior accordingly. Mei and Liu [11] assembled a robotic grabber with two cameras, one to detect objects and the other to observe the user's face. Based on the emotion expressed by the user during the process, the robot decided to grasp or avoid the displayed object. Chumkamon et al. [12] developed a robot capable of sympathizing with the emotion expressed by the user. Interestingly, unsupervised learning was used in both of these cases. Nho et al. [13] estimated a cleaning robot's actions by using hand gestures to fine-tune its internal emotional model. In these cases, the user showed an emotion caused directly by the robot's actions, so the emotion was related to a specific action and not to the general environment. A robot could also be taught to act directly on the user's emotional state even when it is unconnected to the robot's own actions. The robot could use it to fine-tune internal parameters, for example, so as not to act too invasively in the task performed. Röning et al. [14] developed a remarkable example of an autonomous agent operating in the real world; their Minotaurus robot could recognize people through vision, detect their mood, move around an intelligent environment, and gather information from it.
Another group of researchers proposed that robots should have an internal model of feelings to regulate their actions. Jitviriya et al. [15] used a self-organizing map (SOM) to classify eight inner feelings of a ConBe robot. Based on the difficulty of catching the ball, the robot tuned its actions and emotional expression when it caught the ball held by a person. Van Chien et al. [16] developed a humanoid robotic platform that expresses its internal feelings through changes in the way it walks.
The design of a social robot depends on its operation and the set of contexts it manages. The presence of a head on the robot's body can clearly be an effective way for robots to express feelings, although its design can be relatively demanding and constrained.
Emotion recognition in robotics is important for several reasons, not only for the pure interaction between human and robot but also for bioengineering applications. A substantial part of the investigative work in this area concerns the treatment of children on the autism spectrum or seniors with mild cognitive impairment [16]. In this case, robots are used as tools to show the feelings that children need to display during the therapy session, as children can communicate with robots more fluently than with humans. Additionally, the robots detect the emotion expressed by the patients and try to classify it according to the correct degree of performance [17,18,19,20,21]. A physiotherapist is always present to oversee the work and judge the classifications made by the robots, as these results are not intended to replace human work [16]. In fact, in these applications, robots act as simple tools to help workers and are better known as support robots. In such applications, a robot is expected to give only an objective evaluation that complements the individual evaluation of the physiotherapists. It does not require emotion recognition or environmental awareness to shape its actions, as it has no real emotions. Alternatively, the robot could be used to identify a person's mood inside their home and try to relieve them of a stressful condition. An interesting example comes from Han et al. [22]. Their robot recognized emotions in elderly people and changed the parameters of an intelligent environmental context to try to induce happiness. In another case, it could be used to detect the onset of dementia or mild cognitive impairment.
Since facial detection and recognition algorithms are compute-intensive operations, methods that can parallelize the algorithmic operations for large-scale information exchange in real time need to be identified. The emergence of low-cost graphic processing units (GPUs), many-core architectures, and high-speed Internet connectivity has provided sufficient computational and communication resources to accelerate facial detection and recognition [23].
In a recent study, a conditional-generative-adversarial-network-based (CGAN-based) framework was used to reduce intra-class variance by managing facial expressions and to learn generative and discriminative representations [24]. The face image was converted into a prototypic facial expression shape by the generator G, with an accuracy of 81.83%. In other studies, a convolutional neural network (CNN)-based multi-task model was designed for gender classification, smile detection, and emotion recognition with an accuracy of 71.03% [25], and performance accuracies of 76.74% and 80.30% were obtained with the integration of the k-nearest neighbor (KNN) technique [26].
Additionally, CNNs can be trained for the face identification task. Thus, random forest (RF) classifiers were trained to predict an emotion score using an available training set, with an accuracy of 75.4% [27].
The present study addressed a main challenge: how can the principles of engineering and mathematics be applied to create a system that can recognize emotions in given facial images?
The specific aims of the present study were:
  • to determine whether traditional machine learning algorithms could be used to evaluate each user's emotions independently (intra-classification task);
  • to compare emotion recognition in two types of robotic modalities: static robot (which does not perform any movement) and motion robot (which performs movements concordant with the emotions elicited); and
  • to assess the acceptability and usability of the assistive robot from the end-user's point of view.

2. Materials and Methods

2.1. Robot Used for Experimentation

For this study phase, the Pepper robot [28] was used (Figure 1).
Pepper is the world’s first social humanoid robot able to recognize faces and basic human emotions. It was optimized for human interaction and is able to engage with people through conversation and its touch screen.
It has six main characteristics, as shown below:
  • Twenty degrees of freedom for natural and expressive movements.
  • Perception components to identify and interact with the person talking to it.
  • Speech recognition and dialogue available in 15 languages.
  • Bumpers, infrared sensors, 2D and 3D cameras, sonars for omnidirectional and autonomous navigation, and an inertial measurement unit (IMU).
  • Touch sensors, LEDs, and microphones for multimodal interactions.
  • Open and fully programmable platform.
Standing 120 cm tall, Pepper has no trouble perceiving its environment and entering into a conversation when it sees a person. The touch screen on its chest displays content to highlight messages and support speech. Its curvy design ensures danger-free use and a high level of acceptance by users.

2.2. Recruitment

This study fulfilled the Declaration of Helsinki [29], guidelines for Good Clinical Practice, and the Strengthening the Reporting of Observational Studies in Epidemiology guidelines [30]. The approval of the study for experiments using human subjects was obtained from the local Ethics Committee on human experimentation (Prot. N. 3038/01DG). Written informed consent for research was obtained from each participant.
Participants were recruited in July 2020 in Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo, Italy.
In total, 30 hospital employees were screened for eligibility.
Inclusion criteria were:
  • No significant neuropsychiatric symptoms as evaluated by the neuropsychiatric inventory (NPI) [31]: using LabVIEW, an interface was created to upload the NPI data and run the calculation on the uploaded data;
  • No significant visual or hearing loss; and
  • No cognitive impairment evaluated by mini mental state examination (MMSE) [32]: MMSE score ≥ 27.
Exclusion criteria were:
  • No completed and signed informed consent;
  • Incomplete acceptability and usability assessment; and
  • Recorded video that is not properly visible.

2.3. Experimentation Protocol

The experiment was run by showing pictures from the International Affective Picture System (IAPS) [33] capable of eliciting emotions. The IAPS is a database of pictures designed to provide a standardized set of pictures for studying emotion and attention and has been widely used in psychological research. The IAPS was developed by the National Institute of Mental Health Center for Emotion and Attention at the University of Florida. An essential property of the IAPS is that the stimulus set is accompanied by a detailed list of average ratings of the emotions elicited by each picture. This enables other researchers to select stimuli eliciting a specific range of emotions for their experiments when using the IAPS. The process of establishing such average ratings for a stimulus set is also referred to as standardization by psychologists.
The normative rating procedure for the IAPS is based on the assumption that emotional assessments can be accounted for by the three dimensions of valence, arousal, and dominance. Thus, participants taking part in the studies conducted to standardize the IAPS are asked to rate how pleasant or unpleasant, calm or excited, and controlled or in control they felt when looking at each picture. A graphic rating scale, the Self-Assessment Manikin (SAM) [34], is used for this rating procedure.
From this database, 60 pictures were selected: 20 each with positive, negative, and neutral valence.
The positive valence is characterized by emotional states such as joy, happiness, affection, and amusement. The negative valence is characterized by behaviors such as irritation, disappointment, impatience, and annoyance.
The experiment was performed with Pepper robot in two modalities: static and active.
In the experimental phase with Pepper in the active modality, concordant mimicry was programmed according to the image type (positive, negative, or neutral): the robot performed an action according to the valence of each image shown (e.g., it smiled if a picture of a smiling child was shown). Table 1 shows the actions performed by the robot according to positive or negative images. The actions of the robot varied in order to avoid the repetition of the same movement in close proximity.
The participants were randomly assigned to one of two groups according to static (static robot group) and active (accordant motion group) modalities of the robot.
During the experiment, the pictures were shown on the tablet mounted on the robot's chest. During the session, the user was sitting in front of the robot (Figure 2). This was achieved by developing an ad-hoc web interface in which each picture was shown for 7 s.
The experiment lasted about 20–25 min per participant. The 60 pictures did not follow a specific layout, but they were shown in the same order to all participants.
For each picture, the participants were asked to perform a subjective evaluation of their perceived emotional experience (emotive feedback) using the SAM directly on the tablet.
The SAM quantifies subjective emotional states using sequences of humanoid figures that reproduce the different gradations of the three fundamental dimensions of evaluation (valence, activation, and control). For hedonic valence, the figures vary from happy and smiling to unhappy and frowning. For arousal, the figures vary from relaxed and asleep to aroused with open eyes. For control, the figures vary from large and in control to small and out of control. For each image presented, the experimenter compiled a coding grid composed of six macro categories covering the movements of the face (facial expressions, movements of the head, and gaze) and the movements of the body (posture, hand gestures, and gestures of self-contact). The coding grid was created drawing on coding systems specialized in detecting non-verbal behavior: the Facial Action Coding System (FACS) [35] and the Specific Affect Coding System (SPAFF) [36]. After the participants used the robotic solution, the Almere model questionnaire (AMQ) [37] and the system usability scale (SUS) [38] were administered in order to assess the acceptability, usability, and functionality of the robotic solution.
AMQ is a questionnaire using a 5-point Likert scale ranging from 1 to 5 (totally disagree—disagree—don’t know—agree—totally agree, respectively) designed primarily to measure users’ acceptance toward socially assistive robots. The questionnaire focuses on the following 12 constructs: (1) anxiety; (2) attitude toward technology; (3) facilitating conditions; (4) intention to use; (5) perceived adaptiveness; (6) perceived enjoyment; (7) perceived ease of use; (8) perceived sociability; (9) perceived usefulness; (10) social influence; (11) social presence; and (12) trust. The results are expressed as the average of each construct.
SUS is a tool for measuring usability. It consists of a 10-item questionnaire with five response options, from strongly agree to strongly disagree. The participant's score for each question is converted to a new number; these are added together and then multiplied by 2.5 to convert the original 0–40 range to 0–100. A SUS score above 68 is considered above average, confirming that the system is useful and easy to use.
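To make the scoring rule concrete, the following is a minimal Python sketch of the standard SUS computation described above; the function name and example responses are illustrative and not part of the study's materials.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the resulting 0-40 sum
    is scaled to the 0-100 range by multiplying by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Example: a fairly positive respondent scores above the 68-point average
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 2]))  # -> 82.5
```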

2.4. Data Analysis

The interaction with the Pepper robot was assessed by automatically analyzing the recorded videos captured by an external camera.
Video recordings (frontal video shooting) of the user's interaction with the robotic solution were evaluated and coded using the behavioral parameters in Table 2.
Specifically, the features were extracted from each frame of all videos. The entire data analysis process is shown in Figure 3. The extracted features were collected in a unimodal dataset. Moreover, the attitude evaluation was performed on the dataset using the raw representation of each instance.

2.4.1. Feature Extraction

The recorded video was then analyzed to extract the visual features. At the end of this process, a dataset was obtained containing the data extracted from the video (unimodal).
As described in [3], the visual features of interest were extracted from each image frame (sampled at 30 Hz) using an open source data processing tool: OpenFace [39].
The OpenFace toolkit was used to analyze the user's facial behavior. In detail, we used the OpenFace toolkit [40] to estimate gaze, eye landmarks, head pose, facial landmarks, and facial action units (AUs), for a total of 465 features, which are commonly used to evaluate emotions in affective processing. As described in [30], the OpenFace model uses a convolutional experts constrained local model (CE-CLM), which is composed of a point distribution model (PDM) that captures changes in the shape of landmarks and patch experts that locally model variations in the appearance of each landmark. CE-CLM is initialized with the bounding box from a multi-task cascaded neural network face detector.
Other tracking methods perform face detection only in the first frame and then apply facial landmark localization using the fitting result from the previous frame as initialization. The estimated head position is expressed in terms of the position of the head with respect to the camera in millimeters (Tx, Ty, and Tz) and rotations in radians around the x, y, and z axes (Rx, Ry, and Rz). The 18 recognized facial action units are expressed in terms of presence (0–1) and intensity (on a 6-point Likert scale).
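As an illustration of how these per-frame features can be assembled into the dataset described in the next section, the following Python/pandas sketch groups the columns of an OpenFace output CSV into the feature families listed above. The file path is hypothetical, and the column prefixes (gaze_, pose_, AU, etc.) follow OpenFace's documented output format rather than project-specific code.

```python
import pandas as pd

# Hypothetical per-frame CSV produced by OpenFace's FeatureExtraction tool
df = pd.read_csv("participant_01_openface.csv")
df.columns = df.columns.str.strip()  # OpenFace headers may contain leading spaces

# Group columns into the feature families used for attitude assessment
gaze_cols      = [c for c in df.columns if c.startswith("gaze_")]
eye_lmk_cols   = [c for c in df.columns if c.startswith("eye_lmk_")]
head_pose_cols = [c for c in df.columns if c.startswith("pose_")]       # Tx-Tz (mm), Rx-Rz (rad)
face_lmk_cols  = [c for c in df.columns if c.startswith(("x_", "y_"))]  # 2D facial landmarks
au_cols        = [c for c in df.columns if c.startswith("AU")]          # *_r intensity, *_c presence

feature_cols = gaze_cols + eye_lmk_cols + head_pose_cols + face_lmk_cols + au_cols

# Keep only frames in which the face was successfully tracked
X = df.loc[df["success"] == 1, feature_cols].to_numpy()
print(X.shape)  # (n_frames, n_features)
```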

2.4.2. Classification

We performed intra-subject validation using raw data [3]. In the intra-subject case, the classification was carried out on the characteristics of each participant individually. To minimize bias, the 10-fold cross-validation technique was applied to each participant's dataset. Using the raw representation, the data were represented by vectors of real-valued numbers.
Z-normalization was applied to the vectors. According to the IAPS picture table (positive, negative, and neutral) provided by the expert, each instance was tagged manually.
Attitude assessment wascarried out using the following algorithms:
  • KNN—a non-parametric algorithm employed for regression and classification. Class membership of each point is determined by a majority vote of that point's closest neighbors [3]: a query point is assigned the class that has the greatest number of representatives among the point's closest neighbors [3]. In our case, we used K = 3 [3].
  • RF—an ensemble learning technique that works by building multiple decision trees during training [3]. The mode of the classes predicted by the individual trees is returned as the output class [3]. We set a maximum of 64 trees in the forest and used the entropy function to measure the split quality [3].
For these methods, we used the scikit-learn (sklearn) Python toolbox for machine learning [3,41] in the Google Colaboratory cloud service [3,42]. The effectiveness of each algorithm was estimated in terms of accuracy (A), precision (P), recall (R), F-measure (F), and execution time (T) [3]. The same metrics were also used to compare the performance of the two algorithms in analyzing the dataset [3].
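The following scikit-learn sketch illustrates the per-participant (intra-subject) pipeline just described: z-normalization, 10-fold cross-validation, a KNN classifier with K = 3, and an RF classifier with 64 trees and the entropy criterion. The dataset loading and function name are assumptions, and folding the scaler into a pipeline (so it is fit only on each training fold) is a detail not specified in the text.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_participant(X, y):
    """Intra-subject evaluation for one participant.

    X: (n_frames, n_features) raw visual features extracted with OpenFace.
    y: per-frame attitude labels ('positive', 'negative', 'neutral').
    Returns mean accuracy, precision, recall, F-measure, and execution time
    for the two classifiers compared in the study.
    """
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "RF": RandomForestClassifier(n_estimators=64, criterion="entropy", random_state=0),
    }
    scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
    results = {}
    for name, clf in models.items():
        pipe = make_pipeline(StandardScaler(), clf)  # z-normalization + classifier
        start = time.time()
        cv = cross_validate(pipe, X, y, cv=10, scoring=scoring)  # 10-fold cross-validation
        results[name] = {
            "A": cv["test_accuracy"].mean(),
            "P": cv["test_precision_macro"].mean(),
            "R": cv["test_recall_macro"].mean(),
            "F": cv["test_f1_macro"].mean(),
            "T": time.time() - start,
        }
    return results
```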

2.5. Statistical Analysis

Data analyses were performed using the R statistical software package, version 4.1.2 (The R Project for Statistical Computing; available at http://www.r-project.org/; accessed 18 November 2021). For dichotomous variables, differences between the groups were tested using the Fisher exact test. This analysis was made using the two-way contingency table analysis available at the Interactive Statistical Calculation Pages (http://statpages.org/; accessed 18 November 2021). For continuous variables, normal distribution was verified by the Shapiro–Wilk normality test and the one-sample Kolmogorov–Smirnov test. For normally distributed variables, differences among the groups were tested by the Welch two-sample t-test or analysis of variance under a general linear model. For non-normally distributed variables, differences among the groups were tested by the Wilcoxon rank sum test with continuity correction or the Kruskal–Wallis rank sum test. Test results in which the p-value was smaller than the type 1 error rate of 0.05 were declared significant.
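Although the analyses were run in R, the two-group decision flow described above can be sketched in Python with scipy for illustration; the function names and the restriction to the two-group case are assumptions mirroring the tests named in the text.

```python
from scipy import stats

ALPHA = 0.05  # type 1 error rate


def compare_dichotomous(table_2x2):
    """Fisher exact test on a 2x2 contingency table (e.g., gender by group)."""
    _, p = stats.fisher_exact(table_2x2)
    return p


def compare_continuous(group_a, group_b):
    """Compare a continuous variable between two groups: check normality
    first, then apply the Welch t-test or the Wilcoxon rank sum test."""
    normal = (stats.shapiro(group_a).pvalue > ALPHA and
              stats.shapiro(group_b).pvalue > ALPHA)
    if normal:
        result = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch two-sample t-test
    else:
        result = stats.mannwhitneyu(group_a, group_b)  # Wilcoxon rank sum test
    return result.pvalue

# A p-value below ALPHA is declared significant, as in the study.
```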

3. Results

3.1. Participant Characteristics

Of all participants, three (users n. 10, 15, and 25) were excluded because the videos recorded were not properly visible. In total, 27 participants (M = 12, F = 15, mean age = 40.48 ± 10.82 years) were included in the study according to the inclusion and exclusion criteria as shown in Table 3. Of these 27, 18 participants used the robot in the static modality while 9 participants used the robot in the active modality.
The two groups did not differ in gender (p = 0.411) or educational level (p = 0.194). The participants who used the robot in the static modality were younger (p = 0.049) than the other group.

3.2. Emotion Analysis

The longest interaction session lasted 29 min while the shortest one lasted 17 min. Figure 4 describes the total frames for each participant. In intra-subject validation, the number of samples in the training and testing dataset depends on the number of frames recorded for each user.
As reported in Figure 5 and Figure 6, with the raw representation, the KNN algorithm obtained an average precision of 0.85 ± 0.06 and an average execution time of 6.62 ± 1.62 s.
The performance of the RF algorithm was better in terms of accuracy (mean ± sd = 0.98 ± 0.01) and execution time (mean ± sd = 5.73 ± 0.86 s) with respect to the KNN algorithm. With the KNN algorithm, neutral attitudes showed the worst performance in terms of precision (mean ± sd = 0.82 ± 0.07) and F-measure (mean ± sd = 0.84 ± 0.06), followed by positive and negative attitudes. According to the RF algorithm, the neutral, positive, and negative attitudes all had an equally high precision (mean = 0.98) and F-measure (mean = 0.98). The participant groups that used the robot in the static and active modalities did not differ in any of the considered variables according to the KNN and RF algorithms.

3.3. Usability and Acceptability Results

As shown in Table 4, the two groups of participants did not differ in SUS (p = 0.157) or AMQ domains (ANX, p = 0.716; ATT, p = 0.726; FC, p = 0.226; ITU, p = 0.525; PAD, p = 0.701; PENJ, p = 0.624; PEOU, p = 0.525; PS, p = 0.527; PU, p = 0.519; SP, p = 0.197; SI, p = 0.194). Most participants confirmed a high level of usability of the robotic solution (mean SUS > 70) and acceptability, as shown in the AMQ domains: low level of anxiety (mean ANX = 7.59), good attitude (mean ATT = 11.59), few facilitating conditions (mean FC = 6.18), high level of intention to use (mean ITU = 8.44), high level of perceived adaptability (mean PAD = 10.74), highest level of perceived enjoyment (mean PENJ = 20.18), highest level of perceived ease of use (mean PEOU = 16.96), high level of perceived sociability (mean PS = 13.74), high level of perceived usefulness (mean PU = 9.85), and highest level of social presence (mean SP = 14.44). Only the social influence level was low (mean SI = 5.48) because the experiment did not involve connecting with other people.

4. Discussion

In this study, the main objective was to explore whether automatic tools could improve emotion detection and whether the robot could be a disturbing factor in the elicitation of emotions. The outcomes showed that the extracted visual features made the attitude assessment more robust. In this study, there were not enough audio data for each attitude state to investigate this modality on its own. As for the intra-subject analysis, RF achieved high performance in terms of accuracy for the raw representation. As shown in [43], the results obtained in the intra-subject validation showed that the assessment can be customized for each user, suggesting the possibility of integrating this representation into a larger architecture.
Conversely, a recent study demonstrated that the KNN algorithm had superior performance, along with the support vector machine (SVM) and multilayer perceptron (MLP) algorithms [27]. In another study, among ten machine learning algorithms applied to predict four types of emotions, RF showed the highest area under the curve (0.70), followed by the other algorithms, including KNN (0.61) [44]. There are discrepancies between the results of previous studies and the present study. Previous studies reported SVM [45,46,47,48,49] and KNN [50,51] superiority. Possible reasons for these divergences could be the following: (1) SVM has parameters, including C and gamma, that can cause overfitting problems (i.e., high accuracy in the training dataset and low accuracy in the test dataset); (2) it is unclear which previous studies used proper cross-validation for their algorithms because they did not describe how cross-validation was conducted; (3) SVM and KNN belong to the classifier type of machine learning algorithms, whereas RF belongs to the ensemble type; (4) RF analyzes high-dimensional data and solves a variety of problems to achieve high accuracy [52], in contrast to simple classifiers such as SVM and KNN, which are suitable for small sample sizes.
The information coming out of the encoder includes the most pertinent features identified by the robot's perception system, decreasing context assessment times. The flow of information proposed in this study can be integrated into a cognitive architecture to model robot behavior based on user behavior [3]. Attitude information can be used not only to define what to do but also how the robot should perform the task [3]. This study was carried out with a relatively large data sample. As a preliminary work, we used common machine learning algorithms to assess the user's attitude state. Once more data are available, more advanced neural architectures can be introduced. We strongly believe that by introducing a neural architecture, the robot can automatically assess online what has been done offline.
Furthermore, a fundamental result is that the participant groups that used the robot in the static and active modalities did not differ in any of the considered variables according to the KNN and RF algorithms. This can mean that a static or active robot is not a disturbing factor in the arousal of emotions.
Furthermore, most of the participants confirmed a high level of usability and acceptability of the robotic solution.
The outcome of all work can be integrated into a robotic platform to automatically assess the quality of interaction and to modify its behavior accordingly.
Affective computing and social robotics have increased the need for studies not only on how to improve the emotional interaction between humans and machines but also on how to design cognitive architectures that include biomimetic elements related to emotions. The range of emotional aspects of fundamental interest to robot engineers and experts is comprehensive: human–robot interaction, robot task planning, energy management, social robotics, body design, care robotics, and service robotics, among a long list. Practically, there is no field related to robotics and AI that is not directly or indirectly related to the implementation of emotional values.
Methods of evaluation for human studies in HRI are listed as (1) self-assessments, (2) interviews, (3) behavioral measures, (4) psychophysiology measures, and (5) task performance metrics, with self-assessment and behavioral measures being the most common.
As the area of social robotics and HRI grows, public demonstrations have the potential to provide insights into the effectiveness of robots and systems in public settings and into people's reactions to them. Live public demonstrations enable us to better understand humans and inform the science and engineering fields so that better robots with more purposeful interaction capabilities can be designed and built.
One of the challenges is that, although modelling the dynamics of expressions and emotions has been extensively studied in the literature, how to model personality in a time-continuous manner has been an open problem.
Robust facial expression recognition is technically challenging, especially if the age range of the intended participants is large; the facial expression dataset of this study contained only adult participants. The technical challenges are compounded by the fact that expression recognition needs to be carried out at real-time processing speed on a standard computer. Multiple live demonstrations need to be conducted with elderly patients without cognitive impairment and with dementia; this is required to build a robust facial expression recognition system.

5. Conclusions

The study aimed to determine whether traditional machine learning algorithms (KNN and RF) could be used to assess three types of emotion valence (positive, negative, and neutral) for each user, to compare emotion recognition across two robotic modalities (static vs. moving robot), and to evaluate the acceptability and usability of the assistive robot from an end-user point of view. According to the analysis of emotions performed on the recorded videos, RF algorithm performance was better in terms of accuracy and execution time than the KNN algorithm. Most of the participants confirmed a high level of usability and acceptability of the robotic solution.
In conclusion, the robot was not a disturbing factor in the arousal of emotions.

Author Contributions

Conceptualization, G.D.; methodology, G.D.; data curation, L.F., A.S., F.C. (Filomena Ciccone) and D.S.; writing—original draft preparation, G.D.; writing—review and editing, L.F. and A.S.; visualization, S.R. and F.G.; supervision, F.C. (Filippo Cavallo); project administration, G.D. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this research was provided by the EU Horizon 2020 PHArA-ON Project 'Pilots for Healthy and Active Ageing', grant agreement no. 857188.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the local ethics committee of Fondazione Casa Sollievo della Sofferenza IRCCS (Prot. Code: EMOTIVE; Prot. N.: 102/CE) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions (their containing information that could compromise the privacy of research participants).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AMQ: Almere model questionnaire
ANX: anxiety
ATT: attitude
FC: facilitating conditions
ITU: intention to use
PAD: perceived adaptability
PENJ: perceived enjoyment
PEOU: perceived ease of use
PS: perceived sociability
PU: perceived usefulness
SI: social influence
SP: social presence

References

  1. Darwin, C. The Expression of Emotion in Man and Animals; D. Appleton And Company: New York, NY, USA, 1899. [Google Scholar]
  2. Ekman, P. Afterword: Universality of Emotional Expression? A Personal History of the Dispute. In The Expression of the Emotions in Man and Animals; Darwin, C., Ed.; Oxford University Press: New York, NY, USA, 1998; pp. 363–393. [Google Scholar]
  3. Sorrentino, A.; Fiorini, L.; Fabbricotti, I.; Sancarlo, D.; Ciccone, F.; Cavallo, F. Exploring Human attitude during Human-Robot Interaction. In Proceedings of the 29th IEEE International Symposium on Robot and Human Interactive Communication, Naples, Italy, 31 August–4 September 2020. [Google Scholar]
  4. Horstmann, A.C.; Krämer, N.C. Great Expectations? Relation of Previous Experiences With Social Robots in Real Life or in the Media and Expectancies Based on Qualitative and Quantitative Assessment. Front. Psychol. 2019, 10, 939. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Vinciarelli, A.; Pantic, M.; Bourlard, H. Social signal processing: Survey of an emerging domain. Image Vis. Comput. 2009, 27, 1743–1759. [Google Scholar] [CrossRef] [Green Version]
  6. Nocentini, O.; Fiorini, L.; Acerbi, G.; Sorrentino, A.; Mancioppi, G.; Cavallo, F. A survey of behavioural models for social robots. Robotics 2019, 8, 54. [Google Scholar] [CrossRef] [Green Version]
  7. De Carolis, B.; Ferilli, S.; Palestra, G. Simulating empathic behavior in a social assistive robot. Multimed. Tools Appl. 2017, 76, 5073–5094. [Google Scholar] [CrossRef]
  8. Tanevska, A.; Rea, F.; Sandini, G.; Cañamero, L.; Sciutti, A. A Socially Adaptable Framework for Human-Robot Interaction. Front. Robot. AI 2020, 7, 121. [Google Scholar] [CrossRef]
  9. Chumkamon, S.; Hayashi, E.; Masato, K. Intelligent emotion and behavior based on topological consciousness and adaptive resonance theory in a companion robot. Biol. Inspired Cogn. Arch. 2016, 18, 51–67. [Google Scholar] [CrossRef]
  10. Cavallo, F.; Semeraro, F.; Fiorini, L.; Magyar, G.; Sinčák, P.; Dario, P. Emotion Modelling for Social Robotics Applications: A Review. J. Bionic Eng. 2018, 15, 185–203. [Google Scholar] [CrossRef]
  11. Mei, Y.; Liu, Z.T. An emotion-driven attention model for service robot. In Proceedings of the 2016 12th World Congress on Intelligent Control and Automation (WCICA), Guilin, China, 12–15 June 2016; pp. 1526–1531. [Google Scholar]
  12. Hirschberg, J.; Manning, C.D. Advances in natural language processing. Science 2015, 349, 261–266. [Google Scholar] [CrossRef]
  13. Nho, Y.H.; Seo, J.W.; Seol, W.J.; Kwon, D.S. Emotional interaction with a mobile robot using hand gestures. In Proceedings of the 2014 11th International Conference on Ubiquitous Robots and Ambient Intelligence, Kuala Lumpur, Malaysia, 12–15 November 2014; pp. 506–509. [Google Scholar]
  14. Röning, J.; Holappa, J.; Kellokumpu, V.; Tikanmäki, A.; Pietikäinen, M. Minotaurus: A system for affective human–robot interaction in smart environments. Cogn. Comput. 2014, 6, 940–953. [Google Scholar] [CrossRef]
  15. Jitviriya, W.; Koike, M.; Hayashi, E. Behavior selection system based on emotional variations. In Proceedings of the 2015 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Kobe, Japan, 31 August–4 September 2015; pp. 462–467. [Google Scholar]
  16. Van Chien, D.; Sung, K.J.; Trung, P.X.; Kim, J.W. Emotion expression of humanoid robot by modification of biped walking pattern. In Proceedings of the 2015 15th International Conference on Control, Automation and Systems (ICCAS), Busan, Korea, 13–16 October 2015; pp. 741–743. [Google Scholar]
  17. Sinčák, P.; Novotná, E.; Cádrik, T.; Magyar, G.; Mach, M.; Cavallo, F.; Bonaccorsi, M. Cloud-based Wizard of Oz as a service. In Proceedings of the 2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES), Bratislava, Slovakia, 3–5 September 2015; pp. 445–448. [Google Scholar]
  18. Leo, M.; Del Coco, M.; Carcagnì, P.; Distante, C.; Bernava, M.; Pioggia, G.; Palestra, G. Automatic emotion recognition in robot-children interaction for ASD treatment. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 537–545. [Google Scholar]
  19. Mazzei, D.; Zaraki, A.; Lazzeri, N.; De Rossi, D. Recognition and expression of emotions by a symbiotic android head. In Proceedings of the 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), Madrid, Spain, 18–20 November 2014; pp. 134–139. [Google Scholar]
  20. Boccanfuso, L.; Barney, E.; Foster, C.; Ahn, Y.A.; Chawarska, K.; Scassellati, B.; Shic, F. Emotional robot to examine differences in play patterns and affective response of Children with and without ASD. In Proceedings of the 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand, 7–10 March 2016; pp. 19–26. [Google Scholar]
  21. Cao, H.L.; Esteban, P.G.; De Beir, A.; Simut, R.; Van De Perre, G.; Lefeber, D.; Vanderborght, B. ROBEE: A homeostatic-based social behavior controller for robots in Human-Robot Interaction experiments. In Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics (ROBIO), 5–10 December 2014; pp. 516–521. [Google Scholar]
  22. Han, J.; Xie, L.; Li, D.; He, Z.J.; Wang, Z.L. Cognitive emotion model for eldercare robot in smart home. China Commun. 2015, 12, 32–41. [Google Scholar]
  23. Thinakaran, P.; Guttman, D.; Taylan Kandemir, M.; Arunachalam, M.; Khanna, R.; Yedlapalli, P.; Ranganathan, N. Chapter 11—Visual Search Optimization. Editor(s): James Reinders, Jim Jeffers. In High Performance Parallelism Pearls; Morgan Kaufmann: Burlington, MA, USA, 2015; pp. 191–209. [Google Scholar]
  24. Deng, J.; Pang, G.; Zhang, Z.; Pang, Z.; Yang, H.; Yang, G. cGAN based facial expression recognition for human-robot interaction. IEEE Access 2019, 7, 9848–9859. [Google Scholar] [CrossRef]
  25. Sang, D.V.; Cuong, L.T.B.; Van Thieu, V. Multi-task learning for smile detection, emotion recognition and gender classification. In Proceedings of the Eighth International Symposium on Information and Communication Technology, New York, NY, USA, 7–8 December 2017; pp. 340–347. [Google Scholar]
  26. Shan, K.; Guo, J.; You, W.; Lu, D.; Bie, R. Automatic facial expression recognition based on a deep convolutional-neural-network structure. In Proceedings of the 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, UK, 7–9 June 2017; pp. 123–128. [Google Scholar]
  27. Siam, A.I.; Soliman, N.F.; Algarni, A.D.; Abd El-Samie, F.E.; Sedik, A. Deploying Machine Learning Techniques for Human Emotion Detection. Comput. Intell. Neurosci. 2022, 2, 8032673. [Google Scholar] [CrossRef] [PubMed]
  28. SoftBank Robotics Home Page. Available online: https://www.softbankrobotics.com/emea/en/pepper (accessed on 27 January 2022).
  29. World Medical Association. World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA 2013, 310, 2191. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  30. von Elm, E.; Altman, D.G.; Egger, M.; Pocock, S.J.; Gøtzsche, P.C.; Vandenbroucke, J.P.; STROBE-Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting of observational studies. Internist 2008, 49, 688–693. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Cummings, J.L.; Mega, M.; Gray, K.; Rosenberg-Thompson, S.; Carusi, D.A.; Gornbein, J. The Neuropsychiatric Inventory: Comprehensive assessment of psychopathology in dementia. Neurology 1994, 44, 2308–2314. [Google Scholar] [CrossRef] [Green Version]
  32. Folstein, M.; Folstein, S.; McHugh, P. Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 1975, 12, 189–198. [Google Scholar] [CrossRef]
  33. Lang, P.J.; Bradley, M.M.; Cuthbert, B.N. International affective picture System (IAPS): Affective ratings of pictures and instruction manual. In Technical Report A-8; University of Florida: Gainesville, FL, USA, 2008. [Google Scholar]
  34. Lang, P.J. Behavioral treatment and bio-behavioral assessment: Computer applications. In Technology in Mental Health Care Delivery Systems; Sidowski, J.B., Johnson, J.H., Williams, E.A., Eds.; Ablex: Norwood, NJ, USA, 1980; pp. 119–137. [Google Scholar]
  35. Ekman, P.; Friesen, W.V.; Hager, J.C. Facial Action Coding System. In Manual and Investigator’s Guide; Research Nexus: Salt Lake City, UT, USA, 2002. [Google Scholar]
  36. Gottman, J.M.; McCoy, K.; Coan, J.; Collier, H. The Specific Affect Coding System (SPAFF) for Observing Emotional Communication in Marital and Family Interaction; Erlbaum: Mahwah, NJ, USA, 1995. [Google Scholar]
  37. Heerink, M.; Kröse, B.J.A.; Wielinga, B.J.; Evers, V. Assessing acceptance of assistive social agent technology by older adults: The Almere model. Int. J. Soc. Robot. 2010, 2, 361–375. [Google Scholar] [CrossRef] [Green Version]
  38. Borsci, S.; Federici, S.; Lauriola, M. On the dimensionality of the System Usability Scale: A test of alternative measurement models. Cogn. Process. 2009, 10, 193–197. [Google Scholar] [CrossRef]
  39. Baltrusaitis, T.; Zadeh, A.; Lim, Y.C.; Morency, L.P. OpenFace 2.0: Facial behavior analysis toolkit. In Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 59–66.
  40. Baltrusaitis, T. Posted on 28 October 2019. Available online: https://github.com/TadasBaltrusaitis/OpenFace/wiki/Output-Format (accessed on 21 October 2020).
  41. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  42. Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress: Berkeley, CA, USA, 2019; pp. 59–64. [Google Scholar]
  43. Rudovic, O.; Lee, J.; Dai, M.; Schuller, B.; Picard, R.W. Personalized machine learning for robot perception of affect and engagement in autism therapy. Sci. Robot. 2018, 3, eaao6760. [Google Scholar] [CrossRef] [Green Version]
  44. Li, X.; Ono, C.; Warita, N.; Shoji, T.; Nakagawa, T.; Usukura, H.; Yu, Z.; Takahashi, Y.; Ichiji, K.; Sugita, N.; et al. Heart Rate Information-Based Machine Learning Prediction of Emotions Among Pregnant Women. Front. Psychiatry 2022, 12, 799029. [Google Scholar] [CrossRef] [PubMed]
  45. Rakshit, R.; Reddy, V.R.; Deshpande, P. Emotion detection and recognition using HRV features derived from photoplethysmogram signals. In Proceedings of the 2nd Workshop on Emotion Representations and Modelling for Companion Systems, Tokyo, Japan, 16 November 2016; pp. 1–6. [Google Scholar]
  46. Cheng, Z.; Shu, L.; Xie, J.; Chen, C.P. A novel ECG-based real-time detection method of negative emotions in wearable applications. In Proceedings of the 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Shenzhen, China, 15–18 December 2017; pp. 296–301. [Google Scholar]
  47. Jang, E.H.; Rak, B.; Kim, S.H.; Sohn, J.H. Emotion classification by machine learning algorithm using physiological signals. Proc. Comput. Sci. Inf. Technol. Singap. 2012, 25, 1–5. [Google Scholar]
  48. Guo, H.W.; Huang, Y.S.; Lin, C.H.; Chien, J.C.; Haraikawa, K.; Shieh, J.S. Heart rate variability signal features for emotion recognition by using principal component analysis and support vectors machine. In Proceedings of the 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), Taichung, Taiwan, 31 October–2 November 2012; pp. 274–277. [Google Scholar]
  49. Dominguez-Jimenez, J.A.; Campo-Landines, K.C.; Martínez-Santos, J.C.; Delahoz, E.J.; Contreras-Ortiz, S.H. A machine learning model for emotion recognition from physiological signals. Biomed. Signal Process. Control. 2020, 55, 101646. [Google Scholar] [CrossRef]
  50. Zheng, B.S.; Murugappan, M.; Yaacob, S. Human emotional stress assessment through Heart Rate Detection in a customized protocol experiment. In Proceedings of the 2012 IEEE Symposium on Industrial Electronics and Applications, Bandung, Indonesia, 23–26 September 2012; pp. 293–298. [Google Scholar]
  51. Ferdinando, H.; Seppänen, T.; Alasaarela, E. Comparing features from ECG pattern and HRV analysis for emotion recognition system. In Proceedings of the 2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Chiang Mai, Thailand, 5–7 October 2012; pp. 1–6. [Google Scholar]
  52. Ayata, D.; Yaslan, Y.; Kamasak, M.E. Emotion based music recommendation system using wearable physiological sensors. IEEE Trans. Consum. Electron. 2018, 64, 196–203. [Google Scholar] [CrossRef]
Figure 1. Overview of Pepper robot in front view (a) and side view (b).
Figure 2. Overview of experimental context.
Figure 3. Data analysis.
Figure 4. Total frames analyzed for each participant.
Figure 5. Raw representation according to KNN algorithm.
Figure 6. Raw representation according to RF algorithm.
Table 1. Robot actions performed according to positive or negative images shown.

Positive | Negative
To smile | To step back slightly showing disgust
To clap hands | To cry
To raise arms and cheer | To bend chest forward showing boredom
To blow a kiss | To turn head left and right quickly showing fear
To wave | To bow head showing sadness
To make an appreciation | To fold arms showing confusion
Table 2. Social cues analyzed.

Parameter | Category | Types
Behavioral | Emotion | Joy, sadness, fear, anger, disgust, neutral
Behavioral | Gaze | Directed gaze, mutual face gaze, none
Behavioral | Facial expressions | Smile, laugh, raise eyebrows, frown, inexpressive
Table 3. Participant characteristics.

 | All n = 27 | Static Robot n = 18 | Accordant Motion n = 9 | p-Value
Gender | | | | 0.411
  Men/Women | 12/15 | 7/11 | 5/4 |
  Men (%) | 44.40 | 38.90 | 55.60 |
Age (years) | | | | 0.049
  Mean ± SD | 40.48 ± 10.82 | 37.61 ± 8.14 | 46.22 ± 13.56 |
  Range | 28–66 | 28–53 | 31–66 |
Educational level | | | | 0.194
  Degree—n (%) | 24 (88.90) | 17 (94.40) | 7 (77.80) |
  High school—n (%) | 3 (11.10) | 1 (5.60) | 2 (22.20) |
Table 4. Usability and acceptability post-robot interaction.

 | All n = 27 | Static Robot n = 18 | Accordant Motion n = 9 | p-Value
SUS | | | | 0.157
  Mean ± SD | 72.87 ± 13.11 | 75.42 ± 14.98 | 67.78 ± 6.18 |
  Range * | 45.00–100.00 | 45.00–100.00 | 60.00–77.50 |
AMQ
ANX | | | | 0.716
  Mean ± SD | 7.59 ± 2.54 | 7.62 ± 2.60 | 7.33 ± 2.54 |
  Range * | 4–13 | 4–13 | 4–11 |
ATT | | | | 0.726
  Mean ± SD | 11.59 ± 1.88 | 11.50 ± 2.01 | 11.78 ± 1.71 |
  Range * | 7–15 | 7–15 | 9–14 |
FC | | | | 0.226
  Mean ± SD | 6.18 ± 1.88 | 6.50 ± 2.09 | 5.56 ± 1.24 |
  Range * | 2–10 | 2–10 | 4–8 |
ITU | | | | 0.525
  Mean ± SD | 8.44 ± 3.13 | 8.72 ± 3.18 | 7.89 ± 3.14 |
  Range * | 3–15 | 3–15 | 3–12 |
PAD | | | | 0.701
  Mean ± SD | 10.74 ± 1.72 | 10.83 ± 1.85 | 10.55 ± 1.51 |
  Range * | 7–15 | 7–15 | 8–13 |
PENJ | | | | 0.624
  Mean ± SD | 20.18 ± 2.97 | 20.39 ± 3.29 | 19.78 ± 2.33 |
  Range * | 15–25 | 15–25 | 16–24 |
PEOU | | | | 0.525
  Mean ± SD | 16.96 ± 2.71 | 16.72 ± 3.02 | 17.44 ± 2.01 |
  Range * | 12–21 | 12–21 | 14–20 |
PS | | | | 0.527
  Mean ± SD | 13.74 ± 2.72 | 13.50 ± 3.18 | 14.22 ± 1.48 |
  Range * | 4–18 | 4–18 | 12–16 |
PU | | | | 0.519
  Mean ± SD | 9.85 ± 2.26 | 10.05 ± 2.48 | 9.44 ± 1.81 |
  Range * | 5–15 | 5–15 | 7–12 |
SI | | | | 0.197
  Mean ± SD | 5.48 ± 2.08 | 5.11 ± 2.13 | 6.22 ± 1.85 |
  Range * | 2–9 | 2–8 | 4–9 |
SP | | | | 0.194
  Mean ± SD | 14.44 ± 2.79 | 13.94 ± 2.62 | 15.44 ± 3.00 |
  Range * | 9–19 | 9–19 | 9–19 |
* Minimum and maximum scores obtained by the participants.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
