Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging

Lee, Ji-Min; An, Young-Eun; Bak, EunSang; Pan, Sungbum

doi:10.3390/su142215200

Open AccessArticle

Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging

by

Ji-Min Lee

¹,

Young-Eun An

²,

EunSang Bak

^2,* and

Sungbum Pan

^2,*

¹

SW Convergence Education Institute, Chosun University, 309 Pilmun-daero, Dong-gu, Gwang-Ju 61452, Republic of Korea

²

IT Research Institute, Chosun University, 309 Pilmun-daero, Dong-gu, Gwang-Ju 61452, Republic of Korea

^*

Authors to whom correspondence should be addressed.

Sustainability 2022, 14(22), 15200; https://doi.org/10.3390/su142215200

Submission received: 22 September 2022 / Revised: 23 October 2022 / Accepted: 7 November 2022 / Published: 16 November 2022

(This article belongs to the Special Issue New Insights on Intelligence and Security for Sustainable Applications)

Download

Browse Figures

Versions Notes

Abstract

:

Facial expressions help in understanding the intentions of others as they are an essential means of communication, revealing human emotions. Recently, thermal imaging has been playing a complementary role in emotion recognition and is considered an alternative to overcome the drawbacks of visible imaging. Notably, a relatively severe recognition error of fear among negative emotions frequently occurs in visible imaging. This study aims to improve the recognition performance of fear by using the visible and thermal images acquired simultaneously. When fear was not recognized in a visible image, we analyzed the causes of misrecognition. We thus found the condition of replacing the image with a thermal image. It improved emotion recognition performance by 4.54% on average, compared to the performance of using only visible images. Finally, we confirmed that the thermal image effectively compensated for the visible image’s shortcomings.

Keywords:

facial emotion recognition; fusion of visible light image and thermal facial image; convolutional neural network; robustness to classify in negative emotion

1. Introduction

As an essential element of human communication, facial expressions that reveal emotions help with the understanding of the intentions of people. For example, people generally infer emotional states such as happiness, sadness, fear, and anger from facial expressions and tone of voice. Therefore, facial emotion recognition (FER) [1,2,3,4] has been studied in computer vision and machine learning over the past several decades. Indeed, FER technology is rapidly emerging in the field of emotional Information and Communication Technology (ICT), including virtual reality, augmented reality, Advanced Driver Assistance Systems (ADAS), and human-computer interaction.

Facial expressions contain lots of emotional information, and emotions play several roles in human life by being involved in humans’ psychological state, behavior, and reactions. Human judgment and behavior also depend on the emotions felt. Therefore, emotions are used as indicators to infer psychological states. Among the various emotions, negative emotions (sadness, anger, disgust, surprise, fear), particularly fear, cause stress and decrease concentration [5]. Therefore, accurately recognizing negative emotions can help identify the causes of stress. Furthermore, since negative emotions are risk factors that can adversely affect health, it is essential for maintaining good health to recognize and categorize negative emotions.

In general, humans feel fear when faced with dangerous situations or when threatened. Facial expressions in which the eyebrows are raised with tensed lower eyelids and lips stretched horizontally backward are classified as fear. However, fear emotion recognition is a complicated process associated with a lot of errors among negative emotions [6]. According to John Cacioppo, a professor of psychology at the University of Chicago, it is said that negative emotions are directly linked to human survival instincts. They are relatively easy to detect due to their high level of emotional expression. However, since fear is the basis for other negative emotions, it is easy to confuse it with other negative emotions [7].

Most of the existing approaches [6,7,8,9,10,11,12,13,14] for recognizing emotions by classifying facial expressions have applied convolution neural network (CNN) models to visible light images (hereafter referred to simply as visible images). Compared with existing studies that classify facial expressions, there has been significant improvement in recognizing emotions by applying deep learning techniques such as CNN. However, the recognition performance for fear remained low. Facial expression classification based on visible light imaging in an uncontrolled environment (i.e., where lighting and background were not constant) showed low accuracy [15]. In contrast, thermal imaging is less affected by lighting conditions and can be used even in completely dark environments. Since thermal imaging captures changes in temperature in the face region that is affected by human emotions, it has the potential for emotion recognition through facial expressions. It is even considered an alternative means to compensate for the shortcomings of visible light imaging [16,17].

Therefore, this study attempts to take advantage of thermal imaging to supplement the emotion recognition performance of visible images that manifest low recognition rates for fear. In order to do this, the face region was extracted from a visible image and used to train a CNN model, while the one from the thermal image was used to train a residual neural network (ResNet). In addition, each neural network was selected for its relatively good performance given the image. After learning facial expressions with each database by training the neural network, we confirmed that substituting visible images with corresponding thermal images for fear emotion improved overall performance.

The remainder of this study is organized as follows. Section 2 discusses the existing research on emotion recognition using visible light and thermal images. Section 3 presents the proposed method for improving fear recognition performance. Section 4 details the database (DB) construction process and characteristics used in this study and analyzes the results of experiments with the proposed method. The conclusions drawn from the study are stated in Section 5.

2. Related Works

Much research on general facial expression classification (or FER) has been conducted based on visible images [18,19,20,21]. Facial expression classification technology, based on visible light imaging that acquires an object’s image by measuring the light reflected from the object, is sensitive to changes in lighting. Furthermore, it is difficult to distinguish between real and fake emotions in images obtained from people who are good at disguising their emotions [4]. Nguyen, D.H. et al. [6] proposed a method of extracting facial features by an image classifier to obtain essential information about emotions. Considering the extracted facial features as temporal data, he assigned them to one of seven basic emotion types. Pitaloka, D.A. et al. [9] proposed a method to increase the classification performance for six facial expressions by applying various preprocessing steps such as face detection and cropping, resizing, data normalization, and histogram equalization.

Jung et al. [19] used two different types of CNNs. The first type extracted facial features along the time axis from an image sequence. The second extracted the geometric features of facial movements over time by receiving facial landmarks as input. They then proposed a method of integrating the two models to improve the facial expression classification performance. Ahmed Fnaiech et al. [20] proposed a method to increase the fear recognition rate by projecting visible images from 3D to 2D and using angle deviation. However, there was a limitation in performance comparison as the experiment only classified emotions into two categories: fear and other negative emotions. Samadiani, N. et al. [21] performed an emotion recognition experiment using Acted Facial Expression in the Wild (AFEW) obtained from the real environment. When this data was used, it was confirmed that the recognition performance of negative emotions was noticeably low, and although attempts were made to improve the recognition performance using multi-modality, the fear emotion recognition performance was still low.

Figure 1 shows several emotion recognition results of lower performance for fear in literature [6,10,11]. The fear recognition rate was significantly lower than other emotions, which was relatively low even compared to other negative emotions.

Unlike the approaches to classifying facial expressions using visible images, the thermal image method, which is expressed in temperature according to the intensity of infrared radiation energy emitted from an object, is less sensitive to changes in lighting and can express an object even in a completely dark environment. Thermal imaging can also be applied to distinguish between spontaneous emotions (real emotions) and deliberate emotions (fake emotions) by capturing changes in body temperature that are affected by human emotions. J.W. Seo et al. [16] proposed Thermal Face-CNN, a face liveness detection technique that could distinguish a real face from a fake face because the average human face temperature was 36–37 °C. Priya et al. [22] proposed a method for recognizing emotions based on eigenfaces and principal component analysis (PCA) techniques using thermal imaging.

Hung Nguyen et al. [23] studied integrating visible images and thermal images to overcome the disadvantages of visible light imaging, which is highly dependent on illuminance. They found the region of interest (ROI) in the thermal image and used a method to integrate the feature vectors by applying wavelet transform to the visible image. However, only the facial expression recognition accuracy for the entire image was evaluated.

As such, previous studies using visible images have suggested various techniques for recognizing emotions in facial expressions. Since the performance of fear recognition is lower than that of other emotions, there have been attempts to introduce thermal imaging to overcome the shortcomings of visible light imaging. However, it is difficult to find a study to improve the recognition performance of negative emotions among all emotions. Therefore, this study intends to use thermal imaging to compensate for the disadvantage, particularly the low recognition performance of fear emotion in visible images. A summary of related works is shown in Table 1.

In order to build a cooperative algorithm using visible and thermal images, we outline our contributions as follows:

Given a synchronized sequence of visible and thermal images, we try to find discriminative attributes to recognize emotions, especially a negative one(fear).
Based on the discriminative attributes, we design a framework containing appropriate classifiers for both visible and thermal images. One of these classifiers could be supplementary to the other to take advantage of thermal imaging information.
The cause of the low recognition performance of fear emotion is investigated to find conditions for utilizing thermal images.
A new algorithm is derived by statistically analyzing both classifiers’ interactions. There should be a significant factor to differentiate the attributes of each emotion. We try to formulate such characteristics for further development.

3. Method of Improving Recognition Performance for Negative Emotions

3.1. Neural Network Design for Emotion Recognition Based on Visible and Thermal Images

This study proposes a method that compensates for low fear recognition performance using visible images empowered by thermal images using a database built for the study. Figure 2 shows the neural network structure of the proposed method.

An image of a face region by removing the background was used as the input image. It was obtained by sampling the visible and thermal images at the same time interval that were acquired for 30 s simultaneously. The same image size of 224 × 224 was used for network input and trained on the proposed neural network. Among various neural network structures, we adopted the CNN block that takes visible images as input and is a simple structure consisting of four convolution layers, a pooling layer, and a drop-out layer. ResNet, which takes thermal images as input, uses the residual learning method to reconnect the features used in the previous layer. Since the gradient has a value greater than 1 in all layers, the gradient vanishing problem is solved. This structure has the effect of passing the input information to the next layer, and the amount of change in the input can be detected well. In particular, since 1 × 1 convolution and 3 × 3 convolution groups are used, extensive feature extraction is possible.

Learning was executed by repeating the CNN block three times using the face regions from the images since the recognition accuracy of the visible image when trained by a CNN was higher than by ResNet. Regarding thermal images, the recognition accuracy was higher when trained by ResNet than the other CNN. Thus, learning was conducted with ResNet considering the self-constructed database consisting of sequence data. A ResNet model makes a skip connection to facilitate the data differences over time. The skip connection is similar to long short-term memory (LSTM) in the sense that it is a process of better transmitting the gradient of the previous convolution block.

After learning facial expressions with each database in this way, the results from visible images were combined with those from thermal images, followed by an emotion recognition process based on facial expressions. Visible images are sensitive to changes in lighting, whereas thermal images are robust to changes in lighting. Therefore, it could be expected to recognize emotions better based on thermal images under a specific condition, which is found to be fear emotion recognition. To confirm that the fear recognition performance using thermal images is better than using visible images, the classification results by the neural network, as shown in Figure 2, were compared and analyzed. More specifically, the variations between the two networks, each of which was trained with visible images and thermal images, were compared to each other to recognize the emotions of the sequence data for each subject.

Figure 3 shows the training procedures. As the backbone was completely trained for facial feature extraction, only 97 epochs were required to achieve the best performance.

3.2. Proposed Emotion Recognition Method

Figure 4 shows graphs representing significant variations in the similarity of fear among temporally continuous visible images. The x-axis represents the sampled data number of each subject, and the y-axis represents the similarity value of the emotions. The solid blue line represents the change in the similarity to fear, and the yellow line represents the similarity to sadness. An emotion is incorrectly recognized when the similarity graph to another emotion is located above the solid blue line. As shown in Figure 4, when fear is misclassified as another emotion, the similarity to fear is less than or equal to a certain value. Considering the portions of misclassification, a specific criterion can be established to overcome the low accuracy of fear emotion. As a result, the similarity graph of fear from visible images will be replaced by that from thermal images.

Where

N

denotes the number of images whose recognition results are erroneously predicted to be different emotions when visible images representing fear are input, and

p t c l

denotes a specific position value expressed as a percentage after arranging the images in order. This is expressed by the following equation.

p t c l_{p} = ((N - 1) * p) / 100 + 1

(1)

When

p

was 25, 50, and 75, the entire data set was divided into four equal parts, and the similarity of images corresponding to the boundaries was defined as

Q_{1}

,

Q_{2}

, and

Q_{3}

, respectively.

Q_{1}

was the lower quartile,

Q_{3}

was the upper quartile, and

Q_{2}

was the median. The quartile range is expressed as the difference (

IQR

) between the

Q_{3}

and

Q_{1}

values, and the maximum value (

Max

) and the minimum value (

Min

) of the similarity to fear are defined in Equation (2).

Max = Q_{3} + 1.5 * IQR Min = Q_{1} - 1.5 * IQR

(2)

With respect to the emotion similarities of visible images and thermal images, a specific position value of the fear emotion image classified as false negative in visible images is used as a threshold value. For example, suppose the similarity of the visible image is less than the threshold. In that case, the similarity of the visible image is reset to the similarity of the thermal image. When the threshold value is

Q_{2}

,

Q_{3}

, and max values, respectively, the similarity value of the visible image is replaced by the value of the corresponding thermal image. Then, an emotion is predicted again for the thermal image. Subsequently, the overall recognition performance is updated by the results from the thermal images.

4. Experimental and Analysis

4.1. Building the Database

A database composed of numerous visible images has been used in the field of FER [24,25,26,27]. However, emotion recognition could be affected by skin color or cultural differences when using publicly available visible images. To make matters worse, there are few databases consisting of thermal images expressing subjects’ spontaneous or induced emotions. If any, the databases used in the existing thermal image classification [16,22,28,29] are not publicly available. Therefore, in order to overcome such disadvantages, we constructed a database by acquiring visible and thermal images simultaneously. Visible light and thermal imaging cameras were installed in a space with constant lighting and background condition to acquire images of 53 subjects. For 30 s, four emotions of neutrality, happiness, sadness, and fear were induced to acquire images simultaneously with each camera. Finally, our original image database was constructed, as shown in Table 2, by dividing it into frames and saving them as still images.

Visible images were saved in high-density (HD) (1280 × 720) resolution, 30 frames per second, in MPEG-4 file format. Subsequently, still images were extracted from the saved images by sampling at regular intervals and removing the unnecessary background. As a result, a dataset was then constructed by only storing the face regions.

The original thermal images were acquired using a forward-looking infrared (FLIR) thermal imaging camera. They were saved in MPEG-4 file format, with an HD (1080 × 1440) resolution, a thermal resolution of 80 × 60, and 8.57 frames per second. A new database was constructed by removing redundant backgrounds from each frame of the original image and extracting only the face regions.

Figure 5 shows a sample database of thermal and visible images built for this study. Since the sampling rate differed, the temporally closest image pair was extracted by manually synchronizing the visible and thermal images as much as possible.

4.2. Comparing Feature Attributes between Visible and Thermal Images

Seventy percent of the constructed DB was used as training data, and 15% was used to validate the neural network proposed in this study. The remaining 15% of the data was used to evaluate the performance in classifying four emotions.

Figure 6 shows a similarity graph learned in the fear class of two different subjects. The x-axis represents the sampled data number of each subject, and the y-axis represents the similarity value of the emotions. In the similarity graph, a value closer to 1 indicates a greater similarity to the corresponding emotion, and a value closer to 0 indicates a smaller similarity to the corresponding emotion. For example, Figure 6a,c show the graphs for the visible image, and Figure 6b,d for the thermal image, where the solid blue line represents a similarity variation of fear, and the solid yellow line represents a similarity variation of sadness. The red box in Figure 6 represents a significant difference in the similarity measurements between the visible and thermal images acquired simultaneously. In the visible image shown in Figure 6a,c, there is a radical deviation in the similarity values classified as fear. Precisely, there are even more points where the similarity values classified as fear intersect the similarity values classified as sadness. In other words, fear is often mistaken for sadness.

On the other hand, as shown in Figure 6b,d, there is a small deviation in the similarity values classified as fear in the sequence data of the thermal image. For example, in the thermal image, the similarity graph of fear does not intersect with others, and their numerical values constantly maintain a higher value. It indicates that it was correctly recognized as fear in all sequence images. These results suggest that fear is better recognized based on thermal images than visible images.

4.3. Comparative Analysis of Classification Performance Using Four Emotions

Figure 7 shows the recognition performance for four emotions in visible and thermal images. For emotions like neutral, happiness, and sadness, the recognition performance based on visible images is higher than that based on thermal images. However, the recognition accuracy for fear is 94.98 with thermal images, which is higher than that with visible images. In line with previous studies, the recognition performance with visible images is particularly low for fear. Such a trend was demonstrated consistently throughout the entire data.

As a result of calculating the recognition accuracy for all four emotions, the accuracy of the thermal image was 94.61%, and the accuracy of the visible image was 96.52%, confirming that the recognition performance of the visible image was higher than that of the thermal image. Since the visible image has more feature information than the thermal image, the performance is higher with the visible image on average in terms of the overall emotion recognition performance.

Figure 8 shows a boxplot showing the similarity distribution for each emotion, with the test data accounting for about 15% of the database. Fifty percent of the data are distributed based on the median between the top and bottom sides of the box. The whiskers, which are the solid lines in the figure, that extend above and below the box, represent the maximum and minimum values in the quartile range of the data. Since the dotted line includes extreme values (outliers), it could be evaluated as being outside of the valid range. A larger box size represents a greater deviation of the data, and the intersections between the effective value ranges indicate a more severe difficulty in distinguishing between the data. For example, in Figure 8a, the lower whisker of fear overlaps the upper whiskers of the other three emotions. Specifically, most of the effective range of sadness is overlapped with the effective range of fear, then, neutral and happy follow in descending order of difficulty. On the contrary, in Figure 8b, the blue box and whiskers, which is the effective range of fear, do not overlap any effective ranges of the other three emotions. Instead, only their dotted lines intersect each other, which degrades the recognition performance at the minimum.

As a result of checking the similarity distribution of visible and thermal images, the deviation of similarity values is larger with visible images than with thermal images, as shown in Figure 8. In other words, the blue whiskers representing fear and the yellow whiskers representing sadness intersect seriously, as in the red box in Figure 8, indicating more cases of mistaking fear for sadness.

Conversely, there is a smaller deviation of similarity values to fear using thermal images, and the effective values of the data intersect less with other emotions, leading to better fear recognition performance. This was also confirmed in Figure 6, which showed the similarity variation according to the sequence data of individual subjects.

By comparing Figure 8 with the boxplot of neutral in Figure 9, the difference between visible and thermal images can be more clearly revealed than in the case of fear. Both the visible and thermal images show little deviation in the similarity value of neutral emotion for the entire DB image. Accordingly, the neutral emotion was accurately recognized as the similarity hardly intersected with other emotions. The small deviation of the similarity value indicates that the difference in the recognition rate between the sequence data is small for each subject, and the recognition accuracy is also high.

4.4. Improving the Recognition Performance of Fear Using Thermal Images

In this study, the recognition results of thermal images were used to compensate for the low recognition performance of fear among negative emotions using visible images. In the distribution of false negative images whose actual labels were fear but recognized as another emotion based on visible images, the similarity values to fear ranges between 0 and 0.4976, as in Figure 10. Hence, to increase the recognition performance of the data predicted as a false negative, the recognition performance using visible images was evaluated with the boundaries of

Q_{2}

(0.3108),

Q_{3}

(0.4073), and the maximum value (0.4976) is calculated by Equations (1) and (2).

The emotion recognition accuracy was the highest when the visible image data was replaced with thermal image data based on the maximum distribution of the similarity value of the false negatives. Therefore, the resultant recognition performance for each emotion is shown in Table 3 by substituting thermal images for all the visible images with a classification similarity of 0.4976 or less to fear.

By applying the selective thermal images to compensate for the low performance of the visible images, the performance is improved evenly in the recognition performance for fear emotion, as shown in Figure 11.

As a result of re-evaluating the recognition performance by synchronizing the fear recognition results based on the thermal images to the visible images, the fear recognition accuracy improved from 94.01% to 99.17%, with the recall, precision, and F1 scores all improved, as shown in Figure 11.

Since there was no existing study using a DB consisting of visible light and thermal images simultaneously acquired to compare with the proposed method, the following method was used to compare the proposed method with the previous studies indirectly. First, both the publicly accessible visible image DB used in a previous study and the visible image DB constructed in this study were learned by the proposed method. Then, the fear recognition rates were compared. The visible image data of all DBs used for comparison were used for CNN training by matching the input data size to the same size. The CK+ DB consists of images including the upper body, and the FER2013 data consists of images obtained from the side or other angles in addition to those obtained from the front of the face, which usually leads to degradation in the fear recognition performance compared to the DB constructed in this study. The recognition performance of other emotions was 75–93%. As shown in Figure 12, the fear recognition accuracy using the DB constructed in this study is 94.01%, whereas that using FER2013 and CK+ is 76.99% and 76.2%, respectively.

Among the previous studies that performed emotion classification with the CNN using the open DB, the overall emotion recognition accuracy of the study [11] that performed emotion classification using a CNN with the FER2013 database was 61.7%, and the overall emotion recognition accuracy of the study [30] that performed emotion classification with the CK+ DB was 80.3%. For the DB constructed in this study, the emotion recognition accuracy using visible images was 96.52%.

As shown in Figure 13, the overall emotion recognition accuracy using the method proposed in this study improved from 96.52% to 99.09%, showing an improvement in emotion recognition performance with other performance metrics as well. Through this relative comparison, it was demonstrated that the low fear recognition performance using visible images can be improved by using thermal images as proposed in this study.

In order to conduct a fair comparison, there should be an existing DB that contains synchronized visible and thermal images acquired simultaneously. Unfortunately, we have not found such a DB yet. Thus, we tried to compare our system with others by using several public DBs containing only visible images. Such an indirect comparison shows that the proposed system gives a comparable recognition performance over other systems using visible images and that the DB we constructed has a good quality that provides temporally synchronized visible and thermal images.

5. Conclusions

This study used thermal images to improve low recognition performance for fear emotion with visible images. A DB was constructed by simultaneously acquiring visible and thermal images. Then only the face regions were extracted from the face images. The CNN was trained using visible images. The database constructed by extracting only the face regions from the thermal images were also used for learning with a ResNet-18 model. Subsequently, the learning results of the thermal image DB, which showed strength in the classification of fear, were synchronized with the learning results of the visible image DB.

First, the emotion similarity was calculated based on the fear images falsely recognized as another emotion in the visible image. For images with a similarity value lower than the threshold, the emotion recognition performance was evaluated by replacing the similarity of the visible image with that of the thermal image. For the visible images, the fear recognition rate improved from 78.08% to 98.44% in recall, from 97.97% to 98.25% in precision, from 94.01% to 99.17% in accuracy, and from 86.9% to 98.34% in F1 score. Finally, it showed an average improvement of 4.54% in classification performance compared to emotion classification using only visible images.

We confirmed that thermal imaging could complement visible images in emotion recognition, unlike the existing emotion recognition technology that utilized the features of visible and thermal images individually. The most important contribution of this study is that we found significant characteristics of thermal imaging that remarkably differentiated the fear emotion attributes and that it led to an efficient system integration yielding significant performance improvement in recognizing fear among negative emotions. As a result, we have found a potential application of thermal imaging in emotion recognition throughout this research. We are currently working on designing a more sophisticated system by elaborating the decision-making routines, whereby the other negative emotions will be dealt with by resolving false positive errors for a real-time process.

Future research is planned to improve facial expression classification and emotion recognition performance based on thermal images by using only a part of the thermal face image data or extracting new features through preprocessing. In addition, it will be possible to improve the emotion recognition performance by applying an algorithm such as extracting various correlations from input data by composing an ensemble network with visible images.

Author Contributions

Conceptualization, J.-M.L., Y.-E.A. and E.B.; Methodology, J.-M.L. and E.B.; Software, J.-M.L. and Y.-E.A.; Validation, J.-M.L., E.B. and S.P.; Formal analysis, J.-M.L.; Investigation, J.-M.L.; Writing—Original Draft Preparation, J.-M.L.; Writing—Review and Editing J.-M.L., Y.-E.A., E.B. and S.P.; Supervision, S.P.; Project Administration, S.P.; Funding Acquisition, S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Healthcare AI Convergence Research and Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S0254-22-1001) and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2017R1A6A1A03015496).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

FER (facial emotion recognition), ICT (Information and Communication Technology), ADAS (Advanced Driver Assistance Systems), CNN (convolution neural network), ResNet (residual neural network), DB (database), PCA (principal component analysis), ROI (region of interest), LSTM (long short-term memory), HD (high density)

References

Shome, A.; Rahman, M.M.; Chellappan, S.; Islam, A.A. A generalized mechanism beyond NLP for real-time detection of cyber abuse through facial expression analytics. In Proceedings of the 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Texas, TX, USA, 12 November 2019; pp. 348–357. [Google Scholar]
George, A.; Mostaani, Z.; Geissenbuhler, D.; Nikisins, O.; Anjos, A.; Marcel, S. Biometric Face presentation attack detection with multi-channel convolutional neural network. IEEE Trans. Inf. Forensics Secur. 2019, 15, 42–55. [Google Scholar] [CrossRef] [Green Version]
Taha, B.; Hatzinakos, D. Emotion Recognition from 2D Facial Expressions. In Proceedings of the IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, Canada, 5 May 2019; pp. 1–4. [Google Scholar]
Balasubramanian, B.; Diwan, P.; Nadar, R.; Bhatia, A. Analysis of Facial Emotion Recognition. In Proceedings of the IEEE 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23 April 2019; pp. 945–949. [Google Scholar]
Maaoui, C.; Pruski, A. Emotion Recognition through Physiological Signals for Human-Machine Communication. In Cutting Edge Robotics 2010; IntechOpen: London, UK, 2010; [Online]; Available online: https://www.intechopen.com/chapters/12200 (accessed on 1 September 2010). [CrossRef] [Green Version]
Nguyen, D.H.; Kim, S.; Lee, G.S.; Yang, H.J.; Na, I.S.; Kim, S.H. Facial Expression Recognition Using a Temporal Ensemble of Multi-Level Convolutional Neural Networks. IEEE Trans. Affect. Comput. 2022, 13, 226–237. [Google Scholar] [CrossRef]
Ito, T.A.; Cacioppo, J.T. Electrophysiological evidence of implicit and explicit categorization processes. J. Exp. Soc. Psychol. 2000, 36, 660–676. [Google Scholar] [CrossRef] [Green Version]
Ueda, J.; Okajima, K. Face morphing using average face for subtle expression recognition. In Proceedings of the IEEE 11th International Symposium on Image and Signal Processing and Analysis (ISPA), Dubrovnik, Croatia, 23–25 September 2019; pp. 187–192. [Google Scholar]
Pitaloka, D.A.; Wulandari, A.; Basaruddin, T.; Liliana, D.Y. Enhancing CNN with preprocessing stage in automatic emotion recognition. Procedia Comput. Sci. 2017, 116, 523–529. [Google Scholar] [CrossRef]
Hua, C.H.; Huynh-The, T.; Seo, H.; Lee, S. Convolutional Network with Densely Backward Attention for Facial Expression Recognition. In Proceedings of the IEEE 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), Taichung, Taiwan, 3–5 January 2020; pp. 1–6. [Google Scholar]
Singh, S.; Nasoz, F. Facial Expression Recognition with Convolutional Neural Networks. In Proceedings of the IEEE 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 0324–0328. [Google Scholar]
Salah, A.A.; Kaya, H.; Gurpınar, F. Video-based emotion recognition in the wild. In Multimodal Behavior Analysis in the Wild; Academic Press: Cambridge, MA, USA, 2019; pp. 369–386. [Google Scholar]
Yang, H.; Han, J.; Min, K. A Multi-Column CNN Model for Emotion Recognition from EEG Signals. Sensors 2019, 19, 4736. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhang, B.; Quan, C.; Ren, F. Study on CNN in the recognition of emotion in audio and images. In Proceedings of the IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 26–29 June 2016; pp. 1–5. [Google Scholar]
Zhao, G.; Huang, X.; Taini, M.; Li, S.Z.; Pietikäinen, M. Facial expression recognition from near-infrared videos. Image Vis. Comput. 2011, 29, 607–619. [Google Scholar] [CrossRef]
JSeo, W.; Chung, I.J. Face Liveness Detection Using Thermal Face-CNN with External Knowledge. J. Korean Inst. Intell. Syst. 2015, 25, 451–456. [Google Scholar]
Kong, S.G.; Heo, J.; Abidi, B.R.; Paik, J.; Abidi, M.A. Recent advances in visual and infrared face recognition—A review. Comput. Vis. Image Underst. 2005, 97, 103–135. [Google Scholar] [CrossRef] [Green Version]
Breuer, R.; Kimmel, R. A deep learning perspective on the origin of facial expressions. arXiv 2017, arXiv:1705.01842. [Google Scholar]
Jung, H.; Lee, S.; Yim, J.; Park, S.; Kim, J. Joint fine-tuning in deep neural networks for facial expression recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–12 December 2015; pp. 2983–2991. [Google Scholar]
Fnaiech, A.; Sahli, H.; Sayadi, M.; Gorce, P. Fear Facial Emotion Recognition Based on Angular Deviation. Electronics 2021, 10, 358. [Google Scholar] [CrossRef]
Samadiani, N.; Huang, G.; Luo, W.; Chi, C.H.; Shu, Y.; Wang, R.; Kocaturk, T. A multiple feature fusion framework for video emotion recognition in the wild. Concurr. Comput. Pract. Exp. 2022, 34, e5764. [Google Scholar] [CrossRef]
Priya, M.S.; Nawaz, G.K. Modified emotion recognition system to study the emotion cues through thermal facial analysis. Biomed. Res. 2017, 28, 8718–8723. [Google Scholar]
Nguyen, H.; Chen, F.; Kotani, K. Fusion of visible images and thermal image sequences for automated facial emotion estimation. J. Mob. Multimed. 2014, 10, 294–308. [Google Scholar]
Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101. [Google Scholar]
Tao, S.Y.; Martinez, A.M. Compound facial expressions of emotion. Proc. Natl. Acad. Sci. USA 2014, 111, E1454–E1462. [Google Scholar]
Lyons, M.J.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding facial expressions with Gabor wave. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar]
Available online: https://mmifacedb.eu/ (accessed on 29 November 2017).
Wang, S.; Liu, Z.; Lv, S.; Lv, Y.; Wu, G.; Peng, P.; Chen, F.; Wang, X. A Natural Visible and Infrared Facial Expression Database for Expression Recognition and Emotion Inference. IEEE Trans. Multimed. 2010, 12, 682–691. [Google Scholar] [CrossRef]
Nguyen, H.; Kotani, K.; Chen, F.; Le, B. A thermal facial emotion database and its analysis. In Pacific-Rim Symposium on Image and Video Technology; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
Shan, K.; Guo, J.; You, W.; Lu, D.; Bie, R. Automatic facial expression recognition based on a deep convolutional-neural-network structure. In Proceedings of the 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, UK, 7–9 June 2017; pp. 123–128. [Google Scholar]

Figure 1. Comparison of each emotion recognition performance in related works [5,10,11].

Figure 2. Block diagram of the proposed method.

Figure 3. Training procedures.

Figure 4. Example of similarity graphs of emotion (Fear, Happy, Neutral, Sadness) in visible images by subjects: (a) Subject 1; (b) Subject 2; (c) Subject 6; (d) Subject 12; (e) Subject 22; (f) Subject 29; (g) Subject 33; (h) Subject 38 and (i) Subject 41.

Figure 5. Synchronized visible and thermal image: (a) visible image dataset; (b) thermal image dataset.

Figure 6. Similarity graphs of emotion (Fear, Happy, Neutral, Sadness) between visible and thermal images by subjects. Each red box indicates the emotional similarity at the same range in time for both images: (a,c) similarity of fear of subjects No. 9 and No. 10 in the visible image; (b,d) Similarity to fear of subjects No. 9 and No. 10 in the thermal image.

Figure 7. Recognition performance for each emotion with visible and thermal images: (a) Recognition accuracy by emotion with visible images; (b) Recognition accuracy by emotion with thermal images.

Figure 8. Distribution of similarity values of visible and thermal images to fear: (a) Boxplot of visible images; (b) Boxplot of thermal images.

Figure 9. Distribution of similarity values of visible and thermal images to neutral: (a) Boxplot of visible images; (b) Boxplot of thermal images.

Figure 10. Distribution of similarity values in images predicted as false negative among visible images of fear.

Figure 11. Comparison of fear recognition accuracy between the method using only visible images and the proposed method.

Figure 12. Comparison of the fear recognition accuracy with visible light images between the existing DB and the DB constructed in this study.

Figure 13. Comparison of recognition accuracy for each DB consisting of the visible image, thermal image, and visible + thermal image.

Table 1. Comparison of the accuracies achieved through various techniques.

Author Name	Technique Used	Database Used	Emotion Recognition Accuracy
Nguyen [6]	Multi-Level Convolutional Neural Networks Ensemble of Multi-Level Convolutional Neural Networks	FER2013	73.03%-MLCNN 74.09%-Ensemble model
Pitaloka [9]	Convolution Neural Networks (CNN)	Collected from CK+ and JAFFE and MUG	85.21%
Jung [19]	DTAGN: Integrated model of the deep temporal appearance network (DTAN) and the deep temporal geometry network (DTGN)	CK+, Oulu-CASIA and MMI	97.25%-CK+ 81.46%-Oulu 70.24%-MMI
Ahmed [20]	a 3D/2D projection method using the combination of PCA	own database obtained from 69 subjects	96.2%
Priya [22]	Eigen face PCA	own thermal image database	97.8%
Hung [23]	norm-Eigenspace method(n-EMC) thermal-Principal Component Analysis(t-PCA)	KTFE	85.6%-n-EMC 86.2%-t-PCA

Table 2. Database configuration.

Emotion	Number of Data
Neutral	3738
Happiness	3484
Sadness	3354
Fear	3346
Total	13,922

Table 3. Comparison of recognition accuracy after replacing visible images with thermal images.

Emotion	Recognition Accuracy after Replacing Visible Images with Thermal Images (%)
Emotion	Recall	Precision	Accuracy	F1 Score
Fear	98.44	98.25	99.17	98.34
Neutral	98.18	98.29	99.06	98.23
Sadness	96.54	97.82	98.67	97.17
Happy	99.53	98.38	99.48	98.95

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lee, J.-M.; An, Y.-E.; Bak, E.; Pan, S. Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging. Sustainability 2022, 14, 15200. https://doi.org/10.3390/su142215200

AMA Style

Lee J-M, An Y-E, Bak E, Pan S. Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging. Sustainability. 2022; 14(22):15200. https://doi.org/10.3390/su142215200

Chicago/Turabian Style

Lee, Ji-Min, Young-Eun An, EunSang Bak, and Sungbum Pan. 2022. "Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging" Sustainability 14, no. 22: 15200. https://doi.org/10.3390/su142215200

APA Style

Lee, J. -M., An, Y. -E., Bak, E., & Pan, S. (2022). Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging. Sustainability, 14(22), 15200. https://doi.org/10.3390/su142215200

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Improvement of Negative Emotion Recognition in Visible Images Enhanced by Thermal Imaging

Abstract

1. Introduction

2. Related Works

3. Method of Improving Recognition Performance for Negative Emotions

3.1. Neural Network Design for Emotion Recognition Based on Visible and Thermal Images

3.2. Proposed Emotion Recognition Method

4. Experimental and Analysis

4.1. Building the Database

4.2. Comparing Feature Attributes between Visible and Thermal Images

4.3. Comparative Analysis of Classification Performance Using Four Emotions

4.4. Improving the Recognition Performance of Fear Using Thermal Images

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI