1. Introduction
Given that emotion plays an important role in our daily lives and work, the real-time assessment and regulation of emotions can improve our quality of life. For example, emotion recognition will facilitate the natural advancement of human–machine interaction and communication. Furthermore, recognizing the real emotional state of patients, particularly those with expression problems, will help improve the quality of medical care. In recent years, emotion recognition based on EEG signals has gained considerable attention. Emotion recognition is a crucial component of human–computer interaction (HCI) systems and will effectively improve communication between humans and machines [1,2].
However, emotion recognition based on EEG signals is challenging given the vague boundaries and individual variations of emotions. Moreover, in theory, we cannot obtain the “ground truth” of human emotions, that is, the true labels of EEG signals that correspond to different emotional states, because emotion is a function of time, context, space, language, culture, and race. Therefore, researchers have used various affective materials, such as images, sounds, and videos, to elicit emotions. Affective video materials are widely used because they expose subjects to real-life scenarios through combined visual and aural stimuli.
DEAP is a multimodal dataset for the analysis of human affective states. It contains EEG and peripheral physiological signals acquired from 32 participants as they watched 40 one-minute excerpts of music videos [3]. MAHNOB-HCI is another multimodal database of recorded responses to affective movie stimuli; a multimodal setup was established for the synchronized recording of face videos, audio signals, eye-gaze data, and peripheral/central nervous system physiological signals of 27 participants [4]. Zheng et al. developed the SEED dataset to investigate stable patterns over time for emotion recognition from EEG; fifteen subjects participated in the experiment, each performing three sessions with an interval of one week or longer between sessions [5]. Liu et al. constructed a standard database of 16 emotional film clips selected from over one thousand film excerpts and proposed a system for the real-time recognition of movie-induced emotion through the analysis of EEG signals [6].
Various features and extraction methods based on the above datasets have been proposed for the recognition of emotions from EEG signals. These methods include time domain, frequency domain, joint time-frequency analysis, and empirical mode decomposition (EMD) techniques [7].
The statistical parameters of EEG series, including the first and second differences, mean value, and power, are usually utilized as features in time domain techniques [8]. Nonlinear features, including fractal dimension [9,10], sample entropy [11], and nonstationary index [12], have been utilized for emotion recognition. Hjorth features [13] and higher order crossing features [14] have also been used in EEG studies [15,16].
Time-frequency analysis is based on the spectrum of EEG signals, and the energy, power, power spectral density, and differential entropy (DE) [17] of a certain subband are utilized as features. Short-time Fourier transform (STFT) [18,19], Hilbert–Huang transform [20,21], and discrete wavelet transform [22,23,24,25] are the most commonly used techniques for spectral calculation. Higher frequency subbands, such as the Beta (16–32 Hz) and Gamma (32–64 Hz) bands, have been verified to outperform lower subbands in emotion recognition [3,26].
Mert et al. extracted the entropy, power, power spectral density, correlation, and asymmetry of intrinsic mode functions (IMFs) as features through EMD and then utilized independent component analysis (ICA) to reduce the dimensions of the feature set; the classification accuracy of emotions was computed with all subjects merged together [27]. Zhuang et al. utilized the multidimensional information of IMFs, the first difference of the time series, the first difference of phase, and the normalized energy as features. They verified the classification performance of their method on the DEAP dataset and found that its accuracy is superior to that of the DE of the Gamma band [28].
Other features extracted from electrode combinations, such as the coherence and asymmetry of electrodes in different brain regions [29,30,31] and graph-theoretic features [32], have also been utilized. Jenke et al. compared the performance of different features and derived guidelines for feature extraction and selection [33].
Other strategies, such as deep networks, have also been investigated to improve classification performance. Zheng used a deep neural network to investigate critical frequency bands and channels for emotion recognition [34]. Yang used a hierarchical network with subnetwork nodes for emotion recognition [35]. Li et al. designed a hybrid deep-learning model that combines a convolutional neural network and a recurrent neural network to extract task-related features and evaluated it on the DEAP dataset [36].
All of the above datasets and methods for emotion recognition are based on external affective stimuli. However, few studies on self-induced emotion recognition from EEG have been conducted despite their importance to endogenous emotion recognition. Liu et al. investigated the profile of autonomic nervous responses during the experience of five basic self-induced emotions (sadness, happiness, fear, anger, and surprise) and a neutral state. The ECG and respiratory activity of fourteen healthy volunteers were recorded while they read passages in the five basic emotional tones and a neutral tone to elicit the corresponding endogenous emotions. They found it feasible and effective to recognize users’ affective states on the basis of the peripheral physiological response patterns of ECG and respiratory activity. However, their research did not examine the EEG patterns of self-induced emotion [37]. The stability, performance, and neural patterns of self-induced emotion recognition based on EEG signals remain unknown. Moreover, whether self-induced emotion and affective stimuli-induced emotion share commonalities remains a point of contention. The main contributions of this study to EEG-based emotion recognition can be summarized as follows:
- (1)
We have developed an emotional EEG dataset for the evaluation of stable patterns of self-induced emotion across subjects. To the best of our knowledge, a public EEG dataset for analyzing the classification performance of stable neural patterns in the recognition of self-induced emotion is unavailable.
- (2)
We systematically compared self-induced emotion with movie-induced emotion and found that these two types of emotions share numerous commonalities.
- (3)
We analyzed the important features, electrode distribution, and average neural patterns of different self-induced emotions. Our analytical results will support future efforts for real-time recognition of endogenous emotions in real life.
- (4)
We confirmed that self-induced emotions exhibit subject-independent neural signatures and relatively stable EEG patterns at critical frequency bands and brain regions.
This paper is structured as follows: a detailed description of the experimental setup is presented in Section 2. A discussion of the methodology is provided in Section 3. The classification results and analysis are presented in Section 4. The discussion is given in Section 5, and the conclusion is given in Section 6.
4. Results
4.1. Classification of Self-Induced Emotions
We explored the classification of self-induced emotions by performing three subject-dependent experiments:
● Movie-Induced Emotion Recognition
Movie-induced emotional data were used as the training and testing sets for this classification task. Each subject watched 18 movie clips. In binary classification, samples from the joy movie clips (three clips) were labeled positive, and samples from the sad, disgust, anger, and fear movie clips (12 clips) were labeled negative. Each time, we utilized the 49 samples from one movie clip as the testing set and the other 686 samples from the remaining 14 movie clips as the training set to avoid correlations between the training and testing sets. The final accuracy for each subject was obtained by averaging the 15 results from the 15 tested movie clips. To classify emotions into six discrete categories, we utilized 49 × 2 samples from two movie clips of each emotional category as the training set and 49 samples from the one remaining movie clip as the testing set, again to avoid correlations between the training and testing sets. The final accuracy for each subject was obtained by averaging the three results from the three tested movie clips.
● Self-Induced Emotion Recognition
Self-induced emotional data were used as the training and testing sets for this classification task. Each subject recalled 18 movie clips. In binary classification, samples from the recollection of the joy movie clips (three clips) were labeled positive, whereas those from the recollection of the sad, disgust, anger, and fear movie clips (12 clips in total) were labeled negative. Each time, we utilized the 49 samples from one movie clip as the testing set and the other 686 samples from the remaining 14 movie clips as the training set to avoid correlations between the training and testing sets. The final accuracy for each subject was obtained by averaging the 15 results from the 15 test sets. To classify emotions into six discrete categories, we utilized 49 × 2 samples from the recollection of two movie clips of each emotion category as the training set and 49 samples from the remaining movie clip as the testing set, again to avoid correlations between the training and testing sets. The final accuracy for each subject was obtained by averaging the three results from the three test sets.
● Prediction of Self-Induced Emotion through Movie-Induced Emotion
We utilized all 49 × 18 samples from the 18 movie-induced emotional recordings as the training set to establish a classification model. Using the established model, we predicted the categories of self-induced emotion, with all 49 × 18 samples from the self-induced emotional data serving as the testing set. A schematic sketch of these evaluation protocols is given after this list.
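For clarity, the sketch below illustrates the clip-wise leave-one-out evaluation and the cross-paradigm prediction described above. It assumes a hypothetical data layout in which each clip's features are stored as a (49, 366) NumPy array, and it uses a linear SVM purely as a placeholder classifier; these names and choices are illustrative assumptions, not the exact implementation used in this study.

```python
# Illustrative sketch only: the data layout (features[clip] -> (49, 366) array,
# labels[clip] -> emotion category) and the linear SVM are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score


def clipwise_leave_one_out(features, labels):
    """Test on one clip at a time and train on all remaining clips."""
    accuracies = []
    for test_clip in range(len(features)):
        train_clips = [c for c in range(len(features)) if c != test_clip]
        X_train = np.vstack([features[c] for c in train_clips])
        y_train = np.concatenate([np.full(len(features[c]), labels[c]) for c in train_clips])
        X_test = features[test_clip]
        y_test = np.full(len(X_test), labels[test_clip])
        clf = SVC(kernel="linear").fit(X_train, y_train)
        accuracies.append(accuracy_score(y_test, clf.predict(X_test)))
    return float(np.mean(accuracies))  # subject-level accuracy


def cross_paradigm_prediction(movie_feats, movie_labels, self_feats, self_labels):
    """Train on all movie-induced samples; test on all self-induced samples."""
    X_train = np.vstack(movie_feats)
    y_train = np.concatenate([np.full(len(f), l) for f, l in zip(movie_feats, movie_labels)])
    X_test = np.vstack(self_feats)
    y_test = np.concatenate([np.full(len(f), l) for f, l in zip(self_feats, self_labels)])
    clf = SVC(kernel="linear").fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))
```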
4.1.1. Classification of Positive and Negative Emotions
Table 4 shows the binary classification accuracies of the 30 participants in the three experimental tasks described above. The average accuracy for the binary classification of self-induced emotion is 87.36%, which is close to that of movie-induced emotion (87.20%). The average accuracy obtained for the third experimental task is 78.53%, which is far above the chance level of 50%. These findings indicate that self-induced emotion and movie-induced emotion share numerous commonalities. In the future, a model established on the basis of affective-stimulus-induced emotion could be used to comprehensively predict endogenous emotion.
Given the imbalance between positive and negative samples in the training set, we also report additional measures of classification performance.
Figure 9 illustrates the ROC curves of the three experimental tasks. The areas under the curve (AUC) are 0.9047, 0.8996, and 0.8102, indicating that the model robustly discriminates positive from negative emotions for both self-induced and movie-induced emotions.
Table 5 provides the F1 scores and classification accuracies for binary emotion recognition. The F1 score of the positive class is lower than that of the negative class because the training set contains more negative samples than positive samples. The F1 scores of the negative class for the three experimental tasks are 0.94, 0.92, and 0.86. In terms of both F1 score and accuracy, the classification performance for self-induced emotions is similar to that for movie-induced emotions.
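As an illustration, the AUC and F1 measures reported above can be computed with scikit-learn as in the sketch below; the labels, scores, and predictions are randomly generated placeholders rather than data from this study.

```python
# Illustrative only: y_true, y_score, and y_pred are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, f1_score, accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=735)        # 1 = positive (joy), 0 = negative
y_score = rng.random(735)                    # classifier decision scores
y_pred = (y_score >= 0.5).astype(int)        # hard predictions

auc = roc_auc_score(y_true, y_score)         # area under the ROC curve (cf. Figure 9)
fpr, tpr, _ = roc_curve(y_true, y_score)     # points of the ROC curve
f1_positive = f1_score(y_true, y_pred, pos_label=1)   # F1 of the positive class
f1_negative = f1_score(y_true, y_pred, pos_label=0)   # F1 of the negative class
accuracy = accuracy_score(y_true, y_pred)    # overall accuracy (cf. Table 5)
```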
4.1.2. Classification of Emotions into Six Discrete Categories
Table 6 shows the accuracies obtained for the classification of the emotions of the 30 participants into six discrete categories in the three experimental tasks. The average classification accuracy for self-induced emotion is 54.52%, which is close to that for movie-induced emotion (55.65%). The average accuracy for the third task, the prediction of self-induced emotion through movie-induced emotion, is 49.92%, which is far above the chance level of 16.67%.
The average confusion matrices of all participants under the three experimental tasks are illustrated in Figure 10. Figure 10a,b show that the classification performance for joy is the best, followed by that for neutral emotion.
Among the four types of self-induced negative emotions, disgust is classified best; among the four types of movie-induced negative emotions, anger is classified best.
Figure 10c shows that the model established on the basis of movie-induced emotion predicts self-induced neutral emotion best, followed by joy. Among the four types of negative emotions, anger is predicted best. The four negative emotions are easily misclassified as one another, indicating that negative emotions share some commonalities.
4.2. Dimensionality Reduction
For each sample, we extracted 366 features in total. Are these features effective in emotion recognition? Which features and electrodes are more important in self-induced emotion recognition? In this subsection, we utilized the minimum-redundancy maximum-relevance (MRMR) method to identify the important features and electrodes for self-induced emotion recognition.
Figure 11 illustrates the dimensionality reduction performance of the MRMR algorithm. When only the top 10 features ranked by MRMR are selected, the binary classification of self-induced emotion achieves an accuracy of 85.21% and that of movie-induced emotion achieves 83.75%. Accuracy increases continuously as the number of utilized features increases. When all 366 features are utilized, the classification accuracy for self-induced emotion is 87.36% and that for movie-induced emotion is 87.20%.
When the top 10 ranked features sorted by MRMR are selected for the classification of emotions into six discrete categories, the classification accuracy for self-induced emotion is 46.70% and that for movie-induced emotion is 46.47%. Accuracy increases continuously as the number of utilized features increases. When 366 features are utilized, the classification accuracy for self-induced emotions is 54.52% and that for movie-induced emotion is 55.65%.
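The sketch below is a minimal, greedy MRMR-style ranking based on mutual information (relevance to the label minus mean redundancy with already-selected features). It illustrates the idea only; it is an assumed re-implementation and not necessarily the exact MRMR variant or parameterization used in this study.

```python
# Greedy MRMR-style ranking (MID criterion): illustrative re-implementation.
# X: (n_samples, n_features) feature matrix; y: (n_samples,) emotion labels.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression


def mrmr_rank(X, y, n_select=20):
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y)         # I(feature; label)
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < n_select:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            if selected:                          # mean I(candidate; selected features)
                redundancy = np.mean([mutual_info_regression(X[:, [f]], X[:, s])[0]
                                      for s in selected])
            else:
                redundancy = 0.0
            score = relevance[f] - redundancy     # relevance minus redundancy
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected                               # feature indices, best first
```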
For the classification of emotions into six discrete categories, we determined the corresponding top 20 electrodes in accordance with the ranking of the 366 features sorted by MRMR. The results are shown in Table 7. The DE of electrode TP8 in the Beta band; the DE of electrodes AF7, AF8, FP1, FP2, F6, F8, FC6, FT8, T7, T8, TP8, TP9, TP10, CP6, P8, O1, O2, and Oz in the Gamma band; and the first difference of IMF1 (decomposed through EMD) of electrodes T7, T8, and C6 play an important role in the classification of movie-induced emotions.
The DE of electrodes AF7, AF8, FP1, FC5, FC6, FT7, FT8, T7, T8, TP7, TP8, TP9, TP10, C5, C6, CP6, P8, O1, and Oz in the Gamma band and the first difference of IMF1 of electrodes FT8, T8, TP10, and CP6 decomposed through EMD play an important role in the classification of self-induced emotions.
The features of high-frequency bands provide outstanding classification performance. These features include the DE of the Gamma band and the first difference of IMF1, the component with the highest oscillation frequency obtained through EMD.
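To make these two feature types concrete, the sketch below computes the DE of a Gamma-band-filtered channel under a Gaussian assumption, DE = 0.5·log(2πe·σ²), and the mean absolute first difference of IMF1 obtained through EMD. The Butterworth filter design, the use of the PyEMD package, and the exact definition of the first-difference feature are assumptions and may differ from the implementation in this study.

```python
# Illustrative feature extraction for one EEG channel x (1-D array) sampled at fs Hz.
# The filter design and the PyEMD package ("pip install EMD-signal") are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
from PyEMD import EMD


def gamma_band_de(x, fs, band=(30.0, 64.0)):
    """Differential entropy of the Gamma-band signal: 0.5 * log(2 * pi * e * variance)."""
    nyq = fs / 2.0
    high = min(band[1], 0.99 * nyq)               # keep the upper edge below Nyquist
    b, a = butter(4, [band[0] / nyq, high / nyq], btype="band")
    x_band = filtfilt(b, a, x)
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(x_band))


def first_difference_of_imf1(x):
    """Mean absolute first difference of IMF1, the highest-frequency mode from EMD."""
    imfs = EMD()(x)                               # rows are IMFs, highest frequency first
    return float(np.mean(np.abs(np.diff(imfs[0]))))
```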
Figure 12 shows the distribution of the top 20 subject-independent electrodes selected on the basis of the MRMR ranking. As can be seen from the figure, electrodes C5, C6, CP6, T7, T8, TP8, TP9, and TP10 on the temporal lobe; electrodes AF7, AF8, and FP1 on the prefrontal lobe; and electrodes O1, O2, and Oz on the occipital lobe play important roles in emotion recognition. This finding shows that the neural modes of external movie-induced emotion and internal self-induced emotion share common characteristics. These important characteristics of self-induced emotion lay a foundation for endogenous emotion recognition.
4.3. Neural Signatures and Patterns of Self-Induced Emotion
We analyzed the important features and average neural patterns of different self-induced emotions.
Figure 13 shows the boxplots of 10 important features of self-induced emotion. The figure shows that different emotions can be effectively identified by setting proper thresholds for different electrodes and features. For example, joy can be effectively distinguished from sadness, disgust, anger, and fear when the DE threshold of electrode T7 in the Gamma band is set to 0.6.
Figure 14 and Figure 15 show the average brain topographies of movie-induced emotion and self-induced emotion, respectively. The six discrete emotion categories do not have significantly different brain topographies under the DE features of the Delta (1–4 Hz), Theta (4–8 Hz), and Alpha (8–12 Hz) bands. However, a slight difference in the left temporal lobe is noted under the DE of the Beta (12–30 Hz) band.
Under the DE of the Gamma (30–64 Hz) band, the six discrete categories of self-induced emotion show significant differences in electrodes T7, T8, TP7, TP8, TP9, and TP10 on both temporal lobes; electrodes O1, O2, and Oz on the occipital lobe; and electrodes AF7, AF8, FP1, and FP2 on the prefrontal lobe. The feature values on both sides of the temporal lobe and on the occipital lobe are higher for joy than for the other emotions. The feature value of the frontal lobe for disgust is the highest among all emotions. Neutrality has the lowest feature values over the entire brain topography compared with the other five emotion categories. Similar results are observed for movie-induced emotions.
Under the EMD-based feature Dt, the six discrete categories of self-induced emotion show significant differences in electrodes T7, T8, TP7, TP8, TP9, and TP10 on both temporal lobes and in electrodes FPz, FP1, and FP2 on the prefrontal lobe. Disgust has the highest feature value at the prefrontal lobe, and joy has the highest feature values in the left temporal and occipital lobes. Similar results are observed for movie-induced emotions.
The important electrodes and features inferred from the average brain topographies are consistent with those selected by MRMR (refer to Section 4.2). Therefore, neural patterns of self-induced emotion do exist, and they have much in common with those of stimuli-induced emotion. This finding is meaningful for the real-time recognition of comprehensive endogenous emotion.
5. Discussion
Emotion recognition from EEG signals has achieved significant progress in recent years. Previous research has mainly focused on emotions induced by external affective stimuli, and few studies on the classification of self-induced emotion from EEG are available. The main contributions of this study can be summarized as follows:
First, we designed an experiment that considers two types of emotions: movie-induced emotion and self-induced emotion. Thirty participants took part in our experiment, and we developed an EEG-based dataset for the evaluation of the patterns of self-induced emotion across subjects.
Second, we evaluated classification performance for self-induced emotions. We achieved an average accuracy of 87.36% in discriminating positive from negative emotions and an average accuracy of 54.52% in classifying emotions into six discrete categories. We achieved similar accuracies for classifying movie-induced emotions. We also utilized movie-induced emotional data as a training set to establish a classification model. We used this model to classify self-induced emotions and achieved 78.53% accuracy in discriminating positive from negative emotions and 49.92% accuracy in classifying emotions into six discrete categories.
Third, we analyzed the important features and electrode distributions through the MRMR algorithm. We found that the DE of the Gamma band and the first difference of IMF1 decomposed through EMD provide good classification performance. The important electrodes are distributed over the bilateral temporal lobe (C5, C6, CP6, T7, T8, TP8, TP9, and TP10), the prefrontal lobe (AF7, AF8, and FP1), and the occipital lobe (O1, O2, and Oz). We also found that self-induced emotion and movie-induced emotion share numerous commonalities.
Finally, by analyzing the average brain topography of all participants over all experimental sessions, we obtained the following neural patterns of self-induced emotion: disgust is associated with the highest feature value of the prefrontal lobe; joy is associated with high feature values of the bilateral temporal and occipital lobes; and negative emotions elicit asymmetries in the bilateral temporal lobe. Moreover, the important brain regions and electrodes identified on the basis of the average brain topography are consistent with those selected through the MRMR algorithm.
Our study is limited by its small sample size; we collected EEG signals from only 30 participants. In the future, we will collect additional EEG signals to verify our analysis and conclusions. In addition, we will investigate the real-time recognition of comprehensive endogenous emotion to promote the practical application of EEG-based emotion recognition.