1. Introduction
A brain–computer interface (BCI) is a subfield of human–computer interaction (HCI). A BCI enables communication between the human brain and electronic devices such as computers and mobile phones, and BCI technology has been especially helpful for disabled people. A BCI system lets the user interact with a device using EEG and other signals. The processing steps in a BCI center on decoding the intent carried by the brain signals and transforming it into actions [1]. BCI techniques obtain signals from a subject's brain, extract knowledge from the captured signals, and use this knowledge to infer the intent of the subject who produced those signals. EEG signals are also employed in nonmedical contexts such as entertainment, education, monitoring, and games [2].
Emotions play an essential role in human cognition, particularly in rational decision-making, perception, human interaction, and human intelligence. Affective computing emerged to fill the emotional gap in HCI by bringing technology and emotions together [3]. HCI can measure the emotional status of a user by capturing the emotional interactions between the human and the computer. Emotion recognition is the process of identifying a human's emotional status, and its analysis benefits from advances in psychology, modern neuroscience, cognitive science, and computer science [4]. In computer science, emotion recognition by computer systems aims to enhance human–machine interaction over a broad range of application areas, including clinical, industrial, military, and gaming settings [5].
Different approaches have been suggested for emotion recognition and can be split into two types: the first uses the characteristics of emotional behavior, such as facial expression, tone of voice, and body gestures, to identify a particular emotion; the second uses physiological signals. Physiological activity can be recorded by noninvasive sensors, often as electrical signals, including skin conductivity, the electrocardiogram, and the EEG [6].
Emotion evaluation techniques may consist of subjective and/or objective measurements. Subjective measures include self-reporting instruments such as questionnaires, adjective checklists, and pictorial tools. Objective measures rely on physiological signals such as blood pressure, skin responses, pupillary responses, brain waves, and heart activity. Subjective and objective methods can be used jointly to improve the accuracy and reliability of emotional state determination [7].
Emotion models are divided into two types: dimensional and discrete. A dimensional model describes an emotional state along continuous axes; most dimensional models combine valence and arousal. A discrete model instead assumes a particular number of distinct emotions. Valence concerns the level of pleasantness associated with an emotion and ranges from an unpleasant to a pleasant state. Arousal indicates the intensity of the emotional experience; it lies on a continuum from inactive (e.g., bored) to active (e.g., excited). The following points characterize the valence, arousal, and dominance dimensions in EEG terms [8]:
Valence: positive, happy emotions produce higher frontal coherence in alpha signals and higher right parietal beta power, in contrast to negative emotions.
Arousal: excitation is displayed as higher beta power and coherence in the parietal lobe, together with lower alpha activity.
Dominance: the strength of an emotion, usually reflected in the EEG as an increase in the beta-to-alpha activity ratio in the frontal lobe and an increase in beta activity in the parietal lobe.
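As a rough illustration of how such band-power cues can be computed, the sketch below estimates alpha and beta power with Welch's method and forms a beta/alpha ratio; the 128 Hz sampling rate and the synthetic input are assumptions for illustration, not part of the cited definitions.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # assumed sampling rate in Hz (DEAP data are downsampled to 128 Hz)

def band_power(signal, fs, lo, hi):
    """Power of `signal` in the [lo, hi] Hz band, via Welch's PSD estimate."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

def beta_alpha_ratio(eeg_channel, fs=FS):
    """Beta/alpha power ratio, a rough correlate of arousal/dominance."""
    alpha = band_power(eeg_channel, fs, 8, 13)
    beta = band_power(eeg_channel, fs, 13, 30)
    return beta / alpha

# Example on synthetic data: one 10 s channel of noise.
rng = np.random.default_rng(0)
channel = rng.standard_normal(FS * 10)
print(beta_alpha_ratio(channel))
```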
Plutchik [9] describes eight basic emotions: anger, fear, sadness, disgust, surprise, anticipation, acceptance, and joy. All other emotions can be composed from these basic ones; for example, disappointment is a combination of surprise and sadness.
Emotions can also be classified as negative, positive, or neutral. The basic positive emotions of care and happiness are necessary for survival, development, and evolution. Basic negative emotions, including sadness, anger, disgust, and fear, usually operate automatically and over a short period. The neutral emotion display, by contrast, is not grounded in scientific theory or research; it is closer to a prescriptive model of negotiation [10].
Figure 1 shows another classification of emotions, ranging from negative to positive valence and from high to low arousal. For example, feeling depressed lies in the low-arousal, negative-valence quadrant.
Recognizing emotion from physiological signals, primarily the EEG, has recently attracted attention from researchers. EEG is well suited for signal acquisition because of its high temporal resolution, safety, and ease of use, although it has low spatial resolution and the signal is highly dynamic. EEG signals are also sensitive to artifacts produced by eye blinks, eye movements, heartbeats, muscular activity, and power line interference [12].
The brain itself provides a particularly efficient physiological signal source: the simultaneous activation of many neurons produces electrical potentials that can be measured at the scalp with EEG electrodes. The DEAP dataset used in this work also contains peripheral recordings of eye activity, electromyography (EMG), galvanic skin response (GSR), respiration, blood pressure, and temperature.
An EEG is a specific kind of biological signal: a measure of the electrical activity of the brain, obtained by positioning several electrodes across the scalp [13].
Recently, studying EEG signals has gained attention due to their availability. New wireless EEG devices on the market are portable, affordable, and easy to use. The study of EEG signals is an interdisciplinary endeavor drawing on computer science, neuroscience, health and medical science, and biomedical engineering [14].
EEG-based emotion recognition is broadly used in entertainment, e-learning, and healthcare applications. EEG is utilized for different purposes, for example, instant messaging, online games, assisted therapy, and psychology [15].
Capturing human brain patterns is most efficient when the person is relaxed with his/her eyes closed. Brain waves are normally measured from peak to peak and range from 0.5 to 100 μV in amplitude, around 100 times lower than ECG signals [16].
Human brain waves are classified into frequency bands: delta (0.1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–64 Hz) [17]. Alpha is most easily observed over posterior regions; its activity is provoked by closing the eyes and by relaxation, and is attenuated by eye-opening or by alerting through any mental effort (thinking and computation). Beta waves appear at high frequencies above 14 Hz and can reach 80 Hz during tension. Theta waves (4–7 Hz) appear during normal sleep and deep meditation, and delta waves (below 3.5 Hz) occur during deep sleep and guided meditation [16].
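For illustration, these bands can be isolated with standard bandpass filters. In the sketch below, the band edges follow the text, except that the delta floor is raised and the gamma ceiling lowered for filter stability at an assumed 128 Hz sampling rate; the filter order is an arbitrary choice.

```python
from scipy.signal import butter, filtfilt

FS = 128  # assumed sampling rate in Hz

BANDS = {                 # frequency ranges quoted in the text
    "delta": (0.5, 4),    # lower edge raised from 0.1 Hz for filter stability
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 45),    # kept safely below the Nyquist frequency (64 Hz)
}

def extract_bands(eeg, fs=FS, order=4):
    """Return a dict of bandpass-filtered copies of `eeg`, one per band."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
        out[name] = filtfilt(b, a, eeg)  # zero-phase filtering
    return out
```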
This work investigates human emotions by applying machine learning methods to detect and classify emotional states from EEG signals.
2. Related Work
Santamaria-Granados et al. [18] applied a deep convolutional neural network to the AMIGOS dataset [19] of physiological signals (electrocardiogram and galvanic skin response). Classic machine learning approaches extract the properties of physiological signals in the time, frequency, and nonlinear domains; the deep method achieved greater precision in the classification of emotional states.
Bazgir et al. [20] applied EEG signals from the DEAP dataset to recognize emotions according to the valence/arousal model. Support vector machine (SVM), k-nearest neighbor (k-NN), and artificial neural network (ANN) classifiers were used to classify the emotional states. Further information about the DEAP dataset can be found in Section 4.1. The experiment showed 91.3% accuracy for arousal and 91.1% accuracy for valence in the beta frequency band using a cross-validated SVM with a radial basis function (RBF) kernel.
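A minimal sketch of this kind of cross-validated RBF-SVM pipeline with scikit-learn is shown below; the feature matrix and labels are random placeholders, so this illustrates the technique rather than reproducing Bazgir et al.'s code or results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows are trials, columns are EEG band-power features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))
y = rng.integers(0, 2, size=200)   # e.g., low/high arousal

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```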
Alhagry et al. [21] proposed a deep learning approach to recognize emotion from raw EEG signals: long short-term memory (LSTM) layers extract features from the EEG signals, a dense layer follows, and the features are then classified into low/high arousal, valence, and liking, respectively. The DEAP dataset was used to verify this method, which achieved average accuracies of 85.65%, 85.45%, and 87.99% for the arousal, valence, and liking classes, respectively.
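The following Keras sketch shows an LSTM-plus-dense architecture of this general type; the layer sizes and the DEAP-like input shape (8064 time samples × 32 channels) are assumptions, not the authors' reported configuration.

```python
import tensorflow as tf

# Input shape is an assumption: 8064 samples per trial, 32 EEG channels (DEAP-like).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8064, 32)),          # (time steps, channels)
    tf.keras.layers.LSTM(64),                         # temporal feature extractor
    tf.keras.layers.Dense(32, activation="relu"),     # dense layer after the LSTM
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g., low/high valence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```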
Mehmood et al. [22] recorded EEG signals, which measure the brain's electrical activity, from 21 healthy subjects using 14-channel recordings.
The EEG signals were captured while the subjects looked at images representing four types of emotional stimuli (happy, calm, sad, or scared). The feature extraction phase used a statistical approach based on specific features for different frequency ranges; features chosen by this statistical approach outperformed univariate and multivariate features. The optimal features were then used for emotion classification by applying SVM, k-NN, linear discriminant analysis, naïve Bayes, random forest, deep learning, and four ensemble methods. The outcomes reveal that the suggested method classified emotions well.
Al-Nafjan et al. [2] used a deep neural network (DNN) to identify human emotions from EEG signals taken from the DEAP dataset. The suggested method was compared to state-of-the-art emotion detection systems using the same dataset. The study showed how EEG-based emotion recognition can be performed with DNNs, particularly when a large amount of training data is available.
Based on the literature discussed above, the approaches to emotion detection share common issues and exhibit unique ones, which can be summarized as follows. First, the classifiers used in the literature vary; most of the experiments on emotion detection employ different classification algorithms. Second, different emotion states are paired with the selected classification algorithm. Third, most of the approaches use the DEAP dataset because it is suitable for the analysis of human affective states and is publicly available. The best reported accuracy on the DEAP dataset reaches 91.3%. Moreover, the complexity of the existing approaches is high if real-time processing is required. Accordingly, there is a need to improve the accuracy of emotion detection and classification and to reduce the complexity of the utilized approaches. The comparison is presented in Table 1.
5. Results, Discussion, and Comparison
In this work, training, validation, and testing of the data are performed. Figure 5 shows the data flow of the machine learning classification. Three splits of training and testing data were used with this method to obtain the accuracy and run time. The sizes are as follows:
80% for the training and 20% for the testing.
70% for the training and 30% for the testing.
50% for the training and 50% for the testing.
The training phase involves splitting the data, shuffling, and randomized training to obtain the best accuracy rate for the different machine learning algorithms. The testing phase follows the same procedure so that the model is tested across all the variations the dataset presents. The following measurements are used to assess the performance of each classifier: sensitivity (SN), specificity (SP), positive predictive value (PPV), and accuracy (ACC).
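The sketch below illustrates, on placeholder data, how one of the three splits (80/20) and the four measurements can be derived from a binary confusion matrix; the k-NN classifier here is just an example choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Placeholder feature matrix and binary labels standing in for EEG features.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))
y = rng.integers(0, 2, size=400)

# One of the three splits used in the text: 80% training, 20% testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()

sn = tp / (tp + fn)                     # sensitivity (recall)
sp = tn / (tn + fp)                     # specificity
ppv = tp / (tp + fp)                    # positive predictive value
acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
print(f"SN={sn:.2f} SP={sp:.2f} PPV={ppv:.2f} ACC={acc:.2f}")
```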
In this phase, the 40 recorded channels are divided into 32 actual EEG channels and 8 peripheral (non-EEG) signals; the latter are used later in the cross-check method after obtaining results from the classifiers.
5.1. Results
The classification process comprises two stages: training and testing. In each task, the training and testing processes are implemented in n folds, where n is set to 10. The data are divided into n equal folds, and the experiments are conducted in n rounds. In each round, n − 1 folds are used for training and 1 fold for testing. Accordingly, each fold serves as the testing set in exactly one round, so all the available data are eventually tested. The reported results are aggregated over all folds.
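A minimal sketch of this 10-fold procedure with scikit-learn's KFold follows; the features, labels, and classifier are placeholders chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))   # placeholder EEG feature matrix
y = rng.integers(0, 4, size=400)     # 4 emotion subclasses (happy/calm/angry/sad)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_acc = []
for train_idx, test_idx in kf.split(X):
    # n-1 folds train the model; the remaining fold tests it.
    clf = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    fold_acc.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean accuracy over 10 folds: {np.mean(fold_acc):.3f}")
```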
For the emotion classification task, four subclasses are presented: happy, calm, angry, and sad. Based on these subclasses, two main classes are calculated: valence and arousal. The comparison between the classifiers is based on the size of the training and testing data as well as each classifier's run time and performance. Finally, other researchers' results are compared with those of the proposed model.
Table 2 presents the training and testing data sizes for each split. The valence and arousal results show the accuracy of each section with and without the other brain signals. The mean and standard deviation are reported for all results obtained for each instance of valence and arousal, using the EEG signals alone, the other brain signals and power alone, and all signals obtained from the brain together. The subsequent tables show the overall accuracy together with the other accuracy measurements for each classifier, and the results are plotted for visual inspection.
Participants experienced sadness and happiness, and these emotions were reflected in the brain signals. Calmness and boredom were experienced to a smaller degree, which indicates that the participants stopped paying attention over time or that the videos were replayed.
5.2. Classifier Results
Table 3 and Table 4 show the results obtained from each classifier (k-NN and CNN) as the parameters were changed. The overall accuracy is shown in the tables in this section.
Table 3 shows the results for different k values in order to determine which k gives the best results. The accuracy is 93% when k = 3 and k = 5, whereas for a larger value of k the accuracy drops to 86.8%. This is due to the fact that the smaller the value of k, the more accurate the result.
The preparation and testing procedure is as follows. The whole sample was split into 10 sections: nine training pieces and one testing portion. Each testing portion per round was unique, and the other nine sections were used for training across the 10 repetitions of the overall training and examination, so the training and testing samples never overlapped. Therefore, k was set to three and five.
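For reference, a k sweep of this kind takes only a few lines with scikit-learn; the data below are random placeholders, so the printed accuracies will not match Table 3.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))   # placeholder EEG feature matrix
y = rng.integers(0, 4, size=400)     # happy/calm/angry/sad labels

for k in (3, 5, 10):                 # candidate neighborhood sizes
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
    print(f"k={k}: mean 10-fold accuracy {acc:.3f}")
```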
The best results were obtained in k-NN when k = 3 or k = 5, and in CNN when the epochs, layers, and hidden nodes (hNodes) were 20, 10, and 20, respectively. These results and parameters were fixed in the subsequent tests for the comparison between classifiers and used in the experiment designed for the proposed method.
Table 4 shows the results of applying CNN with different values of epochs, layers, and hNodes. The best results were obtained when epochs = 20, layers = 10, and hNodes = 20.
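The Keras sketch below uses the quoted values of hNodes = 20 in the dense layer and 20 training epochs; the paper does not fully specify the layer topology, so the 1D convolutional kernel sizes and input shape are illustrative assumptions, and the sketch is shallower than the quoted 10 layers for brevity.

```python
import tensorflow as tf

# A minimal 1D CNN for emotion classification; layer sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8064, 32)),               # (samples, channels), DEAP-like
    tf.keras.layers.Conv1D(20, kernel_size=64, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.Conv1D(20, kernel_size=32, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(20, activation="relu"),          # hNodes = 20
    tf.keras.layers.Dense(4, activation="softmax"),        # happy/calm/angry/sad
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20, validation_split=0.1)  # epochs = 20
```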
Table 5 and Table 6 display the three data splits used to train and test the classifiers on the DEAP dataset, with the results of each split; a summary and discussion are presented below. The first results show the performance accuracy with 80% of the signals used for training and 20% for testing.
Table 5 and Table 6 show the results of dividing the dataset into two groups, 80% for training and 20% for testing, to assess the accuracy of emotion classification. The obtained results were higher because the proportion of training data was large. The CNN classifier obtained the highest values for arousal and valence, shown in Table 5, owing to its convolution layers. The decision tree and naïve Bayes classifiers gave close results. Similar behavior holds for the accuracy shown in Table 6, where the highest value was obtained with CNN and the lowest with the decision tree.
The dataset was preprocessed using the previously described filters and feature extraction algorithms. The model ran each classifier separately, and the results are based on all the channels and signals studied.
Table 7 and Table 8 show the classification results for the second split: 70% training and 30% testing. Again, the CNN classifier yielded the highest accuracy for both arousal and valence.
Finally, Table 9 and Table 10 show the results of dividing the dataset into 50% for training and 50% for testing. This split yielded the lowest values because the training and testing portions were equal. Nevertheless, the CNN classifier still showed the highest accuracy.
Across the previous results, CNN performed best for every training/testing size. Accuracy decreased as the training size shrank: the 80% training split achieved better results than the 50% split because more data were available for training.
In general, all classifiers could detect emotions from the DEAP dataset and could classify and process the signals.
Comparison
A confusion matrix is a technique for summarizing the performance of a classification algorithm. Classification accuracy alone can be misleading when the number of observations per class is unequal or when there are more than two classes in a dataset. Calculating a confusion matrix gives a better picture of the types of errors a classification model makes. The matrix is easiest to read for two-class problems, but it extends naturally to problems with three or more class values by adding rows and columns.
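A small worked example of a multi-class confusion matrix, and of the per-class correctness (recall) figures of the kind quoted below, using toy labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["happy", "calm", "angry", "sad"]
y_true = np.array([0, 0, 1, 2, 3, 3, 2, 1, 0, 3])   # toy ground truth
y_pred = np.array([0, 1, 1, 2, 3, 2, 2, 1, 0, 3])   # toy predictions

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows: true class, columns: predicted class

# Per-class "correctness" (recall): diagonal over row sums.
recall = cm.diagonal() / cm.sum(axis=1)
for name, r in zip(labels, recall):
    print(f"{name}: {r:.0%}")
```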
Table 11 shows the values of the confusion matrix for testing the correctness of the used data. For example, in the sadness cases, the percentage of correctness was 68%.
5.3. Comparison with Other Model Results
Finally, Table 12 compares the proposed work with others that used the same DEAP dataset. The proposed work yielded better results, with an accuracy of 92.44%. The CNN classifier yielded better results than k-NN but required more time due to its number of layers and calculations; k-NN yielded results similar to CNN's in a shorter time.
6. Conclusions
The evolution of sensors and signal recording devices, along with the development of signal processing and feature extraction techniques, has increased the opportunities for using signals extracted from human organs, such as brain or heart signals, to identify a person's condition and thus detect psychological or pathological states. This has made signal classification an essential task for improving the performance of case categorization based on signals.
Categorizing emotions based on EEG signals is one of the most complex applications in the analysis of human behavior. Such an application can be defined as determining a person's emotional state, which may reflect particular problems. EEG data can be acquired using different systems or devices; in this study, the DEAP dataset was used to identify and classify human emotions.
The proposed model in this paper is based on three main steps: preprocessing, feature extraction, and classification. In the signal preprocessing stage, three different techniques were used, including EMD/IMF and VMD, to remove noise from the signals and clean them so as to obtain the best possible detail from the raw EEG data. In the feature extraction stage, three methods were adopted to provide the classifiers with refined data for classification and prediction.
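As an illustration of what an EMD-based cleaning step can look like, the sketch below decomposes a toy signal into intrinsic mode functions (IMFs) with the PyEMD package and discards the first, highest-frequency IMF; this simple drop-one-IMF rule is an assumption for illustration, not the exact denoising procedure used in this work.

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal (assumed dependency)

FS = 128
t = np.arange(0, 4, 1 / FS)
# Toy "EEG": a slow oscillation plus high-frequency noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(t.size)

imfs = EMD().emd(signal)          # decompose into intrinsic mode functions
denoised = imfs[1:].sum(axis=0)   # crude denoising: drop the noisiest IMF
print(imfs.shape, denoised.shape)
```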
In the classification stage, four main classifiers were used to identify human feelings: k-NN, decision tree, naïve Bayes, and CNN. After applying these classifiers under different criteria, each classifier yielded different results and running times, which were studied. The CNN classifier yielded the best results in terms of model performance. The work also includes a comparison of the proposed method with the work and results of other studies, which showed that the proposed method achieves better runtime and accuracy when predicting arousal and valence, and thus human emotions in general.
There are several differences in the performance of the machine learning classifiers in terms of accuracy, precision, recall, and F1-measure. Through our tests, we found that CNN was the best in terms of accuracy, while the results of naïve Bayes and k-NN were convergent. CNN outperformed the other methods in EEG signal categorization: when the F1-measure was applied across the various cases and classifiers, CNN yielded the highest F1-measure and accuracy in all cases.