1. Introduction
Emotions have the potential to improve the effectiveness of human interaction, whether human-to-human or human-to-machine. They have a profound impact on human cognition, including logical decision making, perception, human interaction, and intelligence [
1,
2]. However, modeling human emotion based on the mechanism behind the emotional function of the brain is a challenging task [
3]. In the last decade, human–machine interaction (HMI) has received much attention. However, when interacting with a machine, emotional communication is almost nonexistent compared with that between humans. As we rely heavily on machines (especially computers), it has become essential to involve emotion in HMI. According to Rani et al., HMI may become more intuitive, smoother, and more effective, opening a new approach to affective, cognitive, and developmental systems, if machines can grasp a person’s affective state [
4]. At the core of such systems lies the problem of emotion recognition, which is to identify human emotional states from their behavioral and physiological signals [
5]. Such emotional information can be vital for HMI, biomedical research, and other fields. EEG-based emotion recognition can help improve patient treatment, especially for patients with expression difficulties or depression, as it can help doctors identify the real emotional state of the patient [
6].
Emotion classification, in general, is the process of identifying an individual’s emotional state. There are several ways to record brain activity, but EEG has gained enormous popularity because it is noninvasive, portable, affordable, and applicable in practically all settings [
7]. However, EEG signals are very complex, and human emotions are highly ambiguous, making high-accuracy emotion classification a challenging task. Various preprocessing techniques exist for EEG data, such as downsampling and denoising. Standard denoising techniques in EEG include bandpass filtering to remove external noise, eye-blink artifact removal, and baseline removal. A baseline is the EEG signal generated by the brain during an individual’s relaxed state. The baseline blurs the intended EEG signal corresponding to a stimulus; thus, baseline removal is an essential preprocessing step for denoising EEG signals. The main motivation for performing baseline removal is to refine the EEG signals before extracting features. Baseline-removed EEG signals do not carry subject-specific noise, thus resulting in subject-independent features.
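Bandpass filtering, one of the denoising steps mentioned above, is commonly implemented as a zero-phase Butterworth filter. The sketch below uses SciPy with illustrative cutoff frequencies; the actual band edges and filter order in any given study may differ.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, lo, hi, fs, order=4):
    """Zero-phase Butterworth bandpass filter, a typical EEG
    denoising step (cutoffs and order here are illustrative)."""
    b, a = butter(order, [lo, hi], btype="band", fs=fs)
    # filtfilt applies the filter forward and backward, so the
    # filtered signal has no phase shift relative to the input.
    return filtfilt(b, a, signal)
```

Filtering forward and backward (`filtfilt`) avoids the phase distortion a single causal pass would introduce, which matters when EEG features depend on timing.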
This paper proposes a novel method, called the InvBase method, for extracting subject-independent features for emotion classification. The method employs the concept of inverse filtering for baseline removal. Inverse filtering is a common technique for stationary signals such as images; its application to baseline removal in nonstationary signals, such as EEG, is considered a significant contribution of this study. This method exploits the baseline recording of the benchmark DEAP (database for emotion analysis using physiological signals) dataset, captured during the relaxed state of an individual [
8]. The power spectrum of the baseline is used to eliminate the excess power in the trial EEG power captured during an emotional event. The proposed method for baseline removal utilizes the idea of inverse filtering, which is commonly used in denoising blurred images [
9,
10]. In the proposed method for baseline removal, an EEG signal is first split into fixed-size time slots. The time-domain EEG signal in each slot is then converted to a frequency-domain signal. The frequency-domain signals are baseline-removed using inverse filtering. These baseline-removed signals are grouped into nonoverlapping time windows and averaged. For each window, the trend and harmonic are simultaneously fit to segments in order to estimate the power of the residual segments [
11]. The window size is necessarily greater than the slot size. The frequency spectrum in each window is then subdivided into four frequency sub-bands for each channel, and statistical features, such as the mean and variance, are extracted. These features are considered subject-independent because the individual’s EEG data are filtered by removing that individual’s baseline, thus retaining only the EEG characteristics corresponding to the particular emotion. These features are then used to train three different classifiers: k-nearest neighbour (kNN), support vector machine (SVM), and multilayer perceptron (MLP). Two classification problems were considered in this study: (1) high arousal vs. low arousal and (2) high valence vs. low valence.
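The per-slot steps above can be sketched as follows. Dividing the trial magnitude spectrum by the baseline magnitude spectrum is one plausible reading of the inverse-filtering step; the paper's exact formulation, the slot/window bookkeeping, and the sub-band edges (taken here from the DEAP convention) may differ.

```python
import numpy as np

FS = 128  # DEAP sampling rate after downsampling (Hz), assumed
# Sub-band edges follow the DEAP convention (theta/alpha/beta/gamma)
BANDS = {"theta": (3, 7), "alpha": (8, 13), "beta": (14, 29), "gamma": (30, 47)}

def inverse_filter_baseline(trial_slot, baseline_slot, eps=1e-8):
    """Remove the baseline from one time slot by dividing the trial's
    magnitude spectrum by the baseline's magnitude spectrum
    (a hypothetical rendering of the InvBase idea)."""
    n = min(len(trial_slot), len(baseline_slot))
    trial_spec = np.abs(np.fft.rfft(trial_slot[:n]))
    base_spec = np.abs(np.fft.rfft(baseline_slot[:n]))
    return trial_spec / (base_spec + eps)  # eps avoids division by zero

def band_features(spectrum, n, fs=FS):
    """Mean and variance of the spectrum in each frequency sub-band."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        feats[name] = (spectrum[mask].mean(), spectrum[mask].var())
    return feats
```

The resulting per-band (mean, variance) pairs, computed per channel and per window, would form the feature vector fed to the kNN, SVM, or MLP classifier.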
The baseline removal technique, which subtracts the frequency spectrum of the baseline from the frequency spectrum of the EEG signal, was also implemented in this study and is termed the subtractive method. The InvBase method was compared with the subtractive method and the no-baseline-correction (NBC) method. The NBC method does not remove the baseline from the EEG data. Various validation analyses were performed on all the methods.
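For contrast, the subtractive method can be sketched as a spectrum subtraction rather than a division. This is again an illustrative rendering, not the exact implementation used in the study.

```python
import numpy as np

def subtractive_baseline(trial, baseline):
    """Subtract the baseline magnitude spectrum from the trial's
    magnitude spectrum (sketch of the subtractive method)."""
    n = min(len(trial), len(baseline))
    return np.abs(np.fft.rfft(trial[:n])) - np.abs(np.fft.rfft(baseline[:n]))
```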
The novel InvBase method, used to extract subject-independent features, can be further implemented for other EEG-based classification problems, such as cognitive load estimation and motor imagery. However, the technique was employed in this study to remove the baseline from EEG data in order to classify emotions. It is evident after observing the DEAP dataset [
8] that EEG signals vary from subject to subject for the same elicited emotion. Performing feature extraction on such data therefore generates subject-dependent features that hamper classification accuracy. Emotion-related EEG features are highly subject-dependent due to the presence of the baseline. For obtaining subject-independent features for EEG-based emotion classification, the InvBase method shows considerable potential.
The remainder of the paper is organized as follows: In
Section 2, a detailed review of the studies in the field of emotion classification is presented. The background of the current study is elaborated in
Section 3. In
Section 4, the proposed InvBase method for baseline removal and feature extraction process is discussed in detail, and the classification problem is also elaborated. The experimental results are provided in
Section 5. Finally, the discussion and conclusions are presented in
Section 6 and
Section 7, respectively.
2. Literature Survey
In this section, we discuss the different aspects of EEG-based emotion classification research. EEG-based emotion classification involves the following steps: emotion elicitation and signal acquisition, preprocessing, feature extraction, and classification.
The two major techniques used for emotion elicitation are external stimuli, such as audio–visual material [
8,
12,
13], and memory recall [
14]. For signal acquisition, the Biosemi ActiveTwo, the Emotiv wireless headset, the EEG module from Neuroscan Inc., and g.MOBIlab are the most used devices [
15]. The preprocessing step comprises downsampling, eye-blink artifact removal [
16], electromyogram artifact removal [
17], baseline removal [
8,
18,
19], bandpass filtering for noise removal, and others. Various researchers have also used wavelet-transform-based denoising techniques for EEG signals [
20].
After preprocessing the EEG signals, the next important step is feature extraction. For emotion classification, features are frequently derived from the delta, theta, alpha, beta, and gamma frequency bands. The following feature extraction techniques are commonly used for emotion classification: asymmetry measure (ASM) [
21], power spectral density (PSD) [
13], differential entropy (DE) [
21], wavelet transform (WT) [
22], higher-order crossings (HOC) [
23], common spatial patterns (CSP) [
17], asymmetry index (AI) [
24], and AsMap [
25]. Furthermore, in the least-squares wavelet analysis, features are extracted from time series data without the need for editing or preprocessing of the original series [
26].
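Some of these features have compact closed forms. For example, differential entropy over a band-filtered segment is commonly computed under a Gaussian assumption as 0.5·log(2πeσ²); this is the standard formula, not a detail specific to any one cited study.

```python
import numpy as np

def differential_entropy(segment):
    """Differential entropy of a band-limited EEG segment, assuming
    the samples are Gaussian: 0.5 * log(2 * pi * e * variance)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(segment))
```

Under this assumption, DE reduces to a logarithm of the band power, which is one reason it pairs naturally with band-decomposed EEG.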
In this study, frequency sub-band features were extracted, as they are among the most widely used features in EEG research. Emotion classification is the final step, in which the extracted features are used to train a classifier. Classification tools, such as SVM [
13,
17,
21,
27,
28], linear discriminant analysis [
29,
30], quadratic discriminant analysis [
23], k-NN [
21,
22,
23,
31], naïve Bayes [
30], feed-forward neural network [
32], deep belief network [
1], multilayer perceptron neural network (MLPNN) [
22], convolutional neural network (CNN), and recurrent neural network (RNN) [
19] are frequently used in EEG-based emotion classification. Fraiwan et al. in [
33] proposed an ANN-based machine learning model for classifying individuals’ enjoyment levels. Their model uses multiscale entropy (MSE) to calculate features such as the mean MSE, the slope of the MSE, and the complexity index for emotion classification. However, they did not consider any baseline removal technique for eliminating unwanted noise in the EEG signals.
In the remainder of this section, preprocessing techniques that involve baseline removal before feature extraction for emotion classification are discussed. Relatively few studies have been reported in this area, as baseline removal is often considered a trivial preprocessing step. A dataset, namely DEAP, was created by Koelstra et al.; it contains EEG and physiological data from subjects exposed to audio–visual stimuli [
8]. In their study, a 5 s baseline EEG was recorded in a relaxed state, followed by a 60 s music video during which EEG data were recorded. The baseline frequency power was subtracted from each trial’s frequency power, computed between 3 and 47 Hz. This subtractive method captures the change in power relative to the prestimulus period. The power changes in the theta (3–7 Hz), alpha (8–13 Hz), beta (14–29 Hz), and gamma (30–47 Hz) bands were then deployed as features to train a Gaussian naïve Bayes classifier for low/high arousal, valence, and liking. The EEG-based classification accuracies for arousal, valence, and liking were 62.0%, 57.6%, and 55.4%, respectively.
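These band-power-change features can be sketched as follows, using Welch's method for the power estimate. The original work's exact estimator and windowing are not specified here, so treat this as an illustrative reading; band edges follow those stated above.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # DEAP EEG sampling rate after downsampling (Hz), assumed
BANDS = {"theta": (3, 7), "alpha": (8, 13), "beta": (14, 29), "gamma": (30, 47)}

def band_power_change(trial, baseline, fs=FS):
    """Per-band trial power minus per-band prestimulus baseline power."""
    f, p_trial = welch(trial, fs=fs, nperseg=fs)
    _, p_base = welch(baseline, fs=fs, nperseg=fs)
    changes = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f <= hi)
        changes[name] = p_trial[mask].sum() - p_base[mask].sum()
    return changes
```

One such change value per band per channel would serve as the input features for the naïve Bayes classifier.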
Xu et al. [
18] suggested a method for deriving emotional features from EEG. The core of their strategy is to rectify the emotional data by filtering the baseline data, and they employed the DEAP dataset to verify the method. The baseline data are first converted into a frequency spectrum, and correlation coefficients are calculated among a subject’s baselines. Highly correlated baselines are retained, and weakly correlated ones are replaced by the mean of the highly correlated baselines. Each trial’s power spectral density (PSD) is then corrected based on the PSDs of the original high-correlation baselines and the new baseline. However, Xu et al. did not directly describe the baseline removal method. To determine the PSD’s mean, maximum, minimum, standard deviation, skewness, kurtosis, and fractal dimension, the frequency spectrum was divided into five bands: theta (4–7 Hz), alpha (8–12 Hz), lower beta (13–21 Hz), upper beta (22–30 Hz), and gamma (31–45 Hz). Thirty-five features were obtained from each channel, or 1120 features in total. An SVM with a radial basis kernel function was then used as the classifier, and the PSD features were additionally used to train a CNN. With the baseline strategy, the arousal classification accuracies on SVM and CNN were 79.54% and 77.69%, respectively, and the valence classification accuracies were 75.62% and 81.14%, respectively.
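The correlation-based baseline cleanup might look roughly like this. The 0.5 threshold and the time-domain correlation are assumptions made for illustration; the cited work's exact criterion is not reproduced here.

```python
import numpy as np

def clean_baselines(baselines, thresh=0.5):
    """Keep baselines that correlate well with the others on average;
    replace the weakly correlated ones with the mean of the kept set
    (threshold and correlation domain are illustrative assumptions)."""
    n = len(baselines)
    corr = np.corrcoef(baselines)
    # Mean correlation of each baseline with the others, excluding self
    mean_corr = (corr.sum(axis=1) - 1.0) / (n - 1)
    keep = mean_corr >= thresh
    cleaned = baselines.copy()
    if keep.any() and not keep.all():
        cleaned[~keep] = baselines[keep].mean(axis=0)
    return cleaned
```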
Yang et al. [
19] put forward a preprocessing method based on baseline signals. They built a hybrid network combining a CNN and an RNN to classify emotions. In addition to the preprocessing already applied in the DEAP dataset, they further processed the EEG signal. The baseline signal is segmented into N segments of length L for each of the C channels, so that each segment is a C × L matrix; all C × L matrices are added lengthwise and averaged, yielding what is termed the baseMean. The baseMean is then subtracted from the raw EEG signal, segmented likewise. In a subsequent step, the 1D EEG vector is transformed into a 2D vector, or EEG frame, that preserves spatial information between neighboring channels. Each data frame is then normalized across the nonzero elements using Z-score normalization. The 2D EEG frames are fed in parallel into the CNN and RNN to obtain a spatial feature vector (SFV) and a temporal feature vector (TFV), respectively. The SFV and TFV are concatenated and fed into a softmax function to classify valence and arousal. The highest classification accuracies achieved for valence and arousal were 90.80% and 91.03%, respectively.
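The baseMean computation and subtraction can be sketched as follows. Array shapes follow the C × L description above; function and variable names are illustrative, not taken from the cited work.

```python
import numpy as np

def base_mean(baseline, seg_len):
    """Average the baseline's C x L segments lengthwise (the baseMean)."""
    c, total = baseline.shape
    n = total // seg_len
    segments = baseline[:, : n * seg_len].reshape(c, n, seg_len)
    return segments.mean(axis=1)                     # shape (C, L)

def subtract_base_mean(trial, bmean):
    """Subtract the baseMean from each equally sized trial segment."""
    c, total = trial.shape
    seg_len = bmean.shape[1]
    n = total // seg_len
    segments = trial[:, : n * seg_len].reshape(c, n, seg_len)
    return (segments - bmean[:, None, :]).reshape(c, n * seg_len)
```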
6. Discussion
The results presented here indicate that the proposed baseline-removal technique provides a significant improvement in classification accuracy over the subtractive and no-baseline-correction methods. Using the multilayer perceptron, classification accuracy improved by 29% over the no-baseline-correction method and by 15% over the subtractive method. Experiments were conducted with various slot sizes, and the 6 s slot size provided the highest classification accuracy and F1 score for both valence and arousal. Classifications were also performed with varying window sizes; the results indicated that increasing the window size decreases classification accuracy, because calculating features over a large window results in lower frequency resolution.
There are few studies that considered a baseline removal strategy. When compared with the existing studies, the proposed method outperforms other methods [
8,
18,
19] in terms of classification accuracy. Another advantage of the proposed method compared with other existing methods is that it uses traditional machine learning models, which are relatively less complex. One of the limitations of this study is that we used fixed-size time windows while calculating the features. Furthermore, we did not use advanced machine learning techniques, such as CNN, for enhancing the classification accuracy. This study highlights the importance of baseline removal, as the accuracy of the classifier directly depends on the quality of the input data. The results showed that the InvBase method of baseline elimination outperforms existing state-of-the-art baseline-removal methods in EEG-based emotion recognition systems. The ability of the proposed method to remove baseline noise from EEG signals provides room for progress in EEG-based emotion detection. Many researchers have reported improved classification accuracy using deep learning techniques [
38,
39]. The InvBase method with deep learning is a promising option for further improving the classification accuracy.