In this subsection, we first describe how the multi-channel EEG sequences are preprocessed, then introduce the multi-band EEG topology maps and their construction, and finally describe the proposed emotion recognition network, ERENet, in detail.
2.3.1. Multi-Band EEG Topology Maps
EEG signals are multi-channel continuous time sequences in the format “channel number × sampling time”, which many studies adopt as the input format of their models [32,35,36]. The disadvantage of this format is that electrode measurements at different spatial locations on the cerebral cortex are merely aggregated together, while the topological structure of the EEG signals is ignored. The brain is a complex system: completing a task relies on the cooperation of multiple regions, and there are intrinsic correlations among them. The spatial information of EEG signals contains significant information related to emotional states and is of great value for exploring the correlations between brain regions. To preserve the topological structure of EEG signals, the multi-channel EEG sequences are converted into 2D topology maps, so that channels that are physically adjacent remain adjacent in the 2D topology map. The international 10–20 system describes the locations of scalp electrodes; “10” and “20” mean that the actual distance between adjacent electrodes is 10% or 20% of the total front-to-back or left-to-right distance of the skull.
Figure 2a shows the spatial distribution of electrodes in the international 10–20 system, in which the electrodes marked in blue are the 32 electrodes used in the DEAP dataset. To address the loss of electrode topological location information, the 32 electrodes used in the DEAP dataset are relocated onto a 2D topological structure based on the electrode distribution of the international 10–20 system. For each sampling point, the 32-channel EEG signals are mapped to a 9 × 9 matrix, as shown in Figure 2b.
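As a concrete illustration, the per-sampling-point mapping can be sketched as follows. The grid coordinates below are illustrative placeholders for a few DEAP electrodes, not the exact layout of Figure 2b:

```python
import numpy as np

# Hypothetical (row, col) positions on the 9 x 9 grid for a few of the
# 32 DEAP electrodes; the exact layout follows Figure 2b in the paper.
GRID_POS = {
    "Fp1": (0, 3), "Fp2": (0, 5),
    "F3": (2, 2), "Fz": (2, 4), "F4": (2, 6),
    "C3": (4, 2), "Cz": (4, 4), "C4": (4, 6),
    "P3": (6, 2), "Pz": (6, 4), "P4": (6, 6),
    "O1": (8, 3), "O2": (8, 5),
}

def to_topology_map(values, positions, h=9, w=9):
    """Place one value per electrode on an h x w grid; unused cells stay 0."""
    grid = np.zeros((h, w))
    for name, value in values.items():
        row, col = positions[name]
        grid[row, col] = value
    return grid

sample = {name: 1.0 for name in GRID_POS}  # one measurement per electrode
topo = to_topology_map(sample, GRID_POS)
print(topo.shape)  # (9, 9)
```

Positions with no electrode simply keep their default value, which anticipates the zero-filling discussed below.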
At present, a number of studies have examined the activity of emotional states in different frequency bands. Alarcão et al. [42] pointed out that the alpha, beta, and gamma bands of the frontal lobe carry relatively significant information for distinguishing emotional states. In addition, Frantzidis et al. [43] observed that features in the theta band are closely related to arousal. Therefore, the theta, alpha, beta, and gamma frequency bands are selected to study the emotional-state characteristics of the human brain, and four 2D matrices are constructed to describe the significant emotion-related information in the four frequency bands. The construction of the multi-band EEG topology maps is shown in Figure 3. The multi-band EEG topology maps clearly represent the changes in EEG signals on the scalp under different emotional states and provide richer information by integrating the frequency domain features, spatial information, and frequency band characteristics of multi-channel EEG signals.
EEG signals are non-stationary and highly random, so some traditional analysis methods cannot be applied to them directly. However, if EEG signals are decomposed into several short segments, the signal within a short window can be regarded as approximately stationary. Therefore, after the samples are segmented by a sliding window, the EEG segment of each electrode can be treated as a stationary signal. A fast Fourier transform (FFT) is performed on each segment to extract the spectral power of the theta, alpha, beta, and gamma bands as frequency domain features, calculated as shown in Equation (3):
where the frequency bands of theta, alpha, beta, and gamma are 4–8 Hz, 8–13 Hz, 13–30 Hz, and 30–45 Hz, respectively. The frequency domain features extracted from the segment of each electrode are then collected across all electrodes of an EEG sample.
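A minimal sketch of this band-power extraction, assuming the 128 Hz sampling rate of the preprocessed DEAP recordings and a plain periodogram rather than the paper's exact estimator:

```python
import numpy as np

# Frequency bands (Hz) as defined above.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(segment, fs=128.0):
    """Sum the FFT power spectrum of one EEG segment over each band."""
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(segment)) ** 2 / len(segment)
    return {band: psd[(freqs >= lo) & (freqs < hi)].sum()
            for band, (lo, hi) in BANDS.items()}

fs = 128
t = np.arange(fs) / fs                 # one 1 s sliding-window segment
segment = np.sin(2 * np.pi * 10 * t)   # a pure 10 Hz rhythm -> alpha band
powers = band_powers(segment, fs)
```

On this synthetic 10 Hz signal, almost all spectral power falls in the alpha band, as expected.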
According to the mapping rule described in Figure 2, the topological positions of the 32 electrodes can be obtained by relocating the electrode channels, as shown in Figure 2c. The spectral power of the theta band is mapped into a 2D matrix according to the electrode locations; the height and width of the matrix are both 9 in this paper. The remaining positions in the matrix are filled with zeros, for two reasons: the resulting matrix is better suited as model input, and the relative positions of the electrodes on the scalp can be simulated without introducing extra noise. This process is repeated for the alpha, beta, and gamma bands. Therefore, the frequency domain features of the four frequency bands extracted from each time slice are converted into four 2D topology maps through this mapping, as shown in the lower left corner of Figure 3. These topology maps serve as the input of ERENet for learning and evaluating the patterns of emotion generation.
2.3.2. The Model of Proposed ERENet
In this paper, fully considering the band information of EEG signals and the spatial information among channels, a lightweight EEG-based emotion recognition network inspired by the neuronal circuits of the human brain is proposed, as shown in Figure 4. The model includes three main modules: a multi-band discrete parallel processing module, a multi-band information exchange and reorganization module, and a weighted classification module.
- 1.
Multi-band Discrete Parallel Processing Module
The discrete parallel processing circuit architecture is an organizational mode of information presentation in the nervous system: signals are presented and processed in parallel through discrete information channels [44], as shown in Figure 5. A typical example is the olfactory bulb in the brain. Olfactory receptor neurons expressing the same odorant receptor are scattered across the olfactory epithelium but send their axons to the same glomerulus, where they synapse onto the dendrites of their corresponding second-order projection neurons, forming discrete olfactory processing channels [45,46]. In addition, discrete parallel processing is characteristic of the visual nervous system: different bipolar and ganglion cells in the retina form specific connections, and different types of visual signals, such as brightness, color, and motion, are processed in parallel. Compared with serial processing, parallel processing reduces computational depth, thereby reducing the error rate and increasing processing speed.
Different frequency bands of EEG signals correlate with emotional states to different degrees. Therefore, based on the principle of the discrete parallel processing circuit architecture, EEG signals of different frequency bands can be processed in parallel in discrete information channels. Each information channel corresponds to one frequency band, and the characteristic information specific to that band can be extracted within its channel. The design of the multi-band discrete parallel processing module is shown in the left box of Figure 4. This module is mainly responsible for extracting the feature information specific to each band and is composed of a group convolution layer, a batch normalization (BN) layer, and an activation layer; the BN and activation layers are not shown in Figure 4.
In the group convolution layer, the input data are grouped by frequency band, and each band's data are convolved with its own set of convolution kernels, with a stride of 1. The boundaries are zero-padded so that the output feature maps have the same size as the input. After convolution, each 2D topology map is encoded into a higher-level representation, as shown in Equation (4):
where each output denotes a feature map of its corresponding band. In classical CNN models, a pooling layer is usually added after the convolution layer to reduce the data dimension, which often leads to the loss of some information. In EEG-based recognition tasks, however, the EEG topology map is much smaller than the images used in computer vision, so no pooling operation is used after the convolution layer in this study, in order to preserve all information. After the group convolution layer, a BN layer is added to normalize the data distribution, which not only avoids gradient vanishing or explosion and makes the gradient more predictable and stable, but also increases the training speed. Finally, the feature representation of the module is obtained through the ReLU activation function, as shown in Equation (5):
where each element denotes the feature representation of its corresponding band. This output is passed to the next module.
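The grouped convolution can be sketched with a naive NumPy implementation; the kernel counts and sizes here are illustrative choices, not the paper's hyperparameters:

```python
import numpy as np

def group_conv2d(x, kernels, groups):
    """Naive grouped 2D convolution, stride 1, zero-padded to 'same' size.
    x: (C_in, H, W); kernels: (C_out, C_in // groups, k, k)."""
    c_in, h, w = x.shape
    c_out, c_per_group, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, w))
    filters_per_group = c_out // groups
    for o in range(c_out):
        g = o // filters_per_group                      # the band this filter serves
        xg = xp[g * c_per_group:(g + 1) * c_per_group]  # only that band's maps
        for i in range(h):
            for j in range(w):
                out[o, i, j] = (xg[:, i:i + k, j:j + k] * kernels[o]).sum()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 9, 9))           # four band topology maps
kernels = rng.standard_normal((8, 1, 3, 3))  # two 3x3 filters per band, groups=4
y = group_conv2d(x, kernels, groups=4)
```

Because each filter only sees the maps of its own group, the four bands form discrete information channels: changing the theta map affects only the theta filters' outputs.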
- 2.
Multi-band Information Exchange and Reorganization Module
Many studies have pointed out that fusing features from different frequency bands can improve recognition accuracy [17,47,48]. Therefore, to effectively utilize the feature information of different frequency bands at the same spatial position, a multi-band information exchange and reorganization module is designed. It is responsible for exchanging and reorganizing the feature maps and for fully extracting deep features from feature map groups composed of different frequency bands, as shown in the middle box of Figure 4. This module also includes a group convolution layer, a BN layer, and an activation layer.
In the group convolution layer of this module, the feature maps produced within the same frequency band channel of the previous module are divided into different groups, so that each group contains feature maps from different frequency bands. Each group is convolved with its own set of convolution kernels, with a stride of 1, and the boundaries of the feature maps are zero-padded. The convolution result of each group is shown in Equation (6):
where each element denotes the corresponding feature map. After the convolution of the feature map groups, different fused feature maps are obtained. After the convolution operation, a BN layer is again applied to normalize the data distribution, and the final feature representation of the module is obtained through the ReLU activation layer and output to the next module, as shown in Equation (7):
where each element denotes the corresponding fused feature representation.
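The exchange step can be sketched as a ShuffleNet-style channel shuffle; whether the paper regroups the maps in exactly this way is an assumption, but the effect, each new group receiving maps from every band, is the same:

```python
import numpy as np

def exchange_bands(feats, groups):
    """Interleave feature maps so each new group mixes all frequency bands.
    feats: (C, H, W), ordered band-by-band with C = groups * maps_per_band."""
    c, h, w = feats.shape
    per_band = c // groups
    return feats.reshape(groups, per_band, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# Eight maps tagged 0..7: maps 0-1 from theta, 2-3 alpha, 4-5 beta, 6-7 gamma.
feats = np.arange(8, dtype=float)[:, None, None] * np.ones((8, 9, 9))
mixed = exchange_bands(feats, groups=4)
print([int(m[0, 0]) for m in mixed])  # [0, 2, 4, 6, 1, 3, 5, 7]
```

After the shuffle, the first four maps come from theta, alpha, beta, and gamma, respectively, so the subsequent group convolution fuses information across bands at each spatial position.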
- 3.
Weighted Classification Module
The structure of the weighted classification module, shown in the right box of Figure 4, includes a channel-weighted pooling layer, a fully connected layer, and an output layer with a Sigmoid activation function. In a traditional CNN, the fully connected layers account for a large share of the parameters in addition to the convolution layers. For example, when MobileNets [49] is applied to the ImageNet classification task, the fully connected layer takes a 1024-dimensional feature vector as input and generates 1000 probability values corresponding to the 1000 classes; this layer alone has more than one million parameters, accounting for 24.33% of the total. Therefore, channel-weighted pooling is proposed to replace this fully connected layer in the weighted classification module, which not only aggregates features but also greatly reduces the number of parameters in the classification layer.
In order to filter all the fused features and extract features further, instead of convolving each feature map from beginning to end, all feature maps are considered directly: the feature values at the same position of all feature maps are aggregated, and each position is given a learnable weight according to its importance. Let the fused features output by the previous module be a set of feature maps, each consisting of feature values at the individual positions. When aggregating the feature values at a given position across all feature maps, a learnable weight is assigned to that position, and the extracted feature can be expressed as shown in Equation (8):
which gives the feature extracted from the feature maps at that position. Therefore, channel-weighted pooling not only reduces the feature dimension but also filters the different fused features to extract more representative ones. The fully connected layer is not completely discarded: one is added between the channel-weighted pooling layer and the output layer to enhance the feature representation. The final feature vector, whose dimension equals the number of neurons in the fully connected layer, is given by Equation (9). In the output layer, the feature vector is aggregated, and the aggregated result is activated by Sigmoid to obtain the final prediction, as shown in Equation (10):
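Channel-weighted pooling can be sketched as below; whether the maps are summed or averaged before weighting is an assumption, as is the weight initialization:

```python
import numpy as np

def channel_weighted_pool(feats, weights):
    """Aggregate the values at each spatial position across all M feature maps
    and scale each position by its learnable weight (cf. Equation (8)).
    feats: (M, H, W); weights: (H, W). Returns an (H*W,) feature vector."""
    return (feats.sum(axis=0) * weights).reshape(-1)

M, H, W = 16, 9, 9
feats = np.ones((M, H, W))       # toy fused feature maps
weights = np.full((H, W), 0.5)   # learnable per-position weights
v = channel_weighted_pool(feats, weights)
print(v.shape, v[0])  # (81,) 8.0
```

Note that the layer introduces only H × W = 81 learnable weights, whereas a flatten-plus-fully-connected head would need M × H × W weights per output neuron.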
In the traditional classification layer shown in Figure 6a, the feature maps output by the previous module are flattened into a 1D vector and passed through the fully connected layer to the output layer, which determines the total number of parameters. In the weighted classification module shown in Figure 6b, the number of parameters is instead determined by the number of feature maps output by the previous module and the 9 × 9 size of the topology maps.
Although the weighted classification module merely replaces one fully connected layer with a channel-weighted pooling layer, it has only 10,784 trainable parameters, a reduction of 41,077 compared with the previous classification layer composed solely of fully connected layers. Moreover, the model with the channel-weighted pooling layer achieves better results than other models with similar parameter counts.
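The parameter comparison can be made concrete with a small counting sketch. The layer sizes below are illustrative and the bias conventions are assumptions, so the totals will not match the paper's exact figures:

```python
def params_flatten_fc(c, h, w, n):
    """Flatten (c*h*w) -> FC(n) -> output(1), counting weights and biases."""
    fc = c * h * w * n + n   # dominant term: every flattened value feeds every neuron
    out = n * 1 + 1
    return fc + out

def params_weighted(c, h, w, n):
    """Channel-weighted pooling (h*w weights) -> FC(n) -> output(1)."""
    pool = h * w             # one learnable weight per spatial position
    fc = h * w * n + n       # the pooled vector has only h*w entries
    out = n * 1 + 1
    return pool + fc + out

# Illustrative sizes: 64 fused 9x9 maps, a 128-neuron fully connected layer.
trad = params_flatten_fc(64, 9, 9, 128)
light = params_weighted(64, 9, 9, 128)
print(trad, light)
```

The flatten-based head scales with the number of feature maps, while the weighted head does not, which is where the bulk of the parameter saving comes from.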