Emotional Stress Recognition Using Electroencephalogram Signals Based on a Three-Dimensional Convolutional Gated Self-Attention Deep Neural Network
Abstract
1. Introduction
- A 3D convolutional self-attention DNN is developed that integrates features learned from the spatiotemporal and frequency domains of multi-channel EEG signals to significantly improve emotional stress recognition performance.
- The proposed 3D convolutional self-attention neural network comprises a 3D convolutional block (3DConvB) and a gated self-attention block. 3DConvB is applied to each frequency band individually, rather than to all frequency bands at once, to capture the internal variation in the spatiotemporal relationships between electrodes within each band. In addition, nonlocal operations are performed by the gated self-attention block to reliably extract both long-distance dependencies and global information.
- We combine the inter-electrode correlation information associated with the emotional stress response, extracted from each frequency band of the EEG signal, using inter-frequency band mapping. This allows the network to learn complementary and interconnected information across frequency bands, as well as to model the internal dependencies between salient stress-related EEG features.
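As a minimal illustration of the nonlocal operation that the gated self-attention block builds on, the following numpy sketch computes scaled dot-product self-attention over a feature sequence. All dimensions and weight names here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise dependencies
    return softmax(scores, axis=-1) @ v      # each position mixes all others

rng = np.random.default_rng(0)
T, d = 128, 32                               # e.g. 128 time steps, 32 features
x = rng.standard_normal((T, d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
y = self_attention(x, wq, wk, wv)
```

Because the attention weights for each position span the entire sequence, even distant time steps contribute to every output, which is what allows the block to model long-distance dependencies.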
2. Proposed Methods
2.1. Preprocessing and Frequency Band Decomposition
2.2. 3D Convolutional Neural Network for Spatiotemporal Encoding
2.3. Gated Self-Attention Block
2.4. Inter-Frequency Band Mapping
3. Experiment and Results
3.1. Evaluation Datasets
- The database for emotion analysis using physiological signals (DEAP) [30] is a publicly available dataset for emotion classification. It comprises multiple physiological signals collected from 32 participants, aged 19 to 37 years, each of whom watched 40 one-minute music videos. Among these signals, EEG was recorded at a sampling rate of 512 Hz from 32 channels/electrodes positioned according to the international 10–20 system. The collected EEG signals were downsampled to 128 Hz, electrooculogram (EOG) artefacts and noise were removed, and the data were re-referenced to the common average. For each music video, DEAP provides self-assessment manikin (SAM) ratings, which measure valence and arousal based on Russell's model of emotional representation. Valence, arousal, dominance, and preference were rated in 4 dimensions, each on a scale from 1 to 10. The emotional state was labeled from the arousal and valence SAM ratings: low arousal combined with high valence is considered a calm state, whereas low valence combined with high arousal is considered a stress state. Twenty-four participants from this dataset were screened, and their annotated valence and arousal values were applied to (14) and (15) [31] to classify the EEG signals into two state classes, calm and stress.
- The virtual reality environment (VRE) dataset was obtained from a virtual reality climbing system, comprising stressful periods of climbing over rugged mountains and periods of rest (or calm) between climbs. The system was designed so that participants could receive and respond to audiovisual feedback through an Oculus Rift headset worn together with an EEG cap. Twelve participants (6 males and 6 females aged 20 to 30 years) wore the Oculus Rift, with its built-in displays and lenses, while experiencing the VRE; EEG signals were recorded from each participant for 10 min at a sampling rate of 512 Hz using a 32-channel EEG cap.
- The EEG dataset for emotional stress recognition (EDESC) contains EEG signals obtained from 20 participants (10 males and 10 females aged 18 to 30 years). Data were recorded at a sampling rate of 256 Hz in two stages, before and after an activity, using a four-channel EEG headband. In the preactivity phase, EEG data were collected for 3 min while participants sat in a comfortable position in a quiet room with their eyes open. In the postactivity phase, EEG data were collected for 3 min while participants sat in the measurement room after performing the activity. The perceived stress scale (PSS) questionnaire was used to label the EEG signals: a participant with a PSS score of 20 or higher was classified as stressed, whereas one with a score below 20 was classified as nonstressed. The dataset's authors compared the two phases and reported that the preactivity phase was more accurate for identifying stress; we therefore applied the preactivity data to our experiments, divided into two classes (stressed and nonstressed) and three classes (stress-free, mild, and stressed).
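The labeling rules described for these datasets can be sketched as simple threshold functions. The PSS cutoff of 20 comes directly from the dataset description, while the SAM midpoint of 5 on the 1–10 scale is an illustrative assumption, not the paper's exact rule (which uses equations (14) and (15) of [31]).

```python
def label_sam(valence, arousal, mid=5.0):
    """Quadrant labeling from SAM valence/arousal; the midpoint is an assumed threshold."""
    if valence >= mid and arousal < mid:
        return "calm"    # high valence, low arousal
    if valence < mid and arousal >= mid:
        return "stress"  # low valence, high arousal
    return None          # remaining quadrants are not used in the two-class task

def label_pss(score):
    """EDESC rule: a PSS score of 20 or higher is labeled stressed."""
    return "stressed" if score >= 20 else "nonstressed"
```

For example, `label_sam(8, 2)` falls in the calm quadrant, while `label_pss(20)` lands exactly on the stressed side of the cutoff.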
3.2. Experimental Methods
- Support vector machines using entropy features (EF-SVM) [32]: In this method, entropy-based features were extracted from EEG signals decomposed using stationary wavelet transforms to detect mental stress; the obtained signals were then applied to SVMs.
- Random forest algorithm with fine-grained EEG spectra (RF-FS) [12]: After decomposing the EEG signals into 99 fine sub-bands rather than the usual 5 EEG sub-bands, the power spectral density (PSD) was computed and then applied to the RF algorithm for stress recognition.
- 2D AlexNet-CNN (2D-AlexNet) [33]: Multi-channel EEG signals were converted into 2D spectral images and applied to AlexNet, which comprised five convolutional layers, three max-pooling layers, three fully connected layers, and two dropout layers involved in recognizing the stress state.
- Convolutional recurrent neural network (CRNN) [34]: This is a hybrid neural network that combines a CNN and RNN, with the former encoding a high-level representation of an EEG signal and the latter exploring the temporal dynamics of the EEG signal. CRNN was composed of two convolutional layers, one sub-sampling layer, two fully connected recurrent layers, and one output layer.
- Pre-layer-norm transformer (PLNTF) [28]: To learn the long-term temporal dependence of multi-channel EEG signals, layer-norm was used before applying the multi-head attention mechanism. Subsequently, the residual connection was applied, and the layer-norm, feedforward, and residual connection were sequentially re-performed.
- Gated transformer (GTR) [28]: A gating mechanism was used to stabilize the training process, using GRU instead of the addition operation in the residual connections of the PLNTF structure.
- Hierarchical bidirectional gated recurrent unit model with attention (HBGRUA) [14]: HBGRUA comprised two layers, wherein the first layer encoded the local correlation between signal samples of one epoch of the EEG signal and the second layer recognized stress by encoding the temporal correlation between the EEG sequences. The bidirectional GRU and attention mechanism were used at both sample and epoch levels.
- Spatial frequency convolutional self-attention network (SFCSAN) [17]: In this method, the EEG signal was decomposed into four frequency bands, and a convolutional self-attention network was applied to the time–frequency entropy values obtained from each frequency band. Furthermore, band mapping was performed between frequencies, and the stress states were recognized using the softmax layer.
- 3D residual attention deep neural network (3DRADNN) [35]: This method combines a 3D CNN with a residual attention deep neural network (RADNN); the RADNN was implemented to capture local, global, and spatial information and thereby improve stress recognition performance.
- Spatial frequency 3D convolutional neural network (SF3DCNN): In this method, the EEG signal was decomposed into four frequency bands and a 3D convolution block was applied to the time–frequency power spectrum value obtained in each frequency band to extract the stress state spatiotemporal features. Thereafter, band mapping between frequencies was performed, and the stress state was recognized using the softmax layer.
- Spatial frequency 3D convolutional residual-attention deep neural network (SF3DCRA): This method applies a 3D convolutional block and a residual-attention block (using additive connections) in each frequency band to capture the spatiotemporal features corresponding to the stress state, and feeds the final attention features obtained through inter-frequency band mapping into a linear layer. The stress state is then recognized through the softmax function.
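Several of these baselines, like the proposed method, start by splitting the EEG signal into θ, α, β, and γ sub-bands. A minimal numpy sketch of such a decomposition using FFT masking is shown below; the band edges are conventional values and may differ from those used in the paper, and a production pipeline would typically use band-pass filters instead.

```python
import numpy as np

# Conventional EEG band edges in Hz (exact edges vary across studies).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}

def decompose_bands(eeg, fs=128.0):
    """Split a (channels, samples) EEG array into sub-bands by FFT masking."""
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / fs)
    spec = np.fft.rfft(eeg, axis=-1)
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)   # keep only this band's bins
        out[name] = np.fft.irfft(spec * mask, n=eeg.shape[-1], axis=-1)
    return out
```

A pure 10 Hz tone, for instance, is reconstructed almost entirely in the α band, with the other bands near zero.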
3.3. Results
- In the case of intra-frequency self-attention, the γ band achieved higher accuracy than the other frequency bands. This result is consistent with psychological studies [36] reporting that γ band activity is closely related to memory, learning, reminiscence, selective concentration, and high-level cognitive processing.
- Comparing the inter-frequency band mapping model with the intra-frequency self-attention method clearly shows the advantages of this framework. Combining the EEG features of all frequency bands (θ, α, β, γ) yielded much better recognition performance than any single frequency band, indicating that combining bands achieves the best results by exploiting their complementary information.
- When applying the residual-attention (RA) method, replacing the additive connection with a GRU improved the results of both intra-frequency self-attention and inter-frequency band mapping.
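The gated residual connection contrasted here can be sketched as a GRU-style update that blends the residual stream h with the sublayer output x, instead of simply adding them. The weight names below are illustrative assumptions rather than the paper's exact parameterization.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_gated_residual(h, x, wz, uz, wr, ur, wh, uh):
    """Replace the additive residual h + x with a learned gated blend."""
    z = sigmoid(x @ wz + h @ uz)              # update gate: how much to overwrite h
    r = sigmoid(x @ wr + h @ ur)              # reset gate: how much of h feeds the candidate
    h_tilde = np.tanh(x @ wh + (r * h) @ uh)  # candidate state
    return (1.0 - z) * h + z * h_tilde        # gated blend instead of plain addition

# With all weights at zero, both gates sit at 0.5 and the candidate at 0,
# so the output is simply 0.5 * h.
d = 8
h, x = np.ones(d), np.ones(d)
zeros = np.zeros((d, d))
y = gru_gated_residual(h, x, *[zeros] * 6)
```

Unlike the fixed addition h + x, the gate lets the network learn, per feature, how much of the attention output to admit, which is the stabilizing effect the results above attribute to the GRU connection.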
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sharma, N.; Gedeon, T. Objective measures, sensors and computational techniques for stress recognition: A survey. Comput. Meth. Programs Biomed. 2012, 108, 1287–1301.
2. Burke, H.M.; Davis, M.C.; Otte, C.; Mohr, D.C. Depression and cortisol responses to psychological stress: A meta-analysis. Psychoneuroendocrinology 2005, 30, 846–856.
3. Ahn, M.H. Analysis on The Reflection Degree of Worker’s Stress by Brain-waves based Anti-Stress Quotient. J. Korea Acad.-Ind. Coop. Soc. 2010, 11, 3833–3838.
4. Pourmohammadi, S.; Maleki, A. Stress detection using ECG and EMG signals: A comprehensive study. Comput. Meth. Programs Biomed. 2020, 193, 105482.
5. Liu, Y.; Du, S. Psychological stress level detection based on electrodermal activity. Behav. Brain Res. 2018, 341, 50–53.
6. Katmah, R.; Al-Shargie, F.; Tariq, U.; Babiloni, F.; Al-Mughairbi, F.; Al-Nashash, H. A Review on Mental Stress Assessment Methods Using EEG Signals. Sensors 2021, 21, 5043.
7. Steptoe, A.; Marmot, M. Impaired cardiovascular recovery following stress predicts 3-year increases in blood pressure. J. Hypertens. 2005, 23, 529–536.
8. Pedrotti, M.; Mirzaei, M.A.; Tedesco, A.; Chardonnet, J.R.; Mérienne, F.; Benedetto, S.; Baccino, T. Automatic Stress Classification with Pupil Diameter Analysis. Int. J. Hum.-Comput. Interact. 2014, 30, 220–236.
9. Lee, M.; Moon, J.; Cheon, D.; Lee, J.; Lee, K. Respiration signal based two layer stress recognition across non-verbal and verbal situations. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; pp. 638–645.
10. Liao, C.Y.; Chen, R.C.; Tai, S.K. Emotion stress detection using EEG signal and deep learning technologies. In Proceedings of the 2018 IEEE International Conference on Applied System Invention (ICASI), Chiba, Japan, 13–17 April 2018; pp. 90–93.
11. Jebelli, H.; Khalili, M.M.; Lee, S. Mobile EEG-based workers’ stress recognition by applying deep neural network. In Advances in Informatics and Computing in Civil and Construction Engineering; Springer: Cham, Switzerland, 2019; pp. 173–180.
12. Baumgartl, H.; Fezer, E.; Buettner, R. Two-level classification of chronic stress using machine learning on resting-state EEG recordings. In Proceedings of the 25th Americas Conference on Information Systems (AMCIS), Virtual Conference, 12–16 August 2020.
13. Subhani, A.R.; Mumtaz, W.; Saad, M.N.B.M.; Kamel, N.; Malik, A.S. Machine learning framework for the detection of mental stress at multiple levels. IEEE Access 2017, 5, 13545–13556.
14. Chen, J.X.; Jiang, D.M.; Zhang, Y.N. A hierarchical bidirectional GRU model with attention for EEG-based emotion classification. IEEE Access 2019, 7, 118530–118540.
15. Wang, Z.; Wang, Y.; Hu, C.; Yin, Z.; Song, Y. Transformers for eeg-based emotion recognition: A hierarchical spatial information learning model. IEEE Sens. J. 2022, 22, 4359–4368.
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
17. Li, D.; Xie, L.; Chai, B.; Wang, Z.; Yang, H. Spatial-frequency convolutional self-attention network for EEG emotion recognition. Appl. Soft Comput. 2022, 122, 108740.
18. Issa, S.; Peng, Q.; You, X.; Shah, W.A. Emotion Assessment Using EEG Brain Signals and Stacked Sparse Autoencoder. J. Inf. Assur. Secur. 2019, 14, 20–29.
19. Song, Y.; Jia, X.; Yang, L.; Xie, L. Transformer-based spatial-temporal feature learning for eeg decoding. arXiv 2021, arXiv:2106.11170.
20. Newson, J.J.; Thiagarajan, T.C. EEG frequency bands in psychiatric disorders: A review of resting state studies. Front. Hum. Neurosci. 2019, 12, 521.
21. Wen, T.Y.; Bani, N.A.; Muhammad-Sukki, F.; Aris, S.A.M. Electroencephalogram (EEG) human stress level classification based on Theta/Beta ratio. Int. J. Integr. Eng. 2020, 12, 174–180.
22. Jun, T.J.; Nguyen, H.M.; Kang, D.; Kim, D.; Kim, D.; Kim, Y.H. ECG arrhythmia classification using a 2-D convolutional neural network. arXiv 2018, arXiv:1804.06812.
23. Salama, E.S.; El-Khoribi, R.A.; Shoman, M.E.; Shalaby, M.A.W. EEG-based emotion recognition using 3D convolutional neural networks. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 329–337.
24. Nguyen, A.; Pham, K.; Ngo, D.; Ngo, T.; Pham, L. An analysis of state-of-the-art activation functions for supervised deep neural network. In Proceedings of the 2021 International Conference on System Science and Engineering (ICSSE), Ho Chi Minh City, Vietnam, 26–28 August 2021; pp. 215–220.
25. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
26. Tao, W.; Li, C.; Song, R.; Cheng, J.; Liu, Y.; Wan, F.; Chen, X. EEG-based emotion recognition via channel-wise attention and self attention. IEEE Trans. Affect. Comput. 2020.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
28. Tao, Y.; Sun, T.; Muhamed, A.; Genc, S.; Jackson, D.; Arsanjani, A.; Kumar, P. Gated transformer for decoding human brain eeg signals. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; pp. 125–130.
29. Yang, J.; Huang, X.; Wu, H.; Yang, X. EEG-based emotion classification based on bidirectional long short-term memory network. Procedia Comput. Sci. 2020, 174, 491–504.
30. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Patras, I. DEAP: A database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31.
31. Hag, A.; Handayani, D.; Altalhi, M.; Pillai, T.; Mantoro, T.; Kit, M.H.; Al-Shargie, F. Enhancing EEG-Based Mental Stress State Recognition Using an Improved Hybrid Feature Selection Algorithm. Sensors 2021, 21, 8370.
32. Candra, H.; Yuwono, M.; Chai, R.; Handojoseno, A.; Elamvazuthi, I.; Nguyen, H.T.; Su, S. Investigation of window size in classification of EEG-emotion signal with wavelet entropy and support vector machine. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 7250–7253.
33. Nogay, H.S.; Adeli, H. Detection of epileptic seizure using pretrained deep convolutional neural network and transfer learning. Eur. Neurol. 2020, 83, 602–614.
34. Li, X.; Song, D.; Zhang, P.; Yu, G.; Hou, Y.; Hu, B. Emotion recognition from multi-channel EEG data through convolutional recurrent neural network. In Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, 15–18 December 2016; pp. 352–359.
35. Zhang, X.; Han, L.; Zhu, W.; Sun, L.; Zhang, D. An explainable 3D residual self-attention deep neural network for joint atrophy localization and Alzheimer’s disease diagnosis using structural MRI. IEEE J. Biomed. Health Inform. 2021.
36. Matsumoto, A.; Ichikawa, Y.; Kanayama, N.; Ohira, H.; Iidaka, T. Gamma band activity and its synchronization reflect the dysfunctional emotional processing in alexithymic persons. Psychophysiology 2006, 43, 533–540.
Method | Accuracy | Precision | F-Measure
---|---|---|---
RF-FS | 79.84 | 75.64 | 76.03
EF-SVM | 81.45 | 81.31 | 81.07
2D-AlexNet | 81.83 | 83.51 | 82.11
HBGRUA | 84.62 | 84.52 | 83.94
CRNN | 86.77 | 86.58 | 86.16
SF3DCNN | 87.13 | 86.94 | 86.52
PLNTF | 88.42 | 86.89 | 86.75
GTR | 89.67 | 89.77 | 88.65
3DRADNN | 91.46 | 91.27 | 90.85
SFCSAN | 92.83 | 92.76 | 92.15
SF3DCRA | 94.52 | 94.62 | 93.50
3DCGSA | 96.68 | 96.77 | 96.39

Method | Accuracy | Precision | F-Measure
---|---|---|---
RF-FS | 75.79 | 77.47 | 77.09
EF-SVM | 77.78 | 77.64 | 76.40
2D-AlexNet | 79.57 | 79.38 | 78.96
HBGRUA | 81.78 | 80.25 | 80.11
CRNN | 83.92 | 85.60 | 85.22
SF3DCNN | 85.63 | 85.56 | 84.96
PLNTF | 86.53 | 86.62 | 86.24
GTR | 87.97 | 87.87 | 87.29
3DRADNN | 89.59 | 91.27 | 90.89
SFCSAN | 92.74 | 92.55 | 92.13
SF3DCRA | 93.61 | 94.90 | 93.91
3DCGSA | 95.64 | 95.57 | 94.96

Method | Accuracy | Precision | F-Measure
---|---|---|---
RF-FS | 74.52 | 77.60 | 76.20
EF-SVM | 76.13 | 77.81 | 77.43
2D-AlexNet | 78.56 | 78.65 | 78.27
HBGRUA | 80.13 | 80.03 | 79.45
CRNN | 81.03 | 82.71 | 82.33
SF3DCNN | 82.12 | 82.05 | 81.44
PLNTF | 83.17 | 83.26 | 82.88
GTR | 84.34 | 84.40 | 83.66
3DRADNN | 86.25 | 84.72 | 84.58
SFCSAN | 87.47 | 87.28 | 86.86
SF3DCRA | 89.48 | 89.57 | 89.19
3DCGSA | 91.52 | 93.20 | 92.82

Method | Accuracy | Precision | F-Measure
---|---|---|---
RF-FS | 71.47 | 73.15 | 72.77
EF-SVM | 73.21 | 73.30 | 72.92
2D-AlexNet | 75.39 | 77.07 | 76.69
HBGRUA | 77.24 | 77.33 | 76.95
CRNN | 79.53 | 79.46 | 78.85
SF3DCNN | 80.62 | 80.72 | 80.60
PLNTF | 81.24 | 82.92 | 82.54
GTR | 83.32 | 83.25 | 82.64
3DRADNN | 85.04 | 84.85 | 84.43
SFCSAN | 86.67 | 86.53 | 85.29
SF3DCRA | 88.15 | 88.25 | 87.13
3DCGSA | 90.12 | 90.21 | 89.83
Frequency Band | RA Accuracy (%) | GRU-RA Accuracy (%)
---|---|---
θ | 84.95 | 87.06
α | 86.53 | 88.55
β | 88.54 | 90.72
γ | 90.07 | 92.33
(θ, α, β, γ) | 93.99 | 96.68
Share and Cite
Kim, H.-G.; Jeong, D.-K.; Kim, J.-Y. Emotional Stress Recognition Using Electroencephalogram Signals Based on a Three-Dimensional Convolutional Gated Self-Attention Deep Neural Network. Appl. Sci. 2022, 12, 11162. https://doi.org/10.3390/app122111162