Correction published on 9 May 2024, see Electronics 2024, 13(10), 1833.
Article

Temporal Convolutional Network-Enhanced Real-Time Implicit Emotion Recognition with an Innovative Wearable fNIRS-EEG Dual-Modal System

1 Engineering Research Center of Optical Instrument and System, Ministry of Education and Shanghai Key Lab of Modern Optical System, University of Shanghai for Science and Technology, Shanghai 200093, China
2 Anhui Province Key Laboratory of Optoelectric Materials Science and Technology, Anhui Normal University, Wuhu 241002, China
3 Shanghai Environmental Biosafety Instruments and Equipment Engineering Technology Research Center, University of Shanghai for Science and Technology, Shanghai 200093, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(7), 1310; https://doi.org/10.3390/electronics13071310
Submission received: 6 February 2024 / Revised: 21 March 2024 / Accepted: 28 March 2024 / Published: 31 March 2024 / Corrected: 9 May 2024
(This article belongs to the Special Issue New Application of Wearable Electronics)

Abstract:
Emotion recognition remains an intricate task at the crossroads of psychology and artificial intelligence, necessitating real-time, accurate discernment of implicit emotional states. Here, we introduce a pioneering wearable dual-modal device, synergizing functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) to meet this demand. The first-of-its-kind fNIRS-EEG ensemble exploits a temporal convolutional network (TC-ResNet) that takes 24 fNIRS and 16 EEG channels as input for the extraction and recognition of emotional features. Our system has many advantages, including its portability, battery efficiency, wireless capabilities, and scalable architecture. It offers a real-time visual interface for the observation of cerebral electrical and hemodynamic changes, tailored for a variety of real-world scenarios. Our approach is a comprehensive emotional detection strategy, with new designs in system architecture and deployment and improvements in signal processing and interpretation. We examine the interplay of emotions and physiological responses to elucidate the cognitive processes of emotion regulation. An extensive evaluation of 30 subjects under four emotion induction protocols demonstrates our bimodal system’s excellence in detecting emotions, with an impressive classification accuracy of 99.81% and its ability to reveal the interconnection between fNIRS and EEG signals. Compared with the latest unimodal identification methods, our bimodal approach shows accuracy gains of 0.24% over EEG and 8.37% over fNIRS. Moreover, our proposed TC-ResNet-driven temporal convolutional fusion technique outperforms conventional EEG-fNIRS fusion methods, improving recognition accuracy by margins ranging from 0.7% to 32.98%. This research presents a groundbreaking advancement in affective computing that combines biological engineering and artificial intelligence. Our integrated solution facilitates nuanced and responsive affective intelligence in practical applications, with far-reaching impacts on personalized healthcare, education, and human–computer interaction paradigms.

1. Introduction

1.1. Background and Motivation

The field of human–computer interaction is rapidly advancing with the development of brain–computer interface technologies [1]. Emotion recognition is crucial for interpreting brain signals to discern emotional states. However, traditional approaches relying on external behaviors such as facial expressions, tone, and body language have limitations in capturing implicit emotions in social interactions [2,3]. Real-time recognition of implicit emotions is important across disciplines, such as cognitive neuroscience, psychology, and human–computer interaction for fostering social connections and enhancing user experience [4]. However, the complexity of implicit emotions poses significant challenges for conventional emotion recognition technologies. Novel methods integrating multimodal brain imaging techniques are increasingly pursued to achieve real-time implicit emotion recognition.

1.2. Related Works

Recent studies have explored emotion recognition using functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) independently [5,6]. As a non-invasive optical imaging technology, fNIRS monitors changes in the concentrations of oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR) in the brain, thereby objectively observing the process of functional activity in the brain. It offers advantages such as non-invasiveness, compactness, portability, and low energy consumption. Researchers have investigated hemodynamic responses associated with emotion processing in the prefrontal cortex region [7,8,9,10]. These studies demonstrated the potential for distinguishing emotional states based on hemodynamic changes and proposed emotion prediction models using machine learning techniques. For instance, Bandara et al. (2017) successfully differentiated emotional states (i.e., emotional value and arousal dimensions) using fNIRS, creating an emotion prediction model [8]. Yeung et al. (2023) also demonstrated the significant potential of fNIRS in examining the neural mechanisms of explicit and implicit facial emotion recognition based on PFC hemodynamic reactions [11].
Similarly, EEG examines the electrical activity of neurons and detects the brain’s state by recording spontaneous, rhythmic potentials of neurons beneath the scalp. EEG has become a critical tool in emotion recognition research owing to its superior temporal resolution, non-invasiveness, convenience, and rapid response to physiological and psychological changes. Research in emotion recognition using EEG has investigated neural oscillations and event-related potentials associated with emotional stimuli [12,13,14,15]. Machine learning algorithms have been utilized to classify and predict emotional states based on EEG signals. For instance, Zheng et al. (2019) used machine learning approaches to examine time-stable patterns in EEG emotion recognition [12]. Subsequently, Zhang et al. (2020) carried out experimental studies to determine the efficacy of deep neural networks, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and a combined model of CNN and LSTM in EEG emotion recognition applications [13].
Nonetheless, these technologies have several limitations. Firstly, although fNIRS boasts high spatial resolution, its temporal resolution is limited because it monitors relatively slow hemodynamic changes. EEG, on the other hand, has excellent temporal resolution but limited spatial resolution [16]. The two approaches therefore complement each other at the technical level. They can be applied to a wide range of participant groups, from infants to older people, and offer many advantages over technologies such as functional magnetic resonance imaging, positron emission tomography, or magnetoencephalography, because they impose few physical constraints on participants. fNIRS and EEG do not involve high-intensity magnetic fields or ionizing radiation and have significantly lower hardware costs than other brain imaging methods. Integrating these two detection methods provides information on both the cerebral cortex’s electrical signals and its metabolic-cerebral hemodynamic activity. Being free from electromagnetic interference, this integration can be conducted in non-laboratory environments, such as natural settings, mobile monitoring setups, and bedside applications. Importantly, this integration does not cause severe discomfort to the participants.

1.3. Research Gap

The integration of fNIRS and EEG signals for real-time emotion recognition has not been fully explored, with little research focusing on innovative wearable systems capable of capturing dynamic emotional responses in natural environments. The key limitations of existing approaches are the temporal resolution constraints of fNIRS and spatial resolution limitations of EEG. While offering synergistic advantages, their integrated use in user-friendly portable systems remains largely unexplored. Additionally, applying these technologies in dynamic real-world environments presents challenges, especially for clinical populations, as shown in Figure 1. Developing an integrated fNIRS-EEG system addressing these gaps is crucial. Traditional fNIRS-EEG studies typically focus on the wired connections of independent fNIRS and EEG systems for data collection. The cumbersome connection method and immovable desktop machine limit the mobility of the system, restricting the synchronized application of fNIRS and EEG [17]. Several key factors need to be optimized in the design: (1) compatibility and synchronization of fNIRS and EEG; (2) optimized layout design of electrodes and optodes; (3) lightweight and highly integrated design to enhance user experiences; (4) ensuring high-quality signals; (5) adopting a comfortable ergonomic design; (6) portable and wearable, lightweight, and flexible instruments; (7) cost reduction; (8) device complexity and reliability; (9) high resolution; (10) advanced signal processing algorithms suitable for long-term practical research.
Contemporary emotion recognition approaches often overlook real-time dynamic changes in emotional states, emphasizing static, discrete emotional states [18]. This necessitates advanced signal processing techniques to improve the accuracy and speed of emotion recognition by leveraging the temporal dynamics of fNIRS-EEG signals. Existing methods typically rely on machine learning models, which lack the capability to capture temporal dependencies within the data. For instance, researchers extensively employ convolutional neural networks (CNNs) in emotion recognition, but their relatively shallow network architecture hinders the accurate capture of high- and low-frequency features in the fused fNIRS and EEG signals [19,20]. Therefore, our proposed framework aims to track and understand the temporal fluctuations of emotional states reflected in fNIRS-EEG signals, introducing the dynamic characteristics of emotional changes and investigating how emotions are expressed at a latent level via fNIRS-EEG signals.
Figure 1. Comparison of fNIRS-EEG dual-modality imaging instruments. (a) characterization of bimanual cyclical tasks from EEG-fNIRS (Germany) measurements [21]; (b) automatic depression diagnosis through EEG-fNIRS (Artinis) [22]; (c) subject-specific modeling of EEG-fNIRS (Danyang Huichuang, China) neurovascular coupling [23]; (d) our self-designed wireless wearable portable instrument.

1.4. Contribution

This paper presents an innovative wearable dual-modal system integrating wireless fNIRS and EEG. Our lightweight modular design addresses the mobility limitations of previous immobilized setups; the key technical parameters are shown in Table 1. Comprehensive emotion regulation insights are achieved by employing a multidimensional methodology spanning system engineering, signal processing, and machine learning. A temporal convolutional network enhances real-time implicit emotion recognition performance compared with conventional methods. Extensive analysis with healthy participants demonstrates the superiority of our dual-modal approach over single modalities. This work paves the way for emotionally intelligent applications in personalized healthcare, adaptive interfaces, and enriched human–computer interactions.

2. Materials and Methods

2.1. Participants

A total of 30 participants were recruited for this study, including 15 males and 15 females aged between 19 and 28 years. The participants’ average age was 24.27 years, with a standard deviation of 2.27. All participants were university students who had no known history of brain injury or neurological disorders. Before the study commenced, participants were given a detailed explanation of the experimental procedure and instructions and asked to provide written consent. We also informed participants of their right to withdraw from the study at any time without consequence. During the experiment, participants were asked to sit in front of a computer and attentively watch movie clips designed to evoke emotions. We emphasized that participants should remain as still as possible during the data recording phase, refraining from making any body movements to ensure data accuracy. A quiet and comfortable environment was established to effectively capture participant data while viewing emotionally stimulating movie segments.

2.2. Experimental Procedure

This study employed movie clips as emotional induction stimuli, the key purpose of which was to elicit participants’ emotional responses through visual and auditory stimuli. This would enable changes in their subjective experiences and physiological reactions to be captured [24]. In our affective experiments, we used Chinese movie clips because we believe that local cultural factors may influence emotional elicitation. In our preliminary study, we manually selected a set of emotional movie clips from famous Chinese movies. Movie clips were selected based on the following criteria: (1) the duration was controlled to prevent participant visual fatigue; (2) pre-experimental subject evaluations were performed to ensure that the selected materials effectively induce specific emotions; (3) movie clips were edited to match each emotion category and generate natural emotional responses, encompassing the four basic emotions of calmness, sadness, joy, and fear. The experiment followed the paradigm design illustrated in Figure 2: a 30-s quiet baseline state was established at the start, after which the four sets of movie clips were presented in a Latin square order, each lasting 180 s. A 30-s interval between videos allowed participants to prepare for the next round of emotion induction. Following the conclusion of each emotional induction video, participants completed a self-assessment scale to evaluate the effectiveness of the emotional elicitation. We computed the average and standard deviation of participants’ scores across the four emotional induction categories, as shown in Table 2. The experimental assessment employed a 10-point scale, where 1 represented a very poor level of emotional induction and 10 denoted the highest level of emotional induction (see Supplementary Materials S1, participants’ self-assessment scale). The experiment lasted 14 min and took place in a quiet and soundproof room. Through this setup, we ensured the standardized execution of the experiment and the accuracy of data collection; the details of the movie clips used in the experiment are shown in Table 3.

2.3. Overview of the New Portable Wearable Functional Near-Infrared Spectroscopy-Electroencephalography (fNIRS-EEG) System

Our system consists of a bimodal full-head cap, fNIRS light sources, fNIRS detectors, EEG electrodes, an integrated miniaturized fNIRS host module, and an EEG host module. The entire equipment set weighs ≤360 g, includes a built-in rechargeable battery, has a lightweight integrated design, and positions the entire system on the head, which eliminates the need for a backpack. This achieves true portability, can be used for indoor and outdoor experiments, and ensures participants’ comfort during activities (see Figure 3). The system employs a flexible EEG cap as the base, using perforations to secure the fNIRS source-detector holders at specific locations. The distance between the light source and detector is maintained between 10 and 55 mm by adjusting the spacing with elastic-plastic connectors, enabling multi-distance and short-separation detector arrangements (see Figure 3i). Simultaneously, a dual-layer structure combines the physical space of the fNIRS and EEG detectors: the upper and lower layers are fixed together with snap-button structures, yielding the whole-head layout of the integrated system’s electrodes and source-detector pairs shown in Figure 4. This design reduces costs, enhances comfort, and avoids spatial conflicts between the positions of the near-infrared probes and the EEG electrodes. The fNIRS host and the optical cable follow an integrated structure design, eliminating the need for plugging whilst simultaneously minimizing the risk of damage. We also created a separate connector and socket to connect the EEG components (see Figure 3d). This system design facilitates efficient, conflict-free fNIRS-EEG bimodal monitoring, providing a reliable tool for research.
In this system, fNIRS and EEG signals are collected separately and subsequently analyzed on the upper computer to simplify the research process and enhance the reproducibility of the research. The upper computer can perform real-time data processing and analysis, allowing researchers to obtain prompt preliminary results. This straightforward operation saves considerable time and effort for researchers, ensuring consistency and comparability between experiments. Furthermore, this approach reduces interference between signals, thus enhancing the quality and reliability of the data.
The fNIRS experimental setup employed in this work consists of ten emission light sources (with wavelengths of 760 nm and 850 nm), eight detectors, and an integrated miniaturized NIRS host module (dimensions: 8.5 × 8.5 × 3.5 cm, weight: ≤300 g), with 24 channels in total (see Figure 3 and Figure 4). The light source section of the system contains an LED ring-shaped needle insertion design (weight: <12 g) and integrates a custom-made light pole fixator to shield ambient light. In turn, this facilitates the light emission function, which can be seen in Figure 3a,b,j,k. This light source employs a transistor-driven circuit and controls the dual-wavelength LEDs using time-division multiplexing technology. The detector section contains a photodiode ring-shaped needle insertion. Moreover, it employs a custom-made light pole fixator to facilitate the light reception function, as shown in Figure 3a,c,j,k. The detector incorporates photodiode operational amplifiers on the same chip, enabling light signals to be directly converted into voltage output and achieving high-gain, low-noise amplification of weak signals at the photodetector. Moreover, we employ synchronous modulation between the LED light sources and the photodetector, in addition to a digital smoothing algorithm, to stabilize detector operation under strong ambient light interference. The light detection sensitivity is as high as 10³ photons, while the photodetector background noise is less than 50 μV. The maximum output is greater than 3.5 V, indicating excellent electromagnetic interference resistance with an average electromagnetic shielding > 90 dB (200 kHz–1 GHz). The overall power consumption is ≤1 W, and the module is highly sensitive and stable. We employed Low-Temperature Co-fired Ceramic (LTCC) technology to integrate the light source driving circuit, amplification and filtering circuit, I/V conversion circuit, A/D circuit, wireless communication circuit, ARM processing circuit, and power supply circuit in the fNIRS host module, encapsulated in a 3D-printed plastic shell. This process follows the principles of a small, portable, comprehensive system with stable and easily designable circuitry, achieving control over fNIRS signal acquisition, light source driving, power supply, data transmission, and other units (see Figure 5). Furthermore, the sampling frequency of this fNIRS module can be as high as 150 Hz.
The EEG experimental setup in this paper includes an electrode cap, electrodes, and an EEG host module (dimensions: 60 × 85 × 20 mm, weight: ≤60 g), with 16 channels in total (see Figure 3 and Figure 4). The electrodes and electrode cap are integrated, eliminating the need to install electrodes before each session while ensuring optimal signal quality. Cables with custom snap-on heads connected to the EEG host module facilitate electrode signal transmission, as seen in Figure 3g,h. Ag/AgCl ring-shaped dry electrodes are selected for EEG acquisition to mitigate noticeable polarization phenomena, reduce baseline drift, and effectively record EEG signals. Furthermore, this design streamlines preparatory work, enabling quick testing, and is also suitable for portable and wearable purposes. The EEG host module in the system is integrated through LTCC technology, encompassing filtering and amplification circuits, A/D circuits, power supply circuits, ARM processing circuits, and wireless communication circuits. It is housed in a 3D-printed plastic shell with a cap connector insertion, enabling functions such as EEG signal acquisition, amplification and filtering, and wireless communication (see Figure 3d and Figure 5). Moreover, the EEG component has a sampling rate ≥ 500 Hz, a resolution ≥ 24 bit (0.05 μV), and a bandwidth range of 0–250 Hz.
The upper computer processing platform plays a critical role in the system, with a design that includes data communication, data processing, display interface design, and brain function recognition analysis (see Figure 6). This segment is integrated using Python 3.9 and Visual Studio 2019: data communication, data processing, and the display interface are handled in Visual Studio, while data calculation, feature extraction, and classification recognition training are handled in Python. The data communication segment connects to the hardware detection platform via a serial port. Communication primarily involves three components: configuring and verifying the communication port; sending different signals to control the hardware platform’s start, pause, and stop; and receiving and parsing fNIRS and EEG data packets. Data are sent to the upper computer from the hardware platform; the upper computer parses and checks the data, separating the fNIRS and EEG data channels based on the timing of light signals and data acquisition. During data parsing, weak signals at different wavelengths are separated based on timing. The data processing segment involves more accurate fNIRS and EEG data filtering. Due to the relatively gradual changes in blood flow, variations in the direct current (DC) component of optical data in fNIRS are indicative of shifts in cerebral oxygenation levels. Hence, it is frequently essential to include the DC component in the analysis as a reference for establishing the baseline optical density level. Simultaneously, meticulous processing is imperative to prevent the inclusion of artifacts resulting from head movement or other factors unrelated to cerebral blood flow. Additionally, under specific circumstances, analyzing alternating current (AC) components may prove necessary to identify swift physiological changes linked to heart rate and respiration. This procedural step aids in noise removal, ensuring a clear signal for the precise extraction and interpretation of brain activity data. Meanwhile, for EEG data, high-frequency noise and the DC component are removed, retaining data from 1 Hz to 50 Hz. Changes in brain oxygen levels are calculated using the modified Beer–Lambert law to estimate changes in HbO and HbR. Analysis of brain function recognition includes brain-oxygen feature extraction and analysis, EEG feature extraction, feature fusion, and the development of classification recognition models.
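The following minimal sketch illustrates how the serial link between the upper computer and the hardware platform could be handled in Python with pyserial. The port name, baud rate, command bytes, and packet layout are all hypothetical placeholders; the paper does not disclose its communication protocol.

```python
# Hypothetical sketch of the upper-computer serial acquisition loop.
# Port, baud rate, command bytes, and packet headers are assumptions only.
import serial  # pyserial

PORT, BAUD = "COM3", 115200              # assumed settings
CMD_START, CMD_STOP = b"\x01", b"\x02"   # assumed control bytes

def acquire(n_packets: int = 100):
    with serial.Serial(PORT, BAUD, timeout=1) as ser:
        ser.write(CMD_START)                    # start acquisition
        for _ in range(n_packets):
            header = ser.read(2)                # assumed 2-byte packet header
            if header == b"\xAA\x01":           # hypothetical fNIRS marker
                payload = ser.read(24 * 2 * 2)  # 24 channels x 2 wavelengths x 2 bytes
                yield ("fNIRS", payload)
            elif header == b"\xAA\x02":         # hypothetical EEG marker
                payload = ser.read(16 * 3)      # 16 channels x 24-bit samples
                yield ("EEG", payload)
        ser.write(CMD_STOP)                     # stop acquisition
```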

2.3.1. Functional Near-Infrared Spectroscopy-Electroencephalography (fNIRS-EEG) Data Preprocessing

(1)
Functional near-infrared spectroscopy (fNIRS) data preprocessing
In this study, we utilized the HomER2 toolkit—a MATLAB R2022a-based GUI program—to analyze and preprocess fNIRS data [25], as depicted in Figure 7. Initially, we conducted an automatic data quality assessment using the ‘enPruneChannels’ function, which removed any channels with insufficient or excessively strong signals, or with large standard deviations, to safeguard data quality [25]. Following that, we transformed raw light intensities into concentrations of HbO and HbR based on the modified Beer–Lambert law [26]:
$$C = \frac{\log(I_0 / I)}{\varepsilon\, d}$$
where $C$ is the concentration of the target substance (HbO or HbR), $I_0$ is the intensity of the incident light, $I$ is the intensity of the light detected after transmitting through the sample, $\varepsilon$ is the molar absorption coefficient of the target substance, and $d$ is the optical distance, i.e., the distance the light travels through the sample. To filter out motion artifacts, we applied spline interpolation and Savitzky–Golay filtering techniques, mitigating alterations in optical signals caused by participant movements and thereby enhancing fNIRS data accuracy [27]. The spline function $S_i(x)$ can be expressed as follows:
$$S_i(x) = \sum_{i=0}^{n} a_i x^i$$
where $x$ is the independent variable, $a_i$ are the polynomial coefficients, and $n$ is the order of the polynomial. We perform Savitzky–Golay filtering on the light intensity values of the obtained continuous time series to obtain the smoothed output fNIRS signal values $y_k$ as follows:
$$y_k = \sum_{j=-g}^{g} c_j\, x_{k+j}$$
where $x_{k+j}$ is the local data point of the input signal, $c_j$ is the coefficient of the Savitzky–Golay filter, and $g$ is the half-width of the smoothing window. Subsequently, high-frequency physiological noise, such as cardiac signals, was filtered out using a 0.50 Hz low-pass filter, while a 0.01 Hz high-pass filter was employed to eliminate low-frequency noise such as data drift [28,29]. The filtered fNIRS output signal $Z(t)$ is given as follows:
$$Z(t)_L = \sum_{u=0}^{U-1} b_u\, y_k(t - uT)$$
$$Z(t)_H = \sum_{u=0}^{U-1} b_u\, x(t - uT) - \sum_{v=1}^{V-1} e_v\, y(t - vT)$$
where $x(t) = y_k$, $U$ is the order of the low-pass filter, $V$ is the order of the high-pass filter, $T$ is the sampling time interval, and $b_u$ and $e_v$ are the coefficients of the low-pass and high-pass filters, respectively. The processed data were then segmented into task-specific modules and resting periods, with the latter serving as a reference for accurate assessment of relative concentration changes during task performance, ensuring baseline correction and yielding cleaner HbO and HbR signals [30]. The fNIRS signal after baseline calibration can be expressed as $\eta(t)$ as follows:
$$\eta(t) = Z(t) - \mu$$
where $\mu$ is the mean value of the baseline signal. Given the demonstrated sensitivity of HbO signals and their favorable classification in emotion recognition [31], this study primarily examines HbO signaling.
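For illustration, the chain just described (MBLL conversion, Savitzky–Golay smoothing, 0.01–0.50 Hz band-pass filtering, and baseline correction) can be approximated in NumPy/SciPy as below. The paper performs these steps in HomER2; the extinction coefficient and source-detector distance here are placeholder values, and the window/filter orders are assumptions.

```python
# Rough NumPy/SciPy sketch of the fNIRS preprocessing steps in Section 2.3.1.
import numpy as np
from scipy.signal import savgol_filter, butter, filtfilt

FS = 150.0  # fNIRS sampling rate (Hz), from the paper

def intensity_to_conc(I, I0, eps=1.0, d=30.0):
    """Modified Beer-Lambert law: C = log(I0 / I) / (eps * d). eps and d are placeholders."""
    return np.log(I0 / I) / (eps * d)

def preprocess_fnirs(signal, baseline_len=int(30 * FS)):
    # 1) Savitzky-Golay smoothing to suppress motion artifacts (assumed window/order)
    smoothed = savgol_filter(signal, window_length=31, polyorder=3)
    # 2) 0.01-0.50 Hz band-pass: high-pass removes drift, low-pass removes cardiac noise
    b, a = butter(3, [0.01, 0.50], btype="bandpass", fs=FS)
    filtered = filtfilt(b, a, smoothed)
    # 3) Baseline correction against the 30 s resting segment preceding the task
    return filtered - filtered[:baseline_len].mean()

# Usage: hbo = preprocess_fnirs(intensity_to_conc(I, I0))
```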
(2)
Electroencephalography (EEG) data preprocessing
This study employs the EEGLAB data preprocessing tool—complete with a comprehensive function library—for the offline analysis of EEG data [32], executed in five distinct phases (see Figure 8). Initially, the data are re-referenced to the average mastoid potential and downsampled to a 250 Hz sampling rate [33]. The EEG signal after re-referencing can be expressed as $\varphi(t)$ as follows:
$$\varphi(t) = h(t) - \frac{1}{m}\sum_{i=1}^{m} h_i(t)$$
where $h(t)$ is the original EEG signal, $h_i(t)$ is the $i$th mastoid potential signal, and $m$ is the number of mastoid electrodes. The original EEG sampling rate is $f$; interpolation generates a new data point between every two neighboring sampling points, giving the interpolated signal $\varphi(t)$ with a new sampling rate of $f_1$. The downsampled EEG signal can then be expressed as $\sigma(t)$ as follows:
$$\sigma(t) = \varphi\!\left(t \times \frac{f_1}{f}\right) = \varphi(r)$$
where $r$ indicates that downsampling is achieved by scaling the time axis. Next, a 1–80 Hz bandpass filter is employed to remove high- and low-frequency waves [34]. The band-pass filtered EEG signal is expressed as $\gamma(t)$ as follows:
$$\gamma(t) = \int \sigma(\tau)\, \delta(t - \tau)\, d\tau$$
where $f_{Hz}$ is the frequency and $\delta(t)$ is the impulse response of the filter, obtained by the Fourier transform of its frequency response $H(f_{Hz})$:
$$H(f_{Hz}) = \begin{cases} 1, & 1\ \text{Hz} \le f_{Hz} \le 80\ \text{Hz} \\ 0, & f_{Hz} < 1\ \text{Hz}\ \text{or}\ f_{Hz} > 80\ \text{Hz} \end{cases}$$
Visual inspection is also performed to excise artifact-affected data segments [35]. To further improve quality, a 50 Hz notch filter targets power line noise [36]. The frequency response function of the notch filter can be expressed as $H_1(f_{Hz})$ as follows:
$$H_1(f_{Hz}) = \begin{cases} 0, & f_{Hz} = 50\ \text{Hz} \\ 1, & \text{otherwise} \end{cases}$$
At this point, the resulting notch-filtered EEG signal is expressed as $\theta(t)$ as follows:
$$\theta(t) = \gamma(t) \times H_1(f_{Hz})$$
Defective channels are identified and mitigated through interpolation [37]. The interpolated EEG signal $\zeta(t)$ can be obtained as follows:
$$\zeta(t) = (1 - \alpha)\, \zeta(t_1) + \alpha\, \zeta(t_2)$$
where $\alpha$ is the relative position between the interpolated point and the known points, and $t_1$ and $t_2$ are neighboring known time points. Finally, independent component analysis (ICA) is used for artifact tagging and preprocessing to correct artifacts produced by eye movement (see Figure 9) [38]. We arrange $\zeta(t)$ into a matrix $X$, where each row represents the measurements of an electrode and each column represents a measurement at a point in time. The goal of ICA is to decompose the matrix $X$ into a set of independent source signals $S$, represented as follows:
$$X = AS$$
where $A$ is a mixing matrix that represents the weights of each source signal on each electrode, and $S$ is a source signal matrix that contains the independent source signals. During this process, the EEG data are visually inspected to remove segments with significant movement, thus obtaining clean EEG signals [35].
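The same five phases can be illustrated with MNE-Python, shown below; the paper performs them in EEGLAB. The mastoid channel names ("M1"/"M2"), the flagged bad channel, and the excluded ICA component index are assumptions for illustration, and a channel montage is assumed to be set on the recording so that interpolation can run.

```python
# Illustrative MNE-Python sketch of the EEG preprocessing phases in Section 2.3.1.
import mne

def preprocess_eeg(raw: mne.io.BaseRaw) -> mne.io.BaseRaw:
    raw = raw.copy().load_data()
    raw.set_eeg_reference(ref_channels=["M1", "M2"])  # average mastoid reference (assumed names)
    raw.resample(250)                                  # downsample to 250 Hz
    raw.filter(l_freq=1.0, h_freq=80.0)                # 1-80 Hz band-pass
    raw.notch_filter(freqs=50)                         # 50 Hz power-line notch
    raw.info["bads"] = ["FP1"]                         # hypothetical defective channel
    raw.interpolate_bads()                             # interpolate bad channels
    ica = mne.preprocessing.ICA(n_components=15, random_state=0)
    ica.fit(raw)
    ica.exclude = [0]                                  # e.g., ocular component flagged by inspection
    ica.apply(raw)
    return raw
```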

2.3.2. Functional Near-Infrared Spectroscopy-Electroencephalography (fNIRS-EEG) Correlation Analysis

In this study, Pearson correlation analysis [39] was performed to explore the relationship between the power spectral energy values of each EEG frequency band and the HbO features, including mean, peak, slope, variance, and kurtosis, in 30 subjects under four emotional states. The aim of this was to examine the potential relationships between fNIRS and EEG during emotion recognition.
Assuming that the power spectrum energy value of the EEG signal is as follows:
$$\mathbf{EEG} = \left[ EEG_{\delta 1}, EEG_{\theta 1}, EEG_{\alpha 1}, EEG_{\beta 1}, EEG_{\gamma 1}, EEG_{\delta 2}, \ldots, EEG_{\gamma 30} \right]$$
Here, $\delta, \theta, \alpha, \beta, \gamma$ denote the five different brain-wave bands of the EEG signal. The mean, peak, slope, variance, and kurtosis of the fNIRS signal are characterized as follows:
$$\mathbf{fNIRS} = \left[ HbO_{\text{Mean}}, HbO_{\text{Var}}, HbO_{\text{Slope}}, HbO_{\text{Peak}}, HbO_{\text{Kurt}} \right]$$
The Pearson correlation coefficients between the EEG signals and the fNIRS signals are as follows:
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, \sigma_x \sigma_y}$$
where $x_i$ denotes the $i$th power spectrum energy value of the EEG signal, $y_i$ denotes the $i$th feature value of the fNIRS signal, $\bar{x}$ and $\bar{y}$ denote the sample means of the EEG and fNIRS signals, respectively, $\sigma_x$ and $\sigma_y$ represent the sample standard deviations of the EEG and fNIRS signals, respectively, and $n$ is the sample size. We can combine the EEG power spectrum energy values and fNIRS feature values into a matrix $D$ as follows:
$$D = \begin{bmatrix} EEG_{\delta 1} & EEG_{\theta 1} & EEG_{\alpha 1} & EEG_{\beta 1} & EEG_{\gamma 1} & EEG_{\delta 2} & \cdots & EEG_{\gamma 30} \\ HbO_{\text{Mean}} & HbO_{\text{Var}} & HbO_{\text{Slope}} & HbO_{\text{Peak}} & HbO_{\text{Kurt}} & 0 & \cdots & 0 \end{bmatrix}$$
The first row of this matrix contains the power spectrum energy values of the EEG signals of all participants in the five frequency bands, while the second row contains the HbO features of the fNIRS signals of all participants; the trailing zeros pad the fNIRS row so that the two rows have equal length. The covariance matrix $S$ of matrix $D$ is obtained as follows:
$$S = \frac{D D^{T}}{n-1}$$
where $D^{T}$ denotes the transpose of $D$. We use the covariance matrix $S$ to calculate the correlation coefficient matrix $R$ as follows:
$$R = \begin{bmatrix} r_{EEG_{\delta 1}, HbO_{\text{Mean}}} & r_{EEG_{\theta 1}, HbO_{\text{Mean}}} & \cdots & r_{EEG_{\gamma 30}, HbO_{\text{Mean}}} \\ r_{EEG_{\delta 1}, HbO_{\text{Var}}} & r_{EEG_{\theta 1}, HbO_{\text{Var}}} & \cdots & r_{EEG_{\gamma 30}, HbO_{\text{Var}}} \\ r_{EEG_{\delta 1}, HbO_{\text{Slope}}} & r_{EEG_{\theta 1}, HbO_{\text{Slope}}} & \cdots & r_{EEG_{\gamma 30}, HbO_{\text{Slope}}} \\ r_{EEG_{\delta 1}, HbO_{\text{Peak}}} & r_{EEG_{\theta 1}, HbO_{\text{Peak}}} & \cdots & r_{EEG_{\gamma 30}, HbO_{\text{Peak}}} \\ r_{EEG_{\delta 1}, HbO_{\text{Kurt}}} & r_{EEG_{\theta 1}, HbO_{\text{Kurt}}} & \cdots & r_{EEG_{\gamma 30}, HbO_{\text{Kurt}}} \end{bmatrix}$$
where each element $r_{xy}$ represents the correlation coefficient between an EEG frequency band $x$ and an fNIRS feature $y$. The coefficient approaches 1 for a strengthening positive linear relationship and −1 for a strengthening negative linear relationship between the two variables. By analyzing the correlation coefficient matrix $R$, we learn the degree of relationship and interaction between the EEG bands and the characteristics of the fNIRS signals. This helps us to understand the application of EEG and fNIRS signals in brain function research and diagnosis.
To enable comparisons between the fNIRS data and the EEG data, we normalized these data using z-score zero-mean normalization, which reduces individual differences [40] and ensures that the processed data $y(n)$ conform to the standard normal distribution. This enables the results of the correlation test between fNIRS and EEG to be obtained as follows:
$$y(n) = \frac{x(n) - \bar{x}}{\sigma}$$
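A compact NumPy/SciPy sketch of this procedure is given below: both feature sets are z-scored and the Pearson coefficient matrix $R$ is assembled element by element. Array shapes and variable names are illustrative, not those of the authors' code.

```python
# Sketch of the EEG-fNIRS correlation matrix computation in Section 2.3.2.
import numpy as np
from scipy.stats import zscore, pearsonr

# eeg_power: (n_subjects, n_band_features); hbo_feats: (n_subjects, 5 HbO features)
def correlation_matrix(eeg_power, hbo_feats):
    eeg_z = zscore(eeg_power, axis=0)        # z-score across participants
    hbo_z = zscore(hbo_feats, axis=0)
    R = np.empty((hbo_z.shape[1], eeg_z.shape[1]))
    for i in range(hbo_z.shape[1]):          # rows: HbO mean/var/slope/peak/kurtosis
        for j in range(eeg_z.shape[1]):      # cols: EEG band-power features
            R[i, j], _ = pearsonr(hbo_z[:, i], eeg_z[:, j])
    return R
```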

2.4. Temporal Convolutional Network (TC-ResNet) Model

2.4.1. Temporal Convolution for Emotion Recognition

The spectral features extracted from fNIRS-EEG data are input into the CNN for real-time recognition in this work, leading to significant results. We converted the fNIRS-EEG data to a time-frequency representation, i.e., $\Psi \in \mathbb{R}^{f_E \times T}$ (where $T$ represents the time axis and $f_E$ is the feature axis extracted in the frequency domain). Assuming a stride of 1 and zero-padding so that the input and output resolutions match, the output feature map is $Y \in \mathbb{R}^{w \times h \times c'}$, given the input tensor $X \in \mathbb{R}^{w \times h \times c}$ and weights $W \in \mathbb{R}^{k_w \times k_h \times c \times c'}$ [41,42].
To develop a quick and accurate real-time emotion recognition model, we must consider the fNIRS-EEG data acquired in each frame separately within a set of time series data. We set $w = T$, $h = 1$, $c = f_E$ to obtain $X_1 \in \mathbb{R}^{T \times 1 \times f_E}$. The new tensor $X_1$ is then used as the input to the temporal convolution with weights $W_1 \in \mathbb{R}^{3 \times 1 \times f_E \times c'}$, and the output feature map is $Y_1 \in \mathbb{R}^{t \times 1 \times c'}$. In this work, a temporal convolution is proposed that can overcome the issues associated with existing convolution operations, which are computationally intensive, run slower, and use a high memory overhead (see Figure 10).
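To make this reshaping concrete, the PyTorch sketch below contrasts an ordinary 2D convolution over the time-frequency map with the temporal convolution described above, where the frequency-feature axis $f_E$ is moved into the channel dimension so the kernel slides along time only. Tensor sizes and channel counts are illustrative, and the authors' implementation is reported in a different framework.

```python
# Temporal convolution vs. plain 2D convolution on a (T, f_E) time-frequency map.
import torch
import torch.nn as nn

T, f_E, c_out = 128, 40, 16
spec = torch.randn(1, 1, T, f_E)                  # (N, 1, T, f_E): plain 2D view

conv2d = nn.Conv2d(1, c_out, kernel_size=3, padding=1, bias=False)
y2d = conv2d(spec)                                # kernel slides over time AND features

x1 = spec.permute(0, 3, 2, 1)                     # (N, f_E, T, 1): features become channels
temporal = nn.Conv2d(f_E, c_out, kernel_size=(3, 1), padding=(1, 0), bias=False)
y1 = temporal(x1)                                 # kernel slides over time only
print(y2d.shape, y1.shape)                        # (1, 16, 128, 40) vs. (1, 16, 128, 1)
```

The smaller temporal-convolution output feature map is what drives the MAC and memory savings reported in Table 4.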
From Figure 10, it can be seen that our adopted approach effectively integrates all low-level features to construct higher-level features in the subsequent layer. It is important to note that this takes place without requiring the stacking of multiple layers, as low-level information features seamlessly develop into advanced information features. The key benefit of this approach is its capacity to achieve superior performance with a reduced number of layers whilst simultaneously capturing a wide spectrum of emotional features. To further determine the efficacy of this approach, we compared it with the existing two-dimensional convolution [43], maintaining consistent parameters, which are presented in Table 4.
Table 4 shows that the existing 2D convolution method exhibits 39 times higher MACs than our proposed temporal convolution method, implying a more substantial computational load. Furthermore, the feature map output from temporal convolution is smaller than that of 2D convolution, and this decreased feature map yields substantial reductions in computational load and memory usage. On the whole, our temporal convolution method offers outstanding performance, effectively reducing computational load and memory usage—factors that are vital in successfully implementing real-time emotion recognition.

2.4.2. Temporal Convolutional Network (TC-ResNet) Architecture

This study employs the ResNet architecture [44] to perform real-time emotion recognition, featuring a distinctive modification in which the conventional 3 × 3 kernel is replaced with an m × 1 kernel. Specifically, m = 3 is applied for the first layer, while m = 9 is utilized for subsequent layers. Notably, no bias is used in any convolutional or fully connected layer, and each batch normalization layer [45] contains trainable scaling and shifting parameters. Moreover, temporal convolution can be introduced to improve the effective receptive field. The original ResNet implementation is retained by using strided convolution and excluding dilated convolution in this modified architecture. The resulting network is referred to as TC-ResNet (see Figure 11).
Figure 11 shows that our framework is based on the TC-ResNet8 model. The model contains three residual blocks, and the channel numbers of its layers, including the initial convolutional layer, are 16, 24, 32, and 48. A shortcut is applied directly to achieve efficient emotion recognition when the dimensions of the input and output match, which can be seen in Figure 11b. When the dimensions differ, an additional conv-BN-ReLU module is used for dimension matching, as presented in Figure 11c. We used a variety of kernel sizes to improve the model’s focus on and extraction of distinct emotion-critical information (j × 1, where j = 3, 5, 7, 9). This design allows the network to autonomously determine the appropriate kernel size for each layer. As shown in Figure 11a, the well-established Squeeze-and-Excitation (SE) structure is seamlessly incorporated into the block. Furthermore, we introduce a width multiplier, denoted as k, facilitating the adjustment of channel numbers in each layer. This feature enables an appropriate model capacity to be selected based on specific constraints.
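The PyTorch sketch below illustrates one residual block consistent with this description: m × 1 temporal kernels, bias-free convolutions, batch normalization with affine parameters, and an extra conv-BN-ReLU shortcut when the dimensions differ. The SE stage is omitted, and the exact layer ordering of the authors' model may differ.

```python
# Illustrative TC-ResNet-style residual block with m x 1 temporal kernels.
import torch.nn as nn

class TCResidualBlock(nn.Module):
    def __init__(self, c_in, c_out, m=9, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, (m, 1), stride=(stride, 1),
                      padding=(m // 2, 0), bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, (m, 1), padding=(m // 2, 0), bias=False),
            nn.BatchNorm2d(c_out),
        )
        # Identity shortcut when shapes match (Figure 11b); conv-BN-ReLU otherwise (Figure 11c)
        if stride == 1 and c_in == c_out:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=(stride, 1), bias=False),
                nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```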

2.4.3. Temporal Convolutional Network (TC-ResNet8) Setup

This study acknowledges the potential influence of individual differences on classification outcomes by collecting data from all participants to create a consolidated dataset for real-time emotion recognition. The dataset was then divided into training, validation, and test sets, maintaining an 8:1:1 ratio, with each participant’s data exclusively assigned to one of those sets.
(1)
Data processing
As can be seen in Figure 12, the states of rest, calmness, sadness, joy, and fear were assigned the numerical labels 0, 1, 2, 3, and 4, respectively, for every sample in the fNIRS-EEG dataset. A sliding window approach was used to extract features from the fNIRS and EEG data, with the critical parameters detailed in Table 5. To determine the label for the current window, the mode of all labels within that window is computed. If a unique mode emerges, it is assigned as the label. When there are two modes, indicative of a transitional emotional state (rest), the label for the current window is set to the last-occurring mode. Each sample was treated as an individual data point, yielding a multidimensional time series input of dimensions (n_sample, n, n_features), in which n represents the number of samples continuously extracted for the current state. An additional dimension was appended to both inputs, with seq_len set to 1, following identical processing for fNIRS and EEG data. They were then fed into TC-ResNet8.
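A minimal sketch of this sliding-window segmentation and mode-based labeling rule is shown below. The window and step sizes are placeholders for the values listed in Table 5.

```python
# Sliding-window segmentation with mode-based labels (tie broken by last occurrence).
import numpy as np
from collections import Counter

def window_label(labels):
    labels = np.asarray(labels)
    counts = Counter(labels.tolist()).most_common()
    modes = [v for v, c in counts if c == counts[0][1]]
    if len(modes) == 1:
        return modes[0]
    # Two modes (transition through rest): keep the mode that occurs last in the window
    return max(modes, key=lambda v: np.flatnonzero(labels == v).max())

def slide(features, labels, win=50, step=10):   # win/step are placeholder values
    features, labels = np.asarray(features), np.asarray(labels)
    X, y = [], []
    for start in range(0, len(labels) - win + 1, step):
        X.append(features[start:start + win])
        y.append(window_label(labels[start:start + win]))
    return np.stack(X), np.array(y)             # shapes (n_sample, win, n_features), (n_sample,)
```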
(2)
Training
We used TensorFlow [46] to train and evaluate these models. To address overfitting, a weight decay of 0.001 and a dropout rate of 0.5 were used. The momentum for stochastic gradient descent was fixed at 0.9. Each model was trained from scratch five times. The learning rate commenced at 0.01 and gradually decreased according to a predefined rule every ten iterations. We also used early stopping based on a validation split [47].
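The hyperparameters in this paragraph can be assembled into a training loop as sketched below in PyTorch (the authors report TensorFlow). The decay factor 0.1, the patience value, the placeholder network, and the dummy tensors are assumptions for illustration.

```python
# Sketch of the training configuration: SGD (momentum 0.9, weight decay 0.001),
# step LR decay every 10 epochs from 0.01, dropout 0.5, early stopping on validation loss.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Dropout(0.5), nn.Linear(40, 5))  # stand-in for TC-ResNet8
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # assumed decay rule
loss_fn = nn.CrossEntropyLoss()

x_train, y_train = torch.randn(64, 40), torch.randint(0, 5, (64,))  # dummy feature windows
x_val, y_val = torch.randn(16, 40), torch.randint(0, 5, (16,))

best_val, patience, wait = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()
    scheduler.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val - 1e-4:      # improvement: reset patience counter
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:            # early stopping on the validation split
            break
```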
(3)
Evaluation
The key metrics for evaluating model performance include accuracy, precision, time, recall, and F1 score. Each model underwent 15 training sessions, after which their average performance was reported.
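For reference, the listed metrics can be computed with scikit-learn as below; the labels and predictions are hypothetical placeholders for the five states (0–4), and macro averaging is an assumption.

```python
# Sketch of the evaluation metrics: accuracy, precision, recall, F1, and wall-clock time.
import time
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 3, 4, 2, 1]   # placeholder labels
y_pred = [0, 1, 2, 3, 4, 2, 3]   # placeholder predictions

t0 = time.perf_counter()
acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
elapsed_ms = (time.perf_counter() - t0) * 1e3
print(f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f} ({elapsed_ms:.2f} ms)")
```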

3. Results and Analysis

3.1. Functional Near-Infrared Spectroscopy (fNIRS) Data Analysis

Brain oxygenation directly represents cerebral hemodynamic responses, which ultimately reflect brain activation characteristics. Several researchers have found that the PFC plays a vital role in eliciting emotional responses [48]. fNIRS is an ideal approach to capturing PFC neural activation and has been widely applied by cognitive neuroscientists examining the effects of emotions on cognition [49]. This study strategically covered emotion-related frontal lobe areas based on the fNIRS channel layout. Rigorous preprocessing was performed to increase channel quality, including calculating the correlation coefficient of the HbO concentrations for each channel; any channel with a correlation coefficient of −1 or exceeding 0.5 was flagged as abnormal. After statistical analyses at the individual level using generalized linear models, trends in the mean concentration of oxyhemoglobin were obtained for the 30 participants in different emotional states. This approach yielded the average brain activation maps for all participants across emotions (see Figure 13).
As depicted in Figure 13, the emotion induction phase significantly increases activity in the frontal brain region. Data from the 30 s preceding the onset of each emotional block were used to establish baseline values. By subtracting these baseline values, and then averaging across the 24 channels for each emotional state, we created brain activation maps corresponding to specific emotions (see Figure 13a–d, which align with Figure 13e,f). The results indicate significant hemodynamic responses during the induction of emotions, with fear and joy exhibiting the most pronounced effects. To be more precise, channels 11, 12, and 18 are notably activated when inducing a calm emotional state, as depicted in Figure 13a,e,f. With regard to a sad emotional state, channels 6, 12, and 18 exhibit significance, as presented in Figure 13b,e,f. Channels 4, 5, and 14 show significance in inducing a happy emotional state, as presented in Figure 13c,e,f. Finally, for the fearful emotional state, channels 3, 6, 14, 15, and 20 display significance, as presented in Figure 13d–f. In addition, we calculated the mean, SD, and SEM of the single-channel HbO concentration changes for all participants in the four moods, as shown in Figure 13g.
To further analyze this phenomenon, we segmented the data and obtained concentration change curves of HbO for the significance channels under four emotions after baseline calibration (see Figure 14).
In Figure 14, participants showed relatively consistent trends in the mean HbO concentration in the significant channels across the four emotional states. When the calm emotion was induced, the mean HbO concentration in the frontal lobe initially showed pronounced fluctuations, which gradually stabilized over time. This pattern may be attributed to the initial impact of the movie clip, followed by a gradual calming of the inner self while watching a calm video, resulting in decreased activity in the frontal lobe region (see Figure 14a). During the induction of sad emotions, frontal lobe brain activity was significantly higher than that induced by calm emotions, although there were more pronounced fluctuations towards the end. This behavior may be associated with the concentration of sad emotional segments intensifying towards the end of the clip (see Figure 14b).
Meanwhile, when happy emotions were induced, brain activity in the frontal lobe exhibited more pronounced fluctuations, indicating an excited state in the frontal lobe region, which can be seen in Figure 14c. On the other hand, frontal lobe activity gradually decreased when inducing fearful emotions and stabilized over time. This observed trend may be due to individual differences; the experimental participant noted that the latter part of the horror-inducing movie clip lacked a significant horror effect, causing an actual induction of fearful emotions that was different than expected, as presented in Figure 14d. To summarize, increasing emotional intensity enhances hemodynamic responses compared to calm emotion induction. During emotion induction, the amplitude and the mean value of changes in HbO concentration increase, and this is accompanied by an elevated oxygen consumption rate. These physiological responses are likely related to the execution of brain functions and increased metabolic demands during emotion induction. These research findings contribute to a deeper understanding of the relationship between emotions and physiological processes, providing critical insights into emotion regulation and emotional disorders.

3.2. Electroencephalography (EEG) Data Analysis

In neuropsychology, many studies have highlighted a relationship between EEG and emotions [50]. In the frequency domain, a relationship exists between spectral power in distinct frequency bands and emotional states [51]. The α rhythm varies under different emotional states, highlighting changes associated with valence or discrete emotions such as happiness, sadness, and fear [52]. Moreover, the positive asymmetry of the α spectrum is consistently linked to valence stability. Subsequent research has found that frontal α rhythm asymmetry indicates approach/withdrawal tendencies related to emotions rather than the valence of emotions per se [53]. To examine these dynamics in more depth, our study considered the entire induction period, calculating average amplitude changes in Delta (1–4 Hz), Theta (4–8 Hz), Alpha (8–12 Hz), Beta (12–30 Hz), and Gamma (30–80 Hz) rhythms for each channel under various emotional states. This approach is presented in Figure 15, which visually presents the differences between different emotions.
In Figure 15, there are differences in each participant’s frequency bands under different emotional states. During the calm emotional state, EEG activity predominantly manifests in the Delta, Theta, and Alpha waves. The emphasis moves to the Theta and Alpha waves in the sad emotional state. During the happy emotional state, there is heightened activity in the Theta, Alpha, and Beta waves, while during the fearful emotional state, the focus is on the Delta, Theta, Alpha, and Beta waves.
Given the significance of the frontal lobe in emotional processing, we used the Welch approach to calculate the average power spectral energy values in different frequency bands for each lead. Normalization was applied to the average values for each participant to overcome the effects of inter-subject differences in power spectral energy on results. The dataset was then segmented to extract EEG data in different frequency bands (see Figure 16).
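The band-power computation just described can be sketched with SciPy's Welch estimator as below; the band edges follow the rhythm definitions given earlier, the 250 Hz rate follows the downsampling step in Section 2.3.1, and nperseg is an assumed value. The per-participant normalization mentioned above is applied afterwards and is not shown here.

```python
# Welch power spectral density integrated over the five canonical EEG bands.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 80)}

def band_powers(eeg, fs=250):
    """eeg: (n_channels, n_samples) -> dict of mean band power per channel."""
    f, pxx = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)   # nperseg is an assumption
    return {name: pxx[:, (f >= lo) & (f < hi)].mean(axis=-1)
            for name, (lo, hi) in BANDS.items()}
```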
In Figure 16, a discernible pattern within the three low-frequency bands (Delta, Theta, and Alpha waves) can be seen. Mean power spectral energy values align consistently with the emotional states, ranking in descending order as calm, happy, sad, and fearful. Similarly, there is a clear and consistent pattern with the Beta and Gamma waves of the mid-to-high-frequency bands. Here, mean power spectral energy values align with emotional states in descending order: fearful, calm, happy, and sad. This indicates that individuals exhibit a closer action tendency during relaxed and comfortable emotional inductions in the low-frequency range. Meanwhile, when faced with sad and fearful emotional inductions, individuals display a more distant action tendency. Additionally, participants exhibited varying distributions across five distinct EEG power spectra during emotional states of calm, sadness, happiness, and fear (see Figure 16f). Specifically, during calm emotional states, Delta, Theta, and Alpha waves predominated in EEG activity. In sad emotional states, the emphasis shifted towards Theta and Alpha waves. Conversely, during happy emotional states, increased activity was observed across the Theta, Alpha, and Beta waves, while fearful emotional states showed heightened activity across Delta, Theta, Alpha, and Beta waves.

3.3. Functional Near-Infrared Spectroscopy-Electroencephalography (fNIRS-EEG) Data Correlation

Electroencephalographic activity, particularly in the low-frequency Delta and Theta bands, is intrinsically related to cortical hemodynamic responses [54]. In this work, a significant relationship has been identified between emotions and the Alpha and Beta frequency bands. After using the Gaussian criterion to eliminate outliers from the EEG band power spectrum features and brain oxygen response features (mean, peak, variance, slope), Shapiro−Wilk normality tests were performed to ensure that the data used in the correlation analysis exhibited a normal distribution. A two-tailed Pearson correlation analysis was conducted to closely examine the associational characteristics between the two, which are presented in Table 6.
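The statistical pipeline just described can be sketched as follows: outliers removed with a Gaussian criterion, a Shapiro–Wilk normality check, then a two-tailed Pearson test. The 3-sigma threshold is an assumption; the paper states only that a Gaussian criterion was used.

```python
# Outlier removal, normality check, and two-tailed Pearson correlation (Section 3.3).
import numpy as np
from scipy.stats import shapiro, pearsonr

def correlate(band_power, hbo_feature, k=3.0):
    x = np.asarray(band_power, dtype=float)
    y = np.asarray(hbo_feature, dtype=float)
    keep = (np.abs(x - x.mean()) < k * x.std()) & (np.abs(y - y.mean()) < k * y.std())
    x, y = x[keep], y[keep]                     # Gaussian (k-sigma) outlier rejection
    _, p_x = shapiro(x)                         # Shapiro-Wilk normality tests
    _, p_y = shapiro(y)
    r, p = pearsonr(x, y)                       # two-tailed p-value by default
    return r, p, (p_x > 0.05 and p_y > 0.05)    # last flag: normality not rejected
```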
In Table 6, the results of the correlation test between EEG power spectrum features of the frontal cortex, and z-score normalized fNIRS features for all 30 participants are presented. Significant correlations predominantly occur for fNIRS features, specifically mean, variance, slope, and kurtosis: Delta-slope (r = −0.725, p = 0.028), Theta-slope (r = −0.705, p = 0.030), Alpha-variance (r = −0.225, p = 0.020), Alpha-kurtosis (r = −0.959, p = 0.041), Beta-variance (r = −0.854, p = 0.036), Gamma-mean (r = 0.988, p = 0.0122). Moreover, a negative relationship can be seen between the EEG power spectrum energy values and both the peak and variance of the fNIRS features. In turn, this indicates that hemodynamic responses manifest with lower peaks and more minor fluctuations in blood oxygen concentration as emotional power spectrum energy increases. On the other hand, there is a positive relationship between the EEG energy values and the fNIRS slope and kurtosis, indicating that higher energy corresponds with a slower oxygen consumption rate of oxygenated hemoglobin (which is evident in the negative slope of the task period curve, in which a more significant slope indicates slower fluctuations and a smaller slope represents steeper fluctuations). These correlations align with the expected experimental outcomes and are consistent with the hemodynamic responses presented in Figure 13 and Figure 14, as well as the electrophysiological results in Figure 15 and Figure 16. To summarize, there is a clear and significant relationship between EEG average energy values and fNIRS features under different emotional states, providing an effective approach for quantitatively assessing the interrelated features of brain electrophysiology and hemodynamic responses.

3.4. Analysis of Emotion Recognition Classification Results

In this study, we harnessed the capabilities of TC-ResNet8 to perform emotion recognition classification experiments using fNIRS-EEG data, achieving precise emotion classification results. Moreover, we examined the emotion classification outcomes of two single-modal models (fNIRS and EEG) based on previous research and the dual-modal model presented in this paper (detailed in Table 7).
In Table 7, it can be seen that the accuracy of emotion recognition in the dual-modal fNIRS-EEG approach increases by 8.37% compared to the advanced single-modal fNIRS, and by 0.24% in comparison to the advanced single-modal EEG. This suggests that there is complementary information present between the features of fNIRS and EEG, enabling the inherent limitations in each single modality to be overcome and ultimately enhancing the accuracy of dual-modal emotion recognition classification. These findings have significant implications for future emotion recognition research and applications, providing valuable insights and a theoretical foundation upon which more accurate and reliable emotion recognition technologies can be developed.
In addition, we further explored the classification accuracy in five different states (0, 1, 2, 3, 4), which can provide more insight into the predictive power of the model and can help identify potential areas where the model may underperform, as illustrated in Figure 17.
As can be seen from Figure 17, our model’s classification ability is robust across all states, with the classification for state 2 being perfectly accurate. The slightly lower value observed for state 0 (although still very high) suggests that there might be a minor scope for improvement, and we will consider investigating this further in future research.

3.5. Temporal Convolutional Network (TC-ResNet) Model Evaluation

To further examine the sophistication of this model, nine baseline models for emotion classification were selected, after which a comparative analysis of performance metrics was performed, including datatype, accuracy, and time. Specific results are outlined in Table 8.
Table 8 distinctly showcases the significant enhancements our model brings to the field of emotion recognition by leveraging temporal convolution to bolster both accuracy and time efficiency. In the comparative assessment of emotion recognition models, we thoroughly evaluated the methodologies applied to fNIRS-EEG data. The strengths of our proposed model are evident in both predictive accuracy and computational efficiency. In comparison to random forest, RHMM-SVM, and SVM, our model showcased respective accuracy enhancements of 0.7%, 24.81%, and 19.81%. Furthermore, when contrasted with traditional convolutional neural networks (CNNs), our model demonstrated a substantial accuracy improvement of 17.81%. This advancement is particularly noteworthy, as our approach surpassed sophisticated architectures such as CNN-Transformer, R-CSP-E, backpropagation ANN, MA-MP-GF, and stacking ensemble learning, achieving noteworthy improvements of 13.11%, 32.98%, 2.89%, 4.1%, and 3.98%, highlighting the exceptional capabilities of our model. These comparative analyses indicate that our research model balances time and accuracy, enabling real-time and precise emotion recognition to be performed. Notably, this study also examines the model’s scalability by extending its performance through an increased layer count or width multiplication. In turn, this results in the development of more extensive variant models, such as TC-ResNet14 (see Figure 18), which affirm our model’s adaptability and potential for escalated complexity and capability.
In Figure 18, we expanded the network by incorporating more than twice the number of residual blocks than TC-ResNet8, yielding TC-ResNet14. Interestingly, the TC-ResNet8-1.5 signifies an expansion of the width multiplier of the TC-ResNet8 model to 1.5, enabling models with channel numbers {24, 36, 48, 72}, respectively, to be developed. Similarly, TC-ResNet14-1.5 indicates an extension of the width multiplier of the TC-ResNet14 model to 1.5, generating models with channel numbers {24, 36, 48, 72}. We also enhanced the TC-ResNet8 model by incorporating traditional 3 × 3 2D convolutions, facilitating the development of the 2D-TC-ResNet8 model. We then introduced an average pooling layer after the first convolutional layer, to reduce the number of operations and minimize accuracy loss on this model, resulting in the improved TC-ResNet8-pooling model. The comparison of accuracy, time, and FLOPs performance metrics for these variant TC-ResNet models can be seen in Table 9.
When compared to the model employed in this study, the TC-ResNet8-1.5, TC-ResNet14, and TC-ResNet14-1.5 models exhibited marginal increases in accuracy by 0.01%, 0.02%, and 0.04%, respectively. Nonetheless, their processing speeds decelerated by 2.5 times, 2.3 times, and 5.2 times, respectively. These three models were found to have higher accuracy but higher processing times. With regard to 2D-TC-ResNet8 and TC-ResNet8-pooling (which maintained the architecture and parameters of the TC-ResNet8 model while incorporating 2D convolution), their accuracy declined by 0% and 1.18%, respectively. Nonetheless, they operated 9.2 times and 3.2 times slower than TC-ResNet8. To summarize, the models employed in this study strike a balance between accuracy and time, using fewer computational resources and operations to achieve fast, real-time, and accurate emotion recognition on the fNIRS-EEG system. This research also highlights the efficacy of temporal convolution in a multitude of network architectures.

4. Results and Discussion

Our research introduces a novel wearable dual-modal system that seamlessly integrates fNIRS and EEG and employs a state-of-the-art TC-ResNet for real-time implicit emotion recognition. The distinctiveness of our fNIRS-EEG ensemble lies in its portability, energy efficiency, wireless capabilities, and scalable architecture, providing a real-time visual interface for monitoring cerebral electrical and hemodynamic changes across diverse real-world scenarios. Our comprehensive emotional detection strategy covers the entire spectrum from system architecture and deployment to signal processing and interpretation. Through extensive evaluations involving 30 subjects under four emotion induction protocols, our bimodal system demonstrated outstanding performance, achieving 99.81% classification accuracy. This substantiates its excellence in detecting emotions and highlights the intricate interconnection between fNIRS and EEG signals. Comparative analyses with the latest unimodal identification methods underscore the superiority of our bimodal approach, with accuracy gains of 0.24% over EEG and 8.37% over fNIRS used independently. Moreover, the proposed TC-ResNet-driven temporal convolutional fusion technique outperforms conventional fNIRS-EEG fusion methods, improving recognition accuracy by margins ranging from 0.7% to 32.98%.
In recent years, advances have been made in the synchronous monitoring of brain activity via combined fNIRS and EEG technologies. Despite these advances, the parameters of existing systems still impose obstacles on researchers. This study presents a comprehensive analysis contrasting the key technical parameters of conventional fNIRS-EEG systems with those of our newly engineered system, highlighting the latter's considerable advantages along several critical dimensions (see Table 10). Our system adopts the dual-wavelength near-infrared technology established in existing research. We have increased the channel count, yielding richer signal acquisition and resolution and thus higher sensitivity to biological signals. The higher fNIRS/EEG sampling rates in our system capture rapid biosignal dynamics, a distinctive advantage for real-time monitoring. In addition, optimizing the optode intervals through multi-distance measurements permits more precise surveillance of subcortical processes, and the system's multi-distance source-detector detection supports accurate identification of changes in fine biological signals. Our modern data transmission infrastructure ensures seamless handling of large datasets in real time, reinforcing our capability to conduct comprehensive, large-scale studies. The high-density measurement layout not only grants precise spatial mapping of biological signals but also permits meticulous scrutiny of minute interregional cerebral variations. The system's modular construction and intuitive interface markedly reduce operational complexity, allowing researchers to devote greater focus to experimental design and data interpretation. Moreover, its compact, portable configuration is well suited to laboratory use and paves the way for practical applications in fields such as personalized healthcare and live emotional tracking. Through this comparative evaluation and innovative design, our system exhibits major performance improvements, broadening the scope of fNIRS-EEG technology applications within neuroscience and cognitive research and laying a substantial groundwork for forthcoming brain-computer interface endeavors.

5. Conclusions

Our cross-disciplinary research effort has culminated in the creation of a state-of-the-art dual-mode wearable device. This device seamlessly merges fNIRS with EEG, allowing for the precise detection and interpretation of users' emotional states. With its lightweight and modular design, multiple channels, wireless capabilities, and straightforward operation, the fNIRS-EEG system comprehensively addresses the portability challenges inherent in conventional devices. This study pioneers the introduction of TC-ResNet technology into emotion recognition. By integrating this technology, our system achieves breakthrough improvements in both accuracy and response speed, substantially outperforming traditional and single-modality models. The data preprocessing stage was approached with meticulous attention to detail: we conducted a thorough quality check of the fNIRS signals, transformed raw intensities into HbO and HbR concentrations, and filtered out movement artifacts and physiological noise. EEG signal processing was likewise fortified by re-referencing, filtering, rigorous visual artifact inspection and removal, interpolation of compromised channels, and ICA, all contributing to the integrity of our brain signal collection. Our conclusive findings, derived from an extensive evaluation encompassing 30 subjects and four emotion induction protocols, demonstrate that our bimodal system achieves 99.81% classification accuracy. The fusion of fNIRS and EEG signals not only confirms their interconnectivity but also establishes the superiority of our bimodal system over its single-modal counterparts, with accuracy gains of 0.24% over EEG and 8.37% over fNIRS used independently. Furthermore, our dual-modal approach utilizing TC-ResNet outperforms conventional fNIRS-EEG fusion techniques, improving recognition accuracy by a margin of up to 32.98%. Our pragmatic and carefully articulated emotional detection system paves the way for a revolutionary shift in personalized healthcare, adaptive learning ecosystems, and enhanced human-computer interaction frameworks. The outcomes of this work hold promising potential for future advancements in emotion analysis and emotion-brain-machine interface development, reshaping how emotional responses are perceived and utilized.
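To illustrate the kind of preprocessing chain summarized above, the sketch below uses the open-source MNE-Python package. It is not the authors' exact pipeline: the filter cutoffs, ICA component count, excluded component indices, and partial pathlength factor are illustrative placeholders.

```python
import mne


def preprocess_eeg(raw: mne.io.BaseRaw) -> mne.io.BaseRaw:
    """Illustrative EEG cleaning: re-reference, band-pass filter, repair bad
    channels, and remove artifact components identified with ICA."""
    raw = raw.copy().load_data()
    raw.set_eeg_reference("average")               # re-referencing
    raw.filter(l_freq=0.5, h_freq=45.0)            # band-pass filter (illustrative cutoffs)
    raw.interpolate_bads(reset_bads=True)          # channels previously marked bad by inspection
    ica = mne.preprocessing.ICA(n_components=15, random_state=0)
    ica.fit(raw)
    ica.exclude = [0]                              # component indices judged to be artifacts
    ica.apply(raw)
    return raw


def preprocess_fnirs(raw_intensity: mne.io.BaseRaw) -> mne.io.BaseRaw:
    """Illustrative fNIRS conversion: raw intensity -> optical density ->
    motion-artifact repair -> HbO/HbR via the modified Beer-Lambert law."""
    od = mne.preprocessing.nirs.optical_density(raw_intensity)
    od = mne.preprocessing.nirs.temporal_derivative_distribution_repair(od)
    haemo = mne.preprocessing.nirs.beer_lambert_law(od, ppf=6.0)
    haemo.filter(l_freq=0.01, h_freq=0.2)          # retain the slow haemodynamic band
    return haemo
```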

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/2079-9292/13/7/1310/s1, Supplementary Materials S1: participants’ self-assessment scale; Supplementary Materials S2: fNIRS optodes.

Author Contributions

The authors confirm their contribution to the paper as follows: Conceptualization, S.Z., D.Z. and J.C.; data curation, F.W.; formal analysis, Y.B., Z.Z. and D.Z.; methodology, Z.Z., J.C. and K.Y.; writing—original draft, J.C.; writing—review and editing, J.C. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (62205210).

Institutional Review Board Statement

All subjects gave their informed consent for inclusion before they participated in this study. This study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Institutional Review Board of University of Shanghai for Science and Technology (protocol code: IRB-AF65-V1.0, and date of approval: 24 November 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

In accordance with the requirements of the project agreement, and because the project is still under active research, public access to certain datasets is restricted. These datasets have been provided with the permission of the University of Shanghai for Science and Technology and can be accessed by contacting the institution's designated representative, Xing Hu, via email ([email protected]). Other data from this study are available in the article.

Acknowledgments

The authors are thankful to the anonymous reviewers and editors for their valuable comments and suggestions. This project relies on the innovation platform of the University of Shanghai for Science and Technology. We would also like to thank Zhuang Songlin of the Chinese Academy of Engineering and Dawei Zhang for their help with this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jiang, X.; Fan, J.; Zhu, Z.; Wang, Z.; Guo, Y.; Liu, X.; Jia, F.; Dai, C. Cybersecurity in neural interfaces: Survey and future trends. Comput. Biol. Med. 2023, 167, 107604. [Google Scholar] [CrossRef]
  2. Liu, Z.-T.; Xie, Q.; Wu, M.; Cao, W.-H.; Mei, Y.; Mao, J.-W. Speech emotion recognition based on an improved brain emotion learning model. Neurocomputing 2018, 309, 145–156. [Google Scholar] [CrossRef]
  3. Liu, H.; Cai, H.; Lin, Q.; Zhang, X.; Li, X.; Xiao, H. FEDA: Fine-grained emotion difference analysis for facial expression recognition. Biomed. Signal Process. Control 2023, 79, 104209. [Google Scholar] [CrossRef]
  4. Zhang, F.; Li, X.-C.; Lim, C.P.; Hua, Q.; Dong, C.-R.; Zhai, J.-H. Deep Emotional Arousal Network for Multimodal Sentiment Analysis and Emotion Recognition. Inf. Fusion 2022, 88, 296–304. [Google Scholar] [CrossRef]
  5. Rahman, M.M.; Sarkar, A.K.; Hossain, M.A.; Hossain, M.S.; Islam, M.R.; Hossain, M.B.; Quinn, J.M.W.; Moni, M.A. Recognition of human emotions using EEG signals: A review. Comput. Biol. Med. 2021, 136, 104696. [Google Scholar] [CrossRef]
  6. Eastmond, C.; Subedi, A.; De, S.; Intes, X. Deep learning in fNIRS: A review. Neurophotonics 2022, 9, 041411. [Google Scholar] [CrossRef]
  7. Vanutelli, M.E.; Grippa, E. 104. Resting lateralized activity (fNIRS) predicts the cortical response and appraisal of emotions. Clin. Neurophysiol. 2016, 127, e156. [Google Scholar] [CrossRef]
  8. Bandara, D.; Velipasalar, S.; Bratt, S.; Hirshfield, L. Building predictive models of emotion with functional near-infrared spectroscopy. Int. J. Hum.-Comput. Stud. 2018, 110, 75–85. [Google Scholar] [CrossRef]
  9. Manelis, A.; Huppert, T.J.; Rodgers, E.; Swartz, H.A.; Phillips, M.L. The role of the right prefrontal cortex in recognition of facial emotional expressions in depressed individuals: fNIRS study. J. Affect. Disord. 2019, 258, 151–158. [Google Scholar] [CrossRef]
  10. Floreani, E.D.; Orlandi, S.; Chau, T. A pediatric near-infrared spectroscopy brain-computer interface based on the detection of emotional valence. Front. Hum. Neurosci. 2022, 16, 938708. [Google Scholar] [CrossRef]
  11. Yeung, M.K. The prefrontal cortex is differentially involved in implicit and explicit facial emotion processing: An fNIRS study. Biol. Psychol. 2023, 181, 108619. [Google Scholar] [CrossRef]
  12. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying Stable Patterns over Time for Emotion Recognition from EEG. IEEE Trans. Affect. Comput. 2019, 10, 417–429. [Google Scholar] [CrossRef]
  13. Zhang, Y.; Chen, J.; Tan, J.H.; Chen, Y.; Chen, Y.; Li, D.; Yang, L.; Su, J.; Huang, X.; Che, W. An Investigation of Deep Learning Models for EEG-Based Emotion Recognition. Front. Neurosci. 2020, 14, 622759. [Google Scholar] [CrossRef]
  14. Gao, Q.; Yang, Y.; Kang, Q.; Tian, Z.; Song, Y. EEG-based Emotion Recognition with Feature Fusion Networks. Int. J. Mach. Learn. Cybern. 2022, 13, 421–429. [Google Scholar] [CrossRef]
  15. Zheng, Y.; Ding, J.; Liu, F.; Wang, D. Adaptive neural decision tree for EEG based emotion recognition. Inf. Sci. 2023, 643, 119160. [Google Scholar] [CrossRef]
  16. Cao, J.; Huppert, T.J.; Grover, P.; Kainerstorfer, J.M. Enhanced spatiotemporal resolution imaging of neuronal activity using joint electroencephalography and diffuse optical tomography. Neurophotonics 2021, 8, 015002. [Google Scholar] [CrossRef]
  17. Abtahi, M.; Borgheai, S.B.; Jafari, R.; Constant, N.; Diouf, R.; Shahriari, Y.; Mankodiya, K. Merging fNIRS-EEG Brain Monitoring and Body Motion Capture to Distinguish Parkinson's Disease. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 1246–1253. [Google Scholar] [CrossRef]
  18. Tan, X.; Fan, Y.; Sun, M.; Zhuang, M.; Qu, F. An emotion index estimation based on facial action unit prediction. Pattern Recognit. Lett. 2022, 164, 183–190. [Google Scholar] [CrossRef]
  19. Bendjoudi, I.; Vanderhaegen, F.; Hamad, D.; Dornaika, F. Multi-label, multi-task CNN approach for context-based emotion recognition. Inf. Fusion 2021, 76, 422–428. [Google Scholar] [CrossRef]
  20. Alruily, M. Sentiment analysis for predicting stress among workers and classification utilizing CNN: Unveiling the mechanism. Alex. Eng. J. 2023, 81, 360–370. [Google Scholar] [CrossRef]
  21. Jiang, Y.C.; Ma, R.; Qi, S.; Ge, S.; Sun, Z.; Li, Y.; Song, J.; Zhang, M. Characterization of Bimanual Cyclical Tasks From Single-Trial EEG-fNIRS Measurements. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 146–156. [Google Scholar] [CrossRef]
  22. Yi, L.; Xie, G.; Li, Z.; Li, X.; Zhang, Y.; Wu, K.; Shao, G.; Lv, B.; Jing, H.; Zhang, C.; et al. Automatic depression diagnosis through hybrid EEG and near-infrared spectroscopy features using support vector machine. Front. Neurosci. 2023, 17, 1205931. [Google Scholar] [CrossRef]
  23. Lin, J.; Lu, J.; Shu, Z.; Han, J.; Yu, N. Subject-Specific Modeling of EEG-fNIRS Neurovascular Coupling by Task-Related Tensor Decomposition. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 452–461. [Google Scholar] [CrossRef]
  24. Carvalho, S.; Leite, J.; Galdo-Álvarez, S.; Gonçalves, Ó.F. The Emotional Movie Database (EMDB): A Self-Report and Psychophysiological Study. Appl. Psychophysiol. Biofeedback 2012, 37, 279–294. [Google Scholar] [CrossRef]
  25. Zheng, Q.; Chi, A.; Shi, B.; Wang, Y.; Ma, Q.; Zhou, F.; Guo, X.; Zhou, M.; Lin, B.; Ning, K. Differential features of early childhood motor skill development and working memory processing: Evidence from fNIRS. Front. Behav. Neurosci. 2023, 17, 1279648. [Google Scholar] [CrossRef]
  26. Karmakar, S.; Kamilya, S.; Dey, P.; Guhathakurta, P.K.; Dalui, M.; Bera, T.K.; Halder, S.; Koley, C.; Pal, T.; Basu, A. Real time detection of cognitive load using fNIRS: A deep learning approach. Biomed. Signal Process. Control 2023, 80, 104227. [Google Scholar] [CrossRef]
  27. Jahani, S.; Setarehdan, S.K.; Boas, D.A.; Yücel, M.A. Motion artifact detection and correction in functional near-infrared spectroscopy: A new hybrid method based on spline interpolation method and Savitzky–Golay filtering. Neurophotonics 2018, 5, 015003. [Google Scholar] [CrossRef]
  28. Hong, K.-S.; Khan, M.J.; Hong, M.J. Feature Extraction and Classification Methods for Hybrid fNIRS-EEG Brain-Computer Interfaces. Front. Hum. Neurosci. 2018, 12, 246. [Google Scholar] [CrossRef]
  29. Bizzego, A.; Balagtas, J.P.M.; Esposito, G. Commentary: Current Status and Issues Regarding Pre-processing of fNIRS Neuroimaging Data: An Investigation of Diverse Signal Filtering Methods Within a General Linear Model Framework. Front. Hum. Neurosci. 2020, 14, 00247. [Google Scholar] [CrossRef]
  30. Firooz, S.; Setarehdan, S.K. IQ estimation by means of EEG-fNIRS recordings during a logical-mathematical intelligence test. Comput. Biol. Med. 2019, 110, 218–226. [Google Scholar] [CrossRef]
  31. Fogazzi, D.V.; Neary, J.P.; Sonza, A.; Reppold, C.T.; Kaiser, V.; Scassola, C.M.; Casali, K.R.; Rasia-Filho, A.A. The prefrontal cortex conscious and unconscious response to social/emotional facial expressions involve sex, hemispheric laterality, and selective activation of the central cardiac modulation. Behav. Brain Res. 2020, 393, 112773. [Google Scholar] [CrossRef]
  32. Stropahl, M.; Bauer, A.-K.R.; Debener, S.; Bleichner, M.G. Source-Modeling Auditory Processes of EEG Data Using EEGLAB and Brainstorm. Front. Hum. Neurosci. 2018, 12, 2018. [Google Scholar] [CrossRef]
  33. Zheng, J.; Li, Y.; Zhai, Y.; Zhang, N.; Yu, H.; Tang, C.; Yan, Z.; Luo, E.; Xie, K. Effects of sampling rate on multiscale entropy of electroencephalogram time series. Biocybern. Biomed. Eng. 2023, 43, 233–245. [Google Scholar] [CrossRef]
  34. Aghajani, H.; Garbey, M.; Omurtag, A. Measuring Mental Workload with EEG+fNIRS. Front. Hum. Neurosci. 2017, 11, 00359. [Google Scholar] [CrossRef]
  35. Dong, Y.; Tang, X.; Li, Q.; Wang, Y.; Jiang, N.; Tian, L.; Zheng, Y.; Li, X.; Zhao, S.; Li, G.; et al. An Approach for EEG Denoising Based on Wasserstein Generative Adversarial Network. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 3524–3534. [Google Scholar] [CrossRef]
  36. Li, R.; Zhao, C.; Wang, C.; Wang, J.; Zhang, Y. Enhancing fNIRS Analysis Using EEG Rhythmic Signatures: An EEG-Informed fNIRS Analysis Study. IEEE Trans. Biomed. Eng. 2020, 67, 2789–2797. [Google Scholar] [CrossRef]
  37. Abidi, A.; Nouira, I.; Assali, I.; Saafi, M.A.; Bedoui, M.H. Hybrid Multi-Channel EEG Filtering Method for Ocular and Muscular Artifact Removal Based on the 3D Spline Interpolation Technique. Comput. J. 2022, 65, 1257–1271. [Google Scholar] [CrossRef]
  38. Kang, G.; Jin, S.-H.; Keun Kim, D.; Kang, S.W. T59. EEG artifacts removal using machine learning algorithms and independent component analysis. Clin. Neurophysiol. 2018, 129, e24. [Google Scholar] [CrossRef]
  39. Rosenbaum, D.; Leehr, E.J.; Kroczek, A.; Rubel, J.A.; Int-Veen, I.; Deutsch, K.; Maier, M.J.; Hudak, J.; Fallgatter, A.J.; Ehlis, A.-C. Neuronal correlates of spider phobia in a combined fNIRS-EEG study. Sci. Rep. 2020, 10, 12597. [Google Scholar] [CrossRef]
  40. Xu, H.; Li, C.; Shi, T. Is the z-score standardized RSEI suitable for time-series ecological change detection? Comment on Zheng et al. (2022). Sci. Total Environ. 2022, 853, 158582. [Google Scholar] [CrossRef]
  41. Zhang, Y.; Suda, N.; Lai, L.; Chandra, V. Hello Edge: Keyword Spotting on Microcontrollers. arXiv 2017, arXiv:1711.07128. [Google Scholar] [CrossRef]
  42. Tang, R.; Lin, J. Deep Residual Learning for Small-Footprint Keyword Spotting. arXiv 2017, arXiv:1710.10361. [Google Scholar] [CrossRef]
  43. Cheng, C.; Parhi, K.K. Fast 2D Convolution Algorithms for Convolutional Neural Networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 1678–1691. [Google Scholar] [CrossRef]
  44. He, F.; Liu, T.; Tao, D. Why ResNet Works? Residuals Generalize. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5349–5362. [Google Scholar] [CrossRef]
  45. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar] [CrossRef]
  46. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Zhang, X. TensorFlow: A system for large-scale machine learning. arXiv 2016, arXiv:1605.08695. [Google Scholar] [CrossRef]
  47. Prechelt, L. Early Stopping—But When? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar] [CrossRef]
  48. Zibman, S.; Daniel, E.; Alyagon, U.; Etkin, A.; Zangen, A. Interhemispheric cortico-cortical paired associative stimulation of the prefrontal cortex jointly modulates frontal asymmetry and emotional reactivity. Brain Stimul. 2019, 12, 139–147. [Google Scholar] [CrossRef]
  49. Segar, R.; Chhabra, H.; Sreeraj, V.S.; Parlikar, R.; Kumar, V.; Ganesan, V.; Kesavan, M. fNIRS study of prefrontal activation during emotion recognition–A Potential endophenotype for bipolar I disorder? J. Affect. Disord. 2021, 282, 869–875. [Google Scholar] [CrossRef]
  50. Liang, Z.; Oba, S.; Ishii, S. An unsupervised EEG decoding system for human emotion recognition. Neural Netw. 2019, 116, 257–268. [Google Scholar] [CrossRef] [PubMed]
  51. Gao, C.; Uchitomi, H.; Miyake, Y. Influence of Multimodal Emotional Stimulations on Brain Activity: An Electroencephalographic Study. Sensors 2023, 23, 4801. [Google Scholar] [CrossRef] [PubMed]
  52. Xie, J.; Lan, P.; Wang, S.; Luo, Y.; Liu, G. Brain Activation Differences of Six Basic Emotions Between 2D Screen and Virtual Reality Modalities. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 700–709. [Google Scholar] [CrossRef] [PubMed]
  53. Baldo, D.; Viswanathan, V.S.; Timpone, R.J.; Venkatraman, V. The heart, brain, and body of marketing: Complementary roles of neurophysiological measures in tracking emotions, memory, and ad effectiveness. Psychol. Mark. 2022, 39, 1979–1991. [Google Scholar] [CrossRef]
  54. Vanutelli, M.E.; Grippa, E.; Balconi, M. 105. Hemodynamic (fNIRS), electrophysiological (EEG) and autonomic responses to affective pictures: A multi-method approach to the study of emotions. Clin. Neurophysiol. 2016, 127, e156. [Google Scholar] [CrossRef]
  55. Jin, Z.; Xing, Z.; Wang, Y.; Fang, S.; Gao, X.; Dong, X. Research on Emotion Recognition Method of Cerebral Blood Oxygen Signal Based on CNN-Transformer Network. Sensors 2023, 23, 8643. [Google Scholar] [CrossRef] [PubMed]
  56. Tang, T.B.; Chong, J.S.; Kiguchi, M.; Funane, T.; Lu, C.K. Detection of Emotional Sensitivity Using fNIRS Based Dynamic Functional Connectivity. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 894–904. [Google Scholar] [CrossRef]
  57. Andreu-Perez, A.R.; Kiani, M.; Andreu-Perez, J.; Reddy, P.; Andreu-Abela, J.; Pinto, M.; Izzetoglu, K. Single-Trial Recognition of Video Gamer’s Expertise from Brain Haemodynamic and Facial Emotion Responses. Brain Sci. 2021, 11, 106. [Google Scholar] [CrossRef] [PubMed]
  58. Sánchez-Reolid, R.; Martínez-Sáez, M.C.; García-Martínez, B.; Fernández-Aguilar, L.; Ros, L.; Latorre, J.M.; Fernández-Caballero, A. Emotion Classification from EEG with a Low-Cost BCI Versus a High-End Equipment. Int. J. Neural Syst. 2022, 32, 2250041. [Google Scholar] [CrossRef] [PubMed]
  59. Chatterjee, S.; Byun, Y.-C. EEG-Based Emotion Classification Using Stacking Ensemble Approach. Sensors 2022, 22, 8550. [Google Scholar] [CrossRef]
  60. Shah, S.J.H.; Albishri, A.; Kang, S.S.; Lee, Y.; Sponheim, S.R.; Shim, M. ETSNet: A deep neural network for EEG-based temporal–spatial pattern recognition in psychiatric disorder and emotional distress classification. Comput. Biol. Med. 2023, 158, 106857. [Google Scholar] [CrossRef]
  61. Su, Y.; Hu, B.; Xu, L.; Cai, H.; Moore, P.; Zhang, X.; Chen, J. EmotionO+: Physiological signals knowledge representation and emotion reasoning model for mental health monitoring. In Proceedings of the 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2014, Belfast, UK, 2–5 November 2014; pp. 529–535. [Google Scholar] [CrossRef]
  62. Sun, Y.; Ayaz, H.; Akansu, A.N. Neural correlates of affective context in facial expression analysis: A simultaneous EEG-fNIRS study. In Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Orlando, FL, USA, 14–16 December 2015. [Google Scholar] [CrossRef]
  63. Sun, Y.; Ayaz, H.; Akansu, A.N. Multimodal Affective State Assessment Using fNIRS + EEG and Spontaneous Facial Expression. Brain Sci. 2020, 10, 85. [Google Scholar] [CrossRef]
  64. Wang, Y.; Yang, Z.; Ji, H.; Li, J.; Liu, L.; Zhuang, J. Cross-Modal Transfer Learning From EEG to Functional Near-Infrared Spectroscopy for Classification Task in Brain-Computer Interface System. Front. Psychol. 2022, 13, 2022. [Google Scholar] [CrossRef]
  65. Zhao, Q.; Zhang, X.; Chen, G.; Zhang, J. EEG and fNIRS emotion recognition based on modal attention map convolutional feature fusion. Zhejiang Univ. J. 2023, 57, 1987–1997. Available online: https://kns.cnki.net/kcms/detail/33.1245.T.20231017.0939.008.html (accessed on 6 March 2024).
  66. Maher, A.; Mian Qaisar, S.; Salankar, N.; Jiang, F.; Tadeusiewicz, R.; Pławiak, P.; Abd El-Latif, A.A.; Hammad, M. Hybrid EEG-fNIRS brain-computer interface based on the non-linear features extraction and stacking ensemble learning. Biocybern. Biomed. Eng. 2023, 43, 463–475. [Google Scholar] [CrossRef]
  67. Al-Shargie, F.; Kiguchi, M.; Badruddin, N.; Dass, S.C.; Hani, A.F.M.; Tang, T.B. Mental stress assessment using simultaneous measurement of EEG and fNIRS. Biomed. Opt. Express 2016, 7, 3882–3898. [Google Scholar] [CrossRef] [PubMed]
  68. Güven, A.; Altınkaynak, M.; Dolu, N.; İzzetoğlu, M.; Pektaş, F.; Özmen, S.; Demirci, E.; Batbat, T. Combining functional near-infrared spectroscopy and EEG measurements for the diagnosis of attention-deficit hyperactivity disorder. Neural Comput. Appl. 2020, 32, 8367–8380. [Google Scholar] [CrossRef]
  69. Kassab, A.; Hinnoutondji Toffa, D.; Robert, M.; Lesage, F.; Peng, K.; Khoa Nguyen, D. Hemodynamic changes associated with common EEG patterns in critically ill patients: Pilot results from continuous EEG-fNIRS study. NeuroImage Clin. 2021, 32, 102880. [Google Scholar] [CrossRef]
  70. Xu, T.; Zhou, Z.; Yang, Y.; Li, Y.; Li, J.; Bezerianos, A.; Wang, H. Motor Imagery Decoding Enhancement Based on Hybrid EEG-fNIRS Signals. IEEE Access 2023, 11, 65277–65288. [Google Scholar] [CrossRef]
Figure 2. Experimental paradigm for emotion induction.
Figure 3. Wearable portable fNIRS-EEG bimodal system. (a) NIR source-detector head gripper; (b) NIR light source probes; (c) photoelectric detector; (d) EEG host module; (e) fNIRS host module; (f) bimodal headcaps; (g) Ag/AgCl dry electrodes; (h) chip electrodes; (i) source-detector distance 30 mm; (j) upper part of the optical pole holder; (k) lower part of the optical pole holder.
Figure 4. Layout of the transmitter-receiver-electrode brain in the acquisition cap of the fNIRS-EEG bimodal system.
Figure 5. Flowchart of fNIRS-EEG dual-modal system design.
Figure 6. Upper computer processing platform design.
Figure 7. fNIRS data preprocessing process.
Figure 8. EEG data preprocessing process.
Figure 9. ICA independent principal component labeling.
Figure 10. Comparison between time convolution and 2D convolution.
Figure 11. TC-ResNet model. The kernel size j = 9. (a) Optional SE module; (b) normal block (s = 1); (c) reduction block (s = 2); (d) TC-ResNet8 model.
Figure 12. Label mapping diagram.
Figure 13. The average brain activation diagram of all participants under the four emotions of calm, sadness, happiness, and fear. (a) Maps of the average brain activation diagram of all participants under calm emotion; (b) maps of the average brain activation diagram of all participants under sad emotion; (c) maps of the average brain activation of all participants under happy emotion; (d) maps of average brain activation in all participants under fear emotion; (e) the average statistical values of four emotions across the 24 channels for all participants; (f) the average value change for all participants across the 24 channels; (g) changes in single-channel HbO concentration values for all participants in four emotions.
Figure 14. Plots of mean HbO concentrations in the salience channels for all participants in four emotions. (a) Calm emotion, channels 11, 12, and 18; (b) sad emotion, channels 6, 12, and 18; (c) happy emotion, channels 4, 5, and 14; (d) fear emotion, channels 3, 6, 14, 15, and 20.
Figure 15. Plots of the mean amplitude changes in the five frequency bands for all participants in the four emotion states. (a) Delta band mean amplitude changes in the four emotion states; (b) Theta band mean amplitude changes in the four emotion states; (c) Alpha band mean amplitude changes in the four emotion states; (d) Beta band amplitude changes in the four emotion states; (e) Gamma band mean amplitude changes in the four emotion states.
Figure 16. The average energy values of the power spectra of the frequency bands in the four emotions. (a) The average energy values of the power spectra of the Delta band in the four emotions; (b) the average energy values of the power spectra of the Theta band in the four emotions; (c) the average energy values of the power spectra of the Alpha band in the four emotions; (d) the average energy values of the power spectra of the Beta band in the four emotions; (e) the average energy values of the power spectra of the Gamma band in the four emotions; (f) distribution of the number of participants in the five EEG power spectra for the four emotions.
Figure 17. Normalized confusion matrix plot of classification accuracy in five different states.
Figure 18. Comparison of TC-ResNet8 model and TC-ResNet14 model.
Table 1. fNIRS-EEG bimodal system main technical specifications.
Technical Indicators | Parameters
Measurement projects | fNIRS: HbO, HbR, Hb. EEG: brain electrical activity.
Channels | fNIRS: 24 channels. EEG: 20-lead.
Sampling frequency | fNIRS: ≤150 Hz. EEG: ≥500 Hz.
Weight of the main unit | fNIRS: ≤300 g. EEG: ≤65 g.
Size of the main unit | fNIRS: ≤8.5 × 8.5 × 3.5 cm. EEG: 6 × 8.5 × 2 cm.
Light/electrode source type | fNIRS: LED. EEG: AgCl, wet electrodes.
Data transmission | Bluetooth, real-time transmission distance 20 m.
Acquisition, expansion | D-LAB plug-in for synchronization compatibility and support for EEG, fMRI, tDCS, and other device extensions.
Sensor technology | fNIRS: built-in 9-axis motion sensor. EEG: built-in 3-axis motion sensor.
fNIRS/EEG battery part | 1. Power adapter: input 100–240 V, 50/60 Hz; output 5 V. 2. Battery type: built-in lithium battery; external batteries and power banks can extend battery life. 3. Battery dimensions: 3 × 2.5 × 0.6 cm. 4. Battery capacity: 1400 mAh. 5. Battery output voltage: 3.7 V. 6. Battery efficiency: 92%. 7. Battery endurance time: ≥3 h.
fNIRS partial parameters | 1. Spectral type: continuous wave; wavelengths 760 nm and 850 nm. 2. Source/probe quantity: 10, 8 (weight ≤ 12 g). 3. Detector type: SiPDs; sensitivity < 1 pW; dynamic range ≥ 90 dB. 4. Functions: one-stop data preprocessing, event and data editing, artifact correction, probe position editing, dynamic display of oximetry status, GLM, fast real-time display of 2D mapping maps, display of HbO, HbR, and Hb status, signal quality detection in 2D, scalp, cerebral cortex, and glass views, etc.
EEG partial parameters | 1. Bandwidth: 0–250 Hz; sync accuracy ≤1 ms. 2. Noise level: 1 μV rms. 3. Common-mode rejection ratio: ≥120 dB. 4. Functions: 3D current density map, 3D FFT mapping and spectrum analysis, inter-/intra-group comparison, real-time display of the signal detection status of each electrode, MATLAB and other real-time communication and remote-control ports, automatic filtering and classification of different EEG waves, online EEG impedance detection, filtering settings, data analysis, etc.
Table 2. Statistical results of participants' self-assessment scale.
Evoked Emotion | Average Value | Standard Deviation
Calm | 9.351 | 0.20149
Sad | 9.474 | 0.21360
Happy | 9.1 | 0.16456
Fear | 9.54 | 0.27401
Table 3. Details of the movie clips used in our emotion recognition experiments.
Labels | Film Clip Sources | Clips | Chinese Audience Web Rating
Calm | Tip of the Tongue China | 2 | 9.0
Sad | Tangshan Earthquake | 5 | 9.9
Happy | Lost in Thailand/Kung Fu Panda | 6/4 | 9.7/9.8
Fear | A Wicked Ghost/Soul Ferry/Double Pupils | 5/3/2 | 9.3/8.9/8.8
Table 4. Comparison of methods for time convolution and 2D convolution.
Method | Weights | MACs | Output
2D convolution | W ∈ R^(3 × 3 × 1 × 13c) | 3 × 3 × 1 × f_E × T × 13c | Y ∈ R^(T × f_E × 13c)
This method (temporal convolution) | W_1 ∈ R^(3 × 1 × f_E × c) | 3 × 1 × f_E × T × 1 × c | Y_1 ∈ R^(T × 1 × c)
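The difference quantified in Table 4 can be seen directly in code: a 2D convolution slides a 3 × 3 kernel over both the time and feature axes and keeps a full time-by-feature output map, whereas a temporal convolution treats the feature axis as input channels and convolves along time only, shrinking the output passed to later layers. The Keras sketch below uses illustrative sizes for T, f_E, and c and is not tied to the exact dimensions of the fused fNIRS-EEG input.

```python
import tensorflow as tf
from tensorflow.keras import layers

T, f_E, c = 250, 40, 16   # time steps, feature bins, output channels (illustrative)

# 2D convolution: the input is treated as a T x f_E "image" with one channel.
x2d = tf.random.normal((1, T, f_E, 1))
conv2d = layers.Conv2D(filters=c, kernel_size=(3, 3), padding="same")
y2d = conv2d(x2d)         # shape (1, T, f_E, c): every feature bin keeps a spatial position

# Temporal convolution: the f_E features become input channels; the kernel slides along time only.
x1d = tf.random.normal((1, T, f_E))
conv1d = layers.Conv1D(filters=c, kernel_size=3, padding="same")
y1d = conv1d(x1d)         # shape (1, T, c): a far smaller map for downstream layers

print("2D:", y2d.shape, "weights:", conv2d.count_params())
print("1D:", y1d.shape, "weights:", conv1d.count_params())
```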
Table 5. Key parameters of the sliding window approach.
Parameter Name | Parameter Value | Description
Window size (n_mels) | 250 | 250 × 0.004 s = 1 s, i.e., the window size is 1 s
Time step (step_len) | 25 | A new window is captured every 0.1 s
fNIRS_seq_len | Number of fNIRS features | -
EEG_seq_len | Number of EEG features | -
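A minimal sketch of this windowing scheme is given below; the function and variable names are illustrative, and it assumes the fNIRS and EEG features have already been aligned on a common sampling grid.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window: int = 250, step: int = 25) -> np.ndarray:
    """Cut a (samples, channels) array into overlapping windows.

    With window=250 and step=25 (Table 5), each window spans 1 s
    (250 x 0.004 s) and a new window starts every 0.1 s."""
    n = (signal.shape[0] - window) // step + 1
    return np.stack([signal[i * step: i * step + window] for i in range(n)])


# e.g., 60 s of fused fNIRS-EEG features on an assumed common 250-sample/s grid
x = np.random.randn(15000, 40)
windows = sliding_windows(x)
print(windows.shape)   # (591, 250, 40)
```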
Table 6. Pearson correlation coefficients between EEG and fNIRS.
Feature | Delta | Theta | Alpha | Beta | Gamma
Mean | r = −0.372, p = 0.628 | r = −0.515, p = 0.485 | r = −0.266, p = 0.234 | r = 0.682, p = 0.318 | r = 0.988, p = 0.012
Variance | r = −0.202, p = 0.798 | r = −0.328, p = 0.672 | r = −0.441, p = 0.020 | r = −0.854, p = 0.036 | r = −0.905, p = 0.095
Slope | r = −0.725, p = 0.028 | r = −0.705, p = 0.030 | r = −0.813, p = 0.187 | r = −0.572, p = 0.428 | r = −0.108, p = 0.892
Peak | r = −0.08, p = 0.920 | r = −0.213, p = 0.787 | r = −0.081, p = 0.919 | r = −0.914, p = 0.086 | r = −0.844, p = 0.116
Kurtosis | r = 0.901, p = 0.099 | r = 0.833, p = 0.167 | r = 0.959, p = 0.041 | r = 0.692, p = 0.308 | r = 0.007, p = 0.993
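For reference, coefficients of this kind can be computed with SciPy as sketched below; the feature vectors are random placeholders standing in for the per-condition fNIRS statistics and EEG band amplitudes.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Placeholder per-condition feature vectors (real values would come from the
# preprocessed fNIRS statistics and EEG band amplitudes).
fnirs_mean_hbo = rng.normal(size=4)        # e.g., mean HbO per emotion condition
eeg_gamma_amplitude = rng.normal(size=4)   # e.g., mean Gamma amplitude per condition

r, p = pearsonr(fnirs_mean_hbo, eeg_gamma_amplitude)
print(f"r = {r:.3f}, p = {p:.3f}")
```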
Table 7. Mood classification results of EEG, fNIRS, and fNIRS-EEG.
Modality | Accuracy (%)
fNIRS | 86.70 [55]; 89.49 [56]; 91.44 [57]
EEG | 98.78 [58]; 99.55 [59]; 99.57 [60]
fNIRS + EEG | 99.81
Table 8. Comparison of emotion classification results of different emotion recognition models.
Data Type | Model | Accuracy (%) | Time (ms)
fNIRS-EEG | RF [61] | 99.11 | -
fNIRS-EEG | RHMM-SVM [62] | 75 | -
fNIRS-EEG | SVM [63] | 80 | -
fNIRS-EEG | CNN [55] | 82 | -
fNIRS-EEG | CNN-Transformer [55] | 86.7 | -
fNIRS-EEG | R-CSP-E [64] | 66.83 | -
fNIRS-EEG | Backpropagation ANN [61] | 96.92 | -
fNIRS-EEG | MA-MP-GF [65] | 95.71 | -
fNIRS-EEG | Stacking ensemble learning [66] | 95.83 | -
fNIRS-EEG | This work | 99.81 | 1.1
Table 9. Comparative analysis of different variants of TC-ResNet models.
Models | Accuracy (%) | Time (ms) | FLOPs (M)
TC-ResNet8-1.5 | 99.82 | 2.8 | 6.6
TC-ResNet14 | 99.83 | 2.5 | 6.1
TC-ResNet14-1.5 | 99.85 | 5.7 | 13.4
2D-TC-ResNet8 | 99.81 | 10.1 | 35.8
TC-ResNet8-pooling | 98.63 | 3.5 | 4
This work | 99.81 | 1.1 | 3
Table 10. Comparison of main parameters of fNIRS-EEG systems.
Parameters | [67] | [68] | [69] | [22] | [70] | This Work
Wavelength (nm) | 695, 830 | 730, 850 | 760, 850 | 750, 850 | 762, 845.5 | 760, 850
fNIRS/EEG channels | 23/16 | 16/4 | 128/19 | 8/32 | 20/64 | 24/20
fNIRS/EEG sampling rate (Hz) | 10/256 | 2/2500 | 20/500 | 10/500 | 5/1000 | ≤150/≥500
Optode (source-detector) spacing (mm) (see Supplementary Materials S2: fNIRS optodes) | 30 | 25 | 25–50 | 35 | - | 10–55
Light source type | - | LED | - | - | LED | LED
Detector type | APD | Photodiode | - | - | APD | SiPDs
Source/probe quantity | 16, 8 | 4, 10 | 32, 32 | 8, 2 | - | 10, 8
Data transmission | Wireline transmission | Wireline transmission | Wireline transmission | Wireline transmission | Wireline transmission | Bluetooth wireless
High-density measurement | No support | No support | Support | No support | Support | Support
Operational complexity | Ordinary | Ordinary | Highly complex | Simple | Complex | Simple
Instrument power supply method | Direct AC power supply | Direct plug-in or battery-powered with rechargeable batteries | Direct AC power supply | Direct AC power supply | Direct AC power supply | Direct plug-in or battery-powered with rechargeable batteries
Portable and compact design | Wearable | Wearable | Wearable | Wearable | Wearable | Small, portable, and wearable
