1. Introduction
Emotion is an adaptive multidimensional response triggered by meaningful events or stimuli that influence thoughts, feelings, behaviors, and interactions with others in daily life [1,2]. These responses are underpinned by changes in cognitive, physiological, motivational, motor expression, and subjective feeling systems that facilitate effective self-regulation mechanisms that help humans adapt to complex and ever-changing environments [3]. These changes can be expressed in several ways. In addition to the well-known methods of verbal communication, facial expressions, and body movements, they also include alterations in biological markers, such as heart rate (HR), electrical brain signals, and respiration (RSP) [4,5,6].
Closed cabins are typically used to create small environments in which humans can temporarily survive under harsh conditions, thereby facilitating the exploration and research of specific environments [7]. For example, missions in deep-sea submersibles require high cognitive processing from the crew in a high-stress environment. Consequently, changes in emotions can significantly affect mission performance, potentially leading to reduced work efficiency or even severe accidents [8]. Current research on the impact of missions on submersible crew members primarily focuses on the effects of the environment and the comfort of operational postures [9,10]. However, these studies have not considered the influence of emotions on mission performance, and research methods for emotion recognition in closed-cabin environments are scarce.
Studies have found that emotions significantly impact work performance and influence individuals' cognitive abilities and attention distribution. Positive emotional states generally enhance focus and creativity, facilitating problem-solving and task completion [11]. An individual's emotional state affects their decision-making process; negative emotions may lead to impulsive and irrational decisions [12]. Furthermore, emotional states influence work motivation and engagement [12,13]. Positive emotions help boost work motivation, increase enthusiasm, and enhance self-motivation, which, in turn, improves job performance [14].
Currently, various methods are available for emotion recognition, such as those based on computer vision and speech analysis [15]. Computer vision-based methods rely primarily on capturing external changes in facial expressions, eye movements, and body movements to recognize emotions [16,17,18]. However, these external expressions of emotion can be consciously suppressed or exaggerated, and are easily influenced by factors such as culture, environment, age, and gender. Speech-based methods typically utilize acoustic analysis because features such as pitch, tone, and clarity of speech signals are highly correlated with underlying emotions [19]. Inferring emotional states from the semantics of speech is a common approach. Nevertheless, the reliability of these methods in closed spaces such as submarines is questionable. The lighting conditions inside such cabins may not support clear facial expression capture [20], and speech signals during work may not contain sufficient emotional information.
Emotion recognition through physiological signals has also been reported using metrics such as the respiration rate, blood flow, galvanic skin response (GSR), electroencephalography (EEG), and electrocardiography (ECG) [21,22,23,24,25]. Physiological signals cannot be consciously controlled by individuals, which makes these methods more closely related to internal feelings and more accurate in reflecting emotional changes. However, most of these physiological measurement systems rely on direct contact with intrusive sensors and specific precision instruments. This contrasts with the practical conditions of missions in a closed cabin, where continuous data collection cannot be guaranteed and the awareness of being monitored may affect the results. Therefore, because some vital signs can be detected noninvasively, using noncontact systems for emotion recognition is advantageous.
In the field of emotion recognition using radar sensors, Healey [26] demonstrated that physiological information collected from a professional actor under eight different emotional states, combined with feature extraction algorithms, Fisher's linear discriminant analysis, and leave-one-out testing, could distinguish between anger and calm with high accuracy (90–100%). In addition, high and low arousal were significantly more distinguishable than high and low valence. In this context, arousal describes the intensity or activation level of moods and emotions, indicating whether they are relaxed/calm or excited/stimulated. Valence, on the other hand, refers to the positive or negative direction of moods and emotions. Zhao [27] used a millimeter-wave radar to extract and separate heartbeat and respiration signals, manually extracting 27 physiological signal-related features as inputs for two classification models. The average recognition accuracy for the four emotions was 87.0% for the person-dependent and 72.3% for the person-independent classification model. However, current radar-based emotion recognition methods require manual feature identification to achieve high accuracy and have not been optimized for closed-cabin environments, leaving a gap that our research addresses.
In this paper, we hypothesize that millimeter-wave radar sensors can classify individuals' emotions by measuring their respiratory signals in conjunction with machine learning algorithms within a closed-cabin environment. To address the issue of emotion recognition for personnel in closed cabins, this study proposes a method using a millimeter-wave radar to collect respiration signals, combined with a machine-learning framework for emotion classification and recognition. First, we used a continuous-wave radar to measure the respiration signals of the participants in a closed-cabin laboratory while they watched videos designed to elicit different emotions. After processing, we obtained respiration waveforms corresponding to different emotional states. Next, we used a sparse autoencoder (SAE) to extract features from the waveforms. These features were then input into two support vector machines (SVMs) for arousal and valence classification. Our method was compared with FaceReader™, a commercial emotion recognition device that uses audiovisual signals. The results indicate that our method is not only more suitable for deployment in closed cabins but also maintains accuracy and provides objective results. The overall research framework is illustrated in Figure 1.
The contributions of this paper can be summarized as follows. First, we used a millimeter-wave radar to capture human respiration signals in a closed-cabin environment, which is noncontact and interference-free. Second, considering the characteristics of feature extraction and selection in machine learning, SAEs were introduced to extract respiration features for emotion recognition. Third, the experimental results demonstrated that RSP signals can be effectively used for emotion recognition, eliminating the need for more complex physiological signals such as EEG [28]. Although machine learning involves significant computational effort during model training, the time required for testing is minimal. Finally, we preliminarily validated that the system can be applied to closed-cabin environments. Overall, this study provides a foundation for the development of noncontact emotion recognition systems that can operate effectively in challenging environments, ensuring safety and efficiency during critical operations.
The remainder of this paper is organized as follows. In Section 2, the working principle of the millimeter-wave radar is explained, the experimental process is described, and the signal preprocessing algorithms used before emotion classification are introduced; the implementation of the classifier and the feature extraction process are also described. Section 3 presents the experimental results and cross-validation methods. Section 4 provides a discussion of the results. Finally, Section 5 presents the conclusions.
2. Methodology
In this study, we employed a comprehensive methodology to investigate emotion recognition in a closed-cabin environment using noncontact millimeter-wave radar for respiration signal monitoring. Our experimental setup was specifically designed to simulate real-world closed-cabin conditions, ensuring an environment conducive to reliable data collection. A 77 GHz frequency-modulated continuous-wave (FMCW) radar sensor with a 2-transmitter and 4-receiver MIMO configuration was used to capture respiration signals at a sampling rate of 32 Hz. This system, installed within a shielded, soundproof room, allowed us to measure respiration rates accurately, unaffected by external environmental factors such as lighting and airflow.
The participants were seated in a controlled laboratory setting with ambient conditions maintained at 25 °C and lighting set to 6500 K × 600 lx to support a relaxed atmosphere. They watched a series of emotion-inducing video clips designed to elicit specific emotional responses (e.g., high-arousal positive, low-arousal negative) while the radar recorded their respiration signals. In total, 1200 recordings were collected across 20 participants and later divided into 20 s segments; the participants were pre-screened using criteria aligned with operational requirements in closed-cabin environments, such as manned submersibles.
Signal preprocessing included the application of bandpass filters to remove noise and isolate the respiration signal from mixed physiological data. We then applied the variational mode extraction (VME) algorithm to further refine the signal for analysis, allowing accurate identification of respiration patterns associated with different emotional states. These data were subsequently used as input for a sparse autoencoder (SAE) combined with a support vector machine (SVM) for emotion classification. This methodology not only provided a noninvasive approach to emotion recognition but also enabled a practical implementation framework suitable for closed-cabin applications where real-time, continuous monitoring is critical.
2.1. Methods for Collection and Processing of Respiration Signals
2.1.1. Working Principle of Millimeter-Wave Radar for Biosignal Detection
The principle of bio-signal detection with millimeter-wave radar relies on the Doppler effect, where the received signal reflects distance variations between the radar antenna and the participant's chest wall, driven by cardiac and respiratory movements. The radar continuously emits a digitally generated sinusoidal carrier wave, and for objects with periodic motion, the radar receiver component receives echoes that vary periodically over time. This characteristic makes it possible to detect vital signs, such as the periodic movement of the chest caused by breathing or heartbeats, from which physiological information such as heart and respiration rates can be extracted [29].
We selected a 77 GHz millimeter-wave radar sensor (Figure 2) for this study. The radar sensor is based on frequency-modulated continuous-wave (FMCW) signals for radar detection, featuring a 2-transmitter, 4-receiver, multiple-input multiple-output (MIMO) antenna configuration. This system achieves a detection range of 0.1 to 2 m and operates independently of environmental conditions, including temperature, humidity, noise, airflow, dust, and lighting. The high integration and wide bandwidth of this sensor chip enable flexible applications in a compact package. The sampling rate was set to 32 Hz to facilitate data processing.
2.1.2. Valence–Arousal Emotion Model
In this study, Russell's circumplex model, the valence–arousal (VA) two-dimensional model of emotion, was used to aid emotion recognition [30]. This theory suggests that a common, interconnected neurophysiological system is responsible for all emotional states. Emotions can be distributed in a two-dimensional space consisting of arousal and valence dimensions. Arousal is the vertical axis, measuring the intensity of emotional activation, whereas valence is the horizontal axis, describing how positive or negative an emotion is. Figure 3 illustrates how this theory categorizes emotions. The arousal and valence of an emotion can each be classified as high or low, yielding four categories. Thus, the emotion-recognition task can be divided into two binary classification tasks.
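To make the two binary tasks concrete, the following minimal Python sketch (ours, not from the original study) maps a rated (valence, arousal) pair to the two binary labels; the 1–9 rating scale is an assumption, with 5 as the threshold used later in Section 2.2.2.

```python
def va_labels(valence: float, arousal: float, threshold: float = 5.0) -> tuple[int, int]:
    """Map a (valence, arousal) rating pair to the two binary labels
    used by the classifiers: (high/low valence, high/low arousal).
    A 1-9 rating scale with midpoint 5 is assumed here."""
    return int(valence > threshold), int(arousal > threshold)

# The four quadrants of the VA plane:
#   (1, 1) high valence, high arousal  e.g., happiness
#   (0, 1) low  valence, high arousal  e.g., fear or anger
#   (1, 0) high valence, low  arousal  e.g., relaxation
#   (0, 0) low  valence, low  arousal  e.g., sadness
```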
2.1.3. Respiration Data Acquisition Based on Millimeter-Wave Radar
We designed the entire experimental system according to a closed-cabin environment and placed the millimeter-wave radar inside the backrest of a chair. A pre-test confirmed that this placement collects the signal as required while avoiding the interference associated with earlier radar deployments, which had to be positioned in front of the participant. Because the relative distance between the participant and the sensor changes very little, this deployment allows the signal processing method to largely disregard distance. Since the radar sensor can detect only one person at a time, only one participant was allowed to remain in the experimental area once the experiment began, ensuring accurate data collection.
The participants were screened according to the requirements of the diving mission in consultation with experts on manned deep-diving missions, and the experiment was conducted on 20 male participants, aged 21–40 years, at intervals of at least two days. No participant had a history of cardiovascular or psychiatric disorders or other medical contraindications. Additionally, none had consumed alcohol within the previous three days, and all reported adequate sleep. Participants were recruited through convenience sampling from an existing pool [31]. Convenience sampling was deemed appropriate for this preliminary study to gather initial data quickly and evaluate the feasibility of the proposed emotion recognition method. During the experiment, the RSP signals of each participant were measured while they watched video clips that elicited different emotions, as shown in Figure 4.
The experimental setup is illustrated in Figure 5. To test whether the methodology of this study could be applied to a closed-cabin environment, the experiment was conducted in the closed-cabin laboratory at the Department of Industrial Design, College of Electrical and Mechanical Engineering, Northwestern Polytechnical University. The laboratory is an electrically shielded and soundproof room with manually controlled lighting and temperature. The room temperature was set at 25 °C, and the lighting conditions were configured to 6500 K × 600 lx, which, according to a previous study [20], helps create a relaxed atmosphere. The experimental steps and details were explained to all participants prior to the experiment, and written informed consent was obtained from all participants, who were assured that the data obtained from this experiment would not be used for other purposes. The experiment was approved by Northwestern Polytechnical University (No. 202202053) and complied with the Declaration of Helsinki.
A total of 1200 sets of 1 min RSP signal sequences were collected from the 20 participants (60 sets per participant), including 300 sets for each emotion. The data were randomly split into training (80%) and testing (20%) sets.
To achieve rapid onsite emotion recognition, we evaluated RSP signal segments of different lengths, ranging from 5 to 60 s. After considering both effectiveness and efficiency, we chose 20 s segments. Given that our sampling rate was set at 32 Hz, we segmented each recording into 20 s windows, starting from the beginning of the record and advancing the window by one second at a time. Thus, each segment starts 1 s after the start of the previous one, giving a 19 s overlap between adjacent segments. For a 60 s trial video, this yields 41 segments. Each participant completed 60 trials, and each trial produced 41 segments, resulting in a total of 49,200 data samples.
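As an illustration of this segmentation scheme, the following NumPy sketch (our own; the study's code is not reproduced here) produces the 41 overlapping 20 s segments per 60 s trial described above.

```python
import numpy as np

FS = 32       # sampling rate (Hz)
WIN_S = 20    # segment length (s)
STEP_S = 1    # window step (s); gives a 19 s overlap

def segment_signal(rsp: np.ndarray, fs: int = FS,
                   win_s: int = WIN_S, step_s: int = STEP_S) -> np.ndarray:
    """Split a 1-D respiration recording into overlapping segments."""
    win, step = win_s * fs, step_s * fs
    n_seg = (len(rsp) - win) // step + 1
    return np.stack([rsp[i * step : i * step + win] for i in range(n_seg)])

# One 60 s trial at 32 Hz -> (60 - 20) / 1 + 1 = 41 segments of 640 samples
trial = np.zeros(60 * FS)
assert segment_signal(trial).shape == (41, 20 * FS)
```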
2.1.4. Emotion Stimulus Materials
We selected video clips capable of eliciting different emotions [32,33] as stimulus materials and measured the RSP signals of each participant while watching the videos. The comedic clips induced high-arousal and high-valence emotions (such as happiness), whereas the horrific videos elicited high-arousal and low-valence emotions (such as fear or anger).
To induce genuine emotions in the participants, we selected stimuli capable of eliciting four emotional states: high-valence low-arousal, low-valence high-arousal, high-valence high-arousal, and low-valence low-arousal. In total, 120 video clips were used, with 30 clips corresponding to each emotional state. After all 120 clips had been rated by at least 15 participants, we selected the 60 highest and most consistently rated videos to ensure high-quality experimental data, with 15 clips for each emotional state.
For each video, normalized arousal and valence scores were calculated by dividing the mean rating by the standard deviation. We then selected the 15 videos closest to the polar extremities in each quadrant of the normalized valence–arousal space, as illustrated in Figure 6. During the experiment, the videos were played in an order designed to enhance the emotional effect, which continuously and effectively stimulated the participants throughout the process. To avoid potential bias, the participants were not informed in advance of the type of video they would watch. Each test session was approximately 30 min.
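The selection procedure above can be sketched as follows. This is an illustrative reconstruction only: the function names are ours, and centering the normalized VA space on its per-axis means before picking quadrant extremes is our assumption.

```python
import numpy as np

def normalized_scores(ratings: np.ndarray) -> np.ndarray:
    """Per-clip normalized (valence, arousal) score: mean rating / std.
    `ratings` has shape (n_raters, 2), columns = (valence, arousal)."""
    return ratings.mean(axis=0) / ratings.std(axis=0)

def select_extreme_clips(scores: np.ndarray, k: int = 15) -> dict:
    """Pick the k clips farthest from the center in each quadrant of the
    normalized VA space. `scores` has shape (n_clips, 2)."""
    centered = scores - scores.mean(axis=0)  # assumed quadrant origin
    dist = np.linalg.norm(centered, axis=1)
    quadrants = {}
    for sv in (1, -1):          # valence sign of the quadrant
        for sa in (1, -1):      # arousal sign of the quadrant
            idx = np.where((np.sign(centered[:, 0]) == sv) &
                           (np.sign(centered[:, 1]) == sa))[0]
            quadrants[(sv, sa)] = idx[np.argsort(dist[idx])[::-1][:k]]
    return quadrants
```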
Since the primary goal of this work is to validate the use of a millimeter-wave radar system for emotion recognition of personnel in closed cabins, we employed FaceReader to simultaneously measure participants' emotions as a means of verification and comparison. FaceReader 8.1 is professional facial analysis software developed by Noldus that is capable of recognizing facial expressions [34]. FaceReader can identify six basic emotions—happiness, sadness, anger, surprise, fear, and disgust—in addition to the neutral emotional state. When connected to a camera, the software first detects the face and then determines the expression based on facial muscle movements, achieving an accuracy of up to 95% [34]. The analysis results provide the probabilities of the six basic emotions occurring at corresponding time intervals and generate valence and arousal scores. A higher valence score indicates more positive emotions, whereas a lower valence score indicates more negative emotions.
The software's effectiveness in recognizing facial expressions for Chinese faces is reasonably good, particularly for the emotions of "happiness", "surprise", and "neutral". As shown in Figure 5, the measurement results from FaceReader are displayed on a separate computer.
2.1.5. Respiration Signal Processing
Currently, there are many methods for processing bio-signals based on a millimeter-wave radar, such as the variational mode decomposition (VMD) algorithm [35]; the ensemble empirical mode decomposition (EEMD) algorithm, an improvement on the empirical mode decomposition (EMD) algorithm [36]; and the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) algorithm [37]. The variational mode extraction (VME) algorithm offers advantages over the VMD algorithm in terms of lower time complexity for extracting specific mode signals and the ability to determine the number of modes without prior specification [38].
As shown in Table 1, respiration and heartbeat signals exhibit different frequency characteristics. Heartbeat signal frequencies typically range from 0.8 to 2.0 Hz, while RSP signal frequencies range from 0.1 to 0.5 Hz. Therefore, a Butterworth bandpass filter was used to remove the signal components in different frequency ranges. By filtering the heartbeat signal components from the mixed vital sign signals, we can extract the RSP signals, thereby separating the respiration and heartbeat signals.
To select the appropriate signal processing method, the same 1 min data segments were sequentially processed using the EMD, EEMD, ICEEMDAN, VMD, and VME algorithms. The estimated average respiration rates obtained using these methods were compared with the actual values. Additionally, the data were divided into 20 s segments, processed using the different methods, and the average processing time required for each method was estimated. The final results are presented in Table 2.
From Table 2, it can be observed that the EEMD algorithm estimates the RSP rate with significantly lower accuracy than the ICEEMDAN, VMD, and VME algorithms. The accuracies of the ICEEMDAN, VMD, and VME algorithms were similar, with ICEEMDAN performing slightly better than VME, although the differences were minimal. The processing times for the EEMD and ICEEMDAN algorithms exceeded the data duration, rendering real-time computation infeasible. Although the processing time of the VMD algorithm was less than 20 s, it was close enough to this limit to risk buffer overflow in the actual system, posing a threat to system stability. Accordingly, the VME algorithm was selected for the final processing.
The specific steps of the final data processing, shown in Figure 7, are as follows:
1. Sample and parse the data acquired from the millimeter-wave radar;
2. Perform three different dimensional analyses on the intermediate frequency signal to obtain target information;
3. Conduct a fast Fourier transform on the received data in the range dimension to obtain positional information;
4. Eliminate stationary objects in the indoor environment to remove clutter;
5. Extract and analyze the phase to obtain vital-sign signals and use bandpass filters for preliminary signal separation;
6. Apply the VME algorithm (see Algorithm 1) to the extracted vital-sign signals to isolate the RSP signals.
Algorithm 1: VME.
Initialize $\hat{u}_d^{1}$, $\omega_d^{1}$, $\hat{\lambda}^{1}$, and set $n \leftarrow 0$
Repeat: $n \leftarrow n + 1$
(1) Update $\hat{u}_d$ for all $\omega \geq 0$:
$\hat{u}_d^{n+1}(\omega) = \dfrac{\hat{f}(\omega) + \alpha^{2}(\omega - \omega_d^{n})^{4}\,\hat{u}_d^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{\left[1 + \alpha^{2}(\omega - \omega_d^{n})^{4}\right]\left[1 + 2\alpha(\omega - \omega_d^{n})^{2}\right]}$ (1)
(2) Update $\omega_d$:
$\omega_d^{n+1} = \dfrac{\int_{0}^{\infty} \omega\,\lvert \hat{u}_d^{n+1}(\omega) \rvert^{2}\, d\omega}{\int_{0}^{\infty} \lvert \hat{u}_d^{n+1}(\omega) \rvert^{2}\, d\omega}$ (2)
(3) Dual ascent for all $\omega \geq 0$:
$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau\left[\hat{f}(\omega) - \hat{u}_d^{n+1}(\omega)\right]$ (3)
until convergence: $\lVert \hat{u}_d^{n+1} - \hat{u}_d^{n} \rVert_{2}^{2} \, / \, \lVert \hat{u}_d^{n} \rVert_{2}^{2} < \epsilon$
Figure 7.
Respiration signal processing.
The final RSP signals are presented in Figure 8. Using the VME algorithm allows for direct extraction of the corresponding signals, thereby lowering the computational load and improving the execution speed of the program.
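Steps 3–5 of the pipeline in Figure 7 can be sketched as follows. The mean-subtraction clutter removal and strongest-bin target selection shown here are common choices and are our assumptions, not necessarily the exact implementation used in this study.

```python
import numpy as np

def chest_displacement(iq: np.ndarray, wavelength: float = 3.9e-3) -> np.ndarray:
    """Sketch of steps 3-5 in Figure 7: range FFT, static-clutter removal,
    and phase extraction from the target range bin.

    iq: complex IF samples, shape (n_chirps, n_samples_per_chirp).
    wavelength: ~3.9 mm carrier wavelength at 77 GHz.
    """
    rng = np.fft.fft(iq, axis=1)                  # range FFT per chirp
    rng -= rng.mean(axis=0, keepdims=True)        # remove static clutter
    target = np.argmax(np.abs(rng).mean(axis=0))  # strongest remaining bin
    phase = np.unwrap(np.angle(rng[:, target]))   # slow-time phase history
    return phase * wavelength / (4 * np.pi)       # phase -> displacement (m)
```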
After the experiment, the respiration data of the 20 participants in the four emotional states were collected, counted, and tabulated, as shown in Table 3. No further statistical analysis of the data features was performed because we used a machine-learning approach to obtain useful features.
2.2. Emotion Recognition Method Based on Sparse Auto-Encoder and Support Vector Machine
We investigated an effective method for mapping users' physiological signals to their emotional states. As shown in Figure 9, we used an SAE to transform raw signals into extracted features. The sparse representations learned by the SAE were then used as inputs to train the SVM as a classifier for recognizing emotional states.
First, an SAE was trained on the raw input $x^{(k)}$ to learn its primary features $h_1^{(k)}$. These primary features were then used as the input for another SAE to learn secondary features $h_2^{(k)}$. These secondary features were then used as inputs for an SVM classifier, which was trained to map them to the data labels. Finally, all three layers were combined to form a stacked SAE with two hidden layers and an SVM classifier layer that can classify the collected RSP signal dataset as required [39].
This combined approach of using an SAE and an SVM leverages the strengths of both methods to enhance the generalization ability and performance of the model, particularly when the data have high dimensionality, a small sample size, or an unclear feature representation. Our hypothesis is that by automatically extracting features through deep learning, we can produce a more powerful and accurate emotion recognition model.
2.2.1. Feature Selection of Respiration Signals Based on Sparse Auto-Encoder
An SAE is a deep-learning method used to automatically learn features from unlabeled data. As a fundamental element of an SAE, an autoencoder (AE) transforms input data into hidden representations or extracts features through an encoder (
Figure 10). The AE learns to map these features back to the input space using a decoder. Despite the presence of minor reconstruction errors in the training examples, the goal was to match the reconstructed input data to the original input data as closely as possible. The entire structure uses the features extracted by one AE as the input for another AE, thereby hierarchically determining general representations from the input data [
40].
Therefore, the SAE pre-trains the AE and uses it as its first hidden layer. The same technique can be applied to train subsequent hidden layers. Once pre-training is complete, the first layer remains unchanged while the other hidden layers are trained. Generally, the encoder maps the input example $x$ to the hidden representation $h$ as follows:

$h = f(Wx + b)$ (4)

where $W$ is a weight matrix, $b$ is a bias vector, and $f$ is a nonlinear activation function, typically the sigmoid function $f(z) = 1/(1 + e^{-z})$. The decoder maps the hidden representation back to the reconstructed input $\tilde{x}$:

$\tilde{x} = f(W'h + b')$ (5)

where $W'$ is a weight matrix, $b'$ is a bias vector, and $f$ is the same function used in the encoder. To minimize the difference between the original input $x$ and the reconstructed input $\tilde{x}$, we consider reducing the reconstruction gap $\|x - \tilde{x}\|^2$. Given a training dataset $\{x^{(k)}\}_{k=1}^{D}$ with $D$ examples, we adjust the weight matrices $W$ and $W'$ as well as the bias vectors $b$ and $b'$ through backpropagation. In addition, we impose a sparsity constraint on the expected activation of the hidden units by adding a penalty term. This leads to the following optimization problem:

$\min_{W, W', b, b'} \ \dfrac{1}{D} \sum_{k=1}^{D} \left\| x^{(k)} - \tilde{x}^{(k)} \right\|^2 + \beta \sum_{j} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)$ (6)

The sparse penalty term is

$\mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \rho \log \dfrac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \dfrac{1 - \rho}{1 - \hat{\rho}_j}$ (7)

where $\hat{\rho}_j$ is the average activation of hidden unit $j$, $\rho$ is the sparsity level, and $\beta$ is the weight of the sparsity penalty term. By minimizing the cost function using the L-BFGS (limited-memory BFGS) algorithm [41], we obtain the optimal $W$ and $b$ to determine the internal features $h$.
We trained the SAE to extract features from the RSP signals and emotional states present in the dataset. Through preliminary experiments with one- and two-layer SAEs and by testing with 20 neurons in each hidden layer, we selected the topology of the AE. After pretraining each layer of the SAE, we fine-tuned the deep learning framework to obtain the optimal SAE configuration with two hidden layers. The first and second hidden layers contained 200 and 50 neurons, respectively.
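For concreteness, the following PyTorch sketch implements one SAE layer with the KL sparsity penalty of Equation (7). It is our illustration, not the authors' code: the sparsity level ρ and weight β are placeholders, the 640-dimensional input assumes the 20 s, 32 Hz segments are fed in directly, and the study itself minimized the cost with L-BFGS rather than the optimizer-agnostic loss shown here.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    """One autoencoder layer with a KL-divergence sparsity penalty
    (a sketch of Equations (4)-(7), not the authors' exact code)."""
    def __init__(self, n_in: int, n_hidden: int, rho: float = 0.05, beta: float = 3.0):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)
        self.rho, self.beta = rho, beta  # assumed sparsity level / penalty weight

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))      # h = f(Wx + b)
        x_hat = torch.sigmoid(self.dec(h))  # x~ = f(W'h + b')
        return h, x_hat

    def loss(self, x, x_hat, h):
        recon = ((x - x_hat) ** 2).sum(dim=1).mean()
        rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)  # average activation per unit
        kl = (self.rho * torch.log(self.rho / rho_hat)
              + (1 - self.rho) * torch.log((1 - self.rho) / (1 - rho_hat))).sum()
        return recon + self.beta * kl

# Stacked topology used in this study: 640-sample segments -> 200 -> 50
layer1 = SparseAE(20 * 32, 200)  # first hidden layer, 200 neurons
layer2 = SparseAE(200, 50)       # second hidden layer, 50 neurons
```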
To demonstrate the reconstruction capability of the SAE, we input the same RSP signal (shown in blue) from Figure 8 into the SAE. The red line in Figure 11 represents the output of the SAE (the reconstructed input).
2.2.2. Emotion Classification Method Based on Support Vector Machine
We employed an SVM as the classifier to balance the performance and power consumption of the proposed method. SVMs are nonlinear models that have been widely used in various fields. Because SVMs use support vectors to determine the optimal hyperplane that maximizes the margin, the complexity of finding the hyperplane function is reduced and the generalization ability of the classifier is enhanced, especially with smaller datasets. SVMs have been proven to be effective models for detecting comfort levels and emotions [42,43].
The SVM calculation steps are as follows:

Input: the training dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is a feature vector extracted by the SAE and $y_i \in \{-1, +1\}$ is its class label.

Given the nonlinearity of the RSP dataset samples, we introduced the radial basis function as the kernel function $K$. This kernel function transforms nonlinear data samples in the dataset into linearly separable data samples as follows:

$K(x_i, x_j) = \exp\left(-\gamma \left\| x_i - x_j \right\|^2\right)$ (8)

The results of the linear SVM are determined by the parameters $\omega$ and $b$ and the sign function $\mathrm{sgn}$, where $\omega$ is the weight vector and $b$ is the bias. This process is conducted as follows.

An optimal solution problem for the objective function is constructed:

$\min_{\omega, b} \ \dfrac{1}{2}\left\| \omega \right\|^2 \quad \text{s.t.} \quad y_i\left(\omega^{\mathrm{T}} x_i + b\right) \geq 1, \quad i = 1, \ldots, N$ (9)

The function is optimized by adding the Lagrange multiplier parameters $\alpha_i \geq 0$, taking the form of:

$L(\omega, b, \alpha) = \dfrac{1}{2}\left\| \omega \right\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i\left(\omega^{\mathrm{T}} x_i + b\right) - 1 \right]$ (10)

The formula is simplified to obtain the SVM classifier:

$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i\, x_i^{\mathrm{T}} x + b \right)$ (11)

The final classifier is obtained by combining Equations (8) and (11):

$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i\, K(x_i, x) + b \right)$ (12)

Because an SVM is a binary classifier, to validate our method and compare it with previous results, we used two SVMs to classify arousal and valence into two binary classes based on the assigned values. We chose a threshold value of 5, making the task a binary classification problem; that is, high/low valence and high/low arousal. In this study, to achieve the best classification results, we set the initial value of C (the regularization parameter) to 1 to avoid overfitting. The Gamma value was set to .
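A minimal scikit-learn sketch of this classification stage is given below; since the exact gamma value is not reproduced above, the library default "scale" setting is used as a stand-in, and the function name is ours.

```python
from sklearn.svm import SVC

def train_va_classifiers(X, y_valence, y_arousal):
    """Train the two binary RBF-kernel SVMs (valence and arousal) with
    C = 1, following Section 2.2.2. X holds the 50-dimensional secondary
    SAE features; labels are 1 for ratings above the threshold of 5."""
    svm_v = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y_valence)
    svm_a = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y_arousal)
    return svm_v, svm_a

# Prediction on a new 20 s segment's features:
# valence_label = svm_v.predict(features)   # 1 = high valence, 0/-1 = low
# arousal_label = svm_a.predict(features)   # 1 = high arousal, 0/-1 = low
```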
4. Discussion
In this study, we propose an effective noncontact emotion recognition system based on RSP signals. The system is intended for deployment in specialized work environments, such as manned submersibles, where emotional changes can lead to decreased performance or even accidents. The emotion recognition accuracy achieved in this study was 68.21%, and for all participants, our method achieved an acceptable accuracy of over 60%. The goal of our study was to investigate the feasibility of using millimeter-wave radar and respiration signals for emotion recognition, demonstrating that emotional states can be detected effectively in a closed-cabin environment by combining the two. Our results indicate that the model integrating millimeter-wave radar and respiration signals shows promising accuracy and feasibility in emotion recognition, supporting our initial hypothesis that this combined approach can enhance the effectiveness of emotion recognition.
Emotions create a motivational tendency and increase the likelihood of engaging in a range of maladaptive behaviors [45]. By identifying a method that minimally affects the normal operations of personnel and can detect emotional states in closed-cabin environments, specific emotions may be prevented from negatively affecting work tasks. Potential application scenarios for the proposed method include various confined spaces, such as aerospace, automotive, and maritime environments. In these settings, emotions can significantly influence task completion and the safety of both personnel and systems.
We also note that the literature [46], similar to our approach, employs the valence–arousal emotion model. However, that study utilizes a VR scene to induce the corresponding emotions and categorizes emotional valence and arousal by collecting EEG signals and using a Gradient Boosted Decision Tree. That research includes more detailed emotion classification labels and demonstrates that EEG can effectively reflect neural activation patterns associated with different emotions. In contrast, our study benefits from the radar sensor we used, which imposes less burden and fewer constraints on individuals and functions effectively in various environmental conditions (e.g., changing light, crowded spaces). This adaptability allows for a broader range of application scenarios, whereas EEG devices require a complex deployment and setup process, along with significant perceptibility during data acquisition, potentially affecting the final results.
4.1. Signal Selection and Emotion Model
The literature [25] systematically reviewed the physiological signals employed in current research and concluded that internal physiological signals, such as EEG and ECG, are involuntarily activated and, therefore, more reliable for emotion recognition than body signals such as facial expressions and posture.
Research has shown that when individuals experience emotions such as anger, fear, or happiness, the frequency and amplitude of their breathing increase involuntarily owing to the heightened energy demand of cells. Conversely, when individuals are in a state of sadness, relaxation, or calmness, their energy demand decreases, resulting in slower and shallower breathing. When individuals are startled, there may be a brief breathing interruption resulting in no RSP signal [47]. The RSP parameters also include vital capacity, respiration rhythm, and tidal volume.
Our study indicates that high-valence emotions exhibit more stable and uniform RSP values than low-valence emotions. In terms of arousal differences, high-arousal states were characterized by a higher breathing frequency, whereas low-arousal states exhibited a lower breathing frequency. Low-arousal emotions are characterized by quick inhalations and slow exhalations with a prolonged ending [48].
In this study, we used FaceReader as a comparative emotion recognition method. FaceReader captures facial images using a camera and detects emotions by analyzing the relative position changes of facial landmarks. This recognition approach differs from the measurement approach of the millimeter-wave radar and is more direct, thereby providing a robust validation of the effectiveness of the proposed method. FaceReader has shown high accuracy in some of the literature [49]. However, in our study, we observed discrepancies between FaceReader's output and the emotions the videos were expected to elicit.
We believe that this discrepancy is due to differences in cultural backgrounds and emotional expression styles, which can lead to recognition errors when FaceReader is used with East Asian individuals. This issue has also been discussed in the literature [50].
To classify emotions, we referred primarily to Russell’s two-dimensional emotion model using valence and arousal as classification parameters. There are three reasons for using this approach:
Consistency with FaceReader: We aimed for our method to be as consistent as possible with FaceReader’s classification method, making the comparison between the two more relevant and informative.
Simplified Modeling: Using valence and arousal simplifies the modeling process, as these dimensions are easily comparable.
Enhanced Research Depth: Our study builds on existing literature and further refines classification by providing a more detailed differentiation of emotions.
4.2. Machine-Learning Frameworks
We used an SAE to transform raw signals into extracted features, and the sparse representations learned by the SAE were used as inputs to train SVMs as classifiers to predict emotion states. The combination of the SAE and SVMs leveraged the feature extraction capabilities of the SAE, reduced the difficulty of handling high-dimensional data, and enhanced the overall model performance. The advantages of combining SAE and SVM are as follows.
Effective feature extraction: The SAE, an unsupervised learning method, automatically learns useful feature representations from raw data. This is particularly beneficial for emotion recognition tasks because physiological signals related to emotions (e.g., RSP signals) often contain redundant and irrelevant information. Using the SAE, we can automatically extract features closely related to emotional states, thereby reducing the complexity of subsequent classification tasks.
Dimensionality reduction: The SAE maps the original high-dimensional data to a lower-dimensional sparse representation through a multilayer encoding and decoding process. This helps mitigate the "curse of dimensionality" that SVMs face when dealing with high-dimensional data, which can reduce their generalization capability. With the low-dimensional features extracted by the SAE, the SVM can classify more effectively.
Robust feature learning: The SAE can learn robust representations, even from limited datasets. This implies that even with a small amount of training data, the features extracted by the SAE can help the SVM achieve better performance.
Nonlinear mapping: The SAE employs nonlinear activation functions to map raw data to the feature space, enabling the model to capture nonlinear relationships within the data. This is crucial for emotion recognition tasks because emotions often have complex nonlinear relationships with physiological signals.
By combining the SAE and SVM, we achieved a more efficient and effective approach to emotion recognition. The SAE reduces the complexity and dimensionality of the input data, making it easier for the SVM to handle and classify the data, thus improving the overall performance of the model in recognizing emotions based on physiological signals.
4.3. Comparison with FaceReader Results
The results show that the proposed system achieves acceptable accuracy. The SAE reduced the complexity of data processing by lowering the data dimensionality, and it automatically selects appropriate RSP signal features for classification through training, thereby enhancing the overall system performance.
Under normal lighting conditions, the accuracy of our method is comparable to that of the FaceReader. However, under poor lighting conditions, the accuracy of the FaceReader decreased significantly owing to its reliance on clear facial images, which could not be obtained. In contrast, our method maintained consistent accuracy regardless of lighting conditions, as visual data are unnecessary.
According to the literature [44], using manually selected features for prediction achieves a higher accuracy than directly using segmented signal waveforms. Compared with our method, the manually selected features resulted in a higher accuracy, validating the findings in the literature. Additionally, the results of our method are close to those obtained using manual feature selection. The manual selection of signal features requires experts with extensive experience and knowledge, which increases the deployment complexity and cost of the entire system. In contrast, our method demonstrates the power of machine learning in this field, with broad potential for application. The automation and efficiency of machine-learning models make them suitable for emotion recognition tasks in various environments, including those with challenging conditions, such as poor lighting.
4.4. Availability in Closed Cabins
To comprehensively evaluate the performance of our study in closed-cabin environments, we consulted human factor experts and compared our method with three existing emotion recognition methods across three dimensions: invasiveness, system complexity, and deployment cost. The comparison results are presented in Figure 15, which highlights the strengths and weaknesses of each approach. Our noncontact method, with its balance of low invasiveness, moderate complexity, and cost-effectiveness, shows great promise for practical deployment in closed-cabin environments.
4.5. Limitation
The limitations of this study are as follows:
Sample bias: The sample was not sufficiently diverse. Because closed-cabin personnel are mostly young males, there was a selection bias in the choice of participants. In addition, the experimental environment was a simulated closed space, which may affect the generalizability of the proposed method.
Comparison with FaceReader: FaceReader was used as the control method. However, previous literature indicates that FaceReader's accuracy is somewhat limited for East Asian populations [50,51,52], which may affect the strength of the evidence provided in this study.
Laboratory setting: The study was conducted in a laboratory environment rather than in a real closed cabin, such as a manned submersible. Real-life electromagnetic environments are more complex and may affect the effectiveness of the proposed method.
Single modality: For convenience of deployment, this study used only respiration signals. According to other studies, employing multimodal signals, such as EEG and skin conductance, can significantly enhance the accuracy of emotion prediction [25,53].
Emotion stimulus material selection: In this study, we used video clips as the emotion stimulus material; however, the literature [46] offers a more effective solution by using VR scenes to evoke the corresponding emotions. This method provides a more controlled and immersive environment for emotion elicitation and assessment, leading to applications in various domains such as mental health, marketing, and entertainment.
We plan to address these issues in future studies. Previous studies have shown that the ability to maintain emotional stability has a significant impact on workplace performance. Therefore, further research is required to understand the patterns and distribution of emotional changes among personnel working in closed-cabin environments.
Future research directions include the following.
Diverse sample population: Include a broader range of participants, to improve the generalizability and robustness of the findings.
Enhanced validation methods: Use more reliable algorithms that account for cultural and demographic variations, to strengthen the validation of the proposed method.
Real-world testing: Conduct experiments in actual closed environments such as manned submersibles, to test the robustness of the method under real-world conditions.
Multimodal signal integration: Incorporate additional physiological signals such as EEG and ECG signals to improve the accuracy and reliability of emotion recognition.
To address these limitations, we aim to refine our method and enhance its applicability to various closed-cabin environments. Further investigation into the emotional stability of closed-cabin personnel during work will also contribute to improving their performance and safety.
5. Conclusions
This study explored the potential of RSP signals for recognizing emotions in closed cabins. The widely used dimensional emotion theory, the valence–arousal theory, was employed to classify the four types of emotions. SAEs were used to extract and select emotion-related features. SVMs were then used to classify high/low arousal and valence. We validated the method from several perspectives. The test results demonstrated that the proposed method achieved acceptable performance. In summary, millimeter-wave radar sensors and respiration signals exhibit significant potential for recognizing emotions in closed cabins.
We believe that there are two main directions for improvement to achieve better recognition results: introducing other physiological signals that can be measured noninvasively, such as heart signals or skin conductance, and adopting multimodal physiological data to enhance performance. In addition, choosing more powerful classifiers, such as random forests, can improve the recognition accuracy. With these improvements, we will be able to monitor the emotions of personnel in closed cabins without interrupting their ongoing activities.
The results of this study not only offer new insights for the development of emotion recognition technology but also contribute to enhancing the safety and work efficiency of individuals in closed environments, demonstrating significant application value.