1. Introduction
Stress plays an important role in our overall health. However, many people do not take stress seriously, and most are unaware of their current stress levels. Stress is a trigger for many diseases. Numerous studies have demonstrated that stress can negatively affect individuals with coronary artery disease and increase the risk of stroke [1]. Stress also causes blood pressure to rise, and rapid stress can eventually lead to hypotension. Managing stress is crucial for maintaining overall health and can be effective in lowering blood pressure and reducing the development of hypertension [2]. Other studies have identified psychological stress, alcohol abuse, clinical infection, trauma, and surgery as some of the main causes of stroke [3]. If stress is not managed in time, it can have severe consequences for professionals in specific fields, such as surgery, aviation, and driving. Therefore, many researchers have sought to measure people’s stress levels [4]. By taking stress seriously and employing measures to manage it, individuals can significantly reduce their risk of developing stress-related diseases. In experimental settings, stress can be induced using different mental stressors, including computer work tasks, the Stroop color and word task, arithmetic tasks, public speech tasks, and academic examinations [5]. To detect stress, researchers often rely on contact-based measurements, such as the electrocardiogram (ECG), electrodermal activity (EDA), and the electroencephalogram (EEG), which records the electrical activity of the brain. By pre-processing these signals, it becomes possible to identify specific characteristics that correspond closely with a person’s emotional states. Posada-Quintero et al. [6] assessed stress in divers by measuring changes in their sweat levels using EDA during water immersion. EDA data were collected from 14 subjects while the divers performed a specific Stroop task underwater. Tzevelekakis et al. [7] classified three levels of stress (low, moderate, and high) using ultra-short-term raw ECG signals. They used the DriveDB dataset, in which ECG signals were recorded while drivers were driving. By employing convolutional neural networks (CNNs), they achieved accuracies of 83.55% and 98.77% for the 3-level and 2-level stress classifications, respectively. Some researchers used multiple signals; Keshan et al. [8] collected signals from contact-based devices, namely ECG, electromyogram (EMG), foot and hand galvanic skin response, and respiration rate, from 17 participants. They detected the drivers’ stress levels during driving periods and divided them into low, moderate, and high stress levels based on traffic conditions. Using all device data, they achieved 97.4% accuracy for high stress; using only ECG signals, the proposed system achieved 88.24% accuracy in predicting low, moderate, and high stress levels. This shows that the information obtained through the ECG signal is very useful in determining stress. Heart rate variability (HRV) is increasingly recognized as a powerful and reliable indicator of stress [5]. HRV refers to the variation in the time interval between consecutive heartbeats, which is affected by various physiological and psychological factors, including stress levels. HRV is measured by analyzing the time series of beat-to-beat intervals from heart rate data, providing a non-invasive window into the autonomic nervous system’s dynamics. Changes in the nervous system during periods of mental stress can significantly affect HRV features. Studies have shown that both long-term HRV analysis from 24-h recordings and ultra-short-term HRV analysis, shorter than 5 min, can detect stress [5,9,10]. Moreover, stress-induced changes in the nervous system can also influence other HRV features, such as the power spectral density, which provides insights into the balance between sympathetic and parasympathetic activity, and various time-domain and frequency-domain features. During periods of stress, significant changes are observed in the time-domain features of HRV, specifically the RR intervals (the time intervals between successive heartbeats), the root mean square of successive differences (RMSSD), and pNN50, the percentage of successive NN intervals that differ by more than 50 ms. All of these features decrease during stress. Furthermore, in the frequency domain, the high-frequency (HF) component of HRV also decreases during stress. Conversely, the low-frequency (LF) component and the low-frequency/high-frequency (LF/HF) ratio increase during stress [11,12,13,14,15,16]. A higher HRV is generally associated with a healthy, resilient cardiovascular system and a strong ability to adapt to stress. Conversely, reduced HRV suggests a predominance of stress responses, less flexibility in responding to environmental demands, and potentially greater risk for cardiovascular and other stress-related disorders [1,2,3].
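To make the time-domain features named above concrete, the following is a minimal sketch of how mean HR, SDNN, RMSSD, and pNN50 could be computed from a list of RR intervals. The function name `time_domain_hrv` and the two example RR series are illustrative assumptions, not data or code from this study.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """Compute basic time-domain HRV features from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)  # differences between successive intervals
    return {
        "mean_hr_bpm": 60000.0 / rr.mean(),                   # 60,000 ms per minute
        "sdnn_ms": rr.std(ddof=1),                            # standard deviation of intervals
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),             # root mean square of successive differences
        "pnn50_pct": 100.0 * np.mean(np.abs(diffs) > 50.0),   # % of successive differences > 50 ms
    }

# Illustrative (made-up) RR series: one with large and one with small beat-to-beat variation.
relaxed = time_domain_hrv([800, 860, 790, 870, 805, 865])
stressed = time_domain_hrv([600, 610, 605, 600, 610, 605])
```

In this toy example the second series has a higher mean heart rate and lower RMSSD and pNN50 than the first, which matches the direction of change under stress described above.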
Typically, HRV measurements derived from ECG are performed in a clinical setting by professionals with specialized knowledge in the field. However, this approach necessitates a visit to the clinic, which might not always be convenient for continuous monitoring. Alternatively, photoplethysmography (PPG)-based wearable technology offers a practical solution for those looking to measure their HRV outside of a clinical environment. A variety of wearable devices designed to track HRV, among other health metrics, are available on the market. These devices, which range from smartwatches to fitness trackers, provide the advantage of continuous monitoring in real time, allowing users to keep track of their HRV data throughout the day during various activities. Recent advancements in technology have led to the development of non-contact methods for monitoring physiological signals, among which is the remote photoplethysmography (rPPG) technique. The rPPG technology enables the detection of blood volume changes in the facial skin through a camera that captures light reflected from the skin. In recent studies, researchers have made significant progress and achieved promising results in determining HRV by utilizing camera-based technologies. Huang et al. [17] used the rPPG signal to estimate HRV from facial videos using chrominance (CHROM)-based methods. To enhance the accuracy of detecting RR intervals, a continuous wavelet transform was implemented. The HRV metrics calculated for each participant, including SD1, SD2, SDNN, RMSSD, and SDSD, were evaluated under two different conditions: “Static subjects” and “Static subjects with makeup”. The results showed an average absolute error of 3.53 ms when compared to the ECG chest band device. In addition, their method outperformed the independent component analysis (ICA) and CHROM methods in calculating HRV features. Deep learning techniques have become increasingly popular for enhancing the accuracy of HRV analysis. Song et al. [18] introduced the PulseGAN method, which incorporates CHROM and conditional generative adversarial networks (GAN), to estimate HRV from the face. Kuang et al. [19] proposed ESA-rPPGNet, employing 3D depth-wise separable convolution to enhance network performance for HRV analysis. Some researchers used thermal images to extract features to detect stress from the face. Mohd et al. [20] found a correlation between blood flow and temperature changes in the face during stress in thermal images. They used thermal infrared and visible cameras, and their method showed 88.6% accuracy. To increase accuracy, Gioia et al. [21] combined thermal imaging with physiological signals, such as cardiac, electrodermal, and respiratory activity, to detect acute stress. All signals were recorded from 25 participants. For classification, they implemented a support vector machine model. Using thermal imaging alone, the system achieved 86.84% accuracy; combined with the physiological features, it achieved 97.37% accuracy. Zhang et al. [22] detected stress using a combination of ECG, voice, and facial expressions with deep learning. The ECG signal was acquired with three-electrode leads using a Biopac MP160 device; for facial expressions and voice recording, they used a Sony FDR-AX700 video camera. The proposed system showed an accuracy of 74% for ECG, 83% for voice, and 79% for facial expressions. Combining all of them, they achieved 85.1% accuracy in detecting acute stress. Mitsuhashi et al. [23] combined two methods: the hemoglobin, melanin, and shading (HMS) method and the Spatial Subspace Rotation (2SR) method. A total of 78 videos were collected from 7 subjects. An ECG device was used as ground truth for pulse waves. The participants’ stress levels were assessed through responses to the State-Trait Anxiety Inventory (STAI) questionnaire. The K-nearest neighbor method was used for stress classification. The combination of HMS and 2SR achieved over 90% accuracy in the relaxation state and 80% accuracy in the stress state.
In this paper, we evaluate various machine learning techniques to identify the most effective predictive model that can accurately determine stress levels using only heart rate (HR) and HRV features from different datasets. Furthermore, the models with the highest predictive accuracy are used to classify stress based on HR and HRV features obtained from the face using a camera. Estimation of HRV from the face consists of several main steps: first, we detect the subject’s face, and then we apply the plane-orthogonal-to-skin (POS) [24] method. This technique enhances the accuracy of detecting physiological signals from the face. Following this, we apply the discrete wavelet transform (DWT) to remove noise from the signal. This step allows a more precise calculation of HRV from the face. An overview of our proposed method for stress detection from the face is shown in Figure 1. Because stress detection is a complex process and depends on many factors, we selected three different publicly available datasets, each offering unique insights into stress indicators and responses. These datasets encompass a variety of scenarios, including work-related stress and cognitive tasks. To effectively analyze these datasets and extract meaningful insights into stress indicators, we employed a range of machine learning techniques. These techniques were selected for their particular strengths in pattern recognition and predictive modeling. The first dataset is the SWELL dataset [25]. Many researchers have achieved good prediction accuracy on the SWELL dataset. Sharma et al. [26] applied various machine learning methods, and a two-class neural network model achieved an accuracy of 98% on the SWELL dataset. Another study, conducted by Koldijk et al. [27], achieved 90% accuracy using support vector machines (SVM). Albaladejo-González et al. [28] achieved an accuracy of 88.64% using a supervised multi-layer perceptron (MLP) model. Ghosh et al. [29], using an image-encoding-based deep neural network, achieved a promising accuracy of 99.39% on the SWELL dataset. The second dataset is the PPG sensor dataset. The author implemented various machine learning classifiers, and the K-nearest neighbor (KNN) algorithm achieved 72% accuracy. Using a genetic algorithm led to a significant increase in accuracy, reaching 81% [30]. The third dataset is the ECG and EEG sensor dataset [31]. The accompanying study focused on classifying stress levels (low, moderate, and high) by analyzing ECG and EEG data. When examining the dataset, the authors found notable differences between genders. Specifically, the accuracy of correctly identifying the stress levels was 62.60% for females and 71.57% for males. To enhance the accuracy of stress classification, the researchers employed stacking, an ensemble learning method that integrates several classification models to improve predictions. By using this method, they achieved an overall accuracy of 64.08% across both genders.
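The POS projection used in the face-based pipeline described earlier in this section can be sketched as follows. This is a minimal illustration of the published POS algorithm applied to per-frame mean skin RGB values; it assumes face detection and skin averaging have already been done, and it omits the DWT denoising step. The function name `pos_pulse`, the 1.6 s window, the 0.7 to 4 Hz search band, and the synthetic input are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pos_pulse(rgb, fps, window_s=1.6):
    """Extract a pulse signal from an (N, 3) array of mean skin R, G, B values per frame."""
    rgb = np.asarray(rgb, dtype=float)
    n = rgb.shape[0]
    win = int(round(window_s * fps))
    proj = np.array([[0.0, 1.0, -1.0], [-2.0, 1.0, 1.0]])  # plane orthogonal to skin tone
    pulse = np.zeros(n)
    for start in range(0, n - win + 1):
        block = rgb[start:start + win]
        normed = block / block.mean(axis=0)       # temporal normalization per channel
        s = normed @ proj.T                       # two projected signals, shape (win, 2)
        alpha = s[:, 0].std() / (s[:, 1].std() + 1e-12)
        h = s[:, 0] + alpha * s[:, 1]             # alpha-tuned combination
        pulse[start:start + win] += h - h.mean()  # overlap-add of zero-mean segments
    return pulse

# Synthetic input (illustrative only): a 1.2 Hz (72 bpm) pulse plus sensor noise.
fps = 30.0
t = np.arange(300) / fps
wave = np.sin(2 * np.pi * 1.2 * t)
rng = np.random.default_rng(0)
rgb = np.array([120.0, 90.0, 80.0]) * (1 + 0.01 * np.outer(wave, [0.33, 0.77, 0.53]))
rgb += rng.normal(0.0, 0.05, rgb.shape)

estimated = pos_pulse(rgb, fps)
spectrum = np.abs(np.fft.rfft(estimated))
freqs = np.fft.rfftfreq(len(estimated), d=1.0 / fps)
band = (freqs >= 0.7) & (freqs <= 4.0)            # plausible heart-rate band (assumed)
hr_bpm = 60.0 * freqs[band][np.argmax(spectrum[band])]
```

For HRV rather than average HR, the peaks of the extracted pulse signal would be located to obtain beat-to-beat intervals, which is why denoising before peak detection matters.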
3. Results
In our study, we used only the HR and HRV features from the publicly available datasets and applied various machine learning techniques. The models included a decision tree, logistic regression, random forest, K-nearest neighbor, gradient-boosting classifier, and support vector classifier with a linear kernel. To provide a rigorous assessment of the models’ performance and to enhance the generalizability of our findings, we employed 10-fold cross-validation. For each model, we conducted a series of experiments to fine-tune its parameters, aiming to achieve the best possible balance between sensitivity and specificity.
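As an illustration of this evaluation protocol, the sketch below runs 10-fold cross-validation over the six model families listed above using scikit-learn. The synthetic data from `make_classification` stands in for the HR and HRV feature tables, and the stratified, shuffled folds and the hyperparameters shown are assumptions rather than the settings used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for HR/HRV features and stress labels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=2000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "linear_svc": SVC(kernel="linear"),
}

# Ten stratified folds; each model is trained on nine folds and scored on the tenth.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}
mean_accuracy = {name: fold_scores.mean() for name, fold_scores in scores.items()}
```

Reporting the mean and spread of the ten fold scores gives a less optimistic estimate than a single train/test split.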
On the SWELL dataset, random forest achieved 99% predictive accuracy in stress level prediction using only HR and HRV features. To achieve this, the dataset was divided into a training set (70%) and a testing set (30%). We normalized the training and testing sets by scaling feature values to a range between 0 and 1. This normalization speeds up the learning process and improves the performance of many machine learning algorithms by eliminating the bias that can arise from differences in measurement scales. We then applied the minimal-redundancy-maximal-relevance (mRMR) feature selection technique, which identifies a compact set of features that contribute most significantly to predicting the outcome, and selected eight features. The random forest classifier was configured with 100 trees, a maximum depth of 15, and leaves containing at least one sample. This high level of accuracy demonstrates that stress can be predicted effectively using only HR and HRV metrics. Table 2 shows the performance of the random forest model in classifying stress levels, Table 3 compares the performance of the different models, and Table 4 compares the proposed method with existing methods.
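The SWELL configuration described above (70/30 split, scaling to the range 0 to 1, eight mRMR features, and a random forest with 100 trees, maximum depth 15, and at least one sample per leaf) could be sketched as follows. The `mrmr_select` helper is a simplified greedy variant written for illustration: it scores relevance with mutual information and redundancy with mean absolute Pearson correlation, which may differ from the mRMR implementation used in this study. The synthetic data and the choice to fit the scaler on the training set only are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def mrmr_select(X, y, k, random_state=0):
    """Greedy mRMR sketch: maximize relevance minus mean redundancy (hypothetical helper)."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]        # start with the most relevant feature
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        scores = relevance[remaining] - redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Placeholder data standing in for the SWELL HR/HRV feature table.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_redundant=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

scaler = MinMaxScaler().fit(X_train)              # scale features to [0, 1]
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

features = mrmr_select(X_train_s, y_train, k=8)   # eight features, as in the text
clf = RandomForestClassifier(n_estimators=100, max_depth=15,
                             min_samples_leaf=1, random_state=0)
clf.fit(X_train_s[:, features], y_train)
test_accuracy = clf.score(X_test_s[:, features], y_test)
```

Selecting features on the training portion only keeps the held-out set untouched by the selection step, which avoids an optimistic bias in the reported accuracy.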
On the PPG sensor dataset, logistic regression outperformed the other tested models in terms of predictive accuracy, achieving an accuracy of 84.2%. The initial stages of data analysis involved cleaning, preprocessing, and filtering of the raw data. Following filtering, we extracted both time-domain and frequency-domain HRV features. The dataset was then split into training and testing sets in a 70/30 ratio. After splitting, we normalized the data. The logistic regression model was configured with a maximum of 2000 iterations; all other parameters were left at their default settings. Table 5 shows the performance of the logistic regression model in classifying stress levels, Table 6 compares the performance of the different models, and Table 7 compares the proposed method with existing methods.
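The frequency-domain part of the feature extraction mentioned above can be illustrated with a short sketch that estimates LF power, HF power, and the LF/HF ratio from RR intervals. The band limits (0.04 to 0.15 Hz for LF, 0.15 to 0.40 Hz for HF) are the conventional ones; the 4 Hz cubic resampling, the Welch estimator, and the synthetic RR generator are assumptions for illustration and are not taken from this study.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf(rr_ms, fs=4.0):
    """Return (LF power, HF power, LF/HF) from RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    beat_times = np.cumsum(rr) / 1000.0                   # beat times in seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_even = interp1d(beat_times, rr, kind="cubic")(grid)  # evenly resampled series
    freqs, psd = welch(rr_even - rr_even.mean(), fs=fs,
                       nperseg=min(256, len(rr_even)))
    df = freqs[1] - freqs[0]
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum() * df   # low-frequency band power
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum() * df   # high-frequency band power
    return lf, hf, lf / hf

def synthetic_rr(lf_amp, hf_amp, n_beats=300):
    """Made-up RR series with oscillations at 0.1 Hz (LF) and 0.25 Hz (HF)."""
    t, rr = 0.0, []
    for _ in range(n_beats):
        interval = (800.0 + lf_amp * np.sin(2 * np.pi * 0.10 * t)
                    + hf_amp * np.sin(2 * np.pi * 0.25 * t))
        rr.append(interval)
        t += interval / 1000.0
    return rr

_, _, ratio_lf_dominant = lf_hf(synthetic_rr(lf_amp=40.0, hf_amp=20.0))
_, _, ratio_hf_dominant = lf_hf(synthetic_rr(lf_amp=20.0, hf_amp=40.0))
```

Resampling is needed because heartbeats are unevenly spaced in time, while the Welch estimator expects uniformly sampled input.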
For the ECG and EEG dataset, the data were split into training and testing sets in an 80/20 ratio. We employed a soft voting ensemble technique, combining the predictions of diverse classifiers to enhance the robustness and accuracy of stress prediction. The base learners were logistic regression, decision tree, random forest, and K-nearest neighbors. All classifiers used default parameters as a starting point, providing a balanced approach between performance and computational efficiency. In the soft voting ensemble, each classifier’s probabilistic predictions were aggregated to yield the final prediction. This approach capitalizes on the strength of each base learner and compensates for their weaknesses. It improved our predictive accuracy, with the ensemble model achieving an accuracy of 67%. Table 8 shows the performance of the ensemble model in classifying stress levels, Table 9 compares the performance of the different models, and Table 10 compares the proposed method with existing methods.
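A soft voting ensemble of the four base learners named above could be assembled in scikit-learn roughly as follows. The three-class synthetic data is a placeholder for the ECG and EEG features, and the `random_state` arguments are added only for reproducibility; they are assumptions and not part of the configuration reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder three-class data (low, moderate, high stress).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",  # average the predicted class probabilities of the base learners
)
ensemble.fit(X_train, y_train)
proba = ensemble.predict_proba(X_test)   # averaged probabilities, one row per sample
accuracy = ensemble.score(X_test, y_test)
```

With soft voting, a confident base learner can outweigh several uncertain ones, which is the main difference from majority (hard) voting.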
To validate the proposed method for estimating HR and HRV from facial data, we calculated the mean absolute error (MAE) for both the standard deviation of normal-to-normal intervals (SDNN) and heart rate. We conducted experiments using the PURE dataset, on which our estimation method achieved an MAE of 9.32 ms for SDNN and an MAE of 1.75 bpm for heart rate. The results of these experiments are presented in Figure 7, Figure 8, and Table 11.
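For reference, the MAE used here is the mean of the absolute differences between the camera-based estimates and the reference values. The short sketch below shows the computation; the numbers are made up for illustration and are not results from the PURE dataset.

```python
import numpy as np

def mean_absolute_error(estimated, reference):
    """Mean of absolute differences between estimates and reference values."""
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(estimated - reference)))

# Illustrative per-recording values (camera estimate vs. contact reference).
sdnn_camera_ms = [50.0, 49.0, 45.0, 70.0]
sdnn_reference_ms = [42.0, 55.0, 38.0, 61.0]
hr_camera_bpm = [71.0, 80.0, 66.0, 92.0]
hr_reference_bpm = [70.0, 82.0, 65.0, 90.0]

mae_sdnn = mean_absolute_error(sdnn_camera_ms, sdnn_reference_ms)
mae_hr = mean_absolute_error(hr_camera_bpm, hr_reference_bpm)
```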
As shown in Table 11, our proposed method performed better than the traditional CHROM [18] and FaceRPPG [19] methods and the deep learning-based PulseGAN [18], PhysNet [19], rPPGGAN [37], and ESA-rPPGNet [19] methods. On the custom dataset, our proposed system showed an accuracy of 95% for HR and 82% for overall HRV features (Table 12).
To evaluate the models under real-world conditions, we measured the participants’ stress levels both in a normal state and during public speaking. Public speaking is widely recognized as an effective mental stressor, making it an ideal method for inducing stress [9]. During the public speaking task, each participant’s face was recorded using cameras. The experiment was conducted with 5 participants. We evaluated the performance of models trained on the three datasets: the SWELL dataset, the PPG dataset, and the ECG and EEG dataset. Each dataset offers physiological signals and their correlation with stress.
We first evaluated participants’ stress levels using the best models before the public speaking task. At this stage, none of the participants showed any signs of stress. During the public speaking task, the random forest model trained on the SWELL dataset detected stress (interruption, time pressure) in all 5 participants. This suggests that the SWELL dataset, which includes a variety of physiological responses to stressors, is particularly effective for training models to recognize stress during high-pressure tasks like public speaking. The logistic regression model trained on the PPG dataset identified stress in 3 of the 5 participants, a detection rate of 60%, indicating that the model is sensitive to stress but less so than the SWELL-trained model. The ensemble model trained on the ECG and EEG dataset detected no stress in any of the 5 participants. There are two likely explanations. First, the model reached an accuracy of only 67% on its own dataset, which suggests that its detection capabilities are limited and that it may fail to recognize stress under certain conditions. Second, the dataset used for training this model was small, which significantly impacted its performance. The small dataset size, combined with the moderate accuracy, likely contributed to the model’s failure to detect stress in the participants.
4. Discussion
Stress is increasingly being recognized as a significant issue with serious consequences for health. In this research, we proposed a contactless method for stress detection using a standard camera. We used only HR and HRV to determine stress and achieved high accuracy, especially on the SWELL dataset. This suggests that HRV is a powerful indicator of stress. We attribute the higher performance of the model trained on the SWELL dataset, in comparison to models trained on the other datasets, to the dataset’s extensive size and the long duration over which the signals were recorded. The SWELL dataset comprises 410,322 records. A large number of examples provides more information about stress characteristics, increasing the model’s ability to detect stress in different situations. This large dataset size is critical to training robust machine learning models because it allows the identification of subtle patterns in the data that smaller datasets may miss. The PPG and ECG datasets may not capture the full spectrum of stress responses, potentially limiting the models’ learning scope.
In summary, the superior performance of the model trained on the SWELL dataset appears to be a consequence of the dataset’s large size and the comprehensive nature of the physiological signals it contains, recorded over a significant duration. This analysis highlights the importance of dataset selection in developing highly effective stress detection models and suggests that the inclusion of diverse and extensive data can significantly enhance model performance. While the current study presents a promising contactless method for stress detection via HRV analysis using a simple camera, some limitations affect the accuracy of the system. One of the primary limitations is the sensitivity to environmental factors, specifically lighting conditions during camera-based HRV assessment. Fluctuations in light intensity and direction can affect the camera’s ability to detect the changes in skin coloration associated with heartbeats, which can lead to inaccuracies in HRV readings and reduce the system’s overall effectiveness. Another limitation is that sharp movements and dark skin tones can significantly reduce the system’s accuracy. While the system is less sensitive to minor, subtle motions, sudden or sharp movements can still disrupt the signal. Dark skin tones pose challenges because of their lower reflectivity and higher absorption of light. Future research will focus on addressing these limitations and exploring stress responses across various scenarios with more participants, extending the investigation to include other vital sign indicators.