1. Introduction
Stress refers to a feeling of emotional and physical tension provoked by an event or thought that causes frustration, nervousness, happiness, emotional distress, or anger [
1,
2]. Hans Selye defined stress as “a phenomenon which has a non-specific response to high demands”. Therefore, stress is recognized as the body’s mental and physical response to highly demanding events [
3]. These demands may arise from financial and relational issues and complexities that are difficult for an individual to manage. When stress levels surpass an individual’s coping capacity, it adversely affects their lifestyle, productivity, and performance [
4]. Persistent and chronic stress can lead to multiple health problems and extended recovery periods. Stress can result in mental disorders, joint disease, hypotension, and hypertension, as well as physiological, physical, and behavioral issues [
5]. Stress can weaken the immune system and impair an individual’s ability to perform daily tasks [
6]. According to a World Health Organization (WHO) report, approximately one-third of the global population experiences stress annually. On average, one in twenty people suffers from stress or its effects [
7]. With the rise in internet usage, young adults have increased access to online content through personal computers, tablets, and smartphones [
8]. Young adults are highly engaged with social networks such as WhatsApp, Facebook, Twitter, YouTube, Instagram, LinkedIn, and Snapchat, which are significant multimedia content sources [
9,
10]. Surveys indicate that young adults are more associated with Internet use [
11]. Previous studies have identified multimedia use as a critical risk factor in modern society [
12,
13]. It poses significant risks to mental health by inducing emotions that can disrupt mental equilibrium [
14]. Multimedia content can elicit emotions that positively or negatively impact brain activity, thus affecting an individual’s lifestyle directly or indirectly [
15]. Persistent feelings of fear, distress, and nervousness can negatively affect brain activity and induce stress [
16]. Functional neuroimaging research indicates that addiction to multimedia damages the frontal-basal pathways, leading to an increased frequency of unwanted actions such as stress, anxiety, and suicidal tendencies [
17].
Various studies have shown that stress can be measured using brain signals [
7,
18,
19,
20,
21,
22]. Brain-Computer Interface (BCI) is a technique that captures the brain signals responding to a provided stimulus. After examining the acquired signals, their features are extracted. Finally, classification is performed using machine learning or deep learning classifiers. Stress significantly impacts the health, lifestyle, and overall well-being of an individual [
23]. Neuroscience reveals that the brain is the primary target of stress. For a long time, EEG has been studied to distinguish between different emotions. Observing human responses to problems through physiological signals enhances our understanding of their emotions [
24].
Emotions are among the most important and complex features of human behavior. They are a source of information about people’s experiences. Emotional states are commonly described by the valence-arousal model, which is a foundational concept in psychology [
1]. Valence refers to the pleasantness or unpleasantness of an emotion. Positive valence emotions are considered enjoyable (e.g., joy, contentment), while negative valence emotions are unpleasant (e.g., anger, sadness), and arousal captures the level of intensity associated with emotion. High-arousal emotions are characterized by increased alertness and physiological responses (e.g., excitement, anxiety), while low-arousal emotions are associated with calmness and relaxation. This model proves valuable in understanding human stress. Stressful situations typically evoke emotions with negative valence and high arousal. Emotion consists of three components: subjective experience, physiological response, and expressive response [
25]. Strong emotions like anger and fear, as well as mental stress, are linked to abnormal heart rhythms [
26].
Media content can both relieve and induce stress. Limited research exists on assessing stress on social websites where people interact [
27]. The literature reveals various research studies focused on stress assessment [
28,
29,
30], many of which utilize stimulus-based approaches that achieve significant accuracy. Researchers have used the valence-arousal model to map emotional responses during stressful situations. In [
28] there were increased reports of negative valence and high arousal when participants engaged in a stressful activity, like a public speaking task. The valence-arousal model provides a framework for developing tools to measure stress. These tools might analyze facial expressions, physiological signals (EEG, heart rate, skin conductance), or voice patterns to estimate valence and arousal levels, thereby indicating potential stress [
29]. In [
30], scholars explore how physiological signals using valence and arousal can be used to recognize stress in real-world settings.
However, several limitations exist in these studies. Firstly, most employ specific stimuli such as arithmetic tests, Stroop tests, and presentation activities to induce stress [
31,
32,
33], with only a few using videos as a stimulus to evaluate stress levels. Secondly, the valence and arousal values available in existing datasets are rarely used to label stress in participants. With the rise in internet usage dominated by video content, it is necessary to determine the impact of videos on brain activity using the valence-arousal model.
To address these gaps, we employed a new labeling mechanism. It is based on valence and arousal values to classify individuals in stressed or relaxed conditions in response to the presented videos. Additionally, we have statistically determined whether video-based datasets can influence different EEG signal bands. Furthermore, machine learning classifiers are trained and used to automate stress detection in response to videos. This research adds to the existing body of literature through the following key contributions:
Enhances the classification accuracy of stress and relaxation states in response to videos by using a valence-arousal labeling mechanism and Power Spectral Density (PSD) features.
Identifies alpha band activity as a statistically significant feature in distinguishing between stressed and relaxed conditions, which aligns with existing literature.
The rest of the work is organized as follows.
Section 2 is a literature review of the related work in human stress assessment using multimedia content.
Section 3 describes the methodology in detail, followed by the results of the proposed framework and the statistical analysis of videos from the DEAP dataset in
Section 4. A comparison with related studies and recent techniques is presented in
Section 5.
Section 6 presents the conclusion and future work of the proposed scheme.
2. Literature
Human stress research is one of the growing areas of research in the domain of neural engineering and biomedical science. Stress assessment includes studying different methodologies and patterns to analyze human stress, such as mental, physical, and emotional [
7]. Stressors may vary from person to person because individuals have different trigger points. When human beings confront a potentially threatening situation, physiological changes occur in the brain and nervous system. This situation acts as a stimulus that induces a cascade of processes in the body to adapt to the incident [
34].
There is a rapid increase in internet-related applications and social media sites. Consequently, human beings’ exposure to multimedia content is rapidly increasing [
19]. It has therefore become more important to assess stress in response to multimedia content, i.e., videos. Various methods have been proposed to automatically distinguish between stress and relaxation using facial expression, body language, skin conductance, and brain activity. Stress assessment through heart rate is obtained via functional near-infrared spectroscopy.
Emotions are affected by external influences along with internal influences. They are evoked by stimuli such as images and videos. EEG-based emotion recognition studies have been focused on the valence arousal model [
1,
35] by analyzing the facial expression and heart rate [
15]. In the research literature, two perspectives are usually used to understand emotions. The first is a discrete model involving basic evolutionary emotions such as happiness, joy, sadness, fear, disgust, and surprise. The other perspective is known as a dimensional model. Recent studies showed that emotions can be determined from emotional models such as Russell’s continuous emotional model, which includes valence and arousal values; Davis’s emotional power model, which uses arousal and pleasure; and the Mehrabian pleasure, arousal, and dominance emotional model [
1]. Valence and arousal values range between 1 and −1. Stress and anxiety can be predicted by using valence and arousal values [
18].
It was found that invoking different emotions can affect stress [
36]. A study uses arousal-based electrodermal activity to determine emotional stress by using the genetic algorithm on the DEAP dataset [
20]. In this study, subjects undergo relaxation, arousal, and a stressful environment. The results showed that the mental state is affected by arousal and a stressful environment. Some studies used passive methods for stress induction, and some used cognitive tasks, psychosocial stress, physical discomfort, or naturalistic stressors. In [
37], the DEAP dataset [
38] is used, in which EEG signals are recorded while participants watch videos. Stressed and relaxed conditions are determined from the valence and arousal values, which range from 1 to 9. These values are mapped through equations to determine the stressed and relaxed conditions of the participants.
Figure 1 depicts the circumplex model of affect. It shows the categorization of emotions based on their pleasantness and activation levels, with stress mapped to the unpleasant, high-arousal quadrant of the model. The Geneva Affective PicturE Database (GAPED) is a relatively large collection of images used to evoke various emotions [
39,
40]. Each image in the collection was scored for valence and arousal. Negative images with high arousal ratings, such as violence against humans and animals, may trigger stress. This kind of stress is identified by fluctuations in heart rate, physiological stress markers, EEG, and subjective rating [
41]. In [
42], a preliminary study is conducted to classify arousal by comparing the efficiency of machine learning classifiers. The DEAP dataset was used to determine participants’ stressed and relaxed conditions.
So, the arousal valence model is a two-dimensional emotion model that studies different emotions ranging from negative to positive and is used to describe the emotion quantitatively [
43]. Russell proposed standard emotion models that include the valence and arousal emotion model [
44]. Its effectiveness in determining emotion is the main reason for this model’s wide usage and popularity. A summary of related work is presented in
Table 1.
Table 1 specifies the classification model, dataset, and the purpose of the related studies.
3. Materials and Methods
In this section, the proposed methodology for stress classification and assessment is presented in response to videos, as shown in
Figure 2. We have selected the videos of the bench-marked DEAP dataset [
38] for stimulating stress and relaxation conditions. A brief description of the DEAP dataset is presented in
Table 2. These videos are available on the YouTube platform. The two-dimensional valence arousal model is used for labeling, and participants rate the videos based on their valence and arousal. Valence and arousal are used to label the participants in stressed and relaxed conditions. The participants’ EEG data was decomposed into four frequency bands, delta, theta, alpha, and beta, the frequency ranges of which are given in
Table 3.
After participants are marked for labeling, PSD features for delta, theta, alpha, and beta frequency bands are extracted using the MATLAB 2020b code. Machine learning classifiers are trained to classify participants’ stressful and relaxed conditions, and their accuracy is determined. A statistical t-test is conducted on each PSD feature to check the statistical significance of each feature in distinguishing stressed and relaxed conditions.
3.1. DEAP Dataset
DEAP is widely used for emotion recognition and stress assessment [
38]. Therefore, we used the DEAP dataset to validate the proposed scheme. It contains the physiological signals and subjective ratings collected from 32 healthy participants while they watched emotionally evocative music videos. Each participant watched 40 one-minute music videos that were selected to induce various emotional states of valence and arousal. The participants rated the music videos based on their valence and arousal values.
The DEAP dataset uses a BioSemi ActiView system to record EEG data. The original sampling frequency of the EEG signals was 512 Hz. In preprocessing, the data is down-sampled to 128 Hz to reduce computational complexity and to make time-domain and frequency-domain feature extraction more efficient without compromising signal quality. The down-sampled data is then EOG-corrected, filtered, and segmented.
Table 2.
DEAP Dataset—Description and Demographics.
Parameter | Description |
---|---|
Data Modality | EEG (Electroencephalogram), Peripheral physiological signals, Audio |
Number of Participants | 32 |
Age Range | 19–37 years old |
Gender | 46.87% females, 53.12% males |
Stimuli | 40 one-minute music video excerpts |
Self-Assessment | Valence (1–9 scale), Arousal (1–9 scale), Dominance (1–9 scale), Liking (1–9 scale), Familiarity (1–5 scale) |
Physiological Signals (EEG) | 32 channels, International 10–20 system |
Original Sampling Frequency (EEG) | 512 Hz |
Down-Sampled Frequency | 128 Hz |
Data Format | Preprocessed Python/NumPy files |
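The layout summarized in Table 2 can be made concrete with a small sketch. It assumes that each participant's preprocessed array has shape (trials, channels, samples), that the first 32 channels are EEG, and that every trial holds a 3-s pre-trial baseline followed by 60 s of stimulus at 128 Hz. These shape details and the function name `split_eeg_trials` are assumptions made for illustration, not details stated in the table.

```python
import numpy as np

FS = 128          # down-sampled rate from Table 2 (Hz)
N_EEG = 32        # number of EEG channels from Table 2
BASELINE_S = 3    # assumed pre-trial baseline length (s)
TRIAL_S = 60      # one-minute stimulus (s)


def split_eeg_trials(data, fs=FS):
    """Return EEG-only stimulus segments from a (trials, channels, samples) array.

    Assumes the first N_EEG channels are EEG and that each trial starts with
    BASELINE_S seconds of baseline followed by TRIAL_S seconds of stimulus.
    """
    data = np.asarray(data)
    start = BASELINE_S * fs
    stop = start + TRIAL_S * fs
    if data.ndim != 3 or data.shape[1] < N_EEG or data.shape[2] < stop:
        raise ValueError("expected (trials, >=32 channels, >=63 s of samples)")
    # Keep EEG channels only and drop the baseline samples.
    return data[:, :N_EEG, start:stop]
```

With these assumptions, each returned trial has 32 channels and 60 × 128 = 7680 samples.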
3.2. Participants
In the DEAP dataset, 32 healthy participants (46.87% females and 53.12% males) took part, aged 19–37 years with a mean age of 27.18 years. The participants have similar educational backgrounds (undergraduate and postgraduate). The dataset includes scores for valence, arousal, dominance, and liking on a scale from 1 to 9, as listed in Table 2. Participants gave these ratings while watching the videos. From these scores, participants are labeled as stressed or relaxed based on their valence and arousal ratings.
3.3. Preprocessing
Preprocessing is intended to filter out noise that may be introduced while gathering data from the sensors. EEG activity is primarily present in the 0.3 to 40 Hz range, but other components can appear as noise caused by eye blinks and muscle activity. Therefore, it is essential to extract meaningful data before further processing. Furthermore, the signal-to-noise ratio had to be improved to obtain better results from feature extraction. EOG artifacts were removed using EEGLAB version 2021.0, and a bandpass filter of 0.3–45.0 Hz was applied. The data was re-referenced to the common average. The data was segmented into 60-s trials, and a 3-s pre-trial baseline was removed [
38]. We opted for a sampling rate of 128 Hz. This sampling rate allows us to capture the dominant rhythms of the brain, such as delta, theta, alpha, and beta waves, which are crucial for assessing various brain states. While a higher sampling rate could potentially provide even finer details, 128 Hz offers a good balance between capturing essential information and keeping the data size manageable for further analysis.
We employed a median filter during the preprocessing stage. This technique functions by analyzing short segments of the signal around each data point. Within these segments, the median value, essentially the “middle” value, is calculated. The original data point is then replaced with this median value. By prioritizing the central tendency within a localized window, the median filter effectively cancels out impulsive noise spikes or dips that deviate significantly from the surrounding brain activity. This filtering process helps us extract a cleaner representation of the underlying neural processes, allowing for a more accurate analysis of stress-related changes reflected in the EEG recordings.
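The median filtering step described above can be sketched as follows. This is a minimal illustration, not the filter configuration used in the study: the window length `kernel_size` and the edge handling (repeating the first and last sample) are assumptions, since the text does not specify them.

```python
import numpy as np


def median_filter_1d(signal, kernel_size=5):
    """Replace each sample with the median of a centered window.

    kernel_size is a hypothetical choice; it must be odd so the window is
    symmetric. Edges are handled by repeating the first/last sample.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    x = np.asarray(signal, dtype=float)
    half = kernel_size // 2
    padded = np.pad(x, half, mode="edge")
    # One row per output sample, each row holding that sample's window.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=1)


# A single impulsive spike is suppressed while the flat background is kept.
noisy = np.array([1.0, 1.0, 1.0, 50.0, 1.0, 1.0, 1.0])
clean = median_filter_1d(noisy, kernel_size=3)
```

A longer window removes wider artifacts but also smooths genuine fast activity, so the window length would need to be chosen with the bands of interest in mind.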
3.4. Feature Extraction
Features are extracted from the preprocessed data. Recent studies [
7,
19,
37] showed that extracting informative features from the data is helpful for classifying stressed and relaxed conditions. In the DEAP dataset, EEG is recorded using a 32-channel headset, and four EEG bands are obtained from the recordings: delta (1–3 Hz), theta (4–7 Hz), alpha (8–12 Hz), and beta (13–30 Hz), as listed in Table 3. EEG data from all 32 channels is collected for all the video trials labeled either stressed or relaxed. The dimension of our feature vector is therefore (channels) × (features), i.e., 32 × 4 = 128.
Power Spectral Density (PSD) features are a common choice for analyzing EEG signals due to their ability to provide valuable insights into the different brain rhythms, such as delta, theta, alpha, beta, and gamma waves. These brain rhythms are associated with various cognitive states and mental processes, making PSD a valuable tool for studying stress and relaxation states. The statistical properties of EEG signals change over time. PSD features can be calculated over short windows to capture these dynamic changes. Deviations in PSD features have been linked to various neurological and psychiatric disorders, making them clinically relevant.
3.4.1. PSD Features
In spectral estimation, power is distributed over frequency bands contained in a finite set of data signals. PSD and the correlation sequence are mathematically related through the discrete-time Fourier transform. The signal will be more concentrated in its power spectrum if it is more correlated. Various methods of power estimation are present that are used for feature extraction [
45,
46]. However, non-parametric estimation methods are used for EEG signals as they have higher classification performance. Non-parametric methods usually estimate the PSD directly from the EEG signal itself. The simplest is the periodogram. The periodogram for an N-sample recorded signal is defined as

$$P_{xx}(f) = \frac{1}{N}\left|\sum_{m=0}^{N-1} x(m)\, e^{-j 2\pi f m}\right|^{2} = \frac{1}{N}\left|X(f)\right|^{2} \quad (1)$$

where $P_{xx}(f)$ is the power spectral density as a function of frequency $f$, $N$ is the total number of samples in the signal, $x(m)$ is the discrete-time signal with sample index $m$, $j$ is the imaginary unit ($\sqrt{-1}$), $\pi$ is the mathematical constant (approximately 3.14159), $f$ is the frequency variable, and $|X(f)|^{2}$ is the squared magnitude of the Fourier transform $X(f)$ of $x(m)$, evaluated at frequency $f$.
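The periodogram can be computed directly from the discrete Fourier transform. The sketch below is a minimal NumPy illustration of the squared-magnitude-over-N definition; the function name `periodogram` is chosen here for illustration, and no windowing, detrending, or one-sided scaling is applied, which a production PSD estimate might add.

```python
import numpy as np


def periodogram(x, fs=128.0):
    """Return (freqs, pxx) with pxx[k] = |X(f_k)|^2 / N for a real signal x.

    freqs holds the non-negative DFT bin frequencies in Hz for sampling
    rate fs. No window, detrending, or one-sided scaling is applied.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)            # DFT at non-negative frequencies
    pxx = (np.abs(spectrum) ** 2) / n    # squared magnitude divided by N
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, pxx
```

For a pure 10 Hz sinusoid sampled at 128 Hz over a whole number of cycles, the largest value of `pxx` falls in the 10 Hz bin.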
3.4.2. Computation of Feature Vector Using EEG Spectrum
Non-parametric PSD methods were employed to extract the PSDs of the EEG signals. Band power levels were obtained from the resulting power matrix using the frequency estimates of the PSD methods. For each signal, PSDs were computed with the sampling frequency (fs) set to 128 Hz. The spectra carry information about the four EEG rhythms, namely delta, theta, alpha, and beta, which are defined by their frequency ranges. These ranges are given in
Table 3.
Table 3.
Range of EEG band frequency.
Sr. No. | Band | Frequency Range (Hz) |
---|---|---|
1 | Delta | 1–3 |
2 | Theta | 4–7 |
3 | Alpha | 8–12 |
4 | Beta | 13–30 |
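The 32 × 4 = 128 feature vector can be illustrated with a short sketch that uses the band edges of Table 3. It assumes that each feature is the mean periodogram power inside a band for one channel; the text does not state how power is aggregated within a band, so this choice, like the function name `band_power_features`, is an assumption.

```python
import numpy as np

# Band edges in Hz, taken from Table 3 (inclusive on both ends).
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 12), "beta": (13, 30)}


def band_power_features(trial, fs=128.0):
    """Return a flat vector of mean band power per channel.

    trial: array of shape (channels, samples). The output has
    channels * len(BANDS) entries, ordered channel by channel
    (delta, theta, alpha, beta for channel 0, then channel 1, ...).
    """
    trial = np.asarray(trial, dtype=float)
    n = trial.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Periodogram of every channel at once.
    pxx = (np.abs(np.fft.rfft(trial, axis=1)) ** 2) / n
    feats = []
    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs <= high)
        feats.append(pxx[:, mask].mean(axis=1))   # one value per channel
    return np.stack(feats, axis=1).ravel()
```

For one 32-channel trial this yields 128 values, matching the feature dimension stated in Section 3.4.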
3.5. Classification
The extracted features are used to train supervised machine learning algorithms, which then make decisions based on those features. In our study, we used four widely used classifiers: Naïve Bayes, logistic regression, multilayer perceptron, and SMO. We used a 10-fold cross-validation scheme to validate the experimental results [
47]. The primary reason for choosing these classifiers was their efficient performance in prior research. The following subsections briefly describe each machine learning classifier employed in this study.
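To make the 10-fold cross-validation scheme concrete, the sketch below generates train and test indices for each fold. It is independent of any toolkit and assumes stratified folds (each fold keeps roughly the same class ratio); whether the study's folds were stratified is not stated, so that choice and the function name `stratified_kfold_indices` are assumptions.

```python
import numpy as np


def stratified_kfold_indices(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal this class's samples round-robin across the k folds.
        for i, sample in enumerate(idx):
            folds[i % k].append(sample)
    all_idx = np.arange(labels.size)
    for fold in folds:
        test_idx = np.sort(np.array(fold, dtype=int))
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx
```

Each sample appears in exactly one test fold, so every instance is used for testing once and for training in the remaining folds.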
3.5.1. Naïve Bayes
The Naïve Bayes (NB) model is a simplified Bayesian model that relies on an independence assumption: given the class, the probability of one attribute does not affect the probability of the other attributes. Despite this simplification, the results obtained from the Naïve Bayes classifier are often reliable. Its error depends on three factors: noise in the training data, bias, and variance. Noise can be reduced through careful selection of the training data, for which the training data is divided into multiple groups. Bias is the error that arises when the groups in the training data are very large, whereas variance is the error that arises when the groups are too small. The classifier applies the maximum a posteriori hypothesis, which works well when the input data is high-dimensional. NB is a nonlinear classifier and provides effective outcomes when applied to real-world problems.
3.5.2. Multilayer Perceptron (MLP)
MLP is a widely used feed-forward neural network classifier. It uses the supervised backpropagation learning algorithm to classify instances. It has three types of layers: an input layer, one or more hidden layers, and an output layer. The transfer function computes the output of each neuron from its weighted inputs. In our experiment, the sigmoid function is used to find the state $y_j$ from the total weighted input $x_j$, which is given as

$$y_j = \frac{1}{1 + e^{-x_j}} \quad (2)$$

The accumulated weighted input is written as $x_j$ and is determined by the following equation.

$$x_j = \sum_{i} y_i\, w_{ij} \quad (3)$$

Here, $y_j$ is the output of the logistic function, typically representing a probability or a binary value (0 or 1), and $x_j$ is the input to the function, which can be a single value or a vector of values. $e$ is Euler’s number, a mathematical constant approximately equal to 2.71828. $y_i$ shows the “state level” of the ith unit, and $w_{ij}$ is the weight of the connection between the ith and jth units.
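The forward computation of one MLP layer can be sketched in a few lines. Each unit j sums the states of the previous layer weighted by the connection weights and passes the result through the sigmoid function. The names `sigmoid` and `layer_forward` and the optional bias term are additions made for this illustration; the description above mentions only the weighted sum.

```python
import numpy as np


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def layer_forward(y_prev, weights, bias=None):
    """Compute unit states: sigmoid of the weighted sum of previous states.

    y_prev: states of the previous layer, shape (n_in,).
    weights: connection weights, shape (n_in, n_out).
    bias: optional per-unit offset, shape (n_out,); an addition of this sketch.
    """
    x = np.asarray(y_prev, dtype=float) @ np.asarray(weights, dtype=float)
    if bias is not None:
        x = x + np.asarray(bias, dtype=float)
    return sigmoid(x)
```

A total weighted input of zero gives a state of exactly 0.5, and large positive or negative inputs saturate towards 1 or 0.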
3.5.3. Logistic Regression
Logistic regression (LR) is a statistical machine-learning model that can solve binary classification problems. In this model, the input vector is projected on multiple hyperplanes, each of which represents a different class. The link function in LR reflects the relationship between the set of EEG features and the conditional result. The logistic function is defined as

$$\sigma(t) = \frac{1}{1 + e^{-t}} \quad (4)$$

where $t$ is derived from the EEG features, with coefficients learned from the class labels, and $e$ is Euler’s number as mentioned earlier. Writing $t$ as a linear combination of the features $x_k$, i.e., $t = \beta_0 + \sum_k \beta_k x_k$, the logistic function equation is written as

$$p(x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_k \beta_k x_k\right)}} \quad (5)$$

The LR classifier yields a value between 0 and 1. This value represents the relationship between the features and the classification category.
3.5.4. Sequential Minimal Optimization (SMO)
SMO is an algorithm used to train the support vector machine (SVM). In this classifier, the feature space is divided by a hyperplane that separates the stressed and relaxed (non-stressed) states according to the class labels. It is an iterative algorithm that breaks the optimization problem into a series of small subproblems, which are solved until all conditions are satisfied.
3.6. Simulation Parameters
For classification purposes, we have used the WEKA toolkit. The classification of stressed and relaxed conditions has been performed based on extracted PSD features. The classifiers in this study use their default parameters in WEKA 3.8.5. The default parameters used for the simulation of selected classifiers in the WEKA toolkit are given in
Table 4.
For Naïve Bayes, we have used the default parameter, that is, the Laplace smoothing parameter having a value equal to 0.001. The SMO default parameter uses the polykernel with a degree of one and a cache size of 100,000. The logistic regression default parameter is the ridge estimator, having a value of 0.001. Similarly, for MLP, we have used the default parameters with a learning rate of 0.3, momentum of 0.2, 1 hidden layer, and 10 units per hidden layer.
3.7. Performance Measures
True Positives (TP) is the number of instances that are correctly predicted as positive. True Negatives (TN) indicate the number of instances that are correctly predicted as negative. False Positives (FP) are the number of instances that are incorrectly predicted as positive (i.e., predicted to be positive but are negative), and False Negatives (FN) represent the number of instances that are incorrectly predicted as negative (i.e., predicted to be negative but were positive). The performance measures that are used for the evaluation of classifiers are accuracy, precision, recall, Kappa statistics, and f-measure. The details of these performance metrics are explained below.
3.7.1. Accuracy
Accuracy is the proportion of correctly classified instances among the total number of available instances. Mathematically, it can be expressed as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6)$$
3.7.2. Recall
Recall is the proportion of actual positive instances that are correctly identified as positive. Mathematically, it can be written as

$$\text{Recall} = \frac{TP}{TP + FN} \quad (7)$$
3.7.3. Precision
Precision is the proportion of instances predicted as positive that are actually positive. Mathematically, it is described as:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (8)$$
3.7.4. F-Measure (Fm)
The F-measure (Fm) is the harmonic mean of precision and recall. Its formula therefore takes both false positives and false negatives into account. The mathematical representation of Fm is as follows:

$$F_m = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (9)$$
3.7.5. Kappa Statistics
The Kappa statistic measures the agreement between predicted and actual labels beyond what is expected by chance:

$$\kappa = \frac{P(a) - P(e)}{1 - P(e)} \quad (10)$$

where κ is the Kappa coefficient, P(a) is the observed agreement between the two methods, and P(e) is the expected agreement by chance. Kappa values are reported here on a range of 0 to 1, where 0 describes chance-level classification and 1 represents perfect classification.
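The performance measures of this section can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration using the standard definitions; the function name `classification_metrics` and the counts in the usage line are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, F-measure, and Cohen's kappa."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Expected chance agreement P(e) from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Accuracy alone can look high on an imbalanced two-class problem, which is why kappa, which discounts chance agreement, is a useful companion measure.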
4. Experimental Results and Performance Analysis
This section presents the experimental results and performance analysis of the study. The data labeling technique has been described in detail. The statistical analysis and results of supervised learning classifiers have been presented.
4.1. Participants Data Labeling
The DEAP dataset contains the valence and arousal scores for each video watched by each participant. The trials in this study are grouped into a two-class problem, stress versus relaxation, using the arousal and valence scores. The valence and arousal values are obtained for each video for all 32 participants and lie in the range 1–9, i.e., both have a minimum score of 1 and a maximum score of 9. The stressed and relaxed conditions for each video and each participant are found using Equations (11) and (12) [
18].
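A threshold-based labeling rule of this kind can be sketched as follows. The threshold values below are hypothetical placeholders on the 1–9 rating scale and are not the values of Equations (11) and (12); they only illustrate the idea that negative valence with high arousal maps to stress, low arousal with positive valence maps to relaxation, and all other trials are left unlabeled.

```python
def label_trial(valence, arousal,
                stress_valence_max=3.0, stress_arousal_min=5.0,
                relax_valence_min=5.0, relax_arousal_max=4.0):
    """Return 'stressed', 'relaxed', or None for one rated trial.

    All threshold defaults are hypothetical placeholders on the 1-9 scale.
    """
    if valence < stress_valence_max and arousal > stress_arousal_min:
        return "stressed"      # unpleasant and highly activated
    if valence > relax_valence_min and arousal < relax_arousal_max:
        return "relaxed"       # pleasant and calm
    return None                # neither condition: trial is discarded


# Hypothetical ratings for three trials, given as (valence, arousal).
ratings = [(2.0, 7.5), (7.0, 2.0), (5.0, 5.0)]
labels = [label_trial(v, a) for v, a in ratings]
```

Trials that return `None` correspond to videos that induced neither state and would be removed before classification.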
From the results of these equations, we found that some of the videos do not affect brain activity, i.e., participants feel neither stressed nor relaxed. We also observed that a video that makes one individual feel stressed may not induce the same state in other participants. Based on valence and arousal values, we retained a total of 92 video trials out of 40 (videos) × 32 (participants) = 1280 that induced stressed or relaxed conditions in participants. A total of 67 trials are labeled as stressful, while 25 are labeled as relaxing. The trials that did not induce a stressed or relaxed state were removed from the study. The number of trials labeled as stressful and relaxing is shown in
Figure 3. In
Figure 3, a bar graph depicts the number of trials associated with the stressed and relaxed conditions. The 67 stress-related trials considerably outnumber the 25 trials associated with relaxed conditions.
4.2. Statistical Analysis
To validate the classification results, a paired
t-test for the nonequivalence is performed on the EEG frequency bands of the delta, theta, alpha, and beta in stress and relax conditions as they tend to follow a normal distribution.
p-values are calculated for the average of each frequency band over all electrodes using a significance level of 0.05. The sample size is 67 + 25 = 92. The null hypothesis is that the mean band power is the same in stressed and relaxed individuals, while the alternative hypothesis is that the mean band power is different in stressed and relaxed individuals.
p-values for each frequency band are given in
Table 5. The results show that videos have a statistically significant effect on the average alpha band power, as its p-value is less than 0.05. Related studies [
18,
19,
20,
21,
22,
34] also show that EEG alpha band frequencies are highly affected and are sensitive to stress.
Figure 4 shows the average and standard deviation of alpha band power from all EEG electrodes for stressed and relaxed participants. The y-axis shows the average alpha band power over all electrodes, and the x-axis shows the two conditions, stressed and relaxed. The average alpha band power is higher for relaxed people than for stressed people, while the standard deviation of alpha power is lower for relaxed conditions than for stressed conditions. When someone is stressed, their brain waves tend to be faster and more irregular. However, it is important to note that EEG results can vary depending on several factors.
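The comparison of mean band power between the two groups can be illustrated with a two-sample t statistic. Because the groups have different sizes (67 stressed and 25 relaxed samples), the sketch below uses Welch's unequal-variance form; this is an assumption made for illustration and not necessarily the exact test variant run in the study. It returns the t statistic and degrees of freedom; the p-value would then be read from a t distribution, for example with a statistics library.

```python
import math


def welch_t(sample_a, sample_b):
    """Return (t, dof) for Welch's two-sample t-test on unequal-size groups."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    # Unbiased sample variances.
    var_a = sum((v - mean_a) ** 2 for v in sample_a) / (na - 1)
    var_b = sum((v - mean_b) ** 2 for v in sample_b) / (nb - 1)
    se2 = var_a / na + var_b / nb            # squared standard error
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, dof
```

In use, `sample_a` and `sample_b` would hold the electrode-averaged band power of the stressed and relaxed trials for one frequency band.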
4.3. Experimental Results
Machine learning classifiers are implemented to classify stressed and relaxed conditions using the EEG signals recorded in response to the videos of the DEAP dataset. The classifiers are trained using the PSD features on a two-class problem, i.e., stressed versus relaxed. Each participant watched a total of 40 videos, and their EEG signals were recorded while they watched. The videos were presented in random order, irrespective of age, gender, and preference. While watching the videos, the participants rated them based on arousal and valence. The arousal and valence scores were used to label the participants for the two-class problem, i.e., the stressed and relaxed classes. PSD features from four EEG bands, delta, theta, alpha, and beta, are extracted using 32 channels.
Four different classifiers, namely NB, MLP, SMO, and logistic regression (LR), are employed to classify stress levels into two states: stressed and relaxed. A 10-fold cross-validation approach is used in our study to validate the results. The results obtained with the different classifiers are analyzed and compared based on accuracy, precision, recall, and F-measure. Their results are also evaluated and compared based on kappa statistics.
Table 6 shows the performance comparison of the different classifiers for the stressed and relaxed states. The results show that, overall, the SMO classifier performs better than the other classifiers (MLP, NB, and LR), with an accuracy of 95.65%. Similarly, its precision and recall are close to the best value. From
Table 6, it is evident that the SMO classifier outperforms the other classifiers. Although the difference between the accuracy of MLP and SMO is not statistically significant, MLP took longer to train than SMO. The kappa value is 0.69 (the best value being 1), which means the classification is not by chance. The F-measure (Fm) value is 0.91, which is the closest to the best value of 1 among the classifiers. The classification accuracies of MLP, NB, and LR, which are 94.56%, 88.04%, and 92.39%, respectively, are comparable with that of the SMO classifier.
Figure 5 shows the confusion matrices of the classifiers for the two-class problem (stressed versus relaxed), with features extracted from the EEG signals recorded while the participants watched the videos. The SMO classifier provides the highest classification accuracy for this two-class problem.
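For reference, the reported metrics can all be derived from a two-class confusion matrix. The sketch below uses hypothetical counts, not the values in Figure 5 or Table 6, and treats "stressed" as the positive class, which is an assumption; the function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, F-measure, and Cohen's kappa."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts: 50 stressed and 50 relaxed samples
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects the observed accuracy for the agreement expected by chance, so on unbalanced classes it can be noticeably lower than the accuracy.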
5. Comparison and Discussion of Results
EEG is widely used for stress assessment along with the PSS-10 questionnaire [48,49]. However, studies on stress assessment in response to multimedia are limited in the literature, as is evident from Table 7. Most multimedia content is obtained from the internet, and in the last few years, there has been an increase in the use of online social networks [50]. These social networks allow people to comment on, share, and observe multimedia content such as images, audio, and video clips, which may induce emotions such as happiness, distress, sadness, depression, satisfaction, calm, delight, and excitement. These emotions can induce stress, leading to mental disorders and many other diseases [51]. In [21], a new method is developed to improve the accuracy of emotional stress assessment by combining a three-dimensional (3D) neural network with an attention method to form a 3D convolutional gated self-attention neural network. The experiments are performed on benchmark datasets, including the DEAP, VRE, and EDESC datasets. The method builds a 3D feature vector by applying the 3D convolution block (3DConvB) to the 3D cuboid representation (Cb) of the EEG data in a specific frequency band (b); this cuboid consists of the power spectral density (PSD) values (Xb) for each electrode in that frequency band. An accuracy of 93.99% is obtained with the self-gated model on the DEAP dataset. In [20], emotional stress is detected on the DEAP dataset using a genetic algorithm for feature selection, which selects the features that enhance classifier performance. The number of features extracted is 192, and the K-NN classifier achieved an accuracy of 71.76%. In our study, by contrast, we extracted 4 features from each of 32 channels, giving 128 features (4 × 32). The DEAP video sessions are used as multimedia stimuli, and NB, SMO, LR, and MLP classifiers are used to classify stressed and relaxed participants. The highest accuracy, 95.65%, is obtained with the SMO classifier. In comparison with [20,21], data from only 32 subjects were acquired and analyzed in the proposed study, yet we obtained results with improved accuracy.
In [21], the same dataset is used as in our study. The authors used a deep learning recurrent algorithm for feature extraction together with a 3D neural network model and obtained an accuracy of 93.99%. In contrast, we extracted power spectral density features for the delta, theta, alpha, and beta bands and used machine learning algorithms for classification, obtaining an accuracy of 95.65% with the SMO classifier. In [20], a genetic algorithm is developed and applied to the same dataset as ours, with a reported accuracy of 71.76%. Both studies [20,21] focus on emotional stress using the DEAP dataset and are therefore closely related to our proposed work.
6. Conclusions and Recommendations for Future Work
This paper proposes a stress assessment in response to video clips using EEG. The arousal and valence values from the DEAP dataset are used to categorize participants into stressed and relaxed conditions. We extracted power spectral density features from the delta, theta, alpha, and beta bands and applied four machine learning classifiers: NB, MLP, LR, and SMO. The performance analysis shows the effectiveness of the proposed scheme, which provided promising results for stress assessment in response to high-arousal videos. Alpha activity emerged as a statistically significant feature in the classification of stressed and relaxed conditions, which conforms to the existing literature. The results obtained with the full feature set are satisfactory across all metrics. The SMO classifier achieved 95.65% accuracy for stressed and relaxed conditions, the best among the classifiers used. A limitation of our study is that valence-arousal-based labeling resulted in unbalanced classes in the DEAP dataset. In the future, this class imbalance can be addressed using techniques such as SMOTE, GANs, and ADASYN. We can also extend our research on stress responses to high-arousal multimedia content by acquiring and analyzing raw data through Android applications.
Personalized content recommendation systems on various media platforms can incorporate this model to track users’ stress and emotional states in real time while they watch videos. Furthermore, wearable technology can make use of this model to continuously monitor mental health, providing information on stressors and assisting people in managing their well-being. Other possible applications include workplace productivity tools, gaming and virtual reality experiences, educational platforms, and therapeutic interventions. These potential applications show how versatile this research is in addressing modern challenges in mental health, multimedia consumption, and human-computer interaction.