Article

Impact of Sound and Image Features in ASMR on Emotional and Physiological Responses

1 Department of Emotion Engineering, Sangmyung University, Seoul 03016, Republic of Korea
2 Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(22), 10223; https://doi.org/10.3390/app142210223
Submission received: 30 August 2024 / Revised: 25 October 2024 / Accepted: 4 November 2024 / Published: 7 November 2024

Abstract

As media consumption through electronic devices increases, there is growing interest in ASMR videos, known for inducing relaxation and positive emotional states. However, the effectiveness of ASMR varies depending on each video’s characteristics. This study identifies key sound and image features that evoke specific emotional responses. ASMR videos were categorized into two groups: high valence–low relaxation (HVLR) and low valence–high relaxation (LVHR). Subjective evaluations, along with physiological data such as electroencephalography (EEG) and heart rate variability (HRV), were collected from 31 participants to provide objective evidence of emotional and physiological responses. The results showed that both HVLR and LVHR videos can induce relaxation and positive emotions, but the intensity varies depending on the video’s characteristics. LVHR videos have sound frequencies between 50 and 500 Hz, brightness levels of 20 to 30%, and a higher ratio of green to blue. These videos led to 45% greater delta wave activity in the frontal lobe and a tenfold increase in HF HRV, indicating stronger relaxation. HVLR videos feature sound frequencies ranging from 500 to 10,000 Hz, brightness levels of 60 to 70%, and a higher ratio of yellow to green. These videos resulted in 1.2 times higher beta wave activity in the frontal lobe and an increase in LF HRV, indicating greater cognitive engagement and positive arousal. Participants’ subjective reports were consistent with these physiological responses, with LVHR videos evoking feelings of calmness and HVLR videos inducing more vibrant emotions. These findings provide a foundation for creating ASMR content with specific emotional outcomes and offer a framework for researchers to achieve consistent results. By defining sound and image characteristics along with emotional keywords, this study provides practical guidance for content creators and enhances user understanding of ASMR videos.

1. Introduction

In the modern digital era, the widespread use of electronic devices has fundamentally transformed media consumption patterns, allowing individuals to access music and video content at any time and place [1]. This shift has led to an increased reliance on electronic devices for listening to music or watching videos while engaging in various activities such as sleeping, commuting, working, or studying [2]. As a result, there is a growing interest in various forms of media that can regulate psychological and physiological states. Among these, autonomous sensory meridian response (ASMR) videos, which are designed to elicit specific emotional responses, have gained considerable attention.
ASMR is typically characterized by a tingling sensation that begins in the scalp and extends down the back of the neck and upper spine, often triggered by specific auditory and visual stimuli. Since the emergence of ASMR content in the early 2010s, research on its mental and physical effects has been ongoing. Most studies have concluded that ASMR elicits various positive emotional responses, such as relaxation, comfort, and a sense of well-being [3,4]. Reflecting these findings, millions of users have sought out ASMR videos to enhance relaxation and positive emotional states, contributing to their rapid growth on platforms like YouTube [5,6]. Furthermore, advancements in high-resolution audio and video technologies have enhanced the ASMR experience, further increasing its significance [3]. These developments underscore the need for a thorough understanding of ASMR’s emotional and physiological impacts.
Research on the effects of ASMR on emotional states has primarily focused on its potential to reduce stress, depression, and anxiety. The most widely documented emotional impact of ASMR is relaxation, with the primary argument being that it helps alleviate both stress and anxiety [2,7]. Additionally, studies have shown that ASMR enhances positive emotions, highlighting its potential as a therapeutic tool by reducing depression through the promotion of positive emotional states [4,8,9]. On the physiological side, ASMR’s effects have been examined primarily through brain activity and cardiovascular responses. Neuroscientific research indicates that ASMR stimuli activate regions such as the frontal lobe and amygdala, which are associated with attention, relaxation, and emotional processing [10,11]. Furthermore, ASMR has been found to influence specific brain wave frequencies, particularly alpha waves, supporting the claim that ASMR induces relaxation and positive emotional experiences [11,12]. Additional research suggests that theta and beta waves may also be influenced by ASMR, pointing to a potential relationship not only with relaxation but also with arousal and focused attention [8]. Studies on heart rate variability (HRV) have further demonstrated that ASMR significantly reduces heart rate, reinforcing its relaxation effect [8,13,14].
Although the majority of studies have shown that ASMR’s emotional and physiological effects induce relaxation and positive emotional states, a challenge remains due to the variability of results. The effects can differ based on the specific ASMR stimuli selected by researchers. Establishing a clearer definition of ASMR’s stimulus characteristics is crucial to fully understanding its emotional and physiological impacts. However, research on these specific segmented features remains limited.
This study aims to analyze the emotional and physiological responses related to relaxation and positivity, as influenced by the sound and image characteristics of ASMR content. Previous studies [4,8,13] have indicated that ASMR videos can promote positive emotions and relaxation, suggesting that ASMR typically elicits these emotional states. Based on this, we hypothesized that while ASMR generally induces positive emotions and relaxation, the intensity of these effects may vary depending on the physical properties of the stimuli. Accordingly, we categorized ASMR videos into two groups: high valence–low relaxation (HVLR) and low valence–high relaxation (LVHR).
This study aims to identify the sound and visual features of ASMR that produce contrasting emotional responses by focusing on these two emotional dimensions. Additionally, the study will define the emotional vocabulary associated with each dimension by analyzing the terms linked to HVLR and LVHR experiences. To achieve this, we will measure subjective emotional evaluations alongside electroencephalography (EEG) and heart rate variability (HRV), which reflect central and autonomic nervous system activity, respectively. This approach will allow us to capture participants’ subjective emotional states and validate them through EEG and HRV pattern analysis. Ultimately, the study aims to specify the sound and image features of ASMR stimuli that evoke distinct emotional responses.
This study provides guidelines for selecting ASMR content suitable for research purposes by clearly identifying the key features of ASMR stimuli. It also establishes a foundation for designing ASMR content, enabling creators to strategically manipulate sound and visual elements to achieve desired emotional outcomes. Furthermore, we aim to enhance viewers’ understanding and satisfaction when selecting ASMR content by presenting emotional keywords that effectively describe each ASMR video.

2. Materials and Methods

2.1. Materials

2.1.1. ASMR Video

This study employed ASMR videos sourced from YouTube to investigate emotional responses associated with specific sensory stimuli. Initially, 450 ASMR-related videos were randomly selected from the platform. These videos underwent a thorough review process by a focus group consisting of six experts in content creation and five specialists in emotion recognition. For each of the 450 videos, the group evaluated positivity and arousal on a 7-point scale and selected the emotional vocabulary that best matched each video. Videos with high positivity and arousal scores, where more than half of the group selected the same vocabulary, were classified into the high valence–low relaxation (HVLR) group based on relative comparisons. Conversely, videos with low positivity and arousal scores, where the majority agreed on the vocabulary, were categorized into the low valence–high relaxation (LVHR) group. A total of 56 ASMR videos were selected, with 28 videos in each group. From these, four videos per group were chosen based on the most distinct ratings in positivity, arousal, and emotional vocabulary alignment, resulting in a final set of eight experimental stimuli (Figure 1). Each selected video was edited to a standardized length of 3 min to ensure consistency and comparability for subsequent analyses.

2.1.2. Sound Features

To analyze the sound characteristics of the selected ASMR videos, a total of 36 acoustic features were extracted through a 3-min duration analysis. These features included 16 power spectrum features and 20 mel-frequency cepstral coefficients (MFCCs). Since sound can be classified into various frequency ranges, and its emotional effects vary depending on the range, the extracted sound variables were categorized into different frequency bands for analysis. The specific frequency ranges and their meanings are detailed in Table 1 [15]. For each sound variable, both the mean and standard deviation were calculated across the 3-min video segments to provide a comprehensive statistical profile of the sound characteristics.
MFCCs, widely used to represent the short-term power spectrum of sounds, provide a more precise understanding of how sounds are perceived by humans, as they mimic the auditory processing of the human ear [16]. These features were extracted using the Python Librosa library. The audio data, sampled at 16 kHz, were divided into 25 ms frames with a hop length of 10 ms. A 1024-point fast Fourier transform (FFT) was performed on each frame to calculate the frequency spectrum, which was then converted to the mel frequency scale. A mel spectrogram was created using a mel filter bank, and MFCCs up to the 20th order were computed through a logarithmic transformation followed by the discrete cosine transform (DCT). The coefficients were summarized by calculating their mean values, forming the feature vectors. These extracted sound features formed the basis for further analysis, helping to determine how specific auditory characteristics contribute to evoking particular emotional responses in viewers.

2.1.3. Image Features

Image features were extracted from 22 variables, encompassing multiple color spaces and spatial characteristics. These included RGB (gray, red, green, blue), HSV (hue, saturation, value), and LAB (lightness, alpha, beta). The RGB color space, which combines red, green, and blue light, is fundamental for digital displays and relies on “additive mixing”, whereby colors become brighter as they are combined. The HSV color space represents color through hue, saturation, and brightness, and is often used to capture properties such as rainbow-like, vivid, or monochromatic tones [17]. The LAB color space, designed to mimic the human visual system, is valuable for consistently representing color and detecting color differences across devices [18]. Since color can evoke emotional responses, we extracted variables from these three color spaces to assess whether emotional effects differ across different methods of color representation.
For each variable, both the mean and standard deviation were calculated to capture the central tendency and dispersion of image characteristics. In the LAB color space, the alpha and beta values indicate the deviation of the mean values of the a and b channels from the central point of 128. A positive alpha value signifies a strong red component, whereas a negative value indicates a strong green component. Similarly, a positive beta value reflects a strong yellow component, while a negative value suggests a strong blue component.

2.1.4. Emotional Questionnaire

Participants’ emotional responses to the ASMR videos were evaluated using an emotional questionnaire administered via Google Forms. They rated their levels of relaxation and positivity while watching each video on a 5-point Likert scale. A lower score on the relaxation scale corresponded to higher relaxation, while a higher score on the positivity scale indicated a higher positive affect. These scales allowed for the comprehensive assessment of participants’ subjective feelings of relaxation and positivity induced by the videos. Participants also selected the emotional vocabulary they felt best matched each video. The vocabulary options included nine terms: “focused”, “enjoyable”, “vibrant”, “not bad”, “pleasant”, “joyful”, “relaxed”, “calm”, and “comfortable”. Based on these selections, the emotional responses to the eight videos were mapped accordingly.

2.1.5. Electroencephalography (EEG)

EEG signals were recorded using a 12-channel electrode cap aligned with the international 10–20 electrode placement system, within a 32-channel configuration using the TruScan Family of EEG systems by DEYMED Diagnostic (Hronov, Czech Republic). The sampling rate was set at 250 Hz, and the impedance between the electrodes and the skin was maintained below 5 kΩ, with the GM electrode serving as the ground and the Cz electrode as the reference.
Previous research on ASMR and brain responses has shown that activation is primarily observed in the frontal lobe, although other studies have reported brain activation in other regions [19,20]. To comprehensively analyze brain responses to ASMR stimuli across different regions, this study selected electrode sites representing the frontal, central, occipital, parietal, and temporal lobes. As a result, EEG signals were recorded from 12 electrode sites (Fp1, Fp2, F3, F4, C3, C4, P3, P4, T3, T4, O1, O2).
Brain wave activity was recorded across four frequency bands: delta (0.0–4.0 Hz), theta (4.0–8.0 Hz), alpha (8.0–12.0 Hz), and beta (12.0–30.0 Hz), with reference to the Cz channel. Theta, alpha, and beta frequencies were selected based on previous studies indicating that ASMR stimuli predominantly affect these bands, while delta frequencies were included for a more detailed analysis of ASMR’s relaxation effects. Bandpass filters (BPF) were applied to all EEG data to remove noise.
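The band decomposition described above might be sketched as follows; the text states only that bandpass filters were applied, so the Butterworth design, the filter order, and the 0.5 Hz lower edge for delta are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250  # EEG sampling rate stated in the paper
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 12.0), "beta": (12.0, 30.0)}
# Delta is listed as 0.0-4.0 Hz in the text; a 0.5 Hz lower edge is used here
# because a bandpass filter requires a nonzero low cut (an assumption).

def band_power(signal, fs=FS, order=4):
    """Mean power per band after zero-phase Butterworth bandpass filtering.
    Filter type and order are illustrative, not stated in the paper."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, signal)  # zero-phase: no filter delay
        out[name] = float(np.mean(filtered ** 2))
    return out
```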
To assess hemispheric asymmetry in brain activity, an asymmetry index (AI) was calculated for EEG variables that exhibited significant differences. The AI was defined as follows [21]:
AI = (Right − Left) / (Right + Left)
This ratio-based method was employed to compare the relative differences between the left and right hemispheres. To ensure the accuracy of the calculations, the denominator was treated as an absolute value, given that the data included both negative and positive values.
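A direct implementation of the index, with the denominator taken as an absolute value as described:

```python
def asymmetry_index(right, left):
    """AI = (Right - Left) / |Right + Left|. The denominator is made absolute
    because baseline-subtracted band-power changes can be negative."""
    return (right - left) / abs(right + left)
```

Positive values indicate greater right-hemisphere activation, matching the interpretation used in Table 3.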

2.1.6. Electrocardiogram (ECG)

ECG data were recorded using the ECG100C Electrocardiogram Amplifier by BIOPAC Systems, Inc. (Goleta, CA, USA). The data were categorized into time-domain and frequency-domain variables, from which relevant parameters were extracted. Table 2 provides fundamental information on the variables related to heart rate variability (HRV) in both time and frequency domains [22,23,24]. These tables summarize the key parameters used to assess HRV, including their formulas, units of measurement, and corresponding frequency ranges.
Respiratory sinus arrhythmia (RSA) was calculated by first converting the power values within the specific frequency range of 0.12–0.4 Hz into a time series. This time series was then segmented into 30 ms intervals. The natural logarithm was applied to each segment, and the RSA value was determined by averaging these log-transformed values. This approach provides a standardized measure of RSA, emphasizing variations in heart rate that correspond to respiratory cycles within the specified frequency range.
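A hedged sketch of this RSA procedure: the respiratory band is isolated, power is averaged over short segments, log-transformed, and then averaged. The paper does not specify the sampling rate of the derived time series or the windowing details, so those are assumptions here, as is the small epsilon guarding the logarithm:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def rsa_log_power(hr_series, fs=100.0, band=(0.12, 0.4), seg_ms=30):
    """Average log power of a heart-rate series within the 0.12-0.4 Hz
    respiratory band, computed over short fixed-length segments."""
    b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    resp = filtfilt(b, a, np.asarray(hr_series, dtype=float))
    power = resp ** 2
    seg_len = max(1, int(fs * seg_ms / 1000.0))      # 30 ms segments
    n_seg = len(power) // seg_len
    seg_means = power[: n_seg * seg_len].reshape(n_seg, seg_len).mean(axis=1)
    return float(np.log(seg_means + 1e-12).mean())   # epsilon guards log(0)
```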

2.1.7. Emotion Recognition

The physical attributes of 56 ASMR videos, categorized by a group of 11 experts, were analyzed to verify that the 8 ASMR videos accurately represented the sound and image characteristics of the LVHR and HVLR groups. A statistical analysis was conducted on the sound and image characteristics of the eight representative videos, and the range for each group was determined based on the mean and standard deviation for key variables. The 56 videos were then classified into one of the two groups by weighting their sound and image characteristic values according to whether they fell within the specified range. The accuracy of the emotional classification for these videos was subsequently calculated.
In addition, the results of the emotion vocabulary questionnaire, completed by the 11 experts for the 56 videos, were analyzed to identify emotion keywords that effectively describe HVLR and LVHR content. Euclidean distance was used to compare the sound and image features of the 56 videos with the average feature values of the 8 representative videos. The most frequently selected vocabulary for each group was identified by aggregating the questionnaire results for the top seven videos with the closest feature values.
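The nearest-video selection can be sketched as a simple Euclidean ranking against a group’s mean feature vector; feature standardization is omitted here, which is an assumption:

```python
import numpy as np

def closest_videos(features, centroid, k=7):
    """Indices of the k videos whose feature vectors lie closest (Euclidean
    distance) to a group's mean feature vector, as in the top-7 selection."""
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features - np.asarray(centroid, dtype=float), axis=1)
    return np.argsort(d)[:k].tolist()
```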

2.2. Methods

2.2.1. Participants

The study recruited 31 participants (mean age = 23.58 years, standard deviation = 2.28, age range = 20–28 years; 16 males) who met specific inclusion criteria. Participants were healthy adults in their 20s with no history of cardiovascular or central nervous system disorders and did not require corrective lenses or glasses. All participants voluntarily agreed to partake in the study and provided written informed consent after receiving a comprehensive explanation of the experimental procedures. Upon completion, participants were compensated with a nominal fee for their participation.

2.2.2. Procedure

The experiment was conducted in a controlled environment to maintain consistency across participants. Upon arrival, participants were briefed on the study procedures and provided written consent. They were seated comfortably about 50 cm from a computer screen, and EEG electrodes were attached to their scalp, while ECG electrodes were placed on their arms. Throughout the experiment, participants were instructed to maintain a fixed posture and focus on the screen to minimize movement and reduce noise in the EEG and HRV data collection. The experimental session commenced with the initiation of EEG, ECG, and video recordings. Baseline measurements were taken while participants viewed a blank screen for three minutes. Subsequently, participants watched a series of ASMR videos, each lasting three minutes, with a one-minute break between videos to complete a questionnaire. This sequence was repeated for all eight videos.

2.2.3. Analysis

Data analysis was performed using IBM SPSS Statistics 21. The sound and image characteristics of the ASMR videos were thoroughly examined based on the data extracted from the 8 selected videos, with no missing data. EEG data from 30 participants (excluding one outlier) were included in the analysis. Subjective evaluations and ECG measurements were analyzed with complete data from all 31 participants.
Subjective evaluations were analyzed using the Mann–Whitney U test. Non-parametric tests were applied to the 5-point scale ratings for positivity and relaxation to determine significant differences across the quadrants. Additionally, the emotional vocabulary survey results were analyzed using a cross-tabulation method.
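The subjective comparison reduces to a two-sample Mann–Whitney U test, e.g. with SciPy; the two-sided alternative is an assumption:

```python
from scipy.stats import mannwhitneyu

def compare_ratings(hvlr_scores, lvhr_scores):
    """Two-sided Mann-Whitney U test on 5-point Likert ratings, as used for
    the positivity and relaxation comparisons between groups."""
    res = mannwhitneyu(hvlr_scores, lvhr_scores, alternative="two-sided")
    return res.statistic, res.pvalue
```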
During preprocessing, the EEG and ECG data were normalized by calculating the difference between the baseline (pre-stimulus) values and the values recorded after the stimulus presentation. EEG and ECG data for each variable were sorted in ascending order, and the top and bottom 10% of the data were excluded to focus on the central 80% of the data. This approach aimed to reduce errors due to data dispersion by removing extreme values. The remaining data were averaged for each corresponding video segment. The primary focus of the analysis was on investigating the average changes in power density values across frequency bands based on electrode positions. Each variable was statistically analyzed using independent sample t-tests and Mann–Whitney U tests to compare group means.
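The baseline normalization and 10% trimming described above might look like the following; applying the trim per variable and per segment is an assumption:

```python
import numpy as np

def baseline_trimmed_mean(stimulus_vals, baseline_vals, trim=0.10):
    """Subtract the baseline mean, then average the central 80% of the change
    scores (top and bottom 10% excluded as extreme values)."""
    change = np.asarray(list(stimulus_vals), dtype=float) - np.mean(baseline_vals)
    change = np.sort(change)
    k = int(len(change) * trim)
    return float(change[k: len(change) - k].mean())
```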
Sound and image features of the ASMR videos were extracted using Python (v3.9.10). The sound power spectrum was extracted using the Numpy (v1.24.0) libraries, while mel-frequency cepstral coefficients (MFCCs) were extracted using the Librosa (v0.10.0) library. Image features were extracted using the OpenCV (cv2) library. The features from the 8 ASMR videos were grouped according to specific quadrants. Independent sample t-tests and Mann–Whitney U tests were employed to calculate and analyze the mean values of each group. This process helped to identify significant differences between the groups.
The emotion recognition in ASMR videos was also conducted using Python, analyzing a dataset of 56 videos selected by 11 experts. From this dataset, eight videos—representing high valence–low relaxation (HVLR) and low valence–high relaxation (LVHR) emotions—were identified based on their distinct sound and image features. The likelihood of a video being categorized as HVLR or LVHR was determined through statistical analysis. This analysis used the mean and the standard deviation ranges for each group. A total of 26 relevant features were extracted from the sound and image data. Each emotion within the feature value range was initially assigned a probability of 30%. The remaining 70% was distributed as weighted values, based on the distance ratio. The distance between the median of the emotional range and the feature value was calculated as a ratio, normalized to 70%, and added to the base score. Emotion probabilities were computed for all 26 variables, and the emotion with the highest probability was selected as the output.
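A sketch of this scoring rule as described: a feature value falling within a group’s (mean ± SD) range earns a 30% base score plus up to 70% more by proximity to the range midpoint, summed over features. How out-of-range values are scored is not stated, so they contribute zero here by assumption:

```python
def emotion_scores(feature_vals, ranges):
    """Score each emotion group over all features; return the winning label
    and the per-group totals. `ranges` maps group name -> list of (lo, hi)
    bounds, one per feature, derived from each group's mean and SD."""
    totals = {emo: 0.0 for emo in ranges}
    for i, x in enumerate(feature_vals):
        for emo, bounds in ranges.items():
            lo, hi = bounds[i]
            if lo <= x <= hi:
                mid = (lo + hi) / 2.0
                half = (hi - lo) / 2.0 or 1e-12
                closeness = 1.0 - min(abs(x - mid) / half, 1.0)
                totals[emo] += 0.30 + 0.70 * closeness  # base + distance weight
    return max(totals, key=totals.get), totals
```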

3. Results

The emotional and physical characteristics of the LVHR and HVLR videos, categorized into two groups based on the subjective evaluations of the expert group, were analyzed (Section 2.1.1). The experiment was conducted with participants viewing eight representative videos selected from each group. The emotional responses of both groups were analyzed using subjective and physiological measures of emotion. Subjective emotions experienced by participants while viewing the videos were analyzed through a survey (Section 3.1). Furthermore, emotions interpreted from a physiological standpoint were assessed by measuring EEG and ECG responses (Section 3.2 and Section 3.3).
These two perspectives provided confirmation of the emotions elicited by the pre-classified LVHR and HVLR groups. The auditory and visual features of each group were analyzed to identify the physical characteristics of ASMR content that triggers these emotional responses (Section 3.4 and Section 3.5). Finally, to evaluate the significance of the experimental findings, the remaining ASMR videos categorized as LVHR and HVLR were compared to the selected experimental stimuli (Section 3.6). The consistency observed between emotional responses and physical characteristics confirmed the reliability of the experimental results.

3.1. Analysis of Emotional Questionnaire Results

As depicted in Figure 2, the analysis of subjective evaluations using the Mann–Whitney U test demonstrated that the arousal and positivity scores for the HVLR group videos were significantly higher than those for the LVHR group videos. Specifically, the HVLR group videos exhibited significantly greater scores in arousal (U = 2817.5, z = −8.872, p < 0.001) and positive affect (U = 4728, z = −5.371, p < 0.001) compared to the LVHR group videos.
The results of the survey, which identified the most closely aligned emotional vocabulary for each group, are presented in Figure 3. A cross-analysis revealed that bright and energetic terms such as “vibrant”, “pleasant”, and “enjoyable” were predominantly selected in the HVLR group, while calm and serene terms such as “calm”, “comfortable”, and “relaxed” were more frequently chosen in the LVHR group.

3.2. EEG Analysis

The analysis of EEG changes during video viewing revealed significant differences in the delta and theta bands between the HVLR and LVHR groups, using independent sample t-tests and Mann–Whitney U tests. As shown in Figure 4, mean power changes in the delta band differed notably between groups at the Fp2, P3, P4, C3, and T4 electrode sites, with higher values observed during LVHR video viewing compared to HVLR. In the theta band, a statistically significant difference emerged at the O2 site, while the O1 site exhibited a meaningful trend without reaching statistical significance. Theta wave activity in the occipital region displayed greater mean power changes during LVHR video viewing compared to HVLR. Figure 5 provides a comparison of activation levels across brain regions and frequency bands for both groups.
Beta wave activity was more prominent at the C3 and C4 sites compared to other locations, displaying a meaningful trend despite lacking statistical significance (F = 0.014, T(6) = −2.170, p = 0.073). Specifically, the HVLR group exhibited greater activation at C3, while the LVHR group showed higher activation at C4. The difference in power values between the HVLR and LVHR groups was more pronounced at C3 than at C4, highlighting a distinctive trend in beta band activation patterns across these electrode sites.
As presented in Table 3, the asymmetry index (AI) was generally positive across all brain regions, indicating greater right hemisphere activation. Specifically, at the Fp sites, the AI values for all frequency bands exceeded 0.2 for both LVHR and HVLR group videos. At the C sites, the AI for the alpha and beta bands was greater than 0.2 during the viewing of LVHR group videos, whereas, at the T sites, the AI for the beta band exceeded 0.2 during the viewing of HVLR group videos. At the O sites, the AI for the beta band was above 0.2 for both LVHR and HVLR group videos. Conversely, the absolute AI values were calculated to be below 0.05 in the delta and theta bands at the T sites and in the theta band at the O sites, irrespective of the video type.

3.3. ECG Analysis

The ECG analysis, performed using independent sample t-tests and Mann–Whitney U tests (Figure 6), revealed significant differences across several variables. The time-domain variables did not yield statistically significant results. However, within the frequency domain, LF, HF, ln HF, and RSA demonstrated statistically significant differences, with all variables showing higher values during the viewing of LVHR group videos compared to HVLR. Although SDNN and total power—representing overall autonomic nervous system activity in the time and frequency domains—did not reach statistical significance, both were relatively elevated during LVHR video viewing. These findings align with the trends observed in significant frequency-domain variables such as LF and HF, providing insight into the overall pattern of ECG variable changes across conditions.

3.4. Sound Features Analysis

The analysis of the sound features, conducted using independent sample t-tests and Mann–Whitney U tests, identified significant differences across most frequency bands (Figure 7). When categorized by frequency bands, the mean power values in the mid-frequency band (U = 0, z = −2.309, p = 0.029) and high-frequency band (F = 0.363, T(6) = −4.096, p = 0.006) were significantly higher in the HVLR group compared to the LVHR group. Additionally, the standard deviation of power in the high-frequency band (F = 5.450, T(6) = −7.459, p < 0.001) was significantly greater in the HVLR group.
When classified according to semantic characteristics, the mean values and standard deviations of mel power, general power, sensory power, and conversational power were significantly elevated in HVLR group videos compared to LVHR group videos: mean mel power (F = 0.147, T(6) = −3.905, p = 0.008), standard deviation (F = 3.996, T(6) = −6.441, p = 0.001), mean general power (F = 0.092, T(6) = −4.290, p = 0.005), standard deviation (F = 1.761, T(6) = −5.885, p = 0.001), mean sensory power (F = 0.101, T(6) = −5.030, p = 0.002), standard deviation (F = 0.019, T(6) = −5.287, p = 0.002), mean conversational power (F = 0.006, T(6) = −4.555, p = 0.004), and standard deviation (F = 0.193, T(6) = −3.273, p = 0.017). For the main conversational power (F = 0.001, T(6) = −4.493, p = 0.004), only the mean value was significantly higher in the HVLR group videos.
The extraction of mel-frequency cepstral coefficients (MFCCs) revealed notable differences between the HVLR and LVHR group videos (Figure 8). Specifically, coefficients F1 (F = 0.152, T(6) = −3.569, p = 0.012), F10 (F = 0.047, T(6) = −4.922, p = 0.003), and F18 (F = 5.566, T(6) = −5.804, p = 0.001) were significantly higher in the HVLR group videos. Conversely, the lower frequency coefficients F2 (F = 0.231, T(6) = 5.858, p = 0.001), F3 (U = 16, z = 2.309, p = 0.029), and F4 (U = 16, z = 2.309, p = 0.029) were higher in the LVHR group videos.

3.5. Image Features Analysis

Figure 9 illustrates the analysis of image features in HVLR and LVHR group videos using independent sample t-tests and Mann–Whitney U tests. In the RGB color space, the mean values for gray (F = 1.939, T(6) = −6.319, p = 0.001), red (U = 16, z = 2.309, p = 0.029), green (F = 1.708, T(6) = −6.056, p = 0.001), and blue (F = 0.341, T(6) = −6.737, p = 0.001) were significantly higher in the HVLR group videos compared to the LVHR group videos.
In the HSV color space, the mean value of hue (F = 1.622, T(6) = 3.976, p = 0.007) was significantly higher in the LVHR group videos, while the mean value of value (U = 16, z = 2.309, p = 0.029) was greater in the HVLR group videos.
In the LAB color space analysis, the mean values of lightness (F = 2.061, T(6) = −6.078, p = 0.001) and the beta component (F = 0.002, T(6) = −2.914, p = 0.027) were significantly higher in the HVLR group videos.

3.6. Emotion Recognition Analysis

The results of the emotion recognition, based on the statistical analysis, are presented in Table 4. This evaluation involved 56 videos classified into the HVLR and LVHR groups based on subjective evaluations. The goal was to determine whether the physical characteristics of the videos could effectively differentiate between HVLR and LVHR emotion-inducing videos. The model achieved an accuracy of 91.07%. The precision for HVLR videos was 87.10%, with a recall of 96.43% and an F1-score of 91.53%. For LVHR videos, the precision was 96.00%, with a recall of 85.71% and an F1-score of 90.43%.
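For reference, the reported figures are mutually consistent: with 28 videos per group, a confusion matrix of 27 correctly recognized HVLR videos (1 missed) and 24 correctly recognized LVHR videos (4 misclassified as HVLR) reproduces them exactly. Treating HVLR as the positive class:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from a 2x2 confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

# Confusion matrix consistent with the reported figures (28 videos per group):
acc, prec, rec, f1 = classification_metrics(tp=27, fp=4, fn=1, tn=24)
```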
Additionally, the top 7 videos for each group were selected by calculating the Euclidean distance between the characteristic values of the 56 videos and the average values of the HVLR and LVHR groups. Figure 10 presents the emotional vocabulary results for the 14 videos with the smallest Euclidean distances. For the HVLR group, the most frequently selected words were “vibrant” (42.86%), “pleasant” (20.78%), and “enjoyable” (12.99%). In contrast, the LVHR group predominantly selected the words “relaxed” (54.55%), “calm” (16.88%), and “comfortable” (14.29%).

4. Discussion

This study aimed to investigate the sound and image characteristics of ASMR videos that evoke specific emotional responses by utilizing physiological signals and subjective assessments. The analysis of EEG and ECG patterns and subjective responses confirmed that ASMR videos, which evoke distinct emotional and physiological responses, are clearly distinguished by their sound and image features. Specific sound and image features associated with each emotional response were also identified. These findings provide important insights into the physical properties of ASMR videos that effectively regulate viewers’ emotional and physiological states, particularly those elements that enhance positive emotions and relaxation.

4.1. Induced Emotions

4.1.1. Emotional Perspective: Subjective Evaluation

Our findings revealed significant differences in both positivity and arousal scores between the HVLR and LVHR groups, indicating that the extent to which ASMR induces positivity and relaxation varies with its physical characteristics. Specifically, the HVLR group induced relatively less relaxation but elicited more positive emotions. This outcome aligns with previous studies [4,8,9] that demonstrated ASMR's effects on positive emotion, attention, and arousal, and suggests that the ASMR stimuli used in those studies likely share characteristics with the HVLR group. In contrast, the LVHR group elicited fewer positive emotions but promoted greater relaxation, suggesting that its characteristics are more effective at inducing relaxation and may resemble those of the ASMR stimuli used in studies of sleep induction and stress reduction [7,13]. Overall, these findings support our hypothesis that the levels of positive emotion and arousal induced by ASMR vary depending on the characteristics of the stimuli.
In terms of emotional vocabulary selection, the HVLR group predominantly chose words such as “vibrant”, “enjoyable”, and “pleasant”, reflecting bright and dynamic emotions. Conversely, the LVHR group primarily selected vocabulary like “calm”, “relaxed”, and “comfortable”, which convey more static and tranquil emotions. These results directly demonstrate the emotional responses elicited by the videos in each group, supporting the hypothesis that different types of ASMR stimuli evoke distinct emotional experiences.

4.1.2. Physiological Perspective: EEG

EEG analysis showed that both the LVHR and HVLR groups exhibited higher activation in the frontal lobe compared to other brain regions. Overall, delta waves decreased, while beta waves increased. These results align with previous studies suggesting that ASMR activates the frontal lobe, which is associated with attention and relaxation, and induces positive emotional states and relaxation [19,20].
Significant differences in delta and theta wave activity were observed between the HVLR and LVHR groups, with the HVLR group showing a greater decrease. Delta wave activity was significantly lower at most electrode sites in the HVLR group than in the LVHR group, and theta wave activity in the occipital lobe also decreased notably in the HVLR group. Chow et al. (2021) demonstrated that delta and theta waves increase in a relaxation-inducing environment [25], and activation of these waves has also been observed during meditative states [26], consistent with prior findings that deep relaxation increases delta and theta activity. Delta waves typically increase during deep sleep or rest, but they also play a crucial role in emotional processing and neural recovery [27,28]. The relatively higher delta activity in the LVHR group therefore suggests that it may promote overall neural recovery, contributing to greater relaxation. Theta waves are linked to cognitive processes, including memory formation and emotion regulation [29,30]. The increased occipital theta activity in the LVHR group suggests that the darker and more monotonous images in the LVHR videos reduced visual stimulation, promoting relaxation [26].
Differences in mean power values between the LVHR and HVLR groups exhibited distinct patterns depending on the EEG site. In the frontal lobe, theta, alpha, and beta wave activity increased while delta wave activity decreased when viewing HVLR videos compared to LVHR videos. This suggests that the HVLR group requires relatively more attention and cognitive processing [31,32]. In other regions, such as the parietal, temporal, occipital, and central regions, delta and theta waves increased while alpha and beta waves decreased when viewing LVHR videos. This indicates that LVHR videos tend to induce more relaxation and reduce cognitive activity [20,31].
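The band-power values compared here can be computed from raw EEG with a standard Welch periodogram. The sketch below uses common band edges (delta 1–4 Hz, theta 4–8 Hz, alpha 8–13 Hz, beta 13–30 Hz); the paper does not state its exact cutoffs, so these are assumptions.

```python
import numpy as np
from scipy.signal import welch

# Common EEG band edges (assumed; the paper does not list its cutoffs).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(eeg, fs):
    """Absolute power per band for one EEG channel via Welch's PSD."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}

# Synthetic 10 Hz oscillation: alpha power should dominate.
fs = 250
t = np.arange(0, 8, 1 / fs)
p = band_powers(np.sin(2 * np.pi * 10 * t), fs)
print(max(p, key=p.get))  # alpha
```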
Calculations of the asymmetry index (AI) revealed that the right hemisphere was more activated than the left hemisphere in most brain regions while watching ASMR videos. Typically, positive stimuli are associated with left-brain activation [33,34], but previous research suggests that the right hemisphere can also respond to positive emotions, especially when attention is required for arousal [35,36]. This implies that both HVLR and LVHR videos can provide positive stimuli that promote focus. The frontal lobe, responsible for cognitive abilities, showed significantly greater asymmetry than other regions. There were pronounced increases in right-hemisphere alpha and beta wave activation when watching HVLR videos compared to LVHR videos. This implies that HVLR videos induce relatively higher attention and arousal [32,33]. In the central region, LVHR videos activated beta waves in the right hemisphere, whereas HVLR videos significantly activated beta waves in the left hemisphere. Since the central region integrates emotional and cognitive functions, this suggests that HVLR videos evoke more positive emotions than LVHR videos [32,37].
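The paper does not spell out its AI formula; one widely used formulation takes the difference of log band powers between homologous right and left electrodes, sketched here with hypothetical power values.

```python
import numpy as np

def asymmetry_index(power_left, power_right):
    """AI as ln(right) - ln(left) band power: positive values indicate
    relatively greater right-hemisphere power. This is one common
    formulation; the paper does not state its exact AI formula."""
    return np.log(power_right) - np.log(power_left)

# Hypothetical alpha power (uV^2) at a homologous F3/F4 electrode pair.
ai = asymmetry_index(power_left=4.0, power_right=6.0)
print(ai > 0)  # True: right hemisphere relatively more active
```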

4.1.3. Physiological Perspective: ECG

Heart rate variability (HRV) analysis revealed that the LVHR group generally induced a higher level of relaxation compared to the HVLR group. The mean values of SDNN (a time-domain measure) and total power (a frequency-domain measure), both indicators of autonomic nervous system balance, were higher when participants watched LVHR videos. These higher values indicate greater HRV variability, which is associated with a healthier, less stressed physiological state [24]. This suggests that LVHR videos are more effective at promoting relaxation and reducing stress.
HRV data across frequency bands showed that both low-frequency (LF) and high-frequency (HF) components increased after viewing ASMR videos in both groups. However, the increase was significantly more pronounced in the LVHR group, consistent with the higher total power observed. Additionally, ln HF, reflecting parasympathetic nervous system activity, was higher in the LVHR group, suggesting a greater state of rest and stress reduction [24]. Respiratory sinus arrhythmia (RSA), another indicator of parasympathetic activity, also increased significantly more in the LVHR group than in the HVLR group, further indicating enhanced parasympathetic activation.
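For reference, these HRV measures can be derived from RR intervals as in the minimal sketch below. The resampling rate of 4 Hz is an assumption; the LF and HF band edges follow the 1996 Task Force standard [24].

```python
import numpy as np
from scipy.signal import welch

def hrv_measures(rr_ms):
    """Time- and frequency-domain HRV from RR intervals in ms.
    The tachogram is resampled at 4 Hz (an assumption) and band edges
    follow the 1996 Task Force standard: LF 0.04-0.15 Hz, HF 0.15-0.4 Hz."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)

    t = np.cumsum(rr) / 1000.0          # beat times in seconds
    fs = 4.0
    t_even = np.arange(t[0], t[-1], 1 / fs)
    rr_even = np.interp(t_even, t, rr)  # evenly resampled tachogram
    freqs, psd = welch(rr_even - rr_even.mean(), fs=fs,
                       nperseg=min(256, len(rr_even)))
    df = freqs[1] - freqs[0]
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum() * df
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum() * df
    return {"SDNN": sdnn, "LF": lf, "HF": hf, "lnHF": np.log(hf)}

# Synthetic tachogram: 800 ms beats with a 0.25 Hz respiratory modulation.
beats = np.cumsum(np.full(300, 0.8))
rr = 800 + 50 * np.sin(2 * np.pi * 0.25 * beats)
m = hrv_measures(rr)
print(m["HF"] > m["LF"])  # True: respiration-driven HF dominance
```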
Both groups exhibited increased HRV variability following ASMR stimulation, which is consistent with previous research demonstrating that ASMR enhances HRV and leads to relaxation and stress reduction [13,14,38]. However, the degree of this effect varied based on the type of ASMR stimulus, supporting the study’s hypothesis.

4.2. Sound and Image Features

The comparison of sound characteristics between the two groups revealed that the HVLR group had louder, higher-pitched sounds with greater variability compared to the LVHR group. The HVLR videos exhibited significantly higher average values across various sound characteristics, reflecting greater power in each frequency range, indicating overall louder sound profiles. Specifically, the HVLR group showed much higher average and standard deviation values for power in the high-frequency band, mel power, general power, sensory power, and conversational power categories.
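Band-wise sound power of the kind compared here can be estimated with a plain FFT. The low/high split below mirrors the frequency ranges discussed in this paper and is illustrative, not the paper's exact feature definition.

```python
import numpy as np

def audio_band_power(x, fs, bands):
    """Mean spectral power in each frequency band via a plain rFFT."""
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return {name: spec[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in bands.items()}

fs = 22050
t = np.arange(0, 1, 1 / fs)
tone = np.sin(2 * np.pi * 2000 * t)  # a 2 kHz "HVLR-like" test tone
p = audio_band_power(tone, fs, {"low": (50, 500), "high": (500, 10000)})
print(p["high"] > p["low"])  # True
```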
MFCC feature extraction further indicated that the coefficients F1, F10, and F18 were significantly higher in HVLR group videos. In contrast, the coefficients F2, F3, and F4 were notably higher in LVHR group videos. The first MFCC (F1) represents the overall energy distribution of the sound signal and serves as an indicator of signal strength. F10 (1233–1800 Hz) and F18 (4300–5400 Hz) correspond to the medium- and high-frequency ranges, while F2, F3, and F4 reflect the low-frequency bands (66.67–433.33 Hz) [16].
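The correspondence between filter index and frequency range follows from the mel scale. The sketch below uses the common HTK mel formula and a hypothetical 20-filter bank spanning 0–11.025 kHz (the paper's exact filterbank parameters are not stated) to show that the lowest-index filters sit in the low hundreds of Hz.

```python
import numpy as np

def hz_to_mel(f):
    """HTK mel formula (a common convention; the paper's mapping is assumed)."""
    return 2595 * np.log10(1 + np.asarray(f, dtype=float) / 700)

def mel_to_hz(m):
    return 700 * (10 ** (np.asarray(m, dtype=float) / 2595) - 1)

# Center frequencies of a hypothetical 20-filter mel bank over 0-11.025 kHz.
edges_mel = np.linspace(hz_to_mel(0), hz_to_mel(11025), 20 + 2)
centers = mel_to_hz(edges_mel[1:-1])
print(np.round(centers[:4]))  # lowest filters sit in the low hundreds of Hz
```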
These findings collectively suggest that HVLR group videos have a more variable and rich acoustic profile with higher acoustic power. In contrast, LVHR group videos feature relatively lower, less variable sounds, particularly in the medium and high-frequency ranges.
As previously discussed, the LVHR and HVLR groups were classified based on subjective evaluations from the expert panel: the HVLR group evoked high positivity and lively emotions, while the LVHR group induced high relaxation, with calm and static emotions. These results align with the findings of Ma (2015) and Meng (2020), who argued that fast-tempo, high-pitched sounds tend to induce positive and energetic emotions, whereas slower-tempo, lower-pitched sounds elicit relaxation [39,40]. The present results further support their conclusions.
Beyond sound characteristics, HVLR group videos exhibited clearer and richer image components. All components of the RGB color space (red, green, blue) were significantly higher in the HVLR group compared to the LVHR group. These higher values indicate a greater proportion of the respective colors, resulting in clearer and brighter images. This suggests that HVLR group videos have brighter and more vivid visual elements than LVHR group videos. The grayscale value, calculated by combining the three components, was also significantly higher in the HVLR group, indicating overall brighter images. Additionally, the brightness values in both the HSV and LAB color spaces were significantly higher in the HVLR group. Therefore, HVLR group videos consistently had brighter images across all color spaces, suggesting higher arousal levels [41].
In terms of color, the red–green dimension showed no statistically significant group difference. However, within the HVLR group the red channel was highest (R = 167.189, G = 156.911, B = 135.765), whereas within the LVHR group the blue channel was highest (R = 58.565, G = 59.811, B = 66.984); green remained at an intermediate level in both, so the channel ordering was reversed between the two groups. The hue, a crucial variable in the HSV color space, showed significant differences between the LVHR group (M = 169.131) and the HVLR group (M = 72.673). In this space, a hue of 60 represents yellow and 180 represents cyan; thus, the HVLR group primarily exhibited yellow with a hint of green, whereas the LVHR group exhibited blue–green tones. In the LAB color space, the beta (yellow–blue) component also showed significant differences: a positive value represents yellow and a negative value represents blue. The HVLR group had a mean beta value of 12.269, while the LVHR group had a mean of −4.283, indicating that the HVLR group primarily contained yellow components and the LVHR group blue components.
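The hue anchors cited here (60 for yellow, 180 for cyan) can be verified with the standard library's colorsys module, which returns hue on a 0–1 scale.

```python
import colorsys

# colorsys returns hue on a 0-1 scale; multiply by 360 for degrees.
hue_yellow = colorsys.rgb_to_hsv(1.0, 1.0, 0.0)[0] * 360  # pure yellow
hue_cyan = colorsys.rgb_to_hsv(0.0, 1.0, 1.0)[0] * 360    # pure cyan
print(round(hue_yellow), round(hue_cyan))  # 60 180
```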
In conclusion, the HVLR group consistently displayed brighter and more yellow-dominant image elements across all color spaces, while the LVHR group exhibited relatively darker and more blue-dominant elements. This aligns with previous research indicating that yellow evokes feelings of happiness and vitality, whereas blue induces calmness and freshness [17,18]. However, no significant differences were observed along the red–green dimension in any color space, suggesting that the differences between the HVLR and LVHR groups are driven primarily by brightness and the yellow–blue color components.

4.3. Emotion Recognition Result

Based on the subjective evaluations of 11 experts, the physical characteristics of the 56 videos in the HVLR and LVHR groups were effectively explained by the sound and image features of the eight videos selected as experimental stimuli. The classification accuracy was 91.07%, indicating that these characteristic values effectively distinguish between the HVLR and LVHR emotion categories.
However, the LVHR group had a higher proportion of misclassifications compared to the HVLR group, likely because ASMR videos tend to induce both positivity and relaxation. Since ASMR elicits emotions along the same dimensions of positivity and relaxation, variations in intensity between HVLR and LVHR videos may result in different emotional responses among individuals, particularly for videos with moderate characteristic values between the two categories [7,19]. Nevertheless, the relatively high accuracy in distinguishing LVHR and HVLR emotions suggests that different emotions can be derived from the physical characteristics of the videos, even within the same ASMR content.
Furthermore, the sound and image characteristic values of the 56 videos were compared, via Euclidean distance, to the average values of the HVLR and LVHR groups derived from the 8 representative stimulus videos, and the 7 closest videos in each group were ranked. The HVLR group was primarily associated with the emotional vocabulary “vibrant” and “pleasant”, while the LVHR group was characterized by “relaxed” and “calm”.
These findings are consistent with the analysis of the 8 representative videos, demonstrating that the emotional vocabulary selected in the experiment accurately describes the HVLR and LVHR groups.

5. Conclusions

This study sought to elucidate how specific sound and image features of ASMR videos elicit distinct emotional responses, focusing on high valence–low relaxation (HVLR) and low valence–high relaxation (LVHR) states. By integrating physiological measures such as EEG and HRV with subjective evaluations, we aimed to provide concrete evidence of how different ASMR stimuli influence emotional arousal and relaxation levels. The findings clearly demonstrate that ASMR content can be strategically designed to evoke targeted emotional states based on its sensory properties.
Our results confirmed that LVHR ASMR videos, characterized by lower-frequency sounds and darker, more muted colors, significantly increase delta wave activity in the frontal lobe and enhance parasympathetic nervous system activity, as indicated by heightened HRV. These physiological markers are consistent with greater relaxation and emotional calm. In contrast, HVLR ASMR videos, featuring mid- to high-frequency sounds and visually brighter elements, were associated with increased beta wave activity in the frontal lobe and a more modest improvement in HRV, indicating heightened cognitive engagement and positive arousal. The subjective reports of positivity and relaxation also aligned with participants’ physiological responses. Additionally, the emotional vocabulary selected reflected these patterns, with the LVHR group choosing words linked to calmness and tranquility, while the HVLR group opted for words associated with brightness and energy. These results demonstrate a strong correlation between physiological responses and subjective evaluations, offering scientific support for the emotional effects of ASMR.
The analysis of the ASMR videos’ physical characteristics demonstrated a high level of accuracy in distinguishing between LVHR and HVLR emotional states. For instance, videos with louder sounds in the 50 to 500 Hz range, lower brightness levels of 20 to 30%, and a higher ratio of green to blue were more likely to evoke LVHR responses. In contrast, videos with louder sounds in the 500 to 10,000 Hz range, brightness levels of 60 to 70%, and a higher ratio of yellow to green were more likely to elicit HVLR responses. This finding underscores the potential for ASMR content to be systematically designed to evoke specific emotional responses. It provides evidence-based guidelines for both researchers and content creators.
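These threshold ranges suggest a simple rule-of-thumb labeler. The sketch below is purely illustrative: it votes on the three reported ranges and is not the classification model evaluated in Section 3.6.

```python
def predict_asmr_group(peak_freq_hz, brightness_pct, yellow_blue_balance):
    """Toy rule-of-thumb labeler mirroring the reported ranges.
    yellow_blue_balance: positive = yellow-dominant, negative = blue-dominant
    (like the LAB beta component). Thresholds are illustrative only."""
    hvlr_votes = sum([
        500 <= peak_freq_hz <= 10000,  # mid/high-frequency sound
        60 <= brightness_pct <= 70,    # brighter images
        yellow_blue_balance > 0,       # yellow-dominant palette
    ])
    return "HVLR" if hvlr_votes >= 2 else "LVHR"

print(predict_asmr_group(4300, 65, 12.3))  # HVLR
print(predict_asmr_group(120, 25, -4.3))   # LVHR
```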
However, several limitations should be noted. The study involved 31 participants, all healthy adults from a specific age group. This relatively small sample size may limit the generalizability of the findings. Additionally, the study focused exclusively on a subset of ASMR videos, specifically those categorized as high valence–low relaxation (HVLR) and low valence–high relaxation (LVHR), excluding a broader range of ASMR content that may elicit different emotional or physiological responses. Furthermore, the experiment was conducted in a controlled laboratory environment, which may not fully reflect the real-world settings in which ASMR is typically experienced. ASMR is often consumed in relaxed, personal environments, such as before sleep, where physiological and emotional responses may differ.
Additionally, individual differences in disposition and preference may affect ASMR’s impact. While this study analyzed ASMR content characteristics using statistical techniques, future research could explore optimal content properties tailored to individual differences through methods such as machine learning.
In conclusion, this research provides foundational insights into how ASMR content can be designed to elicit specific emotional responses based on its sound and image characteristics. The findings not only enhance the effectiveness and appeal of ASMR videos but also have practical implications for their use in therapeutic settings and experimental design. Thoughtfully crafted ASMR content can serve as a valuable tool for promoting emotional relaxation and well-being. Additionally, by defining emotional vocabulary that accurately captures specific emotional states, this research can improve consumers’ understanding of ASMR videos and offer content creators clear guidance for effective content design. Overall, this study provides a scientific framework for the strategic development of ASMR content, bridging the gap between the subjective ASMR experience and measurable physiological responses.

Author Contributions

Conceptualization, A.C. and M.W.; methodology, A.C.; software, H.L.; validation, Y.K. and A.C.; investigation, Y.K.; data curation, Y.K. and H.L.; writing—original draft preparation, A.C.; writing—review and editing, M.W.; visualization, Y.K.; supervision, M.W.; project administration, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [24ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System] and Wishcompany under the Development of an AI Model for Emotion Recognition in Digital Content.

Institutional Review Board Statement

The study was approved by the Ethics Committee of the National Bioethics Committee (SMIRB (S-2023-10)) on 2 January 2024.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets presented in this article are not available because we jointly own the data with our partner organization, Wishcompany Co., Ltd.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Waldfogel, J. How Digitization Has Created a Golden Age of Music, Movies, Books, and Television. J. Econ. Perspect. 2017, 31, 195–214. [Google Scholar] [CrossRef]
  2. Woods, N.; Turner-Cobb, J.M. “It’s like Taking a Sleeping Pill”: Student Experience of Autonomous Sensory Meridian Response (ASMR) to Promote Health and Mental Wellbeing. Int. J. Environ. Res. Public Health 2023, 20, 2337. [Google Scholar] [CrossRef] [PubMed]
  3. Garro, D. Autonomous Sensory Meridian Response—From Internet Subculture to Audiovisual Therapy. Proc. EVA Lond. 2017, 395, 1–8. [Google Scholar]
  4. Barratt, E.L.; Davis, N.J. Autonomous Sensory Meridian Response (ASMR): A flow-like mental state. PeerJ 2015, 3, e851. [Google Scholar] [CrossRef]
  5. Gallagher, R. Eliciting Euphoria Online: The Aesthetics of ‘ASMR’ Video Culture. Film Crit. 2016, 40, 202. [Google Scholar] [CrossRef]
  6. Smith, N.; Snider, A.M. ASMR, affect and digitally-mediated intimacy. Emot. Space Soc. 2019, 31, 41–48. [Google Scholar] [CrossRef]
  7. Yusaira, F.; Bennett, C.N. Influence of Autonomous Sensory Meridian Response on Relaxation States: An Experimental Study. NeuroRegulation 2021, 8, 184–193. [Google Scholar] [CrossRef]
  8. Engelbregt, H.J.; Schilperoort, T.; van Harten, L.; van Boxtel, G.J.M. The effects of autonomous sensory meridian response (ASMR) on mood, attention, heart rate, skin conductance, and EEG in healthy young adults. Exp. Brain Res. 2022, 240, 1321–1335. [Google Scholar] [CrossRef]
  9. Lohaus, T.; Yüksekdag, S.; Bellingrath, S.; Thoma, P. The effects of Autonomous Sensory Meridian Response (ASMR) videos versus walking tour videos on ASMR experience, positive affect, and state relaxation. PLoS ONE 2023, 18, e0277990. [Google Scholar] [CrossRef]
  10. Lochte, B.C.; Guillory, S.A.; Richard, C.A.; Kelley, W.M. An fMRI investigation of the neural correlates underlying the autonomous sensory meridian response (ASMR). BioImpacts 2018, 8, 259–266. [Google Scholar] [CrossRef]
  11. Fredborg, B.K.; Clark, J.M.; Smith, S.D. An examination of personality traits associated with autonomous sensory meridian response (ASMR). Front. Psychol. 2017, 8, 247. [Google Scholar] [CrossRef] [PubMed]
  12. Smith, S.D.; Fredborg, B.K.; Kornelsen, J. An electroencephalographic examination of the autonomous sensory meridian response (ASMR). NeuroImage 2020, 207, 116360. [Google Scholar]
  13. Poerio, G.L.; Blakey, E.; Hostler, T.J.; Veltri, T. More than a feeling: Autonomous sensory meridian response (ASMR) is characterized by reliable changes in affect and physiology. PLoS ONE 2018, 13, e0196645. [Google Scholar] [CrossRef] [PubMed]
  14. Engelbregt, H.J.; Stins, J.F.; Zwaan, K. The effects of autonomous sensory meridian response (ASMR) on mood, attention, and sleep. J. Psychophysiol. 2022, 36, 89–101. [Google Scholar]
  15. Berard, G.; Brockett, S. Hearing Equals Behavior: Updated and Expanded; eBooks2go, Inc.: Schaumburg, IL, USA, 2014. [Google Scholar]
  16. Davis, S.B.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366. [Google Scholar] [CrossRef]
  17. Mohseni, S.A.; Wu, H.R.; Thom, J.A.; Bab-Hadiashar, A. Recognizing Induced Emotions With Only One Feature: A Novel Color Histogram-Based System. IEEE Access 2020, 8, 37173–37188. [Google Scholar] [CrossRef]
  18. Kim, H.-R.; Kang, H.; Lee, I.-K. Image Recoloring with Valence-Arousal Emotion Model. Comput. Graph. Forum 2016, 35, 1–12. [Google Scholar] [CrossRef]
  19. Sakurai, N.; Nagasaka, K.; Sasaki, K.; Yuguchi, Y.; Takahashi, S.; Kasai, S.; Onishi, H.; Kodama, N. The relaxation effect of autonomous sensory meridian response depends on personal preference. Front. Hum. Neurosci. 2023, 17, 1249176. [Google Scholar] [CrossRef]
  20. Lin, C.; Kondo, H. Brain circuits in autonomous sensory meridian response and related phenomena. NeuroImage 2024, 230, 105382. [Google Scholar] [CrossRef]
  21. Packheiser, J.; Schmitz, J.; Pan, Y.; El Basbasse, Y.; Friedrich, P.; Güntürkün, O.; Ocklenburg, S. Using Mobile EEG to Investigate Alpha and Beta Asymmetries During Hand and Foot Use. Front. Neurosci. 2020, 14, 109. [Google Scholar] [CrossRef]
  22. Burch, J.B.; Alexander, M.; Balte, P.; Sofge, J.; Winstead, J.; Kothandaraman, V.; Ginsberg, J.P. Shift work and heart rate variability coherence: Pilot study among nurses. Appl. Psychophysiol. Biofeedback 2019, 44, 21–30. [Google Scholar] [CrossRef] [PubMed]
  23. Gospodinov, M.; Gospodinova, E.; Georgieva-Tsaneva, G. Mathematical methods of ECG data analysis. In Healthcare Data Analytics and Management; Elsevier: Amsterdam, The Netherlands, 2019; pp. 177–206. [Google Scholar]
  24. Task Force of the European Society of Cardiology the North American Society of Pacing Electrophysiology. Heart rate variability: Standards of measurement, physiological interpretation and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Circulation 1996, 93, 1043–1065. [Google Scholar] [CrossRef]
  25. Chow, L.S.; George, R.; Rizon, M.; Moghavvemi, M.; Paley, M.N.J. Investigating the Axonal Magnetic Fields Corresponding to Delta and Theta Waves in the Human Brain Using Direct Detection MRI. IEEE Access 2021, 9, 152856–152868. [Google Scholar] [CrossRef]
  26. Dennison, P. The Human Default Consciousness and Its Disruption: Insights From an EEG Study of Buddhist Jhāna Meditation. Front. Hum. Neurosci. 2019, 13, 178. [Google Scholar] [CrossRef]
  27. Harmony, T. The functional significance of delta oscillations in cognitive processing. Front. Integr. Neurosci. 2013, 7, 83. [Google Scholar] [CrossRef]
  28. Başar, E.; Schürmann, G.; Başar-Eroglu, M. Brain oscillations in perception and memory. Int. J. Psychophysiol. 2001, 39, 151–171. [Google Scholar] [CrossRef]
  29. Klimesch, W. EEG theta and memory processes. Cortex 1999, 35, 25–34. [Google Scholar]
  30. Knyazev, G.G. Motivation, emotion, and their inhibitory control mirrored in brain oscillations. Neurosci. Biobehav. Rev. 2007, 31, 377–395. [Google Scholar] [CrossRef]
  31. Swart, T.R.; Banissy, M.J.; Hein, T.P.; Bruña, R.; Pereda, E.; Bhattacharya, J. ASMR amplifies low frequency and reduces high frequency oscillations. Cortex 2022, 149, 85–100. [Google Scholar] [CrossRef]
  32. Mesa-Gresa, P.; Gil-Gómez, J.A.; Lozano-Quilis, J.A.; Schoeps, K.; Montoya-Castilla, I. Electrophysiological correlates of the emotional response on brain activity in adolescents. Biomed. Signal Process. Control 2024, 89, 105754. [Google Scholar] [CrossRef]
  33. Apicella, A.; Arpaia, P.; Mastrati, G.; Moccaldi, N. EEG-based detection of emotional valence towards a reproducible measurement of emotions. Sci. Rep. 2021, 11, 21615. [Google Scholar] [CrossRef] [PubMed]
  34. Prete, G.; Croce, P.; Zappasodi, F.; Tommasi, L.; Capotosto, P. Exploring brain activity for positive and negative emotions by means of EEG microstates. Sci. Rep. 2022, 12, 3404. [Google Scholar] [CrossRef] [PubMed]
  35. Hartikainen, K.M. Emotion-Attention Interaction in the Right Hemisphere. Brain Sci. 2021, 11, 1006. [Google Scholar] [CrossRef] [PubMed]
  36. Kheirkhah, M.; Baumbach, P.; Leistritz, L.; Witte, O.W.; Walter, M.; Gilbert, J.R.; Zarate, C.A., Jr.; Klingner, C.M. The Right Hemisphere Is Responsible for the Greatest Differences in Human Brain Response to High-Arousing Emotional versus Neutral Stimuli: A MEG Study. Brain Sci. 2021, 11, 960. [Google Scholar] [CrossRef] [PubMed]
  37. Alakus, T.B.; Gonen, M.; Turkoglu, I. Database for an emotion recognition system based on EEG signals and various computer games—GAMEEMO. Biomed. Signal Process. Control 2020, 60, 101951. [Google Scholar] [CrossRef]
  38. Benz, A.B.E.; Gaertner, R.J.; Meier, M.; Unternaehrer, E.; Scharndke, S.; Jupe, C.; Wenzel, M.; Bentele, U.U.; Dimitroff, S.J.; Denk, B.F.; et al. Nature-Based Relaxation Videos and Their Effect on Heart Rate Variability. Front. Psychol. 2022, 13, 866682. [Google Scholar] [CrossRef]
  39. Ma, W.; Thompson, W.F. Human Emotions Track Changes in the Acoustic Environment. Proc. Natl. Acad. Sci. USA 2015, 112, 14563–14568. [Google Scholar] [CrossRef]
  40. Meng, Q.; Jiang, J.; Liu, F.; Xu, X. Effects of the Musical Sound Environment on Communicating Emotion. Int. J. Environ. Res. Public Health 2020, 17, 2499. [Google Scholar] [CrossRef]
  41. Wilms, L.; Oberfeld, D. Color and Emotion: Effects of Hue, Saturation, and Brightness. Psychol. Res. 2017, 81, 448–463. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.
Figure 1. The images from video stimuli used in the experiment.
Figure 2. This figure illustrates the comparison of mean arousal and positive affect scores between the HVLR and LVHR group videos. The average arousal score for the HVLR group was 3.36 (SD = 1.219), compared to 1.81 (SD = 1.166) for the LVHR group. Similarly, the mean positive affect score for the HVLR group was 3.54 (SD = 1.115), while it was 2.64 (SD = 1.315) for the LVHR group.
Figure 3. This graph presents the results of a cross-analysis of emotional vocabulary between the HVLR and LVHR groups. For the HVLR group, the most frequently selected emotional terms were “vibrant” (37.90%), “enjoyable” (18.35%), and “pleasant” (12.90%). In contrast, for the LVHR group, the most chosen words were “calm” (37.90%), “comfortable” (17.74%), and “relaxed” (17.74%). The results were statistically significant (Pearson chi-square test, χ²(8, n = 248) = 138.002, p < 0.001).
Figure 4. This figure illustrates significant differences in the average brainwave power between the HVLR group and the LVHR group across several EEG variables. Notable differences were identified in delta wave activity at the Fp2 (F = 0.003, T(6) = −2.877, p = 0.028), T3 (F = 7.988, T(4.608) = 2.056, p = 0.100), T4 (F = 4.221, T(6) = 2.570, p = 0.042), P3 (U = 0, z = −2.309, p = 0.029), P4 (U = 0, z = −2.309, p = 0.029), C3 (U = 0, z = −2.309, p = 0.029), and O2 (F = 3.273, T(6) = 1.977, p = 0.095) electrode sites, as well as in theta wave activity at the O1 (F = 0.048, T(6) = 2.224, p = 0.068) and O2 (F = 0.014, T(6) = 3.251, p = 0.017) sites. (*: p < 0.05).
Figure 5. Power changes across frequency bands for the LVHR and HVLR groups, visualized as heat maps. In the between-group comparison, electrode sites with higher activation are shown in red. The LVHR group showed significantly greater activation in the left frontal lobe in the delta band than the HVLR group. Although the HVLR group exhibited higher overall activation in the beta band, this difference did not reach statistical significance.
Figure 6. This graph shows the mean power differences between the HVLR and LVHR groups for each ECG variable. The statistical outcomes for each variable are as follows: SDNN (F = 0.263, T(6) = −2.193, p = 0.071), LF (F = 3.695, T(6) = −2.727, p = 0.034), HF (F = 0.035, T(6) = −3.971, p = 0.007), ln HF (F = 0.489, T(6) = −3.456, p = 0.014), total power (F = 0.001, T(6) = −2.433, p = 0.051), and RSA (U = 0, z = −2.309, p = 0.029).
Figure 7. These graphs show the sound features with significant power differences between the HVLR and LVHR groups: (a) the average power values for the LVHR and HVLR groups; (b) the standard deviations of power for the LVHR and HVLR groups.
Figure 8. (a) Average power values of the MFCCs that differ significantly between the LVHR and HVLR groups; (b) the variables with smaller average values, shown separately for easier comparison.
Figure 9. This graph presents the average values of all image variables for the LVHR and HVLR groups.
Figure 10. Results from the emotional vocabulary survey of the top 14 images (7 from each group) whose image characteristics most closely matched the LVHR and HVLR group averages, selected from a total of 56 images. A cross-analysis between emotions and vocabulary was conducted, and Fisher’s exact test showed a statistically significant association (p < 0.001).
Table 1. Frequency range and description of sound feature variables.
| Sound Feature | Range (Hz) | Meaning |
| --- | --- | --- |
| Low-frequency | 200–600 | Low frequency |
| Mid-frequency | 600–1500 | Middle frequency |
| High-frequency | 1500–20,000 | High frequency |
| Mel power | 15–20,000 | The audible range of a human being |
| General power | 125–8000 | The range of everyday life |
| Sensory power | 1000–5000 | A high-sensitivity range |
| Conversational power | 250–4000 | Conversational range |
| Main conversational power | 500–2000 | Important conversation range |
| MFCC coefficients | – | Mel-frequency cepstral coefficients |
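The paper does not include extraction code for these features; as an illustrative sketch (the function name and the FFT-based power estimate are assumptions of this example, not the authors’ pipeline), the per-band power of a mono signal can be summed from its magnitude spectrum using the band limits in Table 1:

```python
import numpy as np

def band_power(signal, sr, f_lo, f_hi):
    """Sum of spectral power within [f_lo, f_hi) Hz of a mono signal."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(spectrum[mask].sum())

# Band limits taken from Table 1 (Hz).
BANDS = {
    "low": (200, 600),
    "mid": (600, 1500),
    "high": (1500, 20000),
    "conversational": (250, 4000),
}

# Sanity check: a pure 440 Hz tone concentrates its power in the low band.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
powers = {name: band_power(tone, sr, lo, hi) for name, (lo, hi) in BANDS.items()}
```

A real ASMR soundtrack would be analyzed in short windows rather than over the whole file, but the band masks are applied the same way.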
Table 2. Range and Calculation Formulas of ECG Variables.
| Variable | Units | Formula | Range | Description |
| --- | --- | --- | --- | --- |
| *Time-domain variables* | | | | |
| BPM | bpm | $60 \div \left(\frac{1}{N}\sum_{i=1}^{N} PPI_i\right)$ | – | Beats per minute. |
| SDNN | ms | $\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\overline{PPI} - PPI_i\right)^2}$ | – | Standard deviation of NN intervals. |
| RMSSD | ms | $\sqrt{\frac{1}{N-2}\sum_{i=2}^{N}\left(PPI_i - PPI_{i-1}\right)^2}$ | – | The square root of the mean of the sum of the squares of differences between adjacent NN intervals. |
| pNN50 | % | $\frac{NN50\ \mathrm{count}}{\mathrm{total}\ NN\ \mathrm{count}} \times 100$ | – | NN50 count divided by the total number of all NN intervals. |
| *Frequency-domain variables* | | | | |
| Total power | ms² | $\sum_{f=0.0033}^{0.40} Power_f$ | 0.0033–0.40 Hz | Sum of power over the entire frequency range. |
| LF | ms² | $\sum_{f=0.04}^{0.15} Power_f$ | 0.04–0.15 Hz | Power in the low-frequency range. |
| LF% | n.u. | $\frac{LF}{Total\ Power}$ | – | Normalized power of LF. |
| ln LF | – | $\ln(LF)$ | – | The natural logarithm of LF. |
| HF | ms² | $\sum_{f=0.15}^{0.40} Power_f$ | 0.15–0.40 Hz | Power in the high-frequency range. |
| HF% | n.u. | $\frac{HF}{Total\ Power}$ | – | Normalized power of HF. |
| ln HF | – | $\ln(HF)$ | – | The natural logarithm of HF. |
| LF/HF ratio | – | $\frac{LF}{HF}$ | – | Index of the sympathovagal balance. |
| Coherence ratio | – | $\frac{Peak\ Power}{Total\ Power - Peak\ Power}$ | 0.04–0.26 Hz | The ratio of peak power within 0.04–0.26 Hz to the difference between the total spectral power and the peak power. |
| RSA | Hz | $\frac{1}{N}\sum_{i=1}^{N}\ln\left(P_{t_i:t_i+t}\right)$ | 0.12–0.40 Hz | Respiratory sinus arrhythmia. |
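The time-domain formulas in Table 2 can be translated directly into code. The sketch below is a minimal illustration (the function name and the six-beat interval series are invented for the example, not study data), and it reproduces the divisors as printed in the table:

```python
import math

def time_domain_hrv(ppi_ms):
    """Time-domain HRV metrics from peak-to-peak intervals in milliseconds,
    following the formulas printed in Table 2 (including its N-2 divisor for RMSSD)."""
    n = len(ppi_ms)
    mean_ppi = sum(ppi_ms) / n
    bpm = 60000.0 / mean_ppi  # 60 s divided by the mean interval
    sdnn = math.sqrt(sum((mean_ppi - x) ** 2 for x in ppi_ms) / (n - 1))
    diffs = [ppi_ms[i] - ppi_ms[i - 1] for i in range(1, n)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / (n - 2))
    nn50 = sum(1 for d in diffs if abs(d) > 50)  # successive differences > 50 ms
    pnn50 = 100.0 * nn50 / n
    return {"BPM": bpm, "SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}

# Illustrative six-beat interval series (ms); mean interval is 810 ms.
metrics = time_domain_hrv([800, 810, 790, 860, 805, 795])
```

The frequency-domain variables (LF, HF, total power) would additionally require resampling the interval series and estimating its power spectrum before summing the band ranges given above.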
Table 3. This table details the asymmetry index (AI) for different brainwave frequency bands. A positive AI value indicates greater activation in the right hemisphere, while a negative value reflects greater activation in the left hemisphere. An absolute AI value greater than 0.2 denotes a high degree of asymmetry, whereas an absolute value smaller than 0.05 suggests near-symmetry between the left and right hemispheres.
| Band | Fp (LVHR) | Fp (HVLR) | P (LVHR) | P (HVLR) | C (LVHR) | C (HVLR) | T (LVHR) | T (HVLR) | O (LVHR) | O (HVLR) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Delta | 0.488 | 0.276 | 0.154 | 0.121 | 0.068 | 0.076 | 0.016 | 0.010 | 0.111 | 0.078 |
| Theta | 0.343 | 0.330 | 0.124 | 0.093 | 0.128 | 0.064 | −0.041 | −0.052 | 0.019 | 0.020 |
| Alpha | 0.477 | 0.454 | 0.098 | 0.096 | 0.222 | 0.177 | 0.129 | 0.169 | 0.108 | 0.077 |
| Beta | 0.441 | 0.967 | 0.099 | 0.148 | 0.262 | −0.145 | 0.137 | 0.387 | 0.205 | 0.315 |
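The caption’s sign convention and thresholds can be expressed compactly. As a hedged sketch: the exact AI formula is not shown in this excerpt, so `asymmetry_index` below uses one common formulation, (R − L)/(R + L), which matches the caption’s sign convention; the threshold logic in `classify_ai`, however, is taken directly from the caption:

```python
def asymmetry_index(p_right, p_left):
    """One common AI formulation, (R - L) / (R + L); positive values indicate
    greater right-hemisphere activation, matching Table 3's sign convention.
    (Assumption: the paper's exact AI formula is not given in this excerpt.)"""
    return (p_right - p_left) / (p_right + p_left)

def classify_ai(ai):
    """Degree of lateralization per the thresholds stated in Table 3's caption."""
    if abs(ai) > 0.2:
        return "high asymmetry"
    if abs(ai) < 0.05:
        return "near-symmetric"
    return "moderate asymmetry"

# From Table 3: the HVLR beta-band value at the frontal pole (0.967) exceeds 0.2,
# while the LVHR delta-band value over the temporal sites (0.016) is below 0.05.
labels = [classify_ai(0.967), classify_ai(0.016)]
```

Applied to the table, the beta band over the frontal pole is strongly right-lateralized in the HVLR condition, while delta activity over the temporal sites is essentially symmetric in the LVHR condition.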
Table 4. Confusion Matrix of the Emotion Recognition Program.
| Actual \ Predicted | LVHR | HVLR | Total |
| --- | --- | --- | --- |
| LVHR | 24 | 4 | 28 |
| HVLR | 1 | 27 | 28 |
| Total | 25 | 31 | – |
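This excerpt does not state summary metrics for the confusion matrix, but they follow directly from its cells, as the short computation below shows:

```python
# Confusion matrix from Table 4 (rows: actual class, columns: predicted class).
matrix = {
    ("LVHR", "LVHR"): 24, ("LVHR", "HVLR"): 4,
    ("HVLR", "LVHR"): 1,  ("HVLR", "HVLR"): 27,
}

total = sum(matrix.values())  # 28 + 28 = 56 trials
correct = matrix[("LVHR", "LVHR")] + matrix[("HVLR", "HVLR")]
accuracy = correct / total  # (24 + 27) / 56, about 91%

# Per-class recall (row-wise) and precision (column-wise) for LVHR.
recall_lvhr = matrix[("LVHR", "LVHR")] / (
    matrix[("LVHR", "LVHR")] + matrix[("LVHR", "HVLR")])
precision_lvhr = matrix[("LVHR", "LVHR")] / (
    matrix[("LVHR", "LVHR")] + matrix[("HVLR", "LVHR")])
```

Most confusions run in one direction: four LVHR trials were mislabeled HVLR, versus only one in the opposite direction.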

Kim, Y.; Cho, A.; Lee, H.; Whang, M. Impact of Sound and Image Features in ASMR on Emotional and Physiological Responses. Appl. Sci. 2024, 14, 10223. https://doi.org/10.3390/app142210223
