1. Introduction
Virtual reality (VR) technology offers immersive user experiences that blur the boundaries between the actual and simulated worlds. However, the phenomenon of cybersickness presents a significant barrier to broader adoption and user satisfaction. As VR applications have expanded beyond gaming into education, healthcare, and even remote work, the need to effectively predict and mitigate these unwanted side effects has become more crucial than ever.
Cybersickness resembles motion sickness in many ways, with symptoms including nausea, vertigo, eyestrain, and other oculomotor disturbances [1]. Historically, many theories have sought to explain motion sickness: early ones focused on cerebral blood flow, while later ones centered on vestibular function, sensory conflict [2], and postural instability [3]. Among these, sensory conflict theory remains the most widely accepted and is directly applicable to virtual reality [4,5].
Traditional methods for detecting cybersickness depend heavily on subjective responses collected through self-reported questionnaires or post-exposure surveys. While these methods offer significant insights, they come with limitations: users may have difficulty remembering the specifics of an experience after a long session, and they may under- or overestimate their symptoms, which affects accuracy. More recent studies have explored the use of physiological signals such as heart rate (HR), electroencephalogram (EEG), electrocardiogram (ECG), and skin conductance or galvanic skin response (GSR) to predict cybersickness. Though effective, these methods require specialized sensors, which can detract from the user experience and limit practicality in everyday VR applications. Strict research protocols and expensive equipment are needed to collect and analyze these types of data, and the variability of physiological signals across individuals makes it challenging to develop universally effective prediction models.
Amid these challenges, machine learning (ML) has emerged as a promising tool for estimating cybersickness in virtual environments. Research using ML to estimate cybersickness from different types of self-reported and physiological data is steadily growing. One promising direction is the use of head-tracking data, alone or in combination with other data types, for cybersickness estimation or prediction.
1.1. Problem Statement
Modern VR devices rely on continuous tracking of complex head movements to deliver highly immersive environments. These rich head-tracking data (HTD) present a promising, nonintrusive avenue for predicting the onset of cybersickness, which is a significant barrier to VR adoption. While recent research [6,7,8] has explored correlations between HTD patterns and cybersickness, existing approaches often combine HTD with additional sensors or rely on post-experience self-reports. These methods increase system complexity, cost, and intrusiveness, limiting their scalability for real-time, consumer-grade VR applications. There remains a critical gap in developing systems that leverage HTD alone to predict and mitigate cybersickness in a practical, accessible, and cost-effective manner. This gap underscores the need for robust, machine learning-based solutions that can analyze HTD in real time to enhance user comfort and expand VR accessibility.
1.2. Contributions
To address this gap in the existing research, we propose an ML model that estimates the severity of cybersickness from HTD and validates these estimates against self-reported questionnaire responses. Unlike existing classification models, our approach aims to provide a continuous assessment of cybersickness that helps understand user discomfort. The two objectives of our research are to estimate cybersickness severity from HTD alone using ML regression and to validate these estimates against standardized sickness questionnaires.
With these objectives, we strive to provide a scalable and practical solution for cybersickness prediction with readily available HTD in real time.
Following the Introduction, the remainder of this paper is organized as follows. Section 2 provides a review of related work, discussing prior approaches to cybersickness prediction and highlighting the unique contributions of this study. Section 3 details the methodology, including the experimental design, data collection process, and feature extraction from head-tracking data; the machine learning models used for prediction are described, along with their training and evaluation procedures. The results of the study, including model performance metrics and key findings, are presented in Section 4. Section 5 offers a discussion of the results, emphasizing the implications, limitations, and potential applications of the proposed approach. Finally, Section 6 concludes the paper by summarizing the main contributions and outlining directions for further research to enhance the scalability and accuracy of cybersickness prediction systems.
2. Related Work
Researchers have explored various approaches to predict cybersickness in VR environments by integrating multiple data sources.
2.1. Physiological Data
Tasnim et al. [7] utilized eye-tracking, electrodermal activity (EDA), and heart rate data, while Oh et al. [9] employed EEG, ECG, and GSR signals. Dennison et al. [10] also incorporated EOG, GSR, ECG, and HR data. These physiological signals can provide great insight into the user’s physical and cognitive state [11]. The challenges in using such data are the dependency on external sensors, individual differences, and the complexity of integrating multiple data types.
2.2. Head and Eye Movement Data
Head and eye movements within virtual environments have also been shown to be strong predictors of cybersickness. Maneuvrier et al. [11] and Salehi et al. [12] focused on head rotation data, while Chang et al. [13] utilized user gaze behavior. Kundu et al. [14], Kim et al. [15], Islam et al. [16], and Ref. [6] incorporated eye-tracking and head-tracking data in combination with physiological signals and subjective reports, achieving high prediction accuracy. Motion-tracking data can capture sensory conflicts and discrepancies between visual and vestibular cues, known contributors to cybersickness. Palmisano et al. [17] explored the effects of display lag and differences between virtual and physical head pose, while Curry et al. [18,19] used head and torso movements. Zhu et al. used a multimodal dataset comprising physiological data and VR scene recordings [20]. These factors can introduce visual–vestibular conflicts and potentially exacerbate discomfort in VR environments.
2.3. Subjective User Reports
Data collected through questionnaires like the Simulator Sickness Questionnaire (SSQ) [21] or the Virtual Reality Sickness Questionnaire (VRSQ) [22] play a pivotal role in understanding user perception and experience of cybersickness. These self-reported measures provide invaluable insights into individual differences in cybersickness perception. They are essential for tailoring VR experiences to user comfort levels, thereby enhancing the overall user experience and reducing the incidence of cybersickness.
Despite the abundance of research, several critical gaps remain. Most researchers rely on data from external sensors or other physiological monitoring devices, which raises setup complexity and cost and limits applicability in consumer VR experiences. Force plate sensors used to measure postural sway require users to remain in a fixed position, limiting VR interactions. Some studies, like Li et al. [23], also explore the behavioral aspects of cybersickness in VR environments. Newer generations of VR headsets are capable of advanced sensing and capture highly detailed head-tracking data that have not been well explored for cybersickness prediction; there is a need to leverage such first-hand kinematic data from VR headsets.
Our research aims to overcome these limitations by using first-hand kinematic data directly from the sensors in the VR headset. Our work improves the real-time feasibility of cybersickness prediction by simplifying the experimental setup and focusing on metrics such as velocity, acceleration, and jerk derived from head-tracking data. Studying these kinematic metrics against thresholds known to trigger motion sickness provides new insights into the specific conditions that contribute to users’ discomfort.
3. Methodology
The primary objective of this study is to process, compute, and analyze kinematic data derived from head tracking in a VR environment. We aim to extract meaningful metrics such as velocity, acceleration, and jerk and investigate their impact on user comfort. Furthermore, we analyze whether these metrics exceed certain thresholds that are known to trigger sickness in VR, allowing us to identify conditions that contribute to user discomfort. The following subsections detail the experimental design, the types of data collected and their collection procedures, the analysis of kinematic variables, statistical data analysis, and the use of machine learning to predict cybersickness. A detailed workflow is illustrated in Figure 1.
3.1. Design
We conducted a comprehensive study to investigate the relationship between head movement kinematics and cybersickness in virtual reality (VR) environments. By employing a within-subjects experimental design, we aimed to reduce individual variability and enhance the reliability of our findings. Participants began by completing a questionnaire to provide background information, demographics, and prior VR experience. They later attended a single VR session, with a minimum duration of 5 min, which was extended as needed to accommodate the participant’s comfort level. Following the session, participants completed the Simulator Sickness Questionnaire (SSQ) and the Virtual Reality Sickness Questionnaire (VRSQ) to assess and document symptoms of simulator-induced discomfort and the specific effects of the VR experience on physical and cognitive comfort. This provided a comprehensive evaluation of their experience and any associated symptoms.
The architectural diagram in Figure 2 illustrates the workflow of our research design. It comprises modules for data collection, preprocessing, machine learning, and prediction, with the integration of hardware for collecting data and software components that help us preprocess and extract features, train regression models, and generate sickness score predictions from the data.
3.1.1. Equipment and Environment
Our experimental setup featured an Oculus Quest 2 VR headset. We utilized the Unity game engine v2022 [24] to craft a simple yet immersive apartment scene. The virtual environment consisted of a fully furnished indoor area and an accessible outdoor space, allowing users to explore different settings seamlessly (see Figure 3). Unity was used to ensure compatibility with the Oculus hardware and enable customized data-logging scripts. The indoor environment replicated a modern apartment with high-resolution textures and 3D models for furniture, appliances, and decor items. Users could navigate the environment using teleportation by selecting target locations with the Oculus controllers. The application maintained an average frame rate of 71.5 Hz throughout the experiment.
We developed our VR experience using the Meta Quest 2 headset as the foundational hardware. This device offers a per-eye resolution of approximately 1832 × 1920 pixels and supports refresh rates of 72–120 Hz, which we configured to strike a careful balance between visual clarity and motion comfort. The optics, employing Fresnel lenses, provided a relatively wide field of view and crisp central visuals, albeit with occasional lens glare. Inside-out tracking, powered by the headset’s onboard cameras, enabled full six-degrees-of-freedom motion without external sensors, and its integrated spatial audio delivered directionally accurate sound cues. The standalone Snapdragon XR2 processor ensured stable performance without requiring a PC connection, and we optimized ergonomics through both the device’s default design and optional adjustable head straps.
Our scene was designed with both performance efficiency and user comfort in mind. The geometry load remained manageable, with total triangles around 168k and roughly 186k vertices, and frustum culling was employed to minimize unnecessary rendering. Textures were maintained at approximately 2K resolution, suitable for the device’s GPU capabilities, and we relied on precomputed lighting solutions to limit real-time shading overhead.
Figure 3 provides representative snapshots of the developed VR scene, illustrating its spatial configuration, lighting conditions, and interactive elements. Interaction in the scene was implemented through XR interaction setups, such as teleport anchors, which allowed participants to navigate the environment comfortably and minimized the risk of simulator sickness.
Custom scripts written in C# were integrated into the Unity project to log user interactions, movements, and session duration. Unity’s compatibility with Oculus hardware facilitated seamless deployment and testing, while its robust developer tools allowed for efficient iteration and customization of the VR environment.
3.1.2. Population and Sampling
There were twelve participants, five females and seven males, aged 20 to 45. Participation was voluntary and did not exclude individuals based on gender or previous VR experience as long as they were familiar with VR technology. Additionally, for IRB purposes, participants needed to be between 18 and 55 years old.
Table 1 summarizes the demographic details of the 12 participants, including their age, gender, prior VR experience, and prior gaming experience. The participants were selected based on their familiarity with VR technology, although only a few had prior VR experience. This selection aimed to ensure the generalizability of the findings across a diverse group with varying degrees of familiarity with virtual environments.
3.2. Data Collection
The primary data collection instrument employed in this study was an Oculus Quest 2 VR headset, selected for its robust capabilities in tracking head movements with high precision. The headset continuously monitored participant head movements across six degrees of freedom (6DOF), accounting for both positional and rotational dynamics. As illustrated in Figure 4, these 6DOF measurements encompassed translational shifts along the three orthogonal spatial axes (forward/backward, up/down, left/right), as well as rotational changes around these axes (pitch, yaw, roll). To capture these subtle head movements over time, the headset sampled tracking data at a rate of 10 Hz, effectively recording positional and angular information every 0.1 s.
In order to ensure that the recorded data represented meaningful and sustained user engagement, the study required a minimum exposure duration of 5 min. This threshold prevented the inclusion of brief, incidental exposures that would not reflect the participant’s true interaction with the virtual environment or yield sufficiently robust data. Beyond this initial period, the session could be extended as needed, provided the participant remained comfortable. This approach allowed for the collection of more nuanced and ecologically valid data, as participants had the flexibility to continue beyond the minimum duration if they experienced no adverse effects. The headset’s inter-pupillary distance and overall fit were carefully adjusted for each participant to ensure comfort and minimize potential misalignment issues. These adjustments were made prior to the experiment to reduce discomfort and ensure accurate head tracking, thereby minimizing confounding variables. Participants performed the experiment in a standing position, with instructions provided to ensure consistent and stable posture throughout the session, avoiding unnecessary movements. To safeguard participant well-being and mitigate the risk of prolonged virtual reality exposure, the experiment concluded if the participant indicated any discomfort or reached a predetermined maximum time limit, whichever came first. This protocol thus ensured an ethical and balanced approach, obtaining rich datasets while maintaining the participants’ comfort and safety. Note that each participant completed only a single session; we did not conduct multiple trials per user in this study.
In addition to head-tracking data, the headset also recorded frame rate as a function of time. Frame rate data were crucial for assessing the smoothness of the visual experience, as fluctuations could contribute to visual discomfort and cybersickness. To assess subjective measures of cybersickness, participants completed two standardized post-session questionnaires, the SSQ and the VRSQ. Participants reported the severity of each symptom (as listed in Table 2), allowing for the quantification of cybersickness levels for further analysis.
3.2.1. Head-Tracking Data
The head-tracking data were collected from the VR headset, capturing both positional and rotational movements over time. The 6DOF head-tracking data include positional coordinates (X, Y, Z) and rotational quaternions (W, X, Y, Z), as well as the time stamps and frame rate of the VR system (see Table 3). These data provided a comprehensive representation of the user’s head motion during the VR experience, capturing key aspects such as how fast the head moves, how rapidly it changes direction, and the smoothness of these transitions. The purpose of collecting these data is to use them as the foundation for kinematic calculations that allow us to evaluate user motion. Each of the 6 DOFs is described as follows.
Roll involves rotating the headset along the front-to-back axis, resembling the action of tilting the head from one shoulder to the other. Yaw refers to rotating the headset around a vertical axis, as in shaking the head in disagreement. Pitch indicates tilting the headset forward or backward on a horizontal axis, similar to nodding in agreement. Surge denotes linear motion forward or backward, comparable to moving the head closer to or further from an object. Slide describes horizontal movement without rotation, shifting left or right, while heave refers to the vertical motion of the headset as it moves up or down in a straight line.
3.2.2. Sickness Questionnaire Data
Cybersickness was estimated using two standardized post-session questionnaires: the Virtual Reality Sickness Questionnaire (VRSQ) and the Simulator Sickness Questionnaire (SSQ). The VRSQ specifically evaluates symptoms related to VR experiences by assessing nine key symptoms [22]. The SSQ is a broader tool widely used to measure simulator-induced motion sickness across various platforms; it assesses 16 symptoms categorized into three factors: Nausea, Oculomotor, and Disorientation [21]. To provide a clear comparison, Table 2 lists the specific symptoms assessed by each questionnaire.
The study aimed to use these two questionnaires to capture a comprehensive view of participants’ discomfort levels and identify patterns in cybersickness related to the VR experience. However, only the VRSQ scores were utilized in the analysis as they are specifically tailored to assess symptoms in virtual reality environments. This approach ensured a focused and efficient evaluation of cybersickness, avoiding redundancy and prioritizing the most relevant metrics.
3.3. Data Processing and Analysis
In the initial stage of the analysis, our focus was on preprocessing the raw data. We computed time intervals between consecutive records to facilitate velocity and acceleration calculations, enabling us to gauge the rate of change in the user’s head position over time. Addressing the presence of missing or invalid values, such as NaN or infinite values, was crucial, as these could disrupt the computation of kinematic metrics. To safeguard the integrity of our analysis, we took measures to either interpolate or remove these values, ensuring the smooth progression of our calculations.
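As an illustration, a minimal preprocessing sketch in Python might look as follows. The column name timestamp and the interpolation window are our assumptions, not the study's code; the actual schema follows Table 3.

```python
import numpy as np
import pandas as pd

def preprocess(csv_path):
    # Load one participant's head-tracking log (column names are hypothetical).
    df = pd.read_csv(csv_path)
    # Time interval between consecutive 10 Hz records, needed later for
    # velocity and acceleration calculations.
    df["dt"] = df["timestamp"].diff()
    # Replace infinite values with NaN, interpolate short gaps, and drop
    # any rows that remain invalid so the kinematic computations stay stable.
    df = df.replace([np.inf, -np.inf], np.nan)
    df = df.interpolate(limit=2).dropna()
    return df
```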
In the VRSQ, all users rated each symptom on a scale from 0 to 3, where 0 indicated no symptom and 3 signified a severe symptom. The individual symptom scores were categorized into two groups: “Oculomotor” and “Disorientation” (see Table 4).
First, the scores within each category were summed to obtain raw totals. Since the maximum possible points differed between categories—12 points for Oculomotor and 15 points for Disorientation—the totals needed to be normalized for comparison; for this, each raw total was converted into a percentage: for Oculomotor, the category sum was divided by 12 and multiplied by 100; for Disorientation, the category sum was divided by 15 and multiplied by 100. Finally, to calculate the overall VRSQ score, the two percentage values (Oculomotor and Disorientation) were averaged. This method ensured that, despite the different numbers of symptoms in each category, both contributed equally to the final VRSQ score.
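This scoring procedure can be expressed compactly. The sketch below assumes the published VRSQ groupings (four Oculomotor items, maximum 12 points; five Disorientation items, maximum 15 points); the symptom names are illustrative rather than copied from Table 4.

```python
def vrsq_score(ratings):
    """Overall VRSQ score (0-100) from per-symptom ratings of 0-3."""
    # Groupings follow the published VRSQ; names are illustrative.
    oculomotor = ["general discomfort", "fatigue", "eyestrain",
                  "difficulty focusing"]
    disorientation = ["headache", "fullness of head", "blurred vision",
                      "dizziness (eyes closed)", "vertigo"]
    # Normalize each category sum by its maximum possible points.
    oculo_pct = sum(ratings[s] for s in oculomotor) / 12 * 100
    disor_pct = sum(ratings[s] for s in disorientation) / 15 * 100
    # Both categories contribute equally to the final score.
    return (oculo_pct + disor_pct) / 2
```

For example, a participant who rates every symptom 1 scores (4/12) × 100 ≈ 33.3% in each category, giving an overall VRSQ score of about 33.3.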
3.3.1. Computed Physical Attributes
Once the data were cleaned, we calculated the fundamental kinematic metrics, which include linear velocity, acceleration, and jerk, as well as angular velocity, acceleration, and jerk. Linear velocity was derived from the change in position over time, indicating how fast the user’s head was moving. Similarly, acceleration was calculated as the rate of change in velocity, showing how quickly the movement itself was changing. Jerk, the derivative of acceleration, provided insight into the abruptness of changes in acceleration, which can be associated with discomfort.
Linear acceleration:
$$a_x = \frac{\Delta v_x}{\Delta t}, \quad a_y = \frac{\Delta v_y}{\Delta t}, \quad a_z = \frac{\Delta v_z}{\Delta t}$$
Angular acceleration:
$$\alpha_x = \frac{\Delta \omega_x}{\Delta t}, \quad \alpha_y = \frac{\Delta \omega_y}{\Delta t}, \quad \alpha_z = \frac{\Delta \omega_z}{\Delta t}$$
In these equations, the subscripts x, y, and z indicate the respective components along the X-, Y-, and Z-axes. The quaternion data, which represent the headset’s orientation in space, were used to compute angular velocity, acceleration, and jerk. Angular velocity described how fast the user’s head was rotating, while angular acceleration and angular jerk provided deeper insights into the smoothness of rotational movements [26]. The recorded head-tracking data and the computed variables for a user are presented as an example in Figure 5.
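A sketch of these computations is shown below, assuming uniformly sampled arrays of positions, unit quaternions in (w, x, y, z) order, and timestamps; the function names are ours, not the study's code. The angular velocity uses the standard identity ω = 2 q̇ q⁻¹ for unit quaternions, where the inverse equals the conjugate.

```python
import numpy as np

def linear_kinematics(pos, t):
    """Finite-difference velocity, acceleration, and jerk.
    pos: (N, 3) array of head positions (X, Y, Z); t: (N,) timestamps in s."""
    dt = np.gradient(t)[:, None]
    vel = np.gradient(pos, axis=0) / dt
    acc = np.gradient(vel, axis=0) / dt
    jerk = np.gradient(acc, axis=0) / dt
    return vel, acc, jerk

def quat_conj(q):
    # Conjugate of quaternions stored as (w, x, y, z).
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_mul(a, b):
    # Hamilton product of quaternion arrays shaped (N, 4).
    w1, x1, y1, z1 = a.T
    w2, x2, y2, z2 = b.T
    return np.stack([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2], axis=1)

def angular_kinematics(quat, t):
    """Angular velocity, acceleration, and jerk from orientation quaternions."""
    dt = np.gradient(t)[:, None]
    dq = np.gradient(quat, axis=0) / dt
    # omega = 2 * (dq/dt) * q^-1; keep only the vector (x, y, z) part.
    omega = 2.0 * quat_mul(dq, quat_conj(quat))[:, 1:]
    alpha = np.gradient(omega, axis=0) / dt
    ang_jerk = np.gradient(alpha, axis=0) / dt
    return omega, alpha, ang_jerk
```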
3.3.2. Statistical Description
We performed a statistical analysis of the computed metrics. We calculated mean, median, minimum, and maximum values to capture both typical movement patterns and extreme events for each metric. The data revealed substantial variability in linear acceleration and angular acceleration (see Figure 6). In addition, angular jerk showed high peaks, indicating abrupt changes in movement.
The analysis revealed only a weak correlation between rapid movements and high sickness scores (see Figure 6, Figure 7 and Figure 8). This indicates that isolated kinematic events, when considered independently, may not serve as robust predictors of user discomfort in virtual environments. Furthermore, the multivariate regression analysis between cybersickness and the summary statistics did not yield significant results. These findings highlight the limitations of conventional statistical analysis in capturing the complexity of factors contributing to cybersickness. Given the multifaceted nature of human responses in virtual environments, this underscores the need for more sophisticated approaches capable of uncovering interactions and combined effects of multiple variables. The weak individual correlations suggest that single-variable predictors fail to account for the non-linear and possibly synergistic relationships between kinematic features and user discomfort. To address these challenges, we adopted machine learning techniques, which are well suited to modeling complex, non-linear interactions.
3.4. Predicting Cybersickness with ML
We used a regression-based approach [27] to develop a predictive model that estimates cybersickness levels based on input features extracted from head-tracking data. The models were trained and optimized, and their performance was evaluated using standard regression metrics. The dataset is a collection of CSV files containing positional and rotational values, frame rates obtained from the VR headset, and the velocity, acceleration, and jerk values computed from both positional and rotational coordinates. The predicted cybersickness values range from 0 to 100.
The implementation methodology employs a systematic approach to develop, train, and evaluate machine learning algorithms for predicting user sickness based on features derived from VR data. The pipeline begins by extracting and preprocessing data from multiple CSV files, one set per user, containing head-tracking data and the computed variables. The data are consolidated with associated sickness scores provided in a separate file, enabling supervised learning. The input features include both linear and angular measurements such as velocity, acceleration, jerk, and rotation components, alongside system parameters like frame rate and time intervals. These features are scaled using the StandardScaler module from the scikit-learn library [28], ensuring comparability across different magnitudes and preventing feature dominance due to differing scales. Four machine learning models are employed: Random Forest Regressor, Gradient Boosting Regressor, Support Vector Regressor (SVR), and K-Nearest Neighbors (KNN). Each algorithm brings unique strengths: Random Forest and Gradient Boosting leverage ensemble learning to capture complex relationships [29,30], SVR provides robust handling of non-linear boundaries, and KNN is intuitive and effective for local patterns [31,32,33]. Hyperparameter tuning is conducted for all models using a grid search strategy combined with cross-validation, via the GridSearchCV module in scikit-learn [28].
The parameter search space for grid search was defined to balance model performance and computational efficiency. For the Random Forest Regressor, n_estimators was set to {100, 200}, max_depth to {10, 20}, and min_samples_split to {2, 5, 10}. The Gradient Boosting Regressor included n_estimators values of {100, 200}, learning_rate options of {0.01, 0.1, 0.2}, and max_depth values of {3, 5, 10}. The Support Vector Regressor (SVR) grid spanned C values of {0.1, 1, 10}, kernel options of {linear, RBF}, and epsilon values of {0.1, 0.2, 0.5}. For K-Nearest Neighbors (KNN), n_neighbors was set to {3, 5, 7}, with weights as {uniform, distance} and distance metrics as {Euclidean, Manhattan}. For more details about the nomenclature used in these models, the reader is referred to the scikit-learn user manual [28].
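A minimal sketch of this tuning setup, using the grids above, is shown below. The names X_train and y_train stand in for the consolidated feature matrix and sickness scores, and the fold count (cv=5) and random seeds are our assumptions; the text specifies cross-validation but not these details.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

models = {
    "RandomForest": (RandomForestRegressor(random_state=42), {
        "n_estimators": [100, 200],
        "max_depth": [10, 20],
        "min_samples_split": [2, 5, 10]}),
    "GradientBoosting": (GradientBoostingRegressor(random_state=42), {
        "n_estimators": [100, 200],
        "learning_rate": [0.01, 0.1, 0.2],
        "max_depth": [3, 5, 10]}),
    "SVR": (SVR(), {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf"],
        "epsilon": [0.1, 0.2, 0.5]}),
    "KNN": (KNeighborsRegressor(), {
        "n_neighbors": [3, 5, 7],
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan"]}),
}

# Scale features so no single kinematic variable dominates.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

best = {}
for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=5,
                          scoring="neg_mean_squared_error", n_jobs=-1)
    search.fit(X_train_scaled, y_train)
    best[name] = search.best_estimator_
```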
Following training and optimization, the models were evaluated on the test set to ensure generalization. Performance metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were calculated to assess predictive accuracy and reliability. Additionally, the methodology included a mechanism to predict sickness scores for new, unseen data: CSV files representing new users were processed using the same feature extraction and scaling procedures, and the pre-trained models generated per-sample sickness predictions, which were averaged to produce an aggregated score.
The data from 12 participants were divided into training and testing sets, with data from 9 participants used for training the models, while data from the remaining 3 participants, kept unseen during training, were used for validation.
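A sketch of this evaluation step is given below; the helper names and the per-sample averaging into a single participant-level score are our rendering of the procedure described above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def report_metrics(model, X, y_true):
    # Standard regression metrics on a held-out set.
    pred = model.predict(X)
    mse = mean_squared_error(y_true, pred)
    print(f"MSE={mse:.3f}  RMSE={np.sqrt(mse):.3f}  "
          f"MAE={mean_absolute_error(y_true, pred):.3f}")

def participant_score(model, X_participant):
    # Per-sample predictions for one participant's session are averaged
    # into a single aggregated sickness score (0-100).
    return float(np.mean(model.predict(X_participant)))
```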
4. Results
We present the detailed model performance results for each model after hyperparameter tuning and evaluation on both the training and test sets. The optimal parameters found during the training phase are presented in Table 5. We compare model performance with respect to normalized differences (Equation (9)) to highlight their strengths and limitations in solving the regression task. The normalized differences for all models in both the training and test settings are shown in Table 6 and Table 7.
The Random Forest model performed well, achieving the second-lowest average normalized difference (−0.09%). The model’s ability to effectively capture underlying patterns in the data makes it a reliable approach for predictive tasks. In the training results, the largest normalized difference was 2.22% (participant 8), with an average of 0.81%, showcasing its consistency across observed data. However, on unseen test data, the largest normalized difference increased to 7.32% (participant 9), with an average of 3.94%. Despite its robustness, the error spread in some instances indicates room for further refinement to enhance precision.
Gradient Boosting outperformed other models, achieving the lowest normalized difference (0.02%). The results indicate that the model captured the variance in the data most effectively and made the most precise predictions. The optimized hyperparameters, including a moderate learning rate and tree depth, contributed to its superior performance, underscoring its effectiveness in this context. It demonstrated the lowest average normalized difference (0.19%) during training, indicating strong performance in capturing complex relationships within the data. The largest training normalized difference was 0.53% (participant 5), showcasing minimal variability in prediction accuracy. In test results, the model maintained reasonable consistency, with the largest normalized difference at 3.08% (participant 4) and an average of 1.91%.
To compare the performance of the different ML models, we computed the percentage normalized difference between the predicted and true sickness scores as
$$\Delta_{\text{norm}} = \frac{S_{\text{predicted}} - S_{\text{true}}}{S_{\text{max}}} \times 100\%, \tag{9}$$
where $S_{\text{max}} = 100$ is the maximum possible sickness score.
KNN demonstrated moderate performance, with higher normalized differences (0.6%) compared to the ensemble models. In training, the average normalized difference was 0.62% with a maximum of 1.34%, while in testing, the average increased to 7.74% with a maximum of 15.59%. This result suggests that while KNN is capable of identifying local patterns, it struggles to generalize the underlying relationships in the data.
The SVR showed the highest normalized differences (3.01%) among all the trained models, indicating its limited ability to capture the data’s non-linear relationships. The average normalized difference in training was 6.09% with a maximum of 17.82%, while in testing, the average was 2.51% with a maximum of 3.40%. Despite hyperparameter tuning, the SVR’s predictive capability remained constrained.
5. Discussion
HTD in a virtual reality (VR) environment provide quantitative measures of users’ head movements in real time. Relying exclusively on HTD can significantly reduce external hardware dependencies and the costs and complexities associated with integrating physiological data collection devices. However, HTD are inherently limited as an indirect measure of cybersickness. Although many instances of cybersickness stem from a visual–vestibular mismatch, HTD do not capture users’ internal physiological or psychological states. Other factors, such as individual user tolerance, gameplay style, and technical issues like tracking latency, can confound patterns in head movement data. The volume and complexity of high-frequency real-time HTD also pose practical challenges for data handling and computation. Despite these constraints, the utility of HTD remains promising. A continuous regression approach (e.g., predicting an SSQ or VRSQ score) can be more informative than binary classification or simple severity classes. By providing nuanced, quantitative measures of discomfort, regression-based models allow researchers and developers to adapt VR content in real time, potentially mitigating cybersickness episodes before they become severe.
Where previous investigations often used multiple data sources, our work demonstrates that head-tracking data alone can be used to predict cybersickness severity to a certain degree, with performance comparable to some multimodal approaches. The research findings suggest that kinematic signals can be indicators of discomfort, as we observed weak correlations between rapid head movements and higher sickness scores. Furthermore, such weak single-variable correlations indicate that any single kinematic measure is insufficient to capture the complexities of user discomfort. Instead, more sophisticated methods like machine learning regression are necessary to account for the multifaceted relationship between user movement and cybersickness. In this study, we employed ML regression models to predict VRSQ scores based solely on head-tracking data from a VR headset. Our best model demonstrates robust performance, with normalized score prediction errors of less than 3.08% on unseen data. This outcome highlights the potential of head-tracking data as a convenient and cost-effective resource for modeling cybersickness severity in real time. This contrasts with studies that integrate physiological signals or external sensors, which, while potentially more accurate, increase the technological burden and may hinder widespread adoption [7,34].
Comparisons to existing cybersickness research are challenging, given the limited availability of datasets and modeling approaches that use HTD alone. Age, gender, prior VR experience, and prior gaming experience were among the demographic information collected in the sickness questionnaire. Correlation analysis of demographic factors and sickness scores revealed some interesting patterns. Overall, prior VR experience showed the strongest correlation (p = 0.02) with sickness, indicating that users with previous experience reported significantly less sickness after the VR session. Age showed a weak positive correlation, indicating that older participants might experience slightly more sickness. Gender showed no correlation with sickness, suggesting that it did not play a significant role. Prior gaming experience did not show any statistically significant correlation with sickness. Nevertheless, several studies underscore both the promise and the complexities of cybersickness modeling: Islam et al. [16] merged head-tracking and eye-tracking data, achieving 87.77% classification accuracy. Jeong et al. [8] employed the Multimodal Time-Series Sensor Data (MSCVR) dataset (incorporating eye-tracking, head movement, and physiological signals) to examine cybersickness. Li et al. [35] integrated multiple data streams, including kinematic data, physiological signals, and the Simulator Sickness Questionnaire (SSQ), achieving up to 97.8% accuracy. Salehi et al. [12] used indirect (force plate) sensor data for head tracking, classifying sickness severity with 76% accuracy. Our previous research, which used only demographic data from a group of 148 participants, predicted cybersickness with around 73% accuracy [36]. Though these studies achieved high classification accuracies, most relied on external sensors or multimodal data, limiting their overall accessibility. By contrast, our work focuses solely on HTD collected from an off-the-shelf consumer VR device (Oculus Quest 2); this approach would complement other techniques that utilize external hardware in predicting cybersickness.
The pre-survey data show that participants with VR experience had lower cybersickness scores than those without. Participants with some gaming experience, especially moderate experience, also reported less sickness. In contrast, participants with no gaming or VR experience had higher sickness scores. Age and gender did not show any clear patterns or correlations. However, these observations must be treated with caution given the limited sample size (n = 12), which restricts statistical power. Further research with a larger and more diverse cohort would be necessary to confirm these preliminary findings.
Limitations and Future Directions
A key limitation of this study is the small dataset, comprising data from only 12 users. The small sample size restricts the model’s generalizability to diverse user populations and varied VR contexts. It also raises concerns about overfitting, in which the model fits the idiosyncrasies of the existing sample rather than capturing generalizable patterns. To address these shortcomings and enhance the robustness of our findings, future research could expand the dataset to include a broader and more diverse user base encompassing varied demographics and play styles. It could also assess the model’s performance on different VR content and entirely separate user populations or develop systems capable of reconfiguring VR content in real time based on user-specific patterns of head movement and physiological signals.
The current study indicates that head-tracking data, when utilized with machine learning models, can predict cybersickness to a certain extent. Furthermore, prior research highlights the predictive potential of physiological signals, user profiles, external sensor data, hardware specifications, and environmental factors. As a future research direction, we propose integrating these independent datasets with head-tracking data. This integration is expected to complement existing approaches, leading to the development of more accurate and robust models for predicting cybersickness.
6. Conclusions
This study demonstrates that head-tracking data from VR headsets can be effectively used to predict the severity of cybersickness using machine learning regression models. By computing kinematic metrics such as velocity, acceleration, and jerk from the HTD, we developed a gradient-boosting regression model that accurately predicts cybersickness scores, with normalized differences of less than 3.08% on unseen data. Instead of predicting cybersickness as a discrete class using classification models, this research estimates sickness scores using a regression-based method to provide a more precise and individualized assessment of user discomfort. The findings suggest that real-time estimation of cybersickness is feasible, to a certain extent, without data from additional physiological sensors or intrusive monitoring methods. However, we anticipate that integrating HTD with data from physiological sensors could significantly enhance predictive accuracy. Nonetheless, the use of readily available HTD simplifies the experimental setup and enhances the practicality of implementing cybersickness prediction in consumer VR applications.
However, the study was conducted with a limited sample size of twelve participants, which may affect the generalizability of the results. Future research should involve a more extensive and diverse participant pool to validate the model’s effectiveness across different user demographics and VR environments. Additionally, integrating multimodal data from different sources, such as HTD, eye-tracking data, physiological data, demographic data, hardware parameters, and spatio-temporal data, could provide a more robust estimation of cybersickness. Finally, exploring real-time implementation and integration of the model into VR systems could facilitate adaptive interventions to mitigate cybersickness as users experience it.