Article

Decoding Subjective Understanding: Using Biometric Signals to Classify Phases of Understanding

Department of Applied Psychology and Human Development, Ontario Institute for Studies in Education, University of Toronto, Toronto, ON M5S 1V6, Canada
* Author to whom correspondence should be addressed.
Submission received: 6 December 2024 / Revised: 12 January 2025 / Accepted: 13 January 2025 / Published: 17 January 2025

Abstract

The relationship between the cognitive and affective dimensions of understanding has remained unexplored due to the lack of reliable methods for measuring emotions and feelings during learning. Focusing on five phases of understanding—nascent understanding, misunderstanding, confusion, emergent understanding, and deep understanding—this study introduces an AI-driven solution to measure subjective understanding by analyzing physiological activity manifested in facial expressions. To investigate these phases, 103 participants remotely worked on 15 riddles while their facial expressions were video recorded. Action units (AUs) for each phase instance were measured using AFFDEX software. AU patterns associated with each phase were then identified through the application of six supervised machine learning algorithms. Distinct AU patterns were found for all five phases, with gradient boosting machine and random forest models achieving the highest predictive accuracy. These findings suggest that physiological activity can be leveraged to reliably measure understanding. Further, they advance a novel approach for measuring and fostering understanding in educational settings, as well as developing adaptive learning technologies and personalized educational interventions. Future studies should explore how physiological signatures of understanding phases both reflect and influence their associated cognitive processes, as well as the generalizability of this study’s findings across diverse populations and learning contexts (A suite of AI tools was employed in the development of this paper: (1) ChatGPT4o (for writing clarity and reference checking), (2) Grammarly (for grammar and editorial corrections), and (3) ResearchRabbit (reference management)).

1. Introduction

Understanding is arguably the cornerstone of learning. Its significance is recognized, both intuitively and explicitly, by students seeking conceptual clarification, teachers fostering critical thinking instead of rote memorization, and policymakers advancing initiatives like the “Transforming American Education: Learning Powered by Technology” program from the U.S. Department of Education [1]. This sense of significance is driven by the positive outcomes that understanding fosters, such as insights into what is meaningful and relevant, new opportunities for action, and progress in personal development. As a result, it is imperative to address gaps in what is understood about understanding, particularly within educational psychology and other disciplines that shape educational reform. Only then can students realize their highest academic and personal potential, teachers foster holistic understanding, and policymakers establish effective guidelines and standards for achievement.
Though challenging to define, understanding is often described in two ways: First, as a cognitive process that involves grasping the significance of information, allowing individuals to apply their knowledge effectively [2,3]; second, as a subjective feeling—that something “makes sense”—which enables us to move forward with confidence [4,5]. However, both perspectives lack the integration of the other, a synthesis long advocated by educational, developmental, and social psychologists [6,7,8,9,10,11,12]. The challenge of incorporating feelings and emotions into cognitive processes has been a point of psychological discussion since the work of William James [13], who highlighted their subjective and introspective nature. In the past, self-report measures were primarily used to investigate the affective dimension of understanding. However, when these measures were found to be unreliable, they—and the study of feelings and emotions related to understanding—were largely discarded.
More recently, our understanding of feelings and emotions has evolved [14,15,16,17]. Damasio argues that while emotions are unconscious, automatic responses to stimuli, feelings are the conscious representations of these emotions, emerging from the interoception of physiological change. Through this mechanism, positively and negatively valenced feelings integrate the body’s responses into subjective experience, reflecting changes in mental states and prompting adjustments in behaviour based on appraisals of events concerning personal goals or needs [18,19,20]. For Damasio, feelings operate within an embodied cognition framework, bridging sympathetic and parasympathetic nervous system responses to cognitive processes such as perception and decision-making. Within this framework, the brain dynamically regulates and interprets the body’s physiological responses to its environment in a continuous feedback loop [7,14,21]. This brain–body–environment nexus forms a complex, adaptive, self-organizing system where cognitive processes, feelings, and emotions emerge as interdependent properties [22,23].
Technological advancements in measuring physiological responses, combined with machine learning, now provide the tools necessary for exploring the affective dimension of understanding. Facial expression analysis software such as AFFDEX [24] utilizes deep learning to measure AUs—movements of facial muscles—in real time based on the Facial Action Coding System (FACS) [15]. AUs indicate parasympathetic and sympathetic nervous system activation, and distinct blood flow patterns in the facial epidermis correspond to discrete emotions [25]. Feelings are thus manifest in facial expressions; if so, the feeling of understanding can be reliably measured.
While working towards understanding, learners may experience five distinct phases that signify changes and growth in their understanding: nascent understanding, misunderstanding, confusion, emergent understanding, and deep understanding [20]. These phases are dynamic and do not necessarily follow a fixed sequence; learners may transition between them as they encounter new information or face challenges. This process situates understanding on a continuum, marked by varying levels of depth and sophistication [26]. For example, a learner beginning to study general relativity might start with a nascent understanding characterized by insufficient prior knowledge, a limited ability to form meaningful connections, and difficulty identifying relevant information [27]. Subsequently, misunderstandings may arise from incomplete or incorrect information, possibly leading to an inflated perception of their understanding [28] or from the feeling of being unable to move forward despite meeting an objective standard of correctness [4]. Upon delving deeper, they might experience confusion when inconsistencies or contradictions appear between their initial knowledge and new information [29]. However, an emergent understanding develops by integrating this information through active processing and insight [30,31]. As this understanding advances, it remains incomplete and unstable. However, with continued learning and reflection, deep understanding can be attained, characterized by thorough and stable knowledge of the subject, effective knowledge application, and the skill to communicate concepts clearly [32].
This paper is situated within our broader research objective of identifying physiological signatures of feelings and emotions and exploring how these affective signatures reflect and influence cognitive processes in learning. Here, we investigate understanding as a synthesis of cognitive processes and affective experiences, proposing that the feeling of understanding emerges from conscious interoception of physiological change. Consequently, we focus on subjective understanding, which is grounded in personal experience, rather than objective understanding, which is based on established knowledge and frameworks. To facilitate this investigation, we describe the process of understanding through five simplified phases: nascent understanding, misunderstanding, confusion, emergent understanding, and deep understanding. We aim to identify the feelings corresponding to each phase and hypothesize that each phase is associated with unique AU patterns. To test this hypothesis, we measured these phases as participants solved riddles, utilizing facial expression analysis and machine learning algorithms to identify their associated AU patterns.
The primary contributions of this study are: (1) developing a robust framework for measuring the feeling of understanding, grounded in the principles of embodied cognition; (2) advancing the integration of cognitive processes and affective experiences associated with subjective understanding; and (3) contributing to the development of an AI tutor capable of tracking phases of understanding without relying on explicit communication, thereby enhancing learning outcomes. The remainder of this paper is structured as follows: Section 2 describes the data analysis pipeline, the operationalization of variables, the remote nature of this study, and the sample. Section 3 presents the results of the machine learning analysis, while Section 4 expands on the contributions listed, outlines future directions, and addresses this study’s limitations.

2. Materials and Methods

2.1. Procedure

Participants completed this study remotely via Zoom during the COVID-19 pandemic. To ensure high-quality video data, participants were centered in front of their camera with good facial lighting, among other criteria (e.g., no facial obstructions, a stable surface for their computer, no phones or tablets, no green-screen filters or movement in the background). Participants were given access to a Qualtrics link where they completed a consent form and demographic questionnaire before attempting to solve 15 riddles. As participants completed the riddles, they shared their screens via Zoom, allowing for real-time monitoring of their progress and the provision of feedback throughout this study. To ensure consistency and neutrality of the emotional climate, the first author conducted each session, adhering to a standardized procedural script and riddle-specific feedback. Video recording began when participants started the first riddle, with the researcher’s video and audio disabled to minimize any potential influence on participants’ performance.
Participants were given three minutes to answer each riddle, allowing sufficient time for reading, considering, and answering the riddle while being mindful of response fatigue. After three minutes, participants were asked to provide an answer and justification. Once an answer and justification were provided, verbal feedback was given, indicating whether the answer was correct or incorrect. If a participant was unable to provide a correct answer, the correct answer and justification were then provided aloud. Before moving on to the next riddle, participants were asked if they already knew the correct answer to the current riddle to determine whether it should be excluded. After completing the riddles, participants were debriefed and compensated CAD 10 via e-transfer. The Zoom recordings were then analyzed using AFFDEX and AUs were measured (Figure 1).

2.2. Participants

A total of 103 participants were enrolled in this study, of whom 78 (75.7%) were female, with a mean age of 24 years (SD = 4.3). Online platforms (e.g., Facebook), student-affiliated websites (e.g., University of Toronto Psychology Student’s Association), and word-of-mouth referrals were used to recruit participants. Most participants were currently enrolled in undergraduate programs at the University of Toronto.

2.3. Phases of Understanding

To assess the phases of understanding, participants worked on 15 carefully selected riddles, each designed to have a single, clear answer and accompanying explanation (Appendix A). Riddles are an effective task because they elicit each phase, pose a cognitive challenge, and evoke affective responses, while encouraging concise answers, justifications, and participant engagement. The riddles, sourced from research articles [33,34] and online platforms, varied in both difficulty and type. For each riddle, participants were required to provide a written answer, justification, and their level of certainty on a four-point scale (1 = Not certain, 2 = Somewhat certain, 3 = Certain, 4 = Very certain). A four-point scale was used to potentially explore whether nuanced response categories could help distinguish between phases. Ultimately, no distinction was made between “Certain” and “Very certain”, so a three-point scale would have sufficed.
Three methods were used to operationalize the phases. First, the answer, justification, level of certainty and time taken before providing an answer were captured from written responses. Though it may have oversimplified the phases, this approach kept focus on the key aspects of understanding—objective correctness and the subjective feeling [4]. Further, it allowed us to differentiate between phases such as misunderstanding and deep understanding by asking participants for their level of certainty. Second, observational coding was used for incorrect riddle answers. Last, AFFDEX was used to automatically identify one phase.
Nascent understanding was operationalized as providing an incorrect answer or incorrect justification and lack of certainty (1 = Not certain).
Misunderstanding was operationalized as providing a correct answer, justification, and low level of certainty (2 = Somewhat certain). Responses with a correct answer, justification, and lack of certainty (1 = Not certain), as well as those with an incorrect answer, justification, and high certainty (3 = Certain or 4 = Very certain), were not analyzed due to insufficient observations.
Confusion was identified using AFFDEX, as the software automatically detects this emotion based on its associated AU patterns. The three most intense instances of confusion per participant were considered for analysis, ensuring the inclusion of clear observations while collecting data efficiently.
Emergent understanding was operationalized as providing a correct answer, justification, high level of certainty (3 = Certain or 4 = Very certain) and taking at least two minutes to provide an answer. This time frame reflected active processing and diligent effort on the riddle. Emergent understanding was also identified using observational coding. Three independent coders evaluated whether participants understood the answer and justification to a riddle when it was provided for all instances across 71 participants. Based on these evaluations, a codebook describing the behavioural indicators associated with instances of unanimous agreement was developed (Appendix B). The codebook was then applied to code the remaining 32 participants. Instances that did not achieve unanimous agreement among the coders were excluded from further analysis. Additionally, the codebook was used to confirm the occurrence of emergent understanding for written responses.
Deep understanding was operationalized as providing a correct answer, justification, high level of certainty (3 = Certain or 4 = Very certain) and providing an answer within approximately 20 s of reading the riddle. This time frame reflected ease and mastery in solving the riddle.
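The operationalizations above amount to a small set of decision rules for the written responses. A minimal sketch in R is given below, with hypothetical field names (correct indicates that both the answer and the justification were correct); confusion is omitted because it was detected by AFFDEX rather than derived from written responses:

```r
# Sketch of the written-response labeling rules (hypothetical field names).
# Confusion is not assigned here; it was detected automatically by AFFDEX.
label_phase <- function(correct, certainty, seconds_to_answer) {
  if (!correct && certainty == 1) {
    "nascent"
  } else if (correct && certainty == 2) {
    "misunderstanding"
  } else if (correct && certainty >= 3 && seconds_to_answer >= 120) {
    "emergent"
  } else if (correct && certainty >= 3 && seconds_to_answer <= 20) {
    "deep"
  } else {
    NA_character_  # excluded: no phase reliably indicated
  }
}

label_phase(correct = TRUE, certainty = 4, seconds_to_answer = 15)  # "deep"
```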
Given the exploratory nature of this study, only clear instances of each phase were selected for analysis. For example, instances were excluded if participants were speaking while answering a riddle, as this could alter facial expressions; if they wrote down their answer to a riddle but continued brainstorming, indicating ongoing thought processing; or if there was any indication that a phase was not being reliably experienced, such as laughter about something seemingly unrelated to the riddle in question, potentially indicating a different mental state.

2.4. Action Units

AUs were measured using the AFFDEX module within the facial expression analysis suite developed by iMotions. AFFDEX applies FACS to code facial expressions from recorded videos or in real time. FACS is well suited for use with traditional machine learning algorithms, which depend on feature extraction—an important consideration given the limited observations per phase in this study. In contrast, deep learning algorithms, which analyze raw images directly, demand much larger datasets, making them infeasible in this context.
AFFDEX measures 20 AUs, a neutral state, confusion, and the basic emotions. It outputs the number of frames in which each AU, the neutral state, and each emotion exceed a predetermined likelihood threshold. On a scale ranging from 0 to 100, a threshold of ‘25’ represents a mild facial response, ‘50’ a moderately strong response, and ‘75’ a strong response [35]. The likelihood threshold was set at 50 to enhance reliability in measuring AUs while also capturing relatively subtle facial expressions. AFFDEX has been validated for its accuracy and reliability in measuring AUs [24].
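To illustrate the thresholding scheme, the sketch below counts the frames in which an AU’s likelihood exceeds the threshold of 50; the per-frame likelihood values are invented for demonstration:

```r
# Count frames in which an AU's likelihood (0-100) exceeds the threshold.
count_frames_above <- function(likelihoods, threshold = 50) {
  sum(likelihoods > threshold)
}

brow_furrow <- c(10, 35, 62, 71, 48, 90, 55)  # invented per-frame likelihoods
count_frames_above(brow_furrow)                # 4 frames above the threshold
```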
For written responses, AUs were measured when participants began writing their final answer to a riddle for a duration of five seconds. AUs were measured prior to participants writing their final answer only if there were clear indicators of a phase occurring shortly beforehand. Time intervals were extended if clear indicators of a phase were observed beyond the initial five seconds. For observational coding, AUs were measured when clear indicators of emergent understanding were present until they began to fade. AUs were measured for the entire duration of each instance of confusion, as detected by AFFDEX.

2.5. Data Analysis

2.5.1. Interrater Reliability

Interrater reliability was calculated using Gwet’s AC1 [36] for instances of emergent understanding identified using observational coding. Gwet’s AC1 was used due to its robustness in conditions of high agreement between coders and immunity to the Kappa paradox [37].
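For readers unfamiliar with the statistic, the following is a minimal R sketch of Gwet’s AC1 for fully crossed categorical ratings, following the formulation in Gwet [36]; it assumes every subject is rated by every coder, with no missing ratings:

```r
# Gwet's AC1 for n subjects rated by the same r raters into q categories.
# ratings: an n x r matrix (or data frame) of category labels, no missing values.
gwet_ac1 <- function(ratings) {
  ratings <- as.matrix(ratings)
  cats <- sort(unique(as.vector(ratings)))
  q <- length(cats)
  r <- ncol(ratings)
  # r_ik: number of raters assigning subject i to category k
  r_ik <- sapply(cats, function(k) rowSums(ratings == k))
  # Observed agreement: proportion of agreeing rater pairs per subject
  pa <- mean(rowSums(r_ik * (r_ik - 1)) / (r * (r - 1)))
  # Chance agreement under Gwet's model
  pi_k <- colMeans(r_ik / r)
  pe <- sum(pi_k * (1 - pi_k)) / (q - 1)
  (pa - pe) / (1 - pe)
}

# Three coders, binary codes (1 = emergent understanding present, 0 = absent)
codes <- cbind(c1 = c(1, 1, 0, 1, 1), c2 = c(1, 1, 0, 1, 0), c3 = c(1, 1, 0, 1, 1))
gwet_ac1(codes)
```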

2.5.2. Machine Learning

Supervised machine learning was used to identify AU patterns associated with each phase of understanding, prioritizing high predictive accuracy, generalizability to new data, and avoiding overfitting. Six algorithms were used: gradient boosting machine (GBM), random forest (RF), decision tree, lasso regression, elastic net regression, and logistic regression. GBM and RF are ensemble methods known for handling complex, non-linear relationships, providing enhanced predictive performance. Decision trees offer interpretable models that capture non-linear patterns without extensive preprocessing. Lasso and elastic net regressions incorporate regularization techniques for feature selection by penalizing less important variables, improving model interpretability and preventing overfitting. Logistic regression offers a straightforward baseline for classification tasks, facilitating comparison with more complex models. By using this diverse set of algorithms, a comprehensive analysis that balanced predictive accuracy, interpretability, and computational efficiency was ensured, making it well suited to the anticipated complexity of the data.
The analysis was conducted using RStudio version 2023.12.1-402. GBM and RF models were trained using ‘caret’ (v6.0.94), the decision tree model was trained using ‘rpart’ (v4.1.23), lasso and elastic net models were trained using ‘glmnet’ (v4.1.8), and the logistic regression model was trained using ‘nnet’ (v7.3.19). The analysis pipeline can be summarized in the following steps; a code sketch of the core steps follows the list:
  • Splitting data: Stratified 80–20 test–train splits were conducted to address moderate class imbalance across phases.
  • Feature scaling: Feature scaling was implemented for logistic regression, lasso, and elastic net, as these algorithms are sensitive to the range of feature values. AUs were standardized across training and test sets.
  • Cross-validation setup: 10-fold stratified cross-validation was used for robust model evaluation. Cross-validation was intentionally omitted for logistic regression, allowing the model to serve as a benchmark for traditional inferential statistical modeling.
  • Selecting evaluation metrics: Models were evaluated using precision, recall, F1 score, and AUC. Specifically, precision measures the proportion of true positives among all instances predicted as positive, indicating the accuracy of positive predictions. Recall measures the proportion of true positives correctly identified out of all actual positives. The F1 score is the harmonic mean of precision and recall, balancing accuracy and completeness. AUC (Area Under the Curve) evaluates the model’s performance across all classification thresholds, reflecting its ability to distinguish between classes. During cross-validation, metrics for each phase were averaged across the 10 folds to yield a consolidated measure of performance. Weighted averages were calculated for each metric, based on the number of observations per phase. All metrics were calculated using the One-vs-Rest (OvR) approach.
  • Training baseline models: Baseline GBM, RF, decision tree, lasso, and elastic net models were trained using default package settings. Algorithms demonstrating promise were selected for optimization.
  • Hyperparameter tuning: Models were optimized through hyperparameter tuning. Tuning focused on hyperparameters and grid searches considered most relevant to improving model performance, while accounting for efficiency and computational resources required.
  • Test-set evaluation: Finally, optimized models and the logistic regression model were evaluated on test sets.
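As referenced above, a minimal R sketch of the core of this pipeline is shown below, using the packages named earlier. The data frame au_data, its phase factor, and the seed are hypothetical; AUC, weighted averaging, and the remaining models are omitted for brevity:

```r
library(caret)  # the 'gbm' package must also be installed for method = "gbm"

set.seed(42)  # hypothetical seed, for reproducibility

# au_data: hypothetical data frame with a 5-level factor `phase`
# and one numeric column of frame counts per AU.
train_idx <- createDataPartition(au_data$phase, p = 0.8, list = FALSE)  # stratified 80-20
train_set <- au_data[train_idx, ]
test_set  <- au_data[-train_idx, ]

# 10-fold cross-validation; caret stratifies folds by class for factors.
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)

# Baseline GBM with default tuning; RF, rpart, and glmnet models are analogous.
gbm_fit <- train(phase ~ ., data = train_set, method = "gbm",
                 trControl = ctrl, verbose = FALSE)

# Test-set evaluation: per-class (one-vs-rest) precision, recall, and F1.
preds <- predict(gbm_fit, newdata = test_set)
cm <- confusionMatrix(preds, test_set$phase)
cm$byClass[, c("Precision", "Recall", "F1")]
```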

3. Results

3.1. Interrater Reliability

Three independent coders analyzed 241 observations of emergent understanding. Of these, 225 achieved unanimous agreement, with a 93% agreement rate. Gwet’s AC1 was 0.95, indicating excellent agreement.

3.2. Descriptive Statistics

There was a total of 1245 observations across the phases of understanding: 249 observations classified as nascent understanding, 171 as misunderstanding, 200 as confusion, 287 as emergent understanding, and 338 as deep understanding (Table 1).
The heatmap in Figure 2 illustrates the average distribution of frames for the 20 AUs and neutral state for each phase of understanding. The colour gradient represents the mean value, with warmer tones (red) indicating higher means and cooler tones (blue) indicating lower means. The AU patterns across phases are distinct, but also overlap. Specific AUs, such as Brow Furrow in confusion and Smile in emergent understanding, highlight unique expressions for each phase, while the neutral state remains prevalent across all phases. Overall, confusion and emergent understanding show higher AU activity, while nascent understanding, misunderstanding, and deep understanding show lower AU activity.
Images of participants’ facial expressions for a representative instance of each phase are also provided (Figure 3), along with examples of participants’ answers, think aloud, justifications, and levels of certainty in response to riddles (Table 2).

3.3. Machine Learning Analysis

Training sets included 998 observations or 80% of the data: 200 for nascent understanding, 137 for misunderstanding, 160 for confusion, 230 for emergent understanding, and 271 for deep understanding. Test sets included 247 observations or 20% of the data: 49 for nascent understanding, 34 for misunderstanding, 40 for confusion, 57 for emergent understanding, and 67 for deep understanding (Table 3). Aggregated results for baseline models are reported in Table 4.
The algorithms selected for optimization were GBM, RF, and lasso. Elastic net was not selected because the optimal alpha was set at 1 when models were trained with default alpha values ranging from 0.1 to 1, indicating a preference for shrinking coefficients to zero rather than near zero. Moreover, its performance was nearly identical to lasso.
GBM tuning involved two rounds. In round one, a comprehensive grid search was conducted for the following hyperparameters and ranges: number of trees from 50 to 500 (in increments of 50), interaction depth from 1 to 10 (in increments of 1), shrinkage values of 0.01, 0.05, 0.1, and 0.2, and the minimum number of observations in the terminal nodes from 5 to 20 (in increments of 5). In round two, a Bayesian search grid was conducted for the same parameters with the following ranges: number of trees from 50 to 600, interaction depth from 1 to 12, shrinkage values of 0.01, 0.05, 0.1, 0.2, and 0.25, and terminal nodes from 5 to 20. RF tuning involved a comprehensive grid search for the number of variables tried at each split, with a range of 4 to 18 (in increments of 1). Lasso tuning involved a grid search for lambda, the regularization parameter, over 100 values spanning 10⁻³ to 10³ on a logarithmic scale.
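For concreteness, the round-one GBM grid described above could be expressed in caret as follows (a sketch reusing the hypothetical train_set and ctrl objects from the earlier pipeline sketch):

```r
# Round-one comprehensive grid search for GBM hyperparameters (caret).
gbm_grid <- expand.grid(
  n.trees           = seq(50, 500, by = 50),
  interaction.depth = 1:10,
  shrinkage         = c(0.01, 0.05, 0.1, 0.2),
  n.minobsinnode    = seq(5, 20, by = 5)
)

gbm_tuned <- train(phase ~ ., data = train_set, method = "gbm",
                   trControl = ctrl, tuneGrid = gbm_grid, verbose = FALSE)
gbm_tuned$bestTune  # selected hyperparameter combination
```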
For GBM, the number of trees (‘n.trees’) was set at 250, the interaction depth (‘interaction.depth’) at 8, the shrinkage (‘shrinkage’) at 0.01, and the minimum number of observations in the terminal nodes (‘n.minobsinnode’) at 10. The RF model used 5 variables tried at each split (‘mtry’). For lasso, the regularization parameter (‘lambda’) was set at 0.006.
Aggregated results for the optimized models are reported in Table 5.
For test set evaluations, weighted averages are reported for the optimized GBM, RF, and lasso models, as well as the logistic regression model for succinct but clear reporting of results. The GBM model achieved a precision of 0.91, recall of 0.87, F1 score of 0.88, and AUC of 0.84. The RF model achieved a precision of 0.91, recall of 0.88, F1 score of 0.89, and AUC of 0.82. The lasso model achieved a precision of 0.90, recall of 0.85, F1 score of 0.86, and AUC of 0.80. The logistic regression model achieved a precision of 0.90, recall of 0.86, F1 score of 0.87, and AUC of 0.79 (Table 6, Figure 4).
Figure 5 presents the confusion matrix for the optimized GBM model’s performance on the test set, selected as the best-performing model and for efficient reporting. The matrix demonstrates strong classification accuracy for emergent understanding and confusion. However, phases with fewer test set instances, such as misunderstanding, show higher misclassification rates, likely due to their limited sample size. The matrix also reveals challenges in distinguishing between deep and nascent understanding, suggesting potential overlap in the features associated with these phases.
A notable discrepancy exists between the weighted performance metrics reported for the test set and the misclassification patterns observed in the matrix. This discrepancy occurs because weighted metrics prioritize performance for more prevalent classes, which can obscure the model’s difficulties with less frequent phases.
These findings indicate that future iterations could benefit from addressing class imbalance by incorporating a more balanced dataset and improving feature selection and representation.
Overall, the machine learning models demonstrated good performance. However, there was room for improvement in reliably predicting and distinguishing the phases of understanding, particularly nascent understanding, misunderstanding, and deep understanding. The optimized models demonstrated improvements relative to their respective baselines, but overall performance remained comparable. The performance of the linear models suggests that the phases exhibit a degree of linear separability, exist on a continuum, and are predominantly distinguished by the intensity of overlapping AUs. The performance of the non-linear models indicates that, to some extent, the phases exhibit complex AU interactions that can be effectively exploited.
Although not reported, it is important to note that experiments involving synthetic sample size increases using SMOTE and random oversampling led to improved performance for GBM and RF. Conversely, lasso’s performance significantly declined after SMOTE and remained unchanged after random oversampling. The quality of the synthetically generated observations may explain this; however, it suggests that if AU patterns across phases become more complex, the performance of linear models diminishes. Notably, adjusting class weights did not impact the overall performance of the GBM, RF, and lasso models, indicating that these models already leveraged phase patterns as effectively as possible.
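As a sketch of the simpler of the two strategies, random oversampling can be performed with caret’s upSample before training; SMOTE would require an additional package (e.g., smotefamily), and the object names again follow the earlier sketches:

```r
# Randomly oversample minority phases so all classes match the largest class.
balanced <- upSample(x = train_set[, setdiff(names(train_set), "phase")],
                     y = train_set$phase, yname = "phase")
table(balanced$phase)  # equal counts per phase after oversampling

gbm_balanced <- train(phase ~ ., data = balanced, method = "gbm",
                      trControl = ctrl, verbose = FALSE)
```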
Feature importance scores for each AU, ranging from 0 to 100, are reported for the optimized GBM model (Table 7, Figure 6). These scores align with the importance scores for the optimized RF model, so the RF scores are not reported. Lasso coefficients are also reported for each AU for each phase (Table 7, Figure 7).
Feature importance scores reflect the dominant role of neutral expression, with a score of 100, in distinguishing the phases. This feature is likely the most important because the intensity of physiological responses varies across phases, and it represents an overall score for the presence or absence of AUs in any given frame. Manually removing AUs considered unimportant based on near-zero importance scores, as well as those with zero variance, did not impact the overall performance of the GBM and RF models.
The lasso coefficients represent the features selected by the model after regularization, thereby indicating the most important AUs in predicting each corresponding phase. Confusion shows significant positive coefficients for several AUs, especially Lid Tighten and Brow Furrow. Deep understanding showed AU coefficients shrunk to zero for AUs like Jaw Drop, and negative coefficients for AUs like Brow Furrow. Emergent understanding had relatively stable coefficients around zero, with a significant dip for Neutral. Misunderstanding showed low to moderate positive coefficients, including Lip Suck and Inner Brow Raise. Nascent understanding also showed relatively stable coefficients around zero, except for positive coefficients for AUs like Smile. Overall, the lasso coefficients demonstrated distinct AU patterns across the phases.
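Both quantities can be read directly off fitted caret objects. A sketch, where gbm_tuned follows the earlier sketches and lasso_fit is a hypothetical caret-trained glmnet model with alpha fixed at 1:

```r
# GBM feature importance, scaled 0-100 by caret (RF is analogous).
varImp(gbm_tuned)

# Per-phase lasso coefficients at the selected lambda; glmnet fits a
# multinomial model here, so coef() returns one coefficient set per phase.
coef(lasso_fit$finalModel, s = lasso_fit$bestTune$lambda)
```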

4. Discussion

The results suggest a potential association between distinct physiological feelings, measured through AUs, and phases of understanding. This leads to three key points of discussion.
First, a robust framework for measuring the feeling of understanding is advanced, challenging traditional educational practices. Educational and developmental psychologists such as Bruner [6], Dewey [38], and Vygotsky [11,39] have long emphasized the student’s active role in learning. Despite their advocacy, classrooms often prioritize standardized curricula and objective measures of achievement while paying limited attention to feelings, emotions, and personal experiences—an approach reinforced by policymakers and researchers due to difficulties in measuring subjective experience [4]. Grounded in an embodied cognition framework, the results and methods of the present study challenge this rationale by offering a clear and observable basis for measurement through physiological activity. Furthermore, embodied cognition underpins our most promising accounts of consciousness, cognition, and affect, highlighting the centrality of the individual in these processes [22,23]. Thus, while objective measures of achievement in educational settings are essential for maintaining quality and accountability, they must be balanced with an equal emphasis on the subjective experiences of students, addressing their unique needs and characteristics.
Second, integrating the cognitive processes and affective experiences associated with subjective understanding in real time is reliably and directly advanced. By combining the introduced method—which captures immediate shifts in mental states through physiological activity—with a framework that considers the valence, rationality, and social impact of mental states [40], along with predictive processing [41], the link between the cognitive processes, feelings, and emotions of subjective understanding can be explored. Specifically, physiological signatures of various affective experiences associated with understanding can be leveraged to gain deeper insights into how cognitive processes, such as logical reasoning, unfold during understanding.
Last, contributions have been made in advancing the development of an intelligent multimodal tutor that harnesses large language models to revolutionize personalized learning [42]. A tutor capable of detecting feelings and discerning when they hinder learning could guide learners toward deep understanding in real time by identifying misconceptions before they are entrenched, reducing frustration, fostering engagement, and helping them independently monitor, control, and manage their learning processes. Given the ethical implications, developing such a tutor requires careful consideration to ensure it respects user privacy, autonomy, and consent. Its intention must be to promote development across all individuals rather than mining data, conducting surveillance, or prioritizing the benefits of commercial stakeholders.
The findings have limited generalizability as the machine learning models were trained on clean data. These models are expected to perform poorly in noisy, in-the-wild instances of each phase. In addition, the number of observations in test folds and sets was low, particularly for underrepresented phases like misunderstanding, which further limits the generalizability and robustness of the models. The participants in this study were predominantly female undergraduates from the same university, which presents a limitation. Although the phases are expected to correspond with the same AU patterns regardless of demographic differences, the sample’s homogeneity may limit the generalizability of the findings.
The exclusion of data points where participants were speaking while answering riddles may have inadvertently omitted valuable information. Additionally, parsing understanding into distinct phases oversimplifies the complex nature of understanding. While our approach allowed us to discover a relationship between feelings and cognition, the categorization of the phases of understanding may require further refinement as research progresses. Finally, while efforts were made to ensure consistency, verbal feedback after each riddle may have influenced participants’ subsequent responses or facial expressions, potentially introducing a confounding variable into our analysis.
Future research should focus on training models using larger, noisy datasets to improve performance across various contexts, especially real-time applications. This aligns with the growing emphasis among researchers on using ecologically valid data, which better captures the complexity of real-world environments and behaviours. This shift is particularly evident in fields like affective computing, where datasets such as IEMOCAP and RAMAS are increasingly used to train models for more realistic applications [43]. As larger datasets become available, deep learning models like Convolutional Neural Networks (CNNs) and Transformer-based models should be developed as they typically outperform shallow models like RF and GBM and can be trained on raw data [44], eliminating the need to extract features such as AUs from intermediary modules like AFFDEX. Additionally, incorporating physiological data through speech analysis [45] and transdermal optical imaging (TOI, [25]) could enhance performance, and offer a more comprehensive understanding of the phases [46,47]. This study investigated misunderstanding as the feeling of being unable to move forward despite meeting an objective standard of correctness; therefore, future studies must also examine misunderstanding as an inflated sense of certainty in conjunction with an incorrect answer. Lastly, future studies must explore the relationship between physiological signatures and cognitive processes, such as logical reasoning, for each phase and replicate this study’s findings using a real-world academic task.

Author Contributions

Conceptualization, M.L. and E.W.; methodology, M.L.; validation, M.L.; formal analysis, M.L.; investigation, M.L.; data curation, M.L.; writing—original draft preparation, M.L., E.W. and J.J.; writing—review and editing, M.L., E.W. and J.J.; visualization, J.J.; supervision, E.W.; project administration, M.L.; funding acquisition, E.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Social Sciences and Humanities Research Council (SSHRC), Operating Grant #500427.

Institutional Review Board Statement

This study was approved by the Social Sciences, Humanities & Education Research Ethics Board of the University of Toronto (protocol #42280, date of approval: 04/03/2022) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The datasets presented in this article are not readily available because consent to share data was not obtained from all participants. Requests to access the datasets should be directed to: [email protected].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

  • Riddle 1: I start with M, end with X and have a never-ending amount of letters. What am I? Answer: Mailbox.
  • Riddle 2: What 5 letter word becomes shorter when you add two letters to it? Answer: Short.
  • Riddle 3: What building has the most stories? Answer: Library.
  • Riddle 4: I have two coins that equal 30 cents and one is not a nickel. What two coins do I have? Answer: A quarter and nickel.
  • Riddle 5: I saw a boat full of people, yet there wasn’t a single person on board. How is this possible? Answer: Everyone on the boat was in a relationship.
  • Riddle 6: Three different doctors said that Paul is their brother, yet Paul claims he has no brothers. Who is lying? Answer: No one.
  • Riddle 7: Why is it against the law for a man living in Delhi to be buried in Mumbai? Answer: Because he is alive.
  • Riddle 8: Before Mt. Everest was discovered, what was the highest mountain in the world? Answer: Mt. Everest.
  • Riddle 9: A big brown cow is lying down in the middle of a country road. The streetlights are not on, the moon is not out, and the skies are heavily clouded. A truck is driving towards the cow at full speed, its headlights off. Yet the driver sees the cow from afar easily, and avoids hitting it, without even having to brake hard. How is that possible? Answer: It was daytime.
  • Riddle 10: Which month has 28 days? Answer: Every month has at least 28 days.
  • Riddle 11: In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days to cover the entire lake, how long does it take for the patch to cover half of the lake? Answer: 47 days.
  • Riddle 12: You are a cyclist in a cross-country race. Just before the crossing finish line, you overtake the person in second place. In what place did you finish? Answer: Second.
  • Riddle 13: A bat and a ball cost $1.10 in total. The bat costs a dollar more than the ball. How much does the ball cost? Answer: 5 cents.
  • Riddle 14: You’re in a dark room with a candle, a wood stove, and a gas lamp. You only have one match, what do you light first? Answer: The match.
  • Riddle 15: How much dirt is there in a hole that is 3 feet deep, and 6 inches in diameter? Answer: None.

Appendix B

The purpose of this codebook is to guide raters when assessing emergent understanding. Specifically, behavioural indicators of emergent understanding are provided so that raters can determine if it has occurred and how long an instance of it lasts.
General Guidelines:
Emergent understanding is more likely to be present as the number of indicators in combination increases.
Example:
  • Participant 1: Gasps, smiles, laughs, nods head, and verbally indicates understanding (e.g., “that makes sense”)
  • Participant 2: Nods head
  • Emergent understanding is more likely to be present for participant 1
Emergent understanding is more likely to be present when indicators are expressed with greater intensity.
Example:
  • Participant 1: Nods head enthusiastically and smiles widely
  • Participant 2: Nods head slightly and smiles faintly
  • Emergent understanding is more likely to be present for participant 1
What is “intense” is relative to each participant. It is therefore necessary to be attuned to each participant and how expressive they generally are.
If the evaluation of whether emergent understanding occurred requires a lot of considerations and interpretations, do not code and mark as “unclear”. It is not very likely that the other raters will come to the same interpretation as you did, so it will increase reliability if you do not code.
Rely on intuition in addition to the codebook when coding.
When coding, first assess whether emergent understanding has occurred. Behavioural indicators of emergent understanding are provided below.
Head:
  • Nodding
  • Smiling
  • Eyebrow(s) raise
  • Eyebrows furrow
  • Eyes closing
  • Eyes widening
  • Eyes rolling
  • Face scrunching
  • Face relaxing
  • Head movement
I.e., head movement that expresses an acknowledgement, realization, or judgment regarding the correct answer or own answer
  • E.g., Looking upward
  • E.g., Tilt to the side
The following facial expressions:
  • Surprise
  • Interest
  • Disgust
Auditory:
  • Laughing
  • Chuckling
  • Sighing
  • Gasping
  • Kissing teeth
  • Providing explanation for the correct answer
Example:
  • Researcher: “The correct answer is mailbox”
  • Participant: “Right, because mailbox starts with m, ends with x, and letters in this case refers to the mail”
Repeating explanation of correct answer using own words
Example:
  • Researcher: “The correct answer is that you would finish in second place because you were in 3rd place just before crossing the finish line”
  • Participant: “That makes sense, I am just swapping places with the person in 2nd”
Making a statement or judgment about the nature of the riddle or own answer
Examples:
  • Participant: “This was a tricky riddle, I did not expect that to be the answer”
  • Participant: “My answer was stupid”
Realization of correct answer
Examples:
  • “Ohh”
  • “Ahh”
  • “Mhmm”
Acknowledgment of correct answer
Examples:
  • “That makes sense”
  • “I see”
Bodily:
  • Hand gesture
I.e., hand gestures that express an acknowledgment, realization, or judgment regarding the correct answer or own answer
E.g., Participant covering their mouth after receiving the correct answer
  • The following body movements:
    • Leaning or sitting back in chair
    • Slumping over
Time Interval:
If emergent understanding has occurred, identify a time interval for when it occurred (e.g., 10:00–10:05). To identify when a moment of understanding has ended, use the following indicators:
  • Absence of behavioural indicators of understanding:
I.e., Nodding, acknowledgment of correct answer, etc., has ended
  • Fading of behavioural indicators of understanding:
E.g., Participant’s smile begins to fade
  • Attention shifts from the riddle:
    E.g., Participant moves on to the next item or page in the survey
    Note: Absence or fading of indicators + shift in attention is the strongest indicator.
    Note: Participants may shift their attention from the riddle (e.g., move on to the next item), but behavioural indicators of understanding (e.g., smiling) are present. In these cases, rely on intuition to determine if the behavioural indicator(s) is associated with the understanding that occurred. If it is, identify when the indicator is absent or fades to determine when understanding has ended.

References

  1. Roumell, E.A.; Walker, J.; Salajan, F.D. Lifelong Learning and Education Policy in North America. In Third International Handbook of Lifelong Learning; Springer: Berlin/Heidelberg, Germany, 2023; pp. 633–654. [Google Scholar]
  2. Duke, N.K.; Cartwright, K.B. The Science of Reading Progresses: Communicating Advances beyond the Simple View of Reading. Read. Res. Q. 2021, 56, S25–S44. [Google Scholar] [CrossRef]
  3. Pearson, P.D.; Palincsar, A.S.; Biancarosa, G.; Berman, A.I. Reaping the Rewards of the Reading for Understanding Initiative. Natl. Acad. Educ. 2020. [Google Scholar] [CrossRef]
  4. Olson, D.R. Making Sense: What It Means to Understand; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
  5. Olson, D.R. Ascribing Understanding to Ourselves and Others. Am. Psychol. 2023. [Google Scholar] [CrossRef] [PubMed]
  6. Bruner, J.S. The Process of Education. Phys. Teach. 1965, 3, 369–370. [Google Scholar] [CrossRef]
  7. Damasio, A.R. Descartes’ Error. Emotion, Reason and the Human Brain; Grosset/Putnam: New York, NY, USA, 1994. [Google Scholar]
  8. Gardner, H.E. Frames of Mind: The Theory of Multiple Intelligences; Basic Books: New York, NY, USA, 2011. [Google Scholar]
  9. Goleman, D. Emotional Intelligence: Why It Can Matter More than IQ; Bloomsbury Publishing: London, UK, 2020. [Google Scholar]
  10. Piaget, J. The Construction of Reality in the Child; Routledge: London, UK, 2013. [Google Scholar]
  11. Vygotsky, L.S. The Collected Works of LS Vygotsky: The Fundamentals of Defectology; Springer Science & Business Media: Berlin, Germany, 1987; Volume 2. [Google Scholar]
  12. Russell, J.A. A Circumplex Model of Affect. J. Pers. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
  13. James, W. The Principles of Psychology; Henry Holt: New York, NY, USA, 1890. [Google Scholar] [CrossRef]
  14. Damasio, A. Feeling & Knowing: Making Minds Conscious; Pantheon Books: New York, NY, USA, 2021. [Google Scholar]
  15. Ekman, P.; Rosenberg, E.L. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS); Oxford University Press: New York, NY, USA, 1997. [Google Scholar]
  16. Barrett, L.F. How Emotions Are Made: The Secret Life of the Brain; Houghton Mifflin Harcourt: Boston, MA, USA, 2017. [Google Scholar]
  17. Lazarus, R.S. Emotion and Adaptation; Oxford University Press: New York, NY, USA, 1991; Volume 557. [Google Scholar]
  18. Frijda, N.H. The Emotions; Cambridge University Press: Cambridge, UK, 1986. [Google Scholar]
  19. Frijda, N.H. The Laws of Emotion; Taylor and Francis: Abingdon, UK, 2007. [Google Scholar] [CrossRef]
  20. Woodruff, E. AI Detection of Human Understanding in a Gen-AI Tutor. AI 2024, 5, 898–921. [Google Scholar] [CrossRef]
  21. Damasio, A. Self Comes to Mind: Constructing the Conscious Brain; Pantheon/Random House: New York, NY, USA, 2010. [Google Scholar]
  22. Kauffman, S.A. The Origins of Order: Self-Organization and Selection in Evolution; Oxford University Press: New York, NY, USA, 1993. [Google Scholar]
  23. Kauffman, S.A. At Home in the Universe: The Search for Laws of Self-Organization and Complexity; Oxford University Press: New York, NY, USA, 1995. [Google Scholar]
  24. Stöckli, S.; Schulte-Mecklenbeck, M.; Borer, S.; Samson, A.C. Facial Expression Analysis with AFFDEX and FACET: A Validation Study. Behav. Res. Methods 2018, 50, 1446–1460. [Google Scholar] [CrossRef]
  25. Fu, G.; Zhou, X.; Wu, S.J.; Nikoo, H.; Panesar, D.; Zheng, P.P.; Oatley, K.; Lee, K. Discrete Emotions Discovered by Contactless Measurement of Facial Blood Flows. Cogn. Emot. 2022, 36, 1429–1439. [Google Scholar] [CrossRef]
  26. Biggs, J.B.; Collis, K.F. Evaluating the Quality of Learning: The SOLO Taxonomy (Structure of the Observed Learning Outcome); Academic Press: Cambridge, MA, USA, 1982. [Google Scholar]
  27. Bartlett, F.C. Remembering: A Study in Experimental and Social Psychology; Cambridge University Press: Cambridge, UK, 1932. [Google Scholar]
  28. Johnson-Laird, P.N. Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness; Harvard University Press: Cambridge, MA, USA, 1983. [Google Scholar]
  29. Piaget, J. The Origins of Intelligence in Children; International Universities Press: Madison, CT, USA, 1952. [Google Scholar]
  30. Vosniadou, S.; Brewer, W.F. Mental Models of the Earth: A Study of Conceptual Change in Childhood. Cognit. Psychol. 1992, 24, 535–585. [Google Scholar] [CrossRef]
  31. Weinstein, C.E.; Mayer, R.E. The Teaching of Learning Strategies. In Handbook of Research on Teaching; Wittrock, M.C., Ed.; Macmillan: New York, NY, USA, 1986; pp. 315–327. [Google Scholar]
  32. Barnett, S.M.; Ceci, S.J. When and Where Do We Apply What We Learn? A Taxonomy for Far Transfer. Psychol. Bull. 2002, 128, 612–637. [Google Scholar] [CrossRef] [PubMed]
  33. Bar-Hillel, M.; Noah, T.; Frederick, S. Learning Psychology from Riddles: The Case of Stumpers. Judgm. Decis. Mak. 2018, 13, 112–122. [Google Scholar] [CrossRef]
  34. Toplak, M.E.; West, R.F.; Stanovich, K.E. Assessing Miserly Information Processing: An Expansion of the Cognitive Reflection Test. Think. Reason. 2014, 20, 147–168. [Google Scholar] [CrossRef]
  35. Otamendi, F.J.; Sutil Martín, D.L. The Emotional Effectiveness of Advertisement. Front. Psychol. 2020, 11, 2088. [Google Scholar] [CrossRef]
  36. Gwet, K.L. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement among Raters; Advanced Analytics, LLC: Fort Wayne, IN, USA, 2014. [Google Scholar]
  37. Wongpakaran, N.; Wongpakaran, T.; Wedding, D.; Gwet, K.L. A Comparison of Cohen’s Kappa and Gwet’s AC1 When Calculating Inter-Rater Reliability Coefficients: A Study Conducted with Personality Disorder Samples. BMC Med. Res. Methodol. 2013, 13, 61. [Google Scholar] [CrossRef] [PubMed]
  38. Dewey, J. Experience and Education. In The Educational Forum; Taylor & Francis: Abingdon, UK, 1986; Volume 50, pp. 241–252. [Google Scholar]
  39. Vygotsky, L.S.; Cole, M. Mind in Society: Development of Higher Psychological Processes; Harvard University Press: Boston, MA, USA, 1978. [Google Scholar]
  40. Tamir, D.I.; Thornton, M.A. Modeling the Predictive Social Mind. Trends Cogn. Sci. 2018, 22, 201–212. [Google Scholar] [CrossRef]
  41. Clark, A. Whatever next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behav. Brain Sci. 2013, 36, 181–204. [Google Scholar] [CrossRef]
  42. Kasneci, E.; Seßler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
  43. Ryumina, E.; Dresvyanskiy, D.; Karpov, A. In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study. Neurocomputing 2022, 514, 435–450. [Google Scholar] [CrossRef]
  44. Janiesch, C.; Zschech, P.; Heinrich, K. Machine Learning and Deep Learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
  45. Lu, C.; Zong, Y.; Zheng, W.; Li, Y.; Tang, C.; Schuller, B.W. Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 2217–2230. [Google Scholar] [CrossRef]
  46. Fairclough, S.H.; Venables, L. Prediction of Subjective States from Psychophysiology: A Multivariate Approach. Biol. Psychol. 2006, 71, 100–110. [Google Scholar] [CrossRef] [PubMed]
  47. Zhao, G.; Zhang, H.; Liu, Z.; Zhang, S.; Zhang, Y. Multiscale Convolutional Neural Networks for Affect Recognition Using EEG and Peripheral Physiological Signals. arXiv 2019, arXiv:1911.12918. [Google Scholar]
Figure 1. Simplified flowchart illustrating the methodology.
Figure 2. Average distribution of AU frames for each phase of understanding.
Figure 3. Facial expressions associated with (1) nascent understanding, (2) misunderstanding, (3) confusion, (4) emergent understanding, and (5) deep understanding.
Figure 4. Line graph of test set weighted averages for precision, recall, F1 score, and AUC for optimized models and logistic regression.
Figure 5. Confusion matrix for the optimized GBM model’s performance on the test set.
Figure 6. Bar graph of feature importance scores for AUs.
Figure 7. Line graphs of lasso coefficients for AUs per phase.
Table 1. Number of observations for each phase of understanding, with the total number of instances across all phases.

Phase of Understanding | Observations
Nascent Understanding | 249
Misunderstanding | 171
Confusion | 200
Emergent Understanding | 287
Deep Understanding | 338
Total Observations | 1245
Table 2. Participants’ answers, think aloud, justifications, and levels of certainty in response to riddles across phases of understanding.

Phase | Riddle | Answer | Think Aloud | Justification | Level of Certainty
Nascent | Which month has 28 days? | July | Don’t know | Not sure | 1 (Not certain)
Misunderstanding | I have two coins that equal thirty cents and one is not a nickel. What two coins do I have? | Quarter and nickel | X + Y = 30 cents... not a nickel…dime x2 = 20... when they say one is not a nickel does this mean the other could be??? | I know it says one is not a nickel, but maybe that means the first one is not a nickel (so maybe a quarter) and then the second one is a nickel... | 2 (Somewhat certain)
Confusion | You’re in a dark room with a candle, a wood stove, and a gas lamp. You only have one match, what do you light first? | — | Not sure why I need to light anything | — | —
Emergent | I have two coins that equal thirty cents and one is not a nickel. What two coins do I have? | Quarter, nickel | A + B = 30. Coins 25, 15, 10, 1. Ok this isn’t a coins problem. Aha the other is a nickel. | 25 + 5, it said 1 is not a nickel. | 4 (Very certain)
Deep | You are a cyclist in a cross-country race. Just before crossing the finish line, you overtake the person in second place. In what place did you finish? | Second | — | Easy—if you pass the person—u take over that person’s place. | 4 (Very certain)
Table 3. Stratified 80–20 test–train splits for each phase of understanding.

Phase of Understanding | Training Sets | Test Sets
Nascent Understanding | 200 | 49
Misunderstanding | 137 | 34
Confusion | 160 | 40
Emergent Understanding | 230 | 57
Deep Understanding | 271 | 67
Total Observations | 998 | 247
Table 4. Performance metrics—precision, recall, F1 score, and AUC—for five baseline models: GBM, RF, lasso, elastic net, and decision tree.

Metric | Nascent (GBM/RF/Lasso/Net/Tree) | Misunderstanding (GBM/RF/Lasso/Net/Tree) | Confusion (GBM/RF/Lasso/Net/Tree) | Emergent (GBM/RF/Lasso/Net/Tree) | Deep (GBM/RF/Lasso/Net/Tree)
Precision | 0.84/0.85/0.83/0.83/0.35 | 0.87/0.88/0.86/0.86/0.19 | 0.93/0.94/0.9/0.9/0.6 | 0.97/0.97/0.95/0.95/0.7 | 0.92/0.84/0.93/0.93/0.5
Recall | 0.93/0.88/0.93/0.93/0.28 | 0.98/0.94/0.99/0.99/0.06 | 0.95/0.94/0.97/0.97/0.44 | 0.95/0.94/0.92/0.92/0.8 | 0.67/0.76/0.59/0.58/0.69
F1 Score | 0.88/0.86/0.88/0.88/0.3 | 0.92/0.9/0.92/0.92/0.2 | 0.94/0.94/0.93/0.93/0.5 | 0.96/0.95/0.94/0.94/0.74 | 0.78/0.8/0.72/0.72/0.54
AUC | 0.75/0.72/0.73/0.73/0.66 | 0.7/0.66/0.68/0.68/0.65 | 0.92/0.91/0.83/0.83/0.74 | 0.97/0.97/0.94/0.94/0.88 | 0.8/0.77/0.77/0.77/0.74
Table 5. Performance metrics—precision, recall, F1 score, and AUC—for three optimized models: GBM, RF, and lasso.

Metric | Nascent (GBM/RF/Lasso) | Misunderstanding (GBM/RF/Lasso) | Confusion (GBM/RF/Lasso) | Emergent (GBM/RF/Lasso) | Deep (GBM/RF/Lasso)
Precision | 0.84/0.84/0.83 | 0.87/0.87/0.86 | 0.93/0.95/0.90 | 0.97/0.97/0.95 | 0.92/0.89/0.93
Recall | 0.93/0.93/0.93 | 0.98/0.98/0.99 | 0.94/0.93/0.97 | 0.95/0.94/0.92 | 0.69/0.71/0.59
F1 Score | 0.89/0.89/0.88 | 0.92/0.92/0.92 | 0.94/0.94/0.93 | 0.96/0.95/0.94 | 0.79/0.79/0.72
AUC | 0.77/0.72/0.73 | 0.68/0.63/0.68 | 0.93/0.92/0.83 | 0.97/0.96/0.94 | 0.79/0.78/0.77
Table 6. Test set weighted averages for precision, recall, F1 score, and AUC for optimized models and logistic regression.

Metric | GBM | RF | Lasso | Logistic
Precision | 0.91 | 0.91 | 0.90 | 0.90
Recall | 0.87 | 0.88 | 0.85 | 0.86
F1 Score | 0.88 | 0.89 | 0.86 | 0.87
AUC | 0.84 | 0.82 | 0.80 | 0.79
Table 7. Feature importance scores for AUs, and lasso coefficients for AUs per phase.

Action Unit | Importance | Nascent | Misunderstanding | Confusion | Emergent | Deep
Brow Furrow | 38.80 | 0.098 | 0.000 | 0.550 | −0.976 | −0.290
Brow Raise | 0.00 | −0.017 | 0.072 | −0.035 | 0.213 | 0.000
Cheek Raise | 5.79 | 0.000 | 0.000 | 0.000 | 0.300 | 0.000
Chin Raise | 5.01 | 0.000 | 0.000 | 0.221 | 0.000 | −0.227
Dimpler | 4.25 | −0.001 | 0.000 | 0.216 | 0.000 | 0.000
Eye Closure | 21.75 | 0.044 | 0.081 | −0.328 | −0.201 | 0.000
Eye Widen | 7.23 | 0.000 | 0.000 | −0.028 | 0.000 | 0.105
Inner Brow Raise | 3.00 | 0.000 | 0.295 | 0.248 | 0.000 | 0.000
Jaw Drop | 7.02 | −0.088 | −0.094 | 0.148 | 0.444 | 0.000
Lid Tighten | 12.16 | 0.000 | 0.000 | 1.137 | 0.667 | 0.000
Lip Corner Depressor | 5.34 | 0.000 | 0.000 | 0.004 | −0.103 | 0.000
Lip Press | 6.10 | 0.074 | 0.000 | 0.000 | 0.000 | −0.119
Lip Pucker | 1.56 | 0.003 | −0.022 | 0.000 | −0.016 | 0.000
Lip Stretch | 0.98 | −0.005 | 0.000 | 0.142 | 0.000 | 0.000
Lip Suck | 1.14 | 0.000 | 0.134 | 0.000 | −0.022 | 0.000
Mouth Open | 17.82 | 0.000 | −0.228 | 0.121 | 0.710 | −0.160
Neutral | 100.00 | 0.537 | 0.303 | 0.000 | −3.735 | −0.447
Nose Wrinkle | 0.24 | 0.202 | 0.000 | 0.000 | 0.000 | 0.000
Smile | 7.97 | 0.377 | 0.000 | 0.000 | 0.000 | 0.000
Smirk | 4.40 | 0.190 | 0.004 | 0.000 | 0.000 | −0.154
Upper Lip Raise | 1.05 | 0.220 | 0.000 | 0.000 | 0.000 | 0.000
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
