Deep Knowledge Tracing Integrating Temporal Causal Inference and PINN
Abstract
1. Introduction
- (1) We use a temporal causal model to explore the relationships between knowledge points; these relationships are combined with students' general mastery levels to derive a final knowledge state that accounts for causally related knowledge points.
- (2) Through backdoor adjustment, we estimate students' learning abilities and exercise difficulty levels while removing the confounding effect of students' answering preferences, which improves the accuracy of both estimates.
- (3) We add a physics-informed loss term that increases the penalty for predictions violating common-sense patterns of the student learning process, improving the accuracy of performance prediction.
2. Related Work
2.1. Knowledge Tracing
2.2. Temporal Causal Inference
2.3. PINN Model
3. TLPKT_PINN Model
3.1. Overall Framework of the Model
- Temporal causal knowledge-point relationship mining: given student IDs, problem-solving times, and the correctness of responses on each knowledge point, an intensity function between knowledge points is estimated and combined with Granger causality to obtain the temporal causal relationships between knowledge points. These causal relationships are then combined with the mastery levels output by the knowledge tracing model to obtain each student's final mastery of the knowledge points, which is used to predict student performance;
- Calculating students' learning ability and exercise difficulty: students with different learning abilities have different preferences when answering questions. Students with stronger abilities tend to attempt relatively difficult questions and may therefore show a lower accuracy rate, which should not lead us to underestimate their learning potential; likewise, students with weaker abilities may have a low accuracy rate even on simple exercises, which should not lead us to overestimate those exercises' difficulty. To assess abilities and difficulties more accurately, we construct a prior causal model for learning ability and exercise difficulty. Initially, all students are assumed to have the same ability; exercise difficulty is treated as the confounder when identifying ability, and ability as the confounder when identifying difficulty. Backdoor adjustment is then employed to eliminate the influence of these confounders.
- Using a physical loss function to adjust the loss term and optimize prediction: the logistic growth model describes the dynamic process of students' knowledge mastery, so it is used as the physical model to construct a loss term that constrains the neural network to follow the logistic differential equation. A student's performance is related to their ability and the difficulty of the problem: a high predicted score suggests a high-ability student attempting a relatively difficult question, while a low predicted score suggests a low-ability student attempting a relatively easy one. Using the abilities obtained through backdoor adjustment and intervention, predictions that violate this common-sense relationship incur an increased penalty.
3.2. Causal Relationship Mining of Knowledge Points
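As a rough illustration of the mining step described in the framework overview, the sketch below computes an exponential-kernel intensity function for event sequences and compares point-process log-likelihoods with and without cross-excitation, the comparison underlying a Granger-style causality test between knowledge points. This is a toy example with hypothetical parameters (`mu`, `alpha`, `beta`) and synthetic event times, not the paper's fitted model:

```python
import numpy as np

def intensity(t, mu, src_events, alpha, beta):
    # lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    past = src_events[src_events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def loglik(target, T, mu, src_events=None, alpha=0.0, beta=1.0):
    # Point-process log-likelihood: sum_i log lambda(t_i) - integral_0^T lambda(t) dt
    src = np.asarray(src_events, dtype=float) if src_events is not None else np.array([])
    ll = sum(np.log(intensity(t, mu, src, alpha, beta)) for t in target)
    compensator = mu * T
    if src.size:
        compensator += (alpha / beta) * (1.0 - np.exp(-beta * (T - src))).sum()
    return ll - compensator

# Toy data: knowledge point A's practice events, B's events occurring shortly after A's
A = np.array([1.0, 3.0, 5.0])
B = np.array([1.1, 3.1, 5.1])
T, mu = 6.0, 0.3

ll_base  = loglik(B, T, mu)                                      # B from its baseline only
ll_cross = loglik(B, T, mu, src_events=A, alpha=1.0, beta=5.0)   # A's history excites B
granger_causes = ll_cross > ll_base  # higher likelihood suggests A Granger-causes B
```

In practice the parameters would be fitted per knowledge-point pair, and the likelihood comparison replaced by a proper statistical test; the sketch only shows the shape of the computation.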
3.3. LPKT Model Integrating Temporal Causality
3.3.1. Learning Embedding and Knowledge Point Embedding
3.3.2. The Learning Module
3.3.3. Forgetting Module
3.3.4. Prediction Module and Objective Function
3.4. Loss Function Optimization
3.4.1. Student’s Ability and Exercise Difficulty Based on Backdoor Adjustment
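The backdoor adjustment used here can be illustrated on a toy discrete example. All numbers below are hypothetical: `p_correct` is an assumed P(correct | ability, difficulty) table, `p_difficulty` the marginal over the confounder (exercise difficulty), and `p_d_given_a` the observed, preference-driven item selection:

```python
import numpy as np

# Hypothetical toy tables, not the paper's fitted values
p_correct = np.array([  # rows: ability (low, high); cols: difficulty (easy, hard)
    [0.70, 0.30],
    [0.90, 0.60],
])
p_difficulty = np.array([0.5, 0.5])       # marginal P(d) over the confounder
p_d_given_a  = np.array([[0.8, 0.2],      # low-ability students prefer easy items
                         [0.3, 0.7]])     # high-ability students prefer hard items

# Naive (confounded) estimate: E[correct | a] under each group's own item mix
naive = (p_correct * p_d_given_a).sum(axis=1)

# Backdoor adjustment: P(correct | do(a)) = sum_d P(correct | a, d) * P(d)
adjusted = p_correct @ p_difficulty
```

Under these numbers the naive accuracies are 0.62 and 0.69, while the adjusted values are 0.50 and 0.75: selection preferences compress the apparent ability gap, and the adjustment recovers it. The same construction applies symmetrically when identifying difficulty with ability as the confounder.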
3.4.2. The Logistic Model
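A minimal sketch of a logistic-model physics loss, assuming the mastery trajectory follows dh/dt = r·h·(1 − h/K); the residual of this ODE, computed by finite differences on a discrete trajectory, is penalised in mean square. The growth rate `r` and capacity `K` here are illustrative defaults, not fitted values:

```python
import numpy as np

def logistic_residual(h, t, r=1.0, K=1.0):
    # Residual of dh/dt - r*h*(1 - h/K) along a discrete trajectory
    dh_dt = np.gradient(h, t)
    return dh_dt - r * h * (1.0 - h / K)

def physics_loss(h, t, r=1.0, K=1.0):
    # Mean squared ODE residual: penalises trajectories violating logistic growth
    return float(np.mean(logistic_residual(h, t, r, K) ** 2))

t = np.linspace(0.0, 6.0, 200)
h_true = 1.0 / (1.0 + np.exp(-(t - 3.0)))   # exact logistic solution for r=1, K=1
h_bad  = np.clip(0.05 + 0.3 * t, 0.0, 1.0)  # implausible linear-then-flat mastery curve
```

The exact logistic solution incurs only finite-difference error, while the implausible curve is penalised substantially more; in a PINN this term is added to the data loss so the network learns trajectories consistent with the logistic dynamics.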
3.4.3. Optimize the Loss Function
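The optimized objective can be sketched as the usual binary cross-entropy prediction loss plus a weighted physics penalty. The weight `lam` is a hypothetical hyperparameter, and `physics_residuals` stands in for the ODE residuals of the predicted mastery trajectory:

```python
import numpy as np

def bce(y, p, eps=1e-7):
    # Binary cross-entropy between observed responses y and predicted probabilities p
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def total_loss(y, p, physics_residuals, lam=0.1):
    # Prediction loss plus weighted mean squared physics residual
    penalty = float(np.mean(np.asarray(physics_residuals, dtype=float) ** 2))
    return bce(y, p) + lam * penalty

y = np.array([1, 0])
p = np.array([0.8, 0.3])
r = np.array([0.1, 0.2])   # toy residuals
loss = total_loss(y, p, r)
```

Larger residuals (predictions violating the logistic dynamics or the ability/difficulty common sense) increase `loss`, steering training toward physically plausible solutions.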
4. Experiment
4.1. Datasets
- ASSIST2012: collected from the educational platform ASSISTMENTS, which provides high school math problems. The dataset covers the 2012–2013 academic year, during which students completed sets of similar exercises to master each problem set. We filtered out records without knowledge concepts and students who completed fewer than 20 questions.
- ASSISTChall: collected from ASSISTMENTS in 2017 and used in a data mining competition. The data come from a longitudinal study that tracked middle school students' use of the ASSISTMENTS blended learning platform from 2004 to 2007. Students' learning sequences in this dataset are much longer than those in ASSIST2012.
4.2. Results and Discussion
- BKT: uses a hidden Markov model and Bayesian inference to evaluate and predict the dynamic changes in students' mastery of knowledge points during the learning process.
- DKT: The model is based on a Recurrent Neural Network (RNN) and is used to dynamically track students’ mastery of knowledge points. It analyzes students’ interactive data, learns their knowledge status, and predicts their performance in future tasks.
- DKVMN: This model defines a static matrix to store potential knowledge concepts and a dynamic matrix to update the corresponding knowledge states over time through read and write operations, using a memory network to obtain the interpretable student knowledge states.
- AKT: This uses two self-attention encoders to learn context-aware representations of exercises and answers; the knowledge evolution model is called the knowledge retriever, which utilizes attention mechanisms to retrieve knowledge obtained in the past that is relevant to the current exercise.
- LPKT: models learning gain by capturing the difference between two consecutive learning units; the diversity of learning gains is measured by students' relevant knowledge state and the interval time. A learning gate distinguishes students' ability to absorb knowledge, and a forgetting gate determines the decay of students' knowledge over time.
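The BKT baseline's per-step update can be made concrete. It applies Bayes' rule to the mastery probability given the observed response (with slip and guess probabilities) and then a learning transition; the parameter values below are hypothetical, not those fitted in the experiments:

```python
def bkt_update(p_mastery, correct, p_learn=0.1, p_slip=0.1, p_guess=0.2):
    """One standard BKT step: Bayesian posterior given the response, then learning transition."""
    if correct:
        num = p_mastery * (1 - p_slip)                 # mastered and did not slip
        den = num + (1 - p_mastery) * p_guess          # or unmastered but guessed
    else:
        num = p_mastery * p_slip                       # mastered but slipped
        den = num + (1 - p_mastery) * (1 - p_guess)    # or unmastered and did not guess
    posterior = num / den
    # Learning transition: an unmastered student may learn between steps
    return posterior + (1 - posterior) * p_learn

p = 0.3
for obs in [1, 1, 0, 1]:   # a short toy response sequence
    p = bkt_update(p, obs)
```

Correct responses push the mastery estimate up and incorrect ones pull it down, which is the hidden-Markov dynamic the deep models in the comparison generalise.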
4.3. Ablation Experiment
- LPKT_PINN: remove the temporal causality module from the model;
- TLPKT_PINN_ability: remove the module that adjusts learning ability through backdoor adjustment, and remove the learning-ability term from the loss function;
- TLPKT: remove the physical loss module from the model;
- TLPKT_1: remove the additional loss terms from the model.
4.4. Updating the Mastery Level of Students’ Knowledge Points
4.5. Performance Analysis of PINN Model
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI-EdTech | AI-powered educational technologies |
| AKT | Context-aware Attentive Knowledge Tracing |
| BKT | Bayesian Knowledge Tracing |
| DBKT | Dynamic Bayesian Knowledge Tracing |
| DBNs | Dynamic Bayesian Networks |
| DKT | Deep Knowledge Tracing |
| DKVMN | Dynamic Key–Value Memory Network |
| GKT | Graph-based Knowledge Tracing |
| HMM | Hidden Markov Model |
| LPKT | Learning Process-consistent Knowledge Tracing |
| LSTM | Long Short-Term Memory |
| MDPI | Multidisciplinary Digital Publishing Institute |
| RNNs | Recurrent Neural Networks |
| TLS-BKT | Three-Learning-State BKT |
References
- Darvishi, A.; Khosravi, H.; Sadiq, S.; Gašević, D.; Siemens, G. Impact of AI assistance on student agency. Comput. Educ. 2024, 210, 104967.
- Chen, H.; Yin, C.; Li, R.; Rong, W.; Xiong, Z.; David, B. Enhanced learning resource recommendation based on online learning style model. Tsinghua Sci. Technol. 2019, 25, 348–356.
- Shen, S.; Liu, Q.; Huang, Z.; Zheng, Y.; Yin, M.; Wang, M.; Chen, E. A survey of knowledge tracing. arXiv 2021, arXiv:2105.15106.
- Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1995, 4, 253–278.
- Eddy, S.R. Hidden Markov models. Curr. Opin. Struct. Biol. 1996, 6, 361–365.
- Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep Knowledge Tracing. Adv. Neural Inform. Process. Syst. 2015, 28.
- Jordan, M.I. Serial order: A parallel distributed processing approach. In Advances in Psychology; North-Holland: Amsterdam, The Netherlands, 1997; Volume 121, pp. 471–495.
- Shen, S.; Liu, Q.; Chen, E.; Huang, Z.; Huang, W.; Yin, Y.; Su, Y.; Wang, S. Learning process-consistent knowledge tracing. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021; pp. 1452–1460.
- Zanellati, A.; Di Mitri, D.; Gabbrielli, M.; Levrini, O. Hybrid models for knowledge tracing: A systematic literature review. IEEE Trans. Learn. Technol. 2024, 17, 1021–1036.
- Kaser, T.; Klingler, S.; Schwing, A.G.; Gross, M. Dynamic Bayesian networks for student modeling. IEEE Trans. Learn. Technol. 2017, 10, 450–462.
- De Baker, R.S.J.; Corbett, A.T.; Aleven, V. More accurate student modeling through contextual estimation of slip and guess probabilities in Bayesian knowledge tracing. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems (LNCS 5091), Montreal, QC, Canada, 23–27 June 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 406–415.
- Zeng, G.; Zhuang, J.; Huang, H.; Tian, M.; Gao, Y.; Liu, Y.; Yu, X. Use of Deep Learning for Continuous Prediction of Mortality for All Admissions in Intensive Care Units. Tsinghua Sci. Technol. 2023, 28, 639–648.
- Yang, X.; Esquivel, J.A. Time-aware LSTM neural networks for dynamic personalized recommendation on business intelligence. Tsinghua Sci. Technol. 2023, 29, 185–196.
- Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 765–774.
- Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece, 14–17 October 2019; pp. 156–163.
- Ghosh, A.; Heffernan, N.; Lan, A.S. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 2330–2339.
- Eljialy, A.E.M.; Uddin, M.Y.; Ahmad, S. Novel framework for an intrusion detection system using multiple feature selection methods based on deep learning. Tsinghua Sci. Technol. 2024, 29, 948–958.
- Jiang, Z.; Ning, Z.; Miao, H.; Wang, L. STDNet: A Spatio-Temporal Decomposition Neural Network for Multivariate Time Series Forecasting. Tsinghua Sci. Technol. 2024, 29, 1232–1247.
- Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods; Springer: New York, NY, USA, 2003.
- Granger, C.W.J. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438.
- Didelez, V. Graphical models for marked point processes based on local independence. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 245–264.
- Eichler, M.; Dahlhaus, R.; Dueck, J. Graphical modeling for multivariate Hawkes processes with nonparametric link functions. J. Time Ser. Anal. 2017, 38, 225–242.
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys. 2017, 335, 736–746.
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Machine learning of linear differential equations using Gaussian processes. J. Comput. Phys. 2017, 348, 683–693.
- Owhadi, H. Bayesian numerical homogenization. Multiscale Model. Simul. 2015, 13, 812–828.
- Williams, C.K.I.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
- Xiao, S.; Yan, J.; Farajtabar, M.; Song, L.; Yang, X.; Zha, H. Learning time series associated event sequences with recurrent point process networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3124–3136.
| | ASSIST2012 | ASSISTChall |
|---|---|---|
| Number of students | 28,914 | 1600 |
| Number of knowledge concepts | 265 | 102 |
| Number of problems | 532,090 | 3142 |
| Model | ASSIST2012 AUC | ASSIST2012 RMSE | ASSISTChall AUC | ASSISTChall RMSE |
|---|---|---|---|---|
| BKT | 0.622 | 0.511 | 0.638 | 0.513 |
| DKT | 0.701 | 0.432 | 0.721 | 0.447 |
| DKVMN | 0.685 | 0.437 | 0.710 | 0.450 |
| AKT | 0.769 | 0.414 | 0.766 | 0.431 |
| LPKT | 0.778 | 0.407 | 0.772 | 0.415 |
| TLPKT_PINN | 0.828 | 0.375 | 0.798 | 0.382 |
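The AUC and RMSE metrics reported in the tables can be computed as follows. This is a minimal sketch on toy predictions (the rank-based AUC formulation), not the experimental pipeline:

```python
import numpy as np

def rmse(y_true, y_prob):
    # Root mean squared error between binary responses and predicted probabilities
    y_true, y_prob = np.asarray(y_true, dtype=float), np.asarray(y_prob, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_prob) ** 2)))

def auc(y_true, y_prob):
    # Rank-based AUC: probability a random positive outscores a random negative
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))

y = [1, 0, 1, 1, 0]           # toy observed responses
p = [0.9, 0.2, 0.7, 0.6, 0.4] # toy predicted probabilities (perfectly ranked)
```

On these toy values the ranking is perfect, so AUC is 1.0; real model comparisons use the same definitions over the full test set.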
| Model | ASSIST2012 AUC | ASSIST2012 RMSE | ASSISTChall AUC | ASSISTChall RMSE |
|---|---|---|---|---|
| LPKT_PINN | 0.786 | 0.410 | 0.788 | 0.405 |
| TLPKT_PINN_ability | 0.803 | 0.384 | 0.789 | 0.392 |
| TLPKT | 0.801 | 0.392 | 0.784 | 0.398 |
| TLPKT_1 | 0.792 | 0.405 | 0.781 | 0.399 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lu, F.; Li, Y.; Bao, Y. Deep Knowledge Tracing Integrating Temporal Causal Inference and PINN. Appl. Sci. 2025, 15, 1504. https://doi.org/10.3390/app15031504