BPSKT: Knowledge Tracing with Bidirectional Encoder Representation Model Pre-Training and Sparse Attention
Abstract
1. Introduction
- (1) A new knowledge tracing model, BPSKT, is proposed. It focuses on the characteristics of response pairs in long interaction sequences and employs a suitable neural network for feature extraction, yielding a better solution to the long-sequence problem;
- (2) The self-attention in BERT is modified into sparse attention (see the sketch following this list); after fine-tuning and the decoders' output, the learned representations can be extended to other downstream tasks and achieve good results on datasets from other domains;
- (3) Extensive validation experiments were conducted on multiple knowledge tracing datasets, and several metrics were used to analyze the results, demonstrating the logical soundness and structural effectiveness of the model.
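Contribution (2) hinges on replacing BERT's dense self-attention, whose cost grows quadratically with sequence length, with a sparse pattern. The paper's exact pattern is not reproduced here; as a minimal sketch in the spirit of Child et al.'s sparse transformers, the NumPy snippet below restricts single-head attention to a fixed local window, where the window radius `w`, the sequence length, and the embedding size are all illustrative assumptions.

```python
import numpy as np

def sparse_local_attention(Q, K, V, w=4):
    """Single-head attention in which position i may attend only to
    positions j with |i - j| <= w (a fixed local-window sparsity
    pattern; illustrative, not the paper's exact scheme)."""
    n, d = Q.shape
    scores = (Q @ K.T) / np.sqrt(d)                  # (n, n) scaled dot products
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= w  # allowed (i, j) pairs
    scores = np.where(mask, scores, -np.inf)         # block everything else
    scores -= scores.max(axis=1, keepdims=True)      # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                               # (n, d) attended values

# Toy usage: 16 interactions with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(16, 8))
print(sparse_local_attention(Q, K, V, w=2).shape)    # (16, 8)
```

Because each query attends to only O(w) keys instead of all n, such patterns reduce attention cost from quadratic toward linear in sequence length, which is what makes long response sequences tractable. The dense mask above is for clarity only; an efficient implementation would compute just the banded entries.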
2. Related Work
2.1. Development of Knowledge Tracing
2.2. Evolution of Attention Mechanisms
3. Method
3.1. Knowledge Tracing Problem Set
3.2. BPSKT Methodology
Algorithm 1: BPSKT complete algorithm
Input: LRS (long response sequence). Output: PSAV (predicted student ability values).
3.3. Graph Convolutional Neural Networks (GCN)
- (1) The identity matrix is added to the adjacency matrix, giving every node a self-loop so that it also propagates its own features;
- (2) The resulting adjacency matrix is normalized symmetrically by multiplying it on both sides by the inverse square root of the node degree matrix, as sketched below.
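Taken together, these two steps are the standard GCN propagation rule H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W), with D the degree matrix of A + I. The NumPy sketch below illustrates one such layer; the toy adjacency matrix, features, and weights are invented for the example and are not the paper's data.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: relu(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # step (1): add self-loops
    deg = A_hat.sum(axis=1)                   # degrees of the self-looped graph
    D_inv_sqrt = np.diag(deg ** -0.5)         # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # step (2): symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # aggregate, transform, ReLU

# Toy usage: 4 concept nodes, 3-dim features, a 3 -> 2 weight matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
print(gcn_layer(A, H, W).shape)               # (4, 2)
```

The self-loops keep each node's own features in the aggregation, and the symmetric scaling prevents high-degree nodes from dominating their neighbors' representations.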
3.4. Sparse Attention Mechanism
3.5. BPSKT Pre-training and Fine-tuning
3.6. Response Prediction Network
4. Experiment
4.1. Comparative Experiments on the Dataset
4.1.1. Datasets
4.1.2. Comparison of Models
4.1.3. Parameter Setting
4.2. Experimental Details and Results
4.2.1. Comparative Experiments
4.2.2. Experimental Ablation Studies
4.2.3. Visualization of Experimental Studies
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Nwana, H.S. Intelligent tutoring systems: An overview. Artif. Intell. Rev. 1990, 4, 251–277.
- Abdelrahman, G.; Wang, Q.; Nunes, B. Knowledge tracing: A survey. ACM Comput. Surv. 2023, 55, 1–37.
- Song, X.; Li, J.; Cai, T.; Yang, S.; Yang, T.; Liu, C. A survey on deep learning based knowledge tracing. Knowl.-Based Syst. 2022, 258, 110036.
- Anderson, J.R. Cognitive Modelling and Intelligent Tutoring; Psychology Press: Hillsdale, NJ, USA, 1986.
- Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1994, 4, 253–278.
- Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28, 505–513.
- Wang, L.; Sy, A.; Liu, L.; Piech, C. Deep knowledge tracing on programming exercises. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, Cambridge, MA, USA, 20–21 April 2017; pp. 201–204.
- Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 May 2017; pp. 765–774.
- Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece, 14–17 October 2019; pp. 156–163.
- Tan, W.; Jin, Y.; Liu, M.; Zhang, H. BiDKT: Deep knowledge tracing with BERT. In Proceedings of the International Conference on Ad Hoc Networks, Virtual, 6–7 December 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 260–278.
- Song, X.; Li, J.; Lei, Q.; Zhao, W.; Chen, Y.; Mian, A. Bi-CLKT: Bi-graph contrastive learning based knowledge tracing. Knowl.-Based Syst. 2022, 241, 108274.
- Su, Y.; Liu, Q.; Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Ding, C.; Wei, S.; Hu, G. Exercise-enhanced sequential modeling for student performance prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating long sequences with sparse transformers. arXiv 2019, arXiv:1904.10509.
- Tong, H.; Wang, Z.; Zhou, Y.; Tong, S.; Han, W.; Liu, Q. HGKT: Introducing hierarchical exercise graph for knowledge tracing. arXiv 2020, arXiv:2006.16915.
- Asselman, A.; Khaldi, M.; Aammou, S. Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interact. Learn. Environ. 2021, 31, 3360–3379.
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014.
- Bahdanau, D. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Sammani, F.; Melas-Kyriazi, L. Show, edit and tell: A framework for editing image captions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4808–4816.
- Wojna, Z.; Gorban, A.N.; Lee, D.S.; Murphy, K.; Yu, Q.; Li, Y.; Ibarz, J. Attention-based extraction of structured information from street view imagery. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; Volume 1, pp. 844–850.
- Vaswani, A. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Ghosh, A.; Heffernan, N.; Lan, A.S. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 2330–2339.
- Pandey, S.; Karypis, G. A self-attentive model for knowledge tracing. arXiv 2019, arXiv:1907.06837.
- Luo, Y.; Xiao, B.; Jiang, H.; Ma, J. Heterogeneous graph based knowledge tracing. In Proceedings of the 2022 11th International Conference on Educational and Information Technology (ICEIT), Chengdu, China, 6–8 January 2022; pp. 226–231.
- Yang, Y.; Shen, J.; Qu, Y.; Liu, Y.; Wang, K.; Zhu, Y.; Zhang, W.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, 14–18 September 2020; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2021; pp. 299–315.
- Wei, L.; Li, B.; Li, Y.; Zhu, Y. Time interval aware self-attention approach for knowledge tracing. Comput. Electr. Eng. 2022, 102, 108179.
- Graves, A. Adaptive computation time for recurrent neural networks. arXiv 2016, arXiv:1603.08983.
- Roy, A.; Saffar, M.; Vaswani, A.; Grangier, D. Efficient content-based sparse attention with routing transformers. Trans. Assoc. Comput. Linguist. 2021, 9, 53–68.
- Ye, Z.; Guo, Q.; Gan, Q.; Qiu, X.; Zhang, Z. BP-Transformer: Modelling long-range context via binary partitioning. arXiv 2019, arXiv:1911.04070.
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. arXiv 2020, arXiv:2001.04451.
- Lei Ba, J.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- Xin, J.; Tang, R.; Lee, J.; Yu, Y.; Lin, J. DeeBERT: Dynamic early exiting for accelerating BERT inference. arXiv 2020, arXiv:2004.12993.
- Choi, Y.; Lee, Y.; Shin, D.; Cho, J.; Park, S.; Lee, S.; Baek, J.; Bae, C.; Kim, B.; Heo, J. EdNet: A large-scale hierarchical dataset in education. In Proceedings of the Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, 6–10 July 2020; Proceedings, Part II. Springer: Berlin/Heidelberg, Germany, 2020; pp. 69–73.
- Shen, S.; Liu, Q.; Chen, E.; Wu, H.; Huang, Z.; Zhao, W.; Su, Y.; Ma, H.; Wang, S. Convolutional knowledge tracing: Modeling individualization in student learning process. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 1857–1860.
- Guo, X.; Huang, Z.; Gao, J.; Shang, M.; Shu, M.; Sun, J. Enhancing knowledge tracing via adversarial training. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–29 October 2021; pp. 367–375.
Statistics of the five benchmark datasets:

| Dataset | Questions | Students | Concepts | Interactions | Public |
|---|---|---|---|---|---|
| ASSISTments2009 | 26,688 | 4217 | 123 | 325,600 | Yes |
| ASSISTments2015 | 100 | 19,917 | - | 683,566 | Yes |
| ASSISTments2017 | 1680 | 3155 | 411 | 870,866 | Yes |
| Statics2011 | 80 | 335 | 1362 | 350,192 | Yes |
| EdNet | 11,658 | 10,000 | 290 | 687,265 | Yes |
Comparison of BPSKT with baseline models (each cell reports AUC / ACC):

| Dataset | DKT | DKVMN | CKT | Bi-CLKT | AKT | SAKT | BPSKT |
|---|---|---|---|---|---|---|---|
| ASSISTments2009 | 0.8600 / 0.8385 | 0.8157 / 0.8003 | 0.8254 / 0.8477 | 0.8377 / 0.8022 | 0.8346 / 0.8379 | 0.8480 / 0.8157 | 0.8785 / 0.8576 |
| ASSISTments2015 | 0.7365 / 0.7125 | 0.7268 / 0.7022 | 0.7291 / 0.7355 | 0.7652 / 0.7575 | 0.7828 / 0.7883 | 0.8540 / 0.8621 | 0.8288 / 0.8438 |
| ASSISTments2017 | 0.7343 / 0.7055 | 0.6853 / 0.6695 | 0.7119 / 0.7345 | 0.7450 / 0.7642 | 0.7702 / 0.7725 | 0.7340 / 0.7369 | 0.7865 / 0.8038 |
| Statics2011 | 0.8233 / 0.8038 | 0.8284 / 0.8089 | 0.8241 / 0.8229 | 0.8321 / 0.8472 | 0.8268 / 0.8191 | 0.8530 / 0.8324 | 0.8692 / 0.8522 |
| EdNet | 0.7638 / 0.7452 | 0.7663 / 0.7079 | 0.7327 / 0.7491 | 0.7756 / 0.7792 | 0.7686 / 0.7756 | 0.7513 / 0.7073 | 0.8005 / 0.8241 |
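The table above reports AUC and ACC for every model. The paper does not specify its evaluation code; as a hedged sketch, both metrics can be computed from predicted correctness probabilities with scikit-learn (an assumed dependency), thresholding at 0.5 for accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical example: true correctness labels and predicted probabilities.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.3, 0.7, 0.6, 0.4, 0.8, 0.5, 0.2])

auc = roc_auc_score(y_true, y_prob)                        # threshold-free ranking quality
acc = accuracy_score(y_true, (y_prob >= 0.5).astype(int))  # 0.5-thresholded correctness
print(f"AUC = {auc:.4f}, ACC = {acc:.4f}")
```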
Training efficiency of all models:

| Efficiency | DKT | DKVMN | CKT | Bi-CLKT | AKT | SAKT | BPSKT |
|---|---|---|---|---|---|---|---|
| Training time | 13 h 32 min | 12 h 57 min | 13 h 3 min | 10 h 15 min | 9 h 27 min | 9 h 55 min | 7 h 52 min |
| Prediction time | 7 h 23 min | 7 h 40 min | 8 h 10 min | 6 h 47 min | 6 h 12 min | 6 h 28 min | 5 h 44 min |
| Memory usage (used/total, GB) | 6.2/7.9 (78%) | 6.7/7.9 (85%) | 6.0/7.9 (76%) | 5.6/7.9 (71%) | 5.2/7.9 (66%) | 5.6/7.9 (71%) | 4.5/7.9 (57%) |
Ablation study results (AUC):

| Dataset | BPSKT-MS | BPSKT-NG | BPSKT-NS | BPSKT-NAC | BPSKT |
|---|---|---|---|---|---|
| ASSISTments2009 | 0.8611 | 0.8689 | 0.8555 | 0.8726 | 0.8785 |
| ASSISTments2015 | 0.8099 | 0.8117 | 0.8257 | 0.8232 | 0.8288 |
| ASSISTments2017 | 0.7692 | 0.7621 | 0.7583 | 0.7752 | 0.7865 |
| Statics2011 | 0.8585 | 0.8617 | 0.8406 | 0.8603 | 0.8692 |
| EdNet | 0.7852 | 0.7731 | 0.7685 | 0.7967 | 0.8005 |