Continual Pre-Training of Language Models for Concept Prerequisite Learning with Graph Neural Networks
Abstract
1. Introduction
- A two-stage framework for concept prerequisite learning is proposed. In the first stage, the pre-trained language model is enhanced by two continual pre-training tasks to obtain better textual representations; in the second stage, textual and structural information are fused, and the prerequisite relationships between concepts are predicted end-to-end;
- A joint optimization approach for R-GCN and the pre-trained language model is proposed, with hinge loss as an auxiliary training objective. Instead of using the two models separately as feature extractors, joint training allows them to gradually generate concept representations better suited to the concept prerequisite prediction task (a sketch of such a combined objective follows this list);
- Extensive experiments were conducted on three real-world datasets to evaluate the proposed model. The experimental results demonstrate the effectiveness of the proposed model compared to all baseline models.
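The following is a minimal sketch of how such a combined objective might be written, assuming binary cross-entropy as the main link-prediction loss and a margin-based hinge term over positive/negative concept pairs; the margin value and the weighting coefficient `alpha` are illustrative assumptions, not the authors' reported settings.

```python
# Sketch of a joint objective: BCE on predicted prerequisite links plus a hinge
# (margin) term that pushes scores of true prerequisite pairs above sampled
# negative pairs. Margin and alpha are assumptions for illustration.
import torch
import torch.nn.functional as F

def joint_loss(logits, labels, pos_scores, neg_scores, margin=1.0, alpha=0.5):
    """logits/labels: link-prediction outputs and ground truth for the BCE term;
    pos_scores/neg_scores: scores of positive and sampled negative concept pairs."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    # Hinge term: penalize negatives that score within `margin` of positives.
    hinge = torch.clamp(margin - (pos_scores - neg_scores), min=0).mean()
    return bce + alpha * hinge
```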
2. Related Works
2.1. Concept Prerequisite Prediction as Text Matching
2.2. Concept Prerequisite Prediction as Link Prediction
2.3. Continual Pre-Training of Language Models
3. Preliminaries
3.1. Resource–Concept Graph
3.2. Task Formulation
4. Method
4.1. Continual Pre-Training Stage
4.1.1. Masked Language Model
4.1.2. Relationship Discrimination
4.2. Joint Learning Stage
4.2.1. Text Encoder BERT
4.2.2. Graph Encoder R-GCN
4.2.3. Joint Learning Layer
5. Experiments
5.1. Experimental Setup
5.1.1. Datasets and Evaluation Metrics
- The University Course dataset (UCD) [29], which includes 654 computer science courses from universities in the USA and 407 concepts. The dataset also provides prerequisite relationships between courses and between concepts. For edges between courses and concepts, we assumed that a relationship existed if a concept appeared in the course captions;
- The LectureBank dataset (LBD) [17], which includes lecture files and topics from five domains: artificial intelligence, machine learning, natural language processing, deep learning, and information retrieval. We treated lecture files as resources and topics as concepts, and used the hierarchical relationships between topics. For resource edge construction, we computed the cosine similarity between lecture file embeddings and set the threshold to 0.9 (a sketch of this thresholding step follows this list);
- The MOOC dataset (MD) [9], which contains 382 MOOC video texts of computer science courses, and covers the same topics and the same number of concept prerequisite relationship pairs as the University Course dataset. Edges between resources and concepts were constructed in the same way as for the UCD.
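Below is a minimal sketch of the threshold-based resource edge construction mentioned for the LBD, assuming lecture files have already been embedded as fixed-size vectors (e.g., averaged word vectors or sentence embeddings); everything apart from the 0.9 threshold is an illustrative assumption.

```python
# Build resource-resource edges by thresholding pairwise cosine similarity.
import numpy as np

def build_resource_edges(embeddings: np.ndarray, threshold: float = 0.9):
    """Return (i, j) index pairs of resources whose cosine similarity exceeds the threshold."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    sim = normalized @ normalized.T                      # pairwise cosine similarity
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sim[i, j] > threshold]
```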
5.1.2. Baseline Models
- PREREQ [9]: this method uses the pairwise latent Dirichlet allocation model to obtain latent representations of concepts and feeds them into a Siamese network to infer the relationships between concepts;
- BERT-base [11]: we fine-tuned BERT on each dataset and used the [CLS] vector of concept pairs for prediction;
- VGAE [13]: this unsupervised model combines the ideas of autoencoders and variational inference; it samples latent variables from a multidimensional Gaussian distribution, and the decoder predicts the label based on the latent variables;
- R-GCN [37]: the core idea of R-GCN is to learn embedding vectors of nodes and relationships; it has strong scalability in dealing with multiple types of relationships;
- R-GCN (BERT): using the textual representations of concepts from BERT as the initialization of the R-GCN node features (see the sketch after this list);
- MHAVGAE [12]: this model constructs a resource–concept heterogeneous graph, initializes node features with word2vec [39], and then uses multi-head attention and gating mechanisms to enhance concept representations; finally, it uses a variational graph autoencoder to predict the relationships between concepts.
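The following is a minimal sketch of the R-GCN (BERT) baseline described above: concept nodes are initialized with BERT [CLS] representations and propagated through relational graph convolutions, with prerequisite scores computed over candidate pairs. The model name, dimensions, and the dot-product scoring head are assumptions for illustration, not the authors' exact configuration.

```python
# R-GCN over the resource-concept graph with BERT-initialized node features.
import torch
from transformers import BertTokenizer, BertModel
from torch_geometric.nn import RGCNConv

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_concepts(concepts):
    """Use the [CLS] vector of each concept phrase as its initial node feature."""
    inputs = tokenizer(concepts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return bert(**inputs).last_hidden_state[:, 0]     # shape: (num_concepts, 768)

class RGCNEncoder(torch.nn.Module):
    def __init__(self, in_dim=768, hidden_dim=128, num_relations=3):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden_dim, num_relations)
        self.conv2 = RGCNConv(hidden_dim, hidden_dim, num_relations)

    def forward(self, x, edge_index, edge_type):
        h = torch.relu(self.conv1(x, edge_index, edge_type))
        return self.conv2(h, edge_index, edge_type)

def score_pairs(node_embeddings, pairs):
    """Prerequisite score for (source, target) concept index pairs via a dot product."""
    src, dst = pairs[:, 0], pairs[:, 1]
    return (node_embeddings[src] * node_embeddings[dst]).sum(dim=-1)
```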
5.1.3. Implementation Details
5.2. Main Results
5.3. Ablation Study
5.4. Model Analysis
5.4.1. Hyperparameter
5.4.2. Embedding Dimension
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Changuel, S.; Labroche, N.; Bouchon-Meunier, B. Resources Sequencing Using Automatic Prerequisite-Outcome Annotation. ACM Trans. Intell. Syst. Technol. 2015, 6, 1–30.
2. Lu, Y.; Chen, P.; Pian, Y.; Zheng, V.W. CMKT: Concept Map Driven Knowledge Tracing. IEEE Trans. Learn. Technol. 2022, 15, 467–480.
3. Gao, W.; Liu, Q.; Huang, Z.; Yin, Y.; Bi, H.; Wang, M.; Ma, J.; Wang, S.; Su, Y. RCD: Relation Map Driven Cognitive Diagnosis for Intelligent Education Systems. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 501–510.
4. Manrique, R.; Nunes, B.P.; Mariño, O.; Cardozo, N.; Siqueira, S.W.M. Towards the Identification of Concept Prerequisites Via Knowledge Graphs. In Proceedings of the 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT), Maceio, Brazil, 15–18 July 2019; pp. 332–336.
5. Gordon, J.; Zhu, L.; Galstyan, A.; Natarajan, P.; Burns, G. Modeling Concept Dependencies in a Scientific Corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
6. Chen, W.; Lan, A.S.; Cao, D.; Brinton, C.G.; Chiang, M. Behavioral Analysis at Scale: Learning Course Prerequisite Structures from Learner Clickstreams. In Proceedings of the International Conference on Educational Data Mining, Raleigh, NC, USA, 16–20 July 2018.
7. Liang, C.; Wu, Z.; Huang, W.; Giles, C.L. Measuring Prerequisite Relations Among Concepts. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1668–1674.
8. Jia, C.; Shen, Y.; Tang, Y.; Sun, L.; Lu, W. Heterogeneous Graph Neural Networks for Concept Prerequisite Relation Learning in Educational Data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2036–2047.
9. Roy, S.; Madhyastha, M.; Lawrence, S.; Rajan, V. Inferring Concept Prerequisite Relations from Online Educational Resources. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019; pp. 9589–9594.
10. Li, I.; Fabbri, A.R.; Hingmire, S.; Radev, D.R. R-VGAE: Relational-Variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain, 8–13 December 2020; pp. 1147–1157.
11. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
12. Zhang, J.; Lin, N.; Zhang, X.; Song, W.; Yang, X.; Peng, Z. Learning Concept Prerequisite Relations from Educational Data via Multi-Head Attention Variational Graph Auto-Encoders. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual, 21–25 February 2022; pp. 1377–1385.
13. Kipf, T.N.; Welling, M. Variational Graph Auto-Encoders. arXiv 2016, arXiv:1611.07308.
14. Liu, H.; Ma, W.; Yang, Y.; Carbonell, J.G. Learning Concept Graphs from Online Educational Data. J. Artif. Intell. Res. 2016, 55, 1059–1090.
15. Pan, L.; Li, C.; Li, J.; Tang, J. Prerequisite Relation Learning for Concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1447–1456.
16. Li, B.; Peng, B.; Shao, Y.; Wang, Z. Prerequisite Learning with Pre-trained Language and Graph Embedding Models. In Natural Language Processing and Chinese Computing, Proceedings of the 10th CCF International Conference, NLPCC 2021, Qingdao, China, 13–17 October 2021, Proceedings, Part II; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 13029, pp. 98–108.
17. Li, I.; Fabbri, A.R.; Tung, R.R.; Radev, D.R. What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019; pp. 6674–6681.
18. Li, I.; Yan, V.; Li, T.; Qu, R.; Radev, D.R. Unsupervised Cross-Domain Prerequisite Chain Learning Using Variational Graph Autoencoders. arXiv 2021, arXiv:2105.03505.
19. Shen, J.T.; Yamashita, M.; Prihar, E.; Heffernan, N.T.; Wu, X.; Lee, D. MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education. arXiv 2021, arXiv:2106.07340.
20. Liu, X.; Yin, D.; Zheng, J.; Zhang, X.; Zhang, P.; Yang, H.; Dong, Y.; Tang, J. OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 3418–3428.
21. Gong, Z.; Zhou, K.; Zhao, X.; Sha, J.; Wang, S.; Wen, J. Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 5923–5933.
22. Ke, P.; Ji, H.; Liu, S.; Zhu, X.; Huang, M. SentiLR: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis. arXiv 2019, arXiv:1911.02493.
23. Zhou, W.; Lee, D.; Selvam, R.K.; Lee, S.; Ren, X. Pre-training Text-to-Text Transformers for Concept-centric Common Sense. arXiv 2020, arXiv:2011.07956.
24. Li, J.; Zhang, Z.; Zhao, H.; Zhou, X.; Zhou, X. Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation. arXiv 2020, arXiv:2009.04984.
25. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223.
26. Levine, Y.; Lenz, B.; Dagan, O.; Ram, O.; Padnos, D.; Sharir, O.; Shalev-Shwartz, S.; Shashua, A.; Shoham, Y. SenseBERT: Driving Some Sense into BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4656–4667.
27. Qin, Y.; Lin, Y.; Takanobu, R.; Liu, Z.; Li, P.; Ji, H.; Huang, M.; Sun, M.; Zhou, J. ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 3350–3363.
28. Pan, L.; Wang, X.; Li, C.; Li, J.; Tang, J. Course Concept Extraction in MOOCs via Embedding-Based Graph Propagation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing, Taipei, Taiwan, 27 November–1 December 2017; Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; pp. 875–884.
29. Liang, C.; Ye, J.; Wu, Z.; Pursel, B.; Giles, C.L. Recovering Concept Prerequisite Relations from University Course Dependencies. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; AAAI Press: Washington, DC, USA, 2017; pp. 4786–4791.
30. Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 1735–1742.
31. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G.E. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the International Conference on Machine Learning, Proceedings of Machine Learning Research, Virtual, 13–18 July 2020; Volume 119, pp. 1597–1607.
32. Chen, T.; Sun, Y.; Shi, Y.; Hong, L. On Sampling Strategies for Neural Network-based Collaborative Filtering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 767–776.
33. Gutmann, M.; Hyvärinen, A. Noise-Contrastive Estimation: A New Estimation Principle for Unnormalized Statistical Models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 297–304.
34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
35. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2013; Volume 26.
36. Gururangan, S.; Marasovic, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8342–8360.
37. Schlichtkrull, M.S.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the 15th International Conference, ESWC 2018, Heraklion, Greece, 3–7 June 2018; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 10843, pp. 593–607.
38. Wu, Y.; Zhao, S.; Li, W. Phrase2Vec: Phrase Embedding Based on Parsing. Inf. Sci. 2020, 517, 100–127.
39. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the ICLR (Workshop Poster), Scottsdale, AZ, USA, 2–4 May 2013.
40. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45.
41. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the ICLR (Poster), New Orleans, LA, USA, 6–9 May 2019.
Dataset | #Concepts | #Resources | | |
---|---|---|---|---|---
UCD | 407 | 654 | 1008 | 861 | 580
LBD | 307 | 250 | 471 | 995 | 265
MD | 406 | 382 | 1004 | 1404 | 3634
Dataset | Method | ACC | F1 | AP | AUC
---|---|---|---|---|---
UCD | PREREQ ⋄ | 0.5433 | 0.5866 | 0.5309 | 0.6702
UCD | BERT-base | 0.6916 | 0.6635 | 0.6412 | 0.7433
UCD | DAPT BERT | 0.7173 | 0.7085 | 0.7497 | 0.7944
UCD | R-GCN | 0.6450 | 0.5989 | 0.6333 | 0.6548
UCD | VGAE | 0.6700 | 0.6413 | 0.7534 | 0.6972
UCD | R-GCN (BERT) | 0.6100 | 0.5244 | 0.5964 | 0.6200
UCD | R-VGAE | 0.6950 | 0.6772 | 0.8073 | 0.7661
UCD | MHAVGAE | 0.7450 | 0.7330 | 0.8201 | 0.7797
UCD | TCPL (ours) | 0.8088 | 0.7900 | 0.8434 | 0.8668
LBD | PREREQ ⋄ | 0.4875 | 0.5130 | 0.5032 | 0.5557
LBD | BERT-base | 0.6526 | 0.6207 | 0.6143 | 0.6516
LBD | DAPT BERT | 0.6526 | 0.6374 | 0.7677 | 0.7176
LBD | R-GCN | 0.5394 | 0.5921 | 0.5870 | 0.5840
LBD | VGAE | 0.5904 | 0.5792 | 0.5733 | 0.6053
LBD | R-GCN (BERT) | 0.5120 | 0.5239 | 0.5357 | 0.5536
LBD | R-VGAE | 0.6538 | 0.6764 | 0.6467 | 0.6338
LBD | MHAVGAE | 0.6774 | 0.6899 | 0.7608 | 0.7256
LBD | TCPL (ours) | 0.7737 | 0.7774 | 0.8380 | 0.8393
MD | PREREQ ⋄ | 0.5429 | 0.5746 | 0.5286 | 0.6248
MD | BERT-base | 0.7645 | 0.7776 | 0.8313 | 0.8461
MD | DAPT BERT | 0.7628 | 0.7851 | 0.8428 | 0.8564
MD | R-GCN | 0.6500 | 0.5532 | 0.5742 | 0.6208
MD | VGAE | 0.6550 | 0.5818 | 0.7371 | 0.7045
MD | R-GCN (BERT) | 0.6120 | 0.5346 | 0.5333 | 0.6037
MD | R-VGAE | 0.7050 | 0.7204 | 0.7978 | 0.7544
MD | MHAVGAE | 0.7485 | 0.7653 | 0.8832 | 0.8789
MD | TCPL (ours) | 0.8400 | 0.8411 | 0.9115 | 0.9076
Method | UCD ACC | UCD F1 | UCD AP | UCD AUC | LBD ACC | LBD F1 | LBD AP | LBD AUC | MD ACC | MD F1 | MD AP | MD AUC
---|---|---|---|---|---|---|---|---|---|---|---|---
TCPL | 0.8088 | 0.7900 | 0.8434 | 0.8668 | 0.7737 | 0.7774 | 0.8380 | 0.8393 | 0.8400 | 0.8411 | 0.9115 | 0.9076
-w/o BERT | 0.6450 | 0.5989 | 0.6333 | 0.6548 | 0.5394 | 0.5921 | 0.5870 | 0.5840 | 0.6500 | 0.5532 | 0.5742 | 0.6408
-w/o R-GCN | 0.6916 | 0.6635 | 0.6412 | 0.7433 | 0.6526 | 0.6207 | 0.6143 | 0.6516 | 0.7645 | 0.7776 | 0.8313 | 0.8461
-w/o hinge loss | 0.8062 | 0.7877 | 0.7768 | 0.8416 | 0.7333 | 0.7039 | 0.8176 | 0.7958 | 0.7463 | 0.7584 | 0.7026 | 0.7555
-w/o BCE loss | 0.4493 | 0.6177 | 0.4828 | 0.5372 | 0.5395 | 0.5270 | 0.5494 | 0.5528 | 0.5249 | 0.6056 | 0.5907 | 0.5841
-w/o CP | 0.8017 | 0.7798 | 0.8366 | 0.8608 | 0.7605 | 0.7598 | 0.8163 | 0.8135 | 0.8192 | 0.8256 | 0.8812 | 0.8870