An Actor-Critic Hierarchical Reinforcement Learning Model for Course Recommendation
Abstract
1. Introduction
- We propose an actor-critic hierarchical reinforcement learning model (ACHRL) that optimizes the profile reviser of the HRL model and improves the accuracy of course recommendation.
- We propose a policy gradient method based on the temporal difference (TD) error, which updates the agent's policy using the reward at the current step together with the states immediately before and after it (see the formula after this list). This speeds up model convergence and improves the accuracy of the recommendation model.
- We conducted experiments on two MOOC datasets in which most users are enrolled in courses from several different categories; the proposed model shows a significant improvement over the baseline models across the evaluation metrics.
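For reference, the second contribution builds on the standard one-step TD error (our rendering; the paper's numbered equations are not reproduced in this extract):

$$\delta_t = r_t + \gamma\,\hat{v}(s_{t+1}, w) - \hat{v}(s_t, w),$$

so the policy gradient is scaled by a signal that combines the reward at the current moment with the value estimates of the states immediately before and after the action, rather than waiting for a full-episode return.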
2. Related Work
2.1. Course Recommendation
2.2. RL-Based Course Recommendation
3. Preliminaries
3.1. Policy Gradient Method
3.2. Actor-Critic Method
4. Methodologies
4.1. Definitions and Problem Formalization
4.2. Framework
4.2.1. Profile Reviser Module
4.2.2. Policy Gradient Based on Temporal Difference Error
Algorithm 1 Actor-Critic Method

Input: a differentiable policy parameterization $\pi(a|s,\theta)$; a differentiable state-value parameterization $\hat{v}(s,w)$
Initialize: policy parameters $\theta$ and state-value weights $w$
1: for each episode do
2: Generate a sampling sequence {s1, a1, r1, s2, a2, r2, …} following $\pi(\cdot|\cdot,\theta)$
3: for data at each step $t$ do
4: $\delta \leftarrow r_t + \gamma\,\hat{v}(s_{t+1},w) - \hat{v}(s_t,w)$
5: $w \leftarrow w + \alpha_w\,\delta\,\nabla_w \hat{v}(s_t,w)$
6: $\theta \leftarrow \theta + \alpha_\theta\,\delta\,\nabla_\theta \ln \pi(a_t|s_t,\theta)$
7: end for
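To make Algorithm 1 concrete, here is a minimal NumPy sketch of the one-step actor-critic update. It is an illustration under our own assumptions (linear function approximation, a softmax policy, and arbitrary step sizes), not the authors' implementation:

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class ActorCritic:
    """One-step actor-critic with linear function approximation (a sketch of Algorithm 1)."""
    def __init__(self, n_features, n_actions, alpha_theta=0.01, alpha_w=0.05, gamma=0.99):
        self.theta = np.zeros((n_actions, n_features))  # policy parameters
        self.w = np.zeros(n_features)                   # state-value weights
        self.alpha_theta, self.alpha_w, self.gamma = alpha_theta, alpha_w, gamma

    def policy(self, phi_s):
        return softmax(self.theta @ phi_s)              # pi(a | s, theta)

    def value(self, phi_s):
        return self.w @ phi_s                           # v_hat(s, w)

    def update(self, phi_s, a, r, phi_s_next, done):
        # TD error: delta = r + gamma * v(s') - v(s)   (line 4 of Algorithm 1)
        target = r if done else r + self.gamma * self.value(phi_s_next)
        delta = target - self.value(phi_s)
        # critic: w <- w + alpha_w * delta * grad_w v(s, w)   (line 5)
        self.w += self.alpha_w * delta * phi_s
        # actor: theta <- theta + alpha_theta * delta * grad_theta ln pi(a|s)   (line 6)
        probs = self.policy(phi_s)
        grad_ln_pi = -np.outer(probs, phi_s)
        grad_ln_pi[a] += phi_s
        self.theta += self.alpha_theta * delta * grad_ln_pi
        return delta
```

The TD error `delta` plays the role described in the second contribution: it scales both the critic and the actor updates at every step, instead of relying on a Monte Carlo return available only at the end of an episode.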
4.2.3. Objective Function
4.3. Attention-Based Recommendation Module
4.4. Two Separable Components of ACHRL
4.4.1. High-Level Task Optimization: ACHRL_H
4.4.2. Low-Level Task Optimization: ACHRL_L
4.5. Model Training
Algorithm 2 Actor-Critic-Based Hierarchical Reinforcement Learning

Input: training data $\varepsilon_u$; pre-trained recommendation model parameterized by $\Theta$; pre-trained profile reviser parameterized by $\Phi = \{w_1, w_2, b\}$; differentiable state-value functions $\hat{v}_h(s,w)$ and $\hat{v}_l(s,w)$
Initialize: $\Theta$, $\Phi$, and both state-value estimates to 0
1: for sequence $k = 1$ to $K$ do
2: for each $\varepsilon_u = (e_{u1}, \ldots)$ and $c_i$ do
3: Sample a high-level action $a_h$ with $\Phi$ in the high-level task;
4: if $a_h$ keeps the profile unrevised then
5: Leave the user profile $\varepsilon_u$ unchanged;
6: else
7: Sample a sequence of states and actions with $\Phi$ in the low-level task;
8: Calculate the high-level and low-level temporal difference errors;
9: Calculate the gradients using Equations (9) and (10) according to Algorithm 1;
10: end if
11: end for
12: Update parameters $\Phi$ and $w$ by the gradients;
13: Update parameter $\Theta$ by the recommendation module;
14: Output the recommendation probability using Equation (13);
15: end for
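The control flow of Algorithm 2 can be mirrored in a self-contained toy sketch. Everything below is our own construction under strong simplifying assumptions: profiles and courses are random feature vectors, the cosine-similarity reward is a synthetic stand-in for the recommendation-driven reward of Equations (9), (10), and (13), and both levels use Bernoulli policies updated with the TD-style error from Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reward(kept, target):
    """Synthetic reward: cosine similarity between the kept-profile mean and the target course."""
    if not kept:
        return -1.0  # discourage deleting the whole profile
    m = np.mean(kept, axis=0)
    return float(m @ target / (np.linalg.norm(m) * np.linalg.norm(target) + 1e-8))

n_feat = 8
theta_h = np.zeros(n_feat)   # high-level actor: revise the profile or not
theta_l = np.zeros(n_feat)   # low-level actor: keep or drop each course
w_h = np.zeros(n_feat)       # linear critic weights
alpha = 0.05

for episode in range(2000):
    courses = [rng.normal(size=n_feat) for _ in range(5)]   # a toy user profile
    target = rng.normal(size=n_feat)                        # the target course c_i
    s_h = np.mean(courses, axis=0)                          # high-level state: profile summary

    p_revise = sigmoid(theta_h @ s_h)
    revise = rng.random() < p_revise                        # high-level action a_h
    if revise:
        keep_probs = [sigmoid(theta_l @ c) for c in courses]
        keeps = [rng.random() < p for p in keep_probs]      # low-level actions
        kept = [c for c, k in zip(courses, keeps) if k]
    else:
        kept = courses                                      # profile left unchanged
    r = reward(kept, target)

    delta = r - w_h @ s_h                 # one-step TD error (terminal transition)
    w_h += alpha * delta * s_h            # critic update
    grad_h = (1.0 - p_revise) * s_h if revise else -p_revise * s_h
    theta_h += alpha * delta * grad_h     # high-level actor update
    if revise:
        for c, k, p in zip(courses, keeps, keep_probs):
            grad_l = ((1.0 - p) if k else -p) * c
            theta_l += alpha * delta * grad_l   # low-level actor update
```

The point of the sketch is the shape of the loop: one high-level decision per profile, a sequence of low-level keep/drop decisions only when revision is chosen, and a shared TD error driving both actors and the critic.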
5. Experiments
5.1. Dataset
5.2. Experimental Setup
5.2.1. Compared Methods
- MLP [52]: A deep-learning recommendation model that captures complex user–item interactions with a multilayer perceptron to provide personalized recommendations.
- FISM [13]: A factored item similarity model for top-N collaborative filtering, suited to settings where users' historical behavior is the primary input.
- NeuMF [55]: A model that combines matrix factorization with an MLP to mine latent information from user–course interactions and recommend relevant courses to users.
- NARM [56]: An optimized gated recurrent model that estimates attention weights from users' behavior and their main purpose within a session.
- NAIS [14]: An item-based collaborative filtering model with an attention network that learns to weight different historical courses when making course recommendations.
- HRRL [45]: An HRL-based method that uses time-context rewards to optimize policy learning for course recommendation.
- DARL [46]: A course recommendation framework that captures user preferences from historical data to improve the effectiveness of course recommendations.
- HRL [12]: The hierarchical reinforcement learning model that jointly trains the recommendation model and the profile reviser.
- ACHRL_H: A simplified version of ACHRL that applies the actor-critic optimization only to the high-level task of the profile reviser.
- ACHRL_L: A simplified version of ACHRL that applies the actor-critic optimization only to the low-level task of the profile reviser.
5.2.2. Evaluation Metrics
- HR@K (Hit Ratio at K): Measures how often the relevant items are successfully included in the top-K recommendations. A recommendation counts as successful for a user when the held-out course appears in that user's top-K list, as formalized below.
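A standard leave-one-out formulation (our rendering; the paper's own equation is not reproduced in this extract) is:

$$\mathrm{HR@}K = \frac{\text{Number of Hits@}K}{|GT|},$$

where a hit means the user's held-out test course appears in the top-$K$ list and $|GT|$ is the size of the ground-truth test set.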
- NDCG@K (Normalized Discounted Cumulative Gain at K): A cumulative ranking measure that accounts for both the relevance and the position of the ranked items, rewarding hits that appear nearer the top of the list. It can be defined as follows:
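A standard formulation (again our rendering, assuming binary relevance and the same leave-one-out protocol as HR@K) is:

$$\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{2^{rel_i} - 1}{\log_2(i+1)}, \qquad \mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K},$$

which, with a single relevant course per user, reduces to $1/\log_2(p+1)$ for a hit at rank $p$ and 0 otherwise. A minimal per-user computation in Python:

```python
import math

def hr_ndcg_at_k(ranked_courses, held_out, k=10):
    """Leave-one-out HR@K and NDCG@K for one user (binary relevance)."""
    top_k = ranked_courses[:k]
    if held_out not in top_k:
        return 0.0, 0.0
    rank = top_k.index(held_out) + 1          # 1-based position of the hit
    return 1.0, 1.0 / math.log2(rank + 1)     # IDCG = 1 for one relevant item
```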
5.2.3. Parameters and Environment
5.3. Results and Analysis
5.3.1. Comparison of Experimental Results
5.3.2. Ablation Experiment
5.3.3. Influence of Hyper-Parameters
5.3.4. Performance Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Saadatdoost, R.; Sim, A.T.H.; Jafarkarimi, H.; Mei Hee, J. Exploring MOOC from education and Information Systems perspectives: A short literature review. Educ. Rev. 2015, 67, 505–518.
- Cheng, J.; Yuen, A.H.; Chiu, D.K. Systematic review of MOOC research in mainland China. Libr. Hi Tech 2022, 41, 1476–1497.
- Atiaja, L.A.; Proenza, R. The MOOCs: Origin, characterization, principal problems and challenges in Higher Education. J. e-Learn. Knowl. Soc. 2016, 12, 65–76.
- Laurillard, D. The educational problem that MOOCs could solve: Professional development for teachers of disadvantaged students. Res. Learn. Technol. 2016, 24, 30–53.
- Xu, M.; Deng, J.; Zhao, T. On Status Quo, Problems, and Future Development of Translation and Interpreting MOOCs in China: A Mixed Methods Approach. J. Interact. Media Educ. 2020, 2020, 367–371.
- Parameswaran, A.; Venetis, P.; Garcia-Molina, H. Recommendation systems with complex constraints: A course recommendation perspective. ACM Trans. Inf. Syst. (TOIS) 2011, 29, 1–33.
- Zhang, H.; Huang, T.; Lv, Z.; Liu, S.; Zhou, Z. MCRS: A course recommendation system for MOOCs. Multimed. Tools Appl. 2018, 77, 7051–7069.
- Jiang, W.; Pardos, Z.A.; Wei, Q. Goal-based course recommendation. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge, Tempe, AZ, USA, 4–8 March 2019; pp. 36–45.
- Ma, B.; Lu, M.; Taniguchi, Y.; Konomi, S.I. CourseQ: The impact of visual and interactive course recommendation in university environments. Res. Pract. Technol. Enhanc. Learn. 2021, 16, 18.
- Thanh-Nhan, H.L.; Nguyen, H.H.; Thai-Nghe, N. Methods for building course recommendation systems. In Proceedings of the 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE), Hanoi, Vietnam, 6–8 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 163–168.
- Khalid, A.; Lundqvist, K.; Yates, A. A literature review of implemented recommendation techniques used in Massive Open online Courses. Expert Syst. Appl. 2022, 187, 115926.
- Zhang, J.; Hao, B.; Chen, B.; Li, C.; Chen, H.; Sun, J. Hierarchical reinforcement learning for course recommendation in MOOCs. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 435–442.
- Kabbur, S.; Ning, X.; Karypis, G. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 659–667.
- He, X.; He, Z.; Song, J.; Liu, Z.; Jiang, Y.G.; Chua, T.S. NAIS: Neural attentive item similarity model for recommendation. IEEE Trans. Knowl. Data Eng. 2018, 30, 2354–2366.
- Zhang, L.; Zhang, L. Top-N recommendation algorithm integrated neural network. Neural Comput. Appl. 2021, 33, 3881–3889.
- Zhao, X.; Zhang, Z.; Bi, X.; Sun, Y. A new point-of-interest group recommendation method in location-based social networks. Neural Comput. Appl. 2020, 35, 12945–12956.
- Jiang, X.; Sun, H.; Zhang, B.; He, L.; Jia, X. A novel meta-graph-based attention model for event recommendation. Neural Comput. Appl. 2022, 34, 14659–14682.
- Liu, H.; Wang, Y.; Lin, H.; Xu, B.; Zhao, N. Mitigating sensitive data exposure with adversarial learning for fairness recommendation systems. Neural Comput. Appl. 2022, 34, 18097–18111.
- Ren, Y.; Liang, K.; Shang, Y.; Zhang, Y. MulOER-SAN: 2-layer multi-objective framework for exercise recommendation with self-attention networks. Knowl. Based Syst. 2023, 260, 110117.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 1999, 12, 1057–1063.
- O’Doherty, J.P.; Dayan, P.; Friston, K.; Critchley, H.; Dolan, R.J. Temporal difference models and reward-related learning in the human brain. Neuron 2003, 38, 329–337.
- Li, J.; Ye, Z. Course recommendations in online education based on collaborative filtering recommendation algorithm. Complexity 2020, 2020, 6619249.
- Ghauth, K.I.; Abdullah, N.A. The effect of incorporating good learners’ ratings in e-Learning content-based recommender System. J. Educ. Technol. Soc. 2011, 14, 248–257.
- Xu, G.; Jia, G.; Shi, L.; Zhang, Z. Personalized course recommendation system fusing with knowledge graph and collaborative filtering. Comput. Intell. Neurosci. 2021, 2021, 9590502.
- Emon, M.I.; Shahiduzzaman, M.; Rakib, M.R.H.; Shathee, M.S.A.; Saha, S.; Kamran, M.N.; Fahim, J.H. Profile Based Course Recommendation System Using Association Rule Mining and Collaborative Filtering. In Proceedings of the 2021 International Conference on Science & Contemporary Technologies (ICSCT), Dhaka, Bangladesh, 5–7 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
- Gao, M.; Luo, Y.; Hu, X. Online course recommendation using deep convolutional neural network with negative sequence mining. Wirel. Commun. Mob. Comput. 2022, 2022, 9054149.
- Wang, X.; Ma, W.; Guo, L.; Jiang, H.; Liu, F.; Xu, C. HGNN: Hyperedge-based graph neural network for MOOC course recommendation. Inf. Process. Manag. 2022, 59, 102938.
- Ren, X.; Yang, W.; Jiang, X.; Jin, G.; Yu, Y. A deep learning framework for multimodal course recommendation based on LSTM+attention. Sustainability 2022, 14, 2907.
- Moerland, T.M.; Broekens, J.; Plaat, A.; Jonker, C.M. Model-based reinforcement learning: A survey. Found. Trends® Mach. Learn. 2023, 16, 1–118.
- Rohde, D.; Bonner, S.; Dunlop, T.; Vasile, F.; Karatzoglou, A. RecoGym: A reinforcement learning environment for the problem of product recommendation in online advertising. arXiv 2018, arXiv:1808.00720.
- Wang, X.; Chen, W.; Wu, J.; Wang, Y.F.; Wang, W.Y. Video captioning via hierarchical reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4213–4222.
- Lanctot, M.; Zambaldi, V.; Gruslys, A.; Lazaridou, A.; Tuyls, K.; Pérolat, J.; Graepel, T. A unified game-theoretic approach to multiagent reinforcement learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4193–4206.
- François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found. Trends® Mach. Learn. 2018, 11, 219–354.
- Henderson, P.; Islam, R.; Bachman, P.; Pineau, J.; Precup, D.; Meger, D. Deep reinforcement learning that matters. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Li, S.E. Deep Reinforcement Learning. In Reinforcement Learning for Sequential Decision and Optimal Control; Springer Nature: Singapore, 2023; pp. 365–402.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Hassabis, D. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Wiering, M.A.; Van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729.
- Mo, S.; Pei, X.; Wu, C. Safe reinforcement learning for autonomous vehicle using Monte Carlo tree search. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6766–6773.
- Wang, J.X.; Kurth-Nelson, Z.; Tirumala, D.; Soyer, H.; Leibo, J.Z.; Munos, R.; Botvinick, M. Learning to reinforcement learn. arXiv 2016, arXiv:1611.05763.
- Pateria, S.; Subagdja, B.; Tan, A.H.; Quek, C. Hierarchical reinforcement learning: A comprehensive survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
- Nachum, O.; Gu, S.S.; Lee, H.; Levine, S. Data-efficient hierarchical reinforcement learning. Adv. Neural Inf. Process. Syst. 2018, 31, 3307–3317.
- Botvinick, M.M. Hierarchical reinforcement learning and decision making. Curr. Opin. Neurobiol. 2012, 22, 956–962.
- Lin, Y.; Lin, F.; Zeng, W.; Xiahou, J.; Li, L.; Wu, P.; Miao, C. Hierarchical reinforcement learning with dynamic recurrent mechanism for course recommendation. Knowl. Based Syst. 2022, 244, 108546.
- Lin, Y.; Lin, F.; Yang, L.; Zeng, W.; Liu, Y.; Wu, P. Context-aware reinforcement learning for course recommendation. Appl. Soft Comput. 2022, 125, 109189.
- Lin, Y.; Feng, S.; Lin, F.; Zeng, W.; Liu, Y.; Wu, P. Adaptive course recommendation in MOOCs. Knowl. Based Syst. 2021, 224, 107085.
- Nachum, O.; Norouzi, M.; Xu, K.; Schuurmans, D. Bridging the gap between value and policy based reinforcement learning. Adv. Neural Inf. Process. Syst. 2017, 30, 2772–2782.
- Howard, R.A. Dynamic Programming and Markov Processes; John Wiley: Hoboken, NJ, USA, 1960.
- Garcia, F.; Rachelson, E. Markov decision processes. In Markov Decision Processes in Artificial Intelligence; Wiley Online Library: Hoboken, NJ, USA, 2013; pp. 1–38.
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.A.; Mikolov, T. DeViSE: A deep visual-semantic embedding model. Adv. Neural Inf. Process. Syst. 2013, 26, 2121–2129.
- Lahitani, A.R.; Permanasari, A.E.; Setiawan, N.A. Cosine similarity to determine similarity measure: Study case in online essay assessment. In Proceedings of the 2016 4th International Conference on Cyber and IT Service Management, Bandung, Indonesia, 26–27 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
- Taud, H.; Mas, J.F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Cham, Switzerland, 2018; pp. 451–455.
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
- Poth, C.; Pfeiffer, J.; Rücklé, A.; Gurevych, I. What to pre-train on? Efficient intermediate task selection. arXiv 2021, arXiv:2104.08247.
- Liu, H.; Yu, J.; Chen, X.; Zhang, L. NeuMF: Predicting Anti-cancer Drug Response Through a Neural Matrix Factorization Model. Curr. Bioinform. 2022, 17, 835–847.
- Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428.
- Dalianis, H. Evaluation metrics and evaluation. In Clinical Text Mining: Secondary Use of Electronic Patient Records; Springer: Cham, Switzerland, 2018; pp. 45–53.
- Von Lücken, C.; Barán, B.; Brizuela, C. A survey on multi-objective evolutionary algorithms for many-objective problems. Comput. Optim. Appl. 2014, 58, 707–756.
- Alibabaei, K.; Gaspar, P.D.; Assunção, E.; Alirezazadeh, S.; Lima, T.M.; Soares, V.N.; Caldeira, J.M. Comparison of on-policy deep reinforcement learning A2C with off-policy DQN in irrigation optimization: A case study at a site in Portugal. Computers 2022, 11, 104.
Statistics of the two MOOC datasets:

| Dataset | Courses | Users | Interactions | Average Interactions |
|---|---|---|---|---|
| MOOCCourse | 1302 | 82,535 | 458,453 | 5.55 |
| MOOCCube | 706 | 55,203 | 190,049 | 3.44 |
Performance comparison of all models; the left four metric columns are on MOOCCourse and the right four on MOOCCube:

| Model | HR@5 | HR@10 | NDCG@5 | NDCG@10 | HR@5 | HR@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|---|---|
| MLP | 52.53 | 66.74 | 40.61 | 40.96 | 51.62 | 66.55 | 40.00 | 43.58 |
| FISM | 53.12 | 65.89 | 40.63 | 45.13 | 52.85 | 65.80 | 40.50 | 45.52 |
| NeuMF | 54.20 | 67.25 | 42.06 | 46.05 | 54.25 | 67.50 | 41.72 | 46.00 |
| NARM | 54.23 | 69.37 | 42.54 | 47.24 | 54.12 | 69.50 | 41.85 | 47.20 |
| NAIS | 56.05 | 68.98 | 43.58 | 47.69 | 56.02 | 69.53 | 43.50 | 47.23 |
| HRL | 59.84 | 75.00 | 44.50 | 50.95 | 58.45 | 72.05 | 44.87 | 49.28 |
| HRRL | 61.36 | 78.29 | 45.82 | 51.70 | - | - | - | - |
| DARL | 63.12 | 77.63 | 48.53 | 53.25 | - | - | - | - |
| ACHRL_H | 63.61 | 77.21 | 46.07 | 51.18 | 61.95 | 76.98 | 45.42 | 50.67 |
| ACHRL_L | 64.96 | 78.04 | 48.58 | 52.86 | 62.91 | 77.13 | 46.33 | 51.07 |
| ACHRL | 66.19 | 78.42 | 49.84 | 53.84 | 64.03 | 78.40 | 46.35 | 51.40 |
Hyper-parameter settings of the recommendation module:

| Recommendation Module | Course Embedding Size | Course Hidden Layer Size | Batch Size | Learning Rate |
|---|---|---|---|---|
| HRRL | 16 | 16 | 256 | 0.02 |
| DARL | 16 | 16 | 256 | 0.02 |

Hyper-parameter settings of the profile reviser module:

| Profile Reviser Module | Sampling Time N | Discount Coefficient | Hidden Layer Size | Learning Rate (Pre-Training/Joint Training) |
|---|---|---|---|---|
| HRRL | 4 | 0.5 | 8 | 0.001/0.005 |
| DARL | 3 | 0.5 | 8 | 0.001/0.005 |
Case study comparing the profiles revised by ACHRL and HRL and the resulting recommendations:

| Case | Model | Revised User Profile | Recommended Result |
|---|---|---|---|
| (1) | ACHRL | Data Structure, Java, Assembly Language, Software Engineering | Software Engineering (√) |
| (1) | HRL | Data Structure, Java, Economics, Data Structure, Software Engineering | Organic Chemistry (×) |
| (2) | ACHRL | Monetary and Financial Studies, Investment Studies, Corporate Finance | Principles of Economics (√) |
| (2) | HRL | Operating Systems, Monetary and Financial Studies, Investment Studies | Software Engineering (×) |