An Effective Student Grouping and Course Recommendation Strategy Based on Big Data in Education
Abstract
1. Introduction
- (1) The commonly used feature-word extraction methods are either statistics-based or semantics-based. Statistics-based methods rely on measures such as term frequency–inverse document frequency (TF–IDF) [2], information gain, and word length. Semantics-based methods rely on the HowNet [3] concept base or on ontologies. However, these methods do not represent semantic information comprehensively, especially synonyms. A reliable feature extraction method is therefore needed to characterize students and courses accurately and comprehensively.
- (2) Because students’ grouping labels are hard to obtain in practice, existing research mainly groups students with unsupervised clustering algorithms such as K-means. However, the traditional K-means algorithm initializes cluster centers randomly, which can lead to incorrect or uneven cluster division and therefore to incorrect results. An effective student grouping strategy that improves on the traditional K-means algorithm is therefore needed.
- (3) This paper adopts the item-based collaborative filtering (ItemCF) [4] algorithm to recommend courses to the student groups. However, the traditional ItemCF algorithm has a serious cold-start problem: it computes item similarity from user behavior and recommends similar items, so a new item with no record in the item-related table can never be recommended. How to recommend high-quality courses that match students’ interests while solving the cold-start problem therefore remains to be addressed.
- (1) An accurate feature extraction algorithm for representing students’ characteristics is designed. This paper selects feature words along two dimensions. First, TF–IDF weighting is used to select feature words from the word-frequency dimension. Then, a Word2Vec [5] model is trained to select feature words from the semantic dimension. The two sets of feature words are combined into the final text set, which represents students’ characteristics as fully as possible.
- (2) An improved K-means algorithm is designed to group students. Based on the characteristics of the extracted feature words, this paper constructs a multi-dimensional vector to represent each student. An appropriate grouping is then obtained with the improved K-means algorithm, whose initial cluster center selection is modified so that the distance between the points in a cluster and its initial cluster center is less than a certain value.
- (3) A group-oriented course recommendation method is proposed. This paper introduces a semantic recommendation model and expert scoring to assist course recommendation, thus improving the quality of the recommendation results. Because the number of courses a student selects each semester is generally very small compared to the total number of courses, this paper uses semantic information to solve the serious cold-start problem.
- (4) A series of experiments on real data provided by junior high school students (12 to 15 years old) is conducted to verify the feasibility and effectiveness of the proposed strategy, which can group students of all educational levels and recommend courses (both online and offline) to them.
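The word-frequency half of the feature extraction in contribution (1) can be illustrated with a small, self-contained sketch. It assumes the text has already been segmented into tokens, uses the plain `tf * log(N / df)` weighting (the exact TF–IDF variant is not specified here), and the function name `tfidf_top_words` and the sample records are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_words(docs, top_k=3):
    """Return the top_k words of each document ranked by TF-IDF weight.

    docs is a list of token lists (text already segmented and filtered).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    selected = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: relative term frequency times log(N / df).
        scores = {
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        selected.append(ranked[:top_k])
    return selected


# Hypothetical, already tokenized activity records (one list per student).
records = [
    ["music", "piano", "music", "club"],
    ["reading", "books", "club"],
    ["drone", "flight", "club"],
]
print(tfidf_top_words(records, top_k=2))
```

A word such as "club" that occurs in every record receives weight zero and is never selected, which is the behavior the statistical screening relies on.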
2. Related Works
2.1. Student Grouping Strategy
2.2. Personalized Course Recommendation
3. System Workflow
4. Student Grouping Strategy and Group-Oriented Course Recommendation Method Based on Semantics
4.1. Feature Extraction Based on TF–IDF and Word Vectors
4.1.1. Text Preprocessing
4.1.2. Screening Feature Words from the Statistical Dimension
4.1.3. Screening Feature Words from the Semantic Dimension
4.2. Student Grouping Based on Semantics
4.2.1. Multidimensional Feature Vector Representation Model
4.2.2. Student Grouping Based on an Improved K-Means Algorithm
Algorithm 1 Improved initial cluster center selection algorithm
Input: Threshold n; K; Student data set SV
Output: Cluster centers set C
i = 0, distances = []
while SV is not empty do
    num ← randint(0, len(SV))
    c_i ← SV[num]
    remove c_i from SV
    for each x ∈ SV do
        distances[x] = dist(c_i, x)
    end for
    select first n points from sort(distances) and remove them from SV
    if len(SV) < 2n then
        c_{i+1} = center(SV)
        break
    end if
    i += 1
end while
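A minimal Python sketch of the initial-center selection in Algorithm 1 follows. It is an illustration under stated assumptions, not the authors' implementation: points are numeric tuples, distance is Euclidean, `center(SV)` is taken to be the coordinate-wise mean, and the random index is drawn with `randrange` so that it always falls inside the list. The names `select_initial_centers` and `centroid` are hypothetical. As in the pseudocode, the number of centers follows from the threshold n rather than from K.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def select_initial_centers(points, n, rng=None):
    """Pick initial centers by removing the n nearest neighbours of each seed."""
    rng = rng or random.Random()
    remaining = list(points)  # work on a copy, leave the caller's list intact
    centers = []
    while remaining:
        # A random remaining point becomes the next initial center.
        seed = remaining.pop(rng.randrange(len(remaining)))
        centers.append(seed)
        # Drop the n points closest to the seed from the candidate pool.
        remaining.sort(key=lambda x: math.dist(seed, x))
        remaining = remaining[n:]
        # Fewer than 2n points left: their centroid is the last center.
        if len(remaining) < 2 * n:
            if remaining:
                centers.append(centroid(remaining))
            break
    return centers


# Hypothetical two-dimensional student vectors.
pts = [(float(i), float(i % 3)) for i in range(10)]
centers = select_initial_centers(pts, n=2, rng=random.Random(0))
print(len(centers), centers)
```

Because each seed removes its n nearest neighbours from the pool, later seeds cannot be drawn from the immediate neighbourhood of an earlier one, which is how the procedure spreads the initial centers.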
Algorithm 2 Student grouping method based on an improved K-means algorithm
Input: Student features and TF–IDF values; K in range(M); Tolerance ε
Output: M clusterings, one for each value of K
for each K do
    C ← select K cluster centers by Algorithm 1
    E ← ∞
    while E > ε do
        for each sv ∈ SV do
            assign sv to cluster(c) of the center c ∈ C with minimum dist(sv, c)
        end for
        E ← 0
        for each c ∈ C do
            for each sv ∈ cluster(c) do
                E += dist(sv, c)
            end for
            update c
        end for
    end while
end for
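The assignment and update loop of Algorithm 2 can be sketched in Python as below. This is a hedged illustration: it takes the initial centers as an argument (however they were chosen), uses Euclidean distance, and stops when the change in the total distance E between two iterations drops below `tol`. That stopping rule is an assumption made here, since an absolute E rarely falls below a small tolerance. The name `kmeans_from_centers` is hypothetical.

```python
import math


def kmeans_from_centers(points, centers, tol=1e-6, max_iter=100):
    """Run K-means from the given initial centers; return (centers, clusters)."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    prev_error = math.inf
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # E: total distance between points and their current centers.
        error = sum(math.dist(p, centers[k])
                    for k, members in enumerate(clusters) for p in members)
        # Update step: move each center to the mean of its members.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[k]
            for k, members in enumerate(clusters)
        ]
        # Assumed stopping rule: E no longer changes noticeably.
        if abs(prev_error - error) <= tol:
            break
        prev_error = error
    return centers, clusters


# Hypothetical student vectors forming two obvious groups.
students = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centers, groups = kmeans_from_centers(students, [(0, 0), (10, 10)])
print(final_centers)
```

Running the function once per candidate K, each time with centers produced by the selection step, yields the set of clusterings that the algorithm lists as its output.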
4.3. Personalized Course Recommendation Scheme Based on Student Grouping
Algorithm 3 Personalized Course Recommendation Scheme
Input: Train Set t; Usergroup_id u; Similarity Matrix W; N
Output: Recommendation results Results
Initialize Results ← dict
Initialize ru ← t[u]
for each i, p_i ∈ ru.items() do
    kl ← sorted(W[i].items(), key=itemgetter(1), reverse=True)[0:N]
    for each j, wj ∈ kl do
        if i == j then
            continue
        end if
        Results[j] += p_i * wj
    end for
end for
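Algorithm 3 is close to executable Python already; a runnable sketch is given below. It assumes `t` maps each group to a dictionary of course weights and `W` maps each course to a dictionary of similar courses with similarity values. A `defaultdict` replaces the plain `dict` so that `+=` works on the first visit of a course. The function name `recommend` and the toy data are hypothetical. The sketch follows the pseudocode in skipping only the case `i == j`; a common ItemCF variant instead skips every course the group has already selected.

```python
from collections import defaultdict
from operator import itemgetter


def recommend(train, group_id, W, n_neighbors):
    """Score candidate courses for one group with item-based similarity."""
    results = defaultdict(float)
    ru = train[group_id]
    for i, p_i in ru.items():
        # The n_neighbors courses most similar to course i.
        neighbours = sorted(W.get(i, {}).items(),
                            key=itemgetter(1), reverse=True)[:n_neighbors]
        for j, w_j in neighbours:
            if i == j:  # as in the pseudocode: skip the course itself
                continue
            # Accumulate the group's weight for i times the similarity to j.
            results[j] += p_i * w_j
    # Best-scoring courses first.
    return sorted(results.items(), key=itemgetter(1), reverse=True)


# Hypothetical training data and similarity matrix.
train = {"g1": {"A": 1.0, "B": 1.0}}
W = {
    "A": {"B": 0.5, "C": 0.4},
    "B": {"A": 0.5, "C": 0.9, "D": 0.1},
}
ranked = recommend(train, "g1", W, n_neighbors=2)
print(ranked)
```

Course "C" is reached from both selected courses, so its two contributions add up, while "D" never enters the ranking because it is outside the two nearest neighbours of "B".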
5. Experimental Results and Analysis
5.1. Data Set
5.2. Evaluation Metrics
5.2.1. Evaluation Metrics for the Student Grouping Method
5.2.2. Evaluation Metrics for the Course Recommendation Algorithm
5.3. Experimental Results
5.3.1. Verification of the Feature Extraction Method
5.3.2. Verification of the Student Grouping Method
5.3.3. Verification of the Course Recommendation Method
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sofroniou, A.; Poutos, K. Investigating the Effectiveness of Group Work in Mathematics. Educ. Sci. 2016, 6, 30.
- Martineau, J.; Finin, T. Delta TFIDF: An Improved Feature Space for Sentiment Analysis. In Proceedings of the 2009 3rd AAAI International Conference on Weblogs and Social Media (ICWSM), San Jose, CA, USA, 17–20 May 2009; Available online: https://ojs.aaai.org/index.php/ICWSM/article/view/13979/13828 (accessed on 18 October 2021).
- Dong, Z.; Qiang, D.; Hao, C. HowNet and its computation of meaning. In Proceedings of the 2010 23rd International Conference on Computational Linguistics, Beijing, China, 23–27 August 2010.
- Linden, G.; Smith, B.; York, J. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 2003, 7, 76–80.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. In Proceedings of the 2013 1st International Conference on Learning Representations (ICLR), Scottsdale, AZ, USA, 2–4 May 2013.
- So, H.; Brush, T. Student perceptions of collaborative learning, social presence and satisfaction in a blended learning environment: Relationships and critical factors. Comput. Educ. 2008, 51, 318–336.
- Ruane, R. A Study of Student Interaction in an Online Learning Environment Specially Crafted for Cross-Level Peer Mentoring. Ph.D. Thesis, Philadelphia, PA, USA, 2012.
- Liu, Q.; Ba, S.; Huang, J.; Wu, L.; Lao, C. A study on grouping strategy of collaborative learning based on clustering algorithm. In Proceedings of the International Conference on Blended Learning, Hong Kong, China, 27–29 June 2017; Springer: Cham, Switzerland, 2017; pp. 284–294.
- Tacadao, G.; Toledo, R.P. Forming Student Groups with Student Preferences Using Constraint Logic Programming. In Proceedings of the 2016 17th International Conference on Artificial Intelligence: Methodology, Systems, and Applications (AIMSA), Varna, Bulgaria, 7–10 September 2016.
- Pang, Y.; Xiao, F.; Wang, H.; Xue, X. A Clustering-Based Grouping Model for Enhancing Collaborative Learning. In Proceedings of the 2014 13th International Conference on Machine Learning and Applications (ICMLA), Detroit, MI, USA, 3–6 December 2014.
- Zhu, T.B.; Wang, L.; Wang, D. Features of Group Online Learning Behaviors Based on Data Mining. Int. J. Emerg. Technol. Learn. 2022, 17, 34–47.
- Wang, Y.H.; Tseng, M.H.; Liao, H.C. Data mining for adaptive learning sequence in English language instruction. Expert Syst. Appl. 2009, 36, 7681–7686.
- Aher, S.B.; Lobo, L.M.R.J. Combination of machine learning algorithms for recommendation of courses in E-Learning System based on historical data. Knowl. Based Syst. 2013, 51, 1–14.
- Meson, G.; Dragovich, J. Program assessment and evaluation using student grades obtained on outcome-related course learning objectives. J. Prof. Issues Eng. Educ. Pract. 2010, 27, 1315–1318.
- Manouselis, N.; Sampson, D. Agent-Based E-Learning Course Recommendation: Matching Learner Characteristics with Content Attributes. Int. J. Comput. Appl. 2003, 25, 50–64.
- Xiao, J.; Wang, M.; Jiang, B.; Li, J. A personalized recommendation system with combinational algorithm for online learning. J. Ambient Intell. Humanized Comput. 2018, 9, 667–677.
- Wang, H.; Zhang, P.; Lu, T.; Gu, H.; Gu, N. Hybrid recommendation model based on incremental collaborative filtering and content-based algorithms. In Proceedings of the IEEE 21st International Conference on Computer Supported Cooperative Work in Design (CSCWD), Wellington, New Zealand, 26–28 April 2017; pp. 337–342.
- Parameswaran, A.; Venetis, P.; Garcia-Molina, H. Recommendation systems with complex constraints. ACM Trans. Inf. Syst. 2011, 29, 1–33.
- Si, H.J. Big Data-Assisted Recommendation of Personalized Learning Resources and Teaching Decision Support. Int. J. Emerg. Technol. Learn. 2022, 17, 19–32.
- Jieba Chinese Word Segmentation Tool. Available online: https://github.com/fxsjy/jieba (accessed on 12 November 2021).
- Prospere, K.; McLaren, K.; Wilson, B. Plant Species Discrimination in a Tropical Wetland Using in Situ Hyperspectral Data. Remote Sens. 2014, 6, 8494–8523.
Vocabulary | Similar Words |
---|---|
Reading | Extracurricular reading materials, classics, storybooks, reading notes, required reading, intensive reading, bibliography, books, reading |
Music | Light music, arrangement, singing, pure music, melody, electric sound, rock music, concert, piano, guitar |
Unmanned aerial vehicle | Aerial photography, control, rotor, flight, aircraft, remote control, helicopter, aircraft, fixed wing |
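Similar-word lists like those in the table above are what a word-vector model returns when it is queried for the nearest neighbours of a word. The sketch below shows the underlying idea with cosine similarity over a tiny hand-made embedding table. The three-dimensional vectors are invented purely for illustration and are not taken from any trained Word2Vec model; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n=2):
    """Return the top_n words whose vectors are closest to that of word."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_n]]


# Toy vectors, made up for this example only.
toy_vectors = {
    "music": [0.9, 0.1, 0.0],
    "piano": [0.8, 0.2, 0.1],
    "melody": [0.85, 0.05, 0.05],
    "rotor": [0.0, 0.1, 0.9],
}
print(most_similar("music", toy_vectors, top_n=2))
```

With a model trained in a library such as gensim, the same query would typically go through the model's own nearest-neighbour lookup rather than a hand-written loop.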
Data Set | Information | Example | Data Set Size |
---|---|---|---|
Student information | Student ID | 10037952 | 27,275 items |
Student information | School name | Beijing No. 66 High School | 27,275 items |
Comprehensive quality evaluation data | Student ID/Activity type | 10037952/Practice | 270 M |
Comprehensive quality evaluation data | Activity record | I participate in the open science practice “Playing with the Solar system”. I learned that the center of the solar system is the sun and I also learned about the eight planets. | 270 M |
Open social practice course data | Course ID/Course name | 10013-c2/Centrifugal force in life | 1018 items |
Open social practice course data | Expert score | 64 | 1018 items |
Open social practice course data | Course record | By understanding the structure of the dryer, students can learn the relevant scientific knowledge of centrifugal movement. | 1018 items |
Course selection records | Student ID/Course ID | 10037952/636 | 133,047 items |
Algorithms | Significance Level (α) | p | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 | Sample 11 | Sample 12 | Sample 13 | Sample 14 | Sample 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Traditional Algorithm | 0.05 | 0.004203 | 1.34 | 1.88 | 1.86 | 1.51 | 1.56 | 1.48 | 2.31 | 1.33 | 1.29 | 1.39 | 1.21 | 1.49 | 1.90 | 1.61 | 2.13 |
Improved Algorithm | 0.05 | 0.004203 | 0.98 | 1.60 | 1.43 | 1.21 | 1.33 | 1.23 | 1.89 | 0.93 | 0.96 | 0.99 | 0.96 | 1.34 | 1.39 | 1.26 | 1.69 |
School | School_1 | School_2 | School_3 | School_4 |
---|---|---|---|---|
Accuracy of clustering | 93.2% | 90.5% | 91.6% | 90.2% |
Algorithms | Metrics | Average | 2 results | 3 results | 4 results | 5 results | 6 results | 7 results | 8 results | 9 results | 10 results |
---|---|---|---|---|---|---|---|---|---|---|---|
Traditional Algorithm | Precision | 0.69 | 0.73 | 0.76 | 0.78 | 0.76 | 0.72 | 0.68 | 0.64 | 0.61 | 0.53 |
Traditional Algorithm | Recall | 0.65 | 0.66 | 0.71 | 0.72 | 0.72 | 0.68 | 0.65 | 0.61 | 0.59 | 0.55 |
Traditional Algorithm | Popularity | 0.77 | 0.77 | 0.79 | 0.82 | 0.81 | 0.79 | 0.76 | 0.73 | 0.72 | 0.71 |
Improved Algorithm | Precision | 0.63 | 0.70 | 0.74 | 0.74 | 0.71 | 0.67 | 0.62 | 0.55 | 0.51 | 0.50 |
Improved Algorithm | Recall | 0.60 | 0.62 | 0.67 | 0.68 | 0.68 | 0.61 | 0.59 | 0.55 | 0.52 | 0.48 |
Improved Algorithm | Popularity | 0.31 | 0.31 | 0.32 | 0.31 | 0.30 | 0.31 | 0.30 | 0.31 | 0.32 | 0.30 |
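Precision and recall figures such as those in the table above are commonly computed for top-N recommendation by comparing each recommended list with the courses actually selected. The sketch below uses that standard definition as an assumption, since the metric definitions themselves are not reproduced in this text. Popularity is left out because its normalization cannot be recovered from the table alone. The function name and the toy data are hypothetical.

```python
def precision_recall(recommended, selected):
    """Aggregate top-N precision and recall over all groups.

    recommended: dict mapping group -> list of recommended course IDs
    selected:    dict mapping group -> set of course IDs actually chosen
    """
    hits = total_recommended = total_selected = 0
    for group, rec_list in recommended.items():
        actual = selected.get(group, set())
        # A hit is a recommended course that the group really selected.
        hits += len(set(rec_list) & actual)
        total_recommended += len(rec_list)
        total_selected += len(actual)
    precision = hits / total_recommended if total_recommended else 0.0
    recall = hits / total_selected if total_selected else 0.0
    return precision, recall


# Hypothetical recommendation lists and held-out selections.
recommended = {"g1": ["A", "B", "C"], "g2": ["D", "E"]}
selected = {"g1": {"A", "C", "F"}, "g2": {"E"}}
precision, recall = precision_recall(recommended, selected)
print(precision, recall)
```

Lengthening the recommended lists raises the denominator of precision, and it can only raise recall if the extra courses produce new hits, so the two metrics have to be read together for each list length.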
Recommended Courses | Word Cloud of Specific Words |
---|---|
Exploration of 3D holographic projection | [word cloud image not available] |
Human body sensor car | |
UAV flight principles and aerial photography experience | |
From recording metal to memory metal | |
Overview of rockets | |
Luban No. 7 | |
Small world—Leeuwenhoek microscope | |
Hydraulic mechanical arm production | |
Mortise and tenon chair | |
Homemade remote-control vehicle |
Data Set | Average Precision | Average Recall Rate | Average Popularity | Average Course Score |
---|---|---|---|---|
School_1 | 0.65 | 0.61 | 0.36 | 0.70 |
School_2 | 0.69 | 0.68 | 0.33 | 0.68 |
School_3 | 0.61 | 0.57 | 0.31 | 0.64 |
School_4 | 0.58 | 0.55 | 0.31 | 0.62 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, Y.; Chen, Y.; Xie, Y.; Ban, X. An Effective Student Grouping and Course Recommendation Strategy Based on Big Data in Education. Information 2022, 13, 197. https://doi.org/10.3390/info13040197