1. Summary
Higher educational institutions (HEIs) employ a variety of learning approaches based on information and communications technology (ICT). These approaches involve different learning environments that ease the teaching and learning process and the dissemination of knowledge to learners. Moreover, these environments keep track of users and their interactions for auditing and recovery purposes. The resulting logs provide stakeholders with valuable learning data and, when analyzed effectively, can help to provide a better learning experience. Reports generated for different users and courses can be used to evaluate the efficacy of the courses and the progress of the learners. Such insights can help to cater to different learning styles, determine the complexity of courses, identify specific parts of the content that cause problems in understanding the concepts, and anticipate the future performance of learners.
Many HEIs use machine learning (ML) to discover associated patterns from these learning environments for better decision making and datamining (DM) to improve decision-making models using artificial intelligence (AI). HEIs require educational datamining (EDM) for a better understanding of learners’ behaviors in these learning environments that have the potential to impact educational practices [1].
The provided dataset was collected from students of Middle East College (MEC), Muscat, Oman, studying a computing specialization in the sixth semester and above. It comprises historical data about student academics (extracted from the student information system, SIS), student logs (extracted from Moodle, where the time spent engaging in activities was considered), and video interactions on blended learning material (extracted from logs of the mobile application eDify). For the students’ academic data, the SIS parameters included student demographic data, academic data, degree plan, and academic integrity violations (AIVs); of these, the academic and AIV data were considered. For the students’ activities, the Moodle log parameters included logs of course activity, logs of site activity, live logs, site administration settings, and view log capabilities, although only logs of course activity were considered. In contrast, the eDify logs consist of video interactions for each student, with attributes such as played, paused, likes, and number of segments replayed within the video, all of which were selected. The Supplementary Material provided with this paper contains the raw and filtered data.
Many studies have used datamining to predict student performance [2,3,4,5,6,7]. These studies have primarily focused on demographic data and on predictions based on activities performed in the online environment. However, limited research has analyzed the video interactions of learners in a video-assisted course [8,9,10,11,12]. The provided dataset contains SIS, Moodle, and video-assisted course data (eDify), which can help researchers to understand video learning analytics using EDM, thereby enhancing the teaching and learning process.
The dataset is intended for predicting student performance using EDM. It contains 326 observations, where each observation represents an individual student and has 40 attributes. The dataset allows the research community to benchmark EDM tasks performed on longitude and latitude datasets. This can help to understand student academic performance (SAP) modeling and prediction using datamining techniques [13,14]. Furthermore, it can be combined with data from other online environments such as Moodle and online video streaming to understand the behaviors of learners [15,16,17].
2. Data Description
The presented dataset was classified into three categories: student academic information (Section 2.1 and Section 2.2), student activity (Section 2.3 and Section 2.4), and student video interactions (Section 2.5). First, student academic information was collected from SIS. Second, student activity information was collected from the activities performed on Moodle. Lastly, student video interactions were collected from the mobile application “eDify”. Figure 1 shows how the dataset was formed.
2.1. Student Academic Information
Ten comma-separated value (CSV) files named “KMS Module <Number> <Semester>”, which contain the “Know My Student” detail features with 20 attributes, were extracted from SIS. Table 1 summarizes these attributes, accompanied by a brief description.
2.2. Student Academic Performance
Ten comma-separated value (CSV) files named “Result Module <Number> <Semester>”, containing the overall results in the modules, were extracted from SIS with six attributes (including “RollNumber”, “ApplicantName” and “Session”) and three new attributes. Table 2 summarizes these attributes, accompanied by a brief description.
2.3. Student Moodle Logs
Ten comma-separated value (CSV) files named “Moodle Module <Number> <Semester>”, containing nine attributes, were extracted. Table 3 summarizes these attributes, accompanied by a brief description.
2.4. Student Online Activity on Moodle
Ten comma-separated value (CSV) files named “Activity Module <Number> <Semester>”, containing “RollNumber”, “ApplicantName” and two new attributes, were extracted. Table 4 summarizes these attributes, accompanied by a brief description.
2.5. Student Video Interaction
Ten comma-separated value (CSV) files named “VL Module <Number> <Semester>”, containing “RollNumber”, “ApplicantName” and four new attributes, were extracted. Table 5 summarizes these attributes, accompanied by a brief description.
2.6. Data Pre-Processing
The pre-processing .csv file contained the consolidated data, from which the following 24 attributes were selected for this study: “ModuleCode”, “ModuleTitle”, “SessionName”, “ApplicantName”, “CGPA”, “AttemptCount”, “RemoteStudent”, “Probation”, “HighRisk”, “TermExceeded”, “AtRisk”, “AtRiskSSC”, “OtherModules”, “PlagiarismHistory”, “CW1”, “CW2”, “ESE”, “Online C”, “Online O”, “Played”, “Paused”, “Likes”, “Segment” and “Result” (the outcome of the student, either passed or failed in the module, mapped according to the grading scheme shown in Table 6). Eight attributes were converted from numeric to ordinal values, as shown in Table 7, and three attributes were converted from different numeric values to ordinal values, as shown in Table 8. The criteria used to convert the grading scheme and marks to ordinal values were in line with the assessment evaluation classification range used at MEC. This conversion was carried out to map the outcome of the target variable “Result”.
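The numeric-to-ordinal conversion described above can be illustrated with a simple binning step. This is a minimal sketch only: the cut points and labels are placeholders rather than the ranges of Table 7 or Table 8, and `to_ordinal` is a hypothetical helper name that does not come from the dataset or from any tool used in the study.

```python
from bisect import bisect_right

# Hypothetical cut points and labels; the real ranges are those of Tables 7 and 8.
CUT_POINTS = [50, 60, 70, 80]  # lower bounds of the second to fifth bands
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def to_ordinal(mark, cut_points=CUT_POINTS, labels=LABELS):
    """Map a numeric mark to an ordinal label using sorted lower bounds."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # bisect_right counts how many lower bounds the mark has reached.
    return labels[bisect_right(cut_points, mark)]


# Invented marks for the three assessment attributes named in the text.
marks = {"CW1": 72.5, "CW2": 49.0, "ESE": 80.0}
ordinal = {name: to_ordinal(value) for name, value in marks.items()}
print(ordinal)
```

Keeping the cut points in one sorted list makes it straightforward to swap in the institution's actual classification ranges without touching the mapping logic.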
3. Methods
Before starting the data collection, the first step was to identify the modules. The data were then extracted from SIS, Moodle, and eDify. Figure 2 shows the design, materials, and methods used in the process. The raw data were collected from SIS in two phases. First, the “Know My Student” details were extracted for the chosen modules. Second, the results of those specific students in the particular modules were extracted. The log files from Moodle and eDify were extracted for the selected modules and passed to data cleansing. After the data cleansing, the files were merged into a single consolidated .csv file for pre-processing. Figure 2 also shows how the data were collected, processed, and made available to the datamining tool (Orange) to predict the students’ academic performance using SIS, Moodle activity data, and video interactions through the mobile application.
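The consolidation step, in which the cleansed SIS, Moodle, and eDify files are merged into one .csv file, might look like the following sketch. It assumes that every source carries a shared per-student key; the column name `StudentCode`, the helper names, and all sample values are hypothetical, since the paper does not state how the coded identities were joined.

```python
import csv
import io

# Tiny in-memory stand-ins for the cleansed SIS, Moodle, and eDify files.
# "StudentCode" is a hypothetical join key; all values are invented.
SIS = "StudentCode,CGPA,CW1\nS1,3.1,70\nS2,2.4,55\n"
MOODLE = "StudentCode,Online C,Online O\nS1,120,45\nS2,30,200\n"
EDIFY = "StudentCode,Played,Paused,Likes,Segment\nS1,12,5,1,3\n"


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_key(sources, key="StudentCode"):
    """Left-join the later sources onto the first one using a shared key."""
    base, *others = sources
    merged = [dict(row) for row in base]
    for rows in others:
        lookup = {row[key]: row for row in rows}
        # Columns contributed by this source, excluding the join key.
        extra_cols = [c for c in rows[0] if c != key] if rows else []
        for row in merged:
            match = lookup.get(row[key], {})
            for col in extra_cols:
                row[col] = match.get(col, "")  # left blank when no match exists
    return merged


consolidated = merge_on_key([read_rows(SIS), read_rows(MOODLE), read_rows(EDIFY)])

# Write the consolidated rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(consolidated[0]))
writer.writeheader()
writer.writerows(consolidated)
print(out.getvalue())
```

A left join on the SIS rows keeps one record per student even when a student has no Moodle or eDify activity, which matches the one-observation-per-student layout described for the dataset.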
3.1. Module Selection
The sixth-semester modules from the computing department at MEC were selected based on the difficulty level (level 2 and 3 modules). Through sampling, it was found that 188 samples were sufficient for this study. Data were collected only from the Spring 2017 to Spring 2021 semesters in the respective modules. The next step was to obtain informed consent from the students who were enrolled on the modules; module leaders/instructors obtained this consent, as the study posed no potential risk or discomfort for the students. To ensure confidentiality and privacy, the identity of the students was coded and mapped accordingly in the data cleansing process. The character masking and generalization methods were used to anonymize the data where necessary and applicable.
3.2. Data Cleansing
The data extracted from SIS were complete and had no missing values. In the first extraction, a few of the 20 attributes were not relevant to the study. “RollNumber”, “ApplicantName” and “ApplicantMobile” were encoded to “xxxxxxxxxx”, and the “Advisor” attribute was encoded to “xxxxx”. In the second set, only “RollNumber” and “ApplicantName” were encoded, in the same way as in the first set, to make the data anonymous.
For the data extracted from Moodle, the faculty and moderator logs were filtered out, as they were not required for the study. After the removal of the entries, “User Full Name” and “Affected user” were encoded to “-” instead.
For the data extracted from eDify, “RollNumber” and “ApplicantName” were encoded to “xxxxxxxxxx”.
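The cleansing rules in this section amount to dropping staff log entries and overwriting identifying fields with fixed masks. The sketch below illustrates that idea under stated assumptions: the mask strings and column names come from the text, whereas the helper names, the way staff entries are recognized (a set of names), and the sample rows are hypothetical.

```python
# Mask strings and column names as given in the text.
MASKS = {
    "RollNumber": "xxxxxxxxxx",
    "ApplicantName": "xxxxxxxxxx",
    "ApplicantMobile": "xxxxxxxxxx",
    "Advisor": "xxxxx",
    "User Full Name": "-",
    "Affected user": "-",
}


def drop_staff_entries(rows, staff_names):
    """Remove Moodle log rows produced by faculty or moderators.

    Identifying staff through a set of names is an assumption of this sketch.
    """
    return [row for row in rows if row.get("User Full Name") not in staff_names]


def mask_row(row, masks=MASKS):
    """Return a copy of the row with identifying fields replaced by masks."""
    return {col: masks.get(col, value) for col, value in row.items()}


# Invented sample rows.
sis_row = {"RollNumber": "12345", "ApplicantName": "Jane Doe",
           "ApplicantMobile": "000", "Advisor": "Dr X", "CGPA": "3.1"}
moodle_log = [
    {"User Full Name": "Lecturer A", "Affected user": "-", "Event name": "Course viewed"},
    {"User Full Name": "Student 1", "Affected user": "Student 1", "Event name": "Course viewed"},
]

masked_sis = mask_row(sis_row)
cleaned_log = [mask_row(r) for r in drop_staff_entries(moodle_log, {"Lecturer A"})]
print(masked_sis)
print(cleaned_log)
```

Filtering staff entries before masking matters in this sketch, because once “User Full Name” has been replaced by “-” the rows can no longer be told apart.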
3.3. Data Pre-Processing
The pre-processing .csv file contained all of the data in a single file that could be pre-processed before being used for classification in any datamining tool. The step carried out here was to merge all data into a single data file, in which we identified 24 attributes that were useful for this study. The next step was to convert the ordinal values to nominal values, as shown in Table 7 and Table 8.
For the data extracted from Moodle, “Affected user”, “Event context”, “Component”, “Event name”, “Description” and “Origin” were not relevant to this study, so they were omitted; only the time spent by each user in Moodle courses was retained and converted into minutes. IP addresses were used to determine whether each login came from within or outside the campus: addresses of the form 192.168.x.x were treated as on-campus, and all other addresses as off-campus. The access time was then converted into minutes to understand the time spent on the activities within or outside the campus.
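The on-campus/off-campus rule can be expressed with Python's standard `ipaddress` module. The sketch below treats 192.168.x.x as the network 192.168.0.0/16, as stated in the text. It assumes that session durations in seconds have already been derived from the log timestamps, which the paper does not detail, and that “Online C” holds the on-campus minutes and “Online O” the off-campus minutes; the function names and sample sessions are hypothetical.

```python
from ipaddress import ip_address, ip_network

# 192.168.x.x is treated as the campus range, as stated in the text.
CAMPUS_NET = ip_network("192.168.0.0/16")


def is_on_campus(ip):
    """Return True when the address falls inside the campus range."""
    return ip_address(ip) in CAMPUS_NET


def minutes_by_location(sessions):
    """Sum session durations (in seconds) into on- and off-campus minutes.

    Mapping "Online C" to on-campus and "Online O" to off-campus is an
    assumption of this sketch.
    """
    totals = {"Online C": 0.0, "Online O": 0.0}
    for ip, seconds in sessions:
        key = "Online C" if is_on_campus(ip) else "Online O"
        totals[key] += seconds / 60
    return totals


# Invented sessions for one student: (IP address, duration in seconds).
sessions = [("192.168.10.4", 1800), ("10.0.0.7", 600), ("192.168.1.20", 900)]
totals = minutes_by_location(sessions)
print(totals)
```

Using a network object rather than a string prefix check avoids false matches such as 192.1680.x.x-style typos and rejects malformed addresses with an explicit error.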
For data extracted from eDify, all four attributes were taken and no conversion was performed on the data.
3.4. Final Dataset
The final .csv dataset was the complete dataset, with 21 out of 40 attributes that could be used for this study. This dataset can be used with any datamining tool for classifying and predicting student academic performance using EDM.
From SIS, 15 out of the 24 attributes were selected for the final dataset: “ApplicantName”, “CGPA”, “AttemptCount”, “RemoteStudent”, “Probation”, “HighRisk”, “TermExceeded”, “AtRisk”, “AtRiskSSC”, “OtherModules”, “PlagiarismHistory”, “CW1”, “CW2”, “ESE” and “Result (Target Variable)”.
From Moodle, two attributes were selected based on the activities performed on Moodle from outside or within the campus: “Online C” and “Online O”.
From eDify, four attributes were selected: “Played”, “Paused”, “Likes” and “Segment”.
The final dataset can help researchers to better understand the learning behaviors of the students in the online learning environment setting.
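Reducing the consolidated file to the 21 final attributes is a plain column selection. The sketch below lists the attribute names given in this section; the helper name `select_final` and the one-row sample are hypothetical, and the target column is assumed to be named “Result”, as in the pre-processing file.

```python
import csv
import io

FINAL_ATTRIBUTES = [
    # 15 attributes from SIS
    "ApplicantName", "CGPA", "AttemptCount", "RemoteStudent", "Probation",
    "HighRisk", "TermExceeded", "AtRisk", "AtRiskSSC", "OtherModules",
    "PlagiarismHistory", "CW1", "CW2", "ESE", "Result",
    # 2 attributes from Moodle
    "Online C", "Online O",
    # 4 attributes from eDify
    "Played", "Paused", "Likes", "Segment",
]


def select_final(rows, columns=FINAL_ATTRIBUTES):
    """Keep only the final attributes, failing loudly if any column is absent."""
    rows = list(rows)
    if rows:
        missing = [c for c in columns if c not in rows[0]]
        if missing:
            raise KeyError(f"missing columns: {missing}")
    return [{c: row[c] for c in columns} for row in rows]


# Invented one-row file carrying the 24 pre-processing attributes.
header = ["ModuleCode", "ModuleTitle", "SessionName"] + FINAL_ATTRIBUTES
sample = io.StringIO(",".join(header) + "\n" + ",".join(["v"] * len(header)) + "\n")
final_rows = select_final(csv.DictReader(sample))
print(len(final_rows[0]))
```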
4. Conclusions
This article provides a dataset drawn from multiple learning environments, which will be useful for researchers who want to explore students’ academic performance in online learning environments and to build their own educational datamining models. The dataset will also be useful for researchers who want to conduct comparative studies on student behaviors and patterns related to online learning environments. It can further support educational datamining models in which different classification algorithms are applied to predict successful students. Moreover, feature selection techniques can be applied, which may provide a better accuracy rate for predicting students’ academic performance.
For future studies, weekly video interaction records can be considered to provide better insights into video learning analytics and student performance. Furthermore, the data can be used with a predictive churn model to act as an early warning system for dropouts in the course.
5. Patents
Hasan, Raza, Palaniappan, Sellappan, Mahmood, Salman, and Asif Hussain, Shaik. A novel method and system to enhance teaching and learning and the student evaluation process using the “eDify” mobile application. AU Patent Innovation 2021103523, filed 22 June 2021.
Author Contributions
Conceptualization and methodology, R.H.; supervision, S.P.; data curation and validation, S.M.; writing—original draft preparation and visualization, A.A.; investigation and writing—review and editing, K.U.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not Applicable.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article and/or its Supplementary Materials.
Acknowledgments
The authors of this data article are extremely thankful to all of the faculty and students who participated in this study.
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that have, or could be perceived to have, influenced the work reported in this article.
References
1. Romero, C.; Ventura, S.; De Bra, P. Knowledge discovery with genetic programming for providing feedback to courseware authors. User Model. User-Adapt. Interact. 2004, 14, 425–464.
2. Yaacob, W.F.W.; Nasir, S.A.M.; Yaacob, W.F.W.; Sobri, N.M. Supervised data mining approach for predicting student performance. Indones. J. Electr. Eng. Comput. Sci. 2019, 16, 1584–1592.
3. Shetty, I.D.; Shetty, D.; Roundhal, S. Student performance prediction. Int. J. Comput. Appl. Technol. Res. 2019, 8, 157–160.
4. Zohair, L.M.A. Prediction of Student’s performance by modelling small dataset size. Int. J. Educ. Technol. High. Educ. 2019, 16, 27.
5. Daud, A.; Aljohani, N.R.; Abbasi, R.A.; Lytras, M.D.; Abbas, F.; Alowibdi, J.S. Predicting student performance using advanced learning analytics. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 415–421.
6. Tomasevic, N.; Gvozdenovic, N.; Vranes, S. An overview and comparison of supervised data mining techniques for student exam performance prediction. Comput. Educ. 2020, 143, 103676.
7. Saa, A.A.; Al-Emran, M.; Shaalan, K. Mining student information system records to predict students’ academic performance. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2020; pp. 229–239. ISBN 9783030141172.
8. Giannakos, M.N.; Chorianopoulos, K.; Chrisochoides, N. Making sense of video analytics: Lessons learned from clickstream interactions, attitudes, and learning outcome in a video-assisted course. Int. Rev. Res. Open Distrib. Learn. 2015, 16, 260–283.
9. Lau, K.H.V.; Farooque, P.; Leydon, G.; Schwartz, M.L.; Sadler, R.M.; Moeller, J.J. Using learning analytics to evaluate a video-based lecture series. Med. Teach. 2018, 40, 91–98.
10. Mohammed, Q.A.; Naidu, V.R.; Said, M.; Al Harthi, M.S.A.; Babiker, S.; Al Balushi, Q.; Al Rawahi, M.Y.; Al Riyami, N.H.S. Role of online collaborative platform in higher education context. IJAEDU-Int. E-J. Adv. Educ. 2020, 6, 220–227.
11. Hasnine, M.N.; Akcapinar, G.; Flanagan, B.; Majumdar, R.; Mouri, K.; Ogata, H. Towards final scores prediction over clickstream using machine learning methods. In Proceedings of the ICCE 2018, 26th International Conference on Computers in Education, Metro Manila, Philippines, 26–30 November 2018.
12. Yang, T.-Y.; Brinton, C.G.; Joe-Wong, C.; Chiang, M. Behavior-based grade prediction for MOOCs via time series neural networks. IEEE J. Sel. Top. Signal Process. 2017, 11, 716–728.
13. Hasan, R.; Palaniappan, S.; Raziff, A.R.A.; Mahmood, S.; Sarker, K.U. Student academic performance prediction by using decision tree algorithm. In Proceedings of the 2018 4th International Conference on Computer and Information Sciences (ICCOINS), Kuala Lumpur, Malaysia, 13–14 August 2018; pp. 1–5.
14. Hasan, R.; Palaniappan, S.; Mahmood, S.; Sarker, K.U.; Abbas, A. Modelling and predicting student’s academic performance using classification data mining techniques. Int. J. Bus. Inf. Syst. 2020, 34, 403–422.
15. Hasan, R.; Palaniappan, S.; Mahmood, S.; Shah, B.; Abbas, A.; Sarker, K.U. Enhancing the teaching and learning process using video streaming servers and forecasting techniques. Sustainability 2019, 11, 2049.
16. Hasan, R.; Palaniappan, S.; Mahmood, S.; Abbas, A.; Sarker, K.U.; Sattar, M.U. Predicting student performance in higher educational institutions using video learning analytics and data mining techniques. Appl. Sci. 2020, 10, 3894.
17. Hasan, R.; Palaniappan, S.; Mahmood, S.; Sarker, K.U.; Sattar, M.U.; Abbas, A.; Naidu, V.R.; Malali Rajegowda, P. eDify: Enhancing teaching and learning process by using video streaming server. Int. J. Interact. Mob. Technol. 2021, 15, 49–65.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).