Article

SoK: The Impact of Educational Data Mining on Organisational Administration

1
Department of Computer Science and Information Technology, School of Computing, Engineering and Mathematical Sciences, La Trobe University, Bundoora, VIC 3086, Australia
2
La Trobe Business School, La Trobe University, Bundoora, VIC 3086, Australia
3
Data Science Department, College of Computing, Umm Al-Qura University, Makkah 24382, Saudi Arabia
*
Author to whom correspondence should be addressed.
Information 2024, 15(11), 738; https://doi.org/10.3390/info15110738
Submission received: 26 September 2024 / Revised: 5 November 2024 / Accepted: 15 November 2024 / Published: 19 November 2024
(This article belongs to the Section Information Applications)

Abstract

Educational Data Mining (EDM) applies advanced data mining techniques to analyse data from educational settings, traditionally aimed at improving student performance. However, EDM’s potential extends to enhancing administrative functions in educational organisations. This systematisation of knowledge (SoK) explores the use of EDM in organisational administration, examining peer-reviewed and non-peer-reviewed studies to provide a comprehensive understanding of its impact. This review highlights how EDM can revolutionise decision-making processes, supporting data-driven strategies that enhance administrative efficiency. It outlines key data mining techniques used in tasks like resource allocation, staff evaluation, and institutional planning. Challenges related to EDM implementation, such as data privacy, system integration, and the need for specialised skills, are also discussed. While EDM offers benefits like increased efficiency and informed decision-making, this review notes potential risks, including over-reliance on data and misinterpretation. The role of EDM in developing robust administrative frameworks that align with organisational goals is also explored. This study provides a critical overview of the existing literature and identifies areas for future research, offering insights to optimise educational administration through effective EDM use and highlighting its growing significance in shaping the future of educational organisations.

1. Introduction

EDM is an interdisciplinary field that combines methods from computer science, statistics, and education to analyse data generated within educational settings. The primary aim of EDM is to develop techniques that can explore large-scale educational data to better understand how students learn and how educational environments operate. This field has grown rapidly due to the increasing availability of educational data from various sources, including learning management systems (LMSs), online courses, and administrative records [1].
EDM’s scope includes, but is not limited to, analysing student interactions with educational software, identifying learning patterns, predicting student performance, and optimising educational resources. The integration of EDM into educational administration can enhance decision-making processes, streamline operations, and improve institutional efficiency. However, the implementation of EDM poses several challenges, such as ensuring data privacy, integrating with existing systems, and training staff to use these advanced analytical tools effectively [2].
Several literature reviews have been conducted on EDM, such as [3,4,5,6], focusing primarily on its applications in improving learning outcomes and student performance. These reviews have provided comprehensive insights into the techniques and tools used in EDM, as well as its impact on teaching and learning processes. However, there is a notable gap in the existing literature regarding the use of EDM for administrative purposes in educational organisations. A key milestone in the field was the first International Conference on Educational Data Mining in 2008. This event marked the formal recognition of EDM as a field, bringing together researchers who explored new ways to apply data mining (DM) techniques to educational settings. The conference, which has continued annually, laid the groundwork for understanding how DM could transform not only teaching and learning processes but also administrative functions within educational institutions [7].
For example, in [8], the exploration of student modelling and the application of DM to learning management systems were pivotal in shaping the direction of EDM research. This study extends those foundational efforts by exploring the application of EDM to organisational administration, with a particular focus on resource allocation, staff evaluation, and policy-making [9]. While some studies have touched on administrative applications, such as [2,10,11,12], there is a lack of literature review papers on how EDM can specifically benefit administrative tasks, the challenges involved, and the best practices for implementation.
Conducting a SoK on the use of EDM for management processes in educational organisations is essential. A systematisation of knowledge (SoK) provides a structured and comprehensive overview of existing research. This approach helps in identifying best practices, challenges, and effective strategies for implementing EDM in administrative contexts. By doing so, it supports educational administrators in making informed decisions, optimising resource allocation, and enhancing overall organisational efficiency. Moreover, a SoK can highlight areas where further research is needed, guiding future studies to address existing gaps and improve the implementation of EDM in educational management.
Grey literature (GL) refers to materials and research produced by organisations outside of traditional commercial publishing and distribution channels. This includes technical reports, working papers, government documents, research reports, theses, preprints, and policy documents. Unlike traditional or white literature (WL), GL is not peer-reviewed and is used to disseminate current research findings and practical applications quickly. This is particularly valuable in rapidly evolving fields like EDM, where GL captures a fuller spectrum of current knowledge and practices and offers timely and relevant insights [13,14,15]. Incorporating GL into a SoK enriches the research by introducing diverse insights and broader perspectives, extending the research beyond conventional peer-reviewed articles [16,17]. While this study primarily relies on the peer-reviewed scientific literature, GL was incorporated through a Multivocal Literature Review (MLR) to enrich the analysis with current practices and real-world applications. Given the fast-evolving nature of EDM, GL provides timely insights into practical implementations and cutting-edge techniques that are yet to be fully explored in academic publications [15]. For instance, many innovative administrative processes and technologies are first introduced in working papers, policy documents, and technical reports before they reach peer-reviewed journals [18]. To justify the inclusion of GL, a prior comparison was conducted in which GL was excluded from the analysis. While the key findings regarding the benefits and challenges of EDM for administrative purposes remained consistent, excluding GL resulted in the loss of recent insights into emerging applications and real-world case studies. Thus, GL complements the peer-reviewed literature by providing practical examples and timely updates, which are crucial for understanding how EDM can be effectively implemented in organisational administration [15].
To address this gap, this paper conducts a SoK to study the work related to the application of EDM in the management process of educational organisations. To achieve this, the following research questions are developed:
RQ 1: 
How does the integration of EDM impact decision-making processes in educational organisation administration?
RQ 2: 
What are the EDM techniques most commonly used for educational organisation administration purposes?
RQ 3: 
How do educational administrators perceive the role of EDM in improving organisational performance and efficiency?
RQ 4: 
What are the potential benefits, drawbacks, and key challenges faced by educational organisations in implementing EDM for administrative purposes?
This study makes several key contributions to the field:
1 
It offers a critical overview of the current literature on EDM in organisational administration, identifying significant gaps and highlighting areas for future research.
2 
It examines how EDM can revolutionise decision-making processes, fostering data-driven strategies that significantly enhance administrative efficiency and effectiveness.
3 
It delves into the challenges of integrating EDM into administrative functions, including issues related to data privacy, system integration, and the necessity for specialised expertise in data interpretation.
4 
It evaluates both the advantages and potential risks associated with using EDM in educational administration, providing a balanced perspective on its overall impact.
5 
It illustrates how EDM can contribute to the development of robust administrative frameworks that align with and support the strategic objectives of educational organisations.

2. Conceptual Framework of EDM

EDM is a field that sits at the intersection of several key disciplines, as illustrated in Figure 1. This diagram visually represents the interdisciplinary nature of EDM, highlighting its foundations in computer science, education, and statistics.
  • Computer science: EDM leverages advances in computer science, particularly in the areas of data mining and machine learning, to analyse educational data. This encompasses methodologies and technologies such as algorithm design, computational models, and artificial intelligence to extract meaningful patterns from vast datasets.
  • Education: Within the educational domain, EDM focuses on computer-based education. This includes the study and implementation of technology-enhanced learning environments, digital learning platforms, and other educational technologies that facilitate data collection and analysis.
  • Statistics: Statistical methods are central to EDM for analysing and interpreting educational data. Learning analytics, a subset of EDM, applies statistical techniques to understand and improve learning processes and outcomes.
At the confluence of these disciplines lies EDM, which integrates data mining and machine learning methods to advance our understanding of educational processes. This integration facilitates the development of predictive models, the identification of learning patterns, and the provision of actionable insights to educators and policy-makers.
The diagram underscores the collaborative nature of EDM, demonstrating how it harnesses the strengths of each discipline to innovate and enhance educational practices.

3. Methodology

This section details the search strategy used in this SoK, developed to identify relevant studies from various databases. The search encompassed studies on the use of EDM in educational organisations for administrative tasks. Both WL and GL were considered to provide a comprehensive view of the current state of research.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was employed to enhance the transparency and robustness of this SoK. PRISMA provides a structured checklist and a flowchart that guide the reporting of reviews and meta-analyses, which is particularly valuable in fields such as computer science and educational technology [19,20,21,22]. Consequently, this SoK approach adheres to PRISMA guidelines. Figure 2 summarises the stages of the research. The methodology in this study comprises five phases.

3.1. Phase 1: Initial Search

In this phase, an initial search was conducted to retrieve the literature related to EDM. A diverse range of databases, used in previous studies such as [23,24], was selected. These databases include the following:
  • ERIC—chosen for its extensive coverage of educational research and resources.
  • Scopus—selected for its comprehensive database of the peer-reviewed literature across technology and educational disciplines.
  • ACM Digital Library—targeted for its focus on computing and technology, relevant to data mining technologies.
  • IEEE Xplore—included for its technical literature in engineering and technology, which underpins the technological aspects of EDM.
  • Web of Science—utilised for its interdisciplinary coverage, including educational sciences and technology.
  • Google Scholar—employed to capture a broad spectrum of GL and less formal publications.
  • Base—known for its particularly strong coverage of academic web resources.
  • Science Research—included for its ability to access a wide variety of scientific databases simultaneously.
The search terms used include “Educational Data Mining”, “Educational Machine Learning”, “Educational Big Data”, and “Educational Artificial Intelligence”, covering the period from January 2010 to July 2024. This phase yielded 7123 articles from peer-reviewed databases and 9058 articles from GL databases, as shown in Figure 2.

3.2. Phase 2: Screening the Title, Abstract, and Keywords

In this phase, a more focused search and filtering process was applied to the articles retrieved in the initial search. In addition to the initial EDM-related search terms, additional terms related to administrative tasks were incorporated. These terms included “educational administration” and “educational management”.
This combination ensured that the search captured studies specifically focusing on the use of EDM for administrative purposes in educational organisations. The combined search terms were applied to the titles, abstracts, and keywords of the articles. This approach helped to identify and exclude studies that did not align with the scope of this SoK review. By focusing on these sections, the process efficiently filtered out irrelevant studies without the need for a full-text review at this phase.
Articles that did not match the criteria set in Phase 1 and the additional focus of Phase 2 were excluded. Specifically, studies that did not pertain to EDM in the context of educational administration or management or that focused solely on the learning or instructional aspects of EDM rather than administrative applications were removed.
As a result of this focused screening, the number of articles was reduced. Out of the initial 16,181 articles, approximately 10,500 articles that did not meet the criteria were excluded. This left a more manageable number of around 2000 articles from WL sources and 3500 articles from GL databases for further analysis in subsequent phases. By incorporating these additional steps and refining the search criteria, Phase 2 effectively narrowed the pool of articles, ensuring that only the most relevant studies progressed to the next phase of the systematisation of knowledge.
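The Phase 2 screen can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the record layout (`title`, `abstract`, `keywords`) is hypothetical, and combining the two term groups with a logical AND is an assumption, since the text only says the terms were combined.

```python
# Search terms quoted in Phases 1 and 2 of the methodology.
EDM_TERMS = (
    "educational data mining",
    "educational machine learning",
    "educational big data",
    "educational artificial intelligence",
)
ADMIN_TERMS = ("educational administration", "educational management")


def searchable_text(record: dict) -> str:
    """Join the three screened fields into one lower-case string."""
    parts = [record.get("title", ""), record.get("abstract", "")]
    parts.extend(record.get("keywords", []))
    return " ".join(parts).lower()


def passes_phase2(record: dict) -> bool:
    """ASSUMPTION: keep a record only if it has an EDM term AND an admin term."""
    text = searchable_text(record)
    has_edm = any(term in text for term in EDM_TERMS)
    has_admin = any(term in text for term in ADMIN_TERMS)
    return has_edm and has_admin


# Two synthetic records, invented for illustration only.
records = [
    {"title": "Educational Data Mining for timetabling",
     "abstract": "A case study in educational management.",
     "keywords": ["scheduling"]},
    {"title": "Educational Data Mining for grade prediction",
     "abstract": "Predicting exam scores from quiz logs.",
     "keywords": ["student performance"]},
]
kept = [r for r in records if passes_phase2(r)]
print([r["title"] for r in kept])
```

Only the first record survives, because the second mentions an EDM term but no administrative term. Plain substring matching is the simplest choice here; a real screen would likely need stemming or synonym handling.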

3.3. Phase 3: Screening Based on the Inclusion and Exclusion Criteria

In this phase, the inclusion and exclusion criteria were applied to further filter the articles obtained from the previous phase. This phase was important as it ensured that the remaining articles were not only relevant but also met the quality standards required for this SoK.
Inclusion criteria were applied uniformly across different types of literature, including WL and GL, ensuring consistency in the SoK [15,24]. Studies were selected based on predefined inclusion and exclusion criteria to ensure relevance. The inclusion criteria are as follows:
  • Studies published in English.
  • Studies focusing on the use of EDM for administrative purposes in educational settings.
  • Studies within the time period from 2010 to July 2024.
The exclusion criteria are as follows:
  • Studies primarily addressing EDM for teaching and learning without administrative implications.
  • Non-English publications.
  • Duplicate studies across different databases.
  • Studies with insufficient methodological detail.
In cases of duplication, the most recent version of an article was selected to ensure quality and relevance. If the same article, by the same authors, was found in multiple databases, the newest version was chosen, as it may contain important revisions and updates. The same rule applied when an article appeared in both peer-reviewed and GL databases, so that the most recent findings and analyses were included [25].
As a result of applying these criteria, the number of articles was reduced to 398 articles from WL sources and 1578 articles from GL sources. This substantial reduction ensured that only the most relevant studies progressed to the next phase of this review.
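The Phase 3 rules (language, administrative focus, publication window, and keeping the newest version of a duplicate) could be expressed roughly as follows. The field names are hypothetical, the window end of 31 July 2024 is an assumed reading of "July 2024", and matching duplicates on title plus authors is an assumption based on the description above.

```python
from datetime import date

WINDOW_START = date(2010, 1, 1)
WINDOW_END = date(2024, 7, 31)  # ASSUMPTION: "July 2024" read as end of month


def meets_inclusion(record: dict) -> bool:
    """English, administrative focus, and published inside the review window."""
    return (
        record["language"] == "en"
        and record["admin_focus"]
        and WINDOW_START <= record["published"] <= WINDOW_END
    )


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep only the most recent version per (title, authors) key."""
    latest: dict[tuple, dict] = {}
    for rec in records:
        key = (rec["title"].strip().lower(), tuple(sorted(rec["authors"])))
        if key not in latest or rec["published"] > latest[key]["published"]:
            latest[key] = rec
    return list(latest.values())


# Synthetic records: a preprint, its later journal version, and a non-English study.
records = [
    {"title": "EDM for staff planning", "authors": ["A. Author"], "language": "en",
     "admin_focus": True, "published": date(2021, 5, 1)},
    {"title": "EDM for Staff Planning", "authors": ["A. Author"], "language": "en",
     "admin_focus": True, "published": date(2022, 3, 1)},
    {"title": "EDM in school budgeting", "authors": ["B. Writer"], "language": "fr",
     "admin_focus": True, "published": date(2020, 1, 15)},
]
selected = deduplicate([r for r in records if meets_inclusion(r)])
```

Here the criteria are applied before deduplication, so a newer but out-of-window version cannot displace an eligible older one; the source does not say which order was used.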

3.4. Phase 4: Screening the Introduction and Conclusion

In this phase, the introduction and conclusion sections of the articles were screened to ensure their relevance and alignment with this study’s objectives. This phase involved a detailed examination of these sections to confirm that the studies explicitly addressed the use of EDM for administrative tasks in educational organisations. By focusing on the introductions and conclusions, the screening process aimed to quickly identify articles that provided substantial insights or findings pertinent to the research questions. This step helped in further refining the selection by excluding articles that, despite passing earlier phases, did not adequately align with the specific focus of this review. As a result of this screening, the number of articles was reduced to 102 and 598 for WL and GL, respectively.

3.5. Phase 5: Screening the Full Text

This phase consisted of reading the full text of the selected articles. For the GL papers, an additional quality assurance step was performed using the AACODS Checklist [26], which assesses authority, accuracy, coverage, objectivity, date, and significance. This checklist helped to ensure that the GL included in this review was of high quality and relevant to the study’s objectives. In this SoK, a further scoring step was added to the AACODS Checklist to strengthen GL quality control: each aspect was coded with a score, and only articles that achieved 20 points or above were included, while articles scoring lower were excluded. By applying these standards, the screening process filtered out articles that did not sufficiently meet the criteria. This phase concluded with 83 articles from WL databases and 21 articles from GL databases being included in the final review. These selected articles represent the most relevant studies, providing a solid foundation for this SoK.
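The AACODS gate for grey literature can be illustrated with a small scoring function. The 20-point threshold comes from the text; the per-criterion scale is not stated in the paper, so the 0 to 4 range below (maximum 24) is purely an assumption, as are the example scores.

```python
AACODS_CRITERIA = (
    "authority", "accuracy", "coverage", "objectivity", "date", "significance",
)
MAX_PER_CRITERION = 4     # ASSUMPTION: the coding scale is not given in the paper
INCLUSION_THRESHOLD = 20  # from the text: 20 points or above are included


def aacods_total(scores: dict) -> int:
    """Sum the six criterion scores after checking they are present and in range."""
    missing = [c for c in AACODS_CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing AACODS scores: {missing}")
    for criterion in AACODS_CRITERIA:
        if not 0 <= scores[criterion] <= MAX_PER_CRITERION:
            raise ValueError(f"{criterion} score out of range: {scores[criterion]}")
    return sum(scores[c] for c in AACODS_CRITERIA)


def include_grey_item(scores: dict) -> bool:
    """Apply the inclusion threshold to one grey-literature item."""
    return aacods_total(scores) >= INCLUSION_THRESHOLD


# Hypothetical coded items.
report_a = {"authority": 4, "accuracy": 4, "coverage": 3,
            "objectivity": 3, "date": 4, "significance": 3}
report_b = {"authority": 3, "accuracy": 3, "coverage": 3,
            "objectivity": 3, "date": 3, "significance": 3}
```

Under these assumed scores, `report_a` totals 21 and is included, while `report_b` totals 18 and is excluded.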

4. Findings

This section presents the findings from the SoK review of 104 papers on the use of EDM in educational administration. The findings are organised by their impact on decision-making processes, key themes, challenges, the application of EDM techniques for administrative purposes, and the overall improvement in educational performance. The discussion also addresses the research questions outlined in Section 1.
The SoK covers papers published between 2010 and 2024, examining various aspects of EDM and its role in educational administration, including decision support systems, educational management, performance monitoring, and predictive analytics. Most studies emphasise practical applications, highlighting both the advantages and challenges of implementing EDM in educational settings. In addition, this review explores how educational administrators perceive EDM’s role in enhancing organisational performance and efficiency, as well as its contribution to strategic planning.
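The 104 reviewed papers are the end point of the five-phase selection funnel reported in Section 3. The short script below simply tabulates the counts stated there and checks that they add up; the Phase 2 figures are the approximate values given in the text.

```python
# (phase, white-literature count, grey-literature count) as reported in Section 3.
FUNNEL = [
    ("Phase 1: initial search", 7123, 9058),
    ("Phase 2: title, abstract, keywords", 2000, 3500),  # approximate in the text
    ("Phase 3: inclusion and exclusion criteria", 398, 1578),
    ("Phase 4: introduction and conclusion", 102, 598),
    ("Phase 5: full text", 83, 21),
]

totals = [wl + gl for _, wl, gl in FUNNEL]

for (phase, wl, gl), total in zip(FUNNEL, totals):
    share = 100 * total / totals[0]  # share of the initial pool still retained
    print(f"{phase:<45} WL={wl:>5} GL={gl:>5} total={total:>6} ({share:.2f}%)")
```

The first total reproduces the 16,181 articles of the initial search and the last reproduces the 104 papers analysed in this section.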

4.1. Impact of EDM Integration

The integration of EDM into educational organisations has a significant impact on various aspects of administration and management. This section explores these impacts and synthesises findings from various studies to answer the first research question.

4.1.1. Enhanced Decision-Making

One of the primary benefits of EDM integration is the enhancement of decision-making within educational settings. By leveraging data-driven insights, administrators can make more informed decisions, leading to improved outcomes for students and more efficient management practices.
Data-driven approaches enable educational organisations to make well-informed decisions. For example, an LMS provides valuable insights into learning behaviours by collecting and analysing extensive amounts of data. This enhances the decision-making capabilities of educational administrators, supporting the optimisation of educational resources and interventions [27,28,29,30,31]. Many studies have investigated how educational institutions can make informed decisions about teaching strategies and interventions to improve student outcomes, identify at-risk students early, and provide timely interventions [32,33,34,35,36]. Moreover, data-driven approaches are instrumental in analysing, predicting, and classifying student performance and the factors influencing it [37,38,39,40,41,42,43,44,45,46,47]. These techniques also facilitate the prediction of students’ grades and study durations [48,49,50,51]. Overall, EDM systems enhance decision-making processes by enabling the early identification of at-risk students and the implementation of timely interventions.
EDM enables consolidating data from various educational subsystems into a unified framework to enhance decision-making processes. This is demonstrated by [52,53]. They provide a system that integrates data from synchronous (e.g., MS Teams) and asynchronous (e.g., e-Class) learning environments, offering comprehensive insights into student interactions and academic performance.
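As a rough illustration of what consolidating subsystem data into one view involves, the sketch below merges per-student rows from a synchronous source and an asynchronous source into a single profile. It is not the system described in [52,53]; the field names (`student_id`, `minutes`, `score`) and the sample rows are hypothetical.

```python
def consolidate(sync_rows: list[dict], async_rows: list[dict]) -> dict:
    """Merge rows from two subsystems into one profile per student."""
    profiles: dict[str, dict] = {}

    def profile_for(student_id: str) -> dict:
        # Create an empty profile the first time a student is seen in either source.
        return profiles.setdefault(
            student_id, {"live_minutes": 0, "quiz_scores": []}
        )

    for row in sync_rows:    # e.g. attendance in live sessions
        profile_for(row["student_id"])["live_minutes"] += row["minutes"]
    for row in async_rows:   # e.g. quiz results from a course platform
        profile_for(row["student_id"])["quiz_scores"].append(row["score"])
    return profiles


# Synthetic rows for illustration.
sync_rows = [
    {"student_id": "s1", "minutes": 50},
    {"student_id": "s1", "minutes": 40},
    {"student_id": "s2", "minutes": 30},
]
async_rows = [
    {"student_id": "s1", "score": 80},
    {"student_id": "s3", "score": 65},
]
profiles = consolidate(sync_rows, async_rows)
```

Students who appear in only one subsystem still receive a profile, which makes gaps in either source visible to an administrator.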
EDM helps to improve administrative efficiency in schools. For example, EDM can provide information to improve the development and usage of information and communication technology (ICT) systems. Machine learning (ML) models are used to predict user satisfaction with ICT systems employed in the administration of educational institutions [2]. Another study discusses the analysis of student responses to predict satisfaction, which helps in making informed decisions regarding educational practices and policies [54].
EDM enhances the decision-making capabilities of educational administrators by providing actionable insights. The study [55] provides detailed insights into the factors influencing students’ choices of study programs, such as religion, transportation, gender, and school status. By understanding these factors, educational administrators can make informed decisions to tailor their recruitment strategies and improve the alignment of educational offerings with student preferences. A further study examines how chatbots can help personalise the learning plan and improve administration efficiency [56].
EDM systems can help to enhance the quality of teaching and school performance and provide actionable insights for administrators. For example, some studies have explored the development and application of decision support systems for education management, leveraging EDM techniques to enhance decision-making processes and utilising well-designed data warehouses [32,57,58,59,60]. These studies focus on integrating various data sources and applying improved algorithms to predict and assess teaching quality. Evaluating school performance is a critical aspect of educational management. The use of EDM techniques to evaluate school performance is investigated by integrating various performance indicators sourced from the Educational Portal of the Ministry of Education in Oman. The study provides a comprehensive evaluation of schools [61]. The results demonstrate the high accuracy and effectiveness of these techniques in evaluating school performance. Thus, educational administrators can make informed decisions, identifying strengths and weaknesses within schools and implementing targeted interventions.
The application of EDM in education offers a robust framework for enhancing decision-making processes. For illustration, one study provides a comprehensive framework for using EDM to extract meaningful insights that can inform educational decisions. Educational administrators can gain a deeper understanding of factors influencing educational outcomes using EDM techniques [58,62,63,64,65].

4.1.2. Optimised Resource Allocation

Effective resource allocation is crucial for the administrative operation of educational organisations. EDM allows administrators to identify areas where resources can be optimally distributed, ensuring better utilisation and enhanced educational outcomes.
Many studies emphasise the importance of selecting appropriate parameters to evaluate the implementation of new system concepts, focusing on enhancing user experience, quality of life, resource allocation, and the online learning process [42,66,67,68]. EDM leverages ML to identify factors affecting users’ experience of ICT systems, as in [2]. Using EDM helps administrators to identify inefficiencies and implement corrective measures to optimise resource use and enhance service delivery. Preprocessing is an important step in resource allocation optimisation in educational organisations. This step helps to extract insightful knowledge from the data [55,57]. These studies address the challenge of effectively mapping prospective students’ interests to suitable study programs.
Other studies highlight the potential of EDM to enhance various aspects of teaching management, including subject diversification, personalised teaching management, teaching evaluation systems, and human resource management. By leveraging EDM, universities and educational organisations can effectively allocate resources to improve teaching quality and streamline administrative processes [69]. Research conducted in Ghana explores the use of EDM as an expert system to improve the administration process of educational organisations. The authors study the Ministry of Education to identify inaccuracies and inefficiencies in the educational system [70].
EDM technologies also help save resources, which in turn improves resource optimisation. EDM helps in enhancing teaching effectiveness, personalising student learning experiences, and automating administrative tasks. EDM technologies, such as chatbots, work to enhance educational organisations’ outcomes by providing teachers with actionable insights and reducing their workload. As a result, EDM saves staff time and frees them to focus on complex tasks rather than routine ones [68,71]. Furthermore, another study discusses challenges faced by educational organisations and universities due to the sharp increase in student numbers and the limited availability of teaching resources [41].
Moreover, EDM technologies are used to predict at-risk students and those likely to withdraw early, which allows institutions to improve resource distribution and student support, thereby improving retention rates and academic outcomes [32,55]. Other papers utilise EDM to predict student dropouts in massive open online courses and to predict students’ performance. EDM gives administrators a clear basis for reallocating resources, enabling timely interventions and thus preventing greater losses [44,58,72].

4.1.3. Enhanced Institutional Performance

The integration of EDM can significantly elevate the overall performance of educational institutions. By leveraging data-driven insights, institutions can streamline processes, improve operational efficiency, and achieve better educational outcomes.
Some research explores the critical stages of preprocessing educational data, on the basis of which educational organisation managers can take actions that enhance institutional performance. Different studies have highlighted that this early stage is a critical phase in the prediction and classification process, as it directly affects the process’s accuracy [28,40,60,73]. These studies focus on the data collection, data interpretation, database creation, and data organisation stages, emphasising the importance of these preprocessing steps in maintaining the integrity and accuracy of educational data. These research studies illustrate how proper data preprocessing can prevent inaccuracies and biases, leading to more authentic and reliable conclusions [27,53,57,74,75]. Another aspect discussed in the literature is how EDM techniques can be utilised to manage and analyse the large volumes of data generated in higher education institutions. By leveraging these techniques, educational institutions can improve their administrative and academic functions, leading to enhanced institutional performance [62,76]. Moreover, EDM can be used with synchronous and asynchronous e-learning systems to assess previous performance, monitor current services, and predict future outcomes, which in turn can enhance institutional performance [52]. EDM also engages with new AI technologies such as virtual reality and business intelligence. The adoption of EDM helps to demonstrate how AI can reduce the administrative burden, provide actionable insights, and provide personalised learning experiences; therefore, EDM technologies enhance students’ outcomes as well as institutional performance [29,68].
Performance prediction and student classification are widely used applications of EDM systems. EDM techniques enhance various aspects of education, including the prediction of student performance, at-risk students, satisfaction, slow learners, and attendance rates, as well as personalised learning paths, real-time feedback mechanisms, and challenges faced [32,34,37,39,41,44,45,46,47,54,58,61,64,72,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92]. For example, EDM technologies are used to predict the final grades of students in a computer engineering program at an Ecuadorian university. The study aims to enhance the academic quality as well as the institution’s performance by developing predictive models based on historical academic data [48]. EDM technologies are used to predict the performance of primary students in rural and urban areas in India. The study focuses on students from classes three to five. The findings reveal that certain variables, including previous exam results and socio-economic factors, significantly impact student performance [93].
Likewise, EDM technologies are used to measure users’ satisfaction and use it as an indication to enhance institutional performance and identify where development is needed. In [2], EDM technologies are utilised to predict users’ satisfaction with ICT systems and identify what factors affect that satisfaction. The authors summarise some key factors that influence the users’ satisfaction, such as usability, privacy, security, and IT support [2]. Another research study highlights how chatbots can help with this matter as well and improve administration efficiency [56]. Educational organisations can improve their performance and ICT systems by tackling the identified factors, which will result in better administrative performance and service delivery.
In addition, EDM systems support early-alert systems that can predict and identify potential academic problems. A study conducted in China investigates various data collected from educational administration, library usage, and student self-study records [94]. EDM enhances institutional performance by reducing dropout rates and improving academic outcomes. Additionally, one study applies EDM to classify students as “passed” or “failed” based on their performance and to find the factors that influence this outcome. The data are collected from an LMS, which helps administrators to identify key factors influencing classification, including the average number of days students logged into the system monthly and weekly [38].
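To make the pass/fail classification idea concrete, the sketch below fits a one-feature threshold rule (a decision stump) on the average number of LMS login days per month. The algorithm used in [38] is not named here, so the stump is a stand-in chosen for simplicity, and the data are synthetic.

```python
def fit_stump(values: list[float], labels: list[bool]) -> tuple[float, int]:
    """Find the threshold t that minimises errors for the rule 'pass if value >= t'."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(values)):
        # Count students whose predicted outcome differs from the recorded one.
        errors = sum((v >= threshold) != y for v, y in zip(values, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict_pass(login_days: float, threshold: float) -> bool:
    """Apply the learned rule to one student."""
    return login_days >= threshold


# Synthetic data: average login days per month and whether the student passed.
login_days = [2, 4, 5, 9, 12, 15, 18, 20]
passed = [False, False, False, False, True, True, True, True]

threshold, training_errors = fit_stump(login_days, passed)
print(threshold, training_errors)
```

On this toy data the rule separates the two groups at 12 login days with no training errors. Real LMS data would be noisier and would call for several features and held-out evaluation.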
All in all, EDM technologies can provide a better understanding of educational data. This will help administrators in educational organisations to gather more knowledge. To illustrate, one study discusses how institutions can gain insights into student behaviour, optimise resource allocation, and improve the overall educational experience. Consequently, EDM technologies help to enhance institutional performance [43,67,70,71,95,96].

4.1.4. Increased Transparency and Accountability

The application of EDM systems within educational organisations enhances transparency and accountability. EDM systems help to identify and monitor performance indicators and support informed, data-driven decision-making. Thus, EDM technologies help ensure that educational strategies are transparent and that institutions are held accountable for their outcomes, which builds trust and supports continuous improvement in educational organisations [97]. Furthermore, EDM applications are used to improve the quality of the data generated by educational systems such as the LMS and ICT [98]. This improves the transparency and accountability of the data and, by extension, the educational organisations’ performance. Using EDM helps educational organisations become more open to society and provide better information to stakeholders [99]. This approach enables educational organisations to attract more investment by offering accountable and transparent information.

4.1.5. Informed Policy Development

The application of EDM extends beyond immediate educational outcomes to influence policy-making processes. These tools lay a data-driven foundation, enabling the development of informed policies that tackle systemic issues within educational institutions.
EDM works better with well-organised and preprocessed data, as several papers highlight. For example, one paper emphasises the importance of preprocessing and divides it into four stages: data gathering, data interpretation, database creation, and data organisation. This helps to ensure the reliability and authenticity of the dataset, which in turn supports informed policy development within educational institutions [27,44,45,59,65,75].
EDM helps in building an evidence-based atmosphere that supports informed policy development, enabling educational institutions to create targeted interventions for improving educational organisations’ performance. Prediction, analysis, classification, and clustering are broad applications of EDM technologies, and they help provide data-driven interactions. The outcome of EDM systems enriches data-driven actions, addressing management challenges to support informed policy development and improve educational outcomes [34,40,41,47,52,54,57,61,67,69,84,89,95,97,100,101,102]. The identification of key predictors and the use of sophisticated methodologies ensure that the resulting policies are both data-driven and theoretically grounded [33,70,76,82,103,104]. An experiment studied K–12 classrooms in Florida to see how new technologies were integrated into classrooms to enhance learning experiences. The administrators monitored and supervised the experiment, gaining more information. EDM helped them focus on professional development and technology integration and provided formative feedback to guide future policies and practices [87,105]. Furthermore, many researchers address the usage of multimedia and social networks within educational organisations, the challenges associated with it, and the policies needed to manage it [74,88,106]. Moreover, EDM can handle large amounts of data, and combining EDM with BD supports informed policy development by extracting more information from the data [48,64,107].
EDM helps administrators and investors to develop policy, drive investments, and improve resource efficiency. It provides more knowledge for policy-makers, those in research management, governments, and investors, who can base their decisions on solid, clear information provided by EDM technologies [77,108,109,110]. For example, one study discusses the benefits of data-driven interventions and how they support informed policy development and guide resource allocation, such as the need for skilled data scientists and robust data privacy measures [62]. Additionally, EDM helps data scientists bridge the gap with policy-makers to inform stakeholders about policy developments [43].

4.1.6. Improved Administrative Efficiency

Enhancing administrative efficiency is critical for the overall success of educational organisations. EDM techniques help educational organisations streamline their administrative processes. Applying EDM technologies leads to more effective resource management and improved educational outcomes.
EDM technologies can identify the necessity of effective data preparation, the integration of diverse data sources, and the use of interpretable analysis tools to improve administrative efficiency. EDM applications emphasise a collaboration-centric application architecture that integrates various data sources, customisable data services, and interpretable analysis tools [45,57,111]. EDM also helps to specify the challenges or hurdles to achieving an efficient administration process [53,98,112]. To illustrate, many researchers have discussed the challenges facing educational organisations, such as inadequate resource collection, outdated service models, and weak policy frameworks [69,74]. Furthermore, in the literature, EDM technologies are applied to increase staff performance within educational organisations. EDM helps reduce staff burnout, enabling staff to focus on other tasks, which demonstrates its role in improving administrative efficiency [68]. Another study highlights how chatbots can further enhance administrative efficiency [56].
EDM systems are robust and enhance data-driven decision-making, leading to improved administrative efficiency across various areas. Techniques such as clustering, prediction, and classification elevate EDM technologies, providing administrators with simplified insights and actionable knowledge [2,31,36,52,63,75,82,90,92,97,109]. In addition, the literature compares various machine learning (ML) algorithms, examining the factors that affect their performance to identify the most efficient algorithms for different database structures [70,82]. Some studies go further by refining ML algorithms to improve accuracy, which in turn leads to more valuable insights from educational data through EDM [50,95].
As mentioned before, EDM supports the optimisation of administrative processes through effective use of different DM techniques such as clustering. These tools streamline the management and analysis of large datasets, making it easier for administrators to access and interpret critical information. Techniques such as association rule methods and clustering further optimise student admissions and other administrative functions [34,60,83].
The integration of EDM has significantly transformed administrative workflows in educational organisations, increasing efficiency in managing educational processes. Expert systems automate various administrative tasks, reducing the administrative burden and allowing staff to focus on different tasks. For instance, these systems enable real-time monitoring and automated alerts, facilitating timely interventions and support for students [70].
Another change is the use of click-track data to analyse user interactions and behaviours in online learning environments. By employing EDM and evidence-centred design, organisations can gain deeper insights into user engagement and usage patterns. These data help administrators make informed decisions to improve the effectiveness of online education, optimise resource allocation, and tailor instructional strategies to better meet users’ needs [103]. Change-detection ML models are also used to monitor and manage educational processes, detecting anomalies and shifts in student performance early. This real-time detection and response capability has led to more dynamic and responsive educational management practices [113].
In programming education, learning analytics have been valuable in identifying student difficulties and tailoring instructional approaches, enhancing the overall learning experience and enabling more efficient allocation of resources. By investigating patterns in student performance, managers can proactively address learning gaps and provide targeted support, thus improving educational outcomes [100].
Analysing student behaviour with BD technologies plays a crucial role in administrative workflows. By creating comprehensive profiles of student learning behaviours, educational organisations can better understand factors influencing academic success and implement targeted interventions to support student achievement. This approach enables more personalised and effective educational experiences, aligning administrative efforts with student needs [86]. For instance, a study in the Indian academic context shows that adopting EDM practices improves administrative efficiency. Using DM techniques, educational institutions can streamline processes, enhance decision-making, and optimise resource utilisation, resulting in more effective management of educational activities and improved institutional performance [107]. Another study highlights the use of IoT as a Service (IoTaaS) to enhance administrative workflows in primary schools when integrated into school management tasks [112].
The development of frameworks integrating EDM and BD for e-learning environments has revolutionised educational administration. These frameworks offer a holistic view of user interactions and performance, enabling administrators to make informed decisions that enhance education quality and operational efficiency. Integrating such frameworks has facilitated a more cohesive and data-driven approach to educational management [114]. Evaluating administrative policies and procedures through EDM has provided insights into their effectiveness and areas for improvement. By analysing policy outcomes, educational organisations can refine their approaches to better support student success and organisational goals. This data-driven evaluation ensures that administrative practices are aligned with best practices and continuously optimised for better outcomes [46]. For example, the workflow improves due to the crucial roles of preprocessing phases such as data gathering, interpretation, database creation, and data organisation [27,77].

4.2. EDM Techniques

EDM encompasses a variety of techniques commonly used to improve administrative functions in educational organisations. This section examines these techniques, synthesising findings from multiple studies to address the second research question. As summarised in Figure 3, some of the most widely used methods in EDM include classification, regression, and clustering.
Classification is a supervised learning method that assigns data to predefined categories or classes. Within EDM, this technique aids in classifying users’ satisfaction, exploring hidden factors that affect choices, and tailoring learning experiences. By organising educational data, classification models yield actionable insights crucial for enhancing administrative decisions and boosting the overall effectiveness of education [59]. For example, one study performed in Brazil used classification algorithms to categorise students as either academically successful or at risk of failure and concluded that demographic factors affect students’ academic journeys [78]. Similarly, classification supports data-oriented decision-making to improve educational practices and learning materials. Another study focuses on finding the key factors affecting the performance of students enrolled in technology-related degree programs in Sri Lanka [47]. The findings of the study positively impacted future decisions about the progress of the students’ performance, the quality of the education process, and the future of the education provider.
Regression is a supervised learning technique used to predict continuous outcomes based on input features. In the context of EDM, regression models help predict various educational metrics, such as student performance, graduation rates, or resource allocation needs. These predictions can be invaluable for administrative purposes, enabling data-driven decision-making and strategic planning [39]. Regression can also reveal how students’ habits relate to their grades. For example, regression models are applied to link students’ results to their reading behaviour [115]. This helps administrators and educators identify issues at an early stage and provide more advice to students. Another study employed regression models to predict academic outcomes using data from Virtual Learning Environments [32].
Clustering is an unsupervised learning technique used to group similar data points. In the context of EDM, clustering identifies natural patterns in educational data, which can be highly beneficial for administrative purposes [116]. For example, it enables the design of targeted interventions for each group, such as providing extra support and resources to students in the low-engagement, low-performance cluster while offering advanced learning materials and opportunities to those in the high-engagement, high-performance group [92]. Another common application of clustering is identifying at-risk students who may drop out. This enables educational organisations to take proactive measures, improving overall outcomes [34,102]. Given the vast amount of data generated by educational institutions, clustering helps administrators and stakeholders uncover hidden patterns and gain deeper insights. Consequently, this enhances the organisation’s performance by informing data-driven strategies and interventions.
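The grouping described above can be illustrated with a minimal k-means sketch. It is not taken from any of the cited studies: the two features (weekly LMS logins and average grade), the data values, and the hand-picked starting centroids are all hypothetical, and k-means is only one of several clustering algorithms that could be used.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = per-feature mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical students: (weekly LMS logins, average grade).
points = [(1.0, 40.0), (2.0, 45.0), (1.5, 42.0), (8.0, 85.0), (9.0, 90.0), (8.5, 88.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 40.0), (9.0, 90.0)])
print(clusters[0])  # low-engagement, low-performance group
print(clusters[1])  # high-engagement, high-performance group
```

In practice the features would usually be rescaled first, since the grade column otherwise dominates the Euclidean distance.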
There are many ML algorithms used in EDM. Table 1 summarises the literature in terms of the methods used. As Figure 3 shows, some of the most commonly used methods in EDM include classification, regression, and clustering. Predictive modelling is a cornerstone of EDM, providing powerful tools to forecast and analyse student outcomes and institutional performance. The two primary approaches within predictive modelling are classification and regression, each serving distinct purposes in the educational context. Classification involves predicting categorical outcomes based on input data. In educational settings, classification can be used to categorise students, predict binary outcomes, or segment data into discrete classes. For example, by analysing historical data on student demographics, attendance, and grades, classification models can predict which students are at risk of dropping out [36,82]. This allows administrators to intervene early and provide targeted support.
Figure 4 shows the most commonly used ML models in educational organisations for administration reasons based on our SoK boundaries. The ML models are detailed in the following subsections. In practice, educational administrators often combine classification and regression models to gain comprehensive insights. For instance, a regression model might predict future test scores, while a classification model categorises students based on predicted scores into different intervention groups.
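As a rough sketch of this two-stage idea, the snippet below fits a one-variable least-squares line to predict a future test score and then maps the prediction to an intervention group. Everything in it is assumed for illustration: the data, the function names, and the score thresholds are hypothetical, and the fixed threshold rule merely stands in for a trained classification model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def intervention_group(score, low=50.0, high=70.0):
    """Hypothetical banding rule standing in for a trained classifier."""
    if score < low:
        return "intensive support"
    if score < high:
        return "monitor"
    return "no intervention"

# Hypothetical history: previous test score -> next test score.
previous = [40.0, 50.0, 60.0, 70.0, 80.0]
following = [45.0, 55.0, 65.0, 75.0, 85.0]
slope, intercept = fit_line(previous, following)

for prev in (42.0, 60.0, 90.0):
    predicted = slope * prev + intercept
    print(prev, round(predicted, 1), intervention_group(predicted))
```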

4.2.1. Decision Tree

The Decision Tree (DT) is the second most used DM technique, owing to its simplicity, interpretability, and effectiveness in classification and regression tasks [119]. A DT is a tree-like graph based on a set of conditions. A set of features is used as input, and class labels are the output of the Decision Tree. A root node is placed at the top and generates a set of branches. Each branch describes a condition and connects to the next node [120]. A common criterion for choosing these conditions is the entropy of the dataset, defined as follows:
$$H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
where
  • $H(S)$ is the entropy of the dataset $S$.
  • $c$ is the number of classes.
  • $p_i$ is the probability of class $i$ in the dataset.
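To make the entropy definition concrete, the following sketch computes H(S) for a list of class labels, along with the entropy reduction (information gain) of a candidate split. The function names and the pass/fail labels are illustrative assumptions, not part of any cited study.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with five "pass" and five "fail" students.
labels = ["pass"] * 5 + ["fail"] * 5
print(entropy(labels))  # maximally mixed two-class node: 1 bit
print(information_gain(labels, [["pass"] * 5, ["fail"] * 5]))  # perfect split removes all entropy
```

A tree builder would evaluate this gain for every candidate condition and keep the split with the largest value.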
In the data mining literature, there are other splitting measures in Decision Trees such as the Misclassification Error function, which is the most commonly used approach for creating classification trees. It is demonstrated that directly calculating the binary response variable (1 or 0) can significantly enhance the parametric model estimate over a basic binary classification method. When the sample size is large enough, a parametric model provides an obvious advantage: the estimated classification model is less affected by input variable multicollinearity, and the model is easier to interpret [121].
Using both empirical and artificial data, it is demonstrated that the Gini Index is a viable criterion for building classification trees, resulting in more useful trees than the Misclassification Error. Numerical examples are provided to demonstrate the differences in results obtained by utilising the Gini Index vs. Misclassification Error as the splitting criterion when creating categorical variables, which are the most common classification methods in data mining. The choice of criterion is critical to the importance of the results. The objectives of these trees are to classify the input variables into groups that have comparable response variables that are known in each node and then branch out to the other input variables [122,123].
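A small worked example shows why the two criteria can disagree. The class counts below are a textbook-style illustration chosen for this sketch, not data from the cited works: both candidate splits have the same weighted Misclassification Error, yet the Gini Index prefers the split that produces a pure node.

```python
def gini(counts):
    """Gini Index of a node given its per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclassification_error(counts):
    """Share of samples outside the node's majority class."""
    return 1.0 - max(counts) / sum(counts)

def weighted_impurity(children, impurity):
    """Size-weighted impurity of the child nodes produced by a split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * impurity(child) for child in children)

# Parent node: 400 samples of each class. Two hypothetical candidate splits:
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # second child is pure

for name, split in (("A", split_a), ("B", split_b)):
    print(name,
          round(weighted_impurity(split, misclassification_error), 4),
          round(weighted_impurity(split, gini), 4))
# Misclassification Error is about 0.25 for both splits;
# Gini is 0.375 for A and about 0.333 for B, so Gini favours B.
```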
The DT algorithm is used by [57] to interpret the data generated by educational organisations’ systems, such as LMSs. The DT helps administrators improve the nature of the educational management process and also enhances the effectiveness of digital education construction. The DT algorithm helps save time in managing important aspects of education management, and the complexity of the system is reduced. Additionally, Ref. [43] uses DT with EDM data to improve the administration process in educational organisations within smart cities. It also helps planners and higher departments gain a clear understanding of the future and enhance their infrastructure planning. These studies suggest that the DT algorithm helps improve the management process. The DT model is utilised by [124] to predict students’ performance using their profiles. The DT helps administrators and educators by providing a roadmap to follow while executing complex knowledge projects that involve multiple stages and possibly several iterations. It bases the decision-making process in the educational domain on sound business analysis, thereby providing the best service for their customers, the students. To achieve customer satisfaction, there needs to be high student achievement. The DT algorithm is employed by [44] to predict student performance and identify the features required to enhance overall student outcomes and improve administrative processes.
Hybrid DT models can combine the DT with other algorithms such as gradient boosting or RF to improve prediction accuracy and reduce overfitting, especially when dealing with high-dimensional educational datasets [125]. These techniques are particularly useful for identifying at-risk students early in their academic journey, enabling timely interventions. One emerging application of DTs in EDM is the use of interpretable AI models, where visualisation tools are combined with DT outputs to create transparent, user-friendly models for educators and administrators [126]. This allows nontechnical staff to engage directly with the data insights without needing extensive technical knowledge [39]. However, DTs are susceptible to creating over-complex trees that do not generalise well to unseen data. Pruning methods are often employed to address this challenge, though it remains a limitation in educational settings where data can be sparse or noisy [77].
The DT model is utilised by [40] to analyse students’ factors during a semester to predict their results for the next semester. This helps administrators to provide insight into interventions that enhance students’ performance and organisations’ outcomes. In the study, the accuracy was around 76%. Another study applied DT [93] to identify factors influencing the students’ performance using the previous examination with around 76% accuracy. The DT algorithm is employed by [44] to determine whether sets of variables and factors can be useful for predicting student expulsions. Identifying students at risk of expulsion helps administrators understand real situations and act in advance to improve the educational organisation’s management process. Various factors are analysed for their impact on student expulsion with around 83% accuracy. Additionally, the DT is used by [55] to analyse students’ factors both before admission and during the current semester to predict their semester examination results and choices. This helps administrators and educators better understand students’ needs and reallocate resources accordingly. As a result, the DT improves the educational organisation’s management process.
In short, the DT algorithm has transformed educational administration by streamlining tasks and improving digital education through efficient data analysis. DT models predict student performance and identify key factors, allowing educators, administrators, and stakeholders to tailor interventions for better outcomes. They also provide insights for proactive interventions and infrastructure planning, enhancing student retention and education quality. Overall, DT algorithms optimise resource allocation and improve decision-making, delivering better services to students and increasing operational efficiency.

4.2.2. Random Forest

Random Forest (RF) is an ensemble learning method that constructs multiple Decision Trees. It outputs the mode of the individual trees’ predicted classes for classification or the mean of their predictions for regression. This method is known for its high accuracy and ability to handle large datasets [127]. However, one limitation of RF is its tendency to become computationally expensive as the size of the dataset increases, particularly in the context of real-time data analysis within educational management systems, where low-latency responses are required [128].
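The aggregation step of an RF can be sketched in a few lines. The snippet below assumes the individual trees have already been trained and only shows how their outputs are combined: a majority vote for classification and an average for regression. The bootstrap helper reflects the standard practice of training each tree on a resampled copy of the data, which the text above does not spell out; all names and values are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement to form one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def aggregate_classification(tree_votes):
    """Majority vote (mode) over the class predicted by each tree."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Average of the numeric predictions made by each tree."""
    return mean(tree_outputs)

# Hypothetical per-tree predictions for one student.
print(aggregate_classification(["at_risk", "on_track", "at_risk"]))
print(aggregate_regression([60.0, 70.0, 80.0]))

# Each tree would be fitted on its own bootstrap sample of the records.
records = list(range(10))
print(bootstrap_sample(records, random.Random(0)))
```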
In the field of EDM for administration, RF has been used to predict student performance, identify at-risk students, and analyse educational outcomes. The results show that RF helps organisations enhance both student and organisational performance as well as coordinate resource distribution [127]. The RF algorithm offers valuable insights into students’ performance and helps predict outcomes by identifying key attributes that influence their academic routines. This model not only enhances students’ performance but also improves the efficiency of administrative processes, overall organisational outcomes, and the effectiveness of educational institutions. Additionally, it helps increase educational quality, which is vital for encouraging students to stay in school [129]. The RF algorithm is employed to enhance decision-making and improve student performance and institutional efficiency. In [130], RF is used to predict students’ performance and to identify which academic and family factors influence their success. The RF algorithm is utilised in [131] to predict college students’ grades based on their learning behaviour. RF helps analyse achievement and allows teachers and students to arrange and adjust their learning plans reasonably to improve student achievement. This enhances the educational organisation management process by saving resources.
Recent advancements have demonstrated that integrating RF with other ensemble techniques such as gradient boosting can further improve accuracy in predicting long-term student outcomes such as graduation rates and job placements [28]. These advancements, however, come at the cost of increased computational complexity, which may pose challenges for institutions with limited infrastructure.
The RF model is utilised in [132] to predict students’ learning outcomes. Key attributes are identified based on students’ interactions with the e-learning management system, such as visit frequency, resource views, assignment submissions, and scores, with the goal of achieving maximum prediction accuracy. The RF model achieves 76.9% accuracy. The RF algorithm is utilised in [133] to identify aspects that influence students before admission to professional courses. This helps administrators in educational organisations understand the market and students’ needs, as well as provide guidelines for investors. RF helps improve the final outcome and increases the chances of students enrolling in their desired courses. In addition, RF algorithms are used to predict students’ performance based on demographic and academic features. The RF model achieves around 93% accuracy [134]. Similarly, another study [2] utilises the RF algorithm to predict ICT users’ satisfaction, highlighting the factors impacting users’ satisfaction. The RF model achieves around 94.90% accuracy in predicting satisfaction. Another study utilises the RF model to predict the students’ satisfaction [54]. They use the RF to predict different levels of student satisfaction and infer the influential factors related to course and instructor. In the study, the RF algorithm achieves more than 81% accuracy. The RF algorithm is used to identify factors influencing the prediction of students’ exam performance. The analysis concludes that the most reliable predictors are students’ performance in previous semesters and their interaction with learning resources [135].
Although RF demonstrates high accuracy in predicting student outcomes, it can be susceptible to overfitting when applied to highly complex datasets, particularly when irrelevant features are included, as is often the case with educational data. This limitation is commonly addressed by dimensionality reduction techniques such as Principal Component Analysis or by feature selection methods, which improve the model’s ability to generalise [136]. Furthermore, data privacy concerns arise with the increased collection of student data for these predictive models, especially in institutions lacking strong data protection protocols [137].
The RF model is also used to predict student dropout and grades, showing around 82% accuracy [138]. Another study uses RF to predict student dropouts from educational big data (BD) in the context of massive open online courses (MOOCs), helping to optimise dropout prediction [72]. Similarly, Ref. [36] uses the RF algorithm to predict early student dropouts. The RF model has proven effective in helping educators and administrators identify at-risk students early, thereby reducing dropout rates and improving educational outcomes. For instance, Ref. [79] used the RF model to predict slow learners at an early stage, allowing administrators to provide targeted support. Similarly, Ref. [39] utilised RF to predict at-risk students and improve their academic achievements by identifying the factors that impact performance. In another study, Ref. [82] applied the RF algorithm to predict students’ academic performance during the semester, enabling administrators to take preemptive actions to prevent course failure. In this case, the RF model achieved an accuracy rate of approximately 89%, significantly enhancing the institution’s ability to support students and streamline administrative processes.
Overall, the application of the RF algorithm in educational organisations enhances organisations’ performance and management efficiency. By providing accurate predictions and analyses of factors affecting the outcomes and the management process, RF aids administrators in making data-driven decisions, optimising resource allocation, and supporting at-risk students. This leads to improved educational quality and a more responsive environment. Additionally, these insights help developers, investors, and stakeholders understand future expectations and needs, ensuring better planning and investment in educational resources and technologies.

4.2.3. k-Nearest Neighbours

The k-Nearest Neighbours (KNN) algorithm is a nonparametric, instance-based learning method used for classification and regression tasks. The basic principle of the algorithm is to classify a data point based on the majority class among its closest k neighbours in the feature space. The algorithm relies on a distance metric, commonly Euclidean distance, to determine the “closeness” between points [139].
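The principle can be sketched directly: compute the Euclidean distance from a query point to every training point, keep the k closest, and return their majority class. The training data, the two features (weekly LMS logins and average grade), and the labels below are hypothetical and serve only to illustrate the mechanism.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical students: (weekly LMS logins, average grade) -> outcome.
train = [
    ((1.0, 45.0), "failed"), ((2.0, 50.0), "failed"), ((1.5, 40.0), "failed"),
    ((5.0, 75.0), "passed"), ((6.0, 80.0), "passed"), ((4.5, 70.0), "passed"),
]
print(knn_predict(train, (5.5, 78.0)))
print(knn_predict(train, (1.2, 44.0)))
```

Because the vote depends entirely on distances, features on very different scales are normally standardised before KNN is applied.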
In the context of EDM, the KNN algorithm is particularly useful for predicting outcomes, identifying at-risk students, enhancing the administration of educational organisations, and tailoring educational resources to individual needs [140]. KNN helps improve management by analysing historical data such as student demographics, performance, and satisfaction levels to identify patterns and trends that guide decision-making.
The KNN model is utilised by [44] to mine the large, complex data generated by educational organisations, identifying the smallest subset of features that impact the algorithm’s performance in predicting student outcomes. The KNN algorithm aids in the early prediction of student performance, enabling the identification of low-performing students who may be at risk of failing exams. This allows administrators to intervene and support these students in improving their outcomes [81] with around 85% accuracy. The KNN algorithm is applied by [78] to predict student outcomes using various attributes, including demographic and academic features. This approach allows managers to take proactive measures in supporting students. Their findings indicate that neighbourhood, school, and age are the most influential features affecting student success. Similarly, the KNN algorithm is utilised by [38] to predict students’ performance and classify them as either “passed” or “failed”. Log reports generated by EDM systems are employed for this classification.
In addition, KNN is employed by [82], achieving around 98.1% accuracy after balancing the educational dataset. Moreover, the KNN algorithm is utilised in [36] to detect students at risk at an early stage, providing valuable insights to administrators, investors, developers, and decision-makers for improving organisational outcomes.
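The text does not say how the data were balanced. One simple possibility, shown here purely as an assumption, is random oversampling: minority-class records are duplicated at random until every class is as frequent as the largest one. The function name, seed, and data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced data: three "passed" records, one "failed".
rows = [(1,), (2,), (3,), (4,)]
labels = ["passed", "passed", "passed", "failed"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that duplicated records do not leak into the evaluation data and inflate the reported accuracy.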
The KNN algorithm assists managers and higher departments in identifying factors influencing resource allocation, enabling more effective redistribution. As demonstrated in [63], KNN aids in understanding the factors impacting resource utilisation. By applying EDM and the KNN algorithm, both resource allocation and the educational administration process are improved.
In addition, the KNN algorithm is used to predict satisfaction levels. For example, the authors of [54] use KNN to predict students’ satisfaction with a course, helping educators and managers provide suitable support at the right time. Recent developments in KNN for EDM involve its application in personalised learning pathways [141]. For instance, hybrid approaches combining KNN with dimensionality reduction techniques help to mitigate the “curse of dimensionality” that arises in large, feature-rich educational datasets. This allows KNN to remain computationally efficient while maintaining high prediction accuracy [142]. Moreover, KNN has been employed in recent studies to group students based on their learning behaviour, leading to more personalised interventions and resource allocation strategies [143]. These approaches enhance the overall decision-making process, enabling administrators to deliver targeted support to students in need, improving retention rates and academic performance.
Another study [2] applied the KNN algorithm to predict ICT users’ satisfaction and identify the factors influencing it. The KNN algorithm achieved an accuracy of around 92%. The study concluded that the key factors affecting ICT users’ satisfaction include usability, privacy, security, and IT support.
Overall, the KNN algorithm proves to be a valuable tool in EDM by predicting student performance and identifying at-risk students. Its application across various studies demonstrates high accuracy in analysing complex educational data, aiding in resource allocation, and enhancing decision-making processes. The algorithm supports educational institutions in improving management and providing targeted support to students, ultimately contributing to better educational outcomes.
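The pass/fail predictions described in this subsection can be illustrated with a minimal sketch of the KNN voting rule. It is not the implementation used in any of the cited studies: the function name `knn_predict`, the two features (attendance rate and average assignment score), and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest points and take a majority vote over their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (attendance rate, average assignment score), scaled to 0..1
train_X = [(0.95, 0.90), (0.90, 0.80), (0.85, 0.75),
           (0.40, 0.35), (0.50, 0.30), (0.30, 0.45)]
train_y = ["passed", "passed", "passed", "failed", "failed", "failed"]

print(knn_predict(train_X, train_y, (0.88, 0.82)))  # a student close to the "passed" group
print(knn_predict(train_X, train_y, (0.45, 0.40)))  # a student close to the "failed" group
```

Because KNN relies on distances, features measured on different scales would normally be rescaled first; the toy data above are already on a common 0 to 1 scale.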

4.2.4. Support Vector Machine

The Support Vector Machine (SVM) is a supervised ML algorithm used for classification and regression tasks. It is particularly effective in high-dimensional spaces and is known for its robustness in handling linear and nonlinear data. In the domain of EDM, SVM has been utilised for different goals, such as predicting student performance, classifying learning behaviours, and identifying at-risk students [144].
SVM is utilised in several studies to interpret educational data and transform them into valuable information for decision-making. For example, Ref. [45] applied SVM to educational data to aid decision-making processes. In Ref. [82], SVM was applied after balancing the dataset, achieving an accuracy of around 98.3%. In addition, Ref. [145] employed SVM to predict students’ exam performance and overall personality development, achieving 97.3% accuracy. In Ref. [81], SVM was highlighted for improving student attendance and academic performance.
Another study by [75] applied SVM to predict student performance using data from an educational administration system and an online learning platform, achieving 67% accuracy in early warnings for course performance. Furthermore, Ref. [81] used SVM to predict academic performance with around 96% accuracy based on students’ academic attributes.
SVM was also used by [63] to predict student dropouts, considering factors such as academic performance, social welfare status, and secondary school type, achieving 83% accuracy. In Ref. [72], SVM was applied in MOOCs to optimise dropout prediction and identify at-risk students. Moreover, Ref. [146] used SVM to predict degree completion within three years for STEM community college students, achieving 90.42% accuracy, allowing institutions to take early action to prevent dropouts.
Improvements in SVMs have made this algorithm highly effective for detecting nonlinear patterns in educational data, particularly in multidimensional feature spaces where traditional linear classifiers might struggle. For example, SVMs with radial basis function kernels have been shown to effectively classify students’ learning behaviours [147]. Moreover, tuning the hyperparameters of an SVM is crucial for optimal performance. Automated hyperparameter tuning methods, such as grid search and random search, have been recently applied to educational datasets, reducing the expertise barrier for implementing SVM effectively [148].
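For readers less familiar with the terms used above, the radial basis function kernel and the grid search objective can be written out as follows. This is the standard textbook formulation rather than a formula taken from the cited studies, and the symbols ($\gamma$ for the kernel width, $C$ for the regularisation parameter, $\alpha_i$ for the learned coefficients, $\mathcal{G}$ for the candidate grid, and $\mathrm{CV}$ for a cross-validated score) are notation assumed here.

```latex
% RBF kernel between two feature vectors x and x'
K(x, x') = \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right), \qquad \gamma > 0

% Decision function of a trained kernel SVM with training pairs (x_i, y_i), y_i \in \{-1, +1\}
f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b\right)

% Grid search: pick the hyperparameter pair with the best cross-validated score
(C^{*}, \gamma^{*}) = \arg\max_{(C, \gamma) \in \mathcal{G}} \mathrm{CV}(C, \gamma)
```

Random search differs only in that the candidate pairs are sampled from specified ranges instead of being enumerated on a fixed grid.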
In short, the SVM algorithm proves to be a reliable and effective tool in EDM. It is widely used to predict student performance, identify at-risk students, and classify learning behaviours, contributing significantly to early intervention and decision-making in educational settings. The accuracy rates achieved in various studies demonstrate SVM’s strong potential in enhancing educational outcomes and supporting administrative processes. These results highlight the importance of SVM in improving the overall quality of education.

4.2.5. Artificial Neural Network

An Artificial Neural Network (ANN) is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected nodes, or “neurons”, organised in layers that process input data to produce an output [149]. These networks can learn from data by adjusting the weights of connections based on the input–output relationships observed during training.
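The idea of learning by adjusting connection weights can be sketched with the simplest possible case, a single sigmoid neuron trained by batch gradient descent. This is an illustrative assumption, not a model from the reviewed studies: the function names, the learning rate, the epoch count, and the toy inputs (two binary warning flags per student) are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Output of one sigmoid neuron for input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(samples, targets, learning_rate=1.0, epochs=2000):
    """Train one sigmoid neuron with batch gradient descent on the log-loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, targets):
            error = predict(weights, bias, x) - y  # log-loss gradient w.r.t. pre-activation
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Move each weight against its average gradient
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias

# Hypothetical flags per student: (missed deadlines, low attendance); 1 = at risk
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

weights, bias = train_neuron(samples, targets)
for x in samples:
    print(x, round(predict(weights, bias, x), 3))
```

A full ANN stacks many such neurons in layers and propagates the error backwards through them, but the weight update in each connection follows the same principle.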
In the context of EDM, the ANN is utilised to model complex patterns in the big data generated by educational organisations, such as predicting student performance, identifying at-risk students, and optimising administrative processes. By providing insights that inform interventions, resource allocation, and policy development, ANNs significantly enhance decision-making processes within educational organisations [150].
Newer deep learning architectures, such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), have further enhanced the capability of ANNs in EDM. These architectures enable more accurate predictions, particularly in sequential data like student engagement over time or spatial data like student interactions with LMSs [151]. However, a key limitation of ANNs is their lack of interpretability, commonly referred to as the “black box” problem. Researchers are addressing this by exploring Explainable AI techniques that provide insights into how ANN models make predictions, improving their transparency and trust among educational administrators and policy-makers [152].
The ANN model has been widely utilised in various studies to predict at-risk students and improve educational outcomes. For example, Ref. [36] employed the ANN model to predict at-risk students before a course began, achieving approximately 83% accuracy. Similarly, Ref. [49] used the ANN algorithm to predict students’ future grades and study durations based on prior course data, enabling managers to take proactive measures and improve administrative processes.
In another study, a Generative Adversarial Network, an extension of the ANN model, was used by [82] to identify factors affecting student performance, providing administrators with early insights. This model achieved 98% accuracy. In addition, Ref. [87] utilised the ANN model to enhance the efficiency of a student management system, enabling administrators to take proactive steps. The model achieved 88% accuracy, with key factors influencing student success identified as graduate profile, optional courses taken, age at enrolment, admission scores, and the number of failed exams [37].
The ANN model also played a key role in identifying low-performing students in [81], allowing educators and managers to provide early support, improving student performance with 95% accuracy. Similarly, Ref. [79] applied the ANN algorithm to predict slow learners at an early stage, achieving around 75% accuracy, helping administrators to decide on special assistance for these students. Overall, these applications contribute to improving both the administrative processes and outcomes of educational organisations.
Furthermore, educational organisations leverage LMS data, with the ANN algorithm serving as an EDM tool that extracts valuable insights from this BD and thereby improves the administration process. For example, the ANN algorithm is used in [32] to predict at-risk students, enabling early intervention measures; the model achieved approximately 93% accuracy by utilising a set of uniquely handcrafted features extracted from clickstream data in virtual learning environments. Another example of using the ANN model to explore hidden information in educational BD is demonstrated by [28], where the ANN algorithm predicts student performance by leveraging hidden information in LMS data, such as learning behavioural features. This enables managers to reallocate resources effectively and improve the administration process.
In short, the ANN plays an important role in EDM by modelling complex patterns within the BD generated by educational organisations. ANNs are employed to predict student performance, identify at-risk students, and optimise administrative processes, providing actionable insights that enhance decision-making. Studies demonstrate high accuracy rates, with ANNs achieving up to 98% accuracy in various predictive tasks. These models enable early intervention, resource reallocation, and improved management within educational institutions, contributing to the overall efficiency and effectiveness of educational administration.

4.2.6. Logistic Regression

Logistic Regression (LR) is a supervised learning algorithm for binary classification tasks. Unlike linear regression, which predicts continuous values, LR predicts the probability, a value between 0 and 1, that a given input belongs to a particular class. The model’s parameters, one coefficient per input feature, are estimated by maximum likelihood estimation, that is, by choosing the values that maximise the likelihood of the observed data [140].
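The definition above corresponds to the following standard formulation, given here as a reference sketch. The notation is assumed for illustration: $x_i$ is the feature vector of student $i$, $y_i \in \{0, 1\}$ is the observed class, and $w$ and $b$ are the coefficients and intercept.

```latex
% Predicted probability that input x belongs to class 1
p(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}

% Log-likelihood of n observed pairs (x_i, y_i), with p_i = p(y = 1 \mid x_i)
\ell(w, b) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]

% Maximum likelihood estimate
(\hat{w}, \hat{b}) = \arg\max_{w, b} \; \ell(w, b)
```

A student is then typically assigned to class 1 (for example, “at risk”) when the predicted probability exceeds a chosen threshold such as 0.5.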
In the context of EDM, LR is often used to improve the educational organisations’ management process [153]. The LR algorithm is employed to predict and identify key factors affecting student success. One study found that the average number of logins to the LMS was the most significant predictor of student performance [38]. Similarly, Ref. [63] used the LR model to predict student success based on profile information and personal attributes, aiding educators and decision-makers in providing early support and improving educational management.
In addition, Ref. [82] utilised the LR model to predict academic performance over both short and long observation periods, enabling educators and administrators to identify students at risk of failing. The LR model achieved an accuracy of around 99.8% for longer observation periods. In another study by [40], the LR algorithm was used to identify students in need of additional support and interventions to enhance academic performance, contributing to improved outcomes and administrative processes.
Moreover, Ref. [34] applied the LR algorithm to predict school dropout using students’ scores and EDM techniques. The model achieved an accuracy of approximately 74.05% in identifying students at risk of dropping out, guiding managers to take proactive steps to reduce dropout rates and enhance administrative efficiency.
In short, the LR algorithm is an effective tool in EDM for binary classification tasks, predicting student outcomes, identifying at-risk individuals, and supporting the administration of educational organisations. Studies demonstrate its efficacy in pinpointing factors such as LMS login frequency and personal attributes that influence academic performance. With good accuracy rates, the LR model enables educators and administrators to intervene proactively, enhancing student success and educational institutions’ overall management. The algorithm’s applications in predicting risks, such as school dropout, further underscore its value in facilitating informed decision-making and improving organisational processes.

4.2.7. Naive Bayes

Naive Bayes (NB) is a family of probabilistic ML algorithms based on Bayes’s theorem, with an assumption of independence among predictors [154]. For a given record, it calculates the probability of belonging to each class and assigns the record to the class with the highest probability [116].
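The classification rule just described can be written compactly as below. This is the standard formulation, with notation assumed for illustration: $c$ ranges over the classes and $x_1, \dots, x_n$ are the predictor values of one record.

```latex
% Bayes's theorem combined with the independence assumption among predictors
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{j=1}^{n} P(x_j \mid c)

% The record is assigned to the class with the highest posterior probability
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} P(x_j \mid c)
```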
The NB algorithm was utilised by [47] to identify factors affecting the academic success of tertiary students, achieving a prediction accuracy of 92.17%. The study highlights NB’s ability to handle large educational datasets and provide valuable insights for decision-making in educational settings. In addition, Ref. [90] employed the NB model to analyse factors influencing college students’ decisions regarding majors and their overall academic performance. The model was used to predict student behaviours, attitudes, and performance, facilitating the implementation of proactive measures to enhance student achievement.
Moreover, Ref. [89] applied the NB algorithm to develop a mechanism assisting teachers in predicting students’ academic performance and implementing corresponding interventions. This approach enabled early warnings and tailored support for students at different levels. NB was also employed by [155] to analyse learning behaviour patterns using EDM, focusing on identifying key factors not directly related to learning behaviours. Their combined algorithm achieved an accuracy of around 90%, helping to interpret students’ learning habits.
In addition, Ref. [156] used NB to predict academic performance and suggest improvements where needed. The NB algorithm proved effective in making successful decisions that enhance student performance. Furthermore, Ref. [157] developed a flexible and generalisable NB model using a large dataset of student responses, incorporating a broader range of attributes, making NB an effective predictor of student performance.
Overall, the NB algorithm is an effective EDM tool, particularly for predicting student performance and identifying influential academic factors. Its probabilistic nature allows for effective classification despite the simplifying assumption that predictors are independent. Studies consistently demonstrate NB’s ability to process large educational datasets, providing critical insights for decision-making and proactive interventions. Its versatility across various educational applications underscores its value in enhancing student outcomes and improving institutional processes.

4.3. Administrators’ Views on Impact of EDM

Understanding administrators’ perceptions of EDM is crucial for its successful implementation. As key decision-makers, their attitudes toward EDM influence its adoption and effectiveness. Addressing their concerns and insights can lead to more effective integration of data-driven strategies in educational administration.
Educational administration systems should meet users’ needs, and decision-makers must clearly communicate these requirements to developers. For example, one study discusses user satisfaction with ICT systems that utilise EDM technology for daily administrative tasks [2]. It reports that many ICT systems are in use, yet users are not highly satisfied. The study also highlights factors affecting user satisfaction, such as privacy, security, and usability [2]. Another study finds that schools and administrators need to focus on performance-based accountability policies, supportive relationships among users, and developing a positive attitude towards EDM. Additionally, improving the accessibility of data systems and enhancing teachers’ data and ICT literacy are crucial [158]. Furthermore, the use of EDM applications in educational administration has not received sufficient attention, as highlighted by [98]. Implementing EDM systems in educational organisations’ administrative processes requires effective collaboration across different organisational levels.
Administrative processes at the primary level often receive less attention compared to higher levels, as noted by [70]. For example, resource allocation and availability in primary schools are frequently overlooked, resulting in increased wastage [70]. The integration of EDM applications can help identify gaps in the educational administration process, providing valuable insights to higher-level departments and administrators. A study by [159] explores the adaptation of Saudi Arabian schools to ICT and EDM systems for administrative purposes, highlighting challenges such as limited infrastructure, staff shortages, inadequate IT support, and resistance from principals.
Similarly, Ref. [160] discusses the influence of school principals on their staff’s use of ICT systems in educational administration. Their findings reveal that when principals actively promote and lead the implementation of ICT systems, usage rates increase significantly. Conversely, a lack of leadership from principals hinders the adoption of these technologies.
EDM technologies, in conjunction with ICT systems, were employed by [161] to identify factors influencing educational staff satisfaction during the COVID-19 pandemic. The study found that principals play a crucial role in promoting ICT systems and ensuring proper training for staff in educational organisations. However, Ref. [69] highlights that principals and managers often lack the time to analyse and interpret the large volumes of big data (BD) generated by educational administration systems. As digital systems become more prevalent, the size of data increases, making it difficult for managers to extract valuable insights from BD.
Moreover, educational managers frequently struggle with the complexity and time-consuming nature of managing various educational administration systems. This challenge is further exacerbated by the need to train staff and communicate effectively with higher departments to address issues, as noted by [112]. In addition, bureaucracy in educational organisations is identified as a key factor that hinders the use of EDM technologies. According to [162], bureaucratic processes demotivate managers and impede the digital adoption of ICT systems, largely due to limited communication with upper-level departments.

4.4. Potential EDM Benefits, Drawbacks, and Challenges

4.4.1. Benefits

EDM enables educational administrators to make data-driven decisions, improving the accuracy and efficiency of administrative processes. For example, data-driven insights can inform resource allocation [63], course offerings [55], and organisational assignments [34], ensuring that decisions are based on actual needs and trends rather than intuition or guesswork [86]. Recent studies demonstrate how EDM tools can support decision-making by providing actionable insights derived from student data, leading to more effective institutional management [41,59]. EDM also helps educational organisations to sort, analyse, and utilise the data generated by educational management systems such as ICT systems [67]. Additionally, EDM aids in predicting the outcomes of educational organisations and improving their overall performance, as discussed in [81].
EDM can identify patterns in student behaviour [106], such as attendance [81], engagement [92], and academic performance [102], which are predictive of student success or failure [78]. By identifying at-risk students early [140], educational institutions can intervene with targeted support, such as tutoring or counselling, to improve retention and graduation rates. By providing insights into trends and potential future developments, EDM supports long-term strategic planning in educational institutions. For example, forecasting enrolment trends or identifying emerging skill demands can help institutions adjust their curriculum and resources to better meet future challenges. A study conducted by [62] discusses how EDM aids in strategic planning by offering predictive insights that inform the direction of educational policies and practices.
The automation of routine administrative tasks, such as scheduling, enrolment management, and faculty workload distribution, can be greatly enhanced by EDM [68]. EDM can also contribute to resource savings by automating routine tasks, such as monitoring attendance. By integrating EDM technologies, administrative departments can redistribute resources more effectively and reduce the risk of staff burnout [68]. EDM tools can streamline operations by analysing historical data to predict future needs, thus improving the overall efficiency of educational organisations [2]. Using EDM also helps prevent potential privacy and security issues and can identify areas needing more attention, such as staff training or infrastructure [112]. EDM helps IT departments and decision-makers to understand educational administration systems better and to improve the cybersecurity awareness of staff and organisations [163]. For example, during the COVID-19 pandemic, almost all educational organisations shifted to remote learning, which put many of them at risk of cybersecurity problems such as data breaches, unskilled usage, and phishing scams. Using EDM technologies helped reduce the risk and impact of these cybersecurity threats [163].
EDM facilitates the creation of personalised learning experiences by analysing individual student data, such as learning styles, performance metrics, and engagement levels [69]. This allows educators to tailor instruction and resources to meet the unique needs of each student, enhancing their learning experience [72]. One study illustrates how personalisation through EDM leads to more engaged and successful students by aligning educational content with individual learning needs [106].

4.4.2. Drawbacks and Challenges

  • Privacy and security concerns: The extensive collection and analysis of student data raise significant privacy and security concerns. Educational organisations must ensure that they comply with data protection regulations to avoid breaches that could harm students, systems, and the organisation’s reputation [67]. Ensuring data privacy while harnessing the power of EDM is a critical and ongoing challenge for educational organisations [112,159].
  • Operational cost: The implementation of EDM systems can be expensive and resource-intensive. Organisations need to invest in new technologies, upgrade their data infrastructure, and provide training for staff, which can be a significant burden, especially for smaller institutions [112]. For example, as mentioned in [164], a consistently reliable internet connection is hard to obtain in rural areas, which limits the use of EDM systems for administrative tasks.
  • Resistance to change: Introducing EDM tools often requires a cultural shift within the institution. Faculty and staff may resist these changes, particularly if they perceive them as undermining their professional judgment or increasing their workload. Resistance to change is a major barrier to the successful adoption of EDM technologies in educational settings, as discussed in [159].
  • Data quality and integration: One of the key challenges in implementing EDM is ensuring that the data used are accurate, complete, and integrated from various sources within the institution [165]. Inaccurate or incomplete data can lead to faulty analyses and poor decision-making [53]. Data quality issues are a barrier to the effective use of EDM in educational settings [166].
  • Overfitting and model generalisability: ML models including RFs and ANNs are prone to overfitting when applied to highly specific or complex datasets [167]. This occurs when a model becomes too tailored to the training data, resulting in poor performance on unseen or future data. Educational datasets, which often contain noise or outliers, can exacerbate this issue, leading to unreliable predictions. To mitigate this, institutions can employ techniques such as cross-validation and regularisation, though this requires a level of technical expertise that may not always be available in smaller institutions or those with limited resources [168]. Additionally, ensuring that models generalise well to new data is a significant challenge in the real-world deployment of EDM systems, which could impact the reliability of insights drawn from them [169].
  • Bias in data: Historical biases in educational datasets can significantly affect the performance and fairness of predictive models [170]. For example, if a dataset reflects existing socio-economic inequalities or other demographic imbalances, the model’s predictions may perpetuate these disparities, leading to biased decision-making in areas such as admissions or student support. Techniques such as fairness-aware machine learning and bias detection algorithms are being explored to mitigate this issue, though their implementation is still in its infancy in most educational institutions. It is critical that educational organisations actively monitor for biases and develop strategies to minimise their impact on both the model’s outcomes and the students affected [171].
  • Ethical considerations: The use of EDM raises important ethical questions, particularly regarding consent, transparency, and potential biases in data analysis [172]. Educational organisations must carefully consider the ethical implications of using student data to ensure that their practices are fair and just. This emphasises the importance of addressing ethical issues in EDM to avoid potential harm to students [173].
  • Technical expertise: Successfully implementing EDM requires a certain level of technical expertise, which may not be readily available in all educational organisations [164]. Institutions may need to invest in training or hiring specialists to manage and interpret the data effectively. The need for skilled personnel is highlighted by [112], which discusses the challenges of building technical capacity in educational institutions for EDM implementation and how EDM affects the users’ satisfaction regarding ICT systems.
  • Alignment with educational goals: Ensuring that EDM initiatives align with the institution’s broader educational goals and values can be challenging. There is a risk that the focus on data-driven decision-making could divert the institution from other important aspects of education, such as creativity and critical thinking [138].
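The cross-validation technique mentioned under overfitting and model generalisability can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure from the reviewed studies: the function names, the strided fold assignment, the KNN model used as the example learner, and the eight toy records are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, k_neighbors, n_folds=4):
    """Average held-out accuracy of KNN over n_folds folds."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th record (offset by `fold`) is held out for testing
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [label for i, label in enumerate(y) if i not in test_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]
            for i in test_idx
        )
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds

# Hypothetical records: (attendance rate, average score), labels alternate
X = [(0.90, 0.80), (0.20, 0.30), (0.80, 0.90), (0.30, 0.20),
     (0.85, 0.85), (0.25, 0.25), (0.95, 0.90), (0.10, 0.20)]
y = ["passed", "failed"] * 4

# Score each candidate hyperparameter on held-out data only
scores = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data, the setting with five neighbours scores poorly because each training split contains only two records of the held-out class, so the other class always wins the vote. The point of the sketch is that such a weakness is exposed by evaluating on records the model has not seen, rather than on the training data itself.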

5. Conclusions

This SoK highlights the transformative impact of EDM on the administrative functions of educational organisations. The findings reveal that EDM plays an essential role in enhancing decision-making, optimising resource allocation, and elevating overall institutional performance. By employing advanced techniques such as clustering, classification, and regression, EDM provides administrators with powerful tools to make data-driven decisions that lead to more effective management practices.
Notably, the DT and RF emerged as particularly effective techniques for predicting student performance and identifying at-risk students. These methods enable timely interventions, improving student retention rates and academic outcomes. The ability of EDM to process and analyse large volumes of educational data also allows for more strategic resource allocation, ensuring that efforts are focused where they are most needed for maximum impact.
The ANN, despite its potential, has not yet received sufficient attention as an EDM tool in the studies reviewed. While ANNs are known for their ability to model complex patterns and provide deep insights, their application in educational administration remains under-explored. This gap presents an opportunity for future research to investigate the potential of ANNs in enhancing EDM applications, particularly in areas requiring nuanced data analysis.
Moreover, this review highlights how EDM contributes to improved institutional performance by streamlining administrative processes and fostering data-driven policy development. However, the integration of EDM is not without challenges, particularly concerning data privacy and the need for seamless system integration. These issues point to the ongoing need for research and innovation in this area.
In conclusion, this review demonstrates the significant potential of EDM to revolutionise educational administration. By providing detailed insights into the most effective techniques and their practical applications, this study offers valuable guidance for educational leaders and policy-makers aiming to harness the full power of data to enhance organisational efficiency and student success.

6. Future Work and Limitations

The identified applications of EDM can be utilised for further analysis as a foundation for developing a comprehensive EDM framework for administrative tasks. This will encourage educational organisations to hire EDM programmers that will assist them in achieving their objectives and enhancing the efficiency and effectiveness of their administrative processes.
The findings also set the stage for future research aimed at refining EDM methodologies, exploring the underutilised potential of tools like ANN, and addressing implementation challenges, thereby advancing the field of educational management.
However, the presented SoK has several limitations. The applications of EDM have been identified through qualitative content analysis and are based on the authors’ interpretations of the findings, which may introduce bias into this review. Furthermore, our review focuses on the use of EDM for administrative tasks within educational organisations rather than instructional or learning-focused applications. Despite these limitations, this review’s findings highlight the main requirements that educational organisations should consider for the successful implementation of EDM programmes.

Author Contributions

Conceptualization, H.A. and B.S.; Methodology, H.A., B.S., A.L. and I.A.; Validation, H.A. and I.A.; Formal analysis, H.A., B.S. and I.A.; Investigation, H.A. and B.S.; Resources, B.S. and A.L.; Data curation, H.A.; Writing—original draft, H.A.; Writing—review & editing, B.S. and A.L.; Supervision, B.S. and A.L.; Project administration, B.S. and A.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SoK: Systematisation of knowledge
ICT: Information and communication technology
EDM: Educational Data Mining
BD: Big data
ML: Machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
WL: White literature
GL: Grey literature

References

  1. Cappa, F.; Oriani, R.; Peruffo, E.; McCarthy, I. Big data for creating and capturing value in the digitalized environment: Unpacking the effects of volume, variety, and veracity on firm performance. J. Prod. Innov. Manag. 2021, 38, 49–67.
  2. Almaghrabi, H.; Soh, B.; Li, A. Using ML to Predict User Satisfaction with ICT Technology for Educational Institution Administration. Information 2024, 15, 218.
  3. Umezuruike, C.; Ngugi, H.N. Imminent Challenges of Adoption of Big Data in Educational Systems in Sub-Saharan Africa Nations. Int. J. Recent Technol. Eng. 2020, 8, 4544–4550.
  4. Zafari, M.; Bazargani, J.S.; Sadeghi-Niaraki, A.; Choi, S.M. Artificial intelligence applications in K-12 education: A systematic literature review. IEEE Access 2022, 10, 61905–61921.
  5. Alenezi, H.S.; Faisal, M.H. Utilizing crowdsourcing and machine learning in education: Literature review. Educ. Inf. Technol. 2020, 25, 2971–2986.
  6. Sandra, L.; Lumbangaol, F.; Matsuo, T. Machine Learning Algorithm to Predict Student’s Performance: A Systematic Literature Review. TEM J. 2021, 10, 1919–1927.
  7. de Baker, R.S.J.; Barnes, T.; Beck, J.E. Educational data mining 2008. In Proceedings of the 1st International Conference on Educational Data Mining, Montréal, QC, Canada, 20–21 June 2008.
  8. Baker, R.S.; Yacef, K. The state of educational data mining in 2009: A review and future visions. J. Educ. Data Min. 2009, 1, 3–17.
  9. Romero, C.; Ventura, S.; Espejo, P.G.; Hervás, C. Data mining algorithms to classify students. In Proceedings of the 1st International Conference on Educational Data Mining, Montréal, QC, Canada, 20–21 June 2008.
  10. Xiao, W.; Ji, P.; Hu, J. A survey on educational data mining methods used for predicting students’ performance. Eng. Rep. 2022, 4, e12482.
  11. Dake, D.K.; Buabeng-Andoh, C. Using Machine Learning Techniques to Predict Learner Drop-out Rate in Higher Educational Institutions. Mob. Inf. Syst. 2022, 2022, e2670562.
  12. Costa, A.G.; Queiroga, E.; Primo, T.T.; Mattos, J.C.B.; Cechinel, C. Prediction analysis of student dropout in a Computer Science course using Educational Data Mining. In Proceedings of the 2020 XV Conferencia Latinoamericana de Tecnologias de Aprendizaje (LACLO), Loja, Ecuador, 19–23 October 2020; pp. 1–6.
  13. Cavacini, A. What is the best database for computer science journal articles? Scientometrics 2015, 102, 2059–2071.
  14. Garousi, V.; Felderer, M.; Mäntylä, M.V.; Rainer, A. Benefitting from the Grey Literature in Software Engineering Research. arXiv 2019, arXiv:1911.12038.
  15. Garousi, V.; Felderer, M.; Mäntylä, M.V. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering. Inf. Softw. Technol. 2019, 106, 101–121.
  16. Neto, G.T.G.; Santos, W.B.; Endo, P.T.; Fagundes, R.A. Multivocal literature reviews in software engineering: Preliminary findings from a tertiary study. In Proceedings of the 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Porto de Galinhas, Brazil, 19–20 September 2019; pp. 1–6.
  17. Garousi, V.; Küçük, B. Smells in software test code: A survey of knowledge in industry and academia. J. Syst. Softw. 2018, 138, 52–81. [Google Scholar] [CrossRef]
  18. Pardo, A.; Han, F.; Ellis, R.A. Combining university student self-regulated learning indicators and engagement with online learning events to predict academic performance. IEEE Trans. Learn. Technol. 2016, 10, 82–92. [Google Scholar] [CrossRef]
  19. Panic, N.; Leoncini, E.; de Belvis, G.; Ricciardi, W.; Boccia, S. Evaluation of the endorsement of the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement on the quality of published systematic review and meta-analyses. PLoS ONE 2013, 8, e83138. [Google Scholar] [CrossRef] [PubMed]
  20. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Reprint-Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Phys. Ther. 2009, 89, 873–880. [Google Scholar] [CrossRef]
  21. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. bmj 2021, 372, n71. [Google Scholar] [CrossRef]
  22. Maphosa, M.; Maphosa, V. Educational data mining in higher education in sub-Saharan Africa: A systematic literature review and research agenda. In Proceedings of the 2nd International Conference on Intelligent and Innovative Computing Applications, Plaine Magnien, Mauritius, 24–25 September 2020; pp. 1–7. [Google Scholar]
  23. Kitchenham, B.; Pearl Brereton, O.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15. [Google Scholar] [CrossRef]
  24. Keele, S.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE Technical Report; EBSE: Durham, UK, 2007. [Google Scholar]
  25. Vanwersch, R.J.; Shahzad, K.; Vanhaecht, K.; Grefen, P.W.; Pintelon, L.M.; Mendling, J.; van Merode, G.G.; Reijers, H.A. Methodological support for business process redesign in health care: A literature review protocol. Int. J. Care Pathw. 2011, 15, 119–126. [Google Scholar] [CrossRef]
  26. Tyndall, J. AACODS Checklist. Available online: https://fac.flinders.edu.au/dspace/api/core/bitstreams/e94a96eb-0334-4300-8880-c836d4d9a676/content (accessed on 25 September 2024).
  27. Feldman-Maggor, Y.; Barhoom, S.; Blonder, R.; Tuvi-Arad, I. Behind the scenes of educational data mining. Educ. Inf. Technol. 2021, 26, 1455–1470. [Google Scholar] [CrossRef]
  28. Amrieh, E.A.; Hamtini, T.; Aljarah, I. Mining educational data to predict student’s academic performance using ensemble methods. Int. J. Database Theory Appl. 2016, 9, 119–136. [Google Scholar] [CrossRef]
  29. Villegas-Ch, W.; Luján-Mora, S.; Buenaño-Fernandez, D. Towards the integration of business intelligence tools applied to educational data mining. In Proceedings of the 2018 IEEE World Engineering Education Conference (EDUNINE), Buenos Aires, Argentina, 11–14 March 2018; pp. 1–5. [Google Scholar]
  30. Wang, Y. Artificial intelligence in educational leadership: A symbiotic role of human-artificial intelligence decision-making. J. Educ. Adm. 2021, 59, 256–270. [Google Scholar] [CrossRef]
  31. Ayub, M.; Toba, H.; Wijanto, M.C.; Yong, S. Modelling online assessment in management subjects through educational data mining. In Proceedings of the 2017 International Conference on Data and Software Engineering (ICoDSE), Palembang, Indonesia, 1–2 November 2017; pp. 1–6. [Google Scholar]
  32. Waheed, H.; Hassan, S.U.; Aljohani, N.R.; Hardman, J.; Alelyani, S.; Nawaz, R. Predicting academic performance of students from VLE big data using deep learning models. Comput. Hum. Behav. 2020, 104, 106189. [Google Scholar] [CrossRef]
  33. Chau, V.T.N.; Phung, N.H. A knowledge-driven educational decision support system. In Proceedings of the 2012 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future, Ho Chi Minh City, Vietnam, 27 February–1 March 2012. [Google Scholar] [CrossRef]
  34. Lee Hernández, L.E.; Castán-Rocha, J.A.; Ibarra-Martínez, S.; Terán-Villanueva, J.D.; Treviño-Berrones, M.G.; Laria-Menchaca, J. Cluster Analysis Using k-Means in School Dropout. In Data Analytics and Computational Intelligence: Novel Models, Algorithms and Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 3–16. [Google Scholar]
  35. Fahd, K.; Miah, S.J. Effectiveness of data augmentation to predict students at risk using deep learning algorithms. Soc. Netw. Anal. Min. 2023, 13, 113. [Google Scholar] [CrossRef]
  36. Wahdan, A.; Hantoobi, S.; Al-Emran, M.; Shaalan, K. Early detecting students at risk using machine learning predictive models. Proceedings of International Conference on Emerging Technologies and Intelligent Systems: ICETIS 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 2, pp. 321–330. [Google Scholar]
  37. Oprea, C.; Zaharia, M. Using data mining methods in knowledge management in educational field. Fascicle Manag. Technol. Eng. 2011, 10, 5.222–5.227. [Google Scholar]
  38. Koyuncu, I.; Kılıc, A.F.; Goksun, D.O. Classification of Students’Achievement via Machine Learning by Using System Logs in Learning Management System. Turk. Online J. Distance Educ. 2022, 23, 18–30. [Google Scholar] [CrossRef]
  39. Jang, Y.; Choi, S.; Jung, H.; Kim, H. Practical early prediction of students’ performance using machine learning and eXplainable AI. Educ. Inf. Technol. 2022, 27, 12855–12889. [Google Scholar] [CrossRef]
  40. Nafea, A.A.; Mishlish, M.; Shaban, A.M.S.; AL-Ani, M.M.; Alheeti, K.M.A.; Mohammed, H.J. Enhancing Student’s Performance Classification Using Ensemble Modeling. Iraqi J. Comput. Sci. Math. 2023, 4, 204–214. [Google Scholar] [CrossRef]
  41. Zheng, C.; Zhou, W. Research on information construction and management of education management based on data mining. J. Phys. Conf. Ser. 2021, 1881, 042073. [Google Scholar] [CrossRef]
  42. Chen, J.; Zhao, J. An Educational Data Mining Model for Supervision of Network Learning Process. Int. J. Emerg. Technol. Learn. 2018, 13, 67–77. [Google Scholar] [CrossRef]
  43. Gomede, E.; Gaffo, F.H.; Briganó, G.U.; de Barros, R.M.; Mendes, L.d.S. Application of computational intelligence to improve education in smart cities. Sensors 2018, 18, 267. [Google Scholar] [CrossRef]
  44. Thaher, T.; Zaguia, A.; Al Azwari, S.; Mafarja, M.; Chantar, H.; Abuhamdah, A.; Turabieh, H.; Mirjalili, S.; Sheta, A. An enhanced evolutionary student performance prediction model using whale optimization algorithm boosted with sine-cosine mechanism. Appl. Sci. 2021, 11, 10237. [Google Scholar] [CrossRef]
  45. Siddique, A.; Jan, A.; Majeed, F.; Qahmash, A.I.; Quadri, N.N.; Wahab, M.O.A. Predicting academic performance using an efficient model based on fusion of classifiers. Appl. Sci. 2021, 11, 11845. [Google Scholar] [CrossRef]
  46. Sajja, G.S.; Pallathadka, H.; Phasinam, K.; Ray, S. Using Classification Data Mining for Predicting Student Performance. ECS Trans. 2022, 107, 10217. [Google Scholar] [CrossRef]
  47. Sanvitha Kasthuriarachchi, K.; Liyanage, S.; Bhatt, C.M. A data mining approach to identify the factors affecting the academic success of tertiary students in Sri Lanka. In Software Data Engineering for Network eLearning Environments: Analytics and Awareness Learning Services; Springer: Berlin/Heidelberg, Germany, 2018; pp. 179–197. [Google Scholar]
  48. Buenaño-Fernández, D.; Gil, D.; Luján-Mora, S. Application of machine learning in predicting performance for computer engineering students: A case study. Sustainability 2019, 11, 2833. [Google Scholar] [CrossRef]
  49. Christou, V.; Tsoulos, I.; Loupas, V.; Tzallas, A.T.; Gogos, C.; Karvelis, P.S.; Antoniadis, N.; Glavas, E.; Giannakeas, N. Performance and early drop prediction for higher education students using machine learning. Expert Syst. Appl. 2023, 225, 120079. [Google Scholar] [CrossRef]
  50. Duan, Y. A Study of Prediction Accuracy of English Test Performance Using Data Mining and Analysis. Ann. Emerg. Technol. Comput. AETIC 2023, 7, 1–8. [Google Scholar] [CrossRef]
  51. Llauró, A.; Fonseca, D.; Villegas, E.; Aláez, M.; Romero, S. Educational data mining application for improving the academic tutorial sessions, and the reduction of early dropout in undergraduate students. In Proceedings of the Ninth International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM’21), Barcelona, Spain, 26–29 October 2021; pp. 212–218. [Google Scholar]
  52. Chytas, K.; Tsolakidis, A.; Triperina, E.; Karanikolas, N.N.; Skourlas, C. An Integrated Platform for Educational and Research Management Using Institutional Digital Resources. In Proceedings of the Novel & Intelligent Digital Systems Conferences; Springer: Berlin/Heidelberg, Germany, 2023; pp. 266–276. [Google Scholar]
  53. Chalaris, M.; Gritzalis, S.; Maragoudakis, M.; Sgouropoulou, C.; Tsolakidis, A. Improving quality of educational processes providing new knowledge using data mining techniques. Procedia-Soc. Behav. Sci. 2014, 147, 390–397. [Google Scholar] [CrossRef]
  54. Afrin, F.; Rahaman, M.S.; Hamilton, M. Mining Student Responses to Infer Student Satisfaction Predictors. arXiv 2020, arXiv:2006.07860. [Google Scholar] [CrossRef]
  55. Harsono, S.; Utami, E.; Yaqin, A. The Association Rule Methods and K-Means Clustering For Optimization Mapping Of New Students Admission. In Proceedings of the 2024 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), Virtual, 22–23 February 2024; pp. 1–6. [Google Scholar]
  56. Chiu, T.K.; Moorhouse, B.L.; Chai, C.S.; Ismailov, M. Teacher support and student motivation to learn with Artificial Intelligence (AI) based chatbot. Interact. Learn. Environ. 2023, 32, 3240–3256. [Google Scholar] [CrossRef]
  57. Yu, W. Application of Big Data Technology in the Innovation of University Education Management Work. In Proceedings of the 2021 4th International Conference on Information Systems and Computer Aided Education, Dalian, China, 24–26 September 2021; pp. 988–992. [Google Scholar]
  58. Fan, Z.; Gou, J.; Wang, C. Predicting secondary school student performance using a double particle swarm optimization-based categorical boosting model. Eng. Appl. Artif. Intell. 2023, 124, 106649. [Google Scholar] [CrossRef]
  59. Kovalev, S.; Kolodenkova, A.; Muntyan, E. Educational data mining: Current problems and solutions. In Proceedings of the 2020 V International Conference on Information Technologies in Engineering Education (Inforino), Moscow, Russia, 14–17 April 2020; pp. 1–5. [Google Scholar]
  60. Moscoso-Zea, O.; Sampedro, A.; Luján-Mora, S. Datawarehouse design for educational data mining. In Proceedings of the 2016 15th International Conference on Information Technology Based Higher Education and Training (ITHET), Istanbul, Turkey, 8–10 September 2016; pp. 1–6. [Google Scholar]
  61. Almuniri, I.; Said, A.M. School’s performance evaluation based on data mining. Int. J. Eng. Inf. Syst. 2017, 1, 56–62. [Google Scholar]
  62. Agasisti, T.; Bowers, A.J. Data analytics and decision making in education: Towards the educational data scientist as a key actor in schools and higher education institutions. In Handbook of Contemporary Education Economics; Edward Elgar Publishing: Northampton, MA, USA, 2017; pp. 184–210. [Google Scholar]
  63. Stasyshin, V.M.; Stasyshin, T.V. Analysis of Educational Data in the Decision-Making Support System of University; IEEE: Piscataway, NJ, USA, 2018; pp. 541–545. ISBN 9781538670545. [Google Scholar] [CrossRef]
  64. Ma, Y. Utilization of Data Mining Technology in University Students Management. In Proceedings of the 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), Frankfurt, Germany, 17–19 December 2022; pp. 663–667. [Google Scholar]
  65. Stefanova, K.; Kabakchieva, D. Educational data mining perspectives within university big data environment. In Proceedings of the 2017 International Conference on Engineering, Technology and Innovation (ICE/ITMC), Madeira Island, Portugal, 27–29 June 2017; pp. 264–270. [Google Scholar]
  66. Jerabek, M.; Kubat, J.; Fabera, V. Smart, Smarter, and Smartest City: The Method to Comparison of Cities; Springer: Berlin/Heidelberg, Germany, 2020; pp. 33–41. ISBN 9783030342722. [Google Scholar] [CrossRef]
  67. Nisha, N.S. Mining and its applications in Data Educational Management System. J. Sci. 2018, 12, 79–82. [Google Scholar]
  68. Ahmad, S.F.; Alam, M.M.; Rahmat, M.K.; Mubarik, M.S.; Hyder, S.I. Academic and Administrative Role of Artificial Intelligence in Education. Sustainability 2022, 14, 1101. [Google Scholar] [CrossRef]
  69. Li, S. Analysis on the application and challenge of educational big data in university teaching management. In Proceedings of the 2020 Conference on Education, Language and Inter-cultural Communication (ELIC 2020), Zhengzhou, China, 21–22 September 2020; Atlantis Press: Amsterdam, The Netherlands, 2020; pp. 148–153. [Google Scholar]
  70. Inusah, F.; Missah, Y.M.; Ussiph, N.; Twum, F. Expert system in enhancing efficiency in basic educational management using data mining techniques. IJACSA Int. J. Adv. Comput. Sci. Appl. 2021, 12, 427–434. [Google Scholar] [CrossRef]
  71. Murphy, R.F. Artificial Intelligence Applications to Support K-12 Teachers and Teaching; Rand Corporation: Santa Monica, CA, USA, 2019; p. 10. [Google Scholar]
  72. Basnet, R.B.; Johnson, C.; Doleck, T. Dropout prediction in Moocs using deep learning and machine learning. Educ. Inf. Technol. 2022, 27, 11499–11513. [Google Scholar] [CrossRef]
  73. Zhao, H. Research on Construction of Educational Management Model Based on Data Mining Technology. J. Appl. Sci. Eng. 2022, 26, 613–621. [Google Scholar]
  74. Wu, L. Educational Integrated Management System based on Artificial Intelligence and Multimedia. In Proceedings of the 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, India, 3–4 December 2022; pp. 1–6. [Google Scholar]
  75. Liu, C.; Wang, H.; Yuan, Z. A Method for Predicting the Academic Performances of College Students Based on Education System Data. Mathematics 2022, 10, 3737. [Google Scholar] [CrossRef]
  76. Zhang, Y.; Yun, Y.; An, R.; Cui, J.; Dai, H.; Shang, X. Educational data mining techniques for student performance prediction: Method review and comparison analysis. Front. Psychol. 2021, 12, 698490. [Google Scholar] [CrossRef]
  77. Bai, X.; Zhang, F.; Li, J.; Guo, T.; Aziz, A.; Jin, A.; Xia, F. Educational Big Data: Predictions, Applications and Challenges. Big Data Res. 2021, 26, 100270. [Google Scholar] [CrossRef]
  78. Fernandes, E.; Holanda, M.; Victorino, M.; Borges, V.; Carvalho, R.; Van Erven, G. Educational data mining: Predictive analysis of academic performance of public school students in the capital of Brazil. J. Bus. Res. 2019, 94, 335–343. [Google Scholar] [CrossRef]
  79. Kaur, P.; Singh, M.; Josan, G.S. Classification and Prediction Based Data Mining Algorithms to Predict Slow Learners in Education Sector. Proc. Procedia Comput. Sci. 2015, 57, 500–508. [Google Scholar] [CrossRef]
  80. Moodley, R.; Chiclana, F.; Carter, J.; Caraffini, F. Using data mining in educational administration: A case study on improving school attendance. Appl. Sci. 2020, 10, 3116. [Google Scholar] [CrossRef]
  81. Garg, S.; Aleem, A.; Gore, M.M. Employing Deep Neural Network for Early Prediction of Students’ Performance. In Proceedings of the Intelligent Systems: Proceedings of ICMIB 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 497–507. [Google Scholar]
  82. Vives, L.; Cabezas, I.; Vives, J.C.; Reyes, N.G.; Aquino, J.; Cóndor, J.B.; Altamirano, S.F.S. Prediction of Students’ Academic Performance in the Programming Fundamentals Course Using Long Short-Term Memory Neural Networks. IEEE Access 2024, 12, 5882–5898. [Google Scholar] [CrossRef]
  83. Leelaluk, S.; Minematsu, T.; Taniguchi, Y.; Okubo, F.; Shimada, A. Predicting student performance based on Lecture Materials data using Neural Network Models. Ceur Workshop Proc. 2022, 3120, 11–20. [Google Scholar]
  84. Liu, T.; Wang, C.; Chang, L.; Gu, T. Predicting high-risk students using learning behavior. Mathematics 2022, 10, 2483. [Google Scholar] [CrossRef]
  85. Arun, D.; Namratha, V.; Ramyashree, B.; Jain, Y.P.; Choudhury, A.R. Student academic performance prediction using educational data mining. In Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; pp. 1–9. [Google Scholar]
  86. Zhang, Y.; Jing, R.; Lan, L. Analysis of Student Learning Behavior Portrait Based on Big Data Technology. In Proceedings of the 3rd International Conference on New Media Development and Modernized Education, NMDME 2023, Xi’an, China, 13–15 October 2023. [Google Scholar]
  87. Huang, H.; Li, B. Design and implementation of student management system of integrated programmable device programming system. Sci. Rep. 2024, 14, 11873. [Google Scholar] [CrossRef]
  88. Desai, U.; Ramasamy, V.; Kiper, J. Evaluation of student collaboration on canvas LMS using educational data mining techniques. In Proceedings of the 2021 ACM Southeast Conference, Virtual, 15–17 April 2021; pp. 55–62. [Google Scholar]
  89. Wang, Y.J.; Gao, C.L.; Ye, X.D. A data-driven precision teaching intervention mechanism to improve secondary school students’ learning effectiveness. Educ. Inf. Technol. 2023, 29, 11645–11673. [Google Scholar] [CrossRef]
  90. Veluri, R.K.; Patra, I.; Naved, M.; Prasad, V.V.; Arcinas, M.M.; Beram, S.M.; Raghuvanshi, A. Learning analytics using deep learning techniques for efficiently managing educational institutes. Mater. Today Proc. 2022, 51, 2317–2320. [Google Scholar] [CrossRef]
  91. Pallathadka, H.; Wenda, A.; Ramirez-Asís, E.; Asís-López, M.; Flores-Albornoz, J.; Phasinam, K. Classification and prediction of student performance data using various machine learning algorithms. Mater. Today Proc. 2023, 80, 3782–3785. [Google Scholar] [CrossRef]
  92. Prada, M.A.; Dominguez, M.; Vicario, J.L.; Alves, P.A.V.; Barbu, M.; Podpora, M.; Spagnolini, U.; Pereira, M.J.V.; Vilanova, R. Educational data mining for tutoring support in higher education: A web-based tool case study in engineering degrees. IEEE Access 2020, 8, 212818–212836. [Google Scholar] [CrossRef]
  93. Singh, M.; Nagar, H.; Sant, A. Using data mining to predict primary school student performance. IJARIIE 2019, 2, 43–46. [Google Scholar] [CrossRef]
  94. Wang, Z.; Zhu, C.; Ying, Z.; Zhang, Y.; Wang, B.; Jin, X.; Yang, H. Design and implementation of early warning system based on educational big data. In Proceedings of the 2018 5th International Conference on Systems and Informatics (ICSAI), Nanjing, China, 10–12 November 2018; pp. 549–553. [Google Scholar]
  95. Wang, W. Model construction and research on decision support system for education management based on data mining. Comput. Intell. Neurosci. 2021, 2021, 9056947. [Google Scholar] [CrossRef] [PubMed]
  96. Romero-Rodríguez, J.M.; Alonso-García, S.; Marín-Marín, J.A.; Gómez-García, G. Considerations on the implications of the internet of things in spanish universities: The usefulness perceived by professors. Future Internet 2020, 12, 123. [Google Scholar] [CrossRef]
  97. Ray, S.; Saeed, M. Applications of educational data mining and learning analytics tools in handling big data in higher education. Applications of Big Data Analytics: Trends, Issues, and Challenges; Springer: Berlin/Heidelberg, Germany, 2018; pp. 135–160. [Google Scholar]
  98. Chen, S.; Pian, Y.; Zheng, Y. Challenges and Strategies for Designing More Effective Educational Data Mining Applications. In Proceedings of the 2023 Twelfth International Conference of Educational Innovation through Technology (EITT), Fuzhou, China, 15–17 December 2023; pp. 175–179. [Google Scholar]
  99. Kiu, M.S.; Lai, K.W.; Chia, F.C.; Wong, P.F. Blockchain integration into electronic document management (EDM) system in construction common data environment. Smart Sustain. Built Environ. 2024, 13, 117–132. [Google Scholar] [CrossRef]
  100. Fernandez-Medina, C.; Pérez-Pérez, J.R.; Álvarez-García, V.M.; Paule-Ruiz, M.D.P. Assistance in computer programming learning using educational data mining and learning analytics. In Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education, Canterbury, UK, 1–3 July 2013; pp. 237–242. [Google Scholar]
  101. Moreno-Guerrero, A.J.; López-Belmonte, J.; Marín-Marín, J.A.; Soler-Costa, R. Scientific development of educational artificial intelligence in Web of Science. Future Internet 2020, 12, 124. [Google Scholar] [CrossRef]
  102. Penteado, B.E.; Paiva, P.M.P.; Morettin-Zupelari, M.; Isotani, S.; Ferrari, D.V. Toward better outcomes in audiology distance education: An educational data mining approach. Am. J. Audiol. 2018, 27, 513–525. [Google Scholar] [CrossRef] [PubMed]
  103. Uguz, C. Exploring Methodologies for Utilizing Click-Track Data Using Educational Data Mining and Evidence Centered Design in Online Professional Development Environments. Ph.D. Thesis, University of Virginia, Charlottesville, VA, USA, 2016. [Google Scholar]
  104. Chandra, D.G.; Raman, A.C. Educational Data Mining on Learning Management Systems Using SCORM. In Proceedings of the 2014 Fourth International Conference on Communication Systems and Network Technologies, Bhopal, India, 7–9 April 2014; pp. 362–368. [Google Scholar]
  105. Stork, M.G. Implementing a digital learning initiative: A case study in K-12 classrooms. J. Form. Des. Learn. 2018, 2, 36–48. [Google Scholar] [CrossRef]
  106. Kellogg, S.B. Patterns of Peer Interaction and Mechanisms Governing Social Network Structure in Two Massively Open Online Courses for Educators. Ph.D. Thesis, North Carolina State University, Raleigh, NC, USA, 2014. [Google Scholar]
  107. Rawal, A.; Thomas, A. Educational Data Mining Practices in Indian Academia. In Proceedings of the 10th Innovations in Software Engineering Conference, Jaipur, India, 5–7 February 2017; pp. 218–219. [Google Scholar]
  108. Ganga, X. Educational Artificial Intelligence (EAI) Connotation, Key Technology and Application Trend-Interpretation and analysis of the two reports entitled “Preparing for the Future of Artificial Intelligence” and “The National Artificial Intelligence Research and Development Strategic Plan”. In Proceedings of the 2021 International Conference on Intelligent Computing, Automation and Applications (ICAA), Nanjing, China, 25–27 June 2021; pp. 219–223. [Google Scholar]
  109. Liu, D.; Huang, M. Engineering Certification Practice Teaching Management and Data Mining Based on Complex Hierarchical Model. In Proceedings of the 2021 4th International Conference on Information Systems and Computer Aided Education, Dalian, China, 24–26 September 2021; pp. 87–91. [Google Scholar]
  110. Sweta, S.; Sweta, S. Educational data mining in e-learning system. In Modern Approach to Educational Data Mining and Its Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–12. [Google Scholar]
  111. Alvarado-Uribe, J.; Mejía-Almada, P.; Masetto Herrera, A.L.; Molontay, R.; Hilliger, I.; Hegde, V.; Montemayor Gallegos, J.E.; Ramírez Díaz, R.A.; Ceballos, H.G. Student dataset from Tecnologico de Monterrey in Mexico to predict dropout in higher education. Data 2022, 7, 119. [Google Scholar] [CrossRef]
  112. Almaghrabi, H.; Li, A.; Soh, B. IiCE: A Proposed System Based on IoTaaS to Study Administrative Efficiency in Primary Schools. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2022; Volume 421, pp. 121–138. [Google Scholar] [CrossRef]
  113. Huang, C.K. Change-Detection Machine Learning Model for Educational Management. Cybern. Syst. 2023, 54, 1212–1239. [Google Scholar] [CrossRef]
  114. Udupi, P.K.; Sharma, N.; Jha, S. Educational data mining and big data framework for e-learning environment. In Proceedings of the 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 7–9 September 2016; pp. 258–261. [Google Scholar]
  115. Hidayat, N.; Wardoyo, R.; Azhari, S. Educational Data Mining (EDM) as a Model for Students’ Evaluation in Learning Environment. In Proceedings of the 2018 Third International Conference on Informatics and Computing (ICIC), Palembang, Indonesia, 17–18 October 2018; pp. 1–4. [Google Scholar]
  116. Batool, S.; Rashid, J.; Nisar, M.W.; Kim, J.; Kwon, H.Y.; Hussain, A. Educational data mining to predict students’ academic performance: A survey study. Educ. Inf. Technol. 2023, 28, 905–971. [Google Scholar] [CrossRef]
  117. Cong, J.; Zheng, P.; Bian, Y.; Chen, C.H.; Li, J.; Li, X. A machine learning-based iterative design approach to automate user satisfaction degree prediction in smart product-service system. Comput. Ind. Eng. 2022, 165, 107939. [Google Scholar] [CrossRef]
  118. Araka, E.; Oboko, R.; Maina, E.; Gitonga, R. Using educational data mining techniques to identify profiles in self-regulated learning: An empirical evaluation. Int. Rev. Res. Open Distrib. Learn. 2022, 23, 131–162. [Google Scholar] [CrossRef]
  119. Tomasevic, N.; Gvozdenovic, N.; Vranes, S. An overview and comparison of supervised data mining techniques for student exam performance prediction. Comput. Educ. 2020, 143, 103676. [Google Scholar] [CrossRef]
  120. Freund, Y.; Mason, L. The alternating decision tree learning algorithm. ICML 1999, 99, 124–133. [Google Scholar]
  121. Zhao, B.; Shuai, C.; Hou, P.; Qu, S.; Xu, M. Estimation of unit process data for life cycle assessment using a decision tree-based approach. Environ. Sci. Technol. 2021, 55, 8439–8446. [Google Scholar] [CrossRef]
  122. Tangirala, S. Evaluating the impact of GINI index and information gain on classification using decision tree classifier algorithm. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 612–619. [Google Scholar] [CrossRef]
  123. Rahmati, O.; Avand, M.; Yariyan, P.; Tiefenbacher, J.P.; Azareh, A.; Bui, D.T. Assessment of Gini-, entropy-and ratio-based classification trees for groundwater potential modelling and prediction. Geocarto Int. 2022, 37, 3397–3415. [Google Scholar] [CrossRef]
  124. Guruler, H.; Istanbullu, A. Modeling student performance in higher education using data mining. In Educational Data Mining: Applications and Trends; Springer: Berlin/Heidelberg, Germany, 2014; pp. 105–124. [Google Scholar]
  125. Sokkhey, P.; Okazaki, T. Development and optimization of deep belief networks applied for academic performance prediction with larger datasets. IEIE Trans. Smart Process. Comput. 2020, 9, 298–311. [Google Scholar] [CrossRef]
  126. Chitti, M.; Chitti, P.; Jayabalan, M. Need for interpretable student performance prediction. In Proceedings of the 2020 13th International Conference on Developments in eSystems Engineering (DeSE), Virtual, 14–17 December 2020; pp. 269–272. [Google Scholar]
  127. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  128. Allevato, A.; Thornton, M.; Edwards, S.; Perez-Quinones, M. Mining data from an automated grading and testing system by adding rich reporting capabilities. In Proceedings of the Educational Data Mining, Montreal, QC, Canada, 20–21 June 2008; Citeseer: State College, PA, USA, 2008. [Google Scholar]
  129. Salal, Y.; Abdullaev, S.; Kumar, M. Educational data mining: Student performance prediction in academic. Int. J. Eng. Adv. Technol. 2019, 8, 54–59.
  130. Kumar, M.; Singh, A. Evaluation of data mining techniques for predicting student’s performance. Int. J. Mod. Educ. Comput. Sci. 2017, 8, 25–31.
  131. Dai, J. Improving Random Forest Algorithm for University Academic Affairs Management System Platform Construction. Adv. Multimed. 2022, 2022, 8064844.
  132. Abubakar, Y.; Ahmad, N.B.H. Prediction of Students’ Performance in E-Learning Environment Using Random Forest. Int. J. Innov. Comput. 2017, 7, 1–5.
  133. Algur, S.P.; Bhat, P.; Ayachit, N.H. Educational data mining: RT and RF classification models for higher education professional courses. Int. J. Inf. Eng. Electron. Bus. 2016, 8, 59.
  134. Senthil, S.; Lin, W.M. Applying classification techniques to predict students’ academic results. In Proceedings of the 2017 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), Bangalore, India, 2–3 March 2017; pp. 1–6.
  135. Sandoval, A.; Gonzalez, C.; Alarcon, R.; Pichara, K.; Montenegro, M. Centralized student performance prediction in large courses based on low-cost variables in an institutional context. Internet High. Educ. 2018, 37, 76–89.
  136. Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11.
  137. Kim, Y.; Wang, Q.; Roh, T. Do information and service quality affect perceived privacy protection, satisfaction, and loyalty? Evidence from a Chinese O2O-based mobile shopping application. Telemat. Inform. 2021, 56, 101483.
  138. Rovira, S.; Puertas, E.; Igual, L. Data-driven system to predict academic grades and dropout. PLoS ONE 2017, 12, e0171207.
  139. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
  140. Slater, S.; Joksimović, S.; Kovanovic, V.; Baker, R.S.; Gasevic, D. Tools for educational data mining: A review. J. Educ. Behav. Stat. 2017, 42, 85–106.
  141. Bhutoria, A. Personalized education and Artificial Intelligence in the United States, China, and India: A systematic review using a Human-In-The-Loop model. Comput. Educ. Artif. Intell. 2022, 3, 100068.
  142. Jawthari, M.; Stoffová, V. Predicting students’ academic performance using a modified kNN algorithm. Pollack Period. 2021, 16, 20–26.
  143. Okfalisa; Gazalba, I.; Mustakim; Reza, N.G.I. Comparative analysis of k-nearest neighbor and modified k-nearest neighbor algorithm for data classification. In Proceedings of the 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 1–2 November 2017; pp. 294–298.
  144. Steinwart, I.; Christmann, A. Support Vector Machines; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008.
  145. Agarwal, S.; Pandey, G.; Tiwari, M. Data mining in education: Data classification and decision tree approach. Int. J. e-Educ. e-Bus. e-Manag. e-Learn. 2012, 2, 140.
  146. Cardona, T.A.; Cudney, E.A. Predicting student retention using support vector machines. Procedia Manuf. 2019, 39, 1827–1833.
  147. Li, X.; Zhang, Y.; Cheng, H.; Zhou, F.; Yin, B. An unsupervised ensemble clustering approach for the analysis of student behavioral patterns. IEEE Access 2021, 9, 7076–7091.
  148. Asogbon, M.G.; Samuel, O.W.; Omisore, M.O.; Ojokoh, B.A. A multi-class support vector machine approach for students academic performance prediction. Int. J. Multidiscip. Curr. Res. 2016, 4, 210–215.
  149. Yegnanarayana, B. Artificial Neural Networks; PHI Learning Pvt. Ltd.: Delhi, India, 2009.
  150. Romero, C.; Ventura, S. Educational data mining: A review of the state of the art. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2010, 40, 601–618.
  151. Lin, Y.W.; Zhou, Y.; Faghri, F.; Shaw, M.J.; Campbell, R.H. Analysis and prediction of unplanned intensive care unit readmission using recurrent neural networks with long short-term memory. PLoS ONE 2019, 14, e0218942.
  152. Khosravi, H.; Shum, S.B.; Chen, G.; Conati, C.; Tsai, Y.S.; Kay, J.; Knight, S.; Martinez-Maldonado, R.; Sadiq, S.; Gašević, D. Explainable artificial intelligence in education. Comput. Educ. Artif. Intell. 2022, 3, 100074.
  153. Romero, C.; Ventura, S. Data mining in education. In Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery; Wiley: Hoboken, NJ, USA, 2013; Volume 3, pp. 12–27.
  154. Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; Volume 3, pp. 41–46.
  155. Feng, G.; Fan, M. Research on learning behavior patterns from the perspective of educational data mining: Evaluation, prediction and visualization. Expert Syst. Appl. 2024, 237, 121555.
  156. Jalota, C.; Agrawal, R. Analysis of educational data mining using classification. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 243–247.
  157. Asad, R.; Altaf, S.; Ahmad, S.; Shah Noor Mohamed, A.; Huda, S.; Iqbal, S. Achieving personalized precision education using the Catboost Model during the COVID-19 lockdown period in Pakistan. Sustainability 2023, 15, 2714.
  158. Luo, J.; Wang, M.; Yu, S. Exploring the factors influencing teachers’ instructional data use with electronic data systems. Comput. Educ. 2022, 191, 104631.
  159. Almalki, G.; Williams, N. A Strategy to Improve The Usage of ICT in The Kingdom of Saudi Arabia Primary School. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 42–49.
  160. Thannimalai, R.; Raman, A. The Influence of Principals’ Technology Leadership and Professional Development on Teachers’ Technology Integration in Secondary Schools. Malays. J. Learn. Instr. 2018, 15, 201–226.
  161. Chen, T.; Peng, L.; Yin, X.; Rong, J.; Yang, J.; Cong, G. Analysis of User Satisfaction with Online Education Platforms in China during the COVID-19 Pandemic. Healthcare 2020, 8, 200.
  162. Dormann, M.; Hinz, S.; Wittmann, E. Improving school administration through information technology? How digitalisation changes the bureaucratic features of public school administration. Educ. Manag. Adm. Leadersh. 2017, 47, 275–290.
  163. Mandal, S.; Khan, D.A. A Study of Security Threats in Cloud: Passive Impact of COVID-19 Pandemic. In Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 10–12 September 2020; pp. 837–842.
  164. Khalid, M.S.; Nyvang, T. A change agent’s facilitation process for overcoming the barriers of ICT adoption for educational administration—The case of a rural-Bangladesh vocational institution. Australas. J. Educ. Technol. 2014, 30, 547–561.
  165. Nwakanma, C.I.; Hossain, M.S.; Lee, J.M.; Kim, D.S. Towards machine learning based analysis of quality of user experience (QoUE). Int. J. Mach. Learn. Comput. 2020, 10, 752–758.
  166. Baker, R.S.J. Encyclopedia of Data Warehousing and Mining; IGI Global Scientific Publishing: Hershey, PA, USA, 2005.
  167. Guo, B.; Zhang, R.; Xu, G.; Shi, C.; Yang, L. Predicting students performance in educational data mining. In Proceedings of the 2015 International Symposium on Educational Technology (ISET), Wuhan, China, 27–29 July 2015; pp. 125–128.
  168. Ashraf, M.; Zaman, M.; Ahmed, M. An intelligent prediction system for educational data mining based on ensemble and filtering approaches. Procedia Comput. Sci. 2020, 167, 1471–1483.
  169. Hämäläinen, W.; Vinni, M. Classifiers for educational data mining. In Handbook of Educational Data Mining; Chapman & Hall/CRC Data Mining and Knowledge Discovery Series; CRC Press: Boca Raton, FL, USA, 2011; pp. 57–71.
  170. Sha, L.; Gašević, D.; Chen, G. Lessons from debiasing data for fair and accurate predictive modeling in education. Expert Syst. Appl. 2023, 228, 120323.
  171. Baker, R.; de Carvalho, A. Labeling student behavior faster and more precisely with text replays. In Proceedings of the Educational Data Mining, the 1st International Conference on Educational Data Mining, Montreal, QC, Canada, 20–21 June 2008.
  172. Anshari, M.; Syafrudin, M.; Fitriyani, N.L. Fourth Industrial Revolution between Knowledge Management and Digital Humanities. Information 2022, 13, 292.
  173. Putrama, I.M.; Pradnyana, G.A.; Paramartha, A.A.G.Y.; Darmawiguna, I.G.M.; Wirawan, I.M.A.; Pascima, I.B.N.; Wijaya, I.N.S.W.; Aryanto, K.Y.E. Educational big data infrastructure: Opportunities, design and challenges. J. Phys. Conf. Ser. 2021, 1810, 012023.
Figure 1. EDM relationships between the fields.
Figure 2. SoK methodology stages.
Figure 3. Distribution of papers included in this SoK based on EDM techniques.
Figure 4. ML algorithms most frequently applied by educational organisations as part of EDM techniques.
Table 1. Summary of ML techniques and models that are applied in EDM included in this SoK paper (2011–2024).
| Year | Reference | Techniques | Models |
|---|---|---|---|
| 2008 | [9] | Classification | RF, ANN, NB, KNN, DT |
| 2011 | [37] | Classification, regression | RF, ANN, NB |
| 2015 | [79] | Classification | ANN, NB, DT, SVM |
| 2016 | [28] | Classification, regression | ANN, NB, DT |
| 2017 | [61] | Classification | OneR, DT |
| 2017 | [62] | Classification, regression | DT |
| 2017 | [31] | Classification | DT |
| 2018 | [42] | Clustering, classification | KNN |
| 2018 | [43] | Classification, regression | RF, ANN, NB |
| 2018 | [115] | Classification | NA |
| 2019 | [78] | Classification | GBM |
| 2020 | [32] | Classification, regression | ANN, LR, SVM |
| 2020 | [59] | Regression | ANN |
| 2020 | [92] | Classification, clustering | KNN, SVM |
| 2021 | [108] | Regression | ANN |
| 2021 | [95] | Classification, regression | DT |
| 2021 | [44] | Classification | DT, KNN, NB, LDA, LB |
| 2021 | [81] | Classification, regression | SVM, NB, KNN, DT, RF, ANN |
| 2021 | [45] | Classification | DT, ANN |
| 2021 | [85] | Classification, regression | NB, RF, DT, LR |
| 2022 | [38] | Classification, regression | FLDA, NB, DT, RF, ANN, LR, KNN |
| 2022 | [36] | Classification | SVM, KNN, ANN |
| 2022 | [39] | Regression, classification | LR, DT, DT, ANN, SVM, NB |
| 2022 | [117] | Classification | ANN, SVM, DT |
| 2022 | [72] | Regression, classification | RF, KNN, SVM, LR, LDA, NB, ANN |
| 2022 | [75] | Regression | DT, RF, ANN, SVM |
| 2022 | [118] | Clustering | DT, RF, ANN, SVM |
| 2022 | [83] | Classification | ANN |
| 2022 | [84] | Regression | ANN |
| 2022 | [46] | Classification, regression | SVM, RF, DT |
| 2022 | [90] | Classification, regression | ANN, SVM, NB |
| 2023 | [40] | Classification, regression | SVM, RF, DT, LR, ANN, KNN, NB |
| 2023 | [49] | Classification, regression | ANN, SVM, DT, NB, KNN |
| 2023 | [52] | Classification, clustering | DT |
| 2023 | [34] | Regression, clustering, classification | RF, DT, LR, KNN |
| 2023 | [50] | Classification, regression | DT |
| 2023 | [58] | Classification | ANN, SNM, RF, DT |
| 2023 | [35] | Classification | ANN, RF, KNN |
| 2023 | [91] | Classification | ANN, SVM, RF, DT |
| 2024 | [55] | Clustering | k-means |
| 2024 | [82] | Classification, regression | SVM, KNN, SVM, RF, DT, LR, ANN |
| 2024 | [87] | Regression | KNN |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Almaghrabi, H.; Soh, B.; Li, A.; Alsolbi, I. SoK: The Impact of Educational Data Mining on Organisational Administration. Information 2024, 15, 738. https://doi.org/10.3390/info15110738