1. Introduction
Over the last decade, educational data mining has been among the approaches used to analyze student performance and predict educational outcomes [1]. These methods, specifically classification algorithms, enable educators to identify trends in large datasets, supporting evidence-based decision-making in education [2,3]. Despite this progress, however, the prediction of skills such as mathematical integrated skills remains largely unexplored because of their complex nature.
Mathematical integrated skills refer to the ability of students to integrate and use mathematical knowledge from several content strands, including algebra, geometry, and data analysis, in solving complex problems [4]. These skills are essential for students' general mathematical ability and for success in higher-order mathematics. Traditional assessment and prediction methods often fail to capture the full scope of these integrated abilities and thus limit our understanding of, and support for, student learning in this area.
To address this gap, this study proposes an improved double inertial forward–backward splitting algorithm to better predict students' mathematical integrated skills. The algorithm is applied to an educational dataset of students' performance across nine key mathematical content areas and predicts the overall integrated skill level. This paper uses variational inclusion techniques in machine learning to obtain an effective prediction model supporting targeted educational interventions.
This study is significant because it improves the accuracy of educational predictions, allowing more effective teaching strategies to be applied at the individual level. The proposed algorithm matches existing methods in accuracy, precision, and recall, but achieves those results in fewer iterations. Hence, it is a promising tool for educational data analysis.
To investigate the predictive capacity of the variational inclusion problem (VIP), a specific area within the field of optimization, in determining students' mathematical integrated skills, we let $H$ be a real Hilbert space. The VIP is to find an element $x^* \in H$ such that

$$0 \in Ax^* + Bx^*, \qquad (1)$$

where the operator $B: H \to 2^H$ is monotone and $A: H \to H$ is a monotone Lipschitz continuous operator. Finding a solution for the VIP (1) encompasses several practical applications, including signal recovery, image processing, machine learning, and other related techniques [5,6,7].
One technique for solving the VIP (1) is to design algorithms for the fixed point problem of the mapping $J_{\lambda B}(I - \lambda A)$; that is, $x^*$ solves (1) if and only if

$$x^* = J_{\lambda B}(x^* - \lambda A x^*),$$

where the resolvent mapping $J_{\lambda B}$ associated with the set-valued mapping $B$ is defined by $J_{\lambda B} := (I + \lambda B)^{-1}$ for some $\lambda > 0$, and $I$ denotes the identity operator acting on $H$. A multitude of algorithms have been devised using these concepts, one of which is the well-known forward–backward splitting method [7,8], also referred to as the Picard algorithm with the mapping $J_{\lambda B}(I - \lambda A)$. This algorithm is characterized by the following method: $x_1 \in H$ and

$$x_{n+1} = J_{\lambda B}(x_n - \lambda A x_n),$$

where $\lambda > 0$ is the stepsize. Additionally, one of the renowned algorithms is generally known as Mann's algorithm, which was introduced by Mann [9]. Mann's algorithm with the mapping $J_{\lambda B}(I - \lambda A)$ is in the form of $x_1 \in H$ and

$$x_{n+1} = (1 - \alpha_n) x_n + \alpha_n J_{\lambda B}(x_n - \lambda A x_n),$$

where $\{\alpha_n\}$ is a nonnegative sequence in $[0, 1]$. Mann's algorithm with the mapping $J_{\lambda B}(I - \lambda A)$ has been used in several applications, as detailed in [10,11].
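The two iterations above can be sketched on a concrete instance of (1). The following minimal Python example is a sketch under stated assumptions: it takes $A = \nabla f$ for $f(x) = \frac{1}{2}\|Dx - b\|^2$ and $B = \partial(\mu \|\cdot\|_1)$, whose resolvent is the soft-thresholding map; the matrix $D$, vector $b$, and all names and parameter values are illustrative, not from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # Resolvent J_{tB}(v) of B = subdifferential of ||.||_1: the "backward" step
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(D, b, mu=0.1, alpha=None, n_iter=2000):
    """Picard (alpha=None) or Mann (0 < alpha < 1) forward-backward iteration
    for 0 in Ax + Bx with Ax = D^T(Dx - b) and B = subdifferential of mu*||.||_1."""
    lam = 1.0 / np.linalg.norm(D, 2) ** 2   # stepsize below 2/L, L = ||D||_2^2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # forward (explicit) step on A, then backward (resolvent) step on B
        Tx = soft_threshold(x - lam * D.T @ (D @ x - b), lam * mu)
        x = Tx if alpha is None else (1 - alpha) * x + alpha * Tx
    return x
```

With `alpha=None` this is the plain Picard forward–backward iteration; choosing, e.g., `alpha=0.5` gives the Mann-averaged variant.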
The inertial forward–backward algorithm (IFB) is another well-known variational inclusion algorithm, first proposed by Moudafi and Oliny [12]. This algorithm improves on the basic iteration via the inertial technique, proposed by Polyak [13] in 1964 to expedite convergence. The algorithm is generated by $x_0, x_1 \in H$ and

$$y_n = x_n + \theta_n (x_n - x_{n-1}), \qquad x_{n+1} = J_{\lambda B}(y_n - \lambda A y_n),$$

where $\{\theta_n\}$ is a positive real sequence. Also, the inertial Mann forward–backward splitting algorithm (IMFB), proposed by Peeyada et al. [14] in a recent study, combines the inertial technique and the forward–backward algorithm; it was applied in the field of machine learning to forecast breast cancer with favorable performance. The algorithm is generated by $x_0, x_1 \in H$ and

$$y_n = x_n + \theta_n (x_n - x_{n-1}), \qquad x_{n+1} = (1 - \alpha_n) y_n + \alpha_n J_{\lambda B}(y_n - \lambda A y_n),$$

where $\{\theta_n\}$ is a positive real sequence and $\{\alpha_n\} \subset (0, 1)$.
Very recently, Wang et al. [15] introduced a modified Tseng splitting method with double inertial extrapolation steps (MTSMDIS) and self-adaptive step sizes for solving monotone inclusion problems in real Hilbert spaces. The algorithm is defined by choosing $x_{-1}, x_0, x_1 \in H$ and computing

$$w_n = x_n + \theta_n (x_n - x_{n-1}) + \delta_n (x_{n-1} - x_{n-2})$$

and

$$y_n = J_{\lambda_n B}(w_n - \lambda_n A w_n).$$

If $w_n = y_n$, then stop. Otherwise,

$$x_{n+1} = y_n - \lambda_n (A y_n - A w_n),$$

where $\theta_n$, $\delta_n$ are extrapolation parameters, and the stepsizes $\{\lambda_n\}$ are updated self-adaptively by positive real sequences.
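To make the double inertial idea concrete, here is a minimal sketch (not the exact MTSMDIS of [15]): the extrapolation point $w_n$ combines the previous three iterates before a forward–backward step is applied, on the same toy $\ell_1$-regularized least squares instance as above; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Resolvent J_{tB}(v) of B = subdifferential of ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def double_inertial_fb(D, b, mu=0.1, theta=0.2, delta=0.1, n_iter=3000):
    """Illustrative double-inertial forward-backward iteration: the
    extrapolation point w_n is built from the previous three iterates
    x_n, x_{n-1}, x_{n-2} before the forward-backward step is applied."""
    lam = 1.0 / np.linalg.norm(D, 2) ** 2
    x_prev2 = x_prev = x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # double inertial extrapolation from the last three iterates
        w = x + theta * (x - x_prev) + delta * (x_prev - x_prev2)
        x_next = soft_threshold(w - lam * D.T @ (D @ w - b), lam * mu)
        x_prev2, x_prev, x = x_prev, x, x_next
    return x
```

With modest constant parameters `theta` and `delta`, the extra extrapolation term typically reduces the number of iterations needed on well-conditioned problems, which is the motivation behind the double inertial technique.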
Inspired by the above works, we introduce a new algorithm that modifies the Picard and Mann algorithms with a double inertial technique, computing each extrapolation from the previous three iterates. We apply the algorithm, within a machine learning framework, to a dataset of students from 109 schools with nine attributes to predict students' mathematical integrated skills. To show the performance of our algorithm, we compare it with other algorithms in the literature.
2. Mathematical Preliminaries
In this section, let $H$ be a real Hilbert space. The symbols ⇀ and → denote weak and strong convergence, respectively. We now gather the definitions and lemmas that will be instrumental in establishing our main results.
Definition 1. Let $C$ be a nonempty, closed, and convex subset of a Hilbert space $H$. The nearest point projection of $H$ onto $C$ is denoted by $P_C$; that is, $\|x - P_C x\| \le \|x - y\|$ for all $x \in H$ and $y \in C$. Such $P_C$ is called the metric projection of $H$ onto $C$.
Definition 2. A mapping $A: H \to H$ is said to be
- (i)
Firmly nonexpansive if $\langle Ax - Ay, x - y \rangle \ge \|Ax - Ay\|^2$ for all $x, y \in H$;
- (ii)
α-cocoercive (also known as α-inverse strongly monotone) if $\alpha A$ is firmly nonexpansive, where $\alpha > 0$;
- (iii)
L-Lipschitz continuous if there exists $L > 0$ such that $\|Ax - Ay\| \le L \|x - y\|$ for all $x, y \in H$;
- (iv)
Nonexpansive if $A$ is L-Lipschitz continuous with $L = 1$.
Definition 3. Let $B: H \to 2^H$ be a multivalued mapping. Then, $B$ is said to be
- (i)
Monotone if $\langle x - y, u - v \rangle \ge 0$ for all $(x, u), (y, v) \in \mathrm{Graph}(B)$ (the graph of the mapping $B$);
- (ii)
Maximal monotone if there does not exist a proper monotone extension of $B$.
Based on the given definitions, it can be inferred that every α-cocoercive mapping is both monotone and $\frac{1}{\alpha}$-Lipschitz continuous. It is widely recognized that if $B$ is a multivalued maximal monotone mapping and $\lambda > 0$, then the resolvent $J_{\lambda B} = (I + \lambda B)^{-1}$ is a single-valued firmly nonexpansive mapping [16].
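The firm nonexpansiveness of a resolvent can be checked numerically on a simple instance. The sketch below assumes $B = \partial \|\cdot\|_1$, so that $J_{\lambda B}$ is the soft-thresholding map, and tests random pairs of points against the inequality $\langle Jx - Jy, x - y \rangle \ge \|Jx - Jy\|^2$; the example is purely illustrative and not part of the paper's experiments.

```python
import numpy as np

def soft_threshold(v, t):
    # Resolvent J_{tB}(v) for B = subdifferential of the l1-norm, t > 0
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    Jx, Jy = soft_threshold(x, 0.5), soft_threshold(y, 0.5)
    # firm nonexpansiveness: <Jx - Jy, x - y> >= ||Jx - Jy||^2
    assert np.dot(Jx - Jy, x - y) >= np.dot(Jx - Jy, Jx - Jy) - 1e-12
```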
Lemma 1 ([17]). Let $T: C \to C$ be a nonexpansive mapping such that $\mathrm{Fix}(T) \neq \emptyset$. If there exists a sequence $\{x_n\}$ in $C$ such that $x_n \rightharpoonup z$ and $\|x_n - T x_n\| \to 0$, then $z = Tz$.

Lemma 2 ([8]). Let $A: H \to H$ be an α-cocoercive mapping and $B: H \to 2^H$ be a maximal monotone mapping. Then, we have
(i) for $\lambda > 0$, $\mathrm{Fix}(J_{\lambda B}(I - \lambda A)) = (A + B)^{-1}(0)$;
(ii) for $0 < \lambda \le \bar{\lambda}$ and $x \in H$, $\|x - J_{\lambda B}(x - \lambda A x)\| \le 2 \|x - J_{\bar{\lambda} B}(x - \bar{\lambda} A x)\|$.

Lemma 3 ([18], Opial). Let $S$ be a nonempty set of $H$ and $\{x_n\}$ be a sequence in $H$. Assume that the following conditions hold:
(i) For every $x \in S$, $\{\|x_n - x\|\}$ converges.
(ii) Every weak sequential cluster point of $\{x_n\}$ belongs to $S$.
Then, $\{x_n\}$ weakly converges to an element in $S$.

Lemma 4 ([19]). Let $\{a_n\}$ and $\{\delta_n\}$ be nonnegative sequences of real numbers satisfying $a_{n+1} \le a_n + \delta_n$ and $\sum_{n=1}^{\infty} \delta_n < \infty$. Then, $\{a_n\}$ is a convergent sequence.

3. Analysis of Convergence
In this section, let $C$ be a nonempty closed convex subset of a real Hilbert space $H$. Let $A: H \to H$ be an α-inverse strongly monotone operator and $B: H \to 2^H$ be a maximal monotone operator such that $(A + B)^{-1}(0) \neq \emptyset$.
Remark 1. (i) If $\theta_n = 0$ or $\delta_n = 0$, Algorithm 1 reduces to one of the following two different one-inertial Picard–Mann projective forward–backward splitting algorithms (OIPMPFBS-I and OIPMPFBS-II).

(ii) If $\theta_n = 0$ and $\delta_n = 0$, Algorithm 1 reduces to the Picard–Mann projective forward–backward splitting algorithm (PMPFBS).

(iii) The difference between an inertial method and a standard method lies in the relaxing extrapolation parameters $\theta_n$ and $\delta_n$: an inertial method is more relaxed than a standard one, and these parameters can be set to obtain faster convergence. The following structure shows the first step of the comparison between an inertial method and a standard method in $\mathbb{R}^2$.

From Algorithm 1, we see that $w_1$ is generated from $x_1$, relaxed by the extrapolation parameters $\theta_1$ and $\delta_1$, as follows:

(1) If $\delta_1 = 0$ (OIPMPFBS-I), $\theta_1$ impacts the generation of $w_1$; from Figure 1, it can be observed that $w_1$ can be a vector on the red and blue lines when $\theta_1$ is positive and negative, respectively, generating the point shown in purple;

(2) If $\theta_1 = 0$ (OIPMPFBS-II), $\delta_1$ impacts the generation of $w_1$; from Figure 1, it can be observed that $w_1$ can be a vector on the yellow and green lines when $\delta_1$ is positive and negative, respectively, generating the point shown in dark blue;

(3) If $\theta_1 \neq 0$ and $\delta_1 \neq 0$, $w_1$, in black, is generated (light green) from both extrapolation lines;

(4) If $\theta_1 = 0$ and $\delta_1 = 0$, $w_1 = x_1$;

(5) The vector $w_1$ (dark blue) can be updated within the closed convex constrained set $C$ by the metric projection $P_C$ (green square or orange circle areas).
Algorithm 1. DRIPFBS: Double Relaxed Inertial Projective Forward–Backward Splitting Algorithm.
Theorem 2. The sequence $\{x_n\}$ generated by the DRIPFBS algorithm (Algorithm 1) converges weakly to an element in $(A + B)^{-1}(0)$.
Proof. Let $x^* \in (A + B)^{-1}(0)$ and set $T := J_{\lambda B}(I - \lambda A)$. Since $0 < \lambda < 2\alpha$, $T$ is a nonexpansive mapping by [20], and we deduce the corresponding estimates on the iterates. By the conditions on $\{\theta_n\}$ and $\{\delta_n\}$, it follows from Lemma 4 that $\lim_{n \to \infty} \|x_n - x^*\|$ exists. Since $J_{\lambda B}$ is firmly nonexpansive, we obtain a refined estimate on the iterates. Repeatedly, by the conditions on the sequences $\{\theta_n\}$, $\{\delta_n\}$, $\{\alpha_n\}$, and (12), we have $\lim_{n \to \infty} \|x_n - T x_n\| = 0$. Since $\lambda \in (0, 2\alpha)$, there exists $\bar{\lambda} \in (0, 2\alpha)$ such that $\lambda \le \bar{\lambda}$. By Lemma 2(ii), we also have $\lim_{n \to \infty} \|x_n - J_{\bar{\lambda} B}(x_n - \bar{\lambda} A x_n)\| = 0$. Since $\{x_n\}$ is bounded, there exists a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ that weakly converges to some $z$. Since $\{x_{n_k}\}$ is generated into $C$ and $C$ is closed, $z \in C$. According to (15), the corresponding extrapolation sequence also converges weakly to $z$. Moreover, by Lemma 2(i) and Lemma 1, it follows from (14) that $z \in (A + B)^{-1}(0)$. Finally, by Opial's lemma (Lemma 3), we achieve that $\{x_n\}$ converges weakly to an element in $(A + B)^{-1}(0)$. □
Next, we give an example in the infinite-dimensional space $\ell_2$, where $\|\cdot\|_2$ is the $\ell_2$-norm defined by $\|x\|_2 = \left( \sum_{i=1}^{\infty} |x_i|^2 \right)^{1/2}$, supporting our main theorem for Algorithm 1.
Example 1. Let the operators $A$ and $B$ on $\ell_2$ be as specified above.

In our experiments, we consider two cases of the closed convex set $C$: (i) and (ii), as defined above. We use the Cauchy error $E_n = \|x_{n+1} - x_n\|_2$ to stop the iteration. In order to optimize the performance of our algorithm, we examine the necessary parameters of Algorithm 1 in four distinct cases, where $M$ is the iteration number at which we want to stop, and we choose the initializations $x_{-1}$, $x_0$, and $x_1$.
Case 1. We fix the control parameters for (i) and (ii); the varied parameters are considered in Table 1.

Case 2. We fix the control parameters for (i) and (ii); the varied parameters are considered in Table 2.

Case 3. We fix the control parameters, chosen separately for (i) and (ii); the varied parameters are considered in Table 3.

Case 4. We fix the control parameters, chosen separately for case (i) and case (ii); the varied parameters are considered in Table 4.
From Table 1, Table 2, Table 3 and Table 4 and Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9, we see that the parameter choices listed for case (i) and for case (ii) provide the most favorable results for the chosen initializations $x_{-1}$, $x_0$, and $x_1$.
4. Application to Educational Data Classification
To demonstrate the performance of our algorithm when applied to a data classification problem, we classify the empirical data obtained from the Thailand Ordinary National Educational Test (O-NET) regarding students' mathematical integrated skill scores. We also collected the data from the assessment of nine learning standards following the mathematics content strands designated by the Thailand primary core curriculum [4].
The learning standards were designated to serve as the development objectives for each strand. These standards specify the knowledge and skills that learners should possess. In addition, the learning standards serve as crucial mechanisms for advancing mathematics education, as they inform the contents and methods of teaching and evaluation. Evaluation is essential because it reveals the degree of success in attaining the quality specified by the applicable standards [4]. Therefore, following the designated learning standards, we employ students' performances in number representation, number operation, measurement, geometric figures, spatial reasoning, pattern and function, algebraic understanding, data analysis, and valid estimation as the nine attributes in this classification problem.
We proceed to apply Algorithm 1 for machine learning to solve data classification problems using an extreme learning machine (ELM), which was introduced by Huang et al. [21], for classifying students' mathematical integrated skills into five levels, defined as follows.

The training dataset is defined by $S := \{(x_k, t_k) : x_k \in \mathbb{R}^n, t_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$, where $N$ is the number of distinct samples, $x_k$ is the input training data, and $t_k$ is the target of $x_k$. We focus on using single-hidden-layer feed-forward neural networks (SLFNs) such that the output function is computed by

$$o_k = \sum_{j=1}^{M} \beta_j \, G(\langle w_j, x_k \rangle + b_j),$$

where $M$ represents the number of hidden nodes, $G$ is the activation function, $\beta_j$ is the optimal output weight at the $j$-th hidden node, $w_j$ is the parameter weight at the $j$-th hidden node, and $b_j$ is the bias. The hidden layer output matrix $\mathbf{H}$ is generated as follows:

$$\mathbf{H} = \begin{bmatrix} G(\langle w_1, x_1 \rangle + b_1) & \cdots & G(\langle w_M, x_1 \rangle + b_M) \\ \vdots & \ddots & \vdots \\ G(\langle w_1, x_N \rangle + b_1) & \cdots & G(\langle w_M, x_N \rangle + b_M) \end{bmatrix}.$$
Finding the optimal output weight $\beta$ such that $\mathbf{H}\beta = \mathbf{T}$ is the goal of the ELM method, where $\mathbf{T}$ represents the training target data. We cannot always write the solution in the form $\beta = \mathbf{H}^{\dagger} \mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$, as it may not exist or may be numerically unstable. In this particular example, the least squares problem was taken into consideration. The following regularization of the least squares problem is considered for obtaining a good-fit model:

$$\min_{\beta} \|\mathbf{H}\beta - \mathbf{T}\|_2^2 + \mu \|\beta\|_1, \qquad (19)$$

where $\mu > 0$ is a regularization parameter. This problem is well recognized as the least absolute shrinkage and selection operator (LASSO) [22].
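As a minimal illustration (not the paper's implementation), the ELM pipeline above can be sketched in Python: random hidden-layer weights, a sigmoid activation, and output weights $\beta$ obtained by solving the LASSO (19) with a plain forward–backward (ISTA) loop; all names, sizes, and parameter values are illustrative assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden=50, mu=1e-2, n_iter=1000, seed=0):
    """Sketch of ELM training: random input weights and biases, a sigmoid
    hidden layer, and output weights beta fitted by solving the LASSO (19)
    with a plain forward-backward (ISTA) iteration."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden layer output matrix
    lam = 1.0 / np.linalg.norm(H, 2) ** 2     # forward-step size
    beta = np.zeros((n_hidden, T.shape[1]))
    for _ in range(n_iter):
        v = beta - lam * H.T @ (H @ beta - T)                      # forward step
        beta = np.sign(v) * np.maximum(np.abs(v) - lam * mu, 0.0)  # backward step
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For classification, the targets are one-hot encoded and the predicted class is taken as the argmax of the network output.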
The performance of the algorithm was evaluated using classification metrics, including accuracy, precision, recall, and F1 score, as shown in (20)–(23):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (20)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (21)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad (22)$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad (23)$$

where these metrics are computed from the True Positive ($TP$), True Negative ($TN$), False Positive ($FP$), and False Negative ($FN$) counts.
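A direct transcription of (20)–(23) follows, evaluated on hypothetical count values for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 score from the
    TP, TN, FP, FN counts, following (20)-(23)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# hypothetical counts: 8 true positives, 5 true negatives,
# 2 false positives, 1 false negative
acc, prec, rec, f1 = classification_metrics(8, 5, 2, 1)  # acc = 0.8125
```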
The multi-class cross-entropy loss function is used in multi-class classification. The following average was computed:

$$\mathrm{Loss} = -\frac{1}{N} \sum_{j=1}^{N} t_j \log s_j,$$

where $s_j$ is the $j$-th scalar value in the model output, $t_j$ is the corresponding target value, and $N$ is the number of scalar values in the model output.
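The averaged loss above can be computed directly; the clipping constant in the sketch below is an illustrative numerical-safety assumption, not part of the definition.

```python
import numpy as np

def cross_entropy_loss(s, t, eps=1e-12):
    """Average multi-class cross-entropy: s holds the model output values,
    t the corresponding (one-hot) target values; outputs are clipped away
    from zero before taking the logarithm."""
    s = np.clip(np.asarray(s, dtype=float), eps, 1.0)
    t = np.asarray(t, dtype=float)
    return -np.mean(t * np.log(s))
```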
These data comprise a 109-instance educational dataset containing nine attributes of students' performances in number representation, number operation, measurement, geometric figures, spatial reasoning, pattern and function, algebraic understanding, data analysis, and valid estimation. Before starting data training, a description of each attribute and an overview of the data are shown in Table 5 and Table 6, respectively.
We next solve the ELM model (19) by casting it as a VIP (1), setting $A$ to be the gradient of the least squares term and $B$ the subdifferential of the $\ell_1$ regularization term. The necessary parameters used in Algorithm 1 and in the IFB (5) and IMFB (6) algorithms can be seen in Table 6, where $K$ is the number of iterations at which we aim to stop.

In this experiment, we let $G$ be a sigmoid function. Table 7 shows the numerical training results when the model stops at the highest stable accuracy.
Remark 2. According to the findings presented in Table 8, it can be observed that: (i) IFB (5) and IMFB (6) each required 747 iterations, with slightly longer training times (0.0208 and 0.0206, respectively) but the same accuracy, precision, recall, and F1 score as Algorithm 1; (ii) MTSMDIS (7) required 514 iterations with a training time of 0.0214, also yielding the same performance metrics; (iii) OIPMPFBS-I and DRIPFBS were significantly more efficient, requiring only 51 and 46 iterations, respectively, with training times of 0.0045 and 0.0057, while still achieving the same performance metrics; (iv) OIPMPFBS-II and PMPFBS required 550 and 371 iterations, respectively, with training times of 0.0173 and 0.0139, and achieved the same accuracy, precision, recall, and F1 score.

As noted in (iii), Algorithm 1 (DRIPFBS) attains the same precision, recall, accuracy, and F1 score while requiring the lowest number of iterations. This ensures not only faster convergence but also lower computational cost, making it a more efficient and time-saving option for accurately classifying students' skill levels than the other methods.
Next, we exhibit the accuracy and loss graphs for both the training and testing datasets to assess the models' goodness of fit.

Based on the data presented in Figure 10, it is evident that Algorithm 1 (DRIPFBS) provides good accuracy, and its training and validation loss plots consistently remain in close proximity, particularly after 150 iterations, surpassing the reduced variants OIPMPFBS-I, OIPMPFBS-II, and PMPFBS presented in Figure 11, Figure 12 and Figure 13. Additionally, it also outperforms the IFB and IMFB algorithms, as shown in Figure 14 and Figure 15.
5. Discussion on Mathematical Integrated Skills Prediction Results
According to the mathematical results, every aspect of performance in number representation, number operation, measurement, geometric figures, spatial reasoning, pattern and function, algebraic understanding, data analysis, and valid estimation contributes to the development of students' mathematical integrated skills. The results from the educational data can be discussed as follows.
Firstly, students proficient in number representation and number operations possess the capacity to precisely manipulate numerical values, execute mathematical operations, and establish correlations among diverse mathematical principles [23,24]. This skillset empowers individuals to utilize their comprehension of numerical concepts in diverse mathematical content strands, including algebra, measurement, and data analysis [25]. Moreover, students can execute mathematical procedures with precision and expediency, employ suitable methodologies and computational techniques, and logically analyze numerical interconnections. This proficiency enables individuals to resolve problems and establish correlations among diverse mathematical concepts [24,25].
Secondly, mastering measurement skills can also augment the spatial reasoning capabilities of students, as it necessitates comprehension and visualization of geometric concepts such as magnitude, configuration, and orientation [26]. Moreover, students who demonstrate proficiency in measurement possess the ability to establish correlations between geometric figures and measurements, employ formulas and techniques to compute areas and volumes, and scrutinize patterns and associations in spatial data [25]. Such students can also utilize their comprehension of measurement units and conversions to resolve problems that entail proportions, scaling, and dimensional analysis, addressing mathematical challenges in algebra [26]. Also, they can scrutinize and construe data obtained via measurement, reinforcing their comprehension of statistical principles.
Additionally, acquiring expertise in geometric figures and spatial reasoning can augment students' problem-solving aptitude and logical reasoning capabilities [26]. The utilization of geometric principles allows individuals to engage in the analysis and resolution of geometric problems, the construction and substantiation of geometric arguments, and the application of geometric principles to practical, real-world scenarios [25]. Competence in geometric figures and spatial reasoning also facilitates students' comprehension of measurement, given that geometric shapes frequently entail the quantification of lengths, areas, volumes, and angles [25].
Moreover, proficiency in pattern recognition, function analysis, and algebraic understanding is crucial to the overall mathematical proficiency of students [25]. Students who demonstrate proficiency in these domains possess the ability to recognize and articulate mathematical relationships, represent them symbolically and algebraically, and utilize them to resolve problems spanning diverse mathematical fields [27]. Through acquiring expertise in these domains, students can augment their capacity to engage in critical thinking, establish correlations between diverse mathematical concepts, and employ mathematical knowledge in various contexts [25].
Finally, the ability to perform data analysis and make valid estimations empowers students to effectively manipulate empirical data, exercise critical thinking, and utilize mathematical principles in pragmatic situations. Students with a robust grounding in data analysis can interpret and explore data depictions, recognize recurring themes and tendencies, and formulate significant deductions. Individuals can utilize data to substantiate their mathematical assertions and authenticate their rationale [28]. Furthermore, proficient estimation skills enable students to employ techniques across diverse mathematical fields, including measurement, quantity evaluation, and problem-solving. Estimation can be a valuable problem-solving tool when precise values are impractical.
By developing competence in each of these areas, students improve their mathematical integrated skills, which entails the ability to connect and apply knowledge from different mathematical domains, recognize relationships between concepts, and solve complex problems using a variety of mathematical techniques and strategies.
In conclusion, the implementation of educational data classification can yield various advantages. The utilization of data analytics in education can assist in the identification of concealed patterns and trends in student performance, the anticipation of academic achievement or deficiency, the customization of learning experiences according to individual requirements, the assessment of the efficacy of teaching methodologies and interventions, and the provision of data-driven insights for educational policy and planning.
6. Conclusions
This paper is devoted to an improved variant of the double inertial forward–backward splitting algorithm for predicting students’ mathematical integrated skills based on performance variability across different mathematical content strands. The results show that this method maintains accuracy, precision, recall rate, and F1 score but requires fewer iterations.
Our numerical experiments have shown that this algorithm classifies students' skills with an accuracy of 81.40%, a precision and recall of 72.09%, and an F1 score of 72.09% in just 46 iterations. Such results testify that the algorithm reliably predicts the integrated mathematical skills necessary for students' well-rounded mathematical ability. The results also imply that successful learners draw on many complex mathematical abilities, most commonly a deep-rooted conception of numbers, geometric reasoning, and algebraic thinking.
Our study contributes to the advancement of algorithms for skill prediction and to the identification of cognitive strategies for individual learners. This work will serve as a basis for further research in the near future, in which the methodology can be revised and tested on other types of educational datasets, making educational prediction even more rigorous and precise.
Consequently, the proposed algorithm is effective and may be an excellent support tool for teaching professionals and researchers involved in enhancing students' mathematics learning outcomes. Applying this technique in teaching can make educational interventions more effective and more personalized, thereby helping more students succeed in mathematics at the school level.