Automatic Group Organization for Collaborative Learning Applying Genetic Algorithm Techniques and the Big Five Model

Revelo Sánchez, Oscar; Collazos, César A.; Redondo, Miguel A.

doi:10.3390/math9131578


by Oscar Revelo Sánchez ¹,*, César A. Collazos ² and Miguel A. Redondo ³

¹ Galeras.NET Research Group, Universidad de Nariño, San Juan de Pasto 52001, Colombia

² IDIS Research Group, Universidad del Cauca, Popayán 190001, Colombia

³ CHICO Research Group, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(13), 1578; https://doi.org/10.3390/math9131578

Submission received: 29 March 2021 / Revised: 29 June 2021 / Accepted: 1 July 2021 / Published: 5 July 2021

(This article belongs to the Special Issue Advances in Artificial Intelligence and Statistical Techniques with Applications to Health and Education)


Abstract: In this paper, an approach based on genetic algorithms is proposed to form groups in collaborative learning scenarios, using the students’ personality traits as the grouping criterion. The formation is carried out in two stages: in the first, student information is collected with a psychometric instrument based on the Big Five personality model; in the second, this information feeds a genetic algorithm that performs the grouping iteratively, searching for an optimal formation. The results presented here correspond to the functional and empirical validation of the approach. The described methodology is found to be useful for obtaining groups with the desired characteristics. The specific objective is to provide a strategy that makes it possible to subsequently assess, in context, which type of approach (homogeneous, heterogeneous, or mixed) is the most appropriate for organizing the groups.

Keywords: collaborative learning; collaborative work; genetic algorithms; group formation; personality traits

1. Introduction

Due to the current needs of society, education requires changes in the teaching–learning processes through the implementation of innovative and motivating pedagogical actions. Among those that have shown effective results are collaborative teaching strategies. These have become a more common practice today thanks to their high educational potential [1]. One of the key processes when implementing this type of strategy is the formation of working groups.

Outside the academic scope, groups are formed with various objectives; for example, people group together in social situations, at work, or when they pursue common interests. Groups are considered a basic social structure. Although groups in the academic scope are also formed with ease and for various purposes, establishing groups in the classroom can be a complicated and stilted process, depending on the objective being pursued [2]. For collaborative learning to succeed, it is important to form effective groups: the result of the group depends largely on each member fulfilling their responsibilities, so good academic relationships and empathy among members are fundamental [3].

The grouping problem is critical in collaborative learning because of the complexity and difficulty of achieving an adequate grouping based on different criteria and numerous students [4]. Group formation in collaborative environments is not a trivial task when it comes to achieving homogeneity or heterogeneity within the groups. The general academic benefit depends largely on applying a good formation strategy, one that considers not just one but several characteristics of the students [5]. It is therefore very useful to have a solution that automates this process, carrying it out as efficiently as possible and increasing the groups’ chances of success.

There are various criteria for the automatic formation of learning groups. These criteria have been used in a wide variety of studies that can be found in the literature. These studies usually consider factors such as the students’ learning style [6,7], their thinking style [8], their knowledge and behaviour [9,10], or characteristics such as their gender, skills, and personality [11,12,13,14,15,16], among others.

Considering the above, one of the aspects to be evaluated in the group formation may be the students’ personality. However, in the literature review developed by Borges et al. [17], it is observed that personality is one of the grouping criteria that is least used in studies, showing great potential for research on this topic.

The approach proposed in this paper seeks to form homogeneous, heterogeneous, or mixed groups, considering each student’s personality traits. Personality traits are measured under the “Big Five” model, using a self-assessment instrument based on this model, the Big Five Inventory (BFI) described in Section 4.1. The traits of each person within the global group are evaluated, the group mean in each dimension of the Big Five model is then computed, and groups are formed seeking to optimize a certain intra-heterogeneity and inter-homogeneity measure. Since group formation is a combinatorial problem that involves multiple characteristics, the heuristic search offered by evolutionary algorithms was used as the optimization technique.

The characteristics from which groups are formed and the operators implemented in the genetic algorithm are the main contributions of this work. Most existing studies on group formation that use genetic algorithms base the grouping on the students’ knowledge level and use basic crossover and mutation operators. The proposed strategy exploits the traits derived from the five dimensions of the Big Five personality model (Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness) to improve collaboration and learning outcomes at the group and individual levels. Likewise, a modification of the crossover operator named C1 is used, which is suggested for problems where genes should not be repeated, as is the case under study; for mutation, a variation of the swap mutation operator is used. These modified genetic operators allow a more complete search of the solution space, providing new genetic information to the population and helping to prevent the algorithm from being trapped in a local optimum.

The proposed approach was empirically validated through a controlled experiment with 82 students from four programming courses of the Systems Engineering and Electronic Engineering programs at the University of Nariño, in the city of San Juan de Pasto, Colombia, during academic semester B-2020. In the experiment, students carried out collaborative activities in groups obtained either through the proposed strategy or by the students’ own preference, and the academic performance achieved by the participants was then evaluated. To support the initial equivalence of the groups, a pre-test was applied to both the experimental and control groups. Section 4.3 describes the empirical study in detail.

The rest of this paper is organized as follows: Section 2 addresses the theoretical foundations of the Big Five model, collaborative work and learning, and genetic algorithms; Section 3 reviews related work on group formation; Section 4 describes the proposed approach, covering the psychometric instrument used, the use of genetic algorithms, and the empirical design; and finally, Section 5 and Section 6 present the results obtained, the conclusions, and future work.

2. Theoretical Foundations

This section presents the theoretical foundations that support the research: personality, collaborative work and learning, group formation, and genetic algorithms as the optimization technique used.

2.1. The “Big Five” Model

Currently, a wide variety of models and personality theories that offer different perspectives on how to approach a person’s personality are available. Some of these theories are: Carl Jung’s Psychological Types [18], Keirsey’s Personality Types Theory [19], The “Big Five” Factors Personality Model [20], and Myers–Briggs Type Indicator (MBTI) [21], among others.

In this work, we decided to use the “Big Five” or “Five-Factor Model (FFM)” personality model because it has obtained the greatest consensus in the area of psychology and because it is one of the most widely used in the literature [22,23]. The “Big Five” is a personality trait hierarchical model composed of five big factors, where each represents personality characteristics at a more general abstraction level. These factors or dimensions are traditionally referred to as Extraversion (E), Agreeableness (A), Conscientiousness (C), Neuroticism (N), and Openness (O). Each value combination in the different dimensions generates a personality type with a different tendency to behave, interact, react, and reason.

Since the Big Five model is a purely descriptive personality model, psychologists have developed various tests and questionnaires that evaluate each of the five factors or dimensions in individuals; for example, NEO Personality Inventory-Revised (NEO-PI-R), Sixteen Personality Factor Fifth Edition (16PF-5), Big Five Questionnaire (BFQ), Big Five Questionnaire-Children (BFQ-C), 100 Trait-Descriptive Adjectives (TDA-100), Big Five Inventory (BFI), Hogan Personality Inventory (HPI), and Five-Factor Personality Inventory (FFPI), among others [24,25].

2.2. Work and Collaborative Learning

In the educational context, collaborative work is an interactive learning model, in which students build together, uniting efforts, talents and competencies, through activities that in consensus lead to reaching the established goals. More than a technique, collaborative work is considered an interaction philosophy and a personal way of working, that involves managing aspects such as respecting the individual contributions of group members [26].

From the above, the Collaborative Learning construct arises. Chaljub Hasbún [27] affirms that collaborative learning is a result of collaborative work. For Johnson et al. [28], collaborative learning is a carefully designed interaction system that organizes and induces reciprocal influence between team members. It develops through a gradual process in which each and every member feels committed to the others’ learning, generating a positive interdependence that does not imply competition.

Collaborative learning is achieved through group work methods characterized by the interaction and contribution of all in the acquisition of knowledge, sharing authority, accepting responsibility and the point of view of the other, and building consensus with others [29].

2.3. Genetic Algorithms

Genetic algorithms were described by Holland [30] and are considered as a computational model family inspired by evolutionary biology [31]. To date, their use has spread beyond the original conception, as a more general type of evolutionary algorithm that attempts to simulate Darwinian evolution and natural selection through the recombination and mutation of individuals [32]. These algorithms use a data structure to encode the potential solutions to the problem in question, generally a vector, as a chromosome and apply recombination operators seeking to preserve the critical information that guides towards a satisfactory solution [33].

In general, a genetic algorithm is structured in a three-step iterative process [31]: (i) an initial population of solutions (individuals) is created, represented by a chromosome that encodes the solution to the problem; (ii) a group of individuals is selected through a specific strategy, based on the fitness function, and the next population is generated by applying genetic operators (crossover and mutation) to the selected individuals; (iii) step (ii) is repeated until the remaining individuals in the generation are good enough according to the fitness function and the stop criteria. This process is outlined in Figure 1.

Regarding their application, genetic algorithms have been used, particularly during the last two decades, in a wide range of combinatorial optimization problems, including the TSP (Travelling Salesman Problem), the KP (Knapsack Problem), task sequencing and scheduling, and vehicle routing, among others [34,35,36,37,38,39], which makes this technique a good candidate for solving the grouping problem outlined in this paper.

3. Related Works

In collaborative work, it is essential to have well-formed work teams, in which the members are comfortable with their peers and the academic level of each allows a favourable interdependence. In practice, however, self-selection and random assignment of members are the most popular approaches to group formation, although they do not always produce good results [40]. Group formation techniques are still hard to find, because research has focused on how collaborative learning functions while formation itself has been neglected or undervalued. Nevertheless, several works in the literature discuss the elements that must exist within a group and the roles that must be fulfilled in each work team [41]. Some of the most relevant related works of recent years are summarized below, with a brief description of their application.

Battur et al. [16] propose that teams be formed based on the students’ complementary skills, ensuring that each team has an expert member in each identified skill. Wichmann et al. [9] investigated how group formation based on student behaviour affects productivity on a small-group task. Amara et al. [42] formed homogeneous groups in a mobile collaborative learning environment, with a personalized selection of grouping attributes, using the K-Means algorithm. Sadeghi and Kardan [13] and Amarasinghe et al. [12] propose binary integer linear programming based on task assignment, gender, and language preferences as a group formation optimization technique. Manske and Hoppe [10] used a semantic algorithm that maximized the knowledge diversity in the groups. Odo et al. [43] analysed how students’ affective state can affect their performance in collaborative study groups. Lykourentzou et al. [14] and Reis [15] highlight the importance of personality traits as critical elements that affect collaboration and students’ interaction, affirming that this factor can influence performance and student satisfaction and induce various actions and behaviours in group work.

On the other hand, the literature review carried out by Cruz and Isotani [44] on group formation shows researchers’ strong interest in genetic algorithms as a solution to the problem, given their suitability for handling a large number of variables and their ability to quickly generate near-optimal solutions, that is, useful groups.

4. Proposed Approach

As previously mentioned, the approach proposed in this paper seeks to form three types of groups (homogeneous, heterogeneous, and mixed) based on the students’ personality traits across the five dimensions of the Big Five model. These dimensions can be measured with the “Big Five Inventory” (BFI); the group mean in each dimension is then computed, and groups are finally formed seeking to optimize a certain measure of heterogeneity, homogeneity, or mixture.

The approach is explained below in three parts: the first describes the psychometric test proposed to measure personality traits; the second describes the formation of workgroups through the application of genetic algorithms; and the third describes the empirical design used for validation.

4.1. Big Five Inventory (BFI)

As described in Section 2.1, the Big Five model, on which this work is theoretically and psychologically based, is a purely descriptive personality model, which has led psychologists to develop various tests and questionnaires that evaluate each of the five factors or dimensions in individuals. In the specific case, an adaptation to Spanish of the Big Five Inventory-BFI by John et al. [45] is used as an instrument to measure the students’ personality traits.

The use of this instrument is considered a scientifically accepted resource to quantify personality traits, which, as described below, are the input required by the grouping algorithm. It is not intended to issue a concept or psychological diagnosis of the participants, as this is outside the scope of the study.

Table 1 shows the Spanish and English versions of the instrument, the latter as a reference for non-Spanish-speaking readers, which were reconstructed from the original paper “Los Cinco Grandes Across Cultures and Ethnic Groups: Multitrait Multimethod Analysis of the Big Five in Spanish and English” [46]. It consists of 44 multiple-response items (Likert type) that measure the dimensions proposed for the Big Five model.

4.2. Algorithm for Group Formation

The proposed method for group formation is described in detail in this section. It is based on the previous work of Moreno et al. [47], who propose a method to group elements in a homogeneous way. The mathematical and algorithmic formulation of the model is generally described, from the representation of the elements to be grouped (students), the solutions (feasible groupings) and their fitness measures, to the operators employed in applying the genetic algorithm.

4.2.1. Student Representation

Each student n can be represented through a vector, where M is the number of characteristics, which could have a different nature, for example, demographic (age, gender), psychological (personality traits, abilities, capacities), academic (grades, pre-tests, self-assessment), and cognitive (learning styles, intelligence types), among others.

E_{n} = \{C_{1}, C_{2}, \dots, C_{M}\}

(1)

Each characteristic m (1 ≤ m ≤ M) must be a value in a predefined numerical range. If categorical attributes are considered, a prior numerical discretization process would be required. For example, if a characteristic takes “high”, “medium”, and “low” values, these could be changed to 1, 2, and 3, respectively.

A set of students can be represented by an N × M matrix, where N is the number of students and M is the number of characteristics, as shown in Table 2.

The data must be scaled to a common range so that no characteristic distorts the calculation of the objective function. One way to do this is min–max normalization [48], which maps all the data to, for example, the range 0–1 using the following expression, where V_{\max} and V_{\min} are the maximum and minimum values of the corresponding characteristic.

V^{'} = \frac{V - V_{\min}}{V_{\max} - V_{\min}}

(2)
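As an illustration of the two preprocessing steps above (discretizing categorical values and applying Equation (2) per characteristic), a minimal Python sketch follows. The names `LEVELS` and `min_max_normalize` and the sample data are invented for this example, and mapping a constant column to 0 is an assumption the text does not address.

```python
# Hypothetical helper names; the paper does not prescribe an implementation.
LEVELS = {"high": 1, "medium": 2, "low": 3}  # discretization from the text's example


def min_max_normalize(students):
    """Scale each column (characteristic) of an N x M table to the range 0-1."""
    columns = list(zip(*students))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in students:
        normalized.append([
            # A constant column has no spread; map it to 0.0 (an assumption).
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Invented data: two numeric traits plus one categorical attribute per student.
raw = [
    [3.2, 4.0, "high"],
    [4.1, 2.5, "low"],
    [2.0, 3.5, "medium"],
]
numeric = [[a, b, LEVELS[c]] for a, b, c in raw]
scaled = min_max_normalize(numeric)
```

After this step every column lies in the range 0–1, so no single characteristic dominates the squared differences used later in the fitness measure.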

4.2.2. Individual Representation

In the group formation problem, an individual corresponds to a given set of G groups, each with up to N/G students, where N is the total number of students. For the representation of individuals, it is proposed to use a matrix, where the number of rows corresponds to the number of groups G and the number of columns corresponds to the maximum size of each group N/G. In this way, each gene that makes up the chromosome contains the identifier of an element, and its position within the matrix defines the group to which it belongs. This representation facilitates the coding of the chromosome and the use of the genetic operators described below.

As in other combinatorial problems, in group formation, a chromosome cannot have repeated genes, which means that an individual (feasible solution) is one in which each element (student) is in a single position on the chromosome. For example, if you have a set of 20 students and you want to form 4 groups, each group would contain 5 students. A possible individual, if the students are numbered consecutively, is presented in Table 3.
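The matrix encoding and the no-repeated-genes restriction can be sketched in a few lines of Python. The function names are hypothetical, student identifiers are assumed to run from 1 to N, and N is assumed to be divisible by G.

```python
def trivial_individual(n_students, n_groups):
    """Assign consecutive identifiers to groups: rows are groups, columns are seats."""
    size = n_students // n_groups
    ids = list(range(1, n_students + 1))
    return [ids[g * size:(g + 1) * size] for g in range(n_groups)]


def is_feasible(individual, n_students):
    """A chromosome is feasible when every student appears in exactly one position."""
    genes = [s for group in individual for s in group]
    return sorted(genes) == list(range(1, n_students + 1))


# 20 students in 4 groups of 5, as in the example in the text.
individual = trivial_individual(20, 4)
```

Here `individual[0]` holds the five identifiers of the first group, and `is_feasible` rejects any matrix in which a student is missing or repeated.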

4.2.3. Fitness Measure

As mentioned above, the objective of the proposed approach is to form homogeneous, heterogeneous, or mixed groups concerning all the students, considering their personality traits. The way to measure this classification criterion would be given by the fitness measure. One possible way to calculate it is described below:

The average of each characteristic of all students (TM) is calculated:

TM = \{\bar{C_{1}}, \bar{C_{2}}, \dots, \bar{C_{M}}\}

(3)

For each group g (1 ≤ g ≤ G) of each individual i, the average of each characteristic is calculated. Denoting individual i by X^{i}, these averages (IM) are represented as follows:

IM_{g}^{i} = \{\bar{X_{g, 1}^{i}}, \bar{X_{g, 2}^{i}}, \dots, \bar{X_{g, M}^{i}}\}

(4)

The sum of the squared differences between the M characteristics of each group g of individual i and the average of each characteristic in all the students is calculated, like this:

D^{i} = \sum_{g = 1}^{G} [{(\bar{C_{1}} - \bar{X_{g, 1}^{i}})}^{2} + {(\bar{C_{2}} - \bar{X_{g, 2}^{i}})}^{2} + \dots + {(\bar{C_{M}} - \bar{X_{g, M}^{i}})}^{2}]

(5)

The lower this value (with a minimum of 0), the more similar each group will be on average concerning all the students (TM), in the case of homogeneous group formation; and the higher this value, the less similar each group will be on average concerning all the students, in the case of heterogeneous group formation. The objective function is expressed as follows:

\min | \max Z = \sum_{g = 1}^{G} [{(\bar{C_{1}} - \bar{X_{g, 1}^{i}})}^{2} + {(\bar{C_{2}} - \bar{X_{g, 2}^{i}})}^{2} + \dots + {(\bar{C_{M}} - \bar{X_{g, M}^{i}})}^{2}]

(6)
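Equations (3) to (5) translate almost directly into code. The following Python sketch is one possible reading, with hypothetical function names and invented toy data (not the paper's Table 4); students are stored in a dictionary keyed by identifier.

```python
def column_means(rows):
    """Mean of each characteristic over a list of student vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fitness(students, individual):
    """Sum over groups of squared gaps between group means and overall means.

    students: dict mapping student id -> list of M normalized characteristics.
    individual: list of groups, each a list of student ids.
    """
    total_means = column_means(list(students.values()))           # TM, Eq. (3)
    d = 0.0
    for group in individual:
        group_means = column_means([students[s] for s in group])  # IM_g, Eq. (4)
        d += sum((c - x) ** 2 for c, x in zip(total_means, group_means))
    return d                                                      # D^i, Eq. (5)


# Invented toy data: 4 students, 2 characteristics.
students = {1: [0.0, 1.0], 2: [1.0, 0.0], 3: [0.0, 0.0], 4: [1.0, 1.0]}
balanced = [[1, 2], [3, 4]]   # every group mean equals the overall mean
skewed = [[1, 3], [2, 4]]     # groups differ sharply on the first characteristic
```

In this toy case `fitness(students, balanced)` is 0 and `fitness(students, skewed)` is 0.5, so a homogeneous formation would prefer the first grouping and a heterogeneous formation the second.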

For mixed group formation, that is, heterogeneous for certain characteristics and homogeneous for others, the problem becomes one of multi-objective optimization: it is required to maximize the differences for the heterogeneous characteristics and at the same time minimize the differences for the homogeneous characteristics. Considering the above, a possible way to deal with this situation is described below.

Let HT and HM be the sets of characteristics for which heterogeneity and homogeneity are sought, respectively, represented as follows:

HT = \{C_{1}, C_{2}, \dots, C_{J}\} \subset E_{n}

(7)

HM = \{C_{J + 1}, C_{J + 2}, \dots, C_{M}\} \subset E_{n}

(8)

For the fitness measure, the sum of the squared differences between the J heterogeneity characteristics of each group g of individual i and the average of each characteristic over all the students is calculated; from it is subtracted the corresponding sum of squared differences for the M − J homogeneity characteristics, like this:

\begin{array}{rl} D^{i} = & \sum_{g = 1}^{G} [{(\bar{C_{1}} - \bar{X_{g, 1}^{i}})}^{2} + {(\bar{C_{2}} - \bar{X_{g, 2}^{i}})}^{2} + \dots + {(\bar{C_{J}} - \bar{X_{g, J}^{i}})}^{2}] \\ & - \sum_{g = 1}^{G} [{(\bar{C_{J + 1}} - \bar{X_{g, J + 1}^{i}})}^{2} + {(\bar{C_{J + 2}} - \bar{X_{g, J + 2}^{i}})}^{2} + \dots + {(\bar{C_{M}} - \bar{X_{g, M}^{i}})}^{2}] \end{array}

(9)

The greater this difference, the better the heterogeneity of the groups in the HT characteristics and, simultaneously, the better their homogeneity in the HM characteristics. The objective function can be expressed as follows:

\begin{array}{rl} \max Z = & \sum_{g = 1}^{G} [{(\bar{C_{1}} - \bar{X_{g, 1}^{i}})}^{2} + {(\bar{C_{2}} - \bar{X_{g, 2}^{i}})}^{2} + \dots + {(\bar{C_{J}} - \bar{X_{g, J}^{i}})}^{2}] \\ & - \sum_{g = 1}^{G} [{(\bar{C_{J + 1}} - \bar{X_{g, J + 1}^{i}})}^{2} + {(\bar{C_{J + 2}} - \bar{X_{g, J + 2}^{i}})}^{2} + \dots + {(\bar{C_{M}} - \bar{X_{g, M}^{i}})}^{2}] \end{array}

(10)

To clarify the entire process described, the data in Table 4, corresponding to a list of 6 students and 3 assessed characteristics, are considered as an example.

Now we want to form two groups, each with three students. Two possible individuals are shown in Table 5.

By applying (3), the following is obtained:

T M = {0.505, 0.500, 0.685}

(11)

Table 6 shows the calculation of \bar{X_{g, C}^{i}} from Table 4 and Table 5, necessary to obtain IM_{g}^{i} according to (4).

IM_{g}^{1} = \begin{Bmatrix} 0.363 & 0.547 & 0.727 \\ 0.647 & 0.453 & 0.643 \end{Bmatrix}

(12)

IM_{g}^{2} = \begin{Bmatrix} 0.157 & 0.570 & 0.937 \\ 0.853 & 0.430 & 0.433 \end{Bmatrix}

(13)

Finally, calculating the fitness measures by applying (5), D¹ = 0.048 and D² = 0.379 are obtained.

D^{1} = [\begin{array}{l} {(0.505 - 0.363)}^{2} + {(0.500 - 0.547)}^{2} + {(0.685 - 0.727)}^{2} + \\ {(0.505 - 0.647)}^{2} + {(0.500 - 0.453)}^{2} + {(0.685 - 0.643)}^{2} \end{array}] = 0.048

(14)

D^{2} = [\begin{array}{l} {(0.505 - 0.157)}^{2} + {(0.500 - 0.570)}^{2} + {(0.685 - 0.937)}^{2} + \\ {(0.505 - 0.853)}^{2} + {(0.500 - 0.430)}^{2} + {(0.685 - 0.433)}^{2} \end{array}] = 0.379

(15)

The grouping represented by Individual 1 is more inter-homogeneous than that of Individual 2: with this distribution, the groups of Individual 1 reflect the whole set of students (TM) more closely when all the characteristics are considered together. Conversely, the grouping represented by Individual 2 is more inter-heterogeneous than that of Individual 1: its groups deviate more from the whole set of students (TM) when all the characteristics are considered together.

Now a mixed formation is desired, which is homogeneous for C₂ and at the same time heterogeneous for C₁ and C₃. According to (7) and (8), we obtain:

H T = {C_{1}, C_{3}}

(16)

H M = {C_{2}}

(17)

Calculating the fitness measures applying (9), the following is obtained:

D_{H T}^{1} = [\begin{array}{l} {(0.505 - 0.363)}^{2} + {(0.685 - 0.727)}^{2} + \\ {(0.505 - 0.647)}^{2} + {(0.685 - 0.643)}^{2} \end{array}] = 0.044

(18)

D_{H M}^{1} = [{(0.500 - 0.547)}^{2} + {(0.500 - 0.453)}^{2}] = 0.004

(19)

D^{1} = 0.044 - 0.004 = 0.040

(20)

D_{H T}^{2} = [\begin{array}{l} {(0.505 - 0.157)}^{2} + {(0.685 - 0.937)}^{2} + \\ {(0.505 - 0.853)}^{2} + {(0.685 - 0.433)}^{2} \end{array}] = 0.369

(21)

D_{H M}^{2} = [{(0.500 - 0.570)}^{2} + {(0.500 - 0.430)}^{2}] = 0.010

(22)

D^{2} = 0.369 - 0.010 = 0.359

(23)

The grouping represented by Individual 2 obtains the higher mixed fitness (0.359 against 0.040) and is therefore preferred when simultaneously seeking homogeneity for C₂ and heterogeneity for C₁ and C₃. Its groups are slightly less homogeneous in C₂ than those of Individual 1 (0.010 against 0.004), but this is far outweighed by their greater heterogeneity in C₁ and C₃ (0.369 against 0.044) with respect to the whole set of students (TM).
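The worked example can be re-checked numerically from the published means in Equations (11) to (13). The Python sketch below uses hypothetical function names and zero-based column indices; it evaluates both the plain measure of Equation (5) and the mixed measure of Equation (9).

```python
TM = [0.505, 0.500, 0.685]                               # overall means, Eq. (11)
IM = {
    1: [[0.363, 0.547, 0.727], [0.647, 0.453, 0.643]],   # Eq. (12)
    2: [[0.157, 0.570, 0.937], [0.853, 0.430, 0.433]],   # Eq. (13)
}


def squared_gap(group_means, columns):
    """Squared differences to TM, summed over groups and the chosen columns."""
    return sum((TM[m] - g[m]) ** 2 for g in group_means for m in columns)


def mixed_fitness(group_means, heterogeneous, homogeneous):
    """Eq. (9): heterogeneity term minus homogeneity term (to be maximized)."""
    return squared_gap(group_means, heterogeneous) - squared_gap(group_means, homogeneous)


HT, HM = [0, 2], [1]   # zero-based columns for C1, C3 and for C2
plain = {i: squared_gap(IM[i], [0, 1, 2]) for i in IM}
mixed = {i: mixed_fitness(IM[i], HT, HM) for i in IM}
```

Rounded to three decimals, `plain` gives 0.048 and 0.379, and `mixed[2]` gives 0.359. For Individual 1, carrying the unrounded terms yields about 0.039 rather than the 0.040 obtained by subtracting the rounded terms 0.044 and 0.004; the ranking of the two individuals is unaffected.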

4.2.4. Initial Population and Evolution

The example in Table 5 shows a trivial group formation: each student is assigned in order to a group based on their identifier. The first N/G students (in this case 3) belong to Group 1, the next N/G to Group 2, and so on. Although this formation is valid, the initial population is instead generated randomly: k individuals are created using the matrix representation described above, each satisfying the restriction that every student occupies one and only one position in the matrix.
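A random initial population that respects this restriction can be produced by shuffling the identifiers and cutting the result into rows. This is a minimal Python sketch with hypothetical function names; it assumes identifiers 1 to N and N divisible by G, and the seed parameter is added only to make runs repeatable.

```python
import random


def random_individual(n_students, n_groups, rng):
    """Shuffle the student identifiers and cut them into equal-sized groups."""
    ids = list(range(1, n_students + 1))
    rng.shuffle(ids)
    size = n_students // n_groups
    return [ids[g * size:(g + 1) * size] for g in range(n_groups)]


def initial_population(k, n_students, n_groups, seed=None):
    """Generate k feasible individuals; each student occupies exactly one position."""
    rng = random.Random(seed)
    return [random_individual(n_students, n_groups, rng) for _ in range(k)]


# 100 individuals for the 6-student, 2-group example.
population = initial_population(k=100, n_students=6, n_groups=2, seed=42)
```

Because each individual is a permutation of the identifiers, feasibility holds by construction and no repair step is needed at this stage.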

Once the initial population is obtained, the process of evolution is carried out, passing from one generation to another using genetic operators, until a desired fitness measure is obtained or until a certain number of generations is reached.

The main objective of the proposed algorithm is to improve the quality of group formation and its effectiveness in collaborative processes. To do this, a set of configurations is tested, and some modifications are made to the classical genetic operators. The flow of the genetic algorithm used to form student groups based on personality traits is:

Step 1—Measure students’ personality traits: The first step is to measure the characteristics of the students based on which the groups are formed, in this case, their personality traits. This measurement is crucial for structuring good groups that promote efficient and effective collaboration and achieve better learning outcomes.

Step 2—Define genetic parameters: Before executing the genetic algorithm, the genetic parameters concerning group size, population size, number of generations, and crossover and mutation probabilities must be established. This process is described in Section 4.2.5.

Step 3—Encode chromosome: In this step, the chromosome is represented into a predefined data structure to allow genetic operators to apply. In this study, a matrix structure is used, as described in Section 4.2.2.

Step 4—Initialize population: The genetic algorithm is started by creating an initial population that consists of a set of feasible encoded solutions (chromosomes). This population is generated randomly to ensure its diversity.

Step 5—Evaluate fitness: A fitness function based on the students’ personality traits is used to evaluate the chromosomes of the population, as described in Section 4.2.3.

Step 6—Generate a new population: This step is the core of the genetic algorithm, where new and better solutions are generated. The genetic operators applied are: (a) selection, where two parents are selected for crossing, (b) crossover, where, based on a probability, a recombination of the parents’ genes is carried out, and (c) mutation, where, based on a probability, parts of the chromosome of the new population are mutated.

Step 7—End search: After several generations, the algorithm ends and converges to the fittest chromosome, which represents a feasible solution.

Step 8—Form optimal groups: Student groups are formed based on the genetic algorithm results, and students are notified to begin working in their groups on the development of the proposed collaborative activity.

Figure 2 illustrates the main flow of the student group formation process, using the proposed genetic algorithm, grouping the different steps into four stages: input (1), GA settings preparation (2, 3), GA procedure (4, 5, 6, 7), and output (8).
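Steps 4 to 7 can be sketched as a compact Python loop. This is an illustrative sketch, not the authors' implementation: the chromosome is stored as a flat list (row-major view of the matrix), the crossover is a plain one-point, order-preserving operator in the spirit of C1, the mutation is a plain swap, and binary tournament selection, elitism, and per-child mutation are assumptions. The paper's modified operators are not specified in this section.

```python
import random


def fitness(students, perm, size):
    """Eq. (5) on a flat chromosome: consecutive blocks of `size` genes form a group."""
    m = len(next(iter(students.values())))
    total = [sum(v[j] for v in students.values()) / len(students) for j in range(m)]
    d = 0.0
    for start in range(0, len(perm), size):
        group = perm[start:start + size]
        for j in range(m):
            mean = sum(students[s][j] for s in group) / len(group)
            d += (total[j] - mean) ** 2
    return d


def crossover(p1, p2, rng):
    """One-point, order-preserving crossover: head from p1, rest in p2's order."""
    cut = rng.randrange(1, len(p1))
    head = p1[:cut]
    taken = set(head)
    return head + [g for g in p2 if g not in taken]


def mutate(perm, rng):
    """Swap mutation: exchange two randomly chosen positions."""
    a, b = rng.sample(range(len(perm)), 2)
    child = perm[:]
    child[a], child[b] = child[b], child[a]
    return child


def evolve(students, size, maximize, pop_size=100, generations=100,
           p_c=0.4, p_m=0.01, seed=0):
    """Run the loop and return the best grouping found as a list of groups."""
    rng = random.Random(seed)
    ids = list(students)
    sign = 1 if maximize else -1

    def score(perm):
        return sign * fitness(students, perm, size)

    pop = [rng.sample(ids, len(ids)) for _ in range(pop_size)]   # Step 4
    for _ in range(generations):
        nxt = [max(pop, key=score)]                    # elitism (an assumption)
        while len(nxt) < pop_size:                     # Step 6
            # Binary tournament selection (an assumption) for each parent.
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            child = crossover(p1, p2, rng) if rng.random() < p_c else p1[:]
            if rng.random() < p_m:
                child = mutate(child, rng)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)                         # Step 7
    return [best[i:i + size] for i in range(0, len(best), size)]


# Invented toy data: 4 students, 2 normalized characteristics.
toy = {1: [0.0, 1.0], 2: [1.0, 0.0], 3: [0.0, 0.0], 4: [1.0, 1.0]}
homogeneous = evolve(toy, size=2, maximize=False, generations=20)
heterogeneous = evolve(toy, size=2, maximize=True, generations=20)
```

Both operators return permutations of the identifiers, so every child remains feasible without a repair step. The `maximize` flag switches between the heterogeneous and homogeneous readings of Equation (6).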

4.2.5. Search Complexity and Algorithm Performance

The complexity of an exhaustive search for this problem stems from the combinatorial explosion of the group formation process: it grows with the total number of students to be grouped and with the number of groups to be formed, which, in turn, is directly related to the size of the groups. In general, the number of ways to divide a set of N students into G groups of N/G students each, treating the order of the groups as relevant, can be calculated through multinomial coefficients [49] with the following expression:

\frac{N!}{{((\frac{N}{G})!)}^{G}}

(24)

Thus, for example, organizing 50 students into 10 groups of 5 yields about 4.91 × 10⁴³ possible combinations (applying (24)), which makes an exhaustive search for the best solution infeasible in practice. Hence the usefulness of the proposed method.
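Expression (24) is easy to evaluate exactly with integer arithmetic. The short Python sketch below uses a hypothetical function name and assumes that N is divisible by G.

```python
from math import factorial


def ordered_groupings(n_students, n_groups):
    """Evaluate N! / ((N/G)!)^G, assuming N is divisible by G."""
    size, remainder = divmod(n_students, n_groups)
    if remainder:
        raise ValueError("n_students must be divisible by n_groups")
    return factorial(n_students) // factorial(size) ** n_groups


count = ordered_groupings(50, 10)
print(f"{count:.2e}")  # 4.91e+43
```

For comparison, the 6-student, 2-group example of Section 4.2.3 gives only 20 ordered formations, which is why that base case can be checked by hand.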

On the other hand, before presenting the results of the implemented genetic algorithm, the process of defining its general parameters is described: crossover probability, mutation probability, population size and generation number (termination criterion). The values obtained for these parameters were generalized for the different cases of model validation.

To define good values for both the crossover probability (p_c) and the mutation probability (p_m), values of each of the parameters were simulated in the ranges suggested by the literature [50,51], and others slightly outside of them, selecting the set with the best results. A high crossover probability allows greater exploration of the solution space, reducing the possibility of establishing a false optimum; but if the probability is very high, it causes a great investment in computation time in the exploration of unpromising regions of the solution space. As for the mutation probability, if it is very low, some genes that could have been produced are never tested; if it is too high, there will be much random disturbance, the children would begin to lose their parental likeness.

The simulation was performed on a base case, the example described in Section 4.2.3: the formation of two groups of three students; formation of the three types (homogeneous, heterogeneous, and mixed); three valued characteristics; a population size of 100 individuals; 100 generations of execution of the genetic algorithm; and 100 simulation runs. Since the calculations in this case have low complexity, they could be verified by implementing the example in a spreadsheet. The simulation results are shown in Table 7.

The results show that statistically similar values are obtained for the three types of formation. In all three cases, with a crossover probability from 0.2 to 0.4 and a mutation probability from 0.001 to 0.01, the corresponding optimal values were obtained in practically all of the 100 simulation runs. Therefore, for the case under study, it was decided to take the upper value of each interval as the appropriate value, that is, p_c = 0.4 and p_m = 0.01.

Likewise, to define good values for both the population size and the number of generations (the termination criterion of the genetic algorithm), each parameter was simulated and the set with the best results was selected. Small populations run the risk of not adequately covering the search space, while large populations can generate a high computational cost. Similarly, as the number of generations increases, the average fitness is more likely to approach that of the best individual, but the computational cost also increases.

The simulation was performed with randomly generated data for 50 students; heterogeneous group formation with four members per group; five valued characteristics (corresponding to the five dimensions of the Big Five model); a crossover probability of 0.4 and a mutation probability of 0.01; and 100 simulation runs. The simulation results are shown in Table 8.

The results show that, for the case under analysis, a good population size (PS) is 100 individuals; with larger populations, fitness (F) decreases. Regarding the number of generations (G), the expected trade-off is confirmed: the larger the number, the greater the computational cost (time (T) in milliseconds). In addition, as can be seen in Figure 3, the fitness value stabilizes after 1000 generations, which suggests that the algorithm has found a value close to the optimum; it was therefore decided to use a value of around 1000 generations.

The implemented genetic algorithm was validated with randomly simulated data structured like the data presented in Table 4, considering five valued characteristics (corresponding to the five dimensions of the Big Five model). Three tests were carried out with lists of 20, 50, and 100 students; formation of the three types; groups of 4 and 5 members; a population size of 100 individuals; 1000 generations; and 100 simulation runs.

For the first genetic operator, selection, a number of survivors equal to the population size was chosen by the roulette wheel selection method. Individuals were chosen randomly (with replacement) with a probability proportional to their fitness; that is, the fittest individuals have a greater chance of being cloned into the next generation.
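The fitness-proportional sampling with replacement described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' Java implementation: the function name and signature are hypothetical, and the sketch assumes that fitness values are non-negative and that a larger value is better (a minimization objective would first have to be transformed accordingly).

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fitness_proportional_selection(
    population: Sequence[T],
    fitness: Callable[[T], float],
    rng: random.Random,
) -> List[T]:
    """Draw len(population) survivors with replacement, weighted by fitness.

    Assumption: fitness values are non-negative and larger is better.
    """
    weights = [fitness(individual) for individual in population]
    if any(w < 0 for w in weights):
        raise ValueError("fitness values must be non-negative")
    if sum(weights) == 0:
        # Degenerate case: no individual is preferred, so sample uniformly.
        return [rng.choice(population) for _ in population]
    # random.choices samples with replacement, proportionally to the weights.
    return rng.choices(population, weights=weights, k=len(population))
```

Because sampling is done with replacement, a fit individual may appear several times among the survivors, while an unfit one may not appear at all.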

The crossover operator was applied to the resulting population, choosing pairs of parents randomly according to the crossover probability of 0.4. Each pair of parents produces two children using the C1 operator, which chooses a crossing point in the parents' chromosomes and combines the first segment of the first parent with the remaining genes, taken in the order in which they appear in the second parent, and vice versa [52].
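A C1-style crossover of this kind can be sketched as follows. The sketch assumes, for illustration only, that each chromosome is a flat list of unique student identifiers (the matrix of groups read row by row); the function name and the optional `cut` and `rng` parameters are hypothetical and do not reproduce the modified operator used in this work.

```python
import random
from typing import List, Optional, Sequence, Tuple


def c1_crossover(
    parent_a: Sequence[int],
    parent_b: Sequence[int],
    cut: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """One-point crossover for chromosomes in which no gene may be repeated."""
    if len(parent_a) < 2 or sorted(parent_a) != sorted(parent_b):
        raise ValueError("parents must hold the same genes (at least two)")
    if cut is None:
        rng = rng or random.Random()
        cut = rng.randint(1, len(parent_a) - 1)  # crossing point

    def make_child(head_source: Sequence[int], order_source: Sequence[int]) -> List[int]:
        # Keep the first segment of one parent...
        head = list(head_source[:cut])
        used = set(head)
        # ...and append the missing genes in the order of the other parent.
        tail = [gene for gene in order_source if gene not in used]
        return head + tail

    return make_child(parent_a, parent_b), make_child(parent_b, parent_a)


if __name__ == "__main__":
    child_1, child_2 = c1_crossover([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], cut=2)
    print(child_1, child_2)
```

Each child is again a valid permutation of the students, so no repair step is needed after crossover.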

Given the nature of the problem, a variation of the swap mutation operator was used [53]. It proceeds in two steps: in the first step, the individuals to be mutated are randomly selected; in the second step, two genes to be mutated are randomly selected. The mutation consists of swapping the values of a specific allele in each gene, also selected randomly. Considering the matrix representation used, it is necessary to clarify that the two swapped alleles cannot be in the same row, since such a swap would not change the individual (the order within a group is irrelevant). The mutation operator was applied to the entire population of each generation with a probability of 0.01.
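The two-step swap mutation can be illustrated with the following Python sketch. It assumes that an individual is a list of rows, each row being one group of student identifiers; the function names are hypothetical, and the code is an illustration of the idea rather than the operator implemented by the authors.

```python
import random
from typing import List

Grouping = List[List[int]]


def swap_mutation(groups: Grouping, rng: random.Random) -> Grouping:
    """Return a copy of `groups` with one student swapped between two different rows."""
    if len(groups) < 2:
        raise ValueError("at least two groups are needed for a swap")
    mutated = [list(row) for row in groups]
    # Two distinct rows, so the swap always changes group membership.
    row_a, row_b = rng.sample(range(len(mutated)), 2)
    col_a = rng.randrange(len(mutated[row_a]))
    col_b = rng.randrange(len(mutated[row_b]))
    mutated[row_a][col_a], mutated[row_b][col_b] = (
        mutated[row_b][col_b],
        mutated[row_a][col_a],
    )
    return mutated


def mutate_population(population: List[Grouping], p_m: float, rng: random.Random) -> List[Grouping]:
    """Step 1: pick each individual with probability p_m; step 2: swap two students."""
    return [swap_mutation(ind, rng) if rng.random() < p_m else ind for ind in population]
```

Restricting the swap to two different rows avoids mutations that would leave the grouping unchanged.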

With these parameters, the execution of the algorithm, programmed in Java™ and run on a laptop with a 1.8 GHz Intel Core i7 processor and 8 GB of RAM, generated the results presented in Table 9 and Table 10.

These results show that the implemented algorithm does not differentiate between the types of groups to be formed: the computational performance (time in milliseconds) is practically the same for the three types in each of the lists, for groups of either 4 or 5 members. On the other hand, both the time used and the fitness value increase from one list to the next, since the complexity of the search increases with the total number of students; that is, there are more combinations to consider in determining the optimal value. It should also be noted that the tests were performed using 1000 generations as the termination criterion of the genetic algorithm; if this parameter is increased, the fitness value improves, but the processing time also increases.

These results demonstrate the effectiveness of the proposed algorithm, which, being a heuristic search method, does not guarantee that the global optimum is reached but does reach a value very close to it, despite the simplicity of its formulation and its low demand for computational resources (time and memory), even when the number of possible combinations is very high.

4.3. Empirical Design

The experiment consisted of conducting a collaborative learning activity, specifically an activity named “Peer Code Evaluation” [54], with 82 students from 4 programming courses belonging to the Systems Engineering and Electronic Engineering programs of the University of Nariño in San Juan de Pasto, Colombia, in academic semester B-2020. It is worth mentioning that, given the COVID-19 situation, the courses in this period were delivered virtually. Table 11 shows the characterization of each of the courses.

The research process followed an empirical design based on a quasi-experiment, as shown in Table 12, seeking to verify one of the following hypotheses: H₀: the means of the grades obtained by the students in the topic of the collaborative activity are equal (null hypothesis); H₁: the means of the grades obtained by the students in the topic of the collaborative activity are different (research hypothesis). It is a quasi-experiment because the study groups (described below) were already formed before the experiment; that is, they were intact groups. The reason why they arose and the way they were formed have nothing to do with the experiment, since forming them is a task of the University's registration and academic control office for each new academic period [55]. This is a common situation in educational contexts, as teachers must evaluate the efficacy of their teaching methods, but pure experiments in these contexts are seldom politically, administratively, or ethically feasible [56].

G₁ and G₂ were the experimental groups and G₃ and G₄ were the control groups in each course. X was the experimental treatment, which consisted of forming the required groups using the proposed approach and carrying out a collaborative learning activity during scheduled work sessions. In the control groups, to which the experimental treatment was not applied, the groups required for the collaborative activity (the same activity as for the experimental groups) were formed according to the students' preference.

O₁, O₂, O₃, and O₄ were the pre-tests applied at the beginning of the experiment to both the experimental and control groups, seeking to establish the initial equivalence of the groups, which in turn supports the internal validity of the experiment. The pre-tests consisted of individual responses to the same questionnaire (for each of the courses), related to the topics of the collaborative activity.

In turn, O₅, O₆, O₇, and O₈ were the post-tests applied at the end of the experiment to both the experimental and control groups, seeking to determine the effect of the experimental treatment. The post-tests consisted of individual responses to the same questionnaire used in the pre-test (for each of the courses), related to the topics of the collaborative activity.

The first experimental group, G₁, consisted of 22 students from the Computer Programming course, Group 1, of the second semester of Electronic Engineering, who received the experimental treatment X and the post-test O₅. The control group G₃ consisted of 17 students from the Computer Programming course, Group 2, from the same semester and academic period, who did not receive the experimental treatment but took the post-test O₇. The second experimental group, G₂, consisted of 24 students from the Graphic Programming course, Group 1, of the tenth semester of Systems Engineering, who received the experimental treatment X and the post-test O₆. The control group G₄ consisted of 19 students from the Graphic Programming course, Group 2, from the same semester and academic period, who did not receive the experimental treatment but took the post-test O₈. As mentioned above, all groups were given the pre-tests O₁, O₂, O₃, and O₄, seeking to establish the initial equivalence of the groups in each of the courses.

All participant students in the study were informed about the research’s purpose and scope and were assured of their anonymity.

5. Results

The proposal presented in this paper contemplates applying an instrument for measuring personality traits and an approach using genetic algorithms for group formation. The results described here correspond to the validation of the model and its application in a controlled experiment with students in a collaborative learning scenario. It is verified that the methodology described in Section 4 is useful to obtain groups with the desired characteristics, and it is empirically verified that the academic results obtained through the collaborative work of these groups are favourable.

As mentioned in Section 4.3, at the beginning of the experiment the pre-tests were applied to the study groups, the objective of which was to guarantee in some way the initial equivalence of the groups, since they are intact groups. These pre-tests consisted of questionnaires for individual response, about the specific topics of the collaborative activity developed in each course.

Figure 4 shows the apparent initial equivalence of the study groups. The results show that, on average, the grades obtained in the pre-tests by the experimental groups are similar to those obtained by the control groups in each of the courses.

To provide a solid conclusion regarding the initial equivalence of the groups, a statistical analysis was carried out using the Mann–Whitney U test, a non-parametric test used to compare two independent samples. The aim was to statistically confirm the apparent similarity between the pre-test grades obtained by the experimental groups and those obtained by the control groups; that is, to confirm that, in general, the students in each course handled the same pre-concepts before the experiment was carried out. This test was used because the students' pre-test grades do not follow a normal distribution.
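For readers who wish to reproduce this kind of comparison outside SPSS™, the following Python sketch computes a two-sided Mann–Whitney U test using the normal approximation with a tie correction. It is a minimal illustration, not the procedure SPSS™ applies internally (which may report exact probabilities for small samples); the function name is hypothetical and the grades in the usage example are invented for illustration and are not the study data.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, remembering which sample each one came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all observations identical
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


if __name__ == "__main__":
    # Hypothetical grades for illustration only (NOT the study data).
    experimental = [3.5, 4.0, 2.8, 3.9, 4.2, 3.1]
    control = [3.4, 3.0, 4.1, 2.9, 3.8, 3.6]
    u_stat, p = mann_whitney_u(experimental, control)
    print(f"U = {u_stat:.1f}, p = {p:.3f}")
```

A p-value above 0.05 in such a comparison means that the null hypothesis of similar grades cannot be rejected at the 95% confidence level.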

The results of the application of the Mann–Whitney U test are shown in Table 13, which were obtained using SPSS™, with a confidence level of 95% and considering the following hypotheses: H₀: the means of the grades obtained by the students in the pre-test are similar; H₁: the means of the grades obtained by the students in the pre-test are different.

When comparing the experimental group G₁ with the control group G₃ of the Computer Programming course, a p-value of 0.589 was obtained. As this value is greater than 0.05, the null hypothesis (H₀) cannot be rejected at a confidence level of 95%; that is, the means of the grades obtained by the students in the pre-test are considered similar.

When comparing the experimental group G₂ with the control group G₄ of the Graphic Programming course, a p-value of 0.607 was obtained. As this value is greater than 0.05, the null hypothesis (H₀) cannot be rejected at a confidence level of 95%; that is, the means of the grades obtained by the students in the pre-test are considered similar.

The results of these tests show that there is no statistically significant difference between the means of the grades obtained in the pre-tests by the experimental groups and those obtained by the control groups in each of the courses, that is, that the pre-concepts that students handle about the topics required for the development of collaborative activity in each of the courses are similar. This adds validity to the experiment.

On the other hand, as mentioned in Section 4.3, post-tests were applied to the study groups at the end of the experiment in order to determine the effect of the experimental treatment. These post-tests consisted of questionnaires, answered individually, about the specific topics of the collaborative activity developed in each course. This process made it possible to contrast the grades of the experimental groups with those of the control groups, seeking to verify in a basic way whether the learning process improves when the proposed group formation technique based on personality traits is applied, compared with group formation by students' preference, which is traditionally used by teachers when developing a collaborative activity.

Figure 5 shows the apparent positive incidence of the proposed experimental treatment. The results show that, on average, the grades obtained in the post-test by the experimental groups are higher than those obtained by the control groups.

To provide a solid conclusion regarding the goodness of the proposed group formation technique, a statistical analysis was performed using the Mann–Whitney U test, seeking to statistically confirm the difference between the grades obtained by the experimental groups and those obtained by the control groups, that is, a basic difference in the level of learning achieved by the students in the specific subject. This test was used because the students' post-test grades do not follow a normal distribution. In addition, the effect size of the experimental treatment was calculated through Hedges' g [57], a metric that makes it possible to quantify the magnitude of the difference between two independent samples analysed through non-parametric tests, lending greater reliability to the test results.
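As an illustration of how such an effect size can be obtained, the following Python sketch computes Hedges' g as Cohen's d (difference of means over the pooled standard deviation) multiplied by the commonly used approximate small-sample correction 1 − 3/(4(n₁ + n₂) − 9), and labels the result with Cohen's conventional thresholds (0.2, 0.5, 0.8). The function names are hypothetical, the "negligible" label for values below 0.2 is an added convention, and the exact formula variant used to produce the values reported in this paper is not stated, so this is a sketch under those assumptions.

```python
import math
from statistics import mean, variance
from typing import Sequence


def hedges_g(x: Sequence[float], y: Sequence[float]) -> float:
    """Hedges' g: Cohen's d with an approximate small-sample bias correction."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # Pooled variance from the two unbiased sample variances.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    d = (mean(x) - mean(y)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate correction factor
    return d * correction


def cohen_label(g: float) -> str:
    """Conventional Cohen thresholds: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(g)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Effect sizes reported in this section, labelled with the thresholds above.
    for value in (0.729, 0.579):
        print(value, cohen_label(value))
```

Unlike the p-value, the effect size does not depend directly on the sample size, which is why both are reported together.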

The results of the application of the Mann–Whitney U test, which were obtained using SPSS™, are shown in Table 14 with a confidence level of 95% and considering the following hypotheses: H₀: the means of the grades obtained by the students in the post-test are similar; H₁: the means of the grades obtained by the students in the post-test are different.

When comparing the experimental group G₁ with the control group G₃ of the Computer Programming course, a p-value of 0.029 was obtained. As this value is less than 0.05, the null hypothesis (H₀) is rejected in favour of the alternative hypothesis (H₁), with a confidence level of 95%, that is, that the means of the grades obtained by the students in the post-test are different, with a difference of 0.6545 in favour of G₁. According to the classification made by Cohen [58], the effect size of the experimental treatment (g) with a value of 0.729 is considered as medium, approaching large, which implies that there is a significant difference between the results of the experimental group versus the control group not due to chance.

When comparing the experimental group G₂ with the control group G₄ of the Graphic Programming course, a p-value of 0.039 was obtained. As this value is less than 0.05, the null hypothesis (H₀) is rejected in favour of the alternative hypothesis (H₁), with a confidence level of 95%, that is, that the means of the grades obtained by the students in the post-test are different, with a difference of 0.4311 in favour of G₂. According to the classification made by Cohen [58], the effect size of the experimental treatment (g) with a value of 0.579 is considered as medium, which implies that there is a moderate difference between the results of the experimental group versus the control group that is not due to chance.

Finally, Table 15 shows the results of the application of the Mann–Whitney U test to the contrast between post-tests and pre-tests, which were obtained using SPSS™, with a confidence level of 95% and considering the following hypotheses: H₀: the means of the grades obtained by the students in the post-test and the pre-test are similar; H₁: the means of the grades obtained by the students in the post-test and the pre-test are different.

When comparing the post-tests with the pre-tests in both courses, it can be observed that in all cases the p-value is less than 0.05. Therefore, the null hypothesis (H₀) is rejected in favour of the alternative hypothesis (H₁), with a confidence level of 95%; that is, the means of the grades obtained by the students in the post-test and the pre-test are different. According to the classification made by Cohen [58], the effect size of the experimental treatment (g), with values greater than 0.8 in all cases, is considered large, which implies that there is a very significant difference between the results of the post-tests and pre-tests that is not due to chance. In addition, these results indicate that the students improved their command of the specific topics in each course regardless of the group formation strategy used, although the improvement is more evident in the experimental groups than in the control ones.

The previous statistical analysis shows the positive impact of the experimental treatment applied in this research to the experimental groups compared to the control groups, allowing us to confirm that forming groups for collaborative learning scenarios considering the students’ personality traits benefits their academic performance.

6. Conclusions and Further Work

The problem of obtaining homogeneous, heterogeneous, or mixed groups from a set of students while taking several of their characteristics into account (for example, the personality dimensions of the Big Five model) is difficult to solve by analytical or exhaustive search methods, given the combinatorial explosion that occurs as the number of students and groups grows. A heuristic search method, such as a genetic algorithm, turned out to be a good alternative for solving it.

The model presented in this paper aims to be a contribution to collaborative learning scenarios since it addresses one of its fundamental requirements: group formation. Therefore, it is very useful to have a solution that automates this process, to do it as efficiently as possible and increase the chances of success of the groups. The proposal aims to make the groups obtained as homogeneous as possible (as similar as possible to the general characteristics of the whole group), as heterogeneous as possible among themselves (to differ as far as possible from the general characteristics of the whole group), or showing a mixture among themselves (that are similar in some characteristics and that differ in others at the same time, concerning the general characteristics of the whole group). To achieve this, the model is made up of two parts. First, it considers the five dimensions proposed by the Big Five model to assess personality: extraversion, agreeableness, conscientiousness, neuroticism, and openness, proposing a specific instrument to measure them. Secondly, it uses genetic algorithms defining an objective function for each individual (a possible grouping), considering grouping as a multi-objective optimization problem.

The innovations of the proposed strategy lie in the grouping criterion considered and in the genetic operators used. Most studies on student group formation that use genetic algorithm approaches focus on grouping based on the students' knowledge level and use basic crossover and mutation operators. The proposed approach took advantage of the personal characteristics arising from the five dimensions of the Big Five model to improve collaboration and learning outcomes, at both the group and the individual level. Additionally, a modification of the so-called C1 operator was used, a crossover operator for problems in which genes must not be repeated, as is the case under study; for mutation, a variation of the swap mutation operator was used. These modified genetic operators allow a more thorough search of the solution space by inserting new genetic information into the population, which helps prevent the algorithm from being trapped in a local optimum.

The validation of the proposed model was carried out concerning the goodness of the genetic algorithms so that the group formation achieves the desired objectives; the results were satisfactory to the extent that the grouping obtained for each of the types (one of the possible ones) ensures that each group reflects more precisely similarity and/or variability with the whole set of students when considering the five dimensions of the Big Five Model as a grouping criterion.

As future work from the pedagogical perspective, it is desirable to improve the evaluation process by measuring not only the academic results of the students but also their performance at a collaborative level in these types of scenarios, which can facilitate the understanding, control, and improvement of group work, leading to a progressive acquisition of this competence by students.

After this, it is proposed to assess in each context, also through controlled experiments, what type of approach is most suitable for organizing groups (homogeneous, heterogeneous, or mixed), bearing in mind that a technique and tool are available that facilitate this work and automate it.

In addition, it is suggested to explore which specific dimensions of the personality, evaluated through the Big Five Model, can most directly influence the learning process, particularly programming and software engineering in general. Based on this, eventually a recommendation system could be proposed that suggests what should be the ideal formation of workgroups, considering the specific characteristics of the students and the proposed collaborative activities.

Considering the above, from the computational point of view, the proposed group formation strategy works for any knowledge area. From the pedagogical point of view, an eventual generalization of the proposal would initially require a characterization of the personality dimensions that favour collaborative performance in a specific area of knowledge, especially for mixed groups.

It is also proposed as future work to evaluate the additional effort that applying the proposed approach would imply on teachers and students, trying to determine if this “amount of additional effort” is worth making and under what circumstances.

Author Contributions

Conceptualization, O.R.S., C.A.C. and M.A.R.; methodology, O.R.S., C.A.C. and M.A.R.; software, O.R.S.; validation, O.R.S.; formal analysis, O.R.S.; investigation, O.R.S.; writing—original draft preparation, O.R.S.; writing—review and editing, O.R.S., C.A.C. and M.A.R.; supervision, C.A.C. and M.A.R.; funding acquisition, M.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because all participant students were informed about the research’s purpose and scope and were assured of their anonymity.

Informed Consent Statement

Informed consent was obtained from all students involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors express their thanks to the respective research groups for allowing and supporting the development of this work: Galeras.NET Group, Universidad de Nariño; IDIS Group, Universidad del Cauca; and CHICO Group, Universidad de Castilla-La Mancha. In addition, special recognition to CHICO Research Group of the Universidad de Castilla-La Mancha for having funded the publication of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Moreno-Guerrero, A.-J.; Rondón García, M.; Martínez Heredia, N.; Rodríguez-García, A.-M. Collaborative Learning Based on Harry Potter for Learning Geometric Figures in the Subject of Mathematics. Mathematics 2020, 8, 369.
Cárdenas, M.L.B.; Malagón, L. The Formation of Study Groups: Experiences in the Outset of a Permanent English Teacher Development Program. Signum Estud. da Ling. 2007, 10, 73–93.
Barkley, E.F.; Major, C.H.; Cross, K.P. Collaborative Learning Techniques: A Handbook for College Faculty, 2nd ed.; Jossey-Bass: San Francisco, CA, USA, 2014; ISBN 9781118761557.
Lin, Y.-S.; Chang, Y.-C.; Chu, C.-P. Novel Approach to Facilitating Tradeoff Multi-Objective Grouping Optimization. IEEE Trans. Learn. Technol. 2016, 9, 107–119.
Bekele, R. Computer-Assisted Learner Group Formation Based on Personality Traits; University of Hamburg: Hamburg, Germany, 2005.
Costaguta, R.; Menini, M.D.L.Á. An Assistant Agent for Group Formation in CSCL Based on Student Learning Styles. In Proceedings of the 7th Euro American Conference on Telematics and Information Systems—EATIS ’14; ACM Press: Valparaiso, Chile, 2014; pp. 1–4.
Lescano, G.; Costaguta, R.; Amandi, A. Genetic Algorithm for Automatic Group Formation Considering Student’s Learning Styles. In Proceedings of the 8th Euro American Conference on Telematics and Information Systems (EATIS); IEEE: Cartagena, Colombia, 2016; pp. 1–8.
Wang, D.-Y.; Lin, S.S.J.; Sun, C.-T. DIANA: A Computer-Supported Heterogeneous Grouping System for Teachers to Conduct Successful Small Learning Groups. Comput. Human Behav. 2007, 23, 1997–2010.
Wichmann, A.; Hecking, T.; Elson, M.; Christmann, N.; Herrmann, T.; Hoppe, H.U. Group Formation for Small-Group Learning. In Proceedings of the 12th International Symposium on Open Collaboration; ACM: Berlin, Germany, 2016; pp. 1–4.
Manske, S.; Hoppe, H.U. Managing Knowledge Diversity: Towards Automatic Semantic Group Formation. In Proceedings of the 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania, 3–7 July 2017; IEEE: Timisoara, Romania, 2017; pp. 330–332.
Zheng, Z.; Pinkwart, N. A Discrete Particle Swarm Optimization Approach to Compose Heterogeneous Learning Groups. In Proceedings of the 14th International Conference on Advanced Learning Technologies, Athens, Greece, 7–10 July 2014; IEEE: Athens, Greece, 2014; pp. 49–51.
Amarasinghe, I.; Hernandez-Leo, D.; Jonsson, A. Intelligent Group Formation in Computer Supported Collaborative Learning Scripts. In Proceedings of the 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania, 3–7 July 2017; IEEE: Timisoara, Romania, 2017; pp. 201–203.
Sadeghi, H.; Kardan, A.A. Toward Effective Group Formation in Computer-Supported Collaborative Learning. Interact. Learn. Environ. 2016, 24, 382–395.
Lykourentzou, I.; Antoniou, A.; Naudet, Y.; Dow, S.P. Personality Matters. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing; ACM: San Francisco, CA, USA, 2016; pp. 260–273.
Duque Reis, R.C. Formação de Grupos Em Ambientes Cscl Utilizando Traços de Personalidade Associados Às Teorias de Aprendizagem Colaborativa; Universidade de São Paulo: São Carlos, Brazil, 2019.
Battur, S.; Patil, M.S.; Desai, P.; Vijayalakshmi, M.; Raikar, M.M.; Hegde, P.; Joshi, G.H. Enhancing the Students Project with Team Based Learning Approach: A Case Study. In Proceedings of the 4th International Conference on MOOCs, Innovation and Technology in Education (MITE); IEEE: Madurai, India, 2016; pp. 275–280.
Borges, S.; Mizoguchi, R.; Bittencourt, I.I.; Isotani, S. Group Formation in CSCL: A Review of the State of the Art. In Higher Education for All. From Challenges to Novel Technology-Enhanced Solutions. HEFA 2017. Communications in Computer and Information Science; Cristea, A.I., Bittencourt, I.I., Lima, F., Eds.; Springer: Cham, Switzerland, 2018; Volume 832, pp. 71–88. ISBN 9783319979335.
Jung, C. Psychological Types; Taylor & Francis Ltd.: London, UK, 2017; ISBN 9781138687424.
Keirsey, D. Please Understand Me II: Temperament, Character, Intelligence; Prometheus Nemesis Book Company: Carlsbad, CA, USA, 2006; ISBN 9781885705020.
McCrae, R.R.; Allik, J. The Five-Factor Model of Personality Across Cultures; Springer: Boston, MA, USA, 2002; ISBN 9780306473555.
Torrin, K. A Guide to Myers-Briggs Type Indicator (MBTI), Including Its Background, Concepts, Applications, and More; Webster’s Digital Services: New York, NY, USA, 2012; ISBN 9781276177030.
Aguilar, R.A.; De Antonio, A.; Imbert, R. Searching Pancho’s Soul: An Intelligent Virtual Agent for Human Teams. In Proceedings of the Electronics, Robotics and Automotive Mechanics Conference (CERMA 2007), Morelos, Mexico, 25–28 September 2007; IEEE: Morelos, Mexico, 25 September 2007; pp. 568–571.
Soto, C.J.; Kronauer, A.; Liang, J.K. Five-Factor Model of Personality. In The Encyclopedia of Adulthood and Aging; Krauss Whitbourne, S., Ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015; pp. 1–5. ISBN 9781118528921.
John, O.P.; Naumann, L.P.; Soto, C.J. Paradigm shift to the integrative Big Five trait taxonomy: History, measurement, and conceptual issues. In Handbook of Personality: Theory and Research; John, O.P., Robins, R.W., Pervin, L.A., Eds.; The Guilford Press: New York, NY, USA, 2008; pp. 114–158. ISBN 9781606237380.
Sleep, C.E.; Lynam, D.R.; Miller, J.D. A Comparison of the Validity of Very Brief Measures of the Big Five/Five-Factor Model of Personality. Assessment 2020, 28, 739–758.
Maldonado Pérez, M. El Trabajo Colaborativo En El Aula Universitaria. Laurus Rev. Educ. 2007, 13, 263–278.
Chaljub Hasbún, J.M. Trabajo Colaborativo Como Estrategia de Enseñanza En La Universidad/Collaborative Work as a Teaching Strategy in the University. Cuad. Pedagog. Univ. 2015, 11, 64–71.
Johnson, D.W.; Johnson, R.T.; Johnson Holubec, E. The New Circles of Learning: Cooperation in the Classroom and School; ASCD: Alexandria, VI, USA, 1994; ISBN 9780871202277.
Revelo-Sánchez, O.; Collazos-Ordóñez, C.A.; Jiménez-Toledo, J.A. El Trabajo Colaborativo Como Estrategia Didáctica Para La Enseñanza/Aprendizaje de La Programación: Una Revisión Sistemática de Literatura. TecnoLógicas 2018, 21, 115–134.
Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT University Press: Cambridge, MA, USA, 1992; ISBN 9780262275552.
Wang, R.; Sato, Y.; Liu, S. Mutated Specification-Based Test Data Generation with a Genetic Algorithm. Mathematics 2021, 9, 331.
Díaz, D.; Valledor, P.; Ena, B.; Iglesias, M.; Menéndez, C. Improved Method for Parallelization of Evolutionary Metaheuristics. Mathematics 2020, 8, 1476.
Goldberg, D.E. Genetic Algorithms; Pearson Education: New York, NY, USA, 2006; ISBN 9788177588293.
Alba, E.; Dorronsoro, B. Solving the Vehicle Routing Problem by Using Cellular Genetic Algorithms. In Evolutionary Computation in Combinatorial Optimization. EvoCOP 2004. Lecture Notes in Computer Science; Gottlieb, J., Raidl, G.R., Eds.; Springer: Berlin, Germany, 2004; Volume 3004, pp. 11–20. ISBN 9783540213673.
Asadzadeh, L. A Local Search Genetic Algorithm for the Job Shop Scheduling Problem with Intelligent Agents. Comput. Ind. Eng. 2015, 85, 376–383.
Pongcharoen, P.; Hicks, C.; Braiden, P.M.; Stewardson, D.J. Determining Optimum Genetic Algorithm Parameters for Scheduling the Manufacturing and Assembly of Complex Products. Int. J. Prod. Econ. 2002, 78, 311–322.
Rezoug, A.; Bader-El-Den, M.; Boughaci, D. Guided Genetic Algorithm for the Multidimensional Knapsack Problem. Memetic Comput. 2018, 10, 29–42.
Vaishnav, P.; Choudhary, N.; Jain, K. Traveling Salesman Problem Using Genetic Algorithm: A Survey. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2017, 2, 105–108.
Zhang, W.; Lu, J.; Zhang, H.; Wang, C.; Gen, M. Fast Multi-Objective Hybrid Evolutionary Algorithm for Flow Shop Scheduling Problem. In Proceedings of the Tenth International Conference on Management Science and Engineering Management. Advances in Intelligent Systems and Computing; Xu, J., Hajiyev, A., Nickel, S., Gen, M., Eds.; Springer: Baku, Azerbaijan, 2016; pp. 383–392.
Ani, Z.C.; Yasin, A.; Husin, M.Z.; Hamid, Z.A. A Method for Group Formation Using Genetic Algorithm. Int. J. Comput. Sci. Eng. 2010, 2, 3060–3064.
Deleón, A.F.; Gómez, S.; Moreno, J. Uso de Tests de Aptitud y Algoritmos Genéticos Para La Conformación de Grupos En Ambientes Colaborativos de Aprendizaje. Av. Sist. Inf. 2009, 6, 165–172. [Google Scholar]
Amara, S.; Macedo, J.; Bendella, F.; Santos, A. Group Formation in Mobile Computer Supported Collaborative Learning Contexts: A Systematic Literature Review. Educ. Technol. Soc. 2016, 19, 258–273. [Google Scholar]
Odo, C.; Masthoff, J.; Beacham, N.; Alhathli, M. Affective State for Learning Activities Selection. In Proceedings of the Intelligent Mentoring Systems Workshop Associated with the 19th International Conference on Artificial Intelligence in Education, AIED 2018, London, UK, 27 June 2018; pp. 1–10. [Google Scholar]
Cruz, W.M.; Isotani, S. Group Formation Algorithms in Collaborative Learning Contexts: A Systematic Mapping of the Literature. In Collaboration and Technology. CRIWG 2014. Lecture Notes in Computer Science; Baloian, N., Burstein, F., Ogata, H., Santoro, F., Zurita, G., Eds.; Springer: Cham, Switzerland, 2014; Volume 8658, pp. 199–214. ISBN 9783319101651. [Google Scholar]
John, O.P.; Robins, R.W.; Pervin, L.A. Handbook of Personality, 3rd ed.; The Guilford Press: New York, NY, USA, 2008; ISBN 9781593858360. [Google Scholar]
Benet-Martínez, V.; John, O.P. Los Cinco Grandes across Cultures and Ethnic Groups: Multitrait-Multimethod Analyses of the Big Five in Spanish and English. J. Pers. Soc. Psychol. 1998, 75, 729–750. [Google Scholar] [CrossRef]
Moreno, J.; Rivera, J.C.; Ceballos, Y.F. Agrupamiento Homogéneo de Elementos Con Múltiples Atributos Mediante Algoritmos Genéticos. DYNA 2011, 78, 246–254. [Google Scholar]
Han, J.; Kamber, M. Data Mining: Concepts and Techniques, 2nd ed.; Elsevier Inc.: San Francisco, CA, USA, 2006; ISBN 9780080475585. [Google Scholar]
Conradie, W.; Goranko, V. Logic and Discrete Mathematics: A Concise Introduction; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015; ISBN 9781118751275. [Google Scholar]
Kramer, O. Evolutionary Self-Adaptation: A Survey of Operators and Strategy Parameters. Evol. Intell. 2010, 3, 51–65. [Google Scholar] [CrossRef]
Mirjalili, S. Genetic Algorithm. In Evolutionary Algorithms and Neural Networks. Studies in Computational Intelligence; Springer: Cham, Switzerland, 2018; pp. 43–55. ISBN 9783319930251. [Google Scholar]
Reza Hejazi, S.; Saghafian, S. Flowshop-Scheduling Problems with Makespan Criterion: A Review. Int. J. Prod. Res. 2005, 43, 2895–2929. [Google Scholar] [CrossRef]
Araujo, L.; Cervigón, C. Algoritmos Evolutivos: Un Enfoque Práctico; Alfaomega Grupo Editor: Ciudad de México, México, 2009; ISBN 9786077686293. [Google Scholar]
Revelo-Sánchez, O.; Collazos, C.A.; Solano, A.F.; Fardoun, H. Diseño Colaborativo Basado En ThinkLets Como Apoyo a La Enseñanza de La Programación. Rev. Colomb. Comput. 2020, 21, 22–33. [Google Scholar] [CrossRef]
Kirk, R.E. Experimental Design—Procedures for the Behavioral Sciences, 4th ed.; SAGE Publications, Inc.: Los Angeles, CA, USA, 2013; ISBN 9781412974455. [Google Scholar]
Duzhin, F.; Gustafsson, A. Machine Learning-Based App for Self-Evaluation of Teacher-Specific Instructional Style and Tools. Educ. Sci. 2018, 8, 7. [Google Scholar] [CrossRef] [Green Version]
Ledesma, R.; Macbeth, G.; Cortada De Kohan, N. Tamaño Del Efecto: Revisión Teórica y Aplicaciones Con El Sistema Estadístico ViSta. Rev. Latinoam. Psicol. 2008, 40, 425–439. [Google Scholar]
Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: New York, NY, USA, 1988; ISBN 9780805802832. [Google Scholar]

Figure 1. GA general scheme.

Figure 2. The main flow of the student group formation process using GA.

Figure 3. GA performance.

Figure 4. Pre-test results. (a) Computer Programming; (b) Graphic Programming.

Figure 5. Post-test results. (a) Computer Programming; (b) Graphic Programming.

Table 1. Spanish and English Big Five Inventory.

Spanish Big Five Inventory
Las siguientes expresiones le describen a usted con más o menos precisión. Por ejemplo, ¿está de acuerdo en que usted es alguien “chistoso, a quien le gusta bromear”? Por favor escoja un número para cada una de las siguientes expresiones, indicando así hasta qué punto está de acuerdo o en desacuerdo en cómo le describe a usted.
Muy en desacuerdo 1	Ligeramente en desacuerdo 2	Ni de acuerdo ni en desacuerdo 3	Ligeramente de acuerdo 4	Muy de acuerdo 5
Me veo a mi mismo-a como alguien que…
___ 1. es bien hablador
___ 2. tiende a ser criticón
___ 3. es minucioso en el trabajo
___ 4. es depresivo, melancólico
___ 5. es original, se le ocurren ideas nuevas
___ 6. es reservado
___ 7. es generoso y ayuda a los demás
___ 8. puede a veces ser algo descuidado
___ 9. es calmado, controla bien el estrés
___ 10. tiene intereses muy diversos
___ 11. está lleno de energía
___ 12. prefiere trabajos que son rutinarios
___ 13. inicia disputas con los demás
___ 14. es un trabajador cumplidor, digno de confianza
___ 15. con frecuencia se pone tenso
___ 16. tiende a ser callado
___ 17. valora lo artístico, lo estético
___ 18. tiende a ser desorganizado
___ 19. es emocionalmente estable, difícil de alterar
___ 20. tiene una imaginación activa
___ 21. persevera hasta terminar el trabajo
___ 22. es a veces mal educado con los demás
___ 23. es inventivo
___ 24. es generalmente confiado
___ 25. tiende a ser flojo, vago
___ 26. se preocupa mucho por las cosas
___ 27. es a veces tímido, inhibido
___ 28. es indulgente, no le cuesta perdonar
___ 29. hace las cosas de manera eficiente
___ 30. es temperamental, de humor cambiante
___ 31. es ingenioso, analítico
___ 32. irradia entusiasmo
___ 33. es a veces frío y distante
___ 34. hace planes y los sigue cuidadosamente
___ 35. mantiene la calma en situaciones difíciles
___ 36. le gusta reflexionar, jugar con las ideas
___ 37. es considerado y amable con casi todo el mundo
___ 38. se pone nervioso con facilidad
___ 39. es educado en arte, música, o literatura
___ 40. es asertivo, no teme expresar lo que quiere
___ 41. le gusta cooperar con los demás
___ 42. se distrae con facilidad
___ 43. es extrovertido, sociable
___ 44. tiene pocos intereses artísticos
Por favor, compruebe que ha escrito un número delante de cada frase.
English Big Five Inventory
Here are a number of characteristics that may or may not apply to you. For example, do you agree that you are someone who likes to spend time with others? Please choose a number for each statement to indicate the extent to which you agree or disagree with that statement.
Disagree strongly 1	Disagree a little 2	Neither agree nor disagree 3	Agree a little 4	Agree strongly 5
I see myself as someone who…
___ 1. is talkative.
___ 2. tends to find fault with others.
___ 3. does a thorough job.
___ 4. is depressed, blue.
___ 5. is original, comes up with new ideas.
___ 6. is reserved.
___ 7. is helpful and unselfish with others.
___ 8. can be somewhat careless.
___ 9. is relaxed, handles stress well.
___ 10. is curious about many different things.
___ 11. is full of energy.
___ 12. starts quarrels with others.
___ 13. is a reliable worker.
___ 14. can be tense.
___ 15. is ingenious, a deep thinker.
___ 16. generates a lot of enthusiasm.
___ 17. has a forgiving nature.
___ 18. tends to be disorganized.
___ 19. worries a lot.
___ 20. has an active imagination.
___ 21. tends to be quiet.
___ 22. is generally trusting.
___ 23. tends to be lazy.
___ 24. is emotionally stable, not easily upset.
___ 25. is inventive.
___ 26. has an assertive personality.
___ 27. can be cold and aloof.
___ 28. perseveres until the task is finished.
___ 29. can be moody.
___ 30. values artistic, aesthetic experiences.
___ 31. is sometimes shy, inhibited.
___ 32. is considerate and kind to almost everyone.
___ 33. does things efficiently.
___ 34. remains calm in tense situations.
___ 35. prefers work that is routine.
___ 36. is outgoing, sociable.
___ 37. is sometimes rude to others.
___ 38. makes plans and follows through with them.
___ 39. gets nervous easily.
___ 40. likes to reflect, play with ideas.
___ 41. has few artistic interests.
___ 42. likes to cooperate with others.
___ 43. is easily distracted.
___ 44. is sophisticated in art, music, or literature.
Please check: Did you write a number in front of each statement?

Table 2. Representation of a set of students.

Id	$C_{1}$	$C_{2}$	…	$C_{M}$
1	70	0.50	…	25
2	20	0.83	…	−10
⋮	⋮	⋮		⋮
$N$	45	1.22	…	13

Table 3. Representation of an individual.

	S₁	S₂	S₃	S₄	S₅
G₁	1	2	3	4	5
G₂	6	7	8	9	10
G₃	11	12	13	14	15
G₄	16	17	18	19	20
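
The layout in Table 3 (one row per group, one column per seat, each cell holding a student Id) can be sketched as a list of lists. The function names, the 1-based Ids, and the requirement that the number of students be a multiple of the group size are assumptions made for this illustration; the source does not prescribe an implementation.

```python
import random


def ordered_individual(n_students, group_size):
    """Rows are groups, columns are seats, filled with Ids 1..N in order."""
    if n_students % group_size != 0:
        raise ValueError("this sketch assumes N is a multiple of the group size")
    ids = list(range(1, n_students + 1))
    return [ids[i:i + group_size] for i in range(0, n_students, group_size)]


def random_individual(n_students, group_size, rng):
    """Same layout, but the student Ids are shuffled before being cut into groups."""
    if n_students % group_size != 0:
        raise ValueError("this sketch assumes N is a multiple of the group size")
    ids = list(range(1, n_students + 1))
    rng.shuffle(ids)
    return [ids[i:i + group_size] for i in range(0, n_students, group_size)]


# 20 students in 4 groups of 5, as in Table 3.
table_3 = ordered_individual(20, 5)

# A random candidate with the same shape (hypothetical seed, for repeatability).
candidate = random_individual(20, 5, random.Random(0))
```

Because every Id appears exactly once, any such matrix is a permutation of the students cut into consecutive blocks, which is what keeps each candidate a valid grouping.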

Table 4. Example students.

Id	$C_{1}$	$C_{2}$	$C_{3}$
1	0.12	1.00	0.90
2	0.97	0.00	0.30
3	0.00	0.64	0.98
4	1.00	0.45	1.00
5	0.35	0.07	0.93
6	0.59	0.84	0.00

Table 5. Two possible example individuals.

Individual 1			Individual 2
1	2	3	1	3	5
4	5	6	2	4	6

Table 6. $\bar{X_{g, C}^{i}}$ calculation.

Individual	Group	Id	$C_{1}$	$C_{2}$	$C_{3}$
1	1	1	0.120	1.000	0.900
		2	0.970	0.000	0.300
		3	0.000	0.640	0.980
		$\bar{X_{1, C}^{1}}$	0.363	0.547	0.727
	2	4	1.000	0.450	1.000
		5	0.350	0.070	0.930
		6	0.590	0.840	0.000
		$\bar{X_{2, C}^{1}}$	0.647	0.453	0.643
2	1	1	0.120	1.000	0.900
		3	0.000	0.640	0.980
		5	0.350	0.070	0.930
		$\bar{X_{1, C}^{2}}$	0.157	0.570	0.937
	2	2	0.970	0.000	0.300
		4	1.000	0.450	1.000
		6	0.590	0.840	0.000
		$\bar{X_{2, C}^{2}}$	0.853	0.430	0.433
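
The group means in Table 6 can be reproduced from the student values in Table 4 and the two individuals in Table 5. The following is a minimal sketch; the names `STUDENTS`, `group_means`, `INDIVIDUAL_1`, and `INDIVIDUAL_2` are introduced here for illustration only.

```python
# Student characteristics (C1, C2, C3) copied from Table 4.
STUDENTS = {
    1: (0.12, 1.00, 0.90),
    2: (0.97, 0.00, 0.30),
    3: (0.00, 0.64, 0.98),
    4: (1.00, 0.45, 1.00),
    5: (0.35, 0.07, 0.93),
    6: (0.59, 0.84, 0.00),
}

# The two example individuals from Table 5 (one inner list per group).
INDIVIDUAL_1 = [[1, 2, 3], [4, 5, 6]]
INDIVIDUAL_2 = [[1, 3, 5], [2, 4, 6]]


def group_means(group):
    """Per-characteristic mean over the students of one group."""
    rows = [STUDENTS[student_id] for student_id in group]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


for name, individual in (("Individual 1", INDIVIDUAL_1), ("Individual 2", INDIVIDUAL_2)):
    for g, group in enumerate(individual, start=1):
        means = group_means(group)
        print(name, "group", g, " ".join(f"{m:.3f}" for m in means))
```

Rounded to three decimals, the printed means agree with the $\bar{X}$ rows of Table 6.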

Table 7. Simulation results of crossover and mutation probabilities.

	Homogeneous Optimal Value: 0.04259					Heterogeneous Optimal Value: 0.37947					Mixed (Het 1,3; Hom 2) Optimal Value: 0.35975
p_c\p_m	0.001	0.005	0.01	0.05	0.1	0.001	0.005	0.01	0.05	0.1	0.001	0.005	0.01	0.05	0.1
0.2	100	100	100	64	62	100	100	100	56	59	100	100	100	65	64
0.3	100	100	100	70	63	100	100	100	59	59	100	100	100	59	61
0.4	99	100	100	59	51	99	100	100	64	63	100	100	100	63	58
0.6	74	94	89	65	56	87	93	91	60	61	89	98	91	72	56
0.8	61	67	49	50	52	61	59	53	58	43	66	59	52	51	49

Table 8. Results of population size and generation number simulation (time (T) in milliseconds, fitness value (F)).

G\PS	100		250		500		1000
	T	F	T	F	T	F	T	F
50	670	2.86807	2793	2.95261	6039	2.57977	6440	2.41648
100	1294	3.19189	5196	3.11056	9153	2.68289	12,071	2.43724
250	3413	3.38202	13,152	3.27827	16,379	2.80822	29,856	2.45063
500	6311	3.65009	15,166	3.12584	30,225	2.77663	57,944	2.52308
1000	11,553	4.08895	30,642	3.24229	57,730	2.89469	115,698	2.56421

Table 9. Results of the 4-student group formation (time (T) in milliseconds, fitness value (F)).

	Homogeneous		Heterogeneous		Mix (Ht123, Hm45)
Students	T	F	T	F	T	F
20	2384	0.02564	2837	1.45408	3425	0.79947
50	12,244	0.15691	12,120	4.09133	12,406	2.58507
100	42,644	0.40598	40,700	6.79100	40,434	4.15891

Table 10. Results of the 5-student group formation (time (T) in milliseconds, fitness value (F)).

	Homogeneous		Heterogeneous		Mix (Ht123, Hm45)
Students	T	F	T	F	T	F
20	2711	0.01084	3038	1.04431	2908	0.60869
50	11,065	0.06563	12,240	2.97906	12,235	1.93656
100	41,801	0.20536	51,146	4.92599	41,940	3.26345

Table 11. Characterization of the working groups.

Program-Course	N	Group Type	Number of Groups (groups/students per group)	Grouping Type
Electronic Engineering-Computer Programming	22	Experimental	6/3–1/4	Heterogeneous
Electronic Engineering-Computer Programming	17	Control	3/3–2/4	Students’ preference
Systems Engineering-Graphic Programming	24	Experimental	8/3	Homogeneous
Systems Engineering-Graphic Programming	19	Control	5/3–1/4	Students’ preference
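
Reading an entry such as 6/3–1/4 as "six groups of three and one group of four", every row of Table 11 matches a simple rule: form as many groups of three as possible and enlarge one group to four for each leftover student. The sketch below checks that arithmetic. The function `split_sizes` and the rule itself are assumptions inferred from the table, not a procedure stated by the authors, and the control groups were formed by student preference, so for them the match is only descriptive.

```python
def split_sizes(n_students, base_size=3):
    """Return (groups of base_size, groups of base_size + 1) covering all students."""
    n_groups, remainder = divmod(n_students, base_size)
    if remainder > n_groups:
        raise ValueError("too few students to absorb the remainder")
    # Each leftover student enlarges one group by one member.
    return n_groups - remainder, remainder


# Class sizes (N) from Table 11.
for n in (22, 17, 24, 19):
    threes, fours = split_sizes(n)
    print(n, "students:", threes, "groups of 3,", fours, "groups of 4")
```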

Table 12. Experimental design.

Group Type	Group	Pre-Test	Experimental Stimulus	Post-Test
Experimental	G₁	O₁	X	O₅
Experimental	G₂	O₂	X	O₆
Control	G₃	O₃	-	O₇
Control	G₄	O₄	-	O₈

Table 13. Mann–Whitney U test for pre-tests.

Course	Group	N	Tests	p
Computer Programming	Experimental (G₁)	22	O₁–O₃	0.589
Computer Programming	Control (G₃)	17	O₁–O₃	0.589
Graphic Programming	Experimental (G₂)	24	O₂–O₄	0.607
Graphic Programming	Control (G₄)	19	O₂–O₄	0.607

Table 14. Mann–Whitney U Test for post-tests.

Course	Group	N	Tests	p	g
Computer Programming	Experimental (G₁)	22	O₅–O₇	0.029	0.729
Computer Programming	Control (G₃)	17	O₅–O₇	0.029	0.729
Graphic Programming	Experimental (G₂)	24	O₆–O₈	0.039	0.579
Graphic Programming	Control (G₄)	19	O₆–O₈	0.039	0.579

Table 15. Mann–Whitney U Test for post-tests vs. pre-tests.

Course	Group	N	Tests	p	g
Computer Programming	Experimental (G₁)	22	O₁–O₅	0.000	2.860
Computer Programming	Control (G₃)	17	O₃–O₇	0.002	1.433
Graphic Programming	Experimental (G₂)	24	O₂–O₆	0.000	2.713
Graphic Programming	Control (G₄)	19	O₄–O₈	0.000	1.735

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

