1. Introduction
Saccadic eye movements are quick movements that occur when visual focus on an object changes rapidly [
1]. There are strong connections between saccadic eye movements and cognition levels. It has been found that it is meaningful to analyze not only saccadic eye movements but also pupillary parameters to determine cognitive states; with suitable combinations of these variables, the cognitive load levels of individuals for specific activities can be determined [
2]. Therefore, many studies have analyzed the cognitive levels of individuals using eye movements for different tasks [
3]. The measurement of rapid eye movements, such as saccadic eye movements, has improved with the performance of eye-tracking devices, and the success and number of studies in this field have increased significantly in recent years. In this article, some of these studies are discussed. Pfleging et al. conducted a fully controlled experiment with an intuitive classifier under different illumination conditions, during which a dataset and preliminary model were developed to estimate the mental load level based on eye movement [
4]. In a driving simulator study, Kun et al. analyzed the pupil diameter measurements of a remote speaker and a driver preparing to speak and found that the driver’s pupil diameter was larger than that of the remote speaker while the driver was preparing to speak. They also investigated the effects of different conversations on the pupil diameter. On this basis, they aimed to create a dialogue system capable of adapting to the driver’s behavior under a high cognitive load [
5].
Kruger and colleagues used the percentage change in pupil diameter to examine the cognitive load that occurs when participants watch a video with and without subtitles. They found that cognitive load decreased when course videos were shown with subtitles [
6]. Chen et al. suggested an intelligent cognitive load monitoring system based on eye activity. In the proposed method, cognitive load was induced using arithmetic exercises and controlled by the number of moves and digits. They used regression models to identify more than two cognitive load levels using different eye activity models. The pupil diameter and number of blinks were observed to increase more during complicated tasks [
7]. Zagermann et al. proposed employing eye-tracking tools to study users’ cognitive load when interacting with a system. In this context, they provided a model that integrates human–computer interaction features into the link between eye-tracking data and cognitive load. They believe that this approach will help develop interfaces that require less cognitive capacity and suggest that eye tracking is a significant tool for examining cognitive processes in visual computation [
8]. In another study, Cong et al. examined the relationship between cognitive load and eye movements. They investigated cognitive load in the multimedia presentation process. They developed a quantitative model to assess cognitive load in terms of knowledge comprehension and used pupil diameter variation for an assessment focused on the cognitive load theory [
9].
The term cognition refers to the ability of the brain to collect, process, and convert information received from various sources such as sensory stimuli, perception, experience, and vision. Cognitive load theory aims to objectively analyze and make sense of the human brain’s perception, understanding, and learning processes [
10]. This exciting theory is used in many disciplines, especially education. Although it is believed that there are evolutionary similarities in information processing in individuals, it should be remembered that the ability to store information varies from person to person and is influenced by a variety of factors [
11]. John Sweller developed cognitive load theory (CLT), and an article on the subject was published in the journal Cognitive Science in 1988 [
12]. Sweller revised the theory in an article published in 2019 to reflect its development over time. CLT aims to aid scientists in developing unique instructional solutions compatible with the limitations of the human cognitive system. Working memory, or short-term memory, has a finite capacity and can only effectively process a limited amount of information at any one time. This assumption regarding memory capacity for processing information is the basis of cognitive load theory. Cognitive load theory aims to build meaningful learning design concepts that focus on the human cognitive system. According to cognitive load theory, the human cognitive architecture comprises a limited working memory and an unlimited long-term memory; expertise is obtained only from knowledge stored in long-term memory as schemas. Cognitive load is a multidimensional concept that affects a learner’s cognitive system while performing a task. The framework of Paas and van Merriënboer’s broad CLT model contains a causal dimension that reflects the interaction of task and learner characteristics and an evaluative dimension that reflects the concepts of measurable load, effort, and mental performance [
13,
14]. Every stimulus the individual is exposed to while in the learning position creates a load on the individual’s mind, and these stimuli may be in the environment in a desired or undesirable manner.
According to CLT, there are three types of cognitive load: intrinsic, extraneous, and germane [
15]. Intrinsic load is a cognitive load produced by the complexity or difficulty of learning new knowledge. Researchers in this field have found that learning a structure with many interaction components is more complex than learning a system with fewer interaction elements and requires more cognitive capacity to process [
16]. The learning technique cannot change the intrinsic load; it can only change with the degree of skill. Expert learners with extensive knowledge can quickly combine complex information items with pre-existing schemas and manipulate schema development as a working memory item. Therefore, individuals who are experts in their field have a low intrinsic load, despite solving complex problems. According to CLT, there are two types of loads apart from the intrinsic load [
17]. Extraneous load refers to the mental resources allocated to elements that do not contribute to learning and schema creation. This part of cognitive load concerns the presentation of information and the instructional format, which can increase the total cognitive load while having little effect on learning; in computing terms, it is comparable to the memory consumed by all the hidden programs operating in the background in the system tray [
18]. Germane load represents the mental resources devoted to creating and organizing long-term memory schemas; it is similar to memory usage when loading a program on a computer. In this context, germane load describes the activity of storing information or schemas in long-term memory, and this property significantly accelerates the learning process. Germane load enables the development of the cognitive behavioral patterns necessary to make sense of categories of information. Total cognitive load emerges from the aggregation of these three loads; that is, the total cognitive load an individual experiences in working memory while performing a specific task is the sum of the three load types.
Learning is connected to increasing the processing capacity of the working memory. The learning process involves transferring information from working memory to long-term memory. As in Schema theory, this information transmission allows information to be structured as a schema in long-term memory. Creating a schema entails connecting disparate pieces of information to move from a lower to a higher level of complexity and keeping them together as a single meaningful whole chunk of information.
Figure 1 illustrates a mental architecture model and the function of CLT in working memory and schema formation [
18].
Measuring this load is as important as defining the concept of a cognitive load. Many methodologies and measurement tools can be used to determine the cognitive load. The following are some approaches for evaluating cognitive load frequently utilized in the literature.
Emotional Scales: Participants are administered questionnaires assessing the difficulty and confusion level of cognitive tasks.
Performance Measures: Performance measures such as the time to complete a task, level of accuracy, or number of errors can be indicators of cognitive load.
Psychophysiological Measures: Physiological measurements such as electroencephalogram (EEG), electrocardiogram (ECG), eye movements, or skin conductance can be used to determine cognitive load.
Among these approaches, studies using physiological measurements have increased in popularity in recent years due to their avoidance of subjectivity and the development of sensor technologies. Eye-tracking systems stand out among various physiological parameters for estimating cognitive load. There are various reasons for this. The stimulus is usually presented to the user visually, with the result that there is a significant change in eye movements; there is no physical connection between eye-tracking systems and the user; eye movements can be interpreted more easily than other physiological signal measurements; the cost of an eye-tracking system is not as high as that of a comprehensive EEG device or magnetoencephalography (MEG) device; and its application does not require any pre-processing that requires expertise. There are many studies on the calculation of cognitive load from eye movements. In a study conducted in this context, pupil responses were used to establish a meaningful relationship between cognitive load and the difficulty level of a video game [
19]. In another study that examined the behavior of map users in the process of performing specific tasks using the relationship between cognitive load and eye movements, significant differences were observed between the eye movements of experts and novice users. As a result of the study, novices were observed to have longer fixation durations and higher saccade velocity, which indicates a higher cognitive load for novices [
20]. Another comprehensive study in this field also found a positive correlation between pupil dilation and cognitive load [
21]. In another study, the effects of visual and auditory stimuli on cognitive load were analyzed using eye movements. In a related study, the impact of background music (BGM) on cognitive load in learning processes was analyzed through eye movements. It was found that listening to BGM imposed a higher cognitive load on post-lexical processes than on lexical processes [
22]. Eye movement and pupillary response indicators of cognitive load, commonly used in studies, are listed below.
Saccadic eye movement is a type of rapid eye movement in which the eye shifts rapidly from one focal point to another. It is an essential component of visual perception and is controlled by a complex network of neuronal circuits in the brain [
23]. During a saccade, the eyes move rapidly from one point to another, followed by a brief fixation that allows the brain to interpret visual information. Because saccadic eye movement is a form of eye movement at high speeds, observing with cameras with high sampling rates is more meaningful and efficient. Saccadic eye movements are essential in various tasks and have been extensively studied in cognitive psychology, neurology, and ophthalmology.
The main objective of this research is to identify the variables that influence programmers’ cognitive load during computer programming activities. An extensive database collecting quantitative and categorical information on programmers’ spoken language, programming experience, and age was used. Through linguistic distance, categorical information in the database, such as native language, was transformed into quantitative variables, and the database of 216 participants was made appropriate for study through numerical values. As a result of the study, the factors influencing the cognitive load of programmers were determined and detailed in percentages.
3. Results
The canonical correlation analysis method was used to establish a meaningful and interpretable relationship between the eye parameters affecting cognitive load and the personal parameters of programmers. In the first stage, the data were normalized using the z-score normalization method as a pre-processing step. Partial correlation analysis was performed with the dependent and independent variables created within the scope of this study, according to Table 2, and the results are given in Table 4 and Table 5, respectively. The aim of the partial correlation analysis, which is performed as a preliminary analysis stage, is to reveal meaningful correlations between and within variables [42].
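The pre-processing stage described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the data are synthetic, and the choice to control for all remaining variables is an assumption, since the text does not state which variables were held fixed.

```python
import numpy as np

def zscore(X):
    """Column-wise z-score: subtract the mean and divide by the sample SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def partial_corr(X):
    """Partial correlation of each column pair, controlling for all other columns.

    Uses the standard identity r_ij|rest = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the correlation matrix.
    """
    P = np.linalg.pinv(np.corrcoef(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Synthetic stand-in data (assumption): 216 rows, 4 made-up variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(216, 4))
data[:, 1] += 0.6 * data[:, 0]  # induce one dependency for illustration
print(np.round(partial_corr(zscore(data)), 3))
```

Because correlation is invariant to z-scoring, the normalization step does not change the partial correlations; it matters for the later multivariate analysis, where variables on different scales are combined.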
As seen in the partial correlation results for the dependent variables, there is no strong relationship between these variables. Among the dependent variables, there is a weak relationship between eye movement SD and pupil radius change SD (0.244).
The highest correlations between the independent variables, which are the personal parameters of programmers, are as follows: between the age of the programmer and the time of experience in programming (0.66), and between the expertise level in the experiment language and the expertise level in programming (0.613).
The correlation between independent and dependent variables was analyzed to establish a significant correlation between cognitive load and the personal parameters of programmers, and the results are shown in Table 6.
According to the results of the analysis, the dependent and independent variable groups with the highest correlation values are shown in Table 7 with p-values. As seen in the table, the relationship between these dependent and independent variables is statistically significant.
As shown by the results of the partial correlation analysis between the dependent and independent variables, there was a moderate correlation between the time it took programmers to complete the given tasks and their experience in the experimental language. In this study, moderate or weak correlation was determined according to the rule of thumb used to interpret the magnitude of the correlation coefficient [43]. A multivariate data analysis approach is required to improve the correlation coefficient and establish a more effective and meaningful relationship between the two sets of variables. For this reason, canonical correlation analysis was preferred, because it analyzes the effects on the variables in a meaningful manner and provides the parameters that yield the maximum correlation between the two datasets. The correlation ratio between the datasets and the level of significance of the variables were examined using canonical correlation analysis. Canonical correlation analysis provided the canonical correlation coefficient between cognitive load parameters and programmers’ personal parameters, as well as canonical variable vectors and weights. The relationship between the two datasets was analyzed to determine the significance of the canonical correlation coefficients. The canonical correlation coefficients for the canonical variable vectors among the relevant datasets are listed in Table 8.
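To make the computation concrete, the following is a minimal sketch of canonical correlation analysis using a QR decomposition followed by a singular value decomposition, one standard formulation of the method. The function name `cca`, the variable counts, and the synthetic data are assumptions for illustration; the text does not specify which implementation was used in the study.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and weights for two data blocks (rows = observations).

    Assumes both centred blocks have full column rank.
    Returns (r, A, B): correlations in descending order and the weight
    matrices, whose k-th columns define the k-th canonical variable pair.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # orthonormal basis of each block
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)     # weights for the X block
    B = np.linalg.solve(Ry, Vt.T)  # weights for the Y block
    return np.clip(s, 0.0, 1.0), A, B

# Synthetic stand-in data (assumption): 5 personal and 4 load parameters.
rng = np.random.default_rng(0)
personal = rng.normal(size=(216, 5))
load = rng.normal(size=(216, 4))
load[:, 0] += 0.7 * personal[:, 0]  # induce one shared component

r, A, B = cca(personal, load)
u1 = (personal - personal.mean(axis=0)) @ A[:, 0]  # first canonical variable pair
v1 = (load - load.mean(axis=0)) @ B[:, 0]
print("canonical correlations:", np.round(r, 3))
```

Plotting `u1` against `v1` gives the kind of scatter used to judge how linear a canonical variable pair is. Testing the significance of each coefficient (for example with Wilks' lambda) is a separate step that this sketch does not cover.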
When the partial correlation and canonical correlation results for the relationship between the cognitive load parameters and the programmers’ personal parameters are compared, canonical correlation analysis yields more statistically significant results, as expected. In this way, the relationship between the dependent and independent variables was established with greater statistical significance and became suitable for integrated analysis.
Figure 7 and Figure 8 show the variation in the 1st and 2nd canonical variable vectors representing these relationships, respectively. When the related figures were analyzed, it was determined that the relationship between the first canonical variable pair was more linear and statistically more significant. Therefore, the first canonical variable pair was preferred for creating a mathematical model to define the relationship between programmers’ cognitive load and personal parameters.
When the relationship between the two canonical variable vector pairs is examined, it is clearly observed that the relationship of the first vector pair has the highest correlation and a more linear characteristic. In addition, this linear characteristic difference can be easily observed from the deviation in the distribution of data on the fitted optimal line. Using canonical weighting, which is statistically significant and has the highest correlation coefficient, a path diagram describing the relationship between cognitive load and the personal parameters of programmers was generated, as shown in Figure 9.
The amplitude of the canonical weights of canonical variables provides information regarding the influence of variables according to the datasets. Therefore, the relatively high amplitude of the weights provides information regarding the effect ratio between each dataset. To make the path diagram more meaningful using canonical weights, the percentage effects of the normalized canonical weights on the variables are shown in Figure 10.
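One plausible way to turn canonical weights into percentage effects is to take the absolute value of each weight and scale the values within a variable set so that they sum to 100. The sketch below assumes this normalization, which the text does not spell out, and uses made-up weights rather than the values from the study.

```python
import numpy as np

def weight_percentages(weights):
    """Share of each variable, in percent, based on absolute canonical weights."""
    w = np.abs(np.asarray(weights, dtype=float))
    total = w.sum()
    if total == 0:
        raise ValueError("all weights are zero")
    return 100.0 * w / total

# Hypothetical weights for four variables in one set (not the study's values).
example = [0.42, -0.35, 0.52, 0.71]
print(np.round(weight_percentages(example), 3))
```

Under this convention the sign of a weight is discarded, so the percentages describe how strongly a variable contributes, not the direction of its contribution.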
When the percentages of personal parameters affecting the cognitive load parameters of programmers were examined, it was determined that the highest effect was that of the age factor (18.047%), and the gender parameter had the lowest effect (0.255%). One of the important and novel results of this study is that the linguistic distance between a programmer’s native language and English has a significant effect on cognitive load (15.404%). When the canonical correlation analysis of the dataset is examined from another point of view, it is seen that there is a more balanced weight between the cognitive load parameters, which are as follows: peak ratio (20.9%), SD of eye movement (17.544%), SD of pupil radius (26.019%), and total duration (35.436%).
4. Discussion
In this study, we have determined, in percentage terms, the parameters that affect the cognitive load of computer programmers during code comprehension tasks. We have used the EMIP database, a comprehensive database of eye movements and different personal parameters of 216 participants [
33,
34].
We have transformed categorical variables, such as the native language of programmers in the relevant EMIP database, into meaningful and efficient quantitative variables using the linguistic distance approach [
37]. We have extensively investigated the influence of linguistic distance on the cognitive load of computer programmers in this study for the first time in the literature with integrated sub-parameters.
To determine the rapid movements of the eye, the time-dependent changes in the pupil coordinates have been obtained, and the peak values for modelling the sudden movement of the eye have been determined with the
z-score peak detection algorithm. We have used the Savitzky–Golay FIR smoothing filter to reduce signal measurement noise during data processing. We have assessed eye movements and problem-solving time as cognitive load parameters of programmers. The personal parameters of the programmers and the calculated cognitive load parameters have been determined as dependent and independent variables. First, we performed partial correlation analyses within and between these variables and tested the statistical significance of the results obtained. In the next step, we analyzed the effect of these variables on each other with the CCA approach to increase statistical significance and reveal more comprehensive results. As a result of the analysis, we have observed that not only the native language spoken but also the linguistic distance of the native language from English significantly affected the cognitive load. The results that we have obtained within the scope of this study support the results and suggestions of the studies examining the relationship between cognitive load and language [
44].
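The signal-processing steps described above (smoothing the pupil coordinates with a Savitzky–Golay filter, then flagging sudden movements by their z-score) can be sketched as follows. This is an illustrative simplification under stated assumptions: the function names, the window length, the polynomial order, the threshold of 3 standard deviations, and the use of a single global z-score on frame-to-frame displacement are all choices made here, not details taken from the study.

```python
import numpy as np

def savgol_smooth(x, window=7, order=2):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than order")
    half = window // 2
    t = np.arange(-half, half + 1)
    # Row 0 of the pseudo-inverse yields the fitted value at the window centre.
    coeffs = np.linalg.pinv(np.vander(t, order + 1, increasing=True))[0]
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")

def rapid_movement_indices(x, y, threshold=3.0, window=7, order=2):
    """Indices where frame-to-frame displacement exceeds `threshold` SDs above the mean."""
    xs = savgol_smooth(x, window, order)
    ys = savgol_smooth(y, window, order)
    speed = np.hypot(np.diff(xs), np.diff(ys))
    sd = speed.std()
    if sd == 0:
        return np.array([], dtype=int)
    z = (speed - speed.mean()) / sd
    return np.flatnonzero(z > threshold)

# Synthetic gaze trace (assumption): fixation noise plus one abrupt shift at sample 300.
rng = np.random.default_rng(0)
gx = rng.normal(0.0, 0.5, 600)
gy = rng.normal(0.0, 0.5, 600)
gx[300:] += 40.0
print(rapid_movement_indices(gx, gy))
```

For real recordings, `scipy.signal.savgol_filter` provides a library routine for the same filter, and the threshold and window would need tuning to the sampling rate of the eye tracker.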
We have determined that the relationship between computer programming and spoken languages is essential from the perspective of cognitive load theory. Therefore, as with the continuous exposure method used in foreign language education, teaching programming languages like spoken languages can be meaningful and beneficial [
45]. We have hypothesized that this approach could positively influence the process of learning a programming language in the future.
We found that the age of the programmer had the highest effect on the cognitive load. Then, we determined that the duration of experience in the relevant programming language also had a significant effect on the programmer’s cognitive load. These outputs support the studies finding that age is an essential factor in cognitive load [
46]. Studies conducted within the scope of analyzing learning processes have revealed that cognitive load increases with age [
47].
One of the results of this study is that we have found that the frequency of using a different programming language had less effect on cognitive load. We have observed that specialization in or intensive use of a programming language with a different structure has a limited effect on the perception and interpretation of the problem encountered in another programming language. However, to support and generalize this inference, conducting comprehensive and specific studies that analyze the effect of different programming languages on cognitive load will be helpful.
It is widely recognized that there are biological and social differences between the ways men and women perceive environmental stimuli [
48]. However, current theories on cognitive load argue against the idea that these differences lead to precise and direct differences in working memory cognitive load. Cognitive abilities and performance are influenced by not only gender but also past experiences, education, and other factors that account for individual cognitive differences [
49].
A comprehensive review study conducted in this context examined many neuroimaging studies to investigate gender differences in brain structures. Although researchers have found gender-related differences in some brain regions, these differences are insufficient to explain cognitive abilities [
50].
Another comprehensive meta-analysis study investigating visual–spatial working memory in terms of gender differences has similar findings. The meta-analysis examined gender differences in visual–spatial working memory using a dataset of 180 effect sizes from 98 samples of healthy men and women aged 3 to 86 years. Statistical analyses of the study showed that these differences were too small to adequately explain gender differences in visual–spatial working memory abilities, particularly in mental rotation, where effect sizes may be relatively large [
51].
As a result of this study, we have observed that gender difference has a limited effect on cognitive load in computer programming processes. One point not yet fully understood in research on gender differences in cognitive load is the lack of clear universal spatial or temporal patterns in males’ and females’ responses to external factors. In this study on computer programming and cognitive load, results similar to those in the literature have been obtained, and no significant correlation has been found between cognitive load and gender differences. The pattern of the relationship between gender differences and cognitive load will become more evident and meaningful with comprehensive studies with different stimuli over time.
This study has some limitations that should be emphasized. The programmers’ cognitive load or physiological fatigue levels were not measured physiologically or with various scales before performing the tasks. During the collection of the dataset, the programmers only looked at two code samples; both were object-oriented, so whether the findings can be generalized to more algorithmic code or to code in languages with different programming models remains to be clarified. As this was a multi-location study, minor differences in the experimental setup may have occurred despite using the same hardware in all locations, such as the same eye tracker and laptop. The use of eye tracking to measure cognitive load has important advantages, but in some cases, it can also have disadvantages. Studies conducted with eye-tracking systems can sometimes feel uncomfortable or artificial for participants. Therefore, they may cause changes in participant behavior that may affect the results of the studies.