1. Introduction
Individual differences in learning to read originate from biological and environmental factors, which shape the development of the brain systems involved in the reading process [1]. Dyslexia, a specific learning disorder with impairments in reading [2], refers to a pattern of learning difficulties characterized by problems with accurate or fluent word recognition, poor decoding, and poor spelling abilities. Up to 20% of the general population may exhibit some degree of these difficulties [3], while about 7% of people are affected severely enough to qualify for a dyslexia diagnosis [4]. Due to its nature, dyslexia is typically diagnosed only after children have started to learn to read, when it becomes evident that they are struggling to keep up with their peers [5]. At this point, pupils with dyslexia are already at risk of falling behind because reading is essential for school achievement in most subjects. Moreover, children with poor reading skills are also at an increased risk of social, emotional, and mental health problems, such as school dropout, attempted suicide, incarceration, anxiety, depression, and low self-concept [6]. Therefore, it would be invaluable if children with dyslexia or at risk for dyslexia could be identified and involved in prevention and treatment programs as early as possible.
The diagnosis of dyslexia, especially at its early stages, has proven to be a complex task, largely because there is no strict procedure for dyslexia screening [7]. A tool that can objectively quantify dyslexic tendencies is therefore important for making the diagnostic process as objective and reliable as possible [8].
Recently, Carioti et al. showed that 67.4% of the developmental dyslexia studies published from 2013 to 2018 were performed on languages that can be considered to have a deep orthographic system [9]. Research on languages with a shallow orthographic system is therefore important, not only because these languages are underrepresented but also because reading in a shallow orthographic system is easier, which makes dyslexia even harder to diagnose. Serbian, with its one-to-one grapheme-phoneme pairs, belongs to the group of languages with a shallow orthography, so dyslexia diagnosis in Serbian is a challenging task.
In this paper, a study performed on native Serbian speakers with and without dyslexia is presented. Novel spatiotemporal eye-tracking features are introduced, and the classification results obtained with various machine learning (ML) algorithms are compared with the results obtained using conventional eye-tracking features. The difference between the subject classes (dyslexic and non-dyslexic) was analyzed using statistical tests for different color configurations in order to examine the influence of the color configuration of the reading material on subject class separability. Statistical analysis was also performed within the dyslexic subject group in order to analyze the influence of color configuration on reading performance and to determine whether a given color influences the eye movement features in a manner indicating facilitation or aggravation of the reading task in subjects with dyslexia.
The contributions of the performed research are as follows:
Development of a novel feature set for describing and quantifying dyslexic tendencies in the Serbian language;
Statistical and classification analysis, showing the potential of the proposed features to be used as indicators of dyslexic tendencies;
An analysis of the influence of colored backgrounds and overlays on reading patterns using a selection of the proposed features that were shown to be the most indicative of dyslexic reading patterns.
2. Related Work
Dyslexia is diagnosed by tests that include reading and writing assessments, among other evaluations, and are standardized by experts on a large number of subjects [10,11]. The advancement of technology has made the digitalization of these tests possible, and it has also contributed to the objectivity of the testing as certain quantifiable metrics can be obtained from digitalized dyslexia tests [12,13,14].
Different screening methodologies can be used to distinguish dyslexic from non-dyslexic subjects. Brain-imaging methodology most prominently focuses on functional magnetic resonance imaging during reading [15,16] and diffusion tensor imaging [16,17], which show functional and morphological differences, respectively, between the dyslexic and the control group. Brain activity can be monitored using electroencephalography (EEG) as well, either on its own [18,19,20] or in combination with other biometric signals, such as heart rate, electrodermal activity (EDA), and eye tracking [21,22,23,24].
The analysis of reading and eye movement patterns is often performed in dyslexia research. Temelturk et al. in [
25] performed a systematic review of 25 papers that include binocular eye-tracking during linguistic and non-linguistic tasks in children from 5–17 years of age with dyslexia and with typical development. The review aimed to combine the knowledge from the existing literature that observed the binocular coordination in children with dyslexia by describing the normative development of stable binocular control. The findings of the review indicate clearly that there is poor binocular coordination in children with dyslexia but that the results associated with different task characteristics were not as consistent. Another study focused on detecting dyslexia based on reading patterns was presented by Wang et al. in [
26]. A neural network was developed to predict whether or not a subject had developmental dyslexia, based on data gathered from 399 Chinese children. The dataset included children aged 7–13, 187 with dyslexia and 212 controls. The authors report an accuracy of 94% and state that reading accuracy was the feature that contributed most strongly to detecting dyslexia, while phonological awareness, the accuracy rate of pseudo characters, morphological awareness, reading fluency, rapid digit naming, and the reaction times of noncharacters also made important contributions to the classification.
Eye tracking is often used in the practical diagnosis of dyslexia as it provides a direct insight into the visual sampling strategy. The eye movements of subjects with dyslexia show an erratic gaze pattern that can be quantitatively described by features and used for further development of the algorithms for automatic dyslexia recognition [
27].
Rello et al. in [
28] claim to be the first to attempt classifying dyslexia based on eye-tracking features using machine learning. The language of the text used in the experiment was Spanish, and 97 subjects were included (48 with dyslexia), with the subject age ranging from 11 to 54. Each subject read 12 different texts, each presented in a different font type, on white paper with black letters. A support vector machine (SVM) classifier was implemented, and the features used as inputs were the age of the participant, mean duration and the total number of fixations, total reading time, etc. The model was evaluated using 10-fold cross-validation, and an accuracy of 80.18% was achieved.
A study with a larger number of participants and a more in-depth feature analysis was performed in [
29]. The data were gathered from 185 subjects (97 with dyslexia), with ages ranging from 9 to 10, who read a single text written in the Swedish language. The text was presented on white paper with black letters, and a total of 168 eye-tracking features were considered. The features were derived from both version and vergence [
30], the regressive and progressive movements, the saccades, the fixations, the duration of the event, the distance spanning the event, the accumulated distance of an event, the accumulated distance over all subsequent positions, etc. Given the large number of features, a recursive feature elimination (RFE) algorithm was used to reduce their number. An SVM classifier was used, and it was evaluated using 10-fold cross-validation, which was repeated 100 times to ensure the stability of results in terms of dataset splitting. The highest achieved accuracy was 95.6%, which indicates that a large number of subjects in combination with a wide range of observed features enables a reliable classification. This paper also effectively performed subject-wise evaluation, where the data from a given subject are either in the training set or in the test set but never in both, creating an evaluation scenario similar to a real use case [
31].
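The subject-wise constraint described above can be illustrated with a short sketch. It assumes a dataset in which each subject contributes several trials and every trial carries a subject label; the function name, the fold count, and the toy labels are hypothetical and are not taken from [29] or [31].

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Split trial indices into folds so that no subject spans two folds.

    subject_ids: one subject label per trial (hypothetical input format).
    Returns a list of (train_indices, test_indices) tuples.
    """
    # Group trial indices by the subject who produced them.
    trials_by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        trials_by_subject[subject].append(index)

    # Shuffle subjects (not trials) and deal them round-robin into folds.
    subjects = sorted(trials_by_subject)
    random.Random(seed).shuffle(subjects)
    fold_subjects = [subjects[i::n_folds] for i in range(n_folds)]

    folds = []
    for held_out in fold_subjects:
        held_out_set = set(held_out)
        test = [i for s in held_out for i in trials_by_subject[s]]
        train = [i for s in subjects if s not in held_out_set
                 for i in trials_by_subject[s]]
        folds.append((sorted(train), sorted(test)))
    return folds


# Toy example: 6 subjects with 3 trials each (labels are made up).
subject_ids = [f"S{n}" for n in range(6) for _ in range(3)]
folds = subject_wise_folds(subject_ids, n_folds=3)
```

Shuffling subjects rather than trials is the key design choice: a random split at the trial level could place trials of the same reader in both sets and inflate the measured accuracy. Setting the number of folds equal to the number of subjects turns the same function into leave-one-subject-out evaluation.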
Prabha et al. [
32] analyzed the dataset introduced in [
29] using several ML algorithms. Only the features extracted from fixations, in combination with an RFE feature selection algorithm, were used for the classification. The authors implemented an SVM classifier (with four different kernel configurations), a k-nearest neighbors (KNN) classifier, and a random forest (RF) classifier, and the highest accuracy of 95% was achieved by KNN. In their further work [
33], Prabha et al. analyzed the same dataset with additional ML algorithms: a particle swarm optimization (PSO)-based SVM hybrid kernel (hybrid SVM–PSO), SVM, RF, logistic regression (LR), and KNN. They also observed features extracted from both saccades and fixations and obtained an accuracy of 95.6% with the hybrid SVM–PSO model. In further work on the same dataset [34,35], Prabha et al. examined other eye-tracking feature sets and several other ML algorithms and obtained similar results, with a slightly higher accuracy of 96% reached in [35] using a hybrid SVM–PSO model.
A study including 69 children (32 with dyslexia) was conducted in [
36]. The children were aged 8.5–12.5 and read two text paragraphs in Greek. The authors implemented several ML algorithms (KNN, SVM, and naïve Bayes) and observed a wide range of eye-tracking features. The best accuracy of 97% was achieved using only three features: saccade length, the number of short forward movements, and the number of repeatedly fixated words.
A holistic approach for dyslexia detection based on a convolutional neural network (CNN) was implemented in [
37]. The authors used the dataset from [
29], but rather than extracting features, they used gaze coordinate data as a direct input to the CNN and implemented several padding algorithms to make the data sequences the same length. The achieved accuracy results of 96.6% (obtained with a modified cross-validation evaluation) show that, given the right data encoding, deep learning algorithms can provide very reliable dyslexia detection based on eye movement data.
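The padding step mentioned above can be sketched as follows. The source does not state which padding algorithms were used in [37], so zero-valued pre- and post-padding are assumptions chosen purely for illustration, as are the function name and the toy gaze coordinates.

```python
def pad_sequences(sequences, pad_value=(0.0, 0.0), mode="post"):
    """Pad variable-length gaze sequences to the length of the longest one.

    sequences: list of lists of (x, y) gaze samples (hypothetical format).
    mode: "post" appends padding, "pre" prepends it.
    """
    target = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        filler = [pad_value] * (target - len(seq))
        if mode == "post":
            padded.append(list(seq) + filler)
        elif mode == "pre":
            padded.append(filler + list(seq))
        else:
            raise ValueError(f"unknown padding mode: {mode}")
    return padded


# Two made-up recordings of different length.
gaze = [
    [(0.10, 0.20), (0.15, 0.20), (0.22, 0.21)],
    [(0.05, 0.40)],
]
post_padded = pad_sequences(gaze, mode="post")
pre_padded = pad_sequences(gaze, mode="pre")
```

After padding, every recording has the same number of samples, which is what a network with a fixed input size requires.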
Weiss et al. [
38] analyzed the lateralization of early orthographic processing during natural reading in subjects with dyslexia. The authors recorded the eye-tracking and EEG activity of the subjects, 24 subjects with dyslexia (mean age 24.8) and 24 control subjects (mean age 23), during the reading of isolated sentences in their native (Hungarian) language, with various spacing between letters. The statistical analysis of the EEG and the eye-tracking parameters performed in the paper has shown several interesting findings. Increased spacing between letters was shown to reduce the silent reading speed in both subject groups, in contrast to the beneficial effects on oral reading found in previous work. Furthermore, the authors found that the early left-hemispheric lateralization of orthographic processing during natural reading depends on the rank of fixations and that it is most prominent when reading on the default letter spacing in control readers, as well as that it deteriorates in subjects with dyslexia.
The detection of developmental dyslexia using machine learning and eye movement data was performed in [
39]. The authors observed a group of 165 subjects with an average age of 12.5. Of these subjects, 30 met the criterion for a reading disorder, defined as falling within the lowest 10th percentile of the reading fluency performance score, and were labeled as dyslexic. The language used in the reading experiment was Finnish (the subjects’ native language), and a variety of eye-tracking features were observed. An RF algorithm was used for feature ranking, and an SVM was used for subject classification based on the selected features. An overall accuracy of 89.7% was achieved using five-fold cross-validation.
El Hmimdi et al. [
40] performed research on predicting a dyslexia diagnosis as well as reading speed from eye movement data in both reading and non-reading tasks. The authors used eye movement measures from four different setups, gathered from 46 dyslexic subjects (average age 15.5) and 41 control subjects (average age 14.8), recruited from schools in Paris. A vergence test, a saccade test, and two reading tests were performed by each subject, and several eye-tracking measures were derived from the obtained data. Based on the obtained features, a variety of ML algorithms were implemented, and the findings showed an accuracy of 81.25% when using the data from the reading tests and accuracies of 81.25% and 77.3% for the two non-reading tests, respectively. The prediction of reading speed was also performed on each of the feature sets from the two reading tests and two non-reading tests, showing that the reading speed can be predicted more accurately from one non-reading task than from the two reading tasks.
Vajs et al. [
41] presented a CNN solution for dyslexia detection based on the VGG16 neural network architecture. The eye-tracking data were gathered from 30 subjects aged 7–13, 15 with dyslexia and 15 controls. The subjects read text in their native language (Serbian) on different colored backgrounds and overlays, and the raw eye-tracking data were segmented, visualized, and used in the form of colored images as inputs to the CNN model. The model was evaluated using leave-one-out subject cross-validation, and an accuracy of 87% was achieved.
4. Results
The average metrics achieved on the test sets for the four ML algorithms (LR, SVM, KNN, RF), using three different feature sets (conventional, proposed, and all features) as inputs, are given in Table 2.
The results show an overall high accuracy and consistently better results when the proposed features or all features are used as inputs than when the conventional features are used. For both the proposed feature set and the set of all features, the best accuracy was obtained by the LR algorithm, and it clearly surpassed the best accuracy of 85% obtained for the conventional features by the SVM algorithm.
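For readers less familiar with the reported metrics (accuracy, F1 score, and AUROC), the following minimal sketch shows how they can be computed for a binary classifier. The labels and scores are made up for illustration and are unrelated to the values in Table 2; the 0.5 decision threshold is likewise an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = dyslexic, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUROC summarizes the ranking quality of the scores over all thresholds.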
The average test set accuracy achieved when each individual feature is used as the ML input is shown in Table 3. The other metrics for single feature evaluation are presented in Appendix A.
The best accuracy was achieved for the Fixation intersection variability feature. The second and third best accuracies were achieved for the Fixation intersection coefficient and the Fixation fractal dimension. The accuracies achieved for these three features for all the ML algorithms were higher than the accuracies achieved when using all the conventional features as inputs.
The importance of each individual feature was also ranked using the decrease in impurity in the RF algorithm [45], and the results are shown in Figure 3.
The feature importance ranking indicates that the three proposed spatial features (Fixation intersection coefficient, Fixation fractal dimension, and Fixation intersection variability) that achieved the highest individual accuracy do indeed contribute to a high classification accuracy when observed as part of a feature set. Considering this, the three proposed features were used for further statistical analysis.
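The idea behind impurity-based ranking can be shown for a single split. The sketch below assumes Gini impurity (the source does not state which impurity measure was used) and made-up feature values; a random forest accumulates such weighted decreases over all nodes that split on a feature and averages them over the trees.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Impurity of the parent node minus the weighted impurity of its children."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Made-up data: 0 = control, 1 = dyslexic.
labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]    # separates the classes
uninformative = [0.1, 0.7, 0.2, 0.8, 0.3, 0.9]  # mixes the classes

gain_informative = impurity_decrease(informative, labels, threshold=0.5)
gain_uninformative = impurity_decrease(uninformative, labels, threshold=0.5)
```

A feature whose splits remove more impurity, like the first one here, receives a higher importance.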
The boxplots of the Fixation intersection coefficient, Fixation fractal dimension, and Fixation intersection variability for each color configuration and each subject group (dyslexic and control) are shown in Figure 4.
The boxplots show a clear difference between the dyslexic and control classes for each color configuration (the control group has much lower feature values than the dyslexic group). This was further supported by the statistical analysis. For the three most important features, for each color configuration, the Mann–Whitney test showed a statistically significant difference between the subject classes (p < 0.001). Furthermore, the Levene test of dispersion between the subject groups also showed a statistically significant difference for each of the three fixation complexity features (Fixation intersection coefficient, Fixation fractal dimension, Fixation intersection variability) for every color configuration (p < 0.01). The Mann–Whitney test thus shows that the feature values differ significantly between the groups, and the Levene test shows that, for each color configuration, the feature values of the dyslexic group are considerably more dispersed than those of the control group.
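The two between-group tests can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the analysis code of the study: the Mann–Whitney p-value uses the normal approximation without tie or continuity correction, the Levene statistic is centered on the group mean (whether the mean or the median was used here is not stated), and the feature values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    deviations = []
    for group in groups:
        mean = sum(group) / len(group)
        deviations.append([abs(v - mean) for v in group])
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [sum(d) / len(d) for d in deviations]
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)
    return (n_total - k) / (k - 1) * between / within


# Made-up feature values for illustration only (not data from the study).
control = [1.0, 1.1, 1.2, 1.3, 1.4]
dyslexic = [2.0, 2.6, 3.1, 3.9, 4.5]

u_stat, p_value = mann_whitney_u(control, dyslexic)
w_stat = levene_w(control, dyslexic)
```

In the toy data the dyslexic values are both higher and more spread out, so the U statistic is small and W is large. W follows an F distribution with (k − 1, N − k) degrees of freedom under the null hypothesis; in practice, library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.levene` compute both tests with proper p-values.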
In order to determine whether there was a color that had a more positive influence on dyslexic subjects (the color that would produce the lowest feature values, as close as possible to the values of the control group), a statistical analysis was performed within the dyslexic subject group, comparing each pair of color configurations. The Wilcoxon signed-rank test showed a statistically significant difference (p < 0.01) for only three pairs of color configurations and only for a single feature (Fixation fractal dimension): (1) yellow overlay and orange overlay, (2) orange background and yellow background, and (3) turquoise background and yellow background. The visualization of the configuration pairs for which there was a statistically significant difference, as well as of three arbitrary configurations for which there was no significant difference, is shown in Figure 5.
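The pairwise within-group comparison can be sketched as a loop over all pairs of color configurations, with a paired test applied to the values of the same subjects under both configurations. The sketch assumes a normal approximation for the Wilcoxon signed-rank p-value (without tie or continuity correction) and uses made-up feature values; only the configuration names come from the text.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank(a, b):
    """Return (W, two-sided p) for paired samples a and b.

    W is the smaller of the positive and negative rank sums.
    """
    # Round to suppress floating-point noise, then drop zero differences.
    diffs = [round(x - y, 12) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, math.erfc(abs(w - mean) / sd / math.sqrt(2))


# Made-up feature values for 8 subjects per configuration (not study data).
feature_by_config = {
    "yellow background": [1.20, 1.25, 1.31, 1.18, 1.40, 1.22, 1.35, 1.28],
    "orange background": [1.26, 1.33, 1.35, 1.27, 1.43, 1.32, 1.42, 1.30],
    "turquoise background": [1.22, 1.24, 1.36, 1.15, 1.44, 1.21, 1.38, 1.33],
}

results = {
    (first, second): wilcoxon_signed_rank(feature_by_config[first],
                                          feature_by_config[second])
    for first, second in combinations(sorted(feature_by_config), 2)
}
```

Because the test is paired, the values in each list must be ordered by subject so that position i always refers to the same reader.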
5. Discussion
In this paper, several ML algorithms and statistical tests were applied with the goal of analyzing dyslexic tendencies in a group of 30 children (15 dyslexic and 15 control). The text was written in the subjects’ native language, Serbian, which has a perfect matching between letters and phonemes. Considering that dyslexia detection in such languages (the ones with a shallow orthographic system) is often quite difficult, the accuracy of 94% achieved on the balanced dataset used in this paper (F1 score 0.93 and AUROC 0.96) (Table 2) is a promising result that is comparable to the ones achieved in the literature [29,30,32,33,35,36,37,39,40,41], which were obtained for languages with deeper orthographic systems. As the Serbian language has a shallow orthographic system, making dyslexia harder to diagnose, we consider the observed subject pool relevant for the purposes of this research for a language such as Serbian. Although the number of participants in this study is lower than in the subject groups found in the literature [28,29,36,39,40], the total number of trials used (378 trials, explained in Section 3.1) provided enough data for the performed type of machine learning analysis.
The three most important features (Fixation intersection coefficient, Fixation fractal dimension, and Fixation intersection variability; Figure 3), which describe the fixation gaze complexity, achieved a high accuracy (89% or higher, Table 3) even when each was used as the single input feature for the ML algorithms. Feature design and data interpretation proved to be important, as a single spatial feature describing fixation gaze complexity achieved a better accuracy (91% for Fixation intersection coefficient) than all of the observed conventional features combined (85%). It is important to note that the fixation complexity features clearly have lower values for the control subjects and higher values for the dyslexic ones. The fixation complexity features, and consequently the gaze pattern complexity, could therefore be considered an indication of reading difficulties that can be observed in dyslexic subjects.
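To give an intuition for what a gaze-path complexity measure can capture, the sketch below counts the self-intersections of a polyline drawn through successive fixation centers. This is a hypothetical stand-in written for illustration only; it is not the definition of the Fixation intersection coefficient or of any other feature proposed in this paper, and the coordinates are made up.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): -1, 0, or 1."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd (touching excluded)."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def count_self_intersections(points):
    """Count crossings between non-adjacent segments of a gaze path."""
    segments = list(zip(points, points[1:]))
    count = 0
    for i in range(len(segments)):
        # Adjacent segments share an endpoint, so start at i + 2.
        for j in range(i + 2, len(segments)):
            if segments_cross(*segments[i], *segments[j]):
                count += 1
    return count


# Made-up fixation centers: a smooth left-to-right scan and a tangled path.
orderly_path = [(0, 0), (1, 0), (2, 0), (3, 0)]
tangled_path = [(0, 0), (2, 2), (2, 0), (0, 2)]

orderly_crossings = count_self_intersections(orderly_path)
tangled_crossings = count_self_intersections(tangled_path)
```

A path that repeatedly doubles back over itself yields more crossings than an orderly scan, which matches the qualitative observation that higher complexity values accompany dyslexic reading patterns.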
The proposed features should also be of use in dyslexia analysis for languages other than Serbian, as struggling to focus on words could yield similarly chaotic fixation movements in other languages. A drawback of the features is that they require a sufficient sampling frequency and eye-tracker precision, as the characterization of fixations used in this work relies on detecting fine eye movements. The field of view of the reader can also influence the quality of the features, as reading from a greater or shorter distance from the screen or paper changes the number of words that fall within a single focus point. This can, in turn, make the chaotic movement of the gaze either harder to detect or perhaps more saccadic, which might influence the separability of the classes.
The statistical analysis showed that the spatial features provide clear class separability regardless of color configuration, as seen in Figure 4. The statistical differences between the subject groups for all the color configurations show that no single color makes reading easier to the degree that the dyslexic and control groups are no longer separable.
The comparison between color configurations for dyslexic subjects suggests that some color configurations could be more favorable than others. However, the analysis within the dyslexic group showed a statistically significant difference between only three pairs of configurations, as seen in Figure 5, indicating that no color universally makes reading easier or harder than the others. The lack of a consistently superior configuration indicates that the colors have a different effect on each subject and that, in order to make reading easier for children with dyslexia, an individualized approach would most likely be the best solution. The same conclusion can be reached from the statistical analysis between subject groups, as the difference was statistically significant for each color configuration, indicating that none of the colors stands out in the sense of making dyslexic and control subjects more similar in their reading patterns.