1. Introduction
Eye tracking represents an accessible and non-invasive tool capable of measuring looking times, recording saccadic behaviours, assessing physiological ocular measures, such as pupil dilation, and shedding light on mental processes [1,2,3,4,5]. Due to the low level of cooperation required from participants, eye-tracking techniques are particularly suitable for non-verbal participants and for participants who are unable to follow instructions, such as infants. In recent decades, eye trackers have allowed researchers to explore infant development and gain significant insights into early perception and cognition [6,7,8]. Even though eye tracking is a valid tool for addressing a variety of research questions about infant perception and cognition, researchers are constantly faced with technical challenges and constraints. As outlined by Oakes [8], these challenges include tracking head movements (raising the possibility of missing data if participants move their head outside the trackable area and the system must reacquire the eye coordinates), obtaining a good calibration, and establishing the experimental design implementation and data processing required to analyse the data. Whereas the two latter aspects relate mainly to the analytical and technical skills of the researcher before and after data acquisition, the former two are closely linked with the infant population and as such will be the focus of this paper. Additionally, some challenges are related to the physiology of the infant eye, both in terms of anatomical differences between the structure of the developing eye and that of the adult eye, and in terms of typical features of infants, such as the tendency for wet eyes. Anatomical differences will also be considered in the present work.
To some extent, the impact of head movements depends on the trackable area available for eye tracking which, in turn, depends on the device used to measure eye movements [9]. Two broad categories of eye-tracking devices are generally available for infant participants: remote and head-mounted (or wearable) systems. The most well-established eye-tracking paradigms in infancy have taken advantage of non-intrusive remote eye-tracking systems (e.g., [10,11,12]). Most commercial solutions support regular flat screens up to 27 inches in size (or up to 30 inches with some limitations). Although small head movements are tolerated by many eye-tracking solutions, the eye’s image is usually lost outside the screen area and has to be reacquired following each tracking loss. According to Tomalski and Malinowska-Korczak [13], infant participants spend about 10% of a standard eye-tracking session looking away from the monitor, resulting in missing data each time the system has to recover the eye’s image. These spatial constraints, restricted to the area of a regular screen, mean that gaze-orienting behaviours are mostly investigated through the contribution of the eyes while the head is held in a relatively stable position. In contrast, in everyday situations we orient using both eye and head movements. Even though such infant paradigms using regular screens have enabled valuable insights into infant visual behaviour in response to stimuli, more ecologically valid investigations that span wider visual locations have been limited. To the best of our knowledge, so far, only one study by Pratesi and collaborators [14] has adopted a remote eye tracker to investigate infant gaze behaviour beyond near-peripheral locations (+/−30°). This was achieved by using five screens (a central screen and two additional screens on each side) across a 120° field of view. In addition to remote eye-tracking systems, the development of head-mounted eye-tracking systems has enabled an alternative method that allows free head and body movements and the possibility of investigating a wider three-dimensional space [15]. Some young participants do not, however, respond positively to a wearable system on their heads, and may easily displace or remove the device. Further, these devices can be complicated to set up, resulting in higher overall attrition rates [16].
The current study is aimed at testing a remote system that is not invasive for the participant and, at the same time, measures across a wide field of view of 126°. We defined a wide field of view as extending beyond the near-peripheral locations which can be investigated using the regular screens supported by most remote eye-tracking solutions. Measuring gaze movements across the visual field opens the possibility of studying developing visual behaviour in a more naturalistic and unconstrained visual environment. The limitation of a restricted trackable area, in which the pupil position can be accurately detected, is overcome by using multiple infrared cameras. In the present work, a 4-camera system allows tracking of both the contributions of the eye relative to the head and the contributions of the head relative to the spatial environment. This work builds upon the initial investigation across a wide visual field by Pratesi and colleagues [14], who piloted a similar system on a small group of nine infant participants. The present work extends this approach to a larger sample and to a single, wider screen, while taking advantage of new software specifically adapted to the developing head and eyes. The current multi-camera setup enables researchers to investigate infant perception and cognition beyond standard screen sizes and, potentially, to define a tracking area even without a screen (see, for instance, applications of similar eye-tracking systems in the automotive field to track drivers’ eye movements across different car spaces; e.g., [17]). The applications of this system include a range of studies investigating visual behaviour beyond a limited trackable area, in the context of a participant who is less constrained to direct their visual attention to a standard screen space. Visual orientation could be monitored while participants move their heads in active ‘real-world’ exploration. At the moment, similar investigations are mostly carried out with head-mounted systems, with the limitations described above.
The second challenge of eye tracking in infancy research addressed in this paper is calibration. Every eye-tracking system relies on calibration, and the quality of the data often depends on it [7]. In fact, the data provided by the system (e.g., gaze positions) must be mapped onto the stimulus/display area. Eye-tracking data collected from infants are not always reliable, and this can even lead to apparent differences in gaze behaviour when different groups of individuals are compared [18,19]. Among the relevant parameters in evaluating data quality, this paper specifically focuses on (1) spatial accuracy, as this is limited by the quality of the calibration procedure, and (2) robustness (i.e., data loss), as it is linked with the trackable area available. For a focus on precision, a third parameter of data quality, see Wass et al. [19], who investigated the variables that low precision may influence when tracking the infant’s gaze on standard displays. Spatial accuracy (offset) refers to the distance between the actual location of the stimulus that a participant is looking at and the gaze points recorded/extracted by the eye-tracking system [5,20]. Traditionally, good spatial accuracy is achieved by asking participants to maintain fixation on a number of small visual targets (usually 9 for adults) at predefined locations on the screen. This means calibration is more difficult with young infants, who cannot follow such instructions, resulting in a spatial offset of 1–2° [1]. In developmental studies, highly attractive stimuli (e.g., moving or looming colourful images paired with sounds) are typically used, and calibration points are significantly reduced to 5, or even 2 in some cases [6]. Notably, not all attractive stimuli result in a high-accuracy calibration. A recent investigation [21] compared the impact of different calibration targets on infants’ attention and found that some targets, such as complex concentric animations or stimuli with the highest contrast at their centre, elicited more accurate gaze than others. In addition, taking the infant’s limited attention span into account, calibration should ideally be as brief as possible so that the infant does not tire and remains cooperative during the following experimental procedure [6]. For these reasons, optimal infant gaze calibration is not always achievable before the start of an experiment and, as it stands, there are currently no standard or prescribed calibration guidelines for researchers (see [22] about publishing eye-tracking data in infancy research). For instance, important considerations, such as the criteria that determine whether the calibration is valid and whether it should be adjusted or repeated during the experiment, are not standardised across studies [8]. The efficacy of a calibration procedure in producing accurate gaze measurements has rarely been reported in empirical infant research, although it has previously been recommended as a factor of importance for methodological descriptions [18,23]. Studies using young participants have revealed evidence of systematic calibration errors and low spatial accuracy compared with the manufacturer’s estimates [18,24,25].
More generally, eye trackers often show a systematic error, even with adult data and after careful online calibration [26]. To overcome this issue, post hoc (or implicit) offline calibration has been proposed as a successful approach to replace calibration methods that require explicit collaboration from the participants [27,28] or as an additional step to improve data quality [24,26,29,30,31]. This procedure normally involves recalibrating individual gaze points at various times during the study by correcting the error between the recorded gaze data of a participant and the actual location of the visual stimulus. To date, offline calibration methods to correct eye-tracking offsets have rarely been adopted in infancy research [24]. In the present work, we combined an online system calibration with a novel offline implicit gaze calibration to improve the spatial accuracy of the eye-tracking system. The latter was possible because visual targets appeared at stable and predetermined positions during the experiment.
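In essence, an implicit offline calibration of this kind reduces to estimating the average recorded-minus-true error over frames where the participant was plausibly fixating a known, stable target, and subtracting it from all gaze samples. A minimal sketch of that idea, with all array and function names hypothetical rather than taken from any particular eye-tracking toolkit:

```python
import numpy as np

def apply_offline_calibration(gaze_xy, target_xy, calib_mask):
    """Estimate and remove a constant per-participant gaze offset.

    gaze_xy    : (n, 2) recorded gaze points in pixels
    target_xy  : (n, 2) true on-screen target positions at the same frames
    calib_mask : boolean (n,) marking frames where a target was stable on
                 screen and the participant was plausibly fixating it
    """
    # Mean recorded-minus-true error over the calibration-eligible frames.
    offset = np.nanmean(gaze_xy[calib_mask] - target_xy[calib_mask], axis=0)
    # Subtracting this offset recalibrates every recorded gaze point.
    return gaze_xy - offset, offset
```

With calibration targets appearing at stable, predetermined positions during the experiment, the mask can simply select the frames during which those targets were on screen.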
3. Results
The system allowed tracking of infants’ eye movements over a wide visual area, covering mid-peripheral locations up to at least 60° on either side. Due to head tracking, gaze data were not lost when the participants moved or turned their head. The robustness data revealed that, on average, 23.97% (SD = 11.93%) of the raw data were lost during the entire recording. The system could track data even beyond the screen area, and approximately 28.00% of the data (SD = 10.14%) were recorded outside the screen. Data visualisation revealed that gaze data outside the screen were often directed towards the flash producers just below the display monitor. Head tracking throughout the entire experiment allowed the monitoring of infants’ eye distance from the display. Although this distance was initially set to 40 cm, the variation throughout the recording was high, with median head distance values per trial ranging from 29.20 to 56.50 cm (M = 42.25 cm, SD = 4.73 cm). Importantly, the system could still accommodate this variation.
By combining information about target position and gaze position at selected time frames, it was possible to correct a mean offset of −42.22 pixels (SD = 38.88 pixels) on the x-axis and −44.81 pixels (SD = 53.61 pixels) on the y-axis. At a 40 cm distance, this corresponds to a −1.38° mean offset on the x-axis (SD = 1.27°) and a −1.46° mean offset on the y-axis (SD = 1.75°). At the individual level, the smallest average correction was −0.17 pixels (−0.01°) on the x-axis and −0.41 pixels (−0.01°) on the y-axis, whereas the largest average correction corresponded to 119.40 pixels (3.89°) on the x-axis and −207.47 pixels (−6.76°) on the y-axis (see Figure 3). This offline calibration procedure allowed the correction of an error that affected the majority of participants in the top-right direction (n = 27, 87.10%) and that could have contributed towards an incorrect data interpretation.
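The pixel-to-degree conversion underlying these figures follows from simple trigonometry on the viewing geometry. A minimal sketch, where the pixel pitch (the physical size of one pixel) is an assumed value chosen only to be roughly consistent with the numbers reported above, not a specification of the actual monitor:

```python
import math

def pixels_to_degrees(offset_px, pixel_pitch_cm, distance_cm):
    """Convert an on-screen pixel offset to degrees of visual angle
    as seen from a given viewing distance."""
    return math.degrees(math.atan((offset_px * pixel_pitch_cm) / distance_cm))

# With a hypothetical pitch of ~0.0228 cm/px at the nominal 40 cm viewing
# distance, an offset of about -42 px corresponds to roughly -1.4 degrees.
```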
Further, mean horizontal and vertical offsets maintained a consistent direction across the three experimental stages during which the calibration points had been selected. Mean vertical offsets did not vary significantly across experimental stages, Friedman χ²(2) = 2.52, p = 0.28. The experimental stage had a small effect on mean horizontal offsets, Friedman χ²(2) = 14.10, p < 0.001, Kendall’s W = 0.23. Specifically, pairwise comparisons adjusted with Bonferroni correction showed that the mean horizontal offset was less prominent in the initial experimental stage (M = −23.68, SD = 49.67) compared to the second (M = −48.89, SD = 46.35, p = 0.002) and third experimental stages (M = −54.10, SD = 37.27, p = 0.001).
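For readers reproducing this kind of analysis, the Friedman statistic and Kendall's W effect size can be computed directly from the per-stage offsets. A minimal pure-Python sketch (no tie handling; the example data in the test are hypothetical, not the study's offsets):

```python
def friedman_kendalls_w(stages):
    """Friedman chi-square and Kendall's W for k repeated measures.

    stages: list of k lists, each holding one value per participant,
            with participants aligned across lists.
    """
    k, n = len(stages), len(stages[0])
    rank_sums = [0.0] * k
    for i in range(n):
        vals = [stages[j][i] for j in range(k)]
        # Rank this participant's k values (1 = smallest); ties not handled.
        for rank, j in enumerate(sorted(range(k), key=lambda j: vals[j]), start=1):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    w = chi2 / (n * (k - 1))  # Kendall's W: chi-square scaled to [0, 1]
    return chi2, w
```

In practice a statistics package (e.g., `scipy.stats.friedmanchisquare`, which also handles ties) would typically be used; the sketch only makes the computation explicit.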
Saccadic reaction times in response to peripheral targets and dwell times over the face regions were extracted following individual offset correction. At this stage, trial validity was assessed. Only trials in which the infant was (1) looking at the centre of the screen at the offset of the attention grabber and (2) orienting towards the peripheral target between 100 ms and 5 s from its onset, without gazing outside the screen, were considered valid. Five infants who ended up with less than 20% valid data were excluded from further analysis. Out of a total of 807 trials pooled across participants, 444 trials (55.02%) were valid and analysed further. Eighteen trials were also excluded as outliers. Results showed that infant participants detected the peripheral target after an average of 1269 ms (SD = 581 ms). At this time, the moving face target was located at 55° eccentricity. Dwell times over the face area were on average 2568 ms per trial (SD = 1505 ms).
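The trial-inclusion criteria above can be expressed as a simple per-trial filter; the function and field names below are hypothetical, intended only to make the criteria concrete:

```python
def is_valid_trial(at_centre_at_grabber_offset, latency_ms, gazed_off_screen):
    """Trial-inclusion criteria: fixating the screen centre when the
    attention grabber disappears, orienting to the peripheral target
    within 100 ms - 5 s of its onset, and never gazing off screen."""
    return (at_centre_at_grabber_offset
            and not gazed_off_screen
            and 100 <= latency_ms <= 5000)
```

For example, a trial with a latency near the reported mean (1269 ms) passes, whereas an anticipatory look at 50 ms is rejected.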
4. Discussion
The goal of the current investigation was to develop a remote eye-tracking procedure that could successfully address some of the most relevant challenges that researchers face when studying infant participants. First, this method can accommodate head movements in a wide testing environment while measuring gaze in response to stimuli presented at 60° on either side of the centre of a curved monitor. In addition, a simple offline calibration procedure was implemented. This not only improved data quality but was also suitable for non-standard tracking areas and for infant participants who cannot follow instructions.
In this study, data loss due to head movements was limited because multiple cameras were used and other cameras could take over when one camera could not acquire data, making it possible to accommodate head movements across the entire curved monitor, covering a FOV of 126°. Robustness was particularly good for a sample of infant participants. In fact, the current average data loss of 23.97% is not far from the percentage of data loss reported for adults tested under optimal laboratory conditions, which can reach 20% [3] (pp. 166–167). The proportion of data loss included blinks and instances of the system failing to record data due to technical difficulties or to systematic infant behaviours, such as covering their eyes or orienting towards the caregiver. Data loss in the present work showed an improvement compared to the 40% data loss reported by Pratesi and collaborators [14], who also used this eye-tracking system with an infant sample. This improvement may be because, in the current study, target stimulus presentation was triggered by the infant looking at the central attention getter and, thus, the visual presentation progressed only when the infant participant looked at the screen. Not having a gaze-contingent trial presentation could possibly lead to more significant data loss.
The four eye-tracking cameras adopted in the current study kept tracking both eye and head movements within the whole testing environment, covering a large visual area of 126° (although this can potentially be increased to 360° with the use of eight cameras, as reported by the manufacturer). In the present study, the focus was on measuring saccadic reaction times and dwell times across a wide horizontal area, but more locations, including a larger vertical area, could be introduced by adjusting camera placement. During the entire recording, about 28% of gaze data were localised outside the wide screen but, notably, those data were within the working space of our set-up and were still recorded. This value could vary depending on how engaging the infant finds the experimental procedure. In the present study, we found that infants’ attention was sometimes captured by the flash producers below the screen. It seems relevant for future infant studies to carefully consider the position of the flash producers.
Apart from being able to track eye gaze outside a limited headbox, the tracking system also allowed us to monitor the distance between the infants’ head and the screen throughout the testing session. Although parents were instructed to keep infants on their lap at a set distance, there was a high degree of variation in infants’ head distance during the procedure. Importantly, this method can record gaze data across a range of head–monitor distances and accommodate this expected variation in distance, given the nature of this population. Normally, the head component of infant orienting behaviour is not considered in standard eye-tracking procedures. The ability to investigate infants’ orienting behaviour in a wider visual field, where head positions are less restrained, is essential for transitioning from strict laboratory-controlled environments to more naturalistic settings that better represent our everyday experiences. The present work provides some preliminary insights into infant information detection across a wide horizontal visual environment, in which the contributions of both the eye and head components are necessary to successfully detect the target. To the best of our knowledge, a similar Smart Eye eye-tracking system has been used only once in infancy, and with a very small sample size [14]. In the present work, a new software version, specifically designed to recognise the facial proportions and anatomical features typical of infants, was adopted, and its performance was enhanced by implementing an offline calibration procedure.
Here, data quality was also considered, in particular spatial accuracy. Offline data inspection revealed a systematic top-right error in the recorded gaze location compared to the true gaze location (i.e., actual target position). This top-right shift was noticeable in the vast majority of individual data. Overall, the offset direction did not change as the experiment progressed in time. The vertical offset size was stable across experimental stages, while the horizontal offset size increased slightly after the initial stage, only to then remain stable. The average error size found in the present work is comparable to past findings in infancy research, which used both an initial gaze calibration and a calibration verification procedure [25,38]. However, there was some variability between each individual’s average error. Notably, the systematic gaze position error reported here may not be detected in standard calibration displays presented before the start of the experiment and may change following an initial online calibration. For this reason, we proposed an offline calibration procedure to correct for individual offsets and improve data quality. This method has been more widely used in recent years, especially in adult research [26,29,30,31,39,40]. It has been proposed to overcome the effect of individual factors that limit eye-tracking data quality, such as physiological features of the eye or head movements [26,41], and to account for a degradation of calibration over time [42]. In infant studies, this approach is not used as frequently as in adult research (see [24] for a study that both verified and corrected a drift in infant gaze data), even though different researchers have previously raised concerns about the accuracy of infant eye-tracking data [18,19,24,25,43]. In the current study, the implementation of an offline calibration procedure was essential in ensuring that a systematic offset was discovered and corrected, and in overcoming the difficulty of calibrating a non-standard wide tracking environment. Further, standard gaze calibration procedures may not always be accurate in developmental studies because infants do not always fixate on the required calibration points for sufficient amounts of time, and this can result in at least a 1° error in spatial accuracy [1,22].
Improving spatial accuracy and estimating each individual’s offset are particularly important aspects for data interpretation, especially in some experimental designs. As outlined by Aslin [1] and Dalrymple et al. [18], spatial accuracy is extremely relevant if the eye-tracking paradigm relies on whether or not the subject looks at an area of interest (AOI). In fact, poor spatial accuracy can result in gaze points erroneously being recorded as falling outside or inside an AOI, particularly if the AOI is small and/or in close proximity to another. Furthermore, for experimental designs in which different age groups or populations are compared, discrepancies in data quality can potentially produce false differences in the outcome measures, therefore leading to erroneous interpretations [19,23].
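To illustrate the point, a systematic offset of the magnitude corrected in the present study can flip an AOI classification for a small region. The AOI geometry and gaze coordinates below are hypothetical:

```python
def in_aoi(gaze, aoi):
    """True if a gaze point (x, y) falls inside a rectangular AOI (x, y, w, h)."""
    ax, ay, w, h = aoi
    return ax <= gaze[0] <= ax + w and ay <= gaze[1] <= ay + h

# Hypothetical 80 x 80 px face AOI and a fixation just inside its edge.
aoi = (400, 300, 80, 80)
true_gaze = (410, 310)                 # actually inside the AOI
recorded_gaze = (410 - 42, 310 - 45)   # shifted by an offset of the size found here
```

Here the true fixation lands inside the AOI while the uncorrected recorded point falls outside it, so the look would be miscounted without offset correction.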
One additional advantage of implementing an offline calibration procedure is that data are evaluated and corrected throughout the whole experiment, whereas standard calibration only occurs at the beginning of the session and is very rarely repeated during the experiment. Offline calibration therefore enables more accurate data throughout a testing session and improves the validity of eye-tracking investigations in infancy. This approach could be implemented across eye-tracking systems and is not dependent on one particular hardware system (see [24,25] for similar examples with Tobii systems). Overall, an offline calibration procedure should be adapted according to the experimental design and observed data quality. In the present work, we took advantage of the time intervals during which the visual stimuli already included in the experimental procedure were stable on screen. This allowed for six calibration points, spanning different spatial locations and experimental stages. Whether or not all the stable visual stimuli on screen can be used as calibration points depends on data robustness and on whether gaze data are available for a sufficient duration while the target is on screen. When participants are unlikely to attend to the stimuli on screen, or when data quality is low in terms of robustness, additional calibration targets should be included. Notably, stable visual stimuli may not be required in every experimental procedure but should be incorporated specifically if an offline calibration procedure is planned. In this case, the calibration stimuli should ideally cover the entire tracked area. Including a larger number of stable visual stimuli than were available in the current procedure may enable future investigations to establish the ideal number of offline calibration samples needed to obtain the most reliable correction. Overall, even if an offline calibration step is not implemented to correct the offset, we strongly believe it is important not only to report the manufacturer’s accuracy data (albeit usually based on adult data under optimal testing conditions), but also to extract the actual data accuracy and consider its overall effect on data processing and interpretation. Downstream, data accuracy could be used as a potential parameter to exclude individual data [44], or as a guide for setting the size of the AOI [18,45].
One limitation of the current investigation is that the visual targets of the experimental procedure only appeared along the horizontal axis. For this reason, offset coordinates were collected at different eccentricities but were limited to the horizontal axis. A more accurate offline tracking procedure could include more diverse locations on screen, including along the vertical meridian and near the edges of the screen, where spatial accuracy typically decreases. This was not possible with the setup and procedure used in the present experimental paradigm. In general, the offset detected in the current study was rather linear and had a consistent direction throughout the procedure. This enabled us to correct it with a simple offline calibration interface. Different offline calibration procedures may be needed if data quality is more heterogeneous in time and space. Further, although the software used in the present investigation considered the anatomy of the developing eyes and head, we did not take into account individual characteristics of our infant sample (for instance, eye colour, or infant positioning during the procedure as reported by Hessels et al. [43]) that might have influenced the accuracy of this system and, more generally, data quality. More investigations are needed to identify which factors can affect data quality in wide-angle testing environments and with this specific eye-tracking system. Still, we have highlighted the importance of verifying, and where necessary improving, data quality parameters according to the adopted experimental procedure.