1. Introduction
The dynamics of driving a vehicle while performing a non-driving-related task (NDRT) is an area of critical importance for human-factors investigations of crash risk (e.g., [
1,
2,
3]), user interface design (e.g., [
4]), and behavior when automated support systems are engaged (e.g., [
5,
6]). Though several studies have shown an increase in driver distraction during various NDRTs (e.g., [
7]), engagement in NDRTs is relatively common while crashes are still relatively rare [
8]—this apparent paradox can potentially be better understood by looking at the interplay between NDRT behaviors and the conditions under which they occur. There are many factors that may influence drivers’ decisions to engage in NDRTs in real world conditions, including traffic, weather, road type and vehicle speed, which has long been believed to be an important variable [
9]. Speed can influence a driver’s choice of whether to engage in a task at a particular moment, as well as what type of task may be appropriate for that moment. For example, [
10], observing naturalistic engagements with infotainment systems, found that, while 50% of the interactions observed were shorter than 2.2 s, longer interactions tended to occur when vehicles were stationary. Additionally, the task itself may in turn affect the speed at which the driver continues to travel, as well as other aspects of driving performance (e.g., lane position, time-headway).
Morgenstern et al. [11] summarize a host of literature showing that drivers generally increase following distance, reduce maneuvers (such as lane changes), and reduce speed when engaged in NDRTs. Tivesten and Dozza [12] found that drivers adapted their NDRT behavior to driving conditions, such as strategically initiating visual-manual tasks after turn maneuvers. Other studies (e.g., [13]) showed that drivers do not engage in demanding visual-manual tasks in difficult driving contexts, such as bad weather, and do engage in such NDRTs in low-demand driving contexts, such as when stopped at a signalled intersection [14].
In an analysis of naturalistic data from the second Strategic Highway Research Program (SHRP2), Risteska et al. [
15] observed that driving environments do affect NDRT engagements as a whole, especially for older drivers. Examining baseline data (epochs from SHRP2 not containing crashes or near-crashes), the authors observed that increased speed (as well as increased environmental complexity, utilizing variables such as traffic level of service) was associated with fewer NDRTs, as well as shorter off-path glances. Utilizing the European Naturalistic Driving and Riding for Infrastructure and Vehicle Safety and Environment (UDRIVE) data, Ismaeel et al. [
16] looked specifically at NDRT likelihood at intersections and found that NDRTs were more likely to be observed when the vehicle was stationary than when moving, and at intersections controlled by a traffic light than at those controlled by a traffic sign. Both studies suggest that drivers engage in a level of self-regulation based on the driving context. Taken together, these findings point to a dynamic interplay among driving context, driving behavior, and NDRT type.
The relationship between speed and NDRT engagement has been studied principally in terms of the distribution of on- and off-road visual attention. Senders [
17] observed that diverting attention from the forward roadway disrupts driving more at higher speeds than at lower speeds. In manual driving, drivers must maintain moment-to-moment lane position and distance to lead vehicles while also engaging in object and event detection. Because the time available to respond to events shrinks as speed increases, visual attention cannot be diverted from the forward roadway for long. In line with this, interactions that involve greater investment of visual resources, such as traditional visual-manual human–machine interface (HMI) interactions, have been found to have a greater impact on driving speed than voice-initiated interactions [
18] (e.g., there is a tendency for many drivers to slow their speed when engaging in a visual-manual task).
The relationship between visual attention and driving safety is inscribed in several standards documents of in-vehicle HMI design, including those of trade associations (e.g., Alliance of Automobile Manufacturers [
19]; JAMA [
20]), NHTSA’s distracted driving guidelines for in-vehicle interfaces and aftermarket devices ([
21,
22]), and the European Statement of Principles on Human–Machine Interface ([
23]). While the foundations of these guidelines are designed to mitigate the influence of task demands on driving safety, they are somewhat limited in their consideration of factors involved in a driver’s decision to engage in NDRTs in real-world conditions. They are intended to be applied across all contexts, despite driving demand fluctuating wildly. This may lead to employing thresholds that are too permissive in high demand contexts, and too restrictive in low demand contexts. Understanding how drivers tend to safely approach engaging in different NDRTs across different driving contexts is a step toward better understanding what safer attentional patterns look like.
The current study evaluates the relationships between speed and engagement in NDRTs by leveraging real-world driving data. We examined the engagement in an array of typical NDRTs, categorized by their type and modality, across the spectrum of speeds from stationary to free-flow highway driving. We chose to do this for manual driving, despite the increased availability of partial-automation and the growth of research in automation, for two reasons. First, full automation, by some expert accounts, is “a transformation that is going to happen over 30 years and possibly longer [
24]” and it is likely that, for the time being, control of vehicles will remain principally the responsibility of human drivers. Second, even with automation, NDRT engagement will likely affect the readiness of potential drivers (either in the vehicle or remotely) to take over during transfer-of-control requests or silent automation failures. Thus, understanding the propensity of NDRT engagements across driving contexts in baseline manual driving can lay the groundwork for modeling distraction risk across different levels of automation. Different types of NDRTs place different demands on a driver, such as visual (e.g., looking at an instrument cluster), visual-manual (e.g., mobile phone manipulation), or auditory-vocal (e.g., having a hands-free mobile phone conversation), and these types have been studied extensively in terms of distraction and driver crash risk (e.g., [
1]). We hypothesized that the likelihood to engage in visual-manual NDRTs, such as smart phone manipulation and interaction with the center stack, would show greater sensitivity to vehicle speed compared to tasks that mainly rely on voice interactions, such as hands-free phone conversations and voice-based HMI interactions.
The goal of this study was to quantify the relationship between vehicle speed and the likelihood of engaging in NDRTs. Below, we describe the naturalistic study from which the data were drawn; the coding approach used; the analytic approach used; characteristics of the models relating NDRT likelihood to speed; and conclusions and implications for research and designers.
2. Materials and Methods
2.1. Participants and Data Collection
Drivers were recruited from the greater Boston, Massachusetts, area via flyers, social networks, forums, online referrals, and word of mouth. Drivers were screened using background and driving record checks, and were asked about driving habits to ensure that highway driving was a part of their regular commute. Drivers were compensated for their time in the study with the use of a vehicle, one tank of gas, coverage of roadway tolls for the duration of their use of the vehicle, and a small monetary compensation. Twenty participants, evenly balanced by gender and with an average age of 54 years (range 22 to 66 years, SD 14.48 years), comprised the sample considered in this analysis.
This study was part of the ongoing MIT Advanced Vehicle Technology (AVT) naturalistic data collection effort (see [
25] for additional details). Participants were provided with one of two MIT-owned, instrumented vehicle models (a 2016 Range Rover Evoque or a 2017 Volvo S90) to use for one month and drive as they would their own vehicle. Periods during which partial automation (such as adaptive cruise control (ACC) or lane centering) was active were excluded, as behavior in those periods is not generalizable to driving under full manual control, or potentially even to driving with other implementations of the same technologies. Initial analyses indicated that vehicle type did not have a significant association with the measures of interest; accordingly, the following analysis did not include vehicle type. Critical measures were obtained from cabin videos of drivers (coded as described below) and vehicle speed (collected from the controller area network bus (CAN-bus)). Drivers were aware their driving was being recorded, and were told they could request video to be deleted from the dataset.
2.2. Data Coding
Video recorded continuously during driving from two 720p (30 fps) cameras aimed at drivers (faces and seat/cabin area) was used for manual annotation of NDRT activities. Initial video coding was a collaborative, iterative effort in which senior staff and video analysts collectively developed a set of codes and definitions based on a subset of data. Subsequently, all analysts had hands-on training and received feedback from senior staff. Small portions of the data were dual-coded and assessed for inter-rater reliability (IRR), but no formal IRR was computed due to the large amount of data requiring coding. Video analysts identified and annotated NDRT engagement, including start, end, and type of NDRT, for all periods of time when participants were observed manually driving (i.e., without semi-autonomous convenience features) on public roadways. The types of tasks coded were restricted to tasks involving handheld devices (including mobile phone holding, manipulation, handheld and hands-free conversation) and specified vehicle HMI interactions (including center stack, steering wheel button, and voice-based interactions). This focus was based on theoretical relevance (HMI- and phone-related NDRTs remain of principal concern to distraction researchers and legislators alike), and implication relevance (potential implications of this work include HMI design decisions, which are likely best informed by coding both HMI usage and smart phone usage, as both share similarities in technology, task demands, and driver motivations).
Tasks were coded as continuous unless there was a 5 s or greater pause mid-task, in which case tasks were ended at the last touch (for visual-manual tasks, such as phone manipulation, center stack HMI interaction, or steering wheel button HMI interaction) or indication of the end of a phone call (for handheld and hands-free phone conversation tasks). Phone holding included all times that a driver was holding a smart phone, but not using it for a handheld or hands-free conversation, or otherwise interacting with the device (drivers were not observed using phone holder devices in this sample, so all interactions with phones involved some degree of phone holding). Phone manipulation tasks included all types of smart phone interactions, such as browsing, dialing, or texting. HMI interactions included stereo, climate control, navigation, paired phone, or other types of tasks, and were coded by modality of input. Voice-based tasks were coded as spanning the period between the pressing of a steering wheel button associated with voice-task initialization, and the attainment or failure of the functionality associated with the intended voice command (e.g., display of route guidance instructions). All remaining NDRTs were labeled as “Other.” However, because these tasks are of varied modality (e.g., talking to oneself is a primarily auditory-vocal task, while eating is primarily visual-manual), this category was not included in the analyses of task type, but the overall percentage of time spent engaged in these “other” activities is included.
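The continuity rule described above can be sketched as follows. This is a minimal illustration, not the annotation tooling used in the study: the function name `segment_tasks`, the representation of a task as a list of touch timestamps in seconds, and the treatment of a gap of exactly 5 s as a pause are all assumptions made here.

```python
from typing import List, Tuple

PAUSE_THRESHOLD_S = 5.0  # a pause of 5 s or more ends the current task


def segment_tasks(touch_times: List[float],
                  pause_threshold: float = PAUSE_THRESHOLD_S
                  ) -> List[Tuple[float, float]]:
    """Group touch timestamps (seconds) into (start, end) task intervals.

    A new task starts whenever the gap since the previous touch is at
    least `pause_threshold`; the previous task then ends at its last touch.
    """
    segments: List[Tuple[float, float]] = []
    if not touch_times:
        return segments
    times = sorted(touch_times)
    start = prev = times[0]
    for t in times[1:]:
        if t - prev >= pause_threshold:
            # Gap is long enough: close the task at the last touch seen.
            segments.append((start, prev))
            start = t
        prev = t
    segments.append((start, prev))
    return segments
```

Under these assumptions, touches at 0, 2, 4, 10, and 11 s would be split into two tasks, because the 6 s gap between 4 s and 10 s exceeds the threshold.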
Manually coded NDRT data were synchronized with vehicle speed data from the vehicle CAN-bus, which were recorded at 30 Hz. The entire dataset used in this analysis spanned 714 h of driving (about 35 h per participant).
2.3. Analysis Approach
The NDRTs were first examined by overall engagement propensity, operationalized as the probability of engaging in a given NDRT at any given time (Equation (1)), regardless of whether the task occurred alone or alongside other tasks, and regardless of vehicle speed. This is conceptually similar to other studies (e.g., [
16]) that operationalize time spent on an NDRT (e.g., as a percentage of time), but is presented here as a probability for modelling purposes because, in subsequent analyses, the denominator will fluctuate based on speed.
where $P_j$ is the probability of engaging in NDRT $j$, $s_{i,j}$ is the number of seconds driver $i$ was observed engaging in NDRT $j$, $s_i$ is the overall number of seconds driver $i$ drove, and $n$ is the total number of drivers in this study.
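As an illustration of the quantities defined above, the sketch below computes a propensity for one NDRT in two ways, since the definitions admit either reading: a pooled ratio (total task seconds over total driving seconds across all drivers) and a mean of per-driver ratios. Which of these matches Equation (1) is not assumed here. The function names and the dictionary-based input layout are hypothetical.

```python
from typing import Dict


def pooled_propensity(task_seconds: Dict[str, float],
                      driving_seconds: Dict[str, float]) -> float:
    """Sum of s_{i,j} over drivers divided by sum of s_i over drivers."""
    total_driving = sum(driving_seconds.values())
    if total_driving <= 0:
        raise ValueError("total driving time must be positive")
    # Drivers never observed in the task contribute zero task seconds.
    total_task = sum(task_seconds.get(d, 0.0) for d in driving_seconds)
    return total_task / total_driving


def mean_driver_propensity(task_seconds: Dict[str, float],
                           driving_seconds: Dict[str, float]) -> float:
    """Average over drivers of the per-driver ratio s_{i,j} / s_i."""
    ratios = [task_seconds.get(d, 0.0) / s
              for d, s in driving_seconds.items() if s > 0]
    if not ratios:
        raise ValueError("no driver has positive driving time")
    return sum(ratios) / len(ratios)
```

The two forms differ whenever drivers contribute unequal amounts of driving time: the pooled ratio weights each second equally, while the mean of ratios weights each driver equally.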
To evaluate whether NDRTs were performed differentially based on vehicle speed, a linear mixed-effects model with subject as a random factor was computed, regressing NDRT likelihood against speed, NDRT type, and the interaction between these two factors. We fit one univariate model for the smart phone NDRTs and one for the embedded vehicle NDRTs.
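One way to write a model of this kind is sketched below. The symbols are introduced here for illustration only, and the random-intercept-only structure is an assumption, since the text specifies only that subject was a random factor.

```latex
P_{i,j,k} = \beta_0 + \beta_1 v_k + \gamma_j + \delta_j v_k + u_i + \varepsilon_{i,j,k},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{i,j,k} \sim \mathcal{N}(0, \sigma^2)
```

Here $P_{i,j,k}$ is the propensity of driver $i$ to engage in NDRT $j$ at speed level $k$, $v_k$ is the speed, $\gamma_j$ is the effect of NDRT type, $\delta_j$ is the speed-by-type interaction, and $u_i$ is a per-driver random intercept.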
Data were aggregated by participant and vehicle speed and grouped into five-mph bins. Speed bins ranged from 0 mph to 75 mph (preliminary analyses showed that driving above 75 mph was rare in this dataset and many participants had no observed driving above this speed). This yielded a table of 16 speed bins (zero, above zero to 5 mph, above 5 to 10 mph, up to above 70 to 75 mph, exclusive for each lower bound and inclusive for each upper bound) for each of the 20 participants in the study, with a propensity score for each of the seven primary NDRT categories (four smartphone-based NDRTs combined with three different modality vehicle HMI-based NDRTs). Propensities were computed similarly to Equation (1), with the number of seconds of observed NDRT engagement for each participant for each speed bin being divided by the total number of seconds of driving in that speed bin for that driver. Separate models assessing effects of gender and age did not yield significant effects for either, possibly due to small sample sizes within each category.
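The binning scheme described above can be sketched as follows. The function names, the representation of the data as `(speed_mph, engaged)` samples for a single driver and a single NDRT, and the decision to drop negative speeds are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

BIN_WIDTH_MPH = 5
MAX_SPEED_MPH = 75


def speed_bin(speed_mph: float) -> Optional[int]:
    """Map a speed to one of 16 bins.

    Bin 0 is exactly 0 mph; bin k (1..15) covers (5*(k-1), 5*k] mph,
    i.e. exclusive lower bound and inclusive upper bound.
    Speeds above 75 mph (or negative) return None and are dropped.
    """
    if speed_mph < 0 or speed_mph > MAX_SPEED_MPH:
        return None
    if speed_mph == 0:
        return 0
    return math.ceil(speed_mph / BIN_WIDTH_MPH)


def binned_propensity(samples: Iterable[Tuple[float, bool]],
                      sample_period_s: float = 1.0) -> Dict[int, float]:
    """Per-bin propensity for one driver and one NDRT.

    Each sample is (speed_mph, engaged). The propensity in a bin is the
    engaged time in that bin divided by the total driving time in that bin.
    """
    engaged_s: Dict[int, float] = defaultdict(float)
    total_s: Dict[int, float] = defaultdict(float)
    for speed, engaged in samples:
        b = speed_bin(speed)
        if b is None:
            continue
        total_s[b] += sample_period_s
        if engaged:
            engaged_s[b] += sample_period_s
    # Only bins in which the driver was actually observed are returned.
    return {b: engaged_s[b] / total_s[b] for b in total_s}
```

Bins with no observed driving are omitted rather than reported as zero, so that a missing bin is not confused with a bin in which the driver never engaged in the task.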
After visualizing and modeling linear effects, we also explored curvilinear relationships between NDRT propensity and speed, to develop stronger models of NDRT propensity and better reflect driver behavior. These were assessed on a task-by-task basis, comparing the goodness of fit of the linear model with the curvilinear alternative, and then re-modeling and plotting using the curvilinear transformation.
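A simplified version of this comparison can be sketched as below. It is not the mixed-effects procedure used in the study: it ignores the per-driver random effect and uses ordinary least squares on bin-level propensities. The curvilinear form, propensity as a linear function of `exp(-speed / tau)`, and the fixed value of `tau` are assumptions chosen only to illustrate comparing goodness of fit between a linear and a transformed predictor.

```python
import math
from typing import Callable, Sequence, Tuple


def least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def exp_decay(tau: float) -> Callable[[float], float]:
    """Hypothetical curvilinear transform of speed: v -> exp(-v / tau)."""
    return lambda v: math.exp(-v / tau)


def r_squared(x: Sequence[float], y: Sequence[float],
              transform: Callable[[float], float] = lambda v: v) -> float:
    """R^2 of the fit y = a + b*transform(x)."""
    tx = [transform(v) for v in x]
    a, b = least_squares(tx, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(tx, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Calling `r_squared(speeds, propensities)` and `r_squared(speeds, propensities, exp_decay(tau))` on the same bin-level data gives two values that can be compared directly. In practice `tau` would need to be estimated rather than fixed, and a criterion that accounts for model complexity could be preferable to raw R².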
4. Discussion
The relationship between vehicle speed and NDRT engagement was examined, revealing that the probabilities of engaging in NDRTs follow various speed-dependent patterns. Phone holding was the most common activity (of the set examined), although considerable between-driver variability was present. Phone holding, phone manipulation, and center stack HMI interaction were strongly associated with vehicle speed. Phone holding showed a linear relationship, suggesting that drivers do not immediately stop phone holding as vehicle speed rises above a certain threshold, or vice versa. In contrast, a reciprocal exponential function provided a better fit when modeling the changes in the probability of engaging in phone manipulation and center stack interactions across speed bins, suggesting that at speeds above 10 mph these activities become exceedingly rare.
This finding implies that the association between NDRT engagement and vehicle speed follows different patterns depending on the NDRT. Specifically, for NDRTs with strong voice-involved components (as opposed to significant manual-manipulation components), there were no speed dependent variations in the probability to engage. Thus, these NDRTs were as likely to be observed with the vehicle at standstill as they were in free-flowing highway traffic, and all speeds in-between. This was true for both handheld and hands-free phone conversations, as well as HMI interactions via steering wheel buttons and voice commands.
We hypothesize that the relationships between speed and NDRT engagement are traceable to one or more classes of driver behavior (as identified by [
11]) including: 1. self-regulation, the practice of foregoing engagement in a desired behavior because conditions are understood to carry risk; 2. compensatory behavior, the adjustment of one behavior in order to engage in another (e.g., slowing down to text); 3. workload shedding, the removal of elements of driving or NDRT demand in order to accomplish very difficult tasks (e.g., reducing speed modulation in order to text); and 4. strategic engagement and management of tasks, the choice to engage in NDRTs when it is believed the risk of doing so is relatively low, given the driving context.
Accordingly, it is plausible that drivers engaging in NDRTs such as phone manipulation and center stack interactions are self-regulating or attempting to strategically manage task initiation by engaging in these tasks at lower speeds. These findings may reflect, in part, a tendency observed in experimental evaluation studies of similar task interactions (e.g., [
18]) where compensatory behavior (shedding of speed) is observed when research participants are asked to engage with similar tasks. These potentially complementary findings suggest the importance of conducting observation under naturalistic conditions to fully place in context what is observed under experimental manipulation conditions.
The NDRT patterns observed could be seen as an indication of the amount of attentional capacity drivers had, or believed that they had, available based on the driving demands. As such, the patterns could support models of driver attention where attentional distribution should fit the demands of the situation. For example, Minimum Required Attention theory (MiRA [
26]) posits that a driver is attentive if they are sampling sufficient information from the environment, to maintain a good representation of the environment, and should not be considered inattentive if this sampling is sufficient, even if the driver is engaged in an NDRT. The drivers in this study were frequently engaged in NDRTs, but this engagement appeared to depend, at least in part, on context, as operationalized by speed.
One potential application of these models is to adapt feature availability and modes of operation of HMIs based on driving demand. While the usage trends observed in this study hold in aggregate (for this participant sample), individual differences (both at the participant and trip-level) occur—instances where participants engage in NDRTs at ranges of speed where they are low probability events across the sample. By adapting HMIs to fit normative NDRT engagement patterns—for example, by adjusting limits on visual and manual demand in speed contexts where most drivers do not typically engage in such NDRTs—designers can encourage HMI usage that fits typical driver ability to manage task demands. This may be especially useful for novice drivers, who are less likely to be strategic in their engagement of NDRTs [
27].
Though speed is an important factor in considering attention needs, it is of course just one of many factors that influence the immediacy, amount, and other characteristics of how attention is optimally distributed. While, all other things being equal, the relative risk of a sudden conflict tends to be higher at higher speeds, driving at high speed under semi-automated assisted driving on an uncrowded freeway in good weather is likely to have a very different risk profile than manually driving at relatively low speed on a crowded urban street in heavy rain, as would driving at similar speeds through road-types as varied as intersections, roundabouts [
28], or alternative geometries (e.g., [
29]).
Adaptive HMIs, based on drivers' real-time attentional needs, are an area of growing interest. Researchers and practitioners see real-time adaptive HMIs as an approach, in parallel with greater understanding of the driver’s state, that can be used to mitigate risks associated with NDRTs and curb less strategic NDRT usage (e.g., smartphones vs. embedded vehicle systems). Applications based upon this growing area of exploration, however, are limited by current voluntary guidelines (e.g., [
21]) which include per se lockouts and certain other limits in a manner that is independent of driving context, and thus preclude application of the findings of this work and related research. Automation and other driver assistance systems that further augment the driver’s role may amplify the need for adaptive limits based upon context.
There are several limitations to this analysis. While the drivers could engage in NDRTs where they chose to, on roads they chose to drive on, they were driving assigned vehicles, not their own. Consequently, it is plausible that, for most participants, the vehicle HMI was less familiar than that in their own vehicles, which may have affected their likelihood of engaging in certain tasks or using certain modalities (such as voice-based tasks). While “Other” NDRTs were coded, they were too varied in demand characteristics to be meaningfully modeled in this analysis. Future research might benefit from considering a wider range of NDRT categories, through a more detailed annotation of the “Other” category, to better capture the diversity of NDRT types and their prevalence. As computer-vision-based approaches to recognizing driver behaviors improve [
30], coding the hundreds of hours of trip-level data from a project such as this at a higher degree of resolution will become possible with fewer resources. Additionally, because NDRTs were sometimes performed simultaneously (as shown in Table 1, nearly 3% of the driving time observed contained multiple simultaneous NDRTs), considering the additional demand of added NDRTs in the modeling effort could prove fruitful. Each moment of NDRT engagement was also weighted equally, while it is likely that NDRTs contribute more potential distraction the longer they last; future approaches could weight NDRT engagement time points by how long a driver has persisted in the activity (perhaps using an algorithm like AttenD [
31] to score engagement over time). Although some relationships between speed and NDRT likelihood were observed to be statistically significant despite the small sample, it is possible that other relationships could not be identified. This may especially be true for NDRTs observed less frequently. Manual coding of NDRT behavior at the trip level from cabin video is quite laborious, so increasing sample size is a non-trivial problem; nevertheless, identifying universal relationships between speed and NDRT likelihood would benefit from a larger sample of participants. In addition, drivers were aware they were in a study and were aware they were being recorded, which may have limited engagement in NDRTs, especially unsafe or illegal NDRTs. While this is also true for naturalistic studies that have evaluated the relationships between NDRTs and crash risk [
8], it remains a limitation of instrumented-vehicle-based driving research. Finally, further consideration of the influence of environmental conditions and driver support features (e.g., ACC, SAE L2, etc.) is a logical next step.