1. Introduction
Roadway infrastructure is critical for moving an ever-increasing number of people and goods. There are over 4 million miles of public roadways in the United States [1]. In 2023, there were over 3.2 trillion vehicle miles traveled in the country [2], a 13% increase from 2003 [3]. It is estimated that each year, motorists pay over $1000 in wasted time and fuel while traveling the transportation networks [1]. Therefore, it is important for transportation agencies to actively monitor the performance of their managed mobility infrastructure to identify improvement opportunities and determine the best allocation of limited funds.
Interstates, US Routes, State Routes, and arterials serve most of the traffic demand [2]. Several state agencies liaise with Traffic Management Centers (TMCs) to monitor these types of roadways and coordinate incident response [4]. Historically, a network of roadside sensors and intelligent transportation system (ITS) cameras installed statewide, as well as segment-based crowdsourced speed and travel time data, have been used to assess prevailing traffic conditions and identify challenges [5,6]. However, sensors and ITS cameras only provide location-specific information, require regular maintenance, and are difficult to observe on a large scale. Furthermore, crowdsourced probe vehicle speeds and travel time data are usually aggregated, which complicates detailed analyses of traffic conditions that often require granular information.
Roadway intersections are another critical component of transportation networks. There are over 400,000 signalized intersections in the United States, which are estimated to contribute up to 10% of all traffic delay on the National Highway System [7]. A properly managed traffic signal can reduce congestion, enhance mobility, and decrease delays and the number of vehicle stops [8,9]. Over the last two decades, agencies have implemented Automated Traffic Signal Performance Measures (ATSPMs) to proactively assess intersection operations. ATSPMs are visualizations and tools derived from traffic signal controller high-resolution (tenth-of-a-second) output and detector data [10,11]. Actionable insights are derived thanks to the data’s high reporting frequency [12]; however, ATSPMs are difficult to scale, and the estimation of performance measures is sensitive to traffic conditions [13] and detector configuration [14].
In recent years, commercial connected vehicle (CV) trajectory data has emerged as an alternative dataset to actively monitor roadway and intersection performance [13,15,16,17]. The main benefit of this dataset is that it allows for the accurate estimation of traffic conditions at a variety of levels, ranging from the analysis of localized intersection movements [13] to nationwide mobility [18].
1.1. Connected Vehicle Trajectory Data
It is anticipated that in 2025, 470 million CVs will be in operation in the US, Europe, and China [19]. A study of perhaps the largest provider of CV data in 2022 reported that, on average, one in every twenty vehicles in the United States provided telematics-based CV data through a data broker that could be used to estimate interstate and arterial performance measures [20].
1.1.1. Description
Crowdsourced CV trajectory data consist of sets of waypoints that describe the journey that equipped vehicles undertake as they traverse the roadways. The waypoint reporting interval for the same vehicle is usually on the order of a few seconds, and the spatial accuracy is usually on the order of 2–3 m. Every waypoint has the following descriptive information attached: latitude, longitude, timestamp, speed, heading, and an anonymous trip identifier. By chronologically linking individual waypoints with the same trip identifier, the journey of a vehicle can be obtained.
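The linking step described above can be illustrated with a minimal sketch. The field names, units, and heading convention in `Waypoint` are assumptions made for illustration and do not reflect any provider's actual schema.

```python
from collections import defaultdict
from typing import NamedTuple


class Waypoint(NamedTuple):
    trip_id: str      # anonymous trip identifier
    timestamp: float  # seconds since epoch (assumed unit)
    lat: float
    lon: float
    speed: float      # assumed to be in mph
    heading: float    # assumed degrees clockwise from north


def build_trajectories(waypoints):
    """Group waypoints by trip identifier and sort each group by time."""
    grouped = defaultdict(list)
    for wp in waypoints:
        grouped[wp.trip_id].append(wp)
    return {
        trip_id: sorted(points, key=lambda wp: wp.timestamp)
        for trip_id, points in grouped.items()
    }


# Hypothetical, unordered sample records
sample = [
    Waypoint("a", 3.0, 39.7702, -86.1581, 30.0, 90.0),
    Waypoint("b", 1.0, 39.9000, -86.0000, 55.0, 180.0),
    Waypoint("a", 0.0, 39.7700, -86.1590, 28.0, 90.0),
]
trajectories = build_trajectories(sample)
```

Each value in `trajectories` is a time-ordered list of waypoints that represents one vehicle journey.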
1.1.2. Applications in Transportation
Commercial CV trajectory data provide accurate information on vehicles’ journeys at virtually any scale. This characteristic makes them a good candidate for a variety of transportation studies.
Sakhare et al. have leveraged the CV dataset to measure and visualize freeway conditions, evaluate incident response, and assess work zone performance [15]. State transportation agencies, such as the Indiana Department of Transportation (INDOT), use CV data, in conjunction with other ITS assets, to monitor roadway performance and safety [21].
CV trajectory data have also been used to evaluate intersections. Various techniques have been developed to derive traffic signal performance measures [13,16,17] with the objective of identifying challenges and signal retiming opportunities [13]. Additionally, CV-derived traffic signal, roundabout, and stop-controlled intersection performance measures have been used to locate statewide capital investment opportunities [22], helping agencies make data-driven investment decisions.
Other studies have used CV trajectory data for a wide range of purposes. Desai et al. used CV data to assess electric vehicle (EV) usage and charging infrastructure [23]. Alsahfi et al. developed an algorithm that can create and update road maps and identify their characteristics from vehicle trajectories [24]. Further research has focused on the estimation of infrastructure characteristics from CV data, such as traffic volumes [25], vehicle miles traveled [26], roadway speeds [27], and traffic signal timing [28,29].
1.2. Motivation and Objective
As the commercial CV data industry matures, the dataset characteristics also evolve with the objective of ensuring the privacy of motorists that anonymously provide their journey information while attempting to maintain the scale and granularity needed for transportation studies. One such change in the CV dataset that occurred in 2024 is the fuzzification of trajectory waypoints. Waypoint fuzzification entails the distortion of selected records to protect sensitive information in a manner that attempts to minimize information loss.
The fuzzification approach implemented in a current CV dataset truncates latitude and longitude coordinates to two decimal places (location blurring) when vehicles are located within 0.5 mi of frequently visited locations. Furthermore, when a waypoint is fuzzified, speed and heading values are not available.
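To make the described behavior concrete, the following is a minimal sketch of such a filter, not the data provider's implementation. How frequently visited locations are identified is not stated in this paper, so `sensitive_locations` is a hypothetical input; truncation toward zero and the use of great-circle distance are also assumptions.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def truncate(value, places=2):
    """Drop digits beyond `places` decimals (assumed truncation toward zero)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def fuzzify(waypoint, sensitive_locations, radius_mi=0.5):
    """Return a copy of `waypoint` (a dict), blurred if near a sensitive location."""
    near = any(
        haversine_mi(waypoint["lat"], waypoint["lon"], lat, lon) <= radius_mi
        for lat, lon in sensitive_locations
    )
    if not near:
        return dict(waypoint, fuzzified=False)
    # Location blurring: truncated coordinates, no speed or heading
    return dict(
        waypoint,
        lat=truncate(waypoint["lat"]),
        lon=truncate(waypoint["lon"]),
        speed=None,
        heading=None,
        fuzzified=True,
    )
```

Because 0.01° of latitude spans roughly 1.1 km, a truncated coordinate no longer identifies the road segment or the intersection approach on which the vehicle was traveling.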
Qualitatively, the impact that this fuzzification has on data availability and distribution is shown in Figure 1. Figure 1 compares 10 min of data collected on the same day-of-week (DOW) and time-of-day (TOD) during the second week of May between a historic 2023 dataset with non-fuzzified waypoints (Figure 1a) and a current 2024 dataset with fuzzified waypoints (Figure 1b). It is important to note that, in addition to the fuzzification difference, the historic dataset comprises more OEM fleets than the current dataset, which is by itself expected to affect the level of representativeness.
In Figure 1, the overall waypoint sample size decreased 27% from the historic to the current dataset. Of all waypoints available for the region shown in Figure 1b, 6% are fuzzified (callout i shows the location of the truncated GPS coordinates), and the rest are available for analysis. Areas on and between ramps (RA), signalized intersections (S), and roundabouts (RO) saw fewer sampled waypoints. In particular, RA1 and S1 show a noticeable decrease in traversing vehicles.
As transportation agencies, the private sector, and academia continue to use and invest in CV trajectory data, it is important to assess the impact that privacy filters (i.e., fuzzification) and fleet changes may have on derived studies. Since no previous study has provided such an analysis, the objective of this study is threefold:
Evaluate the current CV market penetration rate (MPR) and compare it to previous years’ estimations.
Assess the impact of privacy filters and fleet changes on interstate, US Route, and State Route coverage.
Evaluate the change in available vehicle trajectories for analysis by movement at traffic signals, roundabouts, and all-way stops.
These analyses provide stakeholders with insights on the data representativeness changes and possible effects on related studies. All assessments are conducted using statewide Indiana CV trajectory data.
2. Market Penetration Rates
The MPR provides agencies with a key metric to answer how representative the data are of the actual traffic and is essential for building confidence in the data. The MPR is the estimated percentage of the vehicles on the roadways that provide their trajectory information.
The MPR of CV data with fuzzified records was evaluated over a week from 5–11 May 2024. The actual traffic volume information was collected from INDOT’s count stations. A majority of the count stations in Indiana use loop detectors [30,31,32] to count and classify vehicles. A total of 28 count stations that were operational during the entirety of the same week, as shown in Figure 2, were chosen for analysis. Of the 28 stations, 10 were along interstates and the remaining 18 were along non-interstate roadways that cover various geographies in Indiana.
A virtual box a quarter-mile long and as wide as the road width was created around every count station. Unique journey identifiers were counted within this box and assumed as the trajectory counts for the CV data. Heading information from individual waypoints from the CV data was used as a filter to exclude journeys along a different route and direction. Since the GPS coordinates of fuzzified records are truncated, they were excluded from the MPR analysis. The MPR for a count station was calculated using the following equation:

$$\mathrm{MPR}_n = \frac{T_n}{V_n} \times 100\% \quad (1)$$

where $\mathrm{MPR}_n$ is the MPR of count station $n$, $T_n$ is the number of unique trajectories from the CV data within the quarter-mile-long bounding box near count station $n$, and $V_n$ is the volume of vehicles from the same count station over the same time period.
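The station-level calculation described above (unique CV trajectories near a station divided by the counted volume) could be sketched as follows. The axis-aligned box, the dictionary keys, and the 30° heading tolerance are simplifying assumptions for illustration; the study's box follows the roadway geometry.

```python
def heading_difference(a, b):
    """Smallest absolute angle between two headings, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def count_trajectories(waypoints, box, road_heading, tolerance_deg=30.0):
    """Count unique journey identifiers inside a bounding box.

    box is (min_lat, min_lon, max_lat, max_lon); tolerance_deg is a
    hypothetical threshold for matching the direction of travel.
    """
    min_lat, min_lon, max_lat, max_lon = box
    journeys = set()
    for wp in waypoints:
        if wp["heading"] is None:
            continue  # fuzzified records carry no heading and are excluded
        inside = (min_lat <= wp["lat"] <= max_lat
                  and min_lon <= wp["lon"] <= max_lon)
        aligned = heading_difference(wp["heading"], road_heading) <= tolerance_deg
        if inside and aligned:
            journeys.add(wp["journey_id"])
    return len(journeys)


def station_mpr(trajectory_count, station_volume):
    """MPR in percent: unique CV trajectories over counted vehicle volume."""
    if station_volume <= 0:
        raise ValueError("station volume must be positive")
    return 100.0 * trajectory_count / station_volume
```

Counting identifiers in a set ensures that a journey reporting several waypoints inside the box is counted only once.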
Figure 3 and Figure 4 show vehicle volumes, trajectories from CV data, and MPR by count station along interstate and non-interstate roadways. At the 10 interstate locations, the vehicle volume ranged from 0.11 to 0.61 million over a week. During the same week, the identified CV trajectories ranged from 5.1 to 25.4 thousand. The MPR ranged from 3.5% to 6%. At the 18 non-interstate locations, the vehicle volume ranged from 0.007 to 0.53 million and the identified CV trajectories from 0.28 to 19 thousand. The MPR ranged from 2.4% to 10.2%.
The reported overall MPR for interstate and non-interstate is given by Equation (2):

$$\mathrm{MPR}_r = \frac{\sum_{n \in R} T_n}{\sum_{n \in R} V_n} \times 100\% \quad (2)$$

where $r \in \{\text{interstate}, \text{non-interstate}\}$, $R$ is the list of count stations along $r$, $T_n$ and $V_n$ are the CV trajectory count and vehicle volume at count station $n$, and $\mathrm{MPR}_r$ is the overall MPR of the CV data. Due to the variation in volume at select count stations, total CV trajectories and traffic volumes across the analysis period were considered for the overall MPR calculations. This metric also remains comparable across years as the available stations and the number of analysis days change. In 2024, the overall MPR on interstates was reported as 4.6% and on non-interstates it was 5%.
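The difference between pooling totals and averaging station-level rates can be seen in a short sketch. The station values below are hypothetical and chosen only to illustrate the arithmetic.

```python
def overall_mpr(stations):
    """Overall MPR in percent from (trajectory_count, volume) pairs.

    Totals are pooled before dividing, so high-volume stations weigh
    more than low-volume ones.
    """
    total_trajectories = sum(t for t, _ in stations)
    total_volume = sum(v for _, v in stations)
    return 100.0 * total_trajectories / total_volume


# Hypothetical stations: (unique CV trajectories, counted vehicles)
stations = [(5_000, 100_000), (300, 10_000)]

pooled = overall_mpr(stations)
simple_mean = sum(100.0 * t / v for t, v in stations) / len(stations)
```

Here the station-level rates are 5% and 3%, so their simple mean is 4%, while the pooled value is closer to the rate of the busier station.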
MPR Comparison with Previous Years
The 2024 MPR was compared to reported values in previous studies from 2020 to 2023 [20,33].
Table 1 presents a summary of the MPR analysis across the five years. The number of days analyzed during each year and the number of count stations evaluated varied depending upon the availability of the data. However, a minimum of a contiguous one-week period and at least seven count stations by road type were assessed during any of the years. Analysis was conducted during the month of August in 2020 and 2021 and in May for 2022, 2023, and 2024.
The MPR increased every year from 2020 to 2023 along both interstate and non-interstate roadways. Interstate MPR increased from 4.4% to 5.1%, whereas non-interstate MPR increased from 4.6% to 5.3% during this period. The increase may be due to the adoption of newer CVs and/or changes in commercial arrangements in the data supply chain. In 2024, the MPR dropped 0.5 percentage points on interstates (to 4.6%) and 0.3 percentage points on non-interstates (to 5%). The reduction in MPR is possibly due to the fuzzified records, a reduced fleet, and/or a change in the data supply landscape.
Due to changes in operating conditions, maintenance issues, or work zone activity, not all count stations are available across the five years. For comparison of MPR across the years at a count station level, 12 common count station locations were compared from 2021 to 2024. The details of these 12 count stations are shown in Table 2. Highway or roadway details with approximate mile marker (MM) information are also provided in Table 2.
Figure 5 shows MPR values for each of the common 12 count stations from 2021 to 2024, colored by the roadway type. The dotted line represents overall MPR calculated using Equation (2) across all the available stations during that year. The relative position of the count stations remained the same over the years.
3. Interstate, US Route, and State Route Coverage
Observing changes in CV data representativeness on interstates, US Routes, and State Routes is vital for agencies, as these road networks make up a majority of the roadway infrastructure maintained by a state. Knowing the representativeness of CV data will help identify opportunities for utilizing this CV data for continuous roadway mobility monitoring statewide, especially in locations with no existing sensor infrastructure. A total of twelve interstates (pink), four US Routes (blue), and three State Routes (orange) were analyzed in this section, as shown in Figure 6. Interstate 80 (I-80) was excluded from the analysis owing to its full concurrency in the state of Indiana with routes I-94 and I-90.
Each route was divided into tenth-of-a-mile segments (0.1 miles) for the analysis and for consistent comparisons between multiple years of data. A corresponding geospatial polygon for each such 0.1-mile segment was created, and CV data were matched to these segments to determine the exact mile-marker location along a route through which a CV waypoint passed. Following this geospatial matching process, distinct counts of CV journeys (essentially unique journey identifiers) passing through each 0.1-mile segment were computed for two analysis weeks—namely, 7–13 May 2023 (historic, non-fuzzified), and 5–11 May 2024 (current, fuzzified). The 2024 dataset contains fuzzified records that cannot be considered because their GPS coordinates have been truncated and their geospatial representation does not indicate their actual location. Furthermore, their lack of heading information makes it difficult to assign them to specific directions of travel, and the lack of speed information makes them lose significant value for various freeway studies that rely on this characteristic.
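The counting step described above can be sketched as follows. This sketch assumes that each waypoint has already been matched to a route and a mile marker (linear referencing), which stands in for the polygon-based geospatial matching used in the study; the tuple layout is hypothetical.

```python
import math
from collections import defaultdict

SEGMENT_LENGTH_MI = 0.1


def segment_index(mile_marker):
    """Index of the 0.1-mile segment that contains a mile marker."""
    # The small epsilon guards against floating-point results such as 122.999...
    return math.floor(mile_marker / SEGMENT_LENGTH_MI + 1e-9)


def journeys_per_segment(matched_waypoints):
    """Distinct journey counts per (route, segment index).

    matched_waypoints: iterable of (route, mile_marker, journey_id, fuzzified).
    """
    journeys = defaultdict(set)
    for route, mile_marker, journey_id, fuzzified in matched_waypoints:
        if fuzzified:
            continue  # truncated coordinates cannot be placed on a segment
        journeys[(route, segment_index(mile_marker))].add(journey_id)
    return {segment: len(ids) for segment, ids in journeys.items()}
```

Running the same function on each analysis week yields two dictionaries keyed by segment that can be compared directly.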
The study location for this section is represented by a total of 54,284 0.1-mile segments of roadway in Indiana, with 26,838 of them being on interstates, 17,640 on selected US Routes, and 9806 on selected State Routes.
Table 3 shows that nearly 22% of interstate segments, 41% of US Route segments, and 55% of State Route segments observed an increase in CV journeys from 2023 to 2024. A very small percentage of segments showed no change, while an even lower percentage of segments could not be directly compared due to missing data in either of the analysis years.
Figure 7 shows a box-and-whisker diagram of the network level percentage and absolute CV journey changes observed for the three types of routes analyzed. General trends show that the median percentage and absolute change in CV journeys for interstates is the highest, followed by slightly lower changes in US Routes and a median change of nearly 0% on State Route segments.
A cumulative frequency distribution (CFD) of all 0.1-mile segments and their percentage and absolute changes in CV journeys between the two years is shown in Figure 8a and Figure 8b, respectively. Median values for percentage change in journeys range from −4.95% for interstates to −1.96% for US Routes to +0.98% for State Routes. Correspondingly, median values for absolute change in journeys range from −289 for interstates to −37 for US Routes to +10 for State Routes. A number of segments showed reductions in CV journeys of more than 2000, possibly due to construction-related road closures significantly dropping volumes over those segments. Similarly, a number of segments showing increases in CV journeys of more than 2000 were a result of construction work being completed and converting an arterial into an interstate, thus leading to a significant rise in CV traffic passing through that segment.
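A minimal sketch of how segment-level changes, their median, and an empirical CFD could be computed is given below. Skipping segments with a historic count of zero is an assumption, and the sample counts are hypothetical.

```python
from statistics import median


def journey_changes(historic, current):
    """Absolute and percentage change for segments present in both years.

    Segments missing from either dataset cannot be compared and are skipped;
    segments with a historic count of zero are skipped as an assumption,
    since their percentage change is undefined.
    """
    changes = {}
    for segment in historic.keys() & current.keys():
        h, c = historic[segment], current[segment]
        if h == 0:
            continue
        changes[segment] = (c - h, 100.0 * (c - h) / h)
    return changes


def empirical_cfd(values):
    """Sorted (value, cumulative fraction) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Hypothetical weekly journey counts keyed by segment index
historic = {1: 100, 2: 200, 3: 50, 4: 10}
current = {1: 90, 2: 210, 3: 50, 5: 7}

changes = journey_changes(historic, current)
pct_changes = [pct for _, pct in changes.values()]
median_pct = median(pct_changes)
cfd = empirical_cfd(pct_changes)
```

Segments 4 and 5 appear in only one of the two datasets and therefore do not contribute to the distribution.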
Figure 9 documents percentage and absolute changes in CV journeys on the three types of routes as a Pareto-sorted column plot with each 0.1-mile segment represented. As evidenced by the zone of no-change gradually moving from left to right for interstates, US Routes, and State Routes, the largest share of segments with a decrease in journeys is seen among interstates (78.2%), while the smallest share is seen among State Routes (44.8%).
Figure 10 shows a map-based visualization of the change in the number of CV journeys across the three types of routes. To remove outliers or minor changes, any segments with percentage changes in CV journeys between −5 and 5% were ignored in Figure 10a–c and any segments with absolute changes in CV journeys between −100 and 100 were removed in Figure 10d–f.
Figure 10a shows that a majority of rural interstate segments showed a decrease in journeys (−50% to −5%). Some segments at the Indiana–Ohio border on I-90 and I-74 showed increases in journeys, which may be attributable to construction projects in 2023 that were completed in 2024, leading to higher volumes, or to off-interstate construction projects diverting additional traffic onto the interstate. Segments highlighted in red in Figure 10d–f near the Indianapolis region in central Indiana are largely attributable to a construction project in the northeast corner of Indianapolis that resulted in reduced or otherwise rerouted traffic through the area.
In general, the slightly lower MPR as documented in the preceding section, coupled with a smaller fleet of OEMs being represented in the CV data and the associated fuzzification can together be assumed to cause the reduction in observed CV journeys in 2024 compared to 2023 for interstate, US Route, and State Route segments. These visuals will be vital for agencies and practitioners in evaluating the usability of this novel form of CV data and identifying any significant changes that may bias year-by-year comparisons with CV data from 2020 to 2023.
4. Intersection Coverage
This section compares the number of vehicle trajectories available for analysis at 2827 signalized intersections, 158 roundabouts, and 304 all-way stops in Indiana between the historic 7–13 May 2023 dataset without fuzzified waypoints and the current 5–11 May 2024 CV dataset with fuzzified waypoints.
For a vehicle trajectory to be available for performance analysis, its movement at the intersection, that is, its direction of travel (i.e., northbound, eastbound, southbound, and westbound) and its turn type (i.e., right, through, and left), need to be identified. Therefore, the trajectory of each sampled vehicle contained in the historic and current CV datasets near the analyzed intersections was analyzed and, if possible, assigned an intersection movement [13].
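One simple way to assign a movement from waypoint headings is sketched below. This is a heading-based heuristic written for illustration, not the assignment method of [13]; the 45° through tolerance and the heading convention (degrees clockwise from north) are assumptions.

```python
def direction_of_travel(heading):
    """Map a heading (degrees clockwise from north) to a direction of travel."""
    names = ["northbound", "eastbound", "southbound", "westbound"]
    return names[int(((heading % 360.0) + 45.0) // 90.0) % 4]


def turn_type(entry_heading, exit_heading, through_tolerance=45.0):
    """Classify a turn from the signed heading change across the intersection."""
    # Signed change in [-180, 180); positive values are clockwise (right turns)
    delta = (exit_heading - entry_heading + 180.0) % 360.0 - 180.0
    if abs(delta) <= through_tolerance:
        return "through"
    # In this sketch, U-turns (delta of -180) fall under "left"
    return "right" if delta > 0 else "left"


def assign_movement(entry_heading, exit_heading):
    """Return (direction, turn type), or None when a heading is unavailable."""
    if entry_heading is None or exit_heading is None:
        return None  # e.g., fuzzified waypoints without heading values
    return (direction_of_travel(entry_heading),
            turn_type(entry_heading, exit_heading))
```

The `None` branch mirrors why fuzzified waypoints reduce coverage: without a heading on the approach or the departure, no movement can be assigned.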
Similar to the previous section, any fuzzified waypoint has truncated GPS coordinates with missing heading information, making it impossible to determine its trajectory’s movement through the intersection. Therefore, fuzzified waypoints are not available for intersection movement performance analysis. The results of the differences between the two datasets at distinct analysis levels are presented as follows:
First, the distribution of the change in available trajectories for analysis at the movement level is discussed.
Then, the change in the number of vehicle trajectories by turn type is evaluated.
Finally, a statewide qualitative analysis at the intersection level is provided.
4.1. Change by Movement
It is important to evaluate the amount of data available for analysis at the movement level. This is because movement level traffic signal performance measures provide practitioners with insights on the operational conditions in which each of the intersection’s phases serve traffic. Depending on the performance results for all movements at an intersection, signal retiming [13] or capital investment [22] activities may be suggested to improve operations.
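The change in the number of trajectories available for analysis ($\Delta T_{i,j,k}$) at intersection $i$, direction of travel $j$, and turn type $k$ was calculated as follows:

$$\Delta T_{i,j,k} = T^{c}_{i,j,k} - T^{h}_{i,j,k} \quad (3)$$

where $T^{h}_{i,j,k}$ and $T^{c}_{i,j,k}$ are the total number of trajectories assigned direction of travel $j$ and turn type $k$ at intersection $i$ from the historic and current CV datasets, respectively. The percentage change $\%\Delta T_{i,j,k}$ was calculated as follows:

$$\%\Delta T_{i,j,k} = \frac{\Delta T_{i,j,k}}{T^{h}_{i,j,k}} \times 100 \quad (4)$$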
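The movement-level change described above (current count minus historic count, and that difference relative to the historic count) could be computed as in the following sketch. The key layout and the handling of movements with no historic trajectories are assumptions.

```python
def movement_changes(historic, current):
    """Absolute and percentage change per movement.

    historic/current map (intersection, direction, turn_type) to the number
    of trajectories assigned to that movement. A movement missing from one
    dataset is treated as having zero trajectories there (an assumption).
    """
    results = {}
    for movement in historic.keys() | current.keys():
        h = historic.get(movement, 0)
        c = current.get(movement, 0)
        delta = c - h
        # Percentage change is undefined when no historic trajectories exist
        pct = 100.0 * delta / h if h > 0 else None
        results[movement] = (delta, pct)
    return results
```

A movement with historic trajectories but none in the current dataset yields a change of −100%, which corresponds to the movements for which the current dataset provides no trajectory available for analysis.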
Figure 11 summarizes in box-and-whisker plots the distribution of the percentage (Figure 11a) and absolute (Figure 11b) trajectory count changes at the 26,291 movements of the intersections analyzed. All-way stops showed the largest reduction in median percentage change (callout i), followed by roundabouts (callout ii). Signalized intersections had the smallest median reduction (callout iii).
Figure 12 supplements the distribution analysis from Figure 11 by showing the change in available trajectories by movement as CFD diagrams. Traffic signals had the largest proportion of movements, around 12.5%, where the current CV dataset did not provide any trajectory available for analysis (callout i). This phenomenon only occurred for less than 6.25% of roundabout movements (callout ii). Overall, the increase in the availability of trajectories for analysis followed a similar distribution for all intersection types (callout iii).
It is important to note that most absolute changes were within the ±500 trajectory count range (Figure 12b). All-way stops presented the smallest decrease in trajectories available for analysis (callout iv), likely because this type of intersection usually serves fewer vehicles than the others. The information provided in Figure 12 provides valuable insights on the expected number of trajectories available for analysis by intersection type for a week of data.
Figure 13 shows Pareto-sorted bar graphs displaying the percentage and absolute changes in available trajectories for analysis for each evaluated movement. Callouts indicate the percentage of movements where the available trajectories increased (~29%), stayed the same (~1%), and decreased (~70%). The next subsection discusses the changes at the turn type level.
4.2. Change by Turn Type
In addition to the movement level analysis, it is important to evaluate the change in the number of available trajectories for analysis at the turn level (i.e., right, through, and left). The change in the number of trajectories available for analysis ($\Delta T_{k}$) that followed turn type $k$ was calculated as follows:

$$\Delta T_{k} = T^{c}_{k} - T^{h}_{k} \quad (5)$$

where $T^{h}_{k}$ and $T^{c}_{k}$ are the total number of trajectories assigned turn type $k$ from the historic and current CV datasets, respectively. The percentage change $\%\Delta T_{k}$ was calculated as follows:

$$\%\Delta T_{k} = \frac{\Delta T_{k}}{T^{h}_{k}} \times 100 \quad (6)$$
Table 4 summarizes the change in the number of trajectories available for analysis by turn type. Every category shows a reduction in available trajectories as a consequence of waypoint fuzzification (Figure 1) and MPR reduction (Table 1). Considering all turn types, signalized intersections had the smallest percentage decrease (−11.22%) and roundabouts the largest (−17.66%). In particular, through trips at signalized intersections had the smallest percentage reduction (−10.73%), while through trips at roundabouts had the largest (−19.46%). The following subsection presents the changes at the intersection level.
4.3. Change by Intersection
An evaluation of the change in the number of available trajectories for analysis at the intersection level can help identify overall trends and statewide conditions. The change in the number of trajectories available for analysis ($\Delta T_{i}$) at intersection $i$ was calculated as follows:

$$\Delta T_{i} = T^{c}_{i} - T^{h}_{i} \quad (7)$$

where $T^{h}_{i}$ and $T^{c}_{i}$ are the total number of trajectories available for analysis at intersection $i$ from the historic and current CV datasets, respectively. The percentage change $\%\Delta T_{i}$ was calculated as follows:

$$\%\Delta T_{i} = \frac{\Delta T_{i}}{T^{h}_{i}} \times 100 \quad (8)$$
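The turn-type and intersection-level changes differ from the movement-level change only in how counts are totaled before differencing. A minimal sketch, assuming movement counts keyed by hypothetical `(intersection, direction, turn_type)` tuples, is:

```python
from collections import defaultdict


def aggregate(counts, level):
    """Sum movement-level counts to the 'intersection' or 'turn' level."""
    position = {"intersection": 0, "turn": 2}[level]
    totals = defaultdict(int)
    for movement, count in counts.items():
        totals[movement[position]] += count
    return dict(totals)


def change(historic_total, current_total):
    """Absolute change and percentage change relative to the historic total."""
    delta = current_total - historic_total
    pct = 100.0 * delta / historic_total if historic_total > 0 else None
    return delta, pct


# Hypothetical weekly trajectory counts by movement
historic = {
    ("i1", "northbound", "through"): 100,
    ("i1", "northbound", "left"): 20,
    ("i2", "eastbound", "through"): 50,
}
current = {
    ("i1", "northbound", "through"): 80,
    ("i1", "northbound", "left"): 25,
    ("i2", "eastbound", "through"): 45,
}

by_turn_h, by_turn_c = aggregate(historic, "turn"), aggregate(current, "turn")
by_int_h, by_int_c = (aggregate(historic, "intersection"),
                      aggregate(current, "intersection"))
left_change = change(by_turn_h["left"], by_turn_c["left"])
i1_change = change(by_int_h["i1"], by_int_c["i1"])
```

Because totals are formed first, a gain on one movement can offset a loss on another, which is why intersection-level changes can appear more moderate than those of individual movements.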
Figure 14 qualitatively shows the percentage (Figure 14a–c) and absolute (Figure 14d–f) changes at each analyzed intersection. In general, rural areas displayed less pronounced differences between the datasets. This is likely because those intersections may serve more commuting traffic than urban intersections, which can reduce the impact of fuzzification as fewer vehicles have rural areas as a destination. Additionally, most intersections fall within the ≥−50% to <+50% trajectory change categories, which indicates generally moderate differences at the intersection level.
5. Discussion
CV trajectory data have proven to be a versatile dataset that can be used to assess mobility under a wide range of scenarios and scales. This study presented a high-level analysis of the impact that data fuzzification and fleet reduction (Figure 1) had on a current commercial CV dataset. An evaluation of the current and historic estimated MPRs showed reductions of 0.5 and 0.3 percentage points along interstate and non-interstate roadways, respectively (Table 1). The impact at freeways and intersections is summarized below.
5.1. Impact at Freeways
A comparison of the CV data available on 54,284 individual 0.1-mile segments of interstates, US Routes, and State Routes between a week of historic non-fuzzified and a week of current fuzzified data showed that the number of journeys available for analysis increased for 33.8% of segments and decreased for 65.9%.
Although the reductions are non-trivial, for simply computing average segment speeds, the authors do not believe this will significantly impact estimations. For example, Figure 15 shows trajectory-derived heatmaps of sampled vehicle speeds over one mile of I-465 in Indianapolis, Indiana. The vertical axis represents the location on the road and the horizontal axis the DOW and TOD; displayed trajectory segments color-coded based on their speed portray mobility conditions [15]. A similar coverage can be qualitatively concluded by comparing the heatmap generated from a non-fuzzified historic dataset (Figure 15a) and the heatmap generated from the fuzzified current dataset (Figure 15b).
5.2. Impact at Intersections
A comparison of the number of CV trajectories available for movement level analysis at 3289 intersections between a week of historic non-fuzzified and a week of current fuzzified data showed an 11.6% overall reduction (Table 4). Of the 26,291 studied individual movements, the number of trajectories available for analysis increased at 28.3% and decreased at 70.4%. This large and non-uniform reduction in available trajectories for analysis may substantially decrease sampling on particular movements and perhaps bias intersection analysis. This change is a concern because, in many cases, the fuzzified movements going to or from residential or commercial areas are of significant interest, and signal retiming [13] and capital investment [22] opportunity identification algorithms rely on the unbiased sampling of vehicle trajectories at all movements at any intersection.
For example, Figure 16 shows trajectory-derived heatmaps of the estimated Highway Capacity Manual (HCM) level of service (LOS) [13,34] at the minor through movements of a corridor segment with 12 signalized intersections. The vertical axis represents each analyzed intersection, and the horizontal axis represents the TOD. It is evident how data coverage is reduced for the heatmap generated with the current fuzzified dataset (Figure 16b, callout i) when comparing it to the heatmap derived from the historic dataset without fuzzified waypoints (Figure 16a, callout i). Improvement opportunities will now be more difficult to identify for the intersection with decreased coverage (callout i).
5.3. Future Research
This study focused on providing a high-level overview of the impacts that CV data fuzzification and fleet composition changes have on data availability for transportation studies. Future research should provide a more in-depth investigation of the effects that these changes have on specific mobility analyses by type of roadway infrastructure. Subsequent efforts should provide special attention to the impact that data fuzzification has on movement level traffic signal performance estimations, as this granularity of analysis is critical to identify retiming [13] and infrastructure [22] improvement opportunities.
Furthermore, improved alternative data fuzzification techniques may be derived from an intentional dialog between transportation agencies, industry, and academia to further protect motorist privacy while minimizing adverse effects on data availability. Different fuzzification approaches should be extensively evaluated and best practices defined in future studies.
6. Conclusions
Even though there is a general reduction in the amount of data available for analysis in comparison with historic datasets, with MPR values ≥4.6%, the authors believe that the current CV dataset accounts for enough representativeness to derive most relevant mobility performance measures. Furthermore, as MPR follows an upward trend, it is just a matter of time before the current dataset contains more usable data than previous versions.
Of the infrastructure CV coverage analyzed, intersection performance will likely be the most affected by data fuzzification (Figure 1 and Figure 16). This is because the current data filtering approach may induce unintended bias on the amount of data available for analysis by movement at intersections near popular commercial areas or large residential zones.
Although the results of this study are based on data from Indiana, the authors believe that similar trends are likely to be observed in other locations. This is because if the same fuzzification approach and fleet modifications are implemented elsewhere, proportional changes in data availability are expected to be produced.