1. Introduction
In recent years, the rapid increase in motor vehicles in China has intensified road safety concerns. Traffic accidents, one of the primary challenges to road safety, severely affect the public’s travel experience and result in substantial economic losses [1,2,3]. High speeds, mixed vehicle types, and closed (access-controlled) traffic make expressways prone to accidents, which often lead to large-scale congestion and sometimes partial system paralysis [4,5]. Therefore, accurately estimating the spatiotemporal delay effects of expressway accidents and understanding their underlying patterns are essential for improving traffic efficiency and ensuring the operational safety of expressways.
Traffic congestion has long been a major bottleneck for the safety, sustainability, and resilience of transportation systems [6]. By cause, it can be divided into two main types: recurrent and non-recurrent congestion. Recurrent congestion is typically caused by predictable traffic patterns, whereas non-recurrent congestion results from unpredictable disruptions, such as traffic accidents or extreme weather, and produces spatiotemporal delays that are difficult to anticipate [7]. Early research on non-recurrent congestion events employed queuing theory, shockwave models, and simulation to analyze their impact [8,9,10]. For example, Jin [11] developed a congestion spread and dissipation model based on traffic wave theory for typical traffic events, while Chen and Wang [12] employed a cell transmission model to analyze the spatiotemporal variation in congestion caused by traffic events. Although foundational, these models often had limited practical applicability because they rely on idealized assumptions.
With advancements in data collection technologies, data-driven approaches have become central to studying spatiotemporal delay effects. The availability of large datasets has enabled researchers to utilize Geographic Information Systems (GIS) and machine learning tools to model and visualize traffic impacts more effectively [13]. For example, Chung and Recker [14,15] proposed a binary integer programming model to identify accident-induced delay areas, while Liu et al. [16] integrated GIS with shockwave models to dynamically visualize delay processes. Wang et al. [17] further advanced these methods by incorporating shockwave propagation constraints into programming models, improving computational efficiency and practical applicability. Recent efforts, such as those by Zhang et al. [18], have employed density clustering and Bayesian inference to explore spatiotemporal accident patterns, moving beyond traditional modeling constraints.
In addition to these data-driven approaches, researchers are increasingly focusing on identifying the factors that contribute to spatiotemporal delays. Recent studies leverage both traditional econometric models and advanced machine learning techniques to uncover correlations between traffic flow, accident characteristics, and delay duration [19,20,21,22,23,24]. For instance, Pasidis [19] employed econometric models to investigate the interaction between traffic variables and delay patterns, while Lin and Li [21] used supervised learning to classify post-accident congestion types and predict the severity of delay impacts. Cao et al. [22] developed a causal inference framework to quantify accident impacts on traffic speed. Golze et al. [23] analyzed the temporal distribution of delays across various accident types by plotting speed profiles derived from floating vehicle trajectory data. Alsahfi [24] explored the complex relationships between accident occurrence, severity, and demographic factors by integrating GIS tools (space-time cube analysis) with non-parametric statistical and spatial techniques, including DBSCAN, KDE, and the Getis-Ord Gi* statistic. These techniques have enriched our understanding of the mechanisms behind traffic delays.
Real-time traffic data, from sources ranging from sensor networks to mobile positioning technologies, have further improved the accuracy of spatiotemporal modeling by providing a continuous, large-scale view of traffic flow and congestion under real-world conditions [25,26]. This shift has allowed researchers to overcome the limitations of earlier models, which suffered from poor spatial transferability and relied on idealized assumptions [27,28,29].
In light of these advancements, this paper proposes a novel method to estimate and analyze the spatiotemporal delay effects of expressway traffic accidents by integrating diverse geographic data sources, including accident data, vehicle trajectories, and road network information. This integration addresses common inconsistencies among multi-source data, improving both spatial and temporal accuracy. A key feature of our approach is the use of the TPI, a travel-time-based congestion index that accounts for the heterogeneity of road segments and enables precise estimation of delay duration, range, and severity. Additionally, we identify spatial aggregation patterns of delays using three spatiotemporal delay indicators and employ a spatial error model to examine how specific accident characteristics correlate with delay impacts. Our methodology provides traffic management authorities with insights into roadway impacts, supporting targeted emergency response strategies and informed policy adjustments to mitigate future incidents. Compared with conventional methods, our approach not only improves data integration but also uses the TPI to evaluate congestion levels across road segments with different attributes, offering a more robust and practical tool for traffic management.
2. Materials and Methods
The dataset for this study consists of three primary sources: (1) an expressway accident dataset with details on accident locations, mileage, time of occurrence, types, casualties, and weather conditions; (2) vehicle trajectory data, including license plates, timestamps, coordinates, and speeds; and (3) a traffic road network map containing segment identifiers, geometrical shapes, number of lanes, lengths, speed limits, and road types. As shown in Figure 1, this study consists of four main steps. First, data preprocessing is conducted to standardize the coordinate systems of trajectory data and road network data and to remove redundant trajectory entries. Second, multi-source data integration is performed, refining accident locations and matching trajectory points with the road network. Third, accident impact identification and analysis are carried out, involving the creation of a spatiotemporal impact map, cluster analysis of accidents, and identification of hotspot areas. Finally, accident impact factors are analyzed, where key accident characteristics are selected, and spatial models are applied to analyze their influence.
Figure 1 provides an overview of the methodology, with each step explained in detail in the following sections.
2.1. Multi-Source Spatiotemporal Data Fusion and Location Refinement
Given the heterogeneity in data collection platforms, spatial and temporal references, and data organization formats, conducting spatiotemporal fusion and location refinement is essential before analysis. In this paper, the traffic road network map serves as the reference framework to accurately determine the spatial coordinates of each accident based on location and mileage information. Following this, vehicle trajectory data within the spatiotemporal bounds of the accident site are filtered to enable a comparative analysis of traffic flow under normal conditions versus during the accident event.
Map matching is necessary to address discrepancies between the trajectory and road network data. A shortest-path-based map-matching algorithm is applied to achieve both efficiency and accuracy [30]. This method identifies potential road segments the vehicle may have traveled on, utilizing the shortest path approach to minimize the distance between trajectory points and road network segments, thereby ensuring alignment with nearby road structures.
Assume the road network is represented as a directed graph $G(V, E)$, where $V = \{v_1, \dots, v_m\}$ denotes the set of nodes and $E = \{e_1, \dots, e_n\}$ represents the set of road segments. Given a sequence of trajectory points $T = \{p_1, \dots, p_t\}$, where each $p_i$ represents the vehicle’s position at time $i$, the shortest distance between each trajectory point and the adjacent road segments is calculated. If this distance is within a specified threshold $\delta$ (set to 40 m in this study), the corresponding road segment is designated as a candidate. This process is repeated for all trajectory points to form the set of candidate segments $E_T$. The optimal matching path $S$ is then determined from $E_T$ by minimizing the following shortest-path objective function.
where $R$ ($0 < R < 1$) denotes the reduction rate applied to road segments in $E_T$, $l_e$ represents the length of segment $e$, and $x_e$ is a decision variable, with $x_e = 1$ indicating that segment $e$ is part of the matching path and $x_e = 0$ otherwise. Solving this objective function yields the optimal decisions $x_e^*$, from which the final matched path $S = \{e \in E \mid x_e^* = 1\}$ is obtained.
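The candidate-selection and path-selection steps above can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes planar coordinates in metres and treats the matching objective as a weighted shortest-path search. Because the text does not state exactly how the reduction rate R enters the objective, the sketch assumes each candidate segment in E_T costs (1 - R) times its length while all other segments cost their full length. The toy network, coordinates, and function names are hypothetical.

```python
import heapq
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def candidate_segments(points, nodes, edges, delta=40.0):
    """E_T: every directed edge within delta metres of at least one trajectory point."""
    return {
        (u, v)
        for (u, v) in edges
        if any(point_segment_distance(p, nodes[u], nodes[v]) <= delta for p in points)
    }


def match_path(edges, candidates, source, target, R=0.5):
    """Dijkstra search in which candidate edges cost (1 - R) * length (assumed weighting)."""
    adj = {}
    for (u, v), length in edges.items():
        cost = (1 - R) * length if (u, v) in candidates else length
        adj.setdefault(u, []).append((v, cost))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, math.inf):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    if target not in dist:
        return []  # no path between source and target
    # Walk back from the target: these are the segments with x_e = 1.
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


# Hypothetical toy network: the straight route A-B-C is shorter, but the
# trajectory points lie on the detour A-D-C.
nodes = {"A": (0, 0), "B": (1000, 0), "C": (2000, 0), "D": (1000, 300)}
edges = {
    (u, v): math.dist(nodes[u], nodes[v])
    for u, v in [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
}
points = [(500, 150), (1500, 150)]
cands = candidate_segments(points, nodes, edges)
print(match_path(edges, cands, "A", "C"))  # [('A', 'D'), ('D', 'C')]
```

Setting R = 0 removes the discount, and the search then returns the geometrically shortest route A-B-C, which shows why the candidate set matters.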
2.2. Quantitative Assessment of Spatiotemporal Delays in Traffic Accidents
Congestion occurs when the travel time through a road segment exceeds that under free-flow conditions [24,25]. Some studies detect congestion by setting a threshold on speed differences [28,29]. However, a uniform threshold may yield inaccurate results because road characteristics (e.g., speed limits, number of lanes, and traffic volumes) vary. The TPI, derived from vehicle travel times, provides a better measure by accounting for these differences and classifying road conditions from free-flowing to congested [30]. The TPI is calculated as the ratio of actual to free-flow travel time, as expressed in Equation (2):

$$\mathrm{TPI} = \frac{T_a}{T_f} = \frac{V_f}{V_a} \tag{2}$$

where $T_a$ and $T_f$ represent the actual and free-flow travel times, respectively, and $V_a$ and $V_f$ denote the actual and free-flow speeds. The two forms are equivalent because, for a segment of fixed length $L$, $T = L/V$. Congestion severity categories based on the TPI are provided in Table 1 [31].
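The two forms of the TPI can be checked numerically. The sketch below, using a hypothetical 2 km segment and assumed speeds, computes the index from travel times and from speeds. Mapping the result to the severity levels of Table 1 is omitted because those thresholds are not reproduced here.

```python
def tpi_from_times(actual_time_s: float, free_flow_time_s: float) -> float:
    """TPI as the ratio of actual to free-flow travel time."""
    return actual_time_s / free_flow_time_s


def tpi_from_speeds(actual_speed_kmh: float, free_flow_speed_kmh: float) -> float:
    """Speed form: for a fixed length L, T = L / V, so Ta / Tf = Vf / Va."""
    if actual_speed_kmh <= 0:
        raise ValueError("actual speed must be positive")
    return free_flow_speed_kmh / actual_speed_kmh


# Hypothetical 2 km segment: free-flow speed 100 km/h, observed speed 40 km/h.
length_km = 2.0
t_free = length_km * 3600 / 100   # 72 s
t_actual = length_km * 3600 / 40  # 180 s
print(tpi_from_times(t_actual, t_free), tpi_from_speeds(40, 100))  # 2.5 2.5
```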
Each 24 h day is divided into 144 ten-minute time slices to capture traffic dynamics. For each time slice $T_i$, the average speed $\bar{V}_{s,T_i}$ on road segment $s$ is calculated as follows:

$$\bar{V}_{s,T_i} = \frac{1}{Q}\sum_{q=1}^{Q}\frac{1}{N_q}\sum_{n=1}^{N_q} V_n \tag{3}$$

where $Q$ is the number of vehicles passing through segment $s$ during time slice $T_i$, $N_q$ is the number of trajectory points for vehicle $q$, and $V_n$ is the instantaneous speed at point $n$. The traffic speeds of each segment across all time slices are sorted, and the 85th percentile speed is taken as the free-flow speed $V_f$. A minimum of 20 trajectory points is required for a reliable speed estimate; when data are insufficient, the missing speed values are interpolated from adjacent time slices.
When a traffic accident occurs, the speed on the affected segment typically decreases, and the disturbance propagates upstream until normal conditions resume. An accident is considered impactful if the TPI values of the accident segment and its upstream segment meet specific criteria within half an hour of the accident, as defined by Equation (4):
where $V_{s,\Delta t}$ and $V_{s-1,\Delta t}$ represent the speeds on segment $s$ and its upstream segment $s-1$ in each time slice $\Delta t$ after the accident, calculated using Equation (3), and $V_{s,f}$ and $V_{s-1,f}$ are the respective free-flow speeds.
Spatiotemporal delays caused by traffic accidents are identified using Equation (4). The impact on each segment and its severity are assessed according to Table 1, continuing until the impact dissipates. Key impact metrics for each accident include the following:
- (1) Accident Impact Area $S_a$: the total length of all affected road segments.
- (2) Accident Impact Duration $T_a$: the time from the accident’s occurrence until the impact is no longer observed.
- (3) Accident Impact Degree $D_a$: the maximum severity of the impact observed across all affected segments in each time slice.
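Given a grid of congestion levels per segment and time slice, the three metrics can be computed as sketched below. Assumptions: level 0 means unaffected, any nonzero level counts as affected, the duration is measured from the first to the last affected slice (consistent with the case study in Section 3.1, which measures it from the onset of impact), and the impact degree is kept as a per-slice series. The segment IDs, lengths, and levels are hypothetical.

```python
SLICE_MINUTES = 10


def impact_metrics(segment_length_km, severity):
    """Return (S_a in km, T_a in minutes, D_a per slice) for one accident.

    severity maps segment id -> list of congestion levels, one per
    10-minute slice (0 = unaffected; higher = more severe, assumed ordinal).
    """
    # S_a: total length of segments affected in at least one slice.
    affected = [s for s, levels in severity.items() if any(levels)]
    sa_km = sum(segment_length_km[s] for s in affected)
    # D_a: worst level over all segments, slice by slice.
    n_slices = len(next(iter(severity.values())))
    da = [max(levels[t] for levels in severity.values()) for t in range(n_slices)]
    # T_a: span from the first to the last slice with any impact (assumption).
    active = [t for t, level in enumerate(da) if level > 0]
    ta_min = (active[-1] - active[0] + 1) * SLICE_MINUTES if active else 0
    return sa_km, ta_min, da


# Hypothetical example: three segments upstream of an accident, five slices.
lengths = {"s0": 1.2, "s1": 0.8, "s2": 1.5}
severity = {
    "s0": [0, 2, 4, 3, 0],
    "s1": [0, 1, 3, 0, 0],
    "s2": [0, 0, 0, 0, 0],
}
sa, ta, da = impact_metrics(lengths, severity)
print(round(sa, 2), ta, da)  # 2.0 30 [0, 2, 4, 3, 0]
```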
2.3. Analysis of the Spatiotemporal Delays Associated with Traffic Accidents
This study employs three regression models—Ordinary Least Squares (OLS), Spatial Error Model (SEM), and Spatial Lag Model (SLM)—to explore the relationship between various accident characteristics and their associated spatiotemporal delays. The optimal model is selected based on the Lagrange Multiplier (LM) test results.
The SEM captures spatial dependence through the error term, reflecting spatial disturbances and overall spatial correlation, whereby a disturbance in one spatial unit can influence neighboring units. The SEM is expressed as follows:

$$\text{range} = a_0 + X a + u, \qquad u = \lambda W u + \varepsilon$$

The SLM accounts for spatial correlation in the dependent variable by introducing a spatial lag term, that is, the weighted average of the dependent variable over neighboring spatial units. The SLM is formulated as follows:

$$\text{range} = \rho W\,\text{range} + a_0 + X a + \varepsilon$$

where $\varepsilon$ denotes an independent and identically distributed error term and $u$ is the spatially autocorrelated error of the SEM. The dependent variable, “range”, refers to the duration or extent of the impact of traffic accidents. $X = \{x_1, x_2, x_3, \dots, x_p\}$ represents the explanatory variables, including accident characteristics and environmental factors; $a_0$ is the constant term, and $a$ denotes the vector of parameters estimating the effects of the explanatory variables on the dependent variable. $W$ is the spatial weight matrix, and $\lambda$ and $\rho$ represent the spatial error and spatial lag coefficients, respectively.
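The text does not state how the spatial weight matrix is built. As one hypothetical choice, the sketch below constructs a row-standardised k-nearest-neighbour matrix W over accident locations and computes the spatial lag Wy, the neighbour-weighted average that the SLM adds as a regressor. The coordinates and values are invented for illustration; estimating the spatial coefficients by maximum likelihood would normally be left to a dedicated spatial econometrics library.

```python
import math


def knn_weights(coords, k=2):
    """Row-standardised k-nearest-neighbour weight matrix (one hypothetical choice of W)."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances to all other units; ties are broken by index.
        neighbours = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in neighbours[:k]:
            W[i][j] = 1.0 / k  # each row sums to 1
    return W


def spatial_lag(W, y):
    """Wy: for each unit, the weighted average of y over its neighbours."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in W]


# Hypothetical accident locations (km) and impact durations (min).
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
durations = [10, 20, 30, 40]
W = knn_weights(coords, k=1)
print(spatial_lag(W, durations))  # [20.0, 10.0, 20.0, 30.0]
```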
This study defines eight categories of accident-related features for explanatory variables, detailed in Table 2. These categories include accident characteristics, traffic features, and environmental factors. Accident characteristics cover variables such as accident type, presence of casualties, and time of occurrence. Traffic features include the average speed on the affected road segment half an hour before the accident and the presence of secondary accidents within the spatiotemporal range of the initial accident. Environmental factors include weather conditions, road type, and whether the speed limits on the accident segment differ from adjacent segments.
The explanatory variables include both continuous and categorical variables. Since categorical variables cannot be directly used in regression models, they are transformed into dummy variables. For a categorical variable with n categories, n − 1 dummy variables are created to avoid multicollinearity. For example, the variable “road type” (with categories: regular road, bridge, and tunnel) is transformed into two dummy variables, as shown in Table 3. Converting each categorical variable in Table 2 into dummy variables and combining them with the continuous variables yields 15 explanatory variables.
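The n − 1 dummy coding described above can be sketched as follows, using the road type example with regular roads as the reference category. The exact column layout of Table 3 is not reproduced, and the function name is hypothetical.

```python
def dummy_encode(values, categories, reference):
    """Encode a categorical variable as n - 1 dummy columns.

    The reference category is represented by all zeros, which avoids the
    perfect multicollinearity that n dummies plus an intercept would cause.
    """
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


road_types = ["regular road", "bridge", "tunnel"]
cols, rows = dummy_encode(["regular road", "bridge", "tunnel", "bridge"],
                          road_types, reference="regular road")
print(cols)  # ['bridge', 'tunnel']
print(rows)  # [[0, 0], [1, 0], [0, 1], [1, 0]]
```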
When interpreting regression coefficients for SEM and SLM models, the interpretation varies based on the type of explanatory variable:
- (1) For a continuous variable X, the regression coefficient a1 represents the average change in the dependent variable (e.g., accident duration or impact range) for a one-unit increase in X, with other variables constant.
- (2) For a categorical variable X, the coefficient a1 reflects the average change in the dependent variable compared to a reference category, with other variables held constant. For example, using regular roads as the reference category, the coefficients for bridges and tunnels would indicate the difference in accident duration or extent of impact compared to regular roads.
Additionally, this study uses the variance inflation factor (VIF) to diagnose multicollinearity among the variables. Because the VIF is always at least 1, values between 1 and 10 are generally taken to indicate that multicollinearity is not a serious concern. This paper adopts a stricter criterion and requires a VIF of less than 8.
3. Results
The experimental data in this study include expressway traffic accident records from Hunan Province for October 2018, along with vehicle trajectory data and a navigation road network for the same region. As illustrated in Figure 2, a total of 3474 traffic accidents occurred on expressways in Hunan Province in October 2018; their locations are marked on the provincial map. For each accident, a circular buffer with a radius of 10 km was created around the accident site to identify and filter trajectory points that may be associated with the accident. This process identified trajectory data for 534,917 vehicles within these accident areas, comprising 792,114,844 trajectory points.
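The 10 km buffer filter can be approximated with great-circle distances, as sketched below. The study does not say how its buffers were computed (a GIS tool operating on projected coordinates is equally plausible), so the haversine formula, the mean Earth radius, and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(points, accident, radius_km=10.0):
    """Keep trajectory points (lat, lon) that lie inside a circular buffer around the accident."""
    lat0, lon0 = accident
    return [(lat, lon) for lat, lon in points
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical accident site and trajectory points (degrees).
accident = (28.20, 112.90)
points = [(28.25, 112.90), (28.40, 112.90), (28.20, 113.00)]
print(within_buffer(points, accident))
```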
Section 3.1 presents a detailed analysis of a randomly selected accident to quantify its spatiotemporal impact range. Section 3.2 and Section 3.3 extend this analysis with cluster and hotspot assessments for all accidents in the dataset.
3.1. Visualization of Spatiotemporal Delay Effect Estimation Results
As a case study, a visual analysis was conducted using a traffic accident on the Changsha–Shaoshan–Loudi Expressway. This accident, a vehicle collision, occurred at 08:10 on 1 October 2018; its location is shown in Figure 3a. Figure 3b illustrates the fluctuating congestion levels around the accident site over time. The impact of the accident began at 08:00 and persisted until 09:20, lasting approximately 80 min in total. Notably, the onset of impact at 08:00 precedes the recorded accident time by 10 min. This discrepancy likely arises because incident times are recorded at the time of reporting rather than at the actual onset of impact. Severe congestion recurred around 08:50, likely due to increased traffic volume, and gradually dissipated afterward. The congestion extended upstream from the accident site, with the most severe impact (level 4) covering up to 2.7 km, illustrating variations in impact degree and duration across segments.
3.2. Clustering Analysis of Accidents Based on Delay Impact
A clustering analysis was conducted to explore the spatiotemporal patterns of delays caused by traffic accidents, focusing on impact duration, extent, and severity. The Fuzzy C-Means (FCM) algorithm was selected for its flexibility: it assigns each accident a degree of membership in every cluster rather than a single hard label. Compared with hard-partitioning methods such as K-means, FCM can better capture the overlapping structure of traffic accident data. Based on the Davies–Bouldin Index (DBI), the optimal number of clusters was determined to be K = 8. Figure 4 illustrates the clustering results, and Table 4 provides statistical details. Of the 3474 accidents, 1295 were identified as having no significant delays based on the TPI; the remaining 2179 accidents are grouped into eight clusters.
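The core Fuzzy C-Means iteration alternates between updating cluster centres and memberships. The Python sketch below uses the standard update rules with fuzzifier m = 2, an assumed value since the text does not report it, and assumes the three features (duration, extent, severity) have already been standardised. Selecting K with the DBI is not shown, and the toy points are hypothetical.

```python
import math
import random


def fcm(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Fuzzy C-Means: returns (centers, memberships) with memberships[i][j] = u_ij."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Centre update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1)) for dk in dists)
                       for dj in dists]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical standardised features forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, u = fcm(points, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)
```

Hard cluster labels, as needed for summary statistics such as those in Table 4, can be taken as the cluster with the highest membership for each accident.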
Clusters 1, 3, 4, and 6 represent the categories with the highest severity levels (level 4) and account for most occurrences (69.62%, or 1517 out of 2179 accidents). Clusters 1 and 6 show similar duration, extent, and impact severity characteristics, while clusters 3 and 4 demonstrate distinct impact patterns. Both clusters 3 and 4 are characterized by longer durations, larger affected areas, and higher severity levels, indicating a need for specific response measures. Cluster 3 highlights the importance of rapid response strategies, whereas cluster 4 emphasizes the need to assess the resilience of the road network in affected areas.
Clusters 2, 5, 7, and 8 are associated with lower accident risk levels, with severity ratings not exceeding level 3. Cluster 2, despite its lower severity, has a relatively high frequency of occurrence, likely representing everyday incidents such as minor collisions or scrapes. Cluster 8, rated at severity level 3, has the lowest occurrence rate (69 out of 2179), suggesting the need for tailored measures in specific areas. Clusters 5 and 7 are both rated at severity level 2 but differ in impact characteristics: cluster 5 affects a larger area with relatively low severity and short duration, often occurring in high-traffic zones where even brief delays can cause widespread congestion, whereas cluster 7 has an even shorter impact duration than cluster 5.
3.3. Spatial Distribution Analysis of Accident Impact Using Kernel Density Estimation
A kernel density analysis was performed on impact duration and length to examine spatial distribution patterns further, as illustrated in Figure 5. The results reveal clear clustering patterns, with significant hotspots concentrated around Changsha and along the Beijing–Hong Kong–Macao Expressway.
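A weighted kernel density surface of the kind described above can be sketched as follows. The text does not specify the kernel, bandwidth, or weighting, so this sketch assumes a planar quartic kernel, a common choice for hotspot mapping, with each accident weighted by its impact duration or impact length. Coordinates, bandwidth, and weights are hypothetical.

```python
import math


def quartic_kde(grid_point, accidents, bandwidth):
    """Weighted planar kernel density at grid_point using a quartic kernel.

    accidents: iterable of (x, y, weight) with x, y in metres; the weight might be
    the impact duration (min) or impact length (km) of each accident.
    """
    gx, gy = grid_point
    norm = 3.0 / (math.pi * bandwidth ** 2)  # makes the kernel integrate to 1
    density = 0.0
    for x, y, w in accidents:
        d = math.hypot(gx - x, gy - y)
        if d < bandwidth:
            density += w * norm * (1 - (d / bandwidth) ** 2) ** 2
    return density


# Hypothetical accidents (x, y in metres; weight = impact duration in minutes).
accidents = [(0, 0, 80), (500, 0, 20), (20000, 0, 30)]
for gx in (0, 10000, 20000):
    print(gx, quartic_kde((gx, 0), accidents, bandwidth=2000))
```

Evaluating the function over a regular grid yields the density surface whose peaks are read as hotspots.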
Figure 5a indicates that the most prominent hotspot for impact duration is located on the Changsha–Zhangjiajie Expressway, likely due to limited lane availability and the convergence of multiple expressways, which contribute to prolonged delays. Additional hotspots are observed on segments of the Changsha–Liuyang Expressway and the Beijing–Hong Kong–Macao Expressway. The Changsha–Liuyang Expressway connects to several other expressways, while the Beijing–Hong Kong–Macao Expressway, the busiest in Hunan Province with only four lanes, is especially prone to accidents and delays.
Further hotspots are identified at major junctions, such as the Zhimushan Junction, where the Shanghai–Kunming Expressway intersects the Erenhot–Guangzhou Expressway, and the Yinjia’ao Junction, where the Beijing–Hong Kong–Macao Expressway meets the Shanghai–Kunming Expressway. These clusters are likely due to high speeds, heavy traffic flows, and frequent turning movements at these junctions, which increase the potential for accidents.
Figure 5b shows a similar spatial distribution for impact length, mirroring the impact duration patterns and highlighting additional hotspots. Notably, the Changsha–Zhangjiajie Expressway segment in Changde and the Hengyang and Yueyang sections of the Beijing–Hong Kong–Macao Expressway also exhibit prolonged congestion. These segments are characterized by narrow lanes and high traffic volumes, which can lead to extended delays following accidents. Strengthening monitoring efforts and installing appropriate traffic signage on these high-risk roads could help mitigate congestion and improve overall traffic flow.
3.4. Analysis of the Relationship Between Accident Characteristics and Delay Impact
The study used the Ordinary Least Squares (OLS) model, Spatial Error Model (SEM), and Spatial Lag Model (SLM) to analyze the correlation between accident characteristics and spatiotemporal delay impacts. Table 5 presents the results, with significance levels of the p-values denoted by *, **, and ***. Based on the Lagrange Multiplier (LM) and Robust Lagrange Multiplier (Robust LM) tests, the SEM was deemed most suitable for explaining the mechanisms underlying the spatiotemporal delays associated with traffic accidents. Further analysis therefore focuses on the SEM results.
The SEM results for impact duration indicate that collisions with stationary vehicles and fixed objects lead to longer delays than collisions with moving vehicles. Specifically, when the impact duration exceeds 45 min, the rate of stationary vehicle collisions increases by 16.1%, and fixed object collisions rise by 17.9%, compared to 9% for moving vehicle collisions. Secondary accidents and adverse weather conditions (e.g., rain) also contribute to prolonged delays. Additionally, higher vehicle speeds in the half-hour preceding an accident are associated with shorter impact durations, likely due to reduced traffic volume.
For impact length, SEM results suggest that collisions with stationary vehicles, fixed objects, and other types significantly increase the impact length compared to moving vehicle collisions, with increases of 7%, 11%, and 29%, respectively. Incidents with casualties and those occurring late at night show longer impact lengths than morning accidents. Secondary accidents are also associated with extended impact lengths. Moreover, accidents on bridges and tunnels have shorter impact lengths (44.8% and 58.2% less, respectively) than regular roads, likely due to quicker response times in these environments. Additionally, road segments with varying speed limits appear to have shorter impact lengths, possibly due to drivers adjusting their speed and maintaining greater distances between vehicles in these areas.
Our analysis highlights that traffic accidents on key expressways in Hunan province can lead to significant spatiotemporal delay effects, with notable hotspots identified on the Changsha–Zhangjiajie and Beijing–Hong Kong–Macao Expressways. Cluster analysis revealed distinct patterns of delay impacts across various accident types, while the TPI allowed for precise quantification of delay extent, duration, and severity. Additionally, our regression model results underscore the influence of specific accident characteristics (such as collision type and road segment attributes) on the severity and spread of delays. These findings provide valuable insights for traffic management and emergency response planning.
4. Conclusions and Future Work
Anomalous events like accidents, severe weather, and natural disasters are rare but have strong spatiotemporal delay effects. These impacts significantly affect transportation network resilience, making them crucial in traffic safety and urban sustainability. This study’s analysis of traffic accident-induced delays offers insights crucial for strengthening network resilience and advancing intelligent traffic management.
By leveraging multi-source geographic data, this study quantitatively assesses the spatiotemporal delay effects of expressway traffic accidents. The application of the TPI generates spatiotemporal impact maps, visually depicting delay effects across various road segments and timeframes. These maps provide essential guidance for traffic management authorities in planning and operational decision-making. Additionally, using a spatial error model, we analyze the relationship between spatiotemporal delay effects and accident characteristics, highlighting significant correlations with factors such as collision type, weather conditions, accident timing, pre-accident speed on the affected segment, presence of casualties, and road infrastructure features. These findings provide actionable insights for risk prevention and targeted interventions in expressway traffic management.
Our hotspot and clustering analyses indicate that delay hotspots are concentrated on expressways around Changsha, particularly the Changsha–Zhangjiajie Expressway, the Shanghai–Kunming Expressway, and the Beijing–Hong Kong–Macao Expressway. This spatial clustering suggests that targeted measures, such as road design optimization and policy adjustments, could mitigate delay risks in these areas.
Future research should build on these findings by (1) extending the methodology to explore spatiotemporal delay effects in urban traffic environments, which present more complex dynamics; (2) enhancing delay estimation accuracy through the integration of video monitoring and supplementary data sources; and (3) conducting comprehensive evaluations of road network resilience using detailed models of spatiotemporal delay effects caused by various types of traffic disruptions. This progression will help refine our understanding of delay mechanisms and support the development of robust, real-time traffic management strategies.