Article

Traffic Collision Severity Modeling Using Multi-Level Multinomial Logistic Regression Model

1 College of Engineering, Northeastern University, Vancouver, BC V6B 1Z3, Canada
2 MSE Department, University Canada West, Vancouver, BC V6Z 0E5, Canada
3 ACSS Department, University Canada West, Vancouver, BC V6Z 0E5, Canada
4 College of Engineering & Architecture, Gulf University for Science and Technology, Hawally 32093, Kuwait
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(2), 838; https://doi.org/10.3390/app15020838
Submission received: 30 November 2024 / Revised: 13 January 2025 / Accepted: 13 January 2025 / Published: 16 January 2025
(This article belongs to the Section Transportation and Future Mobility)

Abstract

This study investigates the various factors contributing to the severity of traffic collisions, with specific attention given to elements such as the involvement of pedestrians and cyclists, the roles played by motor vehicles, prevailing weather conditions, road characteristics, and geographical contexts. Drawing from a comprehensive dataset from the Virginia Department of Transportation, encompassing over 500,000 data points, this study utilizes two statistical models. Specifically, it utilizes Multinomial Logistic Regression and Multi-Level (Mixed Effect) Multinomial Logistic Regression, which accounts for group-level heterogeneity, to explore the intricate interplay between various factors and collision severity outcomes. The findings underscore the superiority of the Multi-Level Multinomial Logistic Regression model over the standard Multinomial Logistic Regression model in capturing road user severity. Furthermore, this paper highlights the heightened odds of fatalities associated with the presence of vulnerable road users, such as pedestrians and cyclists. Collisions involving unbelted drivers exhibited odds ratios exceeding 10, indicating a substantially elevated likelihood of severe outcomes compared to their belted counterparts.

1. Introduction

Car accidents cause significant human distress and economic challenges worldwide [1]. In 2021, the National Highway Traffic Safety Administration (NHTSA) reported roughly 43,000 traffic fatalities, representing a 10.5% increase over 2020 [2]. Furthermore, fatalities resulting from speeding climbed by 5%, as did deaths from alcohol-related incidents. Predictive models for traffic accidents provide useful insights into the causes of crashes and can help reduce their occurrence. In recent years, researchers have become increasingly interested in understanding the fundamental causes of traffic accidents and building models to forecast and evaluate these incidents. Various studies have highlighted specific driver behaviours contributing to accidents, such as distracted driving (e.g., mobile device use) and impaired driving [3,4,5].
According to Adanu et al. [6], children are more likely to be fatally harmed in crashes when drivers are speeding or distracted. Su et al. [7] stated that driver behaviour contributes to traffic accidents, along with road infrastructure and vehicle characteristics. Understanding the precise factors influencing accidents is vital for policymakers and stakeholders to develop targeted strategies for prevention. A study by Robartes and Chen [8] examined factors affecting the severity of injuries in automobile–bicycle crashes using Virginia police crash data. Intoxication of automobile drivers and bicyclists, along with factors like vehicle speed and roadway conditions, was found to affect injury severity significantly. The study emphasizes the need for measures against biking and driving under the influence and for promoting road separation. Li et al. [9] also highlighted the importance of considering human behaviour in road safety strategies, underscoring the need for effective road safety measures that account for the multifaceted role of human behaviour in traffic collisions.
Research by Lestina et al. [10], focusing on crash severity at freeway entrance and exit ramp interchanges, showed common crash types in Northern Virginia and emphasized factors like speed, weather conditions, and alcohol as contributors to crash severity. Moreover, Noland and Quddus [11] analyzed road casualties in England, focusing on high-risk areas characterized by poverty, low car ownership, and elevated speed limits. Their findings recommend directing road safety efforts towards these locales to reduce road mishaps. Bahrololoom et al. [12] also investigated factors affecting fatalities and severe injuries in bicycle accidents in Victoria, Australia. Their study highlighted predictors such as the cyclist’s age and gender, time of day, and infrastructure quality, emphasizing the potential for infrastructure improvements and educational initiatives.
This study delves into the factors influencing collision severity, focusing prominently on human behaviour. It investigates a range of variables, including pedestrian conduct, cyclist interactions, vehicle characteristics, weather conditions, and road attributes. Assessing the severity of traffic accidents is critical for understanding the factors that lead to more serious outcomes and guiding efforts to reduce fatalities and injuries. The severity of accidents, however, varies greatly with geographical area (such as urban vs. rural settings and districts) and with time of day, influencing the sorts of crashes and their results. Predictive models that assess these variable conditions can provide critical insights into accident causes and assist in tailoring safety measures to specific areas, thereby enhancing road safety and lowering accident-related injuries. By employing advanced predictive models like Multinomial Logistic Regression and Multi-Level Multinomial Logistic Regression, the study aims to deepen our understanding of collision severity dynamics beyond traditional methodologies.
The objective of this study is to investigate the various factors contributing to the severity of traffic collisions. The study analyzed the impact of behavioural factors such as distracted driving, impaired driving, and speeding on accident severity, utilizing statistical models to assess these factors alongside road layout and vehicle types, and offering actionable insights for policymakers, stakeholders, and the public to enhance road safety. Recognizing the pivotal role of human behaviour in traffic collisions, this research endeavours to provide evidence-based insights that can inform targeted interventions and policy strategies for advancing effective road safety initiatives.

2. Literature Review

Several previous studies have been devoted to understanding factors that lead to varying degrees of severity in traffic collisions. Statistical methodologies have been central to traffic collision severity prediction. Olowosegun et al. studied the factors influencing the frequency of pedestrian–motor vehicle incidents in Scotland, especially at crossings and junctions. The study used correlated random parameter ordered probit models to account for unobserved heterogeneity. The results highlighted a number of significant components, including road, location, weather, vehicle, driver behavior, and timing considerations. Particularly for weather, hazard, and time factors, differences were observed between signalized and unsignalized intersections and crossings controlled either physically or by humans [13]. Zamani et al. analyzed accident data in Los Angeles to determine the factors that influence pedestrian injury severity. They investigated critical factors and their consistency over time using a logit model with random parameters. The study found that key variables changed over a seven-year period, highlighting the importance of dynamic examination of crash data [14]. Li et al. analyzed pedestrian–motor vehicle crash data in North Carolina to identify characteristics affecting pedestrian injury severity. Using random parameter logit models, the study found that ambulance rescue and curved roadways had a consistent impact on pedestrian injury severity [15]. Guo et al. presented a two-level random intercept model using Bayesian probability inference for forecasting pedestrian crash severity. The model's utility was demonstrated using pedestrian crash data from Colorado, and the results revealed better prediction accuracy compared to previous methods [16]. Adanu et al. used a random parameter multinomial logit model to identify accident characteristics that were strongly associated with child pedestrian crashes. The study found that children are more likely to be fatally harmed in crashes when drivers are speeding or distracted [6].
Previous studies have established that vulnerable road users such as cyclists and pedestrians are at particularly high risk in encounters with motor vehicles. For instance, a study conducted by Edwards and Leonard [17] found that, when pedestrians and cyclists are involved in traffic accidents, they tend to suffer more severe injuries and higher fatality rates than the occupants of motor vehicles. Similarly, Li et al. [18] found that road locations with more pedestrian and cyclist traffic are more dangerous in terms of resulting collisions; safety measures should therefore be directed to these locations.
Brázdil et al. [19] investigated the link between road accidents and weather conditions in the Czech Republic. The dataset comprised annual information on general traffic accidents, property damage occurrences, injuries (both major and minor), and fatalities. Seven weather categories were used to classify circumstances at the time of the accident. Among them, “rain”, “onset of rain and light rain”, “snowfall”, and “glaze ice and rime” were recognized as the most critical, causing the highest incidence of accidents, injuries, and deaths. Additionally, Zou et al. [20], in a study of the impacts of antecedent climate conditions on collision severity outcomes, noted that collision severity was associated with numerous risks, including extreme weather events.
The geometry of roads and their other features are vital in determining the probability of traffic accidents. According to Abdel-Aty et al. [21], road characteristics such as intersections, pedestrian crossings, and the road surface also determine collision severity. The outcome of traffic crashes also varies with geographical location: rural settings see more severe crashes than urban settings because of higher vehicle speeds and longer emergency response times [22].
Ye et al. used a joint Poisson regression model with multivariate normal heterogeneities to model crash frequency by severity level for freeway sections [23]. The model took into account common unobserved factors influencing crash frequencies across severity levels. To address unobserved heterogeneity, various approaches were used in [24,25,26], including random parameter models, Markov switching, ordered logit models, and latent class cluster models.
Statistical methodologies have been frequently applied to model collision severity. However, compared to traditional statistical models such as Logistic Regression, Multi-Level Mixed Effect Models (MMEM) are well suited to modeling crash severity because they account for the hierarchical structure of the data and handle the categorical nature of the dependent variable, which is inherently multinomial. Moreover, the model estimates the log odds of each severity category relative to the reference, capturing the nuanced differences that define each level of crash severity. In a study by Washington et al. [27], Multinomial Logistic Regression (MLR) was used to identify factors important for severe collision risk and to show the consequences of failing to consider group-level heteroscedasticity. Hedeker [28], on the other hand, stated that MMEM is more helpful for handling hierarchically structured data, such as collision data, and yields a more precise and detailed understanding of collision severity.
The systematic examination of the literature confirms the complexity and heterogeneity of traffic collision risk. This study aims to investigate the various factors contributing to the severity of traffic collisions. It uses Multinomial Logistic Regression and Multi-Level Mixed Effect Models to analyze the impact of behavioural factors such as distracted driving, impaired driving, and speeding on accident severity, alongside road layout and vehicle types.

3. Methodology

3.1. Data Collection

This section provides an overview of the data used in the study, which centers on predicting collision severity. The data source is the collision incidents data from the Virginia Department of Transportation (VDOT). The dataset contains several attributes, including the collision severity level, road user types, road geometry, weather conditions, and behavioural aspects of road users. This publicly accessible dataset spans from 2019 to 2023, comprising approximately 500,000 collision incidents [29]. To ensure the reliability and accuracy of the model, the study executed a rigorous series of data processing steps. The obtained dataset contains categorical attributes. Dummy coding was used for the categorical variables, with the levels defined by the variable categories. Records with missing values were removed, which was feasible given the large number of observations [30]. Data processing was conducted using R software (RStudio Team, 2023) [31]. The crash severity level is presented using the KABCO scale. The crash severity level was classified into four categories: (a) Property Damage Only (PDO), designated as O; (b) minor injuries, designated as B + C (similar to [32]); (c) major injury, designated as A; and (d) fatalities, denoted as K.
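The severity grouping, listwise deletion, and dummy coding described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names, the sample records, and the choice of "Rural" as the reference level are all hypothetical.

```python
# Hypothetical sketch: collapse KABCO codes into the four modelled classes,
# drop incomplete records, and dummy-code one categorical predictor.
# This is not the authors' R pipeline.

KABCO_TO_CLASS = {
    "K": "Fatality",
    "A": "Major Injury",
    "B": "Minor Injury",
    "C": "Minor Injury",
    "O": "PDO",
}


def severity_class(kabco_code):
    """Map a raw KABCO code to one of the four severity classes."""
    return KABCO_TO_CLASS[kabco_code.strip().upper()]


def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"is_{lvl}": int(value == lvl) for lvl in levels if lvl != reference}


def drop_incomplete(records):
    """Listwise deletion: keep only records with no missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]


# Illustrative records (made up, not VDOT data).
records = [
    {"severity": "b", "area_type": "Urban"},
    {"severity": "K", "area_type": "Rural"},
    {"severity": "O", "area_type": None},  # removed by listwise deletion
]
clean = drop_incomplete(records)
coded = [
    {
        "severity": severity_class(r["severity"]),
        **dummy_code(r["area_type"], ["Rural", "Urban"], reference="Rural"),
    }
    for r in clean
]
```

With a two-level predictor, dummy coding leaves a single indicator column; a predictor with more levels would produce one column per non-reference level.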

3.2. Variables

The selection of predictor variables is pivotal in achieving accurate collision severity forecasting. The following predictors are contained in the prepared dataset: Weather Condition, Roadway Alignment, Roadway Description, Collision Type, Alcohol Consumption, Belted, Bike, Traffic Signal, Districts, Area Type, Distracted, Drugs, Drowsy, Crash Date, Motor, Pedestrians, Speed, and Animal. Visualizing the correlation matrix proved valuable, showing which variables were strongly related to one another and which were not [33]. These visualizations provided insight into the relationships that underlie collision severity, and the most influential factors are listed in Table 1.

4. Modelling

4.1. Multinomial Logistic Regression

The Multinomial Logistic Regression model is valuable for analyzing and understanding contributing factors across multiple categories. In the context of traffic accidents, this model allows for a comprehensive exploration of the various factors influencing crash severity, encompassing aspects of human behaviour and the environment, as highlighted by Yasmin and Eluru [34]. The application of the Multinomial Logistic Regression model in this study focuses on understanding the influence of behavioural factors on traffic collision severity. Specifically, the model is designed to show how different contributing factors affect the severity of traffic accidents, including fatalities, severe injuries, and minor injuries, in comparison to PDO.
The model development process involved breaking down the analysis into distinct segments. Specifically, the analysis was divided into multiple binary logistic regression sub-models, each comparing a particular severity category with the reference category (e.g., minor severity vs. reference, severe severity vs. reference). This approach ensured that the model captured the nuanced differences that defined each severity level, as discussed by Dong et al. [35]. The primary objective was to train multiple binary logistic regression sub-models, each shedding light on the relationships between predictor variables and crash severity. Segmenting the analysis in this manner enabled a deeper understanding of the key factors influencing collision outcomes, as described by Milton et al. [24]. The Multinomial Logistic Regression model estimates the log odds of each crash severity category relative to a reference category. The general equation for the Multinomial Logistic Regression model can be expressed as follows, as shown by Shiran et al. [36]:
$$\log\left(P(Y = j \mid X)\right) = \beta_{j0} + \beta_{j1} X_1 + \beta_{j2} X_2 + \cdots + \beta_{jk} X_k$$
The probability of crash severity category $j$ given predictor variables $X$ is represented by $P(Y = j \mid X)$. The estimated coefficients for each predictor variable in the model are denoted by $\beta_{j0}, \beta_{j1}, \beta_{j2}, \ldots, \beta_{jk}$. The predictor variables (behavioural aspects, road conditions, etc.) influencing crash severity are represented by $X_1, X_2, \ldots, X_k$.
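To illustrate how log odds relative to a reference category translate into category probabilities, the following minimal Python sketch may help. It assumes the usual multinomial logit convention in which the reference category has a linear predictor of zero; the coefficient values, predictor values, and function names are hypothetical and are not taken from the fitted model.

```python
import math


def linear_predictor(coefficients, x):
    """Intercept plus the dot product of slopes and predictor values."""
    intercept, *slopes = coefficients
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


def multinomial_probabilities(log_odds):
    """Convert log odds versus the reference category into probabilities.

    log_odds maps each non-reference category to its linear predictor.
    The reference category contributes exp(0) = 1 to the denominator.
    """
    exp_terms = {cat: math.exp(eta) for cat, eta in log_odds.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probs = {cat: val / denominator for cat, val in exp_terms.items()}
    probs["reference"] = 1.0 / denominator
    return probs


# Hypothetical coefficients [intercept, slope_1, slope_2] per category.
coefs = {
    "fatal": [-4.0, 1.0, 0.5],
    "major": [-2.0, 0.8, 0.2],
    "minor": [-0.5, 0.3, 0.1],
}
x = [1, 0]  # first indicator present, second absent (illustrative)

log_odds = {cat: linear_predictor(b, x) for cat, b in coefs.items()}
probs = multinomial_probabilities(log_odds)
```

By construction the probabilities sum to one, and the ratio of any category's probability to the reference probability equals the exponential of that category's linear predictor.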
This model, grounded in statistical principles, was designed to predict the degrees of crash severity and is helpful when multiple categories are present in the targeted variable [37]. The development phase categorizes the dataset into Multiple Binary Logistic Regression sub-models, comparing each severity category against the reference category. This step ensures that the model captures the shades of difference that define each severity level.
Implementing this approach involves training multiple binary logistic regression sub-models, each offering insights into the relationships between predictor variables and crash severity. Dissecting the analysis into these sub-models gave a clear grasp of the factors that played a pivotal role in determining collision outcomes [38]. This phase formed the cornerstone of the study's empirical foundation. The methodology aligned with the complexities of real-world collision scenarios and made it possible to distinguish the full spectrum of severity levels [33].

4.2. Multi-Level Multinomial Logistic Regression

The Multi-Level Multinomial Logistic Regression model, an extension of Multinomial Logistic Regression, presents an effective framework for handling hierarchical or nested data structures. In this study, which aims to understand the impact of behavioural aspects on traffic collision severity, the methodology is applied to analyze crash severity while appropriately addressing nested data structures [39]. The Multi-Level Multinomial Logistic Regression model is well suited to modeling crash severity because it accounts for the hierarchical structure of the data and handles the categorical, multinomial nature of the dependent variable. Several previous studies have used Multi-Level Multinomial Logistic Regression for modeling crash severity [40,41].
The diversity in crash severity across various locations revealed an intriguing phenomenon. Within specific groups, the variation was smaller than that observed across these groups. This observation became pronounced when considering geographic divisions, distinctions between rural and urban areas, land use patterns, climate zones, and functional areas. Spatial disparities contributed significantly to the severity of injuries resulting from crashes. Therefore, a meticulous analysis of crash data necessitated the consideration of all levels of clustering. Neglecting to account for cluster-specific effects could lead to statistical inaccuracies, manifesting as biased parameter estimates, underestimated standard errors, and an exaggerated perception of statistical significance, as highlighted by Chen et al. [42].
The model was meticulously tailored to the unique structure of the dataset, accommodating the inherent variability in crash severity. During this process, the study categorized the dataset into distinct levels of crash severity, each compared against a reference point. This categorization aided the model in discerning the factors that differentiate, for example, a minor collision from a more severe one. Such granularity enriched the analysis, enabling us to capture the subtleties associated with varying crash outcomes [43].
Furthermore, the model goes beyond individual cases and acknowledges that diverse groups may exhibit distinct patterns. This is where random effects become important: they were incorporated to account for variations between groups, which might experience crashes differently due to unique circumstances. The general equation for the Multi-Level Multinomial Logistic Regression model can be expressed as follows, as discussed in [33,44]:
$$\log\left(P(Y_{ij} = j \mid X_{ij})\right) = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \cdots + \beta_{kj} X_{kij} + u_{0j} + u_{1j} W_{1j} + u_{2j} W_{2j} + \cdots + u_{kj} W_{kj}$$
The probability of crash severity category $j$ for individual $i$ within group $j$ given predictor variables $X_{ij}$ is represented by $P(Y_{ij} = j \mid X_{ij})$. The fixed effects coefficients for individual-level predictors are denoted by $\beta_{0j}, \beta_{1j}, \beta_{2j}, \ldots, \beta_{kj}$. The individual-level predictor variables influencing crash severity are represented by $X_{1ij}, X_{2ij}, \ldots, X_{kij}$. The random effects coefficients for group-level predictors are represented by $u_{0j}, u_{1j}, u_{2j}, \ldots, u_{kj}$. The group-level predictor variables influencing crash severity are denoted by $W_{1j}, W_{2j}, \ldots, W_{kj}$.
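The role of the group-level terms can be illustrated with a simplified random-intercept case, in which each group only shifts the log odds of every severity category by a group-specific amount. The Python sketch below is an assumption-laden illustration: the fixed-effect log odds, the group names, and the random-intercept values are all hypothetical, and no estimation is performed.

```python
import math


def category_probabilities(fixed_log_odds, random_intercepts):
    """Add group-level random intercepts to fixed-effect log odds, then
    convert to probabilities with the reference category (PDO) at zero."""
    eta = {
        cat: fixed_log_odds[cat] + random_intercepts.get(cat, 0.0)
        for cat in fixed_log_odds
    }
    denominator = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {cat: math.exp(v) / denominator for cat, v in eta.items()}
    probs["PDO"] = 1.0 / denominator
    return probs


# Fixed-effect log odds for one collision profile (hypothetical values).
fixed = {"Fatality": -5.0, "Major Injury": -3.0, "Minor Injury": -1.0}

# Hypothetical random intercepts for two groups, e.g. two districts.
group_effects = {
    "district_a": {"Fatality": 0.4, "Major Injury": 0.2, "Minor Injury": 0.1},
    "district_b": {"Fatality": -0.4, "Major Injury": -0.2, "Minor Injury": -0.1},
}

by_group = {
    group: category_probabilities(fixed, u) for group, u in group_effects.items()
}
```

The same collision profile receives different severity probabilities in the two groups, which is the between-group variation that a single-level model would fold into its error term.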

4.3. Model Evaluation

For model comparison, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are standard indices for comparing the fit of models with different degrees of complexity. The AIC originates from information theory and penalizes models for their complexity, seeking the best compromise between complexity and goodness of fit. A lower AIC value indicates a better model. Recent studies have shown the potential of the AIC in different domains and recommended it for model selection when analyzing heavy-tailed data [45]. The BIC, on the other hand, imposes a heavier penalty on model complexity. It is particularly helpful where sample sizes are limited, as it leans towards smaller models. Its best-known feature is that it identifies the appropriate model with increasing accuracy as the sample size grows [46].
AIC and BIC are also relevant in high-dimensional settings, where the number of predictors can be very large. In these contexts, the performance of the two criteria can vary: although both aim to balance model fit against complexity, their asymptotic behaviours can lead to different choices in the high-dimensional case [45]. This implies that the choice between AIC and BIC should depend on the specifics of the context and the characteristics of the dataset.
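For reference, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The short Python sketch below applies these definitions to two hypothetical models; the log-likelihoods, parameter counts, and sample size are made-up values chosen only to show that the two criteria can disagree.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the complex model gains 10 log-likelihood units
# at the cost of 7 extra parameters, with 500 observations.
n_obs = 500
simple = {"log_likelihood": -1000.0, "n_params": 5}
complex_ = {"log_likelihood": -990.0, "n_params": 12}

aic_simple = aic(**simple)
aic_complex = aic(**complex_)
bic_simple = bic(n_obs=n_obs, **simple)
bic_complex = bic(n_obs=n_obs, **complex_)

aic_prefers = "complex" if aic_complex < aic_simple else "simple"
bic_prefers = "complex" if bic_complex < bic_simple else "simple"
```

In this made-up case the AIC favours the larger model while the BIC favours the smaller one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is greater than about 7.4.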

5. Results and Discussion

5.1. Model Comparison

Two models were developed in this study: a Multinomial Logistic Regression model and a Multi-Level Multinomial Logistic Regression model. For both models, the target variable was the same: Collision Severity, with four categories: Minor Injury, Major Injury, PDO, and Fatality. The categories of Collision Severity were defined using the KABCO scale, which classifies collisions according to whether they led to Fatality, Major Injury, Minor Injury, or only Property Damage. PDO (Property Damage Only) was kept as the baseline in this modelling.
The two models, Multinomial Logistic Regression (MLR) and Multi-Level Multinomial Logistic Regression (MLMLR), were compared in terms of performance and complexity using two goodness-of-fit metrics, AIC and BIC, which guided model selection. Lower AIC and BIC values indicate a better balance between fit and parsimony. Specifically, an AIC or BIC difference of more than 10 is strong evidence in favour of the model with the lower value [47,48].
As shown in Table 2, for the Multinomial Logistic Regression model the AIC and BIC are approximately 427,550 and 428,464, respectively. In contrast, the Multi-Level Multinomial Logistic Regression model exhibits lower AIC (around 421,118) and BIC (about 422,365) values, indicating a better trade-off between fit and complexity. When interpreting the AIC and BIC, smaller values signify better model performance in explaining collision severity without unnecessary complexity. The analysis favours the Multi-Level Multinomial Logistic Regression model, whose AIC and BIC values are lower than those of the Multinomial Logistic Regression model by far more than 10 units. This suggests that the MLMLR model better captures the variation in collision severity, making it the preferred model. Each model was built sequentially, and significant variables with a p-value less than 0.1 were considered.
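Using the approximate values quoted above, the size of the gap can be worked out directly:

$$\Delta\mathrm{AIC} \approx 427{,}550 - 421{,}118 = 6{,}432, \qquad \Delta\mathrm{BIC} \approx 428{,}464 - 422{,}365 = 6{,}099$$

Both differences exceed the threshold of 10 by several orders of magnitude.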

5.2. Multi-Level Multinomial Logistic Regression

This study presents the results of the Multi-Level Multinomial Logistic Regression analysis, focusing on the impact of various variables on collision severity. First, ‘Collision Type (Head-On)’ is examined, which yields odds ratios of 1.9 for fatality, 3.6 for major injuries, and 1.2 for minor injuries, as shown in Table 3. These findings underscore the heightened risk of fatality and major injuries in head-on collisions, with a comparatively modest elevation in minor injuries. For ‘Collision Type (Rear-End)’, odds ratios of 0.75 for fatality, 0.65 for major injuries, and 1.23 for minor injuries are observed. This reveals a lower likelihood of fatality and major injuries in rear-end collisions, possibly due to their lower impact force. Minor injuries show a slight elevation. Focusing on Traffic Control, collisions with traffic control measures have odds ratios of 46 for fatality, 431 for major injuries, and 1 for minor injuries compared to those without such control. This emphasizes the substantial protective role of traffic control in reducing the risk of major injuries, while minor injuries maintain a similar likelihood.
Regarding seatbelt usage (‘Belted (No)’), the odds ratios are 16 for fatality, 10 for major injuries, and 4 for minor injuries, compared to the baseline ‘Belted (Yes)’ category, highlighting the importance of seatbelt use in averting severe outcomes. ‘Bike (Yes)’ involvement in collisions results in odds ratios of 13 for fatality, 67 for major injuries, and 41 for minor injuries, underlining the heightened vulnerability associated with bicycle-involved collisions.
As shown in Table 3, when ‘Animal (Yes)’ is involved, the odds ratios of 5.7 for fatality, 0.56 for major injuries, and 1.5 for minor injuries indicate a measured elevation in risk, especially for fatality. Pedestrian-involved collisions display striking odds ratios of 354 for fatality, 436 for major injuries, and 258 for minor injuries. This underscores the significantly heightened risk across all injury severity levels in such collisions. Finally, ‘Area Type (Urban)’ scenarios demonstrate odds ratios of 0.6 for fatality, 0.67 for major injuries, and 0.87 for minor injuries, highlighting the protective nature of urban areas. These findings underscore the critical role of various factors in shaping collision severity outcomes, with implications for safety interventions and policy measures.
Driver-dependent factors determining collision severity include seatbelt use, which greatly lowers the risk of severe outcomes, and accident type, which is often associated with driver conduct such as speeding or distraction. Through careful driving, drivers also help to lower hazards for vulnerable road users such as cyclists and pedestrians. Driver-independent factors, on the other hand, are those governed by external conditions or infrastructure; they include traffic control measures, road design elements like medians and lane separations, environmental conditions like urban or rural settings, and animal involvement. Differentiating these factors helps target driver behaviour through education and enforcement, and address systemic hazards through road design and policy measures.
The equations for the Multi-Level Multinomial Logistic Regression model developed for crash severity are presented below. In these equations, the PDO collision severity was used as the reference category, which simplifies the interpretation of the model coefficients. The odds ratios tell how the odds of each non-reference category change relative to the reference category. The reference category acts as a baseline, making it easier to interpret the impact of predictors on the probability of other outcomes by considering their deviations from the reference category.
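As a reading aid for odds ratios, the following minimal Python sketch shows the standard relationships between a log-odds coefficient, its odds ratio, and the implied percentage change in the odds. The two ratios passed in (0.75 and 1.9) are the fatality odds ratios reported above for rear-end and head-on collisions; the helper names and the baseline probability of 0.5 are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a log-odds coefficient: exp(beta)."""
    return math.exp(coefficient)


def percent_change_in_odds(ratio):
    """Percentage change in the odds relative to the reference level."""
    return (ratio - 1.0) * 100.0


def shifted_probability(baseline_probability, ratio):
    """Apply an odds ratio to a baseline probability.

    The probability here is for one severity category within the
    two-way contrast between that category and the reference category.
    """
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


# Fatality odds ratios quoted in the text for two collision types.
rear_end_fatality_change = percent_change_in_odds(0.75)
head_on_fatality_change = percent_change_in_odds(1.9)

# Hypothetical baseline: even odds, then an odds ratio of 3.
example_probability = shifted_probability(0.5, 3.0)
```

An odds ratio below 1 corresponds to a negative coefficient and lower odds than the reference level, while a ratio above 1 corresponds to a positive coefficient and higher odds.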
Fatality:
$$\log\frac{P(Y = \text{Fatal})}{P(Y = \text{PDO})} = -10.4 + 1.85 X_1 - 0.34 X_2 - 0.38 X_3 + 1.6 X_4 + 0.25 X_5 + 3.8 X_6 + 4.5 X_7 - 0.76 X_8 + 9.2 X_9 - 1.6 X_{10} + 2.8 X_{11}$$
Major Injury:
$$\log\frac{P(Y = \text{Major Inj})}{P(Y = \text{PDO})} = -7.01 + 1.4 X_1 - 0.69 X_2 - 1.1 X_3 + 4.1 X_4 + 4.4 X_5 + 2.3 X_6 + 3.6 X_7 - 0.46 X_8 + 7 X_9 + 0.49 X_{11}$$
Minor Injury:
$$\log\frac{P(Y = \text{Minor Inj})}{P(Y = \text{PDO})} = -0.7 + 0.62 X_1 - 0.64 X_4 - 0.41 X_5 + 1.2 X_6 + 2.9 X_7 + 5.7 X_9$$
The variables in the above equations describe traffic collisions and their characteristics. The first three variables, $X_1$, $X_2$, and $X_3$, represent different types of collisions: Head-On, Rear-End, and Sideswipe collisions, respectively. Traffic control measures are captured in $X_4$ (Yes) and $X_5$ (Other), indicating whether standard traffic control was present. The belting status of individuals involved in the collision is indicated by $X_6$ (‘Belted (No)’, with ‘Belted (Yes)’ as the baseline), while the presence of a bicycle in the incident is denoted by $X_7$ (Yes). Variables $X_8$ (Yes) and $X_9$ (Yes) account for the involvement of animals and pedestrians, respectively. The type of area where the collision occurred is specified by $X_{10}$ (Urban), and $X_{11}$ describes the roadway as a two-way divided road. This set of variables aims to capture a detailed picture of traffic collisions.

6. Conclusions

The study explored factors influencing traffic accident severities in North Virginia, US, utilizing models capable of handling more than two categories in the output variables. The Multinomial Logistic Regression (MLR) and Multi-Level Multinomial Logistic Regression (MLMLR) models were utilized for analysis. The results show that the Multinomial Logistic Regression model, including Collision Type, Traffic Control, Belt Usage, Bicycle Presence, Animal Incidents, and Pedestrian Involvement, identified the key factors influencing collision severity. These variables showed statistically significant coefficients, indicating their impact on outcomes like Fatality, Major Injury, and Minor Injury.
Moreover, the Multi-Level Multinomial Logistic Regression model, accommodating hierarchical data structures, provided a nuanced perspective and reaffirmed the influence of factors identified by the Multinomial Logistic Regression model. It introduced additional variables like Area Type and Roadway Description, shedding light on their roles in determining collision severity. The Multinomial Logistic Regression model highlighted significantly higher odds ratios for fatality and major injury when Pedestrians and Bicycles were involved in accidents, emphasizing the severity of such incidents.
Head-on collisions were substantially more likely to result in fatalities and catastrophic injuries than rear-end collisions, which were typically less severe. Collisions in places with traffic control measures had a significantly lower risk of fatality and severe injury, highlighting the protective impact of these systems. Not wearing a seatbelt significantly raised the probability of serious outcomes, underscoring the importance of seatbelt use in lowering injury severity. Collisions involving animals marginally increased the probability of death. Urban areas had a protective effect, with a lower risk of severe outcomes than non-urban locations.
The Multi-Level Multinomial Logistic Regression model, with its ability to account for the hierarchical nature of the data, presented an added advantage, potentially offering more robust and fine-grained distinctions in predicting collision severity outcomes. The findings contribute to a better understanding of factors influencing accident severities, providing valuable information for improving road safety strategies and mitigating the impact of collisions on individuals and the economy.
To lower the probability of accidents and collision severity, road managers and designers should improve traffic management features such as traffic lights and roundabouts, especially in high-risk locations. To safeguard vulnerable road users, dedicated bicycle lanes, pedestrian crosswalks, and signalized crossings can help to create safer junctions. Roads that are prone to head-on crashes should have median barriers, rumble strips, and lane separation, while urban planning principles such as lower speed limits, narrower lanes, and better lighting should be implemented to increase overall safety. Seatbelt use should be promoted through stronger rules and awareness efforts, and animal crash hotspots should be addressed through signage, fencing, and wildlife corridors. These targeted measures can substantially improve road safety and reduce collision risk.
Future studies could consider other modeling techniques (e.g., Convolutional Neural Networks and Bidirectional Long Short-Term Memory Networks [49]) or interaction terms. Future work could also assess the practical application of the countermeasures suggested by the model. In addition, future studies could distinguish between variables that are causes of traffic accidents and those that are consequences. This distinction may improve the model’s ability to estimate the factors that determine accident severity. Future work could also assess collision severity using traffic conflict techniques, simulation-based models [50,51,52,53,54], and risk-based design measures [55,56,57].

Author Contributions

Conceptualization, R.A., G.M., and Y.T.A.; methodology, R.A., G.M., and Y.T.A.; software, R.A., K.W., and Y.T.A.; validation, R.A., K.W., G.M., and Y.T.A.; formal analysis, R.A., K.W., G.M., and Y.T.A.; investigation, R.A., K.W., G.M., and Y.T.A.; resources, R.A. and Y.T.A.; data curation, R.A., K.W., G.M., and Y.T.A.; writing—original draft preparation, R.A., K.W., G.M., and Y.T.A.; writing—review and editing, R.A., K.W., G.M., and Y.T.A.; visualization, R.A., K.W., G.M., and Y.T.A.; supervision, R.A. and G.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study can be obtained from https://www.virginiaroads.org/search?groupIds=aef4d1831da7425aad16504337ce7142 (accessed on 25 January 2024).

Conflicts of Interest

There are no conflicts of interest.

References

  1. World Health Organization. Road Traffic Injuries. Available online: https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries (accessed on 25 February 2024).
  2. NHTSA. Newly Released Estimates Show Traffic Fatalities Reached a 16-Year High in 2021; NHTSA: Washington, DC, USA, 2022. Available online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813298 (accessed on 29 December 2024).
  3. Kidd, D.G.; Chaudhary, N.K. Changes in the sources of distracted driving among Northern Virginia drivers in 2014 and 2018: A comparison of results from two roadside observation surveys. J. Saf. Res. 2019, 68, 131–138.
  4. Nurullah, A.S.; Thomas, J.; Vakilian, F. The Prevalence of Cell Phone Use While Driving in a Canadian Province. Transp. Res. Part F Traffic Psychol. Behav. 2013, 19, 52–62.
  5. Macdonald, S.; Zhao, J.; Martin, G.; Brubacher, J.; Stockwell, T.; Arason, N.; Steinmetz, S.; Chan, H. The Impact on Alcohol-Related Collisions of the Partial Decriminalization of Impaired Driving in British Columbia, Canada. Accid. Anal. Prev. 2013, 59, 200–205.
  6. Adanu, E.K.; Dzinyela, R.; Agyemang, W. A comprehensive study of child pedestrian crash outcomes in Ghana. Accid. Anal. Prev. 2023, 189, 107146.
  7. Su, Z.; Woodman, R.; Smyth, J.; Elliott, M. The Relationship between Aggressive Driving and Driver Performance: A Systematic Review with Meta-Analysis. Accid. Anal. Prev. 2023, 183, 106972.
  8. Robartes, E.; Chen, T.D. The Effect of Crash Characteristics on Cyclist Injuries: An Analysis of Virginia Automobile-Bicycle Crash Data. Accid. Anal. Prev. 2017, 104, 165–173.
  9. Li, G.; Li, Y.; Li, Y.; Craig, B.; Wu, X. Investigation of Contributing Factors to Traffic Crash Severity in Southeast Texas Using Multiple Correspondence Analysis. J. Road Saf. 2021, 32, 15–28.
  10. Lestina, D.C.; Williams, A.F.; Lund, A.K.; Zador, P.; Kuhlmann, T.P. Motor Vehicle Crash Injury Patterns and the Virginia Seat Belt Law. JAMA J. Am. Med. Assoc. 1991, 265, 1409.
  11. Noland, R.B.; Quddus, M.A. A Spatially Disaggregate Analysis of Road Casualties in England. Accid. Anal. Prev. 2004, 36, 973–984.
  12. Bahrololoom, S.; Young, W.; Logan, D. A Random Parameter Model of Factors Influencing Bicycle Fatal and Serious Injury Crashes in Victoria, Australia. Available online: https://research.monash.edu/en/publications/a-random-parameter-model-of-factors-influencing-bicycle-fatal-and (accessed on 25 June 2024).
  13. Olowosegun, A.; Babajide, N.; Akintola, A.; Fountas, G.; Fonzone, A. Analysis of pedestrian accident injury-severities at road junctions and crossings using an advanced random parameter modelling framework: The case of Scotland. Accid. Anal. Prev. 2022, 169, 106610.
  14. Zamani, A.; Behnood, A.; Davoodi, S.R. Temporal stability of pedestrian injury severity in pedestrian-vehicle crashes: New insights from random parameter logit model with heterogeneity in means and variances. Anal. Methods Accid. Res. 2021, 32, 100184.
  15. Li, Y.; Song, L.; Fan, W.D. Day-of-the-week variations and temporal instability of factors influencing pedestrian injury severity in pedestrian-vehicle crashes: A random parameters logit approach with heterogeneity in means and variances. Anal. Methods Accid. Res. 2021, 29, 100152.
  16. Guo, M.; Yuan, Z.; Janson, B.; Yang, Y. A Two-Level Random Intercept Logit Model for Predicting Pedestrian-Vehicle Crash. In Proceedings of the International Conference on Transportation and Development 2020, Seattle, WA, USA, 26–29 May 2020; American Society of Civil Engineers: Reston, VA, USA, 2020; pp. 68–81.
  17. Edwards, M.; Leonard, D. Effects of large vehicles on pedestrian and pedalcyclist injury severity. J. Saf. Res. 2022, 82, 275–282.
  18. Li, Y.; Fan, W. Modelling Severity of Pedestrian-Injury in Pedestrian-Vehicle Crashes with Latent Class Clustering and Partial Proportional Odds Model: A Case Study of North Carolina. Accid. Anal. Prev. 2019, 131, 284–296.
  19. Brázdil, R.; Chromá, K.; Zahradníček, P.; Dobrovolný, P.; Dolák, L. Weather and traffic accidents in the Czech Republic, 1979–2020. Theor. Appl. Climatol. 2022, 149, 153–167.
  20. Zou, Y.; Zhang, Y.; Cheng, K. Exploring the impact of climate and extreme weather on fatal traffic accidents. Sustainability 2021, 13, 390.
  21. Abdel-Aty, M.; Ekram, A.-A.; Huang, H.; Choi, K. A Study on Crashes Related to Visibility Obstruction due to Fog and Smoke. Accid. Anal. Prev. 2011, 43, 1730–1737.
  22. Tandrayen-Ragoobur, V. The Economic Burden of Road Traffic Accidents and Injuries: A Small Island Perspective. Int. J. Transp. Sci. Technol. 2024, in press.
  23. Ye, X.; Pendyala, R.M.; Shankar, V.; Konduri, K.C. A simultaneous equations model of crash frequency by severity level for freeway sections. Accid. Anal. Prev. 2013, 57, 140–149.
  24. Milton, J.C.; Shankar, V.N.; Mannering, F.L. Highway Accident Severities and the Mixed Logit Model: An Exploratory Empirical Analysis. Accid. Anal. Prev. 2008, 40, 260–266.
  25. Chen, F.; Chen, S.; Ma, X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. J. Saf. Res. 2018, 65, 153–159.
  26. Ma, C.; Hao, W.; Xiang, W.; Yan, W. The impact of aggressive driving behavior on driver-injury severity at highway-rail grade crossings accidents. J. Adv. Transp. 2018, 2018, 9841498.
  27. Washington, S.; Karlaftis, M.G.; Mannering, F.L. Statistical and Econometric Methods for Transportation Data Analysis; Informa: London, UK, 2020.
  28. Hedeker, D. A Mixed-Effects Multinomial Logistic Regression Model. Stat. Med. 2003, 22, 1433–1446.
  29. Virginia Roads. Available online: https://www.virginiaroads.org/search?groupIds=aef4d1831da7425aad16504337ce7142 (accessed on 25 June 2024).
  30. Chu, X.; Ilyas, I.; Krishnan, S.; Wang, J. Data Cleaning: Overview and Emerging Challenges. In Proceedings of the SIGMOD/PODS’16: International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016.
  31. RStudio Team. RStudio: Integrated Development Environment for R; RStudio: Boston, MA, USA, 2023.
  32. Moomen, M.; Molan, A.M.; Ksaibati, K. A random parameters multinomial logit model analysis of median barrier crash injury severity on Wyoming interstates. Sustainability 2023, 15, 10856.
  33. Usman, T.; Fu, L.; Miranda-Moreno, L.F. Injury Severity Analysis: Comparison of Multilevel Logistic Regression Models and Effects of Collision Data Aggregation. J. Mod. Transp. 2016, 24, 73–87.
  34. Yasmin, S.; Eluru, N. Evaluating Alternate Discrete Outcome Frameworks for Modeling Crash Injury Severity. Accid. Anal. Prev. 2013, 59, 506–521.
  35. Dong, C.; Richards, S.H.; Huang, B.; Jiang, X. Identifying the Factors Contributing to the Severity of Truck-Involved Crashes. Int. J. Inj. Control Saf. Promot. 2013, 22, 116–126.
  36. Shiran, G.; Imaninasab, R.; Khayamim, R. Crash Severity Analysis of Highways Based on Multinomial Logistic Regression Model, Decision Tree Techniques and Artificial Neural Network: A Modeling Comparison. Sustainability 2021, 13, 5670.
  37. Xie, Y.; Zhao, K.; Huynh, N. Analysis of Driver Injury Severity in Rural Single-Vehicle Crashes. Accid. Anal. Prev. 2012, 47, 36–44.
  38. Mannering, F.; Bhat, C.R.; Shankar, V.; Abdel-Aty, M. Big Data, Traditional Data and the Tradeoffs between Prediction and Causality in Highway-Safety Analysis. Anal. Methods Accid. Res. 2020, 25, 100113.
  39. Wang, Y.; Kockelman, K.M. A Poisson-Lognormal Conditional-Autoregressive Model for Multivariate Spatial Analysis of Pedestrian Crash Counts across Neighborhoods. Accid. Anal. Prev. 2013, 60, 71–84.
  40. Islam, S.M.; Washington, S.; Kim, J.; Haque, M.M. A hierarchical multinomial logit model to examine the effects of signal strategies on right-turn crash injury severity at signalised intersections. Accid. Anal. Prev. 2023, 188, 107091.
  41. IM Almadi, A.; Al Mamlook, R.E.; Ullah, I.; Alshboul, O.; Bandara, N.; Shehadeh, A. Vehicle collisions analysis on highways based on multi-user driving simulator and multinomial logistic regression model on US highways in Michigan. Int. J. Crashworthiness 2023, 28, 770–785.
  42. Chen, F.; Chen, S. Injury Severities of Truck Drivers in Single- and Multi-Vehicle Accidents on Rural Highways. Accid. Anal. Prev. 2011, 43, 1677–1688.
  43. Kim, J.-K.; Ulfarsson, G.F.; Kim, S.; Shankar, V.N. Driver-Injury Severity in Single-Vehicle Crashes in California: A Mixed Logit Analysis of Heterogeneity due to Age and Gender. Accid. Anal. Prev. 2013, 50, 1073–1081.
  44. Quddus, M. Effects of Geodemographic Profiles of Drivers on Their Injury Severity from Traffic Crashes Using Multilevel Mixed-Effects Ordered Logit Model. Transp. Res. Rec. J. Transp. Res. Board 2015, 2514, 149–157.
  45. Bai, Z.; Choi, K.P.; Fujikoshi, Y.; Hu, J. Asymptotics of AIC, BIC and Cp Model Selection Rules in High-Dimensional Regression. Bernoulli 2022, 28, 2375–2403.
  46. Lanfear, R.; Calcott, B.; Kainer, D.; Mayer, C.; Stamatakis, A. Selecting Optimal Partitioning Schemes for Phylogenomic Datasets. BMC Evol. Biol. 2014, 14, 82.
  47. Raftery, A.E. Bayesian Model Selection in Social Research. Sociol. Methodol. 1995, 25, 111–163.
  48. Zucchini, W. An introduction to model selection. J. Math. Psychol. 2000, 44, 41–61.
  49. Alhaek, F.; Liang, W.; Rajeh, T.M.; Javed, M.H.; Li, T. Learning spatial patterns and temporal dependencies for traffic accident severity prediction: A deep learning approach. Knowl.-Based Syst. 2024, 286, 111406.
  50. Liu, Y.; Alsaleh, R.; Sayed, T. Modelling motorized and non-motorized vehicle conflicts using multiagent inverse reinforcement learning approach. Transp. B Transp. Dyn. 2024, 31, 2314762.
  51. Nasernejad, P.; Sayed, T.; Alsaleh, R. Multiagent modeling of pedestrian-vehicle conflicts using Adversarial Inverse Reinforcement Learning. Transp. A Transp. Sci. 2023, 19, 2061081.
  52. Lanzaro, G.; Sayed, T.; Alsaleh, R. Can motorcyclist behavior in traffic conflicts be modeled? A deep reinforcement learning approach for motorcycle-pedestrian interactions. Transp. B Transp. Dyn. 2022, 10, 396–420.
  53. Lanzaro, G.; Sayed, T.; Alsaleh, R. Modeling motorcyclist–pedestrian near misses: A multiagent adversarial inverse reinforcement learning approach. J. Comput. Civ. Eng. 2022, 36, 04022038.
  54. Alsaleh, R.; Sayed, T. Do road users play Nash Equilibrium? A comparison between Nash and Logistic stochastic Equilibriums for multiagent modeling of road user interactions in shared spaces. Expert Syst. Appl. 2022, 1, 117710.
  55. Alsaleh, R.; Lanzaro, G.; Sayed, T. Incorporating design consistency into risk-based geometric design of horizontal curves: A reliability-based optimization framework. Transp. A Transp. Sci. 2024, 3, 2174356.
  56. Lanzaro, G.; Alsaleh, R.; Sayed, T. Investigating the impact of correlation on system multimode reliability-based analysis of highway geometric design. Transp. A Transp. Sci. 2021, 10, 1027–1054.
  57. Alsaleh, R.; Sayed, T.; Ismail, K.; AlRukaibi, F. System reliability as a surrogate measure of safety for horizontal curves: Methodology and case studies. Transp. A Transp. Sci. 2020, 1, 957–986.
Table 1. Modelling dataset variables.
| Variables | Description | Percentage/Mean |
|---|---|---|
| Weather condition | Collision in rainy condition | 16% (Adverse) |
| Roadway alignment | Collision at different alignment like straight or intersection | 14.71% (Curve); 85% (Straight) |
| Roadway description | Collision based on road type | 3.19% (One-Way); 57% (Two-Way Divided); 39.73% (Two-Way Undivided) |
| Collision type | In what way did the collision happen | 26.22% (Angle); 21.7% (Fixed); 2.3% (Head-On); 1.6% (No Collision); 27.51% (Rear-End); 10.04% (Sideswipe) |
| Alcohol | Driver under influence of alcohol at the time of collision | 5.8% |
| Unbelted | At the time of collision if the seat belts were on or not | 4.6% |
| Bike | Collision with bike or not | 0.48% |
| Traffic signal | If the traffic control signal were present in the area | 80.4% |
| Area type (urban) | If the collision happened in urban or rural area | 74.27% (Urban); 25.77% (Rural) |
| Distracted | Any distractions in external environment for driver | 17.43% |
| Drowsy | Sleepy | 2.7% |
| Crash date | Collision happened on weekday or weekend | 73.61% (Weekday); 26.38% (Weekend) |
| Drug | Driver under influence of drugs/medicine at the time of collision | 1.01% |
| Motor | Were any cars or motor vehicles involved in the collision | 1.71% |
| Ped | Any pedestrian involved | 1.18% |
| Speed | Driver speeding or not at the time of collision | 20.77% |
| Animal | Collision involving animals | 6.4% |
Table 2. Model performance.
| Criterion | Multinomial Logistic Regression | Multi-Level Multinomial Logistic Regression |
|---|---|---|
| AIC | 427,549 | 421,118 |
| BIC | 428,464 | 422,365 |
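The comparison in Table 2 can be restated as simple differences between the two models, since lower AIC and BIC values indicate the preferred model. The sketch below only repeats that arithmetic with the values from Table 2; the dictionary layout and the helper name `criterion_differences` are assumptions made for illustration.

```python
# AIC and BIC values as reported in Table 2.
fit = {
    "MLR": {"AIC": 427_549, "BIC": 428_464},
    "MLMLR": {"AIC": 421_118, "BIC": 422_365},
}


def criterion_differences(base, candidate):
    """Return base minus candidate for each criterion (positive favors the candidate)."""
    return {name: base[name] - candidate[name] for name in base}


delta = criterion_differences(fit["MLR"], fit["MLMLR"])
print(delta)  # {'AIC': 6431, 'BIC': 6099}
```

Both differences are positive, so the multi-level model is preferred under either criterion, even though BIC penalizes its additional parameters more heavily.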
Table 3. Output from Multi-Level MNL model, including coefficients, p-value, and odds ratio.
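| Variables | Fatality: Coef. | Fatality: p-Value | Fatality: Odds Ratio (exp(coef)) | Major Injury: Coef. | Major Injury: p-Value | Major Injury: Odds Ratio (exp(coef)) | Minor Injury: Coef. | Minor Injury: p-Value | Minor Injury: Odds Ratio (exp(coef)) |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | −10.4 | <0.01 | | −7.01 | <0.01 | | −0.7 | <0.01 | |
| X1: Collision Type (Head-On); Angle * | 1.85 | 0.06 | 1.9 | 1.4 | <0.01 | 3.6 | 0.62 | <0.5 | 1.2 |
| X2: Collision Type (Rear-End); Angle * | −0.34 | <0.01 | 0.75 | −0.69 | 0.04 | 0.65 | - | - | - |
| X3: Collision Type (Sideswipe); Angle * | −0.38 | <0.01 | 0.68 | −1.1 | 0.02 | 0.38 | - | - | - |
| X4: Traffic Control (Yes); No * | 1.6 | <0.01 | 46 | 4.1 | <0.01 | 431 | −0.64 | <0.01 | 1 |
| X5: Traffic Control (Other); No * | 0.25 | <0.01 | 18 | 4.4 | <0.01 | 345 | −0.41 | <0.01 | 0.84 |
| X6: Belted (No); Yes * | 3.8 | <0.01 | 44.70 | 2.3 | <0.01 | 10 | 1.2 | <0.01 | 4 |
| X7: Bike (Yes); No * | 4.5 | <0.01 | 13 | 3.6 | <0.01 | 67 | 2.9 | <0.01 | 41 |
| X8: Animal (Yes); No * | −0.76 | <0.01 | 5.7 | −0.46 | <0.01 | 0.56 | - | - | - |
| X9: Pedestrians (Yes); No * | 9.2 | <0.01 | 354 | 7.0 | <0.01 | 436 | 5.7 | <0.01 | 258 |
| X10: Area Type (Urban); Rural * | −1.6 | 0.10 | 0.60 | - | - | - | - | - | - |
| X11: Roadway Description (Two-way divided); One-way * | 2.8 | <0.03 | 1.7 | 0.49 | 0.06 | 2.7 | - | - | - |
| Random effect coef.: Var (V_dot) | 0.67 | | | 0.64 | | | 0.86 | | |

Goodness of fit: AIC = 421,118; BIC = 422,365.
* Reference category.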
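The odds-ratio columns in Table 3 are labeled exp(coef), meaning that each odds ratio is the exponential of the corresponding coefficient. The minimal sketch below shows that conversion for the unbelted indicator (X6); the helper name `odds_ratio` is an assumption for this example. Because the published coefficients are rounded, values recomputed this way may differ from the reported odds ratios for other rows.

```python
import math


def odds_ratio(coef: float) -> float:
    """Convert a logit coefficient into an odds ratio relative to the reference category."""
    return math.exp(coef)


# Coefficients for X6 (Belted = No) taken from Table 3.
unbelted = {"Fatality": 3.8, "Major Injury": 2.3}
for outcome, coef in unbelted.items():
    print(f"{outcome}: odds ratio = {odds_ratio(coef):.2f}")
# Fatality: odds ratio = 44.70
# Major Injury: odds ratio = 9.97
```

A coefficient of zero gives an odds ratio of one (no change relative to the reference category), positive coefficients give ratios above one, and negative coefficients give ratios below one.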
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alsaleh, R.; Walia, K.; Moshiri, G.; Alsaleh, Y.T. Traffic Collision Severity Modeling Using Multi-Level Multinomial Logistic Regression Model. Appl. Sci. 2025, 15, 838. https://doi.org/10.3390/app15020838
