Comparative Analysis of Parametric and Non-Parametric Data-Driven Models to Predict Road Crash Severity among Elderly Drivers Using Synthetic Resampling Techniques
Abstract
1. Introduction
1.1. Background
1.2. Elderly Drivers
1.3. Applications of Machine Learning Models in Crash Severity Prediction
1.4. Research Objectives and Novelties
- Compare the performance of parametric (logistic regression and LDA) and non-parametric (random forest and XGBoost) machine learning models in predicting crash severity among elderly drivers, using crash data from the Commonwealth of Virginia (USA) from 2014 to 2021. Model performance is assessed with accuracy, sensitivity, specificity, balanced accuracy, and geometric mean.
- Investigate the impact of class imbalance on the predictive performance of these models and evaluate the potential benefits of employing synthetic resampling techniques, specifically random over-sampling examples (ROSE) and synthetic minority over-sampling technique (SMOTE), to address this issue.
- Assess how training the machine learning models on the original, ROSE-balanced, and SMOTE-balanced datasets affects their ability to generalize to unseen data, by comparing cross-validation results with test dataset results.
- Identify the most effective combination of machine learning models and resampling techniques that provides the best predictive performance in terms of sensitivity, specificity, balanced accuracy, and geometric mean.
- Evaluate the effect of various contributing factors on crash severity among elderly drivers, providing guidance for risk mitigation and safety improvement strategies.
- Provide insights and recommendations for future research and practical applications of machine learning models and resampling techniques in the field of crash severity prediction and traffic safety management.
1.5. Structure of the Paper
2. Methodology
2.1. Research Framework
2.2. Data Description
2.3. Resampling Techniques
2.3.1. Synthetic Minority Over-Sampling Technique (SMOTE)
2.3.2. Random Over-Sampling Examples (ROSE)
2.4. Parametric Machine Learning Models
2.4.1. Logistic Regression (LR)
2.4.2. Linear Discriminant Analysis (LDA)
2.5. Non-Parametric Machine Learning Models
2.5.1. Random Forest (RF)
2.5.2. eXtreme Gradient Boosting (XGBoost)
2.6. K-Fold Cross-Validation
2.7. Evaluation Metrics
2.7.1. Confusion Matrix
2.7.2. Accuracy
2.7.3. Sensitivity
2.7.4. Specificity
2.7.5. Geometric Mean
2.7.6. Balanced Accuracy
3. Results and Discussion
3.1. Comparison between Cross-Validation and Test Results
3.1.1. Original Dataset
3.1.2. ROSE Dataset
3.1.3. SMOTE Dataset
3.2. Results of the Crash Severity Models
3.2.1. Models Trained on Original Training Set
3.2.2. Models Trained on ROSE Training Set
3.2.3. Models Trained on SMOTE Training Set
3.3. Effectiveness of Resampling Techniques on Predictive Models
3.3.1. ROSE
3.3.2. SMOTE
3.4. The Effect of Influential Factors on Crash Severity
3.5. Operational and Management Implications
4. Conclusions
5. Study Limitations and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variable | Category | Count | Percentage |
---|---|---|---|
Crash severity | Non-severe | 148,473 | 94.09% |
Crash severity | Severe injury | 9327 | 5.91% |
Crash type | Fixed-object | 13,399 | 8.49% |
Crash type | Head-on | 3813 | 2.42% |
Crash type | Overturned | 1156 | 0.73% |
Crash type | Other | 10,479 | 6.64% |
Crash type | Rear-end | 52,953 | 33.56% |
Crash type | Sideswipe | 17,471 | 11.07% |
Crash type | Angle | 58,529 | 37.09% |
Traffic signal | Yes | 40,998 | 25.98% |
Traffic signal | No | 116,802 | 74.02% |
Weather condition | No adverse condition | 137,196 | 86.94% |
Weather condition | Adverse condition | 20,604 | 13.06% |
Roadway alignment | Straight | 142,472 | 90.29% |
Roadway alignment | Curve | 15,328 | 9.71% |
Roadway type | Two-way divided | 91,375 | 57.91% |
Roadway type | Two-way undivided | 62,206 | 39.42% |
Roadway type | One-way | 4219 | 2.67% |
Work zone | No | 153,567 | 97.32% |
Work zone | Yes | 4233 | 2.68% |
Alcohol | Yes | 3483 | 2.21% |
Alcohol | No | 154,317 | 97.79% |
Belted | No | 4271 | 2.71% |
Belted | Yes | 153,529 | 97.29% |
Bike | Yes | 915 | 0.58% |
Bike | No | 156,885 | 99.42% |
Distracted | Yes | 28,054 | 17.78% |
Distracted | No | 129,746 | 82.22% |
Drowsy | Yes | 2733 | 1.73% |
Drowsy | No | 155,067 | 98.27% |
Drug | Yes | 716 | 0.45% |
Drug | No | 157,084 | 99.55% |
Pedestrian | Yes | 1605 | 1.02% |
Pedestrian | No | 156,195 | 98.98% |
Speed violation | Yes | 20,211 | 12.81% |
Speed violation | No | 137,589 | 87.19% |
Area type | Urban | 121,884 | 77.24% |
Area type | Rural | 35,916 | 22.76% |
Animal | Yes | 5060 | 3.21% |
Animal | No | 152,740 | 96.79% |
Posted speed (mph) | - | 157,800 | - |
Weekend | Yes | 32,622 | 20.67% |
Weekend | No | 125,178 | 79.33% |

Crash Severity Class | Training Data (Original) | Training Data (ROSE) | Training Data (SMOTE) | Test Data |
---|---|---|---|---|
Severe | 6529 (5.9%) | 55,197 (50%) | 97,935 (48.5%) | 2798 (5.9%) |
Non-severe | 103,932 (94.1%) | 55,264 (50%) | 103,932 (51.5%) | 44,541 (94.1%) |
Total | 110,461 | 110,461 | 201,867 | 47,339 |
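
The class counts in the table above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the table, while the helper names (`severe_share`, `imbalance_ratio`) are hypothetical and not taken from the paper.

```python
# Training-set class counts copied from the table above.
train = {
    "original": {"severe": 6529, "non_severe": 103932},
    "rose": {"severe": 55197, "non_severe": 55264},
    "smote": {"severe": 97935, "non_severe": 103932},
}


def severe_share(counts):
    """Fraction of records in the severe (minority) class."""
    return counts["severe"] / (counts["severe"] + counts["non_severe"])


def imbalance_ratio(counts):
    """Number of non-severe records per severe record."""
    return counts["non_severe"] / counts["severe"]


for name, counts in train.items():
    print(f"{name:9s} severe share = {severe_share(counts):.3f}, "
          f"imbalance ratio = {imbalance_ratio(counts):.2f}")
```

Two features of the table stand out when the numbers are run. The ROSE training set keeps the original size (110,461 records) and splits it roughly evenly between classes. The SMOTE set leaves the non-severe class untouched and contains exactly 15 times the original number of severe records. This is an observation from the table, not a statement of the resampling settings the authors used.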
Predicted Class | Actual Class: Positive | Actual Class: Negative |
---|---|---|
Positive | True Positive (TP) | False Positive (FP) |
Negative | False Negative (FN) | True Negative (TN) |

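
The five evaluation metrics named in Section 2.7 can all be derived from the four cells of the confusion matrix above. The sketch below uses the standard textbook definitions, since the paper's own formulas are not reproduced in this excerpt. The function name `classification_metrics` and the all-non-severe baseline are hypothetical illustrations; only the test-set class counts (2798 severe, 44,541 non-severe) come from the class distribution table.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    The positive class is taken to be "severe" (an assumption for this sketch).
    """
    total = tp + fp + fn + tn
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "geometric_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical baseline: a classifier that labels every test crash non-severe.
baseline = classification_metrics(tp=0, fp=0, fn=2798, tn=44541)
for metric, value in baseline.items():
    print(f"{metric}: {value:.3f}")
```

This baseline reaches about 94.1% accuracy while detecting no severe crashes at all: its sensitivity and geometric mean are 0 and its balanced accuracy is 0.5. That gap is why accuracy alone is a weak yardstick on data this imbalanced, and why sensitivity, balanced accuracy, and geometric mean are reported alongside it.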
Variable | Category | Estimate | SE | p-Value | Odds Ratio |
---|---|---|---|---|---|
Crash type | Angle | 1.10874 | 0.021 | <0.001 | 3.031 |
Crash type | Fixed object | 1.60342 | 0.024 | <0.001 | 4.967 |
Crash type | Head-on | 2.21490 | 0.032 | <0.001 | 9.160 |
Crash type | Overturned | 1.88370 | 0.052 | <0.001 | 6.578 |
Crash type | Rear end | 0.29230 | 0.021 | <0.001 | 1.340 |
Crash type | Sideswipe * | - | - | - | - |
Traffic signal | Yes | 0.05584 | 0.013 | <0.001 | 1.057 |
Traffic signal | No * | - | - | - | - |
Weather condition | No adverse condition | 0.44410 | 0.017 | <0.001 | 1.559 |
Weather condition | Adverse condition * | - | - | - | - |
Roadway alignment | Curve | 0.13316 | 0.019 | <0.001 | 1.142 |
Roadway alignment | Straight * | - | - | - | - |
Roadway type | One-way | −0.70767 | 0.041 | <0.001 | 0.493 |
Roadway type | Two-way divided | −0.13259 | 0.012 | <0.001 | 0.876 |
Roadway type | Two-way undivided * | - | - | - | - |
Work zone | Yes | −0.39516 | 0.035 | <0.001 | 0.674 |
Work zone | No * | - | - | - | - |
Alcohol | Yes | 0.60794 | 0.032 | <0.001 | 1.837 |
Alcohol | No * | - | - | - | - |
Belted | No | 1.94371 | 0.026 | <0.001 | 6.985 |
Belted | Yes * | - | - | - | - |
Bike | Yes | 2.27529 | 0.055 | <0.001 | 9.731 |
Bike | No * | - | - | - | - |
Distracted | Yes | 0.11692 | 0.014 | <0.001 | 1.124 |
Distracted | No * | - | - | - | - |
Pedestrian | Yes | 2.72422 | 0.049 | <0.001 | 15.245 |
Pedestrian | No * | - | - | - | - |
Speed violation | Yes | 0.47653 | 0.014 | <0.001 | 1.610 |
Speed violation | No * | - | - | - | - |
Area type | Rural | 0.44437 | 0.014 | <0.001 | 1.560 |
Area type | Urban * | - | - | - | - |
Animal | Yes | −1.89159 | 0.052 | <0.001 | 0.151 |
Animal | No * | - | - | - | - |
Weekend | Yes | 0.09184 | 0.012 | <0.001 | 1.096 |
Weekend | No * | - | - | - | - |
Intercept | Non-severe Injury\|Severe Injury | 3.2344 | 0.146 | <0.001 | 25.391 |
Log-likelihood at convergence | −115,577 | | | | |
Log-likelihood at zero | −139,834 | | | | |
Likelihood ratio test | 48,514 | | | | |

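
Two relationships in the table above can be verified directly: each odds ratio is the exponential of its estimate, and the likelihood ratio statistic is twice the difference between the log-likelihood at convergence and the log-likelihood at zero. The sketch below checks a handful of rows. The estimate and odds-ratio pairs are copied from the table, read in the order the values appear within each cell. The variable names and the 0.005 tolerance are assumptions made for this illustration.

```python
import math

# (estimate, reported odds ratio) pairs copied from the table above.
rows = {
    "Crash type: Head-on": (2.21490, 9.160),
    "Belted: No": (1.94371, 6.985),
    "Pedestrian: Yes": (2.72422, 15.245),
    "Work zone: Yes": (-0.39516, 0.674),
    "Animal: Yes": (-1.89159, 0.151),
}

for label, (estimate, reported_or) in rows.items():
    computed_or = math.exp(estimate)  # odds ratio = exp(coefficient)
    print(f"{label:20s} exp({estimate:+.5f}) = {computed_or:.3f} "
          f"(reported {reported_or})")

# Likelihood ratio statistic: 2 * (LL at convergence - LL at zero).
ll_convergence = -115_577
ll_zero = -139_834
lr_statistic = 2 * (ll_convergence - ll_zero)
print("Likelihood ratio statistic:", lr_statistic)
```

Read this way, an odds ratio above 1 (for example, about 7 for unbelted drivers) corresponds to a positive estimate, and an odds ratio below 1 (for example, about 0.15 for animal-involved crashes) to a negative one. The computed likelihood ratio statistic matches the 48,514 reported in the table.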
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alrumaidhi, M.; Farag, M.M.G.; Rakha, H.A. Comparative Analysis of Parametric and Non-Parametric Data-Driven Models to Predict Road Crash Severity among Elderly Drivers Using Synthetic Resampling Techniques. Sustainability 2023, 15, 9878. https://doi.org/10.3390/su15139878