Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Data Collection
3.2. Imbalanced Dataset Handling Techniques
3.2.1. Oversampling
3.2.2. Undersampling
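As a rough illustration of the idea behind undersampling, the sketch below balances a dataset by randomly discarding majority-class records until every class has as many records as the smallest one. This extract does not describe the exact undersampling procedure used in the study, so plain random undersampling is assumed here. The function name `random_undersample`, the field name `phone_use`, and the example class counts are all hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(rows, label_key, seed=0):
    """Return a class-balanced subset of rows via random undersampling.

    Assumption: every class is reduced to the size of the smallest class.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group the records by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Size of the minority class sets the target size for every class.
    n_min = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        balanced.extend(rng.sample(group, n_min))

    rng.shuffle(balanced)  # avoid leaving the records ordered by class
    return balanced


# Hypothetical imbalanced data: 845 "No" records and 155 "Yes" records.
data = [{"phone_use": "No"}] * 845 + [{"phone_use": "Yes"}] * 155
balanced = random_undersample(data, "phone_use")
```

The trade-off is that majority-class information is thrown away, which is why undersampling is usually compared against oversampling rather than assumed to be better.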
3.3. Classification Methods
3.3.1. Naive Bayes (NB)
3.3.2. Bayesian Network (BayesNet)
3.3.3. Decision Tree (ID3)
3.3.4. J48 Algorithm
3.3.5. Support Vector Machine (SVM)
3.3.6. Multilayer Perceptron (MLP)
3.3.7. Deep Learning (DL)
4. Results and Discussion
- True Positive (TP): +VE observations predicted as +VE.
- True Negative (TN): −VE observations predicted as −VE.
- False Positive (FP): −VE observations predicted as +VE.
- False Negative (FN): +VE observations predicted as −VE.
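The four counts above are enough to derive the accuracy, precision, recall, and false positive rate (FPR) reported in the results tables. The following is a minimal sketch using the standard definitions. The class name `ConfusionMatrix`, its method names, and the example counts are hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; undefined ratios (zero denominator) become NaN."""
    return numerator / denominator if denominator else math.nan


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # +VE observations predicted as +VE
    fn: int  # +VE observations predicted as -VE
    fp: int  # -VE observations predicted as +VE
    tn: int  # -VE observations predicted as -VE

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def fpr(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)

    def swapped(self) -> "ConfusionMatrix":
        """Same predictions, with the -VE class treated as the positive one."""
        return ConfusionMatrix(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


# Hypothetical counts, for illustration only.
cm = ConfusionMatrix(tp=80, fn=20, fp=30, tn=70)
print(cm.accuracy(), cm.precision(), cm.recall(), cm.fpr())
```

With two classes, the FPR of one class equals one minus the recall of the other, which is a quick consistency check on per-class results: here `cm.swapped().fpr()` equals `1 - cm.recall()`.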
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Attribute Name | Data Type | Possible Values | Variable Type |
---|---|---|---|
Road Classification | Nominal | Major or Minor | Independent |
Vehicle Type | Nominal | Passenger Car, Bus, or Truck | Independent |
Presence of Front-Seat Passenger | Nominal | Yes or No | Independent |
Driver Seatbelt Use | Nominal | Yes or No | Independent |
Passenger Seatbelt Use | Nominal | Yes or No | Independent |
Driver Gender | Nominal | Male or Female | Independent |
Passenger Gender | Nominal | Male or Female | Independent |
Driver Age Group | Nominal | 18–30, 31–45, or above 45 | Independent |
Passenger Age Group | Nominal | 18–30, 31–45, or above 45 | Independent |
Driver Cell Phone Usage | Nominal | Yes or No | Dependent |

Algorithm | Overall Accuracy | Class Accuracy (Cell Phone Usage = Yes) | Class Accuracy (Cell Phone Usage = No) |
---|---|---|---|
Multilayer Perceptron (MLP) | 84% | 0.6% | 99.3% |
Support Vector Machine (SVM) | 84.5% | 0% | 100% |
ID3 Decision Tree | 83.5% | 2.8% | 98.4% |
J48 Decision Tree | 84.5% | 0% | 100% |
Naive Bayes (NB) | 84.5% | 0% | 100% |
DL-Archt1 | 84.5% | 0% | 100% |
DL-Archt2 | 84.5% | 0% | 100% |
DL-Archt3 | 84.5% | 0% | 100% |

Architecture Name | Hidden Layers | Output Layer |
---|---|---|
Archt1 | 5 dense layers | One output layer |
Archt2 | 10 dense layers | One output layer |
Archt3 | 15 dense layers | One output layer |

True Class | Predicted +VE | Predicted −VE |
---|---|---|
+VE | TP | FN |
−VE | FP | TN |

Method | Using Cell Phone | Measurement (proportion, 0–1) | Undersampling | Oversampling (Resample) | Oversampling (SMOTE) |
---|---|---|---|---|---|
ID3 (Iterative Dichotomiser 3 Decision Tree) | Overall | Total Accuracy | 0.786458 | 0.742973 | 0.749086 |
ID3 | Yes | Precision | 0.735294 | 0.766667 | 0.693741 |
ID3 | Yes | Recall | 0.867679 | 0.693596 | 0.901663 |
ID3 | Yes | FPR | 0.288577 | 0.208303 | 0.406874 |
ID3 | No | Precision | 0.853365 | 0.72364 | 0.855088 |
ID3 | No | Recall | 0.711423 | 0.791697 | 0.593126 |
ID3 | No | FPR | 0.132321 | 0.306404 | 0.098337 |
J48 (C4.5 Decision Tree, Weka implementation) | Overall | Total Accuracy | 0.757291 | 0.735761 | 0.744883 |
J48 | Yes | Precision | 0.721790 | 0.763081 | 0.702663 |
J48 | Yes | Recall | 0.804772 | 0.678704 | 0.858641 |
J48 | Yes | FPR | 0.286573 | 0.207935 | 0.371397 |
J48 | No | Precision | 0.798206 | 0.714144 | 0.813098 |
J48 | No | Recall | 0.713427 | 0.792065 | 0.628603 |
J48 | No | FPR | 0.195228 | 0.321296 | 0.141359 |
NB (Naive Bayes) | Overall | Total Accuracy | 0.664583 | 0.683247 | 0.665752 |
NB | Yes | Precision | 0.654102 | 0.665984 | 0.647003 |
NB | Yes | Recall | 0.639913 | 0.726731 | 0.745481 |
NB | Yes | FPR | 0.312625 | 0.359662 | 0.415743 |
NB | No | Precision | 0.67387 | 0.703674 | 0.691904 |
NB | No | Recall | 0.687375 | 0.640338 | 0.584257 |
NB | No | FPR | 0.360087 | 0.273269 | 0.254519 |
BayesNet (Bayesian Network) | Overall | Total Accuracy | 0.664583 | 0.683247 | 0.665753 |
BayesNet | Yes | Precision | 0.654102 | 0.665984 | 0.647003 |
BayesNet | Yes | Recall | 0.639913 | 0.726731 | 0.745481 |
BayesNet | Yes | FPR | 0.312625 | 0.359662 | 0.415743 |
BayesNet | No | Precision | 0.67387 | 0.703674 | 0.691904 |
BayesNet | No | Recall | 0.687375 | 0.640338 | 0.584257 |
BayesNet | No | FPR | 0.360087 | 0.273269 | 0.254519 |
MLP (Multilayer Perceptron) | Overall | Total Accuracy | 0.783333 | 0.713387 | 0.724878 |
MLP | Yes | Precision | 0.733826 | 0.918879 | 0.663496 |
MLP | Yes | Recall | 0.861171 | 0.463887 | 0.978147 |
MLP | Yes | FPR | 0.288577 | 0.040411 | 0.562084 |
MLP | No | Precision | 0.847255 | 0.64462 | 0.946486 |
MLP | No | Recall | 0.711423 | 0.959589 | 0.437916 |
MLP | No | FPR | 0.138829 | 0.536113 | 0.021853 |
SVM (Support Vector Machine) | Overall | Total Accuracy | 0.698958 | 0.696561 | 0.713617 |
SVM | Yes | Precision | 0.677686 | 0.652466 | 0.73126 |
SVM | Yes | Recall | 0.711497 | 0.832465 | 0.728637 |
SVM | Yes | FPR | 0.312625 | 0.437546 | 0.3034 |
SVM | No | Precision | 0.720588 | 0.772842 | 0.69378 |
SVM | No | Recall | 0.687375 | 0.562454 | 0.6966 |
SVM | No | FPR | 0.288503 | 0.167535 | 0.271363 |

Method | Using Cell Phone | Measurement (proportion, 0–1) | Undersampling | Oversampling (Resample) | Oversampling (SMOTE) |
---|---|---|---|---|---|
Archt1 (5 hidden dense layers) | Overall | Total Accuracy | 0.7604 | 0.7311 | 0.7447 |
Archt1 | Yes | Precision | 0.7305 | 0.7525 | 0.6974 |
Archt1 | Yes | Recall | 0.7939 | 0.6835 | 0.8742 |
Archt1 | Yes | FPR | 0.2705 | 0.2219 | 0.3877 |
Archt1 | No | Precision | 0.7930 | 0.7136 | 0.8264 |
Archt1 | No | Recall | 0.7295 | 0.7781 | 0.6123 |
Archt1 | No | FPR | 0.2061 | 0.3165 | 0.1258 |
Archt2 (10 hidden dense layers) | Overall | Total Accuracy | 0.7427 | 0.7330 | 0.7438 |
Archt2 | Yes | Precision | 0.7267 | 0.7564 | 0.6864 |
Archt2 | Yes | Recall | 0.7440 | 0.6821 | 0.9078 |
Archt2 | Yes | FPR | 0.2585 | 0.2168 | 0.4239 |
Archt2 | No | Precision | 0.7582 | 0.7140 | 0.8594 |
Archt2 | No | Recall | 0.7415 | 0.7832 | 0.5761 |
Archt2 | No | FPR | 0.2560 | 0.3179 | 0.0922 |
Archt3 (15 hidden dense layers) | Overall | Total Accuracy | 0.7677 | 0.7191 | 0.7175 |
Archt3 | Yes | Precision | 0.7352 | 0.6782 | 0.7944 |
Archt3 | Yes | Recall | 0.8069 | 0.8269 | 0.5951 |
Archt3 | Yes | FPR | 0.2685 | 0.3872 | 0.1574 |
Archt3 | No | Precision | 0.8040 | 0.7820 | 0.6706 |
Archt3 | No | Recall | 0.7315 | 0.6128 | 0.8426 |
Archt3 | No | FPR | 0.1931 | 0.1731 | 0.4049 |

No. | Rule | Output |
---|---|---|
1 | If (Road = major AND Age group = 18–30) | Highly likely to use the phone |
2 | If (Road = major AND Age group = 31–45 AND Gender = Male AND Vehicle = Truck AND Seatbelt = Yes) | Not using the phone |
3 | If (Road = major AND Age group = 31–45 AND Gender = Male AND Vehicle = Truck AND Seatbelt = No) | Highly likely to use the phone |
4 | If (Road = major AND Age group = 31–45 AND Gender = Male AND Vehicle = Passenger Car) | Highly likely to use the phone |
5 | If (Road = major AND Age group = 31–45 AND Gender = Male AND Vehicle = Bus AND Seatbelt = Yes) | Not using the phone |
6 | If (Road = major AND Age group = 31–45 AND Gender = Male AND Vehicle = Bus AND Seatbelt = No) | Using the phone |
7 | If (Road = major AND Age group = 31–45 AND Gender = Female AND Seatbelt = Yes) | Not using the phone |
8 | If (Road = major AND Age group = 31–45 AND Gender = Female AND Seatbelt = No) | Using the phone |
9 | If (Road = major AND Age group = above 45 AND Gender = Female) | Not using the phone |
10 | If (Road = major AND Age group = above 45 AND Gender = Male AND Seatbelt = No AND Vehicle = (Truck OR Passenger Car)) | Not using the phone |
11 | If (Road = major AND Age group = above 45 AND Gender = Male AND Seatbelt = No AND Vehicle = Bus) | Using the phone |
12 | If (Road = minor AND Age group = 18–30 AND Gender = (Male OR Female) AND Seatbelt = No) | Not using the phone |
13 | If (Road = minor AND Age group = 18–30 AND Seatbelt = Yes) | Using the phone |
14 | If (Road = minor AND Age group = 31–45 AND Seatbelt = No AND Vehicle = (Truck OR Bus)) | Not using the phone |
15 | If (Road = minor AND Age group = 31–45 AND Seatbelt = No AND Vehicle = Passenger Car AND Gender = (Male OR Female)) | Not using the phone |
16 | If (Road = minor AND Age group = 31–45 AND Seatbelt = Yes) | Not using the phone |
17 | If (Road = minor AND Age group = above 45 AND Vehicle = Truck) | Using the phone |
18 | If (Road = minor AND Age group = 31–45 AND Vehicle = Passenger Car AND Gender = Male AND Seatbelt = No) | Using the phone |
19 | If (Road = minor AND Age group = 31–45 AND Vehicle = Passenger Car AND Gender = Male AND Seatbelt = Yes) | Not using the phone |
20 | If (Road = minor AND Age group = 31–45 AND Vehicle = Passenger Car AND Gender = Female) | Not using the phone |
21 | If (Road = minor AND Age group = 31–45 AND Vehicle = Bus) | Not using the phone |
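
Rules of this form can be read as an ordered list of predicates over the observed attributes. The sketch below encodes only rules 1, 2, 3, and 16 from the table to show the pattern. The `Rule` type, the dictionary keys, the lower-case string encodings of the attribute values, and the first-match evaluation order are all assumptions made for illustration, not part of the study.

```python
from typing import Callable, NamedTuple, Optional

Driver = dict  # hypothetical record with keys: road, age_group, gender, vehicle, seatbelt


class Rule(NamedTuple):
    number: int                        # rule number as listed in the table
    matches: Callable[[Driver], bool]  # condition over the observed attributes
    uses_phone: bool                   # predicted class for a matching driver


RULES = [
    Rule(1, lambda d: d.get("road") == "major"
         and d.get("age_group") == "18-30", True),
    Rule(2, lambda d: d.get("road") == "major"
         and d.get("age_group") == "31-45"
         and d.get("gender") == "male"
         and d.get("vehicle") == "truck"
         and d.get("seatbelt") is True, False),
    Rule(3, lambda d: d.get("road") == "major"
         and d.get("age_group") == "31-45"
         and d.get("gender") == "male"
         and d.get("vehicle") == "truck"
         and d.get("seatbelt") is False, True),
    Rule(16, lambda d: d.get("road") == "minor"
         and d.get("age_group") == "31-45"
         and d.get("seatbelt") is True, False),
]


def first_matching_rule(driver: Driver) -> Optional[Rule]:
    """Return the first rule whose condition holds, or None if none applies."""
    for rule in RULES:
        if rule.matches(driver):
            return rule
    return None


young_driver = {"road": "major", "age_group": "18-30", "gender": "female",
                "vehicle": "passenger car", "seatbelt": True}
hit = first_matching_rule(young_driver)
```

Returning the matched rule rather than only the predicted class keeps each prediction traceable to a row of the table, and a `None` result makes uncovered attribute combinations explicit.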
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Taamneh, M.M.; Taamneh, S.; Alomari, A.H.; Abuaddous, M. Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use. Sustainability 2023, 15, 10668. https://doi.org/10.3390/su151310668