Enhancing Keystroke Dynamics Authentication with Ensemble Learning and Data Resampling Techniques
Abstract
1. Introduction and Motivation
1.1. Introduction
1.2. Motivation
- (1) Enhanced Security: Unlike passwords, which can be easily guessed, stolen, or shared, keystroke dynamics rely on unique typing behaviors that are significantly harder for attackers to imitate or forge. This characteristic makes keystroke dynamics a promising solution for enhancing security in environments that require continuous and unobtrusive authentication.
- (2) Improved User Experience: Keystroke dynamics operate seamlessly in the background as users type naturally, eliminating the need for memorizing complex passwords or carrying additional hardware. This not only simplifies the authentication process but also enhances the overall user experience by minimizing disruptions during authentication.
- (3) Cost-Effective and Scalable Implementation: Keystroke dynamics utilize standard keyboards, avoiding the need for expensive biometric hardware. This makes them an affordable and scalable solution for a wide range of applications, from personal computing to enterprise-level security systems.
- (1) Development of a Keystroke Data Collection Platform Using Django Framework: We developed a Django-based web platform for standardized keystroke data collection. This platform captures input timings and generates balanced datasets, addressing the issue of data scarcity and variability by providing a consistent, scalable means of collecting keystroke dynamics data from diverse user populations.
- (2) Implementation of Resampling Techniques to Handle Class Imbalance: To tackle the issue of class imbalance, we employed resampling methods such as SMOTE and Random Under-sampling. These techniques help to balance the dataset, particularly addressing the underrepresentation of minority keystroke patterns, which is critical for improving model training and enhancing generalization.
- (3) Application of Ensemble Learning Methods for Improved Classification: We propose the use of ensemble learning methods, including Random Forest and XGBoost, to enhance keystroke dynamics classification. These methods outperform traditional machine learning algorithms by reducing variance and improving generalization, particularly when dealing with high-dimensional data. This paper demonstrates how ensemble learning can effectively mitigate the risks of overfitting in small, high-dimensional datasets.
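The first contribution refers to captured input timings. As a minimal sketch of what such timings might look like once turned into features, the Python snippet below derives hold times and two inter-key latencies from press and release timestamps. The event layout `(key, press_ms, release_ms)`, the function name `timing_features`, and the choice of these three feature families are assumptions made for illustration; the text above does not specify the platform's actual schema.

```python
from typing import List, Tuple

# Hypothetical event layout: (key, press time in ms, release time in ms).
KeyEvent = Tuple[str, float, float]


def timing_features(events: List[KeyEvent]) -> List[float]:
    """Return hold times, then down-down latencies, then up-down latencies."""
    # Hold time: how long each key stays pressed.
    hold = [release - press for _, press, release in events]
    pairs = list(zip(events, events[1:]))
    # Down-down latency: press of one key to press of the next key.
    down_down = [nxt[1] - cur[1] for cur, nxt in pairs]
    # Up-down latency: release of one key to press of the next key.
    up_down = [nxt[1] - cur[2] for cur, nxt in pairs]
    return hold + down_down + up_down


# Three keystrokes of an illustrative, made-up typing sample.
sample = [("p", 0.0, 95.0), ("a", 140.0, 220.0), ("s", 300.0, 370.0)]
features = timing_features(sample)
print(features)  # [95.0, 80.0, 70.0, 140.0, 160.0, 45.0, 80.0]
```

A fixed-length vector of this kind is the sort of input that the resampling and ensemble steps listed above would operate on.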
2. Related Work
2.1. Traditional User Authentication Methods
2.2. Application of Machine Learning in Keystroke Dynamics Authentication
3. Proposed Method
3.1. Data Collection
3.2. Data Resampling
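Random under-sampling, one of the two resampling methods named in the contributions, can be illustrated with a short Python sketch. It discards randomly chosen samples from every class until all classes match the smallest one. The function name `random_undersample`, the seed handling, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def random_undersample(
    X: Sequence[Vector], y: Sequence[Hashable], seed: int = 0
) -> Tuple[List[Vector], List[Hashable]]:
    """Randomly drop samples so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Vector]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = min(len(rows) for rows in by_class.values())
    X_out: List[Vector] = []
    y_out: List[Hashable] = []
    for label, rows in by_class.items():
        # Sample without replacement, so no row is duplicated.
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy data: five rows of class 1 and two rows of class 0.
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
y_toy = [1, 1, 1, 1, 1, 0, 0]
X_bal, y_bal = random_undersample(X_toy, y_toy)
print(Counter(y_bal))
```

The trade-off is that majority-class information is thrown away, which is one reason an oversampling method such as SMOTE is often preferred when the dataset is small.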
Algorithm 1. SMOTE Algorithm.
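The body of Algorithm 1 is not reproduced here, so the following Python sketch shows only the core idea of SMOTE as commonly described (Chawla et al., 2002): a synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, the default `k`, and the simplification of drawing the base sample at random (rather than generating a fixed number of samples per minority point) are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote_sketch(
    minority: Sequence[Vector], n_new: int, k: int = 3, seed: int = 0
) -> List[Vector]:
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up minority-class timing vectors.
minority = [[0.10, 0.20], [0.12, 0.25], [0.15, 0.22], [0.11, 0.21]]
new_points = smote_sketch(minority, n_new=6, k=2, seed=1)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class instead of duplicating existing rows.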
3.3. Ensemble Learning
3.3.1. Random Forest
3.3.2. XGBoost
3.3.3. Bagging
3.4. Combining SMOTE with Ensemble Learning Methods in Keystroke Dynamics Authentication
Algorithm 2. Data Classification using Django, SMOTE, and Ensemble Learning.
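The steps of Algorithm 2 are not reproduced here. As a rough sketch of how resampling and an ensemble might be chained, the Python code below holds out test samples first, oversamples the minority class in the training split only, trains a bagged ensemble, and predicts by majority vote. Everything in it is an assumption for illustration: the nearest-centroid base learner is a deliberately simple stand-in for the tree-based learners named in this section, `interpolate_minority` is a SMOTE-style simplification, the bootstrap is drawn per class, and the data and labels are made up.

```python
import random
from collections import Counter
from typing import Dict, List

Vector = List[float]


def interpolate_minority(samples: List[Vector], n_new: int, rng: random.Random) -> List[Vector]:
    """SMOTE-style stand-in: interpolate between random pairs of minority samples."""
    out = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        gap = rng.random()
        out.append([x + gap * (y - x) for x, y in zip(a, b)])
    return out


def centroid(rows: List[Vector]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_bagged_centroids(X, y, n_models: int, rng: random.Random) -> List[Dict[int, Vector]]:
    """Train n_models nearest-centroid learners, each on a per-class bootstrap sample."""
    by_class: Dict[int, List[Vector]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return [
        {label: centroid(rng.choices(rows, k=len(rows))) for label, rows in by_class.items()}
        for _ in range(n_models)
    ]


def predict(models: List[Dict[int, Vector]], row: Vector) -> int:
    """Each model votes for its closest centroid; the majority label wins."""
    votes = Counter(
        min(m, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, m[label])))
        for m in models
    )
    return votes.most_common(1)[0][0]


rng = random.Random(42)
# Hypothetical data: label 1 is the majority class, label 0 the minority class.
majority = [[0.10 + 0.01 * i, 0.20 + 0.01 * i] for i in range(8)]
minority = [[0.50, 0.60], [0.52, 0.63], [0.55, 0.61]]
# Hold out one sample per class BEFORE resampling, so the test set stays real.
test_X, test_y = [majority[-1], minority[-1]], [1, 0]
train_major, train_minor = majority[:-1], minority[:-1]
# Oversample the training minority class until both classes have equal size.
train_minor = train_minor + interpolate_minority(
    train_minor, len(train_major) - len(train_minor), rng
)
train_X = train_major + train_minor
train_y = [1] * len(train_major) + [0] * len(train_minor)
models = fit_bagged_centroids(train_X, train_y, n_models=11, rng=rng)
predictions = [predict(models, row) for row in test_X]
print(predictions)  # [1, 0]
```

The ordering is the design point of the sketch: synthetic samples are created after the split, so none of them can leak into the evaluation data.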
3.5. Computational Complexity Analysis
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Evaluation Metrics
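The result tables report Accuracy, F-Score, G-Mean, and Recall. The Python sketch below computes these from binary confusion-matrix counts using their standard textbook definitions. Which class is treated as positive and how scores are averaged in the paper are not stated in this text, so the function `binary_metrics` and the example counts are assumptions for illustration only.

```python
import math
from typing import Dict


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, F-score and G-mean from confusion-matrix counts."""
    recall = safe_div(tp, tp + fn)        # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        # F-score: harmonic mean of precision and recall.
        "f_score": safe_div(2 * precision * recall, precision + recall),
        # G-mean: geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


# Made-up imbalanced case: 10 positives and 990 negatives.
scores = binary_metrics(tp=2, fp=10, tn=980, fn=8)
print(scores)
```

In this made-up case accuracy is 0.982 while recall is 0.2 and G-mean is roughly 0.44, which shows why accuracy alone can hide poor minority-class performance on imbalanced data.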
4.1.3. Comparative Methods
- (1) Decision Trees (DT)
- (2) Support Vector Machines (SVM)
- (3) k-Nearest Neighbors (k-NN)
- (4) Random Forest (RF)
- (5) XGBoost
- (6) Bagging
- (7) SMOTE + Random Forest (SMOTE RF)
- (8) SMOTE + XGBoost (SMOTE XGBoost)
- (9) SMOTE + Bagging (SMOTE Bagging)
4.2. Experimental Results
4.2.1. Method Comparison Results
4.2.2. Parameter Sensitivity Analysis
4.2.3. Evaluation of Classifier Performance Using ROC Curves
4.2.4. Comparison of SMOTE-Enhanced Methods with Other Resampling Techniques
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Method | Accuracy | F-Score | G-Mean | Recall |
---|---|---|---|---|
DT | 3.85 | 4.00 | 3.94 | 2.87 |
SVM | 5.66 | 5.79 | 5.84 | 3.85 |
KNN | 7.16 | 7.37 | 6.90 | 2.15 |
RF | 1.11 | 1.15 | 1.15 | 1.18 |
XGBoost | 2.87 | 3.00 | 3.00 | 2.11 |
Bagging | 2.37 | 2.44 | 2.42 | 1.58 |
SMOTE RF | 1.06 | 1.10 | 1.06 | 1.11 |
SMOTE XGBoost | 2.66 | 2.77 | 2.77 | 2.05 |
SMOTE Bagging | 2.58 | 2.66 | 2.79 | 2.35 |
Method | Accuracy | F-Score | G-Mean | Recall |
---|---|---|---|---|
DT | 90.59 | 92.58 | 89.92 | 89.52 |
SVM | 81.72 | 80.85 | 77.68 | 85.89 |
KNN | 77.55 | 78.40 | 79.36 | 93.95 |
RF | 98.39 | 98.78 | 97.97 | 97.18 |
XGBoost | 93.41 | 94.83 | 92.95 | 92.74 |
Bagging | 94.89 | 95.88 | 94.72 | 95.16 |
SMOTE RF | 98.66 | 98.97 | 98.39 | 97.98 |
SMOTE XGBoost | 94.09 | 95.34 | 93.61 | 93.15 |
SMOTE Bagging | 94.76 | 95.94 | 93.83 | 92.34 |
Method | Accuracy | F-Score | G-Mean | Recall |
---|---|---|---|---|
DT | 7.86 | 7.92 | 6.76 | 6.43 |
SVM | 7.96 | 7.98 | 8.57 | 8.57 |
KNN | 6.61 | 6.73 | 7.16 | 7.06 |
RF | 4.14 | 4.14 | 6.08 | 6.08 |
XGBoost | 1.82 | 1.86 | 3.27 | 3.16 |
Bagging | 5.25 | 5.31 | 5.94 | 5.76 |
SMOTE RF | 2.41 | 2.55 | 3.08 | 3.04 |
SMOTE XGBoost | 2.00 | 2.16 | 1.63 | 1.43 |
SMOTE Bagging | 5.96 | 6.16 | 2.31 | 2.00 |
Method | Accuracy | F-Score | G-Mean | Recall |
---|---|---|---|---|
DT | 98.97 | 99.48 | 85.04 | 73.48 |
SVM | 98.82 | 99.40 | 51.35 | 41.08 |
KNN | 99.21 | 99.60 | 82.09 | 68.34 |
RF | 99.43 | 99.71 | 83.74 | 71.59 |
XGBoost | 99.70 | 99.85 | 93.15 | 87.05 |
Bagging | 99.37 | 99.68 | 86.11 | 75.00 |
SMOTE RF | 99.62 | 99.81 | 93.36 | 87.51 |
SMOTE XGBoost | 99.70 | 99.85 | 96.84 | 94.00 |
SMOTE Bagging | 99.23 | 99.61 | 95.96 | 92.71 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, X.; Hou, D. Enhancing Keystroke Dynamics Authentication with Ensemble Learning and Data Resampling Techniques. Electronics 2024, 13, 4559. https://doi.org/10.3390/electronics13224559