Web Traffic Anomaly Detection Using Isolation Forest
Abstract
1. Introduction
- What are the necessary data and parameters for Isolation Forest, and how are they prepared for web traffic anomaly detection?
- How is Isolation Forest implemented in web traffic anomaly detection?
- What is the performance of Isolation Forest in detecting anomalies in web traffic?
2. Methodology
2.1. Web Traffic Data Preparation
2.2. Isolation Forest Model Implementation
2.3. Isolation Forest Model Evaluation
3. Results and Discussions
3.1. Web Traffic Data Preparation Results
3.2. Isolation Forest Model Implementation and Evaluation Results
4. Conclusions and Recommendations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
40.77.167.129 - - [22/Jan/2019:03:56:18 +0330] "GET /image/57710/productModel/100x100 HTTP/1.1" 200 1695 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
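A log entry like the sample above can be split into the columns listed in the tables that follow (Srcip, Timestamp, Method, URIs, Protocol, Status, Bytes, Referrer, User-Agent). The sketch below is one possible way to do this with a regular expression; it assumes the standard combined log layout with straight quotes and two `-` placeholder fields after the address, and the names `LOG_PATTERN` and `parse_line` are hypothetical helpers, not taken from the paper.

```python
import re

# Assumed layout: combined log format followed by one extra quoted field.
LOG_PATTERN = re.compile(
    r'(?P<Srcip>\S+) \S+ \S+ '
    r'\[(?P<Timestamp>[^\]]+)\] '
    r'"(?P<Method>\S+) (?P<URIs>\S+) (?P<Protocol>[^"]+)" '
    r'(?P<Status>\d{3}) (?P<Bytes>\d+|-) '
    r'"(?P<Referrer>[^"]*)" "(?P<User_Agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None when the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Regex group names cannot contain a hyphen, so rename after matching.
    fields["User-Agent"] = fields.pop("User_Agent")
    # A "-" byte count means no body was sent; treat it as zero.
    fields["Bytes"] = 0 if fields["Bytes"] == "-" else int(fields["Bytes"])
    return fields


sample = (
    '40.77.167.129 - - [22/Jan/2019:03:56:18 +0330] '
    '"GET /image/57710/productModel/100x100 HTTP/1.1" 200 1695 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"'
)
record = parse_line(sample)
```

Lines that do not match return `None`, so malformed entries can be counted or dropped explicitly rather than silently mis-parsed.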
Column | Data Type | Column | Data Type | Column | Data Type |
---|---|---|---|---|---|
Srcip | object | URIs | category | Bytes | int32 |
Timestamp | object | Protocol | category | Referrer | category |
Method | category | Status | category | User-Agent | category |
# | Column | Data Type | # | Column | Data Type | # | Column | Data Type |
---|---|---|---|---|---|---|---|---|
1 | Srcip | object | 5 | Protocol | category | 9 | User-Agent | category |
2 | Timestamp | object | 6 | Status | category | 10 | URI_occurences | int64 |
3 | Method | category | 7 | Bytes | int32 | 11 | IOC_occurences | int64 |
4 | URIs | category | 8 | Referrer | category | 12 | User-Agent_occurences | int64 |
 | | | | | | 13 | URI_length | int64 |
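The four derived columns in the table above (URI_occurences, IOC_occurences, User-Agent_occurences, URI_length) could be computed along the following lines with pandas. This is a sketch under stated assumptions: the occurrence columns are taken to be per-value frequencies over the whole dataset, and `IOC_PATTERNS` is a hypothetical list of indicator substrings, since the outline does not state how indicators of compromise were matched.

```python
import re

import pandas as pd

# Hypothetical indicator substrings; the actual IOC list is not given in the source.
IOC_PATTERNS = ["../", "<script", "union select", "/etc/passwd"]


def add_features(df):
    """Return a copy of df with the four derived numeric columns added."""
    out = df.copy()
    uris = out["URIs"].astype(str)
    agents = out["User-Agent"].astype(str)
    # Assumption: "occurences" means how often each value appears in the dataset.
    out["URI_occurences"] = uris.map(uris.value_counts())
    out["User-Agent_occurences"] = agents.map(agents.value_counts())
    out["URI_length"] = uris.str.len()
    # Count literal matches of each hypothetical indicator in the lowercased URI.
    lowered = uris.str.lower()
    out["IOC_occurences"] = sum(
        lowered.str.count(re.escape(pattern)) for pattern in IOC_PATTERNS
    )
    return out


demo = pd.DataFrame(
    {
        "URIs": ["/a", "/a", "/b?x=../../etc/passwd"],
        "User-Agent": ["ua1", "ua2", "ua1"],
    }
)
features = add_features(demo)
```

The column names keep the spelling used in the table so that they line up with the listed schema.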
# | Column | Data Type | # | Column | Data Type | # | Column | Data Type |
---|---|---|---|---|---|---|---|---|
1 | Srcip | object | 6 | Status | category | 12 | User-Agent_occurences | int64 |
2 | Timestamp | object | 7 | Bytes | int32 | 13 | URI_length | int64 |
3 | Method | category | 8 | Referrer | category | 14 | AttackType | object |
4 | URIs | category | 9 | User-Agent | category | 15 | Human Rating | object |
5 | Protocol | category | 10 | URI_occurences | int64 | 16 | Predict | int32 |
 | | | 11 | IOC_occurences | int64 | 17 | Anomaly_Score | int32 |
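The Predict and Anomaly_Score columns in the table above are the kind of output an Isolation Forest produces. A minimal sketch with scikit-learn follows. The feature list, the hyperparameters (`n_estimators=100`, `contamination="auto"`, `random_state=0`), and the synthetic rows are all assumptions made for illustration; the paper's actual settings are not stated in this outline. Note also that `decision_function` returns floating-point scores, whereas the table lists Anomaly_Score as int32, so the authors may have derived that column differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Assumed numeric feature set, taken from the int-typed columns in the schema.
FEATURES = [
    "Bytes",
    "URI_occurences",
    "IOC_occurences",
    "User-Agent_occurences",
    "URI_length",
]


def score_traffic(train_df, test_df, contamination="auto", seed=0):
    """Fit on train_df and append Predict / Anomaly_Score columns to test_df."""
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=seed
    )
    model.fit(train_df[FEATURES])
    scored = test_df.copy()
    # predict() returns -1 for anomalies and 1 for inliers.
    scored["Predict"] = model.predict(test_df[FEATURES]).astype("int32")
    # decision_function() is negative for anomalies; lower means more abnormal.
    scored["Anomaly_Score"] = model.decision_function(test_df[FEATURES])
    return scored


def make_rows(rng, n, anomalous):
    """Build synthetic rows; values are illustrative only, not from the dataset."""
    if anomalous:
        return pd.DataFrame({
            "Bytes": rng.integers(5_000_000, 10_000_000, n),
            "URI_occurences": np.ones(n, dtype=int),
            "IOC_occurences": rng.integers(3, 7, n),
            "User-Agent_occurences": np.ones(n, dtype=int),
            "URI_length": rng.integers(800, 900, n),
        })
    return pd.DataFrame({
        "Bytes": rng.integers(1_000, 2_000, n),
        "URI_occurences": rng.integers(50, 100, n),
        "IOC_occurences": np.zeros(n, dtype=int),
        "User-Agent_occurences": rng.integers(50, 100, n),
        "URI_length": rng.integers(20, 40, n),
    })


rng = np.random.default_rng(0)
train = pd.concat([make_rows(rng, 300, False), make_rows(rng, 5, True)], ignore_index=True)
test = pd.concat([make_rows(rng, 50, False), make_rows(rng, 1, True)], ignore_index=True)
scored = score_traffic(train, test)  # the last test row is the synthetic anomaly
```

Because the model is unsupervised, labels such as AttackType or Human Rating are not needed for fitting; they would only enter at evaluation time.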
Dataset | Percentage | Number of Records |
---|---|---|
Training Set | 25% | 2,591,287 |
Testing Set | 10% | 1,035,020 |
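The table above reports a 25% training set and a 10% testing set; dividing each record count by its percentage suggests a corpus of roughly 10.4 million log entries. One way to draw such disjoint subsets is scikit-learn's `train_test_split` with both size arguments given. The sketch below uses a small stand-in array and an arbitrary `random_state`; whether the authors sampled randomly, chronologically, or otherwise is not stated here, so the random draw is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for row indices of the full log dataset (size chosen for illustration).
records = np.arange(1000)

# Draw 25% for training and 10% for testing; the remaining 65% is left unused.
train_idx, test_idx = train_test_split(
    records, train_size=0.25, test_size=0.10, random_state=0
)

# The two subsets never share a record.
overlap = set(train_idx) & set(test_idx)
```

For time-ordered logs, a chronological cut may be preferable to a random draw so that the test set does not precede the training data.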
Accuracy | Precision | Recall | F1-Score |
---|---|---|---|
0.93 | 0.95 | 0.90 | 0.92 |
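The four metrics in the table above follow from the confusion-matrix counts in the usual way, and the reported F1-score can be checked against the reported precision and recall: 2 × 0.95 × 0.90 / (0.95 + 0.90) ≈ 0.92. The sketch below spells out the standard definitions; the confusion counts in the usage lines are hypothetical values chosen only to exercise the functions and are not the paper's results.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all records classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of flagged records that are true anomalies."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true anomalies that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Consistency check on the reported precision and recall.
reported_f1 = f1_score(0.95, 0.90)

# Hypothetical confusion counts, for illustration only.
tp, fp, fn, tn = 90, 5, 10, 95
example = {
    "accuracy": accuracy(tp, fp, fn, tn),
    "precision": precision(tp, fp),
    "recall": recall(tp, fn),
}
example["f1"] = f1_score(example["precision"], example["recall"])
```

Since anomalies are typically a small minority of web traffic, precision and recall are more informative here than accuracy alone.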
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chua, W.; Pajas, A.L.D.; Castro, C.S.; Panganiban, S.P.; Pasuquin, A.J.; Purganan, M.J.; Malupeng, R.; Pingad, D.J.; Orolfo, J.P.; Lua, H.H.; et al. Web Traffic Anomaly Detection Using Isolation Forest. Informatics 2024, 11, 83. https://doi.org/10.3390/informatics11040083
APA Style: Chua, W., Pajas, A. L. D., Castro, C. S., Panganiban, S. P., Pasuquin, A. J., Purganan, M. J., Malupeng, R., Pingad, D. J., Orolfo, J. P., Lua, H. H., & Velasco, L. C. (2024). Web Traffic Anomaly Detection Using Isolation Forest. Informatics, 11(4), 83. https://doi.org/10.3390/informatics11040083