Artificial Intelligence-Based Anomaly Detection Technology over Encrypted Traffic: A Systematic Literature Review
Abstract
1. Introduction
2. Methods
2.1. Research Questions
- RQ1. What datasets are mainly used to measure the performance of anomaly detection models for encrypted traffic, and what encryption method was mainly used to encrypt the data used in the experiment?
- RQ2. How can the features needed for AI model learning and anomaly detection over encrypted traffic be extracted?
- RQ3. What data preprocessing methods are commonly used in the earlier phases of the development of malicious activity detection algorithms?
- RQ4. What is the AI algorithm used to detect anomalies in encrypted traffic?
- RQ5. What are the performance indicators used to evaluate the performance of an anomaly detection model over encrypted traffic?
2.2. Eligibility Criteria
- Research written in English: Since English is the dominant language of modern medical, scientific, and engineering research, this criterion keeps the pool of literature to be reviewed diverse.
- Research from conferences or journals that have passed peer review: This ensures the basic quality of the articles to be analyzed.
- The collected literature should cover data processing (feature selection and preprocessing) techniques for encrypted network traffic.
- The collected literature should cover anomaly detection techniques (algorithms) over encrypted network traffic.
- Articles that present a methodology but do not have an objective evaluation of the proposal should be excluded.
- Studies outside the review topic, such as the classification of encrypted data rather than anomaly detection over encrypted traffic, should be excluded.
- Secondary research, such as survey-based studies, as opposed to primary research on data preprocessing and AI-based anomaly detection technologies for encrypted traffic, should be excluded.
- Studies that performed anomaly detection only on data encrypted with algorithms that are no longer in use, whether because of design flaws or because alternative encryption algorithms have replaced them, should be excluded.
2.3. Information Sources
2.4. Search and Study Selection
- Articles identified in the bibliographic database using the search query are transferred to the reference management software EndNote 21.
- The same articles retrieved from different bibliographic databases are removed.
- Articles that present a methodology but do not have an objective evaluation of the proposal should be excluded.
- Identified articles are reviewed based on the title and abstract according to the above eligibility criteria.
- To determine which documents should be included in this review, a full-text evaluation of the remaining documents is performed through steps 2 to 4.
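The duplicate-removal step above can be illustrated with a short sketch. It is only an assumption about how such matching might be done; the review itself relies on EndNote 21, and the `Record` fields and the key-normalization rule below are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bibliographic record exported from a database (hypothetical fields)."""
    title: str
    doi: str = ""
    source: str = ""


def normalized_key(record):
    """Return a comparison key: the DOI when present, otherwise the squashed title."""
    if record.doi:
        return "doi:" + record.doi.strip().lower()
    # Lowercase the title and drop everything except letters and digits.
    return "title:" + re.sub(r"[^a-z0-9]+", "", record.title.lower())


def deduplicate(records):
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = normalized_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A record that carries a DOI in one database but not in another would not be matched by this rule, so a manual check of near-duplicates would still be needed.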
2.5. Data Collection Process
2.6. Quality Assessment
- Review topic. Research should suggest ways to detect anomalies or attacks in traffic.
- Contextual information. Sufficient contextual information must be provided to interpret the results.
- Data. The research article must provide a detailed description of the data used in the experiment. This affects the reliability of the research results. This is also essential to answer research question RQ1.
- Details. An accurate description of the proposed data processing method and of how traffic is classified as normal or anomalous helps us to answer research questions RQ2 to RQ4.
- Experimental results. Experimental results play an essential role in proving the validity of the research.
3. Results
3.1. Study Selection
3.2. Summary of the Identified Literature
3.3. Study Characteristics
3.4. Dataset
3.5. Feature Extraction
- Statistics-based feature extraction
- Log Information-Based Feature Extraction
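As an illustration of the statistics-based category, the sketch below derives a few summary values from the packet sizes and arrival times of a single flow, neither of which requires decrypting the payload. The particular statistics and the `flow_statistics` name are assumptions chosen for the example, not the feature set of any reviewed study.

```python
from statistics import mean, pstdev


def flow_statistics(packets):
    """Summarize one flow given (timestamp_seconds, length_bytes) tuples sorted by time."""
    if not packets:
        raise ValueError("flow has no packets")
    times = [timestamp for timestamp, _ in packets]
    sizes = [length for _, length in packets]
    # Inter-arrival times between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(sizes),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),  # population standard deviation
        "min_size": min(sizes),
        "max_size": max(sizes),
        "duration": times[-1] - times[0],
        "mean_iat": mean(gaps) if gaps else 0.0,
    }
```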
3.6. Feature Selections
3.7. Preprocessing
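Several preprocessing steps named in the summary table of the reviewed studies (length unification, conversion to a 2D grayscale image, and normalization) can be sketched in a few lines. The sketch assumes that the 784-byte session prefix is laid out as a 28 × 28 image and that short sessions are padded with zero bytes; both are assumptions made for illustration, and the function names are hypothetical.

```python
def unify_length(payload: bytes, target: int = 784) -> bytes:
    """Truncate to `target` bytes, or right-pad with zero bytes (assumed padding)."""
    return payload[:target].ljust(target, b"\x00")


def to_grayscale_image(payload: bytes, side: int = 28):
    """Reshape the first side*side bytes into rows of pixel intensities (0-255)."""
    data = unify_length(payload, side * side)
    return [list(data[row * side:(row + 1) * side]) for row in range(side)]


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values into [low, high]; constant input maps to `low`."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    span = largest - smallest
    return [low + (value - smallest) * (high - low) / span for value in values]
```

Calling `min_max_scale(values, -1.0, 1.0)` gives the (−1,1) variant of normalization, while the defaults give the (0,1) variant.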
3.8. Detection Algorithm
3.9. Performance Indicators
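Most of the indicators listed in the summary table of the reviewed studies follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, with anomalous traffic as the positive class; the function name and the convention of returning 0.0 for an undefined ratio are assumptions made for this example.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Compute common indicators from true/false positive and negative counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,  # also called the true positive rate
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        "tnr": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```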
4. Discussion
4.1. RQ1: What Datasets Are Mainly Used to Measure the Performance of Anomaly Detection Models over Encrypted Traffic, and What Encryption Method Was Mainly Used to Encrypt the Data Used in the Experiment?
4.2. RQ2: How Can the Features Needed for AI Model Learning and Anomaly Detection over Encrypted Traffic Be Extracted?
4.3. RQ3: What Data Preprocessing Technology Is Used to Detect Malicious Activity Using Encrypted Traffic?
4.4. RQ4: What Is the AI Algorithm Used to Detect Anomalies in Encrypted Traffic?
4.5. RQ5: What Are the Performance Indicators Used to Evaluate the Performance of an Anomaly Detection Model over Encrypted Traffic?
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zscaler. Spoiler: New ThreatLabz Report Reveals over 85% of Attacks Are Encrypted. 2022. Available online: https://www.zscaler.com/blogs/security-research/2022-encrypted-attacks-report (accessed on 31 December 2023).
- Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In Proceedings of the 2017 International Conference on Information Networking (ICOIN), Da Nang, Vietnam, 11–13 January 2017; IEEE: New York, NY, USA, 2017; pp. 712–717.
- Van Ede, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.J.; Lindorfer, M.; Choffnes, D.; van Steen, M.; Peter, A. FlowPrint: Semi-supervised mobile-app fingerprinting on encrypted network traffic. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2020.
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
- Chen, L.; Gao, S.; Liu, B.; Lu, Z.; Jiang, Z. THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. J. Supercomput. 2020, 76, 7489–7518.
- Bakhshi, T.; Ghita, B. Anomaly detection in encrypted internet traffic using hybrid deep learning. Secur. Commun. Netw. 2021, 2021, 5363750.
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Ann. Intern. Med. 2009, 151, 264–269.
- Keele, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE Technical Report, ver. 2.3; School of Computer Science and Mathematics, Keele University: Keele, Staffs, UK, 2007.
- Stratosphere Lab. CTU-Malware-Capture-Botnet. Available online: https://www.stratosphereips.org/datasets-malware (accessed on 31 December 2023).
- Duncan, D.B. Malware Traffic Analysis. Available online: https://www.malware-traffic-analysis.net/ (accessed on 31 December 2023).
- Chao, D. A Mining Policy based Malicious Encrypted Traffic Detection Scheme. In Proceedings of the 2020 9th International Conference on Computing and Pattern Recognition, Xiamen, China, 30 October–1 November 2020; pp. 130–135.
- Chen, L.; Jiang, Y.; Kuang, X.; Xu, A. Deep learning detection method of encrypted malicious traffic for power grid. In Proceedings of the 2020 IEEE International Conference on Energy Internet (ICEI), Sydney, NSW, Australia, 24–28 August 2020; IEEE: New York, NY, USA, 2020; pp. 86–91.
- UNB. VPN-nonVPN Dataset (ISCXVPN2016). Available online: https://www.unb.ca/cic/datasets/vpn.html (accessed on 31 December 2023).
- Yungshenglu. USTC-TFC2016 Dataset. Available online: https://github.com/yungshenglu/USTC-TFC2016 (accessed on 31 December 2023).
- UNB. NSL-KDD Dataset. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 31 December 2023).
- UNSW Sydney. The UNSW-NB15 Dataset. Available online: https://research.unsw.edu.au/projects/unsw-nb15-dataset (accessed on 31 December 2023).
- UNB. Intrusion Detection Evaluation Dataset (CIC-IDS2017). Available online: https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 31 December 2023).
- Garcia, N.; Alcaniz, T.; González-Vidal, A.; Bernabe, J.B.; Rivera, D.; Skarmeta, A. Distributed real-time SlowDoS attacks detection over encrypted traffic using Artificial Intelligence. J. Netw. Comput. Appl. 2021, 173, 102871.
- Huo, Y.; Zhao, F.; Zhang, H.; Zhuang, S.; Sun, J. AS-DMF: A Lightweight Malware Encrypted Traffic Detection Method Based on Active Learning and Feature Selection. Wirel. Commun. Mob. Comput. Online 2022, 2022, 1556768.
- Stratosphere Lab. The CTU-13 Dataset. Available online: https://www.stratosphereips.org/datasets-ctu13 (accessed on 31 December 2023).
- Yang, J.; Liang, G.; Li, B.; Wen, G.; Gao, T. A deep-learning- and reinforcement-learning-based system for encrypted network malicious traffic detection. Electron. Lett. 2021, 57, 363–365.
- Zhao, C.; Li, S.; Wu, X.; Han, W.; Tian, Z.; Chen, M. A Novel Malware Encrypted Traffic Detection Framework Based on Ensemble Learning. In Proceedings of the 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC), Shenzhen, China, 9–11 October 2021; IEEE: New York, NY, USA, 2021; pp. 614–620.
- Datacon. Datacon2020. Available online: https://datacon.qianxin.com/opendata/maliciousstream (accessed on 31 December 2023).
- Zhang, S.; Bu, Y.; Chen, B.; Lu, X. Transfer learning for encrypted malicious traffic detection based on EfficientNet. In Proceedings of the 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC), Shanghai, China, 23–25 April 2021; IEEE: New York, NY, USA, 2021; pp. 72–76.
- De Lucia, M.J.; Cotton, C. Detection of encrypted malicious network traffic using machine learning. In Proceedings of the MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 12–14 November 2019; IEEE: New York, NY, USA, 2019; pp. 1–6.
- Zeng, Y.; Gu, H.; Wei, W.; Guo, Y. Deep-Full-Range: A deep learning based network encrypted traffic classification and intrusion detection framework. IEEE Access 2019, 7, 45182–45190.
- UNB. Intrusion Detection Evaluation Dataset (ISCXIDS2012). Available online: https://www.unb.ca/cic/datasets/ids.html (accessed on 31 December 2023).
- Han, S.; Wu, Q.; Zhang, H.; Qin, B. Light-Weight Unsupervised Anomaly Detection for Encrypted Malware Traffic. In Proceedings of the 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), Guilin, China, 11–13 July 2022; IEEE: New York, NY, USA, 2022; pp. 206–213.
- Zhao, Z.; Li, Z.; Jiang, J.; Yu, F.; Zhang, F.; Xu, C.; Zhao, X.; Zhang, R.; Guo, S. ERNN: Error-Resilient RNN for Encrypted Traffic Detection towards Network-Induced Phenomena. IEEE Trans. Dependable Secur. Comput. 2023, 1–18.
- Wang, Z.; Li, M.; Ou, H.; Pang, S.; Yue, Z. A Few-Shot Malicious Encrypted Traffic Detection Approach Based on Model-Agnostic Meta-Learning. Secur. Commun. Netw. 2023, 2023, 3629831.
- UNB. Android Malware Dataset (CIC-AndMal2017). Available online: https://www.unb.ca/cic/datasets/andmal2017.html (accessed on 31 December 2023).
- Niu, Z.; Xue, J.; Qu, D.; Wang, Y.; Zheng, J.; Zhu, H. A novel approach based on adaptive online analysis of encrypted traffic for identifying Malware in IIoT. Inf. Sci. 2022, 601, 162–174.
- Malware Capture Facility Project. Available online: https://mcfp.weebly.com/ (accessed on 31 December 2023).
- Li, M.; Song, X.; Zhao, J.; Cui, B. TCMal: A Hybrid Deep Learning Model for Encrypted Malicious Traffic Classification. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2022; IEEE: New York, NY, USA, 2022; pp. 1634–1640.
- Stratosphere Lab. Stratosphere Laboratory Datasets. Available online: https://www.stratosphereips.org/datasets-overview (accessed on 31 December 2023).
- Liu, J.; Li, Z.; Wang, J.; Yan, T.; An, D.; Zhou, C.; Chen, G. A Weakly-Supervised Method for Encrypted Malicious Traffic Detection. In Proceedings of the International Symposium on Grids & Clouds 2022, Virtual, 21–25 March 2022; p. 27.
- Ferriyan, A.; Thamrin, A.H.; Takeda, K.; Murai, J. Encrypted malicious traffic detection based on word2vec. Electronics 2022, 11, 679.
- Jstrosch, D. Malware-Samples. Available online: https://github.com/jstrosch/malware-samples (accessed on 31 December 2023).
- Zhang, X.; Zhao, M.; Wang, J.; Li, S.; Zhou, Y.; Zhu, S. Deep-forest-based encrypted malicious traffic detection. Electronics 2022, 11, 977.
- Zheng, J.; Zeng, Z.; Feng, T. GCN-ETA: High-efficiency encrypted malicious traffic detection. Secur. Commun. Netw. 2022, 2022, 4274139.
- Zhang, X.; Lu, J.; Sun, J.; Xiao, R.; Jin, S. MEMTD: Encrypted Malware Traffic Detection Using Multimodal Deep Learning. In Proceedings of the International Conference on Web Engineering, Bari, Italy, 5–8 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 357–372.
- Li, M.; Wu, Z.; Chen, K.; Wang, W. Adversarial Malicious Encrypted Traffic Detection Based on Refined Session Analysis. Symmetry 2022, 14, 2329.
- Wang, Z.; Fok, K.W.; Thing, V.L. Machine learning for encrypted malicious traffic detection: Approaches, datasets and comparative study. Comput. Secur. 2022, 113, 102542.
- UNSW Sydney. UNSW NS 2019 Dataset. Available online: https://iotanalytics.unsw.edu.au/attack-data.html (accessed on 31 December 2023).
- Bader, O.; Lichy, A.; Hajaj, C.; Dubin, R.; Dvir, A. MalDIST: From encrypted traffic classification to malware traffic detection and classification. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; IEEE: New York, NY, USA, 2022; pp. 527–533.
- Fu, Z.; Liu, M.; Qin, Y.; Zhang, J.; Zou, Y.; Yin, Q.; Li, Q.; Duan, H. Encrypted malware traffic detection via graph-based network analysis. In Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses, Limassol, Cyprus, 26–28 October 2022; pp. 495–509.
- UNB. CIC-InvesAndMal2019. Available online: https://www.unb.ca/cic/datasets/invesandmal2019.html (accessed on 31 December 2023).
- Alzighaibi, A.R. Detection of DoH Traffic Tunnels Using Deep Learning for Encrypted Traffic Classification. Computers 2023, 12, 47.
- UNB. CIRA-CIC-DoHBrw-2020. Available online: https://www.unb.ca/cic/datasets/dohbrw-2020.html (accessed on 31 December 2023).
- Liu, J.; Wang, L.; Hu, W.; Gao, Y.; Cao, Y.; Lin, B.; Zhang, R. Spatial-Temporal Feature with Dual-Attention Mechanism for Encrypted Malicious Traffic Detection. Secur. Commun. Netw. 2023, 2023, 7117863.
- Wang, Z.; Thing, V.L. Feature mining for encrypted malicious traffic detection with deep learning and other machine learning algorithms. Comput. Secur. 2023, 128, 103143.
- Stratosphere Lab. CTU-Normal-Captures. Available online: https://www.stratosphereips.org/datasets-normal (accessed on 31 December 2023).
- Stratosphere Lab. CTU-Mixed-Captures. Available online: https://www.stratosphereips.org/datasets-mixed (accessed on 31 December 2023).
- Hong, Y.; Li, Q.; Yang, Y.; Shen, M. Graph based Encrypted Malicious Traffic Detection with Hybrid Analysis of Multi-view Features. Inf. Sci. 2023, 644, 119229.
- Singh, A.P.; Singh, M. Real time malware detection in encrypted network traffic using machine learning with time based features. J. Discret. Math. Sci. Cryptogr. 2023, 26, 841–850.
- Xing, J.; Wu, C. Detecting anomalies in encrypted traffic via deep dictionary learning. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; IEEE: New York, NY, USA, 2020; pp. 734–739.
- Bahlali, A.R.; Bachir, A.; Cheriet, A. Malicious Encrypted Network Traffic Detection using Deep Auto-Encoder with A Custom Reconstruction Loss. In Proceedings of the 10th International Symposium on Networks, Computers and Communications (ISNCC’23), Doha, Qatar, 23–26 October 2023.
- UNB. CSE-CIC-IDS2018 on AWS. Available online: https://www.unb.ca/cic/datasets/ids-2018.html (accessed on 31 December 2023).
- Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; IEEE: New York, NY, USA, 2009; pp. 1–6.
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; IEEE: New York, NY, USA, 2015; pp. 1–6.
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP 2018, 1, 108–116.
- Keyes, D.S.; Li, B.; Kaur, G.; Lashkari, A.H.; Gagnon, F.; Massicotte, F. EntropLyzer: Android malware classification and characterization using entropy analysis of dynamic characteristics. In Proceedings of the 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS), Hamilton, ON, Canada, 18–19 May 2021; IEEE: New York, NY, USA, 2021; pp. 1–12.
- MontazeriShatoori, M.; Davidson, L.; Kaur, G.; Lashkari, A.H. Detection of DoH tunnels using time-series classification of encrypted traffic. In Proceedings of the 2020 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calgary, AB, Canada, 17–22 August 2020; IEEE: New York, NY, USA, 2020; pp. 63–70.
- Lashkari, A.H.; Gil, G.D.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of Tor traffic using time based features. In Proceedings of the International Conference on Information Systems Security and Privacy, Porto, Portugal, 19–21 February 2017; SciTePress: Setúbal, Portugal, 2017; pp. 253–262.
- Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of encrypted and VPN traffic using time-related features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), Rome, Italy, 19–21 February 2016; pp. 407–414.
- Tiwari, A.; Saraswat, S.; Dixit, U.; Pandey, S. Refinements in Zeek Intrusion Detection System. In Proceedings of the 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 25–26 March 2022; IEEE: New York, NY, USA, 2022; pp. 974–979.
- Zeek. The Zeek Network Security Monitor. Available online: https://github.com/zeek/zeek (accessed on 31 December 2023).
- Liu, J.; Tian, Z.; Zheng, R.; Liu, L. A distance-based method for building an encrypted malware traffic identification framework. IEEE Access 2019, 7, 100014–100028.
- Xin, G.; Xixi, Z.; Haoguang, X.; Liang, G.; Yaning, M.; Xin, M.; Chenni, D.; Xiaorong, D.; Haichuan, S.; Liguo, W. An anomaly detection method of encrypted traffic based on user behavior. In Proceedings of the 2021 1st International Conference on Control and Intelligent Robotics, Guangzhou, China, 18–20 June 2021; pp. 51–56.
- Şahin, D.Ö.; Kural, O.E.; Akleylek, S.; Kılıç, E. A novel permission-based Android malware detection system using feature selection based on linear regression. Neural Comput. Appl. 2021, 35, 1–16.
- Zou, X.; Hu, Y.; Tian, Z.; Shen, K. Logistic regression model optimization and case analysis. In Proceedings of the 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 19–20 October 2019; IEEE: New York, NY, USA, 2019; pp. 135–139.
- Salmi, N.; Rustam, Z. Naïve Bayes classifier models for predicting the colon cancer. In Proceedings of the IOP Conference Series: Materials Science and Engineering, Malang, Indonesia, 20–21 March 2019; IOP Publishing: Bristol, UK, 2019; p. 052068.
- Islam, R.; Devnath, M.K.; Samad, M.D.; Al Kadry, S.M.J. GGNB: Graph-based Gaussian naive Bayes intrusion detection system for CAN bus. Veh. Commun. 2022, 33, 100442.
- Astuti, L.D.; Haryanto, H. Metode Pohon Keputusan Menggunakan Algoritma c4.5 untuk Pengelompokkan Data Penduduk pada Tingkatan Kesejahteraan Keluarga. Available online: https://core.ac.uk/display/35382395 (accessed on 25 January 2024).
- Lewis, R.J. An introduction to classification and regression tree (CART) analysis. In Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, CA, USA, 22–25 May 2000; Citeseer: Pennsylvania, PA, USA, 2000.
- Bansal, M.; Goyal, A.; Choudhary, A. A comparative analysis of K-nearest neighbor, genetic, support vector machine, decision tree, and long short term memory algorithms in machine learning. Decis. Anal. J. 2022, 3, 100071.
- Arpit, D.; Wang, H.; Zhou, Y.; Xiong, C. Ensemble of averages: Improving model selection and boosting performance in domain generalization. Adv. Neural Inf. Process. Syst. 2022, 35, 8265–8277.
- Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: San Francisco, CA, USA, 2016; pp. 785–794.
- Özgür, A.; Erdem, H. A Review of KDD99 Dataset Usage in Intrusion Detection and Machine Learning between 2010 and 2015. 2016. Available online: https://peerj.com/preprints/1954/ (accessed on 25 January 2024).
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
- Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. FS-Net: A flow sequence network for encrypted traffic classification. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; IEEE: New York, NY, USA, 2019; pp. 1171–1179.
- Hu, W.; Hu, W.; Maybank, S. Adaboost-based algorithm for network intrusion detection. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2008, 38, 577–583.
- Wu, Q.-W.; Cao, R.-F.; Xia, J.-F.; Ni, J.-C.; Zheng, C.-H.; Su, Y.-S. Extra trees method for predicting LncRNA-disease association based on multi-layer graph embedding aggregation. IEEE ACM Trans. Comput. Biol. Bioinform. 2021, 19, 3171–3178.
- Montufar, G.F.; Pascanu, R.; Cho, K.; Bengio, Y. On the number of linear regions of deep neural networks. Adv. Neural Inf. Process. Syst. 2014, 2, 2924–2932.
- Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 2021, 10, 2470.
- Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; IEEE: New York, NY, USA, 2017; pp. 1597–1600.
- Yuan, C.; Yang, H. Research on K-value selection method of K-means clustering algorithm. J 2019, 2, 226–235.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114.
Bibliographic Database | Search Queries |
---|---|
Web of Science (# of results: 56) | (ALL = (encrypted) AND ALL = (network) AND ALL = (traffic) AND ALL = (anomaly or anomalies) AND ALL = (detection or detecting)) AND (PY == (“2023” OR “2022” OR “2021” OR “2020” OR “2019”)) |
Scopus (# of results: 89) | (TITLE-ABS-KEY (encrypted) AND TITLE-ABS-KEY (network) AND TITLE-ABS-KEY (traffic) AND TITLE-ABS-KEY (detection OR detecting) AND TITLE-ABS-KEY (anomaly OR anomalies)) AND PUBYEAR > 2018 AND PUBYEAR < 2024 |
ACM (# of results: 121) | AllField:(encrypted) AND AllField:(network) AND AllField:(traffic) AND Title:(anomal*) AND AllField:(detect*) “filter”: {E-Publication Date: (1 January 2019 TO 31 August 2023)} |
IEEE Xplore (# of results: 60) | (“All Metadata”:encrypted) AND (“All Metadata”:network) AND (“All Metadata”:traffic) AND (“All Metadata”:anomal*) AND (“All Metadata”:detect*) (Publication Year: 2019–2023) |
Article | Dataset | Encryption Protocol | Feature Extraction | Feature Selection | Preprocessing | Classification Algorithm | Performance Metrics |
---|---|---|---|---|---|---|---|
Chen et al. [5] | Mixed (CTU-Malware-Captures [9], MTA dataset [10]) | SSL/TLS | Log Information-Based Feature Extraction | - | Normalization (−1,1) | XGBoost, SVM, Random Forest | Accuracy, F1-score |
Chao et al. [11] | CTU-Malware-Captures [9] | SSL/TLS | Log Information-Based Feature Extraction | - | - | LightGBM | Accuracy, Precision, Recall, F1-score, FPR, FNR, TNR |
Chen et al. [12] | ISCX VPN-nonVPN [13], USTC-TFC2016 [14], Self-collected data in the power system environment | SSL, SFTP, FTPS | - | Select only the first 784 bytes of the session to use as a feature | Length unification, Convert to two-dimensional data, Converted to 2D grayscale images | 1D-CNN, 2D-CNN | Precision, Recall |
Bakhshi et al. [6] | NSL-KDD [15] | SSH | Statistics-based feature extraction | - | Normalization (0,1), One-Hot Encoding | CNN, LSTM, GRU, CNN+GRU | Accuracy, Precision, Recall, FPR, F1-Score |
Bakhshi et al. [6] | UNSW-NB15 [16] | TLS, SSH | Statistics-based feature extraction | - | Normalization (0,1), One-Hot Encoding | CNN, LSTM, GRU, CNN+GRU | Accuracy, Precision, Recall, FPR, F1-Score |
Bakhshi et al. [6] | CIC-IDS-2017 [17] | TLS, SSH | Statistics-based feature extraction | - | Normalization (0,1), One-Hot Encoding | CNN, LSTM, GRU, CNN+GRU | Accuracy, Precision, Recall, FPR, F1-Score |
Garcia et al. [18] | Self-collected SlowDoS dataset | TLS | Directly implemented Conversation Processor | - | Normalization | Autoencoder | Accuracy, Precision, Recall, FPR, F1-Score |
Huo et al. [19] | CTU-13 [20] | TLS | Log Information-Based Feature Extraction | Analysis of variance (ANOVA) method and mutual information (MI) | - | Random Forest, XGBoost, GNB | Accuracy, Precision, Recall, F1-score, FPR |
Yang et al. [21] | CTU-Malware-Captures [9] | TLS | - | Select only the first 784 bytes of the session to use as a feature | Length unification, data cleaning | ResNet | Accuracy, Precision, Recall, F1-score, MCC |
Zhao et al. [22] | Datacon 2020 Dataset [23] | TLS | Log Information-Based Feature Extraction | - | - | Ensemble (RF, NB, TextCNN) | Recall, FPR |
Zhang et al. [24] | Mixed (ISCX VPN-nonVPN [13], CTU-13 [20]) | SSL/TLS | Statistics-based feature extraction | - | Data cleaning, Length unification, Converted to 2D grayscale images | EfficientNet | Accuracy, Precision, Recall, F1-score |
De Lucia et al. [25] | CTU-13 [20] | TLS | Statistics-based feature extraction | - | - | SVM, 1D-CNN | Accuracy, Precision, Recall, F1-score, FPR |
Zeng et al. [26] | Mixed (ISCX VPN-nonVPN [13], ISCX 2012 IDS [27]) | SSL, HTTPS | - | - | Package Generation, Traffic Purification, Traffic Refiner, Length unification | 1D-CNN, LSTM, SAE | Precision, Recall, F1-score |
Han et al. [28] | Datacon2020 Dataset [23] | TLS | Statistics-based feature extraction | - | - | Autoencoder | Accuracy, Precision, Recall, F1-score |
Zhao et al. [29] | USTC-TFC2016 [14] | TLS | Statistics-based feature extraction | - | - | ERNN | Accuracy, F1-score |
Wang et al. [30] | ISCX VPN-nonVPN [13], CICAndMal2017 [31] | VPN, TLS | Statistics-based feature extraction | - | Length unification, Convert to two-dimensional data, Converted to 2D grayscale images | 2D-CNN | Accuracy, Precision, F1-score |
Niu et al. [32] | Mixed (MTA dataset [10], MCFP dataset [33], CTU-13 [20]) | TLS | Log Information-Based Feature Extraction | - | - | Improved Adaptive Random Forests | Precision, Recall, F1-score |
Li et al. [34] | MTA dataset [10], STRA dataset [35], USTC-TFC2016 [14] | - | Statistics-based feature extraction | Correlation analysis | Length unification | TCMal (Transformer Encoder, CNN) | Accuracy, Precision, Recall, F1-score |
Liu et al. [36] | ISCX VPN-nonVPN [13], USTC-TFC2016 [14] | SSL/TLS | Statistics-based feature extraction | - | Convert to two-dimensional data, Converted to 2D grayscale images | ConvLaddernet (CNN, Ladder network) | Accuracy, Precision, Recall, F1-score |
Ferriyan et al. [37] | Mixed (CTU-Malware-Captures [9], Jason Stroschein’s public GitHub malware dataset [38]) | TLS | Extract TLS session capability from raw PCAP file | - | Convert TLS session extraction words to 300-dimensional vectors | CBOW-LSTM, CBOW-BiLSTM, Skip-gram LSTM, Skip-gram BiLSTM | F1-score |
Zhang et al. [39] | Mixed (MCFP dataset [33], ISCX VPN-nonVPN [13]), MTA dataset [10] | SSL/TLS | Statistics-based feature extraction | Traffic processing: removes special information that prevents classification (SNI, packet header) and extracts only the first N bytes of the session | Convert to two-dimensional data, Converted to 2D grayscale images | DF-IDS (XGBoost, Random Forest, Extra Trees) | Recall, FPR |
Zheng et al. [40] | Datacon 2020 Dataset [23] | SSL/TLS | Statistics-based feature extraction | - | - | Linear Regression, BernoulliNB, Decision Trees, XGBoost, GCN-TC, GCN + XGBoost, GCN + Random Forest, GCN + KNN, GCN + DT, GCN-ETA | Accuracy, F1-score, AUC |
Zhang et al. [41] | CTU-Malware-Captures [9] | TLS | MEMTD translates raw traffic into TLS, HPB, PLS, PAIS, extracting that information into features | - | - | Contextual LSTM, FS-Net, FusionNet | F1-score |
Li et al. [42] | CTU-Malware-Captures [9] | SSL/TLS | Statistics-based feature extraction | Delete features unrelated to classification, such as IP and port | - | 1D-CNN, 2D-CNN | Accuracy, Precision, Recall, F1-score |
Wang et al. [43] | Mixed (UNSW NS 2019 [44], CICIDS-2017 [17], CIC-AndMal 2017 [31], MCFP dataset [33], CICIDS-2012 [27]) | SSL/TLS | Session-based feature extraction, Log Information-Based Feature Extraction | Select the features that fit the purpose of the five feature sets | - | Random Forest, KNN, CART, C4.5, MLP, NB, XGBoost, AdaBoost, Linear Regression, Logistic Regression | Accuracy, ROC-AUC, Recall, FPR |
Bader et al. [45] | Mixed (STRA dataset [35], ISCX VPN-non-VPN [13], MTA dataset [10]) | TLS | Generating session data for 32 TLS packets and then generating feature information | Select features and statistics in a TLS session | Generate a matrix (5 × 4) for 14 features of 5 TLS packets | 1D CNN, 2D CNN, Random Forest, SVM, KNN | Accuracy, Precision, Recall, F1-score |
Fu et al. [46] | CICInvesAndMal 2019 [47], EncMal2021 (self-collected) | SSL/TLS | Statistics-based feature extraction, Extract TLS information and DGA-related features | - | - | Random Forest, FS-Net, ST-Graph | Precision, Recall, FPR |
Ahmad et al. [48] | CIRA-CIC-DoHBrw-2020 [49] | HTTPS | Statistics-based feature extraction | - | Chi-square filtering (features with similarly non-numeric values are replaced with numeric values using the same chi-square filtering algorithm), Replace missing values (determination of valid values) | Stacking (Random Forest and Decision Tree) | Accuracy, Precision, Recall, F1-score |
Liu et al. [50] | CICAndMal2017 [31] | TLS | Statistics-based feature extraction | Ethernet head removal, IP address masking | Length unification | TLARNN (1D-CNN, biGRU) | Accuracy, Precision, Recall, F1-score |
Wang et al. [51] | Mixed (CTU-Malware-Captures [9], CTU-Normal-Captures [52], CTU-Mixed-Captures [53], CICIDS-2017 [17], CICIDS-2012 [27], CIRA-CIC-DoHBRW-2020 [49]) | SSL/TLS | Statistics-based feature extraction | - | - | Random Forest, Average Ensemble | Accuracy, Precision, Recall, F1-score, ROC-AUC, FPR |
Hong et al. [54] | Mixed (MCFP dataset [33], CTU-13 dataset [20]) | TLS | Log Information-Based Feature Extraction | - | Length unification, Convert to two-dimensional data, Convert to 2D grayscale images | KNN Graph-based MLP | Accuracy, Precision, Recall, F1-score |
Abhay et al. [55] | MCFP dataset [33] | HTTPS | Statistics-based feature extraction | - | Numericalization, Data Cleaning, Data Normalization | Random Forest, Decision Tree, Extra trees, AdaBoost | Accuracy, Precision, Recall, F1-score, Model building time, Detection time |
Xing et al. [56] | Mixed (CTU-13 [20], STRA dataset [35]) | SSL/TLS | Statistics-based feature extraction, Sequential Features Extracting | - | - | LSTM-based Autoencoder, Deep dictionary learning | Precision, Recall, F1-score |
Bahlali et al. [57] | UNSW-NB15 [16], CSE-CIC-IDS2018 [58] | HTTPS, SSH, TLS | Statistics-based feature extraction | - | - | Autoencoder | Accuracy, Precision, Recall, FAR, F1-score |
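Several rows of the table above list "extract the first N bytes of the session", "length unification", and "convert to 2D grayscale images" as processing steps. The following is a minimal sketch of how those steps could be combined, not the implementation used by any of the reviewed studies. The function name `session_to_grayscale`, the default of 784 bytes (a 28 × 28 grid), and zero-padding for short sessions are all assumptions made for illustration.

```python
import math
from typing import List


def session_to_grayscale(payload: bytes, n_bytes: int = 784) -> List[List[int]]:
    """Truncate or zero-pad a session to n_bytes and reshape it into a
    square grid of grayscale pixel values (0-255).

    Hypothetical helper: n_bytes and the padding byte are assumptions.
    """
    side = math.isqrt(n_bytes)
    if side * side != n_bytes:
        raise ValueError("n_bytes must be a perfect square")
    # Length unification: keep the first n_bytes, pad short sessions with zeros.
    fixed = payload[:n_bytes].ljust(n_bytes, b"\x00")
    # Each byte is already an integer in 0..255, i.e. one grayscale pixel.
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]


# A 5-byte toy payload mapped onto a 4 x 4 grid (16 bytes after padding).
image = session_to_grayscale(b"\x16\x03\x01\x02\x00", n_bytes=16)
for row in image:
    print(row)
```

The resulting grid can be fed to a 2D CNN as a single-channel image. Any masking of fields such as SNI or IP addresses, which some of the studies apply, would have to happen before this step.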
Year of Research Publication | Number of Selected Studies |
---|---|
2019 | 2 |
2020 | 4 |
2021 | 5 |
2022 | 12 |
2023 | 9 |
Reference type | |
Journal | 18 |
Conference proceedings | 14 |
Type | Detection Algorithm | Count |
---|---|---|
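Machine Learning | Random Forest | 7 |
Machine Learning | XGBoost | 4 |
Machine Learning | Decision Trees | 3 |
Machine Learning | Naïve Bayes (NB) | 3 |
Machine Learning | Ensemble | 3 |
Machine Learning | SVM | 2 |
Machine Learning | Extra Trees | 2 |
Machine Learning | Linear Regression (LR) | 2 |
Machine Learning | KNN | 2 |
Machine Learning | AdaBoost | 2 |
Machine Learning | LightGBM | 1 |
Machine Learning | K-means | 1 |
Machine Learning | Improved Adaptive Random Forest | 1 |
Machine Learning | Logistic Regression | 1 |
Machine Learning | CART | 1 |
Deep Learning | CNN | 10 |
Deep Learning | Autoencoder | 5 |
Deep Learning | LSTM | 4 |
Deep Learning | FS-Net | 2 |
Deep Learning | GRU | 2 |
Deep Learning | ResNet | 1 |
Deep Learning | EfficientNet | 1 |
Deep Learning | Transformer Encoder | 1 |
Deep Learning | Error-Resilient RNN (ERNN) | 1 |
Deep Learning | Ladder network | 1 |
Deep Learning | FusionNet | 1 |
Deep Learning | ST-Graph | 1 |
Deep Learning | Deep dictionary learning | 1 |
Deep Learning | Multi-Layer Perceptron (MLP) | 1 |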
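The counts in the table above can be totaled per category as a quick arithmetic check. The sketch below transcribes the numbers from the table into a hypothetical `counts` dictionary. The totals count algorithm uses rather than studies, since one study may evaluate several algorithms.

```python
# Counts transcribed from the table above (algorithm -> number of uses).
counts = {
    "Machine Learning": {
        "Random Forest": 7, "XGBoost": 4, "Decision Trees": 3,
        "Naive Bayes": 3, "Ensemble": 3, "SVM": 2, "Extra Trees": 2,
        "Linear Regression": 2, "KNN": 2, "AdaBoost": 2, "LightGBM": 1,
        "K-means": 1, "Improved Adaptive Random Forest": 1,
        "Logistic Regression": 1, "CART": 1,
    },
    "Deep Learning": {
        "CNN": 10, "Autoencoder": 5, "LSTM": 4, "FS-Net": 2, "GRU": 2,
        "ResNet": 1, "EfficientNet": 1, "Transformer Encoder": 1,
        "ERNN": 1, "Ladder network": 1, "FusionNet": 1, "ST-Graph": 1,
        "Deep dictionary learning": 1, "MLP": 1,
    },
}

# Total uses per category and the most frequently used algorithm in each.
totals = {kind: sum(algos.values()) for kind, algos in counts.items()}
most_used = {kind: max(algos, key=algos.get) for kind, algos in counts.items()}

print(totals)     # {'Machine Learning': 35, 'Deep Learning': 32}
print(most_used)  # {'Machine Learning': 'Random Forest', 'Deep Learning': 'CNN'}
```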
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ji, I.H.; Lee, J.H.; Kang, M.J.; Park, W.J.; Jeon, S.H.; Seo, J.T. Artificial Intelligence-Based Anomaly Detection Technology over Encrypted Traffic: A Systematic Literature Review. Sensors 2024, 24, 898. https://doi.org/10.3390/s24030898