Extraction of Minimal Set of Traffic Features Using Ensemble of Classifiers and Rank Aggregation for Network Intrusion Detection Systems
Abstract
1. Introduction
- A novel scheme for ordering traffic features. It consists of selecting classifiers, optimizing their hyperparameters, computing individual feature rankings from the best variant of each classifier, and finally aggregating them into a weighted feature ranking, which yields the ultimate ordered set of features.
- An analysis of the features of the NF-UQ-NIDS-v2 dataset, sorted using the proposed scheme according to their influence on decision-tree-based classification, considering both accuracy and computational efficiency.
- A final decision-tree-based classifier with a mechanism for tuning the trade-off between speed and accuracy. By selecting different thresholds on the ordered feature list, one may increase accuracy at the cost of inference speed, and vice versa.
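The ranking scheme outlined in the contributions above can be illustrated with a minimal sketch. It assumes that each classifier yields per-feature importance scores, that these are converted to ranks, and that the ranks are combined as a weighted mean. The weighting rule, the function names, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def ranks_from_importances(importances: Dict[str, float]) -> Dict[str, int]:
    """Turn importance scores into ranks (1 = most important)."""
    ordered = sorted(importances, key=lambda f: (-importances[f], f))
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_rankings(
    per_classifier: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Order features by weighted mean rank across classifiers (lower is better)."""
    total_weight = sum(weights[name] for name in per_classifier)
    scores: Dict[str, float] = {}
    for name, importances in per_classifier.items():
        for feature, rank in ranks_from_importances(importances).items():
            scores[feature] = scores.get(feature, 0.0) + weights[name] * rank / total_weight
    return sorted(scores, key=lambda f: (scores[f], f))


# Toy importances, invented for illustration only (not results from the paper).
toy_importances = {
    "RF": {"IN_BYTES": 0.5, "MIN_TTL": 0.3, "TCP_FLAGS": 0.2},
    "ET": {"IN_BYTES": 0.2, "MIN_TTL": 0.5, "TCP_FLAGS": 0.3},
    "AB": {"IN_BYTES": 0.6, "MIN_TTL": 0.1, "TCP_FLAGS": 0.3},
}
toy_weights = {"RF": 0.95, "ET": 0.90, "AB": 0.85}  # hypothetical classifier weights

ordered_features = aggregate_rankings(toy_importances, toy_weights)
top_k = ordered_features[:2]  # a threshold on the ordered list selects the subset
```

Cutting the ordered list at a smaller or larger `top_k` is the speed-to-accuracy knob mentioned in the last contribution: fewer features mean faster inference, more features mean higher accuracy.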
2. Previous Works
2.1. Selecting Network Traffic Features
2.2. Network Traffic Datasets
3. Traffic Features
3.1. NF-UQ-NIDS-v2 Dataset
3.2. Traffic Features in NF-UQ-NIDS-v2
- Categorical (‘c’ in the second column of Table 3), comprising 14 features. This type includes addresses and identifiers of specific communication protocols.
- Numerical (‘n’ in the second column of Table 3), comprising the remaining 29 features. This type covers all features that represent numeric traffic data, such as average throughput, the number of transmitted packets, or the duration of the traffic.
- General information about protocols (group I). This group contains details of the protocols used in the communication, such as the type of L7 protocol, the ICMP type, or the FTP command return code. In general, not all features in this group depend strictly on the network infrastructure, but in some cases such a dependence may occur. For instance, ICMP reports on the state of links within the network, and some of its messages explicitly indicate that a destination is unreachable or that the time allowed for obtaining an answer has passed. The same holds for DNS queries, which differ depending on the IP protocol version. If IPv4 is in use, DNS works with A records, whereas for IPv6, AAAA records are used.
- Addressing data (group II). When preparing network traffic datasets, authors should anonymize or remove exact IP addresses, as they may bias machine learning tools. Port numbers may or may not be helpful for research purposes: some network attacks are tied to specific port numbers on the victim’s machine, but attackers can easily change them. When attacks are emulated in a laboratory network, their addresses are entirely unimportant. However, addressing data could be helpful when working with attacks captured in the real world. To sum up, it is believed that this sort of data should be omitted in studies like this one.
- TCP parameters (group III). TCP uses many parameters to establish a session between computers. NF-UQ-NIDS-v2 contains only two kinds of them: cumulative values of TCP flags and TCP window sizes. These features appear to be moderately connected to the infrastructure in which the traffic is emulated. In DoS or DDoS attacks on TCP session establishment, the so-called SYN flood, an attacker initiates a connection by setting the TCP SYN flag but does not complete it by acknowledging the server’s response, which exhausts the resources available to legitimate clients. Such attacks can be discovered by analyzing the state of the individual initiated sessions.
- Sent data (group IV). The analyzed dataset has a wide range of features describing the volume of sent data, e.g., the incoming number of bytes. It also includes the lengths of the shortest and longest packets of a flow. Features of this sort should be related more to the settings of the tools used by the attackers than to the infrastructure itself.
- Transmission parameters: time, speed, throughput, and TTL (group V). Features in this group represent traffic characteristics that ideally should not depend on how the infrastructure is built, although traffic speed and throughput are firmly related to the architecture of a particular computer network. A noticeable relation between TTL features and the traffic class was observed [15]. Therefore, the authors of NF-UQ-NIDS-v2 decided to exclude these features from their research. Most likely, they linked TTL values to the differing infrastructures in which the traffic was captured. TTL limits the number of nodes a packet may pass while traveling through the network. Each node that forwards the packet decreases the TTL value, which prevents packets from circulating in loops indefinitely. Extremely high TTL values may be associated with DoS or DDoS attacks that flood the network with a massive number of packets. Following this line of thinking, the times in which flows are delivered could also depend on the construction of the network. Nevertheless, features from this group seem more likely to depend on network standards than on the infrastructure.
- Retransmission parameters (group VI). These features indicate how many packets did not reach their destination and had to be resent. Packet retransmissions occur randomly, with no dependence on how the network is built.
- Packet sizes (group VII). This group contains five features that count packets in selected size ranges. As long as network devices impose no packet size restrictions, this group is assumed to be less likely to depend on the infrastructure.
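The grouping above lends itself to a simple lookup structure. The following sketch copies the feature-to-group assignment from Table 3 and drops the addressing data (group II), as the text recommends. The names `FEATURE_GROUPS` and `drop_groups` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Iterable, List

# Feature-to-group assignment copied from Table 3 (groups I to VII).
FEATURE_GROUPS: Dict[str, List[str]] = {
    "I": ["PROTOCOL", "L7_PROTO", "ICMP_TYPE", "ICMP_IPV4_TYPE", "DNS_QUERY_ID",
          "DNS_QUERY_TYPE", "DNS_TTL_ANSWER", "FTP_COMMAND_RET_CODE"],
    "II": ["IPV4_SRC_ADDR", "L4_SRC_PORT", "IPV4_DST_ADDR", "L4_DST_PORT"],
    "III": ["TCP_FLAGS", "CLIENT_TCP_FLAGS", "SERVER_TCP_FLAGS",
            "TCP_WIN_MAX_IN", "TCP_WIN_MAX_OUT"],
    "IV": ["IN_BYTES", "IN_PKTS", "OUT_BYTES", "OUT_PKTS", "LONGEST_FLOW_PKT",
           "SHORTEST_FLOW_PKT", "MIN_IP_PKT_LEN", "MAX_IP_PKT_LEN"],
    "V": ["FLOW_DURATION_MILLISECONDS", "DURATION_IN", "DURATION_OUT",
          "MIN_TTL", "MAX_TTL", "SRC_TO_DST_SECOND_BYTES", "DST_TO_SRC_SECOND_BYTES",
          "SRC_TO_DST_AVG_THROUGHPUT", "DST_TO_SRC_AVG_THROUGHPUT"],
    "VI": ["RETRANSMITTED_IN_BYTES", "RETRANSMITTED_IN_PKTS",
           "RETRANSMITTED_OUT_BYTES", "RETRANSMITTED_OUT_PKTS"],
    "VII": ["NUM_PKTS_UP_TO_128_BYTES", "NUM_PKTS_128_TO_256_BYTES",
            "NUM_PKTS_256_TO_512_BYTES", "NUM_PKTS_512_TO_1024_BYTES",
            "NUM_PKTS_1024_TO_1514_BYTES"],
}


def drop_groups(groups: Dict[str, List[str]], excluded: Iterable[str]) -> List[str]:
    """Return all feature names except those belonging to the excluded groups."""
    skip = set(excluded)
    return [name for group, names in groups.items() if group not in skip for name in names]


# Omit addressing data (group II), following the recommendation in the text.
kept_features = drop_groups(FEATURE_GROUPS, excluded=["II"])
```

Keeping the assignment in one place makes it easy to test other exclusions, for example removing group V as well to check how much the classifier relies on TTL and throughput.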
4. Methodology
Algorithm 1. A summary of the research steps performed in Section 4.
4.1. Preliminary Selection of Features
- LONGEST_FLOW_PKT with MAX_IP_PKT_LEN;
- MIN_TTL with MAX_TTL;
- TCP_FLAGS with CLIENT_TCP_FLAGS;
- RETRANSMITTED_OUT_BYTES with RETRANSMITTED_OUT_PKTS;
- OUT_BYTES with NUM_PKTS_1024_TO_1514_BYTES.
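The pairs above are listed without the criterion used to find them. Assuming they were flagged because the two features in each pair are strongly correlated (an assumption, not a statement from the source), a pairwise check could be sketched as follows. The threshold of 0.95, the helper names, and the toy flow values are all invented for illustration.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(
    columns: Dict[str, List[float]], threshold: float = 0.95
) -> List[Tuple[str, str]]:
    """List feature pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            pairs.append((a, b))
    return pairs


# Toy flows, invented for illustration only (not taken from NF-UQ-NIDS-v2).
toy_flows = {
    "MIN_TTL": [64, 128, 64, 255, 64],
    "MAX_TTL": [64, 128, 64, 255, 64],
    "IN_PKTS": [3, 1, 7, 2, 9],
}
flagged = correlated_pairs(toy_flows)
```

One feature of each flagged pair can then be dropped before ranking, since two near-duplicate columns carry little extra information for a tree-based classifier.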
4.2. Classification and Feature Selection Algorithms
4.3. Features Ranking
5. Tests
5.1. Choice of Optimal Classifiers
5.2. Feature Ranking Preparation
5.3. Search for the Minimum Set of Features
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Wright, D.; Kumar, R. Assessing the socio-economic impacts of cybercrime. Soc. Impacts 2023, 1, 100013.
2. Altulaihan, E.; Almaiah, M.A.; Aljughaiman, A. Anomaly Detection IDS for Detecting DoS Attacks in IoT Networks Based on Machine Learning Algorithms. Sensors 2024, 24, 713.
3. Kshirsagar, D.; Kumar, S. Towards an intrusion detection system for detecting web attacks based on an ensemble of filter feature selection techniques. Cyber-Phys. Syst. 2023, 9, 244–259.
4. Ashoor, A.S.; Gore, S. Importance of intrusion detection system (IDS). Int. J. Sci. Eng. Res. 2011, 2, 1–4.
5. Dhal, P.; Azad, C. A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 2022, 52, 4543–4581.
6. Thakkar, A.; Lohiya, R. A survey on intrusion detection system: Feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 2022, 55, 453–563.
7. Bouke, M.A.; Abdullah, A.; ALshatebi, S.H.; Abdullah, M.T. E2IDS: An enhanced intelligent intrusion detection system based on decision tree algorithm. J. Appl. Artif. Intell. 2022, 3, 1–16.
8. Ingre, B.; Yadav, A.; Soni, A.K. Decision tree based intrusion detection system for NSL-KDD dataset. In Proceedings of the Information and Communication Technology for Intelligent Systems (ICTIS 2017)-Volume 22, Ahmedabad, India, 25–26 March 2017; Springer: Berlin/Heidelberg, Germany, 2018; pp. 207–218.
9. Rai, K.; Devi, M.S.; Guleria, A. Decision tree based algorithm for intrusion detection. Int. J. Adv. Netw. Appl. 2016, 7, 2828.
10. Awad, M.; Fraihat, S. Recursive feature elimination with cross-validation with decision tree: Feature selection method for machine learning-based intrusion detection systems. J. Sens. Actuator Netw. 2023, 12, 67.
11. Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 2017, 10, 1–20.
12. Guezzaz, A.; Benkirane, S.; Azrour, M.; Khurram, S. A reliable network intrusion detection approach using decision tree with enhanced data quality. Secur. Commun. Netw. 2021, 2021, 1230593.
13. Jain, A.; Patel, H.; Nagalapatti, L.; Gupta, N.; Mehta, S.; Guttula, S.; Mujumdar, S.; Afzal, S.; Sharma Mittal, R.; Munigala, V. Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 3561–3562.
14. Gupta, N.; Mujumdar, S.; Patel, H.; Masuda, S.; Panwar, N.; Bandyopadhyay, S.; Mehta, S.; Guttula, S.; Afzal, S.; Sharma Mittal, R.; et al. Data quality for machine learning tasks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, 14–18 August 2021; pp. 4040–4041.
15. Sarhan, M.; Layeghy, S.; Portmann, M. Towards a standard feature set for network intrusion detection system datasets. Mob. Netw. Appl. 2022, 27, 357–370.
16. Claise, B. Cisco Systems NetFlow Services Export Version 9; RFC 3954. 2004. Available online: https://www.rfc-editor.org/info/rfc3954 (accessed on 29 July 2024).
17. Aitken, P.; Claise, B.; Trammell, B. Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information; RFC 7011. 2013. Available online: https://www.rfc-editor.org/info/rfc7011 (accessed on 29 July 2024).
18. Mostert, W.; Malan, K.M.; Engelbrecht, A.P. A feature selection algorithm performance metric for comparative analysis. Algorithms 2021, 14, 100.
19. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
20. Ferreira, A.J.; Figueiredo, M.A. Efficient feature selection filters for high-dimensional data. Pattern Recognit. Lett. 2012, 33, 1794–1804.
21. Komisarek, M.; Pawlicki, M.; Kozik, R.; Hołubowicz, W.; Choraś, M. How to Effectively Collect and Process Network Data for Intrusion Detection? Entropy 2021, 23, 1532.
22. Honest, N. A survey on Feature Selection Techniques. GIS Sci. J. 2020, 7, 353–358.
23. Smith, J.; Doe, J. Analysis of Basic Features in Network Traffic for Intrusion Detection. J. Netw. Secur. 2020, 15, 112–130.
24. Lee, A.; Chen, B. Evaluating Payload Content for Advanced Intrusion Detection. In Proceedings of the International Conference on Cybersecurity, Virtual Event, 26–28 July 2021; pp. 345–356.
25. Kumar, R.; Patel, S. Time-Based Feature Analysis for Real-Time Intrusion Detection. IEEE Trans. Inf. Forensics Secur. 2022, 17, 987–1001.
26. Martinez, C.; Lopez, S. Behavioral Feature Profiling for Network Intrusion Detection. J. Comput. Netw. 2023, 18, 215–230.
27. Sharma, Y.; Sharma, S.; Arora, A. Feature ranking using statistical techniques for computer networks intrusion detection. In Proceedings of the 2022 7th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 22–24 June 2022; pp. 761–765.
28. Kumar, A.; Kumar, S. Intrusion detection based on machine learning and statistical feature ranking techniques. In Proceedings of the 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 19–20 January 2023; pp. 606–611.
29. Seijo-Pardo, B.; Bolón-Canedo, V.; Porto-Díaz, I.; Alonso-Betanzos, A. Ensemble feature selection for rankings of features. In Proceedings of the International Work-Conference on Artificial Neural Networks, Palma de Mallorca, Spain, 10–12 June 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 29–42.
30. He, W.; Li, H.; Li, J. Ensemble feature selection for improving intrusion detection classification accuracy. In Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science, Wuhan, China, 12–13 July 2019; pp. 28–33.
31. Krishnaveni, S.; Sivamohan, S.; Sridhar, S.; Prabakaran, S. Efficient feature selection and classification through ensemble method for network intrusion detection on cloud computing. Clust. Comput. 2021, 24, 1761–1779.
32. Karimi, Z.; Kashani, M.M.R.; Harounabadi, A. Feature ranking in intrusion detection dataset using combination of filtering methods. Int. J. Comput. Appl. 2013, 78, 21–27.
33. Arora, A.; Peddoju, S.K. Minimizing network traffic features for android mobile malware detection. In Proceedings of the 18th International Conference on Distributed Computing and Networking, Hyderabad, India, 5–7 January 2017; pp. 1–10.
34. Jha, S.K.; Arora, A. An enhanced intrusion detection system using combinational feature ranking and machine learning algorithms. In Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 24–26 June 2022; pp. 1–8.
35. Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167.
36. Krupski, J.; Graniszewski, W.; Iwanowski, M. Data Transformation Schemes for CNN-Based Network Traffic Analysis: A Survey. Electronics 2021, 10, 2042.
37. Pinto, A.; Herrera, L.C.; Donoso, Y.; Gutierrez, J.A. Survey on Intrusion Detection Systems Based on Machine Learning Techniques for the Protection of Critical Infrastructure. Sensors 2023, 23, 2415.
38. Pavlov, A.; Voloshina, N. Dataset Selection for Attacker Group Identification Methods. In Proceedings of the 2021 30th Conference of Open Innovations Association FRUCT, Oulu, Finland, 27–29 October 2021; pp. 171–176.
39. Ahmed, L.A.H.; Hamad, Y.A.M.; Abdalla, A.A.M.A. Network-based Intrusion Detection Datasets: A Survey. In Proceedings of the 2022 International Arab Conference on Information Technology (ACIT), Abu Dhabi, United Arab Emirates, 22–24 November.
40. De Keersmaeker, F.; Cao, Y.; Ndonda, G.K.; Sadre, R. A survey of public IoT datasets for network security research. IEEE Commun. Surv. Tutor. 2023, 25, 1808–1840.
41. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
42. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
43. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 2020, 8, 165130–165150.
44. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the International Conference on Information Systems Security and Privacy, Funchal, Portugal, 22–24 January 2018; pp. 108–116.
45. Gouda, H.A.; Ahmed, M.A.; Roushdy, M.I. Optimizing anomaly-based attack detection using classification machine learning. Neural Comput. Appl. 2024, 36, 3239–3257.
46. Adeniyi, O.; Sadiq, A.S.; Pillai, P.; Aljaidi, M.; Kaiwartya, O. Securing Mobile Edge Computing Using Hybrid Deep Learning Method. Computers 2024, 13, 25.
47. Qing, Y.; Liu, X.; Du, Y. Mitigating data imbalance to improve the generalizability in IoT DDoS detection tasks. J. Supercomput. 2023, 80, 9935–9960.
48. Gu, Z.; Lopez, D.T.; Alrahis, L.; Sinanoglu, O. Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs. In Proceedings of the 2024 25th International Symposium on Quality Electronic Design (ISQED), San Francisco, CA, USA, 3–5 April 2024; pp. 1–8.
49. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
50. Louppe, G.; Wehenkel, L.; Sutera, A.; Geurts, P. Understanding variable importances in forests of randomized trees. Adv. Neural Inf. Process. Syst. 2013, 26.
51. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
52. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
53. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
54. Hastie, T.; Rosset, S.; Zhu, J.; Zou, H. Multi-class adaboost. Stat. Its Interface 2009, 2, 349–360.
55. Freund, Y.; Schapire, R.E. Large margin classification using the perceptron algorithm. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, Madison, WI, USA, 24–26 July 1998; pp. 209–217.
56. Hoi, S.C.; Sahoo, D.; Lu, J.; Zhao, P. Online learning: A comprehensive survey. Neurocomputing 2021, 459, 249–289.
57. Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online passive aggressive algorithms. J. Mach. Learn. Res. 2006, 7, 551–585.
58. Zhang, T. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 116.
59. Saunders, C.; Gammerman, A.; Vovk, V. Ridge regression learning algorithm in dual variables. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA, 24–27 July 1998.
60. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9.
61. Molnar, C. Interpretable Machine Learning, 2nd ed.; Lulu.com: Morrisville, NC, USA, 2022.
62. Larriva-Novo, X.; Sánchez-Zas, C.; Villagrá, V.A.; Marín-Lopez, A.; Berrocal, J. Leveraging Explainable Artificial Intelligence in Real-Time Cyberattack Identification: Intrusion Detection System Approach. Appl. Sci. 2023, 13, 8587.
63. Alosaimi, S.; Almutairi, S.M. An intrusion detection system using BoT-IoT. Appl. Sci. 2023, 13, 5427.
64. Tareq, I.; Elbagoury, B.M.; El-Regaily, S.; El-Horbaty, E.S.M. Analysis of ton-iot, unw-nb15, and edge-iiot datasets using dl in cybersecurity for iot. Appl. Sci. 2022, 12, 9572.
65. Alzughaibi, S.; El Khediri, S. A cloud intrusion detection systems based on dnn using backpropagation and pso on the cse-cic-ids2018 dataset. Appl. Sci. 2023, 13, 2276.
66. Sobh, T.S.; Amer, M.I. Fpga-based network traffic security: Design and implementation using c5.0 decision tree classifier. J. Electron. Sci. Technol. 2013, 11, 393–403.
67. Abdulhammed, R.; Faezipour, M.; Elleithy, K.M. Network intrusion detection using hardware techniques: A review. In Proceedings of the 2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT), Farmingdale, NY, USA, 29 April 2016; pp. 1–7.
68. Ngo, D.M.; Lightbody, D.; Temko, A.; Pham-Quoc, C.; Tran, N.T.; Murphy, C.C.; Popovici, E. HH-NIDS: Heterogeneous hardware-based network intrusion detection framework for IoT security. Future Internet 2022, 15, 9.
69. Tchakoucht, T.A.; Ezziyyani, M. Building a fast intrusion detection system for high-speed-networks: Probe and dos attacks detection. Procedia Comput. Sci. 2018, 127, 521–530.
70. Larriva-Novo, X.; Vega-Barbas, M.; Villagra, V.A.; Rivera, D.; Alvarez-Campana, M.; Berrocal, J. Efficient distributed preprocessing model for machine learning-based anomaly detection over large-scale cybersecurity datasets. Appl. Sci. 2020, 10, 3430.
71. Moustafa, N.; Turnbull, B.; Choo, K.K.R. An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of internet of things. IEEE Internet Things J. 2018, 6, 4815–4830.
Method | Wrapper | Embedded | Filter | Hybrid |
---|---|---|---|---|
SVM Rank [29] | 1 | 4 | 1 | |
Voting [30] | 2 | 1 | 2 | |
Majority voting [31] | 5 | |||
Average importance factor [32] | 2 | |||
Naive Bayes Rank [33] | 2 | |||
Weighted average [34] | 1 | 2 | ||
Our approach | 6 |
Feature | NB15 | BoT | ToN | IDS2018 | NIDS-v2 |
---|---|---|---|---|---|
IPv4 source address | ✓ | ✓ | ✓ | ✓ | ✓ |
IPv4 source port number | ✓ | ✓ | ✓ | ✓ | ✓ |
IPv4 destination address | ✓ | ✓ | ✓ | ✓ | ✓ |
IPv4 destination port number | ✓ | ✓ | ✓ | ✓ | ✓ |
Incoming number of packets | ✓ | ✓ | ✓ | ✓ | ✓ |
Outgoing number of packets | ✓ | ✓ | ✓ | ✓ | ✓ |
Incoming number of bytes | ✓ | ✓ | ✓ | ✓ | |
Outgoing number of bytes | ✓ | ✓ | ✓ | ✓ | |
Flow duration | ✓ | ✓ | ✓ | ✓ | |
Protocol | ✓ | ✓ | ✓ | ✓ | |
Transaction state | ✓ | ✓ | ✓ | ||
Src to dst bytes/s | ✓ | ✓ | ✓ | ||
Dst to src bytes/s | ✓ | ✓ | ✓ | ||
Mean size of outgoing packet | ✓ | ✓ | |||
Mean size of incoming packet | ✓ | ✓ | |||
Start time of the traffic | ✓ | ✓ | |||
Pipelined depth into HTTP connection | ✓ | ✓ | |||
DNS query type | ✓ | ✓ | |||
Length of the smallest flow | ✓ | ✓ | |||
Length of the largest flow | ✓ | ✓ |
Feature | Type | Description | Group |
---|---|---|---|
1. IPV4_SRC_ADDR | c | IPv4 source address | II |
2. L4_SRC_PORT | c | IPv4 source port number | II |
3. IPV4_DST_ADDR | c | IPv4 destination address | II |
4. L4_DST_PORT | c | IPv4 destination port number | II |
5. PROTOCOL | c | IP protocol identifier byte | I |
6. L7_PROTO | c | Application protocol as a number | I |
7. IN_BYTES | n | Incoming number of bytes | IV |
8. IN_PKTS | n | Incoming number of packets | IV |
9. OUT_BYTES | n | Outgoing number of bytes | IV |
10. OUT_PKTS | n | Outgoing number of packets | IV |
11. TCP_FLAGS | c | Cumulative of all TCP flags | III |
12. CLIENT_TCP_FLAGS | c | Cumulative of all client TCP flags | III |
13. SERVER_TCP_FLAGS | c | Cumulative of all server TCP flags | III |
14. FLOW_DURATION_MILLISECONDS | n | Flow duration in milliseconds | V |
15. DURATION_IN | n | Incoming stream duration in milliseconds | V |
16. DURATION_OUT | n | Outgoing stream duration in milliseconds | V |
17. MIN_TTL | n | Minimal flow TTL | V |
18. MAX_TTL | n | Maximal flow TTL | V |
19. LONGEST_FLOW_PKT | n | Longest packet (bytes) of the flow | IV |
20. SHORTEST_FLOW_PKT | n | Shortest packet (bytes) of the flow | IV |
21. MIN_IP_PKT_LEN | n | Len of the smallest flow IP packet observed | IV |
22. MAX_IP_PKT_LEN | n | Len of the largest flow IP packet observed | IV |
23. SRC_TO_DST_SECOND_BYTES | n | Src to dst bytes/sec | V |
24. DST_TO_SRC_SECOND_BYTES | n | Dst to src bytes/sec | V |
25. RETRANSMITTED_IN_BYTES | n | Number of retransmitted TCP flow bytes (src->dst) | VI |
26. RETRANSMITTED_IN_PKTS | n | Number of retransmitted TCP flow packets (src->dst) | VI |
27. RETRANSMITTED_OUT_BYTES | n | Number of retransmitted TCP flow bytes (dst->src) | VI |
28. RETRANSMITTED_OUT_PKTS | n | Number of retransmitted TCP flow packets (dst->src) | VI |
29. SRC_TO_DST_AVG_THROUGHPUT | n | Src to dst average throughput (bps) | V |
30. DST_TO_SRC_AVG_THROUGHPUT | n | Dst to src average throughput (bps) | V |
31. NUM_PKTS_UP_TO_128_BYTES | n | Packets whose IP size ≤ 128 | VII |
32. NUM_PKTS_128_TO_256_BYTES | n | Packets whose IP size > 128 and ≤256 | VII |
33. NUM_PKTS_256_TO_512_BYTES | n | Packets whose IP size > 256 and ≤512 | VII |
34. NUM_PKTS_512_TO_1024_BYTES | n | Packets whose IP size > 512 and ≤1024 | VII |
35. NUM_PKTS_1024_TO_1514_BYTES | n | Packets whose IP size > 1024 and ≤1514 | VII |
36. TCP_WIN_MAX_IN | n | Max TCP Window (src->dst) | III |
37. TCP_WIN_MAX_OUT | n | Max TCP Window (dst->src) | III |
38. ICMP_TYPE | c | ICMP Type · 256 + ICMP code | I |
39. ICMP_IPV4_TYPE | c | ICMP Type. | I |
40. DNS_QUERY_ID | c | DNS query transaction Id. | I |
41. DNS_QUERY_TYPE | c | DNS query type (e.g., 1 = A, 2 = NS etc.) | I |
42. DNS_TTL_ANSWER | n | TTL of the first A record (if any) | I |
43. FTP_COMMAND_RET_CODE | c | FTP client command return code | I |
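Two of the categorical features in the table above pack several values into one number. The sketch below unpacks ICMP_TYPE, defined in the table as ICMP type times 256 plus ICMP code, and the cumulative TCP flag fields. It assumes that "cumulative" means the bitwise OR of all flags seen in the flow, with the standard TCP header bit positions. The function names, including the rough `client_never_acknowledged` heuristic for SYN-flood-like flows, are hypothetical and not part of the dataset tooling.

```python
from typing import List, Tuple

# Standard TCP header flag bits (low six bits of the flags field).
TCP_FLAG_BITS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
                 (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]


def decode_icmp_type(value: int) -> Tuple[int, int]:
    """Split ICMP_TYPE (type * 256 + code) into (icmp_type, icmp_code)."""
    return divmod(value, 256)


def decode_tcp_flags(value: int) -> List[str]:
    """Name the flags set in a cumulative flag field (assumed to be a bitwise OR)."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def client_never_acknowledged(client_tcp_flags: int) -> bool:
    """Rough heuristic: the client sent SYN but no segment with ACK in this flow."""
    return bool(client_tcp_flags & 0x02) and not client_tcp_flags & 0x10


icmp_type, icmp_code = decode_icmp_type(771)  # 3 * 256 + 3
handshake_flags = decode_tcp_flags(0x12)      # SYN and ACK bits set
```

A flow whose CLIENT_TCP_FLAGS value decodes to SYN alone matches the half-open pattern described for SYN floods in group III, although a single flow of this kind is not proof of an attack.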
Classifier | Tuned Parameter | Optimal Value | Tested Values |
---|---|---|---|
RF | The number of trees | 250 | {50, 100, 150, 200, 250} |
RF | Max. depth of the tree | 10 | {5, 10} |
ET | The number of trees | 200 | {50, 100, 150, 200, 250} |
ET | Max. depth of the tree | 10 | {5, 10} |
AB | The number of trees | 200 | {50, 100, 150, 200, 250} |
AB | The boosting algorithm | SAMME | {SAMME, SAMME.R} |
PA | Max. number of iterations | 100 | {100, 250, 500, 750, 1000} |
PA | The regularization (step size) | 0.5 | {0.5, 1} |
SVM | Max. number of iterations | 250 | {100, 250, 500, 750, 1000} |
SVM | The regularization | 0.0001 | {0.00005, 0.0001} |
R | Max. number of iterations | 100 | {100, 250, 500, 750, 1000} |
R | The regularization | 1.0 | {0.5, 1} |
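The tested values in the table above define a small search space per classifier. Assuming the optimization tried every combination (the source does not state the search strategy), the variants can be enumerated as in this sketch. The dictionary keys are descriptive labels chosen here, not parameter names of any particular library.

```python
from itertools import product
from typing import Any, Dict, List

ITERATIONS = [100, 250, 500, 750, 1000]
TREES = [50, 100, 150, 200, 250]

# Tested values copied from the table above; keys are illustrative labels.
TESTED_VALUES: Dict[str, Dict[str, List[Any]]] = {
    "RF": {"number_of_trees": TREES, "max_depth": [5, 10]},
    "ET": {"number_of_trees": TREES, "max_depth": [5, 10]},
    "AB": {"number_of_trees": TREES, "boosting_algorithm": ["SAMME", "SAMME.R"]},
    "PA": {"max_iterations": ITERATIONS, "regularization": [0.5, 1]},
    "SVM": {"max_iterations": ITERATIONS, "regularization": [0.00005, 0.0001]},
    "R": {"max_iterations": ITERATIONS, "regularization": [0.5, 1]},
}


def enumerate_grid(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return every combination of the tested values as a parameter dictionary."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


grids = {classifier: enumerate_grid(space) for classifier, space in TESTED_VALUES.items()}
total_variants = sum(len(variants) for variants in grids.values())
```

Under this assumption each classifier has 10 variants, so 60 models would be trained in total before the best variant of each classifier is kept for ranking.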
Features | Accuracy/Recall (Random Forest) | Accuracy/Recall (Decision Tree) | Precision (Random Forest) | Precision (Decision Tree) |
---|---|---|---|---|
1 | 88.5% | 78.8% | 88.9% | 75.6% |
2 | 90.6% | 83.2% | 91.5% | 83.3% |
3 | 91.6% | 88.3% | 91.0% | 88.7% |
4 | 93.0% | 90.7% | 92.5% | 89.9% |
5 | 92.7% | 90.7% | 92.6% | 89.9% |
6 | 92.8% | 90.7% | 92.7% | 90.0% |
7 | 93.7% | 92.1% | 93.2% | 92.4% |
8 | 94.2% | 95.3% | 93.8% | 95.1% |
9 | 95.1% | 95.5% | 94.7% | 95.2% |
10 | 95.2% | 96.0% | 94.7% | 96.0% |
15 | 95.3% | 96.9% | 94.9% | 96.9% |
20 | 95.4% | 97.1% | 95.1% | 97.1% |
25 | 95.3% | 97.2% | 95.1% | 97.2% |
30 | 95.3% | 97.2% | 95.1% | 97.2% |
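The table above can be read as a lookup for the threshold mechanism: given a target accuracy, pick the shortest prefix of the ordered feature list that reaches it. A minimal sketch using the decision tree accuracy column follows. The function name and the target values are illustrative assumptions.

```python
from typing import Dict, Optional

# Decision tree accuracy/recall (%) by number of top-ranked features, from the table above.
DT_ACCURACY: Dict[int, float] = {
    1: 78.8, 2: 83.2, 3: 88.3, 4: 90.7, 5: 90.7, 6: 90.7, 7: 92.1,
    8: 95.3, 9: 95.5, 10: 96.0, 15: 96.9, 20: 97.1, 25: 97.2, 30: 97.2,
}


def smallest_feature_count(accuracy_by_count: Dict[int, float], target: float) -> Optional[int]:
    """Return the smallest tabulated feature count whose accuracy meets the target."""
    for count in sorted(accuracy_by_count):
        if accuracy_by_count[count] >= target:
            return count
    return None  # target not reached by any tabulated feature count


count_for_95 = smallest_feature_count(DT_ACCURACY, 95.0)
count_for_97 = smallest_feature_count(DT_ACCURACY, 97.0)
```

With a 95% target the lookup returns 8 features, while a 97% target requires 20, which shows how steeply the feature count grows for the last few percentage points.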
Method | Dataset | Feat. | Acc. | Rec. | Prec. |
---|---|---|---|---|---|
RF with Explainable AI [62] | UNSW-NB15 | 7 | 96.7% | 94.7% | 96.1% |
Three-level algorithms [63] | part of BoT-IoT | 44 | 100% | 100% | 100% |
Inception Time [64] | ToN-IoT | 42 | 99.7% | 99.6% | 99.7% |
Inception Time [64] | UNSW-NB15 | 43 | 98.6% | 98.4% | 98.9% |
MLP with backpropagation [65] | CIC-CSE-IDS2018 | 24 | 99% | 98.8% | 100% |
RF [45] | part of NF-UQ-NIDS-v2 | 6 | 99.1% | 99% | 99% |
Autoencoder with MLP [46] | part of NF-UQ-NIDS-v2 | 42 | 100% | 94.2% | 98.9% |
Transformer with Neighborhood Cleaning Rule [47] | NF-BoT-IoT-v2 | 40 | only F1 is provided: 85.6% | | |
Transformer with Neighborhood Cleaning Rule [47] | NF-CSE-CIC-IDS2018-v2 | 40 | only F1 is provided: 88.7% | | |
Graph Neural Network with Self-supervised learning [48] | part of NF-UQ-NIDS-v2 | 43 | only F1 is provided: 85% | | |
Graph Neural Network with Self-supervised learning [48] | part of ToN-IoT | 40 | only F1 is provided: 98% | | |
RF [21] | UNSW-NB15-v2 | 10 | 100% | 100% | 100% |
RF [21] | BoT-IoT-v2 | 10 | 100% | 100% | 100% |
RF [21] | ToN-IoT-v2 | 10 | 100% | 100% | 100% |
RF [21] | NF-CSE-CIC-IDS2018-v2 | 10 | 100% | 100% | 100% |
RF [21] | NF-UQ-NIDS-v2 | 10 | 98% | 98% | 98% |
Our approach | NF-UQ-NIDS-v2 | 8 | 95.3% | 95.3% | 95.1% |
No. of Feat. | Testing Time [s] | No. of Feat. | Testing Time [s] | No. of Feat. | Testing Time [s] |
---|---|---|---|---|---|
30 | 1.86 | 20 | 1.58 | 10 | 1.39 |
29 | 1.81 | 19 | 1.55 | 9 | 1.01 |
28 | 1.79 | 18 | 1.59 | 8 | 0.95 |
27 | 1.78 | 17 | 1.55 | 7 | 0.85 |
26 | 1.79 | 16 | 1.56 | 6 | 0.81 |
25 | 1.72 | 15 | 1.53 | 5 | 0.80 |
24 | 1.67 | 14 | 1.46 | 4 | 0.77 |
23 | 1.65 | 13 | 1.47 | 3 | 0.72 |
22 | 1.70 | 12 | 1.45 | 2 | 0.57 |
21 | 1.62 | 11 | 1.43 | 1 | 0.46 |
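The timing table above supports a quick estimate of how much testing time each threshold saves relative to the full 30-feature list. The sketch below copies the measured times and computes the relative saving, as well as the single-feature removal with the largest time drop. Both helper names are hypothetical.

```python
from typing import Dict, Tuple

# Testing time in seconds by number of features, copied from the table above.
TESTING_TIME_S: Dict[int, float] = {
    30: 1.86, 29: 1.81, 28: 1.79, 27: 1.78, 26: 1.79, 25: 1.72, 24: 1.67, 23: 1.65,
    22: 1.70, 21: 1.62, 20: 1.58, 19: 1.55, 18: 1.59, 17: 1.55, 16: 1.56, 15: 1.53,
    14: 1.46, 13: 1.47, 12: 1.45, 11: 1.43, 10: 1.39, 9: 1.01, 8: 0.95, 7: 0.85,
    6: 0.81, 5: 0.80, 4: 0.77, 3: 0.72, 2: 0.57, 1: 0.46,
}


def relative_saving(times: Dict[int, float], n_features: int, baseline: int = 30) -> float:
    """Fraction of testing time saved by using n_features instead of the baseline."""
    return 1.0 - times[n_features] / times[baseline]


def largest_single_step_drop(times: Dict[int, float]) -> Tuple[int, int]:
    """Return (n, n - 1) for the one-feature removal that saves the most time."""
    best = max((n for n in times if n - 1 in times), key=lambda n: times[n] - times[n - 1])
    return best, best - 1


saving_with_8 = relative_saving(TESTING_TIME_S, 8)
biggest_step = largest_single_step_drop(TESTING_TIME_S)
```

On these numbers, eight features cut the testing time roughly in half compared with thirty, and the sharpest drop occurs when going from ten features to nine.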
Method | Dataset | Feat. | Description | Time [s] |
---|---|---|---|---|
Reduced Error Pruning Tree [69] | part of KDD Cup 1999 | 19 | testing time (only Probe attack) | 0.5 |
Reduced Error Pruning Tree [69] | part of KDD Cup 1999 | 9 | testing time (only DoS attack) | 0.69 |
MLP and DT [70] | UGR’16 | 8 | training and testing time | 4.25 |
AB [71] | part of UNSW-NB15 | 16 | training and testing time for only DNS samples | 150.8 |
AB [71] | part of UNSW-NB15 | 12 | training and testing time for only HTTP samples | 148.3 |
AB [71] | part of NIMS | 16 | training and testing time for only DNS samples | 142.2 |
AB [71] | part of NIMS | 12 | training and testing time for only HTTP samples | 145.6 |
Autoencoder with MLP [46] | part of NF-UQ-NIDS-v2 | 42 | testing time (train/test split: 80/20) | 0.74 |
Autoencoder with MLP [46] | part of NF-UQ-NIDS-v2 | 42 | testing time (train/test split: 70/30) | 1.32 |
Autoencoder with MLP [46] | part of NF-UQ-NIDS-v2 | 42 | testing time (train/test split: 60/40) | 1.44 |
Our approach | NF-UQ-NIDS-v2 | 8 | testing time | 0.95 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Krupski, J.; Iwanowski, M.; Graniszewski, W. Extraction of Minimal Set of Traffic Features Using Ensemble of Classifiers and Rank Aggregation for Network Intrusion Detection Systems. Appl. Sci. 2024, 14, 6995. https://doi.org/10.3390/app14166995