Increasing the Effectiveness of Network Intrusion Detection Systems (NIDSs) by Using Multiplex Networks and Visibility Graphs
Abstract
1. Introduction
Listing 1. Snort NIDS rule examples.
alert tcp $EXTERNAL_NET any -> $TELNET_SERVERS 23 \
    ( msg:"MALWARE-BACKDOOR w00w00 attempt"; flow:to_server,established; content:"w00w00"; metadata:ruleset community; classtype:attempted-admin; sid:209; rev:9; )
alert tcp $EXTERNAL_NET any -> $TELNET_SERVERS 23 \
    ( msg:"MALWARE-BACKDOOR attempt"; flow:to_server,established; content:"backdoor",nocase; metadata:ruleset community; classtype:attempted-admin; sid:210; rev:7; )
alert tcp $EXTERNAL_NET any -> $TELNET_SERVERS 23 \
    ( msg:"MALWARE-BACKDOOR MISC r00t attempt"; flow:to_server,established; content:"r00t"; metadata:ruleset community; classtype:attempted-admin; sid:211; rev:7; )
alert tcp $EXTERNAL_NET any -> $TELNET_SERVERS 23 \
    ( msg:"MALWARE-BACKDOOR MISC rewt attempt"; flow:to_server,established; content:"rewt"; metadata:ruleset community; classtype:attempted-admin; sid:212; rev:7; )
- No detection of zero-day attacks;
- Many false-positive alerts, because each rule matches only a limited pattern in the packets;
- No contextual information about the generated alerts.
2. Materials and Methods
3. Intrusion Detection Systems Based on Machine Learning
3.1. Related Works
- Pattern classifiers;
- Single classifiers: K-nearest neighbor, support vector machines, artificial neural networks, etc.;
- Hybrid classifiers;
- Ensemble classifiers.
3.2. NIDS Architecture
3.3. Dataset
- A dataset format based on network flows and their features,
- Long-term monitoring: to understand the behavior of every network communication, we need information over time,
- A labeled dataset,
- Wide usage in previous research, so that our results can be compared with earlier work.
- It collects every flow from a network. The dataset contains more than 72,000,000 records;
- It labels which communications are attacks;
- It provides a large number of features describing every network flow between two IP addresses, as shown in Table 1.
3.4. Temporal Behavior Multiplex Network
- Start time: the first date used to create the time series.
- Finish time: the last date taken into account when creating the time series.
- Frequency: the width of the time slot over which the discrete function is evaluated. Typically, the frequency is expressed in seconds, days, weeks, or years.
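As an illustration of how these three parameters turn raw flow records into a discrete time series, the following minimal sketch counts the flows that fall in each slot. It is a sketch under assumptions: the function name `build_time_series`, the choice of a simple count as the discrete function, and the half-open window are ours, not taken from the paper.

```python
from datetime import datetime, timedelta


def build_time_series(timestamps, start, finish, frequency):
    """Count events per slot of width `frequency` in the window [start, finish).

    Assumption: the "discrete function" is a plain count of flows per slot.
    """
    # Number of slots, rounding up so a partial last slot is still represented.
    n_slots, remainder = divmod(finish - start, frequency)
    if remainder:
        n_slots += 1
    series = [0] * n_slots
    for ts in timestamps:
        if start <= ts < finish:
            # Integer division of two timedeltas gives the slot index.
            series[(ts - start) // frequency] += 1
    return series


# Hypothetical flow timestamps for one pair of IP addresses.
flows = [
    datetime(2019, 4, 23, 13, 5),
    datetime(2019, 4, 23, 13, 50),
    datetime(2019, 4, 23, 15, 0),
    datetime(2019, 4, 23, 16, 59),
    datetime(2019, 4, 23, 17, 0),   # outside the window, ignored
]
hourly = build_time_series(
    flows,
    start=datetime(2019, 4, 23, 13),
    finish=datetime(2019, 4, 23, 17),
    frequency=timedelta(hours=1),
)
print(hourly)
```

One such series would be built for every edge (pair of communicating IP addresses); any other per-slot aggregate, such as bytes transferred, could replace the count.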
3.5. Visibility Graphs
3.5.1. Natural Visibility Graphs
- Connected: by definition of the visibility graph, each node is linked at least to its left and right neighbors.
- Undirected.
- Invariant: rescaling or translating the time series does not change the generated visibility graph.
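The construction behind these properties can be sketched in a few lines. The code below uses the standard natural visibility criterion of Lacasa et al.: two samples are linked if every sample between them lies strictly below the straight line joining them. It assumes evenly spaced samples (sample i sits at time i), and the function name is ours.

```python
def natural_visibility_graph(series):
    """Return the NVG of `series` as a set of edges (a, b) with a < b.

    Assumption: samples are evenly spaced, so sample i sits at time t = i.
    """
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            # a and b see each other if every sample c strictly between them
            # satisfies y_c < y_b + (y_a - y_b) * (b - c) / (b - a).
            # Both sides are multiplied by (b - a) > 0 to avoid division.
            if all(
                (series[c] - series[b]) * (b - a) < (series[a] - series[b]) * (b - c)
                for c in range(a + 1, b)
            ):
                edges.add((a, b))
    return edges


series = [2, 3, 1, 2]
nvg = natural_visibility_graph(series)
# Rescaling and translating the series leaves the graph unchanged.
rescaled = [2 * y + 5 for y in series]
print(sorted(nvg), nvg == natural_visibility_graph(rescaled))
```

This brute-force version checks every pair and is cubic in the series length in the worst case, which is acceptable for short series such as a few days of hourly slots.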
3.5.2. Horizontal Visibility Graphs
- Connected, as the NVG.
- Invariant to any translation or rescaling.
- Irreversible: several time series can produce the same HVG, so it is impossible to recover the time series from the graph. This is almost never a problem, because the purpose of this operation is to capture the structural properties of the time series. When reversibility is needed, a weighted network has to be used, which makes a reversible mapping feasible.
- Undirected: no direction is assigned to the edge between two nodes. However, a directed graph can be created from the temporal evolution of the time series, that is, each edge points in the direction in which time increases.
- Less connected than the NVG: the HVG is a subgraph of the natural visibility graph built from the same series.
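A minimal sketch of the horizontal criterion follows, assuming the standard definition of Luque et al.: two samples are linked if every sample between them is strictly lower than both. The function name and the example series are ours.

```python
def horizontal_visibility_graph(series):
    """Return the HVG of `series` as a set of edges (a, b) with a < b."""
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            # a and b see each other horizontally if every sample strictly
            # between them is lower than both end points.
            lower_end = min(series[a], series[b])
            if all(series[c] < lower_end for c in range(a + 1, b)):
                edges.add((a, b))
    return edges


hvg = horizontal_visibility_graph([3, 1, 2, 4])
# A different series (not a rescaling or translation of the first) that
# yields exactly the same graph, illustrating irreversibility.
same_hvg = horizontal_visibility_graph([30, 1, 2, 40])
print(sorted(hvg), hvg == same_hvg)
```

For the series [3, 1, 2, 4] the natural visibility graph is the complete graph on four nodes, while the HVG lacks the edge (1, 3), because the sample between them is higher than sample 1.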
3.6. Edge Clustering
- max_degree: The degree of a node is the number of edges adjacent to it. If we define a network as $G = (V, E)$, where $V$ is the set of nodes and $E$ the set of edges connecting them, with adjacency matrix $A$, the max degree of the graph is $\Delta(G) = \max_{i \in V} \sum_{j \in V} A_{ij}$.
- density: The ratio between the number of edges in the graph and the maximum possible number of edges. The value is 0 when the graph has no edges and 1 when the graph is complete.
- Natural visibility graph max_degree,
- Natural visibility graph density,
- Horizontal visibility graph max_degree,
- Horizontal visibility graph density.
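The two measures can be computed directly from an edge set, as in the sketch below. It assumes an undirected simple graph and the usual density formula $2|E| / (|V|(|V|-1))$; the function name and the example edge sets are illustrative only.

```python
def graph_features(n_nodes, edges):
    """Return (max_degree, density) of an undirected simple graph.

    `edges` is a collection of pairs (a, b) of node indices in range(n_nodes).
    """
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_degree = max(degree, default=0)
    # Maximum number of edges in an undirected graph without self-loops.
    possible = n_nodes * (n_nodes - 1) // 2
    density = len(edges) / possible if possible else 0.0
    return max_degree, density


# Hypothetical visibility graphs of one four-sample time series.
nvg_edges = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
hvg_edges = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)}
# Four features per time series: NVG and HVG max_degree and density.
feature_vector = graph_features(4, nvg_edges) + graph_features(4, hvg_edges)
print(feature_vector)
```

Concatenating the two pairs gives one four-dimensional point per time series, which is the kind of input a clustering step can work on.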
- Initialization: once the number of clusters, k, has been chosen, k centroids are established in the data space; for example, by choosing them randomly.
- Assigning objects to centroids: each object in the data is assigned to its nearest centroid.
- Updating centroids: the centroid position of each group is updated by taking the position of the average of the objects belonging to that group as the new centroid.
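The three steps above can be sketched as follows. This is a plain k-means (Lloyd's algorithm) illustration under assumptions: centroids are initialized by sampling data points, distance is squared Euclidean, and the feature vectors are made-up values, not results from the paper.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples; return (labels, centroids)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer move
        labels = new_labels
        # Update: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical (NVG max_degree, NVG density, HVG max_degree, HVG density) vectors.
features = [
    (3, 0.20, 2, 0.10), (4, 0.22, 2, 0.11), (3, 0.21, 3, 0.10),
    (40, 0.80, 30, 0.60), (42, 0.82, 31, 0.62), (41, 0.79, 29, 0.61),
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Because max_degree and density live on very different scales, a real pipeline would normally standardize the features before clustering; that step is omitted here for brevity.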
3.7. Temporal Behavior Multiplex Network
3.8. Forecasting Using Random Forest
4. Results
- Start time: 23 April 2019 13:00:00,
- Finish time: 27 April 2019 08:00:00,
- Frequency: Hourly.
4.1. Node Features Acquisition
4.2. Attackers Detection
4.2.1. Evaluation Metrics
4.2.2. Supervised IP Address Detection
- Cross-validation: used to avoid overfitting and to obtain a stable accuracy estimate for the model. CV consists of randomly dividing the data into N groups; all groups except one train the model, and the remaining one tests it. This process is repeated N times, and the average accuracy is taken as the final accuracy. In our case, we use the class StratifiedKFold from the Python sklearn package. We use this class instead of KFold because it preserves the percentage of samples of each class in every fold. Our dataset is highly imbalanced, so it is very important to preserve the class proportions. By default, the number of folds is set to 5, but we increased it to 10 to obtain a more accurate estimate.
- Train-test: We divide the dataset into two different parts. The first one (train) is used to fit the model, and the other (test) is used to validate the fitted model. In our case, the StratifiedKFold class gives us 10 train datasets and 10 test datasets, and a model is created for each pair.
- Hyperparameters: The Random Forest algorithm has several hyperparameters to select. We use a grid search to select the best values for our project. One of the most relevant parameters to tune is the number of estimators. In our case, after comparing several values, we chose 100 as the best number of estimators.
Algorithm 1 Cross Validation.
for train_index, test_index in skf.split(X, y) do
end for
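To make the role of stratification concrete, the sketch below builds stratified folds by hand. It is a dependency-free illustration of the idea, not sklearn's StratifiedKFold implementation; the function name `stratified_folds` and the toy labels (10% attacks) are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=10, seed=0):
    """Yield (train_index, test_index) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    for test_index in folds:
        held_out = set(test_index)
        train_index = [i for i in range(len(labels)) if i not in held_out]
        yield train_index, sorted(test_index)


# Toy imbalanced labels: 20 attackers (1) among 200 IP addresses.
y = [1] * 20 + [0] * 180
splits = list(stratified_folds(y, n_splits=10))
for train_index, test_index in splits:
    # A model would be fitted on train_index and scored on test_index here.
    attackers_in_test = sum(y[i] for i in test_index)
    print(len(train_index), len(test_index), attackers_in_test)
```

Every test fold contains 2 attackers out of 20 samples, the same 10% ratio as the full label vector, which a plain random split would not guarantee on such an imbalanced dataset.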
4.2.3. Comparison with the Previous Approach
5. Discussion
5.1. Conclusions
- Reducing the number of alerts generated by the NIDS that the SOC analyst has to analyze: the accuracy of current NIDSs and machine learning approaches is very high; however, analyzing every network packet produces a large number of alerts because of the volume of events crossing the network. In most cases, providing these alerts with the correct context about the IP addresses and the behavior of the relationship between them makes it easier for the SOC analyst to discard them. This, in turn, redirects the focus to the alerts whose behavior gives a reason to analyze the relationship in depth.
- Reducing the computational requirements by restricting the analysis to time slots instead of every network event. For example, if a behavior analysis is made every 5 min, the in-depth analysis of every packet in that time slot can be replaced by a behavior analysis of the relationships between the existing IP addresses. This reduces the number of potential alerts from thousands to hundreds.
5.2. Next Steps
- Validating this type of data in large corporate environments.
- Changing unsupervised time series clustering techniques to supervised techniques. This approach gives us a better understanding of the description of every cluster.
- Using other time series techniques to obtain more information about the behavior of the relationship between the nodes.
- Using this new dataset in cybersecurity with the same approach to confirm the same accuracy obtained in this paper.
- Using this approach to solve other real-world problems.
- Predicting with complex network capabilities rather than with Random Forest.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
NIDS | Network Intrusion Detection System |
UEBA | User Entity Behavior Analytics |
MTTD | Mean Time To Detection |
SIEM | Security Information and Event Management |
SOC | Security Operations Center |
Table 1. Features describing each network flow in the dataset.
Feature | Feature |
---|---|
ts | src_ip |
src_port | dst_ip |
dst_port | proto |
service | duration |
src_bytes | dst_bytes |
conn_state | missed_bytes |
src_pkts | src_ip_bytes |
dst_pkts | dst_ip_bytes |
dns_query | dns_qclass |
dns_qtype | dns_rcode |
dns_AA | dns_RD |
dns_RA | dns_rejected |
ssl_version | ssl_cipher |
ssl_resumed | ssl_established |
ssl_subject | ssl_issuer |
http_trans_depth | http_method |
http_uri | http_version |
http_request_body_len | http_response_body_len |
http_status_code | http_user_agent |
http_orig_mime_types | http_resp_mime_types |
weird_name | weird_addl |
weird_notice | label |
type | |
Cluster | Number of Edges | Number of Nodes |
---|---|---|
0 | 25,093 | 24,336 |
1 | 4207 | 3952 |
2 | 166 | 166 |
3 | 392 | 377 |
4 | 409 | 411 |
5 | 102 | 106 |
Node | Connected Edges |
---|---|
192.168.1.194 | 9257 |
192.168.1.190 | 5728 |
192.168.1.152 | 5284 |
192.168.1.184 | 3358 |
192.168.1.2 | 256 |
Node | Connected Edges |
---|---|
192.168.1.190 | 3503 |
192.168.1.195 | 138 |
192.168.1.180 | 123 |
192.168.1.30 | 101 |
192.168.1.31 | 89 |
Node | Connected Edges |
---|---|
192.168.1.190 | 113 |
192.168.1.195 | 20 |
192.168.1.180 | 18 |
192.168.1.193 | 9 |
192.168.1.30 | 4 |
Node | Connected Edges |
---|---|
192.168.1.190 | 296 |
192.168.1.195 | 25 |
192.168.1.180 | 24 |
192.168.1.30 | 14 |
192.168.1.31 | 13 |
Node | Connected Edges |
---|---|
192.168.1.190 | 354 |
192.168.1.195 | 18 |
192.168.1.180 | 14 |
192.168.1.193 | 8 |
192.168.1.30 | 5 |
Node | Connected Edges |
---|---|
192.168.1.190 | 71 |
192.168.1.195 | 21 |
192.168.1.31 | 3 |
192.168.1.180 | 3 |
192.168.1.1 | 3 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Iglesias Perez, S.; Criado, R. Increasing the Effectiveness of Network Intrusion Detection Systems (NIDSs) by Using Multiplex Networks and Visibility Graphs. Mathematics 2023, 11, 107. https://doi.org/10.3390/math11010107