Unsupervised Security Threats Identification for Heterogeneous Events
Abstract
1. Introduction
- Robust unsupervised detection model: We propose a novel method that removes false alarms from the alerts for five types of attacks by recasting multi-class classification as an anomaly detection task.
- Application of optimal preprocessing: Among the various available methods, we apply the preprocessing appropriate to the characteristics of each feature.
- Filtering alerts to be analyzed: Using the proposed model, the data classified as true alerts can be filtered from heterogeneous alert data and displayed. The security administrator can respond by first analyzing the filtered data and then checking the excluded data if necessary.
- Integrated relevance analysis: We analyze correlations to identify attacks and classify attack types using IRA, which helps the security administrator respond to ongoing attacks.
2. Related Works
- Datasets: Representative datasets include KDD-CUP99 [5] and NSL-KDD. CTU-UNB mixes the botnet-based datasets CTU-13 [6] and UNB-ISCX-IDS 2012 [7]. UNSW-NB15 [8] comprises network packet data collected by the IXIA PerfectStorm tool and covers nine types of attacks, namely, fuzzers, analysis, backdoors, denial of service (DoS), exploits, generic, reconnaissance, shellcode, and worms. USTC-TFC 2016 [9] contains ten types of both benign and malware traffic. CSE-CIC-IDS 2018 [10] generated data on web, brute force, DoS, botnet, and distributed denial of service (DDoS)+PortScan attacks as B-profiles and M-profiles. CIC-DDoS 2019 [11] and CIC-DoS 2017 [12] are network traffic datasets based on DDoS and DoS, respectively. UGR’16 [13] consists of real traffic and up-to-date attacks collected by NetFlow v9 collectors. Most of these datasets comprise network traffic data rather than logs from security devices, such as enterprise security management (ESM) or SIEM log datasets collected from heterogeneous environments. Additionally, they contain only a single attack type (for example, DDoS or botnet) or only small-scale attacks; thus, these datasets are limited in the variety and size of the attacks they cover.
- Anomaly classification based on machine learning: Anomaly-based classification methods that detect intrusions by identifying and classifying attacks have been studied extensively [14,15,16,17]. These studies aimed to increase the detection rate and accuracy and to lower the false alarm rate through supervised learning. However, a disadvantage of supervised learning is that it can learn only from labeled data. Most training datasets provide labels; in practice, however, labeling collected data is difficult because diverse traffic or logs can appear simultaneously, and unseen attacks that preclude a timely response, such as zero-day attacks, may occur.
- Anomaly detection based on machine learning: Various supervised learning-based detection models have been studied [7,9,18,19,20,21,22]. To avoid the need for labeled data, an alternative is unsupervised learning. Unsupervised solutions [11,23,24,25,26,27,28] have employed autoencoders, one-class K-means, one-class scaled convex hulls, and isolation forests to identify threats. Nguyen et al. [29] showed that a variational autoencoder can identify various attacks through reconstruction errors when compared with autoencoder and Gaussian-based thresholding techniques. They adopted gradients as fingerprints to identify and classify attack types. The gradients identify each attack but yield similar results for the scan11 and scan44 attacks; using this result together with the reconstruction errors improves performance. However, the data collected by NetFlow are limited because they contain only traffic data and five attack types: DoS, port scanning, botnet, spam, and blacklist. Rao et al. [30] proposed the bi-functional autoencoder (BFAE), which reduces the dimensionality of time-series data using autoencoders that capture nonlinear relationships. The BFAE demonstrated superior dimensionality reduction compared to traditional techniques such as Principal Component Analysis (PCA), AE, and Functional Principal Component Analysis (FPCA). Because the BFAE focuses on reconstruction, it is not directly relevant to our research.
- Applications in different domains: Studies have been conducted in other domains to detect outliers [31,32,33]. Zhang et al. [33] proposed an anomaly detection technique for time-series data based on unsupervised learning and attention techniques. Choi and Kim [34] proposed anomaly detection using machine learning, with a special focus on the HIL-based augmented ICS (HAI) dataset, using an autoencoder that combines Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. The study involved two types of data: a synthetic dataset and a real-world power plant dataset. Because both time-series datasets were composed exclusively of numeric data, feeding them directly into a deep learning model was straightforward. In heterogeneous security equipment alerts, however, numerical and categorical data are mixed. Moreover, even when security devices share the same features, some devices use numeric labeling, whereas others use categorical labeling. For example, device A indicates the alert level with numbers, whereas device B uses character values such as low, medium, and high. Therefore, we must account for this heterogeneity by converting categorical data to numeric data, after which the various anomaly detection models can be used regardless of the domain.
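A minimal sketch of this categorical-to-numeric conversion follows; the field names (`levelRisk`, `protocol`) and the records themselves are invented for illustration, not our exact schema or data.

```python
def one_hot_encode(records, categorical_fields):
    """Turn mixed numeric/categorical alert records into numeric vectors."""
    # Collect the sorted set of values observed for each categorical field.
    values = {f: sorted({r[f] for r in records}) for f in categorical_fields}
    encoded = []
    for r in records:
        row = []
        for field, v in r.items():
            if field in categorical_fields:
                # One indicator column per observed categorical value.
                row.extend(1 if v == u else 0 for u in values[field])
            else:
                row.append(v)  # numeric fields pass through unchanged
        encoded.append(row)
    return encoded

# Device A reports the alert level numerically; protocol is categorical.
records = [
    {"levelRisk": 0, "protocol": "tcp"},
    {"levelRisk": 2, "protocol": "udp"},
    {"levelRisk": 3, "protocol": "tcp"},
]
encoded = one_hot_encode(records, {"protocol"})
# 'protocol' expands into two indicator columns (tcp, udp)
```

After encoding, every record is a fixed-length numeric vector, so any anomaly detection model can consume alerts from heterogeneous devices.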
3. Methodology
3.1. Dataset Generation
3.1.1. Testbed Construction for Generating Dataset
3.1.2. Configuration of Attack Scenario
3.1.3. Summary of Dataset
3.2. Data Preprocessing
3.3. Anomaly Detection
3.4. Integrated Relevance Analysis
4. Experimental Results
4.1. Experimental Environment
4.2. Anomaly Detection
4.2.1. Results of Individual Machines
4.2.2. Results of Individual Machines with Snort
4.3. Filtering the Alerts
4.4. Integrated Relevance Analysis
5. Discussion
- Limitation of the dataset to evaluate performance: Based on the experimental results, our model detects five different attack situations with one round of training using only normal data. According to Table 9, the model combined with Snort maintains an FNR between 0 and 0.12. Therefore, we expect that this model will also be able to detect the emergence of new attacks. However, because our model was trained on normal data, its performance may vary depending on the state and volume of the normal data and on the difference between normal and attack data. It is therefore important to collect high-quality normal data or to refine the data to increase their quality. In our preprocessing steps, we applied several measures, such as removing IP addresses that cannot represent attackers or targets, dropping meaningless fields, and determining numeric and categorical fields for data encoding. In addition, our dataset lacks a variety of attack cases, such as multiple attacks of the same type or different types of attacks occurring simultaneously. Furthermore, we could not distinguish attack types because the period between attacks was extremely short. Further research is also required to confirm whether the same level of performance can be maintained when the density of attacks differs.
- Heterogeneous machines and domains: A device located at the outermost part of the network generates more logs than one located inside, and machines in the same location can generate different numbers of logs depending on their policies. Thus, the volume of data varies with the devices deployed and their locations in the industrial domain. We therefore set the batch size manually after checking the corresponding volume; however, a systematic way to determine a suitable batch size and the criteria for small datasets is still needed.
- Static threshold for anomaly detection: Because we used a static threshold (the average or maximum of the training loss) that linearly separates normal from abnormal data, false alerts were sometimes treated as true alerts. As shown in Table 7, performance differs depending on the threshold; in particular, Machine I shows a difference of approximately 2.0. Possible remedies include the following: (1) sampling the alerts appearing in the normal intervals between attacks for training; and (2) choosing an optimal threshold that minimizes false alerts while maximizing the detection of actual alerts, for example, by using the median or by assigning weights derived from a histogram of training losses. Additionally, we plan to develop a method for the model to optimize the batch size and thresholds regardless of the dataset.
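A minimal sketch of these thresholding choices, with invented loss values rather than measurements from our experiments:

```python
from statistics import mean, median

# Per-batch training losses observed on normal data (illustrative values);
# one unusually noisy batch inflates the maximum.
train_losses = [0.8, 1.0, 1.1, 0.9, 1.2, 6.0]

thr_max = max(train_losses)     # static threshold used in this study
thr_avg = mean(train_losses)    # alternative static threshold
thr_med = median(train_losses)  # robust alternative discussed above

def flag_alerts(test_losses, threshold):
    # A test batch whose loss exceeds the threshold is flagged as an attack.
    return [loss > threshold for loss in test_losses]

test_losses = [0.9, 1.5, 7.3]
# The max threshold misses the mid-range loss that the median catches:
print(flag_alerts(test_losses, thr_max))  # [False, False, True]
print(flag_alerts(test_losses, thr_med))  # [False, True, True]
```

The example shows why threshold choice matters: a single noisy training batch raises the maximum enough to hide moderately anomalous test batches, whereas the median is insensitive to that outlier.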
- Operation time: When real-time detection is important, a long training time is problematic. In this study, the training time differed depending on the size of each machine’s dataset, and the operation time was affected by both the dataset size and the input size. As presented in Table 5, after data preprocessing, the number of fields at least doubles for all machines except Machine IV. When other machines are combined with Snort, the number of fields increases in proportion to the size of Snort’s field set, increasing the operation time; combining more than two machines increases it further. To address this problem, a more efficient model should be developed, or fields proven to have low relevance to attacks should be removed, either through IRA or deep learning analysis.
- Comparative analysis of the naive autoencoder: In this study, we used a naive autoencoder to examine the feasibility of this approach and to serve as a baseline for future work. Its performance should also be compared with other state-of-the-art models, such as the VAE, Conv-AE, or attention-based techniques. For anomaly detection, we adopted the loss of the batch; by inspecting the losses of individual fields within a batch, we also observed that the loss in specific fields had some effect, although this was not addressed in this study. In future research, we aim to extract the pattern of each attack by combining the IRA with the loss results of individual fields.
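The reconstruction-error principle shared by the baseline models in Table 7 (PCA, AE, LSTM-AE) can be sketched with a rank-1 PCA fitted to normal data only; the synthetic data and the rank are illustrative assumptions, not our experimental configuration, while the max-of-training-loss threshold mirrors the static threshold discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for preprocessed alert vectors: normal data lies near a
# 1-D subspace of a 5-D feature space; attack points deviate from it.
normal = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 5)) \
         + 0.01 * rng.normal(size=(200, 5))
attack = 5.0 * rng.normal(size=(3, 5))

# "Training" on normal data only: mean plus top principal direction.
mu = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mu, full_matrices=False)
basis = vt[:1]  # rank-1 basis of the normal subspace

def recon_error(x):
    # Reconstruction error = distance from the learned normal subspace.
    centered = x - mu
    recon = centered @ basis.T @ basis
    return np.linalg.norm(centered - recon, axis=1)

# Static threshold: maximum reconstruction error seen on training data.
threshold = recon_error(normal).max()
print(recon_error(attack) > threshold)  # attack points exceed the threshold
```

An autoencoder replaces the linear projection with a learned nonlinear encoder/decoder, but the detection rule (reconstruction error versus a threshold derived from training losses) is the same.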
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cybersecurity & Infrastructure Security Agency. DarkSide Ransomware: Best Practices for Preventing Business Disruption from Ransomware Attacks. Available online: https://www.cisa.gov/news-events/cybersecurity-advisories/aa21-131a (accessed on 13 October 2024).
- Federal Bureau of Investigation, Cyber Division. Conti Ransomware Attacks Impact Healthcare and First Responder Networks. Available online: https://www.cisa.gov/sites/default/files/publications/Conti%20Ransomware%20Healthcare%20Networks.pdf (accessed on 13 October 2024).
- Stouffer, K.; Pease, M.; Tang, C.; Zimmerman, T.; Pillitteri, V.; Lightman, S. NIST SP 800-82 Rev. 3 (Draft): Guide to Operational Technology (OT) Security; National Institute of Standards and Technology Special Publication: Gaithersburg, MD, USA, 2022. Available online: https://csrc.nist.gov/pubs/sp/800/82/r3/ipd (accessed on 13 October 2024).
- Conti, M.; Donadel, D.; Turrin, F. A Survey on Industrial Control System Testbeds and Datasets for Security Research. IEEE Commun. Surv. Tutor. 2021, 23, 2248–2294.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6.
- Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123.
- Balkanli, E.; Alves, J.; Zincir-Heywood, A.N. Supervised learning to detect DDoS attacks. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Orlando, FL, USA, 9–12 December 2014; pp. 1–8.
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
- Yousefi-Azar, M.; Varadharajan, V.; Hamey, L.; Tupakula, U. Autoencoder-based feature learning for cyber security applications. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 3854–3861.
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the ICISSP, Madeira, Portugal, 22–24 January 2018; pp. 108–116.
- Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–8.
- Jazi, H.H.; Gonzalez, H.; Stakhanova, N.; Ghorbani, A.A. Detecting HTTP-based application layer DoS attacks on web servers in the presence of sampling. Comput. Networks 2017, 121, 25–36.
- Maciá-Fernández, G.; Camacho, J.; Magán-Carrión, R.; García-Teodoro, P.; Therón, R. UGR’16: A new dataset for the evaluation of cyclostationarity-based network IDSs. Comput. Secur. 2018, 73, 411–424.
- Tama, B.A.; Comuzzi, M.; Rhee, K.H. TSE-IDS: A two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 2019, 7, 94497–94507.
- Qassim, Q.; Zin, A.M.; Ab Aziz, M.J. Anomalies Classification Approach for Network-based Intrusion Detection System. Int. J. Netw. Secur. 2016, 18, 1159–1172.
- Atefi, K.; Hashim, H.; Khodadadi, T. A Hybrid Anomaly Classification with Deep Learning (DL) and Binary Algorithms (BA) as Optimizer in the Intrusion Detection System (IDS). In Proceedings of the 2020 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia, 28–29 February 2020; pp. 29–34.
- Gamage, S.; Samarabandu, J. Deep learning methods in network intrusion detection: A survey and an objective comparison. J. Netw. Comput. Appl. 2020, 169, 102767.
- D’hooge, L.; Wauters, T.; Volckaert, B.; De Turck, F. In-depth comparative evaluation of supervised machine learning approaches for detection of cybersecurity threats. In Proceedings of the 4th International Conference on Internet of Things, Big Data and Security (IoTBDS), Heraklion, Greece, 2–4 May 2019; pp. 125–136.
- Hosseini, S.; Azizi, M. The hybrid technique for DDoS detection with supervised learning algorithms. Comput. Netw. 2019, 158, 35–45.
- Mebawondu, J.O.; Alowolodu, O.D.; Mebawondu, J.O.; Adetunmbi, A.O. Network intrusion detection system using supervised learning paradigm. Sci. Afr. 2020, 9, e00497.
- Kim, M. Supervised learning-based DDoS attacks detection: Tuning hyperparameters. ETRI J. 2019, 41, 560–573.
- Aksu, D.; Üstebay, S.; Aydin, M.A.; Atmaca, T. Intrusion detection with comparative analysis of supervised learning techniques and fisher score feature selection algorithm. In Proceedings of the International Symposium on Computer and Information Sciences, Poznan, Poland, 20–21 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 141–149.
- Hwang, R.H.; Peng, M.C.; Huang, C.W.; Lin, P.C.; Nguyen, V.L. An Unsupervised Deep Learning Model for Early Network Traffic Anomaly Detection. IEEE Access 2020, 8, 30387–30399.
- Alom, M.Z.; Taha, T.M. Network intrusion detection for cyber security using unsupervised deep learning approaches. In Proceedings of the 2017 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 27–30 June 2017; pp. 63–69.
- Goh, J.; Adepu, S.; Tan, M.; Lee, Z.S. Anomaly detection in cyber physical systems using recurrent neural networks. In Proceedings of the 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE), Singapore, 12–14 January 2017; pp. 140–145.
- Schneider, P.; Böttinger, K. High-performance unsupervised anomaly detection for cyber-physical system networks. In Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy, Toronto, ON, Canada, 15–19 October 2018; pp. 1–12.
- Tuor, A.; Kaplan, S.; Hutchinson, B.; Nichols, N.; Robinson, S. Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Meira, J.; Andrade, R.; Praça, I.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Marreiros, G. Performance evaluation of unsupervised techniques in cyber-attack anomaly detection. J. Ambient Intell. Humaniz. Comput. 2020, 11, 4477–4489.
- Nguyen, Q.P.; Lim, K.W.; Divakaran, D.M.; Low, K.H.; Chan, M.C. GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection. In Proceedings of the 2019 IEEE Conference on Communications and Network Security (CNS), Washington, DC, USA, 10–12 June 2019; pp. 91–99.
- Rao, A.R.; Wang, H.; Gupta, C. Functional approach for Two Way Dimension Reduction in Time Series. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1099–1106.
- Karimipour, H.; Dehghantanha, A.; Parizi, R.M.; Choo, K.K.R.; Leung, H. A deep and scalable unsupervised machine learning system for cyber-attack detection in large-scale smart grids. IEEE Access 2019, 7, 80778–80788.
- Kundu, A.; Sahu, A.; Serpedin, E.; Davis, K. A3D: Attention-based auto-encoder anomaly detector for false data injection attacks. Electr. Power Syst. Res. 2020, 189, 106795.
- Zhang, C.; Song, D.; Chen, Y.; Feng, X.; Lumezanu, C.; Cheng, W.; Ni, J.; Zong, B.; Chen, H.; Chawla, N.V. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1409–1416.
- Choi, W.H.; Kim, J. Unsupervised Learning Approach for Anomaly Detection in Industrial Control Systems. Appl. Syst. Innov. 2024, 7, 18.
- Choi, S.; Yun, J.H.; Min, B.G.; Kim, H. POSTER: Expanding a Programmable CPS Testbed for Network Attack Analysis. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, ASIA CCS ’20, Taipei, Taiwan, 5–9 October 2020; pp. 928–930.
ID | Vendor | Type | Installation | ACL | Signature |
---|---|---|---|---|---|
I | A | IDS/IPS | Physical | ✓ | ✓ |
II | B | IDS/IPS | Physical | ✓ | ✓ |
III | C | IDS/IPS | Physical | ✓ | ✓ |
IV | C | Firewall | Physical | ✓ | |
V | D | IDS/IPS | Physical | ✓ | ✓ |
VI | Open-source | Snort | Virtual | ✓ | ✓ |
VII | Open-source | Suricata | Virtual | ✓ | ✓ |
Category | Service | Attack Type | Attack Duration |
---|---|---|---|
Web | HTTP | 152 | 25 min 14 s |
OS | SMB | 81 | 13 min 29 s |
Remote procedure | DCERPC | 65 | 10 min 55 s |
Database | MySQL | 18 | 2 min 21 s |
DDoS | Specific protocols | 13 | 2 h 14 min 53 s |
Class | I | II | III | IV | V | Snort | Suricata
---|---|---|---|---|---|---|---
Normal | 6394 (39.4%) | 11,944 (26.0%) | 1901 (5.3%) | 1448 (23.8%) | 27 (0.4%) | 2649 (3.8%) | 546 (1.5%) |
HTTP | 4279 (26.4%) | 16,565 (36.1%) | 8891 (24.7%) | 3422 (56.3%) | 4579 (67.95%) | 51,287 (74.3%) | 2074 (5.9%) |
MySQL | 2222 (13.7%) | 3313 (7.2%) | 438 (1.2%) | 328 (5.4%) | - | 589 (0.9%) | 172 (0.5%) |
SMB | 1252 (7.7%) | 2435 (5.3%) | 861 (2.4%) | 284 (4.7%) | 3 (0.04%) | 401 (0.6%) | 98 (0.3%) |
DCERPC | 587 (3.6%) | 1245 (2.7%) | 676 (1.9%) | 366 (6.0%) | - | 735 (1.1%) | 248 (0.7%) |
DDoS | 1476 (9.1%) | 10,360 (22.6%) | 23,161 (64.5%) | 232 (3.8%) | 2130 (31.61%) | 13,333 (19.3%) | 32,286 (91.1%) |
Machine | Train Data | Test Data | Subtotal |
---|---|---|---|
I | 537,155 (96.5%) | 19,421 (3.5%) | 556,576 (100%) |
II | 1,012,399 (95%) | 52,777 (5%) | 1,065,176 (100%) |
III | 2,281,683 (91%) | 226,882 (9%) | 2,508,565 (100%) |
IV | 1,488,850 (90%) | 165,969 (10%) | 1,654,819 (100%) |
V | 4872 (41.4%) | 6887 (58.6%) | 11,759 (100%) |
Snort | 456,847 (85.2%) | 79,304 (14.8%) | 536,151 (100%) |
Suricata | 33,063 (48.2%) | 35,507 (51.8%) | 68,570 (100%) |
Total | 5,814,869 (90.8%) | 586,747 (9.2%) | 6,401,616 (100%) |
Category | I | II | III | IV | V | Snort | Suricata
---|---|---|---|---|---|---|---
Raw data | 20 | 48 | 52 | 38 | 30 | 21 | 69 |
Encoded data | 63 | 179 | 99 | 38 | 75 | 144 | 1122 |
Field | Data Type | Unique Value | Instance |
---|---|---|---|
timeSent | E | 82,626 | ‘2020-08-21T07:45:54.000Z’, ‘2020-08-21T07:46:04.000Z’, … |
@version | E | 1 | 1 |
@timestamp | E | 98,972 | ‘2020-08-21T07:45:55.295Z’, ‘2020-08-21T07:46:05.295Z’, ‘2020-08-21T07:46:05.296Z’, … |
original | E | 950,484 | ‘08/21-16:45:54.143014 [**] [1:1234567892:0] home -> external [**] [Priority: 0] {UDP} a.b.c.d:### -> a.b.c.d:###’
levelRisk | N | 4 | 0, 1, 2, 3 |
portVictim | N | 890 | 138, 1947, 137, 67, 5355, 3991, … |
portAttacker | N | 13,862 | 138, 49157, 56856, 49811, 137, 49155, 60130, 68, 51509, … |
id_generator | N | 1 | 1 |
id_revision | N | 21 | 0, 1, 18, 9, 17, 13, 10, 5, 7, 11, 8, 2, 6, 3, 15, 4, 12, 22, 16, 14, 25 |
id_signiture | N | 95 | 1234567892, 1234567893, 41701, 1234567891, 2381, 40046, … |
IPAttacker | C | 43 | Mixed IPv4, IPv6 |
IPVictim | C | 28 | Mixed IPv4, IPv6 |
location.lat | C | 1 | latitude |
location.lon | C | 1 | longitude |
nameMachine | C | 1 | ‘Snort IPS System’ |
nameAttack | C | 92 | ‘home -> external’, ‘External -> external’, ‘SERVER-IIS cmd.exe access’, …
protocol | C | 4 | ‘udp’, ‘tcp’, ‘icmp’, ‘igmp’ |
nameOperator | C | 1 | ‘A’ |
nameUnit | C | 1 | ‘c’ |
type | C | 1 | ‘snort’ |
categoryModule | C | 14 | nan, ‘Potential Corporate Privacy Violation’, ‘Attempted Administrator Privilege Gain’, ‘Web Application Attack’, … |
Threshold | Method | Metric | I | II | III | IV | V | Snort | Suricata
---|---|---|---|---|---|---|---|---|---
Maximum of training loss | PCA | Accuracy | 0.5645 | 0.7015 | 0.9477 | 0.7712 | 0.1186 | 0.9758 | 0.9832
 | | Precision | 0.9723 | 1.0000 | 0.9904 | 1.0000 | 1.0000 | 0.9956 | 0.9972
 | | Recall | 0.2891 | 0.5964 | 0.9541 | 0.6997 | 0.1150 | 0.9792 | 0.9858
 | | F1-Score | 0.4457 | 0.7472 | 0.9719 | 0.8233 | 0.2063 | 0.9873 | 0.9914
 | | FPR | 0.0127 | 0.0000 | 0.1657 | 0.0000 | 0.0000 | 0.1091 | 0.1795
 | | FNR | 0.7109 | 0.4036 | 0.0459 | 0.3003 | 0.8850 | 0.0208 | 0.0142
 | AE | Accuracy | 0.8952 | 0.9739 | 0.9626 | 0.7776 | 0.9960 | 0.9815 | 0.9913
 | | Precision | 0.9654 | 0.9662 | 0.9813 | 0.9881 | 0.9960 | 0.9928 | 0.9951
 | | Recall | 0.8577 | 0.9996 | 0.9793 | 0.7168 | 1.0000 | 0.9879 | 0.9961
 | | F1-Score | 0.9083 | 0.9826 | 0.9803 | 0.8308 | 0.9980 | 0.9903 | 0.9956
 | | FPR | 0.0472 | 0.0992 | 0.3346 | 0.0276 | 1.0000 | 0.1782 | 0.3132
 | | FNR | 0.1423 | 0.0004 | 0.0207 | 0.2832 | 0.0000 | 0.0121 | 0.0039
 | LSTM-AE | Accuracy | 0.8949 | 0.9741 | 0.9626 | 0.7776 | 0.9960 | 0.9817 | 0.9913
 | | Precision | 0.9660 | 0.9666 | 0.9813 | 0.9881 | 0.9960 | 0.9936 | 0.9951
 | | Recall | 0.8566 | 0.9995 | 0.9791 | 0.7168 | 1.0000 | 0.9873 | 0.9961
 | | F1-Score | 0.9080 | 0.9828 | 0.9802 | 0.8308 | 0.9980 | 0.9905 | 0.9961
 | | FPR | 0.0463 | 0.0981 | 0.3340 | 0.0276 | 1.0000 | 0.1582 | 0.3132
 | | FNR | 0.1434 | 0.0005 | 0.0209 | 0.2832 | 0.0000 | 0.0127 | 0.0039
Average of training loss | PCA | Accuracy | 0.5769 | 0.7243 | 0.9470 | 0.7574 | 0.1186 | 0.9686 | 0.9833
 | | Precision | 0.6479 | 0.8121 | 0.9687 | 0.8099 | 1.0000 | 0.9786 | 0.9909
 | | Recall | 0.6599 | 0.8161 | 0.9756 | 0.8905 | 0.1150 | 0.9890 | 0.9922
 | | F1-Score | 0.6539 | 0.8141 | 0.9721 | 0.8483 | 0.2063 | 0.9838 | 0.9915
 | | FPR | 0.5505 | 0.5362 | 0.5650 | 0.6685 | 0.0000 | 0.5428 | 0.5824
 | | FNR | 0.3401 | 0.1839 | 0.0244 | 0.1095 | 0.8850 | 0.0110 | 0.0078
 | AE | Accuracy | 0.6590 | 0.9518 | 0.9632 | 0.7618 | 0.9960 | 0.9719 | 0.9901
 | | Precision | 0.6459 | 0.9388 | 0.9687 | 0.7618 | 0.9960 | 0.9831 | 0.9928
 | | Recall | 0.9671 | 0.9999 | 0.9932 | 1.0000 | 1.0000 | 0.9879 | 0.9972
 | | F1-Score | 0.7745 | 0.9684 | 0.9808 | 0.8648 | 0.9980 | 0.9854 | 0.9950
 | | FPR | 0.8140 | 0.1850 | 0.5739 | 1.0000 | 1.0000 | 0.4266 | 0.4634
 | | FNR | 0.0329 | 0.0001 | 0.0068 | 0.0000 | 0.0000 | 0.0121 | 0.0028
 | LSTM-AE | Accuracy | 0.8317 | 0.8393 | 0.9582 | 0.7618 | 0.9960 | 0.9732 | 0.9883
 | | Precision | 0.8461 | 0.8217 | 0.9671 | 0.7618 | 0.9960 | 0.9732 | 0.9905
 | | Recall | 0.8826 | 0.9995 | 0.9895 | 1.0000 | 1.0000 | 0.9925 | 0.9976
 | | F1-Score | 0.8640 | 0.9019 | 0.9782 | 0.8648 | 0.9980 | 0.9862 | 0.9941
 | | FPR | 0.2465 | 0.6157 | 0.6023 | 1.0000 | 1.0000 | 0.5108 | 0.6099
 | | FNR | 0.1174 | 0.0005 | 0.0105 | 0.0000 | 0.0000 | 0.0075 | 0.0024
Threshold | Class | I | II | III | IV | V | Snort | Suricata
---|---|---|---|---|---|---|---|---
Maximum of training loss | HTTP | 4279 (100%) | 16,565 (100%) | 8369 (94.13%) | 2954 (86.32%) | 4579 (100%) | 51,267 (99.96%) | 2064 (99.52%)
 | MySQL | 2188 (98.47%) | 3311 (99.94%) | 292 (66.67%) | 0 (0%) | - | 204 (34.63%) | 104 (60.47%)
 | SMB | 391 (31.23%) | 2435 (100%) | 823 (95.59%) | 0 (0%) | 3 (100%) | 0 (0%) | 42 (42.86%)
 | DCERPC | 128 (21.81%) | 1234 (99.12%) | 676 (100%) | 366 (100%) | - | 735 (100%) | 245 (98.79%)
 | DDoS | 1433 (97.09%) | 10,360 (100%) | 23,161 (100%) | 0 (0%) | 2130 (100%) | 13,333 (100%) | 32,286 (100%)
Average of training loss | HTTP | 4279 (100%) | 16,565 (100%) | 8763 (98.56%) | 3422 (100%) | 4579 (100%) | 51,267 (99.96%) | 2064 (99.52%)
 | MySQL | 2222 (100%) | 3311 (99.94%) | 374 (85.39%) | 328 (100%) | - | 204 (34.63%) | 123 (71.51%)
 | SMB | 1056 (84.35%) | 2435 (100%) | 823 (95.59%) | 284 (100%) | 3 (100%) | 0 (0%) | 58 (59.18%)
 | DCERPC | 460 (78.36%) | 1245 (100%) | 676 (100%) | 366 (100%) | - | 735 (100%) | 248 (100%)
 | DDoS | 1476 (100%) | 10,360 (100%) | 23,161 (100%) | 232 (100%) | 2130 (100%) | 13,333 (100%) | 32,286 (100%)
Threshold | Cooperation | Metric | I | II | III | IV | V | Suricata
---|---|---|---|---|---|---|---|---
Maximum of training loss | Without Snort | Accuracy | 0.8952 | 0.9739 | 0.9626 | 0.7776 | 0.9960 | 0.9913
 | | Precision | 0.9654 | 0.9662 | 0.9813 | 0.9881 | 0.9960 | 0.9951
 | | Recall | 0.8577 | 0.9996 | 0.9793 | 0.7168 | 1.0000 | 0.9961
 | | F1-Score | 0.9083 | 0.9826 | 0.9803 | 0.8308 | 0.9980 | 0.9956
 | | FPR | 0.0472 | 0.0992 | 0.3346 | 0.0276 | 1.0000 | 0.3132
 | | FNR | 0.1423 | 0.0004 | 0.0207 | 0.2832 | 0.0000 | 0.0039
 | With Snort | Accuracy | 0.8754 | 0.9278 | 0.9001 | 0.9302 | 0.9090 | 0.8708
 | | Precision | 0.9911 | 0.9805 | 0.9878 | 0.9858 | 0.9946 | 0.9945
 | | Recall | 0.8755 | 0.9347 | 0.9074 | 0.9312 | 0.9104 | 0.8708
 | | F1-Score | 0.9298 | 0.9570 | 0.9459 | 0.9577 | 0.9506 | 0.9285
 | | FPR | 0.1269 | 0.1148 | 0.2923 | 0.0753 | 0.1274 | 0.1295
 | | FNR | 0.1245 | 0.0653 | 0.0926 | 0.0688 | 0.0896 | 0.1292
Average of training loss | Without Snort | Accuracy | 0.6590 | 0.9518 | 0.9632 | 0.7618 | 0.9960 | 0.9901
 | | Precision | 0.6459 | 0.9388 | 0.9687 | 0.7618 | 0.9960 | 0.9928
 | | Recall | 0.9671 | 0.9999 | 0.9932 | 1.0000 | 1.0000 | 0.9972
 | | F1-Score | 0.7745 | 0.9684 | 0.9808 | 0.8648 | 0.9980 | 0.9950
 | | FPR | 0.8140 | 0.1850 | 0.5739 | 1.0000 | 1.0000 | 0.4634
 | | FNR | 0.0329 | 0.0001 | 0.0068 | 0.0000 | 0.0000 | 0.0028
 | With Snort | Accuracy | 0.9603 | 0.8632 | 0.9647 | 0.8490 | 0.9751 | 0.9732
 | | Precision | 0.9713 | 0.8628 | 0.9647 | 0.8490 | 0.9879 | 0.9803
 | | Recall | 0.9870 | 1.0000 | 1.0000 | 1.0000 | 0.9862 | 0.9922
 | | F1-Score | 0.9791 | 0.9264 | 0.9820 | 0.9184 | 0.9871 | 0.9862
 | | FPR | 0.4711 | 0.9820 | 0.9588 | 1.0000 | 0.3082 | 0.5339
 | | FNR | 0.0130 | 0.0000 | 0.0000 | 0.0000 | 0.0138 | 0.0078
Class | Distinct Attack | Train | Test | Subtotal
---|---|---|---|---
Normal | - | 67,343 | 9711 | 77,054 |
DoS | 11 | 45,927 | 7460 | 53,387 |
Probe | 6 | 11,656 | 2421 | 14,077 |
R2L | 15 | 995 | 2885 | 3880 |
U2R | 7 | 52 | 67 | 119 |
Total | 39 | 125,973 | 22,544 | 148,517
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jang, Y.I.; Choi, S.; Min, B.-G.; Choi, Y.-J. Unsupervised Security Threats Identification for Heterogeneous Events. Electronics 2024, 13, 4061. https://doi.org/10.3390/electronics13204061