Using Rough Set Theory to Find Minimal Log with Rule Generation
Abstract
1. Introduction
- Developing a new algorithm using basic RST concepts to create minimal reducts;
- Offering a feasible feature selection methodology scalable to huge datasets, without sacrificing performance;
- Creating a minimal rule decision database that retains information content;
- Using three benchmark UCI datasets to evaluate the performance of the methodology;
- Comparing the results of the proposed model with recent works.
2. Related Works
3. Theoretical Background
3.1. Rough Set
3.2. R Language
4. Research Methodology
4.1. Problem Statement and Motivation
4.2. Datasets
4.3. Building a Minimal Log Size (Reduct)
- Splitting the dataset into N subsets and running the proposed algorithm on each subset overcomes hardware limitations, since fewer entries mean that less memory is needed to load the data, perform the computations, and store the results. Keeping an entire high-dimensional dataset in memory while performing all of the previous steps is usually infeasible;
- Reducing the number of calculations: passing only the minimal elements of the discernibility matrix to the reduct calculation avoids computing every possible attribute combination, so the count of candidate combinations, 2^|A| − 1, no longer applies. This substantially reduces the execution time. The proposed code is given in Algorithm 1:
Algorithm 1: IRS Algorithm
Input: T = (U, A ∪ D): information table; N: number of iterations; M: number of datasets
Output: Core–Reduct
1: For each dataset M do
2:   For each iteration N do
3:     Calculate IND_N(D)
4:     Compute DISC.Matrix_N(T)
5:     Do while (DISC.Matrix_N(T) ≠ Ø) and i ≤ j (the RST discernibility matrix is symmetric)
6:       S_i0,j0 = Sort (x_i, x_j) ∈ DISC.Matrix_N(T) according to the number of conditional attributes A
7:     End while
8:     Compute Reduct_N(S_i0,j0) (calculating reducts for the minimal condition attributes)
9:     Reduct_N = Reduct_N ∩ Reduct_N(S_i0,j0)
10:   End For N
11:   Core–Reduct = Core–Reduct ∩ Reduct_N (the minimal optimal reduct)
12: End For M
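As an illustrative sketch (not the authors' R implementation), the core idea of Algorithm 1 — building a discernibility matrix only from object pairs with different decisions, sorting its entries so that the minimal ones are processed first, and keeping a small attribute set that covers them — might look as follows in Python. The toy decision table, the attribute names, and the greedy covering heuristic are assumptions made for illustration:

```python
from itertools import combinations

# Toy decision table: each row is an object; the last element is the decision.
# Attribute names are hypothetical, loosely modeled on the paper's log fields.
attrs = ["EventName", "SourceIP", "DestIP", "Magnitude"]
table = [
    ("deny",  "a", "x", 8, "alert"),
    ("deny",  "a", "y", 8, "alert"),
    ("built", "b", "x", 7, "ok"),
    ("deny",  "b", "y", 7, "ok"),
]

def discernibility_matrix(table, attrs):
    """Entries c(i, j): attributes distinguishing objects with different decisions.
    Only i < j is considered, since the matrix is symmetric."""
    entries = []
    for i, j in combinations(range(len(table)), 2):
        if table[i][-1] != table[j][-1]:  # different decision classes only
            diff = {a for k, a in enumerate(attrs) if table[i][k] != table[j][k]}
            if diff:
                entries.append(diff)
    return entries

def minimal_reduct(entries):
    """Greedy covering over entries sorted by size (smallest first),
    mirroring the IRS idea of processing minimal matrix elements first."""
    reduct = set()
    for e in sorted(entries, key=len):
        if not (reduct & e):          # entry not yet covered by the reduct
            reduct.add(sorted(e)[0])  # simple deterministic choice
    return reduct

entries = discernibility_matrix(table, attrs)
reduct = minimal_reduct(entries)
print(sorted(reduct))
```

Every entry of the matrix intersects the resulting attribute set, so no pair of objects from different decision classes becomes indiscernible, while all attribute combinations that never appear as minimal entries are skipped entirely.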
4.4. Generating Minimal Decision Rules
Algorithm 2: Rule Generation Algorithm
Input: Reduct_N(T): minimal reduct information table; M: number of datasets
Output: Set-Rule_Min
1: For each dataset M do
2:   read.table(Reduct_N(T))
3:   Split Reduct_N(T) into a training set (60%) and a test set (40%)
4:   RI.LEM2Rules.RST() function: create rules from the training set of Reduct_N(T)
5:   predict() function: test the quality of prediction on the test set of Reduct_N(T)
6:   mean() function: check the accuracy of the predictions
7: End For M
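The evaluation workflow of Algorithm 2 — a 60/40 train/test split, rule induction on the training portion, and accuracy computed as the mean of correct predictions — can be sketched in Python. The toy data and the majority-decision "rules" below are a simplified stand-in for the RoughSets package's LEM2 induction (RI.LEM2Rules.RST); they illustrate only the split/induce/predict/score flow, not the LEM2 algorithm itself:

```python
import random
from collections import Counter

# Toy reduced decision table: (condition-attribute tuple, decision).
# Values are illustrative, not taken from the paper's log data.
data = [(("deny", 8), "alert"), (("deny", 8), "alert"), (("built", 7), "ok"),
        (("deny", 7), "ok"), (("built", 7), "ok"), (("deny", 8), "alert"),
        (("built", 8), "alert"), (("deny", 7), "ok"), (("built", 7), "ok"),
        (("deny", 8), "alert")]

random.seed(42)
random.shuffle(data)
cut = int(0.6 * len(data))            # 60% training / 40% test split
train, test = data[:cut], data[cut:]

# "Rules": map each condition tuple to its majority decision in training
# (a stand-in for LEM2, which covers training examples with minimal rules).
counts = {}
for cond, dec in train:
    counts.setdefault(cond, Counter())[dec] += 1
rules = {cond: cnt.most_common(1)[0][0] for cond, cnt in counts.items()}
default = Counter(d for _, d in train).most_common(1)[0][0]  # fallback decision

# Predict on the test set and score, like predict() followed by mean().
correct = [rules.get(cond, default) == dec for cond, dec in test]
accuracy = sum(correct) / len(correct)
print(round(accuracy, 4))
```

Because the rules are induced from the reduced table, the prediction accuracy reported this way measures how much decision power survives the reduct, which is the quantity the paper's Table of rule counts and accuracies tracks.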
4.5. Execution Time Comparison with Existing Methods
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Event Name | Log Source | Event Count | Low Level Category | Source IP | Source Port | Destination IP | Destination Port | User Name | Magnitude
---|---|---|---|---|---|---|---|---|---
Tear down UDP connection | ASA @ 172.17.0.1 | 1 | Firewall Session Closed | 8.8.8.8 | 53 | 172.18.12.10 | 53657 | N/A | 7
Deny protocol src | R | 1 | Firewall Deny | 172.20.12.142 | 56511 | 172.217.23.174 | 443 | N/A | 8
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.20.18.54 | 52976 | 213.139.38.18 | 80 | N/A | 8
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.20.15.71 | 53722 | 52.114.75.79 | 443 | N/A | 8
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 192.168.180.131 | 55091 | 40.90.22.184 | 443 | N/A | 8
Built TCP connection | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.18.12.19 | 59201 | 163.172.21.225 | 443 | N/A | 8
Training Data Set | Minimal Attribute | Degree of Dependency
---|---|---
First Training Set S1 (∩ three iterations), Reduct_N (N = 1) | A1 = {Event Name, Source IP, Source Port, Destination IP, Magnitude}, |A1| = 5 | 1
Second Training Set S2 (∩ three iterations), Reduct_N (N = 2) | A2 = {Event Name, Source IP, Destination IP, Magnitude}, |A2| = 4 | 0.9992941
Third Training Set S3 (∩ three iterations), Reduct_N (N = 3) | A3 = {Event Name, Source IP, Source Port, Destination IP, Magnitude}, |A3| = 5 | 1
Core-Reduct (A1 ∩ A2 ∩ A3) | A2 = {Event Name, Source IP, Destination IP, Magnitude}, |A2| = 4 | 0.9992941
Training Data Set | Number of Decision Rules before Reduct | Number of Decision Rules after Reduct | Prediction Accuracy
---|---|---|---|
First Training Set | S1 = 905 | A1 = 596 | 0.9552733 |
Second Training Set | S2 = 878 | A2 = 509 | 0.9535073 |
Third Training Set | S3 = 813 | A3 = 481 | 0.9741291 |
Dataset | Number of Attributes | Number of Instances |
---|---|---|
Glass | 9 | 100 |
Wiscon | 9 | 699 |
Zoo | 16 | 100 |
Data | Num. of Attributes of the Dataset | All Reducts (IRS) | All Reducts (SPS and CDM) | Execution Time (s): Classical Discernibility Matrix (CDM) | Execution Time (s): SPS | Execution Time (s): IRS
---|---|---|---|---|---|---
Wiscon | 9 | 4 | 4 | 1362.1 | 24.0956 | 9.05 |
Glass | 9 | 2 | 2 | 23.3268 | 0.7931 | 0.7 |
Zoo | 16 | 35 | 35 | 106.6581 | 1.2574 | 0.9967 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alawneh, T.N.; Tut, M.A. Using Rough Set Theory to Find Minimal Log with Rule Generation. Symmetry 2021, 13, 1906. https://doi.org/10.3390/sym13101906