Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22
Abstract
:1. Introduction
- To date, tactics from the MITRE ATT&CK framework have not been visualized graphically. This work focuses on presenting graphical visualizations of the MITRE ATT&CK Reconnaissance Tactic (TA0043) using graph representation.
- Essential feature selection was performed so that this work generated a graph data model using only a very limited set of network connection features. Feature generation was also performed using the limited set of network connection features.
- The graph models could be effectively used to train machine learning models, especially in the big data environment, in order to accurately predict when network traffic is nefarious.
- The reduction of the network data to only a few features (feature selection) that could be used to identify the Reconnaissance Tactic would be computationally beneficial in machine learning analysis, especially in the big data environment.
- Above all, these graph models can be used to develop a more robust threat intelligence platform (TIP) that would be able to visually detect the attacks before they happen, by recognizing the attack patterns in the data. A TIP is a technology solution that collects, aggregates, and organizes threat intelligence.
2. Related Works
3. The Dataset: UWF-ZeekData22
3.1. Distribution of UWF-ZeekData22 by Tactics
3.2. Software Utilized to Process Data
4. Preprocessing
- Destination ip (id.resp_h) and originating bytes (orig_bytes), used as per [21].
- Total number of connections between the unique source and destination.
- Total duration of the connection(s) between the vertices.
- Total number of bytes of the connections between vertices.
- The attack tactic.
Binning
5. Algorithmic Approach to Creating the Graphs
5.1. Overview of Approach
5.2. Workflow
5.2.1. Reducing the Data
5.2.2. Producing a Non-Cyclic Graph
5.2.3. Binning
5.2.4. Generating Visual Graph
5.3. Algorithmic Approach to Creating the Graphs
Algorithm 1: isCyclic |
Input: Graph G, vertex V to add Output: Boolean true if after adding V, the graph is cyclic, updated G, with vertex V added Add V to G Create and initialize visited array, recursionStack array Mark all vertices as unvisited in both visited and recursionStack forall vertex v in G Return isCyclicUtil (v, visited, recursionStack) isCyclicUtil (vertex, visited array, recurssionStack) if vertex visited before return false if vertex is in recursionStack return true Mark vertex as visited for vertex Mark recursionStack as visited for vertex forall children of vertex if isCyclic (childVertex, visited array, recursionStack) Return true Set recursionStack for vertex to false Return false |
6. Resulting Graph Visualizations
6.1. Star Motif
6.1.1. Visualizing the Reconnaissance Tactic by Connection Count
6.1.2. Visualizing the Reconnaissance Tactic by Average Duration
6.1.3. Visualizing the Reconnaissance Tactic by Average Bytes
6.2. Clique Motif
6.3. Visualizations of Non-Attacks by Count
6.4. Visualizing Attacks by Count
6.5. Visualizations of the Noncyclic Counts
6.6. Summarizing the Graphical Visualizations
7. Runtime Performance
8. Conclusions
9. Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Huong, T.T.; Bac, T.P.; Long, D.M.; Thang, B.D.; Binh, N.T.; Luong, T.D.; Phuc, T.K. LocKedge: Low-Complexity Cyberattack Detection in IoT Edge Computing. IEEE Access 2021, 9, 29696–29710. [Google Scholar] [CrossRef]
- Leevy, J.L.; Hancock, J.; Zuech, R.; Khoshgoftaar, T.M. Detecting Cybersecurity Attacks across Different Network Features and Learners. J. Big Data 2021, 8, 38. [Google Scholar] [CrossRef]
- Bagui, S.; Simonds, J.; Plenkers, R.; Bennett, T.A.; Bagui, S. Classifying UNSW-NB15 Network Traffic in the Big Data Framework Using Random Forest in Spark. Int. J. Big Data Intell. Appl. 2022, 2, 39–61. [Google Scholar] [CrossRef]
- Zhang, J.; Sun, J.; He, H. Clustering Detection Method of Network Intrusion Feature Based on Support Vector Machine and LCA Block Algorithm. Wirel. Pers. Commun. 2021, 127, 599–613. [Google Scholar] [CrossRef]
- Kevric, J.; Jukic, S.; Subasi, A. An Effective Combining Classifier Approach Using Tree Algorithms for Network Intrusion Detection. Neural Comput. Appl. 2016, 28, 1051–1058. [Google Scholar] [CrossRef]
- MITRE ATT&CK Reconnaissance, Tactic TA0043-Enterprise. Available online: https://attack.mitre.org/tactics/TA0043 (accessed on 23 March 2023).
- Jia, Y.; Qi, Y.; Shang, H.; Jiang, R.; Li, A. A Practical Approach to Constructing a Knowledge Graph for Cybersecurity. Engineering 2018, 4, 53–60. [Google Scholar] [CrossRef]
- Oracle Corporation 17 Use Cases for Graph Databases and Graph Analytics. 2021. Available online: https://www.oracle.com/a/ocom/docs/graph-database-use-cases-ebook.pdf (accessed on 19 August 2022).
- Bagui, S.S.; Mink, D.; Bagui, S.C.; Ghosh, T.; Plenkers, R.; McElroy, T.; Dulaney, S.; Shabanali, S. Introducing UWF-ZeekData22: A Comprehensive Network Traffic Dataset Based on the MITRE ATT&CK Framework. Data 2023, 8, 18. [Google Scholar] [CrossRef]
- University of West Florida UWF-ZeekData22. Available online: https://datasets.uwf.edu (accessed on 20 August 2020).
- Lee, M.-C.; Nguyen, H.T.; Berberidis, D.; Tseng, V.S.; Akoglu, L. GAWD: Graph anomaly detection in weighted directed graph databases. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Virtual, 8–11 November 2021; ACM: New York, NY, USA, 2021. [Google Scholar]
- Coupette, C.; Vreeken, J. Graph Similarity Description: How Are These Graphs Similar? In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021; ACM: New York, NY, USA, 2021. [Google Scholar]
- Schindler, T. Anomaly Detection in Log Data Using Graph Databases and Machine Learning to Defend Advanced Persistent Threats. Available online: https://dl.gi.de/handle/20.500.12116/4016 (accessed on 12 April 2023).
- Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; ACM: New York, NY, USA, 2019. [Google Scholar]
- Abraham, S.; Nair, S. A Predictive Framework for Cyber Security Analytics Using Attack Graphs. Int. J. Comput. Netw. Commun. 2015, 7, 1–17. [Google Scholar] [CrossRef]
- Rocha, R.C.; Thatte, B.D. Distributed Cycle Detection in Large-Scale Sparse Graphs. In Proceedings of the Simposio Brasileiro de Pesquisa Operacional (SBPO), Pernambuco, Brazil, 25–28 August 2015. [Google Scholar] [CrossRef]
- Ma, G.; Ahmed, N.K.; Wilke, T.L.; Yu, P.S. Deep graph similarity learning: A survey. Data Min. Knowl. Discov. 2021, 35, 688–725. [Google Scholar] [CrossRef]
- Li, Y.; Gu, C.; Vinyals, O.; Kohli, P. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
- Koutra, D.; Parikh, A.; Ramdas, A.; Xiang, J. Algorithms for Graph Similarity and Subgraph Matching. Comput. Sci. 2011. Available online: https://www.cs.cmu.edu/~jingx/docs/DBreport.pdf (accessed on 1 June 2023).
- GraphStream—A Dynamic Graph Library. Available online: https://graphstream-project.org/ (accessed on 12 March 2023).
- Bagui, S.; Mink, D.; Bagui, S.; Ghosh, T.; McElroy, T.; Paredes, E.; Khasnavis, N.; Plenkers, R. Detecting Reconnaissance and Discovery Tactics from the MITRE ATT&CK Framework in Zeek Conn Logs Using Spark’s Machine Learning in the Big Data Framework. Sensors 2022, 22, 7999. [Google Scholar] [CrossRef] [PubMed]
- Sur, S.; Srimani, P.K. Topological Properties of Star Graphs. Comput. Math. Appl. 1993, 25, 87–98. [Google Scholar] [CrossRef] [Green Version]
- Bron, C.; Kerbosch, J. Algorithm 457: Finding All Cliques of an Undirected Graph. Commun. ACM 1973, 16, 575–577. [Google Scholar] [CrossRef]
- Mackaness, W.A.; Beard, K.M. Use of Graph Theory to Support Map Generalization. Cartogr. Geogr. Inf. Syst. 1993, 20, 210–221. [Google Scholar] [CrossRef]
- Von Landesberger, T.; Görner, M.; Rehner, R.; Schreck, T. A System for Interactive Visual Analysis of Large Graphs Using Motifs in Graph Editing and Aggregation. Proc. Vis. Model. Vis. Workshop 2009, 9, 331–340. [Google Scholar]
- MITRE ATT&CK Active Scanning, Technique T1595-Enterprise. Available online: https://attack.mitre.org/techniques/T1595/ (accessed on 23 January 2023).
- Chapter 4. Port Scanning Overview. Available online: https://nmap.org/book/port-scanning.html#port-scanning-intro (accessed on 13 March 2023).
- Frankel, S.; Green, D. Internet Protocol Version 6. IEEE Secur. Priv. Mag. 2008, 6, 83–86. [Google Scholar] [CrossRef]
Attribute Name | Description of Attribute | Used to Create Graph DB |
---|---|---|
ts | Time of first packet | |
uid | Unique identifier of connection | |
id.orig_h | IP address of packet sender | Yes |
id.orig_p | Outgoing port number | |
id.resp_h | IP address of packet receiver | Yes |
id.resp_p | Incoming port number | |
proto | Transport layer protocol of connection | |
service | Application protocol sent over connection | |
duration | How long connection lasted | Yes |
orig_bytes | Payload bytes originator sent | Yes |
resp_bytes | Payload bytes responder sent | |
conn_state | Possible connection states | |
local_orig | If connection is originated locally | |
local_resp | If connection is responded to locally | |
missed_bytes | Representative of packet loss | |
history | History of connections | |
orig_pkts | Number of packets originator sent | |
orig_ip_bytes | Number of IP level bytes originator sent | |
resp_pkts | Number of packets responder sent | |
resp_ip_bytes | Number of IP level bytes responder sent | |
community_id | ||
id | Connection’s 4-tuple of endpoint addresses/ports | |
tunnel_parents | uid values for encapsulating parent(s) connections |
Attack Tactic | Count |
---|---|
None (Not an attack) | 9,281,599 |
Reconnaissance | 9,278,722 |
Discovery | 2086 |
Credential Access | 31 |
Privilege Escalation | 13 |
Exfiltration | 7 |
Lateral Movement | 4 |
Resource Development | 3 |
Defense Evasion | 1 |
Initial Access | 1 |
Persistence | 1 |
ID | From | To | Total_Dur | Avg_Dur | Total_Bytes | Avg_Bytes | Count | CountBin |
---|---|---|---|---|---|---|---|---|
edge_0 | 143.88.2.10 | 143.88.7.15 | 353,248.5154 | 0.2126 | 2,654,582,328,320 | 1,597,759.9722 | 1,661,440 | 6 |
edge_1 | 143.88.2.10 | 143.88.7.11 | 972,063.5371 | 0.3123 | 5,579,520 | 1.7928 | 3,112,192 | 6 |
edge_2 | 143.88.2.10 | 143.88.7.1 | 279,987.9888 | 0.1338 | 8,567,808 | 4.0934 | 2,093,056 | 6 |
edge_3 | 143.88.2.10 | 143.88.7.12 | 778,386.2988 | 0.6914 | 925,758,636,800 | 822,247.5387 | 1,125,888 | 6 |
edge_257 | 143.88.5.12 | 143.88.5.1 | 943,576.7243 | 1.8777 | 36,458,752 | 72.5507 | 502,528 | 5 |
ID | From | To | Total_Dur | Avg_Dur | Total_Bytes | Avg_Bytes | Count | CountBin |
---|---|---|---|---|---|---|---|---|
edge_3 | 143.88.2.10 | 143.88.7.12 | 778,386.2988 | 0.6913 | 925,758,636,800 | 822,247.5387 | 1,125,888 | 4 |
edge_4 | 143.88.2.10 | 143.88.7.10 | 1792.93927 | 1.4007 | 798,720 | 624 | 1280 | 4 |
edge_42 | 143.88.2.10 | 143.88.7.14 | 3080.24 | 3.0080 | 0 | 0 | 1024 | 4 |
edge_43 | 143.88.2.10 | 143.88.7.13 | 3080.264 | 3.0080 | 0 | 0 | 1024 | 4 |
edge_257 | 143.88.5.12 | 143.88.5.1 | 943,576.7 | 1.8776 | 36,458,752 | 72.55068772 | 502,528 | 4 |
ID | From | To | Total_Dur | Avg_Dur | Total_Bytes | Avg_Bytes | Count | CountBin |
---|---|---|---|---|---|---|---|---|
edge_0 | 143.88.2.10 | 143.88.7.15 | 353,248.5 | 0.212616 | 2,654,582,328,320 | 1,597,760 | 1,661,440 | 3 |
edge_3 | 143.88.2.10 | 143.88.7.12 | 778,386.3 | 0.691353 | 925,758,636,800 | 822,247.5 | 1,125,888 | 4 |
edge_257 | 143.88.5.12 | 143.88.5.1 | 943,576.7 | 1.87766 | 36,458,752 | 72.55069 | 502,528 | 4 |
ID | From | To | Total_Dur | Avg_Dur | Total_Bytes | Avg_Bytes | Count | CountBin |
---|---|---|---|---|---|---|---|---|
edge_21 | 143.88.11.14 | 143.88.11.1 | 1,267,576.92 | 2.6 | 40,376,997 | 82.73 | 488,029 | 5 |
edge_35 | 143.88.255.10 | 10.0.10.1 | 114.42 | 0 | 605,569,716 | 90.06 | 6,724,017 | 6 |
ID | From | To | Total_Dur | Avg_Dur | Total_Bytes | Avg_Bytes | Count | CountBin |
---|---|---|---|---|---|---|---|---|
edge_3 | 143.88.7.10 | 143.88.2.10 | 1216.984 | 0.002334 | 24,576 | 0.047128 | 521,472 | 5 |
edge_6 | 143.88.2.10 | 143.88.7.15 | 353,248.5 | 0.212616 | 2,654,582,328,320 | 1597760 | 1,661,440 | 6 |
edge_7 | 143.88.2.10 | 143.88.7.11 | 972,063.5 | 0.31234 | 5,579,520 | 1.792794 | 3,112,192 | 6 |
edge_8 | 143.88.2.10 | 143.88.7.1 | 279,988 | 0.13377 | 8,567,808 | 4.093444 | 2,093,056 | 6 |
edge_9 | 143.88.2.10 | 143.88.7.12 | 778,386.3 | 0.691353 | 925,758,636,800 | 822,247.5 | 1,125,888 | 6 |
edge_262 | 143.88.5.12 | 143.88.5.1 | 943,576.7 | 1.87766 | 36,458,752 | 72.55069 | 502,528 | 5 |
edge_267 | 143.88.11.10 | 8.8.8.8 | 588,871.3 | 1.293066 | 43,664,530 | 95.88023 | 455,407 | 5 |
edge_268 | 143.88.11.10 | 8.8.4.4 | 590,266.6 | 1.300276 | 43,591,546 | 96.02614 | 453,955 | 5 |
edge_284 | 143.88.11.14 | 143.88.11.1 | 1,267,577 | 2.597 | 40,376,997 | 82.73483 | 488,029 | 5 |
edge_298 | 143.88.255.10 | 10.0.10.1 | 114.4165 | 0.000 | 605,569,716 | 90.06071 | 6,724,017 | 6 |
Phase 1—File Processing | Phase 2—Graph Processing | |||
---|---|---|---|---|
Duration (milliseconds) | Duration (milliseconds) | |||
Full File/Tactic/Filter by IP | Reduced Data | Full Data | Reduced Data | Full Data |
(84.3 k Rows) | (18.56 M Rows) | (84.3 k Rows) | (18.56 M Rows) | |
All rows | 702 | 64,955 | 60 | 65 |
Reconnaissance | 546 | 64,535 | 55 | 54 |
IP: 143.88.2.10 | 543 | 62,402 | 51 | 47 |
Duration for Graph Streaming | Row Count | |||
---|---|---|---|---|
(milliseconds) | ||||
Full File/Tactic | Reduced Data | Full Data | Reduced Data | Full Data |
All rows | 39 | 41 | 374 | 480 |
Reconnaissance | 39 | 40 | 255 | 258 |
IP: 143.88.2.10 | 38 | 38 | 254 | 256 |
Duration | Row Count | |||
---|---|---|---|---|
(milliseconds) | ||||
Full File/Tactic | Reduced Data | Full Data | Reduced Data | Full Data |
All rows | 7904 | 6967 | 374 | 480 |
Reconnaissance | 7510 | 7644 | 255 | 258 |
IP: 143.88.2.10 | 6834 | 7241 | 254 | 256 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bagui, S.S.; Mink, D.; Bagui, S.C.; Plain, M.; Hill, J.; Elam, M. Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22. Future Internet 2023, 15, 236. https://doi.org/10.3390/fi15070236
Bagui SS, Mink D, Bagui SC, Plain M, Hill J, Elam M. Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22. Future Internet. 2023; 15(7):236. https://doi.org/10.3390/fi15070236
Chicago/Turabian StyleBagui, Sikha S., Dustin Mink, Subhash C. Bagui, Michael Plain, Jadarius Hill, and Marshall Elam. 2023. "Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22" Future Internet 15, no. 7: 236. https://doi.org/10.3390/fi15070236
APA StyleBagui, S. S., Mink, D., Bagui, S. C., Plain, M., Hill, J., & Elam, M. (2023). Using a Graph Engine to Visualize the Reconnaissance Tactic of the MITRE ATT&CK Framework from UWF-ZeekData22. Future Internet, 15(7), 236. https://doi.org/10.3390/fi15070236