ConGraph: Advanced Persistent Threat Detection Method Based on Provenance Graph Combined with Process Context in Cyber-Physical System Environment
Abstract
1. Introduction
- (1) We propose ConGraph, a new APT attack detection method. During provenance-graph construction, it incorporates rich process–context information and file features collected at system runtime to model fine-grained process behavior patterns.
- (2) We present a context-collection module for APT detection that gathers features such as the file-access behavior of processes, their network-access behavior, and the interactions between processes, enriching the structure of the provenance graph. A graph-compression algorithm reduces the computational scale and thereby improves detection efficiency.
- (3) We reproduced the experiments of a public dataset to collect process context at system runtime and used the resulting data to evaluate our method. The experimental results demonstrate the effectiveness of our approach in detecting APT activities.
2. Related Work
3. Preliminary
3.1. Threat Model
- (1) Persistence. The attack activity lasts for a long time.
- (2) Stealthiness. The attacker tries to blend malicious activity into a large volume of normal activity and to disguise it as normal activity.
- (3) Attack patterns exist in the provenance graph. Because the attacker must carry out activities that differ from benign ones, the attacker's behavior leaves attack patterns in the provenance graph: malicious nodes have local structures or contextual information that differ from those of benign nodes.
- (4) C&C communication. The victim host must communicate with the attacker to receive attack commands and to exfiltrate high-value information.
3.2. Definitions
4. Proposed Model of APT Detection
4.1. System Overview of ConGraph
- (1) Log Preprocessing. This module collects and analyzes system audit logs, browser history, and DNS traffic using a range of tools: system audit tools (Windows ETW, Linux Auditd, etc.), user browsers (Google Chrome, Firefox, etc.), and network analysis tools (Wireshark, etc.). The data are then organized chronologically into a provenance graph, in which nodes represent system entities such as processes and files, and edges represent system events such as process creation.
- (2) Process–Context Collector. ConGraph records the API calls of processes at runtime with an API call logging tool to obtain their interaction, network, and file-access characteristics, which enriches the semantic information of the provenance-graph nodes. Because some process behavioral characteristics are difficult to match from API calls alone, we also use rule matching to collect these behavioral features, together with file security features such as the file sensitivity level.
- (3) Sequence Construction and Context Fusion. This module consists of five steps: active-node identification, context fusion, sequence construction, sequence sampling, and sequence embedding. First, ConGraph builds the active node set by identifying the nodes with active behavior in the provenance graph. Next, it fuses the process context with the provenance graph generated in Step (1), using the node information from the active node set and the process–context information gathered by the process–context collector. This information includes the behavioral characteristics of the process actions and the features of the accessed files. ConGraph then extracts the action sequence of each active node from the fused provenance graph and converts it into an anonymized representation. Because attack sequences are vastly outnumbered by benign sequences, attack sequences are over-sampled and benign sequences are under-sampled during the training phase. Finally, word embedding converts the anonymized representations into numerical representations, which improves the model's ability to semantically distinguish attack sequences from benign ones.
- (4) Model Training and Detection. The modules above yield the action sequences of attack nodes and benign nodes. In the training phase, a CNN-BiLSTM model learns the behavioral patterns of APT attack processes and their corresponding process–context features. The BiLSTM captures the temporal behavioral features of each active node and learns the correlations between an attack node and other system entities, while an added CNN layer captures the implicit features of APT attacks in the sequences. In the detection phase, the contextual information is combined with each node's feature sequence to classify process behavior as benign or as an attack.
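The active-node identification, context fusion, and sequence construction steps of module (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ConGraph's implementation: the "active" criterion (`MIN_ACTIONS` initiated events), the `ctx:` tag format, and the choice to fuse context by prefixing tags to the sequence are all hypothetical, and all entity names are made up.

```python
from collections import defaultdict

# An edge is (src, dst, event_type, timestamp), e.g. after log preprocessing.
# ASSUMPTION: a node counts as "active" if it initiates at least MIN_ACTIONS
# events; the text does not define the exact activity criterion.
MIN_ACTIONS = 2


def active_nodes(edges, min_actions=MIN_ACTIONS):
    """Return the set of nodes that initiate at least `min_actions` events."""
    counts = defaultdict(int)
    for src, _dst, _etype, _ts in edges:
        counts[src] += 1
    return {node for node, c in counts.items() if c >= min_actions}


def action_sequences(edges, context):
    """Build one chronological action sequence per active node.

    `context` maps a node to hypothetical context tags (e.g. derived from the
    process and file features); here "fusion" simply places those tags at the
    start of the node's sequence. Only events initiated by the node are used.
    """
    active = active_nodes(edges)
    sequences = {node: list(context.get(node, [])) for node in active}
    for src, dst, etype, _ts in sorted(edges, key=lambda e: e[3]):
        if src in active:
            sequences[src].append(f"{etype}:{dst}")
    return sequences


edges = [
    ("explorer.exe", "winword.exe", "fork", 0),
    ("winword.exe", "invoice.docx", "read", 1),
    ("winword.exe", "payload.exe", "fork", 2),
    ("payload.exe", "203.0.113.7", "connect", 3),
    ("payload.exe", "secret.txt", "read", 4),
]
context = {"payload.exe": ["ctx:no_gui", "ctx:no_signature"]}
seqs = action_sequences(edges, context)
print(seqs["payload.exe"])
# ['ctx:no_gui', 'ctx:no_signature', 'connect:203.0.113.7', 'read:secret.txt']
```

Here `explorer.exe` initiates only one event, so it is not treated as active; a real system would likely also consider incoming events and a richer activity measure.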
4.2. Log Preprocessing
- (1) Merge all edges of the same category (e.g., file reads or writes) between two nodes, retaining only the edge with the earliest timestamp. Duplicate edges between nodes indicate repeated operations over a period of time, and studies [17,29] have shown that they provide no additional useful information for analyzing attacks. Because the same preprocessing is applied in both the model training and detection phases, it does not affect the identification of the attacking entities' behavior.
- (2) Group nodes that are involved in the same type of event. For example, if a process communicates with three network nodes over a period of time, the graph contains three edges of the same type from the process to those nodes. We combine these three edges into a single edge between the process and one fused node that represents the three network nodes.
- (4) Remove all isolated nodes from the provenance graph. APT attacks usually require a set of processes and files to cooperate to achieve the attack goal; therefore, we consider isolated nodes irrelevant to the attack.
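The preprocessing rules (1), (2), and (4) above can be sketched in Python as follows. This is a simplified illustration, not ConGraph's graph-compression code: edges are plain `(src, dst, event_type, timestamp)` tuples, the `fused(...)` naming is hypothetical, and rule (2) is keyed on `(src, event_type)` over the whole log, ignoring the time window mentioned in the text.

```python
from collections import defaultdict

# An edge is (src, dst, event_type, timestamp); names below are illustrative.


def merge_duplicate_edges(edges):
    """Rule (1): keep only the earliest edge for each (src, dst, event_type)."""
    earliest = {}
    for src, dst, etype, ts in edges:
        key = (src, dst, etype)
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    merged = [(s, d, e, ts) for (s, d, e), ts in earliest.items()]
    return sorted(merged, key=lambda edge: edge[3])


def group_same_type_targets(edges):
    """Rule (2): fuse all targets that one source reaches with the same event type.

    ASSUMPTION: grouping ignores the time window and node types; the fused
    node keeps the earliest timestamp of the edges it replaces.
    """
    groups = defaultdict(list)
    for src, dst, etype, ts in edges:
        groups[(src, etype)].append((dst, ts))
    fused = []
    for (src, etype), targets in groups.items():
        if len(targets) > 1:
            name = "fused(" + "|".join(sorted(d for d, _ in targets)) + ")"
            fused.append((src, name, etype, min(t for _, t in targets)))
        else:
            dst, ts = targets[0]
            fused.append((src, dst, etype, ts))
    return sorted(fused, key=lambda edge: edge[3])


def remove_isolated(nodes, edges):
    """Rule (4): drop nodes with no incident edge; append newly fused nodes."""
    connected = {n for s, d, _e, _t in edges for n in (s, d)}
    kept = [n for n in nodes if n in connected]
    return kept + sorted(connected - set(nodes))


def preprocess(nodes, edges):
    edges = group_same_type_targets(merge_duplicate_edges(edges))
    return remove_isolated(nodes, edges), edges


nodes = ["chrome.exe", "192.0.2.1", "192.0.2.2", "192.0.2.3", "a.doc", "tmp.log"]
edges = [
    ("chrome.exe", "a.doc", "read", 5),
    ("chrome.exe", "a.doc", "read", 3),       # duplicate read, earlier timestamp
    ("chrome.exe", "192.0.2.1", "connect", 1),
    ("chrome.exe", "192.0.2.2", "connect", 2),
    ("chrome.exe", "192.0.2.3", "connect", 4),
]
kept, out = preprocess(nodes, edges)
print(out)
print(kept)
```

The two `read` edges collapse into the one at timestamp 3, the three `connect` edges become a single edge to `fused(192.0.2.1|192.0.2.2|192.0.2.3)`, and `tmp.log` is dropped as isolated. Note that this naive key would also fuse, say, two different files read by the same process; a fuller implementation would likely restrict grouping by node type and time window.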
4.3. Process–Context Collector
4.4. Sequence Construction and Context Fusion
4.5. Model Training and Detection
5. Experiment
5.1. Dataset and Experimental Setup
5.2. Comparison Experiment
5.3. Ablation Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Han, S.; Xie, M.; Chen, H.H.; Ling, Y. Intrusion detection in cyber-physical systems: Techniques and challenges. IEEE Syst. J. 2014, 8, 1052–1062.
- Langner, R. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Secur. Priv. 2011, 9, 49–51.
- Kumar, R.; Kela, R.; Singh, S.; Trujillo-Rasua, R. APT attacks on industrial control systems: A tale of three incidents. Int. J. Crit. Infrastruct. Prot. 2022, 37, 100521.
- Antonakakis, M.; April, T.; Bailey, M.; Bernhard, M.; Bursztein, E.; Cochran, J.; Durumeric, Z.; Halderman, J.A.; Invernizzi, L.; Kallitsis, M.; et al. Understanding the mirai botnet. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 1093–1110.
- Sicato, J.C.S.; Sharma, P.K.; Loia, V.; Park, J.H. VPNFilter malware analysis on cyber threat in smart home network. Appl. Sci. 2019, 9, 2763.
- European Union Agency for Cybersecurity (ENISA). Baseline Security Recommendations for IoT. Available online: https://www.enisa.europa.eu/publications/baseline-security-recommendations-for-iot/@@download/fullReport (accessed on 8 October 2023).
- NIST. Cybersecurity for IoT Program. Available online: https://www.nist.gov/itl/applied-cybersecurity/nist-cybersecurity-iot-program/consumer-iot-cybersecurity (accessed on 8 October 2023).
- Cirne, A.; Sousa, P.R.; Resende, J.S.; Antunes, L. IoT security certifications: Challenges and potential approaches. Comput. Secur. 2022, 116, 102669.
- Anselmi, G.; Mandalari, A.M.; Lazzaro, S.; De Angelis, V. COPSEC: Compliance-Oriented IoT Security and Privacy Evaluation Framework. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, Madrid, Spain, 2–6 October 2023; pp. 1–3.
- Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877.
- Bridges, R.A.; Glass-Vanderlan, T.R.; Iannacone, M.D.; Vincent, M.S.; Chen, Q. A survey of intrusion detection systems leveraging host data. ACM Comput. Surv. CSUR 2019, 52, 1–35.
- Singla, A.; Bertino, E.; Verma, D. Preparing network intrusion detection deep learning models with minimal data using adversarial domain adaptation. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, Taipei, Taiwan, 5–9 October 2020; pp. 127–140.
- Axelsson, S. Intrusion Detection Systems: A Survey and Taxonomy; Chalmers University of Technology: Goteborg, Sweden, 2000.
- Han, X.; Pasquier, T.; Ranjan, T.; Goldstein, M.; Seltzer, M. FRAPpuccino: Fault-detection through Runtime Analysis of Provenance. In Proceedings of the 9th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 17), Santa Clara, CA, USA, 10–11 July 2017.
- Han, X.; Pasquier, T.; Bates, A.; Mickens, J.; Seltzer, M. Unicorn: Runtime provenance-based detector for advanced persistent threats. arXiv 2020, arXiv:2001.01525.
- Zeng, J.; Wang, X.; Liu, J.; Chen, Y.; Liang, Z.; Chua, T.S.; Chua, Z.L. Shadewatcher: Recommendation-guided cyber threat analysis using system audit records. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 489–506.
- Hossain, M.N.; Milajerdi, S.M.; Wang, J.; Eshete, B.; Gjomemo, R.; Sekar, R.; Stoller, S.; Venkatakrishnan, V. SLEUTH: Real-time attack scenario reconstruction from COTS audit data. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 487–504.
- Yang, J.; Zhang, Q.; Jiang, X.; Chen, S.; Yang, F. Poirot: Causal correlation aided semantic analysis for advanced persistent threat detection. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3546–3563.
- Milajerdi, S.M.; Gjomemo, R.; Eshete, B.; Sekar, R.; Venkatakrishnan, V. Holmes: Real-time apt detection through correlation of suspicious information flows. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1137–1152.
- Xiong, C.; Zhu, T.; Dong, W.; Ruan, L.; Yang, R.; Cheng, Y.; Chen, Y.; Cheng, S.; Chen, X. CONAN: A practical real-time APT detection system with high accuracy and efficiency. IEEE Trans. Dependable Secur. Comput. 2020, 19, 551–565.
- Liu, Y.; Zhang, M.; Li, D.; Jee, K.; Li, Z.; Wu, Z.; Rhee, J.; Mittal, P. Towards a Timely Causality Analysis for Enterprise Security. In Proceedings of the Network and Distributed Systems Security (NDSS), San Diego, CA, USA, 18–21 February 2018.
- Hassan, W.U.; Guo, S.; Li, D.; Chen, Z.; Jee, K.; Li, Z.; Bates, A. Nodoze: Combatting threat alert fatigue with automated provenance triage. In Proceedings of the Network and Distributed Systems Security Symposium, San Diego, CA, USA, 24–27 February 2019.
- Wang, S.; Wang, Z.; Zhou, T.; Sun, H.; Yin, X.; Han, D.; Zhang, H.; Shi, X.; Yang, J. Threatrace: Detecting and tracing host based threats in node level through provenance graph learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3972–3987.
- Liu, F.; Wen, Y.; Zhang, D.; Jiang, X.; Xing, X.; Meng, D. Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1777–1794.
- Du, M.; Li, F.; Zheng, G.; Srikumar, V. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1285–1298.
- Li, T.; Jiang, Y.; Lin, C.; Obaidat, M.S.; Shen, Y.; Ma, J. Deepag: Attack graph construction and threats prediction with bi-directional deep learning. IEEE Trans. Dependable Secur. Comput. 2022, 20, 740–757.
- Ramos, C. Spam Campaigns with Malware Exploiting CVE-2017-11882 Spread in Australia and Japan. Available online: https://www.trendmicro.com/vinfo/us/threat-encyclopedia/spam/3655/spam-campaigns-with-malware-exploiting-cve201711882-spread-in-australia-and-japan (accessed on 6 June 2020).
- Chen, T.; Dong, C.; Lv, M.; Song, Q.; Liu, H.; Zhu, T.; Xu, K.; Chen, L.; Ji, S.; Fan, Y. APT-KGL: An Intelligent APT Detection System Based on Threat Knowledge and Heterogeneous Provenance Graph Learning. IEEE Trans. Dependable Secur. Comput. 2022, 1–15.
- Xu, Z.; Wu, Z.; Li, Z.; Jee, K.; Rhee, J.; Xiao, X.; Xu, F.; Wang, H.; Jiang, G. High fidelity data reduction for big data security dependency analyses. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 504–516.
- Beijering, K.; Gooskens, C.; Heeringa, W. Predicting intelligibility and perceived linguistic distance by means of the Levenshtein algorithm. Linguist. Neth. 2008, 25, 13–24.
- Alsaheel, A.; Nan, Y.; Ma, S.; Yu, L.; Walkup, G.; Celik, Z.B.; Zhang, X.; Xu, D. ATLAS: A sequence-based learning approach for attack investigation. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Vancouver, BC, Canada, 11–13 August 2021; pp. 3005–3022.
- FireEye Threat Intelligence. Second Adobe Flash Zeroday CVE-2015-5122 from Hackingteam Exploited in Strategic Web Compromise Targeting Japanese Victims. Available online: https://www.fireeye.com/blog/threat-research/2015/07/second_adobe_flashz0.html (accessed on 6 June 2020).
- Paganini, P. Phishing Campaigns Target US Government Agencies Exploiting Hacking Team Flaw CVE-2015-5119. Available online: https://securityaffairs.co/wordpress/38707/cyber-crime/phishing-cve-2015-5119.html (accessed on 6 June 2020).
- Li, B.; Chen, J.C. Exploit Kits in 2015: Flash Bugs, Compromised Sites, Malvertising Dominate. Available online: https://blog.trendmicro.com/trendlabs-security-intelligence/exploit-kits-2015-flash-bugs-compromised-sites-malvertising-dominate/ (accessed on 6 June 2020).
- Trend Micro. Rig Exploit Kit Now Using CVE-2018-8174 to Deliver Monero Miner. Available online: https://blog.trendmicro.com/trendlabs-security-intelligence/rig-exploit-kit-now-using-cve-2018-8174-to-deliver-monerominer/ (accessed on 6 June 2020).
- Jiang, G.; Mohandas, R.; Leathery, J.; Berry, A.; Galang, L. CVE-2017-0199: In the Wild Attacks Leveraging HTA Handler. Available online: https://www.fireeye.com/blog/threat-research/2017/04/cve-2017-0199-hta-handler.html (accessed on 6 June 2020).
- Yan, N.; Wen, Y.; Chen, L.; Wu, Y.; Zhang, B.; Wang, Z.; Meng, D. Deepro: Provenance-based APT Campaigns Detection via GNN. In Proceedings of the 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Wuhan, China, 9–11 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 747–758.
- Guo, H.; Yuan, S.; Wu, X. Logbert: Log anomaly detection via bert. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–8.
Category | Subcategory | No. | Description
---|---|---|---
Process feature | Runtime behavior | 1 | Keylogger
Process feature | Runtime behavior | 2 | Recording microphone
Process feature | Runtime behavior | 3 | Grab screen
Process feature | Runtime behavior | 4 | Execute sensitive commands
Process feature | Runtime behavior | 5 | Access sensitive files
Process feature | Runtime behavior | 6 | The process has no GUI
Process feature | Network behavior | 7 | The ancestor process has network connections
Process feature | Network behavior | 8 | The process accesses the Internet
Process feature | Network behavior | 9 | Download files from the Internet
File feature | | 10 | The file is downloaded from the Internet
File feature | | 11 | The file does not contain a valid signature
File feature | | 12 | The file is a system-sensitive file
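The twelve process and file features listed above lend themselves to a binary encoding produced by rule matching. The following Python sketch is only an illustration under loudly stated assumptions: the indicator sets (`KEYLOG_APIS`, `MIC_APIS`, `SCREEN_APIS`, `SENSITIVE_CMDS`, `SENSITIVE_PATHS`) and the `ProcessObs`/`FileObs` records are hypothetical. The API names are real Win32 functions commonly associated with these behaviors, but this mapping is not ConGraph's actual rule set.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional

# HYPOTHETICAL indicators: real Win32 API names, but this rule set is an
# illustration, not ConGraph's actual matching rules.
KEYLOG_APIS = {"SetWindowsHookExW", "GetAsyncKeyState"}
MIC_APIS = {"waveInOpen"}
SCREEN_APIS = {"BitBlt"}
SENSITIVE_CMDS = ("whoami", "net user", "reg save")
SENSITIVE_PATHS = ("C:\\Windows\\System32\\config\\",)


@dataclass
class ProcessObs:  # hypothetical summary of one process's observed behavior
    apis: set
    cmdline: str = ""
    opened_files: list = field(default_factory=list)
    has_gui: bool = True
    remote_ips: list = field(default_factory=list)
    downloaded_file: bool = False
    parent: Optional["ProcessObs"] = None


@dataclass
class FileObs:  # hypothetical summary of one accessed file
    path: str
    from_internet: bool
    validly_signed: bool


def process_features(p):
    """Features 1-9 as 0/1 flags."""
    ancestors = []
    node = p.parent
    while node is not None:
        ancestors.append(node)
        node = node.parent
    return [
        int(bool(p.apis & KEYLOG_APIS)),                                     # 1 keylogger
        int(bool(p.apis & MIC_APIS)),                                        # 2 microphone
        int(bool(p.apis & SCREEN_APIS)),                                     # 3 grab screen
        int(any(c in p.cmdline.lower() for c in SENSITIVE_CMDS)),            # 4 sensitive commands
        int(any(f.startswith(SENSITIVE_PATHS) for f in p.opened_files)),     # 5 sensitive files
        int(not p.has_gui),                                                  # 6 no GUI
        int(any(a.remote_ips for a in ancestors)),                           # 7 ancestor networked
        int(any(ipaddress.ip_address(ip).is_global for ip in p.remote_ips)), # 8 Internet access
        int(p.downloaded_file),                                              # 9 downloads a file
    ]


def file_features(f):
    """Features 10-12 as 0/1 flags."""
    return [
        int(f.from_internet),                     # 10 downloaded from the Internet
        int(not f.validly_signed),                # 11 no valid signature
        int(f.path.startswith(SENSITIVE_PATHS)),  # 12 system-sensitive file
    ]


parent = ProcessObs(apis=set(), remote_ips=["8.8.8.8"])
child = ProcessObs(apis={"BitBlt", "GetAsyncKeyState"}, cmdline="cmd /c whoami",
                   has_gui=False, remote_ips=["192.168.1.5"], parent=parent)
print(process_features(child) + file_features(FileObs("C:\\Users\\bob\\x.exe", True, False)))
# [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
```

The child contacts only a private address, so feature 8 stays 0, but its parent reached a public address, which sets feature 7. Publicness is decided with the standard-library `ipaddress` module's `is_global` property.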
Type | Category | Representation
---|---|---
Node | File | system_file, program_file, user_file
Node | Process | system_process, program_process, user_process
Node | Network | socket, web_object, domain, IP_address
Edge | | bind, sock_send, write, delete, fork, resolve, web_request, refer, connect, read, executed
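The representations listed above suggest how concrete entities can be anonymized before embedding. The Python sketch below maps entity names to these category labels and then assigns integer ids, which is the input a word-embedding layer would consume (the embedding itself is not shown). The path-prefix rules (`SYSTEM_PREFIX`, `PROGRAM_PREFIX`), the event tuple layout, and the helper names are hypothetical assumptions; the text does not say how entities are assigned to categories, and only `IP_address` versus `domain` is distinguished for network nodes here.

```python
import ipaddress

# ASSUMPTION: these path prefixes decide system/program/user scope; the text
# lists the representations but not how entities are assigned to them.
SYSTEM_PREFIX = "c:\\windows\\"
PROGRAM_PREFIX = "c:\\program files"


def anonymize(name, kind):
    """Map a concrete entity to one of the node representations listed above."""
    if kind in ("process", "file"):
        low = name.lower()
        if low.startswith(SYSTEM_PREFIX):
            scope = "system"
        elif low.startswith(PROGRAM_PREFIX):
            scope = "program"
        else:
            scope = "user"
        return f"{scope}_{kind}"
    if kind == "network":
        # Simplification: only IP_address vs. domain; socket and web_object are omitted.
        try:
            ipaddress.ip_address(name)
            return "IP_address"
        except ValueError:
            return "domain"
    raise ValueError(f"unknown entity kind: {kind}")


def to_tokens(events):
    """Turn (src, src_kind, edge, dst, dst_kind) events into an anonymized token sequence."""
    tokens = []
    for src, src_kind, edge, dst, dst_kind in events:
        tokens += [anonymize(src, src_kind), edge, anonymize(dst, dst_kind)]
    return tokens


def encode(tokens, vocab):
    """Assign integer ids, growing `vocab` for unseen tokens."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


events = [
    ("C:\\Users\\bob\\Downloads\\invoice.exe", "process", "connect", "203.0.113.7", "network"),
    ("C:\\Users\\bob\\Downloads\\invoice.exe", "process", "read",
     "C:\\Windows\\System32\\config\\SAM", "file"),
]
vocab = {}
tokens = to_tokens(events)
ids = encode(tokens, vocab)
print(tokens)  # ['user_process', 'connect', 'IP_address', 'user_process', 'read', 'system_file']
print(ids)     # [0, 1, 2, 0, 3, 4]
```

Anonymizing in this way lets two different malicious executables that behave alike produce the same token sequence, which is the point of replacing concrete names with category labels.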
ID | APT Campaign | Exploit CVE | Attack Features * | Number of Entities | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|
PL | PA | INJ | IG | BD | LM | DE | Attack | Non-Attack | |||
M-1 | Strategic web compromise [32] | 2015-5122 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 28 | 17,565 | |
M-2 | Targeted GOV phishing [33] | 2015-5119 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 36 | 24,450 | |
M-3 | Malvertising dominate [34] | 2015-3105 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 36 | 24,424 | |
M-4 | Monero miner by Rig [35] | 2018-8174 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 28 | 15,378 | |
M-5 | Pony campaign [36] | 2017-0199 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 30 | 35,671 | |
M-6 | Spam campaign [27] | 2017-11882 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 42 | 19,580 |
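The entity counts above (for example, 28 attack versus 17,565 non-attack entities in M-1) show the class imbalance that motivates over-sampling attack sequences and under-sampling benign ones during training. The Python sketch below uses plain random sampling with a fixed seed; the sampling scheme, the per-class `target` of 500, and the use of entity counts as stand-ins for sequence counts are all assumptions, since the text does not state ConGraph's sampling method or ratio.

```python
import random


def rebalance(attack, benign, target, seed=0):
    """Random over-sampling of attack sequences and under-sampling of benign ones.

    ASSUMPTION: plain random sampling toward `target` items per class; the text
    does not state ConGraph's sampling method or ratio.
    """
    rng = random.Random(seed)
    # Over-sample: keep every attack sequence and add random duplicates.
    extra = max(0, target - len(attack))
    attack_out = list(attack) + rng.choices(attack, k=extra)
    # Under-sample: draw benign sequences without replacement.
    benign_out = rng.sample(benign, min(target, len(benign)))
    return attack_out, benign_out


# Placeholders sized like scenario M-1 (28 attack vs. 17,565 non-attack entities).
attack = [f"attack_seq_{i}" for i in range(28)]
benign = [f"benign_seq_{i}" for i in range(17565)]
attack_bal, benign_bal = rebalance(attack, benign, target=500)
print(len(attack_bal), len(benign_bal))  # 500 500
```

Keeping every original attack sequence before adding duplicates guarantees that no rare attack behavior is lost, while sampling benign data without replacement avoids feeding the model repeated benign examples.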
Scenario | Method | Precision | Recall | F1-Score
---|---|---|---|---
M-1 | DeepLog | 0.8064 | 0.6854 | 0.6496
M-1 | Deepro | 0.8947 | 1 | 0.9444
M-1 | LogBert | 0.7512 | 0.5724 | 0.6497
M-1 | ConGraph | 0.9936 | 0.9935 | 0.9935
M-2 | DeepLog | 0.7806 | 0.6104 | 0.5408
M-2 | Deepro | 0.8824 | 0.9357 | 0.9091
M-2 | LogBert | 0.707 | 0.4725 | 0.5664
M-2 | ConGraph | 0.9922 | 0.992 | 0.992
M-3 | DeepLog | 0.7824 | 0.6158 | 0.5493
M-3 | Deepro | 0.85 | 0.9444 | 0.8947
M-3 | LogBert | 0.7329 | 0.4758 | 0.577
M-3 | ConGraph | 0.969 | 0.9684 | 0.9684
M-4 | DeepLog | 0.7843 | 0.6214 | 0.5582
M-4 | Deepro | 1 | 0.875 | 0.9333
M-4 | LogBert | 0.7753 | 0.5774 | 0.6619
M-4 | ConGraph | 0.9603 | 0.9591 | 0.9591
M-5 | DeepLog | 0.7878 | 0.6239 | 0.5758
M-5 | Deepro | 0.9529 | 0.8929 | 0.9091
M-5 | LogBert | 0.6996 | 0.4847 | 0.5727
M-5 | ConGraph | 0.916 | 0.9051 | 0.9095
M-6 | DeepLog | 0.7863 | 0.6274 | 0.5674
M-6 | Deepro | 1 | 0.7333 | 0.8462
M-6 | LogBert | 0.7312 | 0.5518 | 0.629
M-6 | ConGraph | 0.8592 | 0.8184 | 0.8131
Avg. | DeepLog | 0.7879 | 0.6306 | 0.5735
Avg. | Deepro | 0.9255 | 0.8968 | 0.9061
Avg. | LogBert | 0.7329 | 0.5224 | 0.6094
Avg. | ConGraph | 0.9467 | 0.9394 | 0.9392
Scenario | Configuration | Precision | Recall | F1-Score
---|---|---|---|---
M-1 | baseline | 0.9323 | 0.9261 | 0.9261
M-1 | +process context | 0.9836 | 0.9935 | 0.9935
M-2 | baseline | 0.9892 | 0.9889 | 0.9889
M-2 | +process context | 0.9922 | 0.992 | 0.992
M-3 | baseline | 0.9466 | 0.9432 | 0.9431
M-3 | +process context | 0.969 | 0.9684 | 0.9684
M-4 | baseline | 0.9347 | 0.9282 | 0.9279
M-4 | +process context | 0.9603 | 0.9591 | 0.9591
M-5 | baseline | 0.8941 | 0.874 | 0.8724
M-5 | +process context | 0.9155 | 0.9051 | 0.9045
M-6 | baseline | 0.8415 | 0.7835 | 0.7739
M-6 | +process context | 0.8592 | 0.8184 | 0.8131
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, L.; Chen, W. ConGraph: Advanced Persistent Threat Detection Method Based on Provenance Graph Combined with Process Context in Cyber-Physical System Environment. Electronics 2024, 13, 945. https://doi.org/10.3390/electronics13050945