Missing Data Imputation Based on Causal Inference to Enhance Advanced Persistent Threat Attack Prediction
Abstract
1. Introduction
- Causal discovery is combined with APT attack motivation analysis: causal relationships reveal how events are correlated, and potential attack strategies and motivations are inferred from them.
- A causality-driven data imputation method keeps the imputed values closer to the real data and consistent with the causal mechanism in the data, which makes it more accurate than traditional imputation methods.
- The causal mechanism is integrated into the generative model, so the model not only reconstructs data but also mines potential causal chains, learning the causal dependencies in the latent space more accurately.
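The idea behind causality-driven imputation can be illustrated with a minimal sketch. It is not the LiNGAM graph autoencoder described in Section 3.2; it only assumes that a linear causal model x = Bx + e and a causal order are already given (for example, estimated beforehand by a LiNGAM-style method). The function name `impute_with_causal_graph`, the weight matrix `B`, and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def impute_with_causal_graph(X, B, order):
    """Fill NaNs in X using an assumed linear structural model x = B x + e.

    X     : (n_samples, n_vars) array, np.nan marks missing entries.
    B     : (n_vars, n_vars) weights; B[j, i] is the effect of variable i on j.
    order : causal order of the variable indices, parents before children.
    """
    X = X.copy()
    for j in order:
        missing = np.isnan(X[:, j])
        if not missing.any():
            continue
        parents = np.flatnonzero(B[j])
        if parents.size == 0:
            # Root variable: no causal parents, fall back to the observed mean.
            X[missing, j] = np.nanmean(X[:, j])
        else:
            # Parents precede j in `order`, so their gaps are already filled.
            X[missing, j] = X[missing][:, parents] @ B[j, parents]
    return X

# Toy chain x0 -> x1 -> x2 with x1 = 2 * x0 and x2 = 0.5 * x1 (assumed weights).
B = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, np.nan]])
X_filled = impute_with_causal_graph(X, B, order=[0, 1, 2])
```

Filling variables in causal order lets an imputed parent value propagate to its children, which is the property that distinguishes this from column-wise mean or interpolation-based filling.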
2. Related Work
3. Materials and Methods
3.1. Pre-Processing and Sampling
3.2. Missing Data Processing Based on LiNGAM Graph Autoencoder
3.2.1. Graph Autoencoder
3.2.2. LiNGAM
3.2.3. Loss Function
3.3. Predictions
4. Experiment
4.1. Dataset
4.2. Evaluation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Data Type | Description |
---|---|
EVENT_CONNECT | establishes a network connection |
EVENT_READ | reads data |
EVENT_RECVFROM | receives data from a network connection |
EVENT_EXECUTE | executes a program or command |
EVENT_OPEN | opens a file or resource |
EVENT_FORK | creates a new process |
EVENT_WRITE | writes data to a file or output stream |
EVENT_CREATE_OBJECT | creates an object, such as a file or process |
EVENT_SENDMSG | sends a message |
EVENT_CLOSE | closes a file or network connection |
EVENT_SENDTO | sends data to a network connection |
Stage | Data Type | | |
---|---|---|---|
Reconnaissance | EVENT_CONNECT | EVENT_READ | EVENT_RECVFROM |
Intrusion | EVENT_EXECUTE | EVENT_OPEN | |
Lateral Movement | EVENT_FORK | EVENT_CONNECT | EVENT_EXECUTE |
Persistence | EVENT_WRITE | EVENT_CREATE_OBJECT | |
Execution | EVENT_WRITE | EVENT_SENDMSG | EVENT_CLOSE |
Exfiltration | EVENT_SENDTO | EVENT_WRITE | EVENT_CLOSE |
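The stage table above can be encoded as a simple lookup structure. The sketch below is only an illustration of the mapping, not the labeling procedure used in the paper; the names `STAGE_EVENTS` and `candidate_stages` are hypothetical.

```python
# Attack stage -> event types, transcribed from the table above.
STAGE_EVENTS = {
    "Reconnaissance": {"EVENT_CONNECT", "EVENT_READ", "EVENT_RECVFROM"},
    "Intrusion": {"EVENT_EXECUTE", "EVENT_OPEN"},
    "Lateral Movement": {"EVENT_FORK", "EVENT_CONNECT", "EVENT_EXECUTE"},
    "Persistence": {"EVENT_WRITE", "EVENT_CREATE_OBJECT"},
    "Execution": {"EVENT_WRITE", "EVENT_SENDMSG", "EVENT_CLOSE"},
    "Exfiltration": {"EVENT_SENDTO", "EVENT_WRITE", "EVENT_CLOSE"},
}

def candidate_stages(event_type):
    """Return every stage whose row in the table lists this event type."""
    return [stage for stage, events in STAGE_EVENTS.items()
            if event_type in events]

write_stages = candidate_stages("EVENT_WRITE")
connect_stages = candidate_stages("EVENT_CONNECT")
```

Several event types appear in more than one row (EVENT_WRITE in three stages, for instance), so a single event does not identify its stage on its own; the lookup returns a list of candidates for that reason.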
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, X.; Kuang, M.; Yang, H. Missing Data Imputation Based on Causal Inference to Enhance Advanced Persistent Threat Attack Prediction. Symmetry 2024, 16, 1551. https://doi.org/10.3390/sym16111551