AGCN-Domain: Detecting Malicious Domains with Graph Convolutional Network and Attention Mechanism
Abstract
1. Introduction
- We proposed AGCN-Domain, a system that detects malicious domain names from three relationship patterns: client relation, resolution relation, and cname relation. The system extracts and fuses the relationship features of domain names, which transforms malicious domain detection into a binary node classification problem.
- We designed a mechanism that detects malicious domains with high accuracy. It is built on a graph convolutional network (GCN), a model that performs well on graph-related tasks. We integrated the GCN with an attention mechanism that weights the contribution of each relationship feature to the classification result according to the type of domain.
- We comprehensively evaluated our work on real-world data collected from an educational network. The results demonstrate that AGCN-Domain performs well in detecting malicious domains.
2. Related Work
2.1. Feature-Based Methods
2.2. Graph-Based Methods
2.3. Discussion
3. Preliminaries
3.1. Domain Relations
3.2. Graph Convolutional Network (GCN)
4. Method
4.1. Overview
4.2. Data Structure
4.3. Data Preprocessor
- Corrupted records. The collected raw data contain records corrupted by transmission errors, such as incomplete records with missing fields.
- Irregular domains. The original data contain irregular domains of two kinds. The first kind does not comply with domain naming rules, probably because of mistyping or misconfiguration; an example is a string containing a comma, such as youtube,com. The second kind has a TLD (top-level domain) that is not registered with IANA, which makes it an invalid Internet domain.
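The two filters above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record field names (`client`, `domain`, `answer`), the small `KNOWN_TLDS` set standing in for the IANA list, and the label pattern are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the IANA TLD list; a real run would load the full list.
KNOWN_TLDS = {"com", "net", "org", "edu", "cn"}

# Fields a record must carry; these field names are assumptions for this sketch.
REQUIRED_FIELDS = ("client", "domain", "answer")

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_complete(record):
    """Treat a record as corrupted if any required field is missing or empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def is_regular_domain(domain, known_tlds=KNOWN_TLDS):
    """Check the naming rules and that the TLD appears in the supplied TLD set."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    if len(name) > 253 or len(labels) < 2:
        return False
    if not all(LABEL_RE.match(label) for label in labels):
        return False
    return labels[-1] in known_tlds


def preprocess(records):
    """Keep only complete records whose domain passes both checks."""
    return [r for r in records if is_complete(r) and is_regular_domain(r["domain"])]
```

Under these assumptions, `youtube,com` is rejected because it contains no dot-separated labels, and a name whose last label is absent from the TLD set is rejected as an invalid Internet domain.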
4.4. Relation Graph Constructor
4.4.1. Client Relation Graph
4.4.2. Resolution Relation Graph
4.4.3. Cname Relation Graph
4.4.4. Graph Pruner
- Popular domains. The basic intuition is that domains queried by more clients are more likely to be legitimate. A typical example is a well-known domain such as google.com, which is queried by nearly all clients in the monitored local network. Processing such popular domains consumes a lot of resources; thus, we pruned them to improve the efficiency of the system. We defined a popular domain as one requested by more than 25% of clients.
- Hyperactive clients. Our data contain some very active clients that issue as many as 1,000,000 queries in one day. We analyzed them and found that they are proxies or forwarders, with possibly hundreds of clients behind a single source IP. Such clients cannot provide a valid client relation for domains, so we labeled the top 0.1% of clients as hyperactive and removed them.
- Inactive clients. Some clients query only a few domains. Such clients also offer little information; thus, we set a threshold and removed clients that query fewer domains than this threshold. The threshold was set to 2 in our experiment.
- Inactive IPs. As with inactive clients, we removed IPs that host only one domain in our network data.
- Exceptions. Similar to previous work [18], we kept malicious domains and their related information even when they matched the above rules, considering that malicious domains are usually inactive in order to avoid detection.
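The pruning rules above can be sketched as a single pass over the client and resolution data. This is an illustration under stated assumptions rather than the authors' code: the input dictionaries, the function name `prune`, ranking hyperactive clients by total query count, counting a client's domains after popular domains are removed, and limiting the exception to the popularity and inactivity rules are all choices made for the example.

```python
def prune(client_domains, query_counts, ip_domains, malicious,
          popular_ratio=0.25, hyperactive_ratio=0.001, min_domains=2):
    """Apply the pruning rules and return (pruned_clients, pruned_ips).

    client_domains: dict client -> set of queried domains
    query_counts:   dict client -> total number of queries (assumed activity measure)
    ip_domains:     dict IP -> set of domains resolving to it
    malicious:      set of known malicious domains, exempt from pruning
    """
    n_clients = len(client_domains)

    # Hyperactive clients: the top fraction of clients by query volume.
    n_hyper = int(n_clients * hyperactive_ratio)
    ranked = sorted(client_domains, key=lambda c: query_counts.get(c, 0), reverse=True)
    hyperactive = set(ranked[:n_hyper])

    # Popular domains: queried by more than popular_ratio of all clients.
    seen_by = {}
    for client, domains in client_domains.items():
        for d in domains:
            seen_by.setdefault(d, set()).add(client)
    popular = {d for d, cs in seen_by.items()
               if len(cs) > popular_ratio * n_clients and d not in malicious}

    pruned_clients = {}
    for client, domains in client_domains.items():
        if client in hyperactive:
            continue
        kept = domains - popular
        # Inactive clients: fewer than min_domains remaining domains,
        # unless the client queried a known malicious domain.
        if len(kept) < min_domains and not (kept & malicious):
            continue
        pruned_clients[client] = kept

    # Inactive IPs: host at most one remaining domain, unless it is malicious.
    pruned_ips = {}
    for ip, domains in ip_domains.items():
        kept = domains - popular
        if len(kept) <= 1 and not (kept & malicious):
            continue
        pruned_ips[ip] = kept
    return pruned_clients, pruned_ips
```

The default arguments mirror the thresholds stated above (25%, 0.1%, and 2). With fewer than 1000 clients the default `hyperactive_ratio` removes no one, so small experiments need a larger ratio.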
4.5. Attention-GCN Classifier
Algorithm 1. Attention-GCN Classifier
5. Evaluation
5.1. Setup
5.2. Features
5.3. Initial Label Fraction
5.4. Sensitive Parameters
5.4.1. Comparison with Other Models
- DeepWalk [37]. DeepWalk learns representations of graph nodes from truncated random walks. For this experiment, we applied DeepWalk to each relation graph and summed the resulting embeddings to obtain the final embedding vector of each node. We then used a fully connected layer to identify malicious domains.
- Node2Vec [38]. Node2Vec is a scalable method for learning features of the nodes in a graph. As with DeepWalk, we applied Node2Vec to each relation graph, obtained a representation of each node, and summed the representations to obtain the final node features. A fully connected layer was then applied to predict malicious domains.
- Basic GCN [29]. GCN is a widely used model that has performed well in many areas. In this experiment, we applied a basic GCN to each relation graph and combined the features of the different relations without an attention mechanism.
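The difference between summing per-relation embeddings, as in the baselines above, and blending them with attention can be illustrated for a single node. This is only a sketch: the scoring function `q . tanh(h)`, the `query` vector, and the function names are assumptions chosen because they are a common form of attention, and they should not be read as the attention used by AGCN-Domain.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_sum(embeddings):
    """Baseline fusion: element-wise sum of the per-relation embeddings."""
    return [sum(values) for values in zip(*embeddings)]


def fuse_by_attention(embeddings, query):
    """Attention fusion: weight each relation's embedding by a softmax score.

    The score q . tanh(h) is an assumed, commonly used form, not the paper's.
    """
    scores = [sum(q * math.tanh(h) for q, h in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    fused = [sum(w * emb[i] for w, emb in zip(weights, embeddings))
             for i in range(len(query))]
    return fused, weights


# One node with three relation-specific embeddings (client, resolution, cname).
node_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
fused, weights = fuse_by_attention(node_embeddings, query=[5.0, 0.0])
```

With summation every relation contributes equally, even a relation in which the node has no informative neighbors. With attention the weights sum to one and can differ per node, which is the property the attention mechanism is meant to provide.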
5.4.2. Comparison with Other Malicious Domain Detection Systems
- Manadhata et al. [14] constructed a client-querying-domain bipartite graph to depict which client queries which domain. The researchers then labeled domains with ground truth and used the belief propagation algorithm to infer the status of unknown domains.
- Khalil et al. [15] generated a domain resolution relation graph that represents whether two domains share common resolutions. Malicious scores for nodes are then calculated from their distance to all known malicious domains.
- Lei et al. [17] modeled domain behavior with three domain similarity graphs. The researchers derived domain features from the three graphs with graph embedding techniques and detected malicious domains by concatenating these features.
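The idea of linking two domains that share common resolutions can be sketched as follows. The input mapping, the function name, and the use of the shared-IP count as an edge weight are assumptions for illustration; none of the cited systems is reproduced here.

```python
from itertools import combinations


def shared_resolution_edges(domain_ips):
    """Return {(d1, d2): number of shared IPs} for domain pairs with a common IP.

    domain_ips: dict domain -> set of IP addresses it resolved to (assumed input).
    """
    # Invert the mapping so that each IP lists the domains it hosts.
    ip_to_domains = {}
    for domain, ips in domain_ips.items():
        for ip in ips:
            ip_to_domains.setdefault(ip, set()).add(domain)

    # Every pair of domains hosted on the same IP gets an edge.
    edges = {}
    for domains in ip_to_domains.values():
        for d1, d2 in combinations(sorted(domains), 2):
            edges[(d1, d2)] = edges.get((d1, d2), 0) + 1
    return edges
```

Sorting the domains before pairing keeps each undirected edge under a single key. A domain whose IPs host no other domain produces no edge, so it is isolated in this graph.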
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
AGCN | Graph convolutional network with attention mechanism
GCN | Graph convolutional network
CR | Client relation
RR | Resolution relation
NR | Cname relation
References
- Antonakakis, M.; Perdisci, R.; Dagon, D.; Lee, W.; Feamster, N. Building a dynamic reputation system for dns. In Proceedings of the 19th USENIX Security Symposium (USENIX Security 10), Washington, DC, USA, 11–13 August 2010; USENIX Association: Berkeley, CA, USA, 2010; p. 18.
- Bilge, L.; Sen, S.; Balzarotti, D.; Kirda, E.; Kruegel, C. Exposure: A passive dns analysis service to detect and report malicious domains. ACM Trans. Inf. Syst. Secur. (TISSEC) 2014, 16, 1–28.
- Bilge, L.; Kirda, E.; Kruegel, C.; Balduzzi, M. Exposure: Finding malicious domains using passive dns analysis. In Proceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS2011), San Diego, CA, USA, 6–9 February 2011; pp. 1–17.
- Antonakakis, M.; Perdisci, R.; Lee, W.; Vasiloglou, N.; Dagon, D. Detecting malware domains at the upper dns hierarchy. In Proceedings of the 20th USENIX Conference on Security (USENIX Security 11), San Francisco, CA, USA, 8–12 August 2011; USENIX Association: Berkeley, CA, USA, 2011; p. 27.
- Chiba, D.; Yagi, T.; Akiyama, M.; Shibahara, T.; Yada, T.; Mori, T.; Goto, S. Domainprofiler: Discovering domain names abused in future. In Proceedings of the 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France, 28 June–1 July 2016; pp. 491–502.
- Hao, S.; Kantchelian, A.; Miller, B.; Paxson, V.; Feamster, N. Predator: Proactive recognition and elimination of domain abuse at time-of-registration. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1568–1579.
- Schüppen, S.; Teubert, D.; Herrmann, P.; Meyer, U. FANCI: Feature-based automated nxdomain classification and intelligence. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 1165–1181.
- Yadav, S.; Reddy, A.K.K.; Reddy, A.; Ranjan, S. Detecting algorithmically generated malicious domain names. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, Melbourne, Australia, 1–3 November 2010; pp. 48–61.
- Schiavoni, S.; Maggi, F.; Cavallaro, L.; Zanero, S. Phoenix: Dga-based botnet tracking and intelligence. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment; Springer: Berlin/Heidelberg, Germany, 2014; pp. 192–211.
- Woodbridge, J.; Anderson, H.S.; Ahuja, A.; Grant, D. Predicting domain generation algorithms with long short-term memory networks. arXiv 2016, arXiv:1611.00791.
- Tran, D.; Mac, H.; Tong, V.; Tran, H.A.; Nguyen, L.G. A lstm based framework for handling multiclass imbalance in dga botnet detection. Neurocomputing 2018, 275, 2401–2413.
- Xu, C.; Shen, J.; Du, X. Detection method of domain names generated by dgas based on semantic representation and deep neural network. Comput. Secur. 2019, 85, 77–88.
- Rahbarinia, B.; Perdisci, R.; Antonakakis, M. Segugio: Efficient behavior-based tracking of malware-control domains in large isp networks. In Proceedings of the 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Rio de Janeiro, Brazil, 22–25 June 2015; pp. 403–414.
- Manadhata, P.K.; Yadav, S.; Rao, P.; Horne, W. Detecting malicious domains via graph inference. In European Symposium on Research in Computer Security; Springer International Publishing: Cham, Switzerland, 2014; pp. 1–18.
- Khalil, I.; Yu, T.; Guan, B. Discovering malicious domains through passive dns data graph analysis. In Proceedings of the ASIA CCS ’16: 11th ACM on Asia Conference on Computer and Communications Security, Xi’an, China, 30 May–3 June 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 663–674.
- Sun, X.; Wang, Z.; Yang, J.; Liu, X. Deepdom: Malicious domain detection with scalable and heterogeneous graph convolutional networks. Comput. Secur. 2020, 99, 102057.
- Lei, K.; Fu, Q.; Ni, J.; Wang, F.; Yang, M.; Xu, K. Detecting malicious domains with behavioral modeling and graph embedding. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; pp. 601–611.
- Sun, X.; Tong, M.; Yang, J.; Xinran, L.; Heng, L. Hindom: A robust malicious domain detection system based on heterogeneous information network with transductive classification. In Proceedings of the 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019), Beijing, China, 23–25 September 2019; USENIX Association: Berkeley, CA, USA, 2019; pp. 399–412.
- Zou, F.; Zhang, S.; Rao, W.; Yi, P. Detecting malware based on dns graph mining. Int. J. Distrib. Sens. Netw. 2015, 11, 102687.
- Jia, Y.; Gu, Z.; Jiang, Z.; Gao, C.; Yang, J. Persistent graph stream summarization for real-time graph analytics. World Wide Web 2023, 26, 2647–2667.
- Jia, Y.; Gu, Z.; Du, L.; Long, Y.; Wang, Y.; Li, J.; Zhang, Y. Artificial intelligence enabled cyber security defense for smart cities: A novel attack detection framework based on the MDATA model. Knowl.-Based Syst. 2023, 276, 110781.
- Jia, Y.; Gu, Z.; Li, A. MDATA: A New Knowledge Representation Model: Theory, Methods and Applications; Springer Nature: New York, NY, USA, 2021; Volume 12647, pp. 1–255.
- Lee, J.; Lee, H. Gmad: Graph-based malware activity detection by dns traffic analysis. Comput. Commun. 2014, 49, 33–47.
- Peng, C.; Yun, X.; Zhang, Y.; Li, S.; Xiao, J. Discovering malicious domains through alias-canonical graph. In Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia, 1–4 August 2017; pp. 225–232.
- Najafi, P.; Mühle, A.; Pünter, W.; Cheng, F.; Meinel, C. Malrank: A measure of maliciousness in siem-based knowledge graphs. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 417–429.
- Anderson, H.S.; Woodbridge, J.; Filar, B. Deepdga: Adversarially-tuned domain generation and detection. In Proceedings of the AISec ’16: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, Vienna, Austria, 28 October 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 13–21.
- Fu, Y.; Yu, L.; Hambolu, O.; Ozcelik, I.; Husain, B.; Sun, J.; Sapra, K.; Du, D.; Beasley, C.T.; Brooks, R.R. Stealthy domain generation algorithms. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1430–1443.
- Yun, X.; Huang, J.; Wang, Y.; Zang, T.; Zhou, Y.; Zhang, Y. Khaos: An adversarial neural network dga with high anti-detection ability. IEEE Trans. Inf. Forensics Secur. 2019, 15, 2225–2240.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. Conference Track Proceedings OpenReview.net 2017.
- Alexa. Available online: https://www.alexa.com (accessed on 15 September 2022).
- 360DGAs. Available online: https://data.netlab.360.com/dga/ (accessed on 15 September 2022).
- MalwareDomainList. Available online: https://www.malwaredomainlist.com (accessed on 15 September 2022).
- Malc0de.com. Available online: https://malc0de.com/bl/ZONES (accessed on 15 September 2022).
- VirusTotal. Available online: https://www.virustotal.com (accessed on 15 September 2022).
- Pytorch. Available online: https://pytorch.org (accessed on 10 September 2022).
- Networkx. Available online: https://networkx.org (accessed on 10 September 2022).
- Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
- Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
Metric | Definition
---|---
TP | The number of malicious domains predicted as malicious
FP | The number of benign domains predicted as malicious
TN | The number of benign domains predicted as benign
FN | The number of malicious domains predicted as benign
Accuracy | (TP + TN)/(TP + FP + TN + FN)
Precision | TP/(TP + FP)
Recall | TP/(TP + FN)
F1 | 2 × (Precision × Recall)/(Precision + Recall)
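The metric definitions in the table above translate directly into code. The sketch below is illustrative: the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for the example, not something the paper specifies.

```python
def f1_from_precision_recall(precision, recall):
    """F1 = 2 * (Precision * Recall) / (Precision + Recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }
```

As a consistency check, `f1_from_precision_recall(0.9814, 0.8915)` rounds to 0.9343, which agrees with the F1 reported for the fused relation pattern in the results table.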
Relation Pattern | Accuracy | Precision | Recall | F1 | CovRatio |
---|---|---|---|---|---|
C-R | 0.9421 | 0.9857 | 0.7991 | 0.8827 | 95.04% |
R-R | 0.9744 | 0.9650 | 0.8932 | 0.9277 | 66.96% |
N-R | 0.9955 | 1.000 | 0.8333 | 0.9091 | 11.48% |
Fused | 0.9693 | 0.9814 | 0.8915 | 0.9343 | 100.0% |
Initial Labels | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
10% | 0.9427 | 0.9811 | 0.7967 | 0.8793 |
30% | 0.9564 | 0.9770 | 0.8487 | 0.9083 |
50% | 0.9643 | 0.9823 | 0.8731 | 0.9245 |
70% | 0.9693 | 0.9814 | 0.8915 | 0.9343 |
90% | 0.9720 | 0.9867 | 0.9024 | 0.9427 |
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
DeepWalk | 0.9171 | 0.9332 | 0.7365 | 0.8233 |
Node2vec | 0.9212 | 0.9323 | 0.7539 | 0.8337 |
BasicGCN | 0.9341 | 0.9786 | 0.7640 | 0.8581 |
AGCN-Domain | 0.9427 | 0.9811 | 0.7967 | 0.8793 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, X.; Li, Y.; Cheng, H.; Yin, L. AGCN-Domain: Detecting Malicious Domains with Graph Convolutional Network and Attention Mechanism. Mathematics 2024, 12, 640. https://doi.org/10.3390/math12050640