An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers
Abstract
1. Introduction
- This article presents a multi-layer approach that classifies network traffic as P2P botnet traffic or non-P2P traffic and identifies botnets by applying machine learning classifiers to network features, with port filtering, DNS queries, and flow counting used to reduce the traffic first.
- Our work presents a P2P botnet detection framework that uses a decision tree algorithm for feature selection, retaining the most relevant features and discarding the irrelevant ones.
- At the first layer, we filter out non-P2P packets to reduce the volume of network traffic, using port filtering on well-known ports, DNS queries, and flow counting. The second layer classifies the captured traffic into two classes, non-P2P and P2P. At the third layer, we remove features that only marginally affect the classification. At the final layer, we detect P2P botnets with a decision tree classifier applied to the extracted network communication features.
- The proposed technique addresses limitations of single-stage botnet detection, e.g., class imbalance.
- The accuracy of our model is 98.7%, and the threshold of the false alarm rate was reduced to 3. Our experiments also show that the accuracy of the proposed framework can be raised to 99%, but at the expense of misreporting in both directions: benign files reported as botnets and botnets reported as benign. We also consider the case in which benign files send out search requests consistently and may therefore be reported as botnets. Additionally, we observed that accuracy might be improved by increasing the number of epochs of deep learning algorithms, at a higher execution cost.
- To validate the performance of the proposed technique, we performed experiments on diverse datasets and compared the results with other machine learning algorithms used for botnet detection.
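The decision-tree-based feature selection mentioned in the contributions can be illustrated with a small sketch. It ranks each feature by the best Gini impurity reduction that a single threshold split achieves, which is the criterion a CART-style decision tree uses at each node. The function names, the toy data, and the 0.25 cut-off are hypothetical and chosen only for illustration; the paper's actual selection procedure may differ.

```python
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (0 = benign, 1 = botnet)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity reduction achievable by one threshold on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must put samples on both sides
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (feature name, gain) pairs sorted from most to least useful."""
    gains = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        gains.append((name, best_split_gain(column, labels)))
    return sorted(gains, key=lambda item: item[1], reverse=True)


# Toy, made-up session features: duration separates the classes, bytes do not.
names = ["avg_duration", "avg_fpb"]
rows = [(1.0, 500.0), (1.2, 90.0), (9.5, 480.0), (9.9, 100.0)]
labels = [0, 0, 1, 1]

ranking = rank_features(rows, labels, names)
# Hypothetical cut-off: keep features whose best split removes enough impurity.
selected = [name for name, gain in ranking if gain > 0.25]
```

In this toy data only `avg_duration` passes the cut-off, so a later classifier would be trained on that feature alone.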
Structure of Paper
2. Related Work
3. Proposed Scheme
3.1. Multi-Layer Detection Method
3.2. First Layer: Traffic Reduction
3.3. Second Layer: P2P and Non-P2P Traffic Classification
Algorithm 1 Second layer classification
3.4. Third Layer: Feature Extraction and Feature Reduction
3.5. Fourth Layer: P2P Traffic Classification
Algorithm 2 Fourth Layer: P2P Traffic Classification
4. Results and Analysis
4.1. Evaluating Metrics
4.1.1. Accuracy of Detection Model
4.1.2. False Alarm Rate (FAR)
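As a minimal sketch, the two metrics named in Sections 4.1.1 and 4.1.2 can be computed from confusion-matrix counts as follows. The formulas are the commonly used definitions, assumed here rather than taken from the paper: accuracy is the share of correctly classified samples, and the false alarm rate is the share of benign samples flagged as botnet. The function names and the example counts are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def false_alarm_rate(fp: int, tn: int) -> float:
    """Fraction of benign samples that were wrongly reported as botnet."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Made-up counts: 100 botnet samples (90 caught), 100 benign samples (5 flagged).
acc = accuracy(tp=90, tn=95, fp=5, fn=10)
far = false_alarm_rate(fp=5, tn=95)
```

With these counts the accuracy is 0.925 and the false alarm rate is 0.05.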
4.2. Dataset and Experimental Setup
4.3. Detection Results and Discussion
4.4. Comparison with Other Machine Learning Classifiers
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Ma, X.; Zhang, J.; Tao, J.; Li, J.; Tian, J.; Guan, X. DNSRadar: Outsourcing malicious domain detection based on distributed cache-footprints. IEEE Trans. Inf. Forensics Secur. 2014, 9, 1906–1921.
- Zhao, S.; Lee, P.P.; Lui, J.; Guan, X.; Ma, X.; Tao, J. Cloud-based push-styled mobile botnets: A case study of exploiting the cloud to device messaging service. In Proceedings of the 28th Annual Computer Security Applications Conference, Orlando, FL, USA, 3–7 December 2012; pp. 119–128.
- Ma, X.; Guan, X.; Tao, J.; Zheng, Q.; Guo, Y.; Liu, L.; Zhao, S. A novel IRC botnet detection method based on packet size sequence. In Proceedings of the 2010 IEEE International Conference on Communications (ICC), Cape Town, South Africa, 23–27 May 2010; pp. 1–5.
- Alazab, M.; Broadhurst, R. Spam and criminal activity. In Trends and Issues in Crime and Criminal Justice; Australian Institute of Criminology: Canberra, Australia, 2016.
- Arora, A.; Yadav, S.K.; Sharma, K. Denial-of-Service (DoS) Attack and Botnet: Network Analysis, Research Tactics, and Mitigation. In Handbook of Research on Network Forensics and Analysis Techniques; IGI Global: Hershey, PA, USA, 2018; pp. 117–141.
- Zhang, J.; Xie, Y.; Yu, F.; Soukal, D.; Lee, W. Intention and Origination: An Inside Look at Large-Scale Bot Queries. In Proceedings of the NDSS Symposium 2013, San Diego, CA, USA, 24–27 February 2013.
- Roto, V.; Oulasvirta, A.; Haikarainen, T.; Kuorelahti, J.; Lehmuskallio, H.; Nyyssönen, T. You Are a Game Bot!: Uncovering Game Bots in MMORPGs via Self-similarity in the Wild. In Proceedings of the NDSS 2016, San Diego, CA, USA, 21–24 February 2016; pp. 1–19.
- Krueger, T.; Gascon, H.; Krämer, N.; Rieck, K. Learning stateful models for network honeypots. In Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence (AISec ’12), Raleigh, NC, USA, 19 October 2012; p. 37.
- Zhang, H.; Lu, G.; Qassrawi, M.T.; Zhang, Y.; Yu, X. Feature selection for optimizing traffic classification. Comput. Commun. 2012, 35, 1457–1471.
- Chen, C.M.; Lai, G.H.; Young, P.Y. Defense Joint Attacks Based on Stochastic Discrete Sequence Anomaly Detection. In Proceedings of the 2016 11th Asia Joint Conference on Information Security (AsiaJCIS), Fukuoka, Japan, 4–5 August 2016; pp. 74–79.
- Azab, A.; Alazab, M.; Aiash, M. Machine Learning Based Botnet Identification Traffic. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; pp. 1788–1794.
- Wang, S.; Yan, Q.; Chen, Z.; Yang, B.; Zhao, C.; Conti, M. Detecting android malware leveraging text semantics of network flows. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1096–1109.
- Albanese, M.; Jajodia, S.; Venkatesan, S. Defending from stealthy botnets using moving target defenses. IEEE Secur. Priv. 2018, 16, 92–97.
- Haddadi, F.; Zincir-Heywood, A.N. Botnet behaviour analysis: How would a data analytics-based system with minimum a priori information perform? Int. J. Netw. Manag. 2017, 27, e1977.
- Alazab, M. Profiling and classifying the behavior of malicious codes. J. Syst. Softw. 2015, 100, 91–102.
- Alazab, M.; Venkataraman, S.; Watters, P. Towards understanding malware behaviour by the extraction of API calls. In Proceedings of the 2010 Second Cybercrime and Trustworthy Computing Workshop, Ballarat, VIC, Australia, 19–20 July 2010; pp. 52–59.
- Zhang, J.; Perdisci, R.; Lee, W.; Luo, X.; Sarfraz, U. Building a Scalable System for Stealthy P2P-Botnet Detection. IEEE Trans. Inf. Forensics Secur. 2014, 9, 27–38.
- Abdullah, R.S.; Abdollah, M.F.; Noh, Z.A.M.; Mas’ud, M.Z.; Sahib, S.; Yusof, R. Preliminary study of host and network-based analysis on P2P Botnet detection. In Proceedings of the 2013 International Conference on Technology, Informatics, Management, Engineering and Environment, Bandung, Indonesia, 23–26 June 2013; pp. 105–109.
- Yin, C. Towards accurate node-based detection of P2P botnets. Sci. World J. 2014, 2014, 425491.
- Botnet detection based on traffic behavior analysis and flow intervals. Comput. Secur. 2013, 39, 2–16.
- Sharifnya, R.; Abadi, M. DFBotKiller: Domain-flux botnet detection based on the history of group activities and failures in DNS traffic. Digit. Investig. 2015, 12, 15–26.
- Buczak, A.L.; Guven, E. A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. IEEE Commun. Surv. Tutor. 2016, 18, 1153–1176.
- Ye, W.; Cho, K. P2P and P2P botnet traffic classification in two stages. Soft Comput. 2017, 21, 1315–1326.
- Ye, W.; Cho, K. Hybrid P2P traffic classification with heuristic rules and machine learning. Soft Comput. 2014, 18, 1815–1827.
- Dainotti, A.; King, A.; Claffy, K.; Papale, F.; Pescapé, A. Analysis of a "/0" stealth scan from a botnet. IEEE/ACM Trans. Netw. 2015, 23, 341–354.
- Gu, G.; Yegneswaran, V.; Porras, P.; Stoll, J.; Lee, W. Active botnet probing to identify obscure command and control channels. In Proceedings of the IEEE Annual Computer Security Application Conference, Honolulu, HI, USA, 7–11 December 2009; pp. 241–253.
- Gu, G.; Perdisci, R.; Zhang, J.; Lee, W. BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection. In Proceedings of the 17th Conference on Security Symposium, San Jose, CA, USA, 28 July–1 August 2008; pp. 139–154.
- Zhang, J.; Perdisci, R.; Lee, W.; Sarfraz, U.; Luo, X. Detecting stealthy P2P botnets using statistical traffic fingerprints. In Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), Hong Kong, China, 27–30 June 2011; pp. 121–132.
- Gu, G.; Zhang, J.; Lee, W. BotSniffer: Detecting botnet command and control channels in network traffic. In Proceedings of the 15th Annual Network and Distributed System Security Symposium, San Diego, CA, USA, 8–11 February 2008; p. 18.
- Gupta, B.; Badve, O.P. Taxonomy of DoS and DDoS attacks and desirable defense mechanism in a cloud computing environment. Neural Comput. Appl. 2017, 28, 3655–3682.
- Wang, P.; Wu, L.; Aslam, B.; Zou, C.C. Analysis of Peer-to-Peer botnet attacks and defenses. In Propagation Phenomena in Real World Networks; Springer: Cham, Switzerland, 2015; pp. 183–214.
- Wang, C.; Zhou, X.; You, F.; Chen, H. Design of P2P Traffic Identification Based on DPI and DFI. In Proceedings of the 2009 International Symposium on Computer Network and Multimedia Technology, Wuhan, China, 18–20 January 2009; pp. 1–4.
- Yue, W.T.; Wang, Q.H.; Hui, K.L. See No Evil, Hear No Evil? Dissecting the Impact of Online Hacker Forums. MIS Q. 2019, 43, 73–95.
- Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Surveying port scans and their detection methodologies. Comput. J. 2011, 54, 1565–1581.
- IANA. Service Name and Transport Protocol Port Number Registry. Available online: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml (accessed on 20 April 2019).
- Stanek, W. Name-Resolution Services. In Windows Server 2012 Pocket Consultant; The Microsoft Press Store by Pearson: Redmond, WA, USA, 2012; Chapter Windows Se.
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 23–25 May 2016; pp. 582–597.
- Kim, K. A hybrid classification algorithm by subspace partitioning through semi-supervised decision tree. Pattern Recognit. 2016, 60, 157–163.
- Davis, J.; Goadrich, M. The Relationship Between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 1–8.
- Peng, P.; Xiang, T.; Wang, Y.; Pontil, M.; Gong, S.; Huang, T.; Tian, Y. Unsupervised cross-dataset transfer learning for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1306–1315.
- Garcia, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur. 2014, 45, 100–123.
- Chowdhury, S.; Khanzadeh, M.; Akula, R.; Zhang, F.; Zhang, S.; Medal, H.; Marufuzzaman, M.; Bian, L. Botnet detection using graph-based feature clustering. J. Big Data 2017, 4, 14.
- Saad, S.; Traore, I.; Ghorbani, A.; Sayed, B.; Zhao, D.; Lu, W.; Felix, J.; Hakimian, P. Detecting P2P botnets through network behavior analysis and machine learning. In Proceedings of the 2011 9th Annual International Conference on Privacy, Security and Trust (PST 2011), Montreal, QC, Canada, 19–21 July 2011; pp. 174–180.
- Sherif, S.; Traore, I.; Ghorbani, A.A.; Sayed, B.; Zhao, D.; Lu, W.; Felix, J.; Hakimian, P. ISOT Dataset Description. In Proceedings of the 9th Annual Conference on Privacy, Security and Trust (PST2011), Montreal, QC, Canada, 19–21 July 2011; pp. 19–21.
- Citrix. How to Decrypt SSL and TLS Traffic Using Wireshark. Available online: https://support.citrix.com/article/CTX116557 (accessed on 20 April 2019).
- Martin, H.; Milan, Č.; Tomáš, J.; Pavel, Č. HTTPS Traffic Analysis and Client Identification Using Passive SSL/TLS Fingerprinting. EURASIP J. Inf. Secur. 2016, 1, 1–22.
- Radivilova, T.; Kirichenko, L.; Ageyev, D.; Tawalbeh, M.; Bulakh, V. Decrypting SSL/TLS traffic for hidden threats detection. In Proceedings of the 2018 IEEE 9th International Conference on Dependable Systems, Services and Technologies (DESSERT), Kiev, Ukraine, 12 July 2018; pp. 143–146.
- Meyer, C.; Schwenk, J. Lessons Learned From Previous SSL/TLS Attacks - A Brief Chronology Of Attacks And Weaknesses. IACR Cryptol. EPrint Arch. 2013, 49, 1–14.
- Khan, R.U.; Zhang, X.; Kumar, R. Analysis of ResNet and GoogleNet models for malware detection. J. Comput. Virol. Hacking Tech. 2018, 15, 29–37.
- Khan, R.U.; Zhang, X.; Kumar, R.; Aboagye, E.O. Evaluating the Performance of ResNet Model Based on Image Recognition. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence (ICCAI 2018), Chengdu, China, 12–14 March 2018; pp. 86–90.
- Kumar, R.; Zhang, X.; Khan, R.U.; Ahad, I.; Kumar, J. Malicious Code Detection Based on Image Processing Using Deep Learning. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence (ICCAI 2018), Chengdu, China, 12–14 March 2018; pp. 81–85.
- Kumar, R.; Zhang, X.; Khan, R.U.; Sharif, A. Research on Data Mining of Permission-Induced Risk for Android IoT Devices. Appl. Sci. 2019, 9, 277.
- Liao, W.H.; Chang, C.C. Peer to peer botnet detection using data mining scheme. In Proceedings of the 2010 International Conference on Internet Technology and Applications, Wuhan, China, 20–22 August 2010; pp. 1–4.
- Fedynyshyn, G.; Chuah, M.C.; Tan, G. Detection and classification of different botnet C&C channels. In International Conference on Autonomic and Trusted Computing; Springer: Berlin/Heidelberg, Germany, 2011; pp. 228–242.
- Dainotti, A.; Pescape, A.; Ventre, G. Worm Traffic Analysis and Characterization. In Proceedings of the 2007 IEEE International Conference on Communications, Glasgow, UK, 24–28 June 2007; pp. 1435–1442.
Application | Port Number | Transport Protocol | Description |
---|---|---|---|
SSH | 22 | TCP, UDP, SCTP | Secure Shell Protocol |
Telnet | 23 | TCP, UDP | Telnet |
Email | 25, 110, 143, 220, 465, 993, 995 | SMTP, POP3, IMAP4, IMAP, SMTP over TLS, IMAP4 over SSL, POP3 over SSL | Simple Mail Transfer, Post Office Protocol - Version 3, Internet Message Access Protocol, Interactive Mail Access Protocol v3, Message Submission over TLS protocol, IMAP4/POP3 over TLS/SSL (993, 995) |
NetBios | 125, 137, 139, 445 | TCP, UDP | Locus PC-Interface Net Map Server, NETBIOS Name Service, Microsoft-DS |
Remote | 3389 | TCP, UDP | MS WBT Server |
FTP-Data | 20, 21 | TCP, UDP, SCTP | File Transfer |
NTP | 123 | TCP, UDP | Network Time Protocol |
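The port list in the table above can drive the first-layer port filter. The following is a minimal sketch under stated assumptions: the `Packet` type and the `drop_well_known` function are hypothetical, and matching on either the source or the destination port is a guess at the rule, since the table lists only the ports themselves.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Ports copied from the table above (SSH, Telnet, mail, NetBIOS, remote
# desktop, FTP, NTP). Traffic on these ports is treated as non-P2P.
WELL_KNOWN_PORTS = frozenset(
    {20, 21, 22, 23, 25, 110, 123, 125, 137, 139, 143, 220, 445, 465, 993, 995, 3389}
)


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record; real captures carry far more fields."""
    src_port: int
    dst_port: int


def drop_well_known(packets: Iterable[Packet]) -> List[Packet]:
    """Keep only packets whose source and destination ports are both unlisted."""
    return [
        p
        for p in packets
        if p.src_port not in WELL_KNOWN_PORTS and p.dst_port not in WELL_KNOWN_PORTS
    ]


# Made-up packets: SSH and NTP traffic is dropped, the other two remain.
packets = [
    Packet(51000, 22),
    Packet(6881, 51413),
    Packet(123, 123),
    Packet(40000, 6346),
]
remaining = drop_well_known(packets)
```

Only the packets that survive this filter would be passed on to the DNS-query and flow-counting checks.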
Feature | Description |
---|---|
avg_duration | Mean of the total durations of the network flows in the same session. |
std_duration | Standard deviation of the total durations of the network flows in the same session. |
min_duration | Minimum total duration among the network flows in the same session. |
max_duration | Maximum total duration among the network flows in the same session. |
avg_f(b)int | Average interval between packets transmitted upstream (downstream), over the network flows in the same session. |
max_f(b)pl | Average of the maximum upstream (downstream) packet length over the network flows in the same session. |
min_f(b)pl | Average of the minimum upstream (downstream) packet length over the network flows in the same session. |
std_avg_f(b)pl | Standard deviation of the average length of packets transmitted upstream (downstream), over the network flows in the same session. |
avg_f(b)pen | Average number of valid packets transmitted upstream (downstream), over the network flows in the same session. |
std_avg_f(b)pen | Standard deviation of the number of valid packets transmitted upstream (downstream), over the network flows in the same session. |
avg_f(b)pb | Average of the total number of bytes transmitted upstream (downstream), over the network flows in the same session. |
std_f(b)pb | Standard deviation of the total number of bytes transmitted upstream (downstream), over the network flows in the same session. |
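To make the feature definitions above concrete, the sketch below computes a few of them for one session. It assumes that the `f` and `b` variants of a feature name refer to the upstream (forward) and downstream (backward) directions, and it uses the population standard deviation; neither choice is confirmed by the table. The `Flow` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Sequence


@dataclass(frozen=True)
class Flow:
    """Hypothetical per-flow summary within one session."""
    duration: float  # total duration of the flow, in seconds
    fwd_bytes: int   # bytes sent upstream
    bwd_bytes: int   # bytes sent downstream


def session_features(flows: Sequence[Flow]) -> Dict[str, float]:
    """Aggregate per-flow values into session-level features."""
    if not flows:
        raise ValueError("a session needs at least one flow")
    durations = [f.duration for f in flows]
    fwd = [f.fwd_bytes for f in flows]
    bwd = [f.bwd_bytes for f in flows]
    return {
        "avg_duration": mean(durations),
        "std_duration": pstdev(durations),  # population std is an assumption
        "min_duration": min(durations),
        "max_duration": max(durations),
        "avg_fpb": mean(fwd),
        "std_fpb": pstdev(fwd),
        "avg_bpb": mean(bwd),
        "std_bpb": pstdev(bwd),
    }


# Made-up session with three flows.
flows = [Flow(2.0, 100, 300), Flow(4.0, 200, 500), Flow(6.0, 300, 700)]
features = session_features(flows)
```

The remaining features in the table (packet intervals, packet lengths, packet counts) would follow the same pattern once the corresponding per-flow values are available.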
Scenario No. | Scenario Name | Type of Attack | Packets Captured | Size (GB) | Duration (h) | Number of Infected Nodes | Displayed |
---|---|---|---|---|---|---|---|
1 | Neris | IRC, SPAM, CF | 323,154 | 52 | 6.15 | 1 | 100% |
2 | Neris | IRC, SPAM, CF | 176,064 | 60 | 4.21 | 1 | 100% |
3 | Rbot | IRC, PS, US | 495,056 | 121 | 66.85 | 1 | 100% |
4 | Rbot | IRC, DDoS, US | 256,712 | 53 | 4.21 | 1 | 100% |
5 | Virut | SPAM, PS, HTTP | 45,853 | 37.6 | 11.63 | 1 | 100% |
6 | Menti | PS | 24,764 | 30 | 2.18 | 1 | 100% |
7 | Sogou | HTTP | 20,663 | 5.8 | 0.38 | 1 | 100% |
8 | Murlo | PS | 85,735 | 123 | 19.5 | 1 | 100% |
9 | Neris | IRC, SPAM, CF, PS | 2,129,949 | 94 | 5.18 | 10 | 100% |
10 | Rbot | IRC, DDoS, US | 66,340,518 | 73 | 4.75 | 10 | 100% |
11 | Rbot | IRC, DDoS, US | 3,941,769 | 5.2 | 0.26 | 3 | 100% |
12 | NSIS.ay | P2P botnet | 352,266 | 8.3 | 1.21 | 3 | 100% |
13 | Virut | SPAM, PS, HTTP | 440,625 | 34 | 16.36 | 1 | 100% |
Dataset Name | Type of Attack | Description | Packets Captured | Size (GB) | Duration (h) | Displayed |
---|---|---|---|---|---|---|
ISOT [43] | Storm | Worm Traffic | 371,899 | 10.6 | 4.1 | 100% |
ISOT [43] | Waledac | Fake SMTP and UDP (SPAM) | | | | |
ISOT [43] | Malicious SMTP and Malicious UDP | Trojan Horse | | | | |
Benign Traffic | TCP, UDP | Clean Traffic | 148,222 | 3.5 | 31.2 | 100% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Khan, R.U.; Zhang, X.; Kumar, R.; Sharif, A.; Golilarz, N.A.; Alazab, M. An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers. Appl. Sci. 2019, 9, 2375. https://doi.org/10.3390/app9112375