Detecting the Presence of Malware and Identifying the Type of Cyber Attack Using Deep Learning and VGG-16 Techniques
Abstract
1. Introduction
- This work proposes a multi-stage architecture consisting of two modified VGG-19 models.
- Benign and malware exe files were converted from raw data into grayscale images.
- Pre-processing techniques were applied to these images.
- Transfer learning was applied to both models, which were pre-trained on the ImageNet dataset with input images of size 224 × 224 × 3.
- The first stage VGG-19 model achieved an accuracy of 99% on the testing set, and the second stage VGG-19 model achieved an accuracy of 98.2% on the testing set.
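The conversion of an exe file into a grayscale image, listed in the contributions above, is commonly done by treating every byte of the file as one pixel intensity (0 to 255) and laying the bytes out row by row. The sketch below illustrates that idea only; the function name `bytes_to_grayscale`, the fixed row width of 256, and the zero padding of the last row are assumptions for illustration, not details taken from this paper.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one grayscale pixel, filling rows left to right.

    `width` is an assumed, fixed row length; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the final row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Toy input standing in for the raw bytes of an executable.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
for row in image:
    print(row)
```

For a real file, the bytes would come from `open(path, "rb").read()`. The resulting 2-D array would still need to be resized and replicated to three channels to match the 224 × 224 × 3 input mentioned above; that step is not shown here.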
2. Literature Review
3. Materials and Methods
3.1. Dataset
3.2. Types of Malware
3.2.1. Locker
3.2.2. Mediyes
3.2.3. Winwebsec
3.2.4. ZeroAccess
3.2.5. Zbot
3.3. Pre-Processing
3.3.1. Converting Benign and Malware Exe Files into Grayscale Images
3.3.2. Formatting Images
3.4. Data Augmentation
3.4.1. Rotation
3.4.2. Horizontal Flipping
3.4.3. Vertical Flipping
3.4.4. Shearing
3.4.5. Zooming
3.4.6. Splitting the Dataset into Training and Testing
3.5. Methodology
3.5.1. Multi-Stage Architecture
3.5.2. First Stage Network
3.5.3. Second Stage Network
4. Results and Discussion
4.1. Discussion
4.2. Comparative Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Sharma, S.; Krishna, C.R.; Sahay, S.K. Detection of advanced Malware by machine learning techniques. In Soft Computing: Theories and Applications; Springer: Berlin/Heidelberg, Germany, 2019; pp. 333–342.
2. Raff, E.; Barker, J.; Sylvester, J.; Brandon, R.; Catanzaro, B.; Nicholas, C.K. Malware detection by eating a whole exe. In Proceedings of the Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
3. Vasan, D.; Alazab, M.; Wassan, S.; Naeem, H.; Safaei, B.; Zheng, Q. IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Comput. Netw. 2020, 171, 107138.
4. Accenture. The Cost of Cybercrime: Ninth Annual Study. 2020. Available online: https://www.accenture.com/_acnmedia/PDF-96/Accenture-2019-Cost-of-Cybercrime-Study-Final.pdf (accessed on 17 December 2020).
5. Nadler, A.; Aminov, A.; Shabtai, A. Detection of malicious and low throughput data exfiltration over the DNS protocol. Comput. Secur. 2019, 80, 36–53.
6. Alazab, M.; Alazab, M.; Shalaginov, A.; Mesleh, A.; Awajan, A. Intelligent mobile malware detection using permission requests and API calls. Futur. Gener. Comput. Syst. 2020, 107, 509–521.
7. Makkar, A.; Obaidat, M.S.; Kumar, N. Fs2rnn: Feature Selection Scheme for Web Spam Detection Using Recurrent Neural Networks. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6.
8. Abawajy, J.H.; Kelarev, A. Iterative classifier fusion system for the detection of Android malware. IEEE Trans. Big Data 2017, 5, 282–292.
9. Sharmeen, S.; Huda, S.; Abawajy, J.H.; Ismail, W.N.; Hassan, M.M. Malware threats and detection for industrial mobile-IoT networks. IEEE Access 2018, 6, 15941–15957.
10. Awan, M.J.; Farooq, U.; Babar, H.M.A.; Yasin, A.; Nobanee, H.; Hussain, M.; Hakeem, O.; Zain, A.M. Real-time DDoS attack detection system using big data approach. Sustainability 2021, 13, 10743.
11. Mohammed, M.A.; Ibrahim, D.A.; Salman, A.O. Adaptive intelligent learning approach based on visual anti-spam email model for multi-natural language. J. Intell. Syst. 2021, 30, 774–792.
12. Azeez, N.A.; Odufuwa, O.E.; Misra, S.; Oluranti, J.; Damaševičius, R. Windows PE malware detection using ensemble learning. Informatics 2021, 8, 10.
13. Khalaf, B.A.; Mostafa, S.A.; Mustapha, A.; Mohammed, M.A.; Mahmoud, M.A.; Al-Rimy, B.A.S.; Abd Razak, S.; Elhoseny, M.; Marks, A. An adaptive protection of flooding attacks model for complex network environments. Secur. Commun. Netw. 2021, 2021.
14. Azizan, A.H.; Mostafa, S.A.; Mustapha, A.; Foozy, C.F.M.; Wahab, M.H.A.; Mohammed, M.A.; Khalaf, B.A. A machine learning approach for improving the performance of network intrusion detection systems. Ann. Emerg. Technol. Comput. 2021, 5, 201–208.
15. Damaševičius, R.; Venčkauskas, A.; Toldinas, J.; Grigaliūnas, Š. Ensemble-based classification using neural networks and machine learning models for windows pe malware detection. Electronics 2021, 10, 485.
16. Awan, M.J.; Yasin, A.; Nobanee, H.; Ali, A.A.; Shahzad, Z.; Nabeel, M.; Zain, A.M.; Shahzad, H.M.F. Fake news data exploration and analytics. Electronics 2021, 10, 2326.
17. Shamshirband, S.; Fathi, M.; Chronopoulos, A.T.; Montieri, A.; Palumbo, F.; Pescapè, A. Computational intelligence intrusion detection techniques in mobile cloud computing environments: Review, taxonomy, and open research issues. J. Inf. Secur. Appl. 2020, 55, 102582.
18. Shamshirband, S.; Chronopoulos, A.T. A New Malware Detection System Using a High Performance-ELM Method. In Proceedings of the 23rd International Database Applications & Engineering Symposium, Athens, Greece, 10–12 June 2019; pp. 1–10.
19. Rezende, E.; Ruppert, G.; Carvalho, T.; Ramos, F.; de Geus, P. Malicious Software Classification Using Transfer Learning of Resnet-50 Deep Neural Network. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 1011–1014.
20. Khan, R.U.; Zhang, X.; Kumar, R. Analysis of ResNet and GoogleNet models for malware detection. J. Comput. Virol. Hacking Tech. 2019, 15, 29–37.
21. Vasan, D.; Alazab, M.; Wassan, S.; Safaei, B.; Zheng, Q. Image-Based malware classification using ensemble of CNN architectures (IMCEC). Comput. Secur. 2020, 92, 101748.
22. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable are Features in Deep Neural Networks? In Advances in Neural Information Processing Systems 27 (NIPS 2014); Curran Associates, Inc.: Montreal, QC, Canada, 2014; Volume 27.
23. Agarap, A.F. Towards building an intelligent anti-malware system: A deep learning approach using support vector machine (SVM) for malware classification. arXiv Prepr. 2017, arXiv:1801.00318.
24. Akarsh, S.; Poornachandran, P.; Menon, V.K.; Soman, K.P. A Detailed Investigation and Analysis of Deep Learning Architectures and Visualization Techniques for Malware Family Identification. In Cybersecurity and Secure Information Systems; Springer: Berlin/Heidelberg, Germany, 2019; pp. 241–286.
25. Akarsh, S.; Simran, K.; Poornachandran, P.; Menon, V.K.; Soman, K.P. Deep Learning Framework and Visualization for Malware Classification. In Proceedings of the 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 15–16 March 2019; pp. 1059–1063.
26. Kumar, S. MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in Internet of Things. Futur. Gener. Comput. Syst. 2021, 125, 334–351.
27. Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.; Chen, J. Detection of malicious code variants based on deep learning. IEEE Trans. Ind. Informatics 2018, 14, 3187–3196.
28. Cui, Z.; Du, L.; Wang, P.; Cai, X.; Zhang, W. Malicious code detection based on CNNs and multi-objective algorithm. J. Parallel Distrib. Comput. 2019, 129, 50–58.
29. Jain, M.; Andreopoulos, W.; Stamp, M. CNN vs ELM for Image-Based Malware Classification. arXiv Prepr. 2021, arXiv:2103.13820.
30. Naeem, H.; Ullah, F.; Naeem, M.R.; Khalid, S.; Vasan, D.; Jabbar, S.; Saeed, S. Malware detection in industrial Internet of things based on hybrid image visualization and deep learning model. Ad Hoc Netw. 2020, 105, 102154.
31. Venkatraman, S.; Alazab, M.; Vinayakumar, R. A hybrid deep learning image-based analysis for effective malware detection. J. Inf. Secur. Appl. 2019, 47, 377–389.
32. Vu, D.-L.; Nguyen, T.-K.; Nguyen, T.V.; Nguyen, T.N.; Massacci, F.; Phung, P.H. A Convolutional Transformation Network for Malware Classification. In Proceedings of the 2019 6th NAFOSTED Conference on Information and Computer Science (NICS), Hanoi, Vietnam, 12–13 December 2019; pp. 234–239.
33. Moussas, V.; Andreatos, A. Malware detection based on code visualization and two-level classification. Information 2021, 12, 118.
34. Verma, V.; Muttoo, S.K.; Singh, V.B. Multiclass malware classification via first-and second-order texture statistics. Comput. Secur. 2020, 97, 101895.
35. Çayır, A.; Ünal, U.; Dağ, H. Random CapsNet forest model for imbalanced malware type classification task. Comput. Secur. 2021, 102, 102133.
36. Woźniak, M.; Siłka, J.; Alrashoud, M.W.M. Recurrent neural network model for IoT and networking malware threat detection. IEEE Trans. Ind. Inform. 2020, 17, 5583–5594.
37. Kim, J.; Ban, Y.; Ko, E.; Cho, H.; Yi, J.H. MAPAS: A practical deep learning-based android malware detection system. Int. J. Inf. Secur. 2022, 21, 725–738.
38. Tuan, A.P.; Phuong, A.T.H.; Thanh, N.V.; Van, T.N. Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset. figshare. Dataset. 2018. Available online: https://figshare.com/articles/dataset/Malware_Detection_PE-Based_Analysis_Using_Deep_Learning_Algorithm_Dataset/6635642/1 (accessed on 16 October 2022).
39. Nappa, A.; Rafique, M.Z.; Caballero, J. The MALICIA dataset: Identification and analysis of drive-by download operations. Int. J. Inf. Secur. 2015, 14, 15–33.
40. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.F. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
41. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
42. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-score and ROC: A Family of Discriminant Measures for Performance Evaluation. In AI 2006: Advances in Artificial Intelligence, Proceedings of the 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–8 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021.
43. Hemalatha, J.; Roseline, S.A.; Geetha, S.; Kadry, S.; Damaševičius, R. An efficient densenet-based deep learning model for malware detection. Entropy 2021, 23, 344.
44. Kumar, R.; Xiaosong, Z.; Khan, R.U.; Ahad, I.; Kumar, J. Malicious Code Detection Based on Image Processing Using Deep Learning. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence, Chengdu, China, 12–14 March 2018.
45. Mercaldo, F.; Santone, A. Deep learning for image-based mobile malware detection. J. Comput. Virol. Hacking Tech. 2020, 16, 157–171.
46. Almusawi, H. Visual Malware Detection by Deep Learning Techniques in Windows System. Optim. Model. 2021, 1, 10–13.
47. Awan, M.J.; Masood, O.A.; Mohammed, M.A.; Yasin, A.; Zain, A.M.; Damaševičius, R.; Abdulkareem, K.H. Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention. Electronics 2021, 10, 2444.
Per-class precision, recall, and F1-score, benign vs. malware:

Class | Precision | Recall | F1-Score |
---|---|---|---|
Benign | 0.96 | 0.94 | 0.95 |
Malware | 0.99 | 1.00 | 0.99 |

Per-class precision, recall, and F1-score by malware type:

Class | Precision | Recall | F1-Score |
---|---|---|---|
Locker | 0.93 | 0.87 | 0.90 |
Mediyes | 1.00 | 0.98 | 0.99 |
Winwebsec | 0.99 | 1.00 | 0.99 |
Zbot | 0.98 | 0.98 | 0.98 |
ZeroAccess | 0.93 | 1.00 | 0.96 |
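
The F1-scores in the two tables above can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The snippet below is a small sanity check rather than part of the authors' pipeline; the helper name `f1_score` is ours, and the numbers are copied from the tables, so rounding to two decimals limits the achievable agreement.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the tables above.
reported = {
    "Benign": (0.96, 0.94, 0.95),
    "Malware": (0.99, 1.00, 0.99),
    "Locker": (0.93, 0.87, 0.90),
    "Mediyes": (1.00, 0.98, 0.99),
    "Winwebsec": (0.99, 1.00, 0.99),
    "Zbot": (0.98, 0.98, 0.98),
    "ZeroAccess": (0.93, 1.00, 0.96),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed {f1_score(p, r):.3f}, reported {f1:.2f}")
```

Every computed value agrees with the reported F1-score to within rounding, which suggests the tables are internally consistent.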
Testing accuracy of each stage:

Method | Testing Accuracy |
---|---|
First stage modified VGG-19 model | 99% |
Second stage modified VGG-19 model | 98.2% |

Comparison with related work:

Reference | Dataset | Feature Extraction / Classification | Accuracy |
---|---|---|---|
[43] | Several datasets: Malimg, Microsoft BIG 2015, MaleVis, and Malicia. | DenseNet model | 98.2% on Malimg, 98.46% on BIG 2015, 98.2% on MaleVis, and 89.48% on Malicia. |
[44] | Three datasets: two malware datasets, one from the Vision Research Lab and one from the Microsoft Malware Classification Challenge, and a third of 3000 benign exe files collected from different sources. | CNN model | 98% |
[45] | 50,000 Android files (24,553 malicious across 71 families and 25,447 non-malicious) and 230 Apple files (115 samples belonging to 10 different families). | CNN model | About 92.9% on Android families and about 96.4% on iOS families. |
[47] | The Malimg dataset, consisting of 25 families and 9339 samples. | CNN model | 96.76% |
Our research study | 8970 malware and 1000 benign executable files. The malware files were divided into five types: Locker, Mediyes, Winwebsec, ZeroAccess, and Zbot. All malware files were collected from the Malicia dataset and the VirusShare website. | VGG-19 model | 99% on the first stage modified VGG-19 model and 98.2% on the second stage modified VGG-19 model. |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alzahrani, A.I.A.; Ayadi, M.; Asiri, M.M.; Al-Rasheed, A.; Ksibi, A. Detecting the Presence of Malware and Identifying the Type of Cyber Attack Using Deep Learning and VGG-16 Techniques. Electronics 2022, 11, 3665. https://doi.org/10.3390/electronics11223665