Malware Detection Based on Code Visualization and Two-Level Classification
Abstract
1. Introduction
Objective
2. Related Work
3. Methodology
3.1. About the Malimg Dataset
3.2. Preprocessing
- File size: The file size characterizes each family, since members of the same family have similar sizes.
- Entropy: Entropy is a statistical measure of randomness, used here to characterize the texture of the input image.
- Contrast: Contrast is the difference in luminance that makes an object in an image distinguishable.
- Correlation: The correlation coefficient between an image and the same image processed with a median filter.
- Energy: Grayscale images have gray levels, and gray levels are units of energy.
- Homogeneity: The distribution of gray values within an image.
- Mean Image Intensity: Every pixel of a grayscale image has an intensity (value) in the range [0, 255]. Mean Image Intensity is the mean of the intensity of all pixels.
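The features listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names, the image `width`, and the 3x3 median window are hypothetical, and contrast, energy, and homogeneity are computed here from a horizontal gray-level co-occurrence matrix, which is one common reading of those terms and may differ from the definitions used in the paper.

```python
import math
from collections import Counter
from statistics import median

def bytes_to_image(data, width):
    """Reshape raw bytes into rows of `width` gray levels (0-255); a ragged tail is dropped."""
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]

def flatten(img):
    return [p for row in img for p in row]

def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram."""
    pixels = flatten(img)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

def glcm_features(img):
    """Contrast, energy, homogeneity from horizontal-neighbour co-occurrences (assumed definition)."""
    counts = Counter()
    for row in img:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
    total = sum(counts.values())
    contrast = energy = homogeneity = 0.0
    for (i, j), c in counts.items():
        p = c / total
        contrast += (i - j) ** 2 * p
        energy += p * p
        homogeneity += p / (1 + abs(i - j))
    return contrast, energy, homogeneity

def median_filter3(img):
    """3x3 median filter; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    return [[median(img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def correlation(a, b):
    """Pearson correlation coefficient between two images of equal size."""
    xa, xb = flatten(a), flatten(b)
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    va = sum((p - ma) ** 2 for p in xa)
    vb = sum((q - mb) ** 2 for q in xb)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def extract_features(data, width=4):
    """Return the seven features for one file given as bytes (needs at least `width` bytes)."""
    img = bytes_to_image(data, width)
    contrast, energy, homogeneity = glcm_features(img)
    pixels = flatten(img)
    return {
        "file_size": len(data),
        "entropy": entropy(img),
        "contrast": contrast,
        "correlation": correlation(img, median_filter3(img)),
        "energy": energy,
        "homogeneity": homogeneity,
        "mean_intensity": sum(pixels) / len(pixels),
    }

# Toy input: four identical rows [0, 0, 255, 255]
features = extract_features(bytes([0, 0, 255, 255] * 4), width=4)
```

For real samples, `data` would be the raw bytes of the malware file and `width` would be chosen to match the image width used when the file is visualized.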
3.3. Training the ANN
3.4. Defining the ANN Architecture
3.5. Testing Other Classification Tools
4. Results
4.1. Performance of the Two-Level ANN
4.2. Performance of Other Classification Methods
4.3. Comparison with Other Approaches
5. Discussion and Future Work
Future Work
- A future task is to test our ANN with additional datasets such as the BIG 2015 dataset.
- The performance of the various classification methods depends on the size of the dataset; for example, k-NN performance degrades as the number of inputs grows. It would be interesting to compare the accuracy of the various classification methods on large datasets.
- Finally, it would be interesting to test hybrid schemes that combine methods, that is, one method at the 1st level and a different method at the 2nd level, in an effort to combine their advantages. For instance, an ANN at the 1st level (which can cope with large amounts of input data) could perform the coarse classification, and a k-NN at the 2nd level (where the input data will be limited) could perform the fine classification.
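The hybrid two-level idea in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: a nearest-centroid rule stands in for the 1st-level ANN to keep the example dependency-free, the class name `TwoLevelClassifier` is hypothetical, and the family names, group assignments, and 2-D feature vectors are toy values.

```python
import math
from collections import Counter

def fit_centroids(X, labels):
    """Mean feature vector per label (stand-in for the coarse 1st-level classifier)."""
    sums, counts = {}, Counter()
    for x, g in zip(X, labels):
        acc = sums.setdefault(g, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[g] += 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

def knn_predict(X, y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

class TwoLevelClassifier:
    """Level 1 picks a group of families; level 2 runs k-NN inside that group only."""

    def __init__(self, family_to_group, k=3):
        self.family_to_group = family_to_group
        self.k = k

    def fit(self, X, families):
        groups = [self.family_to_group[f] for f in families]
        self.centroids = fit_centroids(X, groups)
        self.members = {}
        for x, f, g in zip(X, families, groups):
            xs, fs = self.members.setdefault(g, ([], []))
            xs.append(x)
            fs.append(f)
        return self

    def predict(self, x):
        # Coarse step: nearest group centroid
        group = min(self.centroids, key=lambda g: math.dist(x, self.centroids[g]))
        # Fine step: k-NN restricted to the samples of that group
        xs, fs = self.members[group]
        return knn_predict(xs, fs, x, self.k)

# Toy data: two groups with two families each (all values hypothetical)
family_to_group = {"fam_a": "G1", "fam_b": "G1", "fam_c": "G2", "fam_d": "G2"}
X = [(0, 0), (0, 1), (1, 0), (5, 0), (5, 1), (6, 0),
     (0, 20), (0, 21), (1, 20), (5, 20), (5, 21), (6, 20)]
families = ["fam_a"] * 3 + ["fam_b"] * 3 + ["fam_c"] * 3 + ["fam_d"] * 3

clf = TwoLevelClassifier(family_to_group, k=3).fit(X, families)
print(clf.predict((0.5, 0.5)), clf.predict((5.5, 20.5)))
```

Because the 2nd-level k-NN only searches the samples of the selected group, its search set stays small even when the full training set is large, which is the motivation given for the hybrid scheme.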
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
References
- Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; pp. 1–7.
- McAfee Labs Threats Reports. November 2020. Available online: https://www.mcafee.com/enterprise/en-us/threat-center/mcafee-labs/reports.html (accessed on 23 December 2020).
- Mallet, H. Malware Classification Using Convolutional Neural Networks—Step by Step Tutorial. A Quick and Easy Tutorial about an Interesting Approach to Malware Classification. 27 May 2020. Available online: https://towardsdatascience.com/malware-classification-using-convolutional-neural-networks-step-by-step-tutorial-a3e8d97122f (accessed on 18 July 2020).
- Donahue, J.; Paturi, A.; Mukkamala, S. Visualization Techniques for Efficient Malware Detection. RiskSense Technical White Paper Series. Available online: https://www.risksense.com/wp-content/uploads/2018/05/Visualization-Techniques-for-Efficient-Malware-Detection.pdf (accessed on 28 December 2020).
- Vasan, D.; Alazab, M.; Wassan, S.; Naeem, H.; Safaei, B.; Zheng, Q. IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Comput. Netw. 2020, 171, 107138.
- Narayanan, B.N.; Davuluru, V.S.P. Ensemble Malware Classification System using Deep Neural Networks. Electronics 2020, 9, 721.
- Conti, G.; Dean, E.; Sinda, M.; Sangster, B. Visual reverse engineering of binary and data files. In Lecture Notes in Computer Science, Proceedings of the 5th International Workshop on Visualization for Computer Security, VizSec ’08, Cambridge, MA, USA, 15 September 2008; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–17.
- Quist, D.A.; Liebrock, L.M. Visualizing compiled executables for malware analysis. In Proceedings of the 6th International Workshop on Visualization for Cyber Security (VizSec), Atlantic City, NJ, USA, 11 October 2009; pp. 27–32.
- Conti, G.; Bratus, S.; Shubina, A.; Lichtenberg, A.; Ragsdale, R.; Perez-Alemany, R.; Sangster, B.; Supan, M.A. Visual Study of Binary Fragment Types; Black Hat: San Francisco, CA, USA, 2010.
- Oliva, A.; Torralba, A. Modeling the shape of a scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
- Kancherla, K.S.; Mukkamala, S. Image visualization based malware detection. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Singapore, 16–19 April 2013; pp. 40–44.
- Narayanan, B.N.; Djaneye-Boundjou, O.; Kebede, T.M. Performance analysis of machine learning and pattern recognition algorithms for malware classification. In Proceedings of the 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), Dayton, OH, USA, 25–29 July 2016; pp. 338–342.
- Makandar, A.; Patrot, A. Malware Analysis and Classification using Artificial Neural Network. In Proceedings of the 2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15), Bangalore, India, 21–22 December 2015.
- Yue, S. Imbalanced Malware Images Classification: A CNN based Approach. Submitted on 27 August 2017. Available online: https://arxiv.org/abs/1708.08042 (accessed on 30 December 2020).
- Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.; Chen, J. Detection of Malicious Code Variants Based on Deep Learning. J. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
- Cui, Z.; Du, L.; Wang, P.; Cai, X.; Zhang, W. Malicious code detection based on CNNs and multi-objective algorithm. J. Parallel Distrib. Comput. 2019, 129, 50–58.
- Ni, S.; Qian, Q.; Zhang, R. Malware identification using visualization images and deep learning. Comput. Secur. 2018, 77, 871–885.
- Yajamanam, S.; Selvin, V.R.S.; Di Troia, F.; Stamp, M. Deep Learning versus Gist Descriptors for Image-based Malware Classification. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Portugal, 22–24 January 2018; pp. 553–561.
- Makandar, A.; Patrot, A. Malware class recognition using image processing techniques. In Proceedings of the 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI), Pune, India, 24–26 February 2017.
No. | Family Name | Malware Type |
---|---|---|
1 | Adialer.C | Dialer |
2 | Agent.FYI | Backdoor |
3 | Allaple.A | Worm |
4 | Allaple.L | Worm |
5 | Alueron.gen!J | Worm |
6 | Autorun.K | Worm:AutoIT |
7 | C2LOP.P | Trojan |
8 | C2LOP.gen!g | Trojan |
9 | Dialplatform.B | Dialer |
10 | Dontovo.A | Trojan Downloader |
11 | Fakerean | Rogue |
12 | Instantaccess | Dialer |
13 | Lolyda.AA1 | PWS |
14 | Lolyda.AA2 | PWS |
15 | Lolyda.AA3 | PWS |
16 | Lolyda.AT | PWS |
17 | Malex.gen!J | Trojan |
18 | Obfuscator.AD | Trojan Downloader |
19 | Rbot!gen | Backdoor |
20 | Skintrim.N | Trojan |
21 | Swizzor.gen!E | Trojan Downloader |
22 | Swizzor.gen!I | Trojan Downloader |
23 | VB.AT | Worm |
24 | Wintrim.BX | Trojan Downloader |
25 | Yuner.A | Worm |
Method | k-NN | EnsembleBT | ANN |
---|---|---|---|
Average time (s) | 0.032 | 0.345 | 0.024 |
Average Accuracy @L1 | 97.69% | 98.70% | 98.83% |
Average Accuracy @L2,G1 | 100% | 100% | 100% |
Average Accuracy @L2,G2 | 98.82% | 99.04% | 99.09% |
Overall Average Accuracy | 98.288% | 98.938% | 99.135% |
Year | Researchers | Methods | Technique | Accuracy (%) |
---|---|---|---|---|
2011 | Nataraj et al. | GIST | Machine Learning | 98 |
2017 | S. Yue | CNN | Deep Learning | 97.32 |
2017 | Makandar and Patrot | Gabor wavelet-kNN | Machine Learning | 89.11 |
2018 | Yajamanam et al. | GIST+kNN+SVM | Machine Learning | 97 |
2018 | Cui, Xue, et al. | GIST+SVM | Deep Learning | 92.20 |
2018 | Cui, Xue, et al. | GIST+kNN | Deep Learning | 91.90 |
2018 | Cui, Xue, et al. | GLCM+SVM | Deep Learning | 93.20 |
2018 | Cui, Xue, et al. | GLCM+kNN | Deep Learning | 92.50 |
2018 | Cui, Xue, et al. | IDA+DRBA | Deep Learning | 94.50 |
2019 | Cui, Du, et al. | CNN, NSGA-II | Deep Learning | 97.6 |
2020 | Mallet | CNN, Keras | Deep Learning | 95.15 |
2020 | Vasan et al. | IMCFN, Color images | Deep Learning | 98.82 |
2021 | Two-level ANN | Image and file features, ANN | Two-level ANN | 99.13 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Moussas, V.; Andreatos, A. Malware Detection Based on Code Visualization and Two-Level Classification. Information 2021, 12, 118. https://doi.org/10.3390/info12030118