Image-Based Feature Representation for Insider Threat Classification
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Feature Vector Construction
3.2. Image-Based Feature Vector Representation
3.3. Classification
3.4. Transfer Learning
4. Implementation
4.1. Dataset
4.2. Imbalanced Data Handling
4.3. Performance Metrics
4.4. Experimental Results
5. Discussion and Conclusion
Author Contributions
Funding
Conflicts of Interest
References
Log file | Feature | Description |
---|---|---|
Login | L1 | Difference between office start time and first login time |
Login | L2 | Difference between last login time and office end time |
Login | L3 | Average difference in time between office start time and number of logins before office hours |
Login | L4 | Average difference in time between office end time and number of logins after office hours |
Login | L5 | Total number of logins |
Login | L6 | Number of logins outside office hours |
Login | L7 | Number of systems accessed |
Login | L8 | Number of systems used outside office hours |
Login | L9 | Average session length outside office hours |
Email | E1 | Count of emails sent outside the domain of organization |
Email | E2 | Count of emails sent within the domain from supervisor's account |
Email | E3 | No. of attachments |
Email | E4 | Average email size |
Email | E5 | Number of recipients |
Device | D1 | Count of thumb drive usage outside office |
Device | D2 | Count of external device usage |
File | F1 | Number of .exe files downloaded |
Http | H1 | Count of usage of wikileaks.org |
Parameter | MobileNetV2 | VGG19 | ResNet50 |
---|---|---|---|
Input shape | (32,32,3) | (32,32,3) | (32,32,3) |
Weight | Initialized to ImageNet | Initialized to ImageNet | Initialized to ImageNet |
Optimizer | RMSProp | SGD | Adamax |
Loss function | Binary cross entropy | Binary cross entropy | Binary cross entropy |
Classifier | Softmax | Softmax | Softmax |
Epochs | 15 | 15 | 15 |
Batch size | 64 | 128 | 128 |
Dropout rate | 0.3 | Nil | Nil |
Regularization | Nil | BatchNormalization | L2 Regularization |
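
The parameter table above can be read as a per-model configuration. The following is a minimal sketch, not the authors' implementation, of how those settings might be wired into Keras. The names `TransferConfig`, `CONFIGS`, and `build_model` are hypothetical. The placement of dropout and batch normalization in the classification head, the global average pooling, and the L2 coefficient of `1e-4` are assumptions, since the table does not specify them. TensorFlow is imported inside the function so the configuration can be inspected without it installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferConfig:
    """Per-model settings copied from the parameter table (hypothetical container)."""
    optimizer: str
    batch_size: int
    dropout: Optional[float]       # None where the table says "Nil"
    regularization: Optional[str]  # "batchnorm", "l2", or None


INPUT_SHAPE = (32, 32, 3)
EPOCHS = 15

CONFIGS = {
    "MobileNetV2": TransferConfig("rmsprop", 64, 0.3, None),
    "VGG19": TransferConfig("sgd", 128, None, "batchnorm"),
    "ResNet50": TransferConfig("adamax", 128, None, "l2"),
}


def build_model(name, weights="imagenet"):
    """Build a pre-trained base with a two-class softmax head (assumed layout)."""
    import tensorflow as tf  # deferred so CONFIGS is usable without TensorFlow

    cfg = CONFIGS[name]
    base_cls = getattr(tf.keras.applications, name)
    base = base_cls(include_top=False, weights=weights,
                    input_shape=INPUT_SHAPE, pooling="avg")

    x = base.output
    if cfg.dropout is not None:
        x = tf.keras.layers.Dropout(cfg.dropout)(x)
    if cfg.regularization == "batchnorm":
        x = tf.keras.layers.BatchNormalization()(x)
    # Assumed L2 strength; the table only names the regularizer.
    reg = tf.keras.regularizers.l2(1e-4) if cfg.regularization == "l2" else None
    out = tf.keras.layers.Dense(2, activation="softmax",
                                kernel_regularizer=reg)(x)

    model = tf.keras.Model(base.input, out)
    # Softmax over two units with binary cross entropy expects one-hot labels.
    model.compile(optimizer=cfg.optimizer, loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model("VGG19")` would download the ImageNet weights on first use; training would then use `EPOCHS` and `CONFIGS["VGG19"].batch_size`.
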
Non-Malicious Instances | Malicious Instances | Undersampling Ratio |
---|---|---|
24,150 | 966 | 25 |
19,320 | 966 | 20 |
14,490 | 966 | 15 |
9660 | 966 | 10 |
4830 | 966 | 5 |
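
Each row of the table above keeps all 966 malicious instances and a number of non-malicious instances equal to the ratio times 966. A minimal sketch of that arithmetic follows, assuming simple random undersampling of the majority class; the table does not state the selection strategy, and the function name `undersample` and the fixed seed are hypothetical.

```python
import random


def undersample(majority, minority, ratio, seed=0):
    """Randomly keep ratio * len(minority) majority samples (assumed strategy)."""
    keep = ratio * len(minority)
    if keep > len(majority):
        raise ValueError("not enough majority samples for this ratio")
    rng = random.Random(seed)
    return rng.sample(majority, keep), list(minority)


if __name__ == "__main__":
    n_malicious = 966
    for ratio in (25, 20, 15, 10, 5):
        # Reproduces the non-malicious counts listed in the table.
        print(ratio, ratio * n_malicious)
```
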
Model | Sample Ratio | A (70% train) | P (70% train) | F (70% train) | R (70% train) | A (80% train) | P (80% train) | F (80% train) | R (80% train) |
---|---|---|---|---|---|---|---|---|---|
DNN | 5 | 90.86 | 76.04 | 73.37 | 70.88 | 91.72 | 79.62 | 72.25 | 66.12 |
DNN | 10 | 93.46 | 64.32 | 64.81 | 65.31 | 93.56 | 70.24 | 63.27 | 57.57 |
DNN | 20 | 95.57 | 53.65 | 52.76 | 59.87 | 95.88 | 67.34 | 57.39 | 45.49 |
DNN | 25 | 96.24 | 55.10 | 22.69 | 14.29 | 96.34 | 62.96 | 36.30 | 25.50 |
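
The column letters A, P, F, and R are read here as accuracy, precision, F-score, and recall; the table itself does not expand them, so this is an assumption. Under that reading, F would be the harmonic mean of P and R, and the short sketch below checks that relation against two of the DNN rows above (the function name `f_score` is hypothetical).

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1), in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # DNN, sample ratio 5, 70% training: P = 76.04, R = 70.88, reported F = 73.37
    print(round(f_score(76.04, 70.88), 2))
    # DNN, sample ratio 25, 70% training: P = 55.10, R = 14.29, reported F = 22.69
    print(round(f_score(55.10, 14.29), 2))
```
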
Model | Sample Ratio | A (70% train) | P (70% train) | F (70% train) | R (70% train) | A (80% train) | P (80% train) | F (80% train) | R (80% train) |
---|---|---|---|---|---|---|---|---|---|
MobileNetV2 | 5 | 75.84 | 83.83 | 87.06 | 90.54 | 78.60 | 83.48 | 87.83 | 92.65 |
MobileNetV2 | 10 | 85.32 | 89.98 | 92.83 | 95.84 | 87.81 | 90.86 | 93.49 | 96.27 |
MobileNetV2 | 20 | 93.38 | 95.24 | 96.13 | 97.09 | 94.59 | 95.32 | 96.21 | 97.18 |
MobileNetV2 | 25 | 94.61 | 96.21 | 97.54 | 98.09 | 95.28 | 96.21 | 97.58 | 98.18 |
VGG19 | 5 | 90.16 | 94.07 | 94.10 | 94.13 | 90.94 | 94.80 | 94.55 | 94.31 |
VGG19 | 10 | 91.21 | 96.90 | 95.08 | 93.34 | 92.24 | 96.37 | 95.65 | 94.93 |
VGG19 | 20 | 95.67 | 96.60 | 97.75 | 98.93 | 95.17 | 97.56 | 96.46 | 96.36 |
VGG19 | 25 | 94.16 | 96.16 | 98.78 | 98.03 | 96.34 | 96.80 | 97.12 | 98.59 |
ResNet50 | 5 | 78.94 | 80.20 | 83.66 | 94.75 | 79.26 | 83.99 | 87.96 | 92.33 |
ResNet50 | 10 | 87.45 | 91.13 | 93.25 | 95.47 | 87.98 | 91.17 | 96.16 | 93.60 |
ResNet50 | 20 | 89.95 | 95.29 | 90.43 | 85.27 | 92.28 | 95.29 | 96.18 | 97.15 |
ResNet50 | 25 | 93.41 | 95.13 | 95.54 | 96.68 | 95.31 | 96.12 | 97.43 | 98.09 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gayathri, R.G.; Sajjanhar, A.; Xiang, Y. Image-Based Feature Representation for Insider Threat Classification. Appl. Sci. 2020, 10, 4945. https://doi.org/10.3390/app10144945