Fuzzy Integral-Based Multi-Classifiers Ensemble for Android Malware Classification
Abstract
1. Introduction
- Proposes a novel fuzzy integral-based multi-classifier ensemble for Android malware classification. The approach aggregates the outputs of multiple classifiers, taking into account both the significance of each classifier and the consistency and coalition among every possible subset of classifiers, as expressed by fuzzy measures.
- To detect Android malware apps, the approach uses an adaptive fuzzy measure derived from the dynamic data of the single classifiers and from the consistency and coalition among each possible subset of classifiers.
- The approach was evaluated in a series of experiments. The results indicate that it outperforms both the individual classifiers and other approaches (accuracy 0.9508, against 0.9495 for the best single classifier, Random Forest).
2. Related Work
3. Choquet Fuzzy Integral
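A fuzzy measure µ on the set of classifiers C is a set function µ: P(C) → [0, 1], where P(C) is the power set of C, that satisfies:
- (1) µ(∅) = 0, µ(C) = 1 (boundary conditions).
- (2) If A, B ∈ P(C) and A ⊂ B, then µ(A) ≤ µ(B) (monotonicity).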
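The two conditions listed above can be checked mechanically, and a measure that satisfies them can drive a discrete Choquet integral. The following is a minimal sketch, not the authors' implementation: the three classifier names, the measure values in `mu`, and the scores are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def all_subsets(items):
    """Every subset of items, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_fuzzy_measure(mu, items, tol=1e-12):
    """Check mu(empty) = 0, mu(full set) = 1, and monotonicity under inclusion."""
    subsets = all_subsets(items)
    if abs(mu[frozenset()]) > tol or abs(mu[frozenset(items)] - 1.0) > tol:
        return False
    return all(mu[a] <= mu[b] + tol
               for a in subsets for b in subsets if a <= b)


def choquet_integral(scores, mu):
    """Discrete Choquet integral of per-classifier scores with respect to mu."""
    order = sorted(scores, key=scores.get)  # classifiers by ascending score
    total, previous = 0.0, 0.0
    for i, name in enumerate(order):
        coalition = frozenset(order[i:])  # classifiers scoring at least scores[name]
        total += (scores[name] - previous) * mu[coalition]
        previous = scores[name]
    return total


# Hypothetical measure over three classifiers; values are illustrative only.
classifiers = ["RF", "DT", "XGB"]
mu = {
    frozenset(): 0.0,
    frozenset({"RF"}): 0.4,
    frozenset({"DT"}): 0.3,
    frozenset({"XGB"}): 0.2,
    frozenset({"RF", "DT"}): 0.8,
    frozenset({"RF", "XGB"}): 0.5,
    frozenset({"DT", "XGB"}): 0.6,
    frozenset({"RF", "DT", "XGB"}): 1.0,
}
scores = {"RF": 0.9, "DT": 0.6, "XGB": 0.2}  # hypothetical malware scores

fused = choquet_integral(scores, mu)
print(is_fuzzy_measure(mu, classifiers), round(fused, 4))
```

With these illustrative values the integral is 0.2 · 1.0 + 0.4 · 0.8 + 0.3 · 0.4 = 0.64. Because µ({RF, DT}) = 0.8 exceeds µ({RF}) + µ({DT}) = 0.7, the measure rewards agreement between those two classifiers, which a plain weighted average cannot express.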
4. The Proposed Fuzzy Integral-Based Multi-Classifiers Ensemble for Android Malware Classification
4.1. Dataset and Features Selection
4.2. Single Classifiers
4.2.1. The eXtreme Gradient Boosting (XGBoost)
4.2.2. Random Forest (RF)
4.2.3. Decision Tree (DT)
4.2.4. AdaBoost
4.2.5. LightGBM
4.3. Classifiers Fusion Based on Choquet Integral
4.4. Adaptive Fuzzy Measure
5. Results and Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Niu, W.; Cao, R.; Zhang, X.; Ding, K.; Zhang, K.; Li, T. OpCode-level function call graph based android malware classification using deep learning. Sensors 2020, 20, 3645.
- Statista. 2019. Available online: https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store (accessed on 13 June 2021).
- Gdata. 2019. Available online: https://www.gdata-software.com/news/g-data-mobile-malware-report-2019-new-high-for-malicious-android-apps (accessed on 13 April 2020).
- Feng, J.; Shen, L.; Chen, Z.; Wang, Y.; Li, H. A two-layer deep learning method for android malware detection using network traffic. IEEE Access 2020, 8, 125786–125796.
- Conti, M.; Li, Q.Q.; Maragno, A.; Spolaor, R. The dark side (-channel) of mobile devices: A survey on network traffic analysis. IEEE Commun. Surv. Tutor. 2018, 20, 2658–2713.
- Mehtab, A.; Shahid, W.B.; Yaqoob, T.; Amjad, M.F.; Abbas, H.; Afzal, H.; Saqib, M.N. AdDroid: Rule-based machine learning framework for android malware analysis. Mob. Netw. Appl. 2020, 25, 180–192.
- Demontis, A.; Melis, M.; Biggio, B.; Maiorca, D.; Arp, D.; Rieck, K.; Corona, I.; Giacinto, G.; Roli, F. Yes, machine learning can be more secure! a case study on android malware detection. IEEE Trans. Depend. Secure Comput. 2017, 16, 711–724.
- Papadopoulos, H.; Georgiou, N.; Eliades, C.; Konstantinidis, A. Android malware detection with unbiased confidence guarantees. Neurocomputing 2018, 280, 3–12.
- Altaher, A. An improved Android malware detection scheme based on an evolving hybrid neuro-fuzzy classifier (EHNFC) and permission-based features. Neural Comput. Appl. 2016, 28, 4147–4157.
- Abdulla, S.; Altaher, A. Intelligent approach for android malware detection. KSII Trans. Internet Inf. Syst. 2015, 9, 2964–2983.
- Altaher, A.; Barukab, O. Android malware classification based on ANFIS with fuzzy c-means clustering using significant application permissions. Turk. J. Electr. Eng. Comput. Sci. 2017, 25, 2232–2242.
- Imtiaz, S.I.; Rehman, S.U.; Javed, A.R.; Jalil, Z.; Liu, X.; Alnumay, W.S. DeepAMD: Detection and identification of Android malware using high-efficient Deep Artificial Neural Network. Futur. Gener. Comput. Syst. 2020, 115, 844–856.
- Wu, W.C.; Hung, S.H. DroidDolphin: A dynamic Android Malware Detection Framework using Big Data and Machine Learning. In Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, Towson, MD, USA, 5–8 October 2014; pp. 247–252.
- Burguera, I.; Zurutuza, U.; Nadjm-Tehrani, S. Crowdroid: Behavior-Based Malware Detection System for Android. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, Chicago, IL, USA, 15–19 November 2011; pp. 15–26.
- Yang, Y.; Wei, Z.; Xu, Y.; He, H.; Wang, W. Droidward: An effective dynamic analysis method for vetting android applications. Cluster Comput. 2018, 21, 265–275.
- Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2019, 14, 241–258.
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
- Schapire, R.E.; Singer, Y. Improved Boosting Algorithms Using Confidence-rated Predictions. Mach. Learn. 1999, 37, 297–336.
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.
- Wang, W.; Li, Y.; Wang, X.; Liu, J.; Zhang, X. Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers. Futur. Gener. Comput. Syst. 2018, 78, 987–994.
- Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K. Drebin: Effective and explainable detection of android malware in your pocket. NDSS 2014, 14, 23–26.
- Firdaus, A.; Anuar, N.B.; Ab Razak, M.F.; Sangaiah, A.K. Bio-inspired computational paradigm for feature investigation and malware detection: Interactive analytics. Multimed. Tools Appl. 2017, 77, 17519–17555.
- Allix, K.; Bissyandé, T.F.; Klein, J.; Le Traon, Y. Androzoo: Collecting Millions of Android Apps for the Research Community. In Proceedings of the 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) 2016, Austin, TX, USA, 14–15 May 2016; pp. 468–471.
- Hu, D.; Ma, Z.; Zhang, X.; Li, P.; Ye, D.; Ling, B. The concept drift problem in Android malware detection and its solution. Secur. Commun. Netw. 2017, 1–13.
- Zhang, X.; Hu, D.; Fan, Y.; Yu, K. A Novel Android Malware Detection Method Based on Markov Blanket. In Proceedings of the 2016 IEEE First International Conference on Data Science in Cyberspace (DSC) 2016, Changsha, China, 13–16 June 2016; pp. 347–352.
- Coronado-De-Alba, L.D.; Rodríguez-Mota, A.; Escamilla-Ambrosio, P.J. Feature Selection and Ensemble of Classifiers for Android Malware Detection. In Proceedings of the 8th IEEE Latin-American Conference on Communications (LATINCOM), Medellin, Colombia, 15–17 November 2016; pp. 1–6.
- Peiravian, N.; Zhu, X. Machine Learning for Android Malware Detection using Permission and Api Calls. In Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, Washington, DC, USA, 4–6 November 2013; pp. 300–305.
- Wang, X.; Wang, W.; He, Y.; Liu, J.; Han, Z.; Zhang, X. Characterizing Android apps’ behavior for effective detection of malapps at large scale. Futur. Gener. Comput. Syst. 2017, 75, 30–45.
- Talha, K.A.; Alper, D.I.; Aydin, C. APK Auditor: Permission-based Android malware detection system. Digit. Investig. 2015, 13, 1–14.
- Milosevic, N.; Dehghantanha, A.; Choo, K.-K.R. Machine learning aided Android malware classification. Comput. Electr. Eng. 2017, 61, 266–274.
- Taha, A.A.; Malebary, S.J. Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine. Neural Comput. Appl. 2020, 33, 6721–6732.
- Awan, M.J.; Masood, O.A.; Mohammed, M.A.; Yasin, A.; Zain, A.M.; Damaševičius, R.; Abdulkareem, K.H. Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention. Electronics 2021, 10, 2444.
- Hemalatha, J.; Roseline, S.A.; Geetha, S.; Kadry, S.; Damaševičius, R. An efficient DenseNet-based deep learning model for malware detection. Entropy 2021, 23, 344.
- Nisa, M.; Shah, J.H.; Kanwal, S.; Raza, M.; Khan, M.A.; Damaševičius, R.; Blažauskas, T. Hybrid malware classification method using segmentation-based fractal texture analysis and deep convolution neural network features. Appl. Sci. 2020, 10, 4966.
- Choquet, G. Theory of Capacities. Ann. Inst. Fourier 1954, 5, 131–295.
- Höhle, U. Integration with Respect to Fuzzy Measures. In Proceedings of the IFAC Symposium on Theory and Applications of Digital Control, New Delhi, India, 5–7 January 1982; pp. 35–37.
- Murofushi, T.; Sugeno, M. A theory of fuzzy measures: Representations, the Choquet integral, and null sets. J. Math. Anal. Appl. 1991, 159, 532–549.
- Murofushi, T.; Sugeno, M. An interpretation of fuzzy measures and the Choquet integral as an integral with respect to a fuzzy measure. Fuzzy Sets Syst. 1989, 29, 201–227.
- Li, X.; Wang, F.; Chen, X. Support Vector Machine Ensemble Based on Choquet Integral for Financial Distress Prediction. Int. J. Pattern Recognit. Artif. Intell. 2015, 29.
- Chiou, H.-K.; Tzeng, G.-H. Fuzzy Multiple-Criteria Decision-Making Approach for Industrial Green Engineering. Environ. Manag. 2002, 30, 816–830.
- Tahani, H.; Keller, J.M. Information fusion in computer vision using the fuzzy integral. IEEE Trans. Syst. Man Cybern. 1990, 20, 733–741.
- Mori, T. Information Gain Ratio as Term Weight: The Case of Summarization of Ir Results. In Proceedings of the COLING 2002, The 19th International Conference on Computational Linguistics, Taipei, Taiwan, 26–30 August 2002.
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: London, UK, 2017.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
- Chibelushi, C.; Deravi, F.; Mason, J. Adaptive classifier integration for robust pattern recognition. IEEE Trans. Syst. Man Cybern. Part B 1999, 29, 902–907.
- Dal Pozzolo, A.; Caelen, O.; Le Borgne, Y.-A.; Waterschoot, S.; Bontempi, G. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 2014, 41, 4915–4928.
- Salah, A.; Shalabi, E.; Khedr, W. A lightweight android malware classifier using novel feature selection methods. Symmetry 2020, 12, 858.
- Yerima, S.Y.; Sezer, S.; McWilliams, G. Analysis of Bayesian classification-based approaches for Android malware detection. IET Inf. Secur. 2013, 8, 25–36.
- Sanz, B.; Santos, I.; Laorden, C.; Ugarte-Pedrero, X.; Bringas, P.G.; Álvarez, G. Puma: Permission Usage to Detect Malware in Android. In International Joint Conference CISIS’12-ICEUTE 12-SOCO 12 Special Sessions; Springer: Berlin/Heidelberg, Germany, 2013; pp. 289–298.
Approach | Description of Method | Accuracy | Limitations |
---|---|---|---|
Altaher [9] | Android permissions and adaptive neuro-fuzzy inference systems (ANFIS) | 75% | Static analysis method; lacks dynamic real-time inspection |
DroidDolphin [13] | API calls and activities collected by running the apps in virtual environments, classified with an SVM algorithm | 86.1% | Dynamic analysis consumes resources and takes a long time |
Arp et al. [22] | API calls, Android permissions, and network addresses as features, classified with a support vector machine (SVM) algorithm | 94% | Exhibits the inherent limitations of static analysis |
Firdaus et al. [23] | Android permissions as features, with three machine learning algorithms: voted perceptron (VP), radial basis function network (RBFN), and multilayer perceptron (MLP) | 90% | Static analysis method; lacks dynamic real-time inspection |
Hu et al. [25] | Android permissions, application actions, and APIs as features, classified with an ensemble learning classifier | 96% | Unable to detect new Android malware apps |
Talha et al. [30] | Android permissions and machine learning algorithms | 88% | Static analysis method; lacks dynamic real-time inspection |
Taha and Malebary [32] | Android permissions and fuzzy C-means clustering (FCM) algorithm with the light gradient boosting machine (LightGBM) | 94.63% | Exhibits the inherent limitations of static analysis |
Awan et al. [33] | Spatial attention and convolutional neural network (SACNN) based on a deep learning framework | 97.42% | Lack of exploration in the data augmentation and feature engineering domains |
Hemalatha et al. [34] | Visualization-based method in which malware binaries are depicted as two-dimensional images and classified by a deep learning model | 98.23% | Consumes resources and takes a long time to train |
Nisa et al. [35] | Features extracted from malware images are classified using different variants of support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and other classifiers | 99.3% | Uses a pre-trained model and cannot detect new Android malware apps |
Classifier | Parameters |
---|---|
LightGBM | Objective = regression, metric = rmse, num_leaves = 80, learning_rate = 0.09, bagging_fraction = 0.7, feature_fraction = 0.7, bagging_frequency = 5, bagging_seed = 2018, verbosity = 1 |
Random Forest | n_estimators = 50 and random_state = 1 |
XGBoost | random_state = 1 and learning_rate = 0.01 |
AdaBoost | n_estimators = 100 |
Decision Tree | max_depth = 5, min_samples_leaf = 4 |
Android Apps | XGBoost | Random Forest | Decision Tree | AdaBoost | LightGBM | Actual | C | µ |
---|---|---|---|---|---|---|---|---|
App1 | 0.1944 | 0.0033 | 0.0032 | 0.4870 | 0 | 0 | 0 | 0 |
App2 | 0.7831 | 0.9936 | 0.9930 | 0.5072 | 0.9920 | 1 | 1 | 1 |
App3 | 0.2605 | 0.1132 | 0.1135 | 0.4959 | 0.1135 | 0 | 0 | 0 |
App4 | 0 | 0.2605 | 0.4984 | 0.6121 | 1 | 1 | 1 | 0 |
App5 | 0.1944 | 0.0033 | 0.0032 | 0.4870 | 0.0034 | 0 | 0 | 0 |
App6 | 0.6965 | 0.8627 | 0.8628 | 0.5049 | 0.8606 | 1 | 1 | 1 |
App7 | 0.1944 | 0.0033 | 0.0032 | 0.4870 | 0 | 0 | 0 | 0 |
App8 | 0.7831 | 0.9858 | 0.9873 | 0.5133 | 0.9877 | 1 | 1 | 1 |
App9 | 0.7831 | 0.9858 | 0.9873 | 0.5133 | 0.9877 | 1 | 1 | 1 |
App10 | 0.6965 | 0.8627 | 0.8628 | 0.5049 | 0.8606 | 1 | 1 | 1 |
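The table rows can be used to see how a plain arithmetic average of the five classifier scores behaves. The sketch below copies three rows from the table and thresholds their mean at 0.5. Both the 0.5 threshold and the reading of the last column as the arithmetic-mean decision are assumptions, since this excerpt does not define the `C` and `µ` columns.

```python
# Scores copied from the table above, in column order:
# (XGBoost, Random Forest, Decision Tree, AdaBoost, LightGBM), then the actual label.
rows = {
    "App1": ((0.1944, 0.0033, 0.0032, 0.4870, 0.0), 0),
    "App2": ((0.7831, 0.9936, 0.9930, 0.5072, 0.9920), 1),
    "App4": ((0.0, 0.2605, 0.4984, 0.6121, 1.0), 1),
}


def mean_fusion(scores, threshold=0.5):
    """Average the classifier scores and threshold the result (threshold is assumed)."""
    mean = sum(scores) / len(scores)
    return mean, int(mean >= threshold)


for app, (scores, actual) in rows.items():
    mean, decision = mean_fusion(scores)
    print(f"{app}: mean={mean:.4f} decision={decision} actual={actual}")
```

Under these assumptions App4 averages to 0.4742 and is labelled benign although its actual label is 1, which agrees with the 0 in the last column of that row. It is the one row in the table where the `C` column and the last column disagree.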
Sub-Ensemble of Classifiers | Performance Score | Average Performance Score |
---|---|---|
Decision Tree | 0.9477 | 0.9408 |
AdaBoost | 0.9366 | |
Random Forest | 0.9495 | |
XGBoost | 0.9224 | |
LightGBM | 0.9482 | |
Decision Tree, Random Forest | 0.9503 | 0.9465 |
Decision Tree, XGBoost | 0.9503 | |
Decision Tree, AdaBoost | 0.9238 | |
Decision Tree, LightGBM | 0.9491 | |
Random Forest, Decision Tree | 0.9482 | |
Random Forest, AdaBoost | 0.9493 | |
Random Forest, LightGBM | 0.9491 | |
Decision Tree, AdaBoost | 0.9480 | |
Decision Tree, LightGBM | 0.9484 | |
AdaBoost, LightGBM | 0.9491 | |
Decision Tree, Random Forest, XGBoost | 0.9503 | 0.9493 |
Decision Tree, Random Forest, AdaBoost | 0.9503 | |
Decision Tree, Random Forest, LightGBM | 0.9491 | |
Decision Tree, AdaBoost, LightGBM | 0.9491 | |
Random Forest, XGBoost, AdaBoost | 0.9482 | |
Random Forest, XGBoost, LightGBM | 0.9488 | |
Random Forest, AdaBoost, LightGBM | 0.9491 | |
XGBoost, Decision Tree, LightGBM | 0.9486 | |
XGBoost, Decision Tree, AdaBoost | 0.9503 | |
Decision Tree, Random Forest, Decision Tree, AdaBoost | 0.9503 | 0.9495 |
Decision Tree, Random Forest, XGBoost, LightGBM | 0.9488 | |
Decision Tree, Random Forest, XGBoost, AdaBoost, LightGBM | 0.9488 | 0.9488 |
Classifier | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Random Forest | 0.9495 | 0.8950 | 0.9680 | 0.9301 |
LightGBM | 0.9482 | 0.8921 | 0.9674 | 0.9282 |
Decision Tree | 0.9480 | 0.8950 | 0.9639 | 0.9284 |
AdaBoost | 0.9366 | 0.9251 | 0.9076 | 0.9163 |
XGBoost | 0.9224 | 0.9192 | 0.8793 | 0.8988 |
Arithmetic Average | 0.9486 | 0.9669 | 0.8940 | 0.9291 |
Proposed approach | 0.9508 | 0.9240 | 0.9463 | 0.9350 |
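The F1-Score column can be cross-checked against the Precision and Recall columns using the standard definition F1 = 2PR / (P + R). The sketch below does this for three rows copied from the table. The dictionary layout and the `f1_score` helper are illustrative, not part of the original work.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
table = {
    "Random Forest": (0.8950, 0.9680, 0.9301),
    "XGBoost": (0.9192, 0.8793, 0.8988),
    "Proposed": (0.9240, 0.9463, 0.9350),
}

for name, (p, r, reported) in table.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

For these three rows the recomputed values agree with the reported ones to four decimal places.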
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Taha, A.; Barukab, O.; Malebary, S. Fuzzy Integral-Based Multi-Classifiers Ensemble for Android Malware Classification. Mathematics 2021, 9, 2880. https://doi.org/10.3390/math9222880