A Comparative Analysis of Supervised and Unsupervised Models for Detecting Attacks on the Intrusion Detection Systems
Abstract
1. Introduction
2. Related Work
2.1. Supervised Techniques
2.2. Unsupervised Techniques
3. Methodology
3.1. Dataset
3.2. Data Pre-Processing
3.3. Machine Learning Models
3.3.1. Supervised Models
3.3.2. Unsupervised Models
3.4. Optimization Approaches
3.5. Evaluation Metrics
- PRT (processing time) is the total time needed to train, test, and validate a model.
- PT (prediction time) is the time a model takes to classify signals as malicious or non-malicious.
- TPS (training time per sample) is the training time a model spends on each sample.
- M (memory usage) is the amount of memory a model uses over the entire run.
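
As a rough illustration of how these four quantities could be measured, the sketch below times the training and prediction calls of a model with `time.perf_counter` and tracks peak memory with `tracemalloc`. The `MajorityClassifier` stand-in, the `measure` helper, and the toy data are hypothetical and not taken from the paper. Validation time is left out of PRT for brevity, and `tracemalloc` only counts memory allocated by Python, so the figures would differ from a process-level measurement.

```python
import time
import tracemalloc


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def measure(model, X_train, y_train, X_test):
    """Return PRT, PT, TPS (seconds) and peak traced memory (MiB) for one model."""
    tracemalloc.start()

    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    prediction_time = time.perf_counter() - start

    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        # Assumption: PRT is approximated as training plus prediction time.
        "prt_s": train_time + prediction_time,
        "pt_s": prediction_time,
        "tps_s": train_time / len(X_train),
        "peak_memory_mib": peak_bytes / 2**20,
        "predictions": predictions,
    }


# Toy data, for illustration only.
X_train = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.2, 0.2]]
y_train = [0, 0, 1, 0]
X_test = [[0.5, 0.5], [0.0, 0.1]]

report = measure(MajorityClassifier(), X_train, y_train, X_test)
print(report)
```

Any object with `fit` and `predict` methods could be passed to `measure` in place of the stand-in.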
4. Results and Discussion
- The AlexNet model yielded the best results of all supervised and unsupervised learning techniques in terms of the highlighted metrics.
- The GNB and LR models yielded the worst results among the supervised models.
- The VA-Encoder model yielded the best results among the unsupervised models.
- K-means was the worst-performing unsupervised model.
- Several models, such as CART, C-SVM, and PCA, yielded satisfactory results.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Smadi, A.A.; Ajao, B.T.; Johnson, B.K.; Lei, H.; Chakhchoukh, Y.; Abu Al-Haija, Q. A Comprehensive Survey on Cyber-Physical Smart Grid Testbed Architectures: Requirements and Challenges. Electronics 2021, 10, 1043.
- Tazi, K.; Abdi, F.; Abbou, M.F. Review on Cyber-physical Security of the Smart Grid: Attacks and Defense Mechanisms. In International Renewable and Sustainable Energy Conference (IRSEC); IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.
- Khoei, T.T.; Aissou, G.; Hu, W.C.; Kaabouch, N. Ensemble Learning Methods for Anomaly Intrusion Detection System in Smart Grid. In Proceedings of the 2021 IEEE International Conference on Electro Information Technology (EIT), Mt. Pleasant, MI, USA, 14–15 May 2021; pp. 129–135.
- Khoei, T.T.; Ismail, S.; Kaabouch, N. Boosting-based Models with Tree-structured Parzen Estimator Optimization to Detect Intrusion Attacks on Smart Grid. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021; pp. 0165–0170.
- Mrabet, Z.E.; Ghazi, H.E.; Kaabouch, N. A performance comparison of data mining algorithms-based intrusion detection system for smart grid. In Conference on Electro Information Technology (EIT); IEEE: Piscataway, NJ, USA, 2019; pp. 298–303.
- Anthi, E.; Williams, L.; Słowińska, M.; Theodorakopoulos, G.; Burnap, P. A supervised intrusion detection system for smart home IoT devices. Internet Things J. 2019, 6, 9042–9053.
- Talaei Khoei, T.; Ismail, S.; Shamaileh, K.A.; Devabhaktuni, V.K.; Kaabouch, N. Impact of Dataset and Model Parameters on Machine Learning Performance for the Detection of GPS Spoofing Attacks on Unmanned Aerial Vehicles. Appl. Sci. 2022, 13, 383.
- Thapa, N.; Liu, Z.; Kc, D.B.; Gokaraju, B.; Roy, K. Comparison of machine learning and deep learning models for network intrusion detection systems. Future Internet 2020, 12, 167.
- Song, C.; Sun, Y.; Han, G.; Rodrigues, J.J. Intrusion detection based on hybrid classifiers for smart grid. Comput. Electr. Eng. 2021, 93, 107212.
- Roy, D.D.; Shin, D. Network Intrusion Detection in Smart Grids for Imbalanced Attack Types Using Machine Learning Models. In Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 16–18 October 2019; pp. 576–581.
- Arora, P.; Kaur, B.; Teixeira, M.A. Evaluation of Machine Learning Algorithms Used on Attacks Detection in Industrial Control Systems. J. Inst. Eng. 2021, 102, 605–616.
- Yao, R.; Wang, N.; Liu, Z.; Chen, P.; Sheng, X. Intrusion Detection System in the Advanced Metering Infrastructure: A Cross-Layer Feature-Fusion CNN-LSTM-Based Approach. Sensors 2021, 21, 626.
- Yang, H.; Wang, F. Wireless Network Intrusion Detection Based on Improved Convolutional Neural Network. IEEE Access 2019, 7, 64366–64374.
- Wang, Y.; Zhang, Z.; Ma, J.; Jin, Q. KFRNN: An Effective False Data Injection Attack Detection in Smart Grid Based on Kalman Filter and Recurrent Neural Network. IEEE Internet Things J. 2022, 9, 6893–6904.
- Majidi, S.; Hadayeghparast, S.; Karimipour, H. FDI attack detection using extra trees algorithm and deep learning algorithm-autoencoder in smart grid. Int. J. Crit. Infrastruct. Prot. 2022, 37, 100508.
- Ahmed, S.; Lee, Y.; Hyun, S.; Koo, I. Unsupervised Machine Learning-Based Detection of Covert Data Integrity Assault in Smart Grid Networks Utilizing Isolation Forest. IEEE Trans. Inf. Secur. 2019, 14, 2765–2777.
- Menon, D.M.; Radhika, N. Anomaly detection in smart grid traffic data for home area network. In Proceedings of the 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), Nagercoil, India, 18–19 March 2016; pp. 1–4.
- Grammatikis, P.R.; Sarigiannidis, P.; Efstathopoulos, G.; Panaousis, E. ARIES: A Novel Multivariate Intrusion Detection System for Smart Grid. Sensors 2020, 20, 5305.
- Karimipour, H.; Dehghantanha, A.; Parizi, R.M.; Choo, K.R.; Leung, H. A Deep and Scalable Unsupervised Machine Learning System for Cyber-Attack Detection in Large-Scale Smart Grids. IEEE Access 2019, 7, 80778–80788.
- Barua, A.; Muthirayan, D.; Khargonekar, P.P.; Al Faruque, M.A. Hierarchical Temporal Memory Based Machine Learning for Real-Time, Unsupervised Anomaly Detection in Smart Grid: WiP Abstract. In Proceedings of the ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), Sydney, Australia, 21–25 April 2020; pp. 188–189.
- Hu, C.; Yan, J.; Liu, X. Adaptive Feature Boosting of Multi-Sourced Deep Autoencoders for Smart Grid Intrusion Detection. In Proceedings of the 2020 IEEE Power & Energy Society General Meeting (PESGM), Virtual, 3–6 August 2020; pp. 1–5.
- Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the IEEE 53rd International Carnahan Conference on Security Technology, Chennai, India, 1–3 October 2019.
- Altwaijry, H. Bayesian based intrusion detection system. In IAENG Transactions on Engineering Technologies; Springer: Berlin/Heidelberg, Germany, 2013; pp. 29–44.
- van de Schoot, R.; Depaoli, S.; King, R.; Kramer, B.; Märtens, K.; Tadesse, M.G.; Vannucci, M.; Gelman, A.; Veen, D.; Willemsen, J.; et al. Bayesian statistics and modelling. Nat. Rev. Methods Prim. 2021, 1, 1.
- Jahromi, A.H.; Taheri, M. A non-parametric mixture of Gaussian naive Bayes classifiers based on local independent features. In Proceedings of the Artificial Intelligence and Signal Processing Conference (AISP), Shiraz, Iran, 25–27 October 2017; pp. 209–212.
- Song, Y.; Ying, L. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130.
- Singh, S.; Gupta, P. Comparative study ID3, cart and C4.5 decision tree algorithm: A survey. Int. J. Adv. Inf. Sci. Technol. (IJAIST) 2014, 27, 97–103.
- Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048.
- Musavi, M.; Ahmed, W.; Chan, K.; Faris, K.; Hummels, D. On the training of radial basis function classifiers. Neural Netw. 1992, 5, 595–603.
- Yang, X.; Zhang, G.; Lu, J.; Ma, J. A Kernel Fuzzy c-Means Clustering-Based Fuzzy Support Vector Machine Algorithm for Classification Problems With Outliers or Noises. IEEE Trans. Fuzzy Syst. 2011, 19, 105–115.
- Izeboudjen, N.; Larbes, C.; Farah, A. A new classification approach for neural networks hardware: From standards chips to embedded systems on chip. Artif. Intell. Rev. 2014, 41, 491–534.
- Wang, D.; He, H.; Liu, D. Intelligent Optimal Control With Critic Learning for a Nonlinear Overhead Crane System. IEEE Trans. Ind. Inform. 2018, 14, 2932–2940.
- Wang, S.C. Artificial Neural Network. Interdiscip. Comput. Java Program. 2003, 743, 81–100.
- Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6.
- Khoei, T.T.; Hu, W.C.; Kaabouch, N. Residual Convolutional Network for Detecting Attacks on Intrusion Detection Systems in Smart Grid. In Proceedings of the 2022 IEEE International Conference on Electro Information Technology (eIT), Mankato, MN, USA, 19–21 May 2022; pp. 231–237.
- Gunturi, S.K.; Sarkar, D. Ensemble machine learning models for the detection of energy theft. Electr. Power Syst. Res. 2021, 192, 106904.
- Ismail, S.; Khoei, T.T.; Marsh, R.; Kaabouch, N. A comparative study of machine learning models for cyber-attacks detection in wireless sensor networks. In Proceedings of the 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 1–4 December 2021; pp. 0313–0318.
- Khoei, T.T.; Kaabouch, N. Densely Connected Neural Networks for Detecting Denial of Service Attacks on Smart Grid Network. In Proceedings of the 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 26–29 October 2022; pp. 0207–0211.
- Pham, D.T.; Dimov, S.S.; Chi, N.D. Selection of K in K-means clustering. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2005, 219, 103–119.
- Jolliffe, T.I.; Jorge, C. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
- Bock, S.; Weiß, M. A Proof of Local Convergence for the Adam Optimizer. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
- Slimane, T.T.K.H.O.; Kaabouch, N. Cyber-Security of Smart Grids: Attacks, Detection, Countermeasure Techniques, and Future Directions. Commun. Netw. 2022, 14, 119–170.
- Jafari, F.; Dorafshan, S. Comparison between Supervised and Unsupervised Learning for Autonomous Delamination Detection Using Impact Echo. Remote Sens. 2022, 14, 6307.

Attacks | Number of Samples |
---|---|
Total Normal | 5,693,110 |
Domain Name System (DNS) | 5,071,011 |
Simple Network Management Protocol (SNMP) | 5,159,870 |
Trivial File Transfer Protocol (TFTP) | 20,082,580 |
Lightweight Directory Access Protocol (LDAP) | 2,179,930,232 |
Network Basic Input/Output System (NetBIOS) | 4,092,937 |
Microsoft SQL Server (MSSQL) | 5,781,928 |
Simple Service Discovery Protocol (SSDP) | 2,610,611 |
Network Time Protocol (NTP) | 1,202,649 |
User Datagram Protocol Link Aggregation (UDP-Lag) | 366,461 |

Features | Abbreviations |
---|---|
Total Length of Forward Packets | Total Length of Fwd Packets |
Flow Byte(s) | Flow Byte |
Flow Packet(s) | Flow Packet |
Flow Inter Arrival Time Mean | Flow IAT Mean |
Flow Inter Arrival Time Std | Flow IAT Std |
Flow Inter Arrival Time Max | Flow IAT Max |
Forward Packets | Fwd Packets |
Backward Packets | Bwd Packets |
Min Packet Length | Min Packet Length |
Max Packet Length | Max Packet Length |
Packet Length Variance | Packet Length Variance |
Total Forward Packets | Total Fwd Packets |
Total Backward Packets | Total Bwd Packets |
Forward Packets Length Min | Fwd Packets Length Min |
Forward Packets Length Mean | Fwd Packets Length Mean |
Forward Inter Arrival Time Mean | Fwd IAT Mean |
Backward Inter Arrival Time Total | Bwd IAT Total |
Backward Inter Arrival Time Min | Bwd IAT Min |
Backward Inter Arrival Time Mean | Bwd IAT Mean |
Packet Length Mean | Packet Length Mean |
Forward Packet Length Std | Fwd Packet Length Std |

Model | Best Parameters |
---|---|
GNB | var_smoothing = 0.001 |
CART | Criterion = ‘gini’, max-depth = 36, splitter = ‘best’, max_features = ‘log2’ |
C-SVM | C = 4, penalty = ‘l2’ |
LR | Max_iter = 12, penalty = ‘l2’ |
AlexNet | Epoch = 100, momentum = 0.9, Batch size = 128, learning_rate = 0.01 |
LightGBM | Boosting_type = ‘gbdt’, max_depth = 10, learning_rate = 0.1, n_estimators = 100 |
PCA | max-depth = 10, Max-features = ‘sqrt’, splitter = ‘best’, Criterion = ‘entropy’ |
K-means | n-clusters = 2, algorithm = ‘auto’, random-state = 0 |
VA-Encoder | Loss = ‘mse’, Activation = ‘Relu’, Epoch = 100 |
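
The parameter names in the table mix capitalisation and hyphens, so they cannot be used directly as keyword arguments. The sketch below copies a few rows with the names exactly as printed and normalises them into valid Python identifiers. The `BEST_PARAMS` dictionary and the `normalise` helper are hypothetical names introduced here, and the idea that the table's names correspond to scikit-learn or LightGBM keyword arguments is an assumption, not something the table states.

```python
# Best parameters for some of the models, with names exactly as printed in the table.
BEST_PARAMS = {
    "GNB": {"var_smoothing": 0.001},
    "CART": {
        "Criterion": "gini",
        "max-depth": 36,
        "splitter": "best",
        "max_features": "log2",
    },
    "C-SVM": {"C": 4, "penalty": "l2"},
    "LR": {"Max_iter": 12, "penalty": "l2"},
    "LightGBM": {
        "Boosting_type": "gbdt",
        "max_depth": 10,
        "learning_rate": 0.1,
        "n_estimators": 100,
    },
}


def normalise(params):
    """Turn printed parameter names into keyword-argument style names.

    Hyphens become underscores and names are lower-cased, except for
    single-letter names such as "C", which are kept as printed.
    """
    cleaned = {}
    for name, value in params.items():
        key = name.replace("-", "_")
        if len(key) > 1:
            key = key.lower()
        cleaned[key] = value
    return cleaned


for model_name, params in BEST_PARAMS.items():
    print(model_name, normalise(params))
```

With scikit-learn installed, the normalised CART entry could then be passed on as `DecisionTreeClassifier(**normalise(BEST_PARAMS["CART"]))`, again assuming that is the library the parameter names refer to.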

Model | PRT (s) | PT (s) | TPS (s) | M (MiB) |
---|---|---|---|---|
GNB | 4.33 | 4.15 | 0.82 | 245 |
CART | 1.2 | 1.1 | 0.2 | 132 |
C-SVM | 2.9 | 1.8 | 0.39 | 236 |
LR | 1.6 | 1.2 | 0.51 | 223 |
AlexNet | 1.01 | 1 | 0.01 | 102 |
LightGBM | 1.4 | 1.3 | 0.09 | 112 |
PCA | 1.9 | 0.91 | 0.89 | 164 |
K-means | 1.9 | 1.4 | 0.81 | 180 |
VA-Encoder | 1.77 | 1.2 | 0.5 | 144 |
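
To make the comparison across models easier to check, the sketch below copies the values from the table above and picks the model with the lowest value in each column, treating lower time and memory as better. The `RESULTS` dictionary and the `best_per_metric` helper are hypothetical names introduced for this illustration.

```python
# (PRT s, PT s, TPS s, M MiB) per model, copied from the table above.
RESULTS = {
    "GNB": (4.33, 4.15, 0.82, 245),
    "CART": (1.2, 1.1, 0.2, 132),
    "C-SVM": (2.9, 1.8, 0.39, 236),
    "LR": (1.6, 1.2, 0.51, 223),
    "AlexNet": (1.01, 1, 0.01, 102),
    "LightGBM": (1.4, 1.3, 0.09, 112),
    "PCA": (1.9, 0.91, 0.89, 164),
    "K-means": (1.9, 1.4, 0.81, 180),
    "VA-Encoder": (1.77, 1.2, 0.5, 144),
}
METRICS = ("PRT", "PT", "TPS", "M")


def best_per_metric(results):
    """Return the model with the lowest value in each column."""
    best = {}
    for index, metric in enumerate(METRICS):
        best[metric] = min(results, key=lambda model: results[model][index])
    return best


overall = best_per_metric(RESULTS)
unsupervised = best_per_metric(
    {name: RESULTS[name] for name in ("PCA", "K-means", "VA-Encoder")}
)

print(overall)       # {'PRT': 'AlexNet', 'PT': 'PCA', 'TPS': 'AlexNet', 'M': 'AlexNet'}
print(unsupervised)  # {'PRT': 'VA-Encoder', 'PT': 'PCA', 'TPS': 'VA-Encoder', 'M': 'VA-Encoder'}
```

On these four columns alone, AlexNet has the lowest PRT, TPS, and M, while PCA has the lowest PT.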

Model | Attack | PRT (s) | PT (s) | TPS (s) | M (MiB) |
---|---|---|---|---|---|
AlexNet | LDAP | 1.4 | 1.2 | 0.7 | 125 |
AlexNet | DNS | 1.1 | 0.9 | 0.3 | 149 |
AlexNet | SNMP | 1.9 | 1.2 | 0.4 | 123 |
AlexNet | MSSQL | 1.3 | 1.2 | 0.1 | 177 |
AlexNet | NetBIOS | 1.9 | 1.4 | 0.9 | 191 |
AlexNet | NTP | 1.2 | 1.7 | 0.6 | 182 |
AlexNet | SSDP | 1.1 | 1 | 0.5 | 173 |
AlexNet | TFTP | 1.4 | 1.1 | 0.7 | 167 |
AlexNet | UDP | 1.8 | 1.2 | 0.5 | 166 |
AlexNet | UDP-Lag | 1.3 | 1.1 | 0.2 | 161 |
AlexNet | DNS | 1.4 | 1.2 | 0.7 | 125 |
AlexNet | Benign | 1.8 | 1.4 | 0.4 | 180 |
VA-Encoder | LDAP | 3.5 | 3.1 | 0.4 | 290 |
VA-Encoder | DNS | 3.4 | 2.9 | 0.4 | 278 |
VA-Encoder | SNMP | 3.2 | 2.3 | 0.9 | 276 |
VA-Encoder | MSSQL | 2.9 | 2.3 | 0.3 | 254 |
VA-Encoder | NetBIOS | 2.9 | 2.3 | 0.4 | 246 |
VA-Encoder | NTP | 2.9 | 1.3 | 0.6 | 277 |
VA-Encoder | SSDP | 3.9 | 2.1 | 0.5 | 297 |
VA-Encoder | TFTP | 3.1 | 2.9 | 0.3 | 289 |
VA-Encoder | UDP | 3.8 | 3.1 | 0.7 | 290 |
VA-Encoder | UDP-Lag | 2.9 | 2.7 | 0.2 | 214 |
VA-Encoder | Benign | 3.1 | 2.9 | 0.2 | 212 |

References | Best Used Models | Supervised | Unsupervised | Accuracy (%) | Detection Rate (%) | Misdetection Rate (%) | False Alarm Rate (%) | Processing Time (s) | Training Time Per Sample (s) | Prediction Time (s) | Memory Usage (MiB) | Dataset: KDDCup99 | Dataset: NSL-KDD | Dataset: DARPA | Dataset: CICDDOS 2019 | Dataset: CICD 001/002/2018 | Dataset: Self-Collected |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[1] | Stacking | ✓ | - | 97.3 | 96 | 4.1 | 8.9 | - | - | - | - | - | - | - | - | ✓ | - |
[2] | Categorical Boosting | ✓ | - | 97.71 | 96.8 | 5.06 | 3.98 | - | - | - | - | - | - | - | - | ✓ | - |
[8] | Long Short-term Memory with Extreme Boosting | ✓ | - | 88 | 98 | - | - | - | - | - | - | ✓ | - | ✓ | - | - | - |
[9] | Random Forest | ✓ | - | 97.01 | 99.7 | - | - | - | - | - | - | - | - | - | - | ✓ | - |
[11] | Isolation Forest | ✓ | ✓ | 93.01 | - | - | - | - | - | - | - | - | - | - | - | - | - |
[12] | K-means | - | ✓ | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
[14] | Generative Adversarial Network | - | ✓ | 93 | 87.5 | - | - | - | - | - | - | - | - | - | - | - | ✓ |
[15] | Hierarchical Temporal Memory | - | ✓ | 96 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Proposed Models | AlexNet | ✓ | - | 98.71 | 98.9 | 1.1 | 1.29 | 1.16 | 1.06 | 0.10 | 104.2 | - | - | - | - | ✓ | - |
Proposed Models | VA-Encoder | - | ✓ | 96.7 | 97 | 3 | 3.3 | 1.7 | 1.23 | 0.11 | 143.2 | - | - | - | - | ✓ | - |
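
The four rate columns in the comparison table can be related to a confusion matrix. The sketch below uses the standard definitions of accuracy, detection rate, misdetection rate, and false alarm rate; these are assumed here rather than quoted from the paper, and the counts are hypothetical. Under these definitions the detection and misdetection rates sum to 100%, which agrees with the two proposed-model rows (98.9 + 1.1 and 97 + 3).

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute percentage metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    tn: benign traffic passed as benign fp: benign traffic flagged as attack
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "detection_rate": 100 * tp / (tp + fn),
        "misdetection_rate": 100 * fn / (tp + fn),
        "false_alarm_rate": 100 * fp / (fp + tn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = detection_metrics(tp=970, tn=960, fp=40, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```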
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Talaei Khoei, T.; Kaabouch, N. A Comparative Analysis of Supervised and Unsupervised Models for Detecting Attacks on the Intrusion Detection Systems. Information 2023, 14, 103. https://doi.org/10.3390/info14020103