A Reinforcement Learning Framework with Oversampling and Undersampling Algorithms for Intrusion Detection System
1. Introduction
- Implementing the RLFOUA framework, an automatic learning environment that applies RL techniques to dataset resampling. Unlike traditional resampling methods, it determines the next state dynamically from the current state and past rewards.
- Proposing the TFRSMOTE resampling algorithm for handling imbalanced datasets in a dynamic environment. It generates synthetic data from the minority class for oversampling and reduces the majority class for undersampling.
- Presenting the IM, a new metric that evaluates algorithm performance using a range of factors not addressed by existing metrics.
2. Related Works
2.1. Resampling Algorithms
2.2. RL
3. Methodology
3.1. Datasets
3.1.1. CSE-CIC-IDS2018
3.1.2. NSL-KDD Dataset
3.2. RLFOUA Framework Description
Algorithm 1: RLFOUA framework
Input: Original dataset; a set of machine learning algorithms, oversampling algorithms, and undersampling algorithms.
Output: A newly generated model for detecting attacks in an independent set of data.
1. Preprocessing phase
2. Employing functional agents in RL
   a. Establishing the algorithm repository
   b. Enhanced RL with functional agents for RLFOUA
      i. Call Agent 1 and Agent 2 (TFRSMOTE)
3.2.1. Preprocessing
3.2.2. Establishing the Algorithm Repository
Algorithm 2: Establishing the Algorithm Repository
Input: A set of machine learning algorithms, oversampling algorithms, and undersampling algorithms.
Output: A composite pipeline consisting of a single machine learning algorithm, an oversampling algorithm, and an undersampling algorithm.
1. Set up the machine learning algorithms with default parameters: Decision Tree, Logistic Regression, Linear Discriminant Analysis, Linear SVC, Bagging Classifier, Random Forest Classifier, Extra Trees Classifier, K Neighbors Classifier, Gaussian Process Classifier, Dummy Classifier, XGB Classifier, Easy Ensemble Classifier, and Elliptic Envelope (dimension = 12).
2. Set the oversampling algorithms: Random Oversampling, ADASYN, and SVMSMOTE (dimension = 3).
3. Set the undersampling algorithms: Random Undersampling, Nearest Neighbor, One Sided Selection, Neighborhood Cleaning Rule, and NearMiss (dimension = 5).
4. Set the actions: select one combination from the pipelines of algorithms, keep the current pipelines of algorithms, remove the pipelines of algorithms, continue the framework, or stop the framework (dimension = 5).
5. Initialize Q-values for state-action pairs: Q(s, a) for all possible states (dimension = 180) and actions (dimension = 5).
6. Return a set of pipelines of algorithms from steps 1, 2, and 3 (selecting one algorithm from each step to generate a single pipeline).
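Steps 4 to 6 of Algorithm 2 can be illustrated with a short sketch. It is not the authors' implementation: the algorithm names are placeholder strings, `build_repository`, `ACTIONS`, and `update_q` are hypothetical names, and the learning rate `alpha` is an assumption because the source does not state one. With 12 classifiers, 3 oversamplers, and 5 undersamplers, the Cartesian product yields the 180 states mentioned in step 5. Algorithm 3 sets γ = 0, in which case the standard Q-learning update Q ← Q + α(r + γ max Q' − Q) reduces to Q ← Q + α(r − Q).

```python
from itertools import product

# Action labels paraphrased from step 4 (dimension = 5).
ACTIONS = ["select", "keep", "remove", "continue", "stop"]


def build_repository(classifiers, oversamplers, undersamplers):
    """Return every (classifier, oversampler, undersampler) pipeline and a zeroed Q-table."""
    pipelines = list(product(classifiers, oversamplers, undersamplers))
    # One row per state (pipeline), one column per action: Q(s, a) = 0.
    q_table = [[0.0] * len(ACTIONS) for _ in pipelines]
    return pipelines, q_table


def update_q(q_table, state, action, reward, alpha=0.1):
    """Q-learning update with gamma = 0, so only the immediate reward counts.

    alpha is an assumed learning rate, not a value from the source.
    """
    q_table[state][action] += alpha * (reward - q_table[state][action])
    return q_table[state][action]


# Placeholder names standing in for the 12 classifiers counted in step 1.
classifiers = [f"classifier_{i}" for i in range(12)]
oversamplers = ["RandomOversampling", "ADASYN", "SVMSMOTE"]
undersamplers = ["RandomUndersampling", "NearestNeighbor", "OneSidedSelection",
                 "NeighborhoodCleaningRule", "NearMiss"]

pipelines, q_table = build_repository(classifiers, oversamplers, undersamplers)
print(len(pipelines), len(q_table[0]))
```

Because γ = 0, each Q-value tracks a running average of the rewards observed for that pipeline and action, without any look-ahead to later states.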
3.2.3. Employing Functional Agents in RL
Algorithm 3: Enhanced RL with functional agents for RLFOUA
Input: A preprocessed dataset with a composite pipeline consisting of a single machine learning algorithm, an oversampling algorithm, and an undersampling algorithm.
Output: A newly generated model for detecting attacks in an independent set of data.
Initialize the Q-table with zeros.
Set the reward discount factor (γ) to 0 in the Bellman equation (prioritizing immediate rewards).
State: Composite pipeline (ML algorithm, oversampling, undersampling)
Action: Selecting an algorithm combination
Reward: IM metric, based on F1 score, accuracy, FPR, FNR, specificity, and sensitivity
Transition: Updating Q-values
do
1. For each pipeline received from Algorithm 2 (Agent 1):
   1.1. Use the given pipeline to train on the training dataset, then make predictions on the validation dataset.
   1.2. Calculate evaluation metrics such as F1 score, accuracy, FPR, FNR, specificity, sensitivity, and time.
   1.3. Calculate the IM.
   1.4. If the IM is better than the best IM from any of the previous iterations:
      1.4.1. Save the model, algorithm details, metrics, and classification report.
   1.5. If the IM for this pipeline has improved compared with the same pipeline in the previous iteration:
      1.5.1. Increase the reward for this pipeline.
   1.6. Else:
      1.6.1. Decrease the reward for this pipeline.
   1.7. Update the Q-values in the reward table using the Bellman equation.
   1.8. TFRSMOTE algorithm (Agent 2): Set the value of k for KNN between 20 and 2000. The value is reduced dynamically if the size of the selected data is less than or equal to twice k, which ensures that at least two random data points can be selected in step 1.9.5.
   1.9. For the value of k:
      1.9.1. Predict the validation data with the model saved in step 1.4.1.
      1.9.2. Collect the FPR, FNR, and TPR data from the predictions of the RLFOUA-generated model on the validation set.
      1.9.3. Create three datasets: FNPR, which merges FPR and FNR and contains the falsely classified data; TPR, which contains the correctly classified benign data; and the remaining data.
      1.9.4. Apply the KNN algorithm to select k groups of samples from the FNPR dataset obtained in step 1.9.3.
      1.9.5. Calculate the average of two randomly selected samples from each group of the KNN results and add the averages to the FNPR dataset as new samples.
      1.9.6. Apply KNN to select k groups of samples from the TPR dataset obtained in step 1.9.3.
      1.9.7. Select one element randomly from each group.
      1.9.8. Remove the elements selected in step 1.9.7 from the TPR dataset.
      1.9.9. Generate a new dataset by combining the new TPR, the new FNPR, and the remaining data from the original dataset obtained in step 1.9.3.
   1.10. Replace the dataset with the new resampled dataset from the TFRSMOTE algorithm for the next iteration.
2. Exclude the algorithm combinations with rewards lower than the threshold from further processing. By default, this threshold is set to zero. However, retain each excluded combination's model and performance measures for later comparison with other algorithm combinations (Agent 1).
while the IM improves in at least one of the algorithm combinations compared with the previous loop iteration.
Return the new balanced dataset and the generated model.
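The resampling steps 1.8 to 1.9.8 can be sketched as follows. This is one possible reading, not the authors' code. The source does not say how the "k groups" are formed, so the sketch assumes each group is a randomly chosen seed point plus its nearest neighbours, with a group size of n // k. The names `reduce_k`, `knn_groups`, and `tfrsmote_step` are hypothetical, and the sketch handles feature vectors only, leaving labels aside.

```python
import math
import random


def reduce_k(k, n):
    """Step 1.8: shrink k until the data size n is more than twice k."""
    return min(k, max(1, (n - 1) // 2))


def knn_groups(points, k, rng):
    """Pick k random seeds; each group is a seed and its nearest neighbours.

    Assumption: group size is len(points) // k, which is at least 2
    once k has been reduced by reduce_k (needs at least two points).
    """
    size = len(points) // k
    seeds = rng.sample(range(len(points)), k)
    groups = []
    for s in seeds:
        by_distance = sorted(range(len(points)),
                             key=lambda i: math.dist(points[s], points[i]))
        groups.append(by_distance[:size])  # row indices, seed first
    return groups


def tfrsmote_step(fnpr, tpr, k, seed=0):
    """One resampling pass: oversample FNPR, undersample TPR."""
    rng = random.Random(seed)

    # Steps 1.9.4-1.9.5: add the average of two random members of each group.
    synthetic = []
    for group in knn_groups(fnpr, reduce_k(k, len(fnpr)), rng):
        a, b = rng.sample(group, 2)
        synthetic.append([(x + y) / 2 for x, y in zip(fnpr[a], fnpr[b])])
    new_fnpr = fnpr + synthetic

    # Steps 1.9.6-1.9.8: remove one random member of each group.
    drop = {rng.choice(group)
            for group in knn_groups(tpr, reduce_k(k, len(tpr)), rng)}
    new_tpr = [p for i, p in enumerate(tpr) if i not in drop]
    return new_fnpr, new_tpr
```

Step 1.9.9 then concatenates `new_fnpr`, `new_tpr`, and the remaining data. Since groups may overlap under this reading, a pass can remove fewer than k points from the TPR set.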
3.2.4. Testing Method
- Preliminary application of the RLFOUA framework on the CSE-CIC-IDS2018 dataset: The smaller section of the dataset, discussed in Section 3.1.1, is divided into training and validation subsets. Within the RLFOUA framework, machine learning algorithms are trained on the training set, and the validation set is used to compute classification metrics. The framework’s impact is assessed by comparing F1 score, accuracy, recall, and precision before and after its application. The four best-performing algorithm combinations are ranked independently for each dataset, showing how far the RLFOUA framework improves classification performance.
- Preliminary application of the RLFOUA framework on the NSL-KDD dataset: The same analysis is conducted on the entire NSL-KDD dataset, using the same partitioning into training and validation sets. The ranking of the top four algorithm combinations shows the same improvement in classification performance under the RLFOUA framework.
- Primary testing on an independent set of the CSE-CIC-IDS2018 dataset: To assess the generalizability of the RLFOUA framework, independent data from the CSE-CIC-IDS2018 dataset are used. The entire dataset is divided into three subsets: training, validation, and independent testing. The framework is trained and validated on the first two, yielding a new model. This model is then evaluated on the independent testing set to check that the framework’s performance remains consistent beyond the training and validation phases.
- Primary testing on an independent set of the NSL-KDD dataset: The approach used in the third test is replicated on the NSL-KDD dataset. The dataset is split into training, validation, and independent testing subsets, and the framework is trained and validated accordingly. The resulting model is then evaluated on the separate testing set to check the method’s generalizability and the stability of its performance beyond the training and validation stages.
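The three-way division used in the two primary tests can be sketched as below. This is a minimal illustration under stated assumptions: the 60/20/20 fractions and the per-class (stratified) split are not given in the source, and `tripartite_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def tripartite_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, test) index lists, stratified by class label.

    The default fractions are an assumption; the source does not state them.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        # Whatever is left forms the independent testing subset.
        test += indices[n_train + n_val:]
    return train, validation, test


# Toy labels only; these are not records from either dataset.
labels = ["benign"] * 90 + ["infiltration"] * 10
train, validation, test = tripartite_split(labels)
print(len(train), len(validation), len(test))
```

Splitting per class keeps the class ratio similar across the three subsets, which matters for imbalanced data. Classes with only one or two records cannot be represented in all three subsets under any split.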
4. Results
4.1. Preliminary Application of RLFOUA Framework on CSE-CIC-IDS2018 Dataset
4.2. Preliminary Application of RLFOUA Framework on NSL-KDD Dataset
4.3. Primary Testing on Independent Set for CSE-CIC-IDS2018
4.4. Primary Testing on Independent Set for NSL-KDD
5. Discussion
6. Conclusions
Author Contributions
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
- Abedzadeh, N.; Jacobs, M. Using Markov Chain Monte Carlo Algorithm for Sampling Imbalance Binary IDS Datasets. In Proceedings of the 12th International Workshop on Security, Privacy, Trust for Internet of Things (IoTSPT) at the ICCCN 2022, Honolulu, HI, USA, 25–28 July 2022; pp. 1–6.
- Abedzadeh, N.; Jacobs, M. A Survey in Techniques for Imbalanced IDS Datasets. In Proceedings of the ICICCS 2022: 16th International Conference on Intelligent Computing and Control Systems, Madurai, India, 25–27 May 2022; pp. 1–6.
- Ma, X.; Shi, W. AESMOTE: Adversarial Reinforcement Learning with SMOTE for Anomaly Detection. IEEE Trans. Netw. Sci. Eng. 2021, 8, 790–802.
- Phetlasy, S.; Ohzahata, S.; Wu, C.; Kato, T. Applying SMOTE for a Sequential Classifiers Combination Method to Improve the Performance of Intrusion Detection System. In Proceedings of the IEEE International Conference on Dependable, Autonomic and Secure Computing, Fukuoka, Japan, 5–8 August 2019; pp. 1–6.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Qazi, N.; Raza, K. Effect of feature selection, synthetic minority over-sampling (SMOTE), and under-sampling on class imbalance classification. In Proceedings of the 14th International Conference on Modelling and Simulation, Cambridge, UK, 28–30 March 2012; pp. 145–150.
- Tesfahun, A.; Bhaskari, D.L. Intrusion detection using random forests classifier with SMOTE and feature reduction. In Proceedings of the International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, Pune, India, 15–16 November 2013; pp. 127–132.
- Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A. Variational data generative model for intrusion detection. Knowl. Inf. Syst. 2019, 60, 569–590.
- Sun, Y.; Liu, F. SMOTE-NCL: A re-sampling method with filter for network intrusion detection. In Proceedings of the 2nd IEEE International Conference on Computer and Communications, Sanya, China, 27–29 December 2016; pp. 1157–1161.
- Karatas, G.; Demir, O.; Sahingoz, O.K. Increasing the Performance of Machine Learning-Based IDSs on an Imbalanced and Up-to-Date Dataset. IEEE Access 2020, 8, 32150–32162.
- Yan, B.; Han, G. LA-GRU: Building combined intrusion detection model based on imbalanced learning and gated recurrent unit neural network. Secur. Commun. Netw. 2018, 2018, 1–13.
- Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion Detection System Using Machine Learning for Vehicular Ad Hoc Networks Based on ToN-IoT Dataset. IEEE Access 2021, 9, 142206–142217.
- Jimoh, I.A.; Ismaila, I.; Olalere, M. Enhanced Decision Tree-J48 With SMOTE Machine Learning Algorithm for Effective Botnet Detection in Imbalanced Dataset. In Proceedings of the 15th International Conference on Electronics Computer and Computation, Abuja, Nigeria, 10–12 December 2019; pp. 1–6.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018.
- Servin, A.; Kudenko, D. Multi-Agent Reinforcement Learning for Intrusion Detection. Ph.D. Thesis, University of York, York, UK, 2009.
- Huang, C.; Wu, Y.; Zuo, Y.; Pei, K.; Min, G. Towards experienced anomaly detector through reinforcement learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1–6.
- Laptev, N.; Amizadeh, S.; Flint, I. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’15, Sydney, Australia, 10–13 August 2015; pp. 1939–1947.
- Caminero, G.; Lopez-Martin, M.; Carro, B. Adversarial environment reinforcement learning algorithm for intrusion detection. Comput. Netw. 2019, 159, 96–109.
- Vij, C.; Saini, H. Intrusion Detection Systems: Conceptual Study and Review. In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control, Xi’an, China, 9–11 April 2021; pp. 694–700.
- Maseer, Z.K.; Yusof, R.; Bahaman, N.; Mostafa, S.A.; Foozy, C.F.M. Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset. IEEE Access 2021, 9, 1–6.
- A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018). Available online: https://registry.opendata.aws/cse-cicids2018 (accessed on 1 January 2022).
- Mbow, M.; Koide, H.; Sakurai, K. An Intrusion Detection System for Imbalanced Dataset Based on Deep Learning. In Proceedings of the 2021 9th International Symposium on Computing and Networking, Matsue, Japan, 23–26 November 2021; pp. 38–47.
- Gopalan, S.S.; Ravikumar, D.; Linekar, D.; Raza, A.; Hasib, M.; Roy, B.K. Balancing Approaches towards ML for IDS: A Survey for the CSE-CIC IDS Dataset. In Proceedings of the 2020 International Conference on Communications, Signal Processing, and Their Applications, Sharjah, United Arab Emirates, 16–18 March 2021; pp. 1–6.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Nashville, TN, USA, 30 March–2 April 2009; pp. 1–6.
- Abedzadeh, N.; Jacobs, M. GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets. In Proceedings of the DMMLACS: 3rd International Special Session on Data Mining and Machine Learning Applications for Cyber Security, Rome, Italy, 15–17 November 2023; pp. 1–8.
Algorithm | Attack Type | Precision | Recall | F1 Score | Support | Accuracy |
Balanced Bagging Classifier + Random Oversampling + Random Undersampling + TFRSMOTE | Benign | 1 | 0.9996 | 0.9998 | 7322 | 0.9996 |
 | Infiltration | 0.9577 | 1 | 0.9784 | 68 | |
Random Forest Classifier + Random Oversampling + Neighborhood Cleaning Rule + TFRSMOTE | Benign | 1 | 0.9999 | 0.9999 | 7308 | 0.9999 |
 | Infiltration | 0.9821 | 1 | 0.991 | 55 | |
Extra Trees Classifier + ADASYN + Neighborhood Cleaning Rule + TFRSMOTE | Benign | 0.9999 | 0.9986 | 0.9992 | 7308 | 0.9985 |
 | Infiltration | 0.8485 | 0.9836 | 0.9106 | 57 | |
Bagging Classifier + ADASYN + Neighborhood Cleaning Rule + TFRSMOTE | Benign | 0.9999 | 0.9966 | 0.9982 | 7322 | 0.9965 |
 | Infiltration | 0.7059 | 0.9825 | 0.8219 | 61 | |
Algorithm | Attack Type | Precision | Recall | F1 Score | Support | Frequency | Accuracy |
Decision Tree Classifier + Random Oversampling Algorithm + Near Miss Undersampling + TFRSMOTE | Neptune | 0.9999 | 0.9991 | 0.9995 | 16,585 | 41,214 | |
 | Nmap | 0.9685 | 0.9898 | 0.979 | 590 | 1493 | |
 | Normal | 0.9982 | 0.9987 | 0.9985 | 26,709 | 67,343 | |
 | Rootkit | 0.0694 | 1 | 0.1299 | 5 | 10 | |
 | Multihop | 1 | 1 | 1 | 7 | 7 | |
 | Weighted Avg | 0.9885 | 0.9847 | 0.9847 | 50,378 | | 0.9847 |
Bagging Classifier + Random Oversampling Algorithm + Near Miss Undersampling + TFRSMOTE | Neptune | 0.9999 | 0.9993 | 0.9996 | 16,579 | 41,214 | |
 | Nmap | 0.9662 | 0.9967 | 0.9812 | 603 | 1493 | |
 | Normal | 0.9994 | 0.9994 | 0.9994 | 26,828 | 67,343 | |
 | Rootkit | 0.75 | 1 | 0.8571 | 3 | 10 | |
 | Multihop | 1 | 1 | 1 | 4 | 7 | |
 | Weighted Avg | 0.9982 | 0.9981 | 0.9981 | 50,390 | | 0.9981 |
XGB Classifier + Random Oversampling Algorithm + Neighborhood Cleaning Rule Undersampling + TFRSMOTE | Neptune | 0.9999 | 0.9999 | 0.9999 | 16,366 | 41,214 | |
 | Nmap | 0.9982 | 0.993 | 0.9956 | 568 | 1493 | |
 | Normal | 1 | 0.9993 | 0.9996 | 27,123 | 67,343 | |
 | Rootkit | 0.5714 | 1 | 0.7273 | 4 | 10 | |
 | Multihop | 1 | 1 | 1 | 2 | 7 | |
 | Weighted Avg | 0.9996 | 0.9995 | 0.9995 | 50,378 | | 0.9995 |
Extra Trees Classifier + Random Oversampling Algorithm + Neighborhood Cleaning Rule Undersampling + TFRSMOTE | Neptune | 0.9999 | 1 | 1 | 16,462 | 41,214 | |
 | Nmap | 1 | 0.9951 | 0.9975 | 613 | 1493 | |
 | Normal | 0.9999 | 0.9994 | 0.9996 | 26,852 | 67,343 | |
 | Rootkit | 0.8 | 0.8 | 0.8 | 5 | 10 | |
 | Multihop | 1 | 1 | 1 | 4 | 7 | |
 | Weighted Avg | 0.9996 | 0.9995 | 0.9995 | 50,420 | | 0.9995 |
Attack Type | Precision | Recall | F1 Score | Support |
Benign | 0.9953 | 0.986 | 0.9907 | 538,865 |
Infiltration | 0.9348 | 0.9774 | 0.9556 | 110,453 |
Accuracy | | | 0.9846 | 649,318 |
Macro Avg | 0.9651 | 0.9817 | 0.9731 | 649,318 |
Weighted Avg | 0.985 | 0.9846 | 0.9847 | 649,318 |
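The two averaged rows of the CSE-CIC-IDS2018 table above differ only in how the per-class values are combined, which a short calculation makes concrete. The per-class numbers below are copied from the Benign and Infiltration rows; `macro_average` and `weighted_average` are hypothetical helper names. Because the inputs are already rounded to four decimals, the results match the table only up to rounding.

```python
# Per-class rows copied from the table: (precision, recall, F1, support).
rows = {
    "Benign": (0.9953, 0.986, 0.9907, 538_865),
    "Infiltration": (0.9348, 0.9774, 0.9556, 110_453),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column: every class counts equally."""
    return sum(r[column] for r in rows.values()) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column weighted by each class's support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[column] * r[3] for r in rows.values()) / total


for name, column in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name,
          round(macro_average(rows, column), 4),
          round(weighted_average(rows, column), 4))
```

The macro average is the more sensitive of the two to the minority class, since Infiltration makes up roughly 17% of the support but half of the macro mean.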
Attack Type | Precision | Recall | F1 Score | Support | Frequency |
Back | 1 | 0.9873 | 0.9936 | 79 | 956 |
Buffer_overflow | 1 | 0.8 | 0.8889 | 5 | 30 |
Ftp_write | 0 | 0 | 0 | 1 | 8 |
Guess_passwd | 1 | 1 | 1 | 8 | 53 |
Ipsweep | 0.9945 | 0.9973 | 0.9959 | 365 | 3599 |
Land | 0.5 | 1 | 0.6667 | 1 | 18 |
Loadmodule | 0 | 0 | 0 | 1 | 9 |
Multihop | 1 | 1 | 1 | 1 | 7 |
Neptune | 1 | 1 | 1 | 4147 | 41,214 |
Nmap | 0.986 | 0.986 | 0.986 | 143 | 1493 |
Normal | 0.9976 | 0.9996 | 0.9986 | 6713 | 67,343 |
Pod | 1 | 1 | 1 | 18 | 201 |
Portsweep | 0.9963 | 1 | 0.9982 | 271 | 2931 |
Rootkit | 0 | 0 | 0 | 2 | 10 |
Satan | 1 | 0.9864 | 0.9931 | 367 | 3633 |
Smurf | 1 | 1 | 1 | 283 | 2646 |
Teardrop | 1 | 1 | 1 | 95 | 892 |
Warezclient | 0.978 | 0.9271 | 0.9519 | 96 | 890 |
Warezmaster | 1 | 1 | 1 | 1 | 20 |
Accuracy | | | 0.9981 | 12,597 | |
Macro Avg | 0.8133 | 0.8255 | 0.8144 | 12,597 | |
Weighted Avg | 0.9978 | 0.9981 | 0.9979 | 12,597 | |
Method | F1 Score | Accuracy | Precision | Recall |
RLFOUA | 0.9061 | 0.9981 | 0.9055 | 0.9118 |
AESMOTE | 0.8243 | 0.8209 | 0.8411 | 0.8209 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abedzadeh, N.; Jacobs, M. A Reinforcement Learning Framework with Oversampling and Undersampling Algorithms for Intrusion Detection System. Appl. Sci. 2023, 13, 11275. https://doi.org/10.3390/app132011275