Article

Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data

1 Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA
2 Department of Mathematics and Statistics, University of West Florida, Pensacola, FL 32514, USA
* Author to whom correspondence should be addressed.
Computers 2023, 12(10), 204; https://doi.org/10.3390/computers12100204
Submission received: 14 July 2023 / Revised: 19 September 2023 / Accepted: 21 September 2023 / Published: 11 October 2023
(This article belongs to the Special Issue Big Data Analytic for Cyber Crime Investigation and Prevention 2023)

Abstract
Machine Learning is widely used in cybersecurity for detecting network intrusions. Though network attacks are increasing steadily, the percentage of such attacks in actual network traffic is significantly small, and herein lies the problem of training Machine Learning models to detect and classify malicious attacks among routine traffic. The ratio of actual attacks to benign data is significantly low, and such data therefore forms highly imbalanced datasets. In this work, we address this issue using data resampling techniques. Though several oversampling and undersampling techniques are available, this paper addresses how such techniques can be used most effectively. Two oversampling techniques, Borderline SMOTE and SVM-SMOTE, are used for oversampling minority data, and random undersampling is used for undersampling majority data. Both oversampling techniques use KNN after selecting a random minority sample point; hence, the impact of varying KNN values on the performance of the oversampling techniques is also analyzed. Random Forest is used for the classification of the rare attacks. This work is performed on a widely used cybersecurity dataset, UNSW-NB15, and the results show that 10% oversampling gives better results for both BSMOTE and SVM-SMOTE.

1. Introduction

Cyberattacks have become a norm of the digital world. Ref. [1] states that there are only two types of companies in the world: those that have been hacked and those that do not yet know they have been hacked. Attacks, by individuals or organizations, aim to steal, block access to, or delete information. Cyberattacks affect the ecosystem of industry and result in huge financial losses. In 2021, the global average cost of a data breach was USD 4 million, and in the USA it was twice that amount [2]. Cyberattacks not only damage the reputation of an enterprise but also lead to legal complications. Cybersecurity teams in every organization develop, maintain, and enforce policies and systems to identify and prevent such attacks on the network. In the early days, Intrusion Detection Systems (IDSs) comprised either Signature-Based Detection or Behavior-Based Detection; the former relied on the signature of an attack being similar to known signatures, while the latter compared the profile of the attack with the normal behavior of standard profiles [3]. Though network attacks are increasing steadily, the ratio of attacks to regular network traffic is still significantly small, creating highly imbalanced datasets. This leads to the problem of effectively training ML models to detect and classify malicious traffic, especially rare malicious (attack) traffic. Hence, predicting rare attacks in imbalanced datasets has become a significant problem. In this work, we address this issue by using data resampling techniques that oversample minority data and undersample majority data.
Though several oversampling and undersampling techniques are available [4], this paper uses two oversampling techniques, Borderline SMOTE (BSMOTE) and SVM-SMOTE, in varying percentages for oversampling minority data. Random Undersampling (RU) is used for undersampling majority data. The oversampling techniques use K-Nearest Neighbor (KNN) to identify neighbors after selecting a random minority sample point; the impact of different KNN values on the performance of the oversampling techniques has also been studied. Finally, the Random Forest (RF) classifier, which has previously been used successfully for the classification of imbalanced data [5], is used for classification. This work is done on UNSW-NB15 [6], a well-researched cybersecurity dataset with many minority classes, or rare attacks; this work looks at three of the rarest classes. Hence, the points that make this paper unique are:
  • The paper attempts to determine the optimal oversampling ratio needed to classify an attack with high accuracy; oversampling percentages are varied from 10% to 100%, with undersampling kept at a constant 50%;
  • The paper determines whether the order of resampling, that is, oversampling before undersampling or undersampling before oversampling, has an impact on classification;
  • The paper studies whether there is any difference between BSMOTE and SVM-SMOTE in this experimental setup;
  • The paper examines the impact of various KNN values on the oversampling techniques.
The rest of this paper is organized as follows. Section 2 presents the background; Section 3 presents work related to oversampling and undersampling; Section 4 details the data used in this paper; Section 5 contains the experimental design; Section 6 outlines the hardware and software configurations; Section 7 presents the metrics used to report the results; Section 8 presents the results and discussion; Section 9 collates the conclusions and presents future work.

2. Background

2.1. Resampling

Resampling is primarily used with imbalanced datasets to modify the data distribution in the training dataset [7]. It is considered an effective way of obtaining a more balanced data distribution in imbalanced datasets [8]. But before starting resampling, it is essential to understand why it is needed. ML classifiers get skewed towards classes that have more data; since the majority class(es) have more data, classifiers often give high accuracy, but only because they are predicting the majority class(es) rather than actually classifying the minority classes. Hence, to accurately identify network attacks, which are usually the minority classes, using an ML classifier, it is necessary to balance the classes, and therefore resampling is needed.
For a dataset to be considered highly imbalanced, most of its data should come from a few majority classes, with very few data points in the minority classes. In the case of network attacks, benign network traffic forms the majority of the population and malicious attacks form a minority segment; hence, we term the attacks in the minority segment rare attacks. Such rare attacks are complicated to identify in the real world, and training an ML model to detect them is even harder.

2.1.1. Undersampling

Undersampling is a technique in which the majority class instances are reduced to bring balance to the dataset. Depending on the resampling type, the majority class samples may be analyzed before being removed from the distribution. Brute-force approaches include Random Undersampling (RU), in which the algorithm does not know whether the data points being removed are critical.

2.1.2. Oversampling

Oversampling addresses the minority classes in the dataset. It either creates duplicates of the existing minority class samples, as in the case of Random Oversampling (RO), or generates new synthetic points in the feature space, as in the case of the SMOTE family of methods. Oversampling of the minority class is done to scale it up towards the count of the majority class; that is, 10% oversampling means the existing minority class count is increased to match ten percent of the majority class count.
In this work, resampling methods were not applied to the whole dataset but were limited to the training dataset. For ML models, it is imperative to monitor the class distribution in the training set so that it reflects the patterns of the real population. Stratified sampling was used to ensure that, after the split, each class was represented in both the training and testing data.
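As an illustration, the sketch below (a minimal example on synthetic data, not the study's actual code) applies 10% BSMOTE oversampling to the training split only. In imbalanced-learn, a float sampling_strategy for a binary problem is the desired minority-to-majority ratio, which matches the definition of the oversampling percentage used here.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import BorderlineSMOTE

# Synthetic stand-in for an imbalanced traffic dataset (~0.5% minority).
X, y = make_classification(n_samples=20000, weights=[0.995, 0.005], random_state=0)

# Stratified split keeps the rare class represented in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# 10% oversampling: grow the minority to 10% of the majority count.
over = BorderlineSMOTE(sampling_strategy=0.1, k_neighbors=5, random_state=0)
X_res, y_res = over.fit_resample(X_tr, y_tr)
print(Counter(y_tr), Counter(y_res))
```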

2.2. K-Nearest Neighbor

K-Nearest Neighbor (KNN) works by finding the nearest instances surrounding the instance to be classified and predicting the output based on those K instances. It is a non-parametric machine learning algorithm that makes no assumptions about the form of the mapping function [9]. To classify a given instance, it looks up the instance's K neighbors, selected by a distance measure in the feature space, and assigns the instance to the appropriate class. KNN is also used by synthetic data generation techniques such as SMOTE, which create new synthetic samples on the lines connecting an existing minority sample to its k-nearest neighbors [10,11]. Since KNN plays a vital role in identifying misclassified instances in the case of BSMOTE, it was decided to vary K and analyze the results.
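A toy example of this voting behavior, with made-up points and labels, using scikit-learn:

```python
from sklearn.neighbors import KNeighborsClassifier

# Five labeled training points in a 2-D feature space (illustrative values).
X_train = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [1.1, 1.1]]
y_train = [0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3; Euclidean distance by default
knn.fit(X_train, y_train)

# The query's 3 nearest neighbors are all labeled 1, so the vote returns 1.
print(knn.predict([[0.95, 0.9]]))
```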

2.3. BSMOTE and SVM-SMOTE

Borderline SMOTE (BSMOTE) is an improvement of classic SMOTE: instead of oversampling all the minority class instances, the ones nearest to the borderline are identified, and synthetic samples are generated from them [10]. SVM-SMOTE differs from BSMOTE in that a standard SVM is first trained on the training dataset and the borderline area is then approximated from its support vectors. In SVM-SMOTE too, new data generation happens on the line joining a minority class support vector with its nearest neighbors [11]. In SVM-SMOTE, however, KNN values are not used to generate the decision boundary; they are used to select the nearest neighbors from which the new instances are created. Because of the differing roles KNN plays in these two oversampling techniques, both were chosen for analysis in this paper.
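Both samplers are exposed by imbalanced-learn; the sketch below (synthetic data and assumed parameter values) shows how the K studied in this paper maps onto their k_neighbors argument:

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import BorderlineSMOTE, SVMSMOTE

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01], random_state=0)

for k in (3, 5, 10):  # the KNN values varied in this study
    bsm = BorderlineSMOTE(sampling_strategy=0.1, k_neighbors=k, random_state=0)
    svm = SVMSMOTE(sampling_strategy=0.1, k_neighbors=k, random_state=0)
    _, y_b = bsm.fit_resample(X, y)
    _, y_s = svm.fit_resample(X, y)
    print(k, Counter(y_b), Counter(y_s))
```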

2.4. Random Forest

Random Forest (RF) is a widely used machine learning classifier that combines the decisions of multiple decision trees to produce a final classification label [12]. In the RF algorithm, each decision tree is generated differently from a regular decision tree: features are randomly selected when a decision tree node splits, and the best of the randomly selected features is chosen based on statistical measures such as Information Gain or the Gini Index.
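A minimal sketch of such a classifier in scikit-learn (the hyperparameter values are assumptions, not the study's reported settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees whose votes form the final label
    max_features="sqrt",  # random subset of features examined at each split
    criterion="gini",     # split-quality measure ("entropy" gives information gain)
    random_state=0,
)
rf.fit(X, y)
print(rf.predict(X[:5]))
```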

3. Related Works

At present, Machine Learning (ML) is widely used in cybersecurity to detect network intrusions [13,14,15]. Ref. [13] looked at the application of K-nearest neighbor (KNN) and artificial neural networks (ANNs) to develop an algorithm for IDSs. Ref. [14] applied the concept of the one-class classifier to the problem of anomaly detection in communication networks. Several research works have been carried out to achieve better classification results using ML when the datasets are imbalanced, and many review papers cover this topic [16,17,18,19].
Next, papers that present different techniques to handle ML classification in the imbalanced dataset are presented. Ref. [10] presented two new minority oversampling techniques, borderline-SMOTE1 and borderline-SMOTE2. In both these techniques, the minority data near the borderline were oversampled. Authors of [10] claim that their methods achieved better TP rates and F-value scores than regular SMOTE and other oversampling methods.
Using a hybrid approach combining ADASYN oversampling and Tomek links, researchers improved the detection of network intrusions on the NSL-KDD dataset [20]. A heterogeneous ensemble learning approach developed in another study [21] improved the detection rate of the Worms attack type in the UNSW-NB15 dataset. In one work, the authors used a combination of preprocessing techniques, such as data standardization, normalization, feature selection, and class balancing, to improve the efficiency of the Random Forest classifier [5].
Ref. [22] proposed TLUSBoost, a Tomek-link-based undersampling and boosting algorithm. TLUSBoost finds outliers using the Tomek-link concept and eliminates probable redundant instances, thereby conserving the dataset's characteristics; AdaBoost is then used for boosting. Their ensemble method had better experimental results than many other proposed methods.
Ref. [23] proposed a new algorithm that combined boosting with heuristic undersampling and distribution-based sampling (HUSDOS-Boost) to solve the extremely imbalanced and small minority data problem. This algorithm was tested on health care data and presented good results.
There are also a few works on resampling ratios. Ref. [24]'s study showed that increasing the percentage of the minority class from 0.1% to 1.0% of the majority class, with partial balancing of the majority class, gives better performance than balancing the classes to a 50:50 ratio. Ref. [25] studied the impact of class distribution when the size of the training dataset is limited and found that naturally occurring distributions do not give better performance; this work also found that the optimal distribution of a dataset has minority class samples between 50% and 90%. Ref. [26] investigated the effectiveness of a class proportion threshold using different classifiers. Ref. [27] studied class imbalance in datasets using four different classifiers and six performance evaluation criteria and found that oversampling the minority class gave higher accuracy (96%) than undersampling the majority class (77%). Oversampling the minority data rather than undersampling the majority data helped detect the minority classes [28]; in the same work, the authors also found that resampling will not impact the detection of rare classes if the data is not highly imbalanced.
The uniqueness of our work in terms of resampling ratios is that we looked at oversampling ratios from 0.1 to 1.0 (that is, 10% to 100% of the majority class count) at intervals of 0.1. Keeping the undersampling constant at 50%, we looked at whether oversampling before undersampling is better or vice versa. Also, using BSMOTE and SVM-SMOTE, we looked at different K values.

4. The Data: UNSW-NB15

UNSW-NB15 [6], created in 2015, contains 2.5 million rows, of which 2.2 million rows are of regular or benign traffic, and the other 300,000 are attack traffic. The attack traffic has nine attack categories, of which the smallest categories are Worms, Shellcode, and Backdoors, making up 0.006%, 0.059%, and 0.091% of the total traffic, respectively. Hence, these categories can be considered rare attacks and are particularly interesting in this research. Figure 1 presents a distribution of the attack families in this dataset.
Worms are a classification of malware that is self-propagating within a single host and across networks [29]. Shellcode is a type of exploit payload that modifies a program’s flow of execution to spawn a command interpreter [30]. Backdoor is a classification of malware that allows unauthorized access to a computer [29].

5. Experimental Design

Two sets of experiments were performed to analyze the resampling effectiveness: (i) Oversampling minority data followed by undersampling, and (ii) Undersampling majority data followed by oversampling. Two oversampling techniques, BSMOTE and SVM-SMOTE, were used in both experiments, with RU held constant at 50%. A stratified split of the datasets into training and testing data was used for both sets of experiments. A stratified split is performed to ensure that the training and testing datasets follow the original distribution of the dataset and that minority class instances are present before the synthetic generation using resampling techniques. This experimental design is presented in Figure 2.
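A hedged sketch of the two orderings using imbalanced-learn pipelines follows (synthetic data and class labels are assumptions; note that imblearn expresses undersampling targets as a minority-to-majority ratio, so reducing the majority to half its original count is written here as an explicit dict):

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=50000, weights=[0.995, 0.005], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
n_major = Counter(y_tr)[0]  # benign/majority count in the training split

# (i) Oversample the minority to 10% of the majority, then halve the majority.
over_then_under = Pipeline([
    ("over", BorderlineSMOTE(sampling_strategy=0.1, k_neighbors=5, random_state=0)),
    ("under", RandomUnderSampler(sampling_strategy={0: n_major // 2}, random_state=0)),
])

# (ii) Halve the majority first; the minority then only needs to reach
# 10% of the reduced majority count.
under_then_over = Pipeline([
    ("under", RandomUnderSampler(sampling_strategy={0: n_major // 2}, random_state=0)),
    ("over", BorderlineSMOTE(sampling_strategy=0.1, k_neighbors=5, random_state=0)),
])

print(Counter(over_then_under.fit_resample(X_tr, y_tr)[1]))
print(Counter(under_then_over.fit_resample(X_tr, y_tr)[1]))
```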

Preprocessing

Preprocessing in this paper follows [31]. First, the time_stamps column was dropped, since no time-related analysis was being performed. Categorical data, that is, protocol, state, and attack category, were converted into numeric values. Normalization was applied to the continuous data, following [31]. Then, information gain was calculated, and features with low information gain were dropped; the dropped columns are shown with an asterisk (*) in Table 1. Information gain measures the reduction in the randomness of the dataset, where randomness is measured by a class's entropy [32].
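A hedged sketch of this preprocessing step follows (column names follow Table 1, but the cutoff value and helper name are illustrative assumptions; scikit-learn's mutual_info_classif is used here as an estimate of entropy-based information gain):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

def rank_by_information_gain(df: pd.DataFrame, target: str) -> pd.Series:
    """Encode categoricals, normalize, and rank features by estimated gain."""
    df = df.copy()
    for col in df.select_dtypes(include="object").columns:  # e.g., proto, state
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    features = df.drop(columns=[target])
    X = MinMaxScaler().fit_transform(features)              # normalization step
    gain = mutual_info_classif(X, df[target], random_state=0)
    return pd.Series(gain, index=features.columns).sort_values(ascending=False)

# Usage: keep only features whose estimated gain clears an illustrative cutoff.
# ranking = rank_by_information_gain(df, "attack_cat")
# kept = ranking[ranking >= 0.09].index.tolist()
```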

6. Hardware and Software Configurations

The hardware and software configurations used for this analysis are listed in Table 2 and the Python libraries used in this paper are presented in Table 3.

7. Metrics Used for Presentation of Results

7.1. Classification Metrics

Since accuracy is biased towards classes with more data, which in this case would be the majority class, or benign data, accuracy is not a good metric for evaluating the classification of imbalanced data. To effectively evaluate the classification results for imbalanced data, the following metrics were used: Precision, Recall, F-score, and Macro Precision.
Precision: Precision is the proportion of predicted positive cases correctly labeled as positive [33].
Precision = True Positives/(True Positives + False Positives)
Recall: This is the percentage of correctly classified positive samples, also referred to as the True Positive Rate [33].
Recall = True Positive Rate = True Positives/(True Positives + False Negatives)
F-Score: The F-score is high only when both the precision and recall are high [32].
F-Score = 2 × (Precision × Recall)/(Precision + Recall)
Macro Precision: This is the arithmetic mean of the individual classes' precision values.
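These four metrics map directly onto scikit-learn calls; the toy labels below are assumptions for illustration:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # illustrative ground truth (1 = attack)
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # illustrative predictions

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f_score = f1_score(y_true, y_pred)            # 2PR / (P + R)
macro_precision = precision_score(y_true, y_pred, average="macro")
print(precision, recall, f_score, macro_precision)
```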

7.2. Welch’s t-Tests

Welch's t-test was used to test the equality of two population means under unequal variances. Because the variances are unequal, its degrees of freedom (d.f.) are obtained using the Satterthwaite approximation.
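In SciPy, passing equal_var=False to ttest_ind gives Welch's test with Welch–Satterthwaite degrees of freedom; the two ten-run samples below are illustrative assumptions:

```python
from scipy import stats

# Ten F-scores per oversampling ratio (made-up values for illustration).
f_at_10pct = [0.71, 0.72, 0.70, 0.71, 0.73, 0.72, 0.71, 0.70, 0.72, 0.71]
f_at_90pct = [0.66, 0.67, 0.67, 0.66, 0.68, 0.66, 0.67, 0.65, 0.67, 0.66]

# One-tailed Welch's t-test: is the 10% ratio's mean F-score greater?
t, p = stats.ttest_ind(f_at_10pct, f_at_90pct, equal_var=False,
                       alternative="greater")
print(t, p)  # p < 0.10 would favor the lower (10%) oversampling ratio
```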

8. Results and Discussion

This section presents the results of the runs, that is, oversampling followed by undersampling and undersampling followed by oversampling, using both BSMOTE and SVM-SMOTE and varying KNN. For all processing, undersampling was kept at 50% of the majority class, and oversampling was varied from 10% to 100%. In UNSW-NB15, non-attack (benign) data forms the majority class. All results are averages of ten runs, and the best results are highlighted in the following tables.

8.1. Selection of KNN

Since the SMOTE algorithms use KNN before the selection of minority class instances for oversampling, the effect of varying KNN is studied. KNN = 3, 5, and 10 are used.

8.2. BSMOTE Oversampling Followed by Random Undersampling

In this set of runs, oversampling the minority data was followed by undersampling the majority data. Oversampling percentages were varied from 10% to 100% of the majority class count.

8.2.1. Worms: BSMOTE Oversampling Varying KNN followed by Random Undersampling

Table 4 presents the results for Worms when oversampling minority data using BSMOTE with KNN = 3, followed by RU of the majority data. Oversampling of 0.1 (10%) has the best overall results. This conclusion is arrived at by calculating probability values from the Welch's t-test scores for the metrics: precision, recall, F-score, and macro precision. The significance level was set at p < 0.10 to allow more chances of rejecting the null hypothesis.
From Table 5, Welch's t-test values are first calculated between the metrics of the 0.1 and 0.2 oversampling ratios, followed by the p-value calculation. The idea is to observe whether the metrics increase or decrease as the oversampling percentage varies, so a one-tailed t-test is performed. The probability values for all four metrics are analyzed, and if any of them falls below the significance level, it is marked appropriately. There are three possible outcomes of this analysis, as shown in Table 6.
For example, in Table 5, between 0.1 and 0.2, no statistical differences are observed, so 0.1 is preferred. Next, 0.1 is compared with 0.3, and so on. If we instead consider 0.1 vs. 0.9, all the t-test values are positive, and significant differences are observed in precision and macro precision; so, 0.1 oversampling is considered better than 0.9. For efficiency in execution time, a lower oversampling percentage is always preferable.
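A sketch of this decision rule for a single metric (the function and variable names are hypothetical, not the authors' code):

```python
from scipy import stats

def compare_ratios(lower_runs, higher_runs, alpha=0.10):
    """Return which oversampling ratio a metric favors, per the Table 6 rules."""
    t, p_two_sided = stats.ttest_ind(lower_runs, higher_runs, equal_var=False)
    if p_two_sided / 2 < alpha:          # one-tailed significance at p < 0.10
        return "lower" if t > 0 else "higher"
    return "tie"  # no statistical difference: prefer the cheaper, lower ratio
```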
Table 7 presents the results for Worms for oversampling using BSMOTE for KNN = 5, followed by RU, and Table 8 presents the results for KNN = 10. For both KNN = 5 and KNN = 10, 0.1 (10%) BSMOTE oversampling has the best results across all metrics. The Welch’s t-test analysis results for these values are presented in Table 9 and Table 10, respectively.

8.2.2. Shellcode: BSMOTE Oversampling Varying KNN Followed by Random Undersampling

Shellcode has the second-lowest number of occurrences (1511 out of 321,283 attacks) in the UNSW-NB15 dataset. Table 11, Table 12, and Table 13 present the BSMOTE oversampling results followed by RU for Shellcode for KNN = 3, 5, and 10, respectively. When Welch's t-test scores were calculated to determine the best oversampling percentage, all three KNN values had 0.1 as the best oversampling percentage for better minority sample prediction.
Welch’s t-test results for runs of BSMOTE oversampling with KNN = 3 for Shellcode are presented in Table 14. It can be noted that 0.1 had no statistical difference with 0.2, 0.3, 0.5 and 0.6 oversampling percentages. It also produced better results than 0.8, 0.9 and 1.0 for precision and macro-precision.
Welch's t-test analyses are presented in Table 15 and Table 16 for BSMOTE oversampling of the Shellcode data followed by RU using KNN = 5 and 10, respectively. In both sets of runs, 0.1 oversampling gave better results than the higher oversampling percentages.

8.2.3. Backdoors: BSMOTE Oversampling Varying KNN Followed by Random Undersampling

Backdoors had more occurrences than Shellcode (2329 out of 321,283 attacks), but it is still a minority class in the distribution. Table 17, Table 18 and Table 19 present the results of varying oversampling percentages using BSMOTE followed by RU for KNN = 3, 5 and 10, respectively.
Welch’s t-test calculations result in 0.1 (10%) BSMOTE oversampling having better overall results when compared to other oversampling percentages. For KNN = 3, 5 and 10 in backdoors, 10% BSMOTE oversampling was sufficient for the ML model to predict the minority class samples in the testing dataset as presented in the Table 20, Table 21, and Table 22, respectively.

8.3. SVM-SMOTE Oversampling Followed by Random Undersampling

SVM-SMOTE is an extension of the SMOTE technique where support vectors are used to identify the decision boundary between the majority and minority classes for classification. The minority samples close to the decision boundary are selected for resampling. As was done with BSMOTE oversampling, KNN was varied for 3, 5, and 10 for SVM-SMOTE.

8.3.1. Worms: SVM-SMOTE Oversampling Varying KNN Followed by Random Undersampling

Results for SVM-SMOTE oversampling using KNN 3, 5 and 10, on Worms, are presented in Table 23, Table 24 and Table 25 respectively. Probabilistic significance evaluation of the Welch’s t-test values resulted in 10% oversampling performing better for KNN = 3 and 5 as shown in Table 26 and Table 27.
When compared to KNN = 3 and 5, for KNN = 10, 30% oversampling performed the best. Though some higher oversampling percentages (>30%), such as 100%, had higher precision and F-score values, the p-value calculations did not find the differences to be significant. The results are captured in Table 28.

8.3.2. Shellcode: SVM-SMOTE Oversampling Varying KNN Followed by Random Undersampling

Table 29, Table 30 and Table 31 present the results of SVM-SMOTE oversampling of the Shellcode attack. For KNN = 3, 5 and 10, p-value calculations of Welch’s t-test scores reveal that there were no significant differences observed between 10% and most of the higher percentages. The analysis results are captured in Table 32, Table 33 and Table 34 respectively.

8.3.3. Backdoors: SVM-SMOTE Oversampling Varying KNN Followed by Random Undersampling

In this section, SVM-SMOTE oversampling percentages are increased from 0.1 to 1.0, keeping the undersampling percent at a constant 0.5 for backdoors. Table 35, Table 36 and Table 37 present the results of the averages of the metrics for various SVM-SMOTE oversampling percentages for backdoors. As highlighted in the tables, 0.3 oversampling gave better results for KNN = 3 while 0.1 gave better results for the other two KNN variations, 5 and 10.
Table 38 captures the Welch’s t-test analysis between metrics of varying oversampling percentages for backdoors. Initially 0.2 oversampling has better precision and macro-precision than 0.1, but it lags behind 0.3 in recall. The higher oversampling percentages do not have any statistical difference with 0.3.
Welch’s t-test analysis for KNN = 5 and 10 are presented in Table 39 and Table 40 respectively. For both KNN = 5 and 10, 0.1 oversampling either performed better at one of the metrics or had no statistical differences with the higher oversampling percentages.

8.4. UNSW-NB15: Random Undersampling Followed by BSMOTE Oversampling

In this section, the results of the experiments carried out with RU of majority data followed by oversampling of minority data are presented. Since RU is performed first, the count of the majority class is reduced, and therefore the amount of oversampling needed for the minority data to reach a given fraction of the majority count is also reduced.

8.4.1. Worms: Random Undersampling Varying KNN Followed by BSMOTE Oversampling

In UNSW-NB15, Worms is the smallest of the minority classes. RU of the majority or benign data is performed first. RU is used to bring the majority class to 0.5 of the original count. This is followed by BSMOTE oversampling of the minority data, worms. Results are presented for KNN = 3, 5, and 10 in Table 41, Table 42 and Table 43, respectively. Welch’s t-test scores are calculated for the averages of the metrics, and it is found that 0.2 oversampling gave better results for KNN = 3, as shown in Table 44. For KNN = 5 and 10, 0.1 BSMOTE oversampling gave better results as shown in Table 45 and Table 46 respectively.

8.4.2. Shellcode: Random Undersampling Followed by BSMOTE Oversampling Varying KNN

For Shellcode, when RU was followed by BSMOTE oversampling, all three KNN values, 3, 5, and 10, gave better results at 0.1 oversampling, as shown in Table 47, Table 48 and Table 49 respectively. Welch’s t-test analysis of the metrics from RU followed by BSMOTE oversampling of Shellcode with varying KNN values are presented in Table 50, Table 51 and Table 52 respectively. It is clear from the analysis that 0.1 oversampling results are better than higher percentages for all three KNN values.

8.4.3. Backdoors: Random Undersampling Followed by BSMOTE Oversampling Varying KNN

Results for Backdoors are presented in Table 53, Table 54, and Table 55 for KNN = 3, 5, and 10, respectively. Probabilistic analysis of the metrics for KNN = 3 reveals that a 0.2 oversampling ratio has better precision, F-Score, and macro precision than 0.1. It did not have any statistical difference with 0.3 but had better recall than 0.4. No statistical differences were observed in the results for the remaining higher oversampling percentages, as presented in Table 56. For KNN = 5 and 10, the 10% oversampling ratio gave better results in the Welch's t-test analysis of the metrics, as presented in Table 57 and Table 58, respectively.

8.5. Random Undersampling Followed by SVM-SMOTE Oversampling

In this section, results of the experiments carried out with RU of majority data followed by oversampling of minority data using SVM-SMOTE are presented.

8.5.1. Worms: Random Undersampling Followed by SVM-SMOTE Oversampling Varying KNN

Worms are predicted better when oversampled to 0.1 using SVM-SMOTE for KNN = 3 and 5, with the majority non-attack data undersampled to 0.5, as shown in Table 59 and Table 60, respectively. For KNN = 10, however, it takes 0.4 oversampling using SVM-SMOTE to give overall better results, as shown in Table 61. It can be observed that precision and F-score decreased as KNN increased from 3 to 10. This might be due to the imbalanced nature of the data: as the number of neighbors increases, more points from the majority class are included, and thus there is a higher chance of a point being classified incorrectly.
Table 62 and Table 63 capture the Welch’s t-test analysis performed on the metrics from Worms where RU is performed first, followed by the SVM-SMOTE oversampling for KNN = 3 and 5. In both of these runs, 0.1 had no statistical difference or better results in at least one of the metrics than higher percentages. For KNN = 10, 0.4 oversampling gave better results. This is captured in Table 64.

8.5.2. Shellcode: Random Undersampling Followed by SVM-SMOTE Oversampling Varying KNN

In this section, benign data is first undersampled (to 0.5), followed by SVM-SMOTE oversampling using varying KNN. Table 65, Table 66 and Table 67 present the results for KNN = 3, 5, and 10 respectively. From Table 65 it can be noted that, for KNN = 3, 0.4 oversampling gave better results overall. Welch’s t-test calculations, Table 68, showed that 0.4 has better recall than oversampling percentages below 0.4, and there were no statistical differences with higher oversampling percentages. Also, 0.4 had a better F-score than 0.7 and no statistical difference with higher percentages. For KNN = 5 and 10, however, Welch’s t-test analysis showed that the results were better at 0.1 oversampling, as shown in Table 69 and Table 70, respectively.

8.5.3. Backdoors: Random Undersampling Followed by SVM-SMOTE Oversampling Varying KNN

Table 71 and Table 72 show the average values of the metrics for varying SVM-SMOTE oversampling percentages with KNN = 3 and 5, respectively. Welch's t-test scores and the subsequent probability calculations show that 0.1 oversampling gave overall better prediction results for both KNN = 3 and 5, as shown in Table 73 and Table 74. For KNN = 3, there were no statistical differences between 0.1 and the higher oversampling percentages up to 0.6; 0.1 also had better precision, recall, and macro precision than the remaining percentages up to 1.0, as shown in Table 73.
Table 74 captures the results of the Welch's t-test analysis for KNN = 5. From Table 74 it can be noted that 0.1 oversampling had no statistical differences across all metrics until 1.0, where 0.1 had a better recall.
Table 75 presents the results for KNN = 10. For KNN = 10, as shown in Table 76, RU followed by 0.2 SVM-SMOTE oversampling gave the best results. p-value calculations show that 0.2 had a better recall and F-score than 0.1 and also performed better across all metrics than the other oversampling percentages.

9. Conclusions and Future Work

This paper analyzed the impact of the following on the minority sample classification:
  • The order of resampling techniques, that is, whether oversampling followed by undersampling or undersampling followed by oversampling, is better;
  • The selection of the oversampling techniques, Borderline SMOTE vs. SVM-SMOTE;
  • The effect of the selection of the KNN value on the oversampling percentage;
  • The selection of the oversampling ratio while keeping the undersampling constant at 50%.
Table 77 presents the best oversampling percentages for the rare attacks in UNSW-NB15.
It is observed that the oversampling percentage with the best result for an individual metric may not always give the best result overall, and vice versa. Depending on the requirement and application, the oversampling and undersampling percentages, the order of the resampling techniques, and the KNN values all have to be considered. The following conclusions are drawn from the analysis of the results and the best oversampling percentages presented in Table 77:
  • 10% oversampling gave better results for both BSMOTE and SVM-SMOTE, irrespective of the order of resampling.
  • SVM-SMOTE gave better prediction results at higher oversampling percentages than BSMOTE.
  • For rarer classes such as Worms, higher KNN values led to an increase in the required SVM-SMOTE oversampling percentage in both orders of resampling.
For future work, we plan to expand this work to other datasets and other types of datasets and see what impact that will have on the results.

Author Contributions

This work was conceptualized by S.S.B., D.M., S.C.B. and S.S.; methodology was mainly done by S.S.B., D.M., S.C.B. and S.S.; software was done by S.S.; validation was done by S.S.B., D.M., S.C.B. and S.S.; formal analysis was done by S.S.B., D.M., S.C.B. and S.S.; investigation was done by S.S.B., D.M., S.C.B. and S.S.; data curation was done by S.S.; writing—original draft preparation was done by S.S. and S.S.B.; writing—review and editing was done by S.S.B., D.M., S.C.B. and S.S.; visualization was done by S.S.; supervision was done by S.S.B., D.M. and S.C.B.; project administration was done by S.S.B. and D.M.; funding acquisition was done by S.S.B., D.M. and S.C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data is available at datasets.uwf.edu (accessed on 12 September 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cisco. What Is a Cyberattack?—Most Common Types; Cisco: San Jose, CA, USA, 2023; Available online: https://www.cisco.com/c/en/us/products/security/common-cyberattacks.html#~how-cyber-attacks-work (accessed on 17 April 2023).
  2. What Is a Cyberattack? IBM: Armonk, NY, USA; Available online: https://www.ibm.com/topics/cyber-attack (accessed on 17 April 2023).
  3. Delplace, A.; Hermoso, S.; Anandita, K. Cyber Attack Detection thanks to Machine Learning Algorithms. arXiv 2020, arXiv:2001.06309. Available online: https://arxiv.org/abs/2001.06309 (accessed on 12 April 2023).
  4. Alencar, R. Resampling Strategies for Imbalanced Datasets; Kaggle: San Francisco, CA, USA, 2017; Available online: https://www.kaggle.com/code/rafjaa/resampling-strategies-for-imbalanced-datasets (accessed on 17 April 2023).
  5. Ahmed, H.A.; Hameed, A.; Bawany, N.Z. Network intrusion detection using oversampling technique and machine learning algorithms. PeerJ. Comput. Sci. 2022, 8, e820. [Google Scholar] [CrossRef] [PubMed]
  6. Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6. [Google Scholar] [CrossRef]
  7. Brownlee, J. Random Oversampling and Undersampling for Imbalanced Classification. 2021. Available online: https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/ (accessed on 17 April 2023).
  8. Branco, P.S.; Torgo, L.; Ribeiro, R.A. A Survey of Predictive Modelling under Imbalanced Distributions. arXiv 2015. Available online: http://export.arxiv.org/pdf/1505.01658 (accessed on 17 April 2023).
  9. Patwardhan, S. Simple Understanding and Implementation of KNN Algorithm! Analytics Vidhya, Gurgaon, New Delhi, India. 2022. Available online: https://www.analyticsvidhya.com/blog/2021/04/simple-understanding-and-implementation-of-knn-algorithm/ (accessed on 25 April 2023).
  10. Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-smote: A new over-sampling method in imbalanced data sets learning. Lect. Notes Comput. Sci. 2005, 3644, 878–887. [Google Scholar] [CrossRef]
  11. Nguyen, H.M.; Cooper, E.W.; Kamei, K. Borderline over-sampling for Imbalanced Data Classification. Int. J. Knowl. Eng. Soft Data Paradig. 2011, 3, 4. [Google Scholar] [CrossRef]
  12. Brownlee, J. Bagging and Random Forest for Imbalanced Classification; Machine Learning Mastery: Vermont, Australia, 2020; Available online: https://machinelearningmastery.com/bagging-and-random-forest-for-imbalanced-classification/ (accessed on 17 April 2023).
  13. Dini, P.; Saponara, S. Analysis, design, and comparison of machine-learning techniques for networking intrusion detection. Designs 2021, 5, 9. [Google Scholar] [CrossRef]
  14. Dini, P.; Begni, A.; Ciavarella, S.; De Paoli, E.; Fiorelli, G.; Silvestro, C.; Saponara, S. Design and testing novel one-class classifier based on polynomial interpolation with application to networking security. IEEE Access 2022, 10, 67910–67924. [Google Scholar] [CrossRef]
  15. Elhanashi, A.; Gasmi, K.; Begni, A.; Dini, P.; Zheng, Q.; Saponara, S. Machine Learning Techniques for Anomaly-Based Detection System on CSE-CIC-IDS2018 Dataset. In International Conference on Applications in Electronics Pervading Industry, Environment and Society; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
  16. Ramyachitra, D.; Manikandan, P. Imbalanced dataset classification and solutions: A review. Int. J. Comput. Bus. Res. (IJCBR) 2014, 5, 1–29. [Google Scholar]
  17. Ganganwar, V. An overview of classification algorithms for imbalanced datasets. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 42–47. [Google Scholar]
  18. Chawla, N.V. Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2010; pp. 875–886. [Google Scholar]
  19. Nguyen, G.H.; Bouzerdoum, A.; Phung, S.L. Learning pattern classification tasks with imbalanced data sets. Pattern Recognit. 2009, 193–208. [Google Scholar]
  20. Abdelkhalek, A.; Mashaly, M. Addressing the class imbalance problem in network intrusion detection systems using data resampling and deep learning. J. Supercomput. 2023, 79, 10611–10644. [Google Scholar] [CrossRef]
  21. Eke, H.; Petrovski, A.; Ahriz, H. Handling minority class problem in threats detection based on heterogeneous ensemble learning approach. Int. J. Syst. Softw. Secur. Prot. 2020, 11, 13–37. [Google Scholar] [CrossRef]
  22. Kumar, S.; Biswas, S.K.; Devi, D. TLUSBoost algorithm: A boosting solution for class imbalance problem. Soft Comput. 2019, 23, 10755–10767. [Google Scholar] [CrossRef]
  23. Fujiwara, K.; Huang, Y.; Hori, K.; Nishioji, K.; Kobayashi, M.; Kamaguchi, M. Over- and Under-sampling Approach for Extremely Imbalanced and Small Minority Data Problem in Health Record Analysis. Front. Public Health 2020, 8, 178. [Google Scholar] [CrossRef]
  24. Hasanin, T.; Khoshgoftaar, T. The Effects of Random Undersampling with Simulated Class Imbalance for Big Data. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018; pp. 70–79. [Google Scholar] [CrossRef]
  25. Weiss, G.; Provost, F. The Effect of Class Distribution on Classifier Learning: An Empirical Study; Rutgers University: Camden, NJ, USA, 2001. [Google Scholar]
  26. Silva, E.J.R.; Zanchettin, C. On the Existence of a Threshold in Class Imbalance Problems. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; pp. 2714–2719. [Google Scholar] [CrossRef]
  27. Joshi, A.; Kanwar, K.; Vaidya, P.; Sharma, S. A Principal Component Analysis, Sampling and Classifier strategies for dealing with concerns of class imbalance in datasets with a ratio greater than five. In Proceedings of the 2022 Second International Conference on Computer Science, Engineering and Applications (ICCSEA), Gunupur, India, 8 September 2022; pp. 1–6. [Google Scholar] [CrossRef]
  28. Bagui, S.; Li, K. Resampling imbalanced data for network intrusion detection datasets. J. Big Data 2021, 8, 6. [Google Scholar] [CrossRef]
  29. Sikorski, M.; Honig, A. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software; No Starch Press: San Francisco, CA, USA, 2012. [Google Scholar]
  30. Erickson, J. Hacking: The Art of Exploitation; No Starch Press: San Francisco, CA, USA, 2008. [Google Scholar]
  31. Bagui, S.S.; Mink, D.; Bagui, S.C.; Subramaniam, S.; Wallace, D. Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks. Future Internet 2023, 15, 130. [Google Scholar] [CrossRef]
  32. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2022. [Google Scholar]
  33. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
Figure 1. Distribution of Attack Families in UNSW-NB15.
Figure 2. Experimental design.
Table 1. Information Gain Analysis.
Feature | Information Gain | Column Dropped
sttl | 0.476 |
dttl | 0.422 |
ct_state_ttl | 0.354 |
sbytes | 0.345 |
attack_cat | 0.339 |
state | 0.318 |
Sload | 0.307 |
smeansz | 0.283 |
proto | 0.275 |
dbytes | 0.215 |
dmeansz | 0.204 |
dur | 0.193 |
Dload | 0.188 |
Dintpkt | 0.187 |
Dpkts | 0.178 |
ct_dst_sport_lst | 0.169 |
swin | 0.167 |
dwin | 0.165 |
Ltime | 0.139 |
Stime | 0.138 |
Sintpkt | 0.131 |
tcprtt | 0.127 |
ackdat | 0.126 |
synack | 0.125 |
ct_src_dport_ltm | 0.121 |
ct_dst_src_ltm | 0.108 |
Spkts | 0.107 |
ct_dst_ltm | 0.103 |
Sjit | 0.1 |
Djit | 0.097 |
ct_src_ltm | 0.097 | *
ct_srv_dst | 0.094 | *
sloss | 0.09 | *
ct_srv_src | 0.089 | *
dloss | 0.085 | *
service | 0.081 | *
stcpd | 0.056 | *
dtcpb | 0.054 | *
res_bdy_len | 0.016 | *
trans_depth | 0.009 | *
is_sm_ips_ports | 0.0004 | *
Table 2. Hardware and Software Configurations.
Processor | M1 Max Pro
RAM | 32 GB
OS | Mac OS Ventura
OS Version | 13.1
OS Build | -
GPU | -
Table 3. Python Library Versions.
Python | 3.9
Anaconda | 2022.1
Pandas | 1.5.2
Scikit-learn | 1.9.3
Numpy | 1.23.5
Imblearn | 0.10.0
Table 4. Worms: BSMOTE Oversampling using KNN = 3 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.678 | 0.753 | 0.713 | 0.839
0.2 | 0.617 | 0.696 | 0.651 | 0.808
0.3 | 0.698 | 0.738 | 0.717 | 0.849
0.4 | 0.658 | 0.719 | 0.684 | 0.829
0.5 | 0.668 | 0.761 | 0.710 | 0.834
0.6 | 0.673 | 0.780 | 0.718 | 0.836
0.7 | 0.653 | 0.730 | 0.683 | 0.826
0.8 | 0.661 | 0.703 | 0.680 | 0.830
0.9 | 0.617 | 0.730 | 0.667 | 0.808
1.0 | 0.685 | 0.746 | 0.709 | 0.842
Table 5. Welch’s t-test: UNSW-NB15 Worms: BSMOTE oversampling followed by RU for KNN = 3.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 1.280 | 0.977 | 1.262 | 1.280 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | −0.470 | 0.388 | −0.100 | −0.470 | No statistical difference between 0.1 and 0.3
0.1 vs. 0.4 | 0.537 | 0.716 | 0.760 | 0.537 | No statistical difference between 0.1 and 0.4
0.1 vs. 0.5 | 0.221 | −0.166 | 0.071 | 0.221 | No statistical difference between 0.1 and 0.5
0.1 vs. 0.6 | 0.097 | −0.604 | −0.134 | 0.097 | No statistical difference between 0.1 and 0.6
0.1 vs. 0.7 | 0.430 | 0.559 | 0.730 | 0.430 | No statistical difference between 0.1 and 0.7
0.1 vs. 0.8 | 0.390 | 1.370 | 0.934 | 0.390 | No statistical difference between 0.1 and 0.8
0.1 vs. 0.9 | 2.075 | 0.550 | 1.533 | 2.075 | 0.1 has better precision and macro precision than 0.9
0.1 vs. 1 | −0.210 | 0.127 | 0.093 | −0.210 | No statistical difference between 0.1 and 1.0
Table 6. Welch’s t-Test Result Scenario vs. Outcome.
Scenario | Outcome
No statistical difference between the two oversampling percentages | The lower oversampling percentage is preferred over the higher one due to less computational effort.
The p-value indicates a significant difference in any of the metrics, and the corresponding t-test score is positive | The first (lower) oversampling percentage is preferred, as it gave a better result than the second.
The p-value indicates a significant difference in any of the metrics, but the corresponding t-test score is negative | The second (higher) oversampling percentage is preferred, as it gave a better result than the first.
Table 7. Worms: BSMOTE Oversampling using KNN = 5 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.608 | 0.736 | 0.664 | 0.804
0.2 | 0.600 | 0.711 | 0.646 | 0.800
0.3 | 0.566 | 0.773 | 0.651 | 0.783
0.4 | 0.565 | 0.780 | 0.653 | 0.782
0.5 | 0.581 | 0.738 | 0.649 | 0.790
0.6 | 0.586 | 0.759 | 0.656 | 0.793
0.7 | 0.619 | 0.753 | 0.678 | 0.809
0.8 | 0.539 | 0.719 | 0.614 | 0.769
0.9 | 0.573 | 0.711 | 0.628 | 0.786
1.0 | 0.600 | 0.750 | 0.665 | 0.800
Table 8. Worms: BSMOTE Oversampling using KNN = 10 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.511 | 0.750 | 0.606 | 0.755
0.2 | 0.504 | 0.757 | 0.604 | 0.752
0.3 | 0.480 | 0.742 | 0.580 | 0.740
0.4 | 0.494 | 0.746 | 0.593 | 0.747
0.5 | 0.490 | 0.757 | 0.593 | 0.745
0.6 | 0.471 | 0.703 | 0.563 | 0.735
0.7 | 0.492 | 0.723 | 0.582 | 0.746
0.8 | 0.480 | 0.734 | 0.578 | 0.740
0.9 | 0.482 | 0.726 | 0.578 | 0.741
1.0 | 0.495 | 0.734 | 0.589 | 0.747
Table 9. Welch’s t-test: Worms: BSMOTE oversampling using KNN = 5 followed by RU.
Welch’s t-Test Results (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.183 | 0.799 | 0.612 | 0.183 | 0.1 and 0.2 are statistically equal
0.1 vs. 0.3 | 1.413 | −1.563 | 0.592 | 1.413 | 0.1 is better than 0.3 except F-Score
0.1 vs. 0.4 | 1.773 | −1.826 | 0.628 | 1.773 | 0.1 is better than 0.4 except F-Score
0.1 vs. 0.5 | 0.836 | −0.065 | 0.506 | 0.836 | 0.1 and 0.5 are statistically equal
0.1 vs. 0.6 | 0.451 | −0.957 | 0.264 | 0.451 | 0.1 and 0.6 are statistically equal
0.1 vs. 0.7 | −0.431 | −0.859 | −0.706 | −0.431 | 0.1 and 0.7 are statistically equal
0.1 vs. 0.8 | 2.052 | 0.959 | 2.500 | 2.826 | 0.1 better than 0.8 except recall, where both are statistically equal
0.1 vs. 0.9 | 0.821 | 0.745 | 1.131 | 0.821 | 0.1 and 0.9 are statistically equal
0.1 vs. 1 | 0.219 | −0.746 | −0.033 | 0.302 | 0.1 and 1.0 are statistically equal
Table 10. Welch’s t-test: Worms: BSMOTE oversampling using KNN = 10 followed by RU.
Welch’s t-Test Results (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.194 | −0.264 | 0.077 | 0.194 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | 1.282 | 0.230 | 1.400 | 1.282 | No statistical difference between 0.1 and 0.3
0.1 vs. 0.4 | 0.696 | 0.108 | 0.566 | 0.696 | No statistical difference between 0.1 and 0.4
0.1 vs. 0.5 | 1.130 | −0.207 | 0.932 | 1.130 | No statistical difference between 0.1 and 0.5
0.1 vs. 0.6 | 1.362 | 0.963 | 1.399 | 1.362 | No statistical difference between 0.1 and 0.6
0.1 vs. 0.7 | 0.677 | 0.695 | 0.918 | 0.677 | No statistical difference between 0.1 and 0.7
0.1 vs. 0.8 | 1.266 | 0.434 | 1.589 | 1.266 | 0.1 has better F-Score than 0.8
0.1 vs. 0.9 | 0.889 | 0.537 | 0.880 | 0.889 | No statistical difference between 0.1 and 0.9
0.1 vs. 1 | 0.577 | 0.251 | 0.464 | 0.577 | No statistical difference between 0.1 and 1.0
Table 11. Shellcode: BSMOTE Oversampling using KNN = 3 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.719 | 0.906 | 0.802 | 0.859
0.2 | 0.719 | 0.913 | 0.805 | 0.859
0.3 | 0.713 | 0.908 | 0.799 | 0.856
0.4 | 0.700 | 0.903 | 0.789 | 0.850
0.5 | 0.720 | 0.905 | 0.802 | 0.859
0.6 | 0.713 | 0.899 | 0.795 | 0.856
0.7 | 0.705 | 0.898 | 0.790 | 0.852
0.8 | 0.703 | 0.895 | 0.788 | 0.851
0.9 | 0.706 | 0.910 | 0.795 | 0.853
1.0 | 0.692 | 0.905 | 0.784 | 0.846
Table 12. Shellcode: BSMOTE oversampling using KNN = 5 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.694 | 0.900 | 0.783 | 0.847
0.2 | 0.678 | 0.917 | 0.780 | 0.839
0.3 | 0.682 | 0.916 | 0.782 | 0.841
0.4 | 0.696 | 0.903 | 0.786 | 0.847
0.5 | 0.694 | 0.916 | 0.790 | 0.847
0.6 | 0.679 | 0.912 | 0.778 | 0.839
0.7 | 0.682 | 0.911 | 0.779 | 0.841
0.8 | 0.698 | 0.917 | 0.792 | 0.849
0.9 | 0.679 | 0.917 | 0.780 | 0.839
1.0 | 0.700 | 0.904 | 0.789 | 0.850
Table 13. Shellcode: BSMOTE oversampling using KNN = 10 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.660 | 0.929 | 0.772 | 0.830
0.2 | 0.657 | 0.929 | 0.770 | 0.828
0.3 | 0.653 | 0.925 | 0.765 | 0.826
0.4 | 0.666 | 0.913 | 0.770 | 0.832
0.5 | 0.658 | 0.917 | 0.766 | 0.829
0.6 | 0.652 | 0.921 | 0.764 | 0.826
0.7 | 0.643 | 0.924 | 0.758 | 0.821
0.8 | 0.649 | 0.920 | 0.761 | 0.824
0.9 | 0.656 | 0.923 | 0.767 | 0.828
1.0 | 0.660 | 0.907 | 0.764 | 0.830
Table 14. Welch’s t-test: Shellcode: BSMOTE oversampling using KNN = 3 followed by RU.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | −0.058 | −0.958 | −0.434 | −0.058 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | 1.011 | −0.140 | 0.476 | 1.011 | No statistical difference between 0.1 and 0.3
0.1 vs. 0.4 | 4.204 | 0.360 | 3.220 | 4.205 | 0.1 has better precision, F-Score, and macro precision than 0.4
0.1 vs. 0.5 | −0.091 | 0.095 | 0.000 | −0.091 | No statistical difference between 0.1 and 0.5
0.1 vs. 0.6 | 0.849 | 0.726 | 1.213 | 0.849 | No statistical difference between 0.1 and 0.6
0.1 vs. 0.7 | 1.540 | 0.839 | 1.468 | 1.541 | 0.1 has better precision and macro precision than 0.7
0.1 vs. 0.8 | 2.186 | 1.217 | 2.743 | 2.188 | 0.1 has better precision, F-Score, and macro precision than 0.8
0.1 vs. 0.9 | 2.636 | −0.341 | 1.262 | 2.635 | 0.1 has better precision and macro precision than 0.9
0.1 vs. 1 | 2.880 | 0.187 | 2.869 | 2.881 | 0.1 has better precision, F-Score, and macro precision than 1.0
Table 15. Welch’s t-test: Shellcode: BSMOTE oversampling using KNN = 5 followed by RU.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.566 | −0.276 | 0.386 | 0.566 | 0.1 and 0.2 are statistically equal
0.1 vs. 0.3 | 0.874 | −0.706 | 0.442 | 0.874 | 0.1 and 0.3 are statistically equal
0.1 vs. 0.4 | 1.805 | 0.070 | 1.702 | 1.806 | 0.1 better than 0.4, but the latter has better recall
0.1 vs. 0.5 | 3.349 | −0.128 | 3.104 | 3.349 | 0.1 better than 0.5, but the latter has better recall
0.1 vs. 0.6 | −0.084 | −0.489 | −0.284 | −0.084 | 0.1 and 0.6 are statistically equal
0.1 vs. 0.7 | 0.684 | −0.730 | 0.293 | 0.683 | 0.1 and 0.7 are statistically equal
0.1 vs. 0.8 | 1.620 | 0.520 | 1.480 | 1.620 | 0.1 is better than 0.8 across precision, F-Score, and macro precision
0.1 vs. 0.9 | 1.202 | 2.814 | 2.187 | 1.204 | 0.1 is better than 0.9 across the recall and F-Score metrics
0.1 vs. 1 | 2.101 | 0.246 | 1.832 | 2.101 | 0.1 is better than 1.0
Table 16. Welch’s t-test: Shellcode: BSMOTE oversampling using KNN = 10 followed by RU.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.423 | 0.109 | 0.550 | 0.423 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | 1.122 | 0.627 | 1.278 | 1.123 | No statistical difference between 0.1 and 0.3
0.1 vs. 0.4 | −0.891 | 2.648 | 0.401 | −0.890 | 0.1 has better recall than 0.4
0.1 vs. 0.5 | 0.205 | 1.923 | 0.817 | 0.206 | 0.1 has better recall than 0.5
0.1 vs. 0.6 | 1.091 | 1.326 | 1.317 | 1.092 | 0.1 and 0.6 are statistically equal
0.1 vs. 0.7 | 1.781 | 1.876 | 2.123 | 1.781 | 0.1 is better than 0.7 across all metrics
0.1 vs. 0.8 | 1.783 | 2.637 | 2.353 | 1.784 | 0.1 is better than 0.8 across all metrics
0.1 vs. 0.9 | 0.336 | 0.887 | 0.761 | 0.337 | 0.1 and 0.9 are statistically equal
0.1 vs. 1 | −0.040 | 2.210 | 2.289 | −0.038 | 0.1 has better recall and F-Score than 1.0
Table 17. Backdoors: BSMOTE oversampling using KNN = 3 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.958 | 0.939 | 0.948 | 0.978
0.2 | 0.953 | 0.942 | 0.947 | 0.976
0.3 | 0.938 | 0.950 | 0.944 | 0.969
0.4 | 0.938 | 0.950 | 0.944 | 0.969
0.5 | 0.947 | 0.947 | 0.947 | 0.973
0.6 | 0.943 | 0.949 | 0.946 | 0.971
0.7 | 0.948 | 0.945 | 0.947 | 0.974
0.8 | 0.949 | 0.942 | 0.946 | 0.974
0.9 | 0.945 | 0.941 | 0.943 | 0.972
1.0 | 0.952 | 0.945 | 0.948 | 0.976
Table 18. Backdoors: BSMOTE oversampling using KNN = 5 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.966 | 0.952 | 0.959 | 0.983
0.2 | 0.966 | 0.943 | 0.954 | 0.983
0.3 | 0.965 | 0.940 | 0.953 | 0.982
0.4 | 0.965 | 0.938 | 0.951 | 0.982
0.5 | 0.961 | 0.944 | 0.952 | 0.980
0.6 | 0.964 | 0.944 | 0.954 | 0.982
0.7 | 0.963 | 0.943 | 0.953 | 0.981
0.8 | 0.954 | 0.941 | 0.947 | 0.976
0.9 | 0.964 | 0.939 | 0.951 | 0.982
1.0 | 0.974 | 0.935 | 0.954 | 0.987
Table 19. Backdoors: BSMOTE oversampling using KNN = 10 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.968 | 0.944 | 0.956 | 0.984
0.2 | 0.965 | 0.937 | 0.950 | 0.982
0.3 | 0.955 | 0.945 | 0.950 | 0.977
0.4 | 0.964 | 0.953 | 0.958 | 0.982
0.5 | 0.964 | 0.949 | 0.956 | 0.982
0.6 | 0.955 | 0.944 | 0.949 | 0.977
0.7 | 0.949 | 0.948 | 0.948 | 0.974
0.8 | 0.959 | 0.943 | 0.951 | 0.979
0.9 | 0.957 | 0.944 | 0.950 | 0.978
1.0 | 0.957 | 0.947 | 0.952 | 0.978
Table 20. Welch’s t-test: Backdoors: BSMOTE oversampling using KNN = 3 followed by RU.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.885 | −0.503 | 0.369 | 0.886 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | 3.240 | −2.923 | 1.179 | 3.238 | 0.1 has better precision and F-Score, while 0.3 has better recall
0.1 vs. 0.4 | 3.013 | −2.102 | 1.730 | 3.013 | 0.1 has better precision and F-Score, while 0.4 has better recall
0.1 vs. 0.5 | 2.140 | −1.339 | 0.485 | 2.140 | 0.1 has better precision and macro precision
0.1 vs. 0.6 | 3.929 | −2.392 | 0.931 | 3.928 | 0.1 has better precision and macro precision, while 0.6 has better recall
0.1 vs. 0.7 | 1.612 | −1.610 | 0.301 | 1.611 | 0.1 has better precision and macro precision, while 0.7 has better recall
0.1 vs. 0.8 | 2.281 | −0.830 | 1.055 | 2.281 | 0.1 has better precision and macro precision than 0.8
0.1 vs. 0.9 | 1.833 | −0.339 | 1.001 | 1.832 | 0.1 has better precision and macro precision than 0.9
0.1 vs. 1 | 1.208 | −1.272 | −0.052 | 1.207 | No statistical difference between 0.1 and 1.0
Table 21. Welch’s t-test: Backdoors: BSMOTE oversampling followed by RU for KNN = 5.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | −0.077 | 1.726 | 1.294 | −0.075 | 0.1 has better recall than 0.2
0.1 vs. 0.3 | 0.101 | 1.787 | 1.735 | 0.104 | 0.1 has better recall and F-Score than 0.3
0.1 vs. 0.4 | 0.162 | 1.869 | 2.591 | 0.164 | 0.1 has better recall and F-Score than 0.4
0.1 vs. 0.5 | 0.904 | 1.401 | 2.207 | 0.906 | 0.1 has better F-Score than 0.5
0.1 vs. 0.6 | 0.331 | 1.732 | 1.611 | 0.333 | 0.1 has better recall and F-Score than 0.6
0.1 vs. 0.7 | 0.492 | 1.771 | 2.001 | 0.494 | 0.1 has better recall and F-Score than 0.7
0.1 vs. 0.8 | 1.747 | 2.140 | 2.807 | 1.749 | 0.1 is better than 0.8 across all metrics
0.1 vs. 0.9 | 0.303 | 2.023 | 1.681 | 0.306 | 0.1 has better recall and F-Score than 0.9
0.1 vs. 1 | −1.319 | 2.715 | 1.436 | −1.316 | 0.1 has better recall than 1.0
Table 22. Welch’s t-test: Backdoors: BSMOTE oversampling using KNN = 10 followed by RU.
Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis
0.1 vs. 0.2 | 0.785 | 1.484 | 1.507 | 0.787 | No statistical difference between 0.1 and 0.2
0.1 vs. 0.3 | 4.463 | −0.113 | 1.854 | 4.460 | 0.1 has better precision, F-Score, and macro precision than 0.3
0.1 vs. 0.4 | 1.441 | −1.525 | −1.064 | 1.440 | No statistical difference between 0.1 and 0.4
0.1 vs. 0.5 | 1.532 | −0.902 | −0.107 | 1.529 | No statistical difference between 0.1 and 0.5
0.1 vs. 0.6 | 2.751 | 0.043 | 1.724 | 2.751 | 0.1 has better precision, F-Score, and macro precision than 0.6
0.1 vs. 0.7 | 3.699 | −1.024 | 2.715 | 3.699 | 0.1 has better precision, F-Score, and macro precision than 0.7
0.1 vs. 0.8 | 2.271 | 0.066 | 1.327 | 2.274 | 0.1 has better precision and macro precision than 0.8
0.1 vs. 0.9 | 3.983 | −0.039 | 1.464 | 3.985 | 0.1 has better precision and macro precision than 0.9
0.1 vs. 1 | 4.994 | −0.632 | 1.922 | 4.996 | 0.1 has better precision, F-Score, and macro precision than 1.0
Table 23. Worms: SVM-SMOTE oversampling using KNN = 3 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.674 | 0.776 | 0.721 | 0.837
0.2 | 0.673 | 0.753 | 0.710 | 0.836
0.3 | 0.649 | 0.734 | 0.686 | 0.824
0.4 | 0.649 | 0.761 | 0.698 | 0.824
0.5 | 0.602 | 0.723 | 0.656 | 0.801
0.6 | 0.708 | 0.815 | 0.752 | 0.854
0.7 | 0.652 | 0.734 | 0.689 | 0.826
0.8 | 0.682 | 0.796 | 0.733 | 0.841
0.9 | 0.618 | 0.800 | 0.696 | 0.809
1.0 | 0.614 | 0.703 | 0.655 | 0.807
Table 24. Worms: SVM-SMOTE oversampling using KNN = 5 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.605 | 0.692 | 0.643 | 0.802
0.2 | 0.562 | 0.773 | 0.650 | 0.781
0.3 | 0.605 | 0.788 | 0.683 | 0.802
0.4 | 0.534 | 0.726 | 0.612 | 0.767
0.5 | 0.572 | 0.738 | 0.642 | 0.786
0.6 | 0.552 | 0.776 | 0.644 | 0.776
0.7 | 0.591 | 0.769 | 0.668 | 0.795
0.8 | 0.512 | 0.780 | 0.617 | 0.756
0.9 | 0.578 | 0.780 | 0.660 | 0.789
1.0 | 0.492 | 0.753 | 0.594 | 0.746
Table 25. Worms: SVM-SMOTE oversampling using KNN = 10 followed by RU.
Oversampling % | Precision | Recall | F-Score | Macro Precision
0.1 | 0.444 | 0.765 | 0.562 | 0.722
0.2 | 0.420 | 0.750 | 0.538 | 0.710
0.3 | 0.492 | 0.780 | 0.600 | 0.746
0.4 | 0.479 | 0.819 | 0.603 | 0.739
0.5 | 0.461 | 0.769 | 0.572 | 0.730
0.6 | 0.453 | 0.780 | 0.573 | 0.726
0.7 | 0.454 | 0.773 | 0.571 | 0.727
0.8 | 0.465 | 0.792 | 0.585 | 0.732
0.9 | 0.481 | 0.765 | 0.590 | 0.740
1.0 | 0.495 | 0.769 | 0.602 | 0.747
Table 26. Welch’s t-test: Worms: SVM-SMOTE oversampling using KNN = 3 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.048 | 0.493 | 0.313 | 0.048 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.863 | 0.898 | 1.127 | 0.863 | 0.1 has better F-Score and macro precision than 0.3 |
| 0.1 vs. 0.4 | 0.873 | 0.350 | 0.754 | 0.873 | No statistical difference between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 3.307 | 1.744 | 2.919 | 3.307 | 0.1 is significantly better than 0.5 |
| 0.1 vs. 0.6 | −0.867 | −1.075 | −1.440 | −0.867 | No statistical difference between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 1.320 | 1.333 | 1.793 | 1.320 | 0.1 has better F-Score than 0.7 |
| 0.1 vs. 0.8 | −0.189 | −0.514 | −0.335 | −0.189 | No statistical difference between 0.1 and 0.8 |
| 0.1 vs. 0.9 | 3.252 | −0.772 | 1.497 | 3.252 | 0.1 has better precision and macro precision than 0.9 |
| 0.1 vs. 1 | 2.810 | 2.264 | 3.058 | 2.810 | 0.1 is better than 1.0 across all metrics |
Table 27. Welch’s t-test: Worms: SVM-SMOTE oversampling using KNN = 5 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.766 | 0.654 | 1.436 | 1.765 | 0.1 has better precision and macro precision than 0.2 |
| 0.1 vs. 0.3 | −2.615 | −0.426 | −1.909 | −2.615 | 0.3 has better precision, F-Score and macro precision than 0.1 |
| 0.1 vs. 0.4 | −1.778 | −1.887 | −1.974 | −1.778 | 0.4 is better than 0.1 across all metrics |
| 0.1 vs. 0.5 | −0.982 | −0.095 | −0.541 | −0.982 | 0.1 and 0.5 are statistically equal |
| 0.1 vs. 0.6 | −0.439 | −0.409 | −0.444 | −0.439 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | −0.566 | −0.309 | −0.495 | −0.566 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | −1.206 | −0.905 | −1.159 | −1.206 | 0.1 and 0.8 are statistically equal |
| 0.1 vs. 0.9 | −1.726 | 0.000 | −1.216 | −1.726 | 0.9 has better precision and macro precision than 0.1 |
| 0.1 vs. 1 | −2.922 | −0.129 | −1.976 | −2.922 | 1.0 has better precision, F-Score and macro precision than 0.1 |
Table 28. Welch’s t-test: Worms: SVM-SMOTE oversampling for KNN = 10 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.248 | 0.462 | 1.015 | 1.248 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | −1.849 | −0.301 | −1.350 | −1.849 | 0.3 has better precision and macro precision than 0.1 |
| 0.3 vs. 0.4 | 0.471 | −0.820 | −0.101 | 0.471 | No statistical difference between 0.3 and 0.4 |
| 0.3 vs. 0.5 | 1.230 | 0.187 | 1.106 | 1.230 | No statistical difference between 0.3 and 0.5 |
| 0.3 vs. 0.6 | 1.483 | 0.000 | 1.031 | 1.483 | No statistical difference between 0.3 and 0.6 |
| 0.3 vs. 0.7 | 1.401 | 0.181 | 1.153 | 1.402 | No statistical difference between 0.3 and 0.7 |
| 0.3 vs. 0.8 | 1.045 | −0.238 | 0.566 | 1.045 | No statistical difference between 0.3 and 0.8 |
| 0.3 vs. 0.9 | 0.353 | 0.333 | 0.320 | 0.353 | No statistical difference between 0.3 and 0.9 |
| 0.3 vs. 1.0 | −0.099 | 0.254 | −0.068 | −0.099 | No statistical difference between 0.3 and 1.0 |
Table 29. Shellcode: SVM-SMOTE oversampling using KNN = 3 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.709 | 0.911 | 0.797 | 0.854 |
| 0.2 | 0.719 | 0.908 | 0.802 | 0.859 |
| 0.3 | 0.703 | 0.910 | 0.793 | 0.851 |
| 0.4 | 0.709 | 0.904 | 0.794 | 0.854 |
| 0.5 | 0.711 | 0.904 | 0.795 | 0.855 |
| 0.6 | 0.717 | 0.909 | 0.801 | 0.858 |
| 0.7 | 0.708 | 0.902 | 0.793 | 0.854 |
| 0.8 | 0.708 | 0.900 | 0.792 | 0.854 |
| 0.9 | 0.713 | 0.910 | 0.799 | 0.856 |
| 1.0 | 0.710 | 0.904 | 0.795 | 0.855 |
Table 30. Shellcode: SVM-SMOTE oversampling using KNN = 5 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.699 | 0.906 | 0.789 | 0.849 |
| 0.2 | 0.696 | 0.913 | 0.789 | 0.848 |
| 0.3 | 0.690 | 0.918 | 0.788 | 0.845 |
| 0.4 | 0.693 | 0.913 | 0.787 | 0.846 |
| 0.5 | 0.687 | 0.905 | 0.781 | 0.843 |
| 0.6 | 0.705 | 0.904 | 0.792 | 0.852 |
| 0.7 | 0.684 | 0.909 | 0.781 | 0.842 |
| 0.8 | 0.685 | 0.912 | 0.783 | 0.842 |
| 0.9 | 0.710 | 0.910 | 0.798 | 0.855 |
| 1.0 | 0.692 | 0.913 | 0.788 | 0.846 |
Table 31. Shellcode: SVM-SMOTE oversampling using KNN = 10 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.688 | 0.931 | 0.791 | 0.844 |
| 0.2 | 0.657 | 0.928 | 0.769 | 0.828 |
| 0.3 | 0.683 | 0.924 | 0.786 | 0.841 |
| 0.4 | 0.672 | 0.912 | 0.773 | 0.836 |
| 0.5 | 0.661 | 0.914 | 0.767 | 0.830 |
| 0.6 | 0.661 | 0.912 | 0.766 | 0.830 |
| 0.7 | 0.645 | 0.921 | 0.758 | 0.822 |
| 0.8 | 0.673 | 0.913 | 0.775 | 0.836 |
| 0.9 | 0.665 | 0.920 | 0.772 | 0.832 |
| 1.0 | 0.653 | 0.922 | 0.764 | 0.826 |
Table 32. Welch’s t-test: Shellcode: SVM-SMOTE oversampling using KNN = 3 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.536 | 0.391 | −0.448 | −0.536 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.360 | 0.177 | 0.367 | 0.360 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | −0.003 | 0.735 | 0.255 | −0.003 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | −0.104 | 0.683 | 0.150 | −0.104 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | −0.458 | 0.238 | −0.411 | −0.458 | No statistical diff between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 0.087 | 0.995 | 0.387 | 0.088 | No statistical diff between 0.1 and 0.7 |
| 0.1 vs. 0.8 | 0.080 | 1.165 | 0.455 | 0.080 | No statistical diff between 0.1 and 0.8 |
| 0.1 vs. 0.9 | −0.251 | 0.169 | −0.228 | −0.251 | No statistical diff between 0.1 and 0.9 |
| 0.1 vs. 1 | −0.029 | 0.729 | 0.165 | −0.029 | No statistical diff between 0.1 and 1.0 |
Table 33. Welch’s t-test: Shellcode: SVM-SMOTE oversampling using KNN = 5 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.261 | −0.922 | −0.047 | 0.261 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.551 | −1.321 | 0.148 | 0.551 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 0.550 | −0.896 | 0.281 | 0.549 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 1.127 | 0.154 | 1.194 | 1.128 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | −0.656 | 0.228 | −0.502 | −0.656 | No statistical diff between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 1.575 | −0.348 | 1.192 | 1.575 | 0.1 is better than 0.7 in precision and macro precision |
| 0.1 vs. 0.8 | 1.302 | −1.370 | 0.996 | 1.302 | No statistical diff between 0.1 and 0.8 |
| 0.1 vs. 0.9 | −1.140 | −0.527 | −1.293 | −1.141 | No statistical diff between 0.1 and 0.9 |
| 0.1 vs. 1 | 0.601 | −1.189 | 0.191 | 0.600 | No statistical diff between 0.1 and 1.0 |
Table 34. Welch’s t-test: Shellcode: SVM-SMOTE oversampling using KNN = 10 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 2.545 | 0.361 | 3.248 | 2.546 | 0.1 has better precision, F-Score and macro precision than 0.2 |
| 0.1 vs. 0.3 | 0.443 | 0.883 | 0.721 | 0.443 | No statistical difference between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 1.095 | 2.170 | 2.072 | 1.096 | 0.1 has better recall and F-Score than 0.4 |
| 0.1 vs. 0.5 | 2.668 | 2.578 | 3.245 | 2.669 | 0.1 is better than 0.5 across all metrics |
| 0.1 vs. 0.6 | 2.272 | 3.345 | 2.970 | 2.273 | 0.1 is better than 0.6 across all metrics |
| 0.1 vs. 0.7 | 3.784 | 1.184 | 4.536 | 3.786 | 0.1 is better than 0.7 in precision, F-Score and macro precision |
| 0.1 vs. 0.8 | 1.416 | 1.995 | 1.974 | 1.417 | 0.1 has better recall and F-Score than 0.8 |
| 0.1 vs. 0.9 | 2.069 | 2.263 | 2.546 | 2.070 | 0.1 is better than 0.9 across all metrics |
| 0.1 vs. 1 | 2.347 | 2.071 | 2.613 | 2.348 | 0.1 is better than 1.0 across all metrics |
Table 35. UNSW-NB15 Backdoors: SVM-SMOTE oversampling using KNN = 3 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.915 | 0.945 | 0.930 | 0.957 |
| 0.2 | 0.936 | 0.931 | 0.933 | 0.967 |
| 0.3 | 0.929 | 0.945 | 0.937 | 0.964 |
| 0.4 | 0.928 | 0.939 | 0.934 | 0.964 |
| 0.5 | 0.934 | 0.937 | 0.935 | 0.966 |
| 0.6 | 0.934 | 0.940 | 0.937 | 0.967 |
| 0.7 | 0.930 | 0.934 | 0.932 | 0.965 |
| 0.8 | 0.935 | 0.933 | 0.934 | 0.967 |
| 0.9 | 0.928 | 0.936 | 0.932 | 0.964 |
| 1.0 | 0.929 | 0.935 | 0.932 | 0.964 |
Table 36. Backdoors: SVM-SMOTE oversampling using KNN = 5 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.919 | 0.947 | 0.933 | 0.959 |
| 0.2 | 0.916 | 0.948 | 0.931 | 0.958 |
| 0.3 | 0.909 | 0.949 | 0.929 | 0.954 |
| 0.4 | 0.913 | 0.945 | 0.929 | 0.956 |
| 0.5 | 0.908 | 0.945 | 0.926 | 0.954 |
| 0.6 | 0.926 | 0.938 | 0.932 | 0.963 |
| 0.7 | 0.926 | 0.934 | 0.930 | 0.962 |
| 0.8 | 0.926 | 0.937 | 0.932 | 0.963 |
| 0.9 | 0.912 | 0.946 | 0.928 | 0.956 |
| 1.0 | 0.914 | 0.940 | 0.927 | 0.957 |
Table 37. Backdoors: SVM-SMOTE oversampling using KNN = 10 followed by RU.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.914 | 0.957 | 0.935 | 0.957 |
| 0.2 | 0.907 | 0.950 | 0.928 | 0.953 |
| 0.3 | 0.913 | 0.950 | 0.931 | 0.956 |
| 0.4 | 0.916 | 0.943 | 0.929 | 0.958 |
| 0.5 | 0.920 | 0.949 | 0.934 | 0.960 |
| 0.6 | 0.899 | 0.950 | 0.924 | 0.949 |
| 0.7 | 0.895 | 0.945 | 0.919 | 0.947 |
| 0.8 | 0.918 | 0.943 | 0.930 | 0.959 |
| 0.9 | 0.906 | 0.945 | 0.925 | 0.952 |
| 1.0 | 0.912 | 0.952 | 0.931 | 0.956 |
Table 38. Welch’s t-test: Backdoors: SVM-SMOTE oversampling using KNN = 3 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −2.599 | 2.404 | −0.624 | −2.596 | 0.2 has better precision and macro precision than 0.1, while 0.1 has better recall |
| 0.2 vs. 0.3 | 0.874 | −2.029 | −0.582 | 0.872 | 0.3 has better recall than 0.2 |
| 0.3 vs. 0.4 | 0.143 | 0.829 | 0.607 | 0.144 | No statistical difference between 0.3 and 0.4 |
| 0.3 vs. 0.5 | −0.568 | 1.040 | 0.210 | −0.567 | No statistical difference between 0.3 and 0.5 |
| 0.3 vs. 0.6 | −0.613 | 0.772 | −0.018 | −0.612 | No statistical difference between 0.3 and 0.6 |
| 0.3 vs. 0.7 | −0.111 | 1.919 | 0.992 | −0.110 | 0.3 has better recall than 0.7 |
| 0.3 vs. 0.8 | −0.898 | 1.698 | 0.459 | −0.896 | 0.3 has better recall than 0.8 |
| 0.3 vs. 0.9 | 0.034 | 1.255 | 0.876 | 0.035 | No statistical difference between 0.3 and 0.9 |
| 0.3 vs. 1.0 | −0.021 | 1.198 | 0.604 | −0.020 | No statistical difference between 0.3 and 1.0 |
Table 39. Welch’s t-test: Backdoors: SVM-SMOTE oversampling using KNN = 5 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.235 | −0.177 | 0.182 | 0.235 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.912 | −0.390 | 0.757 | 0.912 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 0.471 | 0.508 | 0.662 | 0.471 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 0.713 | 0.345 | 0.952 | 0.714 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | −0.769 | 1.748 | 0.219 | −0.768 | 0.1 has better recall than 0.6 |
| 0.1 vs. 0.7 | −0.651 | 3.147 | 0.576 | −0.650 | 0.1 has better recall than 0.7 |
| 0.1 vs. 0.8 | −0.875 | 2.508 | 0.279 | −0.874 | 0.1 has better recall than 0.8 |
| 0.1 vs. 0.9 | 0.751 | 0.264 | 0.892 | 0.752 | No statistical diff between 0.1 and 0.9 |
| 0.1 vs. 1 | 0.364 | 1.699 | 0.899 | 0.364 | 0.1 has better recall than 1.0 |
Table 40. Welch’s t-test: Backdoors: SVM-SMOTE oversampling using KNN = 10 followed by RU.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.162 | 1.345 | 1.747 | 1.163 | 0.1 has better F-Score than 0.2 |
| 0.1 vs. 0.3 | 0.138 | 1.515 | 0.649 | 0.138 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | −0.317 | 1.882 | 1.102 | −0.315 | 0.1 has better recall than 0.4 |
| 0.1 vs. 0.5 | −0.881 | 1.124 | 0.077 | −0.880 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | 1.932 | 1.063 | 1.656 | 1.932 | 0.1 has better precision, F-Score and macro precision than 0.6 |
| 0.1 vs. 0.7 | 3.183 | 2.461 | 3.993 | 3.185 | 0.1 is better than 0.7 across all metrics |
| 0.1 vs. 0.8 | −0.697 | 2.812 | 1.129 | −0.694 | 0.1 has better recall than 0.8 |
| 0.1 vs. 0.9 | 1.062 | 1.681 | 2.055 | 1.064 | 0.1 has better recall and F-Score than 0.9 |
| 0.1 vs. 1 | 0.248 | 0.986 | 0.732 | 0.248 | No statistical diff between 0.1 and 1.0 |
Table 41. Worms: RU followed by BSMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.693 | 0.738 | 0.712 | 0.846 |
| 0.2 | 0.705 | 0.838 | 0.766 | 0.852 |
| 0.3 | 0.756 | 0.765 | 0.757 | 0.878 |
| 0.4 | 0.666 | 0.753 | 0.702 | 0.833 |
| 0.5 | 0.643 | 0.715 | 0.676 | 0.821 |
| 0.6 | 0.694 | 0.753 | 0.717 | 0.847 |
| 0.7 | 0.712 | 0.776 | 0.741 | 0.856 |
| 0.8 | 0.716 | 0.757 | 0.735 | 0.858 |
| 0.9 | 0.711 | 0.753 | 0.729 | 0.855 |
| 1.0 | 0.664 | 0.738 | 0.694 | 0.832 |
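From Table 41 onward the ordering is reversed: random undersampling of the majority class is applied first, and BSMOTE (or SVM-SMOTE) oversampling follows on the reduced dataset. Below is a hedged sketch of that ordering under the same assumed imbalanced-learn ratio semantics as the earlier sketch; the interim undersampling ratio is an illustrative placeholder:

```python
# Hedged sketch of the undersample-then-oversample ordering. The interim
# undersampling ratio (0.05) is an assumed placeholder, chosen so that the
# subsequent oversampling target (e.g., 0.1) remains reachable.
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler

def undersample_then_oversample(X, y, under_ratio=0.05, over_ratio=0.1, knn=3):
    # Shrink the majority class first ...
    under = RandomUnderSampler(sampling_strategy=under_ratio, random_state=42)
    X_u, y_u = under.fit_resample(X, y)
    # ... then synthesize minority samples with Borderline SMOTE on the
    # reduced data, so the KNN step sees the undersampled neighborhoods.
    over = BorderlineSMOTE(sampling_strategy=over_ratio, k_neighbors=knn,
                           random_state=42)
    return over.fit_resample(X_u, y_u)
```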
Table 42. UNSW-NB15 Worms: RU followed by BSMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.630 | 0.830 | 0.715 | 0.815 |
| 0.2 | 0.586 | 0.700 | 0.635 | 0.793 |
| 0.3 | 0.630 | 0.750 | 0.682 | 0.815 |
| 0.4 | 0.552 | 0.676 | 0.601 | 0.776 |
| 0.5 | 0.617 | 0.765 | 0.681 | 0.808 |
| 0.6 | 0.590 | 0.765 | 0.664 | 0.795 |
| 0.7 | 0.592 | 0.753 | 0.663 | 0.796 |
| 0.8 | 0.568 | 0.738 | 0.639 | 0.784 |
| 0.9 | 0.612 | 0.750 | 0.671 | 0.806 |
| 1.0 | 0.528 | 0.761 | 0.623 | 0.764 |
Table 43. Worms: RU followed by BSMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.509 | 0.784 | 0.617 | 0.754 |
| 0.2 | 0.507 | 0.738 | 0.600 | 0.753 |
| 0.3 | 0.473 | 0.780 | 0.588 | 0.736 |
| 0.4 | 0.492 | 0.761 | 0.595 | 0.746 |
| 0.5 | 0.483 | 0.723 | 0.578 | 0.741 |
| 0.6 | 0.471 | 0.711 | 0.567 | 0.735 |
| 0.7 | 0.465 | 0.753 | 0.574 | 0.732 |
| 0.8 | 0.493 | 0.792 | 0.606 | 0.746 |
| 0.9 | 0.490 | 0.769 | 0.597 | 0.744 |
| 1.0 | 0.546 | 0.796 | 0.645 | 0.773 |
Table 44. Welch’s t-test: Worms: RU followed by BSMOTE oversampling using KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.318 | −2.276 | −1.529 | −0.318 | 0.2 has better recall than 0.1 |
| 0.2 vs. 0.3 | −1.223 | 1.811 | 0.252 | −1.223 | 0.2 has better recall than 0.3 |
| 0.2 vs. 0.4 | 0.817 | 2.222 | 1.796 | 0.817 | 0.2 has better recall and F-Score than 0.4 |
| 0.2 vs. 0.5 | 2.018 | 2.578 | 2.600 | 2.018 | 0.2 is better than 0.5 across all metrics |
| 0.2 vs. 0.6 | 0.257 | 2.178 | 1.468 | 0.258 | 0.2 has better recall than 0.6 |
| 0.2 vs. 0.7 | −0.149 | 1.296 | 0.603 | −0.149 | 0.2 and 0.7 are statistically equal |
| 0.2 vs. 0.8 | −0.256 | 2.016 | 0.787 | −0.256 | 0.2 has better recall than 0.8 |
| 0.2 vs. 0.9 | −0.115 | 2.136 | 0.928 | −0.115 | 0.2 has better recall than 0.9 |
| 0.2 vs. 1.0 | 1.150 | 1.827 | 2.084 | 1.150 | 0.2 has better recall and F-Score than 1.0 |
Table 45. Welch’s t-test: Worms: RU followed by BSMOTE oversampling using KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.719 | 3.506 | 1.645 | 0.719 | 0.1 is better than 0.2 in recall and F-Score |
| 0.1 vs. 0.3 | −0.001 | 2.156 | 0.841 | −0.001 | 0.1 is better than 0.3 in recall |
| 0.1 vs. 0.4 | 1.628 | 2.584 | 2.613 | 1.629 | 0.1 is better than 0.4 across all metrics |
| 0.1 vs. 0.5 | 0.293 | 1.380 | 0.804 | 0.294 | No statistical difference between 0.1 and 0.5 |
| 0.1 vs. 0.6 | 0.913 | 1.581 | 1.335 | 0.913 | 0.1 is better than 0.6 in recall |
| 0.1 vs. 0.7 | 0.939 | 1.946 | 1.367 | 0.939 | 0.1 is better than 0.7 in recall |
| 0.1 vs. 0.8 | 1.601 | 1.809 | 2.064 | 1.601 | 0.1 is better than 0.8 across all metrics |
| 0.1 vs. 0.9 | 0.394 | 1.597 | 1.013 | 0.394 | 0.1 is better than 0.9 in recall |
| 0.1 vs. 1 | 2.719 | 1.837 | 2.564 | 2.719 | 0.1 is better than 1.0 across all metrics |
Table 46. Welch’s t-test: Worms: RU followed by BSMOTE oversampling using KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.114 | 1.325 | 0.777 | 0.114 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 1.735 | 0.113 | 1.274 | 1.735 | 0.1 is better than 0.3 in precision and macro precision |
| 0.1 vs. 0.4 | 0.437 | 0.577 | 0.640 | 0.437 | No statistical difference between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 1.137 | 1.725 | 1.620 | 1.137 | 0.1 is better than 0.5 in recall and F-Score |
| 0.1 vs. 0.6 | 1.591 | 1.714 | 1.781 | 1.592 | 0.1 is better than 0.6 across all metrics |
| 0.1 vs. 0.7 | 1.539 | 0.771 | 1.373 | 1.539 | 0.1 is better than 0.7 in precision and macro precision |
| 0.1 vs. 0.8 | 0.812 | −0.166 | 0.441 | 0.811 | No statistical difference between 0.1 and 0.8 |
| 0.1 vs. 0.9 | 1.238 | 0.337 | 0.914 | 1.238 | No statistical difference between 0.1 and 0.9 |
| 0.1 vs. 1 | −0.900 | −0.309 | −0.799 | −0.900 | No statistical difference between 0.1 and 1.0 |
Table 47. Shellcode: RU followed by BSMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.718 | 0.909 | 0.802 | 0.859 |
| 0.2 | 0.699 | 0.905 | 0.788 | 0.849 |
| 0.3 | 0.702 | 0.920 | 0.796 | 0.851 |
| 0.4 | 0.688 | 0.903 | 0.780 | 0.844 |
| 0.5 | 0.687 | 0.915 | 0.785 | 0.843 |
| 0.6 | 0.697 | 0.906 | 0.787 | 0.848 |
| 0.7 | 0.689 | 0.905 | 0.782 | 0.844 |
| 0.8 | 0.695 | 0.906 | 0.786 | 0.847 |
| 0.9 | 0.708 | 0.901 | 0.793 | 0.854 |
| 1.0 | 0.707 | 0.917 | 0.798 | 0.853 |
Table 48. Shellcode: RU followed by BSMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.694 | 0.916 | 0.789 | 0.847 |
| 0.2 | 0.697 | 0.914 | 0.791 | 0.848 |
| 0.3 | 0.673 | 0.905 | 0.772 | 0.836 |
| 0.4 | 0.676 | 0.915 | 0.777 | 0.838 |
| 0.5 | 0.677 | 0.923 | 0.781 | 0.838 |
| 0.6 | 0.666 | 0.906 | 0.767 | 0.833 |
| 0.7 | 0.678 | 0.915 | 0.778 | 0.839 |
| 0.8 | 0.673 | 0.910 | 0.774 | 0.836 |
| 0.9 | 0.673 | 0.902 | 0.771 | 0.836 |
| 1.0 | 0.657 | 0.919 | 0.766 | 0.828 |
Table 49. Shellcode: RU followed by BSMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.665 | 0.922 | 0.772 | 0.832 |
| 0.2 | 0.645 | 0.921 | 0.758 | 0.822 |
| 0.3 | 0.652 | 0.925 | 0.765 | 0.826 |
| 0.4 | 0.649 | 0.920 | 0.761 | 0.824 |
| 0.5 | 0.637 | 0.939 | 0.759 | 0.818 |
| 0.6 | 0.643 | 0.922 | 0.758 | 0.821 |
| 0.7 | 0.655 | 0.911 | 0.762 | 0.827 |
| 0.8 | 0.657 | 0.935 | 0.772 | 0.828 |
| 0.9 | 0.644 | 0.922 | 0.758 | 0.822 |
| 1.0 | 0.631 | 0.928 | 0.751 | 0.815 |
Table 50. Welch’s t-test: Shellcode: RU followed by BSMOTE oversampling for KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.985 | 0.811 | 2.133 | 1.985 | 0.1 is statistically better than 0.2 in precision, F-Score and macro precision |
| 0.1 vs. 0.3 | 2.223 | −1.469 | 1.257 | 2.222 | 0.1 is better than 0.3 in precision and macro precision |
| 0.1 vs. 0.4 | 2.211 | 0.987 | 2.933 | 2.212 | 0.1 is better than 0.4 in precision, F-Score and macro precision |
| 0.1 vs. 0.5 | 2.569 | −1.023 | 2.167 | 2.569 | 0.1 is better than 0.5 in precision, F-Score and macro precision |
| 0.1 vs. 0.6 | 2.714 | 0.398 | 3.320 | 2.715 | 0.1 is better than 0.6 in precision, F-Score and macro precision |
| 0.1 vs. 0.7 | 3.947 | 0.775 | 5.141 | 3.948 | 0.1 is better than 0.7 in precision, F-Score and macro precision |
| 0.1 vs. 0.8 | 2.189 | 0.488 | 2.401 | 2.190 | 0.1 is better than 0.8 in precision, F-Score and macro precision |
| 0.1 vs. 0.9 | 2.501 | 0.654 | 1.628 | 2.501 | 0.1 is better than 0.9 in precision, F-Score and macro precision |
| 0.1 vs. 1 | 1.260 | −0.881 | 0.773 | 1.260 | No statistical difference between 0.1 and 1 |
Table 51. Welch’s t-test: Shellcode: RU followed by BSMOTE oversampling for KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.313 | 0.312 | −0.190 | −0.313 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 2.232 | 1.188 | 2.809 | 2.233 | 0.1 is statistically better than 0.3 in precision, F-Score and macro precision |
| 0.1 vs. 0.4 | 2.129 | 0.100 | 1.744 | 2.129 | 0.1 is statistically better than 0.4 in precision, F-Score and macro precision |
| 0.1 vs. 0.5 | 1.925 | −0.884 | 1.407 | 1.924 | 0.1 is statistically better than 0.5 in precision and macro precision |
| 0.1 vs. 0.6 | 2.937 | 1.144 | 3.344 | 2.938 | 0.1 is statistically better than 0.6 in precision, F-Score and macro precision |
| 0.1 vs. 0.7 | 1.496 | 0.181 | 1.459 | 1.496 | No statistical difference between 0.1 and 0.7 |
| 0.1 vs. 0.8 | 2.306 | 0.731 | 2.373 | 2.306 | 0.1 is statistically better than 0.8 in precision, F-Score and macro precision |
| 0.1 vs. 0.9 | 2.133 | 2.963 | 0.000 | 2.134 | 0.1 is better than 0.9 in precision, recall and macro precision |
| 0.1 vs. 1 | 3.678 | −0.536 | 3.496 | 3.678 | 0.1 is statistically better than 1.0 in precision, F-Score and macro precision |
Table 52. Welch’s t-test: Shellcode: RU followed by BSMOTE oversampling for KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.463 | 0.245 | 1.405 | 1.463 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 1.088 | −0.589 | 0.867 | 1.088 | 0.1 and 0.3 are statistically equal |
| 0.1 vs. 0.4 | 1.275 | 0.429 | 1.270 | 1.275 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | 2.761 | −2.422 | 1.815 | 2.760 | 0.1 is better than 0.5 in precision, F-Score and macro precision; 0.5 is better than 0.1 in recall |
| 0.1 vs. 0.6 | 1.991 | 0.067 | 1.771 | 1.991 | 0.1 is better than 0.6 in precision, F-Score and macro precision |
| 0.1 vs. 0.7 | 0.877 | 1.873 | 1.331 | 0.878 | 0.1 is better than 0.7 in recall |
| 0.1 vs. 0.8 | 0.709 | −3.522 | 0.103 | 0.708 | 0.8 is better than 0.1 in recall |
| 0.1 vs. 0.9 | 2.043 | 0.056 | 1.921 | 2.043 | 0.1 is better than 0.9 in precision, F-Score and macro precision |
| 0.1 vs. 1 | 2.199 | −0.713 | 2.296 | 2.199 | 0.1 is better than 1.0 in precision, F-Score and macro precision |
Table 53. Backdoors: RU followed by BSMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.936 | 0.949 | 0.942 | 0.967 |
| 0.2 | 0.945 | 0.953 | 0.949 | 0.972 |
| 0.3 | 0.943 | 0.957 | 0.950 | 0.971 |
| 0.4 | 0.946 | 0.947 | 0.946 | 0.972 |
| 0.5 | 0.935 | 0.951 | 0.943 | 0.967 |
| 0.6 | 0.940 | 0.949 | 0.944 | 0.970 |
| 0.7 | 0.942 | 0.951 | 0.946 | 0.971 |
| 0.8 | 0.941 | 0.950 | 0.945 | 0.970 |
| 0.9 | 0.936 | 0.957 | 0.946 | 0.968 |
| 1.0 | 0.943 | 0.947 | 0.945 | 0.971 |
Table 54. UNSW-NB15 Backdoors: RU followed by BSMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.934 | 0.959 | 0.946 | 0.967 |
| 0.2 | 0.938 | 0.953 | 0.945 | 0.969 |
| 0.3 | 0.941 | 0.951 | 0.946 | 0.970 |
| 0.4 | 0.929 | 0.954 | 0.941 | 0.964 |
| 0.5 | 0.936 | 0.954 | 0.945 | 0.968 |
| 0.6 | 0.934 | 0.948 | 0.941 | 0.967 |
| 0.7 | 0.931 | 0.955 | 0.943 | 0.965 |
| 0.8 | 0.940 | 0.951 | 0.945 | 0.970 |
| 0.9 | 0.930 | 0.953 | 0.941 | 0.965 |
| 1.0 | 0.939 | 0.955 | 0.947 | 0.969 |
Table 55. UNSW-NB15 Backdoors: RU followed by BSMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.933 | 0.961 | 0.947 | 0.966 |
| 0.2 | 0.932 | 0.953 | 0.942 | 0.966 |
| 0.3 | 0.932 | 0.962 | 0.947 | 0.966 |
| 0.4 | 0.913 | 0.954 | 0.933 | 0.956 |
| 0.5 | 0.924 | 0.949 | 0.937 | 0.962 |
| 0.6 | 0.931 | 0.955 | 0.943 | 0.965 |
| 0.7 | 0.931 | 0.953 | 0.942 | 0.965 |
| 0.8 | 0.918 | 0.959 | 0.938 | 0.959 |
| 0.9 | 0.920 | 0.959 | 0.939 | 0.960 |
| 1.0 | 0.922 | 0.949 | 0.935 | 0.960 |
Table 56. Welch’s t-test: Backdoors: RU followed by BSMOTE oversampling for KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −1.685 | −1.207 | −2.040 | −1.686 | 0.2 is better than 0.1 in precision, F-Score and macro precision |
| 0.2 vs. 0.3 | 0.300 | −0.790 | −0.437 | 0.300 | No statistical difference between 0.2 and 0.3 |
| 0.2 vs. 0.4 | −0.208 | 1.660 | 1.007 | −0.207 | 0.2 is better than 0.4 in recall |
| 0.2 vs. 0.5 | 1.429 | 0.564 | 1.487 | 1.429 | No statistical difference between 0.2 and 0.5 |
| 0.2 vs. 0.6 | 0.654 | 0.750 | 1.289 | 0.655 | No statistical difference between 0.2 and 0.6 |
| 0.2 vs. 0.7 | 0.366 | 0.959 | 0.570 | 0.366 | No statistical difference between 0.2 and 0.7 |
| 0.2 vs. 0.8 | 0.483 | 0.447 | 0.731 | 0.484 | No statistical difference between 0.2 and 0.8 |
| 0.2 vs. 0.9 | 1.487 | −1.045 | 0.932 | 1.486 | No statistical difference between 0.2 and 0.9 |
| 0.2 vs. 1 | 0.154 | 0.810 | 1.387 | 0.154 | No statistical difference between 0.2 and 1.0 |
Table 57. Welch’s t-test: UNSW-NB15 Backdoors: RU followed by BSMOTE oversampling for KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.679 | 1.136 | 0.259 | −0.678 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | −1.241 | 1.876 | 0.099 | −1.240 | 0.1 has better recall than 0.3 |
| 0.1 vs. 0.4 | 0.636 | 1.051 | 1.309 | 0.637 | No statistical difference between 0.1 and 0.4 |
| 0.1 vs. 0.5 | −0.343 | 1.636 | 0.523 | −0.342 | 0.1 has better recall than 0.5 |
| 0.1 vs. 0.6 | −0.072 | 3.239 | 2.171 | −0.070 | 0.1 has better recall and F-Score than 0.6 |
| 0.1 vs. 0.7 | 0.777 | 0.646 | 0.986 | 0.778 | No statistical difference between 0.1 and 0.7 |
| 0.1 vs. 0.8 | −1.285 | 1.925 | 0.293 | −1.283 | 0.1 has better recall than 0.8 |
| 0.1 vs. 0.9 | 0.454 | 1.394 | 1.262 | 0.455 | No statistical difference between 0.1 and 0.9 |
| 0.1 vs. 1 | −1.314 | 0.981 | −0.238 | −1.314 | No statistical difference between 0.1 and 1.0 |
Table 58. Welch’s t-test: UNSW-NB15 Backdoors: RU followed by BSMOTE oversampling for KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.139 | 1.792 | 1.178 | 0.140 | 0.1 has better recall than 0.2 |
| 0.1 vs. 0.3 | 0.175 | −0.436 | 0.037 | 0.175 | No statistical difference between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 3.092 | 1.520 | 4.218 | 3.095 | 0.1 has better precision, F-Score and macro precision than 0.4 |
| 0.1 vs. 0.5 | 1.185 | 3.461 | 2.171 | 1.186 | 0.1 has better recall and F-Score than 0.5 |
| 0.1 vs. 0.6 | 0.297 | 1.331 | 0.982 | 0.298 | No statistical difference between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 0.271 | 2.853 | 1.090 | 0.272 | 0.1 has better recall than 0.7 |
| 0.1 vs. 0.8 | 2.282 | 0.532 | 2.166 | 2.283 | 0.1 has better precision, F-Score and macro precision than 0.8 |
| 0.1 vs. 0.9 | 1.592 | 0.820 | 2.369 | 1.593 | 0.1 has better precision, F-Score and macro precision than 0.9 |
| 0.1 vs. 1 | 1.407 | 2.129 | 3.498 | 1.409 | 0.1 has better recall and F-Score than 1.0 |
Table 59. Worms: RU followed by SVM-SMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.645 | 0.796 | 0.709 | 0.822 |
| 0.2 | 0.564 | 0.715 | 0.626 | 0.782 |
| 0.3 | 0.618 | 0.773 | 0.685 | 0.809 |
| 0.4 | 0.609 | 0.803 | 0.692 | 0.804 |
| 0.5 | 0.615 | 0.723 | 0.660 | 0.807 |
| 0.6 | 0.647 | 0.753 | 0.695 | 0.823 |
| 0.7 | 0.612 | 0.807 | 0.696 | 0.806 |
| 0.8 | 0.601 | 0.757 | 0.668 | 0.800 |
| 0.9 | 0.635 | 0.811 | 0.710 | 0.817 |
| 1.0 | 0.645 | 0.746 | 0.690 | 0.822 |
Table 60. Worms: RU followed by SVM-SMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.574 | 0.819 | 0.673 | 0.787 |
| 0.2 | 0.583 | 0.769 | 0.662 | 0.791 |
| 0.3 | 0.553 | 0.773 | 0.639 | 0.776 |
| 0.4 | 0.547 | 0.769 | 0.637 | 0.773 |
| 0.5 | 0.508 | 0.784 | 0.613 | 0.754 |
| 0.6 | 0.514 | 0.780 | 0.617 | 0.757 |
| 0.7 | 0.457 | 0.692 | 0.549 | 0.728 |
| 0.8 | 0.548 | 0.696 | 0.612 | 0.774 |
| 0.9 | 0.535 | 0.734 | 0.617 | 0.767 |
| 1.0 | 0.593 | 0.788 | 0.674 | 0.796 |
Table 61. Worms: RU followed by SVM-SMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.491 | 0.753 | 0.593 | 0.745 |
| 0.2 | 0.465 | 0.788 | 0.585 | 0.732 |
| 0.3 | 0.484 | 0.792 | 0.599 | 0.742 |
| 0.4 | 0.460 | 0.834 | 0.592 | 0.730 |
| 0.5 | 0.470 | 0.788 | 0.587 | 0.735 |
| 0.6 | 0.451 | 0.807 | 0.578 | 0.725 |
| 0.7 | 0.449 | 0.769 | 0.566 | 0.724 |
| 0.8 | 0.482 | 0.723 | 0.577 | 0.741 |
| 0.9 | 0.469 | 0.753 | 0.577 | 0.734 |
| 1.0 | 0.428 | 0.753 | 0.544 | 0.714 |
Table 62. Welch’s t-test: Worms: RU followed by SVM-SMOTE oversampling using KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 2.191 | 2.377 | 4.917 | 2.192 | 0.1 is better than 0.2 across all metrics |
| 0.1 vs. 0.3 | 0.786 | 0.738 | 1.398 | 0.786 | No statistical difference between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 1.150 | −0.267 | 1.062 | 1.150 | No statistical difference between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 0.753 | 2.068 | 2.561 | 0.754 | 0.1 has better recall and F-Score than 0.5 |
| 0.1 vs. 0.6 | −0.044 | 1.261 | 0.529 | −0.044 | No statistical difference between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 0.911 | −0.287 | 0.457 | 0.911 | No statistical difference between 0.1 and 0.7 |
| 0.1 vs. 0.8 | 1.302 | 1.018 | 1.850 | 1.302 | 0.1 has better F-Score than 0.8 |
| 0.1 vs. 0.9 | 0.252 | −0.452 | −0.043 | 0.252 | No statistical difference between 0.1 and 0.9 |
| 0.1 vs. 1 | 0.015 | 1.600 | 0.610 | 0.015 | 0.1 has better recall than 1.0 |
Table 63. Welch’s t-test: Worms: RU followed by SVM-SMOTE oversampling using KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.233 | 2.076 | 0.331 | −0.233 | 0.1 has better recall than 0.2 |
| 0.1 vs. 0.3 | 0.381 | 3.794 | 1.048 | 0.381 | 0.1 has better recall than 0.3 |
| 0.1 vs. 0.4 | 0.798 | 1.408 | 1.358 | 0.798 | No statistical difference between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 1.676 | 0.948 | 1.866 | 1.676 | 0.1 has better precision, F-Score and macro precision than 0.5 |
| 0.1 vs. 0.6 | 1.490 | 2.839 | 1.900 | 1.490 | 0.1 has better recall and F-Score than 0.6 |
| 0.1 vs. 0.7 | 3.023 | 4.288 | 3.745 | 3.024 | 0.1 is better than 0.7 across all metrics |
| 0.1 vs. 0.8 | 0.681 | 3.938 | 1.989 | 0.682 | 0.1 has better recall and F-Score than 0.8 |
| 0.1 vs. 0.9 | 1.055 | 2.429 | 1.746 | 1.056 | 0.1 has better recall and F-Score than 0.9 |
| 0.1 vs. 1 | −0.474 | 1.928 | −0.047 | −0.474 | 0.1 has better recall than 1.0 |
Table 64. Welch’s t-test: UNSW-NB15 Worms: RU followed by SVM-SMOTE oversampling using KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.807 | −0.977 | 0.271 | 0.806 | No statistical difference between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.175 | −1.274 | −0.171 | 0.175 | No statistical difference between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 0.920 | −2.757 | 0.023 | 0.920 | 0.4 has better recall than 0.1 |
| 0.4 vs. 0.5 | −0.276 | 1.533 | 0.160 | −0.276 | 0.4 has better recall than 0.5 |
| 0.4 vs. 0.6 | 0.587 | 1.179 | 0.949 | 0.587 | No statistical difference between 0.4 and 0.6 |
| 0.4 vs. 0.7 | 0.474 | 1.673 | 0.959 | 0.474 | 0.4 has better recall than 0.7 |
| 0.4 vs. 0.8 | −0.744 | 5.094 | 0.674 | −0.743 | 0.4 has better recall than 0.8 |
| 0.4 vs. 0.9 | −0.401 | 2.002 | 0.664 | −0.401 | 0.4 has better recall than 0.9 |
| 0.4 vs. 1.0 | 1.544 | 2.509 | 2.511 | 1.544 | 0.4 is better than 1.0 across all metrics |
Table 65. Shellcode: RU followed by SVM-SMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.704 | 0.901 | 0.790 | 0.852 |
| 0.2 | 0.704 | 0.905 | 0.792 | 0.851 |
| 0.3 | 0.696 | 0.911 | 0.789 | 0.848 |
| 0.4 | 0.693 | 0.910 | 0.787 | 0.846 |
| 0.5 | 0.694 | 0.912 | 0.788 | 0.847 |
| 0.6 | 0.693 | 0.922 | 0.791 | 0.846 |
| 0.7 | 0.681 | 0.905 | 0.777 | 0.840 |
| 0.8 | 0.694 | 0.910 | 0.787 | 0.847 |
| 0.9 | 0.695 | 0.915 | 0.790 | 0.847 |
| 1.0 | 0.688 | 0.910 | 0.784 | 0.844 |
Table 66. Shellcode: RU followed by SVM-SMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.693 | 0.922 | 0.791 | 0.846 |
| 0.2 | 0.682 | 0.921 | 0.783 | 0.841 |
| 0.3 | 0.674 | 0.925 | 0.780 | 0.837 |
| 0.4 | 0.677 | 0.921 | 0.781 | 0.838 |
| 0.5 | 0.689 | 0.911 | 0.785 | 0.844 |
| 0.6 | 0.673 | 0.922 | 0.778 | 0.836 |
| 0.7 | 0.665 | 0.919 | 0.771 | 0.832 |
| 0.8 | 0.682 | 0.925 | 0.785 | 0.841 |
| 0.9 | 0.658 | 0.932 | 0.771 | 0.829 |
| 1.0 | 0.674 | 0.921 | 0.778 | 0.837 |
Table 67. Shellcode: RU followed by SVM-SMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.679 | 0.920 | 0.781 | 0.839 |
| 0.2 | 0.653 | 0.924 | 0.765 | 0.826 |
| 0.3 | 0.647 | 0.923 | 0.761 | 0.823 |
| 0.4 | 0.647 | 0.932 | 0.764 | 0.823 |
| 0.5 | 0.653 | 0.916 | 0.762 | 0.826 |
| 0.6 | 0.645 | 0.917 | 0.757 | 0.822 |
| 0.7 | 0.655 | 0.922 | 0.766 | 0.827 |
| 0.8 | 0.662 | 0.927 | 0.772 | 0.831 |
| 0.9 | 0.635 | 0.924 | 0.753 | 0.817 |
| 1.0 | 0.660 | 0.928 | 0.771 | 0.830 |
Table 68. Welch’s t-test: Shellcode: RU followed by SVM-SMOTE oversampling using KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.019 | −0.964 | −0.240 | 0.019 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.778 | −1.324 | 0.119 | 0.777 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 1.150 | −2.200 | 0.539 | 1.149 | 0.4 has better recall than 0.1 |
| 0.4 vs. 0.5 | −0.064 | −0.327 | −0.083 | −0.064 | No statistical diff between 0.4 and 0.5 |
| 0.4 vs. 0.6 | −0.050 | −1.114 | −0.730 | −0.051 | No statistical diff between 0.4 and 0.6 |
| 0.4 vs. 0.7 | 1.015 | 0.656 | 1.555 | 1.015 | 0.4 has better F-Score than 0.7 |
| 0.4 vs. 0.8 | −0.116 | 0.000 | −0.096 | −0.116 | No statistical diff between 0.4 and 0.8 |
| 0.4 vs. 0.9 | −0.285 | −0.943 | −0.589 | −0.286 | No statistical diff between 0.4 and 0.9 |
| 0.4 vs. 1.0 | 0.573 | 0.000 | 0.660 | 0.573 | No statistical diff between 0.4 and 1.0 |
Table 69. Welch’s t-test: UNSW-NB15 Shellcode: RU followed by SVM-SMOTE oversampling using KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.657 | 0.396 | 0.672 | 0.657 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 1.238 | −0.652 | 1.235 | 1.238 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 1.250 | 0.164 | 1.401 | 1.251 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 0.230 | 2.988 | 0.680 | 0.231 | 0.1 has better recall than 0.5 |
| 0.1 vs. 0.6 | 1.618 | 0.078 | 1.798 | 1.618 | 0.1 has better precision, F-Score and macro precision than 0.6 |
| 0.1 vs. 0.7 | 1.993 | 1.006 | 2.203 | 1.993 | 0.1 has better precision, F-Score and macro precision than 0.7 |
| 0.1 vs. 0.8 | 0.828 | −0.676 | 0.689 | 0.828 | No statistical diff between 0.1 and 0.8 |
| 0.1 vs. 0.9 | 2.968 | −1.813 | 2.707 | 2.967 | 0.1 has better precision, F-Score and macro precision, while 0.9 has better recall |
| 0.1 vs. 1 | 1.404 | 0.180 | 1.533 | 1.404 | 0.1 has better F-Score than 1.0 |
Table 70. Welch’s t-test: UNSW-NB15 Shellcode: RU followed by SVM-SMOTE oversampling for KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 2.910 | −0.429 | 3.681 | 2.911 | 0.1 has better precision, F-Score and macro precision than 0.2 |
| 0.1 vs. 0.3 | 3.491 | −0.383 | 3.042 | 3.491 | 0.1 has better precision, F-Score and macro precision than 0.3 |
| 0.1 vs. 0.4 | 3.501 | −2.305 | 2.932 | 3.501 | 0.1 has better precision, F-Score and macro precision, while 0.4 has better recall |
| 0.1 vs. 0.5 | 3.056 | 0.424 | 2.845 | 3.056 | 0.1 has better precision, F-Score and macro precision than 0.5 |
| 0.1 vs. 0.6 | 3.576 | 0.443 | 3.452 | 3.576 | 0.1 has better precision, F-Score and macro precision than 0.6 |
| 0.1 vs. 0.7 | 2.259 | −0.312 | 2.414 | 2.259 | 0.1 has better precision, F-Score and macro precision than 0.7 |
| 0.1 vs. 0.8 | 1.714 | −1.200 | 1.350 | 1.713 | 0.1 has better precision and macro precision than 0.8 |
| 0.1 vs. 0.9 | 6.692 | −0.572 | 6.036 | 6.692 | 0.1 has better precision, F-Score and macro precision than 0.9 |
| 0.1 vs. 1 | 1.519 | −1.216 | 1.240 | 1.518 | No statistical diff between 0.1 and 1.0 |
Table 71. UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling using KNN = 3.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.937 | 0.950 | 0.943 | 0.968 |
| 0.2 | 0.926 | 0.945 | 0.935 | 0.963 |
| 0.3 | 0.935 | 0.944 | 0.939 | 0.967 |
| 0.4 | 0.921 | 0.953 | 0.937 | 0.960 |
| 0.5 | 0.923 | 0.951 | 0.937 | 0.961 |
| 0.6 | 0.935 | 0.954 | 0.944 | 0.967 |
| 0.7 | 0.919 | 0.950 | 0.934 | 0.959 |
| 0.8 | 0.921 | 0.946 | 0.933 | 0.960 |
| 0.9 | 0.921 | 0.943 | 0.932 | 0.960 |
| 1.0 | 0.917 | 0.943 | 0.930 | 0.958 |
Table 72. UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling using KNN = 5.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.919 | 0.954 | 0.936 | 0.959 |
| 0.2 | 0.917 | 0.955 | 0.936 | 0.958 |
| 0.3 | 0.915 | 0.955 | 0.935 | 0.957 |
| 0.4 | 0.922 | 0.948 | 0.935 | 0.961 |
| 0.5 | 0.915 | 0.956 | 0.935 | 0.957 |
| 0.6 | 0.912 | 0.955 | 0.933 | 0.956 |
| 0.7 | 0.915 | 0.949 | 0.932 | 0.957 |
| 0.8 | 0.916 | 0.952 | 0.934 | 0.958 |
| 0.9 | 0.925 | 0.945 | 0.935 | 0.962 |
| 1.0 | 0.917 | 0.943 | 0.930 | 0.958 |
Table 73. Welch’s t-test: UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling for KNN = 3.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.788 | 1.046 | 1.207 | 0.788 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.180 | 0.882 | 0.817 | 0.180 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | 1.433 | −0.489 | 0.997 | 1.432 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 1.456 | −0.215 | 1.527 | 1.457 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | 0.170 | −0.674 | −0.161 | 0.169 | No statistical diff between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 2.161 | 0.052 | 2.743 | 2.162 | 0.1 has better precision, F-Score and macro precision than 0.7 |
| 0.1 vs. 0.8 | 1.582 | 0.674 | 2.439 | 1.582 | 0.1 has better precision, F-Score and macro precision than 0.8 |
| 0.1 vs. 0.9 | 1.692 | 1.063 | 2.059 | 1.693 | 0.1 has better precision, F-Score and macro precision than 0.9 |
| 0.1 vs. 1 | 2.375 | 1.149 | 4.135 | 2.377 | 0.1 has better precision, F-Score and macro precision than 1.0 |
Table 74. Welch’s t-test: UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling for KNN = 5.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.387 | −0.291 | 0.214 | 0.387 | No statistical diff between 0.1 and 0.2 |
| 0.1 vs. 0.3 | 0.601 | −0.161 | 0.297 | 0.601 | No statistical diff between 0.1 and 0.3 |
| 0.1 vs. 0.4 | −0.357 | 1.140 | 0.496 | −0.357 | No statistical diff between 0.1 and 0.4 |
| 0.1 vs. 0.5 | 0.534 | −0.320 | 0.272 | 0.534 | No statistical diff between 0.1 and 0.5 |
| 0.1 vs. 0.6 | 0.825 | −0.114 | 0.871 | 0.825 | No statistical diff between 0.1 and 0.6 |
| 0.1 vs. 0.7 | 0.405 | 0.776 | 0.730 | 0.405 | No statistical diff between 0.1 and 0.7 |
| 0.1 vs. 0.8 | 0.645 | 0.357 | 0.935 | 0.645 | No statistical diff between 0.1 and 0.8 |
| 0.1 vs. 0.9 | −0.842 | 1.354 | 0.654 | −0.841 | No statistical diff between 0.1 and 0.9 |
| 0.1 vs. 1 | 0.325 | 1.697 | 1.410 | 0.327 | 0.1 has better recall than 1.0 |
Table 75. UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling using KNN = 10.
| Oversampling % | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|
| 0.1 | 0.911 | 0.957 | 0.933 | 0.955 |
| 0.2 | 0.918 | 0.967 | 0.942 | 0.959 |
| 0.3 | 0.907 | 0.967 | 0.936 | 0.953 |
| 0.4 | 0.883 | 0.955 | 0.918 | 0.941 |
| 0.5 | 0.906 | 0.962 | 0.933 | 0.953 |
| 0.6 | 0.894 | 0.947 | 0.920 | 0.947 |
| 0.7 | 0.893 | 0.958 | 0.924 | 0.946 |
| 0.8 | 0.898 | 0.956 | 0.926 | 0.949 |
| 0.9 | 0.909 | 0.952 | 0.930 | 0.954 |
| 1.0 | 0.913 | 0.953 | 0.933 | 0.956 |
Table 76. Welch’s t-test: UNSW-NB15 Backdoors: RU followed by SVM-SMOTE oversampling for KNN = 10.
| Welch’s t-Test (p < 0.10) | Precision t Value | Recall t Value | F-Score t Value | Macro Precision t Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −1.077 | −1.716 | −1.873 | −1.079 | 0.2 has better recall and F-Score than 0.1 |
| 0.2 vs. 0.3 | 1.078 | 0.164 | 1.295 | 1.078 | No statistical diff between 0.2 and 0.3 |
| 0.2 vs. 0.4 | 4.934 | 3.021 | 4.653 | 4.933 | 0.2 is better than 0.4 across all metrics |
| 0.2 vs. 0.5 | 1.656 | 1.915 | 2.109 | 1.657 | 0.2 is better than 0.5 across all metrics |
| 0.2 vs. 0.6 | 2.830 | 4.652 | 3.879 | 2.832 | 0.2 is better than 0.6 across all metrics |
| 0.2 vs. 0.7 | 4.370 | 1.957 | 6.059 | 4.373 | 0.2 is better than 0.7 across all metrics |
| 0.2 vs. 0.8 | 3.076 | 2.285 | 3.790 | 3.078 | 0.2 is better than 0.8 across all metrics |
| 0.2 vs. 0.9 | 1.000 | 2.928 | 2.278 | 1.002 | 0.2 has better recall and F-Score than 0.9 |
| 0.2 vs. 1.0 | 0.729 | 3.945 | 2.667 | 0.732 | 0.2 has better recall and F-Score than 1.0 |
Table 77. Comparison of best oversampling percentages for UNSW-NB15 minority data.
| UNSW-NB15 | KNN | BSMOTE (Oversampling then Undersampling) | SVM-SMOTE (Oversampling then Undersampling) | BSMOTE (Undersampling then Oversampling) | SVM-SMOTE (Undersampling then Oversampling) |
|---|---|---|---|---|---|
| Worms | 3 | 0.1 | 0.1 | 0.2 | 0.1 |
| Worms | 5 | 0.1 | 0.1 | 0.1 | 0.1 |
| Worms | 10 | 0.1 | 0.3 | 0.1 | 0.4 |
| Shellcode | 3 | 0.1 | 0.1 | 0.1 | 0.4 |
| Shellcode | 5 | 0.1 | 0.1 | 0.1 | 0.1 |
| Shellcode | 10 | 0.1 | 0.1 | 0.1 | 0.1 |
| Backdoors | 3 | 0.1 | 0.3 | 0.2 | 0.1 |
| Backdoors | 5 | 0.1 | 0.1 | 0.1 | 0.1 |
| Backdoors | 10 | 0.1 | 0.1 | 0.1 | 0.2 |
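For reference, the four metrics tabulated throughout this section can be computed as in the sketch below, assuming the rare attack class is the positive label and that macro precision is the unweighted mean of per-class precision; the label vectors are hypothetical placeholders:

```python
# Sketch of the four reported metrics using scikit-learn.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0]  # 1 = rare attack (hypothetical labels)
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]  # hypothetical classifier predictions

precision = precision_score(y_true, y_pred)  # minority-class precision
recall = recall_score(y_true, y_pred)        # minority-class recall
f_score = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
macro_precision = precision_score(y_true, y_pred, average="macro")  # unweighted mean over classes

print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f} MacroP={macro_precision:.3f}")
```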