Article

χ2-BidLSTM: A Feature Driven Intrusion Detection System Based on χ2 Statistical Model and Bidirectional LSTM

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
2 Department of Electrical Engineering, University of Science and Technology, Bannu 28100, Pakistan
3 Department of Education, University for Development Studies (UDS), Tamale P.O. Box TL 1350, Ghana
4 Department of Computer Science Information Management, Providence University, Taichung City 433, Taiwan
5 Department of Applied Data Science, Noroff University College, 4612 Kristiansand, Norway
6 Department of Computer Engineering, Sungkyul University, Anyang 14097, Korea
* Author to whom correspondence should be addressed.
Sensors 2022, 22(5), 2018; https://doi.org/10.3390/s22052018
Submission received: 8 February 2022 / Revised: 21 February 2022 / Accepted: 28 February 2022 / Published: 4 March 2022
(This article belongs to the Section Sensor Networks)

Abstract

In a network architecture, an intrusion detection system (IDS) is one of the most commonly used approaches to secure the integrity and availability of critical assets in protected systems. Many existing network intrusion detection systems (NIDS) utilize stand-alone classifier models to classify network traffic as an attack or as normal. Due to the vast data volume, these stand-alone models struggle to reach higher intrusion detection rates with low false alarm rates (FAR). Additionally, irrelevant features in datasets can also increase the running time required to develop a model. However, data can be reduced effectively to an optimal feature set without information loss by employing a dimensionality reduction method, which a classification model then uses for accurate predictions of the various network intrusions. In this study, we propose a novel feature-driven intrusion detection system, namely χ2-BidLSTM, that integrates a χ2 statistical model and bidirectional long short-term memory (BidLSTM). The NSL-KDD dataset is used to train and evaluate the proposed approach. In the first phase, the χ2-BidLSTM system uses a χ2 model to rank all the features and then searches for an optimal subset using a forward best search algorithm. In the next phase, the optimal set is fed to the BidLSTM model for classification purposes. The experimental results indicate that our proposed χ2-BidLSTM approach achieves a detection accuracy of 95.62% and an F-score of 95.65%, with a low FAR of 2.11% on NSL-KDDTest+. Furthermore, our model obtains an accuracy of 89.55%, an F-score of 89.77%, and an FAR of 2.71% on NSL-KDDTest−21, indicating the superiority of the proposed approach over the standard LSTM method and other existing feature-selection-based NIDS methods.

1. Introduction

In present-day society, various organizations and individuals have become more and more reliant upon information and communication technology (ICT), due to the increasing number of useful technologies. The rise in reliance has resulted in a greater demand for more stable and reliable ICT components and services. As a section of ICT, the Internet provides a medium for individuals and organizations to accomplish tasks in their everyday lives. However, as the data flow and the information traffic over the Internet increase, user privacy and transactions become more prone to malicious users’ threats and attacks (intrusions). An intrusion is a succession of activities aiming to jeopardize the security of a network system [1].
Intrusion detection systems (IDSs) have proven essential in the security domain and play a vital role in detecting different types of malicious behaviors and attacks. IDSs can be grouped into three basic strategic concepts (misuse detection, anomaly detection, and a hybrid of the two) [2,3]. Misuse detection is a signature-based approach used to identify a particular matching behavior or signature, compare it to recorded user behavior or activities, and raise an alert [4,5,6]. Anomaly detection is used to spot activities that deviate significantly from normal user activity. In anomaly detection systems, an alarm is raised if there is some deviation from a predefined computer state [7,8,9]. Hybrid detection is a fusion of anomaly and misuse detection methods used to identify malicious activities [2,10,11]. It is vital to mention that network intrusions or attacks can come from outside the network (outsider attacks) or from within the network (insider attacks). Researchers have proposed several different intrusion detection systems over the past few decades using machine learning, deep learning, and other statistical methods. However, in recent times, machine learning and deep learning techniques have gained more attention in many different research areas, including intrusion detection [12]. They have become the most commonly adopted approaches for many intrusion detection systems.
In the literature, machine learning methods such as support vector machines (SVM), decision trees (DT), k-nearest neighbors (KNN), artificial neural networks, and deep neural networks (DNN) have been widely used for the detection of network intruders [13,14,15,16]. However, the performance of these techniques depends heavily on simulated datasets. These datasets often require many features for training, making them computationally expensive for most classification models. Furthermore, using large numbers of features may degrade performance, because some features are redundant or irrelevant to the classification task.
Therefore, it is necessary to perform feature selection before training, to eliminate redundant and irrelevant features from the datasets. Feature selection plays an important role in data preprocessing for most machine learning models. It is the process of selecting features with the highest contributions to the predictive variable. Feature selection can be performed manually or using algorithms (automatically) to reduce the dimensions of the data to a subset of features relevant to building a predictive model. There are three main categories of feature selection in the literature: wrapper, filter, and hybrid methods [17]. The wrapper method utilizes the greedy search strategy to evaluate all possible feature combinations against a criterion for evaluation based on machine learning algorithms [17,18]. The filter method, on the other hand, is not dependent on any machine learning algorithm. Features are selected based on the variable characteristics or intrinsic properties, which are measured via statistical analysis [19,20]. A hybrid or embedded method uses a combination of the properties of wrapper and filter methods [17,21]. Motivated by the positive impact of feature selection on the performance of machine learning and deep learning models for several different problems, we have developed a new IDS called χ2-BidLSTM for network systems.

Main Contributions

The proposed χ2-BidLSTM IDS integrates χ2 with a BidLSTM-based deep learning model. The χ2 statistical model is used for the ranking and selection of features based on their χ2 test scores. The selected optimal features are used to train a bidirectional long short-term memory (BidLSTM)-based recurrent neural network (RNN) for network intrusion detection. The NSL-KDD dataset, which can be accessed via the University of New Brunswick (UNB) data repository, is used to train and evaluate our χ2-BidLSTM model's performance. The contributions of this paper are as follows:
  • Developing and implementing an intrusion detection system based on a bidirectional long short-term memory integrated with a χ2 feature selection model.
  • To the best of our knowledge, no prior work has addressed the hybridization of the bidirectional LSTM model with the χ2 statistical model for network intrusion detection.
  • The χ2-BidLSTM method uses fewer features for training and testing purposes, and thus reduces the complexity of traditional BidLSTM and also improves its classification accuracy.
  • A better classification accuracy than the traditional bidirectional LSTM model is obtained. Additionally, our approach outperforms existing state-of-the-art methods in the literature.
The remainder of this work is organized as follows. Section 2 presents a review of related work in the literature. A description of the dataset and the proposed methodology is presented in Section 3. Section 4 presents the implementation, experimental results, and discussion. Section 5 discusses the model complexity and limitations. In Section 6, we present the conclusions and future directions of the study.

2. Related Work

As an essential element for ensuring security in network systems, IDSs continuously draw the interest of many researchers. Many models have been developed to enhance the effectiveness of IDSs in network systems. In this section, we discuss the literature related to IDS techniques based on machine learning (ML) and deep learning (DL) that leverage feature selection for network anomaly detection.
The authors in [22] proposed a hybrid IDS approach using the NSL-KDD dataset, which focuses on combining the probability distributions of different learning algorithms using information gain (IG) and a voting algorithm to select relevant features for classification. The hybrid method comprises the J48, Random Tree, Meta Bagging, REPTree, Decision Stump, AdaBoostM1, and naive Bayes base classifiers. Although the technique demonstrated a good performance of 99.81% and 98.56% accuracy for binary and multi-class problems, respectively, there are still some concerns that need attention. The feature selection process in this approach is often biased towards variables with distinct values, not variables that have observations with large values, which can result in over-fitting and poor performance. In [23], Hota and Shrivas also developed a framework that utilizes different feature selection methods for irrelevant feature removal. The study investigated the performance of four different feature selection methods (i.e., correlation, information gain, relief, and symmetrical uncertainty) integrated with the C4.5 decision tree algorithm for classification. According to the experimental findings, the most efficient amongst the four selection methods was information gain with C4.5, which obtained a detection accuracy of 99.68% using just 17 features of the NSL-KDD dataset. Although the result is promising, the method tends to be skewed towards attributes with many possible values, leading to poor generalization. Moreover, the entropy model employed in C4.5 involves many time-consuming logarithmic operations, sorting operations, and continuous values, resulting in a high computational cost. Using logistic regression combined with a search strategy, the authors in [24] presented a feature-selection-based IDS model that selects the best subset of features from the KDDCUP'99 and the UNSW-NB15 datasets. The findings indicated that their algorithm yields a good detection accuracy with just 18 selected features from the KDDCUP'99 dataset and 20 selected features from the UNSW-NB15 dataset.
Acharya and Singh [25] proposed a novel bio-driven feature selection approach that utilizes the Intelligent Water Drops algorithm combined with an SVM classifier for network intrusion detection. Their approach, based on a swarm optimization algorithm, produced a high performance on the KDDCUP'99 dataset. The results indicated that the approach obtained a high accuracy of 93.12%, a detection rate of 91.35%, and a reduced false alarm rate of 3.35%, compared to other methods. The authors in [26] introduced a hybrid IDS mechanism that integrates feature selection and clustering using SVM and K-medoids clustering strategies. In this approach, the authors trained a naïve Bayes classifier on the KDDCUP'99 dataset. They evaluated the model using the detection rate, accuracy, and false alarm rate. The evaluation results indicated that the proposed approach obtained a higher detection rate of 90.1%, an accuracy of 91.5%, and a false alarm rate of 6.36%. In [27], Jabbar et al. presented a cluster-oriented ensemble model for network intrusion detection. The model was developed using the alternating decision tree technique (ADTree) and the K-nearest neighbor (KNN) algorithm. In experiments, their proposed approach showed a better performance with regard to accuracy and detection rate, compared to other methods in the literature.
In another approach [28], Paulauskas and Auskalnis introduced an ensemble IDS model. The model was developed using naïve Bayes (NB), C5.0, J48, and the partial decision list algorithms as base classifiers, with the notion of integrating multiple learners. Experimental findings indicated that the approach obtained better accuracy for network intrusion detection. To combat the high-dimensionality problem in network traffic, Zhou and Cheng [29] developed a heuristic feature selection algorithm known as the correlation-based feature selection bat algorithm (CFS-BA). Their strategy obtains the best feature subset by evaluating the correlations among features. The authors further built an ensemble model that incorporates random forest, Forest by Penalizing Attributes (Forest PA), and C4.5 classifiers, combined using the average of probabilities (AoP) rule. The model was trained and evaluated using CIC-IDS2017, KDDCUP'99, and the NSL-KDD datasets. The results showed that the CFS-BA ensemble approach produced a better performance, compared to other existing methods.
In [30], Pham et al. presented a hybrid approach that leverages gain ratio and bagging techniques for network intrusion detection. The former (gain ratio) is utilized to obtain the best features. The latter (bagging) is used to integrate tree-based core classifiers. The approach was evaluated using the NSL-KDD dataset. The results showed that the bagging method combined with J48 as the core classifier produced better performance for 35 features. The authors in [31] proposed a wrapper-based IDS that utilizes a hyper-graph (HG) and a genetic algorithm (GA) for producing possible subsets of features. The approach uses SVM as a classification algorithm, which is evaluated on the NSL-KDD dataset. From the evaluation, their proposed method exhibited a performance accuracy of 96.72% with 35 selected features.
In [32], Abdullah et al. developed an IDS model based on splitting the data input into several subsets relative to the attack types. In this work, IG was used to select the best features for each subset. Using random forest (RF) and partial decision list (PART) as core classifiers, the method was evaluated on the NSL-KDD dataset. Experimental findings illustrated that higher accuracy was achieved with the RF and PART classifiers combined with product probability. In [33], Mohammadi et al. introduced a feature-selection-based IDS that incorporates a clustering algorithm. The methodology was developed using a wrapper method that leverages a linear correlation coefficient (LCC) algorithm and a filter strategy that utilizes a cuttlefish algorithm (CFA). Their approach trained a decision tree (DT) classifier on the KDDCUP’99 dataset. Experimental results with 10-fold cross-validation showed that the method obtained a 95.03% accuracy, a 95.23% detection rate, and a reduced false alarm rate of 1.6%. The authors in [34] developed a hybrid intrusion detection system that integrates principal component analysis (PCA) and information gain (IG) algorithms for feature selection. Their approach was evaluated on the NSL-KDD, Kyoto 2006+, and ISCX 2012 datasets using three ensemble classifiers (i.e., multi-layer perceptron (MLP), SVM, and instance-based learning algorithms (IBK)). The IG-PCA method exhibited more remarkable performance in detection rate, accuracy, and false alarm rate than other existing strategies.
Using the organic combination of several deep learning methods, the authors in [35] proposed a novel anomaly detection approach known as HELAD. The authors first performed feature extraction and selection using the damped incremental statistics algorithm (DISA). An autoencoder was then trained with selected features of a labeled dataset while noting the irregular (abnormal) score labels in the data traffic. They further trained an LSTM model using the irregular score label and obtained the final score using a weighted technique. In the experiment, the HELAD method produced a better accuracy compared to other state-of-the-art methods. In [36], the authors used a multi-objective technique to obtain the best subsets of features, which were then evaluated based on three classification algorithms (i.e., NB, RF, and C4.5). The three algorithms were trained and tested using the CIC-IDS2017, UNSW-NB15, and NSL-KDD datasets. In the experiment, their NSGA2-LR approach showed promising results compared to other methods.

3. Materials and Methods

In this section, we present a detailed description of the dataset used in our study and the proposed methodology.

3.1. Description of Dataset

One of the benchmark datasets utilized by researchers on intrusion detection in the security domain is the NSL-KDD dataset [37]. It is publicly available in the online data repository of the University of New Brunswick (UNB) [38]. The NSL-KDD dataset is a modified form of the KDDCUP’99 dataset presented in [39]. The proposed model was trained and evaluated on the NSL-KDD dataset. We selected this dataset because of the following advantages:
  • The dataset has a reasonable and sufficient number of traffic records that can be used to perform the study.
  • It does not contain redundant traffic in the training set, ensuring that classifiers are not biased toward more frequently occurring records.
  • The testing set has no duplicate records; hence, the performance of learning algorithms is not biased by models with higher detection rates on more frequently occurring records.
  • The number of records selected from each difficulty-level group is inversely proportional to the fraction of records in the original KDD dataset. As a result, the prediction rates of the various ML algorithms differ over a wider range, which makes the accurate evaluation of different learning methods more effective.
The dataset includes a training set (i.e., KDDTrain+) containing 125,973 data records and two different test sets (i.e., KDDTest−21 and KDDTest+) containing 11,850 and 22,544 data records, respectively, as presented in Table 1.

3.2. Data Preprocessing

As presented in Table 2, the NSL-KDD dataset has forty-one (41) features, of which three are non-numeric. The non-numeric features are service, protocol_type, and flag. The dataset has one classification label that can be categorized into two classes (i.e., normal and attack) for a 2-class classification or five classes for a 5-class classification. The five classes include Remote-2-Local (R2L), User-to-Root (U2R), Denial of Service (DoS), Probe, and Normal. Apart from the normal class, the remaining four classes represent the different attack types found in the dataset (see Table 3). Like any neural network model, the proposed approach uses only numeric values as inputs. Hence, we converted all the non-numeric data inputs to numeric form by encoding and assigning unique integer values to each of them. As an integral part of data preprocessing, normalization plays an essential role in bringing all features onto a comparable scale. The commonly used normalization strategies in machine learning and data science include decimal scaling, z-score, and min-max. The NSL-KDD dataset exhibits uneven distribution for some features (e.g., src_bytes and dst_bytes), leading to biased results. To ensure that the proposed model does not produce biased results, we transformed all 41 features to values within the range of 0 to 1 by utilizing the min-max feature scaling technique, as shown in Equation (1):
x = \frac{z - z_{min}}{z_{max} - z_{min}}    (1)
where z signifies the original value of the feature and x represents the newly scaled number.
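To make this preprocessing concrete, the following is a minimal sketch of the encoding and scaling steps, assuming a local CSV copy of KDDTrain+ with a header row; the file path and the "label" column name are placeholders, while the three categorical column names follow the feature list in Table 2.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv("KDDTrain+.csv")  # hypothetical local copy of the training set

# Encode the three non-numeric features as unique integer values.
for col in ["protocol_type", "service", "flag"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Min-max scaling, Equation (1): x = (z - z_min) / (z_max - z_min),
# mapping every one of the 41 features into the range [0, 1].
X = MinMaxScaler().fit_transform(df.drop(columns=["label"]))  # "label" is the class column (assumed name)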

3.3. Proposed Approach

The proposed method (see Figure 1) in this paper is chi-square bidirectional long short-term memory (χ2-BidLSTM), which involves two steps. The first step utilizes a chi-square statistical model to select optimal features from the dataset. The second step trains a bidirectional LSTM predictive model on the optimal set.

3.3.1. Chi-Square (χ2) Feature Selection

A χ2 model computes the χ2 statistic for every feature (F_i) and class (θ) to measure the level of independence between each class and feature. It also indicates features that are most likely to be irrelevant (not class-dependent) for classification [40]. The feature selection process first partitions the data and ranks the features, and then performs a search to obtain an optimal subset from the ranked set of features [41]. The features are ranked using the χ2 test scores. For instance, consider a 2-class (i.e., Normal and Attack) classification with m instances. We can construct a table to obtain the χ2 test scores, as shown in Table 4.
Here, φ represents the total number of instances with feature F_i, and the total number of instances without F_i is represented by m − φ. In addition, δ denotes the total number of normal instances. The total number of attack instances is denoted by m − δ. The χ2 test statistic compares the observed values (O) measured from the data with the expected values (E). From Table 4, the observed values are n, a, ν, and ρ. Let E_n, E_a, E_ν, and E_ρ represent their expected values, respectively. Using the assumption that the two occurrences are independent, we can compute the expected values as:
E_n = \frac{(n + a)(n + \nu)}{m}    (2)

Similarly to Equation (2), we can also compute the expected values E_a, E_ν, and E_ρ. The χ2 test statistic for a goodness-of-fit test is generally obtained by:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}    (3)

where k is the number of different data classes, O_i denotes the observed value measured from the data, and E_i represents the corresponding expected value. We can therefore compute the χ2 test statistic for Table 4 as:

\chi^2 = \frac{(n - E_n)^2}{E_n} + \frac{(a - E_a)^2}{E_a} + \frac{(\nu - E_\nu)^2}{E_\nu} + \frac{(\rho - E_\rho)^2}{E_\rho}    (4)
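As a small illustrative example with made-up counts: suppose m = 100 instances, of which δ = 50 are normal, and feature F_i occurs in φ = 40 instances, with n = 30, a = 10, ν = 20, and ρ = 40. Under independence, E_n = (40 × 50)/100 = 20, E_a = 20, E_ν = 30, and E_ρ = 30, so χ2 = 10²/20 + 10²/20 + 10²/30 + 10²/30 ≈ 16.67. Such a large score signals a strong dependence between F_i and the class, so F_i would rank highly.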
The test statistic in Equation (4) is used to rank the features. Subsequently, we perform a forward best-first search to select the features with the highest test scores as the optimal set. Thus, we first select a feature with the highest χ 2 test result and check its performance using the BidLSTM model. Another feature is added to the subset of features in the subsequent iteration based on the test score. Again, we investigate the performance of the subset of features with the BidLSTM model. This procedure is repeated until every ranked feature is added to the subset. The subset of features with the best performance is then selected as the optimal set and supplied to the BidLSTM predictive model to produce the best performance results.
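The two-phase selection can be sketched as follows, under one plausible reading of the procedure above; scikit-learn's chi2 scorer supplies the ranking, and evaluate is a caller-supplied callback that stands in for training the BidLSTM on a candidate subset and returning its validation accuracy.

import numpy as np
from sklearn.feature_selection import chi2

def rank_features(X, y):
    # chi2 expects non-negative inputs, which min-max scaling guarantees.
    scores, _ = chi2(X, y)
    return list(np.argsort(scores)[::-1])    # feature indexes, highest score first

def forward_best_search(X, y, evaluate):
    ranked = rank_features(X, y)
    best_acc, best_k = -1.0, 0
    for k in range(1, len(ranked) + 1):      # grow the subset one ranked feature at a time
        acc = evaluate(X[:, ranked[:k]], y)  # e.g. BidLSTM validation accuracy
        if acc > best_acc:
            best_acc, best_k = acc, k
    return ranked[:best_k], best_acc         # best-performing subset and its score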

3.3.2. BidLSTM Model

In this subsection, we present the deep learning methods used in this study. We give a brief overview of the working principles of RNNs in general and narrow it down to the main methods: the LSTM and the BidLSTM.
An RNN is a generalized form of the traditional feed-forward network with internal memory, capable of propagating information from the past to the future. It contains loops in the network that enable information to persist. The loops are utilized together with the memory state to process a sequence of input data [42]. RNNs are a category of DNN that are able to utilize previous outputs while maintaining hidden layers that serve as storage for information [43,44]. The same weights and biases are shared across all time steps, which reduces the number of parameters the network must learn. The basic architecture of an RNN is shown in Figure 2. RNNs may be suitable for solving several research problems, but they suffer from the drawback of vanishing gradients, which inspired the development of the LSTM in [45].
The LSTM network is a more advanced version of the RNN that learns long-term dependencies via a gating mechanism. It is a solution to the vanishing gradients problem encountered when training conventional RNNs [47]. The gates and cell state are the LSTM network's basic principles. The cell state is considered as the network's memory and serves as a route to propagate relevant information. The gates (i.e., forget, input, and output gates) control the information flow and determine what knowledge should be kept or discarded (forgotten), as shown in Figure 3. Equations (5)–(9) give the expressions for the cell state and the various gates at time steps T and T − 1, as follows:
i_T = \sigma(\hat{W}_{\alpha i}\,\alpha_T + \hat{W}_{\beta i}\,\beta_{T-1} + \hat{W}_{\zeta i}\,\zeta_{T-1} + \lambda_i)    (5)

f_T = \sigma(\hat{W}_{\alpha f}\,\alpha_T + \hat{W}_{\beta f}\,\beta_{T-1} + \hat{W}_{\zeta f}\,\zeta_{T-1} + \lambda_f)    (6)

\zeta_T = f_T \odot \zeta_{T-1} + i_T \odot \tanh(\hat{W}_{\alpha\zeta}\,\alpha_T + \hat{W}_{\beta\zeta}\,\beta_{T-1} + \lambda_\zeta)    (7)

o_T = \sigma(\hat{W}_{\alpha o}\,\alpha_T + \hat{W}_{\beta o}\,\beta_{T-1} + \hat{W}_{\zeta o}\,\zeta_{T-1} + \lambda_o)    (8)

\beta_T = o_T \odot \tanh(\zeta_T)    (9)
where i_T denotes the input gate, α_T represents the input vector, o_T is the output gate, β_T denotes the output, and f_T represents the forget gate. The cell state is given by ζ_T, with \hat{W} and λ as the weight and bias parameters, respectively, and ⊙ denoting elementwise multiplication.
The proposed method, which is the bidirectional LSTM (BidLSTM), augments the conventional LSTM to enhance a network model's performance. The BidLSTM model utilizes two hidden LSTM layers to process data inputs in two directions (i.e., forward and backward) [48,49]. The basic concept of a BidLSTM model is quite simple. It involves duplicating the primary recursive layer of the neural model. In training, the input to the primary layer consists of the actual data, while that of the duplicate layer is a reverse copy of the data. This technique effectively increases the amount of information available to the model. Figure 4 displays the structure of a BidLSTM model. The Keras library in Python provides a wrapper for the bidirectional layers used for developing BidLSTMs. It permits users to decide the merging mode, which determines how the outputs from both directions (i.e., forward and backward) are combined before feeding them to the subsequent layer. The forward hidden layer (\overrightarrow{\beta}), the backward hidden layer (\overleftarrow{\beta}), and the output (o) of a BidLSTM can be obtained from the following equations [49,50]:

\overrightarrow{\beta}_t = h(\hat{W}_{\alpha\overrightarrow{\beta}}\,\alpha_t + \hat{W}_{\overrightarrow{\beta}\overrightarrow{\beta}}\,\overrightarrow{\beta}_{t-1} + \lambda_{\overrightarrow{\beta}})    (10)

\overleftarrow{\beta}_t = h(\hat{W}_{\alpha\overleftarrow{\beta}}\,\alpha_t + \hat{W}_{\overleftarrow{\beta}\overleftarrow{\beta}}\,\overleftarrow{\beta}_{t+1} + \lambda_{\overleftarrow{\beta}})    (11)

o_t = \hat{W}_{\overrightarrow{\beta}o}\,\overrightarrow{\beta}_t + \hat{W}_{\overleftarrow{\beta}o}\,\overleftarrow{\beta}_t + \lambda_o    (12)

where \hat{W}_{\alpha\overrightarrow{\beta}} denotes the forward hidden weight and \hat{W}_{\alpha\overleftarrow{\beta}} denotes the backward hidden weight. The terms \lambda_{\overrightarrow{\beta}} and \lambda_{\overleftarrow{\beta}} signify the forward and backward bias vectors, respectively, while h denotes the hidden-layer activation function.
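As a small illustration of the Keras wrapper and its merging modes mentioned above (the layer sizes and tensor shape below are arbitrary example values):

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Bidirectional

x = tf.random.normal((1, 10, 8))  # (batch, time steps, features)

# "concat" (the default) stacks the forward and backward outputs, doubling
# the feature dimension; "sum", "mul", and "ave" keep it unchanged.
concat = Bidirectional(LSTM(32, return_sequences=True), merge_mode="concat")
summed = Bidirectional(LSTM(32, return_sequences=True), merge_mode="sum")
print(concat(x).shape)  # (1, 10, 64): outputs concatenated
print(summed(x).shape)  # (1, 10, 32): outputs added elementwise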
It is evident from the literature that bidirectional RNN models perform considerably better than standard models in several research areas, including intrusion detection. In this approach, we evaluated the performance of BidLSTM using the NSL-KDD intrusion detection dataset. The χ2 statistical model was hybridized with the BidLSTM to further improve the model's performance. We carried out experiments to discover the hyper-parameter values that would result in the best IDS performance metrics. The trained χ2-BidLSTM model consisted of an input layer with 64 neurons, three hidden layers with 32 neurons each, and an output layer with five neurons corresponding to the five class labels. We set the number of epochs to 100, with the model weights initialized in the range of 0 to 0.05. A loss function was defined to assess the model weights during training. Since the study deals with a multi-class classification issue, we chose the loss function specified in the Keras library as "categorical_crossentropy". This loss function measures how the predicted values vary from the actual values. We employed ReLU as the activation function for all layers except the output layer, which used Softmax activation. The model used the Adam optimization algorithm with a learning rate of 0.008. Finally, we fitted the model to the dataset using the "fit" function. We adopted the K-fold cross-validation scheme with the value of K set to 10, to evaluate the model's performance.
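The sketch below assembles these settings into a Keras model. It is a minimal reading of the description above, not the authors' exact code: the split between bidirectional and dense hidden layers, the single-time-step input shape, and the variable names are our assumptions.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense
from tensorflow.keras.optimizers import Adam

n_features = 17  # size of the optimal χ2-selected feature set

model = Sequential([
    # 64-unit input layer; each record is treated as a one-step sequence.
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(1, n_features)),
    # Three 32-unit hidden layers.
    Bidirectional(LSTM(32)),
    Dense(32, activation="relu"),
    Dense(32, activation="relu"),
    # Five output neurons for DoS, Probe, U2R, R2L, and Normal.
    Dense(5, activation="softmax"),
])
model.compile(optimizer=Adam(learning_rate=0.008),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=100) inside a 10-fold cross-validation loop.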

4. Experimental Results and Discussion

This section presents the process of implementing χ2-BidLSTM (Algorithm 1) and discusses the experimental results. To investigate the proposed method's robustness, we evaluated the model's performance using different metrics such as accuracy, precision, F-score, and false alarm rate (FAR). In addition, this section compares the findings to those of the standard LSTM model and other techniques in the literature. The complexity and runtime analyses of the proposed algorithm are also provided.
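The paper does not spell out the formulas for these metrics, so, for the binary (attack vs. normal) view of the problem, the sketch below computes them from a confusion matrix using the standard definitions, with FAR taken as FP/(FP + TN).

from sklearn.metrics import confusion_matrix

def ids_metrics(y_true, y_pred):
    # For binary labels, ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)                       # detection rate
    f_score   = 2 * precision * recall / (precision + recall)
    far       = fp / (fp + tn)                       # false alarm rate
    return accuracy, precision, recall, f_score, far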

4.1. Implementation

The proposed method is a feature-selection-based IDS called χ2-BidLSTM. Several tools in the literature can be used to perform this type of experiment. In this study, the Python programming language was utilized to implement the different phases of the proposed method. To be precise, we used Python's TensorFlow and Keras libraries to implement the various components of the model. All experiments and evaluations were carried out using a personal computer (PC) running the Windows 10 Operating System (OS), with the following specifications: Intel Core i5-9300H CPU, 8 GB Random Access Memory (RAM), and an NVIDIA GeForce GTX 1050 with 4 GB of dedicated GDDR5 VRAM. The implementation was in two phases. The first phase was feature selection with a chi-square statistical model. As mentioned earlier, the NSL-KDD dataset contains 41 training features. However, the dataset contains some irrelevant features that can hinder the performance of a model. To improve the prediction accuracy, we used a χ2 feature selection method to narrow down the features to those most relevant for classification. As shown in Algorithm 1, the χ2 model computes the scores between each feature D[i] and class label L and ranks the features in descending order based on their test scores. The result is saved in SELECTED, and the algorithm returns SELECTED, containing the list of ranked feature indexes. After ranking all features, a forward best search was performed to select an optimal set, as stated in Section 3.3.1. The search first finds the feature having the highest χ2 test score using the evaluation function v() and appends it to SELECTED. The next iteration finds the subsequent feature that achieves the highest score in addition to the features in SELECTED. The procedure is repeated until an ideal feature combination is achieved and fed to the classification models to produce the best results.

4.2. Results and Discussion

In this section, we present a discussion of the results obtained from all experiments. We performed a total of four separate experiments using the NSL-KDD dataset with 10-fold cross-validation.

4.2.1. Experiment No. 1: Standard LSTM Trained with All 41 Features

In experiment 1, we investigated a standard LSTM model’s performance using all 41 features for 5-class classification (i.e., DoS, Probe, U2R, R2L, and Normal). Table 5 illustrates the confusion matrices used to evaluate the model’s performance, and the results are reported in Table 6 and Table 7.
Algorithm 1. χ2-BidLSTM implementation process

▹ Obtain the χ2 test scores for each feature using the χ2 statistical model
▹ Rank (sort) the features in descending order based on their χ2 test scores
1:  arr ← {}
2:  for i ← 1 to n do
3:      test_score ← chi.squared(D[i], L)    ▹ Compute the χ2 score between features in the dataset D and class labels L
4:      append (i, test_score) to arr
5:  end for
6:  rank the features of arr    ▹ Sort the features in descending order based on their χ2 test scores
7:  store the feature scores of arr to SELECTED
8:  return SELECTED

▹ Find the feature with the highest test value (v_max) from the ranked features
▹ Obtain the best feature subset for training using forward search
9:  SELECTED ← {}
10: v_max ← −1
11: SubF ← index of D
12: while SubF != NULL do
13:     index ← NULL
14:     for i ← 0 to SubF length do
15:         temp_feature_list ← (SELECTED ∪ SubF[i])
16:         temp_v ← v(temp_feature_list)
17:         if temp_v > v_max then
18:             v_max ← temp_v; index ← i
19:         end if
20:     end for
21:     if index == NULL then
22:         break
23:     else
24:         append SubF[index] to SELECTED
25:         remove SubF[index] from SubF
26:     end if
27: end while
28: return SELECTED as the optimal set

▹ Model training interface with K-fold cross-validation using the optimal set
29: for f = 1 to k do
30:     Training_set = New_List[|V|]
31:     Testing_set = New_List[|V|]    ▹ Construct the training set
32:     for m = 1 to k do
33:         if m == f then
34:             continue
35:         end if
36:         for v = 1 to |V| do
37:             Train[v] + fold[v][m]
38:         end for
39:     end for    ▹ Construct the testing set
40:     for v = 1 to |V| do
41:         Test[v] + fold[v][f]
42:     end for    ▹ Fit BidLSTM model for training and testing
43:     model = BidLSTM()
44:     BidLSTM.Fit(Train)    ▹ Train model with K−1 folds
45:     evaluate model performance with the remaining Kth fold
46:     scores = cross_val_scores()
47:     return scores    ▹ Return the classification accuracy and validation scores
48: end for
49: test model with an unseen test dataset
50: return test scores
As reported in Table 6, the standard LSTM model produces a detection accuracy of 87.26%, a precision of 90.34%, an F-Score of 88.03%, and a false alarm rate of 4.03% for the NSL-KDDTest+ dataset. From Table 7, the model produced a 74.49% detection accuracy, 81.53% precision, an F-Score of 75.76%, and a 5.96% false alarm rate for the NSL-KDDTest−21 dataset.

4.2.2. Experiment No. 2: BidLSTM Trained with All 41 Features

The second phase of the experiments involved a bidirectional LSTM model trained with all 41 features of the dataset. We evaluated the model’s performance using the confusion matrix and experimental findings shown in Table 8, Table 9 and Table 10.
To obtain a better intuition about the numbers of correctly classified attacks and the misclassification rates, we tabulated the confusion matrix in Table 8 for the two test sets (i.e., NSL-KDDTest+ and NSL-KDDTest−21). The vertical labels denote the true classes, while the horizontal labels represent the predicted classes. As mentioned in Section 3.1, the NSL-KDDTest+ dataset contains 22,544 traffic records, of which 12,833 samples are malicious behaviors (attacks) and 9711 are normal behaviors. Out of the 12,833 attack samples, the BidLSTM model correctly detected 11,332, producing a detection accuracy of 91.36% from the confusion matrix, a precision of 92.81%, and an F-score of 91.67%. The model misclassified 1501 attack samples, yielding a low false alarm rate of 3.06%. Similarly, the NSL-KDDTest−21 test set contains 2152 normal and 9698 attack records. BidLSTM correctly detected 7947 attacks, while 1751 were misclassified. Thus, the model achieved 82.05% detection accuracy, 85.91% precision, an F-score of 82.77%, and a false alarm rate of 4.20%, improving on the standard LSTM model. The performances of our proposed approach, BidLSTM, and those of other current techniques using all 41 features of the NSL-KDD dataset are reported in Table 11.
Based on the results presented in Table 11, our approach shows substantial advantages over the other methods on the NSL-KDD dataset. BidLSTM trained using all 41 features reliably exhibits a greater detection accuracy and a better F-score than the other methods on the two test sets (i.e., NSL-KDDTest+ and NSL-KDDTest−21), as shown in Figure 5. Additionally, it also has a lower FAR than these approaches, indicating its effectiveness in detecting intrusions.

4.2.3. Experiment No. 3: Standard LSTM Trained with Reduced Features

In this section, we investigate the performance of the chi-square feature selection integrated with the standard LSTM model. Using the χ2 statistical model, we obtained different subsets of features. These subsets were fed successively to the standard LSTM classification model for training, and the performance of each subset was recorded, as shown in Figure 6. The subset of features that produced the best performance results was then selected as the optimal set. Table 12 presents the list of features in the chosen subset.
As shown in Table 13, with just 21 features, the model could correctly detect 11,377 malicious records out of the 12,833 records in the NSL-KDDTest+ dataset, producing a higher accuracy of 91.16% compared to training the model with all 41 features. Additionally, with the reduced set of features, the LSTM model obtained 91.86% precision, 96.23% specificity, an F-score of 91.32%, and a recall of 91.20%. It also produced a low false alarm rate of 3.77% compared to the standard LSTM trained with all features.
The experimental results from Table 14 indicate that the standard LSTM model trained with the reduced feature set improved the performance by 3.90%.
In the same vein, Table 15 shows that the model improved the performance by 6.49% with the reduced feature set. That is, with the NSL-KDDTest−21 dataset, as presented in Table 13, the model accurately detected 7760 attack records out of a total of 9698, achieving a detection accuracy of 80.98% compared to when it was trained with the complete feature set. It obtained precision, specificity, recall, and F-score values of 84.95%, 95.49%, 80.97%, and 81.68%, respectively, with a low false alarm rate of 4.51%.

4.2.4. Experiment No. 4: BidLSTM Trained with Reduced Features

In this experiment, we evaluated the performance of the proposed BidLSTM method using a reduced feature set. Similarly to experiment No. 3, we obtained different subsets of features after applying χ2 feature selection. Sequentially, these feature subsets were fed to the BidLSTM model for training and classification. We then selected the subset with the best performance as the optimal feature set, as shown in Table 12.
From Table 16, Table 17 and Table 18, we can observe that the BidLSTM model had higher detection accuracy, precision, specificity, F-score, and recall. It is evident from the results that BidLSTM trained with a reduced set of features improves the performance of BidLSTM trained with all 41 features by 4.26% and 7.50% on the NSL-KDDTest+ and NSL-KDDTest−21 datasets, respectively. With 17 features, as presented in Table 12, the model could correctly detect 11,976 attack samples out of the 12,833 samples in the NSL-KDDTest+ dataset, yielding a greater accuracy of 95.62%. Furthermore, it achieved a higher precision of 95.88%, a specificity of 97.89%, an F-score of 95.65%, and a recall of 95.62%. It produced a 2.11% false alarm rate, which was lower than when the model was trained with a complete feature set. In addition, it can be observed from Table 16 that BidLSTM trained with 17 features could effectively detect 8644 attacks from a total of 9698 attack records in the NSL-KDDTest−21 dataset. That is, using the NSL-KDDTest−21 test dataset, BidLSTM with the reduced feature set obtained a detection accuracy of 89.55%, with a low false alarm rate of 2.71%, as shown in Table 18. It also achieved 90.75% precision, 97.29% specificity, 89.55% recall, and an F-score of 89.77%. A comparison of the performances of our proposed method and other existing feature reduction methods is shown in Table 19.
To broaden the scope of the benchmark, we compared the performance of our χ2-BidLSTM approach to that of earlier studies that used the NSL-KDDTest+ and NSL-KDDTest−21 datasets. Figure 7 shows the comparison of our results with some of the earlier techniques on these two test sets. The proposed approach, which outperforms other contemporary IDS algorithms, achieved the best detection accuracy based on experimental findings on the NSL-KDD datasets. In addition to having greater detection accuracy, the proposed approach significantly outperformed prior approaches in terms of the false-alarm-rate measure. Our proposed χ2-BidLSTM method achieved a greater accuracy of 95.62% and an F-score of 95.65%, with a false alarm rate of 2.11% on the NSL-KDDTest+ dataset, using only 17 features. Furthermore, the proposed method obtained an accuracy of 89.55%, an F-score of 89.77%, and a false alarm rate of 2.71%, with just 17 features, according to the experimental findings on the KDDTest−21 dataset, which is superior to the χ2-LSTM method and the other existing approaches based on all the performance measures presented in Table 19. The table shows that feature selection considerably improves the performance of both the standard LSTM and BidLSTM models in predicting network intrusion. Chi-square feature selection, compared to other existing feature-selection-based approaches (i.e., PCA, information gain, mutual information, CFS, and gain ratio), exhibited superiority in terms of detection accuracy and FAR.

5. Model Complexity and Limitations

This section addresses the complexity of the proposed χ2-BidLSTM method and the time needed for training and testing. Furthermore, we also present the limitations of the proposed approach.

5.1. Time Complexity

We evaluated the time complexity of our proposed χ2-BidLSTM approach with regard to the various units of the model implementation: feature ranking using the χ2 statistical model, optimal feature selection using the forward best search algorithm, and the BidLSTM. To obtain the best feature combination set for training, we first used the χ2 statistical model to rank all features in descending order based on their χ2 test scores, as shown in Algorithm 1. The time complexity is O(n × F), where n is the number of classes and F is the number of features. After ranking all the features, we used the forward best search algorithm to obtain the optimal feature set for each model. The algorithm begins with an empty set. It searches for the feature with the highest χ2 test score using the evaluation function and appends it to SELECTED (see Algorithm 1, lines 9–28). The algorithm continuously finds the next feature that can achieve the best evaluation score with the feature(s) in SELECTED until the desired dimension is reached and no additional features can improve the accuracy. The algorithm's time complexity is O(D), where D is the desired dimension.
The core unit of the approach is the BidLSTM model, which trains two LSTM layers (i.e., the first layer in the forward direction and the second in reverse order). The time complexity for training the forward LSTM layer is O(QH + QM_cB_s + HU_f + M_cB_sU_f), where Q denotes the total number of output units, H represents the total number of hidden layers, and M_c represents the total number of memory cell blocks, with B_s as the size of the cell blocks (B_s > 0). The number of units associated with the memory cells, gates, and hidden units in the forward direction is denoted by U_f. With an equal time complexity needed to train the reverse direction, the time complexity for training the BidLSTM predictive model is O(2[QH + QM_cB_s + HU_f + M_cB_sU_f]) = O(W), where W represents the overall number of weights in the network model. Hence, the total computational complexity of the proposed χ2-BidLSTM with respect to time is O((n × F) + D + W).

5.2. Execution Time Analysis

In this subsection, we analyze the testing and training times of the models used in this study. To ensure a fair and accurate analysis, we performed all the experiments using an Intel Core i5 PC with 8 GB of memory. The training and testing times of the various methods are reported in Figure 8a and Figure 8b, respectively. From Figure 8a, it can be seen that the BidLSTM approach with the complete set of features requires more time (9789.24 s) to train than the standard LSTM model (5546.31 s). The reason is that BidLSTM trains two LSTMs, with an input shape given by the data length and the number of features used. In this domain, the most important characteristic of a model is its ability to accurately, actively, and effectively detect network intrusion. As such, there is a trade-off between training time and performance. Therefore, the standard LSTM model may require a shorter training time, but the BidLSTM model exhibits better accuracy in detecting intrusions. From Figure 8, it is evident that feature selection reduces the training and testing times of both methods considerably. With the reduced set of features, the standard LSTM model requires a training time of 2397.36 s, while it takes 4678.62 s to train the BidLSTM model. Thus, feature selection not only improves the performance of the various models but also minimizes their computational times.

5.3. Limitations

The experimental findings demonstrate that the proposed method can efficiently detect network intrusions. However, more study in this domain is still required to improve the overall performance of the proposed approach further. The proposed χ2-BidLSTM model has a higher complexity and needs more training time than the standard LSTM model, as demonstrated by the complexity and runtime analyses of the model. In the real-world network scenario of computer systems, new forms of intrusions emerge continually, which may not be captured by the NSL-KDD dataset. The absence of emerging novel attacks in the dataset can make it difficult for the proposed method to adapt to recently emerging attacks in a network system. These are the primary limitations of the χ2-BidLSTM-based intrusion detection model.

6. Conclusions and Future Directions

6.1. Conclusions

Even though various machine learning methods have been proposed to enhance the performance of IDSs, most existing intrusion detection methods still struggle to achieve good performance. This study offers a new IDS approach called χ2-BidLSTM that integrates the chi-square (χ2) statistical model with a bidirectional long short-term memory (BidLSTM) model. We used the χ2 statistical model to reduce the dataset to the optimal set, to handle the imbalance and high dimensionality of the data. The NSL-KDD dataset with 10-fold cross-validation was used to evaluate the performance of our proposed χ2-BidLSTM approach, and the results were compared to other existing intrusion detection approaches. The experimental findings indicated that the proposed χ2-BidLSTM method improved the intrusion detection accuracies of the standard LSTM and BidLSTM models by 3.90% and 4.26%, respectively, on the NSL-KDDTest+ test set, and by 6.49% and 7.50%, respectively, on the NSL-KDDTest−21 test set. Compared with previously existing techniques that utilize feature selection, the proposed approach achieved higher detection accuracy and F-scores on the two test datasets, while maintaining lower false alarm rates. Furthermore, the proposed χ2-BidLSTM method exhibited good performance in detecting minority attack types such as User-to-Root (U2R) and Remote-2-Local (R2L) attacks compared to the other techniques, indicating the robustness and effectiveness of our proposed approach.

6.2. Future Directions

The future direction of this study is to explore more feature selection algorithms to further improve the intrusion detection rate of our model. Additionally, as mentioned in Section 5.3, although the proposed approach has a higher intrusion detection accuracy with a low false alarm rate (FAR), the approach has a significant execution-time cost due to the BidLSTM deep network architecture and computations within the LSTM memory cells. Hence, we intend to investigate how to reduce the computational complexity of the proposed method further while maintaining high detection accuracy and a low false alarm rate. We also plan to investigate the performance of our approach with the latest intrusion detection datasets with real-world traffic, such as the UNSW-NB15 and CIC-IDS2017 datasets.

Author Contributions

Y.I.—conceptualization, formal analysis, methodology, software, validation, writing—original draft, and writing—review and editing; Y.X.—investigation, software, resources, supervision, and writing—review and editing; L.A.—formal analysis, methodology, validation, visualization, and writing—original draft; Z.A.-R.—formal analysis, methodology, validation, visualization, and writing—original draft; Y.-C.H.—formal analysis, methodology, validation, visualization, and writing—original draft; S.K.—formal analysis, methodology, validation, visualization, and writing—original draft; S.L.—formal analysis, methodology, validation, visualization, and writing—original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset used or analyzed during the current study is publicly available from the University of New Brunswick (UNB) data repository at https://www.unb.ca/cic/datasets/nsl.html (accessed on 10 January 2022).

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF), grant funded by the Korean government (MSIT) (No. 2021R1F1A1063319).

Conflicts of Interest

The authors declare there is no conflict of interest regarding the publication of this paper. The manuscript has been submitted solely to this journal and is not published, in press, or submitted elsewhere.

Abbreviations

The following abbreviations are used in this study:
IDS: Intrusion Detection System
NIDS: Network Intrusion Detection System
LSTM: Long Short-Term Memory
BidLSTM: Bidirectional Long Short-Term Memory
NSL-KDD: New Security Laboratory Knowledge Discovery Data
DL: Deep Learning
ML: Machine Learning
DNN: Deep Neural Network
RNN: Recurrent Neural Networks
ICT: Information and Communication Technology
PCA: Principal Component Analysis
EL: Ensemble Learning
FSSL-EL: Fuzzy-Based Semi-Supervised Learning with Ensemble Learning
TSE-IDS: Two-Stage Ensemble Intrusion Detection System
CFS-BA: Correlation-Based Feature Selection with Bat Algorithm
FS: Feature Selection
GRA: Greedy Randomized Adaptive
EM-FS: Ensemble Model with Feature Selection
MMFSA: Multi-Measure Feature Selection Algorithm
LSSVM: Least Square Support Vector Machine
CP-ARM: Central Points with Association Rule Mining

References

  1. Agarwal, R.; Joshi, M.V. PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection), Technical Report. In Proceedings of the First SIAM Conference on Data Mining, Chicago, IL, USA, 4–7 April 2001. [Google Scholar]
  2. Ghosh, A.K.; Schwartzbard, A. A Study in Using Neural Networks for Anomaly and Misuse Detection. In Proceedings of the 8th USENIX Security Symposium, Washington, DC, USA, 23–26 August 1999. [Google Scholar]
  3. Lee, W.; Stolfo, S.; Mok, K. A data mining framework for building intrusion detection models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344), Oakland, CA, USA, 14 May 1999; pp. 120–132. [Google Scholar]
  4. Beqiri, E. Neural Networks for Intrusion Detection Systems. In Global Security, Safety, and Sustainability. ICGS3 2009. Communications in Computer and Information Science; Jahankhani, H., Hessami, A.G., Hsu, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 45. [Google Scholar] [CrossRef]
  5. Cannady, J. Artificial neural networks for misuse detection. In Proceedings of the National Information Systems Security Conference, Arlington, VA, USA, 5 October 1998; Volume 26, pp. 443–456. [Google Scholar]
  6. Sen, J.; Mehtab, S. Machine Learning Applications in Misuse and Anomaly Detection. In Security and Privacy From a Legal, Ethical, and Technical Perspective; IntechOpen: London, UK, 2020; Available online: https://www.intechopen.com/chapters/72542 (accessed on 10 January 2022).
  7. Nassif, A.B.; Talib, M.; Nasir, Q.; Dakalbab, F.M. Machine Learning for Anomaly Detection: A Systematic Review. IEEE Access 2021, 9, 78658–78700. [Google Scholar] [CrossRef]
  8. Jose, S.; Malathi, D.; Reddy, B.; Jayaseeli, D. A survey on anomaly based host intrusion detection system. J. Phys. Conf. Ser. 2018, 1000, 012049. [Google Scholar] [CrossRef]
  9. Jia, Q.; Chen, C.X.; Gao, X.; Li, X.; Yan, B.; Ai, G.Q.; Li, J.; Xu, J. Anomaly detection method using center offset measurement based on leverage principle. Knowl. Based Syst. 2020, 190, 105191. [Google Scholar] [CrossRef]
  10. Kim, G.; Lee, S.; Kim, S. A Novel Hybrid Intrusion Detection Method Integrating Anomaly Detection with Misuse Detection. Expert Syst. Appl. 2014, 41, 1690–1700. [Google Scholar] [CrossRef]
  11. Hajisalem, V.; Babaie, S. A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection. Comput. Netw. 2018, 136, 37–50. [Google Scholar] [CrossRef]
  12. Aldweesh, A.; Derhab, A.; Emam, A.Z. Deep learning approaches for anomaly-based intrusion detection systems: A survey, taxonomy, and open issues. Knowl. Based Syst. 2020, 189, 105124. [Google Scholar] [CrossRef]
  13. Wang, H.; Gu, J.; Wang, S. An effective intrusion detection framework based on SVM with feature augmentation. Knowl. Based Syst. 2017, 136, 130–139. [Google Scholar] [CrossRef]
  14. Zhang, J.; Zulkernine, M. A hybrid network intrusion detection technique using random forests. In Proceedings of the First International Conference on Availability, Reliability and Security (ARES’06), Vienna, Austria, 20–22 April 2006; pp. 262–269. [Google Scholar]
  15. Horng, S.J.; Su, M.Y.; Chen, Y.H.; Kao, T.W.; Chen, R.J.; Lai, J.L.; Perkasa, C.D. A Novel Intrusion Detection System Based on Hierarchical Clustering and Support Vector Machines. Expert Syst. Appl. 2011, 38, 306–313. [Google Scholar] [CrossRef]
  16. Bamakan, S.M.H.; Wang, H.; Shi, Y. Ramp loss K-Support Vector Classification-Regression; a robust and sparse multi-class approach to the intrusion detection problem. Knowl. Based Syst. 2017, 126, 113–126. [Google Scholar] [CrossRef]
  17. Jovic, A.; Brkic, K.; Bogunovic, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205. [Google Scholar]
  18. Zhu, Y.; Liang, J.; Chen, J.; Ming, Z. An improved NSGA-III algorithm for feature selection used in intrusion detection. Knowl. Based Syst. 2017, 116, 74–85. [Google Scholar] [CrossRef]
  19. Sánchez-Maroño, N.; Alonso-Betanzos, A.; Tombilla-Sanromán, M. Filter Methods for Feature Selection—A Comparative Study. In Intelligent Data Engineering and Automated Learning; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  20. Jan, S.U.; Koo, I. A Novel Feature Selection Scheme and a Diversified-Input SVM-Based Classifier for Sensor Fault Classification. J. Sens. 2018, 2018, 7467418:1–7467418:21. [Google Scholar] [CrossRef]
  21. Chen, J.; Qi, X.; Chen, L.; Chen, F.; Cheng, G. Quantum-inspired ant lion optimized hybrid k-means for cluster analysis and intrusion detection. Knowl. Based Syst. 2020, 203, 106167. [Google Scholar] [CrossRef]
  22. Aljawarneh, S.A.; Aldwairi, M.; Yassein, M.B. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 2018, 25, 152–160. [Google Scholar] [CrossRef]
  23. Hota, H.; Shrivas, A. Decision Tree Techniques Applied on NSL-KDD Data and Its Comparison with Various Feature Selection Techniques. In Advanced Computing, Networking and Informatics—Volume 1. Smart Innovation, Systems and Technologies; Kumar Kundu, M., Mohapatra, D., Konar, A., Chakraborty, A., Eds.; Springer: Cham, Switzerland, 2014; Volume 27. [Google Scholar] [CrossRef]
  24. Khammassi, C.; Krichen, S. A GA-LR wrapper approach for feature selection in network intrusion detection. Comput. Secur. 2017, 70, 255–277. [Google Scholar] [CrossRef]
  25. Acharya, N.; Singh, S. An IWD-based feature selection method for intrusion detection system. Soft Comput. 2018, 22, 4407–4416. [Google Scholar] [CrossRef]
  26. Khalvati, L.; Keshtgary, M.; Rikhtegar, N. Intrusion Detection based on a Novel Hybrid Learning Approach. J. Data Min. 2018, 6, 157–162. [Google Scholar]
  27. Jabbar, M.A.; Aluvalu, R.; Reddy, S.S. Cluster Based Ensemble Classification for Intrusion Detection System. In Proceedings of the 9th International Conference on Machine Learning and Computing, Singapore, 24–26 February 2017. [Google Scholar]
  28. Paulauskas, N.; Auskalnis, J. Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset. In Proceedings of the 2017 Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 27 April 2017; pp. 1–5. [Google Scholar]
  29. Zhou, Y.; Cheng, G.; Jiang, S.; Dai, M. Building an Efficient Network Intrusion Detection System Based on Feature Selection and Ensemble Classifier. arXiv 2019, arXiv:abs/1904.01352. [Google Scholar]
30. Pham, N.; Foo, E.; Suriadi, S.; Jeffrey, H.; Lahza, H.F. Improving performance of intrusion detection system using ensemble methods and feature selection. In Proceedings of the Australasian Computer Science Week Multiconference, Brisbane, QLD, Australia, 29 January–2 February 2018. [Google Scholar]
  31. Raman, M.; Somu, N.; Kannan, K.; Liscano, R.; Sriram, V. An efficient intrusion detection system based on hypergraph—Genetic algorithm for parameter optimization and feature selection in support vector machine. Knowl. Based Syst. 2017, 134, 1–12. [Google Scholar] [CrossRef]
  32. Abdullah, M.; Alshannaq, A.; Balamash, A.; Almabdy, S. Enhanced Intrusion Detection System using Feature Selection Method and Ensemble Learning Algorithms. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 2018, 16, 48–55. [Google Scholar]
  33. Mohammadi, S.; Mirvaziri, H.; Ahsaee, M.G.; Karimipour, H. Cyber intrusion detection by combined feature selection algorithm. J. Inf. Secur. Appl. 2019, 44, 80–88. [Google Scholar] [CrossRef]
  34. Salo, F.; Nassif, A.B.; Essex, A. Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection. Comput. Netw. 2019, 148, 164–175. [Google Scholar] [CrossRef]
  35. Zhong, Y.; Chen, W.; Wang, Z.; Chen, Y.; Wang, K.; Li, Y.; Yin, X.; Shi, X.; Yang, J.; Li, K. HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning. Comput. Netw. 2020, 169, 107049. [Google Scholar] [CrossRef]
  36. Khammassi, C.; Krichen, S. A NSGA2-LR wrapper approach for feature selection in network intrusion detection. Comput. Netw. 2020, 172, 107183. [Google Scholar] [CrossRef]
  37. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  38. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 1 January 2022).
  39. Available online: https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data (accessed on 1 January 2022).
  40. Liu, H.; Setiono, R. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, Herndon, VA, USA, 5–8 November 1995; pp. 388–391. [Google Scholar]
41. Ali, L.; Khan, S.; Golilarz, N.A.; Yakubu, I.; Qasim, I.; Noor, A.; Nour, R. A Feature-Driven Decision Support System for Heart Failure Prediction Based on χ2 Statistical Model and Gaussian Naive Bayes. Comput. Math. Methods Med. 2019, 2019, 6314328. [Google Scholar] [CrossRef] [PubMed]
  42. Cui, Z.; Ke, R.; Wang, Y. Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction. arXiv 2018, arXiv:abs/1801.02143. [Google Scholar]
43. Berman, D.S.; Buczak, A.; Chavis, J.S.; Corbett, C. A Survey of Deep Learning Methods for Cyber Security. Information 2019, 10, 122. [Google Scholar] [CrossRef]
  44. Kim, J.; Kim, H. Applying Recurrent Neural Network to Intrusion Detection with Hessian Free Optimization. In International Workshop on Information Security Applications; Springer: Cham, Switzerland, 2015; pp. 357–369. [Google Scholar]
  45. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  46. Available online: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed on 5 January 2022).
  47. Hochreiter, S.; Schmidhuber, J. LSTM can Solve Hard Long Time Lag Problems. NIPS 1996, 9, 473–479. [Google Scholar]
48. Schuster, M.; Paliwal, K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  49. Graves, A.; Mohamed, A.; Hinton, G.E. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649. [Google Scholar]
  50. Roy, B.; Cheung, H. A Deep Learning Approach for Intrusion Detection in Internet of Things using Bi-Directional Long Short-Term Memory Recurrent Neural Network. In Proceedings of the 2018 28th International Telecommunication Networks and Applications Conference (ITNAC), Sydney, NSW, Australia, 21–23 November 2018; pp. 1–6. [Google Scholar]
51. Ma, T.; Wang, F.; Cheng, J.; Yu, Y.; Chen, X. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks. Sensors 2016, 16, 1701. [Google Scholar] [CrossRef]
  52. Thaseen, I.; Banu, J.; Lavanya, K.; Ghalib, M.R.; Abhishek, K. An integrated intrusion detection system using correlation-based attribute selection and artificial neural network. Trans. Emerg. Telecommun. Technol. 2021, 32, e4014. [Google Scholar]
53. Yang, Y.; Zheng, K.; Wu, C.; Niu, X.; Yang, Y. Building an Effective Intrusion Detection System Using the Modified Density Peak Clustering Algorithm and Deep Belief Networks. Appl. Sci. 2019, 9, 238. [Google Scholar] [CrossRef]
  54. Yin, C.; Zhu, Y.; Fei, J.; He, X.Z. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access 2017, 5, 21954–21961. [Google Scholar] [CrossRef]
55. Javaid, A.; Niyaz, Q.; Sun, W.; Alam, M. A Deep Learning Approach for Network Intrusion Detection System. EAI Endorsed Trans. Secur. Saf. 2016, 3, e2. [Google Scholar]
  56. Kanna, P.R.; Santhi, P. Unified Deep Learning approach for Efficient Intrusion Detection System using Integrated Spatial-Temporal Features. Knowl. Based Syst. 2021, 226, 107132. [Google Scholar] [CrossRef]
  57. Gao, Y.; Liu, Y.; Jin, Y.; Chen, J.; Wu, H. A Novel Semi-Supervised Learning Approach for Network Intrusion Detection on Cloud-Based Robotic System. IEEE Access 2018, 6, 50927–50938. [Google Scholar] [CrossRef]
  58. Tama, B.A.; Comuzzi, M.; Rhee, K. TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-Based Intrusion Detection System. IEEE Access 2019, 7, 94497–94507. [Google Scholar] [CrossRef]
  59. Kanakarajan, N.K.; Muniasamy, K. Improving the Accuracy of Intrusion Detection Using GAR-Forest with Feature Selection. In Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA), Durgapur, West Bengal, India, 16–18 November 2015; Springer: New Delhi, India, 2016. [Google Scholar]
60. Herrera-Semenets, V.; Bustio-Martínez, L.; Hernández-León, R.; van den Berg, J. A multi-measure feature selection algorithm for efficacious intrusion detection. Knowl. Based Syst. 2021, 227, 107264. [Google Scholar] [CrossRef]
61. Ambusaidi, M.A.; He, X.; Nanda, P.; Tan, Z. Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm. IEEE Trans. Comput. 2016, 65, 2986–2998. [Google Scholar] [CrossRef]
  62. Moustafa, N.; Slay, J. A hybrid feature selection for network intrusion detection systems: Central points. arXiv 2017, arXiv:abs/1707.05505. [Google Scholar]
Figure 1. The proposed IDS architecture.
Figure 2. An unrolled RNN architecture [46].
Figure 3. The LSTM memory cell.
Figure 4. A bidirectional LSTM architecture.
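The bidirectional layer of Figure 4 maps directly onto standard deep-learning APIs. The Keras sketch below is a minimal illustration only, assuming a 64-unit layer, a 0.2 dropout rate, the Adam optimizer, and one-step sequences; none of these hyperparameters are taken from the paper:

```python
# Minimal BidLSTM classifier sketch (assumed hyperparameters, not the
# authors' exact configuration).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Dropout

NUM_FEATURES = 41  # all NSL-KDD features; 17 after chi-square selection
NUM_CLASSES = 5    # Normal, DoS, Probe, R2L, U2R

model = Sequential([
    # Each record enters as a one-step sequence of NUM_FEATURES values;
    # the forward and backward LSTM outputs are concatenated (Figure 4).
    Bidirectional(LSTM(64), input_shape=(1, NUM_FEATURES)),
    Dropout(0.2),
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# X_train has shape (samples, 1, NUM_FEATURES); y_train is one-hot encoded.
# model.fit(X_train, y_train, epochs=100, batch_size=256)
```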
Figure 5. Comparison of results against existing methods on NSL-KDDTest+ and NSL-KDDTest−21 using all 41 features. (a) Performance results on NSL-KDDTest+; (b) performance results on NSL-KDDTest−21.
Figure 6. Performance results of different subsets of features on NSL-KDDTest+ and NSL-KDDTest−21. (a) Performance of different subsets on NSL-KDDTest+; (b) performance of different subsets on NSL-KDDTest−21.
Figure 7. Comparison of results against existing feature selection methods on NSL-KDDTest+ and NSL-KDDTest−21. (a) Performance results on NSL-KDDTest+; (b) performance results on NSL-KDDTest−21.
Figure 8. Training and testing times of the methods used in the study. (a) Training times of the various methods in seconds; (b) testing times of the various methods in seconds.
Table 1. Traffic sample breakdown of the NSL-KDD dataset.

| Attack Type | KDDTrain+ | KDDTest+ | KDDTest−21 |
|---|---|---|---|
| DoS | 45,927 | 7458 | 4342 |
| Probe | 11,656 | 2421 | 2402 |
| U2R | 52 | 200 | 200 |
| R2L | 995 | 2754 | 2754 |
| Normal | 67,343 | 9711 | 2152 |
| Total | 125,973 | 22,544 | 11,850 |
Table 2. List of all 41 features in the NSL-KDD dataset.

| No. | Feature | Code | No. | Feature | Code |
|---|---|---|---|---|---|
| 01 | duration | F01 | 22 | is_guest_login | F22 |
| 02 | protocol_type | F02 | 23 | count | F23 |
| 03 | service | F03 | 24 | srv_count | F24 |
| 04 | flag | F04 | 25 | serror_rate | F25 |
| 05 | src_bytes | F05 | 26 | srv_serror_rate | F26 |
| 06 | dst_bytes | F06 | 27 | rerror_rate | F27 |
| 07 | land | F07 | 28 | srv_rerror_rate | F28 |
| 08 | wrong_fragment | F08 | 29 | same_srv_rate | F29 |
| 09 | urgent | F09 | 30 | diff_srv_rate | F30 |
| 10 | hot | F10 | 31 | srv_diff_host_rate | F31 |
| 11 | num_failed_logins | F11 | 32 | dst_host_count | F32 |
| 12 | logged_in | F12 | 33 | dst_host_srv_count | F33 |
| 13 | num_compromised | F13 | 34 | dst_host_same_srv_rate | F34 |
| 14 | root_shell | F14 | 35 | dst_host_diff_srv_rate | F35 |
| 15 | su_attempted | F15 | 36 | dst_host_same_src_port_rate | F36 |
| 16 | num_root | F16 | 37 | dst_host_srv_diff_host_rate | F37 |
| 17 | num_file_creations | F17 | 38 | dst_host_serror_rate | F38 |
| 18 | num_shells | F18 | 39 | dst_host_srv_serror_rate | F39 |
| 19 | num_access_files | F19 | 40 | dst_host_rerror_rate | F40 |
| 20 | num_outbound_cmds | F20 | 41 | dst_host_srv_rerror_rate | F41 |
| 21 | is_host_login | F21 | | | |
Table 3. Categories of the various attack types.

| Class | Training Set | Testing Set |
|---|---|---|
| DoS | smurf, neptune, land, back, teardrop, pod | land, pod, apache2, processtable, neptune, smurf, worm, udpstorm, back, mailbomb, teardrop |
| Probe | satan, nmap, portsweep, ipsweep | portsweep, satan, nmap, ipsweep, saint, mscan |
| U2R | perl, loadmodule, buffer-overflow, rootkit | ps, rootkit, sqlattack, buffer-overflow, xterm, loadmodule, perl |
| R2L | imap, warezmaster, ftp-write, warezclient, spy, phf, multihop, guess-passwd | warezmaster, snmpguess, phf, xsnoop, httptunnel, snmpgetattack, sendmail, warezclient, ftp-write, named, xlock, spy, imap, guess-passwd, multihop |
| Normal | normal | normal |
Table 4. Computation of chi-square test scores.

| | Normal Class | Attack Class | Total |
|---|---|---|---|
| F_i occurs | n | a | n + a = φ |
| F_i does not occur | ν | ρ | ν + ρ = m − φ |
| Total | n + ν = δ | a + ρ = m − δ | m |
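With the notation of Table 4, the chi-square score of a feature F_i reduces to the standard closed form for a 2 × 2 contingency table. This is the usual textbook statistic implied by the table's margins, shown here as a worked equation rather than a quotation of the paper:

$$
\chi^2(F_i) \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E}
\;=\; \frac{m\,(n\rho - a\nu)^2}{\varphi\,(m-\varphi)\,\delta\,(m-\delta)},
$$

where O and E are the observed and expected counts of each cell. Features with larger scores depart further from class independence and therefore rank higher.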
Table 5. Confusion matrix for standard LSTM model trained with all 41 features.

NSL-KDDTest+

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 8958 | 4 | 729 | 9 | 11 |
| DoS | 408 | 6441 | 538 | 6 | 65 |
| Probe | 241 | 26 | 2114 | 18 | 22 |
| R2L | 184 | 5 | 498 | 2067 | 0 |
| U2R | 45 | 2 | 61 | 0 | 92 |

NSL-KDDTest−21

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 1685 | 11 | 433 | 8 | 15 |
| DoS | 652 | 3064 | 527 | 6 | 93 |
| Probe | 258 | 23 | 2062 | 25 | 34 |
| R2L | 439 | 7 | 379 | 1929 | 0 |
| U2R | 51 | 3 | 58 | 1 | 87 |
Table 6. Standard LSTM performance on NSL-KDDTest+ using all 41 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 99.43 | 99.75 | 86.36 | 0.25 | 92.44 |
| Probe | 53.65 | 90.93 | 87.32 | 9.07 | 66.47 |
| R2L | 98.43 | 99.83 | 75.05 | 0.17 | 85.17 |
| U2R | 48.42 | 99.56 | 46.00 | 0.44 | 47.18 |
| Normal | 91.07 | 93.16 | 92.25 | 6.84 | 91.66 |
Table 7. Standard LSTM performance on NSL-KDDTest−21 using all 41 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 98.58 | 99.41 | 70.57 | 0.59 | 82.26 |
| Probe | 59.61 | 85.21 | 85.85 | 14.79 | 70.36 |
| R2L | 97.97 | 99.56 | 70.04 | 0.44 | 81.69 |
| U2R | 37.99 | 98.78 | 43.50 | 1.22 | 40.56 |
| Normal | 54.62 | 85.56 | 78.30 | 14.44 | 64.35 |
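Every entry in Tables 6 and 7 follows mechanically from the confusion matrix in Table 5 by scoring each class one-vs-rest. A minimal sketch, assuming NumPy and using the NSL-KDDTest+ block of Table 5, reproduces the DoS row of Table 6:

```python
import numpy as np

# Rows are true labels, columns are predictions, both ordered
# (Normal, DoS, Probe, R2L, U2R); values are the NSL-KDDTest+
# block of Table 5.
cm = np.array([
    [8958,    4,  729,    9,  11],   # Normal
    [ 408, 6441,  538,    6,  65],   # DoS
    [ 241,   26, 2114,   18,  22],   # Probe
    [ 184,    5,  498, 2067,   0],   # R2L
    [  45,    2,   61,    0,  92],   # U2R
])

def per_class_metrics(cm, k):
    tp = cm[k, k]
    fn = cm[k].sum() - tp         # class-k samples predicted elsewhere
    fp = cm[:, k].sum() - tp      # other samples predicted as class k
    tn = cm.sum() - tp - fn - fp
    precision   = tp / (tp + fp)
    specificity = tn / (tn + fp)
    recall      = tp / (tp + fn)
    far         = fp / (fp + tn)  # false alarm rate = 1 - specificity
    f_score     = 2 * precision * recall / (precision + recall)
    return precision, specificity, recall, far, f_score

# DoS (k = 1) gives 99.43, 99.75, 86.36, 0.25, 92.44 (%), matching Table 6.
print([round(100 * v, 2) for v in per_class_metrics(cm, 1)])
```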
Table 8. Confusion matrix of BidLSTM model trained with all 41 features.

NSL-KDDTest+

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 9264 | 1 | 435 | 3 | 8 |
| DoS | 321 | 6738 | 343 | 3 | 53 |
| Probe | 191 | 9 | 2216 | 0 | 5 |
| R2L | 173 | 0 | 311 | 2270 | 0 |
| U2R | 39 | 0 | 53 | 0 | 108 |

NSL-KDDTest−21

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 1776 | 2 | 364 | 2 | 8 |
| DoS | 426 | 3657 | 221 | 2 | 36 |
| Probe | 195 | 17 | 2165 | 10 | 15 |
| R2L | 446 | 0 | 281 | 2027 | 0 |
| U2R | 35 | 0 | 60 | 7 | 98 |
Table 9. BidLSTM performance on NSL-KDDTest+ using all 41 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 99.85 | 99.93 | 90.34 | 0.07 | 94.86 |
| Probe | 65.99 | 94.32 | 91.53 | 5.68 | 76.69 |
| R2L | 99.74 | 99.97 | 82.43 | 0.03 | 90.26 |
| U2R | 62.07 | 99.70 | 54.00 | 0.30 | 57.75 |
| Normal | 92.75 | 94.36 | 95.40 | 5.64 | 94.06 |
Table 10. BidLSTM performance on NSL-KDDTest−21 using all 41 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 99.48 | 99.75 | 84.22 | 0.25 | 91.22 |
| Probe | 70.04 | 90.20 | 90.13 | 9.80 | 78.83 |
| R2L | 98.97 | 99.77 | 73.60 | 0.23 | 84.42 |
| U2R | 62.42 | 99.49 | 49.00 | 0.51 | 54.90 |
| Normal | 61.71 | 88.64 | 82.53 | 11.36 | 70.62 |
Table 11. Performance comparison against existing methods in the literature using all 41 features (N/A denotes not available).

| Approach | Test+ Accuracy (%) | Test+ F-Score (%) | Test+ FAR (%) | Test−21 Accuracy (%) | Test−21 F-Score (%) | Test−21 FAR (%) |
|---|---|---|---|---|---|---|
| SCDNN [51] | 72.64 | N/A | 27.36 | 44.55 | N/A | 55.45 |
| NN [52] | 83.67 | 83.28 | 23.47 | N/A | N/A | N/A |
| MDPCA-DBN [53] | 82.08 | 81.75 | 2.62 | 66.18 | 74.87 | 13.06 |
| RNN [54] | 81.29 | 79.25 | 12.42 | 64.67 | N/A | N/A |
| STL [55] | 74.38 | N/A | 7.21 | 57.34 | N/A | 15.06 |
| OCNN [56] | 88.67 | 89.78 | 11.89 | N/A | N/A | N/A |
| HMLSTM [56] | 87.11 | 88.40 | 12.20 | N/A | N/A | N/A |
| OCNN-HMLSTM [56] | 90.61 | 91.46 | 8.86 | N/A | N/A | N/A |
| Standard LSTM | 87.26 | 88.03 | 4.03 | 74.49 | 75.76 | 5.96 |
| BidLSTM | 91.36 | 91.67 | 3.06 | 82.05 | 82.77 | 4.20 |
Table 12. The selected optimal set of features.

| Method | Feature Code | Number of Features |
|---|---|---|
| Standard LSTM | F02, F03, F04, F05, F06, F08, F10, F13, F14, F22, F24, F25, F27, F28, F29, F31, F33, F34, F38, F40, F41 | 21 |
| BidLSTM | F02, F03, F04, F05, F06, F08, F10, F13, F14, F22, F24, F25, F27, F28, F29, F31, F33 | 17 |
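The subsets in Table 12 come from ranking all 41 features with the chi-square score and then growing the subset with a forward best search. The sketch below illustrates that two-stage procedure under stated assumptions: sklearn's chi2 stands in for the scoring of Table 4, and evaluate() is a hypothetical callback that trains a model on a candidate subset and returns its validation score.

```python
import numpy as np
from sklearn.feature_selection import chi2  # requires non-negative features

def forward_best_search(X, y, evaluate):
    """Greedy forward search over chi-square-ranked features.

    evaluate(feature_indices) -> validation score of a trained model.
    """
    scores, _ = chi2(X, y)             # chi-square score per feature
    ranked = np.argsort(scores)[::-1]  # highest-scoring features first
    selected, best = [], -np.inf
    for f in ranked:
        trial = selected + [int(f)]
        score = evaluate(trial)
        if score > best:               # keep the feature only if it helps
            selected, best = trial, score
    return selected, best
```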
Table 13. Confusion matrix of standard LSTM model trained with 21 features.

NSL-KDDTest+

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 9175 | 7 | 505 | 16 | 8 |
| DoS | 375 | 6806 | 215 | 9 | 53 |
| Probe | 111 | 156 | 2120 | 21 | 13 |
| R2L | 322 | 0 | 100 | 2325 | 7 |
| U2R | 68 | 0 | 0 | 6 | 126 |

NSL-KDDTest−21

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 1836 | 12 | 269 | 21 | 14 |
| DoS | 405 | 3343 | 493 | 13 | 88 |
| Probe | 178 | 33 | 2143 | 48 | 0 |
| R2L | 328 | 9 | 266 | 2128 | 23 |
| U2R | 35 | 0 | 19 | 0 | 146 |
Table 14. Standard LSTM performance results on NSL-KDDTest+ using 21 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 97.66 | 98.92 | 91.26 | 1.08 | 94.35 |
| Probe | 72.11 | 95.93 | 87.57 | 4.07 | 79.09 |
| R2L | 97.81 | 99.74 | 84.42 | 0.26 | 90.63 |
| U2R | 60.87 | 99.64 | 63.00 | 0.36 | 61.92 |
| Normal | 91.28 | 93.17 | 94.48 | 6.83 | 92.85 |
Table 15. Standard LSTM performance results on NSL-KDDTest−21 using 21 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 98.41 | 99.28 | 76.99 | 0.72 | 86.39 |
| Probe | 67.18 | 88.92 | 89.22 | 11.08 | 76.65 |
| R2L | 96.29 | 99.10 | 77.27 | 0.90 | 85.74 |
| U2R | 53.87 | 98.93 | 73.00 | 1.07 | 62.00 |
| Normal | 66.00 | 90.25 | 85.32 | 9.75 | 74.42 |
Table 16. Confusion matrix of BidLSTM model trained with 17 features.

NSL-KDDTest+

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 9580 | 0 | 116 | 5 | 10 |
| DoS | 261 | 7018 | 152 | 0 | 27 |
| Probe | 127 | 1 | 2293 | 0 | 0 |
| R2L | 142 | 5 | 106 | 2501 | 0 |
| U2R | 34 | 2 | 0 | 0 | 164 |

NSL-KDDTest−21

| True \ Predicted | Normal | DoS | Probe | R2L | U2R |
|---|---|---|---|---|---|
| Normal | 1968 | 25 | 150 | 0 | 9 |
| DoS | 286 | 3912 | 94 | 7 | 43 |
| Probe | 52 | 84 | 2258 | 8 | 0 |
| R2L | 276 | 0 | 175 | 2303 | 0 |
| U2R | 13 | 7 | 0 | 9 | 171 |
Table 17. BidLSTM performance results on NSL-KDDTest+ using 17 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 99.89 | 99.95 | 94.10 | 0.05 | 96.91 |
| Probe | 85.98 | 98.14 | 94.71 | 1.86 | 90.13 |
| R2L | 99.80 | 99.97 | 90.81 | 0.03 | 95.10 |
| U2R | 81.59 | 99.83 | 82.00 | 0.17 | 81.80 |
| Normal | 94.44 | 95.61 | 98.65 | 4.39 | 96.50 |
Table 18. BidLSTM performance results on NSL-KDDTest−21 using 17 features.

| Class Label | Precision (%) | Specificity (%) | Recall (%) | FAR (%) | F-Score (%) |
|---|---|---|---|---|---|
| DoS | 97.12 | 98.45 | 90.10 | 1.55 | 93.48 |
| Probe | 84.35 | 95.57 | 94.00 | 4.43 | 88.92 |
| R2L | 98.97 | 99.74 | 83.62 | 0.26 | 90.65 |
| U2R | 76.68 | 99.55 | 85.50 | 0.44 | 80.85 |
| Normal | 75.84 | 93.53 | 91.45 | 6.47 | 82.92 |
Table 19. Comparison of results against existing feature-selection-based algorithms on NSL-KDDTest+ and NSL-KDDTest−21.

| Approach | Feature Selection Method | Number of Features | Test+ Accuracy (%) | Test+ F-Score (%) | Test+ FAR (%) | Test−21 Accuracy (%) | Test−21 F-Score (%) | Test−21 FAR (%) |
|---|---|---|---|---|---|---|---|---|
| FSSL-EL [57] | PCA | 20 | 84.54 | N/A | 5.31 | 71.29 | N/A | 20.35 |
| TSE-IDS [58] | Hybrid | 37 | 85.80 | N/A | 11.70 | 72.52 | N/A | 18.00 |
| CFS-BA [29] | CFS | 10 | 87.37 | N/A | 3.19 | 73.57 | N/A | 12.92 |
| FS+GRA-Forest [59] | Information Gain | 32 | 85.06 | 85.10 | 12.20 | N/A | N/A | N/A |
| EM-FS [30] | Gain Ratio | 35 | 84.25 | N/A | 2.79 | N/A | N/A | N/A |
| MMFSA-CR [60] | Hybrid | 19 | 83.98 | N/A | N/A | N/A | N/A | N/A |
| LSSVM [61] | Mutual Information | 18 | 76.20 | 76.10 | 3.90 | N/A | N/A | N/A |
| CP-ARM [62] | Hybrid | 11 | 79.60 | 79.50 | 3.50 | N/A | N/A | N/A |
| χ2-LSTM | Chi-Square | 21 | 91.16 | 91.32 | 3.77 | 80.98 | 81.68 | 4.51 |
| χ2-BidLSTM | Chi-Square | 17 | 95.62 | 95.65 | 2.11 | 89.55 | 89.77 | 2.71 |