1. Introduction
In present-day society, organizations and individuals have become increasingly reliant on information and communication technology (ICT), owing to the growing number of useful technologies. This reliance has created a greater demand for stable and reliable ICT components and services. As a part of ICT, the Internet provides a medium for individuals and organizations to accomplish tasks in their everyday lives. However, as the data flow and the information traffic over the Internet increase, user privacy and transactions become more prone to threats and attacks (intrusions) by malicious users. An intrusion is a succession of activities aiming to jeopardize the security of a network system [1].
Intrusion detection systems (IDSs) have proven essential in the security domain and play a vital role in detecting different types of malicious behaviors and attacks. IDSs can be grouped into three basic strategies: misuse detection, anomaly detection, and a hybrid of the two [2,3]. Misuse detection is a signature-based approach that compares recorded user behavior or activities against known attack signatures and raises a signal when a match is found [4,5,6]. Anomaly detection is used to spot activities that are significantly different from normal user activity. In anomaly detection systems, an alert is raised if there is some deviation from a predefined computer state [7,8,9]. Hybrid detection is a fusion of anomaly and misuse detection methods used to identify malicious activities [2,10,11]. It is vital to mention that network intrusions or attacks can come from outside the network (outsider attacks) or from within the network (insider attacks). Researchers have proposed several different intrusion detection systems over the past few decades using machine learning, deep learning, and other statistical methods. In recent times, machine learning and deep learning techniques have gained more attention in many different research areas, including intrusion detection [12], and they have become the most commonly adopted approaches for many IDSs.
In the literature, machine learning methods such as support vector machine (SVM), decision trees (DT), k-nearest neighbor (KNN), artificial neural networks, and deep neural networks (DNN) have been widely used for the detection of network intruders [13,14,15,16]. However, the performance of these techniques depends heavily on simulated datasets. These datasets often contain many features, which makes training computationally expensive for most classification models. Furthermore, using a large number of features may result in low performance, because some features may be redundant or irrelevant to the performance of a model.
Therefore, it is necessary to perform feature selection before training, to eliminate redundant and irrelevant features from the datasets. Feature selection plays an important role in data preprocessing for most machine learning models. It is the process of selecting the features that contribute most to the predicted variable. Feature selection can be performed manually or automatically, using algorithms, to reduce the dimensions of the data to a subset of features relevant to building a predictive model. There are three main categories of feature selection in the literature: wrapper, filter, and hybrid methods [17]. The wrapper method uses a greedy search strategy to evaluate candidate feature combinations against an evaluation criterion based on a machine learning algorithm [17,18]. The filter method, on the other hand, is not dependent on any machine learning algorithm. Features are selected based on the variable characteristics or intrinsic properties, which are measured via statistical analysis [19,20]. A hybrid or embedded method uses a combination of the properties of wrapper and filter methods [17,21]. Motivated by the positive impact of feature selection on the performance of machine learning and deep learning models for several different problems, we have developed a new IDS called -BidLSTM for network systems.
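The filter idea described above, scoring each feature against the class label with a statistic and ranking by that score, can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: it assumes Pearson's chi-square test of independence as the scoring statistic, and the feature names, values, and the helpers `chi2_score` and `rank_features` are hypothetical.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the labels.

    Assumption: chi-square is used here only as an example of a filter statistic.
    """
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for value, value_count in feature_counts.items():
        for label, label_count in label_counts.items():
            # Expected cell count if the feature and the label were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return feature names in descending order of score, plus the raw scores."""
    scores = {name: chi2_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: six connection records with three categorical features.
labels = ["attack", "attack", "normal", "normal", "attack", "normal"]
columns = {
    "flag": ["S0", "S0", "SF", "SF", "S0", "SF"],            # separates the classes
    "protocol": ["tcp", "udp", "tcp", "udp", "tcp", "udp"],  # weakly related
    "land": [0, 0, 0, 0, 0, 0],                              # constant, uninformative
}

ranking, scores = rank_features(columns, labels)
print(ranking)
```

No classifier is trained at any point, which is what distinguishes a filter method from a wrapper method: the ranking depends only on the data and the chosen statistic.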
Main Contributions
The proposed -BidLSTM IDS integrates a statistical feature selection model with a BidLSTM-based deep learning model. The statistical model is used for the ranking and selection of features based on their test scores. The selected optimal features are used to train a bidirectional long short-term memory (BidLSTM)-based recurrent neural network (RNN) for network intrusion detection. The NSL-KDD dataset, which can be accessed via the University of New Brunswick (UNB) data repository, is used to train and evaluate our -BidLSTM model’s performance. The contributions of this paper are as follows:
Developing and implementing an intrusion detection system based on a bidirectional long short-term memory network integrated with a feature selection model.
To the best of our knowledge, no prior work has addressed the hybridization of the bidirectional LSTM model with a statistical model for network intrusion detection.
The -BidLSTM method uses fewer features for training and testing purposes, and thus reduces the complexity of traditional BidLSTM and also improves its classification accuracy.
A better classification accuracy than the traditional bidirectional LSTM model is obtained. Additionally, our approach outperforms existing state-of-the-art methods in the literature.
The remainder of this work is organized as follows. Section 2 presents a review of related work in the literature. A description of the dataset and the proposed methodology is presented in Section 3. Section 4 presents the implementation, experimental results, and discussion. Section 5 discusses the model complexity and limitations. In Section 6, we present the conclusions and future directions of the study.
2. Related Work
As an essential element for ensuring security in network systems, IDSs continuously draw the interest of many researchers. Many models have been developed to enhance the effectiveness of IDSs in network systems. In this section, we discuss the literature related to IDS techniques based on machine learning (ML) and deep learning (DL) that leverage feature selection for network anomaly detection.
The authors in [22] proposed a hybrid IDS approach using the NSL-KDD dataset, which focuses on combining the probability distributions of different learning algorithms using information gain (IG) and a voting algorithm to select relevant features for classification. The hybrid method comprises the J48, Random Tree, Meta Pagging, REPTree, Decision Stump, AdaBoostM1, and naive Bayes base classifiers. Although the technique achieved accuracies of 99.81% and 98.56% for binary and multi-class problems, respectively, some concerns remain. The feature selection process in this approach is often biased towards variables with many distinct values rather than variables whose observations take large values, which can result in over-fitting and poor performance. In [23], Hota and Shrivas developed a framework that utilizes different feature selection methods for irrelevant feature removal. The study investigated the performance of four feature selection methods (i.e., correlation, information gain, relief, and symmetrical uncertainty) integrated with the C4.5 decision tree algorithm for classification. According to the experimental findings, the most efficient of the four selection methods was information gain with C4.5, which obtained a detection accuracy of 99.68%. The findings also suggested that the C4.5 algorithm could obtain its greatest accuracy with IG using just 17 features of the NSL-KDD dataset. Although the result is promising, the method tends to be skewed towards attributes with many possible values, leading to poor generalization. Moreover, the entropy model employed in C4.5 involves many time-consuming logarithmic and sorting operations, as well as the handling of continuous values, resulting in a high computational cost. Using logistic regression combined with a search strategy, the authors in [24] presented a feature-selection-based IDS model that selects the best subset of features from the KDDCUP’99 and the UNSW-NB15 datasets. The findings indicated that their algorithm yields a good detection accuracy with just 18 selected features from the KDDCUP’99 dataset and 20 selected features from the UNSW-NB15 dataset.
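The bias of information gain towards attributes with many possible values, noted above, can be made concrete with a small calculation. The sketch below uses the standard entropy-based definition of information gain; the data are hypothetical and `record_id` stands in for any attribute that takes a different value on every record.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: eight records, balanced between the two classes.
labels = ["attack", "normal", "attack", "normal", "attack", "normal", "attack", "normal"]
service = ["http", "http", "ftp", "ftp", "http", "ftp", "http", "ftp"]
record_id = list(range(8))  # unique per record, so it carries no generalizable signal

ig_service = information_gain(service, labels)
ig_record_id = information_gain(record_id, labels)
print(f"IG(service)   = {ig_service:.4f}")
print(f"IG(record_id) = {ig_record_id:.4f}")
```

Because every value of `record_id` isolates a single record, each split is pure and the attribute receives the maximum possible gain even though it cannot generalize to unseen traffic. Measures such as gain ratio normalize for this effect.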
Acharya and Singh [25] proposed a novel bio-driven feature selection approach that utilizes the Intelligent Water Drops algorithm combined with an SVM classifier for network intrusion detection. Their approach, a swarm optimization algorithm, produced a high performance on the KDDCUP’99 dataset. The results indicated that the approach obtained an accuracy of 93.12%, a detection rate of 91.35%, and a reduced false alarm rate of 3.35%, compared to other methods. The authors in [26] introduced a hybrid IDS mechanism that integrates feature selection and clustering using SVM and K-medoids clustering strategies. In this approach, the authors trained a naïve Bayes classifier on the KDDCUP’99 dataset. They evaluated the model using the detection rate, accuracy, and false alarm rate. The evaluation results indicated that the proposed approach obtained a detection rate of 90.1%, an accuracy of 91.5%, and a false alarm rate of 6.36%. In [27], Jabbar et al. presented a cluster-oriented ensemble model for network intrusion detection. The model was developed using the alternating decision tree technique (ADTree) and the K-nearest neighbor (KNN) algorithm. In experiments, their proposed approach showed a better performance with regard to accuracy and detection rate, compared to other methods in the literature.
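The three measures quoted throughout this section (accuracy, detection rate, and false alarm rate) are all derived from the confusion matrix of a binary attack/normal classifier. The sketch below uses the commonly adopted definitions, with the attack class as the positive class; individual cited works may define them slightly differently, and the counts are hypothetical.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute common IDS metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    tn: normal traffic passed           fp: normal traffic flagged as attacks
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "detection_rate": tp / (tp + fn),    # also called recall or true positive rate
        "false_alarm_rate": fp / (fp + tn),  # also called false positive rate
    }


# Hypothetical counts for 100 attack records and 100 normal records.
metrics = ids_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A high accuracy alone can hide a poor detection rate when attacks are rare, which is why the works reviewed here usually report all three figures together.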
In another approach [28], Paulauskas and Auskalnis introduced an ensemble IDS model. The model was developed using naïve Bayes (NB), C5.0, J48, and the partial decision list algorithms as base classifiers, with the notion of integrating multiple learners. Experimental findings indicated that the approach obtained better accuracy for network intrusion detection. To combat the high-dimensionality problem in network traffic, Zhou and Cheng [29] developed a heuristic feature selection algorithm known as the correlation-based feature selection bat algorithm (CFS-BA). Their strategy obtains the best feature subset by evaluating the correlations among features. The authors further built an ensemble model that incorporates random forest, forest-oriented penalizing attribute, and C4.5 decision trees, using the rule of the average of probabilities (AoP). The model was trained and evaluated using the CIC-IDS2017, KDDCUP’99, and NSL-KDD datasets. The results showed that the CFS-BA ensemble approach produced a better performance, compared to other existing methods.
In [30], Pham et al. presented a hybrid approach that leverages gain ratio and bagging techniques for network intrusion detection. The former (gain ratio) is utilized to obtain the best features. The latter (bagging) is used to integrate tree-based core classifiers. The approach was evaluated using the NSL-KDD dataset. The results showed that the bagging method combined with J48 as the core classifier produced better performance for 35 features. The authors in [31] proposed a wrapper-based IDS that utilizes a hyper-graph (HG) and a genetic algorithm (GA) for producing possible subsets of features. The approach uses SVM as a classification algorithm, which is evaluated on the NSL-KDD dataset. From the evaluation, their proposed method exhibited a performance accuracy of 96.72% with 35 selected features.
In [32], Abdullah et al. developed an IDS model based on splitting the data input into several subsets relative to the attack types. In this work, IG was used to select the best features for each subset. Using random forest (RF) and partial decision list (PART) as core classifiers, the method was evaluated on the NSL-KDD dataset. Experimental findings illustrated that higher accuracy was achieved with the RF and PART classifiers combined with product probability. In [33], Mohammadi et al. introduced a feature-selection-based IDS that incorporates a clustering algorithm. The methodology was developed using a filter method that leverages a linear correlation coefficient (LCC) algorithm and a wrapper strategy that utilizes a cuttlefish algorithm (CFA). Their approach trained a decision tree (DT) classifier on the KDDCUP’99 dataset. Experimental results with 10-fold cross-validation showed that the method obtained a 95.03% accuracy, a 95.23% detection rate, and a reduced false alarm rate of 1.6%. The authors in [34] developed a hybrid intrusion detection system that integrates principal component analysis (PCA) and information gain (IG) algorithms for feature selection. Their approach was evaluated on the NSL-KDD, Kyoto 2006+, and ISCX 2012 datasets using three ensemble classifiers (i.e., multi-layer perceptron (MLP), SVM, and instance-based learning algorithms (IBK)). The IG-PCA method achieved a better detection rate, accuracy, and false alarm rate than other existing strategies.
Using the organic combination of several deep learning methods, the authors in [35] proposed a novel anomaly detection approach known as HELAD. The authors first performed feature extraction and selection using the damped incremental statistics algorithm (DISA). An autoencoder was then trained with selected features of a labeled dataset while noting the irregular (abnormal) score labels in the data traffic. They further trained an LSTM model using the irregular score label and obtained the final score using a weighted technique. In the experiment, the HELAD method produced a better accuracy compared to other state-of-the-art methods. In [36], the authors used a multi-objective technique to obtain the best subsets of features, which were then evaluated based on three decision tree algorithms (i.e., NB, RF, and C4.5). The three algorithms were trained and tested using the CIC-IDS2017, UNSW-NB15, and NSL-KDD datasets. In the experiment, their NSGA2-LR approach showed promising results compared to other methods.
5. Model Complexity and Limitations
This section addresses the complexity of the proposed -BidLSTM method and the time needed for training and testing. We also present the limitations of the proposed approach.
5.1. Time Complexity
We evaluated the time complexity of our proposed -BidLSTM approach with regard to the various units of the model implementation: feature ranking using the statistical model, optimal feature selection using the forward best search algorithm, and BidLSTM. To obtain the best feature combination set for training, we first used the statistical model to rank all features in a descending order based on their test scores, as shown in Section 1. The time complexity is , where n is the number of classes and F is the number of features. After ranking all the features, we used the forward best search algorithm to obtain the optimal feature set for each model. The algorithm begins with an empty set. It searches for the feature with the highest test score using the evaluation function and appends it to SELECTED (see Algorithm 1, lines 9–28). The algorithm continuously finds the next feature that can achieve the best evaluation score with the feature(s) in SELECTED until the desired dimension is reached and no additional features can improve the accuracy. The algorithm’s time complexity is , where D is the desired dimension.
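The search procedure described above can be sketched as a greedy loop over the ranked features. This is an illustrative reading of the description, not a reproduction of the algorithm used in this work: `forward_best_search`, `toy_accuracy`, the `GAIN` table, and the feature names are hypothetical, and the toy evaluation function stands in for training and validating a model on the candidate subset.

```python
def forward_best_search(ranked, evaluate, max_dim):
    """Greedy forward selection over features already sorted by test score.

    ranked:   feature names in descending order of their ranking score
    evaluate: callable mapping a list of features to a quality score (higher is better)
    max_dim:  the desired dimension, i.e. the maximum number of features to keep
    """
    if not ranked or max_dim < 1:
        return [], float("-inf")
    selected = [ranked[0]]        # start from the top-ranked feature
    remaining = list(ranked[1:])
    best = evaluate(selected)
    while remaining and len(selected) < max_dim:
        # Score every remaining feature jointly with the current selection.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:     # no candidate improves the score: stop
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical evaluation: each useful feature adds a fixed gain, and every
# feature costs a small penalty, so useless features lower the score.
GAIN = {"flag": 0.80, "src_bytes": 0.10, "service": 0.05}


def toy_accuracy(features):
    return sum(GAIN.get(f, 0.0) for f in features) - 0.01 * len(features)


ranked = ["flag", "protocol", "src_bytes", "service", "land"]
selected, score = forward_best_search(ranked, toy_accuracy, max_dim=4)
print(selected, round(score, 2))
```

In this sketch each pass evaluates every remaining feature once, so the number of calls to `evaluate` grows with the product of the desired dimension and the number of features. In practice each call involves training a model, which dominates the cost.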
The core unit of the approach is the BidLSTM model, which trains two LSTM layers (i.e., the first layer in the forward direction and the second in reverse order). The time complexity for training the forward LSTM layer is , where Q denotes the total number of output units, H represents the total number of hidden layers, and represents the total number of memory cell blocks, with as the size of the cell blocks . The number of units associated with the memory cells, gates, and hidden units in the forward direction is denoted by . With an equal time complexity needed to train the reversed order, the time complexity for training the BidLSTM predictive model is , where W represents the overall weights necessary for the network model. Hence, the total computational complexity of the proposed -BidLSTM with respect to time is .
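The observation that the reverse direction costs as much as the forward direction can be illustrated by counting trainable weights. The sketch below uses the standard parameter count of an LSTM layer without peephole connections (four gates, each with input weights, recurrent weights, and a bias); it is a rough proxy for the per-step training cost, not the complexity expression derived in this section. The feature counts and the number of units are illustrative values only.

```python
def lstm_params(input_dim, units):
    """Trainable weights of one LSTM layer (no peepholes).

    Each of the 4 gates has a units x input_dim input matrix, a units x units
    recurrent matrix, and a bias vector of length units.
    """
    return 4 * (units * (input_dim + units) + units)


def bidlstm_params(input_dim, units):
    """A bidirectional layer holds two independent LSTMs of equal size."""
    return 2 * lstm_params(input_dim, units)


# Illustrative values: 64 hidden units, 41 input features versus a reduced set of 20.
full_features, reduced_features, units = 41, 20, 64

for n_features in (full_features, reduced_features):
    print(
        n_features,
        "features ->",
        lstm_params(n_features, units), "LSTM weights,",
        bidlstm_params(n_features, units), "BidLSTM weights",
    )
```

Two effects are visible in the counts: the bidirectional layer always has exactly twice the weights of a unidirectional one, and reducing the number of input features shrinks only the input-to-hidden part of the weights, leaving the recurrent part unchanged.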
5.2. Execution Time Analysis
In this subsection, we analyze the testing and training times of the models used in this study. To ensure a fair and accurate analysis, we performed all the experiments using an Intel Core i5 PC with 8 GB of memory. The training and testing times of the various methods are reported in Figure 8a and Figure 8b, respectively. From Figure 8a, it can be seen that the BidLSTM approach with the complete set of features requires more time (9789.24 s) to train than the standard LSTM model (5546.31 s). The reason is that BidLSTM trains two LSTMs, each with an input shape determined by the data length and the number of features used. In this domain, the most important characteristic of a model is its ability to accurately, actively, and effectively detect network intrusion. As such, there is a trade-off between training time and performance: the standard LSTM model requires a shorter training time, but the BidLSTM model detects intrusions more accurately. From Figure 8, it is evident that feature selection reduces the training and testing times of both methods considerably. With the reduced set of features, the standard LSTM model requires a training time of 2397.36 s, while it takes 4678.62 s to train the BidLSTM model. Thus, feature selection not only improves the performance of the various models but also reduces their computation times.
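The training times reported above can be turned into speed-up factors with a short calculation. The four timings are taken directly from the text; the dictionary layout and variable names are only a convenient arrangement for this sketch.

```python
# Training times in seconds, as reported in the text above.
train_seconds = {
    "LSTM": {"all_features": 5546.31, "selected_features": 2397.36},
    "BidLSTM": {"all_features": 9789.24, "selected_features": 4678.62},
}

speedup = {}
reduction = {}
for model, t in train_seconds.items():
    # Speed-up: how many times faster training becomes after feature selection.
    speedup[model] = t["all_features"] / t["selected_features"]
    # Reduction: the fraction of training time saved.
    reduction[model] = 1 - t["selected_features"] / t["all_features"]
    print(f"{model}: {speedup[model]:.2f}x faster, {reduction[model]:.1%} less time")

# Cost of bidirectionality on the full feature set.
bid_overhead = (
    train_seconds["BidLSTM"]["all_features"] / train_seconds["LSTM"]["all_features"]
)
print(f"BidLSTM / LSTM training time (all features): {bid_overhead:.2f}")
```

On these figures, feature selection roughly halves the training time of both models (about 2.3 times faster for LSTM and 2.1 times faster for BidLSTM), and BidLSTM with the reduced feature set trains faster than the standard LSTM with the complete set.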
5.3. Limitations
The experimental findings demonstrate that the proposed method can efficiently detect network intrusions. However, further study is still required to improve the overall performance of the proposed approach. The proposed -BidLSTM model has a higher complexity and needs more training time than the standard LSTM model, as demonstrated by the complexity and runtime analyses of the model. In real-world networks, new forms of intrusion emerge continually and may not be captured by the NSL-KDD dataset. The absence of such novel attacks from the dataset can make it difficult for the proposed method to adapt to recently emerging attacks in a network system. These are the primary limitations of the -BidLSTM-based intrusion detection model.