1.1. Literature Review and Build-Up Ideas
Wireless networks have become the backbone of modern communication, enabling connectivity across a wide range of devices and applications. However, this ubiquity has also made them a prime target for malicious actors, who constantly seek to exploit vulnerabilities for data theft, disruption, or espionage [1, 2]. Intrusion Detection Systems (IDS) are therefore essential tools for protecting wireless networks. Since the early days of wireless networking, IDS development has attracted researchers seeking to monitor and secure access to networks. IDS designs are commonly categorized by the detection technique they employ. There are two main types and one hybrid type [3, 4]:
Signature-based IDS (or knowledge-based detection): A signature-based IDS typically monitors inbound network traffic for sequences and patterns that match a known attack signature. Its strength is a lower false alarm rate than anomaly-based IDS. However, a major limitation is its inability to detect unknown attacks: malicious actors can simply modify the attack sequences in malware or other types of attacks to avoid detection. Examples of research on this type of IDS include [5, 6, 7];
Anomaly-based IDS (or behavior-based or statistical-based detection): An anomaly-based IDS goes beyond matching particular attack signatures and instead detects and analyzes malicious or unusual patterns of behavior. This type of system applies Artificial Intelligence (AI) and ML to analyze large quantities of data and network traffic and pinpoint anomalies. Despite having a higher false alarm rate than knowledge-based IDS, anomaly-based IDS can adapt to new or previously unseen attacks, and it is less dependent on identifying specific operating system vulnerabilities. Contributions in this field include [8, 9, 10];
Hybrid IDS: A combination of the two types above. This type of system can effectively pinpoint known attack types and learn traffic patterns to track new attack types. It is among the most effective solutions and has received the most attention in recent research, but it comes at the cost of high hardware resource consumption and an implementation whose complexity depends on the components that make up the system. Contributions in this field include [11, 12, 13, 14].
The rise of machine learning (ML) created the opportunity to build more capable IDS. Pioneering works that laid the foundation for improving IDS include Chih-Fong Tsai and Yu-Feng (2009) in [15], who investigated the challenges and opportunities of applying ML techniques to network intrusion detection in real-world settings, and Halqual in [16], who introduced a multi-grade intrusion detection model based on data mining technology to address the shortcomings of traditional IDS, such as high false alarm rates and limited detection capabilities. Numerous further IDS designs based on linear regression, Support Vector Machines (SVM), Naive Bayes models, tree-based models, and clustering models are described in contributions [17, 18, 19, 20, 21] and surveys [22, 23, 24]. However, despite extensive research and promising results in controlled environments, the adoption of these proposed systems in operational settings remains limited. These ML-based and traditional methods often struggle to keep pace with the evolving threat landscape, in which the number of data features required to distinguish anomalous from normal traffic behavior grows drastically as cyber-attacks become increasingly sophisticated.
The advent of deep learning (DL), a subset of ML, has opened up new avenues for intrusion detection. DL algorithms, inspired by the structure and function of the human brain, excel at automatically learning intricate patterns and representations from vast amounts of data. This ability to discern subtle anomalies and correlations within network traffic makes DL a promising tool for identifying malicious activity that might elude traditional IDS methods. An early example in this area is the work of Ghanem et al. [25], which proposes training a multilayer perceptron for intrusion detection with an enhanced Bat algorithm, highlighting the potential of nature-inspired optimization. Graph Neural Networks (GNNs) have also shown promising results for IDS owing to their ability to model complex network relationships. The authors of [26, 27] provide comprehensive surveys on GNNs in IDS, highlighting their adaptability to evolving network structures, although challenges with computational cost and interpretability remain. DL architectures such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) have also been explored. Al-Milli et al. [28] demonstrated the feasibility of using CNNs and GANs for intrusion detection, but generalization and adversarial robustness remain concerns. Mohammadpour et al. [29] surveyed CNN-based IDS, emphasizing their capability for automatic feature extraction while noting the need for careful hyperparameter tuning.
In recent years, more advanced DL structures and hybrid models combining different DL architectures have also been investigated. ElSayed et al. [30] proposed a CNN-based model with regularization for SDNs, while Gautam et al. [31] introduced a hybrid Recurrent Neural Network (RNN) with feature optimization. Both show promise but require further validation and generalization. Long Short-Term Memory (LSTM) networks, a variant of the RNN, have been explored for IDS in various studies. The contributions [32, 33] investigated LSTM-based IDS for host-based and network-based intrusion detection, respectively. These studies demonstrated the effectiveness of LSTMs in capturing temporal dependencies in network traffic, but the need for large labeled datasets remains a challenge. Further advancements include Bidirectional LSTM (BiLSTM) and hybrid CNN-LSTM models. Chen et al. [34] and Imrana et al. [35] utilized BiLSTMs for intrusion detection, showcasing their ability to capture bidirectional temporal relationships. However, the computational cost of training deep BiLSTM models remains a concern. The most advanced approach is the hybrid of a CNN and an RNN (LSTM or GRU), which captures both the temporal and the spatial dependencies in network traffic, as shown in the contributions [36, 37, 38]. Such hybrids offer markedly improved performance, but they introduce additional computational complexity and require further research to exploit their full potential in IDS development.
Almost all of the contributions mentioned above focus on classification. That is, the systems only detect and classify attacks at the moment they occur, which leaves the IDS passive in both observation and protection. As the proverb goes, “An ounce of prevention is worth a pound of cure”. No matter how quickly an IDS responds, an attack may still succeed and damage the wireless system to some degree. This is especially true of dangerous attacks such as DDoS, which flood the system with bot traffic so quickly that the system barely has time to recognize the attack and decide on a response. Furthermore, according to the results these contributions report, an IDS never achieves 100% prediction accuracy, so there is always a possibility of a wrong prediction. When this happens, the lack of response time can leave the system vulnerable. Therefore, to maximize protection, it is better to develop a system that can predict the network traffic status a short time ahead based on the recent traffic history. This allows the IDS to estimate how the traffic will behave in the near future and to detect potential threats from past data and the current traffic flow. The IDS thus becomes more proactive in its monitoring and gains more time to deal with attacks, which increases the efficiency of the protection. A further advantage is that, when actively forecasting the traffic over a certain duration, the IDS can base its decision about whether an attack is imminent not only on the current traffic status but also on the predicted status. Hence, the false alarm rate can be reduced.
Very few studies have so far tried to develop an IDS in this way. The most recent and closest work is [39], where the authors propose combining CNN, LSTM, and attention models to predict the next T packets. Their results are promising: the best model obtained an F1 score (a metric used to evaluate the performance of classification tasks [40]) of 83% for the T = 1 packet scenario and 91% for forecasting an attack within the subsequent T = 20 packets. However, their contribution does not account for imbalanced data, which reduces the accuracy of the strategy, and the combination of three separate models can consume considerable processing time and IDS resources.
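For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch in plain Python, not the evaluation code of [39]; the function name, the label encoding (1 for attack, 0 for normal), and the toy labels are illustrative assumptions.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed): 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)  # 2 true positives, 1 false positive, 1 false negative
```

Because the F1 score ignores true negatives, it is less flattered by a large "normal" class than plain accuracy, which is one reason it is commonly reported for intrusion detection.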
Another common limitation of these earlier contributions, one that affects the accuracy and precision of prediction, is the imbalance in the datasets they use. These include AWID [41, 42], CICIDS2017, CSE-CICIDS2018 [28], LITNET, and KDDcup [43]. Some older datasets, for example KDD-1999, DARPA 1999, and KDDCup-99 [44, 45], as well as the wider family of KDD (Knowledge Discovery and Data Mining) datasets [46], are still applied in anomaly-based, signature-based, and hybrid IDSs. Whatever the dataset and however good it is, they all share a common flaw: the imbalance between categories. Typically, this flaw arises because anomalous traffic, such as an attack, is a rare event compared to the vast amount of normal traffic in a network. As a result, the algorithm may not have enough examples of attack behavior to learn its distinctive features effectively, making it more prone to misclassification. If the dataset used to train the detection algorithm contains a disproportionate amount of normal traffic compared to attack traffic, the algorithm may become biased toward classifying new instances as normal. This bias can lead it to misclassify actual attacks as normal (false negatives) or flag harmless anomalies as attacks (false positives). This is the cause of the high false alarm rates in the majority of current IDS.
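The bias described above can be illustrated with a small, purely hypothetical example: on a dataset with 990 normal samples and 10 attacks, a degenerate detector that always answers "normal" still reaches 99% accuracy while detecting no attack at all. The counts, label strings, and function name below are assumptions chosen for illustration and are not taken from any of the cited datasets.

```python
def accuracy_and_attack_recall(y_true, y_pred, attack="attack"):
    """Return (overall accuracy, fraction of true attacks that were detected)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    attacks = sum(1 for t in y_true if t == attack)
    detected = sum(1 for t, p in pairs if t == attack and p == attack)
    accuracy = correct / len(y_true)
    recall = detected / attacks if attacks else 0.0
    return accuracy, recall


# Hypothetical imbalanced traffic: 99% normal, 1% attack.
y_true = ["normal"] * 990 + ["attack"] * 10
# A biased model that has learned to always predict the majority class.
y_pred = ["normal"] * 1000

acc, rec = accuracy_and_attack_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, attack recall={rec:.2f}")
```

The high accuracy hides the fact that every attack is a false negative, which is why accuracy alone is a misleading indicator on imbalanced intrusion data.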
According to [7], imbalanced data are unavoidable due to the nature of the cyber security problem, where “Normal” behavior considerably outnumbers the attacks. As Wilson claimed in [47], there is almost no single technique or method that completely resolves data imbalance in wireless networks; depending on the dataset used, researchers choose the most suitable approaches based on their experience and knowledge. Such approaches have been studied and applied in numerous works. In [48, 49, 50], the authors applied ML algorithms to reduce the effect of the imbalanced datasets they used. In [51], the authors introduce a semi-supervised learning model for IDS. Although the paper does not state it explicitly, this is a smart way to overcome imbalanced data: in semi-supervised and unsupervised learning, most of the data are unlabeled, so the model learns the underlying patterns of both normal and abnormal behavior without developing a bias toward any category. However, such models may require a larger dataset to learn efficiently and thus consume more time and computational resources. Another good method is described in [52], where the authors used radial basis function neural networks, which can model complex decision boundaries and could potentially learn patterns from both majority and minority classes, but at the cost of very high computation time. Other contributions, such as [53, 54, 55], focused on federated learning, which helps DL models cope with imbalanced data but is not well suited to large-scale networks because it relies on multiple participants. Lastly, in [56, 57, 58], some researchers, including Wilson, agreed that hybrid approaches, such as a hybrid system or a combined strategy, are the most suitable way to deal with imbalance, since they can exploit the advantages of multiple systems or algorithms and create the opportunity to achieve a good IDS.
Overall, the best and lowest-cost approach to minimizing false alarms is to limit the effect of the imbalanced categories on the other categories in the data. There are three ways to do this: collecting more data, which is the best option but is very time-consuming and difficult (if more data could be collected easily, the problem would not exist); discarding samples from the overwhelming categories (mostly the “normal” traffic behavior), which causes a loss of information and bias in prediction; and finding a way to separate the larger categories from the smaller ones, thus reducing the effect. The last approach is not easy to achieve, but it saves more time and cost and preserves more information than the other solutions. It is the solution this work focuses on, and it is integrated into the hybrid system we design to form a strategy that we call HRC.
1.2. The Methodology’s Novelty and Contributions
Technically, there are two contributions from this strategy to the IDS:
The first contribution is that this strategy can predict the future behavior of the wireless network and thereby detect whether a potential threat is about to occur, based on observing the information in the IP packets. The idea is that the flow of IP packets reflects the behavior of the network because it is inherently time-dependent. Each packet in the flow has a timestamp indicating when it was transmitted or received. The order of the packets and the time intervals between them therefore provide patterns that reflect the behavior of the network over a long period, and these patterns are distinctive during attacks. However complicated an attack is, it usually leaves a trail in the IP packet flow. By carefully exploiting these trails with modern ML and DL algorithms, the system can recognize the signs of an attack before it occurs. Among these algorithms, RNN-LSTM and CNN, as mentioned, are the strongest candidates for extracting and learning the spatio-temporal relationships between features in the IP packet flow, and thus they are the most suitable for this strategy. This contribution helps the IDS to be more proactive in detecting potential threats and to have enough time to react to an attack. In the case of a false detection, because the IDS predicts multiple steps a short time ahead, it has time to reconsider before making a final decision when the predicted moment arrives. This idea is relatively new in IDS research, and the closest work to it at present is [39].
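One common way to turn a time-ordered packet flow into forecasting samples is a sliding window: each sample pairs a short history of packets with a target describing what happens shortly afterwards. The sketch below is only an illustration under stated assumptions and is not the preprocessing used in this work: the function name, the `history` and `horizon` parameters, the single packet-size feature, and the rule that a window is labeled 1 when any of the next `horizon` packets is an attack are all hypothetical.

```python
def make_forecast_windows(features, labels, history, horizon):
    """Pair each run of `history` packets with a 0/1 target for the next `horizon` packets.

    features: per-packet feature values, in time order.
    labels:   per-packet 0/1 flags (1 = attack), in the same order.
    The target is 1 if at least one of the following `horizon` packets is an attack.
    """
    samples = []
    last_start = len(features) - history - horizon
    for start in range(last_start + 1):
        end = start + history
        window = features[start:end]          # what the model observes
        future = labels[end:end + horizon]    # what it should anticipate
        samples.append((window, 1 if any(future) else 0))
    return samples


# Hypothetical flow: packet sizes in bytes, with a burst of attack traffic.
packet_sizes = [60, 64, 60, 1500, 1500, 1500, 62, 60]
is_attack = [0, 0, 0, 1, 1, 1, 0, 0]

samples = make_forecast_windows(packet_sizes, is_attack, history=3, horizon=2)
for window, target in samples:
    print(window, "->", target)
```

In this toy flow, the first window contains only normal packets yet receives the target 1, because the attack begins immediately afterwards. This is the kind of early warning that a forecasting model is meant to learn.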
The second contribution of this strategy is its ability to deal with imbalanced data without discarding any samples or changing the base relationship between categories, such as in [59, 60, 61]. The main task of an IDS is to recognize attacks on the wireless system, which is essentially a task of classifying traffic behavior. However, imbalanced data significantly affect almost all classification approaches because the class imbalance directly skews the distribution of the target class (or target category) toward the majority class (or majority category). If the imbalance is too high, the model becomes overly familiar with the majority class, leading to poor generalization to the minority classes. In a regression task, the impact may be less severe, mostly because the regression model focuses more on the distribution of the target and is not overly biased toward a specific range or set of classes. As long as the distribution of the target remains balanced, the model can still learn the full range of potential outputs and is less likely to be completely biased toward the majority. We therefore let the regression part of the HRC strategy handle the imbalance instead of the classification part. As mentioned, the primary task of the regression part is to predict the behavior of the traffic in the near future, specifically whether it is “normal” or “attack”, so technically the regressor also classifies these two categories. We combine all the attacks into one large category, which raises its number of samples considerably and thus reduces the imbalance between it and the “normal” category. This idea, together with the fact that regression models are less affected by categorical imbalance, helps the IDS deal with imbalanced data more effectively.
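The effect of merging all attacks into a single category can be seen in a small sketch. The class names are those selected from AWID3 in this work, but the sample counts are invented for illustration, and the helper `merge_attacks` is a hypothetical name rather than part of the HRC implementation.

```python
from collections import Counter


def merge_attacks(labels, normal="Normal"):
    """Map every non-normal label to a single 'Attack' category."""
    return [normal if label == normal else "Attack" for label in labels]


# Hypothetical class counts (not the real AWID3 distribution).
labels = (
    ["Normal"] * 800
    + ["Website Spoofing"] * 80
    + ["Evil twin"] * 60
    + ["Botnet"] * 40
    + ["Malware"] * 20
)

before = Counter(labels)
after = Counter(merge_attacks(labels))

# Ratio between the majority class and the smallest class it is compared with.
ratio_before = before["Normal"] / min(before.values())
ratio_after = after["Normal"] / after["Attack"]
print(f"before: {ratio_before:.0f}:1, after: {ratio_after:.0f}:1")
```

With these assumed counts, the worst-case ratio between "Normal" and an individual attack class is 40:1, whereas the ratio between "Normal" and the merged attack category is 4:1. No sample is discarded or synthesized in the process.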
Overall, this strategy provides a new way of handling two major problems in one approach and benefits the IDS by reducing the computational cost, the complexity, and the number of additional techniques needed in the system to overcome these problems.
For all these reasons, in this work we propose a strategy called HRC for a DL-based IDS framework to improve the ability of the IDS to deal with imbalanced datasets. The strategy employs two supervised algorithms: (i) a deep hybrid neural network model combining one-dimensional convolutional and LSTM layers (Conv1D-LSTM) to predict traffic behavior according to the traffic pattern; (ii) a one-dimensional convolutional network (CNN1D) to classify the incoming types of attack. Five classes (or categories) of traffic behavior were chosen from the AWID3 dataset for our research: Website Spoofing, Evil twin, Botnet, Malware, and Normal.
The paper is structured as follows:
Section 1 introduces the topic and reviews the existing related work;
Section 2 presents definitions of the problem and the preparation of data;
Section 3 describes our proposed HRC strategy;
Section 4 shows the experimental results of the individual models used in the strategy;
Section 5 evaluates the effectiveness of the HRC strategy when integrated into an IDS framework;
Section 6 concludes the paper with an evaluation and future work.