1. Introduction
Unlike traditional networks, a software-defined network (SDN) has a programmable architecture that separates the control functions into a logical control plane [1,2]. Nevertheless, an SDN is still susceptible to network attacks. One efficient countermeasure is to deploy an intrusion detection system (IDS) for the SDN [1,3]. In this paper, we propose an IDS based on a genetic algorithm (GA) to protect the SDN from malicious data traffic.
The GA is an evolutionary algorithm that simulates replication, crossover and mutation in natural selection. It is relatively insensitive to local optima, which makes it possible to find global optima [4]. The GA has been applied to many optimization problems, and experimental results show clear advantages on such problems [5].
An IDS is a security scheme that tracks and examines network traffic in order to find intrusions [
6]. However, it is incredibly challenging to detect malicious packets in a network owing to the rapid growth of data traffic and the continually expanding network scale. Consequently, it might be difficult to locate the detection points in an IDS [
7]. IDS hardware resources, such as CPU processing power, memory access speed and storage capacity, are typically constrained [
8]. To improve the detection efficiency, multiple IDSs must be deployed to check a vast number of data packets, especially in large-scale network systems. The IDS we suggest is deployed on the control plane of an SDN, and by mirroring fully-configured SDN controllers, it collects and analyzes traffic from the switch [
1].
To identify suspicious packets accurately, the IDS module typically classifies the collected traffic using classification algorithms. The authors of [9] created a deep neural network model for suspicious traffic detection in SDN. To identify network risks for the IDS, an enhanced behavior-based support vector machine and learning algorithm for use in a security monitoring system (SMS) was proposed in [10]. However, most current research focuses on packet sampling and detection at a specific point of the SDN, whereas an SDN is large and carries a huge amount of traffic. With IDS modules deployed on the SDN control plane, anomalies can be identified quickly, allowing prompt and decisive action to ensure network security.
Because the storage capacity of the IDS is limited, not all packets of the network traffic can be examined. Therefore, some malicious packets might escape detection [11]. To address this problem, some studies suggested IDS models that capture only part of the traffic. Using centrality metrics in SDNs, Yoon et al. [12] suggested a scalable flow sampling method. This approach uses per-flow and per-switch sampling to probabilistically capture packets on the switch, and it selects the traffic sampling points among the switches by applying centrality metrics from graph theory. However, sampling might cause some valuable information to be lost. Our proposed scheme offers a sampling rate adjustment strategy that resolves this problem by determining the best sampling rate for each switch, so that the IDS’s limited detection capacity is fully utilized. Simulation results show that the proposed IDS model can accurately detect malicious traffic with limited processing capacity and that it outperforms other similar strategies.
The contributions of this work are summarized as follows: (i) A new IDS model has been proposed which can detect suspicious packets on numerous switches in an SDN; (ii) A scheme based on genetic algorithm to approach the optimal sampling rate of each switch has been proposed; (iii) The effectiveness of the suggested IDS model and the efficiency of the optimal sampling rate scheme have been precisely verified by experiments.
The remainder of this paper is organized as follows. In Section 2, we provide an overview of previous related work. In Section 3, we present the preliminaries and methodology of the solution. In Section 4 and Section 5, the proposed IDS model and the scheme to compute the optimal sampling rate are presented in detail. In Section 6, the evaluation and analysis of the simulation results of the proposed IDS model are presented. In Section 7, we conclude this paper and discuss ideas for future work.
2. Related Works
Numerous research works of intrusion detection technology concentrate on how to effectively manage large numbers of traffic samples related to conventional IP networks [
13,
14,
15,
16]. Silva [
13] showed the value of network traffic sampling in his research. Additionally, a mechanism for calculating each network device’s weight according to the memory usage, CPU load and data volume has been proposed. An adaptive feature recognition sampling technique was put out by Bartos [
14] as a solution to the issue that sampling network traffic can decrease the precision of subsequent anomaly detection. The simulation results showed that traffic sampling can improve the efficiency of anomaly detection and minimize information loss throughout the sample process. In order to increase the detection of network attacks and balance IDS loads, Ha et al. [
15] suggested a clustering-based flow grouping method that distributes flows according to routing information and flow data rate. In contrast to the approach based on traditional clustering, Ahmed [
16] proposed a network traffic summary technique which may further create statistics for data mining in high-dimensional complicated network traffic data sets.
SDN security technologies have been the subject of numerous research studies [
17,
18,
19,
20,
21]. Only a small number of them, however, focused on the attack detection of SDN. The fusion of Statistical Fingerprint IDS and SDN architecture can prevent the growth of malicious traffic in the network [
20]. In order to identify the patterns in the data coming through the firewall, the authors of [
21] employed association rules and constructed a packet filtering firewall on an SDN controller called Floodlight. The authors of [
22] presented an overview of several kinds of intrusion detection systems and the new technology of SDN. To minimize the effects of an unbalanced data flow, the authors of [23] constructed an adaptive IDS model for SDN based on an online ensemble learning algorithm. Some studies have concentrated on sampling data flows with a specific probability in light of the IDS’s comparatively limited detection capacity.
To support system packet sampling, probabilistic packet sampling and other diverse sampling approaches in an SDN, an extended OpenFlow called FleXam was suggested in [
24]. In [
25], the authors suggested a low-latency, sampling-based network measurement platform called OpenSample, which implements quick traffic statistics measurement in SDN using the probabilistic packet sampling of the sFlow protocol. According to [
11], an optimization problem was formulated to determine an appropriate sample rate for each switch because the processing capacity of an IDS server is considered significantly smaller than the entire quantity of traffic in a large-scale network system. Utilizing mirroring, the IDS server samples network traffic at the best sampling rate.
IDS servers are often installed at the edge of a network [
1,
12]. By mirroring the fully configured SDN controller, an IDS server can sample traffic from any switch and then process all of the switch packets. The strategy employed flow sampling techniques to cut down on the duplication of traffic and minimize network overhead. However, there will be an increase in network overhead and possibly even network congestion due to the transmission of sampled traffic and the transfer of the control information. Additionally, delays in feedback may occur due to the time needed for information transmission. In [
26], a functional modular control plane architecture model was suggested. This architectural model increases the flexibility and scalability of a single centralized controller (e.g., NOX) by decoupling the control plane. In [
27], Hu proposed a distributed architecture for the SDN control plane. To demonstrate the scalability of the control plane, the authors of [
28] suggested adding some control features to the data plane. A hierarchical control plane architecture featuring peer-to-peer communication among logically scattered controllers was created by the developers of [
29]. An Intrusion Detection and Prevention System (IDPS) was designed and put into use by the authors of [
30] using SDN. The suggested IDPS is a software application that monitors malicious activity or security policy violations on networks and systems and then takes action to stop them. A comparison study of several IDS systems based on a deep learning model and machine learning methodologies is explored in the article, and future perspectives for SDN security are detailed in [
31], which also provides an overview of the security solutions currently available for the SDN. To increase the overall accuracy of intrusion detection in SDNs, the authors of [
32] proposed a five-level hybrid classification system combining the k-nearest neighbor approach (kNN), the extreme learning machine (ELM), and the hierarchical extreme learning machine (HELM). The authors of [
33] noted that the visibility and flexibility of managed, centralized, and regulated software-defined networks have increased. However, these advantages also result in a more vulnerable environment and some serious challenges. That study demonstrates the use of tree-based machine learning algorithms for traffic monitoring to detect malicious behaviors in the SDN controller.
In this paper, we propose to install an IDS module on an SDN’s control plane. Our scheme utilizes the flexibility and programmable network management of SDN in comparison to previous alternatives. Furthermore, adequate processing capacity is offered by the scalable control plane of SDN.
3. Preliminaries and Methodology of the Solution
3.1. Preliminaries
Compared to traditional networks, SDN features a centralized control system which manages a tremendous quantity of data traffic. The application plane, control plane, and data plane are the three tiers of the SDN architecture [
1,
34]. Numerous applications make up the application plane of the SDN. The control plane is in charge of centrally supervising the devices of the data plane and managing the network’s overall information [
35]. Using OpenFlow, the control plane monitors the data plane [36]. The data plane, in turn, provides the data forwarding function. SDNs can increase the operational effectiveness of many Internet applications and service systems. Compared to a conventional routing table, each switch’s flow table on the data plane is more complicated, since the SDN controller needs to give routing instructions for all of the forwarded network traffic.
3.2. Methodology of the Solution
Network congestion occurs when the overall amount of sampled traffic exceeds the processing capacity of the IDS. To address this problem, we propose an IDS for SDN based on a genetic algorithm, together with a scheme that chooses the optimal sampling rate of the IDS based on the sum of false-negative rates.
The studied network architecture is made up of OpenFlow-based switches and an SDN controller. The SDN controller, which resides on the control plane, is linked to the switches via OpenFlow. Through OpenFlow, the controller monitors traffic and manages each switch’s flow forwarding table.
The control plane is abstracted into a master module and a number of sub-function modules without losing generality. Each sub-function module is primarily managed and controlled by the master module. According to their types, messages from the data plane are passed to the appropriate sub-function modules. The sub-function modules can be flexibly enlarged internally while being logically centralized, which further improves the control plane’s adaptability, reliability and overall performance.
The control plane of an SDN is where the suggested IDS module is installed. All packets on all switches are sampled by the SDN controller, which then sends the samples to the IDS module for additional analysis. The IDS module will trigger and feed back to the master module a security alert if it discovers a suspicious packet. In order to secure the network, the master module reconfigures it based on the switches’ present status and the findings of the detection.
GA is suitable for solving complex optimization problems. The IDS module can accurately detect malicious packets and immediately deliver the alert to the SDN controller when using GA to solve the optimal sampling probability of switches.
5. The Optimal Sampling Vector
5.1. Problem Formulation of the Optimal Sampling Vector
When sampling is used, both the accuracy of the classifier and the accuracy of the sampling affect the ability of the network to resist malicious attacks. For simplicity, the false-negative rate is used in the rest of this work to measure sampling accuracy.
Define
as the false-negative rate of the
flow; then, the false-negative rate vectors of
g flows are:
If malicious packets of a flow have not been sampled from the sender to the receiver, IDS will not be able to detect them in this flow. This flow will be classified as normal flow by mistake, and the controller will not implement necessary measures to resist it. Let
denote the probability that the
j switch misses malicious packets while sampling the
flow and
denote the set of switches which the
flow passes through. Therefore,
can also be defined as the product of
of all switches in
:
It should be noted that Equation (
7) refers to the probability that each intermediate switch cannot catch malicious packets.
is the best to achieve a higher attack detection efficiency. The calculation of
will be described in the following.
Assuming
packets are sampled from
packets per second (both
and
are integers), then the false-negative rate
can be calculated as follows:
If the number of normal packets in the flow is less than the number of sampled packets (), all packets sampled contain at least one malicious packet, so =0. If the flow does not pass through the switch, the sampling of the jth switch does not affect the false-negative rate of the flow, so .
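As an illustration of the rate described above, the following minimal sketch assumes that packets are sampled uniformly without replacement, so that the chance of missing every malicious packet is a ratio of binomial coefficients. The function name and the argument names `total`, `malicious` and `sampled` are hypothetical and are not taken from the paper. The log-gamma form is used here only to keep the factorials numerically stable; it relies on the identity Γ(n + 1) = n!.

```python
import math


def miss_probability(total: int, malicious: int, sampled: int) -> float:
    """Chance that a sample of `sampled` packets contains no malicious packet.

    Assumption (not from the paper): the sample is drawn uniformly without
    replacement from `total` packets, of which `malicious` are malicious.
    """
    if not (0 <= malicious <= total and 0 <= sampled <= total):
        raise ValueError("counts must satisfy 0 <= malicious, sampled <= total")
    normal = total - malicious
    # Fewer normal packets than sampled packets: at least one malicious
    # packet is always captured, so the miss probability is zero.
    if normal < sampled:
        return 0.0
    # C(normal, sampled) / C(total, sampled), evaluated through log-gamma.
    log_p = (
        math.lgamma(normal + 1)
        - math.lgamma(normal - sampled + 1)
        - math.lgamma(total + 1)
        + math.lgamma(total - sampled + 1)
    )
    return math.exp(log_p)
```

For example, with 10 packets per second, 2 of them malicious, and 3 sampled, the function evaluates C(8, 3) / C(10, 3) = 56/120.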
For better illustration, Equation (
8) is further relaxed by introducing the Gamma function [
37]. The gamma function is defined by
, which is related to factorial as:
Then, Equation (
8) is rewritten as:
The optimization problem to find the
is formulated as follows:
The objective function of
is the sum of the false-negative rates of all flows. Further, based on Equation (
10), we can rewrite Equation (
11) as follows:
Solving
will approach the
that minimizes the global false-negative rate of the IDS. Because the capacity of the IDS is limited, the total amount of sampled traffic cannot exceed its detection capability, which can be defined simply as the maximum amount of traffic that the IDS can handle correctly without any significant decrease in detection performance. Let
C denote the IDS capacity. Equation (
12) is used to constrain the relationship between
C and
. Equation (
13) is used to limit the sampling probability for each switch to a positive number between 0 and 1.
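The formulation above can be sketched numerically as follows. This is a minimal sketch under stated assumptions rather than the paper's implementation: the names `switch_miss`, `objective`, `feasible`, `rates`, `paths`, `totals` and `malicious` are hypothetical, each flow is described by the list of switch indices it traverses, and the sampled load is assumed to be the sum of (sampling probability × flow rate) over every switch a flow passes through.

```python
import math
from typing import Sequence


def switch_miss(total: float, malicious: float, rate: float) -> float:
    """Per-switch miss probability with a real-valued sample size.

    The sample size rate * total need not be an integer, which is why the
    factorials are expressed through the Gamma function (log-gamma here).
    """
    sampled = rate * total
    normal = total - malicious
    if normal < sampled:
        return 0.0
    return math.exp(
        math.lgamma(normal + 1)
        - math.lgamma(normal - sampled + 1)
        - math.lgamma(total + 1)
        + math.lgamma(total - sampled + 1)
    )


def objective(
    rates: Sequence[float],
    paths: Sequence[Sequence[int]],
    totals: Sequence[float],
    malicious: Sequence[float],
) -> float:
    """Sum over flows of the product of per-switch miss probabilities."""
    value = 0.0
    for path, n, m in zip(paths, totals, malicious):
        value += math.prod(switch_miss(n, m, rates[j]) for j in path)
    return value


def feasible(
    rates: Sequence[float],
    paths: Sequence[Sequence[int]],
    totals: Sequence[float],
    capacity: float,
) -> bool:
    """Check the probability bounds and the assumed IDS capacity constraint."""
    if any(r < 0.0 or r > 1.0 for r in rates):
        return False
    load = sum(rates[j] * n for path, n in zip(paths, totals) for j in path)
    return load <= capacity
```

With two switches sampled at probability 0.5, one flow crossing both switches and another crossing only the second, each sending 10 packets per second with 2 malicious, the objective evaluates to (2/9)² + 2/9 = 22/81, and the assumed sampled load is 15 packets per second.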
5.2. GA-Based Approach to the Optimal
In order to solve
, we apply a GA for computational efficiency. In particular, the fitness function in the GA approach is the objective function shown in Equation (11). The proposed approach has three steps: generating the initial solution population, applying the selection operator, and applying the crossover and mutation operators.
The Initial solution population is used to initialize k solutions in order to form a solution population. The variable of the model is the sampling probability vector . We choose the initial solution population as . Each solution in the initial solution population is an n-dimensional equivalent vector whose values are probability values generated randomly.
The Selection operator is used to guarantee the correctness of the solution and satisfy the constraints in Equations (
12) and (
13), accelerating the convergence to the optimal solution. The selection operator used has two parts:
The Crossover operator and mutation operator are used to cross and mutate the rest of the individuals selectively; the best individual in each generation passes directly to the next generation.
r is generated randomly. If
, the chromosome needs crossover, thus forming a collection of chromosomes to be crossed gradually. If the number of chromosomes in the collection is even, chromosomes cross each other sequentially. If it is odd, the last chromosome in the collection goes into the next generation directly without crossover after the other chromosomes cross each other sequentially. The specific way of crossover is then given by:
where
,
. The genes inherited from parents and the global optimal genes are half each. The mutation operator and crossover operator are similar. The best chromosome in each generation does not mutate, and the rest of the chromosomes produce a random number
r. If
, choose it to mutate. The specific way of mutation is then given by:
where
. Each mutation is generated randomly. The iterations are repeated until the optimal solution no longer changes, and the result obtained is the optimal sampling probability.
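The three steps above can be sketched as a small, generic GA loop. This is only one possible reading of the description and not the paper's exact operators: the name `run_ga` and its parameters (`p_cross`, `p_mut`, `pop_size`, `generations`, `seed`) are hypothetical, the selection rule (discard infeasible individuals, keep the better half) is an assumption, the crossover blends two parents and then averages the result with the best individual so that half of each gene comes from the parents and half from the best solution, and the mutation replaces one randomly chosen gene with a new random probability.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def run_ga(
    fitness: Callable[[Vector], float],
    feasible: Callable[[Vector], bool],
    dim: int,
    pop_size: int = 30,
    p_cross: float = 0.8,
    p_mut: float = 0.1,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[Vector, List[float]]:
    """Minimize `fitness` over vectors in [0, 1]^dim that satisfy `feasible`."""
    rng = random.Random(seed)

    def random_feasible() -> Vector:
        # Rejection sampling; assumes the feasible region is not tiny.
        while True:
            x = [rng.random() for _ in range(dim)]
            if feasible(x):
                return x

    population = [random_feasible() for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        # Selection: drop infeasible individuals and keep the better half.
        ranked = sorted((x for x in population if feasible(x)), key=fitness)
        survivors = ranked[: max(2, pop_size // 2)]
        pool = [list(rng.choice(survivors)) for _ in range(pop_size - 1)]
        # Crossover: chosen chromosomes are paired in order; with an odd
        # count, the last chosen chromosome passes through unchanged.
        chosen = [i for i in range(len(pool)) if rng.random() < p_cross]
        for a, b in zip(chosen[0::2], chosen[1::2]):
            alpha = rng.random()
            pa, pb = pool[a], pool[b]
            child_a = [0.5 * (alpha * u + (1 - alpha) * v) + 0.5 * e
                       for u, v, e in zip(pa, pb, best)]
            child_b = [0.5 * (alpha * v + (1 - alpha) * u) + 0.5 * e
                       for u, v, e in zip(pa, pb, best)]
            pool[a], pool[b] = child_a, child_b
        # Mutation: replace one random gene with a new random probability.
        for x in pool:
            if rng.random() < p_mut:
                x[rng.randrange(dim)] = rng.random()
        # Elitism: the best individual enters the next generation unchanged.
        population = [list(best)] + pool
        candidate = min((x for x in population if feasible(x)), key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
        history.append(fitness(best))
    return best, history
```

Because the best individual is carried over unchanged, the recorded best fitness never increases from one generation to the next. In this sketch, a run stops after a fixed number of generations rather than when the best solution stops changing, which is a simplification.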
In order to find the optimal solution , a parameter must be computed first. When a flow begins to transmit data, we can only know how many packets the flow sends per second without knowing how many malicious packets it sends per second. Malicious packets are sent at different rates for different flows. The value of parameter will affect the choice of sampling probability, so we update continuously by taking feedback in the SDN controller so that will be corrected according to the feedback value until it approaches the exact value.
Assume
=
when the scheme is initialized. The SDN controller samples the
flow on each switch and directs the packets sampled to the IDS module. The IDS module detects that the flow sends
malicious packets per second, while the total number of packets sampled per second for this flow is
:
The IDS module feeds back the estimate
of
to the master module for the
flow:
However, we cannot replace the initial value with the estimated value completely, so the weight
(
) is set. The updated
can be calculated as follows:
Ideally, will approach the optimal value after a few iterations. That concludes the approach.
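The feedback step can be illustrated with a weighted update. This is a minimal sketch under assumptions: the function and argument names are hypothetical, the estimate is assumed to scale the fraction of malicious packets in the sample up to the flow's full packet rate, and the weight is assumed to multiply the new estimate while its complement multiplies the current value.

```python
def estimate_malicious_rate(detected: float, sampled: float, flow_rate: float) -> float:
    """Scale the malicious fraction seen in the sample to the whole flow.

    Assumption: the sample is representative, so the flow's malicious rate
    is flow_rate * detected / sampled (packets per second).
    """
    if sampled <= 0:
        raise ValueError("at least one packet must be sampled")
    return flow_rate * detected / sampled


def update_rate(current: float, estimate: float, weight: float) -> float:
    """Blend the current value with the new estimate; 0 < weight < 1."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    return (1.0 - weight) * current + weight * estimate


# Hypothetical numbers: 3 malicious packets among 50 sampled, flow at 1000 pps.
estimate = estimate_malicious_rate(3, 50, 1000.0)
updated = update_rate(20.0, estimate, 0.5)
```

Under this assumed form, a larger weight moves the value toward the estimate more quickly, while a smaller weight smooths out noisy estimates at the cost of slower convergence.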
The GA-based approach to the optimal
is summarized in Algorithm 1. In Algorithm 1,
finds the optimal solution
and the value of fitness function
with Equation (
11).
is used to judge whether the GA has converged.
picks the suitable solutions from the previous generation.
is the crossover calculation function of GA and
is the mutation calculation function of GA.
Algorithm 1 Approach to the optimal .
Input: s, g, A, , k, ,,, .
Output: .
1: Set the initial solution population P of GA
2:
3:
4: while do
5:
6:
7:
8:   if then
9:
10:   end if
11: end while
12:
5.3. The GA-Based IDS Scheme
The proposed GA-based IDS scheme is summarized in Algorithm 2. The GA-based IDS solves the problem of overloading in the IDS module effectively. Using GA to solve the optimal sampling probability of switches, the IDS module can detect malicious packets accurately and send the warning to the SDN controller instantly without receiving all packets transmitted by switches. Meanwhile, the high-quality feedback mechanism of the master module uses the feedback information of the IDS module to update
continuously, which makes the sampling scheme more efficient and accurate.
Algorithm 2 The anomaly detection process of the GA-based IDS.
Input: s, g, A, , , , .
Output: Suspicious flows.
1: Initialize the with
2: while True do
3:   compute and with Algorithm 1
4:   if IDS needs to be stopped then
5:     break
6:   end if
7:
8:   classify the data according to the K-means clustering algorithm
9:   identify suspicious flows
10:
11: end while
6. Experiment and Discussion
In this section, the proposed schemes are evaluated through simulations. First, we demonstrate that using GA to optimize the IDS model can produce results that are more accurate and efficient than those produced by other algorithms. Then, we compare the proposed IDS sampling scheme with existing ones in SDN setups. The simulations and evaluations are performed in MATLAB 2014b on a Thinkpad E460 with an Intel processor and 16 GB RAM.
6.1. Network Settings
The networks used in the simulations and experiments are constructed randomly. To assess the suggested techniques, the intrusion flows’ malicious sending rates
are modified. In particular, detailed settings of the two tested networks (i.e., network A and network B) are listed in
Table 1.
6.2. Approach to the Optimal Sampling Probability
The proposed Algorithm 1 is used to find the optimal sampling probability . In this section, we first test and find the optimal settings for running GA in Algorithm 1. Then, we compare the proposed Algorithm 1 with other widely-adopted approaches, i.e., Artificial Neural Network (ANN) and Particle Swarm Optimization (PSO) for both network A and network B.
There are two parameters to be set for the GA to approach the optimal
: crossover probability
and mutation probability
. In order to find the best parameter settings so that the optimal sampling probability can be found quickly, we test several pairs of
and
randomly to solve
and compare the result with the true value.
Table 2 shows some of the parameter pairs that return the best results for network A and network B, respectively. As we can see, the parameter pair that returns the best results for both network A and B is
,
. Without loss of generality,
,
are used to run Algorithm 1 for further analysis in this work.
Next, we evaluate the proposed Algorithm 1 by comparing it with ANN-based and PSO-based approaches. In particular, the malicious sending rates of three attack flows are set to 12 pps, 18 pps and 25 pps in each round of comparison. In Figure 1, the x-axis indicates the number of iterations, and the y-axis indicates the mean false-negative rate of the three attack flows. With both tested networks, all three approaches converge given enough iterations. Our proposed Algorithm 1 achieves the lowest false-negative rate among the three approaches in both network settings. It also converges faster than the PSO-based approach. Although it converges slightly more slowly than the ANN-based approach, it is only a few iterations behind and reaches a much better result. Now that we have verified that our proposed Algorithm 1 is efficient in finding
, we will then evaluate the proposed IDS model.
6.3. Evaluation of the Sampling Scheme
In this subsection, we evaluate the proposed sampling scheme, i.e., Algorithm 2, by comparing it with two existing schemes. In particular, we compared it with the probabilistic sampling scheme (noted as Alg. P) and the optimal sampling scheme (noted as Alg. S) proposed in [
11] with the same simulation settings. Without loss of generality, we use the same network settings as the previous evaluations. Assume that three flows (out of 200 flows) contain malicious packets with a ratio of 2%.
Figure 2a,b show the mean false-negative rates with respect to the IDS capacity achieved by the three schemes for network A and network B, respectively. Our proposed Algorithm 2 outperforms the other two schemes in both network settings: it converges fastest and achieves the lowest false-negative rate.
Next, we show the mean false-negative rate with respect to the ratio of malicious packets. As shown in
Figure 3a,b, the proposed Algorithm 2 has the fastest convergence rate among the three approaches. In practice, the proposed Algorithm 2 may require 20% or fewer iterations than Alg. S, while Alg. P with a much slower convergence is left out of the competition.
6.4. Evaluation of the IDS Model
In this part, we assess the proposed IDS model using the proposed sampling method (i.e., Algorithm 2) and the ideal parameters (i.e., ) calculated from the prior assessment. The initial value of the malicious sending rate of each flow is set to 2% of . The malicious sending rates of the three attack flows are initialized to 12 pps, 18 pps and 25 pps, respectively. To provide a clearer demonstration, the malicious rates are updated to 40 pps, 56 pps, and 60 pps at the 50th iteration of the experiment.
As shown in
Figure 4a–d, in each iteration, the false-negative rates for three flows are calculated, and
is updated in accordance with Equation (
19). As seen in
Figure 4a,b, after several updates to
, the proposed IDS model can detect the majority of malicious packets in network A. As the model iterates, the accuracy rises. Additionally, in the initial few iterations, the convergence is faster with a higher
. The same results apply to network B, as shown in
Figure 4c,d. The accuracy of anomalous traffic detection will therefore be improved by updating
in the traffic sampling process. If the malicious packet rate is high enough with an optimized
, the missing rate might be nearly 0. Additionally, the findings from
Figure 4a–d show that the convergence rate declines with a smaller
. Additionally, the fact that
approaches the exact value in continuous feedback is what causes the convergence.
The change of
is further demonstrated in
Figure 5a–d. As it demonstrates,
soon approaches the actual value. After the 50th iteration, we update the actual value of
to show that our suggested IDS model can adjust to rapid network changes.
Based on the false-negative rates, the experimental results show that, compared to other widely adopted methods, the proposed IDS model and the optimal sampling rate scheme are capable of lowering the overhead on the IDS module and enhancing the detection efficiency in SDNs under medium network loads.
7. Conclusions
In this paper, we proposed an IDS for SDN based on a genetic algorithm to detect suspicious packets. The proposed IDS is installed on the SDN’s control plane. Compared to the traditional IDS server settings, it lowers network overhead. As a result, the proposed system requires only a small amount of resources to detect malicious traffic effectively.
In addition, we proposed a method for determining the optimal sampling rate for the IDS based on the sum of false-negative rates.
The simulation results showed that the proposed IDS model may significantly increase intrusion detection efficiency under medium network loads. By developing this model and the sampling scheme, we aim to enhance the detection efficiency in SDNs. In the future, we will keep working to enhance the system so that it can cope with medium to heavy network loads.