1. Introduction
Software-defined networking (SDN) is a new paradigm that facilitates network management with its dynamic and programmable structure. In SDN, the control and data planes are separated from each other, and network management is carried out by a central controller [1]. Thus, the controller, which manages the whole network from a single point, can quickly apply different network policies across the network.
Figure 1 shows the layered structure of the SDN environment. However, this emerging approach brings security problems along with its advantages. In addition to the attacks encountered in traditional network structures, SDN is also exposed to attacks specific to itself [2]. Perhaps the most dangerous of these are attacks on the controller, because an attacker who seizes the controller can manage or disrupt all network traffic. Distributed denial-of-service (DDoS) attacks, in which users are denied access to network services, are foremost among the attacks on the controller.
With DDoS attacks, attackers aim to generate heavy traffic from more than one machine, consume the resources of the target machine, and eventually prevent it from serving. Attackers use “botnets” built from hijacked devices called zombies. Because DDoS attacks are carried out with a large number of machines, they are very difficult to detect and block. The frequency and severity of DDoS attacks are constantly increasing, and they can have devastating effects on many network services [3,4]. For this reason, the quick detection and prevention of DDoS attacks is one of the most important problems for network service providers and administrators. In SDN, DDoS attacks can disable different layers by flooding the communication channels between the controller and the switches, or between the controller and the application layer, with unnecessary flow information. The controller has no built-in security mechanism that can distinguish attack traffic from normal traffic. Therefore, it is very difficult to detect an attack.
DDoS attacks are grouped into three categories: application-layer attacks, resource-consuming attacks, and volumetric attacks [5]. Application-layer attacks are complex attacks that target specific services, use little bandwidth, and consume network resources slowly. Therefore, they are difficult to detect. Hypertext Transfer Protocol (HTTP) and Domain Name System (DNS) attacks fall into this category [6]. In resource-consuming attacks, servers are rendered unavailable by exploiting vulnerabilities in protocols implemented on the network layer. For example, a TCP-SYN flood consumes the resources of the target machine (memory, CPU, and storage) [7]. Volumetric attacks aim to consume the bandwidth of the network. Common attacks such as ICMP, UDP, and TCP-SYN floods exploit vulnerabilities in Layer 3 and Layer 4 protocols [8].
In this study, we focus on SDN and propose a lightweight hybrid model that combines neighbourhood component analysis (NCA) with machine learning approaches, as a contribution to a manageable new-generation network architecture. When detecting DDoS attacks with machine learning, flow characteristics (packet size, arrival time, response time, packet rate, packets per flow, etc.) are used to identify whether the network traffic is normal. DDoS attacks often use the same average packet size. Since the attack traffic has a high bitrate, its arrival time at the target machine is very short. Attackers exploit any of these features to consume the target machine’s resources and prevent it from serving. For this purpose, we use a public dataset with a total of 23 features for detecting DDoS attacks with machine learning. Instead of considering all the features in the dataset, the proposed model uses the NCA approach to reveal the most effective features. To obtain more general results, the proposed approach is tested with four different machine learning algorithms. The promising results indicate that the proposed approach can achieve more efficient results than traditional machine learning algorithms, even while using fewer features. The proposed model has great potential to contribute to the management of new-generation SDN architecture.
The rest of the paper is organized as follows. The next section reviews related work. Section 3 briefly describes the publicly available dataset used, together with the existing models, the feature selection method, the data augmentation method, the machine learning methods, the optimization method, and the proposed method. The results and analysis are given in Section 4. The discussion is presented in Section 5. Finally, Section 6 includes the concluding remarks and future work.
2. Related Works
In recent years, many studies have been done to secure SDN using machine learning techniques. In this section, we discuss several studies of DDoS security mechanisms based on machine learning and deep learning techniques.
Security solutions such as the Intrusion Prevention System (IPS) and the Intrusion Detection System (IDS) are used to ensure network security. The increasing variety of attacks has made it necessary to perform statistical calculations on these systems. With machine learning algorithms, IDSs have gained the ability to make meaningful inferences and predictions. Pérez-Díaz et al. [9] proposed a new architectural solution to detect Low-Rate DDoS (LR-DDoS) attacks and mitigate their effects in SDN. The solution consists of IPS and IDS modules placed on the controller. Attacks are detected with different trained machine learning and deep learning methods through the Identification Application Programming Interface (API) positioned in the IDS module. They used the Canadian Institute of Cybersecurity (CIC) DoS dataset in their study. The experimental results showed that, among six different machine learning algorithms, the Multi-Layer Perceptron (MLP) gave the best result with 95% accuracy. Shoo et al. [10] introduced a new evolutionary model to classify DDoS attack traffic in an SDN environment. The model uses a combined support vector machine (SVM) algorithm for malicious traffic classification. Genetic algorithms (GA) were used to optimize the SVM, and Kernel Principal Component Analysis (KPCA) was used as the feature selection method to improve the model’s classification performance. Two different datasets consisting of UDP flood, HTTP flood, Smurf, SiDDoS, and normal traffic were used to test and compare model accuracy. The experimental results show that the accuracy of the proposed combined method is 98.9%.
Kyaw, Aye Thandar, May Zin Oo, and Chit Su Khin [11] used two machine learning algorithms to detect UDP flooding attacks in the SDN environment. They used the Scapy tool to generate traffic packets. Their system collects flow statistics via the OpenFlow switch. After the feature extraction phase, they compared the classification performance of linear and polynomial SVM models. Experimental results show that the polynomial SVM algorithm has a 34% lower false alarm rate and 3% better accuracy.
Janarthanam, S., N. Prakash, and M. Shanthakumar [12] proposed a security framework that detects DDoS attacks in the SDN environment. The framework is based on an adaptive learning model that uses a historical dataset for traffic classification. They used a cross-validation approach to obtain efficient classification results. Although the results obtained are promising, the adaptive security model should be tested on different datasets obtained from a real environment to be more realistic. Tan, Liang et al. [13] proposed a novel security model for DDoS attacks in the SDN environment. The model involves two modules based on ML algorithms. The data-processing module uses the K-Means algorithm to select the best features, and the detection module uses the k-nearest neighbour (kNN) algorithm to detect attack flows. Compared to the distributed Self-Organizing Map (SOM) and entropy-based methods, their method achieves 98.85% accuracy with a 98.47% recall rate.
Wang, Lu, and Ying Liu [14] proposed a two-level DDoS attack detection method based on information entropy and deep learning. At the first level, they used entropy detection to flag suspicious traffic, and at the second level, they used a convolutional neural network (CNN) model to detect attack traffic. Finally, they compared the method with deep neural network, decision tree (DT), and SVM models. The CNN model’s accuracy was 4.25–8.20% higher than that of the other algorithms.
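The entropy-based first level described above rests on a simple observation: when traffic concentrates on a few destinations, the Shannon entropy of the destination distribution drops. The following is a minimal sketch of that idea only, not the implementation used in [14]; the function name and the example address windows are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical windows of destination addresses observed by a controller.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # evenly spread
attack_window = ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"]  # one dominant target

normal_h = shannon_entropy(normal_window)
attack_h = shannon_entropy(attack_window)

# A window whose entropy falls well below the normal level would be
# passed on to the second (classifier) level for closer inspection.
print(f"normal: {normal_h:.3f} bits, attack: {attack_h:.3f} bits")
```

In practice the decision threshold would have to be calibrated on normal traffic; no threshold value is given in the text, so none is assumed here.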
Deepa, V., K. Muthamil Sudar, and P. Deepalakshmi [15] proposed an ensemble technique to detect distributed denial of service (DDoS) attacks. They used four different machine learning models to detect suspicious traffic in the SDN environment. The SVM-SOM algorithm showed better results than the other ML algorithms, with 98.12% accuracy. The authors in [16] introduced a DDoS attack-detection system for SDN with two security stages. First, they used Snort to detect signature-based attacks. Then, they used an SVM classifier and a deep neural network (DNN) model for attack classification. The experimental results showed that the DNN, at 92.30%, has a better classification accuracy than the SVM.
The authors of [17] demonstrated the success of a deep learning model in detecting and classifying DDoS attacks. They applied the DNN model to two different samples taken from the CICDDoS2019 dataset. The attack detection scenario was applied to the first sample, while the attack traffic classification scenario was applied to the second. Their results showed that the DNN model is quite successful in both intrusion detection and classification. However, the authors generalize from results obtained on the CICDDoS2019 dataset alone, and different datasets can give different results. Therefore, their work could be supported by experiments on other datasets such as NSL-KDD, ISCX IDS 2012, UNSW-NB15, and CICIDS 2017.
Some researchers have performed intrusion detection using hybrid machine learning models. Nam, Tran Manh et al. [18] proposed a DDoS security system that uses the SDN architecture to detect attack flows. Their hybrid solution combines the kNN and SOM algorithms. They classified the traffic as normal or malicious using flow statistics collected from SDN switches and vehicle sensors. Adhikary et al. [19] focused on a hybrid technique that combines neural networks and DT to detect different types of DDoS attacks in a Vehicular Ad hoc Network (VANET). The proposed hybrid algorithm gives better results than the single neural network and DT models. Hosseini and Azizi [20] proposed a hybrid model to detect and mitigate DDoS attacks. Their framework separates the proxy side from the client side so that the limited resources on both sides can be used effectively. They combined six different ML techniques to identify the attack flows. The random forest classifier provided better results than the other ML techniques compared.
Several machine learning-based solutions to detect DDoS attacks in cloud computing and Internet of Things (IoT) networks have been proposed. The big challenge for such solutions is detecting these attacks with high accuracy. Ujjan, Raja Majid Ali et al. [21] focused on IoT DDoS attack detection. Their proposed methods used time-based and packet-based sampling approaches to collect the network traffic arriving at the SDN data plane. With these sampling approaches, they aimed to reduce the processing load of the IDS and Deep Neural Network (DNN) model and to increase classification performance. The results show that their proposed model has higher detection rates. Ravi, Nagarathna, and S. Mercy Shalinie [22] proposed a security mechanism to detect and mitigate DDoS attacks in IoT networks. Their mechanism, named Learning-driven Detection Mitigation (LEDEM), used a semi-supervised ML model for malicious traffic detection. LEDEM has multiple customized controllers connected to a central controller. They implemented different security approaches for IoT environments, which they divide into mobile IoT and fixed IoT, and used their own dataset to test the mechanism. Yong et al. [23] focused on web-shell intrusion in the IoT environment using ensemble methods. The authors used principal component analysis to select the best features with three types of ensemble techniques: random forest (RF), voting, and extremely randomized trees (ET). While RF and ET work well for light IoT environments, the voting method gives better results for heavy IoT scenarios.
The authors of [24] proposed a machine learning-based DDoS intrusion detection system to ensure the security of cloud services. They developed the Self-Adaptive Evolutionary Extreme Learning Machine (SaE-ELM) model as an automatic adaptive system and applied it to intrusion detection systems. They tested their method on four different datasets and compared the classification accuracy with commonly used machine learning models such as ANN, DT, and SVM. Although the training and test times of the SaE-ELM model they developed are slightly higher than those of the compared models, the results obtained are quite good.
Although the central management and programmable structure provided by SDN brings new capabilities to IDSs, the performance of these detection systems depends on the quality of training datasets.
In the recent studies summarized above, different datasets such as KDD Cup’99, NSL-KDD, CICIDS2017, CAIDA 2016, UNB-ISCX, and CIC DoS were used. The biggest problem with these datasets is that they are out of date. Attack characteristics are changing, so the need for up-to-date datasets is increasing. The LITNET-2020 dataset [25] and the Boğaziçi University datasets [26] are current datasets used to detect DDoS attacks. However, like the other datasets, they were created on traditional network platforms.
Our motivation in this study has been to work on up-to-date datasets obtained from SDN network platforms. There are a few publicly available datasets that can be used directly for anomaly detection systems applied in SDN networks [27,28]. We used the "DDOS attack SDN Dataset" in our study, which is also a new dataset and accessible to researchers for use in machine learning and deep learning research.
4. Experimental Results
In the first stage of the experimental study, SDN records were classified directly with machine learning methods after the preprocessing step, without any feature selection. The hyper-parameters of the machine learning algorithms were determined automatically by hyper-parameter optimization to perform an effective classification. The dataset was split into 70% for training and 30% for testing. For classification with the kNN algorithm, the value of k, the number of neighbours considered, was set to 1, and the Euclidean distance was chosen as the distance function. The Gini index was used as the split criterion in the DT method. For classification with the ANN method, the number of hidden neurons was 10, and the Levenberg-Marquardt algorithm was used for training. For the SVM method, the Radial Basis Function kernel was selected, the box constraint was set to 1, and the kernel scale was set to 0.9. Without feature selection, the best accuracy was obtained with the ANN method at 97.35%, while accuracies of 95.41%, 94.14%, and 80.56% were obtained with the kNN, DT, and SVM methods, respectively. The performance results of this classification are given in Table 3.
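To make the 70/30 split and the nearest-neighbour setting above concrete (one neighbour, Euclidean distance), the following minimal sketch applies both to toy data. It is an illustration only, not the code used in the experiments; the function names, the random seed, and the two toy features are assumptions.

```python
import math
import random

def train_test_split(rows, labels, test_ratio=0.3, seed=0):
    """Shuffle the indices and hold out `test_ratio` of the records for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(rows) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])

def predict_1nn(train_x, train_y, x):
    """Return the label of the single nearest training record (Euclidean distance)."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]

def accuracy(train_x, train_y, test_x, test_y):
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Toy, well-separated flows: (packet rate, bytes per packet); label 1 = attack.
rows = [(10 + i, 500 + i) for i in range(10)] + [(900 + i, 60 + i) for i in range(10)]
labels = [0] * 10 + [1] * 10

train_x, train_y, test_x, test_y = train_test_split(rows, labels)
acc = accuracy(train_x, train_y, test_x, test_y)
print(len(train_x), len(test_x), acc)
```

On real flow records the features have very different scales, so some normalization before computing Euclidean distances is usually advisable; the text does not detail this step here, so it is left out of the sketch.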
The raw network data in the dataset were passed through the preprocessing step, and feature selection was then applied with the NCA algorithm. To train the NCA algorithm, the regularization parameter lambda (λ), which prevents overfitting, was determined automatically. The stochastic gradient descent (SGD) method was used to optimize the feature weights, with a mini-batch size of 10 and 5 epochs. In the NCA algorithm, the weights of irrelevant features are close to zero, while the weights of highly discriminative features are higher. The feature indices and the corresponding weight values are given in Figure 3.
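The role of the feature weights and of λ can be illustrated with the regularized leave-one-out objective commonly used in NCA-based feature selection. The sketch below assumes that formulation (a weighted L1 distance, an exponential kernel of width `sigma`, and an L2 penalty on the weights); the exact variant and solver settings used in this study may differ, and all names and toy values are hypothetical.

```python
import math

def nca_objective(X, y, w, lam=0.01, sigma=1.0):
    """Mean probability of picking a same-class reference point, minus lam * ||w||^2."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Kernel similarity of sample i to every other sample under weights w.
        sims = {}
        for j in range(n):
            if j == i:
                continue
            dist = sum((wr ** 2) * abs(a - b) for wr, a, b in zip(w, X[i], X[j]))
            sims[j] = math.exp(-dist / sigma)
        norm = sum(sims.values())
        # Probability that sample i selects a reference point of its own class.
        total += sum(s for j, s in sims.items() if y[j] == y[i]) / norm
    return total / n - lam * sum(wr ** 2 for wr in w)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 0.0), (0.1, 1.0), (1.0, 0.1), (1.1, 0.9)]
y = [0, 0, 1, 1]

informative = nca_objective(X, y, w=(1.0, 0.0))  # weight on the useful feature
noisy = nca_objective(X, y, w=(0.0, 1.0))        # weight on the noise feature
print(informative, noisy)
```

An optimizer such as SGD would adjust `w` to increase this objective, which is why weights of irrelevant features shrink toward zero while discriminative features keep large weights.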
The NCA feature weights show that eight features have weight values between 0 and 1, while the weight values of 14 features vary between 1.11 and 17.87. High-dimensional classification problems are known to increase the computational cost of machine learning algorithms [46]. For this reason, after analyzing the 22 network features with the NCA algorithm, the first classification was performed with the eight features whose weight value is greater than 9. In the second experimental study, the 14 effective features were selected and given as input data to the machine learning algorithms. The 14 most effective features selected by NCA and their weight values are given in Table 4.
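Selecting feature subsets from the NCA weights amounts to thresholding the weight vector. A minimal sketch follows, using made-up weights rather than the values in Table 4; the function name and the cut-off of 1.0 for the larger subset are assumptions chosen to separate the two weight ranges mentioned above.

```python
def select_by_weight(weights, threshold):
    """Return the indices of the features whose weight exceeds `threshold`."""
    return [i for i, w in enumerate(weights) if w > threshold]

# Hypothetical weights for six features (not the values reported in Table 4).
weights = [0.02, 17.87, 1.11, 9.4, 0.6, 3.2]

strict = select_by_weight(weights, 9.0)  # small subset of the strongest features
loose = select_by_weight(weights, 1.0)   # larger subset; drops near-zero weights
print(strict, loose)
```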
More than 100 thousand network records were classified by the kNN, DT, ANN, and SVM algorithms after preprocessing and feature selection. In the first experimental study, the new dataset, created by selecting the features with a weight value greater than 9 with the NCA algorithm, was given as input to the ML algorithms. Promising results were obtained with all classification algorithms. The best accuracy was obtained with the DT method at 99.1760%, while accuracies of 97.7542%, 96.2015%, and 81.4810% were obtained with the kNN, ANN, and SVM methods, respectively. The performance results of the machine learning methods with the eight most effective features are given in Table 5. The ROC curves of all machine learning methods with eight features are given in Figure 4.
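For readers unfamiliar with how ROC curves such as those referenced above are built, the following sketch sweeps the decision threshold over classifier scores and integrates the curve with the trapezoidal rule. It is a generic illustration under the assumption of distinct scores (tied scores would need to be grouped); the scores and labels are invented.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the threshold one record at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit records from the highest score to the lowest.
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented attack scores and ground-truth labels (1 = attack).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```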
In the second experimental study, starting from the feature set of 22 features, the features whose NCA weight value is 1.11 or higher were selected after training with the NCA algorithm. They were classified by the ML methods using the same hyper-parameters as in the first experimental study. Very good results were obtained with all classification algorithms. The best accuracy, 100%, was obtained with the DT method, while accuracies of 99.15%, 99.78%, and 98.59% were obtained with the kNN, ANN, and SVM methods, respectively. The performance results of the machine learning methods in this experimental study are given in Table 6. The ROC curves of all machine learning methods are given in Figure 5.
Finally, the feature subset that gave the best results, consisting of the features with a weight value of 1.11 or higher out of the 22 features, was also subjected to a cross-validation test. The highest accuracy was again obtained with the DT method: 99.82%, compared with 99.23%, 97.63%, and 97.20% for the kNN, ANN, and SVM methods, respectively. The performance values obtained in the cross-validation test are given in Table 7.
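A cross-validation test of this kind can be sketched as follows. The number of folds is not stated in the text, so the five folds used here are an assumption, as are the function names, the 1-nearest-neighbour classifier used for brevity, and the toy data.

```python
import math

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def predict_1nn(train_x, train_y, x):
    """Label of the nearest training record under Euclidean distance."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]

def cross_validate(X, y, k, predict):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        hits = sum(predict(train_x, train_y, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k

# Toy data, interleaved so that every contiguous fold contains both classes.
X = [(float(i), 0.0) if i % 2 == 0 else (100.0 + i, 1.0) for i in range(20)]
y = [i % 2 for i in range(20)]

mean_acc = cross_validate(X, y, 5, predict_1nn)
print(mean_acc)
```

With real records, the data would normally be shuffled (and ideally stratified by class) before the folds are formed; the contiguous folds here only keep the sketch short.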
5. Discussion
Table 8 compares the studies on DDoS attack traffic detection using machine learning algorithms with the classification model we propose. Table 8 shows that different datasets were used to detect attack traffic. Some researchers used public datasets containing network traffic data from conventional network topologies, such as KDD Cup’99, NSL-KDD, UNB-ISCX, CICIDS2017, and CAIDA 2016 [2,3,4,5,6,7,8]. These datasets are useful for comparing the performance of machine learning algorithms used to detect attack traffic. However, because the SDN architecture differs from the conventional network architecture, SDN has unique attack vectors in addition to the existing attacks. Furthermore, the increasing volume and variety of attack traffic requires the use of up-to-date datasets. For this reason, researchers use their own datasets obtained with the SDN architecture in their work [11,12,14,19,20,21,27,28]. The SDN-specific dataset used in this study was created by the Bennett University study group for machine learning and deep learning studies. The most important criterion for selecting this dataset is that it was created using the SDN architecture and includes up-to-date SDN DDoS traffic data.
The results show that machine learning models are quite successful in detecting attack traffic. Our work aims to contribute to the research conducted in this field. Our experimental results showed that using the NCA feature selection method on SDN traffic data increases the accuracy of machine learning methods in detecting attack traffic. When selecting features with the NCA algorithm, all features are scored according to their distinctiveness. A deficiency of this feature selection method is that it does not give the optimum number of features to select; for this reason, experiments were carried out with different numbers of selected features in this study. For attacks such as DDoS, which require intervention without delay, it is important to detect the attack traffic while using system resources as efficiently as possible. Therefore, the most effective features should be selected when creating machine learning models.
Table 8 shows that the machine learning models in studies that use feature selection algorithms perform better than those in other studies [10,19,21]. It can be said that combining classification models with feature selection algorithms contributes positively to the classification of attack traffic. However, since the studies in the literature apply different models to different datasets, it is difficult to draw general conclusions from the comparative results.
6. Conclusions
In this study, normal and attack traffic in a dataset obtained from the SDN environment was classified using machine learning algorithms. The customized SDN-based dataset consists of normal and attack TCP, UDP, and ICMP traffic. Apart from the features that identify the source and target machines, the dataset has statistical features such as byte_count, duration_sec, packet rate, and packet per flow. The NCA algorithm was used to select the most suitable features and perform an effective classification. After analyzing the 22 network features with the NCA algorithm, 14 effective features were selected and given as input to the machine learning algorithms. More than 100 thousand network records were classified by the kNN, DT, ANN, and SVM algorithms after preprocessing and feature selection. The experimental results show that DT, with 100%, has a better accuracy rate than the other algorithms.
In future studies, it is planned to increase the diversity of attacks and compare the classification performances of machine learning models with feature selection algorithms.