Using Machine Learning and Software-Defined Networking to Detect and Mitigate DDoS Attacks in Fiber-Optic Networks
Abstract
1. Introduction
2. FON, SDN, and ML Interaction for Security Applications
2.1. Fiber Optical Networks
2.2. Network Management System
2.3. Data Communication Network
2.4. Data Communication Channel
- Bytes D1 through D3 carry the Regenerator Section overhead.
- Bytes D4 through D12 carry the Multiplex Section overhead.
2.5. Normal and Attack Traffic in DCNs of Fiber Optic Networks
2.5.1. Normal Traffic
- FTP is an application-layer protocol that allows files to be sent between computers over TCP connections [15]. In fiber-optic networks, FTP can be used for backup/restore between servers and nodes, and FTP traffic can be simulated in the DCN using the Distributed Internet Traffic Generator (D-ITG) tool described in Section 5.
- HTTP is a protocol used to request and deliver web content based on plain text. Many websites employ HTTPS, which encrypts traffic with Transport Layer Security (TLS), making the internet more secure [16]. NMS applications use HTTP or HTTPS to display a node’s graphical user interface (GUI) and monitor its operation. Section 5 describes the Curl tool used to simulate HTTP traffic in the DCN.
- SNMP is an internet standard for managing devices (switches, servers, workstations, printers, routers, and telecom equipment) on IP networks. Most NMSs use SNMP to remotely monitor, configure, change, and repair networks. SNMP uses UDP port 161 to send and receive requests and port 162 to receive traps from managed devices [15]. As described in Section 5, the snmptrap tool is used to simulate SNMP traffic in the DCN.
- ICMP is a protocol used to verify the connections between network components and determine whether data has reached its destination [17]. ICMP is a crucial component of error reporting and network transmission testing; however, it can also be used to carry out distributed denial-of-service (DDoS) attacks. The Ping and Hping3 tools are used to simulate ICMP traffic in the DCN, as described in Section 5.
2.5.2. Attack Traffic
- SYN Flood is an attack that targets the victim’s machine by initiating TCP connections. The victim receives a large number of SYN packets, but the attacker never sends the final ACK of the handshake, leaving the connections half-open. This consumes many resources on the victim’s machine and prevents legitimate users from being served [18].
- ICMP Flood is an attack that exhausts all of the victim’s resources by flooding the server with pings (echo requests) that keep it busy sending echo replies [18].
- UDP Flood is an attack that attempts to take down servers by flooding the targeted host with many UDP packets to random ports. Often, attackers use UDP’s connectionless functionality to broadcast a continuous stream of UDP data packets to the victim’s workstation [18].
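The common signature of these floods is a high volume of one-way requests that are never completed. As an illustrative sketch (not part of the paper's detection pipeline), the half-open handshake ratio that characterizes a SYN flood can be estimated from a simplified packet log:

```python
from collections import Counter

def half_open_ratio(packets):
    """Estimate the fraction of TCP handshakes left half-open.

    `packets` is a list of dicts with 'src' and 'flags' keys -- a
    simplified stand-in for captured traffic, not a real packet format.
    """
    syns = Counter(p["src"] for p in packets if p["flags"] == "SYN")
    acks = Counter(p["src"] for p in packets if p["flags"] == "ACK")
    total_syn = sum(syns.values())
    if total_syn == 0:
        return 0.0
    # A handshake counts as completed when the same source also ACKed.
    completed = sum(min(n, acks.get(src, 0)) for src, n in syns.items())
    return 1 - completed / total_syn

# Normal client: SYN followed by ACK.  Flood sources: SYNs only.
normal = [{"src": "10.0.0.2", "flags": "SYN"},
          {"src": "10.0.0.2", "flags": "ACK"}]
flood = [{"src": f"10.0.{i}.9", "flags": "SYN"} for i in range(50)]
print(half_open_ratio(normal))          # 0.0
print(half_open_ratio(normal + flood))  # close to 1.0
```

A ratio near zero indicates interactive traffic; a ratio near one is the half-open pattern a SYN flood leaves behind.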
2.6. Software-Defined Networking
- Application layer: a layer that contains applications, programs, and services such as load balancing, quality-of-service, and firewalls.
- Control layer: a central controller that manages network traffic. It uses the OpenFlow protocol to communicate with the infrastructure layer in order to monitor and control the entire network.
- Infrastructure layer: this layer contains both physical and virtual network forwarding devices that use the OpenFlow protocol, such as routers, switches, and access points.
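The control and infrastructure layers cooperate through match-action flow tables. The toy sketch below (plain Python with invented field names, not the real OpenFlow wire format) illustrates how a switch selects the highest-priority matching entry and falls back to the controller on a table miss:

```python
def lookup(flow_table, packet):
    """Return the action of the highest-priority matching entry.

    Each entry is (priority, match_fields, action); a field matches
    when it is absent from match_fields (wildcard) or equals the
    packet's value.  Mirrors OpenFlow's match-action model.
    """
    for priority, match, action in sorted(flow_table, key=lambda e: -e[0]):
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "send_to_controller"  # table miss: ask the control layer

table = [
    (10, {"ip_dst": "10.0.0.1", "tcp_dst": 80}, "output:2"),
    (5,  {"ip_dst": "10.0.0.1"},                "output:1"),
    (1,  {},                                    "drop"),  # catch-all rule
]
print(lookup(table, {"ip_dst": "10.0.0.1", "tcp_dst": 80}))  # output:2
print(lookup(table, {"ip_dst": "10.0.0.1", "tcp_dst": 22}))  # output:1
print(lookup(table, {"ip_dst": "10.0.0.9"}))                 # drop
```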
2.7. OpenFlow Protocol
2.8. Machine Learning
3. Related Works
4. Motivation
5. Methodology
5.1. Network Implementation
- Python is a high-level open-source programming language initially developed by Guido van Rossum in 1991. It can be used to support machine learning through a number of libraries and tools. Thanks to its wide range of machine learning programming capabilities [41], Python has developed into a potent programming language with support for object-oriented, imperative, functional, and procedural development methods. In addition, Python now has built-in libraries for different machine learning algorithms.
- Mininet is a popular network emulator for SDN research, used here to provide the underlying network topology. The Mininet environment enables the creation of virtual hosts and switches, which can then be connected to form the desired topology. The tool has a Python API as well, making it easy to create custom topologies and experiments [31]. As shown in Listing 1, the Mininet network emulator can build a network consisting of virtual hosts, switches, controllers, and links. Mininet hosts run standard Linux software and its switches support OpenFlow, enabling SDN and highly flexible custom routing. The primary benefit of Mininet is its support for the OpenFlow protocol, which SDNs require to configure their networks; it provides an affordable way to build, test, and customize network topologies that closely resemble real networks.
Listing 1. Mininet code example.
- Ryu controller is a component-based SDN framework and programmable controller. It provides well-defined application programming interfaces (APIs) for software components, making it simple for developers to build new applications for network management and control. In addition, Ryu is compatible with a variety of protocols for managing network devices, including OF-config, Netconf, and OpenFlow (versions 1.0, 1.2, 1.3, 1.4, and 1.5, plus the Nicira extensions). Because it is licensed under Apache 2.0, the Ryu code can be used by anybody free of charge [42]. This makes the network more flexible by making traffic flows easier to manage, handle, and adapt. Listing 2 shows an example flow.
Listing 2. Viewing flows in Ryu Controller.
5.2. Traffic Generation
- I. Low normal traffic: this occurs when there is little traffic between nodes and servers in an FON under ideal conditions, meaning there are few activities, alarms, or configuration changes in the network. Up to five simultaneous connections from a single region to the server are considered low normal traffic.
- II. Low attack traffic: this occurs when a small group of attackers targets a victim and the target has enough resources to reply to the attackers. A low-traffic attack, such as an ICMP flood, can slow down a server’s performance. A low attack is defined as 25 to 33 spoofed packets per second.
- III. High normal traffic: this occurs when there are more activities on the FON side, such as fiber cuts, traffic needing to be rerouted, or an NOC engineer needing to carry out a monthly network backup. Six to ten simultaneous connections are considered high normal traffic.
- IV. High attack traffic: when a group of attackers launches a DDoS attack on an FON, they flood the server with a large amount of traffic, restricting the network operators in their management of the FON. A high attack is estimated as 50 to 100 spoofed packets per second.
Listing 3. Traffic generation phases.
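Assuming simple threshold boundaries (the handling of values between the stated ranges is our own choice, not specified in the text), the four phases above can be sketched as:

```python
def classify_phase(connections, spoofed_pps):
    """Map measured load to the four traffic phases.

    Thresholds follow the text: <=5 / 6-10 simultaneous connections
    for low/high normal traffic, and 25-33 / 50-100 spoofed packets
    per second for low/high attack traffic.
    """
    if spoofed_pps >= 50:
        return "IV: high attack"
    if spoofed_pps >= 25:
        return "II: low attack"
    if connections > 5:
        return "III: high normal"
    return "I: low normal"

print(classify_phase(3, 0))   # I: low normal
print(classify_phase(8, 0))   # III: high normal
print(classify_phase(2, 30))  # II: low attack
print(classify_phase(2, 80))  # IV: high attack
```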
- Iperf is an open-source network performance measurement tool. It sends traffic from one host to another with an adapted bandwidth to obtain the desired results. Beyond throughput, Iperf can measure packet loss, jitter, and how the traffic is distributed. Iperf works for both TCP and UDP traffic, with options that differ per protocol, and is currently built into Mininet [43,44].
- Distributed Internet Traffic Generator (D-ITG) is a tool that can generate packet-level traffic that can be precisely replicated [43,45]. D-ITG contains models created to mimic the sources of several protocols, including TCP, UDP, ICMP, DNS, Telnet, and FTP. In addition, D-ITG can simulate FTP traffic (e.g., backup and restore) and SNMP traffic (e.g., set bulk and get bulk).
- SNMP TRAP is a tool that generates a notification (i.e., a trap) report alongside an event or alarm to the SNMP manager with the specified message [46].
- Curl is a command line tool used to effectively simulate HTTP traffic [47].
- Hping3 is an open-source packet generator and analyzer for the TCP/IP protocol created by Salvatore Sanfilippo. It supports advanced scanning techniques and is included in the Nmap Security Scanner [48] as one of the standard tools for security auditing and testing of firewalls and networks. Furthermore, as explained in Section 2.5, the Hping3 tool can simulate all of the DDoS attacks considered here. Listing 4 shows how to use the associated tools to generate normal or attack traffic.
Listing 4. Normal and attack traffic generation.
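As an illustration of how these tools might be invoked (the server address, ports, and packet counts are our assumptions, not the paper's exact testbed commands), the normal and attack generators can be scripted as:

```python
# Illustrative command lines only; the server address and some flag
# choices (port, count) are assumptions for this sketch.
SERVER = "10.0.0.1"

def normal_commands(server=SERVER):
    """Commands producing benign DCN-style traffic."""
    return [
        ["ping", "-c", "5", server],    # ICMP reachability check
        ["curl", f"http://{server}/"],  # HTTP GUI traffic
        ["iperf", "-c", server, "-u"],  # UDP throughput test
    ]

def attack_commands(server=SERVER):
    """Hping3 invocations for the three flood types."""
    flood = ["--flood", "--rand-source", server]
    return [
        ["hping3", "-S", "-p", "80"] + flood,  # SYN flood on port 80
        ["hping3", "--icmp"] + flood,          # ICMP (ping) flood
        ["hping3", "--udp"] + flood,           # UDP flood
    ]

for cmd in normal_commands() + attack_commands():
    print(" ".join(cmd))
```

In practice each command would be launched from a Mininet host, e.g. via `host.cmd(" ".join(cmd))`.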
5.3. Data Collection and Feature Extraction/Selection
- Speed of the source IP (SSIP) is the number of source IP addresses received in a given time window T; it is defined in Equation (1) [27,30,32,39]:

$\mathrm{SSIP} = N_{\mathrm{srcIP}} / T$ (1)

- Speed of session (SOS): the number of flow entries in the transport layer (L4 in the OSI model) determines how many sessions are opened per unit of time T. This feature, introduced in our research and named SOS, is defined in Equation (2):

$\mathrm{SOS} = N_{\mathrm{sessions}} / T$ (2)

This feature is essential for identifying attacks, as the number of open sessions per unit of time T may increase considerably during a DDoS attack.

- Ratio of pair-flow entries (RPF) is the total number of interactive (i.e., bidirectional) flow entries divided by the total number of IP addresses during the period T; it is defined by Equation (3) [27,32]:

$\mathrm{RPF} = N_{\mathrm{pairflow}} / N_{\mathrm{IP}}$ (3)

Under normal conditions, the traffic between sources and destinations is usually interactive; bidirectional flows therefore induce a number of flows equal to the number of IP addresses in the network. However, when an attack occurs, interactive communication is disabled and unidirectional flows (from multiple sources to a destination) become established in the network. In this case, the number of IPs is much larger than the number of interactive flows, and the combination of few interactive flows and no service availability characterizes the traffic as a DDoS attack.

- Standard deviation of flow packets (SDFP) during a period T is defined by Equation (4) [27,32,39]:

$\mathrm{SDFP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_i - \bar{p})^2}$ (4)

where $p_i$ is the number of packets in flow $i$ and $\bar{p}$ is the average number of packets per flow.

- Standard deviation of flow bytes (SDFB) is calculated from the difference between the number of bytes in each flow and the average number of bytes per flow during a period T. It is defined by Equation (5) [27,32,39]:

$\mathrm{SDFB} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(b_i - \bar{b})^2}$ (5)

Both the standard deviation of packets in flows and the standard deviation of bytes in flows are strongly affected by a DDoS attack; however, the SDFP value is significantly lower during attacks than during normal traffic.
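The five features can be computed directly from flow-table statistics. The sketch below uses a simplified flow-record format ('src', 'dst', 'packets', 'bytes') as a stand-in for real OpenFlow statistics; it illustrates the definitions above rather than reproducing the paper's collector:

```python
from statistics import pstdev

def extract_features(flows, T):
    """Compute SSIP, SOS, RPF, SDFP, and SDFB over a window of T seconds.

    Each flow is a dict with 'src', 'dst', 'packets', 'bytes' keys --
    a simplified stand-in for OpenFlow flow-table entries.
    """
    src_ips = {f["src"] for f in flows}
    # Interactive (bidirectional) pairs: both directions observed.
    pairs = {(f["src"], f["dst"]) for f in flows}
    pair_flows = sum(1 for s, d in pairs if (d, s) in pairs) // 2
    return {
        "SSIP": len(src_ips) / T,
        "SOS": len(flows) / T,
        "RPF": pair_flows / len(src_ips) if src_ips else 0.0,
        "SDFP": pstdev([f["packets"] for f in flows]) if flows else 0.0,
        "SDFB": pstdev([f["bytes"] for f in flows]) if flows else 0.0,
    }

# Two interactive flows (one bidirectional conversation).
normal = [
    {"src": "10.0.0.2", "dst": "10.0.0.1", "packets": 40, "bytes": 52_000},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "packets": 38, "bytes": 48_000},
]
print(extract_features(normal, T=3))
```

During an attack the same computation would show SSIP and SOS rising sharply while RPF collapses toward zero, matching the behavior described above.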
5.3.1. The Dataset
5.3.2. Selected Features
5.3.3. Comparative Study of the Most Popular ML Models
- Logistic Regression (LR) is a well-known machine learning model used for binary classification. Its probabilistic framework makes it possible to adapt classification thresholds and obtain confidence intervals [31].
- K-Nearest Neighbors (KNN) is a non-parametric supervised learning model used to solve classification and regression problems. It performs classification and prediction of new datapoints by comparing them to predefined groups based on the proximity principle [33].
- Decision Tree (DT) is a method of classification that uses a tree structure. This algorithm divides the population into two or more groups based on the most critical attributes [51].
- Random Forest (RF) is an ensemble machine learning algorithm that combines the outputs of a series of base decision-tree models. Random forest offers many benefits, such as high accuracy, efficiency, and good performance when dealing with important variables. To obtain the best prediction results, it is vital to choose the correct number of trees [52].
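To make the proximity principle behind KNN concrete, here is a minimal pure-Python majority-vote classifier over toy (SSIP, SOS)-style points; it is an illustration only, not the scikit-learn model evaluated in the paper:

```python
from collections import Counter
from math import dist

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs -- a toy
    illustration of the proximity principle.
    """
    nearest = sorted(train, key=lambda t: dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy (SSIP, SOS)-style points: low values normal, high values attack.
train = [((0.3, 2.0), "normal"), ((0.6, 3.0), "normal"),
         ((35.0, 36.0), "attack"), ((40.0, 41.0), "attack"),
         ((0.4, 2.5), "normal")]
print(knn_predict(train, (0.5, 2.2)))    # normal
print(knn_predict(train, (38.0, 39.0)))  # attack
```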
5.3.4. Determining the Adequate Dataset Size
5.4. Performance Metrics
- True Positive (TP): the number of attack samples that the classifier predicts correctly.
- True Negative (TN): the number of normal traffic samples that the classifier predicts correctly.
- False Positive (FP): the number of normal traffic samples mistakenly labeled as attack traffic.
- False Negative (FN): the number of attack samples mistakenly labeled as normal traffic.
- Learning curve: a learning curve shows how an estimator’s validation score and training score change as the number of training samples changes (i.e., with dataset size). This curve is calculated from the training data in order to inform how well a model is learning as the amount of data increases. Moreover, it determines whether the estimator is more likely to make a bias or variance error.
- Accuracy: the ratio of the number of correct predictions to the total number of predictions. It can be calculated by Equation (6):
- Precision: the number of true positive cases out of all predicted positive cases. The precision value, which is between 0 and 1, can be calculated using Equation (7):
- Recall: the number of predicted positive cases as a percentage of all positive cases; it is similar to the True Positive Rate (TPR), and is calculated using Equation (8):
- F1-score: the F1-score represents the harmonic mean of precision and recall. It accounts for both false positives and false negatives. Consequently, it performs effectively on uneven datasets. It can be determined by Equation (9):
- Cross-validation score: a method for re-sampling that uses different parts of the data to test and train a model repeatedly. It is mostly used when the goal is to make a prediction and determine how well a prediction model might work in real-life problems [54].
- STD of cross-validation score: the standard deviation of the cross-validation score measures the variation of the scores when computing a single score for one of the k folds. A low value of this parameter is the most acceptable, and indicates that a dataset is adequate for testing use.
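Equations (6)-(9) follow directly from the four confusion-matrix counts. A minimal sketch (the counts below are illustrative, not the paper's results):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score (Equations (6)-(9))
    computed from the confusion-matrix counts defined above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, tn=95, fp=5, fn=10)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```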
6. Results and Discussion
6.1. Features and Hyperparameter-Based Performance Evaluation
- LR: Figure 11a and Table 4 show that performance metrics improved during the four stages (6–9) for the LR algorithm, from 94.59% accuracy at the beginning to a 99.79% accuracy at the end. Adding more features and tuning the LR algorithm’s hyperparameters significantly affects the algorithm’s performance. However, an increased number of features increases the training time.
- KNN: Most of the performance metrics for the KNN algorithm do not improve during the first three stages (6–8). However, in the ninth stage the accuracy, precision, and F1-score all improve, while the cross-validation score and its standard deviation do not. The training time for KNN is acceptable, as shown in Figure 11b and Table 4.
- SVM: As shown in Figure 12a and Table 4, increasing the number of features alone does not improve SVM performance. When the hyperparameters are tuned with a “linear” kernel and C equal to 1000, SVM is able to provide high performance. However, training time is a major drawback of SVM, especially for detecting DDoS attacks.
- DT: There is no need to tune any ML hyperparameters, as the performance reached 100% in the sixth and eighth stages. Therefore, this algorithm only has two stages. Figure 13a and Table 4 show that this algorithm works well for protecting FON from DDoS attacks, and the training time is less than most of the selected algorithms.
- RF: Similar to the DT algorithm, the RF algorithm works well without changing any ML hyperparameters; a random forest classifier is robust by default, and because the classes in our dataset are clearly distinguishable, reaching 100% is a reasonable goal. As an advanced step, we changed the algorithm’s “n_estimators” parameter from its default value of 100 to 3 (Table 3) in order to check the performance with fewer trees than the Scikit-learn default. The RF algorithm can be considered as multiple DT algorithms running simultaneously, which is why its training time is higher than that of DT. The performance metrics of this algorithm are shown in Figure 13b and Table 4.
6.2. Comparison between Different Models and Deploying the Optimal One
6.3. Detection and Mitigation Principle
6.4. Performance Evaluation of FON Using the Selected Model
6.5. Performance Evaluation of the Proposed Model
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- He, J.; Zhao, H. Fault Diagnosis and Location based on Graph Neural Network in Telecom Networks. In Proceedings of the 2020 International Conference on Networking and Network Applications, Haikou, China, 10–13 December 2020; pp. 304–309.
- Security, C.N. Threat Intelligence Report 2020; Technical Report; Corero Network Security: Marlborough, MA, USA, 2020.
- Xie, J.; Richard Yu, F.; Huang, T.; Xie, R.; Liu, J.; Wang, C.; Liu, Y. A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges. IEEE Commun. Surv. Tutor. 2019, 21, 393–430.
- Ahmed, M.; Shatabda, S.; Islam, A.; Robin, M.; Islam, T. Intrusion detection system in software-defined networks using machine learning and deep learning techniques—A comprehensive survey. TechRxiv 2021.
- Alashhab, A.A.; Zahid, M.S.M.; Azim, M.A.; Daha, M.Y.; Isyaku, B.; Ali, S. A Survey of Low Rate DDoS Detection Techniques Based on Machine Learning in Software-Defined Networks. Symmetry 2022, 14, 1563.
- Ashraf, J.; Latif, S. Handling intrusion and DDoS attacks in Software Defined Networks using machine learning techniques. In Proceedings of the 2014 National Software Engineering Conference, Rawalpindi, Pakistan, 11–12 November 2014; pp. 55–60.
- Geer, D. SDN for IoT Supports Networking Devices. Available online: https://www.techtarget.com/iotagenda/feature/SDN-to-support-Internet-of-Things-devices (accessed on 12 September 2022).
- Arumugam, M. Optical fiber communication—An overview. Pramana 2001, 57, 849–869.
- Agwekar, A.; Ahmed, M.; Singh, R.; Ahmed, R. Synchronization Problems in Synchronous Digital Hierarchy (SDH) Communication System and Master Slave Strategies. Int. J. Sci. Eng. Technol. 2012, 1, 197–199.
- Huawei. Overview of DCN-NE40E-M2 V800R010C10SPC500 Feature Description—System Management 01. Available online: https://support.huawei.com/enterprise/en/doc/EDOC1100058412/6ee037b2/overview-of-dcn (accessed on 10 September 2022).
- Cisco. Configuring Data Communication Channel. Available online: https://content.cisco.com/chapter.sjs?uri=/searchable/chapter/content/en/us/td/docs/routers/ncs4200/configuration/guide/cem-line-cards/16-12-1/b-cem-ocx-xe-16-12-1-ncs4200/b-cem-ocx-xe-16-11-1-ncs4200_chapter_01010.html.xml (accessed on 10 September 2022).
- Jyothirmai, N.; Valli, R.M.; Krishna, A.R. SDH and its future trends. Int. J. Innov. Technol. Explor. Eng. 2012, 1, 74–78.
- Bhakar, M. STM Frame Structure. Available online: https://www.scribd.com/document/134931089/Stm-Frame-Structure (accessed on 10 September 2022).
- Henderson, P.M. Fundamentals of SONET/SDH. 2001. Available online: https://www.michael-henderson.us/Papers/SONET-SDH.pdf (accessed on 10 September 2022).
- Harjunen, T.; Särkkä, A. Classic TCP/IP applications: TELNET, FTP, SMTP, NNTP and SNMP. 1998. Available online: https://www.netlab.tkk.fi/opetus/s38130/s98/tcpapp/TCP_appl.pdf (accessed on 20 September 2022).
- Seufert, M.; Egger, S.; Slanina, M.; Zinner, T.; Hoßfeld, T.; Tran-Gia, P. A Survey on Quality of Experience of HTTP Adaptive Streaming. IEEE Commun. Surv. Tutor. 2015, 17, 469–492.
- Deering, S. ICMP Router Discovery Messages; Technical Report; Palo Alto: Santa Clara, CA, USA, 1991.
- Dennis, J.R.; Li, X. Machine-Learning and Statistical Methods for DDoS Attack Detection and Defense System in Software Defined Networks. Master’s Thesis, College of Engineering and Science, Ryerson University, Toronto, ON, Canada, 2018.
- Dhaliwal, A.S. Detection and Mitigation of SYN and HTTP Flood DDoS Attacks in Software Defined Networks. Master’s Thesis, Faculty of Engineering and Architectural Science, Ryerson University, Toronto, ON, Canada, 2017.
- Monnet, Q. An Introduction to SDN. Available online: https://qmonnet.github.io/whirl-offload/2016/07/08/introduction-to-sdn/ (accessed on 10 September 2022).
- Hamed, M.I.; ElHalawany, B.M.; Fouda, M.M.; Tag Eldien, A.S. A new approach for server-based load balancing using software-defined networking. In Proceedings of the 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2017; pp. 30–35.
- Kaur, S.; Kumar, K.; Singh, J.; Ghumman, N.S. Round-robin based load balancing in Software Defined Networking. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 2136–2139.
- Foundation, T.O.N. OpenFlow Switch Specification, Version 1.5.1 (Protocol Version 0x06). Available online: https://opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf (accessed on 10 September 2022).
- Mohammed, M.; Khan, M.B.; Bashier, E.B.M. Machine Learning: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2016.
- Paleyes, A.; Urma, R.G.; Lawrence, N.D. Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Comput. Surv. 2022.
- Mishra, S.; Sharma, S.K.; Alowaidi, M.A. Multilayer self-defense system to protect enterprise cloud. Comput. Mater. Contin. 2021, 66, 71–85.
- William Isaac, S.I. Software-Defined Security; Technical Report; University of Toronto: Toronto, ON, Canada, 2018; Available online: https://www.researchgate.net/publication/324716038_SOFTWARE-DEFINED_SECURITY (accessed on 20 September 2022).
- Rahman, O.; Quraishi, M.A.G.; Lung, C.H. DDoS attacks detection and mitigation in SDN using machine learning. In Proceedings of the 2019 IEEE World Congress on Services (SERVICES), Milan, Italy, 8–13 July 2019; Volume 2642-939X, pp. 184–189.
- Khashab, F.; Moubarak, J.; Feghali, A.; Bassil, C. DDoS Attack Detection and Mitigation in SDN using Machine Learning. In Proceedings of the 2021 IEEE Conference on Network Softwarization: Accelerating Network Softwarization in the Cognitive Age (NetSoft), Tokyo, Japan, 28 June–2 July 2021; pp. 395–401.
- Kumar Singh, V. DDOS Attack Detection and Mitigation Using Statistical and Machine Learning Methods in SDN. Master’s Thesis, National College of Ireland, Dublin, Ireland, 2020.
- Etman, M.A.A. DDoS Attack Detection System Using Semi-Supervised Machine Learning in SDN. Master’s Thesis, Ryerson University, Toronto, ON, Canada, 2018.
- Kotb, S.E.; El-Dien, H.A.; Eldien, A.S. SGuard: Machine learning-based Distributed Denial-of-Service Detection Scheme for Software Defined Network. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp. 251–257.
- Gadallah, W.G.; Omar, N.M.; Ibrahim, H.M. Machine learning-based distributed denial of service attacks detection technique using new features in software-defined networks. Int. J. Comput. Netw. Inf. Secur. 2021, 13, 15–27.
- Mohammed, S.S.; Hussain, R.; Senko, O.; Bimaganbetov, B.; Lee, J.Y.; Hussain, F.; Kerrache, C.A.; Barka, E.; Alam Bhuiyan, M.Z. A New Machine Learning-based Collaborative DDoS Mitigation Mechanism in Software-Defined Network. In Proceedings of the 2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Limassol, Cyprus, 15–17 October 2018.
- Kyaw, A.T.; Zin Oo, M.; Khin, C.S. Machine-Learning Based DDOS Attack Classifier in Software Defined Network. In Proceedings of the 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Phuket, Thailand, 24–27 June 2020; pp. 431–434.
- Deepa, V.; Muthamil Sudar, K.; Deepalakshmi, P. Detection of DDoS attack on SDN control plane using hybrid machine learning techniques. In Proceedings of the International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 13–14 December 2018; pp. 299–303.
- Nurwarsito, H.; Nadhif, M.F. DDoS Attack Early Detection and Mitigation System on SDN using Random Forest Algorithm and Ryu Framework. In Proceedings of the 2021 8th International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 22–23 June 2021; pp. 178–183.
- Sudar, K.M.; Beulah, M.; Deepalakshmi, P.; Nagaraj, P.; Chinnasamy, P. Detection of Distributed Denial of Service Attacks in SDN using Machine learning techniques. In Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; pp. 1–4.
- Ye, J.; Cheng, X.; Zhu, J.; Feng, L.; Song, L. A DDoS Attack Detection Method Based on SVM in Software Defined Network. Secur. Commun. Netw. 2018, 2018, 9804061.
- Bashaireh, B. Cloudflare Reports Q2 2022 DDoS Attack Trends. Zawya. 2022. Available online: https://www.zawya.com/en/press-release/research-and-studies/cloudflare-reports-q2-2022-ddos-attack-trends-x3erqlfs (accessed on 10 September 2022).
- Rana, Y. Python: Simple though an Important Programming language. Int. Res. J. Eng. Technol. 2019, 6, 1856–1858.
- Asadollahi, S.; Goswami, B.; Sameer, M. Ryu controller’s scalability experiment on software defined networks. In Proceedings of the 2018 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), Bangalore, India, 1–2 February 2018; pp. 1–5.
- Noman, H.M.; Jasim, M.N. POX Controller and Open Flow Performance Evaluation in Software Defined Networks (SDN) Using Mininet Emulator. IOP Conf. Ser. Mater. Sci. Eng. 2020, 881, 012102.
- Dugan, J.; Estabrook, J.; Ferguson, J.; Gallatin, A.; Gates, M.; Gibbs, K.; Hemminger, S.; Jones, N.; Qi, F.; Renker, G.; et al. iPerf—The Ultimate Speed Test Tool for TCP, UDP and SCTP. Available online: https://iperf.fr/ (accessed on 10 September 2022).
- Avallone, S.; Guadagno, S.; Emma, D.; Pescape, A.; Ventre, G. D-ITG distributed Internet traffic generator. In Proceedings of the First International Conference on the Quantitative Evaluation of Systems, Enschede, The Netherlands, 27–30 September 2004; pp. 316–317.
- IBM. Snmptrap Command IBM Documentation. Available online: https://www.ibm.com/docs/en/aix/7.2?topic=s-snmptrap-command (accessed on 10 September 2022).
- Hostetter, M.; Kranz, D.A.; Seed, C.; Terman, C.; Ward, S. Curl: A gentle slope language for the Web. World Wide Web J. 1997, 2, 121–134.
- Sanfilippo, S. Hping. Available online: http://www.hping.org/ (accessed on 10 September 2022).
- Gupta, A. Distributed Denial of Service Attack Detection Using a Machine Learning Approach. Master’s Thesis, University of Calgary, Calgary, AB, Canada, 2018.
- Stankovic, B.; Kotur, N.; Nikcevic, G.; Gasic, V.; Zukic, B.; Pavlovic, S. Machine Learning Modeling from Omics Data as Prospective Tool for Improvement of Inflammatory Bowel Disease Diagnosis and Clinical Classifications. Genes 2021, 12, 1438.
- Rochmawati, N.; Hidayati, H.B.; Yamasari, Y.; Yustanti, W.; Rakhmawati, L.; Tjahyaningtijas, H.P.; Anistyasari, Y. Covid Symptom Severity Using Decision Tree. In Proceedings of the 2020 3rd International Conference on Vocational Education and Electrical Engineering: Strengthening the framework of Society 5.0 through Innovations in Education, Electrical, Engineering and Informatics Engineering (ICVEE), Surabaya, Indonesia, 3–4 October 2020; pp. 1–5.
- Kurniawati, N.; Putri, D.N.N.; Ningsih, Y.K. Random Forest Regression for Predicting Metamaterial Antenna Parameters. In Proceedings of the 2020 2nd International Conference on Industrial Electrical and Electronics (ICIEE), Lombok, Indonesia, 20–21 October 2020; pp. 174–178.
- Jayaswal, V. Performance Metrics: Confusion Matrix, Precision, Recall, and F1 Score. Available online: https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262 (accessed on 10 September 2022).
- Brownlee, J. Machine Learning Mastery with Python: Understand Your Data, Create Accurate Models, and Work Projects End-to-End; Machine Learning Mastery: San Juan, Puerto Rico, 2016.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Ref | Scope | ML Algorithm | Dataset | Features | Accuracy | Limitations |
---|---|---|---|---|---|---|
[18] | Detection/Mitigation | RF | UCLA dataset and Synthetic | Three features | 97.70% | Only one ML model is evaluated. |
[26] | Detection/Mitigation | SVM | Synthetic | Five features | 98.52% | Only layer 3 feature collection duplicates dataset instances. |
[27] | Detection | SVM | Synthetic | Five features | High | Only layer 3 feature collection duplicates dataset instances. Some features remain unchanged. |
[28] | Detection/Mitigation | J48, RF, SVM, K-NN | Synthetic | 24 features | J48 classifier is the best | High complexity due to many features. The detection time is high. |
[29] | Detection | SVM / LR / KNN / DT / NB / RF | Synthetic | Five features | 94.99% / 98.90% / 86.41% / 99.11% / 99.64% / 99.76% | No mitigation. |
[30] | Detection/Mitigation | SVM | Synthetic | Three features | 99.26% | Only layer 3 feature collection duplicates dataset instances. |
[31] | Detection | LSVM & LR | Synthetic | Four features | 98% | No mitigation. |
[32] | Detection | SVM | Synthetic | Five features | 97.5–99.9% | No mitigation. |
[33] | Detection | SVM / KNN / DT / NB / RF | Synthetic | Six features | 99.84% / 98.96% / 99.26% / 77.64% / 99.19% | No mitigation. |
[34] | Detection/Mitigation | ML | NSL-KDD dataset | 25 features | F1-score (77%) | High complexity due to many features. |
[35] | Detection | Polynomial SVM | Synthetic | Five features | 95.38% | No mitigation. |
[36] | Detection/Mitigation | SVM-SOM | Synthetic | Not mentioned | 98.12% | Features not defined. |
[37] | Detection/Mitigation | RF | Synthetic | Five features | 98.38% | Evaluation of only one model. |
[38] | Detection | SVM / DT | KDD99 | 41 features | 78% / 85% | High complexity due to many features. No mitigation and accuracy not high. |
[39] | Detection | SVM | Synthetic | Six features | 95.24% | Only layer 3 feature collection duplicates dataset instances. |
| SOS | SSIP | RPF | SDFP | SDFB | TYPE | PHASE |
|---|---|---|---|---|---|---|
| 7 | 1 | 0.083333 | 78.5163 | 118,931.2 | 0 | Low Normal Traffic (I) |
| 19 | 0 | 0.307692 | 1007.227 | 670,561.4 | 0 | |
| 8 | 1 | 0.25 | 1510.026 | 939,150 | 0 | |
| 12 | 1 | 0.333333 | 99.04024 | 150,217.5 | 0 | |
| … | … | … | … | … | … | |
| 107 | 107 | 0.002331 | 0.048252 | 4.728723 | 1 | Low Attack Traffic (II) |
| 123 | 123 | 0.002513 | 0 | 0 | 1 | |
| 113 | 113 | 0.002255 | 0.047458 | 4.650868 | 1 | |
| 111 | 111 | 0.00216 | 0.046449 | 4.551984 | 1 | |
| … | … | … | … | … | … | |
| 12 | 4 | 0.166666667 | 268.1955528 | 149,167.4674 | 0 | High Normal Traffic (III) |
| 7 | 2 | 0.064516129 | 23.52020408 | 35,562.54857 | 0 | |
| 5 | 2 | 0.066666667 | 0.577350269 | 567.0716445 | 0 | |
| 10 | 1 | 0.25 | 93.32372 | 141,228.5 | 0 | |
| … | … | … | … | … | … | |
| 196 | 211 | 0.006494 | 0.113402 | 13.72169 | 1 | High Attack Traffic (IV) |
| 182 | 197 | 0.00409 | 0.078165 | 10.47416 | 1 | |
| 20 | 19 | 0 | 0.917663 | 109.2019 | 1 | |
| 236 | 251 | 0.008287 | 0.074175 | 9.444982 | 1 | |
| … | … | … | … | … | … | |
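The five per-window features above can be extracted from the flows collected in each 5 s sampling interval. The sketch below is a minimal illustration, not the paper's extraction code, and assumes the usual definitions from the SDN DDoS-detection literature: SOS as the number of new flows per window, SSIP as the number of distinct source IPs, RPF as the share of flow pairs that also appear in the reverse direction, and SDFP/SDFB as the standard deviations of per-flow packet and byte counts.

```python
from statistics import pstdev

def window_features(flows):
    """Per-window features from the new flows of one 5 s sampling interval.

    Each flow is a (src_ip, dst_ip, packets, bytes) tuple. Feature
    definitions are assumptions based on the common literature.
    """
    sos = len(flows)                                  # speed of sessions: new flows per window
    ssip = len({src for src, _, _, _ in flows})       # speed of source IPs: distinct sources
    pairs = {(src, dst) for src, dst, _, _ in flows}
    interactive = sum(1 for src, dst in pairs if (dst, src) in pairs)
    rpf = interactive / len(pairs) if pairs else 0.0  # ratio of pair (bidirectional) flows
    sdfp = pstdev([p for _, _, p, _ in flows]) if sos > 1 else 0.0  # std dev of flow packets
    sdfb = pstdev([b for _, _, _, b in flows]) if sos > 1 else 0.0  # std dev of flow bytes
    return sos, ssip, rpf, sdfp, sdfb

# Normal traffic: few flows, mostly bidirectional conversations.
normal = [("10.0.0.1", "10.0.0.2", 120, 96000),
          ("10.0.0.2", "10.0.0.1", 115, 91000)]
print(window_features(normal))  # (2, 2, 1.0, 2.5, 2500.0)
```

During a flooding attack each spoofed source typically opens one unanswered flow, so SOS and SSIP grow together while RPF, SDFP, and SDFB collapse toward zero, matching the attack rows of the table.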
| Machine Learning Algorithm | Parameter ¹ | Default Value | Increasing Value | Decreasing Value | Description |
|---|---|---|---|---|---|
| Logistic Regression | tol | float, default = 1e-4 | | | Tolerance for stopping criteria. |
| | C | float, default = 1.0 | | | Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization. |
| K-Nearest Neighbour | n_neighbors | int, default = 5 | | | Number of neighbors to use for k-neighbors queries. |
| | p | int, default = 2 | | | Power parameter for the Minkowski metric. This is equivalent to using manhattan_distance (l1) when p = 1 and euclidean_distance (l2) when p = 2. |
| Support Vector Machine | C | float, default = 1.0 | | | Regularization parameter. The regularization strength is inversely proportional to C and must be strictly positive. |
| | kernel | default = ‘rbf’ | | | Specifies the kernel type to be used in the algorithm. |
| | gamma | default = ‘scale’ | | | Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’. |
| Naive Bayes | var_smoothing | float, default = 1e-9 | | | Portion of the largest variance of all features that is added to variances for calculation stability. |
| Decision Trees | max_depth | int, default = None | | | The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. |
| Random Forest | n_estimators | int, default = 100 | | | The number of trees in the forest. |
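Hyper-parameters such as these are typically tuned with an exhaustive grid search. The sketch below uses scikit-learn's GridSearchCV on placeholder data, not the paper's flow dataset; the grid mirrors the K-Nearest Neighbour parameters listed above (n_neighbors and the Minkowski power p).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Placeholder binary-classification data standing in for the flow dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5-fold cross-validated grid search over the KNN hyper-parameters.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 10, 15, 20], "p": [1, 2]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 4))
```

GridSearchCV refits the estimator on the full training set with the best combination, so `grid` can then be used directly for prediction.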
| MLA | Stage | Number of Features | Grid-Searched Hyper-Parameters | Selected Hyper-Parameters | Accuracy | Precision | Recall | F1-Score | CV_Score | Std_CV | Training Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 6 | 3 ¹ | [‘tol’: [, , , ]], [‘C’: [0.1, 1.0, 10.0, 100.0]] | Default ³ | 94.59 | 99.55 | 89.75 | 94.4 | 95 | 3.16 | 0.03459 |
| LR | 7 | 3 | | O ⁴ [‘C’: 100.0, ‘tol’: 0.01] | 94.8 | 100 | 89.75 | 94.6 | 95.25 | 3 | 0.03057 |
| LR | 8 | 5 ² | | Default | 99.16 | 98.39 | 100 | 99.19 | 99.63 | 0.94 | 0.0775 |
| LR | 9 | 5 | | O [‘C’: 10.0, ‘tol’: 0.01] | 99.79 | 99.59 | 100 | 99.8 | 99.88 | 0.25 | 0.0473 |
| KNN | 6 | 3 | [‘n_neighbors’: [3, 5, 10, 15, 20]], [‘p’: [1, 2]] | Default | 99.58 | 99.19 | 100 | 99.59 | 99.81 | 0.29 | 0.0033 |
| KNN | 7 | 3 | | O [‘n_neighbors’: 5, ‘p’: 2] | 99.58 | 99.19 | 100 | 99.59 | 99.81 | 0.29 | 0.0010 |
| KNN | 8 | 5 | | Default | 99.58 | 99.19 | 100 | 99.59 | 99.25 | 1 | 0.0020 |
| KNN | 9 | 5 | | O [‘n_neighbors’: 3, ‘p’: 1] | 99.79 | 99.59 | 100 | 99.8 | 99.5 | 0.78 | 0.0011 |
| SVM | 6 | 3 | [‘C’: [0.01, 1, 10, 100, 1000], ‘kernel’: [‘linear’]], [‘gamma’: [0.5, 0.3, 0.2, 0.1, 0.01], ‘kernel’: [‘rbf’]] | Default | 85.03 | 100 | 70.49 | 82.69 | 87.56 | 7.28 | 0.0375 |
| SVM | 7 | 3 | | O [‘C’: 1000, ‘kernel’: ‘linear’] | 100 | 100 | 100 | 100 | 99.94 | 0.18 | 14.299 |
| SVM | 8 | 5 | | Default | 90.85 | 84.72 | 100 | 91.73 | 89.63 | 0.93 | 0.0201 |
| SVM | 9 | 5 | | O [‘C’: 1, ‘kernel’: ‘linear’] | 99.38 | 98.79 | 100 | 99.39 | 99.88 | 0.24 | 1.2329 |
| NB | 6 | 3 | [‘var_smoothing’: [, , , , , ]] | Default | 93.76 | 97.77 | 89.75 | 93.59 | 94.63 | 3.36 | 0.0035 |
| NB | 7 | 3 | | O [‘var_smoothing’: ] | 94.8 | 100 | 89.75 | 94.6 | 95.25 | 3 | 0.0039 |
| NB | 8 | 5 | | Default | 98.34 | 96.83 | 100 | 98.39 | 98.5 | 1.26 | 0.0041 |
| NB | 9 | 5 | | O [‘var_smoothing’: ] | 99.79 | 99.59 | 100 | 99.8 | 99.88 | 0.25 | 0.0010 |
| DT | 6 | 3 | | Default | 100 | 100 | 100 | 100 | 98.62 | 3.92 | 0.0036 |
| DT | 8 | 5 | | Default | 100 | 100 | 100 | 100 | 98.5 | 3.89 | 0.0059 |
| RF | 6 | 3 | | Default | 100 | 100 | 100 | 100 | 99.62 | 0.75 | 0.1699 |
| RF | – | 3 | | Downgraded ⁵ [n_estimators = 3] | 99.79 | 99.59 | 100 | 99.79 | 99.43 | 0.85 | 0.0149 |
| RF | 8 | 5 | | Default | 100 | 100 | 100 | 100 | 100 | 0 | 0.1800 |
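The metric columns of the table (accuracy, precision, recall, F1-score, cross-validation mean and standard deviation, training time) can be reproduced for any classifier with scikit-learn. The sketch below runs on placeholder data and two illustrative models standing in for the paper's setup; it is not the paper's evaluation script.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the 1600-instance flow dataset.
X, y = make_classification(n_samples=400, n_features=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

def evaluate(model):
    """Fit one classifier and report the table's metric columns."""
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    train_time = time.perf_counter() - t0          # training time in seconds
    y_pred = model.predict(X_te)
    cv = cross_val_score(model, X, y, cv=5)        # 5-fold cross-validation accuracy
    return {
        "accuracy": accuracy_score(y_te, y_pred),
        "precision": precision_score(y_te, y_pred),
        "recall": recall_score(y_te, y_pred),
        "f1": f1_score(y_te, y_pred),
        "cv_mean": cv.mean(),
        "cv_std": cv.std(),
        "train_s": train_time,
    }

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("DT", DecisionTreeClassifier(random_state=1))]:
    print(name, {k: round(v, 4) for k, v in evaluate(model).items()})
```

A high cross-validation standard deviation (the Std_CV column) flags a model whose test-split score may be optimistic, which is why the table reports it alongside the point metrics.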
| DDoS D ¹ and M ² Solutions | Isaac et al. [27] | Vishal Kumar [30] | Proposed Model |
|---|---|---|---|
| Features | [SFE, SSIP, RPF, SDFP, SDFB] | [SFE, SSIP, RPF] | [SOS, SSIP, RPF, SDFP, SDFB] |
| Flow collection layer | Layer 3 | Layer 3 | Layer 4 |
| Issue | Some features are the same | Repeated instances | – |
| Sample duration | 5 s | 5 s | 5 s |
| Flow counting | Only new flows | Existing and new flows | Only new flows |
| Number of instances | 1600 | 1600 | 1600 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alwabisi, S.; Ouni, R.; Saleem, K. Using Machine Learning and Software-Defined Networking to Detect and Mitigate DDoS Attacks in Fiber-Optic Networks. Electronics 2022, 11, 4065. https://doi.org/10.3390/electronics11234065