We assumed that the attacker had begun launching cascading attacks on the broker in order to receive messages sent to and from the devices connected to it, and that the broker could be exploited and corrupted. Because the broker used with IoT devices is designed to receive messages from any sending device in its vicinity, and because the attacker can consume the resources of IoT devices and corrupt them without revealing its commands, these devices easily fall into the trap of this type of attack. Operating systems and simulation tools help us create virtual networks, observe them, and monitor the anomalies that may occur in their behavior. This makes it possible to investigate the cause of an anomaly, find out what is behind it, and find a solution before the anomaly occurs in a real network.
For this purpose, we created a virtual environment by running Contiki OS as a virtual machine in VMware and then used the IoT simulation tool Cooja, which has many features for monitoring network traffic and analyzing its details; we relied on these features in this study. Similar experiments have been conducted for some types of attacks on IoT, but the HELLO flooding attack has received little attention. It is a network attack in which links or nodes become unavailable because a large amount of traffic is generated, which can exhaust all network resources. Such attacks can be carried out by both internal and external attackers. To carry out a HELLO flooding attack, prompt messages are used.
This is also the contribution of the author, who has conducted many studies to define the problem and formulate a hypothesis to find a solution. Contiki OS is a lightweight, compact, open-source operating system for wireless sensor nodes, built on an event-driven kernel. Preemptive multitasking at the system level is possible with this OS. Typical Contiki OS configurations require 40 kilobytes of ROM and 2 kilobytes of RAM. A full Contiki installation includes protothreads, preemptive multithreading, TCP/IP networking, and IPv6, as well as a web browser, a personal web server, and various other utilities such as a screen saver and virtual network computing (VNC). Contiki has two communication stacks, Rime and uIP. uIP is a small, RFC-compliant TCP/IP stack that simplifies Internet communication. Rime is a lightweight communication stack for low-power radios that provides a set of basic communication primitives.
In the attack, the attacker either sends multicast DODAG Information Solicitation (DIS) messages to neighboring nodes, which must then reset their Trickle timers, or sends unicast DIS messages to each node, which must respond with a DODAG Information Object (DIO) message, in order to perform flooding attacks at the transport layer.
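The effect of DIS messages on the Trickle timer can be illustrated with a small simulation. The following is a minimal sketch, not the Cooja setup used in this study: it assumes a simplified Trickle timer that sends exactly one DIO at the end of each interval, doubles the interval up to a maximum, and restarts at the minimum interval whenever a DIS arrives (a reset is ignored when the interval is already minimal). The function name, the parameter values, and the one-DIS-per-second attack rate are all hypothetical.

```python
def count_dio_transmissions(duration_s, imin_s, doublings, reset_times=()):
    """Count DIOs sent by one node under a simplified Trickle timer.

    Simplifications (assumptions of this sketch): one DIO is sent at the end
    of every interval; the redundancy constant and the random transmission
    point inside the interval are ignored.
    """
    imax_s = imin_s * (2 ** doublings)
    resets = sorted(reset_times)
    idx = 0
    start, interval, sent = 0.0, imin_s, 0
    while start < duration_s:
        end = start + interval
        reset_at = None
        # Look for a DIS that arrives before the current interval ends.
        while idx < len(resets) and resets[idx] < end:
            arrival = resets[idx]
            idx += 1
            # A reset has no effect when the interval is already minimal.
            if interval > imin_s:
                reset_at = arrival
                break
        if reset_at is not None:
            start, interval = reset_at, imin_s
            continue
        if end <= duration_s:
            sent += 1
        start = end
        interval = min(interval * 2, imax_s)
    return sent


# Hypothetical parameters: 64 s observation window, Imin = 1 s, 4 doublings.
baseline = count_dio_transmissions(64, 1, 4)
# Attacker sends one DIS per second (hypothetical rate).
flooded = count_dio_transmissions(64, 1, 4, reset_times=range(1, 64))
```

Under these assumptions the victim never leaves the minimum interval while the DIS flood lasts, so it keeps transmitting DIOs at the highest rate the timer allows instead of backing off.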
Table 2 shows the simulation parameters.
5.2. Classifier Approach: Dataset Description and Processing
The dataset used in this study was collected from Al-Kasassbeh et al. [13]. The computed results showed that the random forest (RF) algorithm performed best among the machine learning techniques evaluated for anomaly detection.
Procedure:
The first step is to describe the dataset and how it was preprocessed.
In the second step, the different models are cross-validated to see which model is most predictive.
The third step is to process the data using machine learning algorithms.
In the fourth step, each model is evaluated for performance.
A dataset collected by Al-Kasassbeh et al. [13] was used. The dataset contains 4998 records with 8 classes and 34 attributes, as shown in Table 5. The variables were collected and divided into the Interface, IP, TCP, and ICMP groups. The 34 attributes were collected during attack testing, in which the server (victim) was subjected to various sorts of attacks. The information gain ratio of each feature was used to rank the features in the data, making it possible to distinguish between features that are necessary and those that are not.
Data preprocessing steps were performed before the classification task. We removed redundant columns with a high correlation of 0.9. Missing values in the data were handled using mean imputation. The features were also scaled: a standard scaler was used to transform the data to have a mean of zero and a standard deviation of one [26]. The proper selection of SNMP-MIB variables is crucial to detecting anomalies on networks because no single variable can capture all anomalies. To detect anomalies more accurately, we focused on using effective variables. Router devices were used to collect the MIB variables. A total of 34 MIB variables were selected from five MIB groups: Interface (variables collected from specific router interfaces), IP, TCP, UDP, and ICMP. A Counter32 is a non-negative four-byte integer that is continuously incremented from 0 to 2^32 - 1 and wraps back to 0 when it reaches its maximum value. As a result of a comprehensive investigation, we selected these variables from among the other MIB variables in the groups because they are more affected by attack traffic and are continuously updated based on the incoming and outgoing traffic over the network; therefore, they are more effective in detecting attacks.
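Because Counter32 variables wrap back to 0, the traffic observed between two polls has to be computed modulo 2^32 rather than by plain subtraction. The following minimal sketch shows one way to do this; the function name and the example readings are hypothetical, and it assumes that the counter wraps at most once between two successive polls.

```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 values range from 0 to 2**32 - 1


def counter32_delta(previous: int, current: int) -> int:
    """Return the increase between two successive Counter32 readings.

    Assumes at most one wraparound happened between the two polls.
    """
    return (current - previous) % COUNTER32_MODULUS


# Hypothetical readings of the same counter at two polling instants.
no_wrap = counter32_delta(1_000, 1_500)        # counter simply increased
wrapped = counter32_delta(4_294_967_290, 10)   # counter wrapped past its maximum
```

If the polling interval is long enough for a counter to wrap more than once under attack traffic, the delta is underestimated, so the polling period would need to be chosen with the expected traffic rate in mind.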
By identifying as many as is practical of the most prevalent and contemporary attacks that can occur on various network layers (network layer, transport layer, and application layer), we demonstrate the strength and usefulness of SNMP-MIB data in network anomaly detection. We test the SNMP-MIB data using classification techniques. In the first method, we divided the 34 MIB variables into five categories (Interface, IP, ICMP, TCP, and UDP), with a number of MIB variables belonging to each category. Then, the classification algorithms were applied to each MIB group separately in order to show how each group is affected by attacks and, ultimately, to identify the group or groups that are most successful at spotting anomalies. According to the preliminary findings of this strategy, each classifier performs differently across the MIB groups, with accuracy rates ranging from low to high.
The cross-validation (CV) method was used to evaluate the models. Both the K-fold method and the leave-one-out method were examined. In K-fold cross-validation (k = 5), each fold used 80% of the data as the training set and the remaining 20% as the testing set, with no overlap between the two. In leave-one-out cross-validation, a single observation from the original sample was used as testing data and the remaining observations were used as training data. As a result, every observation in the sample was used as testing data once.
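The two splitting schemes can be sketched with Scikit-learn as follows. This is an illustrative example on a toy array of ten observations, not the SNMP-MIB dataset; the shuffling and the random seed are assumptions of the sketch.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Toy data: 10 observations with 2 features each (placeholder values).
X = np.arange(20).reshape(10, 2)

# 5-fold CV: each fold trains on 80% of the data and tests on the other 20%.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_sizes = [(len(train), len(test)) for train, test in kfold.split(X)]

# Leave-one-out CV: every observation is the test set exactly once.
loo = LeaveOneOut()
loo_test_indices = sorted(int(test[0]) for _, test in loo.split(X))
```

With ten observations, every K-fold split contains eight training and two testing observations, while leave-one-out produces ten splits with a single testing observation each.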
Table 6 shows the collected sample.
Supervised machine learning algorithms can best be understood through the lens of the bias-variance trade-off. Some popular examples of supervised machine learning algorithms are linear regression for regression problems, random forest (RF) for classification and regression problems, support vector machines (SVM) for classification problems, and k-nearest neighbors (KNN) for both regression and classification. Classification predicts discrete class labels, whereas regression builds a model that predicts continuous quantities. In this research, we investigated support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), and logistic regression (LR) classifiers for DDoS attack detection.
Additionally, an artificial neural network-based approach, the multilayer perceptron (MLP), was investigated. These techniques can detect malicious activities and attacks, improve human analysis, and automate repetitive security tasks. The implementation was done in Python with related libraries such as Scikit-learn, Pandas, NumPy, TensorFlow, and Keras [33]. Before the classification task, the data were preprocessed: redundant columns with a high correlation of 0.9 were removed, and missing values were handled by mean imputation. The results show that the random forest (RF) algorithm has high accuracy in detecting DDoS attacks. Moreover, the performance of the multilayer perceptron (MLP) is generally strong and very similar to that of RF. This work provides a robust and efficient approach to predicting DDoS attacks from the SNMP-MIB dataset. Collected sample records are shown in Table 6. The calculated values for sensitivity, specificity, accuracy, precision, recall, and F1-measure for the different classes and algorithms are shown in Table 7.
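The preprocessing steps (correlation filtering, mean imputation, and standard scaling) can be sketched with Pandas and Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the exact pipeline of the study: the function name, the toy values, and the choice to drop a column when its absolute correlation with an earlier column is strictly greater than 0.9 are assumptions, and the column names are only illustrative MIB object names.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def preprocess(features: pd.DataFrame, corr_threshold: float = 0.9) -> pd.DataFrame:
    """Drop highly correlated columns, impute missing values, and standardize."""
    # 1. Keep the first column of every highly correlated pair, drop the other.
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    reduced = features.drop(columns=to_drop)

    # 2. Replace missing values with the column mean.
    imputed = SimpleImputer(strategy="mean").fit_transform(reduced)

    # 3. Scale every column to zero mean and unit standard deviation.
    scaled = StandardScaler().fit_transform(imputed)
    return pd.DataFrame(scaled, columns=reduced.columns, index=reduced.index)


# Toy data with illustrative column names and made-up values.
toy = pd.DataFrame({
    "ifInOctets": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ifOutOctets": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with ifInOctets
    "tcpInSegs": [5.0, np.nan, 1.0, 4.0, 2.0],  # contains one missing value
})
clean = preprocess(toy)
```

In a cross-validated experiment, the imputer and scaler would normally be fitted on the training folds only, for example by placing them in a Scikit-learn `Pipeline`, so that no information from the test fold leaks into the preprocessing.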
High sensitivity, specificity, and accuracy are all hallmarks of a good test. We report the classification performance of the traditional machine learning algorithms, the multilayer perceptron (MLP), and USML, and we compare the performance of the machine learning algorithms (SVM, RF, KNN, LR) with the results obtained with the MLP.
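For a binary test, these evaluation measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the tables of this study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard evaluation measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Sensitivity and recall are the same quantity, which is why the two always take identical values for a given class.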
We compared several traditional machine learning algorithms using all the features from the SNMP-MIB dataset, which is used to detect patterns of DDoS attacks. The classifiers applied to the dataset are support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), and decision tree (DT), together with an artificial neural network, the multilayer perceptron (MLP). The algorithms were evaluated based on sensitivity, specificity, accuracy, precision, recall, and F1 score. The results presented in this section are based on 5-fold CV and hyper-parameter tuning with a grid search. In the experimental analysis, RF is the best classifier, with the highest accuracy for detecting DDoS attacks among the traditional machine learning algorithms and the MLP. For validation, these algorithms are used in a binary classification test, and their performance is statistically measured and compared with the existing literature (Table 8) [9,10,11,12].
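A grid search with 5-fold CV for a random forest classifier can be sketched with Scikit-learn as follows. The example runs on synthetic data generated by `make_classification`, and the parameter grid is a hypothetical choice for illustration; the grid actually used in the experiments is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the SNMP-MIB features.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)

# Hypothetical hyper-parameter grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_   # best combination found on the grid
best_score = search.best_score_     # mean CV accuracy of that combination
```

The same pattern applies to the other classifiers by swapping the estimator and its parameter grid.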
According to the results, the RF algorithm proved to be very accurate in detecting DDoS attacks. RF is an ensemble algorithm that combines multiple decision trees. Moreover, the performance of the multilayer perceptron (MLP) is generally strong and very similar to that of RF. This work provides a robust and efficient approach for predicting DDoS attacks from the SNMP-MIB dataset.
5.3. Intrusion Detection Schemes
The details of the experiment given in this section provide a clear overview of the response framework and its associated benefits. Key details of the experiment include log analysis, net flow analyzers, intrusion detection, and mitigation. We implemented an intrusion detection system using Snort. Snort is open-source software that can run in three different modes: packet sniffer mode, packet logger mode, and intrusion detection mode. In packet logger mode, packets are written to disk; in intrusion detection mode, Snort compares packets against the rule base using the available rule sets [34]. Snort sits on the network and listens for traffic passing over it; it is therefore a network-based intrusion detection system. Generally, a network-based intrusion detection system is deployed at a single point of entry into a network. Such systems use simple rules, which act as signatures for detection. Snort rules covering malicious traffic, exploits, scans, FTP, Telnet, DoS, DDoS, and so on were enabled. Snort rules either implement site-specific policies or are needed in most environments to avoid false positives.
The more rules that need to be matched, the slower the IDS becomes and the more packets are dropped. Snort has three main uses: it can be used as a pure packet sniffer like tcpdump, as a packet logger for debugging network traffic, or as a network intrusion detection system. Snort logs packets in tcpdump binary format and names them by their IP address. In packet capture mode, Snort received 142 packets, analyzed 70 packets (49.2%), and discarded 0 (0%), with 63 UDP packets, 0 TCP packets, 2 ARP packets, and 6 fragmented packets reported. In packet logging mode, Snort analyzed 17 packets (47.2%), dropped 0 (0%), logged 17, and issued 0 alarms. In alert mode, Snort analyzed 4 out of 4 packets and discarded 0 (0%). In sniffer mode, Snort analyzed 14 packets and discarded 0 (0%) (Table 9).
Figure 11 shows the analysis of packets in the different Snort modes. Network throughput increases with average packet arrival (packets/time slot) and maximum buffer size. This reflects the effectiveness of the rules applied to ensure that as many packets as possible successfully arrive at their destination.
The attack detection rate decreases with the number of packets or nodes, while bandwidth consumption and throughput both increase with the number of nodes. When Snort runs in packet logger mode, it collects each packet and arranges the packets in a directory. When running in IDS mode, it uses the rules in the snort.conf file that specify suspicious network activity and sends an alert if the rules match the actual activity. Network traffic is analyzed using sFlow-RT, which is used for bandwidth analysis, network traffic analysis, and network performance monitoring. As seen in Figure 12, a higher peak indicates flood traffic from random IP addresses. Malicious traffic that saturated the victim was reduced after the network was trained. Lower peaks after 00.24.50 s indicate that mitigation was performed quickly and successfully.
5.4. Discussion
This research is conducted in a virtual environment where a virtual attack is carried out, and then it uses the situation to collect data and measure the extent of the benefits obtained based on criteria that the previous researchers did not analyze. Considering the relevant study and the findings from the empirical analysis, the most important finding is the weak protection of systems in many IoT devices. The Cooja tool revealed anomalies that could not have been observed without using the tool. Thus, it is an excellent tool for this type of testing. The tool was found to provide several auxiliary methods for the simultaneous comparison of the collected data. The theoretical significance of the results of this study lies in the fact that they will help identify a body of knowledge related to cybersecurity issues associated with networks and embedded subsystems of the IoT. Consequently, the results of this study should be of interest to future researchers studying IoT issues and how to appropriately address them.
The results of this study are relevant to vulnerability researchers and IoT network protection specialists because they show how to avoid problems that may occur in real networks by first simulating them and then developing proactive solutions. In addition to avoiding short-term problems, this approach supports long-term solutions. The IoT has become a material and moral part of our lives, and its weak protection may become a real threat to them. It is therefore necessary to investigate its security as thoroughly as possible, particularly because the relevant literature contains few studies of the attack implemented here, the flooding attack, compared to other types of attacks. For the quantitative portion, the environment is configured with a preset scenario in which M2M traffic using MQTT is monitored twice, once in normal mode and again after a default attack is run on it, in order to measure the impact of malicious activity. Regulators and researchers can use the findings of this study to detect vulnerabilities in IoT/embedded subsystems in a systematic manner so that the application process can be carried out successfully.
The performance of the prominent machine learning algorithms used in binary classification is evaluated based on sensitivity, specificity, accuracy, recall, precision, and F1-measure. In this study, we compared the model performance of machine learning algorithms (SVM, RF, KNN, LR) with the results obtained with the multilayer perceptron (MLP). The results are based on five-fold CV with a grid search and hyper-parameter tuning. The intrusion detection schemes implemented with Snort include protocol analysis, network flow analysis, intrusion detection, cyber-attack mitigation, and returning to normal.
Every study has its limitations. This one succeeded in identifying a flooding attack, but the technology it used had limits: while analyzing the virtual network, it took several minutes before the anomaly was discovered, which might have caused major repercussions and losses had it occurred in a real network. While simulation has been very helpful in gathering data and identifying abnormalities, there are still many other avenues to explore to enhance this research and allow for further examination of the pertinent literature.