1. Introduction
IoT technology [1] has advanced significantly in recent years, leading to the steady development and enhancement of the industrial chain. It is used extensively across industries, and its integration with cyclical forces arising from infrastructure expansion, industrial transformation, and consumer upgrading has contributed to the rapid growth of the IoT industry as a whole. The arrival of 5G has coincided with this growth and has initiated a transformative revolution in the realm of IoT [2], as it extends communication beyond people to the vast network of interconnected devices and systems known as the Internet of Everything. However, the rapid growth of the IoT also gives rise to distinct challenges, particularly in security. The IoT requires a pragmatic framework capable of detecting anomalous network activity. This is particularly crucial because most IoT devices have limited computational capabilities, which renders them vulnerable to malicious attacks. Such attacks may degrade service quality or allow unauthorized access to sensitive information.
Machine learning and deep learning approaches [3] have been widely used for abnormal traffic identification in the IoT as artificial intelligence technologies gain prominence. The effectiveness of machine-learning-based approaches depends heavily on feature engineering [4,5]. However, IoT devices perceive and gather external data through a diverse array of sensors. Every sensor functions as a source of information, and different types of sensors collect information that varies in substance and format. Therefore, extracting features from network traffic data presents a substantial obstacle for detection algorithms. Moreover, the IoT is a pervasive network that relies on the Internet as its fundamental infrastructure and core technology. By integrating diverse wired and wireless networks with the Internet, the IoT enables the precise and real-time delivery of object information, which entails transmitting the data collected by IoT sensors at regular intervals to designated devices via the network. Extracting features from traffic on IoT devices is challenging because of the limited processing capacity of these devices [6], and the large volume of data involved could overload them. Additionally, if the feature extraction model is deployed in a data center or cloud server, data transmission becomes a complex issue. Deep learning, on the other hand, has emerged as an approach characterized by strong learning ability, the capacity to adapt to changing settings, and no reliance on manual feature engineering [7,8]. It has mitigated or resolved several challenges often encountered with conventional methodologies [9].
The Spring 2022 State of IoT study, published by IoT Analytics, reported that the worldwide number of IoT devices increased by 8% to 12.2 billion units in 2021. The growth rate was notably lower than in preceding years, primarily because of the limited availability of semiconductor chips. Most traditional detection algorithms require a training dataset that includes a large number of diverse negative examples. This paper therefore posits that acquiring a complete collection of negative samples for the vast range of IoT devices is a significant challenge.
This study takes a different approach by using only positive (normal) traffic to train the model. After training, the training data were fed into the model for detection, and the minimum confidence index among the correctly classified samples of each class was determined. This index was then used, with fine-tuning, as the threshold value for the corresponding class. Consequently, traffic data with a confidence index lower than the threshold of its class were classified as negative, that is, as malicious traffic. This study also presents a new network model that aims to extract feature information from positive traffic only. The proposed model employs one-dimensional convolution to extract temporal information from time series data and two-dimensional convolution to extract spatial information, and the two types of convolution are combined to enhance the model’s overall performance. This extensive feature extraction allows the model to assign high confidence to regular traffic patterns and to detect malicious traffic instances accurately. Furthermore, the approach reduces the amount of input data needed while improving the detection capability of the model. The contributions of this study are summarized as follows.
(1) We propose training the model on normal traffic only. The proposed model can detect malicious network traffic without any negative samples being included during training. This approach mitigates many issues arising from the incomplete collection of negative data and the inherent imbalance between positive and negative samples.
(2) We provide a novel network model. Sequence information is extracted through one-dimensional convolution, while spatial information is extracted through two-dimensional convolution. These convolutional operations are combined so that the model extracts feature information from the data comprehensively. This allows the model to recognize normal traffic reliably, resulting in an increased confidence index.
(3) This study extracts from each Pcap file the 32 bytes that follow the 24-byte file header and uses them as the model input. Because every sample then has the same fixed length, the data meet the input requirements without padding, cleansing, or other operations that may distort the real data. This approach also reduces the length of the input data.
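As a rough illustration of contribution (3), the sketch below reads a capture from a byte stream, skips the 24-byte file header, and keeps the next 32 bytes as a fixed-length input. It assumes that the 32 bytes following the header are the ones retained (consistent with the later statement that the method uses only 32 bytes of data). The function names and the scaling to [0, 1] are hypothetical choices and are not taken from this paper.

```python
import io

PCAP_FILE_HEADER_LEN = 24  # length of the Pcap file header, as stated in the text
INPUT_LEN = 32             # number of bytes kept as model input, as stated in the text


def read_input_bytes(stream):
    """Skip the file header and return the next 32 bytes as a list of ints.

    Returns None when the stream is too short, so no padding is ever added.
    """
    header = stream.read(PCAP_FILE_HEADER_LEN)
    if len(header) < PCAP_FILE_HEADER_LEN:
        return None
    kept = stream.read(INPUT_LEN)
    if len(kept) < INPUT_LEN:
        return None
    return list(kept)


def normalize(byte_values):
    """Scale byte values to [0, 1] (an assumed preprocessing step)."""
    return [b / 255.0 for b in byte_values]


# Synthetic stand-in for a capture file: 24 header bytes, then payload bytes.
fake_capture = io.BytesIO(bytes(24) + bytes(range(32)) + b"remaining data")
sample = read_input_bytes(fake_capture)
features = normalize(sample)
```

With a real capture, the same function could be called on a file opened in binary mode, for example `open(path, "rb")`, where `path` is a placeholder.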
The remainder of this manuscript is organized as follows: Section 2 provides an overview of the existing literature and research in the field. Section 3 describes the data used in the experimental analysis and gives a detailed description of the proposed model for monitoring anomalous traffic in the context of the IoT. Section 4 presents the experimental methodology and findings. Section 5 summarizes the present study and outlines future research directions.
2. Related Works
In recent years, deep learning has risen to prominence as a field of research. One of its primary advantages is the capacity to learn feature representations automatically from raw data, thereby eliminating the need for manual feature engineering. This boosts a system’s ability to represent information and contributes to its overall robustness and flexibility. Table 1 presents the latest studies in this field, including the datasets used, the methodology, and other relevant particulars.
The OwLEye method, created by Yong et al. [10], uses a machine learning model to score how malicious web requests are in order to identify web attacks; a threshold on the score determines whether a request constitutes an attack. Ageev et al. [11] used fuzzy logic inference to detect abnormal traffic patterns in IoT networks by analyzing the stationary Poisson or self-similar traffic that is characteristic of these networks. Zhu et al. [12] presented Fed-SOINN, an attention-based federated incremental learning algorithm that improves the efficacy and speed of model optimization by using asynchronous updates on a central server. The authors of [13] used non-linear transformation and structural risk minimization to convert the Internet traffic classification problem into a quadratic optimization problem. This methodology does not need a feature selection process and exhibits commendable stability and accuracy.
N. Islam et al. [14] investigated the application of intrusion detection systems (IDSs) in IoT environments. The study covered shallow machine learning techniques, namely decision tree (DT), random forest (RF), and support vector machine (SVM), and deep learning techniques, including deep neural networks (DNNs), deep belief networks (DBNs), long short-term memory (LSTM), stacked LSTM, and bi-directional LSTM (Bi-LSTM). It assessed the shallow and deep methods using several metrics, including accuracy, precision, recall, and F1 score. The findings revealed that deep learning IDSs outperformed shallow machine learning methods in detecting IoT threats.
Abdel-Basset et al. [15] adopted the LocalGRU method to obtain local representations and used a multi-head attention layer to obtain global representations, developing an intrusion detection model to analyze IoT traffic in a fog computing environment. The model achieved an accuracy of 99.75% on the UNSW-NB15 dataset. Putchala [16] introduced a multilayer IoT architecture that integrates deep learning techniques, namely long short-term memory (LSTM) and gated recurrent units (GRUs), with the aim of a lightweight implementation for IoT systems. The architecture was assessed on the DARPA/KDD Cup 1999 intrusion detection dataset for every layer, resulting in an accuracy of 98.91%.
Lopez-Martin et al. [17] extracted features from the headers of the packets transmitted over the lifetime of each flow. A feature set was built for every flow using only packet features, with IP addresses excluded. These features were used to investigate several model architectures, including recurrent neural networks (RNNs), standalone convolutional neural networks (CNNs), and hybrid combinations of CNNs and RNNs; comparative experiments yielded a peak accuracy of 96.3%. M. B. Umair et al. [18] proposed a classification method using multilayer deep learning that achieved an accuracy of 99.23%, superior to that of support vector machine (SVM)- and K-nearest neighbor (KNN)-based classification algorithms.
Atayero [19] proposed DRNN and SMOTE-DRNN models using the Bot-IoT dataset. The SMOTE-DRNN model mitigates class imbalance by using the synthetic minority oversampling technique (SMOTE) to generate supplementary samples of the minority class. Deep recurrent neural networks (DRNNs) enabled the models to learn hierarchical feature representations from the balanced network traffic data, facilitating discriminative classification. The simulation results indicate that the performance metrics of the DRNN model, including accuracy, recall, F1 score, AUC, GM, and MCC, were negatively affected by the high class imbalance within the dataset, whereas the SMOTE-DRNN model achieved an accuracy of 99.50%.
Rezvy et al. [20] employed a deep autoencoded dense neural network to identify intrusions or threats in 5G and IoT networks, reporting a noteworthy improvement in detection precision and efficiency. Sarika et al. [21] introduced a technique that employs deep autoencoders to identify potentially malicious network behavior through IoT devices; however, their method achieved an accuracy below 85% on both datasets.
This study presents a new method for extracting feature information from data by combining one-dimensional and two-dimensional convolutions. The model is trained on normal traffic patterns and can identify atypical traffic that departs from the established norm.
4. Experiments and Results
This section begins by conducting performance evaluations of the enhanced model using experiments on the CIC-AndMal2017 dataset. Subsequently, the model was applied to the CIC IoT Dataset 2022 to assess its capacity for traffic classification and anomalous traffic detection in the IoT context. Furthermore, an examination was conducted to determine the model’s generalization capability, exploring its potential applicability in domains beyond IoT. The computations were executed on a laptop with a 12th Generation Intel(R) Core(TM) i9-12900H processor running at a clock speed of 2.50 GHz. The laptop had 16.0 GB of onboard RAM. The calculations were performed using the torch package in Python.
The most effective way to evaluate the model’s performance is to run it in an actual network environment. However, because of the high cost of this strategy, we assessed the network model’s performance on the test set. Accuracy (acc) is a widely used evaluation metric for classification models and represents the proportion of samples correctly classified by the model. In the context of IoT anomalous traffic detection, accuracy provides an overview of the overall prediction results. Precision (P) measures how reliable the model’s normal-traffic predictions are, i.e., the proportion of traffic classified as normal that is actually normal. Recall (R) measures the model’s ability to capture normal traffic, i.e., the proportion of all actual normal traffic that is classified as normal. In IoT anomalous traffic detection, we aimed to minimize false positives, that is, anomalous traffic accepted as normal, by prioritizing high precision. At the same time, we strove to let as much normal traffic as possible pass detection by focusing on high recall. Both precision and recall are crucial metrics in this context. To provide a comprehensive evaluation of the model’s performance in anomalous traffic detection, we used the F1 score, which combines precision and recall into a single balanced assessment. They are formulated as follows:
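In standard confusion-matrix notation, these four metrics are conventionally written as shown below. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are assumed here, with normal traffic taken as the positive class as in the rest of this section, and R denotes recall.

```latex
\begin{align}
\mathrm{acc} &= \frac{TP + TN}{TP + TN + FP + FN} \\
P &= \frac{TP}{TP + FP} \\
R &= \frac{TP}{TP + FN} \\
F1 &= \frac{2 \cdot P \cdot R}{P + R}
\end{align}
```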
Based on our understanding of neural networks [34,35,36,37,38], we hypothesized that the Resnet residual network effectively captures the spatial (2D) feature information of traffic data but overlooks the sequence features inherent in traffic data, which may result in incomplete feature extraction and consequently impair the detection of malicious traffic. For these reasons, we modified the direct mapping (shortcut) part of the original Resnet-18 by incorporating one-dimensional convolution to extract sequence features, thereby combining sequence feature extraction with spatial feature extraction. To verify that replacing the identity mapping layer with one-dimensional convolution does not harm performance, we compared the original model with the modified model experimentally.
Initial experiments were undertaken to compare the loss of the original residual network model, without any enhancements, with that of the modified residual network model. We selected 10 distinct categories of adware traffic, extracted from the Pcap files in the CIC-AndMal2017 dataset. A total of 50,000 data points were obtained by selecting 5000 data points for each traffic class, and each class was assigned its own label, giving ten labels. We partitioned the dataset into training and test sets at a ratio of 8:2. The data were shuffled randomly to mitigate bias during the training phase and improve the model’s capacity for generalization.
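The data preparation described above can be sketched as follows. This is only an illustration: whether the 8:2 split was done per class is not stated, so the per-class (stratified) split, the fixed seed, and the function name `split_per_class` are assumptions, and integers stand in for the real traffic samples.

```python
import random


def split_per_class(samples_by_class, train_ratio=0.8, seed=0):
    """Shuffle each class and split it 8:2 into training and test sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(x, label) for x in shuffled[:cut]]
        test += [(x, label) for x in shuffled[cut:]]
    # Shuffle again so that classes are interleaved within each set.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy stand-in for the data: 10 classes with 5000 samples each.
toy_data = {label: list(range(5000)) for label in range(10)}
train_set, test_set = split_per_class(toy_data)
```

Splitting per class keeps the 8:2 ratio exact for every label, which a single global shuffle would only approximate.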
As seen in Figure 7, the final loss values of the original model (Mal_R0.01) and the new model (Mal_RL0.01) were similar after sufficient training iterations, with the new model showing a marginally lower loss. This finding illustrates that replacing the identity mapping layer of the residual network with a one-dimensional convolutional layer does not degrade the model’s performance. Additionally, the final stable value of the loss was the same for learning rates of 0.01, 0.005, and 0.001, and the loss reached a steady state after 20 epochs. Hence, a learning rate of 0.01 was selected for the remaining trials, and training was conducted for 20 epochs.
The performance of the two models was then assessed based on their accuracy. In the trials, the framework parameters of both models were the same, except for the residual blocks, which perform the feature extraction. We ran five rounds of experiments to mitigate the influence of random variation. The mean accuracy of the original residual network model was 0.9254, while the mean accuracy of the new residual network model was 0.9303, a difference of 0.0049 in favor of the new model.
In the context of IoT traffic data, our objective was to detect anomalous network traffic. Previous methodologies have primarily focused on learning to recognize anomalous traffic directly, but they face several obstacles: the variety of anomalous traffic keeps growing over time, which makes it difficult to collect an adequate number of negative samples. To tackle these issues, we propose a methodology that evaluates the confidence index of normal traffic data. Traffic is characterized as normal when its confidence index is high, where the confidence index indicates how certain the model is of its classification. In contrast, when the confidence index of a traffic sample is too low, the sample is categorized as abnormal traffic. To accomplish this, it is necessary to extract sufficient features from the traffic data, which allows the model to assign a high confidence to regular traffic and a low confidence to anomalous traffic.
The experimental results of our study indicate that the proposed model extracts features from traffic data more effectively. The experiments were conducted using the CIC IoT Dataset 2022 to categorize IoT devices [39,40] and detect aberrant traffic.
Initially, the positive samples in the dataset (Table 2) were partitioned into a training set and a test set at a ratio of 8:2, and the new model was trained using the training set. Subsequently, the minimum confidence index of each category was extracted (Figure 8) and uniformly adjusted to serve as a threshold for the test set. The experiment was performed using both the old and new models.
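The threshold extraction step can be illustrated with a minimal sketch. It assumes that the confidence index of a sample is the largest softmax probability of the classifier output, which is a common choice but is not confirmed by the text here. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_thresholds(logit_rows, labels):
    """Minimum confidence among correctly classified samples, per class."""
    thresholds = {}
    for logits, label in zip(logit_rows, labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            continue  # misclassified training samples do not set a threshold
        confidence = probs[predicted]
        thresholds[label] = min(confidence, thresholds.get(label, 1.0))
    return thresholds


# Toy model outputs for four training samples of a two-class problem.
toy_logits = [[4.0, 0.0], [2.0, 0.0], [0.0, 3.0], [1.0, 0.0]]
toy_labels = [0, 0, 1, 1]
thresholds = class_thresholds(toy_logits, toy_labels)
```

In this toy run the last sample is predicted as class 0 but labeled 1, so it is skipped and does not lower the threshold of either class.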
When the confidence index is used to identify malicious traffic, the model typically produces high-confidence classification results for normal traffic, whereas for abnormal traffic it tends to yield a lower confidence score. By establishing an appropriate threshold, it becomes possible to let normal traffic pass detection while still flagging abnormal traffic. This method addresses the sample imbalance caused by insufficient negative sample collection. It also aids in detecting new types of malicious traffic, because traffic unlike anything seen in training tends to receive a confidence index below the threshold.
To evaluate the model’s performance, the test set was partitioned into positive and negative subsets, which were tested independently. The minimum confidence index for each class in the training set, the minimum confidence index for each class in the positive samples, and the maximum confidence index for each class in the negative samples (as shown in Figure 9) were selected. The minimum confidence indexes of the positive samples were close to the threshold values, and a limited number of negative samples also had confidence indexes close to the threshold. Consequently, the threshold can be adjusted so that the positive samples satisfy it and the negative samples fail to meet it, facilitating the identification of abnormal traffic. In setting the threshold, our goal was to ensure that as many positive samples as possible have a confidence level above the threshold while minimizing the number of malicious traffic samples whose confidence level also exceeds it. This allows abnormal traffic to be detected effectively while normal traffic passes through the detection process. In our experiments, we observed the F1 scores of the model and determined that setting the threshold to the minimum confidence index of each class in the training set minus 0.02 improved the F1 score. This adjustment allowed the model to maintain a high level of accuracy in recognizing both types of traffic. Consequently, we applied the same threshold in all subsequent experiments.
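The effect of lowering each class threshold by a fixed margin can be sketched as follows. The decision rule (accept a sample as normal when its confidence is at least the threshold of its predicted class minus the margin) is an assumed reading of the procedure, and the confidence values below are invented toy numbers, not results from this study.

```python
def is_normal(pred_class, confidence, thresholds, margin=0.02):
    """Accept a sample as normal if it clears its class threshold minus margin."""
    return confidence >= thresholds[pred_class] - margin


def f1_for_margin(positives, negatives, thresholds, margin):
    """F1 score with normal traffic as the positive class.

    positives and negatives are lists of (predicted_class, confidence).
    """
    tp = sum(is_normal(c, s, thresholds, margin) for c, s in positives)
    fn = len(positives) - tp
    fp = sum(is_normal(c, s, thresholds, margin) for c, s in negatives)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: one class whose training-set minimum confidence is 0.95.
toy_thresholds = {0: 0.95}
toy_positives = [(0, 0.99), (0, 0.94), (0, 0.96)]  # normal test traffic
toy_negatives = [(0, 0.60), (0, 0.92)]             # malicious test traffic

scores = {m: f1_for_margin(toy_positives, toy_negatives, toy_thresholds, m)
          for m in (0.0, 0.02, 0.05)}
```

In the toy data, a margin of 0 rejects one normal sample, a margin of 0.05 lets one malicious sample through, and 0.02 separates the two groups. This mirrors the trade-off described above, although the best margin for real data has to be found empirically.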
The confusion matrices provide a comprehensive comparison of the Resnet-18 and Conv1+Resnet-18 models on positive and negative samples. The classification accuracy for each data category consistently exceeded 99.9%, mirroring the accuracy of the original model. The negative samples were deliberately chosen to include Flood and RTSP Brute Force (RSTP) types. The Flood category encompassed the entirety of the Camera data, whereas the RTSP Brute Force category contained HTTP, UDP, and TCP traffic data, so as to comprehensively evaluate the model’s capacity to detect anomalous traffic. As shown in Figure 10, the original model exhibited a false positive rate of 0.02% for Flood detection and 11.75% for RSTP detection, so its overall accuracy was at most 97.64%. Furthermore, the model’s performance was somewhat unstable, occasionally dropping to an accuracy of approximately 90%. In contrast, the new model (Figure 11) performed consistently and achieved a detection accuracy of 99.9% for negative samples.
The evaluation metrics for the Conv1+Resnet-18 model are displayed in Table 3. The accuracy of identifying both types of attack traffic surpassed 99%, and the identification rate of positive samples reached 100% in the most favorable case. These results suggest that the proposed approach exhibits considerable promise for anomaly detection in IoT traffic.
To compare the performance of our model with existing approaches, we trained a total of six models on the dataset. These included the traditional network models LSTM, Lenet-5, and Resnet-18, the newer CNN+LSTM model, and two network models specifically designed for IoT, FedAVG and CNN(1D)+GRU. By comparing these models with the model proposed in this paper, we aimed to highlight the advantages of our approach; the results are shown in Table 4. The method proposed in this paper requires a high confidence index to be assigned to the positive samples, and it uses a mere 32 bytes of data. Therefore, the model must effectively extract the information embedded in the positive samples, which is crucial for distinguishing positive from negative samples because only positive samples are used as training data. The LSTM model has limited feature extraction capability and struggled to stabilize its confidence index for normal traffic, resulting in an accuracy of only 90.88% on the test set. FedAVG showed a significant improvement in accuracy; however, because of its non-centralized training, the confidence index of its positive samples was not sufficiently large, making it difficult to distinguish negative samples. Both the Lenet-5 and Resnet-18 models achieved 99% accuracy on positive samples, but Resnet-18, with its superior information extraction capability, outperformed Lenet-5 in recognizing negative samples. When trained with both positive and negative samples, the CNN+LSTM and CNN(1D)+GRU models exhibited good performance; however, when trained solely on normal traffic, they were less accurate than our proposed model in detecting malicious traffic.
In our study, we utilized the binary format of data packets for model training. This approach helps us overcome the challenge of handling diverse data from heterogeneous IoT devices. Additionally, the data we select for analysis exhibit common characteristics found in most IoT devices. The design of our model focused on extracting data features for our study, independently of any specific traffic model. As a result, our findings can be generalized to various types of IoT devices without the limitations imposed by different device types or traffic models. In our upcoming research, we aim to develop a framework for detecting malicious traffic. This framework will operate in collaboration with both the cloud-side and end-side components, and our objective is to implement this method in real-world scenarios.
In the experiments above, the classification accuracy and the confidence index assigned to correctly classified positive samples were relatively low for the CIC-AndMal2017 traffic data, which can be attributed to the similarity between the data instances within this dataset. Following the experiments on the CIC IoT Dataset 2022, we aimed to assess the method’s applicability to other relevant domains. To this end, we selected four distinct types of traffic data, namely chat (4000), email (4000), files (4000), and audio (2000), from the ISCXVPN2016 dataset, giving a total of 14,000 instances for this test. Three audio samples were selected for the experiment as positive samples, while an additional set of audio samples was chosen as negative samples.
The experimental results indicate that the trained model is highly stable, accurately identifying anomalous traffic and categorizing normal traffic with a success rate exceeding 99%. The model parameters used in the two previous experiments remained unchanged.
5. Conclusions and Future Work
This study compared our modified residual network model with the original model in terms of accuracy. The results consistently show that our model performed better than the original model. Subsequently, our model was applied to identify abnormal network traffic in the Internet of Things (IoT) domain, achieving an accuracy of 99% or above when detecting Flood and RSTP anomalous traffic, together with a high level of accuracy in classifying normal traffic. In addition, the model was evaluated on the ISCXVPN2016 dataset to identify abnormal network traffic not included in the training set, with satisfactory outcomes.
The methodology employed in our study has yielded encouraging outcomes within the realm of IoT. Based on the experimental research, we demonstrated the potential of our methods for detecting anomalous traffic in both application software and Internet networks. However, it is imperative to thoroughly examine the many challenges and issues in effectively identifying and detecting malicious network traffic. Based on these findings, we propose two recommendations for future investigations.
(1) Privacy and data transmission: Federated learning mitigates the risk of privacy breaches among clients. Rather than collecting clients’ raw data centrally, it protects privacy by aggregating models. It also alleviates the challenge of excessive data transmission and can therefore be employed to address the issue of data transfer during model training in network environments.
(2) Computational resources: Traditional centralized detection methods place a significant computational burden on data centers while leaving the computational resources of edge devices underused. This issue can be addressed by adopting edge computing techniques. Hence, investigating approaches for transferring detection responsibilities from the central data center to the edge, thereby making better use of edge computing resources, is a potentially fruitful avenue for further research.
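To make the model aggregation mentioned in recommendation (1) concrete, the following sketch shows the weighted parameter averaging used by FedAvg-style federated learning, in which clients send model parameters rather than raw traffic. It is a simplified illustration and not part of the method evaluated in this paper: parameters are flattened into plain lists, and the function name and toy values are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average client parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two toy clients: the second holds three times as much data as the first.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts leave each client, which is why this scheme reduces both privacy exposure and the volume of data transmitted.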