1. Introduction
Since 20 September 2016, when the Mirai malware began attacking Internet of Things (IoT) devices [1] and crippled half of U.S. network activity [2], distributed denial of service (DDoS) attacks have grown and escalated year after year. With the outbreak of the COVID-19 pandemic in 2020, people withdrew from in-person activities and became more dependent on the network, and DDoS attacks grew dramatically as well [3]. Most businesses are service providers that must operate continuously, so an outage caused by a compromised network or service results in financial and reputational loss [4]. As technology advances, DDoS attack techniques evolve daily [5], and new threats cannot be defended against with old methods. In this situation, a mechanism is required that allows the existing intrusion detection system (IDS) to recognize unknown traffic characteristics and assist the telecom engineer in locating unseen attacks.
According to the quarterly distributed denial of service (DDoS) report by Cloudflare, a content delivery network (CDN) provider, thousands of DDoS attacks are launched each month [6]. Although most attack traffic is below 500 Mbps, this volume is sufficient to interrupt several enterprise systems temporarily. Moreover, every quarter there are individual attacks of up to 100 Gbps, which cause large-scale service disruptions and possibly data center shutdowns, harming the service provider's finances and resulting in compensation claims.
As internet activity grows and new services expand, DDoS attack tactics also continually evolve. This poses a significant challenge for traditional IDSs, which must repeatedly be retrained on attack patterns reported by telecommunications experts. However, according to a Cloudflare report, most attacks are over within an hour, so by the time telecom technicians launch an investigation it is already too late. Artificial intelligence technology has made pivotal advances in recent years, and related research has been applied in various disciplines, including cybersecurity. Many deep learning-based IDSs have been designed and exhibit high accuracy; in the relevant experiments, the accuracy for identifying known, conventional DDoS attacks can exceed 90% [7,8,9]. However, when a traditional IDS encounters new types of attacks, the model cannot mark them as unknown, so they cannot be confronted. Given this, we need an IDS that flags unknown traffic for the telecom engineer to analyze at the start of an attack, rather than merely judging whether traffic is benign or malicious, especially when the characteristics of new threats differ markedly from old ones. The defensive system's reaction is particularly crucial when it faces an attack with distinct essential features. The issue is therefore no longer the performance of the training procedure, which could be addressed simply by updating the training and test datasets; the model's real challenge is unknown traffic, and the open-set problem is not as simple as the closed-set one.
This paper proposes a novel IDS architecture that combines deep learning with the statistics of the reconstruction error and the distribution of the output feature space to detect unknown traffic. The model backbone adopts DHRNet [10] as the base architecture, enhanced with the spatial location constraint prototype loss (SLCPL) [11] to concentrate the outputs of each class, while the feature space distribution is modeled by a one-class support vector machine (OC-SVM) [12] approximated with stochastic gradient descent (SGD). The proposed architecture inherits the advantages of DHRNet: it directly produces reconstruction errors, which are combined with the SGD OC-SVM to identify unknown traffic and forward it to the telecom engineer for labeling. An incremental learning module then uses the labeled samples to enhance the defensive performance of the IDS.
The remainder of this paper is organized as follows: Section 2 provides a summary of related work. Section 3 describes the assumptions about the scenario and the detection framework proposed in this paper. The experimental results are presented in Section 4. Section 5 concludes this research and outlines future prospects.
3. Proposed Methodology
This article presents a framework that incorporates 1D-DHRNet with its implicit reconstruction error and the SLCPL loss function, a one-class support vector machine (OC-SVM) module, and incremental learning as a solution to the open set recognition (OSR) challenge in DDoS attack detection.
Figure 1 depicts the functional diagram of the proposed framework.
This study's framework is built around the 1D-DHRNet model, which discriminates between regular traffic and DDoS attacks. The model's loss function comprises two parts. The first component is the reconstruction error, with the sum of squared errors (SSE) used as the loss for the encoding-decoding restoration. The second component is SLCPL, applied to the model's output to handle open-set risk: the loss decreases as the outputs of same-class samples become more concentrated and the distance between different-class samples increases. To enable the model to detect unknown samples, this study adopts the SGD OC-SVM approach to model the feature space generated by SLCPL and identify samples outside the distribution. When the SGD OC-SVM is fitted, only samples of each class that are correctly classified by the model are used; data scattered outside the fitted region of a class's model are considered outliers.
In this research, DHRNet is chosen as the backbone. Yoshihashi [10] proposed this architecture for the image domain as a classification network that employs an encoder-decoder and takes the reconstruction error into account during training. The prominent feature of DHRNet is that the reconstruction errors it produces for unseen samples are significantly larger than those for the training data. Because the dataset in this study is numerical rather than image data, we adopt the concept of this architecture and reimplement it with 1D CNNs, which we call 1D-DHRNet.
The potential danger in the OSR problem is that even though unknown samples have different spatial distributions, the softmax function will still classify them into one of the known categories. This study relies on SLCPL to control the feature space and eliminate this issue: its output concentrates the distribution of each class's samples, leaving more room for samples that do not belong to any known class. As a necessary enhancement, this study incorporates SLCPL into the loss function of 1D-DHRNet. The smaller the SLCPL value, the more concentrated the samples of the same class and the greater the distance between samples of different classes.
The OOD handling described so far is still insufficient to give the model the ability to recognize unknown samples; a technique is needed to model the feature space produced by SLCPL and identify samples outside its distribution. The simpler and faster the procedure, the better it suits the data generated in this research. The solution that satisfies these criteria and runs quickly enough is the SGD OC-SVM, which approximates the OC-SVM using stochastic gradient descent. To stay close to the original one-class usage, this study fits one model per class: when an SGD OC-SVM is fitted, only correctly classified samples of that class are used. Consequently, at prediction time, any sample that falls outside the fitted region of its class's model is treated as an outlier.
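As a rough illustration of this per-class modeling, the following sketch fits one SGD-approximated OC-SVM per known class using scikit-learn's SGDOneClassSVM; the nu value, the helper names, and the treatment of the 3D SLCPL embeddings as plain arrays are illustrative assumptions, not the exact settings used in this study.

```python
# Sketch: fit one SGD-approximated OC-SVM per known class on the 3D SLCPL
# feature space, using only samples the classifier already labels correctly.
# The nu value and variable names are illustrative, not the paper's settings.
import numpy as np
from sklearn.linear_model import SGDOneClassSVM

def fit_per_class_ocsvm(features, y_true, y_pred, n_classes, nu=0.01):
    """features: (n_samples, 3) SLCPL embeddings; returns one model per class."""
    models = {}
    for k in range(n_classes):
        mask = (y_true == k) & (y_pred == k)   # correctly classified samples only
        models[k] = SGDOneClassSVM(nu=nu, random_state=0).fit(features[mask])
    return models

def is_outlier(models, k_pred, feature_vec):
    """A sample predicted as class k is an outlier if it falls outside that
    class's fitted region (SGDOneClassSVM returns -1 for outliers)."""
    return models[k_pred].predict(feature_vec.reshape(1, -1))[0] == -1
```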
3.1. The 1D Deep Hierarchical Reconstruction Nets (1D-DHRNet)
This study employs a modified DHRNet-based network architecture built on 1D convolutions, as shown in Figure 2. The fundamental idea of the architecture is to have the model learn class features and perform classification while recovering as many of the embedded feature values as feasible in the reconstruction phase. Following the data stream, SLCPL takes the output y of DHRNet and determines the intra- and interclass distances. The output x′1 is used for the SSE calculation, and this loss value is combined with the class-distance information from SLCPL to accomplish sample classification.
Figure 3 portrays the actual flow of data through the model. PReLU is used as the activation function in the CNN encoder to enrich the information representation; PReLU retains negative values through a learnable linear slope, so the gradient does not vanish. The main output y, with three neurons, is sent to SLCPL for classification and aggregation operations. Another output of the model is the z layer depicted in Figure 2: after each layer's values are convolved, they are compressed, deconvolved, and then converted back to the original data for reconstruction error comparison.
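The following is a condensed PyTorch sketch of the 1D encoder-decoder idea described above. It keeps the PReLU activations, the 3-neuron latent output y for SLCPL, and the reconstructed output x′1 for the SSE term, but collapses DHRNet's hierarchical per-layer reconstruction into a single bottleneck; all channel counts and kernel sizes are assumptions.

```python
# Minimal 1D-DHRNet-style sketch (PyTorch): a 1D-CNN encoder with PReLU,
# a 3-neuron latent head y for SLCPL, and a decoder that reconstructs the
# input x'_1 for the SSE term. Channel counts and kernel sizes are assumptions.
import torch
import torch.nn as nn

class DHRNet1D(nn.Module):
    def __init__(self, n_features, latent_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.PReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.PReLU(),
        )
        self.to_latent = nn.Linear(32 * n_features, latent_dim)   # output y for SLCPL
        self.from_latent = nn.Linear(latent_dim, 32 * n_features)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=3, padding=1), nn.PReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=3, padding=1),
        )
        self.n_features = n_features

    def forward(self, x):                  # x: (batch, 1, n_features)
        z = self.encoder(x)
        y = self.to_latent(z.flatten(1))   # latent classification output y
        h = self.from_latent(y).view(-1, 32, self.n_features)
        x_rec = self.decoder(h)            # reconstruction x'_1 for the SSE loss
        return y, x_rec
```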
3.2. Spatial Location Constraint Prototype Loss
The SLCPL loss function is based on GCPL (generalized convolutional prototype learning). Both loss functions produce large values when the model classifies a sample correctly but its output is not concentrated. Let $k$ be the class being predicted, $N$ the number of known classes, $f_\theta(\cdot)$ the embedding function (i.e., the encoder CNN in the architecture of this article), and $d(f_\theta(x), O^i)$ the Euclidean distance between the output of the embedding function and the prototype center $O^i$. The GCPL loss is written as (1):

$$L_{\mathrm{GCPL}}\big((x,y);\theta,O\big) = l\big((x,y);\theta,O\big) + \lambda\, l_1\big((x,y);\theta,O\big) \qquad (1)$$

The term $l\big((x,y);\theta,O\big)$ provides the distance between classes. This loss uses the distance $d(f_\theta(x), O^k)$ between the sample $x$ and the prototype center of the predicted class $k$. To minimize the loss function, one can increase the distance of the sample from the prototype centers of the other classes or reduce its distance from the predicted class's prototype center, as in Formula (2):

$$l\big((x,y);\theta,O\big) = -\log \frac{e^{-d(f_\theta(x),\,O^k)}}{\sum_{i=1}^{N} e^{-d(f_\theta(x),\,O^i)}} \qquad (2)$$

The constraint term $l_1$ is used to concentrate the distribution of same-class samples. Its distance formula is given in (3):

$$l_1\big((x,y);\theta,O\big) = \big\| f_\theta(x) - O^k \big\|_2^2 \qquad (3)$$

SLCPL is further deduced from GCPL, as in (4); its restriction is imposed on the prototype centers, with (5) as the SLCPL restriction term:

$$L_{\mathrm{SLCPL}}\big((x,y);\theta,O\big) = l\big((x,y);\theta,O\big) + \lambda\, l_{\mathrm{SLC}}(O) \qquad (4)$$

$$l_{\mathrm{SLC}}(O) = \frac{1}{N}\sum_{i=1}^{N}\big(d_i - \bar{d}\big)^2 \qquad (5)$$

In (5), $O_{\mathrm{center}} = \frac{1}{N}\sum_{i=1}^{N} O^i$ and $\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i$. The term $d_i = \| O^i - O_{\mathrm{center}} \|_2$ is the distance between the prototype center of class $i$ and the center point. The literature shows that the variance formulation used here is helpful for optimizing the training process: by controlling the variance of these distances, the distance from each class's prototype center to the coordinate origin can be limited, and the model can thus be guided to keep its outputs distributed around the origin. In this paper, this center point of the feature space is written as $O_{\mathrm{center}}$. The conceptual diagram of the operation is shown in Figure 4, where the black dotted line is the decision boundary used by softmax when making classification judgments.
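A minimal PyTorch sketch of the loss in Equations (1), (2), (4), and (5) is given below; the weighting coefficient lam, the random prototype initialization, and the use of squared Euclidean distances are assumptions, not values or choices confirmed by this study.

```python
# Sketch of the SLCPL loss as given in Eqs. (2), (4), and (5): a distance-based
# cross-entropy term plus the spatial location constraint on the prototype
# centers. The weight lam, prototype initialization, and squared distances
# are assumptions.
import torch
import torch.nn as nn

class SLCPLLoss(nn.Module):
    def __init__(self, n_classes, latent_dim=3, lam=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, latent_dim))
        self.lam = lam

    def forward(self, y, target):
        # d(f(x), O^i): (squared) Euclidean distance to every prototype
        d = torch.cdist(y, self.prototypes) ** 2           # (batch, n_classes)
        log_p = torch.log_softmax(-d, dim=1)                # Eq. (2), distance-based CE
        dce = nn.functional.nll_loss(log_p, target)
        # Eq. (5): variance of the prototype-to-center distances
        center = self.prototypes.mean(dim=0)
        dist = torch.norm(self.prototypes - center, dim=1)
        slc = ((dist - dist.mean()) ** 2).mean()
        return dce + self.lam * slc                         # Eq. (4)
```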
3.3. Reconstruction Loss
This research uses the reconstruction loss and SLCPL together as a multi-objective loss function during training. The reconstruction loss forces the model to both classify and reconstruct, while SLCPL tightens the intraclass distances and enlarges the interclass distances during classification.
The reconstruction part of the loss uses the SSE (sum of squared errors), expressed in (6), as the reconstruction error, and the loss for each batch is given in (7):

$$\mathrm{SSE}(x) = \sum_{i=1}^{n}\big(x_i - x'_{1,i}\big)^2 \qquad (6)$$

$$L_{\mathrm{rec}} = \sum_{b=1}^{B} \mathrm{SSE}\big(x^{(b)}\big) \qquad (7)$$

where $x$ denotes the original features and $x'_1$ the features after reconstruction. Compared with the MSE (mean squared error), the SSE makes the model pay more attention to the restoration difference of individual features in the training stage, because the per-sample error is no longer averaged but summed, which magnifies the reconstruction error of any single feature item. The overall loss function is given in Formula (8):

$$L_{\mathrm{total}} = L_{\mathrm{SLCPL}} + L_{\mathrm{rec}} \qquad (8)$$
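As a small sketch of how Equations (6) to (8) could be combined in training code, continuing the PyTorch notation of the previous sketches; the unweighted 1:1 sum of the two terms is an assumption.

```python
# Sketch of the joint training objective from Eqs. (6)-(8): per-sample SSE
# reconstruction error (summed, not averaged) added to the SLCPL term.
# The relative weighting (here simply 1:1) is an assumption.
import torch

def sse_loss(x, x_rec):
    """Eq. (6)/(7): sum of squared errors over features, summed over the batch."""
    return ((x - x_rec) ** 2).sum()

def total_loss(x, x_rec, y, target, slcpl_loss):
    """Eq. (8): reconstruction term plus SLCPL classification term."""
    return sse_loss(x, x_rec) + slcpl_loss(y, target)
```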
3.4. Unknown Identification Module
Under the OpenMax principle, a hypersphere is constructed for each category, centered on the mean activation vector. The largest Euclidean distances from the center are used to fit a Weibull cumulative distribution function for extreme value estimation. This study follows the same concept: using the 3D feature space produced by SLCPL, with its centralized features, a one-class support vector machine (OC-SVM) in its SGD variant is used for the hypersphere construction. Compared with the radial basis function kernel version of the OC-SVM, the computational complexity of the SGD OC-SVM is much lower.
The OC-SVM algorithm aims to find a hypersphere that separates positive samples from negative samples, which can be cast as an optimization problem. Ordinary gradient descent uses all samples to compute each gradient update, so its computational complexity remains high. SGD is also based on gradient descent, but the updates are made with small sample batches. Since the parameters are solved in small batches, the decrease in loss can be monitored to decide when to stop iterating. This approximation significantly reduces the time complexity.
The SLCPL feature space approximation map of the OC-SVM is shown in Figure 5, where the yellow area is the region enclosed for a known class, and samples outside the yellow area are regarded as unknown.
In the unknown identification module, this study uses a dual-index strategy for classification; the strategy architecture is shown in Figure 6. The first detection indicator is the observed reconstruction error $e_{\mathrm{rec}}$: both $e_{\mathrm{rec}}$ and the 99th percentile method are used to remove samples with large reconstruction errors. Then the OC-SVM scheme based on SGD approximation is adopted, and its decision score $s(\cdot)$ on the model output is screened by the 0.5th percentile at the upper and lower bounds, as in Formulas (9) and (10):

$$e_{\mathrm{rec}}(x) \le P_{99}\big(e_{\mathrm{rec}}\big) \qquad (9)$$

$$P_{0.5}(s) \le s(x) \le P_{99.5}(s) \qquad (10)$$

Only samples that are within the 99th percentile of the reconstruction error and satisfy the OC-SVM rule are passed; the others are aggregated and forwarded to telecommunication experts. The pass rule is shown in (11):

$$\mathrm{known}(x) = \big[ e_{\mathrm{rec}}(x) \le P_{99}(e_{\mathrm{rec}}) \big] \wedge \big[ P_{0.5}(s) \le s(x) \le P_{99.5}(s) \big] \qquad (11)$$
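A short sketch of this dual-index pass rule follows; the percentile thresholds mirror the text, while the helper names and the use of the OC-SVM decision score as the screened quantity are assumptions.

```python
# Sketch of the dual-index pass rule in Eqs. (9)-(11): a sample is kept only if
# its reconstruction error is within the 99th percentile learned on training
# data AND its per-class OC-SVM score lies between the 0.5th and 99.5th
# percentiles; everything else is flagged as unknown. Thresholds follow the
# text; helper names are illustrative.
import numpy as np

def fit_thresholds(rec_errors_train, svm_scores_train):
    return {
        "rec_p99": np.percentile(rec_errors_train, 99),      # Eq. (9)
        "svm_lo": np.percentile(svm_scores_train, 0.5),      # Eq. (10)
        "svm_hi": np.percentile(svm_scores_train, 99.5),
    }

def is_known(rec_error, svm_score, th):
    """Eq. (11): pass only when both indicators agree the sample is known."""
    ok_rec = rec_error <= th["rec_p99"]
    ok_svm = th["svm_lo"] <= svm_score <= th["svm_hi"]
    return ok_rec and ok_svm
```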
3.5. Incremental Learning
The framework developed in this study includes an unknown identification module that captures unknown traffic. In the assumed scenario, the captured traffic is reported to communication experts for labeling, and the model then learns again. This study uses a fine-tuning strategy for this purpose. In a multiclass model architecture, the model can learn again by updating only some framework modules: the component that must be modified is the number of classes in the SLCPL loss function, which lets the model acquire new knowledge by adding new classes, while the learning rate is reduced during training to prevent excessive forgetting of old knowledge.
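The following sketch illustrates one possible fine-tuning step of this kind, reusing the SLCPLLoss module from the earlier sketch; the learning rate, epoch count, and prototype initialization for the new class are assumptions.

```python
# Sketch of the incremental learning step: expand the SLCPL prototype set with
# a newly labeled class, then fine-tune with a reduced learning rate to limit
# forgetting. The learning-rate value and epoch count are assumptions.
import torch

def add_class(slcpl_loss, latent_dim=3):
    """Append one new prototype (new attack class) to the SLCPL loss module."""
    new_proto = torch.randn(1, latent_dim)
    slcpl_loss.prototypes = torch.nn.Parameter(
        torch.cat([slcpl_loss.prototypes.data, new_proto], dim=0))

def fine_tune(model, slcpl_loss, new_loader, epochs=5, lr=1e-4):
    """Retrain on batches of newly labeled traffic with a small learning rate."""
    opt = torch.optim.Adam(
        list(model.parameters()) + list(slcpl_loss.parameters()), lr=lr)
    for _ in range(epochs):
        for x, target in new_loader:
            y, x_rec = model(x)
            loss = ((x - x_rec) ** 2).sum() + slcpl_loss(y, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```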
5. Conclusions
According to existing research, the majority of training and testing studies analyze only known categories. An intrusion detection system trained solely on such closed-set datasets therefore has weaknesses, and attacks whose features resemble benign traffic are one of its crucial limitations. This study presents a hybrid network architecture that combines the characteristics of unsupervised and supervised networks: the reconstruction and classification errors are used jointly for training, in conjunction with an OOD solution, to detect unknown attacks. The experimental results demonstrate that the proposed architecture provides a closed-set training model together with a technique for rejecting output, that is, recognizing it as unknown, and relies on communications engineers for data labeling and on incremental training for evolution. The architecture proposed in this study shows promise against unknown, emerging attacks.
For the new attack methods reported by Cloudflare, such as CLDAP or layer 7 (L7) DDoS attacks, no dataset with relevant attack samples is currently available for retraining against these attack types. The L7 attack is the most challenging because its traffic may appear to originate from a legitimate source. Our future research direction will be to add expansion modules to the proposed framework to address this issue. We hope that, after further verification of its performance, this research architecture can be deployed in internal network environments as the gatekeeper of enterprise network security.