1. Introduction
With the proliferation of diverse and increasingly sophisticated cyber threats, network intrusion detection has emerged as a crucial tool for identifying malicious network traffic and maintaining the security of digital ecosystems [1]. As hardware capabilities continue to advance, many network intrusion detection systems have integrated machine learning and deep learning models to enhance their detection capabilities [2]. Compared with traditional rule-based intrusion detection systems, deep learning-based systems rely more heavily on ample training samples to generalize well. However, novel network attack methods keep emerging, and these new attacks are often characterized by a scarcity of samples. Traditional supervised intrusion detection systems depend on a large number of samples, so they struggle to promptly detect and defend against these new types of attacks.
To address the challenges outlined above, recent advancements in network intrusion detection have increasingly focused on the integration of meta-learning mechanisms. In this context, few-shot learning is particularly noteworthy, as it enables systems to learn from a minimal number of samples, thereby enhancing their adaptability to new and previously unseen threats. This capability is crucial for countering novel cyber threats, where traditional systems often struggle due to insufficient data.
Meta-learning frameworks are designed to mitigate the few-shot problem, which arises for emerging threats such as zero-day vulnerabilities [3]. Prior research [4,5,6] has demonstrated that incorporating meta-learning frameworks into network intrusion detection systems significantly improves their ability to handle few-shot scenarios. These frameworks extract knowledge from related tasks and transfer it to new ones. This enables models to achieve strong recognition performance even with limited sample sizes. Our intrusion detection method builds on a metric-based meta-learning approach, which naturally extends to multi-class scenarios. We hypothesize that establishing a prototype for each category and classifying new samples by distance will enable the system to handle multi-class problems proficiently.
Adequate information is crucial for model construction and is a key element in ensuring the accuracy of intrusion detection systems. However, meta-learning algorithms alone cannot fully compensate for the information loss caused by the few-shot problem. Therefore, we introduce an Adaptive Feature Fusion (AFF) mechanism that enriches, at the representation level, the information the model acquires from each sample. Network traffic in intrusion detection is typically represented in one of two forms:
- a binary data stream of packets, which closely resembles the original data and retains fine details;
- a set of statistical features extracted from the traffic, which turns packets into a series of intuitive summary measures.
AFF dynamically combines these two representations, adaptively integrating the strengths of both. For traffic that differs markedly at the level of macro statistics, AFF prioritizes the statistical feature representation; in other scenarios, it focuses on the binary data stream representation. With a limited number of samples, AFF extracts more valuable information from each sample, which partially mitigates the problem of insufficient information. We hypothesize that our method will significantly improve classification accuracy in both binary and multi-class tasks compared with methods that do not incorporate AFF. The reason is that AFF extracts more information from a fixed number of samples and thus better differentiates various network traffic patterns.
The primary contributions of this paper are as follows:
We develop an intrusion detection method using a metric-based meta-learning approach, which not only distinguishes between normal and malicious network traffic but also excels in handling multi-class tasks in few-shot scenarios.
We introduce an Adaptive Feature Fusion mechanism, which effectively combines two types of traffic data representations, thereby enriching the information content of each sample and enhancing the efficacy of few-shot learning in network intrusion detection.
We conduct extensive experiments on the network intrusion detection method incorporating AFF. The results demonstrate its excellent performance in both binary and multi-class scenarios. Furthermore, the feasibility experiments validate its practical applicability.
2. Related Work
2.1. Intrusion Detection Based on Deep Learning
As computational resources for neural networks continue to expand, a prevalent approach is to design neural network architectures that extract features for classifying different categories of traffic [7]. A highly scalable hybrid deep neural network (DNN) was introduced in [8] and benchmarked across multiple datasets. The experiments demonstrated that deep learning methods can capture deeper feature information, which gives them an advantage over earlier approaches to intrusion detection.
HAST-IDS [9] uses a CNN to extract spatial features and an RNN to extract temporal features, achieving a detection rate of 96.96% on the ISCX-2012 dataset. However, its reliance on specific architectures limits its adaptability to varied traffic conditions. Similarly, DST-IDS [10] combines packet-based and flow-based techniques, transforming raw packets into 28 × 28 × 1 images, and reaches an accuracy of 96.27% on the CICIDS2017 dataset. Although both systems are effective, neither integrates a statistical feature representation, which potentially constrains their overall performance.
Prior intrusion detection systems frequently focus exclusively on either the statistical feature representation or the binary data stream representation. In contrast, FS-IDS [11] improves classification accuracy on the binary data stream representation by integrating statistical feature information. However, it does not adapt this combination to specific task requirements. This limits how fully it can exploit the available information compared with models that combine both feature types adaptively.
2.2. Few-Shot Learning
Traditional deep learning methods often rely on a large amount of labeled data, which can be resource-intensive or even impossible to collect. With limited labeled samples, deep learning models can struggle to generalize because they overfit. In contrast, humans draw on prior knowledge and relevant background to rapidly transfer what they have learned to new task scenarios. They can therefore complete tasks effectively even from a few examples [12].
In recent years, meta-learning has gained popularity and has been widely applied to few-shot learning. Generally, meta-learning can be seen as a process of extracting knowledge from multiple similar learning tasks and improving subsequent performance through experience [13], often referred to as “learning to learn”. Existing meta-learning methods for few-shot learning typically train on a large number of similar tasks so that the resulting classifier can adapt to a target task with only a few samples [14]. Model-Agnostic Meta-Learning (MAML) [15] learns a parameter initialization from which a few gradient-descent steps on a small amount of data are enough to adapt the model to a new task, thereby achieving rapid convergence. Matching networks [16] draw inspiration from metric learning and enhance network capabilities using external memory. Prototypical networks [17] are also metric-based meta-learning methods. They assign each sample to the category whose prototype is nearest, where a prototype is the mean representation of that category's samples in a learned embedding space.
2.3. Application of Few-Shot Learning in the Field of Intrusion Detection
Confronted with emerging novel network attacks, traditional intrusion detection that relies on a plethora of labeled samples struggles to detect threats effectively. Consequently, some studies integrate few-shot learning into intrusion detection systems, using meta-learning frameworks to achieve effective detection even with extremely limited sample sizes. In [18], FC-Net is introduced as the first intrusion detection system to incorporate a meta-learning framework. FC-Net uses a CNN as its backbone and transforms data packets into 16 × 16 RGB images as input, achieving an accuracy of 94.64% on the CICIDS2017 dataset. Building on the MAML meta-learning framework, ref. [19] establishes a few-shot intrusion detection task. With a learn-to-forget mechanism, the model converges rapidly and ultimately attains an accuracy of 94.66% in the binary classification task on the CICIDS2017 dataset.
While these studies validate the feasibility of meta-learning models in addressing few-shot challenges in intrusion detection, they primarily focus on binary classification tasks. There remains a critical need for solutions tailored specifically for multi-class classification tasks. Thus, this paper builds on the enhanced performance in binary scenarios to conduct comprehensive experiments targeting multi-class intrusion detection, thereby distinguishing its contributions in addressing the identified gaps.
3. Problem Formulation
In few-shot scenarios, the samples within the task set constitute the meta-training set, while the few samples corresponding to the target task constitute the meta-test set. In few-shot learning, a model is typically trained to differentiate among instances from N different categories given K instances of each category, which is known as a K-shot N-way task. As shown in Figure 1, each training episode (task) proceeds as follows. First, a subset of N categories is randomly selected from the training set. Next, K instances of each selected category are randomly sampled to form the support set. Additional instances of the same categories, disjoint from the support set, are then sampled to construct the query set. Given the support set, the model is trained iteratively to minimize the prediction loss on the query set.
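The episode construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The dictionary layout `data_by_class`, the function name `sample_episode`, and the toy pool are all hypothetical. The sketch shows that query samples come from the same N categories as the support set but never reuse a support sample.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot episode.

    data_by_class maps each class label to a list of samples (hypothetical
    layout). Support and query samples come from the same N classes and
    never overlap within a class.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw K + N_Q distinct samples, then split them into support and query.
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return classes, support, query


# Toy pool: five classes with ten string "samples" each.
pool = {c: [f"{c}{i}" for i in range(10)] for c in "ABCFG"}
classes, support, query = sample_episode(pool, n_way=3, k_shot=5, n_query=2,
                                         rng=random.Random(0))
print(len(support), len(query))  # 15 6
```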
In network intrusion detection, we define a task as a binary classification task that distinguishes normal samples (G) from a specific type of malicious sample (A, B, C, or F). G represents normal traffic, while A, B, C, and F represent different malicious traffic types. Types A, B, and C have ample labeled samples in the dataset, whereas F is a new type with only K samples available. The primary goal is to perform task $T_F$ by learning from K samples each of type F and normal type G, forming a classifier f. This classifier then identifies type F malicious samples among unknown samples. The support set for this task is $S_F = \{(\mathbf{x}_i, y_i)\}_{i=1}^{2K}$, where $y_i \in \{F, G\}$. Because K is small, these samples alone are insufficient for training f.
To address this, we leverage tasks $T_A$, $T_B$, and $T_C$, which are structurally similar to $T_F$ but use malicious samples of types A, B, and C, respectively. For example, task $T_A$ uses K samples each of types A and G, forming $S_A$. These tasks simulate the support and test sets for $T_F$. In meta-learning terms, $\mathcal{D}_{\text{meta-train}}$ (comprising tasks such as $T_A$, $T_B$, and $T_C$) is the meta-training set, and $\mathcal{D}_{\text{meta-test}}$ (comprising tasks such as $T_F$) is the meta-testing set.
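The following is a minimal sketch of how the meta-training and meta-testing tasks in this formulation could be assembled. It assumes a hypothetical `pool` dictionary that maps each traffic type to its samples and a hypothetical helper `binary_task`. The sample strings stand in for real flows.

```python
import random


def binary_task(pool, attack, k, rng):
    """Support set of one task: K samples of `attack` plus K normal (G) samples."""
    return ([(x, attack) for x in rng.sample(pool[attack], k)]
            + [(x, "G") for x in rng.sample(pool["G"], k)])


K = 5
rng = random.Random(1)
# Hypothetical pool: A, B, C and G are well populated; F has only K samples.
pool = {t: [f"{t}{i}" for i in range(50)] for t in "ABCG"}
pool["F"] = [f"F{i}" for i in range(K)]

# Meta-training: one auxiliary task per well-populated attack type (A, B, C vs G).
# A real run would resample these tasks every episode; one draw is shown.
meta_train = {a: binary_task(pool, a, K, rng) for a in "ABC"}
# Meta-testing: the target task, F vs G, using all K available F samples.
meta_test = {"F": binary_task(pool, "F", K, rng)}
print(sorted(meta_train), len(meta_test["F"]))  # ['A', 'B', 'C'] 10
```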
4. Methodology
4.1. Metric-Based Meta-Learning Model
Metric-based meta-learning uses the distances between categories as the signal that drives learning. In this paper, we use a prototypical network as the foundational approach and illustrate the implementation of AFF.

In a single training episode of a K-shot N-way task, the support set is denoted as $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_{NK}, y_{NK})\}$. Each sample has a feature vector $\mathbf{x}_i \in \mathbb{R}^D$, where D is the dimensionality of the input feature vectors that encapsulate the characteristics of each sample. The corresponding labels $y_i \in \{1, \ldots, N\}$ indicate the categories of these samples.

Using the support set, the prototypical network computes a representation $\mathbf{c}_k \in \mathbb{R}^M$ for each category k, often referred to as the prototype. Here M is the dimensionality of the prototype representations, which summarize the information from the corresponding feature vectors and facilitate classification. Each category's prototype is the mean feature embedding of the support samples belonging to that category. The embeddings are generated by the function $f_\phi: \mathbb{R}^D \rightarrow \mathbb{R}^M$, where $\phi$ represents learnable parameters:

$$\mathbf{c}_k = \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\phi(\mathbf{x}_i) \qquad (1)$$

where $S_k$ denotes the support samples of category k.

Following this, the prototypical network classifies each sample in the query set based on its distances to the class prototypes. We employ a distance function d. The class distribution for a query sample $\mathbf{x}$ is given by a softmax over the negative distances in the embedding space:

$$p_\phi(y = k \mid \mathbf{x}) = \frac{\exp\left(-d(f_\phi(\mathbf{x}), \mathbf{c}_k)\right)}{\sum_{k'} \exp\left(-d(f_\phi(\mathbf{x}), \mathbf{c}_{k'})\right)} \qquad (2)$$

In practice, the distance function d is the Euclidean distance, chosen for its computational efficiency and its effectiveness in previous work [20,21].
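The prototype and softmax computations of the prototypical network can be illustrated with plain Python. This sketch assumes the embeddings have already been produced by the embedding function; here they are toy 2-D lists. It uses the Euclidean distance named in the text, and the helper names are hypothetical.

```python
import math


def prototypes(support):
    """Mean embedding per class; support is a list of (embedding, label) pairs."""
    grouped = {}
    for emb, label in support:
        grouped.setdefault(label, []).append(emb)
    return {k: [sum(col) / len(v) for col in zip(*v)] for k, v in grouped.items()}


def class_probabilities(query_emb, protos):
    """Softmax over negative Euclidean distances to each prototype."""
    logits = {k: -math.dist(query_emb, c) for k, c in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}


support = [([0.0, 0.0], "G"), ([0.2, 0.0], "G"), ([1.0, 1.0], "F"), ([1.2, 1.0], "F")]
protos = prototypes(support)
probs = class_probabilities([1.1, 0.9], protos)
print(max(probs, key=probs.get))  # F
```

`math.dist` computes the plain Euclidean distance, matching the text. The original prototypical networks formulation uses the squared Euclidean distance; switching to it would change only the `logits` line.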
4.2. Network Traffic Representation
Binary data stream representation. From a transmission perspective, data packets are converted from bitstreams into electrical signals for network transmission. They are then parsed by the protocols at the layers above the physical layer of the Open Systems Interconnection (OSI) model. In this context, a network traffic flow can be defined as the sequence of packets exchanged between two endpoints in a network connection. A flow typically begins with TCP connection establishment (SYN) and ends with connection termination (FIN). As a result, the data packets within network traffic can be seen as a combination of payloads and packet headers from various protocol layers, such as IP and TCP. Each higher-layer header contains semantic information, including fields such as version, length, and identifier. However, because packets use different protocols and carry varying content, achieving a uniform representation with a constant packet size is challenging.
To address this issue, nPrint [22] represents packets in their raw binary form, so that semantic information can be accurately extracted through predefined formats. nPrint reserves a fixed number of bytes for each protocol header and aligns the packets with padding, which keeps the format consistent. Figure 2 illustrates the partitioning and meaning of the data.
Recent work [23,24] transforms each byte of traffic data into a corresponding pixel of a grayscale image and demonstrates the effectiveness of this approach. However, packet sizes and formats are nonuniform. As a result, naively stacking raw binary data does not exploit the feature extraction capabilities of CNN architectures, which aggregate information from each pixel and its neighbors.
To address this issue, our research aligns the binary data stream and maps bytes from data packets to pixels in grayscale images, creating image-based representations. To balance information retention against computational complexity, we select the first five packets of each network traffic flow and the first 256 bytes of each packet. We then arrange the 256 bytes of each packet into a 16 × 16 grid to form a grayscale image and stack the images in chronological order. This arrangement lets the convolutional neural network (CNN) extract features effectively while preserving the binary data stream. Each grayscale image in the sequence can be regarded as a snapshot of one packet, showing the layers of information contained in its binary stream. The specific process for handling the binary data stream representation is illustrated in Figure 3.
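The byte-to-image step can be sketched as follows. This illustration is not the paper's preprocessing code and rests on several assumptions. It truncates or zero-pads each packet to 256 bytes, fills missing packets in short flows with all-zero images, and omits the header alignment the text mentions. `flow_to_images` is a hypothetical name.

```python
N_PACKETS, N_BYTES, SIDE = 5, 256, 16


def flow_to_images(packets):
    """Turn a flow (list of raw packet bytes) into N_PACKETS 16x16 byte grids.

    Assumption: packets are truncated or zero-padded to 256 bytes, and packets
    missing from short flows become all-zero images. Pixel values are 0-255.
    """
    images = []
    for i in range(N_PACKETS):
        raw = packets[i] if i < len(packets) else b""
        padded = raw[:N_BYTES].ljust(N_BYTES, b"\x00")
        # Row r of the image holds bytes r*16 .. r*16+15 of the packet.
        images.append([list(padded[r * SIDE:(r + 1) * SIDE]) for r in range(SIDE)])
    return images


flow = [bytes(range(256)) * 2, b"\x45\x00"]  # one long packet, one short packet
imgs = flow_to_images(flow)
print(len(imgs), len(imgs[0]), len(imgs[0][0]))  # 5 16 16
print(imgs[0][0][:4], imgs[1][0][:4])  # [0, 1, 2, 3] [69, 0, 0, 0]
```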
Statistical features. In this study, we use the CICFlowMeter-4.0 tool to extract statistical features from network traffic, such as flow size, flow duration, and packet count. After omitting sensitive information such as IP addresses and port numbers, we retain 76 of the initially extracted features as the final set of statistical features. Because the value ranges of different features vary significantly, we first clean the data. We then apply min-max normalization to rescale the features and speed up model convergence. For each feature dimension, the normalization formula is as follows:
$$x' = 2\,\frac{x - \min(x)}{\max(x) - \min(x)} - 1 \qquad (3)$$

where max(x) and min(x) are the maximum and minimum values of the specific feature across all samples. This transformation maps the values of each feature to the range [−1, 1]. Min-max normalization mitigates the influence of differing value scales across features and yields the final set of statistical features.
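The following small sketch applies the min-max normalization to [−1, 1] described above, one column at a time. Mapping constant features to 0 is an assumption the text does not cover. `minmax_scale` is a hypothetical helper.

```python
def minmax_scale(columns):
    """Scale each feature column to [-1, 1] with x' = 2(x - min)/(max - min) - 1."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # Constant features carry no information; map them to 0 (assumption).
        scaled.append([2 * (x - lo) / span - 1 if span else 0.0 for x in col])
    return scaled


print(minmax_scale([[0, 5, 10], [3, 3, 3]]))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

In a deployed detector, the minimum and maximum would usually be computed on the training data and reused for new samples. Those new samples can then fall slightly outside [−1, 1].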
4.3. Adaptive Feature Fusion Mechanism
Statistical features and binary data stream features provide significantly different types of information for the same network traffic. Statistical features mainly take a macroscopic approach, summarizing important characteristics of the given network traffic, providing a relatively comprehensive overview. On the other hand, binary flow data focus more on the detailed information contained within each packet, including the semantic details of various protocol headers at different layers and payload contents.
To harness the information from both representations and further improve few-shot learning (FSL) performance, we introduce the Adaptive Feature Fusion (AFF) mechanism. AFF fuses the two heterogeneous feature types and dynamically adjusts its emphasis on each according to the target task, the training samples, and the specific network traffic scenario. For instance, during a DDoS attack, traffic patterns may shift rapidly and unpredictably. In that case, AFF can prioritize binary data stream features to capture the nuanced behavior of malicious packets while still retaining the broader statistical context. Because the fusion coefficient is computed for each sample rather than fixed, the system can emphasize the features most relevant to a given threat. For example, if the statistical features show a spike in packet size or frequency, AFF may increase the weight of the binary features. This focuses the model on the packet contents that reveal the attack's nature and makes the system more responsive to evolving attack strategies. With AFF integrated, the feature vector of each sample in the prototypical network becomes a convex combination of the feature vectors generated from the two representation types. For each sample i, its feature vector $\mathbf{e}_i$ is computed as follows:

$$\mathbf{e}_i = \lambda_i \mathbf{r}_i + (1 - \lambda_i)\,\mathbf{t}_i \qquad (4)$$

Here $\mathbf{t}_i$ is the feature vector extracted from the statistical features $\mathbf{x}_i^{s}$, and $\mathbf{r}_i$ is the feature vector extracted from the binary flow data representation $\mathbf{x}_i^{b}$. The two types of feature vectors must lie in the same dimensional space so that AFF can produce the fused feature vector. In this equation, $\lambda_i$ is the adaptive fusion coefficient, and its magnitude depends on the statistical feature vector $\mathbf{t}_i$ of each sample. It is calculated as follows:

$$\lambda_i = a(\mathbf{t}_i) \qquad (5)$$

where a denotes the adaptive fusion network. The generated values $\lambda_i$ lie in the interval (0, 1) and represent the weight of the binary data stream feature vector $\mathbf{r}_i$ within the final feature vector $\mathbf{e}_i$. The value of $\lambda_i$ can vary with the specific scenario, which changes how much $\mathbf{e}_i$ relies on each of the two types of feature vectors.
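The convex combination and the sigmoid-bounded coefficient can be sketched as below. For brevity, the adaptive fusion network is replaced by a hypothetical single linear layer followed by a sigmoid (`fusion_coefficient`); the paper uses an MLP. Here `r` stands for the binary data stream feature vector, `t` for the statistical feature vector, and `lam` for the coefficient that weights `r`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fusion_coefficient(t, weights, bias):
    # Hypothetical single linear layer + sigmoid standing in for the
    # adaptive fusion MLP a(t); the network described in the text is deeper.
    return sigmoid(sum(w * x for w, x in zip(weights, t)) + bias)


def fuse(r, t, weights, bias):
    """Return (lam, e) with e = lam * r + (1 - lam) * t and lam in (0, 1)."""
    lam = fusion_coefficient(t, weights, bias)
    e = [lam * rb + (1.0 - lam) * ts for rb, ts in zip(r, t)]
    return lam, e


lam, e = fuse(r=[1.0, 1.0], t=[0.0, 0.0], weights=[0.0, 0.0], bias=0.0)
print(lam, e)  # 0.5 [0.5, 0.5]
```

`lam` always lies strictly between 0 and 1. As a result, every component of the fused vector lies between the corresponding components of `r` and `t`.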
4.4. Overall Architecture
The overall architecture of AFF is depicted in Figure 4. AFF consists of two components: a feature extraction network and an adaptive fusion network. The feature extraction network comprises two distinct architectures. One extracts features from the sample's binary data stream representation and the other from its statistical features, and both produce feature vectors of the same dimension. The adaptive fusion network takes the statistical feature vector as input and outputs an adaptive fusion coefficient in the range (0, 1). This coefficient is used to generate the sample's final feature vector according to Equation (4). The two components sit between the input of the intrusion detection model and the prototypical network. They operate in sequence and together improve the overall model's performance.
Binary data stream feature extraction network. After data preprocessing, the binary flow data representation consists of five grayscale images, each 16 × 16 pixels. To speed up convergence, improve accuracy, and improve generalization, we extract the binary data stream feature vector with a Residual Network (ResNet) rather than a plain convolutional network. The ResNet produces one feature vector for each packet in the flow, that is, for each grayscale image. The input representations are aligned, so the per-packet feature vectors output by the ResNet also remain aligned. The dimension can therefore be reduced by averaging these vectors along the packet dimension, which yields the feature vector for the binary flow data. The ResNet is composed of four residual blocks, and each convolutional layer within these blocks has a kernel size of 5 × 5 and a stride of 1. Rectified Linear Unit (ReLU) serves as the activation function. Max-pooling layers with a kernel size of 2 and a stride of 2 appear only in the first three residual blocks.
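A short sketch can check the packet-averaging step and the effect of the three max-pooling layers on a 16 × 16 input. The "same" padding for the 5 × 5 convolutions is an assumption, since the text gives only kernel size and stride. Both helper names are hypothetical.

```python
def average_packet_vectors(per_packet):
    """Average P aligned per-packet vectors (a P x dim list) into one flow vector."""
    n = len(per_packet)
    return [sum(column) / n for column in zip(*per_packet)]


def pooled_side(side=16, n_pools=3):
    """Spatial side length after n_pools 2x2, stride-2 max-pooling layers.

    Assumes 'same' padding so the 5x5, stride-1 convolutions keep the size.
    """
    for _ in range(n_pools):
        side //= 2
    return side


per_packet = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 packets, 2-dim toy vectors
print(average_packet_vectors(per_packet))  # [3.0, 4.0]
print(pooled_side())  # 2
```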
Statistical feature extraction network. The input statistical features are represented as a 76-dimensional vector. The feature extraction network for statistical features is a nonlinear multilayer perceptron (MLP) composed of fully connected layers, activation functions, dropout layers, and additional fully connected layers, as shown in Figure 4. The activation function is ReLU. The statistical features are finally transformed into a 512-dimensional feature vector, which matches the dimensionality of the binary data stream feature vector.
Adaptive fusion network. The adaptive fusion network also employs an MLP with a network structure similar to that of the statistical feature’s feature extraction network. However, a sigmoid function is added before the output, allowing the input 512-dimensional statistical feature vector to generate an adaptive fusion coefficient within the range (0,1). Following the convex combination of the two types of feature vectors, each sample’s feature vector is used as input for the prototypical network.
4.5. Training Strategy
To illustrate, Algorithm 1 provides insight into how the prototypical network operates with the introduction of the Adaptive Feature Fusion Mechanism.
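Algorithm 1: Training an episode for the prototypical network with AFF. M is the total number of classes in the training set. N is the number of classes in each episode. K is the number of supports for each class. N_Q is the number of queries for each class. R and T are the feature vector sets for binary data stream and statistical features.
Data: Training set D = {(x_1, y_1), ..., (x_P, y_P)}, where each y_i ∈ {1, ..., M}. D_k denotes the subset of D containing all elements (x_i, y_i) such that y_i = k.
Result: Episode loss J for the sampled episode.
V ← RandomSample({1, ..., M}, N)
for k in V do
    S_k ← RandomSample(D_k, K)
    Q_k ← RandomSample(D_k \ S_k, N_Q)
    for (x_i, y_i) in S_k do
        r_i ← ResNet(x_i^b) ∈ R; t_i ← MLP(x_i^s) ∈ T
        e_i ← a(t_i) r_i + (1 − a(t_i)) t_i
    end
    c_k ← (1/K) Σ_{(x_i, y_i) ∈ S_k} e_i
    for (x_i, y_i) in Q_k do
        r_i ← ResNet(x_i^b) ∈ R; t_i ← MLP(x_i^s) ∈ T
        e_i ← a(t_i) r_i + (1 − a(t_i)) t_i
    end
end
J ← 0
for k in V do
    for (x_i, y_i) in Q_k do
        J ← J + (1 / (N · N_Q)) [ d(e_i, c_k) + log Σ_{k'} exp(−d(e_i, c_{k'})) ]
    end
end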
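The loss computed by Algorithm 1 can be sketched in plain Python, assuming the fused AFF vectors are already available for the support and query samples. `episode_loss` is a hypothetical helper. It averages the negative log-probability of the true class over all queries. When every class contributes the same number of queries, this equals the algorithm's weighted sum.

```python
import math


def episode_loss(support, query):
    """Mean negative log-probability of the true class over the query set.

    support and query hold (fused_embedding, label) pairs, i.e. vectors that
    have already passed through AFF.
    """
    grouped = {}
    for emb, label in support:
        grouped.setdefault(label, []).append(emb)
    protos = {k: [sum(col) / len(v) for col in zip(*v)] for k, v in grouped.items()}
    total = 0.0
    for emb, label in query:
        logits = {k: -math.dist(emb, c) for k, c in protos.items()}
        top = max(logits.values())
        # log-sum-exp with max subtraction for numerical stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        total += log_norm - logits[label]  # equals -log p(label | emb)
    return total / len(query)


support = [([0.0, 0.0], "G"), ([1.0, 1.0], "F")]
good = episode_loss(support, [([0.1, 0.0], "G")])
bad = episode_loss(support, [([0.1, 0.0], "F")])
print(good < bad)  # True
```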
5. Experiments and Analysis
5.1. Datasets
No datasets have been generated specifically for few-shot learning scenarios in intrusion detection. A common practice is therefore to take widely used datasets from the network security domain and construct few-shot tasks by redefining samples. To evaluate the proposed model, we select the CICIDS2017 [25] and ISCX2012 [26] datasets as benchmarks, for the following reasons:
To obtain both data representations, the chosen datasets must provide unprocessed network traffic files rather than pre-extracted feature vectors.
The selected datasets have corresponding labels for each sample, which simplifies the process of model learning and metric calculation.
The selected datasets encompass a variety of protocols and types of attacks, making it possible to simulate real-world network attack scenarios effectively.
These datasets are widely recognized for their comprehensive coverage of attack types and protocols, providing a solid foundation for training and evaluating intrusion detection systems. After filtering and reconstruction, we reorganize the data into two new datasets without altering or subsampling the individual samples. One contains normal traffic and five attack types from CICIDS2017; the other contains normal traffic and four attack types from ISCX2012. Detailed information about the two reconstructed datasets is provided in Table 1.
For the convenience of presenting and analyzing the results of subsequent experiments, we assign specific codes to each type of traffic, with the main code representing the traffic category and the subscript indicating its original dataset. The subscript “i” indicates data from ISCX2012, while the subscript “c” indicates data from CICIDS2017.
5.2. Metrics
Intrusion detection is fundamentally a classification problem, which makes accuracy the most commonly used performance metric. Accuracy measures the proportion of correctly classified samples among all test samples, reflecting the model's predictive capability. Recall is also critical in this domain. It is the ratio of true positives to all actual positive samples, TP/(TP + FN), and therefore measures how many of the malicious samples the model detects. In multi-class experiments, we use micro-precision and micro-recall as key evaluation metrics. Micro-averaging pools the true positives, false positives, and false negatives of all categories before computing precision and recall.
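The metrics above can be computed as in the following sketch, which uses hypothetical helper names and toy labels. Note one property of micro-averaging: when every sample receives exactly one predicted label from the same label set, micro-averaged precision and micro-averaged recall both equal accuracy, as the toy example shows.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_recall(y_true, y_pred, positive):
    """TP / (TP + FN): share of actual positives that the model detects."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def micro_precision_recall(y_true, y_pred, labels):
    """Pool TP, FP and FN over all classes before dividing (micro-averaging)."""
    tp = fp = fn = 0
    for k in labels:
        tp += sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp += sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn += sum(t == k and p != k for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)


y_true = ["G", "G", "A", "F"]
y_pred = ["G", "A", "A", "G"]
print(accuracy(y_true, y_pred),
      micro_precision_recall(y_true, y_pred, ["G", "A", "F"]),
      binary_recall(y_true, y_pred, "A"))  # 0.5 (0.5, 0.5) 1.0
```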
5.3. Experimental Settings
In order to comprehensively and deeply investigate the proposed approach, we design three categories of experiments with different objectives.
The first category of experiments explores two questions: whether the prototype network incorporating the AFF mechanism can effectively address intrusion detection in a few-shot scenario, and whether it outperforms existing solutions. To this end, these experiments are conducted on the restructured CICIDS2017 dataset in two settings: binary classification tasks and multi-class classification tasks. In the binary classification tasks, four attack categories and 70% of the benign samples form the meta-training set, and the remaining attack category is reserved for the meta-test set. The test set comprises all attack categories, including the few-shot category, along with the remaining benign traffic samples. Because each of the five attack categories can serve as the few-shot category, C(5,1) = 5 parallel experiments can be constructed. To explore the impact of the sample size K on detection performance, this experiment is conducted under the settings K = 1 and K = 5.
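The leave-one-out construction of the five binary experiments can be sketched as follows. The attack labels, the single shared 70/30 benign split, and the function name `binary_splits` are assumptions made for illustration.

```python
import random


def binary_splits(attacks, benign, train_frac=0.7, seed=0):
    """Yield one binary experiment per held-out few-shot attack category."""
    rng = random.Random(seed)
    shuffled = list(benign)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    benign_train, benign_test = shuffled[:cut], shuffled[cut:]
    for held_out in attacks:
        yield {
            "meta_train": ([a for a in attacks if a != held_out], benign_train),
            "few_shot": held_out,
            # The test set covers every attack category plus the remaining benign flows.
            "test": (list(attacks), benign_test),
        }


attacks = ["atk1", "atk2", "atk3", "atk4", "atk5"]  # hypothetical labels
splits = list(binary_splits(attacks, benign=range(100)))
print(len(splits), len(splits[0]["meta_train"][0]), len(splits[0]["meta_train"][1]))  # 5 4 70
```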
In the multi-class intrusion detection tasks, the model should correctly classify traditional attacks with a substantial number of samples while also accurately identifying attack categories with limited samples. Hence, three attack types, along with normal traffic samples, compose the meta-training set, and one further attack category is selected to simulate a few-shot scenario. In total, five multi-class experiments are conducted; the sources of meta-training and meta-testing data for each experiment are shown in Table 2. As with the binary classification tasks, the multi-class tasks are performed under the settings K = 1 and K = 5.
The second category of experiments aims to investigate the effectiveness of the AFF mechanism, examining whether the adaptive fusion mechanism can indeed enhance the performance of the prototype network. We conduct ablation experiments on binary and multi-class classification tasks under the K = 5 setting using the restructured CICIDS2017 dataset to observe the enhancement effect of the AFF mechanism on the prototype network.
The third category of experiments explores the feasibility of the proposed model in practical applications. We conduct cross-dataset experiments on the restructured CICIDS2017 and ISCX2012 datasets. One dataset serves as the source of meta-training and meta-testing, and the other simulates real-world data. This lets us observe the performance of the proposed model when the source of meta-training and meta-testing samples differs. This experiment is conducted under the K = 5 setting and includes binary and multi-class classification tasks.
5.4. Classification Experiments on the Reconstructed CICIDS2017 Dataset
The test results of the first category of experiments are shown in Table 3 and Table 4. The proposed method performs well in both binary and multi-class intrusion detection tasks, consistently maintaining high accuracy and recall during testing. The values of the adaptive fusion coefficient under the various experimental settings are given in Table A1 in Appendix A.
An analysis of these two tables leads to the following conclusions:
The choice of few-shot types has a noticeable impact on the experimental results, a phenomenon that is consistent in both binary and multi-class tasks. For example, selecting SSHc as the few-shot type results in favorable performance in both binary and multi-class tasks, while choosing Infiltrationc leads to a significant decrease in accuracy and recall in both types of experiments.
The choice of K clearly influences the results. When K is increased from 1 to 5, the experiment groups in both types of experiments improve to varying degrees. This is expected: more samples allow the model to extract more useful information, giving it a more reliable basis for classifying query samples.
Subsequently, we compare our proposed method with existing research that uses the same benchmark dataset, CICIDS2017. The results are presented in Table 5 and Table 6.
Table 5 demonstrates that, in binary classification tasks, the prototype network incorporating the AFF mechanism stands out among existing few-shot intrusion detection methods, displaying a clear advantage.
Multi-class intrusion detection in few-shot scenarios is a relatively recent research area, with few existing studies available for comparison. As illustrated in Table 6, apart from the approach proposed in this paper, SPN, and Cs, the other methods rely on a large volume of training samples and do not use few-shot learning techniques.
Table 6 further shows that, even with highly constrained sample sizes, our method consistently delivers strong detection performance. This matters in real network environments, where novel attack scenarios continually emerge.
5.5. AFF Effectiveness Experiments
To evaluate the effectiveness of the AFF mechanism, we conduct a series of ablation experiments under identical data preprocessing conditions. With K set to 5, the results of the prototype network with and without the AFF mechanism on binary and multi-class tasks are presented in Figure 5.
Figure 5 compares two models: one that uses the AFF mechanism to integrate two types of features (left), and one that relies solely on the binary data stream (right). In each heatmap, the vertical axis represents the meta-testing traffic types in the experiment, while the horizontal axis indicates the traffic types involved. The values range from 0.95 to 1.00, with darker colors indicating higher classification accuracy for the corresponding traffic types.
Figure 5 shows that the AFF mechanism clearly improves the performance of the prototype network on both binary and multi-class tasks, as indicated by the darker colors of the left heatmap relative to the right one. Two perspectives explain this improvement in classification accuracy across the different sample types within each task:
From the perspective of why few-shot learning falls short, a key factor is the information loss caused by insufficient samples. The AFF mechanism addresses this by integrating two types of features, supplementing the missing information and improving the overall model.
From a theoretical standpoint, the prototype network without the AFF mechanism is a special case of our proposed model, one that relies exclusively on the binary data stream for traffic classification. The model with AFF can therefore always fall back to the single-feature model, so, provided the mixing coefficient is learned well, its overall performance should be at least as strong as that of the model relying on a single type of feature.
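The special-case argument can be illustrated with a deliberately simple fusion rule. Since the exact AFF formula is not restated in this section, the sketch below assumes a convex combination of two same-shaped feature vectors weighted by a mixing coefficient alpha = sigmoid(w); the function `fuse`, the parameter `w`, and the toy feature values are hypothetical and may differ from the paper's actual mechanism.

```python
import numpy as np


def fuse(f_bin: np.ndarray, f_other: np.ndarray, w: float) -> np.ndarray:
    """Hypothetical adaptive fusion of two same-shaped feature vectors.

    alpha = sigmoid(w) is the mixing coefficient; in a trained model, w would be a
    learnable parameter updated by gradient descent (an assumption, not the paper's code).
    """
    alpha = 1.0 / (1.0 + np.exp(-w))
    return alpha * f_bin + (1.0 - alpha) * f_other


f_bin = np.array([0.2, 0.8, 0.5])    # features from the binary data stream (toy values)
f_other = np.array([0.9, 0.1, 0.4])  # features of the second type (toy values)

print(fuse(f_bin, f_other, 0.0))     # alpha = 0.5: equal-weight average of both features
print(np.allclose(fuse(f_bin, f_other, 50.0), f_bin))  # alpha ≈ 1: binary-stream model → True
```

Under this assumed form, the sigmoid keeps alpha strictly between 0 and 1, so the single-feature model is reached only in the limit of large w; that limit is the sense in which the plain prototype network is a special case of the fused model.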
5.6. Feasibility Experiments
To simulate real-world application of the trained model, we design a series of cross-dataset experiments based on the restructured CICIDS2017 and ISCX2012 datasets. With K = 5, we combine the traffic of four attack types from the CICIDS2017 dataset with normal traffic to form the meta-training set, and use all traffic types in the ISCX2012 dataset as few-shot types for testing. The results for both binary and multi-class tasks are illustrated in Figure 6.
The left image in Figure 6 shows the model's performance in the binary classification scenario, where the test set includes malicious traffic samples of DDoS_i, DoS_i, Infiltration_i, and SSH_i. The right image shows the results of the multi-class experiment under the same settings. Even in the multi-class cross-dataset experiment, the model maintains strong performance, achieving an accuracy of 90.36%. Notably, this experiment is conducted under demanding conditions: only K = 5 samples are available per few-shot type, and the model is tested on a new dataset that differs significantly from the training dataset. The results therefore suggest that the model remains stable and performs well in environments that differ substantially from its training environment.
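The cross-dataset protocol above, with meta-training classes drawn from CICIDS2017 and meta-testing classes from ISCX2012, can be sketched as episodic N-way K-shot sampling from two disjoint class pools. The sketch below is illustrative only: the dictionary layout, the function `sample_episode`, the number of query samples per class, and the placeholder class names (`Attack1_c`, `Normal_i`, and so on) are hypothetical assumptions rather than the paper's implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[int]                       # e.g. a flattened byte vector for one flow
Episode = Tuple[List[Tuple[Sample, int]], List[Tuple[Sample, int]]]


def sample_episode(pool: Dict[str, List[Sample]], n_way: int, k_shot: int,
                   n_query: int, rng: random.Random) -> Episode:
    """Draw one N-way K-shot episode: K support and n_query query samples per class.

    Labels are re-indexed 0..n_way-1 within the episode, as is usual in episodic training.
    """
    classes = rng.sample(sorted(pool), n_way)
    support, query = [], []
    for label, name in enumerate(classes):
        picked = rng.sample(pool[name], k_shot + n_query)  # drawn without replacement
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


# Hypothetical class pools; real pools would hold preprocessed flows from each dataset.
meta_train_pool = {f"{name}_c": [[i] * 4 for i in range(20)]
                   for name in ["Benign", "Attack1", "Attack2", "Attack3", "Attack4"]}
meta_test_pool = {f"{name}_i": [[i] * 4 for i in range(20)]
                  for name in ["Normal", "DDoS", "DoS", "Infiltration", "SSH"]}
assert set(meta_train_pool).isdisjoint(meta_test_pool)  # no class shared across stages

rng = random.Random(0)
train_episode = sample_episode(meta_train_pool, n_way=2, k_shot=5, n_query=10, rng=rng)
support, query = sample_episode(meta_test_pool, n_way=2, k_shot=5, n_query=10, rng=rng)
print(len(support), len(query))  # → 10 20
```

Keeping the two pools disjoint is what makes the test classes genuinely unseen during meta-training; only the K support samples of each test class are available when its prototype is built.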
6. Applicability
In this section, we further clarify the applicability of the proposed approach and the experimental design, highlighting the following two limitations:
Unencrypted traffic samples. The research and experiments in this study assume that the traffic samples are unencrypted. When traffic is encrypted, the payload portion of the binary data stream representation loses much of its value, because the underlying content is no longer accessible for analysis. This poses a significant challenge for intrusion detection systems and may reduce the effectiveness of our proposed method in such scenarios. Moreover, the traffic samples from the CICIDS2017 and ISCX2012 datasets used in our experiments are also unencrypted, so how well these results carry over to encrypted traffic remains uncertain and warrants further investigation.
Features of samples are limited. To ensure real-time performance in intrusion detection systems, we extract features from the first five packets of each network traffic flow and the first 256 bytes of each packet. This choice is informed by the findings of [22], which suggest that header information (such as IP and TCP headers) and the early part of the payload are typically the most informative for sample classification. However, this approach can discard critical information from malicious payloads that appear later in the flow or extend beyond the first 256 bytes, which could significantly reduce classification accuracy for such samples.
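A minimal sketch of this truncation follows. It assumes each flow is already available as an ordered list of raw packet byte strings (for example, parsed from a pcap file) and that short packets and short flows are zero-padded; the zero-padding, the scaling to [0, 1], and the function name `flow_to_matrix` are illustrative assumptions, while the five-packet and 256-byte limits come from the text above.

```python
from typing import List

import numpy as np

N_PACKETS = 5   # first five packets of each flow, as described above
N_BYTES = 256   # first 256 bytes of each packet, as described above


def flow_to_matrix(packets: List[bytes]) -> np.ndarray:
    """Truncate/zero-pad a flow to an (N_PACKETS, N_BYTES) array of bytes scaled to [0, 1].

    packets: raw bytes of each packet in arrival order (header + payload).
    Zero-padding and the /255 scaling are illustrative assumptions.
    """
    out = np.zeros((N_PACKETS, N_BYTES), dtype=np.float32)
    for row, pkt in enumerate(packets[:N_PACKETS]):             # packets after the 5th are dropped
        raw = np.array(list(pkt[:N_BYTES]), dtype=np.float32)  # bytes after the 256th are dropped
        out[row, :raw.size] = raw / 255.0
    return out


# Toy flow: a 100-byte packet, a 300-byte packet, then four 60-byte packets
flow = [bytes(range(100)), bytes([7]) * 300] + [bytes(60)] * 4
m = flow_to_matrix(flow)
print(m.shape)  # → (5, 256)
```

In this toy flow, the last 44 bytes of the second packet and the entire sixth packet never reach the model, which is exactly the kind of information loss this limitation describes.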
7. Conclusions
In this paper, we introduced a metric-based meta-learning framework designed to enhance few-shot intrusion detection for both binary and multi-class classification tasks. By using metric-based meta-learning to build class prototypes from the extracted features and integrating an Adaptive Feature Fusion (AFF) mechanism, our approach effectively reduces information loss, a significant challenge in few-shot learning scenarios.
The results indicate that the proposed method achieves a detection rate of 98.98% with five-shot samples, outperforming existing few-shot learning techniques. Ablation studies further confirm that the AFF mechanism improves accuracy by mitigating information loss. Cross-dataset experiments, with meta-training on CICIDS2017 and meta-testing on ISCX2012, further support the model's robustness, yielding accuracy rates of 92.32% and 90.36% for the binary and multi-class tasks, respectively.
Despite these findings, further evaluation of the model’s performance in real-world scenarios is needed to assess its generalization capabilities. Moreover, addressing the challenge of completely unseen samples presents an important area for future research. Our future work will focus on improving generalization through methods like meta-regularization and domain adaptation, as well as exploring zero-shot learning to enhance the model’s applicability in dynamic environments. These efforts aim to further advance the practical utility of our approach in real-world intrusion detection systems.