1. Introduction
The internet has grown steadily over time and now provides a wide range of services that improve people’s lives. However, these services carry a number of security hazards, including a rise in network infections, eavesdropping, and hostile attacks [1,2,3], all of which complicate detection and raise false alarms. Network security has become the top priority for a growing number of internet users, including banks, enterprises, and governmental agencies.
Cyber-attacks typically begin with reconnaissance to identify vulnerabilities, which are then exploited to initiate damaging actions [4]. Unauthorized access to computer systems threatens their confidentiality, integrity, and availability (CIA), leading to what we term an “intrusion” [5]. In recent years, numerous innovative cyber-attack techniques have emerged, such as brute force attacks, botnets, distributed denial of service (DDoS), and cross-site scripting [6]. These attacks have raised significant concerns about cyber security. Cybercriminals have increasingly used various platforms and infrastructures as vectors for malware and botnets, including Bitcoin Trojans. According to the Internet Security Threat Report (ISTR), malware is discovered every thirteen seconds on average during web searches. Incidents of ransomware, email spam, and other online threats have increased substantially, as reported by CNBC [7]. Cybercrime continues to escalate at an alarming rate, with the global cost of cyber-attacks projected to reach an unprecedented $9.5 trillion annually by 2024 [8]. This sharp increase highlights the growing complexity and sophistication of cyber threats. As organizations encounter higher stakes in protecting sensitive data, the role of cyber security measures such as intrusion detection systems (IDSs) becomes more critical. These systems help identify and mitigate potential risks, offering proactive defense against the ever-evolving landscape of cyber threats. As we approach 2025, cybercrime is expected to remain a top global concern, further driving the need for advanced, real-time threat detection and mitigation strategies.
Real-time intrusion detection is critical for ensuring the security and integrity of network systems. Deep learning models have proven effective in real-time analysis of network traffic, enabling swift identification of intrusions [9]. Various machine learning approaches enhance the responsiveness of IDSs, particularly in adapting to emerging threats [10]. Additionally, integrating real-time capabilities within IDSs significantly improves network security by promptly detecting and mitigating attacks [11].
IDSs are among the most widely used security solutions, designed to identify unauthorized access and safeguard devices and network infrastructure from malicious activities. IDSs can be broadly categorized by detection approach into two main types. The first type, the signature-based IDS, compares network traffic or host activity against a database of known malicious patterns. While this method is effective for identifying known threats, it requires frequent signature updates and may struggle with unknown or zero-day attacks because it relies on pre-existing signatures. In contrast, the anomaly-based IDS identifies potential threats by detecting deviations from normal behavior. Unlike a signature-based IDS, this approach does not depend on known attack patterns, making it particularly effective for identifying zero-day attacks that exploit previously unknown vulnerabilities. To achieve this, anomaly-based IDSs often leverage machine learning and deep learning techniques to analyze large datasets, learn normal behavior patterns, and detect anomalies with high accuracy. This approach enhances the system’s adaptability to new threats and can also reduce false positives and negatives. In our study, we adopted the anomaly-based approach to address these challenges.
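The contrast between the two detection approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two byte-pattern signatures, the packets-per-second feature, and the three-standard-deviation threshold are hypothetical, and the statistical baseline is a simple stand-in for the learned models used in this study.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of already-known attacks.
KNOWN_SIGNATURES = [b"' OR 1=1 --", b"<script>alert("]


def signature_match(payload: bytes) -> bool:
    """Signature-based check: flag a payload that contains a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_baseline(normal_values):
    """Learn a baseline (mean, sample standard deviation) from benign traffic."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, baseline, threshold=3.0):
    """Anomaly-based check: flag values far from the learned baseline."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma


# Toy feature: packets per second observed during normal operation.
baseline = fit_baseline([98, 102, 101, 97, 100, 103, 99, 100])

known = signature_match(b"GET /?q=<script>alert(1)</script>")  # pattern is in the database
novel = signature_match(b"GET /?q=novel-exploit")              # no signature exists yet
burst = is_anomalous(950, baseline)                            # far outside the baseline
usual = is_anomalous(104, baseline)                            # within normal variation
print(known, novel, burst, usual)
```

The unseen payload passes the signature check unnoticed, while the traffic burst is flagged without any prior knowledge of the attack, which is the property that motivates the anomaly-based approach.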
This study introduces two advanced hybrid deep learning models for intrusion detection, a transformer–deep neural network (Transformer–DNN) and an autoencoder–convolutional neural network (Autoencoder–CNN). Both models address class imbalance using techniques such as enhanced hybrid adaptive synthetic sampling–synthetic minority oversampling technique (ADASYN-SMOTE), enhanced SMOTE, and edited nearest neighbors (ENN). On the CICIDS2017 dataset [12], both models achieved high accuracy, with Autoencoder–CNN reaching 99.90% in binary and 99.95% in multi-class classification, and Transformer–DNN achieving 99.92% and 99.96%, respectively. With the NF-BoT-IoT-v2 dataset [13], Autoencoder–CNN attained 99.98% in binary and 97.95% in multi-class classification, while Transformer–DNN achieved 99.98% and 97.90%, respectively, demonstrating strong performance across both datasets. The following is an overview of the principal contributions:
An effective intrusion detection system was developed using two enhanced hybrid deep learning models, Transformer–DNN and Autoencoder–CNN. The transformer extracts contextual features for pattern analysis, while the DNN performs final classification. The Autoencoder reshapes data, preparing it for precise classification by the CNN.
Enhanced hybrid ADASYN-SMOTE resampling is leveraged for binary classification, while enhanced SMOTE resampling is applied for multi-class classification. These techniques are combined with ENN to effectively address class imbalance and enhance model performance.
Integrating the enhanced local outlier factor (LOF) strengthens anomaly detection by detecting and removing outliers, significantly boosting the model’s ability to identify minority class attacks and improving overall detection performance.
Evaluation on the CICIDS2017 and NF-BoT-IoT-v2 datasets demonstrated the superior performance of the proposed models compared with state-of-the-art approaches.
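The outlier-removal step named in the contributions can be sketched with the textbook local outlier factor. This is a minimal illustration and not the enhanced LOF used in this study, whose modifications are not described in this section; the neighbourhood size `k`, the cut-off of 1.5, and the toy points are assumptions.

```python
import math


def local_outlier_factor(points, k=3):
    """Return the LOF score of every point; scores well above 1 suggest outliers.

    Assumes the points are distinct (duplicates would give zero distances).
    """
    n = len(points)
    # k nearest neighbours of each point as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    k_distance = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data: a tight cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = local_outlier_factor(points, k=3)
kept = [p for p, s in zip(points, scores) if s <= 1.5]  # hypothetical cut-off
print(kept)
```

Points inside the cluster score close to 1, while the isolated point scores far higher and is dropped before training.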
This paper is structured as follows. Section 2 provides a comprehensive review of the related literature. Section 3 describes the methodology employed in this study. Section 4 presents the results obtained from the experiments, while Section 5 offers a detailed discussion of the findings. Section 6 addresses the limitations of the proposed approach. Section 7 concludes the study by summarizing its key contributions and insights. Finally, Section 8 outlines potential directions for future research.
2. Related Work
IDSs now provide essential protection for national, economic, and personal security, in the context of the exponential growth of data collection and the increasing interconnection of the global internet infrastructure. In an effort to reduce computer system vulnerabilities and improve surveillance capabilities, James P. Anderson introduced the notion of intrusion detection in 1980 [14]. These systems have been widely implemented over time, and security experts continue to work on improving their effectiveness and precision. This section reviews the machine learning and deep learning (DL) techniques for intrusion detection described in the literature. Because DL performs well in areas such as image recognition and natural language processing, it has become an obvious choice for traffic anomaly detection in IDSs. Deep learning approaches for categorizing attack types in IDSs have been described widely in academic publications.
2.1. Binary Classification
Using the Transformer–DNN and Autoencoder–CNN technique for IDS binary classification combines the strengths of advanced contextual feature extraction and spatial pattern recognition. The transformer component excels at capturing contextual relationships in network traffic data, enhancing the ability to discern intricate data dependencies essential for effective classification. The DNN leverages these extracted features to perform accurate final classifications, ensuring robust detection of attack types. Meanwhile, the Autoencoder compresses and reshapes network traffic data, addressing class imbalance and enhancing the representation of data characteristics. These optimized data are then passed to the CNN, which specializes in identifying intricate spatial patterns crucial for distinguishing malicious from legitimate activity. By integrating these methods, this hybrid approach significantly enhances the effectiveness of cyber security defenses against dynamic and evolving threats. The result is an IDS with improved accuracy, reduced false positives and negatives, and superior real-time threat detection capabilities.
In ref. [15], the authors proposed a DNN model that achieved a binary classification accuracy of 93.1%. Their study explored the development of a versatile and efficient IDS capable of detecting and categorizing unexpected and evolving cyber-attacks. The dynamic nature of networks and the rapid evolution of attacks necessitate the evaluation of multiple datasets over time, using both static and dynamic techniques. This approach helped identify the most effective methods for detecting emerging threats, and provided a thorough evaluation of DNN models alongside traditional machine learning classifiers using several publicly available benchmark malware datasets. The authors in ref. [16] introduced a DNN-based intrusion detection model with a reported accuracy of 99%. The model was applied to a recently available dataset that included packet-based and flow-based data along with additional metadata. The dataset was labeled and imbalanced; it included 79 attributes, with some classes having significantly fewer training samples. The study highlights the challenges of working with imbalanced data and the importance of using deep learning models to address these issues. In ref. [17], the authors recommended principal component analysis (PCA) combined with classifiers such as random forest (RF), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA), achieving 99.6% accuracy. PCA served as a feature dimensionality reduction method, and the reduced features were used to construct various classifiers for IDS development.
In ref. [18], the authors introduced a long short-term memory (LSTM) model that achieved 92.2% accuracy in binary classification. This approach incorporates attention mechanisms with LSTM networks to capture both temporal and spatial patterns in network traffic data. The model was tested on the UNSW-NB15 dataset, which contains diverse patterns and notable differences between training and testing data, creating a challenging evaluation setting. In ref. [19], a CNN–bidirectional long short-term memory (BiLSTM) model was proposed, attaining 97.90% accuracy in binary classification. This hybrid approach integrated a BiLSTM with a lightweight CNN, applying feature selection techniques to optimize model efficiency. The authors of ref. [20] proposed a hybrid model combining LSTM, CNN, and SVM to achieve 98.47% accuracy. Their hybrid semantic deep learning (HSDL) architecture incorporated word2vec embedding to capture semantic information in network traffic, and AES encryption to enhance cloud storage security. The crossover-based mine blast optimization algorithm (CMBA) was used to select the optimal AES key. Additionally, ref. [21] proposed a two-stage deep learning structure incorporating a gated recurrent unit (GRU) network in the first stage and a denoising auto-encoder (DAE) in the second stage, which achieved an accuracy of 90.21% on Test+ for intrusion detection. Furthermore, ref. [22] tackled class imbalance by integrating CNN–BiLSTM with ADASYN, reaching an accuracy of 90.73% on Test+. In ref. [23], the authors proposed a hybrid Transformer–CNN deep learning model that used data resampling methods such as ADASYN, SMOTE, and ENN, together with class weights, to address class imbalance, achieving an accuracy of 99.71%.
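Class weighting, mentioned above alongside the resampling methods, can be illustrated with the common inverse-frequency heuristic, in which each class receives the weight n_samples / (n_classes × class_count). The text does not state how ref. [23] computed its weights, so this formula and the 90/10 label split below are assumptions made for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced label set: 90 normal flows, 10 attack flows.
labels = ["normal"] * 90 + ["attack"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on the rare attack class costs nine times as much during training as an error on the normal class, which counteracts the 9:1 imbalance without generating or removing samples.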
In ref. [24], the authors proposed advancements for IDSs in cloud environments by developing and evaluating two deep neural network models. The first model utilized a multi-layer perceptron (MLP) trained with backpropagation (BP), while the second combined particle swarm optimization (PSO) with MLP training. Both models achieved an accuracy of 98.97%, demonstrating significant improvements in IDS performance and efficiency for detecting and mitigating intrusions. In ref. [25], the authors demonstrated a notable reduction in the time required for traffic analysis with their proposed model. The model’s effectiveness was confirmed on the CSE-CIC-IDS2018 dataset, where, using the ExtraTree algorithm, it achieved an accuracy of 98.5% for binary classification. In ref. [26], the authors utilized PySpark with Apache Spark in the Google Colaboratory (Colab) environment, relying on the Keras and Scikit-Learn libraries. The training and testing datasets included CICIoT2023 and TON_IoT. To enhance the feature set, the datasets were refined using the correlation method. The authors developed a hybrid deep learning algorithm combining one-dimensional CNN and LSTM, obtaining an accuracy of 98.75%.
In ref. [27], the authors introduced a new classifier algorithm designed to detect malicious traffic in IoT environments using machine learning techniques. The approach utilized a real IoT dataset that simulates actual traffic conditions and compared the performance of different classification algorithms, achieving an accuracy of 99.2%. In ref. [28], the authors described three essential machine learning techniques used for binary classification. These methods were applied within an IDS designed to detect IoT-based attacks and classify traffic into two categories, benign and malicious. The study utilized the IoT-23 dataset, a recent and extensive dataset, to create an intelligent IDS capable of identifying and categorizing attack patterns in IoT environments, achieving an accuracy of 99%. In ref. [29], the authors examined the factors influencing existing near-Earth remote sensing systems and introduced a spatio-temporal graph attention network (N-STGAT) incorporating node states for network intrusion detection in such systems, obtaining an accuracy of 97.88%. In ref. [30], the authors introduced a self-supervised graph neural network for network intrusion detection systems (NIDSs), designed to differentiate effectively and thoroughly between normal and malicious network flows associated with various attack types. According to the authors, this was the first graph neural network (GNN)-based method for classification tasks in NIDSs using an unsupervised approach; it achieved an accuracy of 99.64%.
The Transformer–DNN and Autoencoder–CNN models outperformed existing methods, achieving 99.92% and 99.90% accuracy, respectively, in binary classification using CICIDS2017. With the NF-BoT-IoT-v2 dataset, both models reached 99.98% accuracy. These results represent significant advancements, with a comparison in Table 1.
2.2. Multi-Class Classification
For multi-class categorization in IDSs, the combination of Transformer–DNN and Autoencoder–CNN offers a powerful solution. The transformer effectively captures contextual relationships within network traffic data, enabling the extraction of critical patterns and features. These features are utilized by the DNN to perform precise classifications of various attack types. Meanwhile, the Autoencoder reduces the dimensionality of the data, enhancing its representation, and the CNN leverages these refined features to detect intricate spatial patterns and anomalies. This integrated approach significantly enhances the IDS’s ability to distinguish between multiple attack categories, improving detection accuracy and fortifying the system’s overall performance.
Given the dynamic nature of network environments and the rapid evolution of attacks, evaluating various datasets using both static and dynamic methods is crucial for identifying the most effective algorithms for detecting future threats. In ref. [15], the authors proposed a DNN model that achieved a multi-class classification accuracy of 95.6%. Their aim was to develop a flexible and effective IDS capable of identifying and categorizing unexpected and evolving cyber-attacks using a deep neural network, and the research provided a comprehensive analysis of DNN and other conventional machine learning classifiers using multiple publicly available benchmark malware datasets. The authors in ref. [16] proposed a DNN model that achieved 99% accuracy. This model, tested on the latest publicly available dataset with packet-based and flow-based data and additional metadata, addressed the challenges of imbalanced and labeled datasets and used 79 attributes. The research in ref. [17] reported that combining PCA with RF, LDA, and QDA models achieved 99.6% accuracy for multi-class classification. PCA was used for dimensionality reduction, and the reduced features were used to construct several classifiers for IDS development.
In ref. [23], the authors introduced a hybrid Transformer–CNN deep learning model that incorporates data resampling techniques such as ADASYN, SMOTE, and ENN, together with class weights, to mitigate class imbalance, achieving an accuracy of 99.02%. In ref. [31], the authors present a conditional generative adversarial network (CGAN) augmented by bidirectional encoder representations from transformers (BERT), a powerful pre-trained language model, designed to enhance multi-class intrusion detection. The approach uses CGAN to generate additional data for minority attack classes, thereby tackling class imbalance. Moreover, BERT is integrated into the CGAN discriminator, improving feature extraction and strengthening input–output dependencies, thereby boosting detection performance through adversarial training, resulting in an accuracy of 87.40%. In ref. [27], the authors propose a novel classifier algorithm for detecting malicious traffic in IoT environments using machine learning techniques. The method employs a genuine IoT dataset representing real-world traffic patterns and compares the performance of various classification algorithms, achieving an accuracy of 99.2%. In ref. [32], an RNN and a DNN were utilized to achieve accuracy rates of 98.68% and 98.95%, respectively. In ref. [33], the authors proposed a multilayer CNN integrated with LSTM, which achieved an accuracy of 99.5%. This approach utilized CNN layers to extract and select features, followed by a softmax classifier to categorize network intrusions. In ref. [34], the authors presented an RF model that achieved an accuracy of 98.3%. The study extended its attack detection method to the UNSW-NB15 dataset, reaching 98.3% accuracy in multi-class classification tasks.
A hybrid model proposed by ref. [20], combining LSTM, CNN, and SVM, achieved an accuracy of 98.47%. This approach created an HSDL architecture integrating SVM, CNN, and LSTM to analyze semantic information from network traffic. The study also included AES encryption for cloud storage security, optimized using the CMBA technique. In ref. [24], the authors introduced enhancements for IDSs in cloud environments by designing and assessing two deep neural network models. The first model utilized MLP optimized through BP, while the second combined MLP with PSO. These models significantly enhanced IDS effectiveness and efficiency, achieving an accuracy of 98.41%. In ref. [35], the authors underscore the critical importance of cyber security in monitoring and safeguarding network infrastructures against vulnerabilities and intrusions. They emphasize that advancements in machine learning, particularly deep learning, have significantly improved the early detection and prevention of attacks through advanced self-learning and feature extraction techniques. Their research utilized deep learning to analyze the CSE-CIC-IDS2018 dataset, which included both normal network behavior and various attacks. The evaluation of the LSTM model demonstrated a detection accuracy of 99%. In ref. [29], the authors analyzed the factors affecting existing near-Earth remote sensing systems and proposed an N-STGAT that integrates node states for network intrusion detection in such systems, obtaining an accuracy of 93%. In ref. [36], the authors propose a collaborative federated learning approach that facilitates the sharing of cyber threat intelligence (CTI) between organizations, with the goal of developing a more efficient ML-based network intrusion detection system (NIDS). By implementing LSTM on the NF-BoT-IoT-v2 dataset, the model achieved an accuracy of 94.61%.
The Transformer–DNN and Autoencoder–CNN models achieved 99.96% and 99.95% accuracy, respectively, in multi-class classification with CICIDS2017. With NF-BoT-IoT-v2, the Autoencoder–CNN reached 97.95% and the Transformer–DNN achieved 97.90%. These results represent significant improvements over prior research, as detailed in Table 2.
2.3. Challenges
State-of-the-art IDSs leveraging deep learning models face several significant challenges. A key issue is achieving high accuracy, which is often hindered by the class imbalance prevalent in benchmark datasets. These datasets typically contain a disproportionate amount of normal traffic compared with attack traffic, making it difficult to detect rare but critical attack types. This imbalance leads to elevated rates of false alarm and reduces overall detection effectiveness. While deep learning holds promise for improving detection capabilities, it also introduces substantial computational complexity and resource demands, raising concerns about scalability and efficiency, particularly in large-scale, real-time operational settings. Another critical challenge is the limited generalizability of these models. They often struggle to adapt to diverse network conditions or to identify novel attack types not included in the training data, reducing their robustness in practical application. Additionally, many studies have prioritized theoretical and experimental advancements in deep learning, often neglecting practical deployment challenges such as data privacy, system latency, and integration with existing security frameworks. A further limitation is the tendency to focus narrowly on accuracy, potentially overlooking other essential performance metrics such as precision, recall, F1 score, and the implications of false positives and negatives. Addressing these interconnected challenges requires a holistic approach that emphasizes balanced data handling, scalability, adaptability to evolving network threats, and practical considerations for deployment in real-world environments.
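For reference, the metrics named above have the following standard definitions. The symbols TP, TN, FP, and FN (true and false positives and negatives, with attack traffic taken as the positive class) are not introduced in this section and are assumed here.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\[4pt]
\text{Recall}    &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```

As a hypothetical illustration of why accuracy alone can mislead on imbalanced data, consider 990 normal and 10 attack flows and a classifier that labels every flow as normal. Then TN = 990, FN = 10, and TP = FP = 0, so accuracy is 990/1000 = 99% even though recall for the attack class is 0/10 = 0%.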
The Transformer–DNN and Autoencoder–CNN models address key limitations in existing intrusion detection systems, demonstrating superior accuracy and other performance metrics compared with traditional methods. These models effectively tackle class imbalance issues through advanced resampling techniques, including enhanced hybrid ADASYN-SMOTE for binary classification and enhanced SMOTE for multi-class classification, combined with ENN. The Autoencoder enhances the preprocessing of network traffic data by improving feature representation and balancing class distributions, which significantly boosts the CNN’s ability to classify and detect rare attack types. Meanwhile, the transformer excels in capturing contextual relationships within data, enabling the analysis of complex patterns and dependencies, while the DNN leverages these insights for precise classification. Both models are optimized for scalability and performance, efficiently handling large-scale datasets while maintaining real-time processing capabilities. Their robustness has been validated through extensive testing on the CICIDS2017 and NF-BoT-IoT-v2 datasets, confirming their effectiveness across diverse network environments and attack scenarios. Designed with real-world deployment in mind, these models minimize false positives and negatives, ensuring their applicability in live network settings. Moreover, a focus on comprehensive evaluation metrics beyond accuracy provides a holistic assessment of performance, addressing potential challenges in detection reliability and practical application.
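The resampling idea described above can be sketched in a few lines: oversample the minority class by interpolation (the core step of SMOTE), then clean the result with ENN. This is a minimal illustration and not the enhanced ADASYN-SMOTE or enhanced SMOTE procedure of this study. The function names, the neighbourhood sizes, the toy points, and the choice to apply a majority-vote ENN to every class are all assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        nearest = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def edited_nearest_neighbours(samples, labels, k=3):
    """Keep a sample only if most of its k nearest neighbours share its label."""
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        others = [j for j in range(len(samples)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, samples[j]))[:k]
        agreeing = sum(labels[j] == y for j in nearest)
        if 2 * agreeing > k:
            kept.append((x, y))
    return kept


# Toy data: the last "normal" point is noise sitting inside the attack region.
normal = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
          (2.1, 2.0)]
attack = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.3)]

synthetic = smote(attack, n_new=len(normal) - len(attack))  # balance the classes
samples = normal + attack + synthetic
labels = ["normal"] * len(normal) + ["attack"] * (len(attack) + len(synthetic))
kept = edited_nearest_neighbours(samples, labels)
print(len(synthetic), len(samples), len(kept))
```

Oversampling balances the two classes, and the ENN pass then removes the single mislabeled-looking point whose neighbours all belong to the other class, which is the cleaning effect the combination is meant to provide.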
5. Discussion
This section provides a comprehensive evaluation of the Autoencoder–CNN and Transformer–DNN models, benchmarking their performance against other classification techniques including CNN, Autoencoder, and DNN, across both binary and multi-class classification tasks. A detailed analysis of confusion matrices and performance metrics including accuracy, precision, recall, and F1 score highlights the comparative strengths and limitations of each approach. By focusing on results from the CICIDS2017 and NF-BoT-IoT-v2 datasets, we aim to demonstrate how the Autoencoder-CNN model, with its integration of Autoencoder and CNN architectures, and the Transformer-DNN model, with its combination of Transformer and DNN architectures, excel in detecting different classes. These integrations can contribute to measurable improvements in the detection and differentiation of diverse attack types, addressing key challenges in network intrusion detection. Through this in-depth discussion, we underscore the practical implications of our findings, particularly regarding the enhanced accuracy and reliability these models offer for intrusion detection systems in real-world applications. These results demonstrate the models’ potential to improve both early threat detection and response, thereby elevating the overall robustness of modern cyber security solutions.
- (i)
Binary Classification
On the CICIDS2017 dataset, the Autoencoder–CNN and Transformer–DNN models exhibited exceptional performance in binary classification. The Autoencoder–CNN achieved 99.90% for accuracy, precision, recall, and F1 score, correctly detecting 13,927 normal instances and 11,039 attack instances, while incorrectly detecting 14 normal instances and 10 attack instances. The Transformer–DNN performed slightly better, achieving 99.92% on the same four metrics and correctly detecting 13,930 normal instances and 11,039 attack instances, with 11 normal instances and 10 attack instances incorrectly detected. On the NF-BoT-IoT-v2 dataset, both models achieved identical metrics of 99.98% across accuracy, precision, recall, and F1 score. The Autoencoder–CNN and Transformer–DNN each correctly detected 1263 normal instances and 10,795 attack instances, with 3 attack instances incorrectly detected, as shown in Figure 3. This high level of performance underscores the models’ robustness, especially in handling imbalanced datasets such as are typical in real-world applications. The high precision and recall values demonstrate their reliability in detecting attacks while minimizing both false positives and false negatives, making them effective solutions for intrusion detection systems.
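The reported accuracies follow directly from the instance counts quoted above, as the short check below shows. The function and dictionary names are illustrative; the counts are those stated in the text, with zero misclassified normal instances assumed for NF-BoT-IoT-v2 because none are reported.

```python
def accuracy(correct_normal, correct_attack, wrong_normal, wrong_attack):
    """Overall accuracy (%) from the four cells of a binary confusion matrix."""
    correct = correct_normal + correct_attack
    total = correct + wrong_normal + wrong_attack
    return 100 * correct / total


# (correct normal, correct attack, misclassified normal, misclassified attack)
reported = {
    "Autoencoder-CNN / CICIDS2017": (13927, 11039, 14, 10),
    "Transformer-DNN / CICIDS2017": (13930, 11039, 11, 10),
    "Both models / NF-BoT-IoT-v2": (1263, 10795, 0, 3),
}
results = {name: round(accuracy(*cells), 2) for name, cells in reported.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}%")
```

The three computed values are 99.90%, 99.92%, and 99.98%, matching the accuracies reported for the respective model and dataset.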
The comparative performance of the Transformer–DNN and Autoencoder–CNN models in binary classification on the CICIDS2017 and NF-BoT-IoT-v2 datasets, as shown in Figure 4 and Figure 5, demonstrates their effectiveness. With the CICIDS2017 dataset, the Transformer–DNN model performed best, achieving 99.92% for accuracy, precision, recall, and F1 score. The Autoencoder–CNN was close behind, attaining 99.90% across all metrics. The DNN model also performed well, achieving 99.88% in all metrics, while the CNN and Autoencoder models showed slightly lower results at 99.83% and 99.73%, respectively. With the NF-BoT-IoT-v2 dataset, both the Transformer–DNN and Autoencoder–CNN models achieved 99.98% across all metrics, the highest scores among the models compared. The CNN, DNN, and Autoencoder models followed closely with 99.97%, 99.96%, and 99.94%, respectively. These results further emphasize the classification capabilities of the Transformer–DNN and Autoencoder–CNN models and support their suitability for real-world intrusion detection, particularly when dealing with imbalanced datasets.
The performance metrics in Table 32 highlight the effectiveness of the Autoencoder–CNN and Transformer–DNN models for binary classification across different classes within the CICIDS2017 and NF-BoT-IoT-v2 datasets. With the CICIDS2017 dataset, the Autoencoder–CNN achieved an accuracy of 99.90% for the Normal class, with precision of 99.93%, recall of 99.90%, and an F1 score of 99.91%. For the Attack class, it attained an accuracy of 99.91%, precision of 99.87%, recall of 99.91%, and an F1 score of 99.89%. The Transformer–DNN slightly outperformed this, with 99.92% accuracy for the Normal class, achieving a precision of 99.93%, recall of 99.92%, and an F1 score of 99.92%. For the Attack class, it reached 99.91% accuracy with 99.90% precision, 99.91% recall, and a 99.90% F1 score. With the NF-BoT-IoT-v2 dataset, both models demonstrated outstanding performance, highlighting their robustness and effectiveness in intrusion detection. The Autoencoder–CNN and Transformer–DNN models achieved 100% accuracy for the Normal class, with precision of 99.76%, recall of 100%, and F1 score of 99.88%. For the Attack class, both models delivered accuracy of 99.97%, achieving 100% precision, 99.97% recall, and an F1 score of 99.99%. These metrics showcase the balanced performance of the Autoencoder–CNN and Transformer–DNN models across classes, making them highly effective for real-world intrusion detection scenarios where accurate identification of both normal and attack traffic is critical.
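The per-class figures above can be reproduced from the binary confusion counts reported earlier in this section for the Autoencoder–CNN on CICIDS2017 (13,927 normal and 11,039 attack instances correct; 14 normal and 10 attack instances misclassified). The sketch below treats each class in turn as the positive class; the function name is illustrative. The per-class accuracy values quoted above coincide with per-class recall, which suggests, as an inference rather than something stated in the text, that per-class accuracy was computed as the fraction of each class’s instances classified correctly.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 (in %) for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * m, 2) for m in (precision, recall, f1))


# Normal as positive: 10 attack instances were labeled normal (false positives),
# 14 normal instances were labeled attack (false negatives).
normal = class_metrics(tp=13927, fp=10, fn=14)

# Attack as positive: the roles of the two error counts are swapped.
attack = class_metrics(tp=11039, fp=14, fn=10)

print("Normal (precision, recall, F1):", normal)
print("Attack (precision, recall, F1):", attack)
```

The results are (99.93, 99.90, 99.91) for the Normal class and (99.87, 99.91, 99.89) for the Attack class, in agreement with the values quoted from the table.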
- (ii)
Multi-Class Classification
In multi-class classification on the CICIDS2017 dataset, the Autoencoder–CNN model demonstrated exceptional performance across accuracy, precision, recall, and F1 score, all reaching 99.95%. The confusion matrix shown in Figure 6 reflects the model’s ability to distinguish between a wide range of attack types with minimal misclassifications. For example, the model correctly classified 1811 instances of PortScan, with only one instance incorrectly labeled as DoS GoldenEye. Similarly, it correctly classified all 2025 instances of DDoS, with no misclassifications. The model also accurately classified 5467 instances of DoS Hulk, with one instance incorrectly labeled as DDoS. Regarding more challenging attack types such as DoS Slowloris, the model correctly identified 227 instances, with a single instance incorrectly labeled as DoS Slowhttptest. Likewise, it accurately classified 155 instances of DoS Slowhttptest, with one instance incorrectly labeled as Web Attack–XSS. In the case of Bot attacks, the model correctly classified 98 instances, with one instance incorrectly labeled as Web Attack–Brute Force. The model also excelled in classifying rare attack types such as Web Attack–Brute Force, Web Attack–XSS, Infiltration, Web Attack–SQL Injection, and Heartbleed, all with no misclassifications. The confusion matrix highlights the model’s ability to handle both frequent and rare attack types with minimal errors, demonstrating its effectiveness in addressing multi-class classification challenges, particularly in imbalanced datasets. These results support the model’s potential for real-world intrusion detection systems, where both high accuracy and the ability to distinguish among diverse attack types are essential.
In multi-class classification on the CICIDS2017 dataset, the Transformer–DNN model demonstrates exceptional performance, achieving outstanding accuracy, precision, recall, and F1 scores across multiple attack classes. As illustrated by the confusion matrix in
Figure 7, the model effectively distinguishes between various attack types with minimal misclassifications. For instance, the model correctly identified 1811 instances of PortScan, with only 1 instance misclassified as DDoS, and correctly classified all 2025 instances of DDoS and all 5468 instances of DoS Hulk. For DoS GoldenEye, the model correctly classified all 536 instances, and it correctly classified 305 instances of FTP-Patator, with just 1 instance incorrectly labeled as DoS Slowloris. Additionally, for more challenging attack types, such as DoS Slowloris and Web Attack—Brute Force, the model correctly identified all 228 instances of DoS Slowloris and all 63 instances of Web Attack—Brute Force. The Web Attack—Sql Injection, Infiltration, and Heartbleed attack classes were all perfectly identified, with 100% precision, recall, and F1 scores. The model also exhibited strong performance on less frequent classes like Bot, with only 1 misclassification out of 99 instances, showcasing its ability to handle imbalanced datasets effectively. With an overall accuracy, precision, recall, and F1 score of 99.96% each, these results highlight the Transformer–DNN model’s robust ability to manage multi-class classification challenges, making it a reliable candidate for real-world intrusion detection systems where precision and accuracy in classifying diverse attack types are essential.
The Autoencoder–CNN and Transformer–DNN models demonstrate impressive performance in multi-class classification on the NF-BoT-IoT-v2 dataset. The Autoencoder–CNN achieved an overall accuracy of 97.95%, with precision, recall, and F1 score values of 97.97%, 97.95%, and 97.95%, respectively. It correctly identified 2670 instances of Reconnaissance, 4154 instances of DDoS, 3475 instances of DoS, and 62 instances of Theft. The Transformer–DNN, with a slightly lower accuracy of 97.90%, achieved precision, recall, and F1 score values of 97.98%, 97.90%, and 97.90%, respectively. It correctly identified 2633 instances of Reconnaissance, 4154 instances of DDoS, 3507 instances of DoS, and 62 instances of Theft. Both models misclassified some instances among the classes, as shown in
Figure 8. Both models effectively minimized false positives and false negatives, demonstrating their robustness in real-world applications, especially when handling the complexities of imbalanced datasets.
A comparison of the proposed Transformer–DNN and Autoencoder–CNN models against other multi-class classifiers demonstrates their exceptional capability in handling complex classification tasks on the CICIDS2017 and NF-BoT-IoT-v2 datasets, as shown in
Figure 9 and
Figure 10. The evaluation metrics, including accuracy, precision, recall, and F1 score, highlight the outstanding performance of both models. On the CICIDS2017 dataset, the Transformer–DNN achieved the highest scores across all metrics, with 99.96% accuracy, precision, recall, and F1 score. The Autoencoder–CNN model followed closely with 99.95% on each metric. The CNN and DNN models each achieved 99.94% across all metrics, while the Autoencoder model scored slightly lower at 99.93%. On the NF-BoT-IoT-v2 dataset, the Autoencoder–CNN narrowly outperformed the Transformer–DNN on accuracy, recall, and F1 score (97.95% each), with precision of 97.97%. The Transformer–DNN achieved accuracy of 97.90%, with precision, recall, and F1 scores of 97.98%, 97.90%, and 97.90%, respectively, so its precision was marginally higher. The CNN model achieved 97.87% accuracy, with precision, recall, and F1 scores of 97.91%, 97.87%, and 97.87%, respectively. The DNN model achieved 97.83% accuracy, with precision, recall, and F1 scores of 97.88%, 97.83%, and 97.82%, respectively. The standalone Autoencoder scored 97.81% for accuracy, with precision, recall, and F1 scores of 97.87%, 97.81%, and 97.81%, respectively. These results underscore the effectiveness of integrating advanced architectures to enhance classification performance on imbalanced datasets. They also show that the Transformer–DNN and Autoencoder–CNN models outperform the standalone CNN, DNN, and Autoencoder baselines in multi-class classification on both datasets.
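This section does not state how the overall precision, recall, and F1 score were averaged across classes. In every reported result the overall recall equals the accuracy while precision differs slightly, which is what support-weighted averaging produces, so the sketch below assumes that scheme. The function name `weighted_metrics` and the matrix `toy_cm` are hypothetical, and the counts are illustrative values, not data from the paper.

```python
def weighted_metrics(cm):
    """Accuracy plus support-weighted precision, recall, and F1.

    cm[i][j] is the number of class-i samples predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]                      # true count per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precision = recall = f1 = 0.0
    for k in range(n):
        p = cm[k][k] / predicted[k] if predicted[k] else 0.0
        r = cm[k][k] / support[k] if support[k] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[k] / total                         # class share of samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical 3-class confusion matrix with imbalanced class sizes.
toy_cm = [
    [95, 4, 1],
    [3, 45, 2],
    [0, 1, 9],
]
acc, prec, rec, f1 = weighted_metrics(toy_cm)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F1={f1:.4f}")
```

With this weighting, recall reduces algebraically to the fraction of correctly classified samples, so it always matches accuracy, whereas macro averaging would give rare classes such as Theft or Heartbleed the same weight as the dominant classes.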
The Autoencoder–CNN model demonstrated exceptional performance in multi-class classification on the CICIDS2017 dataset, achieving remarkable accuracy and reliability across diverse attack categories, as detailed in
Table 33. For certain classes such as FTP-Patator, SSH-Patator, Infiltration, Web Attack—Sql Injection, and Heartbleed, the model achieved perfect scores in all metrics, including 100% accuracy, precision, recall, and F1 score, underscoring its capability to detect these threats flawlessly. For challenging attack types such as DoS Slowloris and DoS Slowhttptest, the model maintained robust performance with F1 scores of 99.78% and 99.36%, respectively. For classes like PortScan and DDoS, which often involve complex patterns, the model excelled with F1 scores of 99.97% and 99.98%, respectively, indicating its effectiveness in handling intricate attack scenarios. Notably, F1 scores of 99.21% and 99.05% were achieved for Web Attack—Brute Force and Web Attack—XSS, respectively, reflecting the model’s adeptness at identifying less frequent attack types. Overall, these results affirm the Autoencoder–CNN model’s robustness and suitability for real-world intrusion detection systems where precise and consistent performance is paramount.
The Transformer–DNN model demonstrated exceptional performance in multi-class classification on the CICIDS2017 dataset, achieving outstanding metrics across various attack classes, as detailed in
Table 34. The model achieved perfect scores for several categories, including DoS Hulk, DoS GoldenEye, SSH-Patator, Web Attack—XSS, Infiltration, Web Attack—Sql Injection, and Heartbleed, with 100% accuracy, precision, recall, and F1 scores. These results underscore the model’s ability to detect these attack types with complete reliability and no misclassifications. For other attack classes, such as DDoS and PortScan, the model maintained exceptional performance, achieving F1 scores of 99.95% and 99.97%, respectively. The model also performed strongly on challenging attack types including DoS Slowloris and DoS Slowhttptest, with F1 scores of 99.78% and 99.68%, respectively. Additionally, FTP-Patator and Bot attacks were associated with F1 scores of 99.84% and 99.49%, respectively, demonstrating the model’s capacity to handle diverse patterns of malicious behavior. Even for Web Attack—Brute Force, the model achieved an F1 score of 99.21%. These metrics highlight the Transformer–DNN model’s robustness in handling complex multi-class classification tasks, making it highly suitable for real-world intrusion detection systems where accurate and reliable detection across various attack types is essential.
The performance metrics presented in
Table 35 demonstrate the remarkable capabilities of the Autoencoder–CNN and Transformer–DNN models for multi-class classification using the NF-BoT-IoT-v2 dataset. The Autoencoder–CNN model demonstrated strong performance across all classes. For the Theft class, it achieved its highest accuracy of 100%, with precision of 96.88% and recall of 100%, yielding an F1 score of 98.41%. For the Reconnaissance class, it achieved 95.53% accuracy, with precision of 98.63%, recall of 95.53%, and an F1 score of 97.06%. For the DDoS class, it achieved an accuracy of 99.28%, with precision and recall values of 99.38% and 99.28%, respectively, resulting in an F1 score of 99.33%. For the DoS class, accuracy was 98.25%, with precision of 95.81%, recall of 98.25%, and an F1 score of 97.01%. The Transformer–DNN model performed comparably. For the Theft class, it achieved 100% accuracy, 98.41% precision, and 100% recall, with an F1 score of 99.20%. For the Reconnaissance class, it achieved 94.20% accuracy, 99.85% precision, and 94.20% recall, with an F1 score of 96.94%. For the DDoS class, it matched the Autoencoder–CNN’s results with an accuracy of 99.28%, precision of 99.38%, recall of 99.28%, and an F1 score of 99.33%. It also demonstrated strong performance in the DoS class, achieving 99.15% accuracy, 94.84% precision, 99.15% recall, and an F1 score of 96.95%. Overall, both models exhibited robust performance in identifying and classifying different types of attacks: the Autoencoder–CNN achieved slightly higher F1 scores for the Reconnaissance and DoS classes, the Transformer–DNN achieved a higher F1 score for the Theft class, and the two models tied on the DDoS class.
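As a consistency check on the per-class figures above, each F1 score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below does this for the NF-BoT-IoT-v2 values quoted in this paragraph; the helper name `f1_from_pr` and the dictionary layout are hypothetical, and small differences are expected because the published inputs are already rounded to two decimals.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, class) -> (precision %, recall %, reported F1 %), as quoted in the text.
reported = {
    ("Autoencoder-CNN", "Theft"): (96.88, 100.00, 98.41),
    ("Autoencoder-CNN", "Reconnaissance"): (98.63, 95.53, 97.06),
    ("Autoencoder-CNN", "DDoS"): (99.38, 99.28, 99.33),
    ("Autoencoder-CNN", "DoS"): (95.81, 98.25, 97.01),
    ("Transformer-DNN", "Theft"): (98.41, 100.00, 99.20),
    ("Transformer-DNN", "Reconnaissance"): (99.85, 94.20, 96.94),
    ("Transformer-DNN", "DDoS"): (99.38, 99.28, 99.33),
    ("Transformer-DNN", "DoS"): (94.84, 99.15, 96.95),
}

for (model, cls), (p, r, f1) in reported.items():
    recomputed = f1_from_pr(p, r)
    print(f"{model:16s} {cls:15s} reported={f1:.2f} recomputed={recomputed:.3f}")
```

All eight recomputed values fall within 0.01 percentage points of the reported F1 scores. The per-class accuracy reported for each class also coincides with its recall, which suggests that per-class accuracy here is the fraction of that class's samples classified correctly.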