1. Introduction
Industry 5.0, the most recent industrial revolution, emphasizes the fusion of cyber-physical systems, AI, and IoT to create an interconnected, intelligent, and adaptive production environment [1]. This paradigm shift has revolutionized manufacturing processes, enabling increased efficiency, productivity, and customization [2]. It also facilitates the optimization of resources, e.g., through improved energy efficiency and reduced waste [3]. As a result, Industry 5.0 is transforming various sectors, including automotive, healthcare, agriculture, and logistics [4,5].
However, the growing interconnectedness and complexity of Industry 5.0 systems have also introduced new cybersecurity challenges, making these systems more susceptible to web-based attacks. Industry 5.0’s integration of IoT devices, big data, and cloud computing expands the attack surface, exposing weaknesses that cybercriminals can exploit [6]. Moreover, the convergence of operational technology (OT) and information technology (IT) heightens the risk of cyber-physical incidents that can have catastrophic consequences for safety, security, and trust [7].
Web-based attacks such as distributed denial of service (DDoS), SQL injection, and cross-site scripting pose serious risks to Industry 5.0 infrastructure and could result in the loss of confidential data, disrupted operations, and financial losses [8]. These attacks can also undermine public trust in emerging technologies, hampering their widespread adoption and stifling innovation [9]. To protect assets and ensure the resilience of Industry 5.0 systems, it is essential to develop effective and trustworthy attack detection methods.
Traditional machine learning methods have been used to detect web-based attacks [10]. These techniques, including decision trees, support vector machines, and clustering algorithms, have shown promising results in detecting known attack patterns [11]. However, they often struggle to cope with the evolving complexity and sophistication of cyber threats [12]. They are also limited in handling the large-scale, high-dimensional, and imbalanced datasets that are common in cybersecurity applications [13].
Deep learning techniques, which have shown remarkable success in domains such as image recognition, natural language processing, and speech recognition, offer promising alternatives for improving cybersecurity in Industry 5.0 [14]. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer models are among the techniques that can automatically learn complex patterns and representations from raw data [15]. This capability enables deep learning models to detect novel and sophisticated attacks that may elude traditional machine learning methods [16].
Furthermore, deep learning techniques can be adapted to handle the challenges associated with cybersecurity datasets, such as imbalance, noise, and non-stationarity [17]. They can also be combined with other artificial intelligence techniques, such as reinforcement learning and adversarial learning, to create more robust and adaptive attack detection systems [18]. By leveraging these capabilities, deep learning techniques have the potential to significantly improve the detection and prevention of web-based attacks in Industry 5.0, ultimately contributing to the safety, security, and sustainability of the rapidly evolving digital landscape [3].
In Industry 5.0, where human-machine collaboration plays a crucial role, it is also essential to consider the human element in cybersecurity. Effective attack detection should not rely only on automated systems but should also involve human expertise and decision-making. Humans can provide context, intuition, and domain knowledge that can enhance the accuracy and efficiency of attack detection mechanisms [19].
Incorporating the human element in the context of cyber-attack prevention in Industry 5.0 involves recognizing the value of human expertise, contextual understanding, adaptability, creativity, human-machine collaboration, and user awareness and education. Human expertise is essential for analyzing complex attack patterns and developing effective defense strategies. The contextual understanding provided by humans accounts for the social, cultural, and ethical dimensions of cybersecurity, ensuring a balanced approach. Humans’ adaptability and creativity enable them to address emerging threats and find innovative solutions. Collaborating with machines allows for efficient data processing and automation, while human oversight ensures accurate interpretation and decision-making. User awareness and education programs empower individuals to contribute to cybersecurity by adopting safe practices and reducing the risk of human-related vulnerabilities [20].
Overall, integrating the human element in Industry 5.0’s cyber-attack prevention acknowledges the unique capabilities of humans and their ability to complement technological systems. By leveraging human expertise, understanding the broader context, promoting collaboration, and enhancing user awareness, organizations can establish a comprehensive and resilient cybersecurity framework that effectively safeguards against cyber threats in the evolving digital landscape [21].
In the context of cyber-attack prevention in Industry 5.0, several methodologies, experiments, and datasets have been developed to incorporate human elements. These efforts aim to leverage human expertise, behavior, and interactions to enhance cybersecurity measures; two examples are user behavior analytics and human-centric cybersecurity datasets [22].
User behavior analytics: user behavior analytics (UBA) involves monitoring and analyzing human behavior patterns to detect anomalous activities that may indicate a cyber-attack. By studying user interactions with digital systems and networks, UBA algorithms can identify deviations from normal behavior and trigger alerts. Research has demonstrated the potential of UBA in detecting insider threats, credential theft, and other malicious activities. However, challenges remain in accurately distinguishing between normal and abnormal behaviors, as well as in addressing privacy concerns associated with extensive user monitoring [23].
Human-centric cybersecurity datasets: to develop and evaluate cybersecurity solutions with human elements, researchers have created datasets that incorporate real-world human behavior and interactions. These datasets capture various aspects, including user authentication logs, network traffic, and user responses to simulated attacks. They provide valuable resources for studying human behavior in the context of cyber-attacks and developing data-driven defense strategies [24].
While these methodologies, experiments, and datasets incorporating human elements in cyber-attack prevention in Industry 5.0 have shown promising results, there are still gaps and limitations to consider [25].
Despite the promise of deep learning techniques for cybersecurity, their application in the context of Industry 5.0 remains relatively unexplored. Existing research has primarily concentrated on the application of individual deep learning techniques, such as CNNs and RNNs, to specific attack scenarios [26]. A comprehensive understanding of how various deep learning techniques perform in Industry 5.0, and of how well they suit different types of web-based attacks, is still lacking. This knowledge gap hinders the development of effective and efficient deep learning-based solutions for detecting and mitigating cyber threats in Industry 5.0 environments [27].
In light of these challenges, there is a pressing need for research that investigates the applicability of deep learning techniques to web-based attack detection in Industry 5.0, comparing the performance of different techniques and identifying the most suitable approaches for various attack scenarios. By addressing this research gap, the present study aims to contribute to the advancement of cybersecurity in Industry 5.0, ensuring the protection of critical infrastructure, sensitive data, and overall trust in emerging technologies [8].
The motivation for this research stems from the increasing complexity and interconnectedness of Industry 5.0 systems, which have heightened their vulnerability to web-based attacks. Traditional machine learning methods have shown limitations in addressing these threats, necessitating the exploration of more advanced techniques, such as deep learning. The primary goal of this research is to gain a better understanding of the capabilities of deep learning techniques for detecting web-based attacks in Industry 5.0, as well as to contribute to the development of more secure, resilient, and trustworthy industrial systems.
Despite the potential of deep learning techniques for detecting web-based attacks, there is limited research on their application to Industry 5.0 environments. Furthermore, previous research has primarily concentrated on individual deep learning techniques, e.g., CNNs or RNNs, without considering the full range of possibilities or their performance in comparison with one another [26].
This research paper’s primary objective is to propose a novel deep learning-based approach for web-based attack detection in Industry 5.0 by comparing the performance of CNNs, RNNs, and transformer models. This study aims to:
Investigate the use of deep learning approaches in identifying web-based attacks in Industry 5.0 scenarios.
Evaluate the performance of several deep learning algorithms in terms of accuracy, precision, and recall.
Determine which deep learning technique is best suited to detecting web-based attacks in Industry 5.0.
The primary research problem addressed in this study is determining the optimal deep learning technique for detecting web-based attacks in Industry 5.0. Specifically, the study compares the performance of CNNs, RNNs, and transformer models and evaluates their accuracy, precision, and recall. Addressing this research problem yields insights for enhancing cybersecurity in Industry 5.0 systems. The rest of the paper is organized into four sections.
Section 2 provides a literature review on Industry 5.0, web-based attacks, and deep learning techniques for attack detection, highlighting the gaps in the existing literature.
Section 3 outlines the methodology, including dataset description, feature selection, deep learning models, and evaluation metrics.
Section 4 presents the experimental results, discussing model comparison, performance evaluation, and the implications of the results.
Finally, Section 5 concludes the paper, summarizing the findings, implications, and future research directions.
3. Methodology
This section discusses the methodology used for developing and evaluating deep learning models for intrusion detection. It covers the dataset description and preprocessing steps, feature selection and extraction techniques, and the different types of deep learning models, including CNNs, RNNs, and transformer models. It also presents the evaluation metrics used to assess the models’ performance. An overview of the proposed methodology is presented in Figure 1.
3.1. Datasets and Pre-Processing
The dataset used in this research is a combination of the KDD Cup 1999 dataset [40] and the more recent CICIDS2017 dataset [41], which together provide a comprehensive collection of various web-based attacks, including DDoS, SQL injection, and cross-site scripting attacks. Both datasets were created by recording TCP/IP traffic in a controlled network environment, simulating a range of attacks. A detailed description is given in Table 5.
The KDD Cup 1999 dataset comprises approximately 5 million connection records, where each connection is described by 41 features and labeled as either ‘normal’ or an ‘attack’, with the latter further categorized into four major types: denial of service (DoS), remote to local (R2L), user to root (U2R), and probe.
The CICIDS2017 dataset is widely used in cybersecurity research, specifically for evaluating intrusion detection systems (IDSs). It consists of about 2.8 million instances, each described by 79 features. While the dataset primarily focuses on network traffic and system events, it incorporates human elements in several ways. First, its real-world network traffic reflects the actual behavior and activities of users. Second, the diversity of its attack scenarios represents the human element in terms of attackers’ motivations and strategies. Third, the source and destination IP addresses, ports, and protocol types provide insights into the interactions between individuals and network systems, enabling researchers to analyze and model the human behavior aspects of cyber-attacks. Finally, attack payloads can help in understanding the techniques that attackers employ to exploit vulnerabilities and deceive users, which further contributes to the consideration of human involvement by showing the impact on individuals’ systems and data.
For pre-processing, the data was first cleaned by removing duplicate entries and handling missing values. Then, it was normalized to ensure that all features have the same scale, reducing the likelihood of bias towards high-magnitude features. Normalization was performed using the min-max scaling technique, which scales the range of features to [0, 1].
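As a minimal sketch of the cleaning and scaling steps described above, the following pure-Python example removes duplicate records and applies min-max scaling, x' = (x - min) / (max - min), column by column. The function names and the toy records are hypothetical and are not taken from the study; missing-value handling is omitted because the text does not specify how it was done.

```python
def drop_duplicates(rows):
    """Remove exact duplicate records while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def min_max_scale(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical toy records with two features: (duration, bytes_sent).
records = [[0, 200], [5, 1000], [5, 1000], [10, 600]]
clean = drop_duplicates(records)   # the repeated [5, 1000] row is dropped
scaled = min_max_scale(clean)      # every column now lies in [0, 1]
```

In practice, the per-feature minimum and maximum would normally be estimated on the training split only and reused for the test split, so that test data does not influence the scaling; whether the study did so is not stated.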
3.2. Feature Selection and Extraction
The high dimensionality of the datasets poses a challenge for any machine learning model, as it can lead to overfitting and increased computational complexity. Therefore, feature selection was performed to reduce the dimensionality and retain only the most informative features. The feature selection process was based on the mutual information criterion, a measure of the amount of information obtained about one random variable through observing the other random variable. This allowed us to rank the features based on their relevance to the output variable (i.e., attack type) and select the top-ranked features.
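To illustrate the ranking step, the sketch below computes the mutual information I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))] between each feature and the label, and keeps the top-ranked features. It assumes discrete feature values; continuous features such as the normalized ones used here would need binning or a dedicated estimator (for example, scikit-learn's `mutual_info_classif`). The feature names, toy data, and `top_k` value are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * log2(p_xy * n * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, labels, top_k):
    """Return the top_k feature names by mutual information, plus all scores."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: 'flag' determines the label, 'proto' is unrelated.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "proto": ["tcp", "udp", "tcp", "udp"],
    "flag": [1, 1, 0, 0],
}
selected, scores = rank_features(features, labels, top_k=1)
```

Here `flag` carries one full bit of information about the label while `proto` carries none, so only `flag` is selected.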
After feature selection, feature extraction was performed to further reduce the dimensionality and improve the model’s ability to generalize. Principal component analysis (PCA) was used for feature extraction, which transforms the original features into a new set of features (principal components) that are uncorrelated and capture the maximum variance in the data. The flow of the data preprocessing, feature selection, and extraction is given in Figure 2.
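The PCA step can be sketched with NumPy as follows. This is an illustrative implementation based on the singular value decomposition of the centered data, not the code used in the study; the synthetic data, the random seed, and the number of retained components are assumptions made for the example.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    ratio = singular_values[:n_components] ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, components, variance, ratio


# Synthetic example: 5 correlated features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing

Z, components, variance, ratio = pca_fit_transform(X, n_components=2)
```

The columns of `Z` are uncorrelated, and because the synthetic data has only two underlying factors, two components retain essentially all of the variance. On real traffic features, the number of components would be chosen from the explained-variance ratio.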
3.3. Deep Learning Models
In this research, we employ three types of deep learning models: CNNs, RNNs, and transformer models. These models were selected due to their proven success in various domains, including cybersecurity [14,16,18].
3.3.1. Convolutional Neural Networks (CNNs)
CNNs are primarily used in image processing tasks due to their ability to capture local patterns and spatial hierarchies in the data [33]. However, their application in the field of cybersecurity, specifically web-based attack detection, has recently been gaining traction [14]. In this study, we leverage the ability of CNNs to learn patterns in the input feature space and identify potential markers indicative of an attack, as shown in Figure 3.
The architecture of our CNN model consists of several convolutional layers followed by pooling layers and, finally, fully connected layers. The convolutional layers learn local patterns in the data, the pooling layers reduce the spatial dimensions, and the fully connected layers perform classification. The architecture of the CNN model is given in Figure 4.
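The roles of the three layer types can be illustrated with a tiny forward pass over a one-dimensional feature vector. This is a pure-Python sketch under simplifying assumptions (a single hand-picked kernel, one pooling layer, one output unit); the input values and weights are arbitrary and do not correspond to the architecture in Figure 4 or to any learned parameters.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D cross-correlation, the operation a convolutional layer applies."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width)) + bias
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


def dense(values, weights, bias):
    """Single fully connected output unit (pre-activation score)."""
    return sum(v * w for v, w in zip(values, weights)) + bias


# Hypothetical scaled feature vector and a kernel that responds to sharp rises.
features = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0, 0.8, 0.9]
kernel = [-1.0, 1.0]

feature_map = relu(conv1d(features, kernel))   # local pattern detection
pooled = max_pool1d(feature_map, size=2)       # dimensionality reduction
score = dense(pooled, weights=[1.0, 1.0, 1.0], bias=-1.0)  # classification score
```

In a trained network the kernel and dense weights are learned from labeled traffic, and many kernels are applied in parallel rather than one.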
3.3.2. Recurrent Neural Networks (RNNs)
RNNs are designed to process sequential data, making them suitable for tasks involving temporal dependencies [43]. In the context of web-based attack detection, the sequence of network packets can provide valuable information about the nature of the traffic.
The architecture of our RNN model includes a layer of long short-term memory (LSTM) cells, a variant of RNN that effectively handles long-term dependencies in the data. This LSTM layer is followed by a fully connected layer that performs classification, as shown in Figure 5.
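For reference, a commonly used formulation of the LSTM cell is given below; it is a standard textbook form and not necessarily the exact variant used in this study. Here $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product. The weight names $W$, $U$, and $b$ are notation assumed for this sketch.

```latex
\begin{align}
  i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
  f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
  o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
  \tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate state)} \\
  c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
  h_t &= o_t \odot \tanh\left(c_t\right) \\
  \hat{y} &= \operatorname{softmax}\left(W_y h_T + b_y\right) && \text{(fully connected classifier)}
\end{align}
```

The forget gate $f_t$ controls how much of the previous cell state is kept, which is what lets the cell carry information across long sequences; the final hidden state $h_T$ is passed to the fully connected layer for classification.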
3.3.3. Transformer Models
Transformer models, based on the ‘attention’ mechanism, have revolutionized the field of natural language processing [16]. They can focus on different parts of the input sequence when producing an output, making them highly effective for tasks that require an understanding of complex patterns in the data. The architecture of the transformer model is shown in Figure 6.
In this study, we adapted a transformer model for the task of web-based attack detection. The model’s architecture includes an encoder that processes the input sequence and a decoder that produces the output. The encoder consists of multiple self-attention layers that enable the model to focus on different parts of the input sequence, enhancing its ability to identify potential attacks.
Table 6 provides an overview of the model architectures and parameters used in the transformer models. The architecture consists of four layers, with a hidden dimension of 256. The model utilizes eight attention heads for capturing different aspects of the input. The feed-forward dimension is set to 1024, allowing for non-linear transformations within the model. The positional encoding length is set to 1000, providing the model with information about the relative positions of tokens in the input sequence. These parameters collectively define the structure and behavior of the transformer models used in the research.
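The self-attention computation at the core of each encoder layer can be sketched in NumPy as follows, using the hidden dimension (256) and number of heads (8) stated above. This is an illustrative single layer with randomly initialized weights; it omits the feed-forward sublayer, residual connections, layer normalization, and positional encoding, and the sequence length of 10 is an assumption made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a (seq_len, d_model) input."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    context = weights @ v                                # (heads, seq, d_head)

    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


D_MODEL, NUM_HEADS = 256, 8   # values reported for the transformer model
SEQ_LEN = 10                  # hypothetical sequence length

rng = np.random.default_rng(0)
x = rng.normal(size=(SEQ_LEN, D_MODEL))
w_q, w_k, w_v, w_o = (
    rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, D_MODEL))
    for _ in range(4)
)
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, NUM_HEADS)
```

Each of the eight heads produces its own 10 × 10 attention map, which is how the model can attend to different aspects of the input at once.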
3.4. Models Evaluation Metrics
Several threats to validity may introduce biases into the study’s findings or limit them. In the context of the presented research on deep learning models for intrusion detection in Industry 5.0, we identify the following threats: confounding variables and model overfitting (internal validity); generalizability and sample bias (external validity); and feature selection and measurement bias (construct validity).
To mitigate these threats, we carefully selected two datasets, KDD Cup 1999 and CICIDS2017, whose diversity reduces the risk of sample bias and model overfitting and improves generalizability. CICIDS2017 is commonly used as a dataset for Industry 5.0 use cases [46,47,48,49]. In addition, the feature selection (ranking features by their relevance to the output) and feature extraction (PCA transformation) techniques described in Section 3.2 are used to further reduce the chances of bias. Table 7 reports the size of the feature sets.
Finally, to avoid measurement bias, multiple evaluation criteria are used, i.e., accuracy, precision, recall, and F1 score. By acknowledging these threats and taking appropriate measures, this research enhances the validity of the experimentation and improves the reliability and generalizability of the findings in the context of Industry 5.0.
3.4.1. Accuracy
Accuracy is the most intuitive performance measure. It is the ratio of correctly predicted instances (both positive and negative) to the total number of instances. Accuracy is calculated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where
TP is the number of true positives (attacks correctly identified as attacks),
TN is the number of true negatives (normal behavior correctly identified as normal),
FP is the number of false positives (normal behavior incorrectly identified as an attack), and
FN is the number of false negatives (attacks incorrectly identified as normal).
3.4.2. Precision
Precision, also known as the positive predictive value, is the ratio of correctly predicted positive instances to the total predicted positive instances. It is calculated as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Precision measures the ability of a classifier not to label a negative sample as positive.
3.4.3. Recall
Recall, also known as sensitivity, hit rate, or true positive rate (TPR), is the ratio of correctly predicted positive instances to the total actual positive instances. It is calculated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Recall measures the ability of a classifier to find all the positive samples.
3.4.4. F1 Score
The F1 score is the harmonic mean of precision and recall. Therefore, this score takes both false positives and false negatives into account. It is usually more useful than accuracy, especially when the class distribution is uneven. The F1 score is calculated as follows:
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
The models’ performances are evaluated using these metrics, and the results are presented in the next section. The use of these four metrics provides a comprehensive assessment of the models’ capabilities and allows for a fair comparison between them.
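The four metrics defined in Sections 3.4.1 to 3.4.4 can be computed directly from the confusion-matrix counts TP, TN, FP, and FN. The following is a small sketch; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from binary confusion counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 200 test connections (100 attacks, 100 normal).
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these counts, accuracy is 175/200 = 0.875, precision is 90/105 ≈ 0.857, recall is 90/100 = 0.9, and F1 is 180/205 ≈ 0.878, which shows how F1 sits between precision and recall.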
4. Results and Discussion
In this section, we present the results of our experiments with the three deep learning models, i.e., CNNs, RNNs, and transformer models. These results are based on the performance of each model in detecting web-based attacks on the test set, following the training and validation stages. We evaluate each model based on the four metrics discussed in the previous section: accuracy, precision, recall, and F1 score, as shown in Figure 7.
4.1. Models Performance Evaluation
The performance of each model according to the four metrics is shown in Table 8. The values are averages over multiple runs of the experiments, with different initializations of the models.
All three models achieved high performance with accuracy above 0.94 and F1 scores above 0.92. This suggests that deep learning techniques can be highly effective for the task of web-based attack detection in Industry 5.0.
Figure 8 shows the confusion matrix of predicted data.
However, there are some differences between the models. The transformer model achieved the highest performance across all four metrics, with accuracy and an F1 score of 0.96 and 0.94, respectively. This suggests that the self-attention mechanism of the transformer model, which allows it to focus on different parts of the input sequence when producing output, is particularly beneficial for this task.
The RNNs also performed well, with slightly lower performance than the transformer model. Their strong results are likely due to their ability to process sequential data, which is crucial for detecting patterns in the sequence of network packets.
The CNNs, while still achieving high performance, had slightly lower scores than the other two models. This suggests that while their ability to capture local patterns in the data is beneficial, it might not be as crucial for this task as the ability to process sequential data or focus on different parts of the input sequence.
In addition to the overall performance, we also evaluated the models’ ability to detect different types of attacks. Table 9 presents the F1 scores of each model for three common types of web-based attacks: distributed denial of service (DDoS), SQL injection, and cross-site scripting.
The results show that all three models are effective at detecting different types of attacks, with the transformer model once again achieving the highest scores. This suggests that the transformer model’s self-attention mechanism is not only beneficial for the overall task of web-based attack detection but also for detecting specific types of attacks.
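A per-attack-type F1 score of the kind described above can be obtained by treating each class in turn as the positive class (one-vs-rest). The sketch below shows one way to do this; the label names and predictions are hypothetical and do not reproduce the reported values.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 score for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic-mean form
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores


# Hypothetical ground truth and model predictions.
y_true = ["ddos", "ddos", "sqli", "sqli", "xss", "xss", "normal", "normal"]
y_pred = ["ddos", "ddos", "sqli", "xss", "xss", "xss", "normal", "ddos"]
f1_by_class = per_class_f1(y_true, y_pred)
```

In this toy example one SQL injection sample is mistaken for cross-site scripting and one normal sample for DDoS, so the per-class scores differ even though overall accuracy is a single number.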
4.2. Comparison with State-of-the-Art Techniques
In addition to the evaluation of the proposed deep learning techniques, it is crucial to place these results in the context of existing state-of-the-art techniques. This comparison provides a benchmark for understanding the extent of improvement achieved by the proposed models.
Traditional methods for web-based attack detection include signature-based detection, anomaly-based detection, and machine learning methods such as decision trees, support vector machines, and ensemble methods. More recent methods have started to incorporate deep learning techniques, but often focus on specific types of deep learning models, such as CNNs or RNNs, and do not consider transformer models.
Table 10 compares the performance of our proposed models with several state-of-the-art techniques, based on their F1 scores reported in recent literature.
As can be seen from Table 10, our proposed models outperform the state-of-the-art techniques. The transformer model, in particular, achieves an F1 score that is 0.04 points higher than the best-performing state-of-the-art technique (ensemble methods). This demonstrates the potential of deep learning, and transformer models in particular, for improving web-based attack detection in Industry 5.0.
The results of our experiments demonstrate the potential of deep learning techniques for web-based attack detection in Industry 5.0. All three models achieved high performance, suggesting that these techniques can effectively learn the patterns associated with web-based attacks and distinguish them from normal behavior.
Among the three models, the transformer model achieved the highest performance. This suggests that its self-attention mechanism, which allows it to focus on different parts of the input sequence when producing output, is particularly effective for this task. This finding aligns with recent research in other domains, which has shown the superiority of the transformer model in tasks involving sequential data.
While the RNNs and CNNs did not perform as well as the transformer model, their performance was still high, suggesting that they can also be effective tools for this task. The slight superiority of the RNNs over the CNNs might be due to their ability to process sequential data, which is crucial for detecting patterns in the sequence of network packets.
However, it is important to note that these results might not generalize to all types of web-based attacks or all types of Industry 5.0 systems. Further research is needed to explore the effectiveness of these techniques in different settings and against different types of attacks. Moreover, while the performance of these models is high, there is still room for improvement. Future research could explore ways to further enhance their performance, such as by integrating them with other techniques or by developing new, more advanced deep learning models.
5. Conclusions
In this study, we investigated the application of deep learning techniques, specifically CNNs, RNNs, and transformer models, for web-based attack detection in Industry 5.0. Our findings suggest that these deep learning techniques can effectively detect web-based attacks, with an overall high performance across all models. Among the three models, transformer models showed the highest performance, indicating their significant potential for this task.
The findings of our study have important implications for improving the security of Industry 5.0. Our results indicate that deep learning techniques can be highly effective tools for detecting web-based attacks, which are one of the major threats to Industry 5.0. Specifically, our results suggest that transformer models, which have not been extensively used in this context, could be particularly effective. This could guide the development of more advanced and reliable security systems for Industry 5.0, contributing to the resilience and sustainability of these systems.
Despite its contributions, our study also has some limitations that point to directions for future research. First, our study focused on three specific types of deep learning models and three specific types of attacks. Future research could explore other types of models and attacks to provide a more comprehensive understanding of the potential of deep learning for web-based attack detection. Second, while our results indicate that our proposed models outperform traditional techniques, we did not explore the potential of hybrid methods that combine these techniques. Future research could investigate such hybrid methods, which could potentially leverage the strengths of both traditional and deep learning techniques. Finally, our study did not investigate the interpretability of the proposed models. Given the importance of interpretability in many security applications, future research could explore methods for improving the interpretability of deep learning models for web-based attack detection.