1. Introduction
The Internet of Things (IoT) is expanding rapidly and transforming many aspects of everyday life. In the IoT paradigm, many objects in an environment are interconnected to form a network [1]. The development of the IoT is a complex, large-scale process of technological innovation. At the outset of IoT implementation, the primary development approach was to operate a domain-specific application [2]. Such an application can be a production control system with industrial control and monitoring capabilities that provides multiple enterprise-related services. IoT applications are now deployed across industries, based on the principles of public information services. In large-scale contexts, communication controllers and solution providers create and regulate these IoT applications, which support residential and industrial users. IoT-enabled applications are capable of location sensing [3], location information sharing [4], environment sensing [5], ad hoc networking [6], secure communication [7], remote operations, and more, and can also meet different service requirements [8].
The integration of the IoT for business automation is referred to as the Industrial Internet of Things (IIoT). However, when applied to critical infrastructures, the IIoT can expose them to severe network vulnerabilities, posing a disruptive threat to society [9]. SCADA systems are commonly used in such critical infrastructures to supervise and control IIoT applications. Further, because open standard protocols are used for communication between the core components, SCADA networks are more vulnerable to security risks and threats. To deal with the security issues of SCADA-based networks, different security techniques have been proposed in the literature, including key management for securing communication protocols [10,11,12], intrusion detection approaches [13,14], secure transmission of information [12], and access control strategies [15]. Among these techniques, access control acts as the front-line security mechanism for the systems under threat. Access control, the decision to allow or deny access to a resource, has gained importance in preventing information leakage in SCADA by monitoring access to data or resources and blocking the unauthorized transmission of information [16].
In the existing literature, roles are assigned to users either manually by network administrators or based on the attributes of the users. The assignment of roles to end objects by administrators is referred to as role engineering. Manual role definition by network administrators is tedious and reduces the efficiency of the overall process. The situation becomes more complex in IIoT use cases, where heterogeneous devices interact with each other in a highly dynamic environment. In critical infrastructure applications such as metro transportation and industrial automation, this environment rapidly changes the characteristics of end objects. To resolve this issue, we propose a framework that automates role engineering in access control using machine learning. We propose two machine learning approaches that execute role engineering automatically in a complex scenario with heterogeneous devices and changing environments. The contributions of this study are summarized as follows:
1. We provide a detailed analysis of the current trends, gaps, and problems in access control approaches, and a comparative study of contemporary access control approaches with respect to the IIoT domain. The study offers a core understanding of the requirements of modern loosely coupled critical infrastructures.
2. We propose a framework for the automatic role assignment problem in fine-grained access control. In existing studies, this process is performed manually, which is laborious and cumbersome. With automation, fine-grained access control can achieve maximum flexibility in a time-efficient manner.
3. We leverage supervised machine learning approaches to model the SCADA-based IIoT system for this problem, which is novel and open to further research. We employ machine learning to automatically execute role assignment and propagation in an environment that is changing and generating complex data.
4. We provide a thorough analysis of how machine learning maps to this domain, covering different hyperparameters and their effect on accuracy. A detailed discussion is provided in later sections of this paper.
5. We present a thorough comparison between the multilayer perceptron (MLP) and the extreme learning machine (ELM) in terms of validation performance, test performance, and time effectiveness. Different hyperparameter settings were considered in the experimental setup.
The paper is organized as follows. The next section introduces the preliminary concepts of SCADA and access control. The literature review then discusses state-of-the-art access control approaches, contemporary trends, and open problems, followed by the problem formulation for this study. The proposed solution section presents the machine learning algorithms employed, followed by the environment setup section. The Results section reports the results and performance measurements, and the final section gives the conclusion and future work.
3. Literature Review
The core concept of RBAC according to the NIST-RBAC-2000 standard [22] is that users and permissions are assigned to roles, and users obtain permissions as members of roles. The user–role and permission–role relationships in the RBAC model can both be many-to-many. A novel access control model based on the RBAC framework was proposed in [23], using semantic business roles and intelligent agents to implement intelligent RBAC (I-RBAC). A real dataset of occupational roles from the Standard Occupational Classification (SOC) was used in that paper. By applying real-world semantic business roles and intelligent agent technologies, the framework provides the required level of access control for a highly dynamic multi-domain environment. The authors in [24] proposed a platform called RBAC-SC, which uses Ethereum’s smart contract technology to identify roles in a trans-organizational environment based on the RBAC model. Ethereum is a secure, flexible, open blockchain platform in which smart contracts support decentralized applications that act as autonomous agents and run exactly as programmed once installed on the blockchain. The authors of [25] considered the security issues of the Modbus protocol, which is used by most SCADA applications, and proposed a secure RBAC model that authorizes both the client and the Modbus frame. The Transport Layer Security (TLS) protocol was used to achieve authentication in the system after certificate verification at the two endpoints.
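The core RBAC relationships described at the start of this section can be illustrated with a short sketch. It is a minimal illustration only, assuming an in-memory store; the class name `RBAC`, its method names, and the example users, roles, and permissions are hypothetical and are not taken from any of the cited models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RBAC:
    """Minimal RBAC store: users and permissions both attach to roles."""

    user_roles: dict[str, set[str]] = field(default_factory=dict)
    role_perms: dict[str, set[str]] = field(default_factory=dict)

    def assign_user(self, user: str, role: str) -> None:
        # Many-to-many: a user may hold several roles, a role many users.
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, role: str, permission: str) -> None:
        # Many-to-many: a role may carry several permissions and vice versa.
        self.role_perms.setdefault(role, set()).add(permission)

    def permissions_of(self, user: str) -> set[str]:
        """Union of the permissions of every role the user is a member of."""
        perms: set[str] = set()
        for role in self.user_roles.get(user, set()):
            perms |= self.role_perms.get(role, set())
        return perms

    def is_allowed(self, user: str, permission: str) -> bool:
        return permission in self.permissions_of(user)


# Hypothetical example data.
rbac = RBAC()
rbac.grant("operator", "read_sensor")
rbac.grant("engineer", "read_sensor")
rbac.grant("engineer", "write_setpoint")
rbac.assign_user("alice", "operator")
rbac.assign_user("bob", "operator")
rbac.assign_user("bob", "engineer")
```

Here users never receive permissions directly: `alice` can read sensors only because she is a member of `operator`, and an unknown user has no permissions at all.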
In ABAC, authorization policies that determine an access decision are specified using the attributes or characteristics of the objects in an access event. To mitigate the limitations of RBAC, the authors in [26] proposed a novel ABAC-based access control model that is more flexible in serving IoT use cases such as smart devices and makes data exchange more secure in a cloud–IoT environment.
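As a contrast to RBAC, an attribute-based decision can be sketched as a set of predicates evaluated over the attributes of one access event. This is a minimal sketch under stated assumptions: the attribute names (`department`, `site`, `hour`), the rule itself, and the deny-by-default combination are hypothetical and are not the policy language of [26].

```python
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
# A rule sees the subject, the object, and the environment of one access event.
Rule = Callable[[Attributes, Attributes, Attributes], bool]


def permit_local_maintenance(subject: Attributes, obj: Attributes, env: Attributes) -> bool:
    """Hypothetical rule: maintenance staff may access devices at their own
    site between 08:00 and 18:00."""
    return (
        subject.get("department") == "maintenance"
        and subject.get("site") == obj.get("site")
        and 8 <= env.get("hour", -1) < 18
    )


def abac_decide(rules: List[Rule], subject: Attributes, obj: Attributes, env: Attributes) -> bool:
    """Permit if any rule matches the event; deny by default."""
    return any(rule(subject, obj, env) for rule in rules)


# Hypothetical access events.
rules = [permit_local_maintenance]
technician = {"department": "maintenance", "site": "plant-a"}
pump = {"type": "pump", "site": "plant-a"}
day_decision = abac_decide(rules, technician, pump, {"hour": 10})
night_decision = abac_decide(rules, technician, pump, {"hour": 22})
```

The same subject and object yield different decisions once the environment attribute changes, which is the flexibility that a fixed role assignment cannot express on its own.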
Another ABAC-based model was presented in [27] for managing shared IoT devices in smart cities. In this model, the users hold their attributes and request authorizations by using diverse entities by setting up smart contracts. At the time of access, a trust level is calculated for each attribute whose value is dependent on the combined trust of each approving entity. The authors in [28] also proposed a formal ABAC model named ITS-ABACG to address the issues related to access control in the Industrial Internet of Vehicles (IIoV). The concept of groups was introduced in the proposed model, which is used to assign different smart entities according to different attributes such as location, direction, speed, and some others. A taxonomy of current access control methods that are being adopted in cross-domain applications is presented in Figure 5.
Different approaches to privacy preservation have been proposed in the field of electronic health record (EHR) systems. For example, in [29], the authors proposed an ABAC model based on the Extensible Access Control Markup Language (XACML) for cloud-based EHR systems using XML encryption and XML digital signature techniques. A novel ABAC approach based on blockchain technology was proposed in [30] for IoT systems. This scheme overcomes the problem of maintaining an access control list for individuals in the system. In this scheme, every device is described by a set of predefined attributes, which are issued by attribute authorities based on the device’s identity or capability. The record of attribute distribution is stored on a blockchain.
To resolve the limitations of both the ABAC and RBAC models, the authors in [31] proposed a hybrid approach named hybrid access control (HAC), which is based on the dynamic conflict of interest (COI) at the role level and provides secure localization of vehicles based on the IoT and satellites. This hybrid model combines the ABAC and RBAC models and extends RBAC by adding new attributes to RBAC entities. A novel and dynamic access control model named authorizing workflow task role-based access control (AW-TRBAC) was proposed in [32]. It is based on the dynamic segregation of duties (SoD) and process workflow, and it focuses on task instance restrictions for the restriction of roles, governance of access, and logs.
The authors in [33] discussed IIoT vulnerabilities in the context of industrial processes. To make business applications more reliable, the authors proposed a blockchain-based framework that leverages machine learning algorithms to detect and mitigate attacks and security vulnerabilities in real time. Blockchain technology was used for sensor access control management through smart contracts, and various machine learning algorithms such as ANN, SVM, DT, and naive Bayes were tested to validate the efficiency of the proposed framework. The authors in [34] also attempted to address data breach vulnerabilities by proposing a deep learning privacy preservation framework. The framework safeguards the data by employing attribute-based access control with a convolutional neural network (CNN). The proposed scheme considers an IIoT application for healthcare, where massive amounts of data are produced and gathered. In that work, the CNN uses these data to explore the relationship between the users’ trust and their attributes. Similarly, to guard against data breaches and provide a better mechanism for data privacy in IIoT use cases and applications, the authors in [35] proposed a novel framework named ProModChain, which uses an Ethereum-based blockchain and federated learning to safeguard the privacy and trustworthiness of IIoT data. Federated machine learning is used to provide a global representation of the environment knowledge in distributed IIoT settings. The coordination between the private nodes is enforced using smart contracts for safety and transparency. In the authors’ evaluation, the proposed model was reported to perform well.
In [36], the authors leveraged machine learning for the role engineering process in access control. The authors argued that using access control as a frontline mechanism can ensure data privacy and integrity in critical infrastructures. However, in access control, roles are extracted manually, which affects the efficiency and applicability of this approach. To reduce manual effort, the authors employed Adaboost and SVM for the automatic role engineering process. In their evaluation experiments, the models produced good results. To further automate the access control mechanism, the authors of [37] leveraged a transformer-based deep learning approach to extract access control policies from user and business stories. The authors argued that agile software development uses user stories to develop the system incrementally, and that the same idea can be employed to automate policy specification. The proposed model takes user stories as input and first detects whether a given story can be used for policy extraction. It then identifies the actors, data objects, and their operational relationships and expresses them in the form of an access control policy.
The authors in [38] argued that critical data-intensive systems are always subjected to data access breaches while providing services to requests. To resolve these issues, the authors leveraged machine learning to propose a novel framework that is risk-adaptive. The proposed framework evaluates the genuineness of the requester and then calculates the risk attached to resolving the request. The proposed framework considers many contextual features of the requester in real time, such as the time, location, and previous history of the requester, to calculate the risk.
5. Proposed Role Engineering Approach
The approach to mapping the attribute-based access control is intuitive. At any time instant t, the state of a SCADA node in the large-scale, fast-changing environment can be represented by the attributes of the node. The mapping of these node states to attributes can be formulated as:
However, the attributes can be static or dynamic, depending on the type of information the SCADA device provides. For example, suppose each device has m + n attributes, where m is the number of static attributes, such as the position of the device, and n is the number of dynamic attributes, such as the time of day. The conventional RBAC approach lacks automation in role assignment. To obtain access control that is both effective and efficient in a large-scale, dynamic environment, combining the advantages of RBAC and ABAC is promising. To build such a system, the attributes can be mapped to a finite set of roles using machine learning approaches. One issue with deriving roles automatically from attributes is role explosion, where a large-scale enterprise ends up with too many roles and required permissions. This issue can be resolved by integrating RBAC and ABAC into what is referred to as a hybrid access control mechanism, which is efficient and effective in a dynamic environment. Based on the availability of the user, attributes, and roles, RBAC is applied to the static attributes, and ABAC is applied to the dynamic attributes. In this way, the hybrid model is less computationally complex. For example, with m + n attributes, the hybrid model needs roles only for the m static attributes and rules only for the n dynamic attributes, whereas RBAC alone needs roles that cover all m + n attributes and ABAC alone needs rules that cover all m + n attributes.
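The complexity comparison above can be made concrete with a small worst-case count. The sketch assumes that every attribute is binary and that, in the worst case, a pure RBAC or pure ABAC policy needs one role or one rule per combination of attribute values. These assumptions, the function name `worst_case_sizes`, and the example values of m and n are illustrative and may differ from the exact expressions the comparison refers to.

```python
def worst_case_sizes(m_static: int, n_dynamic: int) -> dict:
    """Worst-case policy sizes, ASSUMING every attribute is binary.

    m_static  -- number of static attributes (handled by roles in the hybrid)
    n_dynamic -- number of dynamic attributes (handled by rules in the hybrid)
    """
    total = m_static + n_dynamic
    return {
        "rbac_roles": 2 ** total,        # one role per combination of all attributes
        "abac_rules": 2 ** total,        # one rule per combination of all attributes
        "hybrid_roles": 2 ** m_static,   # roles cover the static attributes only
        "hybrid_rules": 2 ** n_dynamic,  # rules cover the dynamic attributes only
    }


# Hypothetical example: 4 static and 3 dynamic binary attributes.
sizes = worst_case_sizes(m_static=4, n_dynamic=3)
# sizes == {"rbac_roles": 128, "abac_rules": 128,
#           "hybrid_roles": 16, "hybrid_rules": 8}
```

Under these assumptions, the hybrid policy grows with the static and dynamic attributes separately, whereas either pure model grows with their sum in the exponent.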
On this basis, we can develop hybrid context-aware access control with automated role engineering. Knowledge propagation and role assignment can be achieved by leveraging machine learning. The input of the machine learning model is the combination of static and dynamic attributes, and its output is the role, with its set of allowed permissions, that should be assigned to a user with those attributes. In other words, the model approximates a function that maps attributes to a role assignment with a set of permissions. The model weights can be learned initially from the manually configured role–attribute structure of the IoT devices in the network.
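The input and output contract described in this paragraph can be sketched as follows. This is only an illustration of the interface: a 1-nearest-neighbour lookup over manually labelled examples stands in for the trained model, and the attribute layout, role names, permissions, and labelled examples are all hypothetical.

```python
import math

# Manually labelled examples (hypothetical): attribute vector -> role.
# Assumed layout: [zone_id, device_type, hour_of_day / 24, load]
# The first two entries are static attributes, the last two are dynamic.
LABELLED = [
    ([0.0, 0.0, 0.25, 0.1], "monitor"),
    ([0.0, 1.0, 0.50, 0.8], "controller"),
    ([1.0, 1.0, 0.75, 0.9], "controller"),
    ([1.0, 0.0, 0.90, 0.2], "monitor"),
]

# Each role carries a fixed set of allowed permissions (hypothetical).
ROLE_PERMISSIONS = {
    "monitor": {"read"},
    "controller": {"read", "actuate"},
}


def assign_role(attributes):
    """Map an attribute vector to a role.

    Stand-in for the trained model: returns the role of the closest
    manually labelled example (1-nearest neighbour, Euclidean distance).
    """
    _, role = min(LABELLED, key=lambda example: math.dist(example[0], attributes))
    return role


def is_permitted(attributes, permission):
    """Access decision: infer the role, then check its permission set."""
    return permission in ROLE_PERMISSIONS[assign_role(attributes)]


# A device whose current attributes are close to a labelled controller.
role_now = assign_role([0.0, 1.0, 0.55, 0.7])
```

Any classifier with the same signature could replace `assign_role`; the rest of the access decision is unchanged because permissions stay attached to roles.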
Machine Learning for Role Engineering
In this environment, different sensing nodes capture different types of data according to the requirements of the application. The captured data can be accessed by users who hold the corresponding access privileges. In combination with RBAC, the attributes of the different users can be leveraged to determine the role of each user [39]. In the literature, this mapping of the dynamic characteristics of users to roles is referred to as the fine-grained access control (FGAC) model. In a WSN, the goal of FGAC is to map a unique privilege to the user or end device, based on its attributes, for accessing a piece of information [40].
Given the manually established user–role relationships, the optimality of the automated role assignment can be guaranteed by mathematical proof. However, automated role assignment cannot by itself guarantee that the system is protected against attacks such as denial of service, insider attacks, and man-in-the-middle attacks. This problem can be addressed by using attribute-based encrypted systems as a safeguard against such attacks [41]. In such systems, integrating machine-learning-based automated role assignment can provide accurate modeling of user–role relationships, making the system efficient and effective in terms of time and cost. In large-scale scenarios where roles are not manageable, fine-grained access policies better serve the purpose. The application scenario of this paper is to apply the role assignment for fine-grained access control based on encrypted data in mobile edge computing, but this scenario can be adapted to the encrypted sensory data of SCADA-based systems. The tailored scenario is similar in that SCADA sensors share their data with the edge server and a portion of that information is shared with the reporting authorities [17].
The key idea behind automated role assignment is to learn the patterns in the sensing data and predict the role of the end device based on the characteristics the device has at any time instant t. For this purpose, different machine learning classifiers can be leveraged to learn the user–role relationship and predict roles at runtime by analyzing the context of the end device attributes. In [36], the authors leveraged Adaboost and SVM to predict device roles and automate role propagation. The authors noted that sensor data are usually not well separated, especially in the IoT environment, so a predictive model can suffer from high variation in its results due to the uncertainty in the data. In this paper, we extend the work of [36] by leveraging a feedforward network (multilayer perceptron, MLP) and the extreme learning machine (ELM) for this task, along with conventional machine learning models.
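To illustrate how an ELM differs from a conventionally trained network, the following is a minimal sketch in pure Python. It assumes a single sigmoid hidden layer whose weights are drawn at random and never trained, and an output layer obtained in one step by ridge-regularized least squares. The class name `ELM`, the hidden size, the ridge value, and the toy attribute vectors are illustrative assumptions and not the configuration evaluated in this paper.

```python
import math
import random


def _solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x k) by Gauss-Jordan elimination
    with partial pivoting; returns x as an n x k list of lists."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [[v / aug[i][i] for v in aug[i][n:]] for i in range(n)]


class ELM:
    """Extreme learning machine sketch: random hidden layer, solved output layer."""

    def __init__(self, n_hidden=16, ridge=1e-6, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Fixed random projection followed by a sigmoid; these weights stay untrained.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, labels):
        self.classes = sorted(set(labels))
        dim, h_n, k_n = len(xs[0]), self.n_hidden, len(self.classes)
        self.w = [[self.rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h_n)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(h_n)]
        h = [self._hidden(x) for x in xs]
        t = [[1.0 if y == c else 0.0 for c in self.classes] for y in labels]
        # Ridge normal equations: (H^T H + ridge * I) beta = H^T T
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(h_n)] for i in range(h_n)]
        htt = [[sum(hr[i] * tr[k] for hr, tr in zip(h, t)) for k in range(k_n)]
               for i in range(h_n)]
        self.beta = _solve(hth, htt)
        return self

    def predict(self, x):
        hid = self._hidden(x)
        scores = [sum(hv * self.beta[i][k] for i, hv in enumerate(hid))
                  for k in range(len(self.classes))]
        return self.classes[scores.index(max(scores))]


# Hypothetical device attribute vectors and manually assigned roles.
X = [[0.0, 0.0, 0.25, 0.1], [0.0, 1.0, 0.50, 0.8],
     [1.0, 1.0, 0.75, 0.9], [1.0, 0.0, 0.90, 0.2]]
y = ["monitor", "controller", "controller", "monitor"]
elm = ELM(n_hidden=16, ridge=1e-6, seed=0).fit(X, y)
predicted_role = elm.predict([0.0, 1.0, 0.55, 0.7])
```

Because only the output weights are solved for, training is a single linear solve rather than an iterative gradient descent, which is the usual motivation for comparing ELM with MLP on training time.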