1. Introduction
Recent years have witnessed enormous growth in IoT devices, with billions of internet-connected devices in use around the world, exchanging substantial, detailed, and sensitive contextual data about the environment and users each day. Recent statistics estimate that the number of IoT-connected devices will rise to 30.9 billion by 2025, as shown in Figure 1, and the damages related to cybercrime are projected to exceed $6 trillion annually by 2025 [1].
As IoT devices perform various exchange activities, there is always a concern about privacy, security, and an impact on the confidentiality, integrity and availability (CIA) of the communicated information. Different components of the connected world create different challenges and problems for the security of cyber-physical systems. Over the years, the equation of cyber security has diversified from militaries to individuals (
Figure 2), from universities to hospitals and from the aviation industry to manufacturing. Over time, a huge number of IoT attacks, such as ransomware, phishing, denial of service, privilege escalation, and configuration manipulation, have been reported to cause disastrous financial, legislative, and reputational exposures [
2]. The attack paradigm is shifting from general system attacks to IoT settings. From January to June 2021, the Kaspersky data breach report recorded around 1.51 billion IoT attacks [3]. To avoid the exploitation of vulnerabilities, IoT devices should be secured using dedicated security tools [4].
The industrial internet of things (IIoT) is a network of intelligent devices that are linked together to create an IoT ecosystem that exchanges, stores, and analyzes the data. A typical industrial IoT ecosystem includes infrastructure for public and/or private data communications, ranging from sensors, actuators, and data stores to complex industrial robots. More specifically, IIoT refers to all connected instruments and devices that, when combined with industrial applications such as production and energy management, create a complex network of services that enable automation at a higher level. Connected sensors and actuators enhance business intelligence by allowing businesses to identify inefficiencies and malfunctions in advance, saving time and money. However, the constant expansion of the connections and usage of industry-standard communication protocols makes it imperative to safeguard vital industrial systems from cyber security risks [
5]. In addition to having a continual connection to the internet and industrial networks, the industrial systems that regulate the production process and functioning of smart factories also have access to the data and information of their affiliated business organizations. Industrial control systems (ICS) is a common term for such an arrangement [
6].
Industrial protocols are the real-time communication protocols used by the control devices in an IIoT environment to interact in a specific industrial scenario. Common industrial protocol (CIP), Modbus, distributed network protocol 3 (DNP3), and message queuing telemetry transport (MQTT) are some of the protocols frequently used in industrial control systems. These are application layer protocols with specific function codes and fields that carry the information exchanged between the field IIoT devices. They are often secured by transport layer security (TLS), which relies on shared secret keys and certificates for secure communication. Some of the protocols are object-based: each device in the network is an object, and the objects issue requests and responses according to the functions of the industrial control system. The function codes are predefined to identify specific tasks, e.g., read and write requests and responses. Anomalies or attacks in networks that use these protocols can therefore be detected by writing rules against the function codes and port numbers and against combinations of the protocol fields.
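The idea of writing rules against function codes and port numbers can be sketched in a few lines of Python. This is a minimal illustration, not the rule set used in this research. It assumes Modbus/TCP on its registered port 502 and the public Modbus function codes for reads (1 to 4) and writes (5, 6, 15, 16); the `Packet` fields, the `check_packet` helper, and the allow-list of writer addresses are hypothetical.

```python
from dataclasses import dataclass

# Modbus/TCP registered port and public function codes.
MODBUS_PORT = 502
READ_CODES = {1, 2, 3, 4}      # read coils, discrete inputs, holding and input registers
WRITE_CODES = {5, 6, 15, 16}   # write single/multiple coils and registers


@dataclass
class Packet:
    """Hypothetical, already-parsed view of one packet."""
    src_ip: str
    dst_port: int
    function_code: int


def check_packet(packet, allowed_writers):
    """Return an alert string for a suspicious Modbus packet, else None."""
    if packet.dst_port != MODBUS_PORT:
        return None  # not Modbus traffic, so this rule does not apply
    if packet.function_code in WRITE_CODES:
        if packet.src_ip not in allowed_writers:
            return "alert: write request from unauthorized source"
        return None
    if packet.function_code in READ_CODES:
        return None
    return "alert: unexpected function code"
```

A rule of this kind combines a port number, a function code, and one more protocol field (here the source address), which mirrors the field combinations described above.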
Adversaries intrude on the privacy of organizations, homes, or industries by exploiting the vulnerabilities in the IoT devices incorporated for automation and human ease. A typical example of espionage in industrial IoT includes compromising the control center through any susceptible sensor or programmable logic controller (PLC). Similarly, smart home systems can be compromised through a single susceptible device surrendering the entire system [
7]. The attacks can be performed on devices, communication networks, or on software applications that are used to control IoT devices. Malware can be injected into the devices compromising firmware that erroneously updates the controls and other parts such as memory and connection points [
8]. IIoT security, with a range of potential threats, attack surfaces, and vectors, is a challenging task. However, it can be managed by skillfully securing the layered architecture of the IIoT network. Each layer has its vulnerabilities and limitations that must be considered to assure its security by shielding it from various distinct forms of attacks [
9].
The need to protect private and public data stored on IT infrastructures, utility infrastructures, cloud data centers, personal machines, PCs, laptops, tablets, smartphones, and IoT devices is therefore growing. More data means a larger digital footprint and a higher risk of cyber espionage. The enormous growth of IoT devices has attracted hackers to perform diverse attacks on them, as they generally lack built-in security features to counter the threats. Attackers take advantage of the continuous connectivity of IoT devices to the internet and deploy attacks including denial of service, malware infiltration, man-in-the-middle, and firmware hijacking, to name a few. Machine and deep learning models are common approaches for the detection and prediction of attacks in such networks. Machine learning models are trained on historical network data and classify incoming traffic based on the previously learned patterns. Both supervised and unsupervised ML algorithms, such as decision trees (DT), random forest (RF), support vector machines (SVM), logistic regression (LR), K-nearest neighbor (KNN), neural networks, and K-means clustering, have been used in security prediction models. Attacks on IoT communication, such as denial of service, SYN flooding, and man-in-the-middle, can be effectively detected using machine learning models. The most suitable algorithm for detecting attacks on a specific type of network can be chosen by applying candidate models and calculating performance measures such as accuracy, precision, recall, and the F1-score. These machine learning-based computational models are deployed on networks through firewalls, intrusion detection and prevention systems, and other dedicated security tools.
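The performance measures named above follow directly from the counts of true and false positives and negatives. The sketch below computes them for a binary attack/benign task using only the Python standard library; the function name `binary_metrics` and the label strings are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Comparing these four values across candidate models, rather than accuracy alone, guards against a model that looks accurate only because benign traffic dominates the data.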
As industries and enterprises move heavily towards sensor-based automated devices that produce huge amounts of data over vulnerable communication protocols, a security information and event management (SIEM) solution that supports the latest IoT data and network protocols and standards is necessary. Wazuh is an open-source SIEM solution whose architecture is mainly based on agents that run on the monitored hosts and send security information to a centralized SIEM server. By integrating separate functions into a single agent and platform architecture, the SIEM server provides endpoint security, threat intelligence, detailed security operations, and even cloud security. This research focuses on the limited computational ability of IoT/IIoT devices to run agents, especially agents with operating system dependencies. A holistic approach is needed that protects IoT/IIoT devices and collects their sensitive communication data securely without consuming their computational power, installing any agent, or running any script on them. Because IoT devices continuously exchange data through the gateway, the agentless approach sniffs the traffic at the gateway and passes it to a machine learning model for initial detection of anomalous traffic. The ML model predictions are converted into JSON logs, which are transmitted to the Wazuh SIEM for further rule-based security monitoring. In the proposed scheme, Wazuh uses a log forwarder installed on the gateway, which passes the generated logs to the server for IIoT protocol analysis, decoding, and dynamic rule writing. The main contributions of this research are:
Attack detection at the gateway in industrial IoT devices using ML models.
Incorporating the predicted output of ML models within the Wazuh SIEM.
Dynamic rule writing for alert generation based on predictions.
Industrial protocol analysis and feature extraction for custom rule writing.
Testing and validation of rules for event monitoring on the SWaT dataset.
In this research study, enhanced security is implemented for industrial IoT devices using machine learning predictions, dynamic rule writing, and an industrial SIEM solution. The SIEM solution, a powerful tool for the security and monitoring of remote devices, is integrated with the security framework of this research. The machine learning model is implemented at the gateway level for detection and log generation. Dynamic rules on the ML predictions and rules based on the protocols used in the industrial cyber-physical system are written on the SIEM side for further threat intelligence and event monitoring. The proposed agentless scheme works without installing any agent or running any script on the endpoints, i.e., the IoT/IIoT devices.
The rest of the paper is arranged as follows:
Section 2 contains the literature review and comparison of this research with the previous studies, while
Section 3 defines the experimental design components. In
Section 4, the methodology is explained, and the results are discussed in
Section 5. Finally, the conclusion and future work are presented in
Section 6.
2. Literature Review
IoT/IIoT is a network of household appliances in smart homes and of industrial devices such as PLCs, sensors, actuators, and human-machine interfaces (HMIs) in industrial environments, so a large amount of sensitive data is sent from one device to another. These routed data are critical to ongoing operations and require security to avoid the risks of exploitation and malicious attempts by an adversary [
10]. In a recent research work [
11], Splunk SIEM was implemented to improve the security of an organization. The Splunk SIEM tool uses universal forwarding agents (UFA) installed on the devices to be monitored and forwards the data to the PC having Splunk Enterprise for further processing, i.e., indexing and searching for the events. In the same research, the authors presented four rules for real-time monitoring of the event logs and scheduled reporting. The alerts generated by implemented rules were forwarded by email notification.
In [
12], researchers used the hypertext transfer protocol (HTTP) and transmission control protocol (TCP) detection models with efficient machine learning models (decision trees) for mobile malware app detection. The network traffic to be monitored was collected using the tcpdump tool, which was then mirrored to the server where it was monitored and managed thoroughly, hence not affecting the devices’ performance due to the network monitoring. Features selected for network traffic monitoring were TCP flow features, i.e., uploading and downloading bytes; total uploading and downloading packet numbers in session; and HTTP header features, i.e., requested resources’ internet host and port number, the request method, request uniform resource identifier (URI), and the user agent. Researchers used the Drebin project’s malware apps and benign traffic from the app market using an app crawler. The proposed method achieved higher accuracy as compared to Drebin and other malware scanners.
In [
13], the authors introduced a fog network distributed over the smart city which detects and alerts unusual activities and IoT cyberattacks. Data from the IoT devices was collected through the IoT sensors in the IoT layer and computed at the fog layer, and then an alert was generated to notify the cloud security systems. The fog layer strategy is useful for reducing the latency between IoT sensors and the cloud. At the fog layer, the machine learning algorithm, random forest, was trained on the UNSW-NB15 dataset [
14] with the selected features such as source and destination IP, destination port, protocol, and duration. In another research [
15], researchers presented the SIEM-based IoT-botnet attack detection and mitigation framework. Traffic logs from the IoT devices were collected through the gateway by installing the Splunk SIEM agent on it. This forwarding agent sent the traffic logs after parsing and indexing them to the Splunk Enterprise server. These forwarded logs were analyzed, and if the distributed denial of service (DDoS) attack was identified, the Splunk SIEM server built the connection with the gateway using SSH and automatically added the rules to the firewall IPTABLES to stop the malicious traffic and alert the network administrator about the attack.
In [
16], researchers used an agentless technique based on simple network management protocol (SNMP) push and pull requests to collect logs from remote devices and store them in the Prometheus database. The Grafana dashboard was used for log analysis and visualization, and in case of an anomaly, the default Prometheus alert generator raised an alert. In [
17], researchers used the publicly available dataset CIC-IDS2017, which included clean and DDoS attack traffic in pcap file format. Using CICFlowMeter [
18], they extracted features from the pcap files, such as the timestamp, destination and source IPs, and other relevant features, to identify the attacks. CSV files containing these attributes were fed directly to the machine learning model in the open-source Elastic SIEM. Simple threshold rules were configured to raise alarms on network traffic logs that indicated a DDoS attack. For example, a DDoS alarm could be triggered by a rule that fires when the number of connections per minute from a given IP exceeds three times the per-minute average of the last hour.
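A threshold rule of this kind can be expressed in a few lines. The following is a minimal sketch under one reading of the rule, namely that the current per-minute connection count of an IP is compared with three times its per-minute average over the last hour; the function name `ddos_alarm` and its parameters are hypothetical and are not taken from the cited work.

```python
def ddos_alarm(current_minute_count, last_hour_counts, factor=3.0):
    """Return True when the current per-minute connection count of an IP
    exceeds `factor` times its per-minute average over the last hour."""
    if not last_hour_counts:
        return False  # no baseline yet, so no alarm can be justified
    average = sum(last_hour_counts) / len(last_hour_counts)
    return current_minute_count > factor * average
```

For a baseline of 10 connections per minute, a minute with 100 connections triggers the alarm, while a minute with exactly 30 does not because the comparison is strict.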
In [
19], researchers proposed a system that used a federated, self-learning approach for anomaly detection in IoT devices at a security gateway. For device type identification, authors used the existing approach AuDI [
20], which identifies the device type in the local network. For anomaly detection, federated-learning-based models were developed based on specific device types and were trained on locally generated data from the security gateways. Later, these models were aggregated to the global models by IoT security service. The proposed system, when evaluated in a real environment, came out with a higher attack detection rate and no false alarms.
In [
21], a SIEM solution was designed and implemented for the security of the smart grid. The proposed SIEM consists of components for infrastructure monitoring, traffic capturing, machine learning/deep learning (ML/DL)-based intrusion detection, smart-grid application-layer monitoring, and visual analytics-based anomaly detection. The researchers used the features of Modbus, DNP3, building automation and control networks (BACnet), MQTT, and other protocols to detect various attacks such as DoS and SQL injection and for intrusion detection.
Industrial protocols play a significant role in the communication security of smart industrial systems. In a research study analyzing industrial protocols [
22], researchers examined DNP3 protocol packets using Wireshark. The findings suggested that there was a control code inside the application layer that showed the relay trip. The rules were written in Snort to generate the alert when there was a trip command from an unknown source.
Normal and anomalous traffic detection has widely been performed with the use of machine and deep learning algorithms. In [
23], the authors used deep learning algorithms for the detection of attacks in modern datasets such as UNSW-NB15 [
14] and CICDDoS2019. The authors claimed to use the hybrid deep learning approach for attack detection. First, the autoencoder was used for the important feature selection without human intervention, and then a multilayer perceptron network was utilized for attack prediction. The model resulted in 98% accuracy in DDoS attack detection.
Industrial SIEM solutions are considered the standard for security implementation. However, the relevant literature reported limited research on agentless approaches for industrial IoT security solutions such as Wazuh. Critical analysis of the literature also emphasizes the gap in industrial protocols’ feature analysis and their use for rule writing. Utilization of IoT/IIoT communication data to detect and predict attacks can assist in developing the holistic scheme for industrial cyber-physical systems security. To fill the gaps in smart industrial cyber-physical systems security literature, this research work, as shown in
Table 1, proposed a security scheme that collects the IIoT network data from the gateway using the agentless approach, converts the raw traffic files to CSV format, and passes the data to a trained machine learning model for attack detection. The predictions of the ML model, along with the data packet information, are embedded into JSON logs. The JSON logs are forwarded by the Wazuh agent installed on the gateway-level device that also hosts the traffic sniffing scripts and ML models. The logs are received at the Wazuh server, where decoders are added to extract the features that are then used in writing rules for attack detection and event monitoring. The rules are written against intrusion, DDoS, and man-in-the-middle (MITM) attacks. For event monitoring of industrial IoT devices, the protocols used by the specific types of IIoT processes are studied, and important features are extracted to determine the commands that the devices send to or receive from other devices. Rules are written that match specific function codes and service types and generate alerts for the events occurring in the industrial control systems. Alerts generated for anomalies and for events occurring in the systems are displayed on the Wazuh dashboard. The alerts are stored with a timestamp, and a summary can also be accessed and downloaded from the dashboard.
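The step of embedding a model prediction and the packet information into a JSON log can be sketched as follows. This is an illustrative example only: the field names (`ml_prediction`, `src_ip`, `dst_port`), the `build_log` helper, and the timestamp format are assumptions, not the log schema used in this research.

```python
import json
from datetime import datetime, timezone


def build_log(packet_info, prediction, timestamp=None):
    """Merge packet fields and the ML prediction into one single-line JSON log."""
    record = dict(packet_info)  # copy so the caller's dict is not modified
    record["timestamp"] = timestamp or datetime.now(timezone.utc).isoformat()
    record["ml_prediction"] = prediction
    # Compact separators keep the whole record on one line, which suits
    # log collectors that read one event per line.
    return json.dumps(record, separators=(",", ":"))
```

A gateway script could append each returned line to a log file that the log forwarder monitors.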
3. Experimental Design Components
3.1. The SWaT Case Study
The water supply system is crucial infrastructure that has recently become one of the most frequent targets of cyberattacks. The iTrust Centre for Research in Cyber Security at the Singapore University of Technology and Design created the secure water treatment testbed (SWaT) in 2015 for cyber security research. Later, iTrust labs also uploaded updated datasets and details to help researchers in the smart industrial security context. The last update was made on 19 July 2021. The dataset, its details, and the testbed's technical information can be accessed from iTrust labs [
24].
SWaT is an operational water treatment testbed that can produce 5 US gallons of filtered water per minute. With a footprint of about 90 square meters, the testbed is a scaled-down version of the large modern water treatment plants common in large cities. The control system and overall physical process of SWaT closely reflect those of real plants in the field.
Figure 3 shows the six stages of SWaT, numbered P1 to P6. Each stage has dual PLCs: one acts as the primary controller, and if it fails, the secondary PLC takes over as a backup. In regular operation, the testbed uses a distributed control strategy, with local PLCs controlling each stage of the process separately. The PLCs are connected in the layer 1 network and communicate constantly to receive the state information from other processes that the local device needs for some of the processing steps. Instead of using the automatic distributed control mode, the operator can control all the actuators in the testbed manually through the human–machine interface (HMI) and the supervisory control and data acquisition (SCADA) system.
During each stage of the process, the primary PLC collects data from the sensors and manages actuators, such as pumps and valves, through the Remote I/O (RIO) unit. The RIO is connected between the PLCs and the sensors/actuators and transfers commands and data between the PLCs and the field devices. Water can flow into or out of a tank, for instance, by turning a pump on or closing a valve. The PLCs monitor the status of the devices and decide when a pump should be turned on or off by updating the actuator values. Additional sensors monitor the chemical properties of the water flowing through all six stages.
3.2. SIEM Solutions
Since IoT devices are now being used in every field from homes and small-scale retail shops to large-scale industries, the vulnerabilities within them are exploited by adversaries to steal information or for ransom. Gartner in 2005 introduced SIEM (security information and event management) for endpoint monitoring. SIEM is the combination of SIM (security information management) and SEM (security event management), which are two separate systems for event storage, analysis, and reporting (SIM) and real-time collection of events (SEM) [
25]. Generally, SIEM is a security tool that aids companies in identifying potential security vulnerabilities and threats before they manage to interfere with and damage business operations. For security and compliance management use cases, it surfaces user behavior anomalies and employs artificial intelligence to automate many of the manual operations related to threat identification and incident response. It has become a mainstay in contemporary security operation centers (SOCs) [
26]. Security information and event management (SIEM) tools are typically used for cyber-physical systems security. They collect the event logs from the devices and perform diverse actions to ensure the security of the connected systems. The main functions of SIEM solutions are event logging, normalization, aggregation, and event correlation. In logging, the SIEM solution stores the data collected and forwarded by the agents on the devices. These data collected from different types of devices are then converted to the common format so that the data structure operations can be performed on them. This process is called normalization. After normalization, all the redundant values from the data are eliminated (aggregation). Then, the processed logs are correlated for the detection of any suspicious behavior and unknown pattern. After performing all the functions on the collected data, SIEM solutions generate alerts in case of any abnormal activity performed on the devices. A basic SIEM architecture is shown in
Figure 4.
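The sequence of normalization, aggregation, and correlation described above can be illustrated with a small sketch. It is a simplified model of the idea and not how any particular SIEM product implements it; the raw key names (`source_ip`, `SrcAddr`, `msg`), the event names, and the correlation pattern are all hypothetical.

```python
def normalize(raw_event):
    """Map device-specific keys onto one common schema."""
    src = (raw_event.get("src") or raw_event.get("source_ip")
           or raw_event.get("SrcAddr"))
    event = raw_event.get("event") or raw_event.get("msg") or ""
    return {"src": src, "event": event.lower()}


def aggregate(events):
    """Drop redundant (duplicate) events while keeping the original order."""
    seen, unique = set(), []
    for e in events:
        key = (e["src"], e["event"])
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


def correlate(events, pattern):
    """Return the sources whose events include every event in `pattern`."""
    by_source = {}
    for e in events:
        by_source.setdefault(e["src"], set()).add(e["event"])
    return sorted(s for s, seen in by_source.items() if pattern <= seen)
```

In this toy model, an alert would be raised for every source returned by `correlate`, for example a host that produced both a failed login and a configuration change.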
SIEM is used to gather event data from monitored devices throughout the network of a company or home. IT and security teams can automatically manage their network’s event log and network flow data in one centralized location thanks to the real-time collection, storage, and analysis of logs and flow data from users, applications, assets, cloud environments, and networks. To compare their internal security data with known threat signatures and profiles, several SIEM solutions also integrate with third-party threat intelligence feeds. Teams can stop or recognize novel attacks through integration with real-time threat sources. Any SIEM solution must provide event correlation which uses advanced analytics to quickly find and eliminate possible threats to enterprise security by identifying and comprehending complex data patterns. The manual operations associated with the in-depth analysis of security events are offloaded by SIEM systems, which considerably reduces the mean time to detect (MTTD) and mean time to respond (MTTR) for IT security teams.
SIEM systems can recognize all entities in the IT environment since they provide centralized management of on-premises and cloud-based infrastructure. By categorizing unusual activity as it is discovered on the network, SIEM technology can monitor security incidents across all connected users, devices, and applications. Using customizable, predefined correlation rules, administrators can be warned quickly and take appropriate action to mitigate an incident before it develops into a more serious security risk. Much research has been conducted on securing IoT devices using different tools and techniques to detect and prevent anomalies within the IoT network.
3.3. Wazuh SIEM
Wazuh is an open-source SIEM solution whose architecture is mainly based on agents that run on the monitored hosts and send security information to a centralized SIEM server. Agentless devices such as firewalls, switches, routers, and access points are also supported and can actively provide log data through Syslog, SSH, or their application programming interfaces (APIs). The central server analyzes and decodes the incoming data, and the Wazuh indexer indexes and stores the resulting alerts. The Wazuh indexer is a highly scalable full-text search engine that provides analytics and near real-time search capabilities. In this research, a single-node indexer is used; for deployment in a larger IIoT environment, a multi-node cluster can be installed for higher availability. The Wazuh indexer stores the data as JSON documents in the form of key-value pairs and correlates the key fields with the value fields, so in the Wazuh server the values can be of any data type, e.g., Boolean, integer, string, or time format. The indexer ensures redundancy and increases query capacity by distributing the JSON documents across shards. Shards are containers that are further distributed to other nodes in a multi-node cluster deployment of Wazuh.
The server is the component of Wazuh that receives the logs from the agents. It applies its configured rules to the logs and, in case of any anomaly, triggers alerts. The server also manages the remote agents and their status. Different components of the Wazuh server perform different tasks. The agent enrollment service enrolls agents by generating authentication keys and sharing them with the agents over a TLS/SSL connection. The agent connection service authenticates the agent's ID and the authentication keys shared by the enrollment service. The analysis engine is the component needed to process the logs produced from the raw pcap network traffic: it uses decoders to recognize the type of information received from the agents, and the patterns extracted by the decoders are matched against the rules written in Wazuh. All the alerts and analytics of events can be viewed on the Wazuh dashboard.
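The decode-then-match flow of the analysis engine can be mimicked in plain Python to make the logic concrete. This is a conceptual sketch only: Wazuh itself expresses decoders and rules in its own XML configuration, and the rule IDs, levels, and the `ml_prediction` field below are hypothetical values chosen for illustration.

```python
import json

# Hypothetical rules: each one matches a single decoded field value.
RULES = [
    {"id": 100001, "level": 10, "field": "ml_prediction", "value": "SYN",
     "description": "ML model flagged SYN flooding"},
    {"id": 100002, "level": 3, "field": "ml_prediction", "value": "BENIGN",
     "description": "ML model reported benign traffic"},
]


def decode(log_line):
    """Decoder step: extract fields from a JSON log line, or None if invalid."""
    try:
        data = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def match_rules(fields, rules=RULES, min_level=5):
    """Rule step: return the rules that match and reach the alert level."""
    return [r for r in rules
            if fields.get(r["field"]) == r["value"] and r["level"] >= min_level]
```

Separating decoding from matching means new rules can be added for fields that an existing decoder already extracts, without touching the decoder.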
Wazuh agents are deployed on the monitored hosts. Each host should be registered with the server using the IP address and authentication keys. The paths from which the agents gather system data on Windows, Linux, and macOS are already defined in the agents' configuration file. The agents send the data to the server on the same network over TCP on port 1514. Wazuh SIEM can also monitor devices that have specific operating systems or that can send data to the server through an SSH connection. Previous efforts have developed mechanisms to secure IoT/IIoT devices while overcoming their power, computation, and storage limitations. However, limited research and development have been performed on securing IoT/IIoT devices using a SIEM solution. Some researchers used a SIEM solution to secure IoT/IIoT devices but implemented it only for DoS attack detection, while others installed the SIEM agents on the devices, used SNMP pulling techniques, or built some type of connection between the IoT/IIoT devices and the SIEM server for data collection. A complete Wazuh agent-server architecture diagram is shown in
Figure 5.
4. Methodology
4.1. IIoT Dataset
The proposed technique uses the agentless approach for the network traffic analysis of IoT devices along with the Wazuh SIEM solution. The security scheme collects IIoT traffic logs from the default gateway and directs them to the SIEM solution for correlation and monitoring, without installing any software or running any script on the field IIoT devices such as sensors, actuators, and PLCs. The SWaT and CICDDoS2019 datasets were used for real-time traffic monitoring and for attack detection using ML models, respectively. In the CICDDoS2019 dataset, different types of attacks were performed on the systems by a third party, including PortMap, NetBIOS, LDAP, and SYN attacks [
23]. In both publicly available datasets, the traffic was collected from the gateway.
For the implementation of this security scheme in industrial settings, the first step would be to gather the IIoT devices’ network data from the gateway by collecting the IIoT network traffic through live capture using Wireshark or from the Python library Pyshark using the following script:
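A minimal sketch of such a capture step is given here for illustration; it is not the authors' script. It assumes the third-party `pyshark` package and its `tshark` dependency are installed on the gateway, uses `pyshark.LiveCapture` with `sniff_continuously`, and treats the interface name `eth0`, the packet count, and the helper names `packet_summary` and `capture_live` as hypothetical.

```python
def packet_summary(packet):
    """Extract a few fields from a pyshark-style packet object.

    Missing attributes (for example, no IP layer) fall back to defaults.
    """
    ip_layer = getattr(packet, "ip", None)
    return {
        "time": str(getattr(packet, "sniff_time", "")),
        "protocol": getattr(packet, "highest_layer", "UNKNOWN"),
        "length": int(getattr(packet, "length", 0)),
        "src": getattr(ip_layer, "src", None),
        "dst": getattr(ip_layer, "dst", None),
    }


def capture_live(interface="eth0", packet_count=10):
    """Capture `packet_count` packets on `interface` and summarize them."""
    import pyshark  # third-party; imported here so the helper above stays stdlib-only

    capture = pyshark.LiveCapture(interface=interface)
    return [packet_summary(p)
            for p in capture.sniff_continuously(packet_count=packet_count)]
```

Calling `capture_live()` on the gateway typically requires capture privileges on the chosen interface; the returned dictionaries can then be written out as CSV rows or JSON logs.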
In this research, real-time traffic monitoring and ML-based attack detection are the two processes carried out at the gateway level. The main goal of this research was to detect malicious activity and monitor various types of IIoT devices. The experimental testbed is shown in the proposed research scheme diagram in
Figure 6.
4.2. Real-Time Traffic Monitoring
For real-time traffic monitoring, the network traffic was converted to JSON logs and sent directly to the server. The following script was used for the direct conversion of the raw IoT network packets into JSON format.
As the Wazuh agent can only read flat log files, the JSON logs in the out.json files were converted to the inline JSON format. In this research, the logs converted from raw pcap to JSON spread each packet over multiple lines, which the agent could not read without using XPath filters [
27]. Using a Python script, the multiline JSON logs were converted to inline JSON logs, i.e., one log per line.
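The multiline-to-inline conversion can be sketched with the standard `json` module. This is an illustrative version, not the authors' script; it assumes the multiline file holds a single JSON array of packet objects (or one object), and the function names and the output path are hypothetical.

```python
import json


def to_inline_json(multiline_text):
    """Convert a pretty-printed JSON array into one compact object per line."""
    packets = json.loads(multiline_text)
    if isinstance(packets, dict):
        packets = [packets]  # a single packet object becomes a one-item list
    return "\n".join(json.dumps(p, separators=(",", ":")) for p in packets)


def convert_file(src_path="out.json", dst_path="out_inline.json"):
    """Read the multiline file and write the inline (one log per line) file."""
    with open(src_path, encoding="utf-8") as src:
        inline = to_inline_json(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(inline + "\n")
```

Each output line is a complete JSON document, so a line-oriented log reader can treat every packet as one event.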
4.3. Attack Detection Automation at the Gateway
The next step was to monitor the traffic using machine learning models for the detection of state-of-the-art attacks on IIoT devices. The machine learning models were trained on the CICDDoS2019 dataset, which is available in CSV format. For this dataset, the traffic was collected from the network gateway into pcap files and then converted with CICFlowMeter into CSV files containing the network traffic flow features. The dataset was pre-processed before training the models. The source and destination IP address, timestamp, and flow ID fields had to be encoded before feeding the dataset to the models. Using the information gain technique, important features that correlated with each other were identified, and the resulting dataset (with encoded and reduced features) was used to train the models. The dataset was split into 70% for training and 30% for testing. The training data were used to train the DT, RF, and KNN models, which are widely used for classification problems. The dataset had different class labels, i.e., benign and several types of DDoS attacks such as UDP flooding, SYN flooding, NetBIOS, and LDAP, to name a few. The scripts for model training and testing were written in Python using the PyCharm community version. The accuracy of each model in predicting the attacks was calculated on this dataset, and the model with the best performance measures was used to classify traffic logs as attack or benign. The following sections describe the models and their results on the CICDDoS2019 dataset.
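Two of the pre-processing steps above, information gain scoring and the 70/30 split, can be sketched with the standard library alone. This is a simplified illustration for discrete feature values, not the authors' pipeline; the function names, the fixed random seed, and the rounding of the split point are assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by grouping the labels by a discrete feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_70_30(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Features with a gain close to zero carry little information about the class label and are candidates for removal before training.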
4.3.1. K-Nearest Neighbor (KNN)
K-nearest neighbor is a supervised learning algorithm that assigns a new data point to the category whose existing data it most resembles, on the assumption that similar data belong to the same category. Although the KNN approach is most frequently employed for classification problems, it can also be utilized for regression problems. The algorithm is also called a lazy learner because at training time it only stores the dataset, and at testing time it compares the new data with the stored data and assigns it to the category of the most similar instances. The distance between the new instance and its neighbors is calculated using the Euclidean distance formula, given by:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $x$ and $y$ are two instances described by $n$ features.
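A minimal sketch of this procedure is given below. It is an illustration under stated assumptions, not the model trained in this research: the two-feature training points, the class names, and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored points nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# "Training" only stores the labeled points (lazy learning)
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((8.0, 9.0), "ddos"),
    ((8.5, 9.5), "ddos"),
    ((9.0, 8.0), "ddos"),
]

prediction = knn_predict(train, (8.2, 8.8), k=3)
print(prediction)
```

All the work happens at prediction time, which is why the cost of KNN grows with the size of the stored dataset.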
4.3.2. Decision Tree Model
A decision tree is a supervised algorithm mostly used for classification problems. The model is a tree-structured classifier in which the internal nodes represent the dataset features, the branches represent the decisions made, and the leaves represent the outcomes. In this model, the classification and regression tree (CART) algorithm is used to build the decision tree. During testing, a new instance is compared against the nodes and branches learned from the training data; according to the decision at each node, it moves to the next node and the process is repeated until the final decision is reached at the end of the tree, at a leaf. The following steps show how a decision tree is built:
The source set is split into the subsets based on attribute values tests.
The above process is repeated on the subsets recursively.
The recursive splitting process is called recursive partitioning.
The recursive partitioning process is carried out until further splitting no longer improves performance.
Instances are classified using the decision tree as follows:
The instance traverses the tree from the root node to a leaf node.
The instance is tested at the root node and, based on the attribute value, moves to the corresponding child node.
The process is repeated at each level of the tree until the instance is classified.
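The classification steps above can be sketched as a walk over a nested structure. The tree below is hand-written for illustration and is not the tree learned in this research; the feature names, thresholds, and class labels are hypothetical.

```python
def classify(node, instance):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while "label" not in node:
        value = instance[node["feature"]]
        # Binary split in the CART style: go left if the test passes, else right
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["label"]


# Hypothetical tree with two internal nodes and three leaves
tree = {
    "feature": "packets_per_second",
    "threshold": 1000,
    "left": {"label": "benign"},
    "right": {
        "feature": "syn_flag_count",
        "threshold": 50,
        "left": {"label": "udp_flood"},
        "right": {"label": "syn_flood"},
    },
}

result = classify(tree, {"packets_per_second": 5000, "syn_flag_count": 120})
print(result)
```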
4.3.3. Random Forest Classification Model
As the name suggests, the model builds a forest of decision trees for class prediction. Random forest is a supervised algorithm. The dataset is first divided randomly into subsets. The Gini index of each candidate split is calculated to measure its impurity, using the formula:
$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$
where $p_i$ is the proportion of instances of class $i$ among the $C$ classes in the node.
The impurity of the split can also be measured by calculating the entropy:
$$Entropy = -\sum_{i=1}^{C} p_i \log_2 p_i$$
The split with the lowest Gini index or lowest entropy is used for further splitting when building the decision trees. After the row and column sampling, a decision tree is built for each of the subsets. The decision trees work independently and each outputs its own prediction. The final outcome of the random forest model is the class predicted by the majority of the decision trees.
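The impurity measure and the final voting step can be sketched as follows. This is a minimal illustration using toy labels, not the trained random forest; the class names and the three example predictions are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each side."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def majority_vote(predictions):
    """Forest output: the class predicted by most of the trees."""
    return Counter(predictions).most_common(1)[0][0]


mixed_node = ["benign", "benign", "ddos", "ddos"]
pure_split = weighted_gini(["benign", "benign"], ["ddos", "ddos"])
poor_split = weighted_gini(["benign", "ddos"], ["benign", "ddos"])
forest_output = majority_vote(["ddos", "benign", "ddos"])
print(gini(mixed_node), pure_split, poor_split, forest_output)
```

The split that separates the classes completely has the lower weighted Gini index and would therefore be preferred.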
4.4. Gateway Agent Configuration with Wazuh Server
In this research, the gateway agent is connected to the server by setting the Wazuh server’s IP address in the agent’s configuration file. The agent requests an authentication key from the server at that IP address, the server responds, and the connection is established for monitoring. The Wazuh server receives the logs from the agent on port 1514, which is also defined in the agent’s configuration file ossec.conf, as shown in
Figure 7.
Real-time event monitoring of the network data is performed by analyzing the application layer protocols that are most widely used in industrial control systems. Common Industrial Protocol (CIP), DNP3, and Modbus are examples of industrial protocols. Errors in the devices or in the operations performed through these protocols can be detected from the protocol fields used in the system. The features that can be used in the rules for event monitoring are analyzed and described in the following section.
4.5. Industrial Protocols Analysis
4.5.1. Common Industrial Protocol (CIP) Analysis
The Common Industrial Protocol (CIP) uses the producer-consumer communication model to handle general-purpose network connections, network services such as file transfer, and automation operations, including analog and digital input/output devices, HMI, movement control, and position response. CIP packets carry attributes, service codes, and connections that represent data, commands, and the relationships between data and services, respectively. CIP comprises a comprehensive suite of messages and services for a wide range of industrial automation applications, including control, safety, energy, synchronization and motion, information, and network administration. Users can connect these applications to the Internet and to business-class Ethernet networks using CIP. CIP offers a single communication architecture across the industrial enterprise and is supported by hundreds of vendors globally. This architecture is media-independent, expandable, and upgradeable, so users can protect their current automation investments while making use of the benefits of open networks.
CIP has two types of messages: explicit and implicit. Explicit messages contain the attributes, service codes, and paths that direct the devices on what action to perform. Implicit messages are used for I/O data transportation. They contain no addresses or service codes; the device already knows what to do with the data, as implied by the connection ID that is decided at the time of connection. Events in industrial networks that use the CIP protocol can be monitored using the CIP attributes. The Secure Water Treatment (SWaT) setup also uses the CIP protocol. After analyzing the network traffic of SWaT, the service codes, request path, and success status were recognized as the important features for event monitoring of SWaT. The CIP fields used in the rule for threat intelligence at the Wazuh server end are therefore service_code, request_path, and success_status.
4.5.2. Modbus Protocol Analysis
Modbus is commonly used with remote terminal units (RTUs). Modbus TCP listens on port 502, while serial Modbus typically communicates over RS485 or RS232. The Modbus application layer consists of a protocol data unit (PDU) preceded by the MBAP header (Modbus application header). A Modbus message carries a transaction identifier that uniquely identifies each request, a function code that specifies the type of action to perform, and data bytes that contain information such as the start register and the number of registers to read. The transaction identifier can be used to check the number of transactions made by a device in a specific time unit. Similarly, the unit identifier can be used to check how many times an RTU, or any other communication device, connects with the same PLC or any other slave device. The main resource is the function code, from which we can identify whether the PLC is reporting an error, how many types of work a device performs in a day, and whether a specific device has been given its assigned task. Function codes 01 to 04 are used for read operations, and codes 05 to 10 (hexadecimal; in practice 05, 06, 0F, and 10) are used for write operations; together they read and write the coils, discrete inputs, holding registers, and input registers. Based on the above information, rules can be written for monitoring a network that uses the Modbus protocol and for diagnostic purposes.
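The fields discussed above can be read from a Modbus TCP frame with a few lines of code. The sketch below is an illustration only and is not part of the deployed system; the function name parse_modbus_tcp and the sample frames are hypothetical, while the header layout (2-byte transaction identifier, 2-byte protocol identifier, 2-byte length, 1-byte unit identifier, then the function code) follows the Modbus TCP framing.

```python
import struct

READ_CODES = {1, 2, 3, 4}        # read coils, discrete inputs, holding/input registers
WRITE_CODES = {5, 6, 15, 16}     # write single/multiple coils and registers


def parse_modbus_tcp(frame):
    """Extract the MBAP header fields and the function code from raw bytes."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    function_code = frame[7]
    # An exception response echoes the request code with the high bit set
    is_exception = bool(function_code & 0x80)
    base_code = function_code & 0x7F
    if base_code in READ_CODES:
        kind = "read"
    elif base_code in WRITE_CODES:
        kind = "write"
    else:
        kind = "other"
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": base_code,
        "kind": kind,
        "is_exception": is_exception,
    }


# Hypothetical request: read 3 holding registers starting at address 0x006B
request = bytes.fromhex("000100000006" "11" "03" "006b0003")
# Hypothetical exception response to the same request
error_reply = bytes.fromhex("000100000003" "11" "83" "02")

parsed = parse_modbus_tcp(request)
parsed_error = parse_modbus_tcp(error_reply)
print(parsed)
print(parsed_error)
```

A monitoring rule could then count transactions per unit identifier or raise an alert whenever is_exception is true.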
4.5.3. DNP3 Protocol Analysis
DNP3 is a three-layer protocol based on the Enhanced Performance Architecture (EPA) model and supports two-way communication. On IP networks, it is treated as an application layer protocol encapsulated within TCP/IP or UDP. Sometimes other layers are also added, e.g., a pseudo transport layer. The DNP3 protocol is used between the master and distributed remote units called outstations. The master works as an interface between the human network manager and the monitoring systems, while the remote unit sits between the master station and the physical equipment.
In DNP3 packets, the ‘length of Datalink layer’ feature gives information about the application data carried for each function of a device. The function code of the application layer identifies the action to perform. There is also a trip control field that is set when a device trips. Because protocols such as DNP3 are specialized and their function codes are predefined, a rule can look for specific function codes and generate an alert according to the meaning of that function code in the system. For example, if Function_code==15 is observed, an alert is generated with the message ‘unsolicited alarm is disabled’.
4.5.4. SWaT Network Monitoring
For SWaT process monitoring, the features of the common industrial protocol are analyzed in depth, extracted, and compared in the rules for event alert generation. CIP packets carry service codes that have fixed values for read (0x4c) and write (0x4d) commands. Example packets of the read operation are shown in
Figure 8 and
Figure 9. PLC 6 (192.168.1.60) requests the value of the level sensor LIT101 from PLC 1 (192.168.1.10), which is installed in the first stage of the SWaT system. PLC 1 responds to PLC 6 with a success status and the sensor data.
The connections between all the devices, identified by analyzing the SWaT network traffic, are summarized in
Appendix A. As discussed above, SWaT has six stages of water purification. Each of the six stages has its own local PLC, sensors, and actuators. The sensors and actuators of a stage send their values to the local PLC, and the PLCs communicate with each other to exchange the sensor data and control the whole process. The common industrial protocol used for communication in the system has both implicit and explicit messages. Implicit messages carry control I/O data; the PLCs send them to the sensors or actuators when they must read or write. The explicit messages are the important ones for event monitoring. Through explicit messages, one PLC sends a request to read the value of a specific sensor and the other PLC responds with that value. The values are then passed to the program running on the PLCs and, after the decisions are made, the control data are sent to the local devices. All the communication between the PLCs and other field devices is summarized in
Appendix A. Using this information about the SWaT network, the rules are written in Wazuh for event monitoring.
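The way the CIP service code and request path can be turned into a readable event is sketched below. This is an illustration in Python, not a Wazuh rule; the log layout, the key names ip_src and ip_dst, and the function name describe_cip_event are assumptions, while the service code values 0x4c and 0x4d and the tag LIT101 are taken from the text.

```python
SERVICE_NAMES = {0x4C: "read", 0x4D: "write"}


def describe_cip_event(log):
    """Build a short event description from decoded CIP fields."""
    code = int(log["service_code"], 16)   # accepts "0x4c" as well as "4c"
    operation = SERVICE_NAMES.get(code, "unknown")
    return f"{log['ip_src']} -> {log['ip_dst']}: {operation} {log['request_path']}"


# Hypothetical decoded log for the read request described above
event = describe_cip_event({
    "ip_src": "192.168.1.60",
    "ip_dst": "192.168.1.10",
    "service_code": "0x4c",
    "request_path": "LIT101",
})
print(event)
```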
4.6. Decoder Writing in Wazuh
On the server side, a decoder is written and added; it is applied to the logs received by the Wazuh server to extract the features that the rules need in order to monitor or detect events in the IIoT network. The decoder extracts the network and transport layer features from the logs, such as ip.src and ip.dst, MAC addresses, and some TCP fields, as shown in
Figure 10. For testing on SWaT, the common industrial protocol fields are also extracted and then used in the rules to generate the alerts.
As mentioned earlier, the decoders were written and added in this research for extracting the fields from the JSON logs.
The decoder extracts the ip.src and ip.dst field values from the logs, which are then used in the rules to generate the alerts (
Figure 11). The keywords used in the decoders are explained below.
Prematch: The prematch keyword is used to match a string that is common to the logs intended to be decoded by this decoder. If the string is matched, then the current decoder is used and the other decoders are not searched.
Regex: Regex is the abbreviation for regular expression, which the decoders use to find patterns or words in the logs. Only the fields enclosed in parentheses are extracted by the decoders.
Order: The order label must be defined together with the regex keyword. The order keyword defines the field names into which the values captured by the regex pattern are stored.
Parent: The parent keyword is used to link child decoders with a parent decoder. A parent decoder may have one or more children, but a child decoder cannot be a parent to another decoder. Many more keywords that can be used in decoders and rules can be found in [27].
Pcre2: Perl Compatible Regular Expressions, used for log interpretation. Many quantifiers are available in PCRE2; those used in our decoders are given in
Table 2.
Table 3 shows some special characters used in decoders. Further characters and quantifiers, with descriptions, are documented in [
27]. The decoder can be tested using the option “Decoders Test” by giving an example log. The test results are shown in
Figure 12.
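The roles of the prematch, regex, and order keywords can be mimicked in a few lines of Python. This is only an analogy to clarify the extraction logic; it is not Wazuh decoder syntax, and the patterns, the sample log line, and the function name decode are hypothetical.

```python
import re

# Plays the role of "prematch": a string common to the logs this decoder handles
PREMATCH = re.compile(r'"ip\.src"')
# Plays the role of "regex": only the parenthesized groups are extracted
FIELD_REGEX = re.compile(r'"ip\.src":"([^"]+)".*?"ip\.dst":"([^"]+)"')
# Plays the role of "order": names assigned to the captured groups, in order
ORDER = ("ip.src", "ip.dst")


def decode(log_line):
    """Return the extracted fields, or None if this decoder does not apply."""
    if not PREMATCH.search(log_line):
        return None
    match = FIELD_REGEX.search(log_line)
    if match is None:
        return {}
    return dict(zip(ORDER, match.groups()))


# Hypothetical inline JSON log
log_line = '{"ip.src":"192.168.1.60","ip.dst":"192.168.1.10","tcp.dstport":"44818"}'
fields = decode(log_line)
print(fields)
```

Testing a decoder against an example log, as with the "Decoders Test" option, corresponds to calling decode on a sample line and inspecting the extracted fields.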
4.7. Rule Writing and SWaT Event Monitoring
For this research, an industrial map was already available indicating where the security solution was to be deployed in the network. The devices’ IP or MAC addresses were manually configured in the rules file. Rules were written for intrusion detection, and alert generation was tested by sending requests to the network from an unknown IP address, i.e., one that was not configured in the SIEM server. The DoS attack was detected by a rule that defined a timeframe and a packet frequency; the alert was generated when the defined number of logs was received within the timeframe from the same IP address that was unknown to the system.
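The combination of an allow-list with a frequency and timeframe condition can be sketched as a sliding window. This is a Python illustration of the logic rather than the Wazuh rule itself; the class name FrequencyRule, the threshold of 3 packets in 10 s, and the attacker address are hypothetical.

```python
from collections import defaultdict, deque

# Device addresses configured in the rules (taken from the SWaT example)
KNOWN_IPS = {"192.168.1.10", "192.168.1.60"}


class FrequencyRule:
    """Alert when an unknown IP sends `frequency` logs within `timeframe` seconds."""

    def __init__(self, frequency, timeframe):
        self.frequency = frequency
        self.timeframe = timeframe
        self.seen = defaultdict(deque)   # source IP -> recent timestamps

    def observe(self, src_ip, timestamp):
        if src_ip in KNOWN_IPS:
            return False
        window = self.seen[src_ip]
        window.append(timestamp)
        # Drop timestamps that fell out of the timeframe
        while window and timestamp - window[0] > self.timeframe:
            window.popleft()
        return len(window) >= self.frequency


rule = FrequencyRule(frequency=3, timeframe=10)
alerts = [rule.observe("10.0.0.99", t) for t in (0, 2, 4, 30)]
print(alerts)
```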
All the information in
Appendix A was gathered by analyzing the network traffic of SWaT. This information was used to write the rules that monitor the network. The service code and CIP request path fields were extracted and used to generate alerts whenever an event occurred, as shown in
Figure 13.
The tag names (request path) were also defined and used to determine which sensor or actuator value was requested or transmitted in the monitored packet. Alerts were generated whenever data were requested from, or sent in response by, a programmable logic controller.
4.8. DDoS Attack Dynamic Rule Writing
The JRip algorithm is used for dynamic rule writing for DDoS attack detection. JRip works by learning patterns from the training dataset and specifying thresholds for each class given in the training dataset. A detailed description of how JRip works can be found in [28]. By feeding the CICDDoS2019 dataset into the Weka software and applying the JRip algorithm, rules were derived to detect the different types of DDoS attacks found in the dataset. The thresholds for the fields were noted from the output of the JRip algorithm. The fields to be compared in the traffic were extracted by the decoder, and the rule was written into Wazuh for attack detection, complementing the attack prediction of the machine learning model. This combined approach enhances security.
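The form of such threshold rules can be illustrated as an ordered rule list with a default class, which is how rule learners of this kind typically present their output. The sketch below is not the JRip output obtained in this research: the feature names, thresholds, and class labels are hypothetical placeholders chosen only to show how the thresholds are compared against decoded fields.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

# Hypothetical ordered rules: (class label, list of (field, comparison, threshold))
RULES = [
    ("SYN", [("syn_flag_count", ">=", 1), ("flow_duration", "<=", 50)]),
    ("UDP", [("protocol", ">=", 17), ("packet_length_mean", ">=", 400)]),
]
DEFAULT_CLASS = "BENIGN"


def classify_flow(flow):
    """Return the label of the first rule whose conditions all hold."""
    for label, conditions in RULES:
        if all(OPS[op](flow[name], threshold) for name, op, threshold in conditions):
            return label
    return DEFAULT_CLASS


flow = {"syn_flag_count": 2, "flow_duration": 10, "protocol": 6, "packet_length_mean": 60}
label = classify_flow(flow)
print(label)
```

Keeping the thresholds in a data structure rather than in code reflects the point made in the text that they can be updated periodically without changing the matching logic.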
The thresholds can be updated periodically to maintain detection performance against future attacks. A generalized DDoS attack can be detected by defining the timeframe and the packet count in a rule, as shown in
Figure 14. The following
Figure 15 is an example of such a rule.