1. Introduction
As power grids play an increasingly crucial role in modern society’s infrastructure, their reliance on specialized Information Technology (IT), known as Operational Technology (OT), Industrial Control Systems (ICS), or Supervisory Control and Data Acquisition (SCADA), becomes more critical. These systems enable engineers to monitor and control various elements of the power system from a central location. However, the trend toward greater sensor integration, cloud computing, and remote access has intertwined them more deeply with general IT systems. This integration not only improves the functionality and efficiency of these systems but also significantly increases their vulnerability to cyber-attacks. Consequently, there is an urgent need to re-evaluate and enhance the cyber security measures for these critical infrastructures, whose security requirements keep changing and therefore demand ongoing updates and improvements.
Cyber threats are evolving, with a marked increase in cyber-attacks targeting ICS. Stuxnet reportedly infected more than 200,000 machines in 14 Iranian facilities, including the Natanz uranium-enrichment plant [1]. Many analyses claim Stuxnet damaged almost one-fifth of Iran’s nuclear centrifuges, consequently delaying the Iranian uranium enrichment program [2]. The Stuxnet attack also highlighted the critical role of ICS domain knowledge in cyber-attacks.
The cyber-attacks in 2015 and 2016 targeted the Ukrainian power grid to disrupt grid operations. The 2015 cyber-attack employed the BlackEnergy3 [3] malware, targeting the distribution level and disrupting 30 substations [4]. The attack left approximately 225,000 customers without power for 1–6 h [5]. The BlackEnergy3 malware, an advanced version of the BlackEnergy2 malware, was initially used to access the corporate network. (Attacks related to the BlackEnergy malware are attributed to the Russian Sandworm group, which has targeted numerous industries in cyber and espionage operations.) The attacker then pivoted to the SCADA network [6], denying operators SCADA system interaction while remotely opening circuit breakers and taking substations offline [7].
The 2016 attack utilized the CRASHOVERRIDE malware, also known as Industroyer [8,9]. The malware directly exploited vulnerabilities in industrial protocols and disrupted a transmission-level substation, which caused an outage in Kyiv for about one hour [10]. According to Dragos, the CRASHOVERRIDE malware is not unique to any configuration or vendor [11]. It leverages network communication and grid operation knowledge, making it an immediate threat to European, Middle Eastern, and Asian grid systems. In 2022, Ukraine faced another cyber-attack using an updated version of the Industroyer malware, known as “Industroyer2”, specifically designed to compromise the country’s energy sector [12]. This sophisticated cyber-attack on critical infrastructure underscores the advanced capabilities of cyber adversaries in exploiting vulnerabilities in ICSs.
Some authors claim that a combination of security measures could have either prevented the cyber-attacks mentioned or reduced their consequences [13]. Even a basic security level outlined in the IEC 62443 standard family [14] might have prevented the 2015 Ukraine attack [15]. An important part of these measures is the capability to detect an attack.
Detecting cyber-attacks in ICS environments requires insights into the activities within the system itself, including the associated networks, communication protocols, hosts, controllers, sensors, and other ICS components. Meeting this need with effective solutions remains a challenge. Currently, the market offers only a limited number of commercial solutions, such as intrusion detection systems (IDS), equipped to handle these complex requirements. Moreover, there is a noticeable gap in implementing ICS-specific methods, as suggested by recent research [16].
1.1. Motivation and Contribution
Cyber-attack detection on ICS must occur early to prevent physical damage and process disruptions like power outages. While ICS IDSs are designed for this purpose, their adoption has been slow. This may be partly due to the lack of proven effectiveness in realistic environments and the insufficient implementation of ICS-specific methods [16]. Consequently, evaluating both commercial and open-source ICS IDSs in a controlled, realistic environment becomes essential.
It is vital to understand the differences in detection capabilities between traditional IT-specific IDSs and those designed specifically for ICS. To evaluate different IDS types, we propose running a controlled experiment in a Hardware-in-the-Loop (HIL) lab. This approach mirrors the implementation challenges in production environments, emphasizing the need for a robust experiment protocol for setup validation and fair comparison.
We have formulated three research questions:
How effective are IT-specific IDSs in detecting IT-related attacks within industrial control systems?
To what extent do ICS-specific IDSs struggle to detect novel, tailor-made attacks targeted at industrial control systems?
What are the inherent challenges in creating valid and reliable evaluation metrics for IDS performance in a live testlab environment?
Our research contributes to the field of ICS security by providing empirical insights from extensive experiments in a Digital Substation (DS) HIL testlab. We also contribute the experimental protocol that forms the foundation of these experiments. Together, these contributions advance both the theoretical understanding and practical application of IDS in the realm of ICS security.
1.2. Organization
This paper is structured as follows.
Section 2 presents an overview of industrial control systems.
Section 3 presents an overview of intrusion detection systems, and
Section 4 presents other research into evaluating IDSs in IT and OT.
Section 5 discusses the tools and experiment methods.
Section 6 presents the results.
Section 7 includes the discussion.
Section 8 contains conclusions and further work.
2. Industrial Control Systems in Power Systems
An ICS is a specialized IT system used to control a physical process. The National Institute of Standards and Technology (NIST)’s Guide to ICS Security states: “An ICS consists of combinations of control components (e.g., electrical, mechanical, hydraulic, pneumatic) that act together to achieve an industrial objective (e.g., manufacturing, transportation of matter or energy)” [17], p. 2-1. In practice, an ICS often combines vendor-specific and COTS (commercial off-the-shelf) components.
The primary function of a power system is to generate and transfer electric energy to consumers. A power system encompasses one or more generating sources and transmission lines operated under joint management to supply electricity [18]. Power system operations are enabled by specialized IT often referred to as OT, ICS, or SCADA. These systems allow engineers to monitor and control the power system’s multiple locations, generators, and substations from a central terminal. A power system ICS is commonly composed of both digital and analog components like Remote Terminal Units (RTU), Programmable Logic Controllers (PLC), Intelligent Electronic Devices (IED), Human Machine Interfaces (HMIs), breakers, sensors, and various specialized software. All these components communicate in real time through industrial communication protocols.
Security Issues in the IEC 60870-5-104 Protocol
IEC 60870-5-104 (IEC/104) [19] is a communication protocol used between SCADA systems and RTUs in power systems. It is an extension of IEC 60870-5-101 [20] and is widely used in the European power grid.
Table 1 outlines key characteristics of the IEC/104 protocol.
Despite its widespread use, IEC/104 has several vulnerabilities attackers can exploit. Some of these vulnerabilities include:
Lack of authentication: The protocol does not provide robust authentication mechanisms, making it easy for attackers to impersonate legitimate users.
Lack of encryption: The protocol does not provide encryption, making it easy for attackers to eavesdrop on the communication and steal sensitive information.
Buffer overflow: The protocol is susceptible to buffer overflow attacks, where an attacker can exploit a vulnerability in the protocol to inject malicious code into a system.
Denial of Service attacks: The protocol is susceptible to Denial of Service attacks, where an attacker can flood a system with traffic to disrupt its operation.
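The absence of authentication and encryption can be illustrated with a minimal sketch of an IEC/104 frame decoder. It assumes the standard field sizes for IEC/104 (a start octet of 0x68, a length octet, four control octets, and an ASDU header with a two-octet cause of transmission and a two-octet common address). The names `Apdu`, `parse_apdu`, and `SAMPLE_COMMAND` are hypothetical, and the sample bytes are constructed for illustration rather than captured from the testbed.

```python
from dataclasses import dataclass
from typing import Optional

START_BYTE = 0x68  # first octet of every IEC 60870-5-104 APDU


@dataclass
class Apdu:
    frame_type: str                       # "I" (information), "S" (supervisory) or "U" (unnumbered)
    type_id: Optional[int] = None         # ASDU type identification, I-frames only
    cause: Optional[int] = None           # cause of transmission (lower 6 bits)
    common_address: Optional[int] = None  # station address of the target


def parse_apdu(raw: bytes) -> Apdu:
    """Decode the header fields of a single IEC 104 APDU."""
    if len(raw) < 6 or raw[0] != START_BYTE:
        raise ValueError("not an IEC 104 APDU")
    if raw[1] != len(raw) - 2:
        raise ValueError("length octet does not match frame size")

    # The two lowest bits of the first control octet select the frame format.
    first_control_octet = raw[2]
    if first_control_octet & 0x01 == 0:
        frame_type = "I"
    elif first_control_octet & 0x03 == 0x01:
        frame_type = "S"
    else:
        frame_type = "U"

    # Only I-frames carry an ASDU; its fixed header is six octets long.
    if frame_type != "I" or len(raw) < 12:
        return Apdu(frame_type)

    asdu = raw[6:]
    return Apdu(
        frame_type=frame_type,
        type_id=asdu[0],
        cause=asdu[2] & 0x3F,
        common_address=int.from_bytes(asdu[4:6], "little"),
    )


# Hypothetical single command (type 45) with cause "activation" (6) to station 1.
SAMPLE_COMMAND = bytes.fromhex("680e000000002d010600010001000001")
```

Calling `parse_apdu(SAMPLE_COMMAND)` yields an I-frame with type 45, cause 6, and common address 1. The decoder needs no key, credential, or session secret to read the command, and the same holds for anyone who crafts such a frame, which is the weakness the list above describes.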
5. Materials and Methods
5.1. Controlled Experiment
We aimed to evaluate and compare ICS IDSs regarding their detection precision and classification, which requires a high degree of control. We chose to conduct a controlled experiment to ensure all IDSs undergo testing in a consistent and repeatable manner and eliminate biases from uncontrolled variables.
According to Wohlin [40], a controlled experiment is an empirical inquiry that manipulates some input factors or variables of the studied setting based on randomization and measures the effect on outcome variables. To conduct a controlled experiment, researchers must sufficiently manage all other variables and the environment. This control enables statistical analysis to determine the impact of changes in input variables on the output variables. Experiments thus give researchers a high degree of control over both the study and the collection of measurements.
A controlled experiment needs a controlled environment where variables are known, a way to measure the variables, and a way to ensure the experiment’s validity. We achieved this by carefully scoping and planning both the environment and the operation of the experiment. Implementing proper experiment protocols is essential to ensure the smooth operation of the experiment and to validate that the environment is under control.
5.2. Testlab
We used a DS HIL Testbed, controlled detection architecture, IDS tools, and specific attack scenarios for our controlled environment.
For the immediate environment, we used an existing HIL testbed, the DS enclave located at the Institute for Energy Technology (IFE) in Halden, Norway [41]. The DS enclave includes hardware and software for one substation breaker bay, including a protection relay. The topology of the testbed is illustrated in Figure 1.
The figure depicts the DS enclave infrastructure for a Norwegian Digital Substation, aligning with the IEC 62443 standard. It features a layered design and incorporates both station and process buses in accordance with the IEC 61850 standard series [42]. The enclave router acts as a demilitarized zone (DMZ) that separates the IT and OT environments. The infrastructure incorporates remote access software for secure connectivity and uses virtualization platforms and container technologies to support computing, storage, and networking. The design is tailored for flexibility in simulating hardware- and software-based attack scenarios and for efficient data capture and analysis, with provisions for addressing challenges such as large PCAP file management and resource limitations. A central logging server collects network traffic and Syslogs, enabling detailed data analysis and tracking of attacks. The setup supports multiple detection systems, with network traffic directed to hardware appliances and software-based IDSs.
The DS enclave is equipped with IECTest, a specialized software conforming to the IEC/104 standard, which we utilized to emulate the operations of a control center [43]. This versatile tool allowed script-based programming, enabling the automation of specific commands, such as the opening and closing of circuit breakers, to be executed at predetermined times within the experiment’s duration. Each experimental run employed the identical script to ensure consistency and replicability of results, maintaining uniformity across all test scenarios.
5.3. Detection Architecture
We set up all the IDSs with access to the same information so they could “see” the same events. All network traffic going through the edge router, the process bus switch, and the station bus switch was mirrored and fed into the IDS sensors. We also installed sensors for host-based IDS with the same privileges on the same hosts.
To remove bias in the tools used by the IDSs, we configured them to send all alerts to the same Security Information and Event Management (SIEM) system using the Syslog protocol.
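A minimal sketch of what a uniform alert line on this path could look like is given below. It assumes BSD-style syslog framing, where the priority value is the facility multiplied by eight plus the severity. The function name `to_syslog_line`, the choice of the local4 facility, and the host and message values are hypothetical and do not reflect the actual configuration of any IDS in the testbed.

```python
from datetime import datetime

FACILITY_LOCAL4 = 20  # hypothetical facility reserved for IDS alerts
SEVERITY = {"alert": 1, "warning": 4, "info": 6}  # standard syslog severity codes


def to_syslog_line(ids_label: str, host: str, message: str,
                   when: datetime, severity: str = "warning") -> str:
    """Render one alert as a BSD-style syslog line: <PRI>TIMESTAMP HOST TAG: MSG."""
    pri = FACILITY_LOCAL4 * 8 + SEVERITY[severity]
    # BSD syslog pads single-digit days with a space rather than a zero.
    timestamp = f"{when:%b} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{timestamp} {host} {ids_label}: {message}"
```

Tagging each line with the anonymized IDS label (for example `IDS-D`) would let the SIEM attribute alerts to their source while every tool is compared on the same fields.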
In testing and creating an experiment protocol for ICS IDS evaluation, we used three commercial and two open-source (free) IDSs: Cisco Cyber Vision (https://www.cisco.com/c/en/us/products/security/cyber-vision/) (accessed on 10 October 2023) (IDS-A), Omicron StationGuard (https://www.omicronenergy.com/en/products/stationguard/) (accessed on 10 October 2023) (IDS-B), Secure-NOK SNOK (https://www.securenok.com/our-products/) (accessed on 10 October 2023) (IDS-C), Suricata (https://suricata.io/) (accessed on 10 October 2023) (IDS-D), and Wazuh (https://wazuh.com/) (accessed on 10 October 2023) (IDS-E). A, B, and C are commercial, and D and E are free and open source. A, B, and D are network-based, E is host-based, and C is a hybrid. B, D, and E are rule-based, A is both anomaly- and signature-based, and C is behavior-based. For our experiment, the most interesting difference between the five IDSs is how they can be categorized based on the environments they target.
Environment-focused IDSs: Cisco Cyber Vision, Omicron StationGuard, and Secure-NOK SNOK are specifically designed for OT environments and industrial control systems. They are tailored to address the unique requirements and protocols used in OT networks.
General-purpose IDSs: Suricata and Wazuh are versatile solutions that can be deployed in various environments, including IT and OT networks. They offer a broad range of detection capabilities suitable for different use cases.
The IDSs can also be categorized by their detection capabilities and deployment options.
Detection capabilities: Cisco Cyber Vision, Omicron StationGuard, and Secure-NOK SNOK focus on monitoring OT-specific protocols, while Suricata and Wazuh offer broader detection capabilities suitable for a variety of network protocols and traffic patterns.
Deployment options: Cisco Cyber Vision, Omicron StationGuard, and Secure-NOK SNOK are commercial solutions with vendor-specific support and deployment options. Suricata and Wazuh are open-source projects, offering greater flexibility for customization and integration with other security tools.
5.4. Attack Scenarios
To assess the precision and attack classification capabilities of the IDSs, we selected a set of validated cyber-attacks [43] for implementation in the testbed.
In the 2015 Ukraine power grid attack, attackers initially gained access to electrical breakers and then executed a denial-of-service strategy to lock out the operators [4]. We employed a carefully selected set of validated attacks to mimic this attack sequence on a smaller scale. Our approach included phases of active reconnaissance (Attack 1), followed by the opening of breakers (Attacks 2 and 3), and culminating with a series of denial-of-service attacks (Attacks 4 and 5). The objective was to evaluate the IDSs in a setting that mirrors real-world conditions, where an attacker typically utilizes various attack methods rather than a single, isolated approach. The adversary profile was a competent, motivated, skilled attacker who has established a foothold through privilege escalation and pivoting. We scripted the five attacks to run in a fifteen-minute window, constituting one experiment run.
Figure 2 illustrates the timeline of our executed attacks, with further details provided in Table 3.
5.5. Experiment—Steps and Quality Assurance
Our experiment protocols can be seen in Table 4 and Table 5. We needed to be explicit in using two protocols: one for setup to verify that we had a controlled environment and one for running the experiment. Secondary protocols for running the attacks were also created but are outside the scope of this paper and not presented.
5.6. Analysis
Our goal was to verify the effectiveness of the IDSs we compared and evaluated [44]. The confusion matrix, shown in Figure 3, is the most commonly used tool in such cases. Its four terms are:
True Positive (TP): Attacks/intrusions that are successfully detected
False Positive (FP): Normal events that are wrongly classified as attacks
True Negative (TN): Normal events that are classified as such
False Negative (FN): Attacks/intrusions that are not detected
Figure 3.
A confusion matrix illustrating the relationship between true positives, true negatives, false positives, and false negatives.
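As a minimal sketch of how the four terms can be tallied, the function below counts them from pairs of ground truth and IDS verdict. The name `confusion_counts` and the input shape are assumptions made for illustration and are not part of the experiment tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def confusion_counts(events: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Tally TP, FP, TN and FN from (is_attack, was_alerted) pairs."""
    counts: Counter = Counter()
    for is_attack, was_alerted in events:
        if is_attack and was_alerted:
            counts["TP"] += 1   # attack that was detected
        elif is_attack:
            counts["FN"] += 1   # attack that was missed
        elif was_alerted:
            counts["FP"] += 1   # normal event wrongly flagged
        else:
            counts["TN"] += 1   # normal event left alone
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}
```

With two attacks of which one is detected, and three normal events of which one is flagged, the function returns one TP, one FN, one FP, and two TN.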
As the effectiveness of an IDS is best measured by combining the terms in the confusion matrix, other numeric values are usually calculated based on these terms [44]. Since we did not evaluate network-level values, metrics like the classification rate (CR), which depends on the number of correctly classified instances and the total number of instances, were not ideal [45]. Instead, we chose the detection rate (DR) as our primary metric, which is also relevant at our aggregated level. We compared the detected attacks gathered in the SIEM with the attacks executed in each test:

$$ DR = \frac{\text{number of detected attacks}}{\text{number of executed attacks}} $$
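A minimal sketch of this comparison follows, assuming DR is the fraction of executed attacks that also appear among the attacks detected in the SIEM. The function name `detection_rate` and the attack IDs in the usage note are hypothetical and are not results from the experiment.

```python
from typing import Iterable


def detection_rate(executed: Iterable[int], detected: Iterable[int]) -> float:
    """Share of executed attack IDs that the IDS also reported."""
    executed_ids = set(executed)
    if not executed_ids:
        raise ValueError("at least one executed attack is required")
    # Alerts that match no executed attack do not raise the rate.
    hits = executed_ids & set(detected)
    return len(hits) / len(executed_ids)
```

For example, `detection_rate([1, 2, 3, 4, 5], [1, 3, 4, 9])` gives 0.6: three of the five executed attacks were reported, and the unmatched alert 9 is ignored.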
We also separately considered whether the IDS could correctly classify an attack according to the ICS domain. This mainly addresses the issue that an IDS could detect the attacks in Table 3 as attacks without necessarily classifying them correctly by attack type. For example, an IDS might detect attack ID 3 as ARP poisoning but still fail to classify it as an operating failure or breaker opening.
7. Discussion
7.1. IDS Performance in ICS Environments
In our study, the performance of IT-specific IDSs, particularly IDS-D and IDS-E, highlighted their limited effectiveness in detecting many ICS-specific attacks, aligning with expectations and existing research. Notably, IDS-D failed to identify ARP attacks (Attacks 3 and 4), which are IT-specific but impact the ICS at layer 2 of the OSI model. This failure is attributed to IDS-D being developed for higher OSI model layers, confirming the known limitations of IT-centric systems like Suricata in ICS environments. These systems often lack support for key ICS protocols essential for effective threat detection in SCADA networks [46]. However, this observation also suggests that IT-specific IDSs are only partially ineffective in ICS contexts, as they can detect specific IT-based attacks that traverse into ICS environments. This underscores the necessity for a layered security approach, combining IT and ICS-specific measures, to provide comprehensive protection against a broad spectrum of cyber threats. Our inclusion of an IT-centric IDS was to validate these limitations empirically, reaffirming the need for IDS solutions that are either inherently designed for or skillfully adapted to the unique requirements of ICS. This emphasizes the importance of a thoughtful and informed approach in selecting and configuring IDSs for robust security in specialized ICS environments.
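The layer 2 gap discussed above can be illustrated with a minimal sketch of an ARP binding check, which flags an IP address that is suddenly claimed by a different MAC address. This is an illustrative technique only and does not describe how any of the evaluated IDSs work. The function name `find_arp_conflicts` and the addresses in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def find_arp_conflicts(
    observations: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (ip, known_mac, new_mac) for every ARP reply that contradicts
    the first MAC address seen for that IP address."""
    bindings: Dict[str, str] = {}
    conflicts: List[Tuple[str, str, str]] = []
    for ip, mac in observations:
        # Remember the first binding; later replies are compared against it.
        known_mac = bindings.setdefault(ip, mac)
        if known_mac != mac:
            conflicts.append((ip, known_mac, mac))
    return conflicts
```

Given replies that bind `10.0.0.1` first to `aa:aa:aa:aa:aa:aa` and later to `cc:cc:cc:cc:cc:cc`, the function reports one conflict for that address. A check of this kind operates on link-layer frames, which is the layer the paragraph above identifies as outside the focus of IDS-D.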
7.2. Comparative Analysis of Rule-Based vs. Behavior-Based IDSs
In our experimental analysis, the effectiveness of IDS-A, as detailed in Table 7, was not as high as expected in detecting attacks. The extensive need for tuning and resetting in the lab might have significantly reduced the effectiveness of the IDSs by altering their baselines. This issue was not unique to IDS-A; we observed similar challenges across all ICS-specific IDSs, including IDS-B and IDS-C. The dynamic and continually changing testbed environment, with potentially outdated baselines, underscores a significant challenge in maintaining the effectiveness of ICS-specific IDSs within such variable environments. This highlights the need for continual adaptation and updating of IDS configurations to stay aligned with evolving ICS environments and threat landscapes, emphasizing the complexity and expertise required for effective deployment and management of these systems.
Furthering our analysis, we observed that within the constraints of our testlab, the rule-based IDS-B outperformed the behavior-based IDSs, IDS-A and IDS-C. As shown in Table 8, only IDS-B correctly classified attacks within the ICS domain. We attribute this notable performance difference primarily to the dynamic nature of our testlab, characterized by frequent changes that can significantly challenge behavior-based IDSs. These systems, particularly IDS-A and IDS-C, rely on machine learning algorithms and require stable and consistent environments for sufficient training and baseline establishment. The continual alterations in the testlab environment necessitated frequent updates to these baselines, impacting the accuracy and reliability of these behavior-based systems.
In contrast, the rule-based IDS-B, which operates based on predefined rules rather than learned behaviors, was less susceptible to such environmental variability. Its performance, therefore, remained more consistent and effective in detecting intrusions despite the changing conditions of the testlab. This observation underscores the importance of considering the operational environment and its stability when deploying IDSs, especially behavior-based ones that rely on machine learning algorithms for anomaly detection.
7.3. Challenges and Insights from IDS Testing in Dynamic Environments
In the broader context of evaluating IDSs within SCADA environments, other studies are noteworthy reference points [33,34]. However, our research expands upon this by evaluating a range of IDSs and implementing a thorough and transparent experimental protocol. This detailed documentation of our methodology significantly enhances the reproducibility and credibility of our findings. Moreover, the comprehensive nature of testing in a live lab environment, including all inherent system complexities and potential faults, offers a more realistic assessment of IDS performance in actual operational settings. This approach underscores the value of thorough system testing, which can reveal intricacies and challenges that might not be evident in more controlled or theoretical evaluations. Additionally, our focus on examining commercial IDSs caters to the industry’s growing interest and need for practical, deployable solutions, moving beyond theoretical models to evaluate how these systems perform in real-world scenarios. This shift toward a more holistic and pragmatic evaluation approach is crucial for advancing the field of cyber security in SCADA and other critical infrastructure systems.
7.4. Development and Refinement of Experimental Protocol
Our experience with developing the experimental protocol revealed several challenges in creating a controlled environment for evaluating IDS performance. Each of the experiments conducted in the lab led to iterative improvements in the protocol. Initial experiments highlighted the necessity for an IDS setup protocol and modifications to address network and topology changes. Subsequent runs emphasized the importance of having a singular focus for each experiment and allocating time and resources to address setup issues. The reported experiment, with its specific attack scenarios and focused goal, demonstrated the effectiveness of the refined protocol in evaluating ICS IDS precision and classification. Our journey in developing this protocol underscores the complexity and dynamism of ICS environments and the critical need for robust, adaptable, and comprehensive experimental protocols to reliably assess IDS performance.
7.5. Limitations and Challenges of the Experimental Setup
While effective in comparing different IDS solutions, our experimental setup faced certain limitations. Notably, the absence of a control group, typically essential for baseline comparisons, might limit our ability to assess the absolute effectiveness of each IDS. However, in our context of comparative analysis, this limitation was mitigated as our focus was primarily on contrasting the performance of various systems.
The limited number of test runs could impact the experiment’s robustness, potentially affecting our findings’ generalizability. This concern is supported by the fluctuation in alert numbers shown in Table 6. Nevertheless, these runs provided sufficient data for a preliminary analysis, particularly given the controlled, hardware-in-the-loop environment and the specific scenarios tested.
Changes to the testlab architecture shortly before the experiments may have influenced the tuning and effectiveness of the IDS. While not ideal, this situation presented a realistic scenario of IDSs adapting to evolving environments, offering valuable insights into their flexibility.
Lastly, the lack of a fully realized SCADA HMI in our setup might limit the applicability of our findings to real-world scenarios. However, this also allowed for a more focused examination of IDS performance in a controlled setting, which is crucial for the initial evaluation stages.
8. Conclusions and Further Work
Our experiments critically evaluated various IDS types in a realistic substation environment, including Network and Host-based IDSs and ICS-aware systems, revealing crucial insights into their effectiveness against cyber-attacks in ICS. This evaluation highlighted the strengths and weaknesses of both commercial and open-source IDSs, guiding their implementation in ICS. Surprisingly, IT-related attacks (Attack IDs 3 and 4) were not detected by IT-specific IDSs, indicating that these systems might only effectively identify IT-related ICS attacks with significant tuning.
Our primary aim was to determine the capability of ICS-specific IDSs in detecting tailor-made attacks on industrial control systems. However, the changing testlab environment hindered a conclusive assessment due to unstable baselines critical for behavior-based IDSs. In our study, the Omicron StationGuard, a rule-based IDS, showed the most consistent detection of tailor-made attacks in our testlab. However, it is important to note that the variable nature of the testlab may have influenced these results. Further research in a more stable setting is essential to verify these findings and better understand anomaly-based IDS effectiveness in industrial control systems.
Additionally, the creation of valid and reliable evaluation metrics for IDS performance faced challenges due to the dynamic nature of the testlab, the absence of a complete SCADA HMI, and limitations in test runs and time. These factors emphasize the complexities in replicating operational ICS conditions and assessing IDS effectiveness in such environments. Our study’s adherence to a standardized protocol, building on the gaps identified by Giraldo et al. [28], enhances the reliability and applicability of our results, setting a foundation for future research to explore further and improve IDS performance in ICS.
We suggest broadening the range of IDS testing in ICS environments to better understand diverse system performances. Establishing a more stable and controlled testlab environment is crucial for enhancing the reliability of IDS evaluations, especially for behavior-based systems. Adding a complete SCADA HMI to our testlab will more accurately mimic operational conditions. We also recommend increasing the duration and number of test runs to thoroughly assess IDS effectiveness against a wider array of sophisticated and novel cyber threats.