1. Introduction
Industrial control systems (ICSs) are increasingly critical in modern infrastructure and in major projects such as hydraulic facilities, transportation, energy, and the chemical industry. In this sense, ICS security is directly linked to the smooth operation of critical infrastructures [1]. When ICSs are attacked, the physical world is directly harmed through environmental pollution, power outages, oil leaks, and explosions. With the acceleration of the digitalization of ICSs, the integration of industrialization and informatization has gradually strengthened. Due to the increasing openness of industrial control systems, the systems face an increasing number of threats. Hence, timely and accurate anomaly detection in ICSs is essential for reflecting the security status of the production process and determining the vulnerability of industrial control systems. Maintaining the secure operation of industrial control networks is increasingly critical to improving production efficiency and safety [2,3].
ICSs are a series of control systems, including supervisory control and data acquisition systems, distributed control systems, programmable logic controllers (PLCs), and other control systems and control units. An ICS ensures the safe, reliable, and secure operation of industrial processes. In ICSs, malicious attacks are possible due to inherent loopholes in communication protocols. Industrial control networks have faced constant threats in recent years; for example, the Stuxnet virus attacked PLC code and thereby disrupted the regular operation of centrifuges. Thus, numerous researchers have devoted themselves to constructing state models for ICSs to enable anomaly detection against different attacks.
The ICSs’ layers interact and communicate with one another via the network while carrying out their specific assigned tasks. An ICS is vulnerable in both the network and physical layers due to the close coupling between cyberspace and physical space. An attacker may launch a cyberattack that results in malicious software, data asset theft, or tampering with equipment, leading to the loss of crucial control information and the failure of crucial control commands.
The threat of attack has caused worldwide concern about the cyber security of ICSs. Given the numerous attack threats faced by ICSs, Teixeira et al. proposed an ICS pass-through attack model based on three-dimensional information and physical space to characterize the various attack means in different spaces and to illustrate the characteristics of multiple types of attacks [4]. Accordingly, Adepu et al. proposed a framework for describing physical attacks, cyberattacks, and other types of attacks by dividing them into domain, attacker, and attack models [5]. In light of the wide variety of attack types, complex attack paths, and variable attack strategies facing ICSs, it is challenging to construct a mathematical model covering all scenarios.
Currently, data-driven methods that extract information from process data and build monitoring models have become a hotspot in anomaly detection research. The advancement of sensor technology has allowed almost all industrial objects to be equipped with various types of sensors, resulting in a great deal of data being collected in industrial processes. By merging data from various sources and examining the correlations within the information, data-driven anomaly detection methods can detect whether a system is under attack. A relational model that captures the intruder's identity, velocity, level of threat, and target of intrusion was developed, which serves as a foundation for continuous cyberspace state monitoring [6]. Lu et al. proposed a security monitoring method for industrial control networks based on an improved C-SVC (C-Support Vector Classifier), which can effectively identify multiple types of abnormal states and form situational awareness results [7]. A hidden Markov model-based attack detection scheme for Stuxnet has been proposed for industrial control systems subject to random packet dropouts [6]. Despite being based on mechanistic models of attack-induced abnormal states, the methods above have inherent limitations when applied to large-scale complex industrial processes.
Considering the large scale and complexity of the systems in question, rather than modeling complex process mechanisms, researchers have monitored network security status by analyzing the process data in industrial control networks. Multivariate statistical process monitoring (MSPM) methods have been widely studied and applied over the past few decades [8]. Rather than modeling a particular attack, MSPM depicts the operational state of the system, and attack detection on ICSs is achieved by measuring deviations from that operational state. Among the most well-known representative branches of statistical process monitoring is principal component analysis (PCA), which is regarded as an effective means of dimension reduction. PCA identifies the major changes in data by decomposing multiple related variables into several orthogonal principal components [9,10]. The PCA-based MSPM approach enables monitoring by modeling the variable space of the system, where Hotelling's $T^2$ and the squared prediction error $Q$ are used as the monitoring statistics [11,12].
Although PCA is widely used to detect anomalies, it does not perform as well when its assumptions are violated. The underlying Gaussian assumption in the calculation of the control limits of the monitoring statistics makes PCA a poor monitoring tool for non-Gaussian processes. A variety of PCA variants have been proposed for nonlinear processes, including probabilistic PCA (PPCA) [13] and kernel PCA (KPCA) [14], in which the data are projected into a high-dimensional space. In essence, KPCA remains a linear dimensionality reduction method, and its effectiveness is heavily influenced by the choice of kernel function, which is not appropriate for systems with nonlinear or stochastic perturbations. Within the maximum likelihood framework, PPCA measures the similarity between new data points according to their probability density functions [15,16]. Canonical variate analysis (CVA), which provides a more accurate description of the process by maximizing the correlation between the main dependent and quality variables [17,18], is another valid method for incorporating both static and dynamic process characteristics. Zhang et al. developed a CVA-based modeling and monitoring method for simultaneous static and dynamic analysis in three-phase flow processes [19,20]. A fault information-aided canonical variate analysis and a structured monitoring strategy have been proposed to improve the anomaly detection rate [17]. However, the process is usually assumed to operate under a single condition, whereas industrial processes often operate in multiple modes.
For plant-wide processes, multiblock methods were introduced as a solution to these problems. Generally, block division is the key step in sub-block modeling. These methods can be classified into two main categories: data-driven and knowledge-based. Knowledge-based methods usually divide process variables into blocks based on field experience and prior process knowledge. A hierarchical multiblock total projection to latent structures (T-PLS) based operating performance assessment scheme was proposed to identify anomalies in operating statuses [21]. Using prior process knowledge, Zhu et al. proposed a distributed parallel PCA process monitoring framework to decompose the high-dimensional process variables [22]. When accurate prior knowledge is lacking, monitoring and anomaly detection performance may be suboptimal if the process variables are not correctly divided.
Data-driven methods have also been extensively used to divide variable blocks in distributed process monitoring using process measurements from industrial historians. The data-driven approach clusters variables into sub-blocks by evaluating the correlations between variables. For instance, Hu et al. used mutual information (MI) analysis to extract the complex relationships between each possible process variable and the burn-through point in the sintering process [23]. Zhang et al. investigated an improved mixture probabilistic principal component analysis with clustering for nonlinear process monitoring, in which k-means is subsequently utilized as a clustering algorithm to divide the variables into optimal sub-blocks [24]. Minimal-redundancy maximal-relevance was used to divide the most related variables into the same block and form a dynamic multiblock monitoring framework [25]. With mutual information-spectral clustering, the measured variables were automatically divided into sub-blocks, on which a Bayesian inference-based multiblock KPCA monitoring model was established [26]. Combining knowledge-based and data-driven approaches, Cao et al. developed a hierarchical hybrid distributed PCA for the plant-wide monitoring of chemical processes with a two-layer sub-block division.
Although the aforementioned monitoring strategies have been demonstrated to be effective, their monitoring performance may not be optimal when faced with sophisticated cyberattacks. On the one hand, network layer attacks such as data injection present more randomness and uncertainty than faults in the system. These characteristics mean that traditional monitoring methods, which are modeled on normal samples, fail to identify the dynamic characteristics caused by attacks. Specifically, PCA-based monitoring methods cannot fully extract the state characteristics of the system in the principal component space, leading to missed detections and false alarms in the monitoring results. On the other hand, when confronted with large-scale complex systems, the traditional centralized modeling approach cannot adequately reconstruct the system's state characteristics.
Motivated by the above research status, a concurrent distributed monitoring method is proposed to tackle ICS attack detection tasks. Using a two-stage distributed modeling approach, the state characteristics of the system can be fully extracted. With the MI method, the decision variables are selected and the distributed structure is realized. Then, PPCA models are computed on both the serially correlated subspace and its residual subspace obtained by canonical variate analysis, which makes a complete interpretation of process dynamics in ICSs possible.
In the proposed framework, all detection variables were first selected by one-way analysis of variance, and the detection variables were further divided into sub-blocks using a combination of knowledge-based strategies and mutual information. Then, CVA-PPCA monitoring models were established for each sub-block, in which CVA was used to explore the serial correlations, and PPCA-based monitoring models were constructed for the variables of each subspace. Finally, Bayesian inference was used to obtain comprehensive statistical indicators of the ICSs, which realize plant-wide anomaly detection. Thus, the dynamic characteristics of the ICSs were restored, allowing for a deeper understanding of their security status. The main contributions of the present work are as follows.
An adaptive process variable selection and blocking method for distributed monitoring was implemented by combining knowledge-based strategies with mutual information.
Both linear and nonlinear behaviors were analyzed and monitored, providing a meaningful interpretation for the fine-scale identification of ICS attacks.
The rest of this paper is organized as follows. The problem description and monitoring framework are given in Section 2. Section 3 outlines the proposed concurrent distributed CVA-PPCA-based monitoring method in detail. Section 4 details a validation of the effectiveness of the proposed method on actual drilling processes. Finally, conclusions are drawn in Section 5.
2. Problem Description and Modeling Framework
In this section, the problems of ICS security monitoring are summarized. On this basis, a monitoring model framework is designed.
2.1. Problem Description
ICS is an umbrella term for the various network-connected control systems in the industrial field. Over the past few decades, ICSs have greatly enhanced the degree of industrial process automation while also introducing certain security risks.
Figure 1 shows a typical industrial control network architecture for the geological drilling processes. A controller employs a communication network to regulate the operation of the controlled process by measurements from geographically dispersed sensors.
During the drilling process, the PLC controls the industrial equipment and reads the data from the field sensors. The Profibus communication protocol is used for communication between the PLC and the industrial control computer, and the WinCC configuration software reads the data from the PLC over the OLE for Process Control (OPC) protocol. The system adopts an MVC (Model-View-Controller) architecture, which enables intelligent optimization control as well as complicated logic operations.
A system failure resulting from an attacker's deliberate destruction or manipulation of actuators, control units, etc., is another manifestation of ICS vulnerability in the physical layer. Network attacks and instrument malfunctions both appear as anomalies in the data sampled by the sensors. The difference is that network attacks cause equipment failure, so the data usually show a causal relationship between them. Additionally, network attacks tend to preserve the statistical characteristics of the data, whereas equipment failures often result in outliers, missing values, and other easily observable changes. Due to the complexity of physical layer attacks, the attack detection algorithms in this paper only address attacks suffered at the network layer.
False data injection is a common network layer attack. When sensors transmit sensing data to the PLC, the data may be tampered with, leading to instability of the control system. In this attack, the original correct measurement value $y(t)$ at moment $t$ is tampered with, resulting in the measurement value $\tilde{y}(t)$ deviating from the normal value $y(t)$, which causes the feedback control system to perform incorrect responses. The attack process can be expressed as [7]

$$\tilde{y}(t) = \alpha y(t) + \beta, \quad t \in [t_s, t_e]$$

where $\alpha$ and $\beta$ are the impact indices, which are usually constants, and $[t_s, t_e]$ is the attack period. This paper assumes that the anomalous state of the system is caused by fake data injected by the attacker.
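To make the attack model concrete, the following minimal Python sketch injects false data into a simulated sensor stream according to the linear tampering model above (the symbols $\alpha$ and $\beta$ follow our reconstruction, and all function and variable names are illustrative, not taken from the original):

```python
import numpy as np

def inject_false_data(y, alpha, beta, t_start, t_end):
    """Tamper with sensor stream y over the attack period [t_start, t_end).

    alpha, beta: constant impact indices of the linear attack model
    y_tilde(t) = alpha * y(t) + beta.
    """
    y_attacked = y.copy()
    y_attacked[t_start:t_end] = alpha * y[t_start:t_end] + beta
    return y_attacked

# Example: simulated ROP readings sampled at 1 s intervals
rng = np.random.default_rng(0)
rop = 5.0 + 0.1 * rng.standard_normal(600)

rop_surge = inject_false_data(rop, 1.0, 3.0, 160, 165)   # abrupt step change
rop_bias  = inject_false_data(rop, 1.0, 0.3, 160, 400)   # slow deviation
```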
In general, false data injection attacks involve the manipulation of system measurements by an attacker who is aware of the setup of the system. Such attacks are difficult to monitor directly because the tampered data closely resemble normal measurements. The three primary types of false data injection attacks are surge attacks, deviation attacks, and geometry attacks. Each disrupts the normal operation of the system to a different degree and at a different rate and, when severe, is likely to result in serious accidents.
Figure 2 presents histogram plots of partial variables in the geological drilling process, taking the rate of penetration (ROP) as an example. Clearly, the distribution of non-optimal data (shown by the red area) is mostly contained within the distribution of optimal data (represented by the blue area). This being the case, one of the most important concerns in ICS security monitoring is how to further parse the data features. Monitoring the current network security status can assist decision makers in determining whether an attacker intends to launch an attack, thereby guaranteeing the stable and secure operation of the system.
In a data tampering attack, the attacker tampers with the measured values of a system based on knowledge of the system configuration, so the attack cannot be detected intuitively. Therefore, the following challenges must be faced when investigating ICS-oriented attack detection methods.
Complexity: The number of current cyberattacks on ICSs is increasing, with attackers exploiting ICS vulnerabilities to deliver different types of attacks and threats.
Crypticity: There are insufficient means of identifying attack behavior, and the attack detection false alarm rate is high due to attackers deliberately confusing the attack with the normal operation of the control system.
Therefore, an essential component of achieving ICS attack detection involves developing a monitoring model that accurately captures the dynamic aspects of the attack behavior.
2.2. Modeling Framework
The objective of this study was to detect the abnormalities of ICSs by constructing a process monitoring model based on the sufficient normal data of related detection variables. A novel CVA-PPCA-based monitoring method was presented to overcome the shortcomings and improve the performance of network anomaly identification. The framework of the proposed network condition monitoring scheme is shown in
Figure 3.
The monitoring model consists of two parts: offline modeling and online monitoring. Through one-way analysis of variance, the detection variables related to ICS operating performance were chosen; these were then further divided into reasonable sub-blocks by MI analysis combined with prior knowledge. Within each sub-block, the CVA method was used to divide the variables, according to their correlations, into a correlated canonical subspace and a residual subspace. Then, a PPCA-based monitoring model was established in the canonical subspace. Finally, Bayesian inference was used to obtain comprehensive statistical indicators of the whole process, which realize anomaly detection.
For online monitoring, real-time monitoring statistics can be compared with historical data to determine the overall performance of the integrated monitoring system and to define the detection thresholds according to attack type. Anomalies can then be detected by comparing the monitoring statistics to see if the limits have been exceeded.
3. Implementation of the Monitoring Model
In this section, the ICS security monitoring model is established. Firstly, sub-block division was carried out using one-way analysis and mutual information analysis. Using the CVA method, the original variable space was divided, and the PPCA monitoring model with preset control limits was constructed. To achieve online monitoring, the online data are used to calculate the monitoring statistics and compare them to the detection threshold.
3.1. Sub-Block Division Based on One-Way Analysis and Mutual Information Analysis
There are usually multiple industrial controls and multiple systems within ICSs. The whole process contains a number of detection variables. The multi-block modeling approach is an effective way to deal with the anomaly detection problem of large-scale processes. To fully extract the correlations between variables, sub-block division is necessary before offline modeling.
A two-stage delineation method was used in this study to create a multi-sub-block structure, with one-way analysis of variance being selected in the first stage to determine the operational state-related decision variables, which was followed by mutual information analysis and process knowledge for sub-block delineation.
In the first phase, one-way analysis of variance (ANOVA) can be used to determine the effect of the different operating modes on the distribution of variable data. By measuring the difference in the variance fluctuations caused by different operating conditions and random errors, ANOVA determines if changes in the operating conditions are a major factor in system operation.
There are five normal geological drilling conditions: drill up and down, rotary drilling, back reaming, hole sweeping, and sliding drilling. Assume that the number of operating conditions is $r$, that the number of samples selected for condition $i$ is $n_i$ with $n = \sum_{i=1}^{r} n_i$ in total, and that the drilling data for condition $i$ are recorded as $x_{ij}$ ($j = 1, \dots, n_i$). The degree of variation $S_T$ between the drilling data can be calculated as follows:

$$S_T = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}\right)^2$$

where $\bar{x}$ is the mean value of all data collected for the variable, and $\bar{x}_i$ is the mean value of the variable in the data set for mode $i$. Furthermore, $S_T$ can be decomposed into the sum of its error sum of squares and effect sum of squares, which is denoted as $S_T = S_E + S_A$, where $S_E$ and $S_A$ are relatively independent and can be defined as

$$S_E = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2, \qquad S_A = \sum_{i=1}^{r} n_i\left(\bar{x}_i - \bar{x}\right)^2$$

According to the above definition, it is clear that $S_A$ measures the distributional differences between different drilling conditions, while $S_T$ measures the variation globally. Thus, it is possible to effectively select the operating status-related variables of the ICSs. The degree of influence of a variable is measured by constructing a test statistic $F$ and its test probability $p$:

$$F = \frac{S_A/(r-1)}{S_E/(n-r)}$$

where $F$ obeys the $F(r-1, n-r)$ distribution, and the test probability is $p = P\{F(r-1, n-r) \ge F\}$. The smaller the test probability, the greater the effect of the parameter on the operating conditions.
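A minimal Python sketch of this selection test, mirroring the $S_A$, $S_E$, and $F$ definitions reconstructed above (the function name is ours; SciPy's scipy.stats.f_oneway computes the same quantities):

```python
import numpy as np
from scipy import stats

def anova_test_probability(groups):
    """One-way ANOVA for a single variable measured under several modes.

    groups: list of 1-D arrays, one per operating condition.
    Returns (F, p): the larger F (the smaller p), the stronger the effect
    of the operating condition on this variable.
    """
    n = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = np.concatenate(groups).mean()
    ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # effect
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # error
    F = (ssa / (r - 1)) / (sse / (n - r))
    p = stats.f.sf(F, r - 1, n - r)   # P{F(r-1, n-r) >= F}
    return F, p
```

Variables whose test probability falls below a chosen significance level would then be kept as operating status-related decision variables.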
Table 1 presents the test probability of each parameter based on 1800 samples of data collected from the industrial control network in the drilling process. Clearly, the test probabilities of two of the parameters are significantly higher than those of the other variables, which is also consistent with the process knowledge. The remaining 10 variables were therefore selected as the operating status-related decision variables.
In the second stage, the detection variables are blocked based on MI combined with prior knowledge. MI determines whether a detection parameter's data distribution and a performance indicator's distribution are interdependent. When several variables interact, MI measures how much of the initially contained entropy decays; information entropy is not constant but varies with the events that occur. MI is commonly interpreted as a metric that quantifies the degree and strength of dependence between two variables. Specifically, given two random variables $X$ and $Y$, the mutual information between them is defined as

$$I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x)$ and $p(y)$ are the marginal probability density functions of $X$ and $Y$, and $p(x,y)$ is the joint probability of $X$ and $Y$.
As this expression represents the uncertainty in $X$ that is removed by knowing $Y$, it confirms the intuitive meaning of MI as the amount of information one variable provides about another. By analyzing the physical mechanism of drilling production, it can be seen that several of the selected variables belong to the mud system, while others are ROP-influencing parameters. Then, according to the blocking criterion [13], the variables were divided into three sub-blocks. Hence, the detection variables were blocked according to their interrelationships using MI combined with prior knowledge, and the CVA-PPCA anomaly detection model was then applied on the distributed sub-block structure.
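As a sketch of how the MI-based blocking could be computed in practice (a simple histogram estimator of $I(X;Y)$; the binning choice and pairwise-matrix grouping are our assumptions, not the paper's exact procedure):

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Histogram estimate of I(X;Y) = sum_xy p(x,y) log[p(x,y)/(p(x)p(y))]."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)         # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)         # marginal p(y)
    nz = pxy > 0                                # skip zero cells (log(0))
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

def mi_matrix(X):
    """Pairwise MI between the columns of an (N, m) data matrix; variable
    pairs with high MI are candidates for the same sub-block."""
    m = X.shape[1]
    M = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            M[i, j] = M[j, i] = mutual_information(X[:, i], X[:, j])
    return M
```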
3.2. Canonical Subspace Identification Based on CVA
The drilling detection variables are categorized into distinct sub-blocks based on their correlations. State monitoring models are then constructed within each sub-block by parsing the data characteristics to accomplish anomaly detection for various attack methods.
Canonical variate analysis (CVA) is a dimension reduction algorithm that maximizes the correlation between two sets of variables. By maximizing the correlation between the "past" values and the "future" values of the system, the CVA-based approach generates state-space models from time-related data. Thus, CVA can be used to establish the relationships between process variables and quality variables, and the trained CVA model can be used for quality-related process monitoring.
In CVA, linear dimension reduction determines the most significant correlations between the quality and primary dependent variables, as well as the process dynamics [20]. This study uses it to address the auto-correlation challenge of modeling the operational state of industrial control networks.
The past and future drilling data matrices are constructed from the drilling data $x_k \in \mathbb{R}^m$. Assume that, at moment $k$, the past vector $x_{p,k}$, comprising the past data, and the future vector $x_{f,k}$, containing the present and future observations, are defined as

$$x_{p,k} = \begin{bmatrix} x_{k-1}^T & x_{k-2}^T & \cdots & x_{k-p}^T \end{bmatrix}^T, \qquad x_{f,k} = \begin{bmatrix} x_k^T & x_{k+1}^T & \cdots & x_{k+f-1}^T \end{bmatrix}^T$$

where the two vectors, i.e., $x_{p,k}$ and $x_{f,k}$, should first be normalized to a zero mean and unit variance. To define the past and future matrices, the vectors are arranged in the following Hankel matrices:

$$X_p = \begin{bmatrix} x_{p,p+1} & x_{p,p+2} & \cdots & x_{p,p+M} \end{bmatrix}, \qquad X_f = \begin{bmatrix} x_{f,p+1} & x_{f,p+2} & \cdots & x_{f,p+M} \end{bmatrix}$$

where $M = N - p - f + 1$ for a dataset with $N$ samples.
The aim of CVA is to reveal the remarkable features of the ICS operating conditions by identifying projection matrices $L$ and $J$ such that the linear combinations $Jx_{p,k}$ and $Lx_{f,k}$ of the past and future observations are maximally correlated. The problem of solving the projection matrices is defined as follows:

$$\max_{J,L}\ \operatorname{corr}\left(J x_{p,k},\ L x_{f,k}\right)$$

The projection matrices $J$ and $L$ can be calculated by singular-value decomposition (SVD) on the Hankel matrix $H$ as follows:

$$H = \Sigma_{ff}^{-1/2}\,\Sigma_{fp}\,\Sigma_{pp}^{-1/2} = U \Sigma V^T$$

where the sample covariances $\Sigma_{pp}$ and $\Sigma_{ff}$ and the cross-covariance $\Sigma_{fp}$ of the past vector $x_{p,k}$ and the future vector $x_{f,k}$ are defined as follows:

$$\Sigma_{pp} = \frac{1}{M-1}X_p X_p^T, \qquad \Sigma_{ff} = \frac{1}{M-1}X_f X_f^T, \qquad \Sigma_{fp} = \frac{1}{M-1}X_f X_p^T$$
where $U$ and $V$ consist of singular vectors that are orthogonal and only pairwise-correlated, and $\Sigma$ is a diagonal matrix containing the canonical correlation coefficients. Thus, the projection matrices $J$ and $L$ can be calculated by taking the first $r$ columns of $V$ and $U$, respectively. For moment $k$ of the ICS operation, the transformation matrices $J$ and $L$ are as follows:

$$J = V_r^T \Sigma_{pp}^{-1/2}, \qquad L = U_r^T \Sigma_{ff}^{-1/2}$$

The canonical state subspace $Z$ and its residual subspace $E$ of the drilling data matrix $x$ can be defined as

$$Z = J X_p, \qquad E = \tilde{J} X_p$$

where the residual projection matrix $\tilde{J} = \left(I - V_r V_r^T\right)\Sigma_{pp}^{-1/2}$.
Therefore, the space of the primary and dependent variables Z, which are canonically correlated with the ICSs’ operational performance, is extracted within each sub-block. Then, a PPCA-based monitoring model is built on it to detect cyberattacks.
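The CVA decomposition described above can be sketched in Python as follows (the lag orders p and f, the retained order r, and all names are illustrative; the projections follow the reconstructed formulas $J = V_r^T \Sigma_{pp}^{-1/2}$ and $\tilde{J} = (I - V_r V_r^T)\Sigma_{pp}^{-1/2}$):

```python
import numpy as np

def cva_subspaces(X, p=5, f=5, r=3, eps=1e-8):
    """CVA on an (N, m) data matrix X: build past/future Hankel matrices,
    take the SVD of the scaled cross-covariance, and return the canonical
    states Z and residuals E for each time index."""
    N, m = X.shape
    M = N - p - f + 1
    # Past vectors x_{p,k} = [x_{k-1}; ...; x_{k-p}] and future vectors
    # x_{f,k} = [x_k; ...; x_{k+f-1}], stacked column-wise
    Xp = np.column_stack([X[k - p:k][::-1].ravel() for k in range(p, p + M)])
    Xf = np.column_stack([X[k:k + f].ravel() for k in range(p, p + M)])
    Xp = Xp - Xp.mean(axis=1, keepdims=True)   # centering (unit-variance
    Xf = Xf - Xf.mean(axis=1, keepdims=True)   # scaling omitted for brevity)
    Spp = Xp @ Xp.T / (M - 1)
    Sff = Xf @ Xf.T / (M - 1)
    Sfp = Xf @ Xp.T / (M - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric PSD matrix via eigendecomposition
        w, V = np.linalg.eigh(S)
        w = np.maximum(w, eps)
        return V @ np.diag(w ** -0.5) @ V.T

    Spp_is, Sff_is = inv_sqrt(Spp), inv_sqrt(Sff)
    U, s, Vt = np.linalg.svd(Sff_is @ Sfp @ Spp_is)
    Vr = Vt[:r].T                        # first r right singular vectors
    J = Vr.T @ Spp_is                    # canonical projection
    Jres = (np.eye(Spp.shape[0]) - Vr @ Vr.T) @ Spp_is  # residual projection
    Z = J @ Xp                           # canonical state subspace
    E = Jres @ Xp                        # residual subspace
    return Z, E, s[:r]
```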
3.3. Overall Monitoring Model
According to CVA, the ICS variable space for drilling processes consists of a correlated canonical subspace and a residual subspace. To implement the proposed scheme, a monitoring model must be established for each subspace.
PPCA-based monitoring model: The PPCA method is a representation of PCA in probability space, where probability density functions measure the degree of novelty of new data points. While PCA is a purely linear dimension reduction method, PPCA can better account for the noisy and dynamic characteristics of the system; the incorporation of probability substantially improves PCA when dealing with such characteristics. From the perspective of probability, the data $x$ are assumed to be generated by a latent variable $z$. The standard PPCA takes the following form [15]:

$$x = f(z; \theta) + e$$

where $x \in \mathbb{R}^m$ is the process observation variable, $z \in \mathbb{R}^d$ is the vector of latent variables, $\theta$ is the associated model parameter vector (e.g., the loading matrix), $e$ is an independent noise vector, and $f(\cdot)$ describes the unknown function, which can in general be interpreted by a linear model:

$$x = Wz + \mu + e$$

where $W$ is the loading matrix, $\mu$ is the mean vector, and, when lagged observations are stacked, $l$ denotes the monitoring delay. The model parameters are then determined using a maximum-likelihood technique given a set of observational data.
According to the canonical subspace $Z$ acquired in the previous section, the PPCA algorithm seeks the projection matrix $W$ to further reveal both the static and dynamic process variations, in which the linear transformation $T = ZW$ has the maximal variance. Like PCA, the matrix projection problem can be expressed mathematically as

$$\max_{W}\ \operatorname{var}(ZW) \quad \text{s.t.} \quad W^T W = I$$

The goal of the PPCA is thus to map the original $m$-dimensional data into a $d$-dimensional space, whose principal component model can be expressed as

$$Z = TW^T + E$$

where $W$ is the loading matrix; $T$ is the score matrix; the number of retained principal components $P$ is commonly determined by a rule known as the cumulative percentage variance (CPV) [27]; and $E$ is the residual matrix, which represents process noise interference.
In general, the principal components follow a multivariate standard normal distribution, $t \sim \mathcal{N}(0, I)$, while the noise residual follows a multivariate normal distribution, $e \sim \mathcal{N}(0, \sigma^2 I)$, where $\sigma^2$ is the noise variance. Then, the distribution of a sample $z$ conditioned on the principal components is $z \mid t \sim \mathcal{N}(Wt + \mu,\ \sigma^2 I)$. According to Bayes' theorem, the marginal distribution of the sample data is $z \sim \mathcal{N}(\mu, C)$, with $C = WW^T + \sigma^2 I$.
Thus, the problem solved by the PPCA algorithm can be seen as generating the observations $Z$ from the distribution $\mathcal{N}(\mu, C)$ through the hidden variable $t$. The problem to be addressed translates into the estimation of the distribution parameters $W$ and $\sigma^2$ from the measurement samples [24]. This paper solves this probability distribution via maximum-likelihood estimation. Expectation maximization (EM) is a powerful method for estimating the parameters of hidden variable models, iterating between expectation and maximization steps until the parameters converge.
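A compact EM sketch for the PPCA parameters, following the standard Tipping-Bishop updates (a generic implementation under the reconstructed model $z = Wt + \mu + e$, not the authors' exact code; all names are ours):

```python
import numpy as np

def ppca_em(Z, d, n_iter=200, tol=1e-6):
    """EM estimation of the PPCA loading W and noise variance sigma2 for
    data Z of shape (N, m) with latent dimension d.
    Model: z = W t + mu + e,  t ~ N(0, I_d),  e ~ N(0, sigma2 * I_m)."""
    N, m = Z.shape
    mu = Z.mean(axis=0)
    Zc = Z - mu
    rng = np.random.default_rng(0)
    W = rng.standard_normal((m, d))
    sigma2 = 1.0
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(d))
        Et = Zc @ W @ Minv                     # E[t_n], shape (N, d)
        EtEt = N * sigma2 * Minv + Et.T @ Et   # sum_n E[t_n t_n^T]
        # M-step: update W and sigma2
        W = Zc.T @ Et @ np.linalg.inv(EtEt)
        sigma2 = (np.sum(Zc ** 2) - np.trace(Et.T @ Zc @ W)) / (N * m)
        # Convergence check via the log-likelihood of N(mu, W W^T + sigma2 I)
        C = W @ W.T + sigma2 * np.eye(m)
        ll = -0.5 * N * (m * np.log(2 * np.pi) + np.linalg.slogdet(C)[1]
                         + np.trace(np.linalg.solve(C, Zc.T @ Zc)) / N)
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return W, sigma2, mu
```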
Online attack detection: To monitor the state of the ICSs online, the monitoring threshold must first be determined. Traditionally, PCA-based monitoring methods calculate two types of statistics, $T^2$ and $Q$, as well as the corresponding control charts. Specifically, the $T^2$ statistic is designed to monitor the data variations in the principal component space (PCS), while the $Q$ statistic is used to monitor the data changes in the residual space. Large deviations in the monitoring statistics may indicate an abnormal state of the industrial control network.
On the basis of the PPCA algorithm, the principal component space, which contains the systematic variation information, is used to construct the $T^2$ statistic, while the residual space forms the $Q$ statistic. The monitoring statistics are defined as

$$T^2 = t^T \Lambda^{-1} t, \qquad Q = \left\| z - \hat{z} \right\|^2$$

where $t$ is the score vector, $\Lambda$ is the covariance matrix of the scores, and $\hat{z}$ is the model reconstruction. In the case of a multivariate normal distribution for the process variables, the detection threshold for $T^2$ can be obtained using the $F$-distribution with $\alpha$ as the significance factor:

$$T^2_{\alpha} = \frac{p\left(N^2 - 1\right)}{N(N - p)} F_{\alpha}(p, N - p)$$

where $p$ is the number of retained principal components. As for the residual subspace, a weighted Chi-squared distribution can approximate the confidence limit of $Q$:

$$Q_{\alpha} = g\,\chi^2_{h,\alpha}$$

where $g = v/(2\mu_Q)$ and $h = 2\mu_Q^2/v$, in which $\mu_Q$ is the mean value of $Q$, and $v$ is the corresponding variance.
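The statistics and their control limits can be sketched as follows (again a generic implementation of the reconstructed formulas, with our own function names):

```python
import numpy as np
from scipy import stats

def monitoring_statistics(Zc, W, sigma2):
    """T^2 and Q statistics for centered samples Zc (N, m) under a trained
    PPCA model with loading W (m, d) and noise variance sigma2."""
    d = W.shape[1]
    Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(d))
    T = Zc @ W @ Minv                      # latent scores E[t_n]
    Lam = np.cov(T, rowvar=False)          # score covariance
    T2 = np.einsum('ij,jk,ik->i', T, np.linalg.inv(Lam), T)
    Zhat = T @ W.T                         # model reconstruction
    Q = np.sum((Zc - Zhat) ** 2, axis=1)   # squared prediction error
    return T2, Q

def t2_limit(N, p, alpha=0.99):
    """F-distribution control limit for the T^2 statistic."""
    return p * (N ** 2 - 1) / (N * (N - p)) * stats.f.ppf(alpha, p, N - p)

def q_limit(Q_train, alpha=0.99):
    """Weighted chi-squared (Box) approximation for the Q control limit."""
    mu_q, v = Q_train.mean(), Q_train.var()
    g, h = v / (2 * mu_q), 2 * mu_q ** 2 / v
    return g * stats.chi2.ppf(alpha, h)
```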
As the PPCA model employs Mahalanobis-type distances for both the principal components and the noise [28], the comprehensive monitoring statistic can be generated directly from the whitened values of the $T^2$ and $Q$ statistics. The comprehensive monitoring statistic $S$, which measures the degree of deviation from the normal operating conditions, can be calculated as

$$S = \frac{T^2}{T^2_{\alpha}} + \frac{Q}{Q_{\alpha}}$$

and its threshold $S_{\lim}$ is determined by kernel density estimation (KDE) [29]:

$$\int_{-\infty}^{S_{\lim}} \hat{p}(S)\, \mathrm{d}S = \alpha$$

where $\hat{p}(S)$ is the probability density function of $S$ estimated by KDE. If the detection logic satisfies $S \le S_{\lim}$, the operating performance is optimal; otherwise, it is non-optimal. The resulting monitoring model effectively detects data injection attacks on the ICSs.
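A KDE-based threshold can be computed numerically, for instance as follows (a sketch using SciPy's gaussian_kde; the grid range and resolution are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_threshold(S_train, alpha=0.99, grid=10000):
    """Control limit S_lim such that the KDE-estimated CDF of the combined
    statistic S reaches the confidence level alpha."""
    kde = gaussian_kde(S_train)
    xs = np.linspace(S_train.min(), S_train.max() * 2, grid)
    cdf = np.cumsum(kde(xs))
    cdf /= cdf[-1]                         # normalize to a proper CDF
    return xs[np.searchsorted(cdf, alpha)]
```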
According to the previous discussion, several sub-blocks are formed, so the local statistics need to be integrated to construct comprehensive surveillance indicators for the whole process. This study used Bayesian inference to integrate the monitoring results of multiple sub-blocks into the overall monitoring results, owing to its excellent performance in sub-block decision fusion. Conceptually, the probability of each sub-model being under attack can be expressed as

$$P(F \mid S_i) = \frac{P(S_i \mid F)\,P(F)}{P(S_i)}$$

where the prior probability $P(S_i)$ is calculated as

$$P(S_i) = P(S_i \mid N)\,P(N) + P(S_i \mid F)\,P(F)$$

and the conditional probabilities $P(S_i \mid N)$ and $P(S_i \mid F)$ are defined as

$$P(S_i \mid N) = \exp\!\left(-\frac{S_i}{S_{i,\lim}}\right), \qquad P(S_i \mid F) = \exp\!\left(-\frac{S_{i,\lim}}{S_i}\right)$$

where $S_i$ represents the statistic in the $i$-th sub-block and $S_{i,\lim}$ represents the control limit in the $i$-th sub-block; $N$ and $F$ denote the optimal and non-optimal operating performance, respectively; $P(N)$ and $P(F)$ represent the prior probabilities under the confidence level $\alpha$ and $1-\alpha$; and $P(N) + P(F) = 1$. The intuitive interpretation is that the operating status expressed by the sampling data is either normal or non-optimal in the drilling process.
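A sketch of this fusion step (the exponential likelihoods follow the reconstructed formulas above; the final weighted combination across sub-blocks is a commonly used Bayesian-inference fusion rule that we assume here, as the original expression was not recoverable):

```python
import numpy as np

def bayesian_fusion(S, S_lim, alpha=0.99):
    """Fuse sub-block statistics S with their control limits S_lim into one
    plant-wide attack probability. Priors: P(N) = alpha, P(F) = 1 - alpha."""
    S = np.maximum(np.asarray(S, float), 1e-12)   # guard against S = 0
    S_lim = np.asarray(S_lim, float)
    pN, pF = alpha, 1.0 - alpha
    p_s_given_N = np.exp(-S / S_lim)       # likelihood under normal operation
    p_s_given_F = np.exp(-S_lim / S)       # likelihood under attack
    p_s = p_s_given_N * pN + p_s_given_F * pF
    p_F_given_s = p_s_given_F * pF / p_s   # posterior per sub-block
    # Weighted combination across sub-blocks; a common decision rule flags
    # an attack when the fused value exceeds 1 - alpha.
    return float((p_s_given_F * p_F_given_s).sum() / p_s_given_F.sum())
```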
After that, in the modeling phase, it is possible to obtain comprehensive monitoring indicators by integrating the PPCA sub-models for various operating modes based on Bayesian inference.
During the actual monitoring process, it can be determined that the ICSs have received an attack when the monitoring indicator exceeds the preset threshold.
4. Experimental Results and Analysis
This section verifies the validity of the methodology through practical examples derived from the geological drilling process.
4.1. Geological Drilling Process
Geological exploration and resource extraction are contingent upon the successful completion of geological drilling projects. The drilling process is primarily conducted by drill rigs equipped with alternating-current variable-frequency electric motors.
Figure 4 illustrates the schematic of a typical geological drilling process. The components used in the drilling process include the crown block, traveling block, derrick, driller's cabin, rotary table, drilling control system, mud pump, mud pit, sedimentation pit, drill string, bottom-hole assembly, and drill bit.
Figure 5 shows a geothermal well construction site with an on-site industrial control system.
4.2. Overall Results of the ICS Attack Detection
In this paper, real-life case studies with drilling data from a geothermal well demonstrated the effectiveness and superiority of the proposed operating performance monitoring method. The selected running data contains the 12 process variables mentioned in
Table 1 from 1052 m to 1058 m, with an interval of 1 s, totaling 2826 data samples.
Figure 6 shows the time-series data of the actual running process of the ICSs during drilling. Although the data injection attack on the network began at 160 s, no significant change was observed in the data curves of the detected variables. Therefore, more in-depth analyses of the data generated in the ICSs are needed to obtain a more accurate portrayal of the ICS operating state.
Before constructing the ICS monitoring model, a data set under normal operations was obtained. A standard data matrix was created by selecting the 10 decision variables identified by the one-way analysis of variance (ANOVA). According to the MI-based blocking criterion, these variables were divided into three sub-blocks. For each sub-block, the CVA-PPCA offline monitoring model was established on its canonical subspace, and the composite discriminatory indicators and discriminatory thresholds were calculated.
During the online monitoring phase, online data were collected within a 20-min window, and the monitoring statistic was calculated and compared with the detection threshold to identify attack conditions. The length of the monitoring window has some effect on the quality of the monitoring: a window that is too long may fail to detect the fluctuations caused by short-lived attacks, whereas a window that is too short may cause frequent alarms and interfere with the driller's normal operation. Based on the industrial control system at the drilling site and manual experience, this study specified a 20-min monitoring window, which led to better results.
Specifically, the principal components were selected according to the CPV criterion to construct the monitoring model. In all of the monitoring charts, the KDE algorithm was adopted to preset the control limits of the monitoring statistics at the chosen confidence level.
In this paper, the anomalous state of the ICSs was the result of two categories of data tampering: surge attacks and biased (deviation) attacks [30,31]. During a surge attack, a single piece of data is manipulated to cause the greatest damage in the shortest time, exhibiting a step change. In contrast, a biased attacker adds non-zero constants to numerous data points in a sequence, producing a slowly varying change. The monitoring model in this paper was intended to detect the attacks the system has received by analyzing the attack-induced changes in the monitoring statistics.
Figure 7 illustrates the ICS attack detection results obtained through the proposed method.
As shown in
Figure 7, the red dashed line indicates the preset control limits, whereas the blue line represents the monitoring statistics calculated from the online data. The surge attack and the deviation attack were each launched at the 110th second of the experiment, as shown in
Figure 7a and
Figure 7b, respectively. In addition,
Figure 7c shows the monitoring results under normal operating conditions. Based on the attacking records, the model successfully identified the impact of the step-wise and slowly varying deviations from the normal operating state. The experimental results revealed that the proposed method can effectively identify anomalies due to attacks with 92.31% accuracy and 12 s monitoring delay.
For greater clarity, the PCA-based process monitoring method was chosen as the comparative monitoring strategy [32]. To realize the comparison, the integrated monitoring statistic, achieved by combining $T^2$ and $Q$, was adopted in the attack detection task [33]. The control limit was set at the same confidence level. It can be seen from
Figure 8 that the PCA failed to detect the attacks, as there was no significant change in the monitoring statistics. In both cases, the PCA method was insensitive to the operational instability caused by the attacks. Because the original data structure was barely altered, the anomalies caused by data injection attacks did not accumulate rapidly and did not significantly affect the detection data. Consequently, the original PCA method was unable to extract the features related to the operating conditions, resulting in unsatisfactory monitoring results. The monitoring process also suffered from more missed detections, more false alarms, and longer anomaly detection delays than the method proposed in this study.
To further demonstrate the effectiveness of the proposed method, several sophisticated process monitoring methods were selected for comparison, such as the original PPCA [15] and mRMR-PCA [32]. The monitoring delay (MD) refers to the period between the occurrence of the attack and its detection. Monitoring performance is further evaluated through the non-detection rate (NDR) and the false alarm rate (FAR). These indicators are defined from the confusion matrix as follows: FP denotes the count of samples incorrectly classified into non-optimal modes when they should have been classified into optimal modes; TN denotes the count of samples correctly classified into optimal modes; FN denotes the count of samples incorrectly classified into optimal modes when they should have been classified into non-optimal modes; and TP denotes the count of samples correctly classified into non-optimal modes. Then,

$$\mathrm{NDR} = \frac{FN}{TP + FN}, \qquad \mathrm{FAR} = \frac{FP}{FP + TN}$$

Lower values for NDR and FAR suggest a superior monitoring performance.
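For reference, these two rates can be computed from alarm sequences as follows (a generic sketch; the function and variable names are ours):

```python
import numpy as np

def detection_metrics(alarm, attack):
    """Non-detection rate and false alarm rate from boolean arrays:
    alarm[t] = statistic exceeded its limit, attack[t] = ground truth."""
    alarm, attack = np.asarray(alarm, bool), np.asarray(attack, bool)
    fp = np.sum(alarm & ~attack)   # false alarms on optimal samples
    tn = np.sum(~alarm & ~attack)  # correctly classified optimal samples
    fn = np.sum(~alarm & attack)   # missed detections
    tp = np.sum(alarm & attack)    # correctly detected attack samples
    ndr = fn / (fn + tp)           # non-detection rate
    far = fp / (fp + tn)           # false alarm rate
    return ndr, far
```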
The detection results of the different methods for monitoring data injection attacks are shown in Table 2 and Table 3. It is essential to clarify that the typical PCA approach failed to detect both attacks, as its NDRs for the two statistics reached 74.31% and 83.34%. The PPCA method is inadequate due to its failure to capture the nonlinear attributes of the data, rendering it unsuccessful in detecting abnormalities. The NDR of the $Q$-statistic calculated by mRMR-PCA was 6.05%, but it was 90.64% for the $T^2$-statistic in Case 1, which does not meet the needs of field applications. The mRMR-PCA-based monitoring method utilizes a distributed architecture, but it did not successfully identify the local attacks, which can be attributed to its reliance on a single PCA model. The results show that our method has a comparatively better monitoring performance than the other methods. In terms of statistical metrics, the maximum enhancements of NDR and FAR reached 69.17% and 9.67%, respectively, and the shortest detection delays of 11 s and 20 s were achieved in both cases.
In intuitive terms, the distributed structure ensures that the monitoring model can effectively extract the local and global features with finer-grained precision, while the canonical correlation space combined with the data feature approach captures the latent data features of the ICSs and more accurately portrays the operational state of the process as a whole.
In summary, the proposed approach takes into account the relationship between the variable space and the residual space for online monitoring, whereas PCA only evaluates the interactions between variables. The findings suggest that monitoring performance can be enhanced by partitioning the initial dataset and applying CVA- and PPCA-based variable reconstruction.