1. Introduction
Intervention is typically defined as a deliberate action or strategy aimed at changing the state of one or several variables, thereby affecting the overall behavior or output of the system. In fields such as medicine and social science, intervention often takes the form of controlled experiments, such as administering a specific dosage of a drug to patients in order to study its therapeutic effects. When measuring the impact of teaching methods on students, the specific content of the intervention might involve choosing between traditional and digital teaching approaches. As the theory of experimental design has evolved, the forms of experiment used to evaluate the effectiveness of interventions have become increasingly diverse, including methods such as randomized controlled trials (RCT) [
1], among others.
As a term with rich meanings, intervention is often abstractly expressed in statistics and causal reasoning as the do operator proposed by Judea Pearl. This notation reflects the general meaning of intervention, namely forcing one or more variables in the model to take a specific value or distribution [2]. The introduction of the do-calculus simplifies and distills the original notion of intervention and greatly broadens the scope of intervention research. In particular, when an intervention is difficult to perform in reality due to ethical or legal restrictions, or because of high implementation costs, the do operator converts practical impossibility into theoretical possibility, allowing researchers to evaluate potential outcomes through theoretical models without performing actual interventions. This counterfactual style of analysis has been widely used in causal reasoning.
In the real world, there are complex connections between different entities. Broadly speaking, these connections can be categorized into two types: correlation and causality. Research on interventions and their impacts typically requires a thorough understanding of the true causal relationships. If we simply observe a correlation between variables and rashly intervene, we may find that the measure does not have the desired effect because the relationship is spurious, induced by underlying factors, and the intervention thus results in a waste of resources [
3].
For instance, suppose there are three variables that conform to the relationships illustrated in
Figure 1, where X is the common cause of Y and Z, but X may not be observed due to incomplete prior knowledge. It can be found that since Y and Z are simultaneously affected by X, changes in X will invariably result in correlated fluctuations in the output data of Y and Z. If we do not know the existence of potential factor X in advance, we might be misled by this apparent correlation and make incorrect intervention decisions.
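As a minimal numerical illustration of this pitfall (hypothetical coefficients and NumPy only, not the system studied later), the sketch below generates data from the structure in Figure 1: the hidden common cause X drives both Y and Z, so they appear strongly correlated, yet forcing Y via do(Y) leaves Z untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hidden common cause X drives both Y and Z (structure of Figure 1).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
z = -1.5 * x + rng.normal(scale=0.5, size=n)

# Observationally, Y and Z look strongly (negatively) correlated.
print("corr(Y, Z) observational:", np.corrcoef(y, z)[0, 1])

# Intervene on Y: do(Y = y0) severs Y from X, so Z is unchanged.
y_do = rng.normal(loc=3.0, scale=0.5, size=n)    # forced values, independent of X
z_do = -1.5 * x + rng.normal(scale=0.5, size=n)  # Z is still generated from X only
print("corr(Y, Z) under do(Y):", np.corrcoef(y_do, z_do)[0, 1])  # near zero
```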
In order to accurately describe the true relationships between variables, Pearl introduced the concept of the structural causal model (SCM). SCMs are mathematical models that describe the relationships between latent and observed variables through a series of structural equations [4]. These models provide a systematic approach to formally articulating the causal relationships among variables and have become increasingly significant in the realm of causal reasoning. In particular, when it is challenging to assess the effectiveness of an intervention by practical means, constructing an SCM is crucial. Through an SCM, the do operator can link the observational distribution to the interventional distribution, thereby estimating the nominal post-intervention value of a variable [
5].
Furthermore, a key question for decision-makers is how to select a good intervention. Lee et al. [
6] pointed out that the design of an optimal intervention can be regarded as optimizing an expected strategy that spans two dimensions: observation and action. Simply put, it involves making the best choice based on the observed information when alternative interventions are available. Current methods that combine causal models with decision frameworks for intervention design fall primarily into two categories: Bayesian optimization and causal bandit algorithms.
Bayesian optimization [
7,
8], as a global optimization method, employs a prior probability model and an acquisition function to optimize computationally expensive and non-analytical objective functions. With the increasing use of causal Bayesian networks (CBN) [
9,
10], incorporating causal information into Bayesian optimization (BO) can better guide the search process for interventions. Causal Bayesian optimization (CBO) formulates a series of interventions by maximizing a target function. This method leverages causal relationships between variables to enhance the reliability of decision-making, and it effectively handles uncertainty through Bayesian techniques, delivering more robust optimization results. It can also adapt its strategy in dynamic environments to accommodate new data and changing conditions. Due to these advantages, causal Bayesian optimization has garnered considerable attention. Aglietti et al. [
11] introduced a causal Bayesian optimization algorithm to infer optimal decision strategies using the available causal information, aiming to address a new class of problems called causal global optimization (CGO). Further, to account for the constraints that arise in actual intervention scenarios, methods such as constrained causal Bayesian optimization (cCBO) [
12] have been proposed based on the theory of CBO. Considering that the choice of intervention measures is affected by the uncertainty of the causal structure, Branchini et al. [
13] extended the causal Bayesian optimization framework and proposed a causal entropy optimization algorithm (CEO) that incorporates uncertainty information into the decision-making process. Zhang et al. [
14] developed a causal active learning framework and designed a closed-form acquisition function to facilitate the rapid optimization of interventions. Similarly, Li et al. [
15] designed the acquisition function in Bayesian optimization to make the causal reasoning process more robust to noise interference. Scott et al. [
16] proposed a model-based CBO (MCBO) method that models the uncertainties of SCMs. Ren et al. [
17] introduced a CBO method based on learning the distribution of exogenous variables to address the common issue of neglecting exogenous variables in existing methods. However, CBO methods usually assume that the causal graph of the variables is known while the structural equations are unknown, and employ probabilistic surrogate models, often linear Gaussian models, as substitutes. Simple linear Gaussian models cannot capture complex causal relationships, which can lead to unreasonable intervention designs. Additionally, CBO involves sequential decision-making with a high computational complexity that may exceed the data sampling time, rendering the algorithm impractical for real-world applications. In contrast, heuristic optimization offers rapid convergence and higher computational efficiency, effectively addressing these issues.
The design of intervention measures is essentially a decision optimization problem. It has therefore been conceptualized as a stochastic multi-armed bandit problem with side information, where a good intervention is defined as one that maximizes the expected reward [
18,
19]. In the context of the causal bandit problem, Kroon et al. [
20] incorporated an estimator based on the separating-set property into the discrete causal bandit algorithm, allowing better interventions to be learned without requiring prior causal knowledge. Bilodeau et al. [
21] considered the case where the observed variables might be d-separators after intervention and proposed an adaptive optimal regret algorithm, extending its adaptability to conditions such as the front-door criterion. Moreover, methods like dynamic programming [
22] and reinforcement learning [
23] have also been employed to find superior interventions. The causal bandit algorithm utilizes causal reasoning to understand the impact of actions, thereby making more effective decisions based on causal relationships rather than mere correlations. It can also tailor actions to specific user environments by considering individual characteristics and previous interactions to improve outcomes. By focusing on causal effects, this type of algorithm can reduce the exploration needed to identify optimal actions. However, the effectiveness of such methods depends on the validity of the causal models used. Incorrect assumptions can lead to misleading results, and these methods can be computationally intensive, especially when dealing with large datasets and complex models, which may limit their scalability.
Although a large number of methods have been developed to study the optimization design of interventions based on causal models, there are still several shortcomings in the current approaches:
Model-based approaches often employ simplistic causal modeling such as the Gaussian process (GP) [
9,
11,
13] and linear Gaussian models [
15,
18]. However, in practical problems, target variables may exhibit non-Gaussian distributions, making traditional Gaussian-based models insufficient for capturing complex nonlinear relationships and non-stationary noise distributions among variables.
The construction of intervention scenarios remains incomplete. For instance, causal bandit algorithms for multi-armed bandit problems typically focus on discrete or limited interventions. In reality, however, an intervention can take any value within the feasible range, not just a limited set of discrete values. Furthermore, current research seldom considers optimization decisions under constraints [
13], especially when we do not want other unmanipulated variables to deviate from our expectations after intervening on the target variable. This consideration cannot be ignored in practical applications.
Heuristic algorithms are increasingly employed for intervention optimization [
24,
25]. However, in fault intervention scenarios, the solution space of feasible interventions is large, so applying optimization algorithms directly often leaves the search stuck in local optima, making it difficult to find better solutions quickly. Therefore, improvements to the optimization process of the algorithm are necessary.
Based on the discussion above, we propose an improved heuristic algorithm based on a causal generative model with the aim of designing optimal interventions and conducting corresponding experiments to demonstrate the effectiveness of this method. The main contributions of this paper are summarized as follows:
A dual-layer mapping generative SCM is developed to characterize the causal structure among fault-related variables, along with the updating process of the counterfactual SCM following an intervention.
The fault intervention problem is modeled based on real-world scenarios and formally described as a constrained optimization problem aimed at maximizing intervention utility.
An ALNS algorithm based on the expected improvement strategy (EI-ALNS) is proposed to address the optimization problem of fault intervention, and experiments demonstrate the superiority of this heuristic method in optimizing intervention measures.
The value of the proposed intervention design method lies in its ability to provide guidance for decision-making during equipment operation and maintenance. Specifically, through real-time monitoring data, operators can assess whether the equipment is likely to fail in the near future, as reflected in changes in relevant parameters, and proactively gain insight into potential failures using predictive analytics and related methods. On this basis, operators can evaluate the effects of specific interventions (such as adjusting certain parameters within the system) in advance using causal generative models and optimize these interventions with improved heuristic algorithms, enabling them to implement effective measures when a failure occurs.
The remainder of this paper is organized as follows:
Section 2 presents the necessary background knowledge.
Section 3 provides a detailed description of the implementation process of the method.
Section 4 validates the proposed model and search strategy on both synthetic data and satellite attitude control system data.
Section 5 summarizes the advantages and limitations of the proposed method.
2. Background
2.1. Generative Structural Causal Model
Let $X = \{x_1, x_2, \dots, x_d\}$ denote time-series data, where each $x_i$ is the observed series of one variable and $d$ represents the number of variables. Suppose the causal relationships among the variables are known a priori, allowing us to describe the generation mechanism of these variables through SCMs. According to Strobl [26], a complete SCM typically includes the following components:
- 1.
Set of variables $V$: When considering the causal relationships among system variables, it is generally assumed that there are two types of variables—endogenous variables $X$ and exogenous variables $U$. The endogenous variables constitute the main body of the SCM and are directly influenced by other variables within the model, while the exogenous variables represent sources external to the causal relationships, typically random factors or dependencies not considered in the model.
- 2.
Structural equations $F$: Set of functions that characterize the causal relationships among the endogenous variables:
$$x_i = f_i(pa_i, u_i), \quad i = 1, \dots, d.$$
Here, $f_i$ is the structural function corresponding to the endogenous variable $x_i$, and $pa_i$ represents the parent variables of $x_i$, i.e., the set of variables that directly influence $x_i$.
- 3.
Joint probability density of the exogenous variables $P(U)$: Determines the randomness of the entire model and affects the accuracy of analysis and prediction when using the model.
- 4.
Causal graph $\mathcal{G} = (V, E)$: Visually describes the causal relationships between variables and satisfies the constraints of a directed acyclic graph (DAG); $E$ indicates the directed edges in $\mathcal{G}$.
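To make these components concrete, here is a minimal sketch of an SCM as code, assuming a toy three-variable DAG with hand-picked structural functions and independent Gaussian noise; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Causal graph G as parent lists (a DAG), structural equations F, and
# exogenous noise U with a given joint density (here: independent Gaussians).
parents = {"x1": [], "x2": ["x1"], "x3": ["x1", "x2"]}
f = {
    "x1": lambda pa, u: u,                      # root node: driven by noise only
    "x2": lambda pa, u: np.tanh(pa["x1"]) + u,  # nonlinear effect of x1
    "x3": lambda pa, u: pa["x1"] - 0.5 * pa["x2"] + u,
}

def sample_scm(n, intervention=None):
    """Ancestral sampling; `intervention` maps variable -> forced value (do-operator)."""
    data = {}
    for v in ["x1", "x2", "x3"]:  # topological order of the DAG
        if intervention and v in intervention:
            data[v] = np.full(n, intervention[v])  # incoming edges are effectively cut
        else:
            u = rng.normal(scale=0.3, size=n)
            data[v] = f[v]({p: data[p] for p in parents[v]}, u)
    return data

obs = sample_scm(5)                                 # observational data
post = sample_scm(5, intervention={"x2": 1.0})      # data under do(x2 = 1.0)
print(obs["x3"], post["x3"])
```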
In this paper, the SCM is treated as a generative model in order to answer the following question: “What would the data look like after the intervention?” The role of the generative SCM encompasses two main aspects. On one hand, it can capture complex relationships within the causal structure and simulate the impact of intervened variables on the other variables in the system. On the other hand, it provides a formalized counterfactual perspective, which allows the model to be trained to generate data under hypothetical scenarios, thereby facilitating the analysis and evaluation of hypothetical actions.
It should be noted that the generative SCM described in this paper is a surrogate model. It must be trained on observational data to approximately simulate the data generation process of each variable. The causal structure is given a priori, predetermined by expert information, which is a critical assumption in constructing the model.
2.2. Post-Nonlinear Causal Model
The design of the generative model in this paper, which incorporates a dual-layer mapping structure, draws formal inspiration from the nonlinear acyclic causal model proposed by Zhang et al. [
27], also known as the post-nonlinear causal (PNL) model. The PNL model is a type of SCM with a more complex functional form. Mathematically, the PNL model for each variable can typically be articulated as follows:
$$x_i = f_{i,2}\big(f_{i,1}(pa_i) + u_i\big),$$
where $f_{i,1}$ denotes the nonlinear effect of the causal variables $pa_i$, $u_i$ is the independent disturbance, and $f_{i,2}$ denotes the post-nonlinear distortion in the variable $x_i$.
Specifically, the two nonlinear processing layers of the PNL model can be viewed as a “progressive” relationship, where the data generation process for each variable can be divided into the following two stages (see the sketch after this list):
- 1.
The first stage
The primary goal of the first layer is to extract preliminary nonlinear features from the input causal variables. In this layer, a nonlinear transformation $f_{i,1}$ of the parents $pa_i$ is constructed and the disturbance $u_i$ is added, yielding the intermediate output $z_i = f_{i,1}(pa_i) + u_i$ of this layer.
- 2.
The second stage
Nonlinear distortion is applied to the output of the first stage: $z_i$ serves as the input to the second-layer transformation $f_{i,2}$, producing the final output $x_i = f_{i,2}(z_i)$.
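A small sketch of this two-stage generation, with hypothetical choices for the two nonlinear functions and the noise scale:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

pa1 = rng.normal(size=n)            # parent variables of x_i (toy example)
pa2 = rng.uniform(-1, 1, size=n)

f1 = lambda a, b: np.sin(a) + 0.5 * b**2   # stage 1: nonlinear effect of the causes
f2 = lambda z: np.sinh(z)                  # stage 2: invertible post-nonlinear distortion

u = rng.normal(scale=0.2, size=n)          # independent disturbance
z = f1(pa1, pa2) + u                       # intermediate output of the first layer
x_i = f2(z)                                # final observed value

print(x_i[:5])
```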
Compared to the traditional linear SCMs [
14] and single-layer nonlinear SCMs [
28], the expressiveness and superiority of the PNL model are primarily reflected in the following aspects:
Traditional linear causal models can only describe direct linear relationships between variables, while single-layer nonlinear causal models often capture only relatively simple dependency structures. By introducing the two nonlinear functions $f_{i,1}$ and $f_{i,2}$, the PNL model is able to decompose complex nonlinear causal relationships more finely, thereby capturing the nonlinear effects of the causes [29].
In linear and single-layer nonlinear models, the error terms are additive noise. In the PNL model, by contrast, the error term is wrapped inside the post-nonlinear function, allowing the model to handle non-stationary noise distributions and external disturbances in a more flexible manner [30], depicting noise effects in the transmission from $pa_i$ to $x_i$ that follow unconventional distributions.
The nonlinear components can be customized to adapt to specific data structures, and the outer function is often regarded as capturing nonlinear distortion caused by sensors or measurement. This makes the model more adaptable and robust when facing varied and heterogeneous real-world data.
2.3. Counterfactual Updating in SCM
When we intervene on a subset $x_I$ of the endogenous variables $X$, the original SCM may no longer effectively describe the new situation. This is because the intervention operation forces the variable to take a specific value, so the intervened variable is no longer influenced by its parent variables, which is reflected in the DAG by deleting the incoming edges of the corresponding variable. The causal structure is revised in the “counterfactual world”, leading to a change in the data generation mechanism, thereby simulating counterfactual outcomes that do not exist in the real world. Formally, suppose we perform the intervention $do(x_I = a)$, replacing $x_I$ with the intervention value $a$. The original SCM then transforms into a counterfactual SCM, in which the structural equations of the intervened variables are replaced, the distribution of the exogenous noise is updated to its posterior density, and the post-intervention DAG is obtained by removing the incoming edges of $x_I$.
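The structural side of this update can be written down directly. The helper below is a hypothetical sketch (reusing the parent-list representation from the earlier SCM example) that deletes the incoming edges of each intervened variable and pins its structural function to the intervention value.

```python
import copy

def intervene(parents, f, targets):
    """Return the counterfactual DAG and equations under do(targets), e.g. {'x2': 1.0}.

    Incoming edges of each intervened variable are deleted and its structural
    function is replaced by the constant intervention value.
    """
    parents_cf = copy.deepcopy(parents)
    f_cf = dict(f)
    for var, value in targets.items():
        parents_cf[var] = []                                  # cut the edges into var
        f_cf[var] = (lambda v: (lambda pa, u: v))(value)      # force x_var = value
    return parents_cf, f_cf

# Example with the SCM sketched in Section 2.1:
# parents_cf, f_cf = intervene(parents, f, {"x2": 1.0})
```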
2.4. Intervention Optimization
In practical operations, when intervening on a fault, we typically consider how to balance the cost and benefit of the intervention measures, because applying intervention measures inevitably consumes resources. Specifically, if the intervention intensity is too low, the control over the fault may not achieve the desired effect; on the other hand, excessive intervention leads to unnecessary waste of manpower and material resources. Therefore, a good intervention measure should have a high utility value, i.e., it should achieve better intervention results at a lower cost. To find the optimal intervention intensity, we formulate intervention optimization as a classic constrained optimization problem and develop a heuristic algorithm to solve it.
3. Proposed Method
The method proposed in this paper for designing optimal interventions consists primarily of the following two components, as illustrated in
Figure 2:
- 1.
Evaluation of intervention measures
Before devising intervention measures, the first consideration is how to evaluate the impact of intervention measures on the fault. To this end, we introduce a generative SCM based on a dual-layer mapping structure, called the graph-based warped multi-task Gaussian process (G-WMGP), which comprises two sub-models: the G-MTGP and normalizing flows. We also detail the updating process through which the G-WMGP model is adapted into the counterfactual G-WMGP model in the context of an intervention, thereby providing a theoretical foundation for the generation of post-intervention data.
- 2.
Formulation of intervention measures
To transform the intervention decision into a constrained optimization problem, we define the objective function and the specific form of the intervention measures, where the intervention measures align with the “3W” intuitive thinking of decision-makers in real-world scenarios: the target of the intervention (“which”), the timing and duration of the intervention (“when”), and the specific values of the intervention (“what”). Given this, we consider employing a heuristic optimization algorithm to identify high-quality intervention measures, where an intervention measure is defined as an individual solution. The objective function is calculated based on the counterfactual data, which are generated by the generative SCM, and the intervention measure generated in each iteration will act as input to “urge” the update of the SCM. Moreover, we propose a search strategy aimed at addressing efficiency (i.e., how to quickly find an optimal solution) and accuracy (i.e., how to enhance the quality of the solution) issues during the algorithm’s execution.
3.1. Data Generation Based on SCM
3.1.1. Generative Model Construction
In order to facilitate the construction of the model, we assume that the causal relationships among variables satisfy the condition of causal stationarity [31], which posits that the causal structure $\mathcal{G}$ does not change over time. Further, considering that the causal links between variables exhibit relatively complex nonlinear patterns, we employ the structure of a post-nonlinear causal model [32] to simulate the data generation process of the variables of interest, i.e., for each variable $x_i$:
$$x_i = h_i\big(g_i(pa_i) + u_i\big).$$
Here, both $g_i$ and $h_i$ represent nonlinear mappings, and $u_i$ represents mutually independent noise that captures the potential influence of the exogenous variables on the system variable $x_i$, with $u_i \in U$. The parent variables of $x_i$ in the causal structure $\mathcal{G}$ are denoted as $pa_i \subseteq X$, and the bold form $\mathbf{pa}_i$ represents the values of the corresponding variables. Intuitively, serving as the input to the generative model, $pa_i$ directly affects the generated data of $x_i$ [30]. Therefore, we aim to construct a transformation $T_i$ that can effectively characterize the data generation mechanism when $x_i$ is influenced by $pa_i$.
To achieve this, we introduce a generative SCM called G-WMGP, which builds on the PNL dual-layer mapping architecture to design the specific functional form of the transformation $T_i$. The principle is to map the original input through a warping transformation to a new input space, in which a graph-based multi-task Gaussian process (G-MTGP) is established. The inner mapping $g_i$ uses the G-MTGP model, an extension of the GP, where the inputs and outputs follow the many-to-one relationships among variables in the causal graph $\mathcal{G}$. In the multi-task learning framework, the correlations among tasks are exploited to improve learning performance; thus, the overall dataset can be viewed as a collection of tasks, one per variable, each comprising the corresponding samples. The covariance matrix encompasses the covariance between different tasks and among different data points within the same task and is characterized in this paper by a coregionalization kernel. This kernel depends on the Euclidean distance between data points, scaled by learned scale factors, together with a set of hyperparameters that summarizes the parameters of the multi-task Gaussian process; a positive definite matrix describes the correlations between tasks, with its diagonal entries corresponding to the task variances (see the sketch below).
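As an illustration of what such a kernel can look like, the sketch below implements the standard intrinsic-coregionalization form, an RBF kernel over the inputs multiplied by a task-correlation matrix; the exact kernel and hyperparameters used in the paper may differ, so everything here is an assumption.

```python
import numpy as np

def rbf(x, x_prime, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over the Euclidean distance between inputs."""
    d2 = np.sum((np.atleast_1d(x) - np.atleast_1d(x_prime)) ** 2)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def coregionalization_kernel(x, task_i, x_prime, task_j, B,
                             lengthscale=1.0, variance=1.0):
    """Intrinsic coregionalization: task correlation B[i, j] times the input kernel."""
    return B[task_i, task_j] * rbf(x, x_prime, lengthscale, variance)

# B must be positive (semi-)definite; a common low-rank-plus-diagonal construction:
W = np.array([[1.0], [0.6], [-0.4]])          # 3 tasks, rank-1 coupling
B = W @ W.T + np.diag([0.1, 0.1, 0.1])        # diagonal entries act as task variances

print(coregionalization_kernel([0.2, 1.0], 0, [0.3, 0.8], 2, B))
```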
The outer mapping $h_i$ employs a generative architecture of normalizing flows, aiming to construct flexible probability distributions over continuous random variables. The main idea is to express $x_i$ as the transformation $h_i$ of a real variable $z_i$ sampled from a base density $p_z(z_i)$:
$$x_i = h_i(z_i), \quad z_i \sim p_z(z_i),$$
where $p_z$ is regarded as the base distribution of the flow-based model and, by the construction of the G-MTGP model, equals its predictive distribution. To enhance the learning capability of the flow model, $h_i$ is typically composed of $K$ sub-flows, i.e., $h_i = h_{i,K} \circ \cdots \circ h_{i,1}$, where each sub-flow can be of a different flow type. When building the flow-based model, it is necessary to ensure that $h_i$ is invertible and that both $h_i$ and its inverse $h_i^{-1}$ are differentiable. By the inverse property, $z_i$ is given by the following:
$$z_i = h_i^{-1}(x_i).$$
Under these conditions, the density of $x_i$ can be obtained through the change of variables:
$$p_{x_i}(x_i) = p_z\big(h_i^{-1}(x_i)\big)\,\big|\det J_{h_i^{-1}}(x_i)\big|,$$
where $\big|\det J_{h_i^{-1}}(x_i)\big|$ denotes the absolute Jacobian determinant of $h_i^{-1}$, which quantifies the relative change of a small neighborhood around $z_i$ under the transformation $h_i$. Because $z_i = g_i(pa_i) + u_i$, according to Equation (6), $p_{x_i}(x_i)$ can be written in terms of the G-MTGP predictive density. Similarly, when $h_i$ is composed of sub-flows, its Jacobian determinant is calculated as the product of the Jacobian determinants of the individual sub-flows:
$$\det J_{h_i}(z_i) = \prod_{k=1}^{K} \det J_{h_{i,k}}(z_{i,k}),$$
where $z_{i,k}$ denotes the input to the $k$-th sub-flow. The loss function used for model training can be characterized by the divergence between the flow model $p_{x_i}(x_i; \phi)$ and the target distribution, specifically the forward Kullback–Leibler divergence, approximated using the Monte Carlo method [33]:
$$\mathcal{L}(\phi) \approx -\frac{1}{N} \sum_{n=1}^{N} \Big[ \log p_z\big(h_i^{-1}(x_{i,n}; \phi)\big) + \log \big|\det J_{h_i^{-1}}(x_{i,n}; \phi)\big| \Big] + \text{const},$$
where $\phi$ denotes the parameters of the flow model and the Jacobian determinant of the inverse transformation is computed, analogously, as a product over the sub-flows. Through Equation (5), the parameters can be learned from the observational data.
By constructing a multi-layered nested architecture, the G-WMGP model not only more accurately captures the latent structures and nonlinear dependencies in the data, but also provides high-quality data generation and explicit modeling of complex distributions, while maintaining the flexibility and interpretability of the model.
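The flow-side computations above (inverse transformation, change of variables, and the Monte Carlo forward-KL loss) can be sketched with a single affine sub-flow and a Gaussian base density standing in for the G-MTGP predictive distribution; the function names and the brute-force parameter search are illustrative assumptions, not the paper's training procedure.

```python
import numpy as np

# A single invertible affine sub-flow x = h(z) = a * z + b, as a stand-in for the
# composed flow h = h_K ∘ ... ∘ h_1 used in the paper.
def h(z, a, b):        return a * z + b
def h_inv(x, a, b):    return (x - b) / a
def log_abs_det_jac_inv(x, a, b):   # d h^{-1} / d x = 1 / a, independent of x here
    return -np.log(np.abs(a)) * np.ones_like(x)

def log_base_density(z, mean=0.0, var=1.0):
    """Base density p_z; in the paper this is the G-MTGP predictive distribution."""
    return -0.5 * (np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

def forward_kl_loss(x_obs, a, b):
    """Monte Carlo estimate of the forward KL up to a constant:
    -(1/N) * sum_n [ log p_z(h^{-1}(x_n)) + log |det J_{h^{-1}}(x_n)| ]."""
    z = h_inv(x_obs, a, b)
    return -np.mean(log_base_density(z) + log_abs_det_jac_inv(x_obs, a, b))

# Crude parameter search over (a, b) on synthetic observations.
rng = np.random.default_rng(3)
x_obs = 2.0 * rng.normal(size=500) + 1.0
best = min((forward_kl_loss(x_obs, a, b), a, b)
           for a in np.linspace(0.5, 3.0, 26)
           for b in np.linspace(-2.0, 2.0, 41))
print("fitted (a, b):", best[1:], "loss:", round(best[0], 3))
```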
3.1.2. Counterfactual SCM Calculation
According to the three-step method proposed by Pearl, the post-intervention counterfactual data can be obtained through the following process [28,33]:
- 1.
Abduction: First, calculate the posterior noise distribution of the transformation $T_i$ given the observed data. According to Weilbach et al. [29], this posterior conforms to a Gaussian distribution, whose mean and variance are obtained from the G-MTGP posterior conditioned on the observations.
- 2.
Action: Then, update the structural equation with the intervention measure $do(x_I = a)$, i.e., replace $x_I$ with the value $a$.
- 3.
Prediction: Finally, calculate the predictive posterior distribution of the affected variables under the modified model. Its mean and covariance follow the Gaussian process predictive equations, in which one covariance matrix is computed among the post-intervention inputs and another between the post-intervention inputs and the observed inputs. The post-intervention counterfactual values are then obtained by sampling from this posterior distribution.
By constructing the counterfactual SCM, it is possible to derive a counterfactual mapping from the causes of $x_i$ to $x_i$ when an intervention measure is implemented on its parents.
3.2. Design of Optimal Intervention Based on EI-ALNS
In this section, we primarily discuss the process of seeking better solutions for intervention measures based on the optimization algorithm and the improved search strategy, which aims at enhancing the optimization capability of the algorithm. We first elaborate on the fault intervention issue in
Section 3.2.1, and construct the scenario based on real-world fault conditions and intervention needs, while also defining the optimization objective and constraints. In
Section 3.2.2, we detail the principle of the expected improvement search strategy, abbreviated as EI, and then describe the implementation of the EI-ALNS algorithm in the context of intervention optimization.
3.2.1. Problem Modeling
Assume that the faulty variable is denoted as $x_f$ and that the interval during which the fault occurs is $[t_s, t_e]$, with length $L$. Our goal is to implement interventions that eliminate the abnormal state of $x_f$ as much as possible. To achieve this, an intervention measure should include the following elements:
Which: The intervened variables $x_I$, a discrete set. Given the causal structure, we can clearly identify which variables directly or indirectly influence $x_f$. Considering the constraints of cost and operability in real-world scenarios, and that decision-makers may hope to eradicate the fault at its source rather than merely addressing the symptoms, it is common practice to intervene on a subset of the causes of $x_f$.
When: The intervention start time $t_0$ and step length $\tau$, which lie within the fault interval and take discrete values. Together they constitute the time interval of the intervention, indicating when the decision-makers start the intervention and how long the intervention measures last.
How: The specific intervention values $a$, which are continuous. The intervention values, which should lie within the normal range of the variable, reflect the intensity of the intervention through their difference from the original data.
In order to transform the intervention decision problem into a classic constrained optimization problem, we first define the following variables of interest:
Intervention variables: The variables about to be intervened on, denoted as $x_I$.
Target variable: The faulty variable $x_f$. Ensuring that the faulty variable returns to normal functioning as much as possible is one of the core purposes of designing intervention measures.
Constraint variables: Certain critical variables in the system for which new faults must be avoided when intervening, in order to control the adverse effects of the intervention.
Intervention values should be drawn continuously from within the normal range of the variables, as mentioned above. In this paper, the intervention is defined as a deviation from the expected interval of the variable: the normal interval of a variable at time $t$ is centered on its target value under non-intervention conditions and spans a permissible fluctuation range, which can be determined from expert knowledge or approximated using the statistical interquartile range method. The abnormal state of $x_f$ at time $t$ is then measured by how far its value falls outside this normal interval.
The post-intervention expected values of $x_f$ are derived from the aforementioned counterfactual update process. Given these definitions, the relative abnormal value is expressed as the post-intervention abnormality normalized by the mean of the outlier data within the fault interval. Taking the practical context into account, we not only aim to mitigate the abnormal state of variables through intervention measures, but also seek to ensure that the cost of the intervention is not excessively high. The cost of each intervened variable is measured by its relative change from the original data over the intervention interval, and the total cost of the intervention is the sum over all intervened variables. Taking relative changes eliminates differences in dimension among the variables. In summary, the objective function evaluates the intervention utility by jointly accounting for the residual relative abnormal value and the total intervention cost, while the constraints require that the constraint variables remain within their normal ranges after the intervention. The optimal intervention measure is thus determined by solving this constrained optimization problem; a sketch of how such an objective can be evaluated is given below.
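The sketch below shows one way such an objective could be evaluated for a candidate intervention. The specific abnormality and cost formulas (relative deviation outside a normal band, summed relative change of the manipulated values) and the penalty handling of the constraint are assumptions; in the actual method, the post-intervention values would come from the counterfactual G-WMGP model rather than the placeholders used here.

```python
import numpy as np

def abnormality(values, lower, upper):
    """Mean relative deviation of a series outside its normal band [lower, upper]."""
    over = np.clip(values - upper, 0.0, None)
    under = np.clip(lower - values, 0.0, None)
    return np.mean((over + under) / (upper - lower))

def intervention_cost(original, intervened):
    """Total relative change of the manipulated variables over the intervention window."""
    return np.sum(np.abs(intervened - original) / (np.abs(original) + 1e-8))

def objective(x_f_cf, band_f, a, x_I_orig, x_c_cf, band_c, lam=1.0):
    """Candidate utility: penalize residual fault abnormality and intervention cost,
    and reject solutions whose constraint variables leave their normal range."""
    if abnormality(x_c_cf, *band_c) > 0.0:        # constraint violated
        return -np.inf
    residual = abnormality(x_f_cf, *band_f)       # post-intervention abnormality of x_f
    cost = intervention_cost(x_I_orig, a)
    return -(residual + lam * cost)               # higher is better

# Placeholder counterfactual values in lieu of the G-WMGP prediction:
x_f_cf = np.array([0.9, 1.05, 1.1]); x_c_cf = np.array([0.2, 0.25])
print(objective(x_f_cf, (0.8, 1.2), np.array([1.4, 1.5]), np.array([1.0, 1.0]),
                x_c_cf, (0.0, 0.5)))
```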
3.2.2. Expected Improvement Search Strategy
The ALNS algorithm is a powerful heuristic method that relies on its flexible neighborhood search mechanism and adaptive strategy adjustments, aiming at solving large-scale and complex constrained optimization problems [
34]. The basic idea of the ALNS algorithm is to start from the current solution and dynamically search for new solutions in the solution space through a process of destruction and reconstruction, combined with adaptive mechanisms and acceptance criteria.
Although the ALNS algorithm itself has a strong global optimization capability, directly applying it to the optimization of intervention measures reveals challenges. Specifically, when there are many intervention variables and large intervention intervals, the solution, which couples discrete choices of intervened variables and intervention timing with the continuous optimization of intervention values, spans a very large solution space, making it difficult to find high-quality solutions without extensive search time.
Considering the aforementioned issues and the difficulty of obtaining superior solutions with the basic form of the ALNS algorithm, we refine the optimization process by combining global and local neighborhood searches, thereby avoiding the pitfalls of local optima. Since the complex relationships among the variables in the scenario have already been formalized using an SCM, we consider how to integrate the information provided by the causal model to guide the search process of the algorithm. The cCBO approach improves the intervention optimization process by incorporating information from the causal surrogate model, which has been proven effective both theoretically and practically [
12]. This idea has greatly inspired us to improve the algorithm.
We propose a search strategy whose core principle is that, during the iterative optimization of the intervention values, the direction and step size of the neighborhood search are adaptively adjusted based on the effectiveness of the current intervention, thereby enhancing the algorithm's optimization capability. As a classic optimization method, gradient descent updates the solution along the current negative gradient direction. Since the objective function (20) is smooth and differentiable, we can perform local optimization of the intervention values $a$ based on gradient information. Formally, for the current intervention measure $\xi_n$, we derive a new one $\xi_{n+1}$ by applying the following improvement rule: over a number of gradient iterations, the intervention values take repeated gradient-based steps on the objective, with a learning rate scaled by a step-size control term. The step-size control term adjusts the step size during the iterations of the algorithm: the search step size is increased when the current intervention $\xi_n$ deviates significantly from the theoretically optimal intervention and is gradually reduced otherwise to achieve a more refined search; it is defined componentwise for each intervened variable and time step.
Here, the control coefficient $\gamma$ is adaptively adjusted during the optimization process. Given the current solution $\xi_n$, we calculate the fault-related abnormality of $x_f$ and the intervention cost on $x_I$, and take their ratio as the value of $\gamma$.
This ratio intuitively represents the effectiveness of the intervention measure in fault clearance and incorporates the intervention information from the generative SCM. If the current intervention measure has low utility, meaning that the intervention input is excessive but the effect is poor, continuing in the current search direction is more likely to produce an inferior solution. In this case, the coefficient $\gamma$ increases and changes the search direction based on the evaluative information from the generative SCM, thereby helping the search escape the local region of inferior solutions. Conversely, when $\xi_n$ has high utility, it is appropriate to decrease the step size controlled by $\gamma$ and conduct more detailed searches in its neighborhood. During the search, $\gamma$ adapts automatically without the need to set its value manually. This strategy enhances the ability of the ALNS algorithm to escape local optima through a “feedback regulation” mechanism, thereby increasing the chance of finding superior solutions (Algorithm 1).
Algorithm 1: Expected Improvement Search
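A schematic sketch of the feedback-regulated refinement step (not the paper's exact Algorithm 1): the intervention values take a few gradient steps on a utility-style objective, with the step size scaled by a control coefficient derived from the current intervention's benefit-to-cost information; the finite-difference gradient and all function names are assumptions.

```python
import numpy as np

def numerical_grad(f, a, eps=1e-4):
    """Finite-difference gradient of the objective with respect to the intervention values."""
    g = np.zeros_like(a)
    for k in range(a.size):
        step = np.zeros_like(a)
        step[k] = eps
        g[k] = (f(a + step) - f(a - step)) / (2 * eps)
    return g

def expected_improvement_step(a, objective, gamma, eta=0.05, n_iters=5):
    """Refine intervention values a by gradient steps; gamma scales the step size.

    gamma plays the role of the control coefficient computed from the generative
    SCM: a low-utility intervention gets a larger gamma, i.e. bigger, more
    exploratory moves; a high-utility one gets smaller, more local refinements.
    """
    for _ in range(n_iters):
        a = a + eta * gamma * numerical_grad(objective, a)
    return a

# Toy usage: maximize a concave surrogate objective whose optimum is at a* = 1.0.
obj = lambda a: -np.sum((a - 1.0) ** 2)
print(expected_improvement_step(np.array([0.2, 1.8]), obj, gamma=2.0))
```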
The process of the ALNS algorithm based on the expected improvement strategy is shown in
Figure 3. First, the initial parameters of the algorithm are set and an initial intervention $\xi_0$ is generated. Then, one of the operators, defined as either a destruction operator or a reconstruction operator, is selected according to the operator weights to iteratively update the intervention, yielding $\xi_n$ at iteration $n$. After the $n$-th intervention measure $\xi_n$ is obtained, the following steps are executed:
Update the SCM to its counterfactual version by performing $\xi_n$ on the observed factual data.
Calculate the expected step size and update the intervention values, where the objective function contains the gradient information of the counterfactual generative transformation at $\xi_n$.
Replace the original intervention values with the updated ones to obtain the new intervention measure $\xi_n'$.
Following this, we compare $\xi_n'$ with the intervention from the previous iteration and decide whether to accept an inferior solution based on the Metropolis criterion, thereby updating the current intervention and the operator weights for the next iteration. A skeleton of this loop is sketched below.
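The skeleton below assumes that the destroy/repair operators, the refinement routine, and the evaluation via the counterfactual G-WMGP are supplied by the caller; operator selection uses roulette-wheel weights and acceptance uses a simulated-annealing Metropolis test. The structure, parameter names, and weight-update rule are illustrative, not the paper's implementation.

```python
import math
import random

def eialns(initial, evaluate, destroy_ops, repair_ops, refine,
           n_iters=200, temp=1.0, cooling=0.99, reward=1.1, decay=0.9):
    """Skeleton of the EI-ALNS loop: destroy/repair, EI refinement, Metropolis acceptance."""
    current, best = initial, initial
    f_cur = f_best = evaluate(initial)
    w_d = [1.0] * len(destroy_ops)
    w_r = [1.0] * len(repair_ops)

    for _ in range(n_iters):
        di = random.choices(range(len(destroy_ops)), weights=w_d)[0]
        ri = random.choices(range(len(repair_ops)), weights=w_r)[0]
        candidate = repair_ops[ri](destroy_ops[di](current))
        candidate = refine(candidate)             # expected-improvement local search
        f_cand = evaluate(candidate)              # e.g. via counterfactual generation

        # Metropolis criterion: always accept improvements, sometimes accept worse ones.
        if f_cand > f_cur or random.random() < math.exp((f_cand - f_cur) / temp):
            current, f_cur = candidate, f_cand
            w_d[di] *= reward; w_r[ri] *= reward  # reward the operators that helped
        else:
            w_d[di] *= decay; w_r[ri] *= decay
        if f_cand > f_best:
            best, f_best = candidate, f_cand
        temp *= cooling
    return best, f_best
```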
5. Discussion
This paper proposes a heuristic method based on a generative SCM and validates the effectiveness of the method on both synthetic data and satellite attitude control system data. The proposed G-WMGP model, constructed as a dual-layer mapping model based on causal coupling knowledge, effectively captures the complex nonlinear relationships when the fault occurs. The dual-layer structure also exhibits a strong capability to capture uncertainties in the data. Thus, we embed the SCM into the ALNS algorithm for optimizing interventions, enabling the generation of the post-intervention desired values. Additionally, we provide the computational process of the counterfactual SCM to assess the effects of performing intervention measures more accurately. In the search process for the optimal intervention, we introduce a search strategy with the purpose of enhancing the optimization capability. To validate the superiority of the proposed method, in
Section 4.1, we demonstrate that the G-WMGP model has strong learning ability and robustness to noise by comparing it with baseline models on both synthetic and real datasets. In
Section 4.2, through comparison with various heuristic algorithms and different search strategies, we argue that the superior performance of the improved ALNS algorithm lies in its ability to strike a balance between time cost and solution quality. Compared to other search strategies, it not only finds better interventions but also requires less search time.
However, when constructing the generative model, this paper considers assumptions of causal stationarity and the absence of latent confounding factors, which may partially limit the applicability of the model as follows:
The causal stationarity assumption implies that the causal relationships in the system remain unchanged over time. Because of the requirements of counterfactual generation, we only consider scenarios in which the causal structure changes when variables in the system are disturbed by external forces, such as interventions. However, in real-world settings, causal relationships may not be static. When systems are dynamic and exhibit feedback effects, static SCMs may be unable to capture the complex and evolving interactions between variables.
The no-latent-confounding assumption requires that all potential confounding variables are identified and controlled, presuming complete causal knowledge of the system. However, even if we believe that all factors related to failures have been observed and controlled, unidentified confounding variables may still exist in reality. Missing confounders could mislead us in identifying the true causes of failures, resulting in biased estimates of causal effects, as shown in
Figure 1.
The ALNS algorithm is effective in solving the intervention optimization issue, but it still has certain limitations in the following aspects:
Scalability: When the problem scale significantly increases, the performance of the ALNS algorithm may decline. This is because the algorithm needs to find effective solutions in a larger search space, and each neighborhood operation requires more computing resources and time, which may lead to an inefficient search process. Meanwhile, the ALNS algorithm typically relies on heuristic rules to guide the searching process, which may not be sufficient to capture all important dependencies and interactions in the face of complex causal structures, thus affecting the convergence of the algorithm and the quality of the final solution.
Computational complexity: When dealing with dynamically changing datasets, the ALNS algorithm may need to frequently update its search strategies or adjust parameters, which could affect computational efficiency and the feasibility of real-time applications. Additionally, if there are complex causal relationships among variables in the dataset, it may be necessary to design specific neighborhood structures or customized improvement strategies, thereby increasing the complexity of the algorithm.
Future work could focus on developing more reasonable causal models or capturing causal relationships that are dynamically changing. For intervention optimization algorithms, it is important to consider whether the computational efficiency of the algorithms can be improved, for instance, by addressing complex causal structures through the design of neighborhood structures. Additionally, investigating whether these algorithms can be deployed for real-time optimization is also worth exploring.