Article

Design of Optimal Intervention Based on a Generative Structural Causal Model

College of Systems Engineering, National University of Defense Technology, Changsha 410003, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(20), 3172; https://doi.org/10.3390/math12203172
Submission received: 5 September 2024 / Revised: 26 September 2024 / Accepted: 8 October 2024 / Published: 10 October 2024

Abstract

In the industrial sector, malfunctions of equipment that occur during production and operation typically necessitate human intervention to restore normal functionality. The question that follows is how to design and optimize intervention measures based on models of actual intervention scenarios so that faults are resolved effectively. In order to address this issue, we propose an improved heuristic method based on a causal generative model for the design of optimal interventions, aiming to determine the best intervention measure by analyzing the causal effects among variables. We first construct a dual-layer mapping model grounded in the causal relationships among interrelated variables to generate counterfactual data and assess the effectiveness of intervention measures. Subsequently, given the developed fault intervention scenarios, an adaptive large neighborhood search (ALNS) algorithm employing the expected improvement strategy is utilized to optimize the interventions. This method provides guidance for decision-making during equipment operation and maintenance, and the effectiveness of the proposed model and search strategy has been validated through tests on synthetic datasets and a satellite attitude control system dataset.

1. Introduction

Intervention is typically defined as a deliberate action or strategy aimed at changing the state of one or several variables, thereby affecting the overall behavior or output of the system. In fields such as medicine and social science, intervention often takes the form of controlled experiments, such as administering a specific dosage of a drug to patients in order to study its therapeutic effects. When measuring the impact of teaching methods on students, the specific content of the intervention might involve choosing between traditional and digital teaching approaches. As the theory of experimental design has evolved, the forms of experiment used to evaluate the effectiveness of interventions have become increasingly diverse, including methods such as randomized controlled trials (RCT) [1], among others.
As a term with rich meanings, intervention is often abstractly expressed in statistics and causal reasoning as the operator $do(X = x)$ proposed by Judea Pearl. This notation reflects the general meaning of intervention, which is to force one or more variables in the model to take a specific value or distribution [2]. The introduction of the do-calculus simplifies and distills the original intervention measure and greatly broadens the boundary of intervention research. In particular, when an intervention is difficult to perform in reality because of ethical and legal restrictions, or because high implementation costs would be required, the do operator converts practical impossibility into theoretical possibility, allowing researchers to evaluate potential results through theoretical models without carrying out actual interventions. This counterfactual analysis method has been widely used in causal reasoning.
We all know that in the real world there are complex connections between different entities. Broadly speaking, these connections can be categorized into two types: correlation and causality. Research on interventions and their impacts typically requires a thorough understanding of the true causal relationships. If we simply observe a certain correlation between variables and rashly intervene, we may find that the measure does not have the desired effect because the relationship is spurious, induced by underlying common factors, and resources are thus wasted [3].
For instance, suppose there are three variables that conform to the relationships illustrated in Figure 1, where X is the common cause of Y and Z, but X may not be observed due to incomplete prior knowledge. It can be found that since Y and Z are simultaneously affected by X, changes in X will invariably result in correlated fluctuations in the output data of Y and Z. If we do not know the existence of potential factor X in advance, we might be misled by this apparent correlation and make incorrect intervention decisions.
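As a quick illustration of this pitfall, the following minimal simulation (with hypothetical coefficients chosen purely for illustration, not taken from the paper) generates data from the structure in Figure 1: Y and Z are strongly correlated in the observational data, yet forcing Y to a value leaves Z unchanged, so intervening on Y in order to control Z would be wasted effort.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Observational regime: X is a hidden common cause of Y and Z (Figure 1).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
z = -1.5 * x + rng.normal(scale=0.5, size=n)
print("observational corr(Y, Z):", np.corrcoef(y, z)[0, 1])      # strongly negative

# Interventional regime do(Y = 3): Y is forced, its dependence on X is cut,
# so Z keeps its original mechanism and is unaffected by the intervention.
y_do = np.full(n, 3.0)
z_do = -1.5 * x + rng.normal(scale=0.5, size=n)
print("mean of Z before/after do(Y=3):", z.mean(), z_do.mean())   # essentially unchanged
```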
In order to accurately describe the real relationships between variables, Pearl introduced the concept of a structural causal model (SCM). SCMs are mathematical models that relate latent and observed variables through a series of structural equations [4]. These models provide a systematic approach to formally articulate the causal relationships among variables and have become increasingly significant in the realm of causal reasoning. In particular, when it is challenging to assess the effectiveness of an intervention by practical means, constructing a SCM is crucial. Through a SCM, the do operator can link the observed distribution to the intervention distribution, thereby estimating the nominal post-intervention value of a variable [5].
Furthermore, a key question for decision-makers is how to select a good intervention. Lee et al. [6] pointed out that the design of optimal intervention can be regarded as optimizing an expected strategy that contains two dimensions: observation and action. Simply put, it involves making the best choice based on the observed information when there are alternative interventions. Current methods that combine causal models with decision frameworks for the design of intervention primarily include two categories: Bayesian optimization and the causal bandit algorithm.
Bayesian optimization [7,8], as a global optimization method, employs a prior probability model and an acquisition function to optimize computationally expensive and non-analytical objective functions. With the increasing use of causal Bayesian networks (CBN) [9,10], incorporating causal information into the Bayesian optimization (BO) can better guide the search process for interventions. Causal Bayesian optimization (CBO) formulates a series of interventions by maximizing a target function. This method leverages causal relationships between variables to enhance the reliability of decision-making, and it effectively handles uncertainties through Bayesian techniques, delivering more robust optimization results. Meanwhile, it can adapt strategies in dynamic environments to accommodate new data and changes. Due to these advantages, causal Bayesian optimization has garnered considerable attention. Aglietti et al. [11] introduced a causal Bayesian optimization algorithm to infer optimal decision strategies using causal information available, aiming to address a new class of problems called causal global optimization (CGO). Further, due to the potential constraints in actual intervention scenarios, methods like constrained causal Bayesian optimization (cCBO) [12] have been proposed based on the theory of CBO. Considering that the choice of intervention measures is affected by the uncertainty of the causal structure, Branchini et al. [13] extended the causal Bayesian optimization framework and proposed a causal entropy optimization algorithm (CEO) that incorporates uncertainty information into the decision-making process. Zhang et al. [14] developed a causal active learning framework and designed a closed-form acquisition function to facilitate the rapid optimization of interventions. Similarly, Li et al. [15] designed the acquisition function in Bayesian optimization to make the causal reasoning process more robust to noise interference. Scott et al. [16] proposed a model-based CBO (MCBO) method that models the uncertainties of SCMs. Ren et al. [17] introduced a CBO method based on learning the distribution of exogenous variables to address the common issue of neglecting exogenous variables in existing methods. However, when using the CBO method, it is usually assumed that the causal graph of variables is known, while the structural equations are unknown, and probabilistic surrogate models, often linear Gaussian models, are employed as substitutes. Simple linear Gaussian models cannot model complex causal relationships, which can lead to unreasonable intervention designs. Additionally, CBO involves sequential decision-making, which has a high computational complexity that may exceed the data sampling time, rendering the algorithm impractical for real-world applications. In contrast, heuristic optimization offers rapid convergence and higher computational efficiency, effectively addressing these issues.
The design of intervention measures is essentially a decision optimization problem. Therefore, it was conceptualized as a stochastic multi-armed bandit problem with side information, where a good intervention is defined as one that maximizes the expected reward [18,19]. In the context of the causal bandit problem, Kroon et al. [20] introduced an estimator based on the property of separating sets into the discrete causal bandit algorithm, allowing them to learn better interventions without requiring the incorporation of prior causal knowledge. Bilodeau et al. [21] considered the case where the observed variables might be d-separators after intervention and proposed an adaptive optimal regret algorithm, extending its adaptability to conditions such as the front-door criterion. Moreover, methods like dynamic programming [22] and reinforcement learning [23] have also been employed to find superior interventions. The causal bandit algorithm utilizes causal reasoning to understand the impact of actions, thereby making more effective decisions based on causal relationships rather than mere correlations. It can also tailor actions to specific user environments by considering individual characteristics and previous interactions to improve outcomes. By focusing on causal effects, this type of algorithm can reduce the exploration needed to identify optimal actions. However, the effectiveness of such methods depends on the validity of the causal models used. Incorrect assumptions can lead to misleading results, and these methods can be computationally intensive, especially when dealing with large datasets and complex models, which may limit their scalability.
Although a large number of methods have been developed to study the optimization design of interventions based on causal models, there are still several shortcomings in the current approaches:
  • Model-based approaches often employ simplistic causal modeling such as the Gaussian process (GP) [9,11,13] and linear Gaussian models [15,18]. However, in practical problems, target variables may exhibit non-Gaussian distributions, making traditional Gaussian-based models insufficient for capturing complex nonlinear relationships and non-stationary noise distributions among variables.
  • The construction of intervention scenarios remains incomplete. For instance, causal bandit algorithms for multi-armed bandit problems typically focus on discrete or limited interventions. But, in reality, any value within the feasible range is possible, not just limited discrete ones. Furthermore, current researchers seldom consider optimization decisions under constraints [13], especially when we do not want other unmanipulated variables to deviate from our expectations after intervening on the target variable. This consideration cannot be ignored in practical applications.
  • Heuristic algorithms are increasingly employed for intervention optimization [24,25]. However, in fault intervention scenarios, due to the large solution space of feasible interventions, applying optimization algorithms directly to such problems often results in becoming stuck in local optima, making it difficult to find better solutions quickly. Therefore, improvement in the optimization process of the algorithm is necessary.
Based on the discussion above, we propose an improved heuristic algorithm based on a causal generative model with the aim of designing optimal interventions and conducting corresponding experiments to demonstrate the effectiveness of this method. The main contributions of this paper are summarized as follows:
  • A dual-layer mapping generative SCM is developed to characterize the causal structure among fault-related variables, along with the updating process of the counterfactual SCM following an intervention.
  • The fault intervention problem is modeled based on real-world scenarios and formally described as a constrained optimization problem aimed at maximizing intervention utility.
  • An ALNS algorithm based on the expected improvement strategy (EI-ALNS) is proposed to address the optimization problem of fault intervention, and experiments demonstrate the superiority of this heuristic method in optimizing intervention measures.
The value of the method for intervention design in this paper lies in its ability to provide guidance for decision-making during equipment operation and maintenance. Specifically, through real-time monitoring data, operators can assess whether the equipment is likely to fail in the near future, as reflected in changes in relevant parameters, and proactively gain insight into potential failures using predictive analytics and other methods. Based on this, operators can evaluate the effects of specific interventions (such as adjusting certain parameters within the system) in advance using causal generative models and optimize these interventions with improved heuristic algorithms, enabling them to implement effective measures when a failure occurs.
The remainder of this paper is organized as follows: Section 2 presents some background knowledge. Section 3 provides a detailed description of the implementation process of the method. Section 4 validates the proposed model and search strategy on both synthetic data and satellite attitude control system data. Section 5 summarizes the advantages and disadvantages of the method.

2. Background

2.1. Generative Structural Causal Model

Let $\{x_t\}_{t=1}^{M}$ denote time-series data, where $x_t = \{x_t^i\}_{i=1}^{n} \in \mathbb{R}^n$ and $n$ represents the number of variables. Suppose the causal relationships among the variables are known a priori, thus allowing us to describe the generation mechanism of these variables through SCMs. According to Strobl [26], a complete SCM $\mathcal{M} = \langle Va, F, P, G \rangle$ typically includes the following components:
1.
Set of variables $Va = \{V, U\}$: When considering causal relationships among system variables, it is generally believed that there are two types of variables—endogenous variables $V$ and exogenous variables $U$. The endogenous variables $V = \{V_i\}_{i=1}^{n}$ constitute the main body of the SCM and are directly influenced by other variables within the model, while the exogenous variables $U = \{U_i\}_{i=1}^{n}$ indicate sources that are external to the causal relationships, typically representing random factors or dependencies not considered in the model.
2.
Structural equations $F = \{F_i\}_{i=1}^{n}$: Set of functions that characterize the causal relationships among the endogenous variables:
$$V_i = F_i\left(Pa(V_i), U_i\right)$$
Here, $F_i$ is the structural function corresponding to the endogenous variable $V_i$, and $Pa(V_i)$ represents the parent variables of $V_i$, i.e., the set of variables that directly influence $V_i$.
3.
Joint probability density of exogenous variables $P(U)$: Determines the randomness of the entire model and impacts the accuracy of analysis and prediction when using the model.
4.
Causal graph $G = (Va, E)$: Visually describes the causal relationships between variables and satisfies the constraints of a directed acyclic graph (DAG). $E$ indicates the directed edges in $G$.
In this paper, SCM is treated as a generative model in order to answer the following question: “What would the data look like after the intervention?” The role of the generative SCM primarily encompasses the following two aspects. On one hand, it can capture complex relationships within the causal structure and simulate the impacts of intervened variables on other variables in the system. On the other hand, it provides a formalized counterfactual perspective, which allows the model to be trained to generate data under hypothetical scenarios, thereby facilitating the analysis and evaluation of hypothetical actions.
It should be noted that the generative SCM described in this paper is a surrogate model. It requires training with observational data to approximately simulate the data generation process for each variable. The causal structure $G$ is known a priori, predetermined by expert information, which is a critical assumption in constructing the model.
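To make the components $\langle Va, F, P, G \rangle$ concrete, the following is a minimal sketch of a generative SCM in Python. The class name, toy structure, and functional forms are illustrative assumptions rather than the model constructed later in the paper.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SCM:
    """Minimal SCM M = <Va, F, P, G>: the parent map encodes the DAG G, the
    equations encode F, the noise samplers encode P(U), and the keys play the role of V."""
    parents: Dict[str, List[str]]        # DAG G as a parent map
    equations: Dict[str, Callable]       # structural functions F_i
    noise: Dict[str, Callable]           # exogenous samplers U_i

    def sample(self, m: int) -> Dict[str, np.ndarray]:
        """Ancestral sampling: each V_i = F_i(Pa(V_i), U_i)."""
        data: Dict[str, np.ndarray] = {}
        remaining = list(self.parents)
        while remaining:
            for v in list(remaining):
                if all(p in data for p in self.parents[v]):
                    u = self.noise[v](m)
                    pa = [data[p] for p in self.parents[v]]
                    data[v] = self.equations[v](*pa, u)
                    remaining.remove(v)
        return data

# Toy instantiation: X -> Y, X -> Z (the structure of Figure 1).
rng = np.random.default_rng(0)
scm = SCM(
    parents={"X": [], "Y": ["X"], "Z": ["X"]},
    equations={"X": lambda u: u,
               "Y": lambda x, u: np.tanh(x) + u,
               "Z": lambda x, u: 0.5 * x ** 2 + u},
    noise={v: (lambda m: rng.normal(scale=0.3, size=m)) for v in ["X", "Y", "Z"]},
)
print({k: v[:3] for k, v in scm.sample(5).items()})
```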

2.2. Post-Nonlinear Causal Model

The design of the generative model in this paper, which incorporates a dual-layer mapping structure, draws formal inspiration from the nonlinear acyclic causal model proposed by Zhang et al. [27], also known as the post-nonlinear causal (PNL) model. The PNL model is one type of SCM, but it has a more complex functional form. Mathematically, the PNL representation of each variable can typically be articulated as follows:
$$x^{(i)} = F_{i,2}\left(F_{i,1}\left(Pa_G(i)\right) + \varepsilon^{(i)}\right), \quad i = 1, 2, \ldots, n$$
where $F_{i,1}$ denotes the nonlinear effect of the causal variables $Pa_G(i)$, $\varepsilon^{(i)}$ is the independent disturbance, and $F_{i,2}$ denotes the post-nonlinear distortion in variable $x^{(i)}$.
Specifically, the two nonlinear processing layers of the PNL model can be viewed as a “progressive” relationship, where the data generation process for each variable can be divided into the following two stages:
1.
The first stage
The primary goal of the first layer is to extract preliminary nonlinear features from the input causal variables. In this layer, we construct a nonlinear transformation of the parents $Pa_G(i)$, denoted by $F_{i,1}\left(Pa_G(i)\right)$, plus some disturbance $\varepsilon^{(i)}$, thereby obtaining the intermediate output $u^{(i)} = F_{i,1}\left(Pa_G(i)\right) + \varepsilon^{(i)}$ generated by this layer.
2.
The second stage
Nonlinear distortion is applied to the output of the first stage $u^{(i)}$, which is used as input for the transformation of the second layer, $x^{(i)} = F_{i,2}\left(u^{(i)}\right)$, thereby producing the final output $x^{(i)}$.
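A minimal sketch of this two-stage generation process follows; the specific functions, noise scale, and data ranges are arbitrary choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 1000
pa = rng.uniform(-2, 2, size=(m, 2))          # parent values Pa_G(i), two causes

# Stage 1: nonlinear effect of the causes plus independent disturbance eps^(i).
f1 = lambda p: np.sin(p[:, 0]) + 0.5 * p[:, 1] ** 2
u = f1(pa) + rng.normal(scale=0.2, size=m)    # intermediate output u^(i)

# Stage 2: post-nonlinear distortion (e.g., a sensor/measurement nonlinearity).
f2 = lambda z: np.log1p(np.exp(z))            # monotone, hence invertible on the data range
x = f2(u)                                     # final observed value x^(i)
```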
Compared to the traditional linear SCMs [14] and single-layer nonlinear SCMs [28], the expressiveness and superiority of the PNL model are primarily reflected in the following aspects:
  • Modeling capability of nonlinear relationships
Traditional linear causal models $x^{(i)} = \sum_{k} B_{ik}\, x^{(k)} + \varepsilon^{(i)}$ can only describe direct linear relationships between variables, while single-layer nonlinear causal models $x^{(i)} = F_i\left(Pa_G(i)\right) + \varepsilon^{(i)}$ often only depict relatively simple dependency structures. The PNL model, through the introduction of two nonlinear functions $F_{i,1}$ and $F_{i,2}$, is able to decompose complex nonlinear causal relationships more finely, thereby capturing the nonlinear effect of the causes [29].
  • Flexible control of error structures
It is known that in linear and single-layer nonlinear models, the error terms are additive noise. In the PNL model, however, the error term is wrapped by the post-nonlinear function, allowing the model to handle non-stationary noise distributions and external disturbances in a more flexible manner [30], depicting noise effects in the transmission from $Pa_G(i)$ to $x^{(i)}$ that follow unconventional distributions.
  • Adaptability and robustness of the model
The nonlinear component can be customized to specific data characteristics, such as the nonlinear distortion caused by sensors or measurement. This makes the model more adaptable and robust when facing varied and heterogeneous real-world data.

2.3. Counterfactual Updating in SCM

When we intervene on variables in the endogenous set $V$, the original SCM may no longer be able to describe the new situation effectively. This is because the intervention operation forces the variable to take on a specific value, so the intervened variable is no longer influenced by its parent variables, which is reflected in the DAG by deleting the incoming edges of the corresponding variable. The causal structure is revised in the “counterfactual world”, leading to a change in the data generation mechanism and thereby simulating counterfactual occurrences that do not exist in the real world. Formally, suppose we perform the intervention $do\left(Pa_G(i) = X_{iv}\right)$, replacing $Pa_G(i)$ with the intervention value $X_{iv}$. The original SCM $\mathcal{M}$ then transforms into the counterfactual SCM $\tilde{\mathcal{M}} = \langle Va, \tilde{F}, \tilde{P}, \tilde{G} \rangle$, where $\tilde{F}$ represents the counterfactual mappings that need to be relearned, $\tilde{P}$ is the probability density of the posterior noise, and $\tilde{G}$ is the post-intervention DAG.
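The graph surgery described above can be sketched as a small helper that deletes the incoming edges of intervened variables; the function name and toy graph below are illustrative assumptions.

```python
from typing import Dict, List

def intervene_graph(parents: Dict[str, List[str]], targets: List[str]) -> Dict[str, List[str]]:
    """Graph surgery for do(targets = values): delete all incoming edges of the
    intervened variables, leaving the rest of the DAG unchanged."""
    return {v: ([] if v in targets else list(pa)) for v, pa in parents.items()}

# X -> Y -> Z; intervening on Y removes the edge X -> Y.
g = {"X": [], "Y": ["X"], "Z": ["Y"]}
print(intervene_graph(g, ["Y"]))   # {'X': [], 'Y': [], 'Z': ['Y']}
```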

2.4. Intervention Optimization

In practical operations, when intervening on a fault, we typically consider how to balance the cost and benefit of the intervention measures, because applying intervention measures inevitably consumes resources. Specifically, if the intervention intensity is too low, the control over the fault may not achieve the desired effect. On the other hand, excessive intervention could lead to unnecessary waste of manpower and material resources. Therefore, a good intervention measure should have a high utility value, i.e., it should achieve better intervention results at a lower cost. To achieve the optimal intervention intensity, we formulate the intervention optimization as a classic constrained optimization problem, and then we can develop a heuristic algorithm to solve it.

3. Proposed Method

The method for designing optimal intervention proposed in this paper primarily consists of the following two components, as illustrated in Figure 2:
1.
Construction of the generative model
Before devising intervention measures, the first consideration is how to evaluate the impact of intervention measures on the fault. To solve this, we introduce a generative SCM based on a dual-layer mapping structure called graph-based warped multi-task Gaussian process (G-WMGP), which includes two sub-models: G-MTGP and normalizing flows. We also detail the updating process through which the G-WMGP model is adapted into the counterfactual G-WMGP model in the context of an intervention, thereby providing a theoretical foundation for the generation of post-intervention data.
2.
Formulation of intervention measures
To transform the intervention decision into a constrained optimization problem, we define the objective function and the specific form of the intervention measures, where the intervention measures align with the “3W” intuitive thinking of decision-makers in real-world scenarios: the target of the intervention (“which”), the timing and duration of the intervention (“when”), and the specific values of the intervention (“what”). Given this, we consider employing a heuristic optimization algorithm to identify high-quality intervention measures, where an intervention measure is defined as an individual solution. The objective function is calculated based on the counterfactual data, which are generated by the generative SCM, and the intervention measure generated in each iteration will act as input to “urge” the update of the SCM. Moreover, we propose a search strategy aimed at addressing efficiency (i.e., how to quickly find an optimal solution) and accuracy (i.e., how to enhance the quality of the solution) issues during the algorithm’s execution.

3.1. Data Generation Based on SCM

3.1.1. Generative Model Construction

In order to facilitate the construction of the model, we assume that the causal relationships among variables satisfy the condition of causal stationarity [31], which posits that the causal structure $G$ does not change over time. Further, considering that the causal links between variables exhibit relatively complex nonlinear patterns, we employ the structure of a post-nonlinear causal model [32] to simulate the data generation process of the variables of interest, i.e., for each variable $V_i$:
$$x^{(i)} = g_i\left(f_i\left(Pa_G(i)\right) + \varepsilon^{(i)}\right), \quad i = 1, 2, \ldots, n$$
Here, both $g_i$ and $f_i$ represent nonlinear mappings, and $\varepsilon^{(i)}$ represents mutually independent noise that reflects the potential influence of the exogenous variable $U_i$ on the system variable $V_i$, where $\varepsilon^{(i)} \sim \mathcal{N}\left(0, \sigma_i^2\right)$. The parent variables of $V_i$ in the causal structure $G$ are denoted as $Pa_G(i)$, where $Pa_G(i) \subseteq V$, and the bold form $\mathbf{Pa}_G(i)$ represents the values of the corresponding variables. Intuitively, serving as the input to the generative model, $Pa_G(i)$ directly impacts the generated data of $V_i$ [30]. Therefore, we hope to construct a transformation $F_i = g_i(f_i(\cdot))$ that can effectively characterize the data generation mechanism when $V_i$ is influenced by $Pa_G(i)$.
To achieve this, we introduce a generative SCM called G-WMGP, which is based on the PNL dual-layer mapping architecture, to design the specific functional form of each transformation $F_i \in F$. The principle involves mapping the original input through a warping transformation to a new input space, where a graph-based multi-task Gaussian process (G-MTGP) is established. The inner mapping $f_i$ uses the G-MTGP model, an extension of the GP denoted as $f_i \sim \mathcal{GP}\left(\mu, K_{\theta,i}\right)$, whose inputs and outputs are based on the many-to-one relationships among variables in the causal graph $G$. In the multi-task learning framework, the correlations among tasks are utilized to improve learning performance; thus, the overall dataset can be viewed as a collection of $S$ tasks, with each task $r_j$, $j \in Pa_G(i)$, comprising $M$ samples. The covariance matrix $K_{\theta,i}$ encompasses the covariance between different tasks and among different data points within the same task, characterized in this paper by a coregionalization kernel as follows:
$$k_{\theta,i}\left(Pa_G(i)^{(m)}, Pa_G(i)^{(m')}\right)_{j,j'} = \sigma^2\left(1 + \frac{\sqrt{3}}{l}\,\alpha\right)\exp\left(-\frac{\sqrt{3}}{l}\,\alpha\right) Q\left(r_j, r_{j'}\right)$$
$$Q\left(r_j, r_{j'}\right) = \left[w w^{T} + \mathrm{diag}(u)\right]_{j,j'}$$
Here, $m, m' = 1, 2, \ldots, M$, and $\alpha = \left\| Pa_G(i)^{(m)} - Pa_G(i)^{(m')} \right\|_2$ denotes the Euclidean distance between the data points at $m$ and $m'$, where $Pa_G(i)^{(m)}, Pa_G(i)^{(m')} \in \mathbb{R}^n$. $\sigma^2$ and $l$ represent scale factors, and $\theta$ summarizes the parameters of the multi-task Gaussian process. The positive definite matrix $Q\left(r_j, r_{j'}\right)$ describes the correlation between tasks $r_j$ and $r_{j'}$ and corresponds to the variance of task $r_j$ when $j = j'$.
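A rough sketch of such a coregionalization kernel is given below, assuming the Matérn-3/2 form reconstructed above and a Kronecker combination of the task matrix $Q$ with the input kernel; the exact combination used in the paper may differ, and all names are illustrative.

```python
import numpy as np

def coregional_matern32(PA, w, u_diag, sigma2=1.0, length=1.0):
    """Coregionalization kernel: a Matern-3/2 kernel over the parent inputs,
    combined with the task-correlation matrix Q = w w^T + diag(u).
    PA: (M, d) parent values; w: (S,) task weights; u_diag: (S,) task variances."""
    diff = PA[:, None, :] - PA[None, :, :]
    alpha = np.linalg.norm(diff, axis=-1)                        # pairwise Euclidean distances
    k_input = sigma2 * (1.0 + np.sqrt(3.0) / length * alpha) \
              * np.exp(-np.sqrt(3.0) / length * alpha)           # (M, M) input kernel
    Q = np.outer(w, w) + np.diag(u_diag)                         # (S, S) task correlations
    return np.kron(Q, k_input)                                   # (S*M, S*M) full covariance

K = coregional_matern32(np.random.default_rng(0).normal(size=(10, 2)),
                        w=np.array([1.0, 0.5]), u_diag=np.array([0.1, 0.2]))
print(K.shape)   # (20, 20)
```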
The outer mapping $g_i$ employs a normalizing-flow generative architecture, aimed at constructing flexible probability distributions over continuous random variables. The main idea is to express $x^{(i)}$ as the transformation $g_i$ of a real vector $u^{(i)}$ sampled from $p_u$:
$$x^{(i)} = g_i\left(u^{(i)}\right)$$
where $p_u$ is regarded as the base distribution of the flow-based model, which equals $\mathcal{N}\left(\mu, K_{\theta,i} + \sigma_i^2 I\right)$ due to the construction of the G-MTGP model. To enhance the learning capability of the flow model, $g_i$ is typically composed of $n$ sub-flows, i.e., $g_i = T_1 \circ T_2 \circ \cdots \circ T_n$, where each of them can be a different flow type. When building the flow-based model, it is necessary to ensure that $g_i$ is invertible and that both $g_i$ and its inverse $g_i^{-1}$ are differentiable. According to the property of the inverse, $g_i^{-1}$ is given by the following:
$$g_i^{-1} = T_n^{-1} \circ T_{n-1}^{-1} \circ \cdots \circ T_1^{-1}$$
Under these conditions, the density of $x^{(i)}$ can be obtained through the change of variables:
$$p_x\left(x^{(i)}\right) = p_u\left(u^{(i)}\right)\left|\det J_{g_i}\left(u^{(i)}\right)\right|^{-1}$$
where $\det J_{g_i}\left(u^{(i)}\right)$ denotes the Jacobian determinant of $g_i$, whose absolute value quantifies the relative change of a small neighborhood around $u^{(i)}$ under the transformation $g_i$. Because $u^{(i)} = g_i^{-1}\left(x^{(i)}\right)$, according to Equation (6), $p_x\left(x^{(i)}\right)$ can be written as follows:
$$p_x\left(x^{(i)}\right) = p_u\left(g_i^{-1}\left(x^{(i)}\right)\right)\left|\det J_{g_i^{-1}}\left(x^{(i)}\right)\right|$$
Similarly, when $g_i$ is composed of sub-flows, its Jacobian determinant is calculated as follows:
$$\det J_{g_i}\left(u^{(i)}\right) = \det J_{T_1}\left(T_2\left(u^{(i)}\right)\right)\,\det J_{T_2}\left(T_3\left(u^{(i)}\right)\right)\cdots\det J_{T_{n-1}}\left(T_n\left(u^{(i)}\right)\right)$$
The loss function used for model training can be characterized by the divergence between the flow model $g_i\left(x^{(i)}; \varphi\right)$ and the target distribution $p_x\left(x^{(i)}\right)$, specifically employing the forward Kullback–Leibler divergence, approximated using the Monte Carlo method [33]:
$$\mathcal{L}(\theta, \varphi) = \frac{1}{M}\sum_{m=1}^{M}\left[\frac{1}{2}\, y^{T}\left(K_{\theta,i} + \sigma_i^2 I\right)^{-1} y + \frac{1}{2}\log\left|K_{\theta,i} + \sigma_i^2 I\right| + \frac{MS}{2}\log(2\pi) - \log\left|\det J_{g_i^{-1}}\left(x_m^{(i)}; \varphi\right)\right|\right] + \mathrm{const}$$
where $y := g_i^{-1}\left(x_m^{(i)}; \varphi\right) - \mu$, and $\varphi$ denotes the parameters of the flow model. $\det J_{g_i^{-1}}$ represents the Jacobian determinant of the inverse transformation $g_i^{-1}$, calculated as follows:
$$\det J_{g_i^{-1}}\left(x_m^{(i)}; \varphi\right) = \prod_{m=1}^{M}\frac{\partial g_i^{-1}\left(x_m^{(i)}; \varphi\right)}{\partial x^{(i)}}$$
Through Equation (5), the parameters $\{\theta, \varphi\}$ can be learned from the observational data $X_F^{(i)} = \left(x^{(i)}, Pa_G(i)\right)_F$.
By constructing a multi-layered nested architecture, the G-WMGP model not only more accurately captures the latent structures and nonlinear dependencies in the data, but also provides high-quality data generation and explicit modeling of complex distributions, while maintaining the flexibility and interpretability of the model.
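For intuition, the following sketch evaluates a warped-GP negative log-likelihood of the kind described above, using a simple hand-written monotone map in place of the learned normalizing flow and a placeholder RBF kernel; the function names and the specific transformation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gp_nll(y, K, sigma2):
    """Gaussian terms of the loss: 0.5 y^T (K + s2 I)^-1 y + 0.5 log|K + s2 I| (constants dropped)."""
    A = K + sigma2 * np.eye(len(y))
    sign, logdet = np.linalg.slogdet(A)
    return 0.5 * y @ np.linalg.solve(A, y) + 0.5 * logdet

def warped_gp_nll(x, K, sigma2, a, b):
    """Loss for a warped GP with a simple monotone map g^-1(x) = a*x + b*softplus(x), a, b > 0
    (a stand-in for the flow layers). Adds the change-of-variables correction -log|det J_{g^-1}|."""
    u = a * x + b * np.log1p(np.exp(x))          # g^{-1}(x); zero prior mean assumed
    dudx = a + b / (1.0 + np.exp(-x))            # elementwise derivative of g^{-1}
    return gp_nll(u, K, sigma2) - np.sum(np.log(dudx))

rng = np.random.default_rng(0)
x_obs = rng.normal(size=8)
K = np.exp(-0.5 * (x_obs[:, None] - x_obs[None, :]) ** 2)   # placeholder RBF kernel
print(warped_gp_nll(x_obs, K, sigma2=0.1, a=1.0, b=0.5))
```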

3.1.2. Counterfactual SCM Calculation

According to the three-step method proposed by Pearl, the post-intervention counterfactual data $x_{CF}^{(i)}$ can be obtained through the following process [28,33]:
1.
Abduction: First, calculate the posterior noise distribution for the transformation $F_i$ based on $X_F^{(i)}$, which conforms to a Gaussian distribution according to Weilbach et al. [29], as follows:
$$\varepsilon_F^{(i)} \mid X_F^{(i)} \sim \mathcal{N}\left(\hat{\mu}_{\varepsilon,i},\, \hat{\sigma}_{\varepsilon,i}^2\right)$$
where
$$\hat{\mu}_{\varepsilon,i} = \sigma_i^2\left(K_{\theta,i} + \sigma_i^2 I\right)^{-1}\left(g_i^{-1}\left(X_F^{(i)}\right) - \mu\right)$$
$$\hat{\sigma}_{\varepsilon,i}^2 = \sigma_i^2\, K_{\theta,i}\left(K_{\theta,i} + \sigma_i^2 I\right)^{-1}$$
Here, $K_{\theta,i}$ is calculated with $Pa_G(i)_F$.
2.
Action: Then, update the structural equation $F_i$ with the intervention measure $do\left(Pa_G(i) = X_{iv}\right)$, i.e., replacing $Pa_G(i)$ with the value $X_{iv}$:
$$x_{CF}^{(i)} \mid do\left(Pa_G(i) = X_{iv}\right) = g_i\left(f_i\left(X_{iv}\right) + \varepsilon_F^{(i)}\right)$$
3.
Prediction: Finally, calculate the predictive posterior distribution of $Z_i = f_i\left(X_{iv}\right) + \varepsilon_F^{(i)}$ at $X_{iv}$:
$$Z_i \mid X_F^{(i)} \sim \mathcal{N}\left(\tilde{\mu}_i + \hat{\mu}_{\varepsilon,i},\, \tilde{\sigma}_i^2 + \hat{\sigma}_{\varepsilon,i}^2\right)$$
where $\tilde{\mu}_i$ and $\tilde{\sigma}_i$ are given by
$$\tilde{\mu}_i = \mu + \tilde{K}_i\left(K_{\theta,i} + \sigma_i^2 I\right)^{-1}\left(x_F^{(i)} - \mu\right)$$
$$\tilde{\sigma}_i^2 = \tilde{k}_i - \tilde{K}_i\left(K_{\theta,i} + \sigma_i^2 I\right)^{-1}\tilde{K}_i^{T}$$
$\tilde{k}_i$ is the covariance matrix among $\left\{X_{iv}^{(m)}\right\}_{m=1}^{M}$, while $\tilde{K}_i$ is the cross-covariance between $X_{iv}$ and $Pa_G(i)_F$. We can then obtain the post-intervention counterfactual values through samples from the posterior distribution of $Z_i$:
$$x_{CF}^{(i)} = g_i\left(z_{CF}^{(i)}\right)$$
By constructing the SCM $\tilde{\mathcal{M}}$, it is possible to derive a counterfactual mapping $\tilde{F}_i$ from the causes of $V_i$ to $V_i$ when an intervention measure is implemented on $Pa_G(i)$.
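The three steps can be sketched as follows for a single variable, assuming the Gaussian posterior formulas reconstructed above; the function signature and kernel arguments are illustrative, not the paper's implementation.

```python
import numpy as np

def counterfactual_values(x_F, g, g_inv, K, K_star, k_star, sigma2, mu=0.0, rng=None):
    """Sketch of abduction / action / prediction for one variable V_i.
    x_F: factual values of V_i on the fault interval (length M);
    g, g_inv: outer flow and its inverse; K: kernel on the factual parents Pa_G(i)_F;
    K_star: cross-kernel between the intervention inputs X_iv and the factual parents;
    k_star: kernel among the intervention inputs; sigma2: noise variance."""
    rng = rng if rng is not None else np.random.default_rng(0)
    u_F = g_inv(x_F)                                      # map facts to the GP layer
    A_inv = np.linalg.inv(K + sigma2 * np.eye(len(u_F)))
    # 1. Abduction: posterior of the factual noise eps_F given X_F.
    mu_eps = sigma2 * A_inv @ (u_F - mu)
    cov_eps = sigma2 * K @ A_inv
    # 2./3. Action + prediction: posterior of Z = f(X_iv) + eps_F at the intervention inputs.
    mu_f = mu + K_star @ A_inv @ (u_F - mu)
    cov_f = k_star - K_star @ A_inv @ K_star.T
    cov = cov_f + cov_eps
    cov = 0.5 * (cov + cov.T)                             # symmetrize before sampling
    z_cf = rng.multivariate_normal(mu_f + mu_eps, cov)
    return g(z_cf)                                        # counterfactual x_CF^(i)
```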

3.2. Design of Optimal Intervention Based on EI-ALNS

In this section, we primarily discuss the process of seeking better solutions for intervention measures based on the optimization algorithm and the improved search strategy, which aims at enhancing the optimization capability of the algorithm. We first elaborate on the fault intervention issue in Section 3.2.1, and construct the scenario based on real-world fault conditions and intervention needs, while also defining the optimization objective and constraints. In Section 3.2.2, we detail the principle of the expected improvement search strategy which is abbreviated as EI, and then provide the implementation process of the EI-ALNS algorithm in the context of intervention optimization.

3.2.1. Problem Modeling

Assume that the faulty variable is denoted as $V_k$, and the interval during which the fault occurs is represented as $S$, with a length of $M$. Our goal is to implement interventions that eliminate the abnormal state of $V_k$ as much as possible. To achieve this, an intervention measure $R = \left(V_{iv}, t_{iv}, L_{iv}, X_{iv}\right)$ should include the following elements:
  • Which: The intervened variables $V_{iv}$, a discrete set. Given the causal structure, we can clearly identify which variables directly or indirectly influence $V_k$. Considering the constraints of cost and operability in real-world scenarios, and that decision-makers may hope to eradicate the fault at its source rather than merely addressing the symptoms, it is common practice to intervene on a subset $V_{iv} \subseteq Pa_G(k)$.
  • When: The intervention start time $t_{iv} \in S$, which lies within the range $[1, M - L_{iv}]$, and the step length $L_{iv}$; both are discrete values. Together they constitute the time interval of the intervention, indicating when decision-makers start the intervention and how long the intervention measures last.
  • What: The specific values for the intervention $X_{iv} \in \mathbb{R}^{|V_{iv}| \times L_{iv}}$, continuous values. The intervention values $X_{iv}$, which should lie within the normal range of each variable, reflect the intensity of the intervention through their difference from the original data.
In order to transform the intervention decision problem into a classic constrained optimization problem, we first define the following variables of interest within $G$:
  • Intervention variables: The variables about to be intervened on, denoted as $Pa_G(k)$.
  • Target variable: The faulty variable $V_k$. Ensuring that the faulty variable returns to normal functioning as much as possible is one of the core purposes of designing intervention measures.
  • Constraint variables: For certain critical variables $V_c \subset V$ in the system, it is necessary to avoid causing new faults when intervening, in order to control the adverse effects of the intervention.
Intervention values should be drawn from within the normal range of the variables, as mentioned above. In this paper, $x_{out,t}^{(i)}$ is defined as a value deviating from the expected interval $Q(i,t)$:
$$x_{out,t}^{(i)} \notin Q(i,t) := \left[\mu_{tar,t}^{(i)} - \delta_{i,t},\ \mu_{tar,t}^{(i)} + \delta_{i,t}\right]$$
where $Q(i,t)$ represents the normal interval of $V_i$ at time $t$, and $\mu_{tar,t}^{(i)}$ denotes the target value under non-intervention conditions. $\delta_{i,t}$ represents the permissible fluctuation range for the variable in factual scenarios, which can be determined based on expert knowledge or approximated using the statistical interquartile range method. Thus, we can calculate the abnormal state of $V_k$ at time $t$:
$$C_{state,t}^{(k)} = \left|\mu_t^{(k)} \mid do\left(Pa_G(k) = X_{iv}\right) - \mu_{tar,t}^{(k)}\right| - \delta_{k,t}, \quad \text{if } \mu_t^{(k)} \mid do\left(Pa_G(k) = X_{iv}\right) \notin Q(k,t)$$
The post-intervention expected values of $V_k$, denoted as $\mu_t^{(k)} \mid do\left(Pa_G(k) = X_{iv}\right)$, are derived from the aforementioned counterfactual update process. Given these definitions, the relative abnormal value can be expressed as follows:
$$C_{state}^{(k)} = \frac{1}{\bar{x}^{(k)}}\sum_{t \in S} C_{state,t}^{(k)}$$
where $\bar{x}^{(k)}$ represents the mean of the outlier data within $S$. Taking the practical context into account, we not only aim to mitigate the abnormal state of variables through intervention measures, but also seek to ensure that the cost of intervention is not excessively high. The cost of each intervened variable $V_b$ can be represented by the following:
$$C_{cost}^{(b)} = \sum_{t \in S}\frac{\left|x_{iv,t}^{(b)} - x_t^{(b)}\right|}{\left|x_t^{(b)}\right|}, \quad V_b \in V_{iv}$$
and the total cost of the intervention is the sum, as follows:
$$C_{cost} = \sum_{V_b \in V_{iv}} C_{cost}^{(b)}$$
Taking relative changes aims to eliminate differences in dimensions among variables. In summary, the objective function and the constraints of this optimization problem can be formalized as follows:
$$F(R) = C_{cost} + C_{state}^{(k)}$$
$$\text{s.t.}\quad \mu_t^{(i)} \mid do\left(Pa_G(k) = X_{iv}\right) \in Q(i,t),\ \text{for } V_i \in V_c; \qquad x_{iv,t}^{(b)} \in Q(b,t),\ \text{for } V_b \in V_{iv}$$
The optimal intervention measure is thus determined by solving the following:
$$R^{*} = \mathop{\arg\min}_{R}\ F(R), \quad \text{subject to the constraints above}$$
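A compact sketch of how the objective and feasibility check might be evaluated for a single intervened variable, under the definitions above; normalizing by the mean magnitude of the counterfactual values is a stand-in for $\bar{x}^{(k)}$ and is an assumption, as are the function and argument names.

```python
import numpy as np

def objective(x_fact, x_iv, mu_cf_k, mu_tar_k, delta_k):
    """F(R) = C_cost + C_state^(k), sketched for a single intervened variable V_b.
    x_fact, x_iv: factual and intervened values of V_b over the fault interval S;
    mu_cf_k: post-intervention expected values of the faulty variable V_k;
    mu_tar_k, delta_k: target values and allowed fluctuation of V_k."""
    c_cost = np.sum(np.abs(x_iv - x_fact) / np.abs(x_fact))        # relative intervention cost
    excess = np.abs(mu_cf_k - mu_tar_k) - delta_k                  # out-of-range amount per time step
    c_state = np.sum(np.maximum(excess, 0.0)) / np.mean(np.abs(mu_cf_k))
    return c_cost + c_state

def feasible(mu_cf_c, mu_tar_c, delta_c, x_iv, lo, hi):
    """Constraints: constraint variables stay inside Q(i,t), and the intervention
    values themselves stay inside the normal range [lo, hi] of the variable."""
    in_q = np.all(np.abs(mu_cf_c - mu_tar_c) <= delta_c)
    in_range = np.all((x_iv >= lo) & (x_iv <= hi))
    return bool(in_q and in_range)
```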

3.2.2. Expected Improvement Search Strategy

The ALNS algorithm is a powerful heuristic method that relies on its flexible neighborhood search mechanism and adaptive strategy adjustments, aiming at solving large-scale and complex constrained optimization problems [34]. The basic idea of the ALNS algorithm is to start from the current solution and dynamically search for new solutions in the solution space through a process of destruction and reconstruction, combined with adaptive mechanisms and acceptance criteria.
Although the ALNS algorithm itself has a strong global optimization capability, directly applying it to the optimization of intervention measures reveals challenges. Specifically, when there are many intervention variables and large intervention intervals, the solution form $R = \left(V_{iv}, t_{iv}, L_{iv}, X_{iv}\right)$, which includes continuous optimization of the intervention values, makes the solution space very large, so it becomes difficult to find high-quality solutions without extensive search time.
Considering the aforementioned issues and the difficulty of obtaining superior solutions using the basic form of the ALNS algorithm, we aim to refine the optimization process by combining the ideas of global and local neighborhood searches, thereby avoiding the pitfalls of local optima. Since the complex relationships among variables in the scenario have already been formalized using a SCM, we consider how to integrate information provided by the causal model to assist in guiding the search process of the algorithm. We have found that the cCBO approach improves the intervention optimization process by incorporating information from the causal surrogate model, which has been proven effective both theoretically and practically [12]. This idea has greatly inspired us to improve the algorithm.
We propose a search strategy whose core principle is that, during the iterative optimization of intervention values, the direction and step size of the neighborhood search are adaptively adjusted based on the effectiveness of the current intervention, thereby enhancing the algorithm's optimization capability. As a classic optimization search method, gradient descent updates the solution in the direction of the current negative gradient. Considering that the objective function (20) is smooth and differentiable, we can perform local optimization on the intervention values $X_{iv}$ based on gradient information. Formally, for the current intervention measure $R = \left(V_{iv}, t_{iv}, L_{iv}, X_{iv}\right)$, we derive a new one $R' = \left(V_{iv}, t_{iv}, L_{iv}, X'_{iv}\right)$ based on the following improvement rule:
$$\Delta X_{iv,t} = X'_{iv,t} - X_{iv,t} = -\sum_{n=1}^{n_0}\beta\,\nabla_{X} F\left(R^{(n)}\right)_t\, Z_t, \quad t \in \left(t_{iv},\ t_{iv} + L_{iv}\right)$$
where $n_0$ represents the number of gradient descent iterations and $\beta$ is the learning rate. $Z_t$ is the step-size control term, which adjusts the step size of the search during the iterations of the algorithm. The search step size is increased when the current intervention $R$ deviates significantly from the theoretically optimal intervention $R^{*}$; conversely, the step size is gradually reduced to achieve a more refined search. For the intervened variable $V_q \in V_{iv}$ at time $t$, we have the following:
$$Z_t^{(q)} = \alpha\left|x_{iv,t}^{(q)} - x_t^{(q)}\right|$$
Here, $\alpha$ is the control coefficient, which is adaptively adjusted during the optimization process. Given the current solution $R$, we can calculate $C_{state}^{(k)}$ for $V_k$ and $C_{cost}^{(q)}$ for $V_q$ and combine them to obtain the value of $\alpha$:
$$\alpha = C_{state}^{(k)} \times C_{cost}^{(q)}$$
This quantity intuitively represents the effectiveness of the intervention measure in fault clearance and incorporates the intervention information from the generative SCM. If the current intervention measure $R$ is of low utility, meaning the intervention input is excessive but the effect is poor, continuing in the current search direction is more likely to yield an inferior solution. At this point, the coefficient $\alpha$ increases and the search direction changes based on the evaluative information on $R$ from the generative SCM, thereby helping to escape the local region of inferior solutions. Conversely, when $R$ has high utility, it is appropriate to decrease the step size $\Delta X_{iv,t}$ controlled by $\alpha$, conducting more detailed searches in its neighborhood. During the search process, $\alpha$ adjusts adaptively without the need to set values manually. This strategy enhances the ability of the ALNS algorithm to escape local optima through a “feedback regulation” mechanism, thereby increasing the opportunity to find superior solutions (Algorithm 1).
Algorithm 1: Expected Improvement Search (pseudocode given as a figure in the original article)
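Since the pseudocode figure is not reproduced here, the following is a minimal sketch consistent with the improvement rule described above, not the authors' Algorithm 1: it uses central-difference gradients as a stand-in for the analytic gradient of the counterfactual model, keeps $\alpha$ fixed within the inner loop, and all names and defaults are illustrative.

```python
import numpy as np

def ei_improve(x_iv, x_fact, F, c_state_k, c_cost_q, beta=1e-2, n0=20, h=1e-4):
    """Expected-improvement step on the intervention values of one variable.
    F: objective as a function of the intervention values only;
    c_state_k, c_cost_q: current abnormality and cost terms, used to set alpha."""
    x = np.asarray(x_iv, dtype=float).copy()
    alpha = c_state_k * c_cost_q                      # adaptive step-size coefficient
    for _ in range(n0):
        grad = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])    # central-difference gradient of F
        Z = alpha * np.abs(x - x_fact)                # step-size control term Z_t
        x -= beta * grad * Z                          # move along the negative gradient
    return x
```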
The process of the ALNS algorithm based on the expected improvement strategy is shown in Figure 3. First, the initial parameters of the algorithm are set and an initial intervention $R_0$ is generated. Then, a destruction operator and a reconstruction operator are selected according to the operator weights to iteratively update $R_0$, which yields $R_n$ for $n = 1, 2, \ldots$. After the $n$-th intervention measure $R_n$ is obtained, the following steps are executed:
  • Update the SCM $\mathcal{M}$ to the counterfactual SCM $\tilde{\mathcal{M}}$ by performing $do\left(Pa_G(k) = X_{iv}\right)$ with the observed facts $X_F^{(k)}$.
  • Calculate the expected step size $\Delta X_{iv} = \left\{\Delta X_{iv,t}\right\}_{t \in S}$ and update $X'_{iv} = X_{iv} + \Delta X_{iv}$, where the gradient of the objective function $F(R)$ contains the gradient information of the counterfactual generative transformation $\tilde{F}_k$ at $X_{iv}$.
  • Replace $X_{iv}$ with $X'_{iv}$ to obtain the new intervention measure $R'_n$.
Following this, we compare $R'_n$ with the intervention from the previous iteration $R_{n-1}$ and decide whether to accept the inferior one based on the Metropolis criterion, thereby updating the current intervention and the operator weights for the next iteration.
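The surrounding ALNS loop can be sketched as follows; the operator interfaces, weight-update rule, and cooling schedule are simplified assumptions rather than the exact configuration used in the experiments.

```python
import math
import numpy as np

def alns(init, F, destroy_ops, repair_ops, iters=200, T0=1.0, cooling=0.99, rng=None):
    """Bare-bones ALNS loop over interventions: pick operators by weight, destroy/repair
    the current solution, accept by the Metropolis criterion, and reward operators that
    produced accepted solutions. `init` is an initial intervention R_0."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w_d, w_r = np.ones(len(destroy_ops)), np.ones(len(repair_ops))
    cur, best = init, init
    f_cur, f_best, T = F(init), F(init), T0
    for _ in range(iters):
        i = rng.choice(len(destroy_ops), p=w_d / w_d.sum())
        j = rng.choice(len(repair_ops), p=w_r / w_r.sum())
        cand = repair_ops[j](destroy_ops[i](cur, rng), rng)      # new intervention R_n
        f_cand = F(cand)
        if f_cand < f_cur or rng.random() < math.exp(-(f_cand - f_cur) / T):
            cur, f_cur = cand, f_cand                             # Metropolis acceptance
            w_d[i] += 1.0; w_r[j] += 1.0                          # reward useful operators
        if f_cand < f_best:
            best, f_best = cand, f_cand
        T *= cooling                                              # cool the temperature
    return best, f_best
```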

4. Experimental Results

4.1. Models Comparison

Generating accurate post-intervention counterfactual data is one of the key objectives of generative models based on SCM. To verify the capability of the G-WMGP model to capture the factual distribution, we compare several GP-based baselines described below:
  • GP-SCM [28]: A SCM based on a single-task Gaussian process. For each variable $V_i$, a GP-SCM is constructed as follows:
    $$x^{(i)} = f_i\left(Pa_G(i)\right) + \varepsilon^{(i)}$$
    where $f_i \sim \mathcal{GP}\left(0, k_{\theta,i}\right)$. The covariance matrix $k_{\theta,i} = \left[k_{\theta,i}\right]_{M \times M}$ is characterized using the RBF kernel.
  • G-MTGP [12]: Graph-based multi-task Gaussian process. The relationship between $V_i$ and $Pa_G(i)$ is similar to Equation (26), but the kernel function is selected as a coregionalization kernel, formulated in Equations (3) and (4), where the target GP is considered a linear combination of independent GPs.
  • G-WGP [29]: Graph-based warped Gaussian process. A dual-layer mapping architecture based on a warped transformation, where the transformation is modeled by a normalizing flow similar to the G-WMGP model proposed in this paper, except that the base distribution is represented by a standard GP-SCM.
The reasons for choosing these three SCMs as baseline models are as follows: GP-SCM is one of the most classic nonlinear SCMs, the G-MTGP model is the core generative model of the cCBO approach, and the flow-based G-WGP model is also constructed as the generative SCM in solving the algorithmic recourse problem. The G-MTGP and G-WGP models have been demonstrated in the relevant literature to have sufficiently strong performance through comparisons with other intervention models.
To compare the ability to uncover the underlying distribution and capture nonlinear effects between variables, we use Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Signal-to-Noise Ratio (SNR) as evaluation metrics. Here, SNR is calculated by the following:
$$SNR = \frac{1}{n}\sum_{i=1}^{n} 10\log_{10}\frac{\sigma_{s,i}^2}{\sigma_{n,i}^2}$$
where $\sigma_{s,i}^2$ represents the variance of the predicted values for variable $V_i$, and $\sigma_{n,i}^2$ represents the variance of the prediction errors for variable $V_i$. Averaging over variables accounts for their differing dimensions. A higher SNR indicates that the model is more effective at extracting useful information from a noisy environment, meaning it is more robust under noisy conditions.
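The three metrics can be computed as follows for predictions stacked as an $(M, n)$ array; this is a straightforward sketch of the definitions above, with illustrative function and argument names.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and per-variable-averaged SNR for predictions of shape (M, n)."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    snr = np.mean(10 * np.log10(np.var(y_pred, axis=0) / np.var(err, axis=0)))
    return mae, rmse, snr
```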

4.1.1. Synthetic Datasets

We simulate some data using the structured synthetic data generation process proposed by Lawrence et al. [35] to test the G-WMGP model, as well as the aforementioned comparative methods. The advantage of using synthetic data lies in the ability to conduct experiments with accurate causal prior information, thereby avoiding the adverse effects that errors in causal knowledge, which are non-critical factors, might have on the experiments.
Before generating the data, it is necessary to specify the causal structure of the variables and initialize the SCM, including the functional dependencies between variables and the noise distribution. First, a nonlinear SCM with additive noise is constructed, where the functional dependencies and noise distribution are randomly selected from a specified collection of distributions, under the assumptions of no time-lag effects and no influence from latent factors. Given this SCM, we vary the number of variables $n$ to generate synthetic datasets labeled Synthetic 1, Synthetic 2, and Synthetic 3. Additionally, to verify whether the length of the fault interval $S$ has an impact on the performance of these surrogate models, we set different values of $M$, denoted as Reg 1, Reg 2, and Reg 3, respectively. In this paper, $M, n \in \{20, 30, 40\}$. The experimental results are presented in Table 1, as follows:
Experimental comparisons with several baselines reveal that the G-WMGP model has the smallest MAE and RMSE on Synthetic 1 and Synthetic 2, showing a better performance in learning the data distribution for datasets with fewer variables. As for Synthetic 3, it is observed that the G-WMGP model still outperforms traditional single-mapping GP-SCM, while only showing a distinct advantage over the G-MTGP model and G-WGP model when the value of M is relatively small. However, it is noteworthy that in almost all cases, the SNR values of the G-WMGP model are significantly higher than those of other models, demonstrating the strong robustness of the G-WMGP model in noisy environments. Considering the performance across all these cases, we can see that GP-SCM is relatively the worst, both in terms of prediction and robustness to noise. The G-WGP model exhibits better capability of prediction than the G-MTGP model, while its main advantage lies in its stronger robustness compared to the G-MTGP model. The G-WMGP model proposed in this paper undoubtedly has the best performance in both learning power and noise robustness.

4.1.2. Satellite Attitude Control System Dataset

Considering that synthetic datasets might not adequately reflect the complexity and variability of real-world data, we further verified the effectiveness of the proposed model on a satellite attitude control system dataset. This dataset comprises 68,713 samples, including 19 telemetry variables, with a sampling interval of 5 min (Table 2). It also contains some fault information, and the intervals corresponding to faulty variables have been labeled, with a label value of 1 indicating the occurrence of a fault and 0 otherwise. The causal relationships among these 19 variables are already known and are displayed in Figure 4.
Similarly, we compare the MAE, RMSE, and SNR values of different models on a real dataset, with the results as follows:
From Table 3, we can observe negative SNR values on Reg 1 and Reg 2, indicating that the noise interference in this dataset is more intense compared to the previous synthetic datasets. In such conditions, the G-WMGP model remains reliable in terms of noise robustness, particularly on Reg 3 where its SNR value is only 0.0655. Except for the RMSE value on Reg 2, which is higher than that of the G-WGP model, the other metrics demonstrate the dominant performance of the G-WMGP model on the real dataset, further proving the effectiveness of the proposed model.

4.2. Algorithms Comparison

In this section, we still employ the satellite attitude control system dataset for the comparison of intervention optimization algorithms. We extract the fault information from the dataset, which includes the faulty variables, the start time of the fault, the duration of the fault, and the fault values of the variables. These 19 variables are assigned labels from A to T, and based on the varying lengths of the fault intervals, we categorize the fault scenarios, selecting specific ones as the basis of the intervention experiment. The detailed fault information is as follows (Table 4):
We first compare the performance of different heuristic algorithms on the intervention optimization problem, including the genetic algorithm (GA), simulated annealing algorithm (SA), and particle swarm optimization algorithm (PSO). We run each algorithm ten times, providing the mean and standard error of the convergence values from these runs as follows:
As shown in Table 5, although the performance of the ALNS algorithm may not be as stable as other algorithms, particularly evident in scenario 3, we can still find that the ALNS algorithm has the smallest convergence mean in all three scenarios, demonstrating its superiority in solving this intervention optimization problem.
To further validate the effectiveness of the improved algorithm, we select two related baselines—random minimization baseline and greedy search baseline—and their specific meanings are as follows:
  • Random minimization baseline (RM): Given the current intervention $R$, a random walk is performed $N$ times in a local region; both the direction and the step size are randomly generated. The improved intervention $R'$ is chosen such that
    $$R' = \mathop{\arg\min}_{r \le N}\ F\left(R_r\right)$$
    where $R_r \in \left\{R_i\right\}_{i=1}^{N}$.
  • Greedy search baseline (GS): The improvement of $R$ is based purely on the gradient information of the objective function, taking $n_1$ steps in the direction of gradient descent:
    $$\Delta X_{iv,t} = -\sum_{m=1}^{n_1}\beta\,\nabla_{X} F\left(R^{(m)}\right)_t$$
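For completeness, minimal sketches of the two baseline improvement steps are given below, with hypothetical step scales and numerical gradients; they are illustrations of the definitions above, not the experimental implementation.

```python
import numpy as np

def rm_step(x_iv, F, N=50, scale=0.1, rng=None):
    """Random minimization baseline: N random walks around X_iv, keep the best candidate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    cands = x_iv + scale * rng.normal(size=(N, len(x_iv)))
    return min(cands, key=F)

def gs_step(x_iv, F, beta=1e-2, n1=60, h=1e-4):
    """Greedy search baseline: plain gradient descent on the intervention values."""
    x = np.asarray(x_iv, dtype=float).copy()
    for _ in range(n1):
        grad = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(len(x))])
        x -= beta * grad
    return x
```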
As the gradient descent algorithm approaches a local optimum, it tends to gradually converge. Therefore, for the optimization strategies that utilize gradient information to improve $R$, i.e., EI and GS, it is crucial to determine the appropriate numbers of gradient descent steps $n_0$ and $n_1$, which ensures that significant improvements can be achieved at a relatively low time cost. To illustrate the impact that $n_0$ and $n_1$ have on the performance of the algorithm, we conduct experiments by altering these parameters and observing the changes in the algorithm's runtime and the values of the objective function.
Taking scenario 1 as an example, Figure 5a and Figure 5b, respectively, illustrate the performance of the EI-ALNS algorithm and the GS-ALNS algorithm under different values of $n_0$ and $n_1$. The red bar chart represents the runtime of the algorithm, while the blue curve indicates the objective value calculated by Equation (20) at the maximum number of iterations. It can be observed from Figure 5a that when $n_0 = 20$, the optimal objective value that the algorithm can find begins to converge, and further increasing $n_0$ has a negligible effect on the improvement of the solution. Similarly, in Figure 5b, the blue curve shows an inflection point at $n_1 = 60$. Therefore, considering both the benefit of finding a better solution and the computational cost of the algorithm, $n_0 = 20$ and $n_1 = 60$ are deemed appropriate. For other fault scenarios, the values of $n_0$ and $n_1$ can also be determined using the inflection point method.
To compare the capability of ALNS algorithms based on different search strategies to find high-quality solutions, we apply them to the three scenarios described in Table 4 and compare them with the ALNS algorithm without any improvement strategy. The Adam optimizer with a learning rate of $\beta = 10^{-2}$ is used under the EI and GS strategies, the number of random walks is set to $N = 50$ under the RM strategy, and the maximum number of iterations is 200. The objective values of the different algorithms are plotted as follows:
As shown in Figure 6, the algorithm based on the RM strategy, due to its inherent uncertainty, may produce inferior solutions at the initial stage. However, when the number of random walks is sufficiently large, it can converge to a comparatively good result. Experimental data indicate that the two methods using gradient information are able to find better solutions than the random baseline and converge to a stable state more quickly, and the EI-ALNS algorithm, represented by the blue curve, consistently outperforms other baselines and achieves the smallest convergence value.
A diagram of the optimal intervention measure when the EI-ALNS algorithm converges on scenario 1 is illustrated in Figure 7. For the causal graph on the left, the yellow node represents the faulty variable, while red nodes denote the causes of the fault, i.e., the variables from which the intervened variables are selected. By using the EI-ALNS algorithm, we obtain the optimal intervention measure $R^{*} = \left(V_{iv}, t_{iv}, L_{iv}, X_{iv}\right)$, where the intervened variables $V_{iv}$ include variable F, variable H, and variable N, indicated in red. The middle graphs display the intervention values $X_{iv}$ for the corresponding variables, represented by blue curves, where the red area indicates the intervention interval determined by the start time $t_{iv}$ and step length $L_{iv}$. The rightmost graph shows the post-intervention desired values of the faulty variable, displayed as a red curve. It can be observed that the abnormal condition of variable I within $S$ is significantly reduced, indicating a notable intervention effect.
Furthermore, considering that the generative model might be influenced by external noise, which could lead to deviations in the experiment results when evaluating the effectiveness of intervention measures, we run each algorithm 10 times across the example scenarios to minimize the interference of extraneous factors and obtain more representative results. The convergence values of these 10 runs for each algorithm are presented in a box plot, as depicted in Figure 8. We can observe that the optimization capability of the method based on the RM strategy exhibits significant instability, with noticeable fluctuations in convergence results, and a higher likelihood of extreme performance. In contrast, the results of the gradient-based methods show less variation with more consistent algorithmic performance, making them more applicable in real-world scenarios. Additionally, we compare the average runtime spent by different algorithms during these 10 runs, as illustrated in Figure 9. It can be seen that the original ALNS algorithm runs the fastest but produces lower-quality solutions. The RM-ALNS and GS-ALNS algorithms improve solution quality at the cost of higher time consumption, while the EI-ALNS algorithm strikes a balance between time cost and solution quality. From this, we can conclude that the EI strategy has a greater advantage over the other two strategies.

5. Discussion

This paper proposes a heuristic method based on a generative SCM and validates the effectiveness of the method on both synthetic data and satellite attitude control system data. The proposed G-WMGP model, constructed as a dual-layer mapping model based on causal coupling knowledge, effectively captures the complex nonlinear relationships present when a fault occurs. The dual-layer structure also exhibits a strong capability to capture uncertainties in the data. We therefore embed the SCM into the ALNS algorithm for optimizing interventions, enabling the generation of the post-intervention desired values. Additionally, we provide the computational process of the counterfactual SCM to assess the effects of performing intervention measures more accurately. In the search process for the optimal intervention, we introduce a search strategy with the purpose of enhancing the optimization capability. To validate the superiority of the proposed method, in Section 4.1 we demonstrate that the G-WMGP model has strong learning ability and robustness to noise by comparing it with baseline models on both synthetic and real datasets. In Section 4.2, through comparison with various heuristic algorithms and different search strategies, we argue that the superior performance of the improved ALNS algorithm lies in its ability to strike a balance between time cost and solution quality. Compared to other search strategies, it not only finds better interventions but also requires less search time.
However, when constructing the generative model, this paper relies on the assumptions of causal stationarity and the absence of latent confounding factors, which may limit the applicability of the model, as follows:
  • The causal stationarity assumption implies that the causal relationships in the system remain unchanged over time. Because counterfactual generation is required, we only consider scenarios in which the causal structure changes when variables in the system are disturbed by external forces, such as interventions. However, in real-world settings, causal relationships may not be static; when systems are dynamic and exhibit feedback effects, a static SCM may fail to capture the complex, time-varying interactions among variables.
  • The assumption of no latent confounding factors requires that all potential confounding variables are identified and controlled for, which presupposes complete causal knowledge of the system. However, even if we believe that all factors related to failures have been observed and controlled, unidentified confounders may still exist in reality. Missing confounders could mislead us in identifying the true causes of failures and bias the estimation of causal effects, as shown in Figure 1 (a small simulation of this failure mode is sketched after this list).
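The sketch below illustrates the "fake cause" pitfall of Figure 1 with made-up coefficients: Z is strongly correlated with the faulty variable Y because both depend on X, yet intervening on Z leaves Y unchanged, whereas intervening on the true cause X does shift it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def simulate(do_x=None, do_z=None):
    """X -> Y and X -> Z; Z is only spuriously associated with Y (illustrative coefficients)."""
    x = rng.normal(size=n) if do_x is None else np.full(n, do_x)
    z = 0.8 * x + 0.1 * rng.normal(size=n) if do_z is None else np.full(n, do_z)
    y = 1.5 * x + 0.1 * rng.normal(size=n)        # Y depends on X only
    return x, z, y

x, z, y = simulate()
print("corr(Z, Y) observationally:", round(np.corrcoef(z, y)[0, 1], 3))  # high: Z looks like a cause
print("mean Y under do(Z=2):", round(simulate(do_z=2.0)[2].mean(), 3))   # ~0: intervening on Z is useless
print("mean Y under do(X=2):", round(simulate(do_x=2.0)[2].mean(), 3))   # ~3: intervening on X works
```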
The ALNS algorithm is effective for the intervention optimization problem, but it still has limitations in the following respects:
  • Scalability: When the problem scale grows significantly, the performance of the ALNS algorithm may decline, because the algorithm must find effective solutions in a larger search space and each neighborhood operation requires more computing resources and time, which can make the search inefficient. Moreover, the ALNS algorithm typically relies on heuristic rules to guide the search, which may not capture all important dependencies and interactions in the face of complex causal structures, thus affecting the convergence of the algorithm and the quality of the final solution (a minimal destroy-repair loop illustrating where these costs arise is sketched after this list).
  • Computational complexity: When dealing with dynamically changing datasets, the ALNS algorithm may need to frequently update its search strategies or adjust its parameters, which affects computational efficiency and the feasibility of real-time applications. In addition, if there are complex causal relationships among the variables in the dataset, specific neighborhood structures or customized improvement strategies may need to be designed, further increasing the complexity of the algorithm.
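For readers less familiar with where these costs arise, the sketch below shows a bare adaptive destroy-repair loop with roulette-wheel operator selection and simulated-annealing acceptance; the reward values, weight-update rule, and cooling schedule are illustrative assumptions rather than the configuration used in our experiments.

```python
import math
import random

def alns(initial, destroy_ops, repair_ops, objective, iters=1000, temp=1.0, cooling=0.999):
    """Minimal ALNS skeleton: roulette-wheel selection over adaptively weighted
    destroy/repair operators with simulated-annealing acceptance."""
    current, best = initial, initial
    cur_obj = best_obj = objective(initial)
    w_d, w_r = [1.0] * len(destroy_ops), [1.0] * len(repair_ops)
    for _ in range(iters):
        i = random.choices(range(len(destroy_ops)), weights=w_d)[0]
        j = random.choices(range(len(repair_ops)), weights=w_r)[0]
        candidate = repair_ops[j](destroy_ops[i](current))        # one neighborhood move
        cand_obj = objective(candidate)
        delta = cand_obj - cur_obj
        if cand_obj < best_obj:                                   # new global best
            best, best_obj, reward = candidate, cand_obj, 3.0
        elif delta < 0:
            reward = 1.0                                          # improved the current solution
        else:
            reward = 0.1
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current, cur_obj = candidate, cand_obj                # SA-style acceptance
        w_d[i] = 0.8 * w_d[i] + 0.2 * reward                      # adaptive weight update
        w_r[j] = 0.8 * w_r[j] + 0.2 * reward
        temp *= cooling
    return best, best_obj
```

Every additional variable or longer intervention horizon enlarges the space that the destroy and repair operators must cover, which is exactly where the scalability concern above originates.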
Future work could focus on developing more expressive causal models or on capturing causal relationships that change dynamically. For the intervention optimization algorithms, it is worth investigating whether their computational efficiency can be improved, for instance by designing neighborhood structures tailored to complex causal structures, and whether they can be deployed for real-time optimization.

Author Contributions

Writing—original draft, H.W.; conceptualization and resources, S.C.; supervision, J.F.; project administration, G.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hajna, S.; Sharp, S.J.; Cooper, A.J.M.; Williams, K.M.; Van Sluijs, E.M.F.; Brage, S.; Griffin, S.J.; Sutton, S. Effectiveness of Minimal Contact Interventions: An RCT. Am. J. Prev. Med. 2021, 60, e111–e121. [Google Scholar] [CrossRef] [PubMed]
  2. Shao, Y.; Wang, S.; Zhao, W. A Causality-Aware Perspective on Domain Generalization via Domain Intervention. Electronics 2024, 13, 1891. [Google Scholar] [CrossRef]
  3. Wang, X.; Wang, W.; Feng, F.; Rong, W.; Yin, C.; Xiong, Z. Causal Intervention for Fairness in Multibehavior Recommendation. IEEE Trans. Comput. Soc. Syst. 2024, 11, 6320–6332. [Google Scholar] [CrossRef]
  4. Halpern, J.Y. Axiomatizing Causal Reasoning. J. Artif. Intell. Res. 2000, 12, 317–337. [Google Scholar] [CrossRef]
  5. Huntington-Klein, N. Pearl before Economists: The Book of Why and Empirical Economics. J. Econ. Methodol. 2022, 29, 326–334. [Google Scholar] [CrossRef]
  6. Lee, S.; Bareinboim, E. Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  7. Garnett, R.; Osborne, M.A.; Roberts, S.J. Bayesian Optimization for Sensor Set Selection. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, Stockholm, Sweden, 12–16 April 2010; ACM: Stockholm, Sweden, 2010; pp. 209–219. [Google Scholar] [CrossRef]
  8. Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. Recent Advances in Bayesian Optimization. ACM Comput. Surv. 2023, 55, 1–36. [Google Scholar] [CrossRef]
  9. Yoo, C.; Gonzalez, E.; Gong, Z.; Roy, D. A Better Mechanistic Understanding of Big Data through an Order Search Using Causal Bayesian Networks. Big Data Cogn. Comput. 2022, 6, 56. [Google Scholar] [CrossRef]
  10. Hu, H.; Kerschberg, L. Improving Causal Bayesian Networks Using Expertise in Authoritative Medical Ontologies. ACM Trans. Comput. Healthc. 2023, 4, 1–32. [Google Scholar] [CrossRef]
  11. Aglietti, V.; Lu, X.; Paleyes, A.; González, J. Causal Bayesian Optimization. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Palermo, Italy, 26–28 August 2020. [Google Scholar]
  12. Aglietti, V.; Malek, A.; Ktena, I.; Chiappa, S. Constrained Causal Bayesian Optimization. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  13. Branchini, N.; Aglietti, V.; Dhir, N.; Damoulas, T. Causal Entropy Optimization. arXiv 2022, arXiv:2208.10981. [Google Scholar]
  14. Zhang, J.; Cammarata, L.; Squires, C.; Sapsis, T.P.; Uhler, C. Active Learning for Optimal Intervention Design in Causal Models. Nat. Mach. Intell. 2023, 5, 1066–1075. [Google Scholar] [CrossRef]
  15. Li, Z.; Li, Y.; Lian, Z.; Zheng, R. Improved Causal Bayesian Optimization Algorithm with Counter-Noise Acquisition Function and Supervised Prior Estimation. J. Phys. Conf. Ser. 2023, 2547, 012017. [Google Scholar] [CrossRef]
  16. Sussex, S.; Makarova, A.; Krause, A. Model-Based Causal Bayesian Optimization. In Proceedings of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  17. Ren, S.; Qian, X. Causal Bayesian Optimization via Exogenous Distribution Learning. arXiv 2024, arXiv:2402.02277. [Google Scholar] [CrossRef]
  18. Nair, V.; Patil, V.; Sinha, G. Budgeted and Non-Budgeted Causal Bandits. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual, 13–15 April 2021. [Google Scholar]
  19. Yan, Z.; Mukherjee, A.; Varıcı, B.; Tajer, A. Robust Causal Bandits for Linear Models. IEEE J. Sel. Areas Inf. Theory 2024, 5, 78–93. [Google Scholar] [CrossRef]
  20. de Kroon, A.A.W.M.; Belgrave, D.; Mooij, J.M. Causal Bandits without Prior Knowledge Using Separating Sets. In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR), Eureka, CA, USA, 11–13 April 2022. [Google Scholar]
  21. Bilodeau, B.; Wang, L.; Roy, D.M. Adaptively Exploiting D-Separators with Causal Bandits. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  22. Håkansson, S.; Lindblom, V.; Gottesman, O.; Johansson, F.D. Learning to Search Efficiently for Causally Near-Optimal Treatments. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  23. Gönül, S.; Namlı, T.; Coşar, A.; Toroslu, İ.H. A Reinforcement Learning Based Algorithm for Personalization of Digital, Just-in-Time, Adaptive Interventions. Artif. Intell. Med. 2021, 115, 102062. [Google Scholar] [CrossRef] [PubMed]
  24. Shi, Y.; Zhang, G.; Lu, D.; Lv, L.; Liu, H. Intervention Optimization for Crowd Emotional Contagion. Inf. Sci. 2021, 576, 769–789. [Google Scholar] [CrossRef]
  25. Yao, T.; Wang, C.; Wang, X.; Li, X.; Jiang, Z.; Qi, P. Enhancing Percutaneous Coronary Intervention with Heuristic Path Planning and Deep-Learning-Based Vascular Segmentation. Comput. Biol. Med. 2023, 166, 107540. [Google Scholar] [CrossRef] [PubMed]
  26. Strobl, E.V. Counterfactual Formulation of Patient-Specific Root Causes of Disease. J. Biomed. Inform. 2024, 150, 104585. [Google Scholar] [CrossRef] [PubMed]
  27. Zhang, K.; Hyvärinen, A. Distinguishing Causes from Effects Using Nonlinear Acyclic Causal Models. In Proceedings of the 22nd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–11 December 2008. [Google Scholar]
  28. Karimi, A.H.; Von Kügelgen, J.; Schölkopf, B.; Valera, I. Algorithmic Recourse under Imperfect Causal Knowledge a Probabilistic Approach. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  29. Weilbach, J.; Gerwinn, S.; Kandemir, M.; Fraenzle, M. Estimation of Counterfactual Interventions under Uncertainties. In Proceedings of the 15th Asian Conference on Machine Learning (ACML), Istanbul, Turkey, 11–14 November 2023. [Google Scholar]
  30. Gong, W.; Jennings, J.; Zhang, C.; Pawlowski, N. Rhino: Deep Causal Temporal Relationship Learning with History-Dependent Noise. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  31. Runge, J. Causal Network Reconstruction from Time Series: From Theoretical Assumptions to Practical Estimation. Chaos Interdiscip. J. Nonlinear Sci. 2018, 28, 075310. [Google Scholar] [CrossRef] [PubMed]
  32. Zhang, K.; Wang, Z.; Zhang, J.; Schölkopf, B. On Estimation of Functional Causal Models: General Results and Application to the Post-Nonlinear Causal Model. ACM Trans. Intell. Syst. Technol. 2016, 7, 1–22. [Google Scholar] [CrossRef]
  33. Papamakarios, G.; Nalisnick, E.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing Flows for Probabilistic Modeling and Inference. J. Mach. Learn. Res. 2021, 22, 1–64. [Google Scholar]
  34. Suksee, S.; Sindhuchao, S. GRASP with ALNS for Solving the Location Routing Problem of Infectious Waste Collection in the Northeast of Thailand. Int. J. Ind. Eng. Comput. 2021, 12, 305–320. [Google Scholar] [CrossRef]
  35. Lawrence, A.R.; Kaiser, M.; Sampaio, R.; Sipos, M. Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
Figure 1. An example of a “fake” relationship. Since X is the cause of Y, and Z is not directly related to Y, if Y fails, intervention should be made on the real cause X rather than the false cause Z. Incorrect intervention (depicted as “x” between X and Z) will not have impacts on the actual fault transmission link (depicted as “?” between X and Y).
Figure 2. The framework of our method.
Figure 3. The flowchart of EI-ALNS.
Figure 4. Causal relationships among the 19 variables, presented as a causal graph.
Figure 5. Runtime performance under various n_0 and n_1. (a) EI-ALNS on scenario 1; (b) GS-ALNS on scenario 1.
Figure 6. Objective value of different algorithms across time steps.
Figure 7. The optimal intervention measure and post-intervention result using the EI-ALNS algorithm for scenario 1.
Figure 8. Optimal objective value of each method over 10 runs.
Figure 9. Average runtime per iteration.
Table 1. Comparison with baseline models on synthetic datasets.

Dataset | Region | Metrics | GP-SCM | G-MTGP | G-WGP | G-WMGP
Synthetic 1 | Reg 1 | MAE | 0.1941 | 0.1975 | 0.1958 | 0.1912
Synthetic 1 | Reg 1 | RMSE | 0.2340 | 0.2390 | 0.2365 | 0.2299
Synthetic 1 | Reg 1 | SNR | 6.5542 | 7.2749 | 7.4981 | 7.9175
Synthetic 1 | Reg 2 | MAE | 0.1911 | 0.1836 | 0.1867 | 0.1829
Synthetic 1 | Reg 2 | RMSE | 0.2307 | 0.2235 | 0.2281 | 0.2212
Synthetic 1 | Reg 2 | SNR | 6.5309 | 7.6280 | 7.4668 | 7.8326
Synthetic 1 | Reg 3 | MAE | 0.2025 | 0.2018 | 0.2002 | 0.1978
Synthetic 1 | Reg 3 | RMSE | 0.2472 | 0.2487 | 0.2463 | 0.2403
Synthetic 1 | Reg 3 | SNR | 6.4119 | 6.7236 | 7.2403 | 7.5264
Synthetic 2 | Reg 1 | MAE | 0.2055 | 0.1952 | 0.1927 | 0.1894
Synthetic 2 | Reg 1 | RMSE | 0.2498 | 0.2359 | 0.2318 | 0.2302
Synthetic 2 | Reg 1 | SNR | 8.3081 | 10.0011 | 10.0951 | 10.2510
Synthetic 2 | Reg 2 | MAE | 0.2068 | 0.1968 | 0.1952 | 0.1917
Synthetic 2 | Reg 2 | RMSE | 0.2512 | 0.2385 | 0.2355 | 0.2341
Synthetic 2 | Reg 2 | SNR | 9.0110 | 9.7833 | 9.9564 | 10.3954
Synthetic 2 | Reg 3 | MAE | 0.2068 | 0.2011 | 0.2017 | 0.1994
Synthetic 2 | Reg 3 | RMSE | 0.2510 | 0.2478 | 0.2490 | 0.2438
Synthetic 2 | Reg 3 | SNR | 9.2907 | 10.0112 | 10.0269 | 10.3497
Synthetic 3 | Reg 1 | MAE | 0.2113 | 0.1974 | 0.1976 | 0.1935
Synthetic 3 | Reg 1 | RMSE | 0.2597 | 0.2401 | 0.2392 | 0.2336
Synthetic 3 | Reg 1 | SNR | 5.1458 | 6.7793 | 6.9453 | 7.1746
Synthetic 3 | Reg 2 | MAE | 0.1959 | 0.1854 | 0.1870 | 0.1857
Synthetic 3 | Reg 2 | RMSE | 0.2375 | 0.2276 | 0.2280 | 0.2250
Synthetic 3 | Reg 2 | SNR | 6.9043 | 8.0212 | 8.0459 | 8.1312
Synthetic 3 | Reg 3 | MAE | 0.2012 | 0.1947 | 0.1928 | 0.1928
Synthetic 3 | Reg 3 | RMSE | 0.2463 | 0.2358 | 0.2325 | 0.2332
Synthetic 3 | Reg 3 | SNR | 6.7114 | 7.7421 | 7.9201 | 7.9916
Table 2. Information of the satellite attitude control system dataset.

Attribute | Value
Number of variables | 19
Sampling frequency | 5 min
Data length | 68,713
Table 3. Comparison with baseline models on a real dataset.

Region | Metrics | GP-SCM | G-MTGP | G-WGP | G-WMGP
Reg 1 | MAE | 0.8308 | 0.8023 | 0.7984 | 0.7701
Reg 1 | RMSE | 1.2641 | 1.1871 | 1.1872 | 1.1235
Reg 1 | SNR | −6.9671 | −3.5833 | −3.3389 | −3.0929
Reg 2 | MAE | 1.1328 | 1.0106 | 1.0302 | 0.9751
Reg 2 | RMSE | 2.9878 | 2.9163 | 2.9069 | 2.9104
Reg 2 | SNR | −1.9956 | −1.4988 | −1.5842 | −1.3890
Reg 3 | MAE | 0.9988 | 1.0358 | 1.0053 | 0.9602
Reg 3 | RMSE | 1.4118 | 1.4231 | 1.4594 | 1.3906
Reg 3 | SNR | 0.2838 | 3.4263 | 0.9867 | 0.0655
Table 4. Fault scenarios.

Scenario ID | Faulty Variable | Fault Start Time | Fault Duration
1 | I | 57,496 | 25
2 | H | 25,528 | 31
3 | S | 7265 | 48
Table 5. Comparison of ALNS, GA, SA, and PSO (objective value).

Scenario ID | ALNS | GA | SA | PSO
1 | 3.290 ± 2.0 × 10⁻³ | 3.298 ± 1.7 × 10⁻³ | 3.387 ± 3.3 × 10⁻³ | 3.314 ± 2.5 × 10⁻³
2 | 4.256 ± 2.1 × 10⁻³ | 4.267 ± 2.4 × 10⁻³ | 4.273 ± 2.9 × 10⁻³ | 4.264 ± 1.9 × 10⁻³
3 | 7.774 ± 3.4 × 10⁻³ | 7.792 ± 2.8 × 10⁻³ | 7.799 ± 3.8 × 10⁻³ | 7.790 ± 2.4 × 10⁻³
