Leveraging Stochasticity for Open Loop and Model Predictive Control of Spatio-Temporal Systems

Boutselis, George I.; Evans, Ethan N.; Pereira, Marcus A.; Theodorou, Evangelos A.

doi:10.3390/e23080941


by George I. Boutselis ¹,†, Ethan N. Evans ¹,*,†, Marcus A. Pereira ²,† and Evangelos A. Theodorou ¹,²

¹ Department of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30313, USA

² Institute of Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA 30313, USA

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work and appear in last name order.

Entropy 2021, 23(8), 941; https://doi.org/10.3390/e23080941

Submission received: 3 June 2021 / Revised: 30 June 2021 / Accepted: 14 July 2021 / Published: 23 July 2021


Abstract:

Stochastic spatio-temporal processes are prevalent across domains ranging from the modeling of plasma and turbulence in fluids to the wave function of quantum systems. This letter studies a measure-theoretic description of such systems by describing them as evolutionary processes on Hilbert spaces, and in doing so, derives a framework for spatio-temporal manipulation from fundamental thermodynamic principles. This approach yields a variational optimization framework for controlling stochastic fields. The resulting scheme is applicable to a wide class of spatio-temporal processes and can be used for optimizing parameterized control policies. Our simulated experiments explore the application of two forms of this approach on four stochastic spatio-temporal processes, with results that suggest new perspectives and directions for studying stochastic control problems for spatio-temporal systems.

Keywords:

stochastic spatio-temporal systems; stochastic partial differential equations; stochastic control; variational optimization; optimization in Hilbert space

1. Introduction and Related Work

Many complex systems in nature vary spatially and temporally, and are often represented as stochastic partial differential equations (SPDEs). These systems are ubiquitous in nature and engineering, and can be found in fields such as applied physics, robotics, autonomy, and finance [1,2,3,4,5,6,7,8,9]. Examples of stochastic spatio-temporal processes include the Poisson–Vlasov equation in plasma physics, heat, Burgers’ and Navier–Stokes equations in fluid mechanics, and Zakai and Belavkin equations in classical and quantum filtering. Despite their ubiquity and significance to many areas of science and engineering, algorithms for stochastic control of such systems are scarce.

The challenges of controlling SPDEs include significant control signal time delays, dramatic under-actuation, high dimensionality, regular bifurcations, and multi-modal instabilities. For many SPDEs, existence and uniqueness of solutions remains an open problem; when solutions exist, they often have a weak notion of differentiability, if at all. Their performance analysis must be treated with functional calculus, and their state vectors are often most conveniently described by vectors in an infinite-dimensional time-indexed Hilbert space, even for scalar one-dimensional SPDEs. These and other challenges together represent a large subset of the current-day challenges facing the fluid dynamics and automatic control communities, and present difficulties in the development of mathematically consistent and numerically realizable algorithms.

The majority of computational stochastic control methods in the literature have been dedicated to finite-dimensional systems. Algorithms for decision making under uncertainty of such systems typically rely on standard optimality principles from the stochastic optimal control (SOC) literature, namely the dynamic programming (or Bellman) principle and the stochastic Pontryagin maximum principle [10,11,12]. The resulting algorithms typically require solving the Hamilton–Jacobi–Bellman (HJB) equation: a backward nonlinear partial differential equation (PDE) whose solutions do not scale to high-dimensional spaces.

Several works (e.g., [13,14] for the Kuramoto–Sivashinsky SPDE) propose model predictive control based methodologies for reduced order models of SPDEs based on SOC principles. These reduced order methods transform the original SPDE into a finite set of coupled stochastic differential equations (SDEs). In SDE control, probabilistic representations of the HJB PDE can address scalability via sampling techniques [15,16], including iterative sampling and/or parallelizable implementations [17,18]. These methods have been explored in a reinforcement learning context for SPDEs [19,20,21].

Recently, a growing body of work considers deterministic PDEs and utilizes finite dimensional machine learning methods, such as deep neural network surrogate models that rely on standard SOC-based methodologies. In the context of fluid systems, these approaches are increasingly widespread in the literature [22,23,24,25,26]. A critical issue in applying controllers that rely on a limited number of modes is that they can produce concerning emergent phenomena, including spillover instabilities [27,28] and failing latent space stabilizability conditions [25].

Outside the large body of finite dimensional methods for PDEs and/or SPDEs are a few works that attempt to extend the classical HJB theory for systems described by SPDEs. These are comprehensively explored in [29] and include both distributed and boundary control problems. Most notably, [30] investigates explicit solutions to the HJB equation for the stochastic Burgers equation based on an exponential transformation, and [31] provides an extension of the large deviation theory to infinite dimensional spaces that creates connections to the HJB theory. These and most other works on the HJB theory for SPDEs mainly focus on theoretical contributions, so algorithms and numerical results remain tremendously sparse in the literature. Furthermore, the HJB theory for boundary control has certain mathematical difficulties, which impose limitations.

Alternative methodologies are derived using information theoretic control. The basis of a subset of these methods is a relation between free energy and relative entropy in statistical physics, given by the following:

$$ \text{Free Energy} \leq \text{Work} - \text{Temperature} \times \text{Entropy} \qquad (1) $$

This inequality is an instantiation of the second law in stochastic thermodynamics: increase in entropy results in minimizing the right hand side of the expression. In finite dimensions, connections between Equation (1) and dynamic programming motivate these methods. Essentially, there exist two different points of view on decision making under uncertainty that overlap for fairly general classes of stochastic systems, as depicted in Figure 1.

These connections are extended to infinite-dimensional spaces [32] (see also Appendix F) and are leveraged in this letter to develop practical algorithms for distributed and boundary control of stochastic fields. Specifically, we develop a generic framework for control of stochastic fields that are modeled as semi-linear SPDEs. We show that optimal control of SPDEs can be cast as a variational optimization problem and then solved, using sampling of infinite dimensional diffusion processes. The resulting variational optimization algorithm can be used in either fixed or receding time horizon formats for distributed and boundary control of semilinear SPDEs and utilizes adaptive importance sampling of stochastic fields. The derivation relies on non-trivial generalization of stochastic calculus to arbitrary Hilbert spaces and has broad applicability.

This manuscript presents an open loop and model predictive control methodology for the control of SPDEs related to fluid dynamics. The methodology is grounded in the theory of stochastic calculus in function spaces and is not restricted to any particular finite representation of the original system. The control updates are independent of the method used to numerically simulate the SPDEs, which allows the most suitable problem dependent numerical scheme (e.g., finite differences, Galerkin methods, and finite elements) to be employed.

Furthermore, deriving the variational optimization approach for optimal control entirely in Hilbert spaces overcomes numerical issues, including matrix singularities and SPDE space-time noise degeneracies that typically arise in finite dimensional representations of SPDEs. Thus, the work in this letter is a generalization of information theoretic control methods in finite dimensions [33,34,35,36] to infinite dimensions and inherits crucial characteristics from its finite dimensional counterparts.

However, the primary benefit of the information theoretic approach presented in this work is that the stochasticity inherent in the system can be leveraged for control. Namely, the inherent system stochasticity is utilized for exploration in the space of trajectories of SPDEs in Hilbert spaces, which provides a Newton-type parameter update on the parameterized control policy. Importance sampling techniques are incorporated to iteratively guide the sampling distribution, and result in a mathematically consistent and numerically realizable sampling-based algorithm for distributed and boundary controlled semi-linear SPDEs.

2. Preliminaries and Problem Formulation

At the core of our method are comparisons between sampled stochastic paths used to perform Newton-type control updates as depicted in Figure 2. Let $H$, $U$ be separable Hilbert spaces with inner products $\langle \cdot, \cdot \rangle_{H}$ and $\langle \cdot, \cdot \rangle_{U}$, respectively, $\sigma$-fields $\mathcal{B}(H)$ and $\mathcal{B}(U)$, respectively, and probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $\mathcal{F}_{t}$, $t \in [0, T]$. Consider the uncontrolled and controlled infinite-dimensional stochastic systems of the following form:

$$ dX = AX\,dt + F(t, X)\,dt + \frac{1}{\sqrt{\rho}} G(t, X)\,dW(t), \qquad (2) $$

$$ dX = AX\,dt + F(t, X)\,dt + G(t, X)\Big(U^{(i)}(t, X; \theta)\,dt + \frac{1}{\sqrt{\rho}}\,dW(t)\Big), \qquad (3) $$

where $X(0)$ is an $\mathcal{F}_{0}$-measurable, $H$-valued random variable, and $A : D(A) \subset H \to H$ is a linear operator, where $D(A)$ denotes here the domain of $A$. $F : [0, T] \times H \to H$ and $G : [0, T] \times U \to H$ are nonlinear operators that satisfy properly formulated Lipschitz conditions associated with the existence and uniqueness of solutions to Equation (2) as described in ([2] Theorem 7.2). The term $U^{(i)}(t, X; \theta)$ is a control operator on Hilbert space $H$ parameterized by a finite set of decision variables $\theta$. We view these dynamics in an iterative fashion in order to realize an iterative method. As such, the superscript $(i)$ refers to the iteration number.

The term $W(t) \in U$ corresponds to a Hilbert space Wiener process, which is a generalization of the Wiener process in finite dimensions. When this noise profile is spatially uncorrelated, we call it a cylindrical Wiener process, which requires the added assumptions on $A$ in ([2] Hypothesis 7.2) in order to form a contractive, unitary, linear semigroup, which is required to guarantee the existence and uniqueness of $\mathcal{F}_{t}$-adapted weak solutions $X(t)$, $t \geq 0$. A thorough description of the Wiener process in Hilbert spaces, along with its various forms, can be found in Appendix A. For generality, Equations (2) and (3) introduce the parameter $\rho \in \mathbb{R}$, which acts as a uniform scaling of the covariance of the Hilbert space Wiener process. This parameter also appears as a “temperature” parameter in the context of Equation (1).

In what follows, $\langle \cdot, \cdot \rangle_{S}$ denotes the inner product in a Hilbert space $S$ and $C([0, T]; H)$ denotes the space of continuous processes in $H$ for $t \in [0, T]$. Define the measure on the path space of uncontrolled trajectories produced by Equation (2) as $\mathcal{L}$ and define the measure on the path space of controlled trajectories produced by Equation (3) as $\mathcal{L}^{(i)}$. The notation $\mathbb{E}_{\mathcal{L}}$ denotes expectations over paths as Feynman path integrals.

Many physical and engineering systems can be written in the abstract form of Equation (2) by properly defining operators $A$, $F$ and $G$ along with their corresponding domains. Examples can be found in our simulated experiments, as well as Table 1, with more complete descriptions in ([2] Chapter 13). The goal of this work is to establish control methodologies for stochastic versions of such systems.

Control tasks defined over SPDEs typically quantify task completion by a measurable functional $J : C([0, T]; H) \to \mathbb{R}$ referred to as the cost functional, given by the following:

$$ J(X(\cdot, \omega)) = \phi(X(T), T) + \int_{t}^{T} \ell(X(s), s)\,ds, \qquad (4) $$

where $X(\cdot, \omega) \in C([0, T]; H)$ denotes the entire state trajectory, $\phi(X(T), T)$ is a terminal state cost and $\ell(X(s), s)$ is a state cost accumulated over the time horizon $s \in [t, T]$. With this, we define the terms of Equation (1). More information can be found in Appendix B.

Define the free energy of cost function $J(X)$ with respect to the uncontrolled path measure $\mathcal{L}$ and temperature $\rho \in \mathbb{R}$ as follows [32]:

$$ V(X) := -\frac{1}{\rho} \ln \mathbb{E}_{\mathcal{L}}\big[\exp(-\rho J(X))\big]. \qquad (5) $$

Additionally, the generalized entropy of the controlled path measure $\mathcal{L}^{(i)}$ with respect to the uncontrolled path measure $\mathcal{L}$ is defined as follows:

$$ S(\mathcal{L}^{(i)} \| \mathcal{L}) := \begin{cases} -\int_{\Omega} \frac{d\mathcal{L}^{(i)}}{d\mathcal{L}} \ln \frac{d\mathcal{L}^{(i)}}{d\mathcal{L}}\,d\mathcal{L}, & \text{if } \mathcal{L}^{(i)} \ll \mathcal{L}, \\ +\infty, & \text{otherwise}, \end{cases} \qquad (6) $$

where “$\ll$” denotes absolute continuity [32].

The relationship between free energy and relative entropy was extended to a Hilbert space formulation in [32]. Based on the free energy and generalized entropy definitions, Equation (1) with temperature $T = \frac{1}{\rho}$ becomes the so-called Legendre transformation, and takes the following form:

$$ -\frac{1}{\rho} \ln \mathbb{E}_{\mathcal{L}}\big[\exp(-\rho J)\big] \leq \Big[\mathbb{E}_{\mathcal{L}^{(i)}}(J) - \frac{1}{\rho} S(\mathcal{L}^{(i)} \| \mathcal{L})\Big], \qquad (7) $$

with equilibrium probability measure in the form of a Gibbs distribution as follows:

$$ d\mathcal{L}^{*} = \frac{\exp(-\rho J)\,d\mathcal{L}}{\int_{\Omega} \exp(-\rho J)\,d\mathcal{L}}. \qquad (8) $$

The optimality of $\mathcal{L}^{*}$ is verified in [32]. The statistical physics interpretation of inequality Equation (7) is that maximization of entropy results in a reduction in the available energy. At the thermodynamic equilibrium, the entropy reaches its maximum and $V = E - TS$.
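As a finite-dimensional sanity check of Equations (7) and (8), the following sketch replaces the path space with a six-point sample space, so that the measures become probability vectors. The costs, the two measures, and the names `free_energy`, `upper_bound`, and `gibbs` are illustrative assumptions and are not part of the original derivation.

```python
import numpy as np

def free_energy(J, p, rho):
    """Left-hand side of (7): -(1/rho) ln E_p[exp(-rho J)] on a finite sample space."""
    return -np.log(np.sum(p * np.exp(-rho * J))) / rho

def upper_bound(J, p, q, rho):
    """Right-hand side of (7): E_q[J] - (1/rho) S(q||p), with S as in Equation (6)."""
    S = -np.sum(q * np.log(q / p))   # generalized entropy (negative relative entropy)
    return np.sum(q * J) - S / rho

def gibbs(J, p, rho):
    """Discrete analogue of the Gibbs measure in Equation (8)."""
    w = p * np.exp(-rho * J)
    return w / w.sum()

rng = np.random.default_rng(0)
n, rho = 6, 2.0
J = rng.uniform(0.0, 3.0, size=n)    # hypothetical path costs
p = rng.dirichlet(np.ones(n))        # stands in for the uncontrolled measure
q = rng.dirichlet(np.ones(n))        # stands in for an arbitrary controlled measure

V = free_energy(J, p, rho)
bound_q = upper_bound(J, p, q, rho)
bound_gibbs = upper_bound(J, p, gibbs(J, p, rho), rho)
print(V <= bound_q)                  # inequality (7) for an arbitrary q
print(np.isclose(V, bound_gibbs))    # the bound is attained at the Gibbs measure
```

In this discrete setting the inequality holds for any strictly positive `p` and `q`, and it becomes an equality when `q` is the Gibbs measure, which mirrors the role of $\mathcal{L}^{*}$ above.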

The free energy-relative entropy relation provides an elegant methodology to derive novel algorithms for distributed and boundary control problems of SPDEs. This relation is also significant in the context of SOC literature, wherein optimality of control solutions relies on fundamental principles of optimality, such as the Pontryagin maximum principle [10] or the Bellman principle of optimality [11]. Appendix F shows that by applying a properly defined Feynman–Kac argument, the free energy is equivalent to a value function that satisfies the HJB equation. This connection is valid for general probability measures, including measures defined on path spaces induced by infinite-dimensional stochastic systems.

Our derivation is more general than that of [30], which applies a transformation that is only possible for state-dependent cost functions. To the best knowledge of the authors, the proof given in Appendix E is novel for a generic state- and time-dependent cost. The observation that the Legendre transformation in Equation (7) is connected to optimality principles from SOC motivates the use of Equation (8) for the development of stochastic control algorithms.

Flexibility of this approach is apparent in the context of stochastic boundary control problems, which are theoretically more challenging due to the unbounded nature of the solutions [29,37]. The HJB theory for these settings is not as mature, and results are restricted to simplistic cases [38]. Nonetheless, since Equation (7) holds for arbitrary measures, the difficulties of related works are overcome by the proposed information theoretic approach. Hence, in either the stochastic boundary control or distributed control case, the free energy represents a lower bound of a state cost plus the associated control effort. Despite losing connections to optimality principles in systems with boundary control, our strategy in both distributed and boundary control settings is to minimize the distance between the path measure induced by our parameterized control policies and the optimal measure in Equation (8) so that the lower bound of the total cost can be approached by the controlled system. Specifically, we look for a finite set of decision variables $\theta^{*}$ that yield a Hilbert space control input $U(\cdot)$ that minimizes the distance to the optimal path measure as follows:

$$ \theta^{*} = \underset{\theta}{\operatorname{argmax}}\; S(\mathcal{L}^{*} \| \mathcal{L}^{(i)}) \qquad (9) $$

$$ \phantom{\theta^{*}} = \underset{\theta}{\operatorname{argmax}} \Big[-\int_{\Omega} \frac{d\mathcal{L}^{*}}{d\mathcal{L}^{(i)}} \ln \frac{d\mathcal{L}^{*}}{d\mathcal{L}^{(i)}}\,d\mathcal{L}^{(i)}\Big]. \qquad (10) $$

3. Stochastic Optimization in Hilbert Spaces

To optimize Equation (9), we apply the chain rule for the Radon–Nikodym derivative twice. While this is necessary on the right term for our control update, this is applied to the left term for importance sampling, which enhances algorithmic convergence. In each instance, the chain rule has the form:

$$ \frac{d\mathcal{L}^{*}}{d\mathcal{L}^{(i)}} = \frac{d\mathcal{L}^{*}}{d\mathcal{L}} \frac{d\mathcal{L}}{d\mathcal{L}^{(i)}}. \qquad (11) $$

Note that the first derivative is given by Equation (8), while the second derivative is given by a change of measure between controlled and uncontrolled infinite dimensional stochastic dynamics. This change in measure arises from a version of Girsanov’s Theorem, provided with a proof in Appendix C. We adopt the following open-loop parameterization:

$$ U(t, x; \theta) = \sum_{\ell = 1}^{N} m_{\ell}(x)\,u_{\ell}(t) = m(x)^{\top} u(t; \theta). \qquad (12) $$

Under this parameterization, Girsanov’s theorem yields the following change of measure between the two SPDEs:

$$ \frac{d\mathcal{L}}{d\mathcal{L}^{(i)}} = \exp\Big(-\sqrt{\rho} \int_{0}^{T} u(t)^{\top} \bar{m}(t) + \frac{\rho}{2} \int_{0}^{T} u(t)^{\top} M u(t)\,dt\Big), \qquad (13) $$

with

$$ \bar{m}(t) := \big[\langle m_{1}, dW(t) \rangle_{U_{0}}, \ldots, \langle m_{N}, dW(t) \rangle_{U_{0}}\big]^{\top} \in \mathbb{R}^{N}, \qquad (14) $$

$$ M \in \mathbb{R}^{N \times N}, \quad (M)_{ij} := \langle m_{i}, m_{j} \rangle_{U}, \qquad (15) $$

where $x \in D \subset \mathbb{R}^{n}$ denotes the localization of actuators in the spatial domain $D$ of the SPDEs and $m_{\ell} \in U$ are design functions that specify how actuation is incorporated into the infinite dimensional dynamical system. This parameterization can be used for both open-loop trajectory optimization as well as for model predictive control. In our experiments we apply model predictive control through re-optimization and turn Equation (12) into an implicit feedback-type control. Optimization using Equation (9) with policies that explicitly depend on the stochastic field is also possible and is considered, using gradient-based optimization, in [19,20,21].
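The matrix $M$ in Equation (15) is the Gram matrix of the actuator functions, and it must be invertible for the control update in Lemma 1. A minimal sketch of how it might be assembled on a 1D grid follows, assuming Gaussian-shaped actuator functions on $[0, 1]$ and an $L^2$ inner product approximated with the trapezoidal rule. The actuator shapes, locations, width, and the helper `gram_matrix` are hypothetical choices made only for illustration.

```python
import numpy as np

def gram_matrix(m_funcs, x):
    """Approximate (M)_{ij} = <m_i, m_j> with the trapezoidal rule on grid x.

    m_funcs : (N, P) array, actuator function m_i evaluated at the P grid points.
    """
    dx = np.diff(x)                                         # (P-1,) grid spacings
    prod = m_funcs[:, None, :] * m_funcs[None, :, :]        # (N, N, P) pointwise products
    return np.sum(0.5 * (prod[..., 1:] + prod[..., :-1]) * dx, axis=-1)

x = np.linspace(0.0, 1.0, 201)
centers = np.array([0.2, 0.5, 0.8])     # hypothetical actuator locations
width = 0.05                            # hypothetical actuator width
m_funcs = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / width) ** 2)

M = gram_matrix(m_funcs, x)
print(M.shape, np.linalg.eigvalsh(M).min() > 0.0)
```

Well-separated actuators give a nearly diagonal $M$; strongly overlapping actuator functions would make $M$ ill-conditioned, which is worth checking before inverting it.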

To simplify optimization in Equation (9), we further parameterize $u(t; \theta)$ as a simple measurable function. In this case, the parameters $\theta$ consist of all step functions $\{u_{i}\}$. With this representation, we arrive at our main result, an importance sampled variational controller of the following form:

Lemma 1.

Consider the controlled SPDE in (3) and a parameterization of the control as specified by (12), with $\theta$ consisting of step functions $\{u_{i}\}$. The iterative control scheme for solving the stochastic control problem

$$ u^{*} = \operatorname{argmax}\; S(\mathcal{L}^{*} \| \mathcal{L}^{(i)}) \qquad (16) $$

is given by the following expression:

$$ u_{j}^{(i+1)} = u_{j}^{(i)} + \frac{1}{\sqrt{\rho}\,\Delta t} M^{-1}\, \mathbb{E}_{\mathcal{L}^{(i)}}\Big[\frac{\exp(-\rho J^{(i)})}{\mathbb{E}_{\mathcal{L}^{(i)}}[\exp(-\rho J^{(i)})]} \int_{t_{j}}^{t_{j+1}} \bar{m}^{(i)}(t)\Big], \qquad (17) $$

where

$$ J^{(i)} := J + \frac{1}{\sqrt{\rho}} \sum_{j=1}^{L} u_{j}^{(i)\top} \int_{t_{j}}^{t_{j+1}} \bar{m}^{(i)}(t) + \frac{\Delta t}{2} \sum_{j=1}^{L} u_{j}^{(i)\top} M u_{j}^{(i)}, \qquad (18) $$

$$ \bar{m}^{(i)}(t) := \big[\langle m_{1}, dW^{(i)}(t) \rangle_{U}, \ldots, \langle m_{N}, dW^{(i)}(t) \rangle_{U}\big]^{\top} \in \mathbb{R}^{N}, \qquad (19) $$

and

$$ W^{(i)}(t) := W(t) - \sqrt{\rho} \int_{0}^{t} U^{(i)}(s)\,ds. \qquad (20) $$

Proof.

See Appendix D. □

Lemma 1 yields a sampling-based iterative scheme for controlling semilinear SPDEs, and is depicted in Figure 2. An initial control policy, which is typically initialized by zeros, is applied to the semilinear SPDE. The controlled SPDE then evolves with different realizations of the Wiener process in a number of trajectory rollouts. The performance of these rollouts is evaluated on the importance sampled cost function in Equation (18). These are used to calculate the Gibbs averaged performance weightings $\exp(-\rho J^{(i)}) / \mathbb{E}_{\mathcal{L}^{(i)}}[\exp(-\rho J^{(i)})]$. Finally, the outer expectation in Equation (17) is evaluated, and used to produce an update to the control policy.
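The steps just described can be sketched as a Monte Carlo estimate of Equations (17) and (18). This is a minimal illustration rather than the authors' implementation: it assumes that the state costs and the noise integrals of each rollout have already been computed by some SPDE simulator, and the function name `control_update`, the array layouts, and the random stand-in data are all hypothetical.

```python
import numpy as np

def control_update(u, J_state, dm, M, rho, dt):
    """One sampled iteration of Equation (17).

    u       : (L, N) current step-function controls u_j^{(i)}
    J_state : (K,)   state cost J of each of the K rollouts
    dm      : (K, L, N) noise integrals of m_bar^{(i)} over [t_j, t_{j+1}] per rollout
    M       : (N, N) Gram matrix of the actuator functions, Equation (15)
    """
    # Importance-sampled cost J^{(i)} of Equation (18), one value per rollout
    noise_term = np.einsum("jn,kjn->k", u, dm) / np.sqrt(rho)
    effort = 0.5 * dt * np.einsum("jn,nm,jm->", u, M, u)
    J_i = J_state + noise_term + effort
    # Gibbs weights; subtracting the minimum is a numerical stabilization
    # (an added assumption) that cancels in the normalized ratio
    w = np.exp(-rho * (J_i - J_i.min()))
    w /= w.sum()
    # Weighted average of the noise integrals, then M^{-1} via a linear solve
    avg = np.einsum("k,kjn->jn", w, dm)                      # (L, N)
    step = np.linalg.solve(M, avg.T).T / (np.sqrt(rho) * dt)
    return u + step, w

# Hypothetical stand-in data instead of real SPDE rollouts
rng = np.random.default_rng(1)
K, L, N, rho, dt = 64, 10, 3, 5.0, 0.01
M = np.eye(N)
u0 = np.zeros((L, N))
dm = rng.normal(0.0, np.sqrt(dt), size=(K, L, N))
J_state = rng.uniform(0.0, 1.0, size=K)
u1, w = control_update(u0, J_state, dm, M, rho, dt)
print(u1.shape, w.sum())
```

Normalizing the weights so that they sum to one is the sample version of dividing by $\mathbb{E}_{\mathcal{L}^{(i)}}[\exp(-\rho J^{(i)})]$; rollouts with lower cost therefore pull the control toward the noise realizations that produced them.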

This procedure is repeated over a number of iterations. In the open-loop setting, the procedure considers the entire time window $[0, T]$, and the entire control trajectory is optimized in a “single shot”. In contrast, in the MPC setting, a shorter time window $[t_{sim}, T_{sim}]$ is considered for $I$ iterations; the control at the current time step $u_{I}(t_{sim})$ is applied to the system; and the window recedes by one time step $\Delta t$. This procedure is explained in greater detail in Appendix J.
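The receding-horizon structure of the MPC setting can be sketched independently of the SPDE solver. The wrapper below is a hedged illustration only: `mpc_loop`, the `step` and `optimize` callbacks, the warm start obtained by shifting the previous plan, and the toy scalar system used to exercise it are all assumptions, and the actual procedure is the one given in Appendix J.

```python
import numpy as np

def mpc_loop(x0, step, optimize, n_steps, horizon, n_act):
    """Optimize over a short window, apply the first control, then shift the window.

    step(x, u0)    : advances the real system by one time step under control u0
    optimize(x, u) : returns an improved (horizon, n_act) plan from state x,
                     e.g., I iterations of Equation (17) warm-started at u
    """
    x = x0
    u = np.zeros((horizon, n_act))          # zero-initialized plan
    applied = []
    for _ in range(n_steps):
        u = optimize(x, u)                  # re-optimize from the current state
        x = step(x, u[0])                   # apply only the first control
        applied.append(u[0].copy())
        # Warm start (assumption): drop the applied control, pad with zeros
        u = np.vstack([u[1:], np.zeros((1, n_act))])
    return x, np.array(applied)

# Toy stand-ins: deterministic scalar dynamics and a proportional "optimizer"
dt = 0.1
step = lambda x, u0: x + dt * u0
optimize = lambda x, u: np.tile(-x, (u.shape[0], 1))
x_final, applied = mpc_loop(np.array([1.0]), step, optimize,
                            n_steps=20, horizon=5, n_act=1)
print(x_final, applied.shape)
```

In an actual experiment `optimize` would run the sampling-based update on simulated rollouts started from the measured field, which is what turns the open-loop parameterization into an implicit feedback law.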

For the purposes of implementation, we perform the approximation as follows:

$$ \int_{t_{j}}^{t_{j+1}} \langle m_{l}, dW(t) \rangle_{U_{0}} \approx \sum_{s=1}^{R} \langle m_{l}, e_{s} \rangle_{U}\, \Delta\beta_{s}^{(i)}(t_{j}), \qquad (21) $$

where $\Delta\beta_{s}^{(i)}(t_{j})$ are Brownian motion increments sampled from the zero-mean Gaussian distribution $\Delta\beta_{s}^{(i)}(t_{j}) \sim \mathcal{N}(0, \Delta t)$, and $\{e_{j}\}$ form a complete orthonormal system in $U$. This is based on truncation of the cylindrical Wiener noise expansion as follows:

$$ W(t) = \sum_{j=1}^{\infty} \beta_{j}(t)\, e_{j}. \qquad (22) $$
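A minimal sketch of the approximation in Equation (21) is given below, assuming a 1D domain $[0, 1]$ and the sine basis $e_{s}(x) = \sqrt{2}\sin(s\pi x)$ as the orthonormal system. The basis choice, the Gaussian-shaped actuator functions, and the helper `noise_integrals` are hypothetical and serve only to illustrate the truncated expansion.

```python
import numpy as np

def noise_integrals(m_funcs, x, n_modes, dt, K, L, rng):
    """Sample the right-hand side of Equation (21) for K rollouts and L time steps.

    m_funcs : (N, P) actuator functions m_l evaluated on the uniform grid x
    n_modes : number of retained expansion terms (R in Equation (21))
    Returns a (K, L, N) array of sampled integrals and the (N, n_modes)
    matrix of projections <m_l, e_s>.
    """
    s = np.arange(1, n_modes + 1)
    # Assumed orthonormal basis of L^2(0, 1): e_s(x) = sqrt(2) sin(s pi x)
    e = np.sqrt(2.0) * np.sin(np.pi * np.outer(s, x))        # (n_modes, P)
    dx = x[1] - x[0]
    proj = (m_funcs @ e.T) * dx                              # Riemann-sum inner products
    # Independent Brownian increments, each distributed as N(0, dt)
    d_beta = rng.normal(0.0, np.sqrt(dt), size=(K, L, n_modes))
    return d_beta @ proj.T, proj

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
centers = np.array([0.3, 0.7])                               # hypothetical actuator locations
m_funcs = np.exp(-0.5 * ((x[None, :] - centers[:, None]) / 0.05) ** 2)
dm, proj = noise_integrals(m_funcs, x, n_modes=16, dt=0.01, K=100, L=50, rng=rng)
print(dm.shape, proj.shape)
```

The returned array has the layout expected by a sampled version of Equation (17): one vector of $N$ noise integrals per rollout and per time step.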

We note that the control of SPDEs with cylindrical Wiener noise, as shown above, can be extended to the case in [30] in which $G(t, X)$ is treated as a trace-class covariance operator $\sqrt{Q}$ of a $Q$-Wiener process $dW_{Q}(t)$. See Appendix H for more details. The resulting iterative control policy is identical to Equation (17) derived above.

4. Comparisons to Finite-Dimensional Optimization

In light of recent work that applied finite dimensional control after reducing the SPDE model to a set of SDEs or ODEs, we highlight the critical advantages of optimizing in Hilbert spaces before discretizing. The main challenge with performing optimization-based control after discretization is that SPDEs typically reduce to degenerate diffusion processes for which importance sampling schemes are difficult. Consider the following finite dimensional SDE representation of Equation (2):

$$ d\hat{X} = A\hat{X}\,dt + F(t, \hat{X})\,dt + G(t, \hat{X})\Big(M u(t; \theta)\,dt + \frac{1}{\sqrt{\rho}} R\,d\beta(t)\Big), \qquad (23) $$

where $\hat{X} \in \mathbb{R}^{d}$ is a $d$-dimensional vector comprising the values of the stochastic field at particular basis elements. The terms $A$, $F$, and $G$ are matrices associated with their respective Hilbert space operators. The matrix $M$ is in $\mathbb{R}^{d \times k}$, where $k$ is the number of actuators placed in the field. The vector $d\beta \in \mathbb{R}^{m}$ collects noise terms and $R$ collects associated finite dimensional basis vectors of Equation (22). The matrix $R \in \mathbb{R}^{d \times m}$ is composed of $d$ rows, which is the number of basis elements used to spatially discretize the SPDE Equation (2), and $m$ columns, which is the number of expansion terms of Equation (22) that are used.

Girsanov’s theorem for SDEs of the form Equation (23) requires the matrix $R$ to be invertible as seen in the resulting change of measure:

$$ \frac{d\mathcal{L}}{d\mathcal{L}^{(i)}} = \exp\Big(-\sqrt{\rho} \int_{0}^{T} \langle R^{-1} M u(s, \theta), dW(s) \rangle_{U} + \frac{\rho}{2} \int_{0}^{T} \langle R^{-1} M u(s, \theta), R^{-1} M u(s, \theta) \rangle_{U}\,ds\Big). \qquad (24) $$

Deriving the optimal control in the finite dimensional space requires that (a) the noise term is expanded to at least as many terms as the points on the spatial discretization (i.e., $d \leq m$), and (b) the resulting diffusion matrix $R$ in Equation (23) is full rank. Therefore, increasing finite dimensional approximation accuracy increases the complexity of the sampling process and optimal control computation. This is even more challenging in the case of SPDEs with $Q$-Wiener noise, where many of the eigenvalues in the expansion of $W(t)$ must be arbitrarily close to zero.
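The rank requirement can be illustrated numerically. The sketch below builds a hypothetical diffusion matrix $R$ by evaluating $m$ sine modes at $d$ interior grid points of $(0, 1)$; this particular construction and the helper `diffusion_matrix` are assumptions chosen only to make the degeneracy visible.

```python
import numpy as np

def diffusion_matrix(d, m):
    """Hypothetical R in Equation (23): m sine modes evaluated at d interior grid points."""
    x = np.arange(1, d + 1) / (d + 1)                       # interior points of (0, 1)
    s = np.arange(1, m + 1)                                 # retained mode indices
    return np.sqrt(2.0) * np.sin(np.pi * np.outer(x, s))    # shape (d, m)

d = 32
ranks = {m: int(np.linalg.matrix_rank(diffusion_matrix(d, m))) for m in (8, 16, 32)}
print(ranks)
```

Whenever fewer noise modes than grid points are retained, the rank of $R$ is below $d$, so neither $R$ nor the $d \times d$ product $R R^{\top}$ can be inverted; refining the grid therefore forces the noise expansion to grow with it.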

Other finite dimensional approaches, as in [39], utilize Gaussian density functions instead of the measure theoretic approach. These approaches do not carry over to the infinite dimensional setting. Firstly, the Gaussian density would have to be defined with respect to a measure other than the Lebesgue measure, which does not exist in infinite dimensions. Secondly, an equivalent Euler–Maruyama time discretization is not possible without first discretizing spatially. Finally, after spatial discretization, the use of transition probabilities based on density functions requires the invertibility of $R R^{\top}$ (see Appendix I). These characteristics make Gaussian density-based approaches unsuitable for deriving optimal control of SPDEs.

5. Numerical Results

Performing variational optimization in the infinite dimensional space enables a general framework for controlling general classes of stochastic fields. It also comes with algorithmic benefits from importance sampling and can be applied in either the open loop or MPC mode for both boundary and distributed control systems. Critically, it avoids feasibility issues in optimizing finite dimensional representations of SPDEs. Additional flexibility arises from the freedom to choose the model reduction method that is best suited for the problem without having to change the control update law. Details on the algorithm and on each simulated experiment can be found in Appendix J and Appendix K.

5.1. Distributed Control of Stochastic PDEs in Fluid Physics

Several simulated experiments were conducted to investigate the efficacy of the proposed control approach. The first explores control of the 1D stochastic viscous Burgers’ equation with non-homogeneous Dirichlet boundary conditions. This advection–diffusion equation with random forcing was studied as a simple model for turbulence [40,41].

The control objective in this experiment is to reach and maintain a desired velocity at specific locations along the spatial domain, depicted in black. In order to achieve the task, the controller must overcome the uncontrolled spatio-temporal evolution governed by an advective and diffusive nature, which produces an apparent velocity wave front that builds across the domain as depicted on the bottom left of Figure 3.

Both open-loop and MPC versions of the control in Equation (17) were tested on the 1D stochastic Burgers’ equation and the results are depicted in the top subfigure of Figure 3. Their performance was compared by averaging the velocity profiles over the second half of each experiment, repeated over 128 trials. The simulated experiment duration was 1.0 s. For the open-loop scheme, 100 optimization iterations with 100 sampled trajectory rollouts per iteration were used. In the MPC setting, 10 optimization iterations were performed at each time step, each using 100 sampled trajectory rollouts.

The results suggest that both the open-loop and MPC schemes have comparable success in controlling the stochastic Burgers SPDE. The open-loop setting depicts the apparent rightward wavefront that is not as strong in the MPC setting. There is also quite a substantial difference in variance over the trajectory rollouts. The open-loop setting depicts a smaller variance overall, while the MPC setting depicts a variance that shrinks around the objective regions. The MPC performance is desirable since the performance metric only considers the objective regions. The root mean squared error (RMSE) and variance averaged over the desired regions are provided in Table 2.

The stochastic Nagumo equation with homogeneous Neumann boundary conditions is a reduced model for wave propagation of the voltage in the axon of a neuron [42]. This SPDE shares a linear diffusion term with the viscous Burgers equation as depicted in Table 1. However, as shown in the bottom left subfigure of Figure 4, the nonlinearity produces a substantially different behavior, which propagates the voltage across the axon with our simulation parameters in about 5 s. This set of simulated experiments explores two tasks: accelerating the rate at which the voltage propagates across the axon, and suppressing the voltage propagation across the axon. This is analogous to either ensuring the activation of a neuronal signal, or ensuring that the neuron remains inactivated.

These tasks are accomplished by reaching either a desired value of 1.0 or 0.0 over the right end of the spatial region for acceleration and suppression, respectively. In both experiments, open-loop and MPC versions of Equation (17) were tested, and the results are depicted in Figure 4 and Figure 5. For the open-loop scheme, 200 optimization iterations with 200 sampled trajectory rollouts per iteration were used. In the MPC setting, 10 optimization iterations were performed at each time step, each using 100 sampled trajectory rollouts. State trajectories of both control schemes were compared by averaging the voltage profiles over the second half of each time horizon, repeated over 128 trials.

The results of the two stochastic Nagumo equation tasks suggest that both control schemes achieve success on both the acceleration and suppression tasks. While the performance appears substantially different outside the target region, the two control schemes have very similar performance on the desired region, which is the only penalized region in the optimization objective. The top subfigures of Figure 4 and Figure 5 show magnified views of the desired region. These views depict a higher variance in the state trajectories of the open-loop control scheme than the MPC scheme.

As in the stochastic viscous Burgers experiment, there is an apparent trade-off between the two control schemes. The MPC scheme yields a desirable lower variance in the region that is being considered for optimization, but produces state trajectories with very high variance outside the goal region. The open loop control is understood as seeking to achieve the task by reaching low variance trajectories everywhere, while the MPC scheme is understood as acting reactively (i.e., re-optimizing based on state measurements) to a propagating voltage signal. The RMSE and variance averaged over the desired region of 128 trials of each experiment are given in Table 3.

The next simulated experiment explores scalability to 2D spatial domains by considering the 2D stochastic heat equation with homogeneous Dirichlet boundary conditions. This experiment can be thought of as attempting to heat an insulated metal plate to specified temperatures in specified regions while the edges remain at a temperature of 0 at some scale. The desired temperatures and regions associated with this experiment are depicted in the left subfigure of Figure 6. This experiment tests the MPC scheme.

Starting from a random initial temperature profile, as in the second subfigure of Figure 6, and using a time horizon of 1.0 s, the MPC controller is able to achieve the desired temperature profile toward the end of the time horizon as shown in the fourth subfigure of Figure 6. The third subfigure of Figure 6 depicts the middle of the time horizon. The MPC controller used 5 optimization iterations at every timestep and 25 sampled trajectories per iteration.

This result suggests that in this case, this approach can handle the added complexity of 2D stochastic fields. As depicted in the right subfigure of Figure 6, the proposed MPC control scheme solves the task of reaching the desired temperature at the specified spatial regions.

5.2. Boundary Control of Stochastic PDEs

The control update in Equation (17) describes control of SPDEs by distributing actuators throughout the field. However, our framework can also handle systems with control and noise at the boundary. A key requirement is to write such dynamical systems in the mild form as follows:

\begin{aligned} X (t) &= e^{t A} ξ + \int_{0}^{t} e^{(t - s) A} F_{1} (t, X) \, d s \\ &\quad + (λ I - A) \Big[\int_{0}^{t} e^{(t - s) A} D \big(F_{2} (t, X) + G (t, X) U (t, X; θ)\big) \, d s \\ &\quad + \int_{0}^{t} e^{(t - s) A} D B (t, X) \, d V (s)\Big], \quad P\text{-a.s.} \end{aligned}

(25)

where the operator

D

corresponds to the boundary conditions of the problem, and is called the Dirichlet map (Neumann map, respectively) for Dirichlet (Neumann, respectively) boundary control/noise. These maps take operators defined on the boundary Hilbert space

Λ_{0}

to the Hilbert space H of the domain.

λ

is a real number also associated with the boundary conditions. The operator

d V

describes a cylindrical Wiener process on the boundary Hilbert space

Λ_{0}

. For further details, the reader can refer to the discussion in [29] Section 2.5, Appendix C.5, and Appendix G.

Studying optimal control problems with dynamics, as in Equation (25), is rather challenging. HJB theory requires additional regularity conditions, and proving convergence of Equation (25) becomes nontrivial, especially when considering Dirichlet boundary noise. Numerical results are limited to simplistic problems. Nevertheless, Equation (17) is extended to the case of boundary control by similarly using tools from Girsanov’s theorem to obtain the change of measure as follows:

\frac{d L}{d L^{(i)}} = \exp \Big(- \int_{0}^{T} {\langle B^{- 1} G U, d V (s) \rangle}_{Λ_{0}} + \frac{1}{2} \int_{0}^{T} \| B^{- 1} G U \|_{Λ_{0}}^{2} \, d s\Big),

(26)

which was also utilized in reference [43] for studying solutions of SPDEs, similar to Equation (25). Using the control parameterization of the distributed case above results in the same approach described in Equation (17) with inner products taken with respect to the boundary Hilbert space

Λ_{0}

to solve stochastic boundary control problems.

The stochastic 1D heat equation under Neumann boundary conditions was used in simulated experiments that investigate the efficacy of the proposed framework in stochastic boundary control settings. The objective is to track a time-varying profile that is uniform in space by actuation only at the boundary points. The result of the MPC scheme of Equation (17), run with 10 optimization iterations per time step, is depicted in the left subfigure of Figure 7. The random sample of the controlled state trajectory, depicted in a violet to red color spectrum, remains close to the time-varying desired profile, depicted in magenta. The associated bounded actuation signals acting on the two boundary actuators are depicted in the right subfigure of Figure 7.

Across the simulated experiments, the authors note a clear empirical improvement of the control policy from iteration to iteration. This observation motivates a deeper theoretical analysis of the convergence of the proposed algorithm. The empirical performance is influenced by several of the parameters that appear in Algorithms A1 and A2. The parameter

ρ

, which appears in the controlled and uncontrolled dynamics in Equations (2) and (3) as well as in the Legendre transformation in Equation (7), influences the intensity of the stochasticity and the relative weightings of the terms in Equation (18), which in general leads to an exploration–exploitation trade-off. The number of rollouts also has a significant effect on the empirical performance. In general, a larger number of rollouts is advantageous because it gives a more representative sampling of the state space and a better approximation of the expectation, but it increases the computational burden. In the MPC setting, the time horizon has a significant effect on the empirical performance. This is typical of MPC methods, as a short receding window can cause the algorithm to be myopic, while a long receding window recovers the “single shot” or open-loop performance. Finally, the spatial and temporal discretization sizes have a significant effect on algorithmic performance, because large spatial or temporal steps introduce errors in the resulting discrete equations and may ultimately violate the Courant–Friedrichs–Lewy conditions of the SPDE.
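To make the discretization constraint concrete, the following minimal sketch checks the classical stability bound for an explicit (forward Euler in time, central differences in space) discretization of a 1D heat-type operator, namely Δt ≤ Δx²/(2ε). The choice of scheme, the function names `max_stable_dt` and `is_stable`, and the grid values are assumptions made for illustration; the discretization and step sizes used in the experiments are not stated here.

```python
def max_stable_dt(dx: float, diffusivity: float) -> float:
    """Largest time step for which explicit Euler with central differences
    is stable for u_t = diffusivity * u_xx (classical bound dx^2 / (2 * eps))."""
    if dx <= 0 or diffusivity <= 0:
        raise ValueError("dx and diffusivity must be positive")
    return dx * dx / (2.0 * diffusivity)


def is_stable(dt: float, dx: float, diffusivity: float) -> bool:
    """True if the chosen time step satisfies the explicit-scheme bound."""
    return dt <= max_stable_dt(dx, diffusivity)


# Hypothetical grid: 64 interior points on [0, 1], unit diffusivity.
dx = 1.0 / 65
dt_limit = max_stable_dt(dx, 1.0)
```

Because the bound scales with Δx², halving the spatial step requires a time step four times smaller, which is one reason the discretization size interacts strongly with the computational cost of each rollout.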

The above experiments were designed to cover SPDEs with nonlinear dynamics, multiple spatial dimensions, time-varying objectives, and systems with both distributed and boundary actuation. This range illustrates the versatility of the proposed framework across problems of many different types. Throughout these experiments, the control architecture produces state trajectories that solve the objective with high probability for the given stochasticity.

6. Conclusions

This manuscript presented a variational optimization framework for distributed and boundary controlled stochastic fields based on the free energy–relative entropy relation. The approach leverages the inherent stochasticity in the dynamics for control, and is valid for generic classes of infinite-dimensional diffusion processes. Based on thermodynamic notions that have demonstrated connections to established stochastic optimal control principles, algorithms were developed that bridge the gap between abstract theory and computational control of SPDEs. The distributed and boundary control experiments demonstrate that this approach can successfully control complex physical systems in a variety of domains.

This work opens new research directions in the control of stochastic fields, which are ubiquitous in physics. Based on the use of forward sampling, future research on the algorithmic side will include the development of efficient methods for the representation and propagation of stochastic fields, using techniques in machine learning, such as deep neural networks. Other directions include explicit feedback parameterizations and, in the context of boundary control, HJB approaches in the information theoretic formulation.

Author Contributions

Conceptualization, E.A.T.; data curation, M.A.P. and E.N.E.; methodology, G.I.B. and E.N.E.; software, M.A.P. and G.I.B.; investigation M.A.P. and G.I.B.; formal analysis, G.I.B., E.N.E. and E.A.T.; writing, E.N.E.; visualization, M.A.P., E.N.E. and G.I.B.; supervision, E.A.T.; project administration, E.A.T.; resources, E.A.T.; funding acquisition, E.A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Army Research Office contract W911NF2010151. Ethan N. Evans was supported by the SMART scholarship and George I. Boutselis was partially supported by the Onassis Fellowship.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data supporting the reported results were produced “from scratch” by the algorithms detailed in the manuscript.

Acknowledgments

We would like to express our gratitude to Andrzej Swiech for our useful and pertinent discussions, which clarified certain aspects of SPDEs in Hilbert spaces.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SPDE	Stochastic Partial Differential Equation
PDE	Partial Differential Equation
SDE	Stochastic Differential Equation
ODE	Ordinary Differential Equation
SOC	Stochastic Optimal Control
HJB	Hamilton–Jacobi–Bellman
MPC	Model Predictive Control
RMSE	Root Mean Squared Error

Appendix A. Description of the Hilbert Space Wiener Process

In this section, we provide formal definitions of various forms of the Hilbert space Wiener process. Some of these statements can be found in [2], Section 4.1.

Definition A1.

Let

H

denote a Hilbert space. A

H

-valued stochastic process

W (t)

with probability law

L (W (\cdot))

is called a Wiener process if the following hold:

(i): $W (0) = 0$
(ii): W has continuous trajectories.
(iii): W has independent increments.
(iv): $L (W (t) - W (s)) = N (0, (t - s) Q), t \geq s \geq 0$
(v): $L (W (t)) = L (- W (t)), t \geq 0$

Proposition A1.

Let

{e_{i}}_{i = 1}^{\infty}

be a complete orthonormal system for the Hilbert Space

H

. Let Q denote the covariance operator of the Wiener process

W (t)

. Note that Q satisfies

Q e_{i} = λ_{i} e_{i}

, where

λ_{i}

is the eigenvalue of Q that corresponds to eigenvector

e_{i}

. Then,

W (t) \in H

has the following expansion:

W (t) = \sum_{j = 1}^{\infty} \sqrt{λ_{j}} β_{j} (t) e_{j},

(A1)

where

β_{j} (t)

are real-valued Brownian motions that are mutually independent on

(Ω, F, P)

.
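The expansion (A1) suggests a direct way to sample an approximate Wiener process by truncating the series. The sketch below does this on the interval [0, 1]; the sine basis, the eigenvalue sequence, the truncation level, and the function name `sample_q_wiener` are all assumptions chosen for illustration and are not taken from the experiments.

```python
import numpy as np


def sample_q_wiener(eigvals, n_steps, dt, x, rng):
    """Sample a truncated version of expansion (A1) on the grid x.

    W(t) ~= sum_j sqrt(lambda_j) * beta_j(t) * e_j, with the (assumed)
    sine basis e_j(x) = sqrt(2) * sin(j * pi * x) on [0, 1].
    Returns an array of shape (n_steps + 1, len(x)) with W(0) = 0.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    n_modes = eigvals.size
    j = np.arange(1, n_modes + 1)
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(j, x))  # (n_modes, n_x)
    # Independent scalar Brownian motions beta_j: cumulative sums of N(0, dt).
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_modes))
    beta = np.vstack([np.zeros(n_modes), np.cumsum(increments, axis=0)])
    return (beta * np.sqrt(eigvals)) @ basis


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 33)
lam = 1.0 / np.arange(1, 21) ** 2  # summable eigenvalues (assumed)
W = sample_q_wiener(lam, n_steps=100, dt=0.01, x=x, rng=rng)
```

Each row of `W` is one spatial snapshot; the sine basis forces the sample to vanish at both ends of the interval, which mirrors homogeneous Dirichlet boundary conditions.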

Definition A2.

Let

{e_{i}}_{i = 1}^{\infty}

be a complete orthonormal system for the Hilbert space

H

. An operator A on

H

with the set of its eigenvalues

{λ_{i}}_{i = 1}^{\infty}

in a given basis

{e_{i}}_{i = 1}^{\infty}

is called a trace-class operator if the following holds:

T r (A) : = \sum_{n = 1}^{\infty} 〈A e_{n}, e_{n}〉 = \sum_{i = 1}^{\infty} λ_{i} < \infty .

(A2)

The two primary Wiener processes that are typically used to model spatio-temporal noise processes in the SPDE literature are the cylindrical Wiener process and the Q-Wiener process. These are both referred to in the main text, and are defined in the following two definitions.

Definition A3.

A Wiener process

W (t)

on

H

with covariance operator Q is called a cylindrical Wiener process if Q is the identity operator I.

Definition A4.

A Wiener process

W (t)

on

H

with covariance operator Q is called a Q-Wiener process if Q is of trace-class.

An immediate fact following Definition A3 is that the cylindrical Wiener process acts spatially everywhere on

H

with equal magnitude. One can easily conclude that for a cylindrical Wiener process, the eigenvalues

{λ_{i}}_{i = 1}^{\infty}

of the covariance operator Q are all unity, thus the following holds:

\sum_{i = 1}^{\infty} λ_{i} = \infty .

(A3)

However, we note that in this case, the series in (A1) converges in another Hilbert space

U_{1} \supset U

when the inclusion

ι : U \to U_{1}

is Hilbert–Schmidt. For more details, see [2].
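The contrast between Definitions A3 and A4 can be seen numerically by comparing partial sums of the eigenvalues as more modes are retained. In the sketch below, the sequence λ_i = 1/i² is an assumed example of a trace-class spectrum; the cylindrical case uses λ_i = 1 as in (A3).

```python
import numpy as np

n_modes = np.array([10, 100, 1000, 10000])

# Cylindrical case: Q = I, so every eigenvalue equals 1 and the partial
# traces grow linearly with the number of modes retained.
trace_cylindrical = np.array([np.sum(np.ones(n)) for n in n_modes])

# Trace-class example (assumed): lambda_i = 1 / i^2, whose partial sums
# converge to pi^2 / 6.
trace_q = np.array([np.sum(1.0 / np.arange(1, n + 1) ** 2) for n in n_modes])
```

The first sequence grows without bound while the second levels off, which is the finite-dimensional shadow of the trace-class condition (A2).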

On the other hand, it follows immediately from Definition A4 that a Q-Wiener process cannot have a spatially equal effect everywhere on the domain. More precisely, one has the following proposition.

Proposition A2.

Let

W (t)

be a Q-Wiener process on

H

with covariance operator Q. Let

{λ_{i}}_{i = 1}^{\infty}

denote the set of eigenvalues of Q in the complete orthonormal system

{e_{i}}_{i = 1}^{\infty}

. Then, the eigenvalues must fall into one of the following three cases:

(i): For any $ε > 0$ , there are only finitely many eigenvalues $λ_{i}$ of covariance operator Q such that $| λ_{i} | > ε$ . That is, the set $\{i \in N_{+} : | λ_{i} | > ε\}$ , where $N_{+}$ is the set of positive natural numbers, is finite.
(ii): The eigenvalues $λ_{i}$ of covariance operator Q follow a bounded periodic function such that $| λ_{i} | > 0$ ∀ $i \in N_{+}$ and $\sum_{i = 1}^{\infty} λ_{i} = 0$ .
(iii): Both case (i) and case (ii) are satisfied. In this case, the eigenvalues follow a bounded and convergent periodic function with ${lim}_{i \to \infty} λ_{i} = 0$ .

Appendix B. Relative Entropy and Free Energy Dualities in Hilbert Spaces

In this section, we provide the relation between free energy and relative entropy. This connection is valid for general probability measures, including measures defined on path spaces induced by infinite-dimensional stochastic systems. In what follows,

L^{p}

(

1 \leq p < \infty

) denotes the standard

L^{p}

space of measurable functions and

P

denotes the set of probability measures.

Definition A5.

(Free Energy) Let

L \in P

be a probability measure on a sample space Ω, and consider a measurable function

J : L^{p} \to R_{+}

. Then, the following term,

V : = \frac{1}{ρ} {log}_{e} \int_{Ω} exp (ρ J) d L (ω),

(A4)

is called the free energy of J with respect to

L

and

ρ \in R

. The function

{log}_{e}

denotes the natural logarithm.

Definition A6.

Generalized entropy: Let

L, \tilde{L} \in P

, then the relative entropy of

\tilde{L}

with respect to

L

is defined as follows.

S (\tilde{L} \| L) : = \begin{cases} - \int_{Ω} \frac{d \tilde{L} (ω)}{d L (ω)} {log}_{e} \frac{d \tilde{L} (ω)}{d L (ω)} \, d L (ω), & \text{if } \tilde{L} < < L, \\ + \infty, & \text{otherwise}, \end{cases}

where “

< <

” denotes absolute continuity of

\tilde{L}

with respect to

L

. We say that

\tilde{L}

is absolutely continuous with respect to

L

and we write

\tilde{L} < < L

if

L (B) = 0 \Rightarrow \tilde{L} (B) = 0, \forall B \in F

.

The free energy and relative entropy relationship is expressed by the following theorem:

Theorem A1.

Let

(Ω, F)

be a measurable space. Consider

L, \tilde{L} \in P

and Definitions A5 and A6. Under the assumption that

\tilde{L} < < L

, the following inequality holds:

\begin{matrix} - \frac{1}{ρ} {log}_{e} E_{L} [exp (- ρ J)] \leq [E_{\tilde{L}} (J) - \frac{1}{ρ} S (\tilde{L} | | L)], \end{matrix}

(A5)

where

E_{L}, E_{\tilde{L}}

denote expectations under probability measures

L

,

\tilde{L}

, respectively. Moreover,

ρ \in R_{+}

and

J : L^{p} \to R_{+}

. The inequality in (A5) is the so-called Legendre transformation.

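Writing V for the free energy on the left-hand side of (A5) and E for the expected cost under the perturbed measure, and defining the temperature as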

T = \frac{1}{ρ}

, the Legendre transformation has the following form:

V \leq E - T S,

(A6)

and the equilibrium probability measure has the classical form:

d L^{*} (ω) = \frac{exp (- ρ J) d L (ω)}{\int_{Ω} exp (- ρ J) d L (ω)},

(A7)

To verify the optimality of

L^{*}

, it suffices to substitute (A7) in (A5) and show that the inequality collapses to an equality [35]. The statistical physics interpretation of inequality (A6) is that maximization of entropy results in a reduction in the available energy. At the thermodynamic equilibrium, the entropy reaches its maximum and

V = E - T S

.
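On a finite sample space, the inequality (A5) and the optimality of the measure (A7) can be checked directly. The following sketch is a numerical illustration under assumed values (six outcomes, randomly drawn cost and measures, ρ = 2); the function names are hypothetical. It uses the sign convention of Definition A6, in which S carries a leading minus sign.

```python
import numpy as np


def free_energy(J, p, rho):
    """Left-hand side of (A5): -(1/rho) * log E_p[exp(-rho * J)]."""
    return -np.log(np.sum(p * np.exp(-rho * J))) / rho


def entropy_S(q, p):
    """S(q || p) as in Definition A6 (note the leading minus sign)."""
    ratio = q / p
    return -np.sum(p * ratio * np.log(ratio))


def rhs_bound(J, q, p, rho):
    """Right-hand side of (A5): E_q[J] - (1/rho) * S(q || p)."""
    return np.sum(q * J) - entropy_S(q, p) / rho


rng = np.random.default_rng(1)
n, rho = 6, 2.0
J = rng.uniform(0.0, 3.0, size=n)   # nonnegative cost on each outcome
p = rng.dirichlet(np.ones(n))       # base measure, playing the role of L
q = rng.dirichlet(np.ones(n))       # an arbitrary measure absolutely continuous w.r.t. p

# Gibbs measure (A7): proportional to exp(-rho * J) times the base measure.
q_star = p * np.exp(-rho * J)
q_star /= q_star.sum()

lhs = free_energy(J, p, rho)
gap_arbitrary = rhs_bound(J, q, p, rho) - lhs   # nonnegative
gap_optimal = rhs_bound(J, q_star, p, rho) - lhs  # zero up to round-off
```

The gap for an arbitrary measure is nonnegative and the gap at the Gibbs measure vanishes up to round-off, which is the discrete counterpart of the inequality collapsing to an equality.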

Appendix C. A Girsanov Theorem for SPDEs

Theorem A2 (Girsanov).

Let Ω be a sample space with a σ-algebra

F

. Consider the following H-valued stochastic processes:

\begin{matrix} d X & = (A X + F (t, X)) d t + G (t, X) d W (t), \end{matrix}

(A8)

\begin{matrix} d \tilde{X} & = (A \tilde{X} + F (t, \tilde{X})) d t + \tilde{B} (t, \tilde{X}) d t + G (t, \tilde{X}) d W (t), \end{matrix}

(A9)

where

X (0) = \tilde{X} (0) = x

and

W \in U

is a cylindrical Wiener process with respect to measure

P

. Moreover, for each

Γ \in C ([0, T]; H)

, let the law of X be defined as

L (Γ) : = P (ω \in Ω | X (\cdot, ω) \in Γ)

. Similarly, the law of

\tilde{X}

is defined as

\tilde{L} (Γ) : = P (ω \in Ω | \tilde{X} (\cdot, ω) \in Γ)

. Assume

E_{P} [e^{\frac{1}{2} \int_{0}^{T} {| | ψ (t) | |}^{2} d t}] < + \infty,

(A10)

where

ψ (t) : = G^{- 1} (t, X (t)) \tilde{B} (t, X (t)) \in U_{0} .

(A11)

Then, the following holds:

\begin{matrix} \tilde{L} (Γ) = E_{P} [exp (\int_{0}^{T} {〈 ψ (s), d W (s) 〉}_{U} - \frac{1}{2} \int_{0}^{T} {| | ψ (s) | |}_{U}^{2} d s) | X (\cdot) \in Γ] . \end{matrix}

(A12)

Proof.

Define the process:

\hat{W} (t) : = W (t) - \int_{0}^{t} ψ (s) d s .

(A13)

Under the assumption in (A10),

\hat{W}

is a cylindrical Wiener process with respect to a measure

Q

determined by the following:

\begin{aligned} d Q (ω) &= \exp \Big(\int_{0}^{T} {\langle ψ (s), d W (s) \rangle}_{U} - \frac{1}{2} \int_{0}^{T} \| ψ (s) \|_{U}^{2} \, d s\Big) \, d P \\ &= \exp \Big(\int_{0}^{T} {\langle ψ (s), d \hat{W} (s) \rangle}_{U} + \frac{1}{2} \int_{0}^{T} \| ψ (s) \|_{U}^{2} \, d s\Big) \, d P . \end{aligned}

(A14)

The proof for this result can be found in ([2], Theorem 10.14). Now, using (A13), (A8) can be rewritten as the following:

\begin{matrix} d X & = (A X + F (t, X)) d t + G (t, X) d W (t) \end{matrix}

(A15)

\begin{matrix} = (A X + F (t, X)) d t + \tilde{B} (t, X) d t + G (t, X) d \hat{W} (t) . \end{matrix}

(A16)

Notice that the SPDE in (A16) has the same form as (A9). Therefore, under the introduced measure

Q

and noise profile

\hat{W}

,

X (\cdot, ω)

becomes equivalent to

\tilde{X} (\cdot, ω)

from (A9). Conversely, under measure

P

, (A15) (or (A16)) behaves as the original system in (A8). In other words, (A8) and (A16) describe the same system on

(Ω, F, P)

. From the uniqueness of solutions and the aforementioned reasoning, one has the following:

P ({\tilde{X} \in Γ}) = Q ({X \in Γ}) .

The result follows from Equation (A14). □
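A scalar analogue of (A12) can be checked by Monte Carlo. The sketch below assumes A = 0, F = 0, G = 1, and a constant drift b, so that ψ(t) = b; these choices, the sample sizes, and the variable names are illustrative assumptions and not part of the theorem. Reweighting samples of the driftless process with the exponential in (A12) should reproduce the mean of the process with drift.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 20, 200_000
dt = T / n_steps
b = 0.5  # constant drift, so psi(t) = b (assumed)

# Scalar analogue of (A8) with A = 0, F = 0, G = 1: X(t) = W(t) under P.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X_T = dW.sum(axis=1)

# Discretized exponent of (A12): int psi dW - 0.5 * int psi^2 ds.
log_weight = b * dW.sum(axis=1) - 0.5 * b ** 2 * T
weights = np.exp(log_weight)

# For dX~ = b dt + dW the mean at time T is b * T; the reweighted
# average of the driftless samples estimates the same quantity.
reweighted_mean = np.mean(weights * X_T)
mean_weight = np.mean(weights)
```

The average weight stays close to one, as expected for a Radon–Nikodym derivative, and the reweighted mean approaches b·T as the number of paths grows.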

Appendix D. Proof of Lemma 1

Proof.

Under the open loop parameterization

U (x, t) = m {(x)}^{⊤} u (t)

, the problem takes the following form:

\begin{matrix} u^{*} & = argmin [\int_{C} {log}_{e} \frac{d L^{*} (x)}{d \tilde{L} (x)} d L^{*} (x)] = argmin [\int_{C} {log}_{e} \frac{d L^{*} (x)}{d L (x)} \frac{d L (x)}{d \tilde{L} (x)} d L^{*} (x)] . \end{matrix}

By using the change of measures in Equation (13) of the main text, minimization of the last expression is equivalent to the minimization of the following expression:

\begin{matrix} E_{L^{*}} [{log}_{e} \frac{d L (x)}{d \tilde{L} (x)}] & = - \sqrt{ρ} E_{L^{*}} [\int_{0}^{T} u {(t)}^{⊤} \bar{m} (t)] + \frac{1}{2} ρ E_{L^{*}} [\int_{0}^{T} u {(t)}^{⊤} M u (t) d t] . \end{matrix}

As stated in Lemma 1, we apply the control in discrete time instances, and consider the class of step functions

u_{i}

,

i = 0, \dots, L - 1

that are constant over fixed-size intervals

[t_{i}, t_{i + 1}]

of length

Δ t

. We have the following:

\begin{matrix} E_{L^{*}} [{log}_{e} \frac{d L (x)}{d \tilde{L} (x)}] & = - \sqrt{ρ} \sum_{i = 0}^{L - 1} u_{i}^{⊤} E_{L^{*}} [\int_{t_{i}}^{t_{i + 1}} \bar{m} (t)] + \frac{1}{2} ρ \sum_{i = 0}^{L - 1} u_{i}^{⊤} M u_{i} Δ t, \end{matrix}

where we have used the fact that

M

is constant with respect to time. Due to the symmetry of

M

, minimization of the expression above with respect to

u_{i}

results in the following:

u_{i}^{*} = \frac{1}{\sqrt{ρ} Δ t} M^{- 1} E_{L^{*}} [\int_{t_{i}}^{t_{i + 1}} \bar{m} (t)] .

(A17)
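The first-order condition behind (A17) can be verified numerically for a single interval. In the sketch below, the vector `a` stands in for the expectation of the projected noise integral on that interval, and the matrix `M` is generated as a random symmetric positive definite matrix; both are assumptions for illustration, as are the function names.

```python
import numpy as np


def step_objective(u, a, M, rho, dt):
    """Per-interval objective: -sqrt(rho) * u^T a + 0.5 * rho * dt * u^T M u,
    where a stands for the expected projected-noise integral on the interval."""
    return -np.sqrt(rho) * u @ a + 0.5 * rho * dt * u @ M @ u


def step_minimizer(a, M, rho, dt):
    """Closed-form minimizer in the form of (A17): M^{-1} a / (sqrt(rho) * dt)."""
    return np.linalg.solve(M, a) / (np.sqrt(rho) * dt)


rng = np.random.default_rng(3)
N, rho, dt = 4, 2.0, 0.01
B = rng.normal(size=(N, N))
M = B @ B.T + N * np.eye(N)  # symmetric positive definite (assumed)
a = rng.normal(size=N)

u_star = step_minimizer(a, M, rho, dt)
# Gradient of the objective at u_star; it should vanish.
gradient = -np.sqrt(rho) * a + rho * dt * M @ u_star
```

Since the objective is a convex quadratic whenever M is positive definite, a vanishing gradient is sufficient for a global minimum on each interval.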

Since we cannot sample directly from the optimal measure

L^{*}

, we need to express the above expectation with respect to the measure induced by controlled dynamics,

L^{(i)}

. We can then directly sample controlled trajectories based on

L^{(i)}

and approximate the optimal control trajectory. The change in expectation is achieved by applying the Radon–Nikodym derivative. These so-called importance sampling steps are as follows. First define

W^{(i)}

in a similar fashion to Equation (A13), as follows:

W^{(i)} (t) : = W (t) - \int_{0}^{t} \sqrt{ρ} U^{(i)} (s) d s .

(A18)

Similar to Equation (A16), one can rewrite the uncontrolled dynamics as the following:

\begin{matrix} \begin{matrix} d X & = (A X + F (t, X)) d t + \frac{1}{\sqrt{ρ}} G (t, X) d W (t) \\ = (A X + F (t, X)) d t + G (t, X) (U^{(i)} d t + \frac{1}{\sqrt{ρ}} d W^{(i)} (t)) . \end{matrix} \end{matrix}

(A19)

Under the open-loop parameterization

U (t) (x) = m {(x)}^{⊤} u_{j}

, where

u_{j}

are step functions on each interval

[t_{j}, t_{j + 1}]

, the change of measures becomes the following:

\begin{matrix} \begin{matrix} \frac{d L}{d L^{(i)}} = exp (- \sqrt{ρ} \sum_{k = 0}^{L - 1} u_{k}^{(i) ⊤} \int_{t_{k}}^{t_{k + 1}} {\bar{m}}^{(i)} (t) - ρ \frac{1}{2} \sum_{k = 0}^{L - 1} u_{k}^{(i) ⊤} M u_{k}^{(i)} Δ t), \end{matrix} \end{matrix}

(A20)

where

\begin{matrix} {\bar{m}}^{(i)} (t) : = & {[{〈 m_{1}, d W^{(i)} (t) 〉}_{U}, . . ., {〈 m_{N}, d W^{(i)} (t) 〉}_{U}]}^{⊤} \in R^{N} . \end{matrix}

(A21)

One can alternatively write this as the following:

\begin{matrix} {(\int_{t_{j}}^{t_{j + 1}} {\bar{m}}^{(i)} (t))}_{l} & = \int_{t_{j}}^{t_{j + 1}} {〈m_{l}, d W^{(i)} (t)〉}_{U} = \int_{t_{j}}^{t_{j + 1}} {〈m_{l}, d W (t) - \sqrt{ρ} U^{(i)} (t) d t〉}_{U} \\ = \int_{t_{j}}^{t_{j + 1}} {〈m_{l}, d W (t)〉}_{U} - \sqrt{ρ} [{〈m_{l}, m_{1}〉}_{U}, . . ., {〈m_{l}, m_{N}〉}_{U}] u_{j}^{(i)} Δ t . \end{matrix}

It follows that:

\int_{t_{j}}^{t_{j + 1}} {\bar{m}}^{(i)} (t) = \int_{t_{j}}^{t_{j + 1}} \bar{m} (t) - \sqrt{ρ} Δ t M u_{j}^{(i)} .

(A22)

In order to derive the iterative scheme, we perform one step of importance sampling and express the associated expectations with respect to the measure induced by the controlled SPDE in Equation (3) of the main text. Let us begin by modifying Equation (A17) via the appropriate change of measures from (A20), as well as (A18):

\begin{aligned} u_{j}^{(i + 1)} &= \frac{1}{\sqrt{ρ} Δ t} M^{- 1} \int_{Ω} \Big[\frac{d L^{*}}{d L} \frac{d L}{d L^{(i)}} \int_{t_{j}}^{t_{j + 1}} \bar{m} (t)\Big] d L^{(i)} \\ &= \frac{1}{\sqrt{ρ} Δ t} M^{- 1} \int \Big[\frac{\exp (- ρ J)}{E_{L} [\exp (- ρ J)]} \frac{d L}{d L^{(i)}} \int_{t_{j}}^{t_{j + 1}} \bar{m} (t)\Big] d L^{(i)} \\ &= \frac{1}{\sqrt{ρ} Δ t} M^{- 1} \int \Big[\frac{\exp (- ρ J)}{E_{L^{(i)}} [\frac{d L}{d L^{(i)}} \exp (- ρ J)]} \frac{d L}{d L^{(i)}} \int_{t_{j}}^{t_{j + 1}} \bar{m} (t)\Big] d L^{(i)} \end{aligned}

(A23)

\begin{aligned} &= \frac{1}{\sqrt{ρ} Δ t} M^{- 1} E_{L^{(i)}} \Big[\frac{\exp (- ρ J^{(i)})}{E_{L^{(i)}} [\exp (- ρ J^{(i)})]} \int_{t_{j}}^{t_{j + 1}} \bar{m} (t)\Big] . \end{aligned}

(A24)

One can reorder Equation (A22) as the following:

\int_{t_{j}}^{t_{j + 1}} \bar{m} (t) = \int_{t_{j}}^{t_{j + 1}} {\bar{m}}^{(i)} (t) + \sqrt{ρ} Δ t M u_{j}^{(i)} .

(A25)

and plug it into Equation (A24) to yield the following:

\begin{aligned} u_{j}^{(i + 1)} &= \frac{1}{\sqrt{ρ} Δ t} M^{- 1} E_{L^{(i)}} \Big[\frac{\exp (- ρ J^{(i)})}{E_{L^{(i)}} [\exp (- ρ J^{(i)})]} \Big(\int_{t_{j}}^{t_{j + 1}} {\bar{m}}^{(i)} (t) + \sqrt{ρ} Δ t M u_{j}^{(i)}\Big)\Big] \\ &= u_{j}^{(i)} + \frac{1}{\sqrt{ρ} Δ t} M^{- 1} E_{L^{(i)}} \Big[\frac{\exp (- ρ J^{(i)})}{E_{L^{(i)}} [\exp (- ρ J^{(i)})]} \int_{t_{j}}^{t_{j + 1}} {\bar{m}}^{(i)} (t)\Big], \end{aligned}

(A26)

which is equivalent to Equation (17) in the main text with

J^{(i)}

defined by Equation (18) in the main text. □
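The update (A26) has a simple sample-based form: the expectation is replaced by an average over rollouts, weighted by normalized exponentiated costs. The following is a minimal sketch under stated assumptions, not the implementation used for the experiments; the function name `update_controls`, the array layout, and the synthetic inputs (including an identity Gram matrix and 25 rollouts) are hypothetical.

```python
import numpy as np


def update_controls(u, dm, costs, M, rho, dt):
    """One importance-sampling update in the form of (A26).

    u     : (L, N) current step-function controls u_j^{(i)}
    dm    : (K, L, N) per-rollout integrals of m_bar^{(i)} over each interval,
            i.e. the sampled noise projected onto the N actuator functions
    costs : (K,) importance-sampled trajectory costs J^{(i)}
    M     : (N, N) Gram matrix of the actuator functions
    """
    # Exponentiated-cost weights; subtracting the minimum leaves the
    # normalized weights unchanged and avoids underflow.
    w = np.exp(-rho * (costs - costs.min()))
    w /= w.sum()
    # Weighted average of the projected noise over the K rollouts.
    weighted_noise = np.einsum("k,kln->ln", w, dm)
    # Apply M^{-1} / (sqrt(rho) * dt) on each time interval.
    correction = np.linalg.solve(M, weighted_noise.T).T / (np.sqrt(rho) * dt)
    return u + correction


# Synthetic inputs (all values assumed for illustration).
rng = np.random.default_rng(4)
K, L, N, rho, dt = 25, 10, 3, 1.0, 0.01
M = np.eye(N)  # orthonormal actuator functions (assumed)
u0 = np.zeros((L, N))
dm = rng.normal(0.0, np.sqrt(dt), size=(K, L, N))
costs = rng.uniform(0.0, 1.0, size=K)
u1 = update_controls(u0, dm, costs, M, rho, dt)
```

When all rollouts have the same cost, the weights are uniform and the correction reduces to the plain average of the projected noise, so the control changes only through sampling error. Low-cost rollouts otherwise pull the update toward the noise realizations that produced them.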

Appendix E. Feynman–Kac for Spatio-Temporal Diffusions: From Expectations to Hilbert Space PDEs

Lemma A1.

Infinite dimensional Feynman–Kac: Define

ψ : [t_{0}, T] \times H \to R

as the following conditional expectation.

\begin{matrix} ψ (t, X) : = E_{L} [exp (- ρ J (X_{t, X}^{T})) | F_{t}] + E_{L} [\int_{t}^{T} g (s, X (s)) exp (- ρ Φ (X_{t, X}^{s})) d s | F_{t}], \end{matrix}

(A27)

evaluated on stochastic trajectories

X_{t, X}^{T}

generated by the infinite dimensional stochastic systems in Equations (2) and (3) of the main text and

ρ \in R_{+}

. The trajectory dependent terms

Φ (X_{t, X}^{T}) : L^{p} \to R_{+}

and

J (X_{t, X}^{T}) : L^{p} \to R_{+}

are defined as follows:

\begin{matrix} \begin{matrix} Φ (X_{t, X}^{s}) & = \int_{t}^{s} ℓ (τ, X (τ)) d τ, \\ J (X_{t, X}^{T}) & = ϕ (T, X) + Φ (X_{t, X}^{T}) . \end{matrix} \end{matrix}

(A28)

Additionally, let

ψ (t, X) \in C_{b}^{1, 2} ([0, T] \times H)

. Then, the function

ψ (t, X)

satisfies the following equation:

\begin{matrix} \begin{matrix} - \partial_{t} ψ (t, X (t)) & = - ρ ℓ (t, X (t)) ψ (t, X (t)) + 〈ψ_{X}, A X (t) + F (X (t))〉 \\ + \frac{1}{2} Tr [ψ_{X X} (B Q^{\frac{1}{2}}) {(B Q^{\frac{1}{2}})}^{*}] + g (t, X (t)) . \end{matrix} \end{matrix}

(A29)

Proof.

The proof starts with the expectation in (A27), which is an expectation conditioned on the filtration

F_{t}

. To keep the notation short, we drop the dependencies on t and

X (t)

, and write

ϕ_{T} = ϕ (T, X (T))

,

ℓ_{t} = ℓ (t, X (t))

, and

g_{t} = g (t, X (t))

. We split the integrals inside the expectations to write the following:

\begin{aligned} ψ (t, X) &= E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t}^{T} ℓ_{τ} d τ) | F_{t}] + E_{L} [\int_{t}^{T} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] \\ &= E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t + δ t}^{T} ℓ_{τ} d τ) \exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) | F_{t}] \\ &\quad + E_{L} [\int_{t}^{t + δ t} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] \\ &\quad + E_{L} [\int_{t + δ t}^{T} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] . \end{aligned}

By using the law of iterated expectations between the two sub-sigma algebras

F_{t} \subseteq F_{t + δ t}

we have the following:

\begin{aligned} ψ (t, X) &= E_{L} [E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t + δ t}^{T} ℓ_{τ} d τ) \exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) | F_{t + δ t}] | F_{t}] \\ &\quad + E_{L} [\int_{t}^{t + δ t} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] \\ &\quad + E_{L} [E_{L} [\int_{t + δ t}^{T} g_{s} \exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) \exp (- ρ \int_{t + δ t}^{s} ℓ_{τ} d τ) d s | F_{t + δ t}] | F_{t}] . \end{aligned}

Next, we use the fact that the conditioning on the filtration

F_{t + δ t}

results in the following equality:

\begin{aligned} & E_{L} [E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t + δ t}^{T} ℓ_{τ} d τ) \exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) | F_{t + δ t}] | F_{t}] \\ &= E_{L} [\exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t + δ t}^{T} ℓ_{τ} d τ) | F_{t + δ t}] | F_{t}] . \end{aligned}

By further using this property of independence we have the following:

\begin{aligned} ψ (t, X) &= E_{L} [\exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) E_{L} [\exp (- ρ ϕ_{T} - ρ \int_{t + δ t}^{T} ℓ_{τ} d τ) | F_{t + δ t}] | F_{t}] \\ &\quad + E_{L} [\int_{t}^{t + δ t} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] \\ &\quad + E_{L} [\exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) E_{L} [\int_{t + δ t}^{T} g_{s} \exp (- ρ \int_{t + δ t}^{s} ℓ_{τ} d τ) d s | F_{t + δ t}] | F_{t}] \\ &= E_{L} [\exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) ψ (t + δ t, X (t + δ t)) | F_{t}] \\ &\quad + E_{L} [\int_{t}^{t + δ t} g_{s} \exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] . \end{aligned}

The last expression provides the backward propagation of the

ψ (t, X (t))

by employing an expectation over

ψ (t + δ t, X (t + δ t))

. To get the backward deterministic Kolmogorov equations for the infinite dimensional case, we subtract the term

E [ψ (t + δ t, X (t + δ t)) | F_{t}]

from both sides:

\begin{matrix} - E_{L} [ψ (t + δ t, X (t + δ t)) - ψ (t, X (t)) | F_{t}] \\ = E_{L} [\{exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) - 1\} ψ (t + δ t, X (t + δ t)) | F_{t}] \\ + E_{L} [\int_{t}^{t + δ t} g_{s} exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] . \end{matrix}

Next, we take the limit as

δ t \to 0

and we have the following:

\begin{matrix} - lim_{δ t \to 0} E_{L} [ψ (t + δ t, X (t + δ t)) - ψ (t, X (t)) | F_{t}] \\ = lim_{δ t \to 0} E_{L} [(exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) - 1) ψ (t + δ t, X (t + δ t)) | F_{t}] \\ + lim_{δ t \to 0} E_{L} [\int_{t}^{t + δ t} g_{s} exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] . \end{matrix}

Thus, we have to compute three terms. We employ the Lebesgue dominated convergence theorem to pass the limit inside the expectations:

- \lim_{δ t \to 0} E_{L} [ψ (t + δ t, X (t + δ t)) - ψ (t, X (t)) | F_{t}] = - E_{L} [d ψ | F_{t}]

(A30)

By using the Itô differentiation rule ([2] Theorem 4.32) for the case of infinite dimensional stochastic systems, we have the following:

\begin{matrix} \begin{matrix} E_{L} [d ψ (t, X (t)) | F_{t}] & = \partial_{t} ψ (t, X (t)) d t + 〈ψ_{X}, A X (t) + F (X (t))〉 d t \\ + \frac{1}{2} Tr [ψ_{X X} (B Q^{\frac{1}{2}}) {(B Q^{\frac{1}{2}})}^{*}] d t \end{matrix} \end{matrix}

The next term is as follows:

\begin{aligned} \lim_{δ t \to 0} E_{L} [(\exp (- ρ \int_{t}^{t + δ t} ℓ_{τ} d τ) - 1) ψ (t + δ t, X (t + δ t)) | F_{t}] &= - E_{L} [ρ ℓ_{t} ψ (t, X (t)) δ t | F_{t}] \\ &= - ρ ℓ (t, X (t)) ψ (t, X (t)) d t \end{aligned}

The third term is as follows:

\begin{matrix} \begin{matrix} lim_{δ t \to 0} E_{L} [\int_{t}^{t + δ t} g_{s} exp (- ρ \int_{t}^{s} ℓ_{τ} d τ) d s | F_{t}] = E_{L} [g (t, X (t)) δ t | F_{t}] = g (t, X (t)) d t \end{matrix} \end{matrix}

Combining the three terms above, we have shown that

ψ (t, X (t))

satisfies the backward Kolmogorov equation for the case of the infinite dimensional stochastic system in Equation (3) of the main text. □
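A scalar analogue of (A29) admits a closed form that can be checked by finite differences. The sketch below assumes a one-dimensional state with dX = σ dW (so A = 0 and F = 0), a quadratic terminal cost, a constant running cost c, and g = 0; these choices and the parameter values are illustrative assumptions. In that setting the conditional expectation in (A27) is a Gaussian integral.

```python
import numpy as np

rho, sigma, c, T = 1.5, 0.8, 0.3, 1.0


def psi(t, x):
    """Closed form of E[exp(-rho * (X_T^2 + c * (T - t)))] for
    dX = sigma dW started at X(t) = x (Gaussian integral)."""
    s = sigma ** 2 * (T - t)
    k = 1.0 + 2.0 * rho * s
    return np.exp(-rho * c * (T - t)) * np.exp(-rho * x ** 2 / k) / np.sqrt(k)


# Central finite differences at an interior point.
t0, x0, h = 0.4, 0.7, 1e-4
psi_t = (psi(t0 + h, x0) - psi(t0 - h, x0)) / (2 * h)
psi_xx = (psi(t0, x0 + h) - 2 * psi(t0, x0) + psi(t0, x0 - h)) / h ** 2

# Scalar form of (A29) with A = 0, F = 0, g = 0, running cost l = c:
#   -psi_t = -rho * c * psi + 0.5 * sigma^2 * psi_xx
residual = -psi_t - (-rho * c * psi(t0, x0) + 0.5 * sigma ** 2 * psi_xx)
```

The residual is zero up to finite-difference error, which confirms in this simple case that the exponentiated-cost expectation solves the backward equation with the stated signs.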

Appendix F. Connections to Stochastic Dynamic Programming

In this section, we show the connections between stochastic dynamic programming and the free energy. Before proceeding, let

C_{b}^{k, n} ([0, T] \times H)

denote the space of all functions

ξ : [0, T] \times H \to R^{1}

that are k times continuously Fréchet differentiable with respect to time t and n times Gâteaux differentiable with respect to X. In addition, all their partial derivatives are continuous and bounded in

[0, T] \times H

. Furthermore, trajectories starting at

X \in E

over the time horizon

[t, T]

are denoted

X_{t, X}^{T} \equiv X (T, t, ω; X)

. Using this notation, we have that

X (t, t, ω; X) = X

. Finally, for real separable Hilbert space E, by the notation

x \otimes y

, we mean a linear bounded operator on E such that the following holds:

(x \otimes y) z = x 〈 y, z 〉, \forall x, y, z \in E .

First, we perform the exponential transformation on the function

ψ (t, X (t)) \in C_{b}^{1, 2} ([0, T] \times H)

and show that the transformed function

V (t, X (t)) \in C_{b}^{1, 2} ([0, T] \times H)

satisfies the HJB equation for the case of infinite dimensional systems [29]. This result is derived with general Q-Wiener noise with covariance operator Q, however it holds also for cylindrical Wiener noise (

Q = I

). This requires applying the Feynman–Kac lemma and deriving the backward Chapman–Kolmogorov equation for the case of infinite-dimensional stochastic systems. The backward Kolmogorov equations result in the HJB equation after a logarithmic transformation is applied. We start from the free energy and relative entropy inequality in (A5) and define the function

ψ (t, X (t)) : [0, T] \times H \to R

as follows:

ψ (t, X (t)) : = E_{L} [exp (- ρ J (X_{t, X}^{T})) | X],

which is simply the free energy as defined in Definition A5. By using the Feynman–Kac lemma we have that the function

ψ (t, X)

satisfies the backward Chapman–Kolmogorov equation specified as follows:

\begin{matrix} \begin{matrix} - \partial_{t} ψ (t, X (t)) & = - ρ ℓ (t, X (t)) ψ (t, X (t)) + 〈ψ_{X}, A X (t) + F (X (t))〉 \\ + \frac{1}{2} Tr [ψ_{X X} (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*}] . \end{matrix} \end{matrix}

(A31)

where

\partial_{t} ψ (t, X (t))

denotes the Fréchet derivative of

ψ (t, X (t))

with respect to t, and

ψ_{X}

and

ψ_{X X}

denote the first and second Gâteaux derivatives of

ψ (t, X (t))

with respect to

X (t)

. Starting with the exponential transformation we have the following:

V (t, X (t)) = - \frac{1}{ρ} {log}_{e} ψ (t, X (t)) \Rightarrow ψ (t, X (t)) = e^{- ρ V (t, X (t))} .

Next, we compute the functional derivatives

V_{X}

and

V_{X X}

as functions of the functional derivatives

ψ_{X}

and

ψ_{X X}

. This results in the following:

\begin{matrix} \begin{matrix} ρ \partial_{t} V (t, X (t)) e^{- ρ V} & = - ρ ℓ (t, X (t)) e^{- ρ V} - ρ 〈V_{X} e^{- ρ V}, A X (t) + F (X (t))〉 \\ + \frac{ρ}{2} Tr [(V_{X} \otimes V_{X}) (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e^{- ρ V}] \\ - \frac{1}{2} Tr [(V_{X X} (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e^{- ρ V}] . \end{matrix} \end{matrix}

The last equation simplifies to the following:

\begin{matrix} \begin{matrix} - \partial_{t} V (t, X (t)) & = ℓ (t, X (t)) + 〈V_{X}, A X (t) + F (X (t))〉 \\ - \frac{1}{2 ρ} Tr [(V_{X} \otimes V_{X}) (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*}] + \frac{1}{2 ρ} Tr [V_{X X} (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*}] \end{matrix} \end{matrix}

(A32)

From the definition of the trace operator

Tr [A] : = \sum_{j = 1}^{\infty} 〈 A e_{j}, e_{j} 〉

for orthonormal basis

{e_{j}}

over the domain of A, we have the following expression:

\frac{1}{2 ρ} Tr [(V_{X} \otimes V_{X}) (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*}] = \frac{1}{2 ρ} \sum_{j = 1}^{\infty} \langle (V_{X} \otimes V_{X}) (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e_{j}, e_{j} \rangle

Since

(x \otimes y) z = x 〈 y, z 〉

, we have the following:

\begin{matrix} \frac{1}{2 ρ} \sum_{j = 1}^{\infty} 〈(V_{X} \otimes V_{X}) (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e_{j}, e_{j}〉 & = \frac{1}{2 ρ} \sum_{j = 1}^{\infty} 〈V_{X} 〈V_{X}, (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e_{j}〉, e_{j}〉 \\ = \frac{1}{2 ρ} \sum_{j = 1}^{\infty} 〈V_{X}, (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} e_{j}〉 〈V_{X}, e_{j}〉 \\ = \frac{1}{2 ρ} \sum_{j = 1}^{\infty} 〈(G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} V_{X}, e_{j}〉 〈V_{X}, e_{j}〉 \\ \underset{Parseval}{=} \frac{1}{2 ρ} 〈V_{X}, (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*} V_{X}〉 \\ = \frac{1}{2 ρ} | | {(G Q^{\frac{1}{2}})}^{*} V_{X} | |_{U_{0}}^{2} \end{matrix}
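The chain of equalities above can be checked numerically in finite dimensions. The following is a minimal sketch, assuming a random vector `v` standing in for V_X and a random square matrix `C` standing in for G Q^{1/2}; both are hypothetical placeholders, and the common prefactor is omitted because it plays no role in the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
v = rng.standard_normal(d)        # hypothetical stand-in for V_X
C = rng.standard_normal((d, d))   # hypothetical stand-in for G Q^{1/2}
B = C @ C.T                       # (G Q^{1/2}) (G Q^{1/2})^*

# (x ⊗ y) z = x <y, z> corresponds to the outer product acting on z.
trace_form = np.trace(np.outer(v, v) @ B)   # Tr[(V_X ⊗ V_X) B]
inner_form = v @ B @ v                      # <V_X, B V_X>
norm_form = np.linalg.norm(C.T @ v) ** 2    # ||(G Q^{1/2})^* V_X||^2
```

All three quantities agree up to floating-point error, which mirrors the passage from the trace to the squared norm.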

Substituting back into (A32), we obtain the HJB equation for the infinite dimensional case:

\begin{matrix} \begin{matrix} - V_{t} (t, X (t)) & = ℓ (t, X (t)) + 〈V_{X}, A X (t) + F (X (t))〉 + \frac{1}{2 ρ} Tr [V_{X X} (G Q^{\frac{1}{2}}) {(G Q^{\frac{1}{2}})}^{*}] \\ - \frac{1}{2 ρ} | | {(G Q^{\frac{1}{2}})}^{*} V_{X} | |_{U_{0}}^{2} \end{matrix} \end{matrix}

In the same vein, one can also show that the relative entropy between the probability measures induced by the uncontrolled and controlled infinite dimensional systems in Equations (2) and (3) of the main text, respectively, results in an infinite dimensional quadratic control cost. This requires the use of the Radon–Nikodym derivative from our generalization of Girsanov’s theorem for the case of infinite dimensional stochastic systems in Equations (2) and (3) of the main text.

Appendix G. SPDEs under Boundary Control and Noise

Let us consider the following problem with Neumann boundary conditions:

\{\begin{matrix} Δ_{x} y (x) = λ y (x), x \in O \\ \frac{\partial}{\partial n} y (x) = γ (x), x \in \partial O \end{matrix}

(A33)

where

Δ_{x}

corresponds to the Laplacian,

λ \geq 0

is a real number,

O

is a bounded domain in

R^{d}

with regular boundary

\partial O

and

\frac{\partial}{\partial n}

denotes the normal derivative, with n being the outward unit normal vector. As shown in [29] and references therein, there exists a continuous operator

D_{N} : H^{s} (\partial O) \to H^{s + 3 / 2} (O)

such that

D_{N} γ

is the solution to (A33). Given this operator, stochastic parabolic equations with Neumann boundary conditions of the following type:

\begin{matrix} \frac{\partial h (t, x)}{\partial t} = Δ_{x} h (t, x) + f_{1} (t, h) + c_{1} (t, h) \frac{\partial w (t, x)}{\partial t}, x \in O \\ \frac{\partial h (t, x)}{\partial n} = f_{2} (t, h) + c_{2} (t, h) \frac{\partial v (t, x)}{\partial t}, x \in \partial O, \\ h (0, x) = h_{0} (x) . \end{matrix}

(A34)

can be written in the mild abstract form:

\begin{matrix} \{\begin{matrix} \begin{matrix} X (t) & = e^{t A_{N}} X_{0} + \int_{0}^{t} e^{(t - s) A_{N}} F_{1} (s, X) d s + \int_{0}^{t} e^{(t - s) A_{N}} C_{1} (s, X) d W (s) \\ + \int_{0}^{t} {(λ I - A_{N})}^{1 / 4 + ϵ} e^{(t - s) A_{N}} G_{N} F_{2} (s, X) d s \\ + \int_{0}^{t} {(λ I - A_{N})}^{1 / 4 + ϵ} e^{(t - s) A_{N}} G_{N} C_{2} (s, X) d V (s), \end{matrix} \end{matrix} \end{matrix}

(A35)

where

G_{N} : = {(λ I - A_{N})}^{3 / 4 - ϵ} D_{N}

, and the remaining terms are defined with respect to the space-time formulation of (A35). A similar expression can be obtained for Dirichlet conditions as well; however, the solution has to be investigated under weak norms, or in weighted

L^{2}

spaces. More details can be found in ([29] Appendix C) and references therein.

Appendix H. An Equivalence of the Variational Optimization Approach for SPDEs with Q-Wiener Noise

In this section, we briefly discuss how one obtains an equivalent variational optimization as in Section 3 of the main text, for control of SPDEs with Q-Wiener noise. Let the uncontrolled and controlled versions of an H-valued process be given, respectively, by the following:

\begin{matrix} d X & = (A X + F (t, X)) d t + \frac{1}{\sqrt{ρ}} \sqrt{Q} d W (t), \end{matrix}

(A36)

\begin{matrix} d \tilde{X} & = (A \tilde{X} + F (t, \tilde{X})) d t + \sqrt{Q} (U (t, \tilde{X}) d t + \frac{1}{\sqrt{ρ}} d W (t)), \end{matrix}

(A37)

with initial condition

X (0) = \tilde{X} (0) = ξ

. Here, Q is a trace-class operator, and

W \in U

is a cylindrical Wiener process. The assumption that Q is of trace class is expressed as follows:

Tr [Q] = \sum_{n = 1}^{\infty} 〈Q e_{n}, e_{n}〉 < \infty .

As opposed to the discussion following Equation (3) of the main text, in this case we do not require any contractive assumption on the operator

A

due to the nuclear property of the operator Q. The stochastic integral

\int_{0}^{t} e^{(t - s) A} \sqrt{Q} d W (s)

is well defined in this case ([2] Chapter 4.2). Define the process:

\begin{matrix} W_{Q} (t) & : = \sqrt{Q} W (t) = \sum_{n = 1}^{\infty} \sqrt{Q} e_{n} β_{n} (t) \\ = \sum_{n = 1}^{\infty} \sqrt{λ_{n}} e_{n} β_{n} (t) \end{matrix}

where the basis

{e_{n}}

satisfies the eigenvalue–eigenvector relationship

Q e_{n} = λ_{n} e_{n}

. The process

W_{Q} (t)

satisfies the properties in Definition A4, and is therefore a Q-Wiener process.
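The series defining W_Q can be sampled after truncation to finitely many modes. The following is a minimal sketch, assuming the sine basis e_j(x) = sqrt(2/a) sin(jπx/a) used in Algorithm A1 and a hypothetical summable spectrum λ_j = j^{-2}; the function name, grid, and mode count are illustrative choices rather than the settings used in the experiments.

```python
import numpy as np

def sample_q_wiener_increments(n_steps, dt, x, a, eigvals, rng):
    """Return increments dW_Q of shape (n_steps, len(x)) from a truncated expansion."""
    m = len(eigvals)
    j = np.arange(1, m + 1)
    # Sine basis e_j(x) = sqrt(2/a) sin(j pi x / a), shape (m, len(x)).
    basis = np.sqrt(2.0 / a) * np.sin(np.outer(j, x) * np.pi / a)
    # Independent Brownian increments d beta_j ~ N(0, dt), shape (n_steps, m).
    dbeta = np.sqrt(dt) * rng.standard_normal((n_steps, m))
    # dW_Q(t, x) = sum_j sqrt(lambda_j) e_j(x) d beta_j(t).
    return (dbeta * np.sqrt(eigvals)) @ basis

a = 1.0
x = np.linspace(0.0, a, 65)
eigvals = 1.0 / np.arange(1, 33) ** 2   # hypothetical trace-class spectrum
rng = np.random.default_rng(0)
dW_Q = sample_q_wiener_increments(100, 0.01, x, a, eigvals, rng)
```

Setting every entry of `eigvals` to one gives the corresponding truncation of cylindrical Wiener noise.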

The above case is an SPDE driven by Q-Wiener noise, which is quite different from the cylindrical Wiener process described in the rest of this work. In order to state Girsanov’s theorem in this case, we first define the Hilbert space

U_{0} : = \sqrt{Q} (U) \subset U

with inner product

{〈 u, v 〉}_{U_{0}} : = {〈Q^{- 1 / 2} u, Q^{- 1 / 2} v〉}_{U}

,

\forall u, v \in U_{0}

.

Theorem A3 (Girsanov).

Let Ω be a sample space with a σ-algebra

F

. Consider the following H-valued stochastic processes:

\begin{matrix} d X & = (A X + F (t, X)) d t + \frac{1}{\sqrt{ρ}} d W_{Q} (t), \end{matrix}

(A38)

\begin{matrix} d \tilde{X} & = (A \tilde{X} + F (t, \tilde{X})) d t + \sqrt{Q} U (t, \tilde{X}) d t + \frac{1}{\sqrt{ρ}} d W_{Q} (t), \end{matrix}

(A39)

where

X (0) = \tilde{X} (0) = x

and

W_{Q} \in U

is a Q-Wiener process with respect to measure

P

. Moreover, for each

Γ \in C ([0, T]; H)

, let the law of X be defined as

L (Γ) : = P (ω \in Ω | X (\cdot, ω) \in Γ)

. Similarly, the law of

\tilde{X}

is defined as

\tilde{L} (Γ) : = P (ω \in Ω | \tilde{X} (\cdot, ω) \in Γ)

. Then, we have the following:

\begin{matrix} \tilde{L} (Γ) = E_{P} [exp (\int_{0}^{T} {〈ψ (s), d W_{Q} (s)〉}_{U_{0}} - \frac{1}{2} \int_{0}^{T} {| | ψ (s) | |}_{U_{0}}^{2} d s) | X (\cdot) \in Γ], \end{matrix}

(A40)

where we have defined

ψ (t) : = \sqrt{ρ} U (t, \tilde{X} (t)) \in U_{0}

and assumed the following:

E_{P} [e^{\frac{1}{2} \int_{0}^{T} {| | ψ (t) | |}^{2} d t}] < + \infty .

(A41)

Proof.

The proof is identical to the proof of Theorem A2. □

Note that

ψ (t)

in this case is identical to

ψ (t)

in Theorem A2. As a result, despite having Q-Wiener noise, we have the same variational optimization for this case as in Section 3 of the main text.
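The change of measure can be illustrated with a scalar analogue. The following is a minimal sketch, assuming A = 0, F = 0, Q = 1, a constant control u, and X(0) = 0, so that the controlled state at time T has mean uT; all numerical values are hypothetical, and the scalar setting only mimics the structure of (A40).

```python
import numpy as np

rho, u, T = 1.0, 0.5, 1.0     # hypothetical temperature, constant control, horizon
n_samples = 200_000
rng = np.random.default_rng(0)

W_T = np.sqrt(T) * rng.standard_normal(n_samples)   # W(T) under P
X_T = W_T / np.sqrt(rho)                            # uncontrolled state at time T

psi = np.sqrt(rho) * u                              # psi = sqrt(rho) U
# Radon-Nikodym weight exp(int psi dW - 0.5 int psi^2 ds) for constant psi.
weights = np.exp(psi * W_T - 0.5 * psi**2 * T)

reweighted_mean = np.mean(X_T * weights)   # importance-sampling estimate of E[X_tilde(T)]
exact_mean = u * T                         # mean of the controlled state
```

Samples from the uncontrolled process, reweighted by the exponential term, reproduce the mean of the controlled process up to Monte Carlo error.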

Appendix I. A Comparison to Variational Optimization in Finite Dimensions

In what follows, we show how degeneracies arise for a similar derivation in finite dimensions. The stochastic dynamics are given by the following:

d X = (A X + F (t, X)) d t + G (t, X) (U (t, X) d t + \frac{1}{\sqrt{ρ}} d W (t)),

(A42)

where W(t) is a cylindrical Wiener process. Now, let the Hilbert space state vector

X (t) \in H

be approximated by a finite dimensional state vector

X (t) \approx \hat{X} (t) \in R^{d}

with arbitrary accuracy, where d is the number of grid points. In order to rewrite a finite dimensional form of (A42), the cylindrical Wiener noise term

W (t)

must be captured by a finite dimensional approximation. The expansion of

W (t)

in (A1) is restated here and truncated at m terms:

W (t) = \sum_{j = 1}^{\infty} \sqrt{λ_{j}} β_{j} (t) e_{j} = \sum_{j = 1}^{\infty} β_{j} (t) e_{j} \approx \sum_{j = 1}^{m} β_{j} (t) e_{j}

(A43)

where

λ_{j} = 1

,

\forall j \in N

in the case of cylindrical Wiener noise, and

β_{j} (t)

is a standard Wiener process on

R

. The stochastic dynamics in (A42) become a finite set of SDEs:

d \hat{X} = (A \hat{X} + F (t, \hat{X})) d t + G (t, \hat{X}) (M u (t; θ) d t + \frac{1}{\sqrt{ρ}} R d β (t))

(A44)

The terms

A

,

F

, and

G

are matrices associated with the Hilbert space operators

A

, F, and G respectively. The matrix

M

has dimensionality

M \in R^{d \times k}

, where k is the number of actuators placed in the field. The vector

d β \in R^{m}

collects the Wiener noise terms in the expansion (A43), and the matrix

R

collects finite dimensional basis vectors from (A43). As noted in the main paper, the dimensionality of

R

is

R \in R^{d \times m}

. The degeneracy arises when

d > m

for the case of cylindrical noise. For the case of Q-Wiener noise, degeneracy may arise even when

d \leq m

, provided that Rank

(R) < d

. In both cases, the issue of degeneracy prohibits the use of the Girsanov theorem for the importance sampling steps, due to the lack of invertibility of

R

. With respect to the approach relying on Gaussian densities, the derivation would require the following time discretization of the reduced order model in (A44):

\begin{matrix} \hat{X} (t + Δ t) & = \hat{X} (t) + \int_{t}^{t + Δ t} (A \hat{X} + F (t, \hat{X})) d t + \int_{t}^{t + Δ t} G (t, \hat{X}) (M u (t; θ) d t + \frac{1}{\sqrt{ρ}} R d β (t)) \end{matrix}

(A45)

\begin{matrix} \approx \hat{X} (t) + (A \hat{X} + F (t, \hat{X})) Δ t + G (t, \hat{X}) (M u (t; θ) Δ t + \frac{1}{\sqrt{ρ}} R d β (t)) \end{matrix}

(A46)

Without loss of generality, we simplify the expression above by assuming

G (t, \hat{X}) = I_{d \times d}

. The transition probability takes the following form:

\begin{matrix} p (\hat{X} (t + Δ t) | \hat{X} (t)) \\ = \frac{1}{{(\sqrt{2 π})}^{d} {(det Σ_{\hat{X}})}^{\frac{1}{2}}} exp (- \frac{1}{2} {(\hat{X} (t + Δ t) - μ_{\hat{X}} (t + Δ t))}^{⊤} Σ_{\hat{X}}^{- 1} (\hat{X} (t + Δ t) - μ_{\hat{X}} (t + Δ t))) \end{matrix}

(A47)

where the term

μ_{\hat{X}} (t + Δ t)

is the mean and

Σ_{\hat{X}}

is the covariance, defined as follows:

\begin{matrix} μ_{\hat{X}} (t + Δ t) & = \hat{X} (t) + (A \hat{X} + F (t, \hat{X})) Δ t + M u (t; θ) Δ t \end{matrix}

(A48)

\begin{matrix} Σ_{\hat{X}} & = \frac{1}{ρ} R R^{T} Δ t \end{matrix}

(A49)

The existence of the transition probability densities requires invertibility of

R R^{T}

, which is not possible when

d > m

or when

Rank (R) < d

for

d \leq m

.
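Because R has d rows and m columns, the d-by-d matrix R R^T has rank at most min(d, m). The following is a minimal sketch of the resulting degeneracy, assuming random Gaussian matrices as hypothetical stand-ins for R and illustrative values of ρ and Δt.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, dt = 1.0, 0.01   # hypothetical temperature and time step

def covariance(d, m):
    """Sigma = (1 / rho) R R^T dt for a random d-by-m matrix R."""
    R = rng.standard_normal((d, m))   # hypothetical stand-in for the basis matrix
    return R @ R.T * dt / rho

sigma_degenerate = covariance(d=8, m=3)   # more grid points than noise modes
sigma_full = covariance(d=3, m=8)         # at least as many noise modes as grid points

rank_degenerate = np.linalg.matrix_rank(sigma_degenerate)
rank_full = np.linalg.matrix_rank(sigma_full)
```

In the first case the covariance is an 8-by-8 matrix of rank 3, so its inverse and the Gaussian density in (A47) do not exist; in the second case the covariance has full rank.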

Appendix J. Algorithms for Open Loop and Model Predictive Infinite Dimensional Controllers

The following algorithms use equations derived in [42] for finite difference approximation of semi-linear SPDEs for Dirichlet and Neumann Boundary conditions. Spatial discretization is done as follows: pick a number of coordinate-wise discretization points J on the coordinate-wise domain

D = [a, b] \subset R

such that each spatial coordinate is discretized as

x_{k} = a + k \frac{b - a}{J}

where

k = 0, 1, 2, \dots, J

. For our experiments, the function that specifies how actuation is implemented by the infinite dimensional control is of the following form:

m_{l} (x_{k}; θ) = exp [\frac{- 1}{2 σ_{l}^{2}} {(x_{k} - μ_{l})}^{2}], l = 1, \dots, N

(A50)

where

μ_{l}

denotes the spatial position of the actuator on

[a, b]

and

σ_{l}

controls the influence of the actuator on nearby positions.
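The grid and the actuation function (A50) translate directly into array operations. The following is a minimal sketch, assuming a hypothetical domain [0, 2], three hypothetical actuators, and illustrative names such as `actuation_matrix` and `M_tilde`; the last two lines mirror the product U(t) = u(t)^T M̃ used in Algorithm A1.

```python
import numpy as np

def actuation_matrix(x, mu, sigma):
    """Rows index actuators l, columns index grid points k: m_l(x_k) as in (A50)."""
    diff = x[None, :] - mu[:, None]
    return np.exp(-0.5 * diff**2 / sigma[:, None] ** 2)

a_left, b_right, J = 0.0, 2.0, 128                       # hypothetical [a, b] and J
x = a_left + np.arange(J + 1) * (b_right - a_left) / J   # x_k for k = 0, ..., J
mu = np.array([0.2, 0.5, 0.8]) * b_right                 # hypothetical actuator centers
sigma = np.full(3, 0.1 * b_right)                        # hypothetical actuator widths
M_tilde = actuation_matrix(x, mu, sigma)                 # shape (N, J + 1)

u_t = np.array([1.0, -0.5, 2.0])   # one control value per actuator at time t
U_t = u_t @ M_tilde                # control applied at each grid point
```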

Next, we provide two algorithms for infinite dimensional stochastic control. In particular, Algorithm A1 is for open-loop trajectory optimization and Algorithm A2 is for model predictive control that uses implicit feedback.

Algorithm A1 Open-loop infinite dimensional controller.

1: Function: u = OptimizeControl(Time horizon (T), number of optimization iterations (I), number of trajectory samples per optimization iteration (R), initial field profile ( $X_{0}$ ), number of actuators (N), initial control sequences ( $u_{T \times N}$ ) for each actuator, temperature parameter (ρ), time discretization ( $Δ t$ ), actuator centers and variance parameters (θ))
2: for $i = 1$ to $I$ do
3:   Initialize $X \leftarrow X_{0}$
4:   for $r = 1$ to $R$ do
5:     for $t = 1$ to $T$ do
6:       Sample noise, $d W (t, x_{k}) = \sum_{j = 1}^{J} (e_{j} (t, x_{k}) β_{j} (t))$ , $e_{j} = \sqrt{2 / a} \sin (j π x / a)$ for $x \in L^{2} (0, a)$
7:       Compute entries of the actuation matrix $\tilde{M}$ by (A50)
8:       Compute the control actions applied to each grid point, $U (t) = u {(t)}^{T} \tilde{M}$
9:       Propagate the discretized field $X (t)$ ([42], Algorithm 10.8)
10:     end for
11:     Compute trajectory cost $J_{r}^{(i)}$ via Equation (18) of the main text
12:   end for
13:   Compute exponential weight of each trajectory $J_{r}^{(i)} : = exp (- ρ J_{r}^{(i)} (X))$
14:   Compute the normalizer $J_{m}^{(i)} = \frac{1}{R} \sum_{r = 1}^{R} J_{r}^{(i)}$
15:   Update nominal control sequence by Equation (17) of the main text
16: end for
17: Return: u
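Steps 13 and 14 of Algorithm A1 exponentiate the trajectory costs and normalize them. The following is a minimal sketch, assuming hypothetical cost values and a hypothetical helper `normalized_weights`; it subtracts the minimum cost before exponentiating, which is a numerical safeguard added here and not a step stated in the algorithm. The weights are normalized to sum to one, which differs from dividing by the mean in step 14 only by the constant factor R. How the weights enter the update of Equation (17) is not reproduced here.

```python
import numpy as np

def normalized_weights(costs, rho):
    """Return exp(-rho * J_r) / sum_r exp(-rho * J_r), computed stably."""
    costs = np.asarray(costs, dtype=float)
    shifted = costs - costs.min()    # the shift cancels after normalization
    w = np.exp(-rho * shifted)
    return w / w.sum()

costs = np.array([12000.0, 12050.0, 11950.0, 14000.0])   # hypothetical trajectory costs
weights = normalized_weights(costs, rho=0.1)
```

Without the shift, every term exp(-ρ J_r) in this example would underflow to zero and the normalization would be undefined.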

Algorithm A2 Model predictive infinite dimensional controller.

1: Inputs: MPC time horizon (T), number of optimization iterations (I), number of trajectory samples per optimization iteration (R), initial profile ( $X_{0}$ ), number of actuators (N), initial control sequences ( $u_{T \times N}$ ) for each actuator, temperature parameter ( $ρ$ ), time discretization ( $Δ t$ ), actuator centers and variance parameters ( $θ$ ), total simulation time ( $T_{sim}$ )
2: for $t_{sim} = 1$ to $T_{sim}$ do
3:   $u_{I} (t_{sim}) =$ OptimizeControl $(T, I, R, X_{0}, N, u, ρ, Δ t, θ)$
4:   Apply $u_{I} (t = 1)$ and propagate the discretized field to $t_{sim} + 1$
5:   Update the initial field profile $X_{0} \leftarrow X (t_{sim} + 1)$
6:   Update initial control sequence $u = [u_{I} [2 : T, :]; u_{I} [T, :]]$
7: end for
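The warm start in step 6 of Algorithm A2 shifts the optimized control sequence by one time step and repeats its final row. The following is a minimal sketch, assuming a control array of shape (T, N) with hypothetical values; `shift_controls` is an illustrative name.

```python
import numpy as np

def shift_controls(u):
    """Drop the first row of a (T, N) control array and repeat the last row."""
    return np.vstack([u[1:], u[-1:]])

T, N = 5, 2
u_opt = np.arange(T * N, dtype=float).reshape(T, N)   # hypothetical optimized sequence
u_applied = u_opt[0]             # control applied to the system at this MPC step
u_init = shift_controls(u_opt)   # initial guess for the next call to OptimizeControl
```

The shifted array keeps the horizon length T fixed, so the next optimization starts from the unused tail of the previous solution.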

For MATLAB pseudo-code on sampling space-time noise (step 6 in Algorithm A1 and step 7 in Algorithm A2), refer to ([42] Algorithms 10.1 and 10.2). Note, however, that our experiments used cylindrical Wiener noise, so

λ_{j} = 1

\forall j = 1, \dots, J

.

Appendix K. Brief Description of Each Experiment

The following is additional information about the experiments referenced in Section V. Appendix K.1 describes boundary and distributed control experiments, while Appendix K.2 and Appendix K.3 describe experiments for distributed control only.

Appendix K.1. Heat SPDE

The 2D stochastic Heat PDE with homogeneous Dirichlet boundary conditions is given by the following:

\begin{matrix} h_{t} (t, x, y) & = ϵ h_{x x} (t, x, y) + ϵ h_{y y} (t, x, y) + σ d W (t), \\ h (t, 0, y) & = h (t, a, y) = h (t, x, 0) = h (t, x, a) = 0, \\ h (0, x, y) & \sim N (h_{0}; 0, σ_{0}), \end{matrix}

(A51)

where the parameter

ϵ

is the so-called thermal diffusivity, which governs how quickly the initial temperature profile diffuses across the spatial domain. Equation (A51) considers the scenario of controlling a metallic plate to a desired temperature profile, using five actuators distributed across the plate. The edges of the plate are always held at a constant temperature of 0 degrees Celsius. The parameter a is the length of the sides of the square plate for which we use

a = 0.5

m.

The actuator dynamics are modeled by Gaussian-like exponential functions with the means co-located with the actuator locations at:

μ = [μ_{1}, μ_{2}, μ_{3}, μ_{4}, μ_{5}] = [(0.2 a, 0.5 a), (0.5 a, 0.2 a),

(0.5 a, 0.5 a), (0.5 a, 0.8 a), (0.8 a, 0.5 a)]

and the variance of the effect of each actuator on nearby field states given by

σ_{l}^{2} = {(0.1 a)}^{2}

,

\forall l = 1, \dots, 5

. For every

j = 1, \dots, J

, and

l = 1, \dots, N

, the resulting

m_{l} (x)

has the following form:

m_{l, j} ([\begin{matrix} x \\ y \end{matrix}]) = exp \{- \frac{1}{2} {([\begin{matrix} x \\ y \end{matrix}] - [\begin{matrix} μ_{l, x} \\ μ_{l, y} \end{matrix}])}^{⊤} {[\begin{matrix} σ_{l}^{2} & 0 \\ 0 & σ_{l}^{2} \end{matrix}]}^{- 1} ([\begin{matrix} x \\ y \end{matrix}] - [\begin{matrix} μ_{l, x} \\ μ_{l, y} \end{matrix}])\}

The spatial domain is discretized by dividing the x and y domains into 64 points each creating a grid of

64 \times 64

spatial locations on the plate surface. For our experiments, we use a semi-implicit forward Euler discretization scheme for time and central difference for the 2nd order spatial derivatives

h_{x x}

and

h_{y y}

. We used the following parameter values: time discretization

Δ t = 0.01

s, MPC time horizon

T = 0.05

s, total simulation time

T_{sim} = 1.0

s, thermal diffusivity

ϵ = 1.0

and initialization standard deviation

σ_{0} = 0.5

. The cost function considered for the experiments was defined as follows:

J : = \sum_{t} \sum_{x} \sum_{y} κ {(h_{actual} (t, x, y) - h_{desired} (t, x, y))}^{2} \cdot 1_{S} (x, y)

where

S : = \cup_{i = 1}^{5} S_{i}

and the indicator function

1_{S} (x, y)

is defined as follows:

1_{S} (x, y) : = \{\begin{matrix} 1, if (x, y) \in S \\ 0, otherwise \end{matrix}

(A52)

where

\begin{matrix} S_{1} & = \{(x, y) ∣ x \in [0.48 a, 0.52 a] \text{ and } y \in [0.48 a, 0.52 a]\} \text{ is the central region}, \\ S_{2} & = \{(x, y) ∣ x \in [0.18 a, 0.22 a] \text{ and } y \in [0.48 a, 0.52 a]\} \text{ is the left-mid region}, \\ S_{3} & = \{(x, y) ∣ x \in [0.78 a, 0.82 a] \text{ and } y \in [0.48 a, 0.52 a]\} \text{ is the right-mid region}, \\ S_{4} & = \{(x, y) ∣ x \in [0.48 a, 0.52 a] \text{ and } y \in [0.18 a, 0.22 a]\} \text{ is the top-central region}, \\ S_{5} & = \{(x, y) ∣ x \in [0.48 a, 0.52 a] \text{ and } y \in [0.78 a, 0.82 a]\} \text{ is the bottom-central region} . \end{matrix}

In addition,

h_{desired} (t, x, y) = 0.5^{\circ} C

for

(x, y) \in S_{1}

and

h_{desired} (t, x, y) = 1.0^{\circ} C

for

(x, y) \in \cup_{i = 2}^{5} S_{i}

and the scaling parameter

κ = 100

.
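The masked quadratic cost for the 2D heat experiment can be assembled from the regions S_1 to S_5 and the desired temperatures given above. The following is a minimal sketch, assuming a 64-by-64 grid that includes the plate boundary and a trajectory array of shape (time, x, y); the helper names `box` and `cost` are hypothetical.

```python
import numpy as np

a, n = 0.5, 64
xs = np.linspace(0.0, a, n)   # assumed grid, boundary points included
X, Y = np.meshgrid(xs, xs, indexing="ij")

def box(x_lo, x_hi, y_lo, y_hi):
    """Boolean mask of grid points in [x_lo a, x_hi a] x [y_lo a, y_hi a]."""
    return (X >= x_lo * a) & (X <= x_hi * a) & (Y >= y_lo * a) & (Y <= y_hi * a)

S1 = box(0.48, 0.52, 0.48, 0.52)
S_outer = (box(0.18, 0.22, 0.48, 0.52) | box(0.78, 0.82, 0.48, 0.52)
           | box(0.48, 0.52, 0.18, 0.22) | box(0.48, 0.52, 0.78, 0.82))
mask = S1 | S_outer   # indicator function 1_S(x, y)

h_desired = np.zeros((n, n))
h_desired[S1] = 0.5        # degrees Celsius on S_1
h_desired[S_outer] = 1.0   # degrees Celsius on S_2, ..., S_5

def cost(h_traj, kappa=100.0):
    """Sum kappa * squared error over all time steps and all grid points in S."""
    err = (h_traj - h_desired) ** 2
    return kappa * np.sum(err[:, mask])
```

Grid points outside S contribute nothing to the cost, so the controller is only penalized in the five target regions.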

In the boundary control case, we make use of the 1D stochastic heat equation given as follows:

\begin{matrix} h_{t} (t, x) & = ϵ h_{x x} (t, x) + σ d W (t) \\ h (0, x) & = h_{0} (x) \end{matrix}

For Dirichlet and Neumann boundary conditions, we have

h (t, x) = γ (x)

,

\forall x \in \partial O

and

h_{x} (t, x) = γ (x)

,

\forall x \in \partial O

, respectively. Regarding our 1D boundary control example, we set

ϵ = 1

,

σ = 0.1

,

h_{x} (t, 0) = u_{1} (t)

and

h_{x} (t, a) = u_{2} (t)

. In this case,

m_{l} (x)

is simply given by the identity function and the corresponding inner products associated with Girsanov’s theorem are given by the standard dot product. Finally, the cost function used is the same as above with

S = {x | 0 < x < a}

and the following:

h_{desired} (t, x) = \{\begin{matrix} 1, for t \in [0, 0.4], \\ 3, for t \in [0, 0.4] and t \in [0.8, 1.3] . \end{matrix}

Appendix K.2. Burgers SPDE

The 1D stochastic Burgers PDE with non-homogeneous Dirichlet boundary conditions is as follows:

\begin{matrix} h_{t} (t, x) + h h_{x} (t, x) & = ϵ h_{x x} (t, x) + σ d W (t) \\ h (t, 0) & = h (t, a) = 1.0 \\ h (0, x) & = 0, \forall x \in (0, a) \end{matrix}

(A53)

where the parameter

ϵ

is the viscosity of the medium. Equation (A53) considers a simple model of a 1D flow of a fluid in a medium with non-zero flow velocities at the two boundaries. The goal is to achieve and maintain a desired flow velocity profile at certain points along the spatial domain. As seen in the desired profile in Figure 3 of the main paper, there are three areas along the spatial domain with desired flow velocity such that the flow has to be accelerated, then decelerated, and then accelerated again while trying to overcome the stochastic forces and the dynamics governed by the Burgers PDE. Similar to the experiments for the heat SPDE, we consider actuators behaving as Gaussian-like exponential functions with the means co-located with the actuator locations at

μ = [0.2 a, 0.3 a, 0.5 a, 0.7 a, 0.8 a]

and the spatial effect (variance) of each actuator given by

σ_{l}^{2} = {(0.1 a)}^{2}

,

\forall l = 1, \dots, 5

. The parameter

a = 2.0 m

is the length of the channel along which the fluid is flowing.

This spatial domain was discretized, using a grid of 128 points. The numerical scheme used semi-implicit forward Euler discretization for time and central difference approximation for both the 1st and 2nd order derivatives in space. The 1st order derivative terms in the advection term

h h_{x}

were evaluated at the current time instant, while the 2nd order spatial derivatives in the diffusion term

h_{x x}

were evaluated at the next time instant; hence, the scheme is semi-implicit. The following are values of some other parameters used in our experiments: time discretization

Δ t = 0.01

, total simulation time =

1.0 s

, MPC time horizon =

0.1 s

, and the scaling parameter

κ = 100

. The cost function considered for the experiments was defined as follows:

J : = \sum_{t} \sum_{x} κ {(h_{actual} (t, x) - h_{desired} (t, x))}^{2} \cdot 1_{S} (x)

where the function

1_{S} (x)

is defined as in (A52) with

S = \cup_{i = 1}^{3} S_{i}

, where

S_{1} = [0.18 a, 0.22 a]

,

S_{2} = [0.48 a, 0.52 a]

, and

S_{3} = [0.78 a, 0.82 a]

. In addition,

h_{desired} (t, x) = 2.0 m / s

for

x \in S_{1} \cup S_{3}

, which is at the sides, and

h_{desired} (t, x) = 1.0 m / s

for

x \in S_{2}

which is in the central region.
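The semi-implicit scheme described above for the Burgers SPDE can be sketched as a single time step that treats the advection term explicitly and the diffusion term implicitly. The following is a minimal sketch, assuming a dense linear solve, hypothetical values for the viscosity and noise amplitude, and a sqrt(Δt/Δx) scaling for the discretized space-time noise; none of these choices is taken from the experiments. The noise amplitude is set to zero in the usage lines so that the run is deterministic.

```python
import numpy as np

def burgers_step(h, dt, dx, eps, sigma, left, right, rng):
    """One semi-implicit Euler step: explicit advection, implicit diffusion."""
    n = h.size
    # Central first difference at the current time for the advection term h h_x.
    h_x = np.zeros(n)
    h_x[1:-1] = (h[2:] - h[:-2]) / (2.0 * dx)
    # Discretized space-time white noise increment (assumed sqrt(dt / dx) scaling).
    dW = np.sqrt(dt / dx) * rng.standard_normal(n)
    rhs = h - dt * h * h_x + sigma * dW
    # Implicit diffusion: (I - dt * eps * D2) h_next = rhs at interior points.
    r = eps * dt / dx**2
    A = np.eye(n)
    idx = np.arange(1, n - 1)
    A[idx, idx] = 1.0 + 2.0 * r
    A[idx, idx - 1] = -r
    A[idx, idx + 1] = -r
    # Dirichlet rows stay as identity rows; boundary values are imposed directly.
    rhs[0], rhs[-1] = left, right
    return np.linalg.solve(A, rhs)

a, n = 2.0, 128
x = np.linspace(0.0, a, n)
dx = x[1] - x[0]
rng = np.random.default_rng(0)
h = np.zeros(n)
h[0] = h[-1] = 1.0
for _ in range(100):   # 1.0 s of simulated time at dt = 0.01
    h = burgers_step(h, dt=0.01, dx=dx, eps=0.1, sigma=0.0,
                     left=1.0, right=1.0, rng=rng)
```

A tridiagonal solver would be the natural replacement for the dense solve on finer grids.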

Appendix K.3. Nagumo SPDE

The stochastic Nagumo equation with Neumann boundary conditions is as follows:

\begin{matrix} h_{t} (t, x) & = ϵ h_{x x} (t, x) + h (t, x) (1 - h (t, x)) (h (t, x) - α) + σ d W (t) \\ h_{x} (t, 0) & = h_{x} (t, a) = 0 \\ h (0, x) & = {(1 + exp (- \frac{2 - x}{\sqrt{2}}))}^{- 1} \end{matrix}

The parameter

α

determines the speed of a wave traveling down the length of the axon and

ϵ

the rate of diffusion. By simulating the deterministic Nagumo equation with

a = 5.0, ϵ = 1.0

and

α = - 0.5

, we observed that after about 5 s, the wave completely propagates to the end of the axon. Similar to the experiments for the heat SPDE, we considered actuators behaving as Gaussian-like exponential functions with actuator centers (mean values) at

μ = [0.2 a, 0.3 a, 0.4 a, 0.5 a, 0.6 a, 0.7 a, 0.8 a]

and the spatial effect (variance) of each actuator given by

σ_{l}^{2} = {(0.1 a)}^{2}

,

\forall l = 1, \dots, 7

. The spatial domain was discretized using a grid of 128 points. The numerical scheme used semi-implicit forward Euler discretization for time and central difference approximation for the 2nd order derivatives in space. The following are values of some other parameters used in our experiments: time discretization

Δ t = 0.01

, MPC time horizon =

0.1 s

, total simulation time =

1.5 s

for acceleration task and total simulation time =

5.0 s

for the suppression task, and the scaling parameter

κ = 10,000

. The cost function for this experiment was defined as follows:

J = \sum_{t} \sum_{x} κ {(h_{actual} (t, x) - h_{desired} (t, x))}^{2} \cdot 1_{S} (x)

where

h_{desired} (t, x) = 0.0 V

for the suppression task, and

h_{desired} (t, x) = 1.0 V

for the acceleration task, and the function

1_{S} (x)

is defined as in (A52) with

S = [0.7 a, 0.99 a]

.

References

1. Chow, P. Stochastic Partial Differential Equations; Taylor & Francis: Boca Raton, FL, USA, 2007.
2. Da Prato, G.; Zabczyk, J. Stochastic Equations in Infinite Dimensions; Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge, UK, 2014.
3. Mikulevicius, R.; Rozovskii, B.L. Stochastic Navier–Stokes Equations for Turbulent Flows. SIAM J. Math. Anal. 2004, 35, 1250–1310.
4. Dumont, G.; Payeur, A.; Longtin, A. A stochastic-field description of finite-size spiking neural networks. PLoS Comput. Biol. 2017, 13, e1005691.
5. Pardoux, E. Stochastic partial differential equations and filtering of diffusion processes. Stochastics 1980, 3, 127–167.
6. Bang, O.; Christiansen, P.L.; If, F.; Rasmussen, K.O.; Gaididei, Y.B. Temperature effects in a nonlinear model of monolayer Scheibe aggregates. Phys. Rev. E 1994, 49, 4627–4636.
7. Cont, R. Modeling term structure dynamics: An infinite dimensional approach. Int. J. Theor. Appl. Financ. 2005, 8, 357–380.
8. Gough, J.; Belavkin, V.P.; Smolyanov, O.G. Hamilton-Jacobi-Bellman equations for quantum optimal feedback control. J. Opt. B Quantum Semiclassical Opt. 2005, 7, S237.
9. Bouten, L.; Edwards, S.; Belavkin, V.P. Bellman equations for optimal feedback control of qubit states. J. Phys. B At. Mol. Opt. Phys. 2005, 38, 151.
10. Pontryagin, L.; Boltyanskii, V.; Gamkrelidze, R.; Mishchenko, E. The Mathematical Theory of Optimal Processes; Pergamon Press: New York, NY, USA, 1962.
11. Bellman, R.; Kalaba, R. Selected Papers On Mathematical Trends in Control Theory; Dover Publications: Mineola, NY, USA, 1964.
12. Yong, J.; Zhou, X. Stochastic Controls: Hamiltonian Systems and HJB Equations; Stochastic Modelling and Applied Probability; Springer: New York, NY, USA, 1999.
13. Lou, Y.; Hu, G.; Christofides, P.D. Model predictive control of nonlinear stochastic PDEs: Application to a sputtering process. In Proceedings of the 2009 American Control Conference, St. Louis, MO, USA, 10–12 June 2009; pp. 2476–2483.
14. Gomes, S.; Kalliadasis, S.; Papageorgiou, D.; Pavliotis, G.; Pradas, M. Controlling roughening processes in the stochastic Kuramoto-Sivashinsky equation. Phys. D Nonlinear Phenom. 2017, 348, 33–43.
15. Pardoux, E.; Rascanu, A. Stochastic Differential Equations, Backward SDEs, Partial Differential Equations; Springer: Berlin/Heidelberg, Germany, 2014; Volume 69.
16. Fleming, W.H.; Soner, H.M. Controlled Markov Processes and Viscosity Solutions, 2nd ed.; Applications of Mathematics; Springer: New York, NY, USA, 2006.
17. Exarchos, I.; Theodorou, E.A. Stochastic optimal control via forward and backward stochastic differential equations and importance sampling. Automatica 2018, 87, 159–165.
18. Williams, G.; Aldrich, A.; Theodorou, E.A. Model Predictive Path Integral Control: From Theory to Parallel Computation. J. Guid. Control. Dyn. 2017, 40, 344–357.
19. Evans, E.N.; Pereira, M.A.; Boutselis, G.I.; Theodorou, E.A. Variational Optimization Based Reinforcement Learning for Infinite Dimensional Stochastic Systems. In Proceedings of the Conference on Robot Learning, Osaka, Japan, 30 October–1 November 2019.
20. Evans, E.N.; Kendall, A.P.; Boutselis, G.I.; Theodorou, E.A. Spatio-Temporal Stochastic Optimization: Theory and Applications to Optimal Control and Co-Design. In Proceedings of the 2020 Robotics: Sciences and Systems (RSS) Conference, 12–16 July 2020.
21. Evans, E.N.; Kendall, A.P.; Theodorou, E.A. Stochastic Spatio-Temporal Optimization for Control and Co-Design of Systems in Robotics and Applied Physics. arXiv 2021, arXiv:2102.09144.
22. Bieker, K.; Peitz, S.; Brunton, S.L.; Kutz, J.N.; Dellnitz, M. Deep Model Predictive Control with Online Learning for Complex Physical Systems. arXiv 2019, arXiv:1905.10094.
23. Nair, A.G.; Yeh, C.A.; Kaiser, E.; Noack, B.R.; Brunton, S.L.; Taira, K. Cluster-based feedback control of turbulent post-stall separated flows. J. Fluid Mech. 2019, 875, 345–375.
24. Mohan, A.T.; Gaitonde, D.V. A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks. arXiv 2018, arXiv:1804.09269.
25. Morton, J.; Jameson, A.; Kochenderfer, M.J.; Witherden, F. Deep dynamical modeling and control of unsteady fluid flows. arXiv 2018, arXiv:1805.07472.
26. Rabault, J.; Kuchta, M.; Jensen, A.; Réglade, U.; Cerardi, N. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. J. Fluid Mech. 2019, 865, 281–302.
27. Curtain, R.F.; Glover, K. Robust stabilization of infinite dimensional systems by finite dimensional controllers. Syst. Control Lett. 1986, 7, 41–47.
28. Balas, M. Feedback control of flexible systems. IEEE Trans. Autom. Control 1978, 23, 673–679.
29. Fabbri, G.; Gozzi, F.; Swiech, A. Stochastic Optimal Control in Infinite Dimensions—Dynamic Programming and HJB Equations; Number 82 in Probability Theory and Stochastic Modelling; Springer: Berlin/Heidelberg, Germany, 2017.
30. Da Prato, G.; Debussche, A. Control of the Stochastic Burgers Model of Turbulence. SIAM J. Control Optim. 1999, 37, 1123–1149.
31. Feng, J. Large deviation for diffusions and Hamilton-Jacobi equation in Hilbert spaces. Ann. Probab. 2006, 34, 321–385.
32. Theodorou, E.A.; Boutselis, G.I.; Bakshi, K. Linearly Solvable Stochastic Optimal Control for Infinite-Dimensional Systems. In Proceedings of the 2018 IEEE Conference on Decision and Control (CDC), Miami, FL, USA, 17–19 December 2018; IEEE: New York, NY, USA, 2018; pp. 4110–4116.
33. Todorov, E. Efficient computation of optimal actions. Proc. Natl. Acad. Sci. USA 2009, 106, 11478–11483.
34. Theodorou, E.; Todorov, E. Relative entropy and free energy dualities: Connections to Path Integral and KL control. In Proceedings of the IEEE Conference on Decision and Control, Maui, HI, USA, 10–13 December 2012; pp. 1466–1473.
35. Theodorou, E.A. Nonlinear Stochastic Control and Information Theoretic Dualities: Connections, Interdependencies and Thermodynamic Interpretations. Entropy 2015, 17, 3352.
36. Kappen, H.J. Path integrals and symmetry breaking for optimal control theory. J. Stat. Mech. Theory Exp. 2005, 11, P11011.
37. Maslowski, B. Stability of semilinear equations with boundary and pointwise noise. Ann. Della Sc. Norm. Super. Pisa Cl. Sci. 1995, 22, 55–93.
38. Debussche, A.; Fuhrman, M.; Tessitore, G. Optimal control of a stochastic heat equation with boundary-noise and boundary-control. ESAIM Control. Optim. Calc. Var. 2007, 13, 178–205.
39. Kappen, H.J.; Ruiz, H.C. Adaptive Importance Sampling for Control and Inference. J. Stat. Phys. 2016, 162, 1244–1266.
40. Da Prato, G.; Debussche, A.; Temam, R. Stochastic Burgers’ equation. Nonlinear Differ. Equ. Appl. NoDEA 1994, 1, 389–402.
41. Jeng, D.T. Forced model equation for turbulence. Phys. Fluids 1969, 12, 2006–2010.
42. Lord, G.J.; Powell, C.E.; Shardlow, T. An Introduction to Computational Stochastic PDEs; Cambridge Texts in Applied Mathematics; Cambridge University Press: Cambridge, UK, 2014.
43. Duncan, T.E.; Maslowski, B.; Pasik-Duncan, B. Ergodic boundary/point control of stochastic semilinear systems. SIAM J. Control Optim. 1998, 36, 1020–1047.

Figure 1. Connection between the free energy-relative entropy approach and stochastic Bellman principle of optimality.

Figure 2. Overview of architecture for the control of spatio-temporal stochastic systems, where

d W_{j}^{r}

denotes a cylindrical Wiener process at time step j for simulated system rollout r. See Equations (17) and (18) and the accompanying discussion for details. Although the rollout images appear pictorially similar, they represent different realizations of the noise process

d W_{t}

.

Figure 3. Infinite dimensional control of the 1D Burgers SPDE: (top) Velocity profiles averaged over the 2nd half of each time horizon over 128 trials. (bottom left) Spatio-temporal evolution of the uncontrolled 1D Burgers SPDE with cylindrical Wiener process noise. (bottom right) Spatio-temporal evolution of 1D Burgers SPDE, using MPC.

Figure 4. Infinite dimensional control of the Nagumo SPDE—acceleration task: (top) voltage profiles averaged over the 2nd half of each time horizon over 128 trials, (bottom left) uncontrolled spatio-temporal evolution for 5.0 s, and (bottom right) accelerated activity with MPC within 1.5 s.

Figure 5. Infinite dimensional control of the Nagumo SPDE—suppression task: (top) voltage profiles averaged over the 2nd half of each time horizon over 128 trials, (bottom left) uncontrolled spatio-temporal evolution for 5.0 s, and (bottom right) suppressed activity with MPC for 5.0 s.

Figure 6. Infinite dimensional control of the 2D heat SPDE under homogeneous Dirichlet boundary conditions: (first) desired temperature values at specified spatial regions, (second) random initial temperature profile, (third) temperature profile halfway through the experiment and (fourth) temperature profile at the end of the experiment.

Figure 7. Boundary control of the stochastic 1D heat equation: (left) temperature profile over the 1D spatial domain over time. The magenta surface corresponds to the desired spatio-temporal temperature profile. Redder colors correspond to higher temperatures, and more violet colors correspond to lower temperatures. (right) Control inputs at the left boundary in black and the right boundary in green, entering through Neumann boundary conditions.

Table 1. Examples of commonly known semi-linear PDEs in a fields representation with subscript x representing partial derivative with respect to spatial dimensions and subscript t representing partial derivatives with respect to time. The associated operators $A$ and $F (t, X)$ in the Hilbert space formulation are colored blue and violet, respectively.

Equation Name	Partial Differential Equation	Field State
Nagumo	$u_{t} = ϵ u_{x x} + u (1 - u) (u - α)$	Voltage
Heat	$u_{t} = ϵ u_{x x}$	Heat/temperature
Burgers (viscous)	$u_{t} = ϵ u_{x x} - u u_{x}$	Velocity
Allen–Cahn	$u_{t} = ϵ u_{x x} + u - u^{3}$	Phase of a material
Navier–Stokes	$u_{t} = ϵ Δ u - \nabla p - (u \cdot \nabla) u$	Velocity
Nonlinear Schrödinger	$u_{t} = \frac{1}{2} i u_{x x} + i \lvert u \rvert^{2} u$	Wave function
Korteweg–de Vries	$u_{t} = - u_{x x x} - 6 u u_{x}$	Plasma wave
Kuramoto–Sivashinsky	$u_{t} = - u_{x x x x} - u_{x x} - u u_{x}$	Flame front
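
Because the color coding mentioned in the caption does not survive in plain text, the split between the two operators can be illustrated on the Nagumo row. The following is a sketch that assumes the usual semi-linear convention, in which $A$ collects the linear differential term and $F$ the remaining nonlinear terms; writing the field as $X$ in place of $u$ is notation adopted here for illustration only.

$$
u_{t} = \underbrace{ϵ u_{x x}}_{A X} + \underbrace{u (1 - u) (u - α)}_{F (t, X)}, \qquad A = ϵ \, \partial_{x x}, \qquad F (t, X) = X (1 - X) (X - α)
$$

Under the same assumption, the heat equation corresponds to $F (t, X) = 0$, and the viscous Burgers equation to $A = ϵ \, \partial_{x x}$ with $F (t, X) = - X X_{x}$.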

Table 2. Summary of Monte Carlo trials for the stochastic viscous Burgers equation.

	RMSE			Average $σ$
Targets	Left	Center	Right	Left	Center	Right
MPC	0.0344	0.0156	0.0132	0.0309	0.0718	0.0386
Open-loop	0.0820	0.1006	0.0632	0.0846	0.0696	0.0797

Table 3. Summary of Monte Carlo trials for Nagumo acceleration and suppression tasks.

Task	Acceleration		Suppression
Paradigm	MPC	Open-Loop	MPC	Open-Loop
RMSE	$6.605 \times 10^{-4}$	0.0042	0.0021	0.0048
Avg. $σ$	0.0059	0.0197	0.0046	0.0389

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
