Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime

Li, Qin; Newton, Kit; Wang, Li

doi:10.3390/computation10020015

Open AccessArticle

Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime

by

Qin Li

¹

,

Kit Newton

²

and

Li Wang

^3,*

¹

Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA

²

Department of Mathematics, Diablo Valley College, Pleasant Hill, CA 94523, USA

³

School of Mathematics, University of Minnesota, Minneapolis, MN 55455, USA

^*

Author to whom correspondence should be addressed.

Computation 2022, 10(2), 15; https://doi.org/10.3390/computation10020015

Submission received: 14 December 2021 / Revised: 7 January 2022 / Accepted: 13 January 2022 / Published: 19 January 2022

(This article belongs to the Special Issue Inverse Problems with Partial Data)

Download

Browse Figures

Versions Notes

Abstract

:

For the inverse problem in physical models, one measures the solution and infers the model parameters using information from the collected data. Oftentimes, these data are inadequate and render the inverse problem ill-posed. We study the ill-posedness in the context of optical imaging, which is a medical imaging technique that uses light to probe (bio-)tissue structure. Depending on the intensity of the light, the forward problem can be described by different types of equations. High-energy light scatters very little, and one uses the radiative transfer equation (RTE) as the model; low-energy light scatters frequently, so the diffusion equation (DE) suffices to be a good approximation. A multiscale approximation links the hyperbolic-type RTE with the parabolic-type DE. The inverse problems for the two equations have a multiscale passage as well, so one expects that as the energy of the photons diminishes, the inverse problem changes from well- to ill-posed. We study this stability deterioration using the Bayesian inference. In particular, we use the Kullback–Leibler divergence between the prior distribution and the posterior distribution based on the RTE to prove that the information gain from the measurement vanishes as the energy of the photons decreases, so that the inverse problem is ill-posed in the diffusive regime. In the linearized setting, we also show that the mean square error of the posterior distribution increases as we approach the diffusive regime.

Keywords:

inverse problems; Bayesian approach; stability deterioration; multiscale modeling; asymptotic analysis

1. Introduction

Optical tomography is a well-defined inverse problem. In the lab, laser beams with high-energy photons are injected into bio-tissues to detect the interior optical property [1]. This helps identify unhealthy bio-tissues for treatment. Mathematically, this amounts to reconstructing the optical coefficients, such as the scattering and the absorption coefficients in the radiative transfer equation (RTE), which is a model equation that describes the propagation of photon particles [2]. The equation, in its simplest form, reads

\partial_{t} f + \frac{1}{ϵ} v \cdot \nabla_{x} f = \frac{1}{ϵ^{2}} Q [f] .

(1)

The unknown

f (t, x, v)

describes the number of photon particles at time

t \in R^{+}

, location

x \in Ω \subseteq R^{d}

, and traveling with velocity

v \in S^{d - 1}

. The dimension of the problem is

d = 2, 3

. v is the travel direction of the particles and therefore belongs to a unit ball. The two terms in the equation represent the transport and the scattering effect, respectively. The transport term characterizes

\dot{x} = v

, while the scattering term

Q [f]

describes the way the photon particles interact with the media. When the temperature is fixed, this operator is a linear operator, whereas if the laser beam heats up the tissue,

Q

reflects the nonlinear dependence on the temperature. This is the term that encodes the optical property of the media.

In the steady state, the

\partial_{t} f

term is dropped, and the equation balances the transport term and the scattering term. The equation is well-posed if equipped with Dirichlet-type boundary condition [3,4,5,6]:

{f |}_{Γ_{-}} = ϕ,

(2)

where

Γ_{\pm}

collects the coordinates on the physical boundary

\partial Ω

with the velocity either pointing in or out of the domain:

Γ_{\pm} = {(x, v) : x \in \partial Ω, \pm v \cdot n_{x} > 0} = \cup_{x \in \partial Ω} Γ_{\pm} (x), where Γ_{\pm} (x) = {v : \pm v \cdot n_{x} > 0} .

In practice, the laser light is shined into the domain, meaning that

ϕ

is prescribed. Then, one detects the number of photons scattered out of the domain by measuring

{f |}_{Γ_{+}}

. We term this operator the albedo operator:

A_{Q} : f {|_{Γ_{-}} \to f |}_{Γ_{+}},

where the subindex

Q

reflects the influence of

Q

. Therefore, optical tomography amounts to reconstructing the coefficients in

Q

using the information in the operator

A_{Q}

. The general well-posedness theory of such an inverse problem was addressed in the pioneering papers [7,8]. The result on the stability was established in [9,10,11], and see [12] for a review.

One key parameter in Equation (1) is

ϵ

, which is the Knudsen number. It describes the regime the system is in. By definition, the Knudsen number is the ratio of the mean-free path and the domain length. Physically, if a photon has low energy (visible or near-infrared light), it travels only a short distance before being scattered, and the mean-free path is short compared to the domain length, leading to a small

ϵ

. When this happens, one typically observes a diffused light phenomenon, and the received images are blurred. The situation is termed the diffusion effect, and in this regime, RTE, either linear or nonlinear, can be asymptotically approximated by a diffusive equation that characterizes macroscopic quantities such as the density

ρ (x) = \int f (x, v) d v

. Depending on the form of

Q

, this limiting equation for

ρ

is accordingly adjusted. Details regarding the diffusion limit can be found in [13,14].

Intuitively, as one decreases the photon energy, the received picture loses its crisp, and the reconstruction becomes more unstable. The perturbation observed in the measurement is enlarged in the reconstructed coefficients. This phenomenon has been been numerically observed in [1,15,16] and proved rigorously in [17], in which the authors investigate the stability deterioration of the inverse problem as

ϵ \to 0

. However, all these results heavily depend on the theoretical framework where it is assumed that one has the access to the full map

A

. This amounts to sending in all kinds of incoming data

ϕ

and taking the measurement over the entire

Γ_{+}

.

These theoretical results are helpful in building the foundation for understanding the stability deterioration, but the setup is infeasible numerically or practically. In the lab, a finite number of incoming configurations can be taken, and the detectors can measure outgoing photons in a finite number of locations. Therefore, new theory needs to be developed to account for this real situation [18,19]. To put the problem in a mathematical framework, we denote

{ϕ_{j}}_{j = 1}^{J}

as the incoming data and

{x_{k}}_{k = 1}^{K}

as the location of the detectors; then, the measurements are

G = {G_{j k}}_{j = 1, k = 1}^{J, K}, with G_{j k} = \int_{v \in Γ_{+} (x_{k})} A_{Q} [ϕ_{j}] | v \cdot n_{x_{k}} | d v .

Equivalently, denoting

f_{j}

as the solution to (1) with the boundary condition in (2) being

ϕ_{j}

, then:

G_{j k} = \int_{v \in Γ_{+} (x_{k})} f_{j} (x_{k}, v) | v \cdot n_{x_{k}} | d v .

With the finite amount of data points, to find the coefficients in

Q

, one can either adopt the PDE-constrained minimization algorithms [20,21,22] or employ the Bayesian inference [23,24,25]. While this finite setting greatly affects the well-posedness argument [26,27,28], the physical intuition still holds true in the sense that the reconstruction is expected to become more and more unstable as

ϵ

diminishes.

In this paper, we quantify the stability deterioration in the Bayesian inference framework with a finite number of data points. To quantify “stability”, we propose two measures: one is a global measure that evaluates the information gain by comparing the posterior and prior distributions, and the other is a local measure that characterizes the “flatness” of the posterior distribution around the MAP (maximum a posteriori) point. More particularly, at the global level, we measure the difference between the posterior and prior distributions using the KL (Kullback–Leibler) divergence. Since the posterior distribution of the coefficients takes both the prior knowledge and the measured data into account, this divergence essentially characterizes by how much the data are driving the final guess away from the prior guess. At the local level, we estimate the Hessian of the posterior distribution around the MAP point, which essentially describes the uncertainty level of the optimizer. For both measures, we analyze their dependence on

ϵ

, and we reveal the stability relation between two regimes, the transport regime and diffusive regime, in a more rigorous way. We will present our theory through the lens of both linear and nonlinear RTEs, but we shall mention that other multiscale kinetic models may have such a relation in the inverse setting; see for instance in [29,30,31].

The rest of the paper is organized as follows. In the next section, we recall some formulation in the Bayesian inference problem and demonstrate a general relation in the KL divergence between prior and posterior distribution, through which we will address the stability deterioration from the global point of view. In Section 3, we consider the inverse problem for the linear radiative transfer equation, and we show that the KL divergence is of order

ϵ

, which diminishes as we approach the diffusive regime. We extend the investigation to the nonlinear RTE in Section 4 and obtain similar results. In Section 5, we focus on the local viewpoint by estimating the second derivative of the parameter-to-measurement map, and we show that it decreases at the order of

ϵ

, indicating that the posterior distribution becomes flatter near the diffusive regime and therefore is less sensitive to the measurement data, rendering the inverse problem more unstable. We summarize our theoretical results and provide some numerical evidence in Section 6. A final conclusion of the paper is drawn in Section 7.

All the discussion in these sections finally achieve the following aim of the paper: to demonstrate, in the Bayesian inference framework, the stability deterioration of optical imaging when the impinging light uses low-energy photons.

2. Bayesian Formulation Basics

Bayesian inference is one of the most popular numerical methods for inferring unknown parameters. In this section, we give a quick overview of definitions to be used in this paper.

To start, we define the parameter to measurement map

G : B_{σ} \mapsto R^{q}

and denote

η

the measurement error:

d = G (σ) + η,

where the measurement

d \in R^{q}

, the measurement error

η \in R^{d}

, and

σ \in B \subset B_{σ}

, the admissible set in a pre-defined Banach space

B_{σ}

:

B = {σ \in B_{σ} : ∥ G (σ) ∥_{L_{\infty} (Ω)} < C_{B}} .

(3)

Note that

B_{σ}

can be a function space specified in (10), and

σ

is a function in this space. Throughout the paper, we assume that

η

is an additive noise generated by a Gaussian distribution

N (0, Γ_{η})

, and

Γ_{η}

is a

q \times q

dimensional matrix.

In Bayesian inference, one needs to prepare a prior distribution for

σ

. Denote

F

as the

σ

-algebra of B and

μ_{0}

as the prior probability measure on B. According to the Bayes’ rule, the posterior distribution

μ_{post} = μ (σ | d)

is given according to its Radon–Nikodym derivative with respect to

d μ_{0}

:

\frac{d μ_{post}}{d μ_{0}} \propto \frac{1}{Z} exp (- \frac{1}{2} {∥ G (σ) - d ∥}_{Γ_{η}}^{2}),

(4)

where Z is the normalization constant:

Z = \int_{B} exp (- \frac{1}{2} {∥ G (σ) - d ∥}_{Γ_{η}}^{2}) d μ_{0} (σ),

and the mismatch is weighted by

Γ_{η}

{∥ G (σ) - d ∥}_{Γ_{η}}^{2} = {(G (σ) - d)}^{T} Γ_{η}^{- 1} (G (σ) - d) .

Suppose

μ_{1, 2}

are two distinct probability measures; then, the KL divergence measures the distance between them:

D_{KL} (μ_{1} | μ_{2}) = \int_{X} log (\frac{d μ_{1}}{d μ_{2}}) d μ_{1} .

when this definition is used in Bayesian inference to quantify the relative gain through the measurement process, one defines the relative entropy, and in the particular setting of (4), we have the following proposition.

Proposition 1.

Assume

σ \in B

; then, the KL divergence between

μ_{post}

and

μ_{0}

has the following estimate:

D_{K L} (μ_{0} | μ_{post}) \leq C \int_{B} \int_{B} |(G (σ^{'}) - G (σ)) Γ_{η}^{- 1} (G (σ) + G (σ^{'}) - 2 d)| d μ_{0} (σ^{'}) d μ_{0} (σ)

(5)

for some positive constant C independent of B.

Proof.

Since

μ_{0}

and

μ_{post}

are mutually absolutely continuous, according to (4), we have

\begin{matrix} D_{K L} (μ_{0} | μ_{post}) & = \int_{B} log (\frac{d μ_{0}}{d μ_{post}}) d μ_{0} \\ = \int_{B} (\frac{1}{2} {∥ G (σ) - d ∥}_{Γ_{η}}^{2} + log Z) d μ_{0} . \end{matrix}

Denoting

B (σ) = \frac{1}{2} {∥ G (σ) - d ∥}_{Γ_{η}}^{2}

, we proceed with:

\begin{matrix} D_{K L} (μ_{0} | μ_{post}) & = \int_{B} [B (σ) + log (\int_{B} e^{- B (σ^{'})} d μ_{0} (σ^{'}))] d μ_{0} (σ) \\ = \int_{B} [- log e^{- B (σ)} + log (\int_{B} e^{- B (σ^{'})} d μ_{0} (σ^{'}))] d μ_{0} (σ) \\ \leq C \int_{B} [- e^{- B (σ)} + (\int_{B} e^{- B (σ^{'})} d μ_{0} (σ^{'}))] d μ_{0} (σ) \\ \leq C \int_{B} \int_{B} | e^{- B (σ)} - e^{- B (σ^{'})} | d μ_{0} (σ^{'}) d μ_{0} (σ) \\ \leq C \int_{B} \int_{B} | B (σ) - B (σ^{'}) | d μ_{0} (σ^{'}) d μ_{0} (σ) \end{matrix}

where we used the Lipschitz continuity of

e^{x}

in a bounded interval. The constant C depends on the size of d and the boundedness of

G

. Since

σ \in B

, according to (3),

{∥ G (σ) ∥}_{\infty} < C_{B}

, making

C < \infty

. The inequality (5) holds if we plug in the expression of

B (σ)

to get:

B (σ) - B (σ^{'}) = \frac{1}{2} [G (σ^{'}) - G (σ)] Γ_{η}^{- 1} [G (σ) + G (σ^{'}) - 2 d] .

□

In view of (5), given that

Γ_{η}^{- 1}

and

G (σ) + G (σ^{'}) - 2 d

are at least bounded, the difference between

μ_{post}

and

μ_{0}

is controlled by the difference between

G (σ)

and

G (σ^{'})

. This means if

G

is a slow-varying map over the whole admissible set B, then

μ_{post}

only slightly differs from

μ_{0}

, indicating that the information gain is small. In the following sections, we justify this property for RTE in both linear and nonlinear settings in the

ϵ \to 0

regime.

3. Global View Example 1: Linear Radiative Transfer Equation

The first example we consider is the linear radiative transfer equation:

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f (x, v) = - ϵ σ_{a} (x) f (x, v) + \frac{1}{ϵ} σ_{s} (x) L f (x, v), \\ f |_{Γ_{-}} = ϕ, \end{matrix} \end{matrix}

(6)

where

x \in Ω \subseteq R^{d}

,

v \in S^{d - 1}

,

L f = \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} f d v - f

is the collision operator,

σ_{a}

is the absorption coefficient, and

σ_{s}

is the scattering coefficient. Both

σ_{s}

and

σ_{a}

here are seen as functions supported on

Ω

. The boundary

Γ_{\pm} = {(x, v) \in Ω \times S^{d - 1} | \pm v \cdot n_{x} > 0}

denotes the incoming (

Γ_{-}

) and outgoing (

Γ_{+}

) boundaries respectively, and thus, incoming boundary conditions are considered here.

In the following, we first establish the the parameter-to-measurement map

G

. Then, according to Proposition 1, the core of the quantification lies in analyzing how fast

G

varies with respect to

σ

, for which we derive its Fréchet derivative and estimate its dependence on

ϵ

.

3.1. First Derivative of the Parameter-to-Solution Map

Let

σ (x) = (σ_{s} (x), σ_{a} (x))

be the short hand notation,

x \in Ω

, and assume that only a finite series of incoming data

{ϕ_{j}}_{j = 1}^{J}

are generated in the experiments, and the outgoing data are collected only at K discrete boundary locations

{x_{k}}_{k = 1}^{K}

; then,

G

is defined as:

G : σ (x) \mapsto {\{\int_{Γ_{+} (x_{k})} f (x_{k}, v; ϕ_{j}) v \cdot n d v\}}_{j = 1, \dots J, k = 1, \dots, K} .

(7)

We also define the parameter-to-solution map

S

as:

S : σ (x) \mapsto {f (x_{k}, v; ϕ_{j})}_{j = 1, \dots J, k = 1, \dots, K},

then

G

and

S

are related via:

G (σ) = \int_{Γ_{+}} S (σ) v \cdot n d v .

The above relation can be understood more precisely component-wise. We write

G = {G_{j k}}_{j = 1, \dots J, k = 1, \dots, K}

and

S = {S_{j k}}_{j = 1, \dots J, k = 1, \dots, K}

, then

G_{j k} (σ) = \int_{Γ_{+} (x_{k})} f (x_{k}, v; ϕ_{j}) v \cdot n d v = \int_{Γ_{+} (x_{k})} S_{j k} (σ) v \cdot n d v .

Consequently, finding the derivative of

G

in the

\hat{σ}

direction amounts to finding the derivative of the map

S

,

\begin{matrix} G_{j k}^{'} (σ) \hat{σ} & = lim_{t \to 0} \frac{1}{t} \int_{Γ_{+} (x_{k})} (f_{σ + t \hat{σ}} (x_{k}, v; ϕ_{j}) - f_{σ} (x_{k}, v; ϕ_{j})) v \cdot n d v = \int_{Γ_{+} (x_{k})} S_{j k}^{'} (σ) \hat{σ} v \cdot n d v, \end{matrix}

(8)

where the subscript in f denotes the parameter used in obtaining the solution f, and

\hat{σ}

is an admissible variation of

σ

defined below. To abbreviate the notation, we denote

f = f_{σ} (x, v; ϕ), \hat{f} = f_{σ + t \hat{σ}} (x, v; ϕ), w = S^{'} (σ) \hat{σ}

(9)

and omit the subscripts j, k when the calculations are valid for all j and k.

We now specify the set of

σ

as follows:

B_{σ} = {σ \in C^{1} {(Ω) : σ > 0, max {∥ σ ∥}_{L_{\infty} (Ω)}, ∥ σ^{- 1} ∥_{L_{\infty} (Ω)}, ∥ \nabla (σ^{- 1}) ∥_{L_{\infty} (Ω)}} < C_{1}} .

(10)

Then, from [32] (see Proposition 3.1), we have that for

σ \in B_{σ}

,

{∥ G (σ) ∥}_{L_{\infty} (Ω)} \leq C

. We then call a parameter

\hat{σ} \in B_{σ}

an admissible variation of

σ

if the perturbed parameter

σ + t \hat{σ} \in B_{σ}

for sufficiently small t.

Hereafter, assume that the boundary condition

ϕ \in L_{\infty} (Γ_{-})

. We cite the following two results from [4]. The first one indicates that

G^{'}

from (8) is well defined, and the second one provides an estimate of w defined in (9).

Lemma 1.

Let

ϕ \in L_{\infty} (Γ_{-})

; then, the operator

S

is a Lipschitz continuous mapping from

B_{σ}

to

F

, where

F

is defined as

F = {f : f \in L_{\infty} (Ω \times S^{d - 1}), v \cdot \nabla f \in L_{\infty} (Ω \times S^{d - 1})} .

Proposition 2.

Denote

A f = v \cdot \nabla_{x} f, C_{σ} f = ϵ σ_{a} (x) f - \frac{1}{ϵ} σ_{s} (x) L f .

For

σ \in B_{σ}

and admissible variation

\hat{σ}

, w is the unique solution of the following equation:

\{\begin{matrix} A w + C_{σ} w = - C_{\hat{σ}} f, \\ w |_{Γ_{-}} = 0, \end{matrix}

(11)

where

f \in F

is the solution to (6) with parameter σ. Moreover, for

ϵ = 1

, there holds

{∥ w ∥}_{L_{\infty} (Ω)} \leq C ∥ \hat{σ} ∥_{L_{\infty} (Ω)} {∥ ϕ ∥}_{L_{\infty} (Γ_{-})} .

3.2. Dependence on Knudsen Number

We discuss the dependence of the first derivative of the parameter-to-measurement map

G

on the Knudsen number

ϵ

and use it to build the asymptotic connection in the KL divergence between

μ_{0}

and

μ_{post}

. The proofs are carried out by asymptotic analysis. First, we recall the diffusion limit of the RTE.

Theorem 1.

Suppose

f (x, v)

satisfies Equation (6) with a smooth boundary condition

ϕ (x, v) = ξ (x) - \frac{ϵ}{σ_{s}} v \cdot \nabla_{x} ρ_{f} (x),

(12)

where

ξ (x) \in C^{2} (\partial Ω)

, and

ρ_{f}

solves

\{\begin{matrix} - C_{d} \nabla \cdot (\frac{1}{σ_{s}} \nabla ρ_{f}) + σ_{a} ρ_{f} = 0, \\ ρ_{f} (x) |_{\partial Ω} = ξ (x) . \end{matrix}

(13)

Then,

ϕ \in L_{\infty} (Γ_{-})

, and

f (x, v) \to ρ_{f} (x)

as

ϵ \to 0

. Moreover, we have

∥ f - ρ_{f} ∥_{L_{\infty} (Ω \times S^{d - 1})} \leq C_{B_{σ}} ϵ .

The proof is very similar to that in [32] (see the Appendix therein), but we still include the details here as it will be used in proving the next proposition.

Proof.

We decompose the solution as

f = f_{0} + ϵ f_{1} + ϵ^{2} f_{2} + ϵ^{3} f_{r},

(14)

where

f_{0}

,

f_{1}

, and

f_{2}

are chosen as

\begin{matrix} f_{0} & = ρ_{f}, \\ f_{1} & = - \frac{1}{σ_{s}} v \cdot \nabla_{x} ρ_{f}, \\ f_{2} & = L^{- 1} [- \frac{v}{σ_{s}} \cdot \nabla_{x} (\frac{v}{σ_{s}} \cdot \nabla_{x} ρ_{f}) + \frac{σ_{a}}{σ_{s}} ρ_{f}] . \end{matrix}

In order for

f_{2}

to be well defined above, we need

- \frac{v}{σ_{s}} \cdot \nabla_{x} (\frac{v}{σ_{s}} \cdot \nabla_{x} ρ_{f}) + \frac{σ_{a}}{σ_{s}} ρ_{f}

to be in the range of

L

, which then leads to

{〈\frac{v}{σ_{s}} \cdot \nabla_{x} (\frac{v}{σ_{s}} \cdot \nabla_{x} ρ_{f})〉}_{v} = \frac{σ_{a}}{σ_{s}} ρ_{f} .

Equipping it with the boundary condition

ρ_{f} |_{\partial Ω} = ξ (x)

determines

ρ_{f}

. Note that

ξ (x) \in C^{2} (\partial Ω)

, the standard elliptic estimate gives the global boundedness of

ρ

,

\partial_{x_{i}} ρ

and

\partial_{x_{i} x_{j}} ρ

. Since

L^{- 1}

is a bounded operator on

Null {(L)}^{⊥}

,

f_{2}

is uniformly bounded. Since

L^{- 1}

and

\nabla_{x}

commute, similar analysis can be applied to show that

\partial_{x_{i}} f_{2}

is uniformly bounded. Plugging the expansion (14) into (6), we obtain the equation for

f_{r}

:

v \cdot \nabla_{x} f_{r} + ϵ σ_{a} f_{r} - \frac{1}{ϵ} σ_{s} L f_{r} = - σ_{a} f_{2} - \frac{1}{ϵ} v \cdot \nabla_{x} f_{2} - \frac{1}{ϵ} σ_{a} f_{1}

with boundary condition

f_{r} |_{Γ_{-}} = - ϵ L^{- 1} [- \frac{v}{σ_{s}} \cdot \nabla_{x} (\frac{v}{σ_{s}} \cdot \nabla_{x} ρ_{f} (x)) + \frac{σ_{a}}{σ_{s}} ρ_{f} (x)]

. Consequently,

f_{r}

satisfies RTE with

O (ϵ)

boundary condition and

O (1 / ϵ)

source; therefore, from the maximum principle,

∥ f_{r} ∥_{L_{\infty} (Ω \times S^{d - 1})} \leq O (1 / ϵ)

. Recalling (14), we have

f = ρ_{f} - ϵ \frac{1}{σ_{s}} v \cdot \nabla_{x} ρ_{f} + O (ϵ^{2}) .

Using the boundedness of

\nabla_{x} ρ_{f}

, we conclude the theorem. □

Remark 1.

In our case, we suppress the boundary layer by proposing a special boundary condition for f as defined in (12). This result can be extended to more general boundary conditions, in which case boundary layer analysis is inevitable, see [13] for details.

The proposition below demonstrates the dependence of the first derivative of

G

on

ϵ

.

Proposition 3.

For every

σ \in B_{σ}

and admissible variation

\hat{σ}

, there holds

G^{'} (σ) \hat{σ} = O (ϵ) .

Proof.

Considering (11) with perturbation

\hat{σ} : = ({\hat{σ}}_{s}, {\hat{σ}}_{a})

, we expand w and f as

\begin{matrix} w = w_{0} + ϵ w_{1} + ϵ^{2} w_{r}, f = f_{0} + ϵ f_{1} + ϵ^{2} f_{2} + ϵ^{3} f_{r} . \end{matrix}

(15)

Then, at

O (1 / ϵ)

, we have

- σ_{s} L w_{0} = {\hat{σ}}_{s} L f_{0} = 0

, which indicates that

w_{0} (x, v) = w_{0} (x) .

(16)

At

O (1)

, we have

v \cdot \nabla w_{0} - σ_{s} L w_{1} = {\hat{σ}}_{s} L f_{1}

, then using the form of

f_{1} = - \frac{1}{σ_{s}} v \cdot \nabla f_{0}

, we obtain

σ_{s} L w_{1} = v \cdot \nabla w_{0} - \frac{{\hat{σ}}_{s}}{σ_{s}} v \cdot \nabla f_{0}

, and thus

w_{1} = - \frac{1}{σ_{s}} v \cdot \nabla w_{0} + \frac{{\hat{σ}}_{s}}{σ_{s}^{2}} v \cdot \nabla ρ_{f},

(17)

where we have used

f_{0} = ρ_{f}

. To get an evolution equation for

w_{0}

, we look at the

O (ϵ)

level and obtain:

v \cdot \nabla_{x} w_{1} + σ_{a} w_{0} - σ_{s} L w_{r} = - {\hat{σ}}_{a} f_{0} + {\hat{σ}}_{s} L f_{r}

. Substituting (16) and (17), and taking the average in v leads to

{〈v \cdot \nabla_{x} (\frac{v}{σ_{s}} \cdot \nabla_{x} w_{0})〉}_{v} - σ_{a} w_{0} = {\hat{σ}}_{a} ρ_{f} + {〈v \cdot \nabla_{x} (\frac{{\hat{σ}}_{s}}{σ_{s}^{2}} v \cdot \nabla_{x} ρ_{f})〉}_{v} .

(18)

From the zero boundary condition of w, we also have

w_{0} |_{\partial Ω} = 0 .

(19)

Substituting (15) into (11), we get

\begin{matrix} v \cdot \nabla_{x} w_{r} + ϵ σ_{a} w_{r} - \frac{σ_{s}}{ϵ} L w_{r} = & - σ_{a} (\frac{1}{ϵ} w_{0} + \frac{{\hat{σ}}_{s}}{σ_{s}^{2}} v \cdot \nabla_{x} ρ_{f} - \frac{v}{σ_{s}} \cdot \nabla_{x} w_{0}) \\ - \frac{1}{ϵ} v \cdot \nabla_{x} (\frac{{\hat{σ}}_{s}}{σ_{s}^{2}} v \cdot \nabla_{x} ρ_{f} - \frac{v}{σ_{s}} \cdot \nabla_{x} w_{0}) \\ - \frac{{\hat{σ}}_{a}}{ϵ} f + \frac{{\hat{σ}}_{s}}{ϵ} L (f_{2} + ϵ f_{r}), \end{matrix}

(20)

with boundary condition

w_{r} |_{Γ_{-}} = - ϵ w_{1} |_{Γ_{-}}

. Again from the maximum principle,

∥ w_{r} ∥_{L_{\infty} (Ω \times S^{d - 1})} \leq O (1 / ϵ)

. Here, the boundedness of

w_{0}

,

\partial_{x_{i}} w_{0}

are again from the elliptic estimate of (18).

Now, recalling the definition of the first derivative of the forward map, we have

\begin{matrix} G^{'} (σ_{*}) \hat{σ} & = \int_{Γ_{+}} w (x, v; ϕ) v \cdot n d v \\ = \int_{Γ_{+}} (w_{0} (x; ϕ) + ϵ w_{1} (x, v; ϕ) + ϵ^{2} w_{r} (x, v; ϕ)) v \cdot n d v = O (ϵ), \end{matrix}

which concludes the proof. Here, the first term vanishes due to the zero boundary condition (19). □

Theorem 2.

For a linear RTE problem, the KL divergence between

μ_{post}

and

μ_{0}

vanishes as

ϵ \to 0

, i.e.,

D_{K L} (μ_{post} | μ_{0}) = O (ϵ) .

Proof.

From (5), we have that

\begin{matrix} D_{K L} (μ_{0} | μ_{post}) & ≲ \int_{B} \int_{B} |(G (σ^{'}) - G (σ)) Γ_{η}^{- 1} (G (σ) + G (σ^{'}) - 2 d)| d μ_{0} (σ^{'}) d μ_{0} (σ) \\ = \int_{B} \int_{B} \int_{0}^{1} |G^{'} (σ + s (σ^{'} - σ)) (σ^{'} - σ) Γ_{η}^{- 1} (G (σ) + G (σ^{'}) - 2 d)| d s d μ_{0} (σ^{'}) d μ_{0} (σ) . \end{matrix}

Then, from Proposition 3 and boundedness of

G (σ)

, the result is immediate. □

4. Global View Example 2: Nonlinear Radiative Transfer Equation

Along the same vein, we extend our investigation to the nonlinear RTE in this section:

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} f (x, v) = σ (x) (T {(x)}^{4} - f (x, v)), \\ ϵ^{2} ∆ T (x) = σ (x) (T {(x)}^{4} - ρ (x)), \\ f |_{Γ_{-}} = ϕ (x, v), \\ T |_{\partial Ω} = T_{B} (x) . \end{matrix} \end{matrix}

(21)

Here,

x \in Ω \subseteq R^{d}

,

v \in S^{d - 1}

,

f (x, v)

is again the radiation intensity and

T (x)

is the temperature.

ρ = \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} f d v

is the total intensity.

ϵ

is the Knudsen number, which is defined to be the ratio of the mean free path to a characteristic length of the problem [33]. The boundary conditions for f are on the incoming boundary

Γ_{-}

.

In the following, we first recall the diffusion limit of (21); then, we analyze the first derivative of the parameter to the measurement map in the inverse setting and investigate its dependence on

ϵ

so as to build a connection between

μ_{post}

and

μ_{0}

.

4.1. First Derivative of the Parameter-to-Solution Map

For the nonlinear RTE problem, the inverse problem is again set up to recover the scattering coefficient

σ

by measuring the outgoing intensity flux. Therefore, the parameter-to-measurement map is defined as

G : σ (x) \mapsto \int_{Γ_{+}} v \cdot n f d v .

More particularly, assume that only a finite series of incoming data,

{ϕ_{j}}_{j = 1}^{J}

, are generated in the experiments and injected into the tissue, and the outgoing data are collected only at K discrete boundary locations

{x_{k}}_{k = 1}^{K}

; then

G : σ (x) \mapsto {\{\int_{Γ_{+} (x_{k})} f (x_{k}, v; ϕ_{j}, T_{B, j}) v \cdot n d v\}}_{j = 1, \dots J, k = 1, \dots, K} .

We also define the parameter-to-solution maps

S_{1}

and

S_{2}

\begin{matrix} S_{1} : σ (x) \mapsto {f (x_{k}, v; ϕ_{j}, T_{B, j})}_{j = 1, \dots J, k = 1, \dots, K}, \end{matrix}

(22)

\begin{matrix} S_{2} : σ (x) \mapsto {T (x_{k}; ϕ_{j}, T_{B, j})}_{j = 1, \dots J, k = 1, \dots, K}; \end{matrix}

(23)

then, the forward map

G

is written

G_{j k} (σ) = \int_{Γ_{+} (x_{k})} f (x_{k}, v; ϕ_{j}, T_{B, j}) v \cdot n d v = \int_{Γ_{+} (x_{k})} {S_{1}}_{j k} (σ) v \cdot n d v .

Consequently, finding the derivative of

G

amounts to finding the derivative of the map

S_{1}

, namely,

\begin{matrix} G_{j k}^{'} (σ) \hat{σ} & = lim_{t \to 0} \frac{1}{t} \int_{Γ_{+} (x_{k})} (f_{σ + t \hat{σ}} (x_{k}, v; ϕ_{j}, T_{B, j}) - f_{σ} (x_{k}, v; ϕ_{j}, T_{B, j})) v \cdot n d v \\ = \int_{Γ_{+} (x_{k})} {S_{1}}_{j k}^{'} (σ) \hat{σ} v \cdot n d v, \end{matrix}

(24)

where we use the same notation to index the parameter of f as what we did for the linear RTE. Since f is nonlinearly coupled to T, we also need to find the derivative of the map

S_{2}

, which is defined similarly as

{S_{2}^{'}}_{j k} (σ) \hat{σ} = lim_{t \to 0} \frac{1}{t} [T_{σ + t \hat{σ}} (x; ϕ, T_{B}) - T_{σ} (x; ϕ, T_{B})] .

Here,

σ \in B_{σ}

defined in (10) and

\hat{σ}

is an admissible variation of

σ

.

To abbreviate the notation, we denote

\begin{matrix} f = f_{σ} (x, v; ϕ, T_{B}), \hat{f} = f_{σ + t \hat{σ}} (x, v; ϕ, T_{B}), w_{f} = S_{1}^{'} (σ) \hat{σ}, \end{matrix}

(25)

\begin{matrix} T = T_{σ} (x; ϕ, T_{B}), \hat{T} = T_{σ + t \hat{σ}} (x; ϕ, T_{B}), w_{T} = S_{2}^{'} (σ) \hat{σ}, \end{matrix}

(26)

and omit the subscript j, k when the calculations are valid for all j and k.

Proposition 4.

Let

ϕ \in L_{\infty} (Γ_{-})

. Then, the operators

S_{1}

and

S_{2}

, defined in (22) and (23), are Lipschitz continuous mappings from

L_{\infty} (Ω) \times L_{\infty} (Ω)

to

F_{f} \times F_{T}

, where

\begin{matrix} F_{f} = {f : f \in L_{\infty} (Ω \times S^{d - 1}), v \cdot \nabla f \in L_{\infty} (Ω \times S^{d - 1})}, a n d F_{T} = {T : T \in L_{\infty} (Ω)} . \end{matrix}

Proof.

We follow Theorem 3.3 in [4] and extend the proof to the nonlinear RTE. Let

σ \in B_{σ}

and

\hat{σ}

be admissible, and denote

(f, T), (\hat{f}, \hat{T})

be corresponding solutions of (21):

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} f = σ (T^{4} - f), \\ ϵ ∆ T = σ (T^{4} - ρ), \\ f |_{Γ_{-}} = ϕ (x, v), T |_{\partial Ω} = T_{B} (x), \end{matrix} \end{matrix}

and

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} \hat{f} = (σ + t \hat{σ}) ({(\hat{T})}^{4} - \hat{f}) . \\ ϵ ∆ \hat{T} = (σ + t \hat{σ}) ({(\hat{T})}^{4} - \hat{ρ}), \\ \hat{f} |_{Γ_{-}} = ϕ (x, v), \hat{T} |_{\partial Ω} = T_{B} (x) . \end{matrix} \end{matrix}

For sufficiently small t, we write the perturbed solutions

\hat{f} = f + t \tilde{f}, \hat{T} = T + t \tilde{T};

then, it is straightforward to show that

(\tilde{f}, \tilde{T})

satisfy

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} \tilde{f} = σ [4 T^{3} \tilde{T} - t \tilde{f}] + \hat{σ} (T^{4} - f) + O (t^{2}), \\ ϵ^{2} ∆ \tilde{T} = σ [4 T^{3} \tilde{T} - \tilde{ρ}] + \hat{σ} (T^{4} - ρ) + O (t^{2}), \\ \tilde{f} |_{Γ_{-}} = 0, \tilde{T} |_{\partial Ω} = 0 . \end{matrix} \end{matrix}

(27)

From Theorem 3.1 in [33], for

T_{B} \in H^{1 / 2} (\partial Ω) \cap L_{\infty} (\partial Ω)

, let

γ

be that

∥ T_{B} ∥_{L_{\infty} (\partial Ω)} \leq γ

, there exists a unique solution

(\tilde{f}, \tilde{T})

to (27) that satisfies the following thanks to the maximum principle:

\begin{matrix} ∥ \tilde{T} ∥_{L_{\infty} (Ω \times S^{d - 1})} = {∥ \frac{1}{t} (T - \hat{T}) ∥}_{L_{\infty} (Ω \times S^{d - 1})} \leq γ, \\ ∥ \tilde{f} ∥_{L_{\infty} (Ω \times S^{d - 1})} = {∥ \frac{1}{t} (f - \hat{f}) ∥}_{L_{\infty} (Ω \times S^{d - 1})} \leq B (γ) . \end{matrix}

(28)

□

As an immediate consequence, we have the following control of the first derivative of the parameter-to-solution map.

Proposition 5.

For

σ \in B_{σ}

and admissible variation

\hat{σ}

,

w_{f}

and

w_{T}

are the unique solutions of the following system:

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} w_{f} = σ (4 T^{3} w_{T} - w_{f}) + \hat{σ} (T^{4} - f), \\ ϵ^{2} ∆ w_{T} = σ (4 T^{3} w_{T} - w_{ρ}) + \hat{σ} (T^{4} - ρ), \\ w_{f} |_{Γ_{-}} = 0, w_{T} |_{\partial Ω} = 0, \end{matrix} \end{matrix}

(29)

where

(f, T)

and

(\hat{f}, \hat{T})

satisfy (21) with parameter σ and

σ + t \hat{σ}

, respectively. Moreover, there holds:

\begin{matrix} ∥ w_{f} ∥_{L_{\infty} (Ω)} \leq C_{f}, {∥ w_{T} ∥}_{L_{\infty} (Ω)} \leq C_{T}, \end{matrix}

(30)

where

C_{T} > 0

and

C_{f}

is a constant depending on

C_{T}

.

Proof.

From the definition of

w_{f}

and

w_{T}

, (29) comes directly from (27). Then, the bounds (30) can be concluded from (28). □

4.2. Dependence on Knudsen Number

To analyze the dependence of

G^{'}

on

ϵ

, we first recall the asymptotic result in the forward setting.

Theorem 3.

Let

(f (x, v), T (x))

solve (21); then, as

ϵ \to 0

, the pair converges to

(T_{0} {(x)}^{4}, T_{0} (x))

, with

T_{0} (x)

solving the nonlinear diffusion equation,

\{\begin{matrix} C_{d} \nabla \cdot (\frac{1}{σ (x)} \nabla T_{0} {(x)}^{4}) + ∆ T_{0} = 0, \\ T_{0} |_{\partial Ω} = ζ (x), \end{matrix}

(31)

where

C_{d}

is a constant depending on dimension.

ζ (x)

is determined from

ϕ (x, v)

and

T_{B} (x)

by solving a Milne problem.

This result is rigorously proved in [33], and we only sketch the main steps here.

Proof.

Consider the asymptotic expansion in

ϵ

in the bulk area:

\begin{matrix} f = f_{0} + ϵ f_{1} + ϵ^{2} f_{2} + \dots, and T = T_{0} + ϵ^{2} T_{2} + \dots . \end{matrix}

Then, at the leading order

O (ϵ^{0})

, we have

\begin{matrix} T_{0}^{4} = f_{0} = ρ_{0}, \end{matrix}

(32)

which implies that

f_{0}

is uniform in v. The next order

O (ϵ)

leads to

f_{1} = - \frac{1}{σ} v \cdot \nabla f_{0} .

(33)

At

O (ϵ^{2})

, we have

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f_{1} & = σ (4 T_{0}^{3} T_{2} - f_{2}), \\ ∆ T_{0} & = σ (4 T_{0}^{3} T_{2} - ρ_{2}) . \end{matrix} \end{matrix}

Taking the average in v for the first equation and subtracting it from the second one, we have

\begin{matrix} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} v \cdot \nabla_{x} (- \frac{1}{σ} v \cdot \nabla f_{0}) d v + ∆ T_{0} = 0 . \end{matrix}

(34)

Using (32) in (34) implies that

C_{d} \nabla \cdot (\frac{1}{σ} \nabla T_{0}^{4}) + ∆ T_{0} = 0,

(35)

where

C_{d}

depends on the dimension.

To determine the boundary condition for

T_{0}

, we need to take into account the boundary layer effect. More precisely, let

Ω_{0}

be the interior of the domain so that

Ω ∖ Ω_{0}

is the boundary area. As before, we let

y = \frac{x - \tilde{x}}{ϵ} \cdot n

be the stretching variable, where

\tilde{x} \in \partial Ω

and n is the unit outer normal direction at

\tilde{x}

; then,

y \in [0, \infty)

. For

x \in Ω ∖ Ω_{0}

, let

f_{BL} (y, v) = f (x, v)

,

T_{BL} (y) = T (x)

, and they satisfy the Milne problem:

\begin{matrix} \{\begin{matrix} \partial_{y} f_{BL} (y, v) v \cdot n = σ (T_{BL} {(y)}^{4} - f_{BL} (y, v)) \\ \partial_{y y} T_{BL} (y) = σ (T_{BL} {(y)}^{4} - ρ_{BL} (y)) \\ f_{BL} (0, v) = ϕ (\tilde{x}, v) \\ T_{BL} (0) = T_{B} (\tilde{x}) \end{matrix} . \end{matrix}

Then, solving the Milne problem, the boundary condition for (35) is given by

T_{0} (x) |_{\partial Ω} = T_{BL} (\infty) .

□

With this theorem in mind, we delve into the dependence of the derivatives of

G

on

ϵ

in the following proposition.

Proposition 6.

For every

j, k

,

σ \in B_{σ}

, and admissible variation

\hat{σ}

, there holds:

G_{j k}^{'} (σ) \hat{σ} = O (ϵ) .

Proof.

The proof is carried out by asymptotic analysis. We write the expansions:

\begin{matrix} f & = f_{0} + ϵ f_{1} + ϵ^{2} f_{2}, T = T_{0} + ϵ^{2} T_{2} \end{matrix}

(36)

\begin{matrix} w_{f} & = w_{f, 0} + ϵ w_{f, 1} + ϵ^{2} w_{f, 2}, w_{T} = w_{T, 0} + ϵ^{2} w_{T, 2} . \end{matrix}

(37)

Plugging (36) into (21), we have as before:

\begin{matrix} f_{0} = T_{0}^{4} = ρ_{0}; \end{matrix}

(38)

\begin{matrix} f_{1} = - \frac{v}{σ} \cdot \nabla ρ_{0}; \end{matrix}

(39)

\begin{matrix} C_{d} \nabla \cdot (\frac{1}{σ} \nabla T_{0}^{4}) + ∆ T_{0} = 0, T_{0} |_{\partial Ω} = ζ (x), \end{matrix}

(40)

and solving these readily gives

f_{0}

,

f_{1}

, and

T_{0}

. The remaining

f_{2}

and

T_{2}

are obtained by solving the rest of the system:

\{\begin{matrix} ϵ^{3} v \cdot \nabla f_{2} = σ [{(T_{0} + ϵ^{2} T_{2})}^{4} - 4 ϵ^{2} T_{0}^{3} T_{2} - T_{0}^{4}], \\ ϵ^{4} ∆ T_{2} = σ [{(T_{0} + ϵ^{2} T_{2})}^{4} - 4 ϵ^{2} T_{0}^{3} T_{2} - T_{0}^{4}] . \end{matrix}

(41)

Next, we plug both (36) and (37) into (29), and at leading order

O (1)

:

w_{f, 0} = w_{ρ, 0} = 4 T_{0}^{3} w_{T, 0} .

(42)

At

O (ϵ)

, we equate

v \cdot \nabla_{x} w_{f, 0} = - σ w_{f, 1} - \hat{σ} f_{1},

which implies

w_{f, 1} = - \frac{1}{σ} v \cdot \nabla_{x} w_{f, 0} + \frac{\hat{σ}}{σ^{2}} v \cdot \nabla_{x} f_{0} .

(43)

Then, the next order

O (ϵ^{2})

is:

\begin{matrix} v \cdot \nabla_{x} w_{f, 1} = σ (12 T_{0} T_{2}^{2} w_{T, 0} + 4 T_{0}^{2} w_{T, 2} - w_{f, 2}) + \hat{σ} (4 T_{0}^{3} T_{2} - f_{2}) \end{matrix}

(44)

\begin{matrix} ∆ w_{T, 0} = σ (12 T_{0}^{2} T_{2} w_{T, 0} + 4 T_{0}^{3} w_{T, 2} - w_{ρ 2}) + \hat{σ} (4 T_{0}^{3} T_{2} - ρ_{2}), \end{matrix}

(45)

taking the average in v of (44) and adding the result to gives the equation for

w_{T, 0}

C_{d} \nabla \cdot (\frac{1}{σ} \nabla (4 T_{0}^{3} w_{T, 0})) + ∆ w_{T, 0} = C_{d} \nabla_{x} \cdot (\frac{\hat{σ}}{σ^{2}} \nabla ρ_{0}), w_{T, 0} |_{\partial Ω} = 0 .

(46)

Here, the boundary condition for

w_{T, 0}

is again determined by the boundary layer asymptotic analysis, so we omit the details.

Therefore, from the above, one solves (46) for

w_{T, 0}

first and then uses (42) and (43) to get

w_{f, 0}

and

w_{f, 1}

. The remaining

w_{f, 2}

and

w_{T, 2}

can be obtained by solving the rest of the system:

\begin{matrix} \{\begin{matrix} ϵ^{3} v \cdot \nabla w_{f, 2} & = σ [4 {(T_{0} + ϵ^{2} T_{2})}^{3} (w_{T, 0} + ϵ^{2} w_{T, 2}) - 4 ϵ^{2} T_{0}^{3} w_{T, 2} - 12 ϵ^{2} T_{0}^{2} T_{2} w_{T, 0} - 4 T_{0}^{3} w_{T, 0}] \\ + \hat{σ} [{(T_{0} + ϵ^{2} T_{2})}^{4} - T_{0}^{4} - 4 ϵ^{2} T_{0}^{3} T_{2}], \\ ϵ^{4} ∆ w_{T, 2} & = σ [4 {(T_{0} + ϵ^{2} T_{2})}^{3} (w_{T, 0} + ϵ^{2} w_{T, 2}) - 4 ϵ^{2} T_{0}^{3} w_{T, 2} - 12 ϵ^{2} T_{0}^{2} T_{2} w_{T, 0} - 4 T_{0}^{3} w_{T, 0}] \\ + \hat{σ} [{(T_{0} + ϵ^{2} T_{2})}^{4} - T_{0}^{4} - 4 ϵ^{2} T_{0}^{3} T_{2}] . \end{matrix} \end{matrix}

(47)

This gives the boundedness of

w_{f, 2}

by

\frac{1}{ϵ}

. Using the expansion for

w_{f}

in the form (24) of

G^{'} (σ) \hat{σ}

, we have

\begin{matrix} G_{j k}^{'} (σ) \hat{σ} & = \int_{Γ_{+} (x_{k})} w_{f} (x_{k}, v; ϕ_{j}, T_{B}) v \cdot n d v \\ = \int_{Γ_{+} (x_{k})} (w_{f, 0} (x_{k}; ϕ_{j}, T_{B}) + ϵ w_{f, 1} (x_{k}; ϕ_{j}, T_{B}) + ϵ^{2} w_{f, 2} (x_{k}; ϕ_{j}, T_{B})) v \cdot n d v \\ = O (ϵ), \end{matrix}

concluding the proposition. □

The main theorem for the nonlinear case is now in order.

Theorem 4.

For the nonlinear RTE problem, the KL divergence between

μ_{post}

and

μ_{0}

vanishes as

ϵ \to 0

, i.e.,

D_{K L} (μ_{post} | μ_{0}) = O (ϵ) .

With the result from Proposition 6, the proof is identical to that of Theorem 2.

5. Local Behavior around the MAP Point

The KL divergence between the prior and the posterior distribution is a global quantity: it characterizes the information gain from the measured data in the whole distribution. We are also concerned about the local behavior of the posterior distribution, especially around the maximum a posteriori (MAP) point, which is denoted by

σ_{*}

. Suppose the posterior distribution around the MAP point is rather “flat”; then, the probability is unchanged in a fairly large area around

σ_{*}

, meaning that all the configuration in this flat area can be approximately taken as the optimal point, and the reconstruction is insensitive to data, demonstrating the instability.

This behavior can be characterized well if the problem is linear. Suppose

G (σ) = G σ

; then, assuming the prior distribution is a Gaussian centered at

σ_{0}

with covariance

C_{0}

, the posterior distribution is uniquely determined by

σ_{post}

and

C_{post}

by:

σ_{post} = C_{post}^{- 1} (G^{⊤} Γ_{η}^{- 1} d + C_{0}^{- 1} σ_{0}), C_{post} = {(G^{⊤} Γ_{η}^{- 1} G + C_{0}^{- 1})}^{- 1} .

The “flatness” of a Gaussian is characterized by its covariance matrix. Indeed, with a quick derivation, one can show that:

\int_{B} {∥ σ_{post} - σ ∥}^{2} d μ_{post} = tr (C_{post}) .

Therefore, the less informative the forward map is, the smaller G is, and then the bigger

tr (C_{post})

gets, indicating the higher mean-square error. Geometrically, it means the Gaussian is flatter.

We would like to understand this local behavior around the MAP point. However, the forward map we have is nonlinear, so the argument above only serves as a guidance. To start, we denote:

μ_{post} = \frac{1}{Z} exp ({- ∥ G (σ) - d ∥}_{Γ_{η}}^{2} / 2) μ_{0} \propto exp {- A} .

Then, the convexity of the posterior distribution is uniquely determined by the hessian of A. For that, we quote:

Proposition 7.

Let

σ_{*}

be admissible and

\hat{σ}

be an admissible variation. The Hessian of the posterior distribution is expressed in terms of the forward map

G

as:

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = \frac{1}{2} \sum_{i} (G_{i}^{'} (σ_{*}) {[\hat{σ}]}^{⊤} Γ_{η}^{- 1} G_{i}^{'} (σ_{*}) [\hat{σ}] + G_{i} {(σ)}^{⊤} Γ_{η}^{- 1} G_{i}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] - d_{i}^{⊤} Γ_{η}^{- 1} G_{i}^{″} (σ_{*}) [\hat{σ}, \hat{σ}]) .

Proof.

We begin by expanding out

G

:

\begin{matrix} A (σ) = \frac{1}{2} \sum_{i = 1}^{J K} [- 2 d_{i}^{⊤} Γ_{η}^{- 1} G_{i} (σ) + G_{i} {(σ)}^{⊤} Γ_{η}^{- 1} G_{i} (σ)] + remainders, \end{matrix}

where the remainder term has no

G

dependence. Expand

G

around

σ_{*}

for small t:

G_{i} (σ) = G_{i} (σ_{*}) + t G_{i}^{'} (σ_{*}) [\hat{σ}] + \frac{1}{2} t^{2} G_{i}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] + O (t^{3}),

where

σ = σ_{*} + t \hat{σ}

. Plug this in the expression of A and collect the second power of t; then, we have:

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = \frac{1}{2} \sum_{i} (G_{i}^{'} (σ_{*}) {[\hat{σ}]}^{⊤} Γ_{η}^{- 1} G_{i}^{'} (σ_{*}) [\hat{σ}] + G_{i} {(σ)}^{⊤} Γ_{η}^{- 1} G_{i}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] - d_{i}^{⊤} Γ_{η}^{- 1} G_{i}^{″} (σ_{*}) [\hat{σ}, \hat{σ}]),

concluding the proof. □

This formula holds true for all

σ_{*}

that is admissible and thus is valid at MAP as well. According to this proposition, to show the “flatness” of the distribution at MAP, we essentially need to show the smallness of both

G_{i}^{'} (σ_{*})

and

G_{i}^{″} (σ_{*})

for all i. All derivations below are to justify this statement for both the linear and nonlinear RTE.

5.1. Linear RTE

We begin with the linear RTE. The following proposition, due to [4], characterizes the second derivative of the parameter-to-solution map

S

, which we will compute its dependence on

ϵ

.

Proposition 8.

Denote

H = S^{″} (σ_{*}) [σ_{1}, σ_{2}]

,

A f = v \cdot \nabla_{x}

, and

C_{σ} f = ϵ σ_{a} (x) f - \frac{1}{ϵ} σ_{s} (x) L f

. Then, for any admissible

σ_{*}

and admissible variations

σ_{1}

,

σ_{2}

, H is the unique solution of the following equation:

\{\begin{matrix} A H + C_{σ_{*}} H = - C_{σ_{1}} w^{(2)} - C_{σ_{2}} w^{(1)}, \\ H |_{Γ_{-}} = 0, \end{matrix}

where

f \in F

is the solution to (6) with parameter

σ_{*}

and

w^{(1, 2)}

are the solutions to (11) with parameters

σ_{*} + t σ_{1, 2}

, respectively. Moreover, there holds:

{∥ H ∥}_{L_{\infty} (Ω)} \leq C (∥ σ_{1} ∥_{L_{\infty} (Ω)} + ∥ σ_{2} ∥_{L_{\infty} (Ω)}) .

Proof.

Considering the second derivative of G in Equation (24), all

f^{i, j}

satisfy the RTE with different

σ

as shown below, and the same incoming boundary condition g,

\begin{matrix} A f^{(1, 2)} + C_{σ_{*} + t σ_{1} + t σ_{2}} f^{(1, 2)} & = 0 \\ A f^{(1)} + C_{σ_{*} + t σ_{1}} f^{(1)} & = 0 \\ A f^{(2)} + C_{σ_{*} + t σ_{2}} f^{(2)} & = 0 \\ A f + C_{σ_{*}} f & = 0, \end{matrix}

and combining these equations gives

\begin{matrix} A (f^{(1, 2)} - f^{(1)} - f^{(2)} + f) + C_{σ_{*}} (f^{(1, 2)} - f^{(1)} - f^{(2)} + f) \\ + t C_{σ_{1}} f^{(1, 2)} - t C_{σ_{1}} f^{(1)} + t C_{σ_{2}} f^{(1, 2)} - t C_{σ_{2}} f^{(2)} = 0 . \end{matrix}

Here, we have used the fact that C is linear in

σ

. Dividing by

t^{2}

and taking the limit as

t \to 0

then gives

\{\begin{matrix} A H + C_{σ_{*}} H = - C_{σ_{1}} w^{(2)} - C_{σ_{2}} w^{(1)}, \\ H |_{Γ_{-}} = 0 . \end{matrix}

(48)

The boundedness is seen from Theorem 3.7 in [4]. □

We next determine the dependence of the forward map’s second derivative in terms of

ϵ

.

Proposition 9.

For every

j, k

, admissible

σ_{*} \in B_{σ}

, and admissible variation

\hat{σ}

, there holds:

G_{j k}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = O (ϵ) .

Proof.

Considering (48) with fixed

ϕ_{j}

and boundary measurement location

x_{k}

, we expand

\begin{matrix} H & = H_{0} + ϵ H_{1} + ϵ^{2} H_{2} \\ w^{(1)} & = w_{0}^{(1)} + ϵ w_{1}^{(1)} + ϵ^{2} w_{2}^{(1)} \\ w^{(2)} & = w_{0}^{(2)} + ϵ w_{1}^{(2)} + ϵ^{2} w_{2}^{(2)} . \end{matrix}

Using (48), at

O (1 / ϵ)

, we obtain

- σ_{s} L H_{0} = σ_{s, 1} L w_{0}^{(2)} + σ_{s, 2} L w_{0}^{(1)} .

From Proposition 2,

w_{0}^{(1)}

and

w_{0}^{(2)}

have no velocity dependence, which gives

L H_{0} = 0

; thus,

H_{0}

has no velocity dependence. Additionally, from the zero boundary condition of H, we have

H_{0} |_{\partial Ω} = 0

.

At

O (1)

,

\begin{matrix} v \cdot \nabla_{x} H_{0} - σ_{s} L H_{1} = σ_{s, 1} L w_{1}^{(2)} + σ_{s, 2} L w_{1}^{(1)}, \end{matrix}

and thus

L H_{1} = \frac{1}{σ_{s}} v \cdot \nabla_{x} H_{0} - \frac{σ_{s, 1}}{σ_{s}} L w_{1}^{(2)} - \frac{σ_{s, 2}}{σ_{s}} L w_{1}^{(1)} .

Since

L w_{1}^{(1)} = - w_{1}^{(1)}

and

L w_{1}^{(2)} = - w_{1}^{(2)}

, we have

H_{1} = - \frac{1}{σ_{s}} v \cdot \nabla_{x} H_{0} - \frac{σ_{s, 1}}{σ_{s}} w_{1}^{(2)} - \frac{σ_{s, 2}}{σ_{s}} w_{1}^{(1)} .

Finally at

O (ϵ)

, we have

v \cdot \nabla_{x} H_{1} + σ_{a} H_{0} - σ_{s} L H_{2} = - σ_{a, 1} w_{0}^{(2)} + σ_{s, 1} L w_{2}^{(2)} - σ_{a, 2} w_{0}^{(1)} + σ_{s, 2} L w_{2}^{(1)} .

Integrating over velocity, we obtain an equation for the leading order term

H_{0}

, which is in closed form, since

w_{1}^{(i)}

are written in terms of

w_{0}^{(i)}

(see [4])

\begin{matrix} \nabla \cdot (\frac{1}{σ_{s}} \nabla H_{0}) + σ_{a} H_{0} \\ = & - σ_{a, 1} w_{0}^{(2)} - σ_{a, 2} w_{0}^{(1)} - \frac{σ_{s, 1}}{σ_{s}} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} w_{1}^{(2)} d v - \frac{σ_{s, 2}}{σ_{s}} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} w_{1}^{(1)} d v . \end{matrix}

Upon averaging over velocity, we obtain the second derivative of the operator

G

with the

[\hat{σ}, \hat{σ}]

perturbation,

\begin{matrix} G_{j k}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] & = \int_{Γ_{+} (x_{k})} H (x, v; ϕ_{j}) v \cdot n d v \\ = \int_{Γ_{+} (x_{k})} (H_{0} (x; ϕ_{j}) + ϵ H_{1} (x, v; ϕ_{j}) + ϵ^{2} H_{2} (x, v; ϕ_{j})) v \cdot n d v \\ = ϵ \int_{Γ_{+} (x_{k})} (H_{1} (x, v; ϕ_{j}) + O (ϵ^{2})) v \cdot n d v \\ = O (ϵ) . \end{matrix}

The contribution from the

H_{0}

term becomes zero due to its trivial boundary condition, and H is bounded from [4]. □

Directly from Propositions 3, 7 and 9, we have the following corollary.

Corollary 1.

For

σ_{*} \in B_{σ}

and admissible variation

\hat{σ}

, the diagonal elements

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}]

of the Hessian of the posterior distribution satisfies:

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = O (ϵ) .

5.2. Nonlinear RTE

We repeat the analysis for the nonlinear RTE, for the case when

G

has been linearized. Using similar notatinos, we let

H_{f} = S_{1}^{″} (σ_{*}) [σ_{1}, σ_{2}], H_{T} = S_{2}^{″} (σ_{*}) [σ_{1}, σ_{2}]

and have the following proposition regarding the boundedness of

H_{f}

and

H_{T}

.

Proposition 10.

For

σ_{*} \in B_{σ}

and admissible variations

σ_{1}, σ_{2}

,

(H_{f}, H_{T})

are the unique solution to the following system:

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} H_{f} = - σ_{*} H_{f} (4 T^{3} H_{T} + 12 T^{2} w_{T}^{(1)} w_{T}^{(2)}) \\ + σ_{1} (4 T^{3} w_{T}^{(2)}) + σ_{2} (4 T^{3} w_{T}^{(1)}) - σ_{1} w_{f}^{(2)} - σ_{2} w_{f}^{(1)} \\ ϵ^{2} ∆ H_{T} = - σ_{*} H_{ρ} + σ_{*} (4 T^{3} H_{T} + 12 T^{2} w_{T}^{(1)} w_{T}^{(2)} \\ + σ_{1} (4 T^{3} w_{T}^{(2)}) + σ_{2} (4 T^{3} w_{T}^{(2)}) - σ_{1} w_{ρ}^{(2)} - σ_{2} w_{ρ}^{(1)} \\ H_{f} |_{Γ_{-}} = 0, H_{T} |_{\partial Ω} = 0 \end{matrix}, \end{matrix}

where

w_{f}^{(1, 2)}

denotes the first derivative of the parameter-to-solution map for f in the

σ_{1, 2}

direction, and similarly for

w_{T}^{(1, 2)}

, and

H_{ρ} = \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} H_{f} d v

. Moreover, there holds:

\begin{matrix} ∥ H_{f} ∥_{L_{\infty} (Ω)} \leq C_{f}^{'}, {∥ H_{T} ∥}_{L_{\infty} (Ω)} \leq C_{T}^{'}, \end{matrix}

where

C_{T}^{'} > 0

and

C_{f}^{'}

depends on

C_{T}^{'}

.

Proof.

We let

H_{f} = \frac{1}{t^{2}} (f^{(1, 2)} - f^{(1)} - f^{(2)} + f)

, and

H_{T} = \frac{1}{t^{2}} (T^{(1, 2)} - T^{(1)} - T^{(2)} + T)

. Each

(f^{(i, j)}, T^{(i, j)})

satisfies the nonlinear RTE with a variety of

σ

shown below:

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f^{(1, 2)} = \frac{1}{ϵ} (σ_{*} + t σ_{1} + t σ_{2}) ({(T^{(1, 2)})}^{4} - f^{(1, 2)}), \\ ∆ T^{(1, 2)} = \frac{1}{ϵ^{2}} (σ_{*} + t σ_{1} + t σ_{2}) ({(T^{(1, 2)})}^{4} - ρ^{(1, 2)}); \end{matrix} \end{matrix}

(49)

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f^{(1)} = \frac{1}{ϵ} (σ_{*} + t σ_{1}) ({(T^{(1)})}^{4} - f^{(1)}), \\ ∆ T^{(1)} = \frac{1}{ϵ^{2}} (σ_{*} + t σ_{1}) ({(T^{(1)})}^{4} - ρ^{(1)}); \end{matrix} \end{matrix}

(50)

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f^{(2)} = \frac{1}{ϵ} (σ_{*} + t σ_{2}) ({(T^{(2)})}^{4} - f^{(2)}), \\ ∆ T^{(2)} = \frac{1}{ϵ^{2}} (σ_{*} + t σ_{2}) ({(T^{(2)})}^{4} - ρ^{(2)}); \end{matrix} \end{matrix}

(51)

\begin{matrix} \{\begin{matrix} v \cdot \nabla_{x} f = \frac{1}{ϵ} σ_{*} (T^{4} - f), \\ ∆ T = \frac{1}{ϵ^{2}} σ_{*} (T^{4} - ρ), \end{matrix}, \end{matrix}

(52)

where

ρ = \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} f d v

.

We begin by computing various combinations of

T^{(i)}

. First, we recall:

\begin{matrix} w_{T}^{(1)} & = lim_{t \to 0} \frac{1}{t} (T^{(1)} - T), \\ w_{T}^{(2)} & = lim_{t \to 0} \frac{1}{t} (T^{(2)} - T), \end{matrix}

and expand

\begin{matrix} {(T^{(1, 2)})}^{4} & = {(T^{(1)})}^{4} + {(T^{(1)})}^{3} (4 t w_{T}^{(2)} + 4 t^{2} H_{T}) + {(T^{(1)})}^{2} (6 t^{2} {(w_{T}^{(2)})}^{2}) + O (t^{3}) \\ = {(T + t w_{T}^{(1)})}^{4} + {(T + t w_{T}^{(1)})}^{3} (4 t w_{T}^{(2)} + 4 t^{2} H_{T}) + {(T + t w_{T}^{(1)})}^{2} (6 t^{2} {(w_{T}^{(2)})}^{2}) + O (t^{3}) . \end{matrix}

Similarly for

σ_{1}

and

σ_{2}

respectively,

\begin{matrix} {(T^{(1)})}^{4} & = {(T + t w_{T}^{(1)})}^{4} \\ = T^{4} + 4 t T^{3} w_{T}^{(1)} + 6 t^{2} T^{2} w_{T}^{(1)} + O (t^{3}), \\ {(T^{(2)})}^{4} & = T^{4} + 4 t T^{3} w_{T}^{(2)} + 6 t^{2} T^{2} w_{T}^{(2)} + O (t^{3}) . \end{matrix}

Combining these,

\begin{matrix} {(T^{(1, 2)})}^{4} - {(T^{(1)})}^{4} & = 4 t T^{3} w_{T}^{(2)} + O (t^{2}), \\ {(T^{(1, 2)})}^{4} - {(T^{(2)})}^{4} & = 4 t T^{3} w_{T}^{(1)} + O (t^{2}), \end{matrix}

so that

{(T^{(1, 2)})}^{4} - {(T^{(1)})}^{4} - {(T^{(2)})}^{4} + T = 4 t^{2} T^{3} H_{T} + 12 t^{2} T^{2} w_{T}^{(1)} w_{T}^{(2)} + O (t^{3}) .

(53)

To derive the equations for

H_{f}

and

H_{T}

, we take (49), subtract (50) and (51), and add (52). Using (53),

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} H_{f} = & - σ_{*} H_{f} (4 T^{3} H_{T} + 12 T^{2} w_{T}^{(1)} w_{T}^{(2)}) \\ + σ_{1} (4 T^{3} w_{T}^{(2)}) + σ_{2} (4 T^{3} w_{T}^{(1)}) - σ_{1} w_{f}^{(2)} - σ_{2} w_{f}^{(1)} \\ ϵ^{2} ∆ H_{T} = & - σ_{*} H_{ρ} + σ_{*} (4 T^{3} H_{T} + 12 T^{2} w_{T}^{(1)} w_{T}^{(2)} \\ + σ_{1} (4 T^{3} w_{T}^{(2)}) + σ_{2} (4 T^{3} w_{T}^{(2)}) - σ_{1} w_{ρ}^{(2)} - σ_{2} w_{ρ}^{(1)} \end{matrix} . \end{matrix}

(54)

To show the boundedness, we note that

H_{f}, H_{T}

solve the nonlinear RTE with a modified right-hand side only containing bounded terms, so we again use Theorem 3.1 from [33] to obtain the boundedness:

∥ H_{f} ∥_{L_{\infty} (Ω)} \leq C_{f}^{'}, {∥ H_{T} ∥}_{L_{\infty} (Ω)} \leq C_{T}^{'} .

□

Proposition 11.

For every

j, k

and functions

σ_{*}, \hat{σ}

in the admissible set, the second derivative of the forward map is of order ϵ,

G_{j k}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = O (ϵ) .

Proof.

We start with arbitrary directions

σ_{1}

and

σ_{2}

and later choose the

[\hat{σ}, \hat{σ}]

direction. To find the

ϵ

dependence of

H_{f}

and

H_{T}

, we expand:

\begin{matrix} H_{f} & = H_{f, 0} + ϵ H_{f, 1} + ϵ^{2} H_{f, 2} \\ w_{f}^{(1)} & = w_{f, 0}^{(1)} + ϵ w_{f, 1}^{(1)} + ϵ^{2} w_{f, 2}^{(1)} \\ w_{f}^{(2)} & = w_{f, 0}^{(2)} + ϵ w_{f, 1}^{(2)} + ϵ^{2} w_{f, 2}^{(2)} \\ T & = T_{0} + ϵ^{2} T_{2} \\ w_{T}^{(1)} & = w_{T, 0}^{(1)} + ϵ w_{T, 1}^{(1)} + ϵ^{2} w_{T, 2}^{(1)} \\ w_{T}^{(2)} & = w_{T, 0}^{(2)} + ϵ w_{T, 1}^{(2)} + ϵ^{2} w_{T, 2}^{(2)} \\ H_{T} & = H_{T, 0} + ϵ^{2} H_{T, 2}, \end{matrix}

and plug this in to (54). At order

O (1)

, we obtain

\begin{matrix} \{\begin{matrix} 0 = 12 σ_{*} T_{0}^{2} w_{T, 0}^{(1)} w_{T, 0}^{(2)} + 4 σ_{*} T_{0}^{3} H_{T, 0} + 4 σ_{1} T_{0}^{3} w_{T, 0}^{(2)} + 4 σ_{2} T_{0}^{3} w_{T, 0}^{(1)} - σ_{*} H_{f, 0} - σ_{1} w_{f, 0}^{(2)} - σ_{2} w_{f, 0}^{(1)} \\ 0 = 12 σ_{*} T_{0}^{2} w_{T, 0}^{(1)} w_{T, 0}^{(2)} + 4 σ_{*} T_{0}^{3} H_{T, 0} + 4 σ_{1} T_{0}^{3} w_{T, 0}^{(2)} + 4 σ_{2} T_{0}^{3} w_{T, 0}^{(1)} - σ_{*} H_{ρ, 0} - σ_{1} w_{ρ, 0}^{(2)} - σ_{2} w_{ρ, 0}^{(1)} \end{matrix}, \end{matrix}

subtracting these two equations, we obtain

H_{f, 0} = H_{ρ, 0},

so

H_{f, 0}

loses its velocity dependence. At

O (ϵ)

, we obtain

v \cdot \nabla_{x} H_{f, 0} = - σ_{*} H_{f, 1} - σ_{1} w_{f, 1}^{(2)} - σ_{2} w_{f, 1}^{(1)},

so

H_{f, 1} = - \frac{1}{σ_{*}} v \cdot \nabla_{x} H_{f, 0} - \frac{σ_{1}}{σ_{*}} w_{f, 1}^{(2)} - \frac{σ_{2}}{σ_{*}} w_{f, 1}^{(1)} .

Finally, at

O (ϵ^{2})

, we obtain

\begin{matrix} v \cdot \nabla_{x} H_{f} = & 8 σ_{*} H_{T, 0} H_{T, 2} T_{0}^{3} + 12 σ_{*} {(H_{T, 0})}^{2} T_{0}^{2} T_{2} + 12 σ_{2} T_{0}^{2} T_{2} w_{T, 0}^{(1)} \\ + 4 σ_{2} T_{0}^{3} w_{T, 2}^{(1)} + 12 σ_{1} T_{0}^{2} T_{2} w_{T, 0}^{(2)} \\ + 24 σ_{*} T_{0} T_{2} w_{T, 0}^{(2)} + 12 σ_{*} T_{0}^{2} w_{T, 2}^{(1)} w_{T, 0}^{(2)} + 4 σ_{1} T_{0}^{3} w_{T, 2}^{(2)} \\ + 12 σ_{*} T_{0}^{2} w_{T, 0}^{(1)} w_{T, 2}^{(2)} - σ_{*} H_{f, 2} - σ_{1} w_{f, 2}^{(2)} σ_{*} w_{f, 2}^{(1)} \end{matrix}

(55)

and

\begin{matrix} ∆ H_{T, 0} & = 8 σ_{*} H_{T, 0} H_{T, 2} T_{0}^{3} + 12 σ_{*} {(H_{T, 0})}^{2} T_{0}^{2} T_{2} + 12 σ_{2} T_{0}^{2} T_{2} w_{T, 0}^{(1)} + 4 σ_{2} T_{0}^{3} w_{T, 2}^{(1)} + 12 σ_{1} T_{0}^{2} T_{2} w_{T, 0}^{(2)} \\ + 24 σ_{*} T_{0} T_{2} w_{T, 0}^{(2)} + 12 σ_{*} T_{0}^{2} w_{T, 2}^{(1)} w_{T, 0}^{(2)} + 4 σ_{1} T_{0}^{3} w_{T, 2}^{(2)} + 12 σ_{*} T_{0}^{2} w_{T, 0}^{(1)} w_{T, 2}^{(2)} \\ - σ_{*} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} H_{f, 2} d v - σ_{1} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} w_{f, 2}^{(2)} d v - σ_{2} \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} w_{f, 2}^{(1)} d v . \end{matrix}

Integrating equation (55) over velocity, subtracting (56), and using Green’s theorem, we obtain

\nabla \cdot (\frac{1}{σ_{*}} \nabla H_{f, 0}) - ∆ H_{T, 0} = \frac{1}{| S^{d - 1} |} \int_{S^{d - 1}} v \cdot \nabla_{x} (\frac{σ_{1}}{σ_{*}} w_{f, 1}^{(2)} - \frac{σ_{2}}{σ_{*}} w_{f, 1}^{(1)}) d v .

Since

H_{f, 0}

has trivial boundary condition, its contribution drops out.

\begin{matrix} G_{j k}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] & = \int_{Γ_{+} (x_{k})} (H_{f, 0} (x; ϕ_{j}) + ϵ H_{f, 1} (x, v; ϕ_{j}) + ϵ^{2} H_{f, 2} (x, v; ϕ_{j})) v \cdot n d v \\ = ϵ \int_{Γ_{+} (x_{k})} H_{f, 1} (x, v; ϕ_{j}) v \cdot n d v + O (ϵ^{2}); \end{matrix}

therefore, with the boundedness of H from from Proposition 10, we have

G_{j k}^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = O (ϵ)

. □

Corollary 2.

For

σ_{*} \in B_{σ}

and admissible variation

\hat{σ}

, the diagonal elements

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}]

of the Hessian of the posterior distribution based on the nonlinear RTE satisfies:

A^{″} (σ_{*}) [\hat{σ}, \hat{σ}] = O (ϵ) .

This is a direct corollary from Propositions 3, 7, and 11.

6. Research Results and Discussion

As is well known, the radiative transfer equation, in the high-energy regime where the scattering is dominating, is well approximated by the diffusion equation. This asymptotic relation constitutes a major part in model reduction for the forward setting. On the contrary, it has an adverse effect for the inverse problem. In particular, the reconstruction becomes increasingly unstable as we approach the diffusive regime. This relation has been investigated numerically in [1,15,16] and analytically in [17], where a full data to measurement map, named the Albedo operator, is assumed to be given.

However, in practice, only partial data are available, and the previously obtained well-posedness results are no longer feasible. We focus our investigation in this paper for this scenario, using Bayesian inference as the basic tool. We have proposed two new measures to characterize the stability and its deterioration in the diffusion regime. One is the KL divergence between the posterior and prior distribution, which implies the overall information gain from the data. We show that the information gain is less in the small Knudsen number regime. The other is the Hessian of the posterior distribution, which is related to the mean square error of the posterior and quantifies the uncertainty of the optimizer; therefore, it serves as a local measure around the MAP. We find that the posterior distribution is more and more “flat” and thus carries no useful information when the diffusion effect is strong. Both measures are taken for the linear RTE and are extended to, for the first time in the literature, treat the nonlinear RTE as well.

Although the paper is completely theoretical, it gives guidance on conducting numerical experiments. The discussion in this paper essentially suggests that the problem is intrinsically bad in the diffusion regime and thus rules out all possible algorithms that could potentially deliver a good numerical recovery.

There are some natural follow-up questions that need to be answered in the near future. One task is to clearly identify the number and the quality of the measurements for a unique reconstruction. Suppose the reconstructed function is represented by an N dimensional vector; then, how many experiments and measurements exactly are needed for a unique reconstruction, and where should the source and detector be located? The answer to this question for EIT (electrical impedance tomography) was given in [26], but one needs to tune the process to fit the situation for the radiative transfer equation used in the current paper. Another question concerns the numerical stability. While it is true that the information gain deteriorates as

ϵ

becomes small, there might be numerical approaches that gradually incorporate information from high-energy to low-energy photons. One possible strategy is to use a large number of experimental measurements conducted with low-energy photons to build a rough initial guess as a “warm start” before adding in information obtained at a high-energy level. The warm start given by the low-energy level results serves as a good initial guess toward a final convergence. The process is parallel to Bayesian tempering, which saw some good developments in the past few years [34,35], and the inverse scattering problem that combines information from multiple frequencies [36]. Another approach is to use a hybrid image modality, such as photoacoustic tomography. where acoustic measurement generated by the photoacoustic effect is used to infer the optimal properties of the media. In this case, an improved stability is anticipated [37,38].

Numerical Evidence

Here, we conduct two numerical tests to further support our theoretical findings. In particular, we consider the following linear radiative transfer equation:

\begin{matrix} \{\begin{matrix} ϵ v \cdot \nabla_{x} f (x, v) = σ_{s} (x) L f (x, v), x \in {[0, 0.6]}^{2}, v \in S^{1}, \\ f |_{Γ_{-}} = ϕ . \end{matrix} \end{matrix}

(56)

The data are prepared as follows. For the i-th experiment, we prescribe incoming data

ϕ_{i}

and measure the output intensity f at location j on

Γ_{+}

, which is denoted as

d_{i, j}

. Here,

ϕ

and detect location j are chosen randomly, and

N_{I} = 44

number of experiments are conducted with

N_{J} = 264

number of receivers for each experiment. These data pairs

{ϕ_{i}, {d_{i, j}}_{j = 1}^{N_{J}}}_{i = 1}^{N_{I}}

are kept fixed for various choices of Knudsen number

ϵ

for a fair comparison.

In the first test, we assume

σ_{s} (x)

has the form

σ_{s} (x) = 1 + σ N ((0.3, 0.3), Σ),

(57)

where

N (0, Σ)

is a two-dimensional normal distribution with mean zero and a fixed covariance matrix

Σ

. The parameter

σ

is the parameter to be inferred. We assume the ground-truth

σ = 0.5

, and the ground-truth medium is plotted on the left of Figure 1. The inverse problem aims at inferring this ground-truth from the prescribed data pairs. To illustrate the stability of the inverse problem, we show how, for different choices of

σ

away from the truth, the measurements differ from the true measurements. More precisely, we denote

d_{p} (σ) = {(\sum_{i, j} {| d_{i, j} (σ) - d_{i, j}^{true} |}^{2})}^{1 / 2}

(58)

as the perturbation in the measurement, where

d_{i, j}

is the measurement with a fixed

σ

for the i-th experiment and j-th measurement, and

d_{i, j}^{true}

is the measurement with true

σ = 0.5

. In the right panel of Figure 1, we plot the difference in measurement data (58) versus the perturbation in

σ

for three different

ϵ

. It is clear that with smaller

ϵ

, the difference in measurement becomes more indistinguishable, which means that a smaller perturbation in measurement can lead to larger deviation in reconstruction. This is in consistent with our theory on the stability deterioration with decreasing

ϵ

.

In the second test, we consider a different form of

σ_{s} (x)

:

σ_{s} (x) = 1 + σ_{1} N ([0.2, 0.2], Σ) + σ_{2} N ([0.45, 0.45], Σ) .

(59)

Here, the to-be-reconstructed parameters are

σ_{1}

and

σ_{2}

. We choose

σ_{1} = 0.5

and

σ_{2} = 0.3

to be the ground-truth and display the medium in Figure 2. Similar to the previous case, we plot the difference in measurement

d_{p} (σ_{1}, σ_{2}) = {(\sum_{i, j} {| d_{i, j} (σ_{1}, σ_{2}) - d_{i, j}^{true} |}^{2})}^{1 / 2}

(60)

with respect to

σ_{1} \in (0, 1)

and

σ_{2} \in (0, 0.6)

. It is again evident that for smaller

ϵ

,

d_{p}

becomes flatter, leading to a deterioration in stability, as shown in Figure 2. In addition, we identify the data that are above

0.05 \times {max}_{i, j} {d_{i, j}}

in Figure 3. Here, the horizontal axis is the index for the experiment, and the vertical axis is the index for the receiver. It is seen that a larger

ϵ

gives a sparse dataset, whereas small

ϵ

gives a dense dataset, meaning that all receivers receive some information about the source. This indicates that the data are more spread out for smaller

ϵ

, which makes the inverse problem harder to solve.

7. Conclusions

In this paper, we examine the stability deterioration for the multiscale inverse radiative transfer equation (RTE) in the Bayesian framework. Even though the instability on the continuous level was discussed in the literature [31] and the multiscale convergence was shown in [16,17,32], the instability representation in the Bayesian framework was open. The current paper constitutes the first result that fills the gap. The results presented suggest that one should not use Bayesian inference for conducting optical tomography when the photon energy is low. The numerical algorithm, without fine tuning, would carry very limited use in reconstructing the ground-truth media.

Author Contributions

Conceptualization: Q.L.; methodology, Q.L. and L.W.; validation, Q.L. and L.W.; formal analysis, K.N. and L.W.; investigation, Q.L. and L.W.; resources, Q.L. and L.W.; writing—original draft preparation, K.N.; writing—review and editing, Q.L. and L.W.; visualization, Q.L.; supervision, Q.L.; project administration, Q.L.; funding acquisition, Q.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

Q.L. acknowledges support from Vilas Early Career award. The research is supported in part by NSF via grant DMS-1750488 and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin Madison with funding from the Wisconsin Alumni Research Foundation. K.N. acknowledges support from NSF grant DMS-1750488. L.W. acknowledges support from NSF via grant DMS-1846854.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Arridge, S.R. Optical tomography in medical imaging. Inverse Probl. 1999, 15, R41–R93. [Google Scholar] [CrossRef] [Green Version]
Case, K.M.; Zweifel, P.F. Linear Transport Theory; Addison-Wesley Publishing Company: Boston, MA, USA, 1967. [Google Scholar]
Case, K.; Zweifel, P. Existence and uniqueness theorems for the neutron transport equation. J. Math. Phys. 1963, 4, 1376–1385. [Google Scholar] [CrossRef] [Green Version]
Egger, H.; Schlottbom, M. Numerical methods for parameter identification in stationary radiative transfer. Comput. Optim. Appl. 2015, 62, 67–83. [Google Scholar] [CrossRef] [Green Version]
Egger, H.; Schlottbom, M. A mixed variational framework for the radiative transfer equation. Math. Model. Methods Appl. Sci. 2012, 22, 1150014. [Google Scholar] [CrossRef]
Dautray, R.; Lions, J.L. Mathematical Analysis and Numerical Methods for Science and Technology: Volume 1 Physical Origins and Classical Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
Choulli, M.; Stefanov, P. An inverse boundary value problem for the stationary transport equation. Osaka J. Math. 1999, 36, 87–104. [Google Scholar]
Stefanov, P.; Tamasan, A. Uniqueness and non-uniqueness in inverse radiative transfer. Proc. Am. Math. Soc. 2009, 137, 2335–2344. [Google Scholar] [CrossRef] [Green Version]
Wang, J. Stability estimates of an inverse problem for the stationary transport equation. Ann. Inst. Henri Poincaré Phys. Théor. 1999, 70, 473–495. [Google Scholar]
Bal, G.; Langmore, I.; Monard, F. Inverse transport with isotropic sources and angularly averaged measurements. Inverse Probl. Imaging 2008, 2, 23–42. [Google Scholar] [CrossRef]
Bal, G.; Jollivet, A. Generalized stability estimates in inverse transport theory. Inverse Probl. Imaging 2018, 12, 59–90. [Google Scholar] [CrossRef] [Green Version]
Bal, G. Inverse transport theory and applications. Inverse Probl. 2009, 25, 48. [Google Scholar] [CrossRef] [Green Version]
Bardos, C.; Santos, R.; Sentis, R. Diffusion Approximation and Computation of the Critical Size. Trans. Am. Math. Soc. 1984, 284, 617–649. [Google Scholar] [CrossRef]
Bensoussan, A.; Lions, J.L.; Papanicolaou, G. Boundary layers and homogenization of transport processes. Publ. Res. Inst. Math. Sci. 1979, 15, 53–157. [Google Scholar] [CrossRef] [Green Version]
Hielscher, A.; Alcouffe, R.; Barbour, R. Comparison of finite-difference transport and diffusion calculations for photon migration in homogeneous and heterogeneous tissues. Phys. Med. Biol. 1998, 43, 1285–1302. [Google Scholar] [CrossRef]
Chen, K.; Li, Q.; Wang, L. Stability of stationary inverse transport equation in diffusion scaling. Inverse Probl. 2018, 34, 025004. [Google Scholar] [CrossRef] [Green Version]
Lai, R.Y.; Li, Q.; Uhlmann, G. Inverse Problems for the Stationary Transport Equation in the Diffusion Scaling. SIAM J. Appl. Math. 2019, 79, 2340–2358. [Google Scholar] [CrossRef]
Nusken, N.; Reich, S.; Rozdeba, P.J. State and Parameter Estimation from Observed Signal Increments. Entropy 2019, 21, 505. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Jiang, S.W.; Harlim, J. Parameter Estimation with Data-Driven Nonparametric Likelihood Functions. Entropy 2019, 21, 559. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Biegler, L.T.; Ghattas, O.; Heinkenschloss, M.; van Bloemen Waanders, B. Large-Scale PDE-Constrained Optimization: An Introduction. In Large-Scale PDE-Constrained Optimization; Biegler, L.T., Heinkenschloss, M., Ghattas, O., van Bloemen Waanders, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 3–13. [Google Scholar]
Rees, T.; Dollar, H.S.; Wathen, A.J. Optimal Solvers for PDE-Constrained Optimization. SIAM J. Sci. Comput. 2010, 32, 271–298. [Google Scholar] [CrossRef] [Green Version]
De los Reyes, J. Numerical PDE-Constrained Optimization; SpringerBriefs in Optimization; Springer International Publishing: Cham, Switzerland, 2015. [Google Scholar]
Idier, J. Bayesian Approach to Inverse Problems; ISTE Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
Stuart, A.M. Inverse problems: A Bayesian perspective. Acta Numer. 2010, 19, 451–559. [Google Scholar] [CrossRef] [Green Version]
Dashti, M.; Stuart, A.M. The Bayesian Approach to Inverse Problems. In Handbook of Uncertainty Quantification; Ghanem, R., Higdon, D., Owhadi, H., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 311–428. [Google Scholar]
Bui-Thanh, T.; Li, Q.; Zepeda-Núñez, L. Bridging and Improving Theoretical and Computational Electric Impedance Tomography via Data Completion. arXiv 2021, arXiv:2105.00554. [Google Scholar]
Wang, J.; Li, M.; Cheng, J.; Guo, Z.; Li, D.; Wu, S. Exact reconstruction condition for angle-limited computed tomography of chemiluminescence. Appl. Opt. 2021, 60, 4273–4281. [Google Scholar] [CrossRef]
Harris, I. Direct Sampling for Recovering Sound Soft Scatterers from Point Source Measurements. Computation 2021, 9, 120. [Google Scholar] [CrossRef]
Li, Q.; Sun, W. Applications of kinetic tools to inverse transport problems. Inverse Probl. 2020, 36, 035011. [Google Scholar] [CrossRef] [Green Version]
Hellmuth, K.; Klingenberg, C.; Li, Q.; Tang, M. Multiscale Convergence of the Inverse Problem for Chemotaxis in the Bayesian Setting. Computation 2021, 9, 119. [Google Scholar] [CrossRef]
Li, Q.; Newton, K. Diffusion Equation-Assisted Markov Chain Monte Carlo Methods for the Inverse Radiative Transfer Equation. Entropy 2019, 21, 291. [Google Scholar] [CrossRef] [Green Version]
Newton, K.; Li, Q.; Stuart, A.M. Diffusive optical tomography in the Bayesian framework. Multiscale Model. Simul. 2020, 18, 589–611. [Google Scholar] [CrossRef] [Green Version]
Klar, A.; Schmeiser, C. Numerical passage from radiative heat transfer to nonlinear diffusion models. Math. Model. Methods Appl. Sci. 2001, 11, 749–767. [Google Scholar] [CrossRef]
Chandra, R.; Azam, D.; Kapoor, A.; Müller, R.D. Surrogate-assisted Bayesian inversion for landscape and basin evolution models. Geosci. Model Dev. 2020, 13, 2959–2979. [Google Scholar] [CrossRef]
Latz, J.; Madrigal-Cianci, J.P.; Nobile, F.; Tempone, R. Generalized parallel tempering on Bayesian inverse problems. Stat. Comput. 2021, 31, 1–26. [Google Scholar] [CrossRef]
Bao, G.; Li, P.; Lin, J.; Triki, F. Inverse scattering problems with multi-frequencies. Inverse Probl. 2015, 31, 093001. [Google Scholar] [CrossRef] [Green Version]
Rabanser, S.; Neumann, L.; Haltmeier, M. Stochastic proximal gradient algorithms for multi-source quantitative photoacoustic tomography. Entropy 2018, 20, 121. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Frederick, C.; Ren, K.; Vallélian, S. Image Reconstruction in Quantitative Photoacoustic Tomography with the Simplified P_2 Approximation. SIAM J. Imaging Sci. 2018, 11, 2847–2876. [Google Scholar] [CrossRef]

Figure 1. Test 1. (Left): ground truth (57) with

σ = 0.5

. (Right): perturbation in measured data versus perturbation in

σ

.

Figure 1. Test 1. (Left): ground truth (57) with

σ = 0.5

. (Right): perturbation in measured data versus perturbation in

σ

.

Figure 2. Test 2: The panel on the left shows the ground-truth medium with

σ_{1} = 0.5

and

σ_{2} = 0.3

in (59). The two plots on the right show, for different

ϵ

, the perturbation in the measured data versus the perturbation in

σ_{1, 2}

.

Figure 2. Test 2: The panel on the left shows the ground-truth medium with

σ_{1} = 0.5

and

σ_{2} = 0.3

in (59). The two plots on the right show, for different

ϵ

, the perturbation in the measured data versus the perturbation in

σ_{1, 2}

.

Figure 3. Test 2. Visualization of measured data whose intensity value is above

0.05 \times {max}_{i, j} {d_{i, j}}

.

Figure 3. Test 2. Visualization of measured data whose intensity value is above

0.05 \times {max}_{i, j} {d_{i, j}}

.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, Q.; Newton, K.; Wang, L. Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime. Computation 2022, 10, 15. https://doi.org/10.3390/computation10020015

AMA Style

Li Q, Newton K, Wang L. Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime. Computation. 2022; 10(2):15. https://doi.org/10.3390/computation10020015

Chicago/Turabian Style

Li, Qin, Kit Newton, and Li Wang. 2022. "Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime" Computation 10, no. 2: 15. https://doi.org/10.3390/computation10020015

APA Style

Li, Q., Newton, K., & Wang, L. (2022). Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime. Computation, 10(2), 15. https://doi.org/10.3390/computation10020015

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime

Abstract

1. Introduction

2. Bayesian Formulation Basics

3. Global View Example 1: Linear Radiative Transfer Equation

3.1. First Derivative of the Parameter-to-Solution Map

3.2. Dependence on Knudsen Number

4. Global View Example 2: Nonlinear Radiative Transfer Equation

4.1. First Derivative of the Parameter-to-Solution Map

4.2. Dependence on Knudsen Number

5. Local Behavior around the MAP Point

5.1. Linear RTE

5.2. Nonlinear RTE

6. Research Results and Discussion

Numerical Evidence

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI