Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method

Drmač, Zlatko; Mezić, Igor; Mohr, Ryan

doi:10.3390/math9172075

Open AccessArticle

Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method

by

Zlatko Drmač

^1,*

,

Igor Mezić

² and

Ryan Mohr

³

¹

Department of Mathematics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia

²

Department of Mechanical Engineering and Mathematics, University of California, Santa Barbara, CA 93106, USA

³

AIMdyn, Inc., Santa Barbara, CA 93101, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2021, 9(17), 2075; https://doi.org/10.3390/math9172075

Submission received: 30 June 2021 / Revised: 10 August 2021 / Accepted: 17 August 2021 / Published: 27 August 2021

(This article belongs to the Special Issue Dynamical Systems and Operator Theory)

Download

Browse Figures

Versions Notes

Abstract

:

Inferring the latent structure of complex nonlinear dynamical systems in a data driven setting is a challenging mathematical problem with an ever increasing spectrum of applications in sciences and engineering. Koopman operator-based linearization provides a powerful framework that is suitable for identification of nonlinear systems in various scenarios. A recently proposed method by Mauroy and Goncalves is based on lifting the data snapshots into a suitable finite dimensional function space and identification of the infinitesimal generator of the Koopman semigroup. This elegant and mathematically appealing approach has good analytical (convergence) properties, but numerical experiments show that software implementation of the method has certain limitations. More precisely, with the increased dimension that guarantees theoretically better approximation and ultimate convergence, the numerical implementation may become unstable and it may even break down. The main sources of numerical difficulties are the computations of the matrix representation of the compressed Koopman operator and its logarithm. This paper addresses the subtle numerical details and proposes a new implementation algorithm that alleviates these problems.

Keywords:

infinitesimal generator; Koopman operator; matrix logarithm; nonlinear system identification; preconditioning; Rayleigh quotient

1. Introduction

Suppose that we have an autonomous dynamical system

\dot{x} (t) = F (x (t)) \equiv (\begin{matrix} F_{1} (x (t)) \\ ⋮ \\ F_{n} (x (t)) \end{matrix}), x (t) \in R^{n},

(1)

that is accessible only through snapshots from a sequence of trajectories with different (possibly unknown) initial conditions. More precisely,

(x_{k}, y_{k}) \in R^{n} \times R^{n}, k = 1, \dots, K, where y_{k} = φ^{t} (x_{k})

(2)

is the flow associated with (1). In a real application, t is a fixed time step, and it is possible that the time resolution precludes any approach based on estimating the derivatives; the dataset could also be scarce, sparsely collected from several trajectories/short bursts of the dynamics under study. The task is to identify

F

and express it analytically, using a suitably chosen class of functions. This is the essence of data-driven system identification, which is a powerful modeling technique in applied sciences and engineering—for a review see e.g., [1]. Approaches like that of SINDy [2] rely on numerical differentiation, which is a formidable task in cases of scarce and/or noisy data, and it requires special techniques such as e.g., total-variation regularization [3] or e.g., weak formulation [4]. With an appropriate ansatz (e.g., physics-informed) on the structure of the right hand side in (1), the identification process is computationally executed as sparse regression, see e.g., [2,5,6]. An alternative approach is machine learning techniques such as physics-informed neural networks, which proved a powerful tool for learning nonlinear partial differential equations [7].

Recently, Mauroy and Goncalves [8] proposed an elegant method for learning

F

from the data, based on the semigroup

U^{t} f = f \circ φ^{t}

of Koopman operators acting on a space of suitably chosen scalar observables

f \in F

. In the case of the main method proposed in [8],

F

is the space

L^{2} (X)

, where

X \subset R^{n}

is compact, forward invariant, and big enough to contain all data snapshots. The method has two main steps. First, a compression of

U^{t}

onto a suitable finite dimensional but rich enough subspace

F_{N}

of

F

is computed. On a convenient basis

B = {℘_{1}, \dots, ℘_{N}}

of

F_{N}

, having only a limited number of snapshots (2), this compression is executed in the algebraic (discrete) least squares framework, yielding the matrix representation

U_{N} \in R^{N \times N}

of

U^{t}

.

It can be shown that

U_{N}

is an approximation of the matrix exponential

U_{N} \approx e^{L_{N} t}

, or equivalently

L_{N} \approx (1 / t) log U_{N}

, where

L_{N}

is a finite dimensional compression of the infinitesimal generator

L

defined by

L f = lim_{t \to 0^{+}} \frac{U^{t} f - f}{t}, f \in D (L) .

(3)

Note that the infinitesimal generator is well-defined (on its domain

D (L) \subset L^{2} (X)

) since the Koopman semigroup of operators is strongly continuous in

L^{2} (X)

(i.e.,

{lim}_{t \to 0^{+}} ∥ U^{t} f - f ∥ = 0

, where

∥ \cdot ∥

denotes the

L^{2}

norm on X). We refer to [9] for more details on semigroups of operators and their properties, and to [10] for the theory and applications of the Koopman operator.

In the second step, the vector field is recovered by using the fact that

L

can also be expressed as (see e.g., [11])

L f = F \cdot \nabla f = \sum_{i = 1}^{n} F_{i} \frac{\partial f}{\partial x_{i}}, f \in D (L) .

(4)

If

F_{i} = \sum_{k} ϕ_{k i} ℘_{k}

, then the action of

L

to the basis’s vectors

℘_{k}

can be computed, using (4), by straightforward calculus, and its matrix representation will, by comparison with

(1 / t) log U_{N}

, reveal the coefficients

ϕ_{k i}

. Of course, in real applications, the quality of the computed approximations will heavily depend on the information content in the supplied data snapshots. Finer sampling (smaller time resolution) and more trajectories with different initial data are certainly desirable. If

N > K

, then a modification, called a dual method, is based on the logarithm of a

K \times K

matrix

U_{K}

. Mauroy and Goncalves [8] proved the convergence (with probability one as

t \to 0

,

N \to \infty

,

K \to \infty

) and illustrated the performances of the method (including the dual formulation) on a series of numerical examples.

In this paper, we consider numerical aspects of the method and study its potential as a robust software tool. Our main thesis is that a seemingly simple implementation based on the off-the-shelf software components has certain limitations. For instance, with the increased dimension that guarantees theoretically better approximation and ultimate convergence, the numerical implementation, due to severe ill-conditioning, may become unstable and eventually it may break down. In other words, it can happen that with more data the potentially better approximation does not materialize in the computation. This is an undesirable chasm between analytical properties and numerical finite precision realization of the method. Hence, more numerical analysis work is needed before the method becomes mature enough to be implemented as a robust and reliable software tool.

Contributions and Organisation of the Paper

We identify, analyze and resolve the two main sources of numerical instability of the original method [8]. First, for a given time lag t, numerical computation of the matrix representation

U_{N}

of a compression of the Koopman operator

U^{t}

on a (finite) N-dimensional subspace may not be computed accurately enough. Secondly, even if computed accurately, the matrix

U_{N}

may be so ill-conditioned as to preclude stable computation of the logarithm. Both issues are analyzed in detail, and we propose a new numerical algorithm that implements the method.

This material can be considered a numerical supplement to [8], and as an instructive case study for numerical software development. Moreover, the techniques developed here are used in the computational part of the recently developed framework [12]. The infinitesimal generator approach has also been successfully used for learning stochastic models from aggregated trajectory data [13,14], as well as for inverse modelling of Markov jump processes [15]. The stochastic framework is not considered in this paper, but its results apply to systems defined by stochastic differential equations as described in [8]. The stochastic setting contains many challenging problems and requires sophisticated tools, based e.g., on the Kolmogorov backward equation [8], the forward and adjoint Fokker–Planck equations [16,17,18], or the Koopman operator fitting [19].

The rest of paper is organized as follows. In Section 2 we set the stage and setup a numerical linear algebra framework for the analysis of subtle details related to numerical implementation of the method. This is standard, well known material and it is included for the reader’s convenience and to introduce necessary notation. In particular, we give detailed description of a finite dimensional compression (in a discrete least squares sense) of the Koopman operator (Section 2.1), and we review the basic properties of the matrix logarithm (Section 2.3). Then, in Section 3, we review the details of the Mauroy–Goncalves method, including a particular choice of the monomial basis

B

and in Section 3.2 the details on the corresponding matrix representation of the generator (4). A case study example that illustrates the problem of numerical ill-conditioning is provided in Section 4. In Section 5, we propose a preconditioning step that allows for a more accurate computation of the logarithm

log U_{N}

in the case

K \geq N

. In Section 6 we consider the dual method for the case

N > K

, and formulate it as a compression of

U

onto a particular K dimensional subspace of

F_{N}

. This formulation is then generalized in Section 7, where we introduce a new algorithm that (out of a given N) selects a prescribed number of (at most K) basis functions that are most linearly independent, as seen on the discrete set of data snapshots. The proposed algorithm, designated as basis pruning, can be used in both the dual and the main method, and it can be combined with the preconditioning introduced in Section 5.

2. Preliminaries: Finite Dimensional Compression of $U$ and Its Logarithm

To set up the stage, in Section 2.1 we first describe matrix representation

U_{N}

of the Koopman operator compressed to a finite dimensional subspace

F_{N} \subset F

. Some details of the numerical computation of

U_{N}

are discussed in Section 2.2. In Section 2.3, we briefly review the matrix logarithm from the numerical linear algebra perspective.

2.1. Compression of $U^{t}$ and the Anatomy of Its Matrix Representation

Given

U^{t} : F ⟶ F

and an N-dimensional subspace

F_{N} \subset F

, we want to compress

U^{t}

to

F_{N}

and to work with a finite dimensional approximation

Φ_{N} U_{| F_{N}}^{t} : F_{N} ⟶ F_{N}

, where

Φ_{N} : F ⟶ F_{N}

is an appropriate projection. The subspace

F_{N}

contains functions that are simple for computation, but it is rich enough to provide good approximations for the functions in

F

; it will be materialized through an ordered basis

B = {℘_{1}, \dots, ℘_{N}}

. If an

f \in F_{N}

is expressed as

f = \sum_{i = 1}^{N} f_{i} ℘_{i}

, then the coordinates of f in the basis

B

are written as

{[f]}_{B} = {(\begin{matrix} f_{1}, \dots, f_{N} \end{matrix})}^{T}

. The ambient space

F

is equipped with the Hilbert space structure.

2.1.1. Discrete Least Squares Projection $Φ_{N} : F ⟶ F_{N}$

Since in a data-driven setting the function is known only at the points

x_{k}

, an operator compression will be defined using discrete least squares projection. For

g \in F

, the projection

Φ_{N} g = \sum_{i = 1}^{N} {\hat{g}}_{i} ℘_{i} \in F_{N}

of g is defined so that the

{\hat{g}}_{i}

’s solve the problem

\frac{1}{K} \sum_{k = 1}^{K} ω_{k}^{2} ∥ (Φ_{N} g) (x_{k}) - g (x_{k}) ∥_{2}^{2} = \frac{1}{K} \sum_{k = 1}^{K} ω_{k}^{2} {∥ \sum_{i = 1}^{N} {\hat{g}}_{i} ℘_{i} (x_{k}) - g (x_{k}) ∥}_{2}^{2} ⟶ min_{{\hat{g}}_{i}},

(5)

where

ω_{k} \geq 0

is a weight attached to each sample

x_{k}

, and

{∥ \cdot ∥}_{2}

is the Euclidean norm. This is a

L^{2}

residual with respect to the empirical measure

δ_{K}

defined as the sum of the Dirac measures concentrated at the

x_{k}

’s,

δ_{K} = (1 / K) \sum_{k = 1}^{K} δ_{x_{k}}

. The weighting will be important in the case of noisy data; it can also be used in connection with a quadrature formula so that (5) mimics a continuous norm (defined by an integral) approximation in

F

. In the unweighted case

ω_{k} = 1

for all k. The objective function (5) can be written as

{∥ W^{\frac{1}{2}} [(\begin{matrix} ℘_{1} (x_{1}) & \dots & ℘_{N} (x_{1}) \\ ⋮ & \dots & ⋮ \\ ℘_{1} (x_{K}) & \dots & ℘_{N} (x_{K}) \end{matrix}) (\begin{matrix} {\hat{g}}_{1} \\ ⋮ \\ {\hat{g}}_{N} \end{matrix}) - (\begin{matrix} g (x_{1}) \\ ⋮ \\ g (x_{K}) \end{matrix})] ∥}_{2}^{2} \equiv {∥ W^{\frac{1}{2}} [O_{X} {({\hat{g}}_{i})}_{i = 1}^{N} - {(g (x_{k}))}_{k = 1}^{K}] ∥}_{2}^{2},

where

W = diag {(ω_{k}^{2})}_{k = 1}^{K}

,

{(O_{X})}_{i j} = ℘_{j} (x_{i})

. More generally, W can be a suitable positive definite (e.g., inverse of the noise covariance) matrix. In a numerical computation the positive definite square root

W^{1 / 2}

is replaced, equivalently, with the Cholesky factor: if

W = L L^{T}

is the Cholesky factorization with the unique lower triangular factor L, then

Φ = H^{1 / 2} L^{- T}

must be orthogonal and when we replace

W^{1 / 2}

with

Φ L^{T}

, we can omit

Φ

because the norm in the above objective function is invariant under orthogonal transformations.

At this point we assume that the

K \times N

matrix

O_{X}

is of full column rank N; this requires

K \geq N

. (The rank deficient case will be discussed later.) This full column rank assumption yields the unique least squares solution

(\begin{matrix} {\hat{g}}_{1} \\ ⋮ \\ {\hat{g}}_{N} \end{matrix}) = {(O_{X}^{T} W O_{X})}^{- 1} O_{X}^{T} W (\begin{matrix} g (x_{1}) \\ ⋮ \\ g (x_{K}) \end{matrix}) = {[Φ_{N} g]}_{B},

(6)

which defines the projection

Φ_{N}

. If

g \in F_{N}

, then

{[Φ_{N} g]}_{B} = {[g]}_{B}

. In the unweighted case,

W = I_{K}

, we have

{[Φ_{N} g]}_{B} = O_{X}^{†} {(g (x_{i}))}_{i = 1}^{K}

.

2.1.2. Matrix Representation of $Φ_{N} U_{| F_{N}}^{t} : F_{N} ⟶ F_{N}$

To describe the action of

U^{t}

in

F_{N}

, we first consider how it changes the basis vectors:

U^{t} ℘_{i}

can be split as a sum of a component belonging to

F_{N}

, and a residual,

(U^{t} ℘_{i}) (x) = ℘_{i} (φ^{t} (x)) = \sum_{j = 1}^{N} u_{j i} ℘_{j} (x) + ρ_{i} (x) .

Given the limited information (only the snapshot pairs

(x_{k}, y_{k})

), the coefficients

u_{j i}

are determined so that the residual

ρ_{i}

is small over the data

x_{k}

. Since

φ^{t} (x_{k}) = y_{k}

, we have

ρ_{i} (x_{k}) = ℘_{i} (y_{k}) - \sum_{j = 1}^{N} u_{j i} ℘_{j} (x_{k}), k = 1, \dots, K,

and we can select the

u_{j i}

’s to minimize

\sum_{k = 1}^{K} ω_{k}^{2} {∥ \sum_{j = 1}^{N} u_{j i} ℘_{j} (x_{k}) - ℘_{i} (y_{k}) ∥}_{2}^{2} = {∥ W^{\frac{1}{2}} [(\begin{matrix} ℘_{1} (x_{1}) & \dots & ℘_{N} (x_{1}) \\ ⋮ & \dots & ⋮ \\ ℘_{1} (x_{K}) & \dots & ℘_{N} (x_{K}) \end{matrix}) (\begin{matrix} u_{1 i} \\ ⋮ \\ u_{N i} \end{matrix}) - (\begin{matrix} ℘_{i} (y_{1}) \\ ⋮ \\ ℘_{i} (y_{K}) \end{matrix})] ∥}_{2}^{2} .

(7)

The solution is the projection (see e.g., ([20], §2.1.3, §2.7.4))

(\begin{matrix} u_{1 i} \\ ⋮ \\ u_{N i} \end{matrix}) = {(O_{X}^{T} W O_{X})}^{- 1} O_{X}^{T} W (\begin{matrix} ℘_{i} (y_{1}) \\ ⋮ \\ ℘_{i} (y_{K}) \end{matrix}) \equiv {(O_{X}^{T} W O_{X})}^{- 1} O_{X}^{T} W (\begin{matrix} (U^{t} ℘_{i}) (x_{1}) \\ ⋮ \\ (U^{t} ℘_{i}) (x_{K}) \end{matrix}) = {[Φ_{N} U^{t} ℘_{i}]}_{B} .

Then, for any

f = \sum_{i = 1}^{N} f_{i} ℘_{i} \in F_{N}

, we have

\begin{matrix} g (x) & = & U^{t} f (x) = \sum_{i = 1}^{N} f_{i} [\sum_{j = 1}^{N} u_{j i} ℘_{j} (x) + ρ_{i} (x)] = \sum_{j = 1}^{N} ℘_{j} (x) \sum_{i = 1}^{N} u_{j i} f_{i} + \sum_{i = 1}^{N} f_{i} ρ_{i} (x) \\ = & \sum_{j = 1}^{N} ℘_{j} (x) g_{j} + \sum_{i = 1}^{N} f_{i} ρ_{i} (x), where g_{j} = \sum_{i = 1}^{N} u_{j i} f_{i}, i . e ., (\begin{matrix} g_{1} \\ ⋮ \\ g_{N} \end{matrix}) = U_{N} (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix}) . \end{matrix}

Hence,

Φ_{N} U_{| F_{N}}^{t} : F_{N} ⟶ F_{N}

is on the basis

B

represented by the matrix

\begin{matrix} {[Φ_{N} U_{| F_{N}}^{t}]}_{B} = O_{X}^{†} O_{Y} \equiv U_{N}, \end{matrix}

(8)

\begin{matrix} Φ_{N} U_{| F_{N}}^{t} (\begin{matrix} ℘_{1} (x) & \dots & ℘_{N} (x) \end{matrix}) (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix}) = (\begin{matrix} ℘_{1} (x) & \dots & ℘_{N} (x) \end{matrix}) (U_{N} (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix})), \end{matrix}

(9)

where we use

W = I_{K}

for the sake of technical simplicity, and

O_{X}^{†}

is the Moore–Penrose generalized inverse. If

W \neq I_{K}

, then

{[Φ_{N} U_{| F_{N}}^{t}]}_{B} = {(O_{X}^{T} W O_{X})}^{- 1} O_{X}^{T} W O_{Y} \equiv U_{N}

.

2.1.3. When Is $U_{N}$ Nonsingular?

If both

O_{X}

and

O_{Y}

are of full column rank N, then the rank of

U_{N}

depends on the canonical angles between the ranges of

O_{X}

and

O_{Y}

. Indeed, if

O_{X} = Q_{X} R_{X}

,

O_{Y} = Q_{Y} R_{Y}

are the thin QR factorizations, then

U_{N} = R_{X}^{- 1} Q_{X}^{T} Q_{Y} R_{Y}

, where the singular values of

Q_{X}^{T} Q_{Y}

can be written in terms of the canonical angles

0 \leq θ_{1} \leq \dots \leq θ_{N} \leq π / 2

as

cos θ_{1} \geq \dots \geq cos θ_{N}

. Hence, in the case of full column rank of

O_{X}

and

O_{Y}

, nonsingularity of

U_{N}

is equivalent to

cos θ_{N} > 0

. To visualize this condition,

U_{N}

will be nonsigular if none of the ranges of

O_{X}

and

O_{Y}

contain a direction that is orthogonal to the other one. If the basis function and the flow map are well behaved and the sampling time is reasonable, it is reasonable to expect

θ_{N} < π / 2

.

2.1.4. Relations with the DMD

In the DMD framework,

℘_{1}, \dots, ℘_{N}

are the scalar components of a

N \times 1

vector valued observable evaluated at the sequence of snapshots

x_{1}, \dots, x_{K}

, and it is customary to arrange them in a

N \times K

matrix, i.e.,

O_{X}^{T}

. Similarly, the values of the observables at the

y_{k}

’s are in the matrix

O_{Y}^{T}

. The Exact DMD matrix is then

A = O_{Y}^{T} {(O_{X}^{T})}^{†} = U_{N}^{T}

. For more details on this connection we refer to [21,22].

2.2. On the Numerical Solution of $∥ O_{X} U_{N} - O_{Y} ∥_{F} \to m i n$

In general, the least squares projection

O_{X} U_{N} = O_{X} O_{X}^{†} O_{Y}

is uniquely determined, but, unless

O_{X}

is of full column rank N, the solution

U_{N}

of the problem

∥ O_{X} U_{N} - O_{Y} ∥_{F} \to min

is not unique—each of its columns is from a linear manifold determined by the null-space of

O_{X}

and we can vary them independently by adding to them arbitrary vectors from the null space of

O_{X}

. Furthermore, even when

O_{X}

is of rank N but ill-conditioned, a typical numerical least squares solver will detect the ill-conditioning by revealing that the matrix is close to singular matrices and it will treat it as numerically rank deficient. Then, the computed solution

U_{N}

becomes non-unique, the concrete result depends on the least squares solution algorithm, and it may be rank deficient. This calls for caution when computing

U_{N}

and

log U_{N}

numerically.

We discuss this in Section 2.2.1, and illustrate the problems in practice using a case study example in Section 4.

2.2.1. Least Squares Solution in Case of Numerical Rank Deficiency

If

O_{X}

is not of full column rank N, then

O_{X}^{†} O_{Y}

is one of infinitely many solutions to the least squares problem for the matrix

U_{N}

that is used to represent the operator compression. Furthermore, since it is necessarily singular, its logarithm does not exist and identifying a matrix approximation of the infinitesimal generator is not feasible. This is certainly the case when

K < N

(recall that in this case a dual form of the method is used; see Section 6), and in the case

K \geq N

, considered here, the matrix

O_{X}

can be numerically rank deficient and the software solution will return a solution that depends on a particular algorithm for solving the least squares problem.

Let

O_{X} = Φ Σ Ψ^{T}

be the SVD of

O_{X}

and let r be the rank of

O_{X}

such that

r < min (K, N)

. Let

Σ_{r} = diag {(σ_{i})}_{i = 1}^{r}

, where

σ_{1} \geq \dots \geq σ_{r} > 0

are the nonzero singular values. Partition the singular vector matrices as

Φ = (Φ_{r}, Φ_{0})

,

Ψ = (Ψ_{r}, Ψ_{0})

, where

Φ_{r}

and

Ψ_{r}

have r columns, so that

O_{X} = Φ_{r} Σ_{r} Ψ_{r}^{T}

,

O_{X}^{†} = Ψ_{r} Σ_{r}^{- 1} Φ_{r}^{T}

. Recall that the columns of the

N \times (N - r)

matrix

Ψ_{0}

are an orthonormal basis for the null space of

O_{X}

.

In the rank deficient case, the solution set for the least squares problem

∥ O_{X} U_{N} - O_{Y} ∥_{F} \to {min}_{U_{N}}

is a linear manifold—in this case of the form

N = {O_{X}^{†} O_{Y} + Ψ_{0} Ξ, Ξ \in R^{(N - r) \times N}} .

(10)

Clearly

O_{X} (O_{X}^{†} O_{Y} + Ψ_{0} Ξ) = O_{X} O_{X}^{†} O_{Y} = Φ_{r} Φ_{r}^{T} O_{Y}

.

The particular choice

U_{N} = O_{X}^{†} O_{Y} = Ψ_{r} Σ_{r}^{- 1} Φ_{r}^{T} O_{Y}

, as (8), is distinguished by being of minimal Frobenius norm, because

∥ O_{X}^{†} O_{Y} + Ψ_{0} {Ξ ∥}_{F}^{2} = ∥ O_{X}^{†} O_{Y} ∥_{F}^{2} + {∥ Ξ ∥}_{F}^{2} .

The minimality of the Euclidean norm is one criterion to pinpoint the unique solution and uniquely define the pseudo-inverse. Such minimal norm solution, which is of rank at most r, may indeed be desirable in some applications, but here it is useless if

r < N

, because we cannot proceed with computing

log U_{N}

.

It remains an interesting question whether in the rank deficient cases we can explore the solution set (10) and with some additional constraints define a meaningful matrix representation. In general, a matrix representation should as much as possible reproduce the behaviour of the operator in that subspace. For example, if

O_{X}

is of full column rank N (i.e., we have sufficiently large K and well selected observables) and the first basis function is constant,

℘_{1} (x) \equiv 1

, then the first columns of

O_{X}

and

O_{Y}

are

e = {(1, \dots, 1)}^{T}

and simple argument implies that in (8) the first column of

U_{N}

is the first canonical vector

e_{1} = {(1, \dots, 0)}^{T}

and

{[Φ_{N} U_{| F_{N}}^{t}]}_{B} {[℘_{1}]}_{B} \equiv U_{N} e_{1} = e_{1} = 1 \cdot {[℘_{1}]}_{B},

(11)

which corresponds to

U^{t} ℘_{1} = 1 \cdot ℘_{1}

. If

r < N

(so that

O_{X}

has a nontrivial null space), then we do not expect

U_{N} e_{1} = e_{1}

, but we can show that

N

contains a matrix

{\hat{U}}_{N}

such that

{\hat{U}}_{N} e_{1} = e_{1}

.

Proposition 1.

If

℘_{1} (x) \equiv 1

, then we can choose an

{\hat{U}}_{N} \in N

such that

{\hat{U}}_{N} e_{1} = e_{1}

.

Proof.

To satisfy

(O_{X}^{†} O_{Y} + Ψ_{0} Ξ) e_{1} = e_{1}

,

Ξ

must be such that

Ψ_{0} Ξ e_{1} = e_{1} - U_{N} e_{1}

. Note that

e_{1} - U_{N} e_{1}

is in the null-space of

O_{X}

:

O_{X} (e_{1} - O_{X}^{†} O_{Y} e_{1}) = O_{X} e_{1} - O_{X} O_{X}^{†} O_{Y} e_{1} = e - O_{X} O_{X}^{†} e = e - e = 0 .

Hence,

Ξ (:, 1) = Ψ_{0}^{T} (e_{1} - U_{N} e_{1}) = Ψ_{0}^{T} e_{1}

, and

Ξ (:, 2 : n)

can be set e.g., to zero to obtain

Ξ

of minimal Frobenius norm. Here also

Ξ = Ψ_{0}^{T}

is an interesting choice, but we omit the details because, in this paper, following [8], we treat the rank deficiency by a form of dual method as described in Section 6 and Section 7. □

Remark 1.

The global non-uniqueness in form of the additive term

Ψ_{0} Ξ

is non-essential when instead of

U_{N}

we use its compression in a certain subspace. For instance, if

r = K < N

the Rayleigh quotient of

U_{N}

with respect to the range of

O_{X}^{T}

remains unchanged, see Section 6.

In practice, the least squares solution using the SVD and the formula for the pseudo-inverse are often replaced by a more efficient method based on the column pivoted (rank revealing) QR factorization. For instance, the factorization [23] uses a greedy matrix volume maximizing scheme to determine a permutation

Π

such that in the QR factorization

O_{X} Π = Q R, Q \in R^{K \times N}, Q^{T} Q = I_{N}, R \in R^{N \times N} upper triangular,

(12)

the triangular factor R has a strong diagonal dominance of the form

| R_{i i} | \geq \sqrt{\sum_{k = i}^{j} {| R_{k j} |}^{2}}, 1 \leq i \leq j \leq N .

(13)

If

O_{X}

is of rank

r < N

, then

R (1 : r, 1 : r)

is nonsingular and

R (r + 1 : N, r + 1 : N) = 0

, and the least squares solution of

∥ O_{X} {z - b ∥}_{2} \to {min}_{z}

is computed as

z = Π (\begin{matrix} R {(1 : r, 1 : r)}^{- 1} Q {(1 : K, 1 : r)}^{T} b \\ 0_{N - r, 1} \end{matrix}) .

In a nearly rank deficient (i.e., ill-conditioned) case, the index r is determined so that setting

R (r + 1 : N, r + 1 : N)

to zero introduces error below a tolerance that is of order of machine precision. This is the solution returned by the backslash operator in Matlab. Hence, in the numerical rank deficient cases, the solution will have at least

N - r

zero components, and it will be in general different from the solution obtained using the pseudo-inverse. Here too, some additional constraints can be satisfied under some conditions, as shown in the following:

Proposition 2.

Let

O_{X} Π = Q R

be the column pivoted QR factorization in the first step of the backslash operator when solving the least squares problem

∥ O_{X} U_{N} - O_{Y} ∥_{F} \to min

, where

r = rank (O_{X}) < N

and

O_{X} (:, 1) = O_{Y} (:, 1)

. If the first column of

O_{X}

is selected among the leading k pivotal columns in the permutation matrix Π, then

U_{N} (:, 1) = e_{1}

.

2.3. Computing the Logarithm $log O_{X}^{†} O_{Y}$

The key element of the Mauroy–Goncalves method is that

U_{N} \approx e^{L_{N} t}

, where

L_{N}

is a matrix representation of a compression of the infinitesimal generator (3), that is computed as

L_{N} = (1 / t) log U_{N}

. Hence, to use the matrix

L_{N}

as an ingredient of the identification method, it is necessary that

U_{N} = O_{X}^{†} O_{Y}

is nonsingular; otherwise

log U_{N}

does not exist. Furthermore, to achieve the primary value of the logarithm (as a primary matrix function, i.e., the same branch of the logarithm used in all Jordan blocks), the matrix must not have any real negative eigenvalues. Only under those conditions we can obtain a real (assuming real

U_{N}

) logarithm as the primary function.

For the reader’s convenience, we summarize the key properties of the matrix logarithm and refer to [24] for proofs and more details.

Theorem 1.

Let A be real nonsingular matrix. Then A has real logarithm if and only if A has an even number of Jordan blocks of each size for every negative eigenvalue.

Theorem 2.

Let A be real nonsingular matrix. Then A has a unique real logarithm if and only if all eigenvalues of A are positive real and no eigenvalue has more than one Jordan block in the Jordan normal form of A.

Theorem 3.

Suppose that the

n \times n

complex A has no eigenvalue on

(- \infty, 0]

. Then a unique logarithm of A can be defined with eigenvalues in the strip

{z \in C : - π < ℑ (z) < π}

. It is called the principal logarithm and denoted by

log A

. If A is real, then its principal logarithm is real as well.

In an application, the matrix

U_{N} = O_{X}^{†} O_{Y}

may be difficult to compute accurately and it may be so severely ill-conditioned that in the finite precision computations it could appear numerically rank deficient (recall the discussion in Section 2.2). Thus, from the numerical point of view, the most critical part of the method is computing

U_{N}

and its logarithm. For a detailed analysis of numerical methods for computing the matrix logarithm we refer the reader to ([25], Chapter 11).

3. Identification Method

To introduce the new numerical implementation, we need more detailed description of the method [8] and its concrete realization. In Section 3.1, we select the subspace

F_{N}

as the span of the monomials in n variables. The key idea of the method, to explore the connection

U^{t} = e^{L t}

in the framework of finite dimensional compressions of

U^{t}

and

L

, is reviewed in detail in Section 3.2. This is a rather challenging step, both in terms of theoretical justification of the approximation (including convergence when

N, K \to \infty

,

t \to 0

) and the numerical realization. As an interesting detail, we point out in Section 3.3.1 that a structure aware reconstruction/identification can be naturally formulated for e.g., the important class of quadratic systems. This generates an interesting structured least squares problem.

3.1. The Choice of the Basis $B$ —Monomials

For a concrete application, the choice of a suitable basis depends on the assumption on the structure of

F

. The monomial basis is convenient if

F

is a polynomial field, or if it can be well approximated by polynomials. We assume that

F (x) = (\begin{matrix} \sum_{k = 1}^{N_{F}} ϕ_{k 1} x_{1}^{s_{1}^{(k)}} x_{2}^{s_{2}^{(k)}} \dots x_{n}^{s_{n}^{(k)}} \\ ⋮ \\ \sum_{k = 1}^{N_{F}} ϕ_{k n} x_{1}^{s_{1}^{(k)}} x_{2}^{s_{2}^{(k)}} \dots x_{n}^{s_{n}^{(k)}} \end{matrix}) = (\begin{matrix} F_{1} (x) \\ ⋮ \\ F_{n} (x) \end{matrix}), F_{j} (x) = \sum_{k = 1}^{N_{F}} ϕ_{k j} x^{s^{(k)}},

(14)

where

x^{s^{(k)}} = x_{1}^{s_{1}^{(k)}} x_{2}^{s_{2}^{(k)}} \dots x_{n}^{s_{n}^{(k)}}

are monomials written in multi-index notation and have a total degree of at most

m_{F}

. In that case,

F_{N}

is chosen as the space of polynomials up to some total degree m,

m \geq m_{F}

i.e.,

F_{N} = span (B)

, where

B = {x_{1}^{s_{1}} x_{2}^{s_{2}} \dots x_{n}^{s_{n}} : s_{i} \in N_{0}, s_{1} + s_{2} + \dots + s_{n} \leq m}, N = (\binom{n + m}{n}) \geq N_{F} .

(15)

To facilitate automatic and relatively simple matrix representation of linear operators acting on

F_{N}

, we choose graded lexicographic ordering (grlex) of

B

, which is one of the standard procedures in the multivariate polynomial framework. Grlex orders the basis so that it first divides the monomials in groups with same total degree; the groups are listed with increasing total degree and inside each group the monomials are ordered so that their exponents

s = (s_{1}, \dots, s_{n}) \in N_{0}^{n}

are lexicographically ordered. For example, if

n = 3

,

m = 2

, we have the order as follows (read the tables in (16) column-wise; each column corresponds to the monomials of the same total degree, ordered lexicographically):

[\begin{matrix} 1 (000) & x_{3} (001) & x_{3}^{2} (002) \\ x_{2} (010) & x_{2} x_{3} (011) \\ x_{1} (100) & x_{2}^{2} (020) \\ x_{1} x_{3} (101) \\ x_{1} x_{2} (110) \\ x_{1}^{2} (200) \end{matrix}] (s_{1}, \dots, s_{n}) ⇄ k [\begin{matrix} 1 & 2 & 5 \\ 3 & 6 \\ 4 & 7 \\ 8 \\ 9 \\ 10 \end{matrix}] .

(16)

If we want to emphasize that

s = (s_{1}, \dots, s_{n})

is at the kth position in this ordering, we write

s^{(k)} = (s_{1}^{(k)}, \dots, s_{n}^{(k)})

, and the corresponding monomial is written as

x^{s^{(k)}} \equiv x_{1}^{s_{1}^{(k)}} x_{2}^{s_{2}^{(k)}} \dots x_{n}^{s_{n}^{(k)}}

. An advantage of grlex in our setting is that it allows simple extraction of the operator compression to a subspace spanned by monomials of lower total degree.

It should be noted that the dimension N grows extremely fast with increased n and m, which is the source of many computational difficulties, in particular when combined with the requirement

K \geq N

which is a necessary condition for the non-singularity of

U_{N}

. (This difficulty is alleviated by the dual method.)

Even though polynomial basis is not always the best choice, it serves well for the purposes of this paper because, with increased total degree, it generates highly ill-conditioned numerical examples that are good stress test cases for development, testing and analysis of numerical implementation.

3.2. Compression of $L$ in the Monomial Basis

Consider now the action of

L

to the vectors of the monomial basis

B

. It is a straightforward and technically tedious task to identify the columns of the corresponding matrix

{[L_{N}]}_{B}

, whose approximation can also be computed as

\frac{1}{t} log U_{N}

. Since we are interested only in the coefficients

ϕ_{k j}

in (14), it is enough to compute only some selected columns of

{[L_{N}]}_{B}

.

Let ℓ be the index of

x_{j}

in the grlex ordering, i.e.,

℘_{ℓ} (x) = x_{j}

;

ℓ = n + 2 - j

(see the scheme (16)). Then the application of (4) to

℘_{ℓ}

reads

(L ℘_{ℓ}) (x) = (F \cdot \nabla ℘_{ℓ}) (x) = \sum_{i = 1}^{n} F_{i} (x) \frac{\partial}{\partial x_{i}} ℘_{ℓ} (x) = F_{j} (x) \equiv F_{n + 2 - ℓ} (x), i . e ., L ℘_{ℓ} = F_{n + 2 - ℓ} .

If

L_{N} = Φ_{N} L_{| F_{N}}

, then also

L_{N} ℘_{ℓ} = F_{j}

(because of the assumption (14)

F_{j} \in F_{N}

). Hence, in the basis

B

we have

{[L_{N}]}_{B} (:, ℓ) = {[Φ_{N} L ℘_{ℓ}]}_{B} = {[F_{j}]}_{B} = (\begin{matrix} ϕ_{1 j} \\ ϕ_{2 j} \\ ⋮ \\ ϕ_{N_{F} j} \\ 0_{N - N_{F}} \end{matrix}), where j = n + 2 - ℓ, ℓ = 2, \dots, n + 1 .

(17)

In other words, the coordinates of

F_{j}

are encoded in

{[L_{N}]}_{B} (:, n + 2 - j)

. Finally the entries of

{[L_{N}]}_{B}

can be obtained with the compression

U_{N}

computed from data. Indeed, it is shown in [8] that

lim_{t \to 0^{+}} ∥ Φ_{N} L f - (1 / t) log Φ_{N} U^{t} f ∥ = 0, f \in F_{N} .

Hence, provided that

B

contains independent basis functions, we have

{[L_{N}]}_{B} = lim_{t \to 0^{+}} \frac{1}{t} {[log Φ_{N} U_{| F_{N}}^{t}]}_{B} = lim_{t \to 0^{+}} \frac{1}{t} log {[Φ_{N} U_{| F_{N}}^{t}]}_{B},

and it follows that, for t small enough,

{[L_{N}]}_{B} \approx \frac{1}{t} log U_{N} \equiv \frac{1}{t} log O_{X}^{†} O_{Y} .

(18)

For each

F_{j}

, its coefficients in the expansion (14) are simply read off from the corresponding column of

{[L_{N}]}_{B}

. Alternatively, we can identify additional columns and determine the coefficients by solving a least squares problem. For more details we refer to [8].

3.3. Imposing the Structure in the Reconstruction of F

Using (17), the values of

F_{j}

at the

x_{k}

’s can be approximated using the values

(\begin{matrix} \tilde{F_{j} (x_{1})} \\ ⋮ \\ \tilde{F_{j} (x_{K})} \end{matrix}) = (\begin{matrix} ℘_{1} (x_{1}) & \dots & ℘_{N} (x_{1}) \\ ⋮ & \dots & ⋮ \\ ℘_{1} (x_{K}) & \dots & ℘_{N} (x_{K}) \end{matrix}) (\begin{matrix} ϕ_{1 j} \\ ϕ_{2 j} \\ ⋮ \\ ϕ_{N_{F} j} \\ 0_{N - N_{F}} \end{matrix}) = O_{X} \cdot {[L_{N}]}_{B} (:, n + 2 - j), j = 1, \dots, n .

(19)

These can be used for expressing

F_{j}

using another suitable dictionary of functions

q_{i j}

(e.g., rational) by decoupled least squares fitting for the functions

F_{j}

: with an ansatz

F_{j} = \sum_{i = 1}^{N_{F_{j}}} φ_{i j} q_{i j}

, the coefficients

φ_{i j}

are determined to minimize

∥ (\begin{matrix} \tilde{F_{j} (x_{1})} \\ ⋮ \\ \tilde{F_{j} (x_{K})} \end{matrix}) - (\begin{matrix} q_{1 j} (x_{1}) & q_{2 j} (x_{1}) & \dots & q_{N_{F_{j}} j} (x_{1}) \\ ⋮ & ⋮ & \dots & ⋮ \\ q_{1 j} (x_{K}) & q_{2 j} (x_{K}) & \dots & q_{N_{F_{j}} j} (x_{K}) \end{matrix}) (\begin{matrix} φ_{1 j} \\ ⋮ \\ φ_{N_{F_{j}} j} \end{matrix}) ∥,

(20)

where

∥ \cdot ∥

is an appropriate (possibly weighted) norm.

Remark 2.

Note that (20) is slightly more general than in [8]—we can use separate dictionaries for each coordinate function

F_{j}

, which allows fitting variables of different (physical, thus mathematical) nature separately, with most appropriate classes of basis functions.

In [8], it is recommended to solve the regression problem (20) with a sparsity promoting method, thus revealing the underlying structure. In many cases, the sparsity is known to be specially structured, and we can exploit that information. We illustrate our proposed approach using the quadratic systems.

3.3.1. Quadratic Systems

Suppose we know that the system under study is quadratic, i.e.,

F (x) = A x + G (x \otimes x)

. Quadratic systems are important class of dynamical systems, with many applications and interesting theoretical properties, see e.g., [26].

With the approximate field values

\tilde{F (x_{1})}, \dots, \tilde{F (x_{K})}

, we can seek

A \in R^{n \times n}

and

G \in R^{n \times n^{2}}

to achieve

(\begin{matrix} \tilde{F (x_{1})} & \dots & \tilde{F (x_{K})} \end{matrix}) \approx A (\begin{matrix} x_{1} & \dots & x_{K} \end{matrix}) + G (\begin{matrix} x_{1} \otimes x_{1} & \dots & x_{K} \otimes x_{K} \end{matrix}) .

(21)

If we set

\tilde{F} = (\begin{matrix} \tilde{F (x_{1})} & \dots & \tilde{F (x_{K})} \end{matrix})

,

X = (\begin{matrix} x_{1} & \dots & x_{K} \end{matrix})

, then the identification of the coefficient matrices

A

and

G

reduces to solve the matrix least squares problem

{∥ (\begin{matrix} X^{T} & {(X ⊙ X)}^{T} \end{matrix}) (\begin{matrix} A^{T} \\ G^{T} \end{matrix}) - {\tilde{F}}^{T} ∥}_{F} ⟶ min_{A, G},

(22)

where

X ⊙ X = (\begin{matrix} x_{1} \otimes x_{1} & \dots & x_{K} \otimes x_{K} \end{matrix}) \in R^{n^{2} \times K}

is the Khatri–Rao product. Here too, one can add a sparsity promoting regularization, with the implicitly defined underlying quadratic structure.

4. Numerical Implementation—A Case Study Analysis

When it comes to turning a numerical algorithm into software, it is often straightforward to write a few lines in Matlab, Python, Octave or some other software package and have a running implementation of a sophisticated procedure obtained by composition of building blocks (subroutines). However, one should keep in mind that the final numerical computation is in finite precision (machine) arithmetic, and that in some cases development of robust numerical software requires a more careful approach. In this section, we use a case study example that reveals difficulties from the numerical software point of view, and that motivates modifications to alleviate them.

4.1. An Example: Lorenz System

A good way to test robustness of a numerical algorithm is to push it to its limits. In this case, we choose a difficult test case and let the dimensions of the data matrices grow by increasing the total degree m of the polynomial basis (and thus the dimension N) and matching that with increased K so that

K > N

. The main goal is to provide a case study example.

Example 1.

Consider the Lorenz system

(\begin{matrix} {\dot{x}}_{1} \\ {\dot{x}}_{2} \\ {\dot{x}}_{3} \end{matrix}) = (\begin{matrix} - 10 & 10 & 0 \\ 28 & - 1 & 0 \\ 0 & 0 & - 8 / 3 \end{matrix}) (\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}) + (\begin{matrix} 0 \\ - x_{1} x_{3} \\ x_{1} x_{2} \end{matrix}) .

(23)

The exact coefficients, ordered to match the grlex ordering of the monomial basis are

\begin{array}{r} 1 & x_{3} & x_{2} & x_{1} & x_{3}^{2} & x_{2} x_{3} & x_{2}^{2} & x_{1} x_{3} & x_{1} x_{2} & x_{1}^{2} \\ F_{1} : & 0 & 0 & 1.0000 \times 10^{1} & - 1.0000 \times 10^{1} & 0 & 0 & 0 & 0 & 0 & 0 \\ F_{2} : & 0 & 0 & - 1.0000 \times 10^{0} & 2.8000 \times 10^{1} & 0 & 0 & 0 & - 1.0000 \times 10^{0} & 0 & 0 \\ F_{3} : & 0 & - 2.6667 \times 10^{0} & 0 & 0 & 0 & 0 & 0 & 0 & 1.0000 \times 10^{0} & 0 \end{array} .

To collect data, we ran simulations with 55 random initial conditions and from each trajectory we randomly (independently) selected 55 points, giving a total of

K = 3025

pairs

(x_{k}, y_{k})

. The simulations were performed in Matlab, using the ode45() solver in the time interval

[0, 0.2]

with the time step

δ t = 10^{- 3}

. In the key Formula (18), we computed the logarithm in Matlab in two ways, as logm (pinv

(O_{X}) * O_{Y}

) and as logm

(O_{X} \ O_{Y})

and obtained nearly the same matrix. Of course, using the pseudoinverse explicitly is not recommended. We will use it in this section for illustrative purposes; recall the discussion in Section 2.2. The computed approximations of the coefficients of (23), with

m = 3

,

N = 20

and

m_{F} = 2

, are

\begin{array}{r} 1 & x_{3} & x_{2} & x_{1} & x_{3}^{2} & x_{2} x_{3} & x_{2}^{2} & x_{1} x_{3} & x_{1} x_{2} & x_{1}^{2} \\ F_{1} : & 3.2 \times 10^{- 5} & - 2.5 \times 10^{- 6} & 1.0000 \times 10^{1} & - 1.0000 \times 10^{1} & - 7.5 \times 10^{- 7} & - 5.2 \times 10^{- 6} & 1.5 \times 10^{- 7} & 7.4 \times 10^{- 6} & - 1.6 \times 10^{- 6} & 8.0 \times 10^{- 7} \\ F_{2} : & - 3.4 \times 10^{- 5} & - 8.7 \times 10^{- 6} & - 1.0000 \times 10^{0} & 2.8000 \times 10^{1} & 4.7 \times 10^{- 6} & 1.3 \times 10^{- 5} & 8.7 \times 10^{- 9} & - 1.0001 \times 10^{0} & - 6.1 \times 10^{- 6} & 6.1 \times 10^{- 6} \\ F_{3} : & 2.8 \times 10^{- 4} & - 2.6667 \times 10^{0} & - 5.5 \times 10^{- 6} & - 6.9 \times 10^{- 6} & - 8.3 \times 10^{- 6} & 2.9 \times 10^{- 6} & 1.6 \times 10^{- 8} & - 1.0 \times 10^{- 5} & 1.0000 \times 10^{0} & 5.7 \times 10^{- 6} \end{array} .

The nonzero coefficients are matched to five digits of accuracy, and the remaining coefficients are below

O (10^{- 5})

. Nearly the same result is obtained if the samples from all trajectories have the same time stamps. Given the difficulty of the Lorenz example and the fact that the results are obtained by a low dimensional approximation of a nontrivial (numerically simulated) dynamics, the results are good. Analytical properties of the method indicate that increasing dimensions provides increased accuracy, ultimately yielding convergence.

Example 2.

Next, we illustrate the reconstruction of the function

F

. We simulated the system in the interval

[0, 4]

with

δ t = 0.01

, and used the monomials with the total degree up to

m = 3

. In the discrete time grid, we randomly selected 9 positions at which we took three consecutive values from each trajectory; in the second experiment 27 randomly selected values of

x_{k}

are taken from each trajectory. The total number of trajectories (with random initial conditions) was set to 150.

F

is approximated using (19). For each

x_{k}

, we compute the approximation error as

ϵ_{k} = max_{i = 1, 2, 3} \frac{| \tilde{F_{i} (x_{k})} - F_{i} (x_{k}) |}{∥ F (x_{k}) ∥_{\infty}} .

(24)

The sampling points and the values of

ϵ_{k}

are shown in Figure 1.

Example 3.

Now we use the data snapshots from Example 1, and increase the total degree to

m = 9

, thus increasing N from

N = 20

to

N = 220

. Recall that

K = 3025

. Surprisingly, the computed coefficients are all complex, and are completely off; their absolute values are (the euclidean norms of the real and the imaginary parts of the vector of the computed coefficients are

O (10^{6})

.

\begin{array}{r} 1 & x_{3} & x_{2} & x_{1} & x_{3}^{2} & x_{2} x_{3} & x_{2}^{2} & x_{1} x_{3} & x_{1} x_{2} & x_{1}^{2} \\ F_{1} : & 4.6 \times 10^{3} & 1.6 \times 10^{6} & 3.6 \times 10^{5} & 3.0 \times 10^{4} & 1.5 \times 10^{4} & 1.2 \times 10^{4} & 4.3 \times 10^{3} & 1.0 \times 10^{4} & 9.1 \times 10^{3} & 1.0 \times 10^{4} \\ F_{2} : & 3.0 \times 10^{4} & 1.2 \times 10^{6} & 2.8 \times 10^{5} & 1.0 \times 10^{4} & 9.6 \times 10^{3} & 1.2 \times 10^{4} & 7.6 \times 10^{3} & 2.1 \times 10^{4} & 9.5 \times 10^{3} & 5.5 \times 10^{3} \\ F_{3} : & 6.8 \times 10^{3} & 2.4 \times 10^{5} & 5.5 \times 10^{4} & 2.2 \times 10^{3} & 1.9 \times 10^{3} & 2.6 \times 10^{3} & 1.6 \times 10^{3} & 4.5 \times 10^{3} & 2.0 \times 10^{3} & 1.1 \times 10^{3} \end{array} .

The approximation of

F

is also bad, see Figure 2. Increasing the number of trajectories and samples per trajectory to obtain

K =

24,025 did not bring any improvement.

Although increasing N, with correspondingly large K, should be the step toward better approximation, the numerical implementation breaks down. In situation like this one, it is important to find out and understand the sources of the problem.

4.2. What Went Wrong

It is clear that in this computation the most sensitive part is computation of the matrix logarithm, and that this is the most likely source of problems. Indeed, in the last run in Example 3 a warning was issued by the logm() function:

Warning: the principal matrix logarithm is not defined for A with nonpositive real eigenvalues. A non-principal matrix logarithm is returned.

A closer inspection of the eigenvalues of

U_{N}

, shown in Figure 3 confirms that

U_{N}

indeed has problematic (real negative) eigenvalues.

Although Figure 3 and Figure 4 show numerically computed eigenvalues, by a backward stability analysis argument one can argue that

U_{N}

is certainly close to matrices with eigenvalues that preclude computing the logarithm in finite precision arithmetic. Hence, computation of the logarithm must be ill-conditioned as the ill-conditioning is essentially measured as the inverse distance to singularity [27].

Another look at

U_{N}

reveals that its first column is not what it should be. In this example we use monomials and the first basis vector is the constant. Since

U^{t} ℘_{1} = 1 \cdot ℘_{1}

, we expect

{[Φ_{N} U_{| F_{N}}^{t}]}_{B} {[℘_{1}]}_{B} \equiv U_{N} e_{1} = e_{1} = 1 \cdot {[℘_{1}]}_{B}

, which clearly follows from the solution of the least squares problem (7), if the coefficient matrix is of full column rank. On the other hand, the first column of

U_{N}

is computed as shown in Figure 5.

Two conclusions are immediate. First,

O_{X}

is considered (by the software) numerically rank deficient. Secondly, two different software solutions of the same problem return two different solutions. For the data shown in Figure 3 and Figure 4, we computed

U_{N}

as

pinv (O_{X}) * O_{Y}

. This is in general not a good idea for solving least squares problems; nevertheless, it can be often seen in the literature and software implementations as a textbook-style numerical method for least squares problems, albeit a wrong one. We included it here for instructive purposes. If we repeat the experiment with the backslash operator,

O_{X} \ O_{Y}

, the results are, unfortunately, not better. One conspicuous difference is that, instead of the cluster of absolutely small eigenvalues of

pinv (O_{X}) * O_{Y}

(see Figure 4),

O_{X} \ O_{Y}

has a zero eigenvalue of multiplicity 45. This multiple zero eigenvalue is a consequence of the sparsity structure of

O_{X} \ O_{Y}

, see Figure 6.

In the course of solving the least squares problem, in both methods (explicit use of the pseudoinverse or the backslash operator), the coefficient matrix

O_{X}

is truncated (small singular values are set to zero if

pinv (O_{X})

is used; the upper triangular factor is truncated in the case of backslash, which uses a rank revealing QR factorization) and the computed

U_{N}

, due to rounding errors, is small a perturbation of a rank-deficient matrix with a number of eigenvalues in the vicinity of zero. As a result, the matrix logarithm function fails.

The data shown in Figure 7 are instructive. The condition number of

O_{X}

and the distribution of its singular values are nicely revealed by the column norms of

O_{X}

. The LS solver automatically truncates small singular values that originate from small columns (small basis function values over the snapshots

x_{k}

) and not necessarily from collinearity in the sense of small angles between basis functions.

Look at the singular values

σ_{1} (U_{N}) \geq \dots \geq σ_{N} (U_{N})

in the right panel, in particular the gap

σ_{176} (U_{N}) \approx 1.4058 \times 10^{- 6} ≫ σ_{177} (U_{N}) \approx 3.9261 \times 10^{- 10}

. The numerical rank of

U_{N}

is determined as 176 because

σ_{176} (U_{N})

is the smallest singular value that is above the threshold

N \cdot eps \cdot σ_{1} (U_{N}) \approx 4.6543 \times 10^{- 8}

. The index 176 originates in the numerical rank of

O_{X}

which is determined in Matlab (

rank (O_{X})

returns 176) by the default tolerance. Note that there is no such visible gap (cliff) in the ordered singular values of

O_{X}

.

Remark 3.

It should be also mentioned here that the cosines of the canonical angles between the ranges of

O_{X}

and

O_{Y}

are between 0.988 and one (cf. Section 2.1.3), and that we expect the spectrum of

U_{N}

not to be too far away from the unit circle. The intuition is that (in the process of computation of

O_{X}^{†} O_{Y}

) the truncated part of

O_{X}

did not (could not) cancel the corresponding part in

O_{Y}

, which resulted in the problematic cluster of eigenvalues near zero. (In Figure 3, the number of eigenvalues in the cluster around zero is 44, which is the numerical rank deficiency, because the numerical rank is determined (from the singular values) as 176.)

Remark 4.

In the computations pinv

(O_{X}) * O_{Y}

and

O_{X} \ O_{Y}

, the truncation is conducted based on the SVD and the rank revealing QR factorization, respectively, of

O_{X}

, independent of

O_{Y}

. It is more appropriate to solve each LS problem (7) separately (for the corresponding column of

U_{N}

), and use the truncation strategy following Rust [28]. Note that this is a more general issue because the matrix

O_{X}^{†} O_{Y}

is often used in the Koopman/DMD setting. (See [29] for a detailed numerical analysis of the DMD and its variations.) Furthermore, the problem can become even more difficult if we have weighting matrix W with scaling factors that spread over several orders of magnitude. (In this paper, we work with

W = I_{K}

for the sake of brevity.)

Remark 5.

In the case of noisy data, it might be advantageous to replace the least squares fit with the total least squares ([20], §2.7.6). This is an important issue and we leave it for our future work.

5. Computing ${[L_{N}]}_{B}$ with Preconditioning

Numerical examples in Section 4 show that implementation of the method in state of the art packages such as Matlab which is straightforward and effortless: the key computation is coded as

logm (O_{X} \ O_{Y})

or as

logm (pinv (O_{X}) * O_{Y})

. However, the results are not always satisfactory, and the problems are both in the computation of

U_{N}

(solving the least squares problem) and in computing the matrix logarithm. In this section, we use the structure of the least squares problem solver (reviewed in Section 2.2.1) to introduce a simple modification and then to construct preconditioned computations of

log U_{N}

. These techniques can be combined with the dual method that is analyzed in detail in Section 6 and Section 7.

5.1. A Simple Modification

The first attempt to avoid some of the problems illustrated in Section 4 is to prevent truncation when computing

O_{X}^{†} O_{Y}

. So we simply run the procedure used in the backslash solver (see Section 2.2.1), but without truncation. More precisely, using the column pivoted QR factorization

O_{X} Π = Q R

,

U_{N}

is computed as

U_{N} = Π R^{- 1} Q^{T} O_{Y}

.

With

m = 9

and same setting as in Example 1, the computation of the logarithm of

U_{N}

was successful. (Matlab function logm() computed a real valued logarithm, without an error/warning message) and the coefficients of (23) are recovered to four digits of accuracy, which is satisfactory. The remaining coefficients are below

O (10^{- 4})

and are discarded in the Formula (19) for reconstructing

F

. The first column of

U_{N}

was

e_{1}

up to an error of the order of the roundoff.

The results shown in Figure 8 are encouraging, because they show that the data contain information that can be turned into more accurate output, provided that the algorithm successfully curbs the ill-conditioning. Furthermore, test with

m = 10

(

κ_{2} (U_{N}) \approx 2.9 \times 10^{20}

) showed nearly the same four digit accuracy; with

m = 11

(

κ_{2} (U_{N}) \approx 2.2 \times 10^{24}

) the accuracy dropped to two digits (but the logarithm was successfully computed); with

m = 12

(

κ_{2} (U_{N}) \approx 7.7 \times 10^{26}

) the logarithm is still computed as real matrix, but the accuracy of the computed coefficients drops to

O (10^{- 1})

and the reconstruction of

F

fails. However, if we increase K to 24,025 (by sampling more points from more trajectories) the accuracy rebounds to at least for digits both in the coefficients and the reconstruction of

F

. At

m = 16

(

N = 969

and

K =

24,025, or

K = 3025

) the accuracy is lost. The condition number of

U_{N}

is greater than

10^{37}

.

Our adversarial testing is used to expose potential weaknesses of the computational steps in a software implementation of the algorithm. Of course, all these outcomes may vary, depending on the number of trajectories, initial conditions, sampling resolution; in some examples it is possible that the computation of the matrix logarithm breaks down even for smaller values of m. ( Moreover, numerical accuracy, i.e., the conditioning of the problem, heavily depends on the underlying dynamics). In any case, at this point we have to conclude that using the

log O_{X}^{†} O_{Y}

is a source of nontrivial numerical difficulties that preclude efficient and robust deployment of the Mauroy–Goncalves method. The theoretical convergence (as

N, K \to \infty

,

t \to 0

) is not matched by numerical convergence of a straightforward implementation of the method in finite precision computation.

5.2. Preconditioning the Logarithm of $O_{X}^{†} O_{Y}$

The discussion in Section 4.2 and Section 5.1 is only one step toward more robust computation—it merely identifies the problem and shows how small changes in an implementation substantially change the final result. Clearly, sufficiently accurate computation of

U_{N}

is among the necessary conditions for successful computation of the matrix algorithms. However, this is not sufficient; even if we compute

U_{N}

accurately, computation of the logarithm may fail if the matrix is ill-conditioned.

We now introduce a new scheme that uses functional calculus-based preconditioning. If S is any nonsingular matrix, then

log (O_{X}^{†} O_{Y}) = S log (S^{- 1} (O_{X}^{†} O_{Y}) S) S^{- 1} \equiv S log ({(O_{X} S)}^{†} O_{Y} S) S^{- 1} .

(25)

Note that replacing

O_{X}^{†} O_{Y}

with the similar matrix

S^{- 1} (O_{X}^{†} O_{Y}) S

corresponds to changing the basis for matrix representation of the compressed Koopman operator. With the experience from Section 4.2, it is clear that the key is to compute the preconditioned matrix

S^{- 1} (O_{X}^{†} O_{Y}) S

without first computing

O_{X}^{†} O_{Y}

. (Once we compute

O_{X}^{†} O_{Y}

explicitly in floating point arithmetic and store it in the machine memory, it may be then too late even for exact computation.)

The conditions on S are:

(i): Tt should facilitate more accurate computation of the argument $S^{- 1} (O_{X}^{†} O_{Y}) S = {(O_{X} S)}^{†} O_{Y} S$ for the matrix logarithm;
(ii): It should have preconditioning effect for computing the logarithm of $S^{- 1} (O_{X}^{†} O_{Y}) S$ ;
(iii): The application of S and $S^{- 1}$ should be efficient and numerically stable.

Example 4.

To test the concept, we use the same data as in Example 1 with

m = 9

, and for the matrix S we take

S = diag (1 / ∥ O_{X} (:, i) {∥_{2})}_{i = 1}^{N}

and compute

L_{N} = \frac{1}{t} S log ({(O_{X} S)}^{†} (O_{Y} S)) S^{- 1} .

(26)

Contrary to the failure of the formula

L_{N} = (1 / t) log (O_{X}^{†} O_{Y})

, (26) computes the real logarithm of the explicitly computed

{(O_{X} S)}^{†} (O_{Y} S)

, and recovers the coefficients with an

O (10^{- 5})

relative error. That is, we scale

O_{X}

and

O_{Y}

by

diag (1 / ∥ O_{X} (:, i) {∥_{2})}_{i = 1}^{N}

, and then proceed by solving the least squares problem with the thus scaled matrices. To understand the positive result, we first note that the condition number of

O_{X} S

is

κ_{2} (O_{X} S) \approx 1.3 \times 10^{9}

, so the least squares solution is computed without truncation. The condition number of

{(O_{X} S)}^{†} (O_{Y} S)

was

κ_{2} ({(O_{X} S)}^{†} (O_{Y} S)) \approx 6.4 \times 10^{5}

. (The condition number of

O_{X}^{†} O_{Y}

is

O (10^{20})

so that the matrix is considered numerically rank deficient.) With

m = 12

we had

κ (U_{N}) \approx 6.9 \times 10^{61}

and

κ_{2} ({(O_{X} S)}^{†} (O_{Y} S)) \approx 1.4 \times 10^{13}

; the coefficients are recovered to three accurate digits and approximation of

F

is with error slightly larger than the one in the right panel of Figure 8.

However, as m increases, the diagonal scaling cannot cope with the increased condition number; already at

m = 15

, a complex non-principal value of the logarithm is computed, with some eigenvalues whose imaginary parts are equal

π / δ t

. The computed

U_{N}

has a cluster of eigenvalues around zero. For the record,

κ_{2} (U_{N}) \approx 2.2 \times 10^{220}

,

κ_{2} ({(O_{X} S)}^{†} (O_{Y} S)) \approx 6.7 \times 10^{78}

. Surprisingly, the approximate coefficients, although complex, have small imaginary parts (of the order of the roundoff) and their real parts still provide reasonably good approximations of the true coefficients.

5.2.1. Scaled QR Factorization Based Preconditioner

To develop a stronger preconditioner, we start with the following observation: No matter how ill-conditioned

O_{X}

and

O_{Y}

may be (in the sense of badly scaled columns), the distance between the ranges of

O_{X}

and

O_{Y}

, as measured by the canonical angles between the subspaces, should not be too big. (Intuitively,

O_{Y}

contains the observables evaluated at the states

y_{k}

downstream in time

δ t

from the

x_{k}

’s used in

O_{X}

. Recall Remark 3.)

Hence, if we compute the QR factorization of

O_{X}

, the inverse of its triangular factor will have a preconditioning effect on

O_{Y}

by postmultiplication. This leads to Algorithm 1.

Algorithm 1

[L_{N}] = Inf_Generator_QRSC (O_{X}, O_{Y}, T)

Input:: $O_{X} \in C^{K \times N}$ , $O_{Y} \in C^{K \times N}$ , $T > 0$
1:: $S = diag (1 / ∥ O_{X} (:, i) {∥_{2})}_{i = 1}^{N}$
2:: $[Q_{X}, {\hat{R}}_{X}] = qr (O_{X} S)$ {QR factorization}
3:: ${\hat{U}}_{N} = Q_{X}^{T} (O_{Y} S) {\hat{R}}_{X}^{- 1}$ { ${\hat{U}}_{N}$ is similar to $O_{X}^{†} O_{Y}$ .}
4:: ${\hat{L}}_{N} = log ({\hat{U}}_{N})$
5:: $L_{N} = (1 / T) S ({\hat{R}}_{X}^{- 1} {\hat{L}}_{N} {\hat{R}}_{X}) S^{- 1}$
Output:: $L_{N}$ . { $L_{N} = (1 / T) log (O_{X}^{†} O_{Y})$ }

Example 5.

To test this algorithm, we use the data from Example 1, take

m = 15

and increase the number of snapshots to K = 24,025. The matrix

{\hat{U}}_{N}

is well conditioned,

κ_{2} ({\hat{U}}_{N}) \approx 1.2 \times 10^{2}

, and computing the matrix logarithm is successful. The coefficients are recovered to four digits of accuracy, and the reconstruction of

F

is slightly better than the one shown in the right panel in Figure 8.

This example, as a test of the proposed approach, is encouraging. Our next task is to further develop the method along the lines of Algorithm 1, and to provide a robust method with accompanying numerical analysis, and finally to implement it as a reliable software toolbox.

5.2.2. Pivoted QR Factorization Based Preconditioner

Since in this approach the matrix logarithm is the most critical and numerically most difficult computational task, the preprocessing/preconditioning aims to ensure successful completion of that particular step in the method. The back application of the similarity is also an important step. In Algorithm 1, the main preconditioning is performed by an upper triangular factor from the QR factorization, and by its inverse. For a numerically robust computation, it is important that the QR factorization is computed accurately even in the case of wildly scaled data, and, moreover, that the resulting triangular factor is rank revealing and well structured. These goals can be accomplished by pivoting. In this subsection we outline the main principles along which the idea of QR factorization-based preconditioned computation of the matrix

L_{N}

(matrix representation of a compression of the infinitesimal generator) can be further pursued.

The column pivoting has the rank revealing property and the triangular factor is diagonally dominant in a very strong sense, see e.g., [23,30] and (13).

In the case of Businger–Golub pivoting, we know that

R_{X} = Δ_{X} T_{X}

, where

Δ_{X} = diag (| {(R_{X})}_{i i} {|)}_{i = 1}^{N}

and

T_{X}

is well conditioned. Hence,

R_{X}^{- 1} = T_{X}^{- 1} Δ_{X}^{- 1}

, and after Line 3 in Algorithm 2 one can insert another preconditioning with

Δ_{X}

. We omit the details for the sake of brevity. Instead, we conclude this theme with few remarks that should be useful for further study and implementation.

Algorithm 2

[L_{N}] = Inf_Generator_QRCP (O_{X}, O_{Y}, T)

Input:: $O_{X} \in C^{K \times N}$ , $O_{Y} \in C^{K \times N}$ , $T > 0$
1:: Reorder the snapshots by simultaneous row permutation of $O_{X}$ and $O_{Y}$ ; see Remark 6.
2:: $[Q_{X}, R_{X}, Π_{X}] = qr (O_{X})$ {Rank revealing QR factorization with column pivoting}
3:: ${\hat{U}}_{N} = Q_{X}^{T} (O_{Y} Π_{X}) R_{X}^{- 1}$ { ${\hat{U}}_{N}$ is similar to $O_{X}^{†} O_{Y}$ .}
4:: ${\hat{L}}_{N} = log ({\hat{U}}_{N})$
5:: $L_{N} = (1 / T) Π_{X} (R_{X}^{- 1} {\hat{L}}_{N} R_{X}) Π_{X}^{T}$
Output:: $L_{N}$ . { $L_{N} = (1 / T) log (O_{X}^{†} O_{Y})$ }

Remark 6.

For the numerical accuracy of the QR factorization, an additional row pivoting may be needed to obtain that the rows are ordered so that their

ℓ_{\infty}

norms are decreasing, see [31,32]. If Ψ is a permutation matrix that encodes the row pivoting, then

{(Ψ O_{X})}^{†} = O_{X}^{†} Ψ^{T}

, so that

{(Ψ O_{X})}^{†} (Ψ O_{Y}) = O_{X}^{†} O_{Y}

. This means that using the additional row pivoting in the QR factorization in Algorithm 2 is equivalent to a particular ordering of the data snapshots. The column pivoting corresponds to reordering the basis’ functions. Both reorders of the data are allowed operations and can thus be used to enhance numerical robustness of the computation.

Remark 7.

If

K ≫ 2 N

then it pays off to change the coordinates by computing the QR factorization

(\begin{matrix} Q_{X} & Q_{Y} \end{matrix}) = Q (\begin{matrix} R \\ 0 \end{matrix}),

and use the corresponding columns of

R

instead of

O_{X}

and

O_{Y}

. This follows the idea of the QR-compressed DMD [29].

Remark 8.

The key assumption in the above described method is that

K ≫ N

, i.e., that both

O_{X}

and

O_{Y}

are tall matrices; their columns are in a high dimensional space and with suitable transformation S in the column spaces (

O_{X} \mapsto O_{X} S

,

O_{Y} \mapsto O_{Y} S

) we can improve the condition numbers. By the (variational) monotonicity principle, supplying more snapshots (increasing K) moves the singular values of

O_{X}

and

O_{Y}

to the right, thus improving the condition numbers of both matrices. Since, by the underlying continuity, the canonical angles between the ranges of

O_{X}

and

O_{Y}

are expected to be away from

π / 2

, and

U_{N} = O_{X}^{†} O_{Y}

is going to be nonsingular. Moreover, the overall condition number of the computation can be controlled using our proposed modifications that are designed to ensure stable computation of the matrix logarithm.

6. Dual Method

It is clear that the values of N linearly independent functions

℘_{1}, \dots, ℘_{N}

over the discrete set

x_{1}, \dots, x_{K}

of

K < N

snapshots (that is, with only the tabulated values in the

K \times N

matrices

O_{X}

,

O_{Y}

) contain redundancy. On the other hand, increasing the space dimension N is a way to lift the data to higher dimensional space; more observables improve both the DMD and KMD analyses. Note that, in the case

N > K

, both the compression

U_{N} = {[Φ_{N} U_{| F_{N}}^{t}]}_{B} = O_{X}^{†} O_{Y}

and the EDMD matrix (

U_{N}^{T}

) are rank deficient; unfortunately, the infinitesimal generator identification framework cannot work in that setting because the matrix logarithm is not defined. It should be stressed here that, e.g., in the DMD setting, the action of the operator given by the data is restricted to an at most K-dimensional subspace in the N dimensional space, and that the approximations of the Koopman modes are obtained by a Rayleigh–Ritz extraction. Hence, any operator (matrix) function of

U_{N}

only makes sense and has a practical usability in the context of an approximation from the subspace defined by the data.

In [8], a dual method is proposed, which instead of

U_{N} = O_{X}^{†} O_{Y}

works with the logarithm of

U_{K} = O_{Y} O_{X}^{†}

. In this section we first provide, in Section 6.1, a detailed linear algebra description of the dual method that will facilitate a more general formulation that allows for modifications which may lead to better numerical algorithms. In fact, we show in Section 7 that the dual method of [8] is but a special case of subspace projection methods, and we show how to exploit this for design of numerically better schemes.

6.1. A Rayleigh Quotient Formulation

In the dual formulation, the transition from

U_{N}

to

U_{K}

can be formulated as another compression of

U^{t}

onto a particular K-dimensional subspace of

F_{N}

.

Proposition 3.

If we define

(\begin{matrix} ψ_{1} (x) & \dots & ψ_{K} (x) \end{matrix}) = (\begin{matrix} ℘_{1} (x) & \dots & ℘_{N} (x) \end{matrix}) O_{X}^{†}

, then (assuming

O_{X}

is of full row rank K),

B_{K} = {ψ_{1}, \dots, ψ_{K}}

is a basis of the K dimensional subspace

F_{K} \subset F_{N}

, and

U_{K} = O_{Y} O_{X}^{†}

is the matrix representation of the Rayleigh quotient

Φ_{K} U_{| F_{K}}^{t}

in which

Φ_{K} : F \to F_{K}

is the least squares projection as in Section 2.1.1. The matrix

U_{K}

is the matrix Rayleigh quotient of

U_{N}

with respect to the range of

O_{X}^{†}

, i.e.,

U_{K} = O_{X} U_{N} O_{X}^{†}

and

U_{N} O_{X}^{†} = O_{X}^{†} U_{K}

. Furthermore, the Rayleigh quotient with respect to

O_{X}^{†}

is the same for all matrices from the set

N

(see (10)):

O_{X} (O_{X}^{†} O_{Y} + Ψ_{0} Ξ) O_{X}^{†} = O_{Y} O_{X}^{†} . (N o t e h e r e t h a t {(O_{X}^{†})}^{†} = O_{X} .)

(27)

Proof.

First, note that the basis functions

ψ_{1}, \dots, ψ_{K}

evaluated at

x_{1}, \dots, x_{K}

can be tabulated as the matrix

O_{X, K} = (\begin{matrix} ψ_{1} (x_{1}) & ψ_{2} (x_{1}) & \dots & ψ_{K} (x_{1}) \\ ⋮ & ⋮ & \dots & ⋮ \\ ψ_{1} (x_{K}) & ψ_{2} (x_{K}) & \dots & ψ_{K} (x_{K}) \end{matrix}) = O_{X} O_{X}^{†} = I_{K},

and that for a

g \in F_{K}

, its representation in the basis

B_{K}

is

{[g]}_{B_{K}} = {(\begin{matrix} g (x_{1}), \dots, g (x_{K}) \end{matrix})}^{T}

; see (6), where we take

W = I

for the sake of simplicity. Similarly,

(U^{t} ψ_{1}, \dots, U^{t} ψ_{K})

evaluated at

x_{1}, \dots, x_{K}

yields the matrix

O_{Y, K} = O_{Y} O_{X}^{†}

. Now, as in Section 2.1.2,

U_{K} = O_{X, K}^{†} O_{Y, K} = O_{Y} O_{X}^{†}

. Relation (27) is easily checked. □

If

U_{K}

is nonsingular then the identification scheme should be recast in terms of

log U_{K}

. In the basis

B_{K}

we have then the following representations of the Rayleigh quotient

Φ_{K} U_{| F_{K}}^{t}

and its logarithm.

Corollary 1.

The matrix representations of the compressed

U^{t}

and its logarithm (if defined) are as follows: For any

{(g_{1}, \dots, g_{K})}^{T} \in C^{K}

,

\begin{matrix} Φ_{K} U_{| F_{K}}^{t} ((\begin{matrix} ψ_{1} (x) & \dots & ψ_{K} (x) \end{matrix}) (\begin{matrix} g_{1} \\ ⋮ \\ g_{K} \end{matrix})) & = & (\begin{matrix} ψ_{1} (x) & \dots & ψ_{K} (x) \end{matrix}) (U_{K} (\begin{matrix} g_{1} \\ ⋮ \\ g_{K} \end{matrix})), \\ log (Φ_{K} U_{| F_{K}}^{t}) ((\begin{matrix} ψ_{1} (x) & \dots & ψ_{K} (x) \end{matrix}) (\begin{matrix} g_{1} \\ ⋮ \\ g_{K} \end{matrix})) & = & (\begin{matrix} ψ_{1} (x) & \dots & ψ_{K} (x) \end{matrix}) (log (U_{K}) (\begin{matrix} g_{1} \\ ⋮ \\ g_{K} \end{matrix})) . \end{matrix}

Next, we consider function evaluation at

y_{k} = φ^{t} (x_{k})

,

k = 1, \dots, K

.

Proposition 4.

Let

f = \sum_{i = 1}^{N} f_{i} ℘_{i} \in F_{N}

. Then

(\begin{matrix} U^{t} f (x_{1}) \\ ⋮ \\ U^{t} f (x_{K}) \end{matrix}) = (\begin{matrix} f (y_{1}) \\ ⋮ \\ f (y_{K}) \end{matrix}) = O_{Y} O_{X}^{†} (\begin{matrix} f (x_{1}) \\ ⋮ \\ f (x_{K}) \end{matrix}) + O_{Y} (I_{N} - O_{X}^{†} O_{X}) (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix}) .

(28)

If

I_{N} \neq O_{X}^{†} O_{X}

, then the second term on the right hand side in (28) is zero if and only if the

f_{i}

’s are of the form

{(\begin{matrix} f_{i} \end{matrix})}_{i = 1}^{N} = O_{X}^{†} {(\begin{matrix} g_{k} \end{matrix})}_{k = 1}^{K}

with arbitrary

g_{k}

’s. Furthermore, if

g = \sum_{i = 1}^{K} g_{i} ψ_{i} \in F_{K} \subset F_{N}

, then

g_{i} = g (x_{i})

and

(\begin{matrix} U^{t} g (x_{1}) \\ ⋮ \\ U^{t} g (x_{K}) \end{matrix}) = O_{Y} O_{X}^{†} (\begin{matrix} g (x_{1}) \\ ⋮ \\ g (x_{K}) \end{matrix}), (\begin{matrix} log (Φ_{K} U_{| F_{K}}^{t}) g (x_{1}) \\ ⋮ \\ log (Φ_{K} U_{| F_{K}}^{t}) g (x_{K}) \end{matrix}) = log (O_{Y} O_{X}^{†}) (\begin{matrix} g (x_{1}) \\ ⋮ \\ g (x_{K}) \end{matrix}) .

(29)

Proof.

For (28), it suffices to write

(\begin{matrix} f (y_{1}) \\ ⋮ \\ f (y_{K}) \end{matrix}) = O_{Y} (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix}) = O_{Y} (O_{X}^{†} O_{X} + I_{N} - O_{X}^{†} O_{X}) (\begin{matrix} f_{1} \\ ⋮ \\ f_{N} \end{matrix}) .

For the first relation in (29), note that

g = \sum_{i = 1}^{K} g_{i} ψ_{i}

evaluated at

y_{1}, \dots, y_{K}

reads

(\begin{matrix} g (y_{1}) \\ ⋮ \\ g (y_{K}) \end{matrix}) = O_{Y} O_{X}^{†} (\begin{matrix} g_{1} \\ ⋮ \\ g_{K} \end{matrix}), where g_{i} = g (x_{i}) because O_{X, K} = I_{K} .

□

Hence, using the last relation and

L = F \cdot \nabla \approx (1 / t) log (Φ_{K} U_{| F_{K}}^{t})

in

F_{K}

(evaluated at the snapshots

x_{i}

), we have

(\begin{matrix} F \cdot \nabla g (x_{1}) \\ ⋮ \\ F \cdot \nabla g (x_{K}) \end{matrix}) \approx \frac{1}{t} log (O_{Y} O_{X}^{†}) (\begin{matrix} g (x_{1}) \\ ⋮ \\ g (x_{K}) \end{matrix}) .

(30)

If we choose

g (x) = x_{j}

,

j = 1, \dots, n

, respectively, (where

x = {(x_{1}, \dots, x_{n})}^{T}

) then

F \cdot \nabla g (x) = F_{j} (x)

and we obtain approximate filed values

\tilde{F (x_{i})}

defined as

(\begin{matrix} {\tilde{F (x_{1})}}^{T} \\ ⋮ \\ {\tilde{F (x_{K})}}^{T} \end{matrix}) = \frac{1}{t} log (O_{Y} O_{X}^{†}) (\begin{matrix} x_{1}^{T} \\ ⋮ \\ x_{K}^{T} \end{matrix}) .

(31)

Note however that (30), (31) require that

g \in F_{K}

. Furthermore, the approximations of

F

are given only at the sequence

x_{1}, \dots, x_{K}

.

Global Identification of $F$

After identifying the values

F (x_{k})

, the idea is to use the ansatz

F (x) = (\begin{matrix} F_{1} (x) \\ ⋮ \\ F_{n} (x) \end{matrix}), F_{j} (x) = \sum_{k = 1}^{N_{F}} φ_{k j} h_{j} (x),

(32)

where

h_{1}, \dots, h_{N_{F}}

are chosen from a dictionary of functions, possibly different from the basis used to lift the data and identify the compression of the infinitesimal generator. Then we obtain the sequence of least squares problems

{∥ (\begin{matrix} h_{1} (x_{1}) & \dots & h_{N_{F}} (x_{1}) \\ h_{1} (x_{2}) & \dots & h_{N_{F}} (x_{2}) \\ ⋮ & \dots & ⋮ \\ h_{1} (x_{K}) & \dots & h_{N_{F}} (x_{K}) \end{matrix}) (\begin{matrix} φ_{1 j} \\ φ_{2 j} \\ ⋮ \\ φ_{N_{F} j} \end{matrix}) - (\begin{matrix} {\tilde{F}}_{j} (x_{1}) \\ {\tilde{F}}_{j} (x_{2}) \\ ⋮ \\ {\tilde{F}}_{j} (x_{K}) \end{matrix}) ∥}_{2}^{2} ⟶ min_{φ_{1 j} \dots φ_{N_{F} j}}, j = 1, \dots, n,

(33)

that can also be equipped with a regularization factor that promotes sparse solution.

6.2. A Numerical Example

As in Section 4, a simple but difficult example can be used to explore the numerical feasibility of the derived scheme. It has been shown in [8] that the method works well for the Lorenz system on small time intervals. In the following example, we increase the time domain and test the accuracy of the reconstruction of

F

. An ill-conditioned problem is obtained by taking the total degree of the polynomials to be 12. Although this may seem artificial, it is useful because it provides numerically difficult cases that expose the weaknesses of the computational scheme and excellent case studies for better understanding and further development.

Example 6.

In this example we run simulation of the Lorenz system with time resolution

δ t = 10^{- 3}

on the intervals

[0, 0.1]

,

[0, 0.18]

and

[0, 0.19]

. The basis functions

℘_{i}

are the monomials with total degree up to

m = 12

. We generate 12 trajectories with random initial conditions, and from each trajectory we sample three consecutive snapshots at ten randomly selected positions; the matrices

O_{X}

,

O_{Y}

are thus

360 \times 455

. In the reconstruction Formula (31), the matrix logarithm is computed in Matlab as logm(OY/OX); in all three runs, logm() issued a warning that a non-principal value of the algorithm was computed. The reconstruction error is measured component-wise and snapshot-wise as

ϵ_{k}^{(i)} = \frac{| \tilde{F_{i} (x_{k})} - F_{i} (x_{k}) |}{∥ F (x_{k}) ∥_{\infty}}, i = 1, \dots n; k = 1, \dots, K .

(34)

In addition, we measure the total error of the tabulated values in the Frobenius norm as

τ = ∥ {(\tilde{F_{i} (x_{k})})}_{k, i = 1}^{K, n} - {(F_{i} (x_{k}))}_{k, i = 1}^{K, N} ∥_{F} / {∥ {(F_{i} (x_{k}))}_{k, i = 1}^{K, N} ∥}_{F} .

(35)

In Figure 9, we show the errors

ϵ_{k}^{(i)}

for the intervals

[0, 0.1]

and

[0, 0.18]

; in the case of the time interval

[0, 0.19]

, the method broke down and the reconstructed values were computed as NaN’s.

The results depend on the initial conditions used to generate sample trajectories. In the above experiments, each initial condition is taken as a normally distributed

3 \times 1

vector generated in Matlab using randn. If we generate the initial conditions as uniformly distributed inside the sphere of radius

0.1

, centered at the origin, then for all three test intervals the error τ was

O (10^{- 3})

. With such initialization, the same level of accuracy is then maintained for larger time intervals, up to

[0, 0.4]

; for

[0, 0.41]

the error increased to

τ \approx 0.1

and for

[0, 0.42]

the result was NaN.

This example shows that numerical (software) implementation of the dual method requires additional analysis and modifications, similar to the main method. In the next section we explore numerical linear algebra techniques that could facilitate a more robust implementation.

7. Subspace Selection

The approximation (31) is based on a particular K-dimensional subspace of

F_{N} = span (℘_{1}, \dots, ℘_{N})

, where (in the case of monomial basis) the choice of the new basis functions precludes direct coefficient comparisons with (17), (19). Instead, the identification procedure follows the lines of (30)–(33).

We can set up a more general framework: independent of the ratio

N / K

(and independent of the formulation—original or dual) we can seek other suitable subspaces of

F_{N}

, and not necessarily of dimension K. The selection criterion is numerical: both the subspace and its dimension

\hat{N}

should be determined with respect to the numerical conditioning of the matrix representations at the finite sequence

x_{1}, \dots, x_{K}

. A basis of such a

\hat{N}

-dimensional subspace

F_{\hat{N}}

of

F_{N}

is written as

(ψ_{1}, \dots, ψ_{\hat{N}}) = (℘_{1}, \dots, ℘_{N}) S

, where

S

is

N \times \hat{N}

selection operator, i.e., matrix, of rank

\hat{N}

. The tabulated values of

(ψ_{1}, \dots, ψ_{\hat{N}})

at

x_{1}, \dots, x_{K}

are easily computed as

O_{X, S} = O_{X} S

. Similarly, the tabulated values of

(U^{t} ψ_{1}, \dots, U^{t} ψ_{\hat{N}})

are

O_{Y, S} = O_{Y} S

.

Certainly,

\hat{N} \leq min (K, N)

. If e.g.,

K < N

, we aim at

\hat{N} = K

, but we will have option that a numerical algorithm decides whether that choice is feasible, depending on the level of ill-conditioning.

The subspace selection operator

S

should ensure that, if needed, the constructed basis of

F_{\hat{N}}

contains

ℓ < \hat{N}

a priori selected functions

℘_{i_{1}}, \dots, ℘_{i_{ℓ}}

(recall the identification procedure outlined in Section 3.2) and that

O_{X, S}

is well conditioned. This implicitly restricts ℓ to be at most K, and in practice

ℓ ≪ K

. Furthermore, the remaining

\hat{N} - ℓ

basis functions should be selected among the

℘_{j}

’s. In other words, we seek

S

as a selection of

\hat{N}

columns of the identity

I_{N}

. This is achieved by the following two step procedure:

(Optional) Define $S_{0}$ as the $N \times N$ permutation matrix that moves the selected functions to the leading positions, i.e., such that $(℘_{1}, \dots, ℘_{N}) S_{0} = (℘_{i_{1}}, \dots, ℘_{i_{ℓ}}, \dots)$ .
Add to these ℓ functions a selection of $\hat{N} - ℓ$ functions that are most linearly independent in the orthogonal complement of $span (℘_{i_{1}}, \dots, ℘_{i_{ℓ}})$ as seen on the discrete points $x_{1}, \dots, x_{K}$ .

The second step, which can be designated as basis pruning, is based on the numerical rank revealing techniques.

7.1. Pruning the Basis $(℘_{1}, \dots, ℘_{N})$

Removing the basis functions

℘_{j}

that carry numerically redundant information on the set

x_{1}, \dots, x_{K}

can be automated using the rank revealing QR factorization [23] as follows. First, compute the QR factorization of the selected functions, with an optional column pivoting

(O_{X} S_{0}) (:, 1 : ℓ) Π_{1} = Q_{1} (\begin{matrix} R_{11} \\ 0 \end{matrix}) = (\begin{matrix} . & . & . & . & . \\ . & . & . & . & . \\ . & . & . & . & . \\ . & . & . & . & . \\ . & . & . & . & . \end{matrix}) (\begin{matrix} * & * \\ 0 & * \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix}),

(36)

and then apply

Q_{1}^{*}

to the remaining

N - ℓ

columns to obtain

Q_{1}^{*} O_{X} S_{0} (Π_{1} \oplus I_{N - ℓ}) = (\begin{matrix} R_{11} & {\tilde{R}}_{12} \\ 0 & {\tilde{R}}_{22} \end{matrix}) = (\begin{matrix} * & * & \times & \times & \times & \times & \times & \times \\ 0 & * & \times & \times & \times & \times & \times & \times \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \end{matrix}) .

(37)

(The structure of the computed matrices is illustrated in (36), (37) for

K = 5

,

N = 8

,

ℓ = 2

.) Now, the columns of

{\tilde{R}}_{22}

are the coordinates of the projection of the trailing

N - ℓ

columns of

O_{X} S_{0}

onto the

K - ℓ

dimensional orthogonal complement of the span of the leading ℓ columns (of

O_{X} S_{0}

, at this moment assumed linearly independent). A well conditioned selection of

\hat{N} - ℓ

columns can be computed by another column pivoted (rank revealing) QR factorization

{\tilde{R}}_{22} Π_{2} = (\begin{matrix} + & + & + & + & + & + \\ + & + & + & + & + & + \\ + & + & + & + & + & + \end{matrix}) Π_{2} = Q_{2} (\begin{matrix} R_{22} & R_{23} \end{matrix}) = Q_{2} (\begin{matrix} 🟉 & 🟉 & 🟉 & x & x & x \\ 0 & 🟉 & 🟉 & x & x & x \\ 0 & 0 & 🟉 & x & x & x \end{matrix}) .

(38)

Altogether, we have the factorization

(\begin{matrix} I_{ℓ} & 0 \\ 0 & Q_{2}^{*} \end{matrix}) Q_{1}^{*} O_{X} S_{0} (\begin{matrix} Π_{1} & 0 \\ 0 & I_{N - ℓ} \end{matrix}) (\begin{matrix} I_{ℓ} & 0 \\ 0 & Π_{2} \end{matrix}) = (\begin{matrix} R_{11} & R_{12} & R_{13} \\ 0 & R_{22} & R_{23} \end{matrix}) = (\begin{matrix} * & * & \times & \times & \times & \times & \times & \times \\ 0 & * & \times & \times & \times & \times & \times & \times \\ 0 & 0 & 🟉 & 🟉 & 🟉 & x & x & x \\ 0 & 0 & 0 & 🟉 & 🟉 & x & x & x \\ 0 & 0 & 0 & 0 & 🟉 & x & x & x \end{matrix}),

and the leading

\hat{N}

columns of

O_{X} S_{0} (I \oplus Π_{2})

are the desired selection. Note that in this formula we have not used the permutation

(Π_{1} \oplus I_{N - ℓ})

of the first ℓ columns (here assumed independent so that

R_{11}

is nonsingular), to respect the requested ordering

℘_{i_{1}}, \dots, ℘_{i_{ℓ}}

encoded in

S_{0}

7.2. Well Conditioned Selection by Basis Pruning—General Case

In Section 7.1 we assumed that it was indeed possible to select

\hat{N}

linearly independent columns from

O_{X}

. This may not be the case; moreover, the mere linear independence in finite precision numerical computation is not enough. We need well-conditioned selection of the columns of

O_{X}

, i.e., well conditioned matrix

O_{X, S}

, but also well conditioned

O_{Y, S}

. The rank revealing pivoting (materialized in the permutation matrices

Π_{1}

,

Π_{2}

) will provide relevant information.

If the initially selected ℓ functions are nearly linearly dependent on the supplied snapshots, but

\hat{ℓ} < ℓ

of them can be considered well conditioned, then on the diagonal of

R_{11}

we will see that

| {(R_{11})}_{\hat{ℓ} \hat{ℓ}} | > tol | {(R_{11})}_{\hat{ℓ} + 1, \hat{ℓ} + 1} |

. In that case, we set

(R_{11}) (\hat{ℓ} + 1 : ℓ, \hat{ℓ} + 1 : ℓ) = 0

, and the submatrix

{\tilde{R}}_{22}

is now defined as the subarray at the positions

(\hat{ℓ} + 1 : K, \hat{ℓ} + 1 : N)

. Since its first column is now zero, the pivoting

Π_{2}

will eliminate it from further selections (unless the entire

{\tilde{R}}_{22}

is zero). To illustrate, assume that in (37) the

(2, 2)

position carries a value

ϵ

that is smaller than a prescribed threshold. Then, we have

Q_{1}^{*} O_{X} S_{0} (Π_{1} \oplus I_{N - ℓ}) = (\begin{matrix} R_{11} & {\tilde{R}}_{12} \\ 0 & {\tilde{R}}_{22} \end{matrix}) = (\begin{matrix} * & * & \times & \times & \times & \times & \times & \times \\ 0 & ϵ & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \end{matrix}) \approx (\begin{matrix} * & * & \times & \times & \times & \times & \times & \times \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \\ 0 & 0 & + & + & + & + & + & + \end{matrix}) .

(39)

Altogether, this discussion can be summarized in Algorithm 3.

Algorithm 3

[S, \hat{ℓ}, \hat{r}, Π_{1}, Π_{2}] = Subspace_Selection (O_{X}, S_{0}, \hat{N}, tol)

Input:: $O_{X} \in C^{K \times N}$ , $S_{0}$ , $\hat{N}$ , $tol$
1:: (Optional) Reorder the snapshots by simultaneous row permutation of $O_{X}$ and $O_{Y}$ ; see Remark 6.
2:: Bring the selected functions forward to the leading ℓ positions: $Q_{X} = Q_{X} S_{0}$ . Implement $S_{0}$ as a sequence of swaps to avoid excess data movement (in the case of large dimensions).
3:: $[Q_{1}, R_{1}, Π_{1}] = qr (O_{X} (:, 1 : ℓ))$ {Rank revealing QR factorization with column pivoting. Overwrite $R_{1} = {(R_{11}^{T}, 0)}^{T}$ over the leading ℓ columns of $O_{X}$ . See (36).}
4:: Determine the numerical rank $\hat{ℓ}$ of $R_{11}$ and in the case $\hat{ℓ} < ℓ$ set $R_{11} (\hat{ℓ} + 1 : ℓ, \hat{ℓ} + 1 : ℓ) = 0$ .
5:: $O_{X} (:, ℓ + 1 : N) = Q_{1}^{*} O_{X} (:, ℓ + 1 : N)$ .
6:: $[Q_{2}, R_{2}, Π_{2}] = qr (O_{X} (\hat{ℓ} + 1 : K, \hat{ℓ} + 1 : N))$ . {Rank revealing QR factorization with column pivoting. $R_{2} = (R_{22}, R_{23})$ overwrites $O_{X} (\hat{ℓ} + 1 : K, \hat{ℓ} + 1 : N)$ }
7:: Determine the numerical rank $\hat{r}$ of $R_{22}$ . Set $\tilde{N} = \hat{ℓ} + \hat{r}$ .
8:: $S = (S_{0} (Π_{1} \oplus I_{N - ℓ}) (I_{\hat{ℓ}} \oplus Π_{2})) (:, 1 : min (\hat{N}, \tilde{N}))$ ;
Output:: $S, \hat{ℓ}, \hat{r}, Π_{1}, Π_{2}$ .

7.3. Implementation Details

Remark 9.

Note that the case

\hat{ℓ} < ℓ

triggers an exception if the selected ℓ functions are essential in the overall computation, as for example in (17), (19), (30), (31). In the examples used in this paper

\hat{ℓ} = ℓ

, so that no additional action is needed.

Remark 10.

The columns of

O_{X}

can be severely ill-conditioned and the rank revealing QR factorization should be carefully implemented [33]. The thresholding strategy can vary from soft, mild, to hard, depending on the the concrete example, see [34]. Algorithm 3 can be used to determine the numerical rank but also to ensure that the condition number of the selected columns of

O_{X}

is the below specified value. This can be efficiently implemented using incremental condition estimators tailored for triangular matrices.

Remark 11.

If the column dimension

min (\hat{N}, \tilde{N})

of

{\hat{O}}_{X} = O_{X} S

and

{\hat{O}}_{Y} = O_{Y} S

is at most K, then we can also apply the original approach from Section 2. Furthermore, the pruning scheme can be also directly applied to the original method.

Remark 12.

Suppose that

N ≫ K

and that K is moderate or also big. Then computational complexity is an issue and the algorithm can be modified as follows:

In Line 3, ℓ is expected to be small or moderate compared to K, so that this step can be efficiently implemented using LAPACK (the functions xGEQRF and xGEQP3) or ScaLAPACK using PxGEQPF (but using the most recent implementation [35]). In the second part of the algorithm, we need a well-conditioned submatrix of a

(K - \hat{ℓ}) \times (N - \hat{ℓ})

submatrix of

Q_{1}^{*} O_{X} (:, ℓ + 1 : N)

. To do that, we do not need to compute the whole matrix. We can apply the scheme from [36] and sample the columns of

Q_{1}^{*} O_{X} (:, ℓ + 1 : N)

until we form a well-conditioned

R_{22}

.

7.4. Numerical Experiments with the Dictionary Pruning Algorithm

Example 7.

(Continuation of Example 6) We use the same data as in Example 6, but instead of

O_{X}

and

O_{Y}

we use K-column submatrices

O_{X, S}

,

O_{Y, S}

, selected by Algorithm 3 with the requirement that

℘_{1}, \dots, ℘_{n + 1}

must be kept in the subspace

F_{K}

. More precisely, we set

\hat{N} = K

, and the numerical rank is determined with

tol = 0

, so that

\tilde{N} = \hat{N} = K

. The results shown in Figure 10 show a significant improvement.

Example 8.

The purpose of this example is to illustrate the robustness of the proposed algorithm: we take the sampling interval as big as

[0, 30]

or

[0, 50]

with the resolution of the numerical simulation

δ t = 0.01

and

δ t = 0.1

, and the samples are taken, as before, at randomly selected time instances. For illustration, the time stamps of the snapshots are marked on the first generated trajectory (with

δ t = 0.01

) and shown in the first row in Figure 11.

The accuracy is satisfactory, given the length of the interval (

[0, 30]

), the discretization step of the simulation (

δ t = 0.01

,

δ t = 0.1

) and the number of samples. Next, we increase the interval to

[0, 50]

; the results are in Figure 12.

Now, we reduce the number of samples—from each trajectory we sample at five positions (instead of ten), giving the total of 180 instead of 360 snapshots

x_{k}

. The results are summarized in Figure 13.

In all examples, the rank revealing was conducted with a hard threshold, with no attempt in the direction of strong rank revealing, which could further improve the numerical accuracy. The details are omitted for the sake of brevity and will be available in our future work.

8. Concluding Remarks

In this work we provided the first steps towards a robust numerical software implementation of the method [8] for identification of dynamical systems using the infinitesimal generator of the Koopman semigroup. An adversarial testing using polynomial bases revealed critical numerical issues that we addressed in detail. We proposed two techniques: preconditioning and basis pruning. These are a basis for a new software implementation that will include other choices of basis functions, including data-driven/empirical constructions of the bases. This is subject of our ongoing and planned future work, including stochastic models.

Author Contributions

Conceptualization, Z.D., I.M. and R.M.; methodology, Z.D.; formal analysis, Z.D., I.M. and R.M; software, Z.D.; writing—original draft preparation, Z.D.; writing—review and editing, Z.D., I.M and R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the DARPA contract HR0011-18-9-0033.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors thank Alexandre Mauroy (University of Namur, Belgium) for his remarks and insights that helped us to improve the presentation.

Conflicts of Interest

The authors declare no conflict of interest.

References

Montáns, F.J.; Chinesta, F.; Gómez-Bombarelli, R.; Kutz, J.N. Data-driven modeling and learning in science and engineering. C. R. MÉCanique 2019, 347, 845–855. [Google Scholar] [CrossRef]
Brunton, S.L.; Proctor, J.L.; Kutz, J.N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 2016, 113, 3932–3937. Available online: https://www.pnas.org/content/113/15/3932.full.pdf (accessed on 20 June 2021). [CrossRef] [Green Version]
Chartrand, R. Numerical differentiation of noisy, nonsmooth data. ISRN Appl. Math. 2011, 2011, 164564. [Google Scholar] [CrossRef] [Green Version]
Messenger, D.A.; Bortz, D.M. Weak SINDy: Galerkin-based data-driven model selection. arXiv 2020, arXiv:2005.04339. [Google Scholar]
Rudy, S.H.; Brunton, S.L.; Proctor, J.L.; Kutz, J.N. Data-driven discovery of partial differential equations. Sci. Adv. 2017, 3. Available online: https://advances.sciencemag.org/content/3/4/e1602614.full.pdf (accessed on 20 June 2021). [CrossRef] [Green Version]
Goyal, P.; Benner, P. Discovery of nonlinear dynamical systems using a Runge-Kutta inspired dictionary-based sparse regression approach. arXiv 2021, arXiv:2105.04869. [Google Scholar]
Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
Mauroy, A.; Goncalves, J. Koopman-based lifting techniques for nonlinear systems identification. IEEE Trans. Autom. Control 2020, 65, 2550–2565. [Google Scholar] [CrossRef] [Green Version]
Engel, K.J.; Nagel, R. One-Parameter Semigroups for Linear Evolution Equations; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999; Volume 194. [Google Scholar]
Budišić, M.; Mohr, R.; Mezi’c, I. Applied Koopmanism. Chaos Interdiscip. J. Nonlinear Sci. 2012, 22, 047510. [Google Scholar] [CrossRef] [Green Version]
Lasota, A.; Mackey, M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics; Springer: Berlin/Heidelberg, Germany, 1994. [Google Scholar]
Mohr, R.; Drmač, Z.; Mezić, I.; Fonoberova, M. Composition Operator for Static Data; AIMDyn Tech Report; AIMDyn Inc.: Santa Barbara, CA, USA, 2019. [Google Scholar]
Niemann, J.H.; Klus, S.; Schütte, C. Data-driven model reduction of agent-based systems using the Koopman generator. PLoS ONE 2021, 16, e0250970. [Google Scholar] [CrossRef]
Klus, S.; Nüske, F.; Peitz, S.; Niemann, J.H.; Clementi, C.; Schütte, C. Data-driven approximation of the Koopman generator: Model reduction, system identification, and control. Phys. D Nonlinear Phenom. 2020, 406, 132416. [Google Scholar] [CrossRef] [Green Version]
Metzner, P.; Horenko, I.; Schütte, C. Generator estimation of Markov jump processes based on incomplete observations nonequidistant in time. Phys. Rev. E 2007, 76, 066702. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Siefert, M.; Kittel, A.; Friedrich, R.; Peinke, J. On a quantitative method to analyze dynamical and measurement noise. EPL (Europhys. Lett.) 2007, 61, 466. [Google Scholar] [CrossRef] [Green Version]
Friedrich, R.; Peinke, J.; Sahimi, M.; Reza Rahimi Tabar, M. Approaching complexity by stochastic methods: From biological systems to turbulence. Phys. Rep. 2011, 506, 87–162. [Google Scholar] [CrossRef]
Callaham, J.L.; Loiseau, J.C.; Rigas, G.; Brunton, S.L. Nonlinear stochastic modelling with Langevin regression. Proc. R. Soc. A Math. Phys. Eng. Sci. 2021, 477, 20210092. [Google Scholar] [CrossRef]
Riseth, A.N.; Taylor-King, J.P. Operator fitting for parameter estimation of stochastic differential equations. arXiv 2017, arXiv:1709.05153. [Google Scholar]
Björck, A. Numerical Methods in Matrix Computations; Springer: Berlin/Heidelberg, Germany, 2015; Volume 59. [Google Scholar] [CrossRef]
Williams, M.O.; Kevrekidis, I.G.; Rowley, C.W. A data–driven approximation of the Koopman operator: Extending Dynamic Mode Decomposition. J. Nonlinear Sci. 2015, 25, 1307–1346. [Google Scholar] [CrossRef] [Green Version]
Mezić, I.; Drmač, Z.; Črnjarić Žic, N.; Maćešić, S.; Fonoberova, M.; Mohr, R.; Avila, A.M.; Manojlović, I.; Andrejčuk, A. A Koopman operator-based prediction algorithm and its application to COVID-19 pandemic. Nat. Commun. submitted.
Businger, P.A.; Golub, G.H. Linear least squares solutions by Householder transformations. Numer. Math. 1965, 7, 269–276. [Google Scholar] [CrossRef]
Culver, W.J. On the existence and uniqueness of the real logarithm of a matrix. Proc. Am. Math. Soc. 1966, 17, 1146–1151. [Google Scholar] [CrossRef]
Higham, N.J. Functions of Matrices: Theory and Computation; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2008; p. xx+425. [Google Scholar]
Kinyon, M.; Sagle, A. Quadratic dynamical systems and algebras. J. Differ. Equations 1995, 117, 67–126. [Google Scholar] [CrossRef] [Green Version]
Demmel, J.W. The geometry of ill-conditioning. J. Complex. 1987, 3, 201–229. [Google Scholar] [CrossRef] [Green Version]
Rust, B.W. Truncating the Singular Value Decomposition for ill–Posed Problems; Technical Report NISTIR 6131; Mathematical and Computational Sciences Division, National Institue of Standards and Technology, U.S. Department of Commerce, NIST: Gaithersburg, MD, USA, 1998.
Drmač, Z.; Mezić, I.; Mohr, R. Data driven modal decompositions: Analysis and enhancements. SIAM J. Sci. Comput. 2018, 40, A2253–A2285. [Google Scholar] [CrossRef]
Chandrasekaran, S.; Ipsen, I.C.F. On rank–revealing factorizations. SIAM J. Matrix Anal. Appl. 1994, 15, 592–622. [Google Scholar] [CrossRef] [Green Version]
Powell, M.J.D.; Reid, J.K. On applying Householder transformations to linear least squares problems. In Information Processing 68, Proceedings of the International Federation of Information Processing Congress, Edinburgh, UK, 5–10 August 1968; IFIP: Amsterdam, The Netherlands, 1969; pp. 122–126. [Google Scholar]
Cox, A.J.; Higham, N.J. Stability of Householder QR factorization for weighted least squares problems. In Numerical Analysis 1997, Proceedings of the 17th Dundee Biennial Conference, Dundee, UK, 24–27 June 1997; Griffiths, D.F., Higham, D.J., Watson, G.A., Eds.; Pitman Research Notes in Mathematics; Pitman: London, UK, 1998; Volume 380, pp. 57–73. [Google Scholar]
Drmač, Z.; Bujanović, Z. On the failure of rank revealing QR factorization software—A case study. ACM Trans. Math. Softw. 2008, 35, 1–28. [Google Scholar] [CrossRef]
Drmač, Z.; Šain Glibić, I. New numerical algorithm for deflation of infinite and zero eigenvalues and full solution of quadratic eigenvalue problems. arXiv 2019, arXiv:1904.05418. [Google Scholar]
Bujanović, Z.; Drmač, Z. New robust ScaLAPACK routine for computing the QR factorization with column pivoting. arXiv 2019, arXiv:1910.05623. [Google Scholar]
Drmač, Z.; Gugercin, S. A new selection operator for the iscrete empirical interpolation method—improved a priori error bound and extensions. SIAM J. Sci. Comput. 2016, 38, A631–A648. [Google Scholar] [CrossRef]

Figure 1. (Example 2,

m = 3

.) First row: samples of

x_{k}

from the first three trajectories, using two different sampling schemes. Second row: the corresponding values of

{log}_{10} ϵ_{k}

defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

.

Figure 1. (Example 2,

m = 3

.) First row: samples of

x_{k}

from the first three trajectories, using two different sampling schemes. Second row: the corresponding values of

{log}_{10} ϵ_{k}

defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

.

Figure 2. (Example 3,

m = 9

.) The values of

{log}_{10} ϵ_{k}

defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

. Left panel: with

m = 3

as in Example 1. Right panel: with

m = 9

.

Figure 2. (Example 3,

m = 9

.) The values of

{log}_{10} ϵ_{k}

defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

. Left panel: with

m = 3

as in Example 1. Right panel: with

m = 9

.

Figure 3. (Example 3,

m = 9

.) Left panel: the (computed) eigenvalues of the matrix representation of the computed compression

U_{N}

of

U^{t}

. The red cross at the origin indicates a cluster of eigenvalues. Right panel: zoomed neighborhood of the origin, showing many (in this case 44) absolutely small eigenvalues, quite a few of which are negative real. The matrix

U_{N}

is computed as

(pinv (O_{X}) * O_{Y})

. If it is computed as

(O_{X} \ O_{Y})

, instead of the cluster around zero, the eigenvalue zero appears with multiplicity 45.

Figure 3. (Example 3,

m = 9

.) Left panel: the (computed) eigenvalues of the matrix representation of the computed compression

U_{N}

of

U^{t}

. The red cross at the origin indicates a cluster of eigenvalues. Right panel: zoomed neighborhood of the origin, showing many (in this case 44) absolutely small eigenvalues, quite a few of which are negative real. The matrix

U_{N}

is computed as

(pinv (O_{X}) * O_{Y})

. If it is computed as

(O_{X} \ O_{Y})

, instead of the cluster around zero, the eigenvalue zero appears with multiplicity 45.

Figure 4. (Example 3,

m = 9

.) The (computed) eigenvalues of the matrix

L_{N}

. Note that some of them are at the boundary of the strip

{z \in C : - π / δ t < ℑ (z) < π / δ t}

, i.e., the eigenvalues of

log U_{N}

are at the boundary of

{z \in C : - π < ℑ (z) < π}

, cf. Theorem 3. The right panel shows the distribution of the eigenvalues closer to the origin. Compare with Figure 3.

Figure 4. (Example 3,

m = 9

.) The (computed) eigenvalues of the matrix

L_{N}

. Note that some of them are at the boundary of the strip

{z \in C : - π / δ t < ℑ (z) < π / δ t}

, i.e., the eigenvalues of

log U_{N}

are at the boundary of

{z \in C : - π < ℑ (z) < π}

, cf. Theorem 3. The right panel shows the distribution of the eigenvalues closer to the origin. Compare with Figure 3.

Figure 5. (Example 3,

m = 9

.) Left panel: the first column of

U_{N}

, computed in Matlab as

pinv (O_{X}) * O_{Y}

. Its norm is

∥ U_{N} {(:, 1) ∥}_{2} \approx 5.0350 \times 10^{- 4}

. Right panel: the first column of

U_{N} = O_{X} \ O_{Y}

, with norm

∥ U_{N} {(:, 1) ∥}_{2} \approx 5.9611 \times 10^{- 4}

. The true value of

U_{N} (:, 1)

should be

e_{1} = {(1, 0, \dots, 0)}^{T}

.

Figure 5. (Example 3,

m = 9

.) Left panel: the first column of

U_{N}

, computed in Matlab as

pinv (O_{X}) * O_{Y}

. Its norm is

∥ U_{N} {(:, 1) ∥}_{2} \approx 5.0350 \times 10^{- 4}

. Right panel: the first column of

U_{N} = O_{X} \ O_{Y}

, with norm

∥ U_{N} {(:, 1) ∥}_{2} \approx 5.9611 \times 10^{- 4}

. The true value of

U_{N} (:, 1)

should be

e_{1} = {(1, 0, \dots, 0)}^{T}

.

Figure 6. (Example 3,

m = 9

.) The sparsity structure of

pinv (O_{X}) * O_{Y}

(numerical rank 176) and

O_{X} \ O_{Y}

(numerical rank 175). The backslash operator uses the rank revealing (column pivoted) QR factorization and, by truncation, returns a sparse solution.

Figure 6. (Example 3,

m = 9

.) The sparsity structure of

pinv (O_{X}) * O_{Y}

(numerical rank 176) and

O_{X} \ O_{Y}

(numerical rank 175). The backslash operator uses the rank revealing (column pivoted) QR factorization and, by truncation, returns a sparse solution.

Figure 7. The (computed) singular values of

O_{X} \in R^{3025 \times 220}

and

U_{N} \in R^{220 \times 220}

, and the column norms. The numerical rank of

O_{X}

and

U_{N}

is 176, which is considerably below the full rank 220. If

U_{N}

is computed as

O_{X} \ O_{Y}

, the numerical rank is 175. The results are similar (with the numerical rank 163) if the number of snapshots is increased to 24,025 as explained in Example 3.

Figure 7. The (computed) singular values of

O_{X} \in R^{3025 \times 220}

and

U_{N} \in R^{220 \times 220}

, and the column norms. The numerical rank of

O_{X}

and

U_{N}

is 176, which is considerably below the full rank 220. If

U_{N}

is computed as

O_{X} \ O_{Y}

, the numerical rank is 175. The results are similar (with the numerical rank 163) if the number of snapshots is increased to 24,025 as explained in Example 3.

Figure 8. Left panel: The (computed) eigenvalues of

U_{N} \in R^{220 \times 220}

(×), and the eigenvalues of

e x p (δ t L_{N})

(∘). The maximal relative difference between the matching eigenvalues is computed as

8.1 \times 10^{- 9}

. Here,

U_{N}

is computed as

U_{N} = Π R^{- 1} Q^{T} O_{Y}

without any truncation of R. The diagonal entries of R span, in absolute value, the range between

1.9 \times 10^{16}

and

1.0 \times 10^{1}

. This reveals the condition number of

U_{N}

which is computed as

7.4 \times 10^{16}

by the Matlab function cond(). Right panel: The values of

{log}_{10} ϵ_{k}

are defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

. Compare with Figure 3, Figure 4 and Figure 7.

Figure 8. Left panel: The (computed) eigenvalues of

U_{N} \in R^{220 \times 220}

(×), and the eigenvalues of

e x p (δ t L_{N})

(∘). The maximal relative difference between the matching eigenvalues is computed as

8.1 \times 10^{- 9}

. Here,

U_{N}

is computed as

U_{N} = Π R^{- 1} Q^{T} O_{Y}

without any truncation of R. The diagonal entries of R span, in absolute value, the range between

1.9 \times 10^{16}

and

1.0 \times 10^{1}

. This reveals the condition number of

U_{N}

which is computed as

7.4 \times 10^{16}

by the Matlab function cond(). Right panel: The values of

{log}_{10} ϵ_{k}

are defined in (24) for 12,000 randomly selected points in the box

[- 20, 20] \times [- 20, 20] \times [0, 50]

. Compare with Figure 3, Figure 4 and Figure 7.

Figure 9. The errors

ϵ_{k}^{(i)}

(34). Left panel: time interval

[0, 0.1]

. The total error is

τ = 2.2259 \times 10^{- 2}

. Right panel: time interval

[0, 0.18]

. The total error is

τ = 4.6174 \times 10^{2}

. Similar results are obtained using logm(Y∗pinv(X)). On the other hand, while on the interval

[0, 0.19]

logm(Y/X) fails producing NaN’s, logm(Y∗pinv(X)) completed without NaN exceptions, alas with the error

τ = 5.1718 \times 10^{2}

and with

ϵ_{k}^{(i)}

similar to that in the right panel above.

Figure 9. The errors

ϵ_{k}^{(i)}

(34). Left panel: time interval

[0, 0.1]

. The total error is

τ = 2.2259 \times 10^{- 2}

. Right panel: time interval

[0, 0.18]

. The total error is

τ = 4.6174 \times 10^{2}

. Similar results are obtained using logm(Y∗pinv(X)). On the other hand, while on the interval

[0, 0.19]

logm(Y/X) fails producing NaN’s, logm(Y∗pinv(X)) completed without NaN exceptions, alas with the error

τ = 5.1718 \times 10^{2}

and with

ϵ_{k}^{(i)}

similar to that in the right panel above.

Figure 10. (Example 7.) The errors

ϵ_{k}^{(i)}

(34). Left panel: time interval

[0, 0.1]

. The total error is

τ = 1.1516 \times 10^{- 2}

(

1.2406 \times 10^{- 2}

for logm(Y∗pinv(X))). Right panel: time interval

[0, 0.18]

. The total error is

τ = 1.7873 \times 10^{- 2}

(

9.4987 \times 10^{2}

for logm(Y∗pinv(X))). Compare this with Figure 9.

Figure 10. (Example 7.) The errors

ϵ_{k}^{(i)}

(34). Left panel: time interval

[0, 0.1]

. The total error is

τ = 1.1516 \times 10^{- 2}

(

1.2406 \times 10^{- 2}

for logm(Y∗pinv(X))). Right panel: time interval

[0, 0.18]

. The total error is

τ = 1.7873 \times 10^{- 2}

(

9.4987 \times 10^{2}

for logm(Y∗pinv(X))). Compare this with Figure 9.

Figure 11. (Example 8.) First row: the time stamps of

x_{1}, \dots, x_{360}

, illustrated on the first out of 12 generated trajectories. Three consecutive snapshots, with time lag

δ t = 0.01

, are taken at ten randomly selected and fixed time instances. Second row: The first plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

, and the second plot for

δ t = 0.1

(sampled on another randomly selected times). The total errors are

τ = 1.4893 \times 10^{- 1}

and

τ = 8.6613 \times 10^{- 1}

, respectively.

Figure 11. (Example 8.) First row: the time stamps of

x_{1}, \dots, x_{360}

, illustrated on the first out of 12 generated trajectories. Three consecutive snapshots, with time lag

δ t = 0.01

, are taken at ten randomly selected and fixed time instances. Second row: The first plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

, and the second plot for

δ t = 0.1

(sampled on another randomly selected times). The total errors are

τ = 1.4893 \times 10^{- 1}

and

τ = 8.6613 \times 10^{- 1}

, respectively.

Figure 12. (Example 8.) The first plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

, and the second plot for

δ t = 0.1

(sampled on another randomly selected times in

[0, 50]

). The total errors are

τ = 1.8672 \times 10^{- 1}

and

τ = 1.8278 \times 10^{0}

, respectively. In the second plot, we used the real parts of the computed approximations

\tilde{F_{i} (x_{k})}

, as a non-principal (non-real) value of the logarithm was computed.

Figure 12. (Example 8.) The first plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

, and the second plot for

δ t = 0.1

(sampled on another randomly selected times in

[0, 50]

). The total errors are

τ = 1.8672 \times 10^{- 1}

and

τ = 1.8278 \times 10^{0}

, respectively. In the second plot, we used the real parts of the computed approximations

\tilde{F_{i} (x_{k})}

, as a non-principal (non-real) value of the logarithm was computed.

Figure 13. (Example 8.) The first plot shows the five positions at a sample trajectory where three consecutive snapshots are taken. The matrix

O_{X}

is

180 \times 455

. The reduced dimension

\tilde{N}

is 64. The second plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

. The total error is

τ = 4.4319 \times 10^{- 1}

.

Figure 13. (Example 8.) The first plot shows the five positions at a sample trajectory where three consecutive snapshots are taken. The matrix

O_{X}

is

180 \times 455

. The reduced dimension

\tilde{N}

is 64. The second plot shows the relative errors

ϵ_{k}^{(i)}

with

δ t = 0.01

. The total error is

τ = 4.4319 \times 10^{- 1}

.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Drmač, Z.; Mezić, I.; Mohr, R. Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method. Mathematics 2021, 9, 2075. https://doi.org/10.3390/math9172075

AMA Style

Drmač Z, Mezić I, Mohr R. Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method. Mathematics. 2021; 9(17):2075. https://doi.org/10.3390/math9172075

Chicago/Turabian Style

Drmač, Zlatko, Igor Mezić, and Ryan Mohr. 2021. "Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method" Mathematics 9, no. 17: 2075. https://doi.org/10.3390/math9172075

APA Style

Drmač, Z., Mezić, I., & Mohr, R. (2021). Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method. Mathematics, 9(17), 2075. https://doi.org/10.3390/math9172075

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Identification of Nonlinear Systems Using the Infinitesimal Generator of the Koopman Semigroup—A Numerical Implementation of the Mauroy–Goncalves Method

Abstract

1. Introduction

Contributions and Organisation of the Paper

2. Preliminaries: Finite Dimensional Compression of U and Its Logarithm

2.1. Compression of U t and the Anatomy of Its Matrix Representation

2.1.1. Discrete Least Squares Projection Φ N : F ⟶ F N

2.1.2. Matrix Representation of Φ N U | F N t : F N ⟶ F N

2.1.3. When Is U N Nonsingular?

2.1.4. Relations with the DMD

2.2. On the Numerical Solution of ∥ O X U N − O Y ∥ F → m i n

2.2.1. Least Squares Solution in Case of Numerical Rank Deficiency

2.3. Computing the Logarithm log O X † O Y

3. Identification Method

3.1. The Choice of the Basis B —Monomials

3.2. Compression of L in the Monomial Basis

3.3. Imposing the Structure in the Reconstruction of F

3.3.1. Quadratic Systems

4. Numerical Implementation—A Case Study Analysis

4.1. An Example: Lorenz System

4.2. What Went Wrong

5. Computing [ L N ] B with Preconditioning

5.1. A Simple Modification

5.2. Preconditioning the Logarithm of O X † O Y

5.2.1. Scaled QR Factorization Based Preconditioner

5.2.2. Pivoted QR Factorization Based Preconditioner

6. Dual Method

6.1. A Rayleigh Quotient Formulation

Global Identification of F

6.2. A Numerical Example

7. Subspace Selection

7.1. Pruning the Basis ( ℘ 1 , … , ℘ N )

7.2. Well Conditioned Selection by Basis Pruning—General Case

7.3. Implementation Details

7.4. Numerical Experiments with the Dictionary Pruning Algorithm

8. Concluding Remarks

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

2. Preliminaries: Finite Dimensional Compression of $U$ and Its Logarithm

2.1. Compression of $U^{t}$ and the Anatomy of Its Matrix Representation

2.1.1. Discrete Least Squares Projection $Φ_{N} : F ⟶ F_{N}$

2.1.2. Matrix Representation of $Φ_{N} U_{| F_{N}}^{t} : F_{N} ⟶ F_{N}$

2.1.3. When Is $U_{N}$ Nonsingular?

2.2. On the Numerical Solution of $∥ O_{X} U_{N} - O_{Y} ∥_{F} \to m i n$

2.3. Computing the Logarithm $log O_{X}^{†} O_{Y}$

3.1. The Choice of the Basis $B$ —Monomials

3.2. Compression of $L$ in the Monomial Basis

5. Computing ${[L_{N}]}_{B}$ with Preconditioning

5.2. Preconditioning the Logarithm of $O_{X}^{†} O_{Y}$

Global Identification of $F$

7.1. Pruning the Basis $(℘_{1}, \dots, ℘_{N})$