1. Introduction
Information geometry and geometric mechanics each induce geometric structures on an arbitrary manifold $Q$, and we investigate the relationship between these two approaches. More specifically, we study the interaction of three objects: $TQ$, the tangent bundle on which a Lagrangian function $L(q, v)$ is defined; $T^*Q$, the cotangent bundle on which a Hamiltonian function $H(q, p)$ is defined; and $Q \times Q$, the product manifold on which the divergence function (from the information geometric perspective) or the Type I generating function (from the geometric mechanics perspective) is defined. In discrete mechanics, the $Q \times Q \leftrightarrow T^*Q$ correspondence is via a symplectomorphism given by the time-$h$ flow map associated with the Hamiltonian $H$, and the $Q \times Q \leftrightarrow TQ$ correspondence is via the map relating the boundary-value and initial-value formulations of the Euler–Lagrange flow. It is, however, the $TQ \leftrightarrow T^*Q$ correspondence through the fiberwise Legendre map based on $L$ or $H$ that actually couples the Hamiltonian flow with the Lagrangian flow, leading to one and the same dynamics. We propose a decoupling of the Lagrangian and Hamiltonian dynamics through the use of a divergence function defined on the Pontryagin bundle $TQ \oplus T^*Q$ that measures the discrepancy (or duality gap) between $L$ and $H$. We also establish, through variational error analysis, that the divergence function agrees with the exact discrete Lagrangian up to third order if and only if $Q$ is a Hessian manifold.
Geometric mechanics [1] investigates the equations of motion using the Lagrangian, Hamiltonian, and Hamilton–Jacobi formulations of classical Newtonian mechanics. Two apparently different principles are used in these formulations: the principle of conservation (energy, momentum, etc.), leading to Hamiltonian dynamics, and the principle of variation (least action), leading to Lagrangian dynamics. The conservation properties of the Hamiltonian approach are with respect to the underlying symplectic geometry on the cotangent bundle, whereas the variational principles that result in the Euler–Lagrange equations of motion and the Hamilton–Jacobi equations reflect the geometry of the semispray on the tangent bundle. Lagrangian and Hamiltonian mechanics are two sides of the same coin: that they describe identical dynamics on the configuration space (base manifold) is remarkable, yet also to be expected from their construction, because the Lagrangian and Hamiltonian are dual to each other and related via the Legendre transform.
Information geometry [2,3], in the broadest sense of the term, provides a dualistic Riemannian geometric structure that is induced by a class of functions called divergence functions, which essentially provide a method of smoothly measuring a directed distance between any two points on the manifold, where the manifold is the space of probability densities. It arises in various branches of information science, including statistical inference, machine learning, neural computation, information theory, optimization and control, etc. Various geometric structures can be induced from divergence functions, including metric, affine connection, symplectic structure, etc., and this is reviewed in [4]. Convex duality and the Legendre transform play a key role in both constructing the divergence function and characterizing the various dualities encoded by information geometry [5,6].
Given that geometric mechanics and information geometry both prescribe dualistic geometric structures on a manifold, it is interesting to explore the extent to which these two frameworks are related. In geometric mechanics, the Legendre transform provides a link between the Hamiltonian function that is defined on the cotangent bundle $T^*Q$ and the Lagrangian function that is defined on the tangent bundle $TQ$, whereas in information geometry, it provides a link between the biorthogonal coordinates of the base manifold $Q$ if it is dually-flat and exhibits the Hessian geometry. To understand their deep relationship, it turns out that we need to resort to the discrete formulation of geometric mechanics, and investigate the product manifold $Q \times Q$. The basic tenet of discrete geometric mechanics is to preserve the fact that Hamiltonian flow is a symplectomorphism, and construct discrete time maps that are symplectic. This results in two ways of viewing discrete-time mechanics, either as maps on $Q \times Q$ or $T^*Q$, which are related via discrete Legendre transforms. The shift in focus from $TQ$ to $Q \times Q$ precisely lends itself to establishing a connection to information geometry, as the divergence function is naturally defined on $Q \times Q$, and in both information geometry and discrete geometric mechanics, induces a symplectic structure on $Q \times Q$. This is the basic observation that connects geometric mechanics and information geometry, and we will explore the implications of this connection in the paper.
Our paper is organized as follows. Section 2 provides a contemporary viewpoint of geometric mechanics, with Lagrangian and Hamiltonian systems discussed in parallel with one another, in terms of geometry on $TQ$ and $T^*Q$, respectively, including a discussion of Dirac mechanics on the Pontryagin bundle $TQ \oplus T^*Q$, which provides a unified treatment of Lagrangian and Hamiltonian mechanics. Section 3 summarizes the results of the discrete formulation of geometric mechanics, which is naturally defined on the product manifold $Q \times Q$. Section 4 is a review of now-classical information geometry, including the Riemannian metric and affine connections on the manifold, and the manner in which the divergence function naturally induces dualistic Riemannian structures. The special cases of Hessian geometry and biorthogonal coordinates are highlighted, showing how the Legendre transform is essential for characterizing dually-flat spaces. Section 5 starts with a presentation of the symplectic structure on the product manifold induced by a divergence function, which is naturally identified with the Type I generating function on it. We follow up by investigating its transformation into a Type II generating function (which plays a key role in discrete Hamiltonian mechanics). We then propose to decouple the discrete Hamiltonian and Lagrangian dynamics by using the divergence function to measure their duality gap. Finally, we perform variational error analysis to show that on a dually-flat Hessian manifold, the Bregman divergence is third-order accurate with respect to the exact discrete Lagrangian. Section 6 closes with a summary and discussion.
2. A Review of Geometric Mechanics
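Consider an $n$-dimensional configuration manifold $Q$, with local coordinates $q = (q^1, \ldots, q^n)$. The Lagrangian formulation of mechanics is defined on the tangent bundle $TQ$, in terms of a Lagrangian $L: TQ \to \mathbb{R}$. From this, one can construct an action integral $S$, which is a functional of the curve $q: [0, T] \to Q$, given by
$$S(q) = \int_0^T L(q(t), \dot q(t))\, dt.$$
Then, Hamilton’s variational principle states that
$$\delta S = 0, \tag{1}$$
where the variation $\delta S$ is induced by an infinitesimal variation $\delta q$ of the trajectory $q$, subject to the condition that the variations vanish at the endpoints, i.e., $\delta q(0) = \delta q(T) = 0$. Applying standard results from the calculus of variations, we obtain the following Euler–Lagrange equations of motion,
$$\frac{d}{dt} \frac{\partial L}{\partial \dot q^i} - \frac{\partial L}{\partial q^i} = 0. \tag{2}$$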
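The Hamiltonian formulation of mechanics is defined on the cotangent bundle $T^*Q$, and the fiberwise Legendre transform, $\mathbb{F}L: TQ \to T^*Q$, relates the tangent bundle and the cotangent bundle as follows,
$$(q^i, \dot q^i) \mapsto (q^i, p_i),$$
where $p_i$ is the conjugate momentum to $q^i$:
$$p_i = \frac{\partial L}{\partial \dot q^i}. \tag{3}$$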
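The term fiberwise is used to emphasize the fact that $\mathbb{F}L$ establishes a pointwise correspondence between $T_qQ$ and $T^*_qQ$ for any point $q$ on $Q$. The cotangent bundle forms the phase space, on which one can define a Hamiltonian $H: T^*Q \to \mathbb{R}$,
$$H(q, p) = \langle p, \dot q \rangle - L(q, \dot q),$$
where $\dot q$ is viewed as a function of $(q, p)$ by inverting the Legendre transform (3), and $\langle p, v \rangle$ denotes the duality or natural pairing between a vector $v$ and covector $p$ at the point $q \in Q$. A straightforward calculation shows that
$$\frac{\partial H}{\partial p_i} = \dot q^i$$
and
$$\frac{\partial H}{\partial q^i} = -\frac{\partial L}{\partial q^i}.$$
From these, we transform the Euler–Lagrange equations into Hamilton’s equations,
$$\dot q^i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q^i}. \tag{5}$$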
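The relations above can be checked numerically. The following is a minimal sketch, assuming (for illustration only, not taken from the text) a one-dimensional Lagrangian $L(q, v) = \tfrac{1}{2} m v^2 - V(q)$ with $V(q) = -\cos q$ and $m = 2$; all function names are hypothetical. It computes $p = \partial L/\partial v$ by central differences, builds $H(q, p) = p\,v - L(q, v)$, and compares $\partial H/\partial p$ with $v$ and $\partial H/\partial q$ with $-\partial L/\partial q$.

```python
import math

# Assumed example system (not from the text): L(q, v) = (1/2) m v^2 - V(q),
# with V(q) = -cos(q) and mass m = 2.
M = 2.0

def V(q):
    return -math.cos(q)

def L(q, v):
    return 0.5 * M * v * v - V(q)

def legendre(q, v, eps=1e-6):
    """Fiberwise Legendre transform p = dL/dv, by central differences."""
    return (L(q, v + eps) - L(q, v - eps)) / (2 * eps)

def velocity_from_momentum(q, p):
    """Inverse Legendre transform; closed form because L is quadratic in v."""
    return p / M

def H(q, p):
    """H(q, p) = p * v - L(q, v), with v expressed through (q, p)."""
    v = velocity_from_momentum(q, p)
    return p * v - L(q, v)

def partial(f, args, i, eps=1e-6):
    """Central-difference partial derivative of f in its i-th argument."""
    hi, lo = list(args), list(args)
    hi[i] += eps
    lo[i] -= eps
    return (f(*hi) - f(*lo)) / (2 * eps)

q, v = 0.7, 1.3
p = legendre(q, v)                 # conjugate momentum, expected m * v
dH_dp = partial(H, (q, p), 1)      # expected to equal v
dH_dq = partial(H, (q, p), 0)      # expected to equal -dL/dq
dL_dq = partial(L, (q, v), 0)
print(p, dH_dp, dH_dq, dL_dq)
```

The inverse Legendre transform is written in closed form here only because the assumed Lagrangian is quadratic in the velocity; for a general hyperregular Lagrangian it would have to be obtained by solving $p = \partial L/\partial v$ for $v$.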
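The canonical symplectic form $\omega$ on $T^*Q$ can be identified with a quadratic form induced by the skew-symmetric matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, i.e., $\omega(v, w) = v^T J w$. With that identification, and writing $z = (q, p)$, Hamilton’s equations can be expressed as,
$$\dot z = J \nabla H(z).$$
Alternatively, Hamilton’s equations (5) can be derived using Hamilton’s phase space variational principle, which states that,
$$\delta \int_0^T \left[ p_i \dot q^i - H(q, p) \right] dt = 0,$$
for infinitesimal variations $\delta q$ that vanish at the endpoints. The infinitesimal variation of the integral is computed by differentiating under the integral, integrating by parts, and using the fact that the infinitesimal variations $\delta q$ vanish at the endpoints, which yields:
$$\int_0^T \left[ \left( \dot q^i - \frac{\partial H}{\partial p_i} \right) \delta p_i - \left( \dot p_i + \frac{\partial H}{\partial q^i} \right) \delta q^i \right] dt = 0,$$
and by the fundamental theorem of the calculus of variations, which states that the integral is stationary only when the terms in the parentheses multiplying the independent variations $\delta q$ and $\delta p$ vanish, we recover Hamilton’s equations (5).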
Lagrangian and Hamiltonian mechanics are typically viewed as different representations of the same dynamical system, with the Legendre transform relating the two formulations. Here, the Legendre transform $\mathbb{F}L$ (with $\mathbb{F}H$ as its inverse) refers both to the map relating two sets of variables, $(q, v) \leftrightarrow (q, p)$ with $p = \partial L/\partial v$, and to the relationship between two functions, the Lagrangian $L(q, v)$ and the Hamiltonian $H(q, p)$. The Legendre transform links pairs of convex conjugate functions; in classical mechanics, the Lagrangian $L$ and Hamiltonian $H$ are always related in this sense of forming a convex pair. The requirement that $L(q, v)$ be strictly convex in the variable $v$ is referred to as hyperregularity. When the Lagrangian is positively homogeneous (or singular), the Legendre transform yields a Hamiltonian function that is identically zero. In such cases, the Hamiltonian analogue of the Lagrangian system does not exist, which is problematic in the context of analytic mechanics. In order to address such degeneracy, it is necessary to consider Dirac mechanics on Dirac manifolds, which is a simultaneous generalization of Lagrangian and Hamiltonian mechanics.
In geometric mechanics, including the contemporary Dirac formulation, the Lagrangian $L$ and Hamiltonian $H$ are always coupled via the fiberwise Legendre transform $\mathbb{F}L$. In information geometry, it is a well-known fact that one can construct the divergence function (to be defined later), which captures the departure from such perfect coupling. In other words, we can view Lagrangian and Hamiltonian systems as two separate systems, which are endowed with their own dynamics and are in some sense dual to each other, and we then use the divergence function to measure their duality gap. For this reason, we will review the Lagrangian and Hamiltonian formulations of mechanics in terms of $TQ$ and $T^*Q$, respectively, without necessarily assuming that the Lagrangian and Hamiltonian are related by the Legendre transform.
2.1. Lagrangian Mechanics as an Extremization System on $TQ$
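As noted previously, the Euler–Lagrange equations (2) arise from the stationarity conditions that describe the extremal curves of the action integral, over the class of varied curves that fix the endpoints. Carrying out the differentiation in (2) explicitly yields,
$$\frac{\partial^2 L}{\partial \dot q^i \partial \dot q^j} \ddot q^j + \frac{\partial^2 L}{\partial \dot q^i \partial q^j} \dot q^j - \frac{\partial L}{\partial q^i} = 0. \tag{6}$$
The fundamental tensor associated with the Lagrangian $L$ is given by,
$$g_{ij}(q, \dot q) = \frac{\partial^2 L}{\partial \dot q^i \partial \dot q^j},$$
which is assumed to be positive-definite, i.e., the Lagrangian $L$ is hyperregular. Let $g^{ij}$ denote the matrix inverse of $g_{ij}$; then (6) can be written as
$$\ddot q^i + 2 G^i(q, \dot q) = 0, \tag{7}$$
where
$$G^i(q, \dot q) = \frac{1}{2} g^{ij} \left( \frac{\partial^2 L}{\partial \dot q^j \partial q^k} \dot q^k - \frac{\partial L}{\partial q^j} \right).$$
So, Equation (7) with the above $G^i$ is the system of Euler–Lagrange equations in disguise, and its solutions are extremal curves of the action integral.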
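As a worked illustration, consider the kinetic-energy Lagrangian of a Riemannian metric $g_{ij}(q)$. This choice of Lagrangian is an assumption made here for illustration, and the semispray is taken in the convention $\ddot q^i + 2G^i = 0$ with velocities written as $v^i$. Under these assumptions the coefficients $G^i$ reduce to the Christoffel symbols $\Gamma^i_{kl}$ of $g$, so the semispray equation becomes the geodesic equation:

```latex
% Assumed example: L(q, v) = \tfrac{1}{2} g_{ij}(q) v^i v^j
\begin{align*}
\frac{\partial^2 L}{\partial v^i \partial v^j} &= g_{ij}(q), \\
\frac{\partial^2 L}{\partial v^j \partial q^k} v^k - \frac{\partial L}{\partial q^j}
  &= \frac{\partial g_{jl}}{\partial q^k} v^k v^l
   - \frac{1}{2} \frac{\partial g_{kl}}{\partial q^j} v^k v^l, \\
2 G^i &= g^{ij} \cdot \frac{1}{2} \left( \frac{\partial g_{jl}}{\partial q^k}
   + \frac{\partial g_{jk}}{\partial q^l}
   - \frac{\partial g_{kl}}{\partial q^j} \right) v^k v^l
   = \Gamma^i_{kl}(q)\, v^k v^l, \\
\ddot q^i + \Gamma^i_{kl}(q)\, \dot q^k \dot q^l &= 0 .
\end{align*}
```

The third line uses the symmetry of $v^k v^l$ to symmetrize the first term in $k$ and $l$.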
Recall that a smooth curve on $Q$ can be lifted to a curve on $TQ$ in a natural way: a curve $t \mapsto q(t)$ becomes $t \mapsto (q(t), \dot q(t))$. Given an arbitrary $G^i(q, v)$, the system of equations (7) specifies a family of curves, called a semispray. As seen above, semisprays arise naturally in variational calculus as extremal curves of the action integral associated with a Lagrangian.
Semisprays can be more generally described by a vector field. Recall that a vector field on $Q$ is a section of $TQ$. Now, consider a vector field on the tangent bundle $TQ$; it is a section of the double tangent bundle $T(TQ)$. The integral surfaces of the semispray induce a decomposition of the total space $T(TQ)$ into the horizontal subspace $H(TQ)$ and the vertical subspace $V(TQ)$, which defines an Ehresmann connection. A vector in $T(TQ)$ encodes the second-order derivative of curves on $Q$, and a semispray defines the following vector field $V$ on $TQ$:
$$V = v^i \frac{\partial}{\partial q^i} - 2 G^i(q, v) \frac{\partial}{\partial v^i},$$
where the factor 2 is there by convention. The integral curve of the semispray satisfies the second-order ordinary differential equation (7), and we say that a semispray is a vector field on the tangent bundle $TQ$ which encodes a second-order system of differential equations on the base manifold $Q$.
A semispray is called a full spray if the spray coefficients $G^i$ satisfy $G^i(q, \lambda v) = \lambda^2 G^i(q, v)$ for $\lambda > 0$. In this case, the integral curve remains invariant under reparameterization by a positive number, i.e., it satisfies homogeneity. When the semispray becomes a (full) spray, the Lagrange geometry becomes Finsler geometry, and the fundamental tensor $g_{ij}$ becomes the Finsler–Riemann metric tensor (which includes the Riemann metric as a special case).
As noted above, a semispray induces an Ehresmann connection on $Q$, and this connection is torsion-free and typically nonlinear. Conversely, given a torsion-free connection, one can construct a semispray. The connection is homogeneous if and only if the semispray is a full spray. Moreover, if the spray is affine, then the connection is affine as well: an affine spray $G^i$ takes the form
$$G^i(q, v) = \frac{1}{2} \Gamma^i_{jk}(q)\, v^j v^k,$$
where $\Gamma^i_{jk}$ is referred to as the affine connection.
To summarize, Lagrangian dynamics is related to action minimization by the Euler operator, and leads to a semispray on the configuration manifold $Q$. Under suitable conditions, the Lagrangian function defined on $TQ$ will lead to a torsion-free but generally nonlinear connection, and to an affine connection only for a very special form of Lagrangian.
2.2. Hamiltonian Mechanics as a Conservative System on $T^*Q$
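Given a Hamiltonian $H: T^*Q \to \mathbb{R}$, we consider the Hamiltonian vector field $X_H \in \Gamma(T(T^*Q))$ (where $\Gamma$ denotes a section) defined by
$$X_H = \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i} \frac{\partial}{\partial p_i}. \tag{8}$$
It is straightforward to verify that $dH/dt = 0$ along the dynamical flow of $X_H$:
$$\frac{dH}{dt} = \frac{\partial H}{\partial q^i} \dot q^i + \frac{\partial H}{\partial p_i} \dot p_i = \frac{\partial H}{\partial q^i} \frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i} \frac{\partial H}{\partial q^i} = 0.$$
So, $H$ is constant along the solution curves of the Hamiltonian vector field $X_H$, which implies that the Lie derivative $\mathcal{L}_{X_H} H$ of $H$ along the flow of $X_H$ vanishes,
$$\mathcal{L}_{X_H} H = dH(X_H) = 0.$$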
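Formally, starting from the tautological 1-form $\theta = p_i\, dq^i$ on $T^*Q$, one obtains a 2-form $\omega = -d\theta$, called the Poincaré 2-form, which is the canonical symplectic form on $T^*Q$:
$$\omega = dq^i \wedge dp_i, \qquad \omega(X, Y) = dq^i(X)\, dp_i(Y) - dq^i(Y)\, dp_i(X),$$
where $X, Y$ are vector fields on $T^*Q$.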
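More generally, given a Hamiltonian $H$ along with a symplectic form $\omega$, which is, by definition, a closed, nondegenerate 2-form, one obtains the Hamiltonian vector field $X_H$ on $T^*Q$, defined in abstract notation by
$$i_{X_H} \omega = dH, \tag{9}$$
or equivalently in a more familiar notation,
$$\omega(X_H, \cdot) = dH.$$
One can define the Poisson bracket $\{F, G\}$ of two functions $F$ and $G$ by using their respective Hamiltonian vector fields and the symplectic form,
$$\{F, G\} = \omega(X_F, X_G).$$
For the canonical symplectic form, it has the following coordinate expression,
$$\{F, G\} = \frac{\partial F}{\partial q^i} \frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q^i}.$$
In this way, Hamilton’s equations can be expressed in terms of the Poisson bracket as follows,
$$\dot q^i = \{q^i, H\}, \qquad \dot p_i = \{p_i, H\}. \tag{10}$$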
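The bracket formulation lends itself to a direct numerical check. The sketch below assumes the standard canonical bracket $\{F, G\} = F_q G_p - F_p G_q$ for one degree of freedom and, purely for illustration, a pendulum Hamiltonian $H = \tfrac{1}{2} p^2 - \cos q$; the helper names are hypothetical. Partial derivatives are approximated by central differences.

```python
import math

def grad(f, q, p, eps=1e-5):
    """Central-difference partial derivatives (df/dq, df/dp), one degree of freedom."""
    dq = (f(q + eps, p) - f(q - eps, p)) / (2 * eps)
    dp = (f(q, p + eps) - f(q, p - eps)) / (2 * eps)
    return dq, dp

def poisson(F, G, q, p):
    """Canonical Poisson bracket {F, G} = F_q G_p - F_p G_q."""
    Fq, Fp = grad(F, q, p)
    Gq, Gp = grad(G, q, p)
    return Fq * Gp - Fp * Gq

def H(q, p):
    """Assumed example Hamiltonian (pendulum), not taken from the text."""
    return 0.5 * p * p - math.cos(q)

def coord_q(q, p):
    return q

def coord_p(q, p):
    return p

q0, p0 = 0.4, -0.9
qdot = poisson(coord_q, H, q0, p0)          # compare with dH/dp = p
pdot = poisson(coord_p, H, q0, p0)          # compare with -dH/dq = -sin(q)
canonical = poisson(coord_q, coord_p, q0, p0)  # compare with 1
energy_rate = poisson(H, H, q0, p0)         # vanishes by antisymmetry
print(qdot, pdot, canonical, energy_rate)
```

The vanishing of $\{H, H\}$ is the bracket form of the statement that $H$ is constant along its own Hamiltonian flow.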
By Darboux’s theorem, it is always possible to choose local coordinates $(q^i, p_i)$ on $T^*Q$, referred to as canonical coordinates, such that the symplectic form has the expression $\omega = dq^i \wedge dp_i$. In these coordinates, Hamilton’s equations defined in terms of the symplectic structure (9) and Poisson structure (10) recover the canonical Hamiltonian vector field (8).
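Note that any smooth function $H$ on $T^*Q$ induces a Hamiltonian vector field. An arbitrary vector field $X$ on $T^*Q$ is locally Hamiltonian, i.e., induced by a smooth function $H$, if $i_X \omega$ is closed, i.e., $d(i_X \omega) = 0$. In addition, a Hamiltonian vector field preserves the volume form $\omega^n$, i.e.,
$$\mathcal{L}_{X_H} \omega^n = 0,$$
where $\omega^n$ is the $n$-fold exterior product of $\omega$,
$$\omega^n = \underbrace{\omega \wedge \cdots \wedge \omega}_{n}.$$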
2.3. Symplectic Maps and Symplectic Flows
A symplectic map is a diffeomorphism of $T^*Q$ that preserves its symplectic structure $\omega$. We first consider a one-parameter family of symplectic maps generated by the flow map of a vector field $X$. Since the entire family of symplectic maps leaves $\omega$ invariant, it follows that $\mathcal{L}_X \omega = 0$. It can be shown (using Cartan’s magic formula, $\mathcal{L}_X \omega = d(i_X \omega) + i_X d\omega$, and the fact that $\omega$ is closed) that a vector field $X$ is symplectic if $i_X \omega$ is closed, i.e., $d(i_X \omega) = 0$. By the Poincaré lemma, this implies that $i_X \omega$ is locally exact, that is, in the neighborhood of any point, there exists some function $H$ such that $i_X \omega = dH$. So there always exists, locally, a Hamiltonian $H$ that generates the vector field $X$ whose flow is symplectic with respect to $\omega$.
More generally, a diffeomorphism
is a symplectic map from a symplectic space
to another space
if:
where
are the symplectic forms on
, respectively. The above condition (
11) holds if and only if for any functions
f,
g:
- (i)
,
- (ii)
.
With respect to Darboux coordinates about a point $z$, the condition (11) that a map $\varphi$ is symplectic can be expressed locally by $(D\varphi(z))^T J\, D\varphi(z) = J$, where $D\varphi(z)$ denotes the Jacobian of $\varphi$ at $z$.
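The Jacobian form of the symplecticity condition is easy to test numerically. The following is a minimal sketch, assuming the matrix condition $(D\varphi)^T J\, D\varphi = J$ with $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ in one degree of freedom. The two maps compared (one step of symplectic Euler and one step of explicit Euler for a pendulum) are illustrative choices that do not come from the text, and all names are hypothetical.

```python
import math

H_STEP = 0.1  # assumed step size

def symplectic_euler(q, p, h=H_STEP):
    """One symplectic Euler step for H = p^2/2 - cos(q) (assumed example)."""
    p_new = p - h * math.sin(q)
    q_new = q + h * p_new
    return q_new, p_new

def explicit_euler(q, p, h=H_STEP):
    """One explicit Euler step for the same Hamiltonian (not symplectic)."""
    return q + h * p, p - h * math.sin(q)

def jacobian(phi, q, p, eps=1e-6):
    """2x2 Jacobian of phi at (q, p) by central differences; rows are outputs."""
    q_qp, p_qp = phi(q + eps, p)
    q_qm, p_qm = phi(q - eps, p)
    q_pp, p_pp = phi(q, p + eps)
    q_pm, p_pm = phi(q, p - eps)
    return [[(q_qp - q_qm) / (2 * eps), (q_pp - q_pm) / (2 * eps)],
            [(p_qp - p_qm) / (2 * eps), (p_pp - p_pm) / (2 * eps)]]

J = [[0.0, 1.0], [-1.0, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def symplectic_defect(phi, q, p):
    """Largest entry of |D^T J D - J|; zero for a symplectic map."""
    D = jacobian(phi, q, p)
    M = matmul(transpose(D), matmul(J, D))
    return max(abs(M[i][j] - J[i][j]) for i in range(2) for j in range(2))

defect_symplectic = symplectic_defect(symplectic_euler, 0.4, -0.9)
defect_explicit = symplectic_defect(explicit_euler, 0.4, -0.9)
print(defect_symplectic, defect_explicit)
```

In one degree of freedom the condition reduces to the Jacobian having unit determinant, which is why the explicit Euler step, whose determinant is $1 + h^2 \cos q$, fails the test.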
A canonical transformation of
is an automorphism
,
such that
The significance of canonical transformations is that they preserve the form of Hamilton’s equations, and one can check that an automorphism is canonical by verifying that in a Darboux coordinate chart.
2.4. Symplectic Structure on $TQ$ Pulled Back from $T^*Q$
If we endow $T^*Q$ with the canonical symplectic form, we can construct a symplectic form on $TQ$ in such a way that these two spaces are symplectomorphic.
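The mapping between $TQ$ and $T^*Q$ can be constructed in two different ways. Case I involves the Legendre transform:
$$\mathbb{F}L: TQ \to T^*Q, \qquad (q, v) \mapsto \left( q, \frac{\partial L}{\partial v} \right),$$
and Case II involves the Riemannian metric tensor $g$ (on $Q$):
$$g^\flat: TQ \to T^*Q, \qquad (q, v) \mapsto (q, g(v, \cdot)).$$
Note that $g$ is a pseudo-Riemannian metric on $Q$ that acts on a pair of tangent vectors in the tangent space at a point $q$ of $Q$; it can be viewed as a symmetric $(0,2)$-tensor that maps $T_qQ \times T_qQ \to \mathbb{R}$. On the other hand, the symplectic form $\omega$ is a skew-symmetric $(0,2)$-tensor that acts on a pair of tangent vectors on $T^*Q$, so it maps $T_z(T^*Q) \times T_z(T^*Q) \to \mathbb{R}$.
Case I. Given the Lagrangian $L: TQ \to \mathbb{R}$, this induces the fiberwise Legendre transform $\mathbb{F}L: TQ \to T^*Q$, which is given by $(q, v) \mapsto (q, \partial L/\partial v)$. If $L$ is hyperregular, then this map is a diffeomorphism. If we endow $TQ$ with the pullback symplectic form $\omega_L = (\mathbb{F}L)^* \omega$, which is given by
$$\omega_L = \frac{\partial^2 L}{\partial v^i \partial q^j}\, dq^i \wedge dq^j + \frac{\partial^2 L}{\partial v^i \partial v^j}\, dq^i \wedge dv^j,$$
then the Legendre transform is a symplectomorphism (by construction).
Case II. The Riemannian metric $g$ induces the musical isomorphisms $g^\flat: TQ \to T^*Q$ and $g^\sharp: T^*Q \to TQ$ between $TQ$ and $T^*Q$, which are the operations that lower and raise the index, respectively. If we endow $TQ$ with the pullback symplectic form $\omega_g = (g^\flat)^* \omega$, which is given by
$$\omega_g = g_{ij}\, dq^i \wedge dv^j + \frac{\partial g_{ij}}{\partial q^k} v^j\, dq^i \wedge dq^k,$$
then the musical isomorphism is a symplectomorphism (by construction).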
Link between Case I and Case II. It is possible that the two ways of identifying
may be the same; this happens when
g on
coincides with the second derivatives of
with respect to the
v-variable:
assuming
L is hyperregular. The inverse of
g, denoted
, can be obtained from
using the Hamiltonian
defined on
. Note that when the Lagrangian has the form
, this corresponds to the Riemannian metric
g being given by the kinetic energy metric
.
2.5. Hamilton–Jacobi Theory and Dirichlet-to-Neumann Map
In classical mechanics, the Hamilton–Jacobi equation is first introduced as a partial differential equation that the action integral satisfies. Recall that the action integral $S$ along the solution of the Euler–Lagrange equation (2) over the time interval $[0, t]$ is
$$S(q, t) = \int_0^t L(q(s), \dot q(s))\, ds. \tag{14}$$
This is referred to as Jacobi’s solution of the Hamilton–Jacobi equation. Here, we assume that the initial position $q_0$ is fixed and the final position $q$ depends on the initial velocity $v_0$. By taking a variation $\delta q$ of the endpoint $q$, one obtains a partial differential equation satisfied by $S(q, t)$:
$$\frac{\partial S}{\partial t} + H\left( q, \frac{\partial S}{\partial q} \right) = 0. \tag{15}$$
This is the Hamilton–Jacobi equation, when $H$ does not explicitly depend on $t$.
Conversely, it is shown that if $S(q, t)$ is a solution of the Hamilton–Jacobi equation, then $S(q, t)$ is a generating function for the family of canonical transformations (or symplectic flows) that describe the dynamics defined by Hamilton’s equations. This result is the theoretical basis for the powerful technique of exact integration called separation of variables.
There are two uses of . First, it serves to characterize the Dirichlet-to-Neumann map, which refers to the correspondence between the boundary data with the initial data of a dynamical system. Second, it provides a foliation of the configuration space Q, around the point and parameterized by t, that is defined by the condition .
In the rest of the paper, we will view $S$ as a scalar-valued function of $(q_0, q_1)$, which we refer to as the exact discrete Lagrangian $L_d^E$,
$$L_d^E(q_0, q_1; h) = \operatorname*{ext}_{\substack{q \in C^2([0, h], Q) \\ q(0) = q_0,\ q(h) = q_1}} \int_0^h L(q(t), \dot q(t))\, dt. \tag{16}$$
This is equivalent to the expression for Jacobi’s solution, as the stationarity conditions of this variational characterization are simply the Euler–Lagrange equations. Furthermore, this characterization has the added benefit that it is well-defined even if the Lagrangian is degenerate. The exact discrete Lagrangian provides us with the time-$h$ flow map for the Euler–Lagrange equation. Given a fixed initial point $q_0$, this defines a map which takes $q_1$ to an initial velocity $v_0$, such that the Euler–Lagrange trajectory $q(t)$ with initial condition $(q_0, v_0)$ has boundary values $(q(0), q(h)) = (q_0, q_1)$. This is the Dirichlet-to-Neumann map, $q_1 \mapsto v_0$.
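For a concrete instance, the exact discrete Lagrangian and the Dirichlet-to-Neumann map can be written in closed form for a harmonic oscillator. The sketch below assumes (for illustration only) a unit-mass oscillator $L(q, v) = \tfrac{1}{2} v^2 - \tfrac{1}{2} \omega^2 q^2$ with $\omega h$ not a multiple of $\pi$; the function names are hypothetical. It evaluates the action along the Euler–Lagrange solution joining $q_0$ to $q_1$ by Simpson quadrature, compares it with the classical closed-form action, and checks that $-\partial L_d^E / \partial q_0$ reproduces the initial momentum.

```python
import math

OMEGA = 1.3  # assumed frequency; unit mass

def dirichlet_to_neumann(q0, q1, h):
    """Initial velocity v0 such that the solution starting at q0 reaches q1 at time h."""
    return OMEGA * (q1 - q0 * math.cos(OMEGA * h)) / math.sin(OMEGA * h)

def trajectory(q0, v0, t):
    """Exact solution (q(t), v(t)) of the oscillator with initial data (q0, v0)."""
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return q0 * c + (v0 / OMEGA) * s, -q0 * OMEGA * s + v0 * c

def lagrangian(q, v):
    return 0.5 * v * v - 0.5 * OMEGA ** 2 * q * q

def exact_discrete_lagrangian(q0, q1, h, n=2000):
    """Action along the Euler-Lagrange solution from q0 to q1 (composite Simpson, n even)."""
    v0 = dirichlet_to_neumann(q0, q1, h)
    dt = h / n
    total = 0.0
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * lagrangian(*trajectory(q0, v0, k * dt))
    return total * dt / 3

def closed_form(q0, q1, h):
    """Classical closed-form action of the harmonic oscillator."""
    s, c = math.sin(OMEGA * h), math.cos(OMEGA * h)
    return OMEGA * ((q0 * q0 + q1 * q1) * c - 2 * q0 * q1) / (2 * s)

q0, q1, h = 0.5, 0.8, 0.4
v0 = dirichlet_to_neumann(q0, q1, h)
action_quadrature = exact_discrete_lagrangian(q0, q1, h)
action_closed = closed_form(q0, q1, h)
eps = 1e-6
p0 = -(closed_form(q0 + eps, q1, h) - closed_form(q0 - eps, q1, h)) / (2 * eps)
print(v0, action_quadrature, action_closed, p0)
```

With unit mass the momentum and velocity coincide, so `p0` and `v0` should agree; this is the sense in which the exact discrete Lagrangian acts as a Type I generating function of the time-$h$ flow.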
To address the Dirichlet-to-Neumann map more generally, let us first recall the definition of a retraction:
Definition 1 ([
7], Definition 4.1.1 on p. 55)
. A retraction on a manifold Q is a smooth mapping : with the following properties: Let be the restriction of to for an arbitrary ; then,- (i)
, where denotes the zero element of ;
- (ii)
with the identification , satisfieswhere is the tangent map of at .
Equation (
17) implies that the map
is invertible in some neighborhood of
in
. Its inverse is conveniently denoted as
, which is defined by
it is easy to see that
is also invertible in some neighborhood of
for any
.
Let us introduce a special class of coordinate charts that are compatible with a given retraction map . A coordinate chart with U an open subset in Q and is said to be retraction compatible at if
- (i)
is centered at q, i.e., ;
- (ii)
the compatibility condition
holds, where we identify
with
as follows: Let
with
for
. Then
where
is the unit vector in the
-direction in
.
An atlas for the manifold Q is retraction compatible if it consists of retraction compatible coordinate charts.
In Equation (
19), we assumed that
and so strictly speaking
is defined on
. However, it is always possible to define a coordinate chart such that
by stretching out the open set
to
so that (
19) is defined for any
.
Retraction maps provide general means to relate to : in essence it provides a correspondence between and for all (we may take to mean the projection of onto either the first or the second slot).
2.6. Variational Mechanics and the Pontryagin Bundle
Lagrangian and Hamiltonian mechanics can be combined into Dirac mechanics [
8,
9], which is described on the
Pontryagin bundle , which has position, velocity, and momentum as local coordinates.
Just as the Euler–Lagrange equations of motion arise out of Hamilton’s principle, Hamilton’s equations can also arise from Hamilton’s phase space principle:
$$\delta \int_0^T \left[ p_i \dot q^i - H(q, p) \right] dt = 0.$$
On the Pontryagin bundle $TQ \oplus T^*Q$, which has local coordinates $(q, v, p)$, a relaxation of Hamilton’s principle (1) is the Hamilton–Pontryagin variational principle, which uses a Lagrange multiplier $p$ to impose the second-order condition $v = \dot q$,
$$\delta \int_0^T \left[ L(q, v) + \langle p, \dot q - v \rangle \right] dt = 0.$$
This encapsulates both Hamilton’s and Hamilton’s phase space variational principles, as well as the Legendre transform, and gives the implicit Euler–Lagrange equations,
$$\dot p = \frac{\partial L}{\partial q}, \qquad \dot q = v, \qquad p = \frac{\partial L}{\partial v}.$$
The last equation explicitly imposes the primary constraint condition, which is important when describing degenerate Lagrangian systems, such as electrical circuits. Note that the $p$ are interpreted as Lagrange multipliers [10] in addition to their usual interpretation as conjugate momenta. The three equations can be combined by eliminating $v$ and $p$ to recover the Euler–Lagrange equations.
An important application of Hamilton–Jacobi theory is in optimal control theory. Consider a typical optimal control problem,
subject to the constraints,
and the boundary conditions
and
. We convert constrained optimization to unconstrained optimization by using Lagrange multipliers
p (sometimes called the
costate or auxiliary variables), and we can define the augmented cost functional:
where we introduced the costate variables
p, and also defined the control Hamiltonian,
The variables
form a Hamiltonian system, so we impose the optimality condition,
to obtain the equation for the optimal control
, and we obtain the Hamiltonian,
We also define the optimal cost-to-go function,
where
for
is the solution of Hamilton’s equations with the above
H such that
; and
is the optimal cost
and the function
is defined by
Since this definition coincides with (
14), the function
satisfies the Hamilton–Jacobi equation (
15); this reduces to the Hamilton–Jacobi–Bellman (HJB) equation for the optimal cost-to-go function
:
It can also be shown that the costate p of the optimal solution is related to the solution of the Hamilton–Jacobi–Bellman equation.
3. Discrete Formulation of Geometric Mechanics
In this section, we review various schemes for discretizing mechanics (see, e.g., [
11]). Geometric mechanics focuses on the differential geometric structure of the configuration manifolds, the associated symplectic and Poisson structures on the phase space, and the conservation laws generated by symmetries, and geometric structure-preserving numerical integration aims to preserve as many of these geometric properties as possible under discretization. The main idea is to start from the canonical symplectic form
on
, and look at the symplectomorphisms that preserve
or its pullback via the Legendre transforms to
or
.
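3.1. Symplectomorphisms from $TQ$ to $T^*Q$ and $Q \times Q$ to $T^*Q$
Given a cotangent bundle $T^*Q$ with a symplectic form $\omega$, we wish to endow the bundles $TQ$ and $Q \times Q$ with a symplectic structure. Given a function $L: TQ \to \mathbb{R}$, the Legendre transform is viewed as the fiber derivative $\mathbb{F}L: TQ \to T^*Q$, $(q, v) \mapsto (q, \partial L/\partial v)$. The pullback of $\omega$ with respect to $\mathbb{F}L$ yields a symplectic structure on $TQ$.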
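Similarly, given a function $L_d: Q \times Q \to \mathbb{R}$, we define two discrete fiber derivatives, $\mathbb{F}^\pm L_d: Q \times Q \to T^*Q$, which serve as discrete Legendre transforms:
$$\mathbb{F}^+ L_d(q_0, q_1) = (q_1, D_2 L_d(q_0, q_1)), \tag{23}$$
$$\mathbb{F}^- L_d(q_0, q_1) = (q_0, -D_1 L_d(q_0, q_1)). \tag{24}$$
Here $D_1, D_2$ refer to taking a derivative with respect to the first or second slot, respectively:
$$D_1 L_d(q_0, q_1) = \frac{\partial L_d}{\partial q_0}(q_0, q_1), \qquad D_2 L_d(q_0, q_1) = \frac{\partial L_d}{\partial q_1}(q_0, q_1).$$
The two choices of discrete fiber derivatives correspond to whether one views $Q \times Q$ as a bundle over $Q$ with respect to $\pi_1$ or $\pi_2$, i.e., projection onto the first or the second slot. These induce symplectic structures $(\mathbb{F}^\pm L_d)^* \omega$ on $Q \times Q$ by pullback.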
Let be a symplectic map and let the maps denoted by the dotted arrows in the figure above be defined by requiring that the diagram commutes. Then, these maps are also symplectic maps, and the fiber derivative is a symplectomorphism between and , and the discrete fiber derivatives are symplectomorphisms between and .
3.2. Discrete Lagrangian Mechanics
The aim of geometric structure-preserving numerical integration is to preserve as many geometric conservation laws as possible under discretization. Discrete variational mechanics [11] is based on the discrete Hamilton’s principle,
$$\delta \sum_{k=0}^{N-1} L_d(q_k, q_{k+1}) = 0, \tag{25}$$
where the endpoints $q_0$ and $q_N$ are fixed, and the discrete Lagrangian, $L_d: Q \times Q \to \mathbb{R}$, is a Type I generating function of the symplectic map. Recall that there exists an exact discrete Lagrangian (16) that generates the exact time-$h$ flow of a Lagrangian system, but it cannot be computed in general. One possible method of constructing computable discrete Lagrangians is the Galerkin approach, which involves replacing the infinite-dimensional function space $C^2([0, h], Q)$ and the integral in (16) with a finite-dimensional function space and a quadrature formula, respectively. Below are two examples of discrete Lagrangians:
- (i)
Symplectic midpoint integrator
$$L_d(q_0, q_1) = h\, L\left( \frac{q_0 + q_1}{2}, \frac{q_1 - q_0}{h} \right);$$
this can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the midpoint rule as the quadrature formula.
- (ii)
Störmer–Verlet integrator
$$L_d(q_0, q_1) = \frac{h}{2} \left[ L\left( q_0, \frac{q_1 - q_0}{h} \right) + L\left( q_1, \frac{q_1 - q_0}{h} \right) \right];$$
this can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the trapezoidal rule as the quadrature formula.
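Performing variational calculus on the discrete Hamilton’s principle (25) yields the discrete Euler–Lagrange (DEL) equations,
$$D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0.$$
The above equation implicitly defines the discrete Lagrangian map $F_{L_d}: (q_{k-1}, q_k) \mapsto (q_k, q_{k+1})$ at points sufficiently close to the diagonal of $Q \times Q$. This is equivalent to the implicit discrete Euler–Lagrange (IDEL) equations,
$$p_k = -D_1 L_d(q_k, q_{k+1}), \qquad p_{k+1} = D_2 L_d(q_k, q_{k+1}),$$
which is precisely the characterization of a symplectic map in terms of a Type I generating function. It implicitly defines the discrete Hamiltonian map $\tilde{F}_{L_d}: (q_k, p_k) \mapsto (q_{k+1}, p_{k+1})$, and it is symplectic with respect to the canonical symplectic form $\omega$ on $T^*Q$, i.e., $\tilde{F}_{L_d}^* \omega = \omega$.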
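The implicit discrete Euler–Lagrange equations translate directly into a one-step method. The following is a minimal sketch, assuming the standard form $p_k = -D_1 L_d(q_k, q_{k+1})$, $p_{k+1} = D_2 L_d(q_k, q_{k+1})$ together with the midpoint discrete Lagrangian, applied to a unit-mass harmonic oscillator chosen here for illustration. The slot derivatives are approximated by central differences, the implicit equation is solved by Newton's method, and all names are hypothetical.

```python
OMEGA = 2.0  # assumed frequency of a unit-mass harmonic oscillator

def lagrangian(q, v):
    return 0.5 * v * v - 0.5 * OMEGA ** 2 * q * q

def discrete_lagrangian(q0, q1, h):
    """Midpoint rule: L_d(q0, q1) = h * L((q0 + q1)/2, (q1 - q0)/h)."""
    return h * lagrangian(0.5 * (q0 + q1), (q1 - q0) / h)

def d1(q0, q1, h, eps=1e-6):
    """Derivative of L_d in its first slot, by central differences."""
    return (discrete_lagrangian(q0 + eps, q1, h)
            - discrete_lagrangian(q0 - eps, q1, h)) / (2 * eps)

def d2(q0, q1, h, eps=1e-6):
    """Derivative of L_d in its second slot, by central differences."""
    return (discrete_lagrangian(q0, q1 + eps, h)
            - discrete_lagrangian(q0, q1 - eps, h)) / (2 * eps)

def step(q0, p0, h, tol=1e-9, max_iter=50):
    """Solve p0 = -D1 L_d(q0, q1) for q1 by Newton's method, then set p1 = D2 L_d(q0, q1)."""
    def residual(q1):
        return p0 + d1(q0, q1, h)
    q1 = q0 + h * p0  # initial guess: explicit Euler position update
    for _ in range(max_iter):
        r = residual(q1)
        if abs(r) < tol:
            break
        slope = (residual(q1 + 1e-4) - residual(q1 - 1e-4)) / 2e-4
        q1 -= r / slope
    return q1, d2(q0, q1, h)

def energy(q, p):
    return 0.5 * p * p + 0.5 * OMEGA ** 2 * q * q

q, p, h = 1.0, 0.0, 0.05
e0 = energy(q, p)
max_drift = 0.0
for _ in range(1000):
    q, p = step(q, p, h)
    max_drift = max(max_drift, abs(energy(q, p) - e0))
print(max_drift)
```

For this linear problem the resulting map coincides with the implicit midpoint rule, which preserves quadratic invariants, so the energy drift stays at the level of the finite-difference noise. For a nonlinear Lagrangian one would expect bounded energy oscillations rather than exact conservation.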
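The two discrete fiber derivatives $\mathbb{F}^\pm L_d$ induce a single unique discrete symplectic form on $Q \times Q$,
$$\omega_{L_d} = (\mathbb{F}^+ L_d)^* \omega = (\mathbb{F}^- L_d)^* \omega, \tag{28}$$
which in coordinates is, up to the sign convention chosen for $\omega$, $\frac{\partial^2 L_d}{\partial q_0^i \partial q_1^j}\, dq_0^i \wedge dq_1^j$, and the discrete Lagrangian map is symplectic with respect to $\omega_{L_d}$ on $Q \times Q$, i.e., $F_{L_d}^* \omega_{L_d} = \omega_{L_d}$.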
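The discrete Lagrangian and Hamiltonian maps can be expressed in terms of the discrete fiber derivatives, $F_{L_d} = (\mathbb{F}^- L_d)^{-1} \circ \mathbb{F}^+ L_d$, and $\tilde{F}_{L_d} = \mathbb{F}^+ L_d \circ (\mathbb{F}^- L_d)^{-1}$, respectively. This characterization of the discrete flow maps underlies the proof of the variational error analysis theorem.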
These properties may be summarized in the following commutative diagram,
If the exact discrete Lagrangian is used, then the discrete Hamiltonian map is equal to the time-h flow map of Hamilton’s equations, and the dotted arrow is the time-h flow map of the Euler–Lagrange equations.
The variational integrator approach to constructing symplectic integrators simplifies the numerical analysis of these methods. In particular, the task of establishing the geometric conservation properties and order of accuracy of the discrete Lagrangian map reduces to the simpler task of verifying certain properties of the discrete Lagrangian instead.
3.3. Discrete Hamilton–Jacobi Formulation
In the context of discrete variational mechanics, discrete Hamilton–Jacobi theory can be viewed as a composition theorem which relates the composition of symplectic maps generated by a Type II generating function with a symplectic map generated by a Type I generating function . By convention, the first argument in the composition generating function is typically omitted, and we simply consider it to be a function of the final position .
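The right discrete Hamiltonian, $H_d^+(q_k, p_{k+1})$ [12], is related to the discrete Lagrangian by the Legendre transform,
$$H_d^+(q_k, p_{k+1}) = p_{k+1} q_{k+1} - L_d(q_k, q_{k+1}),$$
where we impose the condition that $p_{k+1} = D_2 L_d(q_k, q_{k+1})$. Equivalently, this can be characterized variationally by $H_d^+(q_k, p_{k+1}) = \operatorname{ext}_{q_{k+1}} \left[ p_{k+1} q_{k+1} - L_d(q_k, q_{k+1}) \right]$. This leads to a discrete Hamilton’s principle in phase space,
$$\delta \sum_{k=0}^{N-1} \left[ p_{k+1} q_{k+1} - H_d^+(q_k, p_{k+1}) \right] = 0,$$
which yields the right discrete Hamilton’s equations,
$$q_{k+1} = D_2 H_d^+(q_k, p_{k+1}), \qquad p_k = D_1 H_d^+(q_k, p_{k+1}), \tag{29}$$
which is precisely the characterization of a symplectic map in terms of a Type II generating function.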
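As a worked example, the right discrete Hamiltonian can be computed by hand for a free particle. This is a sketch under two assumptions that are stated here rather than taken from the text: the standard definition $H_d^+(q_k, p_{k+1}) = p_{k+1} q_{k+1} - L_d(q_k, q_{k+1})$ with $p_{k+1} = D_2 L_d(q_k, q_{k+1})$, and the exact discrete Lagrangian of a free particle of mass $m$:

```latex
% Assumed example: free particle of mass m, time step h
\begin{align*}
L_d(q_k, q_{k+1}) &= \frac{m}{2h} (q_{k+1} - q_k)^2, \qquad
p_{k+1} = D_2 L_d(q_k, q_{k+1}) = \frac{m}{h} (q_{k+1} - q_k), \\
H_d^+(q_k, p_{k+1}) &= p_{k+1} q_{k+1} - L_d(q_k, q_{k+1})
  = p_{k+1} q_k + \frac{h}{2m}\, p_{k+1}^2, \\
q_{k+1} &= D_2 H_d^+(q_k, p_{k+1}) = q_k + \frac{h}{m}\, p_{k+1}, \qquad
p_k = D_1 H_d^+(q_k, p_{k+1}) = p_{k+1}.
\end{align*}
```

Here $H_d^+$ is the Type II generating function of the identity, $p_{k+1} q_k$, plus $h$ times the continuous Hamiltonian $p^2/2m$, and the resulting map is the exact free-particle flow with conserved momentum.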
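The continuous Hamilton–Jacobi equation can be derived by considering the evolution properties of Jacobi’s solution, which is the action integral evaluated along the solution of the Euler–Lagrange equations. One can derive a discrete Hamilton–Jacobi theory by considering a discrete analogue of Jacobi’s solution, expressed in terms of the right discrete Hamiltonian,
$$S_d^k(q_k) = \sum_{l=0}^{k-1} \left[ p_{l+1} q_{l+1} - H_d^+(q_l, p_{l+1}) \right],$$
which we evaluate along a solution of the right discrete Hamilton’s equations (29). From this, we have,
$$S_d^{k+1}(q_{k+1}) - S_d^k(q_k) = p_{k+1} q_{k+1} - H_d^+(q_k, p_{k+1}), \tag{30}$$
where $p_{k+1}$ is considered to be a function of $q_k$ and $q_{k+1}$. Taking derivatives with respect to $q_{k+1}$, we obtain,
$$D S_d^{k+1}(q_{k+1}) = p_{k+1} + \frac{\partial p_{k+1}}{\partial q_{k+1}} \left( q_{k+1} - D_2 H_d^+(q_k, p_{k+1}) \right),$$
but the term inside the parentheses vanishes as we are restricting this to a solution of the right discrete Hamilton’s equations. Therefore, we have that
$$p_{k+1} = D S_d^{k+1}(q_{k+1}),$$
which when substituted into (30) yields the discrete Hamilton–Jacobi equation,
$$S_d^{k+1}(q_{k+1}) - S_d^k(q_k) - D S_d^{k+1}(q_{k+1})\, q_{k+1} + H_d^+\left( q_k, D S_d^{k+1}(q_{k+1}) \right) = 0.$$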
3.4. Discrete Hamilton–Pontryagin Principle
Leok and Ohsawa [
13] considered the discrete Hamilton’s principle and relaxed the discrete second-order condition,
and reimposed it using Lagrange multipliers
, in order to derive the
discrete Hamilton–Pontryagin principle on
,
Here, the superscript 0 or 1 on $q_k$ refers to the first or second slot, respectively, in
. This in turn yields the
implicit discrete Euler–Lagrange equations,
where
denote as before the partial derivative with respect to the first or second argument in
. Making the identification
, the last two equations define the
discrete fiber derivatives,
as given by (
23) and (24). Discrete fiber derivatives induce a
discrete symplectic form,
, and the discrete Lagrangian map
and the discrete Hamiltonian map
preserve
and
, respectively.
5. Linking Information Geometry with Geometric Mechanics
5.1. Symplectic Structure on Induced from the Divergence Function
We will now establish the connection between information geometry and discrete geometric mechanics. The divergence function from information geometry can be viewed as a Type I generating function of a symplectic map, and in particular, it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. More specifically, let the configuration manifold $Q$ be the information manifold, and let the discrete Lagrangian be the divergence function, i.e., $L_d = \mathcal{D}$. With this identification, we observe that the information geometric construction of the symplectic structure on $Q\times Q$ described below is nothing but the discrete symplectic structure on $Q\times Q$ given in (28), where the discrete Lagrangian $L_d$ is replaced with the divergence function $\mathcal{D}$.
From information geometry, a divergence function $\mathcal{D}$ is given as a scalar-valued binary function on $Q$ (of dimension $n$). We now view it as a unary function on $Q\times Q$ (of dimension $2n$) that vanishes along the diagonal $\Delta_Q\subset Q\times Q$. In this subsection, we investigate the conditions under which a divergence function can serve as a generating function of a symplectic structure on $Q\times Q$. A compatible metric on $Q\times Q$ will also be derived. When restricted to the diagonal submanifold $\Delta_Q$, the skew-symmetric symplectic form will vanish, so $\Delta_Q$, which carries a statistical structure, is actually a Lagrangian submanifold (see [21,22]).
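First, we fix a point $x$ in the first slot and a point $y$ in the second slot of $Q\times Q$; this results in two $n$-dimensional submanifolds of $Q\times Q$ that will be denoted $Q_x$ (with the $y$ point fixed) and $Q_y$ (with the $x$ point fixed), respectively. The canonical symplectic form $\omega_x$ on the cotangent bundle $T^*Q_x$ is given by
$$\omega_x = \sum_i dx^i\wedge d\xi^i.$$
Given $\mathcal{D}$, we define a map $L_{\mathcal{D}}$ from $Q\times Q$ to $T^*Q_x$, $(x,y)\mapsto(x,\xi)$, which is given by,
$$L_{\mathcal{D}}: (x,y)\mapsto\left(x, \sum_i \mathcal{D}_i(x,y)\,dx^i\right).$$
Recall that the comma in the subscript of a divergence function $\mathcal{D}$ indicates whether it is being differentiated with respect to a variable in the first or second slot: indices before the comma refer to the first slot, and indices after it to the second. It is easily checked that there exists a neighborhood of the diagonal $\Delta_Q\subset Q\times Q$, such that the map $L_{\mathcal{D}}$ is a diffeomorphism. In particular, the Jacobian of the map is given by
$$\begin{pmatrix}\delta_{ij} & 0\\ \mathcal{D}_{ij} & \mathcal{D}_{i,j}\end{pmatrix},$$
which is nondegenerate in a neighborhood of the diagonal $\Delta_Q$.
We calculate the pullback by $L_{\mathcal{D}}$ of the canonical symplectic form $\omega_x$ on $T^*Q_x$ to $Q\times Q$:
$$L_{\mathcal{D}}^*\omega_x = \sum_i dx^i\wedge d\mathcal{D}_i = \sum_{i,j} dx^i\wedge\left(\mathcal{D}_{ij}\,dx^j + \mathcal{D}_{i,j}\,dy^j\right) = \sum_{i,j}\mathcal{D}_{i,j}\,dx^i\wedge dy^j.$$
Here, $\sum_{i,j}\mathcal{D}_{ij}\,dx^i\wedge dx^j = 0$, since by the equality of mixed partials, $\mathcal{D}_{ij} = \mathcal{D}_{ji}$ always holds.
Similarly, we consider the canonical symplectic form $\omega_y = \sum_i dy^i\wedge d\eta^i$ on $T^*Q_y$ and define a map $R_{\mathcal{D}}$ from $Q\times Q\to T^*Q_y$, $(x,y)\mapsto(y,\eta)$, which is given by
$$R_{\mathcal{D}}: (x,y)\mapsto\left(y, \sum_i\mathcal{D}_{,i}(x,y)\,dy^i\right).$$
Using $R_{\mathcal{D}}$ to pull back $\omega_y$ to $Q\times Q$ yields an analogous formula:
$$R_{\mathcal{D}}^*\omega_y = \sum_{i,j}\mathcal{D}_{j,i}\,dy^i\wedge dx^j = -\sum_{i,j}\mathcal{D}_{i,j}\,dx^i\wedge dy^j.$$
Therefore, based on canonical symplectic forms on $T^*Q_x$ and $T^*Q_y$, we obtain, up to sign, the same symplectic form on $Q\times Q$,
$$\omega_{\mathcal{D}}(x,y) = -\sum_{i,j}\mathcal{D}_{i,j}\,dx^i\wedge dy^j. \tag{57}$$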
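Theorem 1 ([22]). A divergence function $\mathcal{D}$ induces a symplectic form $\omega_{\mathcal{D}}$ (57) on $Q\times Q$, which is the pullback of the canonical symplectic forms $\omega_x$ and $\omega_y$ by the maps $L_{\mathcal{D}}$ and $R_{\mathcal{D}}$.
With the symplectic form $\omega_{\mathcal{D}}$ given as above, it is easy to check that $\omega_{\mathcal{D}}$ is closed:
$$d\left(\sum_{i,j}\mathcal{D}_{i,j}\,dx^i\wedge dy^j\right) = \sum_{i,j,k}\left(\mathcal{D}_{ik,j}\,dx^k\wedge dx^i\wedge dy^j + \mathcal{D}_{i,jk}\,dy^k\wedge dx^i\wedge dy^j\right) = 0,$$
since $\mathcal{D}_{ik,j}$ is symmetric in $(i,k)$ and $\mathcal{D}_{i,jk}$ is symmetric in $(j,k)$.
It was Barndorff-Nielsen and Jupp [21] who first proposed (57) as an induced symplectic form on $Q\times Q$, apart from a minus sign; they called the divergence function $\mathcal{D}$ a yoke.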
The fact that this symplectic structure coincides with the one introduced in discrete mechanics should come as no surprise. The $Q_x$ and $Q_y$ submanifolds are related to the two ways of viewing $Q\times Q$ as a bundle over $Q$, depending on whether one chooses the projection onto the first factor, $(x,y)\mapsto x$, or onto the second factor, $(x,y)\mapsto y$, as the bundle projection. Then, the maps $L_{\mathcal{D}}$, $R_{\mathcal{D}}$ are, up to a sign, simply the discrete fiber derivatives $\mathbb{F}L_d^{\pm}$, where the discrete Lagrangian $L_d$ is replaced by the divergence function $\mathcal{D}$.
5.2. Divergence as a Type I Generating Function
As we have seen previously, symplectic maps are a natural way of describing the flow of Hamiltonian mechanics on the cotangent bundle $T^*Q$. We will now consider the characterization of symplectic maps in terms of generating functions, and in particular, we review three different parameterizations based on the classification given in Goldstein [23].
Lemma 2. Given , then on is symplectic if and only if there exists such that To prove this, observe that
from which, we immediately obtain
Identifying the corresponding terms yields (59).
Type I generating functions are linked with other types of generating functions via partial Legendre transforms. Fixing the first or second variable slot leads to, respectively, Type II or III generating functions, denoted respectively.
Let
be a submanifold, with local coordinates
, of
, with local coordinates
, where
is dependent on
and
. Then
on
is symplectic if and only if there exists
such that
Likewise, let
be a submanifold, whose local coordinates are
, of
with local coordinates
where
is dependent on
and
. Then
on
is symplectic if and only if there exists
such that
In the case of discrete mechanics, the Type II generating function is denoted by
and the Type III generating function is denoted by
. We compute their exterior derivatives:
Therefore, symplectic maps can be defined implicitly in terms of a Type II generating function
,
and a Type III generating function
,
More explicitly, these are related to the discrete Lagrangian
, which is a Type I generating function, by the following partial Legendre transforms:
or equivalently,
The upshot of the above discussion is that , are Legendre dual variables with respect to , , whereas in the fiberwise Legendre transform , it is , which are dual to , —the dual correspondence is , instead of . As before, the two discrete Legendre dualities are due to the two ways of viewing as a bundle over Q.
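The partial Legendre transform from a Type I to a Type II generating function can be sketched numerically. The snippet below is a minimal illustration, assuming the convention $H_d^+(q_0, p_1) = p_1 q_1 - L_d(q_0, q_1)$ with $q_1$ determined by $p_1 = D_2 L_d(q_0, q_1)$, and assuming a particular harmonic-oscillator discrete Lagrangian; the function names and the step size are hypothetical choices for this example. It checks that the Type II function returns the remaining variables through $q_1 = D_2 H_d^+$ and $p_0 = D_1 H_d^+$.

```python
# Sketch under assumptions: L_d(q0, q1) = h * [ ((q1 - q0)/h)^2 / 2 - q0^2 / 2 ]
# and the convention H_d^+(q0, p1) = p1*q1 - L_d(q0, q1), with p1 = D2 L_d(q0, q1).
H_STEP = 0.1  # time step h (illustrative value)


def L_d(q0, q1):
    """Assumed Type I generating function (discrete Lagrangian)."""
    v = (q1 - q0) / H_STEP
    return H_STEP * (0.5 * v * v - 0.5 * q0 * q0)


def H_d_plus(q0, p1):
    """Type II generating function via the partial Legendre transform in q1."""
    q1 = q0 + H_STEP * p1  # solves p1 = D2 L_d(q0, q1) = (q1 - q0)/h for this L_d
    return p1 * q1 - L_d(q0, q1)


def partial(f, x, y, slot, eps=1e-6):
    """Central-difference partial derivative of f(x, y) in slot 1 or 2."""
    if slot == 1:
        return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)


if __name__ == "__main__":
    q0, p1 = 1.0, 0.4
    q1 = partial(H_d_plus, q0, p1, slot=2)   # q1 = D2 H_d^+(q0, p1)
    p0 = partial(H_d_plus, q0, p1, slot=1)   # p0 = D1 H_d^+(q0, p1)
    p0_from_L = -partial(L_d, q0, q1, slot=1)  # p0 = -D1 L_d(q0, q1)
    print(q1, p0, p0_from_L)
```

Both routes to $p_0$ agree, which is the sense in which the Type I and Type II descriptions encode the same symplectic map under the assumed convention.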
In the context of information geometry,
is nothing but the
partial Legendre transform of the divergence function
with respect to the first or second argument. Consider the Bregman divergence
,
and view it as a discrete Lagrangian
. Then, its partial Legendre transform with respect to
, the Type II generating function
, is
which evaluates to
where
is obtained by solving
By substitution, we obtain,
Note that in this case, the Legendre dual of is no longer as given by the fiberwise Legendre map, but is rather shifted by an amount . It is interesting that still takes the form of , as does . This is a special property of taking the Bregman divergence as the generating function.
5.3. -Divergence for Decoupling L and H
In geometric mechanics, Hamiltonian and Lagrangian dynamics represent one and the same dynamics: they are coupled, because $L$ and $H$ are related by the fiberwise Legendre transform $\mathbb{F}L$; in fact, they are a Legendre pair. The conservation properties of the Hamiltonian approach with respect to the underlying symplectic geometry and the variational principles that arise in the Lagrangian and Hamilton–Jacobi theories reflect two sides of the same coin.
To appreciate this, we look at the interaction of three manifolds: $TQ$, $T^*Q$, and $Q\times Q$. We take $Q\times Q$ to hold the configuration variable $q$ at successive time steps; it is the dynamical equation that governs the evolution from $q_k$ to $q_{k+1}$. The Hamiltonian dynamics, which is encoded in the preservation of
of
, governs discrete Hamiltonian flow
, through a Type I generating function
. On the other hand, the Lagrangian flow is governed by the retraction map
, such as the Dirichlet-to-Neumann map induced by Jacobi’s solution
to the Hamilton–Jacobi equation. Those two dynamic updates
need not be identical. In mechanics, the Hamiltonian energy conservation system and the Lagrangian extremization system lead to one and the same dynamics, precisely because
and
are linked through the fiberwise Legendre transform
at
:
In other words, L and H are perfectly coupled–with no duality gap.
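Information geometry, on the other hand, starts with a divergence (or contrast) function $\mathcal{D}$ on the Pontryagin bundle $TQ\oplus T^*Q$, which measures the discrepancy between the two systems. Given $L$ on $TQ$ and $H$ on $T^*Q$, we write
$$\mathcal{D}(q,v,p) = L(q,v) + H(q,p) - \langle p, v\rangle.$$
Theorem 2. Let $L$ and $H$ be strictly convex functions, defined on $TQ$ and $T^*Q$ in terms of the variables $v$ and $p$, respectively. Then, for the following statements, any two imply the rest:
- (i) $\mathcal{D}(q,v,p) = 0$;
- (ii) $L$ and $H$ are (fiberwise) convex conjugate (Legendre dual) to each other;
- (iii) $p = \partial L/\partial v$;
- (iv) $v = \partial H/\partial p$.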
When
,
with
, and
The Euler–Lagrange equations are equivalent to
Our insight here is that this divergence function does not have to vanish identically. The consequence is that we do not require the Lagrangian dynamics (extremization dynamics) and Hamiltonian dynamics (conservation dynamics) to be coupled; they will be allowed to evolve independently. The divergence function allows us to study fiberwise symplectomorphisms of Dirac manifolds.
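To make the notion of a duality gap concrete, here is a small hedged sketch for a single fiber (fixed $q$). It assumes the gap takes the Fenchel–Young form $L(v) + H(p) - p\,v$, and it uses a quadratic Lagrangian and Hamiltonian whose "mass" parameters can be chosen independently; these functions, names, and parameter values are illustrative assumptions, not the paper's construction.

```python
# Sketch under assumptions: the duality gap on one fiber is taken to be
# gap(v, p) = L(v) + H(p) - p*v, with L(v) = m_L v^2 / 2 and H(p) = p^2 / (2 m_H).


def lagrangian(v, m_L=1.0):
    """Assumed quadratic Lagrangian on the fiber of TQ."""
    return 0.5 * m_L * v * v


def hamiltonian(p, m_H=1.0):
    """Assumed quadratic Hamiltonian on the fiber of T*Q."""
    return p * p / (2.0 * m_H)


def duality_gap(v, p, m_L=1.0, m_H=1.0):
    """Fenchel-Young style gap between L and H at the pair (v, p)."""
    return lagrangian(v, m_L) + hamiltonian(p, m_H) - p * v


if __name__ == "__main__":
    # Coupled case (m_L == m_H): the gap equals (m v - p)^2 / (2 m),
    # so it vanishes exactly on the graph p = dL/dv = m v.
    print(duality_gap(2.0, 2.0))  # on the Legendre graph
    print(duality_gap(2.0, 3.0))  # off the graph: strictly positive
    # Decoupled case (m_L != m_H): L and H are no longer Legendre duals,
    # and the gap is nonzero even at p = dL/dv.
    print(duality_gap(2.0, 2.0, m_L=1.0, m_H=2.0))
```

In the coupled case the gap is nonnegative and vanishes only where $p = \partial L/\partial v$, which matches the "no duality gap" situation of classical mechanics; choosing $m_H \neq m_L$ gives a simple instance in which $L$ and $H$ are not a Legendre pair.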
Let us consider the case that (ii) holds, i.e.,
and
are Legendre duals to each other. Then, the canonical divergence
can be written as the Bregman divergence
and
, after applying the fiberwise Legendre map
or
,
This implies that,
and they satisfy,
This is the reference-representation biduality [18,19], which is satisfied whenever $L$ and $H$ are Legendre duals of each other.
5.4. Variational Error Analysis
Recall that we previously defined the exact discrete Lagrangian (16), which is related to Jacobi’s solution of the Hamilton–Jacobi equation. The significance of the exact discrete Lagrangian is that it generates the exact discrete-time flow of a Lagrangian system, but in general it cannot be computed explicitly. Instead, a computable discrete Lagrangian $L_d$ is used to construct a discretization of Lagrangian mechanics, and it induces the discrete Lagrangian map $F_{L_d}$.
Since discrete variational mechanics is expressed in terms of discrete Lagrangians, and the exact discrete Lagrangian generates the exact flow map of a continuous Lagrangian system, it is natural to ask whether we can characterize the order of accuracy of the Lagrangian map $F_{L_d}$ as an approximation of the exact flow map, in terms of the extent to which the discrete Lagrangian $L_d$ approximates the exact discrete Lagrangian $L_d^E$. This is indeed possible, and is referred to as variational error analysis. Theorem 2.3.1 of [11] shows that if a discrete Lagrangian $L_d$ approximates the exact discrete Lagrangian $L_d^E$ to order $p$, i.e., $L_d = L_d^E + \mathcal{O}(h^{p+1})$, then the discrete Hamiltonian map, $\tilde{F}_{L_d}: (q_k, p_k)\mapsto(q_{k+1}, p_{k+1})$, viewed as a one-step method, is order $p$ accurate.
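The order condition can be checked numerically in a simple case. The following sketch assumes the harmonic oscillator $L(q,v) = \tfrac12 v^2 - \tfrac12 q^2$, for which the action along the exact solution has a closed form, and compares it with an assumed rectangle-rule discrete Lagrangian evaluated at the exact endpoints. The initial data, function names, and step sizes are illustrative assumptions. Halving $h$ should reduce the discrepancy by roughly a factor of four, consistent with $L_d = L_d^E + \mathcal{O}(h^{2})$, i.e., $p = 1$.

```python
import math

# Sketch under assumptions: harmonic oscillator L(q, v) = v^2/2 - q^2/2,
# exact solution q(t) = Q0 cos t + V0 sin t.
Q0, V0 = 1.0, 0.5  # illustrative initial position and velocity


def exact_discrete_lagrangian(h):
    """Action integral of L along the exact solution over [0, h] (closed form)."""
    return (0.25 * (V0 ** 2 - Q0 ** 2) * math.sin(2 * h)
            - 0.5 * Q0 * V0 * (1.0 - math.cos(2 * h)))


def approx_discrete_lagrangian(h):
    """Assumed rectangle-rule discrete Lagrangian at the exact endpoints q(0), q(h)."""
    q1 = Q0 * math.cos(h) + V0 * math.sin(h)
    v = (q1 - Q0) / h
    return h * (0.5 * v * v - 0.5 * Q0 * Q0)


def observed_order(h):
    """Estimate the exponent r in |L_d - L_d^E| ~ C h^r by halving the step."""
    e1 = abs(approx_discrete_lagrangian(h) - exact_discrete_lagrangian(h))
    e2 = abs(approx_discrete_lagrangian(h / 2) - exact_discrete_lagrangian(h / 2))
    return math.log2(e1 / e2)


if __name__ == "__main__":
    for h in (0.04, 0.02, 0.01):
        print(h, observed_order(h))
```

The observed exponent approaches 2, so this discrete Lagrangian is of order $p = 1$ in the sense of the theorem quoted above.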
As mentioned above, the divergence function from information geometry can serve as a Type I generating function of a symplectic map, and hence it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. A divergence function also generates the Riemannian metric and affine connection structures on the diagonal manifold (Lemma 1), in addition to generating the symplectic structure on $Q\times Q$. Viewed in this way, a natural question is to what extent we can view the divergence function as corresponding to the exact Lagrangian flow of an associated continuous Lagrangian. We can show that
Theorem 3. The exact discrete Lagrangian associated with the geodesic flow, with respect to the induced metric g, can be approximated by a divergence function $\mathcal{D}$ up to third order accuracy, if and only if Q is a Hessian manifold, i.e., $\mathcal{D}$ is the Bregman divergence $\mathcal{B}_\Phi$, for some strictly convex function Φ.
Proof. Let us expand the exact discrete Lagrangian to obtain,
where
.
From the definition of a divergence function:
Differentiating with respect to
q,
so
Differentiating with respect to
q again,
Observe that the left-hand side is the metric induced by the divergence function,
Expanding
around
for
:
we obtain
where
Clearly,
, and
Comparing the corresponding terms in powers of
h, we obtain,
Substituting (74) into (75) yields
with
This, according to Proposition 1, demonstrates that the manifold is Hessian, and hence dually flat. So, for the expansions to agree to third order, the inducing divergence function $\mathcal{D}$ must be the Bregman divergence $\mathcal{B}_\Phi$. ☐
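As a numerical illustration of the Hessian case, the sketch below builds a one-dimensional Bregman divergence and checks that the metric it induces on the diagonal is the Hessian of the generating convex function. It assumes the standard form $\mathcal{B}_\Phi(x,y) = \Phi(x) - \Phi(y) - (x-y)\,\Phi'(y)$ and the standard convention $g = -\partial_x\partial_y \mathcal{D}\,|_{x=y}$ for the induced metric; the choice $\Phi(x) = x\log x - x$ and all function names are assumptions for this example.

```python
import math

# Sketch under assumptions: one-dimensional Bregman divergence generated by
# Phi(x) = x log x - x on x > 0, so that Phi'(x) = log x and Phi''(x) = 1/x.


def phi(x):
    """Assumed strictly convex potential."""
    return x * math.log(x) - x


def dphi(x):
    """First derivative of the assumed potential."""
    return math.log(x)


def bregman(x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - (x - y) * Phi'(y)."""
    return phi(x) - phi(y) - (x - y) * dphi(y)


def induced_metric(y, eps=1e-3):
    """g(y) = -d^2 B / (dx dy) at x = y, by a central mixed difference."""
    mixed = (bregman(y + eps, y + eps) - bregman(y + eps, y - eps)
             - bregman(y - eps, y + eps) + bregman(y - eps, y - eps)) / (4 * eps ** 2)
    return -mixed


if __name__ == "__main__":
    print(bregman(2.0, 2.0))             # vanishes on the diagonal
    print(bregman(3.0, 2.0))             # positive off the diagonal
    print(induced_metric(2.0), 1 / 2.0)  # metric vs. Phi''(2) = 1/2
```

The induced metric agrees with $\Phi''$, which is the one-dimensional version of the statement that a Bregman divergence induces a Hessian metric.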
6. Summary
In this paper, we show the differences and connections between geometric mechanics and information geometry in canonically prescribing differential geometric structures on a smooth manifold Q. The Legendre transform plays a crucial role in both; however, it serves very different purposes in each. In geometric mechanics, the fiberwise Legendre map serves to link the cotangent bundle $T^*Q$ with the tangent bundle $TQ$, whereas in information geometry, the Legendre transform relates the pair of biorthogonal coordinates, which are special coordinates on a dually-flat manifold Q. More specifically, $\mathbb{F}L$ (or its inverse $\mathbb{F}H$) is invoked to establish the isomorphism between $TQ$ and $T^*Q$ in geometric mechanics, whereas in information geometry, a Hessian metric g built upon a convex function on Q is used for the correspondence between two coordinate systems on Q, and also for potentially (but not necessarily) establishing a correspondence between $TQ$ and $T^*Q$.
The link between information geometry and geometric mechanics is much stronger when one considers the discrete version (as opposed to the traditional, continuous version) of geometric mechanics. Both endow a symplectic structure on $Q\times Q$, through the use of a discrete Lagrangian in the case of geometric mechanics and a divergence function in the case of information geometry; in fact, they are both Type I generating functions for inducing a symplectic form on $Q\times Q$ via pullback from the canonical symplectic structure on $T^*Q$. Using the Legendre transform, Type II generating functions can be constructed, which lead to the (right) discrete Hamiltonian in geometric mechanics and to the dual divergence function in information geometry.
Our analyses draw a distinction between the fiberwise Legendre map (which is used in the continuous mechanics setting), the Legendre transform between biorthogonal coordinates (which is used in information geometry), and the Legendre transform between Type I and Type II generating functions (which is used in the setting of both discrete geometric mechanics and information geometry). The distinctions are more prominent when one considers the Pontryagin bundle $TQ\oplus T^*Q$. There, we can construct a divergence function that actually measures the duality gap between the Lagrangian function and the Hamiltonian function that generate a pair of (forward and backward) Legendre maps. In so doing, we demonstrate that information geometry can be viewed as an extension of geometric mechanics based on Dirac mechanics and geometry, with a full-blown duality between the Lagrangian and Hamiltonian components.
7. Discussion and Future Directions
Noda [24] showed that, with respect to the symplectic structure $\omega_{\mathcal{D}}$ on $Q\times Q$, the Hamiltonian flow of the canonical divergence $\mathcal{D}$ induces geodesic flows for ∇ and $\nabla^*$. He interpreted biorthogonal coordinates as a single coordinate system on $Q\times Q$, in a way that is consistent with treating $\mathcal{D}$ as the Type I generating function on $Q\times Q$. It remains unclear how the resulting Hamiltonian flow is related to dynamical flow on the Dirac manifold.
In another related work, Ay and Amari [25] sought to characterize the canonical form of divergence functions for general (non dually-flat) manifolds. They investigated the retraction map, which we discussed in Section 2.5, and used the exponential map associated with any torsion-free affine connection ∇ on $Q$. This approach, based on parallel transport, in essence generates a semispray on $TQ$, and is quite different from characterizing the dynamics using the Hamiltonian flow. Note that even though one may define a symplectic structure (through pullback) on $TQ$ as well, Ay and Amari [25] treat the semispray on $TQ$ as the primary geometric object. Future research will clarify its relation to our approach, which is based on defining a symplectic structure on $Q\times Q$ directly.
Finally, comparing information geometry with geometric mechanics may shed light on universal machine learning algorithms. In machine learning or state estimation applications, we wish to have the estimated distribution be influenced by the observations, so that the estimated distribution eventually becomes consistent with the observed data. Let
denote the sequence of predictions by (possibly a series of) model distributions
, and let
denote the actual data generated by an unknown distribution
that we are trying to estimate. In practice, the divergence functions are constructed so that the
pseudo-distance between two distributions
and
can be computed using only complete information about
and samples from
. As such, we can measure the mismatch between the current prediction
and the actual data
using
, since the asymmetry in the definition of
is such that we only require samples
from the true but unknown distribution. So, adding a momentum term to ensure gentle change in model predictions, a possible choice of a discrete Lagrangian for generating the discrete dynamics for the machine learning application might be given by
where the first term can be interpreted as the action associated with the kinetic energy, and the second term is the action associated with the potential energy. By construction, the term
vanishes when the prediction
is consistent with the actual observation
, and it is positive otherwise, so the term
can be viewed as a potential energy term that penalizes mismatch between the estimated distribution and the observational data. Our variational error analysis may thus shed light on an asymptotic theory of inference where sample size
is akin to discretization step
.
The link between geometric mechanics and information geometry, as revealed through our present investigation, is still rather preliminary. The possibility of a unified mathematical framework for information and mechanics is intriguing and remains a challenge for future research.