1. Introduction
Information geometry (IG) has, in recent decades, become a very helpful mathematical tool for several branches of science [1,2,3]. It relies on the study of a smooth manifold endowed with a Riemannian metric tensor and a pair of torsion-free affine connections which are dual to each other and whose average is the Levi–Civita connection [4]. More precisely, the main object of study in IG is a quadruple $(\mathcal{M}, g, \nabla, \nabla^*)$, where $(\mathcal{M}, g)$ is a Riemannian manifold [5] and $\nabla, \nabla^*$ are torsion-free linear connections on the tangent bundle $T\mathcal{M}$, such that

$$Z\,g(X,Y) = g(\nabla_Z X, Y) + g(X, \nabla^*_Z Y) \tag{1}$$

for all sections $X, Y, Z \in \mathcal{T}(\mathcal{M})$, and

$$\frac{\nabla + \nabla^*}{2} = \nabla^{(0)},$$

where $\nabla^{(0)}$ denotes the Levi–Civita connection of the metric tensor $g$ [6]. Here, $\mathcal{T}(\mathcal{M})$ denotes the space of vector fields on $\mathcal{M}$, that is, of smooth sections from the manifold to the tangent bundle. The quadruple $(\mathcal{M}, g, \nabla, \nabla^*)$ is usually referred to as a statistical manifold whenever the affine connections are both torsion-free [6].
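The geometry of a statistical manifold can be fully encoded in a distance-like function $D: \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ in the following way:

$$g_{ij}(p) = -\partial_i \partial'_j D(p, p')\big|_{p'=p}, \qquad \Gamma_{ijk}(p) = -\partial_i \partial_j \partial'_k D(p, p')\big|_{p'=p}, \qquad \Gamma^*_{ijk}(p) = -\partial'_i \partial'_j \partial_k D(p, p')\big|_{p'=p},$$

where the indices $i, j, k$ run from $1$ to $n = \dim \mathcal{M}$, and $\Gamma_{ijk}$, $\Gamma^*_{ijk}$ are the symbols of the dual connections $\nabla$ and $\nabla^*$, respectively [7]. Here, $\{\xi^i\}$ denotes a coordinate system at $p$, $\partial_i = \partial/\partial\xi^i$ acts on the first argument of $D$, and $\partial'_i$ denotes the same derivative acting on the second argument. When the matrix $[g_{ij}(p)]$ is positive definite for all $p \in \mathcal{M}$, such a function is called a divergence function or contrast function of the statistical manifold $(\mathcal{M}, g, \nabla, \nabla^*)$ [8].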
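As a quick sanity check of the sign convention for $g_{ij}$ above, the sketch below takes, as an illustrative assumption of ours, the Kullback–Leibler divergence $D(\xi, \xi')$ on the one-parameter Bernoulli family, with coordinate $\xi$ equal to the success probability. It approximates $-\partial_\xi \partial_{\xi'} D(\xi, \xi')$ at $\xi' = \xi$ with a mixed central finite difference, and the result should match the Fisher information $1/(\xi(1-\xi))$. The function names are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """KL divergence D(x, y) between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def metric_from_divergence(D, x, h=1e-4):
    """Approximate g(x) = -d/dx d/dy D(x, y) at y = x by a mixed central difference."""
    mixed = (D(x + h, x + h) - D(x + h, x - h)
             - D(x - h, x + h) + D(x - h, x - h)) / (4 * h * h)
    return -mixed


x = 0.3
g_numeric = metric_from_divergence(kl_bernoulli, x)
g_fisher = 1 / (x * (1 - x))   # Fisher information of the Bernoulli family
print(g_numeric, g_fisher)
```

The same finite-difference idea applied to third-order mixed derivatives would recover the connection symbols, but second order is enough to see the minus sign at work.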
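Given a torsion-free dual structure $(g, \nabla, \nabla^*)$ on a smooth manifold $\mathcal{M}$, there are infinitely many divergence functions which induce on $\mathcal{M}$ the same dual structure $(g, \nabla, \nabla^*)$ [9]. However, Amari and Nagaoka showed that a canonical divergence is uniquely defined on a dually flat statistical manifold [4]. More precisely, a dual structure $(g, \nabla, \nabla^*)$ is said to be dually flat when both the Riemann curvature tensors $R$ and $R^*$ of $\nabla$ and $\nabla^*$ vanish [5]. In this case, there exist mutually dual coordinate systems $\theta = (\theta^1, \dots, \theta^n)$ and $\eta = (\eta_1, \dots, \eta_n)$ such that $\nabla_{\partial_i}\partial_j = 0$, $\nabla^*_{\partial^i}\partial^j = 0$, and $g(\partial_i, \partial^j) = \delta_i^j$, where $\delta_i^j = 1$ if $i = j$ and $\delta_i^j = 0$ otherwise. Here, $\partial_i = \partial/\partial\theta^i$ and $\partial^j = \partial/\partial\eta_j$. Moreover, there exists a pair of functions $\psi$ and $\varphi$ such that

$$\partial_i \psi = \eta_i, \qquad \partial^i \varphi = \theta^i.$$

It turns out that

$$\psi(p) + \varphi(p) - \theta^i(p)\,\eta_i(p) = 0 \quad \text{for all } p \in \mathcal{M},$$

where Einstein's notation is adopted. Given two points $p, q \in \mathcal{M}$, the canonical divergence of the dually flat statistical manifold $(\mathcal{M}, g, \nabla, \nabla^*)$ between $p$ and $q$ is then defined by

$$D(p \,\|\, q) = \psi(p) + \varphi(q) - \theta^i(p)\,\eta_i(q). \tag{7}$$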
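For a concrete dually flat example (our choice, not taken from the text), consider the Bernoulli family. It is an exponential family with natural parameter $\theta = \log(p/(1-p))$, expectation parameter $\eta = p$, potential $\psi(\theta) = \log(1+e^\theta)$ and dual potential $\varphi(\eta) = \eta\log\eta + (1-\eta)\log(1-\eta)$. The sketch below checks the Legendre relation $\psi + \varphi - \theta\eta = 0$ at a point and evaluates (7). With these conventions, $D(p\,\|\,q)$ coincides with the Kullback–Leibler divergence $\mathrm{KL}(q\,\|\,p)$, that is, with the arguments swapped.

```python
import math


def psi(theta):
    """Potential psi(theta) = log(1 + e^theta) of the Bernoulli family."""
    return math.log1p(math.exp(theta))


def phi(eta):
    """Dual potential phi(eta) = eta*log(eta) + (1 - eta)*log(1 - eta)."""
    return eta * math.log(eta) + (1 - eta) * math.log(1 - eta)


def theta_of(prob):
    """Natural parameter theta = log(prob / (1 - prob)); eta is prob itself."""
    return math.log(prob / (1 - prob))


def canonical_divergence(p, q):
    """Formula (7): D(p||q) = psi(theta(p)) + phi(eta(q)) - theta(p) * eta(q)."""
    return psi(theta_of(p)) + phi(q) - theta_of(p) * q


def kl(a, b):
    """Kullback-Leibler divergence KL(Bernoulli(a) || Bernoulli(b))."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


p, q = 0.2, 0.7
legendre_gap = psi(theta_of(p)) + phi(p) - theta_of(p) * p   # should vanish
D_pq = canonical_divergence(p, q)
print(legendre_gap, D_pq, kl(q, p))
```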
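Relying upon the canonical divergence of a dually flat structure, one can list the basic properties that a divergence on a general dual structure should satisfy in order to be called canonical. In [6], the authors required that a canonical divergence be one half the square of the Riemannian distance when $\nabla = \nabla^*$, and that it be the canonical divergence of Amari and Nagaoka when the dual structure is dually flat. However, several different divergences in the literature satisfy these requirements (see, for instance, [10,11,12,13,14]). Hence, the search for a canonical divergence on a general dual structure is still an open problem [10]. Nonetheless, the notion of canonical divergence on a dually flat manifold can illustrate how information geometry modifies the usual Riemannian geometry. In [4], the authors applied this notion to an arbitrary curve within a general statistical manifold.

Let $\gamma: [a,b] \to \mathcal{M}$ be a curve within a statistical manifold $(\mathcal{M}, g, \nabla, \nabla^*)$; we can consider the dual structure $(g_\gamma, \nabla^\gamma, \nabla^{*\gamma})$ induced by $(g, \nabla, \nabla^*)$ on $\gamma$. Such a structure is given by

$$g_\gamma(t) = g\big(\dot\gamma(t), \dot\gamma(t)\big), \qquad \nabla^\gamma_{\partial_t}\partial_t = \frac{\Gamma(t)}{g_\gamma(t)}\,\partial_t, \qquad \nabla^{*\gamma}_{\partial_t}\partial_t = \frac{\Gamma^*(t)}{g_\gamma(t)}\,\partial_t, \tag{8}$$

where

$$\Gamma(t) = g\big(\nabla_{\dot\gamma}\dot\gamma, \dot\gamma\big), \qquad \Gamma^*(t) = g\big(\nabla^*_{\dot\gamma}\dot\gamma, \dot\gamma\big). \tag{9}$$

From here on, we denote the scalar product $g(X, Y)$ induced by the metric tensor $g$ by $\langle X, Y \rangle$. Since $\gamma$ is a 1-dimensional manifold, the curvature tensors of $\nabla^\gamma$ and $\nabla^{*\gamma}$ vanish identically, so it is a dually flat manifold. Therefore, in [4] the authors applied the notion of canonical divergence to obtain a divergence $D(\gamma)$ of the curve $\gamma$: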
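In [4], the authors claimed that the divergence $D(\gamma)$ is independent of the parameterization of $\gamma$ and depends only on its orientation. This leads to a close relation between the divergence $D$ and its dual function $D^*$, which is given as follows:

In particular, we have that

$$D^*(\gamma) = D(\gamma^{-1}), \tag{12}$$

where $\gamma^{-1}(t) = \gamma(a + b - t)$ denotes the curve $\gamma$ with reversed orientation [4].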
Even though the adaptation of the canonical divergence (7) to the 1-dimensional case, which results in the canonical divergence Formula (10), is clear in [4], the statements made there about this divergence are not proved. With this work, we aim to provide differential-geometric proofs of all statements claimed in [4] on the canonical divergence of a curve $\gamma$. In particular, we want to prove the following statements:
- (i) The divergence of a curve $\gamma$ is given by the expression in (10). We refer to it as the canonical divergence of the curve $\gamma$.
- (ii) The divergence is independent of the particular parameterization of $\gamma$.
- (iii) If we change the orientation of $\gamma$, we obtain the dual divergence given in (11) and the relation (12).
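Finally, in the self-dual case, that is, when $\nabla = \nabla^*$, we prove that

$$D(\gamma) = \frac{1}{2} L(\gamma)^2, \qquad L(\gamma) = \int_a^b \sqrt{\langle \dot\gamma(t), \dot\gamma(t) \rangle}\, dt, \tag{13}$$

which is claimed in [4] and illustrates how information geometry modifies the usual Riemannian geometry.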
2. The Canonical Divergence of a Curve
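Let $\gamma: [a,b] \to \mathcal{M}$ be a curve within a statistical manifold $(\mathcal{M}, g, \nabla, \nabla^*)$, and let us regard $\gamma$ as a 1-dimensional manifold. Then the statistical manifold $(\gamma, g_\gamma, \nabla^\gamma, \nabla^{*\gamma})$ is always dually flat, since curvature tensors vanish identically in dimension one. This implies that there exists a pair of affine parameters, $\theta$ and $\eta$, for $\nabla^\gamma$ and $\nabla^{*\gamma}$, respectively. Indeed, consider an arbitrary parameter $t$; owing to the dual flatness of $\gamma$, the 1-forms $d\theta$ and $d\eta$ are such that the following holds true [15]: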
where the coefficients $\Gamma$ and $\Gamma^*$ are defined in (9). The solutions of Equations (14) and (15) are given by

Then, it straightforwardly follows that
According to the theory developed in [4], we can find two functions $\psi$ and $\varphi$ such that

These two equations lead to the following relations:

which have the following solution:
Now, we can understand that
and
. Likewise, we have that
and
Therefore, following (7), we can define the divergence of $\gamma$ between $p$ and $q$ as

where $p = \gamma(a)$ and $q = \gamma(b)$.
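The construction of the affine parameters can be sketched numerically. The sketch rests on assumptions of ours. It integrates $\theta'' = (\Gamma/g_\gamma)\,\theta'$ and $\eta'' = (\Gamma^*/g_\gamma)\,\eta'$, the standard condition for $\theta$ and $\eta$ to be affine for the induced connections. It uses the normalization $\theta(a) = \eta(a) = 0$, $\theta'(a) = 1$, $\eta'(a) = g_\gamma(a)$, and it evaluates $D(\gamma)$ as $\varphi(q) = \int_a^b \theta\, d\eta$, which is what (7) gives when $\psi(p) = \theta(p) = 0$. The structure $g_\gamma = 1 + t^2$, $\Gamma = \sin 3t$, $\Gamma^* = g_\gamma' - \Gamma$ is hypothetical. The check confirms $\theta'\eta' = g_\gamma$, which follows from $g_\gamma' = \Gamma + \Gamma^*$.

```python
import math


def cumtrapz(f, t):
    """Cumulative trapezoidal integral of the samples f over the grid t, starting at 0."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (f[i] + f[i - 1]) * (t[i] - t[i - 1]))
    return out


def affine_parameters(t, g, gam, gam_star):
    """Assumed normalization: theta(a) = eta(a) = 0, theta'(a) = 1, eta'(a) = g(a).

    theta' = exp(int_a^t Gamma/g) and eta' = g(a) exp(int_a^t Gamma*/g) solve
    theta'' = (Gamma/g) theta' and eta'' = (Gamma*/g) eta'.
    """
    dtheta = [math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam, g)], t)]
    deta = [g[0] * math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam_star, g)], t)]
    return cumtrapz(dtheta, t), cumtrapz(deta, t), dtheta, deta


# Hypothetical one-dimensional dual structure on [a, b] = [0, 1].
N = 2001
t = [i / (N - 1) for i in range(N)]
g = [1 + s * s for s in t]                        # g_gamma(t)
gam = [math.sin(3 * s) for s in t]                # Gamma(t), chosen arbitrarily
gam_star = [2 * s - math.sin(3 * s) for s in t]   # Gamma*(t) = g'(t) - Gamma(t)

theta, eta, dtheta, deta = affine_parameters(t, g, gam, gam_star)

# theta'(t) * eta'(t) should reproduce g_gamma(t) up to quadrature error.
max_rel_err = max(abs(u * v - m) / m for u, v, m in zip(dtheta, deta, g))

# D(gamma) = phi(q) = int_a^b theta d(eta), assuming psi(p) = theta(p) = 0 in (7).
D = cumtrapz([th * de for th, de in zip(theta, deta)], t)[-1]
print(max_rel_err, D)
```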
At this point, we give a more explicit expression for the divergence $D(\gamma)$. Let us interchange the role of
and
. Then, from Equations (16) and (20), we can write

Now, we show that the expression (23) is the same as the one in (10).
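Proposition 1. Let $\gamma$ be a curve endowed with the dualistic structure $(g_\gamma, \nabla^\gamma, \nabla^{*\gamma})$. Let $\theta$ and $\eta$ be the affine parameters with respect to $\nabla^\gamma$ and $\nabla^{*\gamma}$, respectively. Then, the divergence of $\gamma$ can be written as follows:

Proof. If we assume that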
, we can see from Equations (10) and (16) that
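Let us now observe that, from Equation (1) with $X = Y = Z = \dot\gamma$, we can write

$$\frac{d}{dt}\langle \dot\gamma, \dot\gamma \rangle = \Gamma(t) + \Gamma^*(t).$$

Therefore, we have that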
where we assumed that
. Now, we may observe from Figure 1 that the domain of integration of the double integral in Equation (10) can be represented as the grey region.
Therefore, we can split the double integral as follows:
and by plugging in it
and
, we obtain the desired result.
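The splitting step can be illustrated on a triangular domain. For this illustration only, we assume that the integrand of the double integral has been brought to the product form $\theta'(s)\,\eta'(t)$ over the region $a \le s \le t \le b$, and that $\theta(a) = 0$. Both are assumptions of ours, not a transcription of (10). Exchanging the order of integration and integrating once then gives

$$
\int_a^b\!\!\int_a^t \theta'(s)\,\eta'(t)\,ds\,dt
= \int_a^b\!\!\int_s^b \theta'(s)\,\eta'(t)\,dt\,ds
= \int_a^b \theta'(s)\,\bigl(\eta(b)-\eta(s)\bigr)\,ds
= \int_a^b \theta(t)\,\eta'(t)\,dt .
$$

The first two expressions slice the same triangle in the two possible directions. The last equality is an integration by parts that uses $\theta(a) = 0$.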
The representation of $D(\gamma)$ in Proposition 1 allows us to show that the divergence of a curve is independent of the particular parameterization of $\gamma$. The next result may be important for characterizing the divergence as a distance-like function.
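Proposition 2. Let $\gamma$ be a smooth curve and let $\tilde\gamma$ be a re-parameterization of $\gamma$. Then

$$D(\tilde\gamma) = D(\gamma). \tag{25}$$

Proof. Consider an increasing diffeomorphism $\sigma$, $\tau \mapsto t = \sigma(\tau)$, meaning that $\sigma(\tau_1) < \sigma(\tau_2)$ if $\tau_1 < \tau_2$, and let $\tilde\gamma = \gamma \circ \sigma$. From Equation (24), we can write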
Our aim is to perform the change of variable $t = \sigma(\tau)$ within the integral (26), which is governed by $\sigma$. Here, we use different notations to denote the derivative with respect to the “time” $t$ (a dot) and the derivative with respect to the parameter $\tau$ (a prime). Let us define
. Then, we have that
Recall that
with
and
.
We can now use Equations (27) and (28) to perform the change of variable in the integral (26):
where we assumed that
.
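To illustrate Proposition 2 numerically, the sketch below computes $D(\gamma)$ twice: once in the parameter $t$, and once after the change of variable $t = \sigma(\tau)$ with $\sigma(\tau) = (e^\tau - 1)/(e - 1)$, an increasing diffeomorphism of $[0,1]$ chosen only for illustration. It assumes the normalization $\theta(a) = \eta(a) = 0$, $\theta'(a) = 1$, $\eta'(a) = g_\gamma(a)$, and reads $D(\gamma)$ as $\int \theta\, d\eta$. It also uses transformation rules we derived from (9) with $\tilde\gamma' = \sigma'\dot\gamma$: $\tilde g = \sigma'^2\, g_\gamma\circ\sigma$, $\tilde\Gamma = \sigma'^3\,\Gamma\circ\sigma + \sigma'\sigma''\, g_\gamma\circ\sigma$, and likewise for $\Gamma^*$. The coefficient functions are hypothetical.

```python
import math


def cumtrapz(f, t):
    """Cumulative trapezoidal integral of the samples f over the grid t, starting at 0."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (f[i] + f[i - 1]) * (t[i] - t[i - 1]))
    return out


def divergence(t, g, gam, gam_star):
    """int theta d(eta) with theta(a) = 0, theta'(a) = 1, eta'(a) = g(a) (assumed normalization)."""
    dtheta = [math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam, g)], t)]
    deta = [g[0] * math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam_star, g)], t)]
    theta = cumtrapz(dtheta, t)
    return cumtrapz([th * de for th, de in zip(theta, deta)], t)[-1]


# Hypothetical structure in the original "time" t on [0, 1].
def g0(s):
    return 1 + s * s


def G0(s):
    return math.sin(3 * s)


def Gs0(s):
    return 2 * s - math.sin(3 * s)   # Gamma* = g' - Gamma


# Increasing diffeomorphism sigma of [0, 1]; here sigma' = sigma'' = e^tau / (e - 1).
def sigma(u):
    return (math.exp(u) - 1) / (math.e - 1)


def dsigma(u):
    return math.exp(u) / (math.e - 1)


N = 4001
grid = [i / (N - 1) for i in range(N)]
D_t = divergence(grid, [g0(s) for s in grid], [G0(s) for s in grid], [Gs0(s) for s in grid])

# Coefficients in the new parameter tau (sigma' * sigma'' = sigma'^2 for this sigma).
g_tau = [g0(sigma(u)) * dsigma(u) ** 2 for u in grid]
G_tau = [dsigma(u) ** 3 * G0(sigma(u)) + dsigma(u) ** 2 * g0(sigma(u)) for u in grid]
Gs_tau = [dsigma(u) ** 3 * Gs0(sigma(u)) + dsigma(u) ** 2 * g0(sigma(u)) for u in grid]
D_tau = divergence(grid, g_tau, G_tau, Gs_tau)
print(D_t, D_tau)
```

The two values agree up to quadrature error. Under the stated normalization, the rescalings of $\theta$ and $\eta$ induced by $\sigma'(0)$ cancel in the product $\theta\, d\eta$.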
This result shows that the canonical divergence $D(\gamma)$ of the dual structure $(g_\gamma, \nabla^\gamma, \nabla^{*\gamma})$ is independent of the parameterization of the curve. However, it depends on the orientation of the curve. We will shortly show that, by reversing the parameter of $\gamma$, we obtain the dual divergence $D^*(\gamma)$ of $\gamma$, which is defined in Equation (11). To accomplish this task, we note that, by applying the same methods as in the proof of Proposition 1, we can write the dual divergence of $\gamma$ as follows:

for $\gamma: [a,b] \to \mathcal{M}$ such that $\gamma(a) = p$ and $\gamma(b) = q$. In the next result, we prove that $D^*(\gamma)$ coincides with the $D$-divergence of the reversely oriented curve $\gamma^{-1}$.
Proposition 3. Let $\gamma: [a,b] \to \mathcal{M}$ be such that $\gamma(a) = p$ and $\gamma(b) = q$. Let $\gamma^{-1}$ be the reversely oriented curve. Then we have that

$$D(\gamma^{-1}) = D^*(\gamma). \tag{30}$$

Proof. Consider Equation (24); by integrating by parts, we can write it as follows:

By reversing the “time” $t$, that is, by setting $t \mapsto a + b - t$, we get the reversely oriented curve $\gamma^{-1}(t) = \gamma(a + b - t)$. Therefore, from the last equation, we obtain

where the last equality is obtained from Equation (29). Finally, we can write

which proves the statement.
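A numerical check of Proposition 3, and of the identity $D(\gamma) + D^*(\gamma) = \theta(b)\,\eta(b)$ that follows by integration by parts. The assumptions are ours. We normalize $\theta(a) = \eta(a) = 0$, $\theta'(a) = 1$, $\eta'(a) = g_\gamma(a)$, and take $D(\gamma) = \int \theta\, d\eta$ and $D^*(\gamma) = \int \eta\, d\theta$; these are what (7) gives with $p = \gamma(a)$, $q = \gamma(b)$, and with the roles of $p$ and $q$ exchanged. Reversing $t \mapsto a + b - t$ reflects $g_\gamma$ and flips the sign of $\Gamma$ and $\Gamma^*$, as follows from (9) with $\tilde\gamma' = -\dot\gamma$. The coefficient functions are hypothetical.

```python
import math


def cumtrapz(f, t):
    """Cumulative trapezoidal integral of the samples f over the grid t, starting at 0."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (f[i] + f[i - 1]) * (t[i] - t[i - 1]))
    return out


def affine(t, g, gam, gam_star):
    """Affine parameters with theta(a) = eta(a) = 0, theta'(a) = 1, eta'(a) = g(a) (assumed)."""
    dth = [math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam, g)], t)]
    det = [g[0] * math.exp(v) for v in cumtrapz([G / m for G, m in zip(gam_star, g)], t)]
    return cumtrapz(dth, t), cumtrapz(det, t), dth, det


def g0(s):
    return 1 + s * s                  # hypothetical g_gamma


def G0(s):
    return math.sin(3 * s)            # hypothetical Gamma


def Gs0(s):
    return 2 * s - math.sin(3 * s)    # Gamma* = g' - Gamma


a, b, N = 0.0, 1.0, 4001
t = [a + (b - a) * i / (N - 1) for i in range(N)]

theta, eta, dth, det = affine(t, [g0(s) for s in t], [G0(s) for s in t], [Gs0(s) for s in t])
D = cumtrapz([th * de for th, de in zip(theta, det)], t)[-1]     # int theta d(eta)
D_star = cumtrapz([e * d for d, e in zip(dth, eta)], t)[-1]      # int eta d(theta)

# Reversed curve gamma^{-1}(t) = gamma(a + b - t): g is reflected, Gamma and Gamma* flip sign.
r = [a + b - s for s in t]
th_r, _, _, det_r = affine(t, [g0(s) for s in r], [-G0(s) for s in r], [-Gs0(s) for s in r])
D_reversed = cumtrapz([th * de for th, de in zip(th_r, det_r)], t)[-1]

print(D_star, D_reversed, D + D_star, theta[-1] * eta[-1])
```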
To sum up, the $D$-divergence of an arbitrary path $\gamma$ depends only on the orientation and not on the parameterization, and the $D^*$-divergence of $\gamma$ coincides with the $D$-divergence of the reversely oriented curve $\gamma^{-1}$. For this reason, we can refer to $D(\gamma)$ as a modification of the curve length within information geometry. Further support for this statement comes from the self-dual case, namely when $\nabla = \nabla^*$. In this case, information geometry reduces to Riemannian geometry [6], and the $D$-divergence $D(\gamma)$ becomes one half the square of the Riemannian length of the curve $\gamma$.
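Proposition 4. Let $(\mathcal{M}, g, \nabla, \nabla^*)$ be a self-dual statistical manifold, i.e., $\nabla = \nabla^* = \nabla^{(0)}$ with $\nabla^{(0)}$ the Levi–Civita connection. Let $\gamma: [a,b] \to \mathcal{M}$ be a smooth curve such that $\gamma(a) = p$ and $\gamma(b) = q$. Then the $D$-divergence (10) becomes

$$D(\gamma) = \frac{1}{2} L(\gamma)^2,$$

where $L(\gamma)$ is the length of $\gamma$.

Proof. Consider the representation of $D(\gamma)$ given in Equation (10); when $\Gamma$ is the coefficient of the Levi–Civita connection induced onto the curve $\gamma$, we can write

Thanks to the compatibility of $\nabla^{(0)}$ with the metric tensor $g$, we then obtain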
At this point, we are ready to express the divergence of $\gamma$ in the self-dual case:

We may now observe from Figure 1 that the area of the integration domain in the integral above is one half the area of the rectangle $[a,b] \times [a,b]$. Therefore, by the symmetry of the integrand, we obtain that

which proves that the divergence of $\gamma$, in the self-dual case, is one half the square of the length; in particular, it is independent of the parameterization.
3. Conclusions
In classical differential geometry, the relations among geodesics, the length of a curve, and the distance provide deep insight into the geometric structure of a Riemannian manifold. For instance, under specific conditions [5], the distance between two points of a Riemannian manifold is obtained by minimizing the length functional over all paths joining them, and the minimum is attained along a geodesic between the two points [16,17].
In information geometry, the inverse problem concerns the search for a divergence function $D$ which recovers a given dual structure $(g, \nabla, \nabla^*)$ on a smooth manifold $\mathcal{M}$. The second-order derivatives of $D$ recover the Riemannian metric $g$, while the third-order derivatives of $D$ retrieve the two torsion-free connections $\nabla$ and $\nabla^*$, which are dual with respect to $g$. Interestingly, $(g, \nabla, \nabla^*)$ is determined only by the third-order Taylor expansion of $D$. Therefore, in general, $D$ is far from being unique for a given statistical manifold structure. However, in the dually flat case where both $\nabla$ and $\nabla^*$ are flat, it is possible to consider a canonical divergence function from a given $(g, \nabla, \nabla^*)$, which was originally proposed by Amari and Nagaoka.
We considered here the case in which the divergence of a curve $\gamma$ in a statistical manifold $(\mathcal{M}, g, \nabla, \nabla^*)$ is obtained by applying the notion of the canonical divergence of a dually flat statistical manifold. In this way, the divergence can be regarded as a natural quantity for understanding the geometric structure of a statistical manifold. More specifically, given $(\mathcal{M}, g, \nabla, \nabla^*)$, a curve $\gamma$ of $\mathcal{M}$ has a natural induced dual structure which is dually flat, since $\gamma$ can be regarded as a 1-dimensional submanifold of $\mathcal{M}$ itself. In [4], Amari and Nagaoka used the canonical divergence in the dually flat case to define a “canonical divergence” $D(\gamma)$ of the curve.
In this pedagogical paper, we provided a systematic organization and mathematical proofs of the properties of the divergence in Equation (22), which were originally stated without demonstration in [4]. Specifically, we obtained the following results:
- In Equation (22), we provided an explicit expression of the canonical divergence $D(\gamma)$ of a 1-dimensional path $\gamma$.
- In Equation (25) and Proposition 2, we showed that $D(\gamma)$ does not depend on the chosen parameterization of $\gamma$.
- In Equation (30) and Proposition 3, we demonstrated how $D(\gamma)$ depends on the adopted orientation of $\gamma$.
- In Equation (32) and Proposition 4, we verified that $D(\gamma)$ equals one half the squared length of $\gamma$ in the self-dual case.
By mimicking the classical theory of Riemannian geometry, our work helps to obtain a deeper understanding of the distance-like functional represented by the divergence in Equation (22). Furthermore, our explicit analysis can provide useful insights for identifying a minimum of $D(\gamma)$ through variational methods in more general settings. These methods, in turn, are natural tools for many applications in information geometry [18]. It would be of great interest to select a general canonical divergence via the minimum of $D(\gamma)$ over the set of all paths connecting any two points of a general statistical manifold. However, this problem appears to be highly non-trivial, and its solution requires, for instance, a criterion capable of explaining why one divergence would be better than the others in capturing a given dual structure. In some sense, the divergence must be defined using the information from $(g, \nabla, \nabla^*)$ in a minimal way. We believe that the differential-geometric approach employed here could help in accomplishing such a task, which is, for now, beyond the scope of our current pedagogical setting. We leave this discussion, along with a presentation of potential applications of a general canonical divergence to non-dually flat scenarios, to future work.
In conclusion, our work presents new proofs of pedagogical value that, to the best of our knowledge, do not appear anywhere in the literature. In this sense, our work is neither a review nor a simplification of existing proofs. However, our findings could be regarded as preparatory to results of more general applicability in information geometry, including the study of more general examples of statistical manifolds. In addition, as previously mentioned in the paper, our proofs could be further exploited to pursue more research-oriented open questions of a non-trivial nature, including investigating the ranking of different divergences capable of capturing a given dual structure.