1. Introduction
The motivation for the work reported in this paper is a deeper understanding of the effect of the ambient metric on the higher-order interactions arising in data, as well as of the data's multi-scale geometric–topological properties. The former can be viewed via the Čech or Alpha complex, the latter through their persistent homology. Specifically, we generalize geometric and topological data analysis methods from Euclidean geometry to Bregman geometries, in which dissimilarity is measured with divergences that are generally non-symmetric and violate the triangle inequality. By necessity, these methods are sensitive to the dissimilarity defining the ambient geometry, and we exploit this sensitivity to quantify the difference between geometries.
As example geometries, we emphasize those related to information-theoretic concepts, such as the Shannon geometry and the Fisher geometry. What we call the Shannon geometry is induced by the relative entropy (Kullback–Leibler divergence) [1], which is based on Shannon's entropy [2]. The Fisher geometry is induced by the Fisher distance (more technically, the Fisher–Rao metric [3]). It was introduced by Rao [4], building on Fisher's seminal work [5] in statistics.
These are examples of what we call information spaces [6]. Such spaces often exhibit unfamiliar behavior, different from the Euclidean space we are used to. Understanding such spaces is nevertheless important, as—plainly speaking—this is where many types of data live [7].
Importantly, these geometries are commonly used in data science. For example, relative entropy is a standard loss function used in dimensionality reduction [8,9] and deep learning [7,10]. (This connection may not be immediately obvious: many models explicitly minimize what is called the cross-entropy. It has the same minimizer as the relative entropy, which can additionally be viewed as a distance and therefore gives rise to a geometry.) The Fisher geometry is beginning to be considered [11] as an alternative.
We therefore deem it important to shed light on the differences and commonalities between these geometries. Some pertinent questions are as follows: Is the simpler Fisher geometry a good approximation of the Shannon geometry? Can we see a significant difference between the Euclidean geometry and the non-Euclidean ones, as predicted by the discrepancy in their practical performance?
In the current work, we are particularly interested in the algorithms that underpin the data analysis methods, especially the topological ones. While the Fisher geometry can be handled with Euclidean tools [6], the Shannon geometry used to require customized ones [12]. We show that the Shannon geometry can also benefit from existing robust tools, although in this case the application is less direct. We also hope that this development opens new avenues for topological data analysis in information spaces.
Prior work and results. The research reported in this paper merges several lines of work. One is the study of Bregman divergences from the computational geometry point of view initiated in [13] and the extension of topological data analysis to Bregman and Fisher geometries started in [6,12]. Another is the study of higher-order interactions in high-dimensional probabilistic feature spaces [14,15], which generally lack information-theoretic interpretations.
Pivotal work at the intersection of computational geometry and Bregman geometry includes the extension of k-means clustering [16], proximity search data structures [17,18], and Voronoi tessellations [13] to the Bregman setting.
On a practical note, we mention a research direction focusing on detecting, loosely speaking, the holes in knowledge [14,15,19,20], understood as a collection of text documents. In the simplest case, each text document is represented as a discrete probability distribution (by counting characteristic keywords [21]). Interestingly, higher-order interactions among documents are necessary to capture the multi-scale topology of these data. Indeed, pairwise, triplewise, etc., intersections between balls (with respect to the appropriate metric or divergence) centered at the data points reveal the topological structure.
In this work, we will also use computational topology methods. One important topological tool is the nerve and the related Nerve Theorem. Intuitively, it allows one to capture the topology (more precisely, the homotopy type) of a union of (a finite collection of) objects using a finite simplicial complex. The theorem is usually attributed to Borsuk [22] or Leray [23], and is stated more explicitly by Weil [24]. See [25] for a comprehensive, modern treatment of this topic.
A particularly important concept in our investigations is the Bregman–Delaunay mosaic, which we formally define as the straight-line dual of the (not necessarily straight-line) Bregman–Voronoi tessellation obtained by measuring distance with the Bregman divergence from a data point. This mosaic was already defined in [13], and we explain how it can be computed as a weighted Euclidean Delaunay mosaic using standard geometric software. In Euclidean space, the Alpha shapes can be defined as the sublevel sets of the radius function on the Delaunay mosaic, which is a generalized discrete Morse function in the sense of Forman [26] and Freij [27]. The lower sets of the critical simplices of this function constitute the Wrap complex [28], which was introduced as a shape reconstruction tool in [29]. We extend this framework by introducing the rise function on a Bregman–Delaunay mosaic, which provides a convenient measure of the size of a Bregman ball. With these notions, we construct the shape of data in different geometries, and we use them to quantify the difference between the geometries.
We have implemented all the algorithms and used the resulting software to run experiments comparing the Euclidean, Shannon, and Fisher geometries for synthetic data. We find that the Delaunay mosaics and their Alpha and Wrap complexes in these geometries show some, occasionally subtle, differences, which we quantify.
Outline. Section 2 provides the necessary background from discrete geometry and computational topology. Section 3 gives the details needed to compute Delaunay mosaics and their Alpha and Wrap complexes in Bregman and Fisher geometries using software for weighted Delaunay mosaics in Euclidean geometry. Section 4 presents computational experiments, and Section 5 discusses the quantification of the difference between Bregman and other geometries. Section 6 concludes the paper. In Appendix A, Appendix B, Appendix C, and Appendix D, we provide supplementary experiments, results, and background information.
2. Background
We need some background on Bregman divergences, Delaunay mosaics, and discrete Morse functions. Indeed, this paper combines these concepts to obtain new insights into Bregman–Delaunay mosaics and their scale-dependent subcomplexes.
Bregman divergence. Given a suitable convex function on a convex domain, the best affine approximation at a point defines a dissimilarity measure on the domain (see [30]). We follow [31] in the details of this construction, requiring a technical third condition that guarantees a conjugate function of the same kind. Let Ω ⊆ ℝ^d be an open and convex domain. A function F: Ω → ℝ is of Legendre type [32] if
- (i)
F is differentiable;
- (ii)
F is strictly convex;
- (iii)
‖∇F(x)‖ diverges whenever we approach the boundary of Ω.
If the boundary of the domain is empty, which is the case for Ω = ℝ^d, then Condition (iii) is void. In other words, ‖∇F(x)‖ does not necessarily diverge when ‖x‖ → ∞. Given points x, y ∈ Ω, the Bregman divergence from x to y associated with F is the difference between F and the best affine approximation of F at y, both evaluated at x:

D_F(x‖y) = F(x) − [F(y) + ⟨∇F(y), x − y⟩],   (1)

where ⟨·, ·⟩ denotes the standard dot product. Note that D_F(x‖y) ≥ 0, with equality iff x = y, resembling a metric. However, the other two axioms of a metric do not hold: the divergence is not necessarily symmetric, and it violates the triangle inequality in all non-trivial cases. In spite of these shortcomings, Bregman divergences are useful as measures of dissimilarity and are sufficient to define a geometry.
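To make the definition concrete, the following Python sketch (our own illustration; the name `bregman_div` is not from any standard library) evaluates the divergence for a user-supplied Legendre type function and its gradient. For half the squared Euclidean norm, it reduces to half the squared Euclidean distance:

```python
import numpy as np

def bregman_div(F, gradF, x, y):
    """Bregman divergence from x to y: F(x) minus the best affine
    approximation of F at y, evaluated at x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return F(x) - F(y) - np.dot(gradF(y), x - y)

# Half the squared Euclidean norm recovers half the squared distance.
F = lambda x: 0.5 * np.dot(x, x)
gradF = lambda x: x
x, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
assert np.isclose(bregman_div(F, gradF, x, y), 0.5 * np.sum((x - y) ** 2))
```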
For a given h ≥ 0, the primal ball with center x contains all points y such that the divergence from x to y is at most h, and the dual ball contains all points y such that the divergence from y to x is at most h:

B_F(x; h) = { y ∈ Ω : D_F(x‖y) ≤ h },   (2)
B*_F(x; h) = { y ∈ Ω : D_F(y‖x) ≤ h }.   (3)
The geometric intuition for (2) is to cast light onto the graph of F from a point vertically above x in ℝ^{d+1} and at distance h below the graph of F: the primal ball is the vertical projection of the lit-up part of the graph onto ℝ^d. This ball is not necessarily convex. The geometric intuition for (3) is to intersect the graph of F with the tangent hyperplane at x shifted vertically upward by a distance h: the dual ball is the vertical projection of the part of the graph on or below this shifted hyperplane. This ball is necessarily convex.
The conjugate of F can be constructed with elementary geometric means. Specifically, we use the polarity transform that maps a point A = (a, a₀) ∈ ℝ^d × ℝ to the affine map A*: ℝ^d → ℝ defined by A*(x) = ⟨a, x⟩ − a₀. Similarly, it maps the affine map A* to the point A. The graph of F can be described as a set of points or as a set of affine maps that touch the graph. The conjugate function, F*, is defined such that polarity maps the points of the graph of F to the tangent affine maps of the graph of F*, and it maps the tangent affine maps of the graph of F to the points of the graph of F*. Since A and A* switch position with gradient, so do F and F*. More specifically, ∇F* and F* are given by

∇F*(u) = (∇F)^{−1}(u),   (4)
F*(u) = ⟨u, ∇F*(u)⟩ − F(∇F*(u)).   (5)

The convexity of Ω and Conditions (i), (ii), (iii) imply that the conjugate domain Ω* = ∇F(Ω) is convex and F* satisfies (i), (ii), (iii). In other words, the conjugate of a Legendre type function is again a Legendre type function. Importantly, the Bregman divergences associated with F and with F* are symmetric: D_F(x‖y) = D_{F*}(y*‖x*), in which x* = ∇F(x) and y* = ∇F(y). Hence, ∇F maps primal balls to dual balls and it maps dual balls to primal balls:

∇F(B_F(x; h)) = B*_{F*}(x*; h)   and   ∇F(B*_F(x; h)) = B_{F*}(x*; h).

Since all dual balls are convex, all primal balls are diffeomorphic images of convex sets. This implies that the common intersection of a collection of primal balls is either empty or contractible [12], so the Nerve Theorem applies.
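The symmetry between the divergences of a function and its conjugate can be checked numerically in one dimension. The sketch below (with illustrative names, using E(x) = x ln x − x, whose gradient is ln x and whose conjugate is E*(u) = e^u) verifies that the divergence of E from x to y equals the divergence of E* from y* to x*:

```python
import math

def d_E(x, y):
    """Divergence of E(x) = x*ln(x) - x on the positive reals."""
    return x * math.log(x / y) - x + y

def d_Estar(u, v):
    """Divergence of the conjugate E*(u) = exp(u)."""
    return math.exp(u) - math.exp(v) - math.exp(v) * (u - v)

x, y = 0.3, 1.7
# D_E(x, y) equals D_{E*}(y*, x*) with x* = ln x and y* = ln y.
assert math.isclose(d_E(x, y), d_Estar(math.log(y), math.log(x)))
```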
Examples. An important example of a Legendre type function is defined by mapping x to half the square of its Euclidean norm: Π(x) = ½‖x‖². It is the only Legendre type function that is its own conjugate: Π* = Π. The symmetry between the divergences of a Legendre type function and its conjugate thus implies that D_Π is symmetric, since ∇Π is the identity. Indeed, it is easy to see that the divergence is half the squared Euclidean distance, D_Π(x‖y) = ½‖x − y‖², which is of course symmetric. This particular Legendre type function provides an anchor point for comparison.
The example that justifies the title of this paper is the (negative) Shannon entropy [2]. We add an extra linear term, resulting in E: ℝ₊^d → ℝ, defined as E(x) = Σᵢ (xᵢ ln xᵢ − xᵢ). Being linear, the additional term does not affect the resulting divergence, but simplifies certain computations later.
This function is of Legendre type and fundamental to information theory [2]. Its divergence,

D_E(x‖y) = Σᵢ [xᵢ ln (xᵢ / yᵢ) − xᵢ + yᵢ],   (9)

is generally referred to as the relative entropy [10] or the Kullback–Leibler divergence [1] from x to y. We remark that the above derivation using the Bregman machinery results in a divergence which is valid on the entire positive orthant of ℝ^d, and agrees with the standard definition [1] on the standard simplex.
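As a sanity check, the following sketch (function names are our own) computes the divergence of E via the Bregman formula and compares it with the standard definition of the relative entropy. The two agree on the standard simplex, and on the rest of the positive orthant they differ only by the sums of coordinates:

```python
import numpy as np

def E(x):                      # Shannon entropy with the extra linear term
    return np.sum(x * np.log(x) - x)

def gradE(x):
    return np.log(x)

def kl_bregman(x, y):          # Bregman divergence associated with E
    return E(x) - E(y) - np.dot(gradE(y), x - y)

def kl_standard(x, y):         # standard definition on the simplex
    return np.sum(x * np.log(x / y))

x = np.array([0.2, 0.3, 0.5])  # points on the standard simplex
y = np.array([0.1, 0.6, 0.3])
assert np.isclose(kl_bregman(x, y), kl_standard(x, y))

# On the positive orthant, the two differ by the sums of coordinates:
x2, y2 = 2.0 * x, 0.5 * y
assert np.isclose(kl_bregman(x2, y2),
                  kl_standard(x2, y2) - x2.sum() + y2.sum())
```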
The gradient of the Shannon entropy at x is the vector with components ln xᵢ, for 1 ≤ i ≤ d. Using Equation (5), one easily computes that the conjugate of E maps this vector to Σᵢ xᵢ. Hence, E* is defined by mapping u ∈ ℝ^d to E*(u) = Σᵢ e^{uᵢ}.
A case of special interest is the restriction of the Shannon entropy to the standard simplex, which is a subset of the positive orthant. Writing x = (x₁, x₂, …, x_d) for a point of ℝ^d, the standard (d−1)-simplex, denoted Δ, consists of all points x with positive coordinates that satisfy Σᵢ xᵢ = 1. We use Δ as the domain of a Legendre type function, which is the reason why we introduce Δ as an open set. Finally, write E_Δ for the restriction of the Shannon entropy to the standard simplex. This setting is important because each x ∈ Δ can be interpreted as a probability distribution on d disjoint events. Correspondingly, −E_Δ(x) is the expected efficiency to optimally encode a sample from this distribution. Finally, the relative entropy from x to y is the expected loss in coding efficiency if we use the code optimized for y to encode a sample from x. Projecting the gradient of the unrestricted Shannon entropy into the hyperplane of the simplex passing through the origin, we obtain the gradient of the restriction:

∇E_Δ(x)ᵢ = ln xᵢ − (1/d) Σⱼ ln xⱼ.

Using (4) and (5), we compute the conjugate of E_Δ, which we state in terms of the barycentric coordinates parametrizing Δ. Specifically, we obtain

∇E_Δ*(u)ᵢ = e^{uᵢ} / Σⱼ e^{uⱼ}   and   E_Δ*(u) = 1 + ln Σᵢ e^{uᵢ},   (13)

in which the uᵢ are the coordinates in conjugate space. Indeed, it is not difficult to verify (13) using Σᵢ xᵢ = 1 for points in the standard simplex.
Antonelli isometry. A Bregman divergence gives rise to a path metric in which length is measured by integrating the square root of the divergence. As explained in [6], any divergence that decomposes into a term per coordinate implies an isometry between this path metric and the Euclidean metric. By (9), the relative entropy is an example of such a divergence, and the corresponding path metric is known as the Fisher metric, which plays an important role in statistics and information geometry [3]. Instead of formalizing the recipe for constructing the Fisher metric from the relative entropy, we note that the mapping j: ℝ₊^d → ℝ₊^d defined by j(x) = 2(√x₁, √x₂, …, √x_d) is an isometry with Euclidean space, as first observed by Antonelli [33]. By virtue of being an isometry, the distance between points x, y ∈ ℝ₊^d under the Fisher metric satisfies dist_Fisher(x, y) = ‖j(x) − j(y)‖. The path of this length from x to y is the preimage of the line segment from j(x) to j(y), which is generally not straight.
Of special interest is the Fisher metric restricted to the standard (d−1)-simplex, denoted Δ. The mentioned isometry maps Δ to j(Δ), which is the positive orthant of the sphere with radius 2 and center at the origin in ℝ^d. The shortest path between x, y ∈ Δ is thus the preimage of the great-circle arc that connects j(x) and j(y) on the sphere. Since this arc is generally longer than the straight line segment connecting j(x) and j(y) in ℝ^d, the distance between x and y under the Fisher metric restricted to Δ is generally larger than in the unrestricted case.
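Both Fisher distances are easy to evaluate through the isometry. The sketch below (helper names are our own) computes the unrestricted distance as a chord length and the restricted distance as a great-circle arc on the sphere of radius 2; since the chord never exceeds the arc, the restricted distance is never smaller:

```python
import numpy as np

def fisher_dist(x, y):
    """Unrestricted Fisher distance on the positive orthant: Euclidean
    distance between the images under j(x) = 2*sqrt(x)."""
    return np.linalg.norm(2 * np.sqrt(x) - 2 * np.sqrt(y))

def fisher_dist_simplex(x, y):
    """Fisher distance restricted to the standard simplex: length of the
    great-circle arc on the radius-2 sphere through j(x) and j(y)."""
    cos_angle = np.dot(np.sqrt(x), np.sqrt(y))   # = <j(x), j(y)> / 4
    return 2 * np.arccos(np.clip(cos_angle, -1.0, 1.0))

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
# The restricted distance (arc) is at least the unrestricted one (chord).
assert fisher_dist_simplex(x, y) >= fisher_dist(x, y)
```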
Alpha shapes and Wrap complexes. Two popular shape reconstruction methods based on Delaunay mosaics are the Alpha shapes introduced in [34] and the Wrap complexes first published in [29]. Both extend to generalized discrete Morse functions and therefore to Bregman–Delaunay mosaics and Bregman–Wrap complexes. While working with Bregman divergences, we can construct them using weighted Euclidean Delaunay mosaics, for which fast software is readily available. For brevity, standard definitions and properties are available in Appendix B. Letting D be a simplicial complex and f: D → ℝ a generalized discrete Morse function, the Alpha complex for h is the sublevel set,

Alpha(h) = f^{−1}(−∞, h],

and the Alpha shape is the underlying space of the Alpha complex. In contrast to the Alpha shape, the assumption that f be a generalized discrete Morse function is essential in the definition of the Wrap complex. Recall that every step of a generalized discrete Morse function is an interval of simplices in the Hasse diagram. We form the step graph, G(f), whose nodes are the steps and whose arcs connect step S to step T if there are simplices P ∈ S and Q ∈ T with an arc from P to Q in the Hasse diagram. By construction, f is strictly increasing along directed paths in the step graph, which implies that the graph is acyclic.
The lower set of a node S in G(f), denoted ↓S, is the set of nodes T for which there are directed paths from T to S. Similarly, we write ↓A for the lower set of a collection A of nodes; the corresponding collection of simplices is the union of the steps in ↓A. We are particularly interested in the set of singular intervals, and we recall that each such interval contains a critical simplex of f. We write Sing for the set of singular intervals, and Sing(h) for the subset whose simplices satisfy f ≤ h. The Wrap complex for h is the union of steps in the lower sets of the singular intervals with value at most h:

Wrap(h) = ⋃ ↓Sing(h).
There are alternative constructions of the Wrap complex. Starting with the Alpha complex for h, we obtain the Wrap complex for the same value by collapsing all non-singular intervals that can be collapsed. The order of the collapses is not important, as all orders produce the same result, namely the Wrap complex for h. Symmetrically, we may start with the critical simplices of value at most h and add the minimal collection of non-singular intervals needed to obtain a simplicial complex. The minimal collection is unique and so is the result, which is again the Wrap complex for h. A proof of the equivalence of these three definitions of the Wrap complex is given in Appendix C.
3. Mosaics and Algorithms
In this section, we review Bregman–Delaunay and Fisher–Delaunay mosaics as well as their scale-dependent subcomplexes. All mosaics are constructed using software for weighted Delaunay mosaics in Euclidean geometry; all subcomplexes are computed by convex optimization, adapting the method from [12]. We begin with the mosaics in Bregman geometry.
Bregman–Delaunay mosaics. Let Ω ⊆ ℝ^d be open and convex, consider a Legendre type function F: Ω → ℝ, and let U ⊆ Ω be locally finite. Following [12,13], we define the Bregman–Voronoi domain of a point u ∈ U, denoted dom(u), as the set of points x ∈ Ω that satisfy D_F(u‖x) ≤ D_F(v‖x) for all v ∈ U. The Bregman–Voronoi tessellation is the collection of such domains, and the Bregman–Delaunay mosaic records all non-empty common intersections:

Del_F(U) = { Q ⊆ U : ⋂_{u ∈ Q} dom(u) ≠ ∅ },
and we note that the mosaic is isomorphic to the nerve [35,36] of the tessellation. To develop geometric intuition, we observe that the tessellation can be obtained by growing primal Bregman balls with centers at the points u ∈ U. When two such balls meet, they freeze where they touch but keep growing everywhere else. Eventually, each ball covers exactly the corresponding domain. Since the primal balls are not necessarily convex, it is not surprising that the faces shared by the domains are not necessarily straight. Nevertheless, the Delaunay mosaic has a natural straight-line embedding, as all its cells are vertical projections of the lower faces of the convex hull of the lifted points (u, F(u)), u ∈ U. To see this, we note that each cell of the mosaic corresponds to a dual Bregman ball whose boundary passes through the vertices of the cell, and this ball is the vertical projection of the part of the graph of F on or below the graph of an affine function.
Construction. To construct the mosaic, we assume that U is in general position, by which we mean that Conditions I and II from Appendix D are satisfied after transforming U to a set X of weighted points such that Del_F(U) is a subcomplex of the weighted Delaunay mosaic of X. Lifting the points from ℝ^d to ℝ^{d+1} and projecting the lower boundary of the convex hull back to ℝ^d, we obtain the mosaic. We remind the reader that relevant background information can be found in Appendix B, and we write ‖u‖ for the Euclidean norm of a point u. We formalize this method while stating all steps in terms of weighted points in d dimensions:
- Step 1:
Let X be the set of weighted points (u, w_u), u ∈ U, with weights w_u = ‖u‖² − 2F(u).
- Step 2:
Compute the weighted Delaunay mosaic of X in Euclidean geometry, denoted Del(X).
- Step 3:
Select Del_F(U) as the collection of simplices in Del(X) whose corresponding weighted Voronoi cells have a non-empty intersection with Ω.
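The reduction behind these steps rests on the standard lifting construction: the weighted Delaunay mosaic is the vertical projection of the lower convex hull of the points lifted to height ‖p‖² − w. The following sketch (assuming `scipy` is available; a minimal illustration, not the production pipeline) implements this directly:

```python
import numpy as np
from scipy.spatial import ConvexHull

def weighted_delaunay(points, weights):
    """Simplices of the weighted Delaunay mosaic, obtained as the lower
    convex hull of the points lifted to (p, |p|^2 - w)."""
    points = np.asarray(points, float)
    heights = np.sum(points ** 2, axis=1) - np.asarray(weights, float)
    lifted = np.hstack([points, heights[:, None]])
    hull = ConvexHull(lifted)
    d = points.shape[1]
    # keep facets whose outward normal points downward in the last coordinate
    lower = hull.equations[:, d] < 0
    return [tuple(sorted(s)) for s in hull.simplices[lower]]

# With zero weights, this is the ordinary Delaunay mosaic: four generic
# points in the plane yield two triangles.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.1]])
tris = weighted_delaunay(pts, np.zeros(4))
assert len(tris) == 2
```

Step 3, the culling of simplices whose weighted Voronoi cells miss the domain, would follow as a post-processing pass over the returned simplices.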
Indeed, the weighted Delaunay mosaic computed in Step 2 may contain simplices that do not belong to the Bregman–Delaunay mosaic of F. To implement Step 3, we note that Del_F(U) is dual to the Bregman–Voronoi tessellation of U, which is isomorphic to the weighted Voronoi tessellation of X restricted to Ω. This restricted tessellation has convex polyhedral cells and is readily available as the dual of Del(X). Writing x_u ∈ X for the weighted point with location u ∈ U, and vor(x_u) for the weighted Voronoi cell that corresponds to x_u, we have

Y ∈ Del_F(U)   iff   ⋂_{u ∈ Y} vor(x_u) ∩ Ω ≠ ∅   (19)

for every simplex Y ∈ Del(X).
Instead of computing all these intersections, we can collapse Del(X) to the desired subcomplex and thus save time by looking only at a subset of the mosaic. We explain how the simplices can be organized to facilitate such a collapse. Recalling that Ω is open and convex, we introduce the signed distance function, σ: ℝ^d → ℝ, which maps every point a ∈ ℝ^d to plus or minus the radius r such that the sphere with center a and radius r touches but does not cross the boundary of Ω. Finally, σ(a) > 0 if a ∈ Ω and σ(a) ≤ 0 if a ∉ Ω. Note that Ω = σ^{−1}(0, ∞) and that Ω_t = σ^{−1}(t, ∞) is open and convex for every t. Now construct g: Del(X) → ℝ by mapping Y to the maximum t for which the intersection of the corresponding weighted Voronoi cells with the closure of Ω_t is non-empty. By (19), we obtain Del_F(U) by removing all simplices Y with g(Y) ≤ 0. The crucial observation is that for X in general position, g is a generalized discrete Morse function with a single critical vertex. To see this, we observe that the weighted Voronoi tessellation decomposes Ω_t into convex domains for every value t, which by the Nerve Theorem [22] implies that g^{−1}[t, ∞) is contractible. Removing the simplices in the order of increasing values of g thus translates into a sequence of collapses that preserve the homotopy type of the mosaic.
Rise functions. To introduce scale into the construction of Bregman–Delaunay mosaics, we generalize the radius function from Euclidean geometry to Bregman geometries, changing the name because size is more conveniently measured by the height difference in the (d+1)-st coordinate direction as opposed to the radius in ℝ^d. Let u* = ∇F(u) and A_u be the point and affine map that correspond to u ∈ U, and let env be the upper envelope of the A_u, u ∈ U. We introduce the rise function, ϱ: Del_F(U) → ℝ, which maps each simplex, Q, to the minimum difference between env and F* at points in the conjugate Voronoi cell, vor*(Q):

ϱ(Q) = inf { env(y) − F*(y) : y ∈ vor*(Q) }.   (20)

It is the infimum amount we have to lower the graph of env until it intersects the graph of F* at a point vertically above the Voronoi cell in conjugate space. Without going to the conjugate, we can interpret ϱ(Q) in terms of (primal) Voronoi domains and cones of light cast from the points of U onto the graph, which we raise until the cones clipped to within their Voronoi domains have a point in common. This interpretation motivates the name of the function. Comparing (20) with (A8), we see that the two agree in the Euclidean case, in which taking the infimum is the same as taking the minimum.
For every h ∈ ℝ, we have a sublevel set, ϱ^{−1}(−∞, h], which we refer to as the Bregman–Alpha complex of U and F for size h. For h < 0, this complex is empty; for h = 0, it is a set of vertices, namely the points in U; and for sufficiently large positive h, this complex is the entire mosaic.
Information-theoretic interpretations. We offer a brief information-theoretic interpretation for many of the geometric objects mentioned above in the case of the relative entropy.
Let us start with a primal relative entropy ball of radius r around a finite probability distribution c. It contains all distributions that can be used to approximate c while incurring an efficiency loss of at most r bits. With this, it is easy to see that the Voronoi and Delaunay constructions have intuitive information-theoretic interpretations. For simplicity, let us consider the Čech rise (radius) function, which arises from the first point of a non-empty intersection between a number of primal balls [35,36]. This radius is therefore the smallest number of bits that needs to be incurred when approximating the centers of these balls with a single distribution. This value can be interpreted as an information-theoretic quantification of the higher-order interactions among these distributions. Of course, the intersection point is exactly the unique best choice for such a distribution.
The remaining objects can be interpreted in a similar way, which we leave to the reader. We stress that the birth and death values in the resulting persistence diagrams also express the loss of coding efficiency in bits—and that the diagrams are identical in the Delaunay and Čech cases.
Computation. We compute the rise function following the intuition based on primal Voronoi domains explained below (20). Equivalently, the rise of a simplex Q is the minimum amount h we have to raise the graph of F so that it has a supporting hyperplane that passes through all raised points (u, F(u) + h), with u ∈ Q, while all other raised points (v, F(v) + h), with v ∈ U \ Q, lie on or above the hyperplane.
To turn this intuition into an algorithm, we consider the affine hull of Q and write A for the affine function that satisfies A(u) = F(u) for all u ∈ Q. Let H measure the difference: H = F − A. Since F is of Legendre type, so is H. We are interested in the infimum of H, which either occurs at a point in Ω or in the limit of a divergent sequence. We therefore introduce a numerical routine that returns both the infimum and the point where it occurs:
- 1
InfSize(F, Q):
- 2
z = argmin_{x ∈ Ω} H(x);
- 3
return (−H(z), z).
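A numerical stand-in for such a routine can be sketched as follows (under our own conventions: F is a callable, Q is given by d + 1 affinely independent vertices so that the affine function A is determined by interpolation, and the minimization uses `scipy.optimize.minimize` rather than a closed form; lower-dimensional simplices would require restricting to the affine hull first):

```python
import numpy as np
from scipy.optimize import minimize

def inf_size(F, Q):
    """Minimize H = F - A, where A is the affine function that agrees
    with F on the (affinely independent) vertices of the simplex Q."""
    Q = np.asarray(Q, float)
    d = Q.shape[1]
    M = np.hstack([Q, np.ones((len(Q), 1))])          # rows (v, 1)
    coef = np.linalg.solve(M, np.array([F(v) for v in Q]))
    A = lambda x: np.dot(coef[:d], x) + coef[d]
    H = lambda x: F(x) - A(x)
    res = minimize(H, Q.mean(axis=0))
    return -res.fun, res.x                            # (size, minimizer)

# Sanity check with F(x) = |x|^2 / 2 and the 1-simplex {0, 2} in 1D:
# A interpolates F at 0 and 2, H is minimized at the midpoint 1.
F = lambda x: 0.5 * np.dot(x, x)
size, z = inf_size(F, np.array([[0.0], [2.0]]))
```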
Note that the dual Bregman ball centered at z of the returned size contains Q in its boundary, and it may or may not contain points of U \ Q in its interior. If it does not, then ϱ(Q) is this size; otherwise, ϱ(Q) is the minimum function value of the proper cofaces of Q. To express this more formally, we write Del_p for the collection of p-simplices in Del_F(U). Since Q gets its value either directly or from a coface, it is opportune to compute the rise function in the order of decreasing dimension:
- 1
for p = d downto 0 do
- 2
forall p-simplices Q ∈ Del_p do
- 3
(s, z) = InfSize(F, Q);
- 4
if the dual ball centered at z of size s contains no point of U \ Q
- 5
then ϱ(Q) = s
- 6
else ϱ(Q) = min { ϱ(P) : P a proper coface of Q }.

Note that this algorithm assigns a value to every simplex in Del_F(U). Indeed, the simplices of the weighted Delaunay mosaic that are not in Del_F(U) have been culled in Step 3, as explained above.
Fisher metric. In addition to the Bregman divergences, we consider Delaunay mosaics under the Fisher metric. To construct them, we recall that the mapping j defined by j(x) = 2(√x₁, √x₂, …, √x_d) is an isometry between the Fisher metric and the Euclidean metric. This suggests the following algorithm:
- Step 1:
Compute the Delaunay mosaic of j(U) in Euclidean space.
- Step 2:
Remove the simplices from this mosaic whose dual Voronoi cells have an empty intersection with j(Ω).
- Step 3:
Draw the resulting complex by mapping each point j(u) to the original point u.

The rise function in Euclidean geometry maps every simplex Q to the squared radius of the smallest empty circumsphere of j(Q). By isometry, the preimage of this Euclidean sphere is the smallest empty circumsphere of Q under the Fisher metric, and the squared radius is the same. We thus obtain the rise function on the Fisher–Delaunay mosaic by copying the values of the rise function on the Delaunay mosaic in Euclidean geometry.
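Step 1 of this algorithm can be sketched with standard software as follows (assuming `scipy`; the culling and drawing steps are indicated but omitted in this minimal illustration):

```python
import numpy as np
from scipy.spatial import Delaunay

def fisher_delaunay(U):
    """Delaunay mosaic of U under the (unrestricted) Fisher metric:
    compute the Euclidean mosaic of j(U), with j(x) = 2*sqrt(x)."""
    U = np.asarray(U, float)
    tri = Delaunay(2.0 * np.sqrt(U))   # Step 1: Euclidean mosaic of j(U)
    # Steps 2 and 3 (culling by dual Voronoi cells and drawing the complex
    # with vertices at the original points) would follow here.
    return tri.simplices

rng = np.random.default_rng(0)
U = rng.uniform(0.1, 1.0, size=(20, 2))
simplices = fisher_delaunay(U)
```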
The construction of the mosaic for the Fisher metric restricted to the standard simplex, Δ, is only slightly more complicated. As mentioned in Section 2, the isometry maps Δ to j(Δ), which is our notation for the positive orthant of the sphere with radius 2 centered at the origin in ℝ^d. The distance between points x, y ∈ Δ under the Fisher metric thus equals the Euclidean length of the great-circle arc connecting j(x) and j(y). The Delaunay mosaic of j(U) under the geodesic distance can be obtained by constructing the convex hull of j(U) ∪ {0} in ℝ^d and centrally projecting all faces not incident to 0 onto the sphere. As before, we cull simplices whose dual Voronoi cells have an empty intersection with the positive orthant of the sphere, and we draw the mosaic in Δ by mapping the vertices back to the original points. Furthermore, the rise functions of the mosaics in Δ and in j(Δ) are the same. Note, however, that the geodesic radius is determined by the arc-sine of half the Euclidean radius and is therefore slightly larger than the straight Euclidean radius in ℝ^d.
4. Computational Experiments
We illustrate the Bregman–Alpha and Bregman–Wrap complexes while comparing them to the conjugate, the Fisher, and the Euclidean constructions.
Example in positive quadrant. Let X be a set of 1000 points uniformly distributed according to the Fisher metric in the unit square. To sample X, we use the isometry, j, between the Fisher and the Euclidean metric mentioned in Section 2. Specifically, we sample 1000 points uniformly at random according to the Euclidean metric in the square with side length 2, and we map each point with coordinates (y₁, y₂) to (y₁²/4, y₂²/4), which is again a point in the unit square. To compute the Delaunay mosaic in Fisher geometry, we construct the (Euclidean) Delaunay mosaic of j(X) and draw this mosaic with the vertices at the points in X. Recall, however, that the domain is the open positive quadrant and not ℝ². A simplex whose corresponding Voronoi cell has an empty intersection with the positive quadrant thus does not belong to the mosaic, which is restricted to this domain. We identify these simplices and remove them from the Delaunay mosaic as described in Section 3.
Example in standard triangle. Motivated by our interest in information-theoretic applications, we repeat the above experiment within the standard triangle, Δ, which consists of all points with positive coordinates x₁, x₂, x₃ that satisfy x₁ + x₂ + x₃ = 1. Every point in Δ can be interpreted as a probability distribution on three disjoint events, which is indeed the most relevant scenario for the application of the relative entropy. To sample a set Y of 1000 points uniformly at random according to the Fisher metric in Δ, we use again j, now restricted to Δ, whose image is the positive orthant of the sphere with radius 2 centered at the origin of ℝ³. Sampling 1000 points uniformly at random according to the geodesic distance on the sphere, we take the convex hull of these points together with the origin, 0, and obtain the mosaic by mapping the vertices to the points in Δ. Before drawing the faces in Δ, we remove 0 and all incident faces, as well as all faces whose corresponding Voronoi cells have an empty intersection with j(Δ).
Recall that the squared Fisher metric matches the relative entropy in the infinitesimal regime, which explains the random appearance of the reconstruction in Figure 1, for which we set a common threshold. The reconstruction in Shannon geometry is similar to those in conjugate Shannon geometry in Figure 1a and in Fisher geometry in Figure 1b. To interpret the reconstruction in Figure 1d, we observe that the difference between the Shannon entropy and the squared Euclidean norm has a minimum at the center and no other critical points in the interior of the triangular domain. Accordingly, the reconstruction removes simplices near the corners and the three sides first. More drastically, the Bregman–Wrap complex for the same data removes all simplices except for a single critical edge near the center (see Figure 2d).
5. Quantification of Difference
We take a data-centric approach to quantifying the differences between the geometries. Given a common domain, Ω, and a finite set of points, U ⊆ Ω, we compare the corresponding mosaics and rise functions.
Mosaics. The Delaunay mosaics of U depend on the local shape of the balls defined by the metric or the divergence. Letting D and E be two Delaunay mosaics with vertex set U, we compare them by counting the common cells:

J(D, E) = 1 − |D ∩ E| / |D ∪ E|,

which is sometimes referred to as the Jaccard distance between the two sets. It is normalized so that J(D, E) = 0 iff D = E, and J(D, E) = 1 iff D and E share no cells at all. In our application, the two mosaics share all vertices, so J is necessarily strictly smaller than 1. To apply this measure to the constructions in Section 4, we write D₁, D₂, D₃, D₄ for the mosaics in Figure A1, and we write E₁, E₂, E₃, E₄ for the mosaics in Figure 3. All mosaics are different, except for one pair. The Jaccard distances are given in Table 1.
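The comparison of two mosaics by their common cells can be sketched as follows (a minimal illustration; cells are encoded as sorted vertex tuples, and the name `jaccard_distance` is ours):

```python
def jaccard_distance(D, E):
    """Jaccard distance between two mosaics given as sets of cells,
    each cell encoded as a sorted tuple of vertex indices."""
    D, E = set(D), set(E)
    return 1.0 - len(D & E) / len(D | E)

# Two toy mosaics sharing all vertices and one of two edges:
D = {(1,), (2,), (3,), (1, 2), (2, 3)}
E = {(1,), (2,), (3,), (1, 2), (1, 3)}
assert jaccard_distance(D, D) == 0.0
assert abs(jaccard_distance(D, E) - (1 - 4 / 6)) < 1e-12
```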
We see that the mosaics in conjugate Shannon geometry and in Fisher geometry are most similar to each other and less similar to the mosaic in Shannon geometry. The mosaic in Euclidean geometry is most dissimilar to the others (see Figure 3 and the related Figure A1 in Appendix A for visual confirmation).
Rise functions. Different rise functions on the same mosaic can be compared by counting the inversions, which are the pairs of cells whose orderings differ under the two functions. For each of the two data sets, we compare the rise function in Shannon geometry with the rise function in weighted Euclidean geometry on the same mosaic. In both cases, slightly fewer than half the pairs are inversions. This is plausible because one of the functions orders the cells along the diagonal, while the other preserves the random character of the point sample (see Figure A2d). Similarly, one of the functions orders the cells radially, from the center of the standard triangle to its periphery, while the other again preserves the random character of the sample (see Figure 4d).
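Counting the normalized number of inversions between two rise functions on the same mosaic can be sketched as follows (names are illustrative):

```python
from itertools import combinations

def normalized_inversions(f, g, cells):
    """Fraction of cell pairs ordered differently by the rise functions
    f and g, given as dictionaries from cells to values."""
    pairs = list(combinations(cells, 2))
    inv = sum(1 for P, Q in pairs
              if (f[P] - f[Q]) * (g[P] - g[Q]) < 0)
    return inv / len(pairs)

cells = ['a', 'b', 'c', 'd']
f = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
g = {'a': 4, 'b': 3, 'c': 2, 'd': 1}     # fully reversed order
assert normalized_inversions(f, g, cells) == 1.0
assert normalized_inversions(f, f, cells) == 0.0
```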
We can also compare the rise functions visually, by color-coding the 2-dimensional cells, and this works even if the mosaics are different. Specifically, we shade the triangles by mapping small to large rise function values to dark to light colors. In Figure A2a,b, this leads to randomly mixed dark and light triangles, while in Figure A2c,d, there are clear but opposing gradients parallel to the diagonal. Similarly, in Figure 4c, we see the rise function decreasing from the center to the boundary of the standard triangle, and in Figure 4d, we see it increasing from the center to the boundary. In addition, we compare general rise functions by computing their persistence diagrams (see [36]). Writing Dgm(f) for the persistence diagram of a function f, we quantify the difference between two functions, f and g, with the bottleneck distance between their diagrams, W_∞(Dgm(f), Dgm(g)). As explained in [36], the bottleneck distance is 1-Lipschitz, that is, W_∞(Dgm(f), Dgm(g)) ≤ ‖f − g‖_∞, but a small bottleneck distance does not necessarily imply a small difference between the functions. The bottleneck distances between the diagrams of the rise functions are given in Table 2.
In part, this comparison agrees with the Jaccard distances between the mosaics given in Table 1. The most obvious disagreements occur where quite different functions are defined on identical mosaics.
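For intuition, the bottleneck distance between two tiny persistence diagrams can be computed by brute force over all matchings, where each point may also be matched to its nearest point on the diagonal. This sketch is not the paper's implementation and is far too slow for real diagrams; it only illustrates the definition:

```python
from itertools import permutations

def bottleneck(A, B):
    """Bottleneck distance between two small persistence diagrams.

    A and B are lists of (birth, death) pairs.  Each diagram is padded
    with the diagonal projections of the other's points; a match between
    two diagonal points is free.  Factorial-time: tiny diagrams only.
    """
    proj = lambda p: ((p[0] + p[1]) / 2, (p[0] + p[1]) / 2)
    P = [(p, False) for p in A] + [(proj(q), True) for q in B]
    Q = [(q, False) for q in B] + [(proj(p), True) for p in A]

    def cost(u, v):
        (p, p_diag), (q, q_diag) = u, v
        if p_diag and q_diag:
            return 0.0  # diagonal-to-diagonal matches are free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(
        max(cost(u, v) for u, v in zip(P, perm))
        for perm in permutations(Q)
    )

A = [(0.0, 4.0), (1.0, 2.0)]
B = [(0.0, 3.5)]
print(bottleneck(A, B))  # 0.5: (0,4)->(0,3.5), (1,2) dropped to diagonal
```

Practical implementations (e.g. in the GUDHI or Hera libraries) use geometric matching algorithms instead of this factorial-time enumeration.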
6. Discussion
We extended two popular computational geometric-topological methods within the framework of discrete Morse functions and showed how this generalizes the methods to data in Bregman and Fisher geometries. Importantly, this allowed us to produce a robust implementation without the need to develop customized low-level software. We hope that this result will provide extra motivation for the development of geometric software for high-dimensional Euclidean space, as such software can be reused to handle data in information space. In particular, efficient computation of skeleta of weighted Delaunay mosaics would be relevant for topological analysis of high-dimensional data.
Turning the tables, we use these generalized methods to compare different geometries experimentally. Our experimental approach to studying geometries is a first step towards an intuitive understanding of these often counter-intuitive geometries, as well as the higher-order interactions occurring in them.
We reiterate one reason why understanding such geometries is important: they underpin many modern data science methods. Such understanding is, for example, a prerequisite for explaining the surprisingly good performance of many deep learning models, which output predictions as discrete probability distributions. More broadly, the space of discrete probability distributions is a basic object in mathematics, yet it is not well understood. We provide new tools for broadening the understanding of such spaces.
Perhaps the main observation coming from our experiments is the following. The pairwise dissimilarities in Kullback–Leibler geometry can differ greatly from the distances in Euclidean or Fisher geometry; indeed, on the standard simplex, the former can approach infinity while the latter are bounded. Still, our experiments show that the geometric-topological structures, at least in low dimensions, do not typically show large discrepancies between the geometries. Is this also true in higher dimensions? How does this generalize to other Bregman divergences and the corresponding generalized Fisher distances [6]? To what extent can the Fisher space replace the Shannon space in various applications?
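The contrast between the unbounded relative entropy and the bounded Fisher distance is easy to check numerically. The sketch below is only an illustration; it uses the known closed form 2·arccos Σ√(p_i q_i) for the Fisher–Rao distance on the simplex, i.e. the spherical distance between the square-root images of the distributions:

```python
from math import acos, log, sqrt

def kl(p, q):
    """Relative entropy (Kullback-Leibler divergence) in nats; assumes
    q is positive wherever p is."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fisher(p, q):
    """Fisher-Rao distance on the standard simplex: twice the arccos of
    the Bhattacharyya coefficient, i.e. the spherical distance between
    the square-root images of p and q (the Antonelli map)."""
    b = sum(sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * acos(min(1.0, b))  # clamp guards against rounding

# Push q toward a vertex of the standard triangle: the relative entropy
# grows without bound, while the Fisher distance stays below pi.
p = (1/3, 1/3, 1/3)
for eps in (1e-2, 1e-6, 1e-12):
    q = (1 - 2 * eps, eps, eps)
    print(f"eps={eps:.0e}  KL={kl(p, q):8.3f}  Fisher={fisher(p, q):.3f}")
```

Since the arccos of a nonnegative Bhattacharyya coefficient is at most π/2, the Fisher distance on the simplex never exceeds π, whereas the relative entropy diverges as q approaches the boundary.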
Several open questions about the experimental understanding of geometric spaces arise:
- How do these results generalize to other notions of entropy, such as the Burg entropy [37]? The resulting Bregman Itakura–Saito divergence [38] is used to compare speech samples, but little is known about the induced geometry and how it compares with other geometries.
- Is there a deeper reason for the similarity of the studied geometries? For example, are there Pinsker-type inequalities [39,40] between the pairwise dissimilarities (as well as other Bregman divergences)?
- Can the sensitivity of Delaunay mosaics to the dissimilarity be quantified probabilistically, for example as the expected Jaccard distance for random point processes?
- Persistence has been used before to compare metric spaces [41], and it would be interesting to know whether there are deeper connections to our work.
- Can the Fisher metric be extended beyond the positive part of the sphere? What is the geometry of the preimage of the Antonelli map in this case?
We finish with a technical question concerning the Delaunay mosaics in Fisher geometry: is the drawing we obtain by mapping the vertices to the corresponding points and connecting them with straight edges, flat triangles, etc., necessarily a geometric realization of the mosaic?