Next Article in Journal
Contrast Information Dynamics: A Novel Information Measure for Cognitive Modelling
Next Article in Special Issue
Singular-Value-Decomposition-Based Matrix Surgery
Previous Article in Journal
Kramers–Wannier Duality and Random-Bond Ising Model
Previous Article in Special Issue
Patterns in Temporal Networks with Higher-Order Egocentric Structures
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Understanding Higher-Order Interactions in Information Space

by
Herbert Edelsbrunner
1,
Katharina Ölsböck
1 and
Hubert Wagner
2,*
1
ISTA (Institute of Science and Technology Austria), 3400 Klosterneuburg, Austria
2
Department of Mathematics, University of Florida, Gainesville, FL 32611, USA
*
Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 637; https://doi.org/10.3390/e26080637
Submission received: 19 June 2024 / Revised: 19 July 2024 / Accepted: 20 July 2024 / Published: 27 July 2024

Abstract

:
Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback–Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.

1. Introduction

The motivation for the work reported in this paper is the deeper understanding of the effect of the ambient metric on the higher-order interactions arising in data, as well as its multi-scale geometric–topological properties. The first can be viewed via the Čech or Alpha complex, the second through their persistent homology. Specifically, we generalize geometric and topological data analysis methods from Euclidean geometry to Bregman geometries in which dissimilarity is measured with divergences that are generally non-symmetric and violate the triangle inequality. By necessity, these methods are sensitive to the dissimilarity defining the ambient geometry, and we exploit this sensitivity to quantify the difference between geometries.
As example geometries, we emphasize those related to information-theoretic concepts, such as the Shannon geometry and the Fisher geometry. What we call the Shannon geometry is induced by the relative entropy (Kullback–Leibler divergence) [1], which is based on Shannon’s entropy [2]. The Fisher geometry is induced by the Fisher distance (more technically the Fisher–Rao metric [3]). It was introduced by Rao [4], building on Fisher’s seminal work [5] in statistics.
These are examples of what we call information spaces  [6]. Such spaces often exhibit unfamiliar behavior, different from the Euclidean space we are used to. Understanding such spaces is however important, as—plainly speaking—this is where many types of data live [7].
Importantly, these geometries are commonly used in data science. For example, relative entropy is a standard loss function used in dimensionality reduction [8,9] and deep learning  [7,10]. (This connection may not be immediately obvious. Many models aim to explicitly minimize what is called cross-entropy. It has the same minimizer as the relative entropy, which can additionally be viewed as a distance, and therefore gives rise to a geometry). The Fisher geometry starts to be considered [11] as an alternative.
We therefore deem it important to shed light on the differences and commonalities between these geometries. Some pertinent questions are as follows: Is the simpler Fisher geometry a good approximation of the Shannon geometry? Can we see a significant difference between the Euclidean geometry and the non-Euclidean ones, as predicted by the discrepancy in their practical performances?
In the current work, we are particularly interested in the algorithms that underpin the data analysis methods, especially the topological ones. While the Fisher geometry can be handled with Euclidean tools [6], the Shannon geometry used to require customized ones [12]. We show that the Shannon geometry can also benefit from existing robust tools, although in this case the application is less direct. We also hope that this development opens new alleys for topological data analysis in information spaces.
Prior work and results. The research reported in this paper merges several lines of work. One is the study of Bregman divergences from the computational geometry point of view initiated in [13] and the extension of topological data analysis to Bregman and Fisher geometries started in [6,12]. Another is the study of higher-order interactions in high-dimensional probabilistic feature spaces [14,15], which generally lack information-theoretical interpretations.
Pivotal work at the intersection of computational geometry and Bregman geometry includes the extension of k-means clustering [16], proximity search data structures [17,18], and Voronoi tessellations [13] to the Bregman setting.
On a practical note, we mention a research direction focusing on detecting, loosely speaking, the holes in knowledge [14,15,19,20] understood as a collection of text documents. In the simplest case, each text document is represented as a discrete probability distribution (by counting characteristic keywords [21]). Interestingly, higher-order interactions among documents are necessary to capture the multi-scale topology of these data. Indeed, pairwise, triplewise, etc., intersections between balls (with respect to the appropriate metric or divergence) centered at the data points reveal the topological structure.
In this work, we will also use computational topology methods. One important topological tool is the nerve and the related Nerve Theorem. Intuitively, it allows one to capture the topology (more precisely, the homotopy type) of a union of (a finite collection of) objects using a finite simplicial complex. The theorem is usually attributed to Borsuk [22] or Leray [23], and is stated more explicitly by Weil [24]. See [25] for a comprehensive, modern treatment of this topic.
A particularly important concept in our investigations is the Bregman–Delaunay mosaic, which we formally define as the straight-line dual of the not necessarily straight-line Bregman–Voronoi tessellation obtained by measuring distance with the Bregman divergence from a data point. This mosaic was already defined in [13], and we explain how it can be computed as a weighted Euclidean Delaunay mosaic using standard geometric software. In Euclidean space, the Alpha shapes can be defined as the sublevel sets of the radius function on the Delaunay mosaic, which is a generalized discrete Morse function in the sense of Forman [26] and Freij [27]. The lower sets of the critical simplices of this function constitute the Wrap complex [28], which was introduced as a shape reconstruction tool in [29]. We extend this framework by introducing the rise function on a Bregman–Delaunay mosaic, which provides a convenient measure of the size of a Bregman ball. With these notions, we construct the shape of data in different geometries, and we use them to quantify the difference between the geometries.
We have implemented all the algorithms and used the resulting software to run experiments comparing the Euclidean, Shannon, and Fisher geometries for synthetic data. We find that the Delaunay mosaics and their Alpha and Wrap complexes in these geometries show some, occasionally subtle, differences, which we quantify.
Outline. Section 2 provides the necessary background from discrete geometry and computational topology. Section 3 gives the details needed to compute Delaunay mosaics and their Alpha and Wrap complexes in Bregman and Fisher geometries using software for weighted Delaunay mosaics in Euclidean geometry. Section 4 presents computational experiments, and Section 5 discusses the quantification of the difference between Bregman and other geometries. Section 6 concludes the paper. In Appendix A, Appendix B, Appendix C and Appendix D, we provide supplementary experiments, results, and background information.

2. Background

We need some background on Bregman divergences, Delaunay mosaics, and discrete Morse functions. Indeed, this paper combines these concepts to obtain new insights into Bregman–Delaunay mosaics and their scale-dependent subcomplexes.
Bregman divergence. Given a suitable convex function on a convex domain, the best affine approximation at a point defines a dissimilarity measure on the domain (see [30]). We follow [31] in the details of this construction, requiring a technical third condition that guarantees a conjugate function of the same kind. Let Ω R d be an open and convex domain. A function F : Ω R is of Legendre type [32] if
(i)
F is differentiable;
(ii)
F is strictly convex;
(iii)
F diverges whenever we approach the boundary of Ω .
If the boundary of the domain is empty, which is the case for Ω = R d , then Condition (iii) is void. In other words, F ( x ) does not necessarily diverge when x . Given points x , y Ω , the Bregman divergence from x to y associated with F is the difference between F and the best affine approximation of F at y, both evaluated at x:
D F ( x y ) = F ( x ) F ( y ) + F ( y ) , x y ,
where . , . denotes the standard dot product. Note that D F ( x y ) 0 , with equality iff x = y , resembling a metric. However, the other two axioms of a metric do not hold: the divergence is not necessarily symmetric, and it violates the triangle inequality in all non-trivial cases. In spite of these shortcomings, Bregman divergences are useful as measures of dissimilarity and are sufficient to define a geometry.
For a given h 0 , the primal ball with center x contains all points y such that the divergence from x to y is at most h, and the dual ball contains all points y such that the divergence from y to x is at most h:
B F ( x , h ) = { y Ω D F ( x y ) h } ,
B F * ( x , h ) = { y Ω D F ( y x ) h } .
The geometric intuition for (2) is to cast light onto the graph of F from a point vertically above x R d in R d + 1 and at distance h below the graph of F: the primal ball is the vertical projection of the lit up part of the graph onto R d . This ball is not necessarily convex. The geometric intuition for (3) is to intersect the graph of F with the tangent hyperplane at x shifted vertically upward by a distance h: the dual ball is the vertical projection of the part of the graph on or below this shifted hyperplane. This ball is necessarily convex.
The conjugate of F can be constructed with elementary geometric means. Specifically, we use the polarity transform that maps a point A = ( a , a d + 1 ) R d × R to the affine map A * : R d R defined by A * ( x ) = a , x a d + 1 . Similarly, it maps A * to A = ( A * ) * . The graph of F can be described as a set of points or a set of affine maps that touch the graph. The conjugate function, F * : Ω * R , is defined such that polarity maps the points of the graph of F to the tangent affine maps of the graph of F * , and it maps the tangent affine maps of the graph of F to the points of the graph of F * . Since A and A * switch position with gradient, so do F and F * . More specifically, Ω * = ϕ ( Ω ) and F * : Ω * R are given by
ϕ ( x ) = F ( x ) ,
F * ( ϕ ( x ) ) = F ( x ) , x F ( x ) ,
F * ( ϕ ( x ) ) = x .
The convexity of Ω and Conditions (i), (ii), (iii) imply that Ω * is convex and F * satisfies (i), (ii), (iii). In other words, the conjugate of a Legendre type function is again a Legendre type function. Importantly, the Bregman divergences associated with F and with F * are symmetric: D F ( x y ) = D F * ( ϕ ( y ) ϕ ( x ) ) . Hence, ϕ maps primal balls to dual balls and it maps dual balls to primal balls:
B F * * ( ϕ ( x ) , h ) = ϕ ( B F ( x , h ) ) ,
B F * ( ϕ ( x ) , h ) = ϕ ( B F * ( x , h ) ) .
Since all dual balls are convex, all primal balls are diffeomorphic images of convex sets. This implies that the common intersection of a collection of primal balls is either empty or contractible [12], so the Nerve Theorem applies.
Examples. An important example of a Legendre type function is ϖ : R d R defined by mapping x to half the square of its Euclidean norm: ϖ ( x ) = 1 2 x 2 . It is the only Legendre type function that is its own conjugate: ϖ = ϖ * . The symmetry between the divergences of a Legendre type function and its conjugate thus implies D ϖ ( x y ) = D ϖ ( y x ) and B ϖ ( x , h ) = B ϖ * ( x , h ) . Indeed, it is easy to see that the divergence is half the squared Euclidean distance, D ϖ ( x y ) = 1 2 x y 2 , which is of course symmetric. This particular Legendre type function provides an anchor point for comparison.
The example that justifies the title of this paper is the (negative) Shannon entropy [2]. We add an extra term, resulting in E : R + d R , defined as E ( x ) = i = 1 d [ x i ln x i x i ] . Being linear, the additional term does not affect the resulting divergence, but simplifies certain computations later.
This function is of Legendre type and fundamental to information theory [2]. Its divergence,
D E ( x y ) = i = 1 d [ x i ln x i x i ln y i x i + y i ] ,
is generally referred to as the relative entropy [10] or the Kullback–Leibler divergence [1] from x to y. We remark that the above derivation using the Bregman machinery results in a divergence which is valid on the entire positive orthant of R d , and agrees with the standard definition [1] on the standard simplex.
The gradient of the Shannon entropy at x is the vector E ( x ) with components ln x i for 1 i d . Using Equation (5), one easily computes that the conjugate of E maps this vector to i = 1 d x i . Hence E * : R d R is defined by mapping y R d to E * ( y ) = i = 1 d e y i .
A case of special interest is the restriction of the Shannon entropy to the standard simplex, which is a subset of the positive orthant. Writing x = ( x 1 , x 2 , , x d ) for a point of R + d , the standard ( d 1 ) -simplex, denoted Δ d 1 , consists of all points x that satisfy x 1 + x 2 + + x d = 1 . We use Δ d 1 as the domain of a Legendre type function, which is the reason why we introduce Δ d 1 as an open set. Finally, write E Δ : Δ d 1 R for the restriction of the Shannon entropy to the standard simplex. This setting is important because each x Δ d 1 can be interpreted as a probability distribution on d disjoint events. Correspondingly, E Δ ( x ) = E ( x ) is the expected efficiency to optimally encode a sample from this distribution. Finally, the relative entropy from x to y is the expected loss in coding efficiency if we use the code optimized for y to encode a sample from x. Projecting the gradient of the unrestricted Shannon entropy into the hyperplane of the simplex passing through the origin, we obtain the gradient of the restriction:
E Δ ( x ) = ln x 1 ln x 2 ln x d 1 d i = 1 d ln x i 1 1 1 .
Using (4) and (5), we compute the conjugate of E Δ , which we state in terms of the barycentric coordinates parametrizing R d 1 . Specifically, we obtain ϕ Δ ( x ) = E Δ ( x ) and
E Δ * ( ϕ Δ ( x ) ) = E Δ ( x ) , x E Δ ( x )
= 1 1 d i = 1 d ln x i
= 1 + ln i = 1 d e y i ,
in which the y i = ln x i 1 d i = 1 d ln x i are the coordinates in conjugate space. Indeed, it is not difficult to verify (13) using ln i = 1 d x i = 0 for points in the standard simplex.
Antonelli isometry. A Bregman divergence gives rise to a path metric in which length is measured by integrating the square root of the divergence. As explained in [6], any divergence that decomposes into a term per coordinate implies an isometry between this path metric and the Euclidean metric. By (9), the relative entropy is an example of such a divergence, and the corresponding path metric is known as the Fisher metric, which plays an important role in statistics and information geometry [3]. Instead of formalizing the recipe for constructing the Fisher metric from the relative entropy, we note that the mapping
ȷ ( x ) = ( 2 x 1 , 2 x 2 , , 2 x d )
is an isometry with Euclidean space, as first observed by Antonelli [33]. By virtue of being an isometry, the distance between points x , y R + d under the Fisher metric satisfies x y Fsh = ȷ ( x ) ȷ ( y ) . The path of this length from x to y is the preimage of the line segment from ȷ ( x ) to ȷ ( y ) , which is generally not straight.
Of special interest is the Fisher metric restricted to the standard ( d 1 ) -simplex, denoted Δ d 1 . The mentioned isometry maps Δ d 1 to ȷ ( Δ d 1 ) , which is the positive orthant of the sphere with radius 2 and center at the origin in R d . The shortest path between x , y Δ d 1 is thus the preimage of the great-circle arc that connects ȷ ( x ) and ȷ ( y ) on the sphere. Since this arc is generally longer than the straight line segment connecting ȷ ( x ) and ȷ ( y ) in R + d , the distance between x and y under the Fisher metric restricted to Δ d 1 is generally larger than in the unrestricted case.
Alpha shapes and Wrap complexes. Two popular shape reconstruction methods based on Delaunay mosaics are the Alpha shapes introduced in [34] and the Wrap complexes first published in [29]. Both extend to generalized discrete Morse functions and therefore to Bregman–Delaunay mosaics and Bregman–Wrap complexes. While working with Bregman divergences, we can construct them using weighted Euclidean Delaunay mosaics, for which there is readily available fast software. For brevity, standard definitions and properties are available in Appendix B. Letting D be a simplicial complex and f : D R a generalized discrete Morse function, the Alpha complex for h is the sublevel set,
Alpha h ( f ) = f 1 ( , h ] ,
and the Alpha shape is the underlying space of the Alpha complex. In contrast to the Alpha shape, the assumption that f be a generalized discrete Morse function is essential in the definition of the Wrap complex. Recall that every step of a generalized discrete Morse function is an interval of simplices in the Hasse diagram. We form the step graph, G = G f , whose nodes are the steps and whose arcs connect step φ to step ψ if there are simplices P φ and Q ψ with an arc from P to Q in the Hasse diagram. By construction, f is strictly increasing along directed paths in the step graph, which implies that the graph is acyclic.
The lower set of a node ν in G , denoted ν , is the set of nodes φ for which there are directed paths from φ to ν . Similarly, we write N = ν N ν for the lower set of a collection of nodes, and N for the corresponding collection of simplices. We are particularly interested in the set of singular intervals, and we recall that each such interval contains a critical simplex of f. We write Sg f for the set of singular intervals, and Sg f ( h ) Sg f for the subset whose simplices satisfy f ( Q ) h . The Wrap complex for h is the union of steps in the lower sets of the singular intervals with value at most h:
Wrap h ( f ) = Sg f ( h ) .
There are alternative constructions of the Wrap complex. Starting with the Alpha complex for h, we obtain the Wrap complex for the same value by collapsing all non-singular intervals that can be collapsed. The order of the collapses is not important as all orders produce the same result, namely Wrap h ( f ) . Symmetrically, we may start with the critical simplices of value at most h and add the minimal collection of non-singular intervals needed to obtain a simplicial complex. The minimal collection is unique and so is the result, Wrap h ( f ) . A proof of the equivalence of these three definitions of the Wrap complex is given in Appendix C.

3. Mosaics and Algorithms

In this section, we review Bregman–Delaunay and Fisher–Delaunay mosaics as well as their scale-dependent subcomplexes. All mosaics are constructed using software for weighted Delaunay mosaics in Euclidean geometry; all subcomplexes are computed by convex optimization, adapting the method from [12]. We begin with the mosaics in Bregman geometry.
Bregman–Delaunay mosaics. Let Ω R d be open and convex, consider a Legendre type function F : Ω R , and let U Ω be locally finite. Following [12,13], we define the Bregman–Voronoi domain of u U , denoted dom F ( u , Ω ) , as the points a Ω that satisfies D F ( u a ) D F ( v a ) for all v U . The Bregman–Voronoi tessellation is the collection of such domains, and the Bregman–Delaunay mosaic records all non-empty common intersections:
Vor F ( U , Ω ) = { dom F ( u , Ω ) u U } ,
Del F ( U , Ω ) = { Q U u Q dom F ( u , Ω ) } ,
and we note that the mosaic is isomorphic to the nerve [35,36] of the tessellation. To develop geometric intuition, we observe that Vor F ( U , Ω ) can be obtained by growing primal Bregman balls with centers at the points u U . When two such balls meet, they freeze where they touch but keep growing everywhere else. Eventually, each ball covers exactly the corresponding domain. Since the primal balls are not necessarily convex, it is not surprising that the faces shared by the domains are not necessarily straight. Nevertheless, the Delaunay mosaic has a natural straight-line embedding as all its cells are vertical projections of the lower faces of the convex hull of the points ( u , F ( u ) ) R d + 1 . To see this, we note that each cell of the mosaic corresponds to a dual Bregman ball whose boundary passes through the vertices of the cell, and this ball is the vertical projection of the part of the graph of F on or below the graph of an affine function.
Construction. To construct the mosaic, we assume that U Ω is in general position, by which we mean that Conditions I and II from Appendix D are satisfied after transforming U Ω to X R d × R such that Del F ( U , Ω ) is a subcomplex of the weighted Delaunay mosaic of X. Lifting the points from R d to R d + 1 and projecting the lower boundary of the convex hull back to R d , we obtain the mosaic. We remind the reader that relevant background information can be found in Appendix B, and define ϖ ( a ) = 1 2 a 2 . We formalize this method while stating all steps in terms of weighted points in d dimensions:
Step 1: 
Let X R d × R be the set of weighted points x ( u ) = ( u , 2 ϖ ( u ) 2 F ( u ) ) , with u U .
Step 2: 
Compute the weighted Delaunay mosaic of X in Euclidean geometry, denoted Del ( X ) .
Step 3: 
Select Del F ( U , Ω ) as the collection of simplices in Del ( X ) whose corresponding weighted Voronoi cells have a non-empty intersection with Ω * .
Indeed, the weighted Delaunay mosaic computed in Step 2 may contain simplices that do not belong to the Delaunay–Bregman mosaic of F. To implement Step 3, we note that Del F ( U , Ω ) is dual to Vor F ( U , Ω ) , which is isomorphic to Vor F * ( ϕ ( U ) , Ω * ) , and this Bregman–Voronoi tessellation is the weighted Voronoi tessellation of X restricted to Ω * . This tessellation has convex polyhedral cells and is readily available as the dual of Del ( X ) . Writing Y ( Q ) X for the points x ( u ) with u Q U and dom ( Y ) for the weighted Voronoi cell that corresponds to Y Del ( X ) , we have
Del F ( U , Ω ) = { Q U dom ( Y ( Q ) ) Ω * } .
Instead of computing all these intersections, we can collapse Del ( X ) to the desired subcomplex and thus save time by looking only at a subset of the mosaic. We explain how the simplices can be organized to facilitate such a collapse. Recalling that Ω * R d is open and convex, we introduce the signed distance function, θ : R d R , which maps every a R d to plus or minus r = r ( a ) such that the sphere with center a and radius r touches Ω * but does not cross the boundary. Finally, θ ( a ) = r ( a ) if a Ω * and θ ( a ) = r ( a ) if a Ω * . Note that Ω * = θ 1 [ , 0 ) and that Ω t * = θ 1 [ , t ) is open and convex for every t. Now construct ϑ : Del ( X ) R by mapping Y Del ( X ) to the maximum t R for which dom ( Y ) Ω t * = . By (19), we obtain Del F ( U , Ω ) by removing all simplices Y with ϑ ( Y ) 0 . The crucial observation is that for X in general position, ϑ is a generalized discrete Morse function with a single critical vertex. To see this, we observe that Vor ( X ) decomposes Ω t * into convex domains for every value t, which by the Nerve Theorem [22] implies that ϑ 1 ( , t ] is contractible. Removing the simplices in a sequence of decreasing values of ϑ thus translates into a sequence of collapses that preserve the homotopy type of the mosaic.
Rise functions. To introduce scale into the construction of Bregman–Delaunay mosaics, we generalize the radius function from Euclidean geometry to Bregman geometries, changing the name because size is more conveniently measured by height difference in the ( d + 1 ) -st coordinate direction as opposed to the radius in R d . Let u ˙ = ( u , F ( u ) ) and u ¯ : R d R be the point and affine map that correspond to u Ω , and let υ : R d R be the upper envelope of the u ¯ , u U . We introduce the rise function, ϱ F : Del F ( U , Ω ) R , which maps each simplex, Q, to the minimum difference between F * and υ at points in the conjugate Voronoi cell:
ϱ F ( Q ) = inf a ϕ ( dom ( Q , Ω ) ) [ F * ( a ) υ ( a ) ] .
It is the infimum amount we have to lower the graph of F * until it intersects the graph of υ at a point vertically above the Voronoi cell in conjugate space. Without going to the conjugate, we can interpret ϱ F ( Q ) in terms of (primal) Voronoi domains and cones of light cast from the u ˙ onto the graph, which we raise until the cones clipped to within their Voronoi domains have a point in common. This interpretation motivates the name of the function. Comparing (20) with (A8), we see that the two agree when F = ϖ and Ω = R d . Indeed, we obtain F * = ϖ and υ = ξ . Furthermore, ϕ ( dom ( Q , Ω ) ) = dom ( Q , Ω ) , and taking the infimum is the same as taking the minimum.
For every h R , we have a sublevel set, Del F , h ( U , Ω ) = ϱ F 1 ( , h ] , which we refer to as the Bregman–Alpha complex of U and F for size h. For h < 0 , this complex is empty, for h = 0 , it is a set of vertices namely the points in U, and for sufficiently large positive h, this complex is Del F ( U , Ω ) .
Information-theoretic interpretations. We offer a brief information-theoretical interpretation for many of the geometric objects mentioned above in the case of relative entropy.
Let us start with a primal relative entropy ball of radius r around a finite probability distribution c. It contains all distributions that can be used to approximate c, incurring the efficiency loss of at most r bits. With this, it is easy to see that the Voronoi and Delaunay constructions have intuitive information-theoretical interpretations. For simplicity, let us consider the Čech rise (radius) function, which arises from the first point of a non-empty intersection between a number of primal balls [35,36]. This radius is therefore the smallest number of bits that need to be incurred when approximating the centers of these balls with a single distribution. This value can be interpreted as an information-theoretical quantification of the high-order interactions among these distributions. Of course, the intersection point is exactly the unique best choice for such a distribution.
The remaining objects can be interpreted in a similar way, which we leave for the reader. We stress that the birth and death values in the resulting persistence diagrams also express the loss of coding efficiency in bits—and stress that the diagrams are identical for the Delaunay and Čech case.
Computation. We compute the rise function following the intuition based on primal Voronoi domains explained below (20). Equivalently, ϱ F ( Q ) is the minimum amount we have to raise the graph of F so it has a supporting hyperplane that passes through all points u ˙ , with u Q , while all other point u ˙ , with u U , lie on or above the hyperplane.
To turn this intuition into an algorithm, we consider the affine hull of Q and write v ¯ : aff Q R for the affine function that satisfies v ¯ ( u ) = F ( u ) for all u Q . Let  H : aff Q Ω R measure the difference: H ( a ) = F ( a ) v ¯ ( a ) . Since F is of Legendre type, so is H. We are interested in the infimum of H, which either occurs at a point in aff Q Ω or at the limit of a divergent sequence. We therefore introduce a numerical routine that returns both, the infimum and the point where it occurs:
1
InfSize  ( function F , simplex Q ) :
2
       ( a Q , h Q ) = ( arginf H , inf H ) ;
3
      return  ( a Q , h Q ) .
Note that the dual Bregman ball centered at a Q aff Q Ω of size h Q contains Q in its boundary, and it may or may not contain points of U \ Q in its interior. If it does not, then ϱ F ( Q ) = h Q , otherwise, ϱ F ( Q ) is the minimum function value of the proper cofaces of Q. To express this more formally, we write coFacets ( Q ) for the collection of simplices R Del ( X ) with Q R and # R = # Q + 1 . Since Q gets its value either directly or from a coface, it is opportune to compute the rise function in the order of decreasing dimension:
1
for  p = d  downto 0 do
2
      forall p-simplices Q Del F ( U , Ω )  do
3
             ( a Q , h Q ) = InfSize(F, Q);
4
            if  B F * ( a Q , h Q ) [ U \ Q ] =
5
                  then  ϱ F ( Q ) = h Q
6
                  else  ϱ F ( Q ) = min R coFacets ( Q ) ϱ F ( R ) .
Note that this algorithm assigns a value to every simplex in Del F ( U , Ω ) . Indeed, the simplices in Del ( X ) that are not in Del F ( U , Ω ) have been culled in Step 3, as explained above.
Fisher metric. In addition to the Bregman divergences, we consider Delaunay mosaics under the Fisher metric. To construct them, we recall that the mapping ȷ : R + d R + d defined by ȷ ( x ) = ( 2 x 1 , 2 x 2 , , 2 x d ) is an isometry between the Fisher metric and the Euclidean metric. This suggests the following algorithm:
Step 1: 
Compute the Delaunay mosaic of ȷ ( U ) in Euclidean space.
Step 2: 
Remove the simplices from Del ( ȷ ( U ) ) whose dual Voronoi cells have an empty intersection with R + d .
Step 3: 
Draw the resulting complex by mapping each point ȷ ( u ) to the original point u U R + d .
The rise function in Euclidean geometry maps every simplex ȷ ( Q ) Del ( ȷ ( U ) ) to the squared radius of the smallest empty circumsphere of ȷ ( Q ) . By isometry, the preimage of this Euclidean sphere is the smallest empty circumsphere of Q under the Fisher metric, and the squared radius is the same. We thus obtain the rise function on the Fisher–Delaunay mosaic by copying the values of the rise function on the Delaunay mosaic in Euclidean geometry.
The construction of the mosaic for the Fisher metric restricted to the standard simplex, Δ d 1 , is only slightly more complicated. As mentioned in Section 2, the isometry maps Δ d 1 to 2 S + d 1 , which is our notation for the positive orthant of the sphere with radius 2 centered at the origin in R d . The distance between points u , v Δ d 1 under the Fisher metric thus equals the Euclidean length of the great-circle arc connecting ȷ ( u ) , ȷ ( v ) 2 S + d 1 . The Delaunay mosaic of ȷ ( U ) under the geodesic distance can be obtained by constructing the convex hull of ȷ ( U ) { 0 } in R d and centrally projecting all faces not incident to 0 onto the sphere. As before, we cull simplices whose dual Voronoi cells have an empty intersection with the positive orthant of the sphere, and we draw the mosaic in Δ d 1 by mapping the vertices back to the original points. Furthermore, the rise functions of the mosaics in 2 S + d 1 and in Δ d 1 are the same. Note, however, that the geodesic radius is the arc-sine of and therefore slightly larger than the straight Euclidean radius in R d .

4. Computational Experiments

We illustrate the Bregman–Alpha and Bregman–Wrap complexes while comparing them to the conjugate, the Fisher, and the Euclidean constructions.
Example in positive quadrant. Let X be a set of 1000 points uniformly distributed according to the Fisher metric in ( 0 , 2 ] 2 R + 2 . To sample X, we use the isometry, ȷ : R + 2 R + 2 , between the Fisher and the Euclidean metric mentioned in Section 2. Specifically, we sample 1000 points uniformly at random according to the Euclidean metric in ( 0 , 2 ] 2 , and we map each point with coordinates x 1 , x 2 to ȷ 1 ( x 1 , x 2 ) = 1 2 ( x 1 2 , x 2 2 ) , which is again a point in ( 0 , 2 ] 2 . To compute the Delaunay mosaic in Fisher geometry, we construct the (Euclidean) Delaunay mosaic of ȷ ( X ) and draw this mosaic with the vertices at the points in X. Recall however that the domain is Ω = R + d and not R d . A simplex whose corresponding Voronoi cell has an empty intersection with the positive orthant thus does not belong to the mosaic, which is restricted to Ω . We identify these simplices and remove them from the Delaunay mosaic as described in Section 3.
Example in standard triangle. Motivated by our interest in information-theoretic applications, we repeat the above experiment within the standard triangle, Δ 2 , which consists of all points ( x 1 , x 2 , x 3 ) R + 3 that satisfy x 1 + x 2 + x 3 = 1 . Every point in Δ 2 can be interpreted as a probability distribution on three disjoint events, which is indeed the most relevant scenario for the application of the relative entropy. To sample a set Y of 1000 points uniformly at random according to the Fisher metric in Δ 2 , we use again j, now restricted to Δ 2 , whose image is the positive orthant of the sphere with radius 2 centered at the origin of R 3 . Sampling 1000 points uniformly at random according to the geodesic distance on the sphere, we take the convex hull of ȷ ( Y ) { 0 } and obtain the mosaic by mapping the vertices to the points in Y = ȷ 1 ( ȷ ( Y ) ) . Before drawing the faces in Δ 2 , we remove 0 and all incident faces, as well as all faces whose corresponding Voronoi cells have an empty intersection with  R + 2 .
Recall that the squared Fisher metric matches the relative entropy in the infinitesimal regime, which explains the random appearance of the reconstruction in Figure 1, for which we set the threshold to 0.0025 . The reconstruction in Shannon geometry is similar to those in conjugate Shannon geometry in Figure 1a and in Fisher geometry in Figure 1b. To interpret the reconstruction in Figure 1d, we observe that the difference between the Shannon entropy and the squared Euclidean norm has a minimum at the center and no other critical points in the interior of the triangular domain. Accordingly, the reconstruction removes simplices near the corners and the three sides first. More drastically, the Bregman–Wrap complex for the same data removes all simplices except for a single critical edge near the center (see Figure 2d).

5. Quantification of Difference

We take a data-centric approach to quantifying the differences between the geometries. Given a common domain, Ω , and a finite set of points, U Ω , we compare the corresponding mosaics and rise functions.
Mosaics. The Delaunay mosaics of U depend on the local shape of the balls defined by the metric or the divergence. Letting D and E be two Delaunay mosaics with vertex sets U, we compare them by counting the common cells:
J ( D , E ) = 1 # ( D E ) # D + # E # ( D E ) ,
which is sometimes referred to as the Jaccard distance between the two sets. It is normalized so that J = 0 iff D = E and J = 1 iff D and E share no cells at all. In our application, the two mosaics share all vertices, so J is necessarily strictly smaller than 1. To apply this measure to the constructions in Section 4, we write D 0 ,   D 1 ,   D 2 ,   D 3 ,   D 4 for the mosaics in Figure A1, and we write E 0 ,   E 1 ,   E 2 ,   E 3 ,   E 4 for the mosaics in Figure 3. All mosaics are different, except for D 0 = D 4 and E 0 = E 4 . The Jaccard distances are given in Table 1.
We see that the mosaics in conjugate Shannon geometry and in Fisher geometry are most similar to each other and less similar to the mosaic in Shannon geometry. The mosaic in Euclidean geometry is most dissimilar to the others (see Figure 3 and related Figure A1 in Appendix A for visual confirmation).
Rise functions. Different rise functions on the same mosaic can be compared by counting the inversions, which are the pairs of cells whose orderings are different under the two functions. Recall that D 0 = D 4 and E 0 = E 4 , let d 0 : D 0 R and e 0 :   E 0 R be the rise functions in Shannon geometry, and let d 4 :   D 4 R and e 4 :   E 4 R be the rise functions in weighted Euclidean geometry. The normalized number of inversions are
I ( d 0 , d 4 ) = 0.476 ,
I ( e 0 , e 4 ) = 0.467 .
In words, slightly fewer than half the pairs are inversions, both for d 0 , d 4 and for e 0 , e 4 . This is plausible because d 4 orders the cells along the diagonal while d 0 preserves the random character of the point sample (see Figure A2d). Similarly, e 4 orders the cells radially, from the center of the standard triangle to its periphery, while e 0 preserves again the random character of the sample (see Figure 4d).
We can compare the rise functions also visually, by color-coding the 2-dimensional cells, and this works even if the mosaics are different. Specifically, we shade the triangles by mapping small to large rise function values onto dark to light colors. In Figure A2a,b, this leads to randomly mixed dark and light triangles, while in Figure A2c,d, there are clear but opposing gradients parallel to the diagonal. Similarly, in Figure 4c, we see the rise function decreasing from the center to the boundary of the standard triangle, and in Figure 4d we see it increasing from the center to the boundary. In addition, we compare general rise functions by computing their persistence diagrams (see [36]). Writing Dgm ( d ) for the persistence diagram of function d, we quantify the difference with the bottleneck between the diagrams:
B ( d , e ) = W ( Dgm ( d ) , Dgm ( e ) ) .
As explained in [36], the bottleneck distance is 1-Lipschitz, that is: B ( d , e ) d e , but d e does not necessarily imply B ( d , e ) 0 . The bottleneck distances between the d i :   D i R and the e i :   E i R are given in Table 2.
In part, this comparison agrees with the Jaccard distances between the mosaics given in Table 1. The most obvious disagreements are for d 0 , d 4 and for e 0 , e 4 , in which quite different functions are defined on identical mosaics.

6. Discussion

We extended two popular computational geometric-topological methods within the framework of discrete Morse functions and showed how this generalizes the methods to data in Bregman and Fisher geometries. Importantly, this allowed us to produce a robust implementation without the need to develop customized low-level software. We hope that this result will provide extra motivation for the development of geometric software for high-dimensional Euclidean space, as such software can be reused to handle data in information space. In particular, efficient computation of skeleta of weighted Delaunay mosaics would be relevant for topological analysis of high-dimensional data.
Turning the table, we use these generalized methods to compare different geometries experimentally. Our experimental approach to studying geometries is a first step towards an intuitive understanding of these often counter-intuitive geometries, as well as the high-order interactions occurring in them.
We reiterate one reason why understanding such geometries is important: they underpin many modern data science methods. It is, for example, a prerequisite for explaining the surprisingly good performance of many deep learning models (that output predictions as discrete probability distributions). More broadly, the space of discrete probability distributions is a basic object in mathematics, but is not well understood. We provide new tools for broadening the understanding of such spaces.
Perhaps the main observation coming from our experiments is the following. As we know, the pairwise distance in the Kullback–Leibler geometry and the Euclidean or Fisher one can be very different. Indeed, on the standard simplex, the former can approach infinity while it is bounded in the latter. Still, our experiments show that the geometric-topological structures—at least in low dimensions—do not typically show large discrepancies between the geometries. Is this also true in higher dimensions? How does this generalize to other Bregman divergences and the corresponding generalized Fisher distances [6]? To what extent can the Fisher space replace the Shannon space in various applications?
Several open questions about experimental understanding of geometric spaces arise:
  • How do these results generalize to other notions of entropy, such as the Burg entropy [37]? The resulting Bregman Itakura–Saito divergence [38] is used to compare speech samples, but little is known about the resulting geometry and how it compares with other geometries.
  • Is there a deeper reason for the similarity of the studied geometries? For example: are there Pinsker-type inequalities [39,40] between the pairwise dissimilarities (as well as other Bregman divergences)?
  • Can the sensitivity of Delaunay mosaics to the dissimilarity be quantified probabilistically, as the expected Jaccard distance for random point processes?
  • Persistence has been used before to compare metric spaces [41], and it would be interesting to know whether there are deeper connections to our work.
  • Can the Fisher metric be extended beyond the positive part of the sphere? What is the geometry of the preimage of the Antonelli map in this case?
We finish with a technical question concerning the Delaunay mosaics in Fisher geometry: is the drawing we obtain by mapping the vertices to the corresponding points and connecting them with straight edges, flat triangles, etc., necessarily a geometric realization of the mosaic?

Author Contributions

Conceptualization, H.E., K.Ö. and H.W.; methodology, H.E., K.Ö. and H.W.; software, K.Ö. and H.W; writing, H.E, K.Ö. and H.W.; visualization, K.Ö.; supervision, H.E.; funding acquisition, H.E. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant no. 788183, from the Wittgenstein Prize, Austrian Science Fund (FWF), grant no. Z 342-N31, the DFG Collaborative Research Center TRR 109, ‘Discretization in Geometry and Dynamics’, Austrian Science Fund (FWF), grant no. I 02979-N35, the Austrian-Slovenian project on ‘Intrinsic Persistent Homology’, Austrian Science Fund (FWF), grant no. I 4245-N35 and the 2022 Google Research Scholar Program grant ‘Algorithms for Topological Analysis of Neural Networks’.

Data Availability Statement

The software and data used for this study is available at: https://git.ista.ac.at/katharina.oelsboeck/wrap_2_3-public/ (accessed on 19 July 2024).

Acknowledgments

We thank Anton Nikitenko for first observing that the Wrap complex can be characterized as stated in Claim (ii) of the Wrap Complex Lemma, and Ondrej Draganov for correcting a critical mistake in one of our formulas in Section 2.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Extra Visualizations

In this section, we provide visualizations and discussion for the positive orthant case. We recall that the form of the Kullback–Leibler divergence we use works also in this case (and not only on the standard simplex). Figure A3 top displays the Bregman–Alpha complex in Shannon geometry for threshold 0.004 . Infinitesimally, the relative entropy agrees with the squared Fisher metric, so the uniform distribution of the points translates into a fairly uniform arrangement of random holes in the complex. The closer we get to the left or the lower side of the square, the denser the points get and the more anisotropically aligned with the sides the edges and the triangles get.
For comparison, the bottom row in Figure A3 shows the Bregman–Alpha complex in conjugate Shannon geometry, in Fisher geometry, in Euclidean geometry, and in weighted Euclidean geometry. The primal and the dual balls behave similarly, which explains the similarity of the complexes in Figure A3 top and (a). It should however be mentioned that the underlying triangulation in Figure A3a occasionally folds, which is caused by moving the vertices from the conjugate points (for which we have a straight-line embedding) to the original points. Not surprisingly, there is also a striking similarity to the reconstruction in Fisher geometry Figure A3b. The Bregman–Alpha complex in Euclidean geometry Figure A3c is just the usual Alpha complex of the points. It clearly shows that the density decreases along the diagonal. The complex in Figure A3d mixes aspects of Shannon and Euclidean geometry. In particular, it reuses the mosaic in Figure A3 top and assigns weights to the points such that this triangulation is the weighted Delaunay mosaic of the weighted points in Euclidean geometry. The corresponding rise function reflects the difference between the Shannon entropy and the squared Euclidean norm. Indeed, the rise function increases along the diagonal, which explains why the reconstructed complex is almost the entire mosaic, with cells along the left and bottom sides of the square domain missing.
Figure A1. (Top) The Bregman–Delaunay mosaic in Shannon geometry for the same set of points as used in Figure A3 and Figure A4. (Bottom row) Four Delaunay mosaics whose triangles and edges are colored depending on whether or not they belong to the Shannon–Delaunay mosaic.
Figure A1. (Top) The Bregman–Delaunay mosaic in Shannon geometry for the same set of points as used in Figure A3 and Figure A4. (Bottom row) Four Delaunay mosaics whose triangles and edges are colored depending on whether or not they belong to the Shannon–Delaunay mosaic.
Entropy 26 00637 g0a1
Figure A2. (Top) The color-coded Bregman–Delaunay mosaic in Shannon geometry. The set of points is the same as in Figure A1. (Bottom row) The color-coded Delaunay mosaics for the same set X.
Figure A2. (Top) The color-coded Bregman–Delaunay mosaic in Shannon geometry. The set of points is the same as in Figure A1. (Bottom row) The color-coded Delaunay mosaics for the same set X.
Entropy 26 00637 g0a2
Figure A3. (Top) The Bregman–Alpha complex in Shannon geometry of a set X of 1000 points uniformly distributed according to the Fisher metric in ( 0 , 2 ] 2 and a threshold h = 0.004 . (Bottom row) The reconstructions in four different geometries.
Figure A3. (Top) The Bregman–Alpha complex in Shannon geometry of a set X of 1000 points uniformly distributed according to the Fisher metric in ( 0 , 2 ] 2 and a threshold h = 0.004 . (Bottom row) The reconstructions in four different geometries.
Entropy 26 00637 g0a3
Figure A4. (Top) The Bregman–Wrap complex in Shannon geometry of the same points and the same threshold as in Figure A3. (Bottom row) The reconstructions in four different geometries.
Figure A4. (Top) The Bregman–Wrap complex in Shannon geometry of the same points and the same threshold as in Figure A3. (Bottom row) The reconstructions in four different geometries.
Entropy 26 00637 g0a4
We see very similar reconstructions in Figure A4, which shows the Bregman–Wrap complexes for the same set of points and the same threshold. By construction, each Wrap complex is a homotopy equivalent subcomplex of the corresponding Alpha complex. The biggest difference occurs in weighted Euclidean geometry, in which we reuse the mosaic in Shannon geometry but filter with the rise function obtained from the squared Euclidean norm. The corresponding Bregman–Wrap complex consists of a single vertex near the upper right corner of the square domain (see Figure A4d). This reconstruction reflects the simple relation between the Shannon entropy and the halved squared Euclidean norm: ϖ ( x ) E ( x ) is monotonically increasing from left to right and from bottom to top. This translates into a discrete gradient that introduces a flow with a single critical cell, namely, the vertex near the upper right corner.

Appendix B. Background on Delaunay Mosaics and Related Topics

We recall standard definitions related to Delaunay mosaics, the corresponding liftings and projections, as well as discrete Morse theory.
Delaunay mosaics. In this paper, the ability to assign real weights to points is essential, so we go straight to the weighted generalizations of the Voronoi tessellation and the Delaunay mosaic. A weighted point is a pair x = ( pt ( x ) , wt ( x ) ) R d × R , in which pt ( x ) is its location and wt ( x ) is its weight. The power distance of a R d from x is π x ( a ) = pt ( x ) a 2 wt ( x ) . It is common to think of the weighted point as a ball with center pt ( x ) and squared radius wt ( x ) . With this interpretation, π x ( a ) is negative inside, zero on the boundary, and positive outside the ball. Given a locally finite set of weighted points, X R d × R , the (weighted) Voronoi domain of x X consists of all points a for which x minimizes the power distance, and the (weighted) Voronoi tessellation of X is the collection of such domains:
dom ( x ) = { a R d π x ( a ) π y ( a ) , y X } ,
Vor ( X ) = { dom ( x ) x X } .
A (weighted) Voronoi cell is the common intersection of Voronoi domains, and we write dom ( Q ) = x Q dom ( x ) . Note that the affine hull of dom ( Q ) contains a unique point, denoted a Q , that minimizes the power distance to the weighted points in Q. Indeed, a Q is at the intersection of the affine hull of dom ( Q ) and the affine hull of the locations pt ( x ) , x Q . Let # Q be the cardinality of Q. We are primarily interested in the generic case, when every non-empty Voronoi cell, dom ( Q ) , satisfies the following two general position conditions:
  • the dimension of dom ( Q ) is d + 1 # Q ,
  • a Q does not belong to the boundary of dom ( Q ) .
By Condition I, dom ( Q ) = whenever # Q > d + 1 . Condition I also implies that every non-empty Voronoi cell is the intersection of a unique collection of Voronoi domains.
Figure A5. The Voronoi tessellation restricted to the open rectangular region and its dual restricted Delaunay mosaic.
Figure A5. The Voronoi tessellation restricted to the open rectangular region and its dual restricted Delaunay mosaic.
Entropy 26 00637 g0a5
The (weighted) Delaunay mosaic is the collection of polytopes spanned by subsets of X that define non-empty Voronoi cells. It is convenient to identify such a subset, Q, with the polytope it spans, which is the convex hull of the locations of the weighted points in Q. In the assumed generic case, all polytopes are simplices, and the Delaunay mosaic is a simplicial complex geometrically realized in R d , which we denote as Del ( X ) . Most of the time, we restrict our attention to an open convex region, Ω R d , we assume X Ω × R , and we write dom ( Q , Ω ) = dom ( Q ) Ω . Correspondingly, the restricted Voronoi tessellation and the restricted Delaunay mosaic are
Vor ( X , Ω ) = { dom ( x , Ω ) x X } ,
Del ( X , Ω ) = { Q X dom ( Q , Ω ) }
(see Figure A5).
Lifting and projecting. The Voronoi tessellation and the Delaunay mosaic can both be constructed as the projection of the boundary complexes of convex polyhedra in R d + 1 . To explain this, recall that ϖ ( a ) = 1 2 a 2 and map every weighted point, x, to the point x ˙ R d + 1 and to the affine map x ¯ :   R d R defined by
x ˙ = ( pt ( x ) , ϖ ( pt ( x ) ) 1 2 wt ( x ) ) ,
x ¯ ( a ) = ϖ ( pt ( x ) ) + pt ( x ) , a pt ( x ) + 1 2 wt ( x )
= pt ( x ) , a 1 2 pt ( x ) 2 + 1 2 wt ( x ) .
The map is chosen so that the solution to ϖ ( a ) x ¯ ( a ) = 0 is the sphere with center pt ( x ) and squared radius wt ( x ) . The point is chosen so that a point on the graph of ϖ lies on or below the graph of x ¯ iff this point is visible from x ˙ , by which we mean that the entire line segment connecting x ˙ with this point lies below the graph of ϖ .
Let ξ :   R d R be the pointwise maximum of the affine maps, ξ ( a ) = max x X x ¯ ( a ) , and note that it is piecewise linear and convex. As observed already by Georges Voronoi [42], the vertical projection of its linear pieces gives the Voronoi tessellation of X in R d . To obtain a similar construction of the Delaunay mosaic, we take the convex hull of the points x ˙ R d + 1 . We call a hyperplane that touches the polytope without intersecting its interior a support plane, and the intersection of the polytope with a support plane a face of the polytope. For points in general position, all faces are simplices. A lower face is the intersection of the polytope with a non-vertical support plane such that the polytope lies above the hyperplane. In analogy to the relation observed by Voronoi, the vertical projection of the lower faces of the convex hull gives the Delaunay mosaic of X in R d .
The interpretations of the Voronoi tessellation and the Delaunay mosaic as projections of boundary complexes of convex polyhedra provide geometrically intuitive interpretations of a function that plays a crucial role in this paper. Recall that each simplex, Q Del ( X ) , corresponds to a Voronoi cell, dom ( Q ) . The radius function, or more precisely the half-squared radius, ϱ :   Del ( X ) R , maps Q to the minimum difference between ϖ and ξ at points in the Voronoi cell:
ϱ ( Q ) = min a dom ( Q ) [ ϖ ( a ) ξ ( a ) ] .
In words, ϱ ( Q ) is the amount we have to lower the graph of ϖ until it intersects the graph of ξ at a point vertically above dom ( Q ) . The function value is also the minimax difference between ϖ and any affine map that satisfies y ¯ ( pt ( x ) ) ϖ ( pt ( x ) ) 1 2 wt ( x ) for all x X and with equality for all x Q . Specifically, we minimize the maximum y ¯ ( a ) ϖ ( a ) , in which the maximization is over all a R d , and the minimization is over all affine maps, y ¯ :   R d R , that satisfy the conditions stated above.
Discrete Morse theory. Assuming a general position, the radius function on the Delaunay mosaic enjoys structural properties, which we now formalize. Let K be a simplicial complex and P , Q K two simplices. For a monotonic function, f :   K R , P Q implies f ( P ) f ( Q ) . The Hasse diagram of K is the directed graph whose nodes are the simplices and whose arcs are the codimension 1 face relations: every arc ends at a p-simplex and starts at a ( p 1 ) -dimensional face of this simplex. By construction, the values of a monotonic function are non-decreasing along directed paths in the Hasse diagram. A level set of f is a maximal collection of simplices with shared function value, f 1 ( r ) K , and we call a maximal connected subset of a level set a step. For simplices P R in K, call ψ = { Q K P Q R } an interval, P = lb ( ψ ) its lower bound, R = ub ( ψ ) its upper bound, and note that # ψ = 2 # R # P . According to an inessential modification of the original formulation by Forman [26], f is a discrete Morse function if every step is an interval of size 1 or 2. A slightly weaker condition was introduced by Freij [27], calling f a generalized discrete Morse function if every step is an interval. The corresponding partition of K into intervals is called the generalized discrete gradient of f.
The singleton intervals are special, which is expressed by calling the simplices they contain and the corresponding values critical. To motivate this terminology, consider two contiguous values, r < s , and the corresponding sublevel sets, K r = f 1 ( , r ] and K s = f 1 ( , s ] . By assumption, no simplex maps to a value strictly between r and s, which implies that the difference between the two complexes is the level set at s. This level set is a disjoint union of steps, and because f is generalized discrete Morse, a disjoint union of mutually separated intervals. When we add the simplices of such an interval to K r , then the homotopy type changes if the interval consists of a single, critical simplex, and it remains unchanged if the interval consists of two or more simplices. The operation of removing a non-singular interval is called a collapse. If all intervals in f 1 ( s ) are non-singular, then we write K s K r to express that K r can be obtained from K s by collapsing all intervals in the difference. More generally, if ( r , t ] contains no critical value of f, then K t K r (see Forman [26]).

Appendix C. Equivalence of Definitions

This appendix proves that the three definitions of the Wrap complex offered in Section 2 are indeed equivalent. Given a generalized discrete Morse function f :   D R , we recall that Wrap h ( f ) Alpha h ( f ) are the Wrap and the Alpha complex of f for h, and Sg f ( h ) is the collection of singular steps whose critical simplices have function value at most h.
Result 1 
(Wrap Complex Lemma). Let f :   D R be a generalized discrete Morse function on a simplicial complex. Then
(i) 
Wrap h ( f ) is the smallest complex K D that satisfies Alpha h ( f ) K , in which we restrict the collapses to intervals of f.
(ii) 
Wrap h ( f ) is the smallest subcomplex of D that contains Sg f ( h ) and is a union of intervals of f.
Proof. 
Consider two steps, φ and ψ , in the step graph G of f. If there is an arc from φ to ψ , then φ contains a proper face of a simplex in ψ . This implies that if M is a collection of steps such that K = M is a complex, then ψ M implies φ M . If both belong to M, then φ cannot be collapsed. On the other hand, if φ M and no successor of φ in G belongs to M, then we can collapse φ ; that is: K \ φ is a complex. To prepare the proofs of (i) and (ii), we let M be a collection of steps such that
  • K = M contains a critical simplex Q iff f ( Q ) h ;
  • K is a complex;
  • there is no step φ M such that K K \ φ .
First, we claim that the three properties specify M uniquely. To prove this claim, let φ 0 M be non-singular and let φ 0 , φ 1 , , φ k be maximal such that φ i M is a successor of φ i 1 in G for 1 i k . We note that k 1 because φ 0 cannot be collapsed, and φ k is singular because the sequence is maximal. To obtain a contradiction, we assume that N M is another collection of steps that satisfies Properties 1, 2, and 3. Suppose first that N contains a step μ 0 M , and consider a maximal sequence μ 0 , μ 1 , , μ such that μ j N is a successor of μ j 1 in G for 1 j . Since μ 0 M , the step is necessarily non-singular, which implies 1 and μ singular. But then there is a first step along this sequence, μ j , that belongs to M. Since there is an arc from μ j 1 to μ j and μ j 1 M , this contradicts that M is a complex. Suppose second that N contains no such step μ 0 , but M contains a step φ 0 N . By the symmetric argument, this implies that N is not a complex, again a contradiction. We conclude that the collection M that satisfies Properties 1, 2, 3 is unique.
Second, we claim that the unique complex that satisfies Properties 1, 2, and 3 is Wrap h ( f ) . By definition, the Wrap complex contains all critical simplices that satisfy f ( Q ) h . The value of Q is the maximum of any step in the lower set of its singular interval, which implies that Wrap h ( f ) contains no critical simplex with value larger than h and therefore satisfies Property 1. Property 2 is satisfied because all faces of a simplex in a step that are not in the step belong to predecessors of the step. Indeed, the directed path from a face to a simplex in the Hasse diagram maps to a possibly shorter directed path from the step of the face to the step of the simplex in G . To see that Property 3 is satisfied as well, we note that every non-singular step φ 0 Wrap h ( f ) has a directed path to a singular interval and can therefore not be collapsed. We conclude that Wrap h ( f ) is the only union of steps that satisfies Properties 1, 2, and 3.
To prove (i), we note that Alpha h ( f ) satisfies Properties 1 and 2, so it cannot satisfy Property 3 unless it is equal to Wrap h ( f ) . We can therefore collapse non-singular intervals. The process must halt, and the only way it can halt is when it reaches the unique union of steps that satisfies Properties 1, 2, 3, which is Wrap h ( f ) .
To prove (ii), we observe that Wrap h ( f ) contains Sg f ( h ) and is a union of steps. To see that it is the smallest such complex, suppose there is another complex, L = N , that has this property and there exists a step φ Wrap h ( f ) \ L . As argued above, this contradicts that L is a complex, which implies (ii). □

Appendix D. Algorithm for Discrete Gradient

It is easy to see that the rise function defined in Section 3 is monotonic. As proved in [12], it also satisfies the more stringent requirements of a generalized discrete Morse function provided U Ω is in general position. The generalized discrete gradient of this function is a partition of the Delaunay mosaic into intervals, and this partition is instrumental in the construction of subcomplexes discussed in Section 4.
The construction of this partition is complicated by the impossibility of computing the rise function exactly, at least for general Legendre-type functions. Given a numerical approximation, g :   K R , our goal is therefore to first recover the generalized discrete Morse function that g approximates. Given a tolerance, ε 0 , we give an algorithm that computes such a function f :   K R with f g ε and such that the corresponding partition is minimal in a restricted sense. To prepare the algorithm, we define the gap of a subset φ K as the maximum difference of function values:
gap φ = max P , Q φ , P Q [ g ( Q ) g ( P ) ] .
If g is monotonic, then all gaps are non-negative. Otherwise, let ε 0 be the smallest (largest negative) gap between pairs P Q , set g ( P ) = min { g ( P ) , g ( Q ) } whenever P Q , and note that this makes g monotonic while changing the value of any simplex by at most ε 0 . We will therefore assume that g is monotonic. Letting V be a partition of K into intervals, we call an interval ψ V compatible with V if
(i)
ψ is the union of intervals in V;
(ii)
every pair of simplices P Q with P ψ and Q ψ implies g ( ub ( ψ ) ) g ( Q ) ,
in which ub ( ψ ) is the upper bound of the interval. The algorithm constructs the discrete gradient of f by adding compatible intervals to an initially trivial partition of K, namely the one in which every simplex belongs to its own set in the partition. The function itself is computed by spreading the function value of the upper bound to the other simplices in the interval. Let ψ 1 , ψ 2 , , ψ m be the collection of all intervals of K, sorted by gap, let ε 0 be a fixed threshold, and initialize i to 1 and V to the trivial partition of K.
1
while  i m  and  gap ψ i ε  do
2
      if  ψ i compatible with V then
3
            remove all φ V with φ ψ i from V;
4
            add ψ i to V;
5
            forall  P ψ i  do set f ( P ) = g ( ub ( ψ i ) ) ;
6
       i = i + 1 .
Condition (i) guarantees that the computed V is a partition of K into intervals. Condition (ii) makes sure that no relation is reversed, which implies that the computed function, f :   K R , is monotonic and that V is a refinement of its partition into steps. Finally, 0 f ( P ) g ( P ) ε for all simplices P K , as claimed. Without assuming that g is monotonic, the upper bound on the distance between the two functions is ε + ε 0 .
A slight improvement of the algorithm takes into account that an interval can change from incompatible to compatible. By keeping track of this property throughout the algorithm, we can add an interval to the partition even after it was rejected earlier.

References

  1. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  2. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  3. Amari, S.I.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000; Volume 191. [Google Scholar]
  4. Rao, C. Information and Accuracy Attainable in Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91. [Google Scholar]
  5. Fisher, R.A. On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1922, 222, 309–368. [Google Scholar]
  6. Edelsbrunner, H.; Virk, Z.; Wagner, H. Topological data analysis in information space. In Proceedings of the 35th International Symposium on Computational Geometry, Portland, OR, USA, 18–21 June 2019; pp. 31:1–31:14. [Google Scholar]
  7. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
  8. van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  9. McInnes, L.; Healy, J.; Saul, N.; Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 2018, 3, 861. [Google Scholar] [CrossRef]
  10. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  11. Scott, T.R.; Gallagher, A.C.; Mozer, M.C. von mises-fisher loss: An exploration of embedding geometries for supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10612–10622. [Google Scholar]
  12. Edelsbrunner, H.; Wagner, H. Topological data analysis with Bregman divergences. In Proceedings of the 33rd International Symposium on Computational Geometry, Brisbane, Australia, 4–7 July 2017; pp. 39:1–39:16. [Google Scholar]
  13. Boissonnat, J.D.; Nielsen, F.; Nock, R. Bregman Voronoi diagrams. Discret. Comput. Geom. 2010, 44, 281–307. [Google Scholar] [CrossRef]
  14. Wagner, H.; Dłotko, P. Towards topological analysis of high-dimensional feature spaces. Comput. Vis. Image Underst. 2014, 121, 21–26. [Google Scholar] [CrossRef]
  15. Wagner, H.; Dlotko, P.; Mrozek, M. Computational Topology in Text Mining. In Proceedings of the Computational Topology in Image Context: 4th International Workshop, CTIC 2012, Bertinoro, Italy, 28–30 May 2012; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7309, p. 68. [Google Scholar]
  16. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749. [Google Scholar]
  17. Nielsen, F.; Piro, P.; Barlaud, M. Tailored Bregman ball trees for effective nearest neighbors. In Proceedings of the 25th European Workshop on Computational Geometry (EuroCG), Brussels, Belgium, 16–18 March 2009; pp. 29–32. [Google Scholar]
  18. Nielsen, F.; Piro, P.; Barlaud, M. Bregman Vantage Point Trees for Efficient Nearest Neighbor Queries. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, (ICME), New York, NY, USA, 28 June–3 July 2009; pp. 878–881. [Google Scholar] [CrossRef]
  19. Gholizadeh, S.; Seyeditabari, A.; Zadrozny, W. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining. Big Data Cogn. Comput. 2018, 2, 33. [Google Scholar] [CrossRef]
  20. Salnikov, V.; Cassese, D.; Lambiotte, R.; Jones, N.S. Co-occurrence simplicial complexes in mathematics: Identifying the holes of knowledge. Appl. Netw. Sci. 2018, 3, 1–23. [Google Scholar] [CrossRef]
  21. Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM 1975, 18, 613–620. [Google Scholar] [CrossRef]
  22. Borsuk, K. On the imbedding of systems of compacta in simplicial complexes. Fundam. Math. 1948, 35, 217–234. [Google Scholar] [CrossRef]
  23. Leray, J. L’anneau spectral et l’anneau filtré d’homologie d’un espace localement compact et d’une application continue. J. Mathématiques Pures Appliquées 1950, 29, 1–139. [Google Scholar]
  24. Weil, A. Sur les théorèmes de de Rham. Comment. Math. Helv. 1952, 26, 119–145. [Google Scholar] [CrossRef]
  25. Bauer, U.; Kerber, M.; Roll, F.; Rolle, A. A unified view on the functorial nerve theorem and its variations. Expo. Math. 2023, 41, 125503. [Google Scholar] [CrossRef]
  26. Forman, R. Morse theory for cell complexes. Adv. Math. 1998, 134, 90–145. [Google Scholar] [CrossRef]
  27. Freij, R. Equivariant discrete Morse theory. Discret. Math. 2009, 309, 3821–3829. [Google Scholar] [CrossRef]
  28. Bauer, U.; Edelsbrunner, H. The Morse theory of Čech and Delaunay complexes. Trans. Amer. Math. Soc. 2017, 369, 3741–3762. [Google Scholar] [CrossRef]
  29. Edelsbrunner, H. Surface reconstruction by wrapping finite point sets in space. In Discrete and Computational Geometry. The Goodman–Pollack Festschrift; Aronov, B., Basu, S., Pach, J., Sharir, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 379–404. [Google Scholar]
  30. Bregman, L. The relaxation method of finding the common point of convex sets and its applications to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 1967, 7, 200–217. [Google Scholar] [CrossRef]
  31. Bauschke, H.; Borwein, J. Legendre functions and the method of random Bregman projections. J. Convex Anal. 1997, 4, 27–67. [Google Scholar]
  32. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970. [Google Scholar] [CrossRef]
  33. Antonelli, P. The geometry of random drift I. Stochastic distance and diffusion. Adv. Appl. Prob. 1977, 9, 238–249. [Google Scholar]
  34. Edelsbrunner, H.; Kirkpatrick, D.; Seidel, R. On the shape of a set of points in the plane. IEEE Trans. Inform. Theory 1983, IT-29, 551–559. [Google Scholar] [CrossRef]
  35. Alexandroff, P. Über kombinatorische Eigenschaften allgemeiner Kurven. Math. Ann. 1927, 96, 512–554. [Google Scholar] [CrossRef]
  36. Edelsbrunner, H.; Harer, J. Computational Topology. An Introduction; American Mathematical Society: Providence, RI, USA, 2010. [Google Scholar]
  37. Burg, J.P. Maximum Entropy Spectral Analysis; Stanford University: Stanford, CA, USA, 1975. [Google Scholar]
  38. Itakura, F. Analysis synthesis telephony based on the maximum likelihood method. In Proceedings of the 6th International Congress on Acoustics, Tokyo, Japan, 21–28 August 1968; pp. 17–20. [Google Scholar]
  39. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  40. Rioul, O. A historical perspective on Schützenberger-Pinsker inequalities. In Proceedings of the International Conference on Geometric Science of Information, St. Malo, France, 30 August–1 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 291–306. [Google Scholar]
  41. Chazal, F.; Cohen-Steiner, D.; Guibas, L.; Mémoli, F.; Oudot, S. Gromov–Hausdorff stable signatures for shapes using persistence. Comput. Graph. Forum 2009, 28, 1393–1403. [Google Scholar] [CrossRef]
  42. Voronoi, G. Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième Mémoire: Recherches sur les paralléloèdres primitifs. J. Reine Angew. Math. 1908, 134, 198–287. [Google Scholar] [CrossRef]
Figure 1. (Top) The Bregman–Alpha complex in Shannon geometry of a set Y of 1000 random points in Δ 2 with threshold h = 0.0025 . (Bottom row) The reconstructions in four different geometries.
Figure 1. (Top) The Bregman–Alpha complex in Shannon geometry of a set Y of 1000 random points in Δ 2 with threshold h = 0.0025 . (Bottom row) The reconstructions in four different geometries.
Entropy 26 00637 g001
Figure 2. (Top) The Bregman–Wrap complex in Shannon geometry of the same points and threshold as in Figure 1. (Bottom row) The reconstructions in four different geometries.
Figure 2. (Top) The Bregman–Wrap complex in Shannon geometry of the same points and threshold as in Figure 1. (Bottom row) The reconstructions in four different geometries.
Entropy 26 00637 g002
Figure 3. (Top) The Bregman–Delaunay mosaic in Shannon geometry for the same set of points as used in Figure 1 and Figure 2. (Bottom row) Four Delaunay mosaics whose triangles and edges are colored depending on whether or not they belong to the Shannon–Delaunay mosaic.
Figure 3. (Top) The Bregman–Delaunay mosaic in Shannon geometry for the same set of points as used in Figure 1 and Figure 2. (Bottom row) Four Delaunay mosaics whose triangles and edges are colored depending on whether or not they belong to the Shannon–Delaunay mosaic.
Entropy 26 00637 g003
Figure 4. (Top) the color-coded Bregman–Delaunay mosaic in Shannon geometry of the same set of points as in Figure 3. (Bottom row) The color-coded Delaunay mosaics.
Figure 4. (Top) the color-coded Bregman–Delaunay mosaic in Shannon geometry of the same set of points as in Figure 3. (Bottom row) The color-coded Delaunay mosaics.
Entropy 26 00637 g004
Table 1. The Jaccard distances between the Delaunay mosaics in Shannon, conjugate Shannon, Fisher, Euclidean, and weighted Euclidean geometries for points in the positive quadrant on the top and in the standard triangle on the bottom.
Table 1. The Jaccard distances between the Delaunay mosaics in Shannon, conjugate Shannon, Fisher, Euclidean, and weighted Euclidean geometries for points in the positive quadrant on the top and in the standard triangle on the bottom.
J D 0 D 1 D 2 D 3 D 4
D 0 0.000.060.040.480.00
D 1 0.000.020.470.06
D 2 0.000.470.04
D 3 0.000.48
D 4 0.00
E 0 E 1 E 2 E 3 E 4
E 0 0.000.100.060.520.00
E 1 0.000.040.510.10
E 2 0.000.510.06
E 3 0.000.52
E 4 0.00
Table 2. The bottleneck distances between the persistence diagrams of the rise functions on the Delaunay mosaics in Shannon, conjugate Shannon, Fisher, Euclidean, and weighted Euclidean geometries for points in the positive orthant on the top and points in the standard triangle on the bottom.
Table 2. The bottleneck distances between the persistence diagrams of the rise functions on the Delaunay mosaics in Shannon, conjugate Shannon, Fisher, Euclidean, and weighted Euclidean geometries for points in the positive orthant on the top and points in the standard triangle on the bottom.
B d 0 d 1 d 2 d 3 d 4
d 0 0.00000.00280.00040.01260.0048
d 1 0.00000.00280.01260.0048
d 2 0.00000.01260.0048
d 3 0.00000.0126
d 4 0.0000
e 0 e 1 e 2 e 3 e 4
e 0 0.00000.00060.00030.00310.0035
e 1 0.00000.00030.00300.0034
e 2 0.00000.00300.0034
e 3 0.00000.0023
e 4 0.0000
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Edelsbrunner, H.; Ölsböck, K.; Wagner, H. Understanding Higher-Order Interactions in Information Space. Entropy 2024, 26, 637. https://doi.org/10.3390/e26080637

AMA Style

Edelsbrunner H, Ölsböck K, Wagner H. Understanding Higher-Order Interactions in Information Space. Entropy. 2024; 26(8):637. https://doi.org/10.3390/e26080637

Chicago/Turabian Style

Edelsbrunner, Herbert, Katharina Ölsböck, and Hubert Wagner. 2024. "Understanding Higher-Order Interactions in Information Space" Entropy 26, no. 8: 637. https://doi.org/10.3390/e26080637

APA Style

Edelsbrunner, H., Ölsböck, K., & Wagner, H. (2024). Understanding Higher-Order Interactions in Information Space. Entropy, 26(8), 637. https://doi.org/10.3390/e26080637

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop