1. Introduction
Networks provide a powerful framework for modelling interactions within systems composed of a large number of components [
1,
2,
3]. A network description of any real-world system consists of a collection of vertices or nodes representing the constituents of the system, and edges or links indicating the presence of interactions between pairs of vertices or nodes. For certain systems, the underlying network could also provide additional information in the form of directed edges, weighted edges or signed edges. However, a fundamental limitation of all network representations is that they can only be used to model pairwise interactions within a system, whereas many real-world complex systems display higher-order interactions between three or more components [
4,
5]. One possible framework to model such higher-order interactions within complex systems is through the use of hypergraphs [
6].
In the past two decades it has become trendy to establish discrete analogs of classical theorems in geometry [
7,
8]. Especially in recent years there has been an increasing interest in characterizing networks as discrete geometric objects, which has led to significant developments in the field of network geometry [
9]. Various complementary approaches to network geometry are currently being explored, including hyperbolic network geometry and emergent network geometry [
10], and topological data analysis [
11]. A notable direction of research in network geometry involves the study of discrete Ricci curvatures for complex networks [
12]. Forman–Ricci curvature and Ollivier–Ricci curvature are the two widely-used notions of discrete Ricci curvature that have also found several practical applications, including detection of crashes and bubbles in financial markets [
13] and community identification in complex networks [
14]. While the geometric and topological properties of complex networks are being extensively analyzed, studies involving the geometric and topological characterisation of hypergraphs are rare. Nevertheless, there have been a few attempts at extending the notion of discrete Ricci curvature to the framework of hypergraphs [
15,
16].
The present work is dedicated to the proposition that hypergraphs can be naturally construed as partially ordered sets or posets, which in turn, have an innate interpretation as simplicial complexes. Posets exhibit a strict partial order relation, which is a homogeneous, transitive and asymmetric relation. However, it is important to note that the Hasse diagrams representing posets can be quite symmetrical, even though they are directed graphs. Following the above mentioned properties of hypergraphs and posets, we provide a canonical method to construct two-dimensional simplicial complexes associated with hypergraphs. Two-dimensional simplicial complexes are endowed with intrinsic topological and geometric properties, which allow us to easily compute the Euler characteristic of hypergraphs. Notably, the interpretation of hypergraphs as two-dimensional simplicial complexes also allows us to define the Forman–Ricci curvature for hypergraphs. Subsequently, we performed an empirical study of real-world hypergraphs, where we computed and analyzed the Forman–Ricci curvatures of hyperedges in several real-world hypergraphs. Specifically, we compared the Forman–Ricci curvature of hyperedges with two standard hyperedge measures, namely eigenvector centrality [
17] and cardinality.
2. Preliminaries
A hypergraph is defined as a pair , where V is a finite set of elements called vertices or nodes and A is a set of non-empty subsets of V called hyperedges. Note that hypergraphs are a generalization of graphs or networks. In contrast to graphs where an edge can connect only two vertices, hyperedges in hypergraphs can connect a group (or a set) of two or more vertices together.
2.1. Posets
We briefly summarize here the minimal definitions and properties of posets that we have used in this work. Further, we presume the reader is familiar with the basic definition of posets.
Definition 1 (Coverings). Let be a poset, where < denotes the partial order relation on , and let be elements of . We say that p covers q if and there does not exist , such that . We denote the fact that p covers q by .
Definition 2 (Ranked posets). Given a poset , a rank function for is a function ρ: such that
If q is a minimal element of , then ;
If , then . A poset is called ranked if there exists a rank function for . The maximal value of is called the rank of , and it is denoted by .
Note that for the definition of ranked posets we essentially (but not strictly) follow the one provided by Bloch [
18] and, while other terminologies exist [
19,
20], we prefer the one above for the sake of clarity and concordance with Bloch’s paper. Let us also note that if a poset is ranked, then the rank function is unique. Furthermore, if
is a ranked poset of rank
r, and if
, we denote
, and by
the cardinality of
, i.e.,
.
While a variety of examples of posets pervade mathematics, a basic example is that of the set of subsets, i.e., the power set
of a given set
X. Here, the partial order relation < is defined by the inclusion relation ⊂. Note that the hypergraph
defined on the vertex set
V is essentially a subset of
, and therefore, can also be regarded as a poset. As for the case of posets in general,
represents the archetypal example of ranked posets. Thus, hypergraphs represent, in essence, ranked posets, which is essential for the sequel. Many of the hypergraphs arising in real-world datasets are actually directed in nature [
15,
21]. For such hypergraphs, the poset structure is even more evident, as the order relation is emphasized by the directionality. Moreover, if there are no loops in such a hypergraph, the resulting poset is also ranked.
2.2. Simplicial Complexes and the Euler Characteristic
For this work on posets, rather than providing the full technical definition of a simplicial complex, we refer the reader to classic references (see e.g., [
22,
23]).
Given a poset , there exists a canonical way of producing an associated simplicial complex , by considering a vertex for each element and an m-simplex for each chain of elements of .
If the poset
is finite, we can define its Euler characteristic as being equal to that of the associated simplicial complex
, i.e.,
Note that the hypergraph defined previously is a finite hypergraph. Thus, we can define the Euler characteristic of the simplicial complex associated with H.
Equation (
1) enables us to define the Euler characteristic of any poset, even if it is not ranked, due to the fact that the associated simplicial complex is naturally ranked by the dimension of the simplices (faces). However, if
is itself ranked—as indeed it is in our setting—then there exists a direct, purely combinatorial way of defining the Euler characteristic of
that emulates the classical one, in the following manner:
While in general
and
do not coincide, they are identical in the case of
complexes, thus in particular for polyhedral complexes, hence a fortiori for simplicial complexes. In particular, we shall obtain the same Euler characteristic irrespective of the model of hypergraph that we choose to operate with: the poset model
, its associated complex
, the geometric view of posets as simplicial complexes attached to each subset of cardinality
k (i.e., to each vertex a
k-simplex), or the more general polyhedral model that we considered in Ref. [
15]. Thus, the Euler characteristic of a hypergraph is a well defined invariant, independent of the chosen hypergraph model, and as such captures the essential topological structure of the network.
In this work, we provide a slightly modified procedure for obtaining a simplicial complex from a poset
. Specifically, we considered a vertex for each element in
and a 2-simplex (triangle) for chains consisting of at most 3 elements. This allows us to produce a two-dimensional simplicial complex associated with
, and we denote such a complex as
.
Figure 1 is a graphical illustration summarizing the construction of
and
for an example poset. The Euler characteristic
of
is given by the following simple expression:
where
,
and
are the number of vertices, number of edges and number of triangular faces, respectively, in
. In the succeeding sections, we shall show that
is not just a topological invariant, but it is also closely related to the geometry of
.
2.3. Forman–Ricci Curvature
Forman introduced a discretization of the notion of Ricci curvature [
24], by adapting the Bochner–Weizenböck formula [
25] to the quite general setting of
complexes. Some of us have expatiated on the geometric content of Forman’s discretization of Ricci curvature elsewhere [
12]. Therefore, to avoid repetition, we refer the reader to [
12]. Note that in [
12], Forman’s original notion of Ricci curvature is referred to as the augmented Forman–Ricci curvature. Moreover, we had introduced a one-dimensional notion of Forman–Ricci curvature and employed it for the study of complex networks in [
26].
While Forman–Ricci curvature applies to both vertex- and edge-weighted complexes, (a feature which makes it suitable for broad range of applications), we concentrate here on the combinatorial case, wherein all vertex as well as edge weights are equal to 1. In this combinatorial case, Forman–Ricci curvature has the following simple and appealing form:
Here, # denotes the number of elements in a set, denotes triangles and e denotes edges, while denotes the set of triangles that consist of the edge e. Moreover, “” denotes parallelism, where two faces of the same dimension (e.g., edges) are said to be parallel if they share a common parent (i.e., a higher dimensional face containing the edges, in this case, a triangle), or a common child (i.e., a lower dimensional face, in this case a vertex), but not both.
Note that for “shallow” real-world hypergraphs, like the metabolic networks considered in [
21], both Forman–Ricci curvature and the Euler characteristic are readily computable.
2.4. The Gauss–Bonnet Formula
In the smooth setting, there exists a strong connection between curvature and the Euler characteristic, that is captured by the classical Gauss–Bonnet formula [
25]. However, the Forman–Ricci curvature as defined in [
24] does not satisfy a Gauss–Bonnet type theorem, since no counterparts in dimensions 0 and 2, essential in the formulation of the Gauss–Bonnet theorem, are defined therein. Interestingly, Bloch defined these necessary curvature terms and was thus able to formulate [
18] an analogue of the Gauss–Bonnet theorem in the setting of ranked posets. While in general the one-dimensional curvature term has no close classical counterpart, in the particular case of cell complexes (and thus of simplicial complexes), Euler characteristic and Forman–Ricci curvature are intertwined in the following discrete version of the Gauss–Bonnet theorem:
Here,
denote the sets of vertices, edges and faces (triangles), respectively, and
and
denote the respective zero-dimensional and two-dimensional curvature terms required in a Gauss–Bonnet type formula. These curvature functions are defined via a number of auxiliary functions, as follows:
where
are the aforementioned auxiliary functions, which are defined in the following simple and combinatorially intuitive manner:
Since we consider only triangular 2-faces, the formulas for the curvature functions reduce to the following very simple and intuitive ones:
where
denotes, conform to the canonical notation, the degree of the vertex
v, i.e., the number of its adjacent vertices.
From these formulas and from the general expression of the Gauss–Bonnet formula (Equation (
5)), we obtain the following combinatorial formulation of the noted formula in the setting of the two-dimensional simplicial complexes:
or, after taking into account also Equation (
9), and some additional manipulations:
Note that for unity of notation throughout the paper, we preferred to write in the formulas above,
rather than
V, and
instead of
F, as commonly used. Equation (
5) and its variations also allow for studying the long time behaviour of evolving (hyper-)networks via the use of prototype networks of given Euler characteristic [
27].
3. Hypergraph Datasets
For our empirical study, we applied the poset-based construction on 12 real-world hypergraphs spanning multiple domains.
Table 1 provides the summary statistics for these 12 hypergraphs.
First, we considered three hypergraphs derived from biological datasets, namely biogrid [
28], protein-complex [
29], and disease-gene [
30]. In the biogrid hypergraph, vertices are chemicals and a hyperedge is a set of chemicals that interact with a given gene. In the protein-complex hypergraph, vertices are proteins and a hyperedge corresponds to a set of proteins contained in a particular complex. In the disease-gene hypergraph, vertices correspond to diseases and a hyperedge consists of a set of diseases associated with a particular gene.
Second, we acquired four hypergraphs based on affiliation networks, namely corporate-memberships [
31,
32], senate-committees [
33], norwegian-directorate [
34], and youtube-groups [
35]. In the corporate-memberships hypergraph, vertices correspond to corporate executive officers and a hyperedge consists of members of a particular social organization. The senate-committees hypergraph is derived from Congressional data [
36], where vertices are members of the US Senate and hyperedges correspond to committee memberships. The norwegian-directorate hypergraph is based on an interlocking directorate among public limited companies in Norway, wherein the vertices are directors and hyperedges comprise of directors affiliated to boards. In the youtube-groups hypergaph, vertices are YouTube users and a hyperedge is a set of users in a given group.
Third, we considered 2 hypergraphs based on communication networks, namely facebook-forum [
37], enron-email [
38], and one hypergraph contact-highschool [
39] which is based on proximity data. The facebook-forum hypergraph is based on users’ activity in a Facebook forum, wherein the vertices represent Facebook users and a hyperedge corresponds to the users that have participated in a given topic of the forum. In the enron-email hypergraph, vertices correspond to emails at Enron and a hyperedge consists of the sender and all the recipients of an email. The hypergraph contact-highschool is based on close-proximity human interactions [
40], obtained from sensor data worn by students at a high school, wherein vertices represent students and each hyperedge is a group of students that were in close proximity of one another.
Finally, we acquired two hypergraphs, github-projects [
31] and cond-mat-publications [
3] that are based on collaboration networks and co-authorship networks, respectively. In the github-projects hypergraph, vertices correspond to GitHub users and a hyperedge consists of users which have contributed to a project. The cond-mat-publications hypergraph is based on preprints posted to Condensed Matter section of arXiv E-Print Archive between 1995 and 1999, wherein the vertices correspond to authors and a hyperedge represents a scientific paper, which comprises of co-authors of that paper.
4. Results
For each real-world hypergraph
H, we obtained the associated two-dimensional simplicial complex
by the canonical construction procedure described in
Section 2.2. Note that the vertices (0-simplices) of
are comprised of the vertices
V and hyperedges
A of the original hypergraph
H, and the edges (1-simplices) and triangular faces (2-simplices) are defined by the partial order relation (see
Figure 1). To investigate the topological properties of the hypergraphs, we computed the Euler characteristic of the associated complex
as defined in Equation (
3). The computed Euler characteristic corresponding to each of the 12 hypergraphs considered in this study is reported in
Table 2.
To study the geometrical properties of the hypergraphs, we computed the Forman–Ricci curvature
of the edges in
(see Equation (
4)). For all the 12 hypergraphs, we found that the Forman–Ricci curvature was negative for most of the edges in
, indicating a strong prevalence of partial order relation in all the hypergraphs. Moreover, negative edge curvatures indicate a rarity of triangular faces in
. Recall that the Forman–Ricci curvature of an edge depends on the degrees of the anchoring vertices as well as the triangular faces in which this edge belongs.
Forman–Ricci curvature is a discrete analogue of the classical Ricci curvature [
25] and is fundamentally defined on edges. However, it is also possible to define the scalar curvature for a vertex or a node as the sum of the curvatures of all edges incident to that vertex:
The above definition was previously used to analyse the curvatures of vertices in complex networks [
12]. In the present work, we have extended this definition to the case of two-dimensional simplicial complexes
associated with hypergraphs. We found that the scalar curvature is also negative for most of the vertices in
.
Figure 2 shows the distribution of the scalar curvatures of those vertices in
that correspond to the hyperedges
A of the original hypergraph.
While edges and triangular faces in the poset transformation
starting from the original hypergraph do not have direct correspondence to real-world data, the vertices in the transformed poset or two-dimensional simplicial complex
represent the vertices and hyperedges of the original hypergraph. Hence, the notion of scalar curvature makes it possible to extend the definition of Forman–Ricci curvature to vertices as well as hyperedges in hypergraphs, and also allows for a comparative analysis with existing hypergraph measures. In the present work, we compute the correlation between Forman–Ricci curvature of hyperedges and two standard hyperedge measures, namely eigenvector centrality and cardinality. A brief description of both these measures is provided in
Appendix A.
Eigenvector centrality was first introduced by Bonacich [
17] as a measure to quantify the important individuals in the context of social networks. Since then, it has become a standard tool for network analysis with a wide range of applications including Google’s PageRank [
41] and as a means to predict neuron’s firing rate [
42]. It is based on the idea that the centrality score of a vertex depends on the centrality score of its neighbours. Moreover, Bonacich [
17] generalised this concept to two-mode data such as hypergraphs, which allows one to compute the centralities of not only the vertices, but also the hyperedges. It follows from the definition of eigenvector centrality (see
Appendix A) that central vertices or nodes belong to central hyperedges and central hyperedges contain central nodes.
Figure 3 is a scatter plot between Forman–Ricci curvature and eigenvector centrality of the hyperedges for the 12 hypergraphs considered in this study.
Table 3 shows the Pearson correlation between Forman–Ricci curvature and eigenvector centrality of the hyperedges for the hypergraphs. We found that most real hypergraphs show a moderate to high negative correlation between Forman–Ricci curvature and eigenvector centrality, regardless of the domain that they belong to. However, we should emphasize that 3 hypergraphs, namely norwegian-directorate, enron-email and cond-mat-publications display a weak correlation between Forman–Ricci curvature and eigenvector centrality. On the other hand, affiliation hypergraphs such as corporate-memberships, senate-committees, and youtube-groups have a very high absolute correlation (≥0.9) between the two measures. Additionally, we found that hypergraphs representing biological datasets show moderate correlation between Forman–Ricci curvature and eigenvector centrality.
Figure 4 is a scatter plot between Forman–Ricci curvature and cardinality of the hyperedges for the 12 hypergraphs considered in this study.
Table 3 shows the Pearson correlation between Forman–Ricci curvature and cardinality of the hyperedges for the hypergraphs. The Forman–Ricci curvature of hyperedges in all the hypergraphs display a high negative correlation with their cardinalities. Similar to the results of correlation with eigenvector centrality, we found that affiliation hypergraphs such as hypergraphs of senate-committees, corporate-memberships and norwegian-directorate show a very high absolute correlation of
,
and
, respectively. The affiliation hypergraph youtube-groups also shows a high absolute correlation of
. Moreover, we found that hypergraphs representing biological datasets display a moderate to high absolute correlation between the two measures.