1. Introduction
This paper is concerned with the weighted homology of the nerve complex of RNA bi-structures [
1]. The weighted nerve complex of such bi-structures provides a framework for studying the computational complexity of algorithms for identifying sequences that are thermodynamically stable with respect to a given bi-structure. Such sequences hold the key for evolutionary optimization [
2].
The weighted homology introduced here augments the simplicial homology by making the weights of the simplices an integral part of the theory. In our situation, these weights encode the cardinality of intersections of loops in a bi-structure. Along the lines of [
3], the notion of a weighted complex is introduced. Differently from [
3], the weights of simplices in this paper are taken from a discrete valuation ring that contains a copy of the integers as units and in which the weight of a simplex divides the weight of its faces.
1.1. Background
RNA sequences are linear, single-stranded sequences composed of the bases A,U,G, and C. In contrast to double-stranded DNA, RNA sequences fold into structures (see
Figure 1). A particularly important class of RNA structures is that of RNA secondary structures [
4]. Secondary structures can be represented as planar, non-crossing arc diagrams [
5,
6]. By means of its folded structure, RNA facilitates a plethora of important functional roles [
7]. The boundary components of an RNA secondary structure, when viewed as a fatgraph [
8], are called loops, and they determine the minimum free energy of the structure.
Secondary structures have been studied from the perspectives of enumerative combinatorics [
9,
10,
11,
12], algebraic combinatorics [
13], matrix models [
14,
15], and topology [
16,
17,
18].
In [
10], a bijection between linear trees and secondary structures was presented. In [
13], the authors enumerated
k-non-crossing RNA structures by employing a bijection between
k-non-crossing partial matchings and walks in the interior of the Weyl-chamber
based on a certain bijection between oscillating tableaux and matchings [
19,
20,
21]. The geometrical realization of a certain complex of secondary structures was studied in [
22].
A key determinant of a sequence–structure pair is its thermodynamic stability, which is measured via its free energy. Given a sequence
, RNA structure prediction algorithms identify structures by assuming minimal or low free energy with respect to
[
23]. Given any RNA sequence–structure pair
, the free energy
is computed by adding all loop energy contributions [
24,
25], i.e.,
. Here,
depends on the loop type of
s (hairpin, interior, multi-branch, etc.) and the particular nucleotides of the sequence
a that appear in the loop
s. Details on loop decompositions, partition functions, and recursive constructions for the folded structure—derived via polynomial-time dynamic programming (DP)—can be found in [
9,
23,
26,
27,
28]. In [
12], the notion of bi-secondary structures was introduced. Bi-secondary structures are central to identifying sequences that can realize two mutually exclusive minimum free energy (mfe) conformations. Such sequences naturally appear in the context of evolutionary transitions [
29] and of RNA riboswitches, i.e., sequences that exhibit two distinct, stable configurations [
30].
1.2. Motivation
This paper was motivated by the problem of computing thermodynamically stable sequences for two distinguished secondary structures,
S and
T [
2]. The problem can be formalized as the computation of a sequence
a that minimizes
, where
The sub-problems of the underlying DP routine are associated with sets of loops and recursively constructed by adding one loop at a time, where subsequently added nucleotides affect the energy calculation if they appear in multiple loops simultaneously; see
Figure 2. This leads to the consideration of all such intersections as a simplicial complex,
X, whose homology provides insights into the above optimization problem.
In [
1,
31], we computed the simplicial homology groups of
X and proved that only the second homology group carries information. This group was identified to be free, and its generators were identified as the crossing components within the diagram representation. All known RNA riboswitches exhibit one such crossing component, while random pairs of structures contain multiple crossing components.
The simplicial homology of
X in [
1] only allows us to express the existence of loop intersections. However, the crucial determinant for the algorithmic complexity of the DP routine in [
2] is the size of such intersections; see
Figure 2. The recursion is over the set of loops, successively examining one loop at a time. Each such examination affects certain vertices that the currently examined loop shares with the unprocessed loops.
It is thus natural to employ a homology theory that accounts for the intersection size. The notion of weighted homology as originally formulated [
3], assigns a certain weight to each simplex, imposing specific divisibility conditions on these weights.
Our framework represents a variation on weighted homology as follows: We consider homology with coefficients in a discrete valuation ring, which, in fact, makes any divisibility constraints obsolete and allows us to deal with the cardinality of intersections in a natural way. The boundary operator introduced here is new, and so is our approach to connecting the weighted and simplicial homologies. More specifically, the construction of an inflation map relating and is a new technique. The computation of the weighted homology groups, , , and , is also novel. Specifically, the idea of obtaining a combinatorial interpretation for the invariant factors in the structure theorems for and via particular -spanning trees is original. The chain maps between and , which relate the and introduced here, are new and nontrivial. In particular, the injection that induces the isomorphism is a result of the fact that, in the expressions for the generators of , any triangle has a corresponding intersection size of one.
1.3. Organization
The paper is organized as follows: In
Section 2, the simplicial loop complex of a bi-structure is introduced and the splicing of nucleotides is recalled [
1]. In
Section 3, the weighted complex of a bi-structure and a modified splicing procedure are discussed. In
Section 4, the weighted homology and simplicial collapses [
3,
32] are introduced. In
Section 5, a relative point of view is adopted by relating the simplicial and weighted homologies. Specifically, we introduce the inflation map, which maps classes of unweighted 1-cycles into corresponding weighted 1-cycles. In
Section 6, we construct certain spanning trees that will be instrumental for proving our main results in
Section 7. Here, we compute the weighted homology modules of bi-structures and provide combinatorial interpretations. The torsion module emerging in the zeroth homology stems from a recursive contraction scheme on the distinguished weighted spanning trees of the complex. The torsion of the first homology module can be understood via particular weighted edges that reference the aforementioned spanning tree. We conclude the paper with the computation of the second homology module, with all others being trivial. We integrate our results and outline future work in
Section 8.
2. The Simplicial Loop Complex of a Bi-Structure
An RNA diagram
S over the set
is a vertex-labeled graph—whose vertices represent nucleotides—drawn on the horizontal axis and labeled with elements of the set
. An arc
, is an ordered pair of vertices, which represents the base pairing between the
i-th and
j-th nucleotides. Each
S-vertex can be paired with at most one other vertex, and the arc that connects them is drawn in the upper half-plane. An RNA diagram over the set
is augmented with two additional vertices associated with positions 0 and
, together with the arc
, which is called the rainbow, representing the closing base pair of the external loop of the RNA structure; see [
23]. The set of vertices
is called an interval, and is denoted by
. The set
is referred to as the backbone of the diagram. Two arcs,
and
with
, are called crossing if and only if
.
S is called a secondary structure if it does not contain any crossing arcs. A loop
s in a secondary structure
S is represented as the disjoint union of intervals
, such that
and
, for
, are arcs, and any other interval-vertices are unpaired. We denote by
the unique, maximal arc
of the loop
s, and we have a correspondence between loops and their maximal arcs; see
Figure 1.
S-arcs and loops can be endowed with a partial order: . Abusing the notation, for the two loops , we write if . Accordingly, (a) any unpaired vertex is contained in exactly one loop, (b) any non-rainbow arc appears in exactly two loops, and (c) the Hasse diagram of is a rooted tree with the rainbow arc as the root.
Given two secondary structures S and T, we refer to as a bi-secondary structure (bi-structure). Abusing the notation, we let be the loop set of R. We represent a bi-structure with the S-arcs in the upper and the T-arcs in the lower half-plane along the same horizontal backbone.
Let be a collection of finite sets. We call a d-simplex of A if . We set and let . Let be the set of all d-simplices of A. Then, the complex (nerve) of A is A -simplex is called a -face of B if and . Let be a -face of a maximal simplex B, where . Then, is B-free if no other maximal simplex of contains as a face. By construction, is a simplicial complex. For a secondary structure S, is a tree.
Let
be a bi-structure with the loop complex
. Then, a simplicial order,
, is given by
We specify a d-simplex, , by the ordered -tuple , where .
A 1-simplex
is called pure if
and
are loops in the same secondary structure and mixed otherwise; see
Figure 3.
By construction, any 2-simplex contains exactly one pure and two mixed edges. Furthermore, X cannot contain simplices of a dimension greater than or equal to four, as that would imply that three or more loops in the same secondary structure intersect non-trivially. Moreover, for any , and .
In [
31], we investigated the effect of splicing a nucleotide
p into two adjacent nucleotides
such that the two arcs incident to
p were resolved into two non-crossing arcs. In the case
, splitting does not change
X, and if
, splitting induces a specific alteration: the removal of
from
X, as well as a distinguished free edge,
, and exactly two free triangles,
, glued at
. We refer to
as a
-butterfly; see
Figure 4.
Lemma 1. Let be a bi-structure with complex X; then, there exists a simple bi-structure with a complex , where is derived from X by successively removing any 3-simplex, σ, together with an associated σ-butterfly.
Thus, successively splicing all nucleotides in P induces a bi-structure , together with an associated complex , which is obtained by removing all -butterflies and their corresponding s from X.
Removing a
-butterfly and its corresponding
is equivalent to
—a simplicial collapse consisting of two elementary collapses [
32,
33]. The first removes
and
, and the second removes
and
.
Proposition 1. Let X be the loop complex of a bi-structure , and let be an X-sub complex obtained by removing all 3-simplices, σ, from X, together with corresponding σ-butterflies. Then, is the complex of a bi-structure that satisfies for any , .
Note that after splicing, we have simplified X to a complex of a bi-structure, , thus reducing its maximum simplex dimension and maintaining its homology. In the following, we generalize this to weighted complexes.
3. -Splicings and the Weighted Complex of a Bi-Structure
We begin by modifying the splicing introduced in
Section 2 as follows: Instead of replacing a nucleotide
by the two nucleotides
, we substitute it by three nucleotides,
, such that the arcs in
R that share
p as an endpoint now have endpoints
and
, which are non-crossing, and
a is unpaired. We call this a
-splicing of
p.
Note that -splicing and splicing all have exactly the same effect on X—namely, eventually removing any 3-simplex and a corresponding -butterfly. In terms of bi-structures, from , -splicing P produces the bi-structure augmented by a set of distinguished, unpaired nucleotides , which only appear in 0- and 1-simplices. By construction, the complex of does not contain any 3-simplices. Furthermore, after -splicing P, any distinguished nucleotide produced is flanked by two distinct endpoints of non-crossing arcs. Thus, for any simplex , we have , where denotes the difference between the number of non-distinguished and distinguished nucleotides contained in . Note that h is a natural generalization of in the context of complexes of bi-structures. Accordingly, we define a weighted complex associated with a bi-structure as follows.
First, we recall that a ring R is called a discrete valuation ring if it is a Principal Ideal Domain (PID) with exactly one non-zero maximal ideal. Any irreducible element is a generator of this ideal and is called a uniformizer for R, and furthermore, any two such elements differ only up to multiplication with a unit.
Definition 1. Let X be a complex of a bi-structure such that for any , we have . Let be a discrete valuation ring with uniformizer π. We define the weight function , , and call the weighted complex of .
Lemma 2. Let be a bi-structure with no distinguished vertices and complex X. Let be derived from via μ-splicing P, and we denote its complex by . Then, there exists an embeddingwhere is a X-subcomplex obtained by removing any 3-simplices and associated butterflies. Moreover, ϵ is an embedding of weighted complexes: . Proof. Successively -splicing P generates the series of complexes . By construction, given and a 3-simplex , the corresponding -splicing at p replaces p with the triple . Since is unpaired, is contained in exactly two loops, , the only two loops that change size in terms of number of nucleotides when passing to . Since the distinguished nucleotide contributes a , we note that and ; furthermore, . Any other -simplex—differently from and its -butterfly, which are removed—is not affected by the splicing, and consequently retains its weight in , whence the lemma. □
4. Weighted Homology
Consider
, where
denotes the free
R-module generated by all
n-simplices contained in
X. Let
be given by
where the symbol
is defined to be
. This is, in view of
, a non-negative power of the uniformizer
and, hence, still an element of the ring
R. As a result, the boundary operator
is well defined. Since
, we obtain
, i.e.,
is a boundary map.
Let
denote the
-homology groups. For
, we have
and we denote the relative homology groups by
. These are connected via the long exact sequence
Lemma 2 guarantees that, for any weighted complex of a bi-structure , there exists a bi-structure with a weighted complex such that and where . Thus, the relative homology groups and their associated long sequence are well defined.
However, a weighted complex does not have to be induced by a bi-structure. The previous definitions still hold as long as
for an arbitrary
-pair. Weighted complexes represent a general combinatorial-algebraic framework that enhances the simplicial homology for a wide range of applications. The next proposition extracts the algebraic core of
-splicings; it represents a variation of [
32], and it also appears in some form in [
3].
Proposition 2. Let be a weighted complex and let such that: (a) τ is σ-free; (b) . Let Y be the X-subcomplex obtained by removing any (). Then, Proof. Claim 1. It suffices to prove the Lemma for a σ-free τ with the property .
We may assume that and . The lattice of sub-simplices of is isomorphic to a binary k-cube, where cover relations are induced by face inclusions.
We prove by induction on k that a k-cube can be recursively decomposed into a sequence of pairs , where is a maximal simplex with a free face . The induction basis is immediate.
For the induction step, note that a k-cube decomposes into two disjoint copies of a -cube based on if the last coordinate is zero or one. By the induction hypothesis, for a -cube, there exists a sequence, , with the desired properties; appending one or zero as the last coordinate yields two mappings from the cube into the k cube. By construction, these form a family of disjoint-type embeddings, i.e., their images are injective copies of the -cube that are vertex disjoint in the k-cube. Accordingly, we obtain and . As the two sub-cubes are disjoint, we can immediately construct the desired sequence for the k-cube, ; the claim follows.
Claim 2. Let , where maximal, τ is σ-free, and . Then, for any , .
Note that only and are non-trivial. implies ; hence, . Note that , so implies ; thus, . Similarly, , so is also generated by . Now, , and so . Finally, for , , so for any , and Claim 2 follows.
The proposition is implied in view of Claim 2 and the following long exact sequence:
□
In view of and , for any and its corresponding -butterfly, Proposition 2 implies the following.
Corollary 1. Let be the complex of the bi-structure , and let be the complex of that is obtained by completely μ-splicing . Then, 5. The Inflation Map
In view of Corollary 1, we can assume that
X is the complex of a bi-structure with the property
, where
denotes the simplicial complex induced by all
k-simplices in
X [
34]. Clearly, the natural embedding
is an embedding of weighted complexes, and we have the exact sequence
producing the long exact sequence
We now adopt a relative point of view by relating the simplicial homology
to the weighted homology
. To this end, we consider the following two segments of the long homology sequences
where the surjectivity of
is given by
[
1].
While these sequences belong to different categories, the first two vertical maps are natural injections and relate simplicial and weighted homology modules.
Indeed, the first vertical injection is induced by chain maps of the form
,
, which make the following diagram commutative:
The second vertical injection is given by the fact that, since , is an R-submodule of and, by construction, and .
This motivates us to ask if there exists a -linear map that makes the above diagram commutative. It turns out that such a map exists and will prove useful for understanding .
Lemma 3. The mappingwhere , is well defined and -linear; the following diagram is commutative: Proof. By construction, it suffices to show that infl is well defined. Observe that, in view of sequence (
1),
is surjective. Hence, any
can be expressed as
, where
is unique, modulo elements of
. Secondly, in [
31], we showed that
where
;
is contained in the
-crossing component
and
.
Any face
e contained in
appears with opposite signs in two distinct the 2-simplices
and
, where
. Therefore,
i.e.,
; infl is well defined, and a lemma follows. □
We next employ to obtain information about . is connected, and for a spanning tree , the free generators of are in bijection with ’s mixed cycle closing edges for . A generator is a sum of edges and the closing edge , with each such edge appearing exactly once. We fix as follows: Consider the trees of the sub-complexes for S and T, respectively, and let be the tree obtained by connecting the -roots with the mixed edge , corresponding to the two rainbow loops.
Lemma 4. The following sequence of R modules is exact: Proof. With
as above, consider a generator
where, for
,
are pure
S- or
T-edges and
while
. Since
is surjective, we can also write
This involves additional mutually canceling edges,
k, all of which are mixed and appear as faces of pairs of 2-simplices,
. Without loss of generality (w.l.o.g.), we can traverse the cycle
beginning at
via the pure
S-edges until we arrive at
, after which we return back to
via the pure
T-edges. Any
pair satisfies either
or
, and we accordingly categorize the
k edges into
- and
-edges, respectively. As a result, in
-edges persist with the coefficient, say
, while all
-edges cancel. Hence,
where
. Note that only
and
are mixed
-edges, and so
.
Any set
M of
-cycles induces a set of distinct mixed faces
. In the two sub-trees of
S and
T formed by the respective pure edges of the
-paths, these mixed faces connect an
S-leaf with a
T-minimal vertex or a
T-leaf with an
S-minimal vertex. By construction, these edges are distinct from any of the
produced by any
M-cycle, and the coefficients of any of these
in a linear combination
are zero. Iterating this argument, we arrive at
Accordingly,
freely generate
, and the following diagram is commutative:
Since
, we have
whence
□
6. The Rank of
Next, we compute the rank of by constructing a basis. Note that an element in is a sum of cycles, and it is natural to ask what the analogue of a cycle in is.
Let be minimal in terms of the number of edges it contains. A straightforward computation shows that there exist with , where the multi-set is unique up to a scalar factor. This shows that, except for a scalar multiple, minimal cycles determine the distinguished cycles, and we shall call a cycle elementary if and is a minimal cycle. An elementary cycle is precisely the -analogue of a minimal -cycle.
In view of being connected, a basis of can be comprised of all minimal cycles derived from a fixed -spanning tree : Any non-tree edge added to this tree results in a unique minimal cycle in the basis that is labeled by the edge. One key observation here is that generates for an arbitrary spanning tree of .
To derive a basis of in an analogous fashion, an arbitrary choice of an -spanning tree is no longer sufficient because of the weights involved. However, in the following, we construct a distinguished tree that is particularly well suited to the understanding of the embedding of into .
Lemma 5. There exists an -spanning tree, , such that the following sequence is exact: Proof. It suffices to construct such that . To this end, we fix an arbitrary -spanning tree and let . We examine M-edges one by one, yielding a chain of processed edges, , and a sequence of trees, . We claim that these have the property .
The above holds trivially for
, and we assume that
and
are constructed with
. Let
and set
. Then, there exist
such that
. Letting
and
leads to
where, by construction, at least one edge has a coefficient of one. In case the
, set
. Otherwise, remove a
-edge with coefficient one, and add
to
, obtaining
. Note then that
By induction, when the processes terminates, it yields the desired . □
induces a new set of closing edges and the associated minimal -cycles . We shall show that the corresponding elementary cycles constitute a basis of .
Proposition 3. There exists an isomorphism Furthermore, is freely generated by the set of elementary cycles and has rank , where e and v are the numbers of 1- and 0-simplices in .
Proof. is free, as it is a sub-module of the finitely generated free module over the PID R and . By construction, to each , there corresponds a unique elementary -cycle , whose coefficient is one. Furthermore, is non-torsion in ; if , R being an integral domain implies .
Claim. Any has the representation with a unique .
We proceed by induction on the number of -closing edges in c.
To establish the induction basis, let be the unique term that contains in c. Using the fact that appears with a coefficient of one in , we derive .
For the induction step, distinguishing as a closing edge summand in c, we have with . By inductive hypothesis, with unique coefficients , hence the claim.
From the claim and the fact that the
are non-torsion, we conclude that
hence the proposition. □
7. The Modules of the Weighted Homology
By Lemma 5, we have
from which
.
Since
is a module over the PID
R, we arrive at
where
, and
is the
-torsion module of
.
We shall proceed by computing . Note that the -tree is a spanning tree of , and generates . Thus, we can compute via . To this end, we process as follows: We select a -vertex and an incident edge for which is minimal. We contract onto by removing and replacing any edges of the form with while maintaining their original weights . Recursively contracting all edges in produces a sequence of trees , where is a single vertex at which point the process terminates. Relabeling the -vertices, we may assume that is removed at the ith step of this process.
Theorem 1. There exists an isomorphismwhere , as described above. Proof. Let denote the set of recursively contracted edges up to and including the mth step. Set , , and , where the formal fractions are well defined by the minimality of the denominator exponent. Note then that .
Claim 1. for any .
We proceed by induction on
i, where Lemma 5 implies the induction basis
. Consider the contraction at step
i:
. A
-edge of the form
induces the
-edge
with
. Then,
implies
, and Claim 1 follows by induction.
Claim 2. .
Let
, where
is removed from
at the
ith contraction step. We rewrite the
z in terms of the
. This is done recursively, starting with
, as follows: Let
be the coefficient of
at step
p. Then,
implies
, and we obtain the updated
-coefficient
Processing all , we arrive at , and Claim 2 follows.
Claim 3. with .
By Claim 2, we have
, and we make the following ansatz:
First,
is a well-defined homomorphism: Suppose
. Since
is linearly independent from
,
, this immediately implies that
. Claim 1 guarantees that
, and consequently, there exist unique coefficients,
, such that
In view of
, we obtain
The linear independence of , in turn, implies that for any i, from which , and thus, .
is, by construction, surjective and does not have a nontrivial kernel from which Claim 3 follows.
By construction, , and as for the direct summands , for , we consider the mapping , . Suppose that ; then, implies that , so h is well defined. Clearly h is surjective, and it remains to check the injectivity. implies that and implies that , hence the theorem. □
We proceed by analyzing
. We begin with the expression for
in Lemma 4:
where
is the 2-simplex containing the edge
, and
are mixed edges that appear when transitioning between 2-simplices with different weights that share a
face.
The “troublemakers” here are the off-diagonal terms of the embedding, , which stem from passing from trivial to nontrivial crossing components and vice versa. We shall eliminate such transitions by appropriately deforming X without affecting its homology. We then have to check that the deformation gives rise to an inflation map that is just a restriction of the original inflation map for X. Let be a 2-simplex corresponding to a non-crossing arc, i.e., is maximal in X. W.l.o.g., we can assume that with . Then, is -free, and by construction, . Removing and from X produces a sub-complex with . By Proposition 2, we may remove all 2-simplices contained in X corresponding to non-crossing arcs together with their pure faces without changing the homology. Accordingly, the resulting is an X-sub-complex such that . Note that is, in general, not a complex induced by a bi-structure.
Lemma 6. There exists an -spanning tree, , with .
Proof. Let be a fixed -spanning tree. We then continue with -edge replacements, as in Lemma 5, and the process terminates with the -spanning tree , which has the property .
Note that a pure edge e that is not in can be written as , where is the unique 2-simplex containing the pure edge e. Since , we have or , depending on the sign of e in . Consequently, , and the lemma follows. □
Note that and . Furthermore, by Proposition 3, is generated by elementary cycles induced by . Hence, we can exploit the structure of in order to compute .
Theorem 2. is an R-torsion module, and there exists an isomorphismwhere is a closing edge of a minimal cycle with respect to . Proof. We begin by establishing that the inflation map naturally restricts to .
Claim 1
. We have the commutative diagram Clearly, and , and by the long homology sequences, and are embeddings. Lemma 6 shows that is also a -tree, from which is obtained from by just adding all elementary cycles induced by the pure edges that are not contained in . Consequently, removing these edges from is equivalent to restricting the inflation map to the sub-complex , and Claim 1 follows.
Claim 2
. For , we have . As in Lemma 4:
where
is the 2-simplex containing the edge
, and
are mixed edges that appear when transitioning between 2-simplices with different weights that share a
face. By construction, for any
,
, and so no
edges emerge in the inflation images. In particular,
has the coefficient
in
. By Lemma 6
, and so the corresponding elementary cycle
has the coefficient 1 at
. Since
is freely generated by all elementary cycles, Claim 2 follows.
is freely generated by
and is free of rank
. By Lemma 4,
, from which
is, by the structure theorem of finitely generated modules over PIDs, a full-torsion module. The exact sequence
implies that
, and clearly,
□
Theorem 3. The injection induces an isomorphism of R-modules: Proof. The long homology sequence for the weighted homology induces the exact sequence
where
and
. Hence,
, and the sequence is exact. In view of
and Proposition 3, all modules in the sequence are free and, thus, projective; hence,
. Analogously, the exact sequence
of free modules implies that
. Since
and
, we can conclude that
Finally, since both and are free finitely generated modules over the PID R, the theorem follows. □
Remark 1. Since , by construction, . As such, , and, in view of Corollary 1, this holds for weighted complexes stemming from arbitrary bi-structures.
8. Conclusions and Future Work
In this paper, we introduced the weighted homology of the nerve complex X of an RNA bi-structure. We demonstrated that the weighted homology of a bi-structure distinctively augments the simplicial homology.
In the simplicial homology, only the zeroth and second homology groups are nontrivial [
1]. Since the complex of a bi-structure is, by construction, connected, the simplicial homology carries key information exclusively via
.
The weighted homology not only conserves the information provided by the simplicial homology, but it also supplies additional information. Specifically, connectivity is still picked up by the free submodule of ; however, additional torsion now emerges. This torsion is given a concrete combinatorial interpretation via the compression algorithm on the spanning tree . In fact, the compression procedure naturally generates the free submodule of rank one at its final step.
In the case of , we consider and observe that the nontrivial chain maps , , produce an injection , in which all coefficients are one, since any 2-simplex contained in the expression of an -generator (i.e., appearing in a crossing component) has a weight of one. As a result, passing to the ring R preserves the information present in .
In the computation of
, we employed the simplicial homology in a different way. Here, we consider the relative homology and transcend the information from the 1-skeleton,
. While
is trivial [
1],
carries information, which we can utilize by extracting from
the embedded exact sequence of Lemma 4. After resolving some technicalities, this allows us to compute
. Note that passing to the deformation
greatly simplifies the inflation map, and we observe that it acts on
as
would on the 1-skeleton.
The structure theorems for the weighted homology enable the classification of bi-structures by their
R-modules. Such classifications can provide insights into the algorithmic complexity of problems involving bi-structures. For instance, the problem of computing thermodynamically stable sequences for a bi-structure can be efficiently solved in polynomial time when the second homology
R-module
is trivial [
2].
As for future work, we shall extend the homological analysis to RNA–RNA interaction structures [
35]. An interaction structure can be represented as a diagram with two backbones drawn horizontally on top of each other such that both intra-molecular and inter-molecular bonds are non-crossing; see
Figure 5.
The inter-molecular arcs naturally induce an equivalence relation
on the vertices of the two backbones. The intersection of two loops is accordingly defined to be the set of
-equivalence classes of vertices that they share. A bi-secondary structure can then be viewed as a particular type of interaction structure—namely, an interaction structure with two identical backbones, where interaction arcs connect any pair
for
. In contrast to a bi-structure, an interaction structure can exhibit nontrivial first-simplicial homology and requires revisiting the notion of a crossing component for interaction structures. The analysis involves many more topologies—in particular, surfaces and Mayer–Vietoris sequences [
34].
The framework of the weighted homology is by no means restricted to bi- or interaction structures. It also gives rise to the consideration of the dissimilarity complex of a finite set of genomic sequences, which we briefly discuss in the following. A multiple sequence alignment
W of genomic sequences is defined to be a finite set of words of equal length
m over the finite alphabet
. For any
, let
be the map that returns the symbol at position
k of a word, namely,
for
. A
subset
forms a
d-simplex if there exists at least one position
j such that
. In other words, all
sequences in
contain mutually different nucleotide types at the site
j. For a fixed
, the number of different positions
j for which the above holds is called the dissimilarity of
. This leads to the aforementioned dissimilarity complex for genomic sequences; see
Figure 6. This complex is, by definition, low dimensional; its highest dimension is
. For
, the 1-skeleton of the dissimilarity complex captures the well-known Hamming distance—the number of positions in which two genetic sequences differ. It may be worth pointing out that this framework integrates well-known generalizations of distances, as these appear naturally as weights in the dissimilarity complex. Triangle inequalities generalize to tetrahedron inequalities, etc., and this could enhance our understanding of genetic sequences. This is because basic constructs—e.g., trees, such as the tree of life—that reflect the ancestral relations between sequences depend on the notion of Hamming distance. Our preliminary investigations suggest that the homology of the dissimilarity complex captures evolutionary events. For example, the free rank of the first homology module gives bounds of the possible recombinations within the sequence set.