2.1. Introduction
We consider rooted trees; a root is entered by an edge which starts at a node of degree 1, referred to as a
super-root; the edge itself will be called the
root edge. We will regard the super-root of a tree, and then the root, as located “at the top,” and leaves of a tree (all of degree 1) as located “at the bottom”, as shown in
Figure 1a. Similarly,
subtrees are regarded with their root edge, which enters the root of the subtree and starts at its super-root, the latter being of degree 1. The
tail of an edge
e is its endpoint located closer to the root, and the second endpoint is the
head of the edge; they are denoted by
e+ and
e−, respectively. An
ancestor node/edge is located closer to the root, and a
descendant node/edge is farther from the root, both lying on the same path; this relation is denoted by ≤. The nearest ancestor is called a
parent, and the nearest descendant is called a
child; nodes located immediately below the same node are said to be
sibling nodes, each of them is called a
sister of any other one. In essence, a tree specifies a parent–child relation. For any set
N of leaves, a node which is the lowest among all ancestors of all leaves in
N is well defined. Denote it by
N′. For two trees
T1 and
T2, there is a well-known
canonical continuation [
13,
14] of
any preassigned mapping s from leaves in
T1 to leaves in
T2, which is denoted here by
s″. Namely, for an interior node
x in
T1 with a set
x< of leaves lying below it, we let
s″(
x) =
s(
x<)′. The set
x< is called the
clade of
x, or the
clade of the root edge entering
x, or the
clade of the subtree defined by
x.
If
s″(
e+) =
s″(
e−), then
s′(
e+) is defined as the edge entering
s″(
e−) from above; for instance,
s′ maps the super-root in
T1 into the root edge in
T2. Thus,
s′ maps nodes in
T1 into nodes and edges in
T2; see
Figure 1b. This definition can be generalized to the case where
T2 is a network (see
Section 2.3). We will refer to the mapping
s′ as an
embedding of
T1 into
T2. Below, when saying “a tree
T1 is mapped to a tree
T2,” we always mean precisely this mapping
s′ and speak about its images and arguments. Note that in this case we have a preassigned mapping
s, which is called a
leaf mapping. Of course, the mapping
s′ preserves the natural order.
The path of an edge e ∈ T1 in a tree T2 is a path consisting of nodes in T2 from s′(e+) to s′(e−) excluding its endpoints s′(e+) and s′(e−).
The
image tree of
T1 in
T2 is defined to be a tree isomorphic to
T1 with nodes located at nodes and in edges of
T2 so that
v is located either at the node
s′(
v) or in the pipe
t =
s′(
v) (see
Figure 2). In this sense, edges of
T2 will be referred to as
pipes containing nodes and edges of
T1. Images are
connected by an edge according to the parent–child relation of their arguments. Thus, we want to consider an
image tree as the
T1 tree located inside
T2. Our use of mathematical interpretations in embedding is related to our inspiration from biological paradigms, such as the evolution of genomes and genomic rearrangement.
If tree leaves are assigned unique names, we may not distinguish between a leaf and a name assigned to it. A tree with labelled leaves represents the discrete-time evolution of a matter which is assigned to the root of the tree; the evolution goes from the root to leaves through interior nodes of the tree. For example, this can be evolution of a gene, protein, species, etc., from the root along the whole given tree to its leaves, which represent up-to-date states of the initial (at the root) matter; such up-to-date states could already be actually specified.
Evolution analysis is one of the most fundamental mathematical problems, having at the same time a broad practical importance (see, e.g., [
15,
16,
17,
18]). Note, also, the work [
19], which describes an algorithm for solving the problem of isomorphic tree embedding in a network and provides numerous references to articles related to this topic. A detailed overview of topics related to evolutionary trees and numerous references are given in [
6]. Issues related to phylogenetic networks are discussed in the book [
7].
Let
T0 be the set of all leaves of a tree or network
T. The problem is as follows: given a tree
P, a set
M, a tree or network
S, and mappings
s1:
P0 →
M and
s2:
M →
S0, find a tree
G (where
G0 =
M) that reconciles the evolutions
s1′ and
s2′ in the following sense. We define
costs c(
s′) and
c(
s2′) of the embeddings
s1′:
P →
G and
s2′:
G →
S, respectively, and then minimize
f(
G) =
c(
s1′) +
c(
s2′) with respect to the argument
G (see
Figure 3). For the case of a network
S, the definition of the embedding cost of
s2′:
G →
S is given in
Section 2.7. Until that section,
S will always be a tree, and the embedding costs will be considered according to Definitions 1 and 2 below. However,
f(
G) can be defined to be an easily computable function with some properties that we will need below.
We are dealing here with a general mathematical problem, so all further material is presented independently of any applications. However, as a typical example, let us describe its content using biological terms. Let
P be a tree whose leaves are assigned with protein names (“protein evolution tree”, or, for short, protein tree), and let
S be a network, in particular a tree, whose leaves are assigned with names of species containing these proteins (“species evolution network”, or, for short, species network/tree). For more details concerning the notion of a network, see
Section 2.3. At this moment, we may assume
S to be a tree. Let
M be a set of genes from which proteins in leaves of
P are formed, and let a leaf mapping
s1:
P0 →
M be fixed where
y =
s1(
x) is the gene from which a protein
x is formed (see
Figure 3). On the other hand, a mapping
s2:
M →
S0 is fixed where
s2(
y) is the species to which a gene
y belongs. The composition
s2(
s1):
P0 →
S0 defines a mapping which specifies a species to which a protein
x belongs.
s1 and
s2 are specified extrinsically for applied reasons. Let
G be a tree with leaf set
M =
G0, which is called the “gene evolution tree”, or gene tree, or an
intermediate tree with respect to
P and
S. Given
s1 and
s2, their continuations
s1′:
P →
G and
s2′:
G →
S are defined. We regard
s1′ as evolution of proteins relative to genes (in the gene tree), and
s2′ as evolution of genes relative to species (in the species tree). The problem is to minimize
c(
s1′) +
c(
s2′) with respect to the gene tree
G and, as a result, to find a tree
G* on which
c(
s1′) +
c(
s2′) attains its minimum value.
2.3. Definition of an Embedding of a Tree into a Network
Definition 2. Let s′: T1 → T2 be an embedding. Recall that edges in T2 are referred to as pipes. An edge e ∈ T1 enters a pipe t ∈ T2 if s′ maps e+ to t+ or above t+ and maps e− to t or below t. Informally, in the image tree, the edge e persists at t+ and either becomes lost inside t or goes further through the whole t (see Figure 5a). Similarly, an edge e leaves a pipe t ∈ T2 if s′ maps e+ to t− or above and maps e− strictly below t−. Informally, in the image tree, the edge e arises at t− or above and persists in one of the child pipes of t−, maybe partially (see Figure 5b). Such a description is called an evolutionary scenario of T1 in T2. From
binary network S, we understand a directed acyclic graph with nodes divided into four groups, one node of in-degree 0 and out-degree 1 (the
super-root), nodes of in-degree 1 and out-degree 2 (
tree-type nodes, including the
root), nodes of in-degree 2 and out-degree 1 (
hybrid nodes), and nodes of in-degree 1 and out-degree 0 (
leaves). Thus, a tree is a particular case of a network with no hybrid nodes. A node may have two parents, so the notion of a nearest common ancestor is not well defined (see
Figure 6). If there is a directed path from node
x to node
y, then we say that
x is
above y and write
x ≥
y or
y ≤
x. A
hybrid pipe t is a pipe whose endpoint
t− is a hybrid node. A
tree-type pipe is defined similarly.
The continuation of this order onto all nodes and pipes, which is also written as x is above y (x ≥ y) or y is below x (y ≤ x), is defined similarly. For example, pipe e is below e+ and above e−. Sometimes, the term ancestor is used instead of above, and descendant is used instead of below; a node located immediately above some node is called its parent, and a node located immediately below some node is called its child; nodes located immediately below the same node are said to be sibling nodes; each of them is called a sister of any other one.
A subnetwork Sx is the network consisting of a node x and everything that lies below it, together with a root pipe (for Sx) entering x from above; if there are two such pipes, choose any of them. The leaf set in Sx is called a clade in S or the clade of the node x, or also the clade of the root pipe entering x; we denote the clade of x by x<. A subnetwork may contain nodes with two parents, such that one of them does not belong to Sx. Such nodes of a subnetwork will still be referred to as hybrid nodes. Throughout what follows, by a network we mean a rooted binary network with a root pipe, and similarly for a subnetwork.
As in the case of two trees, for a tree
G and a network
S we define a continuation
s″ of a mapping
s from leaves in
G to leaves in
S, though, on interior nodes, the value of
s″ can also be a pipe in
S. Namely, let a
continuation s″ from nodes in
G to nodes and pipes in
S be any continuation of
s preserving the natural ordering ≤ in
G and having the following properties, the super-root in
G is mapped to the root pipe in
S; and the image
y =
s″(
x) of a node
x is not a hybrid node. If
s″(
x) is a tree-type node, then the paths
px-x1 and
px-x2 of the child edges of
x enter different child pipes of
s″(
x). Moreover, the mapping
s″ for any edge
e ∈
G is
complemented with pointing to a directed
path pe which consists of nodes in
S leading from
s″(
e+) to
s″(
e−). An example in
Figure 7b is the node
e−, which is mapped to
t−. For the case of a tree, such a path
pe is unique; above, it was called the path of the edge
e. Such a non-unique
s″ will be called an
embedding of a tree
G into a network
S.
Now we will repeat Definitions 1 and 2 for a network. An edge e ∈ G enters a pipe t ∈ S if s″ maps e+ to t+ or above t+, maps e− to t or below t, and pe passes through t, maybe partially. Similarly, an edge e leaves a pipe t if s″ maps e+ to t− or above t−, maps e− strictly below t−, and pe passes through ti, maybe partially, where ti is a child pipe of t.
Let us be given an embedding
s″:
G →
S. A
duplication in
G is a node of degree 3 which is mapped to a pipe. A
loss in
G is a pair consisting of an edge
e ∈
G and a tree-type node
z ∈
S, such that
z lies in
pe (and does not coincide with
s′(
e+) or
s′(
e−)). A loss is said to be
implicit if one of the subnetworks with a root pipe starting at
z does not contain images from
G0, otherwise, it is said to be
explicit. The
locus of a duplication is defined to be its image, and the locus of a loss is the node
z. Duplications and losses will be called evolutionary
events. We assume each of them to be assigned with its cost, a strictly positive rational number. The
cost c(
s″) of an embedding
s″ is the total cost of all
duplication and
loss events in it. The notions defined in this paragraph are also well applicable to an embedding
s″:
P →
G, which leads to costs of the embeddings
s1″:
P →
G and
s2″:
G →
S. As a result, we have two costs,
c1(
s″) and
c2(
s″), and their sum
c1(
s″) +
c2(
s″). The duplication and loss events were introduced in [
14] in connection with the “alpha” mapping.
Definition 3. A minimum embedding s′ of a tree G into a network S is an embedding s″ with the minimum cost.
It is easy to construct a minimum embedding s′: G → S, where S is a network (minimum with respect to duplication and loss events). However, there can be many such minimum embeddings.
Such embeddings can be constructed by induction, where the induction step consists of embedding a subtree Ge into a subnetwork St in ascending order of their size (first over all t until we reach the root pipe, and then over all e until we reach the root edge), as is usually performed in dynamic programming.
At a step of this induction, costs of two embedding variants may happen to be the same; an example is shown in
Figure 7, where
e is a root edge and
e− is mapped to either
t (then
e is duplicated in
t) or
t− (then
e forks together with
t). This generates two different minimum embeddings. Namely, for the tree
G = ((
a1,
a2),(
a3,
a4)) and for the network with leaves
a,
b, and
c, we show two different minimum embeddings under duplication cost 2 and implicit loss cost 1. Each of them has s cost of 6; the first embedding has one duplication and 4 implicit losses, whereas the second has two duplications and two implicit losses.
2.4. Algorithm for Constructing an Intermediate Tree
We present an algorithm for the general case where
S is a network and where duplications and explicit and implicit losses have arbitrary costs. In the first reading, it might be convenient to assume that
S is a tree. Recall that
M is the set of leaves of the sought-for intermediate tree
G*, see
Figure 3, where
G is a variable tree, the argument of the functional over which the cost of an intermediate tree
G is minimized. Let
cdp and
cdm denote, respectively, the costs of duplications under mappings are
s1 and
s2;
elp and
elm are the costs of explicit losses; and
ilp and
ilm are the costs of implicit losses.
For an arbitrary non-empty subset A ⊆ M, denote by E(A) the set of edges e ∈ P for which the following is valid: s1(e<) ⊆ A and there is no e1 ∈ P with e1 > e for which s1(e1<) ⊆ A. For example, for the root edge r in G (or root pipe in S and mapping s2(s1)) we have E(r<) = {r′}, where r′ is the root edge in P. It is easily seen that a non-root edge t in the tree G is entered by precisely edges from E(t<). We will also consider the relation s2(t<) ⊆ A ⊆ S0, where t ∈ G.
The algorithm presented below exhaustively examines all
partitions of each set
A = (
A1 ∪
A2) ∈
B,
A ⊆
M. More precisely, we exhaustively examine all
pairs consisting of a
set A ∈
B and a
pipe z ∈
S, where
s2(
A) ⊆
z<. For that, we use the following order on pairs (
A,
z): ascending order of the cardinality |
A| of
A; for equal cardinalities, we arbitrarily fix an order on the first coordinate; for a fixed
A, we fix any order extending the relation < over pipes
z ∈
S from leaves to the root. According to this order, the algorithm constructs a
final tree
G =
G(
A,
z) with leaf set
A, partition
A = (
A1 ∗
A2), and root edge
t in
G which starts in
z. The algorithm constructs a tree
G so that
A1 =
t1< and
A2 =
t2<, where
t1 and
t2 are children of the edge
t. The algorithm also calculates
C(
A,
z), the sum of costs of embeddings into
G over all subtrees in
P with root edges
e ∈
E(
A) plus the cost of the minimum embedding of
G itself into the subnetwork
Sz. This whole sum will be called the
cost C(
A,
z)
of the tree G(
A,
z). Clearly, the tree
G* =
G(
M,
r) obtained at the output of the algorithm is a solution to the intermediate tree problem, and its
cost equals
C(
M,
r). The flow chart of the algorithm is shown in
Figure 8.
The induction base for a one-element set {m} = A and a leaf pipe z ∈ S, the tree G(A,z) consists of m and the root edge in m, then C(A,z) = .
Consider the induction step. It includes four cases, in which T denotes a candidate for G(A,z) and C denotes a candidate for the cost C(A,z). In Cases 1 and 2 below, z ∈ S forks in pipes z1 and z2, and in Case 2 it can also be a leaf in S.
If S is a tree, Case 4 is impossible, so only one of Cases 1–3 can be applied. Moreover, if s2(A) ⊆ t1<, then we may only proceed with Case 1; if s2(A1) ⊆ t1<, s2(A2) ⊆ t2<, we may only proceed with Case 3; if none of these conditions is satisfied, we proceed with Case 2. However, below we present the general algorithm assuming S to be a network.
(1) Let z− ∈ S be a tree-type node (a hybrid node is considered in Case 4), and assume that an edge t ∈ G leaves z and enters the pipe z1, i.e., s2′(t−) ≤ z1. This is the case of a loss of edge t at node z− with respect to the embedding s2′. Let T = G(A,z1) and C = (A,z1) + l, where l = elm if z2< contains an element from s2(M) and l = ilm otherwise. Note that A = t< and that for the tree S, the relation s2(A) ⊆ z1< holds in this case only. Symmetrically for t2.
(2) Let
A1,
A2 ∈
B be a partition of
A, and assume that there is
duplication in a pipe
z ∈
S with respect to
s2′, i.e.,
s2′(
t−) =
z ∈
S and the edge
t is forked in
z into
t1 and
t2. Let
T =
G(
A1,
z)^
G(
A2,
z), where ^ denotes the union of trees
G(
A1,
z) and
G(
A2,
z) under a common root, i.e.,
t1< =
A1 and
t2< =
A2.
Denote by
k the number of nodes
v ∈
P, such that
s1′(
v) =
t−, i.e., one edge leaving
v lies in
E(
A1) and the other is in
E(
A2). Let
If s1(P0) intersects with both A1 and A2, or C = C(A1,z) + C(A2,z) + {cdm + ilp·|E(A)|} otherwise.
(3) Let
A1,
A2 ∈
B be a partition of
A and assume that
s2′(
t−) =
z−, i.e., the edge
t is forked at a furcation
z− ∈
S,
z1 is entered by edge
t1 with clade
A1, and
z2 is entered by edge
t2 with clade
A2. This is the case of a coordinated furcation of
t and
z. As in Case 2, let
T =
G(
A1,
z1)^
G(
A2,
z2), and let
k be the number of nodes
v ∈
P with
s1′(
v) =
t−. Let
If s1(P) intersects with A1 and A2, and C = C(A1,z1) + C(A2,z2) + ilp·|E(A)| otherwise. Note that the condition s2(A1) ⊆ t1< and s2(A2) ⊆ t2< on the tree S holds in this case only.
(4) Edge t leaves a hybrid node z and goes into its unique child pipe z0, i.e., s2′(t−) ≤ z0. Then, T = G(A,z0) and C = C(A,z0).
For a final G(A,z) at step (A,z), the algorithm chooses the tree T with the minimum value of C over all cases and all partitions of A. This number C will be the cost C(A,z) of this tree T; at each step (A,z), the algorithm obtains the cost C(A,z) of a final tree G(A,z). For the final output tree, the algorithm chooses G* = G(M,r), where r is the root pipe in S. This G* is a solution to the intermediate tree problem. All these facts are proved below. □
Proof of Theorem 1. The proof is constructed by induction, in the dynamic programming style typically used for the optimization of a loss function, the latter being in our case the cost of the tree G.
The algorithm exactness. Using induction on all pairs (A,z), let us check that for each of the above four cases the cost c(T,A,z) of the tree T specified in this case equals the number C given there. In particular, the cost of the final tree G* = G(M,r) is C(M,r). The induction base, as well as induction steps of Cases 1 and 4, are obvious. Consider Case 3 of the algorithm; Case 2 is considered similarly. Recall that c(T,A,z) = c(s1′) + c(s2′), where c(s1′) is the total cost of embeddings into T of subtrees in P with root edges from E(A) (we call it the first part of the cost) and c(s2′) is the cost of the embedding of T into Sz (the second part of the cost); denote c(T,A,z) = 1(T,A,z) + 2(T,A,z). In Case 3, we obtain 1(T,A,z) = 1(T1,A1,z1) + 1(T2,A2,z2) + d, where T1 and T2 are subtrees with root edges t1 and t2, and d is the total cost of duplications in t and losses in t_ for the embedding of subtrees from P into T. Similarly, 2(T,A,z) = 2(T1,A1,z1) + 2(T2,A2,z2). Hence, c(T,A,z) = c(T1,A1,z1) + c(T2,A2,z2) + d. By the induction hypothesis, C(A1,z1) = c(T1,A1,z1) and C(A2,z2) = c(T2,A2,z2). It remains to show that {…} = d, where {…} is the third term in the expression for C. Indeed, the edge t is entered by |E(A)| edges from P, and the edges t1 and t2 are entered, respectively, by |E(A1)| and |E(A2)| edges that are children of the edges entering t. From the subtrees with root edges in E(A), delete everything that is below the edges entering t1 or t2. We obtain |E(A)| trees, in total with |E(A1)| + |E(A2)| leaves. Let I be the set of inner nodes in them. Clearly, |I| = |E(A1)| + |E(A2)| − |E(A)|. Every node in I is either a duplication in t or a furcation in t_, and vice versa, any such duplication or furcation corresponds to a node in I. By the definition, the number of furcations is k, and all other nodes in I are duplications. Among the |E(A1)| + |E(A2)| edges entering t1 or t2, exactly 2k do not generate a loss in t′−. Losses in t′− are either all explicit or all implicit, which depends on the sets A1 = t1< and A2 = t2< only. In the first case, we obtain the first expression for C. In the second case, we have k = 0, which implies that there are no duplications in t; i.e., |E(A1)| + |E(A2)| = |E(A)|. This follows from the facts that P and G are trees and from the definition of s′.
Now, following the same induction, we prove that the cost
C(
A,
z) of the tree
G(
A,
z) constructed at step (
A,
z) is the desired minimum in the intermediate tree problem. Let
X be a minimum tree with leaf set
A; by the condition, all its clades belong to some
B which with respect to
P,
Sz, and some embedding
s2:
Xt →
Sz is minimal in the intermediate tree cost; in particular,
s2:
Xt →
Sz is a minimum embedding. For
X, the conditions of one of Cases 1–4 of the algorithm are satisfied. Let, for example, the conditions of Case 3 be fulfilled; the other cases are considered similarly. Denote by
f(
X) =
s1′(
X) +
s2′(
X) the minimized cost in the intermediate tree problem. Denote by
X1 and
X2 the subtrees with root edges
t1 and
t2 embedded in the subnetworks
Sz1 and
Sz2. Their clades are some sets
A1 and
A2 in
B. Repeating the arguments of the preceding paragraph, we obtain
f(
X) =
f(
X1) +
f(
X2) +
d, where
d is the sum of duplications in
t in losses in
t_ when embedding the subtrees from
P into
X. Then, we have
here,
C(
A1∪
A2,
z) is the value of
C computed with respect to the partition
A1 and
A2. Hence,
f(
G(
A,
z)) =
f(
X).
The algorithm runtime. We construct sets E(A) for all A ∈ B in a time of the order of |B|·|P|. Inclusions and intersections for sets in B are computed in a time of the order of |B|2 ·|M|. Inclusions of sets from B into clades of the network S are computed in a time of the order of |B|·|S|·|M|. All partitions of each set A ∈ B into sets A1,A2 ∈ B are computed in a time of the order of |B|3; for every triple A, A1, A2 of sets one checks that A1 and A2 are disjoint and |A1| + |A2| = |A|. For every A ∈ B and every partition into A1 and A2, one computes the number k of nodes v ∈ P in a time of the order of |P|. The number of sets A and of partitions of A is no greater than |B|, so the total computation time for all k and for all A is of the order of |B|2 ·|P|.
At each induction step when constructing the trees G(A,z), for each A ∈ B and each pipe t we examine at most |B| partitions of A. For each partition, the cost C is computed in a constant time. Therefore, the total construction time for all G(A,z) is of the order of |B|2·|S|. The total runtime of the algorithm is at most of the order of |B|·(|B|·|M| + |S|·|M| + |B|2 + |B|·|P| + |B|·|S|). □
Remark 1. If S is a tree, then the described algorithm can be generalized so that to take into account horizontal transfers when computing the embedding of G into S. Such an event assumes that pipes in S are divided into time layers, which often results in occurrence of nodes of degree 2 (see [22]). A horizontal transfer consists of transferring an edge of G from one pipe of S into another pipe lying in the same time layer. A transfer may have a furcation at the source or do not have it (in the first case, the furcation occurs in the pipe, with one of the child edges remaining in its pipe and the other being transferred). To obtain the corresponding algorithm, we need to add two more cases to the described-above algorithm. These are furcation of an edge e in z with transferring one of the child edges to another pipe in the same time layer, and transferring an edge e (without furcation) to another pipe of the same layer. In both cases, it is not necessarily required that s2(A) ⊆ z<. 2.6. Example of the Algorithm Operation
Let us show how the algorithm works using an example from [
6], where
S = (((
a,
b),
c),(
f,(
d,
e))) is a tree,
P = (((
a,
f),
c),(
e,(
b,
d))),
M = {
a,
b,
c,
d,
e,
f}; and
s1 and
s2 map each leaf to itself. Let the cost of all explicit losses be 2, of all implicit losses be 1, and of all duplications be 3; and collection
B′ is as follows: {
a,
f}, {
a,
f,
c}, {
b,
d}, {
e,
b,
d}, {
a,
b}, {
a,
b,
c}, {
d,
e}, {
f,
d,
e}, the set
M itself, and all one-element sets in
M. The embedding
s′:
P →
S is shown as an image tree in
Figure 9.
Construct the main parameter B. The root pipe adds to B′ the sets {a,b,f,d}, {a,f,e}, {b,c,d}, {c,e}, {a,b,c,f,d}, {a,c,f,e}, {a,b,d,e}, and {b,c,d,e}. The left-hand child pipe adds to it the sets {a,c} and {b,c}, and the righ-hand one adds {f,d} and {f,e}. Clearly, B satisfies condition (*).
Construct trees G(A,z) for all sets A ∈ B, z ∈ S, z being the ancestor edge for the set A of leaves. For one-element and two-element sets A, the trees are trivial. Consider three-element sets A. For the set A = {a,f,c}, the trees ((a,f),c) and ((a,c),f) have the same cost. For {e,b,d}, the tree (b,(d,e)) is the one of the minimum cost among the two trees. For {a,b,c} and {f,d,e}, the trees ((a,b),c) and (f,(d,e)) are those of the minimum costs among the three trees. For {a,f,e} and {b,c,d}, the trees (a,(f,e)) and ((b,c),d) are those of the minimum costs among the two trees.
For the set {a,b,f,d} in B, the minimum cost is given by the partition into {a,b} and {f,d}; for {a,c,f,e}, by the partition into {a,c} and {f,e}; for {a,b,d,e} and {b,c,d,e}, by the partitions into {a,b} and {d,e} and into {b,c} and {d,e}, respectively.
For the set {
a,
b,
c,
f,
d} in
B, the minimum cost is given by the partition into {
a,
b,
f,
d} and {
c}. Finally, for the whole
M, the minimum cost is given by {
a,
b,
f,
d} and {
c,
e}. Thus, the resulting tree
G* is (((
a,
b),(
f,
d)),(
c,
e)), the resulting cost
C* = 26, and the obtained embeddings
s1′:
P →
G and
s2′:
G →
S are shown in
Figure 10. In total, they contain 2 duplications and 11 losses. This is the global minimum
G** of the cost of a tree
G in the unconditional problem.
For all costs equal to 1, our algorithm outputs the same tree
G*. For the same input data, the algorithm from [
8] has constructed the tree
G = (((
c,
a),
f),(
b,(
d,
e))); the corresponding embeddings are shown in
Figure 11. Here, in total, there are 3 duplications and 11 losses.
If we confine ourselves with the main parameter
B′, our algorithm constructs a tree with a cost strictly greater than in the unconditional problem; namely, 3 duplications and 11 losses (see
Figure 12). In [
8], there are also 3 duplications and 11 losses, but our embeddings are quite different (see
Figure 11 and
Figure 12).
2.7. ILS-Minimum Embedding of a Tree into a Network and Theorem 3
Let
G and
S be any given binary tree and binary network, respectively, and let
s″:
G → S be an embedding. The
Incomplete Linear Sorting (
ILS) event is an edge
e ∈
G entering a pipe
t ∈
S provided that at least two edges enter
t (see [
9]). The
locus of an
ILS event is the pipe
t. The
ILS-
cost over a pipe
t ∈
S entered by
kt ≥ 1 edges from
G is defined as
ct =
c·(
kt –1), where
c is the cost of a single
ILS plus the costs of all duplications and all losses in
t.
The ILS-cost over
S is defined as
plus the costs of all duplications and all losses in
S, where
t runs over pipes
t ∈
S entered by at least one edge from
G; the latter is denoted as
t ∈
S↓. Such pipes will be called
non-empty, while all others are said to be
empty. Clearly,
, where
n is the number of non-empty pipes in
S.
An ILS-minimum embedding of s′: G → S is defined to be an embedding with the total minimum cost of , where n is the number of non-empty pipes in s′, plus the costs of all duplications and losses. This cost is called the ILS-minimum cost of s′; it reflects how much s′ differs from an isomorphism.
Similarly, we define an ILS event for a given set {se: Ge → S} of embeddings of different trees Ge into a given network S. Edges occurring in S from different se are considered equally. The ILS-cost of a set {se} is defined by the same expression , where n is the number of pipes that are non-empty for at least one of the embeddings se, plus the costs of all duplications and all losses in S. A set {se} is called an ILS-minimum set of embeddings if the ILS-cost of {se} is minimal as compared to any set {re: Ge → S} of embeddings of these Ge into S with some conditions on all se and re. This cost is called the ILS-minimum cost of {se}.
A
bridge in
S is a pipe
t for which any path from the super-root to the clade
t< contains
t. In our case, this is equivalent to the following condition:
t is a pipe such that its removal makes the undirected graph corresponding to
S disconnected. A leaf pipe is always a bridge. A
block in
S is an inclusion-maximal set of nodes in
S for which the induced subgraph does not contain a bridge. Two blocks either coincide or are disjoint. For example, in
Figure 6, bridges are the pipe
u7-
u6 and also the root and leaf pipes; blocks are the super-root, leaves, and two complements to the edge
u7-
u6 without the root or leaf pipes. Note that blocks of a network
S form a rooted, though now non-binary, tree. For a network
S of
level k any block contains at most
k hybrid nodes. This number
k is a measure of distantness between
S and the tree.
We say that pipe t enters block B if B contains its head t− but does not contain its tail t+; such a t is unique for B and is a bridge in S. Pipe t leaves B if B contains its tail t+ but does not contain its head t−; there can be many pipes leaving B, and all of them are bridges in S. In what follows, for any pipe t ∈ S we assume a unique block Bt in S, such that t− ∈ B; however, we consider the embedding Ge → St. The mapping t ↦ Bt between incoming pipes t and blocks B in S (except for the super-root) is a one-to-one mapping.
For an arbitrary non-empty set A of leaves in an arbitrary network S, E(A) is defined exactly as in the above case where S is a tree. Namely, for an edge e ∈ G the relation e ∈ E(A) means the following: s(e<) ⊆ A and there is no e1 ∈ G for which e1 > e and s(e1<) ⊆ A. It is easily seen that for any edge e entering a pipe t ∈ S we have e ∈ E(t<) or ∃e′ > e (e′ ∈ E(t<)). As is noted above, if S is a tree, then finally there remains only one disjunction term e ∈ E(t<).
Let s″: G → S be an embedding, and introduce the following condition:
(*) each non-root pipe t which is a bridge in S is entered by precisely edges from E(t<).
We have proved the following result.
Theorem 3. Let s: G0 → S0 be an arbitrary mapping from leaves of a binary tree G to leaves of a binary network S. The algorithm has been constructed which outputs an ILS-minimum continuation s′: G → S of s with the property (*). Its runtime is of the order of |G|·|S|·k·24k, where |·| is the number of nodes in the tree or network and k is the level of the network S. It is exact if the cost of a single ILS is not less than the sum of the duplication cost and the cost of an implicit loss.
In [
9], a construction algorithm was suggested for a minimum (with respect to
ILS events only) embedding of a tree into a network of level 1. Additionally, in [
9] (section “Conclusion”), a method was outlined to generalize that algorithm to the case of a network of any level
k. Our algorithm uses the idea from [
9] combined with ideas of the algorithm of Theorem 1.
Denote by cd, el, il, and ci, respectively, the costs of duplication, explicit loss, implicit loss, and ILS event.
Claim 1. If ci ≤ cd + el, then there exists an ILS-minimum embedding s*: G → S such that property (*) is valid.
Proof. Let s′ be any ILS-minimum embedding. We successively rearrange s′ to a desired embedding s*; to this end, we exhaustively examine all non-root bridges t ∈ S in an arbitrary order. The algorithm that we will construct now is not used in what follows; i.e., we need Claim 1 as an existence result only. Namely, in the Proof of Theorem 3 we construct an embedding s′: G → S, which possesses property (*) and is minimal among the embeddings with property (*); considering Claim 1, it will be absolutely minimal.
We perform the proof by contradiction. Assume that t ∈ S and we have the following:
(**) edge e1 ∈ G enters t, e1 ∉ E(t<), and e1 is an edge the most distant from the root among edges satisfying these two conditions.
Let us show that a sibling edge e2 of e1 also satisfies all the three conditions, and rearrange s′ so that again to obtain a minimum embedding s″ such that both e1 and e2 do not enter t, there arises one new edge entering t, and the number of edges entering all other pipes does not increase. We repeat this rearrangement until we find an edge e1 satisfying condition (**).
Thus, take e′ > e1 (e′ ∈ E(t<); such an e′ is unique. Consider a path φ from the root through the edges e′ and e2 to some leaf and a path ψ of pipes that contain φ so that ψ ends at t<. Then, ψ passes through t. Assume that some edge e3 in φ enters t. Then, we must have e3 = e2. Indeed, by the choice of e1, the edge e3 on φ cannot lie below e2, and since e1 enters t, it cannot lie above e2. Thus, e1 and e2 enter t and are forking from some edge e, i.e., they fork at e− ∈ G. Let z = s′(e−).
If z is a pipe, then e1 and e2 appeared in z as a result of a duplication and entered child pipes of z, either both entered the same pipe z1 or each one entered its own. In the first case, the duplication can be moved to z1, child of z, thus reducing the numbers of losses and ILS events when computing the embedding cost of s′; in the second case, the duplication can be eliminated by moving it to z−. In both cases, the cost of the resulting new embedding becomes strictly less, which contradicts the minimality of s′. The case that z is a hybrid node is impossible.
If
z is a tree-type node, then the edges
e1 and
e2 arise in
z and are continued to
t. Consider a new embedding
s″, which differs from
s′ only by the fact that
s″(
e−) =
t, and draw a path
pe along any of the paths
pe1 or
pe2 (see
Figure 13).
For s″, one explicit loss appears at z. In the node t+, one ILS event is replaced with a duplication at t and loss at z. The embedding cost does not increase, but now the number of edges e entering t is less by 1, and the number of edges entering any other pipe have not increased. By repeating this modification, we find the desired ILS-minimum embedding s*. □
Let a set D of hybrid pipes in a network S be fixed. For an embedding s″: Ge → St of a subtree G with root edge e into a subnetwork of S with root pipe t, a block Bt ⊂ S is said to be D-coordinated if D is the set of non-empty hybrid pipes in B. For the same embedding s″, denote by N(t,D) the set of all non-empty pipes in Bt. The cardinality of this set will play an important role in what follows.
For a set M = {Ge → St: e ∈ X} of embeddings of subtrees into a fixed subnetwork, where X is a fixed set of edges of G, a block Bt ⊂ S is said to be D-coordinated if D is the set of hybrid pipes in Bt that are non-empty for at least one embedding in M.
Claim 2. Let e ∈ G, t ∈ S and Bt be a block in S with t− ∈ Bt. We construct an algorithm which, given any leaf mapping s: Ge → St and any set D of hybrid pipes in a block Bt ⊂ S, outputs a set Y, such that Y = N(t,D) if there exists a continuation s″: Ge → St of s that D-coordinates Bt. The runtime of the algorithm is if the order of |Bt|.
Proof. Let such an s″ exist. The algorithm exhaustively examines bridges that leave Bt and records which of them are empty. For that, we use the fact that a bridge x leaving Bt is empty with respect to s″ if, and only if, the clade of x is disjoint with s″(e<). Then, the algorithm exhaustively examines tree-type pipes x ∈ Bt from leaves to the root and labels x as empty, if and only if, either x = t or both child pipes of x are empty; labels of hybrid pipes are known by the condition. □
For any embedding of leaves s0: G0 → S0, we delete from S the subnetwork of every pipe, such that its clade is disjointed with s0(G0) and its sibling clade is not. In the resulting graph S′, we combine maximum-length sequences t1 > … > tn of pipes (ti+1 is a child pipe to ti) in S for each of which (for 1 ≤ i ≤ n − 1) one child subnetwork has been deleted into a «compound» pipe t, which will be referred to as new; we again denote the resulting binary network by S′. For every new pipe in S′, denote l(t) = n ≥ 2; for all other pipes in S′, denote l(t) = 1. The leaf mappings s0: G0 → S0 and s0: G0 → S0′ coincide.
For a binary network S′, we formulate a new minimum embedding problem. We replace the search for a minimum embedding s′: G → S with the search for a minimum embedding s′: G → S′. The latter is more convenient due to the fact that for every tree pipe t ∈ S′ there are edges e1 and e2 that enter, respectively, the child pipes t1 and t2. Furthermore, the cardinality of the network S′ is not greater (and typically less) than the cardinality of S.
Thus, we define an implicit loss in S′ to be a pair (e,t) where the edge e enters a new pipe t ∈ S′; if a new pipe is the root pipe r ∈ S′, then e is the root edge in G. Therefore, to several implicit losses in the former sense there corresponds a single implicit loss in the new sense. The cost of a new implicit loss is set to be il ·(l(t) − 1), where il is the cost of a single implicit loss equal to its former value. The locus of an implicit loss is assumed to be the pipe t. An explicit loss and a duplication are defined as above and have the original costs el and ed. An ILS event is defined as above; it is a pipe entered by strictly greater than one edge. However, the cost of a new ILS event on a pipe t is ci·(kt − 1)·l(t), where kt is the number of edges entering t. We split this cost into a sum of two terms, the first ILS term and the second ILS term, ci·(kt − 1) and ci·(kt − 1)·(l(t) − 1). The cost of a new ILS event on a network S is , where kt ≥ 1 and where n is the number of non-empty pipes in S′. Here, we will also distinguish between the first ILS term and the second ILS term. The ILS-minimum continuation and the ILS-minimum cost are defined as above, taking into account all the above-mentioned events. Thus, the new problem is to find an ILS-minimum continuation s′: G → S′ of s0 with property (*). This problem is of interest in connection with the following fact.
Claim 3. Given a solution of the new problem by a linear-time algorithm, one can construct a solution of the original problem with property (*).
Proof. Let s′: G → S′ be a solution of the new problem for a new network S′, and let its ILS-minimum cost be c. Define an embedding s: G → S for the original problem by keeping the values s(g) = s′(g) for all s′(g) except for s′(g) = t, where t is the new pipe in S′. In the latter case, we have s(g) = tn, where tn ∈ S is the lowermost pipe in t ∈ S′. The ILS cost of s is also c. Assume that there exists a solution s*: G → S of the original problem with an ILS-minimum cost c′ < c. Then, define an embedding s**: G → S′ by keeping the values s**(g) = s*(g) for all s*(g) except for s*(g) = ti, where ti is a part of the new pipe t. In the latter case, we have s**(g) = t. However, the ILS costs of s* and s** are the same. Let us check property (*) for s. Consider edges entering pipe t with respect to s′; with respect to s, they enter either t (if t is not new) or t1, t2, …, tn (otherwise). Since E(t<) = E(t1<) = … = E(tn<), we conclude that any bridge t ∈ S is entered by exactly the edges from E(t<). □
We again denote the resulting network by S; its size is of the order of k·|G|, where k is the level of the original (or equivalently, of the resulting) network.
For a given embedding s″: Ge → St, define the B-cost of a block B to be the total cost of all events in B plus the cost of duplications and implicit losses in the pipe t. For a given set {se: Ge → St} of embeddings, let the B-cost of a block B be the total cost of all events in B plus the cost of all duplications and implicit losses in t that arise from all these embeddings se.
For an arbitrary set D of hybrid pipes in Bt, by a D-minimum embedding we call an embedding s″: Ge → St satisfying property (*), with D coordinating the block Bt, and having the smallest Bt-cost among all embeddings Ge → St that satisfy these two conditions. We denote such an s″ by s(e,t,D) and denote its Bt-cost by C(e,t,D). If there is no any such s″, we say that a D-minimum embedding is not defined.
For a set {Ge|e ∈ E(t<)}, by a D-minimum set of embeddings we call a set {se: Ge → St} of embeddings satisfying property (*), with D coordinating the block Bt, and having the smallest Bt-cost among all sets {re} that satisfy these two conditions. We denote this Bt-cost by c(t,D) and denote this D-minimum set by s(t,D).
Denote by c(t) the minimum of c(t,D) over all D specified above, which is attained at some D0, and define s(t) = s(t,D0). If |Bt| = 1, then D = ∅. If t is a bridge, then s(t) is called a bridge set, and its elements are referred to as bridge embeddings.
The result of Theorem 3 consists of constructing a set s(r) with r a root pipe in S such that s(r) consists of a single embedding, which is the desired ILS-minimum continuation s′: G → S.
Algorithm for ILS-minimum embedding of a tree G into a network S. By induction, during a forward pass of the algorithm we construct C(e,t,D), and during its backward pass we could construct s(e,t,D), though the latter is not needed. The usual linear order on these pairs is fixed: pipes t from the leaves to the root, and for each fixed t we look through edges e from the leaves to the root.
For each pair (e,t), with properties s(e<) ⊆ t< (if t is not a bridge) or e ∈ E(t<) (otherwise), and for each set D of hybrid pipes in Bt, we find a D-minimum embedding s(e,t,D) of Ge → St and its Bt-cost, or notice that they are undefined. We denoted this Bt-cost by C(e,t,D), and for this embedding, denote the set of non-empty pipes in Bt by N(e,t,D). Additionally, for each pair (e,t) we compose a list D(e,t) of admissible sets D.
For an arbitrary set
M of pipes, define
. By Claim 2, the set
N(
t,
D) can be constructed an algorithm which is linear in |
S| given the set
D; this allows, for any
t ∈
S and for any sets
D1 and
D2, to compute in time of the order of|
S|
2·2
4k the values of
f(
N(
t,
D1) ∩
N(
t,
D2)) and
f(
N(
t1,
D1) ∩
N(
t2,
D2)), where
t1 and
t2 are children of
t. It is specifically these intersections that are used in what follows. The flow chart of the algorithm is shown in
Figure 14.
The induction base for constructing D-minimum embedding and bridge embedding of Ge → St: for a leaf pipe t− ∈ Bt and a leaf edge e ∈ G, D(e,t) = {∅} and the embeddings are trivial, its B-cost being C(e,t,D) = il·(l(t)–1).
The induction step for
D-minimum embeddings. Consider the following four cases, analogous to those considered in
Section 2.4, where
U(
e,
t,
D), by the proof given below, will be found to be equal to the
B-costs of the embeddings defined below. In Cases 1–4 below, computations are performed if all the involved quantities are well defined and all the required conditions are fulfilled. Recall that
D is a set of hybrid pipes in
B.
- (1)
Assume that edge e leaves t and “swings” to the child pipe t1 at a furcation in S, and t1 ∈ D (if t1 is a hybrid pipe) or e ∈ E(t1<), D = ∅ (if t1 is a bridge) or D is arbitrary (if t1 is a tree-type pipe and is not a bridge). The embedding continues (since e+ ↦ t) an already known (D\t1)-minimum (if t1 is not a bridge) or bridge (otherwise) embeddings of Ge → St1. Its B-cost is, respectively, U(e,t,D) = U(e,t1,D\t1) + el + il·(l(t)–1) if D\t1 ∈ D(e,t1), or U(e,t,D) = el + il ·(l(t)–1), symmetrically for t2. For each e, t, and D, if, in this case or in one of the cases below, we have U(e,t,D) < ∞, then we add D to the list D(e,t).
- (2)
Assume edge e forks in t into e1 and e2. Exhaustively examine all pairs (D1,D2) where D1 and D2 are any sets of hybrid pipes in B, such that D1 ∈ D(e1,t) and D2 ∈ D(e2,t). For each D′ = D1 ∪ D2 ⊂ Bt we set the initial value of U(e,t,D′) = ∞. For every (D1,D2) compute
If V is strictly less than the current value of U(e,t,D′) for D′ = D1 ∪ D2, set U(e,t,D′) = V. In particular, we compute U(e,t,D).
- (3)
Assume that edge e forks at a furcation in S, t1 is entered by edge e1, and t2 is entered by e2. For each D′ we set the initial value of U(e,t,D′) = ∞. If t1 and t2 are not bridges, exhaustively examine all pairs (D1,D2), where D1 and D2 are sets of hybrid pipes in B, such that D1 ∈ D(e1,t1), D2 ∈ D(e2,t2). For every (D1,D2) compute
If V is strictly less than the current value of U(e,t,D′) for D′ = D1 ∪ D2 ∪ {t1,t2}, set U(e,t,D′) = V. Here, by the definition, the set {t1,t2} contains only hybrid pipes.
If exactly one of the pipes t1 and t2 is a bridge (assume that this is t1), exhaustively examine each sets D2 of hybrid pipes in B such that D2 ∈ D(e2,t2). For D2, if e1 ∈ E(t1<) compute V = U(e2,t2,D2) + il ·(l(t) − 1), and if V is strictly less than the current value of U(e,t,D2 ∪ {t2}), set U(e,t,D2 ∪ {t2}) = V. Here, similarly, the set {t2} contains only hybrid pipes.
If both pipes t1 and t2 are bridges, e1 ∈ E(t1<), and e2 ∈ E(t2<), set U(e,t,∅) = il ·(l(t) − 1). In particular, we compute U(e,t,D).
- (4)
Assume that edge t′ leaves t, enters its unique child pipe t1, and t1 ∈ D (if t1 is a hybrid pipe); e ∈ E(t1<) and D = ∅ (if t1 is a bridge). Then, the embedding continues (since e+ ↦ t) the already known (D\t1)-minimum (if t1 is not a bridge) or bridge (otherwise) embedding Ge → St1, and U(e,t,D) = U(e,t1,D\t1) + il ·(l(t)–1) if D\t1 ∈ D(e,t1), or U(e,t,D) = il ·(l(t)–1).
Choose an appropriate case that has the minimum B-cost U(e,t,D) and memorize it, as well as the sets D1 and D2 in Cases 2 or 3. These will be used in the further backward pass of the algorithm, which will result in the construction of a s(e,t,D) embeddings. There is no need to compute U(e,t,D) for e, t, and D; these values are needed only for arguments satisfying the given conditions, which are necessary to proceed with the next cases of the algorithm. As is proved below, C(e,t,D) = U(e,t,D); accordingly, the backward pass of the algorithm finds the embeddings s(e,t,D): Ge → St.
- (5)
The induction step for bridge embeddings after the corresponding construction step for D-minimum embeddings. Let t be a bridge. By property (*) (see the Proof of Theorem 3 below), t is entered by precisely the edges from E(t<). Arbitrarily order edges e1, e2, e3,…,em ∈ E(t<). Similarly to what was performed in Case 2 (but without the cd term), by exhaustively examining pairs D1, D2 of sets of hybrid pipes in B, where D1 ∈ D(e1,t) and D2 ∈ D(e2,t), for every D = D1 ∪ D2 we find D1′ and D2′ that correspond to a D-minimum two-element set M2 = {Ge1 → St, Ge2 → St} of embeddings. More precisely, for each D we minimize over D1 and D2, where D = D1 ∪ D2, the quantity V in the expression from Case 2 without the cd term, i.e.,
Assume that the minimum of
V is finite and is attained at some
D1′ and
D2′.
Denote it by
U(
e1,
e2,
t,
D) and add
D to the list
D(
e1,
e2,
t) of admissible sets. Note that the set
M2 = {
s(
e1,
t,
D1′),
s(
e2,
t,
D2′)} itself is not used here. Let
D1′ =
f21(
D) and
D2′ =
f22(
D). Again examining pairs
D1,
D2, where
D1 ∈
D(
e3,
t) and
D2 ∈
D(
e1,
e2,
t), for every
D =
D1 ∪
D2 we find
D1″ and
D2″ that correspond to a
D-minimum 3-element
set M3 = {
Ge1 →
St,
Ge2 →
St,
Ge3 →
St} of embeddings. More precisely, for each
D we minimize over
D1 and
D2, where
D =
D1 ∪
D2, the quantity
Assume that the minimum is finite and is attained at some D1″ and D2″. Denote it by U(e1,e2,e3,t,D) and add D to the list D(e1,e2,e3,t) of admissible sets. Note that the set M3 = {s(e1,t,f2,1(D1″)), s(e2,t,f2,2(D1″), s(e3,t,D2″)} itself is note used here. Proceeding further in this way with computing U(e1,e2,…ei,t,D), for each D we construct a D-minimum set Mm = {Ge1 → St, Ge2 → St,…,Gem → St} of embeddings, which we denote by s(t,D). The set s(t,D) contains exactly m = |E(t<)| embeddings, where m is the number of edges entering t.
Choose a bridge set s(t) = s(t,D0) with the minimum U(e1,e2,…em,t,D) over all D. For t = r, the root pipe, this set s(r) consists of a single element, which is the desired embedding s′: G → S with the minimum B-cost. The ILS-minimum cost s′ equals the sum over all bridges t ∈ S of Bt-costs of sets s(t) plus the costs of ILS events in t, where Bt corresponds to t. □
Proof of Theorem 3. To prove the exactness of the algorithm, we first show that the expressions for computing U(e,t,D) in the algorithm output the Bt-cost of the corresponding embeddings, which, clearly, D-coordinate the block Bt. For Cases 1 and 4, note the following, the embedding Ge → St, as compared to the embedding Ge → St1, has one non-empty pipe (t1) more and one entry (into t1) more. Therefore, the first ILS term is the same for both embeddings. This is also true for the second ILS term as well, since it equals 0 at t. For Case 3, the embedding Ge → St, as compared to the union of embeddings Ge1 → St1 and Ge2 → St2 has two non-empty pipes (t1 and t2) more and two entries (into t1 and t2) more. Therefore, the first ILS term of the embedding Ge → St and of the union Ge1 → St1 and Ge2 → St2 are the same if pipes that are non-empty for both embeddings of the union are counted once each time. Similarly, the second ILS term of the embedding Ge → St and of the union are the same if pipes that are non-empty for the embeddings are counted l(t)−1 times. This explains the presence of the term ci·f(N(e1,t1,D1) ∩ N(e2,t2,D2)) or, in Case 2, ci·f(N(e1,t,D1) ∩ N(e2,t,D2)) in the expressions for C. Additionally, for Case 2 let us check that in the expression for V we may use the values U(e1,t,D1) = C(e1,t,D1) and U(e2,t,D2) = C(e2,t,D2), i.e., regard the embeddings Ge1 → St and Ge2 → St, respectively, to be D1-minimum and D2-minimum. Indeed, the minimum Bt-cost is attained at some embedding s: Ge → St, to which there correspond some embeddings s1: Ge1 → St and s2: Ge2 → St with some D1 and D2. If, for example, s1 is not D1-minimum, then by replacing it with a D1-minimum one, we strictly decrease the Bt-cost of the embedding s, a contradiction. Note the following, here we use the fact that N(t,D1) is independent of the embedding Ge1 → St that D1-coordinates the block Bt. Correctness of using the values U(e1,t1,D1) = C(e1,t1,D1) and U(e2,t2,D2) = C(e2,t2,D2) in Case 3 is prove similarly. All arguments justifying Case 2 can be literally repeated to justify the fact that in the induction step for bridge embeddings, every Mi is D-minimal and U(e1,e2,…ei,t,D) is its Bt-cost. In particular, this is true for a bridge set Mm.
Using induction from leaves to the root, let us prove that for any bridge t the constructed bridge set s(t) is ILS minimal; then, in particular, so is s′. For leaf pipes, this is obvious; let t be a non-leaf pipe. The total cost of events for s(t) can be represented as a sum of three terms, the Bt-cost, the total cost of events in subnetworks St′ for bridges t′ leaving Bt, and the cost of the ILS event in t. The construction of s(t) implies the minimality of the first of these terms. The induction hypothesis on the ILS minimality of bridge embeddings for t′ satisfying property (*) implies the minimality of the second term. Property (*) for t implies that the third term is constant and equals ci·(|E(t<)| − 1)·l(t).
Let us check property (*) for the embedding s′. Embedding s(e,t,D) satisfy property (*)′: if e′ enters a bridge t′, then e′ ∈ E(t′<). Indeed, the conditions in Steps 1–4 ensure this property for bridges leaving B. After that, it can be proved by induction on the construction of s(e,t,D). Since any element of a bridge set is one of the s(e,t,D), property (*)′ is satisfied for bridge sets too, in particular, for s′. Now (*) follows from the fact that for any bridge t and any e ∈ E(t<) a path in G from the root to a leaf passing through e contains an edge entering t.
The algorithm runtime. For each pair (e,t) in the construction of a D-minimum embedding we look through at most 2k sets D, or at most 4k pairs (D1,D2). If t is a bridge pipe, it additionally requires at most |E(t<)|·24k pairs (D1,D2), which yields the overall estimate of the order of |G|·|S|·k·24k. □
Remark 1. The minimum embeddings of a tree into a network, taking ILS events into account or disregarding them, can be different. For example, let a tree be G = (l1,l2), and let a network S be the network shown in Figure 13 with the following modifications: at the bottom there is a single leaf l, and side branches in S are continued to the root (two rhombuses are added). Then, a minimum embedding G → S without taking into account ILS is shown in Figure 13a (no duplications and losses, one ILS), and a minimum embedding taking into account ILS (if the duplication cost is less that the ILS cost) is shown in Figure 13b (no ILS and no losses, one duplication). Remark 2. The problem of constructing a minimum embedding s′: G → S can also be posed without the assumption that the embedding is a continuation of some leaf mapping s. If we disregard ILS events in this problem, then such an embedding can be constructed in a standard way: by exhaustively examining pairs (edge e, pipe t) and choosing for each pair the best of the four cases given in Section 2.4; here, the condition s′(e<) ⊂ t< cannot be used. The authors are unaware of any polynomial algorithm for constructing a minimum embedding s′ in this setting with taking into account the ILS (even if S is a tree).