1. Introduction
Let us fix a finite set Ω = {1,…, r}, whose elements denote different colors, and consider a finite random set X containing a random number of elements of colors 1, …, r. The simplest colored branching process can be defined as an evolution of a population in which all individuals live the same fixed time, and then, when the lifetime ends, each individual generates (independently of others) a random set of “children” containing individuals of different colors and distributed as X. We will assume that the evolution starts with a unique initial individual. It is convenient to represent this process as a random genealogical tree with individuals as vertices and each vertex connected with its “children” by edges. Denote by Xn the set of all genetic lines of length n starting from the initial individual. The colored branching process can degenerate (when, starting from some n, all of the sets Xn are empty) or, otherwise, evolve endlessly. Every genetic line x = (x1, …, xn) ∈ Xn generates an empirical measure δx,n on the set of colors Ω in accordance with the following rule: for each i ∈ Ω, the value of δx,n(i) is the fraction of elements in the string (x1, …, xn) having the color i.
Denote by μ(i), where i ∈ Ω, the expectation of the number of all elements x ∈ X having the color i. Let ν be an arbitrary probability measure on Ω. An analog to the McMillan theorem that will be proven below (see Theorem 11) asserts that, under the condition of non-degeneracy of the colored branching process, the cardinality of the set {x ∈ Xn | δx,n ≈ ν} almost surely (i.e., with probability one) has asymptotics (up to logarithmic equivalence) of order e−nρ(ν,μ), where:

ρ(ν, μ) = Σi∈Ω ν(i) ln(ν(i)/μ(i)).   (1)

Formally, the value of ρ(ν, μ) coincides with the usual Kullback–Leibler divergence and differs from it only in the fact that, in our setting, the measure μ is not a probability measure, and so ρ(ν, μ) can be negative.
This is the first main result of the paper. It generalizes the usual McMillan theorem, which deals with Ωn rather than Xn. Its proof is based essentially on the Legendre duality between the Kullback–Leibler divergence and the so-called spectral potential, which was first introduced in [1] and then investigated in depth in [2,3].
We also investigate Hausdorff dimensions of random fractals defined in terms of the limit behavior of the sequence of empirical measures δx,n. Let X∞ be the set of all infinite genetic lines. Fix an arbitrary vector θ = (θ(1), …, θ(r)) ∈ (0, 1)r and define the following metric on X∞:

dist(x, y) = θ(i1) ⋯ θ(in), where n is the length of the longest common initial segment of the lines x and y, and i1, …, in are the colors of its members.
Denote by V any set of probability measures on Ω. The second main result of the paper (concerning Hausdorff dimensions of random fractals) is the following: under the condition of non-degeneracy of the colored branching process,

dimH {x ∈ X∞ | δx,n ≻ V} = supν∈V d(ν, μ, θ)

almost surely, where δx,n ≻ V means that the sequence δx,n has at least one limit point in V and d(ν, μ, θ) is the Billingsley–Kullback entropy defined as:

d(ν, μ, θ) = ρ(ν, μ) / Σi∈Ω ν(i) ln θ(i).
These results are the essence of Theorems 12 and 17 below.
The systematic study of random self-similar fractals was initiated in the works of Falconer [4], Mauldin and Williams [5] and Graf [6]. They considered the attractor K of a random version of an iterated function system (IFS) in ℝn, letting the scaling factors θ(i) ∈ [0, 1) be independent and identically distributed (i.i.d.) random variables. The main result proven in [4–6] is the following: if the open set condition holds and the expected number of nonzero scaling factors is greater than one, then K ≠ ∅ with positive probability, and with the same probability (i.e., almost surely, provided K ≠ ∅), its Hausdorff dimension s is the unique solution to the equation E(θ(1)s + ⋯ + θ(r)s) = 1. Recently, the research was extended to the case of random self-affine fractals in ℝn (see [7,8]) and random affine code tree fractals [9].
On the other hand, Billingsley in [10,11] considered fractals in the space of infinite sequences Ωℕ, where Ω is a finite set, defined in terms of the limit behavior of empirical measures. He investigated Hausdorff dimensions of such fractals and, in particular, obtained formulas analogous to Formulas (2) and (4) (with Ωℕ instead of X∞ and μ(i) ≡ 1). Afterwards, Billingsley’s ideas and methods were transferred to strange attractors of dynamical systems in ℝn (see [12,13]).
The present research combines features of both random self-similar fractals and subsets of nonrandom self-similar fractals defined in terms of the limit behavior of empirical measures. This combination provides its novelty.
The paper can be divided into two parts. The first one (Sections 3–7) contains preliminary information necessary for our research. All theorems in this part are known and are supplied with appropriate references; however, some of them are modified for the sake of convenience in further usage. Most of them are proven here for completeness of presentation and the reader’s convenience. The second part (Sections 2 and 8–11) contains essentially new results, which have been briefly formulated above.
2. Examples and Discussion
Example 1. Suppose that Ω = {1, 2}, and the random set X ⊂ Ω contains the elements 1, 2 with probabilities p(1), p(2), respectively. Obviously, for i = 1, 2, the expectation of the number of elements in X that are equal to i is p(i). Therefore, in the notation of the previous section, μ(i) = p(i).
Let us define a sequence of random sets Xn ⊂ Ωn in the following way: each Xn contains each of 2n strings x = (x1, …, xn) ∈ Ωn with probability p(x1) ⋯ p(xn). In other words, under the condition (x1, …, xn) ∈ Xn, the set Xn+1 contains the string (x1, …, xn, 1) with probability p(1) and the string (x1, …, xn, 2) with probability p(2). In particular, Xn+1 contains both with probability p(1)p(2) and contains neither of them with probability (1 − p(1))(1 − p(2)).
Let ν be the probability distribution on Ω such that ν(1) = 1/3 and ν(2) = 2/3. Question: how many strings (x1, …, xn) in which the fraction of elements equal to one is close to 1/3 and the fraction of elements equal to two is close to 2/3 are there in the random set Xn? According to Formula (1), under the condition of non-degeneracy of the stochastic process X1, X2, X3, …, the number of those strings with probability one has asymptotics of order e−nρ(ν,μ), where:

ρ(ν, μ) = (1/3) ln(1/(3p(1))) + (2/3) ln(2/(3p(2))).

In particular, if p(1) = p(2) = 2/3, then

ρ(ν, μ) = (1/3) ln(1/2) = −(ln 2)/3

and, respectively, e−nρ(ν,μ) = 2n/3. On the other hand, if p(1) = 1 and p(2) = 1/3, then

ρ(ν, μ) = (1/3) ln(1/3) + (2/3) ln 2 = (1/3) ln(4/3)

and e−nρ(ν,μ) = (3/4)n/3. The latter means that for all large enough n, there are no strings x = (x1, …, xn) in the random set Xn such that δx,n(1) ≈ 1/3 and δx,n(2) ≈ 2/3.
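The two computations above are easy to check numerically; the following sketch (plain Python, with the helper name kullback_action being our own choice) evaluates ρ(ν, μ) for both cases.

```python
import math

def kullback_action(nu, mu):
    # rho(nu, mu) = sum_i nu(i) * ln(nu(i) / mu(i)); here mu need not be a
    # probability measure, so the result may be negative.
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

nu = (1/3, 2/3)

# Case p(1) = p(2) = 2/3: rho = -(ln 2)/3, so e^{-n rho} = 2^{n/3} grows.
rho1 = kullback_action(nu, (2/3, 2/3))

# Case p(1) = 1, p(2) = 1/3: rho = (ln(4/3))/3 > 0, so e^{-n rho} = (3/4)^{n/3}
# decays: eventually no such strings survive.
rho2 = kullback_action(nu, (1, 1/3))

print(rho1, rho2)
```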
Example 2 (random Cantor dust). To each string x = (x1, …, xn) ∈ Xn from Example 1, let us put in correspondence a line segment I(x) ⊂ ℝ as follows:

I(1) = [0, 1/4], I(2) = [1/2, 1],

and so on. If a certain segment I(x) = I(x1, …, xn) has already been defined, then I(x1, …, xn, 1) and I(x1, …, xn, 2) are the left quarter and the right half of I(x), respectively. The random Cantor dust C∞ is defined as:

C∞ = ∩n≥1 ∪x∈Xn I(x).
Put θ(1) = 1/4 and θ(2) = 1/2. Let the distance function on the random set of infinite strings X∞ be defined as:

dist(x, y) = θ(x1) ⋯ θ(xn), where n is the number of coinciding initial coordinates of the strings x and y.
Consider the mapping π: X∞ → C∞ that takes any infinite string x = (x1, x2, …) ∈ X∞ to the point π(x) = ∩n I(x1, …, xn). Obviously, this mapping is bi-Lipschitz, and hence, the Hausdorff dimensions of any subset A ⊂ X∞ and its image π(A) ⊂ ℝ are the same. Question: what is the Hausdorff dimension of the set π(A), where:

A = {x ∈ X∞ | δx,n → ν} with ν(1) = 1/3 and ν(2) = 2/3?
The answer is given by Formulas (3) and (4). For example, if for each segment I(x1, …, xn), the “probabilities of survival” of its left quarter I(x1, …, xn, 1) and its right half I(x1, …, xn, 2) are p(1) = p(2) = 2/3, then, under the condition π(A) ≠ ∅, with probability one we have:

dimH π(A) = 1/4.
As far as we know, no theorem proven or published previously gives answers to the questions posed in Examples 1 and 2. On the other hand, if, in the setting of Example 2, one replaces the set π(A) by the whole of π(X∞), then dimH π(X∞) is computed, for instance, in [4–6] and Theorem 15.2 in [14]. The answer is given there as a solution to the so-called Bowen equation. Moreover, in those papers, dimH π(X∞) was computed in a more general setting when the scaling factors θ(i) ∈ [0, 1) are random.
Therefore, the following problem arises naturally: how can one compute dimH A and dimH π(A) for the set A defined in Formula (5) in the case of random scaling factors θ(i)? The present work does not solve this problem.
Falconer and Miao started the investigation of random self-affine fractals [7]. They considered a random iterated function system (RIFS) generated by a finite collection of contracting affine maps Si: ℝn → ℝn and encoded by a random tree T∞ that is defined exactly in the same manner as the random set of infinite genetic lines X∞ above. It turned out that under the condition of non-degeneracy T∞ ≠ ∅ and some additional assumptions, the attractor of the RIFS with probability one has a constant dimension (Hausdorff and box-counting), which can be found as a solution to a certain equation. Recently, analogous results were obtained for random affine code tree fractals [9] and for uniformly random self-similar fractals [8]. Thereby, a natural problem arises: is it possible to compute almost sure Hausdorff dimensions for subsets of random fractals like those in [7–9], provided the subsets are defined in terms of the limit behavior of the corresponding empirical measures (say, as in Formula (5))?
One more possible direction in which to generalize the results of the paper is the case when the distribution ν, the empirical measures δx,n and/or the scaling factors θ(i) are defined not on the set of colors Ω, but on the whole set of infinite strings Ωℕ.
3. The Spectral Potential
The notion of the spectral potential was introduced in [1–3]. It was defined there as the logarithm of the spectral radius of a weighted shift or transfer operator. We will use below a simplified version of the spectral potential that corresponds to an operator of weighted integration.
Let X be an arbitrary finite set. Denote by B(X) the space of all real-valued functions on X, by M(X) the set of all positive measures on X and by M1(X) the set of all probability distributions on X.
Every measure μ ∈ M(X) defines a linear functional on B(X) of the form:

μ[φ] = Σx∈X φ(x) μ(x).

It is easily seen that this functional is positive (i.e., it takes nonnegative values on nonnegative functions). If, in addition, the measure μ is a probability measure, then this functional is normalized (i.e., μ[1] = 1).
Consider the nonlinear functional:

λ(φ, μ) = ln μ[eφ],

where φ ∈ B(X) and μ ∈ M(X). We will call it the spectral potential. It is monotone (if φ ≥ ψ, then λ(φ, μ) ≥ λ(ψ, μ)), additively homogeneous (that is, λ(φ + t, μ) = λ(φ, μ) + t for each constant t) and analytic in φ.
Define a family of probability measures μφ on X, depending on the functional parameter φ ∈ B(X), by means of the formula:

μφ = eφ−λ(φ,μ) μ.
Evidently, each measure μφ is equivalent to μ and has the density eφ−λ(φ,μ) with respect to μ.
Let us compute the first two derivatives of the spectral potential with respect to the argument φ. We introduce the notation:

λ′(φ, μ)[f] = d/dt λ(φ + tf, μ)|t=0.

This is nothing else than the derivative of the spectral potential in the direction f at the point φ. An elementary computation shows that:

λ′(φ, μ)[f] = μ[eφ−λ(φ,μ) f] = μφ[f].   (7)

In other words, the derivative λ′(φ, μ) coincides with the probability measure μφ. Now, put:

λ″(φ, μ)[f, g] = d/dt λ′(φ + tg, μ)[f]|t=0

and compute this derivative using the just obtained Formula (7):

λ″(φ, μ)[f, g] = μφ[fg] − μφ[f] μφ[g].
In probability theory, the expression μφ[f] is usually called the expectation of a random variable f with respect to the probability distribution μφ, and the expression μφ[fg] − μφ[f] μφ[g] is called the covariance of random variables f and g. In particular, the second derivative:

λ″(φ, μ)[f, f] = μφ[f2] − μφ[f]2

is equal to the variance of the random variable f with respect to the distribution μφ. Since the variance is nonnegative, it follows that the spectral potential is convex in φ.
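For a finite set X, these formulas are easy to verify numerically. The sketch below (the helper names spectral_potential and mu_phi are ours) uses the simplified form λ(φ, μ) = ln μ[eφ] introduced above and compares a central-difference approximation of the directional derivative with the expectation in Formula (7).

```python
import math

def spectral_potential(phi, mu):
    # lambda(phi, mu) = ln sum_x e^{phi(x)} mu(x)
    return math.log(sum(math.exp(p) * m for p, m in zip(phi, mu)))

def mu_phi(phi, mu):
    # probability measure with density e^{phi - lambda(phi, mu)} w.r.t. mu
    lam = spectral_potential(phi, mu)
    return [math.exp(p - lam) * m for p, m in zip(phi, mu)]

mu  = [0.5, 1.5, 2.0]        # a positive measure, deliberately not normalized
phi = [0.3, -0.1, 0.7]
f   = [1.0, -2.0, 0.5]       # direction of differentiation

# central-difference approximation of the derivative of lambda in direction f
h = 1e-6
num_deriv = (spectral_potential([p + h * v for p, v in zip(phi, f)], mu)
             - spectral_potential([p - h * v for p, v in zip(phi, f)], mu)) / (2 * h)

# Formula (7): the derivative is the expectation of f under mu_phi
exact = sum(v * m for v, m in zip(f, mu_phi(phi, mu)))
print(num_deriv, exact)
```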
4. The Kullback Action
Denote by B∗(X) the space of all linear functionals on B(X). Obviously,

M1(X) ⊂ M(X) ⊂ B∗(X).
For any measure μ ∈ M(X) and nonnegative function φ ∈ B(X), define a product φμ ∈ M(X) by means of the formula (φμ)(x) = φ(x)μ(x).
The Kullback action is the following functional of two arguments ν ∈ B∗(X) and μ ∈ M(X):

ρ(ν, μ) = ν[ln φ], if ν is a probability measure of the form ν = φμ, and ρ(ν, μ) = +∞ in all of the other cases.

To be more precise, “all of the other cases” fit into at least one of the three categories: (1) probability measures ν, such that ν(x0) > 0 and μ(x0) = 0 for a certain x0 ∈ X; (2) non-normalized functionals ν; and (3) nonpositive functionals ν.
As far as we know, in the literature, this functional was defined only for probability measures ν and μ. Different authors call it differently: the relative entropy, the deviation function, the Kullback–Leibler information function and the Kullback–Leibler divergence.
When ν is a probability measure, the Kullback action can be defined by the explicit formula:

ρ(ν, μ) = Σx∈X ν(x) ln(ν(x)/μ(x)).   (9)

In particular, if μ(x) ≡ 1, then the Kullback action differs only in sign from Shannon’s entropy:

ρ(ν, μ) = −H(ν), where H(ν) = −Σx∈X ν(x) ln ν(x).   (10)
In the case of a probability measure μ, the Kullback action is nonnegative and vanishes only if ν = μ. Indeed, if the functional ν is not a probability measure that is absolutely continuous with respect to μ, then ρ(ν, μ) = +∞. Otherwise, if ν is a probability measure of the form ν = φμ, then Jensen’s inequality and strong convexity of the function f(x) = x ln x imply:

ρ(ν, μ) = μ[φ ln φ] ≥ μ[φ] ln μ[φ] = 0

(since μ[φ] = ν[1] = 1), and the equality ρ(ν, μ) = 0 holds if and only if φ is constant almost everywhere, and therefore, ν coincides with μ.
Every measure μ ∈ M(X) can be written in the form μ = cμ1, where c = μ[1] and μ1 ∈ M1(X). If ν ∈ M1(X), then Formula (9) implies:

ρ(ν, μ) = ρ(ν, μ1) − ln c ≥ −ln μ[1].

In case ν ∉ M1(X), this inequality holds as well, since the Kullback action is infinite.
Theorem 1. The spectral potential and the Kullback action satisfy the Young inequality:

ρ(ν, μ) ≥ ν[ψ] − λ(ψ, μ), ψ ∈ B(X),   (12)

that turns into equality if and only if ν = μψ.

Proof. If ρ(ν, μ) = +∞, then the Young inequality is trivial. If ρ(ν, μ) < +∞, then by the definition of the Kullback action, the functional ν is an absolutely continuous probability measure of the form ν = φμ, where φ is a nonnegative density. In this case:

ν[ψ] − λ(ψ, μ) = ν[ψ] − ln μ[eψ] = ν[ψ] − ln ν[eψ/φ] ≤ ν[ψ] − ν[ln(eψ/φ)] = ν[ln φ]

(where at the last step, we have used Jensen’s inequality and concavity of the logarithm function). Since ρ(ν, μ) = ν[ln φ], this formula implies inequality (12).
Recall that μψ = eψ−λ(ψ,μ)μ. Therefore, if ν = μψ, then by definition:

ρ(ν, μ) = ν[ψ − λ(ψ, μ)] = ν[ψ] − λ(ψ, μ).

Vice versa, suppose that ρ(ν, μ) = ν[ψ] − λ(ψ, μ). Then, subtracting from the above equality the Young inequality ρ(ν, μ) ≥ ν[φ] − λ(φ, μ), one obtains:

λ(φ, μ) − λ(ψ, μ) ≥ ν[φ − ψ] for every φ ∈ B(X),

which implies that ν = λ′(ψ, μ). Finally, λ′(ψ, μ) coincides with μψ. □
Theorem 2. The Kullback action ρ(ν, μ) is the Legendre transform of the spectral potential:

ρ(ν, μ) = sup {ν[ψ] − λ(ψ, μ) | ψ ∈ B(X)}.   (13)

Proof. By the Young inequality, the left-hand side of Formula (13) is not less than the right-hand one. Therefore, it is enough to associate with any functional ν ∈ B∗(X) a family of functions ψt, depending on the real-valued parameter t, on which the equality in Formula (13) is attained.

Suppose first that ν is a probability measure that is absolutely continuous with respect to μ. Therefore, ν has the form ν = φμ, where φ is a nonnegative density. Consider the family of functions:

ψt = max{ln φ, −t}, t > 0.

When t → +∞, we have the following relations:

ν[ψt] → ν[ln φ] = ρ(ν, μ), λ(ψt, μ) = ln μ[eψt] → ln μ[φ] = 0,

and so, Formula (13) is proven.
In all other cases, when ν is not an absolutely continuous probability measure, by definition, we have ρ(ν, μ) = +∞. Let us consider these cases one after another.
If ν is a probability measure that is singular with respect to μ, then there exists x0 ∈ X, such that μ(x0) = 0 and ν(x0) > 0. In this case, consider the family of functions:

ψt(x) = t for x = x0 and ψt(x) = 0 otherwise.

Then ν[ψt] − λ(ψt, μ) = tν(x0) − ln μ[1] goes to +∞ while t increases, and so, Formula (13) holds.
If the functional ν is not normalized, then put ψt(x) ≡ t. In this case, the expression:

ν[ψt] − λ(ψt, μ) = tν[1] − t − ln μ[1] = t(ν[1] − 1) − ln μ[1]

is unbounded from above, and hence, Formula (13) is still valid.
Finally, if the functional ν is not positive, then there exists a nonnegative function φ, such that ν[φ] < 0. Consider the family ψt = −tφ, where t > 0. For this family:

ν[ψt] − λ(ψt, μ) ≥ −tν[φ] − λ(0, μ) = −tν[φ] − ln μ[1] → +∞

as t → +∞, and Formula (13) still remains in force. □
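The Young inequality and its equality case are easy to test numerically on a small set X; in the following sketch the helper names kullback_action and spectral_potential are our own, and the supremum in the Legendre transform is probed with random functions ψ.

```python
import math, random

def spectral_potential(psi, mu):
    # lambda(psi, mu) = ln sum_x e^{psi(x)} mu(x)
    return math.log(sum(math.exp(p) * m for p, m in zip(psi, mu)))

def kullback_action(nu, mu):
    # rho(nu, mu) = sum_x nu(x) ln(nu(x)/mu(x))
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

mu = [0.4, 1.1, 0.8]     # positive measure, not normalized
nu = [0.2, 0.5, 0.3]     # probability measure of the form nu = phi * mu

rho = kullback_action(nu, mu)

# equality in the Young inequality is attained at psi = ln(d nu / d mu)
psi_star = [math.log(n / m) for n, m in zip(nu, mu)]
attained = sum(n * p for n, p in zip(nu, psi_star)) - spectral_potential(psi_star, mu)

# random psi never exceed the supremum: rho - (nu[psi] - lambda(psi, mu)) >= 0
random.seed(0)
gaps = []
for _ in range(1000):
    psi = [random.uniform(-3, 3) for _ in mu]
    gaps.append(rho - (sum(n * p for n, p in zip(nu, psi))
                       - spectral_potential(psi, mu)))
print(rho, attained, min(gaps))
```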
Corollary 1. The functional ρ(⋅, μ) is convex and lower semicontinuous on the space B∗(X).
Proof. These are properties of the Legendre transform. □
5. The Local Large Deviations Principle and the McMillan Theorem
As above, we keep the following notation: X is a finite set; B(X) stands for the space of real-valued functions on X; B∗(X) is the space of linear functionals on B(X); M1(X) is the set of all probability measures on X; and M(X) is the set of all positive measures on X.
With each finite sequence x = (x1, …, xn) ∈ Xn, we associate an empirical measure δx,n ∈ M1(X), which is supported on the set {x1, …, xn} and assigns to every point xi the measure 1/n. The integral of any function f with respect to δx,n looks like:

δx,n[f] = (f(x1) + ⋯ + f(xn))/n.
For any measure μ ∈ M(X), we denote by μn its Cartesian power, which is defined on Xn.
Theorem 3 (The local large deviations principle). For any positive measure μ ∈ M(X), functional ν ∈ B∗(X) and ε > 0, there exists a neighborhood O(ν), such that:

μn{x ∈ Xn | δx,n ∈ O(ν)} ≤ e−n(ρ(ν,μ)−ε).   (14)

On the other hand, for any ε > 0 and any neighborhood O(ν), the following asymptotic estimate holds:

μn{x ∈ Xn | δx,n ∈ O(ν)} ≥ e−n(ρ(ν,μ)+ε) for all sufficiently large n.   (15)

If ρ(ν, μ) = +∞, then by the difference ρ(ν, μ) − ε in Formula (14), we mean an arbitrary positive number.
In the case of a probability measure μ, Theorem 3 is a partial case of Varadhan’s large deviations principle (the explicit formulation of which can be found, e.g., in [15,16]). Therefore, this theorem can be deduced from Varadhan’s large deviations principle by means of mere renormalization of μ. Nevertheless, we will prove it independently for the sake of completeness.

Proof. By Theorem 2, for any ε > 0, there exists ψ ∈ B(X), such that:

ν[ψ] − λ(ψ, μ) > ρ(ν, μ) − ε.
Consider the probability measure μψ = eψ−λ(ψ,μ)μ. Obviously,

μn(x) = en(λ(ψ,μ)−δx,n[ψ]) μψn(x) for every x ∈ Xn.

Define a neighborhood of the functional ν as follows:

O(ν) = {δ ∈ B∗(X) | δ[ψ] > ρ(ν, μ) + λ(ψ, μ) − ε}.

Then, for each x ∈ Xn with δx,n ∈ O(ν), we have μn(x) ≤ e−n(ρ(ν,μ)−ε) μψn(x), and summation over all such x yields Formula (14). Thus, the first part of Theorem 3 is proven.
If ρ(ν, μ) = +∞, then estimate (15) is trivial. Therefore, it is enough to prove it only for the case when ν is a probability measure of the form ν = φμ, and the Kullback action ρ(ν, μ) = ν[ln φ] is finite. Fix any number ε > 0 and any neighborhood O(ν). Define the sets:

Yn = {x ∈ Xn | δx,n ∈ O(ν), |δx,n[ln φ] − ν[ln φ]| < ε}

(the last inequality in the braces implies that φ(xi) > 0 at each point of the sequence x = (x1, …, xn)). Note that for x ∈ Yn:

μn(x) = e−nδx,n[ln φ] νn(x) ≥ e−n(ρ(ν,μ)+ε) νn(x).

By the law of large numbers, νn(Yn) → 1 as n → ∞. Therefore, for all sufficiently large n,

μn{x ∈ Xn | δx,n ∈ O(ν)} ≥ μn(Yn) ≥ e−n(ρ(ν,μ)+ε) νn(Yn),

which implies the estimate (15). □
Corollary 2 (The McMillan theorem). For any probability measure ν ∈ M1(X) and ε > 0, there exists a neighborhood O(ν), such that:

|{x ∈ Xn | δx,n ∈ O(ν)}| ≤ en(H(ν)+ε).

On the other hand, for any neighborhood O(ν) and ε > 0:

|{x ∈ Xn | δx,n ∈ O(ν)}| ≥ en(H(ν)−ε) for all sufficiently large n.

Here, H(ν) denotes Shannon’s entropy defined in Formula (10).

Proof. This follows from Formulas (9) and (10) and Theorem 3 if we set μ(x) = 1 for all x ∈ X. □
6. Hausdorff Dimension and the Maximal Dimension Principle
Let us define the Hausdorff dimension of an arbitrary metric space Ω.
Suppose that Ω is covered by at most a countable collection of sets U = {Ui}. Denote by |U| the diameter of this covering: |U| = sup |Ui|, where |Ui| is the diameter of Ui. For every α ∈ ℝ, put:

mes(U, α) = Σi |Ui|α.

The Hausdorff measure (of dimension α) of the metric space Ω is:

mes(Ω, α) = limε→0 inf {mes(U, α) | |U| ≤ ε},

where U is at most a countable covering of Ω. Obviously, mes(U, β) ≤ |U|β−α mes(U, α) for all β ≥ α. This implies the basic property of the Hausdorff measure: if mes(Ω, α) < ∞ for some α, then mes(Ω, β) = 0 for all β > α.

The Hausdorff dimension of the space Ω is the number:

dimH Ω = inf {α ∈ ℝ | mes(Ω, α) = 0}.

In other words, dimH Ω = α0 if mes(Ω, α) = 0 for all α > α0 and mes(Ω, α) = ∞ for all α < α0.
Note that, according to this definition, we obtain dimH Ω = −∞ when Ω = ∅ and dimH Ω = 0 when Ω is nonempty finite or countable.
Below, we will consider the space of sequences:

Xℕ = {x = (x1, x2, …) | xt ∈ X, t ∈ ℕ}.

Let x = (x1, x2, …) ∈ Xℕ. Denote by Zn(x) the set of all sequences y = (y1, y2, …) whose first n coordinates coincide with the same coordinates of x. This set will be called a cylinder of rank n. In particular, Z0(x) = Xℕ. All cylinders generate the Tychonoff topology on the space Xℕ and the cylinder σ-algebra of subsets of Xℕ.
On the set of all cylinders, we fix an arbitrary positive function η that possesses the following two properties: first, if Zn(x) ⊂ Zm(y), then η(Zn(x)) ≤ η(Zm(y)) and, second, η(Zn(x)) → 0 as n → ∞ at each point x ∈ Xℕ. Define the cylindrical metric on Xℕ by means of the formula:

dist(x, y) = η(Zn(x,y)(x)), where n(x, y) = max{n | Zn(x) = Zn(y)}, and dist(x, y) = 0 if x = y.   (20)
Evidently, the diameter of Zn(x) in this metric coincides with η(Zn(x)).
Suppose that, in addition to cylindrical metric (20) on Xℕ, there is given a Borel measure μ on Xℕ. The function:

dμ(x) = lim infn→∞ ln μ(Zn(x)) / ln |Zn(x)|

is called the (lower) pointwise dimension of the measure μ.
The next theorem provides an effective tool for computing the Hausdorff dimensions of various subsets of Xℕ.
Theorem 4. Suppose A ⊂ Xℕ. If there exists a finite Borel measure μ on Xℕ, such that dμ(x) ≤ d for each point x ∈ A, then dimH A ≤ d. On the contrary, if dμ(x) ≥ d for each x ∈ A and the outer measure μ∗(A) is positive, then dimH A ≥ d.
It follows that if dμ(x) ≡ d on the whole of the subset A ⊂ Xℕ, then its dimension is equal to d.
A weakened version of the second part of Theorem 4, in which the condition dμ(x) ≥ d is replaced by the stronger one μ(Zn(x)) ≤ |Zn(x)|d, is usually called the mass distribution principle.
Theorem 4 was first proven by Billingsley in the case when the function η in Formula (20) is a probability measure on Xℕ (see Theorems 2.1 and 2.2 in [11]). An analog of this theorem for subsets A ⊂ ℝr was proven in [12,13].
Proof. Every cylinder Zn(x) is, in fact, a ball in metric (20) whose radius equals its diameter and, vice versa, any ball in this metric coincides with a cylinder. Besides, any two cylinders Zn(x) and Zm(y) either have an empty intersection or one of them is embedded in the other. Therefore, while computing the Hausdorff measure and the dimension of a subset A ⊂ Xℕ, it is enough to operate only with disjoint coverings of A by cylinders.
Suppose first that dμ(x) < α for all points x ∈ A. Then, for each x ∈ A, there exist arbitrarily small cylinders Zn(x) satisfying the condition |Zn(x)|α < μ(Zn(x)). Using cylinders of this kind, we can arrange a disjoint covering U of the set A of arbitrarily small diameter. For this covering, we have:

mes(U, α) = Σi |Ui|α ≤ Σi μ(Ui) ≤ μ(Xℕ) < ∞,

and hence, dimH A ≤ α. Thus, the first part of the theorem is proven.
Suppose now that dμ(x) > α for all points x ∈ A. Define the sets:

Aε = {x ∈ A | μ(Zn(x)) ≤ |Zn(x)|α for all cylinders Zn(x) of diameter less than ε}.

Obviously, A = ∪ε>0 Aε. Hence, there exists an ε, such that μ∗(Aε) > 0. Let U be a disjoint covering of A by cylinders of diameters less than ε. From the definition of Aε, it follows that mes(U, α) ≥ μ∗(Aε). Therefore, dimH A ≥ α, and thus, the second part of the theorem is proven. □
Each point x = (x1, x2, …) ∈ Xℕ generates a sequence of empirical measures δx,n on the set X:

δx,n(i) = |{t ≤ n | xt = i}| / n, i ∈ X.

In other words, δx,n(i) is the fraction of coordinates of the vector (x1, …, xn) coinciding with i.
For every probability measure ν ∈ M1(X), let us define its basin B(ν) as the set of all points x ∈ Xℕ, such that δx,n converges to ν:

B(ν) = {x ∈ Xℕ | δx,n → ν}.

Evidently, basins of different measures do not intersect each other and are nonempty. If x ∈ B(ν) and y ∈ Xℕ differs from x in only a finite number of coordinates, then y ∈ B(ν). This implies the density of each basin in Xℕ.
Every measure ν ∈ M1(X) generates the Bernoulli distribution Pν = νℕ on Xℕ. By the strong law of large numbers, the basin B(ν) has probability one with respect to Bernoulli distribution Pν, and its complement has zero probability Pν. In particular, any basin different from B(ν) has zero probability.
Points that do not belong to the union of all basins will be called irregular. The set of irregular points has zero probability with respect to any distribution Pν, where ν ∈ M1(X). As a result, Xℕ turns out to be decomposed into the disjoint union of the different basins and the set of irregular points.
Let us fix some numbers θ(i) ∈ (0, 1) for all elements i ∈ X = {1, …, r} and define a cylindrical θ-metric on Xℕ by the rule:

dist(x, y) = θ(x1) ⋯ θ(xn(x,y)), where n(x, y) = max{n | Zn(x) = Zn(y)}.   (21)

It is a partial case of cylindrical metric (20), corresponding to η(Zn(x)) = θ(x1) ⋯ θ(xn).
For each measure ν ∈ M1(X) and θ-metric (21), define the quantity:

S(ν, θ) = Σi ν(i) ln ν(i) / Σi ν(i) ln θ(i).   (22)

We will call it the Billingsley entropy, because he was the first who wrote down this formula and applied it to the computation of Hausdorff dimensions [10]. He also expressed this quantity in terms of Shannon’s entropy and the Kullback action:

S(ν, θ) = H(ν) / (H(ν) + ρ(ν, θ)).
Theorem 5. The Hausdorff dimension of any basin B(ν) with respect to θ-metric (21) is equal to the Billingsley entropy S(ν, θ).

A partial case of this theorem, in which θ(1) = ⋯ = θ(r) = 1/r, was first proven by Eggleston [17]. In the complete form, this theorem and its generalizations were proven by Billingsley [10,11].
Proof. Assume first that ν(i) > 0 for every i = 1, …, r. Obviously,

Pν(Zn(x)) = ν(x1) ⋯ ν(xn) and |Zn(x)| = θ(x1) ⋯ θ(xn).

Hence, for each point x ∈ B(ν), we have:

dPν(x) = limn→∞ Σi δx,n(i) ln ν(i) / Σi δx,n(i) ln θ(i) = S(ν, θ).

Applying Theorem 4 to the set A = B(ν) and the measure μ = Pν, we obtain the statement of Theorem 5.

In the general case, the same argument provides only the lower bound dPν(x) ≥ S(ν, θ) for x ∈ B(ν), which implies the lower bound dimH B(ν) ≥ S(ν, θ). The inverse inequality is provided by the next lemma.
Lemma 1. Suppose that Xℕ is equipped with metric (21). Then, for any measure ν ∈ M1(X) and ε > 0, there exists a neighborhood O(ν), such that the Hausdorff dimension of the set:

A = {x ∈ Xℕ | δx,n ∈ O(ν) for infinitely many n}

does not exceed S(ν, θ) + ε.

Proof. Fix a measure ν ∈ M1(X) and an arbitrary positive number κ. By McMillan’s theorem, there exists a neighborhood O(ν), such that for each positive integer n:

|{Zn(x) | δx,n ∈ O(ν)}| ≤ en(H(ν)+κ).   (24)
Decrease this neighborhood in such a way that, in addition to Formula (24), for every measure δ ∈ O(ν), the next inequality holds:

Σi δ(i) ln θ(i) ≤ Σi ν(i) ln θ(i) + κ.   (25)

Then, for every cylinder Zn(x) satisfying the condition δx,n ∈ O(ν), we have the estimate:

|Zn(x)| = en Σi δx,n(i) ln θ(i) ≤ en(Σi ν(i) ln θ(i)+κ).
For any positive integer N, the set A is covered by the collection of cylinders:

UN = {Zn(x) | n ≥ N, δx,n ∈ O(ν)}.

Evidently, the diameter of this covering goes to zero as N increases. Now, we can evaluate mes(UN, α) by means of Formulas (24) and (25):

mes(UN, α) ≤ Σn≥N en(H(ν)+κ) enα(Σi ν(i) ln θ(i)+κ) = Σn≥N exp{n(H(ν) + κ + α Σi ν(i) ln θ(i) + ακ)}.   (26)

If α > S(ν, θ), then we can choose κ > 0 so small that the exponent in braces in Formula (26) is negative, and all of the sum in Formula (26) goes to zero as N → ∞. Therefore, the Hausdorff measure (of dimension α) of the set A is zero, and hence, dimH A does not exceed α. □
We will say that a sequence of empirical measures δx,n condenses on a subset V ⊂ M1(X) (notation: δx,n ≻ V) if it has at least one limit point in V.
Similar to the known large deviations principle of Varadhan [15,16], it is natural to call the next theorem the maximal dimension principle.

Theorem 6. Let the space Xℕ be equipped with cylindrical θ-metric (21). Then, for any nonempty subset V ⊂ M1(X):

dimH {x ∈ Xℕ | δx,n ≻ V} = supν∈V S(ν, θ).   (27)

Proof. The set A = {x ∈ Xℕ | δx,n ≻ V} contains the basins of all measures ν ∈ V. Therefore, by Theorem 5, its dimension is not less than the right-hand side of Formula (27).
It is easily seen from the definition (22) of the Billingsley entropy S(ν, θ) that it depends continuously on the measure ν ∈ M1(X). Consider the closure V̄ of V. Obviously, it is compact. Fix any ε > 0. By Lemma 1, for any measure ν ∈ V̄, there exists a neighborhood O(ν), such that:

dimH {x ∈ Xℕ | δx,n ≻ O(ν)} ≤ S(ν, θ) + ε.   (28)

Let us take a finite covering of V̄ composed of neighborhoods of this sort. Then, the set A = {x ∈ Xℕ | δx,n ≻ V} will be covered by a finite collection of sets of the form {x ∈ Xℕ | δx,n ≻ O(ν)} satisfying inequality (28). By the arbitrariness of ε, this implies the statement of Theorem 6. □
A result very similar to Theorem 6 was proven by Billingsley in Theorem 7.1 of [10].
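The Billingsley entropy and Theorem 5 can be illustrated numerically: along a Pν-typical point, the ratio ln Pν(Zn(x)) / ln |Zn(x)| approaches S(ν, θ). The sketch below (helper names and the sample size are our own choices) also checks Eggleston's case θ(i) ≡ 1/r, where S(ν, θ) reduces to H(ν)/ln r.

```python
import math, random

def billingsley_entropy(nu, theta):
    # S(nu, theta) = sum_i nu(i) ln nu(i) / sum_i nu(i) ln theta(i)
    num = sum(p * math.log(p) for p in nu if p > 0)
    den = sum(p * math.log(t) for p, t in zip(nu, theta))
    return num / den

nu    = (1/3, 2/3)
theta = (1/4, 1/2)          # the theta-metric of Example 2
S = billingsley_entropy(nu, theta)

# Eggleston's case: theta(i) = 1/2 for both colors gives H(nu) / ln 2
H = -sum(p * math.log(p) for p in nu)
eggleston = billingsley_entropy(nu, (1/2, 1/2))

# pointwise dimension along a P_nu-typical point (proof of Theorem 5)
random.seed(0)
n = 200_000
x = random.choices((0, 1), weights=nu, k=n)
ratio = (sum(math.log(nu[t]) for t in x)         # ln P_nu(Z_n(x))
         / sum(math.log(theta[t]) for t in x))   # ln |Z_n(x)|
print(S, eggleston, ratio)
```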
7. Branching Processes
First, let us recall the basic notions about the simplest Galton–Watson branching process.
Suppose that a random variable Z takes nonnegative values k ∈ Z+ with probabilities pk. The Galton–Watson branching process is a sequence of random variables Z0, Z1, Z2,…, such that Z0 ≡ 1, Z1 = Z, and further, each Zn+1 is defined as the sum of Zn independent counterparts of the random variable Z. In particular, if Zn = 0, then Zn+1 = 0, as well. Usually, Zn is understood as the total number of descendants in the n-th generation of a unique common ancestor under the condition that each descendant independently of the others gives birth to Z children.
It is known that in some cases, the posterity of the initial ancestor may degenerate (when, starting from a certain n, all Zn are zeros), and in other cases, it can “flourish” (when Zn grows exponentially). The type of behavior of a branching process depends on the mean number of children of any individual:

m = EZ = Σk≥0 k pk,

and on the generating function of that number:

f(s) = EsZ = Σk≥0 pk sk.
Obviously, the restriction of the function f(s) to the segment [0, 1] is nonnegative, nondecreasing, convex and satisfies f(1) = 1 and f′(1) = m.
In the theory of branching processes (see, for instance, [18,19]), the following statements were proven.
Theorem 7. The generating functions of the number of descendants in the n-th generation:

fn(s) = EsZn

satisfy the recursion relation fn+1(s) = f(fn(s)).
Theorem 8. If m ≤ 1, then the branching process degenerates almost surely (except the case when each individual gives birth to exactly one child). If m > 1, then the probability of degeneration q is less than one and coincides with a unique non-unit root of the equation f(s) = s on the segment [0, 1].
Theorem 9. If m > 1 and EZ2 < ∞, then the sequence Wn = Zn/mn converges almost surely to a random variable W, such that P{W > 0} = 1 − q. If m > 1 and EZ2 = ∞, then, for any number m0 < m, with probability 1 − q:

Zn/m0n → ∞ as n → ∞

(here, q is the probability of degeneration of the branching process).
Thereby, in the case m > 1, there is an alternative for the total number of descendants Zn: either it vanishes at a certain moment n0 (with probability q < 1) or it is asymptotically equivalent to Wmn (with the complementary probability 1 − q), where the random variable W > 0 does not depend on n (except the case EZ2 = ∞, when only the logarithmic equivalence ln Zn ∼ ln mn is guaranteed). All of the other types of behavior of descendants’ totality have zero probability.
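Theorem 8 can be illustrated by a simulation; in this sketch the offspring law, the helper names and the survival threshold are all our own illustrative choices (with the threshold justified by the fact that extinction from z individuals has probability qz).

```python
import random

def extinction_prob(p, tol=1e-12):
    # smallest root of f(s) = s on [0, 1]: iterate q <- f(q) starting from 0
    f = lambda s: sum(pk * s**k for k, pk in enumerate(p))
    q = 0.0
    while abs(f(q) - q) > tol:
        q = f(q)
    return q

def dies_out(p, max_gen=200, survival_bar=1000):
    # one run of the Galton-Watson process started from a single ancestor
    ks = range(len(p))
    z = 1
    for _ in range(max_gen):
        if z == 0:
            return True
        if z > survival_bar:     # extinction from here has probability q**z ~ 0
            return False
        z = sum(random.choices(ks, weights=p, k=z))
    return z == 0

p = [0.25, 0.25, 0.5]            # offspring law with m = 0.25 + 2*0.5 = 1.25 > 1
q = extinction_prob(p)           # here f(s) = s gives the non-unit root q = 1/2
random.seed(1)
freq = sum(dies_out(p) for _ in range(2000)) / 2000
print(q, freq)                   # simulated extinction frequency is close to q
```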
We will exploit these theorems in the study of colored branching processes.
Suppose now that each individual may give birth to children of r different colors (or r different genders, if one likes). We will assume that the posterity of each individual in the first generation is a random set X containing a random number k1 of children of the first color, a random number k2 of children of the second color, and so on, up to a random number kr of children of color r. All of the elements of X (including elements of the same color) are treated as different. The ordered array k = (k1, k2, …, kr) will be called the color structure of the set X of children. Denote by pk the probability of the birth of the set X with color structure k = (k1, k2, …, kr). Naturally, all of the probabilities pk are nonnegative and:

Σk pk = 1.
If an individual x1 gave birth to x2, then x2 gave birth to x3, and so on, up to an individual xn, then the sequence x = (x1,…, xn) will be called a genetic line of length n.
Let us construct a new branching process taking into account not only the total number of descendants, but also the color of each individual and all its upward and downward lineal relations. This process may be treated as a random genealogical tree with a common ancestor in the root and all its descendants in the vertices, where each parent is linked with all its children. In the case of a degenerating population, its genealogical tree is finite, and in the case of a “flourishing” one, the tree is infinite.
Formally, it is convenient to define such a process as a sequence of random sets Xn containing all genetic lines of length n. As the first set X1, we take X. The subsequent sets Xn are built by induction: if Xn is already known, then for all genetic lines (x1, …, xn) ∈ Xn, one defines disjoint independent random sets of children X(x1, …, xn), each with a color structure distributed as in X, and Xn+1 is given by:

Xn+1 = {(x1, …, xn, xn+1) | (x1, …, xn) ∈ Xn, xn+1 ∈ X(x1, …, xn)}.
The stochastic process X1, X2, X3,… built in this way will be referred to as a colored branching process (or unconditional colored branching process, if one wishes to emphasize that the posterity of any individual is independent of its color and genealogy).
8. The McMillan Theorem for Colored Branching Processes
Consider a colored branching process X1, X2, X3, … with a given finite collection of colors Ω = {1, …, r} and a probability distribution {pk}, where k = (k1, …, kr) is the color structure of each individual’s set of children X. We will always assume that X1 = X is generated by a unique initial individual.
For any genetic line x = (x1,…, xn) ∈ Xn, define the spectrum δx,n as the corresponding empirical measure on Ω given by the rule:

δx,n(i) = |{t ≤ n : g(xt) = i}| / n,  i ∈ Ω,    (29)

where g(xt) denotes the color of xt. In other words, δx,n(i) is the fraction of individuals of color i in the genetic line x. Our next goal is to obtain asymptotical estimates for the cardinalities of the random sets:

{x ∈ Xn | δx,n ∈ O(ν)},

where O(ν) is a small neighborhood of the distribution ν on the set of colors Ω.
Denote by μ(i) the expectation of the number of members of color i in X:

μ(i) = E |{x ∈ X : g(x) = i}|.    (30)

Provided all μ(i) are finite, the vector μ = (μ(1),…, μ(r)) can be regarded as a measure on the set of colors Ω. This measure generates the measure μn on Ωn as a Cartesian product.
Define a mapping G : Xn → Ωn by means of the formula:

G(x1,…, xn) = (g(x1),…, g(xn)),

where g(xt) is the color of xt.
Lemma 2. For any ω ∈ Ωn, we have:

E |{x ∈ Xn : G(x) = ω}| = μn(ω).    (31)

Proof. Discard the last coordinate in ω, and let ω′ = (ω1,…, ωn−1). For every genetic line (x1,…, xn−1) ∈ Xn−1, by virtue of the definition of the unconditional colored branching process, we have:

E |{x ∈ X(x1,…, xn−1) : g(x) = ωn}| = μ(ωn).

Evidently, this expression does not depend on x′ = (x1,…, xn−1). Therefore,

E |{x ∈ Xn : G(x) = ω}| = μ(ωn) · E |{x′ ∈ Xn−1 : G(x′) = ω′}|.

Repeated application of the latter equality gives Formula (31).
For the measure μ from Formula (30), let us define the Kullback action:

ρ(ν, μ) = Σi∈Ω ν(i) ln ( ν(i) / μ(i) ),  ν ∈ M1(Ω),

where M1(Ω) is the set of all probability measures on Ω. This formula is a copy of Formula (9).
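Since μ need not be a probability measure here, the Kullback action can take negative values, in contrast with the usual Kullback–Leibler divergence. A small sketch of the computation (names and conventions are ours):

```python
import math

def kullback_action(nu, mu):
    """rho(nu, mu) = sum_i nu(i) * ln(nu(i) / mu(i)).

    `nu` is a probability vector; `mu` is a nonnegative vector that need
    not sum to one, so the result may be negative.  Terms with nu(i) = 0
    contribute 0; nu(i) > 0 with mu(i) = 0 gives +infinity.
    """
    rho = 0.0
    for n, m in zip(nu, mu):
        if n == 0:
            continue
        if m == 0:
            return math.inf
        rho += n * math.log(n / m)
    return rho

# With mu(i) > nu(i) for every i the action is negative:
rho = kullback_action([0.5, 0.5], [1.0, 1.0])  # = -ln 2
```

When μ is itself a probability vector, the same function returns the ordinary (nonnegative) Kullback–Leibler divergence.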
Theorem 10. Suppose X1, X2, X3,… is an unconditional colored branching process with a finite collection of colors Ω. Then, for any ε > 0 and probability measure ν ∈ M1(Ω), there exists a neighborhood O(ν) ⊂ M1(Ω), such that for all natural n:

E |{x ∈ Xn : δx,n ∈ O(ν)}| ≤ e^{n(−ρ(ν,μ)+ε)}.    (32)

On the other hand, for any ε > 0 and any neighborhood O(ν):

E |{x ∈ Xn : δx,n ∈ O(ν)}| ≥ e^{n(−ρ(ν,μ)−ε)} for all large enough n.

If ρ(ν, μ) = +∞, the expression −ρ(ν, μ) + ε in Formula (32) should be treated as an arbitrary negative real number.
Proof. It follows from Formula (29) that, for every genetic line x ∈ Xn, its spectrum δx,n coincides with the empirical measure δω,n of the string ω = G(x). Therefore, by Lemma 2,

E |{x ∈ Xn : δx,n ∈ O(ν)}| = Σ_{ω∈Ωn: δω,n∈O(ν)} E |{x ∈ Xn : G(x) = ω}| = μn{ω ∈ Ωn : δω,n ∈ O(ν)},

and both estimates of the theorem follow from the large deviations principle applied to the measures μn. It is remarkable that this last reference to the large deviations principle serves as a unique “umbilical cord” linking the first three Sections of the paper with the others.
Now, we are ready to state an analog of the McMillan theorem for colored branching processes. Let q∗ be the probability of degeneration of the process (the probability of the event that, starting from some number n, all of the sets Xn turn out to be empty).
Theorem 11. Suppose that X1, X2,… is an unconditional colored branching process with a finite collection of colors Ω. Then, for any ε > 0 and any probability measure ν ∈ M1(Ω), there exists a neighborhood O(ν) ⊂ M1(Ω), such that almost surely:

|{x ∈ Xn : δx,n ∈ O(ν)}| ≤ e^{n(−ρ(ν,μ)+ε)} for all large enough n.    (35)

On the other hand, if ρ(ν, μ) < 0, then, for any neighborhood O(ν) and positive ε, the estimate:

|{x ∈ Xn : δx,n ∈ O(ν)}| ≥ e^{n(−ρ(ν,μ)−ε)} for all large enough n    (36)

holds with probability 1 − q∗ (or almost surely under the condition that our branching process does not degenerate).

Proof. The application of Chebyshev’s inequality to Formula (32) gives:

P{ |{x ∈ Xn : δx,n ∈ O(ν)}| > e^{n(−ρ(ν,μ)+2ε)} } ≤ e^{−nε}.

Summing up these inequalities over all n ≥ N, one obtains:

P{ |{x ∈ Xn : δx,n ∈ O(ν)}| > e^{n(−ρ(ν,μ)+2ε)} for some n ≥ N } ≤ e^{−Nε}/(1 − e^{−ε}).

Since the right-hand side tends to zero as N → ∞, this implies Formula (35) with the constant 2ε instead of ε, which does not change its sense.
Now, we proceed to the second part of the theorem. Let x = −ρ(ν, μ) − ε and, at the same time, let the number ε be so small that x > 0. By the second part of Theorem 10, for any neighborhood O(ν), there exists N, such that:

E |{x ∈ XN : δx,N ∈ O(ν)}| > e^{Nx}.    (37)

Without loss of generality, we may assume that O(ν) is convex.

Now, we wish to construct a Galton–Watson branching process Z1, Z2,… satisfying the conditions:

E Z1 > e^{Nx}  and  Zn ≤ |{x ∈ XnN : δx,nN ∈ O(ν)}|.    (38)

Let the random variable Z1 be the cardinality from Formula (37), namely Z1 = |{x ∈ XN : δx,N ∈ O(ν)}|. For n > 1, set Zn to be the total number of genetic lines:

(x1,…, xnN) ∈ XnN,

such that the spectrum of each segment (xkN+1,…, x(k+1)N), k = 0,…, n − 1, belongs to O(ν). In other words, we treat as “individuals” of the process Z1, Z2, Z3,… those segments (xkN+1,…, x(k+1)N) of genetic lines of the initial process whose spectrum lies in O(ν). Now, the second condition in Formula (38) follows from convexity of O(ν); and from the unconditionality of the initial colored branching process, it can be concluded that the sequence Z1, Z2,… in fact forms a Galton–Watson branching process.
By construction, one has E Z1 > e^{Nx}. In this setting, Theorem 9 asserts that there is an alternative for the sequence Zn: either it tends to zero with a certain probability q < 1, or it grows faster than e^{nNx} with probability 1 − q. In the second case, Formula (38) implies:

|{x ∈ XnN : δx,nN ∈ O(ν)}| ≥ e^{nNx} for all large enough n.    (39)
To finish the proof, we have to do two things: verify that Formula (39) is valid with probability 1 − q∗ and get rid of the multiplier N there. To do this, we will exploit two ideas. First, if the colored branching process X1, X2,… had been generated by m initial individuals instead of a unique one, then Formula (39) would be valid with a probability of at least 1 − q^m. Second, if one genetic line is a part of another one and the ratio of their lengths is close to one, then their spectra are close to each other, as well.
Obviously, the total number of individuals in the n-th generation of the initial branching process X1, X2, X3,… equals |Xn|. The sequence of random variables |Xn| forms a Galton–Watson branching process with the probability of degeneration q∗, that does not exceed q. Therefore, the sequence |Xn| grows exponentially with probability 1 − q∗.
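The degeneration probability of a Galton–Watson process such as |Xn| is, by the classical theory, the smallest fixed point of the offspring generating function f(s) = Σk pk s^k, and it can be found by iterating f from zero. A small numerical sketch (the offspring distribution below is ours, purely for illustration):

```python
def extinction_probability(p, tol=1e-12, max_iter=10_000):
    """Smallest fixed point of the generating function f(s) = sum_k p_k s^k.

    `p[k]` is the probability that an individual has k children in total
    (colors are ignored: only |X| matters for degeneration).  Iterating
    q <- f(q) from q = 0 converges monotonically to q*.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(pk * q**k for k, pk in enumerate(p))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

# Illustrative offspring law: p_0 = 1/4, p_2 = 3/4, so the mean number of
# children is m = 3/2 > 1 and the process survives with probability 2/3.
q_star = extinction_probability([0.25, 0.0, 0.75])
```

Here f(q) = 1/4 + 3q²/4 has the fixed points 1/3 and 1, and the iteration selects the smaller one, q∗ = 1/3; for a subcritical law the same iteration returns q∗ = 1.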
Consider the colored branching process Xk+1, Xk+2, Xk+3,… obtained from the initial one by truncation of the first k generations. It can be thought of as a union of |Xk| independent branching processes generated by all individuals of the k-th generation. Therefore, it satisfies Formula (39) with probability at least 1 − q^{|Xk|}. Hence, for the initial process, with even greater probability, we obtain the condition:

|{x ∈ Xk+nN : δx,k+nN ∈ O∗(ν)}| ≥ e^{nNx} for all large enough n,    (40)

where O∗(ν) is an arbitrary neighborhood of ν containing the closure of O(ν).
Suppose that the sequence |Xn| grows exponentially. For every m ∈ ℕ, we choose numbers k1,…, kN with pairwise distinct residues modulo N and so large that |Xki| ≥ m for each of them. For each k = ki, condition (40) holds with probability at least 1 − q^m, and together, they give the estimate:

|{x ∈ Xn : δx,n ∈ O∗(ν)}| ≥ e^{(n−ki)x} whenever n ≡ ki (mod N) is large enough,

with probability at least 1 − Nq^m. By the arbitrariness of m, this estimate is valid almost surely (under the condition |Xn| → ∞, which takes place with probability 1 − q∗). It is equivalent to Formula (36).
9. Dimensions of Random Fractals (Upper Bounds)
Here, we continue the investigation of a colored branching process X1, X2,… with a finite collection of colors Ω = {1,…, r}. Let us consider the corresponding set of infinite genetic lines:

X∞ = {x = (x1, x2,…) | (x1,…, xn) ∈ Xn for all n}.

Define the cylindrical θ-metric on X∞:

dist(x, y) = θ(g(x1)) θ(g(x2)) ⋯ θ(g(xn)), where n = max{t : (x1,…, xt) = (y1,…, yt)},    (41)

where the numbers θ(1),…, θ(r) are taken from (0, 1).
We will be interested in the Hausdorff dimensions of both the space X∞ and its various subsets, defined in terms of the partial limits of empirical measures on Ω (we call these measures spectra; see Formula (29), and denote them by δx,n). If the colored branching process degenerates, then X∞ = ∅. Therefore, the only case of interest is when m = E |X1| > 1, and the cardinality of Xn increases with a rate of order m^n.
As before, we denote by μ(i), where i ∈ Ω, the expectation of individuals of color i in the random set X1. It will be always assumed that μ(i) < ∞. Consider any probability measure ν ∈ M1(Ω). It will be proven below that the dimension of the set {x ∈ X∞ | δx,n → ν} can be computed by means of the function:

d(ν, μ, θ) = ρ(ν, μ) / Σi∈Ω ν(i) ln θ(i).    (42)

We will call it the Billingsley–Kullback entropy.
The numerator in Formula (42) is the Kullback action, and the denominator is negative. If μ is a probability measure on Ω, then the Kullback action is nonnegative. However, in our setting, this is not the case, since m = μ(1) + ⋯ + μ(r) > 1. In particular, if μ(i) > ν(i) for all i ∈ Ω, then the Kullback action is negative, and the Billingsley–Kullback entropy is positive. Note, in addition, that if μ(1) = ⋯ = μ(r) = 1, then −ρ(ν, μ) is equal to Shannon’s entropy H(ν), and the Billingsley–Kullback entropy turns into the Billingsley entropy (22).
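Taking the Billingsley–Kullback entropy in the form d(ν, μ, θ) = ρ(ν, μ) / Σi ν(i) ln θ(i) (numerator the Kullback action, denominator negative, as described above), it can be sketched numerically as follows (all names are ours):

```python
import math

def bk_entropy(nu, mu, theta):
    """Billingsley-Kullback entropy:
    d(nu, mu, theta) = [sum_i nu(i) ln(nu(i)/mu(i))] / [sum_i nu(i) ln theta(i)].

    The denominator is negative because every theta(i) lies in (0, 1),
    so a negative Kullback action gives a positive entropy.
    """
    num = sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)
    den = sum(n * math.log(t) for n, t in zip(nu, theta) if n > 0)
    return num / den

# Special case from the text: mu(1) = ... = mu(r) = 1 makes the numerator
# equal to -H(nu); with all theta(i) = 1/2 and uniform nu on two colors,
# the entropy is H(nu)/ln 2 = 1.
d = bk_entropy([0.5, 0.5], [1.0, 1.0], [0.5, 0.5])  # = ln 2 / ln 2 = 1
```

For a non-uniform ν with the same μ and θ, the value drops below 1, mirroring the fact that Shannon’s entropy is maximized by the uniform distribution.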
Lemma 3. Let the space X∞ of infinite genetic lines be equipped with metric (41). Then, for any probability measure ν ∈ M1(Ω) and any ε > 0, there exists a neighborhood O(ν), such that the Hausdorff dimension of the set:

A = {x ∈ X∞ | δx,n ∈ O(ν) for infinitely many n}

does not exceed d(ν, μ, θ) + ε almost surely.

Proof. The proof is carried out in the same manner as in Lemma 1. Take any x > 0. By Theorem 11, there exists a neighborhood O(ν), such that almost surely:

|{x ∈ Xn : δx,n ∈ O(ν)}| ≤ e^{n(−ρ(ν,μ)+x)} for all large enough n.    (43)

Let us reduce this neighborhood in such a way that, in addition to Formula (43), for all δ ∈ O(ν),

| Σi∈Ω δ(i) ln θ(i) − Σi∈Ω ν(i) ln θ(i) | < x.    (44)

Then, for each cylinder Zn(x) satisfying the condition δx,n ∈ O(ν), the following estimate holds:

diam Zn(x) = θ(g(x1)) ⋯ θ(g(xn)) = exp{ n Σi∈Ω δx,n(i) ln θ(i) } ≤ exp{ n (Σi∈Ω ν(i) ln θ(i) + x) }.

For every natural N, the set A is covered by the collection of cylinders:

{ Zn(x) | n ≥ N, x ∈ Xn, δx,n ∈ O(ν) }.

Evidently, the diameter of this covering tends to zero as N → ∞. Hence, the α-dimensional Hausdorff measure of A can be estimated by means of Formulas (43) and (44):

Σ_{n≥N} Σ_{x∈Xn: δx,n∈O(ν)} (diam Zn(x))^α ≤ Σ_{n≥N} exp{ n (−ρ(ν,μ) + x + α Σi∈Ω ν(i) ln θ(i) + αx) }.    (45)

If α > d(ν, μ, θ), then x can be chosen so small that the last exponent in braces is negative, and all of the sum (45) tends to zero as N → ∞. Thus, the Hausdorff measure (of dimension α) of the set A is zero, and its dimension does not exceed α.
As before, we say that a sequence of empirical measures δx,n condenses on a subset V ⊂ M1(Ω) if it has a limit point in V.
Theorem 12. Let X1, X2,… be an unconditional colored branching process with a finite set of colors Ω, and let the set X∞ of all infinite genetic lines be equipped with a cylindrical θ-metric (41). Then, for any subset V ⊂ M1(Ω), almost surely:

dimH {x ∈ X∞ | the sequence δx,n condenses on V} ≤ sup_{ν∈V} d(ν, μ, θ).    (46)

In particular, dimH X∞ ≤ s almost surely, where s is the unique root of the “Bowen equation”:

θ(1)^s μ(1) + ⋯ + θ(r)^s μ(r) = 1.    (47)
Proof. It follows from the definition of the Billingsley–Kullback entropy d(ν, μ, θ) that it depends continuously on the measure ν ∈ M1(Ω). Let V̄ be the closure of V. Obviously, it is compact. Take an arbitrary ε > 0. By Lemma 3, for any measure ν ∈ V̄, there exists a neighborhood O(ν), such that almost surely:

dimH {x ∈ X∞ | δx,n ∈ O(ν) for infinitely many n} ≤ d(ν, μ, θ) + ε.    (48)

Choose a finite covering of the set V̄ by neighborhoods of this kind. Then, the set {x ∈ X∞ | the sequence δx,n condenses on V} will be covered by a finite collection of sets of the form {x ∈ X∞ | δx,n ∈ O(ν) for infinitely many n} satisfying Formula (48). By the arbitrariness of ε, this implies the first statement of Theorem 12.
Let s be a solution to Equation (47). Note that, for any ν ∈ M1(Ω), the concavity of the logarithm function implies that:

ρ(ν, μ) − s Σi∈Ω ν(i) ln θ(i) = Σi∈Ω ν(i) ln ( ν(i) / (μ(i) θ(i)^s) ) ≥ −ln Σi∈Ω μ(i) θ(i)^s = 0.

Consequently, d(ν, μ, θ) ≤ s. Now, the second part of the theorem follows from the first one if we take V = M1(Ω).
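Taking the Bowen equation in the form Σi μ(i) θ(i)^s = 1 used here, its root is easy to find numerically, since the left-hand side is strictly decreasing in s. A bisection sketch (names and bracket choices are ours):

```python
def bowen_root(mu, theta, lo=-50.0, hi=50.0, iters=200):
    """Solve sum_i mu(i) * theta(i)**s = 1 for s by bisection.

    Because every theta(i) lies in (0, 1), the left-hand side strictly
    decreases in s, so the root is unique whenever it lies in [lo, hi].
    """
    f = lambda s: sum(m * t**s for m, t in zip(mu, theta)) - 1.0
    assert f(lo) > 0 > f(hi), "root must lie inside [lo, hi]"
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: mu = (1, 1), theta = (1/2, 1/2), so 2 * (1/2)**s = 1 and s = 1.
s = bowen_root([1.0, 1.0], [0.5, 0.5])
```

Note how the sign of the root reflects the criticality of the process: with μ summing to m > 1 and equal θ(i) = 1/2 the root is log2 m > 0, while m ≤ 1 forces s ≤ 0.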
Remark. Usually, the term “Bowen equation” is used for the equation P(sφ) = 0, where P(sφ) is the topological pressure of a weight function sφ in some dynamical system (more detailed explanations can be found in [13]). If we replace the topological pressure P(sφ) by the spectral potential:

λ(sφ, μ) = ln Σi∈Ω μ(i) e^{sφ(i)},  where φ(i) = ln θ(i),

then the Bowen equation turns into the equation λ(sφ, μ) = 0, which is equivalent to Formula (47).
10. Block Selections of Colored Branching Processes
Let ξ1, ξ2, ξ3,… be a sequence of independent identically distributed random variables taking values of zero or one (independent Bernoulli trials).
Lemma 4. If 0 < p′ < p < 1 and P{ξi = 1} ≥ p, then:

P{ ξ1 + ⋯ + ξn ≥ p′n } → 1 as n → ∞,    (49)

uniformly with respect to the probability P{ξi = 1}.

Proof. In the case P{ξi = 1} = p, this follows from the law of large numbers. If P{ξi = 1} increases, then the probability in the left-hand side of Formula (49) increases, as well.
Consider a colored branching process X1, X2,… with a finite set of colors Ω = {1,…, r}. Each Xn consists of genetic lines (x1, x2,…, xn) of length n, in which every subsequent individual is born by the previous one. Fix a (large enough) natural N. We will split genetic lines of length divisible by N into blocks of length N:

(x1,…, xnN) = (y1,…, yn),  where yk = (x(k−1)N+1,…, xkN).

Each block yk generates an empirical measure δyk (spectrum) on Ω by the rule:

δyk(i) = |{t : (k−1)N < t ≤ kN, g(xt) = i}| / N,

where g(xt) denotes the color of xt.
A block selection of order N from a colored branching process X1, X2,… is any sequence of random subsets Yn ⊂ XnN with the following property: if (y1,…, yn+1) ∈ Yn+1, then (y1,…, yn) ∈ Yn. In this case, the sequence of blocks (y1,…, yn+1) will be called a prolongation of the sequence (y1,…, yn).
As above (see Formula (30)), denote by μ(i) the expectation of children of color i born by each individual and by μ the corresponding measure on Ω.
Theorem 13. Let X1, X2,… be an unconditional colored branching process with a finite set of colors Ω and the probability of degeneration q∗ < 1. If a measure ν ∈ M1(Ω) satisfies the condition ρ(ν, μ) < 0, then for any neighborhood O(ν) ⊂ M1(Ω) and any number ε > 0, with probability 1 − q∗, one can extract from the branching process a block selection Y1, Y2,… of an order N, such that each sequence of blocks (y1,…, yn) ∈ Yn has at least l(N) prolongations in Yn+1, where:

l(N) = e^{N(−ρ(ν,μ)−ε)},

and the spectra of all blocks belong to O(ν).
Proof. Fix any numbers p and ε satisfying the conditions:

0 < ε < −ρ(ν, μ),  p + ε < 1 − q∗.    (50)

By the second part of Theorem 11, for all large enough N, we have:

P{ |{x ∈ XN : δx,N ∈ O(ν)}| ≥ (2/p) l(N) } ≥ p + ε.    (51)

Further, we will consider finite sequences of random sets X1,…, XnN and extract from them block selections Y1,…, Yn of order N, such that the spectra of all of their blocks belong to O(ν) and each sequence of blocks (y1,…, yk) ∈ Yk has at least l(N) prolongations in Yk+1. Denote by An the event of the existence of a block selection with these properties. Define one more event A by the condition:

|{x ∈ XN : δx,N ∈ O(ν)}| ≥ (2/p) l(N).

It follows from Formulas (50) and (51) that P(A) ≥ p + ε. Evidently, A ⊂ A1. Therefore, P(A1) ≥ p + ε. Now, we are going to prove by induction that P(An) ≥ p whenever the order N of our block selection is large enough. Let us perform the step of induction. Suppose that P(An) ≥ p is valid for some n. Consider the conditional probability P(An+1 | A). By the definition of the events An+1 and A, it cannot be less than the probability of the following event: there are at least l(N) wins in a sequence of at least (2/p) l(N) independent Bernoulli trials with the probability of win P(An) in each. Choosing N large enough and using Lemma 4 (with p′ = p/2), one can make this probability greater than 1 − ε. Then,

P(An+1) ≥ P(An+1 | A) P(A) ≥ (1 − ε)(p + ε) ≥ p.

Thus, the inequality P(An) ≥ p is proven for all n.
This means that with probability greater than p, one can extract from the sequence X1,…, XnN a block selection Y1,…, Yn of order N, such that the spectra of all blocks belong to the neighborhood O(ν) and each sequence of blocks (y1,…, yk) ∈ Yk has at least l(N) prolongations in Yk+1.
To obtain a block selection of infinite length with the same properties, we will construct finite block selections Y1,…, Yn in the following manner. Initially, suppose that every Yk, where k ≤ n, consists of all of the sequences of blocks (y1,…, yk) ∈ XkN, such that the spectrum of each block lies in O(ν). At the first step, we exclude from Yn−1 all of the sequences of blocks having less than l(N) prolongations in Yn and then exclude from Yn all prolongations of the sequences that were excluded from Yn−1. At the second step, we exclude from Yn−2 all of the sequences of blocks having after the first step less than l(N) prolongations in the modified Yn−1, and then, exclude from Yn−1 and Yn all prolongations of the sequences that were excluded from Yn−2. Proceeding further in the same manner, after n steps, we will obtain a block selection Y1,…, Yn, such that each sequence of blocks from any Yk has at least l(N) prolongations in Yk+1. Evidently, this selection will be the maximal one among all of the selections of order N having the mentioned “l(N)-prolongations” property. Therefore, with probability at least p, all of the sets Yk are nonempty.
For every n, let us construct, as described above, the maximal block selection Y1^(n),…, Yn^(n) possessing the mentioned “l(N)-prolongations” property. From the maximality of these selections, it follows that:

Yk^(n+1) ⊂ Yk^(n) for all k ≤ n.

Define the sets Yk = ∩_{n≥k} Yk^(n). Then, with probability at least p, all of them are nonempty and compose an infinite block selection from Theorem 13. Since p may be chosen arbitrarily close to 1 − q∗, such selections do exist with probability 1 − q∗.
Theorem 13 can be strengthened by taking several measures in place of a unique measure ν ∈ M1(Ω).
Theorem 14. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q∗ < 1. If a finite collection of measures νi ∈ M1(Ω), where i = 1,…, k, satisfies the inequalities ρ(νi, μ) < 0, then for any neighborhoods O(νi) ⊂ M1(Ω) and any ε > 0, with probability 1 − q∗, one can extract from the branching process a block selection Y1, Y2,… of an order N, such that for every i = 1,…, k, each sequence of blocks (y1,…, yn) ∈ Yn has at least:

li(N) = e^{N(−ρ(νi,μ)−ε)}

prolongations (y1,…, yn, y) ∈ Yn+1 with the property δy ∈ O(νi).
It can be proven in the same manner as the previous one. Here, the event An should be understood as the existence of a finite block selection Y1,…, Yn satisfying the conclusion of Theorem 14, and the event A should be defined by the corresponding system of inequalities, one for each of the measures νi.
We leave the details to the reader.
11. Dimensions of Random Fractals (Lower Bounds)
Now, we continue the investigation of the space of infinite genetic lines:

X∞ = {x = (x1, x2,…) | (x1,…, xn) ∈ Xn for all n},

which is generated by an unconditional colored branching process X1, X2,… with a finite set of colors Ω = {1,…, r}. It is assumed that there is a measure:

μ = (μ(1),…, μ(r))

on Ω, where μ(i) denotes the expectation of children of color i born by each individual, and X∞ is equipped with the cylindrical θ-metric (41).
Theorem 15. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q∗ < 1. If a measure ν ∈ M1(Ω) satisfies the condition d(ν, μ, θ) > 0, then with probability 1 − q∗, for any neighborhood O(ν), we have the lower bound:

dimH {x ∈ X∞ | δx,n ∈ O(ν) for all large enough n} ≥ d(ν, μ, θ).    (52)
Proof. Let us fix any number α < d(ν, μ, θ) and ε > 0 so small that:

ρ(ν, μ) + ε ≤ α ( Σi∈Ω ν(i) ln θ(i) − ε ).    (53)

Then, we choose a convex neighborhood O∗(ν), whose closure lies in O(ν) and such that, for any measure δ ∈ O∗(ν):

| Σi∈Ω δ(i) ln θ(i) − Σi∈Ω ν(i) ln θ(i) | < ε.

By Theorem 13, with probability 1 − q∗, one can extract from the branching process under consideration a block selection Y1, Y2,… of an order N, such that any sequence of blocks (y1,…, yn) from Yn has at least l(N) prolongations in Yn+1, where:

l(N) = e^{N(−ρ(ν,μ)−ε)},

and for each block yk, the corresponding empirical measure δyk (spectrum) belongs to O∗(ν). Let us exclude from this selection a certain part of the genetic lines in such a way that each of the remaining sequences of blocks (y1,…, yn) ∈ Yn has exactly [l(N)] prolongations in Yn+1.
Any sequence y = (y1, y2,…) ∈ Y∞ consists of blocks of length N. Writing down in order the elements of all of these blocks, we obtain from y an infinite genetic line x = (x1, x2,…) ∈ X∞. Denote it as π(y). By the definition of Y∞, the spectrum of each block yk belongs to O∗(ν). For every point x = π(y), where y ∈ Y∞, the empirical measure δx,nN is the arithmetical mean of the empirical measures corresponding to the first n blocks of y, and so, it belongs to O∗(ν), as well; since the closure of O∗(ν) lies in O(ν), the spectra δx,n for intermediate values of n also fall into O(ν) eventually. It follows that:

π(Y∞) ⊂ {x ∈ X∞ | δx,n ∈ O(ν) for all large enough n}.    (55)
The family of all cylinders of the form ZnN(x), where x ∈ π(Y∞), generates some σ-algebra on π(Y∞). Define a probability measure P on this σ-algebra, such that:

P(ZnN(x)) = [l(N)]^{−n}.

Then, for all large enough N, all x ∈ π(Y∞) and all natural n:

diam ZnN(x) ≥ exp{ nN ( Σi∈Ω ν(i) ln θ(i) − ε ) }.

It follows from the last two formulas and Formula (53) that:

P(ZnN(x)) ≤ (diam ZnN(x))^α.
Now, we are ready to compute the Hausdorff measure of dimension α of the set π(Y∞). If, while computing the Hausdorff measure, we used coverings of π(Y∞) not by arbitrary cylinders, but only by cylinders of orders divisible by N, then the last formula would imply that such a measure is not less than P(π(Y∞)) = 1. Any cylinder can be embedded in a cylinder of order divisible by N, such that the difference of their orders is less than N and the ratio of their diameters is greater than (min_i θ(i))^N. Therefore,

mes_α π(Y∞) ≥ (min_i θ(i))^{αN} > 0,

and hence, dimH π(Y∞) ≥ α.
The set defined in the right-hand part of Formula (55) contains π(Y∞). Thus, its dimension is at least α. Recall that we have proven this fact by means of a block selection that exists with probability 1 − q∗. By the arbitrariness of α < d(ν, μ, θ), this implies the desired bound (52) with the same probability.
Theorem 16. Let s be a root of the “Bowen equation” (47). If s ≤ 0, then X∞ = ∅ almost surely. Otherwise, if s > 0, then X∞ is nonempty with a positive probability, and with the same probability, its dimension equals s.
We note that a more general version of Theorem 16, in which the similarity coefficients θ(i) are random, is proven in [4–6,14].
Proof. The expectation of the total number of children born by each individual in the branching process generating the set X∞ is equal to m = μ(1) + ⋯ + μ(r). If s ≤ 0, then θ(i)^s ≥ 1 for every i ∈ Ω, and hence m ≤ θ(1)^s μ(1) + ⋯ + θ(r)^s μ(r) = 1. In this case, by Theorem 8, our branching process degenerates almost surely, and X∞ = ∅.
If s > 0, then m > 1. In this case, by Theorem 8, our branching process does not degenerate with a positive probability, and X∞ is nonempty. Define a measure ν ∈ M1(Ω) by means of the equality:

ν(i) = μ(i) θ(i)^s,  i ∈ Ω.

Then, evidently, d(ν, μ, θ) = s. By the previous theorem, dimH X∞ ≥ s with the same probability with which X∞ ≠ ∅. On the other hand, by Theorem 12, the inverse inequality holds almost surely.
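For completeness, the asserted equality d(ν, μ, θ) = s can be verified in one line from the definitions used above; we take the entropy in the form d(ν, μ, θ) = ρ(ν, μ)/Σi ν(i) ln θ(i) and the Bowen equation in the form Σi μ(i) θ(i)^s = 1, so that ν(i) = μ(i) θ(i)^s is indeed a probability measure:

```latex
\rho(\nu,\mu)
  = \sum_{i\in\Omega}\nu(i)\,\ln\frac{\nu(i)}{\mu(i)}
  = \sum_{i\in\Omega}\nu(i)\,\ln\theta(i)^{s}
  = s\sum_{i\in\Omega}\nu(i)\,\ln\theta(i),
\qquad\text{hence}\qquad
d(\nu,\mu,\theta)
  = \frac{\rho(\nu,\mu)}{\sum_{i\in\Omega}\nu(i)\,\ln\theta(i)} = s.
```

The normalization Σi ν(i) = Σi μ(i) θ(i)^s = 1 is exactly the Bowen equation, which is why this particular ν realizes the maximal value of the Billingsley–Kullback entropy.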
For every probability measure ν ∈ M1(Ω), we define its basin B(ν) ⊂ X∞ as the set of all infinite genetic lines x = (x1, x2, x3,…), such that the corresponding sequence of empirical measures δx,n converges to ν. What is the dimension of B(ν)? By Theorem 12, it does not exceed the Billingsley–Kullback entropy d(ν, μ, θ) with probability one. On the other hand, the inverse inequality does not follow from the previous results (and, in particular, from Theorem 15). To obtain it, we ought to enhance the machinery of block selections.
Lemma 5. Let Q1,…, Q2^r be the vertices of a cube in ℝ^r. Then, there exists a law of choice i : ℝ^r → {1,…, 2^r}, such that if the neighborhoods O(Qi) are small enough, and the sequences:

δ1, δ2, δ3,…  and  Δn = (δ1 + ⋯ + δn)/n

satisfy the conditions δn+1 ∈ O(Q_{i(Δn)}) and δ1 is not too far from the cube, then the sequence Δn converges to the center of the cube.

Proof. First, we consider the case r = 1, when the cube is a segment. Let, for definiteness, Q1 = −1 and Q2 = 1. Set:

i(Δ) = 1 if Δ > 0,  i(Δ) = 2 if Δ ≤ 0.    (56)

Take any neighborhoods O(Q1) and O(Q2) with radii at most one. Then, for any sequence δn satisfying the conditions δn+1 ∈ O(Q_{i(Δn)}) and |δ1| < 2, we have the estimate |Δn| < 2/n. It may be easily proven by induction. Thus, in the one-dimensional case, the lemma is proven. To prove it in the multidimensional case, one should choose a coordinate system with the origin at the center of the cube and the axes parallel to its edges and apply the law of choice (56) to each of the coordinates independently.
Theorem 17. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q∗ < 1. If a measure ν ∈ M1(Ω) satisfies the condition d(ν, μ, θ) > 0, then with probability 1 − q∗:

dimH B(ν) = d(ν, μ, θ).

Proof. Fix any number α < d(ν, μ, θ) and ε > 0 so small that:

ρ(ν, μ) + ε ≤ α ( Σi∈Ω ν(i) ln θ(i) − ε ).
The set M1(Ω) is, in fact, a simplex of dimension r − 1, where r = |Ω|. Suppose first that ν is an inner point of this simplex (in other words, ν(i) > 0 for all i ∈ Ω). Take a small convex neighborhood O(ν) ⊂ M1(Ω), such that for any measure δ ∈ O(ν):

| Σi∈Ω δ(i) ln θ(i) − Σi∈Ω ν(i) ln θ(i) | < ε.
Let Q1,…, Q2^{r−1} be the vertices of some cube in O(ν) with the center at ν. Define for them small neighborhoods O(Qi) ⊂ O(ν) as in Lemma 5. By Theorem 14, with probability 1 − q∗, one can extract from the colored branching process X1, X2,… a block selection Y1, Y2,… of an order N, such that for every i ≤ 2^{r−1}, each sequence of blocks (y1,…, yn) ∈ Yn has at least:

l(N) = min_i e^{N(−ρ(Qi,μ)−ε)}

prolongations (y1,…, yn, y) ∈ Yn+1 possessing the property δy ∈ O(Qi).
We exclude from this block selection a certain part of the genetic lines, so that each of the remaining sequences of blocks (y1,…, yn) ∈ Yn has exactly [l(N)] prolongations (y1,…, yn, y) ∈ Yn+1, and all of these prolongations satisfy the law of choice from Lemma 5, namely, δy ∈ O(Q_{i(Δn)}), where Δn is the arithmetical mean of the spectra of the blocks y1,…, yn.
Let π(Y∞) be the set of all infinite genetic lines (x1, x2,…) ∈ X∞ for which every initial segment of length nN, being partitioned into blocks of length N, turns into an element of Yn. Then, by Lemma 5, we have the inclusion π(Y∞) ⊂ B(ν).
Reproducing the reasoning from the proof of Theorem 15, one can verify that the dimension of π(Y∞) is greater than α. Since α can be taken arbitrarily close to d(ν, μ, θ), we obtain the lower bound dimH B(ν) ≥ d(ν, μ, θ). The inverse inequality, as was mentioned above, follows from Theorem 12. Thus, in the case of an inner point ν ∈ M1(Ω), the theorem is proven.
If the measure ν belongs to the boundary of the simplex M1(Ω), then one should exclude from Ω all elements i with ν(i) = 0 and consider the set:

Ω′ = {i ∈ Ω | ν(i) > 0}.

Let us exclude from the branching process X1, X2,… all of the genetic lines containing elements of colors not in Ω′ and denote by X′1, X′2,… the resulting branching process (with the set of colors Ω′). The corresponding set of infinite genetic lines X′∞ is contained in X∞. It follows from the definition of the Billingsley–Kullback entropy d(ν, μ, θ) that it is the same for the sets of colors Ω and Ω′. Besides, the measure ν lies in the interior of the simplex M1(Ω′). Therefore, the bound dimH B(ν) ≥ d(ν, μ, θ) holds with the same probability as the probability of the event X′∞ ≠ ∅.

The theorem would be completely proven if the probability of the event X′∞ ≠ ∅ were equal to 1 − q∗. However, it may be less than 1 − q∗. This obstacle can be overcome as follows.
Let m′ = Σi∈Ω′ μ(i). This is nothing else than the expectation of each individual’s number of children in the branching process X′1, X′2,…. If m′ ≤ 1, then Formula (11) implies the inequality ρ(ν, μ) ≥ 0, which contradicts the condition d(ν, μ, θ) > 0 of our theorem. Therefore, m′ > 1, and respectively, the probability of the event X′∞ = ∅ is strictly less than one. Let us denote it by q′.
If the branching process X′1, X′2,… was generated not by a unique initial element, but by k initial elements, then the probability of the event X′∞ = ∅ would be equal to (q′)^k. Recall that the cardinality of Xn grows exponentially with probability 1 − q∗. If this is the case, one can first wait for the event |Xn| ≥ k and then consider separately |Xn| independent counterparts of the branching process X′1, X′2,… generated by different elements of Xn. This trick allows one to obtain the bound dimH B(ν) ≥ d(ν, μ, θ) with conditional probability at least 1 − (q′)^k under the condition |Xn| → ∞. Since k is arbitrary, the above-mentioned conditional probability is, in fact, equal to one, and the complete probability cannot be less than 1 − q∗.