In this section, we consider
-valued inputs subject to additive non-degenerate multivariate Gaussian noise. In
Section 3.1, the capacity-achieving distribution,
, is formulated as the objective of an optimization problem; its support is then framed in terms of the zero set of a certain real-analytic function, which is dependent on
and referred to as
.
Section 3.2 finds an equivalent expression for
, which is an intermediary step to showing in
Section 3.3 that
is non-constant.
Section 3.4 uses the result that
is non-constant to show that the support of the capacity-achieving distribution is contained in a countable union of submanifolds of dimensions in the range
. This union is finite when the input is constrained to a compact subset of
. It is then shown that the support of the capacity-achieving input has Lebesgue measure 0 and is nowhere dense in
.
Appendix A is dedicated to showing the convexity and compactness of the optimization space used in
Section 3.1.
Appendix B establishes a pointwise characterization of
, which justifies the definition of
.
Appendix C provides integrability results, which are used throughout the paper.
Appendix D shows that the objective functional is weak continuous, strictly concave, and weak differentiable on the optimization space.
Appendix E shows that
has an analytic extension to
. Finally,
Appendix F supports
Section 3.3 by finding bounds for certain functions.
As a first step towards defining the set of feasible input distributions, let
be the set of finite Borel measures on
. Note that
is contained in the set of finite signed Borel measures on
, which has an intrinsic vector space structure and can be equipped with a norm [
16]. Since
lies within a normed vector space, the convexity and compactness of its subsets can be discussed.
3.1. Optimization Problem
By Theorem 3.6.2 of [
17], the capacity of the channel in (
7) is given by the optimization problem
Since the relationship between
,
, and
is known, the mutual information is a function of the distribution of
alone. Thus, the mutual information induced between
and
will be denoted by
. Similarly, we express the even-moment constraint in terms of a functional
given by
where
is equivalent to
. Rewriting (
8) in terms of
and
yields
Much of the appendix is dedicated to understanding properties of the problem presented in (
10). It is shown in Theorem A1 that
is convex and compact. Furthermore, by Theorems A3 and A4,
is a weakly continuous and strictly concave function on
. Therefore, the supremum is achieved by a unique input distribution
(see, e.g., Appendix C of [
14])—that is,
We use the notation
to describe a capacity-achieving input directly (i.e.,
).
Before proceeding, we require some definitions and notations. In the first definition, and throughout the paper, for and , we denote the ball of radius r centered at by . We will denote the closure of by . The output density induced by an input is denoted .
Definition 1. Let be a random variable with alphabet . Then, the support of is the set given byIf has distribution , we may alternatively refer to . Definition 2. For , the output differential entropy is given byand the marginal entropy density at is given bywhenever the integrals exist. The relationship between the differential entropy and marginal entropy density can be seen as follows. For any
and
, we have that
The equivalence of (
15)–(
17) is due to Fubini–Tonelli [
18] and Lemma A4. If
, then
Lastly, define
and let
be given by
Since
, we have that
is finite for all
and conclude that
is also finite for all
. Furthermore, by Lemma A8,
can be extended to a complex analytic function for all
; hence, it is continuous.
We show that
solves (
11) if and only if there exists
such that
for all
.
We show that, for the choice of
in (
22), the inequality in (
22) is equivalent to the condition that for all
,
and if
, then
For
, (
24) is satisfied if and only if
.
To establish (
22), we will first use a Lagrange multiplier to reformulate (
11) as an unconstrained problem over
. We will then obtain (
22) by taking the weak derivative of the resulting objective functional and applying a Karush–Kuhn–Tucker condition for optimality. We choose to work in the space
since, when
, the functionals
and
are not weakly differentiable on the larger space
.
Since
, (
11) can equivalently be written as
where
is the same as in (
11). By Theorem A5,
is convex. Moreover, letting
be a Heaviside step function at
,
is an interior point of the feasible region since
. Since
is convex by Lemma A1, there exists
such that
where
and
(see e.g., Appendix C of [
14]). Furthermore, for an arbitrary
,
. Therefore, for this choice of
, we also have
By Theorems A5 and A6,
has a weak derivative at
in the direction of any
given by
where (
31) is due to
. Substituting
gives
where the differential entropy of the noise
is finite since
is positive definite.
Now,
is the difference between a strictly concave function (see Theorem A4) and a convex function (due to Theorem A5 and the non-negativity of
). Therefore,
is strictly concave and
is optimal if and only if, for all
,
However,
is arbitrary and each
satisfies
for some
. Therefore,
is optimal if and only if, for all
,
which is the statement we sought to show in (
22).
The condition (
34) is on the capacity-achieving distribution
itself. Since our objective is to characterize
, we find an equivalent condition to (
34) to describe the behavior of
at individual points in the input alphabet
. Thus, by (
34) and Theorem A2, for all
,
and if
, then
The rest of
Section 3 is dedicated to exploiting the relationship
where
is the zero set of
.
3.2. Hilbert Space and Hermite Polynomial Representation
In this subsection, an equivalent expression for (
36) is found by viewing the integral as an inner product in a Hilbert space and writing
in terms of a Hermite polynomial basis for that space. Hermite polynomial bases are well-suited to analysis of Gaussian noise channels and they have been used in a number of information-theoretic papers (see, e.g., Refs. [
14,
19]).
Consider the Hilbert space
equipped with inner product
The inner product’s subscript is omitted when the space can be inferred.
Since the components of
are independent, with
having variance
, the density of
factors into
We will construct an orthogonal basis for
from orthogonal bases for the spaces
.
First, with
, an orthogonal basis for
is given by the Hermite polynomials
[
20], which are defined through the generating function
For any
, the
mth Hermite polynomial has degree
m and a positive leading coefficient. Next, for each
and
, define the stretched Hermite polynomials
The inner product of
is related to that of
by
where
and
are versions of
and
that have been stretched horizontally by a factor of
. Substituting any
and
into (
44) shows that the set
is orthogonal. Furthermore, if there was a non-zero function (in an
sense) that had a zero
inner product with
for each
, then a stretched version of this function would also have a zero
inner product with
for each
m. This would contradict the completeness of the Hermite polynomials in
; hence,
forms a basis for
. Lastly, the stretched Hermite polynomials
have the generating function
Now,
is isomorphic to the tensor product of the
spaces,
; consequently,
forms an orthonormal basis for
[
21], where
Since, by Lemma A2,
, there exist constants
for which
where equality is in an
sense. Then, substituting (
47), and using the notations
and
for
and
, we write
Substituting (
49) and (
55) into the integral term in (
36) yields
where
, with
This simplification to a polynomial will be helpful since the cost function associated with the even-moment constraint is also a polynomial. This relationship is exploited in
Section 3.3.
3.3. Non-Constancy of
Recall the relationship
from (
37). Since
,
has at least one zero, it is constant if and only if
This subsection is dedicated to showing the latter equivalent condition. The immediate implication is that
is a strict subset of
; however, the fact that
is a non-zero real-analytic function will be used in
Section 3.4 to prove the main results.
By way of contradiction, suppose that, for all
,
. Substituting (
59) into (
36), this is equivalent to
for all
. The discussion proceeds in two cases:
and
.
In the case that
and
,
is known to be Gaussian [
22] and there is no contradiction with (
61). Therefore, for
, we focus only on compact input alphabets
.
With
and
, ref. (
61) reduces to
for all
. Let
be the
i′th row of the
identity matrix and let
be the all zero vector. Since (
62) holds for all
, matching coefficients gives
Since, for each
and
,
has degree
and a positive leading coefficient,
also has degree
in
and the unique term with total degree
is
, which has a positive coefficient. Therefore, the polynomials present in the sum are of the form
and, for
,
The constants
and
are positive, while
and
are real. Substituting this and the identity
into (
49) yields
or equivalently,
By definition, ; however, results in a constant density on , which is invalid. Then, it must be the case that . Thus, the output achieved by , has independent Gaussian components. Since and are independent and is an n-variate Gaussian random variable, must either be an n-variate Gaussian random variable or be almost surely equal to some . In the former case, violates the stipulation that the input alphabet is compact and contradicts the assumption that is identically 0. In the latter case, is trivial and satisfies the main results of the paper.
For the case
, we derive a contradiction to (
61) using results on the rate of decay of a function compared with that of its Fourier transform to conclude that
is not identically 0.
Lemma 1. Let have, for some , a characteristic function satisfyingfor all . Let be a random variable independent of . Then, the characteristic function of satisfies, for all , Proof. By the independence of
and
, and the fact that characteristic functions have pointwise moduli upper-bounded by 1,
□
Lemma 2. Let have, for some constant , a characteristic function satisfyingfor all . Let be a random variable independent of and have density . If there exist positive constants α and K such that, for all ,then . Proof. Apply Lemma 1 and Theorem 4 of [
23], noting that an identically 0 function cannot be a density. □
We make use of Lemma 2 by setting
,
, and
and deriving a contradiction to the assumption that
is identically 0. Note that, using Rayleigh quotients, the modulus of the characteristic function of
can be upper-bounded for any
by
That is, the characteristic function of
satisfies (
75) with
.
To complete the contradiction, we show that there exists
and
such that
satisfies the bound in (
76). The assumption that
is identically 0 yields (
61); substituting the Multinomial Theorem,
By coefficient matching in (
78), the set of non-zero coefficients, other than
, is indexed by the set
Furthermore, for
,
Therefore, substituting (
80) into (
49),
for some positive constant
. As with the case
,
results in a constant output density over
and can be disregarded as a possibility. Thus, for each
, we have
With
and
, showing that there exists
for which (
76) holds, is equivalent to showing that
is bounded. This, in turn, is equivalent to showing that the polynomial in the exponent,
is upper bounded. We proceed by considering the degrees of the terms of
to determine the behavior of (
84) as
increases.
For each
and
,
has degree
in
. Furthermore,
has total degree
and the unique highest degree term,
, has coefficient
. Note that, since
, and by the definition of
,
has total degree
. Hence, ref. (
84) can be rewritten as
, where
and
is the sum of the remaining terms, each with a total degree of at most
.
Note the following:
For each , and, by Lemma A9, the minimal value of —evaluated on a sphere of radius —is at least .
For each
, we have that
. Indeed, for each
,
by (
82),
, and
; further, for each
,
is even.
The maximum value of , evaluated on a sphere of radius , is at most for some —that is, each term of is either of the form or for some , and , where . Lemma A10 shows that these are no more than or .
We conclude that, since
and
for all
,
Thus, since
is a continuous function that satisfies (
90), it is bounded from above. Let
and
Then, for all
,
Recall that, with
, the smallest eigenvalue of
, and
, the characteristic function of
satisfies (
75). Let
and choose
according to (
91). Then,
satisfies (
92), yet
. Hence, the bound on the characteristic function of
given by (
77) and the bound on the density of
given in (
92) contradict Lemma 2. Therefore, the coefficient matching equation (
78) cannot hold for all
and we conclude that, for
,
cannot be identically 0 on
.
We summarize the results of the two cases, and , in a theorem.
Theorem 1. Suppose that either
- 1.
is compact, or
- 2.
, with .
Then, either for some or An immediate consequence of Theorem 1 is that
is a strict subset of
. Recall from
Section 3.1 that
has an analytic extension to
. Therefore, Theorem 1 shows that
is “sparse" in the sense used by [
12]—that is, there exists a non-zero function with an analytic extension to
that is zero on
. However, the primary importance of Theorem 1 is as an intermediary result that is used in
Section 3.4 to obtain a better understanding of the structure of
.
3.4. Main Results
In this section, we use geometry to show that is contained in a countable disjoint union of submanifolds of dimensions ranging . Furthermore, this union is finite when is compact. We then show that has Lebesgue measure 0 and is nowhere dense in .
The discussions in this section consider subsets of a vector’s components; so, for
and
, we introduce the notation
Recall from (
37) in
Section 3.1 that
Since, by Lemma A8,
has an analytic extension to
, it is real-analytic, which motivates us to study the geometry of zero sets of real-analytic functions. We start by restating Theorem 6.3.3 of [
24] to the level that is needed in this paper.
Theorem 2 (Structure Theorem).
Let be a real-analytic function, where is not identically 0 in . After a rotation of the coordinates , there exist constants , such that withwe havewhere is either empty or contains only the origin and , , is a finite disjoint union of i-dimensional submanifolds—that is, for each , there exists for whichwhere each is an i-dimensional submanifold. Furthermore, lettingthere exists an open set and real-analytic functions , , on for which We apply Theorem 2 to characterize the zero set of
in the form of (
97) and obtain the following result.
Theorem 3. Suppose that either
- 1.
is compact, or
- 2.
, with .
Then,where is a countable union of isolated points and , , is a countable disjoint union of i-dimensional submanifolds. Furthermore, if is compact, these unions are finite. Proof. First, note that, by Theorem 1, either
for some
or
is not identically 0 on
. In the former case, the result is trivially true; so, assume that
is not identically 0 on
. Therefore, for any
, we can translate
by
and rotate its coordinate system to apply Theorem 2—that is, there exists a sufficiently small open set
around
such that
where the
values are as in Theorem 2.
Since
is dense in
,
Furthermore, if
is compact, the open cover
has a finite subcover
—that is,
Defining the index set
we obtain
Since, for each
,
is either empty or a single point,
is a countable set of points and is finite when
is compact. Furthermore, each
, where
, is itself a finite union of
i-dimensional submanifolds. Hence,
is a countable union of
i-dimensional submanifolds. When
is compact, this union is also finite. □
Note that Theorem 3 agrees with the results of [
5] when the cases overlap. Indeed, consider the case in which
is a ball centered at the origin,
, and the noise covariance matrix
, where
is the
identity matrix and
. Then, ref. [
5] shows that the capacity-achieving distribution is supported on a finite number of concentric
-spheres. Each
-sphere is an
dimensional submanifold.
In the next two theorems, we show that has Lebesgue measure 0 and is nowhere dense in .
Theorem 4. Suppose that either
- 1.
is compact, or
- 2.
, with .
Let denote the n-dimensional Lebesgue measure. Then, Proof. By Theorem 3, we have
where
is countable. Note that for each
,
is either empty or a single point; so,
. Furthermore, for each
and
,
is a finite disjoint union of
i-dimensional submanifolds, and for
, each submanifold has Lebesgue measure 0. Therefore,
. □
We will now define the notion of a subset being nowhere dense in its superset and show that is nowhere dense in .
Definition 3. Let . A set is said to be dense in B if, for every , there exists a sequence such that Definition 4. Let . A set is called nowhere dense in B if, for every open set , is not dense in U.
Theorem 5. Suppose that either
- 1.
is compact, or
- 2.
, with .
Then, is nowhere dense in .
Proof. By Theorem 1, either
for some
or
Since, in the former, case
is nowhere dense in
, assume the latter and let (
115) hold. Let
be a non-empty open set; we will show the result by proving that
is not dense in
U.
Fix
. Translating
by
, rotating the coordinate system and applying Theorem 2 shows that there exists a sufficiently small open set
Q containing
on which
It suffices to show the existence of a point of the form
that is not the limit of any sequence in
. Let
be a convergent sequence in
, indexed by
m, for which
Using the parameterization from (
100), the
n’th component of sequence index
m satisfies one of the following:
However, since , where is open, the set is uncountable. Thus, there exists t such that is not the limit of any sequence in . □