4.1. Proof of Theorem 1
Converse: Assume that the pair is D-achievable. For , with the convention , define the RV’s . The rate R is lower bounded as follows:

Here, follows since is memoryless, and follows since forms a Markov chain.
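For orientation only, since the display is not reproduced above, a generic sketch of a chain of this type reads as follows, writing $W$ for the encoder’s message and $X^{i-1}=(X_1,\dots,X_{i-1})$ (both notations are assumptions, not the paper’s):
\[
nR \;\ge\; H(W) \;\ge\; I(X^n;W) \;=\; \sum_{i=1}^{n} I\!\left(X_i; W \mid X^{i-1}\right) \;=\; \sum_{i=1}^{n} I\!\left(X_i; W, X^{i-1}\right),
\]
where the last equality uses the memorylessness of the source (so that $X_i$ is independent of $X^{i-1}$), and the per-letter terms are then identified with mutual informations involving the auxiliary RV’s defined above.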
We may now lower bound as follows:

where follows since is a deterministic function of .
The sum-rate can be lower bounded as follows:

where follows by equality (37) and inequality (38).
Draw J uniformly from , independently of , and define the RV’s , , , , and . Using J, we may express (37) as follows:

and we may express (38) as follows:
With regard to the expected distortion, we may write

Step is justified as follows: since are deterministic functions of , forms a Markov chain and, given , the tuple is independent of . Consequently, Lemma 1 in ([12] Section II.B) guarantees the existence of a reconstruction which dominates in the sense that

This observation, interpreted as a “data processing inequality” for estimation, has already been made in ([12] Lemma 1).
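In generic terms (illustrative notation, not the paper’s), this estimation-theoretic data-processing step reads: if $X - A - B$ forms a Markov chain and $\hat X = g(B)$ is any reconstruction based on $B$, then the reconstruction
\[
g'(a) \;:=\; \arg\min_{\hat x}\; \mathbb{E}\!\left[\, d(X,\hat x) \mid A = a \,\right]
\]
based on $A$ alone satisfies $\mathbb{E}\, d\!\left(X, g'(A)\right) \le \mathbb{E}\, d\!\left(X, g(B)\right)$, since, by the Markov relation, $\mathbb{E}[\,d(X, g(b)) \mid A=a, B=b\,] = \mathbb{E}[\,d(X, g(b)) \mid A=a\,] \ge \mathbb{E}[\,d(X, g'(a)) \mid A=a\,]$ for every pair $(a,b)$.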
By (1) and the memoryless property of the sequence , one can verify the Markov relation , which implies the Markov relation . Similarly, the definitions of and and the memoryless property of imply that, conditioned on , forms a Markov chain. Thus, conditioned on U, forms a Markov chain, hence

where denotes the conditional law induced by the marginal law . The combination of (40)–(42) and (44), together with the latter Markov relations, establishes the converse.
We shall now obtain an alternative characterization for the lower bound (38). For a law and its induced conditional law , let

and let denote the lower convex envelope of . Define

then, by ([13] Section III.C, Lemma 1), is convex.
The integrand on the RHS of (46) is the entropy of the scalar quantizer of conditioned on , where is defined in (36c). Now, conditioned on , Lemma 1 in ([12] Section II.B) ensures that

Consequently, we may lower bound the RHS of (46) as follows:

where in we used the definition of Q and in its convexity. The lower bound (47) may be interpreted as follows. Fix and consider, for each , time sharing of at most two scalar quantizers for the “source” attaining a distortion level . The optimal helper time-shares side-information-dependent scalar quantizers of (at most two for each side-information symbol ), while the reconstruction at the decoder is a function of .
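To make the time-sharing reading concrete in generic notation (not the paper’s symbols): if $q(D)$ denotes the minimal output entropy of a scalar quantizer achieving distortion $D$, then its lower convex envelope can be written as
\[
Q(D) \;=\; \inf_{\substack{\lambda\in[0,1],\; D_1, D_2:\\ \lambda D_1 + (1-\lambda) D_2 \,=\, D}} \;\lambda\, q(D_1) + (1-\lambda)\, q(D_2),
\]
i.e., every point of $Q$ is realized by time sharing at most two operating points of $q$ (two points suffice by Carathéodory’s theorem, the domain being one-dimensional). Averaging such per-letter bounds and invoking the convexity of $Q$ via Jensen’s inequality then yields a bound at the average distortion level.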
Direct: To establish the achievability of , consider the following codebook construction. The codebook is obtained by drawing the n-length sequences independently of . (For the definition of —the set of -strongly typical n-sequences corresponding to a marginal law —and a few properties of these sequences, see [14,15,16].)
Given the source sequence x, is defined as follows. If , the encoder searches for the first sequence in such that (s.t.) , and sets . If , or if there is no s.t. , an encoding error is declared.
Given , the helper forms the sequence of descriptions that is sent causally to the decoder.
Decoding: Given as well as the sub-sequence , the decoder forms the reconstruction sequence .
Given that are jointly typical, since is memoryless, the Markov lemma guarantees that, for large n, with high probability, are also jointly typical. Since forms a Markov chain, by the Markov lemma, with high probability, as well as are jointly typical. Thus, with high probability, are jointly typical, hence the distortion constraint (10) is fulfilled for large n. That the sequence can be described at a conditional entropy rate satisfying (12b)–(47) can be established along lines similar to the proof of the direct part of Theorem 2 in ([13] Section III.C). Finally, a standard error probability analysis verifies that, with high probability, are jointly typical as long as (12a) holds.
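The last claim rests on the standard covering-lemma estimate, sketched here in generic notation (the actual auxiliary RV and rate are those appearing in (12a)): with $2^{nR}$ codewords drawn i.i.d. from the marginal of $U$, the probability that none of them is jointly typical with the observed source sequence obeys
\[
\Pr\{\text{no jointly typical codeword}\} \;\le\; \left(1 - 2^{-n\left(I(X;U)+\delta(\epsilon)\right)}\right)^{2^{nR}} \;\le\; \exp\!\left(-\,2^{\,n\left(R - I(X;U) - \delta(\epsilon)\right)}\right),
\]
which vanishes (doubly exponentially) whenever $R > I(X;U) + \delta(\epsilon)$, with $\delta(\epsilon)\to 0$ as $\epsilon\to 0$.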
4.2. Proof of Lemma 1
If , then , where and are independent and the pair is equal in distribution to the pair . Thus, the unitary transformation preserves the Gaussian nature of the channel and factors according to (see (53) ahead):
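The transformation in question is, presumably, the standard rotation used in Gaussian doubling arguments; the following is an illustrative statement under that assumption. For two i.i.d. zero-mean jointly Gaussian pairs $(X_1,Y_1)$ and $(X_2,Y_2)$, set
\[
X_{+} := \frac{X_1+X_2}{\sqrt{2}},\qquad X_{-} := \frac{X_1-X_2}{\sqrt{2}},\qquad Y_{+} := \frac{Y_1+Y_2}{\sqrt{2}},\qquad Y_{-} := \frac{Y_1-Y_2}{\sqrt{2}}.
\]
Then $(X_{+},Y_{+})$ and $(X_{-},Y_{-})$ are independent, and each is equal in distribution to $(X_1,Y_1)$, so the Gaussian channel between the components is preserved by the rotation.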
To show (22a), consider the sequence of identities

Moreover, to show (22b), consider the sequence of identities

Starting with (50), consider the difference

under a law of the form , i.e., such that forms a Markov chain (see also ([9] Section VI.A, Lemma 4)).
We distinguish between the two cases:
- (1) In case that , the non-negativity of mutual information implies that the expression (52) is non-negative, hence the inequalities (63) ahead hold for any .
- (2) In case that , we first prove that, for the set of laws which are feasible for the optimization problem (20), the expression (52) is strictly positive. Observe that with the choice of and , where and are two independent copies of such that and , i.e., , we have

Thus, the unitary transformation picks a pair of independent copies of a law and “creates” a pair of laws, and , which factor jointly according to (53) with , hence are symmetric w.r.t. the inputs and . We shall denote the latter set of laws by and, as shown above, .
Remark 2. Suppose that with ; then, for both laws and , we have and . Since the image of the map is a convex set, a standard dimensionality-reduction argument can be used to establish the existence of a law where is supported on a finite set and achieves any point in the image of the map (55) (see ([9] Section IV, Remark 2)).

By rate-distortion theory, the constraint , with both U and V non-void, implies that
Now, for any as per (49), conditioned on , the random variable V is dependent on , i.e., . Consequently, since the mutual information is strictly positive and, conditioned on , forms a Markov chain,

Since a law of the form (53) dictates , the combination of (56)–(58) yields that

Thus, the conditional mutual information is non-increasing under the conditioning on .
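The monotonicity invoked here is an instance of a generic fact, stated in illustrative notation: if, conditioned on $U$, the chain $V - (U,Y) - Z$ is Markov, then
\[
I(V;Y\mid U) - I(V;Y\mid U,Z) \;=\; I(V;Z\mid U) - I(V;Z\mid U,Y) \;=\; I(V;Z\mid U) \;\ge\; 0,
\]
i.e., further conditioning on $Z$ can only decrease the conditional mutual information $I(V;Y\mid U)$.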
By the symmetry of the pair of laws and induced by , conditioned on , the random variable V is dependent on , hence

However, with the latter factorization, if indeed , i.e., , then, since the conditional mutual information is non-increasing under the conditioning on , it follows that , which contradicts (59). Consequently, for , , thus establishing that the expression (52) is strictly positive.
Consequently, there exists some such that, for any ,

The combination of (50) and (62) implies that, for any ,
Here, follows by (50) and (62), and is true since the set of laws over which the infimum on the RHS of (63) is evaluated is not empty. Indeed, the choice of and , where and are two independent copies of such that and , i.e., (hence it is feasible for in (19)), satisfies (54) and therefore belongs to the feasible set.
On the other hand, since both mappings and are invertible, then, with ,

Here, follows since, conditioned on , and are independent, and follows since, conditioned on , and are independent.
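The invertibility is what matters here: in general (illustrative notation), if $\phi$ and $\psi$ are one-to-one mappings, then
\[
I\!\left(\phi(A);\, \psi(B)\right) \;=\; I(A;B),
\]
since mutual information is invariant under bijective relabelings of each of its arguments; this is what permits rewriting the mutual information terms in terms of the transformed variables.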
When and , where and are two independent copies of such that and , then forms a Markov chain. Thus,

and the RHS of (64) becomes
thus establishing that

where the inequality follows since the infimum on the LHS is taken over a larger set than that on the RHS. The combination of (63) and (65) establishes (22a) for .
Now, returning to (51), consider the difference

under the law (53), i.e., such that forms a Markov chain. By rate-distortion theory, the constraint , with both U and V non-void, implies that

Moreover, an argument similar to that leading to the conclusion that (52) is non-negative establishes that (66) is non-negative.
Consequently, there exists some such that, for any ,

In addition, the combination of (51) and (68) implies that, for any ,

The combination of (69) and (65) establishes (22b) for .