1. Introduction
For a Markov chain
and an assigned joint probability distribution
, consider the following information bottleneck (IB) problem:
where
C is the bottleneck constraint parameter and the optimization is with respect to the conditional probability distribution
of
Z given
Y. Formulation (1) was introduced by Tishby in [
1] and has found remarkable applications in supervised and unsupervised learning problems such as classification, clustering, prediction, etc. [
2,
3,
4,
5,
6,
7]. From a more fundamental information theoretic viewpoint, the IB arises from the classical remote source coding problem [
8,
9,
10] under logarithmic distortion [
11].
An interesting application of the IB problem in communications consists of a source node, an oblivious relay, and a destination node, which is connected to the relay via an error-free link with capacity
C. The source node sends codewords over a communication channel and an observation is made at the relay.
X and
Y are, respectively, the channel input from the source node and output at the relay. The relay is oblivious in the sense that it cannot decode the information message of the source node itself. This feature can be modeled rigorously by assuming that the source and destination nodes make use of a codebook selected at random over a library, while the relay is unaware of such random selection. For example, in a cloud radio access network (C-RAN), each remote radio head (RRH) acts as a relay and is usually constrained to implement only radio functionalities while the baseband functionalities are migrated to the cloud central processor [
12]. Considering the relatively simple structure of the RRHs, it is usually prohibitive to let them know the codebooks and random encoding operations, particularly as the network size becomes large. The fact that the relay cannot decode is also supported by secrecy demands, which means that the codebooks known to the source and destination nodes are to be considered absolutely random, as done here.
Due to the oblivious feature, the relaying strategies that require the codebooks to be known at the relay, e.g., decode-and-forward, compute-and-forward, etc. [
13,
14,
15], cannot be applied. Instead, the relay has to perform oblivious processing, i.e., employ strategies in the form of compress-and-forward [
16,
17,
18,
19]. In particular, the relay must treat
X as a random process with distribution induced by random selection over the codebook library (see [
12] and references therein) and has to produce some useful representation
Z by simple signal processing and to convey it to the destination node subject to the link constraint
C. Then, it makes sense to find
Z such that
is maximized.
The IB problem for this kind of communication scenario has been studied in [
12,
20,
21,
22,
23,
24,
25,
26]. In [
20], the IB method was applied to reduce the fronthaul data rate of a C-RAN network. References [
21,
22], respectively, considered Gaussian scalar and vector channels with IB constraint and investigated the optimal tradeoff between the compression rate and the relevant information. In [
23], the bottleneck rate of a frequency-selective scalar Gaussian primitive diamond relay channel was examined. In [
24,
25], the rate-distortion region of a vector Gaussian system with multiple relays was characterized under the logarithmic loss distortion measure. Reference [
12] further extended the work in [
25] to a C-RAN network with multiple transmitters and multiple relays and studied the capacity region of this network. However, all of References [
12,
20,
21,
22,
23,
24,
25] considered block fading channels and assumed that the perfect channel state information (CSI) was known at both the relay and the destination nodes. In [
26], the IB problem of a scalar Rayleigh fading channel was studied. Due to the bottleneck constraint, it was impossible to inform the destination node of the perfect CSI in each channel realization. An upper bound and two achievable schemes were provided in [
26] to investigate the bottleneck rate.
In this paper, we extend the work in [
26] to the multiple-input multiple-out (MIMO) channel with independent and identically distributed (i.i.d.) Rayleigh fading. This model is relevant for the practical setting of the uplink of a wireless multiuser system where
K users send coded uplink signals to a base station. The base station is formed by an RRH with
M antennas, connected to a cloud central processor via a digital link of rate
C (bottleneck link). The RRH is oblivious to the user codebooks and can apply only simple localized signal processing corresponding to the low-level physical layer functions (i.e., it is an oblivious relay). In current implementations, the RRH quantizes both the uplink pilot symbols and the data-bearing symbols received from the users on each “resource block” (This corresponds roughly to a coherence block of the underlying fading channel in the time-frequency domain) and sends the quantization bits to the cloud processor via the digital link. Here, we simplify the problem, and instead of considering a specific pilot-based channel estimation scheme, we assume that the channel matrix is given perfectly to the relay (remote radiohead), i.e., that the CSI is perfect but local at the relay. Then, we consider an upper bound and specific achievability strategies to maximize the mutual information between the user transmitted signals and the message delivered to the cloud processor, where we allow the relay to operate local oblivious processing as an alternative to direct quantization of both the CSI and the received data-bearing signal.
Intuitively, the relay can split the capacity of the relay-destination link into two parts and convey both the CSI and its observation to the destination node. Hence, in the first and second achievable schemes, the relay transmits the compressed CSI and observation to the destination node. Specifically, in the first scheme, the relay simply compresses the channel matrix as well as its observation and then forwards them to the destination node. Roughly speaking, this is what happens today in “naive” implementation of RRH systems. Therefore, this scheme can be seen as a baseline scheme. However, the capacity allocated for conveying the CSI to the destination in this scheme is proportional to both the channel input dimension and the number of antennas at the relay. To reduce the channel use required for CSI transmission, in the second achievable scheme, the relay first obtains an estimate of the channel input using channel inversion and then transmits the quantized noise levels as well as the compressed noisy signal to the destination node. In contrast to the first scheme, the capacity allocated to CSI transmission in this scheme is only proportional to the channel input dimension.
Due to the explicit CSI transmission through the bottleneck, the performance of the first and second achievable schemes is sensitive to the MIMO channel dimension, especially the channel input dimension. To ensure that it still performs well when the channel dimension grows large, in the third and fourth achievable schemes, the relay does not convey any CSI to the destination node. In the third scheme, the relay first estimates the channel input using channel inversion and then transmits a truncated representation of the estimate to the destination node. In the fourth scheme, the relay first produces the minimum mean-squared error (MMSE) estimate of the channel input and then source-encodes this estimate. Numerical results show that, with simple symbol-by-symbol oblivious relay processing and compression, the lower bounds obtained by the proposed achievable schemes can come close to the upper bound on a wide range of relevant system parameters.
The rest of this paper is organized as follows. In
Section 2, a MIMO channel with Rayleigh fading is presented and the IB problem for this system is formulated.
Section 3 provides an upper bound to the bottleneck rate. In
Section 4, four achievable schemes are proposed, where each scheme satisfies the bottleneck constraint and gives a lower bound to the bottleneck rate. Numerical results are presented in
Section 5 before the conclusions in
Section 6.
Throughout this paper, we use the following notations. and denote the real space and the complex space, respectively. Boldface upper (lower) case letters are used to denote matrices (vectors). stands for the dimensional identity matrix and denotes the all-zero vector or matrix. Superscript denotes the conjugated-transpose operation, denotes the expectation operation, and . ⊗ and ⊙, respectively, denote the Kronecker product and the Hadamard product.
5. Numerical Results
In this section, we evaluate the lower bounds obtained by different achievable schemes proposed in
Section 4 and compare them with the upper bound derived in
Section 3. Before showing the numerical results, we first give the following lemma, which compares the bottleneck rate of the NDT scheme with those of the other three schemes in the
case.
Lemma 10. When , the NDT scheme outperforms the other three schemes, i.e., Remark 5. Besides the proof in Appendix M, we can also explain Lemma 10 from a more intuitive perspective. When , the destination node can obtain perfect and from the relay by using the NDT scheme. The bottleneck rate is thus determined by the capacity of Channel 1. In the QCI scheme, though the destination node can obtain perfect signal vector and noise power of each channel, the correlation between the elements of the noise vector is neglected since the off-diagonal entries of are not considered. The bottleneck rate obtained by the QCI scheme is thus upper bounded by the capacity of Channel 1. As for the TCI or MMSE schemes, the destination node can obtain perfect or from the relay. However, the bottleneck rate in these two cases is not only affected by the capacity of Channel 1 but is also limited by the performance of zero-forcing or MMSE estimation since the estimation inevitably incurs a loss of information. Hence, the NDT scheme has a better performance when . In the following, we give the numerical results. Note that, when performing the QCI scheme, we choose the quantization levels as quantiles for the sake of convenience.
Figure 2 depicts
versus distortion
D under different configurations of SNR
. It can be found from this figure that
first increases and then decreases with
D. It is thus important to find a good
D to maximize
. Since it is difficult to obtain the explicit expression of (
21), it is not easy to strictly analyze the relationship between
and
D. However, we can intuitively explain
Figure 2 as follows. When using the NDT scheme, the relay quantizes both
and
. Due to the bottleneck constraint
C, there exists a tradeoff. When
D is small, the estimation error of
is small. The destination node can get more CSI, and
thus increases with
D. When
D grows large, though more capacity in
C is allocated for quantizing
, the estimation error of
is large. Hence,
decreases with
D. In the following simulation process, when implementing the NDT scheme, we vary
D, calculate
using (
21), and then let
be the maximum value.
In
Figure 3 and
Figure 4, we performed Monte Carlo simulations to obtain joint entropy
in (
34) and the sum of individual entropies
in (
37). Note that, as stated in
Section 4.2, the complexities of calculating
and
are, respectively, proportional to
and
J. Hence, when
J or
K is large, it becomes quite difficult to obtain
. For example, when
and
, we have
and
65,536, i.e., there are 65,536 points in space
. To obtain a reliable pmf
for each point, the number of channel realizations has to be much greater than 65,536.
Figure 3 shows that the gap between
and
is small. In addition, as
M increases,
approaches
quickly, indicating that the dependence between
becomes weak. This can be explained by considering an extreme case where
. Based on the definition of
and the strong law of large numbers, we almost surely have
when
. Hence,
.
are thus almost independent.
When
and
K increases,
Figure 4 shows that there exists an obvious increase in the gap between
and
. Hence, when
and
K increases, the correlation between
is enhanced. We will thus obtain a gain to
if we use
instead of
. However, we would like to point out the following: First, it can be found from
Figure 4 that, when
, this trend becomes less evident. Second, as shown in the following results, when
, since the QCI scheme uses a lot of capacity in
C to quantize
, its performance is not as good as the TCI scheme or MMSE scheme. Third, when
K or
B is large, it becomes difficult to obtain
. Therefore, when implementing the QCI scheme in the following, we obtain
by using
, i.e., quantizing
separately.
In
Figure 5 and
Figure 6, we investigate the effect of threshold
on
for the cases with
and
, respectively. From these two figures, several observations can be made. First, when
, and
or
K is small,
increases greatly and then decreases with
, indicating that the choice of
has a significant impact on
. It is thus important to look for a good
to maximize
in these cases. Second, when
, and
K as well as
are large or when
,
first remains unchanged and then monotonically decreases with
. In these cases, a small
is good enough to guarantee a large
and a search for
can thus be avoided. For example, when
, we can set
, based on which a simpler expression of
is given in (
69). For the case with
, since
does not exist when
, we can set
to be a fixed small number.
In
Figure 7 and
Figure 8, we compare
with its upper bound
and lower bound
. As expected,
,
, and
all increase with
M and
. When
M or
is small, there is a small gap between
and
, and a small gap between
and
. As
M and
increase, these gaps narrow rapidly and the curves almost coincide, which verifies Remark 2. As a result, when
or
is large, we can set
and use
in (
70) to lower bound
since it has a more concise expression.
In
Figure 9 and
Figure 10, the upper bound
and lower bounds obtained by different schemes are depicted versus SNR
. Several observations can be made from these two figures. First, as expected, all bounds increase with
. Second, when
K,
M, and
are small, the NDT scheme outperforms the other achievable schemes. However, as these parameters increase, the performance of the NDT scheme deteriorates rapidly. This is because, when
K,
M, and
are small, the performance of the considered system is mainly limited by the capacity of Channel 1, and the NDT scheme works well since the destination node can extract more information from the compressed observation of the relay and CSI. However, when
K and
M increase, the NDT scheme requires too many channel uses for CSI transmission. Third, the QCI scheme can obtain a good performance when
K is small. Of course, as stated at the beginning of
Section 4.3, the number of bits required for transmitting quantized noise levels in the QCI scheme is proportional to
K and
B. Hence, the performance of the QCI scheme varies significantly when
K and
B change. Moreover, it is also shown that the performance of the TCI scheme is worse than that of the MMSE scheme in the low SNR regime while getting quite close to that of the MMSE scheme in the high SNR regime. When
grows large, the lower bounds obtained by the TCI and MMSE schemes both approach
C and are larger than those obtained by the NDT and QCI schemes.
In
Figure 11 and
Figure 12, the effect of the bottleneck constraint
C is investigated. From
Figure 11, it can be found that, as
C increases, all bounds grow and converge to different constants, which can be calculated based on Lemmas 1, 3, 5, 7 and 9.
Figure 11 also shows that, thanks to CSI transmission, the NDT and QCI schemes outperform the TCI and MMSE schemes when
C is large. By comparing these two figures, it can be found that, in
Figure 11, no bound approaches
C, even for the case with
, while in
Figure 12, it is possible for
,
, and
to approach
C. For example, when
and
,
. This is because the bottleneck rate is limited by the capacity of Channel 1 and
C. In
Figure 11, since
K and
M are small, the capacity of Channel 1 is smaller than
C. Hence, the bounds of course will not approach
C. In
Figure 12, more multi-antenna gains can be obtained due to larger
K and
M. The capacity of Channel 1 is thus larger than
C in some cases (e.g.,
and
). Hence,
,
, and
may approach
C in these cases. Note that, as shown in
Figure 11, since
is not satisfied,
when
.
In
Figure 13 and
Figure 14, the effect of
M is investigated for different configurations of
. These two figures show that
,
,
, and
all increase monotonically with
M and that, as
M grows,
as well as
become very close to
. For
, except the
case in
Figure 13,
monotonically decreases with
M since the relay has to transmit more channel information to the destination node.
In
Figure 15 and
Figure 16, we set
and depict the upper and lower bounds versus
K or
M. In
Figure 15, we fix
C to 50, while in
Figure 16, we set
, which makes sense since the bottleneck constraint should scale with the number of degrees of freedom of the input signal
. Since we choose the quantization levels as quantiles when performing the QCI scheme, as stated at the end of
Section 4.2,
should be satisfied. Hence, in
Figure 15 and
Figure 16, we only consider
bits when performing the QCI scheme. When
and they grow simultaneously, the capacity of Channel 1 increases due to the muti-antenna gains. Hence, for a fixed
C,
Figure 15 shows that all bounds increase first. When
K or
M grows large,
and
approach the bottleneck constraint
C while
decreases for all values of
B. This is because the number of bits per channel use required for informing the destination node of
in the QCI scheme is proportional to
K while CSI transmission is unnecessary for the TCI and MMSE schemes. For the NDT scheme, since the number of bits required for quantizing
is proportional to both
K and
M, there is only an increase when
K grows from 1 to 2. After that,
decreases monotonically and has the worst performance. In contrast, when
, the bottleneck rate of the system is mainly limited by
C. Hence,
Figure 16 shows that all bounds except
increase almost linearly with
K and that
,
, and
are quite close to
C.