Nearest Neighbor Decoding and Pilot-Aided Channel Estimation for Fading Channels

Asyhari, A. Taufiq; Koch, Tobias; Guillén i Fàbregas, Albert

doi:10.3390/e22090971


Nearest Neighbor Decoding and Pilot-Aided Channel Estimation for Fading Channels^†

by A. Taufiq Asyhari ^1,*, Tobias Koch ^2,3 and Albert Guillén i Fàbregas ^4,5,6

¹ School of Computing and Digital Technology, Birmingham City University, Millennium Point, Birmingham B4 7XG, UK

² Signal Theory and Communications Department, Universidad Carlos III de Madrid, 28911 Leganés, Spain

³ Gregorio Marañón Health Research Institute, 28007 Madrid, Spain

⁴ Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08018 Barcelona, Spain

⁵ Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain

⁶ Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

^* Author to whom correspondence should be addressed.

^† The material in this paper was presented in part at the 2011 IEEE International Symposium on Information Theory (ISIT), Saint Petersburg, Russia, 31 July–5 August 2011 and the 49th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 28–30 September 2011.

Entropy 2020, 22(9), 971; https://doi.org/10.3390/e22090971

Submission received: 7 July 2020 / Revised: 21 August 2020 / Accepted: 26 August 2020 / Published: 31 August 2020

(This article belongs to the Special Issue Information Theory for Communication Systems)


Abstract:

We study the information rates of noncoherent, stationary, Gaussian, and multiple-input multiple-output (MIMO) flat-fading channels that are achievable with nearest neighbor decoding and pilot-aided channel estimation. In particular, we investigate the behavior of these achievable rates in the limit as the signal-to-noise ratio (SNR) tends to infinity by analyzing the capacity pre-log, which is defined as the limiting ratio of the capacity to the logarithm of the SNR as the SNR tends to infinity. We demonstrate that a scheme estimating the channel using pilot symbols and detecting the message using nearest neighbor decoding (while assuming that the channel estimation is perfect) essentially achieves the capacity pre-log of noncoherent multiple-input single-output flat-fading channels, and it essentially achieves the best so far known lower bound on the capacity pre-log of noncoherent MIMO flat-fading channels. Extending the analysis to fading multiple-access channels reveals interesting relationships between the number of antennas and Doppler bandwidth in the comparative performance of joint transmission and time division multiple-access.

Keywords:

achievable rates; fading; high signal-to-noise ratio (SNR); mismatched decoding; multiple-access channels; multiple antennas; nearest neighbor decoding; noncoherent; pilot-aided channel estimation

1. Introduction

The capacity of coherent multiple-input multiple-output (MIMO) channels increases with the signal-to-noise ratio (SNR) as $\min (n_{t}, n_{r}) \log SNR$, where $n_{t}$ and $n_{r}$ are the numbers of transmit and receive antennas, respectively, and $SNR$ denotes the SNR per receive antenna [1,2]. The growth factor $\min (n_{t}, n_{r})$ is sometimes referred to as the capacity pre-log [3] or spatial multiplexing gain [4,5,6]. This capacity growth can be achieved using a nearest neighbor decoder, which selects the codeword that is closest (in Euclidean distance) to the channel output. In fact, for coherent fading channels with additive Gaussian noise, this decoder is the maximum-likelihood decoder and is therefore optimal in the sense that it minimizes the error probability (see [7] and references therein). The coherent channel model assumes that there is a genie that provides the exact fading coefficients to the decoder, an assumption that is difficult to satisfy in practice. In this paper, we replace the genie by a scheme that estimates the fading coefficients via pilot symbols. This can be viewed as a particular coding strategy over a noncoherent fading channel, i.e., a channel where neither communication end has access to the fading coefficients, although both may be aware of the fading statistics. Please note that with imperfect fading estimation, the nearest neighbor decoder that treats the fading estimate as if it were perfect is not necessarily optimal. Nevertheless, we show that in some relevant cases, nearest neighbor decoding with pilot-aided channel estimation achieves the capacity pre-log of noncoherent fading channels. (For noncoherent channels, the capacity pre-log is defined as the limiting ratio of the capacity to the logarithm of the SNR as the SNR tends to infinity.)

The capacity of noncoherent fading channels has been studied in several works. Building upon [8], Hassibi and Hochwald [9] studied the capacity of the block-fading channel and used pilot symbols (also known as training symbols) to obtain reasonably accurate fading estimates. Jindal and Lozano [10] provided tools for a unified treatment of pilot-based channel estimation in both block and stationary fading channels with bandlimited power spectral densities. In these works, lower bounds on the channel capacity were obtained. Lapidoth [3] studied a single-input single-output (SISO) fading channel for more general stationary fading processes and showed that depending on the predictability of the fading process, the capacity growth in SNR can be, inter alia, logarithmic or double logarithmic. The extension of [3] to multiple-input single-output (MISO) fading channels can be found in [11]. A lower bound on the capacity of stationary MIMO fading channels was derived by Etkin and Tse in [12]. With a view to next-generation (5G and beyond) communication networks, there has been an interest in capacity analyses of noncoherent massive MIMO channels with the vast majority of attempts focusing on the block-fading model [13,14,15].

Lapidoth and Shamai [16] and Weingarten et al. [17] studied noncoherent stationary fading channels from a mismatched-decoding perspective. In particular, they studied the achievable rates of Gaussian codebooks and nearest neighbor decoding. In both works, it is assumed that there is a genie that provides imperfect estimates of the fading coefficients.

In this work, we add the estimation of the fading coefficients to the analysis. In particular, we study a communication system where the transmitter emits pilot symbols at regular intervals, and where the receiver separately performs channel estimation and data detection. More precisely, based on the channel outputs corresponding to pilot transmissions, the channel estimator produces estimates of the fading coefficients for the remaining time instants using a linear minimum mean-squared error (LMMSE) interpolator. Using these estimates, the data detector employs a nearest neighbor decoder that detects the transmitted message. We study the achievable rates of this communication scheme at high SNR. In particular, we study the pre-log for fading processes with bandlimited power spectral densities. (The pre-log is defined as the limiting ratio of the achievable rate to the logarithm of the SNR as the SNR tends to infinity.)

For SISO fading channels, using simplifying arguments, Lozano [18] and Jindal and Lozano [10] showed that this scheme achieves the capacity pre-log. In particular, they express the achievable rates of this scheme in terms of the capacity of a fading channel whose SNR is reduced due to the imperfect channel estimation (cf. [10], Equation (17)). Their expression ([10], Equation (21)) for the estimation error is based on the assumption that channel estimation is performed using infinitely many pilot symbols. However, obtaining ([10], Equation (17)) from the provided references is not straightforward, since it requires a limiting argument where both the codeword length and the number of pilot symbols tend to infinity in a controlled manner. The analysis is further complicated by the fact that for a given number of pilot symbols, the estimation error of the LMMSE interpolator is not stationary but cyclo-stationary, and it becomes stationary only as the number of pilot symbols tends to infinity. In this paper, we prove this result without any simplifying assumptions and extend it to MIMO fading channels. We show that the maximum rate pre-log with nearest neighbor decoding and pilot-aided channel estimation is given by the capacity pre-log of the coherent fading channel times the fraction of time used for the transmission of data. Hence, the loss with respect to the coherent case is solely due to the transmission of pilots used to obtain accurate fading estimates. If the inverse of twice the bandwidth of the fading process is an integer, then for MISO channels, the above scheme achieves the capacity pre-log derived by Koch and Lapidoth [11]. For MIMO channels, the above scheme achieves the best lower bound on the capacity pre-log known so far, which was obtained in [12]. The proof steps followed in this paper also apply to other pilot-assisted communication strategies and can be mimicked to perform rigorous analyses of their achievable rates; see, e.g., [19,20,21,22].

The rest of the paper is organized as follows. Section 2 describes the channel model and introduces our transmission scheme along with nearest neighbor decoding and pilots for channel estimation. Section 3 defines the pre-log and presents the main result. Section 4 extends the use of our scheme to a fading multiple-access channel (MAC). Section 5 and Section 6 provide the proofs of our main results. Section 7 summarizes the results and concludes the paper.

2. System Model and Transmission Scheme

We consider a discrete-time MIMO flat-fading channel with $n_{t}$ transmit antennas and $n_{r}$ receive antennas. Thus, the channel output at time instant $k \in ℤ$ (where ℤ denotes the set of integers) is the complex-valued $n_{r}$-dimensional random vector given by

Y_{k} = \sqrt{\frac{SNR}{n_{t}}} H_{k} x_{k} + Z_{k} .

(1)

Here $x_{k} \in ℂ^{n_{t}}$ denotes the time-k channel input vector (with ℂ denoting the set of complex numbers), $H_{k}$ denotes the $(n_{r} \times n_{t})$-dimensional random fading matrix at time k, and $Z_{k}$ denotes the $n_{r}$-variate random additive noise vector at time k.

The noise process $\{Z_{k}, k \in ℤ\}$ is a sequence of independent and identically distributed (i.i.d.) complex-Gaussian random vectors with zero mean and covariance matrix $I_{n_{r}}$, where $I_{n_{r}}$ is the $n_{r} \times n_{r}$ identity matrix. $SNR$ denotes the average SNR for each receive antenna. The fading process $\{H_{k}, k \in ℤ\}$ is stationary, ergodic, and complex-Gaussian. We assume that the $n_{r} \cdot n_{t}$ processes $\{H_{k} (r, t), k \in ℤ\}$, $r = 1, \dots, n_{r}$, $t = 1, \dots, n_{t}$ are independent and have the same law, with each process having zero mean, unit variance, and power spectral density $f_{H} (λ)$, $- \frac{1}{2} \leq λ \leq \frac{1}{2}$. The assumption that the fading processes are independent is realistic for data transmission over a rich uniform scattering environment when both transmit and receive antennas have sufficient separation to ensure independent signal paths that translate into spatially independent fading coefficients. The power spectral density $f_{H} (\cdot)$ is a nonnegative (measurable) function satisfying

E [H_{k + m} (r, t) H_{k}^{*} (r, t)] = \int_{- 1 / 2}^{1 / 2} e^{i 2 π m λ} f_{H} (λ) d λ

(2)

where ${(\cdot)}^{*}$ denotes complex conjugation, and where $i ≜ \sqrt{- 1}$. We assume that $f_{H} (\cdot)$ has bandwidth $λ_{D} < 1 / 2$, i.e., $λ_{D}$ is the smallest value such that $f_{H} (λ) = 0$ for almost every $| λ | > λ_{D}$. We finally assume that the fading process $\{H_{k}, k \in ℤ\}$ and the noise process $\{Z_{k}, k \in ℤ\}$ are independent and that their joint law does not depend on $\{x_{k}, k \in ℤ\}$.
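As a rough illustration of the input-output relation (1), the following NumPy sketch generates a single channel output. The helper names `complex_gaussian` and `channel_output`, the seed, and the numerical values are illustrative assumptions and are not taken from the paper. The sketch draws one fading matrix only, so it does not model the temporal correlation described by the power spectral density.

```python
import numpy as np

def complex_gaussian(shape, rng):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def channel_output(H, x, snr, rng):
    """One channel use of (1): Y = sqrt(SNR / n_t) * H x + Z."""
    n_r, n_t = H.shape
    z = complex_gaussian(n_r, rng)          # noise with covariance I_{n_r}
    return np.sqrt(snr / n_t) * (H @ x) + z

rng = np.random.default_rng(0)              # seed chosen arbitrarily
n_r, n_t, snr = 2, 3, 100.0                 # illustrative values only
H = complex_gaussian((n_r, n_t), rng)       # a single draw; no temporal correlation modeled
x = complex_gaussian(n_t, rng)              # Gaussian input with identity covariance
y = channel_output(H, x, snr, rng)          # complex vector of length n_r
```

The normalization by $n_{t}$ inside the square root keeps the received SNR per antenna equal to $SNR$ when the input has identity covariance.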

The transmission involves both codewords and pilots. The former convey the message to be transmitted, and the latter are used to facilitate the estimation of the fading coefficients at the receiver. We denote a codeword conveying a message m, $m \in M$, at rate R (where $M = \{1, \dots, ⌊ e^{n R} ⌋\}$ is the set of possible messages, and where $⌊ b ⌋$ denotes the largest integer smaller than or equal to b) by the length-n sequence of input vectors ${\bar{x}}_{1} (m), \dots, {\bar{x}}_{n} (m)$. The codeword is selected from the codebook $C$, which is drawn i.i.d. from an $n_{t}$-variate complex-Gaussian distribution with zero mean and identity covariance matrix, so

\frac{1}{n} \sum_{k = 1}^{n} E [{∥{\bar{X}}_{k} (m)∥}^{2}] = n_{t}, m \in M

(3)

where $∥ \cdot ∥$ denotes the Euclidean norm.

To estimate the fading matrix, we transmit orthogonal pilot vectors. The pilot vector $p_{t} \in ℂ^{n_{t}}$ used to estimate the fading coefficients corresponding to the t-th transmit antenna is given by $p_{t} (t) = 1$ and $p_{t} (t^{'}) = 0$ for $t^{'} \neq t$. For example, the first pilot vector is $p_{1} = {(1, 0, \dots, 0)}^{T}$, where ${(\cdot)}^{T}$ denotes the transpose. To estimate the whole fading matrix, we thus need to send the $n_{t}$ pilot vectors $p_{1}, \dots, p_{n_{t}}$.

The transmission scheme is as follows. Every L time instants (for some $L \in ℕ$, where ℕ is the set of positive integers), we transmit the $n_{t}$ pilot vectors $p_{1}, \dots, p_{n_{t}}$. Each codeword is then split up into blocks of $L - n_{t}$ data vectors, which are transmitted after the $n_{t}$ pilot vectors. The process of transmitting $L - n_{t}$ data vectors and $n_{t}$ pilot vectors continues until all n data vectors have been transmitted. Herein we assume that n is an integer multiple of $L - n_{t}$. (If n is not an integer multiple of $L - n_{t}$, then the last $L - n_{t}$ instants are not fully used by data vectors and therefore include time instants where we do not transmit anything. The loss in information rate incurred thereby vanishes as n tends to infinity.) Prior to transmitting the first data block, and after transmitting the last data block, we introduce a guard period of $L (T - 1)$ time instants (for some $T \in ℕ$), where we transmit the $n_{t}$ pilot vectors $p_{1}, \dots, p_{n_{t}}$ every L time instants, but we do not transmit data vectors in between. The guard period ensures that at every time instant, we can employ a channel estimator that bases its estimation on the channel outputs corresponding to the T preceding and the T subsequent pilot transmissions. This facilitates the analysis and, asymptotically, does not incur any loss in terms of achievable rates. The above transmission scheme is illustrated in Figure 1. The channel estimator is described in the following.

Please note that the total blocklength of the above transmission scheme (comprising data vectors, pilot vectors, and guard period) is given by

n^{'} = n_{p} + n + n_{g}

(4)

where $n_{p}$ denotes the number of channel uses reserved for pilot vectors, and where $n_{g}$ denotes the number of channel uses during the silent guard period, i.e.,

n_{p} = (\frac{n}{L - n_{t}} + 1 + 2 (T - 1)) n_{t}

(5)

n_{g} = 2 (L - n_{t}) (T - 1) .

(6)
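The bookkeeping in (4)–(6) can be checked with a small sketch that labels every channel use as pilot, data, or silent guard. The function name `build_schedule` and the exact ordering of the frames are assumptions made for illustration; the layout in Figure 1 may differ in detail, but any layout with the same counts satisfies (4)–(6).

```python
def build_schedule(n, L, n_t, T):
    """Label each channel use as pilot 'P', data 'D', or silent guard 'G'."""
    block = L - n_t                          # data vectors per training period
    if block <= 0 or n % block != 0:
        raise ValueError("need L > n_t and n a multiple of L - n_t")
    guard_frame = ["P"] * n_t + ["G"] * block
    data_frame = ["P"] * n_t + ["D"] * block
    return (guard_frame * (T - 1)            # leading guard period
            + data_frame * (n // block)      # pilots followed by data blocks
            + guard_frame * (T - 1)          # trailing guard period
            + ["P"] * n_t)                   # closing pilot group

labels = build_schedule(n=12, L=5, n_t=2, T=3)   # illustrative values only
n_p, n_d, n_g = (labels.count(c) for c in "PDG")
print(n_p, n_d, n_g, len(labels))                # 18 12 12 42
```

For these values, (5) gives $n_{p} = (4 + 1 + 4) \cdot 2 = 18$ and (6) gives $n_{g} = 2 \cdot 3 \cdot 2 = 12$, which matches the counts of the labels.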

We now turn to the decoder. Let $D$ denote the set of integers reserved for the transmission of data vectors, and let $P$ denote the set of integers reserved for the transmission of pilot symbols. The decoder consists of two parts: a channel estimator and a data detector. To estimate the fading coefficient at a given time instant, the channel estimator considers the channel output vectors $Y_{k^{'}}$, $k^{'} \in P$ corresponding to the T preceding and T subsequent pilot transmissions and estimates $H_{k} (r, t)$ using a linear interpolator. The estimate ${\hat{H}}_{k}^{(T)} (r, t)$ of the fading coefficient $H_{k} (r, t)$ is thus given by

{\hat{H}}_{k}^{(T)} (r, t) = \sum_{\substack{k^{'} = k - T L : \\ k^{'} \in P}}^{k + T L} a_{k^{'}} (r, t) Y_{k^{'}} (r)

(7)

where the coefficients $a_{k^{'}} (r, t)$ are chosen in order to minimize the mean-squared error. (It has been shown in [23] that for the linear interpolator in (7), only the observations when pilots are transmitted, i.e., $Y_{k^{'}}$, $k^{'} \in P$, are relevant for fading estimation.) In general, these coefficients depend on k and T. However, for the sake of compactness, we do not reflect this dependence in the notation.

Please note that since each pilot vector transmits from only one antenna, the fading coefficients corresponding to all pairs $(r, t)$ of receive and transmit antennas can be observed. Furthermore, please note that since the fading processes $\{H_{k} (r, t), k \in ℤ\}$, $r = 1, \dots, n_{r}$, $t = 1, \dots, n_{t}$ are independent, estimating $H_{k} (r, t)$ only based on $\{Y_{k} (r), k \in ℤ\}$ rather than on $\{Y_{k}, k \in ℤ\}$ incurs no loss in optimality.

Since the time-lags between $H_{k}$, $k \in D$ and the observations $Y_{k^{'}}$, $k^{'} \in P$ depend on k, it follows that the interpolation error

E_{k}^{(T)} (r, t) ≜ H_{k} (r, t) - {\hat{H}}_{k}^{(T)} (r, t)

(8)

is not stationary but cyclo-stationary with period L. It can be shown that, irrespective of r, the variance of the interpolation error

ϵ_{ℓ, T}^{2} (r, t) ≜ E [{|H_{k} (r, t) - {\hat{H}}_{k}^{(T)} (r, t)|}^{2}]

(9)

tends to the following expression as T tends to infinity [23]:

ϵ_{ℓ}^{2} (t) ≜ \lim_{T \to \infty} ϵ_{ℓ, T}^{2} (r, t)

(10)

= 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {| f_{L, ℓ - t + 1} (λ) |}^{2}}{SNR f_{L, 0} (λ) + n_{t}} d λ

(11)

where $ℓ ≜ k \bmod L$ denotes the remainder of $k / L$. Here $f_{L, ℓ} (\cdot)$ is given by

f_{L, ℓ} (λ) = \frac{1}{L} \sum_{ν = 0}^{L - 1} {\bar{f}}_{H} (\frac{λ - ν}{L}) e^{i 2 π ℓ \frac{λ - ν}{L}}, ℓ = 0, \dots, L - 1

(12)

and ${\bar{f}}_{H} (\cdot)$ is the periodic continuation of $f_{H} (\cdot)$, i.e., it is the periodic function of period 1 (with fundamental interval $[- 1 / 2, 1 / 2)$) that coincides with $f_{H} (λ)$ for $- 1 / 2 \leq λ \leq 1 / 2$. If

L \leq \frac{1}{2 λ_{D}}

(13)

then $| f_{L, ℓ} (\cdot) |$ becomes

| f_{L, ℓ} (λ) | = f_{L, 0} (λ) = \frac{1}{L} f_{H} (\frac{λ}{L}) .

(14)

In this case, irrespective of ℓ and t, the variance of the interpolation error is given by

ϵ_{ℓ}^{2} (t) = ϵ^{2} = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} d λ, ℓ = 0, \dots, L - 1, t = 1, \dots, n_{t}

(15)

which vanishes as the $SNR$ tends to infinity. Recall that $λ_{D}$ denotes the bandwidth of $f_{H} (\cdot)$. Thus, (13) implies that no aliasing occurs when we undersample the fading process by a factor of L. Please note that in contrast to (11), the variance in (15) is independent of the transmit antenna index t. See Section 5.1 for a more detailed discussion.
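To get a feel for (15), the following sketch evaluates the interpolation error variance numerically for one specific spectrum. The rectangular power spectral density $f_{H} (λ) = 1 / (2 λ_{D})$ for $| λ | \leq λ_{D}$ is an assumption made here for illustration (the paper allows general bandlimited spectra), and the function names are hypothetical. For this spectrum, (15) reduces to the closed form $2 λ_{D} L n_{t} / (SNR + 2 λ_{D} L n_{t})$, which the code uses as a cross-check.

```python
import numpy as np

def error_variance_rect(snr, L, n_t, lambda_d, num=100_000):
    """Midpoint-rule evaluation of (15) for a rectangular PSD (an assumed example)."""
    if L * 2.0 * lambda_d > 1.0:
        raise ValueError("(15) requires L <= 1 / (2 * lambda_d)")
    step = 2.0 * lambda_d / num
    lam = -lambda_d + (np.arange(num) + 0.5) * step   # f_H is zero outside the band
    f = np.full_like(lam, 1.0 / (2.0 * lambda_d))     # unit variance: f_H integrates to 1
    return 1.0 - np.sum(snr * f**2 / (snr * f + L * n_t)) * step

def error_variance_closed_form(snr, L, n_t, lambda_d):
    """Closed form of (15) for the same rectangular PSD."""
    return 2.0 * lambda_d * L * n_t / (snr + 2.0 * lambda_d * L * n_t)

numeric = error_variance_rect(snr=1e3, L=4, n_t=2, lambda_d=0.1)
exact = error_variance_closed_form(snr=1e3, L=4, n_t=2, lambda_d=0.1)
```

Increasing `snr` in this sketch shows the variance decaying roughly like the reciprocal of the SNR, in line with the statement that (15) vanishes as the SNR tends to infinity.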

The channel estimator feeds the sequence of fading estimates $\{{\hat{H}}_{k}^{(T)}, k \in D\}$ (which is composed of the matrix entries $\{{\hat{H}}_{k}^{(T)} (r, t), k \in D\}$) to the data detector. We shall denote its realization by $\{{\hat{H}}_{k}^{(T)}, k \in D\}$. Based on the channel outputs $\{y_{k}, k \in D\}$ and fading estimates $\{{\hat{H}}_{k}^{(T)}, k \in D\}$, the data detector uses a nearest neighbor decoder to guess which message was transmitted. Thus, the decoder decides on the message $\hat{m}$ that satisfies

\hat{m} = arg \min_{m \in M} D (m)

(16)

where

D (m) ≜ \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} x_{k} (m)∥}^{2} .

(17)

On the right-hand side of (17), assuming that the first pilot symbol is transmitted at time $k = 0$, we define

D^{(n^{'})} ≜ {0, \dots, n^{'} - 1} \cap D

(18)

as the set of time indices where data vectors corresponding to a codeword of length $n^{'}$ are transmitted. (For comparison, $D$ represents the set of all integers that are reserved for the transmission of data vectors.)
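The decision rule (16)–(17) can be sketched in a few lines of NumPy. The function name `nearest_neighbor_decode`, the array layouts, and the toy parameters are assumptions for illustration only. In the demonstration, the true fading matrices are passed as estimates, which corresponds to an idealized estimator rather than the LMMSE interpolator of (7), and messages are indexed from 0 rather than 1.

```python
import numpy as np

def nearest_neighbor_decode(y, H_hat, codebook, snr):
    """Return the index m minimizing D(m) in (17).

    y:        (n, n_r) channel outputs at the data positions
    H_hat:    (n, n_r, n_t) fading estimates, treated as if they were exact
    codebook: (M, n, n_t) candidate codewords
    """
    n_t = codebook.shape[-1]
    # predicted[m, k] = sqrt(SNR / n_t) * H_hat[k] @ codebook[m, k]
    predicted = np.sqrt(snr / n_t) * np.einsum("krt,mkt->mkr", H_hat, codebook)
    distances = np.sum(np.abs(y[None, :, :] - predicted) ** 2, axis=(1, 2))
    return int(np.argmin(distances))

rng = np.random.default_rng(1)                      # seed chosen arbitrarily
M, n, n_r, n_t, snr = 8, 16, 2, 2, 100.0            # toy sizes, not from the paper

def cgauss(shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

codebook = cgauss((M, n, n_t))                      # i.i.d. Gaussian codebook
H = cgauss((n, n_r, n_t))                           # fading, independent across time here
sent = 5
y = np.sqrt(snr / n_t) * np.einsum("krt,kt->kr", H, codebook[sent]) + cgauss((n, n_r))
decoded = nearest_neighbor_decode(y, H, codebook, snr)   # H_hat = H: idealized estimates
```

Because the decoder evaluates the metric for every codeword, its cost in this sketch grows linearly with the number of messages, which is why it is only practical here for toy codebook sizes.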

3. The Pre-Log

We say that a rate

R (SNR) ≜ \frac{log | M |}{n}

(19)

is achievable if there exists a code of length n with $| M |$ codewords such that the error probability tends to zero as n tends to infinity. In this work, we study the set of rates that are achievable with nearest neighbor decoding and pilot-aided channel estimation. We focus on the achievable rates at high $SNR$. In particular, we are interested in the maximum achievable pre-log, defined as

Π_{R^{*}} ≜ \underset{SNR \to \infty}{lim sup} \frac{R^{*} (SNR)}{log SNR}

(20)

where $R^{*} (SNR)$ is the maximum rate achievable with nearest neighbor decoding and pilot-aided channel estimation, maximized over all possible encoders.

The capacity pre-log of SISO fading channels (which is given by (20) but with $R^{*} (SNR)$ replaced by the capacity $C (SNR)$) was computed by Lapidoth [3] as

Π_{C} = μ ({λ : f_{H} (λ) = 0})

(21)

where $μ (\cdot)$ denotes the Lebesgue measure on the interval $[- 1 / 2, 1 / 2]$. (The capacity is defined as the supremum of all achievable rates maximized over all possible encoders and decoders.) Koch and Lapidoth [11] extended this result to MISO fading channels and showed that if the fading processes $\{H_{k} (t), k \in ℤ\}$, $t = 1, \dots, n_{t}$ are independent and have the same law, then the capacity pre-log of MISO fading channels is equal to the capacity pre-log of the SISO fading channel with fading process $\{H_{k} (1), k \in ℤ\}$. Using (21), the capacity pre-log of MISO fading channels with bandlimited power spectral densities of bandwidth $λ_{D}$ can be evaluated as

Π_{C} = 1 - 2 λ_{D} .

(22)

Since $R^{*} (SNR) \leq C (SNR)$, it follows that $Π_{R^{*}} \leq Π_{C}$.

To the best of our knowledge, the capacity pre-log of MIMO fading channels is unknown. For independent fading processes $\{H_{k} (r, t), k \in ℤ\}$, $t = 1, \dots, n_{t}$, $r = 1, \dots, n_{r}$ that have the same law, the best lower bound on the MIMO pre-log known so far is due to Etkin and Tse [12], and is given by

Π_{C} \geq \min (n_{t}, n_{r}) (1 - \min (n_{t}, n_{r}) μ ({λ : f_{H} (λ) > 0})) .

(23)

For power spectral densities that are bandlimited to $λ_{D}$, this becomes

Π_{C} \geq \min (n_{t}, n_{r}) (1 - \min (n_{t}, n_{r}) 2 λ_{D}) .

(24)

Observe that (24) specializes to (22) for $n_{r} = 1$.

It should be noted that the capacity pre-log for MISO and SISO fading channels was derived under a peak-power constraint on the channel inputs, whereas the lower bound on the capacity pre-log for MIMO fading channels was derived under an average-power constraint. Clearly, the capacity pre-log corresponding to a peak-power constraint can never be larger than the capacity pre-log corresponding to an average-power constraint. It is believed that the two pre-logs are in fact identical (see the conclusions in [3]).

In this paper, we show that a communication scheme that employs nearest neighbor decoding and pilot-aided channel estimation achieves the following pre-log.

Theorem 1.

Consider the Gaussian MIMO flat-fading channel (1) with $n_{t}$ transmit antennas and $n_{r}$ receive antennas. Then, the transmission and decoding scheme described in Section 2 achieves

Π_{R^{*}} \geq \min (n_{t}, n_{r}) (1 - \frac{\min (n_{t}, n_{r})}{L^{*}})

(25)

where $L^{*} = ⌊\frac{1}{2 λ_{D}}⌋$.

Proof.

See Section 5. □
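The right-hand side of (25) is easy to tabulate. The sketch below uses exact rational arithmetic so that the floor in $L^{*}$ is not affected by floating-point rounding; the function name `prelog_lower_bound` and the example bandwidths are illustrative assumptions.

```python
from fractions import Fraction
from math import floor

def prelog_lower_bound(n_t, n_r, lambda_d):
    """Right-hand side of (25) with L* = floor(1 / (2 * lambda_d))."""
    lambda_d = Fraction(lambda_d)
    if not 0 < lambda_d < Fraction(1, 2):
        raise ValueError("need 0 < lambda_d < 1/2")
    m = min(n_t, n_r)
    l_star = floor(1 / (2 * lambda_d))       # largest admissible training period
    return m * (1 - Fraction(m, l_star))

print(prelog_lower_bound(1, 1, Fraction(1, 20)))   # 9/10, equal to 1 - 2 * lambda_d
print(prelog_lower_bound(2, 2, Fraction(1, 20)))   # 8/5
print(prelog_lower_bound(1, 1, Fraction(3, 20)))   # 2/3, below 1 - 2 * lambda_d = 7/10
```

The last call illustrates the effect of the floor: when $1 / (2 λ_{D})$ is not an integer, the bound (25) is strictly smaller than the expression obtained by replacing $L^{*}$ with $1 / (2 λ_{D})$.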

Remark 1.

We derive Theorem 1 for i.i.d. Gaussian codebooks, which satisfy the average-power constraint (3). Nevertheless, it can be shown that Theorem 1 continues to hold when the channel inputs satisfy a peak-power constraint. More specifically, we show in Section 5.3 that for an input distribution with power constraint $E [∥ \bar{X} ∥^{2}] \leq n_{t}$ to achieve the pre-log (25), it is sufficient that its probability density function $p_{\bar{X}} (\cdot)$ satisfies

p_{\bar{X}} (\bar{x}) \leq \frac{K}{π^{n_{t}}} e^{- ∥ \bar{x} ∥^{2}}, \bar{x} \in ℂ^{n_{t}}

(26)

for some K satisfying

lim_{SNR \to \infty} \frac{log K}{log SNR} = 0 .

(27)

The condition (26) is satisfied, for example, by i.i.d., truncated, Gaussian inputs, i.e., by inputs for which the $n_{t}$ elements in $\bar{X}$ are i.i.d. and

p_{\bar{X}} (\bar{x}) = \begin{cases} \frac{1}{\hat{K} π^{n_{t}}} e^{- ∥ \bar{x} ∥^{2}}, & \text{if } | \bar{x} (t) | \leq 1, \; 1 \leq t \leq n_{t} \\ 0, & \text{otherwise} \end{cases}

(28)

with

\hat{K} = {(\int_{| \bar{x} | \leq 1} \frac{1}{π} e^{- | \bar{x} |^{2}} d \bar{x})}^{n_{t}} .

(29)

If $1 / (2 λ_{D})$ is an integer, then (25) becomes

Π_{R^{*}} \geq \min (n_{t}, n_{r}) (1 - \min (n_{t}, n_{r}) 2 λ_{D}) .

(30)

Thus, in this case nearest neighbor decoding together with pilot-aided channel estimation achieves the capacity pre-log of MISO fading channels (22) as well as the lower bound on the capacity pre-log of MIMO fading channels (24).

Suppose that both transmitter and receiver use the same number of antennas, namely, ${n_{t}}^{'} ≜ {n_{r}}^{'} ≜ \min (n_{t}, n_{r})$. Then, as the codeword length tends to infinity, we have from (4)–(6) that the fraction of time consumed for the transmission of pilots is given by

lim_{n \to \infty} \frac{n_{p}}{n^{'}} = lim_{n \to \infty} \frac{(\frac{n}{L - {n_{t}}^{'}} + 1 + 2 (T - 1)) {n_{t}}^{'}}{(\frac{n}{L - {n_{t}}^{'}} + 1 + 2 (T - 1)) {n_{t}}^{'} + n + 2 (L - {n_{t}}^{'}) (T - 1)} = \frac{{n_{t}}^{'}}{L} .

(31)

Consequently, by rewriting the pre-log (25) as

Π_{R^{*}} \geq {n_{t}}^{'} (1 - \frac{{n_{t}}^{'}}{L}), L \leq \frac{1}{2 λ_{D}}

(32)

we observe that the loss compared to the capacity pre-log ${n_{t}}^{'} = \min (n_{t}, n_{r})$ of the coherent fading channel is given by the fraction of time used for the transmission of pilots. This implies that the nearest neighbor decoder in combination with the channel estimator described in Section 2 is optimal at high SNR in the sense that it achieves the capacity pre-log of the coherent fading channel. Moreover, the achievable pre-log in Theorem 1 is the best pre-log that can be achieved by any scheme employing ${n_{t}}^{'}$ pilot vectors.
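The limit (31) can be checked numerically by evaluating $n_{p} / n^{'}$ from (4)–(6) for a growing number of data vectors. The function name `pilot_fraction` and the parameter values are illustrative assumptions; the argument `n_t` plays the role of ${n_{t}}^{'}$.

```python
def pilot_fraction(n, L, n_t, T):
    """Fraction n_p / n' of channel uses spent on pilots, from (4)-(6)."""
    if n % (L - n_t) != 0:
        raise ValueError("n must be a multiple of L - n_t")
    n_p = (n // (L - n_t) + 1 + 2 * (T - 1)) * n_t
    n_g = 2 * (L - n_t) * (T - 1)
    return n_p / (n_p + n + n_g)

# With L = 5 and n_t = 2, the fraction approaches n_t / L = 0.4 as n grows.
for n in (30, 3_000, 3_000_000):
    print(n, pilot_fraction(n, L=5, n_t=2, T=3))
```

For short codewords the guard period inflates the pilot fraction, and its contribution becomes negligible as n grows, which is the sense in which the guard period incurs no asymptotic loss.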

To achieve the pre-log in Theorem 1, we assume that the training period L satisfies $L \leq \frac{1}{2 λ_{D}}$, in which case the variance of the interpolation error (15), namely

ϵ^{2} = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} d λ \approx \frac{2 λ_{D} L n_{t}}{SNR}

(33)

vanishes as the reciprocal of the SNR. The achievable pre-log is then maximized by maximizing $L \leq \frac{1}{2 λ_{D}}$. Please note that as a criterion of “perfect side information” for nearest neighbor decoding in fading channels, Lapidoth and Shamai [16] suggested that the variance of the fading estimation error should be negligible compared to the reciprocal of the SNR. The condition $L \leq \frac{1}{2 λ_{D}}$ can thus be viewed as a sufficient condition for obtaining “nearly perfect side information” in the sense that the variance of the interpolation error is of the same order as the reciprocal of the SNR.
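The approximation in (33) can be spelled out in a few steps. The sketch below uses the unit-variance normalization, which by (2) with $m = 0$ gives $\int_{- 1 / 2}^{1 / 2} f_{H} (λ) d λ = 1$, and it additionally assumes that $f_{H} (λ) > 0$ for almost every $| λ | \leq λ_{D}$. This positivity assumption is made here for illustration and is not stated explicitly in the text.

```latex
\begin{align*}
\epsilon^{2}
  &= \int_{-1/2}^{1/2} f_{H}(\lambda)\, d\lambda
     - \int_{-1/2}^{1/2}
       \frac{\mathrm{SNR}\,[f_{H}(\lambda)]^{2}}
            {\mathrm{SNR}\, f_{H}(\lambda) + L n_{t}}\, d\lambda \\
  &= \int_{-\lambda_{D}}^{\lambda_{D}}
       \frac{L n_{t}\, f_{H}(\lambda)}
            {\mathrm{SNR}\, f_{H}(\lambda) + L n_{t}}\, d\lambda \\
  &\approx \int_{-\lambda_{D}}^{\lambda_{D}} \frac{L n_{t}}{\mathrm{SNR}}\, d\lambda
   = \frac{2 \lambda_{D} L n_{t}}{\mathrm{SNR}}
   \qquad \text{(for large SNR)}.
\end{align*}
```

The second line combines the two integrands over the band where $f_{H}$ is nonzero, and the last line drops the term $L n_{t}$ in the denominator, which is negligible compared to $SNR f_{H} (λ)$ at high SNR.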

Of course, one could increase the training period L beyond $\frac{1}{2 λ_{D}}$. By increasing L, we reduce the rate loss due to the transmission of pilots as indicated in (32) at the cost of obtaining a larger fading estimation error, which in turn may reduce the reliability of the nearest neighbor decoder. To understand this trade-off better, we next briefly discuss the achievable pre-log when $L > \frac{1}{2 λ_{D}}$. Indeed, for $L > \frac{1}{2 λ_{D}}$, the variance of the interpolation error follows from (11) as

ϵ_{ℓ}^{2} (t) = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {|f_{L, ℓ - t + 1} (λ)|}^{2}}{SNR f_{L, 0} (λ) + n_{t}} d λ

(34)

= \int_{- 1 / 2}^{1 / 2} \frac{n_{t} f_{L, 0} (λ)}{SNR f_{L, 0} (λ) + n_{t}} d λ + \int_{- 1 / 2}^{1 / 2} \frac{SNR ({[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ - t + 1} (λ)|}^{2})}{SNR f_{L, 0} (λ) + n_{t}} d λ .

(35)

The former integral

\int_{- 1 / 2}^{1 / 2} \frac{n_{t} f_{L, 0} (λ)}{SNR f_{L, 0} (λ) + n_{t}} d λ \approx \frac{n_{t}}{SNR}

(36)

vanishes as the reciprocal of the SNR. However, we prove in Appendix B that, as the SNR tends to infinity, the latter integral

\int_{- 1 / 2}^{1 / 2} \frac{SNR ({[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ - t + 1} (λ)|}^{2})}{SNR f_{L, 0} (λ) + n_{t}} d λ

(37)

is bounded away from zero. This implies that the interpolation error (35) does not vanish as the SNR tends to infinity, and the pre-log achievable with the scheme described in Section 2 is zero. It thus follows that the condition $L \leq \frac{1}{2 λ_{D}}$ is necessary in order to achieve a positive pre-log.
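The effect of aliasing can be observed numerically by evaluating (11) with the aliased spectra (12). As before, the rectangular power spectral density is an assumption made for illustration, and all function names are hypothetical. With $λ_{D} = 0.1$, the choice $L = 4$ satisfies (13), whereas $L = 8$ violates it.

```python
import numpy as np

def rect_psd_periodic(x, lambda_d):
    """Periodic continuation (period 1) of a rectangular PSD of bandwidth lambda_d."""
    wrapped = x - np.round(x)                       # map into [-1/2, 1/2]
    return np.where(np.abs(wrapped) <= lambda_d, 1.0 / (2.0 * lambda_d), 0.0)

def f_L(lam, L, ell, lambda_d):
    """Aliased spectrum f_{L, ell}(lambda) from (12)."""
    nu = np.arange(L)[:, None]
    arg = (lam[None, :] - nu) / L
    terms = rect_psd_periodic(arg, lambda_d) * np.exp(2j * np.pi * ell * arg)
    return terms.sum(axis=0) / L

def error_variance(snr, L, ell, t, n_t, lambda_d, num=100_000):
    """Limiting interpolation error variance (11), midpoint rule on [-1/2, 1/2]."""
    step = 1.0 / num
    lam = -0.5 + (np.arange(num) + 0.5) * step
    f0 = f_L(lam, L, 0, lambda_d).real
    f_shift = np.abs(f_L(lam, L, ell - t + 1, lambda_d))
    return 1.0 - np.sum(snr * f_shift**2 / (snr * f0 + n_t)) * step

no_alias = error_variance(snr=1e4, L=4, ell=2, t=1, n_t=1, lambda_d=0.1)
aliased = error_variance(snr=1e6, L=8, ell=4, t=1, n_t=1, lambda_d=0.1)
```

In this example `no_alias` agrees with the closed form of (15) for the rectangular spectrum, while `aliased` stays large even at a much higher SNR, consistent with the claim that the integral (37) is bounded away from zero when $L > \frac{1}{2 λ_{D}}$.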

Comparing (24) and (25) with the capacity pre-log $\min (n_{t}, n_{r})$ of the coherent fading channel, we observe that, for a fading process of bandwidth $λ_{D}$, the penalty for not knowing the fading coefficients is roughly ${(\min (n_{t}, n_{r}))}^{2} \cdot 2 λ_{D}$. Consequently, the lower bound (25) does not grow linearly with $\min (n_{t}, n_{r})$, but it is a quadratic function of $\min (n_{t}, n_{r})$ that achieves its maximum at

\min (n_{t}, n_{r}) = \frac{L^{*}}{2} .

(38)

This gives rise to the lower bound

Π_{R^{*}} \geq \frac{L^{*}}{4}

(39)

which cannot be larger than $1 / (8 λ_{D})$. The same holds for the lower bound (23).
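The maximization behind (38) and (39) can be written out as follows. The sketch treats $m = \min (n_{t}, n_{r})$ as a real variable, which is a simplification: with an integer number of antennas the maximum $L^{*} / 4$ is attained exactly only when $L^{*}$ is even.

```latex
\begin{align*}
g(m) &= m \left( 1 - \frac{m}{L^{*}} \right), &
g'(m) &= 1 - \frac{2m}{L^{*}} = 0
  \;\Longrightarrow\; m = \frac{L^{*}}{2}, \\
g\!\left( \frac{L^{*}}{2} \right) &= \frac{L^{*}}{2} \cdot \frac{1}{2} = \frac{L^{*}}{4}, &
\frac{L^{*}}{4} &\leq \frac{1}{4} \cdot \frac{1}{2 \lambda_{D}} = \frac{1}{8 \lambda_{D}} .
\end{align*}
```

The last inequality uses $L^{*} = ⌊\frac{1}{2 λ_{D}}⌋ \leq \frac{1}{2 λ_{D}}$.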

4. Fading Multiple-Access Channels

In this section, we extend the use of nearest neighbor decoding with pilot-aided channel estimation to the fading MAC depicted in Figure 2. We are interested in the pre-log region that can be achieved with this scheme.

We consider a two-user MIMO fading MAC, where two terminals wish to communicate with a third one, and where the channels between the terminals are MIMO fading channels. The extension to more than two users is straightforward. The first user has $n_{t, 1}$ antennas, the second user has $n_{t, 2}$ antennas, and the receiver has $n_{r}$ antennas. The channel output at time instant $k \in ℤ$ is a complex-valued $n_{r}$-dimensional random vector given by

Y_{k} = \sqrt{SNR} H_{1, k} x_{1, k} + \sqrt{SNR} H_{2, k} x_{2, k} + Z_{k} .

(40)

Here $x_{s, k} \in ℂ^{n_{t, s}}$ denotes the time-k channel input vector corresponding to user s, $s = 1, 2$; $H_{s, k}$ denotes the $(n_{r} \times n_{t, s})$-dimensional fading matrix at time k corresponding to user s, $s = 1, 2$; $SNR$ denotes the average SNR for each transmit antenna; and $Z_{k}$ denotes the $n_{r}$-variate additive noise vector at time k. The fading processes $\{H_{s, k}, k \in ℤ\}$, $s = 1, 2$ are independent of each other and of the noise process $\{Z_{k}, k \in ℤ\}$ and have the same distribution as the fading process considered in the point-to-point channel (Section 2). The noise process $\{Z_{k}, k \in ℤ\}$ is a sequence of i.i.d. complex-Gaussian vectors with zero mean and covariance matrix $I_{n_{r}}$.

Both users transmit codewords and pilot symbols over the channel (40). To transmit the messages $m_{s} \in \{1, \dots, ⌊ e^{n R_{s}} ⌋\}$, $s = 1, 2$ (where $m_{1}$ and $m_{2}$ are drawn independently), each user’s encoder selects a codeword of length n from a codebook $C_{s}$, where $C_{s}$, $s = 1, 2$ are drawn i.i.d. from an $n_{t, s}$-variate, zero-mean, complex-Gaussian distribution of covariance matrix $I_{n_{t, s}}$. Similar to the single-user case, orthogonal pilot vectors are used. The pilot vector $p_{s, t} \in ℂ^{n_{t, s}}$, $s = 1, 2$, $t = 1, \dots, n_{t, s}$ used to estimate the fading coefficients from transmit antenna t of user s is given by $p_{s, t} (t) = 1$ and $p_{s, t} (t^{'}) = 0$ for $t^{'} \neq t$. For example, the first pilot vector of user s is given by ${(1, 0, \dots, 0)}^{T}$. To estimate the fading matrices $H_{1, k}$ and $H_{2, k}$, each training period requires the transmission of $(n_{t, 1} + n_{t, 2})$ pilot vectors $p_{1, 1}, \dots, p_{1, n_{t, 1}}, p_{2, 1}, \dots, p_{2, n_{t, 2}}$.

Assuming that transmission from both users is synchronized, the transmission scheme extends the point-to-point setup in Section 2 to the two-user MAC setup as illustrated in Figure 3. Every L time instants (for some $L \geq n_{t, 1} + n_{t, 2}$, $L \in ℕ$), user 1 first transmits the $n_{t, 1}$ pilot vectors $p_{1, 1}, \dots, p_{1, n_{t, 1}}$. Once the transmission of the $n_{t, 1}$ pilot vectors ends, user 2 transmits its $n_{t, 2}$ pilot vectors $p_{2, 1}, \dots, p_{2, n_{t, 2}}$. The codewords for both users are then split up into blocks of $(L - n_{t, 1} - n_{t, 2})$ data vectors, which are transmitted simultaneously after the $(n_{t, 1} + n_{t, 2})$ pilot vectors. The process of transmitting $(L - n_{t, 1} - n_{t, 2})$ data vectors and $(n_{t, 1} + n_{t, 2})$ pilot vectors continues until all n data vectors have been transmitted. Herein we assume that n is an integer multiple of $(L - n_{t, 1} - n_{t, 2})$. (As in the point-to-point setup, in the limit as n tends to infinity, this assumption is not critical in terms of achievable rates.) Prior to transmitting the first data block, and after transmitting the last data block, a guard period of $L (T - 1)$ time instants (for some $T \in ℕ$) is introduced for the purpose of channel estimation, where we transmit the $(n_{t, 1} + n_{t, 2})$ pilot vectors every L time instants but we do not transmit data vectors in between. Please note that codewords from both users are jointly transmitted at the same time instants, whereas pilots from both users do not interfere and are transmitted separately at different time instants. The total blocklength of this transmission scheme (comprising data vectors, pilot vectors, and guard period) is given by

n^{'} = n_{p} + n + n_{g}

(41)

where

n_{p}

and

n_{g}

are

n_{p} = (\frac{n}{L - n_{t, 1} - n_{t, 2}} + 1 + 2 (T - 1)) (n_{t, 1} + n_{t, 2})

(42)

n_{g} = 2 (L - n_{t, 1} - n_{t, 2}) (T - 1) .

(43)
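The bookkeeping behind the total blocklength can be checked with a few lines of Python. This is only a sketch: the function name `blocklength` and the error handling are our own choices, and we assume, as in the text, that n is an integer multiple of the number of data vectors per period.

```python
def blocklength(n, L, T, n_t1, n_t2):
    """Return (n_p, n_g, n_prime) for the joint-transmission scheme.

    n     : number of data vectors (codeword length)
    L     : pilot period, with L > n_t1 + n_t2 so that data slots exist
    T     : window-size parameter of the channel estimator
    n_t1, n_t2 : number of transmit antennas of users 1 and 2
    """
    pilots_per_period = n_t1 + n_t2
    data_per_period = L - pilots_per_period
    if data_per_period <= 0:
        raise ValueError("L must exceed n_t1 + n_t2")
    if n % data_per_period != 0:
        raise ValueError("n must be a multiple of L - n_t1 - n_t2")
    # Pilot vectors: one training period per data block, one closing
    # training period, and 2 (T - 1) training periods in the guard periods.
    n_p = (n // data_per_period + 1 + 2 * (T - 1)) * pilots_per_period
    # Guard instants that carry neither pilots nor data.
    n_g = 2 * data_per_period * (T - 1)
    return n_p, n_g, n_p + n + n_g


if __name__ == "__main__":
    print(blocklength(n=12, L=5, T=2, n_t1=1, n_t2=1))
```

For n = 12, L = 5, T = 2 and single-antenna users, the four data blocks occupy 4L = 20 instants, the closing training period adds 2, and the two guard periods add 2 L (T - 1) = 10, which agrees with n' = 32.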

Similar to the single-user case, the receiver guesses which messages have been transmitted using a two-part decoder that consists of a channel estimator and a data detector. The channel estimator first obtains matrix-valued fading estimates

{{\hat{H}}_{s, k}^{(T)}, k \in D}

,

s = 1, 2

from the received pilots

Y_{k^{'}}

,

k^{'} \in P

using the same linear interpolator as (7). From the received codeword

{y_{k}, k \in D}

and the channel-estimate matrices

{{\hat{H}}_{s, k}^{(T)}, k \in D}

,

s = 1, 2

(which are the realizations of

{{\hat{H}}_{s, k}^{(T)}, k \in D}

,

s = 1, 2

), the decoder chooses the pair of messages

({\hat{m}}_{1}, {\hat{m}}_{2})

that minimizes the distance metric

({\hat{m}}_{1}, {\hat{m}}_{2}) = arg \min_{(m_{1}, m_{2})} D (m_{1}, m_{2})

(44)

where

D (m_{1}, m_{2}) ≜ \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{SNR} {\hat{H}}_{1, k}^{(T)} x_{1, k} (m_{1}) - \sqrt{SNR} {\hat{H}}_{2, k}^{(T)} x_{2, k} (m_{2})∥}^{2}

(45)

and where

D^{(n^{'})}

is defined in the same way as (18). In the following, we shall refer to the above communication scheme as the joint-transmission scheme.
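A brute-force version of the decoding rule (44)–(45) can be sketched as follows. The sketch uses plain Python lists and complex numbers, searches exhaustively over all message pairs (feasible only for toy codebooks), and returns 0-based message indices; the function names and the data layout are assumptions made for this illustration.

```python
from itertools import product


def mat_vec(H, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def joint_metric(y, H1_hat, H2_hat, x1, x2, snr):
    """Distance metric (45), summed over the data time instants.

    y, x1, x2      : lists (over time) of vectors
    H1_hat, H2_hat : lists (over time) of estimated fading matrices
    """
    gain = snr ** 0.5
    total = 0.0
    for yk, H1, H2, x1k, x2k in zip(y, H1_hat, H2_hat, x1, x2):
        s1 = mat_vec(H1, x1k)
        s2 = mat_vec(H2, x2k)
        total += sum(abs(yr - gain * a - gain * b) ** 2
                     for yr, a, b in zip(yk, s1, s2))
    return total


def joint_nn_decode(y, H1_hat, H2_hat, codebook1, codebook2, snr):
    """Return the pair (m1, m2) of 0-based indices minimizing the metric."""
    pairs = product(range(len(codebook1)), range(len(codebook2)))
    return min(pairs, key=lambda m: joint_metric(
        y, H1_hat, H2_hat, codebook1[m[0]], codebook2[m[1]], snr))


if __name__ == "__main__":
    # Noise-free toy example: one antenna everywhere, two time instants.
    H1 = [[[1 + 0j]], [[1 + 0j]]]
    H2 = [[[1j]], [[1j]]]
    cb1 = [[[1], [1]], [[-1], [-1]]]
    cb2 = [[[1], [-1]], [[-1], [1]]]
    y = [[-2 + 2j], [-2 - 2j]]  # produced by messages (1, 0) at SNR = 4
    print(joint_nn_decode(y, H1, H2, cb1, cb2, snr=4.0))
```

The decoder treats the fading estimates as if they were the true fading matrices, which is exactly the mismatch that the analysis in Section 6 has to account for.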

We shall compare the joint-transmission scheme with a time-division multiple-access (TDMA) scheme, where each user transmits its message using the transmission scheme illustrated in Figure 4. Specifically, during the first

β n^{'}

channel uses (for some

0 \leq β \leq 1

, and where

n^{'}

is given in (41)), user 1 transmits its codeword according to the transmission scheme given in Section 2 (see also Figure 4), while user 2 is silent. Then, during the next

(1 - β) n^{'}

channel uses, user 2 transmits its codeword according to the same transmission scheme, while user 1 is silent. In both cases, the receiver guesses the corresponding message

m_{s}

,

s = 1, 2

using a nearest neighbor decoder and pilot-aided channel estimation.

4.1. The MAC Pre-Log

Let

R_{1}^{*} (SNR)

,

R_{2}^{*} (SNR)

, and

R_{1 + 2}^{*} (SNR)

be the maximum achievable rate of user 1, the maximum achievable rate of user 2, and the maximum achievable sum-rate, respectively. The achievable-rate region is given by the set [24]

R (SNR) = {(R_{1}, R_{2}) : R_{1} \leq R_{1}^{*} (SNR), R_{2} \leq R_{2}^{*} (SNR), R_{1} + R_{2} \leq R_{1 + 2}^{*} (SNR)} .

(46)

We are interested in the pre-logs of all rate pairs

(R_{1} (SNR), R_{2} (SNR))

in

R (SNR)

, defined as the limiting ratios of

R_{1} (SNR)

and

R_{2} (SNR)

to the logarithm of the SNR as the SNR tends to infinity. More precisely, the pre-log region is defined as the set of all pre-log pairs

(Π_{R_{1}}, Π_{R_{2}})

for which there exists a sequence of rate pairs

(R_{1} (SNR), R_{2} (SNR))

that, for every

SNR

, lies in

R (SNR)

and satisfies

\begin{matrix} \underset{SNR \to \infty}{lim sup} \frac{R_{1} (SNR)}{log SNR} & = Π_{R_{1}} \end{matrix}

(47)

\begin{matrix} \underset{SNR \to \infty}{lim sup} \frac{R_{2} (SNR)}{log SNR} & = Π_{R_{2}} . \end{matrix}

(48)

Let the maximum achievable pre-logs be defined as

\begin{matrix} Π_{R_{1}^{*}} & ≜ \underset{SNR \to \infty}{lim sup} \frac{R_{1}^{*} (SNR)}{log SNR} \end{matrix}

(49)

\begin{matrix} Π_{R_{2}^{*}} & ≜ \underset{SNR \to \infty}{lim sup} \frac{R_{2}^{*} (SNR)}{log SNR} \end{matrix}

(50)

\begin{matrix} Π_{R_{1 + 2}^{*}} & ≜ \underset{SNR \to \infty}{lim sup} \frac{R_{1 + 2}^{*} (SNR)}{log SNR} \end{matrix}

(51)

and define the capacity pre-logs

Π_{C_{1}}

,

Π_{C_{2}}

, and

Π_{C_{1 + 2}}

in the same way but with

R_{1}^{*} (SNR)

,

R_{2}^{*} (SNR)

, and

R_{1 + 2}^{*} (SNR)

replaced by the respective capacities

C_{1} (SNR)

,

C_{2} (SNR)

, and

C_{1 + 2} (SNR)

. If the ratios of the rates to

log SNR

in (47)–(51) converge as

SNR \to \infty

, i.e., if the limits superior are, in fact, limits, then the pre-log region is given by the set

\begin{matrix} Π_{R} = {(Π_{R_{1}}, Π_{R_{2}}) : Π_{R_{1}} \leq Π_{R_{1}^{*}}, Π_{R_{2}} \leq Π_{R_{2}^{*}}, Π_{R_{1}} + Π_{R_{2}} \leq Π_{R_{1 + 2}^{*}}} . \end{matrix}

(52)

Indeed,

R (SNR)

includes all rate pairs

(R_{1} (SNR), R_{2} (SNR))

satisfying

\frac{R_{1} (SNR)}{log SNR} \leq \frac{R_{1}^{*} (SNR)}{log SNR}

(53)

\frac{R_{2} (SNR)}{log SNR} \leq \frac{R_{2}^{*} (SNR)}{log SNR}

(54)

\frac{R_{1} (SNR)}{log SNR} + \frac{R_{2} (SNR)}{log SNR} \leq \frac{R_{1 + 2}^{*} (SNR)}{log SNR} .

(55)

This implies that, for every pre-log pair

(Π_{R_{1}}, Π_{R_{2}})

in

Π_{R}

, we can find a sequence of rate pairs

(R_{1} (SNR), R_{2} (SNR))

in

R (SNR)

that achieve (47)–(48). Conversely, if the pre-log pair

(Π_{R_{1}}, Π_{R_{2}})

does not lie in

Π_{R}

, then there exists a sufficiently large

{SNR}_{0}

such that, for all

SNR \geq {SNR}_{0}

, at least one of the three conditions (53)–(55) is violated. Consequently, we cannot find a sequence of rate pairs

(R_{1} (SNR), R_{2} (SNR))

in

R (SNR)

that satisfies (47)–(48).

We next present our result on the pre-log region of the two-user MIMO fading MAC achievable with the joint-transmission scheme.

Theorem 2.

Consider the MIMO fading MAC (40). Then, the joint-transmission scheme achieves the pre-log region

\begin{matrix} \{ (Π_{R_{1}}, Π_{R_{2}}) : & Π_{R_{1}} \leq \min (n_{r}, n_{t, 1}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L^{*}}), \\ & Π_{R_{2}} \leq \min (n_{r}, n_{t, 2}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L^{*}}), \\ & Π_{R_{1}} + Π_{R_{2}} \leq \min (n_{r}, n_{t, 1} + n_{t, 2}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L^{*}}) \} \end{matrix}

(56)

where

L^{*} = ⌊\frac{1}{2 λ_{D}}⌋

.

Proof.

See Section 6. □
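To make the region (56) concrete, the following Python sketch evaluates its three bounds with exact rational arithmetic and tests whether a given pre-log pair lies in it. The function names are ours, and the requirement that pre-logs be nonnegative is an assumption added for the membership test.

```python
from fractions import Fraction


def joint_prelog_bounds(n_r, n_t1, n_t2, L_star):
    """Return the bounds of (56) on (pre-log 1, pre-log 2, their sum)."""
    # Fraction of time instants left for data after pilot transmission.
    data_fraction = 1 - Fraction(n_t1 + n_t2, L_star)
    return (min(n_r, n_t1) * data_fraction,
            min(n_r, n_t2) * data_fraction,
            min(n_r, n_t1 + n_t2) * data_fraction)


def in_joint_region(p1, p2, n_r, n_t1, n_t2, L_star):
    """Check whether the pre-log pair (p1, p2) satisfies (56)."""
    b1, b2, b_sum = joint_prelog_bounds(n_r, n_t1, n_t2, L_star)
    return 0 <= p1 <= b1 and 0 <= p2 <= b2 and p1 + p2 <= b_sum


if __name__ == "__main__":
    print(joint_prelog_bounds(n_r=2, n_t1=1, n_t2=1, L_star=8))
    print(in_joint_region(Fraction(3, 4), Fraction(3, 4), 2, 1, 1, 8))
```

For n_r = 2, n_{t,1} = n_{t,2} = 1 and L^{*} = 8, the sum constraint is not active: both users can simultaneously achieve their individual bound of 3/4.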

The pre-log region given in Theorem 2 is the largest region achievable with any transmission scheme that uses

(n_{t, 1} + n_{t, 2}) / L^{*}

of the time instants for transmitting pilot symbols. Indeed, even if the channel estimator were able to estimate the fading coefficients perfectly, and even if we could decode the data symbols using a maximum-likelihood decoder, the capacity pre-log region (without pilot transmission) would be given by the set [1,2,24]

\begin{matrix} {(Π_{R_{1}}, Π_{R_{2}}) : Π_{R_{1}} \leq \min (n_{r}, n_{t, 1}), Π_{R_{2}} \leq \min (n_{r}, n_{t, 2}), Π_{R_{1}} + Π_{R_{2}} \leq \min (n_{r}, n_{t, 1} + n_{t, 2})} \end{matrix}

(57)

which, after multiplying by

1 - (n_{t, 1} + n_{t, 2}) / L^{*}

to account for the transmission of pilot symbols, becomes (56). Thus, in order to improve upon (56), one would need to design a transmission scheme that employs fewer than

(n_{t, 1} + n_{t, 2}) / L^{*}

pilot symbols per channel use.

Remark 2

(TDMA Pre-Log). Consider the MIMO fading MAC (40). Then, the TDMA scheme employing nearest neighbor decoding and pilot-aided channel estimation achieves the pre-log region

\begin{matrix} \{ (Π_{R_{1}}, Π_{R_{2}}) : & Π_{R_{1}} \leq β \min (n_{r}, n_{t, 1}) (1 - \frac{n_{t, 1}}{L^{*}}), \\ & Π_{R_{2}} \leq (1 - β) \min (n_{r}, n_{t, 2}) (1 - \frac{n_{t, 2}}{L^{*}}), \\ & 0 \leq β \leq 1 \} \end{matrix}

(58)

where

L^{*} = ⌊\frac{1}{2 λ_{D}}⌋

. This follows directly from the pre-log of the point-to-point MIMO fading channel (Theorem 1) with the number of transmit antennas given by

n_{t, 1}

and

n_{t, 2}

, respectively.
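The corner points of the TDMA region (58) for a given time-sharing parameter β can be computed as in the following sketch. The function name `tdma_prelogs` is ours; β is passed as a `Fraction` so that the arithmetic stays exact.

```python
from fractions import Fraction


def tdma_prelogs(beta, n_r, n_t1, n_t2, L_star):
    """Largest pre-log pair of (58) for a given time-sharing parameter beta.

    Each user is active alone, so it only pays for its own pilots
    (n_t1 / L_star or n_t2 / L_star), but it only gets a fraction
    beta or 1 - beta of the channel uses.
    """
    p1 = beta * min(n_r, n_t1) * (1 - Fraction(n_t1, L_star))
    p2 = (1 - beta) * min(n_r, n_t2) * (1 - Fraction(n_t2, L_star))
    return p1, p2


if __name__ == "__main__":
    print(tdma_prelogs(Fraction(1, 2), n_r=2, n_t1=1, n_t2=1, L_star=8))
```

With n_r = 2, single-antenna users, L^{*} = 8 and β = 1/2, each user obtains 7/16, so the sum is 7/8 regardless of how many receive antennas beyond one are available.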

Please note that the sum of the pre-logs

Π_{R_{1}} + Π_{R_{2}}

is upper-bounded by the capacity pre-log of the point-to-point MIMO fading channel with

(n_{t, 1} + n_{t, 2})

transmit antennas and

n_{r}

receive antennas, since the latter channel corresponds to the case where the transmitting terminals can cooperate. While the capacity pre-log of general point-to-point MIMO fading channels remains an open problem, the capacity pre-log of point-to-point MISO fading channels is known, cf. (22). It thus follows that, for

n_{r} = n_{t, 1} = n_{t, 2} = 1

, we have

Π_{R_{1}} + Π_{R_{2}} \leq Π_{C_{1 + 2}} = 1 - 2 λ_{D}

(59)

which together with the single-user constraints

\begin{matrix} Π_{R_{1}} & \leq Π_{C_{1}} = 1 - 2 λ_{D} \end{matrix}

(60)

\begin{matrix} Π_{R_{2}} & \leq Π_{C_{2}} = 1 - 2 λ_{D} \end{matrix}

(61)

implies that TDMA achieves the capacity pre-log region of the SISO fading MAC. The next section provides a more detailed comparison between the joint-transmission scheme and TDMA.

4.2. Joint Transmission Versus TDMA

In this section, we discuss how the joint-transmission scheme performs compared to TDMA. To this end, we compare the sum-rate pre-log

Π_{R_{1 + 2}^{*}}

of the joint-transmission scheme (Theorem 2) with the sum-rate pre-log of the TDMA scheme employing nearest neighbor decoding and pilot-aided channel estimation (Remark 2) as well as with the sum-rate pre-log of the coherent TDMA scheme, where the receiver has knowledge of the realizations of the fading processes

{H_{s, k}, k \in ℤ}

,

s = 1, 2

. In the latter case, the sum-rate pre-log is given by

Π_{R_{1 + 2}^{*}} = β \min (n_{r}, n_{t, 1}) + (1 - β) \min (n_{r}, n_{t, 2}) .

(62)

The following corollary presents a sufficient condition on

L^{*}

under which the sum-rate pre-log of the joint-transmission scheme is strictly larger than that of the coherent TDMA scheme (62), as well as a sufficient condition on

L^{*}

under which the sum-rate pre-log of the joint-transmission scheme is strictly smaller than the sum-rate pre-log of the TDMA scheme given in Remark 2. Since (62) is an upper bound on the sum-rate pre-log of any TDMA scheme over the MIMO fading MAC (40), and since the sum-rate pre-log given in Remark 2 is a lower bound on the sum-rate pre-log of the best TDMA scheme, it follows that the sufficient conditions presented in Corollary 1 hold also for the best TDMA scheme.

Corollary 1.

Consider the MIMO fading MAC (40). The joint-transmission scheme achieves a larger sum-rate pre-log than any TDMA scheme if

L^{*} > \frac{\min (n_{r}, n_{t, 1} + n_{t, 2}) (n_{t, 1} + n_{t, 2})}{\min (n_{r}, n_{t, 1} + n_{t, 2}) - \min (n_{r}, max (n_{t, 1}, n_{t, 2}))}

(63)

where we define

a / 0 ≜ \infty

for every

a > 0

. Conversely, the best TDMA scheme achieves a larger sum-rate pre-log than the joint-transmission scheme if

\begin{matrix} L^{*} < & \frac{\min (n_{r}, n_{t, 1} + n_{t, 2}) (n_{t, 1} + n_{t, 2})}{\min (n_{r}, n_{t, 1} + n_{t, 2}) - \min (n_{r}, n_{t, 1}, n_{t, 2})} - \frac{\min (n_{t, 1} n_{r}, {n_{t, 1}}^{2}, n_{t, 2} n_{r}, {n_{t, 2}}^{2})}{\min (n_{r}, n_{t, 1} + n_{t, 2}) - \min (n_{r}, n_{t, 1}, n_{t, 2})} . \end{matrix}

(64)

Recall that

L^{*}

is inversely proportional to the bandwidth of the power spectral density

f_{H} (\cdot)

, which in turn is inversely proportional to the coherence time of the fading channel. Corollary 1 thus demonstrates that the joint-transmission scheme tends to be superior to TDMA when the coherence time of the channel is large. In contrast, TDMA is superior to the joint-transmission scheme when the coherence time of the channel is small. Intuitively, this can be explained by observing that, compared to TDMA, the joint-transmission scheme uses the antennas at the transmitters and at the receiver more efficiently, but requires more pilot symbols to estimate the fading coefficients. Thus, when the coherence time is large, the number of pilot symbols required to estimate the fading is small, so the gain in achievable rate by using the antennas more efficiently dominates the loss incurred by requiring more pilot symbols. On the other hand, when the coherence time is small, the number of pilot symbols required to estimate the fading coefficients is large and the loss in achievable rate incurred by requiring more pilot symbols dominates the gain by using the antennas more efficiently.
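The two thresholds of Corollary 1 are easy to evaluate numerically. The sketch below computes the right-hand sides of (63) and (64), writing the two fractions of (64) over their common denominator and encoding the convention a / 0 ≜ ∞ as `math.inf`. The function names are ours.

```python
from fractions import Fraction
import math


def joint_beats_any_tdma_threshold(n_r, n_t1, n_t2):
    """RHS of (63): joint transmission wins if L_star exceeds this value."""
    n_sum = n_t1 + n_t2
    num = min(n_r, n_sum) * n_sum
    den = min(n_r, n_sum) - min(n_r, max(n_t1, n_t2))
    return math.inf if den == 0 else Fraction(num, den)


def tdma_beats_joint_threshold(n_r, n_t1, n_t2):
    """RHS of (64): TDMA wins if L_star is below this value."""
    n_sum = n_t1 + n_t2
    num = (min(n_r, n_sum) * n_sum
           - min(n_t1 * n_r, n_t1 ** 2, n_t2 * n_r, n_t2 ** 2))
    den = min(n_r, n_sum) - min(n_r, n_t1, n_t2)
    return math.inf if den == 0 else Fraction(num, den)


if __name__ == "__main__":
    for antennas in [(4, 2, 2), (1, 2, 2), (2, 3, 1)]:
        print(antennas,
              joint_beats_any_tdma_threshold(*antennas),
              tdma_beats_joint_threshold(*antennas))
```

The three antenna configurations in the usage example are arbitrary choices that fall, respectively, into the regimes n_{r} \geq n_{t, 1} + n_{t, 2}, n_{r} \leq \min (n_{t, 1}, n_{t, 2}), and n_{t, 2} < n_{r} \leq n_{t, 1}.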

We next evaluate (63) and (64) for some particular values of

n_{r}

,

n_{t, 1}

, and

n_{t, 2}

.

4.2.1. Receiver Employs Fewer Antennas than Transmitters

Suppose that

n_{r} \leq \min (n_{t, 1}, n_{t, 2})

. Then, the right-hand sides (RHSs) of (63) and (64) become ∞, so every finite

L^{*}

satisfies (64). Thus, if the number of receive antennas does not exceed the number of transmit antennas of either user, then, irrespective of

L^{*}

, TDMA is superior to the joint-transmission scheme.

4.2.2. Receiver Employs More Antennas than Transmitters

Suppose that

n_{r} \geq n_{t, 1} + n_{t, 2}

, and suppose that

n_{t, 1} = n_{t, 2} = n_{t}

. Then, (63) and (64) become

L^{*} > 4 n_{t}

(65)

and

L^{*} < 3 n_{t} .

(66)

Thus, if

L^{*}

is greater than

4 n_{t}

, then the joint-transmission scheme is superior to TDMA. In contrast, if

L^{*}

is smaller than

3 n_{t}

, then TDMA is superior. This is illustrated in Figure 5 for the case where

n_{r} = 2

and

n_{t, 1} = n_{t, 2} = 1

. Please note that if

L^{*}

is between

3 n_{t}

and

4 n_{t}

, then the joint-transmission scheme is superior to the TDMA scheme presented in Remark 2, but it may be inferior to the best TDMA scheme.
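The thresholds 3 n_{t} and 4 n_{t} can be reproduced by directly comparing the three sum-rate pre-logs. The following sketch does so for n_{r} = 2 and n_{t, 1} = n_{t, 2} = 1; the function names are ours, and the choice β = 1/2 is arbitrary since β does not affect the sum-rate pre-log when both users have the same number of antennas.

```python
from fractions import Fraction


def sum_prelog_joint(n_r, n_t1, n_t2, L_star):
    """Sum-rate pre-log of the joint-transmission scheme (Theorem 2)."""
    n_sum = n_t1 + n_t2
    return min(n_r, n_sum) * (1 - Fraction(n_sum, L_star))


def sum_prelog_tdma(beta, n_r, n_t1, n_t2, L_star):
    """Sum-rate pre-log of the pilot-aided TDMA scheme (Remark 2)."""
    return (beta * min(n_r, n_t1) * (1 - Fraction(n_t1, L_star))
            + (1 - beta) * min(n_r, n_t2) * (1 - Fraction(n_t2, L_star)))


def sum_prelog_coherent_tdma(beta, n_r, n_t1, n_t2):
    """Sum-rate pre-log (62) of TDMA with perfect channel knowledge."""
    return beta * min(n_r, n_t1) + (1 - beta) * min(n_r, n_t2)


if __name__ == "__main__":
    beta = Fraction(1, 2)
    for L_star in range(2, 7):
        print(L_star,
              sum_prelog_joint(2, 1, 1, L_star),
              sum_prelog_tdma(beta, 2, 1, 1, L_star),
              sum_prelog_coherent_tdma(beta, 2, 1, 1))
```

In this setting the joint-transmission scheme ties with the TDMA scheme of Remark 2 at L^{*} = 3 and with coherent TDMA at L^{*} = 4, and it is strictly better than both for L^{*} > 4.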

4.2.3. A Case in between

Suppose that

n_{r} \leq n_{t, 1} + n_{t, 2}

and

n_{t, 2} < n_{r} \leq n_{t, 1}

. Then, (63) and (64) become

L^{*} > \infty

(67)

and

L^{*} < n_{t, 2} + \frac{n_{r} n_{t, 1}}{n_{r} - n_{t, 2}} .

(68)

Thus, in this case the joint-transmission scheme is always inferior to the coherent TDMA scheme (62), but it can be superior to the TDMA scheme in Remark 2.
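For completeness, the evaluation of (63) and (64) in this case can be written out as follows. The derivation only uses the assumptions n_{t, 2} < n_{r} \leq n_{t, 1} and n_{r} \leq n_{t, 1} + n_{t, 2} stated above.

```latex
\begin{align*}
\min(n_r, n_{t,1}+n_{t,2}) &= n_r, &
\min\bigl(n_r, \max(n_{t,1}, n_{t,2})\bigr) &= \min(n_r, n_{t,1}) = n_r, \\
\min(n_r, n_{t,1}, n_{t,2}) &= n_{t,2}, &
\min\bigl(n_{t,1} n_r,\, n_{t,1}^2,\, n_{t,2} n_r,\, n_{t,2}^2\bigr) &= n_{t,2}^2 .
\end{align*}
% The denominator in (63) is n_r - n_r = 0, so its right-hand side is infinite.
% For (64), combining the two fractions gives
\begin{align*}
\frac{n_r (n_{t,1}+n_{t,2}) - n_{t,2}^2}{n_r - n_{t,2}}
  = \frac{n_r n_{t,1} + n_{t,2}\,(n_r - n_{t,2})}{n_r - n_{t,2}}
  = n_{t,2} + \frac{n_r n_{t,1}}{n_r - n_{t,2}} .
\end{align*}
```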

4.3. Typical Values of $L^{*}$

We briefly discuss the range of values of

L^{*}

that may occur in practical scenarios. To this end, we first recall that

L^{*} = ⌊ 1 / (2 λ_{D}) ⌋

, and that

λ_{D}

is the bandwidth of the fading power spectral density

f_{H} (\cdot)

, which can be associated with the Doppler spread of the channel as [12]

λ_{D} = \frac{f_{m}}{W_{c}} .

(69)

Here

f_{m}

is the maximum Doppler shift given by

f_{m} = \frac{v}{c} f_{c}

(70)

where v is the speed of the mobile device,

c = 3 \cdot 10^{8}

m/s is the speed of light, and

f_{c}

is the carrier frequency; and

W_{c}

is the coherence bandwidth of the channel approximated as [12,25]

W_{c} \approx \frac{1}{5 σ_{τ}}

(71)

where

σ_{τ}

is the delay spread. Following the order-of-magnitude computations of Etkin and Tse [12], we determine typical values of

λ_{D}

for indoor, urban, and hilly area environments and for carrier frequencies ranging from 800 MHz to 5 GHz and tabulate the results in Table 1.
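The chain (69)–(71) from physical parameters to L^{*} can be sketched in Python as follows. Exact rational arithmetic is used so that the floor operation is not affected by rounding. The function names are ours, and the delay spread of 1 μs in the usage example is a hypothetical value chosen for illustration, not an entry of Table 1.

```python
from fractions import Fraction
import math

SPEED_OF_LIGHT = 3 * 10 ** 8  # m/s, the value used in the text


def doppler_bandwidth(v_kmh, f_c_hz, delay_spread_s):
    """Return lambda_D = f_m / W_c according to (69)-(71)."""
    v = Fraction(v_kmh) * 1000 / 3600             # speed in m/s
    f_m = v * Fraction(f_c_hz) / SPEED_OF_LIGHT   # maximum Doppler shift (70)
    w_c = 1 / (5 * Fraction(delay_spread_s))      # coherence bandwidth (71)
    return f_m / w_c                              # (69)


def l_star(lambda_d):
    """Return L_star, the floor of 1 / (2 lambda_D)."""
    return math.floor(1 / (2 * lambda_d))


if __name__ == "__main__":
    # Hypothetical scenario: 75 km/h, 2 GHz carrier, 1 microsecond delay spread.
    lam = doppler_bandwidth(75, 2 * 10 ** 9, Fraction(1, 10 ** 6))
    print(lam, l_star(lam))
```

Note that (71) is itself an order-of-magnitude approximation, so values of L^{*} obtained in this way indicate a range rather than a precise design parameter.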

For indoor environments and mobile speeds of 5 km/h, we have that

L^{*}

is typically larger than

5 \times 10^{4}

. For urban environments,

L^{*}

is typically larger than

2.5 \times 10^{3}

for mobile speeds of 5 km/h and larger than 125 for mobile speeds of 75 km/h. For hilly area environments and mobile speeds of 200 km/h,

L^{*}

ranges typically from 10 to 250. Thus, for most practical scenarios,

L^{*}

is large. It therefore follows that, if

n_{r} \geq n_{t, 1} + n_{t, 2}

, the condition (63) is satisfied unless

n_{t, 1} + n_{t, 2}

is very large. For example, if the receiver employs more antennas than the transmitters, and if

n_{t, 1} = n_{t, 2} = n_{t}

, then

L^{*} > 4 n_{t}

is satisfied even for urban environments and mobile speeds of 75 km/h, as long as

n_{t} < 30

. Only for hilly area environments and mobile speeds of 200 km/h may this condition fail for a practical number of transmit antennas. Thus, if the number of antennas at the receiver is sufficiently large, then the joint-transmission scheme is superior to TDMA in most practical scenarios. On the other hand, if

n_{r} \leq \min (n_{t, 1}, n_{t, 2})

, then TDMA is always superior to the joint-transmission scheme, irrespective of how large

L^{*}

is. This suggests that one should use more antennas at the receiver than at the transmitters.

5. Proof of Theorem 1

Theorem 1 is proved as follows. We first characterize the estimation error from the linear interpolator (7). We then compute the rates achievable with the communication scheme described in Section 2. Finally, we analyze the pre-log corresponding to these rates.

5.1. Linear Interpolator

We first note that the estimate of

H_{k} (r, t)

is given by (7), namely,

{\hat{H}}_{k}^{(T)} (r, t) = \sum_{\begin{matrix} k^{'} = k - T L : \\ k^{'} \in P \end{matrix}}^{k + T L} a_{k^{'}} (r, t) Y_{k^{'}} (r), k \in D .

(72)

We denote the interpolation error by

E_{k}^{(T)} (r, t) = H_{k} (r, t) - {\hat{H}}_{k}^{(T)} (r, t)

.

For future reference, and for any

k \in ℤ

, we express

k = j L + ℓ

, so

ℓ = k \mod L

. Assuming that the first pilot symbol is transmitted at

k = 0

, it follows that

ℓ = 0, \dots, n_{t} - 1

for

k \in P

and

ℓ = n_{t}, \dots, L - 1

for

k \in D

. The statistical properties of the channel estimator for a given window size T are summarized in the following lemma.

Lemma 1.

For a given T, the linear interpolator (72) has the following properties.

1.

For each

t = 1, \dots, n_{t}

,

r = 1, \dots, n_{r}

, and

ℓ = n_{t}, \dots, L - 1

, the estimate

{\hat{H}}_{j L + ℓ}^{(T)} (r, t)

and the corresponding estimation error

E_{j L + ℓ}^{(T)} (r, t)

are independent zero-mean complex-Gaussian random variables.

2.

(a) For a given transmit antenna t and

ℓ \in {n_{t}, \dots, L - 1}

, the

n_{r}

processes

{({\hat{H}}_{j L + ℓ}^{(T)} (1, t), E_{j L + ℓ}^{(T)} (1, t)), j \in ℤ}, \dots, {({\hat{H}}_{j L + ℓ}^{(T)} (n_{r}, t), E_{j L + ℓ}^{(T)} (n_{r}, t)), j \in ℤ}

(73)

are independent and have the same law.

(b) For a given receive antenna r and $ℓ \in {n_{t}, \dots, L - 1}$ , the $n_{t}$ processes

${({\hat{H}}_{j L + ℓ}^{(T)} (r, 1), E_{j L + ℓ}^{(T)} (r, 1)), j \in ℤ}, \dots, {({\hat{H}}_{j L + ℓ}^{(T)} (r, n_{t}), E_{j L + ℓ}^{(T)} (r, n_{t})), j \in ℤ}$

(74)

are independent but have different laws.

3. For each $ℓ = n_{t}, \dots, L - 1$ , the process ${({\hat{H}}_{j L + ℓ}^{(T)}, H_{j L + ℓ}, Z_{j L + ℓ}, X_{j L + ℓ}), j \in ℤ}$ is jointly stationary and ergodic.

4. For $ℓ = n_{t}, \dots, L - 1$ , it holds that

$E [Z_{ℓ}^{†} {\hat{H}}_{ℓ}^{(T)} X_{ℓ}] = 0$

(75)

where ${(\cdot)}^{†}$ denotes the conjugate transpose.

Proof.

See Appendix A. □

5.2. Achievable Rates and Pre-Logs

In the following proof, we only consider the case where

n_{t} = n_{r}

. The more general case of

n_{t} \neq n_{r}

follows then by using only

n_{r}

transmit antennas or by ignoring

n_{r} - n_{t}

antennas at the receiver. This yields a lower bound on the maximum achievable rate and does not incur a loss with respect to the pre-log. Indeed, it can be shown that the nearest neighbor decoder described in Section 2 achieves the pre-log

\min (n_{r}, n_{t})

. Thus, increasing

n_{t}

beyond

n_{r}

or

n_{r}

beyond

n_{t}

does not improve the pre-log achievable by such a decoder. In fact, increasing

n_{t}

beyond

n_{r}

requires the transmission of more pilot symbols and does therefore even reduce the pre-log achievable with the communication scheme described in Section 2.
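The effect of surplus transmit antennas can be illustrated numerically. The sketch below assumes that the pre-log of Theorem 1 has the form \min (n_{r}, n_{t}) (1 - n_{t} / L^{*}), which is the form used in Remark 2; the function name `p2p_prelog` is ours.

```python
from fractions import Fraction


def p2p_prelog(n_r, n_t, L_star):
    """Point-to-point pre-log, assumed to be min(n_r, n_t) (1 - n_t / L_star)."""
    return min(n_r, n_t) * (1 - Fraction(n_t, L_star))


if __name__ == "__main__":
    # With n_r = 2 receive antennas, extra transmit antennas only cost pilots.
    for n_t in range(1, 5):
        print(n_t, p2p_prelog(n_r=2, n_t=n_t, L_star=10))
```

For n_{r} = 2 and L^{*} = 10, using n_{t} = 2 transmit antennas gives 8/5, whereas n_{t} = 4 gives only 6/5: the factor \min (n_{r}, n_{t}) stays at 2 while the pilot overhead doubles.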

To prove Theorem 1, we analyze the generalized mutual information (GMI) [27] for the channel and communication scheme in Section 2. The GMI, denoted by

I_{T}^{gmi} (SNR)

, specifies the highest information rate for which the average probability of error, averaged over the ensemble of i.i.d. Gaussian codebooks, tends to zero as the codeword length n tends to infinity (see [7,16,17] and references therein). The GMI for stationary Gaussian fading channels employing nearest neighbor decoding has been evaluated in [16,17] for the case where a genie provides the receiver with an estimate of the fading process. However, the estimate considered in [16,17] is assumed to be jointly stationary and ergodic with

{(H_{k}, X_{k}, Z_{k}), k \in ℤ}

, which is not satisfied by

{{\hat{H}}_{k}^{(T)}, k \in D}

. We therefore need to adapt the work in [16,17] to our channel model. For completeness, we present all the main steps here, even though they are very similar to the ones in [16,17].

We prove Theorem 1 as follows:

We compute a lower bound on $I_{T}^{gmi} (SNR)$ for a fixed window size T (Section 5.2.1).
We analyze the behavior of this lower bound as T tends to infinity (Section 5.2.2).
We evaluate the limiting ratio of this lower bound to $log SNR$ as $SNR$ tends to infinity (Section 5.2.3).

5.2.1. $I_{T}^{gmi} (SNR)$ for a Fixed T

We analyze the GMI for a fixed T using a random coding upper bound on the average error probability. Please note that due to the symmetry of the codebook construction, it suffices to consider the error behavior conditioned on the event that message 1 was transmitted. Let

E (m^{'})

denote the event that

D (m^{'}) \leq D (1)

, where

D (\cdot)

was defined in (17). The ensemble-average error probability—where the average is over the ensemble of i.i.d. Gaussian codes—corresponding to message

m = 1

is thus given by

{\bar{P}}_{e} (1) = Pr \{⋃_{m^{'} \neq 1} E (m^{'})\} .

(76)

To evaluate the GMI from the RHS of (76), we next define some useful quantities. Recall the channel and transmission model in Section 2. Without loss of generality, assume that the first pilot vector is transmitted at time

k = 0

. Let

\begin{matrix} F (SNR) & ≜ n_{r} + \frac{SNR}{(L - n_{t}) n_{t}} \sum_{ℓ = n_{t}}^{(L - 1)} E [{∥E_{ℓ}^{(T)}∥}_{F}^{2}] \end{matrix}

(77)

where

E_{ℓ}^{(T)}

is a random matrix whose row-r column-t entry is given by

E_{ℓ}^{(T)} (r, t)

, and where

{∥ \cdot ∥}_{F}

denotes the Frobenius norm. For some arbitrary

δ > 0

, we further define the typical set

\begin{matrix} T_{δ} ≜ {(x_{k}, y_{k}, {\hat{H}}_{k}^{(T)}), k = 0, \dots, n^{'} - 1 : |\frac{1}{n} \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} x_{k}∥}^{2} - F (SNR)| < δ} \end{matrix}

(78)

with

D^{(n^{'})} = {0, \dots, n^{'} - 1} \cap D

and

n^{'} = n_{p} + n + n_{g}

, as given in (18) and (4), respectively. Then, we have the following convergence as n tends to infinity.

Lemma 2.

For the channel model and communication scheme described in Section 2, we have that

lim_{n \to \infty} Pr \{(X^{n^{'}}, Y^{n^{'}}, {\hat{H}}^{(T), n^{'}}) \in T_{δ}\} = 1, \forall δ > 0

(79)

where we have used the notation

U^{n^{'}}

to denote the sequence

U_{0}, \dots, U_{n^{'} - 1}

.

Proof.

We have

lim_{n \to \infty} \frac{1}{n} \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} x_{k}∥}^{2}

= lim_{n \to \infty} \frac{1}{n} \sum_{k \in D^{(n^{'})}} {∥\sqrt{\frac{SNR}{n_{t}}} (H_{k} - {\hat{H}}_{k}^{(T)}) x_{k} + z_{k}∥}^{2}

(80)

= \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} lim_{n \to \infty} \frac{L - n_{t}}{n} \sum_{j = 0}^{\frac{n}{L - n_{t}} - 1} {∥\sqrt{\frac{SNR}{n_{t}}} (H_{j L + ℓ} - {\hat{H}}_{j L + ℓ}^{(T)}) x_{j L + ℓ} + z_{j L + ℓ}∥}^{2}

(81)

= \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} E [{∥\sqrt{\frac{SNR}{n_{t}}} (H_{ℓ} - {\hat{H}}_{ℓ}^{(T)}) {\bar{X}}_{ℓ} + Z_{ℓ}∥}^{2}], almost surely

(82)

= \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} (n_{r} + \frac{SNR}{n_{t}} E [{∥E_{ℓ}^{(T)} {\bar{X}}_{ℓ}∥}^{2}])

(83)

= F (SNR) .

(84)

Herein (82) follows from (Part 3) of Lemma 1 and the ergodic theorem ([28], Chapter 7); (83) follows from (Part 4) of Lemma 1; and (84) follows since

{\bar{X}}_{ℓ}

has zero mean and covariance matrix

I_{n_{t}}

, and is independent from

E_{ℓ}^{(T)}

(since

{E_{k}^{(T)}, k \in D}

is a function of

{(H_{k}, Z_{k}), k \in ℤ}

). It thus follows that, as

n \to \infty

,

\frac{1}{n} \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} x_{k}∥}^{2}

(85)

converges to

F (SNR)

almost surely, which in turn implies that it also converges in probability, which is (79). □
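The two steps from (82) to (84) can be spelled out as follows. This is a sketch of the omitted algebra; it assumes, as in the channel model, that Z_{ℓ} is zero-mean with covariance I_{n_{r}} and independent of (H_{ℓ}, {\bar{X}}_{ℓ}).

```latex
% Expand the squared norm in (82), using E_\ell^{(T)} = H_\ell - \hat{H}_\ell^{(T)}:
\begin{align*}
\mathsf{E}\Bigl[\bigl\|\sqrt{\tfrac{\mathrm{SNR}}{n_t}}\, E_\ell^{(T)} \bar{X}_\ell + Z_\ell\bigr\|^2\Bigr]
 &= \tfrac{\mathrm{SNR}}{n_t}\,\mathsf{E}\bigl[\|E_\ell^{(T)} \bar{X}_\ell\|^2\bigr]
  + \mathsf{E}\bigl[\|Z_\ell\|^2\bigr]
  + 2\sqrt{\tfrac{\mathrm{SNR}}{n_t}}\,\mathrm{Re}\,\mathsf{E}\bigl[Z_\ell^{\dagger} E_\ell^{(T)} \bar{X}_\ell\bigr] .
\end{align*}
% The noise term equals n_r. The cross term vanishes because
\begin{align*}
\mathsf{E}\bigl[Z_\ell^{\dagger} E_\ell^{(T)} \bar{X}_\ell\bigr]
 = \mathsf{E}\bigl[Z_\ell^{\dagger} H_\ell \bar{X}_\ell\bigr]
 - \mathsf{E}\bigl[Z_\ell^{\dagger} \hat{H}_\ell^{(T)} \bar{X}_\ell\bigr] = 0 ,
\end{align*}
% where the first expectation is zero by independence and the second by Part 4 of Lemma 1.
% This gives (83). For (84), independence of \bar{X}_\ell and E_\ell^{(T)} yields
\begin{align*}
\mathsf{E}\bigl[\|E_\ell^{(T)} \bar{X}_\ell\|^2\bigr]
 = \mathsf{E}\Bigl[\mathrm{tr}\bigl(E_\ell^{(T)}\, \mathsf{E}[\bar{X}_\ell \bar{X}_\ell^{\dagger}]\, E_\ell^{(T)\dagger}\bigr)\Bigr]
 = \mathsf{E}\bigl[\|E_\ell^{(T)}\|_F^2\bigr] ,
\end{align*}
% so the right-hand side of (83) coincides with F(SNR) as defined in (77).
```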

Considering the typical set (78), and following the derivation in [16,17], the error probability

{\bar{P}}_{e} (1)

in (76) can be upper-bounded as

\begin{matrix} {\bar{P}}_{e} (1) \leq & e^{n R} \cdot Pr \{\frac{1}{n} \cdot D (m^{'}) < F (SNR) + δ| (X^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}^{(T), n^{'}}) \in T_{δ}\} \\ + Pr \{(X^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}^{(T), n^{'}}) \in T_{δ}^{c}\}, m^{'} \neq 1 \end{matrix}

(86)

where

T_{δ}^{c}

denotes the complement of

T_{δ}

. It follows from Lemma 2 that the second term on the RHS of (86) can be made arbitrarily small by letting n tend to infinity.

The GMI characterizes the rate of exponential decay of the expression

Pr \{\frac{1}{n} \cdot D (m^{'}) < F (SNR) + δ| (X^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}^{(T), n^{'}}) \in T_{δ}\}, m^{'} \neq 1

(87)

as

n \to \infty

[16,17]. The computation of the GMI requires the conditional log moment-generating function of the metric

D (m^{'})

associated with the wrong message output

m^{'} \neq 1

, conditioned on the channel outputs and on the fading estimates, which is defined as

\begin{matrix} κ_{n} (θ, y^{n^{'}}, {\hat{H}}^{(T), n^{'}}) & ≜ log E [\exp (\frac{θ}{n} \sum_{k \in D^{(n^{'})}} D_{k} (m^{'}))| \{(y_{k}, {\hat{H}}_{k}^{(T)}), k \in D^{(n^{'})}\}] \end{matrix}

(88)

where

D_{k} (m^{'}) ≜ {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} x_{k} (m^{'})∥}^{2} .

(89)

Proceeding along the lines of [16,17], we can express the conditional log moment-generating function in (88) as the sum of conditional log moment-generating functions for the individual vector metrics

D_{k} (m^{'})

,

k \in D^{(n^{'})}

, i.e.,

κ_{n} (θ, y^{n^{'}}, {\hat{H}}^{(T), n^{'}})

= \sum_{k \in D^{(n^{'})}} log E [\exp (\frac{θ}{n} D_{k} (m^{'}))| y_{k}, {\hat{H}}_{k}^{(T)}]

(90)

= \sum_{k \in D^{(n^{'})}} (\frac{θ}{n} y_{k}^{†} {(I_{n_{r}} - \frac{θ}{n} \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})}^{- 1} y_{k} - log \det (I_{n_{r}} - \frac{θ}{n} \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})) .

(91)

We then have that, for all

θ < 0

,

lim_{n \to \infty} \frac{1}{n} \cdot κ_{n} (n θ, y^{n^{'}}, {\hat{H}}^{(T), n^{'}})

= lim_{n \to \infty} \frac{1}{n} \sum_{k \in D^{(n^{'})}} θ y_{k}^{†} {(I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})}^{- 1} y_{k} - lim_{n \to \infty} \frac{1}{n} \sum_{k \in D^{(n^{'})}} log \det (I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})

(92)

= \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} lim_{n \to \infty} \frac{L - n_{t}}{n} \sum_{j = 0}^{\frac{n}{L - n_{t}} - 1} θ y_{j L + ℓ}^{†} {(I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{j L + ℓ}^{(T)} {\hat{H}}_{j L + ℓ}^{† (T)})}^{- 1} y_{j L + ℓ} - \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} lim_{n \to \infty} \frac{L - n_{t}}{n} \sum_{j = 0}^{\frac{n}{L - n_{t}} - 1} log \det (I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{j L + ℓ}^{(T)} {\hat{H}}_{j L + ℓ}^{† (T)})

(93)

= \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} E [θ Y_{ℓ}^{†} \cdot {(I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})}^{- 1} \cdot Y_{ℓ}] - \frac{1}{L - n_{t}} \sum_{ℓ = n_{t}}^{L - 1} E [log \det (I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})], almost surely

(94)

≜ κ (θ, SNR)

(95)

where the last step should be regarded as the definition of

κ (θ, SNR)

. The convergence in (94) is due to the ergodicity of

{(Y_{j L + ℓ}, {\hat{H}}_{j L + ℓ}^{(T)}), j \in ℤ}

,

ℓ = n_{t}, \dots, L - 1

(see (Part 3) of Lemma 1) and the ergodic theorem.

Following the same steps as in [16,17], we can then show that for all

δ^{'} > 0

, the ensemble-average error probability can be bounded as

{\bar{P}}_{e} (1) \leq \exp (n R) \exp (- n (I_{T}^{gmi} (SNR) - δ^{'})) + ε (δ^{'}, n)

(96)

for some

ε (δ^{'}, n)

satisfying

lim_{n \to \infty} ε (δ^{'}, n) = 0, δ^{'} > 0 .

(97)

On the RHS of (96),

I_{T}^{gmi} (SNR)

denotes the GMI as a function of the

SNR

for a fixed T, which is given by

I_{T}^{gmi} (SNR) = \frac{L - n_{t}}{L} (sup_{θ < 0} (θ F (SNR) - κ (θ, SNR))) .

(98)

Herein the pre-factor

(L - n_{t}) / L

equals the fraction of time instants used for data transmission. The bound (96) implies that for rates below

I_{T}^{gmi} (SNR)

, the communication scheme described in Section 2 has vanishing error probability as n tends to infinity.

Combining (77) and (94) with (98) yields

\begin{matrix} I_{T}^{gmi} (SNR) & = sup_{θ < 0} \frac{1}{L} \sum_{ℓ = n_{t}}^{L - 1} {θ (n_{r} + \frac{SNR}{n_{t}} E [{∥E_{ℓ}^{(T)}∥}_{F}^{2}]) + E [log \det (I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})] \\ - E [θ Y_{ℓ}^{†} {(I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})}^{- 1} Y_{ℓ}]} . \end{matrix}

(99)

Following the steps used in ([29], Appendix D), it can be shown that, for

θ < 0

,

E [θ Y_{ℓ}^{†} {(I_{n_{r}} - θ \frac{SNR}{n_{t}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})}^{- 1} Y_{ℓ}] \leq 0 .

(100)

As observed in ([29], Appendix D), a good lower bound on

I_{T}^{gmi} (SNR)

for high SNR follows by choosing

θ = \frac{- 1}{n_{r} + SNR n_{r} ϵ_{*, T}^{2}}

(101)

where

ϵ_{*, T}^{2} = max_{\begin{matrix} r = 1, \dots, n_{r}, \\ t = 1, \dots, n_{t}, \\ ℓ = n_{t}, \dots, L - 1 \end{matrix}} ϵ_{ℓ, T}^{2} (r, t) .

(102)

Hence, substituting the choice of

θ

in (101), and applying (100) to the RHS of (99), we obtain the following lower bound on

I_{T}^{gmi} (SNR)

:

I_{T}^{gmi} (SNR) \geq \frac{1}{L} \sum_{ℓ = n_{t}}^{L - 1} \{E [log \det (I_{n_{r}} + \frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})] - 1\} .

(103)
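The way in which the choice (101) produces the constant - 1 in (103) can be made explicit. The sketch below assumes that ϵ_{ℓ, T}^{2} (r, t) denotes the variance of the interpolation error E_{ℓ}^{(T)} (r, t), which is how we read the notation in (102).

```latex
% Bound the Frobenius norm by the largest error variance:
\begin{align*}
\mathsf{E}\bigl[\|E_\ell^{(T)}\|_F^2\bigr]
 = \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \epsilon_{\ell,T}^2(r,t)
 \le n_r n_t\, \epsilon_{*,T}^2 .
\end{align*}
% Hence, with \theta = -1 / (n_r + \mathrm{SNR}\, n_r \epsilon_{*,T}^2) < 0,
\begin{align*}
\theta \Bigl(n_r + \tfrac{\mathrm{SNR}}{n_t}\, \mathsf{E}\bigl[\|E_\ell^{(T)}\|_F^2\bigr]\Bigr)
 \ge \theta \bigl(n_r + \mathrm{SNR}\, n_r\, \epsilon_{*,T}^2\bigr) = -1 ,
\end{align*}
% and the matrix inside the log-determinant in (99) becomes
\begin{align*}
I_{n_r} - \theta\, \tfrac{\mathrm{SNR}}{n_t}\, \hat{H}_\ell^{(T)} \hat{H}_\ell^{(T)\dagger}
 = I_{n_r} + \frac{\mathrm{SNR}}{n_t n_r + n_t n_r\, \mathrm{SNR}\, \epsilon_{*,T}^2}\,
   \hat{H}_\ell^{(T)} \hat{H}_\ell^{(T)\dagger} .
\end{align*}
% Dropping the nonnegative term -E[\theta Y^\dagger (\cdot)^{-1} Y] by (100), and noting that the
% supremum over \theta < 0 in (99) is at least the value at this particular \theta, gives (103).
```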

5.2.2. $I_{T}^{gmi} (SNR)$ as $T \to \infty$

We next analyze the RHS of (103) in the limit as T tends to infinity. To this end, we note that, for

L \leq \frac{1}{2 λ_{D}}

, the variance of the interpolation error tends to (15), namely

ϵ_{ℓ}^{2} (t) = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} d λ

(104)

as T tends to infinity, irrespective of ℓ and t. We shall therefore omit the subscript and argument and write

ϵ^{2}

instead of

ϵ_{ℓ}^{2} (t)

. Please note that for a fixed T, the entries of

\frac{1}{\sqrt{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}}} {\hat{H}}_{ℓ}^{(T)}

(105)

are independent but not i.i.d., which follows from (Part 2) of Lemma 1. However, as T tends to infinity, their distributions become identical due to (104), and hence we have the convergence in distribution

\frac{1}{\sqrt{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}}} {\hat{H}}_{ℓ}^{(T)} \overset{d}{⟶} \frac{1}{\sqrt{n_{t} n_{r} + n_{t} n_{r} SNR ϵ^{2}}} \bar{H}

(106)

where the entries of

\bar{H}

are i.i.d. complex-Gaussian random variables with zero mean and variance

(1 - ϵ^{2})

.

Next note that

log \det (I_{n_{r}} + \frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})

(107)

is a nonnegative, continuous function with respect to the entries of the matrix

\frac{1}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)} .

(108)

It therefore follows from Portmanteau’s lemma [30] that, as

T \to \infty

, the RHS of (103) can be lower-bounded by

\begin{matrix} lim_{T \to \infty} \frac{1}{L} \sum_{ℓ = n_{t}}^{L - 1} \{E [log \det (I_{n_{r}} + \frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2}} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})] - 1\} \\ (109) & \geq \frac{L - n_{t}}{L} \{E [log \det (I_{n_{r}} + \frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ^{2}} \bar{H} {\bar{H}}^{†})] - 1\} \\ (110) & \geq \frac{L - n_{t}}{L} (E [log \det (\frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ^{2}} \bar{H} {\bar{H}}^{†})] - 1) \end{matrix}

where the last inequality follows from the lower bound

log \det (I + A) \geq log \det A

.
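The inequality used to obtain (110) can be checked through eigenvalues. The following is a short sketch, assuming that A is Hermitian positive semidefinite of size m with eigenvalues denoted here by μ_1, …, μ_m (notation introduced only for this illustration).

```latex
% Assumption: A is Hermitian positive semidefinite with eigenvalues \mu_1, ..., \mu_m >= 0.
\begin{equation*}
\log\det(I + A) = \sum_{i=1}^{m} \log(1 + \mu_i)
  \;\ge\; \sum_{i=1}^{m} \log \mu_i = \log\det A .
\end{equation*}
```

If some eigenvalue is zero, the right-hand side equals minus infinity and the bound holds trivially.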

Combining (110) with (103), and using that, by assumption,

n_{t} = n_{r}

, we obtain that

\begin{matrix} (111) & I^{gmi} (SNR) & ≜ lim_{T \to \infty} I_{T}^{gmi} (SNR) \\ (112) & \geq \frac{L - n_{t}}{L} (n_{t} log SNR - n_{t} log ({n_{t}}^{2} + {n_{t}}^{2} SNR ϵ^{2}) + E [log \det \bar{H} {\bar{H}}^{†}] - 1) . \end{matrix}

5.2.3. The Pre-Log

It remains to compute a lower bound on the pre-log. To this end, we compute the limiting ratio of the RHS of (112) to

log SNR

as

SNR

tends to infinity. We first consider

\begin{matrix} (113) & SNR ϵ^{2} & = SNR (1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} d λ) \\ (114) & = \int_{- 1 / 2}^{1 / 2} \frac{SNR f_{H} (λ) L n_{t}}{SNR f_{H} (λ) + L n_{t}} d λ . \end{matrix}

Since the integrand is bounded by

0 \leq \frac{SNR f_{H} (λ) L n_{t}}{SNR f_{H} (λ) + L n_{t}} \leq L n_{t}

(115)

it follows that

0 \leq SNR ϵ^{2} \leq L n_{t}

, which implies that

lim_{SNR \to \infty} \frac{log ({n_{t}}^{2} + {n_{t}}^{2} SNR ϵ^{2})}{log SNR} = 0 .

(116)
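The boundedness of the product of the SNR and the error variance, and the resulting limit (116), can be illustrated numerically. The sketch below approximates the integral in (114) by a midpoint rule. The rectangular power spectral density, the function names, and the parameter values are assumptions made for illustration and are not taken from the paper.

```python
import math

def rect_psd(lam, lam_d):
    """Unit-variance rectangular PSD on [-lam_d, lam_d] (assumed example)."""
    return 1.0 / (2.0 * lam_d) if abs(lam) <= lam_d else 0.0

def snr_eps2(snr, L, n_t, lam_d, grid=4000):
    """Midpoint-rule approximation of the integral in (114)."""
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):
        lam = -0.5 + (i + 0.5) * step
        f = rect_psd(lam, lam_d)
        total += snr * f * L * n_t / (snr * f + L * n_t) * step
    return total

def log_ratio(snr, L, n_t, lam_d):
    """Ratio on the left-hand side of (116), evaluated at a finite SNR."""
    value = snr_eps2(snr, L, n_t, lam_d)
    return math.log(n_t**2 + n_t**2 * value) / math.log(snr)
```

For example, with two transmit antennas, L = 4, and a one-sided bandwidth of 0.125, the integral stays below L times the number of transmit antennas for every SNR, and the ratio in (116) shrinks as the SNR grows.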

We next consider the term

E [log \det \bar{H} {\bar{H}}^{†}] - 1

. Please note that by ([31], Lemma A.2) and the assumption

n_{t} = n_{r}

, we have

\begin{matrix} E [log \det \bar{H} {\bar{H}}^{†}] - 1 & = n_{t} log (1 - ϵ^{2}) + \sum_{b = 0}^{n_{t} - 1} ψ (n_{t} - b) - 1 \end{matrix}

(117)

where

ψ (\cdot)

is Euler’s digamma function [32]. Furthermore, since

0 \leq \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} \leq f_{H} (λ)

(118)

we have by the Dominated Convergence Theorem [28] that

lim_{SNR \to \infty} ϵ^{2} = lim_{SNR \to \infty} (1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L n_{t}} d λ) = 0

(119)

so

log (1 - ϵ^{2})

vanishes as the SNR tends to infinity. Combining (119) with (117) yields

lim_{SNR \to \infty} \frac{E [log \det \bar{H} {\bar{H}}^{†}] - 1}{log SNR} = 0 .

(120)
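The closed form (117) can be sanity-checked by simulation in the smallest nontrivial case of two transmit and two receive antennas, where the determinant has an explicit expression. The sketch below is a Monte Carlo check under that assumption; the helper names, the sample size, and the seed are illustrative choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def closed_form(n_t, var):
    """RHS of (117) without the trailing -1, for i.i.d. entries of variance var."""
    return n_t * math.log(var) + sum(digamma_int(n_t - b) for b in range(n_t))

def complex_gauss(rng, var):
    """Circularly-symmetric complex Gaussian sample with variance var."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def monte_carlo_2x2(var, trials=100000, seed=1):
    """Estimate E[log det(H H^dagger)] for a 2x2 matrix H with i.i.d. entries."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        h11, h12, h21, h22 = (complex_gauss(rng, var) for _ in range(4))
        det_h = h11 * h22 - h12 * h21
        acc += math.log(abs(det_h) ** 2)  # det(H H^dagger) = |det H|^2 for square H
    return acc / trials
```

Here `var` plays the role of one minus the error variance. The simulated mean and the closed form should agree up to Monte Carlo error.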

It follows from (112), (116), and (120) that

\begin{matrix} (121) & Π_{R^{*}} & \geq n_{t} (1 - \frac{n_{t}}{L}) \\ (122) & = \min (n_{t}, n_{r}) (1 - \frac{\min (n_{t}, n_{r})}{L}), L \leq \frac{1}{2 λ_{D}} \end{matrix}

where we have used that

n_{t} = n_{r} = \min (n_{t}, n_{r})

. Please note that the condition

L \leq \frac{1}{2 λ_{D}}

is necessary since otherwise (104) would not hold. This proves Theorem 1.
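To see how the lower bound (112) approaches the pre-log (121), the bound can be evaluated at finite SNR. The sketch below substitutes (117) into the RHS of (112). It assumes a unit-variance rectangular power spectral density with one-sided bandwidth `lam_d`, for which (104) reduces to a simple closed form when L is at most the inverse of twice the bandwidth. This PSD and all names are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def eps2_rect(snr, L, n_t, lam_d):
    """Closed form of (104) for a rectangular PSD on [-lam_d, lam_d] (assumed)."""
    c = 2.0 * lam_d * L * n_t
    return c / (snr + c)

def gmi_lower_bound(snr, L, n_t, lam_d):
    """RHS of (112) with (117) substituted, for n_t = n_r."""
    e2 = eps2_rect(snr, L, n_t, lam_d)
    log_det_term = n_t * math.log1p(-e2) + sum(
        digamma_int(n_t - b) for b in range(n_t))
    inner = (n_t * math.log(snr)
             - n_t * math.log(n_t**2 + n_t**2 * snr * e2)
             + log_det_term - 1.0)
    return (L - n_t) / L * inner

def prelog(n_t, L):
    """RHS of (121)."""
    return n_t * (1.0 - n_t / L)
```

The ratio of `gmi_lower_bound` to the logarithm of the SNR approaches `prelog` slowly, because the SNR-independent terms in (112) only vanish relative to the logarithm of the SNR.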

5.3. A Note on the Input Distribution

The pre-log in Theorem 1 is derived using codebooks whose entries are drawn i.i.d. from an

n_{t}

-variate Gaussian distribution with zero mean and identity covariance matrix. However, Gaussian inputs are not necessary to achieve the pre-log (25). In fact, as we shall argue next, the pre-log (25) can be achieved by any i.i.d. inputs with a probability density function satisfying

E [∥ \bar{X} ∥^{2}] \leq n_{t}

and (26) and (27), namely,

\begin{matrix} (123) & p_{\bar{X}} (\bar{x}) \leq \frac{K}{π^{n_{t}}} e^{- ∥ \bar{x} ∥^{2}}, \bar{x} \in ℂ^{n_{t}} \\ (124) & lim_{SNR \to \infty} \frac{log K}{log SNR} = 0 . \end{matrix}

Indeed, since the inputs have a density, they also satisfy

E [∥ \bar{X} ∥^{2}] > 0

. To show that the conditions (26) and (27) suffice to achieve (25), we follow the steps in Section 5.2 but with

F (SNR)

replaced by

F (SNR) = n_{r} + \frac{SNR}{(L - n_{t}) n_{t}} \sum_{ℓ = n_{t}}^{L - 1} E [{∥E_{ℓ}^{(T)} {\bar{X}}_{ℓ}∥}_{F}^{2}] .

(125)

We then upper-bound

F (SNR)

and

κ (θ, SNR)

as follows. Using that for any two matrices

A

and

B

we have

{∥ A B ∥}_{F}^{2} \leq {∥ A ∥}_{F}^{2} \cdot {∥ B ∥}_{F}^{2}

([33], Section 5.6), and using that

E_{ℓ}^{(T)}

and

{\bar{X}}_{ℓ}

are independent, we can upper-bound

F (SNR)

by

\begin{matrix} F (SNR) \leq n_{r} + \frac{SNR}{(L - n_{t}) n_{t}} \sum_{ℓ = n_{t}}^{L - 1} E [{∥E_{ℓ}^{(T)}∥}_{F}^{2}] \cdot E [{∥{\bar{X}}_{ℓ}∥}^{2}] . \end{matrix}

(126)

As for

κ (θ, SNR)

, we have

\begin{matrix} E [\exp (\frac{θ}{n} D_{k} (m^{'}))| y_{k}, {\hat{H}}_{k}^{(T)}] \\ (127) & = \int_{{\bar{x}}_{k}} p_{\bar{X}} ({\bar{x}}_{k}) \exp (\frac{θ}{n} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} {\bar{x}}_{k}∥}^{2}) d {\bar{x}}_{k} \\ (128) & \leq \int_{{\bar{x}}_{k}} \frac{K}{π^{n_{t}}} \exp (- ∥ {\bar{x}}_{k} ∥^{2} + \frac{θ}{n} {∥y_{k} - \sqrt{\frac{SNR}{n_{t}}} {\hat{H}}_{k}^{(T)} {\bar{x}}_{k}∥}^{2}) d {\bar{x}}_{k} \\ (129) & = \frac{K}{\det (I_{n_{r}} - \frac{θ}{n} \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})} \exp (\frac{θ}{n} y_{k}^{†} {(I_{n_{r}} - \frac{θ}{n} \frac{SNR}{n_{t}} {\hat{H}}_{k}^{(T)} {\hat{H}}_{k}^{† (T)})}^{- 1} y_{k}) . \end{matrix}

Here (128) follows from (123), and (129) follows by evaluating the integral as in ([17], Appendix A). By following the steps in Section 5.2, and by choosing

θ = \frac{- 1}{n_{r} + SNR n_{r} ϵ_{*, T}^{2} E [∥ \bar{X} ∥^{2}]}

(130)

where

ϵ_{*, T}^{2}

is given in (102), we obtain from (126) and (129) that

\begin{matrix} I_{T}^{gmi} (SNR) & \geq \frac{1}{L} \sum_{ℓ = n_{t}}^{L - 1} \{E [log \det (I_{n_{r}} + \frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ_{*, T}^{2} E [∥ \bar{X} ∥^{2}]} {\hat{H}}_{ℓ}^{(T)} {\hat{H}}_{ℓ}^{† (T)})]\} \\ - \frac{L - n_{t}}{L} (1 + log K) . \end{matrix}

(131)

Taking the limit as T tends to infinity, and repeating the steps in Section 5.2, it follows that

\begin{matrix} (132) & I^{gmi} (SNR) & = lim_{T \to \infty} I_{T}^{gmi} (SNR) \\ (133) & \geq \frac{L - n_{t}}{L} (E [log \det (\frac{SNR}{n_{t} n_{r} + n_{t} n_{r} SNR ϵ^{2} E [∥ \bar{X} ∥^{2}]} \bar{H} {\bar{H}}^{†})] - 1 - log K) \\ = \frac{L - n_{t}}{L} (n_{t} log SNR - n_{t} log ({n_{t}}^{2} + {n_{t}}^{2} SNR ϵ^{2} E [∥ \bar{X} ∥^{2}]) \\ (134) & + E [log \det \bar{H} {\bar{H}}^{†}] - 1 - log K) \end{matrix}

where we have again used the assumption

n_{t} = n_{r}

.

We conclude by evaluating the limiting ratio of the RHS of (134) to

log SNR

as

SNR

tends to infinity. Using (115) and that

E [∥ \bar{X} ∥^{2}] \leq n_{t}

, we obtain that

lim_{SNR \to \infty} \frac{log ({n_{t}}^{2} + {n_{t}}^{2} SNR ϵ^{2} E [∥ \bar{X} ∥^{2}])}{log SNR} = 0 .

(135)

This in turn yields together with (120) that

lim_{SNR \to \infty} \frac{I^{gmi} (SNR)}{log SNR} \geq n_{t} (1 - \frac{n_{t}}{L})

(136)

provided that

\begin{matrix} lim_{SNR \to \infty} \frac{log K}{log SNR} = 0 . \end{matrix}

(137)

It thus follows that any i.i.d. input distribution satisfying

E [∥ \bar{X} ∥^{2}] \leq n_{t}

and (26) and (27) achieves the pre-log (25).

6. Proof of Theorem 2

In contrast to the proof of Theorem 1, for the fading MAC, it is not sufficient to restrict ourselves to the case of

n_{t, 1} = n_{t, 2} = n_{r}

. For example, increasing

n_{r}

beyond

n_{t, 1}

and

n_{t, 2}

does not increase the single-rate pre-logs

Π_{R_{1}^{*}}

and

Π_{R_{2}^{*}}

, but it does increase the pre-log of the achievable sum-rate

Π_{R_{1 + 2}^{*}}

. For the proof of Theorem 2, we therefore consider a general setup of

n_{t, 1}

,

n_{t, 2}

, and

n_{r}

.

We derive the achievable pre-logs for the MAC case by following similar steps as in the point-to-point case. We first consider the average error probability, averaged over the ensemble of i.i.d. Gaussian codebooks. Let

{\bar{P}}_{e}

and

{\bar{P}}_{e} (m_{1}, m_{2})

be the ensemble-average error probability and the ensemble-average error probability when messages

m_{1}

and

m_{2}

are transmitted, respectively. Due to the symmetry of the codebook construction,

{\bar{P}}_{e}

is equal to

{\bar{P}}_{e} (1, 1)

and it therefore suffices to consider

{\bar{P}}_{e} (1, 1)

to derive the achievable rates. Let

E (m_{1}^{'}, m_{2}^{'})

denote the event that

D (m_{1}^{'}, m_{2}^{'}) \leq D (1, 1)

, where

D (\cdot, \cdot)

was defined in (45). Using the union bound, the error probability

{\bar{P}}_{e} (1, 1)

can be upper-bounded as

\begin{matrix} (138) & {\bar{P}}_{e} (1, 1) & = Pr \{⋃_{(m_{1}^{'}, m_{2}^{'}) \neq (1, 1)} E (m_{1}^{'}, m_{2}^{'})\} \\ (139) & \leq Pr \{⋃_{m_{1}^{'} \neq 1} E (m_{1}^{'}, 1)\} + Pr \{⋃_{m_{2}^{'} \neq 1} E (1, m_{2}^{'})\} + Pr \{⋃_{m_{1}^{'} \neq 1} ⋃_{m_{2}^{'} \neq 1} E (m_{1}^{'}, m_{2}^{'})\} . \end{matrix}

We next analyze the three probabilities on the RHS of (139). Let the matrix

E_{s, k}^{(T)}

,

s = 1, 2

with entries

E_{s, k}^{(T)} (r, t)

be the estimation-error matrix in estimating

H_{s, k}

, i.e.,

E_{s, k}^{(T)} = H_{s, k} - {\hat{H}}_{s, k}^{(T)} .

(140)

To facilitate the analysis, we first generalize

F (SNR)

and

T_{δ}

, defined in the point-to-point case in (77) and (78), to the MAC case:

\begin{matrix} (141) & F (SNR) & ≜ n_{r} + \frac{SNR}{L - n_{t, 1} - n_{t, 2}} \sum_{ℓ = n_{t, 1} + n_{t, 2}}^{L - 1} E [{∥E_{1, ℓ}^{(T)}∥}_{F}^{2} + {∥E_{2, ℓ}^{(T)}∥}_{F}^{2}], \\ T_{δ} & ≜ {(x_{s, k}, y_{k}, {\hat{H}}_{s, k}^{(T)}), k = 0, \dots, n^{'} - 1, s = 1, 2 : \\ (142) & |\frac{1}{n} \sum_{k \in D^{(n^{'})}} {∥y_{k} - \sqrt{SNR} {\hat{H}}_{1, k}^{(T)} x_{1, k} - \sqrt{SNR} {\hat{H}}_{2, k}^{(T)} x_{2, k}∥}^{2} - F (SNR)| < δ} \end{matrix}

for some

δ > 0

, with

n^{'}

given in (41) and

D^{(n^{'})} = {0, \dots, n^{'} - 1} \cap D

. Using

F (SNR)

and the typical set

T_{δ}

, we continue by evaluating the GMI for each of the three probabilities on the RHS of (139), which correspond to the error events

(m_{1}^{'} \neq 1, m_{2}^{'} = 1)

,

(m_{1}^{'} = 1, m_{2}^{'} \neq 1)

, and

(m_{1}^{'} \neq 1, m_{2}^{'} \neq 1)

.

6.1. Error Event $(m_{1}^{'} \neq 1, m_{2}^{'} = 1)$

Following the steps in Section 5.2 to derive (86), we can upper-bound the ensemble-average error probability for the error event

E (m_{1}^{'}, 1)

,

m_{1}^{'} \neq 1

as

\begin{matrix} Pr \{⋃_{m_{1}^{'} \neq 1} E (m_{1}^{'}, 1)\} \\ \leq e^{n R_{1}} \cdot Pr \{\frac{1}{n} \cdot D (m_{1}^{'}, 1) < F (SNR) + δ| \{(X_{s}^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}_{s}^{(T), n^{'}}), s = 1, 2\} \in T_{δ}\} \\ + Pr \{\{(X_{s}^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}_{s}^{(T), n^{'}}), s = 1, 2\} \in T_{δ}^{c}\}, m_{1}^{'} \neq 1 . \end{matrix}

(143)

Note that the second probability on the RHS of (143) vanishes as n tends to infinity, which can be shown along the lines of the proof of Lemma 2.

The GMI of user 1 gives the rate of exponential decay of the term

Pr \{\frac{1}{n} \cdot D (m_{1}^{'}, 1) < F (SNR) + δ| \{(X_{s}^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}_{s}^{(T), n^{'}}), s = 1, 2\} \in T_{δ}\}

(144)

as

n \to \infty

. Its evaluation requires the expression of the log moment-generating function of the metric

D (m_{1}^{'}, 1)

, conditioned on the channel outputs, on

m_{2}^{'} = 1

, and on the fading estimates, which is defined as

\begin{matrix} κ_{1, n} & (θ, y^{n^{'}}, x_{2}^{n^{'}} (1), {\hat{H}}_{1}^{(T), n^{'}}, {\hat{H}}_{2}^{(T), n^{'}}) \\ ≜ log E [\exp (\frac{θ}{n} \sum_{k \in D^{(n^{'})}} D_{k} (m_{1}^{'}, 1)) | \{(y_{k}, x_{2, k} (1), {\hat{H}}_{1, k}^{(T)}, {\hat{H}}_{2, k}^{(T)}), k \in D^{(n^{'})}\}] \end{matrix}

(145)

where

D_{k} (m_{1}^{'}, m_{2}^{'}) ≜ {∥y_{k} - \sqrt{SNR} {\hat{H}}_{1, k}^{(T)} x_{1, k} (m_{1}^{'}) - \sqrt{SNR} {\hat{H}}_{2, k}^{(T)} x_{2, k} (m_{2}^{'})∥}^{2} .

(146)

Following the steps used in Section 5.2 to obtain (90) and (91), it can be shown that

\begin{matrix} κ_{1, n} (θ, y^{n^{'}}, x_{2}^{n^{'}} (1), {\hat{H}}_{1}^{(T), n^{'}}, {\hat{H}}_{2}^{(T), n^{'}}) \\ = \sum_{k \in D^{(n^{'})}} {\frac{θ}{n} {(y_{k} - \sqrt{SNR} {\hat{H}}_{2, k}^{(T)} x_{2, k} (1))}^{†} {(I_{n_{r}} - \frac{θ}{n} SNR {\hat{H}}_{1, k}^{(T)} {\hat{H}}_{1, k}^{† (T)})}^{- 1} (y_{k} - \sqrt{SNR} {\hat{H}}_{2, k}^{(T)} x_{2, k} (1)) \\ - log \det (I_{n_{r}} - \frac{θ}{n} SNR {\hat{H}}_{1, k}^{(T)} {\hat{H}}_{1, k}^{† (T)})} . \end{matrix}

(147)

Then, following the steps used in Section 5.2 to derive (92)–(94), we obtain that, for all

θ < 0

,

\begin{matrix} lim_{n \to \infty} \frac{1}{n} \cdot κ_{1, n} (n θ, y^{n^{'}}, x_{2}^{n^{'}} (1), {\hat{H}}_{1}^{(T), n^{'}}, {\hat{H}}_{2}^{(T), n^{'}}) \\ (148) & = \frac{1}{L - n_{t, 1} - n_{t, 2}} \sum_{ℓ = n_{t, 1} + n_{t, 2}}^{L - 1} (g_{1, ℓ} (θ, SNR) - E [log \det (I_{n_{r}} - θ SNR {\hat{H}}_{1, ℓ}^{(T)} {\hat{H}}_{1, ℓ}^{† (T)})]) \\ (149) & ≜ κ_{1} (θ, SNR), almost surely \end{matrix}

where the last step should be regarded as the definition of

κ_{1} (θ, SNR)

. In (148), we define

\begin{matrix} g_{1, ℓ} (θ, SNR) & ≜ E [θ {(Y_{ℓ} - \sqrt{SNR} {\hat{H}}_{2, ℓ}^{(T)} X_{2, ℓ})}^{†} {(I_{n_{r}} - θ SNR {\hat{H}}_{1, ℓ}^{(T)} {\hat{H}}_{1, ℓ}^{† (T)})}^{- 1} (Y_{ℓ} - \sqrt{SNR} {\hat{H}}_{2, ℓ}^{(T)} X_{2, ℓ})] . \end{matrix}

(150)

Following the derivation in [16,17], we can then upper-bound

Pr \{⋃_{m_{1}^{'} \neq 1} E (m_{1}^{'}, 1)\} \leq \exp (n R_{1}) \exp (- n (I_{1, T}^{gmi} (SNR) - δ^{'})) + ε_{1} (δ^{'}, n)

(151)

for any

δ^{'} > 0

, and for some

ε_{1} (δ^{'}, n)

satisfying

lim_{n \to \infty} ε_{1} (δ^{'}, n) = 0, δ^{'} > 0 .

(152)

On the RHS of (151),

I_{1, T}^{gmi} (SNR)

denotes the GMI for user 1 as a function of the

SNR

for a fixed T, i.e.,

I_{1, T}^{gmi} (SNR) = \frac{L - n_{t, 1} - n_{t, 2}}{L} (sup_{θ < 0} (θ F (SNR) - κ_{1} (θ, SNR))) .

(153)

The pre-factor

(L - n_{t, 1} - n_{t, 2}) / L

equals the fraction of time used for data transmission. The bound (151) implies that, for all rates below

I_{1, T}^{gmi} (SNR)

, the error probability in decoding user 1’s message for the scheme described in Section 4 vanishes as n tends to infinity.

Combining (141) and (148) with (153), we obtain that

\begin{matrix} I_{1, T}^{gmi} (SNR) = sup_{θ < 0} \frac{1}{L} \sum_{ℓ = n_{t, 1} + n_{t, 2}}^{L - 1} {θ & (n_{r} + SNR E [{∥E_{1, ℓ}^{(T)}∥}_{F}^{2} + {∥E_{2, ℓ}^{(T)}∥}_{F}^{2}]) - g_{1, ℓ} (θ, SNR) \\ + E [log \det (I_{n_{r}} - θ SNR {\hat{H}}_{1, ℓ}^{(T)} {\hat{H}}_{1, ℓ}^{† (T)})]} . \end{matrix}

(154)

Since the supremum over

θ < 0

is difficult to evaluate, we next consider a lower bound on

I_{1, T}^{gmi} (SNR)

. By noting that

g_{1, ℓ} (θ, SNR) \leq 0

for

θ \leq 0

(which can be shown using the technique developed in ([29], Appendix D)), and by choosing

θ = \frac{- 1}{n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ_{*, T}^{2}}

(155)

where

ϵ_{*, T}^{2} = max_{\begin{matrix} s = 1, 2, \\ r = 1, \dots, n_{r}, \\ t = 1, \dots, n_{t, s}, \\ ℓ = n_{t, 1} + n_{t, 2}, \dots, L - 1 \end{matrix}} E [{|E_{s, ℓ}^{(T)} (r, t)|}^{2}]

(156)

we obtain the following lower bound on

I_{1, T}^{gmi} (SNR)

:

\begin{matrix} I_{1, T}^{gmi} (SNR) \geq \frac{1}{L} \sum_{ℓ = n_{t, 1} + n_{t, 2}}^{L - 1} E [log \det (I_{n_{r}} + \frac{SNR {\hat{H}}_{1, ℓ}^{(T)} {\hat{H}}_{1, ℓ}^{† (T)}}{n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ_{*, T}^{2}}) - 1] . \end{matrix}

(157)

(As pointed out in Section 5, this choice of

θ

yields a good lower bound at high SNR.) We continue by analyzing the RHS of (157) in the limit as the observation window T of the channel estimator tends to infinity. To this end, we note that, for

L \leq \frac{1}{2 λ_{D}}

, the variance of the interpolation error tends to (15) (with

SNR

in (15) replaced by

n_{t} SNR

due to the difference between the point-to-point channel model (1) and the MAC channel model (40)), so

lim_{T \to \infty} E [{|E_{s, ℓ}^{(T)} (r, t)|}^{2}] = ϵ^{2} = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {[f_{H} (λ)]}^{2}}{SNR f_{H} (λ) + L} d λ

(158)

irrespective of

s, ℓ, r

, and t. It follows that the estimate

{\hat{H}}_{1, ℓ}^{(T)}

tends to

{\bar{H}}_{1}

in distribution as T tends to infinity, which implies that

\begin{matrix} \frac{{\hat{H}}_{1, ℓ}^{(T)} {\hat{H}}_{1, ℓ}^{† (T)}}{n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ_{*, T}^{2}} \overset{d}{⟶} \frac{{\bar{H}}_{1} {\bar{H}}_{1}^{†}}{n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ^{2}} \end{matrix}

(159)

where the

n_{r} \times n_{t, 1}

entries of

{\bar{H}}_{1}

are i.i.d., circularly-symmetric, complex-Gaussian random variables with zero mean and variance

(1 - ϵ^{2})

. Using Portmanteau’s lemma (as used in (109)), we obtain that

\begin{matrix} (160) & I_{1}^{gmi} (SNR) & = lim_{T \to \infty} I_{1, T}^{gmi} (SNR) \\ (161) & \geq \frac{L - n_{t, 1} - n_{t, 2}}{L} (E [log \det (I_{n_{r}} + \frac{SNR {\bar{H}}_{1} {\bar{H}}_{1}^{†}}{n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ^{2}})] - 1) \\ \geq \frac{L - n_{t, 1} - n_{t, 2}}{L} \min (n_{r}, n_{t, 1}) [log SNR - log (n_{r} + n_{r} (n_{t, 1} + n_{t, 2}) SNR ϵ^{2})] \\ (162) & + \frac{L - n_{t, 1} - n_{t, 2}}{L} Ψ_{1} \end{matrix}

where

Ψ_{1} ≜ \{\begin{matrix} E [log \det {\bar{H}}_{1} {\bar{H}}_{1}^{†}] - 1, & n_{r} \leq n_{t, 1} \\ E [log \det {\bar{H}}_{1}^{†} {\bar{H}}_{1}] - 1, & n_{r} > n_{t, 1} . \end{matrix}

(163)

The inequality (162) follows by lower-bounding

log \det (I + A) \geq log \det A

.

By evaluating the limiting ratio of the RHS of (162) to

log SNR

as

SNR

tends to infinity following similar steps as in Section 5.2.3, we obtain the following lower bound on the maximum achievable pre-log of user 1:

Π_{R_{1}^{*}} \geq \min (n_{r}, n_{t, 1}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L}), L \leq \frac{1}{2 λ_{D}} .

(164)

As in the point-to-point case, the condition

L \leq 1 / (2 λ_{D})

is necessary to obtain (15). The lower bound (164) yields one boundary of the pre-log region presented in Theorem 2.

6.2. Error Event $(m_{1}^{'} = 1, m_{2}^{'} \neq 1)$

The error event

(m_{1}^{'} = 1, m_{2}^{'} \neq 1)

can be analyzed by swapping user 1 and user 2 and then using the results obtained in the previous subsection for the error event

(m_{1}^{'} \neq 1, m_{2}^{'} = 1)

. We thus have the lower bound

\begin{matrix} Π_{R_{2}^{*}} & \geq \min (n_{r}, n_{t, 2}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L}), L \leq \frac{1}{2 λ_{D}} \end{matrix}

(165)

which yields the second boundary of the pre-log region presented in Theorem 2.

6.3. Error Event $(m_{1}^{'} \neq 1, m_{2}^{'} \neq 1)$

For the error event

(m_{1}^{'} \neq 1, m_{2}^{'} \neq 1)

, the analysis of the achievable sum rate follows the same analysis as in the point-to-point case (Section 5.2). More specifically, the GMI

I_{1 + 2, T}^{gmi} (SNR)

that describes the exponential decay of the term

Pr \{\frac{1}{n} \cdot D (m_{1}^{'}, m_{2}^{'}) < F (SNR) + δ| \{(X_{s}^{n^{'}} (1), Y^{n^{'}}, {\hat{H}}_{s}^{(T), n^{'}}), s = 1, 2\} \in T_{δ}\}

(166)

can be viewed as the GMI of an

n_{r} \times (n_{t, 1} + n_{t, 2})

-dimensional point-to-point MIMO channel with fading matrix

[H_{1, k}, H_{2, k}]

and fading estimate matrix

[{\hat{H}}_{1, k}^{(T)}, {\hat{H}}_{2, k}^{(T)}]

. The maximum achievable sum-rate pre-log can therefore be obtained by following the same steps as in Section 5.2, but with arbitrary

n_{r}

and

n_{t} = n_{t, 1} + n_{t, 2}

. It can thus be shown that the maximum achievable sum-rate pre-log

Π_{R_{1 + 2}^{*}}

is lower-bounded by

\begin{matrix} Π_{R_{1 + 2}^{*}} \geq \min (n_{r}, n_{t, 1} + n_{t, 2}) (1 - \frac{n_{t, 1} + n_{t, 2}}{L}), L \leq \frac{1}{2 λ_{D}} . \end{matrix}

(167)

On the RHS of (167), the term

\min (n_{r}, n_{t, 1} + n_{t, 2})

corresponds to the MIMO gain, which is given by the minimum number of receive and transmit antennas, and the term

(1 - \frac{n_{t, 1} + n_{t, 2}}{L})

corresponds to the fraction of time indices for data transmission. This yields the third boundary of the pre-log region presented in Theorem 2.
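The three lower bounds (164), (165), and (167) are simple to tabulate. The sketch below computes them for given antenna numbers and pilot period. The function name and the guard on L are illustrative assumptions, and the condition relating L to the bandwidth of the fading process is not checked.

```python
def mac_prelog_bounds(n_r, n_t1, n_t2, L):
    """Lower bounds (164), (165), and (167) for a given pilot period L."""
    if L <= n_t1 + n_t2:
        # Guard added for this sketch: no time indices would remain for data.
        raise ValueError("L must exceed n_t1 + n_t2")
    data_fraction = 1.0 - (n_t1 + n_t2) / L
    return {
        "user1": min(n_r, n_t1) * data_fraction,
        "user2": min(n_r, n_t2) * data_fraction,
        "sum": min(n_r, n_t1 + n_t2) * data_fraction,
    }
```

For single-antenna users and L = 4, moving from one to two receive antennas leaves the single-user bounds at 0.5 but raises the sum-rate bound from 0.5 to 1.0, in line with the remark at the beginning of this section.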

7. Conclusions

In this paper, we studied a communication scheme for MIMO fading channels that estimates the fading via transmission of pilot symbols at regular intervals and feeds the fading estimates to the nearest neighbor decoder. Restricting ourselves to fading processes with a bandlimited power spectral density, we studied the information rates achievable with this scheme at high SNR. Specifically, we analyzed the achievable rate pre-log, defined as the limiting ratio of the achievable rate to the logarithm of the SNR in the limit as the SNR tends to infinity.

We showed that, to obtain fading estimates whose error variance vanishes as the SNR tends to infinity, the fraction of time devoted to pilot transmission must be at least the number of transmit antennas times twice the bandwidth of the fading power spectral density. We demonstrated that in this case, the nearest neighbor decoder achieves the capacity pre-log of the coherent fading channel times the fraction of time used for the transmission of data. Hence, the loss with respect to the coherent case is solely due to the transmission of pilots used to obtain accurate fading estimates. Our achievability bounds are tight in the sense that no scheme using as many pilots as our proposed scheme can achieve a higher pre-log using a nearest neighbor decoder. Furthermore, if the inverse of twice the bandwidth of the fading process is an integer, then, for MISO channels, our scheme achieves the capacity pre-log of the noncoherent fading channel derived by Koch and Lapidoth [11]. For noncoherent MIMO channels, our scheme achieves the best lower bound on the capacity pre-log known so far, which was obtained by Etkin and Tse [12]. Since the last result only yields a lower bound on the capacity pre-log of MIMO channels, there may exist other schemes achieving a better pre-log than our scheme.

Author Contributions

Conceptualization, A.T.A., T.K., and A.G.i.F.; methodology, A.T.A., T.K., and A.G.i.F.; software, A.T.A.; validation, A.T.A., T.K., and A.G.i.F.; formal analysis, A.T.A., T.K., and A.G.i.F.; investigation, A.T.A., T.K., and A.G.i.F.; resources, A.T.A., T.K., and A.G.i.F.; data curation, A.T.A.; writing–original draft preparation, A.T.A.; writing–review and editing, T.K. and A.G.i.F.; visualization, A.T.A.; supervision, T.K. and A.G.i.F.; project administration, A.G.i.F.; funding acquisition, A.T.A., T.K., and A.G.i.F. All authors have read and agreed to the published version of the manuscript.

Funding

The work of A.T.A. was supported in part by the Yousef Jameel Scholarship at the University of Cambridge. T.K. received funding from the European Union’s Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 252663, from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 714161), and from the Spanish Ministerio de Economía y Competitividad under grant RYC-2014-16332. The work of A.G.i.F. has been funded by the European Research Council under ERC grant agreements 259663 and 725411.

Acknowledgments

The authors wish to thank the anonymous referees for their valuable comments. The work of A.T.A. was conducted in part while he was with the University of Cambridge (U.K.), the National Chiao Tung University (Taiwan), and the University of Bradford (U.K.). The work of T.K. was conducted in part while he was with the University of Cambridge (U.K.).

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Proof of Lemma 1

We prove each part of Lemma 1 in a separate item:

By the orthogonality principle [34], it follows that ${\hat{H}}_{k}^{(T)} (r, t)$ and $E_{k}^{(T)} (r, t)$ are uncorrelated. Noting that the pilot symbols are unity, we can write (72) as

${\hat{H}}_{k}^{(T)} (r, t) = \sum_{\begin{matrix} k^{'} = k - T L : \\ k^{'} \in P \end{matrix}}^{k + T L} a_{k^{'}} (r, t) (\sqrt{\frac{SNR}{n_{t}}} H_{k^{'}} (r, t) + Z_{k^{'}} (r)), k \in D .$

(A1)

Since the processes ${H_{k} (r, t), k \in ℤ}$ and ${Z_{k} (r), k \in ℤ}$ are zero-mean complex-Gaussian, we have from (A1) and the orthogonality principle that ${\hat{H}}_{k}^{(T)} (r, t)$ and $E_{k}^{(T)} (r, t)$ are independent zero-mean complex-Gaussian random variables.
Recall from Section 5.1 that the time index k can be written as $k = j L + ℓ$ . Then, for $k \in D$ , we have $ℓ = n_{t}, \dots, L - 1$ , and for $k \in P$ we have $ℓ = 0, \dots, n_{t} - 1$ . Since the pilot vectors are transmitted sequentially from $p_{1}$ to $p_{n_{t}}$ , we have for $(j L + ℓ) \in P$ that

$x_{j L + ℓ} = p_{ℓ + 1}, ℓ = 0, \dots, n_{t} - 1 .$

(A2)

That is, the pilot vectors that help estimate the fading coefficients from transmit antenna t are transmitted at time instants whose remainder after division by L is equal to $t - 1$ . This implies that in order to estimate $H_{k} (r, t)$ , there is no loss in optimality by considering only the outputs $Y_{k^{'}} (r)$ for $k^{'} \in P \cap {k - T L, \dots, k + T L}$ satisfying

$k^{'} \mod L = t - 1 .$

(A3)

Indeed, the channel outputs $Y_{k^{'}} (r)$ , $k^{'} \mod L \neq t - 1$ correspond to $H_{k^{'}} (r, t^{'})$ , $t^{'} \neq t$ , which are independent from $H_{k} (r, t)$ since we assumed that the fading processes corresponding to different transmit and receive antennas are independent. It follows that for estimation at time $k = j L + ℓ$ , the coefficients $a_{k^{'}} (r, t)$ that minimize the mean-squared error depend only on L and ℓ [23]. The fading estimate (72) can then be expressed as

$\begin{matrix} (A4) & {\hat{H}}_{j L + ℓ}^{(T)} (r, t) & = \sum_{τ = - T}^{T - 1} α_{- τ L, ℓ} (r, t) Y_{(j - τ) L + t - 1} (r) \\ (A5) & = \sum_{τ = - T}^{T - 1} α_{- τ L, ℓ} (r, t) (\sqrt{\frac{SNR}{n_{t}}} H_{(j - τ) L + t - 1} (r, t) + Z_{(j - τ) L + t - 1} (r)) \end{matrix}$

where for a given L and $ℓ = n_{t}, \dots, L - 1$ , we defined

$α_{- τ L, ℓ} (r, t) ≜ a_{(j - τ) L + t - 1} (r, t), τ = - T, \dots, T - 1 .$

(A6)

Noting again that the $n_{r} \cdot n_{t}$ processes ${H_{k} (r, t), k \in ℤ}$ are independent from each other and have the same law, we obtain the following results from (A5):
(a)
For a given t, the time differences between the index of interest ( $j L + ℓ$ ) and the positions of pilots ( $(j - τ) L + t - 1$ ) do not depend on r. It thus follows that for a given t, the optimal coefficients $α_{- τ L, ℓ} (r, t)$ are identical for all $r = 1, \dots, n_{r}$ [23]. This implies that for a given t and ℓ, the $n_{r}$ processes

${({\hat{H}}_{j L + ℓ}^{(T)} (1, t), E_{j L + ℓ}^{(T)} (1, t)), j \in ℤ}, \dots, {({\hat{H}}_{j L + ℓ}^{(T)} (n_{r}, t), E_{j L + ℓ}^{(T)} (n_{r}, t)), j \in ℤ}$

(A7)

are independent and have the same law.
(b)
For a given r, the time differences between the index of interest ( $j L + ℓ$ ) and the position of pilots ( $(j - τ) L + t - 1$ ) depend on t. It thus follows from [23] that for a given r, the optimal coefficients $α_{- τ L, ℓ} (r, t)$ are generally different for $t = 1, \dots, n_{t}$ . This implies that for a given r and ℓ, the $n_{t}$ processes

${({\hat{H}}_{j L + ℓ}^{(T)} (r, 1), E_{j L + ℓ}^{(T)} (r, 1)), j \in ℤ}, \dots, {({\hat{H}}_{j L + ℓ}^{(T)} (r, n_{t}), E_{j L + ℓ}^{(T)} (r, n_{t})), j \in ℤ}$

(A8)

are independent but have different laws.
We first note that ${H_{k}, k \in ℤ}$ is an ergodic Gaussian process, which implies that it is also weakly mixing [35]. (See [36] for a definition of a weakly-mixing process.) Since ${Z_{k}, k \in ℤ}$ is an i.i.d. Gaussian process and independent from ${H_{k}, k \in ℤ}$ , it follows from ([36], Proposition 1.6) that ${(H_{k}, Z_{k}), k \in ℤ}$ is jointly ergodic.
We next evaluate the process ${({\hat{H}}_{k}^{(T)}, H_{k}, Z_{k}), k \in D}$ . Please note that this process cannot be expressed directly as a time-invariant function of ${(H_{k}, Z_{k}), k \in ℤ}$ . Indeed, by assuming $k = j L + ℓ$ , we can see from (A5) that the function to produce ${\hat{H}}_{k}^{(T)}$ from ${(H_{k}, Z_{k}), k \in ℤ}$ depends on the time index k via ℓ. To sidestep this problem, and to facilitate the analysis, we introduce a “dummy” matrix-valued process ${A_{k, ℓ}, k \in ℤ}$ of dimension $n_{r} \times n_{t}$ , where the row-r column-t entry of $A_{k, ℓ}$ is given by

$A_{k, ℓ} (r, t) = \sum_{τ = - T}^{T - 1} α_{- τ L, ℓ} (r, t) (\sqrt{\frac{SNR}{n_{t}}} H_{k - τ L - ℓ + t - 1} (r, t) + Z_{k - τ L - ℓ + t - 1} (r)) .$

(A9)

Here the coefficients $α_{- τ L, ℓ}$ , $τ = - T, \dots, T$ have the same value as those in (A5) for a given L and ℓ. Consequently, for every $ℓ = n_{t}, \dots, L - 1$ , the process ${A_{k, ℓ}, k \in ℤ}$ is a time-invariant function of ${(H_{k}, Z_{k}), k \in ℤ}$ that coincides with ${\hat{H}}_{k}^{(T)}$ for $k = j L + ℓ$ . This in turn implies that for every $ℓ = n_{t}, \dots, L - 1$ , the process ${(A_{k, ℓ}, H_{k}, Z_{k}), k \in ℤ}$ is jointly weakly mixing. Furthermore, by the definition of weakly mixing [35,36,37], the process ${(A_{j L + ℓ, ℓ}, H_{j L + ℓ}, Z_{j L + ℓ}), j \in ℤ}$ (for every $ℓ = n_{t}, \dots, L - 1$ ) is also jointly weakly mixing. Since for $k = j L + ℓ$ , $k \in D$ , the matrix $A_{j L + ℓ, ℓ}$ is identical to ${\hat{H}}_{j L + ℓ}^{(T)}$ , it follows that the process ${({\hat{H}}_{j L + ℓ}^{(T)}, H_{j L + ℓ}, Z_{j L + ℓ}), j \in ℤ}$ (for every $ℓ = n_{t}, \dots, L - 1$ ) is jointly weakly mixing, which implies ergodicity.
We finally evaluate the joint behavior of the two processes ${({\hat{H}}_{j L + ℓ}^{(T)}, H_{j L + ℓ}, Z_{j L + ℓ}), j \in ℤ}$ and ${X_{j L + ℓ}, j \in ℤ}$ for $ℓ = n_{t}, \dots, L - 1$ . Since, for every $ℓ = n_{t}, \dots, L - 1$ , the process ${X_{j L + ℓ}, j \in ℤ}$ is i.i.d. and independent from ${({\hat{H}}_{j L + ℓ}^{(T)}, H_{j L + ℓ}, Z_{j L + ℓ}), j \in ℤ}$ , we have by ([38], Lemma 2) that

${({\hat{H}}_{j L + ℓ}^{(T)}, H_{j L + ℓ}, Z_{j L + ℓ}, X_{j L + ℓ}), j \in ℤ}, ℓ = n_{t}, \dots, L - 1$

(A10)

is jointly ergodic.
Please note that the process ${{\hat{H}}_{k}^{(T)}, k \in D}$ is a function of ${(H_{k}, Z_{k}), k \in P}$ . Since ${Z_{k}, k \in D}$ has zero mean and is independent from ${(H_{k}, Z_{k}), k \in P}$ and ${X_{k}, k \in D}$ , it follows that for every $ℓ = n_{t}, \dots, L - 1$ (which corresponds to $k \in D$ ),

$E [Z_{ℓ}^{†} {\hat{H}}_{ℓ}^{(T)} X_{ℓ}] = 0 .$

(A11)
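The pilot placement used in the proof of Part 2, namely (A2) and (A3) together with the pilot positions appearing in (A4), can be sketched in a few lines. The function names are hypothetical and only serve this illustration.

```python
def classify_index(k, L, n_t):
    """Return ('pilot', t) or ('data', None) for time index k, following (A2)-(A3)."""
    ell = k % L  # remainder in 0, ..., L-1 (nonnegative in Python, also for negative k)
    if ell < n_t:
        # Pilot vector p_{ell+1}, which serves transmit antenna t = ell + 1.
        return ("pilot", ell + 1)
    return ("data", None)

def pilot_positions(k, t, L, T):
    """Pilot times (j - tau) L + t - 1 for tau = -T, ..., T - 1, as in (A4)."""
    j = k // L
    return [(j - tau) * L + t - 1 for tau in range(-T, T)]
```

For example, with L = 4 and two transmit antennas, indices 0 and 1 of every block carry pilots and indices 2 and 3 carry data. Every position returned by `pilot_positions` leaves remainder t − 1 after division by L, as required by (A3).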

Appendix B. Variance of the Interpolation Error for $L > \frac{1}{2 λ_{D}}$

Recall that the variance of the interpolation error tends to the following value as T tends to infinity:

\begin{matrix} ϵ_{ℓ}^{2} (t) = 1 - \int_{- 1 / 2}^{1 / 2} \frac{SNR {|f_{L, ℓ - t + 1} (λ)|}^{2}}{SNR f_{L, 0} (λ) + n_{t}} d λ \end{matrix}

(A12)

where

f_{L, ℓ} (λ) = \frac{1}{L} \sum_{ν = 0}^{L - 1} {\bar{f}}_{H} (\frac{λ - ν}{L}) e^{i 2 π ℓ \frac{λ - ν}{L}}, - \frac{1}{2} \leq λ \leq \frac{1}{2} .

(A13)

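Using that the aliased spectrum in the denominator of (A12) integrates to one over the interval of integration (the fading has unit variance), the variance (A12) can be rewritten and lower-bounded as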

\begin{matrix} (A14) & ϵ_{ℓ}^{2} (t) = & \int_{- 1 / 2}^{1 / 2} \frac{n_{t} f_{L, 0} (λ)}{SNR f_{L, 0} (λ) + n_{t}} d λ + \int_{- 1 / 2}^{1 / 2} \frac{SNR ({[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ - t + 1} (λ)|}^{2})}{SNR f_{L, 0} (λ) + n_{t}} d λ \\ (A15) & \geq & \int_{- 1 / 2}^{1 / 2} \frac{SNR ({[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ - t + 1} (λ)|}^{2})}{SNR f_{L, 0} (λ) + n_{t}} d λ \end{matrix}

where the inequality follows because the first integral in (A14) is nonnegative. Defining

ℓ^{'} ≜ ℓ - t + 1

, we next note that

\begin{matrix} (A16) & {[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ^{'}} (λ)|}^{2} & = \frac{1}{L^{2}} \sum_{ν = 0}^{L - 1} \sum_{\begin{matrix} ν^{'} = 0, \\ ν^{'} \neq ν \end{matrix}}^{L - 1} {\bar{f}}_{H} (\frac{λ - ν}{L}) {\bar{f}}_{H} (\frac{λ - ν^{'}}{L}) [1 - e^{i 2 π ℓ^{'} \frac{λ - ν}{L}} \cdot e^{- i 2 π ℓ^{'} \frac{λ - ν^{'}}{L}}] \\ (A17) & = \frac{2}{L^{2}} \sum_{ν = 0}^{L - 1} \sum_{\begin{matrix} ν^{'} > ν \end{matrix}}^{L - 1} {\bar{f}}_{H} (\frac{λ - ν}{L}) {\bar{f}}_{H} (\frac{λ - ν^{'}}{L}) [1 - cos (2 π ℓ^{'} \frac{ν^{'} - ν}{L})] . \end{matrix}
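The passage from (A16) to (A17) pairs each ordered pair of distinct indices with its mirror image; both share the same product of spectral densities. The following sketch makes this explicit, with the shorthand φ introduced here only for illustration.

```latex
% Shorthand (introduced here): \phi = 2\pi \ell' (\nu' - \nu) / L for a fixed pair \nu \neq \nu'.
\begin{align*}
e^{i 2\pi \ell' \frac{\lambda - \nu}{L}} \, e^{-i 2\pi \ell' \frac{\lambda - \nu'}{L}}
  &= e^{i\phi}, \\
\bigl[1 - e^{i\phi}\bigr] + \bigl[1 - e^{-i\phi}\bigr]
  &= 2\,[1 - \cos\phi] .
\end{align*}
```

The first line is the phase factor of the pair (ν, ν′) and its complex conjugate is that of (ν′, ν), so the double sum over distinct indices reduces to twice the sum over ν′ > ν. The diagonal terms do not appear in (A16) because their bracket equals zero.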

Since the summands are nonnegative, keeping only the term with $ν = 0$ and $ν^{'} = 1$ yields

${[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ^{'}} (λ)|}^{2} \geq \frac{2}{L^{2}} {\bar{f}}_{H} \left(\frac{λ}{L}\right) {\bar{f}}_{H} \left(\frac{λ - 1}{L}\right) \left[1 - \cos \left(\frac{2 π ℓ^{'}}{L}\right)\right] .$

(A18)

The RHS of (A15) can thus be lower-bounded as

$\int_{- 1 / 2}^{1 / 2} \frac{SNR ({[f_{L, 0} (λ)]}^{2} - {|f_{L, ℓ^{'}} (λ)|}^{2})}{SNR f_{L, 0} (λ) + n_{t}} d λ \geq \frac{2 [1 - \cos (\frac{2 π ℓ^{'}}{L})]}{L^{2}} \int_{\mathcal{L}} \frac{SNR {\bar{f}}_{H} (\frac{λ}{L}) {\bar{f}}_{H} (\frac{λ - 1}{L})}{SNR f_{L, 0} (λ) + n_{t}} d λ$

(A19)

where $\mathcal{L}$ denotes the interval in $[- 1 / 2, 1 / 2]$ where ${\bar{f}}_{H} (\frac{λ}{L})$ and ${\bar{f}}_{H} (\frac{λ - 1}{L})$ overlap.

We next express $L$ as

$L = \frac{1}{2 λ_{D}} + ε$

(A20)

for some $ε > 0$. Then, the interval $\mathcal{L}$ is of Lebesgue measure

$μ (\mathcal{L}) = \min (1, 2 λ_{D} ε) .$

(A21)

By Fatou’s lemma [39], we obtain that

\begin{align}
& \liminf_{SNR \to \infty} \frac{2 [1 - \cos (\frac{2 π ℓ^{'}}{L})]}{L^{2}} \int_{\mathcal{L}} \frac{SNR {\bar{f}}_{H} (\frac{λ}{L}) {\bar{f}}_{H} (\frac{λ - 1}{L})}{SNR f_{L, 0} (λ) + n_{t}} d λ \notag \\
& \quad \geq \frac{2 [1 - \cos (\frac{2 π ℓ^{'}}{L})]}{L^{2}} \int_{\mathcal{L}} \liminf_{SNR \to \infty} \frac{SNR {\bar{f}}_{H} (\frac{λ}{L}) {\bar{f}}_{H} (\frac{λ - 1}{L})}{SNR f_{L, 0} (λ) + n_{t}} d λ \tag{A22} \\
& \quad = \frac{2 [1 - \cos (\frac{2 π ℓ^{'}}{L})]}{L^{2}} \int_{\mathcal{L}} \frac{{\bar{f}}_{H} (\frac{λ}{L}) {\bar{f}}_{H} (\frac{λ - 1}{L})}{f_{L, 0} (λ)} d λ . \tag{A23}
\end{align}

Since $\mathcal{L}$ is of positive Lebesgue measure, and since the integrand on the RHS of (A23) is strictly positive, it follows that [40]

$\int_{\mathcal{L}} \frac{{\bar{f}}_{H} (\frac{λ}{L}) {\bar{f}}_{H} (\frac{λ - 1}{L})}{f_{L, 0} (λ)} d λ > 0 .$

(A24)

Furthermore, for $ℓ^{'} = ℓ - t + 1$ and $ℓ = n_{t}, \dots, L - 1$, we have

$\cos \left(\frac{2 π ℓ^{'}}{L}\right) < 1 .$

(A25)

Consequently, combining (A25) and (A24) with (A23), (A19), and (A15), we obtain the desired result

$\liminf_{SNR \to \infty} ϵ_{ℓ}^{2} (t) > 0 .$

(A26)
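The result (A26) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it evaluates (A12) and (A13) with a midpoint rule, assuming a rectangular unit-power spectrum on $|λ| \leq λ_{D}$ that is extended periodically with period 1. The spectrum shape, the value $λ_{D} = 0.1$, and all function names are assumptions made for illustration only.

```python
import cmath
import math


def f_bar(lam, lam_d):
    """Assumed spectrum: rectangular, unit power, support |lam| <= lam_d, period 1."""
    wrapped = (lam + 0.5) % 1.0 - 0.5
    return 1.0 / (2.0 * lam_d) if abs(wrapped) <= lam_d else 0.0


def f_L(lam, ell, L, lam_d):
    """Evaluate f_{L,ell}(lam) as defined in (A13)."""
    total = 0j
    for nu in range(L):
        arg = (lam - nu) / L
        total += f_bar(arg, lam_d) * cmath.exp(2j * math.pi * ell * arg)
    return total / L


def interpolation_error(snr, ell_prime, L, lam_d, n_t=1, n_grid=2000):
    """Midpoint-rule approximation of (A12), with ell_prime = ell - t + 1."""
    acc = 0.0
    for i in range(n_grid):
        lam = (i + 0.5) / n_grid - 0.5
        f0 = f_L(lam, 0, L, lam_d).real
        fl = f_L(lam, ell_prime, L, lam_d)
        # n_t > 0 keeps the denominator positive even where f0 vanishes.
        acc += snr * abs(fl) ** 2 / (snr * f0 + n_t)
    return 1.0 - acc / n_grid


if __name__ == "__main__":
    lam_d = 0.1  # illustrative value, so that 1 / (2 * lam_d) = 5
    for L in (4, 7):  # one spacing below and one above 1 / (2 * lam_d)
        for snr in (1e2, 1e4, 1e6):
            err = interpolation_error(snr, 1, L, lam_d)
            print(f"L={L}  SNR={snr:.0e}  error variance={err:.3e}")
```

Under these assumptions, the error variance for $L = 4$ keeps shrinking as the SNR grows, whereas for $L = 7$ it levels off at a strictly positive value, in line with (A26).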

References

1. Foschini, G.J. Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas. Bell Labs Tech. J. 1996, 1, 41–59.
2. Telatar, E. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecommun. 1999, 10, 585–595.
3. Lapidoth, A. On the asymptotic capacity of stationary Gaussian fading channels. IEEE Trans. Inf. Theory 2005, 51, 437–446.
4. Zheng, L.; Tse, D.N.C. Diversity and multiplexing: A fundamental tradeoff in multiple-antenna channels. IEEE Trans. Inf. Theory 2003, 49, 1073–1096.
5. Heath, R.W., Jr.; Lozano, A. Foundations of MIMO Communication; Cambridge University Press: Cambridge, UK, 2019.
6. Nikbakht, H.; Wigger, M.A.; Shamai Shitz, S. Multiplexing gains under mixed-delay constraints on Wyner’s soft-handoff model. Entropy 2020, 22, 182.
7. Lapidoth, A. Nearest neighbor decoding for additive non-Gaussian noise channels. IEEE Trans. Inf. Theory 1996, 42, 1520–1529.
8. Marzetta, T.L. BLAST training: Estimating channel characteristics for high-capacity space-time wireless. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 22–24 September 1999.
9. Hassibi, B.; Hochwald, B.M. How much training is needed in multiple-antenna wireless links? IEEE Trans. Inf. Theory 2003, 49, 951–963.
10. Jindal, N.; Lozano, A. A unified treatment of optimum pilot overhead in multipath fading channels. IEEE Trans. Commun. 2010, 58, 2939–2948.
11. Koch, T.; Lapidoth, A. The fading number and degrees of freedom in non-coherent MIMO fading channels: A peace pipe. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2005), Adelaide, Australia, 4–9 September 2005.
12. Etkin, R.H.; Tse, D.N.C. Degrees of freedom in some underspread MIMO fading channels. IEEE Trans. Inf. Theory 2006, 52, 1576–1608.
13. Nam, J.; Caire, G.; Debbah, M.; Poor, H.V. Capacity scaling of massive MIMO in strong spatial correlation regimes. IEEE Trans. Inf. Theory 2020, 66, 3040–3064.
14. Gomez-Cuba, F.; Chowdhury, M.; Manolakos, A.; Erkip, E.; Goldsmith, A.J. Capacity scaling in a non-coherent wideband massive SIMO block fading channel. IEEE Trans. Wirel. Commun. 2019, 18, 5691–5704.
15. Vu, M.N.; Tran, N.H.; Wijeratne, D.G.; Pham, K.; Lee, K.S.; Nguyen, D.H.N. Optimal signaling schemes and capacity of non-coherent Rician fading channels with low-resolution output quantization. IEEE Trans. Wirel. Commun. 2019, 18, 2989–3004.
16. Lapidoth, A.; Shamai Shitz, S. Fading channels: How perfect need “perfect side information” be? IEEE Trans. Inf. Theory 2002, 48, 1118–1134.
17. Weingarten, H.; Steinberg, Y.; Shamai Shitz, S. Gaussian codes and weighted nearest neighbor decoding in fading multiple-antenna channels. IEEE Trans. Inf. Theory 2004, 50, 1665–1686.
18. Lozano, A. Interplay of spectral efficiency, power and Doppler spectrum for reference-signal-assisted wireless communication. IEEE Trans. Wirel. Commun. 2008, 7, 5020–5029.
19. Asyhari, A.T.; Koch, T.; Guillén i Fàbregas, A. Nearest neighbour decoding and pilot-aided channel estimation in stationary Gaussian flat-fading channels. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2011), St. Petersburg, Russia, 31 July–5 August 2011.
20. Asyhari, A.T.; Koch, T.; Guillén i Fàbregas, A. Nearest neighbour decoding with pilot-assisted channel estimation for fading multiple-access channels. In Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 28–30 September 2011.
21. Asyhari, A.T.; ten Brink, S. Orthogonal or superimposed pilots? A rate-efficient channel estimation strategy for stationary MIMO fading channels. IEEE Trans. Wirel. Commun. 2017, 16, 2776–2789.
22. Verenzuela, D. Exploring alternative massive MIMO designs: Superimposed pilots and mixed-ADCs. Ph.D. Thesis, Linköping University, Linköping, Sweden, 2020.
23. Ohno, S.; Giannakis, G.B. Average-rate optimal PSAM transmissions over time-selective fading channels. IEEE Trans. Wirel. Commun. 2002, 1, 712–720.
24. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
25. Rappaport, T.S. Wireless Communications: Principles and Practice, 2nd ed.; Prentice Hall PTR: Upper Saddle River, NJ, USA, 2002.
26. Saunders, S.R.; Aragón Zavala, A. Antennas and Propagation for Wireless Communication Systems, 2nd ed.; Wiley: Chichester, UK, 2007.
27. Merhav, N.; Kaplan, G.; Lapidoth, A.; Shamai Shitz, S. On information rates for mismatched decoders. IEEE Trans. Inf. Theory 1994, 40, 1953–1967.
28. Durrett, R. Probability: Theory and Examples, 4th ed.; Cambridge University Press: Cambridge, UK, 2010.
29. Asyhari, A.T.; Guillén i Fàbregas, A. Nearest neighbor decoding in MIMO block-fading channels with imperfect CSIR. IEEE Trans. Inf. Theory 2012, 58, 1483–1517.
30. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes: With Applications to Statistics; Springer: New York, NY, USA, 1996.
31. Grant, A. Rayleigh fading multi-antenna channels. EURASIP J. Appl. Signal Process. 2002, 2002, 316–329.
32. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables; Dover: New York, NY, USA, 1965.
33. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
34. Poor, H.V. An Introduction to Signal Detection and Estimation, 2nd ed.; Springer (A Dowden & Culver Book): New York, NY, USA, 1994.
35. Sethuraman, V.; Hajek, B. Capacity per unit energy of fading channels with a peak constraint. IEEE Trans. Inf. Theory 2005, 51, 3102–3120.
36. Brown, J.R. Ergodic Theory and Topological Dynamics; Academic Press: New York, NY, USA, 1976.
37. Petersen, K. Ergodic Theory; Cambridge Studies in Advanced Mathematics 2; Cambridge University Press: Cambridge, UK, 1983.
38. Kim, Y.H. A coding theorem for a class of stationary channels with feedback. IEEE Trans. Inf. Theory 2008, 54, 1488–1499.
39. Royden, H.L. Real Analysis, 2nd ed.; Macmillan: New York, NY, USA, 1968.
40. Weir, A.J. Lebesgue Integration and Measure; Cambridge University Press: Cambridge, UK, 1973.

Figure 1. Structure of pilot and data transmission for $n_{t} = 2$, $L = 7$, and $T = 2$.


Figure 2. The two-user MIMO fading MAC system model.

Figure 3. Structure of joint-transmission scheme for $n_{t, 1} = 2$, $n_{t, 2} = 1$, $L = 7$, and $T = 2$.


Figure 4. Structure of TDMA scheme for $n_{t, 1} = 2$, $n_{t, 2} = 1$, $L = 4$, and $T = 2$.


Figure 5. Pre-log regions for a fading MAC with $n_{r} = 2$ and $n_{t, 1} = n_{t, 2} = 1$ for different values of $L^{*}$. Depicted are the pre-log region for the joint-transmission scheme as given in Theorem 2 (dashed line), the pre-log region of the TDMA scheme as given in Remark 2 (solid line), and the pre-log region of the coherent TDMA scheme (62) (dotted line).


Table 1. Typical values of $L^{*}$ for various environments with $f_{c}$ ranging from 800 MHz to 5 GHz. The values of the delay spread are taken from [12,25] for indoor and urban environments and from [26] for hilly area environments.


Environment	Delay Spread $σ_{τ}$	Mobile Speed $v$	$λ_{D} \approx 5 σ_{τ} \frac{v}{c} f_{c}$	$L^{*}$
Indoor	10–100 ns	5 km/h	$2 \times 10^{-7}$ – $10^{-5}$	$5 \times 10^{4}$ – $2.5 \times 10^{6}$
Urban	1–2 μs	5 km/h	$2 \times 10^{-5}$ – $2 \times 10^{-4}$	$2.5 \times 10^{3}$ – $2.5 \times 10^{4}$
Urban	1–2 μs	75 km/h	$2 \times 10^{-4}$ – $0.004$	125 – $2.5 \times 10^{3}$
Hilly area	3–10 μs	200 km/h	$0.002$ – $0.05$	10 – 250
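The entries of Table 1 can be reproduced approximately with a few lines of code. The sketch below uses the rule of thumb from the table header, $λ_{D} \approx 5 σ_{τ} \frac{v}{c} f_{c}$, and assumes $L^{*} = \lfloor 1 / (2 λ_{D}) \rfloor$, which is consistent with the tabulated values; the function names are hypothetical. The table rounds $λ_{D}$ before computing $L^{*}$, so the results agree with it only up to that rounding.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_bandwidth(delay_spread_s, speed_kmh, carrier_hz):
    """Rule of thumb from the Table 1 header: lambda_D ~ 5 * sigma_tau * (v / c) * f_c."""
    speed_ms = speed_kmh / 3.6
    return 5.0 * delay_spread_s * (speed_ms / SPEED_OF_LIGHT) * carrier_hz


def l_star(lam_d):
    """Assumed definition: largest integer L with L <= 1 / (2 * lambda_D)."""
    return math.floor(1.0 / (2.0 * lam_d))


if __name__ == "__main__":
    # (environment, delay spread in s, speed in km/h, carrier frequency in Hz)
    scenarios = [
        ("Indoor, 10 ns, 800 MHz", 10e-9, 5.0, 800e6),
        ("Indoor, 100 ns, 5 GHz", 100e-9, 5.0, 5e9),
        ("Hilly area, 3 us, 800 MHz", 3e-6, 200.0, 800e6),
        ("Hilly area, 10 us, 5 GHz", 10e-6, 200.0, 5e9),
    ]
    for name, sigma_tau, v, f_c in scenarios:
        lam_d = doppler_bandwidth(sigma_tau, v, f_c)
        print(f"{name}: lambda_D = {lam_d:.2e}, L* = {l_star(lam_d)}")
```

The extreme rows of the table correspond to pairing the smallest delay spread with the lowest carrier frequency, and the largest delay spread with the highest carrier frequency.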

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Asyhari, A.T.; Koch, T.; Guillén i Fàbregas, A. Nearest Neighbor Decoding and Pilot-Aided Channel Estimation for Fading Channels. Entropy 2020, 22, 971. https://doi.org/10.3390/e22090971

