1. Introduction
Massive machine-type communication (mMTC) is a rapidly growing class of wireless communications that aims to connect tens of billions of unattended devices to wireless networks. One significant application of mMTC is distributed sensing, in which a large number of wireless sensors gather data over time and transmit their data to a central server. This server then interprets the received data to produce useful information and/or make executive decisions. When combined with recent advances in machine learning, such networks are expected to open a vast realm of economic and academic opportunities. However, the large population of unattended devices within these networks threatens to overwhelm existing wireless communication infrastructures by dramatically increasing the number of network connections; it is expected that the number of machines connected to wireless networks will soon exceed the population of the planet by at least an order of magnitude. Additionally, the sporadic and bursty nature of sensor transmissions makes them highly inefficient under current estimation/enrollment/scheduling procedures typical of cellular networks [1,2]. The combination of these challenges necessitates the design of novel physical and medium access control (MAC) layer protocols to efficiently handle the demands of these wireless devices.
One recently proposed paradigm for efficiently handling the demands of unattended devices is that of unsourced random access (URA), as described by Polyanskiy in 2017 [3]. URA captures many of the nuances of IoT devices by considering a network with an exceedingly large number of uncoordinated devices, of which only a small percentage is active at any given point in time. When a device/user is active, it encodes its short message using a common codebook and then transmits its codeword over a regularly scheduled time slot, as facilitated by a beacon. Furthermore, the power available to each user is strictly limited and assumed to be uniform across devices. The use of a common codebook is characteristic of URA and has two important implications: First, the network does not need to maintain a dictionary of active devices and their unique codebook information; second, the receiver does not know which node transmitted a given message unless the message itself contains a unique identifier. The receiver is then tasked with recovering an unordered list of transmitted messages sent during each time slot by the collection of active devices. The performance of URA schemes is evaluated with respect to the per-user probability of error (PUPE), which is the probability that a user's message is not present in the receiver's final list of decoded messages; this criterion is formally defined in (3). In [3], Polyanskiy provides finite block length achievability bounds for short block lengths typical of URA applications using random Gaussian coding and maximum likelihood (ML) decoding. However, these bounds are derived in the absence of complexity constraints and, thus, are impractical for deployment in real-world networks. Over the past few years, several alternative URA schemes have been proposed that offer tractable complexity with minor degradation in performance. For example, [4,5,6,7,8,9] exploit connections between URA and compressed sensing (CS) to reduce decoding complexity. In [10,11,12,13,14,15,16,17,18,19], schemes are presented that seek to separate users' signals and recover them using single-user coding techniques. These results are extended to multiple-input multiple-output (MIMO) scenarios in [20,21,22,23,24,25].
All of the aforementioned URA schemes employ concatenated channel codes to recover messages sent by the collection of active users at the receiver. We note that the term channel code is used broadly such that it includes spreading sequences and certain signal dictionaries, such as those commonly used for CS. Although it is conceptually simpler to decode inner and outer codes separately, it is a well-known fact within coding theory that dynamically sharing information between the inner and outer decoders will often improve the performance of the system [26,27]. In this paper, we present a novel framework for sharing information between a wide class of inner codes and a tree-based outer code. This approach significantly improves PUPE performance and reduces the computational complexity of the scheme. Specifically, our main contributions are as follows:
- 1. A general system model consisting of a wide class of inner codes and an outer tree code is developed. An enhanced decoding algorithm is presented whereby the outer tree code may guide the convergence of the inner code by restricting the search space of the inner decoder to parity-consistent paths.
- 2. The coded compressed sensing (CCS) scheme of Amalladinne et al. in [5] is considered under this model. With the enhanced decoding algorithm, the $E_b/N_0$ required to achieve a fixed PUPE is reduced by nearly 1 dB when the number of users is low, and the number of columns in the sensing matrix is reduced by over 99% by the last slot, thus significantly reducing decoding complexity.
- 3. The CCS for massive MIMO scheme of Fengler et al. in [21] is considered under this model. With the enhanced decoding algorithm, the number of antennas required to achieve a fixed PUPE is reduced by 23% in certain regimes, and the decoding runtime is reduced by 70–90%.
2. System Model
Consider a URA scenario consisting of K active devices, each referred to by a fixed but arbitrary label. Each of these users wishes to simultaneously transmit a B-bit message to a central base station over a Gaussian multiple access channel (GMAC) using a concatenated code consisting of an inner code and an outer tree code. The inner code has the crucial property that, given a linear combination of codewords, the constituent information messages may be individually recovered with high probability. Furthermore, we assume that the probability that any two active users' messages are identical is low.
We consider a scenario where it is either computationally intractable to inner encode/decode the entire message simultaneously or it is otherwise impractical to transmit the entire inner codeword at once. Thus, each user must divide its information message into fragments and inner encode/decode each fragment individually. In order to ensure that the message can be reconstructed from its fragments at the receiver, the information fragments are first connected together using the outer tree-based code and then inner encoded. The resulting signal is transmitted over the channel. We elaborate on this process below.
Each message is broken into L fragments, where fragment ℓ has a prescribed length and the fragment lengths sum to B bits. Notationally, a message is represented as the concatenation of its L fragments. The fragments are outer-encoded together by adding parity bits to the end of each fragment, with the exception of the first fragment. This is accomplished by taking random linear combinations of the information bits contained in previous sections; the parity bits appended to the end of section ℓ collectively have a prescribed length as well. The resulting outer-encoded vector assumes the form shown in Figure 1.
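To make the outer encoder concrete, the following Python sketch (with a hypothetical length profile and freely chosen random coefficients) appends to each fragment parity bits formed as random linear combinations, modulo 2, of the information bits in all previous sections; the coefficient matrices are assumed to be shared by the encoder and decoder:

```python
import random

# Hypothetical profile: fragment info lengths M[l] and parity lengths P[l].
M = [4, 3, 3]   # information bits per section
P = [0, 3, 4]   # parity bits per section (the first section carries none)

rng = random.Random(0)

def make_coeffs():
    """Random binary coefficient rows: coeffs[l][i] selects which previously
    seen information bits participate in parity bit i of section l."""
    coeffs = []
    for l in range(len(M)):
        n_prev = sum(M[:l])                     # info bits in earlier sections
        coeffs.append([[rng.randint(0, 1) for _ in range(n_prev)]
                       for _ in range(P[l])])
    return coeffs

def outer_encode(frags, coeffs):
    """frags: list of bit-lists, frags[l] has length M[l]."""
    seen = []       # information bits observed so far
    out = []
    for l, frag in enumerate(frags):
        parity = [sum(c * b for c, b in zip(row, seen)) % 2
                  for row in coeffs[l]]          # random linear combos mod 2
        out.append(frag + parity)
        seen += frag
    return out

coeffs = make_coeffs()
msg = [[1, 0, 1, 1], [0, 1, 0], [1, 1, 0]]
codeword = outer_encode(msg, coeffs)
# Each outer-encoded section is (information bits, parity bits).
```

Section lengths in this toy run are 4, 6, and 7 bits, matching the profile above; only the choice of coefficient matrices is random.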
At this point, it may be instructive to explain our preference for the tree code over possible alternatives such as LDPC codes, polar codes, and BCH codes. The main purpose of the tree code is disambiguation, as opposed to error correction. It offers a principled method to trade off performance and complexity, as detailed in [5]. Furthermore, it provides optimal asymptotic performance, as shown in [6]. It remains unclear how the aforementioned alternative coding techniques would work in this context. This situation acts as a strong motivation for the development of the tree code in [5] and for its adoption in [4,6,21], as well as within this manuscript.
After the outer-encoding process is complete, user j inner encodes each fragment individually and concatenates the encoded fragments to form its signal. Each user then simultaneously transmits its signal to the base station over a GMAC. The received signal at the base station is therefore the superposition of all transmitted signals plus a vector of Gaussian noise with independent standard normal components; the parameter d accounts for the transmit power of each user.
Recall that the receiver is tasked with producing an unordered list of all transmitted messages. A naive method is to have the inner and outer decoders operate independently of each other. That is, the inner decoder is first run independently on each of the L sections of the received signal. Since the inner code has the property that, given a linear combination of its codewords, the constituent input signals may be recovered with high probability, the aggregate signal in every slot can be expanded into a list of K encoded fragments. It may be helpful to remind the reader that a recovered fragment does not necessarily correspond to the message sent by any particular user, because the receiver has no way of connecting a received message to an active user within URA. Therefore, at this stage, the receiver has L unordered lists, each with K outer-encoded fragments. From these lists, the receiver wishes to recover the K messages sent by the active devices during the frame. This is performed by running the tree decoder on the L lists to find parity-consistent paths across lists. Specifically, the tree decoder first selects a root fragment from the first list and computes the corresponding parity section for the second list. The tree decoder then branches out to all fragments in the second list whose parity sections match; each match creates a parity-consistent partial path. This process repeats until the last list is processed. At this point, if there is a single parity-consistent path from the root to the final stage, the message created by that path is deemed valid and stored for further processing. On the other hand, if there are multiple parity-consistent paths from a given root fragment, or no parity-consistent paths at all, a decoding failure is declared. Figure 2 illustrates this scheme.
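The naive stitching procedure can be sketched as follows; the fragment format, coefficient rows, and toy lists below are hypothetical, but the logic mirrors the description above: extend each partial path only through parity-consistent fragments, and accept a root only when exactly one full path survives:

```python
# Illustrative sketch of the naive tree decoder. Fragments in list l are
# (info_bits, parity_bits) pairs, and the parity convention (random linear
# combinations of all previously seen information bits, mod 2) mirrors the
# outer code of Section 2. The coefficient rows in `coeffs` are hypothetical.

def parity_of(prefix_info, l, coeffs):
    """Expected parity for section l given the info bits along a partial path."""
    return [sum(c * b for c, b in zip(row, prefix_info)) % 2
            for row in coeffs[l]]

def tree_decode(lists, coeffs):
    """Stitch fragments across the L lists; a root is decoded only when
    exactly one parity-consistent path survives to the last list."""
    decoded = []
    for root_info, _ in lists[0]:
        paths = [root_info]
        for l in range(1, len(lists)):
            paths = [p + info
                     for p in paths
                     for info, par in lists[l]
                     if par == parity_of(p, l, coeffs)]  # keep consistent paths
        if len(paths) == 1:                              # unique survivor
            decoded.append(paths[0])
    return decoded

# Toy example with K = 2 users and L = 3 sections of 2 info bits each.
coeffs = [[],                                  # section 1 carries no parity
          [[1, 0]],                            # 1 parity bit over 2 info bits
          [[0, 0, 1, 0], [1, 0, 0, 0]]]        # 2 parity bits over 4 info bits
lists = [[([1, 0], []),     ([0, 1], [])],
         [([1, 1], [1]),    ([0, 0], [0])],
         [([0, 1], [1, 1]), ([1, 0], [0, 0])]]
messages = tree_decode(lists, coeffs)          # one message per user
```

Running this on the toy lists recovers one message per root; a root with zero or multiple surviving paths would instead be reported as a decoding failure.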
While intuitive, this strategy is sub-optimal because information is not shared between the inner and outer decoders. If the inner and outer decoders were to operate concurrently, the output of the outer decoder could be used to reduce the search space of the inner decoder, thus guiding the convergence of the inner decoder to a parity-consistent solution. This would also reduce the search space of the inner code, thus providing an avenue for reducing decoding complexity [28,29]. Explicitly, assume that immediately after the inner decoder produces the list for stage ℓ, the outer decoder finds all parity-consistent partial paths from the root node to stage ℓ. Each of these parity-consistent partial paths has an associated parity section for the next stage. Furthermore, it is known that only those fragments in the next list that carry one of the admissible parity sections may be part of the K transmitted messages. Thus, when producing the next list, the search space of the inner decoder may be reduced drastically to only the subset of candidates whose fragments contain an admissible parity section.
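The pruning step itself can be sketched as computing, from the current set of parity-consistent partial paths, the set of admissible parity patterns for the next section (toy data and parity conventions are hypothetical placeholders):

```python
# Sketch of the pruning step: after the inner decoder has produced lists
# 1 through l, enumerate the parity-consistent partial paths and collect the
# parity patterns admissible in section l+1; the inner decoder's search space
# may then be restricted to fragments carrying one of these patterns.

def parity_of(prefix_info, l, coeffs):
    return tuple(sum(c * b for c, b in zip(row, prefix_info)) % 2
                 for row in coeffs[l])

def admissible_parities(lists, coeffs, nxt):
    """Parity patterns that section `nxt` fragments must carry in order to
    extend at least one parity-consistent partial path."""
    prefixes = [info for info, _ in lists[0]]
    for l in range(1, nxt):
        prefixes = [p + info
                    for p in prefixes
                    for info, par in lists[l]
                    if tuple(par) == parity_of(p, l, coeffs)]
    return {parity_of(p, nxt, coeffs) for p in prefixes}

# Hypothetical coefficient rows and the first two decoded lists.
coeffs = [[], [[1, 0]], [[0, 0, 1, 0], [1, 0, 0, 0]]]
lists = [[([1, 0], []), ([0, 1], [])],          # decoded section 1
         [([1, 1], (1,)), ([0, 0], (0,))]]      # decoded section 2
allowed = admissible_parities(lists, coeffs, 2) # patterns allowed in section 3
```

In this toy run, only two of the four possible two-bit parity patterns are admissible, so roughly half of the candidate fragments for section 3 can be discarded before the inner decoder is invoked.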
This algorithmic enhancement is a key contribution; it has the potential to simultaneously reduce decoding complexity and improve PUPE performance. Still, a precise characterization of the benefits of this enhanced algorithm depends on the inner code chosen. We now consider two situations in which this algorithm may be applied: coded compressed sensing (CCS) [5] and CCS for massive MIMO [21]. For each of the considered schemes, complexity reduction and performance improvements are quantified. We emphasize that this algorithmic enhancement is applicable to other scenarios beyond those considered in this paper. One such example is the CHIRRUP scheme presented by Calderbank and Thompson in [4]. This latter scheme uses an efficient CS solver based on a second-order Reed–Muller code concatenated with a tree outer code as the foundation for a divide-and-conquer approach. The decoding process is performed by using a depth-first search within inner blocks, followed by stitching. The depth-first search of later blocks could potentially be informed by the type of search-space pruning described herein. Details are omitted due to space considerations.
3. Case Study 1: Coded Compressed Sensing
CCS has emerged as a practical scheme for URA that offers good performance with low complexity [5,6,7,8,9]. We note briefly that some recent variants of CCS that employ an LDPC outer code [7,8,9] are not compatible with the model presented in this paper. Thus, we focus on the original version of the algorithm presented by Amalladinne et al. in [5]. At its core, CCS seeks to exploit a connection between URA and compressed sensing (CS). This connection may be understood by transforming a B-bit message into a length-$2^B$ index vector whose single non-zero entry is a one at the location obtained by interpreting the binary message as a radix-10 integer. This mapping defines a bijection between messages and index vectors. An index vector may then be compressed into a short signal using an appropriately chosen CS sensing matrix. We note that there is some flexibility in the selection of this matrix: Gaussian, Rademacher, and sub-sampled Hadamard/DFT matrices are suitable choices. Additionally, LDPC-based or other structured sensing matrices may be employed instead. Regardless of the choice, the columns of the sensing matrix must be normalized to a common squared norm d in order to satisfy URA power constraints.
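To illustrate, the bijection between bit strings and one-sparse index vectors, and the fact that compression reduces to selecting a column of the sensing matrix, can be sketched in a few lines of Python (the ±1 matrix and its dimensions are hypothetical stand-ins for a properly normalized Rademacher matrix):

```python
import random

def bits_to_index(bits):
    """Map a bit string to the location of the lone one in a length-2^B
    index vector (the bits read as an unsigned integer)."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | b
    return idx

def index_to_bits(idx, width):
    """Inverse map: recover the bit string from the non-zero location."""
    return [(idx >> (width - 1 - i)) & 1 for i in range(width)]

B = 4
msg = [1, 0, 1, 1]
i = bits_to_index(msg)                 # the one-sparse vector has a 1 at entry i
assert index_to_bits(i, B) == msg      # the map is invertible

# Compressing the one-sparse index vector with a random +-1 sensing matrix
# amounts to picking out column i of the matrix (normalization omitted).
rng = random.Random(1)
n = 6                                  # channel uses per section (toy value)
A = [[rng.choice((-1, 1)) for _ in range(2 ** B)] for _ in range(n)]
x = [row[i] for row in A]              # A @ e_i = i-th column of A
```

Because the input is one-sparse, no matrix multiplication is needed at the encoder; this observation is what makes CS encoding trivially cheap for each device.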
Each device then transmits its signal over the noisy multiple access channel, which naturally adds the sent signals together. At the receiver, the original index vectors may be recovered from the aggregate signal using standard CS recovery techniques such as non-negative least squares (NNLS) or the least absolute shrinkage and selection operator (LASSO). However, for messages of even modest lengths, the width of the sensing matrix is too large for standard CS solvers to handle. In order to circumvent this challenge, a divide-and-conquer approach can be employed.
In CCS, the inner code consists of the CS encoder, and the outer tree code is identical to that presented in Section 2. Note that there is an additional step between the outer and inner encoders: the outer-encoded message is transformed into the inner-code input via the bijection described above. Furthermore, we emphasize that the inner code has the property that, given a linear combination of K codewords, the corresponding set of K one-sparse constituent inputs may be recovered with high probability. This, combined with the assumption that no two active users share an identical fragment in any section, makes CCS an eligible candidate for the enhanced decoding algorithm described previously. We review CCS encoding and decoding operations below.
3.1. CCS Encoding
When user j wishes to transmit a message to the central base station, it encodes its message in the following manner. First, it breaks its B-bit message into L fragments and outer-encodes the L fragments using the tree code described in Section 2; this yields the outer codeword. Recall that fragment ℓ contains both information bits and parity bits. We emphasize that the total fragment length is constant across sections in CCS, but the ratio of information bits to parity bits is subject to change. Each outer-encoded fragment is then converted into an index vector via the bijection described previously and compressed using the sensing matrix. Within the next transmission frame, user j sends its encoded fragments across the GMAC along with all other active users. At the base station, the received vector associated with slot ℓ is the sum of the compressed fragments transmitted during that slot plus a vector of Gaussian noise with standard normal components; as before, d reflects the transmit power. This is a canonical instance of a K-sparse vector compressed and embedded in Gaussian noise.
3.2. CCS Decoding
CCS decoding begins by running a standard CS solver, such as NNLS or LASSO, on each section to produce L K-sparse vectors. The K indices recovered in each of the L slots are converted back to binary representations via the inverse of the index bijection, and the tree decoder is run on the resultant L lists to produce estimates of the transmitted messages.
This process may be improved by applying the proposed enhanced decoding algorithm, which proceeds as follows for CCS. The inner CS solver first recovers section 1 and then computes the set of possible parity patterns for section 2. The columns of the sensing matrix are then pruned dynamically to remove all columns associated with inadmissible parity patterns in section 2, thereby reducing the number of columns accordingly [28]. Section 2 is then recovered, and the process repeats itself until section L has been decoded. At this point, valid paths through the L lists are identified, and the list of estimated transmitted messages is finalized. Figure 3 illustrates this process.
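In a software implementation, this pruning is simple index bookkeeping: columns of the sensing matrix are labeled by their (information, parity) bit patterns, and only columns whose parity half lies in the admissible set are retained. A minimal sketch with hypothetical section sizes and an assumed admissible set:

```python
from itertools import product

m, l = 3, 2                  # toy section profile: 3 info bits, 2 parity bits
columns = list(product((0, 1), repeat=m + l))  # one column label per pattern

# Admissible parity patterns, as supplied by the partial tree decoder
# (hypothetical values for illustration).
admissible = {(0, 1), (1, 1)}

# Keep only columns whose trailing parity bits form an admissible pattern;
# the width drops from 2^(m+l) to 2^m * |admissible|.
pruned = [c for c in columns if c[m:] in admissible]
```

Here the width falls from 32 to 16 columns; in realistic parameter regimes, where the admissible set is a tiny fraction of all parity patterns, the reduction is far more dramatic.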
3.3. Results
As previously mentioned, the algorithmic enhancement presented in this article has the potential to improve both the performance and computational complexity of certain concatenated coding schemes. Before quantifying the gains obtained by applying this algorithmic enhancement to CCS, we define appropriate measures of performance and computational complexity. Being a URA scheme, the performance of CCS is evaluated with respect to the per-user probability of error (PUPE), which is given by the following:
$$P_e = \frac{1}{K} \sum_{j=1}^{K} \Pr\left( w_j \notin \widehat{W} \right),$$
where $w_j$ is the message transmitted by user j and $\widehat{W}$ is the estimated list of transmitted messages, with at most K items. Since many different CS solvers with varying computational complexities may be employed within the CCS framework, the complexity reduction offered by the enhanced decoding algorithm is quantified by counting the number of columns removed from the sensing matrix.
The column pruning operation has at least four major implications for the performance and complexity of CCS. These implications are summarized below.
- 1. Many CS solvers rely on iterative methods or convex optimization solvers to recover the sparse vector from its compressed measurement. Decreasing the width of the sensing matrix will result in a reduction in computational complexity, the exact size of which will depend on the CS solver employed.
- 2. When all message fragments have been correctly recovered for the previous stages, the matrix is pruned in such a way that it remains perfectly consistent with the true signal. In this scenario, the search space for the CS solver is significantly reduced and the performance will improve.
- 3. When an erroneous message fragment has been incorrectly identified as a true message fragment by stage ℓ, the column pruning operation will guide the CS solver to a list of fragments that is more likely to contain additional erroneous fragments. This further propagates the error and helps erroneous paths stay alive longer.
- 4. When a true fragment is mistakenly removed from a CS list, its associated parity pattern may be discarded and disappear entirely. This results in the loss of a correct message and introduces additional structured noise, which may degrade the PUPE performance of other valid messages.
Despite having both positive and negative aspects, the net effect of the enhanced decoding algorithm on the system's PUPE performance is positive, as illustrated in Figure 4. This figure was generated by simulating a CCS scenario in which each active user wishes to transmit a short message divided into multiple stages over a fixed number of channel uses; NNLS was used as the CS solver. From Figure 4, we observe that the enhanced decoding algorithm reduces the required $E_b/N_0$ by nearly 1 dB for a low number of users. Furthermore, over the entire range of numbers of users considered, the enhanced algorithm is at least as good as the original algorithm and often much better.
By tracking the expected number of parity-consistent partial paths, it may be possible to compute the expected column reduction ratio at every stage. However, this is a daunting task, as explained in [5]. Instead, we estimate the expected column reduction ratio by applying the approximate analysis found in [5], which relies on the following simplifying assumptions:
- No two users have the exact same message fragment at any stage.
- The inner CS decoder makes no mistakes in producing its lists.
Under these assumptions and starting from a designated root node, the number of erroneous paths that survive stage ℓ, denoted $L_\ell$, is subject to the following recursion:
$$\mathbb{E}[L_\ell] = \frac{K\left(1 + \mathbb{E}[L_{\ell-1}]\right) - 1}{2^{l_\ell}},$$
since every erroneous extension of a surviving partial path passes the $l_\ell$-bit parity check with probability $2^{-l_\ell}$. Using the initial condition $L_1 = 0$, we obtain the following expected value:
$$\mathbb{E}[L_\ell] = \sum_{j=2}^{\ell} \frac{(K-1)\,K^{\ell-j}}{2^{\sum_{i=j}^{\ell} l_i}}.$$
When the sensing matrix is pruned dynamically, K copies of the tree decoder run in parallel and, as such, the expected number of parity-consistent partial paths at stage ℓ can be expressed as follows:
$$P_\ell = K\left(1 + \mathbb{E}[L_\ell]\right).$$
Under the further simplifying assumptions that all parity patterns are independent and that $P_\ell$ concentrates around its mean, we can approximate the number of admissible parity patterns. The probability that a particular path maps to a specific parity pattern is $2^{-l_{\ell+1}}$; hence, the probability that this pattern is not selected by any path becomes $\left(1 - 2^{-l_{\ell+1}}\right)^{P_\ell}$. Taking the complement of this event and multiplying by the number of parity patterns, we obtain an approximate expression for the mean number of admissible patterns:
$$\mathbb{E}\left[A_{\ell+1}\right] \approx 2^{l_{\ell+1}}\left(1 - \left(1 - 2^{-l_{\ell+1}}\right)^{P_\ell}\right).$$
Thus, the expected column reduction ratio at slot ℓ, denoted $\rho_\ell$, is provided by the following:
$$\rho_\ell = \frac{\mathbb{E}[A_\ell]}{2^{l_\ell}}.$$
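This approximate analysis is easy to script. The following sketch (with hypothetical K and parity-length profile) assumes the recursion described above, in which every erroneous extension of a surviving partial path passes an $l_\ell$-bit parity check independently with probability $2^{-l_\ell}$:

```python
def expected_column_retention(K, parity_lengths):
    """Approximate fraction of parity patterns (and hence sensing-matrix
    columns) expected to survive pruning at each stage, under the
    simplifying assumptions that the inner decoder makes no mistakes and
    that parity patterns behave independently."""
    ratios = []
    EL = 0.0                                   # expected erroneous paths so far
    for l in range(1, len(parity_lengths)):
        paths = K * (1 + EL)                   # parity-consistent partial paths
        patterns = 2 ** parity_lengths[l]      # possible parity patterns
        # Expected number of admissible patterns in section l.
        admissible = patterns * (1 - (1 - 1.0 / patterns) ** paths)
        ratios.append(admissible / patterns)
        # Recursion: every erroneous extension survives the parity check
        # with probability 1/patterns.
        EL = (K * (1 + EL) - 1) / patterns
    return ratios

# Hypothetical parity-length profile; section 1 carries no parity bits.
ratios = expected_column_retention(K=25, parity_lengths=[0, 8, 10, 12])
```

With this toy profile, the retained fraction shrinks stage by stage, which matches the intuition that longer parity sections admit fewer surviving patterns.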
Figure 5 shows the estimated versus simulated column reduction ratio across stages. Overall, the number of columns in the sensing matrix can be reduced drastically for some stages, thus significantly lowering the complexity of the decoding algorithm.
5. Conclusions
In this article, a framework for a concatenated code architecture consisting of a structured inner code and an outer tree code was presented. This framework was specifically designed for URA applications but may find applications in other fields as well. An enhanced decoding algorithm was proposed for this framework that has the potential to simultaneously improve performance and decrease computational complexity. This enhanced decoding algorithm was applied to two URA schemes: coded compressed sensing (CCS) and CCS for massive MIMO. In both cases, PUPE performance gains were observed, and decoding complexity was significantly reduced.
The proposed algorithm is a natural extension of the existing literature. From coding theory, we know that there are at least three methods for inner and outer codes to interact. First, the two codes may operate completely independently of one another in a Forney-style concatenated fashion; this is the style of the original CCS decoder presented in [5]. Secondly, information messages may be passed between inner and outer decoders as both decoders converge to the correct codeword; this is the style of CCS-AMP, which was proposed by Amalladinne et al. in [7]. Finally, a successive cancellation decoder may be employed in the spirit of coded decision feedback; this is the style highlighted in this article and considered in [28,29]. Thus, the dynamic pruning introduced in this paper can be framed as an application of coding-theoretic ideas to a concatenated coding structure that is common within URA.
While the examples presented in this article pertain to CCS, we emphasize that dynamic pruning may be applicable to many algorithms beyond CCS. For instance, this approach may be relevant to support recovery in exceedingly large dimensions, where a divide-and-conquer approach is needed. As long as the inner and outer codes subscribe to the structure described in Section 2, this algorithmic enhancement can be leveraged to obtain performance and/or complexity improvements.