1. Introduction
In recent years, there has been a strengthening link between advancement in computational technologies and components of physical systems. The so-called Cyber-Physical System (CPS) consists of a set of modules that interact with each other and communicate with the outside world. Combining computational and communication aspects with control techniques in one system becomes a challenge. Cyber-Physical Systems applications can be found in almost all areas of human life, such as production systems, intelligent networks, robotics, transport systems, medical devices, military systems, home networks, intelligent buildings, etc. In many CPS applications, digital communication systems play a key role, as they are often integrated with the executive system. During communication, data may be disturbed by various unwanted signals, noise and/or interferences. Error Correction Coding (ECC) techniques can not only detect communication errors, but also reconstruct valid data. Low-Density Parity-Check (LDPC) codes are one of the best known codes with very good correction capabilities.
LDPC codes were firstly presented in 1962 by R. G. Gallager [
1], but the technology of that time did not allow practical applications. Research and development of LDPC codes was resumed in 1999 after an article by D. J. C. MacKay [
2]. Today, LDPC codes have been already used in various applications, the DVB-S2 and DVB-T2 (digital television) standards [
3], 10-Gbase-T Ethernet networks [
4], ITU-T G.hn [
5], 802.11ad (WiGig) [
6], 802.11n/ac/ax (WiFi) and 802.16e (WiMAX) [
7]. In 2015, the first information about applications of LDPC codes in 5G networks was revealed [
8].
An LDPC code is defined by its parity check matrix, which is a sparse matrix, or equivalently, by the corresponding bipartite factor graph (known as a Tanner graph [
9]). The graph structure has a direct relationship with the structure of the LDPC encoder and decoder. Particularly important for implementation reasons are the Quasi-Cyclic (QC) codes [
10], for which parity check matrix is an array of square submatrices, with every submatrix being a cyclic permutation of an identity matrix. These types of arrays allow for efficient hardware implementation of the parallel and semi-parallel QC-LDPC decoder [
11].
The research activities in the area of LDPC coding techniques are still prevalent and they include, among others, the methodical construction of LDPC codes [
10,
12,
13], decoding algorithm design [
11,
14,
15] and efficient decoder implementation [
4,
16,
17,
18]. The implementation issues typically constraint the design to some class of implementation oriented LDPC codes, which most commonly belong to the mentioned QC-LDPC class.
LDPC codes are decoded iteratively using the sum-product algorithm, also known as belief propagation [
2], which closely approximates maximum-likelihood decoding. The commonly used message passing schedule is a two-phase message passing scheme [
11], where a decoding iteration is divided into two rounds of computations, corresponding to variable and check nodes of the code graph respectively. There are also known other schedules, for example the layered decoding scheme [
19].
Every hardware LDPC decoder belongs to one of the categories: serial, semi-parallel or parallel, which means the decoder computation units execute the decoding algorithm in a serial, semi-parallel or parallel manner, respectively. The serial decoder has the lowest throughput, but also the lowest hardware utilization. The parallel decoder has the highest throughput and hardware requirements. A semi-parallel decoder is a compromise solution that is the best choice in most applications. The results of the research presented in this paper concern a semi-parallel decoder architecture with the two-phase message computation schedule. The presented decoder solution can be adapted to any QC-LDPC code. There exist a broad range of known implementations, most of them surveyed in [
16], many of them aimed at ASIC implementation [
16,
18], some of them for implementation in software [
17]. The research presented in this paper is devoted specifically to the implementation in a Field-Programmable Gate Array (FPGA) chip. In the designed decoder, an important parts of the computing units are fitted directly to the Look-Up-Table (LUT) fabric of FPGAs, giving a slightly improved decoding performance and decreased resources requirements.
An important features of the LDPC decoder is the implemented method of calculating messages in computing units, which is essentially independent of the chosen architecture. Computationally efficient message computation algorithms are known as Min-Sum (MS) and Normalized Min-Sum (NMS) [
14,
15]. The NMS algorithm differs from the Min-Sum algorithm by the additional normalization stage, which slightly reduces the magnitude of the iteratively approximated beliefs, which has a known effect of improved final decoding results.
The main scope of this paper is a presentation of an irregular QC-LDPC decoder implementation, which is oriented specifically to a LUT based structure of an FPGA programmable chip. The essence of the contribution of this paper is technology mapping approach, in which the normalization in the NMS algorithm is directly oriented to the LUT-based architecture. Moreover, we present the decoder design and resulting system solutions, aimed at an efficient implementation of the LDPC decoder inside a LUT-based FPGAs.
This paper is organized as follows. The second chapter presents the theoretical background of LDPC decoding and implementation of QC-LDPC decoder. Next, new concepts of technology mapping of QC-LDPC decoder oriented to FPGA are proposed. The implementation of distributed control unit of QC-LDPC decoder and the original implementation of normalization module which is oriented to LUT-based architecture are presented.
Section 4 illustrates experimental results of proposed solutions. The obtained results were compared with solutions known from the literature. The article ends with a summary, which contains directions for future work.
2. Theoretical Background of Hardware Implementation of a Semi-Parallel QC-LDPC Decoder
An LDPC code is defined by a parity check matrix
of size
. An example matrix
of a QC type code is shown in
Figure 1 in the form of an array of its
entries. Matrix
is a QC matrix, since it consists of circulant submatrices of size
. A submatrix will in short be called the matrix
. Parity check matrix
of QC-LDPC code can be presented in the form of an array of submatrices, where “X” corresponds to the all-zero submatrix and a numerical value
s corresponds to a an identity matrix circularly shifted by
s positions to the right (i.e., columns are cyclically shifted by the indicated number). For QC-LDPC codes, storing the
structure in the decoder (RAM/ROM) memory requires only storing the array of cyclic shift values. The implementation of the presented decoder takes advantage of this simplified representation.
An example parity check matrix of QC-LDPC code, constructed for this research with a method presented in [
20], is shown in
Figure 2. The size of the submatrix is
in this case, the size of the matrix
is
and the code rate is
. The minimum number of non-zero elements in rows is 6 and the maximum number is 7. For columns these are respectively 2 and 7. It means this is an irregular code with the maximum column weight of 7.
LDPC decoding algorithms with complexity scaling as
belong to the class of iterative belief propagation (BP) algorithms, where beliefs are propagated between nodes represented as a Tanner graph. This graph is a bipartite graph with control vertices representing rows of
, bit vertices representing columns of
and edges representing non-zero positions in
. The commonly used method for determining the beliefs (messages propagated in the algorithm) are Min-Sum and NMS [
15]. The decoding process utilizing Min-Sum function can be presented in consecutive steps as follows:
Initialization: Assigning input values that are Log-Likelihood Ratio (LLR). For every
and
:
Control nodes: Determination of minimum values. For every
and
:
Bit nodes: Adding minimum values and LLR input values. For every
and
:
Pseudo-posteriori probabilities: Determination of pseudo-posteriori probabilities. For every
:
Hard decisions: Making trial, hard decisions. For every
:
Verification of control equations:
Another iteration: If the control equations have been met, decoding is terminated with
as an outcome. Otherwise, the next iteration of decoding begins, starting from the (
2) control nodes, unless the iteration limit is exceeded.
where:
meaning is “becomes”,
—code vector,
—received vector,
—credibility LLR value from the n-th bit vertex to the m-th Tanner graph control vertex,
—LLR a priori probabilities for the n-th bit,
is a set of integers between 1 and N,
—a set of column indexes in the parity check matrix containing one in the m-th row,
—a set of row indexes in the parity check matrix containing one in the n-th column,
—message from the m-th control vertex to the n-th bit vertex of Tanner graph,
—decoded vector.
It is known [
21] that significantly improved decoding performance can be obtained by adding a normalizing parameter to Equation (
2). Equation (
2) will then take the form (
7), where
is a normalizing parameter, or equivalently form (
8), where
and
.
4. Experimental Results
A special test environment was developed for the purpose of experimental research, the block diagram of which is presented in
Figure 11. The environment consists of three key elements: computer (PC), microcontroller (eval-board STM32F4DICOVERY) and FPGA system (eval-board with Cyclone V device). The PC computer runs a system simulation software developed in Python, modeling random data generator, LDPC encoder, Binary Phase Shift Keying (BPSK) modulator/demodulator, AWGN (Additive White Gaussian Noise) channel and LDPC decoder. The encoded and modulated data is disturbed using the AWGN channel model, with noise level according to the variable Signal-to-Noise Ratio (SNR). The erroneous data is then corrected (decoded) by LDPC decoding software. Such a testing system allows verification of the correction properties of the LDPC code with given parity check matrix
. The results of the simulation the form of Bit Error Ratio (BER) vs SNR in an AWGN channel can be used as a reference chart for hardware implementations of the LDPC decoder.
The developed computer software is also responsible for generating description of QC-LDPC decoders in a Hardware Description Language (Verilog) for the FPGA chip. The generator as input parameters needs the parity check matrix , specified by matrix size , submatrix size P, location of nonzero submatrices and corresponding cyclical shift values. The generator creates all files (in Verilog) that are necessary to build a project in a Quartus environment. Additional parameters for the generator are:
FPGA type and family (e.g., Cyclone V, 5CSEBA6U23I7);
bit-resolution of decoder input—a priori LLRs (e.g., 4-bits);
maximum number of decoder iterations (e.g., 50);
type of algorithm used (e.g., Min-Sum or Normalized Min-Sum with chosen normalization).
A set of test data consisting of coded vectors (without interference) and distorted data vectors can be generated by the developed software, which allows verification of the QC-LDPC decoder FPGA implementation. A graphical interface of the whole environment was created with C#, while the microcontroller software was written in the C language. The aim of the microcontroller is to distribute the data, the clock and other control signals to the FPGA.
Making use of the environment, we can easily compare different constructions and configurations of the hardware decoder, reporting the hardware utilization as well as experimentally obtained error correction performance curves.
Table 1 shows the hardware requirements, obtained as a result of logic synthesis of a few selected configurations of the QC-LDPC decoder. It can be observed that as the data bus size increases, the hardware utilization for ALM units and memory bits also increases. Meanwhile, the normalization choice, including the proposed non-linear variant (“Proposal”), has no impact on the decoder hardware utilization. Therefore, the Proposal can be applied without any implementation overhead.
Figure 12 shows the observed error correction performance of the QC-LDPC decoder implemented in the Cyclone V (FPGA), illustrating dependence on the message (belief) precision. The Bit Error Rate (BER) and Frame Error Rate (FER) curves for the 3-bit data reflect very poor correction performance for the presented SNR range. Meanwhile, the 4-bit and 5-bit precision decoders correct errors with effectiveness increasing with SNR. Differences in BER and FER curves between 4- and 5-bit cases depend somewhat on the SNR, but they are not very significant. Therefore, in general a 4-bit bus is recommended, because of its better correction properties than the 3-bit solution and less hardware utilization in relation to the 5-bit solution. All tests were performed for the parity check matrix
shown in
Figure 2 and the maximum number of iterations of 15.
Figure 12 as a reference provides the corresponding BER and FER curves obtained from computer a simulation with the floating-point precision BP decoding.
The next simulation results provide an experimental insight into the normalization method of the NMS algorithm.
Figure 13 presents the BER and FER results in the examined SNR range for the Min-Sum algorithm (
) and a range of variants of the NMS algorithm, with the recommended 4-bit precision. It can be observed that in general, the Min-Sum algorithm has performance inferior to the NMS algorithm with optimized
, with
being the optimal value in this case. Meanwhile, Min-Sum outperforms NMS with some other values of
.
However, it can be also observed that the use of the “Proposed” solution gives correction performance even better than NMS with the best . The proposed nonlinear normalization method makes it possible to achieve to achieve significantly improved correction performance than the optimized NMS. Meanwhile, it can be well fitted into FPGA resources, with nonlinear normalization still implemented in a single LUTs. Therefore, we achieved a modified NMS algorithm implementation, well suited for implementation in FPGAs without any hardware overhead, but with an improved correction performance.
5. Conclusions
The article presents a hardware (FPGA) implementation of the irregular QC-LDPC decoder. A few variants of decoding algorithm and precisions have been investigated. It is shown that it is beneficial to use a 4-bit precision, characterized by a better correction performance in comparison with the 3-bit precision, and hardware resources savings in comparison with 5-bit precision. The use of 4-bit buses does not lead to a significant deterioration in the correction performance in comparison with the decoder with 5-bit buses.
The novel idea presented in this article is a method of implementing the normalization of the NMS algorithm, enabling effective technological mapping into FPGA structures. The proposed method of nonlinear normalization with a selected logic function makes it possible to obtain a better correction performance in comparison to standard, well-known NMS solutions. A comparison of the hardware resources used for the different decoder variants indicates a slight increase in the number of ALM units used for the NMS algorithm and the proposed algorithm in comparison with the Min-Sum algorithm.
Particularly noteworthy is the interdisciplinary nature of the work presented. The use of the capabilities of a personal computer, a microcontroller and an FPGA system perfectly fits the idea of Cyber-Physical Systems, which combine issues in the fields of electronics, computer science and telecommunications. The developed test environment not only allows for automating many activities (e.g., generating new decoders, carry out tests) but also significantly expands the research capabilities. The use of a distributed control system allows for significant simplification of the construction of the QC-LDPC decoder as well as the control system itself.
Future work will focus on optimizing the developed systems in terms of energy consumption. The idea of a distributed control system provides many new possibilities, including an introduction of Clock Gating for different parts of the decoder, which will also be a further direction of work. It seems that it is possible to develop a decoder which, while ensuring the assumed correction parameters, can become more energy-efficient. The essence of these ideas is to reduce the dynamic power consumption by using Clock Gating methods in a way that best suits QC-LDPC decoders.