Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders

Mejmaa, Bilal; Aqil, Chakir; Akharraz, Ismail; Ahaitouf, Abdelaziz

doi:10.3390/info15110684

Open AccessArticle

Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders

¹

Department of Physics, University Sidi Mohamed Ben Abdellah (USMBA), Fez 30000, Morocco

²

Department of Informatics and Mathematical Engineering, University Ibn Zohr (UIZ), Agadir 80000, Morocco

^*

Author to whom correspondence should be addressed.

Information 2024, 15(11), 684; https://doi.org/10.3390/info15110684

Submission received: 17 August 2024 / Revised: 25 October 2024 / Accepted: 29 October 2024 / Published: 1 November 2024

(This article belongs to the Special Issue Next-Generation Communication Networks and Systems in Smart Cities: Enhancing Connectivity, Security, and Performance)

Download

Browse Figures

Versions Notes

Abstract

:

We propose two decoding algorithms for quasi-cyclic LDPC codes (QC-LDPC) and implement the more efficient one in this paper. These algorithms depend on the split row for the layered decoding method applied to the Min-Sum (MS) algorithm. We designate the first algorithm “Split-Row Layered Min-Sum” (SRLMS), and the second algorithm “Split-Row Threshold Layered Min-Sum” (SRTLMS). A threshold message passes from one partition to another in SRTLMS, minimizing the gap from the MS and achieving a binary error rate of 3 × 10⁻⁵ with Imax = 4 as the maximum number of iterations, resulting in a decrease of 0.25 dB. The simulation’s findings indicate that the SRTLMS is the most efficient variant decoding algorithm for LDPC codes, thanks to its compromise between performance and complexity. This paper presents the two invented algorithms and a comprehensive study of the co-design and implementation of the SRTLMS algorithm. We executed the implementation on a Xilinx Kintex-7 XC7K160 FPGA, achieving a maximum operating frequency of 101 MHz and a throughput of 606 Mbps.

Keywords:

LDPC codes; min-sum; split row; layered decoding; co-design; FPGA implementation

1. Introduction

Low-density parity-check (LDPC) codes are linear block codes with a sparse check matrix. These codes were initiated by Gallagher [1] in 1962 and rediscovered in 1996 by MacKay and Neal [2]. LDPC codes have been extensively included in standards since. Their error performance is close to the Shannon limit for long code lengths [3]. They function well, are selected as error-correcting codes for fifth-generation applications [4,5], and are employed in second-generation digital satellite video transmission for satellite communications [6]. Moreover, the third-generation partnership project (3GPP) has standardized them for 5G new radio (NR) [6,7].

The bit-flipping (BF) algorithm and its variants [7] or the sum-product (SP) algorithm [2] and its variants [8] are typically used to decode LDPC codes. Despite its good performance, the SP algorithm is characterized by a calculating loop. On the other hand, while the BF method has fewer calculations overall, its performance is reduced. Every SP and BF version aims to balance performance and complexity. Weighted Bit Flipping (WBF) [9], Modified Weighted Bit Flipping (MWBF) [10], and Improvement MWBF (IMWBF) [11], as well as Multi-Stage Bit-Flipping Decoding (MSBFD) [12], Cyclic Switching Weighted Bit-Flipping Decoding (CSWBFD) [13], and A New Reliability Ratio WBF (NRRWBF) [14], are found for BF in this context. For the SP, we can cite the following methods: Normalized Min-Sum [15], Relaxed Min-Sum (RMS) [16], Offset Min-Sum (OMS) [17], and Min-Sum (MS) [18]. Since MS is the best option among the offered algorithms, much research is focused on refining it further to lower SP’s complexity. The MS split-row algorithms [19], MS multi split-row threshold [20] for the LDPC Regular and MS split-row codes [21], and MS multi split-row threshold [22] for irregular LDPC codes are among the algorithms that can be found in the literature addressing this topic. These techniques minimize the number of operations needed for decoding by vertically partitioning the parity-check matrix (PCM) into two or more parts. For the same objective, layered scheduling decoders [23,24,25,26] are provided. In this case, the PCM is divided into N horizontal layers. This approach involves processing the decoded data in the first layer, sending the findings to the next, and so on, until the final layer. This process is repeated for every iteration. This method makes it possible to reduce the number of iterations required to complete the decoding process.

In this document, we propose two novel algorithms, “Split-Row Layered Min-Sum (SRLMS)” and “Split-Row Threshold Layered Min-Sum (SRTLMS)”, and then we present the co-design and implementation of the most performed one (SRTLMS). These algorithms integrate the existing approaches to achieve the optimal balance between performance and complexity of the decoder. These algorithms are applied to the quasi-cyclic LDPC codes, which are widely used in communication systems. We introduce the implementation of the SRT circuit, which is a key component of the innovative SRTLMS model that was co-designed and simulated for layered LDPC decoders. The SRTLMS model adheres to the WiMAX standard, which is a state-of-the-art wireless communication protocol. Our implementation boosts a remarkable throughput of 606 MHz, a significant improvement over the existing throughput of WiMAX.

The remainder of the paper is organized as follows: In Section 2, we discuss LDPC codes and their decoding methods. The main contribution of this work is the combination of split-row approaches in layer decoding for quasi-cyclic LDPC codes, as discussed in Section 3 of this article. Section 4 then presents the co-design and implementation findings. The results of the BER simulation and implementation of the SRTLMS are covered in Section 5. In Section 6, we discuss our proposal’s benefits. The conclusion of our work is given in Section 7.

2. Decoding Algorithms for LDPC Codes

The Tanner graph, a bipartite graph with two different types of connected nodes, is one way that LDPC codes are defined. A branch links a check node

m

and a variable node

n

if and only if

H_{m n} = 1

in the matrix M × N sparse parity-check matrix H, where the number of control nodes M corresponds to the collection of parity-check equations, and N is the block length, which represents the number of variable nodes (VNs).

2.1. LDPC Quasi-Cyclic Codes

QC-LDPC codes present an improved class of LDPC codes provided by the structured H matrix, which is produced by expanding an

m_{b}

by

n_{b}

base matrix B. Each non-zero entry from the base matrix is extended to a z × z identity sub-matrix shifted circularly to the right by the value of the entry. The structure of the PCM allows easy determination of the positions of the non-zero elements by the hardware designer. Therefore, several communication standards have adopted QC-LDPC codes, such as 802.11n [27], 802.15.3c [28], and 802.16e [29]. In this article, we employ an 802.16e system QC-LDPC code at a rate of ½ 2304 bits [24,29]. A bigger H matrix is expanded from the base matrix of WiMAX (12 × 24), which is presented in Figure 1. The entries of the base matrix can be expanded as follows:

A blank entry can be expanded as a fully zero 96 × 96 sub-matrix. From a zero entry of the base matrix, a 96 × 96 identity matrix is expanded. Finally, a non-zero entry (δ) is expanded by circularly shifting the 96 × 96 identity matrix to the right by δ.

2.2. Min-Sum Decoding

As noted in the introductory section, utilizing the MS decoding technique is a feasible method to decrease the SP decoding algorithm complexity while minimizing control node processing. Herein, we establish the parameters employed by this approach as follows:

V_{(i)}

= {j:

H_{i j}

= 1} presents the VNs participating in the

i_{t h}

check equation.

C_{(j)}

= {i:

H_{i j}

= 1} is the collection of check nodes (CNs) participating in

j_{t h}

the VN update.

V_{(i)}

\j presents every VN in

V_{(i)}

nonetheless, node

j

.

C_{(j)}

\i presents every CN in

C_{(j)}

nonetheless, node

i

.

λ_{j} = l n (\frac{P (x_{i} = 0 | y_{i})}{P (x_{i} = 1 | y_{i})})

(1)

λ_{j}

is designated as the data extrapolated in light of the received symbol’s (

λ_{i}

) log-likelihood ratio.

The result of row processing (

β_{i j}

) is the message coming from check node

i

toward variable node

j

.

The result of the column processing (

α_{i j}

) is the message received from variable node

i

to check node

j

.

MS decoding is summarized into four steps:

Initialization: $λ_{j}$ is the initialization of $β_{i j}$ to the received symbol’s ( $λ_{j}$ ) log-likelihood ratio for each $i$ and $j$ . The computed messages α and β are exchanged between VNs and CNs through the graph’s edges by the following steps (2–4) during each iteration.
The check node update or row processing: Compute the messages $α_{i j}$ based on β received messages from all other VNs that are connected to CN $C_{i}$ , excluding the β information coming from $V_{j}$ :

$α_{i j} = S \times \prod_{j^{'} \in V_{(i)} \ j} s i g n (β_{i j^{'}}) \times {M i n}_{j^{'} \in V_{(i)} \ j} (|β_{i j^{'}}|)$

(2)
The early product in Equation (2) is the update of the parity (sign), and the reliability (magnitude) update is the second product term. The S parameter denotes a binary bit representing the sign product of all β messages received from the corresponding VNs.
Column processing/VN update: The incoming α messages from all other check nodes besides $C_{i}$ are used to compute the $β_{i j}$ messages.

$β_{i j} = λ_{j} + \sum_{i' \in C_{(j)} \ i} α_{i' j}$

(3)
Inspect for syndromes and terminate early: Following the completion of column processing, each bit in column $j$ is updated based on the addition of channel information ( $λ_{j}$ ) alpha messages from nearby check nodes.

$w_{j} = λ_{j} + \sum_{i' \in C_{(j)}} α_{i' j}$

(4)

An estimated code vector from the revised vector

\hat{x} = \{{\hat{x}}_{1}, {\hat{x}}_{2}, \dots, {\hat{x}}_{N}\}

is calculated by

\hat{x} = \{\begin{matrix} 1, i f w_{i} \leq 0 \\ 0, i f w_{i} > 0 \end{matrix}

(5)

The decoding procedure is finished and the iterative process converges if

H . {\hat{x}}^{T} = 0

. In this case,

{\hat{x}}^{T}

is the transpose of a verified code word. Otherwise, the decoding proceeds as in step 2 until a successful code word estimation is made or the maximum number of iterations

I_{m a x}

has been reached, which terminates the decoding operation.

2.3. Split-Row Threshold and Split-Row Decoding Algorithms

As seen in Figure 2, the traditional message-passing mechanism consists of two phases: an update of the control node and an update of the variable node (a). The control node’s processing is split into two or more almost independent partitions for split-row [19] and split-row threshold [20] decoding. Each processor on a control node concurrently generates a new message using the bare minimum of information from its neighboring partitions. Split-row decoding divides the process of the control node into two blocks, as seen in Figure 2b. For every controlling node processor, only one information bit (sign) should be sent between partitions to improve error performance.

The main disadvantage of the split-row algorithm is that it can result in a decrease in error performance compared to Min-Sum normalized decoders, with losses reaching up to 1 dB [17,22]. The performance decrease depends on how many partitions the control node has because each partition does not know the minimal value of the other partitions and only the sign signal is provided to reduce error coming from inaccurate sign information from the control node’s output. The algorithm of the split-row threshold shown in Figure 2c reduces the error caused by erroneous amplitude by providing a signal,

T h r e s h o l d_e n s p

, that indicates whether or not a division has a minimum below a given threshold (T). This means that all control nodes take the minimum of either T or their local minimum. Consequently, there are few notable deviations from the partitioned true minimum, making multi-split implementations possible.

In the final diagram, the split-row threshold method adds the extra single-bit signal

T h r e s h o l d_e n s p

to the original split-row architecture [22]. This signal essentially inhibits virtual alteration to the split-row algorithm with the addition of minimal wire and additional circuitry. Even though error performance has been greatly improved, the main cause of the complexity of the big connections needed in today’s LDPC decoder implementations is the reduction in communication between the processors of CNs and VNs brought about by split-row architectures [30,31]. Furthermore, the surface area of each processor is reduced since fewer inputs are reaching each control node. Be aware, however, that the VN operation is the same in all of these methods. The logic of the VN processor remains unchanged as a result.

2.4. Layered Decoding Algorithm

The PCM H of size M × N is divided into L horizontal layers using layered decoding (see Figure 3). Because each decoding layer has M/L consecutive rows of H, each VN is only connected to a decoding layer a maximum of once. We designate by

M_{l}

the collection of H rows that follow each other and correspond to layer

l \in \{1, \dots, L\}

.

A message from the current layer is sent vertically to any other unprocessed layers in the same VN during the horizontally tiered decoding phase. During each iteration, the horizontal layers are addressed one at a time, from top to bottom. The LDPC coding matrix H from the 802.16e standard is used in this work. Figure 1 illustrates how a cyclic identity matrix shift results in the formation of a quasi-cyclic matrix. This form lends itself nicely to a horizontal layer decoding, since every row in the base matrix may be viewed as a layer and every submatrix has a column weight of one. The message transmission flow for layered decoding for the selected H matrix is shown in Figure 4.

Algorithm 1 explains MS layered decoding.

Algorithm 1: Layered Min-sum decoding algorithm

Received word

Input : y = (y_{1} . . ., y_{N})

is the channel output alphabet
Estimated codeword

Output : \hat{x} = \{{\hat{x}}_{1}, {\hat{x}}_{2}, . . ., {\hat{x}}_{N}\}

\in {\{0, 1\}}^{N}

Initialization

For all i = 1 . . ., N

do λ_{j} = l n (\frac{P (x_{i} = 0 | y_{i})}{P (x_{i} = 1 | y_{i})})

For all j = 1 . . ., M

and i \in H_{(j)}

do α_{i j} = 0

;
The iteration loop includes the following steps in each decoding iteration.

For all l = 1, . . ., L

do
Variable node processing

For all {j \in M}_{l}

and i \in H_{(j)}

do

β_{i j} = λ_{j} - α_{i j}

Check-node processing

For all {j \in M}_{l}

and i \in H_{(j)}

do

α_{i j} = S \times \prod_{j^{'} \in V_{(i)} \ j} s i g n (β_{i j^{'}}) \times {M i n}_{j^{'} \in V_{(i)} \ j} (|β_{i j^{'}}|)

A posteriori information update

For all {j \in M}_{l}

and i \in H_{(j)}

do

λ_{j} = α_{i j} + β_{i j}

End (looping horizontal layers)
Hard decision

For all i = 1 . . ., N

do x_{i} = \frac{1 - s g n (λ_{j})}{2}

Syndrome check

If H . {\hat{x}}^{T} = 0

then exit the iteration loop
End iteration loop

3. Split-Row Threshold Algorithm for a Layered Decoder

Our proposals, which consist of two algorithms, the split-row algorithm and the split-row threshold for a layered decoder, are covered in this section.

Reducing the complexity of the Min-Sum algorithm can be achieved with these two approaches. The highest signal-to-noise ratio (SNR) gain is achieved by the LDPC decoding technique (MS), although it is still complicated. A proficient decoder strikes a balance between complexity and performance. To achieve this objective, we combined the split-row approach with the layered decoding of the MS algorithm, since these methods (split-row and layered decoder) have demonstrated their efficacy in decreasing computing operations [22] and the needed number of iterations to converge the decoder [25]. First, the matrix H is divided into N horizontal layers using the split-row technique for layered decoding. Subsequently, each layer is split into two sections, wherein the messages

α_{i j}

are calculated and sent from the variable nodes to the check nodes. Moreover, each partition transmits data separately and concurrently utilizing local values, with just a sign message transferring data from one partition to the next; see Figure 5. The idea underlying the second technique, the split-row threshold for a layered decoder, is the same as the first one, with the addition of a threshold message that reports on the comparison between the experimental threshold and each partition’s local minimum. As a result, Figure 6 illustrates how little the local and global minimums differ from one another.

The split-row threshold for a layered decoder algorithm’s decoding procedure is predicated on the threshold technique that split-row offered to compensate for the discrepancy between the partition minimums to improve error performance with minimal additional hardware. Consequently, every partition transmits a

T h r e s h o l d_e n s p_o u t = 1

signal to the subsequent partition notifying it whether its local Min1 is less than a threshold T, and a

T h r e s h o l d_e n s p_o u t = 0

signal in the absence of such information. The algorithm determines if the threshold T exceeds Min1 and Min2, and then the two minimums (Min1, Min2) serve to update the

α_{i j}

values. Moreover, a threshold signal (

T h r e s h o l d_e n s p_o u t = 1

) which moves to the following partition is declared high, proving that the minimums in that partition are less than the threshold.

Alternatively, if the local minimums Min1 and Min2 exceed the threshold T, and the adjacent partition signal (

T h r e s h o l d_e n s p_i n = 1

) is claimed, indicating that the local minimum Min1 is less than threshold T, then the threshold T is used to update the αij values in this partition. However, if the threshold signal (

T h r e s h o l d_e n s p_i n = 0

) indicates that in the other partition, the local minimum Min1, is greater than the threshold T, Min1 and Min2 are used in updating the

α_{i j}

values.

If Min1 is below T and both Min2 and the incoming threshold signal (

T h r e s h o l d_e n s p_i n = 1

) are higher than T, then

α_{i j}

values are calculated using Min1 and T. On the other hand, if (

T h r e s h o l d_e n s p_i n = 0

) is the incoming threshold signal,

α_{i j}

will be derived from calculations that involve choosing Min1 and Min2.

The following algorithm 2 is a summary of the algorithm used to calculate the Min-Sum split-row threshold for each row in the score:

Algorithm 2: Split-row threshold layered min-sum decoding algorithm

For each layer L:
Initialization
For each partition, finding the first and second minimums is based on the following relationships:

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = \{\begin{matrix} {M i n 1}_{i}; i f j \neq a r g m i n ({M i n 1}_{i}) \\ {M i n 2}_{i}; i f j = a r g m i n ({M i n 1}_{i}) \end{matrix}

Where

{M i n 1}_{i} = {m i n}_{j \in V (i)} (|β_{i j}|

)

{M i n 2}_{i} = {m i n}_{j^{″} \in V (i) \ a r g m i n ({M i n 1}_{i})} (|β_{i j^{″}}|)

1-

i f {M i n 1}_{i} \leq T a n d {M i n 2}_{i} \leq T t h e n

T h r e s h o l d_e n s p (i)_o u t =

1

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = \{\begin{matrix} {M i n 1}_{i}; i f j \neq a r g m i n ({M i n 1}_{i}) \\ {M i n 2}_{i}; i f j = a r g m i n ({M i n 1}_{i}) \end{matrix}

2-

i f {M i n 1}_{i} \leq T a n d {M i n 2}_{i} \geq T t h e n

T h r e s h o l d_e n s p (i)_o u t =

1

T h r e s h o l d_e n s p (i)_i n = =

1
Then

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = \{\begin{matrix} {M i n 1}_{i}; i f j \neq a r g m i n ({M i n 1}_{i}) \\ T; i f j = a r g m i n ({M i n 1}_{i}) \end{matrix}

else

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = \{\begin{matrix} {M i n 1}_{i}; i f j \neq a r g m i n ({M i n 1}_{i}) \\ {M i n 2}_{i}; i f j = a r g m i n ({M i n 1}_{i}) \end{matrix}

end if
3-

e l s e i f {M i n 1}_{i} \geq T a n d

T h r e s h o l d_e n s p (i)_i n = = 1

Then

T h r e s h o l d_e n s p (i)_o u t =

0

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = T

4-else

T h r e s h o l d_e n s p (i)_o u t =

0

{m i n}_{j^{'} \in V (i) \ j} (|β_{i j^{'}}|) = \{\begin{matrix} {M i n 1}_{i}; i f j \neq a r g m i n ({M i n 1}_{i}) \\ {M i n 2}_{i}; i f j = a r g m i n ({M i n 1}_{i}) \end{matrix}

End if

4. Co-Design and Implementation of the Split-Row Threshold Layered Min-Sum (SRTLMS) Algorithm

In this section, we present a novel co-design process that combines the Layered Min-Sum (LMS) algorithm, scripted in MATLAB r2022b, and the split-row threshold (SRT) algorithm, modeled in VHDL, to achieve a feasible and efficient design. We validate the SRTLMS results and apply various synthesis strategies using Vivado 2020.2 to optimize the algorithm. We then test the algorithm across a range of frequencies to identify the maximum frequency of our design.

4.1. Co-Design of the SRTLMS Algorithm

Our co-design approach seamlessly integrates algorithm development and hardware implementation using MATLAB and Vivado. This co-design approach is essential for scenarios where we need to implement and test complex algorithms designed in MATLAB on an FPGA platform. Figure 7 shows the simulation results of the first layer in the first iteration, based on the two data inputs (onlytest_beta and onlytest_alpha). The lower left corner displays the results from the MATLAB script that simulates the SRTLMS. The lower right corner presents the fixed-point representation of the output values from the co-design.

The results of the MATLAB and co-design simulations show a high degree of agreement, with a negligible difference of 3.10-2. This indicates the accuracy and reliability of the co-design approach for the SRTLMS model. The co-design output precision is determined by the number of bits used for the fixed-point representation of the data. However, there is a trade-off between precision and resource consumption, as more bits require more logic elements and registers. Therefore, in this co-design approach, we used 13 bits as the optimal number of bits that can achieve the desired precision without compromising the resource efficiency of the circuit.

4.2. Implementation of the SRT Design

In this research endeavor, we successfully implemented the SRT design, which is a crucial component for achieving low-complex and low-power computing. Our implementation involved a thorough exploration of the algorithm’s capabilities and challenges under various scenarios and constraints. We used the Xilinx Kintex-7 XC7K160 FPGA as the target platform for verification and testing, as it offers an average number of configurable logic blocks and high-speed transceivers.

We applied three different synthesis strategies to the project: Flow_PerfOptimized_high, Flow_AreaOptimized_high, and Flow_AlternateRoutability. These strategies enabled us to compare and contrast different synthesis techniques. The Kintex-7 XC7K160 FPGA has a total of 101,400 Look-Up Tables (LUTs), which are the basic elements of logic synthesis. As shown in Figure 8, our implementation consumed between 74,228 (73%) and 78,475 (77%) of the available LUTs, depending on the synthesis strategy. The Flow_AlternateRoutability strategy achieves the lowest resource utilization (73%) among the three strategies by optimizing the logic technology mapping to produce multiple functionally equivalent implementations. It then selects the optimal implementation for each subcircuit based on the routing resources, avoiding the complex interlayer interconnection network required by the conventional LDPC decoding architecture.

The total operating timing of the implemented model also varied depending on the synthesis strategy, as shown in Figure 9. The operating timing represents the time required to process the entire model, which is inversely proportional to the frequency. The operating timing values ranged from 9.9 ns to 10.75 ns, which demonstrates the high speed and low latency of our model. The Flow_AreaOptimized_high uses a high-level synthesis tool that can optimize the area and performance of the circuit by applying various transformations such as loop unrolling, pipelining, resource sharing, and scheduling. The tool can also reduce the wire length and the parasitic capacitance, which in turn reduces the delay of the circuit. Therefore, the Flow_AreaOptimized_high strategy can achieve the lowest clock period (9.9 ns) among the three strategies.

5. Results

5.1. BER Simulation Results

This section presents the findings from simulations conducted with MATLAB software to examine the performance of the proposed algorithms. We assessed the error performance of our methods with the MS approach on a channel with additive white noise (AWGN) and binary phase shift keying (BPSK). The maximum number of iterations is set to Imax = 4 or less once the decoder converges. The performance of the QC-LDPC codes is seen in the figures below (Figure 10).

The computations of T and S are dependent on one another. For example, we designate a value for the factor S and, based on this value, we execute our MATLAB program for various threshold and E_b/N₀ values. Subsequently, we identify the threshold value that demonstrates the lowest BER across all E_b/N₀ values. We subsequently utilize this value as the threshold and run a simulation for a range of S factor values to ascertain which yields the minimal test BER. The technique is repeated until an optimal value of threshold and factor S corresponding to the minimal BER stabilize throughout the simulations.

The selection of the threshold values T and the scaling factor S significantly influences the error performance of the proposed algorithms. Simulations find the optimal values for T and S, as Figure 10, Figure 11, Figure 12 and Figure 13 illustrate. The QC-LDPC code used in the simulations presented here is characterized by the following parameters:

Length of the codeword N = 2304, number of information bits K = 1152, number of parity-check equations M = 1152, coding rate R = 0.5.

The simulation used to estimate the threshold for the Min-Sum split-row threshold for a layered decoder is shown as follows.

The optimal experimental threshold value was found to be 3 for an SNR ranging from 3 dB to 4.2 dB and a normalizing factor S = 0.4, as shown in Figure 10.

Figure 11, Figure 12 and Figure 13 depict the simulation used for scale factor estimation experimentally for the MS, SRLMS, and SRTLMS, respectively, for different E_b/N₀ values between 3 and 4.2 dB.

According to these figures, the optimal values obtained for the algorithms MS, SRLMS, and SRTLMS are 0.9, 0.3, and 0.4, respectively.

The error performance results for a 2304-bit QC-LDPC code for MS, SRLMS, and SRTLMS are displayed in Figure 14, utilizing the threshold value acquired in Figure 10 and the values of the scale factors found in Figure 11, Figure 12 and Figure 13.

First, we can see from Figure 14 and Figure 15 that the BER/BLER for all three algorithms is about equal in the low-signal-to-noise-ratio region. This is because there is a lot of noise in this range, making it difficult for algorithms to discern between valid and incorrect bits during decoding. However, the simulation demonstrates that the SRLMS method has a decline when comparing error performance against the MS in the region with a high signal-to-noise ratio (E_b/N₀ > 2.75 dB), arriving at 1.15 dB for the bit error rate equal to 2 × 10⁻⁵ for the BER (Figure 14) and 8 × 10⁻⁴ for the BLER (Figure 15).

Because row processing in SRLMS is carried out in each partition independently of the other partition’s minimum, this influences the difference between the local and global minimums of each partition and the full row H of the parity-check matrix, respectively. In terms of complexity, SRLMS is less difficult than MS since fewer operations are needed due to the division of the matrix H into two partitions and each partition processing data independently and concurrently, whereas MS compares and finds Min1 and Min2.

5.2. Implementation Results

Table 1 reports the maximum frequencies achieved by our model using the three synthesis strategies. The total power consumption was between 0.72 Watts and 0.82 Watts, which demonstrates the energy efficiency of our design. Figure 16 depicts the power consumption components of our model with the Flow_AreaOptimized_high strategy. We analyzed the impact of different frequencies on the Worst Negative Slack (WNS) of our model, which is a measure of the timing performance. We found that the optimal WNS values for our model, using the Flow_PerfOptimized_high, Flow_AreaOptimized_high, and Flow_AlternateRoutability strategies, were 0.039 ns at 98 MHz, 0.062 ns at 101 MHz, and 0.022 ns at 93 MHz, respectively. These results indicate that our model can operate at high frequencies with minimal timing violations.

We observed that the maximum frequencies achieved by the three strategies ranged from 93 MHz to 101 MHz, which reflects the different trade-offs between latency and performance.

Following Table 1, this study illustrates notable enhancements in FPGA implementation techniques relative to reference [32], especially with diminished computational complexity and latency. By attaining elevated maximum frequencies and improved WNS values, our designs can expedite data processing and fulfill timing specifications more effectively. Despite a minor rise in resource utilization, these increments are negligible and eclipsed by the advantages of less complexity and latency.

According to the Flow_PerfOptimized strategy, the implementation in [32] could meet the timing constraints for a frequency of 71 MHz, consuming 57,439 LUTs and FFs. Based on the same strategy, our design consumed a total of 19,376 LUTs. However, for an elevated frequency of 98 MHz, our model meets the timing constraints. Our solution skillfully strikes a balance between performance and latency, while addressing the significant obstacle of complexity in LDPC decoding by partitioning the decoding process into two segments.

6. Discussion

Due to the threshold message passing through one partition before moving on to the next, the SRTLMS method greatly reduces the debasement observed in the SRLMS method, reaching up to 0.25 dB for the bit error rate comparable to 3 × 10⁻⁵ with essentially a similar degree of complexity. This outcome is attributable to the exploratory parameter T, which lessens the difference between each partition’s global and local minimums. Based on these findings, the SRTLMS algorithm thus strikes a balance between complexity and error-correcting performance. These findings highlight the significance of this algorithm, so long as it satisfies the requirement of creating the ideal decoder, which is still a problem for all high-performance decoders. Second, the key to the significance of layer decoding is to lower the maximum number of iterations required for the decoder to converge. Unlike the maximum iteration number utilized in the study, which was based on the split-row approach in [19,20], where Imax = 15, and in [21,22], where Imax = 7, this characteristic is evident in Figure 14 since Imax = 4.

We developed and implemented a high-performance model that can handle a massive number of coded bits with high speed and efficiency. Our model deals with a total of N = 2304 coded bits, which are partitioned into Z = 96 sub-blocks and decoded iteratively in only four iterations. The model runs at a high frequency of

F_{c l k} = 101 M H z

, which is achieved by using a parallel processing scheme that requires careful timing analysis and optimization. The decoder throughput can be calculated using Equation (6); therefore, the model achieves a remarkable throughput of 606 Mbps.

T = N_{c o d e d} F_{c l k} / (Z N_{i t e r})

(6)

7. Conclusions

In this paper, we present the SRLMS algorithm, which is a novel and efficient technique to reduce the complexity of MS decoding based on the split-row method. We also introduce the SRTLMS algorithm, which enhances the performance of SRLMS by using a threshold that is empirically determined with the same level of complexity. We apply these algorithms to the layer decoding method, which is a powerful approach to decrease the number of iterations required. Moreover, the co-design of the SRTLMS algorithm and the implementations of the SRT show remarkable results, achieving a maximum operating frequency of 101 MHz, which leads to a minimal total clock period of 9.9 ns. Therefore, the complexity of the SRTLMS algorithm is validated through both simulations and hardware implementation.

Author Contributions

I.A., methodology, formal analysis, validation, and supervision; A.A., methodology, formal analysis, validation, supervision, and data curation; C.A., software, conceptualization, and data curation; B.M., conceptualization, data curation, and writing—original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study were obtained at the Engineer Sciences Laboratory from the Polydisciplinary Faculty of Taza, University Sidi Mohamed Ben Abdellah (USMBA), Fez. They are available upon reasonable request. Data sharing is subject to ethical and privacy considerations. Do not hesitate to contact Mejmaa Bilal at [email protected] for inquiries regarding data access and availability.

Acknowledgments

The authors would like to express their sincere gratitude to their research supervisors Ismail Akharraz and Abdelaziz Ahaitouf for their invaluable support and guidance throughout this research. Their expertise and encouragement were instrumental in the successful completion of this work. We are also grateful for the support provided by the Laboratory of Engineer Sciences for providing the necessary facilities and resources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Gallager, R. Low-density parity-check codes. IRE Trans. Inf. Theory 1962, 8, 1962. [Google Scholar] [CrossRef]
MacKay, D.; Neal, R.M. Near Shano Limit Performance of Low-Density-Parity-Check Codes. Electron. Lett. 1996, 32, 1645–1946. [Google Scholar] [CrossRef]
Chung, S.Y.; Forney, G.D.; Richardson, T.J.; Urbanke, R. On the Design of Low-Density Parity-Check Codes within 0.0045 dB of the Shannon Limit. IEEE Commun. Lett. 2001, 5, 58–60. [Google Scholar] [CrossRef]
European Telecommunications Standards Institute (ETSI). Digital Video Broadcasting (DVB); EN 302 307-1, 2014, V1.4.1; European Telecommunications Standards Institute (ETSI): Sophia Antipolis, France, 2014. [Google Scholar]
Landolsi, M. Semi-random LDPC codes for CDMA communication over non-linear band-limited satellite channels. Int. J. Satell. Commun. Netw. 2006, 24, 303–317. [Google Scholar] [CrossRef]
Mejmaa, B.; Marktani, M.A.; Akharraz, I.; Ahaitouf, A. An Efficient QC-LDPC Decoder Architecture for 5G-NR Wireless Communication Standards Targeting FPGA. Computers 2024, 13, 195. [Google Scholar] [CrossRef]
Li, H.; Guo, J.; Guo, C.; Wang, D. A low-complexity min-sum decoding algorithm for LDPC codes. In Proceedings of the 17th International Conference on Communication Technology (ICCT), Chengdu, China, 27–30 October 2017; IEEE: Piscataway, NJ, USA. [Google Scholar]
Vermaand, A.; Shrestha, R. A New Partially-Parallel VLSI-Architecture of Quasi-Cyclic LDPC Decoder for 5G New-Radio. In Proceedings of the 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems, Bangalore, India, 4–8 January 2020; IEEE: New York, NY, USA, 2020. [Google Scholar]
Kou, Y.; Lin, S.; Fossorier, M. Low-density parity check codes based on finite geometries: A rediscovery and more. IEEE Trans. Inform. Theory 2001, 47, 2711–2736. [Google Scholar] [CrossRef]
Zhang, J.; Fossorier, M.P.C. A modified weighted bit flipping decoding of low-density parity-check codes. Commun. Lett. IEEE 2004, 8, 165–167. [Google Scholar] [CrossRef]
Jiang, M.; Zhao, C.; Shi, Z.; Chen, Y. An improvement on the modified weighted bit flipping decoding algorithm for LDPC codes. Commun. Lett. IEEE 2005, 9, 814–816. [Google Scholar] [CrossRef]
Wang, Y.; Wu, G. Cyclic Switching Weighted Bit-Flipping Decoding for Low-Density Parity-Check Codes. IET Commun. 2018, 12, 271–275. [Google Scholar] [CrossRef]
Chang, T.C.Y.; Wang, P.H.; Su, Y.T. Multi-Stage Bit-Flipping Decoding Algorithms for LDPC Codes. IEEE Commun. Lett. 2019, 23, 1524–1528. [Google Scholar] [CrossRef]
Aqil, C.; Akharraz, I.; Ahaitouf, A. A New Reliability Ratio Weighted Bit Flipping Algorithm for Decoding LDPC Codes. Wirel. Commun. Mob. Comput. 2021, 2021, 6698602. [Google Scholar] [CrossRef]
Chen, J.; Dholakia, A.; Eleftheriou, E.; Fossorier, M.; Hu, X.Y. Reduced-complexity decoding of LDPC codes. IEEE Trans. Commun. 2005, 53, 1288–1299. [Google Scholar] [CrossRef]
Zhao, J.; Zarkeshvari, F.; Banihashemi, A.H. On implementation of min-sum algorithm and its modifications for decoding low-density Paritycheck (LDPC) codes. IEEE Trans. Commun. 2005, 53, 549–554. [Google Scholar] [CrossRef]
Hajiyat, Z.R.M.; Sali, A.; Mokhtar, M.; Hashim, F. Channel Coding Scheme for 5G Mobile Communication System for Short Length Message Transmission. Wirel. Pers. Commun. 2019, 106, 377–400. [Google Scholar] [CrossRef]
Hemati, S.; Leduc-Primeau, F.; Gross, W.J. A Relaxed Min-Sum LDPC Decoder with Simplified Check Nodes. IEEE Commun. Lett. 2016, 20, 422–425. [Google Scholar] [CrossRef]
Mohsenin, T.; Baas, B. Split-row: A reduced complexity, high throughput LDPC decode architecture. In Proceedings of the 2006 International Conference on Computer Design, San Jose, CA, USA, 1–4 October 2006; IEEE: New York, NY, USA, 2006. [Google Scholar]
Mohsenin, T.; Truong, D.; Baas, B. Multi-split row threshold decoding implementations for LDPC codes. In Proceedings of the 2009 IEEE International Symposium on Circuits and Systems, Taipei, Taiwan, 24–27 May 2009; IEEE: New York, NY, USA, 2009; pp. 2449–2452. [Google Scholar]
El Alami, R.; Mrabti, M.; Gueye, C.B. Reduced Complexity of Decoding Algorithm for Irregular LDPC Codes Using a Split Row Method. J. Wirel. Netw. Commun. 2012, 4, 29–34. [Google Scholar] [CrossRef]
Aqil, C.; El Alami, R.; Akharraz, I.; Ahaitouf, A. Threshold Multi Split-Row algorithm for decoding irregular LDPC codes. In Proceedings of the International Conference on Applied Mathematics, Taza, Morocco, 19–20 October 2017; pp. 88–93. [Google Scholar]
Gunnam, K.K.; Choi, G.S.; Yeary, M.B. A Parallel VLSI Architecture for Layered Decoding for Array LDPC Codes. In Proceedings of the 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID’07), Bangalore, India, 6–10 January 2007; IEEE: New York, NY, USA, 2007; pp. 738–743. [Google Scholar]
Zhang, K.; Huang, X. High-Throughput Layered Decoder Implementation for Quasi-Cyclic LDPC Codes. IEEE J. Sel. Areas Commun. 2009, 27, 985–994. [Google Scholar] [CrossRef]
Kakde, S.; Khobragade, A.; Ambatkar, S.; Nandanwar, P. Implementation of Decoding Architecture for LDPC Code Using a Layered Min-Sum Algorithm. IUM Eng. J. 2017, 18, 128–136. [Google Scholar] [CrossRef]
Zheng, X.; Hu, Q.; Feng, J. Two improved algorithms for layered QC-LDPC decoding algorithm. In Proceedings of the 2018 IEEE Canadian Conference on Electrical & Computer Engineering (CCECE), Quebec, QC, Canada, 13–16 May 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
Xiao, Y. IEEE 802.11N: Enhancements for higher throughput in wireless LANs. IEEE Wirel. Commun. 2006, 12, 82–91. [Google Scholar] [CrossRef]
Singh, H.; Yong, S.K.; Oh, J.; Ngo, C. Principles of IEEE 802.15.3c: Multi-Gigabit Millimeter-Wave Wireless PAN. In Proceedings of the 18th International Conference on Computer Communications and Networks, San Francisco, CA, USA, 3–6 August 2009; IEEE: New York, NY, USA, 2009. [Google Scholar]
Liu, C.H.; Yen, S.W.; Chen, C.L.; Chang, H.C.; Lee, C.Y.; Hsu, Y.S.; Jou, S.J. An LDPC Decoder Chip Based on Self-Routing Network for IEEE 802.16e Applications. IEEE J. Solid-State Circuits 2008, 43, 684–694. [Google Scholar] [CrossRef]
Li, H.; Xu, H.; Chen, C.; Bai, B. Efficient construction of quasi-cyclic LDPC codes with multiple lifting sizes. IEEE Commun. Lett. 2024, 28, 754–758. [Google Scholar] [CrossRef]
Tran-Thi, B.N.; Nguyen-Ly, T.T.; Hoang, T. An FPGA design with high memory efficiency and decoding performance for 5G LDPC decoder. Electronics 2023, 12, 3667. [Google Scholar] [CrossRef]
Katyushnyj, A.; Krylov, A.; Rashich, A.; Zhang, C.; Peng, K. FPGA implementation of LDPC decoder for 5G NR with parallel layered architecture and adaptive normalization. In Proceedings of the 2020 IEEE International Conference on Electrical Engineering and Photonics (EExPolytech), St. Petersburg, Russia, 15–16 October 2020; IEEE: New York, NY, USA, 2020; pp. 34–37. [Google Scholar]

Figure 1. The 802.16e standard’s LDPC parity-check matrix of rate ½.

Figure 2. Block diagram for traditional two-phase decoding (a), and (b) split-row decoding; diagram of split-row threshold in (c).

Figure 3. Parity-check matrix with layered scheduling.

Figure 4. Message-passing flow in horizontal LD.

Figure 5. Split-row decoding in a block diagram using two partitions, highlighting sign-passing signals in the layer.

Figure 6. Split-row threshold decoding system block diagram with two partitions that emphasize crossing signals for signs and threshold.

Figure 7. Comparison of simulation and co-design outputs.

Figure 8. Comparison of the total number of consumed LUTs.

Figure 9. Comparison of the operating periods.

Figure 10. Performance of the BER at the threshold for

E_{b} / N_{0}

values of 3 dB, 3.5 dB, 3.7 dB, 4 dB, and 4.2 dB presented by the colors pink, red, blue, black, and light blue, respectively.

Figure 10. Performance of the BER at the threshold for

E_{b} / N_{0}

values of 3 dB, 3.5 dB, 3.7 dB, 4 dB, and 4.2 dB presented by the colors pink, red, blue, black, and light blue, respectively.

Figure 11. BER performance utilizing MS as a function of the scale factor for

E_{b} / N_{0}

values of 3.4 dB, 3.5 dB, 3.7 dB, 3.8 dB, and 4.2 dB presented by the colors black (star markers), black (square markers), red, pink, and blue, respectively.

Figure 11. BER performance utilizing MS as a function of the scale factor for

E_{b} / N_{0}

values of 3.4 dB, 3.5 dB, 3.7 dB, 3.8 dB, and 4.2 dB presented by the colors black (star markers), black (square markers), red, pink, and blue, respectively.

Figure 12. BER performance utilizing SRLMS as a function of the scale factor for

E_{b} / N_{0}

values of 3.5 dB, 3.8 dB, 4 dB, and 4.5 dB presented by the colors blue, pink, black, and red, respectively.

Figure 12. BER performance utilizing SRLMS as a function of the scale factor for

E_{b} / N_{0}

values of 3.5 dB, 3.8 dB, 4 dB, and 4.5 dB presented by the colors blue, pink, black, and red, respectively.

Figure 13. BER performance using SRTLMS as a function of the scale factor for

E_{b} / N_{0}

values of 3 dB, 3.2 dB, 3.5 dB, and 3.7 dB presented by the colors light blue, black, red, and blue, respectively.

Figure 13. BER performance using SRTLMS as a function of the scale factor for

E_{b} / N_{0}

values of 3 dB, 3.2 dB, 3.5 dB, and 3.7 dB presented by the colors light blue, black, red, and blue, respectively.

Figure 14. Comparison of the BER performance of several decoding algorithms.

Figure 15. Comparison of the BLER performance of several decoding algorithms.

Figure 16. Total power consumption based on Flow_AreaOptimized_high strategy.

Table 1. Implementation results of the SRT model.

Synthesis Strategy	This Work			[32]
Synthesis Strategy	Flow_Perfoptimized_High	Flow_AreaOptimized_High	Flow_AlternateRoutability	Flow_PerfOptimized_High	Flow_AreaOptimized_High	Flow_AlternateRoutability
Max. Frequency (MHz)	98	101	93	71	71	71
WNS (ns)	0.039	0.062	0.022	0.186	−3.722	1.365
Total Power (W)	0.814	0.767	0.723	-	-	-
LUTs	76,815	78,475	74,228	57,439	58,089	63,644

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mejmaa, B.; Aqil, C.; Akharraz, I.; Ahaitouf, A. Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders. Information 2024, 15, 684. https://doi.org/10.3390/info15110684

AMA Style

Mejmaa B, Aqil C, Akharraz I, Ahaitouf A. Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders. Information. 2024; 15(11):684. https://doi.org/10.3390/info15110684

Chicago/Turabian Style

Mejmaa, Bilal, Chakir Aqil, Ismail Akharraz, and Abdelaziz Ahaitouf. 2024. "Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders" Information 15, no. 11: 684. https://doi.org/10.3390/info15110684

APA Style

Mejmaa, B., Aqil, C., Akharraz, I., & Ahaitouf, A. (2024). Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders. Information, 15(11), 684. https://doi.org/10.3390/info15110684

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Implementation of a Reduced Decoding Algorithm Complexity for Quasi-Cyclic Split-Row Threshold Low-Density Parity-Check Decoders

Abstract

1. Introduction

2. Decoding Algorithms for LDPC Codes

2.1. LDPC Quasi-Cyclic Codes

2.2. Min-Sum Decoding

2.3. Split-Row Threshold and Split-Row Decoding Algorithms

2.4. Layered Decoding Algorithm

3. Split-Row Threshold Algorithm for a Layered Decoder

4. Co-Design and Implementation of the Split-Row Threshold Layered Min-Sum (SRTLMS) Algorithm

4.1. Co-Design of the SRTLMS Algorithm

4.2. Implementation of the SRT Design

5. Results

5.1. BER Simulation Results

5.2. Implementation Results

6. Discussion

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI