1. Introduction
Multiple-input multiple-output (MIMO) systems are essential for enhancing spectral efficiency in modern wireless networks. Spatial multiplexing allows multiple information streams to be transmitted simultaneously from different antennas, which distinguishes it from diversity schemes that improve reliability by transmitting the same information. Achieving higher data rates through spatial multiplexing, however, places significant demands on the receiver in terms of detection complexity and efficiency; these challenges have been studied for over five decades and have driven the evolution of MIMO detection methodologies [1,2]. MIMO detection consists of recovering the transmitted symbols from the received signal using known channel characteristics. Maximum likelihood (ML) detection is optimal in terms of error probability, but its complexity grows exponentially with the number of transmit antennas, making it impractical for physical implementations with many antennas. Therefore, alternative methods such as sphere decoding (SD), zero forcing (ZF), and linear minimum mean squared error (LMMSE) detection have been developed to achieve near-optimal or acceptable performance at lower complexity [2]. Methods such as Neumann series expansion (NSE), Gauss–Seidel (GS), and conjugate gradient (CG) further reduce complexity by replacing the matrix inversion required by linear detectors with iterative matrix-vector operations [3,4,5,6]. Non-linear MIMO detectors that cancel the interference of already-detected symbols can improve the detection of subsequent streams, but errors in those decisions propagate and degrade detection performance [7]. Advanced approaches such as the belief propagation (BP) algorithm [8] are effective for large numbers of antennas and low inter-channel correlation, but their iterative nature may introduce delays and degrade performance in fading channels. Developing a detection strategy that achieves high reliability without excessive decoding time therefore remains one of the major challenges in MIMO systems [9].
In addition to the conventional methods discussed above, recent studies have explored both model-driven and data-driven deep learning approaches [10]. Model-driven techniques build on iterative algorithms such as orthogonal approximate message passing (OAMP) [11], the alternating direction method of multipliers (ADMM) [12], the Viterbi algorithm [13], and expectation propagation [14]. Data-driven solutions use deep learning architectures such as autoencoders [15], deep neural networks (DNNs), and convolutional neural networks (CNNs) [16] to achieve high detection accuracy. These deep learning (DL)-based MIMO detectors have been reported to outperform traditional detectors under various channel conditions. Although some studies discuss model-driven and data-driven approaches separately and others consider them together, the increasing amount of data in new communication systems increasingly favors model-driven methods. Unsupervised deep learning techniques such as autoencoders can learn the entire system end to end, as demonstrated for data-driven MIMO detection in [15]. DetNet, in contrast, is a model-driven detector built by unfolding iterative projected gradient descent [17]. For fixed-channel scenarios, data-driven MIMO detection has been performed with CNNs and DNNs [16]. Another approach applies conventional deep learning network topologies to signal detection in MIMO systems with erroneous channel information [18], while another study employs neural networks to identify decision regions for multi-user MIMO systems [19].
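As a rough illustration of the projected gradient idea that DetNet builds on, the sketch below runs classical projected gradient descent for a real-valued BPSK system, using a simple clipping projection onto $[-1,1]^n$ and a fixed step size. DetNet itself replaces this fixed projection and step size with trained layers [17]; the function name, the step-size rule, and the noiseless test setup are assumptions made for this example.

```python
import numpy as np


def projected_gradient_bpsk(H, y, num_iter):
    """Projected gradient descent on 0.5 * ||y - Hx||^2 over the box [-1, 1]^n."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(H.shape[1])
    for _ in range(num_iter):
        grad = H.T @ (H @ x - y)                   # gradient of 0.5 * ||y - Hx||^2
        x = np.clip(x - step * grad, -1.0, 1.0)    # projection onto the box
    return x


rng = np.random.default_rng(1)
H = rng.standard_normal((16, 4))
x_true = rng.choice([-1.0, 1.0], size=4)
y = H @ x_true                                     # noiseless, for a clean check
x_hat = projected_gradient_bpsk(H, y, num_iter=1000)
decisions = np.sign(x_hat)                         # hard BPSK decisions
```

Every iteration here uses the same hand-set step size and projection; unfolding the loop and making those quantities trainable per iteration is exactly the step that turns such an algorithm into a model-driven network.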
Deep unfolding (DU), a form of model-driven deep learning, combines classical iterative methods with the adaptive capabilities of neural networks and has become a common solution for MIMO detection [20,21]. In deep unfolding, each iteration of a known iterative algorithm is treated as a neural network layer [22], so that the algorithm's parameters can be trained via backpropagation instead of being set deterministically as in the traditional approach. Introducing additional or modified trainable parameters in this way can capture features that classical methods miss and thus lead to improved solutions [23]. Unlike traditional methods, the network can generalize to new inputs after training on different data sets, eliminating the need to recalculate parameters for each system change. By building network layers from multiple algorithm iterations and training them with modern learning techniques, deep unfolding has achieved strong results [20,24,25,26]. Several deep-unfolding-based MIMO detection algorithms have been reported, including trainable projected gradient detectors [27] and the conjugate gradient descent technique [28,29], with other alternatives in [11,12,30]. Despite these developments, a significant research gap remains in improving these approaches, particularly in handling the complexity and variability of harsh channel conditions. This gap calls for further advances in the area and motivates the approach proposed in this study.
The deep unfolding approach also offers significant advantages in computational efficiency and hardware implementation [26], which makes it particularly beneficial for practical applications with hardware constraints and operational efficiency requirements. Because the network structure is fixed in advance to mirror specific algorithm iterations, deep unfolding reduces the need for extensive training data and computational resources, addressing major challenges faced by conventional DNNs.
This study advances MIMO signal detection by introducing a detection strategy that combines Tikhonov regularization and the CG method with deep unfolding. Detecting transmitted symbols over a multipath fading channel is an ill-conditioned problem [31], and because the convergence rate of CG deteriorates as the condition number of the system matrix grows, this can slow convergence or even prevent it. Using the matrix L as a regularization term improves CG-based detection by stabilizing the estimate of the transmitted signal while effectively suppressing these degrading effects. As a result, the regularization yields significant improvements in detection performance over conventional methods for different channel conditions and antenna layouts.
The main contributions of this study are summarized below:
To the best of the authors' knowledge, this is the first study to integrate Tikhonov regularization with the conjugate gradient method for MIMO detection in a deep learning-based approach.
The performance of the proposed method is compared with both iterative and model-driven techniques for different channel models, namely Rayleigh, Kronecker, Tapped Delay Line A (TDL-A), and TDL-E.
The remaining sections of this study are organized as follows. Section 2 presents related work and background. Section 3 provides a thorough explanation of the proposed approach. A comprehensive analysis of computational complexity is provided in Section 4. The simulation results are given in Section 5, and Section 6 discusses the findings, draws conclusions, and suggests directions for further work.
3. Proposed Method
Tikhonov regularization, also known as ridge regularization, adds a regularization term to the objective in order to solve ill-posed problems or to prevent overfitting in linear regression. Its core idea is to penalize the magnitude of the coefficients in the loss function being minimized: in its simplest form, Tikhonov regularization penalizes solutions with large norms by adding a regularization term to the objective function [38]. This term makes the solution more robust to perturbations in the data. Several applications in engineering and physics lead to linear least-squares problems of the form

$$\min_{x\in\mathbb{R}^{n}} \|Ax-b\|_2, \qquad A\in\mathbb{R}^{m\times n},\; b\in\mathbb{R}^{m}. \tag{4}$$
The matrix $A$ is of ill-determined rank; that is, its singular values decay gradually to zero without a noticeable gap. The measured data vector $b$ is contaminated by an unknown error $e\in\mathbb{R}^{m}$ whose norm is bounded by $\delta > 0$, i.e., $b = b_{\text{true}} + e$ with $\|e\|_2 \le \delta$, where $b_{\text{true}}$ denotes the unknown error-free data. The matrix $A$ has $m$ rows, corresponding to the number of measurements, and $n$ columns, corresponding to the number of unknowns, which are the components of the vector $x$. Least-squares problems with a matrix of this kind are referred to as discrete ill-posed problems. The goal is to compute an accurate approximation of the minimal-norm solution

$$x_{\text{true}} = A^{\dagger} b_{\text{true}}$$

of the error-free least-squares problem associated with (4), where $A^{\dagger}$ denotes the Moore–Penrose pseudoinverse of $A$. Because of the error in $b$ and the clustering of the singular values of $A$ near the origin, the solution $A^{\dagger} b$ of (4) is usually not a reasonable approximation of $x_{\text{true}}$. One way to overcome this problem is to replace (4) by a nearby minimization problem whose solution is less sensitive to the error in $b$; this replacement is referred to as regularization. In its standard form, Tikhonov regularization solves

$$\min_{x\in\mathbb{R}^{n}} \left\{ \|Ax-b\|_2^2 + \lambda \|x\|_2^2 \right\}. \tag{5}$$
The regularization parameter $\lambda > 0$ controls how sensitive the solution of (5) is to the error $e$ in $b$, as well as how close the solution is to the target vector $x_{\text{true}}$. It is well known that replacing the identity in (5) by an appropriate regularization matrix $L$ frequently improves the quality of the approximation computed by Tikhonov regularization:

$$\min_{x\in\mathbb{R}^{n}} \left\{ \|Ax-b\|_2^2 + \lambda \|Lx\|_2^2 \right\}, \tag{6}$$
where the regularization matrix $L\in\mathbb{R}^{p\times n}$ has $n$ columns, while the number of rows $p$ depends on the specific regularization approach; $L$ is often square, with $p = n$.
The regularization matrix L encodes additional constraints or prior knowledge about the solution x in (6), and λ is a regularization parameter that governs the trade-off between fitting the data and satisfying the regularization term. Choosing L and λ requires domain expertise and careful tuning, and it is critical for obtaining accurate and reliable solutions to ill-posed inverse problems. The matrix L is typically an n × n matrix, where n is the number of unknowns, whereas λ is usually a constant scalar. There are two main approaches for selecting the regularization matrix L. The first is the non-derivative approach, which forms L from the singular value decomposition (SVD) of A, using the diagonal matrix D of singular values. The second is the derivative-based approach, which constructs L from first- or second-order derivative operators. Interested readers may refer to [38,39] for details.
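The effect of regularization can be illustrated on a classic discrete ill-posed problem. The sketch below is an illustrative assumption, not an example taken from [38,39]: it uses a 12 × 12 Hilbert matrix, whose singular values decay rapidly toward zero, and compares the pseudoinverse solution $A^{\dagger}b$ with Tikhonov solutions of (6) for $L = I$ (the standard form (5)) and for a first-order difference operator (a derivative-based choice). Problem (6) is solved through the equivalent stacked least-squares problem $\min_x \|[A;\ \sqrt{\lambda}L]\,x - [b;\ 0]\|_2$, which avoids forming $A^TA$ explicitly. The value $\lambda = 10^{-8}$ is hand-picked rather than tuned, and `tikhonov_solve` and `first_difference` are hypothetical helper names.

```python
import numpy as np


def tikhonov_solve(A, b, lam, L):
    """Minimize ||Ax - b||^2 + lam * ||Lx||^2 via the stacked least-squares form."""
    A_aug = np.vstack([A, np.sqrt(lam) * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x


def first_difference(n):
    """(n-1) x n first-order derivative operator: (Lx)_k = x_{k+1} - x_k."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)


n = 12
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)                       # Hilbert matrix: ill-determined rank
x_true = np.ones(n)
rng = np.random.default_rng(1)
e = rng.standard_normal(n)
e *= 1e-4 / np.linalg.norm(e)               # error with norm delta = 1e-4
b = A @ x_true + e

x_naive = np.linalg.pinv(A) @ b             # Moore-Penrose solution of the noisy problem
x_tik_I = tikhonov_solve(A, b, 1e-8, np.eye(n))
x_tik_D = tikhonov_solve(A, b, 1e-8, first_difference(n))

errors = {
    "pinv": np.linalg.norm(x_naive - x_true),
    "tikhonov_identity": np.linalg.norm(x_tik_I - x_true),
    "tikhonov_difference": np.linalg.norm(x_tik_D - x_true),
}
for name, err in errors.items():
    print(f"{name:>20s}: {err:.3e}")
```

The pseudoinverse amplifies the small error through the tiny singular values, while both regularized solutions stay close to $x_{\text{true}}$; the difference operator is a natural choice here because this particular $x_{\text{true}}$ is constant, so $Lx_{\text{true}} = 0$.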
3.1. Deep-Unfolded Tikhonov-Regularized Conjugate Gradient Algorithm
We use Tikhonov regularization in the CG algorithm with deep unfolding to improve performance over different types of channels, addressing issues such as noise sensitivity and promoting convergence in high-dimensional MIMO systems. The CG technique effectively handles the complexity of high-dimensional MIMO systems, and pairing it with Tikhonov regularization improves the resulting signal estimates. When incorporated into model-driven MIMO detection frameworks such as LCG, Tikhonov regularization appears to be an effective means of improving robustness and performance. In the context of MIMO detection, the method adds a term L to the system matrix. In the proposed method, the regularization strength is adjusted dynamically during training through the trainable regularization matrix L together with the trainable parameters α and β. Note that, unlike the original Tikhonov formulation in (6), which treats the parameter λ and the matrix L separately, the proposed method merges them into a single trainable matrix, also denoted L, that absorbs both the scalar λ and the matrix L. Treating this product as a single matrix simplifies the computation and makes the model more flexible across channel conditions. The pseudocode of the proposed method is shown in Algorithm 2.
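Algorithm 2: Deep-Unfolded Tikhonov-Regularized Conjugate Gradient Algorithm
Inputs: $y$, $H$, $\delta^2$
Output: Transmitted signal vector estimation
1: Initialization: $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
2: for $i = 0, \dots, K$ do
3: update the residual $r_{i+1}$ using $\alpha_i$
4: refine the search direction $p_{i+1}$ using $\beta_i$
5: compute the estimate $x_{i+1}$ using the regularization matrix $L_i$
6: compute the MSE loss between $x_{i+1}$ and $x$
7: train: AdamOptimizer(minimize(loss), parameters {α, β, L})
8: end for
9: return the final estimate of the transmitted signal vector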
The DU-TCG algorithm extends the CG method with deep unfolding by incorporating Tikhonov regularization and an optimization step over the trainable parameters α, β, and L. These parameters are optimized with the Adam optimizer [40,41] during each iteration to minimize the loss function, and they play crucial roles in updating the residual, refining the search direction, and applying regularization. The algorithm begins by initializing the matrices and vectors as in the standard CG method. The parameters α and β are initially set to 0, while L is initialized as an identity matrix with the same dimensions as the system matrix. In each iteration, the residual $r_{i+1}$ is updated using the step size $\alpha_i$, and the search direction $p_{i+1}$ is refined using the coefficient $\beta_i$. A new solution estimate $x_{i+1}$ is computed using the Tikhonov regularization matrix $L_i$, and a loss function measures the difference between $x_{i+1}$ and the transmitted vector $x$. The process iteratively refines the solution by updating the trainable parameters until the final estimate of the transmitted signal vector is obtained. Unrolling the iterations of the Tikhonov-regularized CG algorithm yields the proposed Deep-Unfolded Tikhonov-Regularized Conjugate Gradient (DU-TCG) detector, a deep learning architecture in which each layer corresponds to one algorithm iteration, as shown in Figure 3.
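The unrolled iteration described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a real-valued signal model, takes the system matrix to be the LMMSE-style $H^TH + \delta^2 I$ (with $\delta^2$ the noise-variance input of Algorithm 2) to which $L_i$ is added, and, following the learned-CG idea, lets trainable $\alpha_i$ and $\beta_i$ simply replace the classical CG coefficients when they are supplied. The function name `tcg_layers` is hypothetical.

```python
import numpy as np


def tcg_layers(H, y, noise_var, reg_mats, alphas=None, betas=None):
    """K = len(reg_mats) CG-style layers on (H^T H + noise_var*I + L_i) x = H^T y.

    alphas / betas: optional per-layer coefficients; when omitted, the classical
    CG formulas are used, so a constant L_i = L gives plain CG on one system.
    """
    n = H.shape[1]
    A0 = H.T @ H + noise_var * np.eye(n)   # assumed LMMSE-style system matrix
    b = H.T @ y
    x = np.zeros(n)
    r = b.copy()                           # r_0 = b because x_0 = 0
    p = r.copy()                           # p_0 = r_0
    for i, L in enumerate(reg_mats):
        if r @ r < 1e-30:                  # already converged; avoid 0/0
            break
        A = A0 + L                         # Tikhonov-regularized system matrix of layer i
        Ap = A @ p
        alpha = alphas[i] if alphas is not None else (r @ r) / (p @ Ap)
        x = x + alpha * p                  # new estimate x_{i+1}
        r_new = r - alpha * Ap             # residual r_{i+1}
        beta = betas[i] if betas is not None else (r_new @ r_new) / (r @ r)
        p = r_new + beta * p               # search direction p_{i+1}
        r = r_new
    return x


rng = np.random.default_rng(0)
n_r, n_t, noise_var, lam = 16, 8, 0.1, 0.5
H = rng.standard_normal((n_r, n_t)) / np.sqrt(n_r)
x_true = rng.choice([-1.0, 1.0], size=n_t)
y = H @ x_true + np.sqrt(noise_var) * rng.standard_normal(n_r)

# Classical coefficients with a fixed L = lam * I: n_t layers solve the regularized system.
x_tcg = tcg_layers(H, y, noise_var, [lam * np.eye(n_t)] * n_t)
x_ref = np.linalg.solve(H.T @ H + (noise_var + lam) * np.eye(n_t), H.T @ y)

# "Learned" mode: per-layer coefficients supplied explicitly (arbitrary values here).
x_learned = tcg_layers(H, y, noise_var, [lam * np.eye(n_t)] * 3,
                       alphas=[0.5, 0.4, 0.3], betas=[0.1, 0.1, 0.0])
```

Once $L_i$ varies from layer to layer, $r$ is no longer the exact residual of a single linear system; in a deep-unfolded detector this is acceptable because the layers are judged only by the final estimation error, not by exact CG semantics.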
In this way, the model can be trained to find parameters that improve performance over a range of channel conditions. Along with α and β, DU-TCG adds a new trainable regularization matrix L. Because the network processes the regularization as a single matrix, the regularization strength can be adjusted dynamically during training while keeping the computation simple.
To guide the training process, the DU-TCG loss function measures the difference between the transmitted signal and the network output. For this purpose, the mean squared error (MSE) loss function is used, as shown in Algorithm 2. The MSE quantifies the mean squared difference between the estimated and transmitted values, giving the network a specific target to minimize during training. By iteratively updating the parameters α, β, and L to minimize this loss, the network increases the robustness and accuracy of detection. The $i$-th iteration of the CG algorithm corresponds to the $i$-th layer of the DU-TCG detector. The layer-dependent trainable parameters of the $i$-th layer are denoted by $\theta_i = \{\alpha_i, \beta_i, L_i\}$, and $\Theta$ collects the parameters of all layers. The step sizes $\alpha_i$, search direction coefficients $\beta_i$, and regularization matrices $L_i$ are learned from $T$ training samples $\{(y_t, H_t, x_t)\}_{t=1}^{T}$ by minimizing the MSE:

$$\hat{\Theta} = \arg\min_{\Theta} \frac{1}{T}\sum_{t=1}^{T} \left\| x_t - \hat{x}_K(y_t, H_t; \Theta) \right\|_2^2,$$

where $K$ denotes the number of layers, and $\hat{x}_K(y_t, H_t; \Theta)$ denotes the output of DU-TCG for the inputs $y_t$ and $H_t$.
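The MSE criterion above can be illustrated with a small, self-contained sketch. Several simplifications are assumptions made only for this illustration: each $L_i$ is restricted to a scaled identity $\lambda_i I$, the trainable $\alpha_i$ and $\beta_i$ directly replace the classical CG coefficients, and, instead of TensorFlow backpropagation with Adam, the gradient is approximated by central finite differences combined with a backtracking step size. The helper names `unfolded_forward`, `mse_loss`, and `finite_diff_grad` are hypothetical.

```python
import numpy as np


def unfolded_forward(theta, H, y, noise_var, K):
    """K unfolded layers; theta = [alpha_1..K, beta_1..K, lam_1..K]."""
    alpha, beta, lam = theta[:K], theta[K:2 * K], theta[2 * K:]
    n = H.shape[1]
    A0 = H.T @ H + noise_var * np.eye(n)
    x = np.zeros(n)
    r = H.T @ y
    p = r.copy()
    for i in range(K):
        Ap = (A0 + lam[i] * np.eye(n)) @ p   # simplification: L_i = lam_i * I
        x = x + alpha[i] * p
        r = r - alpha[i] * Ap
        p = r + beta[i] * p
    return x


def mse_loss(theta, data, noise_var, K):
    """Empirical MSE (1/T) * sum_t ||x_t - x_hat(y_t, H_t; theta)||^2."""
    return np.mean([np.sum((x - unfolded_forward(theta, H, y, noise_var, K)) ** 2)
                    for H, y, x in data])


def finite_diff_grad(f, theta, eps=1e-6):
    """Central finite differences, standing in for backpropagation."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g


rng = np.random.default_rng(2)
n_r, n_t, K, noise_var, T = 16, 8, 4, 0.05, 32
data = []
for _ in range(T):
    H = rng.standard_normal((n_r, n_t)) / np.sqrt(n_r)
    x = rng.choice([-1.0, 1.0], size=n_t)
    y = H @ x + np.sqrt(noise_var) * rng.standard_normal(n_r)
    data.append((H, y, x))


def loss(th):
    return mse_loss(th, data, noise_var, K)


theta = np.zeros(3 * K)          # all trainable parameters start at zero
loss_init = loss(theta)
lr = 0.1
for _ in range(30):
    g = finite_diff_grad(loss, theta)
    current = loss(theta)
    while lr > 1e-8 and loss(theta - lr * g) >= current:
        lr *= 0.5                # backtrack until the step lowers the loss
    if lr <= 1e-8:
        break                    # no descent step found; stop
    theta = theta - lr * g
    lr *= 2.0                    # allow the step to grow again
loss_final = loss(theta)
print(loss_init, loss_final)
```

Only the structure (per-layer parameters, zero initialization, empirical MSE over training samples) mirrors the text; a practical implementation would rely on automatic differentiation, which scales to full matrices $L_i$ where finite differences would not.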
3.2. Training Details
The proposed DU-TCG network was implemented in Python (version 3.6.13) with the TensorFlow library (version 2.6.2) and trained with the Adam optimizer, while the channel matrices were generated using MATLAB R2021a. The training and test data sets were generated randomly according to Equation (2) at different noise levels. The transmitted symbols x were drawn from BPSK, 16-QAM, 64-QAM, and 256-QAM constellations. The channel models considered, namely the Kronecker channel [28], the Rayleigh fading channel, and the TDL-A and TDL-E MIMO channels [29], each use a different random generator for their channel matrices. Training uses $5 \times 10^{4}$ samples at an SNR of 25 dB. In deep-unfolded MIMO detection, training at high SNR values is usually preferred because it provides better model tuning: refs. [28,42] have shown that higher SNR values provide a clearer signal, which allows the model to learn more effectively, and refs. [43,44] have reported that training at lower SNR levels can even degrade model performance. All trainable parameters were initially set to zero. Subsequently, $5 \times 10^{5}$ samples with SNR values ranging from 0 to 20 dB in steps of 2 dB were used for testing; this wider range of SNR values allows a comprehensive evaluation of the detector's performance. The learning rate was initially set to $10^{-3}$ and halved after each epoch to make training more robust. Training was stopped when the average loss stopped decreasing. To ensure a fair comparison under identical conditions, all deep-unfolded methods use the same number of layers: 5 layers for the Rayleigh channel and 15 layers for the other, more challenging channel models. Training is computationally efficient, taking about two hours on a standard Intel i7-7500U processor, because the model has only three trainable parameters per layer and performs well with a small number of layers. Training time is expected to increase for larger MIMO systems or models with more trainable parameters.
5. Simulation Results
This section discusses several MIMO layouts over different channel conditions and modulation orders to demonstrate the performance of the proposed DU-TCG detection method. In particular, the simulation results highlight the detection performance improvement of DU-TCG over well-known approaches such as MMSE, CG, and LCG. Unless otherwise stated, BPSK modulation is used for simplicity, as it allows a straightforward evaluation of the algorithm's behavior and a demonstration of its core performance. Bit error rate (BER) and normalized mean square error (NMSE) metrics are used to compare the proposed method with the other methods considered.
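For reference, the two metrics can be computed as in the sketch below for BPSK. It assumes hard decisions by sign and defines NMSE as the ratio of total squared error to total signal energy, reported in dB to match the logarithmic scale used for the NMSE figures; the exact averaging used in the simulations is not specified in the text, so this definition and the function names are assumptions.

```python
import numpy as np


def ber_bpsk(x_true, x_hat):
    """Fraction of BPSK symbols whose sign decision differs from the transmitted symbol."""
    return np.mean(np.sign(x_hat) != x_true)


def nmse_db(x_true, x_hat):
    """NMSE = sum ||x - x_hat||^2 / sum ||x||^2 over all samples, expressed in dB."""
    return 10 * np.log10(np.sum((x_true - x_hat) ** 2) / np.sum(x_true ** 2))


x_true = np.array([[1.0, -1.0, 1.0, 1.0]])
x_hat = np.array([[0.9, -0.2, -0.1, 1.0]])
ber = ber_bpsk(x_true, x_hat)      # one wrong sign out of four symbols
nmse = nmse_db(x_true, x_hat)
```

The example also shows why both metrics are reported: the second symbol is decided correctly but still contributes a large squared error, so NMSE reflects estimation quality that BER alone does not capture.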
Figure 4 compares the BER performance of the proposed DU-TCG method with that of the MMSE, CG, LCG, and ideal ML detectors over a Rayleigh fading channel for a BPSK-modulated 10 × 10 MIMO system.
Figure 4 shows that although the ML detector achieves the lowest BER, the proposed DU-TCG outperforms the gradient-based CG and LCG techniques as well as the traditional MMSE detector, which makes it a promising candidate for practical applications. At a BER of $10^{-3}$, DU-TCG achieves an SNR gain of approximately 4 dB over the CG and LCG methods. As expected, all methods except ML perform poorly at low SNR values in Figure 4. Unlike ML, the iterative methods are more affected by low SNR during the initial iterations, which leads to suboptimal results.
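One common way to read an SNR gain such as the approximately 4 dB quoted above off two BER curves is to interpolate $\log_{10}(\mathrm{BER})$ linearly versus SNR and compare the SNRs at which each curve crosses the target BER. The sketch below is an illustration of that procedure only: the curves are synthetic (not the paper's results), and `snr_at_ber` is a hypothetical helper.

```python
import numpy as np


def snr_at_ber(snr_db, ber, target):
    """SNR (dB) at which a monotonically decreasing BER curve crosses `target`,
    using linear interpolation of log10(BER) versus SNR."""
    log_ber = np.log10(ber)
    # np.interp needs increasing x-coordinates, so interpolate on the reversed curve.
    return np.interp(np.log10(target), log_ber[::-1], snr_db[::-1])


snr = np.arange(0, 21, 2.0)
ber_ref = 10 ** (-0.25 * snr)             # synthetic reference curve (illustration only)
ber_new = 10 ** (-0.25 * (snr + 4.0))     # same curve shifted left by 4 dB
gain = snr_at_ber(snr, ber_ref, 1e-3) - snr_at_ber(snr, ber_new, 1e-3)
```

Note that `np.interp` clamps to the end points when the target lies outside the simulated range, so the target BER should be one that both curves actually reach.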
Figure 5 shows that the proposed DU-TCG method outperforms CG and LCG for different numbers of layers in the 10 × 10 MIMO layout.
As shown in Figure 5, DU-TCG achieves better BER performance than the CG and LCG methods with both 5 and 15 layers. Although the performance of CG and LCG improves when the number of layers is tripled, DU-TCG still outperforms these methods. Since the complexity of the system grows directly with the number of layers, these results show that DU-TCG is a better candidate in terms of system cost.
Besides the conventional MIMO scheme and channel model discussed above, Figure 6 evaluates DU-TCG under the Kronecker, TDL-A, and TDL-E channel models for a 32 × 64 MIMO scheme with BPSK modulation. Because these channel conditions are challenging, 15 layers are used in the training process.
For the Kronecker, TDL-A, and TDL-E channel models, neither the proposed DU-TCG nor the existing LCG method achieves sufficient detection performance. However, DU-TCG still outperforms LCG for all three channel models. DU-TCG benefits from the effectiveness of Tikhonov regularization for ill-posed problems, and its regularization mechanism significantly improves the detection process compared with LCG.
Figure 7 presents the BER performance of the CG, LCG, and proposed DU-TCG methods for BPSK and 16-QAM modulation in a 32 × 128 MIMO layout over a TDL-A channel.
It is well known that the performance of iterative detection methods improves significantly when the number of receive antennas is much greater than the number of transmit antennas, and this effect is reflected in our simulation results for the DU-TCG, LCG, and CG methods.
Figure 7 shows the results for such a MIMO structure. In a difficult channel such as TDL-A, the detection performance of LCG is close to that of DU-TCG for lower-order modulation, whereas DU-TCG outperforms LCG for higher-order modulation. The BER of DU-TCG is up to 1.2 times lower than that of LCG with BPSK modulation under TDL-A fading and up to 8.3 times lower with 16-QAM modulation. The superior performance of DU-TCG under difficult conditions such as high-order modulation suggests that integrating a regularization term into a deep-unfolded method improves convergence stability and hence detection performance.
In the simulation results presented above, the BER values are limited to $10^{-4}$, as in similar studies that use deep unfolding methods [45,46]. Achieving lower BER values in simulation, particularly for deep-unfolding-based methods, usually requires considerably longer simulation times and more computational resources, which was impractical in our current computing environment. Nevertheless, we conducted an extensive simulation to reach lower BER values; the results are presented in Figure 8, which illustrates the BER performance of the LCG and DU-TCG techniques for 64-QAM and 256-QAM modulation in a 32 × 64 MIMO system under Rayleigh fading.
The results for higher-order modulation schemes such as 64-QAM and 256-QAM in Figure 8 demonstrate that the proposed DU-TCG method retains its superior performance even with these more complex modulation formats, which are widely used in sub-6 GHz 5G communication systems [45,47]. In contrast, other deep unfolding techniques, such as those in [19,28,29,42], may show degraded detection performance as the modulation order increases.
In Figure 9, we compare the detection performance of the proposed DU-TCG method with that of DetNet over a 32 × 64 MIMO layout.
For this configuration, the BER of DU-TCG is up to nine times lower than that of DetNet, demonstrating its strong performance even at higher antenna counts. DetNet has eight trainable parameters per layer, whereas the proposed DU-TCG method has three, $\theta_i = \{\alpha_i, \beta_i, L_i\}$. The larger number of trainable parameters in DetNet can increase computational complexity and resource requirements. These results highlight the usefulness of DU-TCG, which provides better performance without excessive computational cost.
The NMSE performance of the proposed DU-TCG and LCG methods for scalar and vector parameterization, shown on a logarithmic scale, is illustrated in Figure 10 for a 32 × 64 MIMO layout.
Figure 10 provides key evidence for the benefit of Tikhonov regularization, as DU-TCG attains lower NMSE values with both scalar and vector parameterization. With scalar parameterization, the NMSE of both DU-TCG and LCG decreases as the SNR increases, indicating that the signal detection performance of the system improves. With vector parameterization, however, the NMSE of LCG stops decreasing beyond a certain SNR, whereas the NMSE of the proposed DU-TCG continues to decrease. This indicates that the performance of LCG saturates in challenging situations and that it cannot learn effectively beyond that point. In addition, the NMSE values for vector parameterization are much smaller than those for scalar parameterization, indicating improved detection performance. Therefore, with vector parameterization, regularization has a larger effect on NMSE, which results in better BER performance.
Having discussed the effect of scalar and vector parameterization on the NMSE of the model-driven approaches, we now show their impact on BER in Figure 11.
In terms of BER, DU-TCG and LCG perform similarly with scalar parameterization. Moreover, LCG with vector parameterization (LCG-V) outperforms DU-TCG with scalar parameterization (DU-TCG-S). However, when vector parameterization is combined with Tikhonov regularization in DU-TCG, an SNR gain of more than 1 dB is achieved at a BER of $10^{-3}$ compared with LCG-V.
6. Discussion
This study proposed a method that, unlike traditional techniques, integrates the iterative conjugate gradient method with Tikhonov regularization. The regularization matrix, step size, and search direction coefficients were used as trainable parameters in a deep unfolding approach. Extensive simulation results demonstrating the superiority of the proposed method over existing methods were presented.
The results of this study clearly demonstrate that the proposed method, DU-TCG, is an improved detection strategy for MIMO systems, outperforming both state-of-the-art deep learning techniques such as DetNet and LCG, as well as traditional approaches such as MMSE and CG. Combining the conjugate gradient method with the Tikhonov regularization approach in a deep learning framework successfully reduces the degrading effects of channel conditions and higher-order modulation.
The scalability of DU-TCG is demonstrated by its consistent performance in both large (32 × 128) and small (10 × 10) MIMO systems, indicating broad applicability to MIMO configurations of varying size and complexity. The better performance of DU-TCG is also shown under various channel conditions, such as the Kronecker, TDL-A, and TDL-E channel models, which are widely used in advanced communication systems. This research also shows that DU-TCG surpasses DetNet in a 32 × 64 MIMO layout, where system complexity is higher. Simulation results further show that the proposed method provides up to 4 dB SNR gain over CG with considerably fewer iterations. In addition, DU-TCG stays ahead of CG and LCG as the number of layers increases, even surpassing LCG's high-layer performance with fewer layers. Finally, DU-TCG achieves better NMSE and BER performance than LCG with vector parameterization, and it yields lower NMSE values than LCG for both scalar and vector parameterization.
This work combines Tikhonov regularization and the CG technique with deep unfolding, which significantly improves MIMO signal detection across various MIMO layouts, modulation orders, and channel conditions. The proposed method inserts a regularization term into the system, which improves the stability and generality of the solution. In particular, under noisy or imperfect data conditions, this regularization helps CG converge more consistently. The approach thus efficiently handles the complexity of high-dimensional MIMO systems while iteratively improving its signal estimates.
Consequently, this study highlights the advantages of the proposed DU-TCG method for MIMO detection across different scenarios. As future work, Tikhonov regularization may be combined with other iterative MIMO detection algorithms. Reducing the computational complexity of model-driven approaches also remains an open challenge for deep-unfolded algorithms.