5.1. Simulation Settings
The samples used for training and evaluating the models are generated following the data model presented in Section 3. For each dataset sample, the number of signals K is first randomly drawn from {1, 2, 3}, and then K angles are randomly sampled from the DOA grid. The covariance matrix can then be estimated from these values and the fixed parameters of the ULA, and the label associated with the covariance matrix is the set of K ground-truth angles. Hence, each sample consists of a (covariance matrix, DOA labels) pair.
The parameters used in our simulations are designed to closely reflect real-world scenarios, incorporating appropriate approximations as recommended in [59]. Specifically, the ULA in this work consists of 10 elements, with a fixed inter-element spacing defined relative to the carrier wavelength. Additionally, we set the maximum number of impinging signals to three. We assume a sampling rate of twice the signal frequency, following the Nyquist criterion to prevent aliasing [60], and a 32-bit ADC resolution. Finally, multipath components were set to zero, focusing on direct line-of-sight signals to isolate DOA estimation performance under controlled conditions.
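To make this procedure concrete, the following minimal sketch generates one (covariance, DOAs) pair. The half-wavelength spacing, the 120-angle grid, the snapshot count, and the unit-power narrowband signal model are illustrative assumptions, not values confirmed above.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10                       # ULA elements
K_MAX = 3                    # maximum number of impinging signals
T = 200                      # snapshots (illustrative value)
GRID = np.arange(-60, 60)    # hypothetical 1-degree DOA grid (120 angles)

def steering_matrix(doas_deg, n_elements, spacing_wavelengths=0.5):
    """Narrowband ULA steering vectors; spacing in wavelengths (lambda/2 assumed)."""
    n = np.arange(n_elements)[:, None]
    phase = 2j * np.pi * spacing_wavelengths * n * np.sin(np.deg2rad(doas_deg))[None, :]
    return np.exp(phase)

def generate_sample(snr_db):
    k = rng.integers(1, K_MAX + 1)                  # number of sources
    doas = rng.choice(GRID, size=k, replace=False)  # K on-grid angles
    A = steering_matrix(doas, N)                    # N x K
    s = (rng.standard_normal((k, T)) + 1j * rng.standard_normal((k, T))) / np.sqrt(2)
    noise_pow = 10 ** (-snr_db / 10)                # unit-power signals assumed
    n = np.sqrt(noise_pow / 2) * (rng.standard_normal((N, T))
                                  + 1j * rng.standard_normal((N, T)))
    x = A @ s + n                                   # array snapshots
    R = x @ x.conj().T / T                          # sample covariance (input)
    return R, np.sort(doas)                         # pair: (covariance, DOA labels)

R, doas = generate_sample(snr_db=10)
```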
We generate a dataset of samples whose angles span the DOA grid and whose SNRs are uniformly sampled from three intervals (in dB); each sample consists of a fixed number T of snapshots.
The generation of data samples begins by constructing the entire DOA grid with a step size of 1°, which yields 120 candidate angles. From this grid, all possible combinations of 1, 2, and 3 sources are created; for combinations involving two or three sources, the order of the DOA sequence is randomly permuted. This process yields a total of 120 + C(120, 2) + C(120, 3) = 288,100 DOA combinations. For each DOA combination, three samples are generated for each of the three SNR intervals. Considering the DOA grid, the three SNR intervals, and the three sample variants, the dataset comprises 288,100 × 9 = 2,592,900 samples. We randomly split this dataset into training and validation sets, allocating 80% for training and 20% for validation. Consequently, the training set consists of 2,074,320 samples, while the validation set contains 518,580 samples.
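As a quick sanity check on these counts, the sketch below reproduces the dataset arithmetic; the 120-angle grid size is inferred from the reported total of 288,100 combinations, since 120 + C(120, 2) + C(120, 3) is the only grid size matching it.

```python
from math import comb

GRID_SIZE = 120  # inferred: 120 + C(120,2) + C(120,3) = 288,100

combos = GRID_SIZE + comb(GRID_SIZE, 2) + comb(GRID_SIZE, 3)
assert combos == 288_100

n_snr_intervals = 3  # three SNR intervals
n_variants = 3       # three sample variants per (combination, interval)
total = combos * n_snr_intervals * n_variants
assert total == 2_592_900

train, val = int(total * 0.8), int(total * 0.2)
print(train, val)  # 2074320 518580
```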
We then generate seven distinct test sets to assess the method’s performance on unseen data and its capability to generalize to scenarios beyond those simulated in the training set. These test sets incorporate varying conditions—such as phase shifts, modulation types, and carrier frequencies—that impact both the signal waveform and the resulting covariance matrix. Since the covariance matrix captures the correlation structure of signals received across array sensors, these conditions affect the input to the proposed model in unique ways, altering spatial and temporal correlations.
- T1. Same characteristics as the training set. This test set is designed with the same characteristics as the training set.
- T2. Different number of snapshots. This test set contains samples with the same characteristics as the training set in terms of the grid of arrival angles and noise levels, but it varies the number of snapshots T over {5, 100, 200, 500, 1000}. Testing with different snapshot counts is crucial to assess the model's performance under varying data availability, from limited snapshots (e.g., T = 5) to more extensive data (e.g., T = 1000), simulating real-world conditions and challenges.
- T3. Various SNRs. This test set consists of samples generated by simulating signals with SNR levels outside those used in the training set. Testing with unseen SNR levels is crucial to evaluate the model's generalization ability in real-world scenarios, where signal quality can vary significantly. Low SNR levels represent highly noisy environments, challenging the model's robustness and noise tolerance, while higher SNR levels simulate clearer signals where precision becomes key.
- T4. Off-grid angles. This test set comprises samples whose arrival directions are not part of the training grid (off-grid angles). Experimenting with off-grid signals is essential for evaluating the model's ability to generalize beyond predefined scenarios and handle real-world conditions, where signal arrival angles rarely align perfectly with the training grid.
- T5. Combination of T2, T3, and T4. This test set combines the characteristics of the T2, T3, and T4 test sets, meaning all challenging conditions are present simultaneously. This makes the evaluation data highly realistic, as such conditions often co-occur in real-world scenarios.
- T6. Various phases. This test set consists of signals with phase shifts spanning a range of values that are absent from the training set and the T1 test, where the phase is fixed. Testing with varying phase values is essential to assess the model's robustness to phase variations, a common occurrence in real-world communication and radar systems due to factors such as signal propagation and modulation schemes.
- T7. Various frequencies. This test set is designed to evaluate the robustness of the models to changes in signal frequency. The same parameters as in T1 were used, except that different carrier frequencies were applied while keeping the sensor positions fixed. Each frequency has its own wavelength and wavenumber, so the inter-element spacing is no longer normalized to each signal's wavelength (see the sketch after this list). Four subsets were created, each with signals at a different frequency: 2.4 GHz, 3 GHz, 5 GHz, and 10 GHz. These frequencies were selected for their widespread use across applications: higher frequencies (5 GHz and 10 GHz) are typically better suited to short-range, high-bandwidth applications, while lower frequencies (2.4 GHz and 3 GHz) offer greater coverage and signal penetration.
- T8. Modulation types. This test set is similar to T1, except that different modulation types are applied to simulate a variety of signal behaviors. Amplitude Modulation (AM) varies the signal's amplitude over time, creating fluctuations in strength that can be detected and analyzed. Frequency Modulation (FM) produces a signal with a varying instantaneous frequency, altering how rapidly the signal's phase evolves and thereby its spectral properties. Phase Shift Keying (PSK) changes the signal's phase according to a specific modulation scheme, typically encoding information in discrete phase shifts, which makes it well suited to digital communication.
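As an illustration of the T7 condition, the sketch below builds steering vectors from sensor positions fixed in meters, so the phase progression depends on each carrier's wavelength. The 6.25 cm element spacing is a hypothetical value chosen only for the example (it equals half a wavelength at 2.4 GHz).

```python
import numpy as np

C = 3e8                             # speed of light (m/s)
POSITIONS = 0.0625 * np.arange(10)  # hypothetical fixed ULA positions in meters

def steering_vector(doa_deg, freq_hz, positions=POSITIONS):
    """Steering vector for a carrier at freq_hz with sensor positions fixed in meters.

    The wavenumber 2*pi/lambda changes with frequency, so the same geometry
    yields a different electrical spacing for each carrier (the T7 condition).
    """
    wavelength = C / freq_hz
    return np.exp(2j * np.pi * positions * np.sin(np.deg2rad(doa_deg)) / wavelength)

for f in (2.4e9, 3e9, 5e9, 10e9):
    a = steering_vector(30.0, f)
    print(f"{f/1e9:.1f} GHz: spacing = {0.0625 / (C / f):.2f} wavelengths")
```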
5.2. Results
The performance of the proposed model for DOA estimation is compared with five state-of-the-art DOA methods that have publicly available implementations. MUltiple SIgnal Classification (MUSIC) identifies DOAs as the peaks of a pseudo-spectrum evaluated on a predefined grid [5]. Root MUSIC (R-MUSIC) estimates the angular directions from the roots of a polynomial [61]. The aforementioned methods are covariance-based techniques that demand sufficient data snapshots for accurate DOA estimation and often assume a known number of sources (NOS). To ensure consistency with the experimental setup of the other methods, where the NOS is unknown, we applied the Akaike Information Criterion (AIC) to estimate the NOS before performing DOA estimation [62]. Multi-snapshot Newtonized Orthogonal Matching Pursuit (MNOMP) uses Newton refinement and a feedback strategy for DOA estimation, leveraging the Fast Fourier Transform (FFT) to keep computational complexity low [63]. The Multi-Task Autoencoder with Parallel multilayer Classifiers (MTAPC) model effectively estimates spatial spectra in complex scenarios, primarily thanks to the enhanced representation achieved by the multi-task autoencoder [9]. DNNDOA [18] is a CNN designed for DOA estimation in the low-SNR regime, framed as a multi-label classification task with an on-grid approach.
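For reference, here is a minimal sketch of AIC-based NOS estimation from the eigenvalues of the sample covariance, following the classic Wax-Kailath criterion; this is our formulation and not necessarily the exact variant used in [62].

```python
import numpy as np

def estimate_nos_aic(R, T):
    """Wax-Kailath AIC model-order selection from a sample covariance R (N x N)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order
    N = len(eigvals)
    aic = []
    for k in range(N):
        tail = eigvals[k:]                    # N - k smallest eigenvalues
        geo = np.exp(np.mean(np.log(tail)))   # geometric mean
        arith = np.mean(tail)                 # arithmetic mean
        # negative log-likelihood term plus penalty on free parameters
        aic.append(-2 * T * (N - k) * np.log(geo / arith) + 2 * k * (2 * N - k))
    return int(np.argmin(aic))                # estimated number of sources
```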
The Cramér–Rao Lower Bound (CRLB) provides a theoretical benchmark for the minimum variance of unbiased estimators, making it an essential measure for complementing and validating the results obtained by existing methods in DOA estimation. Comparing DOA methods with the CRLB helps assess their proximity to optimal performance, highlighting both efficiency and unbiasedness. Since the CRLB accounts for factors such as noise and sensor configuration, it serves as a crucial reference for validating results on simulated data under various conditions. Methods that approach the CRLB operate near the theoretical performance limit, while those whose error remains well above it reveal room for improvement [64]. Therefore, we include the CRLB in our comparison of results.
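To illustrate, the sketch below computes the closed-form single-source deterministic CRLB for a ULA, a standard textbook expression given here under our half-wavelength-spacing assumption; the multi-source bound used for the figures is more general, and SNR conventions vary by a constant factor across references.

```python
import numpy as np

def crlb_single_source_deg2(theta_deg, n_elements, snapshots, snr_db,
                            spacing_wavelengths=0.5):
    """Deterministic single-source CRLB on the DOA (degrees squared) for a ULA.

    var(theta_hat) >= 6 / (T * SNR * (2*pi*d/lambda * cos(theta))^2 * N*(N^2-1)),
    expressed in rad^2 and converted to deg^2.
    """
    snr = 10 ** (snr_db / 10)
    theta = np.deg2rad(theta_deg)
    deriv = 2 * np.pi * spacing_wavelengths * np.cos(theta)  # d(phase)/d(theta)
    var_rad2 = 6 / (snapshots * snr * deriv**2 * n_elements * (n_elements**2 - 1))
    return var_rad2 * np.rad2deg(1.0) ** 2

print(crlb_single_source_deg2(0.0, n_elements=10, snapshots=200, snr_db=10))
```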
The results are presented using RMSE to evaluate the precision of DOA estimation, while accuracy, measured as the fraction of correctly estimated sources, is employed to assess the estimation of the NOS. RMSE is calculated using the following equation [26]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(\hat{\theta}_k - \theta_k\right)^2},$$

where $\hat{\theta}_k$ indicates the k-th predicted angle, $\theta_k$ represents the corresponding ground-truth angle, and K is the number of signals. Ground-truth and predicted angles are paired using a nearest-neighbor criterion, so each predicted angle is compared with its closest ground-truth angle; this avoids penalizing arbitrary orderings and yields a more faithful (and lower) RMSE. When the number of predicted sources does not match the number of ground-truth sources, special handling is required for the RMSE computation. Table 2 provides an example to demonstrate how these cases are addressed.
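A minimal sketch of this pairing-based RMSE follows, using greedy nearest-neighbor matching; the special mismatch handling of Table 2 is not reproduced here.

```python
import numpy as np

def paired_rmse(pred_deg, true_deg):
    """RMSE after nearest-neighbor pairing of predicted and ground-truth DOAs.

    Assumes equal counts; mismatched counts require the special handling
    described in Table 2 (not reproduced here).
    """
    remaining = list(true_deg)
    sq_errors = []
    for p in sorted(pred_deg):
        # greedily match each prediction to its closest unused ground truth
        j = int(np.argmin([abs(p - t) for t in remaining]))
        sq_errors.append((p - remaining.pop(j)) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))

print(paired_rmse([10.2, -29.7], [-30.0, 10.0]))  # pairs 10.2<->10.0, -29.7<->-30.0
```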
For test set T1, we report results in Table 3. As can be seen, the proposed method achieves the best performance among the competitors, while MNOMP yields the worst RMSE and accuracy. From the perspective of efficiency, we evaluate methods based on inference time, computational complexity, and number of parameters. In terms of inference time, deep learning methods are noticeably slower than traditional approaches such as MUSIC, R-MUSIC, and MNOMP. When measuring computational complexity in Mega-FLOPS (MFLOPS), these traditional methods again prove more efficient than deep learning approaches; among the latter, DNNDOA has the highest computational complexity, reaching 2.43 MFLOPS. Although our proposed method has significantly more parameters than MUSIC, R-MUSIC, and MTAPC, Figure 3 demonstrates its superior performance in both RMSE and accuracy. Interestingly, DNNDOA, which has a parameter count similar to our model's, performs worst in terms of accuracy. In Figure 4, the RMSE is decomposed by number of sources and varying SNR. Notably, the RMSE for three sources, shown in panel (c), is higher than that for one and two sources. Among the algorithms evaluated, DNNDOA exhibits the greatest sensitivity to high SNR values.
In the T2 test set (see Table 4), performance declines for all methods as the number of snapshots shrinks. Our method outperforms the others, with RMSE and accuracy of 11.81 and 47.04 at the smallest snapshot count, compared to 0.32 and 99.58 at the largest. In the T3 test set, as seen in Table 5, the methods exhibit comparable RMSE performance, with the exception of DNNDOA. MTAPC achieves the lowest error, followed by R-MUSIC and MUSIC, with our method close behind. This suggests that the other methods generalize slightly better to noise levels not covered in the training set.
The performance on the T4 test set is presented in Table 5. As before, the proposed method clearly outperforms the other methods. In a broader context, comparing these results with those in Table 3 shows that off-grid angles have a minimal impact on the performance of the methods under consideration. Analyzing the results in Table 6 for the T5 test set, the proposed method's RMSE remains relatively stable across varying snapshot lengths. In fact, our method outperforms the competitors in most cases, the exception being the RMSE at one of the snapshot settings. It is worth noting the stark underperformance of MUSIC and R-MUSIC in estimating the NOS when dealing with fewer than 500 snapshots, with rates as low as 0.00%. In Table 7, the proposed method achieves the lowest RMSE and high accuracy across all phases, proving robust against phase variability, while other methods such as DNNDOA and MNOMP display variable performance with higher RMSE, particularly at the 45° and 90° phases. In Table 8, the proposed method excels across the frequency spectrum, especially at higher frequencies (5 GHz and 10 GHz), where it achieves an RMSE of 18.56 and nearly 93% accuracy. While MUSIC and R-MUSIC maintain high accuracy, they show slightly higher RMSE, indicating some limitations at high frequencies. Table 9 shows that the proposed method also leads in handling different modulation types, particularly FM and PSK, achieving low RMSE and high accuracy.
The boxplots in Figure 5 illustrate the CRLB distributions for the different test sets and snapshot counts. In the first plot (T1, T3, T4), the CRLB values are concentrated within a narrow band of degrees squared, with T3 showing a larger spread that indicates more variability in estimation error. In the second plot (T2), as the number of snapshots increases, the CRLB decreases significantly, reflecting improved estimation accuracy with more data, and at 1000 snapshots it is concentrated in the lowest range. The third plot (T5) follows a similar trend, though with higher overall CRLB values than T2, indicating more challenging estimation conditions for T5. Overall, more snapshots lead to a lower and narrower CRLB, highlighting improved performance.
Comparing the CRLB values with those achieved by the proposed method on the various test sets, we can observe that as the snapshot count increases, the RMSE of the proposed DOA estimation method converges toward the CRLB, indicating more efficient performance with larger data samples. At low snapshot counts (5 and 100), the RMSE is notably higher than the CRLB, reflecting increased estimation uncertainty with limited data. However, with more snapshots (200, 500, and especially 1000), the RMSE closely approaches the CRLB, demonstrating the method's ability to achieve near-optimal accuracy when sufficient data are available. For instance, with 1000 snapshots, the RMSE in T2 is close to the corresponding CRLB, and the same holds for T5.
To obtain a scenario-independent comparison of each method's relative performance, we normalize each method's error by the CRLB of the corresponding scenario. Since the CRLB represents the best performance achievable in a given scenario, dividing the squared error by this bound yields a normalized measure that is independent of the specific scenario's characteristics. This normalization minimizes the impact of scenario variability on the RMSE, reducing the overall uncertainty when averaging performance across scenarios. The result is an effectiveness metric for each method, defined as the ratio of RMSE to CRLB averaged over all scenarios. As seen in Figure 6, this normalized view more accurately reflects the performance disparities among estimators, making evident how each one approaches or deviates from the theoretical CRLB limit.
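A minimal sketch of this effectiveness metric follows, assuming the squared-error-over-bound form stated above; whether each scenario's ratio uses MSE/CRLB or RMSE/CRLB is left to the paper's definition, and the numbers in the usage line are hypothetical.

```python
import numpy as np

def effectiveness(rmse_per_scenario, crlb_per_scenario):
    """Average CRLB-normalized error over scenarios.

    Divides each scenario's squared error (RMSE^2) by that scenario's CRLB,
    so a value of 1.0 means the estimator attains the bound in every scenario.
    """
    rmse = np.asarray(rmse_per_scenario, dtype=float)
    crlb = np.asarray(crlb_per_scenario, dtype=float)
    return float(np.mean(rmse**2 / crlb))

# hypothetical per-scenario RMSEs and CRLBs for one method over three scenarios
print(effectiveness([0.4, 0.6, 1.1], [0.1, 0.2, 0.9]))
```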