1. Introduction
Direction of Arrival (DOA) estimation is a crucial task in array signal processing research. Many excellent estimation methods, such as the multiple signal classification (MUSIC) algorithm [
1] and the estimating signal parameters via rotational invariance techniques (ESPRIT) algorithm [
2], have been proposed and widely applied in modern radar, sonar, and wireless communication systems. While traditional scalar arrays have achieved a relatively mature state in DOA super-resolution angle estimation, polarization-sensitive arrays (PSAs) offer notable advantages in practical applications. Due to their strong resistance to interference and high separation probability, PSAs have garnered considerable interest in signal processing research over the past few decades [
3,
4,
5]. At the same time, various algorithms have been explored in depth for coherent polarization–DOA estimation, including polarization smoothing (PS) [
6], improved polarization smoothing [
7], signal-subspace fitting [
8,
9], and alternating projection (AP) [
10]. In general, the aforementioned algorithms construct parametric models for signals or antenna arrays, and thus can be categorized as “model-driven”. Yet, the performance of model-based methods deteriorates rapidly when the actual received signals significantly deviate from the predetermined model, particularly in the presence of factors such as a low signal-to-noise ratio (SNR). Moreover, existing algorithms often fail to account for the real-time requirements, which are critical in modern radar, wireless communication, and other applications where polarization–DOA estimation must frequently operate in high-speed, real-time environments.
Therefore, to address such issues, various data-driven learning methods have emerged, namely neural networks and support vector machines (SVM) [
11,
12,
13]. Researchers have applied deep learning techniques to radar parameter estimation, resulting in significant improvements in estimation accuracy [
14,
15,
16]. Meanwhile, with the rapid development of deep learning, coherent DOA estimation algorithms based on deep learning have gradually received attention. Various deep learning-based methods have been proposed to address the limitations of model-driven approaches. Some methods include defect elimination algorithms [
17], unsupervised learning strategies [
18], and phase position learning [
19,
20,
21]. These methods outperform traditional algorithms such as spatial smoothing and subspace fitting, particularly in complex environments. For instance, in one study [
22], deeper data features of coherent signals were explored by introducing angle separation learning schemes. A more suitable learning model was established that overcame the limitations of traditional methods and improved the performance of coherent DOA estimation. At the same time, models such as the multitask autoencoder [
23] and few-shot learning [
24] have demonstrated superior performance in low-SNR scenarios, handling fewer snapshots and array defects more effectively than typical super-resolution algorithms. These advancements offer new possibilities and directions for radar parameter estimation. Nonetheless, research on 2-D coherent polarization–DOA estimation using deep learning remains limited. In addition, although simulations show that a multilayer perceptron (MLP) can perform coherent polarization–DOA estimation, its fully connected structure may limit its effectiveness: the MLP's lack of position and time modeling mechanisms may hamper the extraction and fusion of multi-dimensional information in the spatial-polarization domain.
To address these issues, a high-precision 2-D coherent polarization–DOA estimation method based on a sequence-embedding fusion (SEF) transformer is proposed for the first time. Initially, inspired by natural language processing (NLP) techniques [
25], the multi-dimensional data of the spatial-polarization domain, represented by the covariance matrix, is translated into textual tasks within the transformer model [
26,
27]. This transformation leverages the multi-head self-attention mechanism of the transformer, allowing it to capture the intricate multi-dimensional features of the spatial-polarization domain. Additionally, this paper introduces the SEF module, which integrates a convolutional neural network (CNN) with local information extraction capabilities and a dimension transformation function. This integration enhances the correlation within the spatial-polarization domain across different spatial and temporal sequences. The model effectively integrates spatial-polarization domain features from the covariance matrix, improving its ability to capture correlations. In addition, in order to improve the expressiveness and generalization ability of the model, we designed a multi-task parallel output model and a multi-task weighted loss function [
28,
29]. Simulation results show that the proposed method improves estimation accuracy compared to traditional model-driven algorithms such as the generalized subspace fitting (GSF) algorithm. The method is also superior to classical data-driven methods such as the multilayer perceptron (MLP) in estimation accuracy and generalization ability. Moreover, the proposed method makes significant advances in real-time performance and efficiency, and future research will further explore reconfigurable digital architectures as a key direction. By leveraging reconfigurable hardware architectures, such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs), researchers can develop polarization–DOA estimation systems capable of real-time operation. These architectures offer flexible hardware resource allocation, which not only enhances processing speed but also significantly reduces power consumption, thereby meeting the demands of modern signal processing [
30,
31].
The rest of this paper is structured as follows:
Section 2 introduces the signal model, including the detailed mathematical representation of the array model and the received signals.
Section 3 presents the core methodology of the proposed sequence-embedding fusion (SEF) transformer for polarization–DOA estimation, including the transformer model, data preprocessing, and the SEF module for feature fusion.
Section 3 also explains the optimization strategies for the model, such as omitting positional encoding and implementing multi-task parallel processing, and introduces the angle pairing method and the principles for calculating model efficiency.
Section 4 details the numerical simulations and experiments conducted to evaluate the performance of the proposed method. Finally,
Section 5 concludes the paper by summarizing the key findings and outlining potential directions for future research.
2. Signal Model
In this paper, as shown in
Figure 1, a uniform circular array (UCA) of radius r composed of M co-point biorthogonal short dipoles is adopted, and the two channels correspond to the directions of the x and y axes, respectively. Let the azimuth and elevation angles of the n-th incident signal be θn and φn, and its auxiliary polarization angle and polarization phase difference be γn and ηn. The spatial steering vector of the corresponding signal can then be expressed as:
where pm denotes the position coordinate of the m-th dipole. The polarization steering vector for the n-th signal of the array can be expressed as:
Since each array element uses a co-point electromagnetic vector sensor, the manifold matrix consisting of the N signals can be expressed as:
where ⊗ denotes the Kronecker product. When the incident signals are coherent, the signal can be expressed as:
where the complex gain of each coherent signal is a complex constant with respect to the reference signal, and T is the number of snapshots. The received signal can be expressed as:
where the noise term is an additive white Gaussian noise vector and the attenuation coefficients form an N-dimensional vector. The covariance matrix of the received signal can be expressed as:
where the latter term is the covariance matrix associated with the different polarization information.
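As an illustration of this step, the covariance matrix can be estimated from T snapshots as the sample average of x(t)x(t)^H. The sketch below uses a hypothetical 6-channel array with made-up steering vectors (not the paper's UCA configuration) to show the conjugate-symmetric structure that is exploited later during preprocessing:

```python
import numpy as np

def sample_covariance(X):
    """Estimate R = E[x x^H] from T snapshots.

    X: complex array of shape (M_ch, T), one column per snapshot
    (M_ch = number of elements x polarization channels)."""
    _, T = X.shape
    return X @ X.conj().T / T

# Toy example: two coherent sources (same waveform, scaled copies)
rng = np.random.default_rng(0)
a1 = np.exp(1j * np.pi * np.arange(6) * 0.3)  # hypothetical steering vector 1
a2 = np.exp(1j * np.pi * np.arange(6) * 0.7)  # hypothetical steering vector 2
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = np.outer(a1, s) + 0.8 * np.outer(a2, s)   # coherent: both carry waveform s
X += 0.01 * (rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape))
R = sample_covariance(X)
print(R.shape, np.allclose(R, R.conj().T))    # Hermitian covariance matrix
```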
3. SEF Transformer Model for Polarization–DOA Estimation
Building on the signal model, this section aims to process the multi-dimensional feature data derived from coherent signals and their spatial-polarization domain information. However, the fully connected structure of the multilayer perceptron (MLP) may limit its effectiveness in estimating 2-D coherent polarization–DOA, because the absence of mechanisms for modeling position and time impedes the extraction of spatial-polarization domain features. Efforts were therefore made to transform the challenge of processing multi-dimensional data
in the spatial-polarization domain into a textual inference task within the transformer model [
26,
27], in accordance with the principles of NLP [
25]. Since transformers typically process input in the form of tensors containing sequence information, the proposed model, as illustrated in
Figure 2, utilizes a preprocessed covariance matrix. This matrix is divided into S-dimensional sequence data and E-dimensional embedding data, which serve as inputs to the model.
The generated signal can be represented as L-dimensional data: when a signal angle falls within a specific interval, the corresponding position is set to 1, while all other positions are set to 0. Assuming there are N incoming waves, the signal is represented by an L-dimensional vector in which N positions are 1 and the remaining positions are 0; this representation is known as N-hot encoding. Treating the task as a classification problem over N-hot labels would require covering all C(L, N) possible combinations, so one-hot encoding is used as a substitute for N-hot encoding. The model's output consists of four parallel angle predictions in one-hot format. Here, B denotes the batch size, S the sequence dimension, and E the embedding dimension.
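A minimal sketch of the label construction described above, assuming an illustrative 90-bin grid with a one-degree interval (the paper's actual angle ranges and quantization interval are given in its tables):

```python
import numpy as np

def one_hot_label(angle, lo, hi, step):
    """Quantize an angle into an L-dimensional one-hot vector.

    The grid [lo, hi) is split into L = (hi - lo) / step intervals;
    the bin containing `angle` is set to 1, all others to 0.
    (Grid bounds and step here are illustrative values.)"""
    L = int(round((hi - lo) / step))
    label = np.zeros(L)
    idx = min(int((angle - lo) / step), L - 1)  # clamp to the last bin
    label[idx] = 1.0
    return label

lab = one_hot_label(37.2, 0.0, 90.0, 1.0)  # 90 one-degree bins
print(int(lab.argmax()), int(lab.sum()))   # → 37 1
```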
The following section presents an overview of the fundamental concepts of the data preprocessing, sequence-embedding fusion, and optimization of the model within the context of the SEF transformer model architecture.
3.1. Data Preprocessing
3.1.1. Input Preprocessing
Considering that the covariance matrix is conjugate symmetric, it is generally sufficient to utilize only its upper or lower triangular portion, i.e., the elements on and above (or below) the main diagonal, as input.
Since the triangular portion contains complex numbers (except for the diagonal elements), it is necessary from a practical modeling perspective to separate these complex numbers into their real and imaginary components. These components are then normalized and used as input to the model, where Re(·) and Im(·) denote the operations of taking the real and imaginary components, respectively, and ‖·‖ denotes the ℓ2-norm of the vector.
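The preprocessing above can be sketched as follows; the concatenation order of the real and imaginary parts is an assumption for illustration:

```python
import numpy as np

def preprocess_covariance(R):
    """Vectorize the upper-triangular part of the Hermitian covariance
    matrix, split it into real/imaginary components, and l2-normalize
    the resulting real feature vector (a sketch of the preprocessing
    described above; the exact element ordering may differ)."""
    iu = np.triu_indices(R.shape[0])
    upper = R[iu]                               # upper triangle incl. diagonal
    feat = np.concatenate([upper.real, upper.imag])
    return feat / np.linalg.norm(feat)          # unit l2-norm input vector

R = np.array([[2.0, 1 + 1j], [1 - 1j, 3.0]])    # toy 2x2 Hermitian matrix
x = preprocess_covariance(R)
print(x.shape, round(float(np.linalg.norm(x)), 6))  # → (6,) 1.0
```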
3.1.2. Training and Testing Preprocessing
One of the main advantages of layer normalization [
32] is its ability to mitigate the effects of internal covariate shift in neural networks. By using consistent means and variances during both the training and testing phases, layer normalization helps maintain stability in the model's data distribution, thereby enhancing the model's generalization ability. Additionally, this technique stabilizes the training process and accelerates convergence by keeping the inputs to the activation function within a small range. This helps prevent gradient vanishing, which further improves the model's sensitivity to input data. During training, layer normalization normalizes over the feature dimension: for a feature input of batch size B, the mean μ and variance σ² of each sample are calculated over its features. The standardized value is x̂ = (x − μ)/√(σ² + ε), where ε is a considerably small constant that prevents the denominator from being zero. The scaling amount γ and offset amount β are learnable parameters that rescale and offset x̂ to obtain the final output.
During the testing phase, the input is directly normalized using the parameters learned during training: y = γ ⊙ x̂ + β, where ⊙ denotes element-wise multiplication, μ and σ are the mean and standard deviation calculated during the training phase, and γ and β are the parameters learned through the gradient descent algorithm during training.
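A NumPy sketch of the layer-normalization computation described above, with ε, γ, and β as defined in the text (the input values are illustrative):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization over the feature dimension: each sample is
    standardized by its own mean/variance, then rescaled by the
    learnable parameters gamma and beta."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized value
    return gamma * x_hat + beta            # element-wise scale and shift

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 0.0, -10.0, 0.0]])    # batch B=2, feature dim 4
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.round(y.mean(axis=1), 6))         # per-sample mean ≈ 0
```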
3.2. SEF Module for Feature Fusion
As discussed in this section, it is necessary to convert the preprocessed input into the sequence-embedding tensor form used in textual inference, yielding a multi-dimensional data matrix composed of S-dimensional sequences and E-dimensional embeddings. To reduce the influence of internal covariate shift, preprocessing, layer normalization, and linear transformation operations are then performed.
In the multi-head attention mechanism of the transformer model, let H represent the number of attention heads. For a given multi-head attention layer, the inputs consist of the query Q, the key K, and the value V, each obtained from the input through a learned weight matrix. Q is the query, a representation of the input sequence; K and V are the keys and values, respectively, used to calculate attention weights and aggregate features. Q and the transpose of K are first multiplied, as follows:
Then, a scaling operation is performed by dividing by the square root of dk, where dk is the dimension of the keys. Subsequently, a masking operation masks certain regions that are not utilized for attention by multiplying them by 0. A softmax mapping then projects the matrix weights into attention weights. Finally, the attention weights are multiplied by V to obtain the output.
Introducing this multi-head self-attention mechanism into the subsequent calculation leverages the fact that each attention head can learn distinct feature representations of the incoming signals. This diversity in feature learning aids the model in gaining a more nuanced understanding of signal distribution in space. In the context of 2-D coherent polarization–DOA estimation, which often involves multiple polarization channels, each channel offers a unique perspective on the signal's polarization characteristics. The multi-head self-attention mechanism enhances the model's ability to associate and integrate features from different channels, thereby improving the capture of the signal's polarization characteristics.
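The scaled dot-product attention described above can be sketched in NumPy as follows; the shapes B, S, and E are illustrative, and replacing masked scores with a large negative value before the softmax is one common realization of the masking step:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with an optional
    0/1 mask that zeroes out positions not used for attention."""
    d_k = K.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (B, S, S)
    if mask is not None:
        scores = np.where(mask == 0, -1e9, scores)    # masked -> ~0 weight
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

B, S, E = 2, 4, 8
rng = np.random.default_rng(1)
Q = rng.standard_normal((B, S, E))
K = rng.standard_normal((B, S, E))
V = rng.standard_normal((B, S, E))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, bool(np.allclose(w.sum(-1), 1.0)))  # weights sum to 1
```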
Given the spatial-polarization feature correlations in the spatial-polarization domain, this paper incorporates a CNN with local information extraction capabilities into the SEF module. As illustrated in
Figure 3, the CNN processes the queries, keys, and values to produce convolved outputs, which are then fed into the multi-head attention mechanism. This approach enhances the correlation information between sequence and embedding in the spatial-polarization domain through the attention mechanism. Additionally, the multi-head attention mechanism integrates global information from these outputs, which improves the model's ability to capture spatial-polarization domain relationships. This process contributes to better performance in coherent polarization–DOA estimation within the spatial-polarization domain. The specific operational process of the multi-head self-attention fusion CNN is detailed in Algorithm 1.
Algorithm 1 Multi-Head Self-Attention Fusion CNN Process |
- 1: Input: Q, K, V (query, key, value matrices)
- 2: Compute Q, K, V from the input using learnable weight matrices
- 3: Perform linear mapping of Q, K, V
- 4: Apply scaled dot-product attention (SDPA)
- 5: Perform parallel operations across heads; for the i-th head, compute its attention output
- 6: Concatenate the outputs of all heads
- 7: Perform a final linear mapping on the concatenated output |
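A minimal NumPy sketch of Algorithm 1, using random placeholder weights rather than trained ones, and a small smoothing kernel as a stand-in for the CNN's local filtering of Q, K, and V:

```python
import numpy as np

def conv1d_seq(X, kernel):
    """Depthwise 1-D convolution along the sequence axis (same padding),
    standing in for the CNN that extracts local information."""
    k = len(kernel); pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(Xp[:, e], kernel, mode="valid")
                     for e in range(X.shape[1])], axis=1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fused_multi_head_attention(X, H, kernel, rng):
    """Algorithm 1 sketch: conv-filter Q, K, V, then H-head scaled
    dot-product attention, concatenation, and a final linear map.
    Weights are random placeholders, not trained parameters."""
    S, E = X.shape
    d = E // H
    Wq, Wk, Wv, Wo = (rng.standard_normal((E, E)) / np.sqrt(E)
                      for _ in range(4))
    Q, K, V = (conv1d_seq(X @ W, kernel) for W in (Wq, Wk, Wv))
    heads = []
    for i in range(H):
        q, k_, v = (M[:, i * d:(i + 1) * d] for M in (Q, K, V))
        w = softmax(q @ k_.T / np.sqrt(d))   # per-head attention weights
        heads.append(w @ v)
    return np.concatenate(heads, axis=1) @ Wo

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))              # S=6 sequence, E=8 embedding
out = fused_multi_head_attention(X, H=2,
                                 kernel=np.array([0.25, 0.5, 0.25]), rng=rng)
print(out.shape)                             # → (6, 8)
```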
In addition, as shown in
Figure 4, the aim was to introduce representations along different dimensions by combining a dimensional transformation with the CNN's local information extraction capability; the resulting Q, K, and V outputs of the SEF module serve as inputs to the multi-head attention mechanism, which can be seen as introducing sequence information from different spaces and time series. The model thus enhances spatial-polarization domain correlation from different spatial and temporal perspectives, enabling it to better fuse the spatial-polarization domain features of the covariance matrix processed in this paper. This improves the model's understanding and utilization of spatial-polarization correlation information, i.e., it better understands the contextual input and improves the model's expressive power, thereby improving its ability to model sequence diversity. The specific operation of the SEF multi-head self-attention is shown in Algorithm 2.
The aforementioned SEF module was added to each multi-head self-attention module to fuse the multi-dimensional features in the spatial-polarization domain, resulting in improved feature integration. This enhancement subsequently boosts the model's ability to understand and utilize the overall sequence relationships. The comprehensive modeling framework incorporating these elements is illustrated in
Figure 5.
Algorithm 2 SEF Multi-Head Self-Attention Process |
- 1: Step 1: Perform dimension conversion between the sequence and embedding dimensions
- 2: Step 2: Perform feature fusion of the sequence and embedding dimensions, where CNN represents a convolution operation
- 3: Step 3: Reshape the sequence and embedding dimensions to complete the feature fusion
- 4: Step 4: Implement the self-attention mechanism after linear-layer mapping |
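A minimal sketch of the dimension-transposition idea in Algorithm 2: the same local convolution is applied along both the sequence axis and, after a transpose, the embedding axis, and the two views are combined. Averaging the two views and the kernel values are illustrative assumptions, not the paper's exact fusion rule:

```python
import numpy as np

def sef_fuse(X, kernel):
    """Algorithm 2 sketch: fuse local features along both the sequence
    and embedding dimensions by transposing, convolving, and
    transposing back."""
    def smooth(M):
        # same-padding depthwise convolution along axis 0
        k = len(kernel); pad = k // 2
        Mp = np.pad(M, ((pad, pad), (0, 0)), mode="edge")
        return np.stack([np.convolve(Mp[:, c], kernel, mode="valid")
                         for c in range(M.shape[1])], axis=1)
    fused_seq = smooth(X)        # fuse along the sequence dimension
    fused_emb = smooth(X.T).T    # transpose, fuse along embedding, restore
    return 0.5 * (fused_seq + fused_emb)

X = np.arange(24, dtype=float).reshape(6, 4)  # toy S=6, E=4 input
Y = sef_fuse(X, np.array([0.25, 0.5, 0.25]))
print(Y.shape)                                # → (6, 4)
```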
The goal was to enhance the model’s expressive capability by learning the residual value between the input and output, as follows:
where ⊕ is used to indicate residual connections. Finally, layer normalization and reconstruction vector operations are performed to complete the output of the penultimate layer of the model, as follows:
3.3. Optimization of Model
3.3.1. Positional Encoding Omission
In the transformer model, the positional embedding matrix and the word embedding matrix are usually combined by addition, yielding an input embedding matrix that contains the positional embeddings.
Equation (
25) illustrates that, in the input phase of the model, the word embedding matrix and the positional embedding matrix are combined to produce an input that incorporates both semantic and positional information. However, the primary focus of this paper is the orientation information within the multi-dimensional features of the spatial-polarization domain. This differs from the positional encoding used in NLP tasks, which captures the positional relationships between words in a sentence. In this context, the covariance-based input does not heavily rely on positional information [
33], making positional encoding unnecessary for this model. Furthermore, incorporating encoding for longer sequence positions introduces additional parameters, which can increase computational and storage demands. Omitting positional encoding simplifies the model structure and significantly reduces the number of parameters, thereby alleviating the computational and storage burden. This reduction can improve both the training and inference efficiency of the model and make it easier to understand and debug.
3.3.2. Multi Task Parallel Processing
As discussed in this section, the shared-parameter CNN [
29] mechanism within the SEF module was leveraged to facilitate multi-dimensional information fusion in the spatial-polarization domain. This approach preserves the correlation between multi-dimensional features across multiple tasks. A parallel output structure for the multiple tasks was designed and a multi-task weighted loss function was implemented to enhance the model's expressive power. This optimization allows the model to better fit the training data and improves its predictive performance and inference capabilities on unseen data, thereby enhancing the overall generalization ability of the model.
The following describes the parallel processing and updating approach for the four multi-dimensional tasks designed in this study. The output of the second-to-last layer of the SEF transformer model is fed into four parallel output layers, each with its own weight parameters in the final output layer, producing one output item per task.
Meanwhile, based on previous research [
8], which indicated that polarization angle estimation typically performs worse than spatial angle estimation, a multi-task weighted cross-entropy loss function was designed in the present study to address this issue. The multi-task loss function [
28] is structured to account for the varying performance across different tasks. The expression for the loss function for a batch is given by:
where B is the batch size, and the label and predicted values of the i-th classification task under the j-th sample define a cross-entropy term for each of the four tasks, corresponding respectively to the elevation angle, azimuth angle, auxiliary polarization angle, and polarization phase difference; each term is scaled by its own loss-function weight.
Finally, the aforementioned multi-task weighted loss function is used to back-propagate through the model and compute the gradients. The model parameters are then updated using the Adam optimizer.
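The multi-task weighted loss described above can be sketched as a weighted sum of per-task cross-entropy terms; the weight values and the two-task example below are illustrative:

```python
import numpy as np

def weighted_multitask_ce(preds, labels, weights):
    """Multi-task weighted cross-entropy: one CE term per task
    (elevation, azimuth, auxiliary polarization angle, polarization
    phase difference), each scaled by its own weight. The weight
    values here are illustrative, not the paper's."""
    total = 0.0
    for p, y, w in zip(preds, labels, weights):
        p = np.clip(p, 1e-12, 1.0)                        # avoid log(0)
        total += w * (-(y * np.log(p)).sum(axis=1).mean())  # batch-mean CE
    return total

# Two tasks shown for brevity: batch of 2 samples, 3 classes each
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
y1 = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
loss = weighted_multitask_ce([p1, p1], [y1, y1], weights=[1.0, 2.0])
print(round(float(loss), 4))  # → 0.8697
```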
3.4. Angle Pairing for SEF Transformer
As mentioned above, a multi-task parallel module was designed to maintain the correlation between multi-dimensional features and multiple tasks by decoupling the problem into several parallel sub-problems. While this approach simplifies the overall model by reducing its complexity, it necessitates an additional step for angle pairing, since the outputs from the multiple tasks consist of multi-class label results specific to their respective tasks. To address this, the GSF algorithm [
8] is utilized in the angle pairing process. This process begins with representing the received data according to the signal reception model as:
where the three factors are the spatial response matrix, the polarization response matrix, and the polarized source signal matrix, respectively. The signal covariance matrix is:
When some of the N signals are coherent, the rank of the signal covariance matrix falls below N, and there are correspondingly fewer large eigenvalues. The matrix composed of the eigenvectors corresponding to these large eigenvalues is taken as the signal subspace. In an ideal scenario, the signal subspace is a linear subspace of the array manifold space, so there must exist a full-rank matrix such that:
where the two factors are the signal subspace and the transformation matrix, respectively. Moreover, according to Equation (30), the least-squares solution of the transformation matrix is obtained by applying the left inverse, i.e., the pseudo-inverse, of the array manifold matrix to the signal subspace.
When there is noise in space, the signal array manifold is no longer exactly equal to the signal subspace. A fitting function can then be constructed, and the spatial parameters of the signal are estimated by solving Equation (32), in which the fitting error is measured by the Frobenius norm of a matrix. Substituting the least-squares solution of the transformation matrix into Equation (32) yields Equation (33). The estimation of the spatial angles can then be achieved through a spatial-domain search using Equation (33) over the array manifold composed of the N signals, which requires a 2N-dimensional search.
Similarly, the polarization parameters of the signal are estimated via Equation (34): there exists a full-rank matrix such that the polarization manifold matrix, multiplied by this transformation matrix, spans the signal subspace. The least-squares solution of the transformation matrix is again obtained from the pseudo-inverse of the manifold matrix, and the polarization parameters can be estimated by Equation (36), into which the spatial-angle estimates obtained from Equation (33) are substituted. A polarization-domain search through Equation (37) then yields the estimates of the polarization angles, which completes the angle pairing process between the spatial and polarization angles.
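To make the subspace-fitting principle behind the angle pairing concrete, the sketch below applies it to a simplified scalar uniform linear array with a one-dimensional angle search. The paper's GSF operates on the polarized manifold with a 2N-dimensional search, so this is an illustration of the principle, not the full method:

```python
import numpy as np

def subspace_fit_1d(R, n_sig, steer, grid):
    """Simplified subspace fitting: take the n_sig dominant eigenvectors
    of R as the signal subspace Us, then pick the candidate angle whose
    manifold best fits Us in the least-squares sense."""
    _, eigvec = np.linalg.eigh(R)
    Us = eigvec[:, -n_sig:]                       # dominant eigenvectors
    best, best_cost = None, np.inf
    for th in grid:
        A = steer(np.atleast_1d(th))              # candidate manifold
        T = np.linalg.pinv(A) @ Us                # least-squares transform
        cost = np.linalg.norm(Us - A @ T, "fro")  # fitting residual
        if cost < best_cost:
            best, best_cost = th, cost
    return best

M = 8  # hypothetical 8-element half-wavelength ULA
steer = lambda th: np.exp(
    1j * np.pi * np.outer(np.arange(M), np.sin(np.deg2rad(th))))
a = steer(np.array([20.0]))
R = a @ a.conj().T + 0.01 * np.eye(M)             # one source at 20 degrees
est = subspace_fit_1d(R, n_sig=1, steer=steer, grid=np.arange(-60, 60.5, 0.5))
print(est)  # → 20.0
```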
3.5. Model Efficiency
The computational complexity expressions for the linear transformation layer, layer normalization, CNN layer, multi-head self-attention, feedforward neural network, residual connection, parallel linear transformation layers, and generalized subspace fitting for angle pairing can be represented as follows:
where S is the sequence dimension of the input; E is the embedding dimension of the input; the linear layer additionally has an output embedding dimension; the CNN layer is parameterized by its convolution kernel size and its input and output channel counts; H is the number of heads in the multi-head self-attention mechanism; N is the number of sources; and the last layer has its own output dimension.
The formula for the model parameters of the linear transformation layer, layer normalization, CNN layer, multi-head self-attention, feed-forward neural network, and parallel linear transformation layer can be expressed as follows:
4. Numerical Simulation
To evaluate the effectiveness of the model, numerical simulations were conducted. The experimental array structure in this paper is a UCA; however, owing to the data-driven nature of the proposed method, the array structure is not limited to a UCA and can be extended to other array structures. In the simulations, the following methods were compared: SEF Trans, the transformer with sequence and embedding fusion; Trans, the transformer without sequence and embedding fusion; MLP, the multilayer perceptron; and GSF, the generalized subspace fitting algorithm. The simulation dataset consists of 2,000,000 data points, with 80% allocated for training, 10% for validation, and 10% for testing. Two incoming signal sources are assumed, and the multi-task weighted loss function incorporates four angle weights, one per estimated angle. The spatial and polarization angle ranges, the quantization interval, and the remaining simulation parameters are listed in
Table 1 and
Table 2. The experimental environment for this study was Python (version 3.7, Python Software Foundation, Wilmington, DE, USA), while data generation was performed using MATLAB 2021b.
To compare the training convergence and performance of various algorithms, the SNR of this experiment was set to 15 dB and 200 snapshots were used. The proposed methods—SEF Trans, Trans, and MLP—were trained over different epochs. The training loss curves for these methods are shown in
Figure 6,
Figure 7 and
Figure 8. As depicted, the SEF Trans method quickly achieved superior convergence compared to the other two methods and maintained relative stability after convergence. In this study, we employed the tree-structured Parzen estimator (TPE) method [
34] for hyperparameter optimization of the neural network. TPE is a Bayesian optimization technique that explores the hyperparameter space by constructing conditional probability models. In each trial, it selects the most promising hyperparameters, dynamically balancing exploration and exploitation and thereby reducing the number of trials required; this property makes it particularly well-suited to high-dimensional and complex hyperparameter search problems. Our experimental results demonstrate that this approach led to satisfactory performance optimization.
To compare the testing accuracy of the various algorithms, the SNR was set to 15 dB with 200 snapshots, and a DOA estimate was counted as correct when its error fell within a fixed threshold. The test accuracy is computed as the fraction of the total number of test samples for which the errors in the two angles (the differences between the estimated and true angles) are both less than or equal to the error threshold, which is set to one degree in this experiment. Formally, an indicator function equals 1 if both angle errors are within the threshold and 0 otherwise, and the accuracy is the average of this indicator over all test samples.
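The accuracy metric above can be sketched directly; the per-sample errors below are made-up values:

```python
import numpy as np

def angle_accuracy(err_a, err_b, delta=1.0):
    """Fraction of test samples whose two angle errors are both within
    the threshold delta (one degree in this experiment)."""
    hit = (np.abs(err_a) <= delta) & (np.abs(err_b) <= delta)
    return hit.mean()

err_theta = np.array([0.2, 1.5, -0.8, 0.0])  # hypothetical per-sample errors
err_phi = np.array([0.9, 0.1, -2.0, 1.0])
acc = angle_accuracy(err_theta, err_phi)
print(acc)  # → 0.5 (only samples 1 and 4 pass both checks)
```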
A comparative simulation was conducted to assess the test accuracy of the proposed SEF Trans method against the other two comparison algorithms. As shown in
Figure 9,
Figure 10 and
Figure 11, the test accuracy curves reveal that the SEF Trans method achieved the highest DOA estimation accuracy with the fewest epochs. Additionally, the polarization phase difference, which involves processing signal phase information, is more susceptible to noise and interference. In contrast, the auxiliary polarization angle, relying on signal amplitude or other more robust features, demonstrated better estimation performance. This was confirmed during simulation testing, where the accuracy of the auxiliary polarization angle was found to be superior to that of the polarization phase difference, consistent with the observed characteristics [
8].
To verify the feasibility of omitting positional encoding (PE), an experiment was conducted with an SNR of 15 dB and 200 snapshots. A comparative analysis was performed on the accuracy of the SEF Trans method with and without positional encoding, with the error criterion set to less than or equal to one degree. As shown in
Figure 12, the results indicate that positional encoding does not significantly enhance performance. Therefore, it can be concluded that positional encoding can be omitted in this context.
To evaluate the model efficiency of the proposed method, the following table presents a comparison of different data-driven class methods. As shown in
Table 3, it is observed that the SEF Trans method without PE outperforms other methods in terms of computational complexity, number of parameters, and inference time. This suggests that the model efficiency of the SEF Trans method, when positional encoding is omitted, is superior to that of the other methods.
Table 4 shows the comparison of inference times for different algorithms in the task of coherent polarization–DOA estimation under the same GPU test environment. In this test, the model-driven algorithms search over fixed spatial and polarization angle ranges with a fixed search step. The experiment demonstrates that the SEF Trans method outperforms traditional algorithms such as the polarization smoothing-oblique projection [
6] and the generalized subspace fitting (GSF) [
8] methods in terms of inference time. This indicates that the SEF Trans method not only significantly reduces inference time but also exhibits higher real-time performance and computational efficiency on the same hardware platform, demonstrating its potential for real-time signal processing tasks.
To verify estimation performance under varying SNR conditions, the SNR was set from 0 to 21 dB in 3 dB increments, with 200 snapshots used for comparison. The RMSE and CRB simulation results for each method are shown in
Figure 13. As indicated in
Figure 13a,b, while the GSF algorithm exhibited slightly better estimation accuracy than the SEF Trans method when the SNR was between 18 and 21 dB, the SEF Trans method outperformed the Trans, MLP, and GSF algorithms when the SNR was between 0 and 15 dB.
Figure 13c,d further confirm that the proposed SEF Trans algorithm demonstrated superior estimation performance in both the spatial and polarization domains.
To evaluate estimation performance from the perspective of the number of snapshots, the experiment compared simulation performance under an SNR of 0 dB with snapshots ranging from 100 to 500 in increments of 100. The RMSE and CRB simulation results for each method are shown in
Figure 14. As depicted in the RMSE curves for both spatial and polarization angles, the proposed SEF Trans method consistently demonstrated significant advantages as the number of snapshots increased. This indicates that the proposed method maintained strong learning and solving capabilities for multi-dimensional feature data in the spatial-polarization domain, as described in the present paper.
To analyze the generalization ability of the proposed algorithm concerning SNR, the SNR for the training set was set from 0 to 21 dB in 3 dB increments, with 200 snapshots. The test set was set from 2 to 20 dB in 3 dB increments, resulting in a maximum SNR mismatch of 2 dB.
Figure 15 illustrates the RMSE results for DOA estimation of both spatial and polarization angles. Compared with other data-driven algorithms, the SEF Trans method demonstrated robust performance even with SNR mismatches between the training and testing data sets. Although the GSF algorithm showed higher accuracy than the MLP and Trans algorithms as SNR increased, the accuracy of the SEF Trans method was comparable to that of the GSF algorithm. This indicates that the proposed SEF Trans algorithm exhibited superior generalization performance.
To evaluate the generalization ability of the algorithm concerning the number of snapshots, the training set was set with an SNR of 15 dB and snapshots ranging from 100 to 500 in increments of 100. The test set had snapshots ranging from 150 to 450 in increments of 100, resulting in a maximum difference of 50 snapshots.
Figure 16 illustrates that, despite the mismatch in the number of snapshots between the training and testing datasets, the proposed algorithm maintained strong performance.
Figure 16a,b show that the spatial angle estimation accuracy of this algorithm, after generalization, was comparable to the GSF algorithm at higher SNR levels, and the proposed method outperforms the Trans and MLP algorithms in terms of accuracy. Additionally,
Figure 16c,d reveal that the generalized polarization angle estimation accuracy of the proposed algorithm generally surpassed that of the GSF algorithm. These results indicate that the proposed algorithm exhibited high generalization and robustness.