1. Introduction
Radar automatic target recognition (RATR) represents a pivotal advancement in military and civilian radar systems, enabling the extraction and analysis of specific features from radar echoes [1,2,3]. This process allows for the automatic recognition of targets based on their attributes, beyond merely tracking their speed and location. The adoption of RATR transforms radar systems from simple detection tools into integral components of advanced intelligence and reconnaissance frameworks, marking a significant leap toward fully autonomous and intelligent defense systems.
Methods for RATR can be primarily divided into two types: those based on narrow-band and wide-band radar systems. Using narrow-band information for RATR involves analyzing data with a limited frequency range, such as radar cross section (RCS) sequences [4,5,6] and Doppler or micro-Doppler signatures [7,8,9]. These methods have the advantage of requiring less data, which simplifies collection and enhances processing efficiency. However, since the target is usually smaller than the range resolution cell of narrow-band radar, RATR methods based on narrow-band radar often struggle to obtain detailed target information, which in turn limits the recognition rate. Wide-band radar systems, such as synthetic aperture radar (SAR) [10,11,12] and inverse synthetic aperture radar (ISAR) [13,14,15], use wide-band signals to obtain more detailed target information. With this richer detail, wide-band methods can improve recognition performance. However, the large data volume and complex imaging algorithms pose severe challenges in data handling and real-time processing. Thus, the efficiency of SAR and ISAR is constrained by their operational demands. A high-resolution range profile (HRRP) represents the projection of a target’s scattering onto the radar line of sight. It provides more detailed target information than RCS. Compared to SAR and ISAR images, HRRP is easier to store and process. Thus, HRRP-based RATR has received widespread attention.
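To make the line-of-sight projection concrete, the following minimal sketch synthesizes an HRRP from a few ideal point scatterers. It assumes a stepped-frequency waveform and an inverse FFT as the range-compression step; the function name `synthesize_hrrp`, the scatterer positions, and all radar parameters are illustrative assumptions rather than values from this paper.

```python
import numpy as np

C = 3.0e8  # propagation speed in m/s (approximate speed of light)


def synthesize_hrrp(ranges_m, amplitudes, f0_hz, df_hz, n_freq):
    """Return the HRRP magnitude of ideal point scatterers (hypothetical helper).

    Assumes a stepped-frequency radar: a scatterer at relative range r
    contributes a * exp(-j * 4 * pi * f * r / c) at every frequency step f.
    """
    freqs = f0_hz + df_hz * np.arange(n_freq)            # frequency steps, (N,)
    r = np.asarray(ranges_m, dtype=float)[:, None]       # shape (K, 1)
    a = np.asarray(amplitudes, dtype=complex)[:, None]   # shape (K, 1)
    response = np.sum(a * np.exp(-1j * 4.0 * np.pi * freqs * r / C), axis=0)
    # The inverse FFT over frequency compresses the echo into range cells.
    return np.abs(np.fft.ifft(response))


N_FREQ = 64
RANGE_CELL_M = 0.5                          # assumed range cell size
DF_HZ = C / (2.0 * N_FREQ * RANGE_CELL_M)   # frequency step giving that cell size
profile = synthesize_hrrp([5.0, 12.0], [1.0, 0.6], 10.0e9, DF_HZ, N_FREQ)

# Indices of the two strongest range cells, in ascending order.
strongest_cells = sorted(int(i) for i in np.argsort(profile)[-2:])
print(strongest_cells)
```

With a 0.5 m range cell, the scatterers placed at 5 m and 12 m fall into cells 10 and 24, so the one-dimensional profile directly encodes where along the line of sight the target scatters.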
Currently, HRRP-based RATR methods can be roughly categorized into two types: traditional methods [16,17,18] and deep learning methods [19,20,21]. Li et al. [22] first introduced a framework for HRRP-based RATR, encompassing data collection, preprocessing, feature extraction, and classifier design. Subsequent research has mainly concentrated on feature extraction, which can be divided into two main strategies: dimensionality reduction and data transformation. The dimensionality reduction approach reduces the volume of HRRP data using subspace or sparse methods [23,24]. The data transformation approach converts HRRP into frequency domain representations, such as bi-spectrum or spectrogram [25,26]. Although the features extracted by these methods are highly interpretable, their dependence on experience and inconsistent performance across different datasets constrain their applicability in real-world scenarios.
In recent years, deep learning methods have been widely used in HRRP-based RATR. They can be broadly categorized into three main types. The first type involves auto-encoder models, which leverage signal reconstruction to extract data features for recognition. Pan et al. [27] developed an HRRP target recognition approach utilizing discriminative deep auto-encoders to boost classification accuracy with a limited number of training samples. Du et al. [28] presented a factorized discriminative conditional variational auto-encoder designed to extract features resilient to variations in target orientation. Zhang et al. [29] proposed a patch-wise auto-encoder based on transformers (PwAET) to enhance performance under different noise conditions. These auto-encoder-based methods exhibit strong noise reduction abilities. However, their limited feature extraction capabilities and their requirement for high similarity between training and testing samples restrict their generalization ability.
The second type of method utilizes convolutional neural networks (CNNs), which enable the automatic extraction of local features from HRRP. Wan et al. [30] employed a one-dimensional CNN for processing HRRP in the time domain and a two-dimensional CNN for spectrogram features. Fu et al. [31] applied a residual CNN architecture for ship target recognition. Chen et al. [32] introduced a target-attentional convolutional neural network (TACNN) to improve target recognition effectiveness. Although CNNs are effective in image processing, they struggle to extract global features due to their restricted receptive fields.
The third type comprises time-sequential models, including recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and Transformers, which are employed to capture temporal features between different range cells. In 2019, Xu et al. [33] introduced a target-aware recurrent attentional network (TARAN) to exploit temporal dependencies and highlight informative regions in HRRP. Zhang et al. [34] proposed a deep model combining a convolutional long short-term memory (ConvLSTM) network with a self-attention mechanism for polarimetric HRRP target recognition. Pan et al. [35] designed a stacked CNN–Bi-RNN with an attention mechanism (SCRAM) to enhance generalization in scenarios with limited samples. In 2022, Diao et al. [36] pioneered the application of the Transformer for HRRP target recognition. These time-sequential models treat HRRP as sequential data. However, HRRP reflects the scattering distribution of the target, so it is more appropriate to consider it as a one-dimensional profile rather than sequential data. Consequently, time-sequential models may overlook the spatial features of targets presented within HRRP.
Currently, most research focuses on extracting the spatial features of HRRP, and few studies address its polarimetric features. Polarimetric radars transmit and receive electromagnetic waves in various polarimetric states, which significantly enhances the radar’s capability to obtain information about the target’s structure. Polarimetric HRRP integrates the advantages of polarimetric radars and wide-band radars, thereby improving the accuracy and reliability of RATR systems. Long et al. [37] applied
decomposition along the slow time dimension in a dual polarimetric HRRP sequence and designed a six-zone
plane for classification. This is a traditional method of using polarimetric features for classification. Some studies have also integrated polarimetric information into deep networks. For instance, in [38], the amplitude of different polarization channels was used for dual-band polarimetric HRRP target recognition, but this approach might lose the rich phase information present in polarimetric HRRP. Zhang et al. used amplitude and phase features in [34], and in [39], they utilized the real and imaginary parts of polarimetric HRRP and extracted handcrafted features to guide the model classification. However, experiments have shown that the phase and the real and imaginary parts are unstable and not conducive to network learning. Therefore, new methods need to be explored to fully utilize polarimetric information in deep learning networks.
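One way to avoid feeding raw phase or real and imaginary parts to a network is to work with magnitudes of coherent combinations of the polarization channels. The sketch below illustrates this with the Pauli decomposition, a well-known coherent decomposition. Its use here is an assumption made for illustration, since this section does not state which decompositions are applied. The helper `pauli_features` and the variable names are hypothetical, and reciprocity (HV equal to VH) is assumed.

```python
import numpy as np


def pauli_features(s_hh, s_hv, s_vv):
    """Stack channel amplitudes and Pauli magnitudes, shape (6, n_cells).

    Assumes reciprocity (S_HV == S_VH), so three complex channels describe
    each range cell. Illustrative helper, not the paper's exact pipeline.
    """
    s_hh, s_hv, s_vv = (np.asarray(x, dtype=complex) for x in (s_hh, s_hv, s_vv))
    surface = (s_hh + s_vv) / np.sqrt(2.0)   # odd-bounce (surface-like) term
    dihedral = (s_hh - s_vv) / np.sqrt(2.0)  # even-bounce (dihedral-like) term
    volume = np.sqrt(2.0) * s_hv             # cross-polarized term
    return np.abs(np.stack([s_hh, s_hv, s_vv, surface, dihedral, volume]))


# Random complex samples stand in for a measured polarimetric HRRP.
rng = np.random.default_rng(0)
n_cells = 8
hh, hv, vv = (rng.normal(size=n_cells) + 1j * rng.normal(size=n_cells)
              for _ in range(3))
features = pauli_features(hh, hv, vv)

# The Pauli basis preserves the total power (span) in every range cell.
span_channels = features[0] ** 2 + 2.0 * features[1] ** 2 + features[2] ** 2
span_pauli = np.sum(features[3:] ** 2, axis=0)
print(np.allclose(span_channels, span_pauli))
```

The Pauli magnitudes still depend on the relative phase between the HH and VV channels, so some phase information is retained even though only real, non-negative values are passed on.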
HRRP is often sparse, containing scattering information not only from the target but also from non-target areas. Most approaches treat these areas without distinction, which hinders the effective use of the most critical range cells in HRRP. One method is to detect the target in the HRRP and then recognize it [40]. However, this method increases complexity, and the recognition result depends heavily on the performance of the preceding detection algorithm, leading to inevitable information loss. Additionally, the target area still includes some undesirable information that may degrade recognition, and the detection method cannot remove it. The self-attention mechanism offers another solution, thanks to its ability to identify and emphasize crucial parts of the input data [29,32,33,34,35]. However, the practicality of this mechanism is limited by its need for extensive datasets and an effective training process. Furthermore, these methods often lack good generalization and interpretability.
In summary, current HRRP-based RATR methods face three main issues. First, the feature extraction capabilities of the networks are limited. To address this, we use the vision Transformer (ViT) [41] as the backbone network. This model is an adaptation of the Transformer [42] to image processing. It outperforms auto-encoders in feature extraction, provides a more comprehensive view of global features than CNNs, and better captures the image details of HRRP than traditional time-sequential models, such as RNNs and LSTMs. Second, polarimetric information has not been fully utilized, especially in deep learning methods. To tackle this, we apply two coherent and two incoherent polarimetric decomposition methods. Inspired by [43], we use one-dimensional convolution to increase the number of channels of polarimetric information, which enhances the network’s performance. Third, current methods struggle to distinguish between target and non-target areas in HRRP, resulting in poor generalization and interpretability. In previous studies, attention maps have predominantly been used to observe the effects of network training [29,36,39], but they have not been leveraged to enhance network performance. We propose to integrate the difference between the attention map and the HRRP span into the loss function to steer the network to focus on range cells of real targets. We previously introduced the foundational ideas in [44]. In this paper, we significantly extend that work by providing a more comprehensive background analysis, incorporating additional polarimetric preprocessing techniques, detailing the attention loss function and its back-propagation process, and conducting extensive supplementary experiments. These expansions not only demonstrate the effectiveness of our proposed method under various conditions but also offer deeper insights into the model’s performance and generalization capabilities. The main contributions of this paper are as follows:
To the best of our knowledge, this paper is the first to apply ViT to polarimetric HRRP target recognition. Experimental results show that our ViT-based method, ITAViT, consistently outperforms existing approaches across various conditions. Furthermore, compared to its base model, the Transformer, ViT provides significant advantages in both recognition accuracy and computational efficiency.
We propose a hybrid approach that combines traditional feature extraction with a polarimetric preprocessing layer (PPL) to optimize the utilization of polarimetric information, integrating the strengths of both traditional methods and CNNs. Through ablation experiments, we analyze the performance of different features and find that combining the amplitude of polarimetric HRRP with coherent polarimetric features achieves the best performance. We also demonstrate the effectiveness of PPL through comparative experiments.
A novel attention loss is constructed to guide the model to focus on range cells of real HRRP targets during training and inference processes, and its back-propagation process is derived. This method requires neither modifications to the internal structure of the network nor much additional computational overhead. The effectiveness of attention loss is demonstrated under various experimental conditions, including scenarios with noise and clutter, as well as situations where there are significant differences between the training and test sets. The results highlight the robustness and generalization capabilities of our proposed method. Additionally, the attention loss offers improved interpretability of the self-attention-based HRRP target recognition task through the visualization of the attention map.
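The idea behind the attention loss in the last contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the loss derived in this paper: the span is taken as the total power over polarization channels, it is pooled over non-overlapping patches to match a ViT-style attention map, and the mismatch is measured with a mean squared error weighted by a hypothetical factor `lam`. All function names are illustrative.

```python
import numpy as np


def span_per_patch(pol_hrrp, patch_len):
    """Normalized span per patch from a (n_channels, n_cells) complex HRRP.

    Assumes n_cells is a multiple of patch_len (non-overlapping patches).
    """
    span = np.sum(np.abs(pol_hrrp) ** 2, axis=0)        # power per range cell
    patches = span.reshape(-1, patch_len).sum(axis=1)   # power per patch
    return patches / patches.sum()                      # normalized to sum to 1


def attention_loss(attention, target):
    """Mean squared difference between attention map and normalized span."""
    return float(np.mean((np.asarray(attention) - target) ** 2))


def total_loss(cross_entropy, attention, target, lam=0.1):
    """Classification loss plus the weighted attention penalty (assumed form)."""
    return cross_entropy + lam * attention_loss(attention, target)


# Toy polarimetric HRRP: 4 channels, 16 range cells, target in cells 4..7.
pol_hrrp = np.full((4, 16), 0.01 + 0.0j)
pol_hrrp[:, 4:8] = 1.0 + 1.0j
target = span_per_patch(pol_hrrp, patch_len=4)          # 4 patches

uniform_attention = np.full(4, 0.25)                    # attends everywhere equally
loss_uniform = attention_loss(uniform_attention, target)
loss_focused = attention_loss(target, target)           # attends like the span
print(loss_uniform > loss_focused)
```

In an actual training loop, the penalty would need to be written in a differentiable framework so that it can be back-propagated together with the classification loss.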
In Figure 1, we present a high-level block diagram of the proposed method to explain the relationship between our contributions and the identified issues, and to illustrate the connections between different modules.
The remainder of this paper is organized as follows. In Section 2, we describe the representation of polarimetric HRRP and introduce our proposed method, ITAViT. Section 3 describes the experimental data and presents the experimental results as well as comparisons with other methods. Section 4 analyzes the roles of the three components of the network separately. Finally, Section 5 concludes the paper.
5. Conclusions
In this study, we propose ITAViT, a novel approach for polarimetric HRRP target recognition that leverages the ViT enhanced by polarimetric preprocessing and a specifically designed attention loss. We implement various manual polarimetric feature extractions and also derive the back-propagation process for the attention loss. Through extensive testing on a dataset of simulated X-band signatures of civilian vehicles, our method demonstrates overall superior performance compared to other approaches used for HRRP target recognition, achieving the highest recognition rates when considering both SOC and EOC conditions. We also conduct ablation experiments to test the effects of polarimetric preprocessing and attention loss separately. The results indicate that the combination of magnitude and coherent features achieves better recognition rates than other methods utilizing polarimetric information, and the PPL further enhances performance by extracting additional polarimetric features. The use of attention loss improves recognition rates under SOC, EOC, and simulated ground clutter environments by guiding the network to focus on the strong scattering points of the target, enhancing the model’s generalization performance, robustness to noisy environments, and interpretability.
Although our approach achieves strong results, we recognize its limitations. First, in some radar data, the weak scattering points of a target may also carry important information, such as the target’s shape and size. Our method with attention loss might lose some of this information. To address this limitation, we plan to explore additional mechanisms in future work that can balance the attention given to both strong and weak radar signals. For instance, we are considering a weighted attention loss or an adaptive approach that better captures the full range of radar returns. Second, given the absence of actual polarimetric HRRP datasets, our experiments are limited to simulated data with added noise and clutter for testing. Future work will involve acquiring real polarimetric HRRP datasets to evaluate the network’s performance in practical scenarios. Finally, the attention loss shows promising results for HRRP data but may not transfer directly to image-domain applications because it is tailored to HRRP. Future work will explore the potential of attention loss beyond HRRP data.