1. Introduction
Synthetic aperture radar (SAR) has become a highly useful sensor for Earth remote sensing applications, owing to its ability to operate under a wide range of meteorological conditions. Recent advances in radar image reconstruction technology have greatly increased the amount of available radar imagery. Two main modalities can be distinguished: inverse synthetic aperture radar (ISAR) and synthetic aperture radar (SAR). The difference between the two is that ISAR images are generated from the motion of the target, whereas SAR images are obtained from the motion of the radar. Both types are reconstructed from the electromagnetic waves reflected by the target. Recently, automatic target recognition (ATR) from radar images has become an active research topic of paramount importance in several military and civilian applications [1,2,3]. It is therefore crucial to develop robust, generic algorithms that recognize aerial (aircraft) targets in ISAR images and ground battlefield targets in SAR images. The main goal of an ATR system for ISAR or SAR images is to automatically assign a class label to a radar image. To do so, a typical ATR system involves three main steps: pre-processing, feature extraction, and recognition. Pre-processing locates the region of interest (ROI), which is most often the target. Feature extraction reduces the information in the radar image by converting it from the pixel domain to the feature domain; the main challenge of this conversion is to preserve the discriminative characteristics of the radar image. The resulting feature vectors are fed to a classifier to recognize the class (label) of the radar image.
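The three-step pipeline described above can be sketched as follows. This is a minimal illustration only: the intensity-threshold ROI step, the histogram feature, the 1-NN classifier, and all function names are placeholder assumptions, not the method proposed in this paper.

```python
import numpy as np

def preprocess(image, thresh=0.5):
    """Locate the ROI: here, a simple placeholder that keeps bright pixels."""
    mask = image >= thresh * image.max()
    return image * mask

def extract_features(roi, n_bins=16):
    """Convert the pixel domain to a compact feature domain (here, a histogram)."""
    hist, _ = np.histogram(roi[roi > 0], bins=n_bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def classify(features, templates):
    """Assign the class whose template feature vector is closest (1-NN)."""
    dists = {label: np.linalg.norm(features - t) for label, t in templates.items()}
    return min(dists, key=dists.get)
```

In a real ATR system each stage is far more elaborate, but the data flow — image, then ROI, then feature vector, then label — is the same.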
ISAR and SAR images are chiefly composed of two areas: the target and the background. It is therefore desirable to separate these areas so as to preserve only the target information, which is the most relevant for characterizing radar images. For this purpose, a variety of methods have been proposed [1,4,5,6,7,8], including high-definition imaging (HDI), watershed segmentation, histogram equalization, filtering, thresholding, dilation, and opening and closing. Recently, motivated by the strong performance of visual saliency mechanisms in several image processing applications [9,10,11], the remote sensing community has followed the same philosophy, especially for detecting multiple targets within a single radar image [12,13,14]. However, visual saliency has not been widely exploited for radar images containing a single target.
Regarding the feature extraction step, a number of methods have been proposed to characterize radar images, such as down-sampling, cropping, principal component analysis (PCA) [15,16], wavelet transforms [17,18], Fourier descriptors [1], Krawtchouk moments [19], and local descriptors such as the scale-invariant feature transform (SIFT) [20]. Although SIFT has proven its performance in many computer vision fields, only a limited number of works have used it to describe targets in radar images [21,22]. This is due, on the one hand, to its sensitivity to speckle: it detects keypoints in the background of radar images, which reduces the discriminative power of the feature vector. On the other hand, computing descriptors for all SIFT keypoints requires a heavy computational load.
In the recognition step, many classical classifiers have been adopted for ATR, such as k-nearest neighbors (KNN) [23], support vector machine (SVM) [24], AdaBoost [25], and the softmax of deep features [13,26,27,28,29,30]. In the literature, the most common approach for recognizing SIFT keypoint descriptors is matching. However, this method requires a long runtime due to the huge number of keypoints. Recently, sparse representation theory has attracted great interest. As a pioneering work, the sparse representation-based classification (SRC) method was proposed by Wright et al. [31] for face recognition. Owing to its outstanding performance, this method has been broadly applied to various remote sensing applications [8,15,17,32,33,34,35]. SRC determines the class label of a test image from its sparse linear combination over a dictionary composed of training samples.
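As a rough illustration of the SRC decision rule (a sketch, not the exact solver used in this paper), the code below sparse-codes a test vector over a dictionary of training samples with a simple greedy orthogonal matching pursuit, then assigns the class whose atoms yield the smallest reconstruction residual:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: pick up to k atoms of D that best explain y."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def src_classify(D, labels, y, k=2):
    """SRC rule: keep only the coefficients belonging to one class at a time and
    return the class whose atoms reconstruct y with the smallest residual."""
    x = omp(D, y, k)
    residuals = {c: np.linalg.norm(y - D @ np.where(labels == c, x, 0.0))
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)
```

In practice the sparse code is often obtained by ℓ1 minimization rather than a greedy pursuit, but the class-wise residual test is the same.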
In this paper, we demonstrate that not all SIFT keypoints are useful for describing the content of radar images; it is beneficial to reduce them by computing only those located in the target area. To achieve this, inspired by the work of Wang et al. [36] on SAR image retrieval, we combine SIFT with a saliency attention model. More precisely, for each radar image we generate a saliency map using Itti's model [9]. The pixels contained in the saliency map are retained and the rest are discarded, which separates the target area from the background. SIFT descriptors are then computed from the resulting segmented radar image, so that only the SIFT keypoints located on the target are calculated. We call the resulting features multiple salient keypoints descriptors (MSKD). For the decision engine, we adopt the SRC method, which is mainly used for the classification of one feature per image, called single-task classification. To handle multiple features per image, e.g., SIFT, Liao et al. [37] proposed a multitask SRC in which each single task is applied to one SIFT descriptor. Zhang et al. [38] designed a similar system for 3D face recognition. In these approaches, the number of SRC tasks equals exactly the number of keypoints in the test image, which increases the computational load. To overcome this shortcoming, we use the MSKD as the input of the multitask SRC (MSRC). In this way, the number of SRC tasks per image is significantly reduced, and only the meaningful SIFT keypoints of the radar image are exploited for the recognition task. In short, we refer to the proposed method as MSKD-MSRC.
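The saliency-based segmentation step can be sketched as below. The saliency map here is a stand-in (in the proposed method it comes from Itti's model), the threshold ratio and function name are illustrative assumptions, and in practice SIFT descriptors would subsequently be computed on the masked image, e.g., with OpenCV's SIFT implementation.

```python
import numpy as np

def segment_by_saliency(image, saliency, ratio=0.5):
    """Keep pixels whose saliency exceeds a fraction of the maximum;
    background pixels are zeroed so no keypoints are detected there."""
    mask = saliency >= ratio * saliency.max()
    return image * mask, mask
```

Because the background is zeroed, a keypoint detector run on the segmented image can only fire inside the salient (target) region, which is what reduces the number of descriptors fed to the MSRC.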
The rest of this paper is organized as follows. Section 2 describes the proposed approach for radar target recognition. The advantages of MSKD-MSRC are then experimentally verified in Section 3 on several radar image databases. Finally, conclusions and perspectives are given in Section 5.
4. Discussion
The experimental results are obtained on two types of radar images: ISAR (Section 3.1.1) and SAR (Section 3.2.1). In addition, different versions of the MSTAR database (SOC, EOC-1, and EOC-2) are tested. We compare our method with three others using several criteria: the overall recognition rate, the recognition rate per class, and the runtime. The objective of this contribution is to demonstrate that such an approach can be applied efficiently to both SAR and ISAR images for the automatic radar target recognition problem.
Regarding radar image characterization, the experimental results demonstrate that the proposed strategy significantly enhances the target recognition rate and speeds up the recognition task. This is due to the pre-processing step, which efficiently locates the ROI through the saliency attention model. Consequently, computing the SIFT keypoints is faster than on the whole radar image. Additionally, the keypoints are concentrated in the salient region, which increases the discriminative power of the descriptor.
For the recognition stage, the MSRC is clearly slower than matching, but it contributes to increasing the recognition rate. On the other hand, the number of tasks used in the MSRC equals exactly the number of keypoints; the MSKD reduces the number of tasks and therefore accelerates the recognition step. We can conclude that the proposed method provides a trade-off between recognition rate and runtime.
Comparing the two databases, the ISAR images yield the higher recognition rate, owing to the strong noise present in the SAR images. In addition, the proposed method performs poorly at high depression angles for the EOC-2 version of the MSTAR database. In general, differences in depression angle have more influence on the recognition rate than configuration variations. This finding is consistent with state-of-the-art results.