1. Introduction
Multiple-input multiple-output (MIMO) radar offers real-time, high-resolution imaging owing to its multi-channel configuration [1,2]. Through waveform diversity [3] and virtual aperture technology [4], MIMO radar can form far more observation channels than the number of physical array elements, and these channels collect echo data through spatially parallel transmit–receive combinations. As a result, MIMO radar can produce snapshot images in real time without target motion compensation, which gives MIMO radar imaging a wide range of potential applications in security inspection, nondestructive testing, urban combat, and airborne high-speed target detection [5].
MIMO radar imaging technology can be broadly categorized into two kinds: synthetic-aperture imaging and real-aperture imaging. Typical representatives of the former are synthetic-aperture radar (SAR) and inverse synthetic-aperture radar (ISAR) imaging, such as two-dimensional snapshot imaging of airborne targets combining MIMO radar with ISAR technology [6], and the airborne radar for three-dimensional imaging and nadir observation (ARTINO) [7], which combines MIMO radar with SAR technology. These combinations obtain the three-dimensional spatial distribution of the target through the movement of the platform. However, conventional frequency-domain SAR and ISAR imaging algorithms have difficulty achieving azimuth focusing on the transmitting and receiving apertures simultaneously, and fast Fourier transform (FFT)-based methods cannot perform MIMO radar imaging in near-field situations that violate the plane-wave assumption. In comparison, real-aperture imaging technology, owing to its convenience and robust imaging capability regardless of the MIMO array configuration, has become the preferred processing method for MIMO radar imaging.
The back projection (BP) algorithm is a widely used real-aperture imaging method that is not limited by the array configuration or imaging scenario of MIMO radar. An early method in the field of real-aperture imaging is the rectangular format algorithm (RFA), which is combined with the fast Fourier transform to improve computational efficiency in radar imaging. With the development of synthetic-aperture technology, the polar format algorithm (PFA) [8] and the range-migration algorithm (RMA, or ω-K) [9] appeared one after another. However, these frequency-domain algorithms all rely on the fast Fourier transform, the Abel transform and Stolt interpolation, which prevents them from performing well in scenarios with nonlinear target motion or nonlinear MIMO radar array configurations [10]. These frequency-domain algorithms also have other drawbacks: (1) they require a large amount of computer memory to store and compute 2D frequency-domain transforms, and (2) they require extensive time-domain zero-padding before Stolt interpolation of data with a finite aperture [11]. The BP algorithm, on the other hand, as a time-domain imaging algorithm, has been widely adopted due to its simplicity and its applicability to various imaging scenarios and radar array configurations. The BP algorithm was introduced to MIMO radar imaging in 2010 [12], where a two-dimensional imaging model of MIMO radar was established and its spatial sampling capability was analyzed from the perspective of spatial convolution. Ref. [13] proposed an improved time-delay curve correction back projection algorithm (TCC-BP) in 2013, which significantly reduces the computational burden of the BP algorithm in comparative experiments. However, these algorithms suffer from spatial variation, sidelobe interference and background noise caused by the coherent superposition of signals during imaging, so they fail to satisfy the increasing demand for MIMO radar imaging resolution. Scholars have mainly explored two directions to improve the imaging quality of MIMO radar. The first is to reduce the MIMO radar beam width in range and azimuth through waveform design, penalty-function design and beam-shaping techniques; however, reducing the sidelobes in this way also reduces the detection capability of the MIMO radar and often requires rearranging the array antennas [14,15]. The second is to improve the imaging method through signal processing, using phase compensation to suppress the grating and sidelobes or using the beam-null drift of the transmit–receive array to cancel the grating lobes; this approach can significantly attenuate the sidelobe energy, but part of the energy still leaks, deforming the point spread function, and the operation is rather cumbersome [16,17].
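For reference, the coherent time-domain superposition performed by the BP family of algorithms can be sketched as follows in NumPy; the array geometry, the assumption of range-compressed echoes, and all variable names (echoes, tx_pos, rx_pos, grid, fs, fc) are illustrative simplifications and do not include the time-delay curve correction of TCC-BP [13].

```python
import numpy as np

def mimo_bp_image(echoes, tx_pos, rx_pos, grid, fs, fc, c=3e8):
    """Naive back projection for a MIMO array (illustrative sketch only).

    echoes : complex range-compressed data, shape (n_tx, n_rx, n_samples)
    tx_pos, rx_pos : antenna coordinates, shapes (n_tx, 3) and (n_rx, 3)
    grid   : imaging grid points, shape (n_pixels, 3)
    fs, fc : sampling rate and carrier frequency in Hz
    """
    n_tx, n_rx, n_samples = echoes.shape
    image = np.zeros(grid.shape[0], dtype=complex)
    for it in range(n_tx):
        for ir in range(n_rx):
            # Round-trip delay from transmitter to each pixel and back to receiver.
            tau = (np.linalg.norm(grid - tx_pos[it], axis=1)
                   + np.linalg.norm(grid - rx_pos[ir], axis=1)) / c
            idx = np.clip(np.round(tau * fs).astype(int), 0, n_samples - 1)
            # Coherent superposition with phase compensation at the carrier.
            image += echoes[it, ir, idx] * np.exp(1j * 2 * np.pi * fc * tau)
    return np.abs(image)
```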
In recent years, the emergence of semantic segmentation techniques [18,19] in deep learning has provided new ideas for image processing in the field of computer vision. The crucial difficulty lies in accurately classifying every pixel in an image. To give traditional convolutional neural networks (CNNs) end-to-end, pixel-to-pixel training and learning capability, the fully convolutional network (FCN) was proposed [20]. The encoder module of the FCN model converts the fully connected layers of a traditional CNN into a combination of convolutional layers and nonlinear up-sampling modules, which allows it to accept images of any size as input and output. SegNet [21] is a typical FCN model with an encoder–decoder architecture; its main improvement is the proposed unpooling structure, which reuses the max-pooling indices from the encoder, avoiding the coarse eight-fold upsampling in the FCN and helping to preserve the high-frequency information of the images. The U-Net model [22] has an architecture and usage similar to the FCN and SegNet models, but it creatively introduces a splicing module that concatenates the feature maps of each encoder stage onto the upsampled feature maps of the corresponding decoder stage, forming a U-shaped structure. This design allows the decoder at each stage to recover the detailed information lost in the encoder through max-pooling. These advantages reduce the amount of data required to train the U-Net network and improve the segmentation accuracy, which gives U-Net a wide range of applications for image semantic segmentation problems in fields characterized by small samples and high accuracy requirements.
U-Net-shaped networks have been widely improved and applied in the field of remote sensing. With the widespread use of deep residual networks [23], a segmentation neural network for road region extraction was proposed in [24] by combining residual learning with the U-Net model. The network is constructed from residual units, which simplifies the training of deep networks while facilitating information propagation through enriched skip connections, so that good performance is achieved with fewer training parameters. Based on the U-Net framework, channel and spatial attention mechanisms [25] were introduced to improve the utilization of spectral and spatial information, and residual dense connectivity blocks were applied to enhance feature reuse and information flow in [26]; the roads extracted in their experiments are closer to the ground truth. On the basis of the dual-attention U-Net, a remote sensing image improvement network was proposed as a generic neural network for remote sensing image super-resolution, colorization, simultaneous super-resolution and colorization, and pan-sharpening [27]. Refs. [28,29] combined the dual-attention U-Net with a generative adversarial network to achieve ISAR super-resolution [30] and end-to-end resolution enhancement. Refs. [31,32] also applied the attention mechanism and image semantic segmentation algorithms to airborne target recognition and maritime SAR image recognition, both achieving good results on measured data. The U-Net framework also has a wide range of applications in other radar scenarios, such as multistation cooperative radar target recognition [33], marine target detection [34], ship detection in spaceborne SAR images [35], urban building imaging [36], and so on [37,38].
The successful applications of semantic segmentation technology based on the U-Net framework to road extraction, super-resolution and enhanced imaging of remote sensing images provide ideas for solving the imaging problems of MIMO radar. In this paper, based on the U-Net architecture, we combine residual units and a dual-attention module with the U-Net framework and propose a RAU-Net-based MIMO radar imaging method for spatial-variant correction and denoising. The specific contributions of this paper are as follows:
- (1)
To the best of our knowledge, this is the first RAU-Net-based spatial-variant correction and denoising method in the MIMO radar imaging community. The method replaces the convolution layers of U-Net with residual units and augments the concatenation of U-Net with dual-attention modules, which further extends the capabilities and application scenarios of the U-Net framework and solves the problem of spatial-variant correction and denoising for MIMO radar imaging;
- (2)
Combined with the MIMO radar imaging scenario, an improved loss function based on the nuclear norm is proposed, which enhances the network's ability to focus the MIMO radar point-spread function and to denoise MIMO radar images (an illustrative sketch of such a loss follows this list);
- (3)
Through the construction of training datasets and after training on simulated data, the network achieves good generalization to real measured data of outdoor complex targets, even under very low SNR conditions, and this ability is verified by several rounds of real-world measurement experiments;
- (4)
The network performs MIMO radar imaging based on the TCC-BP algorithm, which greatly reduces the operation time compared to the traditional BP algorithm while improving the imaging resolution and denoising capability, giving it broad application prospects in the field of real-aperture MIMO radar imaging.
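As a rough illustration of how a nuclear-norm term can be attached to a conventional reconstruction loss, the following minimal PyTorch sketch combines a pixel-wise MSE with a nuclear-norm penalty on the predicted image; the particular combination, the weighting factor `lambda_nuc`, and the tensor shapes are assumptions made for illustration and do not reproduce the exact loss defined in Section 3.

```python
import torch
import torch.nn.functional as F

def nuclear_norm_loss(pred, target, lambda_nuc=0.01):
    """Illustrative loss: pixel-wise MSE plus a nuclear-norm penalty.

    pred, target: tensors of shape (batch, 1, H, W); lambda_nuc is a
    hypothetical weighting factor, not a value taken from this paper.
    """
    mse = F.mse_loss(pred, target)
    # Nuclear norm (sum of singular values) of each predicted image,
    # encouraging compact, well-focused point-spread functions.
    nuc = torch.linalg.matrix_norm(pred.squeeze(1), ord='nuc').mean()
    return mse + lambda_nuc * nuc
```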
The rest of this article is organized as follows. In Section 2, the fundamentals of MIMO radar are introduced. Section 3 presents the proposed RAU-Net-based MIMO radar imaging method in detail and gives the network loss function. Section 4 describes the details of the data acquisition and testing strategy. In Section 5, various comparative experiments are carried out to evaluate the performance of the proposed method, and ablation experiments are deployed to verify the proposed blocks. Section 6 draws the conclusion.
3. Proposed RAU-Net-Based MIMO Radar Imaging Method
The MIMO radar TCC-BP imaging algorithm greatly reduces the computation compared to the traditional BP algorithm, but it also introduces a distance-correction error that affects the imaging quality. In this section, the proposed RAU-Net-based MIMO radar imaging method is introduced to eliminate the spatial variation and sidelobe interference and to remove the background noise.
This section is organized into four parts: the U-Net model architecture, the residual connection block, the dual attention module and the proposed RAU-Net-based MIMO radar imaging method.
3.1. U-Net Model Architecture
In this work, the U-Net model is chosen as the basic architecture of our approach. A typical U-Net model consists of a compression path and an expansion path, corresponding to the encoder and decoder in SegNet, respectively. The compression path consists of four blocks, each of which applies two convolutions and one max-pooling downsampling, so the input image is downsampled by a factor of 2 at each block along the compression path to obtain the feature map. The expansion path also consists of four blocks. Each block starts by up-sampling the feature map of the previous layer with a transposed (inverse) convolution, then performs a splicing operation with the corresponding block of the compression path, and finally the output with the same size as the input image is obtained.
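The following PyTorch sketch shows one compression level and one expansion level of such a U-Net, including the max-pooling downsampling, the transposed-convolution up-sampling and the splicing (concatenation) of encoder features; the channel numbers and the single-level depth are placeholder choices for illustration, not the configuration used in this paper.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU (padding keeps the size fixed).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """One compression level and one expansion level of a U-Net (illustrative)."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        self.enc = double_conv(in_ch, base)             # compression block
        self.pool = nn.MaxPool2d(2)                     # 2x2 max-pooling
        self.bottom = double_conv(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)  # up-sampling
        self.dec = double_conv(base * 2, base)          # after skip concatenation
        self.out = nn.Conv2d(base, 1, 1)

    def forward(self, x):                               # H and W assumed even
        e = self.enc(x)
        b = self.bottom(self.pool(e))
        u = self.up(b)
        d = self.dec(torch.cat([e, u], dim=1))          # splice encoder features
        return torch.sigmoid(self.out(d))
```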
U-Net networks perform excellently in the field of image segmentation due to their unique design. To apply the U-Net model to the MIMO radar imaging task, we recast the feature-extraction and image-segmentation objectives of the U-Net network as the generation of radar imaging results. By extracting and mapping the target features through the compression and expansion paths of the U-Net model, the real target information is extracted and segmented from the original imaging results, which contain a large amount of noise and clutter, thus forming high-quality radar imaging results.
To further adapt the model to the characteristics of MIMO radar imaging and enhance the imaging quality of BP-series algorithms, we introduce a residual connection block and a dual attention module to improve the U-Net model.
3.2. Residual Connection Block
The residual unit and identity mapping used in this work are shown in Figure 3. The general form of a residual unit is expressed in Equation (12):

$$\mathbf{y}_l = h(\mathbf{x}_l) + \mathcal{F}(\mathbf{x}_l, \mathcal{W}_l), \qquad \mathbf{x}_{l+1} = f(\mathbf{y}_l), \tag{12}$$

where $\mathbf{x}_l$ and $\mathbf{x}_{l+1}$ are the input and output of the $l$-th unit, $\mathcal{F}$ is a residual function and $f$ is a ReLU function.
In particular, the residual unit designed in this work adopts the identity mapping $h(\mathbf{x}_l) = \mathbf{x}_l$, as suggested in [39]. The identity mapping constructs a direct path for propagating information through the network, which generally makes the training of our U-Net model easier. The residual unit can therefore be expressed by Equation (13):

$$\mathbf{x}_{l+1} = f\big(\mathbf{x}_l + \mathcal{F}(\mathbf{x}_l, \mathcal{W}_l)\big). \tag{13}$$
To better fit the U-Net architecture, each residual unit contains two 3 × 3 convolutions applied repeatedly, each followed by a rectified linear unit (ReLU), and a 2 × 2 max-pooling operation. In addition, we add a batch normalization (BN) [40] unit in front of each convolution to avoid internal covariate shift. This allows the model to use a higher learning rate and, through regularization, to avoid the use of dropout to some extent.
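A minimal PyTorch sketch of such a residual unit is given below, assuming the BN → 3 × 3 convolution → ReLU ordering described above and adding a 1 × 1 projection on the shortcut when the channel numbers differ; the projection and the placement of max-pooling outside the unit are implementation assumptions rather than details taken from Figure 3.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Residual unit sketch: (BN -> 3x3 conv -> ReLU) -> (BN -> 3x3 conv),
    added to the shortcut and passed through a final ReLU,
    i.e. x_{l+1} = f(x_l + F(x_l)) as in Equation (13)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch), nn.Conv2d(out_ch, out_ch, 3, padding=1))
        # 1x1 projection only when channel numbers change (an assumption
        # needed to make the addition well defined).
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return torch.relu(self.skip(x) + self.body(x))  # f(x_l + F(x_l))
```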
3.3. Dual Attention Module
Ground-based MIMO radars and ISAR generally have large imaging scenes, which often contain several regions of interest and aggregated targets. In this case, it is important to add an attention mechanism to the learning-based imaging network so that the network can focus its limited learning resources on the regions where the targets are located, rather than on the vast background, clutter, or other uninteresting targets. For the processing of radar imaging data, it is also necessary to add a multi-channel attention mechanism, which helps our model synthesize the multi-channel information and improves the fault tolerance of the attention mechanism. Therefore, as shown in Figure 4, we add a dual-attention mechanism, i.e., the spatial attention module (SA) in Figure 4a and the channel attention module (CA) in Figure 4b, to the U-Net model architecture to increase the convergence speed of the model and the focusing quality of the imaging results.
As shown in Figure 4, for better integration into the U-Net architecture, both the CA module and the SA module have two inputs, $\mathbf{x}^{l}$ and $\mathbf{g}$, where $\mathbf{x}^{l}$ is the output of the $l$-th compression level passed through the concatenation, and $\mathbf{g}$ is the gating signal from the last extension level. With this design, the dual-attention module inherits the advantages of the concatenation operation in U-Net, which enhances the model's ability to focus on fine features while applying coarse features to correct them, avoiding the loss of feature information. Note that the sigmoid activation function was chosen over softmax for both modules, because the sigmoid activation function converges better during training, whereas repeatedly using the softmax activation function to normalize the attention coefficients produces an overly sparse activation at the output.
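As an illustration of the two-input gating described above, the following PyTorch sketch implements an additive spatial attention gate driven by a sigmoid activation; the intermediate channel number, the exact layer layout and the assumption that the two inputs share the same spatial size are simplifications and do not reproduce the SA/CA structure of Figure 4.

```python
import torch
import torch.nn as nn

class SpatialAttentionGate(nn.Module):
    """Additive spatial attention gate (a sketch in the spirit of Figure 4a).

    x : skip-connection features from the l-th compression level, (N, C_x, H, W)
    g : gating signal from the previous expansion level, (N, C_g, H, W)
    """
    def __init__(self, c_x, c_g, c_int):
        super().__init__()
        self.theta = nn.Conv2d(c_x, c_int, 1)   # project skip features
        self.phi = nn.Conv2d(c_g, c_int, 1)     # project gating signal
        self.psi = nn.Conv2d(c_int, 1, 1)       # collapse to one attention map

    def forward(self, x, g):
        # Sigmoid gate (rather than softmax), as chosen in the text.
        att = torch.sigmoid(self.psi(torch.relu(self.theta(x) + self.phi(g))))
        return x * att                          # re-weight the skip features
```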
3.4. The Proposed RAU-Net-Based MIMO Radar Imaging Method
Figure 5 illustrates the overall structure of the proposed RAU-Net-based MIMO radar imaging method. The model as a whole adopts the compression and expansion paths of the U-Net model and, on this basis, introduces the residual unit and the dual attention module for the MIMO radar imaging scenario. Specifically, (1) the two rounds of convolution in each layer are replaced by a residual unit, which applies the idea of regularization to avoid dropout and improve the learning efficiency of the model; (2) the dual attention module is added to the concatenation between the two paths, cascading the SA and CA modules to comprehensively improve the convergence speed and learning efficiency of the model and to enhance the network's focus on the region of interest.
As can be seen from the model structure diagram in Figure 5, the original image generated by the TCC-BP algorithm is first input into the network; the fine features of the image are gradually learned through the compression path consisting of residual units, while the image information is gradually recovered through the expansion path. As the image passes through the expansion path, each layer is connected with the original image information retained by the compression path through the concatenation and the dual attention module, which avoids the loss of original information while further enhancing the information of the region of interest. Finally, the network-enhanced imaging result is output after a convolution and a sigmoid activation function. The output imaging results show significant improvements in target separation, background noise removal and spatial-variant correction.
The input and output of the model used in this paper are data matrices; however, in order to demonstrate the MIMO radar imaging resolution effects more intuitively, the imaging results in this paper are displayed in sector form, that is, the model output is plotted over a sector region according to the distance from the radar array, with the corresponding compression or expansion. The input and output images of the model have the same size: the image size is halved at each layer of the contraction path and doubled at each layer of the expansion path, and since each layer of the model applies two convolutions, the input image is padded at its edges so that the output image of each layer keeps the same size.
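A simple way to render a range–azimuth output matrix in such a sector format is sketched below with NumPy and Matplotlib; the range limits and azimuth span passed as `r_min`, `r_max` and `az_span_deg` are placeholder parameters, not the values used in the experiments.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_sector(image, r_min, r_max, az_span_deg):
    """Display a range-azimuth image matrix in sector (polar) format.

    image: 2D array with rows indexed by range and columns by azimuth.
    """
    n_r, n_az = image.shape
    r = np.linspace(r_min, r_max, n_r)
    az = np.deg2rad(np.linspace(-az_span_deg / 2, az_span_deg / 2, n_az))
    # Map every (range, azimuth) cell to Cartesian coordinates of the sector.
    R, AZ = np.meshgrid(r, az, indexing='ij')
    x, y = R * np.sin(AZ), R * np.cos(AZ)
    plt.pcolormesh(x, y, image, shading='auto')
    plt.axis('equal'); plt.xlabel('cross-range (m)'); plt.ylabel('range (m)')
    plt.show()
```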