Article

Millimeter-Wave Radar Clutter Suppression Based on Cycle-Consistency Generative Adversarial Network

1 College of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
2 College of Information, North China University of Technology, Beijing 100144, China
3 School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(21), 4166; https://doi.org/10.3390/electronics13214166
Submission received: 9 September 2024 / Revised: 8 October 2024 / Accepted: 22 October 2024 / Published: 23 October 2024

Abstract

Vehicle-mounted millimeter-wave radar is widely used in autonomous driving systems for its ability to observe road scenes at all times and in all weathers. However, the data collected by millimeter-wave radar are seriously affected by clutter, which leads to false detections during object detection. To address this issue, a feature extraction network with clutter suppression is necessary. This paper proposes a new clutter suppression method for millimeter-wave Range–Angle (RA) images based on a cycle-consistency generative adversarial network (CycleGAN). The generator of the method can serve as the feature extraction network of an object detector. The method converts cluttered images into clutter-free images by unsupervised learning. An attention gate (AG), a spatial attention mechanism, is introduced into the generator to improve the model's ability to automatically focus on target features and suppress background clutter. Additionally, a target consistency loss term is added to the loss function to maintain target integrity while suppressing overfitting during network training. The public CRUW dataset is used to evaluate the performance of the proposed method, which is compared with traditional methods and deep learning methods. Experimental results show that the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the proposed method reach 39.846 and 0.990, respectively.

1. Introduction

In recent years, the rapid advancements in artificial intelligence have significantly accelerated the development of autonomous driving technologies. There is a consensus that, in the foreseeable future, autonomous driving will become an integral part of daily life for individuals [1]. Higher-level driving technology is the mainstream trend for future development. A secure driving system necessitates robust and precise environmental perception capabilities. Central to this is object detection, an important part of environmental perception [2]. The effective implementation of object detection relies on a variety of different sensors on the vehicle. Among these, millimeter-wave radar is widely recognized as a pivotal driving assistance sensor. It demonstrates effectiveness in various driving scenarios, including under differing weather and lighting conditions. Millimeter-wave radar's strong robustness has garnered the attention of numerous researchers in the field [2]. In most perception solutions, millimeter-wave radar serves primarily as a complementary tool to cameras and LiDAR systems. The superior localization information of radar is utilized, while its rich semantic information remains underutilized. Consequently, the exploration of object detection algorithms based on vehicle-mounted millimeter-wave radar holds substantial significance in advancing automated driving capabilities.
The vehicle-mounted millimeter-wave radar emits a series of continuous frequency-modulated millimeter-wave signals via its antenna, and the receiving antenna will receive the echo signal reflected back at each position in the scene. However, challenges arise due to the inevitable coupling between the transmitting antenna and the receiving antenna, ground reflection, and the presence of non-observable objects, resulting in clutter [3]. The echo signals of clutter and real objects are difficult to separate in the time domain and space domain, which causes serious interference to the object signal and increases the difficulty of object detection and feature extraction. Consequently, it is imperative to enhance the performance of the feature extraction network within the detection network to effectively process millimeter-wave radar data contaminated by clutter, thereby significantly reducing the false detection rate of the system. This paper proposes a clutter suppression algorithm designed for seamless integration with existing object detection algorithms.
In recent years, there has been a notable emphasis among researchers on the field of radar clutter suppression, leading to the development of numerous methodologies aimed at addressing this challenge [4]. These methodologies can be broadly categorized into two principal groups: traditional methods and those based on deep learning approaches. Table 1 compares the contributions and limitations of the mainstream methods within the two major categories.

1.1. Traditional Clutter Suppression Methods

The traditional methods are mainly divided into spatial-domain methods, frequency-domain methods, and other transform-domain methods.
  • The spatial-domain methods represent the early radar clutter suppression technology. They mostly use statistical clutter models to describe the clutter component and eliminate it [16]. Such methods are characterized by their relative simplicity and direct data processing, but they exhibit weak interference resistance. The introduction of the Fourier transform significantly advanced the development of frequency-domain methods in the field of clutter suppression.
  • The frequency-domain methods extract the Doppler information of the target and clutter by Fourier transform. Representative methods include the Moving Target Indicator (MTI) [5], which exploits the difference between the moving target echo and strong ground clutter in the Doppler domain to construct a zero-frequency notch filter that suppresses the clutter. Moving Target Detection (MTD) [6] improves the improvement factor by introducing a bank of filters in the frequency domain. However, the suppression effect is not ideal for clutter with a large spectral spread.
  • The transform-domain methods utilize various mathematical transformations to decompose the signal (a minimal sketch of this idea is given after this list). The representative methods include:
    (1) Singular value decomposition (SVD) [7] decomposes the input signal into the product of three specific matrices: two orthogonal matrices and a diagonal matrix. The elements along the diagonal of the diagonal matrix are the singular values, typically arranged in descending order; larger singular values correspond to more significant signal features. However, during singular value selection there is a risk of inadvertently suppressing important target signals by misclassifying them as clutter.
    (2) Principal component analysis (PCA) [8] captures the linear correlations among the various features of the signal by computing the covariance matrix, performing eigenvalue decomposition on it, and retaining only the principal components associated with the top k eigenvalues. It may also discard critical target information, thereby diminishing the effectiveness of the clutter suppression.
    (3) Robust principal component analysis (RPCA) [9] decomposes the input data into a low-rank matrix and a sparse matrix and casts the separation as an optimization problem. However, when handling high-dimensional data, the computational complexity is high, resulting in slower processing speeds. Although the above methods generally suppress clutter well, they all require prior modeling and estimation of clutter parameters, making it difficult to adapt to real-time, variable clutter.
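To make the transform-domain idea concrete, the following minimal NumPy sketch removes the dominant, clutter-like singular components of a radar data matrix; the function name and the choice of k_clutter are illustrative assumptions rather than settings from the cited works.

```python
import numpy as np

def svd_clutter_suppression(data: np.ndarray, k_clutter: int = 1) -> np.ndarray:
    """Suppress clutter by removing the k largest singular components of a 2D radar data matrix."""
    U, s, Vh = np.linalg.svd(data, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:k_clutter] = 0.0          # zero out the strongest (clutter-like) components
    return (U * s_filtered) @ Vh          # reconstruct the clutter-suppressed matrix

# Toy usage: strong rank-1 "clutter" plus a weak point target
rng = np.random.default_rng(0)
clutter = 10.0 * np.outer(rng.normal(size=128), rng.normal(size=128))
target = np.zeros((128, 128)); target[40, 60] = 1.0
suppressed = svd_clutter_suppression(clutter + target, k_clutter=1)
```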

1.2. Deep Learning Clutter Suppression Methods

Different from traditional hand-crafted algorithms, deep learning algorithms automatically extract the feature information of clutter and targets through neural network layers and realize clutter suppression with a trained model. Because neural network layers have a strong ability to represent features, less manual computation is required, which frees human effort and improves clutter suppression efficiency [17,18]. At present, the commonly used deep learning methods for radar clutter suppression can be categorized into two main types: convolutional neural networks (CNNs) [19] and generative adversarial networks (GANs) [20].
  • Regarding convolutional neural networks, Li et al. [10] introduced a clutter suppression algorithm utilizing an encoder–decoder to solve the problem of object detection at various signal-to-noise ratios in real sea conditions. Meanwhile, Guo et al. [11] and Zhang et al. [12] developed a sea clutter suppression network employing a deep convolutional autoencoder. Additionally, Geng et al. [13] proposed a ground-penetrating radar clutter suppression algorithm based on an LSTM network. Wen et al. [10] also presented a sea clutter suppression method based on deep convolutional neural networks. While these approaches have yielded promising results, they often require a substantial number of clutter and clutter-free sample pairs during the training phase. In practice, obtaining clutter-free labeled images is challenging, which limits the performance of the above supervised learning methods in clutter suppression tasks.
  • In the domain of GANs [20], the capability of neural networks to learn features has significantly advanced. GANs also considerably lessen the reliance on labeled training datasets, facilitating the integration of deep learning methods into radar clutter suppression. Ni et al. [14] designed an unsupervised conditional generative adversarial network to map ground-penetrating radar data from cluttered to clutter-free, and it has good applicability to real data. Furthermore, Mou et al. [15] introduced a novel sea clutter suppression method, SCS-GAN, which further expands the radar training set through the GAN. This network incorporates an attention mechanism to further boost the network's ability to extract clutter features. Although existing deep learning-based clutter suppression methods have shown good performance, there is a scarcity of techniques specifically aimed at suppressing radar image clutter in traffic scenarios. Moreover, the coexistence of object signals and clutter within the echo signal poses a challenge, as the distribution of the object echoes is relatively sparse compared to that of the clutter. Effectively suppressing clutter while retaining the object information is crucial for enhancing the overall performance of object detection.
To cope with these problems, this study introduces a novel clutter suppression method utilizing CycleGAN [21], specifically designed for millimeter-wave radar applications. CycleGAN can realize the style transfer between unpaired datasets. In this research, we leverage the capabilities of CycleGAN for clutter suppression in millimeter-wave radar Range–Angle (RA) images. The direct application of the original CycleGAN method to millimeter-wave radar clutter suppression can remove a portion of the clutter, but it lacks the ability to extract critical target information, resulting in a loss of target integrity in the final output. To enhance the feature extraction and feature retention capabilities of CycleGAN, the following significant improvements are proposed in this paper:
  • Multi-scale feature fusion network: The generator incorporates a multi-scale feature fusion network enhanced with an attention mechanism. This design effectively captures contextual information across multiple scales, thereby emphasizing both the categorization and localization of targets. The attention mechanism ensures a focus on the region of interest throughout the clutter-suppressing process.
  • Target consistency loss: An additional term, designed as the target consistency loss, is incorporated into the original loss function. This regularization term preserves the integrity of the target information by limiting the gap between input and output. Furthermore, it helps mitigate the risk of overfitting, which is particularly relevant when dealing with limited training data.
The remainder of this paper is structured as follows: The second section provides an overview of the operating principles of CycleGAN. The third section presents the proposed methodology, detailing the architecture and functions of the network. Finally, the dataset utilized for the experiment is introduced, along with a validation of the network's applicability and effectiveness, including a quantitative assessment of the generated outcomes.

2. Operational Principles of CycleGAN

CycleGAN [21], a variant of generative adversarial networks (GANs), was introduced by Zhu et al. at the Berkeley Artificial Intelligence Laboratory of the University of California in 2017. Similar to GANs, CycleGAN also uses the idea of a zero-sum game. However, conventional GANs typically require paired datasets for training, which can be expensive and difficult to obtain. CycleGAN is designed to function with unpaired training data, which facilitates the mutual mapping between two different image domains. The conceptual framework of CycleGAN is illustrated in Figure 1.
The architecture of CycleGAN comprises two generators and two discriminators, specifically generator G, generator F and discriminators D_X and D_Y. During the training phase, the dataset is partitioned into a source domain dataset X and a target domain dataset Y. These distinct datasets serve as inputs to generators G and F, respectively. In this context, the generators function as mapping functions within the network. G and F share an identical structure while learning two mappings: G: X → Y and F: Y → X, respectively. At the same time, the discriminators D_X and D_Y also share the same structure. The primary function of the discriminators is to determine whether the input image originates from the generator or is a real image from the dataset.
In the mapping G: X → Y, an input image x (where x ∈ X) is utilized to generate an output image x̂ that closely resembles the distribution of the target domain Y, expressed as x̂ = G(x). The discriminator D_Y serves to distinguish the real image y from the generated image x̂. Conversely, in the mapping F: Y → X, an input image y (where y ∈ Y) is employed to produce an output image ŷ that aligns with the distribution of the source domain X, expressed as ŷ = F(y). The discriminator D_X is responsible for differentiating between the real image x and the generated image ŷ. Throughout the mapping process, the parameters of both the generator and the discriminator are updated according to the conventional GAN loss function.
The key innovation of CycleGAN lies in its implementation of cycle consistency. To mitigate the loss of essential feature information from the original image during the generation process, a cycle consistency loss is integrated, which incorporates the concept of inverse mapping. This mechanism ensures that the generated image x̂ retains the same content as the input image x. When image x̂ is fed into generator F, a reconstructed image is generated that closely resembles image x, such that F(G(x)) ≈ x. In parallel, when the generated image ŷ is fed into generator G, a reconstructed image is generated that aligns as closely as possible with image y, satisfying G(F(y)) ≈ y. The principle of cycle consistency ensures that the images produced by the generators are highly consistent with the original images. This congruence is achieved through cyclic mapping between the two distinct image domains. Consequently, this method minimizes the risk of information loss.
The ability of CycleGAN to leverage unpaired images for network training provides an outstanding advantage in the field of image transformation. It demonstrates robust performance even when there are substantial disparities between the two image domains. Nevertheless, CycleGAN is not without its limitations. Specifically, during the transformation process, there is no guarantee that the generated image will keep the original target intact, which can result in the loss of critical target information. To address this issue, this paper proposes enhancements to the original CycleGAN framework. The details of these specific improvements are elaborated upon in the subsequent subsections.

3. Proposed Method

In this section, we provide a comprehensive overview of the clutter suppression method introduced in this paper for vehicle-mounted millimeter-wave radar data. Initially, we outline the overall architecture of the clutter suppression approach. Subsequently, we detail the various components of the network, including the specific configurations of both the generator and discriminator. Lastly, we present the loss function employed for model training.

3.1. Architecture of the Clutter Suppression Network

Building on the principles of CycleGAN, this paper defines the clutter suppression process for the vehicle-mounted millimeter-wave radar as G: X → Y, representing the mapping from domain X to domain Y. The inverse mapping is denoted as F: Y → X, which corresponds to the process of clutter recovery. Here, domain X signifies the cluttered image domain derived from millimeter-wave radar Range–Angle images, while domain Y denotes the clutter-free image domain. This cyclic mapping is designed to facilitate the network's in-depth learning of clutter characteristics, thereby enabling effective clutter suppression through style transfer. However, the presence of various clutter types in the radar Range–Angle images of real-world scenarios poses significant challenges.
The original CycleGAN exhibits limited capability in feature extraction for the target, leading to suboptimal generalization performance and an inability to achieve the desired clutter suppression effect. Specifically, during the feature extraction phase, the features of the target region will be affected by the surrounding background, which can result in poor or even missing local feature generation for the target. Furthermore, during the network-training process, the cyclic consistency loss of CycleGAN is designed to ensure that the original clutter data align with the reconstructed clutter data at the pixel level. However, it is difficult to obtain highly consistent results when the target and the clutter appear to overlap. Consequently, the images generated by the generator often contain redundant information, leading to deformation of the target region and adversely impacting the overall quality of the produced images.
To cope with the above limitations, this paper improved the generator of the original CycleGAN by adding skip connections [22] fusing multi-scale features to enhance the target recognition rate. To enhance the generation of target regions, an attention mechanism is introduced to highlight the target regions of interest and suppress irrelevant background regions.
The proposed network builds upon the principles of CycleGAN and integrates both a pair of GANs and an attention mechanism, specifically the attention gate (AG) [23]. The architecture of the proposed network is shown in Figure 2. Each GAN in the system consists of a generator and a discriminator, which work together to achieve style transfer between the cluttered RA image data domain X and the clutter-free RA image data domain Y.
Specifically, the clutter suppression generator G is used to extract the characteristics of the clutter. The cluttered RA images x are fed into generator G to produce the clutter suppression result G(x), where x ∈ X. To ensure the output satisfies G(x) ∈ Y, the discriminator D_Y is employed to discriminate between the clutter suppression results G(x) and the clutter-free images y, thereby guiding generator G in learning the mapping from X to Y. In order to maintain a one-to-one correspondence between G(x) and x, the network utilizes generator F to learn the clutter recovery mapping. The output G(x) is subsequently input into generator F to obtain the clutter reconstruction result F(G(x)), where F(G(x)) ≈ x. Through the above process, the network can realize clutter suppression.
Considering the symmetric network structure, it is also necessary to learn the mapping F from Y to X. Similarly, F(y), generated by generator F, represents the outcome of the clutter reconstruction applied to y, where y ∈ Y. Discriminator D_X is utilized to judge whether F(y) satisfies the characteristics of X, i.e., F(y) ∈ X. Finally, F(y) is recovered by generator G, i.e., G(F(y)) ≈ y. The specific modules of the clutter suppression network are given in the following sections.
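The cyclic data flow described above can be summarized by the following minimal PyTorch sketch, assuming G and F_gen are already-constructed generator modules; all names are illustrative and do not come from the authors' code.

```python
def forward_cycle(G, F_gen, x, y):
    """One forward pass of the cycle described in Section 3.1.
    x: cluttered RA images (domain X), y: clutter-free RA images (domain Y),
    both assumed to be tensors of shape (batch, 1, 256, 256)."""
    fake_y = G(x)          # clutter suppression result G(x), expected to satisfy G(x) in Y
    rec_x = F_gen(fake_y)  # clutter reconstruction F(G(x)) ≈ x
    fake_x = F_gen(y)      # clutter reconstruction result F(y), expected to satisfy F(y) in X
    rec_y = G(fake_x)      # clutter suppression G(F(y)) ≈ y
    return fake_y, rec_x, fake_x, rec_y
```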

3.1.1. Architecture of Generator

As the central components of the clutter suppression network, both the clutter suppression generator G and the clutter reconstruction generator F share an identical structure design. The specific structure is shown in Figure 3a.
The primary function of generator G is to produce the clutter-free RA images corresponding to the input cluttered RA images, while generator F is responsible for reintroducing clutter into the clutter-free RA images. The architecture of each generator comprises two key components: the main feature extraction module and the enhanced feature extraction module. Initially, the RA image serves as the input to the generator; the main feature extraction module processes the input through down-sampling, progressively extracting crucial feature information. Subsequently, the enhanced feature extraction module restores the scale of the image by up-sampling. Additionally, the up-sampled feature information is combined, via a skip connection, with the feature map obtained from the corresponding down-sampling level. This integration facilitates the enhanced extraction module's capacity to leverage feature information across varying resolutions effectively.
The main feature extraction module comprises two different sub-modules. Each sub-module is composed of a convolution layer, instance normalization and Leaky ReLU. The primary difference between these two sub-modules lies in the presence of a max pooling layer. The structure of the sub-module is shown in Figure 3b. The convolution layer utilizes a 4 × 4 kernel to extract the feature map. Following this, instance normalization and Leaky ReLU are applied to the feature map to enhance the module's representation capacity.
The extracted feature map is fed into the enhanced feature extraction module, which is designed to fuse features at different scales and reconstruct them. This module also comprises two sub-modules, differentiated by the presence of an up-sampling layer. The foundational sub-module consists of a transposed convolutional layer, instance normalization and Leaky ReLU. The structure of the sub-module is shown in Figure 3b.
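As a rough PyTorch sketch of the two kinds of sub-modules described above (exact channel counts, strides and the LeakyReLU slope are assumptions, not values from the paper):

```python
import torch.nn as nn

def down_block(in_ch, out_ch, use_pool=False):
    """Main feature extraction sub-module: optional max pooling, then
    Conv(4x4) + InstanceNorm + LeakyReLU, as described in the text."""
    layers = [nn.MaxPool2d(2)] if use_pool else []
    layers += [
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    ]
    return nn.Sequential(*layers)

def up_block(in_ch, out_ch, use_upsample=False):
    """Enhanced feature extraction sub-module: transposed conv + InstanceNorm + LeakyReLU;
    its output is concatenated with the encoder feature map through a skip connection."""
    layers = [nn.Upsample(scale_factor=2)] if use_upsample else []
    layers += [
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    ]
    return nn.Sequential(*layers)
```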
Moreover, there is the phenomenon of overlap between the target and the clutter during the data acquisition process. This phenomenon leads to the inclusion of redundant information in the generator’s output image and results in a distortion of the target response. In order to make the generator focus on the region of interest of the target when extracting features, an attention gate (AG) [23] is introduced in this paper. The architecture of the attention gate is illustrated in Figure 4.
The AG consists of a convolution layer and activation function. The feature maps produced by each layer of the enhancement feature extraction module are utilized as inputs to the attention gate. Additionally, the feature maps from the corresponding preceding layer are also used as inputs. The specific operation of the AG involves adding the output feature k from the upper encoder to the output feature g from the decoder after both have passed through their respective convolution layers W k and W g , with a convolution filter size of 1 × 1 × 1 . Subsequently, the process involves passing through the ReLU activation function σ 1 , the convolution layer ψ and the sigmoid activation function σ 2 , resulting in the derivation of the attention coefficient α . Ultimately, the feature k is multiplied by the attention coefficient α to produce the modified feature k ^ , which is employed for skip connection.
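A minimal PyTorch sketch of this attention gate, written for 2D feature maps (the text mentions 1 × 1 × 1 filters; a 1 × 1 kernel is used here), with channel sizes as assumptions:

```python
import torch.nn as nn

class AttentionGate(nn.Module):
    """Attention gate as described in the text: the encoder feature k and decoder feature g
    are projected by 1x1 convolutions W_k and W_g, added, passed through ReLU, a 1x1
    convolution psi and a sigmoid to produce the attention coefficient alpha; the encoder
    feature is then re-weighted as k_hat = alpha * k and used for the skip connection."""

    def __init__(self, k_channels, g_channels, inter_channels):
        super().__init__()
        self.W_k = nn.Conv2d(k_channels, inter_channels, kernel_size=1)
        self.W_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, k, g):
        # k and g are assumed to have the same spatial size
        alpha = self.sigmoid(self.psi(self.relu(self.W_k(k) + self.W_g(g))))
        return k * alpha  # modified feature k_hat
```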
The attention gate, a spatial attention mechanism, focuses on the spatial positions within the feature map, assigning weights to each location. A higher weight indicates the greater importance of the features in that particular region. This capability enables the network to suppress irrelevant regions of the input image while highlighting salient features that are useful for the task. We present a visualization of the attention coefficient α acquired by the network throughout the training process under a single target scene, as illustrated in Figure 5b. The feature activation maps resulting from the skip connection, both with and without the attention gate, are shown in Figure 5c,d, respectively. It can be seen that the attention gates can enhance target feature representation.

3.1.2. Architecture of Discriminator

To enhance the performance of the generator, it is essential to utilize the discriminator to direct the generator in learning the mapping relations G and F . Both discriminators, D X and D Y , share the same network structure. The detailed structure of the discriminators is illustrated in Figure 6.
The function of the discriminator is to determine whether the input data are real images or synthetic images produced by the generator. Discriminator D_X is tasked with differentiating between real images in the X domain and those generated by generator F. Discriminator D_Y serves to distinguish between real images in the Y domain and images generated by generator G. The structure of the discriminator is the same as in the original CycleGAN. The discriminator has five layers. The first four layers are used to extract features; each consists of a convolutional layer, Instance Norm and Leaky ReLU. The fifth layer contains only one convolutional layer, which outputs the result of the discrimination. The first four layers of the discriminator perform feature extraction on the input radar data. The fifth layer transforms the feature map to obtain the final discriminative score, which takes a value between 0 and 1. The output value measures the correspondence between the input and output. Through the guidance of the discriminators, the clutter suppression generator and the clutter reconstruction generator can continuously learn and finally fit the corresponding mappings G and F.
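A minimal PyTorch sketch of the five-layer discriminator described above; the channel widths (64–512) follow the common CycleGAN convention and are assumptions, and a final sigmoid maps the score into the stated 0–1 range.

```python
import torch.nn as nn

def build_discriminator(in_channels=1):
    def block(c_in, c_out):
        # Layers 1-4: convolution + Instance Norm + Leaky ReLU, as described in the text
        return [
            nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(c_out),
            nn.LeakyReLU(0.2, inplace=True),
        ]

    return nn.Sequential(
        *block(in_channels, 64),
        *block(64, 128),
        *block(128, 256),
        *block(256, 512),
        nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # layer 5: discrimination score map
        nn.Sigmoid(),                                           # score in the 0-1 range
    )
```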

3.2. Loss Function

During the training of the clutter suppression network, the generator and discriminator are iteratively optimized in an alternating manner, with the goal of achieving the effective convergence of the network to fulfill the desired clutter suppression performance. In this training phase, the loss function serves a crucial role in evaluating the model’s learning effectiveness. The essence of training is the optimization of the loss function. Therefore, the design of the loss function is very important, and it is related to whether the network can effectively converge and whether the expected design goals can be achieved. The loss function of the clutter suppression network designed in this paper mainly includes three parts: generative adversarial loss [20], cyclic consistency loss [21] and target consistency loss [24]. These three losses are introduced in detail as follows.
Adversarial loss is used to train the corresponding generator and discriminator, and it is also the conventional loss function of a GAN. The purpose of the generator is to minimize the value of the adversarial loss, while the purpose of the discriminator is to maximize it. Through this setting, the authenticity of the data generated by the generator is ensured by evaluating the similarity between the generated images and the real images. Take the mapping relationship G: X → Y as an example, whose corresponding discriminator is D_Y. The adversarial loss of the mapping function G: X → Y is denoted by L_GAN_1 and is shown in Equation (1):
L_{GAN\_1} = \mathbb{E}_{y \sim P_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim P_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right], \quad (1)
where x and y represent the RA maps from domain X and domain Y, respectively; y ∼ P_data(y) and x ∼ P_data(x) denote the data distributions, and E denotes the expectation. G(x) is the clutter suppression result of x, obtained by the clutter suppression generator. D_Y is used to distinguish y from G(x).
Similarly, for the mapping relationship F: Y → X, the adversarial loss is defined as in Equation (2) and is denoted by L_GAN_2. Its variables have the same meanings as in the above equation. Here, D_X is used to distinguish x from F(y):
L_{GAN\_2} = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D_X(x)\right] + \mathbb{E}_{y \sim P_{data}(y)}\left[\log\left(1 - D_X(F(y))\right)\right], \quad (2)
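A minimal PyTorch sketch of the two adversarial losses in Equations (1) and (2), written in the log form given above (practical CycleGAN implementations often use a least-squares variant instead); the discriminator outputs are assumed to lie in (0, 1):

```python
import torch

def adversarial_losses(D_X, D_Y, x, y, G_x, F_y, eps=1e-8):
    """Log-form adversarial losses of Equations (1) and (2).
    G_x = G(x) and F_y = F(y); the generators try to minimize these values,
    while the discriminators try to maximize them."""
    loss_gan_1 = torch.log(D_Y(y) + eps).mean() + torch.log(1.0 - D_Y(G_x) + eps).mean()
    loss_gan_2 = torch.log(D_X(x) + eps).mean() + torch.log(1.0 - D_X(F_y) + eps).mean()
    return loss_gan_1, loss_gan_2
```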
The advantage of CycleGAN is cycle consistency. During the transfer process, the cyclic mapping between the two image domains can prevent the loss of basic features, so that an image highly consistent with the original can be reconstructed. The cycle consistency constraint ensures that x and G(x), as well as y and F(y), correspond uniquely: G(x) is the clutter suppression result strictly corresponding to x, and F(y) is the clutter reconstruction result strictly corresponding to y. The cyclic consistency loss requires that G(x) can be restored to the original image of the X domain and that F(y) can be restored to the original image of the Y domain. The final expectation is F(G(x)) ≈ x and G(F(y)) ≈ y. Therefore, the cyclic consistency loss is defined as in Equation (3) and is denoted by L_cycle:
L_{cycle} = \frac{1}{n}\sum_{i}^{n}\left\| F(G(x_i)) - x_i \right\|_1 + \frac{1}{n}\sum_{i}^{n}\left\| G(F(y_i)) - y_i \right\|_1, \quad (3)
where ‖·‖₁ denotes the L1 norm, which is used to make the reconstructed image close to the original image. The data x are input into generator G, and the output is G(x). The reconstructed result of G(x) is F(G(x)), which should satisfy the condition F(G(x)) ≈ x. The cyclic consistency loss guides F(G(x)) and x to be sufficiently close. Under the constraint of the cyclic consistency loss, the network is prevented from mapping several pieces of data from X to the same result that conforms to the characteristics of the Y domain, thereby realizing correct clutter suppression. The same is true for the clutter reconstruction process. The generative adversarial loss and the cyclic consistency loss ensure the learning and fitting ability of the network for clutter. However, clutter suppression requires not only the accurate removal of clutter in complex traffic scenes, but also the effective preservation of target information. To better preserve the target information in the clutter suppression process and prevent training overfitting, we introduce the target consistency loss, which is defined as
L_{tar} = \frac{1}{n}\sum_{i}^{n}\left\| F(x_i) - x_i \right\|_2 + \frac{1}{n}\sum_{i}^{n}\left\| G(y_i) - y_i \right\|_2, \quad (4)
Generator G is used to suppress the clutter in the input data. The data y are directly input into generator G to obtain the output G(y), which represents the enhanced clutter-free image. In Equation (4), the term (1/n)Σᵢⁿ ‖G(yᵢ) − yᵢ‖₂ measures the change that generator G introduces in the target area of the input radar data. Similarly, (1/n)Σᵢⁿ ‖F(xᵢ) − xᵢ‖₂ measures the change in the target area caused by generator F.
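A minimal PyTorch sketch of the cycle consistency loss of Equation (3) and the target consistency loss of Equation (4); the L2 term is implemented here as a mean squared difference, a common surrogate for the per-sample L2 norm:

```python
def cycle_consistency_loss(rec_x, x, rec_y, y):
    """Equation (3): L1 distance between the reconstructions F(G(x)), G(F(y)) and the originals."""
    return (rec_x - x).abs().mean() + (rec_y - y).abs().mean()

def target_consistency_loss(F_x, x, G_y, y):
    """Equation (4): penalizes changes to the target area when x is passed through F
    and y is passed through G (an identity-style constraint)."""
    return ((F_x - x) ** 2).mean() + ((G_y - y) ** 2).mean()
```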
By the constraint of target consistency loss, the ability of the network to identify and retain the target area during training can be strengthened. The loss function with target consistency loss can better ensure the clutter suppression performance of the entire network. The complete loss function of the clutter suppression method proposed in this paper can be expressed as follows:
L = L_{GAN\_1} + L_{GAN\_2} + \lambda L_{cycle} + \mu L_{tar}, \quad (5)
where λ and μ are weight coefficients used to control the relative importance of the terms. During the training phase, each training epoch can be regarded as a 'maximum–minimum' game process. First, the parameters of the generators are fixed and the parameters of the discriminators are updated to maximize the loss function, so that the discriminators accurately distinguish between the input radar data and the data generated by the generators. Then, the parameters of the two discriminators are fixed and the parameters of the generators are updated to minimize the loss function, so that the data generated by the generators are as consistent as possible with the real input data. The above optimization process is repeated until the network converges. In the test phase, only the trained clutter suppression generator G is required to perform clutter suppression on the cluttered data.
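A minimal sketch of one 'maximum–minimum' training iteration, reusing the helper functions sketched earlier (forward_cycle, adversarial_losses, cycle_consistency_loss, target_consistency_loss); the optimizer grouping and learning rate follow Section 4.1.2, while everything else is an illustrative assumption rather than the authors' implementation.

```python
import itertools
import torch

# G, F_gen: generators; D_X, D_Y: discriminators (nn.Module instances, defined elsewhere)
opt_G = torch.optim.AdamW(itertools.chain(G.parameters(), F_gen.parameters()), lr=2e-4)
opt_D = torch.optim.AdamW(itertools.chain(D_X.parameters(), D_Y.parameters()), lr=2e-4)
lam, mu = 10.0, 1.0   # weight coefficients lambda and mu from Equation (5)

def train_step(x, y):
    # 1) Fix the generators, update the discriminators (maximize the adversarial terms)
    opt_D.zero_grad()
    with torch.no_grad():
        fake_y, fake_x = G(x), F_gen(y)
    d1, d2 = adversarial_losses(D_X, D_Y, x, y, fake_y, fake_x)
    (-(d1 + d2)).backward()       # maximizing = minimizing the negative
    opt_D.step()

    # 2) Fix the discriminators, update the generators (minimize the full loss of Equation (5))
    opt_G.zero_grad()
    fake_y, rec_x, fake_x, rec_y = forward_cycle(G, F_gen, x, y)
    g1, g2 = adversarial_losses(D_X, D_Y, x, y, fake_y, fake_x)
    loss = (g1 + g2
            + lam * cycle_consistency_loss(rec_x, x, rec_y, y)
            + mu * target_consistency_loss(F_gen(x), x, G(y), y))
    loss.backward()
    opt_G.step()
    return loss.item()
```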

4. Experiments and Results

In this section, we assess the performance of the clutter suppression network proposed in this paper using a publicly available dataset. Initially, we outline the experimental dataset, the specific configuration of the network and the evaluation metrics. Subsequently, we present the experimental results associated with various model architectures and hyperparameter selections for the clutter suppression method. Finally, we analyze the network's performance by comparing it with other methods.

4.1. Experiment Implementation

4.1.1. Data Preparation

The RA images utilized in this experiment are sourced from the public Camera–Radar of the University of Washington (CRUW) dataset [25]. The CRUW dataset provides detection RA images containing various kinds of real clutter in traffic scenes. The sensor platform employed for the CRUW dataset consists of a pair of stereo cameras [26] and two perpendicular 77 GHz FMCW MMW radar antenna arrays [27]. Some specific configurations of the sensor platform are shown in Table 2.
The CRUW dataset comprises a total of 149,420 RA images. Utilizing the entire dataset for training would demand substantial time and resources, which is unnecessary. Therefore, we randomly selected about one-fifth of the dataset, amounting to 35,932 images, as the experimental dataset. This sub-dataset was then divided into a training set and a test set in a ratio of 9:1. The training dataset encompasses two distinct scenarios based on the number of objects: single-object and dual-object. The single-target scenes are denoted as Training Set A, while the dual-target scenes are designated as Training Set B. The specific distribution of the data is detailed in Table 3.
The network requires both clutter domain data X and the clutter-free domain data Y for training. Dataset X comprises cluttered data based on the RA images in the real scene, sourced from the CRUW dataset. Dataset Y consists of clutter-free data obtained from processing RA images in conjunction with the ground truth images within CRUW. Both the clutter domain dataset X and the clutter-free domain dataset Y are of equal size. The clutter data X serve as input to generator G, while the clutter-free data Y function as the input to generator F, facilitating clutter suppression and reconstruction. Additionally, these two distinct pieces of data are utilized as one input for the two discriminators, aimed at evaluating the effectiveness of the corresponding generators. A portion of the experimental data used for training is illustrated in Figure 7.

4.1.2. Network Training

The network is trained with the sub-dataset of CRUW prepared in Section 4.1.1. In the configuration of the network architecture, the convolutional layers utilize a 4 × 4 filter, a stride of 2 and a padding of 1 × 1. The entire training process is implemented in PyTorch 1.7. This paper involves training a network composed of two distinct models, optimized using the AdamW [28] optimizer, with a learning rate of 0.0002, a decay rate of 0.5 and a batch size of 1 on an NVIDIA Quadro RTX 6000 GPU (Santa Clara, CA, USA). Through ablation experiments conducted under the same configuration, the most suitable models and loss functions for the generators are determined. The training data, formatted to a size of 256 × 256 × 1, are input into the network, and the optimization process is concluded after 100 epochs, resulting in a total training time of approximately 8.5 h. The loss curves from the training process are illustrated in Figure 8.
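A minimal sketch of how the training tensors might be prepared under the stated configuration (256 × 256 × 1 inputs, batch size 1); the file format and normalization range are assumptions, not details from the paper:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class RADataset(Dataset):
    """Wraps pre-extracted RA images stored as 256 x 256 NumPy arrays (format assumed)."""
    def __init__(self, arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        img = self.arrays[idx].astype(np.float32)
        img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # scale to [0, 1] (assumption)
        return torch.from_numpy(img).view(1, 256, 256)            # 256 x 256 x 1 network input

# Batch size 1, as stated in the training configuration
loader = DataLoader(RADataset([np.random.rand(256, 256) for _ in range(8)]),
                    batch_size=1, shuffle=True)
```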

4.1.3. Evaluation Index

Evaluation methods can be generally divided into qualitative evaluation and quantitative evaluation. Qualitative evaluation is based on the subjective evaluation of the experimental results by the researchers according to certain criteria or experience. Quantitative evaluation refers to the accurate and objective evaluation of the experiment by constructing relevant mathematical models according to the research needs. In this experiment, we introduced two image quality evaluation indicators to evaluate the experimental results, namely peak signal-to-noise ratio (PSNR) [29] and structural similarity (SSIM) [30]. These two evaluation indexes can effectively evaluate the quality of the RA spectra generated by this method.
PSNR is an image quality index that has been widely used to evaluate the degree of image distortion. It evaluates the similarity between images by calculating the mean square error between the generated image and ground truth. The formula is as follows:
MSE = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left( p(i,j) - q(i,j) \right)^2, \quad (6)
PSNR = 10 \log_{10}\left( \frac{255^2}{MSE} \right), \quad (7)
where MSE represents the mean square error, p(i, j) and q(i, j) represent the pixel values of the generated image and the ground truth at coordinate (i, j), respectively, and H and W are the height and width of the image. The smaller the mean square error, the larger the PSNR value, indicating that the two images are more similar and that the image quality after clutter suppression is better.
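A direct NumPy implementation of Equations (6) and (7), assuming 8-bit-range images (peak value 255):

```python
import numpy as np

def psnr(generated: np.ndarray, ground_truth: np.ndarray, peak: float = 255.0) -> float:
    """PSNR following Equations (6) and (7)."""
    mse = np.mean((generated.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```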
The degree of correlation between image pixels reflects the structural information of the image, and the structural information is relatively independent of the illumination information, while the illumination information is determined by the contrast and brightness. SSIM evaluates the similarity between two images by comprehensively considering the relationship between structural information, contrast information and brightness information. The formula is defined as follows:
SSIM(m, n) = L(m, n) \cdot C(m, n) \cdot S(m, n), \quad (8)
L(m, n) = \frac{2 u_m u_n + c_1}{u_m^2 + u_n^2 + c_1}, \quad (9)
C(m, n) = \frac{2 \sigma_m \sigma_n + c_2}{\sigma_m^2 + \sigma_n^2 + c_2}, \quad (10)
S(m, n) = \frac{\sigma_{mn} + c_3}{\sigma_m \sigma_n + c_3}, \quad (11)
where m and n represent the clutter-suppressed RA map and the ground truth, respectively; u_m and u_n are the mean values of the RA maps; σ_m and σ_n are the standard deviations; σ_m² and σ_n² are the variances; σ_mn is the covariance; and c_1, c_2 and c_3 are small constants that prevent the denominators from being zero. The value of SSIM lies between 0 and 1; the higher the value, the higher the similarity between the images.
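A global (whole-image) implementation of Equations (8)–(11); the constants c_1, c_2 and c_3 follow the conventional choices and are assumptions, and windowed SSIM variants used in practice may give slightly different values:

```python
import numpy as np

def ssim(m: np.ndarray, n: np.ndarray, peak: float = 255.0) -> float:
    """Global SSIM following Equations (8)-(11)."""
    m = m.astype(np.float64)
    n = n.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # conventional constants (assumption)
    c3 = c2 / 2.0
    u_m, u_n = m.mean(), n.mean()
    s_m, s_n = m.std(), n.std()
    s_mn = ((m - u_m) * (n - u_n)).mean()
    L = (2 * u_m * u_n + c1) / (u_m ** 2 + u_n ** 2 + c1)   # luminance term, Equation (9)
    C = (2 * s_m * s_n + c2) / (s_m ** 2 + s_n ** 2 + c2)   # contrast term, Equation (10)
    S = (s_mn + c3) / (s_m * s_n + c3)                      # structure term, Equation (11)
    return L * C * S
```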

4.2. Ablation Experiment

4.2.1. Model Choice

In order to demonstrate that fusing multi-scale features is more favorable for the feature extraction of RA images, the backbone of the clutter suppression network designed in this paper is CycleGAN, in which the generator is tested with four different structures: Resnet_6blocks, Resnet_9blocks, Unet_128 and Unet_256. We analyze the clutter suppression results corresponding to the different models in both qualitative and quantitative ways.
First, the clutter suppression performance of the proposed method is tested with different generator structures, and the test results are shown in Figure 9. It can be seen that the proposed method can suppress clutter under different structures, but the suppression effect differs. In the face of variable clutter, the generator based on Resnet blocks [31] only extracts low-resolution feature information of the target by down-sampling; the fine structural information of the target is lacking, so the suppression effect is not good. The generator based on U-net [32] not only extracts low-resolution feature information by down-sampling but also adds skip connections that transmit high-resolution feature information from the encoder to the decoder at the same level. The U-net structure for the generator can fuse multi-scale features, which provides finer features and is conducive to clutter suppression. The experimental results show that the generator achieves a better clutter suppression effect when it effectively utilizes the feature information. Therefore, we finally adopt unet_256 as the generator of the clutter suppression method.
In Figure 9, it can be seen that the generator based on Resnet can suppress most of the clutter, but there is still clutter that can have an impact on the target response generation. In contrast, the generator based on U-net can suppress most of the clutter and retain the target response. Meanwhile, the evaluation indicators of the method with different generators are shown in Table 4. It can be seen in the table that the PSNR and SSIM of the generator based on U-net are better than those of the generator based on Resnet.

4.2.2. Ablation of Loss Function

In order to prevent the complex model from leading to training overfitting, we introduce an additional loss to the original loss function to avoid this problem. In particular, we encourage target consistency between inputs and outputs by introducing an additional loss that preserves the original target information more completely. We visualize the test results of the method based on the unet_256 model with different losses, as shown in Figure 10. It can be seen that the object of clutter suppression is closer to the object in the original RA map after the introduction of the additional loss.
In Figure 10, it can be seen that compared to the original loss function, the suppression result of the loss function including four parts can be highly consistent with the target response in the ground truth. At the same time, the evaluation indicators of the different loss functions under the generator based on unet_256 are shown in Table 5 and Table 6. More intuitively, it can be seen that the clutter suppression method achieves a better suppression performance after the introduction of a target consistency loss in this paper.

4.2.3. Ablation of Hyperparameters

In this paper, the equation of the loss function contains two hyperparameters, λ and μ. In order to seek the optimal combination, we conducted ablation experiments based on the unet_256 model. The experimental results are shown in Table 7 and Table 8.
In Table 7, it can be seen that when μ is fixed, a smaller λ yields a smaller PSNR value. Therefore, when λ = 10, the clutter suppression performance of the network is better. When λ = 10 and μ = 1, the PSNR value is maximal and the suppression performance of the network is best. From Table 8, it can be seen that the values of λ and μ have the same effect on the SSIM as on the PSNR. Therefore, the hyperparameters λ and μ are set to 10 and 1, respectively, during the training phase of the clutter suppression method.

4.3. Suppression Performance Comparison

The suppression performance of different methods on the public CRUW dataset is shown in Figure 11, in order to make comparisons with traditional clutter suppression methods and other deep learning-based methods. In this paper, the clutter suppression results of the traditional methods SVD [33] and RPCA [9] are reproduced, as are the results of the deep learning methods Pix2Pix [34] and CycleGAN. The first column is the original cluttered RA image. The second column is the suppression result of SVD. The third column is the suppression result of RPCA. The fourth column is the suppression result of Pix2Pix. The fifth column is the suppression result of CycleGAN. The last column is the suppression result of the method proposed in this paper.
From Figure 11, it can be seen that the traditional methods do not completely suppress the clutter. The performance of the deep learning-based methods is better than that of the traditional methods. Most of the clutter is suppressed by the deep learning-based methods, but the generation of the target response is affected by the clutter and the target response is weakened. In contrast, the proposed method successfully suppresses the clutter and better retains the target response. The evaluation indicators of the different methods are shown in Table 9. Additionally, a comparison of the runtime of the proposed method with the other approaches is presented in the table. The analysis of these results indicates that the proposed method not only exhibits superior performance but also meets the real-time requirements of traffic scenarios. In order to make an intuitive comparison of the performance of the different methods, we also evaluated them in the form of ROC curves, as shown in Figure 12. As the epoch gradually increases, the PSNR value of the clutter-free image generated by the model becomes larger. The area under the curve represents the performance of the model; the larger the area, the better the performance.

5. Conclusions

In this paper, a CycleGAN-based clutter suppression method is proposed with the aim of obtaining a feature extraction network with clutter suppression for better detection performance of vehicle-mounted radar in traffic scenes. The goal of the clutter suppression is to effectively remove clutter while protecting the integrity of the target. In the network, this paper improves the generator by introducing an attention mechanism and fusing multi-scale features. The fusion of multi-scale features can effectively utilize contextual information and is more conducive to determining the target area. The attention mechanism focuses only on the target region of interest during the feature fusion process, effectively reducing redundant information and accelerating network computation. In addition, the introduction of the target consistency loss not only avoids overfitting during network training but also preserves the integrity of the target information. Extensive experiments were conducted on the publicly available CRUW dataset. The best performance was achieved on the test dataset when λ = 10 and μ = 1, obtaining a peak signal-to-noise ratio of 39.846 and a structural similarity of 0.990. The experimental results demonstrate that the method proposed in this paper can ensure the integrity and consistency of the target response while effectively removing clutter.
Importantly, this method does not require paired cluttered and clutter-free data for training, which increases the generalization performance for different radar RA images in real scenes. At the same time, the generator of the trained clutter suppression method has a standard encoder structure that can be easily integrated into any object detection method. However, the training dataset utilized in this study is based on real static traffic scenarios (such as parking lots), where the clutter interference is significantly lower than in dynamic traffic environments. Subsequent research will focus on improving the clutter suppression performance in more complex scenes and realizing higher-accuracy object detection based on the clutter suppression results.

Author Contributions

Conceptualization, Z.L. and T.Z.; methodology, Z.L. and T.Z.; software, Z.L.; validation, Z.L.; formal analysis, Y.W., Y.L. and H.Q.; investigation, Y.W., Y.L. and H.Q.; resources, Y.W., Y.L. and H.Q.; data curation, Z.L.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L., T.Z. and Y.L.; visualization, Y.W.; supervision, H.Q.; project administration, Y.W. and Y.L.; funding acquisition, Y.W. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62131001, the Beijing Natural Science Foundation under Grant 4232003, the Yuyou Talent Training Program of the North China University of Technology under Grant 218051360020XN115/014, the Yuxiu Innovation Project of NCUT (Project No. 2024NCUTYXCX119) and the Yuxiu Innovation Project of NCUT (Project No. 2024NCUTYXCX210).

Data Availability Statement

The data presented in this study are openly available in [CRUW] at https://www.cruwdataset.org/.

Acknowledgments

We thank the anonymous reviewers for their good suggestions and comments to help improve the quality of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wei, Z.; Zhang, F.; Chang, S.; Liu, Y.; Wu, H.; Feng, Z. MmWave Radar and Vision Fusion for Object Detection in Autonomous Driving: A Review. Sensors 2022, 22, 2542. [Google Scholar] [CrossRef] [PubMed]
  2. Zhou, T.; Yang, M.; Jiang, K.; Wong, H.; Yang, D. MMW Radar-Based Technologies in Autonomous Driving: A Review. Sensors 2020, 20, 7283. [Google Scholar] [CrossRef] [PubMed]
  3. Abdu, F.J.; Zhang, Y.; Fu, M.; Li, Y.; Deng, Z. Application of Deep Learning on Millimeter-Wave Radar Signals: A Review. Sensors 2021, 21, 1951. [Google Scholar] [CrossRef]
  4. Huang, P.; Yang, H.; Zou, Z.; Xia, X.-G.; Liao, G. Multichannel Clutter Modeling, Analysis, and Suppression for Missile-Borne Radar Systems. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 3236–3260. [Google Scholar] [CrossRef]
  5. Weng, Z.Y. Optimal design of clutter rejection filters for MTI system. In Proceedings of the 2001 CIE International Conference on Radar Proceedings (Cat No.01TH8559), Beijing, China, 15–18 October 2001; pp. 475–478. [Google Scholar] [CrossRef]
  6. Wang, H.; Cai, L. A localized adaptive MTD processor. IEEE Trans. Aerosp. Electron. Syst. 1991, 27, 532–539. [Google Scholar] [CrossRef]
  7. Yang, Y.; Xiao, S.-P.; Wang, X.-S. Radar Detection of Small Target in Sea Clutter Using Orthogonal Projection. IEEE Geosci. Remote Sens. Lett. 2019, 16, 382–386. [Google Scholar] [CrossRef]
  8. Karlsen, B.; Larsen, J.; Sorensen, H.; Jakobsen, K.B. Comparison of PCA and ICA based clutter reduction in GPR systems for anti-personnel landmine detection. In Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing (Cat. No.01TH8563), Singapore, 8 August 2001. [Google Scholar]
  9. Song, X.; Xiang, D.; Zhou, K.; Su, Y. Improving RPCA-Based Clutter Suppression in GPR Detection of Antipersonnel Mines. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1338–1342. [Google Scholar] [CrossRef]
  10. Wen, L.; Zhong, C.; Huang, X.; Ding, J. Sea Clutter Suppression Based on Selective Reconstruction of Features. In Proceedings of the 2019 6th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Xiamen, China, 26–29 November 2019; pp. 1–6. [Google Scholar] [CrossRef]
  11. Guo, S.; Zhang, Q.; Shao, Y.; Chen, W. Sea Clutter and Target Detection with Deep Neural Networks. In DEStech Transactions on Computer Science and Engineering; DEStech Publishing Inc.: Lancaster, PA, USA, 2017. [Google Scholar]
  12. Zhang, Q.; Shao, Y.; Guo, S.; Sun, L.; Chen, W. A Novel Method for Sea Clutter Suppression and Target Detection via Deep Convolutional Autoencoder. Int. J. Signal Process. 2017, 2, 35–40. [Google Scholar]
  13. Geng, J.; He, J.; Ye, H.; Zhan, B. A Clutter Suppression Method Based on LSTM Network for Ground Penetrating Radar. Appl. Sci. 2022, 12, 6457. [Google Scholar] [CrossRef]
  14. Ni, Z.-K.; Shi, C.; Pan, J.; Zheng, Z.; Ye, S.; Fang, G. Declutter-GAN: GPR B-Scan Data Clutter Removal Using Conditional Generative Adversarial Nets. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4023105. [Google Scholar] [CrossRef]
  15. Mou, X.; Chen, X.; Guan, J.; Dong, Y.; Liu, N. Sea Clutter Suppression for Radar PPI Images Based on SCS-GAN. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1886–1890. [Google Scholar] [CrossRef]
  16. Weinberg, G.V. Constant false alarm rate detectors for Pareto clutter models. IET Radar Sonar Navig. 2013, 7, 153–163. [Google Scholar] [CrossRef]
  17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  18. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  19. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  20. Krichen, M. Generative Adversarial Networks. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–7. [Google Scholar] [CrossRef]
  21. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
  22. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015. [Google Scholar] [CrossRef]
  23. Panda, S.L.; Sahoo, U.K.; Maiti, S.; Sasmal, P. An Attention U-Net-Based Improved Clutter Suppression in GPR Images. IEEE Trans. Instrum. Meas. 2024, 73, 8502511. [Google Scholar] [CrossRef]
  24. Pei, J.; Yang, Y.; Wu, Z.; Ma, Y.; Huo, W.; Zhang, Y.; Huang, Y.; Yang, J. A Sea Clutter Suppression Method Based on Machine Learning Approach for Marine Surveillance Radar. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3120–3130. [Google Scholar] [CrossRef]
  25. Wang, Y.; Jiang, Z.; Li, Y.; Hwang, J.-N.; Xing, G.; Liu, H. RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization. IEEE J. Sel. Top. Signal Process. 2021, 15, 954–967. [Google Scholar] [CrossRef]
  26. Flir Systems. Available online: http://www.flir.com (accessed on 8 August 2024).
  27. Texas Instruments. Available online: http://www.ti.com (accessed on 8 August 2024).
  28. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  29. Ide, H.; Kurita, T. Improvement of learning for CNN with ReLU activation by sparse regularization. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2684–2691. [Google Scholar] [CrossRef]
  30. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar] [CrossRef]
  31. Gao, S.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 652–662. [Google Scholar] [CrossRef]
  32. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
33. Poon, M.W.; Khan, R.H.; Le-Ngoc, S. A singular value decomposition (SVD) based method for suppressing ocean clutter in high frequency radar. IEEE Trans. Signal Process. 1993, 41, 1421–1425. [Google Scholar] [CrossRef]
  34. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar] [CrossRef]
Figure 1. General idea of CycleGAN.
Figure 2. Overview of the clutter suppression network. The architecture comprises the clutter suppression generator G, the clutter reconstruction generator F, and the discriminators D_X and D_Y. Here, x denotes a Range–Angle image containing clutter, while y denotes a clutter-free Range–Angle image. G(x) is the clutter suppression result for x, and F(y) is the clutter reconstruction result for y. Accordingly, F(G(x)) is the clutter reconstruction of G(x), and G(F(y)) is the clutter suppression of F(y). F(x) is the target-enhancing result of x, and G(y) is the target-enhancing result of y.
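For readers who prefer code to block diagrams, the mappings in Figure 2 can be summarized with a minimal PyTorch sketch. The tiny convolutional stacks below are illustrative stand-ins, not the generator of Figure 3 or the discriminator of Figure 6; only the data flow between G, F, D_X and D_Y follows the figure.

```python
import torch
import torch.nn as nn

def tiny_net(in_ch: int = 1, out_ch: int = 1) -> nn.Module:
    # Illustrative stand-in for the real generator/discriminator backbones.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(16, out_ch, kernel_size=3, padding=1),
    )

G, F = tiny_net(), tiny_net()        # G: cluttered X -> clutter-free Y; F: Y -> X
D_X, D_Y = tiny_net(), tiny_net()    # discriminators on domains X and Y (patch-wise scores)

x = torch.randn(4, 1, 128, 128)      # batch of cluttered RA maps
y = torch.randn(4, 1, 128, 128)      # batch of clutter-free RA maps

fake_y = G(x)        # G(x): clutter suppression result of x
rec_x = F(fake_y)    # F(G(x)): cycle back to the cluttered domain
fake_x = F(y)        # F(y): clutter reconstruction result of y
rec_y = G(fake_x)    # G(F(y)): cycle back to the clutter-free domain
id_x, id_y = F(x), G(y)              # F(x), G(y): target-enhancing (identity-style) passes

score_y = D_Y(fake_y)                # adversarial feedback for G
score_x = D_X(fake_x)                # adversarial feedback for F
```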
Figure 3. Architecture of the generator. Sub-figure (a) shows the complete structure of the generator; sub-figure (b) shows the modules that make up the generator. Conv denotes a convolution layer; Instance Norm denotes instance normalization (normalization over a single image); Leaky ReLU and Tanh denote activation functions; Conv Transpose denotes a transposed convolution layer; the left half performs the main feature extraction and the right half performs the enhanced feature extraction; AG denotes the attention gate; C denotes the skip connection.
Figure 4. Architecture of attention gate.
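Figure 4 follows the additive attention-gate design introduced in Attention U-Net [32]. The sketch below is a generic PyTorch rendering of that design under assumed channel sizes, not the exact layer configuration of this paper: the gating signal g comes from the coarser decoder stage, x is the skip-connection feature map, and the learned coefficients re-weight x before it re-enters the decoder.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate in the spirit of Attention U-Net [32]; sizes are illustrative."""
    def __init__(self, g_ch: int, x_ch: int, inter_ch: int):
        super().__init__()
        self.W_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)  # project the gating signal
        self.W_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)  # project the skip features
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)     # collapse to a single attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # g and x are assumed to share a spatial size here; in practice one is resampled first.
        alpha = self.sigmoid(self.psi(self.relu(self.W_g(g) + self.W_x(x))))
        return x * alpha  # skip features re-weighted by the attention coefficients

gate = AttentionGate(g_ch=128, x_ch=64, inter_ch=32)
out = gate(torch.randn(1, 128, 32, 32), torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```

Figure 5b visualizes coefficients of this kind, and Figure 5c,d contrast the skip-connection activations without and with the gating.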
Figure 5. The performance of the attention gate. From left to right: (a) the input RA map of the network; (b) the attention coefficients; (c) the feature activation of the skip connection without the attention gate; (d) the feature activation of the skip connection with the attention gate.
Figure 6. Architecture of discriminator.
Figure 7. (a) Cluttered RA images; (b) clutter-free RA images.
Figure 8. Loss curves of the network-training phase with different loss functions.
Figure 9. Clutter suppression results under different models. The first column shows the original RA maps, the last column shows the ground truth, and the middle four columns show the test results under the different models.
Figure 10. Clutter suppression results under different loss functions. The first column shows the original RA maps, the second column shows the results under the three-part loss function, and the third column shows the results under the four-part loss function.
Figure 11. Clutter suppression results with different methods. The second and third columns present the results of the traditional methods. The fourth and fifth columns present the results of the deep learning-based methods. The last column presents the results of the proposed method.
Figure 12. PSNR curves of the different methods during training. The x-axis represents the training epoch, and the y-axis represents the PSNR value.
Table 1. Comparison of clutter suppression methods from existing literature.
| Category | Method | Key Contributions | Limitations |
|---|---|---|---|
| Traditional methods | MTI [5], MTD [6] | Radar signal processing; Doppler effect analysis. | Inability to adapt to real-time dynamic scenarios. |
| Traditional methods | SVD [7], PCA [8], RPCA [9] | Dimensionality reduction to mitigate the impact of noise on data analysis; linear transformations applied to the data matrix to extract principal features. | Inability to adapt to real-time dynamic scenarios. |
| Deep learning methods | Li et al. [10], Guo et al. [11], Zhang et al. [12], Geng et al. [13], Wen et al. [10] | Convolutional neural networks; robust feature representation capabilities (through automatic feature extraction or selective feature reconstruction) to distinguish between targets and clutter. | Requirement for corresponding labeled data. |
| Deep learning methods | Declutter-GAN [14], SCS-GAN [15] | Conditional generative adversarial networks; the adversarial learning mechanism effectively optimizes data, enhancing clutter suppression performance. | Dependence on additional specified conditions. |
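As a concrete illustration of the subspace idea behind the SVD/PCA/RPCA row of Table 1, a common recipe removes the dominant singular components of the data matrix, which are typically dominated by strong, highly correlated clutter, and keeps the residual. The snippet below is a generic NumPy sketch of that recipe, not the specific algorithm of [7] or [33]; the toy data and the choice of a single clutter component are assumptions made for illustration.

```python
import numpy as np

def svd_clutter_suppression(X: np.ndarray, n_clutter_components: int = 1) -> np.ndarray:
    """Zero out the strongest singular components, assumed to carry the clutter energy."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_clean = s.copy()
    s_clean[:n_clutter_components] = 0.0  # remove the dominant (clutter) subspace
    return (U * s_clean) @ Vt             # residual: targets plus noise

# Toy example: a rank-1 clutter ridge plus a weak point target.
rng = np.random.default_rng(0)
clutter = 10.0 * np.outer(np.ones(64), rng.normal(size=64))  # strong correlated clutter
target = np.zeros((64, 64)); target[20, 30] = 5.0            # weak isolated target
X = clutter + target + 0.1 * rng.normal(size=(64, 64))
X_clean = svd_clutter_suppression(X, n_clutter_components=1)
```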
Table 2. Sensor configurations for CRUW dataset.
| Camera | Values | Radar | Values |
|---|---|---|---|
| Frame rate | 30 FPS | Frame rate | 30 FPS |
| Pixels | 1440 × 1080 | Frequency | 77 GHz |
| Resolution | 1.6 Megapixels | # of transmitters | 2 |
| Field of view | 93.6° | # of receivers | 4 |
| Stereo baseline | 0.35 m | # of chirps per frame | 255 |
| | | Range resolution | 0.23 m |
| | | Azimuth resolution | 15° |
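As a quick consistency check on Table 2, the listed 0.23 m range resolution matches what the standard FMCW relation would predict for a sweep bandwidth of roughly 652 MHz; this bandwidth is an inference from the table, not a value stated for the dataset.

```latex
% Assumed: the standard FMCW range-resolution relation, with c the speed of light.
\Delta R = \frac{c}{2B}
\quad\Longrightarrow\quad
B = \frac{c}{2\,\Delta R} \approx \frac{3\times10^{8}\ \mathrm{m/s}}{2 \times 0.23\ \mathrm{m}} \approx 652\ \mathrm{MHz}
```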
Table 3. Training and testing sets in dataset.
| Data | Number |
|---|---|
| Training set A | 23,352 |
| Training set B | 8400 |
| Test | 4180 |
Table 4. Quantitative evaluation of performance under different structures for generator. Among them, a larger value of PSNR is better, while a larger value of SSIM is desired.

| Evaluation Metrics | Resnet_6blocks (ResNet) | Resnet_9blocks (ResNet) | Unet_128 (U-Net) | Unet_256 (U-Net) |
|---|---|---|---|---|
| PSNR (↑) | 27.289 | 28.299 | 34.294 | 38.802 |
| SSIM (↑) | 0.957 | 0.970 | 0.978 | 0.987 |
Table 5. Quantitative evaluation of performance under different loss functions: the value of PSNR. A larger value of PSNR is better.

| L_GAN_1, L_GAN_2, L_cycle | L_tar | Unet_256 |
|---|---|---|
| 1 | × | 26.013 |
| 2 | √ | 38.802 |
Table 6. Quantitative evaluation of performance under different loss functions: the value of SSIM. A larger value of SSIM is better.

| L_GAN_1, L_GAN_2, L_cycle | L_tar | Unet_256 |
|---|---|---|
| 1 | × | 0.944 |
| 2 | √ | 0.987 |
Table 7. PSNR of different hyperparameters. A larger value of PSNR is better.

| μ \ λ | 10 | 7 | 5 | 2 |
|---|---|---|---|---|
| 0.5 | 38.802 | 27.344 | 25.9945 | 25.763 |
| 1 | 39.846 | 38.742 | 38.220 | 34.085 |
| 1.5 | 39.409 | 39.208 | 39.156 | 38.943 |
| 2.0 | 39.227 | 39.026 | 38.951 | 38.734 |
Table 8. SSIM of different hyperparameters. A larger value of SSIM is better.

| μ \ λ | 10 | 7 | 5 | 2 |
|---|---|---|---|---|
| 0.5 | 0.987 | 0.962 | 0.934 | 0.934 |
| 1 | 0.990 | 0.989 | 0.980 | 0.977 |
| 1.5 | 0.989 | 0.984 | 0.981 | 0.978 |
| 2.0 | 0.988 | 0.986 | 0.978 | 0.972 |
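Read together, Tables 5–8 indicate a four-part objective: two adversarial terms, the cycle-consistency term, and the target-consistency term, with λ and μ as the weighting hyperparameters swept in Tables 7 and 8. The expression below is one plausible composition consistent with those tables and with the usual CycleGAN convention of weighting the cycle term; it is an inferred reading, not a formula quoted from the paper.

```latex
% Hypothetical composition inferred from Tables 5-8: lambda weights the
% cycle-consistency term and mu weights the target-consistency term.
\mathcal{L}_{\mathrm{total}} =
\mathcal{L}_{GAN\_1}(G, D_Y) + \mathcal{L}_{GAN\_2}(F, D_X)
+ \lambda\,\mathcal{L}_{cycle}(G, F)
+ \mu\,\mathcal{L}_{tar}(G, F)
```

Under this reading, the best-performing setting in Tables 7 and 8 is λ = 10 with μ = 1.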
Table 9. Quantitative evaluation of performance under different clutter suppression methods. Among them, a larger value of PSNR is better, while a larger value of SSIM is desired.

| Method | SVD | RPCA | Pix2Pix | CycleGAN | Ours |
|---|---|---|---|---|---|
| PSNR (↑) | 25.58 | 28.83 | 33.758 | 34.294 | 39.846 |
| SSIM (↑) | 0.874 | 0.905 | 0.969 | 0.978 | 0.990 |
| Time (s) | 0.010 | 22.0839 | 0.493 | 0.008 | 0.017 |
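For reproducibility, the PSNR and SSIM figures in Tables 4–9 can be computed with standard image-quality routines. The snippet below shows a typical evaluation using scikit-image; the normalization to [0, 1] and the toy data are assumptions, not details taken from the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(clean_ra: np.ndarray, predicted_ra: np.ndarray) -> tuple[float, float]:
    """Compare a ground-truth clutter-free RA map with a clutter-suppressed output."""
    # data_range must match the dynamic range of the RA maps (assumed normalized to [0, 1]).
    psnr = peak_signal_noise_ratio(clean_ra, predicted_ra, data_range=1.0)
    ssim = structural_similarity(clean_ra, predicted_ra, data_range=1.0)
    return psnr, ssim

# Toy example with synthetic maps standing in for real RA images.
rng = np.random.default_rng(0)
gt = rng.random((128, 128)).astype(np.float32)
pred = np.clip(gt + 0.01 * rng.normal(size=gt.shape), 0.0, 1.0).astype(np.float32)
print(evaluate_pair(gt, pred))
```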
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
