1. Introduction
Synthetic aperture radar (SAR) is an airborne radar imaging sensor capable of producing radar images under all weather conditions and at any time of day. It provides backscattering information from the electromagnetic echoes produced by the radar. Radar imaging technology is widely used in polarimetry and interferometry for near-real-time monitoring of the Earth’s surface. SAR is a coherent imaging system and is therefore affected by speckle noise, which obscures details of the observed scene and makes automatic image interpretation difficult. Speckle noise is a consequence of many scattered echoes with shifted phases arriving within the same resolution cell. The sum of all echoes within a resolution cell causes strong intensity fluctuations from one cell to another. SAR data are generally complex-valued; the information of interest can therefore be extracted from either the amplitude or the phase of the data. In recent years, many SAR satellites, including COSMO-SkyMed, Sentinel-1, ALOS-2, and RADARSAT-2, have been orbiting the Earth, and their data can be accessed very easily. Speckle removal is an essential task before further SAR data processing. Many algorithms for suppressing speckle have been proposed since 1980 [1].
Speckle can be suppressed by using multi-looking techniques or spatial and frequency averaging, by applying Bayesian inference or time-frequency analysis, by averaging neighboring pixels, and, more recently, by using convolutional neural networks. Speckle is multiplicative noise; therefore, despeckling models should estimate the speckle while preserving spatial features, edges, and strong scatterers. The estimated speckle should follow a Gamma distribution with a mean equal to 1.
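As a minimal illustration of this multiplicative model (assuming fully developed speckle in an L-look intensity image; the image size, intensity level, and look number below are arbitrary), Gamma-distributed speckle with unit mean can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle(intensity, looks=4, rng=rng):
    """Multiply a clean intensity image by fully developed speckle.

    For an L-look intensity image, speckle follows a Gamma distribution
    with shape L and scale 1/L, so its mean is 1 and its variance is 1/L.
    """
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * speckle, speckle

clean = np.full((256, 256), 100.0)            # homogeneous test scene
noisy, speckle = add_speckle(clean, looks=4)
print(round(speckle.mean(), 2), round(speckle.var(), 2))   # close to 1.0 and 1/L = 0.25
```

A larger number of looks reduces the speckle variance (1/L), which is exactly what multi-looking exploits at the cost of spatial resolution.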
Different despeckling methods exist in the literature, depending on how the SAR data are modeled. Adaptive spatial-domain filters, such as the Lee [2], Kuan [3], and Frost [4] filters, compute a weighted average of the central pixel intensity and the mean intensity of its neighboring pixels within a moving window. Enhanced versions of the Lee and Frost filters operate similarly but use the coefficient of variation to categorize pixels into homogeneous regions, heterogeneous regions, and isolated points [5].
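The weighted-average principle behind this family of filters can be sketched as a minimal Lee-style filter (assuming an L-look intensity image; the window radius and the integral-image box-filter helper are illustrative choices, not the original formulation):

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window, computed with an integral image."""
    k = 2 * r + 1
    p = np.pad(img, r, mode="edge")
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def lee_filter(intensity, looks=4, radius=2):
    """Minimal Lee-style filter for an L-look intensity image.

    The output blends the local mean and the central pixel; the weight on
    the central pixel grows where the local coefficient of variation
    exceeds that of pure speckle (Cu^2 = 1/L), i.e., where structure
    rather than homogeneous speckle is present.
    """
    cu2 = 1.0 / looks                            # squared speckle variation coefficient
    mean = box_mean(intensity, radius)
    var = np.maximum(box_mean(intensity ** 2, radius) - mean ** 2, 0.0)
    ci2 = var / np.maximum(mean ** 2, 1e-12)     # squared local variation coefficient
    w = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return mean + w * (intensity - mean)

rng = np.random.default_rng(0)
clean = np.full((128, 128), 100.0)
noisy = clean * rng.gamma(4, 0.25, size=clean.shape)   # 4-look speckle
filtered = lee_filter(noisy, looks=4)
print(filtered.var() < noisy.var())   # speckle variance is reduced
```

In homogeneous regions the weight approaches zero (output ≈ local mean), while near edges and point scatterers it approaches one (output ≈ original pixel), which is the categorization the enhanced filters in [5] make explicit.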
Model-based despeckling maximizes the a posteriori (MAP) probability density function (PDF), which combines the data PDF with a prior model [6]. Another group of despeckling techniques is based on the discrete wavelet transform (DWT): noise is reduced by thresholding the DWT coefficients of the log-transformed single-look image. Several methods based on first- and second-generation wavelet transforms apply thresholds within the wavelet subbands [7] or apply Bayesian inference to the subbands [8]. The authors of [9] proposed a weighted-average algorithm over similar pixel values, where pixel similarity is defined as the Euclidean distance between patches; a more general definition of pixel similarity, based on a noise distribution model, was also proposed in [9]. A scattering-based version of the SAR block-matching 3D (SARBM3D) filter [8] was presented in [10]: the authors modified the original algorithm from [8] to exploit the already available information on the imaged scene. The authors of [11] proposed a despeckling evaluation framework.
Deep learning has also been investigated for image-denoising tasks in recent years. Deep learning methods have shown considerable success in classification [12] and low-level computer vision problems, such as segmentation [13], denoising [14], and super-resolution [15]. State-of-the-art image restoration with high-quality results and real-time processing capability using a convolutional neural network (CNN) was proposed in [16]. Deep learning can also be applied to the denoising of optical images, mainly in the context of additive white Gaussian noise (AWGN). A dramatic improvement in denoising can be achieved using advanced regularization and learning methods, such as the Rectified Linear Unit (ReLU), batch normalization, and residual learning [17]. The authors of [14] presented a feed-forward denoising convolutional neural network (DnCNN) that follows a residual learning approach [17] paired with batch normalization (BN) [18]. Residual learning performs better because the network predicts the residual (i.e., noise) image instead of directly producing a noise-free image; this improves CNN training, since CNNs learn better when asked to produce an output that differs significantly from the input [14,19]. The authors of [20] proposed an image-despeckling convolutional neural network (ID-CNN), which assumes multiplicative speckle noise and recovers the filtered image through a component-wise division-residual layer.
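The division-residual idea can be sketched in a few lines. This is a conceptual illustration only, not the ID-CNN implementation: the true speckle field stands in for the network’s prediction, so recovery is exact here, whereas a trained network only approximates it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Multiplicative model: Y = X * N, with 4-look speckle N ~ Gamma(L, 1/L).
L = 4
clean = rng.uniform(50.0, 200.0, size=(64, 64))
speckle = rng.gamma(L, 1.0 / L, size=clean.shape)
noisy = clean * speckle

def division_residual(noisy, estimated_speckle, eps=1e-8):
    """Invert the multiplicative model by component-wise division.

    Given an estimate of the speckle field (produced by the CNN in
    ID-CNN; here the true field), the restored image is noisy / estimate.
    """
    return noisy / np.maximum(estimated_speckle, eps)

restored = division_residual(noisy, speckle)
print(np.allclose(restored, clean))   # exact recovery with a perfect estimate
```

This contrasts with additive residual learning, where the predicted residual is subtracted; for multiplicative speckle, the same effect is often obtained by working in the log-intensity domain, where the noise becomes additive.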
Recent research has proposed different machine learning approaches to processing SAR images, such as the SAR Dilated Residual Network (SAR-DRN) [21], which employs dilated convolutions and skip connections combined with residual learning; a CNN-based deep encoder–decoder U-Net architecture that captures speckle statistical features [22]; and a non-local despeckling method in which the weight of the target pixel is estimated using a convolutional neural network [23]. The despeckling of SAR images within the contourlet wavelet transform using a CNN-based structure was proposed in [24], and the CNN-based despeckling of polarimetric SAR data using a complex-valued CNN was proposed in [25]. More recently, CNNs have been augmented with attention mechanisms [26], where a residual attention network maps features better within an encoder–decoder network. A second-order channel attention mechanism that refines convolutional features using second-order statistics was proposed in [27]. Despeckling performance has been further improved by incorporating attention mechanisms [28,29,30,31]. Almost all despeckling methods introduce artifacts, which are consequences of either the image modeling or the spatial relations between image data. SAR despeckling evaluation procedures are well defined in [11].
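To make the channel attention idea concrete, the sketch below implements a plain first-order, squeeze-and-excitation-style gate in NumPy (not the second-order variant of [27]; the weight shapes and reduction ratio are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def channel_attention(features, w1, w2):
    """First-order channel attention over a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) weights of the two-layer gating MLP,
    where r is the channel reduction ratio.
    """
    # Squeeze: global average pooling per channel.
    z = features.mean(axis=(1, 2))                               # (C,)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))    # (C,)
    # Recalibrate: scale each channel by its gate.
    return features * s[:, None, None]

C, r = 8, 2
feats = rng.normal(size=(C, 16, 16))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
out = channel_attention(feats, w1, w2)
print(out.shape)   # (8, 16, 16)
```

The gate rescales whole channels, emphasizing informative feature maps and attenuating the rest; second-order variants replace the per-channel mean with statistics of the channel covariance.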
Recently published methods for SAR image despeckling have used convolutional neural networks [32,33,34,35,36]. The authors of [32] proposed a residual network known as SAR-DRDNet, which consists of non-local and detail-recovery parts and uses the global information of the SAR image together with the multiscale contextual information of its pixels. A wavelet-based thresholding method known as MSPB [33] uses the pixel neighborhood and a bilateral filter for noise suppression, together with an intelligent Bayesian thresholding rule. SAR-DDPM [34] is a denoising diffusion probabilistic model based on a Markov chain; the despeckled image is obtained through a reverse process that iteratively predicts the added noise using a noise predictor conditioned on the speckled image. The authors of [35] employed an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field; the proposed network consists of an overcomplete branch that focuses on local structures and an undercomplete branch that focuses on global structures. Transformer-based SAR image despeckling [36] comprises a Transformer-based encoder that allows the network to learn global dependencies between different image regions, aiding despeckling; the network is trained end-to-end on synthetically generated speckled images using a composite loss function. The SAR-CAM method [37] improves on the encoder–decoder CNN architecture by using various attention modules to capture multiscale information.
A self-calibrated dilated convolutional neural network for SAR image despeckling, called SARSCDCNN, was proposed in [38]; it consists of several self-calibrated blocks. Features are extracted in two branches, one representing the contextual features in the original space and the other the features in a long-range space, using a down-up sampling operation and convolutions with hybrid dilation rates. A despeckling approach based on multi-temporal features and similarity estimation was proposed in [39]: a single-image-capable method that exploits similarity-based block matching within one noisy SAR image together with a noise-referenced encoder–decoder convolutional neural network, using a Siamese network to share parameters between its two branches.
Traditional CNNs increase the receptive field size as the network grows deeper, thus extracting global features. However, speckle is a small-scale phenomenon, and increasing the receptive field alone does not help extract speckle features. Newly developed methods therefore incorporate newly designed components, such as spatial attention mechanisms and transformers, into deep convolutional networks.
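The receptive-field arithmetic behind this trade-off is easy to state. The sketch below (stride-1 layers assumed; the hybrid dilation schedule is an illustrative example of the kind used in dilated residual despeckling networks) shows how dilation enlarges the receptive field without extra depth:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolution layers.

    layers: sequence of (kernel_size, dilation) pairs.
    Each layer enlarges the receptive field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Seven plain 3x3 convolutions versus seven 3x3 convolutions with a
# hybrid dilation schedule (1, 2, 3, 4, 3, 2, 1).
plain = receptive_field([(3, 1)] * 7)
hybrid = receptive_field([(3, d) for d in (1, 2, 3, 4, 3, 2, 1)])
print(plain, hybrid)   # 15 33
```

The dilated stack more than doubles the receptive field at the same depth, which benefits global features but, as noted above, does not by itself help model the small-scale speckle.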
This paper presents two different approaches to the modeling of SAR data statistics, both using a CNN. The first method uses residual learning with additive noise, similar to [14]. The adaptation of optical image processing to SAR image processing is achieved by handling the multiplicative speckle noise and by using a dual (Siamese) CNN. Each sub-network has its own structure and loss function, which drives its output to be as similar as possible to the corresponding clean SAR image. The minimal configuration of each sub-network ensures a short training time with the proposed training database. The second method uses an encoder–decoder structure, similar to U-Net, and adds a multi-resolution attention mechanism for advanced speckle modeling. The attention mechanism consists of several additional sub-networks: a Dilated Residual Network (DRN), an Attention Supervision Network (ASN), and a Multi-resolution Attention Mechanism (MAM). The experimental results show that the proposed methods estimate speckle very well and provide visual results similar to those of the SARBM3D method on synthetic and real data. In addition, the proposed methods outperform the SAR-CAM and overcomplete neural network methods in both objective and subjective measurements.
5. Discussion
In this paper, we propose two CNN-based methods to improve on the overall performance of existing methods. The proposed methods differ in their network structures. The first exploits a Siamese structure with a DRN network; it introduces only a minor novelty over existing approaches because it “just” combines two existing methods. The second method follows recent trends in object detection and combines a U-shaped network with a multi-resolution attention mechanism. The methods were compared with the overcomplete CNN (OCNN) and SAR-CAM methods; the SAR-CAM method uses some of the components within the proposed network’s structure. We chose synthetic and real SAR images to compare the despeckling efficiency of the five methods. The results obtained on the synthetically generated images showed that the OCNN and SAR-CAM methods can estimate speckle noise, but they are not capable of preserving high SAR dynamics, such as point scatterers, edges, and changes in contrast, and they could not preserve textural features. The reason for the poor efficiency of the OCNN and SAR-CAM methods is that their authors converted SAR images into 8-bpp images, decreasing the SAR image dynamics; therefore, these methods have little practical value beyond competing in PSNR with existing methods. Considering the examples using synthetically generated images, we can conclude that the homogeneous areas were over-smoothed and some bias in the SAR image’s mean value was introduced by the OCNN and SAR-CAM methods. This may be a consequence of the fact that we did not scale the presented images to an 8-bit dynamic range. The SARBM3D, SNN, and ABCNN methods achieved very similar results in the objective and subjective measurements, as reported in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. The methods achieved different despeckling efficiencies due to their different network structures. The training of the network is also important, so that it covers diverse SAR features.
When used to despeckle the real SAR image, the proposed methods achieved results that were comparable to those of the SARBM3D method, more successfully modeled homogeneous areas and estimated speckle noise in homogeneous areas, and effectively preserved textural features. The OCNN and SAR-CAM methods over-smoothed the real SAR image, destroying the point features and textural features, as shown in the Experimental Results section.
Further work could include recent advances in SAR image processing, where the attention mechanism can take different structures and can be realized with transformer-based architectures. Recent advances have shown that standard approaches such as adaptive mean weighting and non-local means can be combined successfully with CNN-based methods. To improve efficiency further, it is important to find a mechanism that assigns higher importance to strong features and restrains the non-important ones.