3.3. Experiment Results
The proposed method is compared with bicubic interpolation, SNGAN, Real-ESRGAN and PDM-SR. SNGAN is a GAN-based method designed for SAR images and requires paired datasets, whereas Real-ESRGAN and PDM-SR are blind SR methods that do not need paired datasets. To compare these methods effectively, we trained SNGAN with the paired dataset and trained Real-ESRGAN and PDM-SR with the unpaired dataset. Finally, we tested all of the above methods on both synthetic and real SAR images.
Table 2 and Table 3 list the average PSNR, SSIM and ENL of several SR methods at a scale factor of ×4 on the synthetic and real SAR datasets. As listed in Table 2, the proposed method achieves the best ENL and the second-best PSNR and SSIM, behind only bicubic interpolation. Bicubic interpolation scores well on PSNR and SSIM because the input LR images are bicubic down-sampled from the reference HR images; however, better scores on these metrics do not guarantee better visual quality, as shown in Figure 6 and Figure 7. SNGAN, which is trained with synthetic paired data, performs worse than the other methods in terms of SSIM, which reveals the domain gap between real and synthetic SAR images. Real-ESRGAN obtains better results because it overcomes this domain gap by learning from real unpaired SAR images. Real-ESRGAN focuses on degradations such as blur, resizing, noise and JPEG compression, and it adopts a high-order degradation model. However, if these degradation designs do not match the SAR imaging mechanism, especially the SAR speckle noise mechanism, the method may not achieve good ENL on real SAR images. Thus, Real-ESRGAN performs better on the reconstruction metrics but worse on the noise-level metric. PDM-SR adopts a probabilistic degradation model for the noise module, which estimates the noise in a probabilistic way; thus, it achieves an ENL second only to the proposed method. The proposed method introduces several SAR noise priors into the degradation model in a probabilistic way and achieves the best denoising performance in terms of ENL.
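The ENL and PSNR values discussed above can be reproduced in a few lines of NumPy. The sketch below uses the common definitions: ENL is the squared mean divided by the variance of an intensity image (ideally measured on a homogeneous region), and PSNR is derived from the mean squared error against a reference. The function names, the 8-bit peak value of 255, and the Gamma speckle model used for the sanity check are assumptions for illustration, not the evaluation code behind Table 2 and Table 3.

```python
import numpy as np


def enl(region):
    """Equivalent number of looks: mean^2 / variance of an intensity region."""
    region = np.asarray(region, dtype=np.float64)
    return region.mean() ** 2 / region.var()


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value assumes 8-bit images."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Sanity check: a homogeneous region (constant reflectivity 100) with L-look
# speckle, modeled as multiplicative Gamma(shape=L, scale=1/L) noise with
# mean 1 and variance 1/L, should have an ENL close to L.
rng = np.random.default_rng(0)
looks = 4
noisy = 100.0 * rng.gamma(shape=looks, scale=1.0 / looks, size=(256, 256))
print(f"ENL of 4-look speckle: {enl(noisy):.2f}")
```

A higher ENL means a smoother homogeneous area, which is why effective despeckling pushes ENL upward, whereas PSNR only rewards closeness to the reference image.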
The proposed method jointly performs super-resolution reconstruction and despeckling, and thus inherits the advantages of both tasks. ENL measures the noise level, and the other methods do not fully account for noise; therefore, the proposed method, with its denoising function, obtains the best ENL. The input images of the synthetic paired dataset are obtained from the corresponding real high-resolution images by bicubic down-sampling, and some of the compared methods are trained on this synthetic data with a supervised strategy. In contrast, the proposed method targets real-scene SAR images and is trained entirely on real SAR images in an unsupervised way. Consequently, the former methods incorporate a deterministic bicubic degradation model during training, whereas the proposed method learns the implicit, probabilistic degradation that relates real LR and HR SAR images, which is not the bicubic degradation. Because the two kinds of methods learn different degradations, the synthetic paired dataset favors the methods trained on bicubic data: they obtain higher PSNR and SSIM because their SR mapping is learned as the inverse of exactly the bicubic down-sampling used to create the test inputs. However, this is not beneficial for the blind SR problem, because the degradation in real scenes is not bicubic. Therefore, the proposed method may not perform better on the synthetic dataset, in which the LR images are bicubic down-sampled from the corresponding HR images.
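The contrast between a fixed bicubic degradation and a probabilistic one can be made concrete with a small NumPy sketch. Here, average pooling stands in for the fixed bicubic kernel to keep the example dependency-free, and `probabilistic_degrade` is a hypothetical toy (random Gaussian blur width, down-sampling, then multiplicative Gamma speckle with a random number of looks). It is not the improved PDM used in this work; it only illustrates that under a stochastic degradation the same HR image can map to many different LR images.

```python
import numpy as np


def downsample(img, scale):
    """Average-pool by an integer factor (a stand-in for a fixed bicubic kernel)."""
    h, w = img.shape
    h, w = h - h % scale, w - w % scale
    img = img[:h, :w]
    return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding; output has the input shape."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def fixed_degrade(hr, scale=4):
    """Deterministic degradation: the same HR image always yields the same LR image."""
    return downsample(hr, scale)


def probabilistic_degrade(hr, rng, scale=4):
    """Hypothetical stochastic degradation: random blur, down-sampling, speckle."""
    sigma = rng.uniform(0.5, 2.0)
    looks = rng.choice([1, 2, 4])
    lr = downsample(gaussian_blur(hr, sigma), scale)
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=lr.shape)
    return lr * speckle


hr = np.random.default_rng(1).uniform(50.0, 150.0, size=(64, 64))
a, b = fixed_degrade(hr), fixed_degrade(hr)
p1 = probabilistic_degrade(hr, np.random.default_rng(2))
p2 = probabilistic_degrade(hr, np.random.default_rng(3))
print(np.array_equal(a, b), np.allclose(p1, p2))  # True False
```

A network trained only on pairs produced by `fixed_degrade` learns to invert one specific mapping, which is why it can score well on test inputs generated the same way yet struggle when the real degradation behaves more like `probabilistic_degrade`.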
However, SR is an ill-posed problem, and no single criterion can fully evaluate SR performance. Moreover, it is unfair to judge blind SR methods by reference-based metrics, because the other methods used the reference HR images during training, while the blind SR methods did not. Therefore, we evaluate the SR results comprehensively by combining visual perception and metrics. On the real SAR image dataset, the proposed method also achieves the best ENL among the compared methods.
To further demonstrate the advantages of our method, we select several visualization results, shown in Figure 6 and Figure 7.
Figure 6 shows the SR results for several typical targets, such as buildings, flat surfaces and ships, in the synthetic SAR dataset. To compare the details of SR performance, we provide enlarged views of the areas marked by yellow rectangles. Visually, the proposed method shows strong capability in both denoising and SR. Figure 6 presents three sets of images processed by the different methods, arranged by column, and the areas in the yellow boxes are enlarged in the second row to show more detail. Compared with interpolation and SNGAN, which require paired datasets, the proposed method clearly exposes the buildings in Figure 6 by suppressing the background noise, which indicates that it improves the denoising of the SR images compared with the other SR methods, as it also does on the real images. At the same time, the proposed method restores the texture of buildings and roads well. Real-ESRGAN also achieves good denoising, but some of its results contain artifacts and strange patterns that are inconsistent with the real scenes. In contrast, the proposed method restores continuous lines from corrupted images without artificial patterns, as shown in the red box in the second row, which is conducive to subsequent image interpretation. In the second example, the proposed method suppresses the noise near the strong scattering target and retains the target's scattering elements while suppressing the cross-shaped bright spots, which may mitigate the cross bright spots that pulse-compression processing produces in SAR images. In the third example, the original image contains heavy noise in flat areas, which is suppressed after the proposed SR process. The enlarged parts in the second row demonstrate that the proposed method suppresses granular noise while retaining texture.
In Figure 7, we select several scenes from real SAR images to illustrate the generalization capability of the proposed method on real-scene images. The proposed method also reduces the impact of the sinc-shaped bright spots caused by strong scattering targets. As shown in the first example, the oil tanks are affected by noise around the strong scattering area that resembles a two-dimensional sinc function, and the proposed method suppresses this sinc-like noise by suppressing the side lobes. In the second example, the proposed method also produces the image with the least noise in both the target areas and the background. Meanwhile, GAN-based methods often produce artifacts and nonexistent textures due to the arbitrariness of the generator. As depicted in the third example, SNGAN and Real-ESRGAN generate strange textures near the oil tanks, while the proposed method suppresses the speckle noise without artifacts. In brief, the proposed method retains the texture of the original images and suppresses noise effectively.
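The cross-shaped bright spots and sinc-like side lobes mentioned above follow from the separable point target response of a focused SAR image. The sketch below assumes an idealized, unweighted response sinc(x)·sinc(y) (real processors usually apply tapering windows that lower the side lobes) and measures the peak side lobe level along one image axis and along the diagonal. The strong side lobes lie on the two axes, which is what produces the cross.

```python
import numpy as np


def point_target_response(half_width=5.0, n=2001):
    """Ideal 2-D point target response: separable range x azimuth sinc."""
    x = np.linspace(-half_width, half_width, n)
    return x, np.outer(np.sinc(x), np.sinc(x))  # np.sinc(x) = sin(pi x) / (pi x)


def peak_sidelobe_db(profile, coord):
    """Highest side lobe relative to the peak, in dB (main lobe is |coord| < 1)."""
    outside_main_lobe = np.abs(coord) > 1.0
    peak = np.max(np.abs(profile))
    return float(20.0 * np.log10(np.max(np.abs(profile[outside_main_lobe])) / peak))


x, psf = point_target_response()
center = len(x) // 2
axis_db = peak_sidelobe_db(psf[center, :], x)  # cut along one axis through the peak
diag_db = peak_sidelobe_db(np.diag(psf), x)    # cut along the diagonal
print(f"on-axis side lobe: {axis_db:.2f} dB, diagonal side lobe: {diag_db:.2f} dB")
```

The on-axis side lobes sit at roughly -13.3 dB, while the diagonal ones are the product of two side lobes at roughly -26.5 dB, so a bright scatterer leaks energy mainly along the range and azimuth lines.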
To further illustrate the despeckling effect, we compare our method with different kinds of despeckling methods: the spatial-domain filters of Kuan [59] and Frost [41], the non-local means (NLM) method [60] and SAR-BM3D [61], and the deep-learning-based method DnCNN [16]. For a fair comparison, the LR images are first upsampled by bicubic interpolation before being processed by these despeckling methods.
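As a reference point for the classical baselines, the Kuan filter [59] can be sketched in its commonly cited form: each pixel is replaced by R = Ī + W(I − Ī), with weight W = (1 − C_u²/C_I²)/(1 + C_u²), where Ī and C_I are the local mean and local coefficient of variation and C_u² = 1/L for L-look intensity speckle. The window size, the number of looks, and the clipping of W to [0, 1] below are assumptions; the settings used in the comparison are not stated in this section.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_var(img, win):
    """Local mean and variance over a win x win window (reflect padding)."""
    pad = win // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), (win, win))
    return windows.mean(axis=(-2, -1)), windows.var(axis=(-2, -1))


def kuan_filter(img, looks=1.0, win=7, eps=1e-12):
    """Kuan speckle filter for an intensity image (win must be odd)."""
    img = np.asarray(img, dtype=np.float64)
    mean, var = local_mean_var(img, win)
    cu2 = 1.0 / looks                  # squared speckle coefficient of variation
    ci2 = var / (mean ** 2 + eps)      # squared local coefficient of variation
    weight = (1.0 - cu2 / (ci2 + eps)) / (1.0 + cu2)
    weight = np.clip(weight, 0.0, 1.0)  # homogeneous areas fall back to the local mean
    return mean + weight * (img - mean)


rng = np.random.default_rng(0)
noisy = 100.0 * rng.gamma(shape=1.0, scale=1.0, size=(128, 128))  # 1-look speckle
filtered = kuan_filter(noisy, looks=1.0, win=7)
print(noisy.mean() ** 2 / noisy.var(), filtered.mean() ** 2 / filtered.var())
```

Because W drops toward 0 in homogeneous areas and rises toward 1/(1 + C_u²) near edges, the filter smooths flat regions strongly while partially preserving edges; the smoothing of flat regions is what raises ENL.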
Figure 8 shows that the proposed method achieves the best ENL among these mainstream despeckling methods.
To illustrate the stability of the proposed method under different noise intensities, we select real-scene SAR images with noise of different intensities. As depicted in Figure 9, SAR images with lower noise intensity (higher ENL) obtain better despeckling performance, whereas images with lower ENL gain less improvement in ENL because of their inherently severe noise. Visually, row b of Figure 9 shows that the processed images are also well despeckled.
Additionally, we find that the proposed method has great potential for target detection, because it exposes targets by generating details and eliminating noise. The proposed method effectively improves mAP and reduces the number of training epochs required, which indicates that SR pre-processing can significantly reduce missed detections and false alarms. We train a YOLOv5-based target detection algorithm with and without the SR pre-processing, keeping all other parts of the algorithm identical. The model is trained on the MSAR [62] dataset, which contains 28,449 SAR images of ships, bridges, planes and oil tanks.
Table 4 shows the details of the MSAR dataset. MSAR contains more categories than other mainstream SAR target detection datasets, and it remains challenging owing to severe noise and unbalanced samples, as shown in Figure 10.
The metrics used to measure detection performance mainly include precision (P), recall (R) and mean average precision (mAP), which refer, respectively, to the proportion of correctly recognized samples among all samples detected as positive, the proportion of correctly recognized samples among all ground-truth samples, and a comprehensive metric calculated from P and R. The definitions are as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},$$

$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i,$$

where TP, FP and FN refer to positive samples predicted as positive, negative samples predicted as positive and positive samples predicted as negative, respectively, and N is the number of target categories.
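The P, R and AP definitions above can be evaluated from a ranked list of detections. The sketch below assumes each detection has already been matched to the ground truth (for example by an IoU threshold, not shown) and computes AP as the area under the precision envelope, i.e. all-point interpolation. YOLOv5's own evaluation code may interpolate differently, so this illustrates the definitions rather than the tool that produced the reported numbers; the function names are hypothetical.

```python
import numpy as np


def precision_recall_ap(scores, is_tp, num_gt):
    """P and R at the lowest score threshold, plus all-point interpolated AP."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64), kind="stable")
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)  # P = TP / (TP + FP)
    recall = cum_tp / num_gt                # R = TP / (TP + FN), TP + FN = num_gt
    # Precision envelope: the best precision achievable at this recall or higher.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    ap = float(np.sum(recall_steps * envelope))  # area under the stepwise P-R curve
    return float(precision[-1]), float(recall[-1]), ap


def mean_ap(per_class_ap):
    """mAP: the average of the per-category AP values."""
    return float(np.mean(per_class_ap))


# Four detections of one class, ranked by confidence; 4 ground-truth objects.
p, r, ap = precision_recall_ap([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4)
print(p, r, ap)  # 0.75 0.75 0.625
```

The envelope step is what makes AP insensitive to the small precision dips that occur whenever a false positive is ranked between true positives.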
We combine YOLOv5 with the proposed SR processing, which is applied as a pre-processing step before the detection model. The SR processing substantially improves detection. Figure 11 shows the training curves of the two training settings, and Table 5 and Table 6 compare the metrics for each target category. According to the training curves, the SR process raises mAP to 0.838 in fewer than 100 epochs, whereas the baseline reaches only 0.687 within 500 epochs, which indicates that the proposed SR process significantly improves both the efficiency and the accuracy of the target detection task. Table 5 and Table 6 also show that the metrics of most categories improve to varying degrees, while the decrease in mAP for the bridge category is mainly caused by the unbalanced number of samples. The substantial improvement is mainly attributable to the extra information provided by the SR process, which recovers useful detail from the LR images and eliminates noise interference. By exposing targets through detail generation and noise removal, the SR process significantly reduces missed detections and false alarms.
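Using ×4 SR as a pre-processing step also changes the image size seen by the detector, so the annotations must match the super-resolved images. This section does not state how the MSAR labels were handled; the sketch below only illustrates, assuming labels are stored in the YOLO convention (class, normalized center x, center y, width, height), that normalized boxes are unchanged by the resize while pixel-coordinate boxes scale by the SR factor. The helper names and the 256 x 256 LR image size are hypothetical.

```python
import numpy as np

SR_SCALE = 4  # the x4 scale factor used in the SR experiments


def scale_pixel_boxes(boxes_xyxy, scale=SR_SCALE):
    """Map (x1, y1, x2, y2) pixel boxes from the LR image onto the SR image."""
    return np.asarray(boxes_xyxy, dtype=np.float64) * scale


def to_yolo_normalized(boxes_xyxy, img_w, img_h):
    """Convert pixel (x1, y1, x2, y2) boxes to normalized (xc, yc, w, h)."""
    b = np.asarray(boxes_xyxy, dtype=np.float64)
    xc = (b[:, 0] + b[:, 2]) / 2.0 / img_w
    yc = (b[:, 1] + b[:, 3]) / 2.0 / img_h
    w = (b[:, 2] - b[:, 0]) / img_w
    h = (b[:, 3] - b[:, 1]) / img_h
    return np.stack([xc, yc, w, h], axis=1)


lr_boxes = [[10, 20, 30, 60]]           # one box on an assumed 256 x 256 LR image
sr_boxes = scale_pixel_boxes(lr_boxes)  # the same box on the 1024 x 1024 SR image
print(sr_boxes)
print(to_yolo_normalized(lr_boxes, 256, 256))
print(to_yolo_normalized(sr_boxes, 1024, 1024))
```

Under this assumption, normalized label files can be reused unchanged for the SR images, whereas any tooling that works in pixel coordinates must multiply boxes by the SR factor.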
To further illustrate how the SR process improves detection by reducing missed detections and false alarms, we select several detection results from the two settings, shown in Figure 12 and Figure 13. The results with SR pre-processing clearly contain fewer missed detections and false alarms. The LR images contain plenty of strong local noise, which can easily be falsely recognized as targets. As shown in Figure 12, several ships and planes are falsely detected on land. Additionally, heavy noise can drown out targets, as shown in Figure 13, where an oil tank and several ships are missed. The SR images processed by the proposed method have lower noise and clearer targets, which is conducive to various interpretation tasks.
To compare the proposed method with other detection methods on this challenging dataset, we use P, R, mAP and the processing time per image as indicators. Table 7 shows the performance of these methods; the results of the first three methods, trained on the MSAR dataset, are reported in [62]. The proposed method greatly improves detection accuracy with only a small increase in time consumption.
3.4. Ablation Results
To illustrate the effectiveness of the proposed method, we design ablation experiments with and without each of its components. We mainly explore the effects of the training data (paired or unpaired), the SR model (EDSR or RRDBnet), the degradation model (original or improved PDM), the kernel design (whether the kernel size of the convolution layers is adjusted) and the loss (with or without perceptual loss). The details of the ablation results are listed in Table 8, in which the highest score is marked in bold red and the second-highest score in bold black.
Across the five sets of comparisons, the proposed method does not achieve the highest score on every metric simultaneously, but it achieves the best balance among the metrics and visual perception, while obtaining the second-best scores. For the training data, using unpaired data does not improve reference-based metrics such as PSNR and SSIM, but the model tends to learn the intrinsic relationships within the real data, which is reflected in improved ENL. For the degradation model, the improved PDM greatly improves image quality because it introduces SAR prior information into the model, which yields better SR from the same training data. For the SR model, RRDBnet is an improved super-resolution network with stronger learning ability; therefore, it obtains better results. For the kernel design, adopting the improved kernel module design tailored to the noise and image characteristics without the perceptual loss greatly improves ENL and reduces noise, but the quality of the image structure cannot be guaranteed, as reflected in image blurring and thickened lines. The perceptual loss prevents the GAN model from learning arbitrarily, so that the visual appearance of the images is closer to what human observers find most acceptable; accordingly, the metrics improve noticeably compared with the baseline method.