1. Introduction
Hyperspectral images (HSIs) have important applications in many fields, such as remote sensing [1,2], food safety [3,4], astronomy [5], medicine [6], and agriculture [7,8]. However, during the imaging process, interference from complex human factors and the natural environment, such as illumination, means that the collected HSIs often contain various kinds of noise (e.g., Gaussian noise). Therefore, much work has aimed at improving the quality of hyperspectral remote sensing images, for example, through pansharpening [9,10,11], super-resolution [12], and denoising [13].
Most successful traditional HSI denoising methods are based on strong prior knowledge, such as low-rank representation [14,15,16,17,18], sparse coding [19,20,21,22,23], and global correlation along the spectrum [24,25]. With the development of deep learning (DL), DL-based methods, such as those using convolutional neural networks (CNNs), have drawn more and more attention [13,26,27,28]. To achieve a good denoising effect, DL-based methods need a large number of training samples to learn their network parameters. However, the currently widely used datasets (ICVL [29], Pavia [30], etc.) contain a limited number of training samples because HSIs are more challenging to obtain than RGB images. Therefore, we aim to extend data augmentation to the task of HSI denoising, generate new samples that give the network more positive feedback, and thereby further improve its denoising performance.
Data augmentation (DA) is an effective way to improve performance in machine learning without increasing the computational cost; it can improve model robustness and reduce the model's sensitivity to the data at the same time. The core idea of most DA methods is to partially block or obfuscate the training sample so that the model gains greater generalization ability. The most commonly used geometric transformations are flipping, rotation, cropping, scaling, translation, and so on. Combined with deep neural networks, DA strategies have been successfully applied in high-level vision tasks, such as image classification [31,32,33,34,35,36,37,38] and object recognition [39,40]. Some typical operations include feature space augmentation [41,42], adversarial training [43,44,45], and so on. However, it has been found that most existing DA methods lead to the loss or confusion of spatial information between pixels when applied directly to low-level vision tasks such as HSI denoising. Unlike in high-level tasks, the relationships between pixels play an important role in low-level vision tasks: sharp transitions, mixed image content, and a lack of pixel relationships can all degrade model performance. Therefore, such DA methods hinder the model's ability to recover images and cannot be used directly for low-level tasks.
Many studies have sought to alleviate the limitations of DA in low-level vision tasks [46,47]. Radu et al. [48] used simple geometric manipulations, such as rotation and flipping, to improve the performance of single-image super-resolution (SISR); this is the most basic form of DA. Yoo et al. [47] further proposed CutBlur for the super-resolution of ordinary color images, which brought a further improvement. CutBlur introduces parts of the high-resolution image into the low-resolution image by replacing low-resolution patches with the corresponding high-resolution patches, which provides a beneficial regularization effect for model training and minimizes boundary effects. They also explored the possibility of applying this method to other low-level vision tasks.
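The CutBlur operation described above can be sketched as follows. This is an illustrative NumPy version only: the rectangle-size ratio and the (bands, H, W) array layout are our assumptions, not the settings of the original paper.

```python
import numpy as np

def cutblur(degraded, clean, ratio=0.4, rng=None):
    """CutBlur sketch: cut a random rectangle from the high-quality image
    and paste it into the degraded one. 'ratio' (relative side length of
    the cut region) and the (bands, H, W) layout are illustrative."""
    rng = rng or np.random.default_rng()
    h, w = degraded.shape[-2:]
    ch, cw = int(h * ratio), int(w * ratio)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    out = degraded.copy()
    # inside the rectangle the sample now carries clean-image content
    out[..., y:y + ch, x:x + cw] = clean[..., y:y + ch, x:x + cw]
    return out
```

Outside the pasted rectangle, the sample is unchanged, which is why the method preserves pixel relationships better than pixel-erasing augmentations.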
However, the above methods are still not directly applicable to HSI denoising. Although CutBlur, which pastes high-resolution and low-resolution regions into each other, has improved the performance of super-resolution models, there is still room for improvement. After copying and pasting between a clean image and a noisy image with CutBlur, the noise in the newly generated samples clusters in one region, which prevents the network from fully learning the difference between clean and noisy images. We want to refine this difference so that some parts of the newly generated samples are more noticeable and can be learned well.
Motivated by this, in this work, we designed a new DA method named PatchMask for HSI denoising. First, the noisy and clean images are segmented into patches; then, a certain number of noisy patches are randomly selected and exchanged with the clean patches at the corresponding positions to generate two new training samples, each containing part of the noisy image and part of the clean image. Through our PatchMask method, the network learns not only the presence and intensity of the noise, but also which noisy regions deserve more attention. Our main contributions are as follows:
Few existing DA methods are explicitly designed for HSIs; the proposed method combines the characteristics of HSIs and is therefore more advantageous for this task. With our PatchMask, the difference between clean and noisy samples can be learned more precisely, and more attention is paid to the noisy areas.
Our PatchMask method was applied to several HSI denoising models and achieved good performance in the presence of Gaussian noise. Extensive experiments on the ICVL and CAVE datasets show that our method can improve the performance of multiple networks and has a certain universality.
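For concreteness, the patch-swapping operation described in the introduction can be sketched as follows. This is a minimal NumPy illustration; the grid size, swap ratio, and (bands, H, W) layout are our own placeholder choices, not the exact settings of Section 3.

```python
import numpy as np

def patch_mask(clean, noisy, grid=4, swap_ratio=0.5, rng=None):
    """PatchMask sketch: split both HSI cubes into a grid x grid layout of
    patches, randomly pick a subset of positions, and exchange the patches
    there between the clean and noisy cubes. Returns two complementary
    new training samples."""
    rng = rng or np.random.default_rng()
    assert clean.shape == noisy.shape  # (bands, H, W)
    _, h, w = clean.shape
    ph, pw = h // grid, w // grid
    new_a, new_b = clean.copy(), noisy.copy()
    n_swap = int(grid * grid * swap_ratio)
    idx = rng.choice(grid * grid, size=n_swap, replace=False)
    for k in idx:
        i, j = divmod(k, grid)
        ys, xs = slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw)
        new_a[:, ys, xs] = noisy[:, ys, xs]  # clean cube receives noisy patches
        new_b[:, ys, xs] = clean[:, ys, xs]  # noisy cube receives clean patches
    return new_a, new_b
```

Because the two outputs are complementary, every patch position appears once as clean and once as noisy across the pair, so the pair as a whole carries the full clean/noisy contrast.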
This paper is organized as follows. Section 2 gives a brief review of HSI denoising methods and DA methods. In Section 3, the new DA method, PatchMask, is described in detail. In Section 4, we describe the extensive experiments conducted to demonstrate the effectiveness of our method. Finally, Section 5 concludes the paper.
4. Experiments and Results
In this section, we describe our experimental procedure and present the results. To demonstrate the effectiveness of our method, we conducted comparative experiments against other DA methods on a common dataset. In addition, to demonstrate the effect of our method on networks of different sizes, we selected several networks with different numbers of parameters for testing. We also tested different application scales to illustrate how adding samples through DA affects the network.
4.1. Comparisons with Other Methods
To demonstrate the effectiveness of our method, we selected the ICVL dataset and the QRNN3D [60] network for testing; the test results are shown in Table 1. Both the original CutBlur method and our newly proposed PatchMask method achieved better performance than the baselines. The main reason is that both CutBlur and PatchMask retain the contextual content of the original image and do not cause excessive semantic loss; they only change the distribution of the noise or overlay a mask on the original image. This increases the variety of the data and the number of intermediate samples between clean and noisy images. At the same time, our proposed method enables the high-frequency information in some patches to be sufficiently trained. In this experiment, we tried to keep the parameters of the two DA methods consistent: for CutBlur, we followed the parameter settings used in the original paper, and the parameters of our method were chosen to better compare the impacts of the two methods on the network.
As shown in Figure 5, we also tested other DA methods that are commonly used on RGB images, such as Cutout [65]. Unlike in tasks such as classification and recognition, removing a large number of pixels causes difficulties for restoration, and the recovered results differed significantly from the original image. Here, we set the number of erased blocks to 1 and the block size to 2. The network results became worse after the loss of the original pixels, mainly because the network received no response for the missing pixels, which degraded its performance in the image recovery task. In addition, we conducted experiments on another DA method, Mixup [67]. In our experiments with Mixup, we found that the network performance was also degraded to some extent, mainly because the mixing operation confuses the information between the bands of the clean and noisy images.
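For reference, minimal sketches of the two compared operations are given below. The parameter values are illustrative (both methods were originally proposed for RGB classification tasks, with their own settings):

```python
import numpy as np

def mixup(x1, x2, lam=0.7):
    """Mixup sketch: a convex combination of two samples. For denoising,
    this mixes clean and noisy content within every band, which is what
    confuses the network. 'lam' is an illustrative mixing weight."""
    return lam * x1 + (1.0 - lam) * x2

def cutout(x, size=8, rng=None):
    """Cutout sketch: erase one square region of the input. The erased
    pixels are unrecoverable, which is what harms pixel-level
    restoration tasks. 'size' is illustrative."""
    rng = rng or np.random.default_rng()
    h, w = x.shape[-2:]
    y = rng.integers(0, h - size + 1)
    x0 = rng.integers(0, w - size + 1)
    out = x.copy()
    out[..., y:y + size, x0:x0 + size] = 0.0
    return out
```

In contrast to PatchMask, neither operation preserves a valid clean/noisy correspondence at every pixel: Mixup blends the two signals, and Cutout destroys pixels outright.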
4.2. Comparisons on Benchmark Datasets
4.2.1. ICVL Dataset
HSIs are difficult to obtain experimentally; therefore, we used the ICVL [29] dataset. The ICVL dataset is an HSI set acquired using a Specim PS Kappa DX4 hyperspectral camera and a rotating stage for spatial scanning. It includes 200 images, each with 31 spectral bands. For the accuracy of the experiment, we used 100 images as the training set and another 50 as the test set, and the spatial size of the inputs to the network was cropped uniformly. Figure 6 shows RGB renderings of the HSI dataset.
4.2.2. CAVE Dataset
The CAVE dataset is a database of hyperspectral images acquired to simulate a generalized assorted pixel (GAP) camera; the entire dataset contains 32 hyperspectral images of different scenes. As shown in Figure 7, these images cover a wide range of real-world materials and objects, and each image includes full-spectral-resolution reflectance data in 31 bands from 400 to 700 nm in 10 nm steps.
To demonstrate the generalization ability of the proposed method, we also tested it on this dataset. To ensure the accuracy of the experimental results, the experimental setup was the same as that described in Section 4.1; we only replaced the ICVL dataset with the CAVE dataset, keeping QRNN3D as the network. The improvement here was less significant than on the ICVL dataset, mainly because of the difference in data volume: the ICVL training set contained 100 images, while the CAVE training set contained only 26. Nevertheless, our approach was equally effective on the CAVE dataset. Please see Figure 8 and Table 2 for detailed experimental results and data.
4.3. Implementation Details
In this section, we describe the implementation details of the experiments. We used two datasets, ICVL [70] and CAVE [71]. We chose the ICVL dataset for most of the experiments due to its large data volume and high image quality; from it, we chose 100 images as the training set and 50 images as the test set. The CAVE dataset was used only to demonstrate the generalization ability of our method. Notably, all images in both datasets have 31 bands.
The QRNN3D [60] network has shown good performance in experiments as a network dedicated to hyperspectral denoising; thus, we used it as the benchmark network. In our experiments, we added Gaussian noise to the dataset and trained each network with its original loss function, following the original papers. The input images were uniformly cropped for training, and all band information was retained. Meanwhile, we used the Adam optimizer with a cosine annealing learning-rate schedule [72] for training.
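The training-sample preparation just described (spatial crop, all bands kept, additive Gaussian noise) can be sketched as follows. The crop size and noise level below are placeholders, since the exact values were fixed in the setup above:

```python
import numpy as np

def make_training_pair(hsi, crop=64, sigma=0.1, rng=None):
    """Crop a (bands, H, W) cube to crop x crop spatially, keeping all
    bands, and add i.i.d. Gaussian noise to form a (clean, noisy)
    training pair. 'crop' and 'sigma' are illustrative values."""
    rng = rng or np.random.default_rng()
    _, h, w = hsi.shape
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    clean = hsi[:, y:y + crop, x:x + crop]
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return clean, noisy
```

Keeping every band in the crop matters because the networks under test (particularly the 3D-convolutional ones) exploit inter-band correlation.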
In addition, we used two common metrics, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), to evaluate performance. The PSNR describes the ratio between the maximum possible power of a signal and the power of the corrupting noise and is commonly employed to measure the reconstruction quality of images and videos. The SSIM treats structural information as a feature of the scene-wide structure of objects, independent of luminance and contrast, and models distortion as the interaction of these three elements. Both metrics are used in most hyperspectral denoising work; see, e.g., SDeCNN [26] and SSDRN [73]. Therefore, to ensure valid and fair comparisons, we used these two most common metrics for evaluation. In addition, all experiments were performed on the same NVIDIA GeForce RTX 3090 (24 GB) GPU for fair comparison.
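As a reference point, a band-averaged PSNR, a common convention in HSI denoising (per-band PSNR, then the mean over bands), can be computed as follows; the unit data range is an assumption about how the images are normalized:

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Band-averaged peak signal-to-noise ratio in dB for (bands, H, W)
    cubes: PSNR is computed per band and then averaged, as commonly
    reported for HSI denoising. 'data_range' assumes [0, 1] data."""
    vals = []
    for b in range(ref.shape[0]):
        mse = np.mean((ref[b] - test[b]) ** 2)
        vals.append(10.0 * np.log10(data_range ** 2 / mse))
    return float(np.mean(vals))
```

SSIM involves local luminance, contrast, and structure comparisons and is longer to write out; library implementations (e.g., scikit-image's `structural_similarity`) are typically used in practice.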
4.4. Comparison of Different Models
Generally speaking, the more parameters a model has, the more beneficial it is for learning: a larger parameter count means a larger model capacity and more that can be learned. We selected five networks with different numbers of parameters (DnCNN [61], CBDNet [74], HSID-CNN [28], MPRnet [62], and QRNN3D [60]). To fairly compare the impact of the DA method across these networks, we applied DA with the same parameter settings in all cases.
Below, we briefly introduce the five networks mentioned above, describe the modifications we made in our experiments so that each network could handle HSIs, and then show the results of our method on the different models.
DnCNN: The structure is shown in Figure 9. This network was the first to use residual learning for noise reduction. By combining residual learning and batch normalization (BN), the training of the denoising model can be greatly improved and accelerated. For a specific noise level, DnCNN achieves outstanding visual quality and evaluation scores. For this network, we adjusted the input and output channels uniformly to 31 to accommodate the large number of bands in HSIs. It should be noted that this network does not attend to the information between different bands and therefore has certain limitations for HSIs.
MPRnet: This is a progressive multi-stage network. As shown in Figure 10, the first two stages of the network adopt the U-Net structure, and many attention modules are embedded in the network: each stage first passes through a channel attention block, and the skip connections of the U-Net also contain a channel attention block (CAB) module. In addition, a supervised attention module (SAM) introduces supervision information between stages, and cross-stage feature fusion (CSFF) is performed in the encoder–decoder part of the U-Net to better preserve contextual information. We also set the input and output channels to 31 to accommodate the number of bands in the dataset.
CBDNet: This model is from a CVPR 2019 paper and reached state-of-the-art (SOTA) performance on the DND dataset at the time. It is geared toward removing noise from real environments, and the whole network has two components: a noise estimation sub-network and a non-blind denoising sub-network that removes noise of unknown levels. In that work, synthetic noisy images and real-world noisy images were both used to train the network, allowing it to represent the noise in real-world images and improve denoising performance. The network structure is shown in Figure 11.
HSID-CNN: Targeting the high redundancy and correlation of information in HSIs, this network performs spatial–spectral joint processing at the input of a convolutional neural network. An end-to-end nonlinear mapping from noisy images to clean images is realized with deep convolutional neural networks, which overcomes the inflexibility of other methods. The network uses multi-scale feature extraction and multi-level representation to obtain multi-scale spatial–spectral features and fuses the different features for restoration, thereby achieving better performance. The network structure is shown in Figure 12.
QRNN3D: This is an alternating-direction 3D recurrent neural network for HSI denoising that effectively exploits structural spatial–spectral correlation and global correlation information along the spectrum. The alternating-direction structure removes causal dependencies without adding extra computational cost. The model can capture spatial–spectral dependence while remaining flexible for HSIs with arbitrary numbers of bands. The network structure is shown in Figure 13.
As shown in Table 3, the performance of all of the above models improved after applying our DA method. However, for models with simpler structures, the improvement was limited, mainly because there is not much information that a simple model can learn, and the added DA samples made it harder for such models to adapt to the change. Moreover, 3D convolutions are very useful for hyperspectral denoising tasks: in Table 3, QRNN3D achieved good performance with fewer parameters because 3D convolution extracts inter-band information more efficiently. The experimental outcomes are shown in Figure 14.
4.5. Ablation Study
4.5.1. Proportion of Newly Generated Samples
The number of samples generated through DA should be investigated. If too many samples are generated through DA, the learning of the network will be biased toward the new samples rather than the original ones. Conversely, if too few samples are generated, the network cannot learn the differences between the new samples and the original samples. Therefore, we designed a set of experiments to verify how different scales of augmented data affect the network. For experimental accuracy, the remaining parameters were fixed empirically at the values that gave the best performance.
From Table 4, we can see that the network performed best when the number of new samples reached 30% of the original dataset. As the proportion of new samples increased further, the performance of the network declined somewhat. We believe that the proportion of original-sample information learned by the network decreased as the proportion of new samples increased; thus, the performance of the network model also decreased. In Figure 15, we can also see that the images were reconstructed best with 30% new samples.
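Growing the training set by a fixed fraction of augmented pairs, as in the experiment above, could look like the following sketch (the helper names and the callback signature are our own illustration, not the released code):

```python
import numpy as np

def extend_with_augmented(train_pairs, make_augmented, fraction=0.3, rng=None):
    """Append augmented samples amounting to 'fraction' of the original
    training set, built from a random subset of the originals.
    'make_augmented' is a caller-supplied callback (e.g., PatchMask)."""
    rng = rng or np.random.default_rng()
    n_new = int(len(train_pairs) * fraction)
    idx = rng.choice(len(train_pairs), size=n_new, replace=False)
    return list(train_pairs) + [make_augmented(train_pairs[i]) for i in idx]
```

Because the original samples are kept intact and only a bounded fraction of augmented ones is appended, the network still sees the original data distribution at every epoch.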
4.5.2. Total Number of Patches—α
The parameter α plays an important role in the proposed PatchMask DA method; it is the total number of patches into which the image is divided. The greater the total number of patches, the smaller the area of each patch and the finer the regions that the patches can form, making it possible to fit more complex textured areas of the image. In this way, more complex regions are trained, and the network performance improves. To test this conjecture, we conducted a set of experiments on α; the experimental results are shown in Table 5 and Figure 16.
4.5.3. The Ratio of Patch Swaps
Another key parameter is the proportion of patches that are swapped. Our DA method generates two new complementary samples. As shown in Table 6, the performance dropped significantly when the exchange ratio was too low. With a low swap ratio, the number of noisy patches decreases, and the probability that noisy patches fall in information-dense regions decreases further. The envisaged effect, namely, covering areas with complex textures, where denoising is harder, with noise masks, is then lost, resulting in some performance degradation. The visualization results of the experiment are presented in Figure 17, where the image reconstruction preserves details better at the chosen swap ratio.
The experimental results are shown in Table 6. This part of the experiment was performed with the other parameters kept the same: we set the proportion of the added samples to 30% of the original dataset, and the α parameter described above was set to 16. In addition, we again chose QRNN3D as the model and ICVL as the dataset, and during training, we randomly added Gaussian noise to the dataset.
4.6. Convergence Analysis
To prove that our DA method did not cause divergence in the original network, we show a comparison of the training loss curve obtained with our method. The abscissa in
Figure 18 represents the training epoch, and the ordinate represents the training loss. As shown in
Figure 18, the early stage of training (before epoch = 15), shown with the red curve (without our method), had consistently lower loss values than the blue curve (with our method), and the training loss curve decreased faster than the blue curve. The main reason for our analysis is that when the DA method was not used, the network needed to learn less content, and the network did not have to learn the changes in the noise distribution after DA and the parts that required the network to pay attention. Therefore, the convergence rate of the network was faster.
In the later stage of training (shown in the zoomed-in view), the loss value of the blue curve was significantly lower than that of the red curve, which was in line with our predictions: after applying the DA method, the performance of the network was further improved.