1. Introduction
In recent years, hyperspectral imaging and its applications have attracted great attention in the field of earth remote sensing. Hyperspectral image (HSI) classification is a basic task of hyperspectral data analysis and application [1,2,3,4]. Because the ground objects in an HSI are correlated, the spatial regional consistency [5] of the image should be considered during HSI processing. Moreover, background interference is widespread in the existing public HSI data [6], which makes it difficult to accurately identify and classify ground objects. It is therefore important to make full use of the rich spatial consistency information [7] and to improve the quality of HSIs [8].
Research on spatial consistency has attracted increasing attention with the development of remote sensing classification techniques. The spatial consistency of an image can be simply defined as every small window being similar to the other windows in the same image, especially the adjacent windows [9]. Therefore, the correlation between a pixel and its neighboring pixels should be considered during feature extraction. In addition, similar objects tend to be distributed in blocks; that is, pixels belonging to the same class are usually close to each other. Spatial consistency is therefore used, firstly, to enhance the quality of HSIs. Based on the Gibbs algorithm, Rand et al. [10] regard an HSI as a set of high-dimensional vectors related to spectral information, and divide the large set into several subsets of vectors according to spatial similarity. The spatial consistency of the spectral information at each site is enhanced, which facilitates the subsequent spectral mixing analysis (SMA) of the HSI. Yue et al. [11] combine multiple similar pixels adjacent in the spatial domain into one block according to the spectral angle, which reduces the number of pixels in the HSI. Secondly, spatial consistency is also used for feature extraction of HSIs. The spatial correlation features of HSIs can be obtained by using the Spectral Graph Wavelet Transform (SGWT) [12], which fully considers the relationship between each pixel and its adjacent pixels. Nadia et al. [13] use the SGWT to extract texture information of an HSI as secondary features in HSI classification. In addition, the spectral graph wavelet (SGW) can also be seen as a filter, which can be used to extract the multi-scale characteristics of an image. The SGW [14] is used as the convolution kernel to construct a Graph Wavelet Neural Network (GWNN), which is used to classify the nodes of a graph. Dong et al. [15] decompose a vibration signal by using the SGW to obtain its multi-scale characteristics, and convert the results into path graphs at multiple levels. The above spatial consistency enhancement methods mainly operate on the raw hyperspectral data (data-wise).
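The spectral angle used to group similar adjacent pixels can be illustrated with a minimal sketch. This is not the implementation of Yue et al. [11]; the function names, the 4-connected neighborhood, and the `threshold` value are assumptions introduced only for illustration.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra given as equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

def similar_neighbors(cube, row, col, threshold):
    """Return the 4-connected neighbors of (row, col) whose spectral angle
    to the center pixel is below `threshold` (a hypothetical cutoff).

    `cube` is a nested list indexed as cube[row][col][band].
    """
    height, width = len(cube), len(cube[0])
    center = cube[row][col]
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < height and 0 <= c < width:
            if spectral_angle(center, cube[r][c]) < threshold:
                result.append((r, c))
    return result
```

A small angle means that two spectra have nearly the same shape regardless of their overall brightness, which is why the angle is a natural criterion for merging adjacent pixels of the same material.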
Chen [16] applied deep learning to HSI classification for the first time and achieved good results. Convolutional Neural Networks (CNNs) for HSI classification [17,18,19,20] use convolutional kernels to traverse the whole image and extract valuable features, so the spatial consistency of the image is already considered during convolution. The recently proposed DRCNN [21] divides an image block into multiple regions, which are sent into different CNN models instead of a single CNN model. Because multiple feature extractions and weighted averages are performed in DRCNN, the classification process becomes more consistent under the regional consistency assumption in the spatial domain. Overall, the advantage of DRCNN is that it strengthens the feature-wise spatial consistency and increases the number of samples through the multi-region operation. However, it has some limitations. Firstly, DRCNN ignores the spatial consistency at the data-wise (pixel) level, since the correlation between the pixels in an HSI is given little consideration. Secondly, the multiple convolution operations lead to the loss of image edge information, which causes the misclassification of edge points. Furthermore, to remove the noise of HSIs, the denoising effects of commonly used filters such as the bilateral filter [22], trilateral filter [23], and Gaussian filter have been compared. The Gaussian filter is selected in this paper since it can remove Gaussian noise and smooth the edges of images. In particular, the Gaussian filter can greatly simplify the noise variance estimation and analysis [24]. Therefore, a Data-wise spAtial regioNal Consistency re-Enhancement (DANCE) method is proposed in this paper to further improve the spatial consistency. Based on the above analysis, DANCE can, to some extent, overcome the data-wise spatial consistency shortcomings of the DRCNN.
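As an illustration of the Gaussian filtering step, the following minimal sketch smooths a single band with a separable Gaussian kernel. It is not the implementation used in this paper; the parameters `sigma` and `radius`, their default values, and the border replication are assumptions made only for the example.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(values, kernel):
    """Convolve a sequence with the kernel, replicating the border values."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(kernel[k + radius] * values[min(max(i + k, 0), last)]
            for k in range(-radius, radius + 1))
        for i in range(len(values))
    ]

def gaussian_filter_band(band, sigma=1.0, radius=2):
    """Separable Gaussian smoothing of one band given as a list of rows.

    sigma and radius are assumed values, not taken from the paper.
    """
    kernel = gaussian_kernel(sigma, radius)
    rows = [smooth_1d(row, kernel) for row in band]              # horizontal pass
    cols = [smooth_1d(list(col), kernel) for col in zip(*rows)]  # vertical pass
    return [list(row) for row in zip(*cols)]                     # transpose back
```

Applying `gaussian_filter_band` to every band in turn filters the whole HSI. Because the kernel is normalized, a constant region keeps its value, while isolated noisy pixels are spread over their neighborhood.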
The main contributions of this paper are as follows:
To solve the misclassification problem of HSI edge points, a novel and effective DANCE method is proposed to enhance the data-wise spatial regional consistency, which can improve the performance of some state-of-the-art methods.
To better integrate the feature-wise and data-wise methods, the structure of the DRCNN model is optimized through experiments, which comprehensively improves the spatial regional consistency.
The remainder of this paper is organized as follows. The related basic knowledge is introduced in Section 2. The proposed method is described in Section 3. The experimental results and analysis are presented in Section 4. The discussion is given in Section 5. The conclusions are drawn in Section 6.
5. Discussion
5.1. The Selection of the Input Size in DANCE
Before DANCE is applied, an HSI is divided into many blocks of the same size; then, the undirected graphs of the blocks are obtained by spectral graph theory. The block size therefore determines how many pixels are used simultaneously for spatial consistency, which can affect the performance of DANCE. For this reason, experiments on the selection of the sub-block size in DANCE were designed. To generate a node graph for every HSI block, the image must divide into a whole number of sub-blocks, so the sub-block side length must be a divisor of the image side length. Thus, the input size for the Indian Pines data was set to 145 × 145, 29 × 29, and 5 × 5. The image processed by DANCE is sent to the DRCNN for classification, and the results are shown in Table 10. Smaller blocks mean more iterations, which in turn affects the running time. Therefore, the running time with different image block sizes is also shown in Table 10.
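The candidate block sizes follow from simple divisibility, as the sketch below shows. It assumes non-overlapping blocks, and the function names are hypothetical and not taken from the paper's code.

```python
def valid_block_sizes(length):
    """All block side lengths that tile `length` pixels with no remainder."""
    return [d for d in range(1, length + 1) if length % d == 0]

def split_into_blocks(height, width, block_h, block_w):
    """Return the (row_start, col_start) corner of every non-overlapping block."""
    if height % block_h or width % block_w:
        raise ValueError("block size must divide the image size exactly")
    return [(r, c)
            for r in range(0, height, block_h)
            for c in range(0, width, block_w)]

# Indian Pines scene of 145 x 145 pixels, as used in the text.
print(valid_block_sizes(145))                     # [1, 5, 29, 145]
print(len(split_into_blocks(145, 145, 29, 29)))   # 25 blocks
print(len(split_into_blocks(145, 145, 5, 5)))     # 841 blocks
```

For the 145 × 145 scene, a 29 × 29 block gives 25 blocks, whereas a 5 × 5 block gives 841, which illustrates why the smallest block size requires many more iterations.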
It can be seen that the best classification results are obtained when the size of the HSI block is set to 29 × 29. Compared with the size of 145 × 145, the smaller image block contains only the spatial consistency information of each pixel with its neighborhood, which gives better results than computing all pixels together. In addition, when the size is set to 5 × 5, the larger number of iterations leads to a very high computational cost. Therefore, the input size in DANCE was selected as 29 × 29 for the Indian Pines data.
In summary, the selection of the image block size needs to consider both the classification performance and the running time. The above experimental results suggest that good performance can be achieved when a middle block size is selected among the possible sizes. Therefore, the sub-block sizes were set to 128 × 37 and 122 × 85 for the Salinas-small data and the Pavia University data, respectively. However, evaluation criteria for the optimum sub-block size on different data still require further study.
5.2. The Computation Cost of DANCE
To evaluate the computational cost of DANCE, the Indian Pines data were taken as an example. The HSI was first divided into blocks with the size of 29 × 29, and then passed through DANCE ten times. The computational costs are shown in Table 11, which includes the averages and variances of the disk usage, CPU usage, and running time.
It can be seen that the proposed DANCE does not greatly increase the burden of image processing. However, the running time remains considerable and needs to be optimized in future research.
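The repeated-run measurement of the running time can be sketched as follows. This is only an illustration: `placeholder_workload` is a hypothetical stand-in for one DANCE pass, the sample variance is an assumed choice because the text does not state which variance is reported, and disk and CPU usage are not covered.

```python
import statistics
import time

def time_runs(func, repeats=10):
    """Run `func` `repeats` times; return (mean, sample variance) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.variance(durations)

def placeholder_workload():
    """Hypothetical stand-in for one DANCE pass over all blocks."""
    return sum(i * i for i in range(10_000))

mean_s, var_s = time_runs(placeholder_workload)
print(f"mean = {mean_s:.6f} s, variance = {var_s:.3e} s^2")
```

Reporting the variance together with the mean indicates how stable the running time is across the ten passes.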
6. Conclusions
Motivated by the DRCNN method, which uses feature-wise spatial regional consistency, a method named Data-wise spAtial regioNal Consistency re-Enhancement (DANCE) is proposed, which fully considers the relationship between pixels and combines the SGWT with Gaussian filtering. DRCNN is then used to perform the HSI classification. Experimental results show that the proposed DANCE method can effectively enhance the data-wise spatial regional consistency of images. It can be seen in Section 4.2.1 that the proposed method performs better than the other baselines and DRCNN. Firstly, compared with the other baselines, the proposed method makes full use of both the data-wise and feature-wise spatial consistency. For both the middle and edge areas of the ground objects, the misclassified points are clearly reduced. Secondly, compared with DRCNN, DANCE improves the quality of HSIs by enhancing the data-wise spatial consistency and removing the Gaussian noise. In particular, the classification maps show that the accuracy of edge points is improved. The disadvantage is that DANCE increases the computational cost compared with using DRCNN alone.
Some additional work remains. Firstly, the results of the six regions in the DRCNN are combined by a concatenation strategy, so the central region does not play the role of re-correcting the misclassified points. Secondly, the proposed method does not consider the spectral correlations between different bands, which leads to redundancy, longer training time, and larger storage requirements. These two issues were not addressed in this study and will be the subject of future research.