Article

Hyperspectral Image Classification Based on Spectral Multiscale Convolutional Neural Network

1 College of Communication and Electronic Engineering, Qiqihar University, Qiqihar 161000, China
2 College of Information and Communication Engineering, Dalian Nationalities University, Dalian 116000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(8), 1951; https://doi.org/10.3390/rs14081951
Submission received: 19 March 2022 / Revised: 15 April 2022 / Accepted: 16 April 2022 / Published: 18 April 2022
(This article belongs to the Special Issue Recent Advances in Processing Mixed Pixels for Hyperspectral Image)

Abstract
In recent years, convolutional neural networks (CNNs) have been widely used for hyperspectral image classification and have shown good performance. However, compared with training on sufficient samples, classification accuracy degrades noticeably when only a small number of training samples is available. Moreover, although CNNs can classify hyperspectral images effectively, the efficiency of feature extraction from the rich spatial and spectral information of hyperspectral images still needs to be improved. To address these problems, a spatial–spectral attention fusion network is proposed that uses a four-branch multiscale block (FBMB) to extract spectral features and 3D-SoftPool to extract spatial features. The network consists of three parts that are connected in sequence to fully extract the features of hyperspectral images. In the first part, four branches with different convolution kernel sizes extract spectral features, and a spectral attention block follows each branch. In the second part, spectral features are reused through dense connection blocks, and a spectral attention module then refines the extracted spectral features. The third part extracts spatial features: a DenseNet module and a spatial attention block jointly extract spatial features, which are then fused with the previously extracted spectral features. Experiments on four commonly used hyperspectral data sets show that the proposed method achieves better classification performance than several existing methods when a small number of training samples is used.


1. Introduction

A hyperspectral image is a three-dimensional image captured by aerospace platforms carrying hyperspectral imagers. Hyperspectral images contain rich spectral–spatial information: each sample has hundreds of spectral bands, each recording reflectance information, which enables hyperspectral images to play a great role in military target detection, agricultural production, water quality monitoring, mineral exploration and other fields [1,2,3,4]. Researchers have carried out a significant amount of useful work based on these unique characteristics, for example, using the spectral information of hyperspectral images to detect properties of the earth's surface [5,6]. Hyperspectral image classification relies on the fact that different kinds of substances have different spectral curves. Each category corresponds to some specific samples, and each sample has its own spatial–spectral characteristics. However, there are two common problems in hyperspectral image classification: (1) training with small samples degrades the performance and generalization ability of the model and (2) in the case of small samples, how to fully extract spatial–spectral features and improve classification performance.
In the early stage of hyperspectral image classification research, tools such as the support vector machine (SVM) [7] and multinomial logistic regression [8] were mainly used. A support vector machine is a linear classifier that maximizes the margin in the feature space; its learning strategy is margin maximization. Because hyperspectral images also contain a large number of nonlinear features, SVM cannot extract these nonlinear features well. Although spectral information alone can be used for classification, performance improves when spatial information is exploited in addition to spectral information. In order to further improve classification performance, superpixel sparse representation and multiple-kernel learning have also been proposed [9,10,11]. A multiple-kernel model is more flexible and has stronger feature mapping ability than a single kernel function, but multiple-kernel learning algorithms are more complex, less efficient and require more memory and time.
Deep learning can automatically extract the nonlinear and hierarchical features of hyperspectral images. For example, image classification [12], semantic segmentation [13] and target detection [14] in computer vision, as well as information extraction [15], machine translation [16] and question answering [17] in natural language processing, have all made great progress with the help of deep learning. Hyperspectral image classification, as a typical classification task, has also benefited from this progress, and classification accuracy has been greatly improved. So far, a great deal of work on extracting spectral–spatial features of hyperspectral images has been carried out. Typical feature extraction methods [18] include structural filtering [19,20,21], morphological profiles [22,23,24], random fields [25,26], sparse representation [27,28] and segmentation [29,30,31]. Compared with traditional hand-crafted feature extraction, deep learning is an end-to-end approach that can automatically learn useful features from a large amount of hyperspectral data through a multi-layer network. At present, the deep learning methods used to extract hyperspectral image features include stacked autoencoders (SAE) [32], deep belief networks (DBN) [33], CNNs [34], recurrent neural networks [35,36] and graph convolutional networks [37].
In [33], Chen introduced the stacked autoencoder (SAE) to extract important features. Tao [38] extracted spectral–spatial features using two sparse SAEs. Deep autoencoders reduce the dimension of the input data; unlike PCA, this dimension reduction is nonlinear and therefore more expressive. A new deep autoencoder (DAE) was proposed in [39], together with a new collaborative representation method for processing small training sets, which can obtain more useful features from the neighborhood of target pixels in hyperspectral images. Zhang [40] used a recursive autoencoder (RAE) and a weighting method to fuse the extracted spatial information. In [34], a deep belief network (DBN) was used for hyperspectral image classification. Although these methods can perform classification, they operate on one-dimensional inputs and their classification performance is limited. Hu [41] proposed a five-layer one-dimensional CNN model that directly extracts the spectral features of hyperspectral images and classifies them using the extracted spectral features. Li [42] proposed a new pixel-based method for classifying hyperspectral images.
For hyperspectral image classification tasks, 2D-CNN models can directly extract spatial information. Some features in hyperspectral images are highly similar to one another. In [43], a deep two-dimensional CNN model based on a deep hashing neural network (DHNN) was proposed, which can effectively learn such highly similar features. Because hyperspectral images are high-dimensional data, it is necessary to reduce the dimension before classification and then learn the spatial information in hyperspectral image samples through 2D-CNNs. Chen et al. [44] proposed a method based on a deep neural network (DNN) that uses 2D-CNNs to acquire and fuse spatial–spectral information of hyperspectral images. A different-region convolutional neural network (DRCNN) for hyperspectral image classification was proposed in [45]; inputs from different regions are used to learn the context features of those regions, and better classification results are obtained. Zhu et al. [46] introduced deformable convolution into the network to classify hyperspectral images, adaptively adjusting the receptive field size of the activation units of the CNN to effectively reflect the complex structure of hyperspectral images.
Hyperspectral image classification using 3D-CNNs performs better because neither 1D-CNNs nor 2D-CNNs can extract spatial and spectral feature information at the same time. When training with small samples, a 3D-CNN model is a more effective classifier because it captures the spatial–spectral information of hyperspectral images simultaneously. Ding et al. [47] proposed a convolutional neural network based on diverse branch blocks (DBB). It enriches the feature space by combining branches with different scales and complexities, including convolution sequences, multiscale convolution and average pooling, thereby improving the feature extraction ability of a single convolution. Each branch uses convolution kernels of different scales to extract the spectral information of hyperspectral images, which improves classification performance. Convolutional neural networks usually use pooling operations to reduce the size of feature maps. This process is crucial for achieving local spatial invariance and increasing the receptive field of subsequent convolutions, so the pooling operation should minimize the loss of information in the feature map while limiting computing and memory overhead. To meet these needs, Alexandros et al. [48] proposed a fast and efficient pooling method, 3D-SoftPool, which accumulates activations in an exponentially weighted manner. Compared with other pooling methods, 3D-SoftPool retains more information in the down-sampled activation map. In hyperspectral image classification, finer down-sampling preserves more spatial feature information and can improve classification accuracy.
A dual-branch dual-attention (DBDA) network is proposed in [49]: one branch extracts spatial features and the other extracts spectral features, with a spatial attention module applied to the spatial branch and a channel attention module applied to the spectral branch. The attention modules capture important spectral–spatial features, which helps to improve classification performance. Hyperspectral image classification methods based on 3D-CNNs can be divided into two categories: (1) 3D-CNNs are used as a whole to extract the spectral–spatial features of hyperspectral images. In [50], a 3D-CNN deep feature extraction network is proposed that captures spatial–spectral features simultaneously; some 3D-CNN frameworks directly obtain the features of hyperspectral cubes without pre- or post-processing the input data. (2) Spectral features and spatial features are extracted separately and classified after feature fusion. In order to fully extract the spatial–spectral features of hyperspectral images from shallow to deep layers, a three-layer CNN is constructed in [51]; multi-layer spatial and spectral features are then fused to provide more complementary information, and finally the fused features and the classifier form an end-to-end network that is optimized jointly. Li et al. [52] proposed a deep CNN with a double-branch structure to extract spatial and spectral features.
The deep pyramid residual network (pResNet) proposed in [53] can make full use of the large amount of information in hyperspectral images for classification because the network can increase the dimension of the feature maps between layers. In [54], an end-to-end fast dense spectral–spatial convolution (FDSSC) framework for hyperspectral image classification is proposed. Different convolution kernel sizes are used to extract spectral and spatial features respectively, and an efficient convolution method is used to reduce the high dimensionality, which improves the running speed of the network and also helps to prevent overfitting. In order to avoid the loss of context information caused by using only one or a few fixed windows as the input for hyperspectral image classification, an attention multi-branch CNN structure using adaptive region search (RS–AMCNN) is proposed in [55]. Because RS–AMCNN adaptively searches the position of the spatial window in local areas according to the specific distribution of samples, it can effectively extract edge information and evenly extract important features in each area. In [56], a method for classifying hyperspectral images using multiscale superpixels and guided filtering (MSS–GF) is proposed. MSS is used to obtain spatial local information at different scales in different regions, and a sparse representation classifier is used to generate classification maps of different scales in each region; this method can effectively improve classification ability. In [57], a classification method based on a Markov random field (MRF) that integrates spectral and context information is proposed; in order to make full use of deep features, a cascaded MRF model is used to extract deep information, and the method shows good classification performance. In [58], the proposed double-branch multi-attention network (DBMA) extracts spectral–spatial features and uses an attention mechanism on both branches, achieving a good classification effect. Sun et al. proposed a patch-based low-rank component-induced spatial–spectral kernel method, called LRCISSK, for HSI classification; through low-rank matrix recovery (LRMR), the low-rank spectral features in the HSI are reconstructed to explore more accurate spatial information, which is used to identify the homogeneous neighborhood pixels (i.e., centroid pixels) of the target [59]. In [60], a spectral–spatial feature tokenization transformer (SSFTT) method is proposed, which uses a Gaussian-weighted feature tokenizer for feature transformation and captures spectral–spatial features and high-level semantic features for hyperspectral image classification. Hong et al. proposed a method called invariant attribute profiles (IAP) to extract invariant features from the spatial and frequency domains of hyperspectral images for classification [24]. Aletti et al. proposed a new semi-supervised method for multilabel segmentation of HSI, which combines a suitable linear discriminant analysis with similarity indexes for comparing different spectra [61]. Bampis et al. proposed a graph-driven image segmentation method; by developing a diffusion process defined on an arbitrary graph, the method achieves a lower computational burden, as shown experimentally [62].
The content of hyperspectral images is usually complex, and many different substances show similar texture features, which prevents many CNN models from reaching their full potential. Due to noise and redundancy in hyperspectral image data, standard CNNs cannot capture all features. In addition, when additional layers are added, a deeper CNN architecture can hinder the convergence of the network and produce lower classification accuracy. To alleviate these problems, Ding et al. [47] proposed DBB, which combines multiple branches with different scales and complexities, including convolution sequences, multiscale convolution and average pooling, to extract richer spectral feature information. When 3D-CNNs are used to extract features, the more layers the network has, the more complex its structure becomes, resulting in more parameters and greater computing and memory overhead. In order to reduce information loss, 3D-SoftPool [48] is used to extract spatial features; 3D-SoftPool accumulates activations in an exponentially weighted manner and, compared with other pooling methods, retains more information in the down-sampled activation map, which can effectively improve the performance and generalization ability of hyperspectral image classification. Inspired by the DBB and 3D-SoftPool methods, and in order to fully extract spatial–spectral information and alleviate overfitting on small samples, a spectral–spatial attention fusion method based on four-branch multiscale blocks (FBMB) and a 3D-SoftPool module is proposed. The contributions of this study are as follows:
(1) This paper proposes an FBMB structure different from other multi-branch designs. The module enriches the feature space by combining multiple branches with different scales and complexities, and adds a spectral attention block to each branch to further extract important spectral features. Finally, the features extracted by these branches are concatenated. The module can fully capture spatial–spectral features and improve classification performance.
(2) In the process of extracting spatial features, 3D-SoftPool is introduced; 3D-SoftPool accumulates activations in an exponentially weighted manner and, compared with other pooling methods, retains more information in the down-sampled activation map. A fusion method similar to dense connection is designed to extract the spectral and spatial features again to further improve the classification accuracy of hyperspectral images.
(3) Experiments on four public data sets show that the proposed method outperforms other advanced methods for hyperspectral image classification.
The rest of this paper is arranged as follows. Section 2 introduces each part of the proposed method in detail. Section 3 gives the experimental results and analysis. Section 4 provides a discussion of the proposed method. In Section 5, some conclusions are provided.

2. Materials and Methods

2.1. Overall Structure of the Proposed Method

The overall framework of the proposed network based on spectral four branch multiscale network (SFBMSN) for hyperspectral image classification is shown in Figure 1.
The SFBMSN network is composed of three parts. In the first part, the FBMB structure with a channel attention module is designed to extract and select spectral features, so as to fully extract important spectral features. The second part uses a dense connection structure to further extract spectral features and make full use of the information. In the third part, a dense connection structure whose first convolution layer contains a 3D-SoftPool module is used to extract spatial features, increasing the receptive field of subsequent convolutions and extracting more spatial features, thereby improving classification performance. In addition, some optimization strategies are used to prevent overfitting.
In the first part, FBMB includes four branches, and the convolution kernel sizes of the branches differ. A spectral attention mechanism is introduced into each branch to obtain spectral features more conducive to classification, and the spectral features extracted by the four branches are then fused. In the second part, based on the DenseNet [63] structure and the idea of reusing spectral features, the fused features are input into the DenseNet network, and dense blocks with three convolution layers are used to extract spectral features. The third part is similar to the spectral feature extraction: the original hyperspectral data are input into a dense block containing 3D-SoftPool. The 3D-SoftPool module is added to the first convolution layer of the dense block to extract spatial neighborhood features in combination with the spatial attention mechanism. Then, the feature maps obtained from the spectral and spatial branches are added element by element, and the fused spatial–spectral features are passed through global average pooling (GAP), a fully connected layer and a linear classifier to obtain the classification result.
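To make this data flow concrete, the following is a minimal PyTorch-style sketch of how the three parts could be wired together. The submodule classes, channel counts and input layout are illustrative placeholders standing in for the components described in the following subsections; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SFBMSNSketch(nn.Module):
    """Illustrative wiring of the three parts (spectral FBMB branch, dense
    spectral block, spatial branch with 3D-SoftPool); not the authors' code."""

    def __init__(self, spectral_branch, spectral_dense, spatial_branch,
                 fused_channels, num_classes):
        super().__init__()
        self.spectral_branch = spectral_branch  # part 1: FBMB + spectral attention
        self.spectral_dense = spectral_dense    # part 2: dense block + spectral attention
        self.spatial_branch = spatial_branch    # part 3: dense block with 3D-SoftPool + spatial attention
        self.gap = nn.AdaptiveAvgPool3d(1)      # global average pooling
        self.fc = nn.Linear(fused_channels, num_classes)

    def forward(self, x):                       # x: (batch, 1, 9, 9, bands)
        spec = self.spectral_dense(self.spectral_branch(x))
        spat = self.spatial_branch(x)
        fused = spec + spat                     # element-wise fusion of the two feature maps
        out = self.gap(fused).flatten(1)
        return self.fc(out)
```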

2.2. FBMB and Spectral Self-Attention Module

The FBMB and the spectral self-attention module are the key modules of the first part of the proposed method. In Figure 1, the first part is the proposed FBMB structure with the spectral self-attention mechanism. Firstly, principal component analysis (PCA) is performed on the original hyperspectral image data, and the data after PCA are $P \in \mathbb{R}^{9 \times 9 \times band}$, where 9 × 9 is the spatial size of the patch and $band$ is the number of channels. Next, a three-dimensional convolution is applied to $P$. The kernel size of this convolution layer is set to (1 × 1 × 7), the padding to (0 × 0 × 0) and the stride to (1 × 1 × 2). In this way, the length and width of the data after the convolution remain 9 and 9, respectively, and the number of channels becomes $a$, i.e., $a$ is the number of channels of the data after this convolution layer.
The output data are fed into the four branches of the FBMB module simultaneously, and the spectral self-attention mechanism in each branch retains as much useful spectral information as possible. The first branch of the FBMB module uses a convolution layer with 6 convolution kernels. A padding strategy is used in all branches so that the input and output sizes remain consistent. A batch normalization (BN) layer follows each convolution layer; BN normalizes and linearly scales each channel to speed up the convergence of the model. Suppose the input data are $B = \{x_{1 \dots m}\}$, the output data are $y_i = BN_{\gamma, \beta}(x_i)$ and the trainable parameters are $\gamma$ and $\beta$; then the BN process can be represented as
$$\frac{1}{m}\sum_{i=1}^{m} x_i \rightarrow \mu_B \tag{1}$$
$$\frac{1}{m}\sum_{i=1}^{m} \left( x_i - \mu_B \right)^2 \rightarrow \delta_B^2 \tag{2}$$
$$\frac{x_i - \mu_B}{\sqrt{\delta_B^2 + \varepsilon}} \rightarrow \tilde{x}_i \tag{3}$$
$$\gamma \tilde{x}_i + \beta \rightarrow y_i \tag{4}$$
Firstly, the mean and variance of the input data are calculated according to Equations (1) and (2); then the input data are normalized to zero mean and unit variance according to Equation (3); finally, each normalized element $\tilde{x}_i$ is multiplied by $\gamma$ and $\beta$ is added to give the output $y_i$, where $\gamma$ and $\beta$ are trainable parameters. The purpose of normalization is to adjust the data to a unified interval, reduce the divergence of the data and reduce the learning difficulty of the network. Using BN and the Mish activation function after the convolution layer can effectively avoid gradient explosion and gradient vanishing. The spectral self-attention mechanism is then used to capture important spectral information; its schematic diagram is shown in Figure 2.
The spectral attention mechanism mines the interdependence between spectral feature maps, extracts feature maps with strong dependence and improves the feature representation of specific semantics. As shown in Figure 2, $U \in \mathbb{R}^{c \times p \times p}$ represents the initial input spectral features, where $p \times p$ is the input patch size and $c$ is the number of input channels. $Y$ represents the spectral attention map of size $c \times c$, which is calculated from the initial input spectral feature map $U$. $y_{ij}$ measures the influence of the $i$th spectral feature $U_i$ on the $j$th spectral feature $U_j$. The calculation process is
$$y_{ij} = \frac{\exp(U_i \times U_j)}{\sum_{i=1}^{c} \exp(U_i \times U_j)} \tag{5}$$
Then, the result of the matrix multiplication between $Y$ and $U$ is reshaped into $c \times p \times p$. Finally, the reshaped result is weighted by a scale parameter $\alpha$ and added to the input $U$ to obtain the final spectral attention map $E \in \mathbb{R}^{c \times p \times p}$:
$$E_j = \alpha \sum_{i=1}^{c} \left( y_{ij} U_i \right) + U_j \tag{6}$$
where $\alpha$ is initialized to zero and learned gradually. The final map $E$ is a weighted sum of the features of all channels plus the original input, which improves the feature representation.
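The following is a compact PyTorch sketch of this channel-wise (spectral) self-attention, assuming a feature map of shape (c, p, p) as in Figure 2; the class and variable names are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn as nn

class SpectralSelfAttention(nn.Module):
    """Channel (spectral) self-attention in the spirit of Equations (5) and (6)."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))        # scale parameter, initialized to zero

    def forward(self, u):                                # u: (batch, c, p, p)
        b, c, p, _ = u.shape
        flat = u.view(b, c, -1)                          # (b, c, p*p)
        energy = torch.bmm(flat, flat.transpose(1, 2))   # (b, c, c) channel similarities
        attn = torch.softmax(energy, dim=-1)             # attention map Y, Equation (5)
        out = torch.bmm(attn, flat).view(b, c, p, p)     # weighted sum over channels
        return self.alpha * out + u                      # Equation (6)
```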
In the second branch, the kernel size of the first convolution layer is (1 × 1 × 1) with 6 kernels, followed by BN + Mish; the kernel size of the second convolution layer is (3 × 3 × 7) with 6 kernels, again followed by BN + Mish. At the end of the second branch, spectral self-attention is applied, and the input size is (9 × 9 × a). The third branch has the same structure as the fourth branch but a different kernel size: the kernel size in the third branch is (3 × 3 × 7) with 6 kernels, and in the fourth branch it is (5 × 5 × 7) with 6 kernels. After each convolution layer, BN + Mish and the spectral self-attention mechanism are adopted to avoid gradient explosion and gradient vanishing. Because padding is used in all four branches, the output size of each branch is (9 × 9 × a, 6). Finally, the outputs of the four branches are concatenated, and the resulting cube size is (9 × 9 × a, 24). The cube output by the FBMB module contains a large amount of important spectral feature information, which provides rich spectral features for subsequent operations.
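A minimal sketch of the four-branch block is given below. It assumes the spectral axis is the last dimension of the 3D feature cube, that the first branch uses a (1 × 1 × 7) kernel (the text does not state this size explicitly), and that the input channel count after the initial convolution is passed in by the caller; the per-branch spectral self-attention from the previous sketch is omitted here for brevity.

```python
import torch
import torch.nn as nn

def conv_bn_mish(cin, cout, k, p):
    """3D convolution + BN + Mish; kernel/padding ordered as (height, width, spectral)."""
    return nn.Sequential(nn.Conv3d(cin, cout, kernel_size=k, padding=p),
                         nn.BatchNorm3d(cout), nn.Mish())

class FBMBSketch(nn.Module):
    """Four-branch multiscale block; spectral attention per branch is omitted."""

    def __init__(self, cin, cout=6):
        super().__init__()
        self.b1 = conv_bn_mish(cin, cout, (1, 1, 7), (0, 0, 3))          # assumed kernel for branch 1
        self.b2 = nn.Sequential(conv_bn_mish(cin, cout, (1, 1, 1), (0, 0, 0)),
                                conv_bn_mish(cout, cout, (3, 3, 7), (1, 1, 3)))
        self.b3 = conv_bn_mish(cin, cout, (3, 3, 7), (1, 1, 3))
        self.b4 = conv_bn_mish(cin, cout, (5, 5, 7), (2, 2, 3))

    def forward(self, x):                      # x: (batch, cin, 9, 9, a)
        feats = [self.b1(x), self.b2(x), self.b3(x), self.b4(x)]
        return torch.cat(feats, dim=1)         # (batch, 4*cout, 9, 9, a), i.e. 24 maps of size 9x9xa
```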

2.3. Dense Connection Network

The dense connection network is an important component of the second and third parts of the proposed method. In order to avoid the vanishing-gradient problem caused by increasing network depth, a dense connection module is adopted after the FBMB module to further extract effective spectral features. On the premise of ensuring full transmission of information between the middle layers of the network, all layers are directly connected: each layer receives the outputs of all previous layers as input and passes its own output feature maps to all subsequent layers. This ensures maximum spectral information flow between network layers and makes full use of spectral features. The structure of DenseNet is shown in Figure 3.
The DenseNet used in this paper has three layers, each composed of convolution, BN and Mish. Each layer has a nonlinear transformation function $H_l$, which is the composite of these three operations. Let $x_{l-1}$ be the output of layer $l-1$. In a traditional feed-forward network, the output of layer $l-1$ is the input of layer $l$, and the output $x_l$ of layer $l$ can be represented as
$$x_l = H_l(x_{l-1}) \tag{7}$$
DenseNet improves inter-layer information transmission by connecting each layer with all subsequent layers. Therefore, the outputs $[x_0, x_1, \dots, x_{l-1}]$ of all layers before layer $l$ are used as the input of layer $l$, and the output of layer $l$ is represented as
$$x_l = H_l\left([x_0, x_1, \dots, x_{l-1}]\right) \tag{8}$$
After the dense connection, the key spectral information is extracted again by the spectral self-attention mechanism.
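The dense feature reuse of Equation (8) can be sketched as follows; the three-layer structure and the conv–BN–Mish composition follow the text, while the (3 × 3 × 3) kernel and the growth rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """Three-layer dense block: each layer receives the concatenation of all
    previous outputs (Equation (8)); kernel size and growth rate are illustrative."""

    def __init__(self, cin, growth=12, layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(layers):
            self.layers.append(nn.Sequential(
                nn.Conv3d(cin + i * growth, growth, kernel_size=(3, 3, 3), padding=1),
                nn.BatchNorm3d(growth), nn.Mish()))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # reuse all earlier feature maps
        return torch.cat(feats, dim=1)
```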

2.4. Spatial Dense Connection Network with 3D-Softpool

Since the dense connection network has already been introduced, it is not repeated in this section. 3D-SoftPool is an important module of the third part of the proposed method, and its principle is introduced in detail here. A dense connection is used to capture spatial features; it has three layers, each composed of convolution, BN and Mish, except that the Mish layer is replaced with 3D-SoftPool in the first layer. 3D-SoftPool is a fast and efficient pooling method that accumulates activations in an exponentially weighted manner. Compared with other pooling methods, 3D-SoftPool retains more information during down-sampling of the activation map, which helps to improve classification performance. 3D-SoftPool is based on the natural exponential (e), so large activation values have a greater impact on the output. 3D-SoftPool is differentiable, which means that all activations in the local neighborhood are assigned at least a minimum gradient value during back propagation. The process of 3D-SoftPool is shown in Figure 4.
3D-SoftPool uses a smooth maximum approximation over the activation region: each activation $a_i$ is assigned a weight $W_i$, which can be represented as
$$W_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}} \tag{9}$$
The output value of 3D-SoftPool is obtained by summing all weighted activations in the activation region:
$$\tilde{a} = \sum_{i \in R} W_i a_i \tag{10}$$
The output is
$$\tilde{a} = \sum_{i \in R} \frac{e^{a_i} a_i}{\sum_{j \in R} e^{a_j}} \tag{11}$$
The normalized weights produced by the SoftMax operation form a probability distribution in which each activation is weighted in proportion to its value relative to its neighborhood. This lets every activation contribute to the final output, extracts more useful spatial features and improves classification accuracy.
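Equations (9)–(11) can be implemented with two average-pooling passes, since the region counts cancel in the ratio; the sketch below is a minimal functional version with illustrative kernel and stride values.

```python
import torch
import torch.nn.functional as F

def softpool3d(x, kernel_size=2, stride=2):
    """3D SoftPool (Equations (9)-(11)): exponentially weighted average over each
    pooling region R; kernel and stride values here are illustrative."""
    weights = torch.exp(x)
    # The normalizing counts of the two average pools cancel, leaving
    # sum(e^{a_i} * a_i) / sum(e^{a_j}) over every local region R.
    num = F.avg_pool3d(x * weights, kernel_size, stride)
    den = F.avg_pool3d(weights, kernel_size, stride)
    return num / den

x = torch.randn(2, 6, 9, 9, 24)
print(softpool3d(x).shape)   # torch.Size([2, 6, 4, 4, 12])
```

In practice, a constant (e.g., the global maximum of the patch) can be subtracted before exponentiation for numerical stability; it cancels in the ratio and does not change the result.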

2.5. Spatial Self-Attention Mechanism

The spatial self-attention mechanism is an important module of the third part of the proposed method, and its principle is introduced in detail in this section. The data produced by the 3D-Conv, BN and Mish operations are input to the spatial self-attention module. By establishing the contextual relationships between local spatial features, the spatial attention mechanism encodes broader context information into local spatial features and improves their representation ability. The spatial self-attention mechanism is shown in Figure 5.
$F \in \mathbb{R}^{c \times h \times h}$ is the input feature map; $B$, $C$ and $D$ are three new spatial feature maps generated by three convolution operations, with $\{B, C, D\} \in \mathbb{R}^{c \times h \times h}$. Then $B$, $C$ and $D$ are reshaped into $c \times t$, where $t = h^2$ is the number of pixels. $B$ and $C$ are multiplied by matrix multiplication, and the spatial attention map $S \in \mathbb{R}^{t \times t}$ is obtained through a SoftMax layer:
$$S_{ij} = \frac{\exp(B_i \times C_j)}{\sum_{i=1}^{t} \exp(B_i \times C_j)} \tag{12}$$
where $S_{ij}$ is the influence of the $i$th pixel on the $j$th pixel. $D$ and $S^{T}$ are then multiplied, and the result is reshaped into $c \times h \times h$:
$$Z_j = \eta \sum_{i=1}^{t} \left( S_{ji} D_i \right) + F_j \tag{13}$$
Here, the initial value of $\eta$ is zero, and larger weights are gradually assigned during training. From Equation (13), the features of all positions and the original input feature map are added with a certain weight to obtain the final output spatial attention feature map $Z \in \mathbb{R}^{c \times h \times h}$.
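The following PyTorch sketch implements position-wise self-attention in the spirit of Equations (12) and (13); the 1 × 1 convolutions used to produce B, C and D and the softmax normalization convention follow the common DANet-style implementation and are assumptions rather than the authors' exact code.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Position (spatial) self-attention; B, C, D are produced by 1x1 convolutions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)   # B
        self.key = nn.Conv2d(channels, channels, kernel_size=1)     # C
        self.value = nn.Conv2d(channels, channels, kernel_size=1)   # D
        self.eta = nn.Parameter(torch.zeros(1))                     # weight, initialized to zero

    def forward(self, f):                          # f: (batch, c, h, h)
        b, c, h, w = f.shape
        q = self.query(f).view(b, c, h * w)        # (b, c, t)
        k = self.key(f).view(b, c, h * w)
        v = self.value(f).view(b, c, h * w)
        s = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, t, t) attention map S
        out = torch.bmm(v, s.transpose(1, 2)).view(b, c, h, w)      # D x S^T, reshaped back
        return self.eta * out + f                                   # Equation (13)
```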

3. Experiment and Results

3.1. Data Set and Parameter Setting

The performance of the proposed method is verified by using four classical hyperspectral image data sets: Indian pine (IN), Pavia University (UP), Kennedy Space Center (KSC) and Salinas Valley (SV).
The IN data set shown in Table 1 is the earliest test data used for hyperspectral image classification. An area of Indian pine trees in Indiana was imaged by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992, and a 145 × 145 sub-image was labeled for hyperspectral classification tests. The spatial resolution of the image is about 20 m. After eliminating useless bands, 200 bands remain for experimental research. The data set contains 21,025 pixels, but only 10,249 of them are ground-object pixels; the remaining 10,776 are background pixels, which are excluded in the actual classification. Because the imaged area consists of crops, with 16 classes in total, different ground objects have relatively similar spectral curves, and the sample distribution across the 16 classes is extremely uneven.
The UP data set shown in Table 2 is part of the hyperspectral data imaged in 2003 by the German airborne Reflective Optics System Imaging Spectrometer (ROSIS) over the city of Pavia, Italy. The spatial resolution of the image is 1.3 m and the data size is 610 × 340. The spectral imager recorded 115 bands in the wavelength range 0.43–0.86 μm; 12 bands affected by noise were eliminated, leaving 103 usable bands. The data set contains 9 classes of features, including trees, asphalt roads, bricks and meadows.
The KSC data set shown in Table 3 was collected by the NASA AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) instrument over the Kennedy Space Center (KSC), Florida, on 23 March 1996. AVIRIS collects 224 bands of 10 nm width with center wavelengths from 400–2500 nm. The KSC data, acquired from an altitude of about 20 km, have a spatial resolution of 18 m. After removing water-absorption and low-SNR bands, 176 bands were used for analysis. Training data were selected using land cover maps derived from color infrared photography provided by the Kennedy Space Center and Landsat Thematic Mapper (TM) imagery. The vegetation classification scheme was developed by KSC personnel to define the classes that can be distinguished at the spatial resolution of Landsat and these AVIRIS data. Because the spectral characteristics of some vegetation types are similar, it is difficult to discriminate land cover in this environment. For classification purposes, 13 classes representing the various land cover types of this environment were defined for the site.
The SV data set shown in Table 4, like the Indian Pines image, was acquired by the AVIRIS imaging spectrometer; it is an image of the Salinas Valley in California, USA. Unlike Indian Pines, its spatial resolution reaches 3.7 m. The image originally has 224 bands; as is common practice, the 204 bands remaining after excluding bands 108–112, 154–167 and 224, which cannot be reflected by water, are used. The size of the image is 512 × 217, so it contains 111,104 pixels, of which 56,975 are background pixels and 54,129 can be used for classification. These pixels are divided into 16 categories, including fallow and celery.
In the experiment, in order to verify that high classification accuracy can also be achieved with small training samples, we randomly selected 3%, 0.5%, 3% and 0.5% of the samples in IN, UP, KSC and SV, respectively, for training, and the remaining samples in each data set were used for testing. The next section shows that the proposed method can achieve high classification accuracy in the case of small samples. Table 1, Table 2, Table 3 and Table 4 list the numbers of training and test samples for the four data sets.
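A sketch of such a random split is shown below; whether the draw is stratified per class is not stated in the text, so the per-class loop here is an assumption, and the background label value 0 is the usual convention for these data sets.

```python
import numpy as np

def random_split(labels, ratio, seed=0):
    """Randomly draw `ratio` of the labeled pixels of each class for training
    (e.g. 0.03 for IN/KSC, 0.005 for UP/SV); the rest are used for testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        if cls == 0:                       # 0 is assumed to mark unlabeled background pixels
            continue
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = max(1, int(round(ratio * idx.size)))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)
```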
The experimental hardware platform is a server with an NVIDIA GeForce RTX 2060 GPU and 16 GB of random access memory. The software platform is Windows 10 with Visual Studio Code, using CUDA 10.0, PyTorch 1.2.0 and Python 3.7.4. All experiments randomly selected different training data and were repeated 10 times; the final results are the averages over the 10 runs. The Adam optimizer is used with a learning rate of 0.0001. True positive (TP) denotes positive samples correctly classified by the model, false negative (FN) positive samples incorrectly classified by the model, false positive (FP) negative samples incorrectly classified by the model, and true negative (TN) negative samples correctly classified by the model. The overall accuracy (OA), average accuracy (AA) and Kappa coefficient (Kappa) are used as evaluation indexes [64]. OA is the ratio of the number of correctly classified samples to the total number of samples and is calculated as
$$OA = \frac{TP + TN}{TP + FN + FP + TN} \tag{14}$$
AA is the mean of the classification accuracy of each category and is calculated as
$$AA = \frac{\frac{TP}{TP + FN} + \frac{TN}{TN + FP}}{2} \tag{15}$$
The Kappa coefficient measures the consistency between the classification map and the ground truth map and is calculated as
$$Kappa = \frac{p_o - p_e}{1 - p_e} \tag{16}$$
where $p_o$ is the sum over categories of the number of correctly classified samples divided by the total number of samples, and $p_e$ is the sum over categories of the actual number of samples of each category multiplied by the predicted number of that category, divided by the square of the total number of samples.
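For reference, the three indexes can be computed from a confusion matrix as sketched below; Equation (15) above is the binary form, and for the multi-class data sets used here AA is taken, as is customary, as the mean of the per-class accuracies.

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, num_classes):
    """Overall accuracy, average (per-class) accuracy and Kappa from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                          # Equation (14), multi-class form
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # mean of per-class accuracies
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (oa - pe) / (1 - pe)                       # Equation (16)
    return oa, aa, kappa
```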
Some traditional and advanced CNN-based hyperspectral image classification methods are compared with the proposed method, including SVM, CDCNN, SSRN, FDSSC, DBMA, DBDA and pResNet. SVM uses a radial basis function to solve the nonlinear classification problem; CDCNN is a 2D-CNN model; the other compared methods and the proposed method are 3D-CNN models.
The design of the network structure and the selection of parameters determine the classification performance of the network. The main parameters of SFBMSN used in our experiments are shown in Table 5. The hyperspectral image data are dimensionally reduced by PCA and then fed into the network for feature extraction. Assuming that the size of the input data is 9 × 9 × n, the data pass through the input module, the FBMB module of the spectral branch, the 3D-SoftPool module of the spatial branch, the dense connection blocks, the attention blocks and the final classification block. Obviously, the spatial size and n are important factors affecting classification performance, so this section analyzes their impact in detail. Table 5 shows the parameter configuration of the proposed method.
Effect of n on classification performance: n determines the depth of the network. Here, the influence of the parameter n in the spectral branch on classification accuracy is discussed. Generally speaking, classification accuracy improves as network depth increases, but too many layers bring problems such as overfitting, gradient vanishing and gradient explosion. Figure 6 shows the impact of n on the OA of the different data sets: n is varied over 96, 97, 98 and 99 on the IN, UP, KSC and SV data sets while the spatial size is fixed. As can be seen from Figure 6a, the OA value is highest when n is set to 97 and gradually decreases as n increases; for the SV data set, the OA value fluctuates slightly as n increases. In order to avoid information redundancy caused by an excessively dense network and to balance classification accuracy against computation, n is set to 97 for the IN, UP, SV and KSC data sets.
Effect of spatial size on classification performance: For hyperspectral images, input data blocks that are too small lead to insufficient feature extraction, while overly large blocks easily introduce noise. Therefore, the spatial size of the input 3D data block also affects classification accuracy. With n fixed, the effect of spatial size on classification performance is analyzed. The spatial size of the input sample is set to 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13 and 15 × 15. Figure 6b shows the OAs for different input spatial sizes on the four data sets. As can be seen from Figure 6b, OA increases with spatial size; for the UP and IN data sets, OA begins to decrease once the spatial size exceeds 9 × 9, and for the SV and KSC data sets, OA is highest at 9 × 9. Therefore, input blocks with spatial size 9 × 9 are used to train the network.

3.2. Experimental Results and Analysis

All methods are tested with the same proportion of training samples, and their classification performance is compared. Table 6, Table 7, Table 8 and Table 9 list the classification accuracy of all categories obtained by every method on the four data sets IN, UP, KSC and SV. All results are averaged over 10 runs, and the best results are shown in bold. It can be observed from Table 6, Table 7, Table 8 and Table 9 that, compared with the other methods, the OA, AA and Kappa of the proposed method are the highest. The OA value of the proposed method is 28.29% higher than that of SVM, and 6.11%, 3.45%, 6.82%, 24.64%, 3.77% and 9.9% higher than that of DBMA, DBDA, SSRN, CDCNN, FDSSC and pResNet, respectively. SVM does not use spatial neighborhood information and has poor robustness, so its OA value is low, at only 67.77%. CDCNN is a 2D-CNN structure and is more robust than SVM, so its OA value is 3.65% higher. FDSSC adopts dense connections, and its OA value is more than 3.05% higher than that of SSRN, which uses residual connections. The DBMA method adopts a double-branch, double-attention network structure, but training with small samples leads to overfitting. The DBDA network also uses a double-branch, double-attention structure and has a more flexible feature extraction structure than DBMA, so the OA value obtained by DBDA is higher than that of DBMA. The proposed method designs the FBMB module to extract spectral features: convolution kernels of different sizes are used on the four branches to fully extract important spectral features, a spectral attention mechanism is deployed on each branch to further extract key spectral features, and the spectral features extracted from the four branches are finally fused to obtain more effective spectral features. The experimental results show that classification performance is significantly improved after using the 3D-SoftPool module to extract spatial features, and that the proposed method has the best classification performance when classifying with small samples. As can be seen from Table 6, Table 7, Table 8 and Table 9, the classification accuracy of the proposed method is the highest on all four data sets. The experimental results also show that the CDCNN method, which uses a shallow network to capture features, performs worst: the receptive field of a shallow network is smaller than that of a deep network, and it extracts only low-level features. The other compared methods that adopt feature fusion strategies, including SSRN and FDSSC, usually provide higher classification accuracy than the remaining methods (CDCNN, SVM). When training with small samples, the hierarchical fusion mechanism can fuse the complementary and related information output by different convolution layers, making the extracted features more comprehensive. Moreover, in order to further verify the performance of the proposed SFBMSN network, the classification maps of the different methods on the four data sets IN, UP, SV and KSC are shown in Figure 7, Figure 8, Figure 9 and Figure 10. Compared with the classification maps of the other methods, the classification map of the proposed method has less noise and clearer boundaries, and is closest to the ground truth map, which further demonstrates the effectiveness of the proposed method.

4. Discussion

Experiment 1: In order to verify the effectiveness of the proposed attention blocks, FBMB module, 3D-SoftPool and dense connections, ablation experiments were carried out on the four hyperspectral data sets. Only the module being tested was removed from the network, and the other parts remained unchanged. Figure 11 shows the experimental results with and without specific modules. The accuracy of the network is highest after adding the spatial and spectral attention modules. The reason is that introducing attention blocks allows the network to adaptively assign different weights to different spectral features and different spatial regions and to selectively enhance features that are important for classification, i.e., to increase the weight of important features, which helps to improve classification accuracy. In addition, Figure 11b shows the effectiveness of fusing spatial–spectral features of different scales: the FBMB module improves the classification accuracy on all four data sets, because multiple branches can effectively extract spectral features at different scales. It is obvious from Figure 11c that after removing the 3D-SoftPool module, the OAs on the four data sets decrease significantly; the 3D-SoftPool block reduces the loss of information in feature mapping and retains more information in the down-sampled activation map, so better classification results are achieved. As can be seen from Figure 11d, if the dense connection is not used, the OAs on the IN, UP, SV and KSC data sets decrease significantly. With dense connections, all convolution layers are connected, so the spectral feature maps output by each convolution layer become the input of all subsequent layers. This ensures maximum spectral information flow between network layers and makes full use of spectral features, and therefore better classification performance is obtained on the four data sets of IN, UP, SV and KSC.
Experiment 2: Figure 12 is the schematic diagram of FBMB. In order to prove that the proposed FBMB effectively improves the classification performance of hyperspectral image classification, two groups of comparative experiments were carried out under the same conditions, in which only the structure of FBMB was changed and all other structures remained unchanged. In the first experiment, the kernel size of all convolutions in FBMB was set to 7 × 7 × 7; we call this variant the four-branch same-scale block (FBSSB). In the second experiment, we used any three of the four branches B1, B2, B3 and B4, which gives four combinations, namely C1 (B1, B2, B3), C2 (B1, B2, B4), C3 (B2, B3, B4) and C4 (B1, B3, B4). The results of these comparative experiments on the IN, UP, SV and KSC data sets are shown in Figure 13. As can be seen from Figure 13, FBMB achieves the highest OA value and the best classification performance on all four data sets. The 7 × 7 × 7 kernels of the first group do not perform well: the large kernel size leads to a large number of parameters and slow operation in the experiments. Compared with the three-branch combinations, the four branches of FBMB extract features more fully and obtain better classification performance.
Experiment 3: In order to further verify the performance of the proposed method, the classification performance of the different methods was compared under different proportions of training samples. In the experiment, the training ratios of IN, UP, KSC and SV were set to 1%, 5%, 10% and 15%. The experimental results are shown in Figure 14. As can be seen from Figure 14, when there are few training samples, the classification performance of CDCNN and SVM is relatively poor, while the proposed method performs best. As the number of samples increases, every method obtains higher classification accuracy, but the accuracy of the proposed method remains higher than that of the other methods, which shows that the method has good generalization ability.
Experiment 4: Feature fusion merges the features extracted from the image into a complete representation that is used as the input of the next layer, which provides more discriminative features for the next layer of the network. According to the order of fusion and prediction, feature fusion can be divided into early fusion and late fusion. Early feature fusion is a commonly used classical strategy; for example, Inside–Outside Net (ION) [65] and HyperNet [66] use concatenation [67] or addition operations to fuse certain layers. The feature fusion strategy in this experiment is an early fusion strategy that directly concatenates the spectral-scale and spatial-scale features: the two input features have the same size, and the output feature dimension is the sum of the two input dimensions. Table 10 shows the experimental results with and without the fusion strategy. It can be seen from Table 10 that the OA values obtained on the four data sets with feature fusion are significantly higher than those obtained without it; on each data set, the OA value increases by more than 1.8% after feature fusion. The results show that the feature fusion strategy significantly improves hyperspectral image classification compared with not using it.
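As a minimal illustration of the two early-fusion operations mentioned above (the tensor shapes here are arbitrary): concatenation sums the channel dimensions of the two inputs, whereas element-wise addition keeps the dimension unchanged.

```python
import torch

# Two same-sized feature maps, e.g. a spectral-scale and a spatial-scale feature.
spectral_feat = torch.randn(8, 60, 9, 9)
spatial_feat = torch.randn(8, 60, 9, 9)

fused_concat = torch.cat([spectral_feat, spatial_feat], dim=1)  # (8, 120, 9, 9): dims add up
fused_add = spectral_feat + spatial_feat                        # (8, 60, 9, 9): dims unchanged
```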

5. Conclusions

In this paper, the SFBMSN method is proposed for hyperspectral image classification. The network structure uses the FBMB module, the 3D-SoftPool module, a spatial attention module, a channel attention module and dense connections. With the FBMB module, the spectral features of hyperspectral images can be extracted at multiple scales and different levels, and the spectral attention module introduced into each branch of FBMB obtains more important information and suppresses useless information. Using a dense connection structure to extract spatial features allows spatial features from different layers to be spliced directly, realizing feature reuse and improving the efficiency of feature extraction. The 3D-SoftPool module is used for the first time within the dense connection structure; 3D-SoftPool retains more spatial feature information in the down-sampled activation map. The purpose of spatial attention is to extract important spatial information and suppress useless redundant information. Experiments on four commonly used data sets with small training samples show that the proposed SFBMSN method is very competitive in hyperspectral image classification tasks.
In future research, we will further explore how to fuse the extracted spatial–spectral features more effectively, so that the spatial and spectral features at the edge can also be fully applied to classification. Therefore, designing a more efficient fusion model is an important direction of our future research.

Author Contributions

Conceptualization, C.S.; data curation, C.S. and J.S.; formal analysis, J.S.; methodology, C.S.; software, J.S.; validation, C.S. and J.S.; writing—original draft, J.S.; writing—review and editing, C.S. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China (41701479, 62071084), in part by the Heilongjiang Science Foundation Project of China under Grant LH2021D022, and in part by the Fundamental Research Funds in Heilongjiang Provincial Universities of China under Grant 135509136.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Indiana Pines, University of Pavia, Kennedy Space Center and Salinas Valley data sets are available online at http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes (accessed on 3 July 2021).

Acknowledgments

The authors would like to thank the editors and the reviewers for their help and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chang, C.I. Hyperspectral Data Exploitation: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  2. Patel, N.K.; Patnaik, C.; Dutta, S.; Shekh, A.M.; Dave, A.J. Study of crop growth parameters using airborne imaging spectrometer data. Int. J. Remote Sens. 2001, 22, 2401–2411.
  3. Goetz, A.F.; Vane, G.; Solomon, J.E.; Rock, B.N. Imaging Spectrometry for Earth Remote Sensing. Science 1985, 228, 1147–1153.
  4. Civco, D.L. Artificial neural networks for land-cover classification and mapping. Int. J. Geogr. Inf. Syst. 1993, 7, 173–186.
  5. Wang, X.; Kong, Y.; Gao, Y.; Cheng, Y. Dimensionality reduction for hyperspectral data based on pairwise constraint discriminative analysis and nonnegative sparse divergence. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1552–1562.
  6. Ghamisi, P.; Benediktsson, J.A.; Ulfarsson, M.O. Spectral–Spatial Classification of Hyperspectral Images Based on Hidden Markov Random Fields. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2565–2574.
  7. Farrugia, R.A.; Debono, C.J. A Robust Error Detection Mechanism for H.264/AVC Coded Video Sequences Based on Support Vector Machines. IEEE Trans. Circuits Syst. Video Technol. 2008, 18, 1766–1770.
  8. Zhong, P.; Wang, R. Jointly Learning the Hybrid CRF and MLR Model for Simultaneous Denoising and Classification of Hyperspectral Imagery. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1319–1334.
  9. Fang, L.; Li, S.; Kang, X.; Benediktsson, J.A. Spectral-spatial classification of hyperspectral images with a superpixel-based discriminative sparse model. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4186–4201.
  10. Fu, W.; Li, S.; Fang, L. Spectral-spatial hyperspectral image classification via superpixel merging and sparse representation. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4971–4974.
  11. Fang, L.; Li, S.; Duan, W.; Ren, J.; Benediktsson, J.A. Classification of Hyperspectral Images by Exploiting Spectral–Spatial Information of Superpixel via Multiple Kernels. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6663–6674.
  12. Li, G.; Li, L.; Zhu, H.; Liu, X.; Jiao, L. Adaptive Multiscale Deep Fusion Residual Network for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8506–8521.
  13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
  14. Wang, R.J.; Li, X.; Ling, C.X. Pelee: A Real-Time Object Detection System on Mobile Devices. arXiv 2018, arXiv:1804.06882.
  15. Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1753–1762.
  16. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional Sequence to Sequence Learning. arXiv 2017, arXiv:1705.03122.
  17. He, H.; Gimpel, K.; Lin, J. Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1576–1586.
  18. He, L.; Li, J.; Liu, C.; Li, S. Recent advances on spectral–spatial hyperspectral image classification: An overview and new guidelines. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1579–1597.
  19. Shen, L.; Jia, S. Three-dimensional Gabor wavelets for pixel-based hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens. 2011, 49, 5039–5046.
  20. Tang, Y.; Lu, Y.; Yuan, H. Hyperspectral image classification based on three-dimensional scattering wavelet transform. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2467–2480.
  21. Phillips, R.D.; Blinn, C.E.; Watson, L.T.; Wynne, R.H. An adaptive noise-filtering algorithm for AVIRIS data with implications for classification accuracy. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3168–3179.
  22. Palmason, J.A.; Benediktsson, J.A.; Sveinsson, J.R.; Chanussot, J. Classification of hyperspectral data from urban areas using morphological preprocessing and independent component analysis. In Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS ’05), Seoul, Korea, 29 July 2005; p. 4.
  23. Li, J.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M.; Benediktsson, J.A. Generalized composite kernel framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4816–4829.
  24. Hong, D.; Wu, X.; Ghamisi, P.; Chanussot, J.; Yokoya, N.; Zhu, X.X. Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3791–3808.
  25. Duan, P.; Kang, X.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Fusion of multiple edge-preserving operations for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10336–10349.
  26. Sun, L.; Wu, Z.; Liu, J.; Xiao, L.; Wei, Z. Supervised spectral–spatial hyperspectral image classification with weighted Markov random fields. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1490–1503.
  27. Li, F.; Xu, L.; Siva, P.; Wong, A.; Clausi, D.A. Hyperspectral image classification with limited labeled training samples using enhanced ensemble learning and conditional random fields. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2427–2438.
  28. Chen, Y.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985.
  29. Gao, L.; Hong, D.; Yao, J.; Zhang, B.; Gamba, P.; Chanussot, J. Spectral superresolution of multispectral imagery with joint sparse and low-rank learning. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2269–2280.
  30. Ghamisi, P.; Couceiro, M.S.; Martins, F.M.L.; Benediktsson, J.A. Multilevel image segmentation based on fractional-order Darwinian particle swarm optimization. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2382–2394.
  31. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354.
  32. Zheng, K.; Gao, L.; Liao, W.; Hong, D.; Zhang, B.; Cui, X.; Chanussot, J. Coupled convolutional neural network with adaptive response function learning for unsupervised hyperspectral superresolution. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2487–2502.
  33. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
  34. Chen, Y.; Zhao, X.; Jia, X. Spectral–Spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392.
  35. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
  36. Hang, R.; Liu, Q.; Hong, D.; Ghamisi, P. Cascaded recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5384–5394.
  37. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978.
  38. Tao, C.; Pan, H.; Li, Y.; Zou, Z. Unsupervised spectral-spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2438–2442.
  39. Ma, X.; Wang, H.; Geng, J. Spectral-spatial classification of hyperspectral image based on deep auto-encoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 4073–4085.
  40. Zhang, X.; Liang, Y.; Li, C.; Hu, N.; Jiao, L.; Zhou, H. Recursive autoencoders-based unsupervised feature learning for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1928–1932.
  41. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H.-C. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 258619.
  42. Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral Image Classification Using Deep Pixel-Pair Features. IEEE Trans. Geosci. Remote Sens. 2016, 55, 844–853.
  43. Fang, L.; Liu, Z.; Song, W. Deep Hashing Neural Networks for Hyperspectral Image Feature Extraction. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1412–1416.
  44. Chen, Y.; Li, C.; Ghamisi, P.; Jia, X.; Gu, Y. Deep Fusion of Remote Sensing Data for Accurate Classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1253–1257.
  45. Zhang, M.; Li, W.; Du, Q. Diverse Region-Based CNN for Hyperspectral Image Classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
  46. Zhu, J.; Fang, L.; Ghamisi, P. Deformable Convolutional Neural Networks for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1254–1258.
  47. Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse Branch Block: Building a Convolution as an Inception-like Unit. arXiv 2021, arXiv:2103.13425.
  48. Stergiou, A.; Poppe, R.; Kalliatakis, G. Refining activation downsampling with SoftPool. arXiv 2021, arXiv:2101.00440.
  49. Li, R.; Zheng, S.; Duan, C.; Yang, Y.; Wang, X. Classification of hyperspectral image based on double-branch dual-attention mechanism network. Remote Sens. 2020, 12, 582.
  50. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251.
  51. Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Wang, R.; Zhi, L. Spectral-spatial classification of hyperspectral image using three-dimensional convolution network. J. Appl. Remote Sens. 2018, 12, 016005.
  52. Li, Y.; Zhang, H.; Shen, Q. Spectral-Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67.
  53. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.J.; Pla, F. Deep pyramidal residual networks for spectral–spatial hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 740–754.
  54. Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images classification. Remote Sens. 2018, 10, 1068.
  55. Feng, J.; Wu, X.; Shang, R.; Sui, C.; Li, J.; Jiao, L.; Zhang, X. Attention Multibranch Convolutional Neural Network for Hyperspectral Image Classification Based on Adaptive Region Search. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5054–5070.
  56. Dundar, T.; Ince, T. Sparse Representation-Based Hyperspectral Image Classification Using Multiscale Superpixels and Guided Filter. IEEE Geosci. Remote Sens. Lett. 2019, 16, 246–250.
  57. Cao, X.; Wang, X.; Wang, D.; Zhao, J.; Jiao, L. Spectral–Spatial Hyperspectral Image Classification Using Cascaded Markov Random Fields. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4861–4872.
  58. Ma, W.; Yang, Q.; Wu, Y.; Zhao, W.; Zhang, X. Double-Branch Multi-Attention Mechanism Network for Hyperspectral Image Classification. Remote Sens. 2019, 11, 1307.
  59. Sun, L.; Ma, C.; Chen, Y.; Zheng, Y.; Shim, H.J.; Wu, Z.; Jeon, B. Low Rank Component Induced Spatial-Spectral Kernel Method for Hyperspectral Image Classification. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3829–3842.
  60. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral-Spatial Feature Tokenization Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522214.
  61. Aletti, G.; Benfenati, A.; Naldi, G. A Semi-Supervised Reduced-Space Method for Hyperspectral Imaging Segmentation. J. Imaging 2021, 7, 267.
  62. Bampis, C.G.; Maragos, P.; Bovik, A.C. Graph-Driven Diffusion and Random Walk Schemes for Image Segmentation. IEEE Trans. Image Process. 2017, 26, 35–50.
  63. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
  64. Sinha, B.; Yimprayoon, P.; Tiensuwan, M. Cohen’s Kappa Statistic: A Critical Appraisal and Some Modifications. Calcutta Stat. Assoc. Bull. 2006, 58, 151–170.
  65. Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883.
  66. Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 845–853.
  67. Liu, C.; Wechsler, H. A shape- and texture-based enhanced Fisher classifier for face recognition. IEEE Trans. Image Process. 2001, 10, 598–608.
Figure 1. The overall framework of the proposed SFBMSN network.
Figure 2. The schematic diagram of spectral self-attention module.
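Figure 2 shows the data flow of the spectral self-attention module only schematically. For readers who prefer code, the sketch below illustrates a generic query–key–value self-attention computed across the spectral (channel) dimension of a feature cube; the tensor layout, the identity projections and the learnable residual weight gamma are illustrative assumptions and do not reproduce the exact module of the paper.

```python
import torch
import torch.nn as nn

class SpectralSelfAttention(nn.Module):
    """Channel/spectral self-attention sketch: every channel attends to every
    other channel and the attention-weighted features are added back residually."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):                # x: (B, C, H, W) spectral feature cube
        b, c, h, w = x.shape
        q = x.view(b, c, -1)             # (B, C, N) queries, N = H * W
        k = x.view(b, c, -1)             # (B, C, N) keys
        v = x.view(b, c, -1)             # (B, C, N) values
        energy = torch.bmm(q, k.transpose(1, 2))   # (B, C, C) channel affinities
        attn = torch.softmax(energy, dim=-1)        # attention over channels
        out = torch.bmm(attn, v).view(b, c, h, w)   # re-weighted features
        return self.gamma * out + x                 # residual connection

# quick shape check on a 9 x 9 patch with 60 spectral feature maps
feat = torch.randn(2, 60, 9, 9)
print(SpectralSelfAttention()(feat).shape)          # torch.Size([2, 60, 9, 9])
```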
Figure 3. The schematic diagram of DenseNet.
Figure 4. The schematic diagram of 3D-Softpool.
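Figure 4 sketches the 3D-Softpool operation. The snippet below is a minimal sketch of the SoftPool idea from [48] applied to a 3-D window: activations are averaged with exponential (softmax) weights, so strong responses dominate without being the only contribution, unlike max pooling. The kernel size, stride and the two-pass average-pooling implementation are assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F

def softpool3d(x, kernel_size=2, stride=2):
    """SoftPool sketch: exponentially weighted average over each 3-D window,
    implemented with two average-pooling passes (numerator and normaliser)."""
    e = torch.exp(x)                                  # exponential weights
    num = F.avg_pool3d(x * e, kernel_size, stride)    # mean of w_i * a_i per window
    den = F.avg_pool3d(e, kernel_size, stride)        # mean of w_i per window
    return num / den.clamp_min(1e-12)                 # the means cancel into a softmax average

cube = torch.randn(1, 1, 4, 8, 8)                     # (batch, channel, bands, H, W)
print(softpool3d(cube).shape)                         # torch.Size([1, 1, 2, 4, 4])
```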
Figure 5. Schematic diagram of spatial self-attention module.
Figure 6. (a) Effect of n on OA for the four hyperspectral image data sets. (b) Effect of spatial size on OA for the four hyperspectral image data sets.
Figure 7. Classification maps of the IN data set using 3% training samples: (a) false color image, (b) ground truth map (GT) and (c–j) the classification maps and overall accuracies of the different algorithms.
Figure 8. Classification maps of the UP data set using 0.3% training samples: (a) false color image, (b) ground truth map (GT) and (c–j) the classification maps and overall accuracies of the different algorithms.
Figure 9. Classification maps of the SV data set using 0.5% training samples: (a) false color image, (b) ground truth map (GT) and (c–j) the classification maps and overall accuracies of the different algorithms.
Figure 10. Classification maps of the KSC data set using 5% training samples: (a) false color image, (b) ground truth map (GT) and (c–j) the classification maps and overall accuracies of the different algorithms.
Figure 11. Ablation experiments of four modules of the proposed method on different data sets: (a) attention mechanism, (b) FBMB, (c) 3D-Softpool and (d) dense connection.
Figure 12. Schematic diagram of FBMB.
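Figure 12 is the schematic of the four branch multiscale block (FBMB), in which four parallel branches convolve the input cube with different kernel sizes along the spectral axis. As a rough code counterpart, the sketch below builds four such branches and concatenates their outputs; the specific kernel lengths (3, 5, 7, 11), the channel counts and the omission of the per-branch spectral attention are simplifying assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class FourBranchMultiscaleBlock(nn.Module):
    """Illustrative four-branch block: each branch convolves along the spectral
    axis with a different kernel length, then the branch outputs are concatenated."""
    def __init__(self, in_ch=1, branch_ch=6, spectral_kernels=(3, 5, 7, 11)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_ch, branch_ch, kernel_size=(k, 1, 1),
                          padding=(k // 2, 0, 0)),   # pad so the band count is kept
                nn.BatchNorm3d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for k in spectral_kernels
        ])

    def forward(self, x):                  # x: (B, C, bands, H, W)
        return torch.cat([branch(x) for branch in self.branches], dim=1)

cube = torch.randn(2, 1, 200, 9, 9)        # a 9 x 9 patch with 200 bands
print(FourBranchMultiscaleBlock()(cube).shape)   # torch.Size([2, 24, 200, 9, 9])
```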
Figure 13. Comparative experimental results of FBMB, FBSSB, C1, C2, C3 and C4 on different data sets.
Figure 14. The classification performance of different methods is compared under different training sample ratios in IN, UP, SV and KSC data sets. (a) Classification performance of different methods on IN data set, (b) classification performance of different methods on UP data set, (c) classification performance of different methods on SV data set and (d) classification performance of different methods on KSC data set.
Table 1. Number of training samples and test samples in the IN data set.

NO.    Name                            Training   Test
C1     Alfalfa                         3          43
C2     Corn-notill                     42         1386
C3     Corn-mintill                    24         806
C4     Corn                            7          230
C5     Grass-pasture                   14         469
C6     Grass-trees                     21         709
C7     Grass-pasture-mowed             3          25
C8     Hay-windrowed                   14         464
C9     Oats                            3          17
C10    Soybean-notill                  29         943
C11    Soybean-mintill                 73         2382
C12    Soybean-clean                   17         576
C13    Wheat                           6          199
C14    Woods                           37         1228
C15    Buildings-Grass-Trees-Drives    11         375
C16    Stone-Steel-Towers              3          90
Total                                  307        9942
Table 2. Number of training samples and test samples in the UP data set.

NO.    Name                    Training   Test
C1     Asphalt                 33         6598
C2     Meadows                 93         18,556
C3     Gravel                  10         2089
C4     Trees                   15         3049
C5     Painted metal sheets    6          1339
C6     Bare soil               25         5004
C7     Bitumen                 6          1324
C8     Self-blocking bricks    18         3664
C9     Shadows                 4          943
Total                          210        42,566
Table 3. Number of training samples and test samples in the KSC data set.

NO.    Name               Training   Test
C1     Scrub              38         723
C2     Willow swamp       12         231
C3     CP hammock         12         244
C4     Slash pine         12         240
C5     Oak/broadleaf      8          153
C6     Hardwood           11         218
C7     Swamp              5          100
C8     Graminoid marsh    21         410
C9     Spartina marsh     26         494
C10    Cattail marsh      20         384
C11    Salt marsh         20         399
C12    Mud flats          25         478
C13    Water              46         881
Total                     256        4955
Table 4. Number of training samples and test samples in the SV data set.

NO.    Name                         Training   Test
C1     Brocoli_green_weeds_1        10         1999
C2     Brocoli_green_weeds_2        18         3708
C3     Fallow                       9          1967
C4     Fallow_rough_plow            6          1388
C5     Fallow_smooth                13         2665
C6     Stubble                      19         3940
C7     Celery                       17         3562
C8     Grapes_untrained             56         11,215
C9     Soil_vinyard_develop         31         6172
C10    Corn_senesced_green_weeds    16         3262
C11    Lettuce_romaine_4wk          5          1063
C12    Lettuce_romaine_5wk          9          1833
C13    Lettuce_romaine_6wk          4          912
C14    Lettuce_romaine_7wk          5          1065
C15    Vinyard_untrained            36         7232
C16    Vinyard_vertical_trellis     9          1798
Total                               263        53,886
Table 5. Experimental parameters and configurations.

Spectral branch
Layer Setting               Input Size        Kernel Size                        Output Size
Input-3D-Conv               -                 -                                  9 × 9 × 200
DiverseBranchBlock          (9 × 9 × n, 24)   -                                  (9 × 9 × 97, 24)
BN-ReLU-3D-Conv1            (9 × 9 × n, 24)   1 × 1 × 200, stride = [1, 1, 1]    (9 × 9 × 97, 12)
Concatenate                 -                 -                                  (9 × 9 × 97, 36)
BN-ReLU-3D-Conv2            (9 × 9 × n, 36)   1 × 1 × 7, stride = [1, 1, 1]      (9 × 9 × 97, 12)
Concatenate                 -                 -                                  (9 × 9 × 97, 48)
BN-ReLU-3D-Conv3            (9 × 9 × n, 48)   1 × 1 × 7, stride = [1, 1, 1]      (9 × 9 × 97, 12)
Concatenate                 -                 -                                  (9 × 9 × 97, 60)
Channel Attention Block     (9 × 9 × n, 60)   -                                  (9 × 9 × 97, 60)
BN-Dropout-GAP              (9 × 9 × n, 60)   -                                  (1 × 60)

Spatial branch
Layer Setting                  Input Size        Kernel Size                        Output Size
Input                          -                 -                                  9 × 9 × 200
Conv                           1 × 1 × 200       1 × 1 × band, stride = [1, 1, 1]   (9 × 9 × 1, 24)
BN-ReLU-3DSoftPool-3DConv      (9 × 9 × 1, 24)   3 × 3 × 1, stride = [1, 1, 1]      (9 × 9 × 1, 12)
Concatenate                    -                 -                                  (9 × 9 × 1, 36)
BN-ReLU-3D-Conv1               (9 × 9 × 1, 36)   3 × 3 × 1, stride = [1, 1, 1]      (9 × 9 × 1, 12)
Concatenate                    -                 -                                  (9 × 9 × 1, 48)
BN-ReLU-3D-Conv2               (9 × 9 × 1, 48)   3 × 3 × 1, stride = [1, 1, 1]      (9 × 9 × 1, 12)
Concatenate                    -                 -                                  (9 × 9 × 1, 60)
Spatial Attention Block        (9 × 9 × 1, 60)   3 × 3 × 1, stride = [1, 1, 1]      (9 × 9 × 1, 60)
BN-Dropout-GAP                 (9 × 9 × 1, 60)   3 × 3 × 1, stride = [1, 1, 1]      (1 × 60)

Fc_Linear                      Concatenate-full_connection                          260
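For readers who want to map Table 5 onto code, the sketch below reproduces only the densely connected part of the spectral branch: each BN-ReLU-3D-Conv step produces 12 feature maps that are concatenated with everything produced before it, growing the channel count 24 → 36 → 48 → 60 as in the table, before global average pooling yields a 60-dimensional spectral vector. The kernel lengths and the stride of the stem are assumptions chosen only to reproduce the 200 → 97 band reduction, and the DiverseBranchBlock and attention blocks are omitted, so this is an illustrative approximation rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class DenseSpectralBranch(nn.Module):
    """Dense spectral feature extraction sketch following Table 5:
    24 -> (+12) 36 -> (+12) 48 -> (+12) 60 channels, then global pooling."""
    def __init__(self, growth=12, init_ch=24):
        super().__init__()
        # stem: assumed spectral kernel of 7 with stride 2, mapping 200 bands to 97
        self.stem = nn.Conv3d(1, init_ch, kernel_size=(7, 1, 1), stride=(2, 1, 1))

        def unit(in_ch):
            return nn.Sequential(
                nn.BatchNorm3d(in_ch), nn.ReLU(inplace=True),
                nn.Conv3d(in_ch, growth, kernel_size=(7, 1, 1), padding=(3, 0, 0)))

        self.conv1 = unit(init_ch)                 # 24 -> 12 new maps
        self.conv2 = unit(init_ch + growth)        # 36 -> 12 new maps
        self.conv3 = unit(init_ch + 2 * growth)    # 48 -> 12 new maps
        self.gap = nn.AdaptiveAvgPool3d(1)         # global average pooling

    def forward(self, x):                          # x: (B, 1, 200, 9, 9)
        x0 = self.stem(x)                              # (B, 24, 97, 9, 9)
        x1 = torch.cat([x0, self.conv1(x0)], dim=1)    # (B, 36, 97, 9, 9)
        x2 = torch.cat([x1, self.conv2(x1)], dim=1)    # (B, 48, 97, 9, 9)
        x3 = torch.cat([x2, self.conv3(x2)], dim=1)    # (B, 60, 97, 9, 9)
        return self.gap(x3).flatten(1)                 # (B, 60) spectral vector

patch = torch.randn(2, 1, 200, 9, 9)
print(DenseSpectralBranch()(patch).shape)              # torch.Size([2, 60])
```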
Table 6. KPI (OA, AA, Kappa) on the Indian Pines (IN) data set with 3% training samples.

Class      SVM           CDCNN         pResNet       SSRN          DBMA          FDSSC         DBDA          Proposed
C1         35.61         48.56         27.06         81.53         82.25         83.52         96.49         98.41
C2         56.48         66.86         81.72         88.18         85.93         90.44         92.25         92.05
C3         61.56         33.13         80.42         86.68         88.64         87.60         91.6          95.77
C4         41.55         54.91         61.07         83.27         87.99         91.24         92.63         94.41
C5         83.06         87.35         91.65         96.78         95.05         98.31         97.76         99.69
C6         84.34         91.16         94.74         95.44         97.53         98.25         96.85         99.14
C7         57.86         57.25         20.04         85.98         51.11         87.70         65.62         77.70
C8         88.68         92.92         99.28         95.75         98.62         98.45         98.75         99.95
C9         37.46         48.08         68.99         72.16         53.31         72.11         83.42         80.73
C10        63.33         64.95         83.24         84.93         86.22         83.95         86.47         91.76
C11        64.74         67.74         88.71         88.26         89.51         95.72         93.12         96.91
C12        51.56         41.31         60.21         85.34         83.18         90.50         91.22         92.72
C13        84.75         85.68         79.58         98.15         96.8          98.99         96.69         98.39
C14        89.68         87.25         97.47         94.53         96.52         95.95         96.15         99.41
C15        63.83         86.64         85.01         88.65         85.19         92.51         92.37         95.31
C16        97.67         91.43         89.26         94.48         95.47         98.01         90.83         94.91
OA (%)     67.77 ± 0     71.42 ± 2.56  86.13 ± 1.36  89.24 ± 0.41  89.95 ± 1.06  92.29 ± 2.56  92.58 ± 0.53  96.03 ± 0.03
AA (%)     68.74 ± 0     71.35 ± 1.21  75.31 ± 2.21  88.69 ± 0.95  86.80 ± 0.59  91.45 ± 2.56  91.17 ± 0.22  94.58 ± 0.41
Kappa (%)  64.97 ± 0     67.22 ± 2.74  84.11 ± 0.92  87.88 ± 0.47  88.24 ± 1.19  91.24 ± 2.56  91.6 ± 0.63   95.34 ± 0.61
Table 7. KPI (OA, AA, Kappa) on the Pavia University (UP) data set with 0.5% training samples.

Class      SVM           CDCNN         pResNet       SSRN          DBMA          FDSSC         DBDA          Proposed
C1         82.27         87.78         87.21         94.10         88.83         91.64         92.52         97.28
C2         83.54         94.73         98.02         96.66         97.07         97.06         98.07         99.16
C3         57.55         65.28         31.22         76.75         77.08         86.22         87.86         94.35
C4         93.33         96.13         85.36         99.29         96.71         96.75         96.27         98.27
C5         94.37         97.53         95.57         99.64         97.46         99.74         97.84         99.17
C6         81.66         89.62         55.37         93.85         93.66         96.83         98.47         98.98
C7         48.14         78.28         37.99         86.48         87.73         71.04         92.62         94.64
C8         72.15         78.53         76.32         83.71         81.17         77.84         89.43         87.07
C9         98.96         92.05         92.48         98.97         95.37         98.73         97.47         97.84
OA (%)     83.03 ± 0     87.94 ± 0.13  82.2 ± 1.98   92.50 ± 1.32  91.8 ± 0.56   93.16 ± 2.56  96.01 ± 0.03  97.13 ± 0.04
AA (%)     78.24 ± 0     85.32 ± 0.19  72.91 ± 2.15  92.16 ± 1.31  90.01 ± 2.64  90.58 ± 2.56  94.72 ± 0.59  96.31 ± 0.39
Kappa (%)  76.45 ± 0     83.95 ± 0.16  77.71 ± 3.01  90.89 ± 1.61  89.04 ± 0.75  90.88 ± 2.56  94.71 ± 0.04  96.19 ± 0.05
Table 8. KPI (OA, AA, Kappa) on the Salinas Valley (SV) data set with 0.5% training samples.

Class      SVM           CDCNN         pResNet       SSRN          DBMA          FDSSC         DBDA          Proposed
C1         99.41         97.75         89.22         96.17         97.53         100           98.74         99.24
C2         97.89         97.47         94.93         97.85         98.63         96.99         98.18         99.43
C3         88.99         88.54         86.13         95.26         95.82         98.01         96.48         96.84
C4         96.59         94.56         94.25         98.66         91.16         96.96         95.31         95.57
C5         96.08         95.09         99.33         97.25         95.75         99.58         97.15         99.66
C6         99.91         96.35         98.99         98.95         98.33         99.66         98.85         97.40
C7         96.62         93.88         98.63         98.33         96.69         92.07         99.33         92.45
C8         73.17         81.45         84.67         87.26         88.39         99.56         92.84         99.32
C9         97.09         97.59         98.7          99.37         98.16         97.03         98.06         96.99
C10        86.37         85.83         97.07         96.37         94.88         97.30         98.53         95.17
C11        86.97         83.66         89.64         96.82         92.63         96.06         96.74         99.34
C12        96.21         96.77         98.94         97.42         96.78         98.41         97.85         100
C13        92.45         97.87         99.06         97.24         97.28         99.90         98.48         99.91
C14        93.02         93.22         97.34         97.81         96.96         96.75         97.56         97.98
C15        76.02         73.85         88.95         84.33         84.03         88.87         84.23         85.98
C16        98.82         96.81         95.26         99.54         98.04         99.66         98.96         99.67
OA (%)     86.98 ± 0     88.36 ± 0.28  92.82 ± 2.1   92.04 ± 0.96  92.95 ± 0.33  95.79 ± 0.36  93.74 ± 0.74  97.23 ± 0.55
AA (%)     91.56 ± 0     91.95 ± 0.66  94.32 ± 0.72  95.95 ± 0.21  95.68 ± 0.2   97.50 ± 0.54  96.76 ± 0.17  98.11 ± 0.31
Kappa (%)  85.45 ± 0     87.05 ± 0.3   91.84 ± 2.08  91.14 ± 1.08  92.16 ± 0.34  95.31 ± 0.32  93.05 ± 0.8   96.91 ± 0.75
Table 9. KPI (OA, AA, Kappa) on the Kennedy Space Center (KSC) data set with 5% training samples.

Class      SVM           CDCNN         pResNet        SSRN          DBMA          FDSSC         DBDA          Proposed
C1         92.43         96.81         99.61          98.4          99.39         99.73         99.67         99.71
C2         87.14         83.65         92.64          94.52         93.8          93.99         96.58         99.67
C3         72.47         83.92         82.37          85.2          80.2          82.50         88.72         88.32
C4         54.45         58.61         45.73          74.55         75.31         78.78         80.82         91.07
C5         64.11         52.83         73.06          75.13         69.6          68.55         78.14         92.29
C6         65.23         77.17         90.22          94.35         95.06         93.31         97.75         99.25
C7         75.5          75.34         97.44          84.64         87.08         88.69         95.15         95.52
C8         87.33         85.83         96.93          96.97         95.4          98.83         99.08         99.84
C9         87.94         91.65         99.81          97.83         96.21         99.80         99.98         100
C10        96.01         93.87         97.14          98.84         96.13         100           99.92         98.98
C11        96.03         98.77         98.26          99.14         99.64         99.15         98.92         99.17
C12        93.75         94.08         99.13          99.17         98.19         99.07         98.95         100
C13        99.72         99.8          100            100           100           1.00          99.97         99.51
OA (%)     87.96 ± 0     89.33 ± 0.65  94.04 ± 2.55   94.52 ± 0.9   94.12 ± 0.27  95.62 ± 0.03  96.76 ± 0.51  98.83 ± 0.52
AA (%)     82.55 ± 0     84.03 ± 0.95  90.17 ± 23.73  92.15 ± 1.87  91.23 ± 0.75  92.49 ± 0.06  94.9 ± 0.2    97.72 ± 0.21
Kappa (%)  86.59 ± 0     88.13 ± 0.73  93.68 ± 2.71   93.9 ± 1      93.45 ± 0.31  95.12 ± 0.03  96.4 ± 0.57   98.58 ± 0.7
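Tables 6–9 report overall accuracy (OA), average accuracy (AA) and Cohen's kappa. As a reference for how these key performance indicators are usually computed from a confusion matrix (kappa in the sense of [64]), a short sketch is given below; it is a generic implementation on an invented example matrix, not the evaluation script used for the experiments reported here.

```python
import numpy as np

def classification_kpis(conf):
    """OA, AA and Cohen's kappa (all in %) from a square confusion matrix
    with rows = reference classes and columns = predicted classes."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    oa = np.trace(conf) / n                                   # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)              # per-class accuracy (recall)
    aa = per_class.mean()                                     # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2   # expected chance agreement
    kappa = (oa - pe) / (1 - pe)
    return 100 * oa, 100 * aa, 100 * kappa

# hypothetical 3-class confusion matrix, purely for illustration
conf = np.array([[50,  2,  1],
                 [ 3, 45,  2],
                 [ 0,  4, 43]])
print(classification_kpis(conf))   # approximately (92.0, 91.9, 88.0)
```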
Table 10. Effectiveness analysis of the feature fusion strategy (OA%).

Strategy         IN       UP       SV       KSC
With fusion      96.03    97.13    97.23    98.83
Without fusion   94.21    95.27    95.39    97.01
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
