1. Introduction
Hyperspectral images (HSIs) contain hundreds of bands with rich spectral and spatial information [1]. They have been explored in many areas, such as HSI classification [2], geological survey [3], and biomedicine [4]. In particular, the HSI classification task, which assigns a label to each pixel, is the basis of HSI analysis. However, the acquisition and learning of HSI features have always been both a focus of and a difficulty in HSI classification research: how sufficiently and effectively features are extracted directly affects the classification results.
In the early stages of research, most classification methods relied on handcrafted feature extraction [5] combined with conventional classifiers. Widely used handcrafted feature extraction methods include principal component analysis (PCA) [6], linear discriminant analysis (LDA) [7], and simple linear iterative clustering (SLIC) [8]. Feng Xue et al. proposed an improved functional principal component analysis method, which extracts more effective functional features by making full use of the label information of training samples [6]. Qiuling Hou et al. proposed a supervised dimensionality reduction method termed linear discriminant analysis based on kernel-based possibilistic c-means (KPCM), which uses the KPCM algorithm to assign different weights to different samples [7]. Munmun Baisantry et al. proposed a band selection technique based on the fusion of FDA and functional PCA; it selects shape-preserving, discriminative bands that highlight the important characteristics, variations, and patterns of the hyperspectral data [9]. Conventional classifiers mainly include the support vector machine (SVM) [10], k-nearest neighbor (KNN) [11], and logistic regression [12]. Amos Bortiew et al. proposed an active learning method for HSI classification using kernel sparse representation classifiers (KSRC), which have proven robust and have been successfully applied to HSI classification [13]. Nevertheless, the handcrafted features extracted by these methods usually have weak representational and discriminative power, which results in unsatisfactory classification performance.
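For context, the simplest instance of this paradigm reduces the spectral dimension with PCA and classifies each pixel spectrum with an SVM. The sketch below is a minimal, generic baseline rather than the method of any cited work; the array names `cube` and `labels` are assumed placeholders for an HSI cube of shape (H, W, B) and its per-pixel ground truth.

```python
# Hedged sketch of a classic HSI baseline: per-pixel PCA features + SVM.
# `cube` (H, W, B) and `labels` (H, W) are assumed inputs; 0 marks unlabeled pixels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pca_svm_baseline(cube: np.ndarray, labels: np.ndarray, n_components: int = 30):
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float32)      # one spectrum per pixel
    y = labels.reshape(-1)
    mask = y > 0                                    # train on labeled pixels only
    model = make_pipeline(
        StandardScaler(),                           # normalize each band
        PCA(n_components=n_components),             # spectral dimensionality reduction
        SVC(kernel="rbf", C=100.0, gamma="scale"),  # conventional classifier
    )
    model.fit(X[mask], y[mask])
    return model.predict(X).reshape(H, W)           # full classification map
```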
Recently, learning-based methods have been applied to classify HSIs and have achieved great success due to their stronger representation ability [2,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. Yushi Chen et al. [2] designed a stacked autoencoder to learn spectral and spatial information and then proposed a deep learning framework that combines PCA and logistic regression to fuse the two kinds of features; this deep learning classifier unsurprisingly achieved higher accuracy. Yushi Chen et al. [29] further designed a new deep framework that combines spectral-spatial feature extraction and a classifier to improve classification accuracy. Wei Hu et al. [20] presented a simple CNN architecture containing multiple convolutional layers for classification. Jiaojiao Li et al. [21] proposed a fully convolutional network to learn HSI features and further introduced a deconvolution network to enhance them. Qichao Liu et al. [18] proposed a novel guided feature extraction unit, which can enhance cross-class region features and suppress irregular features. Saeed Ghaderizadeh et al. [17] proposed a mixed 3D-2D CNN that learns useful spectral-spatial features. Weiwei Song et al. proposed a novel hashing-based deep metric learning method for the classification of hyperspectral images and light detection and ranging (LiDAR) data; they also elaborately designed a loss function that simultaneously considers a label-based semantic loss and a hashing-based metric loss [23]. Tan Guo et al. designed a dual-view spectral and global spatial feature fusion network, consisting of a spatial subnetwork and a spectral subnetwork, to extract spatial-spectral features [22]. Hao Zhou et al. proposed a multiple feature fusion model with two subnetworks, a multiscale fully convolutional network and a multihop GCN, to extract multilevel information from HSIs [24]. Yule Duan et al. proposed a structure-preserved hypergraph convolution network, which integrates local regular convolution and irregular hypergraph convolution to learn the structured semantic features of HSIs [25]. M. E. Paoletti et al. presented a classification method that combines the ghost-module architecture with a CNN-based HSI classifier to reduce the computational cost while achieving satisfactory classification performance [26]. Swalpa Kumar Roy et al. proposed an end-to-end morphological deep learning framework that models nonlinear information during training; it includes spectral and spatial morphological blocks to extract relevant features from HSI data [27]. Kazem Safari et al. proposed a deep learning strategy that combines different convolutional neural networks to efficiently extract joint spatial-spectral features over multiple scales [28]. Although these learning-based methods have more advantages than conventional methods in HSI classification, they inevitably require extensive labeled samples to train a suitable network. Nevertheless, it is usually difficult to obtain enough labeled samples to train a well-performing neural network, because collecting labeled samples requires considerable time and financial resources.
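As a concrete illustration of the patch-based spectral-spatial CNNs surveyed above, the following is a minimal, generic 3D-2D convolutional classifier operating on small HSI patches. It is a hedged sketch in the spirit of the mixed 3D-2D designs [17,20], not the exact architecture of any cited paper; the band count, patch size, and class count are illustrative.

```python
# Hedged sketch of a generic patch-based 3D-2D CNN for HSI classification
# (a toy illustration of the idea, not any cited architecture).
import torch
import torch.nn as nn

class Simple3D2DCNN(nn.Module):
    def __init__(self, n_bands: int = 100, n_classes: int = 16, patch: int = 9):
        super().__init__()
        # 3D convolutions slide over (band, height, width) to capture spectral-spatial cues.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3), padding=(2, 1, 1)), nn.ReLU(),
        )
        # Collapse the spectral axis into channels, then refine spatially with 2D convolutions.
        self.conv2d = nn.Sequential(
            nn.Conv2d(16 * n_bands, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64 * patch * patch, n_classes)

    def forward(self, x):                      # x: (batch, 1, bands, patch, patch)
        x = self.conv3d(x)
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)          # merge channel and spectral dimensions
        x = self.conv2d(x)
        return self.head(x.flatten(1))

logits = Simple3D2DCNN()(torch.randn(4, 1, 100, 9, 9))   # -> (4, 16)
```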
To mitigate this issue, many scholars have introduced unsupervised methods to assist HSI classification [30,31,32]. Shaohui Mei et al. [30] designed an unsupervised feature learning method based on a 3D convolutional autoencoder (3D-CAE), which can effectively learn spatial-spectral structure features. Lichao Mou et al. [31] proposed an unsupervised encoder-decoder network to extract spectral-spatial features that assist classification.
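The core idea of such unsupervised feature learning is to reconstruct unlabeled HSI patches and reuse the learned latent representation for classification. The following is a toy sketch of a 3D convolutional autoencoder under assumed patch dimensions; it is not the 3D-CAE of [30].

```python
# Hedged sketch of the 3D convolutional autoencoder idea used for unsupervised
# spectral-spatial feature learning (a toy version, not the exact 3D-CAE of [30]).
import torch
import torch.nn as nn

class Tiny3DCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                 # x: (batch, 1, bands, height, width)
        z = self.encoder(x)               # latent spectral-spatial features
        return self.decoder(z), z

patch = torch.randn(2, 1, 96, 8, 8)      # unlabeled HSI patches
recon, features = Tiny3DCAE()(patch)
loss = nn.functional.mse_loss(recon, patch)   # reconstruction objective, no labels needed
```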
Meanwhile, some researchers have also presented semi-supervised approaches [33,34,35]. Kun Tan et al. [33] argued that the spatial neighborhood information of samples is very useful in semi-supervised HSI classification and therefore used it to strengthen the classifier and improve classification performance. Yue Wu et al. [34] designed a semi-supervised approach to fully exploit the information of unlabeled samples and further used a self-training method to gradually incorporate high-confidence sample points, which enhances semi-supervised HSI classification. Fuding Xie et al. [35] developed a multinomial logistic regression module and a local mean-based pseudo nearest neighbor module to learn information from labeled samples, and then proposed a novel four-step strategy to conduct semi-supervised classification. The above methods indeed achieve good classification performance with limited samples, but they mainly learn from the limited labeled samples or additionally explore the features of unlabeled samples to train the model. In other words, the labeled and unlabeled samples come from the same data to be classified (namely, the target domain), so network performance is always limited by the number of labeled target-domain samples. Meanwhile, these methods hardly utilize the abundant labeled samples available in other HSIs [36] (namely, the source domain). In practice, the distributions of the source domain and the target domain may differ, which makes it difficult to improve the accuracy of HSI few-shot classification.
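The self-training idea mentioned above can be summarized in a few lines: train on the labeled set, pseudo-label the most confident unlabeled pixels, and retrain. The sketch below is a generic, hedged version with an SVM base classifier and an assumed confidence threshold; it is not the exact procedure of [34].

```python
# Hedged sketch of generic self-training with pseudo-labels, illustrating the idea of
# gradually incorporating high-confidence unlabeled pixels (not the exact procedure of [34]).
import numpy as np
from sklearn.svm import SVC

def self_train(X_lab, y_lab, X_unlab, rounds: int = 5, threshold: float = 0.95):
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        clf = SVC(kernel="rbf", probability=True).fit(X, y)
        proba = clf.predict_proba(pool)
        keep = proba.max(axis=1) >= threshold             # only trust confident predictions
        if not keep.any():
            break
        pseudo = clf.classes_[proba[keep].argmax(axis=1)]  # map column index back to class label
        X = np.vstack([X, pool[keep]])                     # promote pseudo-labeled pixels
        y = np.concatenate([y, pseudo])
        pool = pool[~keep]                                 # shrink the unlabeled pool
    return SVC(kernel="rbf").fit(X, y)                     # final classifier on the enlarged set
```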
To further resolve this issue, meta-learning methods have been proposed to learn classification abilities from HSIs [37,38]. In particular, meta-learning does not require strict class consistency or an identical distribution between the source domain and the target domain. Few-shot learning (FSL) is an important implementation of meta-learning [39,40,41], which can transfer the extracted classification knowledge from the source domain to the target domain. In recent years, a growing number of FSL methods have been proposed for HSI classification [42,43,44,45]. Bing Liu et al. [42] proposed a deep FSL method that learns a metric space from the training set to solve the small-sample-size problem of HSI classification. Kuiliang Gao et al. [43] designed a novel classification method based on a relation network and trained it with the idea of meta-learning. Xuejian Liang et al. [44] proposed an attention multisource fusion method for HSI few-shot classification, which can extract features from fused homogeneous and heterogeneous data. Xibing Zuo et al. [45] proposed an edge-labeling graph neural network, which can explicitly quantify the intraclass and interclass features between different pixels.
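Many of these FSL methods share a metric-learning core: embed support and query samples, build class representations from the few labeled shots, and classify queries by feature distance. The sketch below is a generic prototype-style episode, assuming any feature extractor `embed` that maps HSI patches to vectors; it is not the exact method of [42] or [43].

```python
# Hedged sketch of a metric-based few-shot episode: class prototypes are computed from a
# small support set and query pixels are classified by distance in the learned feature space.
import torch
import torch.nn.functional as F

def few_shot_episode(embed, support_x, support_y, query_x, n_way: int):
    # support_y takes values in {0, ..., n_way - 1}; embed maps inputs to (N, dim) features.
    z_support = embed(support_x)                                   # (N_support, dim)
    z_query = embed(query_x)                                       # (N_query, dim)
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                                              # (n_way, dim)
    dists = torch.cdist(z_query, prototypes)                       # Euclidean distances
    logits = -dists                                                # closer prototype -> higher score
    return F.log_softmax(logits, dim=1)                            # per-class log-probabilities
```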
Although these FSL methods have achieved good classification accuracy with limited labeled samples, the ability to extract features from the source and target domain data is still the main factor affecting classification accuracy and thus needs to be improved. Meanwhile, it is necessary to further enhance useful spectral-spatial features and suppress useless ones. To this end, a novel spectral-spatial domain attention network (SSDA) is developed in this article. It is composed of a spectral-spatial module, a domain attention module, and a multiple loss module. The spectral-spatial module is designed to extract discriminative and domain-invariant spectral-spatial features via a spectral branch and a multiscale spatial branch. The domain attention module is proposed to enhance the contributions of useful spectral-spatial features. The multiple loss module contains a few-shot loss, a correlation alignment (CORAL) loss, and a maximum mean discrepancy (MMD) loss, which jointly address the domain adaptation issue. In particular, the few-shot loss minimizes the cross entropy between the predicted probabilities and the ground truth, the CORAL loss minimizes the difference between the covariances of the features learned from the source and target domains, and the MMD loss reduces the distribution discrepancy between the source domain and the target domain.
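To make the role of the multiple loss module concrete, the sketch below shows one common way to combine a cross-entropy few-shot term with CORAL (covariance alignment) and single-kernel MMD (distribution alignment) terms computed on source- and target-domain features. The tensor shapes and the weights lam_coral and lam_mmd are illustrative assumptions, not the exact formulation or hyperparameters of SSDA.

```python
# Hedged sketch of a combined objective with few-shot (cross-entropy), CORAL, and MMD terms.
# Shapes and the weights lam_coral / lam_mmd are illustrative assumptions, not SSDA's values.
import torch
import torch.nn.functional as F

def coral_loss(f_src, f_tgt):
    # Align second-order statistics (feature covariances) of the two domains.
    d = f_src.size(1)
    cov = lambda f: (f - f.mean(0)).t() @ (f - f.mean(0)) / (f.size(0) - 1)
    return ((cov(f_src) - cov(f_tgt)) ** 2).sum() / (4 * d * d)

def mmd_loss(f_src, f_tgt, sigma: float = 1.0):
    # Maximum mean discrepancy with a single Gaussian kernel (multi-kernel variants are common).
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(f_src, f_src).mean() + k(f_tgt, f_tgt).mean() - 2 * k(f_src, f_tgt).mean()

def total_loss(logits, labels, f_src, f_tgt, lam_coral=1.0, lam_mmd=1.0):
    few_shot = F.cross_entropy(logits, labels)          # supervised few-shot term
    return few_shot + lam_coral * coral_loss(f_src, f_tgt) + lam_mmd * mmd_loss(f_src, f_tgt)
```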
Extensive experimental results demonstrate that, on the Salinas, University of Pavia (UP), Indian Pines (IP), and Huoshaoyun datasets, the proposed method achieves higher classification accuracy than state-of-the-art methods with few-shot samples. This article makes three main contributions.
To extract discriminative and domain-invariant spectral-spatial features, a spectral-spatial module is developed with a spectral branch and a multiscale spatial branch. The residual deformable 3D block of the spectral branch learns rich spectral features and adapts well to spatial geometric transformations. The multiscale spatial branch learns multiscale spatial features, which provide strongly complementary and related spatial information.
Different spectral-spatial features make different contributions to classification. A domain attention module is therefore designed with a spectral attention block and a spatial attention block to further enhance useful spectral-spatial features and suppress useless ones (an illustrative sketch of the multiscale spatial branch and such attention blocks is given after the contributions).
By combining the spectral-spatial module, the domain attention module, and the multiple loss module, a novel spectral-spatial domain attention (SSDA) method is proposed. It transfers classification knowledge from the source domain to the target domain to improve HSI few-shot classification accuracy.
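As referenced in the second contribution, the sketch below illustrates the two generic ideas involved: a multiscale spatial branch built from parallel convolutions with different receptive fields, and a spectral (channel) plus spatial attention pair that re-weights features. It is a hedged, simplified illustration of these concepts, not the actual SSDA blocks; in particular, the residual deformable 3D block of the spectral branch is omitted.

```python
# Hedged sketch of a multiscale spatial branch and a spectral/spatial attention pair.
# A generic illustration of the concepts only, not the exact SSDA modules.
import torch
import torch.nn as nn

class MultiscaleSpatialBranch(nn.Module):
    """Parallel convolutions at several scales give complementary spatial context."""
    def __init__(self, in_ch: int, out_ch: int = 64):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x):                               # x: (batch, in_ch, H, W)
        return torch.cat([p(x) for p in self.paths], dim=1)

class SpectralSpatialAttention(nn.Module):
    """Channel (spectral) attention followed by spatial attention, CBAM-style re-weighting."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):                               # x: (batch, C, H, W)
        w_c = self.channel_mlp(x.mean(dim=(2, 3)))      # emphasize informative spectral channels
        x = x * w_c[:, :, None, None]
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_conv(pooled)            # emphasize informative spatial positions

feats = MultiscaleSpatialBranch(in_ch=30)(torch.randn(2, 30, 9, 9))   # (2, 64, 9, 9)
out = SpectralSpatialAttention(channels=64)(feats)                    # attention-weighted features
```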