1. Introduction
In earth observation, hyperspectral technology is one of the leading trends in the remote sensing community and plays a significant role. The hundreds of spectral bands provided by hyperspectral images (HSIs) give them an inherent advantage in quantitative applications such as mineral mapping, environmental monitoring and classification [1].
Earlier work mainly focused on discriminative spectral feature extraction, such as the spectral angle mapper (SAM) [2] and spectral information divergence (SID) [3]. These methods exploit extensive prior knowledge and typically require no training samples; they do not, however, fully use the spatial features of HSIs. Later, supervised spectral classifiers such as support vector machines (SVM) [4], random forests [5] and neural networks [6] were proposed to achieve more accurate classification and have since gained widespread acceptance. Nonetheless, the curse of dimensionality remains a bottleneck for supervised classifiers. To address this concern, a large number of spectral–spatial joint feature extraction methods have been developed for HSI classification [7,8,9,10,11]. In addition, numerous band selection methods [12,13] and hand-crafted feature extraction methods combined with a learning strategy have been proposed [8,14]. However, these shallow features still have limitations for more precise hyperspectral classification in complex scenes.
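As a concrete illustration of the classical spectral measures mentioned above, the following sketch computes SAM and a symmetric form of SID between two pixel spectra. This is a minimal NumPy example under the usual definitions, not the original reference implementations; the test spectra are made up for illustration.

```python
import numpy as np

def spectral_angle(x, y):
    """SAM: angle (radians) between two spectra; smaller means more similar."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def spectral_information_divergence(x, y, eps=1e-12):
    """SID: symmetric KL divergence between spectra normalized to sum to one."""
    p = x / (x.sum() + eps)
    q = y / (y.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))
                        + q * np.log((q + eps) / (p + eps))))

ref = np.array([0.2, 0.5, 0.8, 0.4])   # hypothetical reference spectrum
test = 2.0 * ref                        # same shape, different brightness
angle = spectral_angle(ref, test)       # ~0: SAM ignores brightness scaling
```

Both measures are invariant to per-pixel brightness scaling, which is why they compare spectral *shape* rather than magnitude.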
Thanks to advances in computer vision, deep learning technology has made significant strides in remote sensing applications in recent years. The potent learning capabilities of CNNs have enabled end-to-end supervised deep learning models to achieve highly competitive hyperspectral classification results [15,16,17,18] when large amounts of labeled data are available. The various deep learning classification models for HSIs are summarized in [19]. However, since hyperspectral classification is a few-shot problem in most cases, it is challenging to collect the large number of manually labeled training samples required by supervised classification models. In addition, large models applied to scenarios with limited data tend to overfit, reducing the robustness of the model. Currently, developing an unsupervised or semi-supervised method suitable for small-sample scenarios is a significant challenge in hyperspectral classification.
Various deep learning-based approaches with different learning paradigms have been proposed to address this issue [20,21], including transfer learning [22,23], few-shot learning [24,25] and self-supervised learning [26,27]. The purpose of transfer learning is to initialize the network weights, thereby reducing training time and improving accuracy. Few-shot learning also employs the transfer learning strategy but focuses more on mining meaningful representation information from labeled data [28]. However, this approach places stringent requirements on the labeling quality and diversity of the data. Unlike few-shot learning methods, self-supervised learning can learn deep representations by completely reconstructing the input. However, this type of data representation amounts to a compression of all information rather than an effective screening of meaningful information, and information with a small proportion is often disregarded [29].
Summarizing previous research, we believe that solving the few-shot hyperspectral classification problem requires three conditions: (1) strong feature extraction capabilities; (2) effective representation information obtained from the data; and (3) a model that is robust and less dependent on data. Therefore, how to combine self-supervised learning with traditional physical models, such as hyperspectral unmixing (HU), to achieve an effective expression of representational information has become an important area of study.
Let us first review HSI unmixing methods. The HU technique decomposes a mixed pixel spectrum into its purest constituent materials (endmembers) and their proportions (abundances). Common HU models fall into two categories: the linear mixing (LM) model and the nonlinear mixing (NLM) model [30]. The LM model presumes that every pixel of an HSI is a linear combination of the pure endmembers. These methods can be further subdivided into pure pixel-based methods [31,32] and non-pure pixel-based methods [33,34], which leverage the data structure by making geometrical or sparseness assumptions. However, the LM model does not account for multiple-scattering effects or the interactions between objects, which makes it unsuitable for solving the HU problem in complex scenes [35]. In recent years, many neural network (NN) algorithms have been proposed to handle NLM challenges [36,37,38], and deep learning methods have been proven to improve the accuracy of HU. Recently, the convolutional autoencoder (CAE) has emerged as a new trend in HU applications [39,40,41]. A denoising and sparseness autoencoder was introduced in [42,43] to estimate the abundances of endmembers, and stacked autoencoders have been further employed for HU [44,45]. Most recently, 3D-CNN autoencoders [35,46] were employed to handle HU problems in a supervised setting. Existing research has acknowledged the significance of spatial–spectral joint characteristics for classification [47]. New advances indicate that HU theory can guide a network to learn more effective and regularized representational features, thereby enhancing the robustness of the model. The endmember abundances obtained through HU can provide useful spatial–spectral features for semi-supervised classification [48,49,50]. This also sheds new light on solving the few-shot classification problem.
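The LM model above can be written as x = Ea, where the columns of E are the endmember spectra and the abundance vector a satisfies the non-negativity and sum-to-one constraints. A toy NumPy sketch of this forward model and its noise-free inversion (the endmember matrix here is randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

B, P = 6, 3                                 # spectral bands, number of endmembers
E = rng.uniform(0.0, 1.0, size=(B, P))      # columns: hypothetical endmember spectra

a = np.array([0.6, 0.3, 0.1])               # abundances: a >= 0 and sum(a) == 1
x = E @ a                                   # linear mixing: pixel = E a

# In the noise-free case the abundances are recovered exactly by least squares;
# with noise, constrained solvers (e.g., FCLS) are used instead.
a_hat, *_ = np.linalg.lstsq(E, x, rcond=None)
```

Real pipelines add a noise term and enforce the constraints during inversion, which is exactly what the constrained solvers and autoencoder approaches discussed above address.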
In this research, we introduce a novel end-to-end convolutional autoencoder that we call CACAE. Furthermore, an attention mechanism is utilized to increase the accuracy of abundance estimation by extracting the diagnostic spectral characteristics associated with a given endmember more precisely. In addition, a semi-supervised classification pipeline based on CACAE that uses endmember abundance maps as classification features is introduced. Experiments are carried out on real hyperspectral datasets, and the outcomes are compared using a variety of supervised and semi-supervised models.
The remainder of this paper is structured as follows: Section 2 describes the proposed method; Section 3 describes the experimental dataset; Section 4 presents the experiments and analysis; and the final section provides the conclusion.
5. Conclusions
In this contribution, we present a new semi-supervised pipeline for few-shot hyperspectral classification, in which endmember abundance maps obtained by HU are treated as latent features for classification. The pipeline comprises two main stages. First, a 3D-CNN-based self-supervised learning model enforces the non-negativity and sum-to-one constraints through the NSC layer and extracts endmember abundance information for given endmembers. Second, the abundance map is treated as a diagnostic spatial–spectral feature, enabling classification with only a small number of samples.
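The exact formulation of the NSC layer is not reproduced here; one common way to impose the abundance non-negativity (ANC) and sum-to-one (ASC) constraints on a network's latent code is clipping followed by renormalization (a softmax is another option). A minimal NumPy sketch under that assumption, with a made-up latent vector:

```python
import numpy as np

def nsc(z, eps=1e-12):
    """Map an unconstrained latent vector to a valid abundance vector:
    non-negative entries (ANC) that sum to one (ASC)."""
    z = np.maximum(z, 0.0)            # non-negativity constraint
    return z / (z.sum() + eps)        # sum-to-one constraint

z = np.array([1.5, -0.2, 0.5])        # hypothetical raw encoder output
a = nsc(z)                            # a valid abundance vector
```

In a trainable network this step would be expressed with differentiable framework operations (e.g., ReLU plus normalization), but the constraint logic is the same.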
The first experiment verifies the effectiveness of the proposed model for abundance map extraction. In this experiment, the performance of CACAE and other 3D-CNN methods (including PCAE, CCAE and PACAE) for HU is assessed (Section 4.3). The results suggest that the proposed model accurately estimates the abundance map of a given endmember, achieving the lowest RMSE and ASAD. Additionally, the two strategies of utilizing the attention mechanism and taking an image cube as input both improve the estimation accuracy. The second experiment tests the performance of the proposed method and other algorithms (including SID, FCLS, SAM, Basic-CNN and PACAE) for HSI classification (Section 4.4.1). The experimental results demonstrate that the proposed model not only outperformed traditional unsupervised and semi-supervised classification methods but also exceeded CNN-based supervised classification models, even when samples were relatively sufficient. This also illustrates the effectiveness of abundance maps for HSI classification. The last experiment assesses the robustness of the proposed model with small sample sizes. The traditional machine learning algorithm SVM and Basic-CNN are employed as comparison models (Section 4.4.2). The proposed model shows advantages over the supervised classification models at all sampling rates and exhibits stable, robust behavior, such that its performance is not restricted by the number of samples. These three experiments fully demonstrate the potential of the proposed method for few-shot classification.
The advantage of the proposed method is that it combines a 3D-CNN with HU theory and uses the estimated endmember abundances as diagnostic input features for classification, which effectively improves the classification accuracy and robustness of the model. Because models with a large number of parameters require abundant training data and tend to overfit with small sample sizes, the proposed model employs a lightweight architecture. The CNN model used for comparison adopts a similar backbone structure to make the comparison fairer and to prevent differences in network structure from affecting the reliability of the experiments.
The primary limitation of this method is that it still depends on the precision of the given endmembers. Although methods such as N-FINDR and VCA can estimate endmembers, there is still room for improvement in accuracy. In addition, the current loss design makes it difficult to distinguish targets with very small spectral differences. Future research will focus on utilizing the autoencoder network for unsupervised endmember abundance estimation and on developing a new loss function for more accurate classification. The hyperspectral datasets used in this paper and example code are available in the GitHub repository at https://github.com/lichunyu123/3DCAE_Hyper (accessed on 1 December 2022).