1. Introduction
Hyperspectral imaging sensors usually collect reflectance information of objects in hundreds of contiguous bands over a certain range of the electromagnetic spectrum [1], so a hyperspectral image (HSI) simultaneously provides a set of two-dimensional images (or bands) [2]. These rich bands play an important role in discriminating different objects by their spectral signatures [3], making HSIs widely applicable in classification [4] and anomaly detection [5]. However, limited by existing imaging sensor technologies, HSIs are characterized by low spatial resolution, which limits the performance of these applications. As a signal post-processing technique, HSI super-resolution (SR) can improve the spatial resolution of the HSI without modifying the imaging hardware, and it is an active topic in computer vision.
HSI SR has been studied for a long time in remote sensing and many methods have been proposed to improve the spatial resolution of the HSIs. According to the number of input images, these methods can be roughly classified into two types: the fusion-based HSI SR methods and the single-based ones.
The fusion-based approaches assume that multiple fully registered observations of the same scene are accessible. Dong et al. [6] proposed a nonnegative structured sparse representation approach, which jointly estimates the dictionary and sparse code of the high-resolution (HR) HSI based on the input low-resolution (LR) HSI and an HR panchromatic (PAN) image. By utilizing the similarities between pixels within a super-pixel, Fang et al. [7] proposed a super-pixel based sparse representation method. Dian et al. [8] presented a non-local sparse tensor factorization HSI SR method, which exploits the spatial-spectral structures in the HSI more fully. Because unfolding the three-dimensional HSI and multispectral image (MSI) into matrices is prone to losing structural information, Kanatsoulis et al. [9] addressed the problem from a tensor perspective and established a coupled tensor factorization framework. Zhang et al. [10] discovered that the clustering manifold structure of the latent HSI can be well preserved in the spatial domain of the input conventional image, and proposed to super-resolve the HSI based on this discovery. Considering that sparsity-based methods tackle each pixel independently, Han et al. [11] utilized a self-similarity prior as the constraint for sparse representation of the HSI and MSI. Because the auxiliary HR image provides an additional input, methods of this kind have more information available and often achieve better spatial enhancement. However, in practice, an auxiliary fully registered HR image of the same scene is often hard or impossible to obtain, which restricts the practicability of these methods.
The single-based HSI SR methods can be further divided into sub-pixel mapping methods and direct single-based methods. Sub-pixel mapping methods aim at estimating the fractional abundance of pure ground objects within a mixed pixel, and obtain the probabilities that sub-pixels belong to different land cover classes [12]. Irmak [13] first utilized the virtual dimensionality to determine the number of endmembers in the scene and computed the abundance maps. The corresponding HR abundance maps are then obtained by a maximum a posteriori method and used to reconstruct the HR HSI. Methods of this kind tackle the SR problem through endmember extraction and fractional abundance estimation. However, the noise generated by the unmixing operation is inevitable during the mapping operation, which negatively affects the SR process. Additionally, sub-pixel mapping methods are usually applied to specific applications, such as classification and target detection, to overcome the limitation in spatial resolution. Arun et al. [14] explored a convolutional neural network to jointly optimize the unmixing and mapping operations in a supervised manner. Xu et al. [15] presented a joint spectral-spatial mapping model to obtain the probabilities that sub-pixels belong to different land cover classes, and obtained a resolution-enhanced image.
The direct single-based HSI SR methods aim at reconstructing an HR HSI from only one LR HSI. Inspired by the achievements of deep learning based RGB image SR methods, Yuan et al. [16] and He et al. [17] proposed to super-resolve each band individually by transfer learning. Considering the three-dimensional data characteristics of HSIs, Mei et al. [18] proposed a 3D-CNN to exploit both the spatial context and the spectral correlation. However, as there are usually hundreds of bands in the HSI, super-resolving the bands individually is computationally expensive. Moreover, as the image quality decreases, super-resolving each single band becomes much more difficult, which induces severe performance degradation.
In this paper, we propose an HSI SR method that combines an information distillation network (IDN) [19] with an intra-fusion operation to deeply exploit the spatial-spectral information. First, bands are selected at a certain interval and then super-resolved by taking advantage of their spatial information and the spatial mapping learned by the IDN model. The IDN was trained on 91 images from Yang et al. [20] and 200 images from the Berkeley Segmentation dataset [21]. Three data augmentation schemes were applied to make full use of the training data, yielding 2619 images. The IDN was designed to learn the spatial mapping between the Y channels of low-resolution RGB images and those of the corresponding high-resolution RGB images. Each single band in an HSI can likewise be treated as a Y channel at its own wavelength, so it is reasonable to transfer the IDN to HSI SR. Second, spectral correlation is used to obtain a complete but coarse HR HSI. In addition, the information conveyed by the unselected bands is further exploited by intra-fusing the input LR HSI with the coarse HR HSI, resulting in a finer HR HSI. In this way, both the spatial and spectral information of the input LR HSI is fully utilized, which contributes to robust performance.
The main contributions of this work are summarized as follows:
1. We adopt a scalable SR strategy for super-resolving the HSI. First, an IDN is used to super-resolve the interval-selected bands individually, exploiting their spatial information and the mapping learned by the IDN. Second, the unselected bands are quickly interpolated via the cubic Hermite spline method, which uses the high spectral correlation in the HSI to obtain a coarse HR HSI. Both spatial and spectral information is utilized. Meanwhile, in contrast to super-resolving the HSI band by band, this scalable SR strategy achieves a tradeoff between high quality and high efficiency.
2. Most existing single-based methods super-resolve bands in the HSI individually, which neglects the spectral information. In this way, their performance is highly correlated to the images’ spatial quality. The proposed method deeply utilizes both the spatial and spectral information in the HSI, and its performance is more robust.
3. To deeply use the information the input LR HSI conveys, intra-fusion is made between the spectral-interpolated coarse HR HSI and the input LR HSI. Different from most fusion methods, which require another co-registered image as the input, the other input of the proposed intra-fusion is an intermediate outcome of the SR processing, which fully exploits the information the LR HSI conveys in a subtle way.
The remainder of the paper is organized as follows: Section 2 describes the proposed method. We present the experimental results and data analysis in Section 3. Conclusions are drawn in Section 4.
2. Proposed Method
In this section, we present the four main parts of the proposed method: framework overview, bands’ selection and super-resolution by IDN, unselected bands’ super-resolution, and intra-fusion. Detailed descriptions of these four parts are presented in the following subsections.
2.1. Framework Overview
Figure 1 illustrates the workflow of the proposed HSI SR method. The input is a single LR HSI. Bands are first selected at a certain interval and super-resolved via the IDN using their spatial information. Then, the unselected bands are super-resolved using spectral correlation to obtain a coarse but complete HR HSI. Finally, this coarse HR HSI is intra-fused with the input LR HSI to further exploit the information conveyed by the unselected bands.
To facilitate discussion, we clarify the notation of some frequently used terms. The input LR HSI and the desired HR HSI are represented as $L \in \mathbb{R}^{w \times h \times n}$ and $H \in \mathbb{R}^{sw \times sh \times n}$, where w and h denote the width and height of the input LR HSI, and s and n denote the scaling factor and the number of bands, respectively. The coarse HR HSI and the intra-fused fine HR HSI are denoted by separate symbols.
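The interval-based selection in the workflow can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `split_bands`, the zero-based indexing, the example interval, and the choice to always keep the last band are all assumptions made here.

```python
def split_bands(num_bands, interval):
    """Split band indices into interval-selected and unselected lists."""
    if num_bands < 1 or interval < 1:
        raise ValueError("num_bands and interval must be at least 1")
    selected = list(range(0, num_bands, interval))
    # Assumption: the last band is always kept, so every unselected band lies
    # between two selected bands and can be interpolated, not extrapolated.
    if selected[-1] != num_bands - 1:
        selected.append(num_bands - 1)
    chosen = set(selected)
    unselected = [b for b in range(num_bands) if b not in chosen]
    return selected, unselected


selected, unselected = split_bands(num_bands=10, interval=3)
print(selected)    # [0, 3, 6, 9]
print(unselected)  # [1, 2, 4, 5, 7, 8]
```

Only the bands in `selected` would be passed to the spatial super-resolution network; the bands in `unselected` are the ones later recovered from spectral correlation.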
2.2. Bands’ Selection and Super-Resolution by IDN
This part first analyzes the correlation between the bands in the HSI to justify the interval setting, and then super-resolves the selected bands via the IDN.
2.2.1. Correlation Analysis
For HSIs, neighboring bands are highly correlated in the spectral domain [22]. Figure 2 plots the correlation coefficient curves of the Pavia University scene HSI, a remote-sensing HSI widely used in classification [23]. The value on the x-axis is the index of the current band. The legend denotes the interval between the current band and the neighboring band it is compared with. The value on the y-axis is the correlation coefficient between these two bands. According to the three curves in Figure 2, although the correlation decreases as the neighboring band gets further from the current one, most bands are highly correlated with their neighboring bands, with correlation coefficients larger than 0.95. Moreover, as the image quality decreases and most high-frequency details in the images are damaged, the correlation coefficient between neighboring bands is expected to be even higher (related experiments are described in Section 3). Hence, to achieve high efficiency without notable performance loss, it is reasonable to first select some bands at a certain interval and super-resolve the selected bands using their spatial information. Compared with super-resolving every band in the HSI, this operation greatly reduces the computational complexity with negligible performance degradation.
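A small sketch of how such correlation curves could be computed is given below. It assumes each band has been flattened into a list of pixel values; the function names `pearson` and `band_correlations` and the toy data are illustrative and do not come from the paper or from the Pavia University scene.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def band_correlations(bands, interval):
    """Correlation between band i and band i + interval, for every valid i.

    `bands` is a list of flattened bands (one list of pixel values per band).
    """
    return [pearson(bands[i], bands[i + interval])
            for i in range(len(bands) - interval)]


# Toy HSI with 4 bands of 5 pixels each (made-up values, not real data).
toy = [
    [0.10, 0.20, 0.30, 0.40, 0.50],
    [0.12, 0.21, 0.33, 0.41, 0.52],
    [0.15, 0.22, 0.36, 0.40, 0.55],
    [0.20, 0.22, 0.40, 0.38, 0.60],
]
for gap in (1, 2, 3):
    print(gap, [round(c, 3) for c in band_correlations(toy, gap)])
```

Each value of `gap` corresponds to one legend entry of a plot like Figure 2, and the list index corresponds to the current band on the x-axis.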
2.2.2. Super-Resolution via IDN
Figure 3 shows the general architecture of the IDN, which consists of three parts: a feature extraction block, multiple stacked information distillation blocks (DBlocks), and a reconstruction block.
The feature extraction block applies two convolution layers to extract feature maps from the original LR images. The extracted feature maps act as the input of the information distillation blocks. Several DBlocks are chained together, and each block contains an enhancement unit followed by a compression unit. Finally, a transposed convolution layer without an activation function acts as the reconstruction block to obtain the HR image. Compared with other networks, the IDN extracts feature maps directly from the LR images and utilizes multiple DBlocks to generate the residual representations in HR space. The enhancement unit in each DBlock gathers as much information as possible, and the subsequent compression unit distills the more useful information, which achieves competitive results with a concise structure.
It should be noted that the network uses two loss functions during the training process. The first is the widely used mean square error (MSE), which averages over the N training samples the squared error between the network reconstruction of the ith input image patch and the label of the ith input image patch. The mean absolute error (MAE), which averages the absolute error in the same way, is also applied to train the IDN model. Specifically, the IDN is first trained with the MAE loss and then fine-tuned with the MSE loss.
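The two losses can be illustrated with a short sketch. It assumes each patch is flattened into a list, that the per-patch error is a squared L2 norm (MSE) or an L1 norm (MAE), and that the result is averaged over the N patches only; the function names and toy values are hypothetical, and the exact normalization used in training is not stated in the text.

```python
def mse_loss(predictions, labels):
    """Average over patches of the squared L2 distance to the label."""
    total = 0.0
    for pred, label in zip(predictions, labels):
        total += sum((p - t) ** 2 for p, t in zip(pred, label))
    return total / len(predictions)


def mae_loss(predictions, labels):
    """Average over patches of the L1 distance to the label."""
    total = 0.0
    for pred, label in zip(predictions, labels):
        total += sum(abs(p - t) for p, t in zip(pred, label))
    return total / len(predictions)


# Two made-up flattened patches (N = 2) and their labels.
preds = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]]
labels = [[0.0, 1.0, 1.0], [0.0, 0.2, 0.4]]
print(mse_loss(preds, labels))
print(mae_loss(preds, labels))
```

Because the squared error penalizes large deviations more heavily than the absolute error, the two losses weight outlier pixels differently, which is one reason a two-stage schedule using both can be useful.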
With the trained IDN, the spatial resolution of the selected bands in the LR HSI can be enhanced quickly and efficiently. The result is a set of IDN-super-resolved HR bands in which the unselected bands are temporarily missing.
2.3. Spectral-Interpolation for the Unselected Bands
Given the super-resolved interval-selected bands, the proposed method applies a cubic Hermite spline to obtain a continuous and smooth complete HR HSI. Cubic Hermite splines are typically used to interpolate numeric data specified at given argument values, yielding a smooth continuous function. Compared with linear interpolation, a cubic Hermite spline can better follow the behavior of the dependent variable and capture the nature of the relationship between the variables [24].
Suppose that the following nodes of the cubic Hermite spline function and their values are given:

$$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n),$$

in which n denotes the number of given nodes minus 1, since the index starts from 0. The cubic Hermite spline function describes the mapping between x and y and is defined piecewise on the intervals between adjacent nodes.
With the IDN-super-resolved HR bands acting as the given nodes, the cubic Hermite spline function can be applied to obtain one coarse but complete HR HSI.
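The spectral interpolation step can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions: the tangents at the nodes are estimated by finite differences (the text does not specify how they are chosen), at least two nodes are given, and the function names and reflectance values are hypothetical.

```python
def hermite_tangents(xs, ys):
    """Finite-difference tangents: one-sided at the ends, averaged inside."""
    n = len(xs)
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1)]
    tangents = [slopes[0]]
    for i in range(1, n - 1):
        tangents.append(0.5 * (slopes[i - 1] + slopes[i]))
    tangents.append(slopes[-1])
    return tangents


def hermite_interpolate(xs, ys, x):
    """Evaluate the piecewise cubic Hermite spline through (xs, ys) at x."""
    tangents = hermite_tangents(xs, ys)
    # Locate the interval [xs[k], xs[k + 1]] that contains x.
    k = 0
    while k < len(xs) - 2 and x > xs[k + 1]:
        k += 1
    h = xs[k + 1] - xs[k]
    t = (x - xs[k]) / h
    # Cubic Hermite basis functions on the unit interval.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ys[k] + h10 * h * tangents[k]
            + h01 * ys[k + 1] + h11 * h * tangents[k + 1])


# Selected band indices act as nodes; values are made-up reflectances of one
# HR pixel in the super-resolved selected bands.
selected_idx = [0, 3, 6, 9]
selected_val = [0.20, 0.35, 0.30, 0.50]
spectrum = [hermite_interpolate(selected_idx, selected_val, b)
            for b in range(10)]
print([round(v, 3) for v in spectrum])
```

Applying the same evaluation to every pixel position fills in all unselected bands, while the selected bands are reproduced exactly because the spline passes through its nodes.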
2.4. Intra-Fusion
According to the above descriptions, the HR HSI is reconstructed by utilizing the spatial information of the selected bands, the mapping learned by the IDN model, and the spectral correlation. The information conveyed by the unselected bands has so far been neglected. If this information is further utilized, an additional spatial enhancement can be expected. We therefore propose to obtain an HR HSI by intra-fusing the spectral-interpolated coarse HR HSI with the input LR HSI L via the non-negative matrix factorization (NMF) method. Different from most existing fusion methods, one input of the proposed fusion is an intermediate output of the super-resolution process itself, which is why it is named intra-fusion. This makes the intra-fusion more flexible and more practical.
Because of the coarse spatial resolution, pixels in the HSI are usually mixtures of different materials. The spectral curves of the HSI are therefore usually mixtures of the reflectance of different pure materials, which are called endmembers. For mathematical simplicity and physical effectiveness, each spectral curve can be modeled by a linear mixture model. Let the desired HR HSI H be rearranged into a matrix by concatenating its pixels, so that each column holds the spectrum of one pixel. The same operation is applied to the spectral-interpolated HR HSI and the input LR HSI L to obtain their matrix forms. In this way, the matrix form of the desired HR HSI can be written as $WC + N$, where W is the endmember matrix, C is the abundance matrix, and N denotes the residual. D is the number of endmembers, and each column of W represents the spectrum of an endmember.
Here, D is obtained by the Neyman–Pearson lemma [25]. Given an HSI with n bands, the eigenvalues of its correlation matrix and covariance matrix are computed and sorted. A binary hypothesis test is then set up for each pair of eigenvalues: under H0 the difference between the correlation eigenvalue and the corresponding covariance eigenvalue is zero, and under H1 it is positive. According to the preset false alarm rate, a threshold can be computed to maximize the detection rate. When the difference is greater than this threshold, a signal signature is considered to exist. The number of endmembers is obtained by counting the number of eigenvalues that satisfy this condition. In the proposed method, the false alarm rate is set to a fixed value. Meanwhile, all elements of the matrices W and C are nonnegative.
The matrix forms of the spectral-interpolated HR HSI and the input LR HSI can each be modeled as a transformed version of the desired HR HSI plus a residual matrix, in which Q is the spectral transform matrix and a second matrix is the spatial spread transform matrix. When substituting Equation (6) into Equations (7) and (8), the two observations can be reformulated in terms of W and C, which introduces the spatially degraded abundance matrix and the spectrally degraded endmember matrix.
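The substitution can be sketched in the following form. Only $W$, $C$, $N$ and $Q$ are named in the text; the symbols $\mathbf{H}_m$, $\tilde{\mathbf{H}}_m$, $\mathbf{L}_m$ (matrix forms of the desired, spectral-interpolated and LR HSIs), $S$ (spatial spread transform), $N_q$, $N_s$ (residuals), $W_q$ and $C_s$ are assumed here for illustration. The side on which each transform multiplies follows from storing one pixel spectrum per column.

```latex
\begin{aligned}
\mathbf{H}_m &= W C + N, \\
\tilde{\mathbf{H}}_m &= Q\,\mathbf{H}_m + N_q \approx (Q W)\,C = W_q\,C, \\
\mathbf{L}_m &= \mathbf{H}_m\,S + N_s \approx W\,(C S) = W\,C_s .
\end{aligned}
```

Under these assumed symbols, $W_q = QW$ plays the role of the spectrally degraded endmember matrix and $C_s = CS$ the spatially degraded abundance matrix, so both observations share the same underlying factors $W$ and $C$.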
During the HSI unmixing procedure, the HSI reconstructed from the endmember and abundance matrices is expected to be close to the input image. Accordingly, one cost function is formulated for the input LR HSI and one for the spectral-interpolated HR HSI, each measuring the error between the observation and its reconstruction. NMF was developed to decompose a nonnegative matrix into a product of nonnegative matrices [26], and it is applied here to minimize these cost functions.
During the solving process, both C and its companion factor matrix are first initialized as nonnegative matrices. If the current reconstruction is smaller than the target matrix, the factor being updated is multiplied by a variable k whose value is larger than 1 to bring the reconstruction closer to the target. On the other hand, if the reconstruction is larger than the target, k should be larger than 0 but smaller than 1, which ensures that all elements of the updated factor remain nonnegative. The variable k is therefore defined as an element-wise ratio, in which the superscript T denotes matrix transposition and “./” denotes element-wise division. According to Equation (7), whether k is larger or smaller than 1 changes with the relation between the reconstruction and the target. Hence, the factor is updated by multiplying it element-wise by k.
The same update strategy is applied to C, W, and the remaining factor matrix.
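The multiplicative behavior of k can be illustrated with the standard Lee and Seung update rule for a single factorization under the squared Frobenius cost. This is a stand-in sketch, not the coupled update of the paper: it factors one nonnegative matrix only, and the names `nmf`, `matmul`, `transpose`, `frobenius_error`, the small constant `eps`, and the toy data are all assumptions.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, c):
    """Squared Frobenius norm of v - w c."""
    wc = matmul(w, c)
    return sum((v[i][j] - wc[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor nonnegative v (bands x pixels) as w c with w, c nonnegative."""
    rng = random.Random(seed)
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(len(v))]
    c = [[rng.random() + eps for _ in range(len(v[0]))] for _ in range(rank)]
    for _ in range(iterations):
        # k for w is (v c^T) ./ (w c c^T), applied element-wise.
        ct = transpose(c)
        num, den = matmul(v, ct), matmul(matmul(w, c), ct)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(len(w))]
        # k for c is (w^T v) ./ (w^T w c), applied element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), c)
        c = [[c[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(len(c[0]))] for i in range(rank)]
    return w, c


# Toy data: 4 bands, 5 pixels, built from 2 made-up endmembers.
w_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
c_true = [[1.0, 0.5, 0.0, 0.3, 0.7], [0.0, 0.5, 1.0, 0.7, 0.3]]
v = matmul(w_true, c_true)
w, c = nmf(v, rank=2)
print(frobenius_error(v, w, c))
```

Because every factor is only ever multiplied by a nonnegative ratio, nonnegativity of the initial matrices is preserved throughout, which is the property the text attributes to k.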
When W and C are obtained, the HR HSI intra-fused from the spectral-interpolated coarse HR HSI and L is also obtained, which contains both the spatial information conveyed by L and the spatial mapping learned by the IDN. Moreover, this fusion operation does not require any auxiliary HR image as input, which makes it more practical. The complete algorithm is summarized in Algorithm 1.
Algorithm 1: Pseudocode of the proposal.