1. Introduction
With the advancement of remotely-sensed hyperspectral imaging instruments, Hyperspectral Images (HSIs) have gained widespread attention throughout the globe. An HSI contains rich information because of its many spectral bands (often more than a hundred) [1] and provides comprehensive spectral information about the physical properties of materials. This ubiquitous technique is applied in agricultural monitoring [2,3], forestry [4], ecosystem monitoring [5], mineral identification [6,7], environmental pollution monitoring [8] and urban growth analysis [9,10]. For HSI classification, good results usually require many labeled samples. The image pixels can be regarded as high-dimensional points that lie on or near low-dimensional manifolds. Recently, numerous dimensionality reduction algorithms have been proposed to preserve the local manifold structure of the data, such as Locally Linear Embedding (LLE) [11], Isomap [12] and Laplacian Eigenmap [13]. The main limitation is the lack of sufficient labeled training data, since identifying and labeling samples is a daunting task. The main drawback of machine learning approaches in this setting is overfitting [14,15] caused by the limited amount of training data when dealing with small-sample problems in high-dimensional, nonlinear cases, which is referred to as the "Hughes phenomenon" [16,17].
To resolve the issue above, semi-supervised learning was proposed to utilize both labeled data and the information conveyed by the marginal distribution of the unlabeled samples to boost algorithmic performance [18,19,20]. Graph embedding maps each node to a low-dimensional feature vector while trying to maintain the connections between vertices [20]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), a method that takes advantage of both labeled and unlabeled samples [21]. Nevertheless, the performance of semi-supervised learning relies profoundly on the graph construction process. The k-nearest neighbor graph [12] and Locally Linear Embedding (LLE) neighbors [13,22], as well as other traditional methods, depend mainly on pairwise Euclidean distances. However, these methods are not robust to noise because they use Euclidean distances to compute the pairwise weights. Yan et al. proposed a sparse representation-based graph for Semi-Supervised Learning (SSL) [23]. Zhu et al. presented a series of novel semi-supervised learning approaches based on graph representations [18,24,25]. Belkin and Niyogi [26] proposed a regression function that fits the labels of the labeled data while remaining smooth on the whole data set. Jebara et al. provided a b-matching graph for SSL, which ensures that each node has the same number of edges in the balanced graph [22]. Zhou et al. [18] conducted semi-supervised learning with local and global consistency.
Low-rank representation was proposed to construct an undirected graph (LR-graph) [27,28,29], which jointly obtains the graph of all data under the low-rank constraint [27,28,29]. A graph regards a finite set of samples as vertices, where each sample is associated with a vertex and each edge weight represents the similarity of the two vertices it connects [20,30,31]. In [32], Zhang et al. introduced the Low-Rank Matrix Recovery (LRMR) method for HSI restoration, which can remove various kinds of noise. Veganzones et al. presented an approach that partitions the image into patches and solves the fusion problem of each patch individually by low-rank representation [33]. Based on Low-Rank Representation (LRR) and a Learned Dictionary (LD), a hyperspectral anomaly detector has been put forward, which assumes that the hyperspectral image can be decomposed into a low-rank matrix and a sparse matrix, representing the background and the anomalies, respectively [34]. To address the failure to preserve spatial information, a Tensor Sparse and Low-rank Graph for Discriminant Analysis (TSLGDA) was proposed [35]. In [36], He et al. suggested a spatial-spectral mixed-noise removal method for HSIs. The Sparse and Low-rank Graph-based Discriminant Analysis (SLGDA) incorporates sparsity and low-rankness to preserve both local and global information simultaneously [30]. Qi et al. presented a multi-task joint sparse and low-rank representation model for high-resolution satellite image interpretation [37].
The computation of the LRR graph is a serious concern: as the number of pixels in an HSI grows, the computational cost of LRR increases rapidly. To resolve this issue, we exploit the low-rank structure via block theory.
Figure 1 shows the framework of the proposed regularized block low-rank discriminant analysis feature extraction for HSI classification. As shown in Figure 1, we first preprocess the hyperspectral image with the Image Fusion and Recursive Filtering (IFRF) feature, which removes noise and redundant information simultaneously [17]. Our main goal, however, is to improve the graph adjacency between pixels used in the regularization term of regularized discriminant analysis. Consequently, inspired by the LRR algorithm, we aim to exploit the low-rank structure. LRR uses all samples as the dictionary, so each sample is represented as a linear combination of the dictionary atoms. The low-rank hypothesis is a global constraint, which encourages data points in the same subspace to be clustered into the same class [29]. When the samples are distributed over independent subspaces, the low-rank coefficients reveal their membership: within-cluster affinities are dense, while between-cluster affinities are all zeros.
Afterward, we divide the processed image into blocks (subsets) of pixels and apply low-rank representation to each block. We then combine the subsets' feature representations into a complete feature graph. Further, the k-nearest neighbor rule is applied to the combined low-rank graph to satisfy the symmetric-matrix requirement of the regularized graph in discriminant analysis while, at the same time, preserving the local information of the image [38]. Semi-supervised discriminant analysis is then performed for feature extraction, taking advantage of both the labeled samples and the distribution of all samples. Finally, supervised and semi-supervised classifiers are applied. We perform comprehensive experiments on several real multi-class HSIs. We summarize the main contributions of the paper in the following paragraphs.
Inspired by LRR and the semi-supervised discriminant analysis algorithm, we propose a robust feature extraction method, regularized block low-rank discriminant analysis. The block-wise LRR alleviates the growing computational complexity of LRR while still capturing the global structure of the data.
After image fusion and recursive filtering, we implement our proposed regularized block low-rank discriminant analysis feature extraction method. The kNN step addresses two issues simultaneously: SDA's symmetry requirement and the capture of the local information of the HSI. Consequently, the BLRDA method maximally preserves the local geometry of the data.
Extensive experiments on several multi-class HSIs demonstrate that the proposed BLRDA method significantly enhances the performance of hyperspectral image classification. Even with simple supervised and semi-supervised classifiers, the extracted features achieve high classification accuracy with few labeled samples. Moreover, the BLRDA feature extraction method is much more robust than similar methods.
We organize the remainder of the paper as follows. Section 2 gives a preliminary introduction. Section 3 describes HSI classification based on regularized block low-rank discriminant analysis. Section 4 reports experiments on several real-world hyperspectral images that validate the proposed method. Section 5 provides the discussion, and Section 6 presents the conclusions.
2. Preliminary
2.1. Regularization
Regularization is applied to avoid overfitting. It adds rules (restrictions) to the objective function being optimized, which shrinks the solution space and reduces the chance of finding a wrong solution. The constraint can be interpreted as prior knowledge (adding a regularization term is equivalent to imposing a prior distribution on the parameters).
The regularization constraint also has a guiding role: when the error function is optimized, it steers the descent direction so that the final solution conforms to the prior knowledge (for example, an $\ell_1$-norm prior expresses the belief that the underlying problem is relatively simple and tends to produce sparse parameters).
Regularization has roughly two functions. From the model modification point of view, it balances pairs of competing terms in the learning process, such as bias and variance, fitting ability and generalization ability, loss function and generalization ability, and empirical risk and structural risk. From the model solution point of view, regularization provides the possibility of a unique solution: least squares fitting may have infinitely many solutions, but adding an $\ell_1$ or $\ell_2$ regularization term can lead to a unique solution.
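As a small illustration of the uniqueness argument above, the following Python sketch (with illustrative sizes and an illustrative $\lambda$) shows an underdetermined least squares problem with infinitely many exact solutions, and how an $\ell_2$ (ridge) term makes the regularized problem strictly convex with a single minimizer.

```python
import numpy as np

# Underdetermined least squares: 5 equations, 10 unknowns, so A w = b has
# infinitely many exact solutions; the ridge-regularized problem does not.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 10))
b = rng.normal(size=5)
lam = 0.1                                           # illustrative regularization weight

w_min_norm = np.linalg.pinv(A) @ b                  # one exact solution (minimum norm)
null_dir = np.linalg.svd(A)[2][-1]                  # a direction in the null space of A
w_other = w_min_norm + null_dir                     # a different, equally exact solution
print(np.allclose(A @ w_min_norm, b), np.allclose(A @ w_other, b))   # True True

# Ridge regularization: minimize ||A w - b||^2 + lam * ||w||^2.
# A^T A + lam * I is positive definite, so the minimizer is unique.
w_ridge = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)
```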
In practice, overfitting [14,15] may occur due to limited training samples [16]. To address this issue, semi-supervised learning was proposed to utilize both labeled and unlabeled samples to enhance algorithmic performance. The prior assumption of consistency is the key to SSL; it indicates that nearby points are likely to have the same label or similar embeddings [18]. Cai et al. proposed Semi-supervised Discriminant Analysis (SDA), which takes advantage of both labeled and unlabeled samples [21].
The performance of semi-supervised learning relies profoundly on the graph construction process. Hence, inspired by low-rank representation, we aim to improve the semi-supervised discriminant analysis feature extraction method via a novel regularized low-rank-based graph term. In the following, we give a brief outline of two important techniques: semi-supervised discriminant analysis and the low-rank representation algorithm.
2.2. Overview of Semi-supervised Discriminant Analysis
Semi-supervised Discriminant Analysis (SDA) is derived from Linear Discriminant Analysis (LDA). LDA is a supervised method that minimizes the within-class covariance while maximizing the between-class covariance. The objective function of LDA is as follows:
$$\mathbf{a}_{opt} = \arg\max_{\mathbf{a}} \frac{\mathbf{a}^{T} S_b \mathbf{a}}{\mathbf{a}^{T} S_w \mathbf{a}},$$
where $S_b$ is defined as the between-class scatter matrix and $S_w$ is the within-class scatter matrix.
Overfitting may occur when there are not enough training samples. A possible remedy is to learn from both labeled and unlabeled data by imposing a regularizer [21]. Applying the notion of the regularized graph, a smoothness penalty is incorporated into the objective function of linear discriminant analysis, which alleviates the overfitting problem. In the SDA algorithm, the labeled samples are used to maximize the separability of the different classes, while the graph adjacency of all samples is used to estimate the underlying geometric structure; this graph is called the regularized graph.
Given a set of samples $X = \{x_1, x_2, \ldots, x_{m+l}\}$, where $x_i \in \mathbb{R}^{D}$, the first $m$ samples are labeled with class labels $\{y_i\}_{i=1}^{m}$ and the remaining $l$ samples are unlabeled; there are $c$ classes. The projection vectors $\mathbf{a}$ are obtained in the SDA method [21] by incorporating the prior assumption of consistency through a regularization term:
$$\mathbf{a}_{opt} = \arg\max_{\mathbf{a}} \frac{\mathbf{a}^{T} S_b \mathbf{a}}{\mathbf{a}^{T} S_t \mathbf{a} + \alpha J(\mathbf{a})}.$$
Here, $S_t$ is defined as the total class scatter matrix, and $\alpha$ is the parameter that balances the complexity and the empirical loss of the model. The regularization term controls the learning complexity of the hypothesis family.
Given a set of examples, $W$ is the graph adjacency matrix that models the relationships of nearby data points. Typically, the $k$-nearest neighbor graph is used to compute these neighborhood relationships, where an edge is placed between neighboring points and assigned a weight.
The model can easily incorporate this prior knowledge through the regularization term $J(\mathbf{a})$, as follows:
$$J(\mathbf{a}) = \sum_{i,j}\left(\mathbf{a}^{T} x_i - \mathbf{a}^{T} x_j\right)^{2} W_{ij}.$$
The column (or row, since $W$ is symmetric) sums of $W$ form the diagonal matrix $D$, i.e., $D_{ii} = \sum_{j} W_{ij}$. The Laplacian matrix [23] is $L = D - W$, so that $J(\mathbf{a}) = 2\,\mathbf{a}^{T} X L X^{T} \mathbf{a}$. Absorbing the constant factor into $\alpha$, the SDA objective function with regularization term $J(\mathbf{a})$ becomes:
$$\mathbf{a}_{opt} = \arg\max_{\mathbf{a}} \frac{\mathbf{a}^{T} S_b \mathbf{a}}{\mathbf{a}^{T}\left(S_t + \alpha X L X^{T}\right)\mathbf{a}}.$$
We obtain the projective vectors $\mathbf{a}$ from the corresponding generalized eigenvalue problem
$$S_b \mathbf{a} = \lambda \left(S_t + \alpha X L X^{T}\right)\mathbf{a}, \quad\quad (5)$$
keeping the eigenvectors associated with the $d$ largest eigenvalues, where $d$ is the rank of the labeled graph's weight matrix.
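To make the role of the regularization term concrete, the following small numpy sketch (with a random toy graph; all names are illustrative) checks numerically that the quadratic form $\mathbf{a}^{T} X L X^{T} \mathbf{a}$ equals the pairwise smoothness penalty $\frac{1}{2}\sum_{i,j} W_{ij}\left(\mathbf{a}^{T}x_i - \mathbf{a}^{T}x_j\right)^{2}$.

```python
import numpy as np

# Toy check that the Laplacian quadratic form measures graph smoothness.
rng = np.random.default_rng(0)
n, dim = 6, 4
X = rng.normal(size=(dim, n))                      # columns are samples
W = rng.random((n, n)); W = (W + W.T) / 2          # symmetric affinity matrix
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))
L = D - W                                          # graph Laplacian
a = rng.normal(size=dim)                           # a candidate projection vector

quad = a @ X @ L @ X.T @ a                         # a^T X L X^T a
pairwise = 0.5 * sum(W[i, j] * (a @ X[:, i] - a @ X[:, j]) ** 2
                     for i in range(n) for j in range(n))
assert np.isclose(quad, pairwise)                  # both forms of the penalty agree
```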
2.3. Low-Rank Representation
Low-rank representation was first proposed to construct an undirected graph (LR-graph) [23], which jointly obtains the graph of all data under the low-rank constraint. It can efficiently capture the global structure of the data [39].
Suppose $X = [x_1, x_2, \ldots, x_M] \in \mathbb{R}^{D \times M}$ is the sample set, where each column is one sample. We represent the feature matrix $X$ on a given dictionary $A$ [29] as:
$$X = AZ,$$
where $Z = [z_1, z_2, \ldots, z_M]$ is the low-rank coefficient matrix and $z_i$ is the representation coefficient of $x_i$. Each sample is thus expressed as a linear combination of the dictionary atoms, and the coefficient matrix is called the representation; the so-called low-rank representation seeks a low-rank matrix of these coefficients.
For example, in HSIs, samples of the same category exhibit similar spectra, while samples from different categories do not. Due to this intra-class similarity and inter-class dissimilarity, each sample can be represented well by other samples of the same class, but not by samples of another class. Therefore, when represented on the dictionary $A$, samples from the $i$-th class $X_i$ will produce significant coefficients on the component $Z_{ii}$ and small coefficients on the other parts $Z_{ij}$ ($j \neq i$). With appropriate permutations of the columns of $X$, $Z$ will exhibit an apparent block-diagonal structure, as shown in Figure 2.
However, the actual labels of the samples in $X$ are unknown, so it is intractable to directly reveal the block-diagonal structure of $Z$. Nevertheless, the underlying block-diagonal structure makes $Z$ low-rank [40]. Therefore, we implicitly exploit the low-rank property of $Z$ instead of the block-diagonal structure. Furthermore, each sample can be well represented owing to the intra-class similarity. The following low-rank framework [29] searches for the lowest-rank solution:
$$\min_{Z} \ \mathrm{rank}(Z) \quad \text{s.t.} \quad X = AZ.$$
Due to the discrete nature of the rank function, the above optimization problem is hard to solve. Following matrix completion methods (e.g., [41]), it is relaxed to a convex problem:
$$\min_{Z} \ \|Z\|_{*} \quad \text{s.t.} \quad X = AZ,$$
where $\|Z\|_{*}$ is the nuclear norm (i.e., trace norm) [42], the sum of the singular values of the matrix. A more practical formulation accounts for the noise or corruption present in real-world data, where the residual matrix $E$ is often sparse:
$$\min_{Z,E} \ \|Z\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad X = AZ + E,$$
where $\|\cdot\|_{\ell}$ can be the $\ell_1$-norm or the $\ell_{2,1}$-norm. Here, the $\ell_{2,1}$-norm is chosen as the error term, i.e., $\|E\|_{2,1} = \sum_{j}\sqrt{\sum_{i} E_{ij}^{2}}$. The parameter $\lambda$ balances the low-rank term and the error term. The inexact Augmented Lagrange Multipliers (ALM) method [43,44] is applied to obtain the optimal solution $(Z^{*}, E^{*})$.
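For readers who want to see how the inexact ALM solves the problem above, here is a minimal Python sketch. It assumes the common self-expressive setting $A = X$ with an $\ell_{2,1}$ error term, and the parameter values ($\lambda$, $\rho$, tolerances) are illustrative defaults rather than the ones used in the paper.

```python
import numpy as np

def solve_lrr_inexact_alm(X, lam=0.1, rho=1.1, mu=1e-2, mu_max=1e6,
                          tol=1e-6, max_iter=200):
    """Minimize ||Z||_* + lam*||E||_{2,1}  s.t.  X = X Z + E,
    using an auxiliary variable J (with the constraint Z = J)."""
    d, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    inv_term = np.linalg.inv(np.eye(n) + XtX)        # reused in every Z-update

    for _ in range(max_iter):
        # J-update: singular value thresholding of (Z + Y2/mu)
        U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
        J = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Z-update: closed-form least-squares step
        Z = inv_term @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        # E-update: column-wise shrinkage for the l2,1-norm
        Q = X - X @ Z + Y1 / mu
        norms = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(norms - lam / mu, 0.0) / (norms + 1e-12))
        # multiplier and penalty updates, then convergence check
        R1 = X - X @ Z - E
        R2 = Z - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```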
2.4. Image Fusion and Recursive Filtering Feature
Initially, we extract the spatial information of the HSI using Image Fusion and Recursive Filtering (IFRF), one of the simplest approaches to image fusion [17]. The IFRF feature of an HSI retains its main information while removing noise and redundant information simultaneously. It plays a vital role in identifying objects and can be used to discriminate between different classes in the classification problem [17]. Hence, we first preprocess the HSI with IFRF to eliminate noise and redundant information.
Let us suppose $H = [h_1, h_2, \ldots, h_D]$ represents the original hyperspectral image, which has $D$ bands, with the $i$-th band of the whole image containing $M$ pixels and represented by $h_i$. We spectrally partition the whole hyperspectral image (its bands) into multiple subsets, each composed of $K$ continuous bands. Let $T$ denote the number of subsets; then $T = \lfloor D/K \rfloor$, where $\lfloor \cdot \rfloor$ indicates the floor operation, which computes the largest integer no larger than its argument. The $i$-th ($1 \le i \le T$) subset is given as follows:
$$H_i = \left[h_{(i-1)K+1}, h_{(i-1)K+2}, \ldots, h_{iK}\right].$$
Afterward, the adjacent bands of each subset are fused by an image fusion method (here, averaging). For example, the $i$-th image fusion feature $F_i$, i.e., the $i$-th fused band, is computed as follows:
$$F_i = \frac{1}{K_i} \sum_{n=1}^{K_i} H_i^{(n)},$$
where $H_i^{(n)}$ refers to the $n$-th band in the $i$-th subset of the hyperspectral image and $K_i$ is the number of bands in the $i$-th subset. After image fusion, the noisy pixels and redundant information of each subset are removed. Then, the $i$-th feature $G_i$ is obtained by recursive filtering of the fused band $F_i$:
$$G_i = \mathrm{RF}\left(F_i, \delta_s, \delta_r\right),$$
where $\mathrm{RF}(\cdot)$ denotes the recursive filtering transform, $\delta_s$ is the filter's spatial standard deviation and $\delta_r$ is its range standard deviation [45]. After image fusion and recursive filtering, we obtain the feature image $G = [G_1, G_2, \ldots, G_T]$. Let $X = \{x_1, x_2, \ldots, x_M\}$ be the preprocessed feature vectors, where each $x_j \in \mathbb{R}^{T}$ represents a pixel with $T$ bands (dimensions) of the hyperspectral image.
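A compact Python sketch of this preprocessing step is given below. It partitions the bands into subsets of $K$ adjacent bands and averages each subset; the recursive filtering step is stood in for by a Gaussian filter, since the actual IFRF uses the edge-preserving recursive filter of [45]. Function and parameter names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter   # stand-in for the recursive filter of [45]

def ifrf_features(H, K=20, sigma_spatial=2.0):
    """H: hyperspectral cube of shape (rows, cols, D).
    Split the D bands into floor(D/K) subsets of K adjacent bands, average each
    subset (image fusion), then smooth each fused band (filtering stand-in).
    Returns one T-dimensional feature vector per pixel."""
    rows, cols, D = H.shape
    T = D // K                                   # number of subsets / output features
    features = np.empty((rows, cols, T))
    for i in range(T):
        subset = H[:, :, i * K:(i + 1) * K]      # K adjacent bands
        fused = subset.mean(axis=2)              # averaging-based image fusion
        features[:, :, i] = gaussian_filter(fused, sigma_spatial)
    return features.reshape(-1, T)
```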
3. Methodology
In the preceding section, we reviewed classic regularized discriminant analysis, semi-supervised discriminant analysis and LRR, which can efficiently capture the global structure of the data. Moreover, the labeled samples as well as the unlabeled samples can be exploited simultaneously by uncovering the underlying block-diagonal structure of $Z$. Building on these ideas, we present the robust regularized block low-rank discriminant analysis feature extraction method, together with an optimization procedure based on the inexact ALM algorithm. The framework for HSI classification using the proposed graph is described in the remainder of this section.
3.1. Regularized Block Low-Rank Discriminant Analysis
Let $X = [x_1, x_2, \ldots, x_M] \in \mathbb{R}^{T \times M}$ be the HSI feature vectors, where $T$ is the channel number (data dimension) and $M$ is the number of pixels (samples or objects). Determining an appropriate subspace for classification is an important task. Computing the low-rank representation feature graph is onerous, however, since the running time grows rapidly with the number of samples. Hence, we explore the low-rank structure via block theory, which divides the whole image into blocks for low-rank representation.
We choose a block size of $b$ pixels for each block of the partition. Let $\{\Omega_1, \Omega_2, \ldots, \Omega_m\}$ represent the index sets of the $m$ blocks of the image, with $b$ pixels in each $\Omega_i$. To formulate the matrices $X_i$, according to $\Omega_i$ we group the vectors belonging to the same block together, so that $X = [X_1, X_2, \ldots, X_m]$ and $X_i \in \mathbb{R}^{T \times b}$. The LRR optimization problem for each block is then converted into the following form:
$$\min_{Z_i, E_i} \ \|Z_i\|_{*} + \lambda \|E_i\|_{2,1} \quad \text{s.t.} \quad X_i = X_i Z_i + E_i, \quad\quad (13)$$
where the $\ell_{2,1}$-norm is chosen as the error term, i.e., $\|E_i\|_{2,1} = \sum_{j}\sqrt{\sum_{k} (E_i)_{kj}^{2}}$. The inexact ALM method [43,44] is applied to obtain the optimal solution $(Z_i^{*}, E_i^{*})$.
To optimize problem (13), we introduce an auxiliary variable $J_i$ for $Z_i$ and convert it into the following equivalent problem [29]:
$$\min_{Z_i, E_i, J_i} \ \|J_i\|_{*} + \lambda \|E_i\|_{2,1} \quad \text{s.t.} \quad X_i = X_i Z_i + E_i, \ \ Z_i = J_i. \quad\quad (14)$$
With this auxiliary variable, we minimize the following augmented Lagrange function via the inexact ALM method [43]:
$$\mathcal{L} = \|J_i\|_{*} + \lambda \|E_i\|_{2,1} + \mathrm{tr}\!\left(Y_1^{T}\left(X_i - X_i Z_i - E_i\right)\right) + \mathrm{tr}\!\left(Y_2^{T}\left(Z_i - J_i\right)\right) + \frac{\mu}{2}\left(\left\|X_i - X_i Z_i - E_i\right\|_F^{2} + \left\|Z_i - J_i\right\|_F^{2}\right), \quad\quad (15)$$
where $Y_1$, $Y_2$ and $\mu$ are the Lagrangian multipliers and the penalty parameter, respectively. To minimize problem (15), we update the variables $J_i$, $Z_i$, $E_i$, $Y_1$ and $Y_2$ alternately while keeping the others fixed. Each sub-problem is convex in its own variable, so it admits a unique solution. The inexact ALM is applied to update the variables $J_i$, $Z_i$ and $E_i$ iteratively. The optimization process is outlined in Algorithm 1.
After the block low-rank representation, we obtain the block coefficient matrices $Z_1^{*}, Z_2^{*}, \ldots, Z_m^{*}$. Then, we combine these feature representation subsets into a whole feature graph $Z$.
The combined low-rank graph is composed of the multiple block graphs $Z_i^{*}$ and is asymmetric. Traditionally, the symmetrization process averages the matrix with its transpose to satisfy the symmetry requirement of SDA; however, the graph adjacency matrix here is not a square matrix. Our previous work shows that the k-nearest neighbor rule is an effective symmetrization process that can significantly improve the performance of SDA [38]. Hence, the kNN algorithm is applied to the combined low-rank graph as a whole, which addresses two issues: it performs the symmetrization and, owing to the kNN's properties, it maximally preserves the local information of the image.
The samples $x_i$ and $x_j$ are considered neighbors if $x_i$ is among the $k$ nearest neighbors of $x_j$ or $x_j$ is among the $k$ nearest neighbors of $x_i$. Here, we use heat kernel weighting [13] to assign the weights $W_{ij}$ as follows:
$$W_{ij} = \begin{cases} e^{-\frac{\left\|x_i - x_j\right\|^{2}}{t}}, & x_j \in N_k(x_i) \ \text{or} \ x_i \in N_k(x_j), \\ 0, & \text{otherwise}, \end{cases} \quad\quad (16)$$
where $N_k(x_i)$ in (16) denotes the $k$ neighbors of $x_i$.
Therefore, the BLRDA method can achieve high-quality feature representations for classification.
Algorithm 1 Solving problem (15) by the inexact ALM.
Input: Mapped data graph $X_i$, regularization parameter $\lambda$ for local affinity. Initialize $Z_i = J_i = 0$, $E_i = 0$, $Y_1 = Y_2 = 0$, the penalty parameter $\mu > 0$, its upper bound $\mu_{\max}$, the scaling factor $\rho > 1$ and the tolerance $\varepsilon$.
1: while not converged do
2: Fix the other variables and update $J_i$;
3: Fix the other variables and update $Z_i$;
4: Fix the other variables and update $E_i$;
5: Update the multipliers as follows: $Y_1 = Y_1 + \mu\left(X_i - X_i Z_i - E_i\right)$, $Y_2 = Y_2 + \mu\left(Z_i - J_i\right)$;
6: Update the parameter $\mu$ by $\mu = \min(\rho\mu, \mu_{\max})$;
7: Examine the convergence conditions $\left\|X_i - X_i Z_i - E_i\right\|_\infty < \varepsilon$ and $\left\|Z_i - J_i\right\|_\infty < \varepsilon$;
8: end while
Output: The Laplacian regularized low-rank representation graph $Z_i^{*}$.
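To illustrate how the block graphs can be assembled and then handled by the kNN step, the sketch below places each block's LRR coefficients on the diagonal of a global affinity matrix, selects each pixel's strongest connections, and weights them with the heat kernel of Formula (16). It reuses the solve_lrr_inexact_alm sketch from Section 2.3; the block placement and the exact way the kNN step handles the combined graph are our assumptions, and all parameter values are illustrative.

```python
import numpy as np

def blrda_graph(X, block_size=500, k=10, t=1.0, lam=0.1):
    """X: T x M matrix of IFRF feature vectors (one column per pixel).
    Build the block low-rank affinity and the kNN / heat-kernel graph (16)."""
    T, M = X.shape
    A = np.zeros((M, M))
    for start in range(0, M, block_size):                  # regular pixel-position blocks
        idx = np.arange(start, min(start + block_size, M))
        Z_i, _ = solve_lrr_inexact_alm(X[:, idx], lam=lam)  # per-block LRR coefficients
        A[np.ix_(idx, idx)] = np.abs(Z_i)                   # combined block-wise graph
    np.fill_diagonal(A, 0)

    # kNN step on the combined graph: keep each pixel's k strongest affinities,
    # weight them with the heat kernel, and symmetrize (either-side rule).
    W = np.zeros((M, M))
    for i in range(M):
        nn = np.argsort(A[i])[::-1][:k]
        dist2 = np.sum((X[:, nn] - X[:, [i]]) ** 2, axis=0)
        W[i, nn] = np.exp(-dist2 / t)
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W                          # Laplacian for the SDA regularizer
    return W, L
```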
3.2. BLRDA Feature Extraction for HSI Classification
The semi-supervised discriminant analysis method can effectively alleviate the overfitting problem when only a few labeled samples are available. Given a set of labeled samples $\{(x_i, y_i)\}_{i=1}^{m}$ with $c$ classes and unlabeled samples $\{x_i\}_{i=m+1}^{M}$, let $m_k$ denote the number of labeled samples of the $k$-th class. The algorithmic procedure of HSI classification using the regularized block low-rank discriminant analysis feature extraction method, for both supervised and semi-supervised classification, is stated below:
Step 1 Construct the adjacency graph: Construct the block low-rank and kNN graph $W$ of Formula (16) for the regularization term. In addition, calculate the graph Laplacian $L = D - W$.
Step 2 Construct the labeled graph: for the labeled graph, construct the matrix $W^{(l)}$ as:
$$W^{(l)}_{ij} = \begin{cases} 1/m_k, & \text{if } x_i \text{ and } x_j \text{ both belong to the } k\text{-th class}, \\ 0, & \text{otherwise}. \end{cases}$$
Define $\tilde{W} = \mathrm{diag}\left(W^{(l)}, 0\right)$ and $\tilde{I} = \mathrm{diag}(I, 0)$ over all samples, where $I$ is an $m \times m$ identity matrix.
Step 3 Eigen-problem: Calculate the eigenvectors of the generalized eigenvector problem:
$$X \tilde{W} X^{T} \mathbf{a} = \lambda X \left(\tilde{I} + \alpha L\right) X^{T} \mathbf{a}, \quad\quad (18)$$
where $L$ is the graph Laplacian from Step 1. The matrix $X \tilde{W} X^{T}$ is of rank $d$, so we will have $d$ eigenvectors, denoted as $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_d$.
Step 4 Regularized discriminant analysis embedding: Let the transformation matrix be $A = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_d]$, which embeds the samples into a $d$-dimensional subspace. With the between-class scatter $S_b = X \tilde{W} X^{T}$ and the total class scatter matrix $S_t = X \tilde{I} X^{T}$, the Eigen-problem in Formula (18) is the same as the Eigen-problem in Formula (5), and we obtain the projective matrix $A$ from
$$X \tilde{W} X^{T} \mathbf{a} = \lambda X \left(\tilde{I} + \alpha (D - W)\right) X^{T} \mathbf{a},$$
where $W$ is the graph after kNN, i.e., the graph of the relationships of nearby data points, $L = D - W$ is the Laplacian matrix [23] and $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$.
Then, the data can be embedded into the $d$-dimensional subspace by the SDA embedding matrix $A$, i.e., $x \mapsto A^{T} x$.
Step 5 HSI Classification: Finally, perform classification with the supervised and semi-supervised nearest neighbor and SVM classifiers. Both are simple and widely-used classifiers.
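A compact sketch of Steps 2-4 is shown below (Step 5 would simply apply an off-the-shelf classifier to the embedded pixels). The labeled-graph construction and the choice of $\tilde{I}$ follow the reconstruction given above, and $\alpha$, $d$ and the small ridge added for numerical stability are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def sda_embed(X, labels, L, alpha=0.1, d=10):
    """X: T x M feature matrix; labels: length-M array with -1 for unlabeled pixels;
    L: graph Laplacian from Step 1. Returns the d-dimensional embedding and A."""
    T, M = X.shape
    labeled = np.where(labels >= 0)[0]
    # Step 2: labeled graph, W_l[i, j] = 1/m_k if pixels i and j share class k
    Wl = np.zeros((M, M))
    for cls in np.unique(labels[labeled]):
        idx = np.where(labels == cls)[0]
        Wl[np.ix_(idx, idx)] = 1.0 / len(idx)
    I_tilde = np.zeros((M, M))
    I_tilde[labeled, labeled] = 1.0
    # Step 3: generalized eigenproblem  X W_l X^T a = lam * X (I_tilde + alpha L) X^T a
    Sb = X @ Wl @ X.T
    St = X @ (I_tilde + alpha * L) @ X.T + 1e-6 * np.eye(T)   # ridge for stability
    vals, vecs = eigh(Sb, St)
    A = vecs[:, np.argsort(vals)[::-1][:d]]                   # top-d projective vectors
    # Step 4: embed every pixel into the d-dimensional subspace
    return A.T @ X, A
```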
5. Discussion
Classification plays a pivotal role in understanding HSIs. In the present work, we propose a novel approach for HSI feature extraction, Regularized Block Low-rank Discriminant Analysis (BLRDA). Our goal is to enhance the classification accuracy of HSIs through effective feature extraction. Experimental results on the three images show that BLRDA is highly competitive with the comparative feature extraction methods for HSI classification.
From the supervised and semi-supervised experiments reported in Table 2, Table 3 and Table 4, we observe that BLRDA is an effective feature extraction method that achieves the highest classification accuracy among the compared methods. Its performance is remarkable even with simple supervised and semi-supervised classifiers (nearest neighbor and SVM) and randomly chosen parameters. In some cases, traditional graph construction methods (such as the kNN-graph and the LLE-graph) may perform well for particular classes, but they are not as stable as our proposed algorithm. The LR-graph captures the global data structure better, obtaining the representation of all samples under a global low-rank constraint, whereas the common k-Nearest Neighbor (kNN) and Locally Linear Embedding (LLE) graphs use fixed global parameters and Euclidean distances to determine the edge weights, making them sensitive to noise. The SR-graph, in turn, lacks global constraints, which greatly degrades performance when the data are grossly corrupted. The LR-graph addresses these drawbacks: it jointly obtains the graph of the whole data set and is demonstrably robust to noise. Furthermore, the k-nearest neighbor rule is used to handle the combined low-rank graph as a whole, which serves two purposes: preserving the local information of the image and satisfying the algorithmic requirements of the subsequent dimension reduction step. Consequently, it significantly improves hyperspectral image classification performance, indicating that BLRDA provides a superior HSI feature graph for both supervised and semi-supervised classification.
We analyzed robustness with respect to the number of labeled samples and the amount of noise. Figure 9 shows that BLRDA is usually superior to the other methods with both the NN and SVM classifiers and is robust to variations in the label percentage. Figure 9 also shows that the proposed method achieves higher classification accuracy even at very low label rates, where some of the compared algorithms are noticeably less robust. Given the scarcity of labeled samples in practice, this makes our method well suited to real-world HSI classification. As shown in Table 6 and Figure 11, Figure 12 and Figure 13, our results are robust to both noise and labeled sample size. As the noise increases, the performance of BLRDA drops very slowly, from 0.9765 to 0.9481 in the case of the Indian Pines image, which has little effect in practical situations. By contrast, the overall accuracy of the comparative methods BSDA, BKDA, BLEDA and IFRF drops quickly as the noise increases, and the same trend is observed for the other two HSIs. This benefit comes from the robustness of the low-rank representation to noise, whereas k-Nearest Neighbor (kNN) and Locally Linear Embedding (LLE) neighbors use fixed global parameters with Euclidean distances and are therefore sensitive to noise; the LR-graph better captures the global property of the data. Therefore, BLRDA is demonstrated to be robust to the labeled sample size and to noise, which indicates that it is highly competitive in practical situations.
We evaluated the effect of the block size and the reduced dimension on BLRDA, as shown in Table 7 and Table 8. The tables show that the classification results improve only slightly as the block size and reduced dimension grow. Therefore, small block sizes can be used in practice for the sake of efficiency, with a loss of classification accuracy so small that it can be ignored. Moreover, Table 8 indicates that the method is robust to the choice of reduced dimension for medium-sized blocks.
The proposed BLRDA method performs well and is competitive for both supervised and semi-supervised classification of HSIs. In future work, we plan to explore further real-world applications and to make BLRDA more efficient. As suggested by one reviewer, we could also divide the image into blocks with a suitable community detection algorithm [50,51,52,53] instead of partitioning it by regular pixel positions, since the way the blocks are formed may affect the subsequent low-rank representation.