First, the TRP algorithm is used to reduce the dimensionality of hyperspectral remote sensing images containing all bands. Then, the MD classifier is used to classify the low-dimensional images to obtain distance matrices. Finally, the distance matrices are combined in the ensemble classification framework to obtain the final classification results.
3.1. TRP Algorithm
Given a hyperspectral remote sensing image A = {aj, j = 1, …, J}, where j is the pixel index, J is the number of pixels, aj = (ajd, d = 1, …, D) is the spectral measurement vector of pixel j, d is the band index, D is the number of bands, and ajd is the spectral measurement of band d of pixel j. Taking the spectral vectors aj (j = 1, …, J) as row vectors, a hyperspectral remote sensing image can be expressed as a J × D matrix. For convenience of description, and since no confusion arises, A is also used to denote this matrix, that is, A = [a1, …, aj, …, aJ]T, where T is the transpose operation.
The TRP algorithm projects hyperspectral remote sensing images into a low-dimensional subspace in which any vector pair satisfies distance relative invariance with high probability. The TRP algorithm is therefore suitable for reducing the dimensionality of hyperspectral remote sensing images. The TRP algorithm is as follows [31].
Theorem 1. The D-dimensional feature space can be randomly projected into a KTRP-dimensional space, where KTRP is a positive integer satisfying
KTRP ≥ K0 = ⌈(4 + 2β)ln J/(ε²/2 − ε³/3)⌉,(1)
where K0 is the intrinsic dimensionality, ⌈ ⌉ is the round-up (ceiling) operator, and ε ∈ [0.7, 1.5] and β > 0 are the projection parameters that control the range of distance preservation and the success rate of the projection, respectively. Let RTRP = [rdk]D × KTRP be a tighter random projection matrix, where the rdk are independent random variables subject to the standard normal distribution, that is, rdk ~ N(0, 1). For a given hyperspectral remote sensing image A, the low-dimensional image B obtained by projecting to KTRP dimensions with RTRP is
B = (1/√KTRP) A RTRP,(2)
where B = [b1, …, bj, …, bJ]T = [bjk]J × KTRP, and bj is the low-dimensional vector of pixel j. For the spectral vectors aj and aj′ in the hyperspectral remote sensing image A, let the corresponding low-dimensional vectors in the low-dimensional image B be bj and bj′, respectively. Then bj and bj′ satisfy distance relative invariance if
(1 − ε)‖aj − aj′‖² ≤ ‖bj − bj′‖² ≤ (1 + ε)‖aj − aj′‖²,(3)
where ‖ ‖ represents the 2-norm. Any two low-dimensional vectors in the KTRP-dimensional space obtained by the TRP algorithm satisfy distance relative invariance with probability at least PTRP, where PTRP = 1 − J^(−β). It is worth noting that distance relative invariance does not mean that the squared distances between the vectors before and after projection are equal, but that their ratio remains within the interval (1 − ε, 1 + ε); a pair bj and bj′ whose ratio falls outside this interval is considered to retain little of the similarity of the original vector pair. Under the constraint of the same probability PTRP, the intrinsic dimensionality of the TRP algorithm is lower than that of the RP algorithm. Therefore, the TRP algorithm can reduce the number of bands of hyperspectral remote sensing images to a greater extent, while still ensuring that vector pairs satisfy distance relative invariance with probability PTRP.
As distance reflects the structure of the dataset, the TRP algorithm indicates that low-dimensional images in a low-dimensional space can maintain the structure of hyperspectral remote sensing images with a high probability.
The detailed process of the TRP algorithm is summarized in Algorithm 1.
Algorithm 1. The detailed process of the TRP algorithm.
Input: test hyperspectral remote sensing image A.
Output: low-dimensional image B.
Step 1. Calculate the intrinsic dimensionality ← Equation (1), and set the dimensionality KTRP.
Step 2. Generate rdk according to the standard normal distribution, that is, rdk ~ N(0, 1).
Step 3. Form RTRP.
Step 4. Calculate the low-dimensional image B ← Equation (2).
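The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the 1/√KTRP scaling follows the usual Gaussian random projection convention, and the image is a synthetic stand-in; the sizes, seeds, and variable names are not taken from the paper.

```python
import numpy as np

# Sketch of Algorithm 1 (TRP dimensionality reduction).
rng = np.random.default_rng(0)
J, D = 500, 400                      # pixels, bands (toy sizes)
eps, beta = 0.7, 1.0                 # projection parameters

# Step 1: intrinsic dimensionality from the Johnson-Lindenstrauss-type bound
K0 = int(np.ceil((4 + 2 * beta) * np.log(J) / (eps**2 / 2 - eps**3 / 3)))
K_trp = K0                           # choose the smallest admissible K_TRP

A = rng.random((J, D))               # J x D "image" matrix
R_trp = rng.standard_normal((D, K_trp))   # Steps 2-3: r_dk ~ N(0, 1)
B = A @ R_trp / np.sqrt(K_trp)            # Step 4: low-dimensional image

# Empirically check distance relative invariance on random pixel pairs
pairs = rng.integers(0, J, size=(1000, 2))
d_orig = np.sum((A[pairs[:, 0]] - A[pairs[:, 1]]) ** 2, axis=1)
d_proj = np.sum((B[pairs[:, 0]] - B[pairs[:, 1]]) ** 2, axis=1)
kept = (d_proj >= (1 - eps) * d_orig) & (d_proj <= (1 + eps) * d_orig)
print(f"K_TRP = {K_trp}, fraction of pairs preserved = {kept.mean():.3f}")
```

With these toy sizes the bound gives KTRP well below D, and nearly all sampled pixel pairs keep their squared distances within the (1 − ε, 1 + ε) band, as the theorem predicts.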
3.2. Random Projection Matrix Selection Strategy
The random projection matrix is randomly generated without considering the class information of hyperspectral remote sensing images, and different random projection matrices produce different low-dimensional images. Therefore, the choice of random projection matrix directly affects the subsequent classification accuracy of hyperspectral remote sensing images. To make low-dimensional images have stronger class separability and achieve accurate classification of hyperspectral remote sensing images, this subsection uses a random projection matrix selection strategy based on the separable information of a single class to obtain low-dimensional images with the best separability for that class. It is worth noting that "benefit to subsequent classification" here refers to larger differences between classes and smaller differences within classes. This may introduce an overfitting effect and degrade generalization, but it is more meaningful for the classification task.
First, the sample matrix of all classes is defined as F = [F1; …; Fl; …; FL], where l is the class index, Fl is the sample matrix of the lth class, and L is the number of classes, which is known a priori. Fl can be specifically expressed as
Fl = [flhd]H × D,(4)
where flhd is the sample measurement of the hth sample of the lth class in the dth (d = 1, 2, …, D) band, and H is the number of samples of the lth class. It is worth noting that this paper sets the number of samples to be the same for all classes. Then, the TRP algorithm is used to reduce the dimensionality of the samples. Through the random projection matrix R, the sample matrix F of all classes is projected into the KTRP-dimensional space, thereby obtaining the low-dimensional sample matrix S = [S1; …; Sl; …; SL] of all classes. It is calculated as follows,
S = (1/√KTRP) F R.(5)
Sl in the matrix S is the low-dimensional sample matrix of the lth class obtained by using the TRP algorithm for dimensionality reduction. It can be expanded as
Sl = [sl1, …, slk, …, slKTRP],(6)
where slk = (sl1k, …, slhk, …, slHk)T is the low-dimensional sample vector of the kth dimensionality in the KTRP-dimensional space, and slhk is the low-dimensional measurement of the hth sample of the lth class in the kth dimensionality. The kth column of Sl depends only on the kth column rk of the random projection matrix,
slk = (1/√KTRP) Fl rk.(7)
According to Equation (7), slk is related to the vector rk of the kth column of the random projection matrix. Thus, each element rdk in the kth column of the random projection matrix can be limited by constraining the resulting cumulative sum slk. The random projection matrix whose dimensionality reduction result has the best class separability can then be selected by means of multiple sampling.
The criterion of the random projection matrix selection strategy based on the separable information of a single class is a small intra-class variance of the single class and a large distance from the other classes. Each dimensionality of the random projection matrix is selected separately, so as to select the random projection matrix that is most conducive to the classification of this class.
As each element rdk of the random projection matrix R obeys the standard normal distribution, multiple random numbers can be generated from this distribution as the sampling set for the element rdk, defined as Qdk = [qdk(1), …, qdk(ψ), …, qdk(Ψ)], where ψ is the sampling index and Ψ is the number of samplings. Each random number is used to calculate the class separability, and the random number that maximizes the degree of separability of a given class is selected as the element rdk of the random projection matrix R.
Specifically, the lth class is taken as an example to introduce the random projection matrix selection strategy based on the separable information of a single class. For the random number qdk(ψ), the variance of the lth class samples and the distances to the samples of the other classes are calculated, respectively. Then, the minimum distance between this class and the other classes is divided by the variance of this class to obtain the final difference value θl(ψ) of the lth class. It can be calculated by
θl(ψ) = minl′≠l Dll′ / Vl,(8)
where Vl is the variance of the low-dimensional samples of the lth class and Dll′ is the distance between the low-dimensional samples of the lth and l′th classes, both computed with qdk(ψ) substituted for the element rdk. According to Equation (8), the difference matrix of this class can be obtained by using all the random numbers and is expressed as Θl = [θl(1), …, θl(ψ), …, θl(Ψ)]. Then, the ψ*th sampling is obtained by maximizing the final difference value of the lth class, that is,
ψ* = arg maxψ θl(ψ).(9)
Finally, the element rdk takes the ψ*th sampled random number, which maximizes the final class difference value, that is,
rdk = qdk(ψ*).(10)
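The selection strategy above can be sketched as follows. This is a hedged, simplified variant: candidate values are drawn per column rather than per element, and the class-difference value is computed as the minimum distance to the other class means divided by the within-class variance of the projected samples; the function names and toy data are illustrative, not from the paper.

```python
import numpy as np

def class_difference(S, labels, l):
    # theta_l: min inter-class mean distance over intra-class variance
    mu_l = S[labels == l].mean(axis=0)
    var_l = S[labels == l].var() + 1e-12          # avoid division by zero
    d_min = min(np.linalg.norm(mu_l - S[labels == m].mean(axis=0))
                for m in np.unique(labels) if m != l)
    return d_min / var_l

def select_projection_matrix(F, labels, l, K_trp, Psi=10, seed=0):
    # Pick each column of R from Psi N(0, 1) candidates, maximizing theta_l
    rng = np.random.default_rng(seed)
    D = F.shape[1]
    R = np.zeros((D, K_trp))
    for k in range(K_trp):
        best_val, best_col = -np.inf, None
        for _ in range(Psi):
            cand = rng.standard_normal(D)
            R[:, k] = cand
            S = F @ R[:, :k + 1] / np.sqrt(K_trp)  # projection so far
            val = class_difference(S, labels, l)
            if val > best_val:
                best_val, best_col = val, cand
        R[:, k] = best_col
    return R

# Toy usage: two classes, 20 samples each, 30 bands
rng = np.random.default_rng(1)
F = np.vstack([rng.normal(0.0, 1.0, (20, 30)),
               rng.normal(3.0, 1.0, (20, 30))])
labels = np.array([0] * 20 + [1] * 20)
R = select_projection_matrix(F, labels, l=0, K_trp=5)
print(R.shape)
```

The greedy per-column search keeps the cost proportional to Ψ · KTRP separability evaluations, at the price of not revisiting earlier columns.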
3.3. Entropy-Weighted Ensemble Algorithm
The random projection matrix selection strategy based on the separable information of a single class considers the separability of a single object class, but it does not measure the class separability of low-dimensional images from a global perspective. To this end, combining this selection strategy with the idea of ensemble classification, the classification results of multiple low-dimensional images are combined to construct a classification model, thereby obtaining more stable and accurate classification results.
The main idea of the entropy-weighted ensemble classification algorithm is to use the random projection matrix selection strategy based on the separable information of a single class to select L random projection matrices, one suited to each of the L classes. The TRP algorithm then reduces the dimensionality of the hyperspectral remote sensing image A with these projection matrices, thereby obtaining L low-dimensional images. The number of ensembles is set to the number of classes to ensure that each class is considered. For ease of reference, iter is used to denote the ensemble index, that is, iter = 1, …, L. The low-dimensional image obtained in the iterth ensemble using the TRP algorithm is denoted Biter.
The MD classifier is used to classify the L low-dimensional images, yielding L distance matrices. Each distance matrix Ziter is regarded as a similarity measurement matrix between the low-dimensional spectral vectors and the mean vector of each class. Finally, entropy information is used to weight each matrix to obtain the final similarity measure matrix C.
With the help of the class information of the samples, the TRP algorithm is used to calculate the lower limit of the projection dimensionality KTRP, and the random projection matrix Riter is obtained by the random projection matrix selection strategy. Then, the sample matrix F of all classes is projected into the KTRP-dimensional space. The result of reducing the dimensionality of the sample matrix F with the random projection matrix Riter is Siter = [Siter1; …; Siterl; …; SiterL], which is calculated as follows
Siter = (1/√KTRP) F Riter.(11)
Then, the feature mean vector μiterl of the low-dimensional samples of the lth class is calculated, which is expressed as
μiterl = (1/H) Σh=1…H siterlh,(12)
where siterlh is the low-dimensional vector of the hth sample of the lth class in the iterth ensemble. Then, in the same way as Equation (11), the same random projection matrix Riter is used to reduce the dimensionality of the hyperspectral remote sensing image A to obtain the low-dimensional image Biter,
Biter = (1/√KTRP) A Riter.(13)
So far, the feature mean vectors of all classes of low-dimensional samples and the low-dimensional images can be obtained. The MD classifier builds a classification model on the low-dimensional images, and the similarity of each low-dimensional vector is defined by the distance between that vector and the mean vector of each class's samples. The distance matrix is defined as Ziter = [ziter1, …, ziterj, …, ziterJ]T, where ziterj is the vector of distances between the jth low-dimensional vector and the mean vectors of all class samples in the iterth ensemble. Specifically, ziterj = [ziterj1, …, ziterjl, …, ziterjL], where ziterjl is the distance between the mean vector μiterl of the lth class low-dimensional samples and the low-dimensional vector biterj. The distance is calculated as follows
ziterjl = ‖biterj − μiterl‖2.(14)
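One ensemble iteration of this MD classification step can be sketched as follows; the Euclidean distance matches the minimum-distance decision rule described at the end of this section, while the function name and toy data are illustrative assumptions.

```python
import numpy as np

def md_distance_matrix(A, F, labels, R):
    # Project image and class samples with the same matrix R_iter, then
    # form the J x L distance matrix Z_iter between each low-dimensional
    # pixel vector b_j and each class mean mu_l.
    K_trp = R.shape[1]
    B = A @ R / np.sqrt(K_trp)       # low-dimensional image, J x K_TRP
    S = F @ R / np.sqrt(K_trp)       # low-dimensional class samples
    means = np.stack([S[labels == l].mean(axis=0) for l in np.unique(labels)])
    # z_jl = ||b_j - mu_l||_2
    return np.linalg.norm(B[:, None, :] - means[None, :, :], axis=2)

# Toy usage: 100 pixels, 30 bands, 2 classes, K_TRP = 8
rng = np.random.default_rng(2)
A = rng.random((100, 30))
F = np.vstack([rng.normal(0, 1, (10, 30)), rng.normal(5, 1, (10, 30))])
labels = np.array([0] * 10 + [1] * 10)
R = rng.standard_normal((30, 8))
Z = md_distance_matrix(A, F, labels, R)
print(Z.shape)                       # J x L distance matrix
```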
In the process of entropy-weighted ensemble classification, to avoid the problem that the distance values in each distance matrix are too large or too small, each distance matrix Ziter is normalized into the matrix Yiter before all the distance matrices are processed, that is, Yiter = [yiter1, …, yiterj, …, yiterJ]T. Specifically, yiterj = [yiterj1, …, yiterjl, …, yiterjL], where yiterjl is obtained by min–max normalization,
yiterjl = (ziterjl − min Ziter)/(max Ziter − min Ziter).
The information entropy is used to perform weighted ensemble processing on the multiple distance matrices to generate a similarity measure matrix. The entropy value of the matrix Yiter is defined and calculated as follows
Eiter = −(1/ln G) Σg piterg ln piterg,(15)
where G represents the total number of unique distance values, and piterg represents the probability that a distance value in the iterth matrix Yiter is g, considering only values whose frequency is not 0. It can be calculated by
piterg = #{(j, l): yiterjl = g}/(JL),(16)
where # represents the quantity, and (j, l) is a position where the distance value is g. The final similarity measure matrix C = {cjl, j = 1, …, J, l = 1, …, L} is obtained by weighting each normalized distance matrix with its entropy; following the standard entropy weight method, a lower entropy yields a larger weight,
cjl = Σiter witer yiterjl, where witer = (1 − Eiter)/Σiter′ (1 − Eiter′).(17)
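The fusion step can be sketched as follows. Several choices here are hedged assumptions in the spirit of the standard entropy weight method, not confirmed details of the paper: distances are min–max normalized, probabilities come from discretized distance values, entropy is normalized to [0, 1], and the weight is (1 − E)/Σ(1 − E), so lower-entropy matrices receive larger weight.

```python
import numpy as np

def entropy_weighted_fusion(Z_list, bins=50):
    Y_list, E_list = [], []
    for Z in Z_list:
        # Min-max normalization of the distance matrix
        Y = (Z - Z.min()) / (Z.max() - Z.min() + 1e-12)
        Y_list.append(Y)
        # Probability of each discretized distance value g (nonzero frequency only)
        p = np.histogram(Y, bins=bins, range=(0.0, 1.0))[0] / Y.size
        p = p[p > 0]
        E_list.append(-(p * np.log(p)).sum() / np.log(bins))  # normalized entropy
    E = np.array(E_list)
    w = (1.0 - E) / ((1.0 - E).sum() + 1e-12)    # entropy weights
    C = sum(wi * Y for wi, Y in zip(w, Y_list))  # final similarity measure matrix
    o = C.argmin(axis=1) + 1                     # decision: o_j in {1, ..., L}
    return C, o

# Toy usage: two J x L distance matrices, J = 6 pixels, L = 2 classes
Z1 = np.array([[0.1, 0.9]] * 3 + [[0.8, 0.2]] * 3)
Z2 = np.array([[0.2, 0.7]] * 3 + [[0.9, 0.1]] * 3)
C, o = entropy_weighted_fusion([Z1, Z2])
print(o)
```

Because a smaller entry of C means a smaller weighted distance, the final decision takes the argmin over classes, matching the minimum-distance rule below.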
Finally, in the decision-making process of hyperspectral remote sensing image classification, the deterministic classification result o = [o1, …, oj, …, oJ] is obtained, where oj ∈ {1, …, L}. Simply, the low-dimensional vector bj is assigned to the lth class with the smallest weighted distance,
oj = arg minl cjl.(18)
3.4. The Complexity of the Proposed Algorithm
The space and time complexity of the proposed algorithm is analyzed here, in three parts: the optimization strategy of the projection matrix, the MD classifier, and the ensemble algorithm.
The dominant cost of the optimization strategy of the projection matrix is calculating the projection matrix. Updating the projection matrix takes O(HLD) space and O(HLDKTRP) time for the calculation of the low-dimensional sample matrix. Furthermore, O(HKTRP) space and O(THL2KTRP) time are required to calculate the class dissimilarity. The dominant cost of the classification algorithm is calculating the distance matrix. Updating the distance matrix takes O(SD) space and O(SDKTRP) time for the calculation of the low-dimensional images. Furthermore, O(SKTRP) space and O(SLKTRP) time are required to calculate the distances. The ensemble algorithm repeats the above two steps L times. Therefore, to obtain the final distance matrix, the overall space complexity of the proposed algorithm is O(LSD), and the overall time complexity is O(L(HLDKTRP + THL2KTRP + SDKTRP + SLKTRP)).
Figure 2 presents the flow chart for the proposed algorithm. For easier understanding, the blue arrows represent the dimensionality reduction process of samples, and the red arrows represent the dimensionality reduction process of hyperspectral remote sensing images.
Furthermore, the detailed process of the proposed classification algorithm is summarized in Algorithm 2.
Algorithm 2. The detailed process of the proposed classification algorithm.
Input: samples F, test hyperspectral remote sensing image A.
Output: the classification results o.
Step 1. Calculate the intrinsic dimensionality ← Equation (1), and set the dimensionality KTRP.
For iter = 1: L
  Step 2. Randomly generate Ψ random numbers according to the standard normal distribution.
  Step 3. Calculate the final difference value ← Equation (8).
  Step 4. Calculate the ψ*th sampling ← Equation (9).
  Step 5. Form Riter ← Equation (10).
  Step 6. Reduce the dimensionality of the hyperspectral image A and the samples F ← Equation (11).
  Step 7. Calculate the mean vector of each low-dimensional class sample ← Equation (12).
  Step 8. Calculate the distance matrix Ziter ← Equation (14).
End
Step 9. Obtain the entropy value Eiter ← Equation (15).
Step 10. Obtain the final similarity measure matrix C ← Equation (17).
Step 11. Make the classification decision o ← Equation (18).