Article

Joint Sparse Local Linear Discriminant Analysis for Feature Dimensionality Reduction of Hyperspectral Images

1 College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
2 Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4287; https://doi.org/10.3390/rs16224287
Submission received: 20 October 2024 / Accepted: 15 November 2024 / Published: 17 November 2024

Abstract

Although linear discriminant analysis (LDA)-based subspace learning has been widely applied to hyperspectral image (HSI) classification, the existing LDA-based subspace learning methods exhibit several limitations: (1) They are often sensitive to noise and demonstrate weak robustness; (2) these methods ignore the local information inherent in data; and (3) the number of extracted features is restricted by the number of classes. To address these drawbacks, this paper proposes a novel joint sparse local linear discriminant analysis (JSLLDA) method by integrating embedding regression and locality-preserving regularization into the LDA model for feature dimensionality reduction of HSIs. In JSLLDA, a row-sparse projection matrix can be learned, to uncover the joint sparse structure information of data by imposing an L2,1-norm constraint. The L2,1-norm is also employed to measure the embedding regression reconstruction error, thereby mitigating the effects of noise and occlusions. A locality preservation term is incorporated to fully leverage the local geometric structural information of the data, enhancing the discriminability of the learned projection. Furthermore, an orthogonal matrix is introduced to alleviate the limitation on the number of acquired features. Finally, extensive experiments conducted on three hyperspectral image (HSI) datasets demonstrated that the performance of JSLLDA surpassed that of some related state-of-the-art dimensionality reduction methods.

1. Introduction

With the rapid advancement of hyperspectral imaging technology, its applications have proliferated across various fields, including agricultural [1,2,3] and forestry management [4,5], urban planning [6,7,8], international relations [9], and resource exploration [10,11]. Compared to traditional remote sensing images, hyperspectral images (HSIs) offer superior spectral resolution and a wealth of spectral information, which facilitates the recognition of ground objects [12]. Nevertheless, the high spectral dimensionality of HSIs presents significant challenges for their processing and analysis.
Dimensionality reduction (DR) is an effective approach for addressing the challenges of high data redundancy and huge data volumes [13]. DR techniques can eliminate redundant information by transforming high-dimensional data into more discriminative low-dimensional subspaces, which has become an important step for the analysis of HSIs. Over the past decades, researchers have proposed a variety of DR methods, which can mainly be categorized into linear and nonlinear DR approaches. The most well-known linear DR methods are principal component analysis (PCA) [14] and linear discriminant analysis (LDA) [15], whereas Laplacian eigenmaps (LE) [16], isometric mapping (ISOMAP) [17], and locality-preserving projection (LPP) [18] represent the foremost methods for tackling nonlinear data. These methodologies primarily focus on learning projections from distinct local geometric structures within the raw data. In contrast to PCA and LDA, they are capable of preserving the spatial local geometric structure inherent in the original data, yet they are not inherently suited for classification tasks, as the features they derive do not inherently possess discriminability [19].
LDA stands as one of the most frequently utilized methods for obtaining discriminative features in image classification tasks [20]. It projects the data into a low-dimensional space where samples from the same class are as close together as possible, while samples from different classes are positioned as far apart as possible. Various extended models based on LDA have also been proposed, such as orthogonal LDA (OLDA) [21] and discriminative local alignment (DLA) [22]. Acknowledging LDA’s approximation error, Wang et al. [23] introduced trace ratio LDA (TRLDA) as an optimal LDA solution. They further proposed optimal dimensionality LDA (ODLDA) to identify the optimal subspace dimension. To overcome the limitation that LDA cannot correctly describe internal structure, Zhu et al. [24] defined a scatter matrix based on a neighborhood composed of reverse nearest neighbors. However, these LDA extension methods were all developed by utilizing the L2-norm as a metric criterion, which can make models sensitive to noise and outliers. To mitigate the impact of noise, robust feature extraction techniques have garnered significant attention across various fields. A commonly adopted approach to enhancing robustness involves integrating robust metrics to refine existing feature extraction models. For instance, the L1-norm was applied to mitigate the negative effects of noise, and L1-norm PCA and L1-norm-2DPCA were proposed for robust feature extraction [25,26]. Lu et al. [27] enhanced the robustness of two-dimensional locality-preserving projections (2DLPP) by introducing the nuclear norm. Zhang et al. [28] minimized reconstruction errors based on the nuclear norm and L2,1-norm, and proposed two 2DNPP methods. Gu et al. [29] addressed the issue of the classical CSP based on the squared Frobenius norm being sensitive to noise by utilizing the L2,1-norm, and proposed a regularized version of the common spatial pattern based on the L2,1-norm (RCSP-L2,1). Nie et al. [30] proposed a robust PCA method based on L2,1-norm maximization, which addressed the processing challenges of high-dimensional data through non-greedy optimization algorithms, overcoming the sensitivity to outliers and the high computational complexity of traditional PCA. Zhang et al. [31] proposed a novel low-rank-preserving embedding regression (LRPER) approach, which employs the robust L2,1-norm to measure the reconstruction error and regression loss, thereby effectively modeling noise and occlusions. Deng et al. [32] utilized the adaptive L2,p-norm to model noise and residuals in a low-dimensional space, enhancing the robustness and generalization.
As the L2,1-norm is an effective approach for obtaining joint sparse projections for discriminative feature selection or extraction, a plethora of joint sparse projection learning methods based on the L2,1-norm have been developed. In [33], Nie et al. employed L2,1-norm regularization to select features across all data points through joint sparsity. In an unsupervised learning environment, Yang et al. [34] introduced a discriminative L2,1-norm minimization algorithm (UDFS) for effective feature selection without labels. Robust sparse linear discriminant analysis (RSLDA) [35] strategically incorporates the L2,1-norm to dynamically select the most discriminative features for enhanced classification, thus optimizing the discriminative analysis process. Tang et al. [36] devised an unsupervised linear feature selective projection (FSP) method for feature extraction, by integrating low-rank embedding, dual Laplacian regularization, and L2,1-norm minimization. Li et al. [37] constructed a new linear discriminant analysis method characterized by robustness and sparsity using the L1-norm and L2,1-norm, which can obtain all discriminant directions simultaneously. Long et al. [38] combined LPP with the L2,1-norm to achieve simultaneous feature extraction and feature selection.
Although employing a sparse constraint allows selecting the most significant features for the purpose of feature extraction, it also has several inherent limitations. Firstly, these methods tend to focus primarily on the global structural information of data samples, often neglecting the intrinsic local geometric structure of the data. This local information is crucial for capturing subtle differences between data points and their local neighborhood relationships. The neglect of local information may result in the reduced feature representations failing to fully preserve the complex structure of the original data, thereby affecting the accuracy of subsequent analysis tasks. Secondly, most of these methods lack robustness to noise. Additionally, while the low-dimensional feature spaces generated by current methods achieve effective data compression, they struggle to directly identify which features are more important or discriminative for specific target tasks. Furthermore, many existing methods emphasize enhancing the discriminability of features by designing regression loss functions that maximize the feature differences between different categories or samples. However, this process often considers projection learning and regression optimization in isolation, overlooking the potential interactions and synergies between these two stages.
In response to the aforementioned issues, we propose a novel robust dimensionality reduction method based on LDA, termed joint sparse local linear discriminant analysis (JSLLDA). An LPP regularization term is introduced to maintain the neighborhood relationships of the original samples by constructing a nearest-neighbor weighted graph, so that the projection is learned using both the global and the local information of the data. To enhance the robustness of the model, the robust L2,1-norm is employed to measure the regression loss. Since the L2,1-norm does not involve a squaring operation, it reduces the impact of outliers on the distance measurements; it also renders the error term row-sparse, so that noise can be fitted during projection learning, which further strengthens the robustness against outliers. In addition, JSLLDA uses L2,1-norm regularization to constrain the projection matrix. Unlike the L1-norm, the L2,1-norm enforces row sparsity; that is, more discriminative features are given higher weights, while redundant information and noise are given lower weights when obtaining low-dimensional features. This not only improves the robustness of the model, but also enhances the interpretability of the obtained low-dimensional features. Most importantly, the proposed method relates the projected data to the original label information through an orthogonal matrix; that is, the interaction between projection learning and regression is considered through a linear regression variant. The main contributions of this work are summarized as follows:
(1) To address issues that currently affect LDA-based subspace learning methods, such as insufficient robustness, limitations on the number of extractable features, and the neglect of local manifold structure, a new robust dimensionality reduction method named JSLLDA is proposed by integrating embedding regression and locality-preserving regularization into the LDA model.
(2) In JSLLDA, the locality-preserving regularization can capture the local manifold geometric structural information, while the L2,1-norm imposed on the projection matrix can reveal the joint sparse information of the data, significantly enhancing the robustness against noise. Moreover, the embedding regression term can overcome the limitations in the feature dimensions of LDA and improve the discriminability of low-dimensional features by fully using the prior information.
(3) An alternating iteration algorithm was developed to solve the JSLLDA model, and its computational complexity was analyzed both theoretically and numerically. Experiments were conducted on three public HSI datasets, and the results verified that the proposed method achieved a better classification performance than several state-of-the-art methods. In addition, the robustness of JSLLDA and the contribution of each of its components are discussed in detail through a robustness analysis and an ablation study.

2. Methodology

In this section, some notations commonly used in this study are first introduced. Then, we review related foundational works, which include linear regression (LR) and linear discriminant analysis (LDA). Finally, the formulation and optimization of the proposed method are presented.

2.1. Notations and Definitions

$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ represents the original high-dimensional data with n samples, and $Y \in \mathbb{R}^{c \times n}$ is the label information, where c denotes the number of classes. For a matrix M, its ith row is denoted $m_{i,:}$, and its entry in the ith row and jth column is denoted $m_{ij}$. The trace of matrix M is written as $\mathrm{Tr}(M)$, and the transpose of M is denoted by $M^{T}$. The L2,1-norm of a matrix $M \in \mathbb{R}^{n \times m}$ is defined as follows:
$$\|M\|_{2,1} = \sum_{i=1}^{n}\left\|m_{i,:}\right\|_2 = \sum_{i=1}^{n}\sqrt{\sum_{j=1}^{m} m_{ij}^2}.$$
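For concreteness, the row-wise definition above can be evaluated directly; the following is a minimal NumPy sketch (the function name and the example matrix are ours, not from the paper):

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """L2,1-norm of M: the sum of the Euclidean (L2) norms of its rows."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# A row-sparse matrix: rows that are entirely zero contribute nothing to the norm.
M = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])
print(l21_norm(M))   # 5.0 + 0.0 + 1.0 = 6.0
```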

2.2. Linear Regression

Traditional linear regression (LR) learns the mapping relationship hidden between the given samples and their label information [39]. It is one of the most important machine learning techniques and has a wide range of applications in image classification and feature extraction. Generally, the formulation of LR is as follows:
$$W^{*} = \arg\min_{W}\left\|W^{T}X - Y\right\|_F^2. \qquad (1)$$
In Equation (1), W is the regression matrix. The linear regression model chooses the regression matrix so that the output $W^{T}X$ is as close as possible to the real label matrix Y. This way of solving a model by minimizing the mean squared error is commonly called the "least squares method", and its solution is as follows:
$$W^{*} = \left(XX^{T}\right)^{-1}XY^{T}. \qquad (2)$$
Unfortunately, the classical LR model measures the regression error with the Frobenius norm, which is sensitive to noise. As a result, the generalization ability and robustness of LR are often unsatisfactory.
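As a reference point for the robust variants discussed in the following sections, the closed-form solution in Equation (2) can be sketched as follows. This is a minimal NumPy illustration under the column-sample convention used above; the small ridge term is our numerical safeguard for an ill-conditioned X X^T and is not part of the paper:

```python
import numpy as np

def least_squares_W(X: np.ndarray, Y: np.ndarray, ridge: float = 1e-8) -> np.ndarray:
    """Closed-form least-squares solution of min_W ||W^T X - Y||_F^2.
    X: m x n data (samples as columns), Y: c x n labels; returns W of size m x c."""
    m = X.shape[0]
    # W = (X X^T + ridge*I)^{-1} X Y^T; solve() avoids forming an explicit inverse.
    return np.linalg.solve(X @ X.T + ridge * np.eye(m), X @ Y.T)
```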

2.3. Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a widely used supervised dimensionality reduction algorithm [40]. Its main purpose is to make data compact within the same class and as dispersed as possible between classes after transforming the data into a low-dimensional space. Thus, the objective function of LDA is defined as follows:
$$P^{*} = \arg\min_{P^{T}P = I}\ \mathrm{Tr}\left(P^{T}\left(S_w - \lambda S_b\right)P\right) \qquad (3)$$
where λ is a small positive constant. LDA finds a subspace that distinguishes the different classes by minimizing the projected within-class scatter $P^{T}S_wP$ while maximizing the projected between-class scatter $P^{T}S_bP$. Specifically, $S_w$ and $S_b$ are defined as follows:
$$S_w = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n_i}\left(x_j^i - m_i\right)\left(x_j^i - m_i\right)^{T} \qquad (4)$$
$$S_b = \frac{1}{n}\sum_{i=1}^{c} n_i\left(m_i - m\right)\left(m_i - m\right)^{T} \qquad (5)$$
where c is the number of classes, $n_i$ denotes the number of samples in the ith class, and $x_j^i$ denotes the jth sample of the ith class. The mean vector of the ith class is $m_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_j^i$, and the mean of all samples is $m = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n_i} x_j^i$. Although the projections obtained by LDA can reduce the distance between samples of the same class and increase the separability between different classes, they overlook the locality structural information of HSI data, which often plays an important role in the feature dimensionality reduction of HSIs. What is worse, when LDA is applied to HSI dimensionality reduction, the number of features it can extract is limited by the number of classes, with a maximum of c − 1 features.
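The scatter matrices in Equations (4) and (5) can be assembled directly from the labeled samples; the sketch below follows the column-sample convention of this paper (function and variable names are ours):

```python
import numpy as np

def scatter_matrices(X: np.ndarray, labels: np.ndarray):
    """Within-class (S_w) and between-class (S_b) scatter matrices of Eqs. (4)-(5)
    for column-sample data X (m x n) and a length-n vector of class labels."""
    m, n = X.shape
    overall_mean = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for cls in np.unique(labels):
        Xc = X[:, labels == cls]                 # samples of one class
        mc = Xc.mean(axis=1, keepdims=True)      # class mean m_i
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - overall_mean) @ (mc - overall_mean).T
    return Sw / n, Sb / n
```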

2.4. Formulation of JSLLDA

As previously mentioned, most existing LDA-based dimensionality reduction methods overlook the locality structural information within data, which includes critical information such as the spatial neighborhood relationships between data samples. Ignoring this information may prevent the model from capturing subtle differences in the data. To address this issue, we introduce a locality-preserving regularization term into the LDA model to reveal the locality structural information. Mathematically, this can be expressed as follows:
$$\min_{P}\ \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) \qquad (6)$$
where $P \in \mathbb{R}^{m \times c}$ is the projection matrix and μ is a balance factor. $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter matrix. S is the weight matrix of an undirected neighborhood graph, in which the vertices represent the sample points and the edges encode the nearest-neighbor relationships between data samples. The Laplacian matrix is L = D − S, where D is a diagonal matrix whose diagonal entries are the row (or, equivalently, column) sums of S.
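The locality term relies only on the affinity matrix S and its Laplacian L = D − S. A minimal sketch of one common construction, a heat-kernel-weighted k-nearest-neighbor graph, is given below; the choice of k and the kernel width are our illustrative assumptions rather than settings taken from the paper:

```python
import numpy as np

def knn_graph_laplacian(X: np.ndarray, k: int = 5, sigma: float = 1.0):
    """Heat-kernel weighted k-NN affinity matrix S and Laplacian L = D - S
    for column-sample data X (m x n)."""
    n = X.shape[1]
    sq_dist = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # n x n squared distances
    S = np.zeros((n, n))
    neighbors = np.argsort(sq_dist, axis=1)[:, 1:k + 1]             # skip the point itself
    for i in range(n):
        S[i, neighbors[i]] = np.exp(-sq_dist[i, neighbors[i]] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)                                          # make the graph undirected
    L = np.diag(S.sum(axis=1)) - S
    return S, L
```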
Further, to link the projection with regression, a linear regression variant is introduced into the model. In order to obtain more feature information, the regression matrix is decomposed into an orthogonal matrix W and a projection matrix P of sizes c × k and m × k, respectively, where k denotes the number of features to be extracted. The LDA projection alone can yield at most c − 1 projections, whereas the orthogonal matrix W of size c × k implies that k projections can be obtained, and k can be set to any positive integer as required. At this point, the number of acquired features is no longer limited by the number of classes. More importantly, the regression matrix W relates the projected samples to the label information; that is, the constraint $Y = WP^{T}X + E$ encourages the samples projected onto the low-dimensional space to retain the label energy of the original data as much as possible. Equation (6) is rewritten as follows:
$$\min_{P,W,E}\ \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) \quad \text{s.t.}\ Y = WP^{T}X + E,\ W^{T}W = I. \qquad (7)$$
E denotes the reconstruction error, which is utilized to fit noise. Considering the sparseness and robustness of the L2,1-norm, which mitigates the influence of outliers, we utilize this norm to measure the reconstruction error E.
$$\min_{P,W,E}\ \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) + \lambda_2\|E\|_{2,1} \quad \text{s.t.}\ Y = WP^{T}X + E,\ W^{T}W = I. \qquad (8)$$
In addition, the L2,1-norm can be further applied to constrain the projection matrix, so as to mine the joint sparse information and to identify the features that matter most in the low-dimensional representation. Specifically, imposing an L2,1-norm constraint on the projection matrix renders the matrix row-sparse, thereby assigning larger weights to significant features, while assigning weights close to zero to redundant features or noise. Evidently, the features endowed with larger weights are precisely the ones we need. The objective function of JSLLDA is then formulated as follows:
$$\min_{P,W,E}\ \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) + \lambda_2\|E\|_{2,1} + \lambda_3\|P\|_{2,1} \quad \text{s.t.}\ Y = WP^{T}X + E,\ W^{T}W = I \qquad (9)$$
where λ1, λ2, and λ3 are regularization parameters, P is the projection matrix, and μ is a balance factor. In the above model, the first term is the linear discriminant analysis term, which encourages the low-dimensional subspace to be more cohesive within each class. The second term is the locality-preserving (LPP) term, which compensates for LDA's neglect of local information and makes full use of it to improve the representation ability of the extracted features. The third term uses the L2,1-norm to measure the reconstruction error; compared with the Frobenius norm, it involves no squaring operation, which is beneficial for improving the robustness of the model. Finally, the fourth term imposes an L2,1-norm constraint on the projection matrix P, which gives larger weights to important features and weights close to zero to redundant features or noise, indicating which features are needed for the actual task.

2.5. Optimization to JSLLDA

In this section, an iterative method is designed to solve the optimization problem of JSLLDA using the alternating direction method of multipliers (ADMM) [41]. The corresponding augmented Lagrangian function for problem (9) is written as
$$\mathcal{L}(W,P,E,\eta) = \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_3\|P\|_{2,1} + \lambda_2\|E\|_{2,1} + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) - \frac{1}{2\beta}\|\eta\|_F^2 + \frac{\beta}{2}\left\|Y - WP^{T}X - E + \frac{\eta}{\beta}\right\|_F^2, \qquad (10)$$
where β is the penalty factor and η is the Lagrange multiplier. The specific solution scheme is as follows:
  • Step 1: Fix W and E to update P. Problem (10) is then reduced to the following subproblem:
    $$\min_{P}\ \mathrm{Tr}\left(P^{T}\left(S_w - \mu S_b\right)P\right) + \lambda_3\|P\|_{2,1} + \lambda_1 \mathrm{Tr}\left(P^{T}XLX^{T}P\right) + \frac{\beta}{2}\left\|Y - WP^{T}X - E + \frac{\eta}{\beta}\right\|_F^2, \qquad (11)$$
    where H is a diagonal matrix with $H_{ii} = \frac{1}{2\|p^{i}\|_2}$ ($p^{i}$ denotes the ith row of P), and $M = Y - E + \frac{\eta}{\beta}$.
    By taking the derivative of Equation (11) with respect to P and setting it to zero, the following can be obtained:
    $$2\left(S_w - \mu S_b\right)P + \lambda_3 HP + \beta\left(XX^{T}P - XM^{T}W\right) + \lambda_1 XLX^{T}P = 0. \qquad (12)$$
    Then, P is calculated as follows:
    $$P = \left(2\left(S_w - \mu S_b\right) + \lambda_3 H + \beta XX^{T} + \lambda_1 XLX^{T}\right)^{-1}\beta XM^{T}W. \qquad (13)$$
  • Step 2: Fix P and E to update W. The solution for W can be obtained by minimizing the following equivalent problem:
    $$\min_{W^{T}W=I}\left\|Y - WP^{T}X - E + \frac{\eta}{\beta}\right\|_F^2. \qquad (14)$$
    Letting $M = Y - E + \frac{\eta}{\beta}$, problem (14) is rewritten as
    $$\min_{W^{T}W=I}\left\|M - WP^{T}X\right\|_F^2 = \max_{W^{T}W=I}\mathrm{Tr}\left(W^{T}MX^{T}P\right). \qquad (15)$$
    Suppose the SVD of $MX^{T}P$ is
    $$MX^{T}P = USV^{T}. \qquad (16)$$
    Then, the solution for W is obtained by
    $$W = UV^{T}. \qquad (17)$$
  • Step 3: Fix W and P to update E. With W and P given, E can be obtained by solving the following problem:
    $$\min_{E}\ \lambda_2\|E\|_{2,1} + \frac{\beta}{2}\left\|Y - WP^{T}X + \frac{\eta}{\beta} - E\right\|_F^2. \qquad (18)$$
    According to [42], E has the following closed-form solution:
    $$E = \Omega_{\frac{\lambda_2}{\beta}}\!\left[Y - WP^{T}X + \frac{\eta}{\beta}\right], \qquad (19)$$
    where $\Omega$ is the row-wise shrinkage (soft-thresholding) operator.
  • Step 4: Update η and β:
    $$\eta = \eta + \beta\left(Y - WP^{T}X - E\right), \quad \beta = \min\left(\rho\beta,\ \beta_{\max}\right), \qquad (20)$$
    where ρ and $\beta_{\max}$ are constants. The iterative solving process of JSLLDA is summarized in Algorithm 1; a minimal code sketch of one full iteration is given after the algorithm.
Algorithm 1: The Iterative Algorithm for Solving JSLLDA
Input: Sample data X, label matrix Y, class compactness graph weight matrix S, reduced dimension k, parameters λ1, λ2, λ3, and maximum number of iteration steps T.
Initialize: β = 0.1, ρ = 1.01, β_max = 10^5, μ = 10^{-5}, Q = 0, E = 0, η = 0 (where 0 denotes the zero matrix); initialize P as an orthogonal matrix.
while not converge do
    1. Update P by using (13),
    2. Update W by using (17),
    3. Update E by using (19),
    4. Update η , β by using (20).
    5. Check the convergence conditions
end while
Output: W, P, E.
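To make the update rules concrete, the following is a minimal NumPy sketch of Algorithm 1. It is our illustrative re-implementation under the notation above, not the authors' released code; Sw, Sb, and L are assumed to be precomputed as in Equations (4), (5) and the Laplacian construction, and the default parameter values mirror the initialization listed in Algorithm 1:

```python
import numpy as np

def jslldda(X, Y, Sw, Sb, L, k, lam1, lam2, lam3, mu=1e-5,
            beta=0.1, rho=1.01, beta_max=1e5, T=50, eps=1e-8):
    """Illustrative sketch of Algorithm 1.
    X: m x n data (samples as columns), Y: c x n label matrix, Sw/Sb: m x m scatter
    matrices (Eqs. (4)-(5)), L: n x n graph Laplacian, k: number of projections."""
    m, n = X.shape
    c = Y.shape[0]
    rng = np.random.default_rng(0)
    P, _ = np.linalg.qr(rng.standard_normal((m, k)))   # orthogonal initialization of P
    W = np.eye(c, k)           # c x k matrix with orthonormal columns (assumes k <= c)
    E = np.zeros((c, n))
    eta = np.zeros((c, n))
    A_fixed = 2.0 * (Sw - mu * Sb) + lam1 * (X @ L @ X.T)
    for _ in range(T):
        # Step 1: update P via Eq. (13); H is built from the rows of the current P.
        M = Y - E + eta / beta
        H = np.diag(1.0 / (2.0 * np.linalg.norm(P, axis=1) + eps))
        A = A_fixed + lam3 * H + beta * (X @ X.T)
        P = np.linalg.solve(A, beta * X @ M.T @ W)
        # Step 2: update W via Eq. (17) using the SVD of M X^T P.
        U, _, Vt = np.linalg.svd(M @ X.T @ P, full_matrices=False)
        W = U @ Vt
        # Step 3: update E via Eq. (19), the row-wise L2,1 shrinkage operator.
        G = Y - W @ P.T @ X + eta / beta
        row_norms = np.linalg.norm(G, axis=1, keepdims=True)
        E = np.maximum(1.0 - (lam2 / beta) / (row_norms + eps), 0.0) * G
        # Step 4: update the multiplier eta and the penalty factor beta, Eq. (20).
        eta = eta + beta * (Y - W @ P.T @ X - E)
        beta = min(rho * beta, beta_max)
    return P, W, E
```

In practice, the fixed iteration count T would be replaced by the convergence check of Algorithm 1, e.g., by monitoring the change of the residual Y − WP^TX − E between iterations.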

3. Experiments

In this section, the effectiveness of JSLLDA was validated on HSIs using the Salinas dataset, the University of Pavia dataset, and the Heihe dataset. To provide a comprehensive comparison, results from several state-of-the-art algorithms, including LDA [15], LPP [18], RSLDA [35], TRLDA [23], LRPER [31], and L2,p-RER [32], are presented. SVM classifiers were utilized to classify the dimensionality-reduced data, with classification accuracy serving as the quantitative evaluation metric for each method. The objective metrics employed were the overall accuracy (OA), the average accuracy (AA), the kappa coefficient (κ), and the standard deviation.
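For context, the evaluation pipeline used throughout the experiments (project the data with a learned P, then train an SVM on the reduced features) can be sketched as follows. The original experiments were run in MATLAB, so this scikit-learn snippet is only an illustrative analogue; the function name and the SVM hyperparameters are our placeholders:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_projection(P, X_train, y_train, X_test, y_test):
    """Project column-sample data with a learned P (m x k), then classify with an SVM."""
    Z_train = (P.T @ X_train).T        # scikit-learn expects rows = samples
    Z_test = (P.T @ X_test).T
    clf = SVC(kernel="rbf").fit(Z_train, y_train)
    return accuracy_score(y_test, clf.predict(Z_test))   # overall accuracy (OA)
```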

3.1. Experimental Datasets

(1) The Salinas hyperspectral dataset was captured by the AVIRIS sensor over Salinas Valley, California, with a spatial resolution of 3.7 m per pixel. Originally consisting of 224 bands, the dataset was reduced to 204 usable bands by removing bands [108–112], [154–167], and the 224th band, due to water absorption. The image dimensions are 512 × 217 pixels, yielding a total of 111,104 pixels, of which 54,129 pixels are of practical use. These pixels are categorized into 16 classes, including fallow, grapes_untrained, and soil_vineyard_develop.
(2) The University of Pavia hyperspectral dataset was captured by the Reflective Optics System Imaging Spectrometer (ROSIS) in 2003, covering part of the city of Pavia, Italy, with a spatial resolution of 1.3 m. The dataset originally contained 115 bands, but 12 bands were removed due to damage, leaving 103 spectral bands in use. The image dimensions are 610 × 340 pixels, totaling 207,400 pixels. Excluding background elements, 42,776 pixels are of practical use. These pixels are classified into nine different categories, including bitumen, trees, and gravel.
(3) The Heihe dataset was collected by CASI/SASI sensors. There are eight types of land cover in the dataset, and the image size is 684 × 453 pixels, with a spatial resolution of 2.4 m. After removing 14 bands that are heavily affected by noise, the remaining 135 bands were utilized.

3.2. Evaluation Index

This subsection describes how OA, AA, and κ are calculated. To compute OA, AA, and κ , it is first necessary to compute the following confusion matrix:
$$M = \begin{bmatrix} m_{11} & \cdots & m_{1C} \\ \vdots & \ddots & \vdots \\ m_{C1} & \cdots & m_{CC} \end{bmatrix} \qquad (21)$$
In Equation (21), $m_{ij}$ denotes the number of samples that originally belonged to class i but were predicted to be class j, and C denotes the total number of classes. OA is the ratio of the number of data samples accurately classified to the total number of samples tested and is calculated as follows:
$$\mathrm{OA} = \frac{\sum_{i=1}^{C} m_{ii}}{N_{test}} \qquad (22)$$
In Equation (22), $N_{test}$ represents the total number of test set samples and $m_{ii}$ is the ith diagonal element of the confusion matrix, representing the number of samples of class i correctly classified. AA is the average of the accuracy of each category and is calculated as follows:
$$\mathrm{AA} = \frac{1}{C}\sum_{i=1}^{C}\frac{m_{ii}}{N_i} \qquad (23)$$
where $N_i$ is the total number of samples of class i. κ is a statistical measure used to assess the agreement between the classification results and the true classification. κ ranges from −1 to 1, where higher values indicate a stronger alignment between the method's results and the actual classifications. The coefficient is calculated as follows:
$$\kappa = \frac{\mathrm{OA} - p_e}{1 - p_e} \qquad (24)$$
where $p_e$ is the expected accuracy of random classification, calculated as $p_e = \frac{1}{N^2}\sum_{i=1}^{C} M_{i\cdot}\, M_{\cdot i}$, in which N is the total number of samples, $M_{i\cdot}$ is the sum of row i of the confusion matrix, and $M_{\cdot i}$ is the sum of column i. For the metrics OA, AA, and κ, higher values indicate better model performance. Conversely, for the standard deviation, a lower value signifies greater stability in the model's performance.
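Equations (22)–(24) can be computed directly from the confusion matrix; a minimal NumPy sketch follows (rows are assumed to index the true classes and columns the predicted classes; the function name is ours):

```python
import numpy as np

def oa_aa_kappa(conf: np.ndarray):
    """OA, AA, and kappa (Eqs. (22)-(24)) from a C x C confusion matrix."""
    n_test = conf.sum()
    oa = np.trace(conf) / n_test
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))               # per-class accuracy, averaged
    pe = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / n_test ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```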

3.3. Experiment Setup

For the proposed JSLLDA, there are three crucial parameters: the regularization parameters λ1, λ2, and λ3. These parameters control the effect of the locality-preserving term, the reconstruction error term, and the joint sparse term in the objective function, respectively. To achieve the best performance for JSLLDA, we conducted cross-validation experiments on the three datasets separately. The parameters λ1, λ2, and λ3 were tested with values from {0.1, 0.01, 0.001, 0.0001, 0.00001}. Figure 1, Figure 2 and Figure 3 present the results of tuning λ1, λ2, and λ3 for the Salinas, University of Pavia, and Heihe datasets, respectively. The red crosses in these three figures mark the best parameter positions found. For ease of presentation, λ3 was fixed at each value in {0.1, 0.01, 0.001, 0.0001, 0.00001} in turn, and the variations in OA with respect to λ1 and λ2 were then analyzed. As illustrated in the third panel of Figure 1, when λ3 was set to 0.001, there existed a combination of λ1 and λ2 where the overall accuracy (OA) reached its peak. Specifically, for the Salinas dataset, the optimal parameters were (λ1, λ2, λ3) = (0.1, 0.1, 0.001). Similarly, for the University of Pavia dataset, the second panel of Figure 2 indicates that the model performed best when (λ1, λ2, λ3) were set to (0.01, 0.1, 0.01). For the Heihe dataset, the optimal parameters, as shown by the red markers in the second panel of Figure 3, were (λ1, λ2, λ3) = (0.1, 0.1, 0.01).
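The parameter search described above is an exhaustive grid over the three regularization weights; a sketch of such a loop is shown below, where evaluate_jslldda is a hypothetical stand-in for a routine that trains JSLLDA with the given weights and returns the cross-validated OA:

```python
import itertools

def evaluate_jslldda(lam1, lam2, lam3):
    """Placeholder: train JSLLDA with the given weights and return the cross-validated OA."""
    return 0.0  # substitute a real training/evaluation routine here

grid = [0.1, 0.01, 0.001, 0.0001, 0.00001]
best_params, best_oa = max(
    ((params, evaluate_jslldda(*params)) for params in itertools.product(grid, repeat=3)),
    key=lambda item: item[1],
)
print("best (lambda1, lambda2, lambda3):", best_params, "OA:", best_oa)
```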
According to Figure 4, for the three hyperspectral datasets, the OA values of all methods in the experiment, except for LDA, tended to stabilize after a certain number of retained features; specifically, OA stabilized once 30 features were retained. Since LDA can obtain at most c − 1 features, the feature dimensions used for LDA on the Salinas, University of Pavia, and Heihe datasets in Figure 4 were set to 15, 8, and 7, respectively.

3.4. Experimental Results and Analysis

The AA, OA, κ, and their standard deviations for the different dimensionality reduction methods on the three hyperspectral datasets are presented in Table 1, Table 2 and Table 3, where the highest classification accuracy for each class is shown in bold. Figure 5, Figure 6 and Figure 7 illustrate the classification maps obtained by applying the LDA, LPP, RSLDA, TRLDA, LRPER, L2,p-RER, and JSLLDA methods for dimensionality reduction on the three hyperspectral datasets, followed by classification using an SVM classifier.
For the Salinas dataset, 1% of the samples in each category were randomly selected for training. Table 1 and Figure 5 present the numerical results and visualization of SVM classification after dimensionality reduction using the various methods on the Salinas dataset. Table 1 provides the detailed classification accuracy for each method, as well as the OA, AA, and κ coefficient, and Figure 5 illustrates the corresponding classification maps. The results in Table 1 demonstrate that the proposed JSLLDA achieved superior performance in terms of AA, OA, and κ. Specifically, JSLLDA surpassed LDA, LPP, RSLDA, TRLDA, LRPER, and L2,p-RER by 0.52% to 8.84%, 0.2% to 6.54%, and 0.22% to 7.4% for the AA, OA, and κ coefficient, respectively. It is also evident that the standard deviations of the metrics for the proposed method were the smallest, indicating that the performance of JSLLDA fluctuated very little and was relatively stable across multiple experiments. Notably, JSLLDA also achieved the highest single-class accuracies for Class 2 (Broccoli-green-weeds-2), Class 3 (Fallow), Class 4 (Fallow-rough-plow), Class 5 (Fallow-smooth), Class 6 (Stubble), Class 9 (Soil-vineyard-develop), Class 10 (Corn-senesced-green-weeds), and Class 16 (Vineyard-vertical-trellis). These findings are further corroborated by Figure 5, which shows that the regions corresponding to these classes were more homogeneous compared to those obtained using the other methods. In Figure 5, Classes 2 and 3 are circled with light cyan dashed lines. Comparing Class 2 in Figure 5e,h, it can be observed that the region classified by JSLLDA, which had the highest recognition rate, is smoother than that classified by TRLDA, which had the lowest recognition rate; the smoother appearance is due to fewer misclassifications by JSLLDA. The same result was found for the similar Class 3.
For the University of Pavia dataset, 1% of the samples from each class were randomly selected for training, while the remaining samples were used for testing. As shown in Table 2, JSLLDA achieved an OA that was 14.24%, 11.66%, 4.58%, 6.73%, 9.58%, and 2.57% higher than LDA, LPP, RSLDA, TRLDA, LRPER, and L2,p-RER, respectively. The standard deviation of the OA was slightly higher than that of LPP and L2,p-RER, but it was still lower than that of the other methods. Additionally, the AA and κ coefficient were also 4.29% to 18.75% and 3.44% to 19.79% higher, respectively. Similarly, in terms of the stability of the AA and κ coefficient, JSLLDA was only slightly less stable than LPP and L2,p-RER. Figure 6 visually confirms that JSLLDA produced smoother classification regions compared to the other dimensionality reduction methods, particularly for Class 3 (Gravel), Class 4 (Trees), Class 5 (Painted metal sheets), Class 6 (Bare soil), Class 7 (Bitumen), and Class 9 (Shadows). Furthermore, despite the relatively small number of samples in Class 7 (Bitumen), the regions classified as such were notably larger and more homogeneous. Classes 6 and 7 have been highlighted in our analysis. As shown in Table 2, LDA's recognition rates for Classes 6 and 7 were below 50%. Consequently, in Figure 6b, more than half of the areas corresponding to Classes 6 and 7 are displayed in incorrect colors due to the high number of misclassified samples. In contrast, in Figure 6h, Classes 6 and 7 exhibit relatively higher classification accuracy, leading to a greater display of homogeneous regions.
For the Heihe dataset, 20 samples from each class were selected for the training set, with the remaining samples used for testing. As shown in Table 3, the proposed method enhanced the OA by 1.31% to 20.81%, and the κ coefficient was 0.22% to 7.4% higher compared to the other methods. Observing the standard deviation of the OA, it is evident that our proposed method performed comparably to LRPER, with only a slight difference. For the AA and κ coefficient, the performance of our method was slightly inferior only to that of L2,p-RER. In Figure 7, Class 3 has been specifically highlighted. It can be observed that the higher the recognition rate, the more closely the corresponding classification map aligns with the actual land cover distribution of Heihe, as shown in Figure 7a.

4. Model Analysis

In this section, the computational complexity of the proposed method is first considered. Subsequently, the robustness of the model was validated through a series of experiments, and the specific roles of each component of the model were explored. Based on these comprehensive experimental results and analyses, a thorough discussion and evaluation of JSLLDA is then conducted, highlighting its advantages, while also candidly acknowledging its limitations. Finally, future research directions for JSLLDA are outlined, and its potential value in diverse application scenarios is explored.

4.1. Computational Complexity Analysis

In this subsection, the computational complexity of JSLLDA is analyzed. Suppose that m is the dimension of the original data and that the projection P maps it to a low-dimensional space with d features. The major time-consuming computation is divided into two parts, originating from step 1 and step 2 of Algorithm 1. In step 1, the main computational cost comes from the matrix inversion, which has a time complexity of approximately O(m^3). In step 2, the singular value decomposition contributes the second part of the complexity, which is approximately O(md^2). Furthermore, considering that JSLLDA requires T iterations to achieve convergence, the total time complexity of JSLLDA can be estimated as O(T(m^3 + md^2)). In general, m > d, so the total computational complexity is about O(Tm^3).
To provide a straightforward comparison of the computational efficiency of various methods, each method was executed 10 times, and the average runtime in seconds was recorded as the experimental result. These results are presented in Table 4. Notably, all the programs used in the aforementioned experiments were run on a computer equipped with a 3.40 GHz Intel(R) Core(TM) i7-13700KF CPU and 32 GB of RAM, utilizing MATLAB 2021a as the programming platform. As shown in Table 4, the LRPER method exhibited the longest runtime across the three datasets, while LDA had the shortest runtime. The single execution time of JSLLDA was consistently under one second, indicating a significant advantage over comparative algorithms and demonstrating its negligible computational burden in practical applications.

4.2. Robustness Analysis

To comprehensively assess the robustness of the proposed model, zero-mean Gaussian noise with a local variance of 0.5 was added to the first 100 spectral bands of the Salinas dataset, the first 50 bands of the University of Pavia dataset, and the first 40 bands of the Heihe dataset. The performance of the various dimensionality reduction methods was then evaluated on these noise-corrupted datasets, with the results presented in Table 5, Table 6 and Table 7, where the highest value for each metric is shown in bold. This evaluation aimed to demonstrate the stability and effectiveness of the model under challenging noisy conditions.
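The band-wise corruption described above can be reproduced along the following lines. This sketch interprets "local variance of 0.5" as zero-mean Gaussian noise with variance 0.5 applied to the selected bands of a height x width x bands cube; the exact noise routine used by the authors is not specified here, so the function and its defaults are illustrative assumptions:

```python
import numpy as np

def corrupt_bands(cube: np.ndarray, n_bands: int, variance: float = 0.5, seed: int = 0):
    """Add zero-mean Gaussian noise with the given variance to the first n_bands bands
    of an H x W x B hyperspectral cube."""
    rng = np.random.default_rng(seed)
    noisy = cube.astype(float).copy()
    noisy[:, :, :n_bands] += rng.normal(0.0, np.sqrt(variance), noisy[:, :, :n_bands].shape)
    return noisy
```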
Upon a detailed examination and analysis of the experimental results presented in Table 5, Table 6 and Table 7, it is clear that the JSLLDA method proposed in this study significantly outperformed the other comparative approaches in terms of OA, AA, and κ across three datasets with varying levels of data corruption: Salinas, University of Pavia, and Heihe. Specifically, for the Salinas dataset, JSLLDA demonstrated a substantial improvement in OA, ranging from 3.14% to 42.29%, highlighting its superior performance. Similarly, on the University of Pavia dataset, JSLLDA achieved notable results, with an OA enhancement between 0.11% and 35.22%, markedly surpassing the other benchmark algorithms. Furthermore, the evaluation on the Heihe dataset revealed that JSLLDA delivered an OA improvement within the range of 1.56% to 46.26%, further corroborating its stability and superiority across diverse corrupted scenarios. In summary, the JSLLDA method not only demonstrated significant efficiency in hyperspectral data classification tasks but also highlighted its robust capability to manage complex data corruption scenarios.

4.3. Ablation Study

In this subsection, the impact of the regression regularization (RR), joint sparsity (JS), and locality-preserving regularization (LP) terms on the performance of the proposed JSLLDA model was analyzed via ablation experiments. Notably, when JSLLDA was stripped of these regularization terms (RR, JS, and LP), it was reduced to the classical LDA model, which served as the baseline for comparison. We evaluated the dimensionality reduction models using OA, AA, and the κ coefficient. The results of the ablation study are presented in Table 8, Table 9 and Table 10, where ‘✓’ indicates that the model contains the corresponding component and ‘✗’ indicates that it does not.
As is evident from Table 8, the introduction of the RR term enhanced the baseline model in terms of OA, AA, and κ by 3.45%, 3.48%, and 3.68%, respectively. This demonstrates that the introduction of regression allowed the orthogonal matrix to better preserve the label information in the low-dimensional data. Additionally, the orthogonal components generated more discriminative information, which enhanced the model's resistance to external noise.
Furthermore, upon integrating the JS and LP terms into the baseline + RR model (i.e., baseline + RR + JS and baseline + RR + LP), we observed OA improvements of 4.69% and 6.16%, respectively, compared to the baseline + RR model. The JS term accentuates discriminative features in high-dimensional data by assigning larger weights to salient features, while suppressing less important ones, resulting in a more discriminative information extraction. Meanwhile, the LP term preserves the neighborhood relationships in the original data, ensuring that the projection matrix retains the primary energy of the raw data, thereby enhancing the discriminability of the low-dimensional space.
Crucially, the model that incorporated all three regularization terms, RR, JS, and LP (i.e., our proposed JSLLDA), achieved the optimal classification performance, with the OA, AA, and κ coefficient reaching 91.24%, 95.07%, and 90.22%, respectively. These values marked improvements of 8.73%, 9.55%, and 9.65% over the baseline model. Similar trends were observed on the University of Pavia dataset, where the integration of RR, JS, and LP into the baseline model led to significant enhancements in OA, AA, and κ of 13.85%, 16.49%, and 19.29%, respectively. Analogous conclusions held true for the Heihe dataset.
In summary, we can confidently conclude that the inclusion of the RR, JS, and LP terms played a pivotal role in enhancing the discriminative feature extraction capabilities of JSLLDA. Each term contributed uniquely and synergistically to achieving a superior classification performance.

4.4. Discussion

The classification results across the three hyperspectral datasets demonstrated that our proposed method consistently achieved superior performance. Notably, the representative methods RSLDA and JSLLDA, which extend LDA by imposing an L2,1 constraint on the projection matrix, generally outperformed both LDA and TRLDA. This improvement was due to the L2,1 constraint, which enforces row sparsity in the projection matrix, thereby giving greater weight to important features and enhancing the discriminative power of the acquired features. Furthermore, regression-based methods such as LRPER, L2,p-RER, and JSLLDA tend to surpass traditional models like LDA and LPP. This advantage arises because regression-based models account for the interplay between projection and regression, with the orthogonal matrix providing the classifier with more comprehensive feature information, significantly improving performance. Among the algorithms compared, only LPP focuses exclusively on local information, while LDA, RSLDA, TRLDA, LRPER, and L2,p-RER emphasize global information. In contrast, our proposed method integrates both local and global information, which was a crucial factor in JSLLDA's superior classification performance.
Furthermore, the complexity analysis demonstrated the feasibility of our approach in practical scenarios, and the ablation experiments indirectly explain the advantages of the model. Specifically, the RR term establishes a link between projection and regression, aiding the projection in extracting more discriminative low-dimensional features. The LP term ensures that neighborhood information in the high-dimensional space is retained in the low-dimensional space. Additionally, by imposing an L2,1 constraint on the projection matrix, the model captures joint sparse information, allowing the low-dimensional space to preserve more of the essential information from the original data. The sparse error term further enhances the robustness of the model. However, our proposed method still has some shortcomings. During pre-processing, the three-dimensional hyperspectral image data are flattened into a two-dimensional matrix before being input into JSLLDA, which destroys the original spatial structure of the hyperspectral image data and discards some spatial information. In future work, we hope to continue to investigate how to use the high-dimensional data directly as input so as to preserve more of the original information.
Although hyperspectral images are highly effective in capturing the spectral characteristics of objects, they are also prone to interference from external noise. Our proposed robust feature extraction method addresses this issue by mitigating the impact of varying environmental conditions, such as changes in lighting and shadows, thereby enhancing the accuracy of crop classification in agricultural production. Similarly, in mineral exploration, where the presence of irrelevant spectral bands and complex surface cover can introduce noise, our method improves the detection and classification accuracy of mineral types.

5. Conclusions

In this study, a robust LDA-based dimensionality reduction method called JSLLDA was proposed. In JSLLDA, a locality-preserving regularization term was introduced to capture the locality structure of the data, while a regression variant was utilized to fully exploit the interaction between regression and projection. Moreover, the model's robustness was further enhanced by an L2,1-norm constraint imposed on the regression loss, and the interpretability of the projections was improved through an L2,1-norm-constrained projection matrix. Furthermore, an alternating iterative algorithm was designed to optimize the proposed JSLLDA model, and its time complexity was analyzed theoretically and experimentally. To evaluate the performance of JSLLDA, experiments were conducted on three public hyperspectral datasets. To further assess the robustness of JSLLDA, Gaussian noise was added to a number of bands of each dataset for testing. Compared with several related dimensionality reduction methods, the results demonstrated that JSLLDA was more effective in extracting distinctive low-dimensional features for hyperspectral image classification. Finally, to validate the contribution of each component of the model, a series of ablation experiments was performed, confirming that each part was beneficial to JSLLDA.

Author Contributions

Methodology, C.-Y.C.; Software, M.-T.L.; Validation, Y.-J.D. and L.R.; Investigation, M.-T.L. and Y.L.; Writing—original draft, C.-Y.C. and Y.-J.D.; Writing—review and editing, L.R., Y.L. and X.-H.Z.; Visualization, M.-T.L. and Y.L.; Funding acquisition, X.-H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62401203 and Grant 62401546, in part by the Hunan Provincial Key Research and Development Program under Grant 2023NK2011 and Grant 2023NK2002, and in part by the Hunan Provincial Natural Science Foundation of China under Grant 2022JJ40189.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, M.; Luo, Q.; Liu, S. Application of hyperspectral imaging technology in quality inspection of agricultural products. In Proceedings of the 2022 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, ON, Canada, 26–28 August 2022; pp. 369–372. [Google Scholar]
  2. Manohar Kumar, C.V.S.; Nidamanuri, R.R.; Dadhwal, V.K. Sub-pixel discrimination of soil and crop in drone-based hyperspectral imagery. In Proceedings of the 2023 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Athens, Greece, 31 October–2 November 2023; pp. 1–4. [Google Scholar]
  3. Tang, X.J.; Liu, X.; Yan, P.F.; Li, B.X.; Qi, H.Y.; Huang, F. An MLP network based on residual learning for rice hyperspectral data classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6007405. [Google Scholar] [CrossRef]
  4. Liang, J.; Li, P.; Zhao, H.; Han, L.; Qu, M. Forest species classification of UAV hyperspectral image using deep learning. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 7126–7130. [Google Scholar]
  5. Singh, L.; Mutanga, O.; Mafongoya, P.; Peerbhay, K.; Crous, J. Hyperspectral remote sensing for foliar nutrient detection in forestry: A near-infrared perspective. Remote Sens. Appl. Soc. Environ. 2022, 25, 100676. [Google Scholar] [CrossRef]
  6. Nisha, A.; Anitha, A. Current advances in hyperspectral remote sensing in urban planning. In Proceedings of the 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, 11–12 August 2022; pp. 94–98. [Google Scholar]
  7. Ferreira, M.P.; Martins, G.B.; de Almeida, T.M.H.; da Silva Ribeiro, R.; da Veiga Júnior, V.F.; da Silva Rocha Paz, I.; de Siqueira, M.F.; Kurtz, B.C. Estimating aboveground biomass of tropical urban forests with UAV-borne hyperspectral and LiDAR data. Urban For. Urban Green. 2024, 96, 128362. [Google Scholar] [CrossRef]
  8. Li, X.; Wang, L.; Guan, H.; Chen, K.; Zang, Y.; Yu, Y. Urban tree species classification using UAV-based multispectral images and LiDAR point clouds. J. Geovis. Spat. Anal. 2024, 8, 5. [Google Scholar] [CrossRef]
  9. Zhi, Z.; Liu, J.; Liu, J.; Li, A. Geospatial structure and evolution analysis of national terrestrial adjacency network based on complex network. J. Geovis. Spat. Anal. 2024, 8, 12. [Google Scholar] [CrossRef]
  10. Li, Z.; Yang, X.; Meng, D.; Cao, X. An adaptive noisy label-correction method based on selective loss for hyperspectral image-classification problem. Remote Sens. 2024, 16, 2499. [Google Scholar] [CrossRef]
  11. Durojaiye, A.I.; Olorunsogo, S.T.; Adejumo, B.A.; Babawuya, A.; Muhamad, I.I. Deep learning techniques for the exploration of hyperspectral imagery potentials in food and agricultural products. Food Humanit. 2024, 3, 100365. [Google Scholar] [CrossRef]
  12. Ren, L.; Hong, D.; Gao, L.; Sun, X.; Huang, M.; Chanussot, J. Orthogonal subspace unmixing to address spectral variability for hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5501713. [Google Scholar] [CrossRef]
  13. Su, Y.; Jiang, M.; Gao, L.; You, X.; Sun, X.; Li, P. Graph-cut-based node embedding for dimensionality reduction and classification of hyperspectral remote sensing images. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 1720–1723. [Google Scholar]
  14. Uddin, M.P.; Mamun, M.A.; Hossain, M.A. PCA-based feature reduction for hyperspectral remote sensing image classification. IETE Tech. Rev. 2021, 38, 377–396. [Google Scholar] [CrossRef]
  15. Fabiyi, S.D.; Murray, P.; Zabalza, J.; Ren, J. Folded LDA: Extending the linear discriminant analysis algorithm for feature extraction and data reduction in hyperspectral remote sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12312–12331. [Google Scholar] [CrossRef]
  16. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef]
  17. Wang, C.; Fu, W.; Huang, H.; Chen, J. Isomap-based three-dimensional operational modal analysis. Sci. Program. 2020, 2020. [Google Scholar] [CrossRef]
  18. Wang, A.; Zhao, S.; Liu, J.; Yang, J.; Liu, L.; Chen, G. Locality adaptive preserving projections for linear dimensionality reduction. Expert Syst. Appl. 2020, 151, 113352. [Google Scholar] [CrossRef]
  19. Qiao, Z.; Zhou, L.; Huang, J.Z. Sparse linear discriminant analysis with applications to high dimensional low sample size data. IAENG Int. J. Appl. Math. 2009, 39, 48–60. [Google Scholar]
  20. Cui, H.; Deng, Y.; Zhong, R.; Li, W.; Yu, C.; Danyushevsky, L.V.; Belousov, I.; Li, Z.; Wang, H. Determining the ore-forming processes of Dongshengmiao Zn-Pb-Cu deposit: Evidence from the linear discriminant analysis of pyrite geochemistry. Ore Geol. Rev. 2023, 163, 105782. [Google Scholar] [CrossRef]
  21. Ye, J.; Xiong, T. Null space versus orthogonal linear discriminant analysis. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 1073–1080. [Google Scholar]
  22. Zhang, T.; Tao, D.; Yang, J. Discriminative locality alignment. In Proceedings of the Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Proceedings, Part I 10. Springer: Berlin/Heidelberg, Germany, 2008; pp. 725–738. [Google Scholar]
  23. Wang, J.; Wang, L.; Nie, F.; Li, X. A novel formulation of trace ratio linear discriminant analysis. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 5568–5578. [Google Scholar] [CrossRef]
  24. Zhu, F.; Gao, J.; Yang, J.; Ye, N. Neighborhood linear discriminant analysis. Pattern Recognit. 2022, 123, 108422. [Google Scholar] [CrossRef]
  25. Kwak, N. Principal Component Analysis Based on L1-Norm Maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1672–1680. [Google Scholar] [CrossRef]
  26. Wang, R.; Nie, F.; Yang, X.; Gao, F.; Yao, M. Robust 2DPCA With Non-greedy L1-Norm Maximization for Image Analysis. IEEE Trans. Cybern. 2015, 45, 1108–1112. [Google Scholar] [CrossRef]
  27. Lu, Y.; Yuan, C.; Lai, Z.; Li, X.; Wong, W.K.; Zhang, D. Nuclear norm-based 2DLPP for image classification. IEEE Trans. Multimed. 2017, 19, 2391–2403. [Google Scholar] [CrossRef]
  28. Zhang, Z.; Li, F.; Zhao, M.; Zhang, L.; Yan, S. Robust neighborhood preserving projection by nuclear/L2,1-norm regularization for image feature extraction. IEEE Trans. Image Process. 2017, 26, 1607–1622. [Google Scholar] [CrossRef]
  29. Gu, J.; Cai, Q.; Gong, W.; Wang, H. L21-Norm-Based Common Spatial Pattern with Regularized Filters. In Proceedings of the 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 18–20 June 2021; Volume 4, pp. 1746–1751. [Google Scholar]
  30. Nie, F.; Tian, L.; Huang, H.; Ding, C. Non-greedy L21-norm maximization for principal component analysis. IEEE Trans. Image Process. 2021, 30, 5277–5286. [Google Scholar] [CrossRef]
  31. Zhang, T.; Long, C.F.; Deng, Y.J.; Wang, W.Y.; Tan, S.Q.; Li, H.C. Low-rank preserving embedding regression for robust image feature extraction. IET Comput. Vis. 2024, 18, 124–140. [Google Scholar] [CrossRef]
  32. Deng, Y.J.; Yang, M.L.; Li, H.C.; Long, C.F.; Fang, K.; Du, Q. Feature dimensionality reduction with L2,p-norm-based robust embedding regression for classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5509314. [Google Scholar] [CrossRef]
  33. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint L2,1-norms minimization. Adv. Neural Inf. Process. Syst. 2010, 23, 1813–1821. [Google Scholar]
  34. Yang, Y.; Shen, H.T.; Ma, Z.; Huang, Z.; Zhou, X. L2,1-norm regularized discriminative feature selection for unsupervised learning. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two, Barcelona, Catalonia, Spain, 16–22 July 2011; IJCAI’11. AAAI Press: Barcelona, Spain, 2011; pp. 1589–1594. [Google Scholar]
  35. Wen, J.; Fang, X.; Cui, J.; Fei, L.; Yan, K.; Chen, Y.; Xu, Y. Robust sparse linear discriminant analysis. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 390–403. [Google Scholar] [CrossRef]
  36. Tang, C.; Liu, X.; Zhu, X.; Xiong, J.; Li, M.; Xia, J.; Wang, X.; Wang, L. Feature selective projection with low-rank embedding and dual Laplacian regularization. IEEE Trans. Knowl. Data Eng. 2020, 32, 1747–1760. [Google Scholar] [CrossRef]
  37. Li, C.N.; Li, Y.; Meng, Y.H.; Ren, P.W.; Shao, Y.H. L2,1-norm regularized robust and sparse linear discriminant analysis via an alternating direction method of multipliers. IEEE Access 2023, 11, 34250–34259. [Google Scholar] [CrossRef]
  38. Long, C.F.; Wen, Z.D.; Deng, Y.J.; Hu, T.; Liu, J.L.; Zhu, X.H. Locality preserved selective projection learning for rice variety identification based on leaf hyperspectral characteristics. Agronomy 2023, 13, 2401. [Google Scholar] [CrossRef]
  39. Liu, Q.; Huang, X.; Bai, R. Bayesian modal regression based on mixture distributions. Comput. Stat. Data Anal. 2024, 199, 108012. [Google Scholar] [CrossRef]
  40. Lam, B.S.; Choy, S.; Yu, C.K. Linear discriminant analysis with trimmed and difference distribution modeling. Knowl.-Based Syst. 2024, 299, 112093. [Google Scholar] [CrossRef]
  41. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar]
  42. Yang, J.; Yin, W.; Zhang, Y.; Wang, Y. A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J. Imaging Sci. 2009, 2, 569–592. [Google Scholar] [CrossRef]
Figure 1. For the Salinas dataset, the OA varied with λ1 and λ2 when λ3 was set to (a) 0.1, (b) 0.01, (c) 0.001, (d) 0.0001, and (e) 0.00001.
Figure 2. For the University of Pavia dataset, the OA varied with λ1 and λ2 when λ3 was set to (a) 0.1, (b) 0.01, (c) 0.001, (d) 0.0001, and (e) 0.00001.
Figure 3. For the Heihe dataset, the OA varied with λ1 and λ2 when λ3 was set to (a) 0.1, (b) 0.01, (c) 0.001, (d) 0.0001, and (e) 0.00001.
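The OA surfaces in Figures 1–3 come from a grid search over the three regularization parameters. The following is a minimal sketch of such a search, assuming (for illustration only) the same candidate set {0.1, …, 0.00001} for all three parameters; `jsllda_fit` and `evaluate_oa` are hypothetical placeholders for the JSLLDA training routine and the SVM-based evaluation, not functions released with this paper.

```python
# Hypothetical sketch of the parameter grid search behind Figures 1-3.
# `jsllda_fit` and `evaluate_oa` stand in for the actual training and
# SVM-evaluation routines; they are placeholders, not a published API.
import itertools
import numpy as np

lambda_grid = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]

def grid_search(X_train, y_train, X_test, y_test, n_dims=30):
    best_oa, best_params = -np.inf, None
    for l1, l2, l3 in itertools.product(lambda_grid, repeat=3):
        P = jsllda_fit(X_train, y_train, n_dims, l1, l2, l3)   # learned projection matrix
        oa = evaluate_oa(X_train @ P, y_train, X_test @ P, y_test)  # SVM overall accuracy
        if oa > best_oa:
            best_oa, best_params = oa, (l1, l2, l3)
    return best_oa, best_params
```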
Figure 4. The average recognition rates for the different datasets versus the different dimensions of the features extracted by different methods. (a) The Salinas dataset. (b) The University of Pavia dataset. (c) The Heihe dataset.
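A curve such as those in Figure 4 can be produced by truncating a learned projection matrix to its first d columns and classifying the projected features with an SVM. The sketch below assumes an RBF-kernel SVM and a generic projection matrix P; the dimension range and kernel choice are illustrative assumptions, not settings taken from the paper.

```python
# Sketch of the dimensionality sweep in Figure 4: project onto the first d
# columns of a learned projection matrix P, then classify with an RBF SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def oa_versus_dimension(P, X_train, y_train, X_test, y_test, dims=range(5, 55, 5)):
    curve = []
    for d in dims:
        Pd = P[:, :d]                                   # keep the first d projection vectors
        clf = SVC(kernel="rbf").fit(X_train @ Pd, y_train)
        curve.append(accuracy_score(y_test, clf.predict(X_test @ Pd)))
    return list(dims), curve
```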
Figure 5. Classification maps of the different methods for the Salinas dataset. (a) Ground truth, (b) LDA, (c) LPP, (d) RSLDA, (e) TRLDA, (f) LRPER, (g) L2,p-RER, (h) JSLLDA.
Figure 6. Classification maps of the different methods for the University of Pavia dataset. (a) Ground truth, (b) LDA, (c) LPP, (d) RSLDA, (e) TRLDA, (f) LRPER, (g) L2,p-RER, (h) JSLLDA.
Figure 7. Classification maps of the different methods for the Heihe dataset. (a) Ground truth, (b) LDA, (c) LPP, (d) RSLDA, (e) TRLDA, (f) LRPER, (g) L2,p-RER, (h) JSLLDA.
Table 1. Classification accuracies obtained by applying SVM on the features extracted from the Salinas hyperspectral dataset.
Class # | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
1 | 93.42 ± 6.41 | 98.48 ± 0.36 | 98.24 ± 1.81 | 96.74 ± 1.02 | 100 ± 0.00 | 98.85 ± 0.86 | 98.24 ± 1.50
2 | 98.77 ± 0.42 | 99.14 ± 0.20 | 98.53 ± 0.46 | 98.45 ± 0.33 | 99.45 ± 0.15 | 99.49 ± 0.22 | 99.91 ± 0.05
3 | 92.57 ± 2.96 | 94.65 ± 4.30 | 92.72 ± 4.58 | 86.93 ± 6.05 | 77.58 ± 6.60 | 92.79 ± 3.74 | 99.12 ± 0.75
4 | 96.52 ± 2.54 | 97.68 ± 0.99 | 97.48 ± 1.95 | 98.73 ± 0.92 | 97.54 ± 0.07 | 99.23 ± 0.46 | 99.41 ± 0.53
5 | 94.73 ± 2.51 | 96.66 ± 1.49 | 95.53 ± 1.31 | 93.87 ± 2.45 | 95.07 ± 3.21 | 91.83 ± 4.98 | 98.15 ± 0.75
6 | 99.65 ± 0.16 | 99.55 ± 0.12 | 98.83 ± 0.36 | 98.54 ± 1.02 | 98.92 ± 0.43 | 99.69 ± 0.10 | 99.71 ± 0.21
7 | 99.59 ± 0.19 | 99.46 ± 0.11 | 98.57 ± 0.46 | 80.92 ± 5.19 | 99.50 ± 0.13 | 99.77 ± 0.09 | 99.72 ± 0.12
8 | 92.20 ± 5.60 | 87.18 ± 1.97 | 85.23 ± 2.99 | 80.92 ± 5.19 | 82.27 ± 1.81 | 85.37 ± 2.51 | 86.25 ± 1.82
9 | 97.55 ± 1.24 | 98.94 ± 0.79 | 97.79 ± 1.08 | 96.99 ± 0.85 | 96.36 ± 1.41 | 99.27 ± 0.63 | 99.60 ± 0.29
10 | 90.57 ± 2.25 | 84.20 ± 6.80 | 86.61 ± 4.28 | 83.39 ± 5.38 | 84.33 ± 2.05 | 93.24 ± 3.08 | 93.87 ± 2.07
11 | 91.41 ± 1.61 | 91.92 ± 1.54 | 85.87 ± 6.61 | 90.79 ± 3.96 | 33.52 ± 4.22 | 90.02 ± 4.02 | 90.84 ± 3.27
12 | 94.87 ± 2.27 | 96.99 ± 4.01 | 94.92 ± 3.51 | 98.24 ± 1.79 | 78.83 ± 7.24 | 99.44 ± 0.57 | 99.39 ± 0.94
13 | 97.92 ± 0.43 | 97.62 ± 0.51 | 96.66 ± 1.31 | 96.16 ± 3.05 | 94.08 ± 1.99 | 98.51 ± 0.68 | 97.66 ± 1.20
14 | 95.26 ± 1.89 | 93.42 ± 1.29 | 90.49 ± 4.70 | 90.09 ± 4.93 | 84.87 ± 3.35 | 96.96 ± 1.13 | 94.58 ± 1.72
15 | 16.05 ± 12.54 | 48.85 ± 4.67 | 60.33 ± 3.93 | 53.67 ± 7.27 | 64.38 ± 4.87 | 67.75 ± 3.78 | 63.44 ± 3.68
16 | 95.28 ± 3.45 | 96.77 ± 1.78 | 91.29 ± 2.46 | 90.82 ± 0.63 | 90.09 ± 4.09 | 97.70 ± 1.05 | 98.30 ± 0.79
AA | 90.40 ± 0.73 | 92.59 ± 0.58 | 91.82 ± 0.70 | 86.05 ± 0.77 | 90.52 ± 0.86 | 94.37 ± 0.30 | 94.89 ± 0.22
OA | 84.61 ± 0.63 | 88.23 ± 0.64 | 88.64 ± 0.35 | 86.41 ± 0.55 | 86.09 ± 0.37 | 90.95 ± 0.50 | 91.15 ± 0.26
κ | 82.73 ± 0.74 | 86.84 ± 0.72 | 87.32 ± 0.39 | 84.83 ± 0.61 | 84.47 ± 0.42 | 89.91 ± 0.55 | 90.13 ± 0.29
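The per-class accuracies, AA, OA, and κ reported in Tables 1–3 follow the standard recipe of deriving these metrics from the confusion matrix of the SVM predictions. The sketch below shows that recipe; it is a generic illustration, not code taken from the paper.

```python
# Standard computation of per-class accuracy, OA, AA, and Cohen's kappa
# from ground-truth labels and SVM predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def classification_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    per_class = np.diag(cm) / cm.sum(axis=1)   # accuracy of each class (producer's accuracy)
    oa = np.diag(cm).sum() / cm.sum()          # overall accuracy
    aa = per_class.mean()                      # average accuracy over classes
    kappa = cohen_kappa_score(y_true, y_pred)  # Cohen's kappa coefficient
    return per_class, oa, aa, kappa
```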
Table 2. Classification accuracies obtained by applying SVM on the features extracted from the University of Pavia hyperspectral dataset.
Class # | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
1 | 66.89 ± 3.04 | 79.82 ± 4.30 | 84.86 ± 2.01 | 83.79 ± 2.07 | 83.06 ± 1.70 | 90.57 ± 1.97 | 87.01 ± 2.47
2 | 93.83 ± 2.06 | 90.81 ± 1.05 | 94.52 ± 1.83 | 92.17 ± 2.55 | 96.53 ± 1.18 | 93.98 ± 1.24 | 95.56 ± 1.22
3 | 46.97 ± 11.03 | 45.91 ± 3.49 | 49.66 ± 3.54 | 49.15 ± 4.88 | 50.19 ± 1.12 | 57.22 ± 6.11 | 63.16 ± 3.97
4 | 78.95 ± 5.22 | 82.23 ± 1.82 | 78.71 ± 5.21 | 79.27 ± 4.96 | 69.25 ± 5.31 | 81.87 ± 2.87 | 82.53 ± 3.81
5 | 99.78 ± 0.09 | 98.69 ± 0.43 | 98.43 ± 0.84 | 92.35 ± 4.13 | 95.61 ± 5.04 | 98.36 ± 1.28 | 98.47 ± 0.93
6 | 31.94 ± 11.53 | 48.88 ± 4.58 | 53.83 ± 7.57 | 48.76 ± 7.22 | 21.28 ± 3.25 | 70.43 ± 3.66 | 75.77 ± 2.97
7 | 40.62 ± 11.58 | 39.82 ± 8.08 | 68.63 ± 9.15 | 66.37 ± 9.75 | 53.76 ± 5.38 | 57.66 ± 10.74 | 73.02 ± 5.87
8 | 53.95 ± 9.65 | 49.23 ± 6.13 | 84.22 ± 2.58 | 83.10 ± 2.49 | 77.96 ± 2.50 | 74.97 ± 3.90 | 85.87 ± 2.01
9 | 79.30 ± 6.24 | 87.75 ± 2.85 | 99.57 ± 0.27 | 98.98 ± 0.46 | 98.40 ± 1.21 | 97.28 ± 2.68 | 99.54 ± 0.31
AA | 65.80 ± 3.20 | 69.24 ± 0.56 | 79.16 ± 1.24 | 77.10 ± 1.73 | 71.78 ± 0.97 | 80.26 ± 1.20 | 84.55 ± 0.82
OA | 73.79 ± 1.79 | 76.37 ± 0.42 | 83.45 ± 1.00 | 81.30 ± 1.25 | 78.45 ± 1.16 | 85.46 ± 0.51 | 88.03 ± 0.70
κ | 64.20 ± 2.57 | 68.20 ± 0.52 | 77.63 ± 1.37 | 74.78 ± 1.67 | 70.15 ± 1.64 | 80.55 ± 0.66 | 83.99 ± 0.94
Table 3. Classification accuracies obtained by applying SVM on the features extracted from the Heihe hyperspectral dataset.
Class # | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
1 | 73.83 ± 11.04 | 78.56 ± 4.76 | 87.61 ± 2.49 | 85.12 ± 2.45 | 90.86 ± 1.84 | 90.79 ± 2.29 | 91.37 ± 1.52
2 | 73.93 ± 6.91 | 77.56 ± 3.79 | 89.06 ± 6.26 | 98.78 ± 0.31 | 92.41 ± 1.04 | 96.63 ± 1.21 | 96.96 ± 0.88
3 | 64.60 ± 6.46 | 76.05 ± 2.44 | 84.83 ± 2.59 | 78.52 ± 3.52 | 81.84 ± 1.49 | 84.71 ± 0.12 | 89.54 ± 3.40
4 | 65.26 ± 8.21 | 49.41 ± 12.51 | 71.18 ± 5.70 | 77.08 ± 4.18 | 75.75 ± 9.39 | 86.45 ± 1.75 | 85.27 ± 4.90
5 | 83.17 ± 5.10 | 79.69 ± 6.05 | 88.68 ± 4.19 | 84.78 ± 3.56 | 94.25 ± 1.39 | 94.05 ± 1.42 | 96.04 ± 1.01
6 | 55.77 ± 4.31 | 78.82 ± 4.84 | 81.52 ± 5.65 | 83.82 ± 4.00 | 83.88 ± 4.39 | 89.64 ± 1.50 | 90.67 ± 3.50
7 | 61.68 ± 9.87 | 58.19 ± 11.89 | 69.51 ± 5.88 | 75.70 ± 4.98 | 80.49 ± 2.60 | 81.61 ± 9.01 | 91.64 ± 2.50
8 | 86.20 ± 5.68 | 82.11 ± 5.98 | 90.30 ± 3.42 | 78.50 ± 5.48 | 92.22 ± 1.90 | 90.09 ± 1.66 | 92.23 ± 2.85
AA | 70.56 ± 3.75 | 72.55 ± 1.73 | 82.84 ± 2.34 | 86.41 ± 1.55 | 86.17 ± 0.46 | 89.51 ± 1.44 | 92.09 ± 0.98
OA | 71.48 ± 5.69 | 75.57 ± 2.58 | 86.07 ± 2.07 | 86.80 ± 1.25 | 88.36 ± 0.99 | 90.91 ± 0.39 | 92.29 ± 0.68
κ | 63.40 ± 6.46 | 68.22 ± 3.06 | 81.39 ± 2.63 | 82.15 ± 1.65 | 84.30 ± 1.27 | 87.74 ± 0.47 | 89.58 ± 0.89
Table 4. Average calculation time (seconds) for the different dimensionality reduction methods.
Dataset | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
Salinas | 0.018 | 0.054 | 0.2247 | 0.0943 | 20.7665 | 5.0521 | 2.6303
University of Pavia | 0.0056 | 0.0026 | 0.1156 | 0.0455 | 25.6457 | 2.5703 | 1.1550
Heihe | 0.0113 | 0.0036 | 0.1159 | 0.0644 | 1.441 | 0.3218 | 0.2203
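Runtimes of the kind reported in Table 4 are typically obtained by timing only the projection-learning step and averaging over repeated runs. The sketch below illustrates this; `fit_projection` is a hypothetical placeholder for any of the compared dimensionality reduction methods, and the number of runs is an assumed value.

```python
# Illustrative timing harness: average wall-clock time of the
# projection-learning step over several runs. `fit_projection`
# is a placeholder for any of the compared DR methods.
import time
import numpy as np

def average_fit_time(fit_projection, X, y, n_runs=10):
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fit_projection(X, y)                      # learn the projection matrix
        times.append(time.perf_counter() - start)
    return float(np.mean(times))
```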
Table 5. Comparison of the classification results (%) of the SVM classifier after processing the Salinas dataset corrupted with Gaussian noise, using the different dimensionality reduction methods.
Metrics | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
OA | 46.53 ± 1.39 | 76.82 ± 0.93 | 85.63 ± 0.62 | 79.54 ± 3.05 | 82.85 ± 0.67 | 77.69 ± 4.33 | 88.77 ± 0.72
AA | 45.62 ± 1.23 | 77.34 ± 1.35 | 89.79 ± 0.96 | 80.78 ± 4.50 | 84.56 ± 0.74 | 75.37 ± 6.35 | 91.73 ± 0.57
κ | 40.80 ± 1.50 | 74.11 ± 1.03 | 83.95 ± 0.69 | 77.09 ± 3.41 | 80.87 ± 0.74 | 75.15 ± 4.83 | 87.48 ± 0.80
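The robustness tests in Tables 5–7 evaluate the methods on data corrupted with additive Gaussian noise. A minimal sketch of such corruption is shown below; the noise level `sigma` and the fixed random seed are illustrative parameters, as the exact settings are not restated here.

```python
# Additive Gaussian noise corruption of a pixels-by-bands HSI data matrix.
# `sigma` and `seed` are illustrative parameters, not values from the paper.
import numpy as np

def add_gaussian_noise(X, sigma, seed=0):
    rng = np.random.default_rng(seed)
    return X + rng.normal(loc=0.0, scale=sigma, size=X.shape)
```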
Table 6. Comparison of the classification results (%) of the SVM classifier after processing the University of Pavia dataset corrupted with Gaussian noise, using the different dimensionality reduction methods.
Metrics | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
OA | 44.91 ± 2.58 | 74.41 ± 0.22 | 80.02 ± 0.82 | 78.75 ± 1.03 | 61.56 ± 1.50 | 77.19 ± 2.12 | 80.13 ± 0.58
AA | 12.11 ± 1.85 | 68.08 ± 1.25 | 68.87 ± 2.11 | 66.53 ± 2.66 | 43.22 ± 2.85 | 67.05 ± 5.68 | 72.44 ± 2.15
κ | 3.30 ± 6.21 | 65.76 ± 0.27 | 72.59 ± 1.21 | 70.77 ± 1.50 | 46.25 ± 3.06 | 69.39 ± 2.85 | 73.35 ± 0.76
Table 7. Comparison of the classification results (%) of the SVM classifier after processing the Heihe dataset corrupted with Gaussian noise, using the different dimensionality reduction methods.
Metrics | LDA | LPP | RSLDA | TRLDA | LRPER | L2,p-RER | JSLLDA
OA | 40.23 ± 6.16 | 80.37 ± 2.31 | 84.93 ± 1.68 | 78.15 ± 1.76 | 65.13 ± 1.64 | 76.68 ± 2.64 | 86.49 ± 1.28
AA | 41.82 ± 4.56 | 79.34 ± 1.64 | 86.25 ± 1.15 | 76.04 ± 1.45 | 65.49 ± 0.39 | 71.58 ± 1.86 | 83.85 ± 0.82
κ | 28.49 ± 7.09 | 74.25 ± 2.82 | 80.09 ± 2.11 | 71.31 ± 2.13 | 56.08 ± 1.74 | 69.22 ± 3.23 | 82.01 ± 1.63
Table 8. Ablation experiments on the Salinas dataset.
Baseline | RR | JS | LPP-R | OA | AA | κ
 |  |  |  | 82.51 ± 0.97 | 85.52 ± 1.12 | 80.57 ± 1.07
 |  |  |  | 85.96 ± 1.05 | 89.00 ± 1.57 | 84.25 ± 1.21
 |  |  |  | 90.65 ± 0.42 | 94.75 ± 0.13 | 89.58 ± 0.47
 |  |  |  | 88.67 ± 0.60 | 93.13 ± 0.47 | 87.35 ± 0.68
 |  |  |  | 91.24 ± 0.35 | 95.07 ± 0.15 | 90.22 ± 0.39
Table 9. Ablation experiments on the University of Pavia dataset.
Baseline | RR | JS | LPP-R | OA | AA | κ
 |  |  |  | 73.79 ± 1.79 | 65.80 ± 3.20 | 64.20 ± 2.57
 |  |  |  | 76.23 ± 0.75 | 66.28 ± 1.64 | 68.15 ± 1.01
 |  |  |  | 87.19 ± 0.64 | 82.41 ± 1.55 | 82.86 ± 0.86
 |  |  |  | 77.51 ± 0.83 | 68.45 ± 1.39 | 69.79 ± 1.09
 |  |  |  | 87.64 ± 0.94 | 82.29 ± 2.21 | 83.49 ± 1.25
Table 10. Ablation experiments on the Heihe dataset.
Baseline | RR | JS | LPP-R | OA | AA | κ
 |  |  |  | 71.48 ± 5.69 | 70.56 ± 3.75 | 63.40 ± 6.46
 |  |  |  | 72.58 ± 3.86 | 67.33 ± 4.20 | 64.21 ± 4.82
 |  |  |  | 90.45 ± 2.33 | 90.91 ± 0.97 | 87.16 ± 2.98
 |  |  |  | 71.57 ± 4.47 | 66.67 ± 2.60 | 63.06 ± 5.21
 |  |  |  | 92.32 ± 0.67 | 92.16 ± 0.96 | 89.63 ± 0.89
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
