Article

Kernel Joint Sparse Representation Based on Self-Paced Learning for Hyperspectral Image Classification

Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China
* Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(9), 1114; https://doi.org/10.3390/rs11091114
Submission received: 15 March 2019 / Revised: 24 April 2019 / Accepted: 5 May 2019 / Published: 9 May 2019

Abstract

By means of joint sparse representation (JSR) and kernel representation, kernel joint sparse representation (KJSR) models can effectively model the intrinsic nonlinear relations of hyperspectral data and better exploit the spatial neighborhood structure to improve the classification performance of hyperspectral images. However, the performance of KJSR is greatly affected by noisy or inhomogeneous pixels around the central testing pixel in the spatial domain. Motivated by the idea of self-paced learning (SPL), this paper proposes a self-paced KJSR (SPKJSR) model that adaptively learns weights and sparse coefficient vectors for different neighboring pixels in the kernel-based feature space. The SPL strategy learns a weight that indicates the difficulty of each feature pixel within a spatial neighborhood. By assigning small weights to unimportant or complex pixels, the negative effect of inhomogeneous or noisy neighboring pixels can be suppressed, making SPKJSR substantially more robust. Experimental results on the Indian Pines and Salinas hyperspectral data sets demonstrate that SPKJSR is much more effective than traditional JSR and KJSR models.

1. Introduction

Hyperspectral sensors simultaneously acquire digital images in many narrow and contiguous spectral bands across a wide range of the spectrum. The resulting hyperspectral data contains both detailed spectral characteristics and rich spatial structure information from a scene. Exploiting the rich spectral and spatial information, hyperspectral remote sensing has been successfully applied in many fields [1,2,3], such as agriculture, the environment, the military, etc. In most of these applications, pixels in a scene need to be classified [2,4,5]. The commonly used classifiers include spectral-based classifiers such as nearest neighbor (NN) and support vector machine (SVM), and spatial–spectral classifiers such as mathematical morphological (MM) and Markov random field (MRF) methods [2]. Compared with the spectral-based classifiers that only use spectral information, spatial–spectral classifiers use both the spectral and spatial information of a hyperspectral image (HSI) and can produce much better classification results [5,6,7,8]. Now, spatial–spectral classifiers have become the research focus [2,5].
Under the definition of spatial dependency systems [5], spatial–spectral classification methods can be approximately divided into three categories [5]: preprocessing-based classification, postprocessing-based classification, and integrated classification. In the preprocessing-based methods, spatial features are first extracted from the HSI and then used for the classification. In postprocessing-based methods, a pixel-wise classification is first conducted, and then spatial information is used to refine the previous pixel-wise classification results. The integrated methods simultaneously use the spectral and spatial information to generate an integrated classifier. In this paper, we focus on the integrated methods.
The joint sparse representation (JSR)-based classification method is a typical integrated spatial–spectral classifier [9,10,11,12,13,14,15,16,17,18]. JSR pursues a joint representation of spatial neighboring pixels in a linear and sparse representation framework. If neighboring pixels are similar, jointly representing them can improve the reliability of sparse support estimation [17,19]. The success of the JSR model mainly lies in the following two factors: (1) joint representation: the neighborhood pixel set is consistent, that is, pixels in a spatial neighborhood are highly similar or belong to the same class; (2) linear representation: the linear representation framework in the JSR model is coincident with the hyperspectral data characteristics. However, in practice, a spatial neighborhood is likely to contain inhomogeneous pixels [15], such as background, noise, and pixels from other classes, which dramatically affects the joint representation. In addition, hyperspectral data usually exhibit nonlinear characteristics [20,21,22,23], so the linear representation framework may also be unreasonable.
To alleviate the effect of noisy or inhomogeneous neighboring pixels, an intuitive and natural idea is to introduce a weight vector to discriminate neighboring pixels. The weight can be predefined as a non-local weight [10] or dynamically updated in a nearest regularized JSR (NRJSR) model [11]. Rather than weighting neighboring pixels, another method is to construct an adaptive spatial neighborhood system such that inhomogeneous neighboring pixels are precluded. The adaptive neighborhood can be constructed based on traditional image segmentation techniques [12], the superpixel segmentation method [13], and the shape-adaptive region extraction method [14]. Both the weighting method and adaptive neighborhood method improve the consistency of neighborhood pixel sets such that the joint representation framework is effective in most cases. However, all these methods are linear methods and show performance deficiency when data are not linearly separable.
Hyperspectral data are considered inherently nonlinear [23]. The nonlinearity is attributed to multiple scattering between solar radiation and targets, the variations in sun–canopy–sensor geometry, the presence of a nonlinear attenuating medium (water), the heterogeneity of pixel composition, etc. [23]. To cope with the nonlinear problem, kernel-based JSR (KJSR) methods have been proposed [19,24,25,26]. KJSR mainly includes two steps: projecting the original data into a high-dimensional feature space using a nonlinear map and then performing JSR in the feature space. Because the data in the feature space are linearly separable, the JSR model can be directly applied; in practice, only a kernel function between samples needs to be computed. Most works on the KJSR method concentrate on the design of the kernel function, including the Gaussian kernel [24], a spatial–spectral derivative-aided kernel [25], and a multi-feature composite kernel [26]. Compared with the original JSR, the use of kernel methods yields a significant performance improvement [24]. However, these kernel-based JSR methods assume that neighboring pixels have equal importance and do not consider the differences between neighboring pixels in the feature space. This is obviously unreasonable when pixels in the spatial neighborhood are inhomogeneous.
To simultaneously improve the joint representation and linear representation abilities of the JSR model, we adopt a self-paced learning (SPL) strategy [27,28,29] to select feature-neighboring pixels and propose a self-paced KJSR (SPKJSR) model. In detail, a self-paced regularization term is incorporated into the KJSR model, and the resulting regularized KJSR model can simultaneously learn the weights and sparse coefficient vectors for different feature-neighboring pixels. The optimized weight vector indicates the importance of neighboring pixels. The inhomogeneous or noisy pixels are automatically assigned small weights and hence their negative effects are eliminated.
The rest of this paper is organized as follows. Section 2 introduces JSR and KJSR and then describes our proposed method. Section 3 provides the experimental results. Section 4 gives a discussion. Finally, Section 5 draws a conclusion.

2. Self-Paced Kernel Joint Sparse Representation

2.1. Self-Paced Learning (SPL)

Given a training set $\{x_i, y_i\}_{i=1}^N$, the goal of a learning algorithm is to learn a function $f(\cdot, \theta)$ (or simply the parameter $\theta$) by solving the following minimization problem:

$$\min_\theta E(\theta) = \sum_{i=1}^N L\big(y_i, f(x_i, \theta)\big), \qquad (1)$$

where $y_i$ is the true value, $f(x_i, \theta)$ is the estimated value, and $L$ is a loss function.
In model (1), all training samples are used to learn the parameter θ without considering the differences between training samples. When there exist noisy training samples or outliers, the learned function f or model parameter θ will be inaccurate.
Rather than using all training samples for learning, curriculum learning (CL) and self-paced learning (SPL) adopt a gradual learning strategy that selects samples from easy to complex for training [17,27,28,29,30,31]. CL and SPL share a similar conceptual learning paradigm but differ in how the curriculum is derived. A curriculum determines a sequence of training samples ranked in ascending order of learning difficulty. In CL, the curriculum is predefined by prior knowledge and is thus problem-specific and lacks generalization. To alleviate this problem, SPL incorporates curriculum updating into the process of model optimization [28]: it jointly optimizes the learning objective and the curriculum such that the learned model and curriculum are consistent.
For model (1), SPL simultaneously optimizes the model parameter $\theta$ and the weight vector $\omega = [\omega_1, \dots, \omega_N]^T$ by employing a self-paced regularizer:

$$\min_{\theta, \omega} E(\theta, \omega) = \sum_{i=1}^N \big[\omega_i L\big(y_i, f(x_i, \theta)\big) + h(\lambda, \omega_i)\big], \qquad (2)$$

where $\omega_i$ is a weight that indicates the difficulty of sample $x_i$, $h(\lambda, \omega)$ is a self-paced function, and $\lambda$ is a "model age" parameter that controls how many samples are included in training. The parameters $\theta$ and $\omega$ in model (2) can be solved by an alternating optimization strategy [28].
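To make the alternation concrete, here is a minimal, self-contained sketch (our own illustration, not part of the paper) that applies the simplest binary self-paced regularizer $h(\lambda, \omega) = -\lambda\omega$, whose weight update has the closed form $\omega_i = 1$ if $L_i < \lambda$ and $\omega_i = 0$ otherwise, to a toy one-parameter least-squares problem with injected outliers:

```python
import numpy as np

def spl_fit(x, y, lam, n_iters=10):
    """Toy SPL for 1-D least squares y ~ theta * x, using the binary
    self-paced regularizer h(lam, w) = -lam * w, whose closed-form
    weight update is w_i = 1 if loss_i < lam else 0."""
    theta = 0.0
    w = np.ones_like(y)
    for _ in range(n_iters):
        # Step 1: fix the weights, solve the weighted least squares for theta.
        theta = np.sum(w * x * y) / np.sum(w * x * x)
        # Step 2: fix theta, update the weights from the per-sample losses.
        loss = (y - theta * x) ** 2
        w = (loss < lam).astype(float)
    return theta, w

rng = np.random.default_rng(0)
x = rng.uniform(1, 2, 50)
y = 3.0 * x + 0.01 * rng.standard_normal(50)
y[:5] += 10.0                      # inject 5 outliers
theta, w = spl_fit(x, y, lam=1.0)
# theta is close to the true slope 3.0; the five outliers receive weight 0.
```

With the outliers down-weighted to zero, the recovered slope stays close to the true value even though an ordinary least-squares fit over all samples would be biased upward.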

2.2. Joint Sparse Representation (JSR)

Given a set of $N$ training samples $x_1, x_2, \dots, x_N$ with $x_i \in \mathbb{R}^B$, we can construct a training dictionary $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{B \times N}$. For a testing pixel $z$, we extract its neighboring pixels in a $w \times w$ spatial window centered at $z$ and denote them as $Z = [z_1, z_2, \dots, z_T] \in \mathbb{R}^{B \times T}$ (with $z_1 = z$), where $T$ is the number of pixels in the neighborhood.
In the framework of sparse representation [32], each neighboring pixel $z_k$ can be represented over the training dictionary $X$ with a sparse coefficient vector $\alpha_k$. Because neighboring pixels in a small spatial window are highly similar, they have similar representations under the training dictionary. By further assuming that the positions of the nonzero elements in the sparse coefficient vectors are the same and combining the representations of the neighboring pixels, the following JSR model is obtained [9]:

$$Z = [z_1, z_2, \dots, z_T] = [X\alpha_1, X\alpha_2, \dots, X\alpha_T] = XS, \qquad (3)$$

where $S = [\alpha_1, \alpha_2, \dots, \alpha_T] \in \mathbb{R}^{N \times T}$ is a matrix with only $K$ nonzero rows. Sketches of sparse representation and joint sparse representation are shown in Figure 1.
The optimization problem for solving the matrix $S$ in (3) is:

$$\hat{S} = \arg\min_S \|Z - XS\|_F^2 \quad \text{subject to} \quad \|S\|_{\mathrm{row},0} \le K, \qquad (4)$$

where $\|\cdot\|_F$ denotes the Frobenius norm, $\|S\|_{\mathrm{row},0}$ denotes the number of nonzero rows of $S$, and $K$ is an upper bound on the sparsity level. The solution of (4) can be obtained by the simultaneous orthogonal matching pursuit (SOMP) algorithm [9,33].
Once the row-sparse matrix $\hat{S}$ is obtained, the testing pixel $z$ is assigned to the class with the minimal reconstruction residual:

$$\mathrm{Class}(z) = \arg\min_{c=1,\dots,C} r_c(z), \qquad (5)$$

where the $c$-th residual is $r_c(z) = \|Z - X_{:,\Omega_c}\hat{S}_{\Omega_c,:}\|_F^2$, and $\Omega_c \subset \{1, 2, \dots, N\}$ is the index set of dictionary atoms belonging to the $c$-th class.

2.3. Kernel Joint Sparse Representation (KJSR)

In order to exploit the intrinsic nonlinear properties of the hyperspectral imagery, pixels in the original space are projected to a high-dimensional feature space by a nonlinear map, and a kernel-based JSR (KJSR) model is obtained by performing the JSR on the feature space [24].
Denoting the nonlinear map by $\phi$, the KJSR model assumes that the mapped neighboring pixels in the feature space, $\phi(z_1), \phi(z_2), \dots, \phi(z_T)$, are also similar and hence share a similar sparsity pattern. The KJSR model is represented as

$$Z_\phi = [\phi(z_1), \phi(z_2), \dots, \phi(z_T)] = [X_\phi\alpha_1, X_\phi\alpha_2, \dots, X_\phi\alpha_T] = X_\phi S, \qquad (6)$$

where $X_\phi = [\phi(x_1), \phi(x_2), \dots, \phi(x_N)] \in \mathbb{R}^{D \times N}$ and $D$ is the dimensionality of the feature space.
The optimization problem for solving the row-sparse matrix $S$ is

$$\hat{S} = \arg\min_S \|Z_\phi - X_\phi S\|_F^2 \quad \text{subject to} \quad \|S\|_{\mathrm{row},0} \le K, \qquad (7)$$

which can be solved by the kernel SOMP (KSOMP) algorithm [24], as shown in Algorithm 1.
Algorithm 1 KSOMP
Input: Training dictionary $X = [x_1, x_2, \dots, x_N]$, neighborhood pixel matrix $Z = [z_1, z_2, \dots, z_T]$, sparsity level $K$, kernel function $\kappa$, regularization parameter $\gamma$.
  Compute $K_X = \kappa(X, X)$ and $K_{X,Z} = \kappa(X, Z)$.
  Set the index set $\Lambda_0 = \arg\max_{i=1,\dots,N} \|(K_{X,Z})_{i,:}\|_2$, and let $k = 1$.
  Run the following steps until convergence:
   1. Compute the correlation coefficient matrix:
$$C = K_{X,Z} - (K_X)_{:,\Lambda_{k-1}}\big((K_X)_{\Lambda_{k-1},\Lambda_{k-1}} + \gamma I\big)^{-1}(K_{X,Z})_{\Lambda_{k-1},:}.$$
   2. Identify the optimal atom and find the corresponding index:
$$\lambda_k = \arg\max_{i=1,\dots,N} \|C_{i,:}\|_2.$$
   3. Enlarge the index set: $\Lambda_k = \Lambda_{k-1} \cup \{\lambda_k\}$.
   4. Update the iteration number: $k = k + 1$, and go to Step 1.
Output: Index set $\Lambda = \Lambda_{k-1}$ and coefficient matrix $\hat{S} = \big((K_X)_{\Lambda,\Lambda} + \gamma I\big)^{-1}(K_{X,Z})_{\Lambda,:}$.
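Algorithm 1 can be sketched in Python as follows (an illustrative reimplementation under a Gaussian kernel assumption; the variable names and the fixed iteration count are ours). Note that it touches only the kernel matrices $K_X$ and $K_{X,Z}$, never the feature map $\phi$ itself:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) between the columns of A and B."""
    d2 = (np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2.0 * A.T @ B)
    return np.exp(-d2 / (2.0 * sigma**2))

def ksomp(K_X, K_XZ, K_sparsity, gamma=1e-3):
    """Kernel SOMP (Algorithm 1), using only K_X = k(X, X) of shape (N, N)
    and K_XZ = k(X, Z) of shape (N, T). Runs a fixed number of greedy steps
    instead of a convergence test, for simplicity."""
    Lam = [int(np.argmax(np.linalg.norm(K_XZ, axis=1)))]
    for _ in range(K_sparsity - 1):
        G = K_X[np.ix_(Lam, Lam)] + gamma * np.eye(len(Lam))
        # Correlation of every atom with the residual, expressed via kernels.
        C = K_XZ - K_X[:, Lam] @ np.linalg.solve(G, K_XZ[Lam, :])
        score = np.linalg.norm(C, axis=1)
        score[Lam] = 0.0                      # do not reselect atoms
        Lam.append(int(np.argmax(score)))
    G = K_X[np.ix_(Lam, Lam)] + gamma * np.eye(len(Lam))
    S_Lam = np.linalg.solve(G, K_XZ[Lam, :])  # coefficients on the set Lam
    return Lam, S_Lam

# Demo: the pixel is a noisy copy of dictionary atom 5, which is selected first.
rng = np.random.default_rng(2)
X_dict = rng.standard_normal((10, 40))
Z_nb = X_dict[:, [5]] + 0.01 * rng.standard_normal((10, 1))
K_X = gaussian_kernel(X_dict, X_dict, sigma=3.0)
K_XZ = gaussian_kernel(X_dict, Z_nb, sigma=3.0)
Lam, S_Lam = ksomp(K_X, K_XZ, K_sparsity=3)   # Lam[0] == 5
```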

2.4. Self-Paced Kernel Joint Sparse Representation (SPKJSR)

In the KJSR model, it is assumed that the transformed neighboring pixels $\phi(z_1), \phi(z_2), \dots, \phi(z_T)$ have equal importance in the sparse representation. However, this assumption is usually unreasonable: there exist differences between the original spatial neighboring pixels, and the nonlinear map $\phi$ may further enlarge them. The spatial inconsistency mainly appears in two aspects: (1) when a target pixel lies near the boundary of an object, its spatial neighborhood usually contains inhomogeneous pixels, such as background pixels or pixels from other classes; (2) when a target pixel lies in the center of a large homogeneous region, all neighboring pixels come from the same class. Nevertheless, the spatial distances between the neighboring pixels and the central target pixel differ, and neighboring pixels far from the center usually contribute little to the classification of the central target pixel, especially when the neighborhood window is large.
When there exists a spatial inconsistency between neighboring pixels, the feature representations of neighboring pixels in the kernel space are also dissimilar. Considering the distinctiveness of feature neighboring pixels, we employ a self-paced learning strategy to select important feature neighboring pixels and propose a self-paced KJSR (SPKJSR) model for the classification of HSIs.
For convenience, we first rewrite the matrix-norm-based objective function of model (7) in vector-norm form:

$$L(S) = \|Z_\phi - X_\phi S\|_F^2 = \sum_{t=1}^T \|\phi(z_t) - X_\phi\alpha_t\|_2^2. \qquad (8)$$
Based on the self-paced learning strategy, the SPKJSR model simultaneously optimizes a weight vector and the sparse coefficient matrix for the feature neighboring pixels:

$$\{\hat{S}, \hat{\omega}\} = \arg\min_{S, \omega} \sum_{t=1}^T \big[\omega_t\|\phi(z_t) - X_\phi\alpha_t\|_2^2 + h(\lambda, \omega_t)\big] \quad \text{s.t.} \quad \|S\|_{\mathrm{row},0} \le K, \qquad (9)$$

where $\omega = [\omega_1, \omega_2, \dots, \omega_T]^T$ is the weight vector of the feature neighboring pixels and $h(\lambda, \omega)$ is the self-paced function. Here, the self-paced learning strategy is used to select important feature neighboring pixels for the joint sparse representation.
The optimization problem (9) can be solved by an alternating optimization strategy. With $\omega$ fixed, (9) becomes:

$$\hat{S} = \arg\min_S \sum_{t=1}^T \omega_t\|\phi(z_t) - X_\phi\alpha_t\|_2^2 = \arg\min_S \|(Z_\phi - X_\phi S)W^{1/2}\|_F^2 \quad \text{s.t.} \quad \|S\|_{\mathrm{row},0} \le K, \qquad (10)$$

where $W = \mathrm{diag}(\omega_1, \dots, \omega_T)$ is the diagonal weight matrix.
Because $\omega_t \ge 0$, we have $\|SW^{1/2}\|_{\mathrm{row},0} \le \|S\|_{\mathrm{row},0} \le K$. Denoting $\tilde{S} = SW^{1/2}$ and $\tilde{Z}_\phi = Z_\phi W^{1/2}$, model (10) becomes

$$\hat{\tilde{S}} = \arg\min_{\tilde{S}} \|\tilde{Z}_\phi - X_\phi\tilde{S}\|_F^2 \quad \text{s.t.} \quad \|\tilde{S}\|_{\mathrm{row},0} \le K. \qquad (11)$$
As the feature map $\phi$ is unknown, we cannot compute $Z_\phi W^{1/2}$ directly. Fortunately, the KSOMP algorithm only requires the kernel matrix between $\tilde{Z}_\phi$ and $X_\phi$:

$$\tilde{K}_{X,Z} = \langle X_\phi, \tilde{Z}_\phi \rangle = \langle X_\phi, Z_\phi W^{1/2} \rangle = \langle X_\phi, Z_\phi \rangle W^{1/2} = \kappa(X, Z)W^{1/2} = K_{X,Z}W^{1/2}. \qquad (12)$$
By employing the KSOMP algorithm (Algorithm 1), we obtain the sparse coefficient matrix

$$\hat{S} = [\hat{\alpha}_1, \hat{\alpha}_2, \dots, \hat{\alpha}_T] \qquad (13)$$

and further compute the approximation error of each neighboring pixel:

$$\ell_t = \|\omega_t^{1/2}\phi(z_t) - X_\phi\hat{\alpha}_t\|_2^2 = \omega_t\kappa(z_t, z_t) - 2\omega_t^{1/2}\hat{\alpha}_t^T\kappa(X, z_t) + \hat{\alpha}_t^T K_X\hat{\alpha}_t. \qquad (14)$$
Based on these approximation errors, we can describe the complexity of each feature neighboring pixel $\phi(z_t)$ and determine the corresponding weight via self-paced learning:

$$\hat{\omega}_t = \arg\min_{0 \le \omega_t \le 1} \omega_t\ell_t + h(\lambda, \omega_t). \qquad (15)$$
To solve for the weight $\omega_t$ in (15), a self-paced function $h$ needs to be specified. The self-paced function can be binary, linear, logarithmic, or a mixture function [28]. Here, we take the mixture function as an example to show how to solve for the weight. The mixture function is defined as

$$h(\lambda_1, \lambda_2, \omega_t) = -\zeta\log\Big(\omega_t + \frac{\zeta}{\lambda_1}\Big), \quad \zeta = \frac{\lambda_1\lambda_2}{\lambda_1 - \lambda_2}, \quad 0 < \lambda_2 < \lambda_1. \qquad (16)$$
Substituting (16) into (15) gives

$$\hat{\omega}_t = \arg\min_{0 \le \omega_t \le 1} \omega_t\ell_t - \zeta\log\Big(\omega_t + \frac{\zeta}{\lambda_1}\Big). \qquad (17)$$
Setting the derivative with respect to $\omega_t$ to zero, we obtain the mixture weight:

$$\omega_t = \begin{cases} 1, & \ell_t \le \lambda_2; \\ 0, & \ell_t \ge \lambda_1; \\ \zeta(\lambda_1 - \ell_t)/(\lambda_1\ell_t), & \lambda_2 < \ell_t < \lambda_1. \end{cases} \qquad (18)$$
Based on the weight in (18), the pixels fall into three classes: (1) "easy" pixels with small loss ($\ell_t \le \lambda_2$), which receive weight 1; (2) "complex" pixels with large loss ($\ell_t \ge \lambda_1$), which receive weight 0; and (3) "moderate" pixels whose loss lies between $\lambda_2$ and $\lambda_1$. The "complex" pixels are thus excluded from the JSR model. When $\lambda_1$ is small, only a few "easy" pixels are used; as $\lambda_1$ increases, more neighboring pixels are regarded as "moderate" and are included in the model.
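The three-regime weight in (18) is straightforward to implement. The sketch below (our own illustration) vectorizes the rule; note that it is continuous at both thresholds, since $\zeta(\lambda_1 - \lambda_2)/(\lambda_1\lambda_2) = 1$:

```python
import numpy as np

def mixture_weight(loss, lam1, lam2):
    """Closed-form self-paced weight of Eq. (18) for the mixture regularizer:
    1 for 'easy' pixels (loss <= lam2), 0 for 'complex' pixels (loss >= lam1),
    and a value in (0, 1) in between. Requires 0 < lam2 < lam1."""
    zeta = lam1 * lam2 / (lam1 - lam2)
    loss = np.asarray(loss, dtype=float)
    return np.where(loss <= lam2, 1.0,
           np.where(loss >= lam1, 0.0,
                    zeta * (lam1 - loss) / (lam1 * loss)))

w = mixture_weight([0.1, 0.5, 2.0], lam1=1.0, lam2=0.2)
# w = [1.0, 0.25, 0.0]: an easy, a moderate, and a complex pixel.
```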
From (18), we can see that the parameters $\lambda_1$ and $\lambda_2$ are related to the losses of the neighboring pixels. As the magnitude of the losses differs in each iteration, setting these parameters directly is difficult. Considering that $\lambda_1$ and $\lambda_2$ control the size of the active feature neighboring pixel set, we instead set them via the number of feature neighboring pixels selected in each iteration of the self-paced learning process. For a square neighborhood with $T$ pixels, we first define a pixel number sequence $\{T_1, T_2, \dots, T_{max}\}$, where $T_i$ denotes the number of feature neighboring pixels selected in the $i$-th iteration [17] and $T_{max} = T$. In the $i$-th iteration, the loss vector of the neighboring pixels $\ell^{(i)}$ is sorted in ascending order, producing the sorted losses $a^{(i)}$. We then set $\lambda_1^{(i)} = a^{(i)}(T_i)$ with $T_i = (k_1 + (i-1)\delta)T$, and $\lambda_2^{(i)} = a^{(i)}(\tilde{T}_i)$ with $\tilde{T}_i = (k_2 + (i-1)\delta)T$. The parameters $\lambda_1$ and $\lambda_2$ are thus determined by the fixed initialization parameters $k_1$, $k_2$ and the step size $\delta$. In the experiments, they are set as $k_1 = 0.5$, $k_2 = 0.2$, and $\delta = 0.05$.
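The schedule above can be written as a small helper (our own sketch; how the fractional index $T_i$ is rounded to an integer is an implementation detail the text leaves open):

```python
import numpy as np

def pace_parameters(losses, i, T, k1=0.5, k2=0.2, delta=0.05):
    """Set (lam1, lam2) for self-paced iteration i (1-based) from the losses:
    lam1 is the T_i-th smallest loss with T_i = (k1 + (i-1)*delta) * T, and
    lam2 the corresponding quantile for k2, as described in the text."""
    a = np.sort(np.asarray(losses, dtype=float))

    def quantile_index(frac):
        # Round the fractional count to an integer index, clipped to [1, T].
        return min(T, max(1, int(round(frac * T))))

    lam1 = a[quantile_index(k1 + (i - 1) * delta) - 1]
    lam2 = a[quantile_index(k2 + (i - 1) * delta) - 1]
    return lam1, lam2

losses = np.arange(1.0, 82.0)          # T = 81 pixels with losses 1..81
lam1, lam2 = pace_parameters(losses, i=1, T=81)
# lam1 = 40.0 (Python rounds 0.5 * 81 = 40.5 half-to-even), lam2 = 16.0.
```

With each iteration, both thresholds move up the sorted loss curve by a fraction $\delta$ of the neighborhood, so progressively harder pixels are admitted.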
By alternately updating the coefficient matrix $\hat{S}$ and the weight vector $\hat{\omega}$ according to (13) and (15) (or its closed form (18)), the objective function in (9) decreases, and the algorithm finally converges.
When the coefficient matrix $\hat{S}$ and weight vector $\hat{\omega}$ are obtained, the reconstruction residual for each class can be computed:

$$r_c(z) = \|Z_\phi\hat{W}^{1/2} - (X_\phi)_{:,\Omega_c}\hat{S}_{\Omega_c,:}\|_F^2 = \sum_{t=1}^T \|\hat{\omega}_t^{1/2}\phi(z_t) - (X_\phi)_{:,\Omega_c}\hat{S}_{\Omega_c,t}\|_2^2 = \sum_{t=1}^T \big[\kappa(z_t, z_t)\hat{\omega}_t - 2\hat{\omega}_t^{1/2}\hat{S}_{\Omega_c,t}^T(K_{X,Z})_{\Omega_c,t} + \hat{S}_{\Omega_c,t}^T(K_X)_{\Omega_c,\Omega_c}\hat{S}_{\Omega_c,t}\big]. \qquad (19)$$
Finally, the testing pixel z is assigned to the class with the minimal reconstruction residual.
Algorithm 2 shows the implementation process of SPKJSR.
Algorithm 2 SPKJSR
Input: Training dictionary $X = [x_1, x_2, \dots, x_N]$, neighborhood pixel matrix $Z = [z_1, z_2, \dots, z_T]$, sparsity level $K$, kernel function $\kappa$, initialization parameters $k_1$, $k_2$, and step size $\delta$.
  Compute $K_X = \kappa(X, X)$ and $K_{X,Z} = \kappa(X, Z)$.
  Initialization: $W^{(0)} = I$, and let $k = 1$.
  1. Solve the coefficient matrix $\hat{S}$ and weight matrix $\hat{W}$ by running the following steps until convergence:
   1.1 Solve the sparse coefficient matrix $S^{(k)} = [\alpha_1^{(k)}, \dots, \alpha_T^{(k)}]$ by the KSOMP algorithm:
$$S^{(k)} = \arg\min_S \|Z_\phi(W^{(k-1)})^{1/2} - X_\phi S\|_F^2, \quad \text{s.t.} \quad \|S\|_{\mathrm{row},0} \le K.$$
   1.2 Compute the error of each neighboring pixel:
$$\ell_t^{(k)} = \|(\omega_t^{(k-1)})^{1/2}\phi(z_t) - X_\phi\alpha_t^{(k)}\|_2^2 = \omega_t^{(k-1)}\kappa(z_t, z_t) - 2(\omega_t^{(k-1)})^{1/2}(\alpha_t^{(k)})^T\kappa(X, z_t) + (\alpha_t^{(k)})^T K_X\alpha_t^{(k)}.$$
   1.3 Compute the sorted loss vector $a^{(k)}$, and update the model age:
$$\lambda_1^{(k)} = a^{(k)}\big((k_1 + (k-1)\delta)T\big), \quad \lambda_2^{(k)} = a^{(k)}\big((k_2 + (k-1)\delta)T\big).$$
   1.4 Estimate the self-paced weights:
$$\omega_t^{(k)} = \arg\min_{0 \le \omega_t \le 1} \omega_t\ell_t^{(k)} + h(\lambda_1^{(k)}, \lambda_2^{(k)}, \omega_t).$$
   1.5 Update the weight matrix: $W^{(k)} = \mathrm{diag}(\omega_1^{(k)}, \dots, \omega_T^{(k)})$.
   1.6 Update the number of iterations: $k = k + 1$, and go to Step 1.1.
  2. Compute the reconstruction error of each class:
$$r_c(z) = \|Z_\phi\hat{W}^{1/2} - (X_\phi)_{:,\Omega_c}\hat{S}_{\Omega_c,:}\|_F^2.$$
  3. Classify the testing pixel $z$:
$$\mathrm{Class}(z) = \arg\min_{c=1,\dots,C} r_c(z).$$
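Steps 2 and 3 of Algorithm 2, the weighted kernel reconstruction residual and the final argmin, can be sketched as follows. The toy check uses a linear kernel and a two-atom dictionary so the correct class has an exactly zero residual; this is our own illustration, not the authors' implementation:

```python
import numpy as np

def class_residuals(K_X, K_XZ, kzz, S, w, labels):
    """Weighted kernel reconstruction residual r_c(z) for every class.
    K_X: (N, N) dictionary kernel, K_XZ: (N, T) dictionary-neighborhood
    kernel, kzz[t] = k(z_t, z_t), S: (N, T) coefficient matrix,
    w: (T,) self-paced weights, labels: (N,) class label of each atom."""
    classes = np.unique(labels)
    r = np.zeros(len(classes))
    for ci, c in enumerate(classes):
        m = labels == c
        Sc = S[m, :]                                  # coefficients of class c
        term1 = np.sum(kzz * w)                       # sum_t k(z_t, z_t) w_t
        term2 = -2.0 * np.sum(np.sqrt(w) * np.sum(Sc * K_XZ[m, :], axis=0))
        term3 = np.sum(Sc * (K_X[np.ix_(m, m)] @ Sc))
        r[ci] = term1 + term2 + term3
    return r

# Toy check with a linear kernel: z equals the class-0 atom exactly.
X_dict = np.array([[1.0, 0.0], [0.0, 1.0]])   # atoms e1 (class 0), e2 (class 1)
Z_nb = np.array([[1.0], [0.0]])               # single neighboring pixel z = e1
K_X, K_XZ = X_dict.T @ X_dict, X_dict.T @ Z_nb
r = class_residuals(K_X, K_XZ, kzz=np.array([1.0]),
                    S=np.array([[1.0], [0.0]]), w=np.array([1.0]),
                    labels=np.array([0, 1]))
# r = [0.0, 1.0]; the argmin correctly selects class 0.
```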

3. Experimental Results

3.1. Data Sets

To evaluate the performance of the proposed method for HSI classification, we use the following two benchmark hyperspectral data sets (available online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes):
(1) Indian Pines: This image scene has a size of 145 × 145 pixels and 220 spectral bands, where 200 spectral bands are used in the experiments by removing 20 noisy bands from the original data. The image has 16 different ground-truth classes. The false color composite image and ground-truth map are shown in Figure 2.
(2) Salinas: This image scene has a size of 512 × 217 pixels and 204 spectral bands. The data contains 16 ground-truth classes and a total of 54,129 labeled samples. The false color composite image and ground-truth map are shown in Figure 3.
In real applications, the number of labeled samples is usually very limited, which makes HSI classification a challenging problem. To show the performance of our proposed method in the case of small sample sizes, we randomly choose 1% of the samples from each class to form the training set, and all remaining samples constitute the testing set. The selected training and testing samples for these two data sets are shown in Table 1 and Table 2, respectively.

3.2. Model Setting

We compare the proposed method with seven related classification methods: (1) support vector machine (SVM); (2) SVM with a spatial–spectral composite kernel (SVMCK) [6]; (3) $\ell_1$-norm-based sparse representation classification (SRC) [32]; (4) $\ell_0$-norm-based orthogonal matching pursuit for sparse representation classification (OMP) [9]; (5) JSR [9]; (6) non-local weighted JSR (WJSR) [10]; and (7) KJSR [24]. Among these methods, SVM is a spectral classifier and SVMCK is the corresponding spatial–spectral classifier. SRC and OMP are sparse-based spectral classifiers, and JSR, WJSR, KJSR, and SPKJSR are the corresponding spatial–spectral classifiers. The experiments are carried out using MATLAB R2017a and run on a computer with a 3.50 GHz Intel(R) Xeon(R) E5-1620 CPU, 32 GB RAM, and a Windows 7 operating system.
In the experiments, the class-specific accuracy (CA), overall accuracy (OA), average accuracy (AA), and kappa coefficient ( κ ) on the testing set are recorded to assess the performance of different classification methods. The CA is the ratio of correctly classified pixels to the total number of pixels for each individual class. The OA is the ratio of correctly classified pixels to the total number of pixels. The AA is the mean of the CAs. The kappa coefficient quantifies the agreement of classification. A statistical McNemar’s test is used to evaluate the statistical significance of differences between the overall accuracy of different algorithms [3,15,34,35]. The Z value of McNemar’s test is defined as:
$$Z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}, \qquad (20)$$
where $f_{12}$ indicates the number of testing samples classified incorrectly by classifier 1 and correctly by classifier 2, and $f_{21}$ has the dual meaning. At the common 5% significance level, the difference in accuracy between classifiers 1 and 2 is statistically significant if $|Z| > 1.96$. Here, we set the proposed SPKJSR algorithm as classifier 1 and the comparison algorithm as classifier 2. Thus, $Z < -1.96$ indicates that SPKJSR is statistically more accurate than the comparison algorithm.
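The accuracy measures and the McNemar $Z$ statistic defined above are easy to compute from predicted label vectors; the sketch below is our own helper code, not part of the paper:

```python
import numpy as np

def accuracy_metrics(y_true, y_pred):
    """Overall accuracy (OA), average accuracy (AA), per-class accuracies
    (CA), and the kappa coefficient from true/predicted label vectors."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    oa = np.mean(y_true == y_pred)
    ca = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    # Kappa corrects the OA for chance agreement, estimated from the marginals.
    n = len(y_true)
    pe = sum(np.sum(y_true == c) * np.sum(y_pred == c) for c in classes) / n**2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, ca.mean(), ca, kappa

def mcnemar_z(y_true, pred1, pred2):
    """McNemar's Z: f12 counts samples classifier 1 gets wrong but
    classifier 2 gets right; f21 the reverse. Z < -1.96 means classifier 1
    is significantly more accurate at the 5% level."""
    y_true, pred1, pred2 = map(np.asarray, (y_true, pred1, pred2))
    f12 = np.sum((pred1 != y_true) & (pred2 == y_true))
    f21 = np.sum((pred1 == y_true) & (pred2 != y_true))
    return (f12 - f21) / np.sqrt(f12 + f21)
```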
For SVM, the Gaussian radial basis function kernel is used, and the related parameters are set as $C = 10000$ and $\sigma = 0.5$. For SVMCK, a weighted summation spatial–spectral kernel is used [6]. The parameters for SVMCK are the penalty parameter $C$, the width of the spectral kernel $\sigma_w$, the width of the spatial kernel $\sigma_s$, the combination coefficient $\mu$, and the neighborhood size $T$. These parameters are empirically set as $C = 10000$, $\sigma_w = 2$, $\sigma_s = 0.25$, $\mu = 0.8$, and $T = 81$. As recommended in [9,10], the number of neighboring pixels and the sparsity level for JSR and WJSR on the Indian Pines data set are set as $T = 81$ and $K = 30$. In [24], the number of neighboring pixels for KJSR is also set as $T = 81$, and sparsity levels $K \ge 30$ give similar results. For a fair comparison, we set the number of neighboring pixels and the sparsity level to $T = 81$ and $K = 30$ on the Indian Pines data set for the four JSR-based methods (i.e., JSR, WJSR, KJSR, and SPKJSR). Considering that the Salinas image consists of large homogeneous regions, a large spatial window of size $9 \times 9$ ($T = 81$) is used, and the sparsity level is also set as $K = 30$ [15,17].

3.3. Classification Results

In the experiment, we randomly choose the training and testing samples five times and record the mean accuracies and standard deviations over the five runs for the different algorithms. The classification results for the Indian Pines and Salinas data sets are shown in Table 3 and Table 4, respectively. From the classification results, we can see that
(1) The proposed SPKJSR provides the best classification results for both data sets. Compared with the original KJSR, SPKJSR improves the OA and κ coefficient by about 4% in Indian Pines, and by about 2% in Salinas. This demonstrates that the self-paced learning strategy can eliminate the negative effect of inhomogeneous pixels and select effective feature neighboring pixels for the joint sparse representation, which helps to improve the classification performance. Moreover, the improvement in performance when using the proposed model over the rest of the methods is statistically significant because the Z values for McNemar’s test in Table 5 are much smaller than −1.96.
(2) Compared with JSR, KJSR improves the OA by 9% and 6% in the Indian Pines and Salinas data sets, respectively. It demonstrates that there exists nonlinear relations between samples in these two hyperspectral data sets, and the nonlinear kernel can effectively describe the intrinsic nonlinear structure relations.
(3) For the CA, SPKJSR obtains the best results for most of the classes. In particular, SPKJSR shows good performance for the classes with large numbers of samples, such as Classes 2, 10, 11, and 12 of Indian Pines, and Classes 8, 9, 10, and 15 of Salinas. In addition, our SPKJSR also provides satisfactory performance for the classes with very limited samples, such as Classes 4, 7, 9, 10, and 15 of Indian Pines. In contrast, traditional sparse-based methods almost fail for these classes due to the lack of dictionary atoms. In the JSR model, the training dictionary and neighborhood pixel matrix are two key factors. From the mechanism of sparse representation, when a class has a large number of samples, the number of dictionary atoms corresponding to this class is also large, so the representation ability and classification performance on large classes are usually better. Our proposed SPKJSR employs a self-paced learning strategy to adaptively select important pixels and discard inhomogeneous pixels, which refines the structure of the neighborhood feature pixel matrix and improves the reliability of joint representation.
Figure 4 and Figure 5 show the classification maps obtained by the different algorithms. The spectral-based methods (i.e., SVM, SRC, and OMP) produce poor results with considerable "salt & pepper" noise because they only use spectral information and ignore the spatial neighborhood. The spatial–spectral methods greatly improve the classification performance. Compared with KJSR, our SPKJSR shows relatively better results. In particular, on the Salinas data set, traditional sparse-based methods, such as SRC, OMP, JSR, and WJSR, largely misclassify Class No. 15 "Vineyard untrained" as Class No. 8 "Grape untrained". These two classes are spatially adjacent, so their spectral characteristics are very similar, which makes them very difficult to separate. Nevertheless, our SPKJSR can still discriminate the subtle differences between these two classes and provides desirable results.
The computational times for different methods when reaching their optimal classification performances are reported in Table 6. Because sparse models need to make predictions for each test sample separately, they are usually more time-consuming. As an iterative algorithm, the proposed SPKJSR is relatively slower than JSR and KJSR.
In the previous experiments, we have evaluated the effectiveness of our proposed SPKJSR in the case of small sample sizes (i.e., 1% of samples per class for training). Here, we further show the results for a large number of training samples and analyze the effect of the number of training samples. For this purpose, we draw 1%, 2%, 3%, 4%, and 5% of labeled samples from each class to form the training set and then evaluate the performance of different algorithms on the corresponding testing set. Figure 6 and Figure 7 show the OA of different methods versus the ratio of training samples per class for the Indian Pines and Salinas data sets, respectively. It can clearly be seen that SPKJSR provides consistently better results than the other algorithms with different numbers of training samples. Compared with the original linear JSR method, kernel-based KJSR and SPKJSR show a great performance improvement, which demonstrates the effectiveness of the kernel method for describing the intrinsic nonlinear relations of hyperspectral data.
As shown in Algorithm 2, SPKJSR employs an alternating iterative strategy to update the sparsity coefficient matrix and the weight matrix. We now investigate the effect of the number of iterations. Figure 8 and Figure 9 show the classification OA versus the number of iterations on the Indian Pines and Salinas data sets, respectively. It can be seen that the proposed algorithm achieves relatively good performance after 3 iterations.

4. Discussion

From the previous results in Table 3 and Table 4, we can see that KJSR dramatically improves the original JSR. This demonstrates that kernel representation is effective for modeling the nonlinear relation between hyperspectral pixels [24]. In addition, the improvement of WJSR over JSR demonstrates that pixels in a spatial neighborhood have differences, and the weighted strategy can alleviate the negative effect of noisy or inhomogeneous pixels in the neighborhood [10]. Due to the presence of noisy neighboring pixels, directly performing kernel representation on these noisy pixels may not be robust. Our proposed SPKJSR model provides a robust kernel representation to improve the robustness of the joint representation of neighboring pixels in the feature space.
Although many existing JSR methods try to eliminate noisy or inhomogeneous pixels in the spatial neighborhood in a preprocessing step by means of image segmentation techniques [12,13,14], they usually suffer from inaccurate identification of inhomogeneous neighboring pixels: the identification is based on spectral similarity, which is often unreliable due to the spectral variation of spatially adjacent pixels. Rather than deleting inhomogeneous pixels in advance, defining a robust metric has proven effective for suppressing their effect on the JSR model [15,17,18]; robust metrics such as correntropy-based metrics [15] and maximum likelihood estimation-based metrics [18,36] have been used. Nevertheless, in the feature space, robust metrics on kernel representation for the KJSR model are still lacking. To the best of our knowledge, this paper is the first to provide a robust kernel representation for the KJSR model.
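Notably, the representation residual on which such a robust weighting relies never needs the explicit feature map phi: it can be evaluated purely through kernel values, as k(x,x) - 2 a^T k_D(x) + a^T K_DD a. A minimal sketch assuming an RBF kernel (function names are illustrative, not from the paper):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between the columns of A and B."""
    d2 = np.sum(A**2, 0)[:, None] - 2 * A.T @ B + np.sum(B**2, 0)[None, :]
    return np.exp(-gamma * d2)

def feature_space_residual(D, x, a, gamma=1.0):
    """||phi(x) - Phi(D) a||^2 evaluated only through kernel values,
    so noisy pixels can be detected and down-weighted in feature space."""
    k_xx = rbf(x[:, None], x[:, None], gamma)[0, 0]   # k(x, x)
    k_Dx = rbf(D, x[:, None], gamma)[:, 0]            # k(d_i, x)
    K_DD = rbf(D, D, gamma)                           # k(d_i, d_j)
    return k_xx - 2 * a @ k_Dx + a @ K_DD @ a
```

If x equals a dictionary atom and a is the corresponding one-hot coefficient vector, the residual is exactly zero, as expected.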

5. Conclusions

We have proposed a self-paced kernel joint sparse representation (SPKJSR) model for hyperspectral image classification. The proposed SPKJSR mainly improves the representation of feature neighboring pixels in the traditional kernel joint sparse representation (KJSR) model. By introducing a self-paced learning strategy, SPKJSR simultaneously optimizes the sparsity coefficient matrix and the weight matrix for feature neighboring pixels in a regularized KJSR framework. The optimized weights indicate the difficulty of neighboring pixels: inhomogeneous or noisy neighboring pixels are assigned a small or zero weight and hence have very limited effect on the joint sparse representation. Thus, SPKJSR is much more robust and accurate than the traditional KJSR method. To validate the effectiveness of the proposed method, we have performed experiments on two benchmark hyperspectral data sets: Indian Pines and Salinas. Experimental results have shown that the proposed SPKJSR provides consistently better results than other existing joint sparse representation methods. In particular, when only 1% of samples per class are used for training, SPKJSR improves the overall accuracy of KJSR by about 3.5% on the Indian Pines data set and 1.8% on the Salinas data set. In the future, we will consider using different kinds of kernels for the KJSR model, such as spatial–spectral-based kernels and Log-Euclidean kernels.

Author Contributions

Conceptualization, S.H., J.P., Y.F., and L.L.; Methodology, S.H., J.P., Y.F., and L.L.; Software, S.H. and J.P.; Validation, J.P. and Y.F.; Formal analysis, S.H., J.P., and Y.F.; Investigation, S.H., J.P., and L.L.; Resources, J.P. and L.L.; Writing—original draft preparation, S.H. and J.P.; Writing—review and editing, S.H., J.P., Y.F., and L.L.; Supervision, J.P., Y.F., and L.L.

Funding

This research was funded by the National Natural Science Foundation of China under Grant Nos. 61871177 and 11771130.

Acknowledgments

The authors would like to thank D. Landgrebe for providing the Indian Pines data set and J. Anthony Gualtieri for providing the Salinas data set.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSI | Hyperspectral image
JSR | Joint sparse representation
KJSR | Kernel joint sparse representation
SPL | Self-paced learning
SPKJSR | Self-paced kernel joint sparse representation
SOMP | Simultaneous orthogonal matching pursuit
KSOMP | Kernel simultaneous orthogonal matching pursuit

References

  1. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36. [Google Scholar] [CrossRef]
  2. Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 2013, 101, 652–675. [Google Scholar] [CrossRef]
  3. Mallinis, G.; Galidaki, G.; Gitas, I. A comparative analysis of EO-1 Hyperion, QuickBird and Landsat TM imagery for fuel type mapping of a typical Mediterranean landscape. Remote Sens. 2014, 6, 1684–1704. [Google Scholar] [CrossRef]
  4. Zhou, Y.; Peng, J.; Chen, C.L.P. Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1082–1095. [Google Scholar] [CrossRef]
  5. He, L.; Li, J.; Liu, C.; Li, S. Recent advances on spectral-spatial hyperspectral image classification: An overview and new guidelines. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1579–1597. [Google Scholar] [CrossRef]
  6. Camps-Valls, G.; Gómez-Chova, L.; Muñoz-Marí, J.; Vila-Francés, J.; Calpe-Maravilla, J. Composite kernels for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2006, 3, 93–97. [Google Scholar] [CrossRef]
  7. Zhou, Y.; Peng, J.; Chen, C.L.P. Extreme learning machine with composite kernels for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2351–2360. [Google Scholar] [CrossRef]
  8. Peng, J.; Zhou, Y.; Chen, C.L.P. Region-kernel-based support vector machines for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4810–4824. [Google Scholar] [CrossRef]
  9. Chen, Y.; Nasrabadi, N.; Tran, T. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985. [Google Scholar] [CrossRef]
  10. Zhang, H.; Li, J.; Huang, Y.; Zhang, L. A nonlocal weighted joint sparse representation classification method for hyperspectral imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2057–2066. [Google Scholar]
  11. Chen, C.; Chen, N.; Peng, J. Nearest regularized joint sparse representation for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2016, 13, 424–428. [Google Scholar] [CrossRef]
  12. Zou, J.; Li, W.; Huang, X.; Du, Q. Classification of hyperspectral urban data using adaptive simultaneous orthogonal matching pursuit. J. Appl. Remote Sens. 2014, 8, 085099. [Google Scholar] [CrossRef]
  13. Fang, L.; Li, S.; Kang, X.; Benediktsson, J.A. Spectral-spatial classification of hyperspectral images with a superpixel-based discriminative sparse model. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4186–4201. [Google Scholar] [CrossRef]
  14. Fu, W.; Li, S.; Fang, L.; Benediktsson, J.A. Hyperspectral image classification via shape-adaptive joint sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 556–567. [Google Scholar] [CrossRef]
  15. Peng, J.; Du, Q. Robust joint sparse representation based on maximum correntropy criterion for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7152–7164. [Google Scholar] [CrossRef]
  16. Wu, C.; Du, B.; Zhang, L. Hyperspectral anomalous change detection based on joint sparse representation. ISPRS J. Photogramm. Remote Sens. 2018, 146, 137–150. [Google Scholar] [CrossRef]
  17. Peng, J.; Sun, W.; Du, Q. Self-paced joint sparse representation for the classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1183–1194. [Google Scholar] [CrossRef]
  18. Peng, J.; Li, L.; Tang, Y. Maximum likelihood estimation based joint sparse representation for the classification of hyperspectral remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2018. [Google Scholar] [CrossRef]
  19. Hu, S.; Xu, C.; Peng, J.; Xu, Y.; Tian, L. Weighted kernel joint sparse representation for hyperspectral image classification. IET Image Process. 2019, 13, 254–260. [Google Scholar] [CrossRef]
  20. Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362. [Google Scholar] [CrossRef]
  21. Peng, J.; Chen, H.; Zhou, Y.; Li, L. Ideal regularized composite kernel for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1563–1574. [Google Scholar] [CrossRef]
  22. Heylen, R.; Parente, M.; Gader, P. A review of nonlinear hyperspectral unmixing methods. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1844–1868. [Google Scholar] [CrossRef]
  23. Han, H.; Goodenough, D.G. Investigation of Nonlinearity in Hyperspectral Imagery Using Surrogate Data Methods. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2840–2847. [Google Scholar] [CrossRef]
  24. Chen, Y.; Nasrabadi, N.; Tran, T. Hyperspectral image classification via kernel sparse representation. IEEE Trans. Geosci. Remote Sens. 2013, 51, 217–231. [Google Scholar] [CrossRef]
  25. Wang, J.; Jiao, L.; Liu, H.; Yang, S.; Liu, F. Hyperspectral image classification by spatial-spectral derivative-aided kernel joint sparse representation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2485–2500. [Google Scholar] [CrossRef]
  26. Zhang, E.; Zhang, X.; Jiao, L.; Liu, H.; Wang, S.; Hou, B. Weighted multifeature hyperspectral image classification via kernel joint sparse representation. Neurocomputing 2016, 178, 71–86. [Google Scholar] [CrossRef]
  27. Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In International Conference on Machine Learning (ICML); ACM: New York, NY, USA, 2009; pp. 41–48. [Google Scholar]
  28. Jiang, Y.; Meng, D.; Zhao, Q.; Shan, S.; Hauptmann, A. Self-paced curriculum learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), Austin, TX, USA, 25–30 January 2015; pp. 2694–2700. [Google Scholar]
  29. Meng, D.; Zhao, Q.; Jiang, L. A theoretical understanding of self-paced learning. Inf. Sci. 2017, 414, 319–328. [Google Scholar] [CrossRef]
  30. Tang, Y.; Wang, X.; Harrison, A.P.; Lu, L.; Xiao, J.; Summers, R.M. Attention-guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs. In International Workshop on Machine Learning in Medical Imaging (MLMI); Springer: Cham, Switzerland, 2018; pp. 249–258. [Google Scholar]
  31. Wu, Y.; Tian, Y. Training agent for first-person shooter game with actor-critic curriculum learning. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
  32. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [PubMed]
  33. Tropp, J.A.; Gilbert, A.C.; Strauss, M.J. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 2006, 86, 572–588. [Google Scholar] [CrossRef]
  34. Foody, G.M. Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy. Photogramm. Eng. Remote Sens. 2004, 70, 627–633. [Google Scholar] [CrossRef]
  35. Fauvel, M.; Benediktsson, J.A.; Chanussot, J.; Sveinsson, J.R. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Trans. Geosci. Remote Sens. 2008, 46, 3804–3814. [Google Scholar] [CrossRef]
  36. Yang, M.; Zhang, L.; Shiu, S.; Zhang, D. Robust Kernel Representation With Statistical Local Features for Face Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 900–912. [Google Scholar] [CrossRef]
Figure 1. Sketches of sparse representation and joint sparse representation. (a) Sparse Representation; (b) Joint Sparse Representation.
Figure 2. Indian Pines data set. (a) RGB composite image; (b) ground-truth map.
Figure 3. Salinas data set. (a) RGB composite image; (b) ground-truth map.
Figure 4. Indian Pines: (a) Ground-truth map; classification maps obtained by (b) SVM (65.78%), (c) SVMCK (74.94%); (d) SRC (67.28%); (e) OMP (60.89%); (f) JSR (70.18%); (g) WJSR (71.48%); (h) KJSR (80.11%); and (i) SPKJSR (83.61%).
Figure 5. Salinas: (a) Ground-truth map; classification maps obtained by (b) SVM (90.09%); (c) SVMCK (93.77%); (d) SRC (89.16%); (e) OMP (84.75%); (f) JSR (88.37%); (g) WJSR (88.92%); (h) KJSR (95.03%); and (i) SPKJSR (96.88%).
Figure 6. OA versus the number of training samples on Indian Pines.
Figure 7. OA versus the number of training samples on Salinas.
Figure 8. OA versus the number of iterations on Indian Pines.
Figure 9. OA versus the number of iterations on Salinas.
Table 1. The number of training and testing samples for each class in the Indian Pines data set (only 1% of labeled samples per class for training, with a total of 115 training samples and 10,251 testing samples).
No | Class | #Train | #Test | No | Class | #Train | #Test
1 | Alfalfa | 3 | 51 | 9 | Oats | 3 | 17
2 | Corn-notill | 14 | 1420 | 10 | Soybean-notill | 10 | 958
3 | Corn-mintill | 8 | 826 | 11 | Soybean-mintill | 25 | 2443
4 | Corn | 3 | 231 | 12 | Soybean-clean | 6 | 608
5 | Grass-pasture | 5 | 492 | 13 | Wheat | 3 | 209
6 | Grass-trees | 7 | 740 | 14 | Woods | 13 | 1281
7 | Grass-pasture-mowed | 3 | 23 | 15 | Buildings-Grass-Trees-Drives | 4 | 376
8 | Hay-windrowed | 5 | 484 | 16 | Stone-Steel-Towers | 3 | 92
Table 2. The number of training and testing samples for each class in the Salinas data set (only 1% of labeled samples per class for training, with a total of 543 training samples and 53,586 testing samples).
No | Class | #Train | #Test | No | Class | #Train | #Test
1 | Weeds1 | 20 | 1989 | 9 | Soil | 62 | 6141
2 | Weeds2 | 37 | 3689 | 10 | Corn | 33 | 3245
3 | Fallow | 20 | 1956 | 11 | Lettuce 4wk | 11 | 1057
4 | Fallow plow | 14 | 1380 | 12 | Lettuce 5wk | 19 | 1908
5 | Fallow smooth | 27 | 2651 | 13 | Lettuce 6wk | 9 | 907
6 | Stubble | 40 | 3919 | 14 | Lettuce 7wk | 11 | 1059
7 | Celery | 36 | 3543 | 15 | Vineyard untrained | 73 | 7195
8 | Grapes untrained | 113 | 11158 | 16 | Vineyard trellis | 18 | 1789
Table 3. Overall, average, and individual class accuracies and κ statistics in the form of mean ± standard deviation for the Indian Pines data set. The best results are highlighted in bold typeface.
Class | SVM | SVMCK | SRC | OMP | JSR | WJSR | KJSR | SPKJSR
1 | 66.01 ± 5.99 | 66.01 ± 17.7 | 66.01 ± 8.16 | 58.82 ± 5.19 | 79.08 ± 5.99 | 77.12 ± 22.0 | 88.89 ± 4.93 | 96.73 ± 4.08
2 | 59.27 ± 1.89 | 79.13 ± 9.15 | 52.37 ± 5.55 | 44.51 ± 7.65 | 55.73 ± 2.11 | 58.10 ± 6.09 | 74.79 ± 6.20 | 79.53 ± 5.03
3 | 47.94 ± 2.00 | 69.49 ± 9.28 | 48.38 ± 7.04 | 35.27 ± 1.46 | 41.49 ± 20.3 | 41.81 ± 18.0 | 63.64 ± 8.78 | 67.96 ± 8.35
4 | 46.32 ± 6.80 | 57.72 ± 26.4 | 50.07 ± 3.28 | 33.77 ± 6.75 | 36.22 ± 1.64 | 34.34 ± 3.19 | 73.30 ± 11.2 | 83.41 ± 6.30
5 | 82.45 ± 6.67 | 62.60 ± 10.0 | 65.51 ± 10.4 | 73.37 ± 7.22 | 73.10 ± 7.07 | 75.34 ± 8.98 | 79.06 ± 7.27 | 79.05 ± 5.38
6 | 89.50 ± 4.17 | 89.10 ± 5.66 | 93.02 ± 3.69 | 89.05 ± 5.63 | 98.11 ± 0.95 | 98.65 ± 0.62 | 98.38 ± 0.62 | 90.50 ± 4.03
7 | 95.65 ± 4.35 | 92.75 ± 6.64 | 85.51 ± 5.02 | 85.51 ± 2.51 | 71.01 ± 13.3 | 88.41 ± 10.0 | 91.30 ± 11.5 | 100.0 ± 0
8 | 89.26 ± 3.96 | 83.47 ± 12.0 | 94.15 ± 1.47 | 92.29 ± 2.07 | 97.86 ± 1.14 | 99.66 ± 0.12 | 99.93 ± 0.12 | 99.93 ± 0.12
9 | 98.04 ± 3.40 | 96.08 ± 3.40 | 96.08 ± 6.79 | 74.51 ± 18.9 | 45.10 ± 8.98 | 74.51 ± 23.8 | 72.55 ± 3.40 | 94.12 ± 5.88
10 | 50.45 ± 21.0 | 65.27 ± 11.6 | 54.07 ± 10.9 | 47.88 ± 10.5 | 31.94 ± 5.53 | 31.91 ± 3.55 | 75.16 ± 12.3 | 79.71 ± 12.7
11 | 67.18 ± 8.04 | 73.77 ± 2.71 | 71.73 ± 4.37 | 62.44 ± 6.98 | 80.91 ± 3.73 | 83.63 ± 3.33 | 86.64 ± 4.75 | 89.78 ± 2.72
12 | 35.85 ± 4.94 | 59.37 ± 14.1 | 45.34 ± 14.3 | 38.32 ± 6.77 | 56.36 ± 15.5 | 57.84 ± 14.0 | 51.10 ± 16.7 | 61.51 ± 9.14
13 | 96.97 ± 1.93 | 81.50 ± 18.0 | 99.20 ± 0.28 | 96.81 ± 2.64 | 98.40 ± 1.38 | 99.52 ± 0.83 | 99.84 ± 0.28 | 92.18 ± 3.44
14 | 86.78 ± 4.11 | 92.53 ± 4.69 | 94.87 ± 1.65 | 89.69 ± 2.38 | 98.83 ± 0.79 | 99.53 ± 0.47 | 99.79 ± 0.16 | 99.32 ± 1.03
15 | 23.14 ± 8.98 | 45.92 ± 10.8 | 13.83 ± 4.18 | 16.58 ± 4.85 | 44.50 ± 16.6 | 39.36 ± 20.1 | 16.49 ± 15.1 | 49.56 ± 5.49
16 | 89.13 ± 4.74 | 100.0 ± 0 | 88.77 ± 5.99 | 88.04 ± 6.05 | 96.01 ± 1.26 | 98.91 ± 1.09 | 86.59 ± 3.32 | 81.16 ± 7.63
OA | 65.78 ± 1.56 | 74.94 ± 3.55 | 67.28 ± 1.35 | 60.89 ± 1.47 | 70.18 ± 0.24 | 71.48 ± 0.64 | 80.11 ± 1.86 | 83.61 ± 1.48
AA | 70.25 ± 24.0 | 75.92 ± 15.8 | 69.93 ± 24.6 | 64.18 ± 25.5 | 69.04 ± 24.7 | 72.41 ± 25.2 | 78.59 ± 21.7 | 84.03 ± 14.5
κ | 61.02 ± 1.90 | 71.49 ± 4.05 | 62.65 ± 1.82 | 55.44 ± 1.83 | 65.60 ± 0.37 | 66.98 ± 0.86 | 77.13 ± 2.32 | 81.19 ± 1.87
Table 4. Overall, average, and individual class accuracies and κ statistics in the form of mean ± standard deviation for the Salinas data set. The best results are highlighted in bold typeface.
Class | SVM | SVMCK | SRC | OMP | JSR | WJSR | KJSR | SPKJSR
1 | 99.06 ± 0.45 | 96.61 ± 5.82 | 99.41 ± 0.48 | 99.14 ± 0.30 | 100.0 ± 0 | 100.0 ± 0 | 100.0 ± 0 | 100.0 ± 0
2 | 98.48 ± 0.33 | 99.86 ± 0.18 | 98.74 ± 0.76 | 98.24 ± 1.62 | 99.96 ± 0.04 | 99.95 ± 0.05 | 100.0 ± 0 | 100.0 ± 0
3 | 94.24 ± 4.49 | 98.91 ± 1.58 | 93.39 ± 1.39 | 88.05 ± 3.04 | 92.09 ± 3.16 | 92.04 ± 2.98 | 99.97 ± 0.06 | 100.0 ± 0
4 | 98.21 ± 1.22 | 94.86 ± 1.62 | 98.86 ± 0.55 | 94.23 ± 4.47 | 96.45 ± 3.01 | 96.47 ± 3.10 | 99.08 ± 1.41 | 96.52 ± 2.02
5 | 97.14 ± 1.36 | 97.51 ± 1.60 | 98.16 ± 0.83 | 91.56 ± 0.58 | 88.32 ± 2.12 | 93.19 ± 2.67 | 99.71 ± 0.30 | 99.56 ± 0.08
6 | 99.56 ± 0.21 | 98.94 ± 1.72 | 99.72 ± 0.17 | 99.82 ± 0.03 | 99.95 ± 0.07 | 99.96 ± 0.01 | 100.0 ± 0 | 99.91 ± 0.05
7 | 99.27 ± 0.10 | 99.08 ± 0.47 | 99.28 ± 0.20 | 99.63 ± 0.12 | 99.89 ± 0.15 | 99.91 ± 0.03 | 100.0 ± 0 | 100.0 ± 0
8 | 86.28 ± 4.77 | 93.07 ± 0.72 | 85.07 ± 1.44 | 78.08 ± 1.44 | 94.54 ± 0.65 | 94.89 ± 0.82 | 96.97 ± 1.54 | 97.66 ± 0.84
9 | 97.81 ± 0.99 | 97.33 ± 2.67 | 98.01 ± 0.36 | 97.65 ± 0.67 | 99.31 ± 0.63 | 99.26 ± 0.68 | 100.0 ± 0 | 100.0 ± 0
10 | 87.98 ± 8.27 | 93.71 ± 1.63 | 90.92 ± 2.82 | 88.02 ± 0.96 | 93.97 ± 1.32 | 94.96 ± 1.98 | 96.48 ± 1.80 | 97.41 ± 1.74
11 | 92.21 ± 4.90 | 88.46 ± 9.11 | 93.38 ± 4.99 | 92.75 ± 4.98 | 94.04 ± 3.45 | 95.05 ± 2.79 | 99.27 ± 0.55 | 99.84 ± 0.05
12 | 99.86 ± 0.24 | 99.86 ± 0.20 | 99.95 ± 0 | 86.48 ± 4.06 | 84.26 ± 10.7 | 85.17 ± 10.8 | 100.0 ± 0 | 99.84 ± 0.10
13 | 97.43 ± 0.23 | 98.53 ± 1.40 | 97.57 ± 0.22 | 89.12 ± 2.61 | 78.61 ± 3.06 | 88.31 ± 3.25 | 98.97 ± 0.50 | 99.34 ± 0.22
14 | 92.63 ± 1.18 | 96.51 ± 0.96 | 92.79 ± 1.13 | 90.97 ± 0.85 | 95.62 ± 1.56 | 94.65 ± 0.73 | 99.40 ± 0.38 | 98.62 ± 1.66
15 | 62.53 ± 4.31 | 76.72 ± 3.94 | 54.91 ± 2.18 | 44.80 ± 2.88 | 40.95 ± 13.2 | 40.81 ± 13.1 | 70.28 ± 3.08 | 83.00 ± 1.70
16 | 97.82 ± 1.12 | 97.59 ± 2.17 | 98.56 ± 0.43 | 97.06 ± 1.80 | 99.29 ± 0.34 | 99.39 ± 0.51 | 98.58 ± 0.80 | 99.03 ± 0.76
OA | 90.09 ± 1.08 | 93.77 ± 0.40 | 89.16 ± 0.33 | 84.75 ± 0.52 | 88.37 ± 2.05 | 88.92 ± 1.99 | 95.03 ± 0.28 | 96.88 ± 0.26
AA | 93.78 ± 9.30 | 95.47 ± 5.83 | 93.67 ± 11.1 | 89.73 ± 13.4 | 91.08 ± 14.7 | 92.12 ± 14.4 | 97.42 ± 7.32 | 98.17 ± 4.18
κ | 88.95 ± 1.21 | 93.05 ± 0.45 | 87.90 ± 0.38 | 82.98 ± 0.59 | 86.97 ± 2.32 | 87.58 ± 2.24 | 94.46 ± 0.31 | 96.52 ± 0.29
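For completeness, the OA, AA, and κ statistics reported in Tables 3 and 4 follow their standard definitions and can be computed from a confusion matrix, e.g. (illustrative helper, not the authors' code):

```python
import numpy as np

def accuracy_stats(cm):
    """OA, AA, and Cohen's kappa from a confusion matrix cm, where
    cm[i, j] counts class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n                           # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # mean per-class accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                    # Cohen's kappa
    return oa, aa, kappa
```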
Table 5. Z values of McNemar’s test between the proposed SPKJSR and other methods.
Data Set | SVM | SVMCK | SRC | OMP | JSR | WJSR | KJSR
Indian Pines | −33.18 | −15.56 | −32.69 | −40.33 | −23.60 | −21.59 | −11.04
Salinas | −54.52 | −27.20 | −57.54 | −76.39 | −68.47 | −65.61 | −25.53
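The Z values in Table 5 come from McNemar's test [34], which compares two classifiers through the numbers of test samples that exactly one of them classifies correctly. A sketch (the sign convention here is an assumption; what matters is that |Z| > 1.96 indicates a significant difference at the 5% level):

```python
import math

def mcnemar_z(f12, f21):
    """McNemar's test statistic for two classifiers on the same test set.
    f12: samples classified correctly by method 1 only;
    f21: samples classified correctly by method 2 only."""
    return (f12 - f21) / math.sqrt(f12 + f21)
```

Every |Z| in Table 5 far exceeds 1.96, so the accuracy gains of SPKJSR over the compared methods are statistically significant.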
Table 6. The running time of different methods.
Data Set | SVM | SVMCK | SRC | OMP | JSR | WJSR | KJSR | SPKJSR
Indian Pines | 0.3 | 2.2 | 21 | 20 | 152 | 169 | 75 | 243
Salinas | 1.9 | 17 | 198 | 133 | 1380 | 1599 | 998 | 3061

Hu, S.; Peng, J.; Fu, Y.; Li, L. Kernel Joint Sparse Representation Based on Self-Paced Learning for Hyperspectral Image Classification. Remote Sens. 2019, 11, 1114. https://doi.org/10.3390/rs11091114

