Article

Sub-Graph Regularization on Kernel Regression for Robust Semi-Supervised Dimensionality Reduction

Jiao Liu, Mingbo Zhao and Weijian Kong
1 School of Management Studies, Shanghai University of Engineering Science, Shanghai 201600, China
2 School of Information Science and Technology, Donghua University, Shanghai 201620, China
* Authors to whom correspondence should be addressed.
Entropy 2019, 21(11), 1125; https://doi.org/10.3390/e21111125
Submission received: 7 October 2019 / Revised: 5 November 2019 / Accepted: 7 November 2019 / Published: 15 November 2019
(This article belongs to the Special Issue Statistical Inference from High Dimensional Data)

Abstract
Dimensionality reduction has always been a major problem when handling high-dimensional datasets. Owing to the use of labeled data, supervised dimensionality reduction methods such as Linear Discriminant Analysis tend to achieve better classification performance than unsupervised methods. However, supervised methods need sufficient labeled data in order to achieve satisfying results. Therefore, semi-supervised learning (SSL) methods, which exploit both labeled and unlabeled data, are a practical choice when labeled data are scarce. In this paper, we develop a novel SSL method by extending anchor graph regularization (AGR) for dimensionality reduction. In detail, AGR is an accelerated semi-supervised learning method that propagates class labels to unlabeled data. However, it cannot handle new incoming samples. We therefore improve AGR by adding kernel regression to the basic objective function of AGR. As a result, the proposed method can not only estimate the class labels of unlabeled data but also achieve dimensionality reduction. Extensive simulations on several benchmark datasets are conducted, and the simulation results verify the effectiveness of the proposed work.

1. Introduction

Dimensionality reduction is an important issue when handling high-dimensional data in many real-world applications, such as image classification, text recognition, etc. In general, dimensionality reduction is achieved by finding a linear or nonlinear projection matrix that casts the original high-dimensional data into a low-dimensional subspace, so that the computational complexity is reduced and the key intrinsic information is preserved [1,2,3,4,5,6,7,8,9,10]. Principal component analysis (PCA) and linear discriminant analysis (LDA) [11] are two of the most widely-used methods for dimensionality reduction. PCA finds a projection matrix along the directions of maximum variance of the dataset with the best reconstruction, while LDA searches for the optimal directions such that, in the reduced subspace, the between-class scatter is maximized while the within-class scatter is minimized. As LDA is a supervised approach, it generally outperforms PCA given sufficient labeled information.
A key problem is that obtaining a large amount of labeled data is time-consuming and expensive. On the other hand, unlabeled data may be abundant in many real-world applications. Therefore, semi-supervised learning (SSL) approaches have become increasingly important in the areas of pattern recognition and machine learning [1,2,4,12,13,14]. Over the past decades, building on the manifold and clustering assumptions (i.e., nearby data are likely to have the same labels [1,2,4]), graph-based SSL has become one of the most popular families of SSL methods, including manifold regularization (MR) [3], learning with local and global consistency (LGC) [2] and Gaussian fields and harmonic functions (GFHF) [1]. All of these utilize the labeled and unlabeled sets to formulate a graph that approximates the geometry of the data manifold [5].
Graph-based SSL methods can usually be divided into two categories: inductive learning methods and transductive learning methods. Transductive learning methods aim to propagate the labeled information via a graph [1,2,4], so that the labels of the unlabeled set are estimated. However, a key problem for transductive learning methods is that they cannot estimate the class labels of new incoming data, and therefore suffer from the out-of-sample problem. In contrast, inductive learning methods, such as MR [3] and Semi-supervised Discriminant Analysis (SDA) [5], aim to learn a decision function for classification on the original data space, so that they can reduce the dimensionality as well as naturally solve out-of-sample problems.
It can be noted that the graph in SSL is typically a k-nearest-neighborhood (kNN) based graph, which first finds the k neighborhoods of each data point [15,16,17] and then defines a weight matrix measuring the similarity between any pair of data points [1,2,4,18,19,20,21]. However, the kNN graph has a key limitation in that it is not scalable to large-scale datasets, as the computational complexity for searching the k neighborhoods of all data points is $O(kn^2)$, which is not linear in $n$. To solve this problem, Liu et al. [22,23] proposed the efficient anchor graph regularization (AGR), in which each data point first finds its k nearest anchor points, and the graph is then constructed from the inner product of the coefficients between the data and the anchors; the class labels can thereby be inferred from the anchors to the whole dataset. As a result, the computational complexity is greatly reduced. While there are different ways to build the adjacency matrix S in AGR [24,25,26], we argue that most of them are developed intuitively and lack a probabilistic explanation. In addition, AGR cannot directly infer the class labels of incoming data.
In this paper, we aim to enhance AGR by solving the above problems. Starting from the basic idea of AGR, we point out that the anchors should have the same probability distribution as the data points, since the anchors are representatives that roughly approximate the distribution of the data. Based on this assumption, we analyze S from a stochastic view and further extend it to be doubly-stochastic. As a result, the distribution of the anchors is the same as that of the data points, and the updated S can be treated as a transition matrix, where each value in S can be viewed as a transition probability between a data point and an anchor point. Benefiting from S, we then develop a sub-graph regularized framework for SSL. The new sub-graph is constructed from S in an efficient way and can preserve the geometry of the data structure. Accordingly, an SSL strategy based on such a sub-graph is also developed, which first infers the labels of the anchors and then calculates those of the training data. This is quite different from conventional graph-based SSL, which directly infers the class labels of the dataset on the whole graph and may incur a huge computational cost if the dataset is large-scale. In contrast, this SSL strategy is efficient and suitable for handling large-scale datasets. Experiments on extensive benchmark datasets show the effectiveness and efficiency of the proposed SSL method.
The main contributions of this paper are given as follows:
(1)
We develop a doubly-stochastic S that measures the similarity between data points and anchors. The new updated S has a probabilistic meaning and can be viewed as the transition probability between data points and anchors. In addition, the proposed S is also a stochastic extension of the ones in AGR.
(2)
We develop a sub-graph regularized framework for SSL. The new sub-graph is constructed by S in an efficient way and can preserve the geometry of the data manifold.
(3)
We also adopt a linear predictor for inferring the class labels of new incoming data, which can handle out-of-sample problems. In addition, the computational complexity of this linear predictor is linear with the number of anchors, and hence is efficient.
The organization of the paper is as follows: in Section 2, basic notations and a review of SSL are provided; in Section 3, the proposed models for graph construction and SSL are developed; in Section 4, we conduct extensive simulations; and we give our final conclusions in Section 5.

2. Notations and Preliminary Work

2.1. Notations

Let $X = [X_l, X_u] \in \mathbb{R}^{d \times (l+u)}$ be the data matrix, where $d$ denotes the feature dimension, and $l$ and $u$ are the numbers of labeled and unlabeled samples, respectively, so that $X_l$ and $X_u$ are the labeled and unlabeled sets. Let $Y = [y_1, y_2, \ldots, y_{l+u}] \in \mathbb{R}^{c \times (l+u)}$ be the one-hot label matrix of the data, and $F = [f_1, f_2, \ldots, f_{l+u}] \in \mathbb{R}^{c \times (l+u)}$ the predicted label matrix satisfying $0 \le f_{ij} \le 1$.

2.2. Review of Graph Based Semi-Supervised Learning

We first review prior graph-based SSL methods. Two well-known methods for SSL are LGC [2] and GFHF [1]. The objectives of LGC and GFHF are given as:
$$g_L(F) = \frac{1}{2}\sum_{i,j=1}^{l+u} \left\| \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right\|_F^2 W_{ij} + \lambda \sum_{i=1}^{l+u} \| f_i - y_i \|_F^2, \qquad g_G(F) = \frac{1}{2}\sum_{i,j=1}^{l+u} \| f_i - f_j \|_F^2 W_{ij} + \lambda \sum_{i=1}^{l} \| f_i - y_i \|_F^2$$
where $\lambda$ is a balancing parameter that controls the trade-off between the label fitness and the manifold smoothness. In GFHF, $\lambda$ is set to a very large value so that $\sum_{i=1}^{l} \| f_i - y_i \|_F^2 = 0$, i.e., $f_i = y_i$ for $i = 1, 2, \ldots, l$.
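As a concrete illustration of the review above, the following NumPy sketch computes the closed-form LGC solution $F^* \propto Y(I - \alpha \hat{S})^{-1}$ with $\hat{S} = D^{-1/2} W D^{-1/2}$ and $\alpha = 1/(1+\lambda)$. This is our own minimal sketch, not the authors' code; the function name and the default value of $\lambda$ are illustrative assumptions.

```python
import numpy as np

def lgc_propagate(W, Y, lam=0.01):
    """Closed-form label propagation for the LGC objective g_L(F) above.

    W   : (n, n) symmetric affinity matrix of the kNN graph
    Y   : (c, n) one-hot labels, with all-zero columns for unlabeled samples
    lam : trade-off between label fitness and manifold smoothness
    """
    n = W.shape[0]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S_hat = D_inv_sqrt @ W @ D_inv_sqrt          # normalized affinity D^{-1/2} W D^{-1/2}
    alpha = 1.0 / (1.0 + lam)
    # F* is proportional to Y (I - alpha * S_hat)^{-1}; the constant factor does
    # not change the arg-max over classes.
    F = Y @ np.linalg.inv(np.eye(n) - alpha * S_hat)
    return F.argmax(axis=0)                      # predicted class of each sample
```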

2.3. Anchor Graph Regularization

Anchor graph regularization (AGR) is an efficient graph-based learning method for large-scale SSL. In detail, let $A = [a_1, a_2, \ldots, a_m] \in \mathbb{R}^{d \times m}$ be the anchor point set, $G = [g_1, g_2, \ldots, g_m] \in \mathbb{R}^{c \times m}$ be the label matrix of $A$, and $Z \in \mathbb{R}^{m \times n}$ be the weight matrix measuring the similarity between each $x_j$ and $a_i$, with the constraints $Z_{ij} \ge 0$ and $\sum_{i=1}^{m} Z_{ij} = 1$. $Z$ is usually formulated by kernel weights or by a local reconstruction strategy, and the computational complexity of both strategies is linear in the number of data points. Then, the label matrix $F$ can be estimated as:
$$f_j = \sum_{i=1}^{m} g_i Z_{ij},$$
so that AGR is to minimize the following objective function:
$$J(G) = \sum_{j=1}^{l} \| G z_j - y_j \|_F^2 + \frac{\gamma}{2} \sum_{i,j=1}^{n} W_{ij}^a \| G z_i - G z_j \|_F^2 = \| G Z_l - Y_l \|_F^2 + \gamma \, \mathrm{Tr}\!\left( G Z (I - W^a) Z^T G^T \right) = \| G Z_l - Y_l \|_F^2 + \gamma \, \mathrm{Tr}\!\left( G L_r G^T \right),$$
where the first term is the loss function and the second term is the manifold regularization term, $W^a = Z^T \Delta^{-1} Z \in \mathbb{R}^{n \times n}$ is the anchor graph, and $\Delta \in \mathbb{R}^{m \times m}$ is a diagonal matrix with elements $\Delta_{ii} = \sum_{j=1}^{n} Z_{ij}$. It can be easily proven that $W^a$ is doubly-stochastic, hence it has a probabilistic meaning. In addition, given two data points $x_i$ and $x_j$ with common anchor points, it follows that $W_{ij}^a > 0$; otherwise $W_{ij}^a = 0$. This indicates that data points sharing common anchor points have similar semantic concepts, hence $W^a$ can characterize the semantic structure of the dataset. $L_r = Z (I - W^a) Z^T \in \mathbb{R}^{m \times m}$ is the reduced Laplacian matrix, and $Z_l \in \mathbb{R}^{m \times l}$ is formed by the first $l$ columns of $Z$. Here, we can see that although AGR is performed with a regularization term on all data points, it is equivalent to regularization on the anchor points with the reduced Laplacian matrix $L_r$. Finally, the labels of the data points can be inferred from those of the anchor points, where the computational complexity is reduced to $O(n)$. Therefore, both the graph construction and the regularization procedure in AGR are efficient and scalable to large-scale datasets.
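To make the role of the reduced Laplacian concrete, the following NumPy sketch (our own illustration; the function name and numerical safeguards are assumptions) builds $L_r = Z(I - W^a)Z^T$ without ever materializing the $n \times n$ anchor graph $W^a$, which is what makes AGR scalable.

```python
import numpy as np

def anchor_graph_reduced_laplacian(Z):
    """Reduced Laplacian L_r = Z (I - W^a) Z^T of the anchor graph,
    where W^a = Z^T Delta^{-1} Z and Delta_ii = sum_j Z_ij.

    Z : (m, n) anchor-to-data weight matrix whose columns sum to 1.
    """
    delta = Z.sum(axis=1)                              # row sums of Z, shape (m,)
    ZZt = Z @ Z.T                                      # (m, m), cost O(m^2 n)
    # L_r = Z Z^T - (Z Z^T) Delta^{-1} (Z Z^T): only m x m matrices are formed,
    # so the n x n anchor graph W^a never needs to be built explicitly.
    L_r = ZZt - ZZt @ np.diag(1.0 / np.maximum(delta, 1e-12)) @ ZZt
    return L_r
```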

3. A Sub-Graph Regularized Framework for Efficient Semi-Supervised Learning

3.1. Analysis of Anchor Graph Construction

The key point for anchor graph construction is to define the weight matrix for measuring the similarity between each data point and anchor data. A typical way is to use kernel regression [22]:
$$S_{ij} = \frac{K_\delta(x_j, b_i)}{\sum_{s \in \langle j \rangle} K_\delta(x_j, b_s)} \;\; \text{for } i \in \langle j \rangle, \qquad S_{ij} = 0 \;\; \text{otherwise},$$
where $\delta$ is the bandwidth of the Gaussian kernel and $\langle j \rangle$ denotes the indices of the $k$ nearest anchors of $x_j$. Obviously, we have $S^T \mathbf{1}_q = \mathbf{1}_n$, where $\mathbf{1}_n \in \mathbb{R}^{n \times 1}$ and $\mathbf{1}_q \in \mathbb{R}^{q \times 1}$ are column vectors of ones, so that the sum of each column of $S$ is equal to 1. This means $S_{ij}$ can be viewed as a probability value $P(b_i \,|\, x_j)$, which represents the transition probability from $x_j$ to $b_i$. Then, following the Bayes rule, we have:
$$P(b_i) = \sum_{j=1}^{n} P(x_j) \, P(b_i \,|\, x_j) \approx \frac{1}{n} \sum_{j=1}^{n} P(b_i \,|\, x_j)$$
where $P(x_j) \approx 1/n$ is assumed to follow a uniform distribution, based on the strong law of large numbers as $n \to \infty$. In addition, since the anchors are also sampled from the dataset, we can further assume that $P(b_i)$ follows a uniform distribution, i.e., $P(b_i) = 1/q$. With these assumptions, we have:
$$P(b_i) = \frac{1}{q}, \; P(x_j) = \frac{1}{n} \;\Longrightarrow\; P(b_i) = \sum_{j=1}^{n} P(x_j) \, P(b_i \,|\, x_j) \;\Longrightarrow\; \sum_{j=1}^{n} P(b_i \,|\, x_j) = \frac{n}{q} \;\Longrightarrow\; S_{i\cdot} \mathbf{1}_n = \sigma$$
where $S_{i\cdot}$ is the $i$-th row of $S$ and $\sigma = n/q$ is a fixed value, so that $S \mathbf{1}_n = (n/q)\,\mathbf{1}_q = \sigma \mathbf{1}_q$. We thereby have two constraints on $S$, i.e., $S^T \mathbf{1}_q = \mathbf{1}_n$ and $S \mathbf{1}_n = \sigma \mathbf{1}_q$ (the advantages will be shown in the next subsection). Our goal is to calculate a weight matrix $S$ that satisfies the above constraints, so that $S$ has a clear stochastic meaning.
Fortunately, this can be simply achieved by iteratively normalizing S both in row and column, i.e.,
$$S_0 \;\rightarrow\; P_r(S_0) \;\rightarrow\; P_c(P_r(S_0)) = S_1 \;\rightarrow\; P_r(S_1) \;\rightarrow\; P_c(P_r(S_1)) = S_2 \;\rightarrow\; \cdots$$
where $P_c(S) = S \Delta_c^{-1}$ and $P_r(S) = \Delta_r^{-1} S$, with $\Delta_c = \mathrm{diag}(\mathbf{1}^T S) \in \mathbb{R}^{(l+u) \times (l+u)}$ and $\Delta_r = \mathrm{diag}(S \mathbf{1}) \in \mathbb{R}^{q \times q}$. Actually, the above iterative procedure is equivalent to solving the following optimization problem:
$$\min_S \| S - S_0 \|_F^2 \quad \mathrm{s.t.} \;\; S \ge 0, \;\; S^T \mathbf{1}_q = \mathbf{1}_n, \;\; S \mathbf{1}_n = \sigma \mathbf{1}_q$$
where $S_0$ is the initial $S$ as calculated in Equation (4). Equation (8) is an instance of quadratic programming (QP), which can be divided into two convex sub-problems:
$$\min_S \| S - S_0 \|_F^2 \quad \mathrm{s.t.} \;\; S \ge 0, \;\; S^T \mathbf{1}_q = \mathbf{1}_n$$
$$\min_S \| S - S_0 \|_F^2 \quad \mathrm{s.t.} \;\; S \ge 0, \;\; S \mathbf{1}_n = \sigma \mathbf{1}_q.$$
By the above derivations, the initial QP problem in Equation (8) is tackled by successively alternating between the two sub-problems in Equations (9) and (10), each solved from the current solution. By von Neumann's lemma [27,28], this alternating optimization procedure is theoretically guaranteed to converge to the global optimum of Equation (8).
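For illustration, the following NumPy sketch first builds the initial weight matrix of Equation (4) and then alternates row and column rescaling in the spirit of Equation (7). It is our own sketch rather than the authors' code: the function names, the defaults for $k$ and $\delta$, the fixed number of iterations, and the choice to rescale rows directly to $\sigma = n/q$ are assumptions made here.

```python
import numpy as np

def build_S0(X, B, k=5, delta=1.0):
    """Initial anchor weight matrix S_0 (q x n) from Gaussian kernel regression
    over the k nearest anchors of each sample (cf. Eq. (4)); each column sums to 1.

    X : (d, n) data matrix, one sample per column
    B : (d, q) anchor matrix, one anchor per column
    """
    D2 = (B ** 2).sum(0)[:, None] + (X ** 2).sum(0)[None, :] - 2.0 * B.T @ X
    K = np.exp(-D2 / (2.0 * delta ** 2))             # Gaussian kernel weights, (q, n)
    S0 = np.zeros_like(K)
    nn = np.argsort(-K, axis=0)[:k, :]               # indices of k nearest anchors per sample
    cols = np.arange(X.shape[1])
    S0[nn, cols] = K[nn, cols]                       # keep only the k nearest anchors
    return S0 / S0.sum(axis=0, keepdims=True)        # enforce S^T 1_q = 1_n

def doubly_stochastic(S0, n_iter=50):
    """Alternating row/column rescaling (cf. Eq. (7)) so that, approximately,
    S^T 1_q = 1_n and S 1_n = sigma 1_q with sigma = n / q."""
    q, n = S0.shape
    sigma = n / q
    S = S0.copy()
    for _ in range(n_iter):
        S *= sigma / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)  # rows -> sigma
        S /= np.maximum(S.sum(axis=0, keepdims=True), 1e-12)          # columns -> 1
    return S
```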

3.2. Sub-Graph Construction

We have now obtained $q$ anchors and the coefficient vector $s_j$ of each sample $x_j$. The weight matrix $S$ reflects the affinities between the data points and the anchors, i.e., $X \approx BS$. If we further assume that such affinities in the original high-dimensional space are preserved in the low-dimensional class labels, then we have $F \approx ZS$, where $Z = [z_1, z_2, \ldots, z_q] \in \mathbb{R}^{c \times q}$ represents the class labels of the anchors $B$. This indicates that the class labels of the dataset can be easily obtained by $F = ZS$, provided that the class labels of the anchors have been inferred. Since the number of anchors is much smaller than the size of the dataset, the computational cost of calculating $Z$ is much lower than that of directly calculating $F$ in conventional graph-based SSL methods. We thereby present an efficient method for semi-supervised learning, in which we develop a sub-graph regularized (SGR) framework that utilizes the information of the anchors.
Here, in order to develop the proposed sub-graph SSL method, we first need to construct a sub-graph on the set of anchors and define an adjacency matrix measuring the similarity between any two anchors. There are many approaches to constructing such a graph over the anchors, such as the conventional kNN graph [1,18,20,21]. Instead, we design the adjacency matrix $W^d \in \mathbb{R}^{q \times q}$ directly from $S$ as follows:
$$W^d = \frac{1}{\sigma} S S^T.$$
It can be easily proven that $W^d \mathbf{1}_q = (1/\sigma) S S^T \mathbf{1}_q = (1/\sigma) S \mathbf{1}_n = \mathbf{1}_q$, which indicates that $W^d$ is a doubly-stochastic matrix. Therefore, the above graph construction can be theoretically interpreted in a probabilistic way. More intuitively, $W^d$ in Equation (11) is an inner product of the rows of $S$, with each element $W_{ij}^d = \frac{1}{\sigma} s_i^r (s_j^r)^T$, where $s_i^r$ and $s_j^r$ are the $i$-th and $j$-th rows of $S$. This indicates that the rows of $S$ serve as the representations of the anchors. In addition, if $b_i$ and $b_j$ share more common data points that select them as neighboring anchors, their corresponding $s_i^r$ and $s_j^r$ will be similar and $W_{ij}^d$ will be large; in contrast, $W_{ij}^d$ will be equal to 0 if $b_i$ and $b_j$ do not share any data points. Therefore, $W^d$ derived in Equation (11) can be viewed as an adjacency matrix measuring the similarity between any two anchors.
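A minimal NumPy sketch of Equation (11), with a sanity check of the doubly-stochastic property; the function name and the tolerance are our own choices.

```python
import numpy as np

def sub_graph_adjacency(S):
    """Sub-graph adjacency W^d = (1/sigma) S S^T over the anchors, Eq. (11).

    S : (q, n) anchor weight matrix with columns summing to 1 and rows summing
        to sigma = n / q (the doubly-stochastic S of Section 3.1).
    """
    q, n = S.shape
    sigma = n / q
    Wd = (S @ S.T) / sigma
    # By construction W^d 1_q = (1/sigma) S 1_n = 1_q, i.e., W^d is doubly stochastic.
    assert np.allclose(Wd.sum(axis=1), 1.0, atol=1e-6)
    return Wd
```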

3.3. Efficient Semi-Supervised Learning via Sub-Graph Construction

With the above graph construction, we then develop our sub-graph model for efficient semi-supervised learning. Since the number of anchors is much smaller than that of the dataset, our goal is first to estimate the labels of anchors Z from labeled data via the sub-graph model, and then to calculate those of unlabeled samples by the weight matrix. Here, we first give the objective function of the proposed sub-graph regularized framework for calculating the class labels of anchors as follows:
$$J(Z) = \frac{\eta_I}{2} \sum_{i,j=1}^{q} W_{ij}^d \| z_i - z_j \|_F^2 + \sum_{j=1}^{l} \| Z s_j - y_j \|_F^2 + \eta_A \| Z \|_F^2$$
The first term in Equation (12) measures the smoothness of the estimated labels on the sub-graph, the second term measures how consistent the estimated labels are with the original labels, and the third is a Tikhonov regularization term that avoids singular solutions. $\eta_A$ and $\eta_I$ are the parameters balancing the trade-off among the three terms. By taking the derivative of $J(Z)$ with respect to $Z$ and setting it to zero, we can calculate the class labels of the anchors as follows:
$$Z^* = Y U S^T \left( S U S^T + \eta_A I + \eta_I L_d \right)^{-1}$$
where $U$ is a diagonal matrix whose first $l$ diagonal elements are 1 and whose remaining $u$ elements are 0, and $L_d$ is the graph Laplacian matrix of $W^d$. Following Equation (13), we can observe that the key computation for $Z^*$ is the inverse of $S U S^T + \eta_I L_d + \eta_A I$, whose complexity is $O(q^3)$. Note that since $q \ll l + u$, calculating $Z$ is much cheaper than directly calculating $F$ as in LGC and GFHF. Finally, the class labels of the dataset can be calculated by
$$F = Z^* S = Y U S^T \left( S U S^T + \eta_I L_d + \eta_A I \right)^{-1} S.$$
The basic steps of the proposed SGR are summarized in Algorithm 1.
Algorithm 1: The proposed SGR.
1. Input: data $X \in \mathbb{R}^{d \times (l+u)}$, label matrix $Y \in \mathbb{R}^{c \times (l+u)}$, the number of anchors $q$, and the other parameters.
2. Form $S$ by solving Equation (8).
3. Form the sub-graph weight matrix $W^d = \frac{1}{\sigma} S S^T$ as in Equation (11).
4. Estimate the label matrix of the anchors $Z^* = Y U S^T ( S U S^T + \eta_I L_d + \eta_A I )^{-1}$ as in Equation (13).
5. Estimate the label matrix of the dataset by $F = Z^* S$.
6. Output: the predicted label matrices of the anchors and of the dataset, $Z \in \mathbb{R}^{c \times q}$ and $F \in \mathbb{R}^{c \times (l+u)}$, respectively.
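The closed-form steps 4 and 5 of Algorithm 1 translate directly into a few lines of NumPy. The sketch below is our own illustration; the function name, the parameter defaults, and the assumption that the labeled samples occupy the first columns are ours.

```python
import numpy as np

def sgr_labels(S, Y, n_labeled, eta_I=1.0, eta_A=1e-3):
    """Steps 4-5 of Algorithm 1: closed-form anchor labels Z* (Eq. (13)) and
    dataset labels F = Z* S (Eq. (14)).

    S         : (q, n) doubly-stochastic anchor weight matrix
    Y         : (c, n) one-hot labels, with all-zero columns for unlabeled samples
    n_labeled : number of labeled samples, assumed to be the first columns
    """
    q, n = S.shape
    Wd = (S @ S.T) * q / n                        # sub-graph adjacency, Eq. (11)
    Ld = np.diag(Wd.sum(axis=1)) - Wd             # graph Laplacian of W^d
    U = np.zeros(n)
    U[:n_labeled] = 1.0                           # diagonal selector of labeled columns
    SU = S * U[None, :]                           # S U
    M = SU @ S.T + eta_I * Ld + eta_A * np.eye(q) # S U S^T + eta_I L_d + eta_A I
    Z = (Y * U[None, :]) @ S.T @ np.linalg.inv(M) # Z* = Y U S^T M^{-1}, shape (c, q)
    F = Z @ S                                     # labels of the whole dataset, (c, n)
    return Z, F.argmax(axis=0)
```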

3.4. Out-of-Sample Extension via Kernel Regression

The proposed SGR can be used to estimate the labels of unlabeled data, but it cannot directly infer the labels of new incoming data. One way to handle such problems is to find a linear projection model by regressing the anchors $B$ onto $Z$, i.e.,:
$$V = \arg\min_{V, b} \| V^T B + b^T e - Z \|_F^2 + \gamma \| V \|_F^2$$
where $V \in \mathbb{R}^{d \times c}$ is the projection matrix and $b$ is the bias term. Although this linearization assumption $Z = V^T B + b^T e$ provides an effective and efficient solution to the out-of-sample problem, it is not able to fit nonlinear distributions. Therefore, we address the above problem in two ways: (1) we combine the objective function of SGR and the regression term into a unified framework, so that the class labels $Z$, the projection $V$, and the bias $b$ can be calculated simultaneously; (2) we utilize the kernel trick to search for a nonlinear projection. Specifically, we give the objective function as:
$$J(V, Z, b) = \min_{V, Z, b} \sum_{j=1}^{l} \| Z s_j - y_j \|_F^2 + \eta_A \| V \|_F^2 + \eta_R \| V^T \varphi(B) + b^T e - Z \|_F^2 + \eta_I \sum_{i,j=1}^{q} W_{ij}^d \| z_i - z_j \|_F^2.$$
It should be noted that $\varphi(B)$ is only implicit and not explicitly available. To calculate the optimal $V$, we therefore impose a restriction: let $V$ be a linear combination of $\varphi(B)$, i.e., $V = \varphi(B) A$, where $A \in \mathbb{R}^{q \times c}$ is the coefficient matrix of $V$. Then:
$$J(A, Z, b) = \min_{A, Z, b} \sum_{j=1}^{l} \| Z s_j - y_j \|_F^2 + \eta_A \, \mathrm{Tr}\!\left( A^T K A \right) + \eta_R \| A^T K + b^T e - Z \|_F^2 + \eta_I \sum_{i,j=1}^{q} W_{ij}^d \| z_i - z_j \|_F^2$$
where $K$ denotes the kernel matrix, for which we select the Gaussian kernel. By setting the derivatives of Equation (16) to zero, it follows that:
$$b = \frac{\mathbf{1}_q^T Z^T - \mathbf{1}_q^T K A}{\mathbf{1}_q^T \mathbf{1}_q}, \qquad A = \left( K L_c K^T + \eta K \right)^{-1} K L_c Z^T, \qquad Z = Y U S^T \left( S U S^T + \eta_I L_d + \eta_R L_r \right)^{-1}$$
where $\eta = \eta_A / \eta_R$, $L_c = I - \mathbf{1}_q \mathbf{1}_q^T / (\mathbf{1}_q^T \mathbf{1}_q)$ is the centering matrix that subtracts the mean of all anchors, and $L_r = L_c - L_c K^T ( K L_c K^T + \eta K )^{-1} K L_c$. Here, denote $x$ as a new incoming sample and $k_x$ as its kernel representation with respect to the anchors; its projected data $t$ is given by $t = V^T \varphi(x) + b^T = A^T k_x + b^T$, and the label of $x$ is estimated as:
$$c(t) = \arg\max_i \, t_i$$
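A minimal sketch of this prediction step; the Gaussian kernel, the function name, and the treatment of the bias as a length-$c$ vector are our own illustrative assumptions.

```python
import numpy as np

def predict_new_sample(x, B, A, b, delta=1.0):
    """Out-of-sample prediction of Section 3.4: t = A^T k_x + b, label = argmax_i t_i.

    x : (d,) new incoming sample
    B : (d, q) anchor matrix
    A : (q, c) kernel coefficient matrix of the projection V = phi(B) A
    b : (c,) bias vector
    """
    # Kernel representation of x with respect to the anchors (Gaussian kernel).
    k_x = np.exp(-((B - x[:, None]) ** 2).sum(axis=0) / (2.0 * delta ** 2))   # (q,)
    t = A.T @ k_x + b                      # projected representation of x, shape (c,)
    return int(np.argmax(t))               # estimated class label c(t)
```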
A toy example verifying the out-of-sample extension is given in Figure 1. In this toy example, we annotate two data points in each class as the labeled set. We then infer the labels in the region $\{(x, y) \,|\, x \in [-2, 2], y \in [-2, 2]\}$ by the out-of-sample extension, using both the linear version and the kernel version. The experimental results show that the decision boundary learned by the kernel version is satisfactory, since it is consistent with the data manifold, while the linear version fails to handle the task because the two-cycle dataset follows a nonlinear distribution.
Note that the proposed method includes three stages of training: (1) initialize the anchors by k-means; (2) construct the sub-graph $W^d$; (3) perform SSL. Here, the computational cost of k-means in the first stage is $O(q(l+u))$, while those of the sub-graph construction in the second stage and the SSL strategy in the third stage are $O(q(l+u))$ and $O(q^3 + q(l+u))$, respectively. The computational complexity is summarized in Table 1, from which we can see that if we use a fixed number of $q$ ($q \ll l+u$) anchors for a large-scale dataset, the computational complexity of the proposed SGR scales linearly with $l+u$, which indicates that the proposed SGR is suitable for handling large-scale data.
It should be noted that a recent work [29] has proposed another SSL method based on coupled graph Laplacian regularization, which is similar to our proposed work. The main advantages of our proposed work compared with [29] are as follows: (1) the proposed constructed graph is doubly-stochastic, so that the constructed graph Laplacian is normalized in each row and column, whereas the graph constructed by coupled graph Laplacian regularization may not be doubly-stochastic; (2) the proposed work can directly handle out-of-sample problems by projecting newly-coming data with the projection matrix so that their class membership can be inferred, whereas coupled graph Laplacian regularization does not consider this point.

4. Experiments

4.1. Toy Examples for Synthetic Datasets

We first show that the iterative approach of the proposed method can adaptively reduce the bias of a data manifold, using a noisy two-class dataset generated with a half-moon distribution in each class. Here, we use the kernel version of the proposed method to learn the classification model for this nonlinear distribution. Figure 2 shows the decision surfaces and boundaries obtained by the proposed method during the iterations. From Figure 2, we can observe that, for the two-moon dataset, the results converge quickly within only four iterations. We can also observe that, by initially treating each local regression term equally, the boundary learned by the proposed method cannot separate the two classes well, as there are many mis-classified data points. However, during the iterative reweighting process, the converged boundary in Figure 2 after four iterations becomes more and more accurate and distinctive, because the biases caused by the noisy data are substantially reduced.

4.2. Description of Dataset

In this section, we utilize six real-world datasets for verification: the Extended Yale-B, Carnegie Mellon University Pose, Illumination and Expression (CMU-PIE), Columbia Object Image Library 100 (COIL-100), Eidgenössische Technische Hochschule 80 (ETH80), United States Postal Service (USPS) digit image and Chinese Academy of Sciences, Institute of Automation, Hand-Written Digit Base (CASIA-HWDB) datasets. For each dataset, we randomly select 5%, 10%, 15%, and 20% of the data points to form the labeled set, 20% of the data to form the test set, and the remaining data to form the unlabeled set. The information of the datasets and sample images can be found in Table 2 and Figure 3, respectively.

4.3. Image Classification

We now show the effectiveness of the proposed SGR for image classification. The experiment settings are as follows [36,37]: for most SSL methods, e.g., LGC, Special Label Propagation (SLP), Linear Neighborhood Propagation (LNP), AGR, Efficient Anchor Graph Regularization (EAGR) and MR, the parameter k for constructing the kNN graph is determined by five-fold cross validation, chosen from 6 to 20. For LGC, LNP, AGR, and EAGR, the regularization parameter needs to be set, and it is chosen from $\{10^{-6}, 10^{-3}, 10^{-1}, 1, 10, 10^{3}, 10^{6}\}$. The average accuracies over 50 random splits with varying numbers of labeled data are shown in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. From the classification results, we have:
(1) For almost all methods, the classification results increase as the number of labeled data increases. For instance, the results of SGR increase by about 15% in most cases as the proportion of labeled data increases from 5% to 20%, and by almost 17% on the CASIA-HWDB dataset. In addition, the classification results no longer increase once the number of labeled samples is sufficient, especially on the COIL100, USPS, and ETH80 datasets;
(2) The proposed SGR outperforms the other methods in all cases. For instance, SGR achieves a 5%–9% superiority over SLP, LNP, and MR in almost all cases; on the CASIA-HWDB dataset this improvement even reaches 9%. AGR and EAGR can obtain results competitive with SGR by tuning their parameters, whereas the proposed SGR can adjust them automatically while still achieving satisfactory results;
(3) The accuracies on the unlabeled set are higher than those on the test set. This is because the testing data are not utilized for training. However, the accuracies on the test set are still good, showing that SGR is able to handle new incoming data.

4.4. Parameter Analysis with Different Numbers of Anchors

In this subsection, we verify the accuracy of SGR with different numbers of anchors. In this study, we selected 5% of the data to form the labeled set and the remaining data to form the unlabeled set. Then, in Figure 4, we give the accuracy curve of SGR under different numbers of anchors, where the candidate set is chosen from $\sqrt{n}$ to $10\sqrt{n}$.
From Figure 4, we can see that on the ETH80 dataset the classification results increase as the number of anchors increases. However, the accuracies do not increase any further given a sufficient number of anchors, such as $10\sqrt{n}$; here, $10\sqrt{n}$ is still much smaller than the size of the original dataset. For the other datasets, the classification accuracies barely change and are less sensitive to the number of anchors.

4.5. Image Visualization

In this subsection, we demonstrate the visualization performance of the proposed method to show its superiority. In this study, we choose the digit and letter images of the first five classes of the CASIA-HWDB dataset, where we randomly select 20 samples and 80 samples in each class to form the labeled set and the unlabeled set, respectively; the rest are used as the testing data. We then project the test set onto a 2D subspace by utilizing a 2D projection matrix for visualization. Since the out-of-sample extensions of the proposed SGR and of MR are derived from a regression problem, we apply a PCA operator to the projected data $V^T X$ to reduce its dimensionality to two in order to handle the sub-manifold visualization problem. The test data can then be visualized in the 2D subspace. The experimental results are shown in Figure 5 and Figure 6, from which we can observe that SGR obtains better performance than the other methods, especially on the CASIA-HWDB digit image data.

5. Conclusions

In this paper, we proposed a sub-graph-based SSL for image classification. The main contributions of the proposed work are as follows:
(1)
We developed a doubly-stochastic S that measures the similarity between data points and anchors. The new updated S has a probabilistic meaning and can be viewed as a transition probability between data points and anchors. In addition, the new sub-graph is constructed from S in an efficient way and can preserve the geometry of the data manifold. Simulation results verify the superiority of the proposed SGR;
(2)
We also adopt a linear predictor for inferring the labels of new incoming data, which can handle out-of-sample problems. The computational complexity of this linear predictor is linear with the number of anchors; hence it is efficient. This shows that SGR can handle a large-scale dataset, which is quite practical;
From the above analysis, we can see that the main advantages of the proposed work are its effectiveness in handling classification problems and its low computational complexity for both graph construction and SSL. It can also handle out-of-sample problems based on a kernel regression on the anchors. However, it suffers from the drawback that the parameters are not adaptive. In addition, the graph construction and the SSL inference are carried out in two separate stages. Our future work will lie in developing a unified optimization framework with adaptively adjusted parameters.
While the proposed work mainly focuses on image classification, our future work can also lie in handling other state-of-the-art applications, such as image retagging [38], and context classification in the natural language processing field [39,40].

Author Contributions

Conceptualization, Software, Methodology, J.L.; Formal analysis, Funding acquisition, Original Draft, M.Z.; Supervision, Validation, Review and editing: W.K.

Funding

This work is supported by the National Natural Science Foundation of China under Grants No. 61971121, 61601112 and 61603088, the Fundamental Research Funds for the Central Universities, and the DHU Distinguished Young Professor Program.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003.
2. Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Scholkopf, B. Learning with local and global consistency. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2004.
3. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled samples. J. Mach. Learn. Res. 2006, 7, 2399–2434.
4. Nie, F.; Xiang, S.; Liu, Y.; Zhang, C. A general graph based semi-supervised learning with novel class discovery. Neural Comput. Appl. 2010, 19, 549–555.
5. Cai, D.; He, X.; Han, J. Semi-supervised discriminant analysis. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–20 October 2007; pp. 1–7.
6. Zhao, M.; Zhang, Z.; Chow, T.W.; Li, B. Soft label based linear discriminant analysis for image recognition and retrieval. Comput. Image Underst. 2014, 121, 86–99.
7. Zhao, M.; Zhang, Z.; Chow, T.W.; Li, B. A general soft label based linear discriminant analysis for semi-supervised dimensionality reduction. Neural Netw. 2014, 55, 83–97.
8. Zhao, M.; Chow, T.W.; Wu, Z.; Zhang, Z.; Li, B. Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction. Inf. Sci. 2015, 324, 286–309.
9. Zhao, M.; Chow, T.W.; Zhang, Z.; Li, B. Automatic image annotation via compact graph based semi-supervised learning. Knowl.-Based Syst. 2015, 76, 148–165.
10. Zhao, M.; Zhang, Z.; Chow, T.W. Trace ratio criterion based generalized discriminative learning for semi-supervised dimensionality reduction. Pattern Recognit. 2012, 45, 1482–1499.
11. Fukunaga, K. Introduction to statistical pattern classification. Patt. Recognit. 1990, 30, 1149.
12. Gao, Y.; Ma, J.; Yuille, A.L. Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. arXiv 2016, arXiv:1609.03279.
13. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. 2019, 127, 512–531.
14. Gao, Y.; Yuille, A.L. Estimation of 3D category-specific object structure: Symmetry, Manhattan and/or multiple images. Int. J. Comput. Vis. 2019, 127, 1501–1526.
15. Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
16. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
17. He, X.; Yan, S.; Hu, Y.; Niyogi, P.; Zhang, H. Face recognition using Laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 328–340.
18. Wang, F.; Zhang, C. Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Eng. 2008, 20, 55–67.
19. Wang, J.; Wang, F.; Zhang, C.; Shen, H.C.; Quan, L. Linear neighborhood propagation and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1600–1615.
20. Yang, Y.; Nie, F.; Xu, D.; Luo, J.; Zhuang, Y.; Pan, Y. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 723–742.
21. Xiang, S.; Nie, F.; Zhang, C. Semi-supervised classification via local spline regression. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2039–2053.
22. Liu, W.; He, J.; Chang, S.-F. Large graph construction for scalable semi-supervised learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 679–686.
23. Liu, W.; Wang, J.; Chang, S.-F. Robust and scalable graph-based semisupervised learning. Proc. IEEE 2012, 100, 2624–2638.
24. Wang, M.; Fu, W.; Hao, S.; Tao, D.; Wu, X. Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Trans. Knowl. Data Eng. 2016, 28, 1864–1877.
25. Fu, W.; Wang, M.; Hao, S.; Mu, T. FLAG: Faster learning on anchor graph with label predictor optimization. IEEE Trans. Big Data 2017.
26. Wang, M.; Fu, W.; Hao, S.; Liu, H.; Wu, X. Learning on big graph: Label inference and regularization with anchor hierarchy. IEEE Trans. Knowl. Data Eng. 2017, 29, 1101–1114.
27. Von Neumann, J. Functional Operators: Measures and Integrals; Princeton University Press: Princeton, NJ, USA, 1950; Volume 1.
28. Liu, W.; Chang, S.-F. Robust multi-class transductive learning with graphs. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
29. Zhao, X.; Wang, D.; Zhang, X.; Gu, N.; Ye, X. Semi-supervised learning based on coupled graph Laplacian regularization. In Proceedings of the 2018 Chinese Intelligent Systems Conference; Springer: Berlin/Heidelberg, Germany, 2019; pp. 131–142.
30. Georghiades, A.S.; Belhumeur, P.N.; Kriegman, D.J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660.
31. Baker, S.; Bsat, M. The CMU pose, illumination, and expression database. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1615.
32. Nene, S.A.; Nayar, S.K.; Murase, H. Columbia Object Image Library (COIL-100); Technical Report CUCS-005-96; Columbia University: New York, NY, USA, 1996.
33. Leibe, B.; Schiele, B. Analyzing appearance and contour based methods for object categorization. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 18–20 June 2003; p. II-409.
34. Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554.
35. Liu, C.-L.; Yin, F.; Wang, D.-H.; Wang, Q.-F. CASIA online and offline Chinese handwriting databases. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, 18–21 September 2011; pp. 37–41.
36. Hou, C.; Nie, F.; Wang, F.; Zhang, C.; Wu, Y. Semisupervised learning using negative labels. IEEE Trans. Neural Netw. 2011, 22, 420–432.
37. Rodriguez, M.Z.; Comin, C.H.; Casanova, D.; Bruno, O.M.; Amancio, D.R.; Costa, L.D.F.; Rodrigues, F.A.; Kestler, H.A. Clustering algorithms: A comparative approach. PLoS ONE 2019, 14, e0210236.
38. Tang, J.; Shu, X.; Li, Z.; Jiang, Y.G.; Tian, Q. Social anchor unit graph regularized tensor completion for large scale image retagging. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2027–2034.
39. Amancio, D.R.; Silva, F.N.; Costa, L.d.F. Concentric network symmetry grasps authors' styles in word adjacency networks. EPL (Europhys. Lett.) 2015, 110, 68001.
40. Koplenig, A.; Wolfer, S. Studying lexical dynamics and language change via generalized entropies: The problem of sample size. Entropy 2019, 21, 464.
Figure 1. Out-of-sample extension: two-cycle dataset in $\{(x, y) \,|\, x \in [-2, 2], y \in [-2, 2]\}$. (a,c) the contour lines of the decision boundary; (b,d) the contour surface of the estimated label values in the region. In this experiment, the figures in the upper row show the results of the linear prediction model $Z = V^T B + b^T e$, while those in the bottom row show the results of the kernel-based prediction model $Z = V^T \varphi(B) + b^T e$. Clearly, the kernel prediction model is much better than the linear prediction model, since the two-cycle dataset follows a nonlinear distribution.
Figure 2. Gray image of reduced space learned by the proposed method: two-moon dataset.
Figure 3. Sample images of the real-world datasets: Extended Yale-B, Carnegie Mellon University Pose, Illumination and Expression (CMU-PIE), Columbia Object Image Library 100 (COIL-100), Eidgenössische Technische Hochschule 80 (ETH80), United States Postal Service (USPS) digit image and Chinese Academy of Sciences, Institute of Automation, Hand-Written Digit Base (CASIA-HWDB) datasets.
Figure 4. Classification accuracies over different numbers of anchors.
Figure 5. Visualization performance of different methods on five letter classes from CASIA-HWDB: Principal Component Analysis (PCA), Locality Preserving Projection (LPP), Linear Discriminant Analysis (LDA), Semi-supervised Discriminant Analysis (SDA), Manifold Regularization (MR) and Sub-Graph Regularization (SGR).
Figure 6. Visualization performance of different methods on five digit classes from CASIA-HWDB: Principal Component Analysis (PCA), Locality Preserving Projection (LPP), Linear Discriminant Analysis (LDA), Semi-supervised Discriminant Analysis (SDA), Manifold Regularization (MR) and Sub-Graph Regularization (SGR).
Table 1. The computational complexity of the different stages of the proposed semi-supervised learning (SSL) method.

| The Proposed Method | First Stage (Initialization) | Second Stage (The Proposed Model) | Third Stage (SSL) | Total (Considering Large-Scale Data, $q \ll l+u$) |
|---|---|---|---|---|
| Computational Complexity | $O(q(l+u))$ | $O(q(l+u))$ | $O(q^3 + q(l+u))$ | $O(q(l+u)) + O(q(l+u)) + O(q^3 + q(l+u)) = O(q(l+u) + q^3)$ |
Table 2. Information of the different datasets.

| Dataset | Database Type | Sample | Dim | Class | Train per Class | Test per Class |
|---|---|---|---|---|---|---|
| Extended Yale-B [30] | Face | 16,123 | 1024 | 38 | 80% | 20% |
| CMU-PIE [31] | Face | 11,000 | 1024 | 68 | 80% | 20% |
| COIL100 [32] | Object | 7200 | 1024 | 100 | 58 | 14 |
| ETH80 [33] | Object | 3280 | 1024 | 80 | 33 | 8 |
| USPS [34] | Hand-written digits | 9298 | 256 | 10 | 800 | remaining |
| CASIA-HWDB [35] | Hand-written letters | 12,456 | 256 | 52 | 200 | remaining |
Table 3. Classification accuracies of the Yale-B dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 53.1 ± 1.1 | 52.7 ± 1.0 | 68.8 ± 2.0 | 67.7 ± 0.6 | 75.2 ± 1.1 | 73.7 ± 1.3 | 80.0 ± 1.8 | 78.8 ± 1.2 |
| MR | 59.0 ± 1.2 | 58.5 ± 1.3 | 70.3 ± 1.1 | 69.4 ± 0.5 | 76.4 ± 1.3 | 74.9 ± 1.5 | 80.7 ± 1.3 | 79.0 ± 1.1 |
| LGC | 64.7 ± 1.0 | - | 71.8 ± 1.1 | - | 76.4 ± 4.2 | - | 80.8 ± 1.0 | - |
| SLP | 65.6 ± 2.3 | - | 73.9 ± 1.0 | - | 78.0 ± 1.8 | - | 81.8 ± 1.0 | - |
| LNP | 64.9 ± 1.3 | 53.8 ± 2.7 | 72.0 ± 1.2 | 71.2 ± 0.4 | 78.0 ± 2.4 | 76.6 ± 2.1 | 81.6 ± 1.0 | 80.0 ± 1.4 |
| AGR | 66.6 ± 1.5 | 65.8 ± 1.3 | 74.3 ± 1.2 | 72.2 ± 0.4 | 78.1 ± 1.5 | 77.3 ± 1.7 | 83.0 ± 1.2 | 80.0 ± 4.5 |
| EAGR | 66.9 ± 0.8 | 66.5 ± 1.8 | 74.4 ± 1.1 | 73.2 ± 1.5 | 78.0 ± 1.5 | 77.2 ± 1.9 | 84.4 ± 2.4 | 83.6 ± 3.1 |
| SGR | 69.9 ± 0.4 | 67.2 ± 1.0 | 75.7 ± 1.1 | 74.0 ± 3.3 | 79.4 ± 1.0 | 78.3 ± 1.1 | 86.3 ± 2.5 | 82.8 ± 2.4 |
Table 4. Classification accuracies of the CMU-PIE dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 42.5 ± 1.3 | 41.5 ± 1.1 | 56.8 ± 2.2 | 55.8 ± 1.5 | 64.6 ± 1.2 | 63.8 ± 1.8 | 69.3 ± 1.7 | 68.9 ± 1.2 |
| MR | 47.8 ± 1.1 | 46.7 ± 1.6 | 59.3 ± 1.8 | 58.8 ± 1.3 | 65.6 ± 1.6 | 64.5 ± 1.6 | 69.9 ± 1.4 | 69.1 ± 1.4 |
| LGC | 53.5 ± 1.6 | - | 60.3 ± 1.7 | - | 66.5 ± 2.8 | - | 70.5 ± 1.3 | - |
| SLP | 55.3 ± 1.9 | - | 63.4 ± 1.8 | - | 67.2 ± 1.9 | - | 70.9 ± 1.3 | - |
| LNP | 55.2 ± 1.2 | 54.8 ± 1.9 | 62.9 ± 1.5 | 61.8 ± 0.9 | 68.3 ± 2.7 | 67.3 ± 2.3 | 71.1 ± 1.2 | 71.0 ± 1.6 |
| AGR | 56.4 ± 1.4 | 55.3 ± 1.8 | 64.8 ± 1.3 | 64.7 ± 0.5 | 68.5 ± 2.1 | 66.9 ± 1.8 | 72.8 ± 1.7 | 71.3 ± 3.5 |
| EAGR | 57.2 ± 1.0 | 56.4 ± 1.6 | 64.4 ± 1.2 | 63.7 ± 1.9 | 68.4 ± 1.8 | 67.7 ± 2.3 | 73.1 ± 2.0 | 72.4 ± 2.7 |
| SGR | 59.0 ± 0.7 | 58.4 ± 1.3 | 65.6 ± 1.2 | 64.6 ± 1.9 | 69.8 ± 1.6 | 67.9 ± 1.6 | 75.0 ± 2.4 | 73.9 ± 2.3 |
Table 5. Classification accuracies of the COIL100 dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 83.6 ± 0.9 | 83.2 ± 0.8 | 88.5 ± 0.8 | 86.6 ± 0.8 | 91.8 ± 0.8 | 91.4 ± 0.7 | 95.3 ± 0.8 | 94.5 ± 1.6 |
| MR | 83.7 ± 1.0 | 83.4 ± 0.9 | 89.0 ± 0.9 | 87.3 ± 0.9 | 92.1 ± 0.8 | 91.6 ± 0.9 | 95.3 ± 0.7 | 94.7 ± 1.3 |
| LGC | 85.5 ± 0.8 | - | 89.3 ± 0.9 | - | 92.4 ± 0.8 | - | 95.5 ± 0.6 | - |
| SLP | 86.4 ± 0.7 | - | 89.3 ± 0.9 | - | 92.8 ± 0.6 | - | 95.6 ± 0.8 | - |
| LNP | 86.5 ± 0.7 | 85.6 ± 0.7 | 89.6 ± 0.9 | 88.7 ± 0.7 | 92.9 ± 0.7 | 92.4 ± 0.8 | 95.8 ± 0.7 | 95.1 ± 1.3 |
| AGR | 86.5 ± 0.6 | 85.8 ± 0.9 | 90.9 ± 0.9 | 88.8 ± 0.8 | 93.3 ± 0.6 | 92.7 ± 0.9 | 95.8 ± 0.7 | 95.3 ± 1.4 |
| EAGR | 86.6 ± 0.7 | 85.7 ± 1.3 | 89.9 ± 0.9 | 89.0 ± 1.5 | 93.2 ± 0.6 | 92.7 ± 1.5 | 96.0 ± 0.7 | 95.2 ± 0.9 |
| SGR | 87.0 ± 0.6 | 86.7 ± 1.0 | 91.8 ± 0.9 | 89.7 ± 0.8 | 94.7 ± 0.6 | 93.2 ± 0.8 | 97.0 ± 0.6 | 95.6 ± 0.9 |
Table 6. Classification accuracies of the ETH80 dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 61.1 ± 1.3 | 59.4 ± 0.3 | 71.1 ± 1.9 | 70.2 ± 2.0 | 75.9 ± 1.5 | 75.3 ± 3.1 | 78.9 ± 2.0 | 77.9 ± 2.5 |
| MR | 62.3 ± 0.8 | 60.0 ± 0.2 | 71.7 ± 2.0 | 71.0 ± 2.7 | 76.2 ± 1.0 | 75.3 ± 2.8 | 78.9 ± 1.9 | 78.3 ± 2.5 |
| LGC | 65.7 ± 1.4 | - | 73.5 ± 1.4 | - | 76.8 ± 1.5 | - | 79.0 ± 1.7 | - |
| SLP | 65.9 ± 1.5 | - | 73.9 ± 1.2 | - | 76.9 ± 1.6 | - | 79.3 ± 1.8 | - |
| LNP | 64.9 ± 0.9 | 62.2 ± 0.2 | 73.4 ± 2.0 | 71.4 ± 2.6 | 76.7 ± 1.1 | 76.0 ± 2.6 | 79.0 ± 1.8 | 78.5 ± 2.0 |
| AGR | 66.4 ± 1.6 | 65.1 ± 0.2 | 75.0 ± 1.7 | 72.2 ± 2.2 | 76.9 ± 1.7 | 76.1 ± 2.5 | 79.6 ± 2.0 | 78.9 ± 1.9 |
| EAGR | 68.2 ± 1.7 | 67.7 ± 2.1 | 74.9 ± 1.4 | 74.2 ± 1.9 | 77.3 ± 1.7 | 77.0 ± 1.9 | 80.0 ± 2.2 | 79.4 ± 2.8 |
| SGR | 69.4 ± 1.9 | 67.2 ± 0.1 | 74.0 ± 1.3 | 74.2 ± 2.2 | 77.5 ± 1.9 | 77.3 ± 1.8 | 79.8 ± 2.2 | 79.0 ± 2.2 |
Table 7. Classification accuracies of the USPS dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 71.7 ± 0.7 | 70.6 ± 1.5 | 77.9 ± 0.7 | 77.8 ± 0.2 | 91.9 ± 4.4 | 90.9 ± 4.2 | 96.1 ± 1.9 | 95.7 ± 0.9 |
| MR | 74.1 ± 0.7 | 73.0 ± 1.5 | 80.9 ± 0.8 | 79.8 ± 0.1 | 92.6 ± 3.4 | 91.7 ± 3.4 | 96.1 ± 2.2 | 95.0 ± 1.0 |
| LGC | 74.7 ± 0.7 | - | 87.1 ± 0.8 | - | 94.6 ± 3.3 | - | 96.5 ± 2.3 | - |
| SLP | 75.0 ± 0.5 | - | 89.7 ± 0.7 | - | 95.4 ± 3.0 | - | 96.5 ± 2.3 | - |
| LNP | 76.5 ± 0.6 | 74.8 ± 0.8 | 92.0 ± 0.7 | 90.8 ± 0.5 | 95.5 ± 3.4 | 95.0 ± 3.4 | 96.9 ± 2.5 | 96.5 ± 0.9 |
| AGR | 78.7 ± 0.6 | 76.1 ± 0.7 | 93.6 ± 0.7 | 92.6 ± 0.7 | 96.0 ± 2.4 | 95.8 ± 2.4 | 97.1 ± 2.8 | 96.7 ± 0.9 |
| EAGR | 79.9 ± 0.6 | 79.4 ± 1.2 | 93.6 ± 0.7 | 92.9 ± 1.1 | 96.3 ± 3.6 | 95.5 ± 3.5 | 97.2 ± 1.7 | 96.3 ± 2.2 |
| SGR | 80.7 ± 0.5 | 79.7 ± 0.7 | 95.0 ± 0.5 | 93.3 ± 0.8 | 97.2 ± 3.1 | 96.2 ± 3.1 | 97.4 ± 1.5 | 97.3 ± 0.7 |
Table 8. Classification accuracies of the CASIA-HWDB dataset.

| Methods | 5% Labeled: Unlabeled | 5% Labeled: Test | 10% Labeled: Unlabeled | 10% Labeled: Test | 15% Labeled: Unlabeled | 15% Labeled: Test | 20% Labeled: Unlabeled | 20% Labeled: Test |
|---|---|---|---|---|---|---|---|---|
| SVM | 56.8 ± 5.4 | 55.8 ± 0.6 | 65.7 ± 0.6 | 64.0 ± 1.7 | 79.0 ± 0.5 | 78.2 ± 4.0 | 83.4 ± 1.8 | 82.1 ± 1.9 |
| MR | 58.7 ± 3.3 | 57.3 ± 0.5 | 73.0 ± 0.6 | 62.0 ± 1.4 | 79.4 ± 0.6 | 78.4 ± 2.7 | 86.6 ± 1.9 | 85.5 ± 1.5 |
| LGC | 63.1 ± 2.4 | - | 76.1 ± 0.4 | - | 80.7 ± 0.5 | - | 88.1 ± 1.4 | - |
| SLP | 63.4 ± 1.6 | - | 77.4 ± 0.4 | - | 85.3 ± 0.5 | - | 88.6 ± 1.7 | - |
| LNP | 66.5 ± 1.4 | 64.8 ± 0.6 | 78.5 ± 0.5 | 77.5 ± 0.7 | 85.9 ± 0.5 | 84.8 ± 1.7 | 89.2 ± 1.7 | 90.6 ± 8.2 |
| AGR | 72.0 ± 0.9 | 71.0 ± 0.6 | 80.9 ± 2.8 | 77.8 ± 0.6 | 87.2 ± 0.5 | 86.4 ± 1.6 | 91.8 ± 1.6 | 90.0 ± 4.1 |
| EAGR | 74.9 ± 0.7 | 74.4 ± 1.2 | 78.6 ± 3.3 | 78.0 ± 3.1 | 87.6 ± 0.4 | 87.2 ± 1.0 | 91.6 ± 1.8 | 91.2 ± 2.2 |
| SGR | 75.3 ± 0.7 | 73.6 ± 0.5 | 83.6 ± 2.2 | 80.3 ± 0.6 | 88.7 ± 0.3 | 86.5 ± 1.6 | 93.2 ± 1.7 | 91.7 ± 3.3 |
