Article

Graph Regularized Within-Class Sparsity Preserving Projection for Face Recognition

Institute of Image Processing & Pattern Recognition, Taizhou University, Taizhou 318000, China
* Author to whom correspondence should be addressed.
Information 2015, 6(2), 152-161; https://doi.org/10.3390/info6020152
Submission received: 18 December 2014 / Revised: 2 April 2015 / Accepted: 10 April 2015 / Published: 24 April 2015
(This article belongs to the Special Issue Intelligent Data Analysis)

Abstract
As a dominant family of methods for face recognition, subspace learning algorithms show desirable performance. Manifold learning can handle the nonlinearity hidden in the data and can project high-dimensional data onto a low-dimensional space while preserving the manifold structure. Sparse representation is robust to noise and very practical for face recognition. In order to extract facial features from face images effectively and robustly, this paper proposes a method called graph regularized within-class sparsity preserving analysis (GRWSPA), which preserves the within-class sparse reconstructive relationship and enhances the separability of different classes. Specifically, each sample is represented by the samples of the same class (excluding itself), and the reconstructive weights are kept unchanged during projection. To preserve the manifold geometry of the original space, an adjacency graph is constructed to characterize the interclass separability and is incorporated into the criterion as a constraint in a supervised manner. As a result, the extracted features are sparse and discriminative, which is helpful for classification. Experiments conducted on two open face databases, ORL and YALE, show that the proposed method can effectively identify the key facial features in face images and achieves higher recognition rates than existing methods.

1. Introduction

Face recognition is an important but complicated problem in computer vision, with broad applications ranging from human-computer interfaces to surveillance. Many algorithms have been proposed in the literature, including two-dimensional and three-dimensional face recognition methods [1,2,3,4]. Three-dimensional face recognition, which uses the 3D geometry of the face for identification, proves to be more robust to illumination, pose and disguise. However, facial expressions are a major issue in 3D face recognition, since the geometry of the face changes significantly with expression. Most images can be seen as two-dimensional matrices, so 2D face recognition has also received tremendous attention in computer vision and pattern recognition. Subspace learning methods, such as principal component analysis (PCA) [5] and linear discriminant analysis (LDA) [6], have been extensively studied. Both seek a low-dimensional representation of the original high-dimensional data that preserves some kind of intrinsic structure.
PCA is an unsupervised method whose projections are obtained by maximizing the total scatter of the data, while LDA is a supervised method that maximizes the ratio of between-class scatter to within-class scatter. Experiments show that LDA outperforms PCA in face recognition. However, it has been reported that face images probably reside on some sort of manifold [7]. One problem with these two algorithms is that they only exploit the linear global Euclidean structure and ignore the local geometric structure. Although they have been extended to nonlinear methods such as KPCA [8] and KLDA [9] via the kernel trick, it is hard to choose a suitable kernel function and the computation is expensive.
Manifold learning tries to find an embedding that projects high-dimensional data onto a low-dimensional space while preserving the intrinsic geometry of the data, especially the local geometry. Representative methods are Isomap [10], locally linear embedding (LLE) [11] and Laplacian eigenmaps (LE) [12]. However, manifold learning algorithms are affected by two critical problems [13]: (i) the construction of the adjacency graph, and (ii) the embedding of new test data, also called the out-of-sample problem. As for the latter problem, He proposed a linear method named locality preserving projections (LPP) [14] to approximate the eigenfunctions of the Laplace–Beltrami operator on the manifold; that is, LPP is a linear version of LE. By considering local information and class label information, many variants [15,16,17,18] have been proposed and can achieve good performance. One critical step in these methods is constructing the adjacency graph; however, how to define sparse adjacency weight matrices is still an open problem.
The traditional approach to the adjacency graph is to use the $k$-nearest-neighbor graph or the $\varepsilon$-neighborhood graph. However, both methods are parametric, and the performance of the resulting algorithms depends heavily on the choice of the parameter. In [19], it is reported that the adjacency graph structure and the graph weights are highly interrelated and should not be separated, so it is desirable to design a method that carries out both tasks in one step. To this end, two $\ell_1$-graph construction methods [20,21] have recently been proposed, which complete the adjacency graph and the graph weight calculation in a single step.
Recently, sparse representation (SR) [22] has been extensively studied and has found wide application in computer vision and image processing. The main idea of SR is that a query image can be sparsely represented as a linear combination of all the training samples; its non-zero representation coefficients are naturally sparse and mostly come from the same class as the query image. SR is an unsupervised method, but it exploits the discriminative nature of sparse representation for classification. Based on this idea, Qiao proposed sparsity preserving projection (SPP) [23] for feature extraction, which tries to preserve the sparse reconstructive relationship of the samples in the low-dimensional space by minimizing the distance between the sparsely reconstructed samples and the original samples. However, some issues remain. First, SPP is unsupervised and does not make use of class information. Second, when the dictionary is large, SPP is very time-consuming.
To this end, this paper proposes a method called graph regularized within-class sparsity preserving analysis (GRWSPA). Like SPP, it preserves the sparse reconstructive relationship, but only within each class, by minimizing the distance between each sample and its sparse reconstruction from samples of the same class; this reduces computation time, since the number of samples per class is usually much smaller than the total number of training samples. At the same time, assuming that samples from different classes lie on different sub-manifolds, it maximizes the scatter of inter-class samples by constructing a between-class adjacency graph, pulling samples from different classes as far apart as possible.
The rest of the paper is organized as follows. In Section 2, SPP is briefly reviewed. The proposed algorithm is presented in Section 3. In Section 4, experiments are carried out to evaluate the proposed algorithm. Finally, the conclusions are drawn in Section 5.

2. Sparsity Preserving Projections

Let $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^D$, $i = 1, 2, \ldots, n$ be the training samples. In real applications, the dimensionality $D$ is often very high. One reasonable approach is dimensionality reduction, which maps the data from the high-dimensional space to a low-dimensional one. Mathematically, $y_i = A^T x_i \in \mathbb{R}^d$ for any $x_i$, usually with $d \ll D$, where $A$ is called the transformation matrix.
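As a minimal illustration of this mapping (with made-up dimensions and a random stand-in for the learned matrix $A$), the projection is a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 100, 10, 20             # hypothetical original dim, reduced dim, sample count
X = rng.standard_normal((D, n))   # columns are the samples x_i
A = rng.standard_normal((D, d))   # stand-in transformation matrix (learned by SPP/GRWSPA)

Y = A.T @ X                       # y_i = A^T x_i, stacked as columns
print(Y.shape)                    # each sample now lives in R^d
```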
The idea of SPP is that every training sample can be represented as a linear combination of the remaining samples. That is, $x_i = X \alpha_i$, where $\alpha_i$ has the form $\alpha_i = (a_1, a_2, \ldots, a_{i-1}, 0, a_{i+1}, \ldots, a_n)^T$ and most of its elements are zero. This can be formulated as
$$\min_{\alpha_i} \|\alpha_i\|_0 \quad \text{s.t.} \quad x_i = X \alpha_i$$
where $\|\alpha_i\|_0$ denotes the $\ell_0$ norm, i.e., the number of non-zero entries in $\alpha_i$. This problem is NP-hard; however, if $\alpha_i$ is sparse enough, the optimization above can be relaxed to
$$\min_{\alpha_i} \|\alpha_i\|_1 \quad \text{s.t.} \quad x_i = X \alpha_i$$
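In practice the equality-constrained $\ell_1$ program is often relaxed to a Lasso objective $\tfrac{1}{2}\|x - X\alpha\|^2 + \lambda \|\alpha\|_1$ and handed to a convex solver. As a self-contained sketch (synthetic data, a hypothetical $\lambda$, and iterative soft-thresholding rather than the solver of [24]):

```python
import numpy as np

def ista(X, x, lam=0.1, n_iter=500):
    """Minimize 0.5*||x - X a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    a = np.zeros(X.shape[1])
    t = 1.0 / np.linalg.norm(X, 2) ** 2          # step size from the Lipschitz constant
    for _ in range(n_iter):
        g = a + t * (X.T @ (x - X @ a))          # gradient step on the quadratic term
        a = np.sign(g) * np.maximum(np.abs(g) - t * lam, 0.0)  # soft-threshold
    return a

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 20))
true = np.zeros(20)
true[[3, 7]] = [1.5, -2.0]                       # a genuinely sparse code
x = X @ true
a = ista(X, x)
print(np.flatnonzero(np.abs(a) > 0.1))           # indices of the recovered support
```

The recovered coefficient vector is sparse, with its large entries concentrated on the two atoms that actually generated $x$.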
This can be solved by standard convex programming methods [24]. Let $\tilde{\alpha}_i$ be the optimal solution of the above optimization. SPP then tries to preserve the sparse reconstruction relationship, expressed as the following optimization:
$$\min_A \sum_{i=1}^{n} \|A^T x_i - A^T X \tilde{\alpha}_i\|^2 \quad \text{s.t.} \quad A^T X X^T A = I$$
The objective can be simplified by simple algebra:
$$\sum_{i=1}^{n} \|A^T x_i - A^T X \tilde{\alpha}_i\|^2 = \mathrm{tr}\Big( A^T \sum_{i=1}^{n} (x_i - X \tilde{\alpha}_i)(x_i - X \tilde{\alpha}_i)^T A \Big) = \mathrm{tr}\Big( A^T X \Big( \sum_{i=1}^{n} (e_i - \tilde{\alpha}_i)(e_i - \tilde{\alpha}_i)^T \Big) X^T A \Big) = \mathrm{tr}\big( A^T X (I - S - S^T + S^T S) X^T A \big)$$
where $e_i$ denotes the $i$-th column of the $n \times n$ identity matrix.
So the optimal projections are the eigenvectors associated with the smallest eigenvalues of the following generalized eigenvalue problem:
$$X (I - S - S^T + S^T S) X^T A = \lambda X X^T A$$
where $S = [\tilde{\alpha}_1, \tilde{\alpha}_2, \ldots, \tilde{\alpha}_n]$.
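This generalized eigenproblem can be reduced to a standard symmetric one by whitening with a Cholesky factor of $XX^T$ (a small ridge is added, since $XX^T$ is often singular for face data). A sketch in numpy, with random stand-ins for $X$ and the sparse weight matrix $S$:

```python
import numpy as np

def smallest_gen_eigvecs(M, C, d, eps=1e-8):
    """Eigenvectors of M a = lambda C a with the d smallest eigenvalues,
    via Cholesky whitening of the (ridge-regularized) constraint matrix C."""
    R = np.linalg.cholesky(C + eps * np.eye(C.shape[0]))  # C ~= R R^T
    Rinv = np.linalg.inv(R)
    vals, vecs = np.linalg.eigh(Rinv @ M @ Rinv.T)        # ascending eigenvalues
    return vals[:d], Rinv.T @ vecs[:, :d]                 # map eigenvectors back

rng = np.random.default_rng(2)
D, n = 30, 40
X = rng.standard_normal((D, n))
S = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.1)  # stand-in sparse weights
np.fill_diagonal(S, 0.0)
I = np.eye(n)
M = X @ (I - S - S.T + S.T @ S) @ X.T                     # SPP scatter term
C = X @ X.T
vals, A = smallest_gen_eigvecs(M, C, d=5)
resid = M @ A[:, 0] - vals[0] * ((C + 1e-8 * np.eye(D)) @ A[:, 0])
print(np.linalg.norm(resid))                              # near zero: a valid eigenpair
```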

3. Graph Regularized Within-Class Sparsity Preserving Analysis

As shown in the previous section, SPP is an unsupervised method and does not properly use the label information. Moreover, the sparse representations are computed over the whole training set; if the number of training samples is large, this process is computationally expensive. In this section we present an improved SPP algorithm.

3.1. Preserving the Sparsity Structure of Within-Class Samples

In sparse representation, a test sample $x_i$ can be represented as a linear combination of all training samples, and the non-zero coefficients $w_{ij}$ reflect the contribution of $x_j$ to the reconstruction of $x_i$. The higher the value of $w_{ij}$, the more similar $x_j$ and $x_i$ are; the large coefficients are expected to concentrate on training samples from the same class as the test sample. A small $w_{ij}$ means that $x_j$ contributes little to reconstructing $x_i$ and probably comes from a different class. SPP, however, does not consider class information: its adjacency graph weights are computed by sparse representation over the whole training set used as the dictionary, which is very time-consuming when the number of training samples is large. One solution is to take only the samples of the same class as the dictionary for reconstructing $x_i$. As in SPP, this can be formulated as:
$$\min_{w_{k,i}} \|w_{k,i}\|_1 \quad \text{s.t.} \quad x_{k,i} = X_{k,i} w_{k,i}, \quad \sum_j w_{k,i}^j = 1$$
where $x_{k,i}$ denotes the $i$-th sample of the $k$-th class, $i = 1, 2, \ldots, n_k$, $k = 1, 2, \ldots, c$; here $n_k$ is the number of samples in the $k$-th class and $c$ is the number of classes. $X_{k,i}$ denotes the samples of the $k$-th class. The sparse representation coefficients $w_{k,i}$ have the form $w_{k,i} = (w_{k,i}^1, w_{k,i}^2, \ldots, w_{k,i}^{i-1}, 0, w_{k,i}^{i+1}, \ldots, w_{k,i}^{n_k})^T$. Let $\tilde{w}_{k,i} = (0, \ldots, 0, w_{k,i}^T, 0, \ldots, 0)^T$ be the corresponding vector padded to length $n$; the weight matrix then has the form $W = (\tilde{w}_{1,1}, \ldots, \tilde{w}_{1,n_1}, \tilde{w}_{2,1}, \ldots, \tilde{w}_{2,n_2}, \ldots, \tilde{w}_{c,1}, \ldots, \tilde{w}_{c,n_c})$.
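The bookkeeping here, fitting each sample against its classmates only and scattering the weights into the columns of an $n \times n$ matrix $W$, can be sketched as follows. The sparse solver is stubbed out with a plain least-squares fit (the $\ell_1$ and sum-to-one constraints are omitted), purely to show the structure:

```python
import numpy as np

def build_W(X, labels):
    """For each sample i, fit weights on its classmates (itself excluded)
    and scatter them into column i of an n x n matrix W."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        w, *_ = np.linalg.lstsq(X[:, idx], X[:, i], rcond=None)  # stand-in for the l1 fit
        W[idx, i] = w
    return W

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 12))
labels = np.repeat([0, 1, 2], 4)           # 3 classes, 4 samples each
W = build_W(X, labels)
print(np.count_nonzero(W))                 # at most n_k - 1 = 3 nonzeros per column
```

By construction $W$ is block-diagonal with respect to the class ordering and has a zero diagonal, which is exactly the structure assumed above.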
Like SPP, we hope that the sparse structure can be well preserved, which can be solved by the following formulation:
$$\min_A \sum_{k=1}^{c} \sum_{i=1}^{n_k} \|A^T x_{k,i} - A^T X \tilde{w}_{k,i}\|^2 \quad \text{s.t.} \quad A^T X X^T A = I$$
The above optimization can be reduced to the following problem:
$$\max_A \mathrm{tr}(A^T X S X^T A) \quad \text{s.t.} \quad A^T X X^T A = I$$
where $S = W + W^T - W^T W$.

3.2. Discovering the Discriminant Structure of Between-Class Samples

Samples from different classes are assumed to lie on different sub-manifolds; a reasonable strategy for classification is to map these sub-manifolds as far apart as possible. We construct an adjacency graph $G = (X, B)$ over the training data $X$ to characterize the relationship between different classes. The elements of the weight matrix $B$ are defined as follows:
$$B_{ij} = \begin{cases} 1 - \dfrac{x_i^T x_j}{\|x_i\| \, \|x_j\|} & \text{if } x_i \text{ and } x_j \text{ are } k\text{-nearest neighbors but have different labels} \\ 0 & \text{otherwise} \end{cases}$$
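A possible construction of $B$, and of the Laplacian $L = D - B$ used below, could look like the following sketch (random stand-in data, a hypothetical neighborhood size $k$, and the cosine-based weight given above):

```python
import numpy as np

def between_class_graph(X, labels, k=3):
    """Weight B_ij = 1 - cos(x_i, x_j) for k-NN pairs with different labels."""
    n = X.shape[1]
    Xn = X / np.linalg.norm(X, axis=0)              # unit-normalize columns
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    B = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]         # k nearest neighbors, self excluded
        for j in nbrs:
            if labels[i] != labels[j]:
                B[i, j] = B[j, i] = 1.0 - Xn[:, i] @ Xn[:, j]  # keep graph symmetric
    L = np.diag(B.sum(axis=1)) - B                  # Laplacian L = D - B
    return B, L

rng = np.random.default_rng(4)
X = rng.standard_normal((10, 15))
labels = np.repeat([0, 1, 2], 5)
B, L = between_class_graph(X, labels)
print(np.allclose(L.sum(axis=1), 0))                # Laplacian rows sum to zero
```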
To guarantee discriminant ability in the low-dimensional representation, as in unsupervised discriminant projection (UDP) [25], we require that points connected in the adjacency graph stay as distant as possible, which can be expressed as the following optimization:
$$\max \frac{1}{2 n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|y_i - y_j\|^2 B_{ij}$$
where $y_i$ is the low-dimensional representation of $x_i$. This objective incurs a heavy penalty if nearby points $x_i$ and $x_j$ belonging to different classes are mapped close together. It thus ensures that if $x_i$ and $x_j$ are close but come from different classes, then $y_i$ and $y_j$ are far apart; this encodes local discriminant information and is helpful for classification.
We can simplify the above optimization as follows:
$$J_B = \frac{1}{2 n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|y_i - y_j\|^2 B_{ij} = \frac{1}{n^2} \mathrm{tr}(A^T S_B A)$$
where $S_B$ is called the Laplacian difference scatter matrix:
$$S_B = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ij} (x_i - x_j)(x_i - x_j)^T = \frac{1}{2} \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ij} x_i x_i^T - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ij} x_i x_j^T + \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ij} x_j x_j^T \Big) = \sum_{i=1}^{n} D_{ii} x_i x_i^T - \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ij} x_i x_j^T = X D X^T - X B X^T = X L X^T$$
where $D$ is a diagonal matrix with $D_{ii} = \sum_j B_{ij}$, and $L = D - B$ is the Laplacian matrix.

3.3. GRWSPA

To take both the within-class reconstruction relationship and the between-class separability into account, it is desirable to keep the within-class reconstruction weights, as in SPP, while maximizing the local discriminant information. Combining Sections 3.1 and 3.2 yields the following optimization:
$$\max_A \mathrm{tr}(A^T X S X^T A) + \mu \, \mathrm{tr}(A^T X L X^T A) \quad \text{s.t.} \quad A^T X X^T A = I$$
where μ is a factor to balance the sparse representation and the discriminant ability.
For a compact expression, the maximization problem can be further transformed into:
$$\max_A \frac{\mathrm{tr}(A^T X S X^T A) + \mu \, \mathrm{tr}(A^T X L X^T A)}{\mathrm{tr}(A^T X X^T A)}$$
The optimal $A$ then consists of the eigenvectors corresponding to the largest $d$ eigenvalues of the following generalized eigenvalue problem:
$$(X S X^T + \mu X L X^T) A = \lambda X X^T A$$
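Putting the pieces together, the projection $A$ can be obtained with any symmetric-definite generalized eigensolver. A numpy sketch, with random stand-ins for $X$, $S$ and $L$ and a small ridge to keep $XX^T$ invertible:

```python
import numpy as np

def grwspa_projection(X, S, L, mu, d, eps=1e-8):
    """Top-d generalized eigenvectors of (X S X^T + mu X L X^T) A = lambda X X^T A."""
    M = X @ S @ X.T + mu * (X @ L @ X.T)     # S and L are symmetric by construction
    C = X @ X.T + eps * np.eye(X.shape[0])
    R = np.linalg.cholesky(C)                # whiten the constraint A^T C A = I
    Rinv = np.linalg.inv(R)
    vals, vecs = np.linalg.eigh(Rinv @ M @ Rinv.T)
    return Rinv.T @ vecs[:, ::-1][:, :d]     # largest eigenvalues first

rng = np.random.default_rng(5)
D, n, d = 25, 30, 5
X = rng.standard_normal((D, n))
W = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.1)
S = W + W.T - W.T @ W                        # within-class term (Section 3.1)
B = np.abs(rng.standard_normal((n, n)))
B = 0.5 * (B + B.T)
np.fill_diagonal(B, 0)
L = np.diag(B.sum(axis=1)) - B               # between-class Laplacian (Section 3.2)
A = grwspa_projection(X, S, L, mu=1.0, d=d)
Y = A.T @ X                                  # d-dimensional features for classification
print(Y.shape)
```

The returned columns satisfy the constraint $A^T X X^T A = I$ up to the ridge, which is what the whitening step guarantees.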

4. Experimental Section

In this section, several experiments are carried out on the ORL and YALE databases to show the effectiveness of the proposed algorithm. We compare our method with several classic methods, including LDA, LPP, UDP and SPP. For classification, we use the nearest neighbor classifier for its easy implementation. For the parameter $\mu$, we set $\mu = \lambda_{\max}(X S X^T) / \lambda_{\max}(X L X^T)$, where $\lambda_{\max}(X S X^T)$ denotes the maximum eigenvalue of $X S X^T$. Note that some matrices encountered during feature extraction are singular, so PCA is employed as a preprocessing step, keeping 98% of the image energy. For UDP, the neighborhood size is set to $k = n_i - 1$, where $n_i$ is the number of samples in the $i$-th class.
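The $\mu$ heuristic above can be computed directly from the two scatter matrices; a small sketch with random stand-in matrices:

```python
import numpy as np

def balance_mu(X, S, L):
    """mu = lambda_max(X S X^T) / lambda_max(X L X^T), per the paper's heuristic."""
    lmax = lambda M: np.linalg.eigvalsh(0.5 * (M + M.T)).max()
    return lmax(X @ S @ X.T) / lmax(X @ L @ X.T)

rng = np.random.default_rng(6)
X = rng.standard_normal((15, 20))
S = rng.standard_normal((20, 20))
S = 0.5 * (S + S.T)                          # stand-in within-class term
B = np.abs(rng.standard_normal((20, 20)))
B = 0.5 * (B + B.T)
np.fill_diagonal(B, 0)
L = np.diag(B.sum(axis=1)) - B               # stand-in between-class Laplacian
mu = balance_mu(X, S, L)
print(mu)                                    # a positive scalar balancing the two terms
```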
The ORL database contains 40 individuals, each with 10 sample images showing variations in pose, facial expression and other details. The images were taken at different times and vary in expression (open or closed eyes, smiling or not smiling), and some were captured with a tolerance for tilting and rotation of the face of up to 20 degrees. Figure 1 shows some samples of one subject from the ORL database.
Figure 1. Samples of one subject from ORL database.
We randomly choose $l = 3, 4, 5, 6, 7$ images from each class for training and use the remainder for testing. For each $l$, we run each algorithm 10 times and report the average as the recognition rate. Table 1 gives the classification accuracy rates (%) for each algorithm under different training set sizes.
Table 1. Recognition Rates (%) on ORL.

| Training | PCA | LDA | UDP | SPP | GRWSPA |
|---|---|---|---|---|---|
| 3 | 78.2 | 84.7 | 82.8 | 83.2 | 82.8 |
| 4 | 83.7 | 90.8 | 88.2 | 88.8 | 89.9 |
| 5 | 86.8 | 93.7 | 88.7 | 90.4 | 95.2 |
| 6 | 89.1 | 95.6 | 93.8 | 91.5 | 96.8 |
| 7 | 92.4 | 96.9 | 94.7 | 94.8 | 98.1 |
To see how dimensionality affects the recognition rate, Figure 2 shows the recognition rates of the different methods with respect to dimensionality on the ORL database with four training samples per person.
Figure 2. Recognition rates vs. dimensionality on ORL database.
The YALE database contains 165 images of 15 subjects, with 11 images each. The images were captured with variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). Figure 3 shows some samples of one subject from the YALE database.
Figure 3. Samples of one subject from YALE database.
We again randomly choose $l = 3, 4, 5, 6, 7$ images from each class for training and use the remainder for testing. For each $l$, we run each algorithm 10 times and report the average as the recognition rate. Table 2 gives the classification accuracy rates (%) for each algorithm under different training set sizes.
Table 2. Recognition Rates (%) on YALE.

| Training | PCA | LDA | UDP | SPP | GRWSPA |
|---|---|---|---|---|---|
| 3 | 70.9 | 72.9 | 74.8 | 75.7 | 76.5 |
| 4 | 73.4 | 74.5 | 75.8 | 77.4 | 75.9 |
| 5 | 73.9 | 75.9 | 78.2 | 79.3 | 82.6 |
| 6 | 75.3 | 76.2 | 80.5 | 81.4 | 85.5 |
| 7 | 76.8 | 78.1 | 82.4 | 83.6 | 87.8 |
From Figure 2 and the tables above, we can see that all the algorithms perform better on ORL than on YALE, probably because the ORL images have less variation than the YALE images. LDA and UDP outperform PCA; this is probably because PCA is representative in the low-dimensional space and suited to reconstruction, while LDA is supervised and takes class information into account. UDP, as a manifold learning algorithm, makes use of both the local and non-local information of the face images and demonstrates its effectiveness in feature extraction. SPP is based on sparse representation, which preserves the sparse reconstructive relationship of the data and carries natural discriminant information even though it is unsupervised. The proposed algorithm, on the one hand, preserves the within-class sparse reconstructive relationships like SPP and, on the other hand, maximizes the scatter of samples from different classes; after projection, data from the same class are compact while data from different classes are far apart. Hence the proposed algorithm performs much better than PCA, LDA, LPP, UDP and SPP.

5. Conclusions

In this paper, based on sparsity preserving projection, we propose a new algorithm called graph regularized within-class sparsity preserving analysis (GRWSPA). GRWSPA preserves the within-class sparse reconstruction weights so as to discover the intrinsic information, while maximizing the between-class scatter so that, after projection, samples from different classes are far apart. Experiments on the ORL and YALE face databases demonstrate the performance advantage of the proposed algorithm over the others.

Acknowledgments

This research was supported by Zhejiang Provincial Natural Science Foundation of China (Nos. LQ15F020001, LY14F020036), Scientific Research Fund of Zhejiang Provincial Education Department (Y201223744) and National Science Foundation of China under Grant No. 6127226.

Author Contributions

Songjiang Lou designed and wrote the paper, Ying Chen did the experiments and data analysis, Wenping Guo surveyed the related works, Xiaoming Zhao supervised the work. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sandbach, G.; Zafeiriou, S.; Pantic, M.; Yin, L. Static and Dynamic 3D Facial Expression Recognition: A Comprehensive Survey. Image Vis. Comput. 2012, 30, 683–697. [Google Scholar] [CrossRef]
  2. Vezzetti, E.; Marcolin, F. 3D human face description: Landmarks measures and geometrical feature. Image Vis. Comput. 2012, 30, 698–712. [Google Scholar] [CrossRef]
  3. Pears, N.; Heseltine, T.; Romero, M. From 3D point clouds to pose-normalised depth maps. Int. J. Comput. Vis. 2010, 89, 152–176. [Google Scholar] [CrossRef]
  4. Vezzetti, E.; Marcolin, F.; Stola, V. 3D human face soft tissues landmarking method: An advanced approach. Comput. Ind. 2013, 64, 1326–1354. [Google Scholar] [CrossRef]
  5. Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3, 71–86. [Google Scholar] [CrossRef]
  6. Belhumeur, P.N.; Hespanha, J.P.; Kriegman, D.J. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [Google Scholar] [CrossRef]
  7. Seung, H.S.; Lee, D.D. The Manifold Ways of Perception. Science 2000, 290, 2268–2269. [Google Scholar] [CrossRef] [PubMed]
  8. Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef]
  9. Baudat, G.; Anouar, F. Generalized Discriminant Analysis Using Kernel Approach. Neural Comput. 2000, 12, 2385–2404. [Google Scholar] [CrossRef] [PubMed]
  10. Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef] [PubMed]
  11. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [PubMed]
  12. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef]
  13. Raducanu, B.; Dornaika, F. Embedding new observations via sparse-coding for non-linear manifold learning. Pattern Recognit. 2014, 47, 480–492. [Google Scholar] [CrossRef]
  14. He, X.; Yan, S.; Hu, Y.; Niyogi, P.; Zhang, H.-J. Face recognition using Laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 328–340. [Google Scholar] [CrossRef] [PubMed]
  15. Chen, H.-T.; Chang, H.-W.; Liu, T.-L. Local Discriminant Embedding and Its Variants. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; IEEE Computer Society: Washington, DC, USA, 2005; pp. 846–853. [Google Scholar]
  16. Yan, S.; Xu, D.; Zhang, B.; Zhang, H.-J.; Yang, Q.; Lin, S. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 40–51. [Google Scholar] [CrossRef] [PubMed]
  17. Fu, Y.; Yan, S.; Huang, T.S. Classification and feature extraction by simplexization. IEEE Trans. Inf. Forensics Secur. 2008, 3, 91–100. [Google Scholar] [CrossRef]
  18. Zhang, T.; Yang, J.; Wang, H.; Du, C.; Zhao, D. Maximum variance projections for face recognition. Opt. Eng. 2007, 46, 067206:1–067206:8. [Google Scholar]
  19. Wang, Y.; Zhao, Y.; Zhang, L.; Liang, J.; Zeng, M.; Liu, X. Graph Construction Based on Re-weighted Sparse Representation for Semi-supervised Learning. J. Inf. Comput. Sci. 2013, 10, 375–383. [Google Scholar]
  20. Cheng, H.; Liu, Z.; Yang, J. Sparsity induced similarity measure for label propagation. In Proceedings of IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan, 27 September–4 October 2009; pp. 317–324.
  21. Yan, S.; Wang, H. Semi-supervised Learning by Sparse Representation. In Proceedings of the 9th SIAM International Conference on Data Mining (SDM 09), Sparks, NV, USA, 30 April–2 May 2009; pp. 792–801.
  22. Wright, J.; Yang, A.Y.; Ganesh, A.; Shankar, S.S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [PubMed]
  23. Qiao, L.; Chen, S.; Tan, X. Sparsity Preserving Projections with Applications to Face Recognition. Pattern Recognit. 2010, 43, 331–341. [Google Scholar] [CrossRef]
  24. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Rev. 2001, 43, 129–159. [Google Scholar] [CrossRef]
  25. Yang, J.; Zhang, D.; Yang, J.-Y.; Niu, B. Globally maximizing, locally minimizing: Unsupervised discriminant projection with application to face and palm biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 650–664. [Google Scholar] [CrossRef] [PubMed]

Share and Cite

MDPI and ACS Style

Lou, S.; Zhao, X.; Guo, W.; Chen, Y. Graph Regularized Within-Class Sparsity Preserving Projection for Face Recognition. Information 2015, 6, 152-161. https://doi.org/10.3390/info6020152
