1. Introduction
With the rapid development of information technology and artificial intelligence, the scale and dimensionality of data are increasing significantly. This vast amount of high-dimensional data poses great challenges to traditional machine learning and statistical analysis. In the fields of data mining and machine learning, data representation [
1] is a fundamental and crucial task, as it directly influences the performance and interpretability of subsequent models. Effective low-dimensional data representation can handle massive high-dimensional data, reduce redundant features in the original data, and reveal the latent structural information of the data [
2]. The primary objective of data representation is to effectively characterize the original data using various techniques, thereby facilitating subsequent tasks like clustering and classification. As a result, data representation plays a crucial role in a range of applications, including information retrieval, classification, hyperspectral image processing [
3], and information extraction [
4,
5,
6,
7].
Over the past few decades, to effectively handle high-dimensional data, researchers have developed a variety of data representation methods. These methods include Principal Component Analysis (PCA) [
8], Manifold Learning [
9,
10,
11], Linear Discriminant Analysis (LDA) [
12], Concept Factorization (CF) [
13], Sparse Coding (SC) [
14], non-negative matrix factorization (NMF) [
15,
16], Deep Learning (DL) [
17], and low-rank representation (LRR) [
18]. Among them, non-negative matrix factorization (NMF) has become one of the most widely used data representation techniques [
19] due to its excellent interpretability. However, the classic NMF algorithm imposes only non-negativity constraints, which are not sufficient to meet diverse clustering needs. Therefore, to enhance its clustering performance, researchers have embedded various additional constraints into the NMF algorithm. For example, Gao et al. [
20] proposed a sparse non-negative matrix factorization (SNMF) algorithm, which enhances the algorithm’s learning capability by embedding sparsity constraints as a penalty term within the NMF framework. Similarly, Ding et al. [
21] introduced an Orthogonal Non-negative Matrix Tri-factorization (ONMTF) algorithm, which imposes orthogonality constraints on both the basis matrix and the coefficient matrix, aiming to produce clearer and more interpretable clustering results during NMF decomposition. Although these NMF variants improve clustering performance by imposing different constraints on the classical NMF algorithm, they fail to consider the manifold structure of the data, which is crucial for clustering. To explore the geometric structure of data and feature manifolds, researchers encode the geometric information of the data and feature spaces by constructing similarity graphs and embed graph regularization into the original NMF to reveal the intrinsic geometric structure of the data.
Consequently, various graph-regularized NMF variants have been proposed. For example, Cai et al. [
22] introduced graph-regularized NMF (GNMF), which learns the local manifold structure of the data space by constructing a similarity graph. However, GNMF only considers the similarity in the data space and does not address the feature space. For this purpose, Shang et al. [
23] proposed dual regularized NMF (DNMF) by constructing two similarity graphs to explore the geometric structure information in both the data and feature spaces simultaneously. Inspired by Shang et al. [
23], Sun et al. [
24] introduced sparse dual graph-regularized NMF (SDGNMF), which not only incorporates label information into the graph regularization but also imposes sparsity constraints on the basis matrix. To prevent misalignment between the image and basis vectors and further enhance the algorithm’s discriminative ability, Li et al. [
25] proposed semi-supervised dual orthogonally constrained dual graph-regularized NMF (SDGNMF-BO). This method integrates dual orthogonality constraints and dual graph regularization into a semi-supervised NMF framework, further enhancing the learning capability in the subspace. To maximize the sparsity of the learned coefficient matrix, Li et al. [
26] introduced semi-supervised graph and local coordinate regularized NMF. By embedding local coordinate constraints into a semi-supervised NMF framework with graph regularization, the sparsity of the coefficient matrix is enhanced. Inspired by Li et al. [
26] and considering that matrix factorization may have multiple solutions, Wang et al. [
27] proposed locally orthogonally constrained semi-supervised dual graph-regularized NMF (LOSDNMF). This method integrates dual graph regularization, local coordinate constraints, and dual orthogonality constraints into a semi-supervised NMF framework, effectively enhancing the sparsity and discriminative power of data representation. Although graph-regularization methods are generally superior to many other approaches, they often rely on the K-nearest neighbors (KNN) approach to construct the graph structure, and this fixed construction can interfere with the matrix factorization process, potentially leading to suboptimal clustering results. To explore the nonlinear structure of data and construct a similarity graph with an optimal block-diagonal structure, Xu et al. [
28] proposed an explicit data-driven kernel learning strategy. This strategy directly learns the kernel through the self-representation of the data, simultaneously enabling adaptive weighting. Based on this kernel, the local manifold structure of the data can be preserved in the nonlinear space via a kernel-based local manifold term, facilitating the construction of a graph structure with an optimal block-diagonal form. Recognizing that multi-view clustering can enhance clustering performance by effectively integrating complementary information from different views, Xu et al. [
29] further proposed a novel multi-view clustering algorithm that adaptively constructs a kernel matrix without requiring a predefined kernel function. Inspired by the aforementioned research, constructing adaptive graph structures has also gained increasing attention in non-negative matrix factorization (NMF). For instance, the NMF with Adaptive Neighbors (NMFAN) method [
30] introduces an adaptive graph mechanism to achieve simultaneous optimization of matrix factorization and similarity learning. This method balances the interactions between these two sub-tasks, allowing each sub-task to iteratively optimize based on the results of the other, thereby ensuring a more accurate construction of the similarity matrix in the data graph. Additionally, Shu et al. [
31] proposed a new data representation method (RCNMF) by imposing a rank constraint on the Laplacian matrix of the learned graph to ensure that the connected components precisely match the sample categories. On the other hand, with the continuous development of non-negative matrix factorization (NMF), an increasing number of constraints have been introduced. However, excessive constraints may lead to unreliable solutions. Therefore, to address the issue of limited degrees of freedom caused by too many constraints, some researchers have introduced a third decomposition factor, denoted as R, within the NMF framework. This additional factor, serving as a scaling function, not only provides extra degrees of freedom for the factor matrices X, U, and V, but also enhances the flexibility of the decomposition process. For example, Tang et al. [
32] proposed a new three-factor matrix decomposition model. By introducing dual graph regularization and dual orthogonality constraints into NMF, this model not only explores the geometric properties of data and feature manifolds but also ensures the orthogonality of the factor matrices.
Inspired by the aforementioned algorithms, we do not fix the input feature graph and input data graph related to the affinity matrices in our model. Instead, we learn a new data similarity matrix D and a new feature similarity matrix A from the initial data similarity matrix W and the initial feature similarity matrix S. These new optimal similarity graphs are better suited to clustering tasks. Moreover, we impose rank constraints on the Laplacian matrices of these two new similarity matrices to ensure that the number of connected components matches the number of sample categories. Furthermore, inspired by recent research on orthogonal constraints by Ding et al. [
21] and sparse constraints by Luo et al. [
33], we introduce sparsity and dual orthogonality constraints within the non-negative matrix factorization framework with dual Laplacian rank constraints. These dual orthogonality constraints not only address the slow optimization and high computational complexity of existing NMF models but also prevent mismatches between images and basis vectors, thereby effectively improving the discriminative and exclusive nature of the clustering.
Therefore, we propose a novel non-negative matrix factorization algorithm called sparse feature-weighted dual Laplacian rank-constrained non-negative matrix factorization (SFLRNMF), along with its extended version, sparse feature-weighted dual Laplacian rank-constrained non-negative tri-factor matrix factorization (SFLRNMTF). Our main contributions include the following:
- (1)
A novel learning mechanism, dual Laplacian rank-constrained non-negative matrix factorization (SFLRNMF), has been proposed. This mechanism is capable of learning the optimal feature similarity graph and data similarity graph, and rank constraints are applied to the Laplacian matrices of both graphs to construct an optimal dual graph regularizer. Additionally, a weight matrix has been constructed to explore the attributes of the original data and the diversity of the samples. Furthermore, an -norm sparsity constraint has been imposed on the basis matrix to promote sparsity, simplify computation, and enhance the model’s local learning capabilities and robustness.
- (2)
Based on SFLRNMF, the dual Laplacian rank-constrained non-negative matrix tri-factorization (SFLRNMTF) model is proposed. Specifically, orthogonal constraints are imposed on both the coefficient matrix and the basis matrix to ensure that each data point has a unique basis vector in the feature space, which enhances the discriminative ability of clustering. Additionally, excessive constraints may leave too few degrees of freedom, resulting in unreliable solutions during factorization. Therefore, the factor R is introduced to ensure the accuracy of the factorization.
- (3)
Corresponding iterative update optimization schemes have been proposed, and convergence proofs for the two algorithms have been provided. Furthermore, the effectiveness of these two models has been validated by conducting experiments on benchmark datasets and comparing the results with several of the most advanced clustering methods.
The rest of the paper is structured as follows: In
Section 2, we introduce the basic principles of standard NMF and its variants. In
Section 3, the novel SFLRNMF algorithm is presented, and its convergence is provided theoretically.
Section 4 introduces the extended version of SFLRNMF and provides the convergence of the optimization process. In
Section 5, we conduct numerical experiments to demonstrate the efficiency of our two methods. In
Section 6, we summarize the paper and discuss future work.
3. The Presented Model
3.1. The Motivation of the Proposed Method
Inspired by Local Linear Embedding (LLE) [
10] and Laplacian Eigenmaps (LE) [
11], researchers have developed several graph-regularized NMF clustering algorithms. However, the clustering performance of these algorithms highly depends on the quality of the graph model. Therefore, we propose automatically adjusting the weights of a given similarity matrix to learn the optimal similarity matrix. Specifically, our goal is to learn the new data similarity matrix D and feature similarity matrix A based on the initial data similarity matrix W and the initial feature similarity matrix S, thus constructing an optimal graph more suited for clustering tasks.
Figure 1 shows the process of constructing this optimal graph. We use traditional weighting methods, such as 0–1 weighting, heat kernel weighting, or probabilistic neighborhood methods, to construct nearest neighbor graphs related to the data similarity matrix W and feature similarity matrix S. Then, we impose rank constraints on the Laplacian matrix of the learned similarity matrices and iteratively update the similarity matrices A and D.
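As a concrete illustration of this first step, the sketch below builds a heat-kernel weighted k-nearest-neighbor affinity matrix with scikit-learn. The function name, the bandwidth sigma, and the use of k = 5 neighbors (the value used in our experiments) are illustrative choices; the 0–1 or probabilistic neighborhood weightings mentioned above could be substituted.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def heat_kernel_knn_graph(X, k=5, sigma=1.0):
    """Symmetric k-nearest-neighbor affinity matrix with heat-kernel weights.

    X is an (n_samples, n_features) matrix; the returned W has
    W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) if j is a k-nearest
    neighbor of i (or vice versa) and 0 otherwise.
    """
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own nearest neighbor
    dist, idx = nn.kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        for d, j in zip(dist[i, 1:], idx[i, 1:]):    # skip the point itself
            W[i, j] = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                        # symmetrize

# Initial graphs: one over the samples (rows of X) and one over the features (rows of X^T)
X = np.random.rand(100, 20)
W = heat_kernel_knn_graph(X, k=5)
S = heat_kernel_knn_graph(X.T, k=5)
```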
Through the analysis above, in this section, we propose a new NMF model, namely sparse feature-weighted dual graph Laplacian rank-constrained non-negative matrix factorization (SFLRNMF), designed for graph-based clustering.
3.2. Sparse Constraints
Sparse constraints involve using appropriate sparse models to represent sparse data. Introducing sparse constraints into non-negative matrix factorization (NMF) combines the advantages of NMF and sparse representation, thus enhancing the effectiveness of NMF methods. In particular, imposing sparse constraints on the basis matrix U when decomposing the original matrix X has been proven to be a very successful and practical strategy. When each row of the basis matrix is sparse, fewer basis elements are needed to represent the original matrix, which greatly aids in data recovery. Therefore, sparse constraints have received widespread attention in recent years. Xu et al.’s [
34] research shows that the sparse effect of the
-norm is better when
, making the
-norm-based sparse constraint an increasingly favored condition among researchers. We first use the
-norm to impose sparse constraints on the basis matrix U with the specific method as follows:
Imposing -norm sparse constraints on the basis matrix enhances the algorithm’s robustness, local learning capability, and clustering performance, while making the basis matrix U sparser and simplifying the computational process.
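The exact norm and exponent used here are not reproduced in this excerpt; the short sketch below only illustrates how an element-wise L_p penalty with 0 < p < 1 (p = 0.5 is an arbitrary illustrative choice) would be evaluated on the basis matrix U before being scaled by the sparsity parameter.

```python
import numpy as np

def lp_penalty(U, p=0.5, eps=1e-12):
    """Element-wise L_p sparsity penalty sum_ij |u_ij|^p for 0 < p < 1.

    Values of p below 1 drive more entries of the basis matrix U toward zero
    than an L1 penalty does; eps guards the non-smooth point at zero.
    """
    return float(np.sum((np.abs(U) + eps) ** p))

U = np.abs(np.random.randn(50, 10))
print(lp_penalty(U, p=0.5))  # value added to the objective after scaling by the sparsity parameter
```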
3.3. Bi-Orthogonal Constraints
Ding et al. [
21] proposed that in non-negative matrix factorization, satisfying
, for a given solution
, there also exists another solution
where
and
. To avoid such erroneous solutions, orthogonal constraints should be applied to the basis matrix after decomposition, such that
. Additionally, to differentiate various features and ensure that each feature vector points in distinct directions, which facilitates clearer and more distinguishable clustering in the sample space, orthogonal constraints should also be applied to the coefficient matrix, such that
. Imposing orthogonal constraints in both the feature and sample spaces helps to distinguish different features and samples more prominently. This significantly enhances the performance of clustering algorithms, making different data groups more distinct and independent.
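The argument can be summarized as follows; the factorization orientation X ≈ UV and the invertible transform Q are written generically here, since the exact symbols of the original passage are not reproduced in this excerpt.

```latex
\[
X \approx UV = (UQ)\,(Q^{-1}V)
\quad\text{for any invertible } Q \text{ with } UQ \ge 0,\ Q^{-1}V \ge 0,
\]
\[
\text{so requiring } U^{\top}U = I \ \text{and} \ VV^{\top} = I \ \text{rules out such alternative solutions.}
\]
```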
3.4. Feature Weighting
In this section, we integrate a feature weighting mechanism into non-negative matrix factorization (NMF) to better differentiate the importance of features in the original matrix, thereby improving the model’s performance and interpretability. By introducing a feature weighting matrix T, the objective function can be summarized as follows:
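As a purely illustrative sketch of the role of T (the concrete objective and the way T is learned in SFLRNMF are given by Equation (27) and its update rules, not by this snippet), the code below builds a hypothetical diagonal feature-weight matrix whose entries are non-negative and sum to one, and applies it to the data matrix; the variance-based weighting is an assumption made only for this example.

```python
import numpy as np

# Hypothetical illustration of a diagonal feature-weight matrix T applied to X
# (features stored as rows, samples as columns); the variance-based weights below
# are an illustrative assumption, whereas SFLRNMF learns T jointly with the factors.
X = np.abs(np.random.rand(1024, 200))   # 1024 features, 200 samples
importance = X.var(axis=1)              # per-feature score (illustrative choice)
t = importance / importance.sum()       # t_i >= 0 and sum_i t_i = 1
T = np.diag(t)
X_weighted = T @ X                      # each feature of X rescaled by its weight
```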
3.5. Models of Proposed Methods
Our strategy is to construct an optimal dual graph regularizer based on the initial data similarity matrix and the initial feature similarity matrix, both of which are block-diagonal matrices. To establish this objective, we begin with the following theorem.
Theorem 1 ([
35])
. If the affinity matrix is non-negative, then the Laplacian matrix $L_W = D_W - W$, where the degree matrix $D_W$ is a diagonal matrix whose i-th diagonal element is $\sum_{j} w_{ij}$, has the following important property [36,37]: the multiplicity k of its zero eigenvalue equals the number of connected components of the graph, where W is the initial data similarity graph. Theorem 1 indicates that, given an affinity matrix W, if the rank of its Laplacian matrix equals n − k, the graph is ideal, that is, it has exactly k connected components. Therefore, for a given initial data affinity matrix W and feature affinity matrix S, we can learn the corresponding data similarity matrix D and feature similarity matrix A. The Laplacian matrices corresponding to these two similarity matrices are $L_D$ and $L_A$, respectively.
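The property stated in Theorem 1 can be checked numerically: the number of (near-)zero eigenvalues of the Laplacian of a non-negative affinity matrix equals the number of connected components, so rank(L) = n − k exactly when the graph has k components. A minimal sketch:

```python
import numpy as np

def laplacian_zero_multiplicity(W, tol=1e-9):
    """Count the (near-)zero eigenvalues of L = D - W for a non-negative affinity W.

    By the property in Theorem 1, this equals the number of connected components
    of the graph, i.e. rank(L) = n - k when the graph has k components.
    """
    D = np.diag(W.sum(axis=1))                     # degree matrix, d_ii = sum_j w_ij
    L = D - W
    eigvals = np.linalg.eigvalsh((L + L.T) / 2.0)  # symmetrize for numerical safety
    return int(np.sum(eigvals < tol))

# Two disjoint two-node cliques -> 2 connected components -> rank(L) = 4 - 2
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(laplacian_zero_multiplicity(W))  # prints 2
```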
By applying rank constraints to the two Laplacian matrices, namely
and
, and using the Frobenius norm to measure the approximation error between the initial affinity matrices
W and
S and the learned similarity matrices D and A, respectively, the constrained Laplacian rank for graph-based clustering can be formulated as the following problem:
The above objective function, in which the Laplacian matrices depend on the learned similarity matrices, together with the rank constraints, is evidently a complex nonlinear constrained problem. Next, we reformulate this problem using Laplacian rank constraints.
Assume
represents the i-th smallest eigenvalue of
, and
represents the i-th smallest eigenvalue of
. Additionally, since
and
are both semi-definite matrices, their eigenvalues
and
are non-negative. It can be seen that for a sufficiently large value of
, problem (22) is equivalent to the following problem:
Therefore, when
is sufficiently large, the optimal solutions S and W to Equation (25) will make
equal to zero, thereby satisfying the two rank constraints in problem (22). According to the Ky Fan theorem [
38], the following equality can be obtained:
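The equality itself is not reproduced in this excerpt; a standard form of the Ky Fan identity that fits this context is the following, where the notation is assumed from the surrounding definitions:

```latex
\[
\sum_{i=1}^{k}\sigma_i\!\left(L_D\right)
 \;=\; \min_{\,F\in\mathbb{R}^{n\times k},\ F^{\top}F=I}\ \operatorname{Tr}\!\left(F^{\top}L_D F\right),
\]
```

Here $\sigma_i(L_D)$ denotes the i-th smallest eigenvalue of $L_D$, and the analogous identity holds for $L_A$; replacing the rank constraints with these trace terms turns the rank-constrained problem into a smooth, optimizable one.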
Further, expression (23) can be rewritten in the following form:
It is worth noting that 1 is a vector with all elements equal to 1. Next, we construct the optimal dual graph regularization term within the feature-weighted NMF framework. The objective function is written in the following form:
Considering the sparsity constraints and orthogonality constraints in Equation (26), the objective function of the proposed SFLRNMF framework can be summarized as follows:
Here, the non-negative regularization parameters balance the weight of the first reconstruction error term against the other terms, and the remaining parameter controls the sparsity. T is a diagonal matrix that assigns a weight to each feature of the original matrix X.
3.6. An Efficient Iterative Update Rule for Solving the Proposed Model
To address this non-convex problem, we optimize the following variables alternately and write the Lagrangian function of the objective function (27) as follows:
Here,
, and
is a diagonal matrix. We can compute the diagonal element of its i-th row as follows:
Here,
is a sufficiently small constant to avoid overflow in the above equation. To iteratively update the basis matrix U, the coefficient matrix V, and the feature weighting matrix T, we should take the partial derivatives of J:
According to the Karush–Kuhn–Tucker (KKT) conditions, the iterative updates for the basis matrix U, the coefficient matrix V, and the feature weighting matrix T are as follows:
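The concrete update rules (33)–(35) are not reproduced in this excerpt. For orientation only, the sketch below shows the classical Lee–Seung multiplicative updates for plain NMF, which have the same multiplicative, non-negativity-preserving structure that the KKT-derived SFLRNMF rules extend with the dual graph, sparsity, and feature-weighting terms.

```python
import numpy as np

def nmf_multiplicative_step(X, U, V, eps=1e-10):
    """One round of classical multiplicative updates for X ~= U V (all matrices non-negative).

    The SFLRNMF rules (33)-(35) share this multiplicative form but contain extra
    numerator/denominator terms from the graph regularizers, the sparsity penalty,
    and the feature-weight matrix T.
    """
    U *= (X @ V.T) / (U @ V @ V.T + eps)
    V *= (U.T @ X) / (U.T @ U @ V + eps)
    return U, V

m, n, k = 100, 80, 10
X = np.abs(np.random.rand(m, n))
U = np.abs(np.random.rand(m, k))
V = np.abs(np.random.rand(k, n))
for _ in range(50):
    U, V = nmf_multiplicative_step(X, U, V)
print(np.linalg.norm(X - U @ V))  # reconstruction error is non-increasing over iterations
```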
To obtain the updates for the new data graph S and the new feature graph W, we use alternating optimization. First, we fix U, V, T, and W, then update S:
Equation (36) is equivalent to optimizing the following problem:
Note that for different i, the above problem is independent, so we can solve the above problem by solving for each i separately.
where definition
, and the j-th column element of
is denoted by
(similarly for
and
). Problem (35) can be written in vector form as follows:
Equation (39) can be solved using the simplex sparse learning model proposed by Huang et al. [
39].
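Huang et al.'s simplex sparse learning model is not reproduced here; however, a common building block for row-wise updates of this type is the Euclidean projection onto the probability simplex, since each row of the learned similarity matrix must be non-negative and sum to one. The sketch below shows that projection only, as an illustration rather than the exact solver of [39].

```python
import numpy as np

def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Solves min_s ||s - v||^2 s.t. s >= 0, s^T 1 = 1, the type of constraint
    that each row of the learned similarity matrix must satisfy.
    """
    u = np.sort(v)[::-1]                                   # sort in descending order
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

v = np.array([0.8, 0.3, -0.2, 0.5])
s = project_onto_simplex(v)
print(s, s.sum())  # non-negative entries summing exactly to 1
```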
Next, update matrix W; similarly, we can update W by fixing matrices U, V, T, and S as follows:
Similarly, optimizing (40) is equivalent to optimizing the following problem:
For each row of
and
, we have
Similarly, we define
, and the j-th column element of
is denoted by
(similarly for
and
). Problem (39) can be written in vector form as follows:
For solving Equation (43), the simplex sparse learning model proposed by Huang et al. [
39] is used.
The optimization process of the SFLRNMF algorithm is shown in Algorithm 1.
Algorithm 1 The process of the SFLRNMF algorithm
Input: initial data similarity matrix S and feature similarity matrix W; regularization parameters; data matrix X; the neighbor number k; the maximum iteration number Niter.
Output: basis matrix U, coefficient matrix V.
Initialization: initialize U, V, and T.
Repeat:
  Fixing S and W, update U, V, and T:
    Update U by Equation (33)
    Update V by Equation (34)
    Update T by Equation (35)
  Fixing U, V, and T, update S and W:
    Update S by Equation (39)
    Update W by Equation (43)
  Compute the Laplacian matrices of the updated S and W.
Until: convergence
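To make the alternating structure of Algorithm 1 explicit, the sketch below mirrors its loop in code. The placeholder_update function stands in for the concrete rules of Equations (33)–(35), (39), and (43), which are not reproduced in this excerpt, so the snippet only illustrates the control flow, not the actual SFLRNMF updates.

```python
import numpy as np

def placeholder_update(M):
    # Stand-in for a multiplicative update derived from the KKT conditions (hypothetical).
    return M

def laplacian(A):
    # Graph Laplacian L = D - A with degree matrix D = diag(row sums of A).
    return np.diag(A.sum(axis=1)) - A

def sflrnmf_skeleton(X, S0, W0, k, n_iter=200):
    """Control-flow sketch of Algorithm 1 (placeholder updates only)."""
    m, n = X.shape
    U = np.abs(np.random.rand(m, k))   # basis matrix
    V = np.abs(np.random.rand(k, n))   # coefficient matrix
    T = np.eye(m)                      # diagonal feature-weight matrix
    S, W = S0.copy(), W0.copy()        # similarity graphs being learned
    for _ in range(n_iter):
        # Fixing S and W, update U, V and T (Equations (33)-(35))
        U, V, T = placeholder_update(U), placeholder_update(V), placeholder_update(T)
        # Fixing U, V and T, update S and W (Equations (39) and (43))
        S, W = placeholder_update(S), placeholder_update(W)
        # Recompute the Laplacians of the updated graphs for the next round
        L_S, L_W = laplacian(S), laplacian(W)
    return U, V, T, S, W
```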
3.7. Convergence Analysis of the SFLRNMF Algorithm
In this section, we analyze the convergence of SFLRNMF and prove that the objective function in Equation (27) is monotonically decreasing under the iterative update rules (33) to (35).
Firstly, we analyze the convergence of the iterative update rule in Equation (34).
Definition 1. $G(v, v^{t})$ is an auxiliary function of $F(v)$ provided the following conditions are met [40]: $G(v, v^{t}) \ge F(v)$ and $G(v, v) = F(v)$. Assuming that the (t + 1)-th iteration update rule is $v^{t+1} = \arg\min_{v} G(v, v^{t})$, we can prove that $F(v^{t+1}) \le G(v^{t+1}, v^{t}) \le G(v^{t}, v^{t}) = F(v^{t})$, which implies that $F(v)$ is non-increasing and therefore converges.
The above equation is an auxiliary function of , where .
Proof. Given that the first and second derivatives of
are
and
, we can derive the Taylor expansion of
as follows:
By simultaneously solving Equations (44) and (45), we obtain as the local minimum of Equation (45), and as the corresponding local minimum value.
Given the equation , we have . Therefore, we can derive that . Since is an auxiliary function of , is monotonically decreasing.
In the same way, we can prove that under the iterative update rule (35), is monotonically decreasing. □
Proof. According to Lemma 2, we have the following equation:
In the i-th iteration, we fix
as
to solve for
,
, and
. We define the following function:
Given that
, we obtain the following inequality:
Combining inequalities (48) and (51), we obtain the following inequality: . Thus, is monotonically decreasing under the updated Equation (33). Based on the above convergence analysis, we can conclude that the objective function (27) is monotonically decreasing under the iterative update rules (33)–(35), (39), and (43). □
4. Sparse Feature-Weighted Double Laplace Rank-Constrained NMTF Model
In this section, we propose a novel extension of SFLRNMF, called sparse feature-weighted dual Laplacian rank-constrained non-negative matrix tri-factorization (SFLRNMTF). In SFLRNMTF, we introduce an additional factor R and dual orthogonality constraints to SFLRNMF. This not only enhances the accuracy of the low-rank representation but also ensures a more robust model by incorporating different scales of X, U, and V.
4.1. SFLRNMF with Three-Factor (SFLRNMTF)
To apply the idea of dual orthogonality constraints to SFLRNMF, we incorporate an additional factor R and dual orthogonality constraints into Equation (27) for SFLRNMF, as shown below:
Here, R is a diagonal scaling matrix, and T has the same meaning as in Equation (27).
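The full objective (52) is not reproduced in this excerpt; a structural sketch of the three-factor form with dual orthogonality constraints, under an assumed orientation of V and omitting the graph, sparsity, and weighting terms, is:

```latex
\[
\min_{U \ge 0,\; R \ge 0,\; V \ge 0}\ \bigl\lVert X - U R V \bigr\rVert_F^2
\qquad \text{s.t.}\quad U^{\top}U = I,\ \ VV^{\top} = I,\ \ R \ \text{diagonal},
\]
```

where the diagonal factor R absorbs the scale that the orthogonality constraints remove from U and V, restoring degrees of freedom to the factorization.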
4.2. An Efficient Iterative Update Rule for Solving Model (SFLRNMTF)
We rewrite the objective function (51) of the SFLRNMTF model as a Lagrangian function, as shown below:
To update the matrices U, V, R, and T, first, we compute the partial derivatives of the Lagrangian function
with respect to different variables, as shown below:
Based on the Karush–Kuhn–Tucker (KKT) conditions [
39], the iterative updates for matrices U, V, R, and T are as follows:
The optimization process for SFLRNMTF is shown in Algorithm 2.
Algorithm 2 The process of the SFLRNMTF algorithm
Input: initial data similarity matrix S and feature similarity matrix W; parameters α, θ, λ; data matrix X; the neighbor number k; the maximum iteration number Niter.
Output: basis matrix U, coefficient matrix V.
Initialization: initialize U, V, R, and T.
Repeat:
  Fixing S and W, update U, V, R, and T:
    Update U by Equation (58)
    Update V by Equation (59)
    Update R by Equation (60)
    Update T by Equation (61)
  Fixing U, V, R, and T, update S and W:
    Update S by Equation (39)
    Update W by Equation (43)
  Compute the Laplacian matrices of the updated S and W.
Until: convergence
4.3. Convergence Analysis of the SFLRNMTF Algorithm
In this section, we theoretically prove the convergence of SFLRNMTF and demonstrate that the objective function in Equation (52) is monotonically decreasing under the iterative update rules (58)–(61).
Definition 2. If the following conditions proposed by Shang et al. [40] are satisfied, that is, $G(v, v^{t}) \ge F(v)$ and $G(v, v) = F(v)$, then $G(v, v^{t})$ is an auxiliary function of $F(v)$. For iteration t + 1, the update rule is $v^{t+1} = \arg\min_{v} G(v, v^{t})$. Thus, it can be proven that $F(v^{t+1}) \le G(v^{t+1}, v^{t}) \le G(v^{t}, v^{t}) = F(v^{t})$, so $F(v)$ converges.
The above equation is an auxiliary function for , where .
Proof. First, the first-order derivative of is , and the second-order derivative of is .
Thus, we can rewrite
in the following Taylor series form:
By solving Equations (60) and (61) simultaneously, we find that
is a local minimum of Equation (61), and
is the local minimum point corresponding to this local minimum.
We can derive that
. Since
is an auxiliary function of
,
is monotonically decreasing.
In the same way, we can prove that under the iterative update rule (35), , and are monotonically decreasing. □
According to Lemma 4, we have
To solve for
,
,
, and
, we set
to
in the i-th generation. We define the following function:
Since
=
, we obtain the following inequality:
Combining inequalities (67) and (70), we obtain the following inequality:
Thus, is monotonically decreasing. Based on the above analysis, we can conclude that the objective function (52) of SFLRNMTF is monotonically decreasing under the update rules in Equations (39), (43), and (58)–(61).
5. Experiments and Analysis
In this subsection, to verify the efficiency of the proposed methods (SFLRNMF and SFLRNMTF), we conducted numerical experiments on four datasets (COIL20, JAFFE, UMIST, and YaleB32) to evaluate their robustness, sensitivity, and convergence. We compared the clustering performance with ten algorithms, including PCA [
8], NMF [
15,
16], GNMF [
22], DNMF [
23], DSNMF-LDC [
42], NMFAN [
30], LOSDNMF [
27], EWNMF [
43], SGLNMF [
26], and SDGNMF-BO [
25].
All numerical experiments were conducted on a PC running Windows 10 with a 3.40 GHz CPU and 8 GB of memory, using the MATLAB 2021a platform. The specific characteristics of these datasets are shown in
Table 2.
5.1. Datasets
- (1)
COIL-20 Dataset: The COIL-20 dataset was developed by Columbia University and contains images of 20 different real objects. The dataset comprises a total of 1,440 images.
- (2)
JAFFE Dataset: This dataset contains Japanese female facial expression images from ten female volunteers, each posing seven facial expressions. Each expression of each volunteer was photographed two to four times, resulting in a total of 213 images.
- (3)
UMIST Dataset: This dataset was constructed by the University of Manchester Institute of Science and Technology (UMIST) and includes facial images of 20 individuals with varying poses, races, genders, and appearances. The dataset contains a total of 575 images.
- (4)
YaleB Dataset: This dataset contains 2414 frontal face images taken under controlled lighting conditions in a laboratory, representing 38 subjects. Each image was manually cropped to a 32 × 32 pixel grayscale image and then converted into a 1024-dimensional vector.
5.2. Parameter Setting
In this section, to ensure fairness, we employ k-means as the clustering method to compare the clustering performance of SFLRNMF, SFLRNMTF, and eight other algorithms across four datasets. For our SFLRNMF and SFLRNMTF models, the primary parameters involve the dual Laplacian graph rank constraint parameter , the sparsity parameter , and the orthogonality constraint parameter . The range for all regularization parameters is set within . The maximum number of iterations is set to 200, and the best average result from 10 experiments is considered the final clustering result. For all graph-based methods, the number of nearest neighbors is fixed at five. For all semi-supervised NMF methods, in each dataset, 20% of the data points from each category are randomly selected as labeled samples, and these points are used to construct the label constraint matrix A. Meanwhile, to verify the performance of our proposed algorithms on different classification metrics, we randomly select the number of clusters c within the range of 2 to 10 for clustering experiments on the Jaffe and YaleB datasets. For clustering experiments on the COIL20 and UMIST datasets, the range for the number of clusters c is set from 10 to 20.
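For reference, the evaluation protocol can be sketched as follows: k-means is run on the learned low-dimensional representation, and the resulting labels are scored with ACC (best one-to-one label matching via the Hungarian algorithm) and NMI. The variables V_learned, y, and c below are placeholders for the coefficient matrix produced by SFLRNMF/SFLRNMTF, the ground-truth labels, and the chosen number of clusters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of correctly clustered points under the best label permutation."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_cls = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((n_cls, n_cls), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    row, col = linear_sum_assignment(-cost)   # Hungarian matching, maximizing agreement
    return cost[row, col].sum() / len(y_true)

# Placeholder inputs: coefficient matrix (samples x k), ground-truth labels, cluster count
V_learned = np.abs(np.random.rand(213, 10))
y = np.random.randint(0, 10, size=213)
c = 10

y_pred = KMeans(n_clusters=c, n_init=10, random_state=0).fit_predict(V_learned)
print("ACC:", clustering_accuracy(y, y_pred))
print("NMI:", normalized_mutual_info_score(y, y_pred))
```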
Furthermore, due to the multitude of algorithms compared, to enhance the clarity of reading for the readers, in the clustering experiments, the names of all NMF variant algorithms have been abbreviated as follows: GNMF is abbreviated as G, DSNMF-LDC as LDC, NMFAN as AN, DNMF as D, LOSDNMF as LOSD, EWNMF as EW, SGLNMF as SGL, SDGNMF-BO as SDG, SFLRNMF as SFLR, and SFLRNMTF as SFLRT. This treatment helps in making the understanding and comparison of each algorithm clearer to the readers.
5.3. Clustering Performance
Table 3,
Table 4,
Table 5,
Table 6,
Table 7,
Table 8,
Table 9 and
Table 10 show the clustering results in terms of ACC and NMI values for the four datasets. The results in bold represent the optimal results. The corresponding clustering results are shown in
Figure 2,
Figure 3,
Figure 4,
Figure 5,
Figure 6,
Figure 7,
Figure 8 and
Figure 9.
Table 2 and
Table 3, along with
Figure 2 and
Figure 3, display the clustering results on the JAFFE dataset. As shown, some algorithms achieved 100% in both ACC and NMI when the number of clusters was set to two. However, as the number of clusters increased, our algorithms, SFLRNMF and SFLRNMTF, consistently performed well across most clustering settings, achieving average accuracy rates (ACC) of 97.28% and 95.89% and average Normalized Mutual Information (NMI) of 98.16% and 96.80%, respectively. Compared to the lowest-performing NMF algorithm, SFLR and SFLRT showed an average increase in accuracy of 7.21% and 8.09% and in NMI of 10.89% and 11.8%, respectively. These results demonstrate that our methods have a significant advantage in processing facial recognition images.
Table 4 and
Table 5, along with
Figure 4 and
Figure 5, present the clustering results on the COIL20 dataset. These data clearly show that, compared to ten other benchmark algorithms, our algorithms achieved satisfactory results. Specifically, in terms of average ACC scores, DSNMF-LDC’s scores are second only to ours, and in terms of average NMI scores, DNMF’s scores also rank just below ours. More precisely, our SFLRNMF and SFLRNMTF algorithms exceed the average ACC of DSNMF-LDC by 6.47% and 6.99%, respectively, and surpass DNMF’s average NMI by 5.48% and 5.94%, respectively. Moreover, as observed from
Figure 4 and
Figure 5, regardless of the number of clusters, the highest ACC and NMI scores consistently appear in our two algorithms. These results amply demonstrate the superiority and stability of our algorithms.
The clustering results on the UMIST dataset are shown in
Table 6 and
Table 7, as well as
Figure 6 and
Figure 7. According to
Table 6 and
Table 7, it can be observed that in terms of ACC, the SFLRNMTF algorithm generally performs best in most of the clustering settings, followed by the SFLRNMF algorithm. In terms of NMI, the SFLRNMTF algorithm also exhibits high performance, with an average score of 84.42%, while the average score of the SFLRNMF algorithm is 84.84%. Compared to other algorithms, such as the poorest-performing NMF and the best-performing DSNMF-LDC, the average ACC of our SFLRNMTF algorithm is higher by 30.75% and 6.95%, respectively. In terms of NMI, the average NMI of the SFLRNMF algorithm is higher than that of NMF and DSNMF-LDC by 26.54% and 2.48%, respectively. It is noteworthy that although the average ACC score of SFLRNMTF is higher than that of SFLRNMF, its average NMI score is lower than that of SFLRNMF. The reason for this phenomenon is that NMI measures the joint distribution of the generated and true labels, comparing their mutual information against their entropies; hence, at the same ACC value, the more points that are misclassified into the same class, the higher the NMI score. This explains why SFLRNMTF has a higher average ACC score than SFLRNMF, yet a slightly lower average NMI score. Furthermore, as depicted in
Figure 6 and
Figure 7, regardless of the number of clusters, the ACC and NMI curves of SFLRNMF and SFLRNMTF consistently remain above those of the other algorithms compared. These results indicate that the SFLRNMF and SFLRNMTF algorithms proposed by us not only produce significant effects but also demonstrate strong stability when processing facial recognition images in the UMIST dataset.
Similarly, the clustering performance on the YaleB dataset is presented in
Table 8 and
Table 9, as well as in
Figure 8 and
Figure 9. In terms of accuracy, the SFLRNMF algorithm was found to perform best in most clustering setups, especially when the number of clusters was set to 2, achieving a high score of 91.64%, significantly surpassing the other algorithms. On average, an ACC score of 65.02% was obtained by SFLRNMF, with SFLRNMTF closely following at an average score of 63.49%. In contrast, the worst performing algorithm was NMF, which only achieved an average ACC score of 27.39%. In terms of Normalized Mutual Information (NMI), high scores were also demonstrated by SFLRNMTF, particularly when the number of clusters was 2, reaching an NMI score of 81.56%. Overall, the average NMI score for SFLRNMTF was 58.85%, slightly higher than that of SFLRNMF, which was 57.09%. Compared to LOSDNMF, which exhibited poorer clustering performance, both SFLRNMF and SFLRNMTF achieved higher scores in ACC and NMI across various numbers of clusters. Furthermore, as observed in
Figure 8 and
Figure 9, although the ACC and NMI curves for all algorithms showed a downward trend with an increasing number of clusters, the curves for SFLRNMF and SFLRNMTF consistently remained above those of the other comparative algorithms. These results indicate that the proposed SFLRNMF and SFLRNMTF algorithms not only produce significant effects in handling facial recognition images from the YaleB dataset but also exhibit stability across various cluster number settings. Particularly, the SFLRNMF algorithm showed the most stable performance among all compared algorithms.
5.4. Visualization Comparison
Based on the clustering comparison experiments above, we further examine the superiority of our two algorithms through their subspace learning capabilities. For a more intuitive and effective comparison, we selected 10 categories from four test datasets as the input data for the experiments (too many categories would create visual clutter that hinders analysis, while too few would not adequately highlight the subspace learning performance of the different algorithms). The visual comparison results are displayed in
Figure 10 and
Figure 11. Note that the subgraphs containing category labels do not obscure any comparative information.
Figure 10 and
Figure 11 demonstrate that, on the COIL-20 and UMIST datasets, our SFLRNMF and SFLRNMTF algorithms perform excellently, distinguishing data samples more clearly. Particularly in the visual comparison on the COIL-20 dataset, both of our proposed methods successfully and clearly separate the samples from the other categories. This indicates that our methods can efficiently learn low-dimensional subspace representations. This experiment not only validates the reliability of the aforementioned clustering experiments but also confirms the contributions of our two methods.
5.5. Parameter Sensitivity Analysis
In this section, we explore the sensitivity of the parameters of our method on the JAFFE dataset. As seen from the SFLRNMF objective function in Equation (27), three fundamental parameters are involved: the orthogonality constraint parameter
, the sparsity parameter
, and the graph regularization parameter
. These parameters were selected from the range [
,
,
,
,
,
,
,
,
]. The comparison results for ACC and NMI are displayed in three-dimensional bar charts in
Figure 12a,b,
Figure 13a,b and
Figure 14a,b.
From
Figure 12a,b, it can be observed that with the variation in the values of parameters
and
, the values of ACC and NMI decrease, indicating that the SFLRNMF is sensitive to the parameters
and
on the JAFFE dataset, particularly evident when
equals 1000. Specifically, when α is selected from [
,
,
,
,
], the scores of these two metrics are slightly higher.
Similarly, we fixed
at 1000 and varied
and
to compute the scores for ACC and NMI. The corresponding 3D histograms for ACC and NMI are shown in
Figure 13a,b. From
Figure 13a,b, it is evident that the overall scores for ACC and NMI on the JAFFE dataset are satisfactory, particularly when
is chosen from the range [
,
,
,
,
,
], where the scores are highest. Additionally, when
is within the range [
,
,
,
,
,
,
,
,
], regardless of how the value of
changes within the given range, the scores for ACC and NMI remain high, indicating that the parameter
has a minimal impact on the clustering results. In this scenario, it can be considered that the SFLRNMF algorithm demonstrates greater robustness when
is fixed.
To explore the sensitivity of other parameters in the SFLRNMF algorithm, a similar analysis of parameter sensitivity was conducted with the
parameter fixed, as illustrated in
Figure 14.
According to
Figure 14a,b, overall, the scores of the ACC and NMI metrics vary with changes in these two parameters. Particularly, the best results are obtained when the values of parameters
and
range from [
,
]. Therefore, in this case, the SFLRNMF is more sensitive to parameters
and
.
Next, we also conducted a parameter sensitivity analysis for SFLRNMTF with respect to the parameters
,
, and
, since SFLRNMTF is an improved version of SFLRNMF and thus they share the same parameters. When
,
, and
were fixed at 1000, respectively, the corresponding 3D histograms for the ACC and NMI scores on the JAFFE dataset are shown in
Figure 15,
Figure 16 and
Figure 17, while the ranges for the other two parameters remained set at
,
,
,
,
,
,
,
,
.
From these figures, it can be concluded that compared to parameter
, parameter
shows higher sensitivity to the performance of SFLRNMTF. When parameter
is set to 1000, the appropriate range for
is [
,
]. When
is 1000 and
is within the range [
,
,
,
,
], better performance is achieved by the algorithm. Additionally, regardless of the values of
and
, the overall variation in ACC and NMI remains almost constant, indicating that when parameter
is fixed at 1000, both
and
exhibit robustness in the performance of SFLRNMTF. However, in
Figure 16 and
Figure 17, it can be seen that the variations in ACC and NMI are uneven and unstable, indicating that parameters
and
exhibit higher sensitivity to the performance of SFLRNMTF. When parameter
is set to 1000, the clustering performance of the SFLRNMTF algorithm is optimal and remains stable under other conditions. Additionally, it is evident that under these circumstances, the optimal ranges for θ and
are [
,
,
,
,
] and [
,
], respectively.
5.6. Convergence Analysis
Based on the mathematical derivations of the convergence of two algorithms discussed in
Section 3.7 and
Section 4.3 of the paper, both algorithms are guaranteed to monotonically decrease their objective functions and thus converge to a local optimum. To further assess the convergence of the SFLRNMF and SFLRNMTF algorithms, experiments were conducted on the four datasets discussed in this paper. In the experiments, the number of clusters was set to the maximum number of categories for each dataset, and the convergence curves of our model on these four datasets were plotted, as shown in
Figure 18 and
Figure 19. In these figures, the
y-axis represents the value of the objective function, while the
x-axis represents the number of iterations. The experimental results demonstrated the superior performance of our algorithms on these datasets.
From
Figure 18a–d, it can be seen that the SFLRNMF algorithm demonstrates extremely fast convergence on four datasets (JAFFE, COIL20, UMIST, and YaleB). The objective function value drops rapidly and stabilizes within less than 10 iterations. This consistent, fast convergence indicates that SFLRNMF can efficiently find a local optimal solution across various datasets, highlighting the robustness and adaptability of the algorithm. In contrast,
Figure 19 shows the convergence curves of the SFLRNMTF algorithm. Observing
Figure 19b–d, it can be seen that the SFLRNMTF algorithm also exhibits a relatively fast convergence rate on the COIL20, UMIST, and YaleB datasets, with the objective function value stabilizing within 10 iterations, indicating high optimization efficiency on these datasets. However, in
Figure 19a, the convergence rate of SFLRNMTF is relatively slower on the JAFFE dataset. Although the objective function value decreases significantly in the first 10 iterations, it takes approximately 40 iterations to fully converge.
In summary,
Figure 18 and
Figure 19 indicate that both algorithms exhibit favorable convergence performance across different datasets. Overall, SFLRNMF and SFLRNMTF both converge within a relatively small number of iterations, confirming their stability and efficiency on various datasets. Notably, the SFLRNMF algorithm demonstrates a consistently rapid convergence across all datasets.
5.7. Calculation Time Analysis
In this section, we analyze the computation time by comparing the duration required for clustering experiments conducted on four datasets, in order to more clearly understand the specific impacts of different data dimensions and dataset sizes on computation time. These datasets include JAFFE, COIL20, UMIST, and YaleB. During the experiments, the number of clusters was set to the maximum number of categories each dataset contains. To ensure the stability and reliability of the results, ten independent experiments were conducted on each dataset, and the average execution time of these experiments was calculated. The results are summarized in
Table 11, which details the average computation time for each dataset under various clustering settings.
From the data analysis in
Table 11, it is observed that, firstly, datasets with larger dimensions significantly increase runtime, demonstrating that the model’s runtime is notably influenced by the dimensions and size of the data. Secondly, algorithms based on graph structures have longer computation times compared to non-graph-based algorithms. This indicates that graph-based algorithms consume a substantial amount of time in constructing graphs, especially in high-dimensional datasets, where this difference in time is more pronounced. In terms of algorithm types, algorithms utilizing dual graph structures take longer to execute compared to those using a single graph structure. Furthermore, semi-supervised algorithms (such as LOSDNMF, SGLNMF, SDGNMF-BO) have longer runtimes compared to unsupervised algorithms. For unsupervised dual graph algorithms like DNMF, as well as SFLRNMF and SFLRNMTF, the latter two demonstrate superior performance on high-dimensional datasets. This is primarily due to these algorithms having faster convergence rates and fewer iterations, significantly enhancing efficiency when dealing with large-scale or high-dimensional data.