1. Introduction
High-dimensional data are increasingly common in everyday life, which has greatly affected the methods used for data analysis and processing. Clustering high-dimensional data has therefore become an active topic in current research. It is usually assumed that high-dimensional data are distributed across a union of multiple low-dimensional subspaces. This assumption is the premise of the study of subspace clustering [1].
Over the last several decades, researchers have proposed many useful approaches for solving clustering problems [2,3,4,5,6,7]. Among these methods, spectral-type subspace clustering has become the most popular because of its outstanding clustering performance on high-dimensional data. Spectral-type methods mainly utilize local or global information from data samples to construct affinity matrices. In general, the representation matrices greatly affect the performance of spectral-type subspace clustering, and in recent years a vast number of different methods have been proposed in the attempt to find better representation matrices. Two typical linear representation-based methods are sparse subspace clustering (SSC) [2,8] and low-rank representation (LRR) [3,9], which emphasize sparseness and low rank, respectively, when solving for the coefficient matrices.
Initially, SSC used the L0 norm to characterize the sparsity of the coefficient matrices. However, because the optimization of the L0 norm is NP-hard, the L0 norm was replaced by the L1 norm. SSC pursues a sparse representation by minimizing the L1 norm of the coefficient matrices, which helps to separate data from different subspaces. However, the obtained matrices are often too sparse and miss important correlation structures of the data; thus, SSC leads to unsatisfactory clustering results for highly correlated data. To make up for the shortcomings of SSC, LRR employs the nuclear norm to search for low-rank representations with the goal of better capturing the correlations among the data. By describing global structures, LRR improves the clustering performance for highly correlated data. However, the coefficient matrices obtained using the LRR model may not be sparse enough. To this end, many improved methods have been proposed.
Luo et al. [10] made use of both sparsity and low-rank priors and presented a multi-subspace representation (MSR) model. Zhuang et al. [11] improved the MSR model by adding non-negative constraints to the coefficient matrices and proposed a non-negative low-rank sparse representation (NNLRS) model. Both the MSR and NNLRS models ensure that the coefficient matrices have both sparsity and grouping effects by combining sparse and low-rank regularization. To explore local and global structures, Zheng et al. [12] incorporated local constraints into LRR and presented a locally constrained low-rank representation (LRR-LC) model. Chen et al. [13] developed another new model by adding symmetric constraints to LRR so that highly correlated data in the subspaces had consistent representations. To better reveal the essential structures of subspaces, Lu et al. [14] utilized the F norm to constrain the coefficient matrices and proposed least squares regression (LSR) for subspace clustering. This work demonstrated that enforced block diagonal (EBD) conditions are helpful for block diagonalization, i.e., under EBD conditions, the structures of the obtained coefficient matrices are block diagonal. In addition, LSR encourages the grouping effect by clustering highly correlated data together. It has been theoretically proven that LSR tends to shrink the coefficients of related data and is robust to noise. More importantly, the objective function of LSR has a closed-form solution, which is easy to compute and reduces the time complexity of the algorithm. Generally, under the independent subspace assumption, models that use the L1 norm, the nuclear norm or the F norm as regularization terms can satisfy the EBD conditions, which helps them to learn affinity matrices with block diagonal properties.
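For concreteness, the LSR coefficient matrix has the well-known closed form $Z = (Y^{\top}Y + \lambda I)^{-1}Y^{\top}Y$ for the objective $\min_Z \|Y - YZ\|_F^2 + \lambda\|Z\|_F^2$, where $\lambda$ is a regularization parameter. The following minimal NumPy sketch (our own illustration, not code from [14]) computes it together with a symmetric affinity built from it:

```python
import numpy as np

def lsr_coefficients(Y, lam=0.1):
    """Closed-form LSR solution Z = (Y^T Y + lam * I)^(-1) Y^T Y.

    Y   : (d, n) data matrix with samples stored as columns.
    lam : weight of the Frobenius-norm regularization term.
    """
    n = Y.shape[1]
    G = Y.T @ Y                      # Gram matrix of the samples
    return np.linalg.solve(G + lam * np.eye(n), G)

# Toy usage: two 2-dimensional subspaces embedded in R^10.
rng = np.random.default_rng(0)
Y = np.hstack([rng.standard_normal((10, 2)) @ rng.standard_normal((2, 20)),
               rng.standard_normal((10, 2)) @ rng.standard_normal((2, 20))])
Z = lsr_coefficients(Y, lam=0.1)
W = 0.5 * (np.abs(Z) + np.abs(Z.T))  # symmetric affinity for spectral clustering
```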
Assuming that the data samples are not contaminated or damaged and that the subspaces are independent of each other, the above methods can obtain coefficient matrices with block diagonal structures. However, the structures obtained using these methods are unstable because the required assumptions often do not hold in practice, and when a fragile block diagonal structure is broken, the clustering performance declines to a great extent. For this reason, many subspace algorithms have been proposed that extend the idea of block diagonal structures. Zhang et al. [15] introduced an effective structure into LRR and proposed a block diagonal low-rank representation (BDLRR) method for recognition, which learned discriminative data representations by shrinking the non-block diagonal parts and highlighting the block diagonal representations to improve the clustering effects. Feng et al. [16] devised two new subspace clustering methods based on a graph Laplacian constraint. Using the above methods, block diagonal coefficient matrices can be constructed precisely. However, their optimization processes are inefficient because each updating cycle contains an iterative projection; hence, the hard constraints on the coefficient matrices are difficult to satisfy exactly. To settle this matter, Lu et al. [17] used a block diagonal regularizer to constrain the solutions and put forward the block diagonal representation (BDR) model. This regularizer, which is more flexible than a direct hard constraint, helps the BDR model to obtain solutions with block diagonal properties; therefore, it is called a soft constraint. The block diagonal regularizer makes BDR non-convex, but the model can be solved by simple and effective means. Imposing block diagonal structures on the coefficient matrices in this way is more likely to lead to accurate clustering results. Due to the good performance of the k-block diagonal regularizer within subspace clustering, numerous corresponding extended algorithms [18,19,20] have been proposed.
Under the conditions of independent subspaces and noise-free data, LSR can obtain coefficient matrices that have block diagonal properties, which usually produce exact clustering results. However, the coefficient matrices that are obtained using the LSR method are sensitive to noise or corruption. In real environments, data samples are often contaminated by noise. In this case, the fragile block diagonal structures can become damaged. When the block diagonal structures are violated, the LSR method does not necessarily lead to satisfactory clustering results.
Inspired by the BDR method, we constructed a novel block diagonal least squares regression (BDLSR) method for subspace clustering by combining LSR and BDR. Considering that the LSR method satisfies the EBD conditions and has a grouping effect, we directly pursued block diagonal coefficient matrices on the basis of LSR. The block diagonal regularizer is a soft constraint that encourages the block diagonalization of the coefficient matrices and can improve the clustering accuracy. However, the block diagonal regularizer makes BDLSR non-convex. Fortunately, the alternating minimization method could be used to solve the objective function, and the convergence of the algorithm could also be proven. Our experimental results on several public datasets indicated that BDLSR was more effective than the related methods.
The rest of the paper is organized as follows. The least squares regression model and the block diagonal regularizer are briefly reviewed in Section 2. The BDLSR model is proposed, optimized and discussed explicitly in Section 3. In Section 4, the experimental effectiveness of BDLSR is demonstrated using three datasets. Finally, the conclusions are summarized.
3. Subspace Clustering Using BDLSR
In this section, our BDLSR method, which is a combination of least squares regression and block diagonal representation, is proposed for subspace clustering. Then, the objective function of the BDLSR model is presented and the optimization process is described. Finally, the convergence of the BDLSR model is proven.
3.1. The Proposed Model
In theory, under the independent subspace assumption, LSR can obtain coefficient matrices with block diagonal structures. In other words, when the similarity between classes is zero and the similarity within classes is greater than zero, perfect data clustering can usually be achieved. However, in real environments, the block diagonal structures of the solutions can easily be broken by data noise or corruption. Considering the advantages of block diagonal representation, we integrated the block diagonal regularizer into the objective function of LSR to obtain solutions with strengthened block diagonal structures, which required the coefficient matrices to be simultaneously non-negative and symmetric. On this basis, we proposed the BDLSR method for subspace clustering. The objective function of BDLSR for handling noisy data was as follows (where $\|B\|_{[k]}$ denotes the block diagonal regularizer reviewed in Section 2 and $Y$ is a data matrix):

$$\min_{B} \ \frac{1}{2}\|Y - YB\|_F^2 + \frac{\lambda}{2}\|B\|_F^2 + \gamma\|B\|_{[k]}, \quad \text{s.t.} \ \operatorname{diag}(B) = 0, \ B \ge 0, \ B = B^{\top}, \tag{3}$$

where the coefficient matrix $B$ is non-negative and symmetric and its diagonal elements are all zero. These conditions had to be satisfied in order to use the block diagonal regularizer. However, the conditions limited the representation capability of $B$. To alleviate this issue, we needed to relax the restrictions on $B$. Thus, we introduced an intermediary term $Z$ into the above formula, and the model was rewritten equivalently as follows:

$$\min_{Z,B} \ \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\lambda}{2}\|Z\|_F^2 + \frac{\mu}{2}\|Z - B\|_F^2 + \gamma\|B\|_{[k]}, \quad \text{s.t.} \ \operatorname{diag}(B) = 0, \ B \ge 0, \ B = B^{\top}, \tag{4}$$
where $\lambda$, $\mu$ and $\gamma$ are the balance parameters. When $\mu$ was sufficiently large, Equation (4) was equivalent to Equation (3). The newly added term $\frac{\mu}{2}\|Z - B\|_F^2$ was strongly convex with regard to both $Z$ and $B$; thus, it was easy to obtain closed-form solutions and to perform the convergence analysis. Then, we could alternately optimize the variables $Z$, $V$ and $B$.
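To make the regularizer in Equations (3) and (4) concrete, the sketch below evaluates the k-block diagonal regularizer under its standard definition from [17] (the sum of the k smallest eigenvalues of the Laplacian $\operatorname{Diag}(B\mathbf{1}) - B$, as reviewed in Section 2); the example matrices are purely illustrative:

```python
import numpy as np

def block_diag_reg(B, k):
    """||B||_[k]: sum of the k smallest eigenvalues of the Laplacian Diag(B 1) - B."""
    L = np.diag(B.sum(axis=1)) - B
    return np.linalg.eigvalsh(L)[:k].sum()   # eigvalsh returns ascending eigenvalues

# A matrix with exactly k connected blocks gives a (numerically) zero regularizer,
# whereas connections between blocks make it strictly positive.
B_block = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)       # two blocks of size 3
B_mixed = B_block + 0.2 * (np.ones((6, 6)) - np.eye(6))         # add off-block links
print(block_diag_reg(B_block, k=2))   # ~ 0
print(block_diag_reg(B_mixed, k=2))   # > 0
```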
3.2. The Optimization of BDLSR
Next, we solved the minimization problem of the BDLSR model in Equation (4). Because of the block diagonal regularizer, it was observed that Equation (4) was non-convex. The key to solving this issue was the non-convex term
. Fortunately, we could utilize the related theorem in [
17,
21] to reformulate
. The theorem in [
17] was presented as follows.
We let $L \in \mathbb{R}^{n \times n}$ be a symmetric matrix with eigenvalues ordered as $\lambda_1(L) \ge \lambda_2(L) \ge \cdots \ge \lambda_n(L)$. Then:

$$\sum_{i=n-k+1}^{n} \lambda_i(L) = \min_{V} \ \langle L, V \rangle, \quad \text{s.t.} \ 0 \preceq V \preceq I, \ \operatorname{Tr}(V) = k,$$

where $\langle L, V \rangle = \operatorname{Tr}(L^{\top}V)$.

Then, $\|B\|_{[k]}$ could be reformulated as the convex problem:

$$\|B\|_{[k]} = \min_{V} \ \langle \operatorname{Diag}(B\mathbf{1}) - B, V \rangle, \quad \text{s.t.} \ 0 \preceq V \preceq I, \ \operatorname{Tr}(V) = k.$$

Since $\|B\|_{[k]}$ is defined as the sum of the $k$ smallest eigenvalues of the Laplacian matrix $\operatorname{Diag}(B\mathbf{1}) - B$, Equation (4) was equivalent to the following problem:

$$\min_{Z,B,V} \ \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\lambda}{2}\|Z\|_F^2 + \frac{\mu}{2}\|Z - B\|_F^2 + \gamma \langle \operatorname{Diag}(B\mathbf{1}) - B, V \rangle, \\ \text{s.t.} \ \operatorname{diag}(B) = 0, \ B \ge 0, \ B = B^{\top}, \ 0 \preceq V \preceq I, \ \operatorname{Tr}(V) = k. \tag{5}$$
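As a quick numerical sanity check of this reformulation (an illustrative sketch, assuming the Laplacian form $\operatorname{Diag}(B\mathbf{1}) - B$ used above), the value $\langle \operatorname{Diag}(B\mathbf{1}) - B, UU^{\top} \rangle$ obtained from the k smallest eigenvectors matches the sum of the k smallest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
S = np.abs(rng.standard_normal((n, n)))
B = 0.5 * (S + S.T) - np.diag(np.diag(S))     # non-negative, symmetric, zero diagonal
L = np.diag(B.sum(axis=1)) - B                # Laplacian Diag(B 1) - B

eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
U = eigvecs[:, :k]                            # k smallest eigenvectors
V = U @ U.T                                   # feasible: 0 <= V <= I, Tr(V) = k

print(eigvals[:k].sum())                      # sum of the k smallest eigenvalues
print(np.trace(L @ V))                        # <L, V> attains the same value
```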
Problem (5) involved three variables, which could be solved via alternating minimization. Then, we decomposed the problem into three subproblems and developed the update steps separately.
By fixing the variables $B$ and $V$, we could update $Z$ using:

$$Z = \arg\min_{Z} \ \frac{1}{2}\|Y - YZ\|_F^2 + \frac{\lambda}{2}\|Z\|_F^2 + \frac{\mu}{2}\|Z - B\|_F^2. \tag{6}$$

Note that Equation (6) was a convex program. By setting the partial derivative with respect to $Z$ to zero, we could obtain its closed-form solution:

$$Z = \left(Y^{\top}Y + (\lambda + \mu)I\right)^{-1}\left(Y^{\top}Y + \mu B\right). \tag{7}$$
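A minimal NumPy sketch of this update (the names lam and mu correspond to the balance parameters $\lambda$ and $\mu$ assumed in Equation (4)):

```python
import numpy as np

def update_Z(Y, B, lam, mu):
    """Closed-form Z-update of Equation (7): Z = (Y^T Y + (lam + mu) I)^(-1) (Y^T Y + mu B)."""
    n = Y.shape[1]
    G = Y.T @ Y
    return np.linalg.solve(G + (lam + mu) * np.eye(n), G + mu * B)
```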
Similarly, by fixing the variables $B$ and $Z$, we could update $V$ using:

$$V = \arg\min_{V} \ \langle \operatorname{Diag}(B\mathbf{1}) - B, V \rangle, \quad \text{s.t.} \ 0 \preceq V \preceq I, \ \operatorname{Tr}(V) = k. \tag{8}$$

According to Equation (8), we could obtain the solution for $V$ as follows:

$$V = UU^{\top}, \tag{9}$$

where $U \in \mathbb{R}^{n \times k}$ consists of the $k$ eigenvectors associated with the $k$ smallest eigenvalues of $\operatorname{Diag}(B\mathbf{1}) - B$.
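A corresponding sketch of the V-update (again only an illustration of Equations (8) and (9)):

```python
import numpy as np

def update_V(B, k):
    """V-update of Equation (9): V = U U^T, with U the k eigenvectors of
    Diag(B 1) - B associated with its k smallest eigenvalues."""
    L = np.diag(B.sum(axis=1)) - B
    _, eigvecs = np.linalg.eigh(L)    # eigenvalues returned in ascending order
    U = eigvecs[:, :k]
    return U @ U.T
```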
Then, we could solve $B$. By fixing the variables $Z$ and $V$, we could update $B$ using:

$$B = \arg\min_{B} \ \frac{\mu}{2}\|Z - B\|_F^2 + \gamma \langle \operatorname{Diag}(B\mathbf{1}) - B, V \rangle, \quad \text{s.t.} \ \operatorname{diag}(B) = 0, \ B \ge 0, \ B = B^{\top}. \tag{10}$$

By converting Equation (10), the following equivalent problem was obtained:

$$B = \arg\min_{B} \ \frac{1}{2}\|B - A\|_F^2, \quad \text{s.t.} \ \operatorname{diag}(B) = 0, \ B \ge 0, \ B = B^{\top}, \tag{11}$$

where $A = Z - \frac{\gamma}{\mu}\left(\operatorname{diag}(V)\mathbf{1}^{\top} - V\right)$. According to the proposition in [17], we could obtain the closed-form solution of $B$:

$$B = \left[\frac{1}{2}\left(\hat{A} + \hat{A}^{\top}\right)\right]_{+}, \tag{12}$$

where:

$$\hat{A} = A - \operatorname{Diag}\left(\operatorname{diag}(A)\right),$$

and $[\cdot]_{+}$ denotes the projection onto the non-negative orthant, i.e., $[X]_{+} = \max(X, 0)$.
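A sketch of the B-update following Equations (11) and (12) (the parameter names mirror the formulation assumed above):

```python
import numpy as np

def update_B(Z, V, gamma, mu):
    """B-update of Equations (11) and (12): project A = Z - (gamma/mu)(diag(V)1^T - V)
    onto the set {B : diag(B) = 0, B >= 0, B = B^T}."""
    n = Z.shape[0]
    A = Z - (gamma / mu) * (np.outer(np.diag(V), np.ones(n)) - V)
    A_hat = A - np.diag(np.diag(A))               # zero the diagonal
    return np.maximum(0.5 * (A_hat + A_hat.T), 0.0)
```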
The complete optimization procedure for the BDLSR model for subspace clustering is summarized in Algorithm 1.
Algorithm 1 The optimization process for solving Equation (5).
Input: Data matrix $Y$ and the balance parameters $\lambda$, $\mu$ and $\gamma$.
Initialize: $m = 0$; $Z^{0}$, $V^{0}$ and $B^{0}$.
while not converged do
1: Update $Z^{m+1}$ by solving (6).
2: Update $V^{m+1}$ by solving (8).
3: Update $B^{m+1}$ by solving (11).
4: $m = m + 1$.
end while
Output: The coefficient matrices $Z$ and $B$.
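For completeness, the following compact NumPy sketch assembles the three updates into the loop of Algorithm 1. It reflects the formulation assumed above; the parameter defaults, initialization and stopping rule are illustrative choices rather than the authors' settings:

```python
import numpy as np

def bdlsr(Y, k, lam=0.1, mu=1.0, gamma=1.0, max_iter=200, tol=1e-6):
    """Alternating minimization for the relaxed BDLSR problem (Equation (5)).

    Y : (d, n) data matrix with samples as columns; k : number of subspaces.
    Returns the representation matrices Z and B.
    """
    n = Y.shape[1]
    G = Y.T @ Y
    Z = np.zeros((n, n))
    B = np.zeros((n, n))
    for _ in range(max_iter):
        B_prev = B.copy()
        # Step 1: Z-update, Equation (7).
        Z = np.linalg.solve(G + (lam + mu) * np.eye(n), G + mu * B)
        # Step 2: V-update, Equation (9) -- k smallest eigenvectors of Diag(B 1) - B.
        L = np.diag(B.sum(axis=1)) - B
        _, eigvecs = np.linalg.eigh(L)
        U = eigvecs[:, :k]
        V = U @ U.T
        # Step 3: B-update, Equations (11) and (12).
        A = Z - (gamma / mu) * (np.outer(np.diag(V), np.ones(n)) - V)
        A_hat = A - np.diag(np.diag(A))
        B = np.maximum(0.5 * (A_hat + A_hat.T), 0.0)
        if np.linalg.norm(B - B_prev, 'fro') < tol:
            break
    return Z, B
```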
3.3. Discussion
To solve Equation (4), we developed an iterative update algorithm, which was summarized in Algorithm 1. For a given data matrix, we could solve the proposed BDLSR problem and obtain the representation matrices Z and B using Algorithm 1. Then, we used Z and B to construct the affinity matrices, e.g., $(|Z| + |Z^{\top}|)/2$ or $(|B| + |B^{\top}|)/2$. Finally, we performed spectral-type subspace clustering on the constructed affinity matrices.
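The final clustering step can be illustrated as follows; the affinity choice $(|B| + |B^{\top}|)/2$ and the use of scikit-learn's spectral clustering are common practice in this literature rather than details prescribed by the text:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(B, k):
    """Build a symmetric affinity from the representation matrix and cluster it."""
    W = 0.5 * (np.abs(B) + np.abs(B.T))
    model = SpectralClustering(n_clusters=k, affinity='precomputed', random_state=0)
    return model.fit_predict(W)
```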
We noted that the BDLSR algorithm contained three parameters, which were utilized to balance the different terms in Equation (4). In practice, we could first run BDR or LSR to find suitable values for the parameters that BDLSR shares with them. After that, we could search the last remaining parameter to achieve the highest clustering accuracy.
Note that the number of subspaces had to be known in order to construct the affinity matrices and perform the final clustering. Since this requirement applies to all spectral-type subspace algorithms, we likewise assumed the number of subspaces to be given. Our proposed BDLSR algorithm exploited the block diagonal structure, and its clustering effectiveness is demonstrated in the following experiments.
3.4. Convergence Analysis
Because of the block diagonal regularizer, the objective function in Equation (4) was non-convex. Fortunately, Equation (4) could be solved via alternating minimization, and the resulting solution could be proven to converge to a stationary point. The alternating minimization method is summarized in Algorithm 1. Now, we explain the convergence of Algorithm 1 theoretically.
To guarantee the convergence of BDLSR, the sequence $\{(Z^{m}, V^{m}, B^{m})\}$ generated by Algorithm 1 needed to be proven to have at least one limit point, and any limit point $(Z^{*}, V^{*}, B^{*})$ of $\{(Z^{m}, V^{m}, B^{m})\}$ needed to be shown to be a stationary point of Equation (5). Now, with regard to Algorithm 1, we detail the convergence analysis by referring to [17].
The objective function of Equation (5) was denoted as $f(Z, V, B)$, where the superscript $m$ represents the $m$-th iteration (e.g., $Z^{m}$ is the value of $Z$ at the $m$-th iteration). We let $\Omega = \{B \mid \operatorname{diag}(B) = 0,\ B \ge 0,\ B = B^{\top}\}$ and $\Psi = \{V \mid 0 \preceq V \preceq I,\ \operatorname{Tr}(V) = k\}$. The indicator functions of $\Omega$ and $\Psi$ were denoted as $\iota_{\Omega}$ and $\iota_{\Psi}$, respectively.

First, according to the update process of $Z^{m+1}$ in Equation (6) and noting that $f(Z, V^{m}, B^{m})$ is $(\lambda + \mu)$-strongly convex with regard to $Z$, we obtained:

$$f(Z^{m}, V^{m}, B^{m}) - f(Z^{m+1}, V^{m}, B^{m}) \ge \frac{\lambda + \mu}{2}\|Z^{m} - Z^{m+1}\|_F^2. \tag{13}$$

Then, due to the optimality of $V^{m+1}$ in Equation (8), we obtained:

$$f(Z^{m+1}, V^{m+1}, B^{m}) \le f(Z^{m+1}, V^{m}, B^{m}). \tag{14}$$

Next, according to the update process of $B^{m+1}$ in Equation (11) and noting that $f(Z^{m+1}, V^{m+1}, B)$ is $\mu$-strongly convex with regard to $B$, we obtained:

$$f(Z^{m+1}, V^{m+1}, B^{m}) - f(Z^{m+1}, V^{m+1}, B^{m+1}) \ge \frac{\mu}{2}\|B^{m} - B^{m+1}\|_F^2. \tag{15}$$
By combining and simplifying (13)–(15), we obtained:

$$f(Z^{m}, V^{m}, B^{m}) - f(Z^{m+1}, V^{m+1}, B^{m+1}) \ge \frac{\lambda + \mu}{2}\|Z^{m} - Z^{m+1}\|_F^2 + \frac{\mu}{2}\|B^{m} - B^{m+1}\|_F^2. \tag{16}$$

Hence, $f(Z^{m}, V^{m}, B^{m})$ monotonically decreased and was upper bounded by $f(Z^{0}, V^{0}, B^{0})$. This implied that $\{Z^{m}\}$ and $\{B^{m}\}$ were bounded. Additionally, the constraints $0 \preceq V \preceq I$ and $\operatorname{Tr}(V) = k$ implied that $\{V^{m}\}$ was bounded.

Since $\operatorname{Diag}(B^{m}\mathbf{1}) - B^{m}$ and $V^{m}$ were positive semidefinite, we obtained $\langle \operatorname{Diag}(B^{m}\mathbf{1}) - B^{m}, V^{m} \rangle \ge 0$. Thus, $f(Z^{m}, V^{m}, B^{m}) \ge 0$. Then, by summing Equation (16) over $m = 0, 1, 2, \ldots$, we obtained:

$$\sum_{m=0}^{\infty}\left[\frac{\lambda + \mu}{2}\|Z^{m} - Z^{m+1}\|_F^2 + \frac{\mu}{2}\|B^{m} - B^{m+1}\|_F^2\right] \le f(Z^{0}, V^{0}, B^{0}) < +\infty, \tag{17}$$

which implied that:

$$\lim_{m \to \infty}\left(Z^{m+1} - Z^{m}\right) = 0 \quad \text{and} \quad \lim_{m \to \infty}\left(B^{m+1} - B^{m}\right) = 0. \tag{18}$$
Using Equation (18) and the update process of $V^{m+1}$ in Equation (8), we obtained:

$$\lim_{m \to \infty}\left(V^{m+1} - V^{m}\right) = 0. \tag{19}$$

Then, since $\{Z^{m}\}$, $\{V^{m}\}$ and $\{B^{m}\}$ were bounded, a point $(Z^{*}, V^{*}, B^{*})$ and a subsequence $\{(Z^{m_j}, V^{m_j}, B^{m_j})\}$ existed such that $Z^{m_j} \to Z^{*}$, $V^{m_j} \to V^{*}$ and $B^{m_j} \to B^{*}$. Then, according to Equations (17)–(19), we obtained $Z^{m_j+1} \to Z^{*}$, $V^{m_j+1} \to V^{*}$ and $B^{m_j+1} \to B^{*}$. On the other hand, by considering the optimality of $V^{m+1}$ in Equation (8), $Z^{m+1}$ in Equation (6) and $B^{m+1}$ in Equation (10), we obtained:

$$\begin{cases} Y^{\top}\left(YZ^{m+1} - Y\right) + \lambda Z^{m+1} + \mu\left(Z^{m+1} - B^{m}\right) = 0, \\ 0 \in \gamma\left(\operatorname{Diag}(B^{m}\mathbf{1}) - B^{m}\right) + \partial \iota_{\Psi}\left(V^{m+1}\right), \\ 0 \in \mu\left(B^{m+1} - Z^{m+1}\right) + \gamma\left(\operatorname{diag}(V^{m+1})\mathbf{1}^{\top} - V^{m+1}\right) + \partial \iota_{\Omega}\left(B^{m+1}\right). \end{cases} \tag{20}$$

By letting $m = m_j \to \infty$ in Equation (20), we obtained:

$$\begin{cases} Y^{\top}\left(YZ^{*} - Y\right) + \lambda Z^{*} + \mu\left(Z^{*} - B^{*}\right) = 0, \\ 0 \in \gamma\left(\operatorname{Diag}(B^{*}\mathbf{1}) - B^{*}\right) + \partial \iota_{\Psi}\left(V^{*}\right), \\ 0 \in \mu\left(B^{*} - Z^{*}\right) + \gamma\left(\operatorname{diag}(V^{*})\mathbf{1}^{\top} - V^{*}\right) + \partial \iota_{\Omega}\left(B^{*}\right). \end{cases} \tag{21}$$

Thus, $(Z^{*}, V^{*}, B^{*})$ was a stationary point of Equation (5).
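Although the argument above is theoretical, the monotone decrease established in Equation (16) is also easy to check numerically. The sketch below evaluates the objective of Equation (5) under the formulation assumed in this section so that its values can be recorded and verified to be non-increasing across the iterations of Algorithm 1:

```python
import numpy as np

def bdlsr_objective(Y, Z, B, V, lam, mu, gamma):
    """Objective value of Equation (5) for the current iterates."""
    L = np.diag(B.sum(axis=1)) - B
    return (0.5 * np.linalg.norm(Y - Y @ Z, 'fro') ** 2
            + 0.5 * lam * np.linalg.norm(Z, 'fro') ** 2
            + 0.5 * mu * np.linalg.norm(Z - B, 'fro') ** 2
            + gamma * np.trace(L @ V))

# Inside the loop of Algorithm 1:
#   history.append(bdlsr_objective(Y, Z, B, V, lam, mu, gamma))
# and afterwards the monotone decrease can be verified:
#   assert all(a >= b - 1e-9 for a, b in zip(history, history[1:]))
```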