1. Introduction
Due to the growing availability of large-scale network datasets, community detection has attracted significant attention. The community detection problem is to discover the community structure by dividing the network into multiple clusters according to the affinity between nodes. Because the spectral clustering method is easy to implement and can detect non-convex clusters, it is widely used for detecting clusters in networks. Compared with traditional algorithms, spectral clustering performs well and has many fundamental advantages [1,2,3,4].
In the spectral clustering algorithm, the similarity between data points is reflected by the weights on the edges of a graph. The data points are mapped to a lower-dimensional space through the Laplacian matrix of the graph, and the resulting non-convex dataset in the low-dimensional space is finally clustered by a traditional clustering algorithm.
Let $G=(V,E)$ be an undirected and unweighted simple graph with $n$ nodes, where $V$ and $E$ are the sets of nodes and edges, respectively. The adjacency matrix of the graph $G$, denoted by $A=(A_{ij})$, is a 0–1 symmetric matrix of order $n$, whose $(i,j)$-th and $(j,i)$-th elements are 1 if there is an edge between the nodes $i$ and $j$, and 0 otherwise. Let $d_i = \sum_{j} A_{ij}$, which is defined as the degree of node $i$. Moreover, $\Delta$ and $\delta$ are called the maximal degree and minimal degree of $G$, respectively. Denote by $\bar d$ the average degree of the graph $G$, which equals $\frac{1}{n}\sum_{i=1}^{n} d_i = \frac{2|E|}{n}$. The degree matrix is defined by $D = \mathrm{diag}(d_1,\dots,d_n)$. The symmetric matrix $L = D - A$ is called the unnormalized Laplacian of $G$, each of whose row sums is zero. The normalized Laplacian $\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$ has zero as its smallest eigenvalue and plays a very important role in the spectral clustering algorithm. It is well defined only when $D^{-1}$ exists, i.e., when there are no isolated nodes.
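As a quick illustration of these definitions, the following NumPy sketch builds the degree matrix, the unnormalized Laplacian, and the normalized Laplacian; the small example graph is chosen arbitrarily for illustration.

```python
import numpy as np

# Small example graph (assumed for illustration): edges (0,1), (1,2), (1,3), (2,3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

d = A.sum(axis=1)                       # node degrees d_i
D = np.diag(d)                          # degree matrix D
L = D - A                               # unnormalized Laplacian, each row sums to zero
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # requires no isolated nodes
L_sym = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian

print(L.sum(axis=1))                    # all zeros
print(np.linalg.eigvalsh(L_sym).min())  # smallest eigenvalue is 0 (up to rounding)
```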
In 2002, Ng et al. [5] proposed a version of spectral clustering (NJW) based on the normalized Laplacian matrix. Moreover, the authors of [5] analyzed their algorithm using matrix perturbation theory and gave conditions under which the algorithm performs well, namely when nodes from different clusters are well separated. However, when dealing with a sparse network with strong degree heterogeneity, i.e., when the minimum degree of the graph is low, NJW does not concentrate well. To resolve this issue, Chaudhuri and Chung [6] introduced the notion of a degree-corrected random-walk Laplacian and demonstrated that it outputs the correct partition for a wide range of graphs generated from the extended planted partition (EPP) model. Instead of performing the spectral decomposition on the entire matrix, Chaudhuri and Chung [6] divided the nodes into two random subsets and used the subgraph induced on only one of those subsets to compute the spectral decomposition. Qin and Rohe [7] investigated the spectral clustering algorithm using the degree-corrected normalized Laplacian $L_\tau = D_\tau^{-1/2} A D_\tau^{-1/2}$ under the degree-corrected stochastic blockmodel, where $D_\tau = D + \tau I$. This method extended the previous statistical estimation results to the more canonical spectral clustering algorithm and is called regularized spectral clustering (RSC). Recently, Qing and Wang [8] also proposed an improved spectral clustering (ISC) under the degree-corrected stochastic blockmodel. Unlike NJW and RSC, which use the top $k$ eigenvectors to construct the mapping matrix, ISC uses the top $k+1$ eigenvectors and the corresponding eigenvalues instead, and it outperforms them especially on weak signal networks, where $k$ is the number of clusters.
Actually, previous works on spectral clustering with the degree-corrected Laplacian were mostly applied to graphs generated from stochastic blockmodels. Moreover, the optimal $\tau$ has a complex dependence on the degree distribution of the graph, and setting $\tau$ to the average degree $\bar d$ provides good results [6,7]. In [7], the authors claimed that the choice $\tau = \bar d$ could be adjusted by a multiplicative constant and that the results are not sensitive to such adjustments. However, some numerical experiments show that an appropriate $\tau$ can be found for better performance.
This paper investigates the spectral clustering algorithm using the degree-corrected Laplacian from the viewpoint of spectral graph theory [9] and shows that it also works for a wide class of graphs. Moreover, we provide theoretical guidance on the choice of the parameter $\tau$. Finally, six real-world datasets are used to test the performance of our method with an appropriate $\tau$. The results are roughly equivalent to those of RSC, or even better.
The rest of this paper is organized as follows. In Section 2, we list some relevant definitions and useful lemmas used in the analysis of our main results in Section 3. In Section 4, numerical experiments are conducted on the real-world datasets; moreover, some artificial networks are generated to analyze the effect of our method in terms of the related parameters. The conclusion and future work are provided in Section 5.
2. Preliminaries
Let $G = (V, E)$ be a graph. The symmetric difference of two subsets $S$ and $T$ of $V$ is defined as $S \triangle T = (S \setminus T) \cup (T \setminus S)$. For a subset $S$ of $V$, $\bar S = V \setminus S$ denotes its complement and $E(S, \bar S)$ denotes the set of edges with exactly one endpoint in $S$. The symbol $\mathrm{vol}(S)$ denotes the volume of $S$, which is given by the sum of the degrees of all nodes in $S$, i.e., $\mathrm{vol}(S) = \sum_{v \in S} d_v$, and the conductance of $S$ is $\phi(S) = |E(S, \bar S)| / \mathrm{vol}(S)$. If $k$ disjoint subsets $S_1, \dots, S_k$ of $V$ satisfy $\bigcup_{i=1}^{k} S_i = V$, we call $\{S_1, \dots, S_k\}$ a $k$-way partition of $V$. Kolev and Mehlhorn [10] introduced the minimal average conductance, denoted by
$$\Phi(k) = \min_{\{S_1, \dots, S_k\} \in U} \frac{1}{k} \sum_{i=1}^{k} \phi(S_i),$$
where $U$ is the set containing every $k$-way partition of the node set of $G$ and $\phi(S_i)$ is the conductance of $S_i$. A partition $\{P_1, \dots, P_k\}$ is called optimal if it satisfies
$$\frac{1}{k} \sum_{i=1}^{k} \phi(P_i) = \Phi(k).$$
In this paper, we denote by $\{A_1, \dots, A_k\}$ the actual partition returned by the RSC algorithm, where $k$ is the number of classes of the graph.
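The NumPy sketch below computes the volume and conductance of a node subset and the average conductance of a $k$-way partition, assuming the standard definitions stated above.

```python
import numpy as np

def volume(A, S):
    """vol(S): sum of the degrees of the nodes in S (S is an array of node indices)."""
    return A[S].sum()

def conductance(A, S):
    """phi(S) = |E(S, V \\ S)| / vol(S): fraction of edge endpoints in S that leave S."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[np.asarray(S)] = True
    cut = A[np.ix_(mask, ~mask)].sum()   # edges with exactly one endpoint in S
    return cut / volume(A, S)

def average_conductance(A, parts):
    """(1/k) * sum_i phi(S_i) for a k-way partition given as a list of index arrays."""
    return sum(conductance(A, p) for p in parts) / len(parts)
```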
Let $\|\cdot\|$ denote the 2-norm of a vector and $\|\cdot\|_F$ the Frobenius norm of a matrix.
The $k$-means algorithm aims to find a set of $k$ centers that minimizes the sum of the squared distances between each point and the center to which it is assigned.
Let $F$ be a spectral embedding map from $V$ to a vector space. Given any $k$-way partition of $G$ and a set of $k$ vectors, say $\{S_1, \dots, S_k\}$ and $\{c_1, \dots, c_k\}$, respectively, the cost function of the partition $\{S_1, \dots, S_k\}$ of $V$, mentioned in [11], is defined as
$$\mathrm{COST}(S_1, \dots, S_k;\, c_1, \dots, c_k) = \sum_{i=1}^{k} \sum_{u \in S_i} d_u \, \| F(u) - c_i \|^2 .$$
The main idea of this function is to expand each element of $V$ by making $d_u$ copies of $F(u)$ and to form a set with $\sum_{u \in V} d_u = 2|E|$ points. Then, a partition is acquired by applying the $k$-means algorithm to this expanded set. The "trick" is to copy every node $u$ into $d_u$ identical points. This method can efficiently deal with networks that have overlaps between clusters. For convenience, it is necessary to assume that the output of the $k$-means clustering algorithm on the expansion of the vertex set $V$ satisfies the following condition.
- (A)
For every $u \in V$, all copies of $F(u)$ are contained in one part.
Suppose that $\{A_1, \dots, A_k\}$ is the partition of $V$ with centers $\{c_1, \dots, c_k\}$ that is output by the $k$-means clustering algorithm. The value of the clustering cost function is denoted by "COST", i.e.,
$$\mathrm{COST} = \sum_{i=1}^{k} \sum_{u \in A_i} d_u \, \| F(u) - c_i \|^2 .$$
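A minimal NumPy sketch of this degree-weighted cost, under the form assumed above, is given below; duplicating each node $u$ into $d_u$ identical points and running plain $k$-means yields the same objective value.

```python
import numpy as np

def clustering_cost(F, d, labels, centers):
    """Degree-weighted k-means cost: sum_i sum_{u in A_i} d_u * ||F(u) - c_i||^2.

    F       : (n, k) array, row u is the embedding F(u) of node u
    d       : (n,)   array of node degrees
    labels  : (n,)   array, labels[u] is the cluster index of node u
    centers : (k, k) array, row i is the center c_i
    """
    diffs = F - centers[labels]                          # F(u) - c_{label(u)} for every node u
    return float(np.sum(d * np.sum(diffs ** 2, axis=1)))
```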
Next, we introduce the traditional NJW algorithm and the RSC algorithm (Algorithm 1).
Algorithm 1. The traditional NJW and RSC algorithm
- Input: the adjacency matrix $A$ and the number of clusters $k$ (and the regularizer $\tau$ for RSC).
- 1: Calculate the normalized Laplacian matrix $L = D^{-1/2} A D^{-1/2}$ ($L_\tau = D_\tau^{-1/2} A D_\tau^{-1/2}$ with $D_\tau = D + \tau I$ for RSC).
- 2: Find the eigenvectors corresponding to the $k$ largest eigenvalues of $L$. Form $X \in \mathbb{R}^{n \times k}$ by putting the eigenvectors into the columns.
- 3: Normalize each row of $X$ to get the matrix $Y$, i.e., $Y_{ij} = X_{ij} / \big(\sum_{j=1}^{k} X_{ij}^{2}\big)^{1/2}$, where $1 \le i \le n$ and $1 \le j \le k$.
- 4: Apply the $k$-means method to the rows of $Y$ to get the label of each node.
- Output: labels for all nodes
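A hedged Python sketch of Algorithm 1 is given below, using NumPy and scikit-learn; taking $\tau = 0$ corresponds to NJW and taking $\tau$ equal to the average degree corresponds to RSC. It is a minimal illustration rather than a reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k, tau=0.0):
    """Sketch of Algorithm 1: tau = 0 gives NJW, tau = average degree gives RSC."""
    d = A.sum(axis=1)
    D_tau_inv_sqrt = np.diag(1.0 / np.sqrt(d + tau))
    L = D_tau_inv_sqrt @ A @ D_tau_inv_sqrt            # (regularized) normalized adjacency

    # Step 2: eigenvectors of the k largest eigenvalues, placed in the columns of X.
    eigvals, eigvecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    X = eigvecs[:, -k:]

    # Step 3: normalize each row of X to unit length to obtain Y.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Y = X / np.where(norms > 0, norms, 1.0)

    # Step 4: k-means on the rows of Y gives the node labels.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

# Usage (A is an n x n symmetric 0-1 adjacency matrix, k the number of clusters):
# labels_njw = spectral_clustering(A, k, tau=0.0)
# labels_rsc = spectral_clustering(A, k, tau=A.sum() / len(A))   # tau = average degree
```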
3. Analysis of the RSC Algorithm
Our method for analyzing the RSC algorithm follows the strategy developed by Peng et al. [11], Kolev and Mehlhorn [10], and Mizutani [12]. Let $\{S_1, \dots, S_k\}$ be a partition of the node set $V$. Define $g_i$ to be the indicator vector of $S_i$; that is, the $v$-th element of $g_i$ is one if $v \in S_i$, and zero otherwise. The normalized indicator $\bar g_i$ of $S_i$ is given as
$$\bar g_i = \frac{D_\tau^{1/2} g_i}{\| D_\tau^{1/2} g_i \|}.$$
It is obvious that $\bar g_1, \dots, \bar g_k$ are orthonormal.
The following result is called the structure theorem, which plays a very important role in examining the performance of spectral clustering. It shows that there are linear combinations of the eigenvectors of $L_\tau$ such that these combinations and the normalized indicators $\bar g_1, \dots, \bar g_k$ are close.
Theorem 1 (Structure Theorem). Let
where $\lambda_i(L_\tau)$ ($\lambda_i$ for short) is the $i$-th largest eigenvalue of $L_\tau$ and $\{P_1, \dots, P_k\}$ is the optimal partition of $G$. If , then there exists an orthogonal matrix $U$ such that
Proof. Denote by the element of corresponding to the vertex $u$. Moreover, let
Let , , and . Consider the singular value decomposition of , given as , where and are orthogonal matrices and is a diagonal matrix. Let and . Then is an orthogonal matrix. According to the proof of Theorem 4 in [12], we obtain
This completes the proof. □
Given $k$ vectors, say , we suppose that is lower bounded by some real numbers and is upper bounded by a real number $\omega$, i.e.,
We are now ready to derive the bounds of and shown in (4) for the RSC algorithm. Let and be the $v$-th row of , corresponding to the node $v$. Since $U$ is an orthogonal matrix, the inequality (3) can be rewritten as
The spectral embedding map in the RSC algorithm, denoted by $F$, is given as
Hence, according to the discussion in [12], it is easy to obtain the upper bound of "COST". This discussion requires the following inequality.
Lemma 1 ([12]). The following inequality holds for a vector and a vector with :

Theorem 2. Let a partition of $G$ be an optimal partition achieving $\Phi(k)$ and let $F$ be the spectral embedding map in the RSC algorithm. Define the center of , for , as . Then
,
,
where .
Proof. First, since
, we have that
On the other hand, let
. Then,
The result holds. □
Let $\mathrm{OPT}$ stand for the optimal clustering cost of the graph $G$; then it is obvious that $\mathrm{COST} \le \alpha \cdot \mathrm{OPT}$, where $\alpha$ is the approximation ratio of the $k$-means algorithm. Moreover, $\mathrm{OPT}$ is at most the cost of the optimal partition with the centers defined in Theorem 2. Therefore, we can obtain the following upper bound of COST.
Theorem 3. Let $\{P_1, \dots, P_k\}$ be the optimal partition of $G$. Then

Lemma 2 ([12]). Assume that, for every permutation $\sigma$, there is an index $l$ such that for a real number . Then, the following inequality holds:
where $H$ is a subset of , $p$ is an element of , is a non-negative real number satisfying , and $\omega$ is the upper bound of in (4). By setting and , we obtain the following result.
Theorem 4. Suppose that the assumption of Lemma 2 holds. Then

Theorem 5 (Main result). Given a graph $G = (V, E)$ and a positive integer $k$, let $\{P_1, \dots, P_k\}$ be an optimal partition of $G$ and let $\{A_1, \dots, A_k\}$ be the partition of $G$ returned by the RSC clustering algorithm. Assume that the $k$-means clustering algorithm has an approximation ratio of $\alpha$ and satisfies assumption (A). If , then, after a suitable renumbering of $\{A_1, \dots, A_k\}$, the following holds for every $i$:

Proof. Assume, for contradiction, that for every permutation $\sigma$ there is an index $l$ such that for a real number . Hence, applying Theorems 3 and 4, we obtain
which contradicts Theorem 3. That means that, after a suitable renumbering of $\{A_1, \dots, A_k\}$, we have for every . □
4. Finding an Appropriate $\tau$ and Numerical Experiments
The main theorem gives an upper bound of in the RSC algorithm. It tells us that the performance would vary as this term decreases with increasing $\tau$. In this section, we will try to find an appropriate $\tau$ for a good partitioning according to this main theorem.
Before our analysis, we make some reasonable assumptions, stated as (B) to (D).
- (B)
- (C)
- (D)
Firstly, stands for the ratio of the edges in to the degree summation of all points in . We may assume that , since is one of the clusters in the optimal partitioning. Second, as mentioned in [6,7], the choice of $\tau$ is very important: if $\tau$ is too small, there is insufficient regularization, and if $\tau$ is too large, it washes out significant eigenvalues. It is then reasonable to assume that . Moreover, and stand for the edges in the corresponding clusters, and the ratio of them stands for the relative density. Hence, we can assume that
Then,
where $\lambda_i$ denotes the $i$-th largest eigenvalue of $L_\tau$. Furthermore, the theoretical analysis in [7] shows that $\tau = \bar d$ provides good results and that this choice can be adjusted by a multiplicative constant. For these reasons, we set $\tau = \alpha \bar d$ and attempt to find an appropriate $\alpha$ to refine the algorithm.
Six real datasets are used to test our method. These datasets can be downloaded directly from
http://zke.fas.harvard.edu/software.html, accessed on 10 September 2022.
Table 1 shows the detailed information of the six real datasets, including the source of each dataset, the number of data points ($n$), the number of communities ($k$), the minimum degree ($\delta$), and the maximum degree ($\Delta$).
4.1. Finding an Appropriate $\alpha$
Figure 1 plots the variation of UB($\alpha$) as $\alpha$ varies between 0 and 1 on the six real datasets. It is obvious that UB($\alpha$) is decreasing with increasing $\alpha$. The following theorem (the well-known Geršgorin disc theorem) explains this observation.
Theorem 6 (Geršgorin Disc Theorem). Let $A = (a_{ij})$ be an $n \times n$ matrix and let
$$R_i(A) = \sum_{j \ne i} |a_{ij}|, \qquad i = 1, \dots, n,$$
denote the deleted absolute row sums of $A$. Then, all the eigenvalues of $A$ are located in the union of the $n$ discs
$$\bigcup_{i=1}^{n} \{ z \in \mathbb{C} : |z - a_{ii}| \le R_i(A) \}.$$
Applied to $L_\tau$, whose diagonal entries are zero for a simple graph, it tells us that, for all $i$, the disc radius $R_i(L_\tau) = \sum_{j \ne i} A_{ij} / \sqrt{(d_i + \tau)(d_j + \tau)}$ decreases with increasing $\tau$; hence the eigenvalues of $L_\tau$ are confined to a shrinking interval around zero, and the corresponding term in the upper bound decreases as well. Therefore, we would like to find an appropriate $\alpha$ such that the upper bound does not vary too much when $\alpha$ changes slightly.
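As a quick numerical illustration of Theorem 6, the sketch below checks that every eigenvalue of a symmetric matrix lies in the union of its Geršgorin discs; the matrix is randomly generated and used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
A = (A + A.T) / 2                                  # symmetric, so the eigenvalues are real

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)    # deleted absolute row sums R_i

# Every eigenvalue lies in at least one disc |z - a_ii| <= R_i.
for lam in np.linalg.eigvalsh(A):
    assert np.any(np.abs(lam - centers) <= radii + 1e-12)
print("all eigenvalues lie in the union of the Gershgorin discs")
```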
According to Theorem 5, we may assume that
Then
follows from the assumptions and .
Define $\Delta\mathrm{UB}(\alpha)$ as the absolute difference of $\mathrm{UB}(\alpha)$ when $\alpha$ increases by 0.005, i.e., $\Delta\mathrm{UB}(\alpha) = |\mathrm{UB}(\alpha + 0.005) - \mathrm{UB}(\alpha)|$. We would like to find an $\alpha$ that satisfies the following condition (6):
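The sketch below illustrates this selection rule: it scans $\alpha$ in steps of 0.005 and returns the smallest $\alpha$ at which UB($\alpha$) stops changing noticeably. The callable ub and the tolerance used as a stand-in for condition (6) are assumptions made for illustration.

```python
import numpy as np

def pick_alpha(ub, step=0.005, tol=1e-3):
    """Scan alpha in (0, 1] and return the smallest alpha at which UB stabilizes.

    `ub` is a callable computing the upper bound of Theorem 5 for
    tau = alpha * average_degree (not reproduced here); the stopping rule
    |UB(alpha) - UB(alpha - step)| < tol is an assumed stand-in for condition (6).
    """
    alphas = np.arange(step, 1.0 + step, step)
    values = np.array([ub(a) for a in alphas])
    diffs = np.abs(np.diff(values))                # Delta UB between consecutive steps
    stable = np.where(diffs < tol)[0]
    return alphas[stable[0] + 1] if len(stable) else alphas[-1]
```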
In the rest of this paper, three indices, namely RI, NMI, and error rate, are used to evaluate the effectiveness.
Evaluation Indices
Rand Index. For a dataset with $n$ data points, the total number of sample pairs is $n(n-1)/2$. If two sample points that belong to the same class are assigned to the same cluster, we count such a pair toward $a$; if two sample points that belong to different classes are assigned to different clusters, we count such a pair toward $b$. The calculation formula of RI is as follows:
$$\mathrm{RI} = \frac{a + b}{n(n-1)/2}.$$
The value represents the proportion of correctly clustered sample pairs among all sample pairs and is often used to measure the similarity between two partitions. Obviously, RI is between 0 and 1. If $\mathrm{RI} = 1$, the clustering is completely correct, and if $\mathrm{RI} = 0$, it is completely wrong.
Normalized Mutual Information. We use $U$ and $V$ to denote the true label vector and the predicted label vector, respectively. Let $U_i$ represent the elements belonging to class $i$ in $U$ and $V_j$ represent the elements belonging to class $j$ in $V$. $H(U)$ represents the information entropy of $U$, which can be calculated by
$$H(U) = -\sum_{i} p_i \log p_i,$$
where the base of the logarithm is usually 2 and $p_i$ represents the ratio of the number of nodes belonging to class $i$ to the total number of nodes, i.e., $p_i = |U_i|/n$. Now, we can obtain the formula for calculating the mutual information (MI):
$$\mathrm{MI}(U, V) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{p_i\, p_j},$$
where $p_{ij} = |U_i \cap V_j|/n$. Based on the information entropy and the mutual information, we can obtain the normalized mutual information as
$$\mathrm{NMI}(U, V) = \frac{2\, \mathrm{MI}(U, V)}{H(U) + H(V)}.$$
Error Rate. The error rate is defined by
$$\mathrm{Err} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ g_i \ne \hat g_i \},$$
where $g_i$ and $\hat g_i$ are the true and predicted labels of node $i$, respectively.
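The three indices can be computed as in the sketch below using NumPy, SciPy, and scikit-learn; matching predicted cluster ids to true class ids with a Hungarian assignment before computing the error rate is an implementation choice assumed here.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix, normalized_mutual_info_score

def rand_index(true, pred):
    """RI = (a + b) / C(n, 2): pairs grouped together in both labelings (a)
    plus pairs separated in both (b), over all sample pairs."""
    a = b = 0
    for i, j in combinations(range(len(true)), 2):
        same_true, same_pred = true[i] == true[j], pred[i] == pred[j]
        a += bool(same_true and same_pred)
        b += bool((not same_true) and (not same_pred))
    return (a + b) / (len(true) * (len(true) - 1) / 2)

def error_rate(true, pred):
    """Fraction of misclassified nodes after matching predicted cluster ids
    to true class ids (Hungarian assignment on the confusion matrix)."""
    C = confusion_matrix(true, pred)
    rows, cols = linear_sum_assignment(-C)
    return 1.0 - C[rows, cols].sum() / len(true)

# true = np.array([...]); pred = np.array([...])
# print(rand_index(true, pred), normalized_mutual_info_score(true, pred), error_rate(true, pred))
```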
4.2. Real Networks Experiments
After some pre-processing, these six real datasets are all labeled networks containing $k$ non-overlapping communities. We use RSC-$\alpha$ to stand for the RSC algorithm with the value of $\alpha$ that satisfies the condition in (6). Actually, NJW, RSC, and RSC-$\alpha$ are three different cases of the RSC algorithm obtained when $\tau$ takes different values: when $\tau = 0$, it is the NJW algorithm; when $\tau = \bar d$, it is the RSC algorithm; and when $\tau = \alpha \bar d$ with $\alpha$ chosen by (6), it is RSC-$\alpha$.
Table 2 shows the experimental results of these three cases. Furthermore, the best performance on each dataset is indicated in bold. The last row of Table 2 shows the corresponding $\alpha$ used in RSC-$\alpha$.
As can be seen from the table, RSC-$\alpha$ clusters the UKfaculty and karate datasets completely correctly. Moreover, RSC-$\alpha$ achieves the best clustering result on the politicalblog dataset, with only 58 clustering errors.
Table 3 shows the terms of the upper bound for RSC-$\alpha$ proposed in Theorem 5. From this table, the performance of the RSC-$\alpha$ algorithm is affected by the two parameters appearing in the bound. For example, RSC-$\alpha$ does not perform well on caltech and dolphins: all networks except caltech have a minimal average conductance smaller than 0.4, whereas that of caltech is larger than 0.5, and although dolphins has a small minimal average conductance, its other term in the bound is larger than 2.
4.3. Synthetic Data Experiments
In this section, we use artificial networks to evaluate the performance of the RSC-$\alpha$ algorithm in terms of the average degree, the mixing parameter, and the number of nodes in the largest community. We generate artificial networks using the LFR benchmark, which is considered a standard test network for community detection and is characterized by non-uniform distributions of node degrees and community sizes.
The test networks are generated with the following parameters: the number of nodes ($n$), the average degree ($\bar d$), the maximum degree (maxd), the mixing parameter ($\mu$), the number of nodes in the smallest community (minc), and the number of nodes in the largest community (maxc). The value of the mixing parameter $\mu$ is between 0.1 and 0.9. Low values of $\mu$ give a clear community structure, where intra-cluster links are much more numerous than inter-cluster links [17].
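One way to generate such LFR test networks is via networkx, as sketched below; the power-law exponents tau1 and tau2 and the value of $\mu$ are illustrative assumptions, while the remaining parameters mirror those used in Section 4.3.1.

```python
import networkx as nx

# Sketch of generating one LFR test network with networkx; generation may need
# a few attempts (different seeds) before the benchmark constraints are met.
G = nx.LFR_benchmark_graph(
    n=500, tau1=2.0, tau2=1.5,           # power-law exponents for degrees / community sizes (assumed)
    mu=0.3,                               # mixing parameter (assumed value)
    average_degree=30, max_degree=220,
    min_community=100, max_community=300,
    seed=42,
)
# Ground-truth communities are attached to the nodes as frozensets.
communities = {frozenset(G.nodes[v]["community"]) for v in G}
print(G.number_of_nodes(), len(communities))
```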
4.3.1. The Ratio of the Average Degree to the Maximum Degree
In this experiment, we generate nine artificial networks consisting of 500 nodes. To evaluate the performance of RSC-$\alpha$ in terms of the average degree, we fix the mixing parameter $\mu$ and set minc = 100, maxc = 300, and maxd = 220, while the average degree takes the values 10, 30, 50, 70, 90, 110, 130, 150, and 170. The ratio of the average degree to the maximum degree thus varies from 0.0455 to 0.7727. The performance comparison is summarized in Figure 2.
From our observations, the performance of RSC-$\alpha$ is highly dependent on the average degree of the network. As the average degree increases, RI and NMI increase and the error rate decreases significantly. Actually, this phenomenon is consistent with inequality (2), since equality there holds when the graph is regular.
4.3.2. Mixing Parameter
In this experiment, we also generate nine artificial networks with 500 nodes, fixing the average degree and setting minc = 100, maxc = 300, and maxd = 220. In order to study the effect of the mixing parameter on RSC-$\alpha$, $\mu$ varies from 0.1 to 0.9. The experimental results are shown in Figure 3.
From this observation, we understand that RSC-$\alpha$ performs excellently when $\mu$ is between 0 and 0.3. However, its performance drops sharply as $\mu$ increases from 0.3 to 0.5. This phenomenon coincides with the result for the real datasets, where RSC-$\alpha$ does not perform well when the minimal average conductance is larger than 0.5. However, the performance of RSC-$\alpha$ remains stable when $\mu$ exceeds 0.5, which shows that RSC-$\alpha$ is less affected by $\mu$ in that range.
4.3.3. The Number of Nodes in the Largest Community
In this experiment, we generate 13 artificial networks consisting of 1700 nodes. To evaluate the performance of RSC-$\alpha$ in terms of the number of nodes in the largest community, we fix the mixing parameter $\mu$ and set the average degree to 30, minc = 300, and maxd = 500, while the number of nodes in the largest community varies from 300 to 900 with a step size of 50. The experimental results are shown in Figure 4.
Since both the degree distribution and the community size distribution of the graphs generated by the LFR benchmark are power laws, this experiment varies maxc to simulate networks whose community sizes are increasingly unbalanced. The experimental results show that the RSC-$\alpha$ algorithm performs well when the network is "balanced", which also verifies the results on the real datasets.
6. Discussion
The RSC algorithm uses a constant $\tau$ for the degree correction. Can we use a different degree correction for different nodes? We try to use the information of the neighbors of each node as follows.
Let $N(i)$ be the set of nodes adjacent to node $i$. Denote $\tau_i^{\max} = \max_{j \in N(i)} d_j$, $\tau_i^{\min} = \min_{j \in N(i)} d_j$, and $\tau_i^{\mathrm{mean}} = \frac{1}{|N(i)|} \sum_{j \in N(i)} d_j$.
Let $T = \mathrm{diag}(\tau_1, \dots, \tau_n)$ be a diagonal matrix of order $n$. The modified normalized Laplacian matrix is
$$L_T = (D + T)^{-1/2} A (D + T)^{-1/2}.$$
We use RSC-max, RSC-min, RSC-mean, and RSC-mid to denote the methods obtained when $\tau_i$ is taken to be, respectively, the maximum, minimum, mean, and a middle value of the degrees of the neighbors of node $i$.
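A sketch of this modification is given below; it follows the same pipeline as Algorithm 1, and the statistic used for RSC-mid is assumed here to be the median of the neighbor degrees.

```python
import numpy as np
from sklearn.cluster import KMeans

def rsc_neighbor_tau(A, k, mode="min"):
    """Per-node degree correction: tau_i is a statistic of the degrees of the
    neighbours of node i (no isolated nodes, as assumed in Section 2).
    The choice of the median for 'mid' is an assumption for illustration."""
    d = A.sum(axis=1)
    stat = {"max": np.max, "min": np.min, "mean": np.mean, "mid": np.median}[mode]
    tau = np.array([stat(d[A[i] > 0]) for i in range(len(d))])

    D_tau_inv_sqrt = np.diag(1.0 / np.sqrt(d + tau))
    L = D_tau_inv_sqrt @ A @ D_tau_inv_sqrt            # (D + T)^(-1/2) A (D + T)^(-1/2)
    X = np.linalg.eigh(L)[1][:, -k:]                   # eigenvectors of the k largest eigenvalues
    Y = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
```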
Table 4 shows the experimental results of these methods. We can see that the RSC-min algorithm is slightly better than RSC: it performs better than RSC on five datasets and misclassifies only two nodes on UKfaculty. Therefore, using a different degree correction for each node might improve the performance of the RSC algorithm. We leave this to future work.