A Useful Criterion on Studying Consistent Estimation in Community Detection
Abstract
1. Introduction
1.1. Spectral Clustering Approaches
1.2. Separation Condition, Alternative Separation Condition and Sharp Threshold
1.3. Inconsistencies on Separation Condition in Some Previous Works
1.4. Our Findings
- (i)
- We summarize the idea of using the separation condition of a standard network and the sharp threshold of the ER random graph to study the consistent estimation of different spectral methods, designed via the eigen-decomposition or singular value decomposition of the adjacency matrix or its variants, under different models that can degenerate to SBM under mild conditions, as a four-step criterion, SCSTC. The separation condition is used to study the consistency of the theoretical upper bound for a spectral method, and the sharp threshold can be used to study the network sparsity. The theoretical upper bounds for different spectral methods can then be compared through SCSTC. Using this criterion, a few inconsistent phenomena in some previous works are found.
- (ii)
- Under MMSB and DCMM, we study the consistency of the SPACL algorithm proposed in [44] and of its extended version, using recent techniques on row-wise eigenvector deviation developed in [64,65]. Compared with the original results of [43,44], our main theoretical results enjoy smaller error rates with a weaker dependence on K and . Meanwhile, our main theoretical results have weaker requirements on the network sparsity and on the lower bound of the smallest nonzero singular value of the population adjacency matrix. For details, see Table 3 and Table 4.
- (iii)
- Our results for DCMM are consistent with those for MMSB when DCMM degenerates to MMSB under mild conditions. Using SCSTC, under mild conditions, our main theoretical results under DCMM are consistent with those of [41]. This answers why the main results of [43,44] do not match those of [41]: the error rates in Refs. [43,44] are sub-optimal. We also find that our theoretical results (as well as those of [41]) under both MMSB and DCMM match the classical results on the separation condition and sharp threshold, i.e., they achieve the thresholds in Equations (1)–(3). Using the bound of instead of to establish the upper bound of the error rate under SBM in [30], the two spectral methods studied in [30] achieve the thresholds in Equations (1)–(3), which answers why the separation condition obtained from the error rate of [41] does not match that obtained from the error rate of [30]. Using or influences the row-wise eigenvector deviations in Theorem 3.1 of [44] and Theorem I.3 of [43], and thus influences the separation conditions and sharp thresholds of [43,44]. For comparison, our bound on the row-wise eigenvector deviation is obtained using the techniques developed in [64,65], and that of [41] is obtained by applying a modified Theorem 2.1 of [66]; therefore, using or has no influence on the separation conditions and sharp thresholds of ours or of [41]. For details, see Table 1 and Table 2. In a word, using SCSTC, the spectral methods proposed and studied in [26,30,41,43,44,48,49,50,67,68], as well as some other spectral methods fitting models that can reduce to SBM, achieve the thresholds in Equations (1)–(3).
- (iv)
- We verify our threshold in Equation (2) by some computer-generated networks in Section 6. The numerical results for networks generated under when and show that SPACL and its extended version achieve a threshold in Equation (2), and results for networks generated from when and show that the spectral methods considered in [26,30,48,50] achieve the threshold in Equation (2).
2. Mixed Membership Stochastic Blockmodel
- (I1) .
- (I2) There is at least one pure node for each of the K communities.
- Let be the top-K eigen-decomposition of such that .
- Run SP algorithm on the rows of U assuming that there are K communities to obtain .
- Set .
- Recover by setting for .
Algorithm 1 SPACL [44]
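A minimal numerical sketch of the steps above (our own simplified illustration, not the reference implementation of [44]; all function names are ours, and the vertex-hunting step follows the standard SP algorithm of [51]):

```python
import numpy as np

def successive_projection(Y, K):
    """Standard SP vertex hunting: greedily pick K rows of Y whose residuals,
    after projecting out previously chosen rows, have the largest norm."""
    R = Y.astype(float).copy()
    idx = []
    for _ in range(K):
        j = int(np.argmax(np.sum(R ** 2, axis=1)))
        idx.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)  # project out the chosen direction
    return idx

def spacl(A, K):
    """SPACL-style sketch: top-K eigendecomposition of A, SP on the rows of U,
    then recover a row-normalized membership matrix."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(np.abs(vals))[::-1][:K]
    U = vecs[:, order]                       # n x K leading eigenvectors
    corners = successive_projection(U, K)    # indices of estimated pure nodes
    Z = U @ np.linalg.pinv(U[corners])       # rows ~ memberships up to scale
    Z = np.maximum(Z, 0)                     # clip tiny negative entries
    return Z / np.maximum(Z.sum(axis=1, keepdims=True), 1e-12)
```

On a population adjacency matrix with pure nodes, this recovers the memberships exactly; on a sampled adjacency matrix, it recovers them up to the error rates studied below.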
3. Consistency under MMSB
- (A1)
- .
- We emphasize that the bound of Theorem 3.1 of [44] should be instead of for , where the function is defined in Equation (7) of [44]; this is also pointed out by Table 2 of [63]. The reason is that in the proof of Theorem 3.1 of [44], from step (iii) to step (iv), they should keep the term , since this term is much larger than 1. We can also see directly from Theorem VI.1 of [44] that the bound in Theorem 3.1 of [44] should be multiplied by . For comparison, this bound is times our bound in Lemma 2. Meanwhile, from the proof of the bound in Theorem 3.1 of [44], we see that the bound depends on the upper bound of , and [44] applies Theorem 5.2 of [30] such that with high probability, where is the upper bound of the difference between a regularization of A and . Therefore, if we are only interested in bounding instead of , the upper bound of Theorem 3.1 of [44] should be , which is at least times our bound in Lemma 2. Furthermore, the upper bound of the row-wise eigenspace error in Lemma 2 does not rely on the upper bound of as long as holds. Therefore, whether using or does not change the bound in Lemma 2.
- Since by basic algebra, the lower bound requirement on in Assumption 3.1 of [44] gives , which suggests that Theorem 3.1 of [44] requires ; this also matches the requirement on in Theorem VI.1 of [44] (and is also pointed out by Table 1 of [63]). For comparison, our sparsity requirement given in Assumption (A1) is , which is weaker than . Similarly, in our Lemma 2, the requirement gives , so we have , which is consistent with Assumption (A1).
4. Separation Condition and Sharp Threshold Criterion
- (a) By Corollary 1, we know that should shrink slower than for consistent estimation. Therefore, the separation condition should shrink slower than (i.e., Equation (1)), and this threshold is consistent with Corollary 1 of [59] and Equation (17) of [49]. The alternative separation condition should shrink slower than 1 (i.e., Equation (2)).
- (c) By Remark 5, using , we know that in Ref. [44], Equation (3) is , so should shrink slower than . Thus, for [44], the separation condition is , and the alternative separation condition is , which are sub-optimal compared with ours in (a). Using , and Equation (3) in Ref. [44], which is , we see that for [44], now the separation condition is and the alternative separation condition is .
- (d) For comparison, the error bound of Corollary 3.2 of [30], built under SBM for community detection, is for with , so should shrink slower than . Thus the separation condition for [30] is . However, as we analyzed in the first bullet given after Lemma 2, [30] applied to build their consistency results. Instead, applying to rebuild the theoretical results of [30], the error bound of Corollary 3.2 of [30] becomes , which returns the same separation condition as our Corollary 1 and Theorem 2.2 of [41]. Following an analysis similar to (a)–(c), we can obtain an alternative separation condition for [30] immediately; the results are provided in Table 2. Meanwhile, as analyzed in the first bullet given after Lemma 2, whether using or does not change our error rates. By carefully analyzing the proof of 2.1 of [41], we see that whether using or also does not change their row-wise large deviation bound; hence it does not influence the upper bound of the error rate of their Mixed-SCORE.
- Check whether the theoretical upper bound of the error rate contains (note that is a probability matrix whose maximum entry should be set as 1); the separation parameter always appears when considering the lower bound of . If the bound contains , move to the next step. Otherwise, this suggests possible improvements to the consistency result by considering in the proofs.
- Let and the network degenerate to the standard network whose numbers of nodes in each community are of the same order and can be seen as (i.e., a with in the case of a non-overlapping network or a with in the case of an overlapping network; we mainly focus on with for convenience). Let the model degenerate to with , and then we obtain the new theoretical upper bound of the error rate. Note that if the model does not consider degree heterogeneity, the sparsity parameter should be considered in the theoretical upper bound of the error rate in . If the model considers degree heterogeneity, appears at this step when it degenerates to with . Meanwhile, if is not contained in the error rate of when the model does not consider degree heterogeneity, this suggests possible improvements by considering .
- Let be the probability matrix when the model degenerates to , such that P has diagonal entries and non-diagonal entries . So and the separation condition is , since the maximum entry of is assumed to be 1. Compute the lower bound requirement on for consistent estimation by analyzing the new bound obtained in . Compute the separation condition using this lower bound requirement. The sharp threshold for the ER random graph is obtained from the lower bound requirement on for consistent estimation under the setting that and .
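To make this step concrete with explicit notation (introduced here, since the corresponding symbols are elided above): under an SBM with K equal-size communities, sparsity parameter \(\rho\), membership matrix \(Z\), and a connectivity matrix with diagonal entries \(\alpha_{\mathrm{in}}\) and off-diagonal entries \(\alpha_{\mathrm{out}}\), the K-th singular value of the population adjacency matrix can be computed directly:

```latex
% SBM with K equal-size communities: Z \in \{0,1\}^{n \times K},
% Z^{\top} Z = (n/K) I_K, and
% P = (\alpha_{\mathrm{in}} - \alpha_{\mathrm{out}}) I_K
%     + \alpha_{\mathrm{out}} \mathbf{1}_K \mathbf{1}_K^{\top}.
\Omega = \rho\, Z P Z^{\top},
\qquad
\sigma_K(\Omega) = \frac{n\rho}{K}\,(\alpha_{\mathrm{in}} - \alpha_{\mathrm{out}}),
% since the eigenvalues of P are \alpha_{\mathrm{in}} + (K-1)\alpha_{\mathrm{out}}
% and \alpha_{\mathrm{in}} - \alpha_{\mathrm{out}} (multiplicity K-1),
% assuming \alpha_{\mathrm{in}} > \alpha_{\mathrm{out}}.
```

A lower bound requirement on \(\sigma_K(\Omega)\) for consistent estimation therefore translates directly into a separation condition on \(\alpha_{\mathrm{in}} - \alpha_{\mathrm{out}}\).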
- Compare the separation condition and the sharp threshold obtained in with Equations (1) and (3), respectively. If the sharp threshold or the separation condition , then there is room to improve the requirement on network sparsity or the theoretical upper bound of the error rate. If the sharp threshold is and the separation condition is , the optimality of the theoretical results, on both the error rate and the requirement on network sparsity, is guaranteed. Finally, if the sharp threshold or the separation condition , this suggests that the theoretical result is obtained based on instead of .
- In , we give a few examples. When applying SCSTC to the main results of [40,48,67], we stop at , as analyzed in Remark 8, suggesting possible improvements by considering for these works. Meanwhile, for a theoretical result that does not consider , we can also move to to obtain the new theoretical upper bound of the error rate, which is related to ρ and n. The discussions on the theoretical upper bounds of the error rates of [50,68] given in Remark 8 are examples of this case.
- In , letting and the model reduce to for the non-overlapping network or for the overlapping network always simplifies the theoretical upper bound of the error rate, as shown by our Corollaries 1 and 2. Here, we provide some examples of how to make a model degenerate to SBM. For the in this paper, when all nodes are pure, MMSB degenerates to SBM; for the model introduced in Section 5 or the DCSBM considered in [30,48,50], setting makes DCMM and DCSBM degenerate to SBM when all nodes are pure. Similar degenerations hold for the ScBM and DCScBM considered in [67,68,71], the OCCAM model of [40], the stochastic blockmodel with overlap proposed in [46], the extensions of SBM and DCSBM for hypergraph networks considered in [73,74,75], and so forth.
- In and , the separation condition can be replaced by an alternative separation condition.
- When using SCSTC to build and compare theoretical results for spectral clustering methods, the key point is computing the lower bound for , when the probability matrix P has diagonal entries and non-diagonal entries , from the theoretical upper bound of the error rate of a given spectral method. If this lower bound is consistent with that of Equation (1), this suggests theoretical optimality; otherwise, it suggests possible improvements by following the four steps of SCSTC.
- Theorem 4.4 of [48] provides the upper bound of the error rate for their regularized spectral clustering (RSC) algorithm, designed based on a regularized Laplacian matrix under DCSBM. However, since [48] does not study the lower bounds of and m (in the language of [48]), we cannot directly obtain the separation condition from their main theorem. Meanwhile, the main result of [48] does not consider the requirement on network sparsity, which leaves room for improvement. Ref. [48] also does not study the theoretically optimal choice of the RSC regularizer τ. After considering and the sparsity parameter ρ, one can obtain the theoretically optimal choice of τ, which helps explain and guide the empirical choice of τ. Therefore, one practical use of SCSTC is obtaining theoretically optimal choices of tuning parameters, such as the regularizer τ of the RSC algorithm. Using SCSTC, we find that RSC achieves the thresholds in Equations (1)–(3); we omit the proof in this paper.
- Refs. [26,49] study two algorithms designed based on the Laplacian matrix and its regularized version under SBM. They obtain meaningful results but consider neither the network sparsity parameter ρ nor the separation parameter . After obtaining improved error bounds consistent with the separation condition via SCSTC, one can also obtain the theoretically optimal choice of the regularizer τ of the RSC-τ algorithm considered in [49] and find that the two algorithms considered in [26,49] achieve the thresholds in Equations (1)–(3).
- Theorem 2.2 of [50] provides an upper bound for their SCORE algorithm under DCSBM. However, since they do not consider the influence of , we cannot directly obtain the separation condition from their main result. Meanwhile, by setting their , DCSBM degenerates to SBM, which gives their by their assumption Equation (2.9). Hence, when , the upper bound of Theorem 2.2 in [50] is . The upper bound of the error rate in Corollary 3.2 of [30] is when using under the setting that and . We see that grows faster than , which suggests that there is room to improve the main result of [50] in terms of both the separation condition and the error rate. Furthermore, using SCSTC, we find that SCORE achieves the thresholds in Equations (1)–(3) because its extension, Mixed-SCORE [41], achieves them.
- Ref. [67] proposes two models, ScBM and DCScBM, to model directed networks, and an algorithm, DI-SIM, based on the directed regularized Laplacian matrix to fit DCScBM. However, similar to [48], their main theoretical result, Theorem C.1, does not consider the lower bounds of (in the language of Ref. [67]) and , so we cannot obtain the separation condition when DCScBM degenerates to SBM. Meanwhile, their Theorem C.1 also lacks a lower bound requirement on network sparsity. Hence, there is room to improve the theoretical guarantees of [67]. Similar to [48,49], one can also obtain the theoretically optimal choice of the regularizer τ of the DI-SIM algorithm and prove that DI-SIM achieves the thresholds in Equations (1)–(3), since it is the directed version of RSC [48].
- Ref. [68] mainly studies the theoretical guarantee of the D-SCORE algorithm proposed by [14] to fit a special case of the DCScBM model for directed networks. By setting their for , their directed-DCBM degenerates to SBM. Meanwhile, since their , their mis-clustering rate is , which matches that of [30] under SBM when is set as a constant. However, if is set as , then the error rate is , which is sub-optimal compared with that of [30]. Meanwhile, similar to [50,68], the main result does not consider the influences of K and , so no separation condition can be obtained. Hence, the main results of [68] can be improved by considering K, , or a more optimal choice of , to make them comparable with those of [30] when directed-DCBM degenerates to SBM. Using SCSTC, we find that D-SCORE also achieves the thresholds in Equations (1)–(3), since it is the directed version of SCORE [50].
5. Degree Corrected Mixed Membership Model
- (II1) and has unit diagonals.
- (II2) There is at least one pure node for each of the K communities.
- Let be the top-K eigen-decomposition of such that . Let , where is a diagonal matrix whose i-th diagonal entry is for .
- Run SVM-cone algorithm on assuming that there are K communities to obtain .
- Set .
- Recover by setting for .
Algorithm 2 SVM-cone-DCMMSB [43]
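The main algorithmic difference from SPACL is the row normalization of the eigenvector matrix before vertex hunting, which removes the degree parameters. A minimal sketch of that preprocessing step (our own illustration, with a hypothetical function name):

```python
import numpy as np

def row_normalized_eigvecs(A, K):
    """Compute the top-K eigenvectors of A and normalize each row to unit norm.
    Under DCSBM/DCMM, rows of U equal a community-specific vector scaled by the
    node's degree parameter, so normalization makes rows of pure nodes in the
    same community coincide."""
    vals, vecs = np.linalg.eigh(A)
    U = vecs[:, np.argsort(np.abs(vals))[::-1][:K]]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)
```

On a population adjacency matrix generated with heterogeneous degree parameters, the normalized rows within a community are identical, which is what makes the subsequent vertex-hunting step well posed.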
Consistency under DCMM
- (A2)
- .
6. Numerical Results
- (a)
- Set .
- (b)
- Let W be a symmetric matrix such that all diagonal entries of W are 0 and whose remaining entries are independent centered Bernoulli random variables with parameters . Let be the adjacency matrix of a simulated network with mixed memberships under MMSB (so there are no self-loops).
- (c)
- Apply spectral clustering method to A with K communities. Record the error rate.
- (d)
- Repeat (b)–(c) 50 times, and report the mean of the error rates over the 50 times.
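One repetition of steps (a)–(b) can be sketched as follows (names are ours; the symmetric Bernoulli sampling with no self-loops matches the description above):

```python
import numpy as np

def simulate_mmsb_adjacency(Pi, P, rho, rng):
    """Draw a symmetric 0/1 adjacency matrix with zero diagonal:
    A_ij ~ Bernoulli(Omega_ij) independently for i < j,
    where Omega = rho * Pi P Pi^T is the population adjacency matrix."""
    Omega = rho * Pi @ P @ Pi.T
    n = Omega.shape[0]
    upper = rng.random((n, n)) < Omega   # independent coin flips
    A = np.triu(upper, k=1)              # keep strict upper triangle only
    return (A + A.T).astype(int)         # symmetrize; diagonal stays 0
```

Steps (c)–(d) then apply a spectral method to each sampled A and average the error rate over 50 repetitions.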
- Obtain the graph Laplacian , where D is a diagonal matrix with for .
- Obtain , the top K eigen-decomposition of L.
- Apply k-means algorithm to to obtain .
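The three steps above can be sketched as follows (our own illustration; we assume the common normalized form of the graph Laplacian, and the small k-means helper is ours, not part of the method):

```python
import numpy as np

def _kmeans(X, K, iters=50):
    """Tiny deterministic Lloyd's algorithm with farthest-point initialization;
    any k-means routine would do for this illustration."""
    centers = [X[0]]
    for _ in range(1, K):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def laplacian_spectral_clustering(A, K):
    """L = D^{-1/2} A D^{-1/2}, top-K eigenvectors, k-means on the rows."""
    d = A.sum(axis=1).astype(float)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, np.argsort(np.abs(vals))[::-1][:K]]
    return _kmeans(U, K)
```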
- Obtain , the top K eigen-decomposition of A.
- Apply k-means algorithm to to obtain .
- Obtain the regularized graph Laplacian , where , and the default is the average node degree.
- Obtain , the top K eigen-decomposition of . Let be the row-normalized version of .
- Apply k-means algorithm to to obtain .
- Obtain the K (unit-norm) leading eigenvectors of A: .
- Obtain an matrix such that for ,
- Apply k-means algorithm to to obtain .
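The SCORE steps above can be sketched as follows (our own illustration: the entrywise ratios of the leading eigenvectors cancel the degree parameters under DCSBM, and the small k-means helper is ours):

```python
import numpy as np

def _kmeans(X, K, iters=50):
    """Tiny deterministic Lloyd's algorithm with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def score(A, K):
    """SCORE sketch: divide eigenvectors 2..K entrywise by the leading one,
    truncate large ratios, and cluster the rows of the ratio matrix."""
    n = A.shape[0]
    vals, vecs = np.linalg.eigh(A)
    U = vecs[:, np.argsort(np.abs(vals))[::-1][:K]]
    lead = U[:, 0]
    lead = np.where(np.abs(lead) < 1e-12, 1e-12, lead)  # guard tiny entries
    R = U[:, 1:] / lead[:, None]                        # n x (K-1) ratio matrix
    t = np.log(n)
    R = np.clip(R, -t, t)                               # truncation step
    return _kmeans(R, K)
```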
7. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Name |
|---|---|
| SCSTC | separation condition and sharp threshold criterion |
| SBM | stochastic blockmodel |
| DCSBM | degree corrected stochastic blockmodel |
| MMSB | mixed membership stochastic blockmodel |
| DCMM | degree corrected mixed membership model |
| SBMO | stochastic blockmodel with overlap |
| OCCAM | overlapping continuous community assignment model |
| RSC | regularized spectral clustering |
| SCORE | spectral clustering on ratios-of-eigenvectors |
| SPACL | sequential projection after cleaning |
| ER | Erdös–Rényi |
| IS | ideal simplex |
| IC | ideal cone |
| SP | successive projection algorithm |
| oPCA | ordinary principal component analysis |
| nPCA | normalized principal component analysis |
Appendix A. Additional Experiments
Appendix B. Vertex Hunting Algorithms
Algorithm A1 Successive projection (SP) [51]
Algorithm A2 SVM-cone [43]
Appendix C. Proof of Consistency under MMSB
Appendix C.1. Proof of Lemma 1
Appendix C.2. Proof of Lemma 2
Appendix C.3. Proof of Theorem 1
Appendix C.4. Proof of Corollary 1
Appendix D. Proof of Consistency under DCMM
Appendix D.1. Proof of Lemma 3
Appendix D.2. Proof of Lemma 4
Appendix D.3. Proof of Lemma 5
Appendix D.4. Proof of Theorem 2
Appendix D.5. Proof of Corollary 2
Appendix D.6. Basic Properties of Ω under DCMM
Appendix D.7. Bounds between Ideal SVM-cone-DCMMSB and SVM-cone-DCMMSB
- We bound first. Set for convenience. For , we have the following. Now we aim to bound . For convenience, set . We have the following. Then, we have
- For , since , we have
- For , recall that ; we have
- For , we provide some simple facts first: . Since is the best rank-K approximation to A in the spectral norm, we have , because with rank K can also be viewed as a rank-K approximation to A. This leads to . By Lemma H.2 of [43], ; by the lower bound requirement on in Lemma 5, ; and we also have . For , let for convenience. Based on the above facts and Lemma A2, we have the following. Recalling that , we have , where the last inequality holds by Lemma A1. Similarly, we have , where the last inequality holds by the proof of Lemma 5. Then we have the following. Combining the above results, we have
References
- Watts, D.J.; Strogatz, S.H. Collective dynamics of 'small-world' networks. Nature 1998, 393, 440–442.
- Newman, M.E. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 2001, 64, 016132.
- Dunne, J.A.; Williams, R.J.; Martinez, N.D. Food-web structure and network theory: The role of connectance and size. Proc. Natl. Acad. Sci. USA 2002, 99, 12917–12922.
- Newman, M.E.J. Coauthorship networks and patterns of scientific collaboration. Proc. Natl. Acad. Sci. USA 2004, 101, 5200–5205.
- Notebaart, R.A.; van Enckevort, F.H.; Francke, C.; Siezen, R.J.; Teusink, B. Accelerating the reconstruction of genome-scale metabolic networks. BMC Bioinform. 2006, 7, 296.
- Pizzuti, C. Ga-net: A genetic algorithm for community detection in social networks. In International Conference on Parallel Problem Solving from Nature; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1081–1090.
- Jackson, M.O. Social and Economic Networks; Princeton University Press: Princeton, NJ, USA, 2010.
- Gao, J.; Liang, F.; Fan, W.; Wang, C.; Sun, Y.; Han, J. On community outliers and their efficient detection in information networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 813–822.
- Rubinov, M.; Sporns, O. Complex network measures of brain connectivity: Uses and interpretations. Neuroimage 2010, 52, 1059–1069.
- Su, G.; Kuchinsky, A.; Morris, J.H.; States, D.J.; Meng, F. GLay: Community structure analysis of biological networks. Bioinformatics 2010, 26, 3135–3137.
- Lin, W.; Kong, X.; Yu, P.S.; Wu, Q.; Jia, Y.; Li, C. Community detection in incomplete information networks. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 341–350.
- Scott, J.; Carrington, P.J. The SAGE Handbook of Social Network Analysis; SAGE Publications: London, UK, 2014.
- Bedi, P.; Sharma, C. Community detection in social networks. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2016, 6, 115–135.
- Ji, P.; Jin, J. Coauthorship and citation networks for statisticians. Ann. Appl. Stat. 2016, 10, 1779–1812.
- Ji, P.; Jin, J.; Ke, Z.T.; Li, W. Co-citation and Co-authorship Networks of Statisticians. J. Bus. Econ. Stat. 2022, 40, 469–485.
- Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
- Newman, M.E.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
- Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
- Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
- Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
- Abbe, E.; Bandeira, A.S.; Hall, G. Exact Recovery in the Stochastic Block Model. IEEE Trans. Inf. Theory 2016, 62, 471–487.
- Fortunato, S.; Newman, M.E. 20 years of network community detection. Nat. Phys. 2022, 1–3.
- Goldenberg, A.; Zheng, A.X.; Fienberg, S.E.; Airoldi, E.M. A survey of statistical network models. Found. Trends Mach. Learn. 2010, 2, 129–233.
- Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
- Snijders, T.A.; Nowicki, K. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classif. 1997, 14, 75–100.
- Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915.
- Choi, D.S.; Wolfe, P.J.; Airoldi, E.M. Stochastic blockmodels with a growing number of classes. Biometrika 2012, 99, 273–284.
- Sussman, D.L.; Tang, M.; Fishkind, D.E.; Priebe, C.E. A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Am. Stat. Assoc. 2012, 107, 1119–1128.
- Latouche, P.; Birmelé, E.; Ambroise, C. Model selection in overlapping stochastic block models. Electron. J. Stat. 2014, 8, 762–794.
- Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237.
- Sarkar, P.; Bickel, P.J. Role of normalization in spectral clustering for stochastic blockmodels. Ann. Stat. 2015, 43, 962–990.
- Lyzinski, V.; Tang, M.; Athreya, A.; Park, Y.; Priebe, C.E. Community detection and classification in hierarchical stochastic blockmodels. IEEE Trans. Netw. Sci. Eng. 2016, 4, 13–26.
- Valles-Catala, T.; Massucci, F.A.; Guimera, R.; Sales-Pardo, M. Multilayer stochastic block models reveal the multilayer structure of complex networks. Phys. Rev. X 2016, 6, 011036.
- Lei, J. A goodness-of-fit test for stochastic block models. Ann. Stat. 2016, 44, 401–424.
- Tabouy, T.; Barbillon, P.; Chiquet, J. Variational inference for stochastic block models from sampled data. J. Am. Stat. Assoc. 2020, 115, 455–466.
- Airoldi, E.M.; Blei, D.M.; Fienberg, S.E.; Xing, E.P. Mixed Membership Stochastic Blockmodels. J. Mach. Learn. Res. 2008, 9, 1981–2014.
- Wang, F.; Li, T.; Wang, X.; Zhu, S.; Ding, C. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 2011, 22, 493–521.
- Airoldi, E.M.; Wang, X.; Lin, X. Multi-way blockmodels for analyzing coordinated high-dimensional responses. Ann. Appl. Stat. 2013, 7, 2431–2457.
- Panov, M.; Slavnov, K.; Ushakov, R. Consistent Estimation of Mixed Memberships with Successive Projections. In International Conference on Complex Networks and Their Applications; Springer: Cham, Switzerland, 2017; pp. 53–64.
- Zhang, Y.; Levina, E.; Zhu, J. Detecting overlapping communities in networks using spectral methods. SIAM J. Math. Data Sci. 2020, 2, 265–283.
- Jin, J.; Ke, Z.T.; Luo, S. Estimating network memberships by simplex vertex hunting. arXiv 2017, arXiv:1708.07852.
- Mao, X.; Sarkar, P.; Chakrabarti, D. On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations. In Proceedings of the 34th International Conference of Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2324–2333.
- Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136.
- Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating Mixed Memberships With Sharp Eigenvector Deviations. J. Am. Stat. Assoc. 2020, 116, 1928–1940.
- Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 16107.
- Kaufmann, E.; Bonald, T.; Lelarge, M. A spectral algorithm with additive clustering for the recovery of overlapping communities in networks. Theor. Comput. Sci. 2017, 742, 3–26.
- Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
- Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128.
- Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791.
- Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89.
- Gillis, N.; Vavasis, S.A. Semidefinite Programming Based Preconditioning for More Robust Near-Separable Nonnegative Matrix Factorization. SIAM J. Optim. 2015, 25, 677–698.
- Mossel, E.; Neeman, J.; Sly, A. Consistency thresholds for binary symmetric block models. arXiv 2014, arXiv:1407.1591.
- Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531.
- Hajek, B.; Wu, Y.; Xu, J. Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions. IEEE Trans. Inf. Theory 2016, 62, 5918–5937.
- Agarwal, N.; Bandeira, A.S.; Koiliaris, K.; Kolla, A. Multisection in the Stochastic Block Model using Semidefinite Programming. arXiv 2017, arXiv:1507.02323.
- Bandeira, A.S. Random Laplacian Matrices and Convex Relaxations. Found. Comput. Math. 2018, 18, 345–379.
- Abbe, E.; Sandon, C. Community Detection in General Stochastic Block models: Fundamental Limits and Efficient Algorithms for Recovery. In Proceedings of the 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 17–20 October 2015; pp. 670–688.
- Gao, C.; Ma, Z.; Zhang, A.Y.; Zhou, H.H. Achieving Optimal Misclassification Proportion in Stochastic Block Models. J. Mach. Learn. Res. 2017, 18, 1–45.
- McSherry, F. Spectral partitioning of random graphs. In Proceedings of the 2001 IEEE International Conference on Cluster Computing, Newport Beach, CA, USA, 8–11 October 2001; pp. 529–537.
- Newman, M.E. Assortative mixing in networks. Phys. Rev. Lett. 2002, 89, 208701.
- Erdös, P.; Rényi, A. On the evolution of random graphs. In The Structure and Dynamics of Networks; Princeton University Press: Princeton, NJ, USA, 2011; pp. 38–82.
- Blum, A.; Hopcroft, J.; Kannan, R. Foundations of Data Science; Number 1; Cambridge University Press: Cambridge, UK, 2020; pp. 1–465.
- Lei, L. Unified ℓ2→∞ Eigenspace Perturbation Theory for Symmetric Random Matrices. arXiv 2019, arXiv:1909.04798.
- Chen, Y.; Chi, Y.; Fan, J.; Ma, C. Spectral methods for data science: A statistical perspective. Found. Trends Mach. Learn. 2021, 14, 566–806.
- Cape, J.; Tang, M.; Priebe, C.E. The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. Ann. Stat. 2019, 47, 2405–2439.
- Abbe, E.; Fan, J.; Wang, K.; Zhong, Y. Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. Ann. Stat. 2020, 48, 1452–1474.
- Rohe, K.; Qin, T.; Yu, B. Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 2016, 113, 12679–12684.
- Wang, Z.; Liang, Y.; Ji, P. Spectral Algorithms for Community Detection in Directed Networks. J. Mach. Learn. Res. 2020, 21, 1–45.
- Cai, T.T.; Li, X. Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Stat. 2015, 43, 1027–1059.
- Tropp, J.A. User-Friendly Tail Bounds for Sums of Random Matrices. Found. Comput. Math. 2012, 12, 389–434.
- Zhou, Z.; Amini, A.A. Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 2019, 20, 1–47.
- Zhao, Y.; Levina, E.; Zhu, J. Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Stat. 2012, 40, 2266–2292.
- Ghoshdastidar, D.; Dukkipati, A. Consistency of Spectral Partitioning of Uniform Hypergraphs under Planted Partition Model. In Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 397–405.
- Ke, Z.T.; Shi, F.; Xia, D. Community Detection for Hypergraph Networks via Regularized Tensor Power Iteration. arXiv 2019, arXiv:1909.06503.
- Cole, S.; Zhu, Y. Exact recovery in the hypergraph stochastic block model: A spectral algorithm. Linear Algebra Its Appl. 2020, 593, 45–73.
- Bandeira, A.S.; van Handel, R. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab. 2016, 44, 2479–2506.
- Cape, J. Orthogonal Procrustes and norm-dependent optimality. Electron. J. Linear Algebra 2020, 36, 158–168.
| | Model | Separation Condition | Sharp Threshold |
|---|---|---|---|
| Ours using | MMSB&DCMM | | |
| Ours using | MMSB&DCMM | | |
| Ref. [41] using (original) | DCMM | | |
| Ref. [41] using | DCMM | | |
| Refs. [43,44] using (original) | MMSB&DCMM | | |
| Refs. [43,44] using | MMSB&DCMM | | |
| Ref. [30] using (original) | SBM&DCSBM | | |
| Ref. [30] using | SBM&DCSBM | | |
| | Model | Alternative Separation Condition |
|---|---|---|
| Ours using | MMSB&DCMM | 1 |
| Ours using | MMSB&DCMM | 1 |
| Ref. [41] using (original) | DCMM | 1 |
| Ref. [41] using | DCMM | 1 |
| Refs. [43,44] using (original) | MMSB&DCMM | |
| Refs. [43,44] using | MMSB&DCMM | |
| Ref. [30] using (original) | SBM&DCSBM | |
| Ref. [30] using | SBM&DCSBM | 1 |
| | Dependence on K | Dependence on | | | |
|---|---|---|---|---|---|
| Ours | | | | | |
| [44] | | | | | |
| | Dependence on K | Dependence on | | | | |
|---|---|---|---|---|---|---|
| Ours | arbitrary | | | | | |
| [43] | from Dirichlet | | | | | |
Share and Cite
Qing, H. A Useful Criterion on Studying Consistent Estimation in Community Detection. Entropy 2022, 24, 1098. https://doi.org/10.3390/e24081098