Article

Optimal Non-Asymptotic Bounds for the Sparse β Model

1 College of Mathematics, Sichuan University, Chengdu 610017, China
2 Department of Statistics, Central China Normal University, Wuhan 430079, China
3 School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100080, China
4 College of Economics, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(22), 4685; https://doi.org/10.3390/math11224685
Submission received: 15 October 2023 / Revised: 14 November 2023 / Accepted: 15 November 2023 / Published: 17 November 2023
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

Abstract

This paper investigates the sparse β model with $\ell_1$ penalty in the field of network data models, a topic of active interest in both statistical and social network research. We present a refined algorithm designed for parameter estimation in the proposed model. Its effectiveness stems from the convexity of the loss function, which allows the proximal gradient descent method to be applied. We study estimation consistency and establish an optimal bound for the proposed estimator. Empirical validations through carefully designed simulation studies corroborate the efficacy of our methodology. These assessments highlight the prospective contributions of our methodology to the advancing field of network data analysis.

1. Introduction

With the advancement of science and technology, an increasing volume of network data is being collected, and there is a growing need to analyze and understand the formation and features of these network-structured data. In the field of statistics, network-structured data bring about specific challenges in statistical inference and asymptotic analysis [1]. Many statisticians have delved into the study of network models. For example, the well-known Erdős–Rényi model [2,3] assumes that connections between node pairs form independently and with a uniform probability. In recent years, two core attributes have been a pivotal focus in research on network models: network sparsity and degree heterogeneity [4]. Network sparsity refers to the phenomenon where the actual number of connections between nodes is significantly lower than the potential maximum; it is common in real-world networks and presents unique challenges and opportunities for analysis. Degree heterogeneity refers to the phenomenon whereby a network is characterized by a few highly connected core nodes and many low-degree nodes with fewer links. These facets are intrinsic to a comprehensive understanding of network structures and dynamics (see [5]).
Two prominent statistical models that address degree heterogeneity are the stochastic block model [6] and the β model [7,8,9]. The stochastic block model seeks to encapsulate degree heterogeneity by grouping nodes into communities characterized by analogous connection patterns, as demonstrated in [6,10]. In contrast, the β model addresses degree heterogeneity directly, employing node-specific parameters to characterize the variation in node connections. Degrees of nodes are foundational in summarizing network information, with their distributions offering crucial insights into the networks’ formation processes. Recent advancements in modeling the degree sequence of undirected networks have explored distributions within the framework of the exponential family, where vertex “potentials” are used as defining parameters (see [11,12,13,14]). In the context of directed networks, many studies have been conducted on constructing and sampling graphs with specified in- and out-degree sequences, often termed “bi-degree” (see [15,16,17]).
The β model is renowned for its widespread application and proven statistical efficacy. Statistically, the maximum likelihood estimation of the β model parameters is recognized for its consistency and asymptotic normality, albeit predominantly in the context of relatively dense networks, as indicated in [7,18]. In practice, however, a pervasive characteristic of observed networks is their sparsity, manifested by considerably fewer edges than the theoretical maximum of attainable connections. This landscape underscores the need for a sparse β model. A significant advancement in this field is the work reported in [19], where a sparse β model with $\ell_0$ penalty was studied. This innovation is not only adept at encapsulating node heterogeneity but also facilitates parameter estimation with commendable statistical properties, even in sparse networks. The model is distinguished by its computationally fast, theoretically tractable, and intuitively attractive nature. Nevertheless, the authors of [19] accentuated a limitation inherent in $\ell_0$ penalty-based estimation, particularly when the parameter support is unknown. They recommended that when the $\ell_0$ penalty is impractical due to undefined parameter support, $\ell_1$-norm penalization becomes a preferred, efficient alternative for parameter estimation.
In this paper, we study the sparse β model, specifically one augmented with an $\ell_1$ penalty, aiming to articulate optimal non-asymptotic bounds in a theoretical framework. The inclusion of the $\ell_1$ penalty renders the loss function convex. This modification simplifies the resolution process, allowing for the application of convex optimization techniques and thereby boosting computational efficiency. Furthermore, a theoretical analysis of the consistency of the estimated β is established, and the finite-sample performance of the proposed method is verified by numerical simulation.
For the remainder of this paper, we proceed as follows. In Section 2, we introduce the sparse β model with an $\ell_1$ penalty and the estimation procedure for the degree parameters β. In Section 3, we develop theoretical results to demonstrate the consistency of the estimated β. Simulation studies are conducted in Section 4 to empirically validate the effectiveness and efficiency of the proposed method. We encapsulate our findings and provide a discussion of potential future directions in Section 5. Proofs of the theoretical results are comprehensively provided in Appendix A.

2. Methods

In this section, we first review the formulation of the β model for undirected graphs in Section 2.1 and then discuss the sparse β model with $\ell_1$ penalty and its inference procedure in Section 2.2.

2.1. Review of the β Model for Undirected Graphs

For the sake of convenient description, we first fix some notation: $\|x\|_p$, $p = 1, 2$, denotes the $\ell_p$ norm of a vector $x$. Let $G$ be an undirected simple graph with no self-loops on $n \ge 2$ nodes labeled $\{1, \dots, n\}$, and let $A = (A_{i,j}) \in \{0,1\}^{n \times n}$ be its adjacency matrix. Here, we consider the element $A_{i,j}$ an indicator variable denoting whether node $i$ is connected to node $j$: that is, $A_{i,j} = 1$ if nodes $i$ and $j$ are connected and $A_{i,j} = 0$ otherwise. Let $d_i = \sum_{j \ne i} A_{i,j}$ be the degree of node $i$ and $d = (d_1, \dots, d_n)^\top$ be the degree sequence of graph $G$. The β model, which was named in [7], is an exponential random graph model (ERGM) in which the degree sequence is the exclusive sufficient statistic. Specifically, the distribution of $G$ can be expressed as
$$P(G) = P(d \mid \beta) = \exp\big(\beta^\top d - Z(\beta)\big),$$
where $\beta = (\beta_1, \dots, \beta_n)^\top$ is a vector of parameters and $Z(\beta) = \sum_{1 \le i < j \le n} \log(1 + \exp(\beta_i + \beta_j))$. This implies that all edges $\{A_{i,j}\}_{i<j}$ are independent and distributed as Bernoulli random variables, with success probability defined as
$$p_{i,j} = P(A_{i,j} = 1) = \frac{\exp(\beta_i + \beta_j)}{1 + \exp(\beta_i + \beta_j)}, \quad 1 \le i < j \le n,$$
where $\beta_i$ is the influence parameter of vertex $i$. In the β model, $\beta_i$ has a natural interpretation because it measures the tendency of vertex $i$ to establish connections with other vertices; that is, the larger $\beta_i$ is, the more likely vertex $i$ is to connect to other vertices.
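To make the data-generating process concrete, the following sketch (our own illustration, not code from the paper; the function name sample_beta_model and all default values are assumptions) draws an adjacency matrix from the β model using the success probabilities above.

```python
import numpy as np

def sample_beta_model(beta, rng=None):
    """Draw an undirected, loop-free adjacency matrix A with
    P(A_ij = 1) = exp(beta_i + beta_j) / (1 + exp(beta_i + beta_j))."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(beta)
    logits = beta[:, None] + beta[None, :]      # beta_i + beta_j for all pairs
    p = 1.0 / (1.0 + np.exp(-logits))           # logistic link
    upper = rng.random((n, n)) < p              # independent Bernoulli draws
    A = np.triu(upper, k=1).astype(int)         # keep i < j only (no self-loops)
    return A + A.T                              # symmetrize

beta = np.array([1.5, 1.5, 0.0, 0.0, 0.0])      # two active nodes, three background
A = sample_beta_model(beta)
d = A.sum(axis=1)                               # degree sequence d
```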
The negative log-likelihood function for the β model is
$$\ell(\beta) := \ell(\beta, d) = -\beta^\top d + Z(\beta).$$
Our prediction loss is the expectation of the normalized negative log-likelihood (the theoretical risk), and $\beta^*$ is the optimal parameter vector; i.e.,
$$Pl(\beta, d) = E\Big[\frac{1}{n(n-1)} \ell(\beta, d)\Big] = \frac{1}{n(n-1)} \big(-\beta^\top E[d] + Z(\beta)\big) = -\frac{1}{n(n-1)} \sum_{i=1}^n \beta_i E[d_i] + \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} \log(1 + \exp(\beta_i + \beta_j)) = -\frac{1}{n(n-1)} \sum_{i=1}^n \beta_i \sum_{j \ne i} E[A_{i,j}] + \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} \log(1 + \exp(\beta_i + \beta_j))$$
and
$$\beta^* = \arg\min_{\beta \in \mathbb{R}^n} Pl(\beta, d).$$
Let $H^* = E\big[\frac{\partial^2 \ell(\beta^*)}{\partial \beta \partial \beta^\top}\big]$ be the Fisher information of $P(d \mid \beta^*)$. We introduce the “Fisher risk”
$$\|\beta - \beta^*\|_{H^*}^2 = (\beta - \beta^*)^\top H^* (\beta - \beta^*).$$
We also consider the $\ell_1$ risk, $\|\beta - \beta^*\|_1$. In addition, when the sample size is sufficiently large, we hope to obtain a reasonable estimate $\hat\beta$ whose Fisher risk $\|\hat\beta - \beta^*\|_{H^*}^2$ approaches $Pl(\hat\beta) - Pl(\beta^*)$.
The corresponding empirical risk is defined as
$$P_n[\ell(\beta, d)] = \frac{1}{n(n-1)} \Big(-\sum_{i=1}^n d_i \beta_i + Z(\beta)\Big) = -\frac{1}{n(n-1)} \sum_{i=1}^n \beta_i \sum_{j \ne i} A_{i,j} + \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} \log(1 + \exp(\beta_i + \beta_j)).$$
Many researchers have studied the maximum likelihood estimation of β obtained by minimizing the empirical risk; theoretical asymptotic results are reported in [11,17,20]. In this paper, we study the sparse β model with $\ell_1$ regularization on β.
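As a sketch (our own illustration under the notation above), the empirical risk can be evaluated directly from the adjacency matrix; this is the smooth part of the objective minimized in the next subsection.

```python
import numpy as np

def empirical_risk(beta, A):
    """P_n[l(beta, d)]: the normalized negative log-likelihood."""
    n = len(beta)
    d = A.sum(axis=1)                               # degree sequence
    iu = np.triu_indices(n, k=1)                    # pairs with i < j
    logits = (beta[:, None] + beta[None, :])[iu]
    log_partition = np.log1p(np.exp(logits)).sum()  # Z(beta)
    return (-(beta @ d) + log_partition) / (n * (n - 1))
```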

2.2. Sparse β Model with $\ell_1$ Penalty

In this subsection, we formulate the sparse β model with $\ell_1$ penalty. First, we assume that the true parameter $\beta^*$ is sparse, with support M and sparsity level m given by
$$M = M(\beta^*) = \{k : \beta_k^* \ne 0\}, \qquad m = |M|,$$
where $|\cdot|$ denotes the number of elements in M.
To facilitate understanding of why the $\ell_1$ regularization algorithm attains convergence rates analogous to those of the $\ell_0$ algorithm, the authors of [21] conducted an intricate exploration of sparse eigenvalue conditions on a design matrix, showing that a minimal (sparse) subset of eigenvalues is bounded away from 0. Further advancements were made by the authors of [22], who weakened this condition by requiring it only for vectors whose support is predominantly concentrated on a compact subset, as detailed in their comprehensive analysis. In this paper, we also relax this condition, with our focal point being the Fisher matrix; specifically, we constrain the Fisher eigenvalues. For $\epsilon \in \mathbb{R}^n$, define $\epsilon_M \in \mathbb{R}^n$ by $\{\epsilon_M\}_k = \epsilon_k I(k \in M)$, where $I(\cdot)$ denotes the indicator function, and let $M^C$ be the complement of M.
Condition (A): For any $\epsilon$ such that $\|\epsilon_{M^C}\|_1 \le 2 \|\epsilon_M\|_1$, we have $\gamma_{\min}^* \|\epsilon_M\|_2 \le \|\epsilon\|_{H^*} \le \gamma_{\max}^* \|\epsilon_M\|_2$.
It is noteworthy that our approach only quantifies over the support M, a considerably milder condition than the one delineated in [21], where a more comprehensive assessment was conducted. Moreover, unlike the authors of [22], who quantified over all subsets, our methodology applies in a focused manner. In addition, the theoretical analysis presented in Section 3 shows that our subspace only needs to be restricted to the following set:
$$W = \{w : \|w_{M^C}\|_1 \le 2 \|w_M\|_1\}. \tag{6}$$
Under the restricted eigenvalue (RE) condition, the minimum eigenvalue $\vartheta_{\min}$ of $H^*$ appearing in our bounds can be replaced by the restricted constant $\gamma_{\min}^*$, even when $\vartheta_{\min}$ is much smaller than $\gamma_{\min}^*$.
Ref. [19] adopted a penalized log-likelihood method with $\ell_0$ penalty for the sparse β model. The authors used a monotonicity lemma to show that, by assigning non-zero parameters to vertices with larger degrees, the seemingly complicated computations caused by the $\ell_0$ penalty can be avoided. However, the monotonicity lemma does not always hold; when the support of the parameters is unknown, the estimator based on the $\ell_0$ penalty is no longer computationally feasible. In view of this situation, it is natural to develop penalized likelihood estimation with the $\ell_1$ norm. Similar to penalized logistic regression [23], we consider the following $\ell_1$ regularization problem for the β model:
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^n} P_n[\ell(\beta)] + \lambda \sum_{j=1}^n |\beta_j| = \arg\min_{\beta \in \mathbb{R}^n} P_n[\ell(\beta)] + \lambda \|\beta\|_1, \tag{7}$$
where λ is a tuning parameter greater than 0.
Obviously, a sparse estimate can capture the important coefficients and eliminate redundant variables, whereas a non-sparse estimate may cause overfitting. The optimization problem (7) can be solved by the proximal gradient descent method, with the partial derivative of $\ell(\beta)$ with respect to $\beta_i$ given by
$$\nabla_i \ell(\beta) = \frac{\partial \ell(\beta)}{\partial \beta_i} = \frac{1}{n-1}\Big(-d_i + \sum_{j \ne i} \frac{\exp(\beta_i + \beta_j)}{1 + \exp(\beta_i + \beta_j)}\Big) = -\frac{1}{n-1} \sum_{j \ne i} \Big(A_{i,j} - \frac{\exp(\beta_i + \beta_j)}{1 + \exp(\beta_i + \beta_j)}\Big).$$
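In code, this gradient has a simple closed form (a sketch under our notation; grad_loss is a hypothetical helper that we reuse in later sketches):

```python
import numpy as np

def grad_loss(beta, A):
    """Gradient of l(beta): entry i is -(1/(n-1)) * sum_{j != i} (A_ij - p_ij)."""
    n = len(beta)
    p = 1.0 / (1.0 + np.exp(-(beta[:, None] + beta[None, :])))
    np.fill_diagonal(p, 0.0)                    # exclude self-pairs j = i
    return -(A.sum(axis=1) - p.sum(axis=1)) / (n - 1)
```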
If the Lasso estimate $\hat\beta$ is a solution of (7), then the corresponding Karush–Kuhn–Tucker (KKT) conditions (see [24]) for (7) are
$$\begin{cases} \dfrac{1}{n-1} \displaystyle\sum_{j \ne i} \Big(A_{i,j} - \dfrac{\exp(\hat\beta_i + \hat\beta_j)}{1 + \exp(\hat\beta_i + \hat\beta_j)}\Big) = \lambda\, \mathrm{sign}(\hat\beta_i) & \text{if } \hat\beta_i \ne 0, \\[2mm] \Big|\dfrac{1}{n-1} \displaystyle\sum_{j \ne i} \Big(A_{i,j} - \dfrac{\exp(\hat\beta_i + \hat\beta_j)}{1 + \exp(\hat\beta_i + \hat\beta_j)}\Big)\Big| \le \lambda & \text{if } \hat\beta_i = 0. \end{cases}$$
Theoretically, we need the data-adaptive tuning parameter λ to ensure that the event on which the KKT conditions hold at the true parameter has high probability (see Lemma 2 in Section 3 and the corresponding proof presented in Appendix A).
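A minimal proximal gradient loop for (7) might look as follows (our own sketch, reusing the hypothetical grad_loss helper above; the step size eta and iteration count are illustrative, not values prescribed by the paper). Soft-thresholding is the proximal map of the $\ell_1$ penalty, and kkt_gap numerically checks the KKT conditions above: it should be near zero at a solution.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_gradient(A, lam, eta, n_iter=2000):
    """Proximal gradient descent for the l1-penalized beta model (7)."""
    beta = np.zeros(A.shape[0])
    for _ in range(n_iter):
        beta = soft_threshold(beta - eta * grad_loss(beta, A), eta * lam)
    return beta

def kkt_gap(beta_hat, A, lam):
    """Largest violation of the KKT conditions (near 0 at a solution)."""
    s = -grad_loss(beta_hat, A)                 # (1/(n-1)) sum_j (A_ij - p_ij)
    active = beta_hat != 0
    gap_active = np.abs(s[active] - lam * np.sign(beta_hat[active]))
    gap_zero = np.maximum(np.abs(s[~active]) - lam, 0.0)
    return max(gap_active.max(initial=0.0), gap_zero.max(initial=0.0))
```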

3. Theoretical Analysis

In this section, we study the estimation consistency of β in the context of the sparse β model with an $\ell_1$ penalty. Our primary attention centers on the Fisher risk, $\|\beta - \beta^*\|_{H^*}$, and the $\ell_1$ risk, $\|\beta - \beta^*\|_1$.
From standard convex analysis, if the eigenvalues of the Hessian $\frac{\partial^2 H}{\partial \beta \partial \beta^\top}$ of a strictly convex function H are uniformly bounded below by a positive constant, then H is strongly convex. Generally, an exponential family is strongly convex only in a sufficiently small neighborhood of $\beta^*$. Below, we quantify this behavior in the form of a theorem for the β model.
Theorem 1
(Almost strong convexity). Suppose μ is the analytic standardized moment or cumulant of $\beta^*$ with respect to a certain subspace $W$, and $\tilde\beta$ is an estimator of β that satisfies $(\tilde\beta - \beta^*) \in W$. If
$$\mu^2 \le \frac{1}{36} \quad \text{or} \quad Pl(\tilde\beta) - Pl(\beta^*) \le \frac{\|\tilde\beta - \beta^*\|_{H^*}^2}{108 \mu^2}, \tag{8}$$
then we have
$$\frac13 \|\tilde\beta - \beta^*\|_{H^*}^2 \le Pl(\tilde\beta) - Pl(\beta^*) \le \frac23 \|\tilde\beta - \beta^*\|_{H^*}^2. \tag{9}$$
Proof of Theorem 1. 
First, if $\mu^2 \le \frac{1}{36}$, then using Lemma A2 in Appendix A,
$$\frac13 \|\tilde\beta - \beta^*\|_{H^*}^2 \le Pl(\tilde\beta) - Pl(\beta^*) \le \frac35 \|\tilde\beta - \beta^*\|_{H^*}^2 \le \frac23 \|\tilde\beta - \beta^*\|_{H^*}^2,$$
and the theorem is proven. Thus, suppose that $Pl(\tilde\beta) - Pl(\beta^*) \le \frac{\|\tilde\beta - \beta^*\|_{H^*}^2}{108 \mu^2}$. If $\mu^2 \le \frac{1}{36}$ holds, the previous case shows that the theorem is already proven, so let $\mu^2 > \frac{1}{36}$; hence, $\max\{1, 36\mu^2\} = 36\mu^2$. Then, according to (A5) in Appendix A, $Pl(\tilde\beta) - Pl(\beta^*) \ge \frac{\|\tilde\beta - \beta^*\|_{H^*}^2}{108 \mu^2}$, which contradicts the assumption. Therefore, we have completed the proof of Theorem 1. □
Remark 1.
In general, the exponential family exhibits strong convexity primarily within localized neighborhoods of $\beta^*$, especially in sufficiently small regions. The main outcome of Theorem 1 is to quantify when this behavior occurs. Additionally, the conditions outlined in (8) can be construed as an initial “burn-in” phase: an initial set of samples is required until the loss of β approximates the minimum loss; subsequently, quadratic convergence takes effect. This parallels the idea in Newton’s method of quantifying the steps needed to enter the quadratic convergence phase. The constants 1/3 and 2/3 in inequality (9) can both approach 1/2, particularly with an extended “burn-in” phase. A crucial element in the proof of Theorem 1 involves expanding the prediction error in terms of moments/cumulants.
Lemma 1.
Let $\delta = \tilde\beta - \beta^*$ and write $S_\delta = \delta^\top(d - E[d])$ for the centered sufficient statistic under $\beta^*$, so that $E[S_\delta] = 0$. Suppose that the infinite series $\sum_{i=2}^\infty \frac{1}{i!} a_{i,\beta^*}(\delta) r^i$ and $\sum_{i=2}^\infty \frac{1}{i!} b_{i,\beta^*}(\delta) r^i$ converge for any $r \in [0, 1]$, where $a_{i,\beta^*}(\delta) = E[S_\delta^i]$ and $b_{i,\beta^*}(\delta) = g^{(i)}(0)$ with $g(r) = \log(E[e^{r S_\delta}])$; here, $g^{(i)}(r)$ denotes the i-th-order derivative of the function $g(r)$. Then, we have
$$Pl(\beta^* + r\delta) - Pl(\beta^*) = \log\Big(1 + \sum_{i=2}^\infty \frac{1}{i!} a_{i,\beta^*}(\delta) r^i\Big), \tag{10}$$
$$Pl(\beta^* + r\delta) - Pl(\beta^*) = \sum_{i=2}^\infty \frac{1}{i!} b_{i,\beta^*}(\delta) r^i. \tag{11}$$
Next, we provide the error risk bound for the estimate of β under the RE condition. Generally, under a specific noise model, the regularization parameter λ is set as a function of the noise level. Here, the statement of our theorem is quantitative, showing clearly that the appropriate choice of λ depends on the $\ell_\infty$ norm of the measurement error, $\max_{1 \le i \le n} |d_i - E[d_i]|$. Therefore, under relatively mild distributional assumptions, we can easily quantify λ in Theorem 2. In addition, the measurement error must be sufficiently small for the following conditions in Lemma 2 to hold.
Lemma 2. 
(Risk bound). For the subspace $W$ defined in (6), let $\mu^*$ be the analytic standardized moment or cumulant of $\beta^*$. Assume that Condition (A) holds and that λ satisfies both
$$\max_{1 \le i \le n} |d_i - E[d_i]| \le \frac{(n-1)\lambda}{3} \quad \text{and} \quad \lambda \le \frac{\|\hat\beta - \beta^*\|_{H^*}^2}{144 \mu^{*2} \|\beta^*\|_1}. \tag{12}$$
If $\hat\beta$ is a solution of (7), then we have the Fisher risk bound
$$\frac13 \|\hat\beta - \beta^*\|_{H^*}^2 \le Pl(\hat\beta) - Pl(\beta^*) \le \frac{16 m \lambda^2}{3 \gamma_{\min}^{*2}} \tag{13}$$
and the $\ell_1$ risk bound
$$\|\hat\beta - \beta^*\|_1 \le \frac{12 m \lambda}{\gamma_{\min}^{*2}}. \tag{14}$$
Note that the presence of measurement error results in only a mild dimensional dependence. Lemma 2 therefore shows that, under the RE condition, the β model enjoys good convergence rates. Using Hoeffding’s lemma and the inequality reported in [25], we present Proposition 1 below and quantify the mild distributional hypothesis needed to obtain Theorem 2.
Proposition 1.
Assume that $\{A_{i,j}\}_{j \ne i}$ are independent random variables with Bernoulli distributions, so that the centered variables satisfy the boundedness condition $-1 \le A_{i,j} - E[A_{i,j}] \le 1$ for $j \ne i$, $i = 1, 2, \dots, n$. Then, we have
(i) Hoeffding’s lemma: $E\big[\exp\big(\eta \sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\big)\big] \le \exp\big(\frac{(n-1)\eta^2}{2}\big)$ for $\eta > 0$;
(ii) Hoeffding’s inequality: $P\big(\big|\sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\big| \ge t\big) \le 2 \exp\big(-\frac{t^2}{2(n-1)}\big)$, $t > 0$.
Furthermore, let $t = \frac{(n-1)\lambda}{3}$; then, for any $\varepsilon > 0$, we have
$$\max_{1 \le i \le n} |d_i - E[d_i]| \le \sqrt{2(n-1) \log(2n/\varepsilon)} \tag{15}$$
with a probability of at least $1 - \varepsilon$.
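As a quick Monte Carlo sanity check of bound (15) (our own sketch with an arbitrary common edge probability; not an experiment from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, n_rep, p = 200, 0.05, 2000, 0.3
bound = np.sqrt(2 * (n - 1) * np.log(2 * n / eps))    # right-hand side of (15)
hits = 0
for _ in range(n_rep):
    U = rng.random((n, n)) < p
    A = np.triu(U, 1).astype(int); A = A + A.T        # undirected, no self-loops
    d = A.sum(axis=1)
    hits += np.abs(d - (n - 1) * p).max() <= bound    # E[d_i] = (n-1)p here
print(hits / n_rep)                                   # should be >= 1 - eps = 0.95
```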
Since the degree sequence $d$ in the β model is a sum of Bernoulli random variables, which are bounded, $d$ is sub-Gaussian, and the following theorem can be drawn immediately.
Theorem 2.
Let $\hat\beta$ be a solution of (7) for the β model with $\lambda = 3\sqrt{\frac{2 \log(2n/\varepsilon)}{n-1}}$. When $n \ge 1 + \frac{K \mu^{*4} \|\beta^*\|_1^2 \log(2n/\varepsilon)}{\|\hat\beta - \beta^*\|_{H^*}^4}$, where K is a constant, we have
$$\|\hat\beta - \beta^*\|_{H^*}^2 \le \frac{288\, m \log(2n/\varepsilon)}{\gamma_{\min}^{*2} (n-1)} \tag{16}$$
and
$$\|\hat\beta - \beta^*\|_1 \le \frac{36\sqrt2\, m}{\gamma_{\min}^{*2}} \sqrt{\frac{\log(2n/\varepsilon)}{n-1}} \tag{17}$$
with a probability of at least $1 - \varepsilon$.
Proof of Theorem 2. 
Let $\lambda = 3\sqrt{\frac{2 \log(2n/\varepsilon)}{n-1}}$; then, λ satisfies the conditions of Lemma 2. Using (13) in Lemma 2, we have
$$\frac13 \|\hat\beta - \beta^*\|_{H^*}^2 \le \frac{16 m \lambda^2}{3 \gamma_{\min}^{*2}}.$$
By applying (15) in Proposition 1, for any $\varepsilon > 0$, with a probability of at least $1 - \varepsilon$,
$$\|\hat\beta - \beta^*\|_{H^*}^2 \le \frac{16 m \lambda^2}{\gamma_{\min}^{*2}} = \frac{16 m}{\gamma_{\min}^{*2}} \cdot \frac{18 \log(2n/\varepsilon)}{n-1} = \frac{288\, m \log(2n/\varepsilon)}{\gamma_{\min}^{*2} (n-1)},$$
which proves (16). For the second claim of Theorem 2, according to (14), we can obtain
$$\|\hat\beta - \beta^*\|_1 \le \frac{12 m \lambda}{\gamma_{\min}^{*2}} = \frac{12 m}{\gamma_{\min}^{*2}} \cdot 3\sqrt{\frac{2 \log(2n/\varepsilon)}{n-1}} = \frac{36\sqrt2\, m}{\gamma_{\min}^{*2}} \sqrt{\frac{\log(2n/\varepsilon)}{n-1}}$$
with a probability of at least $1 - \varepsilon$. So far, we have completed the proof of Theorem 2. □
Remark 2.
We observe that Lemma 2 provides a general result, and Theorem 2 is the specific outcome of Lemma 2 under the chosen λ; it is a concrete instance of Lemma 2 in a specific scenario and can be proven directly through Lemma 2. The main theorem established in this paper, Theorem 2, is tighter than existing bounds.
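To see how the rates in Theorem 2 scale with n, the bounds can be evaluated numerically (a sketch; the values of m, $\gamma_{\min}^*$, and ε below are illustrative placeholders, not quantities estimated in the paper):

```python
import numpy as np

def theorem2_bounds(n, m, gamma_min, eps=0.05):
    """lambda from Theorem 2 and the bounds (16) and (17)."""
    lam = 3 * np.sqrt(2 * np.log(2 * n / eps) / (n - 1))
    fisher = 288 * m * np.log(2 * n / eps) / (gamma_min**2 * (n - 1))
    l1 = 36 * np.sqrt(2) * m / gamma_min**2 * np.sqrt(np.log(2 * n / eps) / (n - 1))
    return lam, fisher, l1

for n in (100, 200, 300, 400):
    print(n, theorem2_bounds(n, m=2, gamma_min=1.0))  # both bounds shrink as n grows
```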

4. Simulation Study

In this section, we conduct experiments to evaluate the consistency performance on networks of finite size. For an undirected graph with nodes $1, 2, \dots, n$, we generate the β model as follows. Given the n-dimensional parameter $\beta = (\beta_1, \dots, \beta_n)^\top$, the elements of the adjacency matrix A of the undirected graph follow Bernoulli distributions with success probabilities
$$p_{i,j} = P(A_{i,j} = 1) = \frac{\exp(\beta_i + \beta_j)}{1 + \exp(\beta_i + \beta_j)}, \quad 1 \le i < j \le n.$$
In this simulation, the true values of parameter β are set to the following three cases (see [19]):
  • Case 1: $\beta_i^* = 1.5$ for $i \in M(\beta^*)$ and $\beta_i^* = 0$ for $i \in M(\beta^*)^C$, $i = 1, 2, \dots, n$;
  • Case 2: $\beta_i^* = \sqrt{\log n}$ for $i \in M(\beta^*)$ and $\beta_i^* = 0$ for $i \in M(\beta^*)^C$, $i = 1, 2, \dots, n$;
  • Case 3: $\beta_i^* = \log n$ for $i \in M(\beta^*)$ and $\beta_i^* = 0$ for $i \in M(\beta^*)^C$, $i = 1, 2, \dots, n$.
Furthermore, we consider three scenarios for the support of $\beta^*$, corresponding to sparsity levels of $\beta^*$ from sparse to dense: $|M^*| = |M(\beta^*)| = 2, \lfloor\sqrt n\rfloor, \lfloor 2\sqrt n\rfloor$, where $\lfloor a \rfloor$ denotes the largest integer smaller than a.
We generate the undirected network A based on the above settings. Then, we implement the proposed sparse β model with $\ell_1$ penalty to obtain the estimated parameter $\hat\beta$. The method is applied directly to the convex objective function (7) defined in Section 2. During the optimization procedure, the tuning parameter λ is set to $3\sqrt{2\log(2n/\varepsilon)/(n-1)}$, with ε set to 0.05. To compute the support-constrained MLEs, we use the proximal gradient descent algorithm, where the time-invariant step size is set to $\eta_t \equiv \eta = 2/(n-1)$.
We carry out simulations under four different network sizes: n = 100, 200, 300, and 400. Each simulation is repeated $n_{sim} = 1000$ times, and $\hat\beta^{(i)}$ denotes the estimate of $\beta^*$ from the i-th replication, $i = 1, 2, \dots, 1000$. Two evaluation criteria are used in this paper, the $\ell_2$ error and the $\ell_1$ error, defined as follows:
$$\ell_2\ \text{error} = \frac{1}{n_{sim}} \sum_{i=1}^{n_{sim}} \|\hat\beta^{(i)} - \beta^*\|_2^2, \qquad \ell_1\ \text{error} = \frac{1}{n_{sim}} \sum_{i=1}^{n_{sim}} \|\hat\beta^{(i)} - \beta^*\|_1.$$
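One replication cell of this design can be sketched as follows (our own illustration, reusing the hypothetical sample_beta_model and prox_gradient sketches from Section 2; n_sim is reduced from 1000 for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, n_sim = 100, 0.05, 100                   # n_sim = 1000 in the paper
beta_star = np.zeros(n); beta_star[:2] = 1.5     # Case 1 with support size 2
lam = 3 * np.sqrt(2 * np.log(2 * n / eps) / (n - 1))
eta = 2 / (n - 1)                                # time-invariant step size
l2_err = l1_err = 0.0
for _ in range(n_sim):
    A = sample_beta_model(beta_star, rng)
    beta_hat = prox_gradient(A, lam, eta)
    l2_err += np.sum((beta_hat - beta_star) ** 2) / n_sim
    l1_err += np.abs(beta_hat - beta_star).sum() / n_sim
print(l2_err, l1_err)                            # the l2 and l1 errors
```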
The simulation results are shown in Figure 1, Figure 2 and Figure 3 for Cases 1–3, respectively. According to Figure 1, both the $\ell_2$ error and the $\ell_1$ error decrease as the network size n increases, meaning the estimation accuracy of $\hat\beta$ generally improves as n grows. On the other hand, both errors increase as $|M(\beta^*)|$ increases, corresponding to the sparsity level varying from sparse to dense; that is, the estimation accuracy of $\hat\beta$ is better in sparser cases. Figure 2 and Figure 3 support the same conclusions as Figure 1.
Comparing Figure 1, Figure 2 and Figure 3 vertically, it is observable that as the signal strength of $\beta^*$ increases, the estimation errors also increase, indicating an inverse relationship between estimation accuracy and signal strength. These simulation results demonstrate the efficiency of the estimation procedure for the sparse β model with $\ell_1$ penalty proposed in this paper.
Finally, we illustrate the model on a real undirected graph: the Enron email dataset [26], available at https://www.cs.cmu.edu/~enron/ (accessed on 7 May 2015). This dataset was originally acquired and made public by the Federal Energy Regulatory Commission during its investigation into fraudulent accounting practices. Some of the emails were later deleted upon requests from affected employees, so the raw data are messy and need to be cleaned before any analysis is conducted. Ref. [27] applied data-cleaning strategies to compile the Enron email dataset, and we use their cleaned data for the subsequent analysis. We treat the data as a simple, undirected graph, where each edge denotes at least one message between the corresponding two nodes, and we exclude messages with more than ten recipients (a subjectively chosen cutoff). The dataset contains 156 nodes with 1634 undirected edges, as shown in Figure 4. The degree quantiles at levels 0, 1/4, 1/2, 3/4, and 1 are 0, 13, 19, 27, and 55, respectively. The sparse β model can capture node heterogeneity: for example, the 52nd node has degree 1, while the 156th node has degree 55. It is therefore natural to associate the important nodes with individual parameters while leaving the less important nodes as background nodes without associated parameters.
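The degree summary reported above can be reproduced from any cleaned edge list with a few lines (a sketch; the file name enron_edges.txt and its one-edge-per-line format are hypothetical):

```python
import numpy as np

edges = np.loadtxt("enron_edges.txt", dtype=int)  # hypothetical "i j" pairs, 1-indexed
n = edges.max()
A = np.zeros((n, n), dtype=int)
A[edges[:, 0] - 1, edges[:, 1] - 1] = 1
A = np.maximum(A, A.T)                            # treat as a simple undirected graph
d = A.sum(axis=1)                                 # nodal degrees
print(np.quantile(d, [0, 0.25, 0.5, 0.75, 1]))    # paper reports 0, 13, 19, 27, 55
```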

5. Conclusions

In this study, we investigated a sparse β model with an $\ell_1$ penalty in the dynamic field of network data models. The degree parameter was estimated through an $\ell_1$ regularization problem, which can be easily solved by the proximal gradient descent method due to the convexity of the loss function. We established an optimal bound, corroborating the consistency of our proposed estimator, with empirical validations underscored by comprehensive simulation studies. One pivotal avenue for future research is the identification and integration of alternative penalties that are more adaptive and efficient, facilitating enhanced model performance and a broader applicative scope. Furthermore, more effective optimization algorithms focusing on enhanced computational efficiency and adaptability to diverse network scenarios should be investigated in future studies. The sparse β model proposed in this study serves as both a theoretical foundation and an algorithmic reference for future explorations into the sparse structures of other network models, such as network models with covariates [28], functional covariates [29], and partially functional covariates [30], for which we aim to establish non-asymptotic optimal prediction error bounds. This contribution is instrumental in propelling the evolution of network models, offering insights and tools that can be adapted and optimized for diverse network structures and scenarios.

Author Contributions

Conceptualization, X.Y. and C.L.; methodology, X.Y. and L.P.; software, L.P. and K.C.; validation, K.C.; writing—original draft preparation, X.Y.; writing—review and editing, X.Y. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen University Research Start-up Fund for Young Teachers No. 868-000001032037.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The Appendix provides the detailed proofs for the theoretical results reported in Section 3.
Proof of Lemma 1. 
First, we notice that $Pl(\beta^* + r\delta) - Pl(\beta^*)$ is the cumulant-generating function of the centered sufficient statistic $S_\delta = \delta^\top(d - E[d])$; thus, $Pl(\beta^* + r\delta) - Pl(\beta^*) =: g(r) = \log(E[e^{r S_\delta}])$. By Taylor’s formula, $e^{r S_\delta} = \sum_{i=0}^\infty \frac{(r S_\delta)^i}{i!}$. For (10), given $E[S_\delta] = 0$ and $E[S_\delta^i] = a_{i,\beta^*}(\delta)$, we have
$$Pl(\beta^* + r\delta) - Pl(\beta^*) = \log E[e^{r S_\delta}] = \log E\Big[\sum_{i=0}^\infty \frac{(r S_\delta)^i}{i!}\Big] = \log\Big(1 + 0 + \sum_{i=2}^\infty \frac{r^i E[S_\delta^i]}{i!}\Big) = \log\Big(1 + \sum_{i=2}^\infty \frac{1}{i!} a_{i,\beta^*}(\delta) r^i\Big),$$
which proves (10). Using the definition of the cumulant $b_{i,\beta^*}(\delta)$, we know that $b_{i,\beta^*}(\delta) = g^{(i)}(0)$. In addition, we notice that $g(0) = Pl(\beta^* + 0 \cdot \delta) - Pl(\beta^*) = 0$ and $g'(0) = \frac{d}{dr} \log E[e^{r S_\delta}] \big|_{r=0} = E[S_\delta] = 0$. For (11), performing a Taylor expansion of the function $g(r)$ at $r = 0$, we can obtain
$$Pl(\beta^* + r\delta) - Pl(\beta^*) = g(r) = \sum_{i=0}^\infty \frac{g^{(i)}(0)}{i!} r^i = g(0) + g'(0) r + \sum_{i=2}^\infty \frac{g^{(i)}(0)}{i!} r^i = \sum_{i=2}^\infty \frac{b_{i,\beta^*}(\delta)}{i!} r^i.$$
This completes the proof. □
For convenience of expression, we write $a_k(\delta)$ for the k-th-order central moment of the random variable $\delta^\top d$ distributed under $\beta^*$ (i.e., $a_k(\delta) = a_{k,\beta^*}(\delta)$). Before presenting the proofs of the main results, we provide some lemmas that will be useful in the following sections.
Lemma A1.
Assume that μ and $\tilde\beta$ satisfy the conditions of Theorem 1. Let $r = \min\{1, \frac{1}{6\mu}\}$ and $\delta = \tilde\beta - \beta^*$. If μ is an analytic moment, then we have
$$\frac{2 r^2 a_2(\delta)}{5} \le \sum_{i=2}^\infty \frac{a_i(\delta) r^i}{i!} \le \frac{3 r^2 a_2(\delta)}{5}. \tag{A1}$$
If μ is an analytic cumulant, then we obtain
$$\frac{2 r^2 b_2(\delta)}{5} \le \sum_{i=2}^\infty \frac{b_i(\delta) r^i}{i!} \le \frac{3 r^2 b_2(\delta)}{5}. \tag{A2}$$
Proof. 
We first prove the lower bound in (A1). According to the conditions of Theorem 1 and Lemma A1, δ has an analytic moment μ; then, we have
$$|a_i(\delta)| = |E[S_\delta^i]| \le \frac12 E[S_\delta^2] \mu^{i-2} i! = \frac12 a_2(\delta) \mu^{i-2} i!, \quad i \ge 3. \tag{A3}$$
Then,
$$\Big|\sum_{i=3}^\infty \frac{1}{i!} a_i(\delta) r^i\Big| \le \sum_{i=3}^\infty \frac{1}{i!} \cdot \frac12 a_2(\delta) \mu^{i-2} i!\, r^i = \frac12 \sum_{i=3}^\infty \mu^{i-2} a_2(\delta) r^i = \frac{r^2 a_2(\delta)}{2} \sum_{i=1}^\infty (r\mu)^i.$$
According to the definition of r, $r \le \frac{1}{6\mu}$, and we can obtain
$$\sum_{i=1}^\infty (r\mu)^i \le \sum_{i=1}^\infty \Big(\frac16\Big)^i = \frac15.$$
Thus,
$$\sum_{i=2}^\infty \frac{a_i(\delta) r^i}{i!} = \frac{r^2 a_2(\delta)}{2} + \sum_{i=3}^\infty \frac{a_i(\delta) r^i}{i!} \ge \frac{r^2 a_2(\delta)}{2} - \Big|\sum_{i=3}^\infty \frac{a_i(\delta) r^i}{i!}\Big| \ge \frac{r^2 a_2(\delta)}{2} - \frac{r^2 a_2(\delta)}{2} \sum_{i=1}^\infty (r\mu)^i = \Big(1 - \sum_{i=1}^\infty (r\mu)^i\Big) \frac{r^2 a_2(\delta)}{2} \ge \frac{2 r^2 a_2(\delta)}{5}.$$
For the upper bound, we can obtain
$$\sum_{i=2}^\infty \frac{a_i(\delta) r^i}{i!} = \frac{a_2(\delta) r^2}{2} + \sum_{i=3}^\infty \frac{a_i(\delta) r^i}{i!} \le \frac{a_2(\delta) r^2}{2} + \Big|\sum_{i=3}^\infty \frac{a_i(\delta) r^i}{i!}\Big| \le \frac{r^2 a_2(\delta)}{2} + \frac{r^2 a_2(\delta)}{2} \sum_{i=1}^\infty (r\mu)^i = \Big(1 + \sum_{i=1}^\infty (r\mu)^i\Big) \frac{r^2 a_2(\delta)}{2} \le \frac{3 r^2 a_2(\delta)}{5}.$$
This proves (A1). For the case of an analytic cumulant, the proof of (A2) is similar, so we do not repeat it here. This completes the proof. □
Lemma A2.
Assume that μ and $\tilde\beta$ are defined as in Theorem 1. Then, we have
$$\|\tilde\beta - \beta^*\|_{H^*}^2 \le 3 \max\{1, 36\mu^2\} \big(Pl(\tilde\beta) - Pl(\beta^*)\big). \tag{A5}$$
Furthermore, if $\mu^2 \le \frac{1}{36}$, then
$$\frac13 \|\tilde\beta - \beta^*\|_{H^*}^2 \le Pl(\tilde\beta) - Pl(\beta^*) \le \frac35 \|\tilde\beta - \beta^*\|_{H^*}^2. \tag{A6}$$
Proof. 
We only consider the case of an analytic standardized moment (the cumulant case is similar). For $r \in [0, 1]$, the convexity of $Pl(\beta)$ gives $Pl(\tilde\beta) - Pl(\beta^*) \ge Pl(\beta^* + r\delta) - Pl(\beta^*)$. According to (10) in Lemma 1, we have
$$Pl(\tilde\beta) - Pl(\beta^*) \ge Pl(\beta^* + r\delta) - Pl(\beta^*) = \log\Big(1 + \sum_{i=2}^\infty \frac{1}{i!} a_i(\delta) r^i\Big) \ge \log\Big(1 + \frac{2 r^2 a_2(\delta)}{5}\Big) \quad (\text{by Lemma A1}).$$
Using Jensen’s inequality, the kurtosis satisfies $\frac{a_4(\delta)}{(a_2(\delta))^2} \ge \big(\frac{a_3(\delta)}{(a_2(\delta))^{3/2}}\big)^2 + 1 \ge 1$, while (A3) gives $a_4(\delta) \le \frac{4!}{2} \mu^2 a_2(\delta)$; thus, $a_2(\delta) \le 12\mu^2$. Note that $r^2 = \frac{1}{\max\{1, 36\mu^2\}} \le \frac{1}{36\mu^2}$, so
$$\frac{2 r^2 a_2(\delta)}{5} \le \frac25 \cdot \frac{a_2(\delta)}{36\mu^2} \le \frac{2}{15} < \frac16.$$
For the function $\log(1+t)$ on the interval $[0, 1/6]$, we can easily obtain $\log(1+t) \ge \frac56 t$. According to the definition of the standardized moment, we notice that $a_2(\delta) = \|\tilde\beta - \beta^*\|_{H^*}^2$. Therefore, taking $t = \frac{2 r^2 a_2(\delta)}{5}$,
$$Pl(\tilde\beta) - Pl(\beta^*) \ge \log\Big(1 + \frac{2 r^2 a_2(\delta)}{5}\Big) \ge \frac56 \cdot \frac{2 r^2 a_2(\delta)}{5} = \frac{\|\tilde\beta - \beta^*\|_{H^*}^2}{3 \max\{1, 36\mu^2\}}, \tag{A8}$$
and (A5) is proven. For (A6), considering that $\mu^2 \le \frac{1}{36}$, $\max\{1, 36\mu^2\} = 1$. Thus, according to (A8), we have
$$\frac{\|\tilde\beta - \beta^*\|_{H^*}^2}{3} = \frac{a_2(\delta)}{3 \max\{1, 36\mu^2\}} \le Pl(\tilde\beta) - Pl(\beta^*).$$
Since $36\mu^2 \le 1$, $r = \min\{1, \frac{1}{6\mu}\} = 1$. Applying Lemma A1 with $r = 1$, we obtain
$$\sum_{i=2}^\infty \frac{a_i(\delta)}{i!} \le \frac35 a_2(\delta).$$
In fact, given $r = 1$ and $\delta = \tilde\beta - \beta^*$, $Pl(\tilde\beta) - Pl(\beta^*) = Pl(\beta^* + r\delta) - Pl(\beta^*)$. Now, for $t > 0$, we have $\log(1+t) \le t$. Hence, taking $t = \sum_{i=2}^\infty \frac{1}{i!} a_i(\delta)$ and using Lemma 1, we obtain
$$Pl(\tilde\beta) - Pl(\beta^*) = Pl(\beta^* + r\delta) - Pl(\beta^*) = \log\Big(1 + \sum_{i=2}^\infty \frac{1}{i!} a_i(\delta)\Big) \le \sum_{i=2}^\infty \frac{1}{i!} a_i(\delta) \le \frac35 a_2(\delta) = \frac35 \|\tilde\beta - \beta^*\|_{H^*}^2.$$
This completes the proof of Lemma A2. □
Lemma A3.
Let $\max_{1 \le i \le n} |d_i - E[d_i]| \le \frac{(n-1)\lambda}{3}$, and suppose that $\hat\beta$ is a solution of (7); then, for any $\beta \in \mathbb{R}^n$, we have
$$Pl(\hat\beta) - Pl(\beta) \le \frac\lambda3 \|\hat\beta - \beta\|_1 + \lambda \|\beta\|_1 - \lambda \|\hat\beta\|_1 \le \frac43 \lambda \|\beta\|_1. \tag{A11}$$
Furthermore, if β only has support on M, then we have
$$Pl(\hat\beta) - Pl(\beta) \le \frac43 \lambda \|\hat\beta_M - \beta\|_1. \tag{A12}$$
Proof. 
First, notice that $\hat\beta$ is a solution of (7); then,
$$-\frac{1}{n(n-1)} \sum_{i=1}^n \hat\beta_i d_i + \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} \log(1 + \exp(\hat\beta_i + \hat\beta_j)) + \lambda \sum_{i=1}^n |\hat\beta_i| \le -\frac{1}{n(n-1)} \sum_{i=1}^n \beta_i d_i + \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} \log(1 + \exp(\beta_i + \beta_j)) + \lambda \sum_{i=1}^n |\beta_i|.$$
Adding $\frac{1}{n(n-1)} \sum_{i=1}^n (\hat\beta_i - \beta_i)(d_i - E[d_i])$ to both sides and rearranging so that the expectations $E[d_i]$ replace the observed degrees $d_i$, we obtain, by the definition of $Pl(\beta)$,
$$Pl(\hat\beta) - Pl(\beta) \le \frac{1}{n(n-1)} \sum_{i=1}^n (\hat\beta_i - \beta_i)(d_i - E[d_i]) + \lambda \sum_{i=1}^n |\beta_i| - \lambda \sum_{i=1}^n |\hat\beta_i| \le \sum_{i=1}^n |\hat\beta_i - \beta_i| \cdot \frac{1}{n(n-1)} \max_{1 \le i \le n} |d_i - E[d_i]| + \lambda \sum_{i=1}^n |\beta_i| - \lambda \sum_{i=1}^n |\hat\beta_i| \le \frac\lambda3 \|\hat\beta - \beta\|_1 + \lambda \|\beta\|_1 - \lambda \|\hat\beta\|_1 \le \frac\lambda3 \sum_{i=1}^n (|\hat\beta_i| + |\beta_i|) + \lambda \sum_{i=1}^n |\beta_i| - \lambda \sum_{i=1}^n |\hat\beta_i| = \frac{4\lambda}{3} \sum_{i=1}^n |\beta_i| - \frac{2\lambda}{3} \sum_{i=1}^n |\hat\beta_i| \le \frac{4\lambda}{3} \|\beta\|_1, \tag{A13}$$
which proves (A11). For (A12), according to the sparsity of β (i.e., $\beta_i$ may be non-zero only if $i \in M$),
$$\frac\lambda3 \|\hat\beta - \beta\|_1 + \lambda \|\beta\|_1 - \lambda \|\hat\beta\|_1 = \frac\lambda3 \|\hat\beta_M - \beta\|_1 + \frac\lambda3 \|\hat\beta_{M^C}\|_1 + \lambda \big(\|\beta\|_1 - \|\hat\beta_M\|_1\big) - \lambda \|\hat\beta_{M^C}\|_1 \le \frac\lambda3 \|\hat\beta_M - \beta\|_1 + \lambda \big(\|\beta\|_1 - \|\hat\beta_M\|_1\big) \le \frac\lambda3 \|\hat\beta_M - \beta\|_1 + \lambda \|\hat\beta_M - \beta\|_1 = \frac{4\lambda}{3} \|\hat\beta_M - \beta\|_1.$$
This completes the proof. □
Lemma A4.
For any $\beta \in \mathbb{R}^n$ that has support only on M and satisfies $Pl(\beta) \le Pl(\hat\beta)$, assume that (12) holds. If $\hat\beta$ is a solution of (7), then we have
$$\|\hat\beta_{M^C}\|_1 \le 2 \|\hat\beta_M - \beta\|_1, \tag{A16}$$
$$\|\hat\beta - \beta\|_1 \le 3 \|\hat\beta_M - \beta\|_1. \tag{A17}$$
Proof. 
We use the assumptions of Lemma A4 and (A11); then,
$$0 \le Pl(\hat\beta) - Pl(\beta) \le \frac\lambda3 \|\hat\beta - \beta\|_1 + \lambda \|\beta\|_1 - \lambda \|\hat\beta\|_1.$$
Thus,
$$0 \le \frac13 \|\hat\beta - \beta\|_1 + \|\beta\|_1 - \|\hat\beta\|_1.$$
Adding $\frac23 \|\hat\beta - \beta\|_1$ to both sides, we can obtain
$$\frac23 \|\hat\beta - \beta\|_1 \le \|\hat\beta - \beta\|_1 + \|\beta\|_1 - \|\hat\beta\|_1.$$
In fact, $|\hat\beta_i - \beta_i| + |\beta_i| - |\hat\beta_i| = 0$ for $i \notin M$ and $|\beta_i| - |\hat\beta_i| \le |\hat\beta_i - \beta_i|$ for $i \in M$. Then,
$$\frac23 \|\hat\beta - \beta\|_1 \le \big(\|\hat\beta_{M^C} - \beta_{M^C}\|_1 + \|\beta_{M^C}\|_1 - \|\hat\beta_{M^C}\|_1\big) + \big(\|\hat\beta_M - \beta\|_1 + \|\beta\|_1 - \|\hat\beta_M\|_1\big) = \|\hat\beta_M - \beta\|_1 + \|\beta\|_1 - \|\hat\beta_M\|_1 \le \|\hat\beta_M - \beta\|_1 + \|\hat\beta_M - \beta\|_1 = 2 \|\hat\beta_M - \beta\|_1.$$
Therefore, $\|\hat\beta - \beta\|_1 \le 3 \|\hat\beta_M - \beta\|_1$. Note that
$$\frac23 \|\hat\beta_{M^C}\|_1 + \frac23 \|\hat\beta_M - \beta\|_1 = \frac23 \|\hat\beta_{M^C} - \beta_{M^C}\|_1 + \frac23 \|\hat\beta_M - \beta\|_1 = \frac23 \|\hat\beta - \beta\|_1 \le 2 \|\hat\beta_M - \beta\|_1.$$
Hence, $\|\hat\beta_{M^C}\|_1 \le 2 \|\hat\beta_M - \beta\|_1$. This completes the proof. □
Proof of Lemma 2. 
Note that, according to (7), (A11), and (12), $Pl(\hat\beta) - Pl(\beta^*) \le \frac{4\lambda}{3} \|\beta^*\|_1 \le \frac43 \|\beta^*\|_1 \cdot \frac{\|\hat\beta - \beta^*\|_{H^*}^2}{144 \mu^{*2} \|\beta^*\|_1} = \frac{\|\hat\beta - \beta^*\|_{H^*}^2}{108 \mu^{*2}}$. According to Theorem 1,
$$0 \le \frac13 \|\hat\beta - \beta^*\|_{H^*}^2 \le Pl(\hat\beta) - Pl(\beta^*). \tag{A19}$$
In addition, using (A16) in Lemma A4, we know that $\hat\beta$ satisfies the RE cone condition, so $(\hat\beta - \beta^*) \in W$; i.e., Condition (A) holds. According to the Cauchy–Schwarz inequality and Condition (A),
$$\|\hat\beta_M - \beta^*\|_1 \le \sqrt m \|\hat\beta_M - \beta^*\|_2 \le \frac{\sqrt m}{\gamma_{\min}^*} \|\hat\beta - \beta^*\|_{H^*}. \tag{A18}$$
According to (A12) in Lemma A3, we see that
$$\frac13 \|\hat\beta - \beta^*\|_{H^*}^2 \le Pl(\hat\beta) - Pl(\beta^*) \le \frac{4\lambda}{3} \|\hat\beta_M - \beta^*\|_1 \le \frac{4\lambda\sqrt m}{3 \gamma_{\min}^*} \|\hat\beta - \beta^*\|_{H^*}.$$
Thus, we can obtain
$$\|\hat\beta - \beta^*\|_{H^*} \le \frac{4\lambda\sqrt m}{\gamma_{\min}^*}. \tag{A20}$$
Combining (A19) and (A20), we have
$$\frac13 \|\hat\beta - \beta^*\|_{H^*}^2 \le Pl(\hat\beta) - Pl(\beta^*) \le \frac{4\lambda\sqrt m}{3 \gamma_{\min}^*} \cdot \frac{4\lambda\sqrt m}{\gamma_{\min}^*} = \frac{16 m \lambda^2}{3 \gamma_{\min}^{*2}},$$
which proves (13). On the other hand, by applying (A20) to (A18), we can obtain
$$\|\hat\beta_M - \beta^*\|_1 \le \frac{\sqrt m}{\gamma_{\min}^*} \cdot \frac{4\lambda\sqrt m}{\gamma_{\min}^*} = \frac{4 m \lambda}{\gamma_{\min}^{*2}}. \tag{A22}$$
Therefore, using (A17) and (A22), we have
$$\|\hat\beta - \beta^*\|_1 \le 3 \|\hat\beta_M - \beta^*\|_1 \le \frac{12 m \lambda}{\gamma_{\min}^{*2}}.$$
This completes the proof of Lemma 2. □
Proof of Proposition 1. 
For $\eta > 0$, since $h(y) = e^{\eta y}$ is convex,
$$e^{\eta y} \le \frac{1-y}{2} e^{-\eta} + \frac{1+y}{2} e^{\eta}, \quad -1 \le y \le 1.$$
Then, letting $Y_j = A_{i,j} - E[A_{i,j}]$ and using $E[Y_j] = 0$,
$$E[e^{\eta Y_j}] \le \frac{1 - E[Y_j]}{2} e^{-\eta} + \frac{1 + E[Y_j]}{2} e^{\eta} = \frac{e^{-\eta} + e^{\eta}}{2} = \Big(\frac12 + \frac12 e^{2\eta}\Big) e^{-\eta} =: e^{f(t)}, \tag{A23}$$
where $f(t) = -\frac t2 + \log\big(\frac12 + \frac12 e^t\big)$, $t = 2\eta$. Thus,
$$f'(t) = -\frac12 + \frac{e^t}{1 + e^t}, \qquad f''(t) = \frac{e^t}{(1 + e^t)^2} = \frac{1}{1 + e^t}\Big(1 - \frac{1}{1 + e^t}\Big) \le \frac14$$
for all $t > 0$. We see that $f(0) = f'(0) = 0$. After using Taylor’s formula to expand $f(t)$ at $t = 0$, there exists $\xi_t$ such that
$$f(t) = f(0) + f'(0)(t - 0) + \frac12 f''(\xi_t)(t - 0)^2 = \frac12 f''(\xi_t) t^2 \le \frac18 t^2 = \frac{\eta^2}{2}.$$
By applying this bound on $f(t)$ to (A23) and using the independence of the $A_{i,j}$, we can obtain
$$E\Big[\exp\Big(\eta \sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\Big)\Big] = E\Big[\exp\Big(\eta \sum_{j \ne i} Y_j\Big)\Big] = \prod_{j \ne i} E[e^{\eta Y_j}] \le \exp\Big(\frac{(n-1)\eta^2}{2}\Big),$$
which proves Hoeffding’s lemma in Proposition 1. Next, we apply Hoeffding’s lemma to derive Hoeffding’s inequality for the sum of the bounded independent random variables $\{A_{i,j}\}_{j \ne i}$, with each $Y_j$ bounded within $[-1, 1]$ (see [25]). We see that $d_i = \sum_{j \ne i} A_{i,j}$, and using Chernoff’s inequality, for any $\eta, t > 0$,
$$P\Big(\sum_{j \ne i} (A_{i,j} - E[A_{i,j}]) \ge t\Big) = P(d_i - E[d_i] \ge t) = P\big(e^{\eta(d_i - E[d_i])} \ge e^{\eta t}\big) \le \inf_{\eta > 0} e^{-\eta t} E\big[e^{\eta(d_i - E[d_i])}\big] = \inf_{\eta > 0} e^{-\eta t} E\Big[e^{\eta \sum_{j \ne i} (A_{i,j} - E[A_{i,j}])}\Big] \le \inf_{\eta > 0} e^{-\eta t} e^{\frac{(n-1)\eta^2}{2}} = \inf_{\eta > 0} e^{-\eta t + \frac{(n-1)\eta^2}{2}} = e^{-\frac{t^2}{2(n-1)}},$$
where the second inequality uses Hoeffding’s lemma and the last equality holds because the quadratic in η attains its minimum at $\eta = \frac{t}{n-1}$. Similarly, we can also prove that
$$P\Big(\sum_{j \ne i} (A_{i,j} - E[A_{i,j}]) \le -t\Big) \le e^{-\frac{t^2}{2(n-1)}}.$$
Hence,
$$P\Big(\Big|\sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\Big| \ge t\Big) \le P\Big(\sum_{j \ne i} (A_{i,j} - E[A_{i,j}]) \ge t\Big) + P\Big(\sum_{j \ne i} (A_{i,j} - E[A_{i,j}]) \le -t\Big) \le 2 e^{-\frac{t^2}{2(n-1)}},$$
which proves Hoeffding’s inequality.
Let us consider the event $\Psi(\lambda)$ on which the KKT conditions evaluated at $\beta^*$ hold:
$$\Psi(\lambda) := \Big\{\Big|\frac{1}{n-1} \sum_{j \ne i} \Big(A_{i,j} - \frac{\exp(\beta_i^* + \beta_j^*)}{1 + \exp(\beta_i^* + \beta_j^*)}\Big)\Big| \le \lambda, \ i = 1, 2, \dots, n\Big\}.$$
Note that $E[A_{i,j}] = \frac{\exp(\beta_i^* + \beta_j^*)}{1 + \exp(\beta_i^* + \beta_j^*)}$. Then, using Hoeffding’s inequality in Proposition 1 with $t = \frac{(n-1)\lambda}{3}$, for any $\varepsilon > 0$, we have
$$P\Big(\Big|\frac{1}{n-1} \sum_{j \ne i} \Big(A_{i,j} - \frac{\exp(\beta_i^* + \beta_j^*)}{1 + \exp(\beta_i^* + \beta_j^*)}\Big)\Big| \ge \lambda\Big) = P\Big(\Big|\sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\Big| \ge (n-1)\lambda\Big) \le P\Big(\Big|\sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\Big| \ge \frac{(n-1)\lambda}{3}\Big) \le 2 \exp\Big(-\frac{(n-1)\lambda^2}{18}\Big) =: \frac\varepsilon n. \tag{A24}$$
Then, solving (A24) for λ,
$$\lambda = 3\sqrt{\frac{2 \log(2n/\varepsilon)}{n-1}}. \tag{A25}$$
Hence, $P(\Psi(\lambda)) \ge 1 - \varepsilon/n$.
On the other hand, by the union bound,
$$P\Big(\max_{1 \le i \le n} |d_i - E[d_i]| \ge \frac{(n-1)\lambda}{3}\Big) \le \sum_{i=1}^n P\Big(|d_i - E[d_i]| \ge \frac{(n-1)\lambda}{3}\Big) = \sum_{i=1}^n P\Big(\Big|\sum_{j \ne i} (A_{i,j} - E[A_{i,j}])\Big| \ge \frac{(n-1)\lambda}{3}\Big) \le \sum_{i=1}^n \frac\varepsilon n = \varepsilon \quad (\text{by (A24)}).$$
Therefore, applying the value of λ in (A25) (for which $\frac{(n-1)\lambda}{3} = \sqrt{2(n-1)\log(2n/\varepsilon)}$), we have
$$P\Big(\max_{1 \le i \le n} |d_i - E[d_i]| \le \sqrt{2(n-1)\log(2n/\varepsilon)}\Big) = 1 - P\Big(\max_{1 \le i \le n} |d_i - E[d_i]| > \sqrt{2(n-1)\log(2n/\varepsilon)}\Big) \ge 1 - \varepsilon.$$
This completes the proof of Proposition 1. □

References

  1. Fienberg, S.E. A brief history of statistical models for network analysis and open challenges. J. Comput. Graph. Stat. 2012, 21, 825–839. [Google Scholar] [CrossRef]
  2. Erdös, P.; Rényi, A. On random graphs I. Publ. Math. Debr. 1959, 6, 18. [Google Scholar] [CrossRef]
  3. Gilbert, E.N. Random graphs. Ann. Math. Stat. 1959, 30, 1141–1144. [Google Scholar] [CrossRef]
  4. Kolaczyk, E.D.; Krivitsky, P.N. On the question of effective sample size in network modeling: An asymptotic inquiry. Stat. Sci. Rev. J. Inst. Math. Stat. 2015, 30, 184. [Google Scholar]
  5. Newman, M. Networks; Oxford University Press: Oxford, UK, 2018. [Google Scholar]
  6. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137. [Google Scholar] [CrossRef]
  7. Chatterjee, S.; Diaconis, P.; Sly, A. Random graphs with a given degree sequence. Ann. Appl. Probab. 2011, 21, 1400–1435. [Google Scholar] [CrossRef]
  8. Mukherjee, R.; Mukherjee, S.; Sen, S. Detection thresholds for the β-model on sparse graphs. Ann. Stat. 2018, 46, 1288–1317. [Google Scholar] [CrossRef]
  9. Du, Y.; Qu, L.; Yan, T.; Zhang, Y. Time-varying β-model for dynamic directed networks. Scand. J. Stat. 2023. [Google Scholar] [CrossRef]
  10. Abbe, E. Community Detection and Stochastic Block Models: Recent Developments. J. Mach. Learn. Res. 2018, 18, 6446–6531. [Google Scholar]
  11. Rinaldo, A.; Petrović, S.; Fienberg, S.E. Maximum likelihood estimation in the β-model. Ann. Stat. 2013, 41, 1085–1110. [Google Scholar] [CrossRef]
  12. Pan, L.; Yan, T. Asymptotics in the β-model for networks with a differentially private degree sequence. Commun. Stat. Theory Methods 2020, 49, 4378–4393. [Google Scholar] [CrossRef]
  13. Luo, J.; Liu, T.; Wu, J.; Ahmed Ali, S.W. Asymptotic in undirected random graph models with a noisy degree sequence. Commun. Stat. Theory Methods 2022, 51, 789–810. [Google Scholar] [CrossRef]
  14. Fan, Y.; Zhang, H.; Yan, T. Asymptotic Theory for Differentially Private Generalized beta-models with Parameters Increasing. Stat. Interface 2020, 13, 385–398. [Google Scholar] [CrossRef]
  15. Chen, N.; Olvera-Cravioto, M. Directed random graphs with given degree distributions. Stoch. Syst. 2013, 3, 147–186. [Google Scholar] [CrossRef]
  16. Kim, H.; Del Genio, C.I.; Bassler, K.E.; Toroczkai, Z. Constructing and sampling directed graphs with given degree sequences. New J. Phys. 2012, 14, 023012. [Google Scholar] [CrossRef]
  17. Yan, T.; Leng, C.; Zhu, J. Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann. Stat. 2016, 44, 31–57. [Google Scholar] [CrossRef]
  18. Yan, T.; Xu, J. A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices. Biometrika 2013, 100, 519–524. [Google Scholar] [CrossRef]
  19. Chen, M.; Kato, K.; Leng, C. Analysis of networks via the sparse β-model. J. R. Stat. Soc. Ser. B Stat. Methodol. 2021, 83, 887–910. [Google Scholar] [CrossRef]
  20. Karwa, V.; Slavković, A. Inference Using Noisy Degrees: Differentially Private Beta-Model and Synthetic Graphs. Ann. Stat. 2016, 44, 87–112. [Google Scholar] [CrossRef]
  21. Meinshausen, N.; Yu, B. Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 2009, 37, 246–270. [Google Scholar] [CrossRef]
  22. Bickel, P.J.; Ritov, Y.; Tsybakov, A.B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 2009, 37, 1705–1732. [Google Scholar] [CrossRef]
  23. Huang, H.; Gao, Y.; Zhang, H.; Li, B. Weighted Lasso estimates for sparse logistic regression: Non-asymptotic properties with measurement errors. Acta Math. Sci. 2021, 41, 207–230. [Google Scholar] [CrossRef]
  24. Bühlmann, P.; Van De Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  25. Zhang, H.; Chen, S.X. Concentration inequalities for statistical inference. Commun. Math. Res. 2021, 37, 1–85. [Google Scholar]
  26. Cohen, W.W. Enron Email Dataset. Available online: https://www.cs.cmu.edu/~enron/ (accessed on 7 May 2015).
  27. Zhou, Y.; Goldberg, M.; Magdon-Ismail, M.; Wallace, A. Strategies for cleaning organizational emails with an application to enron email dataset. In Proceedings of the 5th Conference of North American Association for Computational Social and Organizational Science (No. 0621303), Atlanta, GA, USA, 7–9 June 2007. [Google Scholar]
  28. Stein, S.; Leng, C. A Sparse β-Model with Covariates for Networks. arXiv 2020, arXiv:2010.13604. [Google Scholar]
  29. Liu, C.; Zhang, H.; Jing, Y. Model-assisted estimators with auxiliary functional data. Commun. Math. Res. 2022, 38, 81–98. [Google Scholar] [CrossRef]
  30. Zhang, H.; Lei, X. Growing-dimensional Partially Functional Linear Models: Non-asymptotic Optimal Prediction Error. Phys. Scr. 2023, 98, 095216. [Google Scholar] [CrossRef]
Figure 1. Simulation results of the $\ell_2$ error and $\ell_1$ error for Case 1.
Figure 2. Simulation results of the $\ell_2$ error and $\ell_1$ error for Case 2.
Figure 3. Simulation results of the $\ell_2$ error and $\ell_1$ error for Case 3.
Figure 4. Enron email dataset. The vertex sizes are proportional to nodal degrees. The red circles indicate a nodal degree of more than 43. The blue circles indicate isolated nodes.