1. Introduction
High-dimensional covariance matrix estimation is an important topic in statistical inference and learning. The objective is to estimate the covariance matrix $\Sigma$ of $p$ variables from a sample of size $n$. Much statistical analysis of high-dimensional data involves covariance estimation. In a high-dimensional setting (usually $p \gg n$), researchers typically assume that the covariance matrix is sparse and have proposed various regularization techniques [1,2,3,4,5,6,7,8,9] to consistently estimate the covariance matrix $\Sigma$. However, this assumption is not appropriate in many applications, such as financial analysis driven by equity market risks, gene expressions stimulated by cytokines, etc. A more reasonable assumption, based on the factor model [10,11,12,13,14], is that the high-dimensional covariance matrix is the sum of a low-rank matrix and a sparse matrix. Specifically, a factor model can be formulated as
$$ y = B f + u, \qquad (1) $$
where $y$ is the observed variable, $B$ is the $p \times K$ matrix of factor loadings, $f$ is a $K \times 1$ vector of common factors, and $u$ is the idiosyncratic error component, uncorrelated with $f$. Under this model, the covariance matrix $\Sigma$ of $y$ is given by
$$ \Sigma = B \Sigma_f B^{\top} + \Sigma_u, \qquad (2) $$
where $\Sigma_f$ is the covariance matrix of $f$ and $\Sigma_u$ is the covariance matrix of $u$. Since $K$ is usually less than $p$, $B \Sigma_f B^{\top}$ is a low-rank matrix. More generally, there is a large class of problems in which one is interested in estimating the covariance matrix of a signal $x$ from the noisy observations $y = x + u$, where the noise $u$ is uncorrelated with the signal $x$. In addition, if $x$ is generated from a linear system, so that some of its elements are linear functions of the others, then the covariance matrix of $x$ has low rank. Based on the structural decomposition (2), this paper proposes a novel approach to estimate the covariance matrix by solving a constrained low-rank and sparse regularized optimization problem.
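To make the decomposition (2) concrete, the following sketch simulates data from a small factor model and forms both the population covariance and the sample covariance. The dimensions, the identity factor covariance, and the diagonal idiosyncratic covariance are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, n = 100, 3, 200                      # illustrative dimensions (not from the paper)

B = rng.normal(size=(p, K))                # factor loadings
Sigma_f = np.eye(K)                        # covariance of the common factors
Sigma_u = np.diag(rng.uniform(0.5, 1.5, size=p))  # sparse (here diagonal) idiosyncratic covariance

# population covariance: low-rank plus sparse, as in decomposition (2)
L_true = B @ Sigma_f @ B.T
Sigma = L_true + Sigma_u
print(np.linalg.matrix_rank(L_true))       # K = 3

# simulate observations y = B f + u and form the sample covariance
F = rng.normal(size=(n, K))
U = rng.normal(size=(n, p)) * np.sqrt(np.diag(Sigma_u))
Y = F @ B.T + U
Sigma_hat = np.cov(Y, rowvar=False)
```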
Under the structural decomposition (2), researchers have proposed several approaches to estimate the covariance matrix. When the common factors in Model (1) are known and observable and $\Sigma_u$ is diagonal, Reference [10] proposed a least-squares-based covariance estimation (LS-CE). The core idea of LS-CE is first to estimate $B$ by the least-squares method and then obtain the final estimator assuming that $\Sigma_u$ is diagonal. Compared to the sample covariance matrix estimator, the LS-CE estimator is always invertible, even in a high-dimensional setting. Moreover, this estimator is shown to have substantial gains when estimating the inverse of the covariance matrix; however, it does not have much advantage when estimating the covariance matrix itself. Instead of assuming a diagonal $\Sigma_u$, Reference [11] assumed that $\Sigma_u$ is sparse and applied the adaptive thresholding technique of [4] to obtain an estimator for $\Sigma_u$; a least-squares-based covariance estimator with conditional sparsity (LS-CS-CE) under Model (1) was then studied in [11]. More recently, a robust approach based on Huber loss minimization was proposed in [13]. This approach first estimates the joint covariance matrix of the observed data and the factors and then uses it to recover the covariance matrix of the observed data. Moreover, the estimator proposed in [13] only requires that the fourth moment of the data exists, so the method is applicable to a wider range of data, including sub-Gaussian and elliptical distributions. When the common factors are unknown and unobservable, Reference [12] proposed a non-parametric estimator via thresholding the principal orthogonal complements, called the principal orthogonal complement thresholding (POET) method. POET uses the first $K$ principal components to estimate $B \Sigma_f B^{\top}$ and applies the adaptive thresholding technique of [4] to the remaining $p - K$ components to estimate $\Sigma_u$. However, as stated in [9], using the simple thresholding approaches in [1,2,3,4,5,9] to estimate $\Sigma_u$ may lead to covariance estimators in [11,12,13] that are not necessarily positive semi-definite.
Moreover, as suggested in [14], an intuitive optimization-type approach is to solve the problem:
$$ \min_{L,\,S} \ \frac{1}{2}\big\|\hat{\Sigma}_n - L - S\big\|_F^2 + \lambda\,\operatorname{rank}(L) + \rho \sum_{(u,v)\in\mathcal{D}} \mathbb{1}(s_{uv} \neq 0), \qquad (3) $$
where $\hat{\Sigma}_n$ is the sample covariance matrix, $\operatorname{rank}(L)$ denotes the rank of matrix $L$, $\mathbb{1}(s_{uv} \neq 0) = 1$ if $s_{uv} \neq 0$ and $\mathbb{1}(s_{uv} \neq 0) = 0$ if $s_{uv} = 0$, $\mathcal{D}$ is the index set of non-diagonal elements of $S$, and $\lambda$ and $\rho$ are tuning parameters. To facilitate the calculation, Reference [14] proposed the low-rank and sparse covariance estimator (LOREC) via solving the following convex relaxation optimization problem:
$$ \min_{L,\,S} \ \frac{1}{2}\big\|\hat{\Sigma}_n - L - S\big\|_F^2 + \lambda\,\|L\|_{*} + \rho \sum_{(u,v)\in\mathcal{D}} |s_{uv}|, \qquad (4) $$
where $\|L\|_{*} = \sum_i \sigma_i(L)$ denotes the nuclear norm of $L$ and $\sigma_i(L)$ is the $i$th largest singular value of $L$. However, the estimators for $L$ and $S$ proposed in [14] are not guaranteed to be positive semi-definite. Clearly, the optimization problems (3) and (4) can be considered as special cases of the robust principal component analysis (RPCA) in [15] and its convex relaxation, respectively.
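As one concrete way to approach the convex relaxation (4), the sketch below runs proximal-gradient iterations that alternate singular value soft-thresholding for the nuclear-norm term with elementwise soft-thresholding of the off-diagonal entries for the ℓ1 term. This is only an illustrative first-order scheme, not the algorithm of [14]; the step size and tuning parameters are placeholder values.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_offdiag(M, tau):
    """Soft-threshold the off-diagonal entries; keep the diagonal unpenalized."""
    S = np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)
    np.fill_diagonal(S, np.diag(M))
    return S

def lowrank_sparse_split(Sigma_hat, lam=0.5, rho=0.05, step=0.5, iters=200):
    """Illustrative proximal-gradient iterations for
       min 0.5*||Sigma_hat - L - S||_F^2 + lam*||L||_* + rho*||S_offdiag||_1."""
    p = Sigma_hat.shape[0]
    L, S = np.zeros((p, p)), np.zeros((p, p))
    for _ in range(iters):
        R = L + S - Sigma_hat                  # gradient of the fidelity term
        L = svt(L - step * R, step * lam)
        S = soft_offdiag(S - step * R, step * rho)
    return L, S
```

Note that nothing in these iterations enforces positive semi-definiteness of $L$ or $S$, which is precisely the issue raised above.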
In this paper, we propose an optimization model similar to some models of RPCA. The RPCA model in [15] is formulated as
$$ \min_{L,\,S} \ \operatorname{rank}(L) + \lambda \|S\|_0 \quad \text{s.t.} \quad L + S = M, \qquad (5) $$
where $\|S\|_0$ denotes the number of the non-zero elements of $S$ and $M$ is the observed data matrix. Reference [15] showed that the principal component pursuit estimate, obtained by solving the convex relaxation of (5), exactly recovers the low-rank $L$ and the sparse $S$. Furthermore, Reference [16] suggested that $L$ and $S$ can be estimated by solving the following unconstrained optimization problem:
$$ \min_{L,\,S} \ \frac{1}{2}\|M - L - S\|_F^2 + \lambda \|L\|_{*} + \rho \|S\|_1. \qquad (6) $$
More references about the RPCA and related works based on convex and non-convex regularization can be found in [17,18,19,20,21].
The main contributions of this paper can be summarized as follows:
- We construct a constrained adaptive $\ell_q$-type regularized optimization problem for covariance estimation, which is a non-smooth and non-convex problem. This optimization problem is based on the structural decomposition (2) and the idea of RPCA in [16], but is not limited to the factor model (1).
- We study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. We establish the relationship between the first-order stationary point and the global minimizer of this problem.
- We propose a smoothing alternating updating method to solve the constructed non-smooth, non-convex optimization problem and establish its convergence. The simulation results show that the proposed smoothing alternating updating method is efficacious for the constrained adaptive $\ell_q$-type regularized optimization problem.
The rest of this paper is organized as follows. In Section 2, we give some notations and preliminaries. In Section 3, we construct a constrained adaptive $\ell_q$-type regularized optimization problem for covariance estimation. In Section 4, we study the first-order optimality condition of this problem and define a class of hybrid first-order stationary points. In Section 5, we propose a smoothing alternating updating method to solve the non-smooth, non-convex optimization problem and establish its convergence. The simulation results are given in Section 6. The conclusions and discussion are given in Section 7, while the mathematical proofs are given in Section 8.
2. Notations and Preliminaries
Throughout this paper, we use the following notations. The sets of positive semi-definite and positive definite symmetric matrices of order $p$ are denoted by $\mathcal{S}_{+}^{p}$ and $\mathcal{S}_{++}^{p}$, respectively. It is known that $\mathcal{S}_{+}^{p}$ and $\mathcal{S}_{++}^{p}$ are cones.
Given matrices $A = (a_{uv})$ and $B = (b_{uv})$ in $\mathbb{R}^{p \times p}$, where $a_{uv}$ is the $(u,v)$th element of $A$, $u, v = 1, \ldots, p$, $A \circ B = (a_{uv} b_{uv})$ denotes the Hadamard product for all $u, v$. Define the elementwise maximum norm (or max norm) $\|A\|_{\max} = \max_{u,v} |a_{uv}|$, the Frobenius norm $\|A\|_F = \big(\sum_{u,v} a_{uv}^2\big)^{1/2}$, and the spectral norm $\|A\|_2 = \max_{\|x\|_2 \le 1} \|A x\|_2$, where $\|x\|_2$ is the 2-norm of vector $x$. Moreover, given a matrix $A$ and an index set $\mathcal{I}$, $A_{\mathcal{I}}$ denotes the array of the same dimension such that $(A_{\mathcal{I}})_{uv} = a_{uv}$ for $(u,v) \in \mathcal{I}$ and $(A_{\mathcal{I}})_{uv}$ is undefined for $(u,v) \notin \mathcal{I}$. For convenience, we will write the $(u,v)$th element of $A_{\mathcal{I}}$ as $(A_{\mathcal{I}})_{uv}$ regardless of whether it is defined or not.
Given a vector $d \in \mathbb{R}^p$, $\operatorname{Diag}(d)$ denotes a square matrix with vector $d$ on its diagonal and zeros elsewhere. Given a matrix $A \in \mathcal{S}_{+}^{p}$ and letting $\lambda_i(A)$ denote the $i$th largest eigenvalue of $A$, $i = 1, \ldots, p$, write $\lambda(A) = (\lambda_1(A), \ldots, \lambda_p(A))^{\top}$. Define the sign operator $\operatorname{sign}(t) = 1$ if $t > 0$, $\operatorname{sign}(t) = -1$ if $t < 0$, and $\operatorname{sign}(0) = 0$, and set $\operatorname{Sign}(A) = (\operatorname{sign}(a_{uv}))$.
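For readers who prefer to check the notation numerically, the small snippet below evaluates the Hadamard product, the three matrix norms, and the Diag operator in NumPy on toy inputs (illustrative only).

```python
import numpy as np

A = np.array([[2.0, -1.0], [0.0, 3.0]])
B = np.array([[1.0,  4.0], [2.0, -2.0]])

hadamard = A * B                               # elementwise (Hadamard) product
max_norm = np.max(np.abs(A))                   # elementwise maximum norm
fro_norm = np.linalg.norm(A, 'fro')            # Frobenius norm
spec_norm = np.linalg.norm(A, 2)               # spectral norm (largest singular value)

d = np.array([1.0, 2.0, 3.0])
D = np.diag(d)                                 # Diag(d): d on the diagonal, zeros elsewhere
eigvals = np.sort(np.linalg.eigvalsh(D))[::-1] # eigenvalues in decreasing order
```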
Let $f: \mathbb{R}^{p \times p} \to (-\infty, +\infty]$ be a convex function, and let $X \in \operatorname{dom} f$. The subdifferential of $f$ at $X$ is defined by
$$ \partial f(X) = \big\{ V : f(Y) \ge f(X) + \langle V,\, Y - X \rangle \ \text{for all } Y \big\}, $$
where $\operatorname{dom} f$ denotes the domain of $f$, $\operatorname{dom} f = \{ X : f(X) < +\infty \}$. In other words, $\partial f(X)$ is the closed convex hull of all points of the form $\lim_{k \to \infty} \nabla f(X_k)$, where $\{X_k\}$ is any sequence that converges to $X$ while avoiding the set for which $f$ fails to be differentiable. Moreover, the singular subdifferential of $f$ at $X$ is defined by
$$ \partial^{\infty} f(X) = \big\{ V : (V, 0) \in N_{\operatorname{epi} f}\big(X, f(X)\big) \big\}, $$
where $\operatorname{epi} f$ denotes the epigraph of $f$ and $N_{\operatorname{epi} f}(X, f(X))$ is the normal cone to $\operatorname{epi} f$ at $(X, f(X))$.
Finally, we give the definition of the proximal mapping used in this paper. Given a closed set $\mathcal{C}$ and a function $f$, the proximal mapping on $\mathcal{C}$ of $f$ is the operator given by
$$ \operatorname{prox}^{\mathcal{C}}_{f}(X) = \operatorname*{arg\,min}_{Y \in \mathcal{C}} \Big\{ f(Y) + \tfrac{1}{2}\|Y - X\|_F^2 \Big\}. $$
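For instance, taking $f \equiv 0$ and the closed set to be the positive semi-definite cone, the proximal mapping reduces to the Euclidean projection onto that cone, which can be computed from an eigendecomposition. The sketch below is this standard construction, included only to illustrate the definition.

```python
import numpy as np

def proj_psd(X):
    """Projection of a symmetric matrix onto the PSD cone:
       the proximal mapping over S_+^p of f = 0 (clip negative eigenvalues at zero)."""
    Xs = (X + X.T) / 2.0                 # symmetrize first
    w, V = np.linalg.eigh(Xs)
    return (V * np.maximum(w, 0.0)) @ V.T
```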
3. Covariance Estimation with Adaptive $\ell_q$-Type Regularization
Considering that sparse covariance matrix estimation is intrinsically a heteroscedastic problem ([4]) and that both components in the structural decomposition (2) are positive semi-definite matrices, we estimate the covariance matrix by solving the following adaptive $\ell_q$-type regularized optimization problem:
$$ \min_{L,\,S} \ \frac{1}{2}\big\|\hat{\Sigma}_n - L - S\big\|_F^2 + \lambda \|L\|_{S_q}^q + \sum_{(u,v)\in\mathcal{D}} \rho_{uv}\,|s_{uv}|^q \quad \text{s.t.} \quad L \in \mathcal{S}_{+}^{p},\ S \in \mathcal{S}_{+}^{p}, \qquad (7) $$
where $q \in (0, 1)$, $\|L\|_{S_q}^q = \sum_{i=1}^{p} \sigma_i^q(L)$ is the Schatten-$q$ quasi-norm of matrix $L$, and $\lambda$ and $\rho_{uv}$ are tuning parameters. Note that, although we use the same $q$ for $L$ and $S$ here for simplicity, it can be different in general. Our approach is called $\ell_q$ regularized covariance ($\ell_q$-CSC) estimation with conditional sparsity. The final estimator $\hat{\Sigma} = \hat{L} + \hat{S}$ is called the $\ell_q$ regularized covariance estimator with conditional sparsity ($\ell_q$-CSCE).
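A minimal sketch of how the penalized loss in Problem (7) can be evaluated numerically is given below; the function names, the choice $q = 0.5$, and the way the entry-dependent weights are passed in are illustrative assumptions, and the positive semi-definite constraints are not handled here.

```python
import numpy as np

def schatten_q(L, q):
    """Schatten-q quasi-norm raised to the q-th power: sum of sigma_i(L)**q."""
    s = np.linalg.svd(L, compute_uv=False)
    return np.sum(s ** q)

def lq_csc_objective(Sigma_hat, L, S, lam, rho, q=0.5):
    """Penalized loss of the form used in Problem (7):
       0.5*||Sigma_hat - L - S||_F^2 + lam*||L||_{S_q}^q
       + sum over off-diagonal (u,v) of rho[u,v] * |S[u,v]|**q,
       with rho an entry-dependent weight matrix (diagonal entries ignored)."""
    fit = 0.5 * np.linalg.norm(Sigma_hat - L - S, 'fro') ** 2
    off = ~np.eye(S.shape[0], dtype=bool)
    sparse_pen = np.sum(rho[off] * np.abs(S[off]) ** q)
    return fit + lam * schatten_q(L, q) + sparse_pen
```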
To deal with heteroscedasticity in sparse covariance matrix estimation, we adopt the adaptive thresholding approach of References [4] and [12] to calculate the entry-dependent threshold $\rho_{uv}$ in (7) for $S$ as
$$ \rho_{uv} = \rho\, \sqrt{\hat{\sigma}_{uu}\, \hat{\sigma}_{vv}}, \quad (u, v) \in \mathcal{D}, $$
where $\rho > 0$ is a constant. Note that these entry-dependent tuning parameters for $S$ in Problem (7) depend on the variances of the corresponding variables. The parameter $\hat{\sigma}_{uu}$ can be taken as the $u$-th diagonal element of the POET estimator of $\Sigma_u$. It is worth mentioning that, when the data matrix is normalized, the covariance matrix is equivalent to its correlation matrix, in which case all tuning parameters equal $\rho$, and the adaptive $\ell_q$ regularization function acting on $S$ becomes the ordinary $\ell_q$ regularization function.
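A minimal sketch of these entry-dependent weights is given below; the variance estimates passed in are placeholders rather than the POET-based estimates described above, and the standardized-data special case simply yields a constant weight matrix.

```python
import numpy as np

def adaptive_weights(sigma_uu, rho):
    """Entry-dependent tuning parameters rho_uv = rho * sqrt(sigma_uu * sigma_vv),
       where sigma_uu holds estimated variances of the idiosyncratic components."""
    sigma_uu = np.asarray(sigma_uu, dtype=float)
    W = rho * np.sqrt(np.outer(sigma_uu, sigma_uu))
    np.fill_diagonal(W, 0.0)     # diagonal entries are not penalized
    return W

# if the data are standardized, all variances are 1 and every weight equals rho
print(adaptive_weights(np.ones(4), rho=0.1))
```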
We now discuss the selection of $\hat{\Sigma}_n$. The most common choice is the ordinary sample covariance matrix
$$ \hat{\Sigma}_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^{\top}, $$
where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $y_i$ is the $i$-th sample vector, $i = 1, \ldots, n$. The ordinary sample covariance is suitable for data with exponential-type tails or polynomial-type tails; see [1,4,8]. However, in many practical problems, data tend to be heavy-tailed. References [9,13,22] defined a Huber minimization sample covariance matrix, which is suitable for data with a finite fourth moment. Moreover, Reference [9] also defined a rank-based covariance matrix and a median-of-means covariance matrix for data with other distribution conditions.
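The ordinary sample covariance in the display above can be formed directly, as in the first function below; the second function is a generic truncation-based robust variant included only to illustrate the idea of a heavy-tail-friendly initial estimator, and it is not the Huber minimization estimator of [9,13,22].

```python
import numpy as np

def sample_cov(Y):
    """Ordinary sample covariance of the rows of Y (n samples, p variables)."""
    Yc = Y - Y.mean(axis=0)
    return Yc.T @ Yc / Y.shape[0]

def truncated_cov(Y, tau=3.0):
    """A generic robust variant: winsorize centered data at tau robust scales
       before forming the covariance (illustrative only)."""
    Yc = Y - np.median(Y, axis=0)
    scale = np.median(np.abs(Yc), axis=0) * 1.4826 + 1e-12   # MAD-based scale
    Yt = np.clip(Yc, -tau * scale, tau * scale)
    return Yt.T @ Yt / Y.shape[0]
```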
Finally, we discuss some basic properties of Problem (7).
Proposition 1. The following statements about the objective function $F$ of Problem (7) hold:
- (i) $F$ is proper, closed, and coercive, i.e., $F(L, S) \to +\infty$ whenever $\|(L, S)\|_F \to +\infty$.
- (ii) $F$ attains its minimal value over the feasible set of Problem (7).
- (iii) Let $g(L, S) = \frac{1}{2}\|\hat{\Sigma}_n - L - S\|_F^2$; the gradient $\nabla g$ is Lipschitz continuous with a constant $2$, that is, for any $(L_1, S_1)$, $(L_2, S_2)$, it holds that $\|\nabla g(L_1, S_1) - \nabla g(L_2, S_2)\|_F \le 2\,\|(L_1, S_1) - (L_2, S_2)\|_F$.
- (iv) Given a matrix $S$, the partial gradient function $\nabla_L g(\cdot, S)$ is Lipschitz continuous with a constant $1$, that is, for any $L_1$, $L_2$, it holds that $\|\nabla_L g(L_1, S) - \nabla_L g(L_2, S)\|_F \le \|L_1 - L_2\|_F$. Moreover, given a matrix $L$, the partial gradient function $\nabla_S g(L, \cdot)$ is Lipschitz continuous with a constant $1$.
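For completeness, the constants in (iii) and (iv) can be verified directly from the expression of $g$: since
$$ \nabla_L g(L, S) = \nabla_S g(L, S) = L + S - \hat{\Sigma}_n, $$
we have, for any $(L_1, S_1)$ and $(L_2, S_2)$,
$$ \|\nabla g(L_1, S_1) - \nabla g(L_2, S_2)\|_F^2 = 2\,\|(L_1 - L_2) + (S_1 - S_2)\|_F^2 \le 4\big(\|L_1 - L_2\|_F^2 + \|S_1 - S_2\|_F^2\big), $$
which gives the constant $2$ in (iii); fixing $S$ (respectively $L$), the difference of partial gradients reduces to $L_1 - L_2$ (respectively $S_1 - S_2$), which gives the constant $1$ in (iv).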
4. Necessary Optimality Conditions
In this section, we define a class of first-order stationary points for Problem (
7), which is a hybridization of the first-order stationary points in [
23,
24,
25] and the fixed point in [
25]. Furthermore, we prove that any global minimizer of Problem (
7) is a first-order stationary point of Problem (
7). We also define an approximate variant of this stationary point. It is worth mentioning that the constraints in Problem (
7) are treated as ordinary closed convex set constraints in this paper.
For the simple unconstrained $\ell_q$ regularized sparse optimization problem [23,24]:
$$ \min_{x \in \mathbb{R}^p} \ h(x) + \lambda \|x\|_q^q, $$
where $h$ is smooth, $\|x\|_q^q = \sum_{i=1}^{p} |x_i|^q$, and $\lambda > 0$, its first-order stationary point is defined as a point $x$ satisfying
$$ x_i\, \nabla_i h(x) + \lambda q\, |x_i|^q = 0, \quad i = 1, \ldots, p. $$
For the simple unconstrained Schatten-$q$ regularized rank optimization problem [25]:
$$ \min_{X} \ h(X) + \lambda \|X\|_{S_q}^q, $$
its first-order stationary point (or fixed point) is defined as a point $X$ satisfying
$$ X \in \operatorname{prox}_{\frac{\lambda}{\ell}\|\cdot\|_{S_q}^q}\Big( X - \tfrac{1}{\ell}\nabla h(X) \Big), $$
where $\ell$ is a constant greater than the Lipschitz constant of $\nabla h$. Inspired by these works, we can define a class of hybrid first-order stationary points of Problem (7).
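To illustrate the fixed-point characterization numerically, the sketch below approximates the scalar proximal operator of $t \mapsto \tau|t|^q$ by a grid comparison against $t = 0$ and uses it to measure the residual of the elementwise fixed-point condition; this is a generic illustration, not the construction used in [25], and the grid-based prox is only approximate.

```python
import numpy as np

def prox_lq_scalar(y, tau, q, grid=2000):
    """Numerically approximate prox_{tau*|.|^q}(y) = argmin_t 0.5*(t-y)**2 + tau*|t|**q
       by comparing t = 0 with a fine grid of same-sign candidates in (0, |y|]."""
    if y == 0.0:
        return 0.0
    t = np.sign(y) * np.linspace(0.0, abs(y), grid)   # any nonzero minimizer lies in (0, |y|]
    vals = 0.5 * (t - y) ** 2 + tau * np.abs(t) ** q
    return float(t[np.argmin(vals)])

def fixed_point_residual(x, grad_h, lam, q, ell):
    """Residual of the elementwise fixed-point condition
       x = prox_{(lam/ell)*||.||_q^q}(x - grad_h(x)/ell); zero means x satisfies it."""
    z = x - grad_h(x) / ell
    px = np.array([prox_lq_scalar(zi, lam / ell, q) for zi in z])
    return float(np.linalg.norm(px - x))
```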
Definition 1. is said to be the first-order stationary point of Problem (7) if satisfies where and, , , can be any finite array.

Next, we investigate the relationship between the global minimizer of Problem (7) and the first-order stationary point of Problem (7) without any qualification conditions.
Theorem 1. Suppose that is a global minimizer of Problem (7), and let . Then, is a first-order stationary point with of Problem (7).

Again, under some qualification conditions, the form of the first-order stationary point of Problem (
7) can be further analyzed. We now discuss the specific form of the subdifferential of
at
. Note that
where
is the mapping acting on array
. The adjoint mapping
is defined by
Specifically,
. Clearly,
is convex, and
and
are linear mappings. It follows from Theorem 23.9 in [
26] and Theorem 2.51 in [
27] that, if
, then
Moreover, suppose that the qualification condition
holds, where
is the kernel of linear mapping
, then
Based on these discussions, the following statement holds: If
satisfies
where
is defined in Definition 1 and the qualification condition
holds, then
is a first-order stationary point of Problem (
7).
Obviously, the qualification condition
holds automatically if
is locally Lipschitz continuous at
. In other words, if
, this qualification condition holds. Indeed, the true covariance matrix
is often assumed to be positive definite [
7,
8]. When
, we can establish the lower bound theory of the first-order stationary point of Problem (
7).
Theorem 2. Let be a first-order stationary point of Problem (7) satisfying for some and , . If , then

Here, we only give the lower bound theory of the first-order stationary point of Problem (7). With reference to the definitions of the first- and second-order stationary points established by [23,24] for the general unconstrained $\ell_q$ regularized optimization, a class of second-order necessary optimality conditions of Problem (7) and its lower bound theory can be considered.
Note that Problem (7) is actually a very challenging optimization problem due to the positive definite constraints and the non-Lipschitz regularization terms; even finding a first-order stationary point is difficult. Here, we define a class of approximate first-order stationary points of Problem (7).
Definition 2. We say that is an ε-approximate first-order stationary point () of Problem (7) if there exists a set such that where is defined in Definition 1.

Moreover, if a point is a first-order stationary point of Problem (7), it is an ε-approximate first-order stationary point of Problem (7) with ε = 0. Conversely, if a point is an ε-approximate first-order stationary point of Problem (7) with ε = 0, it is a first-order stationary point of Problem (7). In practice, the stopping point of any algorithm for an optimization problem is usually an approximate stationary point, rather than a stationary point.