1. Introduction
Variable screening has been demonstrated to be a computationally fast and efficient tool for solving many problems in ultrahigh dimensions. For example, in many scientific areas, such as biological genetics, finance and econometrics, we may collect ultrahigh-dimensional data sets (e.g., biomarkers, financial factors, assets and stocks), where the number of predictors p greatly exceeds the sample size n. Theoretically, ultrahigh dimension often refers to the setting in which the dimensionality p and the sample size n satisfy the relationship log p = O(n^a) for some constant a ∈ (0, 1). Variable screening is able to reduce the computational cost, to avoid the instability of algorithms, and to improve the estimation accuracy; these issues arise in the variable selection approaches based on the LASSO [1], SCAD [2,3] or MCP [4] for ultrahigh-dimensional data. Since the seminal work of [5], which pioneered the sure independence screening (SIS) procedure, many variable screening approaches have been documented over the last fifteen years, including model-based methods (e.g., [6,7,8,9,10,11]) and model-free methods [12,13,14,15,16,17,18,19,20]. These papers have shown that, with probability approaching one, the set of selected predictors contains the set of all truly important predictors.
Most marginal approaches focus only on developing effective and robust measures to characterize the marginal association between the response and an individual predictor. However, these methods do not take into consideration the influence of conditional variables or confounding factors on the response. A direct application of SIS is relatively crude, since SIS may perform poorly when predictors are highly correlated with each other. Some predictors that are weakly relevant or irrelevant marginally, but jointly correlated with the response, may be excluded from the final model after applying marginal screening methods. This will result in a high false positive rate (FPR). To surmount this weakness, an iterated screening algorithm or a penalization-based variable selection is usually offered as a refined follow-up step (e.g., [5,10]).
Conditional variable screening can be viewed as an important extension of marginal screening. It accounts for conditional information when calculating the marginal screening utility. There is relatively little work on it in the literature. To name a few, Ref. [21] proposed a conditional SIS (CIS) procedure to improve the performance of SIS, because conditioning on some correlated variables may boost the rank of a marginally weak predictor and reduce the number of false negatives. The paper [22] proposed a confounder-adjusted screening method for high-dimensional censored data, in which additional environmental confounders are regarded as conditional variables. The researchers in [23] studied variable screening by incorporating within-subject correlation for ultrahigh-dimensional longitudinal data, where they used some baseline variables as conditional variables. Ref. [24] proposed a conditional distance correlation-based screening via a kernel smoothing method, while [25] further presented a screening procedure based on conditional distance correlation, which is similar to [24] in methodology but differs in theory. Additionally, Ref. [11] developed a conditional quantile correlation-based screening approach using the B-spline smoothing technique. However, in [11,24,25], among others, the conditional variable considered is only univariate. Further, Ref. [21] focuses on generalized linear models and cannot handle heavy-tailed data. In this regard, we aim to develop a screener that behaves more robustly to outliers and heavy-tailed data, and simultaneously accommodates more than one conditional variable. On the choice of conditional variables, one can rely on prior knowledge, such as published research or the experience of experts in relevant subjects. When no prior knowledge is available, one can apply some marginal screening approach, such as SIS or its robust variants, to select several top-ranked predictors as conditional variables.
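As one concrete way to carry out this default choice, the following minimal Python sketch ranks predictors by a robust marginal association, here Kendall's tau as one possible robust variant of SIS, and takes the top-ranked predictors as conditional variables; the function name and the default size k = 3 are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np
from scipy.stats import kendalltau

def default_conditioning_set(y, X, k=3):
    # Rank predictors by the absolute Kendall's tau with the response
    # (a robust marginal utility) and return the k top-ranked indices
    # to serve as conditional variables when no prior knowledge exists.
    scores = np.array([abs(kendalltau(X[:, j], y)[0]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```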
On the other hand, to the best of our knowledge, several works have considered multiple conditional variables based on distinct partial correlations. For instance, Ref. [26] proposed a thresholded partial correlation approach to select significant variables in linear regression models. Additionally, Ref. [17] presented a screening procedure on the basis of the quantile partial correlation of [27], and they referred to the procedure as QPC-SIS. More recently, Ref. [28] proposed a copula partial correlation-based screening approach. It is worth noting that the partial correlation used in both [17,28] removes the effect of conditional variables on the response and each predictor by fitting two parametric models with a linear structure. However, this manner may be ineffective, especially when the conditional variables have a nonlinear influence on the response. This motivates us to work out a flexible way to control the impact of conditional variables. Meanwhile, we also take into account the issue of robustness to outlying or heavy-tailed responses in this paper.
This paper contributes a robust and flexible conditional variable screening procedure via a partial correlation coefficient, which is a non-trivial extension of [17]. First of all, in order to precisely control conditional variables, we propose a nonparametric definition of QPC, which extends that of [17] and allows for more flexibility. Specifically, we first fit two nonparametric additive models to remove the effect of the conditional variables on the response and on an individual predictor, where we use the B-spline smoothing technique to estimate the nonparametric functions. This can be viewed as a nonparametric adjustment for controlling conditional variables. From these fits we obtain two residuals, on which a quantile correlation can be calculated to formulate a nonparametric QPC. Second, we use this quantity as the screening utility in variable screening. This procedure can be implemented rapidly. We refer to this procedure as the nonparametric quantile partial correlation-based screening, denoted as NQPC-SIS. Third, theoretically, we establish the sure screening property for NQPC-SIS under some mild conditions. Compared to [17], our approach is more flexible and our theory on the sure screening property is more challenging to derive. Moreover, our screening idea can be easily transferred to existing screening methods that use other popular partial correlations.
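To fix ideas, here is a minimal Python sketch of one NQPC evaluation under stated assumptions: the response is adjusted by an additive quantile regression on the conditional variables, the predictor by an additive least squares regression, both built on B-spline bases, and a quantile correlation (cf. [27]) is computed on the two residuals. All function names are ours, and the paper's exact Equations (4) and (7) may differ in detail (e.g., in how the predictor is adjusted).

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

def additive_spline_basis(Z, df=5):
    # Additive B-spline design matrix for the conditional variables Z (n x q).
    cols = [np.asarray(dmatrix(f"bs(x, df={df}) - 1", {"x": Z[:, k]}))
            for k in range(Z.shape[1])]
    return sm.add_constant(np.column_stack(cols))

def qcor(u, v, tau):
    # Sample quantile correlation of u and v at level tau (cf. [27]):
    # cov(psi, v) / sqrt(var(psi) * var(v)), psi = tau - 1{u - Q_tau(u) < 0}.
    psi = tau - (u - np.quantile(u, tau) < 0).astype(float)
    return np.cov(psi, v)[0, 1] / np.sqrt(psi.var() * v.var())

def nqpc(y, xj, Z, tau=0.5, df=5):
    # Nonparametric QPC of y and predictor xj given Z: remove the additive
    # effect of Z from both variables, then correlate the residuals.
    B = additive_spline_basis(Z, df)
    e_y = y - sm.QuantReg(y, B).fit(q=tau).predict(B)  # quantile adjustment
    e_x = xj - sm.OLS(xj, B).fit().predict(B)          # mean adjustment
    return qcor(e_y, e_x, tau)
```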
The remainder of the paper is organized as follows. In Section 2, the NQPC-SIS is introduced. The technical conditions needed are listed and the asymptotic properties are established in Section 3. Section 4 provides an iterative algorithm for a further refinement. Numerical studies and an empirical analysis of a real data set are carried out in Section 5. Concluding remarks are given in Section 6. All the proofs of the main results are relegated to Appendix A.
3. Theoretical Properties
To state our theoretical results, we first introduce some notation. Throughout the rest of the paper, for any matrix A, we use ||A||, ||A||_∞, and λ_min(A) and λ_max(A) to stand for the operator norm, the infinity norm, and the minimum and maximum eigenvalues (for a symmetric matrix A), respectively. In addition, for any vector v, ||v|| means the Euclidean norm.
Denote by the population screening utility the nonparametric QPC given in Equation (4), and by its B-spline approximation the quantity given in Equation (7), with the corresponding sample version defined accordingly. Before we establish the uniform convergence of the sample utility to the population utility, we first investigate the bound of the gap between the population utility and its B-spline approximation, which is helpful to understand the marginal signal level after applying the B-spline approximation to the population utility. We need the following conditions:
- (B1) Let 𝒳_j denote the support of covariate X_j. There exist some positive constants such that the B-spline approximation error of each nonparametric component is of order K_n^{-d} uniformly over 𝒳_j and over j, where K_n is the number of B-spline basis functions and d is defined in condition (C1) below.
- (B2) There exist some positive constants that uniformly bound the variances of the quantities given in (4) and (8), respectively.
- (B3) In a neighborhood of the conditional quantile of interest, the conditional density of Y given the covariates is bounded on their support, uniformly in j.
- (B4) The marginal utilities of the truly active predictors are uniformly bounded below by c n^{-κ} for some positive constant c and some κ ∈ [0, 1/2).
Condition (B1) is the approximation error condition imposed on the nonparametric functions in the B-spline smoothing literature (e.g., [11,30,31]). Condition (B2) requires the two variances involved to be uniformly bounded. Condition (B3) implies that there exists a finite constant such that, in a small neighborhood of the quantile of interest, the conditional density bound holds uniformly. Condition (B4) guarantees that the marginal signal of the active components in the model does not vanish. These conditions are similar to those in [17].
Proposition 1. Under conditions (B1)–(B3), there exists a positive constant such that the gap between the population utility and its B-spline approximation is uniformly bounded by a multiple of K_n^{-d}. In addition, if condition (B4) further holds, then the approximated marginal signal of the active components remains bounded away from zero, provided that K_n^{-d} is of smaller order than the signal level n^{-κ}.
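For orientation, bounds of this type in the B-spline screening literature typically take the following schematic form, where ρ_j denotes the population NQPC, ρ_j^K its B-spline approximation, M_* the active set and K_n the number of basis functions; these symbols are generic placeholders rather than the paper's exact display:

\[
\max_{1\le j\le p}\bigl|\rho_j^{K}-\rho_j\bigr| \le C\,K_n^{-d},
\qquad
\min_{j\in\mathcal{M}_*}\bigl|\rho_j^{K}\bigr| \ge \frac{c_1}{2}\,n^{-\kappa}
\quad \text{provided that } K_n^{-d}=o\bigl(n^{-\kappa}\bigr).
\]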
To establish the sure screening property, we make the following assumptions:
- (C1) The nonparametric functions belong to a class of smooth functions whose rth derivatives exist and are Lipschitz of order c on the support of the corresponding covariate, that is, |g^{(r)}(s) − g^{(r)}(t)| ≤ K|s − t|^c for some positive constant K, where r is a non-negative integer and c ∈ (0, 1] is such that d = r + c > 0.5.
- (C2) The joint density of the conditional variables is bounded above and below by two positive numbers. The density of each predictor X_j is bounded away from zero and infinity uniformly in j; that is, there exist two positive constants such that the density of X_j lies between them on its support.
- (C3) There exist two positive constants such that the moments of each predictor are uniformly bounded for every j.
- (C4) The conditional density of Y given the covariates satisfies the Lipschitz condition of first order and is bounded between some positive constants for any y in a neighborhood of the conditional quantile of interest, uniformly in j.
- (C5) There exist some positive constants that uniformly bound the variances of the quantities entering the marginal utilities. Furthermore, assume that a similar uniform bound holds for some constant.
- (C6) There exists some constant bounding from below the marginal signal level of the truly active variables after the B-spline approximation.
Condition (C1) is a smoothness assumption on the nonparametric components that is standard in the B-spline-related literature ([7,32]). Condition (C3) is a moment constraint on each of the predictors. Conditions (C2), (C4) and (C5) are similar to those imposed in [17]. Condition (C6) is assumed to ensure that the marginal signal level of the truly active variables is not too weak after the B-spline approximation. The above conditions are standard in the variable screening literature (e.g., [17,28]).
According to the properties of normalized B-splines and under conditions (C1) and (C2) (cf. [33,34]), we can obtain the fact that, for each basis function and each j, there exist positive constants, independent of j and n, that bound the second moments of the normalized basis functions from below and above. The following lemma bounds the eigenvalues of the B-spline basis matrix from below and from above. This result extends Lemma 3 of [32] from a fixed dimension to a diverging dimension, which may be of independent interest to some readers.
Lemma 1. Suppose that conditions (C1) and (C2) hold; then the minimum and maximum eigenvalues of the B-spline basis matrix are bounded below and above by quantities that depend on the number of basis functions and the number of conditional variables. This result reveals that the number of conditional variables plays an important role in bounding the eigenvalues of the B-spline basis matrix: when it goes to infinity rapidly, the minimum eigenvalue of the basis matrix will degrade to zero very quickly at an exponential rate. Hence, for the following results to hold, the divergence rate of the number of conditional variables cannot achieve a polynomial order of n, but can be of the order of log n.
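As a point of reference, eigenvalue bounds for additive B-spline designs typically take the following schematic form, where B is the stacked basis vector, q_n the number of conditional variables, K_n the number of basis functions, and c_1, c_2, γ generic constants (placeholder notation, not the paper's):

\[
c_1\,\gamma^{\,q_n}\,K_n^{-1}\;\le\;\lambda_{\min}\bigl(E\{BB^{\top}\}\bigr)\;\le\;\lambda_{\max}\bigl(E\{BB^{\top}\}\bigr)\;\le\;c_2\,K_n^{-1},
\qquad 0<\gamma<1,
\]

so that the minimum eigenvalue decays exponentially in q_n, which is why q_n may grow at most logarithmically in n.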
Theorem 1. Suppose that conditions (B1)–(B4) and (C1)–(C5) hold, and assume that the rate conditions on the number of basis functions and on the number of conditional variables are satisfied.
- (i) There exist some positive constants such that, for a deviation level of order n^{-κ} and sufficiently large n, the estimated utilities concentrate uniformly around their B-spline population counterparts with an exponential probability bound, where one of the constants is given in Lemma 1.
- (ii) In addition, if condition (C6) is further satisfied, then by choosing the screening threshold of order n^{-κ}, the selected model contains all truly active variables with probability tending to one for sufficiently large n.
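Sure screening statements of this kind are usually displayed in the following schematic form, with ρ̂_j the estimated utility, M̂ the selected set and c, c' generic constants (again placeholder notation):

\[
P\Bigl(\max_{1\le j\le p}\bigl|\widehat{\rho}_j-\rho_j^{K}\bigr|\ge c\,n^{-\kappa}\Bigr)
\le O\Bigl(p\,K_n\exp\bigl(-c'\,n^{1-2\kappa}/K_n^{2}\bigr)\Bigr),
\qquad
P\bigl(\mathcal{M}_*\subseteq\widehat{\mathcal{M}}\bigr)\to 1 .
\]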
Theorem 1 establishes the sure screening property: all the relevant variables can be recruited into the final model with probability going to one. The probability bound in the property is free of the ambient dimensionality, but depends on the signal level and the number of basis functions. Though this ensures that NQPC-SIS retains all important predictors with high probability, noisy variables can also be included by NQPC-SIS. Ideally, selection consistency, i.e., the selected set coinciding exactly with the true active set when n is sufficiently large, can be realized by the choice of the screening threshold according to Theorem 1. This property can also be achieved by assuming a signal gap between the active and inactive variables; however, that would be too restrictive to check in practice. Similar to [17], we may instead impose a mild growth condition for some constant to control the false selection rate. With this condition, we can obtain the following property to control the size of the selected model.
Theorem 2. Under the conditions of Theorem 1, by choosing the screening threshold as in Theorem 1(ii), and if the growth condition above holds for some constant, then there exist some positive constants such that, with probability tending to one for sufficiently large n, the size of the selected model is at most of polynomial order in n. This theorem reveals that, after an application of the NQPC-SIS, the dimensionality can be reduced from an exponential order to a polynomial size of n while at the same time retaining all the important predictors with probability approaching one.
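Model-size bounds of this kind usually take the following schematic form, where ξ is a generic rate constant (placeholder notation, not the paper's):

\[
P\Bigl(\bigl|\widehat{\mathcal{M}}\bigr|\le O\bigl(n^{2\kappa+\xi}\bigr)\Bigr)
\ge 1-O\Bigl(p\,K_n\exp\bigl(-c\,n^{1-2\kappa}/K_n^{2}\bigr)\Bigr).
\]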
4. Algorithm for NQPC-SIS
To make the NQPC-SIS practically applicable, for each j, we need to specify the conditional set. We note that a sequential test was developed in [17] to identify the conditional set via an application of Fisher's Z-transformation [35] and partial correlation. In this section, we provide a two-stage procedure based on a nonparametric additive quantile regression model, which can be viewed as complementary to [17].
To reduce the computational burden, we first apply the quantile-adaptive model-free feature screening (Qa-SIS) proposed by [13] to select a moderately sized subset of top-ranked predictors, where the number of basis functions used in Qa-SIS is fixed in advance and ⌊a⌋ denotes the largest integer not exceeding a. Second, for each j, we take the conditional set to be this subset with the jth variable removed if it belongs to the subset, and the subset itself otherwise. Third, we carry out a variable selection with the SCAD penalty [2] based on an additive quantile regression model for the corresponding data set, and then a small reduced subset is obtained. Such a two-stage procedure can help to find the conditional subset for the jth variable and will be incorporated in the following algorithm. With a slight abuse of notation, we use d_n to denote the screening threshold parameter of the NQPC-SIS; in other words, for the NQPC-SIS, we select the d_n covariates that correspond to the d_n largest NQPCs.
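A minimal Python sketch of this two-stage construction is given below, building on the helpers from the earlier sketches. Since a SCAD-penalized additive quantile regression solver is not available in standard Python libraries, the sketch substitutes scikit-learn's L1-penalized QuantileRegressor as a stand-in for the SCAD step; the subset size and tuning constants are illustrative, not the paper's.

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix
from sklearn.linear_model import QuantileRegressor

def qa_sis_utility(y, xj, tau=0.5, df=5):
    # Qa-SIS marginal utility (cf. [13]): mean squared distance between the
    # fitted spline conditional quantile of y given xj and the unconditional
    # tau-th quantile of y.
    B = sm.add_constant(np.asarray(dmatrix(f"bs(x, df={df}) - 1", {"x": xj})))
    fitted = sm.QuantReg(y, B).fit(q=tau).predict(B)
    return np.mean((fitted - np.quantile(y, tau)) ** 2)

def conditional_set(y, X, j, tau=0.5, df=5, m=None, alpha=0.1):
    n, p = X.shape
    m = m or int(np.floor(n / np.log(n)))        # illustrative subset size
    # Stage 1: Qa-SIS keeps the m top-ranked predictors.
    util = np.array([qa_sis_utility(y, X[:, k], tau, df) for k in range(p)])
    keep = list(np.argsort(util)[::-1][:m])
    if j in keep:
        keep.remove(j)                            # S_j excludes the jth variable
    # Stage 2: penalized additive quantile regression on the kept variables;
    # an L1 penalty is used here as a stand-in for the SCAD penalty of [2].
    B = np.column_stack([np.asarray(dmatrix(f"bs(x, df={df}) - 1", {"x": X[:, k]}))
                         for k in keep])
    coef = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs").fit(B, y).coef_
    groups = coef.reshape(len(keep), df)          # df spline coefficients per variable
    return [k for k, g in zip(keep, groups) if np.abs(g).max() > 1e-8]
```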
Algorithm 1 has the same spirit as the QPCS algorithm of [17], who demonstrated empirically that the QPCS algorithm outperforms their QTCS and QFR algorithms. In the implementation, we choose small numbers of basis functions for the two smoothing steps, which does not exclude other choices. According to our limited simulation experience, this choice works satisfactorily. The numbers of basis functions we take cannot be too large, due to the use of B-spline basis approximations. Theoretically, we need to specify the number of basis functions so that it diverges with n at the rate required by the theory, while a small fixed value is sufficient in practice.
Algorithm 1 The implementation of NQPC-SIS.
- 1: Given the data, we set a pre-specified number d_n and an initial selected set.
- 2: For the initial step,
  - (2a) update the conditional set of each candidate variable by the two-stage procedure;
  - (2b) update the selected set, where the added variable index is defined as the one maximizing the magnitude of the estimated NQPC.
- 3: For each subsequent step,
  - (3a) update the conditional sets given the variables selected so far;
  - (3b) update the selected set, where the added variable index is such that its estimated NQPC, given the current selected set, is the largest in magnitude.
- 4: Repeat Step 3 until d_n variables are selected. The final selected set is the output of the NQPC-SIS.
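Putting the pieces together, the following Python sketch implements one plausible reading of Algorithm 1, reusing nqpc and qa_sis_utility from the earlier sketches; the initialization rule, the simplified conditioning on the current selected set, and the stopping size d_n = ⌊n/log n⌋ are our illustrative choices, not necessarily the paper's.

```python
def nqpc_sis(y, X, tau=0.5, df=5, d_n=None):
    n, p = X.shape
    d_n = d_n or int(np.floor(n / np.log(n)))    # illustrative screening size
    # Step 1: initialize with the variable having the largest marginal utility.
    active = [int(np.argmax([qa_sis_utility(y, X[:, j], tau, df)
                             for j in range(p)]))]
    # Steps 2-4: greedily add the predictor with the largest |NQPC|.
    while len(active) < d_n:
        rest = [j for j in range(p) if j not in active]
        # Condition on the currently selected variables (a simplification of
        # the per-variable two-stage sets S_j described in Section 4).
        Z = X[:, active]
        scores = [abs(nqpc(y, X[:, j], Z, tau, df)) for j in rest]
        active.append(rest[int(np.argmax(scores))])
    return active
```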