Robustness Property of Robust-BD Wald-Type Test for Varying-Dimensional General Linear Models

Guo, Xiao; Zhang, Chunming

doi:10.3390/e20030168


by Xiao Guo ¹,* and Chunming Zhang ²

¹ Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

² Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA

* Author to whom correspondence should be addressed.

Entropy 2018, 20(3), 168; https://doi.org/10.3390/e20030168

Submission received: 12 January 2018 / Revised: 1 March 2018 / Accepted: 1 March 2018 / Published: 5 March 2018

(This article belongs to the Special Issue New Developments in Statistical Information Theory Based on Entropy and Divergence Measures)


Abstract

An important issue for robust inference is to examine the stability of the asymptotic level and power of the test statistic in the presence of contaminated data. Most existing results are derived in finite-dimensional settings with some particular choices of loss functions. This paper re-examines this issue by allowing for a diverging number of parameters combined with a broader array of robust error measures, called “robust-BD”, for the class of “general linear models”. Under regularity conditions, we derive the influence function of the robust-BD parameter estimator and demonstrate that the robust-BD Wald-type test enjoys the robustness of validity and efficiency asymptotically. Specifically, the asymptotic level of the test is stable under a small amount of contamination of the null hypothesis, whereas the asymptotic power is large enough under a contaminated distribution in a neighborhood of the contiguous alternatives, thus lending support to the utility of the proposed robust-BD Wald-type test.

Keywords:

Bregman divergence; general linear model; hypothesis testing; influence function; robust; Wald-type test

1. Introduction

The class of varying-dimensional “general linear models” [1], including the conventional generalized linear model (GLM in [2]), is flexible and powerful for modeling a large variety of data and plays an important role in many statistical applications. It is well documented in the literature that the conventional maximum likelihood estimator for the GLM is nonrobust; see, for example, [3,4]. To enhance resistance to outliers in applications, many efforts have been made to obtain robust estimators. For example, Noh et al. [5] and Künsch et al. [6] developed robust estimators for the GLM, and Stefanski et al. [7], Bianco et al. [8] and Croux et al. [9] studied robust estimation for the logistic regression model with the deviance loss as the error measure.

Besides robust estimation for the GLM, robust inference is another important issue, which, however, has received relatively less attention. Basically, the study of robust testing includes two aspects: (i) establishing the stability of the asymptotic level under small departures from the null hypothesis (i.e., robustness of “validity”); and (ii) demonstrating that the asymptotic power is sufficiently large under small departures from specified alternatives (i.e., robustness of “efficiency”). In the literature, robust inference has been conducted for different models. For example, Heritier et al. [10] studied the robustness properties of the Wald, score and likelihood ratio tests based on M-estimators for general parametric models. Cantoni et al. [11] developed a test statistic based on the robust deviance, and conducted robust inference for the GLM using quasi-likelihood as the loss function. A robust Wald-type test for the logistic regression model is studied in [12]. Ronchetti et al. [13] studied robustness properties for generalized method of moments estimators. Basu et al. [14] proposed robust tests based on the density power divergence (DPD) measure for the equality of two normal means. Robust tests for parameter change have been studied using the density-based divergence method in [15,16]. However, the aforementioned methods based on the GLM mostly focus on situations where the number of parameters is fixed and the loss function is specific.

Zhang et al. [1] developed robust estimation and testing for the “general linear model” based on a broader array of error measures, namely Bregman divergence, allowing for a diverging number of parameters. The Bregman divergence includes a wide class of error measures as special cases, e.g., the (negative) quasi-likelihood in regression, the deviance loss and exponential loss in machine learning practice, among many other commonly used loss functions. Zhang et al. [1] studied the consistency and asymptotic normality of their proposed robust-BD parameter estimator and derived the asymptotic distribution of the Wald-type test statistic constructed from robust-BD estimators. Naturally, it remains an important issue to examine the robustness property of the robust-BD Wald-type test [1] in the varying-dimensional case, i.e., whether the test still has stable asymptotic level and power in the presence of contaminated data.

This paper aims to demonstrate the robustness property of the robust-BD Wald-type test in [1]. Nevertheless, addressing this issue is a nontrivial task. Although the local stability of Wald-type tests has been established for M-estimators [10], generalized method of moments estimators [13], the minimum density power divergence estimator [17] and general M-estimators under random censoring [18], these results for finite-dimensional settings are not directly applicable to our situation with a diverging number of parameters. Under certain regularity conditions, we provide a rigorous theoretical derivation for robust testing based on the Wald-type test statistic. The essential results are approximations of the asymptotic level and power under contaminated distributions of the data in a small neighborhood of the null and alternative hypotheses, respectively.

Specifically, we show in Theorem 1 that, if the influence function of the estimator is bounded, then the asymptotic level of the test is also bounded under a small amount of contamination.
We also demonstrate in Theorem 2 that, if the contamination belongs to a neighborhood of the contiguous alternatives, then the asymptotic power is also stable.

Hence, we establish the robustness of validity and efficiency for the robust-BD Wald-type test for the “general linear model” with a diverging number of parameters.

The rest of the paper is organized as follows. Section 2 reviews the Bregman divergence (BD), robust-BD estimation and the Wald-type test statistic proposed in [1]. Section 3 derives the influence function of the robust-BD estimator and studies the robustness properties of the asymptotic level and power of the Wald-type test under a small amount of contamination. Section 4 conducts the simulation studies. The technical conditions and proofs are given in Appendix A. A list of notations and symbols is provided in Appendix B.

We now introduce some necessary notation. In the following, $C$ and $c$ are generic finite constants which may vary from place to place but do not depend on the sample size $n$. Denote by $E_{K}(\cdot)$ the expectation with respect to the underlying distribution $K$. For a positive integer $q$, let $0_{q} = (0, \dots, 0)^{T} \in \mathbb{R}^{q}$ be a $q \times 1$ zero vector and $I_{q}$ be the $q \times q$ identity matrix. For a vector $v = (v_{1}, \dots, v_{q})^{T} \in \mathbb{R}^{q}$, the $L_{1}$ norm is $\|v\|_{1} = \sum_{i=1}^{q} |v_{i}|$, the $L_{2}$ norm is $\|v\|_{2} = (\sum_{i=1}^{q} v_{i}^{2})^{1/2}$ and the $L_{\infty}$ norm is $\|v\|_{\infty} = \max_{i=1,\dots,q} |v_{i}|$. For a $q \times q$ matrix $A$, the $L_{2}$ and Frobenius norms of $A$ are $\|A\|_{2} = \{\lambda_{\max}(A^{T}A)\}^{1/2}$ and $\|A\|_{F} = \sqrt{\mathrm{tr}(AA^{T})}$, respectively, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.

2. Review of Robust-BD Estimation and Inference for “General Linear Models”

This section briefly reviews the robust-BD estimation and inference methods for the “general linear model” developed in [1]. Let $\{(X_{n1}, Y_{1}), \dots, (X_{nn}, Y_{n})\}$ be i.i.d. observations from some underlying distribution $(X_{n}, Y)$ with $X_{n} = (X_{1}, \dots, X_{p_{n}})^{T} \in \mathbb{R}^{p_{n}}$ the explanatory variables and $Y$ the response variable. The dimension $p_{n}$ is allowed to diverge with the sample size $n$. The “general linear model” is given by

$$m(x_{n}) \equiv E(Y \mid X_{n} = x_{n}) = F^{-1}(\tilde{x}_{n}^{T}\tilde{\beta}_{n,0}), \tag{1}$$

and

$$\mathrm{var}(Y \mid X_{n} = x_{n}) = V(m(x_{n})), \tag{2}$$

where $F$ is a known link function, $\tilde{\beta}_{n,0} \in \mathbb{R}^{p_{n}+1}$ is the vector of unknown true regression parameters, $\tilde{x}_{n} = (1, x_{n}^{T})^{T}$ and $V(\cdot)$ is a known function. Note that the conventional generalized linear model (GLM) satisfying Equations (1) and (2) assumes that $Y \mid X_{n} = x_{n}$ follows a particular distribution in the exponential family. In contrast, our “general linear model” does not require the explicit form of the distribution of the response. Hence, the “general linear model” includes the GLM as a special case. For notational simplicity, denote $Z_{n} = (X_{n}^{T}, Y)^{T}$ and $\tilde{Z}_{n} = (\tilde{X}_{n}^{T}, Y)^{T}$.

Bregman divergence (BD) is a class of error measures, which was introduced in [19] and covers a wide range of loss functions. Specifically, Bregman divergence is defined as a bivariate function,

$$Q_{q}(\nu, \mu) = -q(\nu) + q(\mu) + (\nu - \mu)q'(\mu),$$

where $q(\cdot)$ is the concave generating q-function. For example, $q(\mu) = a\mu - \mu^{2}$ for a constant $a$ corresponds to the quadratic loss $Q_{q}(Y, \mu) = (Y - \mu)^{2}$. For a binary response variable $Y$, $q(\mu) = \min\{\mu, 1 - \mu\}$ gives the misclassification loss $Q_{q}(Y, \mu) = I\{Y \neq I(\mu > 0.5)\}$; $q(\mu) = -2\{\mu\log(\mu) + (1 - \mu)\log(1 - \mu)\}$ gives the Bernoulli deviance loss $Q_{q}(Y, \mu) = -2\{Y\log(\mu) + (1 - Y)\log(1 - \mu)\}$; $q(\mu) = 2\min\{\mu, 1 - \mu\}$ gives the hinge loss $Q_{q}(Y, \mu) = \max\{1 - (2Y - 1)\,\mathrm{sign}(\mu - 0.5), 0\}$ for the support vector machine; and $q(\mu) = 2\{\mu(1 - \mu)\}^{1/2}$ yields the exponential loss $Q_{q}(Y, \mu) = \exp[-(Y - 0.5)\log\{\mu/(1 - \mu)\}]$ used in AdaBoost [20]. Furthermore, Zhang et al. [21] showed that if

$$q(\mu) = \int_{a}^{\mu} \frac{s - \mu}{V(s)}\, ds, \tag{3}$$

where $a$ is a finite constant such that the integral is well-defined, then $Q_{q}(y, \mu)$ is the “classical (negative) quasi-likelihood” function $-Q_{QL}(y, \mu)$ with $\partial Q_{QL}(y, \mu)/\partial\mu = (y - \mu)/V(\mu)$.
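The loss examples above can be checked numerically. The following Python sketch (standard library only; all names are ours) evaluates $Q_q(\nu,\mu) = -q(\nu) + q(\mu) + (\nu-\mu)q'(\mu)$ from a generating function and its derivative and compares the result with the quoted closed-form quadratic, Bernoulli deviance and exponential losses. It also checks, for the Poisson-type variance $V(\mu) = \mu$ and the illustrative choice $a = 1$ in Equation (3), that $\partial Q_q(y,\mu)/\partial\mu = -(y-\mu)/V(\mu)$. The constant $a = 3$ in the quadratic case and the convention $0\log 0 = 0$ at $\nu \in \{0, 1\}$ are assumptions of the sketch.

```python
import math

A_CONST = 3.0  # arbitrary constant a in q(mu) = a*mu - mu^2 (illustrative choice)


def xlogx(t):
    """t*log(t) with the convention 0*log(0) = 0."""
    return 0.0 if t == 0 else t * math.log(t)


def bregman(q, dq, nu, mu):
    """Bregman divergence Q_q(nu, mu) = -q(nu) + q(mu) + (nu - mu) q'(mu)."""
    return -q(nu) + q(mu) + (nu - mu) * dq(mu)


# (q, q') pairs for the generating functions discussed in the text.
GENERATORS = {
    "quadratic": (lambda m: A_CONST * m - m * m, lambda m: A_CONST - 2.0 * m),
    "deviance": (lambda m: -2.0 * (xlogx(m) + xlogx(1.0 - m)),
                 lambda m: -2.0 * math.log(m / (1.0 - m))),
    "exponential": (lambda m: 2.0 * math.sqrt(m * (1.0 - m)),
                    lambda m: (1.0 - 2.0 * m) / math.sqrt(m * (1.0 - m))),
    # Equation (3) with V(s) = s and a = 1: q(mu) = (mu - 1) - mu*log(mu), q'(mu) = -log(mu).
    "quasi_poisson": (lambda m: (m - 1.0) - xlogx(m), lambda m: -math.log(m)),
}


def closed_form(name, y, mu):
    """Closed-form losses quoted in the text for a response y and mean mu."""
    if name == "quadratic":
        return (y - mu) ** 2
    if name == "deviance":
        return -2.0 * (y * math.log(mu) + (1.0 - y) * math.log(1.0 - mu))
    if name == "exponential":
        return math.exp(-(y - 0.5) * math.log(mu / (1.0 - mu)))
    raise KeyError(name)


def dQ_dmu(name, y, mu, h=1e-6):
    """Central finite difference of mu -> Q_q(y, mu)."""
    q, dq = GENERATORS[name]
    return (bregman(q, dq, y, mu + h) - bregman(q, dq, y, mu - h)) / (2.0 * h)


if __name__ == "__main__":
    for name in ("quadratic", "deviance", "exponential"):
        q, dq = GENERATORS[name]
        for y in (0.0, 1.0):
            print(name, y, bregman(q, dq, y, 0.3), closed_form(name, y, 0.3))
    # Quasi-likelihood check: dQ_q/dmu should be close to -(y - mu)/V(mu) = -(4 - 2.5)/2.5 = -0.6.
    print(dQ_dmu("quasi_poisson", 4.0, 2.5))
```

The finite difference is only a sanity check; analytically, differentiating Equation (3) twice gives $q''(\mu) = -1/V(\mu)$, so $\partial Q_q(y,\mu)/\partial\mu = (y-\mu)q''(\mu) = -(y-\mu)/V(\mu)$.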

To obtain a robust estimator based on BD, Zhang et al. [1] developed the robust-BD loss function

$$\rho_{q}(y, \mu) = \int_{y}^{\mu} \psi(r(y, s))\{q''(s)\sqrt{V(s)}\}\, ds - G(\mu), \tag{4}$$

where $\psi(\cdot)$ is a bounded odd function, such as the Huber $\psi$-function [22], $r(y, s) = (y - s)/\sqrt{V(s)}$ denotes the Pearson residual and $G(\mu)$ is the bias-correction term satisfying

$$G'(\mu) = G_{1}'(\mu)\{q''(\mu)\sqrt{V(\mu)}\},$$

with

$$G_{1}'(m(x_{n})) = E\{\psi(r(Y, m(x_{n}))) \mid X_{n} = x_{n}\}.$$

Based on robust-BD, the estimator of $\tilde{\beta}_{n,0}$ proposed in [1] is defined as

$$\hat{\tilde{\beta}} = \arg\min_{\tilde{\beta}} \Big\{\frac{1}{n}\sum_{i=1}^{n} \rho_{q}(Y_{i}, F^{-1}(\tilde{X}_{ni}^{T}\tilde{\beta}))\, w(X_{ni})\Big\}, \tag{5}$$

where $w(\cdot) \geq 0$ is a known bounded weight function that downweights high-leverage points.

In [11], the “robust quasi-likelihood estimator” of $\tilde{\beta}_{n,0}$ is formulated according to the “robust quasi-likelihood function” defined as

$$Q_{RQL}(x_{n}, y, \mu) = \Big\{\int_{\mu_{0}}^{\mu} \frac{\psi(r(y, s))}{\sqrt{V(s)}}\, ds\Big\} w(x_{n}) - \frac{1}{n}\sum_{j=1}^{n} \Big[\int_{\mu_{0}}^{\mu_{j}} \frac{E\{\psi(r(Y_{j}, s)) \mid X_{nj}\}}{\sqrt{V(s)}}\, ds\Big] w(X_{nj}),$$

where $\mu = F^{-1}(\tilde{x}_{n}^{T}\tilde{\beta})$ and $\mu_{j} = \mu_{j}(\tilde{\beta}) = F^{-1}(\tilde{X}_{nj}^{T}\tilde{\beta})$, $j = 1, \dots, n$. To convey the intuition behind the “robust-BD”, [1] provides a diagram that illustrates the relation among the “robust-BD”, “classical-BD”, “robust quasi-likelihood” and “classical (negative) quasi-likelihood”.

For the robust-BD, assume that

$$p_{j}(y; \theta) = \frac{\partial^{j}}{\partial\theta^{j}} \rho_{q}(y, F^{-1}(\theta)), \quad j = 0, 1, \dots,$$

exist finitely up to any order required. For example, for $j = 1$,

$$p_{1}(y; \theta) = \{\psi(r(y, \mu)) - G_{1}'(\mu)\}\{q''(\mu)\sqrt{V(\mu)}\}/F'(\mu), \tag{6}$$

where $\mu = F^{-1}(\theta)$. Explicit expressions for $p_{j}(y; \theta)$ ($j = 2, 3$) can be found in Equation (3.7) of [1]. Then, the estimating equation for $\hat{\tilde{\beta}}$ is

$$\frac{1}{n}\sum_{i=1}^{n} \psi_{RBD}(Z_{ni}; \tilde{\beta}) = 0,$$

where the score vector is

$$\psi_{RBD}(z_{n}; \tilde{\beta}) = p_{1}(y; \theta)\, w(x_{n})\, \tilde{x}_{n}, \tag{7}$$

with $\theta = \tilde{x}_{n}^{T}\tilde{\beta}$. The consistency and asymptotic normality of $\hat{\tilde{\beta}}$ have been studied in [1]; see Theorems 1 and 2 therein.
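For a concrete instance of Equations (6) and (7), the sketch below takes the setting later used in Section 4.1: log link $F(\mu) = \log\mu$, $V(\mu) = \mu$ and the quasi-likelihood generator of Equation (3). Differentiating Equation (3) twice gives $q''(\mu) = -1/V(\mu)$, so $q''(\mu)\sqrt{V(\mu)}/F'(\mu) = -\sqrt{\mu}$. The bias correction $G_1'(\mu)$ is computed here under the assumption that $Y \mid X_n$ is Poisson($\mu$), by a truncated sum; the Huber $\psi_c$ with $c = 1.345$ and $w(x) = 1/(1+\|x\|)$ follow Section 4. Function names are hypothetical and this is an illustration, not the authors' implementation.

```python
import math


def huber_psi(r, c=1.345):
    """Huber psi-function: identity on [-c, c], clipped to +/-c outside."""
    return max(-c, min(c, r))


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def g1_prime(mu, c=1.345):
    """Bias correction G_1'(mu) = E{psi(r(Y, mu))}, assuming Y ~ Poisson(mu) and V(mu) = mu.
    Intended for moderate mu; the sum is truncated far in the upper tail."""
    upper = int(mu + 20.0 * math.sqrt(mu) + 20)
    sd = math.sqrt(mu)
    return sum(huber_psi((y - mu) / sd, c) * poisson_pmf(y, mu) for y in range(upper + 1))


def p1(y, theta, c=1.345):
    """p_1(y; theta) of Equation (6) for the log link, V(mu) = mu and the quasi-likelihood
    generator of Equation (3), for which q''(mu) sqrt(V(mu)) / F'(mu) = -sqrt(mu)."""
    mu = math.exp(theta)
    r = (y - mu) / math.sqrt(mu)  # Pearson residual
    return (huber_psi(r, c) - g1_prime(mu, c)) * (-math.sqrt(mu))


def psi_rbd(x, y, beta, c=1.345):
    """Score psi_RBD(z; beta) = p_1(y; theta) w(x) x_tilde of Equation (7),
    with w(x) = 1 / (1 + ||x||_2)."""
    x_tilde = [1.0] + list(x)
    theta = sum(b * v for b, v in zip(beta, x_tilde))
    w = 1.0 / (1.0 + math.sqrt(sum(v * v for v in x)))
    scale = p1(y, theta, c) * w
    return [scale * v for v in x_tilde]


if __name__ == "__main__":
    beta = [0.0, 2.0]
    # A gross outlier y = 500: the robust score stays bounded, while a huge c
    # (so psi(r) = r in practice) mimics the classical, unbounded score.
    print(psi_rbd([0.3], 500, beta), psi_rbd([0.3], 500, beta, c=1e9))
```

Subtracting $G_1'(\mu)$ is what makes $E\{p_1(Y;\theta)\} = 0$ under the assumed model, i.e., the estimating equation stays unbiased even though $\psi$ clips large residuals.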

Furthermore, to conduct statistical inference for the “general linear model”, the following hypotheses are considered,

$$H_{0}: A_{n}\tilde{\beta}_{n,0} = g_{0} \quad \text{versus} \quad H_{1}: A_{n}\tilde{\beta}_{n,0} \neq g_{0}, \tag{8}$$

where $A_{n}$ is a given $k \times (p_{n}+1)$ matrix such that $A_{n}A_{n}^{T} \to G$ with $G$ being a $k \times k$ positive-definite matrix, and $g_{0}$ is a known $k \times 1$ vector.

To perform the test of Equation (8), Zhang et al. [1] proposed the Wald-type test statistic

$$W_{n} = n(A_{n}\hat{\tilde{\beta}} - g_{0})^{T}(A_{n}\hat{H}_{n}^{-1}\hat{\Omega}_{n}\hat{H}_{n}^{-1}A_{n}^{T})^{-1}(A_{n}\hat{\tilde{\beta}} - g_{0}), \tag{9}$$

constructed from the robust-BD estimator $\hat{\tilde{\beta}}$ in Equation (5), where

$$\begin{aligned} \hat{\Omega}_{n} &= \frac{1}{n}\sum_{i=1}^{n} p_{1}^{2}(Y_{i}; \tilde{X}_{ni}^{T}\hat{\tilde{\beta}})\, w^{2}(X_{ni})\, \tilde{X}_{ni}\tilde{X}_{ni}^{T}, \\ \hat{H}_{n} &= \frac{1}{n}\sum_{i=1}^{n} p_{2}(Y_{i}; \tilde{X}_{ni}^{T}\hat{\tilde{\beta}})\, w(X_{ni})\, \tilde{X}_{ni}\tilde{X}_{ni}^{T}. \end{aligned}$$
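The sandwich form of Equation (9) can be sketched in dependency-free Python. In this sketch the per-observation values $p_1(Y_i; \tilde X_{ni}^T\hat{\tilde\beta})$, $p_2(Y_i; \tilde X_{ni}^T\hat{\tilde\beta})$ and $w(X_{ni})$ are taken as inputs, since computing them requires the model-specific formulas of [1]; the helper names are ours, and the small Gauss-Jordan inverse is only meant for low-dimensional illustrations (a numerically stable linear solver would be preferable in practice).

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (small, well-conditioned M only)."""
    m = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(m)] for i, row in enumerate(M)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(m):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]


def sandwich_pieces(Xt, p1, p2, w):
    """Omega_hat = mean of p1^2 w^2 x x^T and H_hat = mean of p2 w x x^T, where Xt holds the
    rows x_tilde_i and p1, p2, w hold the per-observation values evaluated at beta_hat."""
    n, d = len(Xt), len(Xt[0])
    Omega = [[0.0] * d for _ in range(d)]
    H = [[0.0] * d for _ in range(d)]
    for x, a, b, wi in zip(Xt, p1, p2, w):
        for r in range(d):
            for s in range(d):
                Omega[r][s] += a * a * wi * wi * x[r] * x[s] / n
                H[r][s] += b * wi * x[r] * x[s] / n
    return Omega, H


def wald_statistic(n, A, beta_hat, g0, H_hat, Omega_hat):
    """W_n = n (A b - g0)^T (A H^{-1} Omega H^{-1} A^T)^{-1} (A b - g0), as in Equation (9)."""
    H_inv = inverse(H_hat)
    U_hat = matmul(matmul(matmul(A, H_inv), Omega_hat), matmul(H_inv, transpose(A)))
    diff = [row[0] - g for row, g in zip(matmul(A, [[b] for b in beta_hat]), g0)]
    U_inv = inverse(U_hat)
    k = len(diff)
    return n * sum(diff[i] * U_inv[i][j] * diff[j] for i in range(k) for j in range(k))


if __name__ == "__main__":
    # Toy inputs: k = 1 restriction on the slope, H_hat = diag(2, 4), Omega_hat = I.
    print(wald_statistic(100, [[0.0, 1.0]], [0.3, 0.2], [0.0],
                         [[2.0, 0.0], [0.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Under $H_0$, $W_n$ would then be compared with the $1-\alpha_0$ quantile of the $\chi^2_k$ distribution, $k$ being the number of rows of $A_n$.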

The asymptotic distributions of $W_{n}$ under the null and alternative hypotheses have been developed in [1]; see Theorems 4–6 therein.

However, the robustness of $W_{n}$ when it is applied to possibly contaminated data has not yet been studied. Section 3 of this paper addresses this issue with detailed derivations.

3. Robustness Properties of $W_{n}$ in Equation (9)

This section derives the influence function of the robust-BD estimator underlying the Wald-type test and studies the influence of a small amount of contamination on the asymptotic level and power of the test. The proofs of the theoretical results are given in Appendix A.

Denote by $K_{n,0}$ the true distribution of $Z_{n}$ following the “general linear model” characterized by Equations (1) and (2). To facilitate the discussion of robustness properties, we consider the $\epsilon$-contamination

$$K_{n,\epsilon} = \Big(1 - \frac{\epsilon}{\sqrt{n}}\Big)K_{n,0} + \frac{\epsilon}{\sqrt{n}}J, \tag{10}$$

where $J$ is an arbitrary distribution and $\epsilon > 0$ is a constant. Then, $K_{n,\epsilon}$ is a contaminated distribution of $Z_{n}$ with the amount of contamination converging to 0 at rate $1/\sqrt{n}$. Denote by $K_{n}$ the empirical distribution of $\{Z_{ni}\}_{i=1}^{n}$.
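Sampling from the mixture in Equation (10) amounts to drawing each observation from $J$ with probability $\epsilon/\sqrt{n}$ and from $K_{n,0}$ otherwise. The minimal Python sketch below makes this explicit; the samplers `draw_model` and `draw_J` are hypothetical user-supplied callables, not part of the paper.

```python
import math
import random


def sample_contaminated(n, eps, draw_model, draw_J, rng):
    """Draw n i.i.d. observations from K_{n,eps} = (1 - eps/sqrt(n)) K_{n,0} + (eps/sqrt(n)) J.
    draw_model and draw_J are user-supplied samplers for K_{n,0} and J."""
    prob = eps / math.sqrt(n)  # contamination probability, shrinking at rate 1/sqrt(n)
    if not 0.0 <= prob <= 1.0:
        raise ValueError("need 0 <= eps <= sqrt(n)")
    return [draw_J(rng) if rng.random() < prob else draw_model(rng) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Clean draws ~ N(0, 1); J is a point mass at 50 (an outlier).
    sample = sample_contaminated(10_000, 10.0, lambda r: r.gauss(0.0, 1.0), lambda r: 50.0, rng)
    print(sum(v == 50.0 for v in sample) / len(sample))  # roughly eps / sqrt(n) = 0.1
```

For fixed $\epsilon$, the expected number of contaminated points is $\epsilon\sqrt{n}$, so the contaminated fraction vanishes while the count still grows with $n$.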

For a generic distribution $K$ of $Z_{n}$, define

$$\begin{aligned} \ell_{K}(\tilde{\beta}) &= E_{K}\{\rho_{q}(Y, F^{-1}(\tilde{X}_{n}^{T}\tilde{\beta}))\, w(X_{n})\}, \\ S_{K} &= \{\tilde{\beta} : E_{K}\{\psi_{RBD}(Z_{n}; \tilde{\beta})\} = 0\}, \end{aligned} \tag{11}$$

where $\rho_{q}(\cdot, \cdot)$ and $\psi_{RBD}(\cdot; \cdot)$ are defined in Equations (4) and (7), respectively. It is worth noting that the solution to $E_{K}\{\psi_{RBD}(Z_{n}; \tilde{\beta})\} = 0$ may not be unique, i.e., $S_{K}$ may contain more than one element. We then define a functional for the estimator of $\tilde{\beta}_{n,0}$ as follows,

$$T(K) = \arg\min_{\tilde{\beta} \in S_{K}} \|\tilde{\beta} - \tilde{\beta}_{n,0}\|. \tag{12}$$

From Lemma A1 in Appendix A, $T(K_{n,\epsilon})$ is the unique local minimizer of $\ell_{K_{n,\epsilon}}(\tilde{\beta})$ in the $\sqrt{p_{n}/n}$-neighborhood of $\tilde{\beta}_{n,0}$. In particular, $T(K_{n,0}) = \tilde{\beta}_{n,0}$. Similarly, from Lemma A2 in Appendix A, $T(K_{n})$ is the unique local minimizer of $\ell_{K_{n}}(\tilde{\beta})$ that satisfies $\|T(K_{n}) - \tilde{\beta}_{n,0}\| = O_{P}(\sqrt{p_{n}/n})$.

From [23] (Equation (2.1.6) on p. 84), the influence function of $T(\cdot)$ at $K_{n,0}$ is defined as

$$\mathrm{IF}(z_{n}; T, K_{n,0}) = \frac{\partial}{\partial t} T((1 - t)K_{n,0} + t\Delta_{z_{n}})\Big|_{t=0} = \lim_{t \downarrow 0} \frac{T((1 - t)K_{n,0} + t\Delta_{z_{n}}) - \tilde{\beta}_{n,0}}{t},$$

where $\Delta_{z_{n}}$ is the probability measure which puts mass 1 at the point $z_{n}$. Since the dimension of $T(\cdot)$ diverges with $n$, its influence function is defined for each fixed $n$. From Lemma A8 in Appendix A, under certain regularity conditions, the influence function exists and has the following expression:

$$\mathrm{IF}(z_{n}; T, K_{n,0}) = -H_{n}^{-1}\psi_{RBD}(z_{n}; \tilde{\beta}_{n,0}), \tag{13}$$

where $H_{n} = E_{K_{n,0}}\{p_{2}(Y; \tilde{X}_{n}^{T}\tilde{\beta}_{n,0})\, w(X_{n})\, \tilde{X}_{n}\tilde{X}_{n}^{T}\}$. The form of the influence function for diverging $p_{n}$ in Equation (13) coincides with that in [23,24] for fixed $p_{n}$.
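Equation (13) implies $\|\mathrm{IF}(z_n; T, K_{n,0})\| \le \|H_n^{-1}\|_2\, \|\psi_{RBD}(z_n; \tilde\beta_{n,0})\|$ for each fixed $n$, so a bounded score gives a bounded influence function. The Python sketch below illustrates this for the linear model ($F(\mu) = \mu$, $V(\mu) = \sigma^2$, $q(\mu) = a\mu - \mu^2$, hence $q''(\mu)\sqrt{V(\mu)}/F'(\mu) = -2\sigma$), under the added assumption of symmetric errors so that $G_1'(\mu) = 0$ for an odd $\psi$. It compares the Huber $\psi_c$ with $w(x) = 1/(1+\|x\|)$ against the classical choice $\psi(r) = r$, $w \equiv 1$. Since $\|w(x)\tilde x\| = (1+\|x\|^2)^{1/2}/(1+\|x\|) \le 1$, the robust score norm never exceeds $2\sigma c$. Names are illustrative.

```python
import math


def huber_psi(r, c=1.345):
    return max(-c, min(c, r))


def score_linear(x, y, beta, sigma=1.0, c=1.345, robust=True):
    """psi_RBD(z; beta) for the linear model with q(mu) = a*mu - mu^2, V = sigma^2, F(mu) = mu,
    so p_1 = -2*sigma*psi(r). Assumes symmetric errors, hence G_1'(mu) = 0 for odd psi."""
    x_tilde = [1.0] + list(x)
    theta = sum(b * v for b, v in zip(beta, x_tilde))
    r = (y - theta) / sigma  # Pearson residual
    if robust:
        psi, w = huber_psi(r, c), 1.0 / (1.0 + math.hypot(*x))  # Huber psi, w(x) = 1/(1+||x||)
    else:
        psi, w = r, 1.0  # classical: psi(r) = r, w = 1
    p1 = -2.0 * sigma * psi
    return [p1 * w * v for v in x_tilde]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


if __name__ == "__main__":
    beta = [0.0, 1.0, 1.0]
    for t in (1.0, 10.0, 100.0, 1000.0):
        x, y = [t, t], 10.0 * t  # outlying in both x and y
        print(t, norm(score_linear(x, y, beta)), norm(score_linear(x, y, beta, robust=False)))
```

The classical score grows without bound along this ray of outliers, while the robust one stays below $2\sigma c \approx 2.69$ (for $\sigma = 1$).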

In our theoretical derivations, approximations of the asymptotic level and power of $W_{n}$ involve the following matrices:

$$\begin{aligned} \Omega_{n} &= E_{K_{n,0}}\{p_{1}^{2}(Y; \tilde{X}_{n}^{T}\tilde{\beta}_{n,0})\, w^{2}(X_{n})\, \tilde{X}_{n}\tilde{X}_{n}^{T}\}, \\ U_{n} &= A_{n}H_{n}^{-1}\Omega_{n}H_{n}^{-1}A_{n}^{T}. \end{aligned}$$

3.1. Asymptotic Level of $W_{n}$ under Contamination

We now investigate the asymptotic level of the Wald-type test $W_{n}$ under the $\epsilon$-contamination.

Theorem 1.

Assume Conditions A0–A9 and B4 in Appendix A. Suppose that $p_{n}^{6}/n \to 0$ as $n \to \infty$ and $\sup_{n} E_{J}(\|w(X_{n})\tilde{X}_{n}\|) \leq C$. Denote by $\alpha(K_{n,\epsilon})$ the level of

$$W_{n} = n\{A_{n}T(K_{n}) - g_{0}\}^{T}(A_{n}\hat{H}_{n}^{-1}\hat{\Omega}_{n}\hat{H}_{n}^{-1}A_{n}^{T})^{-1}\{A_{n}T(K_{n}) - g_{0}\}$$

when the underlying distribution is $K_{n,\epsilon}$ in Equation (10), and by $\alpha_{0}$ the nominal level. Under $H_{0}$ in Equation (8), it follows that

$$\limsup_{n \to \infty} \alpha(K_{n,\epsilon}) = \alpha_{0} + \epsilon^{2}\mu_{k}D + o(\epsilon^{2}) \quad \text{as } \epsilon \to 0,$$

where

$$D = \limsup_{n \to \infty} \|U_{n}^{-1/2}A_{n}E_{J}\{\mathrm{IF}(Z_{n}; T, K_{n,0})\}\|^{2} < \infty,$$

$\mu_{k} = -\frac{\partial}{\partial\delta}H_{k}(\eta_{1-\alpha_{0}}; \delta)\big|_{\delta=0}$, $H_{k}(\cdot; \delta)$ is the cumulative distribution function of a $\chi_{k}^{2}(\delta)$ distribution, and $\eta_{1-\alpha_{0}}$ is the $1 - \alpha_{0}$ quantile of the central $\chi_{k}^{2}$ distribution.
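Evaluating the expansion in Theorem 1 requires $\eta_{1-\alpha_0}$ and $\mu_k$. The sketch below (Python standard library only; names are ours) uses the standard Poisson-mixture representation of the noncentral $\chi^2_k(\delta)$ CDF, $H_k(x;\delta) = \sum_{j\ge 0} e^{-\delta/2}(\delta/2)^j/j!\; H_{k+2j}(x;0)$. Differentiating term by term gives $\partial H_k(x;\delta)/\partial\delta = -\{H_k(x;\delta) - H_{k+2}(x;\delta)\}/2$, and at $\delta = 0$ this reduces to $\mu_k = f_{k+2}(\eta_{1-\alpha_0})$, the $\chi^2_{k+2}$ density at $\eta_{1-\alpha_0}$. The constant $D$ is treated as a user-supplied number, and `approx_level` simply drops the $o(\epsilon^2)$ remainder, so it is only a first-order heuristic.

```python
import math


def chi2_cdf(x, k):
    """Central chi-square_k CDF via the series for the regularized lower gamma P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    s, t = k / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= t / (s + n)
        total += term
    return math.exp(s * math.log(t) - t - math.lgamma(s)) * total


def ncx2_cdf(x, k, delta, terms=200):
    """Noncentral CDF H_k(x; delta) as a Poisson(delta/2) mixture of central CDFs."""
    if delta == 0:
        return chi2_cdf(x, k)
    lam = delta / 2.0
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1)) * chi2_cdf(x, k + 2 * j)
               for j in range(terms))


def chi2_quantile(p, k):
    """The p-quantile of the central chi-square_k, found by bisection."""
    lo, hi = 0.0, k + 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_k(k, alpha0):
    """mu_k = -dH_k(eta; delta)/d delta at delta = 0 = chi-square_{k+2} density at eta_{1-alpha0}."""
    eta = chi2_quantile(1.0 - alpha0, k)
    return math.exp((k / 2.0) * math.log(eta) - eta / 2.0
                    - (k / 2.0 + 1.0) * math.log(2.0) - math.lgamma(k / 2.0 + 1.0))


def approx_level(alpha0, eps, k, D):
    """First-order approximation alpha0 + eps^2 * mu_k * D from Theorem 1 (o(eps^2) dropped)."""
    return alpha0 + eps ** 2 * mu_k(k, alpha0) * D


if __name__ == "__main__":
    print(chi2_quantile(0.95, 1), mu_k(1, 0.05))
```

For $k = 1$ and $\alpha_0 = 0.05$ this gives $\eta_{0.95} \approx 3.841$ and $\mu_1 = z\,\phi(z) \approx 0.1146$ with $z = 1.96$, so the level inflation is roughly $0.115\,\epsilon^2 D$ for small $\epsilon$.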

Theorem 1 indicates that if the influence function of $T(\cdot)$ is bounded, then the asymptotic level of $W_{n}$ under the $\epsilon$-contamination is also bounded and close to the nominal level when $\epsilon$ is sufficiently small. For comparison, the robustness property of the Wald-type test in [10] is studied based on M-estimators for general parametric models with a fixed dimension $p_{n}$. They assumed certain conditions that guarantee Fréchet differentiability, which further implies the existence of the influence function and the asymptotic normality of the corresponding estimator. However, in the set-up of our paper, it is difficult to check those conditions, due to the use of Bregman divergence and the diverging dimension $p_{n}$. Hence, the assumptions we make in Theorem 1 are different from those in [10], and are comparatively mild and easy to check. Moreover, the result of Theorem 1 cannot be easily derived from that of [10].

In Theorem 1, $p_{n}$ is allowed to diverge with $p_{n}^{6}/n = o(1)$, which is a slower rate than that allowed in [1], namely $p_{n}^{5}/n = o(1)$. Theoretically, the assumption $p_{n}^{5}/n = o(1)$ is required to obtain the asymptotic distribution of $W_{n}$ in [1]. Furthermore, to derive the limit distribution of $W_{n}$ under the $\epsilon$-contamination, the assumption $p_{n}^{6}/n = o(1)$ is needed (see Lemma A7 in Appendix A). Hence, our assumption is stronger than that in [1] because we account for the $\epsilon$-contamination of the data. Practically, due to advances in technology and new forms of data gathering, high dimensionality has become a common characteristic of data, and hence the varying-dimensional model has a wide range of applications, e.g., brain imaging data, financial data, web term-document data and gene expression data. Even some classical settings, e.g., the Framingham heart study with $n = 25{,}000$ and $p_{n} = 100$, can be viewed as varying-dimensional cases.

As an illustration, we apply the general result of Theorem 1 to the special case of a point mass contamination.

Corollary 1.

With the notations in Theorem 1, assume Conditions A0–A9 in Appendix A, $\sup_{x_{n} \in \mathbb{R}^{p_{n}}}\|w(x_{n})x_{n}\| \leq C$ and $\sup_{\mu \in \mathbb{R}}|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \leq C$.

(i): If $p_{n} \equiv p$, $A_{n} \equiv A$, $\tilde{\beta}_{n,0} \equiv \tilde{\beta}_{0}$, $K_{n,0} \equiv K_{0}$ and $U_{n} \equiv U$ are fixed, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n})K_{0} + (\epsilon/\sqrt{n})\Delta_{z}$ with $z \in \mathbb{R}^{p}$ a fixed point, under $H_{0}$ in Equation (8), it follows that

$$\sup_{z \in \mathbb{R}^{p}} \lim_{n \to \infty} \alpha(K_{n,\epsilon}) = \alpha_{0} + \epsilon^{2}\mu_{k}D_{1} + o(\epsilon^{2}) \quad \text{as } \epsilon \to 0,$$

where

$$D_{1} = \sup_{z \in \mathbb{R}^{p}} \|U^{-1/2}A\,\mathrm{IF}(z; T, K_{0})\|^{2} < \infty.$$

(ii): If $p_{n}$ diverges with $p_{n}^{6}/n \to 0$, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n})K_{n,0} + (\epsilon/\sqrt{n})\Delta_{z_{n}}$ with $z_{n} \in \mathbb{R}^{p_{n}}$ a sequence of deterministic points, under $H_{0}$ in Equation (8),

$$\sup_{C_{0} > 0}\sup_{z_{n} \in S_{C_{0}}}\limsup_{n \to \infty} \alpha(K_{n,\epsilon}) = \alpha_{0} + \epsilon^{2}\mu_{k}D_{2} + o(\epsilon^{2}) \quad \text{as } \epsilon \to 0,$$

where $S_{C_{0}} = \{z_{n} = (x_{n}^{T}, y)^{T} : \|x_{n}\|_{\infty} \leq C_{0}\}$, $C_{0} > 0$ is a constant and

$$D_{2} = \sup_{C_{0} > 0}\sup_{z_{n} \in S_{C_{0}}}\limsup_{n \to \infty} \|U_{n}^{-1/2}A_{n}\,\mathrm{IF}(z_{n}; T, K_{n,0})\|^{2} < \infty.$$

In Corollary 1, the conditions $\sup_{x_{n} \in \mathbb{R}^{p_{n}}}\|w(x_{n})x_{n}\| \leq C$ and $\sup_{\mu \in \mathbb{R}}|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \leq C$ are needed to guarantee the boundedness of the score function in Equation (7). In particular, the function $w(x_{n})$ downweights high-leverage points and can be chosen as, e.g., $w(x_{n}) = 1/(1 + \|x_{n}\|)$. The condition $\sup_{\mu \in \mathbb{R}}|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \leq C$ is needed to bound $p_{1}(y; \theta)$ in Equation (6), and is satisfied in many situations.

For example, for the linear model with $q(\mu) = a\mu - \mu^{2}$, $V(\mu) = \sigma^{2}$ and $F(\mu) = \mu$, where $a$ and $\sigma^{2}$ are constants, we observe $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 2\sigma \leq C$.
Another example is the logistic regression model with binary response and $q(\mu) = -2\{\mu\log(\mu) + (1 - \mu)\log(1 - \mu)\}$ (corresponding to the Bernoulli deviance loss), $V(\mu) = \mu(1 - \mu)$ and $F(\mu) = \log\{\mu/(1 - \mu)\}$. In this case, $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 2\{\mu(1 - \mu)\}^{1/2} \leq C$ since $\mu \in [0, 1]$. Likewise, if $q(\mu) = 2\{\mu(1 - \mu)\}^{1/2}$ (for the exponential loss) with the same $V$ and $F$, then $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| = 1/2$.
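The three closed-form values above can be double-checked numerically. The sketch below approximates $q''(\mu)$ by a central second difference (the step $h = 10^{-4}$ is an arbitrary choice) and evaluates $|q''(\mu)\sqrt{V(\mu)}/F'(\mu)|$ for the linear model, using illustrative values $a = 3$ and $\sigma = 1.5$, and for the deviance and exponential generators under the logit link with $V(\mu) = \mu(1-\mu)$.

```python
import math

SIGMA, A_CONST = 1.5, 3.0  # illustrative values of sigma and a


def ratio(q, V, dF, mu, h=1e-4):
    """|q''(mu) sqrt(V(mu)) / F'(mu)|, with q'' from a central second difference."""
    q2 = (q(mu + h) - 2.0 * q(mu) + q(mu - h)) / (h * h)
    return abs(q2 * math.sqrt(V(mu)) / dF(mu))


def v_binary(m):
    return m * (1.0 - m)


def dlogit(m):
    # F(mu) = log{mu / (1 - mu)}  =>  F'(mu) = 1 / {mu (1 - mu)}
    return 1.0 / (m * (1.0 - m))


LINEAR = (lambda m: A_CONST * m - m * m, lambda m: SIGMA ** 2, lambda m: 1.0)
DEVIANCE = (lambda m: -2.0 * (m * math.log(m) + (1.0 - m) * math.log(1.0 - m)), v_binary, dlogit)
EXPONENTIAL = (lambda m: 2.0 * math.sqrt(m * (1.0 - m)), v_binary, dlogit)

if __name__ == "__main__":
    for mu in (0.1, 0.3, 0.5, 0.9):
        # Expected, up to finite-difference error: 2*sigma = 3.0, 2*sqrt(mu(1-mu)) and 0.5.
        print(mu, ratio(*LINEAR, mu), ratio(*DEVIANCE, mu), ratio(*EXPONENTIAL, mu))
```

In the exponential case the cancellation is exact: $q''(\mu) = -\tfrac{1}{2}\{\mu(1-\mu)\}^{-3/2}$, while $\sqrt{V(\mu)}/F'(\mu) = \{\mu(1-\mu)\}^{3/2}$.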

Furthermore, the bound on $\psi(\cdot)$ is useful to control deviations in the $Y$-space, which ensures the stability of the robust-BD test if $Y$ is arbitrarily contaminated.

Concerning the dimensionality $p_{n}$, Corollary 1 reveals the following implications. If $p_{n}$ is fixed, then the asymptotic level of $W_{n}$ under the $\epsilon$-contamination is uniformly bounded for all $z \in \mathbb{R}^{p}$, which implies the robustness of validity of the test. This result coincides with that in Proposition 5 of [10]. When $p_{n}$ diverges, the asymptotic level is still stable if the point contamination satisfies $\|x_{n}\|_{\infty} \leq C_{0}$, where $C_{0} > 0$ is an arbitrary constant. Although this condition may not be the weakest, it still covers a wide range of point mass contaminations.

3.2. Asymptotic Power of $W_{n}$ under Contamination

We now study the asymptotic power of $W_{n}$ under a sequence of contiguous alternatives of the form

$$H_{1n}: A_{n}\tilde{\beta}_{n,0} - g_{0} = n^{-1/2}c, \tag{14}$$

where $c = (c_{1}, \dots, c_{k})^{T} \neq 0$ is fixed.

Theorem 2.

Assume Conditions A0–A9 and B4 in Appendix A. Suppose that $p_{n}^{6}/n \to 0$ as $n \to \infty$ and $\sup_{n} E_{J}(\|w(X_{n})\tilde{X}_{n}\|) \leq C$. Denote by $\beta(K_{n,\epsilon})$ the power of

$$W_{n} = n\{A_{n}T(K_{n}) - g_{0}\}^{T}(A_{n}\hat{H}_{n}^{-1}\hat{\Omega}_{n}\hat{H}_{n}^{-1}A_{n}^{T})^{-1}\{A_{n}T(K_{n}) - g_{0}\}$$

when the underlying distribution is $K_{n,\epsilon}$ in Equation (10), and by $\beta_{0}$ the nominal power. Under $H_{1n}$ in Equation (14), it follows that

$$\liminf_{n \to \infty} \beta(K_{n,\epsilon}) = \beta_{0} + \epsilon\nu_{k}B + o(\epsilon) \quad \text{as } \epsilon \to 0,$$

where

$$B = \liminf_{n \to \infty} 2c^{T}U_{n}^{-1}A_{n}E_{J}\{\mathrm{IF}(Z_{n}; T, K_{n,0})\},$$

with $|B| < \infty$, $\nu_{k} = -\frac{\partial}{\partial\delta}H_{k}(\eta_{1-\alpha_{0}}; \delta)\big|_{\delta = c^{T}U_{n}^{-1}c}$, and $H_{k}(\cdot; \delta)$ and $\eta_{1-\alpha_{0}}$ as defined in Theorem 1.
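For a single restriction ($k = 1$, as for the one-row $A_n$ used in Section 4.1), the noncentral $\chi^2_1(\delta)$ CDF has the closed form $H_1(x;\delta) = \Phi(\sqrt{x} - \sqrt{\delta}) - \Phi(-\sqrt{x} - \sqrt{\delta})$, which gives $\nu_1 = \{\phi(z - s) - \phi(z + s)\}/(2s)$ with $z = \sqrt{\eta_{1-\alpha_0}}$ and $s = \sqrt{\delta}$. The sketch below evaluates this with Python's `statistics.NormalDist`. It assumes that the nominal power is $\beta_0 = 1 - H_1(\eta_{1-\alpha_0}; \delta)$ with $\delta = c^T U^{-1} c$ supplied by the user; $B$ is likewise a user-supplied number, and the $o(\epsilon)$ term is dropped.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def h1_cdf(x, delta):
    """H_1(x; delta) = P{(Z + sqrt(delta))^2 <= x}, the noncentral chi-square_1(delta) CDF."""
    rx, rd = math.sqrt(x), math.sqrt(delta)
    return PHI.cdf(rx - rd) - PHI.cdf(-rx - rd)


def nominal_power(delta, alpha0=0.05):
    """1 - H_1(eta; delta), where eta = z_{1-alpha0/2}^2 is the chi-square_1 quantile."""
    eta = PHI.inv_cdf(1.0 - alpha0 / 2.0) ** 2
    return 1.0 - h1_cdf(eta, delta)


def nu_1(delta, alpha0=0.05):
    """nu_1 = -dH_1(eta; delta)/d delta = {phi(z - s) - phi(z + s)} / (2 s), s = sqrt(delta) > 0."""
    z, s = PHI.inv_cdf(1.0 - alpha0 / 2.0), math.sqrt(delta)
    return (PHI.pdf(z - s) - PHI.pdf(z + s)) / (2.0 * s)


def approx_power(delta, eps, B, alpha0=0.05):
    """First-order approximation beta_0 + eps * nu_1 * B from Theorem 2 (o(eps) dropped)."""
    return nominal_power(delta, alpha0) + eps * nu_1(delta, alpha0) * B


if __name__ == "__main__":
    delta = (PHI.inv_cdf(0.975) + PHI.inv_cdf(0.8)) ** 2  # noncentrality giving ~80% power
    print(nominal_power(delta), nu_1(delta), approx_power(delta, 0.1, -1.0))
```

Since $\nu_1 > 0$, a contamination with $B < 0$ lowers the first-order power and one with $B > 0$ raises it; boundedness of the influence function keeps $|B|$ finite.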

The result for the asymptotic power is similar in spirit to that for the level. From Theorem 2, if the influence function is bounded, the asymptotic power is also bounded from below and close to the nominal power under a small amount of contamination. This means that the robust-BD Wald-type test enjoys the robustness of efficiency. In addition, an analogous property of the asymptotic power holds for point mass contamination.

Corollary 2.

With the notations in Theorem 2, assume Conditions A0–A9 in Appendix A, $\sup_{x_{n} \in \mathbb{R}^{p_{n}}}\|w(x_{n})x_{n}\| \leq C$ and $\sup_{\mu \in \mathbb{R}}|q''(\mu)\sqrt{V(\mu)}/F'(\mu)| \leq C$.

(i): If $p_{n} \equiv p$, $A_{n} \equiv A$, $\tilde{\beta}_{n,0} \equiv \tilde{\beta}_{0}$, $K_{n,0} \equiv K_{0}$ and $U_{n} \equiv U$ are fixed, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n})K_{0} + (\epsilon/\sqrt{n})\Delta_{z}$ with $z \in \mathbb{R}^{p}$ a fixed point, under $H_{1n}$ in Equation (14), it follows that

$$\inf_{z \in \mathbb{R}^{p}} \lim_{n \to \infty} \beta(K_{n,\epsilon}) = \beta_{0} + \epsilon\nu_{k}B_{1} + o(\epsilon) \quad \text{as } \epsilon \to 0,$$

where

$$B_{1} = \inf_{z \in \mathbb{R}^{p}} 2c^{T}U^{-1}A\,\mathrm{IF}(z; T, K_{0}),$$

with $|B_{1}| < \infty$.

(ii): If $p_{n}$ diverges with $p_{n}^{6}/n \to 0$, then, for $K_{n,\epsilon} = (1 - \epsilon/\sqrt{n})K_{n,0} + (\epsilon/\sqrt{n})\Delta_{z_{n}}$ with $z_{n} \in \mathbb{R}^{p_{n}}$ a sequence of deterministic points, under $H_{1n}$ in Equation (14),

$$\inf_{C_{0} > 0}\inf_{z_{n} \in S_{C_{0}}}\liminf_{n \to \infty} \beta(K_{n,\epsilon}) = \beta_{0} + \epsilon\nu_{k}B_{2} + o(\epsilon) \quad \text{as } \epsilon \to 0,$$

where $S_{C_{0}} = \{z_{n} = (x_{n}^{T}, y)^{T} : \|x_{n}\|_{\infty} \leq C_{0}\}$, $C_{0} > 0$ is a constant and

$$B_{2} = \inf_{C_{0} > 0}\inf_{z_{n} \in S_{C_{0}}}\liminf_{n \to \infty} 2c^{T}U_{n}^{-1}A_{n}\,\mathrm{IF}(z_{n}; T, K_{n,0}),$$

with $|B_{2}| < \infty$.

4. Simulation

Regarding the practical utility of $W_{n}$, numerical studies concerning the empirical level and power of $W_{n}$ under a fixed amount of contamination have been conducted in Section 6 of [1]. To support the theoretical results in our paper, we conduct new simulations to check the robustness of validity and efficiency of $W_{n}$. Specifically, we examine the empirical level and power of the test statistic as $\epsilon$ varies.

The robust-BD estimation utilizes the Huber $\psi$-function $\psi_{c}(\cdot)$ with $c = 1.345$ and the weight function $w(X_{n}) = 1/(1 + \|X_{n}\|)$. Comparisons are made with the classical non-robust counterparts corresponding to using $\psi(r) = r$ and $w(x_{n}) \equiv 1$. For each situation below, we set $n = 1000$ and conduct 400 replications.

4.1. Overdispersed Poisson Responses

Overdispersed Poisson counts $Y$, satisfying $\mathrm{var}(Y \mid X_{n} = x_{n}) = 2m(x_{n})$, are generated via a negative Binomial $(m(x_{n}), 1/2)$ distribution. Let $p_{n} = \lfloor 4(n^{1/5.5} - 1) \rfloor$ and $\tilde{\beta}_{n,0} = (0, 2, 0, \dots, 0)^{T}$, where $\lfloor\cdot\rfloor$ denotes the floor function. Generate $X_{ni} = (X_{i,1}, \dots, X_{i,p_{n}})^{T}$ by $X_{i,j} \overset{i.i.d.}{\sim} \mathrm{Unif}[-0.5, 0.5]$. The log link function is considered, and the (negative) quasi-likelihood is utilized as the BD, generated by the q-function in Equation (3) with $V(\mu) = \mu$. The estimator and test statistic are calculated by assuming that $Y$ follows a Poisson distribution.
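The data-generating step above can be sketched as follows, assuming the usual size/probability parameterization under which NB$(m, 1/2)$ has mean $m$ and variance $2m$. Because the Python standard library has no negative binomial sampler, the sketch uses the Gamma-Poisson mixture: $\lambda \sim$ Gamma(shape $m$, scale 1) and $Y \mid \lambda \sim$ Poisson($\lambda$), whose mean is $m$ and variance $m + m = 2m$. The function names and the seed are our own choices, not the authors' code.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means in this design."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def neg_binomial_half(m, rng):
    """NB draw with mean m and variance 2m via the Gamma(shape=m, scale=1)-Poisson mixture."""
    return poisson_draw(rng.gammavariate(m, 1.0), rng)


def simulate_row(beta, p, rng):
    """One (x, y) pair: x_j ~ Unif[-0.5, 0.5], log m(x) = beta_0 + x^T beta_{1:p}."""
    x = [rng.uniform(-0.5, 0.5) for _ in range(p)]
    theta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return x, neg_binomial_half(math.exp(theta), rng)


if __name__ == "__main__":
    rng = random.Random(2018)
    p = math.floor(4 * (1000 ** (1 / 5.5) - 1))  # p_n for n = 1000
    beta = [0.0, 2.0] + [0.0] * (p - 1)          # tilde beta_{n,0} = (0, 2, 0, ..., 0)^T
    print(p, simulate_row(beta, p, rng))
```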

The data are contaminated by $X_{i,\,\mathrm{mod}(i, p_{n} - 1) + 1}^{*} = 3\,\mathrm{sign}(U_{i} - 0.5)$ and $Y_{i}^{*} = Y_{i}I(Y_{i} > 20) + 20I(Y_{i} \leq 20)$ for $i = 1, \dots, k$, with $k \in \{2, 4, 6, 8, 10, 12, 14, 16\}$ the number of contaminated data points, where $\mathrm{mod}(a, b)$ is the modulo operation “a modulo b” and $\{U_{i}\} \overset{i.i.d.}{\sim} \mathrm{Unif}(0, 1)$. Then, the proportion of contaminated data, $k/n$, is equal to $\epsilon/\sqrt{n}$ as in Equation (10), which implies $\epsilon = k/\sqrt{n}$.
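The contamination rule and the implied values of $\epsilon$ can be written out directly. This sketch (function names are ours) applies the rule to the first $k$ rows using the 1-based indices of the text, leaves the input untouched, and also evaluates $p_n = \lfloor 4(n^{1/5.5} - 1)\rfloor$.

```python
import math
import random


def dimension(n):
    """p_n = floor(4 (n^{1/5.5} - 1)) from Section 4.1."""
    return math.floor(4 * (n ** (1 / 5.5) - 1))


def contaminate(X, Y, k, rng):
    """Copies of X (list of rows) and Y with observations i = 1..k contaminated:
    X*_{i, mod(i, p-1)+1} = 3 sign(U_i - 0.5) and Y*_i = Y_i if Y_i > 20 else 20."""
    p = len(X[0])
    Xc, Yc = [row[:] for row in X], list(Y)
    for i in range(1, k + 1):
        col = i % (p - 1) + 1  # 1-based column index mod(i, p - 1) + 1
        u = rng.random()       # U_i ~ Unif(0, 1); the event U_i = 0.5 is ignored
        Xc[i - 1][col - 1] = 3.0 if u > 0.5 else -3.0
        Yc[i - 1] = Yc[i - 1] if Yc[i - 1] > 20 else 20
    return Xc, Yc


n = 1000
eps_grid = {k: k / math.sqrt(n) for k in (2, 4, 6, 8, 10, 12, 14, 16)}  # eps = k / sqrt(n)

if __name__ == "__main__":
    print(dimension(n), eps_grid)
```

For $n = 1000$ this gives $p_n = 10$ and $\epsilon$ ranging from about 0.063 ($k = 2$) to about 0.506 ($k = 16$).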

Consider the null hypothesis $H_{0}: A_{n}\tilde{\beta}_{n,0} = 0$ with $A_{n} = (0, 0, 0, 1, 0, \dots, 0)$. Figure 1 plots the empirical level of $W_{n}$ versus $\epsilon$. We observe that the asymptotic nominal level 0.05 is approximately retained by the robust Wald-type test. On the other hand, under contamination, the non-robust Wald-type test breaks down in level, showing high sensitivity to the presence of outliers.

To assess the stability of the power of the test, we generate the original data from the true model, but with the true parameter $\tilde{\beta}_{n,0}$ replaced by $\tilde{\beta}_{n} = \tilde{\beta}_{n,0} + \delta c$ with $\delta \in \{-0.4, 0.4, -0.6, 0.6\}$ and $c = (1, \dots, 1)^{T}$ a vector of ones. Figure 2 plots the empirical rejection rates of the null hypothesis, which indicate that the robust Wald-type test has sufficiently large power to detect the alternative hypothesis. In addition, the power of the robust method is generally larger than that of the non-robust method.

4.2. Bernoulli Responses

We generate data with two classes from the model $Y \mid X_{n} = x_{n} \sim \operatorname{Bernoulli} \{m (x_{n})\}$, where $\operatorname{logit} \{m (x_{n})\} = {\tilde{x}}_{n}^{T} {\tilde{β}}_{n, 0}$. Let $p_{n} = 2$, ${\tilde{β}}_{n, 0} = (0, 1, 1)^{T}$ and $X_{n i} \overset{\text{i.i.d.}}{\sim} N (0, I_{p_{n}})$. The null hypothesis is $H_{0} : {\tilde{β}}_{n, 0} = (0, 1, 1)^{T}$. Both the deviance loss and the exponential loss are employed as the BD. We contaminate the data by setting $X_{i, 1}^{*} = 2 + i / 8$ and $Y_{i}^{*} = 0$ for $i = 1, \dots, k$ with $k \in \{2, 4, 6, 8, 10, 12, 14, 16\}$.

To investigate the robustness of validity of $W_{n}$, we plot the observed level versus $ϵ$ in Figure 3. We find that the level of the non-robust method departs rapidly from the nominal level as $ϵ$ increases. It is also clear that the empirical level of the robust method is close to the nominal level when $ϵ$ is small and increases only slightly with $ϵ$, which agrees with our results in Theorem 1.

To assess the stability of the power of $W_{n}$, we generate the original data from the true model, but with the true parameter ${\tilde{β}}_{n, 0}$ replaced by ${\tilde{β}}_{n} = {\tilde{β}}_{n, 0} + δ c$ with $δ \in \{-0.1, 0.2, -0.3, 0.4\}$ and $c = (1, \dots, 1)^{T}$ a vector of ones. Figure 4 plots the power of the Wald-type test versus $ϵ$, which indicates that the robust method has sufficiently large power and hence supports the theoretical results in Theorem 2.

Acknowledgments

We thank the two referees for insightful comments and suggestions. Chunming Zhang’s research is supported by the U.S. NSF Grants DMS-1712418 and DMS-1521761, the Wisconsin Alumni Research Foundation and the National Natural Science Foundation of China, grant 11690014. Xiao Guo’s research is supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China, grants 11601500, 11671374 and 11771418.

Author Contributions

Chunming Zhang conceived and designed the experiments; Xiao Guo performed the experiments; Xiao Guo analyzed the data; Chunming Zhang contributed to analysis tools; Chunming Zhang and Xiao Guo wrote the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Conditions and Proofs of Main Results

We first introduce some notation used in the proofs.

Notations.

For arbitrary distributions $K$ and $K^{'}$ of $Z_{n}$, define

\begin{matrix} Ω_{n, K, T (K^{'})} & = & E_{K} \{p_{1}^{2} (Y; {\tilde{X}}_{n}^{T} T (K^{'})) w^{2} (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\}, \\ H_{n, K, T (K^{'})} & = & E_{K} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K^{'})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} . \end{matrix}

In particular, $Ω_{n} = Ω_{n, K_{n, 0}, {\tilde{β}}_{n, 0}}$, $H_{n} = H_{n, K_{n, 0}, {\tilde{β}}_{n, 0}}$, ${\hat{Ω}}_{n} = Ω_{n, K_{n}, T (K_{n})}$ and ${\hat{H}}_{n} = H_{n, K_{n}, T (K_{n})}$. For notational simplicity, let $Ω_{n, ϵ} = Ω_{n, K_{n, ϵ}, T (K_{n, ϵ})}$ and $H_{n, ϵ} = H_{n, K_{n, ϵ}, T (K_{n, ϵ})}$.

Define the following matrices,

\begin{matrix} U (K_{n, ϵ}) & = & A_{n} H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} A_{n}^{T}, \\ U (K_{n}) & = & A_{n} {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} A_{n}^{T} . \end{matrix}
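These matrices have direct empirical counterparts. As a hedged sketch, the function below forms ${\hat{Ω}}_{n}$, ${\hat{H}}_{n}$ and $U (K_{n})$ from per-observation values of $p_{1}$, $p_{2}$ and $w$, which are assumed to be precomputed at the fitted ${\hat{\tilde{β}}}_{n}$ for whichever Bregman divergence and $ψ$ are in use. The quadratic form $W_{n} = n (A_{n} {\hat{\tilde{β}}}_{n} - g_{0})^{T} {U (K_{n})}^{- 1} (A_{n} {\hat{\tilde{β}}}_{n} - g_{0})$ in `wald_statistic` is an assumed form of the Wald-type statistic (with a hypothetical null value $g_{0}$), and both function names are illustrative.

```python
import numpy as np


def sandwich_U(p1, p2, w, X_tilde, A):
    """Empirical U(K_n) = A H^{-1} Omega H^{-1} A^T.

    p1, p2 : (n,) values of p_1(Y_i; tilde-X_i^T beta-hat) and p_2(...), precomputed
    w      : (n,) weights w(X_i)
    X_tilde: (n, p_n + 1) design with a leading column of ones
    A      : (k, p_n + 1) constraint matrix
    """
    n = X_tilde.shape[0]
    Omega_hat = (X_tilde * (p1**2 * w**2)[:, None]).T @ X_tilde / n  # mean of p1^2 w^2 x x^T
    H_hat = (X_tilde * (p2 * w)[:, None]).T @ X_tilde / n            # mean of p2 w x x^T
    H_inv_At = np.linalg.solve(H_hat, A.T)       # H^{-1} A^T without forming the inverse
    return H_inv_At.T @ Omega_hat @ H_inv_At     # A H^{-1} Omega H^{-1} A^T (H symmetric)


def wald_statistic(beta_hat, A, g0, U, n):
    """Assumed form W_n = n (A beta_hat - g0)^T U^{-1} (A beta_hat - g0)."""
    d = A @ beta_hat - g0
    return float(n * d @ np.linalg.solve(U, d))
```

For instance, in the non-robust logistic case (deviance loss with the logit link, $ψ (r) = r$ and $w \equiv 1$, assumed here for illustration), one can pass `p1 = -(Y - mu)` and `p2 = mu * (1 - mu)`; a common constant factor in the loss cancels in $H^{- 1} Ω H^{- 1}$.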

The following conditions, adopted from [1], are needed in the proofs.

Condition A.

A0.: $\sup_{n \geq 1} ∥ {\tilde{β}}_{n, 0} ∥_{1} < \infty$.
A1.: $w (\cdot)$ is a bounded function. $ψ (r)$ is a bounded, odd, twice differentiable function such that $ψ^{'} (r)$, $ψ^{'} (r) r$, $ψ^{″} (r)$, $ψ^{″} (r) r$ and $ψ^{″} (r) r^{2}$ are bounded; $V (\cdot) > 0$ and $V^{(2)} (\cdot)$ is continuous.
A2.: $q^{(4)} (\cdot)$ is continuous, and $q^{(2)} (\cdot) < 0$ . $G_{1}^{(3)}$ is continuous.
A3.: $F (\cdot)$ is monotone and a bijection, $F^{(3)} (\cdot)$ is continuous, and $F^{(1)} (\cdot) \neq 0$ .
A4.: $∥ X_{n} ∥_{\infty} \leq C$ almost surely if the underlying distribution is $K_{n, 0}$ .
A5.: $E_{K_{n, 0}} ({\tilde{X}}_{n} {\tilde{X}}_{n}^{T})$ exists and is nonsingular.
A6.: There is a large enough open subset of $R^{p_{n} + 1}$ which contains ${\tilde{β}}_{n, 0}$ , such that $F^{- 1} ({\tilde{x}}_{n}^{T} \tilde{β})$ is bounded for all $\tilde{β}$ in the subset and all ${\tilde{x}}_{n}$ such that $∥ {\tilde{x}}_{n} ∥_{\infty} \leq C$ , where $C > 0$ is a large enough constant.
A7.: $H_{n}$ is positive definite, with eigenvalues uniformly bounded away from 0.
A8.: $Ω_{n}$ is positive definite, with eigenvalues uniformly bounded away from 0.
A9.: $∥ H_{n}^{- 1} Ω_{n} ∥$ is bounded away from ∞.

Condition B.

B4.: $∥ X_{n} ∥_{\infty} \leq C$ almost surely if the underlying distribution is J.

The following Lemmas A1–A9 are needed to prove the main theoretical results in this paper.

Lemma A1 ($∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥$).

Assume Conditions A0–A7 and B4. For $K_{n, ϵ}$ in Equation (10), $ℓ_{K} (\cdot)$ in Equation (11) and $T (\cdot)$ in Equation (12), if $p_{n}^{4} / n \to 0$ as $n \to \infty$, then $T (K_{n, ϵ})$ is a local minimizer of $ℓ_{K_{n, ϵ}} (\tilde{β})$ such that $∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O (\sqrt{p_{n} / n})$. Furthermore, $T (K_{n, ϵ})$ is unique.

Proof.

We follow the idea of the proof in [25]. Let $r_{n} = \sqrt{p_{n} / n}$ and ${\tilde{u}}_{n} = (u_{0}, u_{1}, \dots, u_{p_{n}})^{T} \in R^{p_{n} + 1}$. First, we show that there exists a sufficiently large constant $C$ such that, for large $n$, we have

\begin{matrix} inf_{∥ {\tilde{u}}_{n} ∥ = C} ℓ_{K_{n, ϵ}} ({\tilde{β}}_{n, 0} + r_{n} {\tilde{u}}_{n}) > ℓ_{K_{n, ϵ}} ({\tilde{β}}_{n, 0}) . \end{matrix}

(A1)

To show Equation (A1), consider

\begin{matrix} ℓ_{K_{n, ϵ}} ({\tilde{β}}_{n, 0} + r_{n} {\tilde{u}}_{n}) - ℓ_{K_{n, ϵ}} ({\tilde{β}}_{n, 0}) & = & E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0} + r_{n} {\tilde{X}}_{n}^{T} {\tilde{u}}_{n})) w (X_{n}) \\ & & - ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})) w (X_{n})\} \\ & \equiv & I_{1}, \end{matrix}

where $∥ {\tilde{u}}_{n} ∥ = C$.

By Taylor expansion,

\begin{matrix} I_{1} = I_{1, 1} + I_{1, 2} + I_{1, 3}, \end{matrix}

(A2)

where

\begin{matrix} I_{1, 1} & = & r_{n} E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}^{T}\} {\tilde{u}}_{n}, \\ I_{1, 2} & = & \frac{r_{n}^{2}}{2} E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) ({\tilde{X}}_{n}^{T} {\tilde{u}}_{n})^{2}\}, \\ I_{1, 3} & = & \frac{r_{n}^{3}}{6} E_{K_{n, ϵ}} \{p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n}^{*}) w (X_{n}) ({\tilde{X}}_{n}^{T} {\tilde{u}}_{n})^{3}\}, \end{matrix}

for ${\tilde{β}}_{n}^{*}$ located between ${\tilde{β}}_{n, 0}$ and ${\tilde{β}}_{n, 0} + r_{n} {\tilde{u}}_{n}$. Hence

\begin{matrix} | I_{1, 1} | & \leq & r_{n} ∥ E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} ∥ ∥ {\tilde{u}}_{n} ∥ \\ & = & r_{n} \frac{ϵ}{\sqrt{n}} ∥ E_{J} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} ∥ ∥ {\tilde{u}}_{n} ∥ \\ & \leq & C r_{n} \sqrt{p_{n} / n} ∥ {\tilde{u}}_{n} ∥, \end{matrix}

since $∥ E_{J} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} ∥ = O (\sqrt{p_{n}})$ and $E_{K_{n, 0}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} = 0$; the latter, together with the mixture form of $K_{n, ϵ}$ in Equation (10), gives the equality in the second line. For $I_{1, 2}$ in Equation (A2),

\begin{matrix} I_{1, 2} & = & \frac{r_{n}^{2}}{2} E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) ({\tilde{X}}_{n}^{T} {\tilde{u}}_{n})^{2}\} \\ & & + \frac{r_{n}^{2}}{2} [E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) ({\tilde{X}}_{n}^{T} {\tilde{u}}_{n})^{2}\} - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) ({\tilde{X}}_{n}^{T} {\tilde{u}}_{n})^{2}\}] \\ & \equiv & I_{1, 2, 1} + I_{1, 2, 2}, \end{matrix}

where $I_{1, 2, 1} = 2^{- 1} r_{n}^{2} {\tilde{u}}_{n}^{T} H_{n} {\tilde{u}}_{n}$. Meanwhile, we have

\begin{matrix} | I_{1, 2, 2} | & \leq & r_{n}^{2} ∥ E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥_{F} ∥ {\tilde{u}}_{n} ∥^{2} \\ & = & r_{n}^{2} \frac{ϵ}{\sqrt{n}} ∥ E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥_{F} ∥ {\tilde{u}}_{n} ∥^{2} \\ & \leq & C r_{n}^{2} p_{n} ∥ {\tilde{u}}_{n} ∥^{2} / \sqrt{n}, \end{matrix}

since $∥ E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥_{F} = O (p_{n})$ and $∥ E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥_{F} = O (p_{n})$. Thus,

\begin{matrix} I_{1, 2} = 2^{- 1} r_{n}^{2} {\tilde{u}}_{n}^{T} H_{n} {\tilde{u}}_{n} + O (r_{n}^{2} p_{n} / \sqrt{n}) {∥ {\tilde{u}}_{n} ∥}^{2} . \end{matrix}

(A3)

For $I_{1, 3}$ in Equation (A2), we observe that

\begin{matrix} | I_{1, 3} | \leq C r_{n}^{3} E_{K_{n, ϵ}} \{| p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n}^{*}) | w (X_{n}) | {\tilde{X}}_{n}^{T} {\tilde{u}}_{n} |^{3}\} = O (r_{n}^{3} p_{n}^{3 / 2}) ∥ {\tilde{u}}_{n} ∥^{3} . \end{matrix}

For large $n$, we can choose some large $C$ such that $I_{1, 1}$, $I_{1, 2, 2}$ and $I_{1, 3}$ are all dominated by the first term of $I_{1, 2}$ in Equation (A3), which is positive by Condition A7. This implies Equation (A1). Therefore, there exists a local minimizer of $ℓ_{K_{n, ϵ}} (\tilde{β})$ in the $\sqrt{p_{n} / n}$ neighborhood of ${\tilde{β}}_{n, 0}$; denote this minimizer by ${\tilde{β}}_{n, ϵ}$.
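The domination step can be made explicit with a short order calculation. Using $r_{n} = \sqrt{p_{n} / n}$, $∥ {\tilde{u}}_{n} ∥ = C$ and the bounds on $I_{1, 1}$, $I_{1, 2, 2}$ and $I_{1, 3}$ derived above, divide each by $r_{n}^{2} C^{2}$, the order of the leading term $2^{- 1} r_{n}^{2} {\tilde{u}}_{n}^{T} H_{n} {\tilde{u}}_{n}$ (which is at least a constant multiple of $r_{n}^{2} C^{2}$ by Condition A7):

```latex
\frac{|I_{1, 1}|}{r_{n}^{2} C^{2}} = O (C^{-1}), \qquad
\frac{|I_{1, 2, 2}|}{r_{n}^{2} C^{2}} = O\Big(\frac{p_{n}}{\sqrt{n}}\Big), \qquad
\frac{|I_{1, 3}|}{r_{n}^{2} C^{2}} = O\big(r_{n} p_{n}^{3/2} C\big) = O\Big(\big(p_{n}^{4} / n\big)^{1/2} C\Big).
```

The first ratio is made small by taking $C$ large, and the other two tend to 0 because $p_{n}^{4} / n \to 0$.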

Next, we show that the local minimizer ${\tilde{β}}_{n, ϵ}$ of $ℓ_{K_{n, ϵ}} (\tilde{β})$ is unique in the $\sqrt{p_{n} / n}$ neighborhood of ${\tilde{β}}_{n, 0}$. For all $\tilde{β}$ such that $∥ \tilde{β} - {\tilde{β}}_{n, 0} ∥ = O (n^{- 1 / 4} p_{n}^{- 1 / 2})$,

\begin{matrix} E_{K_{n, ϵ}} ∥ \frac{\partial}{\partial \tilde{β}} ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n}) ∥ & = & E_{K_{n, ϵ}} ∥ p_{1} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} ∥ \leq C \sqrt{p_{n}}, \\ E_{K_{n, ϵ}} ∥ \frac{\partial^{2}}{\partial {\tilde{β}}^{2}} ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n}) ∥ & = & E_{K_{n, ϵ}} ∥ p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} ∥ \leq C p_{n}, \end{matrix}

and hence,

\begin{matrix} \frac{\partial}{\partial \tilde{β}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} & = & E_{K_{n, ϵ}} \{\frac{\partial}{\partial \tilde{β}} ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\}, \\ \frac{\partial^{2}}{\partial {\tilde{β}}^{2}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} & = & E_{K_{n, ϵ}} \{\frac{\partial^{2}}{\partial {\tilde{β}}^{2}} ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} . \end{matrix}

Therefore,

\begin{matrix} & & \frac{\partial^{2}}{\partial {\tilde{β}}^{2}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} \\ & = & E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} \\ & = & E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} \\ & & + E_{K_{n, 0}} [\{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) - p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})\} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] \\ & & + [E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\}] \\ & = & I_{1}^{*} + I_{2}^{*} + I_{3}^{*} . \end{matrix}

By Condition A7, the eigenvalues of $I_{1}^{*} = H_{n}$ are uniformly bounded away from 0, while

\begin{matrix} ∥ I_{2}^{*} ∥ & = & ∥ E_{K_{n, 0}} \{p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}^{* *}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} (\tilde{β} - {\tilde{β}}_{n, 0})\} ∥ \leq C p_{n} / n^{1 / 4} = o (1), \\ ∥ I_{3}^{*} ∥ & \leq & ϵ / \sqrt{n} [∥ E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥ + ∥ E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} ∥] \\ & \leq & C p_{n} / \sqrt{n} = o (1), \end{matrix}

where ${\tilde{β}}^{* *}$ lies between $\tilde{β}$ and ${\tilde{β}}_{n, 0}$.

Hence, for $n$ large enough, $\frac{\partial^{2}}{\partial {\tilde{β}}^{2}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\}$ is positive definite for all $\tilde{β}$ such that $∥ \tilde{β} - {\tilde{β}}_{n, 0} ∥ = O (n^{- 1 / 4} p_{n}^{- 1 / 2})$. Therefore, there exists a unique minimizer of $ℓ_{K_{n, ϵ}} (\tilde{β})$ in the $n^{- 1 / 4} p_{n}^{- 1 / 2}$ neighborhood of ${\tilde{β}}_{n, 0}$. This neighborhood covers ${\tilde{β}}_{n, ϵ}$, because $\sqrt{p_{n} / n} / (n^{- 1 / 4} p_{n}^{- 1 / 2}) = (p_{n}^{4} / n)^{1 / 4} \to 0$. From

\begin{matrix} 0 & = & \frac{\partial}{\partial \tilde{β}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} |_{\tilde{β} = {\tilde{β}}_{n, ϵ}} = E_{K_{n, ϵ}} \{\frac{\partial}{\partial \tilde{β}} ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) |_{\tilde{β} = {\tilde{β}}_{n, ϵ}} w (X_{n})\} \\ & = & E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, ϵ}) w (X_{n}) {\tilde{X}}_{n}\}, \end{matrix}

we know $T (K_{n, ϵ}) = {\tilde{β}}_{n, ϵ}$. From the definition of $T (\cdot)$, it is easy to see that $T (K_{n, ϵ})$ is unique. ☐

Lemma A2 ($∥ T (K_{n}) - T (K_{n, ϵ}) ∥$).

Assume Conditions A0–A7 and B4. For $K_{n, ϵ}$ in Equation (10), $ℓ_{K} (\cdot)$ in Equation (11) and $T (\cdot)$ in Equation (12), if $p_{n}^{4} / n \to 0$ as $n \to \infty$ and the distribution of $(X_{n}, Y)$ is $K_{n, ϵ}$, then there exists a unique local minimizer ${\hat{\tilde{β}}}_{n}$ of $ℓ_{K_{n}} (\tilde{β})$ such that $∥ {\hat{\tilde{β}}}_{n} - T (K_{n, ϵ}) ∥ = O_{P} (\sqrt{p_{n} / n})$. Furthermore, $∥ {\hat{\tilde{β}}}_{n} - {\tilde{β}}_{n, 0} ∥ = O_{P} (\sqrt{p_{n} / n})$ and $T (K_{n}) = {\hat{\tilde{β}}}_{n}$.

Proof.

Let $r_{n} = \sqrt{p_{n} / n}$ and ${\tilde{u}}_{n} = (u_{0}, u_{1}, \dots, u_{p_{n}})^{T} \in R^{p_{n} + 1}$. To show the existence of the estimator, it suffices to show that for any given $κ > 0$, there exists a sufficiently large constant $C_{κ}$ such that, for large $n$, we have

\begin{matrix} P \{\inf_{∥ {\tilde{u}}_{n} ∥ = C_{κ}} ℓ_{K_{n}} (T (K_{n, ϵ}) + r_{n} {\tilde{u}}_{n}) > ℓ_{K_{n}} (T (K_{n, ϵ}))\} \geq 1 - κ . \end{matrix}

(A4)

This implies that with probability at least $1 - κ$, there exists a local minimizer ${\hat{\tilde{β}}}_{n}$ of $ℓ_{K_{n}} (\tilde{β})$ in the ball $\{T (K_{n, ϵ}) + r_{n} {\tilde{u}}_{n} : ∥ {\tilde{u}}_{n} ∥ \leq C_{κ}\}$. To show Equation (A4), consider

\begin{matrix} ℓ_{K_{n}} (T (K_{n, ϵ}) + r_{n} {\tilde{u}}_{n}) - ℓ_{K_{n}} (T (K_{n, ϵ})) & = & \frac{1}{n} \sum_{i = 1}^{n} \{ρ_{q} (Y_{i}, F^{- 1} ({\tilde{X}}_{n i}^{T} (T (K_{n, ϵ}) + r_{n} {\tilde{u}}_{n}))) w (X_{n i}) \\ & & - ρ_{q} (Y_{i}, F^{- 1} ({\tilde{X}}_{n i}^{T} T (K_{n, ϵ}))) w (X_{n i})\} \\ & \equiv & I_{1}, \end{matrix}

where $∥ {\tilde{u}}_{n} ∥ = C_{κ}$.

By Taylor expansion,

\begin{matrix} I_{1} = I_{1, 1} + I_{1, 2} + I_{1, 3}, \end{matrix}

(A5)

where

\begin{matrix} I_{1, 1} & = & r_{n} / n \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i}^{T} {\tilde{u}}_{n}, \\ I_{1, 2} & = & r_{n}^{2} / (2 n) \sum_{i = 1}^{n} p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {({\tilde{X}}_{n i}^{T} {\tilde{u}}_{n})}^{2}, \\ I_{1, 3} & = & r_{n}^{3} / (6 n) \sum_{i = 1}^{n} p_{3} (Y_{i}; {\tilde{X}}_{n i}^{T} {\tilde{β}}_{n}^{*}) w (X_{n i}) {({\tilde{X}}_{n i}^{T} {\tilde{u}}_{n})}^{3} \end{matrix}

for ${\tilde{β}}_{n}^{*}$ located between $T (K_{n, ϵ})$ and $T (K_{n, ϵ}) + r_{n} {\tilde{u}}_{n}$.

Since $∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O (\sqrt{p_{n} / n}) = o (1)$, the large open set considered in Condition A6 contains $T (K_{n, ϵ})$ when $n$ is large enough, say $n \geq N$, where $N$ is a positive constant. Therefore, for any fixed $n \geq N$, there exists a bounded open subset of $R^{p_{n} + 1}$ containing $T (K_{n, ϵ})$ such that, for all $\tilde{β}$ in this set, $∥ p_{1} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} ∥ \leq C ∥ {\tilde{X}}_{n} ∥$, which is integrable with respect to $K_{n, ϵ}$, where $C$ is a positive constant. Thus, for $n \geq N$,

\begin{matrix} 0 = \frac{\partial}{\partial \tilde{β}} E_{K_{n, ϵ}} \{ρ_{q} (Y, F^{- 1} ({\tilde{X}}_{n}^{T} \tilde{β})) w (X_{n})\} |_{\tilde{β} = T (K_{n, ϵ})} = E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}\} . \end{matrix}

(A6)

Hence,

\begin{matrix} | I_{1, 1} | \leq r_{n} ∥ \frac{1}{n} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} ∥ ∥ {\tilde{u}}_{n} ∥ = O_{P} (r_{n} \sqrt{p_{n} / n}) ∥ {\tilde{u}}_{n} ∥ . \end{matrix}

For $I_{1, 2}$ in Equation (A5),

\begin{matrix} I_{1, 2} & = & \frac{r_{n}^{2}}{2 n} \sum_{i = 1}^{n} E_{K_{n, ϵ}} \{p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) ({\tilde{X}}_{n i}^{T} {\tilde{u}}_{n})^{2}\} \\ & & + \frac{r_{n}^{2}}{2 n} \sum_{i = 1}^{n} [p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) ({\tilde{X}}_{n i}^{T} {\tilde{u}}_{n})^{2} \\ & & - E_{K_{n, ϵ}} \{p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) ({\tilde{X}}_{n i}^{T} {\tilde{u}}_{n})^{2}\}] \\ & \equiv & I_{1, 2, 1} + I_{1, 2, 2}, \end{matrix}

where $I_{1, 2, 1} = 2^{- 1} r_{n}^{2} {\tilde{u}}_{n}^{T} H_{n, ϵ} {\tilde{u}}_{n}$. Meanwhile, we have

\begin{matrix} | I_{1, 2, 2} | & \leq & \frac{r_{n}^{2}}{2} ∥ \frac{1}{n} \sum_{i = 1}^{n} [p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} {\tilde{X}}_{n i}^{T} \\ & & - E_{K_{n, ϵ}} \{p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} {\tilde{X}}_{n i}^{T}\}] ∥_{F} ∥ {\tilde{u}}_{n} ∥^{2} \\ & = & r_{n}^{2} O_{P} (p_{n} / \sqrt{n}) ∥ {\tilde{u}}_{n} ∥^{2} . \end{matrix}

Thus,

\begin{matrix} I_{1, 2} = 2^{- 1} r_{n}^{2} {\tilde{u}}_{n}^{T} H_{n, ϵ} {\tilde{u}}_{n} + O_{P} (r_{n}^{2} p_{n} / \sqrt{n}) {∥ {\tilde{u}}_{n} ∥}^{2} . \end{matrix}

(A7)

For $I_{1, 3}$ in Equation (A5), we observe that

\begin{matrix} | I_{1, 3} | \leq C r_{n}^{3} \frac{1}{n} \sum_{i = 1}^{n} | p_{3} (Y_{i}; {\tilde{X}}_{n i}^{T} {\tilde{β}}_{n}^{*}) | w (X_{n i}) | {\tilde{X}}_{n i}^{T} {\tilde{u}}_{n} |^{3} = O_{P} (r_{n}^{3} p_{n}^{3 / 2}) {∥ {\tilde{u}}_{n} ∥}^{3} . \end{matrix}

We now show that the minimum eigenvalue of $H_{n, ϵ}$ is uniformly bounded away from 0. Write $H_{n, ϵ} = (1 - ϵ / \sqrt{n}) H_{n, K_{n, 0}, T (K_{n, ϵ})} + (ϵ / \sqrt{n}) H_{n, J, T (K_{n, ϵ})}$. Note that

\begin{matrix} & & ∥ H_{n, K_{n, 0}, T (K_{n, ϵ})} - H_{n} ∥ \\ & = & ∥ E_{K_{n, 0}} [\{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) - p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})\} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] ∥ \\ & = & ∥ E_{K_{n, 0}} [p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n}^{* *}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\}] ∥ = O (p_{n}^{2} / \sqrt{n}), \end{matrix}

where ${\tilde{β}}_{n}^{* *}$ lies between $T (K_{n, ϵ})$ and ${\tilde{β}}_{n, 0}$. Since the eigenvalues of $H_{n}$ are uniformly bounded away from 0 (Condition A7) and $p_{n}^{2} / \sqrt{n} \to 0$, so are those of $H_{n, K_{n, 0}, T (K_{n, ϵ})}$. Moreover, $(ϵ / \sqrt{n}) ∥ H_{n, J, T (K_{n, ϵ})} ∥ = O (p_{n} / \sqrt{n}) = o (1)$ under Condition B4, so the eigenvalues of $H_{n, ϵ}$ are also uniformly bounded away from 0.

We can choose some large $C_{κ}$ such that, with probability at least $1 - κ$, $I_{1, 1}$, $I_{1, 2, 2}$ and $I_{1, 3}$ are all dominated by the first term of $I_{1, 2}$ in Equation (A7), which is positive because the eigenvalues of $H_{n, ϵ}$ are bounded away from 0. This implies Equation (A4).

Next, we show the uniqueness of ${\hat{\tilde{β}}}_{n}$. For all $\tilde{β}$ such that $∥ \tilde{β} - T (K_{n, ϵ}) ∥ = O (n^{- 1 / 4} p_{n}^{- 1 / 2})$,

\begin{matrix} & & \frac{1}{n} \sum_{i = 1}^{n} p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} \tilde{β}) w (X_{n i}) {\tilde{X}}_{n i} {\tilde{X}}_{n i}^{T} \\ & = & E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} \\ & & + E_{K_{n, 0}} [\{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) - p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})\} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] \\ & & + [E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\}] \\ & & + [\frac{1}{n} \sum_{i = 1}^{n} p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} \tilde{β}) w (X_{n i}) {\tilde{X}}_{n i} {\tilde{X}}_{n i}^{T} - E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} \tilde{β}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\}] \\ & = & I_{1}^{*} + I_{2}^{*} + I_{3}^{*} + I_{4}^{*} . \end{matrix}

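By Condition A7, the eigenvalues of $I_{1}^{*} = H_{n}$ are uniformly bounded away from 0. Since $∥ \tilde{β} - {\tilde{β}}_{n, 0} ∥ \leq ∥ \tilde{β} - T (K_{n, ϵ}) ∥ + ∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O (n^{- 1 / 4} p_{n}^{- 1 / 2})$ by Lemma A1, following the proof of Lemma A1 we have $∥ I_{2}^{*} ∥ = o (1)$ and $∥ I_{3}^{*} ∥ = o (1)$. It is easy to see that $∥ I_{4}^{*} ∥ = O_{P} (p_{n} / \sqrt{n})$.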

Hence, for $n$ large enough, $\frac{\partial^{2}}{\partial {\tilde{β}}^{2}} ℓ_{K_{n}} (\tilde{β})$ is positive definite with probability tending to one for all $\tilde{β}$ such that $∥ \tilde{β} - T (K_{n, ϵ}) ∥ = O (n^{- 1 / 4} p_{n}^{- 1 / 2})$. Therefore, there exists a unique minimizer of $ℓ_{K_{n}} (\tilde{β})$ in the $n^{- 1 / 4} p_{n}^{- 1 / 2}$ neighborhood of $T (K_{n, ϵ})$, which covers ${\hat{\tilde{β}}}_{n}$. Finally, $∥ {\hat{\tilde{β}}}_{n} - {\tilde{β}}_{n, 0} ∥ = O_{P} (\sqrt{p_{n} / n})$ follows from the triangle inequality and Lemma A1. ☐
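Lemma A2 rests on the empirical objective being locally strictly convex around $T (K_{n, ϵ})$, which is also what makes Newton-Raphson iterations well behaved there. The following minimal NumPy sketch covers one special case only: the non-robust logistic deviance ($ψ (r) = r$, $w \equiv 1$, logit link, constant factor dropped), in which $p_{1}$ and $p_{2}$ have closed forms. It is not the robust-BD estimator, and the function name is hypothetical.

```python
import numpy as np


def fit_logistic_deviance(X_tilde, Y, n_iter=50, tol=1e-10):
    """Newton-Raphson minimizer of the empirical (non-robust) logistic deviance.

    Minimizes (1/n) sum_i -[Y_i eta_i - log(1 + exp(eta_i))], eta_i = tilde-X_i^T beta,
    i.e. the case psi(r) = r, w = 1 with the canonical logit link.
    """
    n, d = X_tilde.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X_tilde @ beta))
        grad = -(X_tilde.T @ (Y - mu)) / n                            # (1/n) sum p_1 x_i
        hess = (X_tilde * (mu * (1.0 - mu))[:, None]).T @ X_tilde / n  # (1/n) sum p_2 x_i x_i^T
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the Hessian here is positive definite near the solution, the iteration settles on the unique local minimizer, mirroring the uniqueness argument above in this simple case.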

Lemma A3 ($∥ A_{n} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} ∥$).

Assume Conditions A0–A7 and B4. For $K_{n, ϵ}$ in Equation (10) and $T (\cdot)$ in Equation (12), if $p_{n}^{5} / n \to 0$ as $n \to \infty$, the distribution of $(X_{n}, Y)$ is $K_{n, ϵ}$ and $E_{J} (∥ w (X_{n}) X_{n} ∥) \leq C$, then

\begin{matrix} \sqrt{n} A_{n} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} = O (1), \end{matrix}

where $A_{n}$ is any given $k \times (p_{n} + 1)$ matrix such that $A_{n} A_{n}^{T} \to G$, with $G$ a $k \times k$ positive-definite matrix and $k$ a fixed integer.

Proof.

Taylor’s expansion yields

\begin{matrix} 0 & = & E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}\} \\ & = & E_{K_{n, ϵ}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} \\ & & + E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} \\ & & + \frac{1}{2} E_{K_{n, ϵ}} (p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n}^{*}) w (X_{n}) {\tilde{X}}_{n} [{\tilde{X}}_{n}^{T} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\}]^{2}) \\ & = & I_{1} + I_{2} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} + I_{3}, \end{matrix}

where ${\tilde{β}}_{n}^{*}$ lies between $T (K_{n, ϵ})$ and ${\tilde{β}}_{n, 0}$. Below, we show that

\begin{matrix} ∥ I_{1} ∥ = O (1 / \sqrt{n}), ∥ I_{2} - H_{n} ∥ = O (p_{n} / \sqrt{n}), ∥ I_{3} ∥ = O (p_{n}^{5 / 2} / n) . \end{matrix}

First, since $E_{K_{n, 0}} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} = 0$, we have $∥ I_{1} ∥ = (ϵ / \sqrt{n}) ∥ E_{J} \{p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}\} ∥ \leq C (ϵ / \sqrt{n}) E_{J} (∥ w (X_{n}) X_{n} ∥) = O (1 / \sqrt{n})$. Following the proof of $I_{3}^{*}$ in Lemma A1, $∥ I_{2} - H_{n} ∥ = O (p_{n} / \sqrt{n})$. Since $∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O (\sqrt{p_{n} / n})$ by Lemma A1, we have $∥ I_{3} ∥ = O (p_{n}^{5 / 2} / n)$.

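Therefore, solving the expansion above for $T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}$,

\sqrt{n} A_{n} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} = - \sqrt{n} A_{n} H_{n}^{- 1} I_{1} - \sqrt{n} A_{n} H_{n}^{- 1} (I_{2} - H_{n}) \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\} - \sqrt{n} A_{n} H_{n}^{- 1} I_{3} = - \sqrt{n} A_{n} H_{n}^{- 1} I_{1} + o (1),

because the second and third terms are $O (p_{n}^{3 / 2} / \sqrt{n})$ and $O (p_{n}^{5 / 2} / \sqrt{n})$, respectively, and both vanish when $p_{n}^{5} / n \to 0$. Since $∥ A_{n} ∥$ and $∥ H_{n}^{- 1} ∥$ are bounded and $∥ \sqrt{n} I_{1} ∥ = O (1)$, the leading term is $O (1)$, which completes the proof. ☐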

Lemma A4 (asymptotic normality of $T (K_{n}) - T (K_{n, ϵ})$).

Assume Conditions A0–A8 and B4. If $p_{n}^{5} / n \to 0$ as $n \to \infty$ and the distribution of $(X_{n}, Y)$ is $K_{n, ϵ}$, then

\begin{matrix} \sqrt{n} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} \{T (K_{n}) - T (K_{n, ϵ})\} \overset{L}{⟶} N (0, I_{k}), \end{matrix}

where $U (K_{n, ϵ}) = A_{n} H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} A_{n}^{T}$, $A_{n}$ is any given $k \times (p_{n} + 1)$ matrix such that $A_{n} A_{n}^{T} \to G$, with $G$ a $k \times k$ positive-definite matrix, and $k$ is a fixed integer.

Proof.

We will first show that

\begin{matrix} T (K_{n}) - T (K_{n, ϵ}) = - \frac{1}{n} H_{n, ϵ}^{- 1} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} + o_{P} (n^{- 1 / 2}) . \end{matrix}

(A8)

From $\frac{\partial ℓ_{K_{n}} (\tilde{β})}{\partial \tilde{β}} |_{\tilde{β} = T (K_{n})} = 0$, Taylor’s expansion yields

\begin{matrix} 0 & = & \{\frac{1}{n} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i}\} \\ & & + \{\frac{1}{n} \sum_{i = 1}^{n} p_{2} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} {\tilde{X}}_{n i}^{T}\} \{T (K_{n}) - T (K_{n, ϵ})\} \\ & & + \frac{1}{2 n} \sum_{i = 1}^{n} p_{3} (Y_{i}; {\tilde{X}}_{n i}^{T} {\tilde{β}}_{n}^{*}) w (X_{n i}) [{\tilde{X}}_{n i}^{T} \{T (K_{n}) - T (K_{n, ϵ})\}]^{2} {\tilde{X}}_{n i} \\ & \equiv & \{\frac{1}{n} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i}\} + I_{2} \{T (K_{n}) - T (K_{n, ϵ})\} + I_{3}, \end{matrix}

(A9)

where ${\tilde{β}}_{n}^{*}$ lies between $T (K_{n, ϵ})$ and $T (K_{n})$. Below, we show that

\begin{matrix} ∥ I_{2} - H_{n, ϵ} ∥ = O_{P} (p_{n} / \sqrt{n}), \quad ∥ I_{3} ∥ = O_{P} (p_{n}^{5 / 2} / n) . \end{matrix}

First, by arguments similar to those used for $I_{1, 2}$ in Lemma A2, we have $∥ I_{2} - H_{n, ϵ} ∥ = O_{P} (p_{n} / \sqrt{n})$.

Second, an argument similar to that used for $I_{1, 3}$ in Equation (A5) gives $∥ I_{3} ∥ = O_{P} (p_{n}^{5 / 2} / n)$.

Third, by Equation (A9) and $∥ T (K_{n}) - T (K_{n, ϵ}) ∥ = O_{P} (\sqrt{p_{n} / n})$ (Lemma A2), we see that

\begin{matrix} H_{n, ϵ} \{T (K_{n}) - T (K_{n, ϵ})\} = - \frac{1}{n} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} + u_{n}, \end{matrix}

where $∥ u_{n} ∥ = O_{P} (p_{n}^{5 / 2} / n) = o_{P} (n^{- 1 / 2})$ because $p_{n}^{5} / n \to 0$. From the proof of Lemma A2, the eigenvalues of $H_{n, ϵ}$ are uniformly bounded away from 0, which completes the proof of Equation (A8).

Following the proof for the bounded eigenvalues of $H_{n, ϵ}$ in Lemma A2, we can show that the eigenvalues of $Ω_{n, ϵ}$ are uniformly bounded away from 0. Hence, the eigenvalues of $H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1}$ are uniformly bounded away from 0, as are the eigenvalues of $U (K_{n, ϵ})$. From Equation (A8), we see that

\begin{matrix} A_{n} \{T (K_{n}) - T (K_{n, ϵ})\} = - \frac{1}{n} A_{n} H_{n, ϵ}^{- 1} \sum_{i = 1}^{n} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i} + o_{P} (n^{- 1 / 2}) . \end{matrix}

It follows that

\begin{matrix} \sqrt{n} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} \{T (K_{n}) - T (K_{n, ϵ})\} = \sum_{i = 1}^{n} R_{n i} + o_{P} (1), \end{matrix}

where $R_{n i} = - n^{- 1 / 2} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} H_{n, ϵ}^{- 1} p_{1} (Y_{i}; {\tilde{X}}_{n i}^{T} T (K_{n, ϵ})) w (X_{n i}) {\tilde{X}}_{n i}$. Following Equation (A6) in Lemma A2, one can show that $E_{K_{n, ϵ}} (R_{n i}) = 0$ for $n$ large enough.

To show $\sum_{i = 1}^{n} R_{n i} \overset{L}{⟶} N (0, I_{k})$, we apply the Lindeberg-Feller central limit theorem in [26]. Specifically, we check (I) $\sum_{i = 1}^{n} \operatorname{cov}_{K_{n, ϵ}} (R_{n i}) \to I_{k}$; (II) $\sum_{i = 1}^{n} E_{K_{n, ϵ}} (∥ R_{n i} ∥^{2 + δ}) = o (1)$ for some $δ > 0$. Condition (I) is straightforward since $\sum_{i = 1}^{n} \operatorname{cov}_{K_{n, ϵ}} (R_{n i}) = {U (K_{n, ϵ})}^{- 1 / 2} U (K_{n, ϵ}) {U (K_{n, ϵ})}^{- 1 / 2} = I_{k}$. To check condition (II), we can show that $E_{K_{n, ϵ}} (∥ R_{n i} ∥^{2 + δ}) = O ((p_{n} / n)^{(2 + δ) / 2})$. This yields $\sum_{i = 1}^{n} E_{K_{n, ϵ}} (∥ R_{n i} ∥^{2 + δ}) = O (p_{n}^{(2 + δ) / 2} / n^{δ / 2}) = o (1)$. Hence

\begin{matrix} \sqrt{n} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} \{T (K_{n}) - T (K_{n, ϵ})\} \overset{L}{⟶} N (0, I_{k}) . \end{matrix}

This completes the proof. ☐
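For example, taking $δ = 2$ in condition (II) turns the bound into an explicit rate:

```latex
\sum_{i = 1}^{n} E_{K_{n, ϵ}} (∥ R_{n i} ∥^{4})
  = n \cdot O\big((p_{n} / n)^{2}\big)
  = O (p_{n}^{2} / n) \to 0,
```

which holds under the assumption $p_{n}^{5} / n \to 0$ of Lemma A4.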

Lemma A5 (asymptotic covariance matrices $U (K_{n, ϵ})$ and $U_{n}$).

Assume Conditions A0–A9 and B4. If $p_{n}^{4} / n \to 0$ as $n \to \infty$, then

\begin{matrix} ∥ U_{n}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} - I_{k} ∥ = O (p_{n} / n^{1 / 4}), \end{matrix}

where $U (K_{n, ϵ}) = A_{n} H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} A_{n}^{T}$, $A_{n}$ is any given $k \times (p_{n} + 1)$ matrix such that $A_{n} A_{n}^{T} \to G$, with $G$ a $k \times k$ positive-definite matrix, and $k$ is a fixed integer.

Proof.

Note that

\begin{matrix} ∥ {U (K_{n, ϵ})}^{1 / 2} - U_{n}^{1 / 2} ∥^{2} & \leq & ∥ U (K_{n, ϵ}) - U_{n} ∥ \\ & \leq & ∥ H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} - H_{n}^{- 1} Ω_{n} H_{n}^{- 1} ∥ ∥ A_{n} ∥_{F}^{2} . \end{matrix}

Since $∥ A_{n} ∥_{F}^{2} \to \operatorname{tr} (G)$, it suffices to prove that $∥ H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} - H_{n}^{- 1} Ω_{n} H_{n}^{- 1} ∥ = O (p_{n}^{2} / \sqrt{n})$.

First, we prove $∥ H_{n, ϵ} - H_{n} ∥ = O (p_{n}^{2} / \sqrt{n})$. Note that

\begin{matrix} H_{n, ϵ} - H_{n} & = & E_{K_{n, ϵ}} [\{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) - p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})\} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] \\ & & + [E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - H_{n}] \\ & = & E_{K_{n, ϵ}} [p_{3} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}^{*}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \{T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}\}] \\ & & + [E_{K_{n, ϵ}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - H_{n}] \\ & \equiv & I_{1} + I_{2}, \end{matrix}

where ${\tilde{β}}^{*}$ lies between $T (K_{n, ϵ})$ and ${\tilde{β}}_{n, 0}$. As in the proofs of Lemmas A1 and A2, $∥ I_{1} ∥ = O (p_{n}^{2} / \sqrt{n})$ and $∥ I_{2} ∥ = O (p_{n} / \sqrt{n})$. Thus, $∥ H_{n, ϵ} - H_{n} ∥ = O (p_{n}^{2} / \sqrt{n})$.

Second, we show $∥ Ω_{n, ϵ} - Ω_{n} ∥ = O (p_{n}^{2} / \sqrt{n})$. It is easy to see that

\begin{matrix} Ω_{n, ϵ} - Ω_{n} & = & E_{K_{n, ϵ}} [\{p_{1}^{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) - p_{1}^{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})\} w^{2} (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] \\ & & + [E_{K_{n, ϵ}} \{p_{1}^{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w^{2} (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}\} - Ω_{n}] \\ & \equiv & Δ_{1, 1} + Δ_{1, 2}, \end{matrix}

where $∥ Δ_{1, 1} ∥ = O (p_{n}^{2} / \sqrt{n})$ and $∥ Δ_{1, 2} ∥ = O (p_{n} / \sqrt{n})$. Hence, $∥ Ω_{n, ϵ} - Ω_{n} ∥ = O (p_{n}^{2} / \sqrt{n})$.

Third, we show $∥ H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} - H_{n}^{- 1} Ω_{n} H_{n}^{- 1} ∥ = O (p_{n}^{2} / \sqrt{n})$. Note that $H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} - H_{n}^{- 1} Ω_{n} H_{n}^{- 1} = L_{1} + L_{2} + L_{3}$, where $L_{1} = H_{n, ϵ}^{- 1} (Ω_{n, ϵ} - Ω_{n}) H_{n, ϵ}^{- 1}$, $L_{2} = H_{n, ϵ}^{- 1} (H_{n} - H_{n, ϵ}) H_{n}^{- 1} Ω_{n} H_{n, ϵ}^{- 1}$ and $L_{3} = H_{n}^{- 1} Ω_{n} H_{n, ϵ}^{- 1} (H_{n} - H_{n, ϵ}) H_{n}^{- 1}$. Under Conditions A7 and A9, together with the eigenvalue bound for $H_{n, ϵ}$ established in the proof of Lemma A2, it is straightforward to see that $∥ H_{n, ϵ}^{- 1} ∥ = O (1)$, $∥ H_{n}^{- 1} ∥ = O (1)$ and $∥ H_{n}^{- 1} Ω_{n} ∥ = O (1)$. Since $∥ L_{1} ∥ \leq ∥ H_{n, ϵ}^{- 1} ∥ ∥ Ω_{n, ϵ} - Ω_{n} ∥ ∥ H_{n, ϵ}^{- 1} ∥$, we conclude $∥ L_{1} ∥ = O (p_{n}^{2} / \sqrt{n})$, and similarly $∥ L_{2} ∥ = O (p_{n}^{2} / \sqrt{n})$ and $∥ L_{3} ∥ = O (p_{n}^{2} / \sqrt{n})$. Hence, $∥ H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} - H_{n}^{- 1} Ω_{n} H_{n}^{- 1} ∥ = O (p_{n}^{2} / \sqrt{n})$.
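The three-term decomposition can be sanity-checked numerically. The sketch below (NumPy, hypothetical helper name) builds $L_{1}$, $L_{2}$ and $L_{3}$ exactly as defined in the text for arbitrary symmetric positive-definite inputs; the same algebraic identity, with hats, reappears in Lemma A6.

```python
import numpy as np


def sandwich_difference_terms(H_eps, Om_eps, H, Om):
    """Return (L1, L2, L3) such that
    H_eps^{-1} Om_eps H_eps^{-1} - H^{-1} Om H^{-1} = L1 + L2 + L3."""
    Hei, Hi = np.linalg.inv(H_eps), np.linalg.inv(H)
    L1 = Hei @ (Om_eps - Om) @ Hei
    L2 = Hei @ (H - H_eps) @ Hi @ Om @ Hei
    L3 = Hi @ Om @ Hei @ (H - H_eps) @ Hi
    return L1, L2, L3
```

Each term carries exactly one difference factor, $Ω_{n, ϵ} - Ω_{n}$ or $H_{n} - H_{n, ϵ}$, which is why bounded inverses and the $O (p_{n}^{2} / \sqrt{n})$ bounds on those differences suffice.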

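Thus, we can conclude that $∥ U (K_{n, ϵ}) - U_{n} ∥ = O (p_{n}^{2} / \sqrt{n})$ and that the eigenvalues of $U (K_{n, ϵ})$ and $U_{n}$ are uniformly bounded away from 0 and ∞. Consequently, $∥ {U (K_{n, ϵ})}^{1 / 2} - U_{n}^{1 / 2} ∥ = O (p_{n} / n^{1 / 4})$, and hence $∥ U_{n}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} - I_{k} ∥ \leq ∥ U_{n}^{- 1 / 2} ∥ ∥ {U (K_{n, ϵ})}^{1 / 2} - U_{n}^{1 / 2} ∥ = O (p_{n} / n^{1 / 4})$, which completes the proof. ☐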
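Lemmas A5 and A6 both use the inequality $∥ A^{1 / 2} - B^{1 / 2} ∥^{2} \leq ∥ A - B ∥$ for positive semidefinite $A$ and $B$ in the spectral norm, a standard matrix-analysis fact. A short NumPy check on random matrices (helper names hypothetical) illustrates it; it is a sanity check, not a proof.

```python
import numpy as np


def psd_sqrt(M):
    """Symmetric PSD square root via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def sqrt_gap_ok(A, B):
    """Check ||A^{1/2} - B^{1/2}||^2 <= ||A - B|| in the spectral norm (with rounding slack)."""
    lhs = np.linalg.norm(psd_sqrt(A) - psd_sqrt(B), 2) ** 2
    rhs = np.linalg.norm(A - B, 2)
    return lhs <= rhs * (1 + 1e-10) + 1e-12
```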

Lemma A6 (asymptotic covariance matrices $U (K_{n})$ and $U (K_{n, ϵ})$).

Assume Conditions A0–A9 and B4. If $p_{n}^{4} / n \to 0$ as $n \to \infty$ and the distribution of $(X_{n}, Y)$ is $K_{n, ϵ}$, then

\begin{matrix} ∥ {U (K_{n})}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} - I_{k} ∥ = O_{P} (p_{n} / n^{1 / 4}), \end{matrix}

where $U (K_{n, ϵ}) = A_{n} H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} A_{n}^{T}$, $U (K_{n}) = A_{n} {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} A_{n}^{T}$, $A_{n}$ is any given $k \times (p_{n} + 1)$ matrix such that $A_{n} A_{n}^{T} \to G$, with $G$ a $k \times k$ positive-definite matrix, and $k$ is a fixed integer.

Proof.

Note that $∥ {U (K_{n})}^{1 / 2} - {U (K_{n, ϵ})}^{1 / 2} ∥^{2} \leq ∥ U (K_{n}) - U (K_{n, ϵ}) ∥ \leq ∥ {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} - H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} ∥ ∥ A_{n} ∥_{F}^{2}$. Since $∥ A_{n} ∥_{F}^{2} \to \operatorname{tr} (G)$, it suffices to prove that $∥ {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} - H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})$.

Following the proof of Proposition 1 in [1], we can show that $∥ {\hat{H}}_{n} - H_{n, ϵ} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})$ and $∥ {\hat{Ω}}_{n} - Ω_{n, ϵ} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})$.

To show

∥ {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} - H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

, note

{\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} - H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} = L_{1} + L_{2} + L_{3}

, where

L_{1} = {\hat{H}}_{n}^{- 1} ({\hat{Ω}}_{n} - Ω_{n, ϵ}) {\hat{H}}_{n}^{- 1}

,

L_{2} = {\hat{H}}_{n}^{- 1} (H_{n, ϵ} - {\hat{H}}_{n}) H_{n, ϵ}^{- 1} Ω_{n, ϵ} {\hat{H}}_{n}^{- 1}

and

L_{3} = H_{n, ϵ}^{- 1} Ω_{n, ϵ} {\hat{H}}_{n}^{- 1} (H_{n, ϵ} - {\hat{H}}_{n}) H_{n, ϵ}^{- 1}

. Following the proof of Lemma A2, it is straightforward to verify that

∥ H_{n, ϵ}^{- 1} ∥ = O (1)

,

∥ {\hat{H}}_{n}^{- 1} ∥ = O_{P} (1)

. In addition,

∥ H_{n, ϵ}^{- 1} Ω_{n, ϵ} ∥ = ∥ (H_{n, ϵ}^{- 1} - H_{n}^{- 1}) Ω_{n, ϵ} + H_{n}^{- 1} (Ω_{n, ϵ} - Ω_{n}) + H_{n}^{- 1} Ω_{n} ∥ \leq ∥ H_{n, ϵ}^{- 1} ∥ ∥ H_{n, ϵ} - H_{n} ∥ ∥ H_{n}^{- 1} ∥ ∥ Ω_{n, ϵ} ∥ + ∥ H_{n}^{- 1} ∥ ∥ Ω_{n, ϵ} - Ω_{n} ∥ + ∥ H_{n}^{- 1} Ω_{n} ∥ = O (1)

.
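The split into $L_1$, $L_2$ and $L_3$ is a purely algebraic identity. It holds for any invertible $\hat{H}_n$ and $H_{n,\epsilon}$; only the rates depend on the conditions. The Python sketch below is a minimal numerical check. Random symmetric positive-definite matrices serve as hypothetical stand-ins for $H_{n,\epsilon}$, $\Omega_{n,\epsilon}$ and their estimates. The sketch also checks the submultiplicative bound used for $\|L_1\|$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
inv = np.linalg.inv

def random_spd():
    X = rng.standard_normal((p, p))
    return X @ X.T + p * np.eye(p)

def sym_noise(scale):
    D = scale * rng.standard_normal((p, p))
    return (D + D.T) / 2

# Hypothetical stand-ins for H_{n,eps}, Omega_{n,eps} and the estimates hat H_n, hat Omega_n
H_eps, Om_eps = random_spd(), random_spd()
H_hat, Om_hat = H_eps + sym_noise(0.05), Om_eps + sym_noise(0.05)

diff = inv(H_hat) @ Om_hat @ inv(H_hat) - inv(H_eps) @ Om_eps @ inv(H_eps)
L1 = inv(H_hat) @ (Om_hat - Om_eps) @ inv(H_hat)
L2 = inv(H_hat) @ (H_eps - H_hat) @ inv(H_eps) @ Om_eps @ inv(H_hat)
L3 = inv(H_eps) @ Om_eps @ inv(H_hat) @ (H_eps - H_hat) @ inv(H_eps)

def op(M):
    """Spectral (operator) norm."""
    return np.linalg.norm(M, 2)

print("max |diff - (L1 + L2 + L3)| =", np.abs(diff - (L1 + L2 + L3)).max())
print("||L1|| <= ||hat H^{-1}||^2 ||hat Omega - Omega_eps||:",
      op(L1) <= op(inv(H_hat)) ** 2 * op(Om_hat - Om_eps) * (1 + 1e-12))
```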

Since

∥ L_{1} ∥ \leq ∥ {\hat{H}}_{n}^{- 1} ∥ ∥ {\hat{Ω}}_{n} - Ω_{n, ϵ} ∥ ∥ {\hat{H}}_{n}^{- 1} ∥

, we conclude

∥ L_{1} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

, and similarly

∥ L_{2} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

and

∥ L_{3} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

. Hence,

∥ {\hat{H}}_{n}^{- 1} {\hat{Ω}}_{n} {\hat{H}}_{n}^{- 1} - H_{n, ϵ}^{- 1} Ω_{n, ϵ} H_{n, ϵ}^{- 1} ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

.

Thus, we can conclude that

∥ U (K_{n}) - U (K_{n, ϵ}) ∥ = O_{P} (p_{n}^{2} / \sqrt{n})

and the eigenvalues of

U (K_{n})

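are uniformly bounded away from 0 and ∞ with probability tending to 1. Since

∥ {U (K_{n})}^{1 / 2} - {U (K_{n, ϵ})}^{1 / 2} ∥^{2} \leq ∥ U (K_{n}) - U (K_{n, ϵ}) ∥

, we obtain

∥ {U (K_{n})}^{1 / 2} - {U (K_{n, ϵ})}^{1 / 2} ∥ = O_{P} (p_{n} / n^{1 / 4})

, and therefore

∥ {U (K_{n})}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} - I_{k} ∥ \leq ∥ {U (K_{n})}^{- 1 / 2} ∥ ∥ {U (K_{n, ϵ})}^{1 / 2} - {U (K_{n})}^{1 / 2} ∥ = O_{P} (p_{n} / n^{1 / 4})

. ☐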

Lemma A7 (asymptotic distribution of test statistic).

Assume Conditions A0–A9 and B4. If

p_{n}^{6} / n \to 0

as

n \to \infty

and the distribution of

(X_{n}, Y)

is

K_{n, ϵ}

, then

\sqrt{n} [{U (K_{n})}^{- 1 / 2} A_{n} {T (K_{n}) - {\tilde{β}}_{n, 0}} - U_{n}^{- 1 / 2} A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}}] \overset{L}{⟶} N (0, I_{k}),

where

A_{n}

is any given

k \times (p_{n} + 1)

matrix such that

A_{n} A_{n}^{T} \to G

, with

G

being a

k \times k

positive-definite matrix, and

k

is a fixed integer.

Proof.

Note that

\begin{matrix} \sqrt{n} [{U (K_{n})}^{- 1 / 2} A_{n} {T (K_{n}) - {\tilde{β}}_{n, 0}} - U_{n}^{- 1 / 2} A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}}] \\ = & \sqrt{n} {U (K_{n})}^{- 1 / 2} A_{n} {T (K_{n}) - T (K_{n, ϵ})} \\ + \sqrt{n} [{U (K_{n})}^{- 1 / 2} - {U (K_{n, ϵ})}^{- 1 / 2}] A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}} \\ + \sqrt{n} [{U (K_{n, ϵ})}^{- 1 / 2} - U_{n}^{- 1 / 2}] A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}} \\ \equiv & I + II + III . \end{matrix}
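The three-term split is a telescoping identity. Term I carries the change from $T(K_n)$ to $T(K_{n,\epsilon})$. Terms II and III then replace ${U(K_n)}^{-1/2}$ by ${U(K_{n,\epsilon})}^{-1/2}$, and that in turn by $U_n^{-1/2}$. The Python sketch below checks the identity. The matrices and vectors are arbitrary hypothetical stand-ins for the quantities in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 400

def inv_sqrt(M):
    """Inverse symmetric square root of an SPD matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs / np.sqrt(vals)) @ vecs.T

def random_spd():
    X = rng.standard_normal((k, k))
    return X @ X.T + np.eye(k)

# Hypothetical stand-ins for U(K_n)^{-1/2}, U(K_{n,eps})^{-1/2}, U_n^{-1/2}
M1, M2, M3 = (inv_sqrt(random_spd()) for _ in range(3))
a = rng.standard_normal(k) / np.sqrt(n)   # plays A_n {T(K_n) - beta_{n,0}}
b = rng.standard_normal(k) / np.sqrt(n)   # plays A_n {T(K_{n,eps}) - beta_{n,0}}

lhs = np.sqrt(n) * (M1 @ a - M3 @ b)
term_I = np.sqrt(n) * M1 @ (a - b)        # a - b plays A_n {T(K_n) - T(K_{n,eps})}
term_II = np.sqrt(n) * (M1 - M2) @ b
term_III = np.sqrt(n) * (M2 - M3) @ b
print(np.allclose(lhs, term_I + term_II + term_III))
```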

For term I, we obtain from Lemma A4 that

\sqrt{n} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} (T (K_{n}) - T (K_{n, ϵ})) \overset{L}{⟶} N (0, I_{k}) .

From Lemma A6, we get

∥ {U (K_{n})}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} - I_{k} ∥ = o_{P} (1) .

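Since

I = {U (K_{n})}^{- 1 / 2} {U (K_{n, ϵ})}^{1 / 2} \cdot \sqrt{n} {U (K_{n, ϵ})}^{- 1 / 2} A_{n} {T (K_{n}) - T (K_{n, ϵ})}

, Slutsky's theorem yields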

\begin{matrix} I \overset{L}{⟶} N (0, I_{k}) . \end{matrix}

(A10)

For term

II

, we see from Lemma A6 that

\begin{matrix} ∥ {U (K_{n})}^{- 1 / 2} - {U (K_{n, ϵ})}^{- 1 / 2} ∥ = O_{P} (p_{n} / n^{1 / 4}) . \end{matrix}

Moreover,

\begin{matrix} ∥ A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}} ∥ \leq ∥ A_{n} ∥ ∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O (\sqrt{p_{n} / n}) . \end{matrix}

Thus,

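\begin{matrix} ∥ II ∥ \leq \sqrt{n} ∥ {U (K_{n})}^{- 1 / 2} - {U (K_{n, ϵ})}^{- 1 / 2} ∥ ∥ A_{n} ∥ ∥ T (K_{n, ϵ}) - {\tilde{β}}_{n, 0} ∥ = O_{P} (p_{n}^{3 / 2} / n^{1 / 4}) = o_{P} (1), \end{matrix}

(A11)

where the last equality holds because

p_{n}^{3 / 2} / n^{1 / 4} = {(p_{n}^{6} / n)}^{1 / 4} \to 0

. Similarly, using Lemma A5 in place of Lemma A6, we obtain

∥ III ∥ = o_{P} (1) .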

Combining (A10), (A11) and the bound on III with Slutsky's theorem completes the proof. ☐

Lemma A8 (Influence Function IF).

Assume Conditions A1–A8 and B4. For any fixed sample size n,

\begin{matrix} \frac{\partial}{\partial t} T ((1 - t) K_{n, 0} + t J) |_{t = t_{0}} \equiv lim_{t \to t_{0}} \frac{T ((1 - t) K_{n, 0} + t J) - T ((1 - t_{0}) K_{n, 0} + t_{0} J)}{t - t_{0}} \\ = & - H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}], \end{matrix}

where

K_{t_{0}} = (1 - t_{0}) K_{n, 0} + t_{0} J

and

t_{0}

is a positive constant such that

t_{0} \leq c / p_{n}^{2}

with

c > 0

a sufficiently small constant. In addition,

∥ H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} ∥ \leq C

uniformly for all n and

t_{0}

such that

t_{0} \leq c / p_{n}^{2}

with

c > 0

a sufficiently small constant.

Proof.

We follow the proof of Theorem 5.1 in [27]. Note that

\begin{matrix} lim_{t \to t_{0}} \frac{T ((1 - t) K_{n, 0} + t J) - T ((1 - t_{0}) K_{n, 0} + t_{0} J)}{t - t_{0}} \\ = & lim_{Δ \to 0} \frac{T (K_{t_{0}} + Δ (J - K_{n, 0})) - T (K_{t_{0}})}{Δ}, \end{matrix}

where

Δ = t - t_{0}

.

It suffices to prove that for any sequence

{Δ_{j}}_{j = 1}^{\infty}

such that

{lim}_{j \to \infty} Δ_{j} = 0

, we have

\begin{matrix} lim_{j \to \infty} \frac{T (K_{t_{0}} + Δ_{j} (J - K_{n, 0})) - T (K_{t_{0}})}{Δ_{j}} \\ = & - H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}] . \end{matrix}

Arguing as in the proof of Lemma A1, we can show that for

t_{0}

sufficiently small,

\begin{matrix} ∥ {\tilde{β}}_{n, 0} - T (K_{t_{0}}) ∥ \leq C t_{0} \sqrt{p_{n}} . \end{matrix}

(A12)

Next we will show that the eigenvalues of

H_{n, K_{t_{0}}, T (K_{t_{0}})}

are bounded away from 0.

\begin{matrix} H_{n, K_{t_{0}}, T (K_{t_{0}})} = (1 - t_{0}) H_{n, K_{n, 0}, T (K_{t_{0}})} + t_{0} H_{n, J, T (K_{t_{0}})} \\ = & (1 - t_{0}) H_{n} + t_{0} H_{n, J, {\tilde{β}}_{n, 0}} + (1 - t_{0}) {H_{n, K_{n, 0}, T (K_{t_{0}})} - H_{n}} \\ + t_{0} {H_{n, J, T (K_{t_{0}})} - H_{n, J, {\tilde{β}}_{n, 0}}} = (1 - t_{0}) I_{1} + t_{0} I_{2} + I_{3} + I_{4} . \end{matrix}

First,

\begin{matrix} ∥ I_{3} ∥ & \leq & C E_{K_{n, 0}} ∥ {p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{t_{0}})) - p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0})} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} ∥ \\ \leq & C p_{n}^{3 / 2} ∥ T (K_{t_{0}}) - {\tilde{β}}_{n, 0} ∥ \leq C p_{n}^{2} t_{0} . \end{matrix}

Similarly,

∥ t_{0} I_{2} ∥ \leq C p_{n} t_{0}

and

∥ I_{4} ∥ \leq C p_{n}^{2} t_{0}^{2}

. If

t_{0} \leq c / p_{n}^{2}

with

c \leq 1

, each of

∥ t_{0} I_{2} ∥

,

∥ I_{3} ∥

and

∥ I_{4} ∥

is bounded by C c. Since the eigenvalues of

(1 - t_{0}) I_{1}

are bounded away from zero, Weyl's inequality shows that, when c is sufficiently small, the eigenvalues of

H_{n, K_{t_{0}}, T (K_{t_{0}})}

are uniformly bounded away from 0.
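The eigenvalue step is an instance of Weyl's inequality: $\lambda_{\min}(M + E) \ge \lambda_{\min}(M) - \|E\|$ for symmetric $M$ and $E$. Here $M = (1 - t_0) I_1$ and $E = t_0 I_2 + I_3 + I_4$. The Python illustration below uses a hypothetical random matrix in place of $I_1 = H_n$ and a small symmetric perturbation in place of the remainder.

```python
import numpy as np

rng = np.random.default_rng(3)
p, t0 = 6, 0.01
X = rng.standard_normal((p, p))
H_n = X @ X.T + np.eye(p)             # stands in for I_1 = H_n (eigenvalues >= 1)
E = rng.standard_normal((p, p))
E = 0.02 * (E + E.T)                  # stands in for t0*I_2 + I_3 + I_4 (small, symmetric)

lam_min = np.linalg.eigvalsh((1 - t0) * H_n + E)[0]   # eigvalsh returns ascending eigenvalues
weyl_lower = (1 - t0) * np.linalg.eigvalsh(H_n)[0] - np.linalg.norm(E, 2)
print(f"lambda_min = {lam_min:.4f} >= Weyl lower bound {weyl_lower:.4f}")
```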

Define

K_{j} = K_{t_{0}} + Δ_{j} (J - K_{n, 0})

. Following arguments similar to those for (A6) in Lemma A2, for j large enough,

E_{K_{j}} {ψ_{RBD} (Z_{n}; T (K_{j}))} = 0

. Below we consider only such j. A two-term Taylor expansion yields

\begin{matrix} 0 = E_{K_{j}} {ψ_{RBD} (Z_{n}; T (K_{j}))} = E_{K_{j}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} + H_{n, K_{j}, {\tilde{β}}_{j}^{*}} {T (K_{j}) - T (K_{t_{0}})}, \end{matrix}

(A13)

where

{\tilde{β}}_{j}^{*}

lies between

T (K_{t_{0}})

and

T (K_{j})

.

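Thus, from (A13) and the fact that

E_{K_{j}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} = Δ_{j} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}]

(which follows from

K_{j} = K_{t_{0}} + Δ_{j} (J - K_{n, 0})

and

E_{K_{t_{0}}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} = 0

), we have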

\begin{matrix} 0 & = & E_{K_{j}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} + H_{n, K_{t_{0}}, T (K_{t_{0}})} {T (K_{j}) - T (K_{t_{0}})} \\ + {H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})}} {T (K_{j}) - T (K_{t_{0}})} \\ = & Δ_{j} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}] \\ + H_{n, K_{t_{0}}, T (K_{t_{0}})} {T (K_{j}) - T (K_{t_{0}})} + (H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})}) {T (K_{j}) - T (K_{t_{0}})}, \end{matrix}

and we obtain that

\begin{matrix} T (K_{j}) - T (K_{t_{0}}) \\ = & - Δ_{j} H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}] \\ - H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} {H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})}} {T (K_{j}) - T (K_{t_{0}})} . \end{matrix}

(A14)

Next, we will show that

∥ H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})} ∥ = o (1)

as

j \to \infty

for any fixed n. Since

∥ {\tilde{β}}_{j}^{*} - T (K_{t_{0}}) ∥ \leq ∥ T (K_{j}) - T (K_{t_{0}}) ∥ = O (Δ_{j})

,

\begin{matrix} ∥ H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, {\tilde{β}}_{j}^{*}} ∥ \\ = & Δ_{j} ∥ E_{J} {p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{j}^{*}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}} - E_{K_{n, 0}} {p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{j}^{*}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}} ∥ \\ = & O (Δ_{j}) = o (1) as j \to \infty, \end{matrix}

(A15)

and also,

\begin{matrix} ∥ H_{n, K_{t_{0}}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})} ∥ \\ = & ∥ E_{K_{t_{0}}} [{p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{j}^{*}) - p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{t_{0}}))} w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}] ∥ \\ = & o (1) as j \to \infty . \end{matrix}

(A16)

From Equations (A15) and (A16),

\begin{matrix} ∥ H_{n, K_{j}, {\tilde{β}}_{j}^{*}} - H_{n, K_{t_{0}}, T (K_{t_{0}})} ∥ = o (1) as j \to \infty \end{matrix}

which, together with Equations (A12) and (A14), implies that

\begin{matrix} ∥ T (K_{j}) - T (K_{t_{0}}) + Δ_{j} H_{n, K_{t_{0}}, T (K_{t_{0}})}^{- 1} [E_{J} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))} - E_{K_{n, 0}} {ψ_{RBD} (Z_{n}; T (K_{t_{0}}))}] ∥ = o (Δ_{j}) . \end{matrix}

This completes the proof. ☐
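Lemma A8 identifies the derivative of $t \mapsto T((1 - t)K_{n,0} + tJ)$ with $-H^{-1}[E_J\psi - E_{K_{n,0}}\psi]$. The Python sketch below checks this numerically in a deliberately simple setting that is not the robust-BD estimator itself:

- a one-dimensional Huber location M-functional;
- a hypothetical six-point distribution in place of $K_{n,0}$;
- a point mass at $z = 10$ in place of $J$.

The score is $\psi(x - \theta)$, so the scalar analogue of $H_{n,K_{t_0},T(K_{t_0})}$ is $-E_{K_{t_0}}\psi'(X - \theta)$.

```python
import numpy as np

C_HUBER = 1.345

def psi(r):
    """Huber score: identity on [-c, c], clipped to +/-c outside."""
    return np.clip(r, -C_HUBER, C_HUBER)

def dpsi(r):
    """Derivative of the Huber score (defined almost everywhere)."""
    return (np.abs(r) <= C_HUBER).astype(float)

def m_functional(points, weights, lo=-100.0, hi=100.0, iters=200):
    """Solve sum_i w_i psi(x_i - theta) = 0 by bisection (the sum is nonincreasing in theta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(weights * psi(points - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical K_{n,0} (discrete) and J = point mass at z
x0 = np.array([-1.2, -0.3, 0.1, 0.8, 1.5, 2.9])
w0 = np.full(x0.size, 1.0 / x0.size)
z = 10.0

def T_path(t):
    """T((1 - t) K_{n,0} + t J)."""
    return m_functional(np.append(x0, z), np.append((1.0 - t) * w0, t))

t0, h = 0.05, 1e-6
finite_diff = (T_path(t0 + h) - T_path(t0 - h)) / (2.0 * h)

theta = T_path(t0)
# Scalar H = d/d theta E_{K_{t0}} psi(X - theta) = -E_{K_{t0}} psi'(X - theta)
H = -((1.0 - t0) * np.sum(w0 * dpsi(x0 - theta)) + t0 * dpsi(z - theta))
formula = -(psi(z - theta) - np.sum(w0 * psi(x0 - theta))) / H
print(f"finite difference {finite_diff:.6f}, Lemma A8-type formula {formula:.6f}")
```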

Lemma A9.

Assume Conditions A1–A8 and B4 and

{sup}_{n} E_{J} (∥ w (X_{n}) {\tilde{X}}_{n} ∥) \leq C

. Let

H_{k} (\cdot; δ)

be the cumulative distribution function of

χ_{k}^{2} (δ)

distribution with δ the noncentrality parameter. Denote

δ (ϵ) = n ∥ U_{n}^{- 1 / 2} {A_{n} T (K_{n, ϵ}) - g_{0}} ∥^{2}

. Let

b (ϵ) = - H_{k} (x; δ (ϵ))

. Then, for any fixed

x > 0

,

{sup}_{ϵ \in [0, C]} {lim sup}_{n \to \infty} | b^{(3)} (ϵ) | \leq C

under

H_{0}

and

{sup}_{ϵ \in [0, C]} {lim sup}_{n \to \infty} | b^{″} (ϵ) | \leq C

under

H_{1 n}

.

Proof.

Since

b (ϵ) = - H_{k} (x; δ (ϵ))

, we have

\begin{matrix} b^{'} (ϵ) & = & - \frac{\partial}{\partial ϵ} H_{k} (x; δ (ϵ)) = \{- \frac{\partial}{\partial δ} H_{k} (x; δ) |_{δ = δ (ϵ)}\} \{\frac{\partial δ (ϵ)}{\partial ϵ}\} \\ b^{″} (ϵ) & = & \{- \frac{\partial^{2}}{\partial δ^{2}} H_{k} (x; δ) |_{δ = δ (ϵ)}\} {\{\frac{\partial δ (ϵ)}{\partial ϵ}\}}^{2} + \{- \frac{\partial}{\partial δ} H_{k} (x; δ) |_{δ = δ (ϵ)}\} \{\frac{\partial^{2} δ (ϵ)}{\partial ϵ^{2}}\} \\ b^{(3)} (ϵ) & = & \{- \frac{\partial^{3}}{\partial δ^{3}} H_{k} (x; δ) |_{δ = δ (ϵ)}\} {\{\frac{\partial δ (ϵ)}{\partial ϵ}\}}^{3} \\ + 3 \{- \frac{\partial^{2}}{\partial δ^{2}} H_{k} (x; δ) |_{δ = δ (ϵ)}\} \{\frac{\partial δ (ϵ)}{\partial ϵ}\} \{\frac{\partial^{2} δ (ϵ)}{\partial ϵ^{2}}\} \\ + \{- \frac{\partial}{\partial δ} H_{k} (x; δ) |_{δ = δ (ϵ)}\} \{\frac{\partial^{3} δ (ϵ)}{\partial ϵ^{3}}\} . \end{matrix}
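The display for $b^{(3)}$ is the third-order chain rule (Faà di Bruno's formula), including the coefficient 3 on the mixed term. One quick check, which assumes nothing about $H_k$, substitutes arbitrary smooth functions. The Python sketch below uses two hypothetical stand-ins:

- $f(\delta) = e^{-\delta}$ in place of $-H_k(x; \delta)$;
- $\delta(\epsilon) = 1 + \epsilon^2 + \epsilon^3$ for the noncentrality path.

It then compares the formula with a central finite difference.

```python
import math

def third_derivative_chain(f1, f2, f3, d1, d2, d3):
    """b''' for b(eps) = f(delta(eps)): f''' d1^3 + 3 f'' d1 d2 + f' d3."""
    return f3 * d1 ** 3 + 3.0 * f2 * d1 * d2 + f1 * d3

def delta(e):
    """Hypothetical smooth noncentrality path."""
    return 1.0 + e ** 2 + e ** 3

def b(e):
    """Hypothetical b(eps) = f(delta(eps)) with f(d) = exp(-d)."""
    return math.exp(-delta(e))

eps, h = 0.3, 1e-3
d1, d2, d3 = 2 * eps + 3 * eps ** 2, 2 + 6 * eps, 6.0
fval = math.exp(-delta(eps))
f1, f2, f3 = -fval, fval, -fval       # derivatives of exp(-d) at d = delta(eps)

analytic = third_derivative_chain(f1, f2, f3, d1, d2, d3)
finite_diff = (b(eps + 2 * h) - 2 * b(eps + h) + 2 * b(eps - h) - b(eps - 2 * h)) / (2 * h ** 3)
print(f"chain rule {analytic:.6f}, finite difference {finite_diff:.6f}")
```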

To complete the proof, we only need to show that

\partial^{i} H_{k} (x; δ) / \partial δ^{i} |_{δ = δ (ϵ)}

and

\partial^{i} δ (ϵ) / \partial ϵ^{i}

(

i = 1, 2, 3

) are bounded as

n \to \infty

for all

ϵ \in [0, C]

. Note that

\begin{matrix} H_{k} (x; δ) = e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{γ (j + k / 2, x / 2)}{Γ (j + k / 2)}, \end{matrix}

where

Γ (\cdot)

is the Gamma function, and

γ (\cdot, \cdot)

is the lower incomplete gamma function

γ (s, x) = \int_{0}^{x} t^{s - 1} e^{- t} d t,

which satisfies

γ (s, x) = (s - 1) γ (s - 1, x) - x^{s - 1} e^{- x} .
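The recurrence follows from integrating the definition of $\gamma(s, x)$ by parts. Dividing by $\Gamma(s)$ gives the regularized form used in the next display. The Python sketch below checks both forms using composite Simpson quadrature. The values $s = 3.5$ and $x = 2$ are arbitrary, and the quadrature helper is our own, not part of the paper.

```python
import math
import numpy as np

def lower_gamma(s, x, n=20000):
    """gamma(s, x) = int_0^x t^(s-1) e^(-t) dt via composite Simpson's rule (n even, s >= 1)."""
    t = np.linspace(0.0, x, n + 1)
    f = t ** (s - 1.0) * np.exp(-t)
    h = x / n
    return h / 3.0 * (f[0] + f[-1] + 4.0 * f[1:-1:2].sum() + 2.0 * f[2:-1:2].sum())

s, x = 3.5, 2.0
lhs = lower_gamma(s, x)
rhs = (s - 1.0) * lower_gamma(s - 1.0, x) - x ** (s - 1.0) * math.exp(-x)
print(f"gamma(s, x) = {lhs:.8f}, recurrence gives {rhs:.8f}")

# Regularized form: P(s, x) = P(s - 1, x) - x^(s-1) e^(-x) / Gamma(s)
def reg(a):
    return lower_gamma(a, x) / math.gamma(a)

reg_lhs = reg(s)
reg_rhs = reg(s - 1.0) - x ** (s - 1.0) * math.exp(-x) / math.gamma(s)
print(f"P(s, x) = {reg_lhs:.8f}, regularized recurrence gives {reg_rhs:.8f}")
```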

Therefore,

\begin{matrix} \frac{\partial}{\partial δ} H_{k} (x; δ) & = & - \frac{e^{- δ / 2}}{2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{γ (j + k / 2, x / 2)}{Γ (j + k / 2)} + \frac{e^{- δ / 2}}{2} \sum_{j = 1}^{\infty} \frac{{(δ / 2)}^{j - 1}}{(j - 1)!} \frac{γ (j + k / 2, x / 2)}{Γ (j + k / 2)} \\ = & \frac{1}{2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \{- \frac{γ (j + k / 2, x / 2)}{Γ (j + k / 2)} + \frac{γ (j + 1 + k / 2, x / 2)}{Γ (j + 1 + k / 2)}\} . \end{matrix}

Since

\begin{matrix} \frac{γ (j + 1 + k / 2, x / 2)}{Γ (j + 1 + k / 2)} & = & \frac{(j + k / 2) γ (j + k / 2, x / 2) - {(x / 2)}^{j + k / 2} e^{- x / 2}}{Γ (j + 1 + k / 2)} \\ = & \frac{γ (j + k / 2, x / 2)}{Γ (j + k / 2)} - \frac{{(x / 2)}^{j + k / 2} e^{- x / 2}}{Γ (j + 1 + k / 2)}, \end{matrix}

we have

\begin{matrix} \frac{\partial}{\partial δ} H_{k} (x; δ) & = & - \frac{1}{2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j + k / 2} e^{- x / 2}}{Γ (j + 1 + k / 2)} \\ \frac{\partial^{2}}{\partial δ^{2}} H_{k} (x; δ) & = & \frac{1}{4} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j + k / 2} e^{- x / 2}}{Γ (j + 1 + k / 2)} - \frac{1}{4} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j + 1 + k / 2} e^{- x / 2}}{Γ (j + 2 + k / 2)} \\ = & \frac{1}{4} {(x / 2)}^{k / 2} e^{- x / 2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \{\frac{{(x / 2)}^{j}}{Γ (j + 1 + k / 2)} - \frac{{(x / 2)}^{j + 1}}{Γ (j + 2 + k / 2)}\} \\ = & \frac{1}{4} {(x / 2)}^{k / 2} e^{- x / 2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j}}{Γ (j + 1 + k / 2)} \{1 - \frac{(x / 2)}{j + 1 + k / 2}\} \\ \frac{\partial^{3}}{\partial δ^{3}} H_{k} (x; δ) & = & - \frac{1}{8} {(x / 2)}^{k / 2} e^{- x / 2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j}}{Γ (j + 1 + k / 2)} \{1 - \frac{(x / 2)}{j + 1 + k / 2}\} \\ + \frac{1}{8} {(x / 2)}^{k / 2} e^{- x / 2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j + 1}}{Γ (j + 2 + k / 2)} \{1 - \frac{(x / 2)}{j + 2 + k / 2}\} \\ = & \frac{1}{8} {(x / 2)}^{k / 2} e^{- x / 2} e^{- δ / 2} \sum_{j = 0}^{\infty} \frac{{(δ / 2)}^{j}}{j!} \frac{{(x / 2)}^{j}}{Γ (j + 1 + k / 2)} \\ \cdot [\frac{(x / 2)}{j + 1 + k / 2} \{1 - \frac{(x / 2)}{j + 2 + k / 2}\} - \{1 - \frac{(x / 2)}{j + 1 + k / 2}\}] . \end{matrix}
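The closed forms for $\partial H_k/\partial\delta$ and $\partial^2 H_k/\partial\delta^2$ can be checked against finite differences of the Poisson-mixture series for $H_k(x; \delta)$. The Python sketch below truncates every series at 80 terms. This is an assumption on our part, but it is ample for the moderate $x$ and $\delta$ used here. The sketch evaluates the regularized incomplete gamma function by its power series. It also confirms that $H_2(x; 0) = 1 - e^{-x/2}$, the central $\chi^2_2$ distribution function.

```python
import math

def reg_lower_gamma(s, x, terms=80):
    """P(s, x) = gamma(s, x)/Gamma(s) = x^s e^{-x} sum_m x^m / Gamma(s + m + 1), for x > 0."""
    return sum(math.exp((s + m) * math.log(x) - x - math.lgamma(s + m + 1.0)) for m in range(terms))

def poisson_weights(lam, terms=80):
    """e^{-lam} lam^j / j!, j = 0, ..., terms - 1, built recursively (safe for lam = 0)."""
    w = [math.exp(-lam)]
    for j in range(1, terms):
        w.append(w[-1] * lam / j)
    return w

def H(x, k, delta):
    """Noncentral chi-square CDF H_k(x; delta) as a Poisson(delta/2) mixture."""
    return sum(w * reg_lower_gamma(j + k / 2.0, x / 2.0)
               for j, w in enumerate(poisson_weights(delta / 2.0)))

def dH(x, k, delta):
    """Closed form for dH_k/d delta from the proof."""
    hx = x / 2.0
    return -0.5 * sum(w * math.exp((j + k / 2.0) * math.log(hx) - hx - math.lgamma(j + 1.0 + k / 2.0))
                      for j, w in enumerate(poisson_weights(delta / 2.0)))

def d2H(x, k, delta):
    """Closed form for d^2 H_k / d delta^2 from the proof."""
    hx, total = x / 2.0, 0.0
    for j, w in enumerate(poisson_weights(delta / 2.0)):
        s = j + 1.0 + k / 2.0
        total += w * math.exp(j * math.log(hx) - math.lgamma(s)) * (1.0 - hx / s)
    return 0.25 * hx ** (k / 2.0) * math.exp(-hx) * total

x, k, delta, h = 5.0, 3, 1.5, 1e-4
fd1 = (H(x, k, delta + h) - H(x, k, delta - h)) / (2 * h)
fd2 = (dH(x, k, delta + h) - dH(x, k, delta - h)) / (2 * h)
print(f"dH:  closed form {dH(x, k, delta):.8f}, finite difference {fd1:.8f}")
print(f"d2H: closed form {d2H(x, k, delta):.8f}, finite difference {fd2:.8f}")
```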

By Lemma A3,

| δ (ϵ) |

is bounded as

n \to \infty

for all

ϵ \in [0, C]

under both

H_{0}

and

H_{1 n}

; hence, by the series representations above, so are

\partial^{i} H_{k} (x; δ) / \partial δ^{i} |_{δ = δ (ϵ)}

(

i = 1, 2, 3

). Now, we consider the derivatives of

δ (ϵ)

,

\begin{matrix} \frac{\partial δ (ϵ)}{\partial ϵ} & = & 2 n {\{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\}}^{T} U_{n}^{- 1} {A_{n} T (K_{n, ϵ}) - g_{0}} \\ \frac{\partial^{2} δ (ϵ)}{\partial ϵ^{2}} & = & 2 n {\{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\}}^{T} U_{n}^{- 1} \{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\} \\ + 2 n {\{A_{n} \frac{\partial^{2} T (K_{n, ϵ})}{\partial ϵ^{2}}\}}^{T} U_{n}^{- 1} {A_{n} T (K_{n, ϵ}) - g_{0}} \\ \frac{\partial^{3} δ (ϵ)}{\partial ϵ^{3}} & = & 6 n {\{A_{n} \frac{\partial^{2} T (K_{n, ϵ})}{\partial ϵ^{2}}\}}^{T} U_{n}^{- 1} \{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\} \\ + 2 n {\{A_{n} \frac{\partial^{3} T (K_{n, ϵ})}{\partial ϵ^{3}}\}}^{T} U_{n}^{- 1} {A_{n} T (K_{n, ϵ}) - g_{0}} . \end{matrix}
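The three derivatives of $\delta(\epsilon)$ can be sanity-checked in the same way. In the Python sketch below, a hypothetical cubic path $\epsilon \mapsto T(\epsilon)$ replaces $T(K_{n,\epsilon})$. Random matrices stand in for $A_n$, $U_n^{-1}$ and $g_0$. The analytic expressions from the display are compared with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(4)
k, p, n = 2, 4, 50
A = rng.standard_normal((k, p))
W = rng.standard_normal((k, k))
U_inv = np.linalg.inv(W @ W.T + np.eye(k))       # stands in for U_n^{-1} (symmetric)
g0 = rng.standard_normal(k)
coef = rng.standard_normal((4, p)) / np.sqrt(n)  # hypothetical path T(eps) = sum_r coef[r] eps^r

def T(e):  return coef[0] + coef[1] * e + coef[2] * e**2 + coef[3] * e**3
def T1(e): return coef[1] + 2 * coef[2] * e + 3 * coef[3] * e**2
def T2(e): return 2 * coef[2] + 6 * coef[3] * e
def T3(e): return 6 * coef[3]

def delta(e):
    r = A @ T(e) - g0
    return n * r @ U_inv @ r

def delta_derivs(e):
    """First three derivatives of delta(eps), following the display in the proof."""
    r, a1, a2, a3 = A @ T(e) - g0, A @ T1(e), A @ T2(e), A @ T3(e)
    d1 = 2 * n * a1 @ U_inv @ r
    d2 = 2 * n * a1 @ U_inv @ a1 + 2 * n * a2 @ U_inv @ r
    d3 = 6 * n * a2 @ U_inv @ a1 + 2 * n * a3 @ U_inv @ r
    return d1, d2, d3

e, h = 0.7, 1e-3
fd = ((delta(e + h) - delta(e - h)) / (2 * h),
      (delta(e + h) - 2 * delta(e) + delta(e - h)) / h**2,
      (delta(e + 2 * h) - 2 * delta(e + h) + 2 * delta(e - h) - delta(e - 2 * h)) / (2 * h**3))
analytic = delta_derivs(e)
print("analytic:", np.round(analytic, 6), " finite differences:", np.round(fd, 6))
```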

To complete the proof, we only need to show that

\sqrt{n} ∥ \partial^{i} / \partial ϵ^{i} T (K_{n, ϵ}) ∥

(

i = 1, 2, 3

) are bounded as

n \to \infty

for all

ϵ \in [0, C]

, and

\sqrt{n} ∥ A_{n} T (K_{n, ϵ}) - g_{0} ∥

is bounded under

H_{0}

and

H_{1 n}

as

n \to \infty

for all

ϵ \in [0, C]

. The result for

\sqrt{n} ∥ A_{n} T (K_{n, ϵ}) - g_{0} ∥

is straightforward from Lemma A3.

First, for the first order derivative of

T (K_{n, ϵ})

,

\begin{matrix} \sqrt{n} \frac{\partial}{\partial ϵ} T (K_{n, ϵ}) \\ = & - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] . \end{matrix}

Since

∥ H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} ∥ \leq C

,

∥ E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} ∥ \leq C E_{J} ∥ w (X_{n}) {\tilde{X}}_{n} ∥ \leq C

and

\begin{matrix} ∥ E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} ∥ \\ = & ∥ E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n}} ∥ \\ = & ∥ E_{K_{n, 0}} [p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}^{*}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}}] ∥ \\ \leq & C p_{n}^{3 / 2} / \sqrt{n}, \end{matrix}

we conclude that

\sqrt{n} ∥ \partial / \partial ϵ T (K_{n, ϵ}) ∥

is uniformly bounded for all

ϵ \in [0, C]

as

n \to \infty

.

Second, for the second order derivative of

T (K_{n, ϵ})

,

\begin{matrix} \sqrt{n} \frac{\partial^{2}}{\partial ϵ^{2}} T ((1 - ϵ / \sqrt{n}) K_{n, 0} + ϵ / \sqrt{n} J) \\ = & - \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}}{\partial ϵ} \\ \cdot [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] \\ - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \\ \cdot \frac{\partial}{\partial ϵ} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] \end{matrix}

with

\begin{matrix} \frac{\partial}{\partial ϵ} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} & = & - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}}{\partial ϵ} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}, \\ \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}}{\partial ϵ} & = & - \frac{1}{\sqrt{n}} E_{K_{n, 0}} {p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}} \\ + \frac{1}{\sqrt{n}} E_{J} {p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}} \\ + (1 - ϵ / \sqrt{n}) E_{K_{n, 0}} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} \\ + ϵ / \sqrt{n} E_{J} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} . \end{matrix}

Therefore,

∥ \partial / \partial ϵ H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} ∥ \leq C ∥ \partial / \partial ϵ H_{n, K_{n, ϵ}, T (K_{n, ϵ})} ∥ \leq C p_{n}^{3 / 2} / \sqrt{n}

. In addition,

\begin{matrix} ∥ \frac{\partial}{\partial ϵ} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] ∥ \\ = & ∥ E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} \\ - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} ∥ \\ \leq & C p_{n} / \sqrt{n} . \end{matrix}

Therefore,

∥ \sqrt{n} \frac{\partial^{2}}{\partial ϵ^{2}} T ((1 - ϵ / \sqrt{n}) K_{n, 0} + ϵ / \sqrt{n} J) ∥ = o (1)

for all

ϵ \in [0, C]

.

Finally, for the third order derivative of

T (K_{n, ϵ})

,

\begin{matrix} \sqrt{n} \frac{\partial^{3}}{\partial ϵ^{3}} T ((1 - ϵ / \sqrt{n}) K_{n, 0} + ϵ / \sqrt{n} J) \\ = & - \frac{\partial^{2} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}}{\partial ϵ^{2}} \\ \cdot [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] \\ - 2 \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}}{\partial ϵ} \\ \cdot \frac{\partial}{\partial ϵ} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] \\ - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \\ \cdot \frac{\partial^{2}}{\partial ϵ^{2}} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] . \end{matrix}

Note:

\begin{matrix} \frac{\partial^{2}}{\partial ϵ^{2}} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \\ = & - \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}}{\partial ϵ} \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}}{\partial ϵ} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \\ - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \frac{\partial^{2} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}}{\partial ϵ^{2}} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \\ - H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}}{\partial ϵ} \frac{\partial H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1}}{\partial ϵ}, \end{matrix}

where

\begin{matrix} \frac{\partial^{2}}{\partial ϵ^{2}} H_{n, K_{n, ϵ}, T (K_{n, ϵ})} \\ = & - \frac{2}{\sqrt{n}} E_{K_{n, 0}} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} \\ + \frac{2}{\sqrt{n}} E_{J} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} \\ + (1 - ϵ / \sqrt{n}) E_{K_{n, 0}} \{p_{4} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {({\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ}))}^{2}\} \\ + ϵ / \sqrt{n} E_{J} \{p_{4} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} {({\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ}))}^{2}\} . \end{matrix}

Hence,

∥ \frac{\partial^{2}}{\partial ϵ^{2}} H_{n, K_{n, ϵ}, T (K_{n, ϵ})} ∥ \leq C p_{n}^{2} / n

which implies that

∥ \frac{\partial^{2}}{\partial ϵ^{2}} H_{n, K_{n, ϵ}, T (K_{n, ϵ})}^{- 1} ∥ = o (1)

for all

ϵ \in [0, C]

. In addition,

\begin{matrix} \frac{\partial^{2}}{\partial ϵ^{2}} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] \\ = & \frac{\partial}{\partial ϵ} [E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\} \\ - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ})\}] \\ = & E_{J} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {({\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ}))}^{2}\} \\ + E_{J} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial^{2}}{\partial ϵ^{2}} T (K_{n, ϵ})\} \\ - E_{K_{n, 0}} \{p_{3} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {({\tilde{X}}_{n}^{T} \frac{\partial}{\partial ϵ} T (K_{n, ϵ}))}^{2}\} \\ - E_{K_{n, 0}} \{p_{2} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T} \frac{\partial^{2}}{\partial ϵ^{2}} T (K_{n, ϵ})\} . \end{matrix}

Hence,

∥ \frac{\partial^{2}}{\partial ϵ^{2}} [E_{J} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}} - E_{K_{n, 0}} {p_{1} (Y; {\tilde{X}}_{n}^{T} T (K_{n, ϵ})) w (X_{n}) {\tilde{X}}_{n}}] ∥ \leq C p_{n} / \sqrt{n}

. Therefore,

∥ \sqrt{n} \frac{\partial^{3}}{\partial ϵ^{3}} T ((1 - ϵ / \sqrt{n}) K_{n, 0} + ϵ / \sqrt{n} J) ∥ = o (1)

for all

ϵ \in [0, C]

. Hence, we complete the proof. ☐
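The proof used two derivative formulas for $H_{n,K_{n,\epsilon},T(K_{n,\epsilon})}^{-1}$:

- $(H^{-1})' = -H^{-1}H'H^{-1}$;
- $(H^{-1})'' = -(H^{-1})'H'H^{-1} - H^{-1}H''H^{-1} - H^{-1}H'(H^{-1})'$.

Both hold for any smooth path of invertible matrices. The Python sketch below is a minimal check along a hypothetical quadratic path $\epsilon \mapsto H(\epsilon)$ of symmetric positive-definite matrices.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4
X = rng.standard_normal((p, p))
H0 = X @ X.T + p * np.eye(p)
S1, S2 = rng.standard_normal((2, p, p))
H1, H2 = 0.3 * (S1 + S1.T), 0.1 * (S2 + S2.T)
inv = np.linalg.inv

def H(e):   return H0 + e * H1 + e**2 * H2   # hypothetical smooth path
def dH(e):  return H1 + 2 * e * H2
def d2H(e): return 2 * H2

def d_inv(e):
    """d/d eps H^{-1} = -H^{-1} H' H^{-1}."""
    Hi = inv(H(e))
    return -Hi @ dH(e) @ Hi

def d2_inv(e):
    """d^2/d eps^2 H^{-1} = -(H^{-1})' H' H^{-1} - H^{-1} H'' H^{-1} - H^{-1} H' (H^{-1})'."""
    Hi, Di = inv(H(e)), d_inv(e)
    return -Di @ dH(e) @ Hi - Hi @ d2H(e) @ Hi - Hi @ dH(e) @ Di

e, h = 0.4, 1e-3
fd1 = (inv(H(e + h)) - inv(H(e - h))) / (2 * h)
fd2 = (inv(H(e + h)) - 2 * inv(H(e)) + inv(H(e - h))) / h**2
print(np.abs(d_inv(e) - fd1).max(), np.abs(d2_inv(e) - fd2).max())
```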

Proof of Theorem 1.

We follow the idea of the proof in [10]. Lemma A7 implies that the Wald-type test statistic

W_{n}

is asymptotically noncentral

χ_{k}^{2}

with noncentrality parameter

δ (ϵ) = n ∥ U_{n}^{- 1 / 2} {A_{n} T (K_{n, ϵ}) - g_{0}} ∥^{2}

. Therefore,

α (K_{n, ϵ}) = P (W_{n} > η_{_{1 - α_{_{0}}}} | H_{0}) = 1 - H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ)) + h (n, ϵ)

where

h (n, ϵ) = α (K_{n, ϵ}) - 1 + H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ)) \to 0

as

n \to \infty

for any fixed

ϵ

. Let

b (ϵ) = - H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ))

. Then, for

ϵ

close to 0, we have

\begin{matrix} α (K_{n, ϵ}) - α_{_{0}} = b (ϵ) - b (0) + h (n, ϵ) - h (n, 0) \\ = & ϵ b^{'} (0) + \frac{1}{2} ϵ^{2} b^{″} (0) + \frac{1}{6} ϵ^{3} b^{(3)} (ϵ^{*}) + h (n, ϵ) - h (n, 0), \end{matrix}

(A17)

where

0 < ϵ^{*} < ϵ

. Note that under

H_{0}

\begin{matrix} b^{'} (0) = μ_{k} \{\frac{\partial δ (ϵ)}{\partial ϵ} |_{ϵ = 0}\} = 2 μ_{k} n {\{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\}}^{T} |_{ϵ = 0} U_{n}^{- 1} {A_{n} {\tilde{β}}_{n, 0} - g_{0}} = 0 . \end{matrix}

From Lemma A8, under

H_{0}

\begin{matrix} \frac{\partial T (K_{n, ϵ})}{\partial ϵ} |_{ϵ = 0} = 1 / \sqrt{n} E_{J} {IF (Z_{n}; T, K_{n, 0})} . \end{matrix}

Thus,

\begin{matrix} b^{″} (0) & = & μ_{k} \{\frac{\partial^{2} δ (ϵ)}{\partial ϵ^{2}} |_{ϵ = 0}\} = 2 μ_{k} {∥ U_{n}^{- 1 / 2} A_{n} E_{J} {IF (Z_{n}; T, K_{n, 0})} ∥}^{2} . \end{matrix}

Since from Lemma A8,

IF (z_{n}; T, K_{n, 0}) = - H_{n}^{- 1} ψ_{RBD} (z_{n}; {\tilde{β}}_{n, 0})

is uniformly bounded, we have

\begin{matrix} D = \underset{n \to \infty}{lim sup} {∥ U_{n}^{- 1 / 2} A_{n} E_{J} {IF (Z_{n}; T, K_{n, 0})} ∥}^{2} < \infty . \end{matrix}

From Equation (A17)

\begin{matrix} \underset{n \to \infty}{lim sup} α (K_{n, ϵ}) = α_{_{0}} + ϵ^{2} μ_{k} D + o (ϵ^{2}), \end{matrix}

since

{sup}_{ϵ \in [0, C]} {lim sup}_{n \to \infty} | b^{(3)} (ϵ) | \leq C

from Lemma A9. This completes the proof. ☐
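The second-order expansion in Theorem 1 can be illustrated numerically with $k = 2$. In that case the critical value has the closed form $\eta_{1-\alpha_0} = -2\log\alpha_0$, and $\mu_k = -\partial H_2(\eta; \delta)/\partial\delta$ at $\delta = 0$ equals $(\eta/4)e^{-\eta/2}$. The Python sketch below compares the limiting level $1 - H_2(\eta; \epsilon^2 D)$ with $\alpha_0 + \epsilon^2\mu_k D$. Two inputs are assumptions for illustration:

- the value $D = 1.7$ is hypothetical;
- $\delta(\epsilon) = \epsilon^2 D$ is the second-order form of the noncentrality under $H_0$.

```python
import math

def ncx2_cdf(x, k, delta, terms=80):
    """H_k(x; delta) via the Poisson(delta/2) mixture series used in Lemma A9."""
    hx, w, total = x / 2.0, math.exp(-delta / 2.0), 0.0
    for j in range(terms):
        s = j + k / 2.0
        # regularized lower incomplete gamma P(s, x/2) by its power series
        p_s = sum(math.exp((s + m) * math.log(hx) - hx - math.lgamma(s + m + 1.0))
                  for m in range(terms))
        total += w * p_s
        w *= (delta / 2.0) / (j + 1.0)              # next Poisson weight
    return total

alpha0, k = 0.05, 2
eta = -2.0 * math.log(alpha0)                       # P(chi^2_2 > eta) = exp(-eta/2) = alpha0
mu_k = 0.5 * (eta / 2.0) * math.exp(-eta / 2.0)     # -dH_2/d delta at delta = 0
D = 1.7                                             # hypothetical value of D

levels = {}
for eps in (0.0, 0.05, 0.1, 0.2):
    exact = 1.0 - ncx2_cdf(eta, k, eps ** 2 * D)    # limiting level with delta(eps) = eps^2 D
    approx = alpha0 + eps ** 2 * mu_k * D           # second-order expansion
    levels[eps] = (exact, approx)
    print(f"eps={eps:4.2f}  level={exact:.6f}  alpha0 + eps^2 mu_k D = {approx:.6f}")
```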

Proof of Corollary 1.

For Part

(i)

, following the proof of Theorem 1, for any fixed

z

,

lim_{n \to \infty} α (K_{n, ϵ}) = α_{_{0}} + ϵ^{2} μ_{k} {∥ U^{- 1 / 2} A IF (z; T, K_{0}) ∥}^{2} + d (z, ϵ),

where

d (z, ϵ) = o (ϵ^{2})

. From the assumption that

{sup}_{x \in R^{p}} ∥ w (x) x ∥ \leq C

and

{sup}_{μ \in R} | q^{″} (μ) \sqrt{V (μ)} / F^{'} (μ) | \leq C

, we know

D_{1} < \infty

. Following the proof of Lemma A9,

{sup}_{z \in R} | d (z, ϵ) | = o (ϵ^{2})

. This completes the proof of part

(i)

.

Part

(ii)

is straightforward by applying Theorem 1 with

J = Δ_{z_{n}}

. ☐

Proof of Theorem 2.

Lemma A7 implies that

\begin{matrix} \sqrt{n} [{U (K_{n})}^{- 1 / 2} {A_{n} T (K_{n}) - g_{0}} - {U (K_{n})}^{- 1 / 2} (A_{n} {\tilde{β}}_{n, 0} - g_{0}) \\ - U_{n}^{- 1 / 2} A_{n} {T (K_{n, ϵ}) - {\tilde{β}}_{n, 0}}] \overset{L}{⟶} N (0, I_{k}) . \end{matrix}

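Since

\sqrt{n} (A_{n} {\tilde{β}}_{n, 0} - g_{0}) = c

under

H_{1 n}

and, by Lemmas A5 and A6,

∥ {U (K_{n})}^{- 1 / 2} - U_{n}^{- 1 / 2} ∥ = o_{P} (1)

, it follows that

\begin{matrix} \sqrt{n} [{U (K_{n})}^{- 1 / 2} {A_{n} T (K_{n}) - g_{0}} - U_{n}^{- 1 / 2} {A_{n} T (K_{n, ϵ}) - g_{0}}] \overset{L}{⟶} N (0, I_{k}) . \end{matrix}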

Then,

W_{n}

is asymptotically

χ_{k}^{2} (δ (ϵ))

with

δ (ϵ) = n ∥ U_{n}^{- 1 / 2} {A_{n} T (K_{n, ϵ}) - g_{0}} ∥^{2}

under

H_{1 n}

. Therefore,

β (K_{n, ϵ}) = P (W_{n} > η_{_{1 - α_{_{0}}}} | H_{1 n}) = 1 - H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ)) + h (n, ϵ)

, where

h (n, ϵ) = β (K_{n, ϵ}) - 1 + H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ)) \to 0

as

n \to \infty

for any fixed

ϵ

. Let

b (ϵ) = - H_{k} (η_{_{1 - α_{_{0}}}}; δ (ϵ))

. Then, for

ϵ

close to 0, we have

\begin{matrix} β (K_{n, ϵ}) - β_{0} = b (ϵ) - b (0) + h (n, ϵ) - h (n, 0) \\ = & ϵ b^{'} (0) + \frac{1}{2} ϵ^{2} b^{″} (ϵ^{*}) + h (n, ϵ) - h (n, 0), \end{matrix}

(A18)

where

0 < ϵ^{*} < ϵ

. Note that under

H_{1 n}

,

δ (0) = n ∥ U_{n}^{- 1 / 2} (A_{n} {\tilde{β}}_{n, 0} - g_{0}) ∥^{2} = c^{T} U_{n}^{- 1} c .

Then,

\begin{matrix} b^{'} (0) & = & \frac{- \partial H_{k} (η_{_{1 - α_{_{0}}}}; δ)}{\partial δ} |_{δ = δ (0)} \frac{\partial δ (ϵ)}{\partial ϵ} |_{ϵ = 0} = 2 ν_{k} n {\{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\}}^{T} |_{ϵ = 0} U_{n}^{- 1} {A_{n} {\tilde{β}}_{n, 0} - g_{0}} \\ = & 2 ν_{k} \sqrt{n} {\{A_{n} \frac{\partial T (K_{n, ϵ})}{\partial ϵ}\}}^{T} |_{ϵ = 0} U_{n}^{- 1} c . \end{matrix}

From Lemma A8,

\begin{matrix} \frac{\partial T (K_{n, ϵ})}{\partial ϵ} |_{ϵ = 0} = 1 / \sqrt{n} E_{J} {IF (Z_{n}; T, K_{n, 0})}, \end{matrix}

and hence,

\begin{matrix} b^{'} (0) = 2 ν_{k} c^{T} U_{n}^{- 1} A_{n} E_{J} {IF (Z_{n}; T, K_{n, 0})} . \end{matrix}

Since

{sup}_{ϵ \in [0, C]} {lim sup}_{n \to \infty} | b^{″} (ϵ) | \leq C

under

H_{1 n}

by Lemma A9, we have

{lim inf}_{n \to \infty} 1 / 2 ϵ^{2} b^{″} (ϵ^{*}) = o (ϵ)

as

ϵ \to 0

.

Since from Lemma A8,

IF (z_{n}; T, K_{n, 0}) = - H_{n}^{- 1} ψ_{RBD} (z_{n}; {\tilde{β}}_{n, 0})

is uniformly bounded,

\begin{matrix} | B | = | \underset{n \to \infty}{lim inf} 2 c^{T} U_{n}^{- 1} A_{n} E_{J} {IF (Z_{n}; T, K_{n, 0})} | < \infty . \end{matrix}

Combining these bounds with Equation (A18) completes the proof. ☐
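The first-order expansion in Theorem 2 can be illustrated in the same way. Under $H_{1n}$, to first order in the contamination, $\sqrt{n}\{A_n T(K_{n,\epsilon}) - g_0\} \approx c + \epsilon v$, where $v$ is the limit of $A_n E_J\{IF(Z_n; T, K_{n,0})\}$. Hence $\delta(\epsilon) \approx (c + \epsilon v)^T U^{-1}(c + \epsilon v)$. The Python sketch below uses hypothetical values of $k = 2$, $U = I_2$, $c$ and $v$. It compares the resulting power $1 - H_2(\eta; \delta(\epsilon))$ with the first-order approximation $\beta_0 + \epsilon\nu_k B$, where, as in the proof, $\nu_k = -\partial H_k(\eta; \delta)/\partial\delta$ at $\delta(0)$ and $B = 2c^T U^{-1}v$.

```python
import math
import numpy as np

def poisson_weights(lam, terms=80):
    """e^{-lam} lam^j / j!, j = 0, ..., terms - 1."""
    w = [math.exp(-lam)]
    for j in range(1, terms):
        w.append(w[-1] * lam / j)
    return w

def H(x, k, delta, terms=80):
    """Noncentral chi-square CDF H_k(x; delta) via the Poisson-mixture series."""
    hx, total = x / 2.0, 0.0
    for j, w in enumerate(poisson_weights(delta / 2.0, terms)):
        s = j + k / 2.0
        total += w * sum(math.exp((s + m) * math.log(hx) - hx - math.lgamma(s + m + 1.0))
                         for m in range(terms))
    return total

def dH(x, k, delta, terms=80):
    """Closed form of dH_k(x; delta)/d delta from the proof of Lemma A9."""
    hx = x / 2.0
    return -0.5 * sum(w * math.exp((j + k / 2.0) * math.log(hx) - hx - math.lgamma(j + 1.0 + k / 2.0))
                      for j, w in enumerate(poisson_weights(delta / 2.0, terms)))

alpha0, k = 0.05, 2
eta = -2.0 * math.log(alpha0)           # chi^2_2 critical value
U_inv = np.eye(k)                       # hypothetical limit of U_n^{-1}
c = np.array([2.0, 1.0])                # hypothetical local alternative
v = np.array([1.0, -0.5])               # hypothetical limit of A_n E_J{IF(Z_n; T, K_{n,0})}

def delta(eps):
    """First-order noncentrality (c + eps v)^T U^{-1} (c + eps v)."""
    u = c + eps * v
    return float(u @ U_inv @ u)

beta0 = 1.0 - H(eta, k, delta(0.0))
nu_k = -dH(eta, k, delta(0.0))
B = float(2.0 * c @ U_inv @ v)

powers = {}
for eps in (0.01, 0.05, 0.1):
    exact = 1.0 - H(eta, k, delta(eps))
    powers[eps] = (exact, beta0 + eps * nu_k * B)
    print(f"eps={eps:4.2f}  power={exact:.5f}  beta0 + eps*nu_k*B = {powers[eps][1]:.5f}")
```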

Proof of Corollary 2.

The proof is similar to that for Corollary 1, using the results in Theorem 2. ☐

Appendix B. List of Notations and Symbols

$A_{n}$ : $k \times (p_{n} + 1)$ matrix in hypotheses Equations (8) and (14)
$c$ : $k$ dimensional vector in $H_{1 n}$ in Equation (14)
$F (\cdot)$ : link function
$G (\cdot)$ : bias-correction term in “robust- $BD$ ”
$G$ : limit of $A_{n} A_{n}^{T}$ , i.e., $A_{n} A_{n}^{T} \overset{n \to \infty}{⟶} G$
$H_{n}$ : $H_{n} = E_{K_{n, 0}} {p_{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}}$
$IF (\cdot; \cdot, \cdot)$ : influence function
$J$ : an arbitrary distribution in the contamination of Equation (10)
$K_{n, 0}$ : true parametric distribution of $Z_{n}$
$K_{n, ϵ}$ : $K_{n, ϵ} = (1 - \frac{ϵ}{\sqrt{n}}) K_{n, 0} + \frac{ϵ}{\sqrt{n}} J$ , $ϵ$ -contamination in Equation (10)
$K_{n}$ : empirical distribution of $\{Z_{n i}\}_{i = 1}^{n}$
$ℓ_{K} (\cdot)$ : expectation of robust- $BD$ in Equation (11)
$m (\cdot)$ : conditional mean of $Y$ given $X_{n}$ in Equation (1)
$n$ : sample size
$p_{n}$ : dimension of $β$
$p_{i} (\cdot; \cdot)$ : $i$ th-order derivative of robust- $BD$
$q (\cdot)$ : generating q-function of $BD$
$T (\cdot)$ : vector-valued functional defining the estimator in Equation (12)
$U_{n}$ : $U_{n} = A_{n} H_{n}^{- 1} Ω_{n} H_{n}^{- 1} A_{n}^{T}$
$V (\cdot)$ : conditional variance of $Y$ given $X_{n}$ in Equation (2)
$W_{n}$ : Wald-type test statistic in Equation (9)
$w (\cdot)$ : weight function
$X_{n}$ : explanatory variables
$Y$ : response variable
$Z_{n} = {(X_{n}^{T}, Y)}^{T}$
$α (\cdot)$ : level of the test
$β (\cdot)$ : power of the test
${\tilde{β}}_{n, 0}$ : true regression parameter
$Δ_{z_{n}}$ : probability measure which puts mass 1 at the point $z_{n}$
$ϵ$ : amount of contamination in Equation (10), positive constant
$ψ_{RBD} (\cdot; \cdot)$ : score vector in Equation (7)
$Ω_{n}$ : $Ω_{n} = E_{K_{n, 0}} {p_{1}^{2} (Y; {\tilde{X}}_{n}^{T} {\tilde{β}}_{n, 0}) w^{2} (X_{n}) {\tilde{X}}_{n} {\tilde{X}}_{n}^{T}}$
$ρ_{q} (\cdot, \cdot)$ : robust- $BD$ in Equation (4)

References

Zhang, C.M.; Guo, X.; Cheng, C.; Zhang, Z.J. Robust-BD estimation and inference for varying-dimensional general linear models. Stat. Sin. 2014, 24, 653–673.
McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman & Hall: London, UK, 1989.
Morgenthaler, S. Least-absolute-deviations fits for generalized linear models. Biometrika 1992, 79, 747–754.
Ruckstuhl, A.F.; Welsh, A.H. Robust fitting of the binomial model. Ann. Stat. 2001, 29, 1117–1136.
Noh, M.; Lee, Y. Robust modeling for inference from generalized linear model classes. J. Am. Stat. Assoc. 2007, 102, 1059–1072.
Künsch, H.R.; Stefanski, L.A.; Carroll, R.J. Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Am. Stat. Assoc. 1989, 84, 460–466.
Stefanski, L.A.; Carroll, R.J.; Ruppert, D. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 1986, 73, 413–424.
Bianco, A.M.; Yohai, V.J. Robust estimation in the logistic regression model. In Robust Statistics, Data Analysis, and Computer Intensive Methods; Springer: New York, NY, USA, 1996; pp. 17–34.
Croux, C.; Haesbroeck, G. Implementing the Bianco and Yohai estimator for logistic regression. Comput. Stat. Data Anal. 2003, 44, 273–295.
Heritier, S.; Ronchetti, E. Robust bounded-influence tests in general parametric models. J. Am. Stat. Assoc. 1994, 89, 897–904.
Cantoni, E.; Ronchetti, E. Robust inference for generalized linear models. J. Am. Stat. Assoc. 2001, 96, 1022–1030.
Bianco, A.M.; Martínez, E. Robust testing in the logistic regression model. Comput. Stat. Data Anal. 2009, 53, 4095–4105.
Ronchetti, E.; Trojani, F. Robust inference with GMM estimators. J. Econom. 2001, 101, 37–69.
Basu, A.; Mandal, N.; Martin, N.; Pardo, L. Robust tests for the equality of two normal means based on the density power divergence. Metrika 2015, 78, 611–634.
Lee, S.; Na, O. Test for parameter change based on the estimator minimizing density-based divergence measures. Ann. Inst. Stat. Math. 2005, 57, 553–573.
Kang, J.; Song, J. Robust parameter change test for Poisson autoregressive models. Stat. Probab. Lett. 2015, 104, 14–21.
Basu, A.; Ghosh, A.; Martin, N.; Pardo, L. Robust Wald-type tests for non-homogeneous observations based on minimum density power divergence estimator. arXiv preprint, 2017; arXiv:1707.02333.
Ghosh, A.; Basu, A.; Pardo, L. Robust Wald-type tests under random censoring. arXiv preprint, 2017; arXiv:1708.09695.
Brègman, L.M. A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. U.S.S.R. Comput. Math. Math. Phys. 1967, 7, 620–631.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: Berlin, Germany, 2001.
Zhang, C.M.; Jiang, Y.; Shang, Z. New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation. Can. J. Stat. 2009, 37, 119–139.
Huber, P. Robust estimation of a location parameter. Ann. Math. Statist. 1964, 35, 73–101. [Google Scholar] [CrossRef]
Hampel, F.R.; Ronchetti, E.M.; Roussecuw, P.J.; Stahel, W.A. Robust Statistics: The Application Based on Influence Function; John Wiley: New York, NY, USA, 1986. [Google Scholar]
Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393. [Google Scholar] [CrossRef]
Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961. [Google Scholar]
Van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
Clarke, B.R. Uniqueness and Fréchet differentiability of functional solutions to maximum likelihood type equations. Ann. Stat. 1983, 11, 1196–1205. [Google Scholar] [CrossRef]

Figure 1. Observed level of $W_{n}$ versus $ϵ$ for overdispersed Poisson responses. The dotted line indicates the 5% significance level.
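The observed level plotted in Figure 1 is a Monte Carlo rejection rate: the fraction of simulated data sets for which $W_{n}$ exceeds its $χ^{2}$ critical value. The toy sketch below shows that computation for a much simpler problem than the paper's GLM setting, namely testing $H_{0}$: location $= 0$ from $n$ draws of $(1-ϵ)\,N(0,1) + ϵ\,Δ_{z}$. It compares a Wald statistic built on the sample mean with one built on a Huber M-estimator and a sandwich variance, a one-dimensional analogue of pairing $Ω_{n}$ with a Hessian-type term. The fixed-$ϵ$ gross-error mixture, the Huber constant $c = 1.345$, the known unit scale, $z = 10$ and all function names are assumptions made for illustration; Equation (10) may define the contamination differently (for example with $ϵ$ scaled by $n^{-1/2}$).

```python
import numpy as np
from statistics import NormalDist

# 95% quantile of chi-square(1), i.e. the squared 97.5% standard normal quantile
CHI2_1_95 = NormalDist().inv_cdf(0.975) ** 2


def contaminated_sample(rng, reps, n, eps, z):
    """reps x n draws from the toy null model (1 - eps) N(0, 1) + eps * Delta_z."""
    y = rng.standard_normal((reps, n))
    mask = rng.random((reps, n)) < eps  # observations replaced by the point mass
    y[mask] = z
    return y


def wald_mean(y):
    """Non-robust Wald statistic for H0: location = 0 (sample mean, sample variance)."""
    n = y.shape[1]
    return n * y.mean(axis=1) ** 2 / y.var(axis=1, ddof=1)


def wald_huber(y, c=1.345, iters=100):
    """Wald statistic from a Huber M-estimator of location, scale assumed known (= 1)."""
    n = y.shape[1]
    mu = np.median(y, axis=1)
    for _ in range(iters):
        # Fixed-point step toward the root of mean(psi(y - mu)) = 0
        mu = mu + np.clip(y - mu[:, None], -c, c).mean(axis=1)
    r = y - mu[:, None]
    psi = np.clip(r, -c, c)
    h = (np.abs(r) <= c).mean(axis=1)   # mean psi'(r): Hessian-type term
    omega = (psi ** 2).mean(axis=1)     # mean psi(r)^2: analogue of Omega_n
    # Sandwich variance omega / h^2, so W = n * mu^2 * h^2 / omega
    return n * mu ** 2 * h ** 2 / omega


def observed_level(stat, eps, n=100, reps=2000, z=10.0, seed=0):
    """Monte Carlo rejection rate of `stat` at nominal level 5% under the toy null."""
    rng = np.random.default_rng(seed)
    y = contaminated_sample(rng, reps, n, eps, z)
    return float(np.mean(stat(y) > CHI2_1_95))
```

Because `observed_level` reseeds the generator on every call, `observed_level(wald_mean, 0.1)` and `observed_level(wald_huber, 0.1)` are evaluated on the same simulated data, so the two rates can be compared directly. In this toy setting the mean-based test should over-reject far more than the Huber-based one as $ϵ$ grows, although the exact numbers depend on the seed and the chosen constants.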

Figure 2. Observed power of $W_{n}$ versus $ϵ$ for overdispersed Poisson responses. The statistics in the left panel correspond to the non-robust method and those in the right panel to the robust method. The asterisk line indicates the 5% significance level.

Figure 3. Observed level of $W_{n}$ versus $ϵ$ for Bernoulli responses. The statistics in (a) use the deviance loss and those in (b) use the exponential loss. The dotted line indicates the 5% significance level.
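Both losses compared in Figures 3 and 4 can be written as Bregman divergences. Assuming the form $Q_{q}(ν, μ) = -q(ν) + q(μ) + (ν - μ)\, q'(μ)$ with a concave generating function $q$, the choice $q(μ) = -2\{μ \log μ + (1-μ)\log(1-μ)\}$ gives the Bernoulli deviance loss, and $q(μ) = 2\{μ(1-μ)\}^{1/2}$ gives the exponential loss. The sketch below checks this numerically; the function names are hypothetical, and it covers only the plain BD, not the robust version $ρ_{q}$ of Equation (4), which additionally involves a bounded ψ-function.

```python
import math


def bregman(q, dq, nu, mu):
    """Bregman divergence Q_q(nu, mu) = -q(nu) + q(mu) + (nu - mu) * q'(mu)."""
    return -q(nu) + q(mu) + (nu - mu) * dq(mu)


def xlogx(x):
    """x * log(x), extended continuously by 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log(x)


# Deviance loss: q(mu) = -2 {mu log mu + (1 - mu) log(1 - mu)}
def q_dev(mu):
    return -2.0 * (xlogx(mu) + xlogx(1.0 - mu))


def dq_dev(mu):
    return -2.0 * (math.log(mu) - math.log(1.0 - mu))


# Exponential loss: q(mu) = 2 sqrt(mu (1 - mu))
def q_exp(mu):
    return 2.0 * math.sqrt(mu * (1.0 - mu))


def dq_exp(mu):
    return (1.0 - 2.0 * mu) / math.sqrt(mu * (1.0 - mu))


# Losses for a binary response y in {0, 1} and a fitted mean mu in (0, 1)
def deviance_loss(y, mu):
    return bregman(q_dev, dq_dev, y, mu)


def exponential_loss(y, mu):
    return bregman(q_exp, dq_exp, y, mu)
```

For $y = 1$ the two losses reduce to $-2 \log μ$ and $\{(1-μ)/μ\}^{1/2}$, and for $y = 0$ to $-2\log(1-μ)$ and $\{μ/(1-μ)\}^{1/2}$. The exponential loss grows like $μ^{-1/2}$ rather than logarithmically as $μ \to 0$ with $y = 1$, so it penalizes confidently wrong fits more heavily.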

Figure 4. Observed power of $W_{n}$ versus $ϵ$ for Bernoulli responses. The top panels correspond to the deviance loss and the bottom panels to the exponential loss. The statistics in the left panels are computed with the non-robust method and those in the right panels with the robust method. The asterisk line indicates the 5% significance level.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Guo, X.; Zhang, C. Robustness Property of Robust-BD Wald-Type Test for Varying-Dimensional General Linear Models. Entropy 2018, 20, 168. https://doi.org/10.3390/e20030168
