A U-Statistic for Testing the Lack of Dependence in Functional Partially Linear Regression Model

Zhao, Fanrong; Zhang, Baoxue

doi:10.3390/math12162588

Open AccessFeature PaperArticle

A U-Statistic for Testing the Lack of Dependence in Functional Partially Linear Regression Model

by

Fanrong Zhao

¹

and

Baoxue Zhang

^2,*

¹

School of Mathematics and Statistics, Shanxi University, Taiyuan 030006, China

²

School of Statistics, Capital University of Economics and Business, Beijing 100070, China

^*

Author to whom correspondence should be addressed.

Mathematics 2024, 12(16), 2588; https://doi.org/10.3390/math12162588

Submission received: 17 July 2024 / Revised: 14 August 2024 / Accepted: 19 August 2024 / Published: 21 August 2024

Download

Browse Figures

Versions Notes

Abstract

:

The functional partially linear regression model comprises a functional linear part and a non-parametric part. Testing the linear relationship between the response and the functional predictor is of fundamental importance. In cases where functional data cannot be approximated with a few principal components, we develop a second-order U-statistic using a pseudo-estimate for the unknown non-parametric component. Under some regularity conditions, the asymptotic normality of the proposed test statistic is established using the martingale central limit theorem. The proposed test is evaluated for finite sample properties through simulation studies and its application to real data.

Keywords:

asymptotic normality; functional partially linear regression model; Nadaraya–Watson estimate; U-statistic

MSC:

62G10; 62G20; 62F03

1. Introduction

In the past few decades, functional data analysis has been widely developed and applied in various fields, such as medicine, biology, economics, environmetrics, and chemistry (see [1,2,3,4,5]). An important model in functional data analysis is the partial functional linear model, which includes the parametric linear part and the functional linear part. To make the relationships between variables more flexible, the parametric linear part is usually replaced by the non-parametric part. This model is known as the functional partially linear regression model, which has been studied in [6,7,8]. The functional partially linear regression model is formulated as follows:

Y = g (u) + \int_{0}^{1} α (s) X (s) d s + ε,

(1)

where Y is the response variable.

X (\cdot)

denotes the functional predictor, characterized by its mean function,

μ_{0} (\cdot)

, and covariance operator,

C

. The slope function

α (\cdot)

is an unknown function.

g (\cdot)

is a general continuous function defined on a compact support

Ω

. The random error

ε

has a mean of zero and a finite variance

σ^{2}

, and is statistically independent of the predictor

X (\cdot)

. When

g (\cdot)

is a constant, model (1) reduces to a functional linear model. Refer to [9,10,11] for further details. With

g (\cdot)

representing the parametric linear component, model (1) is identified as a partially functional linear model, an area explored in [12,13,14].

Hypothesis testing plays a critical role in statistical inference. For testing the linear relationship between the response and the functional predictor in the functional linear model, functional principal component analysis (FPCA) is a major idea in constructing test statistics. See [9,10,15]. Taking into account the flexibility of non-parametric functions, Ref. [6] introduced the functional partially linear model. Refs. [7,8] constructed the estimators of the slope functions based on spline and FPCA respectively. They utilized B-spline for estimating non-parametric components. In the context of predictors with additive measurement error, ref. [16] investigated estimators for the slope function and non-parametric component using FPCA and kernel smoothing methods. Ref. [17] established estimators of the slope function, non-parametric component, and mean of the response variable in the presence of randomly missing responses.

However, testing the relationship between the response variable and functional predictor in the functional partially linear regression model has been rarely considered so far. In this paper, the following hypothesis testing for model (1) will be considered:

H_{0} : α (t) = α_{0} (t) v . s . H_{1} : α (t) \neq α_{0} (t),

(2)

where

α_{0} (t)

denotes an assigned function. Here we assume

α_{0} (t) = 0

without compromising generality. To test (2) within the framework of model (1), a chi-square test was devised by [18]. This test relies on estimators for the nonlinear and slope functions. The underlying assumption is that the functional data can be well-approximated by a small number of principal components.

In particular, we focus on functional data that cannot be approximated with a few principal components, such as the velocity and acceleration of changes in China’s Air Quality Index (AQI). If these changes are represented by some curves, the velocity and acceleration are equivalent to the first and second derivatives of the AQI, respectively. The number of principal components selected by FPCA may approach approximately 30. Only several research studies have considered this data structure in the functional data analysis. Ref. [19] constructed a FLUTE test based on order-four U-statistic for the testing in the functional linear model, which can be computationally very costly. In order to save calculation time, ref. [20] developed a faster test using a order-two U-statistic. Inspired by this, we introduce a non-parametric U-statistic that integrates functional data analysis with the traditional kernel method to test (2).

The structure of the paper is as follows. Section 2 details the development of a new test procedure for the functional partially linear regression model. Section 3 presents the theoretical properties of the proposed test statistic under some regularity conditions. Section 4 includes a simulation study to evaluate the finite sample performance of the proposed test. Section 5 presents the application of the test to spectrometric data. The proofs of the primary theoretical results are presented in Appendix A.

2. Test Statistic

Assume Y and U are random variables taking real values.

X (\cdot)

is a stochastic process with sample paths in

L^{2} [0, 1]

, which is the set of all square-integrable functions defined on

[0, 1]

. Let

〈 \cdot, \cdot 〉

,

‖ \cdot ‖

represent the inner product and norm in

L^{2} [0, 1]

, respectively. {

(Y_{i}, X_{i} (\cdot), U_{i}), i = 1, 2, \dots, n

} constitutes a random sample drawn from model (1),

Y_{i} = \int_{0}^{1} α (s) X_{i} (s) d s + g (U_{i}) + ε_{i}, i = 1, 2, \dots, n .

(3)

For any given

α (t) \in L^{2} [0, 1]

, we move

α (t)

to the left,

Y_{i} - 〈 X_{i}, α 〉 = g (U_{i}) + ε_{i}, i = 1, 2, \dots, n .

(4)

Hence, model (4) simplifies to a classical non-parametric model. A pseudo-estimate for the non-parametric function employing Nadaraya–Watson method, can be formulated as follows:

\hat{g} (U_{i}) = \sum_{j \neq i}^{n} \frac{K_{h} (U_{j} - U_{i}) (Y_{j} - 〈 X_{j}, α 〉)}{\sum_{k \neq i}^{n} K_{h} (U_{k} - U_{i})}, i = 1, 2, \dots, n,

(5)

where

K_{h} (\cdot) = K (\cdot / h) / h

with

K (\cdot)

being a preselected kernel function. A kernel function maps from the set of real numbers to the set of real numbers. It adheres to the following properties: (i) Non-negativity: the kernel function

K (\cdot)

must be non-negative. (ii) Normalization: The integral (or sum in the discrete case) of the kernel function over the entire real line must equal 1, which means it can be interpreted as a probability density function. The bandwidth h in (5) is typically selected through data-driven procedures, such as cross-validation techniques. Here, we estimate non-parametric

g (U_{i})

without the ith sample.

Let

W_{- i} = {(W_{i 1}, \dots, W_{i (i - 1)}, W_{i (i + 1)}, \dots, W_{i n})}^{T},

〈 X_{- i}, α 〉 = {(〈 X_{1}, α 〉, \dots, 〈 X_{i - 1}, α 〉, 〈 X_{i + 1}, α 〉, \dots, 〈 X_{n}, α 〉)}^{T},

Y_{- i} = {(Y_{1}, \dots, Y_{i - 1}, Y_{i + 1}, \dots, Y_{n})}^{T}, X_{- i} = {(X_{1}, \dots, X_{i - 1}, X_{i + 1}, \dots, X_{n})}^{T},

where

W_{i j} = K_{h} (U_{j} - U_{i}) / \sum_{k \neq i} K_{h} (U_{k} - U_{i})

. So the pseudo-estimate (5) of non-parametric function can be reformulated in matrix form as

\overset{ˇ}{g} (U_{i}) = W_{- i}^{T} (Y_{- i} - 〈 X_{- i}, α 〉) .

Substituting

\overset{ˇ}{g} (U_{i})

for

g (U_{i})

in model (3), we have

{\overset{ˇ}{Y}}_{i} = 〈 {\overset{ˇ}{X}}_{i}, α 〉 + ε_{i},

(6)

where

{\overset{ˇ}{X}}_{i} (t) = X_{i} (t) - W_{- i}^{T} X_{- i} (t)

,

{\overset{ˇ}{Y}}_{i} = Y_{i} - W_{- i}^{T} Y_{- i}

. If we denote

μ_{i t} ≜ μ (U_{i}, t) = E [X_{1} (t) | U_{i}]

, where “≜” stands for “defined as”. Then

{\hat{μ}}_{i t} = W_{- i}^{T} X_{- i} (t)

can be the estimator of the conditional expectation

μ_{i t}

for any

t \in [0, 1]

.

Given an arbitrary orthonormal basis

{ψ_{j}}_{j = 1}^{\infty}

in

L^{2} [0, 1]

, the functional predictor

X (\cdot)

and the slope function

α (\cdot)

admit the following series expansions: Let p represent the number of truncated basis functions, as follows:

X_{i} (t) = \sum_{j = 1}^{p} ξ_{i j} ψ_{j} (t) + \sum_{j = p + 1}^{\infty} ξ_{i j} ψ_{j} (t), α (t) = \sum_{j = 1}^{p} β_{j} ψ_{j} (t) + \sum_{j = p + 1}^{\infty} β_{j} ψ_{j} (t),

(7)

where

ξ_{i j} = 〈 X_{i}, ψ_{j} 〉

,

β_{j} = 〈 α, ψ_{j} 〉

. Let

{\overset{ˇ}{ξ}}_{i j} = 〈 {\overset{ˇ}{X}}_{i}, ψ_{j} 〉

, then the model (6) can be rewritten as follows:

{\overset{ˇ}{Y}}_{i} = \sum_{j = 1}^{\infty} {\overset{ˇ}{ξ}}_{i j} β_{j} + ε_{i} = \sum_{j = 1}^{p} {\overset{ˇ}{ξ}}_{i j} β_{j} + \sum_{j = p + 1}^{\infty} {\overset{ˇ}{ξ}}_{i j} β_{j} + ε_{i} .

Denote

ξ_{i} = {(ξ_{i 1}, ξ_{i 2}, \dots, ξ_{i p})}^{T}

, which has mean

μ

and covariance matrix

Σ

. Let

ξ_{- i} = {(ξ_{1}, \dots, ξ_{i - 1}, ξ_{i + 1}, \dots, ξ_{n})}^{T}, {\overset{ˇ}{ξ}}_{i} = {({\overset{ˇ}{ξ}}_{i 1}, {\overset{ˇ}{ξ}}_{i 2}, \dots, {\overset{ˇ}{ξ}}_{i p})}^{T},

{\overset{ˇ}{ξ}}_{- i} = {({\overset{ˇ}{ξ}}_{1}, \dots, {\overset{ˇ}{ξ}}_{i - 1}, {\overset{ˇ}{ξ}}_{i + 1}, \dots, {\overset{ˇ}{ξ}}_{n})}^{T}, β = {(β_{1}, β_{2}, \dots, β_{p})}^{T} .

For model (3), the approximation error is defined as follows:

e_{i} = \int_{0}^{1} α (s) X_{i} (s) d s - \sum_{k = 1}^{p} ξ_{i k} β_{k} .

To investigate the influence of the approximation error, we impose the following conditions on the functional predictors and regression function:

(C1) The functional predictors

{X_{i} (\cdot)}_{i = 1}^{n}

and the regression function

α (t)

adhere to the following conditions:

(i) The functional predictors

{X_{i} (\cdot)}_{i = 1}^{n}

reside within a Sobolev ellipsoid of order two, then there exists a universal constant C, such that

\sum_{j = 1}^{\infty} ξ_{i j}^{2} j^{4} \leq C^{2}, for i = 1, \dots, n .

(ii) The regression function satisfies

\int α^{2} (t) d t \leq D,

where D is a constant.

By applying the Cauchy–Schwarz inequality, we obtain the following:

e_{i}^{2} = {(\sum_{j = p + 1}^{\infty} ξ_{i j} β_{j})}^{2} \leq \sum_{j = p + 1}^{\infty} ξ_{i j}^{2} j^{4} \sum_{j = p + 1}^{\infty} j^{- 4} α_{j}^{2} \leq \frac{C^{2} D}{p^{4}} .

Then the approximation error can be ignored as

p \to \infty

. Model (6) becomes as follows:

{\overset{ˇ}{Y}}_{i} = \sum_{j = 1}^{p} {\overset{ˇ}{ξ}}_{i j} β_{j} + ε_{i},

which is a high-dimensional partial linear model. Since

E ∥ (X_{i} - E [X_{i} | U_{i}]) (Y_{i} - E (Y_{i} | U_{i})) ∥^{2}

(8)

can be an effective measure for assessing the distance between

α (\cdot)

and zero for test (2). Motivated by [21], we construct the following test statistic by estimating (8).

T_{n p} = {(1 - \frac{2}{n})}^{- 2} {(\binom{n}{2})}^{- 1} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} Δ_{i j} (\overset{ˇ}{X}) Δ_{i j} (\overset{ˇ}{Y}),

(9)

where

\begin{matrix} Δ_{i j} (\overset{ˇ}{X}) & = 〈{\overset{ˇ}{X}}_{i} - \bar{\overset{ˇ}{X}}, {\overset{ˇ}{X}}_{j} - \bar{\overset{ˇ}{X}}〉 + \frac{〈 {\overset{ˇ}{X}}_{i} - {\overset{ˇ}{X}}_{j}, {\overset{ˇ}{X}}_{i} - {\overset{ˇ}{X}}_{j} 〉}{2 n}, \\ Δ_{i j} (\overset{ˇ}{Y}) & = ({\overset{ˇ}{Y}}_{i} - \bar{\overset{ˇ}{Y}}) ({\overset{ˇ}{Y}}_{j} - \bar{\overset{ˇ}{Y}}) + \frac{{({\overset{ˇ}{Y}}_{i} - {\overset{ˇ}{Y}}_{j})}^{2}}{2 n}, \end{matrix}

where

\bar{\overset{ˇ}{X}} (t)

and

\bar{\overset{ˇ}{Y}}

denote the sample means of

{\overset{ˇ}{X}}_{i} (t)

and

{\overset{ˇ}{Y}}_{i}

, respectively. By some calculations, we can obtain

E [Δ_{i j} (\overset{ˇ}{X})] = 0

,

E [Δ_{i j} (\overset{ˇ}{Y})] = 0

. The test statistic

T_{n p}

quantifies the discrepancy between

α (\cdot)

and 0 under the null hypothesis. High values of the test statistic

T_{n p}

suggest evidence in favor of the alternative hypothesis, prompting the rejection of the null hypothesis.

3. Asymptotic Theory

To achieve the asymptotic properties of the proposed test, we first suppose the following conditions based on [19,21]. We denote the following:

μ (U_{i}) = {(μ_{1} (U_{i}), μ_{2} (U_{i}), \dots, μ_{p} (U_{i}))}^{T} ≜ E [ξ | U_{i}],

Σ_{*} (U_{i}) = E [ξ_{i} ξ_{i}^{T} | U_{i}] - μ (U_{i}) μ^{T} (U_{i}), Σ_{*} = Σ - E [μ (U_{1}) μ^{T} (U_{1})] .

A condition on the dimensionality of matrix

Σ_{*}

is stipulated as follows:

(C2) As

n \to \infty

,

p \to \infty

;

Σ_{*} > 0

,

tr (Σ_{*}^{4}) = o ({tr}^{2} (Σ_{*}^{2}))

.

(C3) For a constant

m \geq p

, there exists an m-dimensional random vector

Z_{i} = {(Z_{i 1}, \dots, Z_{i m})}^{T}

such that

ξ_{i} = E [ξ_{i} | U_{i}] + Γ (U_{i}) Z_{i}

. The vector

Z_{i}

is characterized by

E (Z_{i}) = 0

,

var (Z_{i}) = I_{m}

, and for any

U_{i}

,

Γ (U_{i})

is a

p \times m

matrix with

Γ (U_{i}) Γ^{T} (U_{i}) = Σ_{*} (U_{i})

. It is assumed that each random vector

{Z_{i}, i = 1, \dots, n}

has finite fourth moments and

E (Z_{i j}^{4}) = 3 + Δ

for some constant

Δ

. Moreover, we assume the following:

\begin{matrix} E (Z_{i j_{1}}^{l_{1}} Z_{i j_{2}}^{l_{2}} \dots Z_{i j_{d}}^{l_{d}}) = E (Z_{i j_{1}}^{l_{1}}) E (Z_{i j_{2}}^{l_{2}}) \dots E (Z_{i j_{d}}^{l_{d}}) \end{matrix}

for

\sum_{k = 1}^{d} l_{k} \leq 4

and

j_{1} \neq j_{2} \neq \dots \neq j_{d}

, where d is a positive integer.

(C4)

β^{T} Σ_{*} β = o (h^{2})

, and

β^{T} Σ_{*}^{3} β = o (tr (Σ_{*}^{2}) / n)

.

(C5) The error term satisfies

E [ε^{4}] < + \infty

.

(C6) The random variable U is confined to a compact domain

Ω

, and its density function f exhibits a continuously differentiable second derivative and bounded away from 0 on its support. The kernel

K (\cdot)

is a symmetric probability density with compact support and is Lipschitz continuous.

(C7)

E (ξ_{1} | U_{1})

and

g (\cdot)

are Lipschitz continuous and admit continuous second-order derivatives.

(C8) It is assumed that the sample size n and the smoothing parameter h satisfy the following:

lim_{n \to \infty} h = 0, lim_{n \to \infty} n h = \infty, lim_{n \to \infty} n h^{4} = 0

.

(C9) The truncated number p and the sample size n are assumed to satisfy

p = o (n^{2} h^{2})

.

Condition (C2) is widely utilized in high-dimensional data research (see [21,22,23]). Condition (C3) resembles a factor model. To assess local power, we further impose condition (C4) on the coefficient vector

β

. In fact, (C4) can serve as the local alternative as its distance measurement between

β

and 0. This local alternative can be also found in [21]. (C5) is the typical assumption for the error term

ε

. Conditions (C6–C8) are very common in non-parametric smoothing. (C9) is a technical condition that is needed to derive the theorems.

In practical applications, the data must satisfy conditions (C1–C3) and (C7). Conditions (C1) and (C7) are generally met for most datasets. (C2) does not specify a relationship between p and n. The matrix’s positive definiteness ensures that the regression coefficients can be identified.

tr (Σ_{*}^{4}) = o ({tr}^{2} (Σ_{*}^{2}))

holds if the eigenvalues of

Σ_{*}

are all bounded or the largest eigenvalue is of smaller order than

{(p - b)}^{1 / 2} b^{- 1 / 4}

, where b is the number of unbounded eigenvalues. Condition (C3) essentially assumes that the functional predictor

X_{i} (t)

is based on a latent factor model, where the factor loadings meet the pseudo-independence assumption. If

X (t)

is a Gaussian process, it can be expanded as

X (t) = \sum_{j = 1}^{m} \sqrt{λ_{j}} N_{j} u_{j} (t)

, with

N_{j}

being independent standard normal random variables. This expansion is a special case of (C3) when the

(i, j)

-th element of the transformation matrix

Γ

is

\sqrt{λ_{j}} 〈 u_{j}, ϕ_{i} 〉

. These conditions are generally met for most data and do not affect the validity of the proposed test. Many datasets can be regarded as following a Gaussian process, such as changes in gene expression levels, logarithmic returns on financial asset prices, soil moisture, and temperature distribution.

We present the asymptotic theory for the proposed test statistic under the null hypothesis and local alternative (C4) in the subsequent two theorems:

Theorem 1.

Under the assumptions of conditions (C1), (C3–C9), it follows that

\begin{matrix} (i) E (T_{n p}) = {∥ C^{*} (α) ∥}^{2} + o (\sqrt{tr (Σ_{*}^{2})} / n); \\ (i i) T_{n p} - {∥ C^{*} (α) ∥}^{2} = \frac{1}{(\binom{n}{2})} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} 〈 X_{i} - μ_{i t}, X_{j} - μ_{j t} 〉 ε_{i} ε_{j} + o_{p} (\sqrt{tr (Σ_{*}^{2})} / n), \end{matrix}

where

C^{*} (α) = E [〈 X_{i} - μ_{i t}, α 〉 (X_{i} - μ_{i t})]

. It can be regarded as the covariance operator of a random variable

X_{i} - μ_{i t}

.

Theorem 2.

Assume conditions (C1–C3) and (C5–C9) hold, we then have the following results under either the null hypothesis or the local alternative (C4):

\frac{n (T_{n p} - ∥ C^{*} (α) ∥^{2})}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}} \overset{D}{⟶} N (0, 1), as n \to \infty,

where

\overset{D}{⟶}

represents convergence in distribution.

Theorem 2 demonstrates that, under the local alternative hypothesis (C4), the proposed test statistic possesses the following asymptotic local power at the nominal significance level

α

:

Ψ (β) = Φ (- z_{α} + \frac{n ∥ C^{*} {(α) ∥}^{2}}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}}),

where

Φ (\cdot)

denotes the cumulative distribution function of the standard normal, and

z_{α}

represents its

(1 - α)

th quantile. We define

η (α) = C^{*} (α) / σ^{2} \sqrt{2 tr (Σ_{*}^{2})}

, which represents the signal-to-noise ratio. When the term

η (α) = o (1 / n)

, the power converges to

α

, then the power converges to 1 if it has a high order of

1 / n

. This implies that the proposed test is consistent. The power performance will be demonstrated through simulations in Section 4.

According to Theorem 2, the proposed test statistic leads to the rejection of

H_{0}

at a significance level

α

when

n T_{n p} \geq \sqrt{2 {\hat{σ}}^{2} \hat{tr (Σ_{*}^{2})}} z_{α},

where

{\hat{σ}}^{2}

and

\hat{tr (Σ_{*}^{2})} z_{α}

serve as consistent estimators for

σ^{2}

and

tr (Σ_{*}^{2})

, respectively. We use a similar method as in [24] to estimate the trace. That is,

\hat{tr (Σ_{*}^{2})} = Y_{1 n} - 2 Y_{2 n} + Y_{3 n},

where

Y_{1 n} = \frac{1}{A_{n}^{2}} \sum_{i \neq j} {〈 {\overset{ˇ}{X}}_{i}^{T}, {\overset{ˇ}{X}}_{j} 〉}^{2}

,

Y_{2 n} = \frac{1}{A_{n}^{3}} \sum_{i \neq j \neq k} 〈 {\overset{ˇ}{X}}_{i}^{T}, {\overset{ˇ}{X}}_{j} 〉 〈 {\overset{ˇ}{X}}_{j}^{T}, {\overset{ˇ}{X}}_{k} 〉

,

Y_{2 n} = \frac{1}{A_{n}^{4}} \sum_{i \neq j \neq k \neq l} 〈 {\overset{ˇ}{X}}_{i}^{T}, {\overset{ˇ}{X}}_{j} 〉 〈 {\overset{ˇ}{X}}_{k}, {\overset{ˇ}{X}}_{l} 〉

with

A_{n}^{m} = n! / (n - m)!

. And the simple estimator

{\hat{σ}}^{2} = {(n - 1)}^{- 1} \sum_{i = 1}^{n} {({\overset{ˇ}{Y}}_{i} - \bar{\overset{ˇ}{Y}})}^{2}

is used, which is consistent under the null hypothesis testing.

4. Simulation

This section evaluates the finite sample performance of the proposed test, including its size and power. The assessment is conducted through a series of simulation studies. Through numerical simulations, we will validate that the distribution of the proposed test statistic under the null hypothesis is consistent with the properties stated in Theorem 1. For each simulation, we create 1000 Monte Carlo samples. The basis expansion and FPCA are conducted using the R package fda.

To mitigate the probability of both Type I and Type II errors in the testing procedure, the sample size must be adequately large. However, to maintain computational efficiency during the numerical simulations, the sample size should not be excessively large. Consequently, the sample size n in this study has been set within a range of 50 to 200. To validate the effectiveness of our proposed test, the parameters are flexibly set.

Here we compare the proposed test

T_{n p}

with the chi-square test

T_{n}

constructed by [18]. The cumulative percentage of total variance (CPV) method is used to estimate the number of principal components in

T_{n}

. Let CPV, explained by the first m empirical functional principal components, be defined as follows:

CPV = \frac{\sum_{i = 1}^{m} {\hat{λ}}_{i}}{\sum_{i = 1}^{p} {\hat{λ}}_{i}},

where

{{\hat{λ}}_{i}}_{i = 1}^{p}

is the estimate of the eigenvalue of the covariance operator. The smallest value of m for which CPV (m) surpasses the threshold of 95% is selected in this section. We denote p as the number of basis functions used to fit curves. The simulated data are produced according to the following model:

Y_{i} = \int_{0}^{1} α (s) X_{i} (s) d s + g (U_{i}) + ε_{i}, i = 1, 2, \dots, n .

where

g (U_{i}) = 2 U_{i}

or

g (U_{i}) = 2 + sin (2 π U_{i})

, and

{U_{i}, i = 1, 2, \dots, n}

is independently drawn from the uniform distribution on

(0, 1)

. To analyze the impact of different error distributions, the following four distributions will be selected: (1)

ε_{i}

∼

N (0, 1)

, (2)

ε_{i}

∼

t (3) / \sqrt{3}

, (3)

ε_{i}

∼

Γ (1, 1) - 1

, (4)

ε_{i}

∼

(lnorm (0, 1) - \sqrt{e}) / \sqrt{e (e - 1)}

. All results about

g (U_{i}) = 2 U_{i}

are presented in Supplementary Materials.

We next report the simulation results for two data structures of the predictor

X (t)

.

1. The predictor

X (t)

is defined as

\sum_{j = 1}^{50} ξ_{j} ϕ_{j} (t)

, with

ξ_{j}

normally distributed with mean 0 and variance

λ_{j} = 10 {((j - 1 / 2) π)}^{- 2}

,

ϕ_{j} (t) = \sqrt{2} (j - 1 / 2) π t

for

j = 1, 2, \dots, 50

. The slope function

α (t)

is given by

c (\sqrt{2} sin (π t / 2) + 3 \sqrt{2} sin (3 π t / 2))

, where the coefficient c ranges from 0 to 0.2.

c = 0

corresponds to the null hypothesis. The number of basis functions used to fit curves and the sample size are taken as follows:

p = 11, 49

,

n = 50, 100

. Under different error distributions, Table 1 and Table 2 evaluate the empirical size and power of both tests for different non-parametric functions when the nominal level

α

is

0.05

.

From Table 1 and Table 2, the following can be seen: (i) The performances of both tests remain consistent across various error distributions and non-parametric functions; (ii) Because

T_{n p}

is intended for functional data beyond the reach of a few principal components, the power of the proposed test is somewhat less than that of

T_{n}

. (iii) The power of the test increases with the sample size n, but it is not significantly affected by increases in the parameter value p. In fact, for the functional data structure given in Simulation 1, the number of principal components selected is relatively small, regardless of the number of basis functions used to fit the functional data.

2. The functional predictor is constructed using the expansion in (7), with

ϕ_{k}

representing the Fourier basis function on [0,1] defined as

ϕ_{1} (t) = 1

,

ϕ_{2} (t) = \sqrt{2} \sin (2 π t)

,

ϕ_{3} (t) = \sqrt{2} \cos (2 π t)

,

ϕ_{4} (t) = \sqrt{2} \sin (4 π t)

,

ϕ_{4} (t) = \sqrt{2} \cos (4 π t), \dots

. The first p of the basis functions will be used to generate the prediction function and slope function. Let

X_{i} (t) = \sum_{j = 1}^{p} ξ_{i j} ϕ_{j} (t)

,

α (t) = \sum_{j = 1}^{p} β_{j} ϕ_{j} (t)

, where

p = 11, 49, 201, 365

,

n = 50, 100, 200

, the coefficient of slope function

{β_{i} = | β | / \sqrt{p}, i = 1, \dots, p}

with

{| β |}^{2} = c * 10^{- 2}

and c varying from 0 to 1.

c = 0

corresponds to the case in which

H_{0}

is true. The coefficients of predictor

ξ_{i j}

follow the moving average model:

ξ_{i j} = ρ_{1} Z_{i j} + ρ_{2} Z_{i (j + 2)} + \dots + ρ_{T} Z_{i (p + T - 1)},

where the constant T adjusts the degree of dependence among the elements of the predictor.

{Z_{i j}, Z_{i (j + 1)}, \dots, Z_{i (p + T - 1)}}

are drawn independently from the distribution

N (0, I_{p + T - 1})

with

T = 10

. The element at the

(j, k)

position of the covariance matrix

Σ

for coefficient vector

ξ_{i}

is

\sum_{l = 1}^{T - | j - k |} ρ_{l} ρ_{l + | j - k |} I {| j - k | < T},

where

{ρ_{k}, k = 1, \dots, T}

is independently generated from the uniform distribution

U (0, 1)

.

The bandwidth is chosen using cross-validation (CV). At a significance level of

α = 0.05

, Table 3 delineates the empirical size and power of the two tests when the function

g (\cdot)

is linear. Table 4 presents the results for the case where

g (\cdot)

is a trigonometric function.

From Table 3 and Table 4, the number of basis functions used for fitting functions has a very important impact on the test. Specifically, (i) Across various error distributions, as p increases, the empirical size of test

T_{n}

significantly exceeds the nominal level, whereas our proposed test

T_{n p}

maintains stable performance; (ii) The power of the test increases with the sample size n. Conversely, it decreases as the values of p increase. (iii) The proposed test demonstrates robustness across all scenarios presented in this simulation study. Actually, for the functional data structure given in Simulation 2, selecting too many principal components negates the effectiveness of FPCA-based test statistics. Instead, the proposed test has great advantages (see bold numbers in Table 3 and Table 4).

To more effectively verify the accuracy of the asymptotic theory underlying our proposed test statistic, Table 5 provides the mean and standard deviation (sd) of the test statistic under different scenarios. From Table 5, it is observed that when

c = 0

, the mean of our proposed test statistic fluctuates around zero, and the standard deviation fluctuates around one. This aligns with the theoretical expectations. As c increases, the mean of the test statistic moves further away from zero, and the standard deviation moves further away from one, indicating a departure from the null hypothesis.

Furthermore, to verify the asymptotic theory of our proposed test, we consider the case where

(n, p) = (200, 365)

. Figure 1 and Figure 2 draw the null distributions and the q-q plots

T_{n p}

, corresponding to

g (u) = 2 u

and

g (u) = 2 + sin (2 π u)

, respectively. The null distributions are represented by the dashed lines, while the solid lines are density function curves of standard normal distributions.

For different

n, p

, Figure 3 and Figure 4, respectively, show the empirical power functions of the proposed test statistics. These figures are presented for four different error distribution functions. The function

g (\cdot)

is linear in Figure 3 and trigonometric in Figure 4. When

(n, p) = (200, 201), (100, 201), (200, 365)

, the empirical power functions of the proposed test are represented by solid lines, dashed lines, and dotted lines, respectively. From Figure 3 and Figure 4, it can be seen that the power increases rapidly as long as c increases slightly. The test’s power is positively related to the sample size n and inversely related to the magnitude of p. The proposed test is stable under different error distributions. These are consistent with the conclusions in Table 3 and Table 4.

It is worth noting that, theoretically, a kernel function

K (\cdot)

is sufficient if it satisfies the conditions of symmetry and Lipschitz continuity. In practical applications, however, the choice of kernel function should be based on the characteristics and requirements of the data. For instance, the Epanechnikov kernel is more suitable for bounded data, while the Gaussian kernel is better suited for data with long tails. In this simulation study, according to the given data setting, the Epanechnikov kernel was chosen. To compare the effects of the two kernels, we replaced the Epanechnikov kernel used to generate Figure 4 with a Gaussian kernel to produce Figure 5. From Figure 4 and Figure 5, it can be observed that the impact of the two kernels on the test is relatively minor.

The numerical simulations show that our proposed test performs well for the data types. However, with larger sample sizes, the numerical simulations in this paper require considerable computational time, which is a limitation of the proposed test statistic. Additionally, its performance on the datasets that violate the assumptions (C1–C3) and (C7), such as when the real data are not in a Sobolev ellipsoid of order two, remains to be seen.

5. Application

This section applies the proposed test to the spectral data, which has been described and analyzed in the literature (see [25,26]). This dataset can be obtained on the following platforms: http://lib.stat.cmu.edu/datasets/tecator (accessed on 16 July 2024). Each meat sample is characterized by a 100-channel spectrum of absorbance, along with the moisture (water), fat, and protein contents. The absorbance is calculated as the negative logarithm base 10 of the transmittance, as measured by the spectrometer. The three contents, measured in percent, are determined by analytic chemistry. The dataset comprises 240 samples, partitioned into 5 subsets for the validation of models and extrapolation studies. In this section, we utilize a total of 215 samples, which include both training and test samples drawn from the 5 subsets. The spectral measurement data consist of curves, represented by

X_{i} (\cdot)

, corresponding to absorbance values recorded at 100 equally spaced wavelengths from 850 nm to 1050 nm. Let

Y_{i}

represent the fat content as the response variable,

Z_{i}

represent the protein content, and

U_{i}

represent the moisture content. Similar to [27], the following two models will be used to assume the relationship between them:

Y_{i} = \int_{850}^{1050} α (t) X_{i} (t) d t + g (Z_{i}) + ε_{i},

(10)

Y_{i} = \int_{850}^{1050} α (t) X_{i} (t) d t + g (U_{i}) + ε_{i} .

(11)

The present investigation primarily focuses on the test in models (10) and (11):

α (t) = 0

. The number of basis functions used for fitting function curves p is selected as 129. Figure 6 shows the estimation of slope function

α (t)

in models (10) and (11).

The calculation results are as follows: (i) For model (10), the value of the statistic is

T_{n p} = 31.186

; p-value is 0. (ii) For model (11), the value of the statistic is

T_{n p} = - 0.867

; p-values are 0.386. From this, we can see that the model test (10) is significant, while the model test (11) is not significant. This result can also be reflected in Figure 6. It is obvious that the estimated value of

α (t)

on the right side of Figure 6 is much smaller than that on the left side.

6. Conclusions

To test (2), this paper first provides a pseudo-estimate of the non-parametric function

g (u)

using kernel methods with a fixed coefficient function

β (t)

. The pseudo-estimate is then substituted into the model, converting the original model into a linear one. This allows for the construction of the second-order U-test statistic employed in this paper, utilizing the corresponding testing methods from functional linear models. The proposed test does not require estimating the covariance operator of the predictor function. It follows a normal distribution asymptotically under both a null hypothesis and a local alternative. Moreover, numerical simulations show that our proposed test performs better than the test constructed in [18] when functional data cannot be approximated by a few principal components. Finally, the real data are applied to our proposed test to verify its feasibility.

Additionally, the proposed test is adaptable to cases where the response variable is functional, which is a focus for our upcoming research. In the real world, the proposed test requires real data to meet some technical conditions (C1–C3, C7). When data fail to satisfy these conditions, the viability of the test presented in this paper requires further investigation. Therefore, future work may focus on broadening the test’s applicability. The calculation of the statistics in numerical simulations requires optimization for efficiency.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/math12162588/s1.

Author Contributions

Conceptualization, F.Z. and B.Z.; methodology, B.Z.; Validation, B.Z.; formal analysis, F.Z.; investigation, F.Z.; data curation, F.Z.; writing—original draft preparation, F.Z.; writing—review and editing, B.Z.; visualization, B.Z.; supervision, B.Z.; project administration, B.Z.; funding acquisition, B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (nos. 12271370, 12301349), the Natural Science Foundation of Shanxi Province, China (no. 202203021222009).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors are grateful to the editor, associate editor, and referees for their valuable feedback, which has significantly enhanced the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Several lemmas are established to facilitate the proofs of Theorems 1 and 2. Without loss of generality, we assume that

μ_{0} (t) = 0

and

E [g (U)] = 0

. Let

C_{n} = \sqrt{log (1 / h) / (n h)} + h^{2}

. With reference to the asymptotic theory of non-parametric estimation, the pseudo-estimation of the non-parametric function satisfies

sup_{u \in Ω} ∥ \overset{ˇ}{g} (u) - g (u) ∥ = O_{p} (C_{n})

. Denote

D g (U_{i}) = g (U_{i}) - \overset{ˇ}{g} (U_{i}), D μ_{i t} = μ_{i t} - {\hat{μ}}_{i t},

for

i = 1, 2, \dots, n

. Similarly to the lemmas in [21], it is easy to derive the following lemmas.

Lemma A1.

If (C1), (C3), and (C4) hold, it can be demonstrated that for any square matrix, M,

\begin{matrix} (i) E [Z_{1} Z_{1}^{T} M Z_{1} Z_{1}^{T}] = M + M^{T} + tr (M) I_{p} + Δ diag (M); \\ (i i) E [Z_{1} Z_{2}^{T} M Z_{2} Z_{1}^{T}] = tr (M) I_{p}; \\ (i i i) E [{(〈 X_{1} - μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, X_{2} - μ_{2 t} 〉 〈 X_{2} - μ_{2 t}, α 〉)}^{2}] = o (tr (Σ_{*}^{2})) . \end{matrix}

Lemma A2.

Given that conditions (C1–C3) and (C5–C9) are satisfied, the following results are obtained.

\begin{matrix} (i) E [{〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{2} 〉}^{4}] = o (n {tr}^{2} (Σ_{*}^{2})); \\ (i i) E [{〈 C^{*} ({\overset{ˇ}{X}}_{1}), {\overset{ˇ}{X}}_{1} 〉}^{2}] = o (n {tr}^{2} (Σ_{*}^{2})) . \end{matrix}

Lemma A3.

If (C1–C9) hold, then we can obtain the following:

\begin{matrix} (i) E [〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉] = O (tr (Σ_{*})); \\ (i i) E [{\overset{ˇ}{Y}}_{1}^{2}] = O (1); \\ (i i i) E [〈 {\bar{\overset{ˇ}{X}}}_{12}, {\bar{\overset{ˇ}{X}}}_{12} 〉] = O (tr (Σ_{*}) / n); \\ (i v) E [{\bar{\overset{ˇ}{Y}}}_{12}^{2}] = O (C_{n}^{2}); \\ (v) E [〈 {\bar{\overset{ˇ}{X}}}_{12}, {\bar{\overset{ˇ}{X}}}_{12} 〉 {\bar{\overset{ˇ}{Y}}}_{12}^{2}] = O (tr (Σ_{*}) / n^{2}), \end{matrix}

where

{\bar{\overset{ˇ}{X}}}_{i j} (t)

,

{\bar{\overset{ˇ}{Y}}}_{i j}

represent the sample means of

\overset{ˇ}{X} (t)

and

\overset{ˇ}{Y}

without ith and jth samples, for

i, j = 1, 2, \dots, n

. That is

{\bar{\overset{ˇ}{X}}}_{i j} (t) = \frac{1}{n - 2} \sum_{k \neq i, j} {\overset{ˇ}{X}}_{k} (t), {\bar{\overset{ˇ}{Y}}}_{i j} = \frac{1}{n - 2} \sum_{k \neq i, j} {\overset{ˇ}{Y}}_{k} .

Proof of Theorem 1.

Rewrite

\frac{n}{n - 2} Δ_{i j} (\overset{ˇ}{X}) ≜ P_{i j}^{(1)} + P_{i j}^{(2)} + P_{i j}^{(3)} + P_{i j}^{(4)}, \frac{n}{n - 2} Δ_{i j} (\overset{ˇ}{Y}) ≜ L_{i j}^{(1)} + L_{i j}^{(2)} + L_{i j}^{(3)} + L_{i j}^{(4)},

where

\begin{matrix} P_{i j}^{(1)} = (1 - \frac{1}{n}) 〈 {\overset{ˇ}{X}}_{i}, {\overset{ˇ}{X}}_{j} 〉, P_{i j}^{(2)} = - \frac{1}{2 n} (〈 {\overset{ˇ}{X}}_{i}, {\overset{ˇ}{X}}_{i} 〉 + 〈 {\overset{ˇ}{X}}_{j}, {\overset{ˇ}{X}}_{j} 〉 - 2 E [〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉]), \\ P_{i j}^{(3)} = - (1 - \frac{2}{n}) 〈 {\overset{ˇ}{X}}_{i} + {\overset{ˇ}{X}}_{j}, {\bar{\overset{ˇ}{X}}}_{i j} 〉, P_{i j}^{(4)} = (1 - \frac{2}{n}) (〈 {\bar{\overset{ˇ}{X}}}_{i j}, {\bar{\overset{ˇ}{X}}}_{i j} 〉 - \frac{E [〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉]}{n - 2}), \\ L_{i j}^{(1)} = (1 - \frac{1}{n}) {\overset{ˇ}{Y}}_{i} {\overset{ˇ}{Y}}_{j}, L_{i j}^{(2)} = - \frac{1}{2 n} ({\overset{ˇ}{Y}}_{i}^{2} + {\overset{ˇ}{Y}}_{j}^{2} - 2 E [{\overset{ˇ}{Y}}_{1}^{2}]), \\ L_{i j}^{(3)} = - (1 - \frac{2}{n}) ({\overset{ˇ}{Y}}_{i} + {\overset{ˇ}{Y}}_{j}) {\bar{\overset{ˇ}{Y}}}_{i j}, L_{i j}^{(4)} = (1 - \frac{2}{n}) ({\bar{\overset{ˇ}{Y}}}_{i j}^{2} - \frac{E ({\overset{ˇ}{Y}}_{1}^{2})}{n - 2}), \end{matrix}

then the expectation of test statistic

T n p

is as follows:

E [T_{n p}] = \sum_{j < i} \sum_{l, k = 1}^{4} E [P_{i j}^{(l)} L_{i j}^{(k)}] .

To prove the conclusion (i) in Theorem 1, it needs to be calculated one by one for

(l, k), l, k = 1, 2, 3, 4

. Because of the similarity to calculations in different cases of

(l, k)

, here, we mainly consider the case where

(l, k) = (1, 1)

,

E [P_{i j}^{(1)} L_{i j}^{(1)}] ≜ G_{1}^{(1, 1)} + G_{2}^{(1, 1)} + G_{3}^{(1, 1)} + G_{4}^{(1, 1)} + G_{5}^{(1, 1)} + G_{6}^{(1, 1)},

where

\begin{matrix} G_{1}^{(1, 1)} = \frac{{(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 D g (U_{1}) D g (U_{2})], \\ G_{2}^{(1, 1)} = \frac{{(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, α 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{2} - {\hat{μ}}_{2 t}, α 〉], \\ G_{3}^{(1, 1)} = \frac{{(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 ε_{1} ε_{2}], \\ G_{4}^{(1, 1)} = \frac{2 {(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{2} - {\hat{μ}}_{2 t}, α 〉 D g (U_{1})], \\ G_{5}^{(1, 1)} = \frac{2 {(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 D g (U_{1}) ε_{2}], \\ G_{6}^{(1, 1)} = \frac{2 {(n - 1)}^{2}}{n^{2}} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{1} - {\hat{μ}}_{1 t}, α 〉 ε_{2}] . \end{matrix}

For the above six items, we will analyze them one by one. Firstly, we consider the first term. We have the following:

\begin{matrix} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 D g (U_{1}) D g (U_{2})] \\ = 2 E [〈 D μ_{1 t}, X_{2} - μ_{2 t} 〉 D g (U_{1}) D g (U_{2})] + E [〈 D μ_{1 t}, D μ_{2 t} 〉 D g (U_{1}) D g (U_{2})] \\ = O (W_{12} W_{21} (ξ_{2}^{T} ξ_{2} - μ^{T} (U_{2}) ξ_{2}) g^{2} (U_{1})) \\ = O (tr (Σ_{*}) / n^{2} h), \end{matrix}

then

G_{1}^{(1, 1)} = o (\sqrt{tr (Σ_{*}^{2})} / n)

holds. For the second term, we have the following:

\begin{matrix} E [〈 X_{1} - {\hat{μ}}_{1 t}, α 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{2} - {\hat{μ}}_{2 t}, α 〉] \\ = E [〈 X_{1} - μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, X_{2} - μ_{2 t} 〉 〈 X_{2} - μ_{2 t}, α 〉] \\ + 2 E [〈 D μ_{1 t}, α 〉 〈 D μ_{1 t}, D μ_{2 t} 〉 〈 D μ_{2 t}, α 〉] \\ + 2 E [〈 X_{1} - μ_{1 t}, α 〉 〈 D μ_{1 t}, D μ_{2 t} 〉 〈 D μ_{2 t}, α 〉] \\ + 2 E [〈 X_{1} - μ_{1 t}, α 〉 〈 X_{2} - μ_{2 t}, D μ_{1 t} 〉 〈 D μ_{2 t}, α 〉] \\ + 2 E [〈 X_{1} - μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉 〈 D μ_{2 t}, α 〉] \\ + 2 E [〈 D μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉 〈 D μ_{2 t}, α 〉] \\ + 2 E [〈 X_{1} - μ_{1 t}, α 〉 〈 D μ_{1 t}, D μ_{2 t} 〉 〈 X_{2} - μ_{2 t}, α 〉] \\ = ∥ C^{*} {(α) ∥}^{2} + O (2 β Σ_{*}^{2} β / n h + β^{T} Σ_{*} β tr (Σ_{*}) / n^{2}) . \end{matrix}

Combined with (C1), (C3), and (C9),

G_{2}^{(1, 1)} = {∥ C^{*} (α) ∥}^{2} + o (\sqrt{tr (Σ_{*}^{2})} / n)

holds. The error term

ε_{i}

with a mean of zero is independent of the predictor; hence, it is easy to see that both the third term

G_{3}^{(1, 1)}

and the sixth term

G_{6}^{(1, 1)}

are zero. For the other two cross terms,

G_{4}^{(1, 1)}

and

G_{5}^{(1, 1)}

, we need to prove that they are high-order infinitesimals of

\sqrt{tr (Σ_{*}^{2})} / n

. In fact,

\begin{matrix} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{2} - {\hat{μ}}_{2 t}, α 〉 D g (U_{1})] \\ = E [〈 X_{1} - μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉 D g (U_{2})] \\ + E [〈 D μ_{1 t}, α 〉 〈 D μ_{1 t}, X_{2} - μ_{2 t} 〉 D g (U_{2})] \\ + E [〈 D μ_{1 t}, α 〉 〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉 D g (U_{2})] \\ + E [〈 X_{1} - μ_{1 t}, α 〉 〈 D μ_{1 t}, D μ_{2 t} 〉 D g (U_{2})] \\ + E [〈 D μ_{1 t}, α 〉 〈 D μ_{1 t}, D μ_{2 t} 〉 D g (U_{2})] \\ = O (n E [W_{23}^{2} β^{T} (ξ_{1} ξ_{1}^{T} - ξ_{1} μ^{T} (U_{1})) ξ_{3} g (U_{3})]) \\ = O (E [β^{T} Σ_{*} (U_{1}) μ (U_{2}) g (U_{2}) f^{- 1} (U_{2})] / n h) \\ = o (\sqrt{tr (Σ_{*}^{2})} / n) . \end{matrix}

Finally, for

G_{5}^{(1, 1)}

, we have the following:

\begin{matrix} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 D g (U_{1}) ε_{2}] \\ = E [〈 X_{1} - μ_{1 t} + D μ_{1 t}, X_{2} - μ_{2 t} + D μ_{2 t} 〉 (- W_{12} ε_{2}^{2})] \\ = E [{(ξ_{1} - μ (U_{1}))}^{T} ξ_{1} W_{21} W_{12} σ^{2}] + E [W_{12}^{2} ξ_{2}^{T} (ξ_{2} - μ (U_{1})) σ^{2}] \\ = O (tr (Σ_{*}) / n^{2} h) . \end{matrix}

Using (C3) and the following fact

{tr}^{2} (Σ_{*}) \leq p tr (Σ_{*}^{2})

, we obtain

tr (Σ_{*}) / \sqrt{tr (Σ_{*}^{2})} = o (n h)

, i.e.,

A_{5}^{(1, 1)} = o (\sqrt{tr (Σ_{*}^{2})} / n)

. Then, the following can be seen:

E [P_{i j}^{(1)} L_{i j}^{(1)}] = {∥ C^{*} (α) ∥}^{2} + o (\sqrt{tr (Σ_{*}^{2})} / n) .

The conclusion (i) of Theorem 1 follows from the calculation of

E [P_{i j}^{(1)} L_{i j}^{(1)}]

and the proof of Theorem 1 in [21]. Conclusion (ii) is addressed in Theorem 2’s proof; here we omit it. □

Proof of Theorem 2.

By Theorem 1, we have the following:

\frac{n (E (T_{n p}) - ∥ C^{*} (α) ∥^{2})}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}} = o (1),

then we only need to prove the following:

\frac{n (T_{n p} - E T_{n p})}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}} \overset{D}{⟶} N (0, 1) .

(A1)

We denote

T_{n p}^{(k, l)} = n {(\binom{n}{2})}^{- 1} \sum_{i > j} (P_{i j}^{(k)} L_{i j}^{(l)} - E (P_{i j}^{(k)} L_{i j}^{(l)}))

with

k, l = 1, 2, 3, 4

. The subsequent result is established:

n (T_{n p} - E T_{n p}) = \sum_{k = 1}^{4} \sum_{l = 1}^{4} T_{n p}^{(k, l)} .

In order to derive the asymptotic properties of the above equation, we will find the asymptotic order of all terms

T_{n p}^{(k, l)}

. These items are divided into the following two groups according to the treatment methods.

Group 1: $(k, l) = (1, 1)$ , $(1, 2)$ , $(1, 3)$ , $(1, 4)$ , $(2, 1)$ , $(2, 3)$ , $(3, 1)$ , $(3, 3)$ , $(4, 1)$ , $(4, 3)$ .
Group 2: $(k, l) = (2, 2)$ , $(2, 4)$ , $(3, 2)$ , $(3, 4)$ , $(4, 2)$ , $(4, 4)$ .

Since the methods are similar, the cases of

(k, l) = (1, 1)

and

(k, l) = (2, 2)

will be considered, respectively, in detail in each group. Firstly, for

T_{n p}^{(1, 1)}

, we can rewrite the following:

T_{n p}^{(1, 1)} ≜ T_{n p, 1}^{(1, 1)} + T_{n p, 2}^{(1, 1)} + T_{n p, 3}^{(1, 1)} + T_{n p, 4}^{(1, 1)} + T_{n p, 5}^{(1, 1)} + T_{n p, 6}^{(1, 1)} + T_{n p, 7}^{(1, 1)} + T_{n p, 8}^{(1, 1)} + T_{n p, 9}^{(1, 1)},

(A2)

where

\begin{matrix} T_{n p, 1}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 D g (U_{i}) D g (U_{j}) - E [〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 \\ D g (U_{i}) D g (U_{j})]}, \\ T_{n p, 2}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{j} - {\hat{μ}}_{j t}, α 〉 D g (U_{i}) - E [〈 X_{i} - {\hat{μ}}_{i t}, X_{j} \\ - {\hat{μ}}_{j t} 〉] 〈 X_{j} - {\hat{μ}}_{j t}, α 〉 D g (U_{i})}, \\ T_{n p, 3}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{i} - {\hat{μ}}_{i t}, α 〉 D g (U_{j}) - E [〈 X_{i} - {\hat{μ}}_{i t}, X_{j} \\ - {\hat{μ}}_{j t} 〉] 〈 X_{i} - {\hat{μ}}_{i t}, α 〉 D g (U_{j})}, \\ T_{n p, 4}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 D g (U_{i}) ε_{j} - E [〈 X_{i} - {\hat{μ}}_{i t}, X_{j} \\ - {\hat{μ}}_{j t} 〉 D g (U_{i}) ε_{j}]}, \\ T_{n p, 5}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 D g (U_{j}) ε_{i} - E [〈 X_{i} - {\hat{μ}}_{i t}, X_{j} \\ - {\hat{μ}}_{j t} 〉 D g (U_{j}) ε_{i}]}, \\ T_{n p, 6}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} \{〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{j} - {\hat{μ}}_{j t}, α 〉 ε_{i}\}, \\ T_{n p, 7}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} \{〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{i} - {\hat{μ}}_{i t}, α 〉 ε_{j}\}, \\ T_{n p, 8}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} {〈 X_{i} - {\hat{μ}}_{i t}, α 〉 〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{j} - {\hat{μ}}_{j t}, α 〉 - E [〈 X_{i} \\ - {\hat{μ}}_{i t}, α 〉 〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 〈 X_{j} - {\hat{μ}}_{j t}, α 〉]}, \\ T_{n p, 9}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} 〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 ε_{i} ε_{j} . \end{matrix}

To prove (A1), we shall prove the following:

\frac{T_{n p}^{(1, 1)} - E T_{n p}^{(1, 1)}}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}} = \frac{T_{n p, 91}^{(1, 1)}}{σ^{2} \sqrt{2 tr (Σ_{*}^{2})}} + o_{p} (1),

(A3)

where

T_{n p, 91}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{j < i} 〈 X_{i} - μ_{i t}, X_{j} - μ_{j t} 〉 ε_{i} ε_{j}

.

It is easy to see that the means of nine items in the right equation of (A2) are all zero. To calculate their asymptotic order, it is necessary to prove their second moment. Due to the similarity in calculating the first eight items, we use the first item

T_{n p, 1}^{(1, 1)}

as an example to consider.

E [{(T_{n p, 1}^{(1, 1)})}^{2}] = \frac{n^{2}}{{(\binom{n}{2})}^{2}} \sum_{i = 2}^{n} E [Q_{i, 1}^{(1, 1)} Q_{i, 1}^{(1, 1)}] + \frac{n^{2}}{{(\binom{n}{2})}^{2}} \sum_{i = 2}^{n} \sum_{j \neq i} E [Q_{i, 1}^{(1, 1)} Q_{j, 1}^{(1, 1)}] + o (tr (Σ_{*}^{2})),

where

Q_{i, 1}^{(1, 1)} = \sum_{j = 1}^{i - 1} 〈 X_{i} - {\hat{μ}}_{i t}, X_{j} - {\hat{μ}}_{j t} 〉 D g (U_{i}) D g (U_{j}) .

For

i \neq j

, let us calculate

E [Q_{i, 1}^{(1, 1)} Q_{i, 1}^{(1, 1)}]

and

E [Q_{i, 1}^{(1, 1)} Q_{j, 1}^{(1, 1)}]

.

\begin{matrix} E [Q_{i, 1}^{(1, 1)} Q_{i, 1}^{(1, 1)}] \\ = (i - 1) E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 D g {(U_{1})}^{2} D g {(U_{2})}^{2}] \\ + (i - 1) (i - 2) E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{3} - {\hat{μ}}_{3 t} 〉 D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ ≜ (i - 1) B_{11}^{(1, 1)} + (i - 1) (i - 2) B_{12}^{(1, 1)}, \\ E [Q_{i, 1}^{(1, 1)} Q_{j, 1}^{(1, 1)}] \\ = Ξ_{1} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{3} - {\hat{μ}}_{3 t} 〉 D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ + Ξ_{2} E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{3} - {\hat{μ}}_{3 t}, X_{4} - {\hat{μ}}_{4 t} 〉 D g (U_{1}) D g (U_{2}) D g (U_{3}) D g (U_{4})] \\ ≜ Ξ_{1} B_{12}^{(1, 1)} + Ξ_{2} B_{13}^{(1, 1)}, \end{matrix}

where

Ξ_{1} = (i - 1) \land (j - 1)

,

Ξ_{2} = (i - 1) (j - 1) - (i - 1) \land (j - 1)

.

Using the Cauchy–Schwarz inequality and Lemma A.2, we can obtain

B_{11}^{(1, 1)} = O (C_{n}^{4} \sqrt{n} tr (Σ_{*}^{2})) .

For

B_{12}^{(1, 1)}

and

B_{13}^{(1, 1)}

,

\begin{matrix} B_{12}^{(1, 1)} & = E [〈 X_{1} - {\hat{μ}}_{1 t}, X_{2} - {\hat{μ}}_{2 t} 〉 〈 X_{1} - {\hat{μ}}_{1 t}, X_{3} - {\hat{μ}}_{3 t} 〉 D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ = O (E [W_{23} tr (Σ_{*} (U_{1}) Σ_{*} (U_{3})) D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ + E [W_{13} W_{21} tr (Σ_{*} (U_{1}) Σ_{*} (U_{3})) D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ + E [W_{12} W_{13} tr (Σ_{*} (U_{1})) tr (Σ_{*} (U_{3})) D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})] \\ + E [μ^{T} (U_{1}) μ (U_{2}) μ^{T} (U_{3}) μ (U_{1}) D g {(U_{1})}^{2} D g (U_{2}) D g (U_{3})]) \\ = O (E [tr (Σ_{*} (U_{1}) Σ_{*} (U_{2})) g^{2} (U_{1}) g (U_{2})] / n^{2} + E^{2} [tr (Σ_{*} (U_{1})) g^{2} (U_{1})] / n^{3} \\ + E [μ^{T} (U_{2}) μ (U_{1}) μ^{T} (U_{1}) μ (U_{2}) g (U_{1}) / n^{2}] \\ + E [μ^{T} (U_{2}) μ (U_{1}) μ^{T} (U_{1}) μ (U_{2}) g^{2} (U_{1}) g^{2} (U_{2}) / n^{2}]), \\ B_{13}^{(1, 1)} & = O (E [D μ^{T} (U_{1}) D μ (U_{2}) {(μ (U_{3}) - \hat{μ} (U_{3}))}^{T} (μ (U_{4}) - \hat{μ} (U_{4})) \\ D g (U_{1}) D g (U_{2}) D g (U_{3}) D g (U_{4})]) \\ = O (μ^{T} (U_{1}) μ (U_{2}) μ^{T} (U_{3}) μ (U_{4}) g (U_{1}) g (U_{2}) g (U_{3}) g (U_{4}) / n^{2}) . \end{matrix}

So, we can have

T_{n p, 1}^{(1, 1)} = o_{p} (\sqrt{tr (Σ_{*}^{2})})

. We apply similar methods to the

T_{n p, 1}^{(1, 1)}

, the terms

{T_{n p, k}^{(1, 1)}, k = 2, \dots, 7}

are all equal to

o_{p} (\sqrt{tr (Σ_{*}^{2})})

. For

T_{n p, 9}^{(1, 1)}

, we rewrite

\begin{matrix} T_{n p, 9}^{(1, 1)} ≜ T_{n p, 91}^{(1, 1)} + T_{n p, 92}^{(1, 1)} + T_{n p, 93}^{(1, 1)} + T_{n p, 94}^{(1, 1)}, \end{matrix}

where

\begin{matrix} T_{n p, 91}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} \{〈 X_{i} - μ_{i t}, X_{j} - μ_{j t} 〉 ε_{i} ε_{j}\}, \\ T_{n p, 92}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} \{〈 X_{i} - μ_{i t}, D μ_{j t} 〉 ε_{i} ε_{j}\}, \\ T_{n p, 93}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} \{〈 X_{j} - μ_{j t}, D μ_{i t} 〉 ε_{i} ε_{j}\}, \\ T_{n p, 94}^{(1, 1)} = \frac{2 (n - 1)}{n^{2}} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} \{〈 D μ_{i t}, D μ_{j t} 〉 ε_{i} ε_{j}\} . \end{matrix}

Since the means of the above four formulas are zero, in order to prove that (A3) is true, it is necessary to verify that the second moments of

T_{n p, 9 k}^{(1, 1)}, k = 2, 3, 4

are higher-order infinitesimals of the quantity

tr (Σ_{*}^{2})

. In fact,

\begin{matrix} E {[T_{n p, 92}^{(1, 1)}]}^{2} & = E {[T_{n p, 93}^{(1, 1)}]}^{2} \\ = \frac{4 {(n - 1)}^{2} σ^{4}}{n^{4}} \sum_{i = 2}^{n} \{(i - 1) E {[〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉]}^{2}\} \\ = O (E [〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉 〈 X_{1} - μ_{1 t}, D μ_{2 t} 〉]) \\ = O (n E [W_{23}^{2} ξ_{3}^{T} Σ_{*} (U_{1}) ξ_{3}]) \\ = o (tr (Σ_{*}^{2})), \end{matrix}

\begin{matrix} E {[T_{n p, 94}^{(1, 1)}]}^{2} & = O (E [〈 D μ_{i t}, D μ_{j t} 〉 〈 D μ_{i t}, D μ_{j t} 〉]) \\ = O (E [tr (Σ_{*} (U_{3}) μ (U_{1}) μ^{T} (U_{1}) f^{- 1} (U_{3}))] / n h) + O ({tr}^{2} (Σ_{*}) / n^{2} h) \\ = o (tr (Σ_{*}^{2})) . \end{matrix}

Then Equation (A3) holds. Similarly, for Group 2, i.e., when

(k, l) = (2, 2)

,

(2, 4)

,

(3, 2)

,

(3, 4)

,

(4, 2)

,

(4, 4)

, there is a similar proof process for the asymptotic behavior of each item in the group. Here, we only consider

T_{n p}^{(2, 2)}

. By careful calculation, we have the following:

\begin{matrix} E [{(〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉 + 〈 {\overset{ˇ}{X}}_{2}, {\overset{ˇ}{X}}_{2} 〉 - 2 E (〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉))}^{2}] \\ = O (E [{〈 X_{1} - μ_{1 t}, X_{1} - μ_{1 t} 〉}^{2}] + E^{2} [〈 X_{1} - μ_{1 t}, X_{1} - μ_{1 t} 〉] \\ + E [〈 X_{1} - μ_{1 t}, X_{1} - μ_{1 t} 〉 〈 X_{2} - μ_{2 t}, X_{2} - μ_{2 t} 〉]) \\ = O (E [{(ξ_{1} - μ (U_{1}))}^{T} (ξ_{1} - μ (U_{1})) {(ξ_{1} - μ (U_{1}))}^{T} (ξ_{1} - μ (U_{1}))] \\ + E [{(ξ_{1} - μ (U_{1}))}^{T} (ξ_{1} - μ (U_{1})) {(ξ_{2} - μ (U_{2}))}^{T} (ξ_{2} - μ (U_{2}))] \\ + E^{2} [{(ξ_{1} - μ (U_{1}))}^{T} (ξ_{1} - μ (U_{1}))]) \\ = O (E [2 tr (Σ_{*}^{2} (U_{1})) + {tr}^{2} (Σ_{*}) + Δ tr (diag (Γ^{T} (U_{1}) Γ (U_{1})) Γ^{T} (U_{1}) Γ (U_{1}))]) . \end{matrix}

Using the fact that

E [tr (diag (Γ^{T} (U_{1}) Γ (U_{1})) Γ^{T} (U_{1}) Γ (U_{1}))] = E [tr (Σ_{*}^{2} (U_{1}))] = O (tr (Σ_{*}^{2})),

we have the following:

\begin{matrix} E [{(〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉 + 〈 {\overset{ˇ}{X}}_{2}, {\overset{ˇ}{X}}_{2} 〉 - 2 E (〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉))}^{2}] = O ({tr}^{2} (Σ_{*}) + tr (Σ_{*}^{2})) . \end{matrix}

(A4)

In addition, by a simple calculation, we have the following:

\begin{matrix} E [{({\overset{ˇ}{Y}}_{1}^{2} + {\overset{ˇ}{Y}}_{2}^{2} - 2 E [{\overset{ˇ}{Y}}_{1}^{2}])}^{2}] & = 2 E [{({\overset{ˇ}{Y}}_{1}^{2})}^{2}] + 2 E [{\overset{ˇ}{Y}}_{1}^{2} {\overset{ˇ}{Y}}_{2}^{2}] - 4 E^{2} [{\overset{ˇ}{Y}}_{1}^{2}] = O (1) . \end{matrix}

(A5)

Combining (A4), (A5) with the Cauchy–Schwarz inequality, we have the following:

\begin{matrix} E | T_{n p}^{(2, 2)} | & \leq \frac{1}{4 n} \sqrt{E [{(〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉 + 〈 {\overset{ˇ}{X}}_{2}, {\overset{ˇ}{X}}_{2} 〉 - 2 E (〈 {\overset{ˇ}{X}}_{1}, {\overset{ˇ}{X}}_{1} 〉))}^{2}]} \sqrt{E {[{\overset{ˇ}{Y}}_{1}^{2} + {\overset{ˇ}{Y}}_{2}^{2} - 2 E ({\overset{ˇ}{Y}}_{1}^{2})]}^{2}} \\ = o (\sqrt{tr (Σ_{*}^{2})}) . \end{matrix}

We denote the following:

{\overset{ˇ}{T}}_{n p 1} ≜ {(\binom{n}{2})}^{- 1 / 2} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} 〈 X_{i} - μ_{i t}, X_{j} - μ_{j t} 〉 ε_{i} ε_{j}

, by condition (C1), we only need to consider the following:

{\overset{ˇ}{T}}_{n p} = {(\binom{n}{2})}^{- 1 / 2} \sum_{i = 2}^{n} \sum_{j = 1}^{i - 1} {(ξ_{i} - μ (U_{i}))}^{T} (ξ_{j} - μ (U_{j})) ε_{i} ε_{j}

. Then, by Slutsky’s theorem, if the following conclusion can be obtained, Theorem 2 will be proved.

\frac{{\overset{ˇ}{T}}_{n p}}{\sqrt{var ({\overset{ˇ}{T}}_{n p})}} \overset{D}{⟶} N (0, 1) .

By some simple calculations, we have

var ({\overset{ˇ}{T}}_{n p}) = σ^{4} tr (Σ_{*}^{2})

. Let

Z_{n i} = \sum_{j = 1}^{i - 1} 〈 X_{i} - μ_{i t}, X_{j} - μ_{j t} 〉 ε_{i} ε_{j} / \sqrt{(\binom{n}{2})}

,

v_{n i} = E [Z_{n i}^{2} | F_{i - 1}]

,

X u_{i} = {(ξ_{i}, U_{i}, ε_{i})}^{T}

, where

F_{i} = {X u_{1}, \dots, X u_{i}}

is a

σ

-algebra produced by

{X u_{i}, k = 1, \dots, i}

,

V_{n} = \sum_{i = 2}^{n} v_{n i}

. The condition

E [Z_{n i} | F_{i}] = 0

is readily verifiable, and the sequence

{\sum_{i = 2}^{j} Z_{n i}, F_{j} : 2 \leq j \leq n}

constitutes a mean-zero martingale. Adherence to the Martingale central limit theorem is ensured by verifying the following conditions:

\frac{V_{n}}{var ({\overset{ˇ}{T}}_{n p})} \overset{P}{⟶} 1, as n \to \infty;

(A6)

\sum_{i = 2}^{n} σ^{- 4} {tr}^{- 1} (Σ_{*}^{2}) E {Z_{n i}^{2} I (| Z_{n i} | > η σ^{2} \sqrt{tr (Σ_{*}^{2})}) | F_{i - 1}} \overset{P}{⟶} 0 for \forall η > 0 .

(A7)

Note that

v_{n i} = \frac{σ^{2}}{(\binom{n}{2})} \sum_{j = 1}^{i - 1} \{ε_{j}^{2} {(ξ_{j} - μ (U_{j}))}^{T} Σ_{*} (ξ_{j} - μ (U_{j})) + 2 \sum_{k < l < i} ε_{k} ε_{l} {(ξ_{k} - μ (U_{k}))}^{T} Σ_{*} (ξ_{l} - μ (U_{l}))\} .

Then, we define the following:

\frac{V_{n}}{var ({\tilde{T}}_{n p})} ≜ C_{n 1} + C_{n 2},

where

\begin{matrix} C_{n 1} & = \frac{1}{(\binom{n}{2}) σ^{2} tr (Σ_{*}^{2})} \sum_{j < i} ε_{j}^{2} {(ξ_{j} - μ (U_{j}))}^{T} Σ_{*} (ξ_{j} - μ (U_{j})), \\ C_{n 2} & = \frac{2}{(\binom{n}{2}) σ^{2} tr (Σ_{*}^{2})} \sum_{k < l < i} ε_{k} ε_{l} {(ξ_{k} - μ (U_{k}))}^{T} Σ_{*} (ξ_{l} - μ (U_{l})) . \end{matrix}

The equality

E [C_{n 1}] = 1

can be readily confirmed, and we have the following:

\begin{matrix} var (C_{n 1}) = E [C_{n 1}^{2}] - 1 \\ = \frac{1}{n^{4} {(σ^{4} Σ_{*}^{2})}^{2}} \sum_{i = 2}^{n} ((i - 1) E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1})) {(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} \\ - μ (U_{1})) σ^{4} ε_{1}^{4}] + (i - 1) (i - 2) E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1})) (ξ_{2} \\ - μ (U_{2}))^{T} Σ_{*} (ξ_{2} - μ (U_{2})) σ^{4} ε_{1}^{2} ε_{2}^{2}]) \\ + \frac{1}{n^{4} {(σ^{4} Σ_{*}^{2})}^{2}} \sum_{i = 2}^{n} \sum_{j \neq i} ((i - 1) \land (j - 1) E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1})) (ξ_{1} \\ - μ (U_{1}))^{T} Σ_{*} (ξ_{1} - μ (U_{1})) σ^{4} ε_{1}^{4}] + ((i - 1) (j - 1) - (i - 1) \land (j - 1)) E [(ξ_{1} \\ - μ (U_{1}))^{T} Σ_{*} (ξ_{1} - μ (U_{1})) {(ξ_{2} - μ (U_{2}))}^{T} Σ_{*} (ξ_{2} - μ (U_{2})) σ^{4} ε_{1}^{2} ε_{2}^{2}]) - 1 \\ = \frac{E [ε_{1}^{4}]}{n σ^{4} {tr}^{2} (Σ_{*}^{2})} O (E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1})) {(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1}))]) \\ = \frac{E [ε_{1}^{4}]}{n σ^{4} {tr}^{2} (Σ_{*}^{2})} O (E [tr (Σ_{*} (U_{1}) Σ_{*} Σ_{*} (U_{1}) Σ_{*})] + {tr}^{2} (Σ_{*} (U_{1}) Σ_{*}) \\ + Δ tr (diag (Γ^{T} (U_{1}) Σ_{*} Γ (U_{1})) Γ^{T} (U_{1}) Σ_{*} Γ (U_{1}))) \\ = \frac{E [ε_{1}^{4}]}{n σ^{4} {tr}^{2} (Σ_{*}^{2})} O (tr (Σ_{*}^{4}) + {tr}^{2} (Σ_{*}^{2})), \end{matrix}

By (C2), we have

C_{n 1} \overset{P}{⟶} 1

. Similarly, we can obtain

E [C_{n 2}] = 0

, and we have the following:

\begin{matrix} var (C_{n 2}) = E [C_{n 2}^{2}] \\ = O (\frac{2}{{(\binom{n}{2})}^{2} {tr}^{2} (Σ_{*}^{2})} \sum_{i = 2}^{n} ((i - 1) (i - 2) E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{2} - μ (U_{2})) (X_{2} \\ - μ (U_{2}))^{T} Σ_{*} (ξ_{1} - μ (U_{1}))] + \sum_{i = 2}^{n} \sum_{j \neq i} (i - 1) \land (j - 1) ((i - 1) \land (j - 1) - 1) \\ E [{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{2} - μ (U_{2})) {(ξ_{2} - μ (U_{2}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1}))])) \\ = O (\frac{tr (Σ_{*}^{4})}{{tr}^{2} (Σ_{*}^{2})}) . \end{matrix}

Combined with

tr (Σ_{*}^{4}) = o ({tr}^{2} (Σ_{*}^{2}))

, we have

C_{n 2} \overset{P}{⟶} 0

. Thus, Equation (A6) holds. Finally, we only need to prove (A7). Hence, leveraging the law of large numbers and the fact that

E [Z_{n i}^{2} I (| Z_{n i} | > η σ^{2} tr (Σ_{*}^{2}))] \leq E (Z_{n i}^{4} | F_{i - 1}) / (η^{2} σ^{4} tr (Σ_{*}^{2})),

it suffices to demonstrate that

\sum_{2 \leq i \leq n} E (Z_{n i}^{4}) = o ({tr}^{2} (Σ_{*}^{2}))

. Through straightforward computations, we obtain the following result:

\begin{matrix} \sum_{i = 2}^{n} E [Z_{n i}^{4}] \\ = \frac{1}{{(\binom{n}{2})}^{2}} \sum_{j < i} E [{({(ξ_{i} - μ (U_{i}))}^{T} (ξ_{j} - μ (U_{j})))}^{4} ε_{i}^{4} ε_{j}^{4}] + \sum_{i = 2}^{n} E [Z_{n i}^{4}] \\ = \frac{1}{{(\binom{n}{2})}^{2}} \sum_{j < i} \sum_{k \neq j} E [{({(ξ_{i} - μ (U_{i}))}^{T} (ξ_{j} - μ (U_{j})))}^{2} {({(ξ_{i} - μ (U_{i}))}^{T} (ξ_{k} - μ (U_{k})))}^{2} ε_{i}^{4} ε_{j}^{2} ε_{k}^{2}] \\ = \frac{E^{2} (ε_{1}^{4})}{{(\binom{n}{2})}^{2}} \sum_{i = 2}^{n} (i - 1) E {[{(ξ_{1} - μ (U_{1}))}^{T} (ξ_{2} - μ (U_{2}))]}^{4} + \frac{3 σ^{4} E (ε_{1}^{4})}{{(\binom{n}{2})}^{2}} \sum_{i = 2}^{n} (i - 1) (i - 2) \\ E [{({(ξ_{1} - μ (U_{1}))}^{T} (ξ_{2} - μ (U_{2})))}^{2} ({(ξ_{1} - μ (U_{1}))}^{T} {(ξ_{3} - μ (U_{3}))}^{2})] \\ = O (\frac{E {[{(ξ_{1} - μ (U_{1}))}^{T} (ξ_{2} - μ (U_{2}))]}^{4}}{n^{2}}) + O (\frac{E {[{(ξ_{1} - μ (U_{1}))}^{T} Σ_{*} (ξ_{1} - μ (U_{1}))]}^{2}}{n}) \end{matrix}

Combining (C2) and Lemma 2, Equation (A7) holds. Thus, the proof of Theorem 2 is completed. □

References

Crainiceanu, C.M.; Staicu, A.M.; Di, C.Z. Generalized multilevel functional regression. J. Am. Stat. Assoc. 2009, 104, 1550–1561. [Google Scholar] [CrossRef] [PubMed]
Wang, J.; Zhou, F.; Li, C.; Yin, N.; Liu, H.; Zhuang, B.; Huang, Q.; Wen, Y. Gene Association Analysis of Quantitative Trait Based on Functional Linear Regression Model with Local Sparse Estimator. Genes 2023, 14, 834. [Google Scholar] [CrossRef] [PubMed]
Kokoszka, P.; Miao, H.; Zhang, X. Functional dynamic factor model for intraday price curves. J. Financ. Econom. 2014, 13, 456–477. [Google Scholar] [CrossRef]
Rigueira, X.; Araújo, M.; Martínez, J.; García-Nieto, P.J.; Ocarranza, I. Functional Data Analysis for the Detection of Outliers and Study of the Effects of the COVID-19 Pandemic on Air Quality: A Case Study in Gijón, Spain. Mathematics 2022, 10, 2374. [Google Scholar] [CrossRef]
Yao, F.; Müller, H.G. Functional quadratic regression. Biometrika 2010, 97, 49–64. [Google Scholar] [CrossRef]
Lian, H. Functional partial linear model. J. Nonparametr. Stat. 2011, 23, 115–128. [Google Scholar] [CrossRef]
Zhou, J.; Chen, M. Spline estimators for semi-functional linear model. Stat. Probab. Lett. 2012, 82, 505–513. [Google Scholar] [CrossRef]
Tang, Q. Estimation for semi-functional linear regression. Statistics 2015, 49, 1262–1278. [Google Scholar]
Zhang, Y.; Wu, Y. Robust hypothesis testing in functional linear models. J. Stat. Comput. Simul. 2023, 93, 2563–2581. [Google Scholar] [CrossRef]
Kokoszka, P.; Maslova, I.; Sojka, J.; Zhu, L. Testing for lack of dependence in the functional linear model. Can. J. Stat. 2008, 36, 207–222. [Google Scholar] [CrossRef]
James, G.M.; Wang, J.; Zhu, J. Functional linear regression that’s interpretable. Ann. Stat. 2009, 37, 2083–2108. [Google Scholar] [CrossRef]
Shin, H. Partial functional linear regression. J. Stat. Plan. Inference 2009, 139, 3405–3418. [Google Scholar] [CrossRef]
Yu, P.; Zhang, Z.; Du, J. A test of linearity in partial functional linear regression. Metrika 2016, 79, 953–969. [Google Scholar] [CrossRef]
Hu, H.; Zhang, R.; Yu, Z.; Lian, H.; Liu, Y. Estimation and testing for partially functional linear errors-in-variables models. J. Multivar. Anal. 2019, 170, 296–314. [Google Scholar]
Smaga, Ł. General linear hypothesis testing in functional response model. Commun. Stat.-Theory Methods 2019, 50, 5068–5083. [Google Scholar]
Zhu, H.; Zhang, R.; Li, H. Estimation on semi-functional linear errors-in-variables models. Commun. Stat.-Theory Methods 2019, 48, 4380–4393. [Google Scholar] [CrossRef]
Zhou, J.; Peng, Q. Estimation for functional partial linear models with missing responses. Stat. Probab. Lett. 2020, 156, 108598. [Google Scholar] [CrossRef]
Zhao, F.; Zhang, B. Testing linearity in functional partially linear models. Acta Math. Appl. Sin. Engl. Ser. 2024, 40, 875–886. [Google Scholar] [CrossRef]
Hu, W.; Lin, N.; Zhang, B. Nonparametric testing of lack of dependence in functional linear models. PLoS ONE 2020, 15, e0234094. [Google Scholar] [CrossRef]
Zhao, F.; Lin, N.; Hu, W.; Zhang, B. A faster U-statistic for testing independence in the functional linear models. J. Stat. Plan. Inference 2022, 217, 188–203. [Google Scholar] [CrossRef]
Zhao, F.; Lin, N.; Zhang, B. A new test for high-dimensional regression coefficients in partially linear models. Can. J. Stat. 2023, 51, 5–18. [Google Scholar] [CrossRef]
Cui, H.; Guo, W.; Zhong, W. Test for high-dimensional regression coefficients using refitted cross-validation variance estimation. Ann. Stat. 2018, 46, 958–988. [Google Scholar] [CrossRef]
Zhong, P.; Chen, S. Tests for high-dimensional regression coefficients with factorial designs. J. Am. Stat. Assoc. 2011, 106, 260–274. [Google Scholar] [CrossRef]
Chen, S.; Zhang, L.; Zhong, P. Tests for high-dimensional covariance matrices. J. Am. Stat. Assoc. 2010, 105, 810–819. [Google Scholar] [CrossRef]
Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer: New York, NY, USA, 2006. [Google Scholar]
Shang, H.L. Bayesian bandwidth estimation for a semi-functional partial linear regression model with unknown error density. Comput. Stat. 2014, 29, 829–848. [Google Scholar] [CrossRef]
Yu, P.; Zhang, Z.; Du, J. Estimation in functional partial linear composite quantile regression model. Chin. J. Appl. Probab. Stat. 2017, 33, 170–190. [Google Scholar]

Figure 1. The null distributions and q-q plots of our proposed test when

g (u) = 2 u

.

Figure 1. The null distributions and q-q plots of our proposed test when

g (u) = 2 u

.

Figure 2. The null distributions and the q-q plots of our proposed test when

g (u) = 2 + s i n (2 π u)

.

Figure 2. The null distributions and the q-q plots of our proposed test when

g (u) = 2 + s i n (2 π u)

.

Figure 3. Empirical power functions of our proposed test when

g (u) = 2 u

.

Figure 3. Empirical power functions of our proposed test when

g (u) = 2 u

.

Figure 4. Empirical power functions of our proposed test when

g (u) = 2 + s i n (2 π u)

.

Figure 4. Empirical power functions of our proposed test when

g (u) = 2 + s i n (2 π u)

.

Figure 5. Empirical power functions of our proposed test with the Gaussian kernel when

g (u) = 2 + s i n (2 π u)

.

Figure 5. Empirical power functions of our proposed test with the Gaussian kernel when

g (u) = 2 + s i n (2 π u)

.

Figure 6. (a) The estimator of the slope function in model (10); (b) the estimator of the slope function in model (11).

Table 1. When

g (u) = 2 u