A New Class of Weighted CUSUM Statistics

Shi, Xiaoping; Wang, Xiang-Sheng; Reid, Nancy

doi:10.3390/e24111652

Open AccessArticle

A New Class of Weighted CUSUM Statistics

by

Xiaoping Shi

^1,*

,

Xiang-Sheng Wang

²

and

Nancy Reid

³

¹

Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Kelowna, BC V1V 1V7, Canada

²

Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA 70503, USA

³

Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada

^*

Author to whom correspondence should be addressed.

Entropy 2022, 24(11), 1652; https://doi.org/10.3390/e24111652

Submission received: 20 September 2022 / Revised: 9 November 2022 / Accepted: 11 November 2022 / Published: 14 November 2022

(This article belongs to the Special Issue Recent Advances in Statistical Theory and Applications)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

A change point is a location or time at which observations or data obey two different models: before and after. In real problems, we may know some prior information about the location of the change point, say at the right or left tail of the sequence. How does one incorporate the prior information into the current cumulative sum (CUSUM) statistics? We propose a new class of weighted CUSUM statistics with three different types of quadratic weights accounting for different prior positions of the change points. One interpretation of the weights is the mean duration in a random walk. Under the normal model with known variance, the exact distributions of these statistics are explicitly expressed in terms of eigenvalues. Theoretical results about the explicit difference of the distributions are valuable. The expansions of asymptotic distributions are compared with the expansion of the limit distributions of the Cramér-von Mises statistic and the Anderson and Darling statistic. We provide some extensions from independent normal responses to more interesting models, such as graphical models, the mixture of normals, Poisson, and weakly dependent models. Simulations suggest that the proposed test statistics have better power than the graph-based statistics. We illustrate their application to a detection problem with video data.

Keywords:

asymptotic distribution; Brownian bridge; exact distribution; quadratic weights; weak dependence

1. Introduction

A change point is a location or time at which observations or data obey two different models: before and after. Detecting change points is a nontrivial problem and has been studied by many authors; see a book treatment in [1] and recent advances in CUSUM-based change point tests [2,3,4]. In real problems, we may know some prior information about the location of the change point, say at the right or left tail of the sequence. How does one incorporate prior information into current CUSUM-based statistics? We consider a new class of weighted CUSUM statistics for a simple model and provide some extensions to more complicated models.

Given a series of univariate random variables

Y_{1}, \dots, Y_{n}

, we consider the problem of testing whether there is a change in the mean of their distribution. The test statistic we use is:

S_{n} (Y; τ, γ) = \sum_{k = 1}^{n - 1} w_{k}^{- 1} (τ) {\{\sum_{i = 1}^{k} (Y_{i} - \bar{Y})\}}^{γ},

(1)

where

Y = {(Y_{1}, \dots, Y_{n})}^{⊤}

,

\bar{Y} = n^{- 1} \sum_{j = 1}^{n} Y_{j}

,

γ > 0

, and

w_{k} (τ) = - {(k - τ)}^{2} + max {τ^{2}, {(n - τ)}^{2}} = \{\begin{matrix} (n + k) (n - k), if τ = 0, \\ k (n - k), if τ = n / 2, \\ k (2 n - k), if τ = n, \end{matrix}

(2)

where

τ = 0, n / 2

, and n account for three different prior positions of the change point, respectively. We call

S_{n}

a weighted CUSUM (WC) statistic.

Inspired by the change point literature, we consider these types of quadratic weights. The term

max {τ^{2}, {(n - τ)}^{2}} = {max}_{0 \leq j \leq n} {(j - τ)}^{2}

is introduced to ensure that the weight

w_{k} (τ)

is positive for any

0 < k < n

. Usually, we choose

γ = 2

to capture the change in the mean. When

τ = n / 2

, the weight

w_{k} (n / 2) = k (n - k)

corresponds to the likelihood ratio test; see Csörgö and Horváth [1] and a related review in Jandhyala et al. [5]. If prior information indicates that the change point more likely occurs in the right or left tail of the sequence, we can set the weight

w_{k} (0) = (n + k) (n - k)

(left drifted to the symmetry center point 0) or

w_{k} (n) = k (2 n - k)

(right drifted to the symmetry center point n) to improve the power of the test.

One interpretation of the weights is the mean duration in a random walk

{X_{i}, i \geq 0}

on

N + 1

states,

{0, 1, \dots, N}

, whose transition probability is given by

P (X_{i + 1} = k \pm 1 | X_{i} = k) = 1 / 2

for

k = 1, \dots, N - 1

,

P (X_{i + 1} = 0 | X_{i} = 0) = 1,

and

P (X_{i + 1} = N | X_{i} = N) = 1

. Let T denote the random time at which the process first reaches 0 or N. Then, for

k = 1, \dots,

n - 1

,

E (T | X_{0} = k) = k (n - k) = w_{k} (n / 2)

if

N = n

;

E (T | X_{0} = k) = k (2 n - k) = w_{k} (n)

if

N = 2 n

; and

E (T | X_{0} = n - k) = (n + k) (n - k) = w_{k} (0)

if

N = 2 n

. Figure 1 depicts four vectors

w_{k}

for

n = 10

. The centers of symmetry of these quadratic weights are at different positions.

The weights in (1) can be thought of as an inverse prior probability on the change point, giving

S_{n}

a Bayesian flavor, as in Gardner [6], who used the uniform prior

n^{- 2}

, or Perron [7], who devised a unit-root test for time series. From a frequentist perspective, the weighted sum statistic offers an alternative to the maximum statistic most commonly used Csörgö and Horváth [1], which we show (in small simulations omitted here) has higher power, especially when the change point is at the center of the sequence for any

τ

, in the right tail of the sequence for

τ = n

, and in the left tail of the sequence for

τ = 0

.

For these types of quadratic weights, a couple of questions naturally arise: will different weights lead to different distributions of WC in Equation (1)? If so, how significant will the differences in the distribution be? If two different weights lead to the same distribution, are there any intrinsic reasons? Although one can estimate the distribution of WC by simulation, theoretical results about the explicit differences of the distributions are valuable. Moreover, simulations and computations of eigenvalues for large n are computationally expensive. To answer the aforementioned questions, we shall study the distribution of the WC theoretically; we derive Karhunen–Loève expansions of the exact and asymptotic distributions of the WC statistics. The calculation of a Karhunen–Loève expansion is a nontrivial task, even under the normal model. Gardner [6] discussed the uniform weight under the normal assumption, but the quadratic weights we consider here increase the difficulty substantially. We present below a unified theory that enables us to establish the distribution of WC using dual Hahn polynomials. The asymptotic distributions for the quadratic weights

w_{k} (0)

and

w_{k} (n)

are identical, and the expansions of asymptotic distributions between

w_{k} (0)

and another quadratic weight

w_{k} (n / 2)

differ by an odd number of terms. We make a comparison with the expansion of the limit distributions of the Cramér-von Mises statistic and the Anderson and Darling statistics; see also MacNeill [8].

The WC has some variants in other models. For example, in the graphical model,

γ

can be 1 if we replace

Y

with a count of edges. Here, the main challenge is to approximate the covariance of edge-count statistics under the null permutations. In the normal mixed model, a variant of WC can be derived by considering a marginal likelihood function. In the Poisson mixed model, however, the calculation of the marginal likelihood function is hindered by an integral without a closed form. To approximate this integral, one may use Laplace, or saddle point approximation [9,10,11,12,13]. Here, we apply the saddle point approximation to the integral and provide a variant of WC related to the log link. For the classical change point Poisson model without latent variables, see [1] (p. 27); for the Poisson process with a change point, we refer readers to Akman and Raftery [14], Loader [15]. Moreover, to adopt the assumption of weak dependence in practice, we avoid the estimation of the variance and provide a randomized version of WC.

The structure of the paper is outlined as follows. In Section 2, we derive the explicit expansions of the distribution of the WC statistics and explore their connections with the Karhunen–Loève expansion. We derive extended versions of WC by considering the observations as nodes in the graphical model and allowing the observations from a normal or Poisson mixed model to be weakly dependent. In Section 3, we discuss the power of the proposed WC test. In Section 4, we use simulation to compare the performance of this test with that of a graph-based test statistic. In Section 5, we present an application for video data. In Section 6, we discuss the extension to multiple change points and suggest future work on other quadratic weights.

2. Exact and Asymptotic Distributions of the WC Statistics

2.1. Explicit Distribution for a Normal Model

We assume here that

{Y_{i}}

are independent following a normal distribution with a common known variance

σ^{2}

. The case of unknown

σ^{2}

is addressed in Remark 3, and an extension relaxing the independence assumption is given in Section 2.6.

Following the derivation in Gardner [6], we write (1) as a quadratic form

S_{n} (Y; τ, 2) = \frac{1}{n^{2}} \sum_{k = 1}^{n - 1} p_{k} {\{\sum_{i = 1}^{k} (n - k) Y_{i} - \sum_{i = k + 1}^{n} k Y_{i}\}}^{2} = \frac{1}{n^{2}} Y^{⊤} A A^{⊤} Y = Y^{⊤} Q Y,

(3)

where

p_{k} = p_{k} (τ) = w_{k}^{- 1} (τ)

, and

n^{2} Q = A A^{⊤}

with

A = (A_{1}, \dots, A_{n - 1})

. Here,

A_{k} = p_{k}^{1 / 2} {(n - k, \dots, n - k, - k, \dots, - k)}^{⊤}

such that the first k entries of

A_{k}

are

p_{k}^{1 / 2} (n - k)

and the last

n - k

entries

- p_{k}^{1 / 2} k

.

By using the recurrence identity and the dual Hahn polynomial, we obtain a new exact result in terms of the eigenvalues of Q in (3).

Theorem 1.

Assume that

{Y_{i}}

are independent normally distributed random variables with a common mean and known variance

σ^{2}

. The exact distribution of

S_{n} (Y; τ, 2)

is

\frac{S_{n} (Y; τ, 2)}{σ^{2}} =_{d} \sum_{k = 1}^{n} λ_{k} (τ) Z_{k}^{2},

(4)

where

Z_{k}^{2}

are independent and identically distributed normal random variables with mean zero and variance 1,

λ_{n} (τ) = 0

, and

λ_{k} (τ) = \{\begin{matrix} \frac{1}{k (k + 1)}, k = 1, \dots, n - 1, if τ = n / 2 \\ \frac{1}{2 k (2 k + 1)}, k = 1, \dots, n - 1, if τ = 0 o r n . \end{matrix}

The proof of Theorem 1 is given in Appendix A. We make the following remarks.

Remark 1.

It is interesting that

λ_{2 k} (n / 2) = λ_{k} (0)

for all

0 < k < n / 2

; namely, the eigenvalues for

w_{k} (n / 2)

with even indices coincide with the eigenvalues for

w_{k} (0)

with indices less than

n / 2

. As the sample size increases from n to

n + 1

, the

n - 1

nonzero eigenvalues are retained and the added nonzero eigenvalue must be

1 / {n (n + 1)}

for

w_{k} (n / 2)

or

1 / {2 n (2 n + 1)}

for

w_{k} (0)

or

w_{k} (n)

. This interesting phenomenon has not been seen in the uniform weights of Gardner [6]. As far as we know, this recursive property of the eigenvalues for the non-uniform weights is new. Figure 2 depicts the pattern of eigenvalues (cross products of rows and columns) illustrated by dots for three weights

w_{k} (n / 2)

(blue),

w_{k} (0)

(green), and

w_{k} (n)

(purple) with the increase of n.

Remark 2.

The distribution in (4) can be calculated numerically using Imhof’s method [16] or simulated by a Monte Carlo method, but accurate analytical approximations are potentially faster and more stable. A saddle point approximation to the distribution of quadratic forms in normal variates was studied in Kuonen [17], building on Daniels [9,18] and Lugannani and Rice [19].

Remark 3.

When the variance

σ^{2}

is unknown, we can replace

σ^{2}

with a consistent estimator

{\hat{σ}}^{2} = {(n - 1)}^{- 1} \sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2},

by using Slutsky’s lemma. This also holds in Corollary 1. For dependent data, one issue is to give a valid estimate of the variance; see Section 2.6.

2.2. Karhunen–Loève Expansion

The squared integral of a Brownian bridge arises in the study of tests for goodness-of-fit. Given a sample of independent and identically distributed random variables with an empirical distribution function

F_{n} (x)

, the statistic

ω_{n}^{2} (ψ) = n \int_{- \infty}^{\infty} {F_{n} (t) - F (t)}^{2} ψ {F (t)} d F (t)

provides a test of the null hypothesis that the observations come from the distribution

F (\cdot)

. The Cramér-von Mises statistic has

ψ (t) \equiv 1

, and the Anderson-Darling statistic has

ψ (t) = 1 / {t (1 - t)}

. Here, we shall discuss two new weights:

ψ (t) = 1 / {t (2 - t)}

and

ψ (t) = 1 / (1 - t^{2})

.

MacNeill [8] showed that

\int_{0}^{1} {B (t) - t B (1)}^{2} d t = \sum_{k = 1}^{\infty} \frac{1}{k^{2} π^{2}} Z_{k}^{2},

using a Fourier expansion of

B (t) - t B (1) = \sum_{k = 1}^{\infty} \sqrt{2} sin (k π t) / (k π) Z_{k}

, where

{\sqrt{2} sin (k π t),

k = 1, 2, \dots, \infty}

is an orthonormal basis in

L^{2} (0, 1)

and

B (t)

is a standard Brownian motion and

B (t) - t B (1)

is a Brownian bridge.

Anderson and Darling [20] showed that

\int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{t (1 - t)} d t = \sum_{k = 1}^{\infty} \frac{1}{k (k + 1)} Z_{k}^{2} .

In Appendix B, we use Jacobi polynomials to derive the Karhunen–Loève expansion for the integrals of the weighted square of the Brownian bridge with two new weights

ψ (t) = 1 /

{t (2 - t)}

and

ψ (t) = 1 / (1 - t^{2})

. The results are stated in the following theorem.

Theorem 2.

The two weights

ψ (t) = 1 / {t (2 - t)}

and

ψ (t) = 1 / (1 - t^{2})

lead to the same Karhunen–Loève expansions:

\begin{matrix} \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{2 t - t^{2}} d t = \sum_{k = 1}^{\infty} \frac{1}{2 k (2 k + 1)} Z_{k}^{2} \end{matrix}

(5)

and

\begin{matrix} \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{1 - t^{2}} d t = \sum_{k = 1}^{\infty} \frac{1}{2 k (2 k + 1)} Z_{k}^{2} . \end{matrix}

(6)

The proof of the above two equalities will be provided in Appendix B. One can see the equivalence of these two equalities by using a change of variable.

Given different probabilities (p), Table 1 presents the critical values

c_{p}

for which

p = P (χ_{n}^{2} (τ) \leq c_{p})

for different n, where

χ_{n}^{2} (τ) = \sum_{k = 1}^{n} λ_{k} (τ) Z_{k}^{2}

and calculations of critical values for finite n are based on Imhof’s method [16] implemented in R package CompQuadForm [21]. A few critical values are tabulated in Anderson and Darling [22] for

χ_{n}^{2} (τ)

with

n = \infty

. One can see the critical values converge very quickly as n increases to ∞.

In fact, we can connect the limit distribution of WC statistic and its functional limit distribution by the Karhunen–Loève expansion of the integral of the weighted square of Brownian bridge in terms of the Jacobi polynomials. Theorem 1 immediately implies the following asymptotic distribution as

n \to \infty

.

Corollary 1.

Under the assumptions of Theorem 1, when

n \to \infty

,

\frac{S_{n} (Y; τ, 2)}{σ^{2}} \to_{d} \sum_{k = 1}^{\infty} λ_{k} (τ) Z_{k}^{2} .

One can check

\sum_{k = n}^{\infty} λ_{k} (τ) Z_{k}^{2} \to_{p} 0

by Markov’s inequality. Hence,

\sum_{k = 1}^{n - 1} λ_{k} (τ) Z_{k}^{2}

converges to

\sum_{k = 1}^{\infty} λ_{k} (τ) Z_{k}^{2}

in probability as

n \to \infty

. By the functional limit theorem,

\frac{S_{n} (Y; τ, 2)}{σ^{2}} \to_{d} \{\begin{matrix} \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{1 - t^{2}} d t, if τ = 0, \\ \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{t (1 - t)} d t, if τ = n / 2, \\ \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{2 t - t^{2}} d t, if τ = n . \end{matrix}

2.3. Graphical Model

Assume the

{Y_{i, j}, 1 \leq i \leq n, 1 \leq j \leq q}

are independent and have common mean

E (Y_{i, j}) = μ_{i}

and variance

Var (Y_{i, j}) = σ_{i}^{2}

. Consider testing

H_{0} : μ_{i} \equiv μ, σ_{i}^{2} \equiv σ^{2} v s H_{a} : μ_{i} = \{\begin{matrix} μ_{-}, for i \leq k^{*}, \\ μ_{+}, for k^{*} > i, \end{matrix} or σ_{i}^{2} = \{\begin{matrix} σ_{-}^{2}, for i \leq k^{*}, \\ σ_{+}^{2}, for k^{*} > i, \end{matrix}

(7)

where

μ_{-} \neq μ_{+}

or

σ_{-}^{2} \neq σ_{+}^{2}

, the parameters

μ

,

μ_{-}

,

μ_{+}

,

σ^{2}

,

σ_{-}^{2}

, and

σ_{+}^{2}

are unknown.

A graphical model can be established by treating each q-dimensional vector as a node and assigning the Euclidean distance between any two vectors. Here, we consider a path

P

with an ordering of nodes

(v_{1}, \dots, v_{n})

and edges

(v_{i}, v_{i + 1})

for

i = 1, \dots, n - 1

. Associated with the path, the count of edges that connect nodes between arbitrary two disjoint sets

N_{k} = {1, \dots, k}

and

{\bar{N}}_{k} = {k + 1, \dots, n}

is defined to be:

C_{P} (N_{k}, {\bar{N}}_{k}) = \sum_{i = 1}^{n - 1} I [\{(v_{i} \in N_{k}) \cap (v_{i + 1} \in {\bar{N}}_{k})} \cup {(v_{i + 1} \in N_{k}) \cap (v_{i} \in {\bar{N}}_{k})\}],

(8)

where

I (\cdot)

is an indicator function that takes 1 if true otherwise 0. The

C_{P} (N_{k}, {\bar{N}}_{k})

counts edges between two groups

N_{k}

and

{\bar{N}}_{k}

.

Denote the expectation and variance of

C_{P} (N_{k}, {\bar{N}}_{k})

under

n!

permutations of nodes as

E_{perm} C_{P} (N_{k}, {\bar{N}}_{k})

and

{Var}_{perm} C_{P} (N_{k}, {\bar{N}}_{k})

. By [23],

E_{perm} C_{P} (N_{k}, {\bar{N}}_{k}) = \frac{2 k (n - k)}{n} and {Var}_{perm} C_{P} (N_{k}, {\bar{N}}_{k}) = \frac{2 k (n - k) {2 k (n - k) - n}}{n^{3} - n^{2}} .

A WC statistic may be constructed as

S_{n} (P; τ, γ) = \sum_{k = 1}^{n - 1} w_{k}^{- 1} (τ) {\{- C_{P} (N_{k}, {\bar{N}}_{k}) + \frac{2 k (n - k)}{n}\}}^{γ} .

(9)

A large value of observed

S_{n} (P^{*}; τ, γ)

based on the shortest Hamiltonian path (SHP),

P^{*}

, indicates a rejection of the null hypothesis, i.e., there is a change point; see the heuristic algorithm of SHP in Biswas et al. [24] and the analysis of power and change point in Shi, Wu and Rao [25], Shi, Wu and Rao [26] for

γ = 2

and

w_{k} (τ) = {Var}_{perm} C_{P} (N_{k}, {\bar{N}}_{k})

. Here, we will establish the asymptotic distribution of

S_{n} (P; τ, γ)

for

γ = 1, 2

. First, we give the following Lemma.

Lemma 1.

For

k = ⌊t n⌋

with

0 < t < 1

,

\frac{1}{\sqrt{2 n}} \{- C_{P} (N_{k}, {\bar{N}}_{k}) + \frac{2 k (n - k)}{n}\} \to_{d} {B (t) - t B (1)}^{2} - t (1 - t), n \to \infty .

(10)

By the functional limit theorem,

\sqrt{\frac{n}{2}} S_{n} (P; τ, 1) \to_{d} \{\begin{matrix} \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{1 - t^{2}} d t + log (2) - 1, if τ = 0, \\ \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{t (1 - t)} d t - 1, if τ = n / 2, \\ \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{2 t - t^{2}} d t + log (2) - 1, if τ = n, \end{matrix}

(11)

and

\frac{1}{2} S_{n} (P; τ, 2) \to_{d} \{\begin{matrix} \int_{0}^{1} \frac{{[{B (t) - t B (1)}^{2} - t (1 - t)]}^{2}}{1 - t^{2}} d t, if τ = 0, \\ \int_{0}^{1} \frac{{[{B (t) - t B (1)}^{2} - t (1 - t)]}^{2}}{t (1 - t)} d t, if τ = n / 2, \\ \int_{0}^{1} \frac{{[{B (t) - t B (1)}^{2} - t (1 - t)]}^{2}}{2 t - t^{2}} d t, if τ = n, \end{matrix}

(12)

which solves an open problem in [25,26]. Different values of

γ

lead to different rates of convergence and different ”normings”.

2.4. Normal Mixed Model

Assume

Y_{i, j} = μ_{i} + U_{i} + e_{i, j},

where

1 \leq i \leq n

,

1 \leq j \leq q

,

e_{i, j}

are independent and identically normally distributed with mean zero and variance

σ^{2}

, and

U_{i}

are independent latent variables following a normal distribution with mean zero and variance

ν^{2}

.

Consider testing

H_{0} : μ_{i} \equiv μ v s H_{a} : μ_{i} = \{\begin{matrix} μ_{-}, for i \leq k^{*}, \\ μ_{+}, for k^{*} > i, \end{matrix}

(13)

where

μ_{-} \neq μ_{+}

, the parameters

μ

,

μ_{-}

and

μ_{+}

are unknown, and we tentatively assume the time

k^{*}

, called the change point, and the variances

σ^{2}

and

ν^{2}

to be known.

The marginal log-likelihood function of

μ

under

H_{0}

is

ℓ (μ) = ℓ_{0} - \frac{\sum_{i = 1}^{n} {({\bar{Y}}_{i •} - μ)}^{2}}{2 ν^{2} + 2 σ^{2} / q},

where

ℓ_{0}

does not depend on

μ

and

{\bar{Y}}_{i •} = \sum_{j = 1}^{q} Y_{i, j} / q .

Therefore,

max_{μ} ℓ (μ) = ℓ_{0} - \sum_{i = 1}^{n} \frac{{({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})}^{2}}{2 ν^{2} + 2 σ^{2} / q},

where

{\hat{μ}}_{t_{1}, t_{2}} = \sum_{i = t_{1}}^{t_{2}} {\bar{Y}}_{i •} / (t_{2} - t_{1} + 1)

.

In a similar way, the marginal log-likelihood function of

μ_{-}

and

μ_{+}

under

H_{a}

can be obtained. Then, the marginal log-likelihood ratio is

\sum_{i = k^{*} + 1}^{n} \frac{{({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})}^{2}}{ν^{2} + σ^{2} / q} - \sum_{i = 1}^{k^{*}} \frac{{({\bar{Y}}_{i •} - {\hat{μ}}_{1, k^{*}})}^{2}}{ν^{2} + σ^{2} / q} - \sum_{i = k^{*}}^{n} \frac{{({\bar{Y}}_{i •} - {\hat{μ}}_{k^{*} + 1, n})}^{2}}{ν^{2} + σ^{2} / q},

which is equal to

\frac{n {\{\sum_{i = 1}^{k^{*}} ({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})\}}^{2}}{k^{*} (n - k^{*}) (ν^{2} + σ^{2} / q)} .

As the change point

k^{*}

could be unknown in practice, we may sum over

k^{*} = 1, \dots, n - 1

and consider the average value, which leads to

S_{n} ({\bar{Y}}_{•}; n / 2, 2) = \sum_{k = 1}^{n - 1} w_{k}^{- 1} (n / 2) {\{\sum_{i = 1}^{k} ({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})\}}^{2} .

(14)

where

{\bar{Y}}_{•} = {({\bar{Y}}_{1, •}, \dots, {\bar{Y}}_{n, •})}^{⊤}

.

By Theorem 1 and Remark (3) in terms of weighted version for any

τ

, as

n \to \infty

,

\frac{S_{n} ({\bar{Y}}_{•}; τ, 2)}{{(n - 1)}^{- 1} {\{\sum_{i = 1}^{n} ({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})\}}^{2}} \to_{d} \sum_{k = 1}^{\infty} λ_{k} (τ) Z_{k}^{2} .

2.5. Poisson Mixed Model

Assume

Y_{i, j}

follows a Poisson distribution with conditional mean

E (Y_{i, j} | U_{i}) = exp (ρ_{i} + U_{i})

. Consider testing

H_{0} : ρ_{i} \equiv ρ v s H_{a} : ρ_{i} = \{\begin{matrix} ρ_{-}, for 1 \leq i \leq k^{*}, \\ ρ_{+}, for k^{*} < i \leq n, \end{matrix}

(15)

where

ρ_{-} \neq ρ_{+}

, the parameters

ρ

,

ρ_{-}

and

ρ_{+}

are unknown. Under normal distribution for

U_{i}

, the likelihood ratio contains an integral. With the focus on the simple Poisson mixed model without a change point, Hall et al. [27,28] applied the Gaussian variational approximation (GVA) to approximate the integral so as to avoid solving the integral. We provide a saddle point approximation here.

The marginal log-likelihood function of

ρ

under

H_{0}

is

ℓ (ρ) = ℓ_{1} + \sum_{i = 1}^{n} log I_{i} (ρ),

(16)

where

ℓ_{1}

does not depend on r and

I_{i} (ρ) = \int_{- \infty}^{\infty} exp \{- q e^{ρ + u} + q {\bar{Y}}_{i •} (ρ + u) - \frac{u^{2}}{2 ν^{2}}\} d u

.

The calculation of

ℓ (ρ)

is hindered by the lack of a closed form of the integral

I_{i} (ρ)

. Here, we apply the saddle point approximation to the integral as shown in Lemma 2.

Lemma 2.

For the integral

I (ρ; a, b, ν^{2}) = \int_{- \infty}^{\infty} exp \{- b e^{u} + a u - \frac{{(u - ρ)}^{2}}{2 ν^{2}}\} d u

,

I (ρ; a, b, ν^{2}) \approx {(\frac{a}{b e})}^{a} \sqrt{\frac{2 π}{a}} e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}},

where the symbol ≈ means asymptotic equivalence and the saddle point c solves

ϕ^{'} (u) = 0

with

ϕ (u) = a u - b e^{u}

, i.e.,

c = log (a / b)

.

In (16),

I_{i} (ρ) = I (ρ; q {\bar{Y}}_{i •}, q, ν^{2})

, so Lemma 2 gives the leading term as

ℓ (ρ) \approx ℓ_{1} - \sum_{i = 1}^{n} \frac{{(log {\bar{Y}}_{i •} - ρ)}^{2}}{2 ν^{2}},

and the leading term approximation to

{max}_{ρ} ℓ (ρ)

ℓ_{1} - \sum_{i = 1}^{n} \frac{{(log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, n})}^{2}}{2 ν^{2}},

where

{\hat{ρ}}_{t_{1}, t_{2}} = \sum_{i = t_{1}}^{t_{2}} log {\bar{Y}}_{i •} / (t_{2} - t_{1} + 1)

.

In a similar way,

{max}_{ρ_{1}, ρ_{2}} ℓ (ρ_{1}, ρ_{2})

under

H_{a}

can be approximated, giving the approximate log-likelihood ratio

\begin{matrix} \sum_{i = 1}^{n} \frac{{(log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, n})}^{2}}{ν^{2}} - \sum_{i = 1}^{k^{*}} \frac{{(log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, k^{*}})}^{2}}{ν^{2}} - \sum_{i = k^{*} + 1}^{n} \frac{{(log {\bar{Y}}_{i •} - {\hat{ρ}}_{k^{*} + 1, n})}^{2}}{ν^{2}} \\ = \frac{n {\sum_{i = 1}^{k^{*}} (log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, n})}^{2}}{k^{*} (n - k^{*}) ν^{2}} . \end{matrix}

(17)

Considering that the change point

k^{*}

is unknown, we may sum (17) over

k^{*} = 1, \dots,

n - 1

as shown in (1) and consider the average value,

S_{n} (log {\bar{Y}}_{•}; n / 2, 2) = \sum_{k = 1}^{n - 1} w_{k}^{- 1} (n / 2) {\{\sum_{i = 1}^{k} (log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, n})\}}^{2} .

(18)

Note that the term

w_{k} (n / 2)

is derived from the approximate likelihood ratio statistic, different from the classical Poisson change point statistic in Csörgö and Horváth [1] (p. 27).

By Theorem 1 and Remark 3 in terms of weighted version for any

τ

, as

n, q \to \infty

,

\frac{S_{n} (log {\bar{Y}}_{•}; τ, 2)}{{(n - 1)}^{- 1} {\{\sum_{i = 1}^{n} (log {\bar{Y}}_{i •} - {\hat{ρ}}_{1, n})\}}^{2}} \to_{d} \sum_{k = 1}^{\infty} λ_{k} (τ) Z_{k}^{2} .

2.6. Weak Dependence

Now, we consider a space-time model for the distribution of

Y_{i, j}

, where i indexes time and j indexes space. First, we assume some weak dependence conditions on space by supposing the central limit theorem holds:

\frac{1}{\sqrt{q}} \sum_{j = 1}^{q} (Y_{i, j} - {\bar{Y}}_{i •}) \to_{d} N (0, σ^{2}),

(19)

where

σ^{2} = {lim}_{q \to \infty} q var ({\bar{Y}}_{i •})

.

Next, we assume some weak dependence conditions on time by supposing that the following invariance principle or functional central limit theorem holds for any

t \in (0, 1)

[29,30]:

\frac{1}{\sqrt{n q}} \sum_{i = 1}^{[n t]} (Y_{i •} - {\bar{Y}}_{• •}) \Rightarrow \tilde{σ} {B (t) - t B (1)},

(20)

where

{\bar{Y}}_{• •} = \sum_{i = 1}^{n} Y_{i •} / n = {\hat{μ}}_{1, n}

and

{\tilde{σ}}^{2} = {lim}_{n q \to \infty} n q var ({\bar{Y}}_{• •})

.

The weak dependence conditions in (19) and (20) are satisfied if the series is m-dependence, mixing, or linear process. Shao and Zhang [31] proposed a normalized change point statistic

M_{n, q} (Y_{•}) = max_{k} \frac{n}{w_{k}} {\{\sum_{i = 1}^{k} (Y_{i •} - {\bar{Y}}_{• •})\}}^{2},

(21)

where

Y_{•} = {(Y_{1, •}, \dots, Y_{n, •})}^{⊤}

and

w_{k} = \sum_{i = 1}^{k} {\sum_{j = 1}^{i} Y_{j •} - (i / k) \sum_{j = 1}^{i} Y_{j •}}^{2} + \sum_{i = k + 1}^{n} {\sum_{j = i}^{n} Y_{j •} - (n - i + 1) / (n - k) \sum_{j = k + 1}^{n} Y_{j •}}^{2}

is a random weight.

They showed that

\frac{M_{n, q} (Y_{•})}{q} \to_{d} max_{0 < t < 1} \frac{{B (t) - t B (1)}^{2}}{D_{1, 0, t} + D_{2, t, 1}},

where

D_{1, 0, t} = \int_{0}^{t} {B (s) - (s / t) B (t)}^{2} d s

and

D_{2, t, 1} = \int_{t}^{1} [B (1) - B (s) - (1 - s) / (1 - t)

{{B (1) - B (t)}]}^{2} d s

.

Similarly, with the same

w_{k}

as above, we propose a randomized version of WC:

S_{n, q} (Y_{•}) = \sum_{k = 1}^{n - 1} \frac{1}{w_{k}} {\{\sum_{i = 1}^{k} (Y_{i •} - {\bar{Y}}_{• •})\}}^{2} .

(22)

By the functional central limit theorem, when

n \to \infty

,

\frac{S_{n, q} (Y_{•})}{q} \to_{d} \int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{D_{1, 0, t} + D_{2, t, 1}} d t .

3. Power and Change Point Estimation

Considering the WC statistic

S_{n} ({\bar{Y}}_{•}; τ, 2)

in (14), we now consider the power of change point test based on

\frac{S_{n} ({\bar{Y}}_{•}; τ, 2)}{{(n - 1)}^{- 1} {\{\sum_{i = 1}^{n} ({\bar{Y}}_{i •} - {\hat{μ}}_{1, n})\}}^{2}},

(23)

under the alternative hypothesis in Section 2.4. We assume some weak dependence conditions in Section 2.6. We note that (23) has the same asymptotic null distribution as (4) in Theorem 1. The asymptotic distribution is shown in Theorem 2. To establish the consistency of the test, we make a further assumption that the change point index

k^{*}

is bounded away from the endpoints.

Theorem 3.

Assume

E (Y_{i, j}) = μ_{i} = μ_{-}

if

i \leq k^{*}

,

μ_{+}

otherwise. Under the alternative hypothesis, the change magnitude

Δ = μ_{+} - μ_{-} \neq 0

. Under a weak dependence satisfying (19) and (20),

0 < τ_{1} \leq k^{*} / n \leq τ_{2} < 1

,

τ_{1}

and

τ_{2}

are two constants,

n q Δ^{2} \to \infty

, and

n^{3 / 2} q^{- 1 / 2} | Δ | \to \infty

,

q S_{n} (Y_{•}; τ, 2) \to_{p} \infty

.

The proof of Theorem 3 is in Appendix E. As expected, the power of the test based on (23) increases with n, q, and the size of the change in the mean.

The estimated change point is

\begin{matrix} \hat{k} (τ) = arg max_{1 \leq k < n} w_{k}^{- 1 / 2} (τ) |\sum_{i = 1}^{k} (Y_{i •} - {\bar{Y}}_{• •})| . \end{matrix}

(24)

We refer the reader to Bai [32,33] for some early works on the asymptotic distribution of

\hat{k} (n / 2)

and [34] for a treatment on the convergence rate of

\hat{k} (n / 2)

.

4. Simulations

The main purpose of this simulation is to assess the effect of different values for

w_{k} (τ)

, n, q, and change magnitude on the power of our test in (23), and that of the graph-based tests [25,26,35], as both can handle high-dimensional data, and the distance of the graph can be changed to test different changes of parameters for a fair comparison. For example, if we are not sure whether the mean or variance changes, the Euclidean distance can be used to measure the distance between any two nodes in the graph:

d_{i_{1}, i_{2}} = {\{\sum_{j = 1}^{q} {(Y_{i_{1}, j} - Y_{i_{2}, j})}^{2}\}}^{1 / 2};

(25)

see Chen and Zhang [35], and Shi, Wu and Rao [25]. Another pseudo-distance can be used

d_{i_{1}, i_{2}}^{*} = |Y_{i_{1}, •} - Y_{i_{2}, •}|,

(26)

if only the change in the mean needs to be detected; see Shi, Wu and Rao [26]. We denote the maximal test of Chen and Zhang based on Euclidean distance by MST and based on the pseudo-distance by MST*. The associated algorithm is in the R package gSeg [36]. Similarly, we denote Shi, Wu, and Rao’s test (Shi, Wu and Rao [25,26]) based on Euclidean distance by SHP and based on the pseudo-distance by SHP

^{*}

, and the associated R package can be accessed from [37].

First, we simulate

{Y_{i, j}, 1 \leq i \leq k^{*}, 1 \leq j \leq q}

independent standard normal random variables and

{Y_{i, j}, k^{*} + 1 \leq i \leq n, 1 \leq j \leq q}

independent normal random variables with mean

Δ

and variance 1. The critical values for

α = 0.05

are given in Table 1 with

p = 1 - α

. We use these critical values and generate 200 simulations with sample sizes

n = 40, 80

, dimensions

q = 50, 100

, change point locations

k^{*} = n / 4, n / 2, 3 n / 4

, and change magnitude

Δ = 0.1, 0.2

.

In Table 2, we show the percentage of rejections of the null hypothesis at level

0.05

for each of the change point tests. We can see that the power of the graph-based method MST

^{*}

or SHP

^{*}

is higher than that of MST and SHP, which use the pseudo-distance for detecting changes in the mean. Interestingly, the power of the graph-based method for change point detection is still not as high as that of (23). This aspect of the comparison, which we have not seen in other literature so far, is considered a new and meaningful comparison, and at least we can claim that there is room for improvement in the change point detection of the graph-based method.

Now we look at the effect of the weights on the power. This weight

w_{k} (n / 2)

yields the highest power when the change point is in the middle; however, the

w_{k} (n)

weight yields the highest power when the change point is near the beginning of the sequence, and conversely, the

w_{k} (0)

weight yields the highest power when the change point is near the end of the sequence. Moreover, the power increases with increasing n, q, and

Δ

, which agrees with Theorem 3.

Now, we introduce a mixture distribution and slightly change the way the random variables are generated. We simulate

{Y_{i, j}, k^{*} + 1 \leq i \leq n, 1 \leq j \leq q}

from a mixture of two normal distributions with mixture weights (0.5, 0.5) or (0.8, 0.2), means (0, 0.2) or (0, 1), and variance always being (1, 1), which corresponds to

Δ = 0.1

or

Δ = 0.2

. We keep the other settings from the previous comparison. As we expected, the difference between Table 2 and Table 3 is very small.

5. Data Analysis

Here, we analyze the video data provided by Dr. Mathieu Lihorea, which are available from [26]. In Lihoreau, Chittka and Raine [38], the authors used artificial pollen to attract bees and an automatic monitoring camera to capture the bee’s flight path. However, this automatic monitoring feature does not fully start recording when the bee enters and stops recording when the bee leaves, in fact in this video, the recording starts before the bee enters and does not stop when the bee leaves. Since we only care about the part of the video with bees, detecting the arrival and departure of bees helps us to automatically cut the original video. Although the video contains the interference of ants, the bees are much larger compared to the ants, so it can be assumed that the presence and departure of the bees cause a change in the mean value of the pixel values of the image.

This video has a length of 49 seconds, a frame width of 352, a frame height of 288, and a frame rate of 29.97 frames per second. Shi, Wu and Rao [26] extracted the video into

n = 49

images according to the rate of one frame per second. From these 49 images, we can obtain that the image positions corresponding to the bee entering and leaving are 4 and 40, respectively. Moreover, we can extract this video into more images according to the rate of 2 or 5 frames per second. So, the number of images obtained, n, increases to 98 or 245, and at the same time, the positions of the images corresponding to the entry and exit of the bees also change with n. If we call the image locations where these bees appear and leave as change points,

k^{*}

, we assume that

k^{*} / n

is constant with respect to n and close to 0 or 1, respectively. In Figure 3 the first row is four images located at 4 (change point), 5, 40 (change point), and 41 from extracted 49 images; the second row is four images located at 7 (change point), 8, 79 (change point), and 80 from extracted 98 images; and the third row is four images located at 19 (change point), 20, 198 (change point), and 199 from extracted 245 images. Since the images contain R, G, and B components, we use a weighted average of the R, G, and B components and same-scale transformations on the weighted average as suggested by Shi, Wu and Rao [26].

Our quadratic weight test statistics are able to detect these two change points. We compared them to the graph-based change point estimates by applying the method of SHP

^{*}

and MST

^{*}

once to the whole sequence. As shown in Table 4, all tests are significant at a level 0.05 except the quadratic weight

w_{k} (0)

for the size of 49 returns a p-value 0.067;

w_{k} (0)

and

w_{k} (n

) give the estimates of the second and first change points, respectively;

w_{k} (n / 2)

gives the same estimates of change points as

w_{k} (n)

, and both cannot give the estimates of the second change point, such as SHP

^{*}

and MST

^{*}

. Thus, we recommend these two weights

w_{k} (0)

and

w_{k} (n)

for detecting the departure and arrival of the bee.

6. Discussion

This paper mainly focuses on single change point detection. However, it is possible to extend our method and apply the WC statistic to the detection of multiple change points. An approach recommended in the literature is to select data intervals where there is evidence for a single change point. Some researchers suggested penalty procedures based on either the adaptive lasso [39] or smoothly clipped absolute deviation [40,41,42]; others applied CUSUM statistics [4,43,44,45]. As long as the aforementioned intervals have been chosen, one could use tests based on WC. If the tests are rejected for some of the intervals, then the change point can be estimated by (24).

It would also be of interest, although challenging, to consider other quadratic weights, such as

w_{k} (n / 4)

and

w_{k} (3 n / 4)

, as these statistics may be more powerful to detect some change points that are close to the third-quarter and quarter positions of the sequence. The eigenvalues of these quadratic terms may not have recursive formulas.

Author Contributions

X.S., X.-S.W. and N.R. designed research; X.S., X.-S.W. and N.R. performed research; X.S. analyzed data; X.S., X.-S.W. and N.R. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

Shi’s work was supported by NSERC Discovery Grant RGPIN 2022-03264, the Interior Universities Research Coalition and the BC Ministry of Health, and the University of British Columbia Okanagan (UBC-O) Vice Principal Research in collaboration with UBC-O Irving K. Barber Faculty of Science.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank two anonymous reviewers for helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1.

The exact distribution of

S_{n} (Y; τ, 2)

is determined by the eigenvalues of Q. Define the

n \times (n - 1)

matrix

B = (B_{1}, \dots, B_{n - 1})

with

B_{k} = p_{k}^{- 1 / 2} (0, \dots, 0, 1, - 1,

{0, \dots, 0)}^{⊤}

such that all entries of

B_{k}

are zeros except the k entry

p_{k}^{- 1 / 2}

and the

k + 1

entry

- p_{k}^{- 1 / 2}

. It is readily seen that

A^{⊤} B = n I_{n - 1}

and thus

B^{⊤} Q B = I_{n - 1}

. Note that

B^{⊤} B = P^{- 1 / 2} T P^{- 1 / 2} = (\begin{matrix} p_{1}^{- 1 / 2} \\ ⋱ \\ p_{n - 1}^{- 1 / 2} \end{matrix}) (\begin{matrix} 2 & - 1 \\ - 1 & 2 & - 1 \\ ⋱ & ⋱ & ⋱ \\ - 1 & 2 & - 1 \\ - 1 & 2 \end{matrix}) (\begin{matrix} p_{1}^{- 1 / 2} \\ ⋱ \\ p_{n - 1}^{- 1 / 2} \end{matrix}),

where P is a diagonal matrix with

P_{k k} = p_{k}

and T is a tridiagonal matrix with

T_{k k} = 2

and

T_{k, k + 1} = T_{k + 1, k} = - 1

.

We shall find the relationship between Q and

B^{⊤} B

. We diagonalize

B^{⊤} B = R Γ R^{⊤}

with

R R^{⊤} = I_{n}

and

Γ

diagonal matrix with

Γ_{k k} = γ_{k}

. Set

C = B R Γ^{- 1 / 2}

. We have

C^{⊤} C = I_{n - 1}

and

C^{⊤} Q C = Γ^{- 1}

. Finally, we introduce

u = {(n^{- 1 / 2}, \dots, n^{- 1 / 2})}^{⊤}

such that

Q u = 0

,

C^{⊤} u = 0

and

u^{⊤} u = 1

. Define

U = (C, u)

. It then follows that

U^{⊤} U = I_{n}

and

U^{⊤} Q U = (\begin{matrix} C^{⊤} Q C & C^{⊤} Q u \\ u^{⊤} Q C & u^{⊤} Q u \end{matrix}) = (\begin{matrix} Γ^{- 1} & 0 \\ 0 & 0 \end{matrix}) .

This implies that the nonzero eigenvalues of Q are reciprocals of those of

B^{⊤} B

.

Let

{(v_{1}, \dots, v_{n - 1})}^{⊤}

be the eigenvector corresponding to an eigenvalue

λ

of

B^{⊤} B

. We have the recurrence identity

- p_{k - 1}^{- 1 / 2} v_{k - 1} + 2 p_{k}^{- 1 / 2} v_{k} - p_{k + 1}^{- 1 / 2} v_{k + 1} = λ p_{k}^{1 / 2} v_{k},

(A1)

where

v_{0} = v_{n} = 0

,

p_{0} = 1

, and

k = 1, \dots, n - 1

. The above recurrence relation appears in Gardner [6] (1.7) as an eigenvalue equation for a forward difference operator. As mentioned in Gardner [6], it is difficult to find an explicit formula for the eigenvalues unless the prior distribution is uniform; i.e.,

p_{k}

is independent of k. To overcome this difficulty, we make use of the above recurrence relation and then apply the classical theory of orthogonal polynomials and special functions. To be more specific, we shall link the eigenvector in (A1) to the dual Hahn polynomial by making some transformations.

Let

v_{k} = {(p_{k} p_{k - 1})}^{1 / 2} \dots {(p_{1} p_{0})}^{1 / 2} f_{k}

. The above recurrence relation becomes

- \frac{f_{k - 1}}{p_{k} p_{k - 1}} + \frac{2 f_{k}}{p_{k}} - f_{k + 1} = λ f_{k} .

We further denote

g_{k} = {(- 1)}^{k} f_{k}

and

π_{k} = g_{k + 1}

. It is readily seen that

\frac{g_{k - 1}}{p_{k} p_{k - 1}} + \frac{2 g_{k}}{p_{k}} + g_{k + 1} = λ g_{k},

and (by shifting the index k)

\frac{π_{k - 1}}{p_{k} p_{k + 1}} + \frac{2 π_{k}}{p_{k + 1}} + π_{k + 1} = λ π_{k},

where

π_{- 1} = π_{n - 1} = 0

and

π_{0} = 1

. By induction,

π_{k}

is a monic kth order polynomial of

λ

.

Now, we consider three quadratic weights in (2).

Case I. If

p_{k} = 1 / {k (n - k)}

for

k = 1, \dots, n - 1

, then

π_{k}

is related to the dual Hahn polynomial

π_{k} = lim_{N \to n - 2} {(2)}_{k} {(- N)}_{k} R_{k} (λ - 2; 1, 1, N) = lim_{N \to n - 2} {(2)}_{k} {(- N)}_{k} \sum_{j = 0}^{k} \frac{{(- k)}_{j} {(- x)}_{j} {(x + 3)}_{j}}{{(1)}_{j} {(2)}_{j} {(- N)}_{j}},

where

{(a)}_{j} : = \prod_{i = 0}^{j - 1} (a + i)

is the Pochhammer symbol which is commonly used in the field of orthogonal polynomials and special functions,

R_{k}

is the dual Hahn polynomial of degree k and

λ - 2 = x (x + 3)

. In particular,

π_{n - 1} = {(- 1)}^{n - 1} {(- x)}_{n - 1} {(x + 3)}_{n - 1} = \prod_{j = 0}^{n - 2} (x - j) (x + 3 + j) = \prod_{j = 0}^{n - 2} [λ - (j + 1) (j + 2)] .

This implies that the eigenvalues of

B^{⊤} B

are

(j + 1) (j + 2)

for

j = 0, \dots, n - 2

. Consequently, the eigenvalues of Q are 0 and

1 / {k (k + 1)}

for

k = 1, \dots, n - 1

.

Case II. If

p_{k} = 1 / {k (2 n - k)}

for

k = 1, \dots, n - 1

, then

π_{k}

is related to the dual Hahn polynomial as follows

π_{k} = {(2)}_{k} {(2 - 2 n)}_{k} R_{k} (λ - 2; 1, 1, 2 n - 2) = {(2)}_{k} {(2 - 2 n)}_{k} \sum_{j = 0}^{k} \frac{{(- k)}_{j} {(- x)}_{j} {(x + 3)}_{j}}{{(1)}_{j} {(2)}_{j} {(2 - 2 n)}_{j}},

where

R_{k}

is the dual Hahn polynomial of degree k and

λ - 2 = x (x + 3)

. In particular,

π_{n - 1} = {(2)}_{n - 1} {(2 - 2 n)}_{n - 1} \sum_{j = 0}^{n - 1} \frac{{(1 - n)}_{j} {(- x)}_{j} {(x + 3)}_{j}}{{(1)}_{j} {(2)}_{j} {(2 - 2 n)}_{j}} .

By Watson’s sum,

π_{n - 1} = 0

when

x = 2 k - 1

or

x = - 2 - 2 k

with

k = 1, \dots, n - 1

. This implies that the eigenvalues of

B^{⊤} B

are

(2 k) (2 k + 1)

for

k = 1, \dots, n - 1

. Consequently, the eigenvalues of Q are 0 and

1 / {2 k (2 k + 1)}

with

k = 1, \dots, n - 1

.

Case III. If

p_{k} = 1 / {(n + k) (n - k)}

for

k = 1, \dots, n - 1

, then the eigenvalues of Q are also 0 and

1 / [2 k (2 k + 1)]

with

k = 1, \dots, n - 1

because the sequence

{p_{1}, \dots, p_{n - 1}}

is just the reverse of that in Case II.

Since it is a quadratic form in normal random variables, the results follow. □

Appendix B

Proof of Theorem 2.

Case I. For the weight

2 t - t^{2}

, we define

X_{t} = \frac{B (t) - t B (1)}{\sqrt{2 t - t^{2}}}

. By the Karhunen–Loève expansion,

X_{t} = \sum_{k = 1}^{\infty} Z_{k} e_{k} (t),

where random variables

Z_{k}

are stochastically independent normal and

e_{k} (\cdot)

are an orthonormal basis. Then, the integral of the square of

X_{t}

becomes

\sum_{k = 1}^{\infty} Z_{k}^{2},

and we need the variance of

Z_{k}

. We consider the covariance of

X_{t}

, called the Mercer Kernel:

K_{X} (t, s) = E (X_{t} X_{s}) = \frac{min (t, s) - t s}{\sqrt{2 t - t^{2}} \sqrt{2 s - s^{2}}} .

By Mercer’s theorem, there exists a set

{λ_{k}, e_{k} (t)}

such that

K_{X} (t, s) = \sum_{k = 1}^{\infty} λ_{k} e_{k} (t) e_{k} (s),

where

λ_{k}

are eigenvalues and

e_{k} (t)

are eigenfunctions satisfying the Fredholm integral equation

\int_{0}^{1} K_{X} (t, s) e_{k} (s) d s = λ_{k} e_{k} (t)

.

Thus, we have the eigenvalue problem

\int_{0}^{1} \frac{min (t, s) - t s}{\sqrt{2 t - t^{2}} \sqrt{2 s - s^{2}}} e (s) d s = λ e (t) .

Denote

e (t) = \sqrt{2 t - t^{2}} f (t)

. After the multiplication of

\sqrt{2 t - t^{2}}

on both sides, the above eigenvalue problem becomes

\int_{0}^{1} [min (t, s) - t s] f (s) d s = λ (2 t - t^{2}) f (t) .

Let

f (t) = \sum_{k = 0}^{\infty} f_{k} t^{k}

. It follows that

\sum_{k = 0}^{\infty} f_{k} \int_{0}^{1} [min (t, s) - t s] s^{k} d s = λ \sum_{k = 0}^{\infty} f_{k} (2 t - t^{2}) t^{k} .

Note that

\int_{0}^{1} [min (t, s) - t s] s^{k} d s = \frac{t - t^{k + 2}}{(k + 1) (k + 2)} .

We then have

\sum_{k = 0}^{\infty} \frac{f_{k} t}{(k + 1) (k + 2)} - \sum_{k = 0}^{\infty} \frac{f_{k} t^{k + 2}}{(k + 1) (k + 2)} = λ \sum_{k = 0}^{\infty} 2 f_{k} t^{k + 1} - λ \sum_{k = 0}^{\infty} f_{k} t^{k + 2} .

Obviously,

λ \neq 0

; otherwise

f_{k} = 0

for all

k \geq 0

. Now, we compare the coefficients of

t^{k}

with

k \geq 0

on both sides of the above identity. It follows that

2 λ f_{0} = \sum_{k = 0}^{\infty} \frac{f_{k}}{(k + 1) (k + 2)},

and

- \frac{f_{k}}{(k + 1) (k + 2)} = λ (2 f_{k + 1} - f_{k}) .

We shall prove that

λ_{n} = {1 / [(2 n) (2 n + 1)], n = 1, \dots}

are eigenvalues of

K_{X} (t, s)

. To see this, we obtain from the above recurrence relation

\frac{f_{k + 1}}{f_{k}} = \frac{(k + 1) (k + 2) - (2 n) (2 n + 1)}{2 (k + 1) (k + 2)} = \frac{(k + 1 - 2 n) (k + 2 n + 2)}{2 (k + 1) (k + 2)} .

Making use of the Pochhammer symbol

{(a)}_{j} : = \prod_{i = 0}^{j - 1} (a + i)

, we obtain

\frac{f_{k}}{f_{0}} = \frac{{(1 - 2 n)}_{k} {(2 n + 2)}_{k}}{2^{k} {(1)}_{k} {(2)}_{k}} .

Consequently,

2 λ f_{0} = \sum_{k = 0}^{\infty} \frac{f_{k}}{(k + 1) (k + 2)} = f_{0} \sum_{k = 0}^{\infty} \frac{{(1 - 2 n)}_{k} {(2 n + 2)}_{k}}{2^{k} {(1)}_{k + 1} {(2)}_{k + 1}} = \frac{2 f_{0}}{(- 2 n) (2 n + 1)} \sum_{j = 1}^{\infty} \frac{{(- 2 n)}_{j} {(2 n + 1)}_{j}}{2^{j} {(1)}_{j} {(2)}_{j}}

Since

\begin{matrix} \sum_{j = 0}^{\infty} \frac{{(- 2 n)}_{j} {(2 n + 1)}_{j}}{2^{j} {(1)}_{j} {(2)}_{j}} = 0, \end{matrix}

we then have

2 λ f_{0} = \frac{- 2 f_{0}}{(- 2 n) (2 n + 1)} = \frac{2 f_{0}}{(2 n) (2 n + 1)},

which agrees with

λ = 1 / [(2 n) (2 n + 1)]

. By normalizing

f_{0} = 1

, we can express the eigenfunction as

f (t) = \sum_{k = 0}^{\infty} \frac{{(1 - 2 n)}_{k} {(2 n + 2)}_{k}}{2^{k} {(1)}_{k} {(2)}_{k}} t^{k} = \frac{1}{2 n} P_{2 n - 1}^{(1, 1)} (1 - t),

where

P_{n}^{(α, β)} (z) = \frac{{(α + 1)}_{n}}{n!} \sum_{j = 0}^{n} \frac{{(- n)}_{k} {(n + α + β + 1)}_{k}}{{(1)}_{k} {(α + 1)}_{k}} {(\frac{1 - z}{2})}^{j}

is the Jacobi polynomial.

So, we have

\int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{2 t - t^{2}} d t = \sum_{k = 1}^{\infty} \frac{1}{2 k (2 k + 1)} Z_{k}^{2},

where

Z_{k}

are independent normal random variables, each having mean zero and variance 1. That proves (5).

Case II. For the weight

1 - t^{2}

, we intend to solve the eigenvalue problem

\int_{0}^{1} \frac{min (t, s) - t s}{\sqrt{1 - t^{2}} \sqrt{1 - s^{2}}} e (s) d s = λ e (t) .

Denote

e (t) = \sqrt{1 - t^{2}} f (t)

. After the multiplication of

\sqrt{1 - t^{2}}

on both sides, the above eigenvalue problem becomes

\int_{0}^{1} [min (t, s) - t s] f (s) d s = λ (1 - t^{2}) f (t) .

Let

f (t) = \sum_{k = 0}^{\infty} f_{k} t^{k}

. It follows that

\sum_{k = 0}^{\infty} f_{k} \int_{0}^{1} [min (t, s) - t s] s^{k} d s = λ \sum_{k = 0}^{\infty} f_{k} (1 - t^{2}) t^{k} .

Note that

\int_{0}^{1} [min (t, s) - t s] s^{k} d s = \frac{t - t^{k + 2}}{(k + 1) (k + 2)} .

We then have

\sum_{k = 0}^{\infty} \frac{f_{k} t}{(k + 1) (k + 2)} - \sum_{k = 0}^{\infty} \frac{f_{k} t^{k + 2}}{(k + 1) (k + 2)} = λ \sum_{k = 0}^{\infty} f_{k} t^{k} - λ \sum_{k = 0}^{\infty} f_{k} t^{k + 2} .

Obviously,

λ \neq 0

; otherwise

f_{k} = 0

for all

k \geq 0

. Now, we compare the coefficients of

t^{k}

with

k \geq 0

on both sides of the above identity. It follows that

f_{0} = 0

, and

λ f_{1} = \sum_{k = 0}^{\infty} \frac{f_{k}}{(k + 1) (k + 2)},

and

- \frac{f_{k}}{(k + 1) (k + 2)} = λ (f_{k + 2} - f_{k}) .

On account of

f_{0} = 0

, the above recurrence relation implies that

f_{2 j} = 0

for all

j \geq 0

. Now, we set

g_{j} = f_{2 j + 1}

with

j \geq 0

. The above recurrence relation (with

k = 2 j + 1

) becomes

- \frac{g_{j}}{(2 j + 2) (2 j + 3)} = λ (g_{j + 1} - g_{j}) .

For convenience, we let

1 / λ = 2 μ (2 μ + 1)

. It is readily seen that

\frac{g_{j + 1}}{g_{j}} = \frac{(2 j + 2) (2 j + 3) - (2 μ) (2 μ + 1)}{(2 j + 2) (2 j + 3)} = \frac{(j + 1 - μ) (j + 3 / 2 + μ)}{(j + 1) (j + 3 / 2)} .

Making use of the Pochhammer symbol

{(a)}_{j} : = \prod_{i = 0}^{j - 1} (a + i)

, we obtain

\frac{f_{2 j + 1}}{f_{1}} = \frac{g_{j}}{g_{0}} = \frac{{(1 - μ)}_{j} {(3 / 2 + μ)}_{j}}{{(1)}_{j} {(3 / 2)}_{j}} .

Consequently,

\sum_{k = 0}^{\infty} \frac{f_{k}}{(k + 1) (k + 2)} = \sum_{j = 0}^{\infty} \frac{f_{2 j + 1}}{(2 j + 2) (2 j + 3)} = \frac{f_{1}}{6} \sum_{j = 0}^{\infty} \frac{{(1 - μ)}_{j} {(3 / 2 + μ)}_{j}}{{(2)}_{j} {(5 / 2)}_{j}} .

The left-hand side is

λ f_{1}

. When

μ = n + 1

with

n = 0, 1, \dots

, by Pfaff-Saalschütz identity, we calculate the right-hand side as

\frac{f_{1}}{6} \sum_{j = 0}^{\infty} \frac{{(1 - μ)}_{j} {(3 / 2 + μ)}_{j} {(1)}_{j}}{{(2)}_{j} {(5 / 2)}_{j} {(1)}_{j}} = \frac{f_{1}}{6} \cdot \frac{{(1)}_{n} {(- 1 / 2 - n)}_{n}}{{(2)}_{n} {(- 3 / 2 - n)}_{n}} = \frac{f_{1}}{(2 n + 2) (2 n + 3)} .

Hence,

λ_{n} = \frac{1}{(2 n + 2) (2 n + 3)}

,

n = 0, 1, \dots

are the eigenvalues. By normalizing

f_{1} = 1

, we can express the eigenfunction as

f (t) = \sum_{j = 0}^{\infty} f_{2 j + 1} t^{2 j + 1} = \sum_{j = 0}^{n} \frac{{(- n)}_{j} {(5 / 2 + n)}_{j}}{{(1)}_{j} {(3 / 2)}_{j}} t^{2 j + 1} = \frac{t n!}{{(3 / 2)}_{n}} P_{n}^{(1 / 2, 1)} (1 - 2 t^{2}) .

So, we have

\int_{0}^{1} \frac{{B (t) - t B (1)}^{2}}{1 - t^{2}} d t = \sum_{k = 1}^{\infty} \frac{1}{2 k (2 k + 1)} Z_{k}^{2},

where

Z_{k}

are independent normal random variables, each having mean zero and variance 1. This gives (6). □

Appendix C

Proof of Lemma 1.

It is well known that

(1 - t) B (\frac{t}{1 - t}) =_{d} B (t) - t B (1)

We first show the covariance:

Cov {{(1 - s)}^{2} B^{2} (\frac{s}{1 - s}), {(1 - t)}^{2} B^{2} (\frac{t}{1 - t})}

for

0 < s < t < 1

. Note that

B (\frac{s}{1 - s}) \sim N (0, \frac{s}{1 - s})

.

Since

B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s}) = {B (\frac{t}{1 - t}) - B (\frac{s}{1 - s})}^{2} + 2 {B (\frac{t}{1 - t}) - B (\frac{s}{1 - s})} B (\frac{s}{1 - s})

, we have

E {B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})} = E {B (\frac{t}{1 - t}) - B (\frac{s}{1 - s})}^{2}

and

E [B^{2} (\frac{s}{1 - s}) {B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})}]

= E {B^{2} (\frac{s}{1 - s})} E {B (\frac{t}{1 - t}) - B (\frac{s}{1 - s})}^{2}

by the independence of increments. These lead to

E [B^{2} (\frac{s}{1 - s}) {B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})}] = E {B^{2} (\frac{s}{1 - s})} E {B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})} .

Therefore, we have

\begin{matrix} Cov {{(1 - s)}^{2} B^{2} (\frac{s}{1 - s}), {(1 - t)}^{2} B^{2} (\frac{t}{1 - t})} \\ = {(1 - s)}^{2} {(1 - t)}^{2} E {B^{2} (\frac{s}{1 - s}) B^{2} (\frac{t}{1 - t})} - s (1 - s) t (1 - t), \end{matrix}

where

\begin{matrix} E {B^{2} (\frac{s}{1 - s}) B^{2} (\frac{t}{1 - t})} & = E [B^{2} (\frac{s}{1 - s}) {B^{2} (\frac{s}{1 - s}) + B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})}] \\ = E {B^{4} (\frac{s}{1 - s})} + E {B^{2} (\frac{s}{1 - s})} E {B^{2} (\frac{t}{1 - t}) - B^{2} (\frac{s}{1 - s})} \\ = 3 {(\frac{s}{1 - s})}^{2} + \frac{s}{1 - s} (\frac{t}{1 - t} - \frac{s}{1 - s}) . \end{matrix}

So,

Cov {{(1 - s)}^{2} B^{2} (\frac{s}{1 - s}), {(1 - t)}^{2} B^{2} (\frac{t}{1 - t})} = 2 s^{2} {(1 - t)}^{2}

.

In the next step, we will show that

n^{- 1} {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}), C_{P} (N_{m}, {\bar{N}}_{m})} \to 4 s^{2} {(1 - t)}^{2}

for

ℓ = ⌊s n⌋

,

m = ⌊t n⌋

, and

ℓ < m

.

We first decompose

{Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}), C_{P} (N_{m}, {\bar{N}}_{m})} .

Note that

C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}) = C_{P} (N_{ℓ},

{\bar{N}}_{ℓ} \cap N_{m}) + C_{P} (N_{ℓ}, {\bar{N}}_{m})

and

C_{P} (N_{m}, {\bar{N}}_{m}) = C_{P} (N_{ℓ}, {\bar{N}}_{m}) + C_{P} ({\bar{N}}_{ℓ} \cap N_{m}, {\bar{N}}_{m})

. Then,

\begin{matrix} {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}), C_{P} (N_{m}, {\bar{N}}_{m})} \\ = {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ} \cap N_{m}), C_{P} (N_{ℓ}, {\bar{N}}_{m})} + {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ} \cap N_{m}), C_{P} ({\bar{N}}_{ℓ} \cap N_{m}, {\bar{N}}_{m})} \\ + {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{m}), C_{P} (N_{ℓ}, {\bar{N}}_{m})} + {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{m}), C_{P} ({\bar{N}}_{ℓ} \cap N_{m}, {\bar{N}}_{m})} . \end{matrix}

To calculate the covariance, we need the following moments for any disjoint subsets

A_{1}

,

A_{2}

, and

A_{3}

of

N_{n}

. Their sizes are denoted as

n_{1}

,

n_{2}

and

n_{3}

with

n_{1} + n_{2} + n_{3} \leq n

.

\begin{matrix} E_{perm} C_{P} (A_{1}, A_{2}) & = \sum_{i = 1}^{n - 1} P [\{(v_{i} \in A_{1}) \cap (v_{i + 1} \in A_{2})\} \cup \{(v_{i + 1} \in A_{1}) \cap (v_{i} \in A_{2})\}] \\ = \frac{2 n_{1} n_{2}}{n} . \end{matrix}

(A2)

The following calculations of second moments need to consider three cases

i = j

,

| i - j | = 1

and

| i - j | > 1

for

1 \leq i, j < n

, with

# {i = j | 1 \leq i, j < n} = n - 1,

# {| i - j | = 1 | 1 \leq i, j < n} = 2 (n - 2),

and

# {| i - j | > 1 | 1 \leq i, j < n} = (n - 2) (n - 3) .

Therefore, we have

\begin{matrix} E_{perm} {C_{P} (A_{1}, A_{2}) C_{P} (A_{2}, A_{3})} \\ = \sum_{i = 1}^{n - 1} \sum_{j = 1}^{n - 1} P [\{(v_{i} \in A_{1}) \cap (v_{i + 1}, v_{j + 1} \in A_{2}) \cap (v_{j + 1} \in A_{3})\} \\ \cup \{(v_{i} \in A_{1}) \cap (v_{i + 1}, v_{j + 1} \in A_{2}) \cap (v_{j} \in A_{3})\} \\ \cup \{(v_{i + 1} \in A_{1}) \cap (v_{i}, v_{j} \in A_{2}) \cap (v_{j + 1} \in A_{3})\} \\ \cup \{(v_{i + 1} \in A_{1}) \cap (v_{i}, v_{j + 1} \in A_{2}) \cap (v_{j} \in A_{3})\}] \\ = \sum_{i = j, 1 \leq i, j < n} 0 + \sum_{| i - j | = 1, 1 \leq i, j < n} \frac{n_{1} n_{2} n_{3}}{n (n - 1) (n - 2)} + \sum_{| i - j | > 1, 1 \leq i, j < n} \frac{4 n_{1} n_{2} (n_{2} - 1) n_{3}}{n (n - 1) (n - 2) (n - 3)} \\ = \frac{2 n_{1} n_{3} n_{2} (2 n_{2} - 1)}{n (n - 1)} . \end{matrix}

(A3)

In a similar way, we have

\begin{matrix} E_{perm} {C_{P}^{2} (A_{1}, A_{2})} & = \sum_{i = j, 1 \leq i, j < n} \frac{2 n_{1} n_{2}}{n (n - 1)} + \sum_{| i - j | = 1, 1 \leq i, j < n} \frac{n_{1} n_{2} (n_{1} + n_{2} - 2)}{n (n - 1) (n - 2)} \\ + \sum_{| i - j | > 1, 1 \leq i, j < n} \frac{4 n_{1} (n_{1} - 1) n_{2} (n_{2} - 1)}{n (n - 1) (n - 2) (n - 3)} \\ = \frac{2 n_{1} n_{2}}{n} + \frac{2 n_{1} n_{2} (n_{1} + n_{2} - 2)}{n (n - 1)} + \frac{4 n_{1} (n_{1} - 1) n_{2} (n_{2} - 1)}{n (n - 1)} . \end{matrix}

(A4)

After tedious calculations using (A3) and (A4), we have

{Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}), C_{P} (N_{m}, {\bar{N}}_{m})} = 2 ℓ (n - m) {2 ℓ (n - m) - n} / (n^{3} - n^{2}),

which leads to

n^{- 1} {Cov}_{perm} {C_{P} (N_{ℓ}, {\bar{N}}_{ℓ}), C_{P} (N_{m}, {\bar{N}}_{m})} \to 4 s^{2} {(1 - t)}^{2}

. The proof of Lemma 1 is finished. □

Appendix D

Proof of Lemma 2.

Consider the integral

I (ρ; a, b, ν^{2}) = \int_{- \infty}^{\infty} exp {a u - b e^{u} - {(u - ρ)}^{2} / (2 ν^{2})} d u,

where

a > 0

is large and

b > 0

. The saddle point for the phase function

a u - b e^{u}

is

c = log (a / b)

. We set

s = u - c

and define

a z^{2} / 2 = b e^{u} - a u - (b e^{c} - a c) = a (e^{s} - s - 1)

such that z is analytic near

s = 0

and

z \sim s

as

s \to 0

. It is easily seen that

z = s + \frac{s^{2}}{6} + \frac{s^{3}}{36} + \frac{s^{4}}{270} + \dots, s = z - \frac{z^{2}}{6} + \frac{z^{3}}{36} - \frac{z^{4}}{270} + \dots .

Moreover,

\frac{d u}{d z} = \frac{a z}{b e^{u} - a} = \frac{z}{e^{s} - 1} = \frac{z}{s + z^{2} / 2} .

Now, we can rewrite the integral as

I (ρ; a, b, ν^{2}) = {(\frac{a}{b e})}^{a} \int_{- \infty}^{\infty} e^{- a z^{2} / 2} \frac{z e^{- {(u - ρ)}^{2} / (2 ν^{2})}}{s + z^{2} / 2} d z .

A simple calculation gives

\begin{matrix} e^{- \frac{{(u - ρ)}^{2}}{2 ν^{2}}} & = e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}} - \frac{(c - ρ) s}{ν^{2}} - \frac{s^{2}}{2 ν^{2}}} = e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [1 - \frac{(c - ρ) s}{ν^{2}} - \frac{s^{2}}{2 ν^{2}} + \frac{{(c - ρ)}^{2} s^{2}}{2 ν^{4}} + O (s^{3})] \\ = e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [1 - \frac{(c - ρ) (z - z^{2} / 6)}{ν^{2}} - \frac{z^{2}}{2 ν^{2}} + \frac{{(c - ρ)}^{2} z^{2}}{2 ν^{4}} + O (z^{3})] \\ = e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [1 - \frac{(c - ρ)}{ν^{2}} z + \frac{(c - ρ) ν^{2} - 3 ν^{2} + 3 {(c - ρ)}^{2}}{6 ν^{4}} z^{2} + O (z^{3})], \end{matrix}

and

\begin{matrix} \frac{z}{s + z^{2} / 2} = \frac{1}{1 + z / 3 + z^{2} / 36 + O (z^{3})} = 1 - \frac{z}{3} + \frac{z^{2}}{12} + O (z^{3}) . \end{matrix}

Consequently,

\begin{matrix} \frac{z e^{- {(u - ρ)}^{2} / (2 ν^{2})}}{s + z^{2} / 2} = e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [1 - \frac{3 (c - ρ) + ν^{2}}{3 ν^{2}} z + (\frac{(c - ρ) ν^{2} - ν^{2} + {(c - ρ)}^{2}}{2 ν^{4}} + \frac{1}{12}) z^{2} + O (z^{3})] . \end{matrix}

By Watson’s lemma, we obtain

\begin{matrix} I (ρ; a, b, ν^{2}) \\ = {(\frac{a}{b e})}^{a} e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} \{\int_{- \infty}^{\infty} e^{- a z^{2} / 2} [1 - \frac{3 (c - ρ) + ν^{2}}{3 ν^{2}} z + (\frac{c ν^{2} - ν^{2} + {(c - ρ)}^{2}}{2 ν^{4}} + \frac{1}{12}) z^{2}] d z + O (a^{- 5 / 2})\} \\ = {(\frac{a}{b e})}^{a} e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [\frac{\sqrt{2 π}}{a^{1 / 2}} + (\frac{(c - ρ) ν^{2} - ν^{2} + {(c - ρ)}^{2}}{2 ν^{4}} + \frac{1}{12}) \frac{\sqrt{2 π}}{a^{3 / 2}} + O (a^{- 5 / 2})] \\ = {(\frac{a}{b e})}^{a} \sqrt{\frac{2 π}{a}} e^{- \frac{{(c - ρ)}^{2}}{2 ν^{2}}} [1 + (\frac{(c - ρ) ν^{2} - ν^{2} + {(c - ρ)}^{2}}{2 ν^{4}} + \frac{1}{12}) a^{- 1} + O (a^{- 2})] . \end{matrix}

□

Appendix E

Proof of Theorem 3.

We denote

{\bar{Y}}_{i •}^{*} = {\bar{Y}}_{i •} - E ({\bar{Y}}_{i •})

,

{\hat{μ}}_{1, n}^{*} = {\hat{μ}}_{1, n} - E {\hat{μ}}_{1, n}

. Under the alternative hypothesis,

E {\hat{μ}}_{1, n} = {k^{*} μ_{-} + (n - k^{*}) μ^{+}} / n .

We first find a lower bound

S_{n} ({\bar{Y}}_{•}; τ, 2)

, i.e.,

S_{n} ({\bar{Y}}_{•}; τ, 2) \geq S_{k^{*}} ({\bar{Y}}_{•}; τ, 2) .

Then, we decompose the lower bound into three terms:

\begin{matrix} S_{k^{*}} ({\bar{Y}}_{•}; τ, 2) & = S_{k^{*}} ({\bar{Y}}_{•}^{*}; τ, 2) + \frac{2 (n - k^{*}) (μ_{-} - μ^{+})}{n} S_{k^{*}} ({\bar{Y}}_{•}^{*}; τ, 1) \\ + \frac{{(n - k^{*})}^{2} {(μ_{-} - μ^{+})}^{2}}{n^{2}} \sum_{k = 1}^{k^{*}} k^{2} w_{k}^{- 1} (τ), \end{matrix}

where

S_{n} ({\bar{Y}}_{•}^{*}; τ, γ) = \sum_{k = 1}^{n} w_{k}^{- 1} (τ) {\{\sum_{i = 1}^{k} ({\bar{Y}}_{i •}^{*} - {\hat{μ}}_{1, n}^{*})\}}^{γ}

.

By the weak dependence,

q S_{k^{*}} ({\bar{Y}}_{•}^{*}; τ, 2) = O_{p} (1)

and

q S_{k^{*}} ({\bar{Y}}_{•}^{*}; τ, 1) = O_{p} (| Δ | \sqrt{\frac{q}{n}}) .

Furthermore,

n^{- 2} {(n - k^{*})}^{2} {(μ_{-} - μ^{+})}^{2} \sum_{k = 1}^{k^{*}} k^{2} w_{k}^{- 1} (τ) = O (n Δ^{2})

.

q S_{n} ({\bar{Y}}_{•}; τ, 2) \to_{p} \infty

holds because

n q Δ^{2} \to \infty

and

n^{3 / 2} q^{- 1 / 2} | Δ | \to \infty

. □

References

Csörgö, M.; Horváth, L. Limit Theorems in Change-Point Analysis; Wiley: Chichester, UK, 1997. [Google Scholar]
Jiang, F.; Zhao, Z.; Shao, X. Modeling the COVID-19 infection trajectory: A piecewise linear quantile trend model. J. R. Statist. Soc. B 2021, accepted. [Google Scholar]
Liu, B.; Zhou, C.; Zhang, X.; Liu, Y. A unified data-adaptive framework for high dimensional change point detection. J. R. Statist. Soc. B 2020, 82, 933–963. [Google Scholar] [CrossRef]
Yu, M.; Chen, X. Finite sample change point inference and identification for high-dimensional mean vectors. J. R. Statist. Soc. B 2021, 83, 247–270. [Google Scholar] [CrossRef]
Jandhyala, V.; Fotopoulos, S.; MacNeill, I.; Liu, P. Inference for single and multiple change-points in time series. J. Time Ser. Anal. 2013, 34, 423–446. [Google Scholar] [CrossRef]
Gardner, J.A. On detecting changes in the mean of normal variates. Ann. Math. Statist. 1969, 40, 116–126. [Google Scholar] [CrossRef]
Perron, P. Dealing with structural breaks. In Palgrave Handbook of Econometrics: Volume 1, Econometric Theory; Mills, T.C., Patterson, K., Eds.; Publisher: Palgrave Macmillan, London, UK, 2006; pp. 278–352. [Google Scholar]
MacNeill, I. Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist. 1978, 6, 422–433. [Google Scholar] [CrossRef]
Daniels, H.E. Saddlepoint approximations in statistics. Ann. Math. Statist. 1954, 25, 631–650. [Google Scholar] [CrossRef]
Reid, N. Saddlepoint methods and statistical inference (with discussion). Statist. Sci. 1988, 3, 213–238. [Google Scholar]
Reid, N. Approximations and asymptotics, In Statistics Theory Model; Essays in Honor of D.R. Cox; Chapman and Hall: London, UK, 1991; pp. 287–334. [Google Scholar]
Shi, X.; Wang, X.-S.; Reid, N. Saddlepoint approximation of nonlinear moments. Statist. Sinica 2014, 24, 1597–1611. [Google Scholar] [CrossRef] [Green Version]
Shi, X.; Reid, N.; Wu, Y. Approximation to the moments of ratios of cumulative sums. Can. J. Statist. 2014, 42, 325–336. [Google Scholar] [CrossRef]
Akman, V.E.; Raftery, A.E. Asymptotic inference for a change-point Poisson process. Ann. Statist. 1986, 14, 1583–1590. [Google Scholar] [CrossRef]
Loader, C.R. A log-linear model for a Poisson process change point. Ann. Statist. 1992, 20, 1391–1411. [Google Scholar] [CrossRef]
Imhof, J.P. Computing the distribution of quadratic forms in normal variables. Biometrika 1961, 48, 419–426. [Google Scholar] [CrossRef] [Green Version]
Kuonen, D. Saddlepoint approximations for distributions of quadratic forms in normal variables. Biometrika 1999, 86, 929–935. [Google Scholar] [CrossRef] [Green Version]
Daniels, H.E. Tail probability approximations. Int. Statist. Rev. 1987, 55, 37–48. [Google Scholar] [CrossRef]
Lugannani, R.; Rice, S.O. Saddlepoint approximations for the distribution of the sum of independent random variables. Adv. Appl. Probab. 1980, 12, 475–490. [Google Scholar] [CrossRef]
Anderson, T.; Darling, D. Asymptotic theory of certain “goodness of fit”criteria based on stochastic processes. Ann. Math. Statist. 1952, 23, 193–212. [Google Scholar] [CrossRef]
de Micheaux, P.L. R Package CompQuadForm. 2017. Available online: https://cran.r-project.org/web/packages/CompQuadForm/index.html (accessed on 25 December 2020).
Anderson, T.; Darling, D. A test of ‘‘goodness of fit”. J. Amer. Statist. Assoc. 1954, 49, 765–769. [Google Scholar] [CrossRef]
Wald, A.; Wolfowitz, J. On a test whether two samples are from the same distribution. Ann. Math. Statist. 1940, 11, 147–162. [Google Scholar] [CrossRef]
Biswas, M.; Mukhopadhyay, M.; Ghosh, A.K. A distribution-free two-sample run test applicable to high-dimensional data. Biometrika 2014, 101, 913–926. [Google Scholar] [CrossRef]
Shi, X.; Wu, Y.; Rao, C.R. Consistent and powerful graph-based change-point test for high-dimensional data. Proc. Natl. Acad. Sci. USA 2017, 114, 3969–3974. [Google Scholar] [CrossRef] [Green Version]
Shi, X.; Wu, Y.; Rao, C.R. Consistent and powerful non-Euclidean graph-based change-point test with applications to segmenting random interfered video data. Proc. Natl. Acad. Sci. USA 2018, 115, 5914–5919. [Google Scholar] [CrossRef]
Hall, P.; Ormerod, J.T.; Wand, M.P. Theory of Gaussian variational approximation for a Poisson mixed model. Statist. Sinica 2011, 21, 369–389. [Google Scholar]
Hall, P.; Pham, T.; Wand, M.P.; Wang, S.S.J. Asymptotic normality and valid inference for Gaussian variational approximation. Ann. Statist. 2011, 39, 2502–2532. [Google Scholar] [CrossRef] [Green Version]
Peligrad, M. An invariance principle for ϕ-mixing sequences. Ann. Probab. 1985, 13, 1304–1313. [Google Scholar] [CrossRef]
Phillips, P.C.B.; Solo, V. Asymptotics for linear processes. Ann. Statist. 1992, 20, 971–1001. [Google Scholar] [CrossRef]
Shao, X.; Zhang, X. Testing for change points in time series. J. Am. Statist. Assoc. 2010, 105, 1228–1240. [Google Scholar] [CrossRef]
Bai, J. Least square estimation of a shift in linear processes. J. Time Ser. Anal. 1994, 15, 453–472. [Google Scholar] [CrossRef] [Green Version]
Bai, J. Estimation of a change point in multiple regressions. Rev. Econ. Stat. 1997, 79, 551–563. [Google Scholar] [CrossRef]
Kokoszka, P.; Leipus, R.D. Change-point in the mean of dependent observations. Statist. Probab. Lett. 1998, 40, 385–393. [Google Scholar] [CrossRef]
Chen, H.; Zhang, N. Graph-based change-point detection. Ann. Statist. 2015, 43, 139–176. [Google Scholar] [CrossRef]
Chen, H.; Zhang, N. gSeg: Graph-Based Change-Point Detection (G-Segmentation). R Package Version 0.1. 2014. Available online: https://cran.r-project.org/web/packages/gSeg/index.html (accessed on 27 December 2020).
Chen, M.; Shi, X.; Li, H. GraphCpClust: Graph-Based Change-Point Detection and Clustering. R Package Version 0.1. 2021. Available online: https://github.com/Meiqian-Chen/GraphCpClust (accessed on 27 April 2021).
Lihoreau, M.; Chittka, L.; Raine, N.E. Monitoring flower visitation networks and interactions between pairs of bumble bees in a large outdoor flight cage. PLoS ONE 2016, 11, e0150844. [Google Scholar] [CrossRef]
Zou, H. The adaptive Lasso and its oracle properties. J. Am. Statist. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
Jin, B.; Shi, X.; Wu, Y. A novel and fast methodology for simultaneous multiple structural break estimation and variable selection for non-stationary time series models. Statist. Comput 2013, 23, 221–231. [Google Scholar] [CrossRef]
Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 2010, 38, 894–942. [Google Scholar] [CrossRef] [Green Version]
Cho, H.; Fryzlewicz, P. Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. J. R. Statist. Soc. B 2015, 77, 475–507. [Google Scholar] [CrossRef]
Fryzlewicz, P. Wild binary segmentation for multiple change-point detection. Ann. Statist. 2014, 42, 2243–2281. [Google Scholar] [CrossRef]
Wang, T.; Samworth, R.J. High dimensional change point estimation via sparse projection. J. R. Statist. Soc. B 2017, 80, 57–83. [Google Scholar] [CrossRef]

Figure 1. Plot of weights:

n^{2}

(uniform),

k (n - k)

(centered),

(n + k) (n - k)

(left shifted), and

k (2 n - k)

(right shifted).

Figure 1. Plot of weights:

n^{2}

(uniform),

k (n - k)

(centered),

(n + k) (n - k)

(left shifted), and

k (2 n - k)

(right shifted).

Figure 2. The pattern of eigenvalues (cross products of rows and columns) illustrated by dots for three weights

w_{k} (n / 2)

for

τ = n / 2

(blue),

w_{k} (0)

for

τ = 0

(green) and

w_{k} (n)

for

τ = n

(purple) with the increase of n.

Figure 2. The pattern of eigenvalues (cross products of rows and columns) illustrated by dots for three weights

w_{k} (n / 2)

for

τ = n / 2

(blue),

w_{k} (0)

for

τ = 0

(green) and

w_{k} (n)

for

τ = n

(purple) with the increase of n.

Figure 3. Typical images in three different image sets extracted from the same video data with different frame rates. The first row contains four images located at 4 (change point), 5, 40 (change point), and 41 from the first set of extracted 49 images (1 frame per second); the second row contains four images located at 7 (change point), 8, 79 (change point), and 80 from the second set of extracted 98 images (2 frames per second); and the third row contains four images located at 19 (change point), 20, 198 (change point), and 199 from the third set of extracted 245 images (5 frames per second).

Table 1. Critical values of

\sum_{k = 1}^{n} λ_{k} (τ) Z_{k}^{2}

for different weights in (4), sizes (n), and probabilities (p).

Table 1. Critical values of

\sum_{k = 1}^{n} λ_{k} (τ) Z_{k}^{2}

for different weights in (4), sizes (n), and probabilities (p).

		n
Weight	p	20	40	60	80	100	200	400	1000	10,000	∞
$w_{k} (n / 2)$	0.90	1.883	1.908	1.916	1.920	1.923	1.928	1.930	1.932	1.933	1.933
	0.925	2.111	2.136	2.145	2.149	2.151	2.156	2.159	2.160	2.161
	0.95	2.442	2.467	2.476	2.480	2.482	2.487	2.490	2.491	2.492	2.492
	0.975	3.027	3.052	3.061	3.065	3.067	3.072	3.075	3.076	3.077	3.070
	0.99	3.828	3.853	3.861	3.866	3.868	3.873	3.876	3.877	3.878	3.850
$w_{k} (0)$	0.90	0.599	0.605	0.607	0.608	0.609	0.610	0.611	0.611	0.611
	0.925	0.675	0.682	0.684	0.685	0.685	0.687	0.687	0.688	0.688
	0.95	0.786	0.792	0.794	0.795	0.796	0.797	0.798	0.798	0.798
	0.975	0.981	0.988	0.990	0.991	0.991	0.993	0.993	0.994	0.994
	0.99	1.249	1.255	1.257	1.258	1.259	1.260	1.261	1.261	1.261

Table 2. Estimated power (%) for the

w_{k} (0)

,

w_{k} (n / 2)

, and

w_{k} (n)

in (23), MST, MST

^{*}

, SHP, and SHP

^{*}

, based on 200 simulations; n are the sample sizes, q are the dimensions,

k^{*}

are the change point locations, and

Δ

is the size of the change in the mean of the normal random variables.

Table 2. Estimated power (%) for the

w_{k} (0)

,

w_{k} (n / 2)

, and

w_{k} (n)

in (23), MST, MST

^{*}

, SHP, and SHP

^{*}

, based on 200 simulations; n are the sample sizes, q are the dimensions,

k^{*}

are the change point locations, and

Δ

is the size of the change in the mean of the normal random variables.

n	40												80
q	50						100						50						100
$Δ$	0.1			0.2			0.1			0.2			0.1			0.2			0.1			0.2
$\frac{k^{*}}{n}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$
$w_{k} (0)$	23	41	36	67	91	84	39	72	72	96	100	100	43	74	67	97	100	100	75	96	91	100	100	100
$w_{k} (n / 2)$	31	43	31	82	92	82	51	73	62	99	100	99	57	77	61	99	100	100	87	97	89	100	100	100
$w_{k} (n)$	32	43	23	89	92	64	59	73	46	99	100	91	61	74	49	100	100	97	91	97	76	100	100	100
MST	3	5	4	7	6	10	3	5	5	6	19	9	3	4	5	7	21	10	5	8	4	13	35	14
MST $^{*}$	18	24	16	44	38	44	33	29	38	75	77	75	21	21	21	65	80	69	36	43	35	95	99	98
SHP	4	6	5	7	9	12	4	6	9	10	16	6	3	8	6	9	22	13	5	6	8	13	24	16
SHP $^{*}$	10	13	8	33	37	33	17	23	18	67	77	65	9	12	13	49	71	54	22	32	21	90	97	92

Table 3. Estimated power (%) for the

w_{k} (0)

,

w_{k} (n / 2)

,

w_{k} (n)

, MST, MST

^{*}

in (23), SHP, and SHP

^{*}

, based on 200 simulations; n are the sample sizes, q are the dimensions,

k^{*}

are the change point locations, and

Δ

is the size of the change in the mean of mixed normal distributions.

Table 3. Estimated power (%) for the

w_{k} (0)

,

w_{k} (n / 2)

,

w_{k} (n)

, MST, MST

^{*}

in (23), SHP, and SHP

^{*}

, based on 200 simulations; n are the sample sizes, q are the dimensions,

k^{*}

are the change point locations, and

Δ

is the size of the change in the mean of mixed normal distributions.

n	40												80
q	50						100						50						100
$Δ$	0.1			0.2			0.1			0.2			0.1			0.2			0.1			0.2
$\frac{k^{*}}{n}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{3}{4}$
$w_{k} (0)$	18	35	40	64	92	81	37	74	62	89	100	98	43	72	59	94	100	99	73	98	90	100	100	100
$w_{k} (n / 2)$	28	36	37	80	94	74	50	76	57	98	100	97	53	74	52	98	100	99	84	98	87	100	100	100
$w_{k} (n)$	32	38	27	83	95	56	58	75	42	99	100	92	60	72	44	100	100	96	89	99	73	100	100	100
MST	4	5	6	4	5	4	6	6	6	5	10	6	4	4	6	4	24	20	5	6	2	8	25	20
MST $^{*}$	20	11	24	41	44	39	27	28	27	70	72	64	22	21	24	60	75	65	38	39	33	95	100	95
SHP	4	4	8	10	15	13	6	5	10	17	28	21	8	6	5	10	29	18	9	11	6	28	54	41
SHP $^{*}$	9	7	11	26	41	26	16	20	18	55	70	57	12	12	15	44	59	49	22	28	23	92	96	88

Table 4. Estimated change points for the

w_{k} (0)

,

w_{k} (n / 2)

,

w_{k} (n)

, MST

^{*}

, and SHP

^{*}

, based on extracted 49, 98, and 245 images; n are the sample sizes and

k^{*}

are the change point locations.

Table 4. Estimated change points for the

w_{k} (0)

,

w_{k} (n / 2)

,

w_{k} (n)

, MST

^{*}

, and SHP

^{*}

, based on extracted 49, 98, and 245 images; n are the sample sizes and

k^{*}

are the change point locations.

n	49		98		245
$k^{*}$	4	40	7	79	19	198
$w_{k} (0)$		41		82		206
$w_{k} (n / 2)$	4		8		19
$w_{k} (n)$	4		8		19
MST $^{*}$	4		7		19
SHP $^{*}$	4		7		19

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shi, X.; Wang, X.-S.; Reid, N. A New Class of Weighted CUSUM Statistics. Entropy 2022, 24, 1652. https://doi.org/10.3390/e24111652

AMA Style

Shi X, Wang X-S, Reid N. A New Class of Weighted CUSUM Statistics. Entropy. 2022; 24(11):1652. https://doi.org/10.3390/e24111652

Chicago/Turabian Style

Shi, Xiaoping, Xiang-Sheng Wang, and Nancy Reid. 2022. "A New Class of Weighted CUSUM Statistics" Entropy 24, no. 11: 1652. https://doi.org/10.3390/e24111652

APA Style

Shi, X., Wang, X. -S., & Reid, N. (2022). A New Class of Weighted CUSUM Statistics. Entropy, 24(11), 1652. https://doi.org/10.3390/e24111652

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A New Class of Weighted CUSUM Statistics

Abstract

1. Introduction

2. Exact and Asymptotic Distributions of the WC Statistics

2.1. Explicit Distribution for a Normal Model

2.2. Karhunen–Loève Expansion

2.3. Graphical Model

2.4. Normal Mixed Model

2.5. Poisson Mixed Model

2.6. Weak Dependence

3. Power and Change Point Estimation

4. Simulations

5. Data Analysis

6. Discussion

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A

Appendix B

Appendix C

Appendix D

Appendix E

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI