Article

Multiscale Change Point Detection for Univariate Time Series Data with Missing Value

1 School of Mathematics, Harbin Institute of Technology, Harbin 150001, China
2 Department of Statistics, College of Natural and Computational Sciences, Arba Minch University, Arba Minch P.O. Box 21, Ethiopia
3 Department of Mathematical Sciences, College of Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(20), 3189; https://doi.org/10.3390/math12203189
Submission received: 21 August 2024 / Revised: 3 October 2024 / Accepted: 10 October 2024 / Published: 11 October 2024

Abstract
This paper studies the autoregressive integrated moving average (ARIMA) state space model combined with Kalman smoothing to impute missing values in a univariate time series before detecting change points. We estimate a scale-dependent time-average variance constant that depends on the length of the data section and is robust to mean shifts under serial dependence. Consistency of the proposed estimator is shown under assumptions that allow heavy-tailedness. We discuss how to integrate the proposed estimator with the moving sum and wild binary segmentation procedures to determine the number and locations of change points. Furthermore, the performance of the proposed methods is evaluated through extensive simulation studies and applied to the Beijing multi-site air quality dataset to impute missing values and detect mean changes in the data.

1. Introduction

Change point detection (CPD) has been studied for several decades and remains an active area of research. It has been employed in various fields such as financial analysis, medical condition monitoring, climatology, and air quality. Recent developments in machine learning have also advanced CPD methods, motivating applications in bioinformatics, speech recognition, image analysis, and the monitoring of complex systems. For an extensive review of change point detection, refer to Aminikhanghahi and Cook [1] and Truong et al. [2]. Methods that detect and localize multiple mean shifts in univariate data by searching for changes over data sections at various scales are commonly referred to as canonical change point detection [3]. In such methods, it is typically necessary to estimate the level of noise to distinguish actual changes from random noise-induced fluctuations, especially in the presence of serial correlation.
A brief review of multiple CPD in scenarios with serial correlation follows. One research direction adapts techniques originally developed for independent observations to the analysis of time series data. This includes the use of test statistics such as the cumulative sum (CUSUM) [4] and moving sum (MOSUM) [5] tests. The information criterion of Yao [6], initially designed for independent Gaussian random variables, has also been adapted to dependent data [3,7]. Another line of study uses specific time series models, such as the autoregressive (AR) model, and estimates dependence and change points concurrently. In this vein, refs. [8,9,10,11] employed the AR model to handle serial correlation while detecting multiple change points in the mean of AR processes. Furthermore, another research line suggests using long-run variance (LRV) estimators to gauge noise levels in CPD that are robust to multiple mean shifts [12,13,14,15]. Estimation of the LRV has long been a difficult problem, even in the absence of mean changes [16]. The widely used LRV kernel estimator tends to have a downward bias [17,18], and it does not always guarantee a positive estimate unless certain assumptions are imposed [13]. Some LRV estimators can even take negative values when the LRV approaches zero [19], rendering them unsuitable for statistical procedures. The task becomes even more difficult when there are multiple change points and the data exhibit serial dependence.
Missing values are ubiquitous in real-world datasets. They introduce uncertainty during data analysis, which can affect the properties of statistical estimators and reduce efficiency, leading to incorrect conclusions [20,21]. Deleting missing observations usually results in the loss of valuable information, which biases statistical estimation and inference. Change point analysis with missing values has also been addressed in the literature [22,23,24]. More application-oriented work includes Yang et al. [25], who considered threshold-based regression models that examine changes through covariate relationships, and Zhao et al. [26], who applied an online missing value imputation algorithm for mixed-type (continuous and categorical) data before detecting change points. The aforementioned literature considers high-dimensional or non-time-series data and relies on inter-attribute correlations between variables. Furthermore, most existing univariate change point detection methods, designed for complete datasets, simply delete missing values [27,28]. Such an approach may result in the incorrect identification of genuine change point locations, especially when these points coincide with the time points where missing data occur. Given the necessity of replacing missing values with reasonable estimates before applying CPD methods, it is crucial to develop an effective imputation technique to ensure the accurate identification of change points. However, there remains a significant gap in addressing change point detection for univariate time series in the presence of missing values.
Therefore, in this paper, we propose the ARIMA state space model combined with Kalman smoothing to impute missing values in univariate time series. This approach leverages the ability of the ARIMA model and Kalman smoothing to handle time series data with seasonality and trends, making the imputation more accurate and reliable. Moreover, to tackle the challenges associated with using the LRV in change point detection, we apply a scale-dependent time-average variance constant (TAVC, McGonigle and Cho [27]) estimator, which is resilient to varying means and serial correlation. For a stationary process {ε_t}_{t=1}^T, the scale-dependent TAVC is defined as
\sigma_D^2 = \mathrm{Var}\left( D^{-1/2} \sum_{t=1}^{D} \varepsilon_t \right), \qquad (1)
which depends on the scale D and differs from the earlier LRV estimators. Combining a scale-dependent TAVC estimator with multiscale CPD methods, such as the MOSUM procedure and the wild binary segmentation (WBS) algorithm, effectively addresses the problem of detecting multiple mean shifts in the series. The performance of our imputation method and robust TAVC estimator is rigorously evaluated through extensive simulations and subsequently applied to the Beijing air quality dataset. Thus, this integrated approach is designed to overcome the challenges posed by missing values and to improve the precision of change point detection in real-world time series data. The contributions of this paper are summarized as follows. Firstly, we apply the ARIMA state space model combined with Kalman smoothing to impute missing values in univariate time series data exhibiting mean shifts. Secondly, we derive the TAVC estimator and integrate it into multiscale change point detection methods. This approach estimates and locates change points for the imputed time series data, demonstrating robust performance in handling missing values.
The remainder of this paper is structured as follows: Section 2 discusses the imputation of missing values using the ARIMA state space model and Kalman smoothing. Section 3 introduces the concept of the TAVC and its robust estimator and demonstrates its consistency. Section 4 details the application of the TAVC estimator in conjunction with multiscale CPD methods such as MOSUM and WBS. Section 5 evaluates the performance of the proposed approaches and applies them to the Beijing air quality data. Finally, Section 6 summarizes the main findings of the paper.

2. Missing Value Imputation

In this section, we briefly discuss the general framework of the ARIMA state space model and the implementation of Kalman smoothing to estimate missing values. The state space model offers a versatile method for analyzing time series data due to the recursive behavior of the model, which is particularly useful for the computational simplicity of maximum likelihood estimation and the handling of missing values [29,30,31]. The Kalman filtering method, as part of a state space model, dynamically adjusts to new data and handles non-stationary series by recursively estimating missing values as new observations become available [32,33,34]. Moreover, the ARIMA state space and Kalman smoothing method performs exceptionally well for time series data that exhibit strong seasonality with high percentages of missingness and large gap sizes under the missing at random (MAR) and missing completely at random (MCAR) mechanisms [35,36,37]. The state space model is adaptable to various types of time series data; however, in this paper, we consider linear time series that follow a Gaussian distribution (see [30,33]).

2.1. State Space Model and Kalman Filter

The state space model comprises a measurement equation that relates the observed data y_t to an m-dimensional state vector α_t, and a transition equation that describes how the state vector evolves over time. For a linear Gaussian state space model, the measurement (observation) equation takes the form
y_t = Z_t \alpha_t + \varepsilon_t, \qquad t = 1, \ldots, T, \qquad (2)
where y_t is an N × 1 vector of observed values, Z_t is an N × m matrix, and ε_t is an N × 1 vector of disturbances, which is normally and independently distributed (NID) with mean zero and N × N covariance matrix H_t. The elements of the state vector α_t are unobservable, and their evolution is governed by the first-order Markov process
\alpha_t = T_t \alpha_{t-1} + R_t \eta_t, \qquad t = 1, \ldots, T, \qquad (3)
where T_t is an m × m transition matrix, R_t is an m × r matrix, and η_t is an r × 1 vector of disturbances distributed as NID(0, Q_t).
Assumption 1.
i. 
The initial state vector satisfies α_0 ∼ N(a_0, P_0).
ii. 
E(ε_t η_s') = 0, E(ε_t α_0') = 0 and E(η_t α_0') = 0 for all s, t = 1, …, T.
The matrices Z t , H t , T t , R t and Q t are called the system matrices and contain non-random elements.
Once the model is cast in state space form, we apply the Kalman filter to estimate the state vector. The Kalman filter is a recursive algorithm for computing the optimal estimator of the state vector at time t, taking into account all the available information, including the observations y_t [30,32,33]. For the state space model (2) and (3), assume that the initial state α_0 is N(a_0, P_0), where a_0 and P_0 are known. Let Y_{t-1} denote the set of past observations y_1, …, y_{t-1} and a_{t-1} be the optimal estimator of α_{t-1} based on Y_{t-1}, with the corresponding mean square error (MSE) matrix P_{t-1}. The Kalman filter then consists of two sets of equations, namely the prediction equations and the updating equations. Given a_{t-1} and P_{t-1} at time t − 1, the optimal estimator of α_t and its corresponding MSE matrix are
a_{t|t-1} = E(\alpha_t \mid Y_{t-1}) = T_t a_{t-1}, \qquad (4)

P_{t|t-1} = E\left[ (\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})' \mid Y_{t-1} \right] = T_t P_{t-1} T_t' + R_t Q_t R_t'. \qquad (5)
Equations (4) and (5) are known as the prediction equations. Additionally, the optimal estimator of y_t given the information at t − 1 is ŷ_{t|t-1} = E(y_t | Y_{t-1}) = Z_t a_{t|t-1}. The prediction error is computed as v_t = y_t − ŷ_{t|t-1} = Z_t(α_t − a_{t|t-1}) + ε_t. The quantity v_t is known as the one-step-ahead prediction error, and its MSE matrix is Var(v_t) = E(v_t v_t') = F_t = Z_t P_{t|t-1} Z_t' + H_t. These components are required to derive the prediction error decomposition of the log-likelihood function.

When additional new observations y_t become available, the optimal estimator of α_t, a_{t|t-1}, can be updated. The updating equations for α_t given all the available information Y_t are derived following Durbin [33]. The optimal predictor a_{t|t-1} and its MSE matrix are updated using
a_t = E(\alpha_t \mid Y_t) = a_{t|t-1} + P_{t|t-1} Z_t' F_t^{-1} v_t, \qquad (6)

P_t = P_{t|t-1} - P_{t|t-1} Z_t' F_t^{-1} Z_t P_{t|t-1}, \qquad (7)
where F_t = Z_t P_{t|t-1} Z_t' + H_t is assumed to be non-singular. Thus, (4)–(5) and (6)–(7) together make up the Kalman filter. The initial values for the Kalman filter may be specified as a_0 and P_0, or as a_{1|0} and P_{1|0}. If the process is stationary and the initial state α_0 ∼ N(a_0, P_0), the Kalman filter can be initialized with a_0 and P_0, or equivalently with a_{1|0} and P_{1|0} = P_0. For non-stationary transition equations, the initial distribution of α_0 is specified in terms of a diffuse or non-informative prior [30].
Furthermore, the Kalman filter enables the likelihood to be computed via the prediction error decomposition, which opens the way for the estimation of the unknown parameters of the model. The distinguishing feature of a time series model is that its observations are dependent. Thus, the likelihood function is defined using conditional probability density functions as
L(y \mid \psi) = \prod_{t=1}^{T} p(y_t \mid Y_{t-1}), \qquad (8)
where p(y_t | Y_{t-1}) denotes the distribution of y_t conditional on the past observations Y_{t-1} = {y_{t-1}, y_{t-2}, …, y_1} [30]. For the measurement equation y_t = Z_t a_{t|t-1} + Z_t(α_t − a_{t|t-1}) + ε_t, the conditional distribution of y_t is normal with mean

E(y_t \mid Y_{t-1}) = \hat{y}_{t|t-1} = Z_t a_{t|t-1}

and covariance matrix F_t = Z_t P_{t|t-1} Z_t' + H_t, t = 1, 2, …, T. Therefore, for the Gaussian model, the log-likelihood function of (8) can be written as
\log L = -\frac{NT}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{T} \log |F_t| - \frac{1}{2} \sum_{t=1}^{T} v_t' F_t^{-1} v_t, \qquad (9)
where v_t = y_t − ŷ_{t|t-1}, t = 1, …, T. Since the conditional mean ŷ_{t|t-1} of y_t given Y_{t-1} is also the minimum mean square estimator (MMSE) of y_t, the N × 1 vector v_t can be interpreted as a vector of prediction errors, and (9) is known as the prediction error decomposition. To maximize the likelihood function with respect to the unknown parameters ψ, a Newton–Raphson method is commonly employed, which iterates until the parameter values stabilize (see [33,38]).
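For concreteness, the prediction step (4)–(5), the updating step (6)–(7) and the log-likelihood (9) can be sketched for the scalar (N = 1) case; this is a minimal illustration with a local-level example of our own choosing, not code from the paper:

```python
import numpy as np

def kalman_filter_loglik(y, Z, T_mat, R, H, Q, a0, P0):
    """Scalar Kalman filter: prediction (4)-(5), updating (6)-(7), and the
    prediction-error-decomposition log-likelihood (9) for N = 1."""
    a, P = a0, P0
    loglik = 0.0
    filtered = []
    for y_t in y:
        # Prediction step, Eqs. (4)-(5)
        a_pred = T_mat * a
        P_pred = T_mat * P * T_mat + R * Q * R
        # One-step-ahead prediction error v_t and its variance F_t
        v = y_t - Z * a_pred
        F = Z * P_pred * Z + H
        # Updating step, Eqs. (6)-(7)
        K = P_pred * Z / F
        a = a_pred + K * v
        P = P_pred - K * Z * P_pred
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(F) + v ** 2 / F)
        filtered.append(a)
    return np.array(filtered), loglik

# Illustrative local-level model: y_t = alpha_t + eps_t, alpha_t = alpha_{t-1} + eta_t
rng = np.random.default_rng(0)
alpha = np.cumsum(rng.normal(0.0, 0.5, 200))
y = alpha + rng.normal(0.0, 1.0, 200)
states, ll = kalman_filter_loglik(y, Z=1.0, T_mat=1.0, R=1.0, H=1.0, Q=0.25,
                                  a0=0.0, P0=10.0)
```

In practice, the returned log-likelihood would be passed to a numerical optimizer (e.g., Newton–Raphson as above) to estimate the unknown parameters ψ.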

2.2. State Space Formulation for ARIMA Model

This section presents the translation of the ARIMA model to a state space representation and the estimation of missing values using fixed-point smoothing. Let Δy_t = y_t − y_{t-1} and Δ_s y_t = y_t − y_{t-s}, where Δ and Δ_s are the non-seasonal and seasonal difference operators, respectively. As suggested by Box et al. [39], differencing is continued until trend and seasonal effects have been eliminated, giving a new variable y*_t = Δ^d Δ_s^D y_t for d, D = 0, 1, …, which can be modeled as a stationary ARMA(p, q) process.
A non-stationary ARIMA(p, d, q) model can be defined as
\phi_p(B) \Delta^d y_t = \theta_q(B) \eta_t, \qquad (10)
where B is the back-shift operator, φ_p(B) = 1 − ∑_{i=1}^{p} φ_i B^i is the autoregressive operator, θ_q(B) = 1 − ∑_{i=1}^{q} θ_i B^i is the moving average operator, Δ = 1 − B is the backward difference operator, and η_t ∼ NID(0, σ²). For simplicity, write Δ^d = (1 − B)^d = 1 − δ_1 B − δ_2 B² − ⋯ − δ_d B^d. Moreover, the multiplicative seasonal ARIMA(p, d, q)×(P, D, Q)_s model is given by
\phi_p(B) \Phi_P(B^s) \delta_d(B) \Delta_D(B^s) y_t = \theta_q(B) \Theta_Q(B^s) \eta_t,

where Φ_P(B^s) = 1 − Φ_1 B^s − ⋯ − Φ_P B^{Ps} and Θ_Q(B^s) = 1 − Θ_1 B^s − ⋯ − Θ_Q B^{Qs} are the seasonal (stationary) AR and MA operators, respectively; δ_d(B) = (1 − B)^d is the non-seasonal differencing operator and Δ_D(B^s) = (1 − B^s)^D is the seasonal differencing operator.
For the ARIMA model defined by (10), its state space formulation is presented in accordance with Ansley and Kohn [29]. Let
w_t = \Delta^d y_t. \qquad (11)
Following (10)
\phi_p(B) w_t = \theta_q(B) \eta_t. \qquad (12)
Hence, w_t is a stationary ARMA process and can now be written in state space form as w_t = z_t' α_t, where z_t = (1, 0, …, 0)' and α_t is a (d + m) × 1 state vector whose ith element α_{i,t} is defined as
\alpha_{i,t} = y_{t-i+1}, \qquad i = 1, \ldots, d,

\alpha_{i+d,t} = \sum_{k=m-i+1}^{m} \left( \phi_k w_{t+i-1-k} - \theta_{k-1} \eta_{t+i-k} \right), \qquad i = 1, \ldots, m, \qquad (13)
and m = max(p, q + 1). In (13), θ_0 = 1, φ_i = 0 for i > p and θ_i = 0 for i > q. The state transition equation for the state vector α_t is given as
\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad (14)
where
T = \begin{pmatrix} T_d & T_\nabla \\ 0 & P \end{pmatrix}, \qquad (15)

with

T_d = \begin{pmatrix} \delta_1 & \delta_2 & \cdots & \delta_{d-1} & \delta_d \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad T_\nabla = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},

where the δ_j are the coefficients of Δ^d = (1 − B)^d = 1 − δ_1 B − δ_2 B² − ⋯ − δ_d B^d, and

P = \begin{pmatrix} \phi_1 & 1 & 0 & \cdots & 0 \\ \phi_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \phi_{m-1} & 0 & 0 & \cdots & 1 \\ \phi_m & 0 & 0 & \cdots & 0 \end{pmatrix},

and R is the (m + d) × 1 vector R = (1, 0, …, 0, 1, θ_1, …, θ_{m-1})' with η_t ∼ NID(0, σ²). The matrix P in (15) is the usual transition matrix for stationary processes. The first d rows of (14) form the state transition equation for the differencing relation (11), while the matrix T_∇ links the stationary and non-stationary parts of the model in (11) and (12).
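The stationary block of the system matrices can be assembled mechanically from the ARMA coefficients; the sketch below (function name ours) builds the companion matrix P and the corresponding (1, θ_1, …, θ_{m-1})' vector with m = max(p, q + 1), padding φ_i = 0 for i > p and θ_i = 0 for i > q, as one illustrative construction:

```python
import numpy as np

def arma_state_space(phi, theta):
    """System matrices for a stationary ARMA(p, q) in companion form:
    alpha_t = T alpha_{t-1} + R eta_t, w_t = z' alpha_t, with m = max(p, q + 1),
    phi_i = 0 for i > p and theta_i = 0 for i > q."""
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    phi_pad = np.r_[phi, np.zeros(m - p)]
    theta_pad = np.r_[theta, np.zeros(m - 1 - q)]
    T = np.zeros((m, m))
    T[:, 0] = phi_pad              # first column holds the AR coefficients
    T[:-1, 1:] = np.eye(m - 1)     # shifted identity above the diagonal
    R = np.r_[1.0, theta_pad]      # R = (1, theta_1, ..., theta_{m-1})'
    z = np.eye(m)[0]               # w_t is the first element of the state
    return T, R, z

T_mat, R_vec, z_vec = arma_state_space(phi=[0.5, -0.2], theta=[0.3])
```

For the full ARIMA model, this block would be embedded in the bottom-right corner of (15) together with the differencing block T_d.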
The likelihood function for the ARIMA model with missing observations can be formulated in two ways: the level formulation and the difference formulation. The level formulation provides greater flexibility and is recommended when missing observations do not occur in a regular pattern [29,40]; it is the formulation applied in this paper. Let B be the lag operator and δ_j be the coefficient of B^j in the expansion of Δ^d Δ_D(B^s) = (1 − B)^d (1 − B^s)^D. For the stationary ARMA(p, q) process w_t, let ᾱ_t be the state vector of the process in state space form and define the augmented state vector α_t = (ᾱ_t', y*_{t-1}')', where y*_{t-1} = (y_{t-1}, …, y_{t-d-sD})'. The transition equation for the augmented state vector is
\begin{pmatrix} \bar{\alpha}_t \\ y^*_{t-1} \end{pmatrix} = \begin{pmatrix} T & 0 \\ z' & (\delta_1, \ldots, \delta_k) \\ 0 & (I_{k-1} \;\; 0) \end{pmatrix} \begin{pmatrix} \bar{\alpha}_{t-1} \\ y^*_{t-2} \end{pmatrix} + \begin{pmatrix} R \\ 0 \end{pmatrix} \eta_t, \qquad t = 1, \ldots, T,
where T and R are the matrices associated with the state vector ᾱ_t, I_{k-1} is the (k − 1)-dimensional identity matrix and k = d + sD. If y_t is observed for all t = 1, …, T, the measurement equation is y_t = (1, 0'_{m-1}, δ_1, …, δ_k) α_t, t = 1, …, T, where 0_{m-1} is an (m − 1) × 1 vector of zeros. The initial value for the Kalman filter can be taken at t = k with a_{k+1|k} = (0', y*_{k+1}')' and the covariance matrix
P_{k+1|k} = \begin{pmatrix} \bar{P}_{1|0} & 0 \\ 0 & 0 \end{pmatrix},
where P̄_{1|0} = σ² E(ᾱ_t ᾱ_t'). The likelihood function generated by a Kalman filter applied to the differenced observations Δ^d Δ_s^D y_t is the same as the likelihood function obtained by applying the Kalman filter to the original time series [40].

Estimating Missing Observations

Missing observations can be estimated by applying fixed-point smoothing to the level form of the ARIMA model. The procedure applies the Kalman filter and augments the state vector each time a missing observation is encountered [30]. After processing all observations, the augmented state vector provides the MMSE estimates of the missing data, with the corresponding MSEs derived from the augmented covariance matrix. Let y_t be missing at time t = τ. The state vector is augmented by y_τ, given as y_τ = z'α_τ. Then the fixed-point smoothing recursions, modified from Anderson and Moore [32], lead to
\bar{y}_{\tau|t} = \bar{y}_{\tau|t-1} + p_{\tau|t-1}' z_t f_t^{-1} v_t, \qquad t = \tau, \ldots, T, \qquad (16)

p_{\tau|t} = T (I - x_t z_t') p_{\tau|t-1}, \qquad t = \tau, \ldots, T, \qquad (17)
where x_t = P_{t|t-1} z_t f_t^{-1}, and the initial values are ȳ_{τ|τ-1} = z'a_{τ|τ-1} and p_{τ|τ-1} = P_{τ|τ-1} z. The values of v_t, f_t and x_t are all generated by the Kalman filter for the original state space model, and the vector z_t defines the measurement equation. Whenever an observation is missing at time t, (16) and (17) reduce to ȳ_{τ|t} = ȳ_{τ|t-1} and p_{τ|t} = T p_{τ|t-1}, respectively. Moreover, the MSE of ȳ_{τ|T} is given by σ² f_{τ|T}, where f_{τ|T} is obtained from the recursion
f_{\tau|t} = f_{\tau|t-1} - p_{\tau|t-1}' z_t f_t^{-1} z_t' p_{\tau|t-1}, \qquad t = \tau, \ldots, T,
with f_{τ|τ-1} = z'P_{τ|τ-1} z. The fixed-point smoothing algorithm, set up in this way, yields a highly efficient estimator of the missing value.
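As an end-to-end illustration of the imputation step, the sketch below uses a simple AR(1)-plus-noise model: the Kalman filter skips the updating step at missing times, and a generic fixed-interval (RTS) smoother, standing in here for the paper's fixed-point recursions, supplies the MMSE estimate of each missing observation. All names and parameter values are illustrative assumptions:

```python
import numpy as np

def impute_ar1_kalman(y, phi, var_eta, var_eps):
    """Impute NaNs in y under an AR(1)-plus-noise state space model: run a
    Kalman filter that skips the updating step at missing times, then a
    fixed-interval (RTS) smoothing pass; the smoothed state at a missing
    time is the MMSE estimate of the missing observation."""
    n = len(y)
    a_pred = np.zeros(n); P_pred = np.zeros(n)
    a_filt = np.zeros(n); P_filt = np.zeros(n)
    a, P = 0.0, var_eta / (1.0 - phi ** 2)   # stationary initialization
    for t in range(n):
        a_p, P_p = phi * a, phi ** 2 * P + var_eta
        a_pred[t], P_pred[t] = a_p, P_p
        if np.isnan(y[t]):                    # missing: prediction only
            a, P = a_p, P_p
        else:
            F = P_p + var_eps
            K = P_p / F
            a, P = a_p + K * (y[t] - a_p), P_p * (1.0 - K)
        a_filt[t], P_filt[t] = a, P
    a_sm = a_filt.copy()
    for t in range(n - 2, -1, -1):            # backward smoothing recursion
        J = P_filt[t] * phi / P_pred[t + 1]
        a_sm[t] = a_filt[t] + J * (a_sm[t + 1] - a_pred[t + 1])
    out = y.copy()
    out[np.isnan(y)] = a_sm[np.isnan(y)]
    return out

rng = np.random.default_rng(1)
x = np.zeros(400)
for t in range(1, 400):
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + rng.normal(0.0, 0.5, 400)
y_miss = y.copy()
y_miss[100:110] = np.nan                       # a contiguous gap of 10 values
y_imp = impute_ar1_kalman(y_miss, phi=0.9, var_eta=1.0, var_eps=0.25)
```

Because the smoother uses observations on both sides of the gap, the imputed values bridge it far more accurately than deletion or mean filling would.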

3. TAVC Estimation

3.1. Multiscale Change Point Detection in the Mean

Change point detectability depends on the interplay between the magnitude of changes and their distance to neighboring change points. Large changes are usually detectable even when close to others, but smaller changes need more separation to be identified. When the data exhibit both large, short-lived fluctuations and minor changes over prolonged stable periods, a multiscale method is required to detect all change points accurately [3].
The change point model, which allows for multiple changes in the mean, is defined by
Y_t = f_t + \varepsilon_t = f_0 + \sum_{i=1}^{p} \mu_i \cdot \mathbb{I}(t \geq \tau_i + 1) + \varepsilon_t, \qquad t = 1, \ldots, T, \qquad (18)
where f_t is a piecewise constant signal with p change points at locations τ_i, i = 1, …, p, with τ_0 = 0 and τ_{p+1} = T, and μ_i denotes the size of the mean shift at τ_i. The error term {ε_t}_{t=1}^T is assumed to be a (weakly) stationary time series satisfying E(ε_t) = 0 with finite LRV σ² = lim_{N→∞} Var(N^{-1/2} ∑_{t=1}^{N} ε_t) ∈ (0, ∞). These errors may be serially correlated and heavy-tailed. The objective is to detect changes in the mean and accurately estimate both the number and locations of the change points.
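Model (18) is easy to simulate, which is useful for checking the detection procedures later on; a minimal sketch (all names ours) with AR(1) errors:

```python
import numpy as np

def piecewise_mean_series(T, taus, mus, f0=0.0, ar=0.5, seed=0):
    """Simulate Y_t = f_t + eps_t as in model (18): a piecewise constant
    signal with mean shift mu_i from tau_i + 1 onward, plus AR(1) errors."""
    rng = np.random.default_rng(seed)
    eps = np.zeros(T)
    for t in range(1, T):
        eps[t] = ar * eps[t - 1] + rng.normal()
    f = np.full(T, float(f0))
    for tau, mu in zip(taus, mus):
        f[tau:] += mu            # 0-based index tau corresponds to time tau + 1
    return f + eps, f

Y, f = piecewise_mean_series(600, taus=[200, 400], mus=[3.0, -3.0])
```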
To address this problem, we use a local testing method for data segmentation, which adapts a local test for a single change point to the detection and estimation of multiple change points [3]. The procedure compares a test statistic of the form σ̂_{s,e}^{-1} |T_{s,k,e}| with a threshold, say Γ. The change point detector T_{s,k,e}(Y) is computed on the segment {Y_t, s < t ≤ e} as
T_{s,k,e}(Y) = \sqrt{\frac{(k-s)(e-k)}{e-s}} \left( \bar{Y}_{s:k} - \bar{Y}_{k:e} \right) \qquad (19)
at some location 0 ≤ s < k < e ≤ T, which is determined in a method-specific way (it can be fixed, random or data-dependent), and Ȳ_{a:b} = (b − a)^{-1} ∑_{t=a+1}^{b} Y_t. Here, σ_{s,e}² denotes the variability of {Y_t}_{t=s+1}^{e}, and σ̂_{s,e}² is its estimator. If the error term {ε_t}_{t=1}^T is assumed to be stationary, the appropriate choice of variance is σ_{s,e}² = σ_{e-s}², the TAVC at scale D = e − s given in (1). As a result, if σ̂_{s,e}^{-1} |T_{s,k,e}| > Γ, the data segment {Y_t}_{t=s+1}^{e} contains a change point within the interval; otherwise, no change point has been detected in the given data section [3].
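The detector (19) is simple to evaluate; the following minimal sketch (names ours) computes it without the TAVC standardization, just to locate the maximizer on a toy segment:

```python
import numpy as np

def cusum_detector(Y, s, k, e):
    """Detector (19): weighted difference of sample means over (s, k] and (k, e]."""
    weight = np.sqrt((k - s) * (e - k) / (e - s))
    return weight * abs(Y[s:k].mean() - Y[k:e].mean())

# Toy segment with a single mean shift of size 3 at t = 100
rng = np.random.default_rng(2)
Y = np.r_[rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)]
stats = [cusum_detector(Y, 0, k, 200) for k in range(10, 190)]
k_hat = 10 + int(np.argmax(stats))
```

In the full procedure, each statistic would be divided by the corresponding scale-dependent TAVC estimate before comparison with the threshold Γ.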
However, in the presence of serial correlation, distinguishing a true mean shift from random fluctuations becomes challenging. Consequently, it is imperative to have a robust variance estimator that effectively captures the variability in the data section under consideration. For such a multiscale procedure, using an estimator of the global LRV in place of σ_{s,e}², regardless of the interval length used to compute the detector statistic, can lead to spurious change point estimates. Therefore, a variance that systematically captures the variability of a data section under serial correlation, suitable for standardizing multiscale change point detectors, is the scale-dependent TAVC σ_D² [27].

3.2. Robust Estimation of Multiscale TAVC

Here, the derivation of the σ_D² estimator is presented. For notational convenience, assume that the data length D is an even number. To account for temporal dependence, group the data into blocks of equal size G = D/2. Then, for some starting point h ∈ {0, …, G − 1}, with the number of blocks N(h) = ⌊(T − h)/G⌋ − 1, define
\bar{Y}_{j,h} = \frac{1}{G} \sum_{t=jG+h+1}^{(j+1)G+h} Y_t \quad \text{and} \quad \pi_{j,h} = \frac{G (\bar{Y}_{j,h} - \bar{Y}_{j-1,h})^2}{2} \quad \text{for } j = 1, \ldots, N(h). \qquad (20)
Similarly, define
\bar{\varepsilon}_{j,h} = \frac{1}{G} \sum_{t=jG+h+1}^{(j+1)G+h} \varepsilon_t \quad \text{and} \quad \tilde{\pi}_{j,h} = \frac{G (\bar{\varepsilon}_{j,h} - \bar{\varepsilon}_{j-1,h})^2}{2}.
Then, the mean of the π̃_{j,h},

\hat{\tilde{\sigma}}_D^2 = \frac{1}{N(h)} \sum_{j=1}^{N(h)} \tilde{\pi}_{j,h},
accounts for the temporal dependence in local data sections of length D = 2G. In addition, we have E(σ̃̂_D²) − σ_D² = o(1) as D → ∞ (see Theorem 1 below). The natural estimator of the TAVC is N(h)^{-1} ∑_{j=1}^{N(h)} π_{j,h}, but it is usually contaminated by the jumps, or mean shifts. Its counterpart σ̃̂_D² correctly reflects the level of variability but is inaccessible, as it is defined with ε̄_{j,h} in place of Ȳ_{j,h}.
Thus, the M-estimation technique discussed in [41] is adopted to achieve the required estimator as detailed in [42]. Consider a non-decreasing influence function ρ : R R such that
\rho(x) = \begin{cases} -\log 2, & x \leq -1, \\ \log\left(1 + x + x^2/2\right), & -1 \leq x \leq 0, \\ -\log\left(1 - x + x^2/2\right), & 0 \leq x \leq 1, \\ \log 2, & x \geq 1. \end{cases} \qquad (21)
Then, the estimator of the TAVC at scale D and starting point h, σ̂²_{D,h}, is defined as a solution of the M-estimation equation
J_{D,h}(\theta) = \frac{1}{N(h)} \sum_{j=1}^{N(h)} \rho_\nu (\pi_{j,h} - \theta) = 0, \qquad (22)
where ρ_ν(x) = ν^{-1} ρ(νx) for some ν > 0. The condition on ν is set later in Section 5.1. Equation (22) may have more than one solution; in such a case, any of them can be used to define σ̂²_{D,h}.
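Putting (20)–(22) together, σ̂²_{D,h} can be computed by bisection, since J_{D,h}(θ) is non-increasing in θ. The sketch below (names ours) uses ν = √(G/T), which is a heuristic assumption on our part, as the paper defers the condition on ν to Section 5.1:

```python
import numpy as np

def rho(x):
    """Bounded, non-decreasing influence function (21)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[x <= -1] = -np.log(2.0)
    out[x >= 1] = np.log(2.0)
    neg = (x > -1) & (x <= 0)
    pos = (x > 0) & (x < 1)
    out[neg] = np.log(1.0 + x[neg] + x[neg] ** 2 / 2.0)
    out[pos] = -np.log(1.0 - x[pos] + x[pos] ** 2 / 2.0)
    return out

def tavc_estimate(Y, G, h=0, nu=None):
    """Robust scale-D TAVC estimator: form the block statistics pi_{j,h}
    of (20) and solve the M-estimation equation (22) for theta by
    bisection, valid because J_{D,h} is non-increasing in theta."""
    T = len(Y)
    N = (T - h) // G - 1
    Ybar = np.array([Y[j * G + h:(j + 1) * G + h].mean() for j in range(N + 1)])
    pi = G * (Ybar[1:] - Ybar[:-1]) ** 2 / 2.0
    if nu is None:
        nu = np.sqrt(G / T)    # heuristic choice; the paper sets nu in Section 5.1
    J = lambda theta: rho(nu * (pi - theta)).sum()
    lo, hi = 0.0, float(pi.max()) + 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if J(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(3)
Y = rng.normal(0.0, 2.0, 20000)   # iid noise: TAVC = variance = 4 at every scale
sigma2_hat = tavc_estimate(Y, G=20)
```

In practice, the same routine would be run at each scale D = 2G required by the multiscale procedures of Section 4.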

3.3. Consistency of the Scale-Dependent TAVC Estimator

Assumption 2.
(i) 
The error process is assumed to be linear, i.e., ε_t = ∑_{k=0}^{∞} w_k η_{t-k}, where {η_t}_{t∈Z} is a sequence of i.i.d. random variables and |w_k| ≤ Ξ (k + 1)^{-β} for some constants β > 2.5 and Ξ > 0, for all k ≥ 0.
(ii) 
There exists σ > 0 such that σ² = (∑_{k≥0} w_k)² < ∞, with σ² ≥ c_σ for some constant c_σ > 0.
(iii) 
On the distribution of {η_t}_{t∈Z}, we work under one of the two following scenarios.
a. 
For some q > 4, ‖η_1‖_q = (E|η_1|^q)^{1/q} < ∞.
b. 
For some R_η > 0 and ω > 0, ‖η_1‖_q ≤ R_η q^ω for all q ≥ 1.
The assumptions stated above have the following motivations. The linearity of the error process {ε_t}_{t=1}^T assumed in Assumption 2 (i) enables control of the functional dependence in {π_{j,h}}_{j=1}^{N(h)}; it allows the temporal dependence of the error process to decay at an algebraic rate. Assumption 2 (ii) secures the positivity of the LRV so that it is well-defined when computing change point detector statistics. Assumption 2 (iii) (a) characterizes the heavy-tailedness of the error process {ε_t}_{t=1}^T, while Assumption 2 (iii) (b) imposes the stronger requirement of sub-Weibull tail behavior on {η_t}_{t∈Z}. This includes the sub-Gaussian case (ω = 1/2, with (E|η_1|^q)^{1/q} ≤ R_η q^{1/2}) and the sub-exponential case (ω = 1, with (E|η_1|^q)^{1/q} ≤ R_η q), which are defined by the growth rate of the moments [43].
Definition 1.
Let {a_n} and {b_n} be positive sequences. Then a_n ≍ b_n if there exist κ_1, κ_2 > 0 such that κ_1 ≤ a_n/b_n ≤ κ_2 as n → ∞; a_n ≪ b_n or a_n = o(b_n) means a_n/b_n → 0; and a_n ≲ b_n or a_n = O(b_n) if there exists a constant C > 0 such that a_n/b_n ≤ C as n → ∞.
Theorem 1.
Under Assumption 2, let the TAVC be defined as

\tilde{\sigma}_D^2 = \mathrm{Var}\left( \frac{1}{\sqrt{D}} \left( \sum_{t=1}^{G} \varepsilon_t - \sum_{t=G+1}^{D} \varepsilon_t \right) \right) \quad \text{for } D = 2G.
Then, provided that ν ≍ √(pG/T), under Assumption 2 (iii) (a) the estimator σ̂²_{D,h} satisfies

\hat{\sigma}_{D,h}^2 - \tilde{\sigma}_D^2 = O_P\left( \frac{pD}{T} + \max\left\{ \left( \frac{D}{T} \right)^{\frac{q-2}{q+2}}, \sqrt{\frac{D \log T}{T}} \right\} \right) \qquad (23)

and under Assumption 2 (iii) (b),

\hat{\sigma}_{D,h}^2 - \tilde{\sigma}_D^2 = O_P\left( \frac{pD}{T} + \sqrt{\frac{D \log^{4\omega+3} T}{T}} \right) \qquad (24)
for any fixed h ∈ {0, …, G − 1}. In addition, σ̃_D² satisfies

\tilde{\sigma}_D^2 - \sigma_D^2 = O(D^{-1}). \qquad (25)
Proof. 
The concept of the proof is similar to the one discussed in [27]. □
Theorem 1 shows that the proposed robust estimator consistently approximates the TAVC at scale D. The estimation error decomposes into two components: the error from approximating σ_D² by σ̃_D² in (25), and the error in estimating σ̃_D² by σ̂²_{D,h}. The second error accounts explicitly for the influence of multiple mean shifts on the estimator through the term pD/T in (23) and (24), while the remaining terms capture the effect of the error distribution. Consequently, Theorem 1 establishes that
\tilde{\sigma}_D^2 - \sigma^2 = O\left( \frac{1}{D} \right); \qquad (26)

therefore, as D increases, the scale-D TAVC approaches the global LRV, as expected.

3.3.1. Setting the Maximum Time-Scale for TAVC Estimation

In this section, we briefly discuss the maximum time-scale, or length, required for D in the data section. As shown in (25), the error from approximating σ_D² with σ̃_D² diminishes as D increases. Conversely, the error in estimating σ̃_D² with σ̂²_{D,h} increases with D, as given in (23) and (24). This increase is attributed to the growing effect of mean shifts as D increases, coupled with the decrease in the number of available blocks. To balance these two errors, McGonigle and Cho [27] proposed setting a maximum time-scale, denoted by M, for use with a multiscale CPD algorithm. In particular, when the change point detector T_{s,k,e} has e − s ≤ M, we scale the detector using the estimator of σ²_{e-s}, the TAVC at scale D = e − s. However, if e − s > M, it is recommended to scale the detector using the estimator of σ²_M, the TAVC at the maximum time-scale M, which ensures that σ²_{e-s} − σ²_M = O(M^{-1}).

4. Applications of Robust TAVC Estimator

For multiple CPD, the suggested TAVC estimator can be used with the MOSUM method and WBS algorithms that scan the data using detector statistics of the type (19).

4.1. Multiscale MOSUM Procedure

The MOSUM procedure simultaneously detects and locates multiple changes [5,13,44]. The MOSUM statistic for detecting a change in the mean at time point k, for a given bandwidth G, is defined as
T_G = \max_{G \leq k \leq T-G} \frac{T_G(k)}{\hat{\sigma}_k} \qquad (27)
with MOSUM detector
T_G(k) = T_G(k; Y) = \frac{1}{\sqrt{2G}} \left| \sum_{t=k+1}^{k+G} Y_t - \sum_{t=k-G+1}^{k} Y_t \right|, \qquad k = G, \ldots, T - G, \qquad (28)
where σ̂_k² is an estimator of the error variance.
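Using cumulative sums, the detector (28) can be evaluated for every k in O(T) time; a minimal sketch (names ours), shown without the σ̂_k standardization:

```python
import numpy as np

def mosum_detector(Y, G):
    """MOSUM detector (28) for all k = G, ..., T - G via cumulative sums."""
    T = len(Y)
    cs = np.r_[0.0, np.cumsum(Y)]
    ks = np.arange(G, T - G + 1)
    right = cs[ks + G] - cs[ks]        # sum over t = k+1, ..., k+G
    left = cs[ks] - cs[ks - G]         # sum over t = k-G+1, ..., k
    return ks, np.abs(right - left) / np.sqrt(2.0 * G)

# Toy series with a mean shift of size 2 at t = 150
rng = np.random.default_rng(4)
Y = np.r_[rng.normal(0.0, 1.0, 150), rng.normal(2.0, 1.0, 150)]
ks, det = mosum_detector(Y, G=30)
k_hat = int(ks[np.argmax(det)])
```

In the full procedure, det would be divided by the corresponding scale-2G TAVC estimate before thresholding against the critical value (29).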
As the exact probability distribution of the MOSUM statistic is not well known, we use an asymptotic result to develop a MOSUM-based test for changes in the mean [13,45]. By utilizing the asymptotic result of a MOSUM-based test with an asymptotic significance level α ( 0 , 1 ) , we reject the null hypothesis H 0 : p = 0 in favor of the alternative H 1 : p 1 when the MOSUM statistic T G , as indicated in (27), surpasses the asymptotic critical value given by
\Gamma_T(G; \alpha) = \frac{b_{G,T} + c_\alpha}{a_{G,T}} \quad \text{with} \quad c_\alpha = -\log \log \frac{1}{\sqrt{1-\alpha}}. \qquad (29)
Here, a_{G,T} and b_{G,T} are properly chosen scaling and shifting factors that depend on the data length T and the bandwidth G (see [13]). In practice, the performance of the methodology is significantly determined by the choice of the bandwidth G. A mono-bandwidth MOSUM procedure yields consistent estimators with the optimal localization rate only under the assumption that min_{0≤i≤p}(τ_{i+1} − τ_i) > 2G and min_{1≤i≤p} μ_i² G / log(T/G) → ∞ as T → ∞ (see Theorem 3.2 of [13] and Corollary D1 of [46]). When the change points are heterogeneous, i.e., when the data contain both large changes over short intervals and small changes over long stretches, the mono-bandwidth MOSUM procedure fails to achieve consistency in the estimation of the change points. One way to address this issue is a multiscale MOSUM procedure that incorporates bottom-up merging [47] or localized pruning [48] to identify a set of candidate change point estimators. The multiscale MOSUM procedure with bottom-up merging has low computational complexity [45]; hence, we employ this procedure in our study.
Let G = {G_ι, 1 ≤ ι ≤ H : G_1 < ⋯ < G_H} represent a range of bandwidths, and let χ(G) denote the set of estimators identified with a particular bandwidth G ∈ G. Starting from the final set of estimators χ obtained with the finest bandwidth G_1, check whether a previously accepted change point k̂_1 lies in proximity to the current candidate k̂_2 detected with bandwidth G_2. Accept the change point k̂_2 ∈ χ(G_2) only if |k̂_2 − k̂_1| > η G_2; otherwise, reject k̂_2 as it has already been identified by k̂_1. Repeat this process sequentially for ι = 3, …, H. The recommended value of η is 0.4, which gives good estimation and localization of the change points [45].
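The acceptance rule above can be sketched as follows (function name ours); candidates from finer bandwidths take precedence, and a candidate found at bandwidth G is discarded when an already accepted estimator lies within ηG of it:

```python
def merge_bottom_up(candidates, eta=0.4):
    """Bottom-up merging over bandwidths G_1 < ... < G_H: process candidates
    from the finest bandwidth upward and accept one found at bandwidth G only
    if no previously accepted change point lies within eta * G of it."""
    accepted = []
    for G, ks in sorted(candidates.items()):   # finest bandwidth first
        for k in ks:
            if all(abs(k - a) > eta * G for a in accepted):
                accepted.append(k)
    return sorted(accepted)

# 103 (found with G = 30) is discarded: it duplicates 100 within eta * 30 = 12
cps = merge_bottom_up({10: [100, 250], 30: [103, 400]})
```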
We then apply the robust TAVC estimator within the multiscale MOSUM procedure with bottom-up merging to standardize the MOSUM detector. For each G_ι ∈ G, the TAVC at scale D = 2G_ι is estimated by σ̂²_{2G_ι} solving (22), provided that 2G_ι ≤ M, where M represents the maximum scale, as explained in Section 3.3.1. The global estimator σ̂ in (27) is then replaced with σ̂_{2G_ι} to standardize the MOSUM detector T_{G_ι}(k); for 2G_ι > M, we use σ̂_M instead of σ̂_{2G_ι} to scale the detector. Standardizing in this way correctly reflects the extent of variation over the sliding window and leads to accurate estimation of the change points. The algorithm for the multiscale MOSUM method combined with the TAVC estimator can be found in [27].
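The scale-selection rule for the standardising estimate can be stated in a few lines; the function name and the dictionary of per-scale estimates are illustrative assumptions:

```python
def tavc_for_bandwidth(G, M, sigma_hat_by_scale):
    """Pick the TAVC estimate used to standardise the MOSUM detector T_G (a sketch).

    sigma_hat_by_scale maps an (even) scale D to its estimate sigma_hat_D.
    For 2G <= M the detector is scaled by sigma_hat_{2G}; beyond the
    maximum scale M, sigma_hat_M is reused instead.
    """
    D = 2 * G
    return sigma_hat_by_scale[D] if D <= M else sigma_hat_by_scale[M]

# Hypothetical estimates at scales 20 and 40, with maximum scale M = 40.
sig = {20: 1.1, 40: 1.4}
a = tavc_for_bandwidth(G=10, M=40, sigma_hat_by_scale=sig)  # uses scale 2G = 20
b = tavc_for_bandwidth(G=30, M=40, sigma_hat_by_scale=sig)  # 2G = 60 > M, falls back to M
```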

4.2. WBS2 Procedure

Binary segmentation, introduced by [49], is a generic technique that provides a simple framework for extending single change point detection to multiple changes. It initially scans the entire dataset for a change point using a CUSUM statistic as defined in (19). If a significant change point is found, the data are split into two segments, and the process is repeated recursively on each segment until no further change point is detected. It has various extensions, such as circular binary segmentation [50], wild binary segmentation (WBS) [51], and WBS2 [52], each proposed to address shortcomings of the earlier versions. WBS2 is designed to overcome the lack of computational adaptivity and the incomplete solution path associated with WBS. We combine the WBS2 algorithm with the proposed TAVC estimator for multiple change point detection in the mean under model (18).
Suppose that B_{s,e} = {(l, r) ∈ Z² : s ≤ l < r ≤ e, r − l > 1} represents the collection of all possible intervals within {s + 1, …, e} for some 0 ≤ s < e ≤ T. Let H_{s,e} ⊆ B_{s,e} be selected either randomly or deterministically (we use a deterministic grid selection approach) with |H_{s,e}| = min(M, |B_{s,e}|) for some given number M ≤ T(T − 1)/2 of sub-intervals drawn from {1, …, T}. Starting with (s, e) = (0, T), we identify the candidate change point that maximizes the absolute CUSUM statistic as
(s_0, k_0, e_0) = argmax_{(l, k, r): l < k < r, (l, r) ∈ H_{s,e}} |T_{l,k,r}| / σ̂_{r−l}, with |T_{s_0,k_0,e_0}| / σ̂_{e_0−s_0} > Γ,
where σ̂²_{r−l} denotes the estimator of the TAVC at length D = r − l (when r − l is odd, we use σ̂²_{r−l−1} instead) and Γ is the threshold, determined as Γ = C√(2 log T) with the constant C taken following the recommendation of Fryzlewicz [51]. As described in Section 3.3.1, the CUSUM statistic is standardized with σ̂_M for any interval whose length exceeds the maximum scale M. WBS2 marks the first candidate change point (s_0, k_0, e_0) satisfying Equation (28) and then splits the data into {Y_t}_{t=s+1}^{k_0} and {Y_t}_{t=k_0+1}^{e}, on which the same step is repeated. If no such (s_0, k_0, e_0) is found, or the user-defined minimum segment length has been reached, the search ends on {Y_t}_{t=s+1}^{e} and no change point is returned for that segment. An algorithm for WBS2 combined with the TAVC estimator can be found in [27].
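A stripped-down recursion illustrates the WBS2 logic; for brevity this sketch draws random sub-intervals and leaves the CUSUM unstandardised (unit noise), whereas the paper uses a deterministic grid and divides by the scale-dependent TAVC estimate. All names and constants here are ours:

```python
import numpy as np

def cusum(y, s, k, e):
    """CUSUM statistic on (s, e] evaluated at k (0-based, half-open indices)."""
    n, n1 = e - s, k - s
    return np.sqrt(n1 * (n - n1) / n) * abs(y[s:k].mean() - y[k:e].mean())

def wbs2(y, s, e, threshold, n_intervals=64, min_len=10, rng=None):
    """Minimal WBS2-style recursion: find the best CUSUM over random
    sub-intervals of (s, e]; if it clears the threshold, split and recurse."""
    rng = rng or np.random.default_rng(1)
    if e - s < 2 * min_len:
        return []
    best = (0.0, None)
    for _ in range(n_intervals):
        l = rng.integers(s, e - 2)
        r = rng.integers(l + 2, e + 1)
        for k in range(l + 1, r):
            val = cusum(y, l, k, r)
            if val > best[0]:
                best = (val, k)
    if best[1] is None or best[0] <= threshold:
        return []
    k0 = best[1]
    return sorted(wbs2(y, s, k0, threshold, n_intervals, min_len, rng)
                  + [k0]
                  + wbs2(y, k0, e, threshold, n_intervals, min_len, rng))

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(m, 1, 80) for m in (0, 4, 0)])
cps = wbs2(y, 0, len(y), threshold=1.5 * np.sqrt(2 * np.log(len(y))))
```

The threshold is of the form C√(2 log T) as in the text, with C = 1.5 chosen ad hoc for this toy signal.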

5. Numerical Results

This section presents simulation studies and a real data application to validate the theoretical findings.

5.1. Parameter Tuning for CPD

The tuning parameter ν in (22) is defined as ν = ν_h = π̄_h^{−1}√(G/T), where π̄_h is a fixed constant for a given starting value h ∈ {0, 1, …, G − 1}. For the choice of π̄_h, we employ both the trimmed mean of {π_{j,h}}_{j=1}^{N(h)} proposed by Chen et al. [42] and the scaled median of {π_{j,h}}_{j=1}^{N(h)} suggested by McGonigle et al. [53]. The trimmed mean is defined as
π̄_{[1],h} = (1 / (⌊3N(h)/4⌋ − ⌊N(h)/4⌋ + 1)) Σ_{j=⌊N(h)/4⌋}^{⌊3N(h)/4⌋} π_{(j),h},
where π_{(1),h} ≤ ⋯ ≤ π_{(N(h)),h} are the ordered {π_{j,h}}_{j=1}^{N(h)}. The appropriately scaled median of {π_{j,h}}_{j=1}^{N(h)}, with the scaling ensuring unbiasedness, is given as
π ¯ [ 2 ] , h = Δ · M e d i a n { π 1 , h , , π N ( h ) , h } ,
where Δ = 2.125 is a scaling constant. We employ π̄_{[i],h}, i = 1, 2, in setting the parameter ν_h in order to observe their effect on the simulation results. Another tuning parameter is the maximum time scale M up to which the TAVC is estimated; McGonigle and Cho [27] recommend M = ⌊2.5√T⌋. For the tuning parameters of the multiscale MOSUM procedure, we follow [27,46]; in addition, we adopt the default settings suggested by the R package ‘mosum’ available on CRAN [45] for the remaining tuning parameters, and set η = 0.4 and α = 0.05. For the tuning parameters of the WBS2 algorithm, we follow the approaches discussed in [3,27,51].
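The two choices of π̄_h can be sketched directly from the formulas above; the floor-based rank convention in the trimmed mean is our reading of Equation (29), and the function names are ours:

```python
import numpy as np

def pi_bar_trimmed(pi):
    """Trimmed mean of the pi_{j,h}: average of the order statistics from
    rank floor(N/4) to floor(3N/4), as in Eq. (29)."""
    pi = np.sort(np.asarray(pi, dtype=float))
    N = len(pi)
    lo, hi = N // 4, (3 * N) // 4   # 1-based ranks floor(N/4) .. floor(3N/4)
    return pi[lo - 1:hi].mean()

def pi_bar_median(pi, delta=2.125):
    """Scaled median of the pi_{j,h} with the bias-correction constant
    Delta = 2.125, as in Eq. (30)."""
    return delta * np.median(pi)

# The trimmed mean discards the extremes, so a single outlier barely
# moves it, while an untrimmed mean would be inflated.
pi = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 50.0]
robust = pi_bar_trimmed(pi)
```

This robustness to outlying π_{j,h} is what motivates using these location estimates under heavy-tailed errors.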

5.2. Simulation Results

In this section, we conduct an extensive simulation study comparing the performance of the proposed ARIMA state space model with Kalman smoothing against various competing univariate time series imputation methods for series with mean shifts. Moreover, our proposed CPD approach is compared with various competing methods that account for serial dependence under (18) and are readily available in R, under different scenarios for generating the serially correlated {ε_t}_{t∈Z}. We generate 1000 realizations with T = 1000 under each setting by introducing mean shifts in the data, where u_t ~ iid N(0, 1) unless otherwise stated.

5.2.1. Set-Up

This section first presents the settings used to examine the performance of the missing value imputation methods. The data are generated under different scenarios including change points (mean shifts). Once the data are generated, we artificially delete varying percentages (20%, 30%, and 40%) of values under the missing completely at random (MCAR) mechanism using the ‘missMethods’ R package available on CRAN [54]. The missing values of the simulated data are then imputed using various univariate time series imputation methods, namely interpolation (linear, spline, and Stineman), mean imputation, last observation carried forward (LOCF), and moving average methods (simple (SMA), linear weighted (LWMA), and exponential weighted (EWMA)). We employ the ‘imputeTS’ R package available on CRAN [55] to estimate the missing values. To show the robustness of the proposed ARIMA state space model and Kalman smoothing in estimating missing values in the presence of a change point (mean shift), we calculate the proportion of change points estimated using the MOSUM CPD method scaled by the TAVC estimator.
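The MCAR deletion step and some of the simple competitors can be sketched as follows. We use pandas as a stand-in for the ‘missMethods’ and ‘imputeTS’ R packages, on a toy series with one mean shift; all names and the 30% deletion rate here are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Series of length 1000 with a mean shift of size 5 at t = 500.
signal = np.concatenate([np.zeros(500), np.full(500, 5.0)])
y = pd.Series(signal + rng.normal(0, 1, 1000))

# MCAR mechanism: delete 30% of the values completely at random.
miss = rng.choice(1000, size=300, replace=False)
y_miss = y.copy()
y_miss.iloc[miss] = np.nan

# Three of the simple competitors considered in the study.
imputed = {
    "mean": y_miss.fillna(y_miss.mean()),
    "LOCF": y_miss.ffill().bfill(),           # last observation carried forward
    "linear": y_miss.interpolate(limit_direction="both"),
}
rmse = {m: float(np.sqrt(((s - y) ** 2).mean())) for m, s in imputed.items()}
```

Under a mean shift, global mean imputation fills every gap with a value between the two levels, so its error dominates that of methods which use neighbouring observations, consistent with the pattern reported in Table 1.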
(M1*)
f t undergoes p = 2 with change points at ( τ 1 , τ 2 ) = ( 200 , 600 ) and ( f 0 , f 1 , f 2 ) = ( 0 , 10 , 20 ) , and the model follows seasonal ARIMA (0, 1, −0.4)(0, 1, −0.56)12.
(M2*)
f t undergoes p = 2 with change points at ( τ 1 , τ 2 ) = ( 300 , 700 ) and ( f 0 , f 1 , f 2 ) = ( 0 , 30 , 10 ) , and the model follows seasonal ARIMA (0.5, 1, −0.8)(0, 1, 0)12.
(M3*)
f t undergoes p = 4 with change points at ( τ 1 , τ 2 , τ 3 , τ 4 ) = ( 200 , 400 , 600 , 800 ) and f_i = √T · i/16, i = 1 , … , 4 , and the model follows seasonal ARIMA (0.9, 0, 0)(0, 1, 0)12.
Models (M1*) and (M2*) represent nonstationary seasonal series, and model (M3*) is a strongly autocorrelated seasonal series. Model (M1*) is based on modeling the air passenger time series data commonly used in the literature [39,40,56], with σ = 0.0025. Model (M2*) is adopted from Dette et al. [14], modified by introducing nonstationarity to yield a nonstationary seasonal ARIMA model with σ = 1. Finally, model (M3*) is taken from Cho and Fryzlewicz [28], with added seasonality to produce a strongly autocorrelated seasonal AR(1) series with σ = 1. In models (M1*), (M2*), and (M3*), we inject 20%, 40%, and 30% missing values, respectively. For the detection of change points, we apply the MOSUM procedure scaled with the TAVC estimator, choosing the tuning parameter ν in (22) using (29), and setting the bandwidth G = 60 and α = 0.05.
Furthermore, we investigate the performance of MOSUM and WBS2 procedures scaled with the TAVC estimator as described in Section 4 on simulated datasets, in comparison with various methods when there is no change ( p = 0 ) and there is at least one change ( p 1 ) under different error structures. Moreover, we study the condition where f t = 0 to examine the performance in size control.
(M1)
ε_t = u_t, and f_t undergoes p = 4 with change points at (τ_1, τ_2, τ_3, τ_4) = (200, 400, 600, 800) and (f_0, f_1, f_2, f_3, f_4) = (0, 2, −2, 2, −2).
(M2)
ε_t = u_t, where the u_t are independent random variables following the t_5 distribution, and f_t and τ_i are the same as in (M1).
(M3)
ε t = u t + 0.9 ε t 1 follows AR(1) process with σ u = 0.4359 , and f t undergoes p = 4 with change points τ i as in (M2) and ( f 0 , f 1 , f 2 , f 3 , f 4 ) = ( 0 , 2 , 1 , 2 , 1 ) .
(M4)
ε_t = u_t + 0.5ε_{t−1} + 0.3ε_{t−2} follows an AR(2) process with σ_u = 0.6676, and f_t and τ_i are as in (M2).
(M5)
ε_t = u_t − 0.9u_{t−1} follows an MA(1) process, and f_t and τ_i are as in (M2).
(M6)
ε_t = σ_t u_t with σ_t² = 0.5 + 0.4ε²_{t−1} follows an ARCH(1) model, and f_t and τ_i are as in (M5).
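Three of the error processes above can be simulated in a few lines; the function name is ours, and only the recursions and parameter values come from the model list:

```python
import numpy as np

def simulate_errors(model, T, rng):
    """Generate the serially dependent errors of models (M3), (M5), (M6) (a sketch)."""
    u = rng.normal(0, 1, T)
    eps = np.zeros(T)
    if model == "AR1":            # (M3): eps_t = u_t + 0.9 eps_{t-1}, sigma_u = 0.4359
        u = 0.4359 * u
        for t in range(1, T):
            eps[t] = u[t] + 0.9 * eps[t - 1]
    elif model == "MA1":          # (M5): eps_t = u_t - 0.9 u_{t-1}
        eps[0] = u[0]
        eps[1:] = u[1:] - 0.9 * u[:-1]
    elif model == "ARCH1":        # (M6): eps_t = sigma_t u_t, sigma_t^2 = 0.5 + 0.4 eps_{t-1}^2
        for t in range(1, T):
            eps[t] = np.sqrt(0.5 + 0.4 * eps[t - 1] ** 2) * u[t]
    return eps

rng = np.random.default_rng(7)
ar1 = simulate_errors("AR1", 100_000, rng)
ma1 = simulate_errors("MA1", 100_000, rng)
# For (M3), Var(eps_t) = 0.4359^2 / (1 - 0.9^2) is approximately 1, but the
# long-run variance 0.4359^2 / (1 - 0.9)^2 is about 19, far larger; for (M5)
# the long-run variance (1 - 0.9)^2 = 0.01 is nearly zero, matching the
# discussion of why these models stress a TAVC/LRV estimator.
```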
Model (M1) is widely used in the literature, and (M2) is taken from [46] to evaluate how well the proposed approach performs under non-Gaussian errors. In models (M3) and (M4), adapted from Cho and Fryzlewicz [28], ε_t is strongly autocorrelated. For model (M5), the long-run variance approaches zero, making it difficult to estimate accurately, which may affect the performance of the methods. Model (M6) is adopted from McGonigle and Cho [27] to examine whether a method works well for nonlinear processes. We apply the proposed robust TAVC estimator to both the multiscale MOSUM and WBS2 procedures, denoted by MOSUM.TAVC[i] and WBS2.TAVC[i], respectively, where i indicates the choice of tuning parameter π̄_{[i],h} related to the parameter ν; refer to (29) and (30). The selection of tuning parameters is discussed in Section 5.1.
For comparison, we include DeCAFS [9], DepSMUCE [14], and WCM.gSa [28]. DeCAFS addresses the problem of detecting multiple change points in the mean under the assumption that the error term follows a stationary AR(1) model. DepSMUCE extends the SMUCE procedure of [57] to dependent data by estimating the long-run variance (LRV) with a difference-type estimator. WCM.gSa detects multiple change points in the mean of an otherwise stationary, autocorrelated, linear time series; it combines solution path generation based on the wild contrast maximisation principle with an information criterion-based model selection strategy termed the gappy Schwarz algorithm. The significance levels α ∈ {0.05, 0.1} are considered for the DepSMUCE method. Tuning parameters not specified herein are set according to the authors’ recommendations.

5.2.2. Results

The comparative simulation study results on missing value imputation, which encompassed 1000 realizations for T = 1000 , generated according to models (M1*)–(M3*) with p = 2 , 4 , are succinctly summarized in Table 1. We report the average mean absolute percentage error (MAPE), and root mean square error (root MSE) over 1000 realizations to evaluate the performance of various missing value imputation methods. In addition, we also report the distribution of the estimated number of change points and the average covering metric (CM, defined later) for the imputed data returned by applying the MOSUM procedure scaled with the TAVC estimator. Methods that achieve the best performance based on the evaluation criteria for each scenario are shown in bold.
MAPE and root MSE between the actual time series value y t and the corresponding imputed value y ^ t can be defined as:
MAPE(y_t, ŷ_t) = (100%/T) Σ_{t=1}^{T} |(y_t − ŷ_t)/y_t|,   RootMSE(y_t, ŷ_t) = √((1/T) Σ_{t=1}^{T} (y_t − ŷ_t)²),
where T is the time period (number of observations).
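Both evaluation criteria are one-liners; this sketch (names ours) mirrors the definitions above:

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (undefined when some y_t = 0)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def root_mse(y, y_hat):
    """Root mean square error between actual and imputed values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Imputations that overshoot every value by 10% give a MAPE of 10%.
y = np.array([10.0, 20.0, 40.0])
y_hat = 1.1 * y
```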
Table 1 shows that ARIMA state space and Kalman smoothing imputation outperformed the other methods in most scenarios, as indicated by lower MAPE and root MSE values. Among the other methods, interpolation and moving average methods provided more accurate approximations than mean imputation and LOCF. Conversely, mean imputation, which relies on a single measure (mean, median, or mode) to fill in all missing values, exhibited the poorest performance. Moreover, the number of change points is estimated with high accuracy for the data imputed using the ARIMA state space and Kalman smoothing method, as indicated by the model selection accuracy p̂ − p and the covering metric. The proportion of correctly estimated numbers of change points and the covering metric for data imputed with ARIMA state space and Kalman smoothing approach one in most cases. Furthermore, the proposed imputation method did not lead to under-detection of change points, although a very small percentage of spurious change points is observed. Overall, the proposed approach outperformed the other univariate imputation methods in the majority of scenarios. Interpolation and moving average methods yield relatively good estimates of the missing values in the presence of a mean shift, whereas mean imputation produces the largest numbers of false positive and false negative change points. Therefore, we employ ARIMA state space and Kalman smoothing to impute missing values in the real data application and determine change points by applying the multiscale CPD methods scaled with the TAVC estimator, as discussed in Section 4.
While this method improves the handling of missing data by providing close estimates of the missing values, it exhibits higher computational complexity than other univariate imputation techniques, particularly when applied to large datasets. This complexity stems from the algorithm’s iterative nature, which updates the estimate of the state vector as each new observation becomes available [37,56].
Table 2 summarizes the results of the simulation study for CPD, showing the performance of different methods for T = 1000 over 1000 realizations generated as in (M1)–(M6) with p ∈ {0, 4}. We report the proportion of realizations with p̂ ≥ 1 when p = 0 (in the ‘size’ column) and the summary of estimated change points based on the distribution of p̂. We also present the averaged CM and relative mean squared error (RMSE) over 1000 realizations when p = 4. The modal value of the distribution of the estimated number of change points with the highest value for each scenario is shown in bold. Based on the evaluation criteria, the methods that attained the highest performance level when p = 4 are underlined. RMSE is one of the most common error metrics applied in the literature [27,28] to evaluate the performance of change point detection methods; it measures the deviation between the estimated change point positions and the true change point locations. It assigns a higher weight to larger errors due to its squaring effect, making it sensitive to substantial deviations in change point detection, which is essential in the context of time series analysis. Moreover, RMSE serves to evaluate the performance of change point detection methods in the presence of missing data, illustrating how accurately the detected change points correspond to the true points, even in light of imputation errors. It also provides insight into the impact of imputed data on the accuracy of change point estimates: if imputation is flawed, RMSE will increase, reflecting greater discrepancies between estimated and actual change points. RMSE is defined as:
RMSE = Σ_{t=1}^{T} (f_t − f̂_t)² / Σ_{t=1}^{T} (f_t − f̂*_t)²,
where f ^ t is the piecewise constant signal constructed with the set of estimated change point locations τ ^ , and f ^ t * is an oracle estimator constructed with the true change point locations, τ i .
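The relative MSE can be computed by fitting segment means for both the estimated and the true change point locations; the helper names are ours, and fitting by segment means is our reading of how the piecewise-constant signals f̂_t and f̂*_t are constructed:

```python
import numpy as np

def piecewise_fit(y, cps):
    """Fit a piecewise-constant signal by segment means, given change points cps."""
    f_hat = np.empty_like(y, dtype=float)
    bounds = [0] + sorted(cps) + [len(y)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        f_hat[a:b] = y[a:b].mean()
    return f_hat

def relative_mse(f, y, cps_hat, cps_true):
    """RMSE of Eq. (31): squared error of the estimated fit relative to the
    oracle fit that knows the true change point locations."""
    f_hat, f_star = piecewise_fit(y, cps_hat), piecewise_fit(y, cps_true)
    return float(np.sum((f - f_hat) ** 2) / np.sum((f - f_star) ** 2))

rng = np.random.default_rng(3)
f = np.concatenate([np.zeros(200), np.full(200, 2.0)])
y = f + rng.normal(0, 1, 400)
# Exact localisation gives RMSE = 1; a misplaced estimate inflates it.
exact = relative_mse(f, y, [200], [200])
off = relative_mse(f, y, [230], [200])
```

Values close to one thus indicate that the estimated change points are nearly as good as knowing the true locations.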
The covering metric is another evaluation measure; it is a type of clustering metric that aligns with the aim of CPD, which fundamentally seeks to partition the time series into separate segments, each characterized by a stable data-generating process [58]. It measures the overlap between the predicted segments produced from the estimated change points and the ground truth segments obtained from the true change points [59]. Moreover, the covering metric captures how well a method detects structural changes in time series data, even when some points are imputed. Thus, it is recommended as an evaluation metric for the performance of CPD algorithms. Let S be a segmentation of the interval {1, 2, …, T} into disjoint sets R_i such that R_i is the segment {τ_{i−1} + 1, …, τ_i} for the true change locations. In the same way, the estimated change locations {τ̂_i}_{i=1}^{p̂} generate a partition S′ of segments R′. Then, we define the CM as
CM(S, S′) = (1/T) Σ_{R∈S} |R| · max_{R′∈S′} |R ∩ R′| / |R ∪ R′|.
The value of CM lies in [0, 1]. Together, RMSE and the covering metric provide a balanced evaluation of performance, accounting for both exactness (RMSE) and the method’s ability to cover the true structural changes (covering metric), making them valuable tools for evaluating CPD in datasets with missing values.
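The covering metric definition translates directly into code; the function name is ours, and segments are represented as index sets so the Jaccard overlaps in Equation (32) can be read off literally:

```python
def covering_metric(cps_true, cps_hat, T):
    """Covering metric CM(S, S') of Eq. (32): size-weighted best Jaccard
    overlap between each true segment and the estimated segments."""
    def segments(cps):
        bounds = [0] + sorted(cps) + [T]
        return [set(range(a, b)) for a, b in zip(bounds[:-1], bounds[1:])]
    S, S_hat = segments(cps_true), segments(cps_hat)
    total = 0.0
    for R in S:
        total += len(R) * max(len(R & Rp) / len(R | Rp) for Rp in S_hat)
    return total / T

# Perfect estimation covers every segment exactly (CM = 1); a change point
# misplaced by 10 of 120 observations lowers the coverage.
perfect = covering_metric([40, 80], [40, 80], T=120)
near = covering_metric([40, 80], [50, 80], T=120)
```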
Empirical simulation results presented in Table 2 show that, overall, MOSUM.TAVC tends to have more false positives in terms of size control compared to WBS2.TAVC. This is because the MOSUM procedure accepts all the candidate change point estimators obtained from the smallest bandwidth. In addition, MOSUM.TAVC outperforms some of its competitors, including DeCAFS and DepSMUCE, in terms of size control. Apart from this, regarding the choice of the tuning parameter ν employed in (22), the multiscale procedures corresponding to subscript 2, obtained by applying the median given in (30) for TAVC estimation, return better size control. On the contrary, when p 1 , the utilization of the trimmed mean (represented by subscript 1) defined in Equation (29) exhibits improved power. This suggests adopting an approach that combines both choices of ν , thereby achieving a more balanced performance that takes into account the trade-off between power and size.
Moreover, when p ≥ 1, WBS2.TAVC performs well in most scenarios across the different evaluation criteria, such as model selection accuracy measured by p̂ − p, RMSE, and CM. In particular, WBS2.TAVC with subscript 1, corresponding to the trimmed mean, performs exceptionally well according to the evaluation criteria. DepSMUCE may encounter calibration problems: to inhibit the detection of spurious change points, a conservative value of α is necessary, whereas better detection power requires a higher α. In addition, the LRV estimator proposed in DepSMUCE tends to underestimate the LRV when it approaches zero, as observed in (M5), leading to a substantial number of falsely detected change points. DeCAFS assumes that ε_t follows an AR(1) process; consequently, it operates under a misspecified model in some scenarios, yet it continues to show satisfactory performance. However, it returns spurious change points and fails to adequately control size even under the correctly specified model (M3). Based on the evaluation criteria, in most scenarios, WCM.gSa generally demonstrates strong performance in maintaining size control and accurately determining the number of change points when p ≥ 1.
Furthermore, Figure 1 shows histogram plots of the detected change point locations from different methods for model (M6) applied to 1000 realizations of data. The multiscale MOSUM.TAVC and WBS2.TAVC are highly precise and reliable, showing strong peaks at the true change points (200, 400, 600, and 800) with minimal spurious detections. DeCAFS and DepSMUCE(0.1) exhibit more spurious change points, indicating lower precision compared to the MOSUM.TAVC and WBS2.TAVC methods. DepSMUCE(0.05) shows better precision than DepSMUCE(0.1) but still has some spurious detections. WCM.gSa demonstrates high precision, similar to MOSUM.TAVC and WBS2.TAVC methods, making it a reliable choice as well. Overall, for applications where minimising false positives is crucial, MOSUM.TAVC and WBS2.TAVC are the best choices.

5.3. Real Data Application: Fine Particulate Matter 2.5 (PM2.5)

Several studies have identified fine particulate matter, particularly PM2.5, as the main air pollutant. The main anthropogenic sources of PM2.5 and its acute and chronic health effects are well documented in [60,61]. In this study, we consider the Beijing multi-site air quality dataset collected from 1 March 2013 to 28 February 2017, in particular the daily average concentration of PM2.5. The dataset consists of six air pollutants, including PM2.5, and six meteorological variables collected from twelve monitoring stations in Beijing. It contains 35,064 data points recorded hourly at each station, from which the daily measures are computed, and includes randomly missing values. The dataset was retrieved from the University of California Irvine machine learning repository and is available from [62].
We estimated the hourly missing records of PM2.5 using the ARIMA state space model and Kalman smoothing. After imputing the hourly missing data, the daily average PM2.5 concentration is obtained by averaging the hourly records of the environmental protection stations for each day (https://www.aqistudy.cn, accessed on 15 February 2024). The data show seasonality, which may be attributed to heating during the winter season as the primary contributor to PM2.5. To mitigate this effect, we apply a square root transformation to the data and eliminate seasonality. We analyze the transformed imputed PM2.5 concentration series for change points in the level using the proposed MOSUM.TAVC and WBS2.TAVC. We select the parameter ν in (22) using (29) and choose the other tuning parameters as recommended in Section 5.1. The daily average PM2.5 concentration, along with the change points detected by MOSUM.TAVC and WBS2.TAVC, is plotted in Figure 2. For comparison, we also present the change points estimated by DeCAFS, DepSMUCE(0.05), and DepSMUCE(0.1) in Table 3. As WCM.gSa did not detect any change points, it has been excluded from the report.
MOSUM.TAVC and WBS2.TAVC detect three change points and one change point, respectively, some of which can be linked to weather conditions and policy changes likely affecting air pollutant levels. Adverse meteorological conditions in November and December 2015 caused several heavy pollution events in North China, where daily average PM2.5 concentrations exceeded 150 μg/m³. At the same time, the Beijing–Tianjin–Hebei region experienced two major pollution episodes, each persisting for more than five consecutive days [61]. This change most likely corresponds to 31 October 2015, which was detected by MOSUM.TAVC. On the other hand, China implemented the Action Plan for Prevention and Control of Air Pollution, along with a reinforced version for the Beijing–Tianjin–Hebei Region (2016–2017). Measures included optimising the energy structure, switching from coal to electricity or gas, eliminating small coal-fired boilers, and implementing ultra-low emission technical reforms. Furthermore, old vehicles were phased out, and new emission standards for light-duty vehicles and ship engines were established [63]. The implementation of the policy significantly reduced PM2.5 levels, which can be linked to the change on 3 January 2016, detected by all methods.

6. Conclusions

In this paper, we proposed the ARIMA state space model and Kalman smoothing to impute missing values in a univariate time series. By utilizing fixed-point smoothing on the Kalman filter, we showed that ARIMA state space and Kalman smoothing yield a minimum mean square estimate of the missing observation. We also showed that the proposed imputation method approximates the missing observations well and adapts to mean shifts in the series. For detecting change points in the imputed time series, we applied a robust scale-dependent TAVC estimator and showed its consistency under heavy tails and serial dependence. Moreover, the MOSUM and WBS2 procedures scaled with the proposed TAVC estimator accurately estimated the number of change points and their locations. In particular, WBS2 combined with the TAVC estimator accurately detected mean shifts. Notably, the method also successfully identified change points in time series with seasonal variations, without the need for data transformation, highlighting its practical applicability to real-world datasets. Furthermore, our approach has the potential to be extended to locally stationary settings, providing a promising direction for future research.
In addition, for detecting change points in time series exhibiting periodic fluctuations such as seasonal and cyclic variation, periodic models [64,65] could be investigated to separate cyclical changes from genuine shifts. Furthermore, integrating Markov switching models provides a promising avenue for detecting regime shifts, where the system changes from one state to another [66]. Finally, threshold models could be explored to address structural breaks in time series when variables exceed certain critical levels, offering a deeper understanding of nonlinear dynamics and abrupt shifts in real-world phenomena [67]. By addressing these open problems, future research could extend the applicability and robustness of the proposed methods, particularly for complex time series with structural or periodic changes in diverse fields such as economics, finance, and environmental science.

Author Contributions

Conceptualization, T.T.H., F.T., G.A. and B.T.; Formal analysis, T.T.H., F.T., G.A. and B.T.; Supervision, B.T.; Validation, T.T.H., F.T., G.A. and B.T.; Visualization, T.T.H., F.T., G.A. and B.T.; Writing—original draft, T.T.H.; Writing—review and editing, T.T.H., F.T., G.A. and B.T.; Investigation, T.T.H., F.T., G.A. and B.T.; Methodology, T.T.H., F.T., G.A. and B.T.; Funding acquisition, G.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R45), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The data applied in this paper are openly available at the University of California, Irvine Machine Learning Repository, accessible through https://doi.org/10.24432/C5RK5G (accessed on 20 July 2023).

Acknowledgments

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R45), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aminikhanghahi, S.; Cook, D.J. A survey of methods for time series change point detection. Knowl. Inf. Syst. 2017, 51, 339–367. [Google Scholar] [CrossRef] [PubMed]
  2. Truong, C.; Oudre, L.; Vayatis, N. Selective review of offline change point detection methods. Signal Process. 2020, 167, 107299. [Google Scholar] [CrossRef]
  3. Cho, H.; Kirch, C. Data segmentation algorithms: Univariate mean change and beyond. Econom. Stat. 2024, 30, 76–95. [Google Scholar] [CrossRef]
  4. Csorgo, M.; Horváth, L. Limit Theorems in Change-Point Analysis; Wiley: New York, NY, USA, 1997. [Google Scholar]
  5. Hušková, M.; Slabỳ, A. Permutation tests for multiple changes. Kybernetika 2001, 37, 605–622. [Google Scholar]
  6. Yao, Y.C. Estimating the number of change-points via Schwarz’ criterion. Stat. Probab. Lett. 1988, 6, 181–189. [Google Scholar] [CrossRef]
  7. Lavielle, M.; Moulines, E. Least-squares estimation of an unknown number of shifts in a time series. J. Time Ser. Anal. 2000, 21, 33–59. [Google Scholar] [CrossRef]
  8. Chakar, S.; Lebarbier, E.; Lévy-Leduc, C.; Robin, S. A robust approach for estimating change-points in the mean of an AR(1) process. Bernoulli 2017, 23, 1408–1447. [Google Scholar] [CrossRef]
  9. Romano, G.; Rigaill, G.; Runge, V.; Fearnhead, P. Detecting abrupt changes in the presence of local fluctuations and autocorrelated noise. J. Am. Stat. Assoc. 2022, 117, 2147–2162. [Google Scholar] [CrossRef]
  10. Lu, Q.; Lund, R.; Lee, T.C.M. An MDL approach to the climate segmentation problem. Ann. Appl. Stat. 2010, 4, 299–319. [Google Scholar] [CrossRef]
  11. Gallagher, C.; Killick, R.; Lund, R.; Shi, X. Autocovariance Estimation in the Presence of Changepoints. J. Korean Stat. Soc. 2022, 51, 1021–1040. [Google Scholar] [CrossRef]
  12. Tecuapetla-Gómez, I.; Munk, A. Autocovariance estimation in regression with a discontinuous signal and m-dependent errors: A difference-based approach. Scand. J. Stat. 2017, 44, 346–368. [Google Scholar] [CrossRef]
  13. Eichinger, B.; Kirch, C. A MOSUM procedure for the estimation of multiple random change points. Bernoulli 2018, 24, 526–564. [Google Scholar] [CrossRef]
  14. Dette, H.; Eckle, T.; Vetter, M. Multiscale change point detection for dependent data. Scand. J. Stat. 2020, 47, 1243–1274. [Google Scholar] [CrossRef]
  15. Chan, K.W. Optimal difference-based variance estimators in time series: A general framework. Ann. Stat. 2022, 50, 1376–1400. [Google Scholar] [CrossRef]
  16. Robbins, M.; Gallagher, C.; Lund, R.; Aue, A. Mean shift testing in correlated data. J. Time Ser. Anal. 2011, 32, 498–511. [Google Scholar] [CrossRef]
  17. den Haan, W.J.; Levin, A.T. A practitioner’s guide to robust covariance matrix estimation. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 1997; Volume 15, pp. 299–342. [Google Scholar]
  18. Chan, K.W.; Yau, C.Y. High-order corrected Estimator of Asymptotic Variance with Optimal Bandwidth. Scand. J. Stat. 2017, 44, 866–898. [Google Scholar] [CrossRef]
  19. Hušková, M.; Kirch, C. A note on studentized confidence intervals for the change-point. Comput. Stat. 2010, 25, 269–289. [Google Scholar] [CrossRef]
  20. Schmitt, P.; Mandel, J.; Guedj, M. A comparison of six methods for missing data imputation. J. Biom. Biostat. 2015, 6, 1. [Google Scholar]
  21. Somasundaram, R.; Nedunchezhian, R. Evaluation of three simple imputation methods for enhancing preprocessing of data with missing values. Int. J. Comput. Appl. 2011, 21, 14–19. [Google Scholar] [CrossRef]
  22. Xie, Y.; Huang, J.; Willett, R. Change-point detection for high-dimensional time series with missing data. IEEE J. Sel. Top. Signal Process. 2012, 7, 12–27. [Google Scholar] [CrossRef]
  23. Follain, B.; Wang, T.; Samworth, R.J. High-dimensional changepoint estimation with heterogeneous missingness. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2021, 84, 1023–1055. [Google Scholar] [CrossRef]
  24. Londschien, M.; Kovács, S.; Bühlmann, P. Change-point detection for graphical models in the presence of missing values. J. Comput. Graph. Stat. 2021, 30, 768–779. [Google Scholar] [CrossRef]
  25. Li, S.; Hu, T.; Wang, L.; McMahan, C.S.; Tebbs, J.M. Regression analysis of group-tested current status data. Biometrika 2024, 111, 1047–1061. [Google Scholar] [CrossRef]
  26. Zhao, Y.; Landgrebe, E.; Shekhtman, E.; Udell, M. Online missing value imputation and change point detection with the gaussian copula. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 27 February–2 March 2022; Volume 36, pp. 9199–9207. [Google Scholar]
  27. McGonigle, E.T.; Cho, H. Robust multiscale estimation of time-average variance for time series segmentation. Comput. Stat. Data Anal. 2023, 179, 107648. [Google Scholar] [CrossRef]
  28. Cho, H.; Fryzlewicz, P. Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm. J. Time Ser. Anal. 2024, 45, 479–494. [Google Scholar] [CrossRef]
  29. Ansley, C.F.; Kohn, R. On the estimation of ARIMA models with missing values. In Time Series Analysis of Irregularly Observed Data: Proceedings of a Symposium Held at Texas A & M University, College Station, TX, USA, 10–13 February 1983; Springer: Berlin/Heidelberg, Germany, 1984; pp. 9–37. [Google Scholar]
  30. Harvey, A.C. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press: New York, NY, USA, 1989. [Google Scholar]
  31. Tsay, R.S. Analysis of Financial Time Series, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2010. [Google Scholar]
  32. Anderson, B.; Moore, J.B. Optimal Filtering; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1979. [Google Scholar]
  33. Durbin, J.; Koopman, S.J. Time Series Analysis by State Space Methods, 2nd ed.; Oxford University Press: Oxford, UK, 2012. [Google Scholar]
  34. Welch, G.; Bishop, G. An Introduction to the Kalman Filter; University of North Carolina: Chapel Hill, NC, USA, 1997. [Google Scholar]
  35. Wijesekara, L.; Liyanage, L. Mind the Large Gap: Novel Algorithm Using Seasonal Decomposition and Elastic Net Regression to Impute Large Intervals of Missing Data in Air Quality Data. Atmosphere 2023, 14, 355. [Google Scholar] [CrossRef]
  36. Wijesekara, W.M.L.K.N.; Liyanage, L. Comparison of Imputation Methods for Missing Values in Air Pollution Data: Case Study on Sydney Air Quality Index. In Advances in Information and Communication; Arai, K., Kapoor, S., Bhatia, R., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 257–269. [Google Scholar]
  37. Moritz, S.; Bartz-Beielstein, T. imputeTS: Time series missing value imputation in R. R J. 2017, 9, 207. [Google Scholar] [CrossRef]
  38. Fletcher, R. Practical Methods of Optimization, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 1987. [Google Scholar]
  39. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  40. Harvey, A.C.; Pierse, R.G. Estimating missing observations in economic time series. J. Am. Stat. Assoc. 1984, 79, 125–131. [Google Scholar] [CrossRef]
  41. Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Ann. de l’IHP Probabilités et Stat. 2012, 48, 1148–1185. [Google Scholar] [CrossRef]
  42. Chen, L.; Wang, W.; Wu, W.B. Inference of Breakpoints in High-dimensional Time Series. J. Am. Stat. Assoc. 2022, 117, 1951–1963. [Google Scholar] [CrossRef]
  43. Wong, K.C.; Li, Z.; Tewari, A. Lasso guarantees for β-mixing heavy-tailed time series. Ann. Stat. 2020, 48, 1124–1142. [Google Scholar] [CrossRef]
  44. Chu, C.S.J.; Hornik, K.; Kaun, C.M. MOSUM tests for parameter constancy. Biometrika 1995, 82, 603–617. [Google Scholar] [CrossRef]
  45. Meier, A.; Kirch, C.; Cho, H. mosum: A package for moving sums in change-point analysis. J. Stat. Softw. 2021, 97, 1–42. [Google Scholar] [CrossRef]
  46. Cho, H.; Kirch, C. Two-stage data segmentation permitting multiscale change points, heavy tails and dependence. Ann. Inst. Stat. Math. 2022, 74, 653–684. [Google Scholar] [CrossRef]
  47. Messer, M.; Kirchner, M.; Schiemann, J.; Roeper, J.; Neininger, R.; Schneider, G. A multiple filter test for the detection of rate changes in renewal processes with varying variance. Ann. Appl. Stat. 2014, 8, 2027–2067. [Google Scholar] [CrossRef]
  48. Niu, Y.S.; Zhang, H. The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 2012, 6, 1306. [Google Scholar] [CrossRef]
  49. Scott, A.J.; Knott, M. A cluster analysis method for grouping means in the analysis of variance. Biometrics 1974, 30, 507–512. [Google Scholar] [CrossRef]
  50. Olshen, A.B.; Venkatraman, E.S.; Lucito, R.; Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5, 557–572. [Google Scholar] [CrossRef]
  51. Fryzlewicz, P. Wild binary segmentation for multiple change-point detection. Ann. Stat. 2014, 42, 2243–2281. [Google Scholar] [CrossRef]
  52. Fryzlewicz, P. Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection. J. Korean Stat. Soc. 2020, 49, 1027–1070. [Google Scholar] [CrossRef]
  53. McGonigle, E.T.; Killick, R.; Nunes, M.A. Detecting changes in mean in the presence of time-varying autocovariance. Stat 2021, 10, e351. [Google Scholar] [CrossRef]
  54. Rockel, T. missMethods: Methods for Missing Data; R package version 0.4.0; R Foundation for Statistical Computing: Vienna, Austria, 2022.
  55. Moritz, S.; Gatscha, S. imputeTS: Time Series Missing Value Imputation in R; R package version 3.3; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  56. Moritz, S.; Sardá, A.; Bartz-Beielstein, T.; Zaefferer, M.; Stork, J. Comparison of different methods for univariate time series imputation in R. arXiv 2015, arXiv:1510.03924. [Google Scholar]
  57. Frick, K.; Munk, A.; Sieling, H. Multiscale change point inference. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2014, 76, 495–580. [Google Scholar] [CrossRef]
  58. Van den Burg, G.J.; Williams, C.K. An evaluation of change point detection algorithms. arXiv 2020, arXiv:2003.06222. [Google Scholar]
  59. Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916. [Google Scholar] [CrossRef]
  60. Liang, X.; Zou, T.; Guo, B.; Li, S.; Zhang, H.; Zhang, S.; Huang, H.; Chen, S.X. Assessing Beijing’s PM2.5 pollution: Severity, weather impact, APEC and winter heating. Proc. R. Soc. A Math. Phys. Eng. Sci. 2015, 471, 20150257. [Google Scholar]
  61. Dai, Q.; Dai, T.; Hou, L.; Li, L.; Bi, X.; Zhang, Y.; Feng, Y. Quantifying the impacts of emissions and meteorology on the interannual variations of air pollutants in major Chinese cities from 2015 to 2021. Sci. China Earth Sci. 2023, 66, 1725–1737. [Google Scholar] [CrossRef]
  62. Chen, S. Beijing Multi-Site Air-Quality Data. UCI Machine Learning Repository 2019. Available online: https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data (accessed on 20 July 2023).
  63. Ministry of Environmental Protection. 2016 Report on the State of the Environment in China. 31 May 2017. Available online: https://english.mee.gov.cn/Resources/Reports/soe/ (accessed on 13 May 2024).
  64. Boubacar Maïnassara, Y.; Ilmi Amir, A. Portmanteau tests for periodic ARMA models with dependent errors. J. Time Ser. Anal. 2024, 45, 164–188. [Google Scholar] [CrossRef]
  65. Ghezal, A.; Alzeley, O. Probabilistic properties and estimation methods for periodic threshold autoregressive stochastic volatility. AIMS Math. 2024, 9, 11805–11832. [Google Scholar] [CrossRef]
  66. Cavicchioli, M. A matrix unified framework for deriving various impulse responses in Markov switching VAR: Evidence from oil and gas markets. J. Econ. Asymmetries 2024, 29, e00349. [Google Scholar] [CrossRef]
  67. Tong, H. Threshold Models in Non-Linear Time Series Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 21. [Google Scholar]
Figure 1. Histogram plots of estimated change points obtained from various methods under model (6), with true change points at 200, 400, 600, and 800.
Figure 2. Daily average concentrations of PM2.5, plotted together with the change points detected by MOSUM.TAVC (left) and WBS2.TAVC (right); detected change points are denoted by blue vertical lines and estimated means by red horizontal lines.
Table 1. Performance comparisons of various univariate imputation methods for models (M1*)–(M3*) over 1000 realizations for T = 1000. The columns −2 through ≥3 report the empirical distribution of p̂ − p.

| Model | Method | MAPE | Root MSE | −2 | −1 | 0 | 1 | 2 | ≥3 | CM |
|-------|--------|------|----------|----|----|---|---|---|----|----|
| (M1*) | ARIMA.Kalman | 0.539 | 3.940 | 0.000 | 0.000 | 0.957 | 0.041 | 0.002 | 0.000 | 0.992 |
| | Linear interpolation | 0.581 | 3.808 | 0.000 | 0.000 | 0.938 | 0.057 | 0.005 | 0.000 | 0.989 |
| | Spline interpolation | 0.603 | 4.075 | 0.000 | 0.000 | 0.914 | 0.078 | 0.006 | 0.002 | 0.986 |
| | Stineman interpolation | 0.570 | 3.727 | 0.000 | 0.000 | 0.941 | 0.055 | 0.004 | 0.000 | 0.990 |
| | Mean imputation | 6.051 | 36.109 | 0.040 | 0.209 | 0.355 | 0.244 | 0.108 | 0.044 | 0.571 |
| | LOCF | 0.911 | 5.817 | 0.000 | 0.000 | 0.804 | 0.167 | 0.026 | 0.003 | 0.969 |
| | SMA | 0.915 | 6.055 | 0.000 | 0.000 | 0.811 | 0.170 | 0.017 | 0.002 | 0.970 |
| | LWMA | 0.800 | 5.368 | 0.000 | 0.000 | 0.861 | 0.126 | 0.011 | 0.002 | 0.978 |
| | EWMA | 0.709 | 4.785 | 0.000 | 0.000 | 0.895 | 0.092 | 0.013 | 0.000 | 0.982 |
| (M2*) | ARIMA.Kalman | 160.729 | 4.430 | 0.000 | 0.000 | 0.933 | 0.060 | 0.007 | 0.000 | 0.987 |
| | Linear interpolation | 199.100 | 5.888 | 0.000 | 0.009 | 0.896 | 0.084 | 0.011 | 0.000 | 0.979 |
| | Spline interpolation | 251.705 | 7.987 | 0.021 | 0.095 | 0.716 | 0.129 | 0.032 | 0.007 | 0.916 |
| | Stineman interpolation | 206.664 | 6.025 | 0.000 | 0.010 | 0.896 | 0.083 | 0.011 | 0.000 | 0.978 |
| | Mean imputation | 566.688 | 6.512 | 0.010 | 0.149 | 0.777 | 0.056 | 0.008 | 0.000 | 0.924 |
| | LOCF | 235.137 | 6.560 | 0.005 | 0.023 | 0.844 | 0.113 | 0.013 | 0.002 | 0.967 |
| | SMA | 161.952 | 5.055 | 0.000 | 0.002 | 0.919 | 0.073 | 0.006 | 0.000 | 0.985 |
| | LWMA | 165.832 | 5.166 | 0.000 | 0.002 | 0.917 | 0.073 | 0.008 | 0.000 | 0.984 |
| | EWMA | 173.513 | 5.369 | 0.000 | 0.003 | 0.909 | 0.081 | 0.007 | 0.000 | 0.983 |
| (M3*) | ARIMA.Kalman | 37.006 | 3.454 | 0.000 | 0.000 | 0.887 | 0.112 | 0.001 | 0.000 | 0.988 |
| | Linear interpolation | 16.695 | 3.966 | 0.000 | 0.000 | 0.892 | 0.107 | 0.001 | 0.000 | 0.989 |
| | Spline interpolation | 19.823 | 4.279 | 0.000 | 0.000 | 0.895 | 0.103 | 0.002 | 0.000 | 0.989 |
| | Stineman interpolation | 15.966 | 3.484 | 0.000 | 0.000 | 0.889 | 0.110 | 0.001 | 0.000 | 0.988 |
| | Mean imputation | 1438.254 | 49.049 | 0.032 | 0.219 | 0.655 | 0.087 | 0.006 | 0.001 | 0.913 |
| | LOCF | 25.292 | 4.805 | 0.000 | 0.000 | 0.901 | 0.098 | 0.001 | 0.000 | 0.989 |
| | SMA | 18.608 | 4.317 | 0.000 | 0.000 | 0.890 | 0.108 | 0.002 | 0.000 | 0.988 |
| | LWMA | 16.948 | 3.960 | 0.000 | 0.000 | 0.892 | 0.106 | 0.002 | 0.000 | 0.988 |
| | EWMA | 15.783 | 3.713 | 0.000 | 0.000 | 0.889 | 0.109 | 0.002 | 0.000 | 0.988 |
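As a point of reference, the MAPE and root-MSE columns above score each imputation method by comparing the imputed values against the held-out true values. The sketch below illustrates one common convention for these two metrics (MAPE expressed in percent, errors evaluated only at the imputed positions); it is a minimal Python illustration with hypothetical data, not the authors' exact implementation:

```python
import numpy as np

def mape(y_true, y_imputed):
    """Mean absolute percentage error, in percent, over the imputed positions."""
    y_true = np.asarray(y_true, dtype=float)
    y_imputed = np.asarray(y_imputed, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_true - y_imputed) / y_true)))

def rmse(y_true, y_imputed):
    """Root mean squared error over the imputed positions."""
    y_true = np.asarray(y_true, dtype=float)
    y_imputed = np.asarray(y_imputed, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_imputed) ** 2)))

# Hypothetical example: score a mean-imputation of four held-out values.
y_true = np.array([10.0, 12.0, 11.0, 13.0])  # true values at the missing positions
y_imp = np.array([11.5, 11.5, 11.5, 11.5])   # values filled in by mean imputation
print(mape(y_true, y_imp), rmse(y_true, y_imp))
```

Lower values of both metrics indicate a closer reconstruction of the masked observations, which is how the methods in Table 1 are ranked.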
Table 2. Performance comparisons for the competing methods for (M1)–(M6) for T = 1000 over 1000 realizations. The columns −2 through ≥2 report the empirical distribution of p̂ − p.

| Model | Method | Size | −2 | −1 | 0 | 1 | ≥2 | CM | RMSE |
|-------|--------|------|----|----|---|---|----|----|------|
| (M1) | MOSUM.TAVC[1] | 0.104 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.997 | 3.007 |
| | MOSUM.TAVC[2] | 0.066 | 0.000 | 0.000 | 0.999 | 0.001 | 0.000 | 0.997 | 3.007 |
| | WBS2.TAVC[1] | 0.035 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.999 | 1.943 |
| | WBS2.TAVC[2] | 0.035 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.999 | 1.894 |
| | DeCAFS | 0.007 | 0.000 | 0.325 | 0.671 | 0.002 | 0.002 | 0.934 | 7.839 |
| | DepSMUCE(0.05) | 0.003 | 0.000 | 0.000 | 0.893 | 0.107 | 0.000 | 0.821 | 4.820 |
| | DepSMUCE(0.1) | 0.020 | 0.000 | 0.000 | 0.711 | 0.289 | 0.000 | 0.857 | 5.811 |
| | WCM.gSa | 0.006 | 0.000 | 0.000 | 0.984 | 0.015 | 0.001 | 0.998 | 1.941 |
| (M2) | MOSUM.TAVC[1] | 0.116 | 0.000 | 0.000 | 0.998 | 0.002 | 0.000 | 0.995 | 3.373 |
| | MOSUM.TAVC[2] | 0.070 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.995 | 3.380 |
| | WBS2.TAVC[1] | 0.049 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.997 | 2.402 |
| | WBS2.TAVC[2] | 0.024 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.997 | 2.402 |
| | DeCAFS | 0.010 | 0.000 | 0.087 | 0.092 | 0.150 | 0.671 | 0.847 | 21.360 |
| | DepSMUCE(0.05) | 0.340 | 0.000 | 0.008 | 0.673 | 0.108 | 0.211 | 0.914 | 11.538 |
| | DepSMUCE(0.1) | 0.476 | 0.000 | 0.010 | 0.640 | 0.129 | 0.221 | 0.943 | 12.202 |
| | WCM.gSa | 0.010 | 0.000 | 0.000 | 0.991 | 0.006 | 0.003 | 0.998 | 2.429 |
| (M3) | MOSUM.TAVC[1] | 0.161 | 0.000 | 0.052 | 0.802 | 0.129 | 0.110 | 0.940 | 2.930 |
| | MOSUM.TAVC[2] | 0.084 | 0.001 | 0.078 | 0.812 | 0.106 | 0.003 | 0.935 | 3.023 |
| | WBS2.TAVC[1] | 0.107 | 0.000 | 0.145 | 0.853 | 0.002 | 0.000 | 0.950 | 2.509 |
| | WBS2.TAVC[2] | 0.062 | 0.001 | 0.175 | 0.824 | 0.000 | 0.000 | 0.938 | 2.785 |
| | DeCAFS | 0.344 | 0.000 | 0.129 | 0.661 | 0.180 | 0.030 | 0.956 | 4.130 |
| | DepSMUCE(0.05) | 1.000 | 0.000 | 0.001 | 0.749 | 0.237 | 0.013 | 0.926 | 2.547 |
| | DepSMUCE(0.1) | 1.000 | 0.000 | 0.000 | 0.713 | 0.261 | 0.026 | 0.939 | 2.633 |
| | WCM.gSa | 0.044 | 0.006 | 0.068 | 0.793 | 0.065 | 0.068 | 0.943 | 2.608 |
| (M4) | MOSUM.TAVC[1] | 0.122 | 0.000 | 0.040 | 0.940 | 0.015 | 0.002 | 0.974 | 2.491 |
| | MOSUM.TAVC[2] | 0.015 | 0.000 | 0.056 | 0.923 | 0.019 | 0.002 | 0.970 | 2.374 |
| | WBS2.TAVC[1] | 0.055 | 0.000 | 0.079 | 0.921 | 0.000 | 0.000 | 0.973 | 2.126 |
| | WBS2.TAVC[2] | 0.037 | 0.000 | 0.170 | 0.830 | 0.000 | 0.000 | 0.956 | 2.742 |
| | DeCAFS | 0.370 | 0.000 | 0.325 | 0.671 | 0.002 | 0.002 | 0.883 | 6.776 |
| | DepSMUCE(0.05) | 1.000 | 0.000 | 0.004 | 0.821 | 0.068 | 0.107 | 0.950 | 2.941 |
| | DepSMUCE(0.1) | 1.000 | 0.000 | 0.001 | 0.778 | 0.043 | 0.178 | 0.964 | 6.776 |
| | WCM.gSa | 0.032 | 0.000 | 0.000 | 0.984 | 0.015 | 0.001 | 0.967 | 2.042 |
| (M5) | MOSUM.TAVC[1] | 0.027 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.998 | 37.926 |
| | MOSUM.TAVC[2] | 0.016 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.998 | 37.926 |
| | WBS2.TAVC[1] | 0.069 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.998 | 25.809 |
| | WBS2.TAVC[2] | 0.033 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.998 | 25.809 |
| | DeCAFS | 0.000 | 0.001 | 0.006 | 0.991 | 0.002 | 0.000 | 0.995 | 70.859 |
| | DepSMUCE(0.05) | 0.058 | 0.000 | 0.005 | 0.527 | 0.016 | 0.452 | 0.871 | 492.296 |
| | DepSMUCE(0.1) | 0.119 | 0.000 | 0.000 | 0.462 | 0.024 | 0.514 | 0.919 | 512.047 |
| | WCM.gSa | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.998 | 24.396 |
| (M6) | MOSUM.TAVC[1] | 0.154 | 0.001 | 0.000 | 0.999 | 0.001 | 0.000 | 0.998 | 1.196 |
| | MOSUM.TAVC[2] | 0.107 | 0.000 | 0.000 | 0.995 | 0.004 | 0.001 | 0.998 | 1.198 |
| | WBS2.TAVC[1] | 0.091 | 0.000 | 0.000 | 0.997 | 0.003 | 0.000 | 0.999 | 0.815 |
| | WBS2.TAVC[2] | 0.042 | 0.000 | 0.000 | 0.999 | 0.001 | 0.000 | 0.999 | 0.815 |
| | DeCAFS | 0.774 | 0.000 | 0.055 | 0.215 | 0.071 | 0.659 | 0.898 | 16.904 |
| | DepSMUCE(0.05) | 0.417 | 0.000 | 0.010 | 0.701 | 0.193 | 0.096 | 0.880 | 7.476 |
| | DepSMUCE(0.1) | 0.498 | 0.000 | 0.012 | 0.681 | 0.182 | 0.125 | 0.906 | 8.363 |
| | WCM.gSa | 0.017 | 0.000 | 0.000 | 0.972 | 0.021 | 0.007 | 0.997 | 2.867 |
Table 3. Change points detected from the daily average concentrations of PM2.5.

| Method | Detected Change Point Locations (Time) |
|--------|----------------------------------------|
| MOSUM.TAVC | 2015-10-31, 2016-01-03, 2016-10-09 |
| WBS2.TAVC | 2016-01-03 |
| DeCAFS | 2014-02-12, 2014-02-24, 2014-02-26, 2014-10-07, 2014-10-11, 2015-11-26, 2015-11-30, 2015-12-01, 2015-12-24, 2015-12-26, 2016-01-03, 2016-03-02, 2016-03-04, 2016-09-21, 2016-12-19, 2016-12-21, 2016-12-29, 2017-01-07, 2017-01-27, 2017-01-28 |
| DepSMUCE(0.05) | 2014-02-12, 2014-02-26, 2015-11-26, 2016-01-03, 2016-12-16, 2017-01-07 |
| DepSMUCE(0.1) | 2014-02-12, 2014-02-26, 2015-05-01, 2015-11-26, 2015-12-01, 2016-01-03, 2016-12-16, 2017-01-07 |

Haile, T.T.; Tian, F.; AlNemer, G.; Tian, B. Multiscale Change Point Detection for Univariate Time Series Data with Missing Value. Mathematics 2024, 12, 3189. https://doi.org/10.3390/math12203189