Article

A Functional Data Approach for Continuous-Time Analysis Subject to Modeling Discrepancy under Infill Asymptotics

1 Department of Economics, Cross Appointed to Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2 Big Data Research Lab, University of Waterloo, Waterloo, ON N2L 3G1, Canada
3 Labor and Worklife Program, Harvard University, Cambridge, MA 02138, USA
4 Ordered Number Technology Inc., Shanghai 200131, China
5 Department of Anthropology, Economics and Political Science, MacEwan University, Edmonton, AB T5J 4S2, Canada
6 School of Management, Economics, and Mathematics, King’s University College at Western University, London, ON N6A 2M3, Canada
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(20), 4386; https://doi.org/10.3390/math11204386
Submission received: 17 August 2023 / Revised: 17 October 2023 / Accepted: 19 October 2023 / Published: 22 October 2023
(This article belongs to the Special Issue Financial Econometrics and Machine Learning)

Abstract:
Parametric continuous-time analysis often entails derivations of continuous-time models from predefined discrete formulations. However, undetermined convergence rates of frequency-dependent parameters can result in ill-defined continuous-time limits, leading to modeling discrepancy, which impairs the reliability of fitting and forecasting. To circumvent this issue, we propose a simple solution based on functional data analysis (FDA) and truncated Taylor series expansions. It is demonstrated through a simulation study that our proposed method is superior—compared with misspecified parametric methods—in fitting and forecasting continuous-time stochastic processes, while the parametric method slightly dominates under correct specification, with comparable forecast errors to the FDA-based method. Due to its generally consistent and more robust performance against possible misspecification, the proposed FDA-based method is recommended in the presence of modeling discrepancy. Further, we apply the proposed method to predict the future return of the S&P 500, utilizing observations extracted from a latent continuous-time process, and show the practical efficacy of our approach in accurately discerning the underlying dynamics.

1. Introduction

Much of finance and economics is about the study of dynamics over time, in which analysis using time-series data plays a vital part. Despite discrete observations in practice, many time-series data such as stock prices, interest rates, and GDP are essentially drawn from their continuous-time underlying processes, for which consistent estimation can be achieved from continuous-time analyses but not necessarily from their discrete-time counterparts. Thus, the former has become increasingly incorporated by modern time-series analysis (e.g., [1,2,3,4]).
In traditional parametric continuous-time analyses, the modeling routinely originates in a discrete-time setting and is then extended to a continuous-time formulation as the length of the time window shrinks towards zero [3]. However, such derivations often impose strong restrictions when specifying the convergence rates of frequency-dependent (hereafter, f.d.) parameters, resulting in models that cannot adapt to real-world data and thus in forecasting failure [4,5,6,7,8,9,10,11,12]. For instance, under different sets of conditions on the convergence speeds of the parameters, many GARCH-like processes admit different diffusion limits (e.g., [6,9,11,13]). Moreover, through a testing procedure that discriminates between the classes of deterministic and stochastic volatility models, ref. [7] discovered that, contrary to the traditional stochastic view of short-term interest rates, the Japanese short-term interest rate in fact follows a deterministic pattern. Consequently, a discrepancy arises in continuous-time modeling, as it is debatable which assumption is the correct one (e.g., [14], pp. 176–178, [15]), and mistaking the assumption will lead to a misspecified limit and thereby unreliable analysis results. In such circumstances, nonparametric approaches appear to be appealing tools that can adapt to the true limits thanks to their data-driven nature, bypassing this discrepancy in continuous-time analysis [16].
There is a large body of literature demonstrating the effectiveness of nonparametric methods for accurate estimation and forecasting in cases where parametric assumptions are deemed to be inadequate (e.g., [17,18,19,20,21,22,23,24,25,26]). Many studies incorporate nonparametric methods to approximate the density of the states in the absence of a closed-form expression, so that they could use maximum likelihood estimation (MLE) for continuous-time diffusion models (e.g., [22,27,28]). However, they still presume the parametric format of the underlying processes. Another stream of literature uses kernel-based methods to estimate the processes that are nonanticipative smooth functions with unknown structures (e.g., [23]) and make forecasts through conditional density estimation (e.g., [19,20,29,30]). It has been shown, however, that traditional kernel estimators can become inconsistent as the sampling density grows despite the underlying processes becoming further revealed [18,31], and the infill asymptotics (i.e., the asymptotic properties achieved as the sample becomes increasingly dense [32]) require the careful consideration of the dependence among observations, which is substantial work [33].
The main contribution of our paper is to propose a fitting and forecasting approach from the viewpoint of functional data analysis (FDA) to accommodate f.d. data structures and make good use of high-frequency data in order to achieve robust infill asymptotics. Indeed, the FDA fitting method, by design, deals with a collection of subsets of finite-dimensional parameter spaces, which becomes richer and denser with an enlarging sample, and the target function can be consistently estimated by optimizing an empirical criterion (e.g., [34,35,36]). Exploiting this feature, our method employs the FDA-based approach to fit the underlying process using local polynomial bases and obtains the forecasts by extending the movement of the process based on the boundary derivatives of the functional fitting. FDA is renowned for its ability to uncover the dynamics of unknown continuous processes without necessitating stringent assumptions about the data structure [26,37,38]. However, to the best of our knowledge, its application in tracing f.d. data structures and performing forecasting in continuous-time analysis remains underexplored in the existing literature. Thus, with a simulation study, this paper illustrates the procedure and properties of the proposed FDA-based method and emphasizes that this method excels when the convergence rate of f.d. parameters is undefined, effectively mitigating the problem of misspecification prevalent in parametric estimation. In addition, we remark that functional approaches can be further developed in this context to achieve desired infill asymptotics with bounded or unbounded domains. To improve readability, all proofs are placed in Appendix A.
In addition to the simulation analysis, we substantiate the practical utility of the proposed method in forecasting S&P 500 prices, a topic of enduring significance that has consistently commanded the interest of financial practitioners and researchers. Over the years, various statistical and mathematical methods have emerged to offer insights into stock price trends, including but not limited to time-series analysis (e.g., [39,40,41]); machine learning algorithms (e.g., [22,42,43]); technical indicators (e.g., [44,45,46]); and fundamental analysis (e.g., [47,48,49]). However, to the best of our knowledge, the utilization of FDA for future stock price prediction appears to be infrequent. This paper adopts the FDA-based approach to predict the S&P 500 log-returns. Through the root mean squared forecast error (RMSFE) across different sample sizes and frequencies, we examine the performance of the proposed method in forecasting the process a given step ahead.
The rest of this paper is structured as follows. In Section 2, we explain the FDA-based method, followed by a discussion of the large-sample properties. Section 3 sets up a simulation study and takes strong GARCH(1,1) as a motivating example to show that in continuous-time analysis, while parametric methods—such as MLE—encounter discrepancy, the proposed FDA-based method can provide robust and reliable estimation and forecasting. We then present an application of the proposed method in predicting future S&P 500 prices in Section 4, and Section 5 concludes the paper.

2. Methods

In this section, we explain the functional fitting and forecasting methods, followed by a discussion of the large-sample properties of the estimators. The method concerns target functions that are continuous but not necessarily differentiable. While smoothness assumptions are often imposed in FDA to accommodate incomplete and noisy observations, as noted by [50], we construct the functional estimator through the Bernstein polynomial approximation, which induces the convergence that requires only the continuity of the underlying process.

2.1. Fitting and Forecasting

Consider a complete probability space $(\Omega, \mathcal{F}, P)$ and a re-scaled bounded time interval $[0, 1]$. The sample path of the stochastic process $\{X(t) : t \in [0, 1]\}$ for state $\omega \in \Omega$ over $[0, 1]$ is denoted as $X_\omega$. Realizations are drawn at countably many time points with observational error $\epsilon_{\omega, t}$; as such,

$$X_{\omega, t} = X_\omega(t) + \epsilon_{\omega, t}. \tag{1}$$

In parametric modeling, $X_\omega(t)$ is the conditional mean with a specified functional form, whereas in FDA, $X_\omega(t)$ is the underlying sample path assumed to be continuous at any given point $t$. Thus, in the discussion below, we establish the properties of our fitting and forecasting quantities through an arbitrarily close approximation of $X_\omega(t)$ by the Bernstein polynomial $B_K(t, X_\omega)$, so that for each $K \in \mathbb{N}$,

$$B_K(t, X_\omega) := \sum_{k=0}^{K} X_\omega\!\left(\frac{k}{K}\right) \binom{K}{k}\, t^k (1-t)^{K-k}, \tag{2}$$
whence we have the following lemma.
Lemma 1.
Consider $B_K(t, X_\omega)$ as in Equation (2) with the continuous sample path $X_\omega$ for any given $\omega \in \Omega$. For any $\varepsilon > 0$, there exists some $\delta(\varepsilon) > 0$ such that $|t_1 - t_2| \le \delta(\varepsilon)$ implies $|X_\omega(t_1) - X_\omega(t_2)| \le \varepsilon/2$ for all $t_1, t_2 \in [0, 1]$. Then, for all $K \ge 4 \sup_{t \in [0,1]} |X_\omega(t)| / (\delta^2(\varepsilon)\,\varepsilon)$, the following holds:

$$\sup_{t \in [0, 1]} \left| B_K(t, X_\omega) - X_\omega(t) \right| \le \varepsilon.$$
Lemma 1 suggests that, fixing everything else, for any given $\varepsilon$, a high-degree polynomial is needed for approximation if the continuity of $X_\omega$ requires a very small $\delta(\varepsilon)$, while a low-degree polynomial achieves the desired approximation if a large $\delta(\varepsilon)$ suffices for the given $\varepsilon$ (see, e.g., Theorem 5.14 [51]). This lemma also implies that while fitting any continuous process, the typical smoothness assumptions for functional data can be imposed on the Bernstein approximating polynomials $B_K(t, X_\omega)$, and the size of the remaining estimation error can be regulated by a high-degree $B_K(t, X_\omega)$. Three remarks follow Lemma 1. First, the lemma is based on the continuity of $X_\omega(t)$, which does not require the continuity of the observed series. Second, the lemma holds for any continuous sample path and is thus also applicable when $X_\omega(t)$ itself is a diffusion process of Hölder continuity. Last but not least, there are various methods to obtain approximations to a continuous function with desirably small errors (Chapter 10 [52]); here we adopt the Bernstein polynomial for the delivery of the discussion. Hence, the asymptotics in the current study are established with the help of the parametrization of $B_K(t, X_\omega)$, while in practice we do not directly estimate or determine the degree $K$ in $B_K(t, X_\omega)$. Instead, the degree of the approximating polynomials is controlled through the construction of the functional estimator, as explained below.
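To make the role of Lemma 1 concrete, the following minimal Python sketch (an illustration, not part of the paper's R/MATLAB code) evaluates the Bernstein approximation of a continuous but non-differentiable path and shows the sup-norm error over a grid shrinking as the degree $K$ grows; the test function and grid are illustrative choices.

```python
import math

def bernstein_approx(f, K):
    """Return the degree-K Bernstein approximation B_K(t, f) of f on [0, 1]."""
    def B(t):
        return sum(
            f(k / K) * math.comb(K, k) * t**k * (1 - t)**(K - k)
            for k in range(K + 1)
        )
    return B

# Uniform approximation improves as K grows (Lemma 1): the sup-norm error on a
# grid shrinks even for a continuous, non-differentiable path.
f = lambda t: abs(t - 0.5)          # continuous but not differentiable at t = 0.5
grid = [i / 200 for i in range(201)]
err = {K: max(abs(bernstein_approx(f, K)(t) - f(t)) for t in grid)
       for K in (10, 40, 160)}
```

Consistent with the lemma, the error decreases monotonically in $K$, and its decay is slow near the kink at $t = 0.5$, where the modulus of continuity forces a small $\delta(\varepsilon)$.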
In the spirit of FDA [38], one can construct a B-spline representation of a certain degree for the polynomial approximation $B_K(t, X_\omega)$, such that $B_K(t, X_\omega) = \Phi(\cdot)^\top C_\omega$, where $\Phi$ is a B-spline basis containing a vector of $Q$ basis functions, $C_\omega$ is a vector of $Q$ corresponding basis coefficients, and the superscript “$\top$” indicates the transpose operation. Then, Equation (1) can be rewritten as

$$X_{\omega, t} \approx \Phi(t)^\top C_\omega + \epsilon_{\omega, t}. \tag{3}$$
With the predetermined basis functions $\Phi$ and the $J$ observations $\{X_{t_j}\}_{j=1}^J := \{X_{\omega, t_j}\}_{j=1}^J$ at $\{t_j\}_{j=1}^J \subset [0, 1-\Delta]$ for some $0 < \Delta < 1$, the coefficients $C_\omega$ are estimated by minimizing the penalized sum of squares $J^{-1} \sum_{j=1}^J [X_{t_j} - \tilde X(t_j)]^2 + \lambda \int_0^1 [\tilde X^{(2)}(t)]^2 \, dt$ with a tuning parameter $\lambda$ and some fitted function $\tilde X(\cdot)$, which yields

$$\tilde C_\omega := \left[ \frac{1}{J} \sum_{j=1}^J \Phi(t_j) \Phi(t_j)^\top + \lambda \int_0^1 \Phi^{(2)}(t)\, \Phi^{(2)}(t)^\top \, dt \right]^{-1} \frac{1}{J} \sum_{j=1}^J \Phi(t_j)\, X_{\omega, t_j}, \tag{4}$$
where the superscript “$(2)$” indicates the second-order derivative of a function (the second-order derivative of the fitted function is one option for the roughness penalty; other penalties can also be applied depending on the situation). In practice, $Q$ and $\lambda$ can be selected through certain data-driven algorithms (e.g., [38]). Henceforth, the fitted process of $X_\omega(t)$ is defined as

$$\tilde X_\omega(t) := \Phi(t)^\top \tilde C_\omega. \tag{5}$$
By choosing a B-spline basis of a proper degree larger than some value $R$ for the fitted process $\tilde X_\omega(t)$, the $\Delta$-step-ahead predictor at any given $t$, $\hat X_\omega(t + \Delta)$, is defined as a truncated Taylor expansion with the first $R$ derivatives of the fitted function, such that

$$\hat X_\omega(t + \Delta) := \sum_{r=0}^{R} \frac{1}{r!} \Delta^r \tilde X_\omega^{(r)}(t) = \sum_{r=0}^{R} \frac{1}{r!} \Delta^r \tilde C_\omega^\top \Phi^{(r)}(t), \qquad t \in (0, 1-\Delta]. \tag{6}$$
The validity of the above Taylor expansion requires the fitted function $\tilde X_\omega(t)$ to be at least $R$ times continuously differentiable, while the underlying process $X_\omega(t)$ only needs to be continuous. We want the order $R$ of continuous differentiability and the degree $K$ of the approximating polynomial $B_K(t, X_\omega)$ to satisfy $R = K + 1$: the Taylor polynomial approximation will have a non-zero residual if $B_K(t, X_\omega)$ is of a higher degree (i.e., if $R - 1 < K$), while a higher degree of the functional estimator than necessary (i.e., $R - 1 > K$) will bring extra volatility to the fitted and forecast values. In practice, $R$ can also be selected through certain data-driven methods, such as those explained in the simulation and empirical studies below. Due to the features of the B-spline basis employed, one can adopt the order $R + 3$ to ensure the desired differentiability of $\tilde X_\omega(t)$. The half-open interval $(0, 1-\Delta]$ indicates that, technically, the forecast cannot be achieved without any observation, and the forecast can be obtained as far as $t + \Delta = 1$. Furthermore, although the derivatives $\tilde X_\omega^{(r)}(t)$ are applied in the approximation, we do not actually need these derivatives to converge to the derivatives of the truth (even if the latter exist); only the convergence of the forecasting function in level is of interest.
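The penalized fit of Equation (4) and the truncated Taylor forecast of Equation (6) can be sketched as follows. For transparency, this illustration replaces the B-spline basis $\Phi$ with a simple monomial basis (an assumption made for brevity, not the paper's choice), so the roughness-penalty matrix has a closed form; the function names are ours.

```python
import math
import numpy as np

def fit_coeffs(ts, xs, Q, lam):
    """Equation (4): penalized least squares, with a monomial basis standing in for Phi."""
    J = len(ts)
    Phi = np.vander(ts, Q, increasing=True)        # row j = [1, t_j, ..., t_j^{Q-1}]
    # Penalty P[i, k] = integral_0^1 phi_i''(t) phi_k''(t) dt, closed form for monomials.
    P = np.zeros((Q, Q))
    for i in range(2, Q):
        for k in range(2, Q):
            P[i, k] = i * (i - 1) * k * (k - 1) / (i + k - 3)
    A = Phi.T @ Phi / J + lam * P
    b = Phi.T @ np.asarray(xs) / J
    return np.linalg.solve(A, b)

def phi_deriv(t, Q, r):
    """r-th derivative of the monomial basis evaluated at t."""
    v = np.zeros(Q)
    for i in range(r, Q):
        v[i] = math.prod(range(i - r + 1, i + 1)) * t ** (i - r)
    return v

def fda_forecast(c, t, delta, R):
    """Equation (6): extend the fitted curve delta ahead via its first R derivatives."""
    Q = len(c)
    return sum(delta**r / math.factorial(r) * c @ phi_deriv(t, Q, r)
               for r in range(R + 1))
```

For a path that is itself a cubic polynomial, a cubic basis with $R = 3$ reproduces the one-step extension essentially exactly, mirroring the $R - 1 = K$ matching discussed above.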

2.2. Large-Sample Properties

We now discuss the consistency and the asymptotic normality of our functional predictors based on the following assumptions.
Assumption 1.
Consider a continuous sample path $X_\omega(t)$ where for any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that $|t_1 - t_2| \le \delta(\varepsilon)$ implies $|X_\omega(t_1) - X_\omega(t_2)| \le \varepsilon/2$. Then, we have the following:
(a) 
The sample path X ω ( t ) is observed on a set of evenly spaced time points.
(b) 
The observational error $\epsilon_{\omega, t_j}$ is uncorrelated across $\{t_j\}_{j=1}^J \subset [0, 1]$, with $E[\epsilon_{\omega, t} \mid X_\omega(s)] = 0$ for $s < t$ and $\mathrm{Var}[\epsilon_{\omega, t}] < c < \infty$ for all $t \in [0, 1]$ and some constant $c$.
(c) 
$Q \asymp J^{\alpha_1}$ and $\lambda \asymp J^{\alpha_2}$ with $0 < \alpha_1 < 1$ and $\alpha_2 < 0$.
For simplicity, we assume in Assumption 1(a) that the observations are equally spaced without a loss of generality. A less restrictive version would be to assume that the ratio between the lengths of the largest and the smallest time windows is bounded above and away from zero as in [53]. Part (b) states a sufficient condition to achieve the consistency of the functional estimator X ˜ ω as in Equation (5), while a proper “low correlation” assumption for this error term is also sufficient. Finally, part (c) indicates that the dimension of the basis expansion shall increase with the sample size to allow for a consistent functional estimator X ˜ ω , and, meanwhile, the tuning parameter shall be kept to o ( 1 ) so that the estimation error introduced by the roughness penalty dies down as the sample size grows towards infinity.
For any given $0 < \Delta < 1$, we consider the convergence of the forecasting process $\hat X_\omega$ over $(\Delta, 1)$ or, equivalently, over $t \in (0, 1-\Delta)$. Note that even though the forecasting process can be defined at the point $t + \Delta = 1$ as in Equation (6), the asymptotics exclude the point $t = 1 - \Delta$ due to the limited boundary performance.
Theorem 1
(Convergence of $\hat X_\omega$ given $\Delta$). Consider a continuous sample path $X_\omega$ on $[0, 1]$ such that for any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ for which $|t_1 - t_2| \le \delta(\varepsilon)$ implies $|X_\omega(t_1) - X_\omega(t_2)| \le \varepsilon/2$.
Let $\mathcal{F}_K$ be the collection of all the (local) polynomials over $[0, 1-\Delta]$ that are at least $K + 1$ times continuously differentiable, and let $F_K : \mathcal{F}_K \to \mathbb{R}$ be the linear functional such that $F_K(\psi) := \langle \hat X_\omega(\cdot + \Delta) - B_K(\cdot + \Delta, X_\omega), \psi \rangle$ for $\psi \in \mathcal{F}_K$.
If Assumption 1 holds, then for any given $K$ and $R = K + 1$, $F_K(\psi) \to_p 0$ as $J \to \infty$ for all $\psi \in \mathcal{F}_K$, whence under $K \to \infty$, $\sup_{t \in (0, 1-\Delta)} |\hat X_\omega(t + \Delta) - X_\omega(t + \Delta)| \to_p 0$ as $J \to \infty$.
Theorem 1 establishes the convergence of the forecasting process $\hat X_\omega(\cdot + \Delta)$ through its weak convergence to the Bernstein approximating polynomial $B_K(\cdot + \Delta, X_\omega)$ and the uniform convergence of $B_K(\cdot, X_\omega)$ to $X_\omega$. Specifically, for any given $K$ and $R = K + 1$, $\hat X_\omega(\cdot + \Delta)$ weakly converges to the approximating polynomial $B_K(\cdot + \Delta, X_\omega)$ as the sample size $J$ grows, while $B_K(\cdot, X_\omega)$ converges uniformly to the true path $X_\omega$ as $K$ grows, regardless of $J$. Hence, $\hat X_\omega(\cdot + \Delta)$ converges uniformly to the true path $X_\omega$ as both $K$ and $J$ grow. For $R = K + 1$ with $K \ge 4 \sup_{t \in [0,1]} |X_\omega(t)| / (\delta^2(\varepsilon)\,\varepsilon)$, it is worth noting that for a B-spline basis of order $R + 3$ to span the function space, the number of basis functions $Q$ needs to satisfy $Q \ge 2(R + 2) + 1$, while $Q$ grows towards infinity at a lower speed than $J$; hence, we also have $K = o(J)$ as $K, J \to \infty$.
Now we explore the asymptotic distribution of the functional prediction X ^ ω . However, before we state any further assumptions or the resulting asymptotic properties, it is important to note that the realizations are subject to observational error ϵ ω , t , the behavior of which can impact the asymptotic distribution of X ^ ω . For example, when a continuous sample path X ω is observed without noise, it is more plausible to consider the functional fitting as an interpolation—in this case, we do not necessarily have asymptotic normality for X ^ ω . On the other hand, if X ω is observed with noise, we can establish pointwise asymptotic normality under the following sufficient conditions:
Assumption 2.
(a) 
The noise $\epsilon_{t_j}$ is independent across all $\{t_j\}_{j=1}^J \subset [0, 1]$.
(b) 
$Q \asymp J^{\alpha_1}$ and $\lambda \asymp J^{\alpha_2}$ with $\alpha_1 > 0$ and $\alpha_2 < 0$, such that $Q^2 \lambda = o(J^{-1/2})$ and $J^{-1} Q^2 = O(J^{-1/2})$.
Assumption 2(a) states a sufficient condition for generating the asymptotic normality by the Lyapunov CLT, which allows for non-identically distributed variables. If one has mixing random processes, a different version of this assumption can be specified by imposing identical distribution, where a different CLT can be applied without affecting the result regarding asymptotic normality. Part (b) assumes a stronger condition on the orders of the two parameters Q and λ , so that the non-normal part of the estimation error is dominated and will not be inflated while deriving the asymptotic distribution.
Theorem 2.
Consider a continuous sample path $X_\omega$ on $[0, 1]$ where for any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that $|t_1 - t_2| \le \delta(\varepsilon)$ implies $|X_\omega(t_1) - X_\omega(t_2)| \le \varepsilon/2$.
Under Assumptions 1 and 2, the following asymptotic normality holds for all $t \in (0, 1-\Delta]$, as $R \to \infty$ and $J \to \infty$:

$$V(t; \Delta, R, \Phi)^{-1/2} \left[ \hat X_\omega(t + \Delta) - X_\omega(t + \Delta) \right] \overset{d}{\to} N(0, 1),$$

where $\sigma_j^2 := \mathrm{Var}[\epsilon_{t_j}]$ for all $j$, and

$$V(t; \Delta, R, \Phi) := \frac{1}{J^2} \sum_{j=1}^J \sigma_j^2 A_{t_j}^2(t; \Delta, R, \Phi), \qquad A_{t_j}(t; \Delta, R, \Phi) := \sum_{r=0}^{R} \frac{1}{r!} \Delta^r \Phi^{(r)}(t)^\top \left[ \int_0^1 \Phi(s) \Phi(s)^\top \, ds \right]^{-1} \Phi(t_j).$$
Theorem 2 establishes the pointwise asymptotic normality of the functional predictor $\hat X_\omega$ through the asymptotically normal prediction error from $\hat X_\omega$ to the Bernstein approximating polynomial $B_K(t, X_\omega)$ and the convergence of $B_K(t, X_\omega)$ to $X_\omega$ as $K$ increases towards infinity. Specifically, $\hat X_\omega(t + \Delta) - X_\omega(t + \Delta)$ can be decomposed into $\hat X_\omega(t + \Delta) - B_K(t + \Delta, X_\omega)$ and $B_K(t + \Delta, X_\omega) - X_\omega(t + \Delta)$, where the former, scaled by $V(t; \Delta, R, \Phi)^{-1/2}$, is asymptotically normal, and the latter is dominated because a desirably small approximation error can be achieved by a sufficiently large $K$ (and $R$). Meanwhile, since the inverse factorial series $\sum_{r=0}^{\infty} (r!)^{-1}$ converges, as $R \to \infty$ we have $\sum_{r=0}^{R} (r!)^{-1} \Delta^r < C < \infty$ for some small $\Delta$ and some fixed $C \in \mathbb{R}$, whence $A_{t_j}(t; \Delta, R, \Phi)$, and thus $A_{t_j}^2(t; \Delta, R, \Phi)$, is bounded. Then, by Assumptions 1(b) and 2(a), the term $V(t; \Delta, R, \Phi)$, and hence the variance of $\hat X_\omega(t + \Delta) - X_\omega(t + \Delta)$, is of order $J^{-1}$. As a result, $\sqrt{J}$-asymptotic normality can be achieved.

3. Simulation

We now explain the numerical procedure of the proposed methods through a simulation study and compare its performance to that of a parametric approach under both correct specification and misspecification, thereby revealing FDA’s superiority in tracking the f.d. data structures of stochastic processes. The simulation study was conducted using the software R (version 4.0.5) and MATLAB (version R2023a), and the computer code is available in the Supplementary Materials.

3.1. The Data-Generating Process

The data-generating process is based on the strong GARCH(1,1), where $y_k = (S_k - S_{k-1}) / S_{k-1}$, $k = 1, 2, \ldots$, is the arithmetic return on a financial asset with the price $S_k$. Let $h$ be the time window, and recall that a strong GARCH(1,1) process is represented by $y_k = \mu + \epsilon_k$ with $\epsilon_k \sim N(0, V_k)$ given the $\sigma$-algebra generated by $\epsilon_{k-1}$, and $V_k = \omega_h + \xi_h \epsilon_{k-1}^2 + \gamma_h V_{k-1}$ for the f.d. parameters $\omega_h, \xi_h, \gamma_h > 0$ and $\xi_h + \gamma_h < 1$ [54,55].
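The strong GARCH(1,1) recursion above can be simulated directly to build intuition; a minimal Python sketch follows (the paper's simulations use R/MATLAB, and the parameter values in the test are arbitrary illustrative choices, not the paper's Table 1 values).

```python
import numpy as np

def simulate_garch(mu, omega_h, xi_h, gamma_h, n, v0, seed=0):
    """Strong GARCH(1,1): y_k = mu + eps_k with eps_k ~ N(0, V_k),
    where V_k = omega_h + xi_h * eps_{k-1}^2 + gamma_h * V_{k-1}."""
    rng = np.random.default_rng(seed)
    y, V = np.empty(n), np.empty(n)
    eps_prev, V_prev = 0.0, v0
    for k in range(n):
        V[k] = omega_h + xi_h * eps_prev**2 + gamma_h * V_prev
        eps = rng.normal(0.0, np.sqrt(V[k]))
        y[k] = mu + eps
        eps_prev, V_prev = eps, V[k]
    return y, V
```

With $\xi_h + \gamma_h < 1$, the conditional variance is stationary with unconditional mean $\omega_h / (1 - \xi_h - \gamma_h)$, which a long simulated path reproduces approximately.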
As $h$ shrinks, $(S_{kh}, V_{kh})$ weakly converges to its continuous-time limit $(S_t, V_t)$ [6]. This is a simple but effective example since, under different assumptions on the f.d. parameters' convergence rates (i.e., the convergence of $\xi_h$ can occur at rate $h$ or at rate $\sqrt{h}$), $(S_t, V_t)$ has been shown to be a diffusion process solving either a stochastic volatility (SV) model [13,56]
$$d \begin{pmatrix} s_t \\ v_t \end{pmatrix} = \begin{pmatrix} a \\ \beta - \frac{1}{2}\sigma^2 + \alpha \exp(-v_t) \end{pmatrix} dt + \begin{pmatrix} \sqrt{1-\rho^2}\, \exp(v_t/2) & \rho \exp(v_t/2) \\ 0 & \sigma \end{pmatrix} \begin{pmatrix} dW_t^1 \\ dW_t^2 \end{pmatrix}, \tag{7}$$
where $s_t := \log S_t$, $v_t := \log V_t$, and $W_t^1$ and $W_t^2$ are independent standard Brownian motions, or a deterministic volatility (DV) model [6]
$$d \begin{pmatrix} s_t \\ v_t \end{pmatrix} = \begin{pmatrix} a \\ \beta + \alpha \exp(-v_t) \end{pmatrix} dt + \begin{pmatrix} \exp(v_t/2) \\ 0 \end{pmatrix} dW_t^1. \tag{8}$$
Indeed, despite the fact that the ARCH-type diffusion models under both SV and DV have been frequently applied in the literature to estimate continuous-time processes (e.g., [8,9,22,27,57,58]), there has been considerable debate as to the choice of the convergence rate assumptions, which makes parametric analysis such as MLE tailored for either limit questionable (e.g., [14,59]). More generally, if the f.d. data structures have different limiting processes under different convergence conditions, it would be impractical for parametric continuous-time generalizations to exhaust all possible limits to choose the correct one.
To demonstrate how the FDA-based method can bypass this issue, we generate pseudo-continuous return and volatility processes, as shown in Equations (7) and (8), over the interval $[0, 1]$, resembling “continuous-time” processes with five equally spaced trading points per day for one year of 252 trading days (i.e., 5 time points per day × 252 days per year × 1 year = 1260 data points in total). The true values of the parameters in Table 1 and Table 2 are used to generate $N = 1000$ pseudo-continuous log-return and log-volatility trajectories from the SV limit, denoted respectively by $s_S(t)$ and $v_S(t)$, and from the DV limit, denoted respectively by $s_D(t)$ and $v_D(t)$. It should be noted that volatility is not always observed in practice, and discussions addressing parametric analysis under such unobservability are available in the literature (e.g., [22,27,60]). However, the FDA-based method can handle the return and volatility processes separately; hence, in what follows, we assume that volatility is accessible without harming the analysis of the return. Should we be interested in the unobserved volatility, the same FDA-based method could be applied to suitable proxies. We take the pseudo-continuous-time processes as the observations, denoted as $s_S(t_j)$, $v_S(t_j)$, $s_D(t_j)$, and $v_D(t_j)$ for $j = 1, \ldots, J$, to numerically mimic the scenario where the sampling time window becomes arbitrarily small, and the estimation is performed based on one-month rolling windows (i.e., $J = 1260/12 = 105$) and eight-month rolling windows (i.e., $J = 1260/12 \times 8 = 840$), respectively. Cases with an evenly spaced “daily” sampling density are also included to demonstrate the performance of the FDA-based method under different sample sizes and frequencies.
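A pseudo-continuous trajectory from the SV limit in Equation (7) can be generated with an Euler–Maruyama discretization. The sketch below is a minimal Python illustration; all parameter values passed to it are illustrative stand-ins, not the true values from Table 1 and Table 2.

```python
import numpy as np

def simulate_sv(a, beta, alpha, sigma, rho, s0, v0, n=1260, T=1.0, seed=0):
    """Euler-Maruyama discretization of the SV limit in Equation (7):
    ds = a dt + exp(v/2) (sqrt(1-rho^2) dW1 + rho dW2),
    dv = (beta - sigma^2/2 + alpha exp(-v)) dt + sigma dW2."""
    rng = np.random.default_rng(seed)
    dt = T / n
    s, v = np.empty(n + 1), np.empty(n + 1)
    s[0], v[0] = s0, v0
    for k in range(n):
        dW1, dW2 = rng.normal(0.0, np.sqrt(dt), size=2)
        s[k + 1] = s[k] + a * dt + np.exp(v[k] / 2) * (np.sqrt(1 - rho**2) * dW1 + rho * dW2)
        v[k + 1] = v[k] + (beta - 0.5 * sigma**2 + alpha * np.exp(-v[k])) * dt + sigma * dW2
    return s, v
```

The $n = 1260$ default matches the paper's grid of five points per day over 252 trading days; the DV limit of Equation (8) is obtained analogously by dropping the $\sigma$ and $\rho$ terms.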

3.2. Fitting and Forecasting with FDA

Now we present the procedures for fitting and forecasting with FDA in steps (a) to (c) below, and the forecast evaluation is discussed in steps (d) and (e). Note that only the implementations in terms of s S ( t ) and s D ( t ) are explained here, and similar procedures can be applied to the v S ( t ) and v D ( t ) processes.
(a)
Provide a candidate pool for the vector $(\lambda, R)$ of the tuning parameter and the Taylor expansion order. For each of the 1000 replications, take the length-$(J-10)$ rolling windows (except for the one-month daily sample, where we take the length-$(J-3)$ rolling windows) from the first $J$ observed data points and estimate the underlying processes $s_S(t)$ and $s_D(t)$ as in Equations (4) and (5) using the B-spline basis of order $R + 3$ and the number of basis functions $\min\{2(R+2)+1, J+R+3\}$. We then obtain $\tilde s_S(t)$ and $\tilde s_D(t)$, respectively, for each rolling window with each pair of $(\lambda, R)$ candidates.
(b)
For the first $J$ observed data points from each of the 1000 replicates, compute the forecast values $\hat s_S(t+\Delta)$ and $\hat s_D(t+\Delta)$ for each of the length-$(J-10)$ rolling windows with each pair of $(\lambda, R)$ candidates using Equation (6). The $(\lambda, R)$ pair that minimizes the RMSFE over the ten rolling windows across all 1000 replications is selected, denoted by $(\hat\lambda, \hat R)$, and used for later fitting and forecasting.
(c)
For each of the 1000 replications, perform the fitting and forecasting on the length-$J$ rolling windows (either one-month or eight-month) with the given observation frequency using the selected parameters $(\hat\lambda, \hat R)$, the basis order of $\hat R + 3$, and the number of basis functions $\min\{2(\hat R+2)+1, J+\hat R+3\}$, yielding the rolling-window forecast values $\hat s_S(t+\Delta)$ and $\hat s_D(t+\Delta)$. It is worth noting that the forecasting step of this method is flexible, as $\Delta$ is technically a continuous quantity. In this simulation study, we only consider a forecasting step equal to the sampling time window, which we later refer to as the “one-step-ahead forecast”; in general, one can, for example, make a one-hour-ahead forecast with daily data or a one-day-ahead forecast with hourly data.
(d)
Implement the Kolmogorov–Smirnov (K–S) test on the pairs “$\hat s_S(t+\Delta)$ and $s_S(t+\Delta)$”, “$\hat s_S(t+\Delta)$ and $s_D(t+\Delta)$”, “$\hat s_D(t+\Delta)$ and $s_S(t+\Delta)$”, and “$\hat s_D(t+\Delta)$ and $s_D(t+\Delta)$” for all time points $t$ to check whether the FDA-based predictors can correctly distinguish the true underlying processes in terms of distributions.
(e)
Calculate the RMSFE according to [61,62] between the pairs “$\hat s_S(t+\Delta)$ and $s_S(t+\Delta)$”, “$\hat s_S(t+\Delta)$ and $s_D(t+\Delta)$”, “$\hat s_D(t+\Delta)$ and $s_S(t+\Delta)$”, and “$\hat s_D(t+\Delta)$ and $s_D(t+\Delta)$” for the out-of-sample performance evaluation.
In practice, the relevant parameters can be selected more exhaustively for each forecast and each replication as needed; however, developing the asymptotic properties of such data-driven methods is not the focus of the current paper, so we simplify the search in step (a) by having only three candidate $R$s and three candidate $\lambda$s, forming nine pairs of $(\lambda, R)$ candidates, and the resulting $(\hat\lambda, \hat R)$ is used for all the replicates in all the later forecasts. Also, the upper bound $J + R + 3$ on the number of basis functions $Q$ simplifies the search: we first fix $Q$ at the sample size $J$ and then extend the fitting function beyond the fitting domain by $R + 3$ equally spaced basis functions to capture the smoothness on $[1-\Delta, 1]$ and to avoid the boundary noise induced by the last $R + 3$ (i.e., the order of the basis in our setting) basis functions.
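Steps (c) and (e) amount to a generic rolling-window exercise; the helper below sketches it in Python for an arbitrary one-step-ahead forecaster (the FDA predictor of Equation (6) would be plugged in as `forecaster`; the linear extrapolation in the usage example is only a stand-in, and the name `rolling_rmsfe` is ours).

```python
import numpy as np

def rolling_rmsfe(x, window, forecaster):
    """One-step-ahead rolling forecasts and their RMSFE, as in steps (c) and (e).

    `forecaster` maps a length-`window` history array to a one-step-ahead
    prediction; each forecast is compared with the next observed value.
    """
    preds, truth = [], []
    for start in range(len(x) - window):
        hist = x[start:start + window]
        preds.append(forecaster(hist))
        truth.append(x[start + window])
    err = np.asarray(preds) - np.asarray(truth)
    return np.asarray(preds), np.sqrt(np.mean(err**2))

# Usage with a toy linear trend and a naive extrapolation forecaster:
x = 0.01 * np.arange(200)
preds, rmsfe = rolling_rmsfe(x, 20, lambda h: h[-1] + (h[-1] - h[-2]))
```

On the noiseless linear trend, the extrapolating forecaster is exact, so the RMSFE is (numerically) zero; with simulated SV/DV paths the same harness yields the comparisons reported in Figure 5.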

3.3. Comparison with Parametric Methods

We adopted MLE as the benchmark parametric method for a performance comparison with our FDA approach, as it is a commonly used approach for GARCH-like models (see, e.g., [22,63]). It is imperative to underscore that, within this context, MLE was employed to estimate a sequence of discrete observations drawn from an unknown underlying diffusion process. While alternative parametric methodologies, such as the generalized method of moments (GMM), are available, they all necessitate certain parametric assumptions about the data distribution, which can result in erroneous estimates and forecasts when dealing with f.d. data structures under infill asymptotics. Based on the stochastic differential equations of the two limits in Equations (7) and (8), the likelihood functions were obtained utilizing the (joint) normality of $dW_t^1$ and $dW_t^2$, and the differentials $ds$ and $dv$ were approximated by the corresponding first differences of the discrete observations. In favor of the parametric methods, we took the eight-month pseudo-continuous rolling windows as the observed series, and the comparison between the two approaches was conducted in terms of RMSFEs.
Thus, based on Equations (7) and (8), the conditional-expectation forecasts are such that
$$\hat s_{t+\Delta} := \hat E[s_{t+\Delta} \mid s_t] = s_t + \hat a \Delta, \qquad \hat v_{t+\Delta} := \hat E[v_{t+\Delta} \mid v_t] = v_t + \left( \hat\beta - \tfrac{1}{2}\hat\sigma^2 + \hat\alpha \exp(-v_t) \right) \Delta \tag{9}$$
for the SV limit and
$$\hat s_{t+\Delta} := \hat E[s_{t+\Delta} \mid s_t] = s_t + \hat a \Delta, \qquad \hat v_{t+\Delta} := \hat E[v_{t+\Delta} \mid v_t] = v_t + \left( \hat\beta + \hat\alpha \exp(-v_t) \right) \Delta \tag{10}$$
for the DV limit, where $\hat a$, $\hat\beta$, $\hat\alpha$, and $\hat\sigma$ are the parameters estimated by MLE, and $\Delta$ is the forecasting step as previously defined.
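Given MLE estimates, the conditional-mean forecasts of Equations (9) and (10) are immediate to compute; a minimal Python sketch follows (assuming the $\exp(-v_t)$ drift term, consistent with the SV and DV drifts above; the function names are ours).

```python
import numpy as np

def forecast_sv(s_t, v_t, a_hat, beta_hat, alpha_hat, sigma_hat, delta):
    """One-step conditional-mean forecasts under the SV limit, Equation (9)."""
    s_next = s_t + a_hat * delta
    v_next = v_t + (beta_hat - 0.5 * sigma_hat**2 + alpha_hat * np.exp(-v_t)) * delta
    return s_next, v_next

def forecast_dv(s_t, v_t, a_hat, beta_hat, alpha_hat, delta):
    """One-step conditional-mean forecasts under the DV limit, Equation (10)."""
    return s_t + a_hat * delta, v_t + (beta_hat + alpha_hat * np.exp(-v_t)) * delta
```

Note that both limits share the same return forecast $s_t + \hat a \Delta$; the two models differ only in the drift of the log-volatility.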

3.4. Results

Figure 1 and Figure 2 present the forecasting results using the FDA-based method, where the underlying processes are indicated by gray lines, the one-step-ahead rolling forecasts are indicated by black lines, and the selected order of Taylor expansion is noted in the labels of the plots. The figures reveal that the proposed functional approach accurately traced the movements of both the underlying return and volatility processes.
Then, we used the K-S test to compare the predicted and the underlying processes. The p-values of the test of the null of distribution equality over the entire time domain are presented in Figure 3 and Figure 4, where the dashed lines indicate the 5% significance level. As depicted in the two plots in the first column of Figure 3, the true return and volatility processes generated from the SV and DV models exhibited significantly disparate distributions at almost every fixed observation time $t_j$. The second and third columns of Figure 3 suggest that the null hypothesis of distribution equality was rejected when the underlying processes showed significant differences in the cross-comparison, while the FDA forecast shared the same distribution as its true underlying process at almost every $t_j$, as evident in Figure 4. The results imply that the one-step-ahead rolling forecast using the FDA-based method could preserve the distributions of the true underlying processes pointwise and could distinguish the true underlying processes from a falsely assumed continuous-time limit in terms of their distributions.
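As a sketch of this pointwise comparison, the two-sample K-S statistic at each observation time can be computed with plain NumPy. The paper reports p-values; this fragment computes only the statistic, and the function names and array layout are illustrative assumptions:

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic sup_z |F_x(z) - F_y(z)|."""
    data = np.concatenate([x, y])
    cdf_x = np.searchsorted(np.sort(x), data, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), data, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def pointwise_ks(forecasts, truths):
    """K-S statistic at each time t_j, across Monte Carlo replications.

    `forecasts` and `truths` are (n_replications, n_times) arrays of the
    predicted and true process values.
    """
    return np.array([ks_statistic(forecasts[:, j], truths[:, j])
                     for j in range(forecasts.shape[1])])
```

A statistic near zero at every $t_j$ corresponds to the non-rejections seen in Figure 4, while large statistics correspond to the rejections in the cross-comparisons of Figure 3.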
The proposed method was further applied to a lower observation frequency and a smaller sample size, where it exhibited decent performance with short and sparse observed series; the corresponding results are illustrated in Appendix B.1. A summary of comparisons across different sample sizes and frequencies in terms of RMSFE is presented in Figure 5, where “1 Md” denotes the one-month rolling window with daily observations, “8 Md” the eight-month rolling window with daily observations, “1 Mps” the one-month rolling window with pseudo-continuous observations, and “8 Mps” the eight-month rolling window with pseudo-continuous observations. The small yet discernible differences in the RMSFEs among the different domain spans and sampling densities indicate that the FDA-based method benefited from both the infill sample growth and the increasing-domain sample growth, though more from the former than from the latter.
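The RMSFE summarized in Figure 5 is, for each setting, simply the root of the mean squared one-step-ahead forecast error over the rolling-forecast path; a one-line sketch:

```python
import numpy as np

def rmsfe(forecasts, actuals):
    """Root mean squared forecast error over a rolling-forecast path."""
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return float(np.sqrt(np.mean((forecasts - actuals) ** 2)))
```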
The results for the first rolling window of the MLE are shown in Table 1 and Table 2 to provide a snapshot of the performance of MLE, while the results for all 420 rolling-window estimations are summarized by boxplots in Appendix B.2. Unsurprisingly, with the misspecified models, the estimates of the affected parameters mostly departed from the true values with statistical significance, while MLE showed good asymptotic performance under the correct specification. As noted in Table 1 and Table 2, the rejection rates for some parameters slightly exceeded the reasonable range of errors induced by a binomial distribution with N = 1000 trials and a probability of success at 5 % even when the model was correctly specified, which could be attributed to the insufficient number of observations. Nevertheless, in general, methods that can circumvent such modeling discrepancy will be useful to avoid misleading estimation results in continuous-time analysis.
Finally, to compare the performance of the FDA and MLE methods, we applied the MLE rolling-window estimation to perform one-step-ahead out-of-sample forecasting and compared the distributions of the relative RMSFEs (rRMSFEs) calculated as the ratio of the RMSFEs between FDA and MLE—a ratio below one signified that the FDA method provided a more precise forecast than the MLE method. The results for cases in which the MLE forecast was affected by the misspecification are shown in Figure 6, and those for cases in which the MLE forecast was unaffected are summarized in Figure A13 in Appendix B.2.
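The rRMSFE is the per-replication ratio of the two methods' RMSFEs; a sketch that also reports the share of replications in which the FDA forecast is more precise (the ratio falls below one); function and argument names are illustrative:

```python
import numpy as np

def rrmsfe_summary(rmsfe_fda, rmsfe_mle):
    """Relative RMSFEs (FDA / MLE) per replication and the share of
    replications where the FDA forecast is more precise."""
    ratios = np.asarray(rmsfe_fda, dtype=float) / np.asarray(rmsfe_mle, dtype=float)
    return ratios, float(np.mean(ratios < 1.0))
```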
First, note that misspecifying an SV process and estimating it with a DV model did not cause significant bias in the estimation of the parameter $a$, which is the only parameter needed for the forecast of the return process, according to Equations (9) and (10). Thus, the estimation of $s_S$ was not significantly affected by misspecifying it as $s_D$, and the distribution of the rRMSFEs is summarized in Figure A13. However, the estimations of the parameters $\alpha$ and $\beta$ were affected by such a misspecification, which in turn caused inaccuracy and instability in the forecast of the volatility process, as shown in the first plot of Figure 6. Specifically, 47.9% of the rRMSFEs were lower than one, and around 44% were lower than 0.01, while when MLE performed better, the forecast errors of the two methods were still within a comparable range. Hence, the FDA-based method appeared to be more stable and generally reliable. On the contrary, when a DV process was misspecified and estimated by an SV model, the estimation of all the parameters was affected statistically, but for the parameters $\alpha$, $\beta$, and $\sigma$, the biases were negligibly small. Hence, according to Equations (9) and (10) with $\hat{\sigma} \approx 0$, the forecast for the volatility process was not substantially affected, as summarized in Figure A13. For the forecast of the return process, which relied on $\hat{a}$, the estimates significantly deviated from the true value. However, since the forecast error was determined by the re-scaled estimation error $(\hat{a} - a)\Delta$ and the noise, which were numerically of the same order, the final forecast error was not notably influenced by the error in $\hat{a}$, and the forecast exhibited a performance similar to that of the FDA-based method, as shown in the second plot of Figure 6. However, in practice, as the step $\Delta$ becomes larger, the error in $\hat{a}$ begins to have a further impact on the forecast.
Finally, the FDA-based method could still outperform the parametric method even in a correctly specified case, as shown in the last plot of Figure 6.
Above all, the FDA-based method showed general consistency and robustness against modeling discrepancy, while the parametric methods could prove unreliable due to misspecification; even when the parametric method excelled under correct specification, the forecast errors of the two methods still lay within a comparable range, as shown in Figure 6 and Figure A13. When DV was misspecified as SV, the MLE forecast for the volatility process surpassed FDA by the largest margin in terms of the RMSFE. However, for the forecast of this non-random function, both methods performed well, as shown by the RMSFEs in Figure 5. Hence, the proposed FDA-based method is recommended in the presence of modeling discrepancy.

4. Empirical Study

This section illustrates the application of our FDA-based method to stock price prediction, for which we obtained the closing prices of the S&P 500 Index from the Bloomberg terminal. Predicting volatility entails processes similar to forecasting future price movements, for which historical volatility indices like the VIX could be incorporated into our methodology. In accordance with the procedure utilized in our prior simulation study, we assessed the predictive efficacy of the FDA method utilizing observations spanning a duration of one year. Specifically, one-month and eight-month rolling windows of hourly data (every hour from 9:30 to 15:30) and daily data (every day at 12:30) were employed, and the fitting and forecasting with parameter selection followed the procedures described in steps (a) to (c) from Section 3.
Figure 7 and Figure 8 depict the functional predictions across various sampling frequencies and rolling windows. Figure 9 and Figure 10 present the distribution of the ratio of absolute forecast errors to log-returns in the four corresponding scenarios. The forecast errors were consistently small relative to the log-returns, suggesting a close alignment between the forecasts and the actual log-returns and underscoring the robustness and reliability of the proposed FDA-based method.
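The error measure summarized in Figures 9 and 10 can be sketched as the ratio of absolute forecast errors to the magnitude of the actual log-returns; the `eps` guard against division by exactly zero returns is an added assumption, not from the paper:

```python
import numpy as np

def abs_error_ratio(forecast_returns, actual_returns, eps=1e-12):
    """Ratio of |forecast - actual| to |actual| for log-returns.

    `eps` is a hypothetical safeguard against zero actual returns.
    """
    f = np.asarray(forecast_returns, dtype=float)
    a = np.asarray(actual_returns, dtype=float)
    return np.abs(f - a) / np.maximum(np.abs(a), eps)
```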

5. Conclusions

In continuous-time modeling, parametric methods fail to provide reliable analysis when there is discrepancy due to the existence of multiple limits. This paper adopted FDA to uncover the true continuous-time underlying processes subject to f.d. data structures under infill asymptotics and suggested a forecasting method by integrating FDA with Taylor series expansion, also exploring an application of FDA in out-of-sample prediction.
Our theorems demonstrate that the FDA-based method only requires the continuity of the underlying sample path and the low correlation of the observation errors, and with proper basis expansions, enlarging samples with an increasing sampling density ensures that the functional estimator converges to a unique and well-defined limit. The simulation analysis showed that the FDA-based method is capable of distinguishing processes with different continuous-time limits in out-of-sample prediction. Furthermore, although the parametric method dominated under correct specification, the forecast errors of the two methods were still within a comparable range, while the FDA-based method showed a generally consistent and much more robust performance against possible misspecification when the parametric method was subject to modeling discrepancy, which makes the functional method a preferable tool for fitting and forecasting when there is uncertainty in modeling the underlying process. Further, we validated the practical applicability of the proposed method in predicting future S&P 500 prices. Our findings demonstrated that the functional approach yielded dependable predictions even in cases where the underlying process was unknown. It is important to note that, as shown in the empirical study, the suggested FDA-based method is not limited to any particular continuous-time model; instead, it is able to uncover the underlying process of many discretely observed stochastic processes, including equity returns.
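The core fit-and-extrapolate idea can be sketched as follows: smooth the discrete observations with a basis expansion, then forecast $X(t_J + \Delta)$ by the truncated Taylor series $\sum_{r=0}^{R} \frac{\Delta^r}{r!} \tilde{X}^{(r)}(t_J)$. The sketch below substitutes an ordinary polynomial fit for the paper's penalized basis expansion; `deg` and `R` are illustrative defaults, not values from the paper:

```python
import numpy as np
from math import factorial

def taylor_forecast(t_obs, x_obs, delta, R=2, deg=6):
    """Forecast X(t_J + delta) via a truncated Taylor expansion of a
    smooth fit X~ evaluated at the end of the fitting domain.

    A plain polynomial fit stands in for the penalized basis expansion.
    """
    coefs = np.polynomial.polynomial.polyfit(t_obs, x_obs, deg)
    poly = np.polynomial.Polynomial(coefs)
    t_end = t_obs[-1]
    total = poly(t_end)                    # r = 0 term
    for r in range(1, R + 1):
        poly = poly.deriv()                # next derivative X~^(r)
        total += poly(t_end) * delta ** r / factorial(r)
    return float(total)
```

For a smooth underlying path, higher `R` captures more curvature but amplifies the noise in the estimated derivatives, which is why the expansion order is selected jointly with the smoothing parameter in step (a).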
Lastly, the proposed FDA-based method shows limitations in certain circumstances. First, as the most natural way to obtain the optimal values of the parameters determining the functional estimators and forecasts, e.g., the number and the order of the basis function, the tuning parameter λ , and the order of the Taylor expansion, is by data-driven algorithms, the search within such an optimization process can induce computationally expensive tasks. Additionally, in the presence of a continuous but volatile underlying process, our approach also necessitates increased computational resources for determining the high-order derivatives of the fitted functions. Moreover, the nature of the method implies that, with everything else fixed, it generally works more comfortably and easily for smoother processes than volatile ones and is more suited for capturing long-term patterns than short-term shocks. Therefore, the performance of the method may heavily rely on high-frequency data while fitting and forecasting highly volatile processes. For future research, more efficient methods to obtain the optimal values of the parameters are to be developed, and the effects of the estimated parameters on the overall performance of the fitting and forecasting are to be explored. Furthermore, the method’s performance with volatile underlying processes and/or sparse observations is to be further studied and improved.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11204386/s1.

Author Contributions

Conceptualization, T.C., Y.L. and R.T.; formal analysis, Y.L. and R.T.; writing, T.C., Y.L. and R.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in the empirical study are the closing prices of the S&P 500 Index from the Bloomberg terminal and are publicly available via subscription. The exact dataset employed is available on request from the corresponding author, provided that the user agrees not to distribute it further.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
f.d.: frequency-dependent
GARCH: generalized autoregressive conditional heteroskedasticity
MLE: maximum likelihood estimation
FDA: functional data analysis
SV: stochastic volatility
DV: deterministic volatility
MSFE: mean squared forecast error
RMSFEs: relative MSFEs

Appendix A

Additional notations and lemmas (proofs of the lemmas are given further below).
Let $N_0(\zeta)$ and $N_{1-\Delta}(\zeta)$ denote the $\zeta$-neighborhoods of $t = 0$ and $t = 1 - \Delta$, respectively, for some small $\zeta > 0$. Then, we define $T_{\Delta,\zeta} := [0, 1-\Delta] \setminus \left( N_0(\zeta) \cup N_{1-\Delta}(\zeta) \right)$. Also, let $\tilde{X}_{\omega,J}$ denote the fitted function for $X_\omega$ under the sample size $J$.
Lemma A1.
Consider a continuous sample path $X_\omega(t)$ on $[0,1]$ where, for any $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that $|t_1 - t_2| \le \delta(\varepsilon)$ implies $|X_\omega(t_1) - X_\omega(t_2)| \le \varepsilon/2$. Suppose Assumption 1 holds. Then, for $B_K(t, X_\omega)$ as in Equation (2) and $K \ge 4 \sup_{t\in[0,1]} |X_\omega(t)| / \left( \delta^2(\varepsilon)\, \varepsilon \right)$, we have the following:
(a) For any $t \in T_{\Delta,\zeta}$ and $\omega \in \Omega$, $\tilde{X}_{\omega,J}(t) - B_K(t, X_\omega) \xrightarrow{p} 0$ as $J \to \infty$.
(b) For every $\varrho, \eta > 0$, there exists a $\zeta > 0$ such that $\{\tilde{X}_{\omega,J}\}$ is asymptotically stochastically equicontinuous on $T_{\Delta,\zeta}$, in that $\limsup_{J\to\infty} P\left\{ \sup_{t_1, t_2 \in T_{\Delta,\zeta},\, |t_1 - t_2| < \zeta} \left| \tilde{X}_{\omega,J}(t_1) - \tilde{X}_{\omega,J}(t_2) \right| > \varrho \right\} < \eta$.
(c) For any $t \in T_{\Delta,\zeta}$, $\omega \in \Omega$, and $r = 1, \ldots, K$, $\tilde{X}_{\omega,J_1}^{(r)}(t) - \tilde{X}_{\omega,J_2}^{(r)}(t) \xrightarrow{p} 0$ as $J_1, J_2 \to \infty$.
(d) For every $\varrho, \eta > 0$, there exists a $\zeta > 0$ such that for all $r = 1, \ldots, K$,
$\limsup_{J\to\infty} P\left\{ \sup_{t_1, t_2 \in N_0(\zeta)} \left| \tilde{X}_{\omega,J}^{(r)}(t_1) - \tilde{X}_{\omega,J}^{(r)}(t_2) \right| > \varepsilon \right\} < \eta$, and
$\limsup_{J\to\infty} P\left\{ \sup_{t_1, t_2 \in N_{1-\Delta}(\zeta)} \left| \tilde{X}_{\omega,J}^{(r)}(t_1) - \tilde{X}_{\omega,J}^{(r)}(t_2) \right| > \varepsilon \right\} < \eta$.
Lemma A2.
Suppose Assumption 1 and Lemma A1 hold. Then, for $r = 1, \ldots, K$,
(a) $\operatorname{plim}_{J\to\infty} \lim_{t \to 0^+} \tilde{X}_{\omega,J}^{(r)}(t) = \lim_{t \to 0^+} \operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r)}(t)$ and
$\operatorname{plim}_{J\to\infty} \lim_{t \to (1-\Delta)^-} \tilde{X}_{\omega,J}^{(r)}(t) = \lim_{t \to (1-\Delta)^-} \operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r)}(t)$.
(b) $\sup_{t \in [0, 1-\Delta]} \left| \tilde{X}_{\omega,J}(t) - B_K(t, X_\omega) \right| \xrightarrow{p} 0$ as $J \to \infty$.
(c) $\tilde{X}_{\omega,J}^{(r)}(0) - B_K^{(r)}(0, X_\omega) \xrightarrow{p} 0$ and $\tilde{X}_{\omega,J}^{(r)}(1-\Delta) - B_K^{(r)}(1-\Delta, X_\omega) \xrightarrow{p} 0$ as $J \to \infty$.
Lemma A3.
Let $f$ be a function that maps a square matrix to a real value; then, for full-rank $Q$-by-$Q$ square matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{Z}$,
\[
f(\mathbf{A}) = f(\mathbf{B}) + \operatorname{tr}\!\left[ \left( \frac{\partial f(\mathbf{Z})}{\partial \mathbf{Z}} \right)' (\mathbf{A} - \mathbf{B}) \right],
\]
where $\min(a_{ij}, b_{ij}) < z_{ij} < \max(a_{ij}, b_{ij})$ for all elements $a_{ij}$, $b_{ij}$, and $z_{ij}$ of the matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{Z}$, respectively, with $i, j = 1, \ldots, Q$.

Appendix A.1. Proof of Theorem 1

Proof of Theorem 1.
First, applying integration by parts given any positive integer $H \le K + 1$, we have that for all $\psi \in F_K$,
\[
\sum_{r=1}^{H} (-1)^{r-1} \tilde{X}_{\omega,J}^{(r-1)}(1-\Delta)\, \psi^{(H-r)}(1-\Delta)
= \sum_{r=1}^{H} (-1)^{r-1} \tilde{X}_{\omega,J}^{(r-1)}(0)\, \psi^{(H-r)}(0)
+ \int_0^{1-\Delta} \tilde{X}_{\omega,J}(t)\, \psi^{(H)}(t)\, dt
- (-1)^{H} \int_0^{1-\Delta} \tilde{X}_{\omega,J}^{(H)}(t)\, \psi(t)\, dt,
\]
\[
\sum_{r=1}^{H} (-1)^{r-1} B_K^{(r-1)}(1-\Delta, X_\omega)\, \psi^{(H-r)}(1-\Delta)
= \sum_{r=1}^{H} (-1)^{r-1} B_K^{(r-1)}(0, X_\omega)\, \psi^{(H-r)}(0)
+ \int_0^{1-\Delta} B_K(t, X_\omega)\, \psi^{(H)}(t)\, dt
- (-1)^{H} \int_0^{1-\Delta} B_K^{(H)}(t, X_\omega)\, \psi(t)\, dt,
\]
which implies that
\[
(-1)^{H} \left[ \int_0^{1-\Delta} \tilde{X}_{\omega,J}^{(H)}(t)\, \psi(t)\, dt - \int_0^{1-\Delta} B_K^{(H)}(t, X_\omega)\, \psi(t)\, dt \right]
= - \sum_{r=1}^{H} (-1)^{r-1} \left[ \tilde{X}_{\omega,J}^{(r-1)}(1-\Delta) - B_K^{(r-1)}(1-\Delta, X_\omega) \right] \psi^{(H-r)}(1-\Delta)
+ \sum_{r=1}^{H} (-1)^{r-1} \left[ \tilde{X}_{\omega,J}^{(r-1)}(0) - B_K^{(r-1)}(0, X_\omega) \right] \psi^{(H-r)}(0)
+ \int_0^{1-\Delta} \tilde{X}_{\omega,J}(t)\, \psi^{(H)}(t)\, dt - \int_0^{1-\Delta} B_K(t, X_\omega)\, \psi^{(H)}(t)\, dt.
\]
Lemma A2(c) indicates that both $\left| \sum_{r=1}^{H} (-1)^{r-1} \left[ \tilde{X}_{\omega,J}^{(r-1)}(1-\Delta) - B_K^{(r-1)}(1-\Delta, X_\omega) \right] \psi^{(H-r)}(1-\Delta) \right|$ and $\left| \sum_{r=1}^{H} (-1)^{r-1} \left[ \tilde{X}_{\omega,J}^{(r-1)}(0) - B_K^{(r-1)}(0, X_\omega) \right] \psi^{(H-r)}(0) \right|$ are $o_p(1)$. Meanwhile, Lemma A2(b) implies that $\int_0^{1-\Delta} \tilde{X}_{\omega,J}(t)\, \psi^{(H)}(t)\, dt - \int_0^{1-\Delta} B_K(t, X_\omega)\, \psi^{(H)}(t)\, dt = o_p(1)$. Hence, given any positive integer $H \le K + 1$,
\[
\int_0^{1-\Delta} \tilde{X}_{\omega,J}^{(H)}(t)\, \psi(t)\, dt - \int_0^{1-\Delta} B_K^{(H)}(t, X_\omega)\, \psi(t)\, dt
= \big\langle \tilde{X}_{\omega,J}^{(H)}, \psi \big\rangle - \big\langle B_K^{(H)}(\cdot, X_\omega), \psi \big\rangle = o_p(1), \quad \forall \psi \in F_K,
\]
and for $R \le K + 1$, applying the properties of inner products, one can obtain
\[
\big\langle \hat{X}_\omega(\cdot + \Delta), \psi \big\rangle
= \Big\langle \sum_{r=0}^{R} \tfrac{1}{r!} \Delta^r \tilde{X}_{\omega,J}^{(r)}, \psi \Big\rangle
= \sum_{r=0}^{R} \tfrac{1}{r!} \Delta^r \big\langle \tilde{X}_{\omega,J}^{(r)}, \psi \big\rangle
= \sum_{r=0}^{R} \tfrac{1}{r!} \Delta^r \big\langle B_K^{(r)}(\cdot, X_\omega), \psi \big\rangle + o_p(1)
= \Big\langle \sum_{r=0}^{R} \tfrac{1}{r!} \Delta^r B_K^{(r)}(\cdot, X_\omega), \psi \Big\rangle + o_p(1)
= \big\langle B_K(\cdot + \Delta, X_\omega), \psi \big\rangle + o_p(1).
\]
Then, under Lemma 1, the desired results follow. □

Appendix A.2. Proof of Theorem 2

Proof of Theorem 2.
To improve readability, we define the following:
\[
\Upsilon_\Delta(t) := \sum_{r=0}^{R} \frac{1}{r!}\,\Delta^r\, \Phi^{(r)}(t), \qquad
\boldsymbol{\Omega} := \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j)\, \Phi(t_j)', \qquad
\boldsymbol{\Gamma} := \int_0^1 \Phi^{(2)}(t)\, \Phi^{(2)}(t)'\, dt.
\]
Given the order of $Q$ relative to $J$, we impose that $Q < J$ without loss of generality. Then, applying Equation (3), Lemma 1, and Lemma A3 with $f(\mathbf{M}) = a' \mathbf{M}^{-1} b$ for a square matrix $\mathbf{M}$, vectors $a := \Upsilon_\Delta(t)$ and $b := \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) X_{t_j}$, and matrices $\mathbf{A} := \boldsymbol{\Omega} + \lambda \boldsymbol{\Gamma}$ and $\mathbf{B} := \boldsymbol{\Omega}$ indicates that
\[
\begin{aligned}
\hat{X}_\omega(t+\Delta) &= \Upsilon_\Delta(t)'\, \tilde{C}_\omega
= \Upsilon_\Delta(t)' \left( \boldsymbol{\Omega} + \lambda \boldsymbol{\Gamma} \right)^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) X_{t_j}
= a' \mathbf{A}^{-1} b \\
&= a' \mathbf{B}^{-1} b + \operatorname{tr}\!\left[ \left( \frac{\partial f(\mathbf{Z})}{\partial \mathbf{Z}} \right)' (\mathbf{A} - \mathbf{B}) \right]
= \Upsilon_\Delta(t)' \boldsymbol{\Omega}^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) X_{t_j}
- \operatorname{tr}\!\left[ \mathbf{Z}^{-1} b\, a'\, \mathbf{Z}^{-1} \lambda \boldsymbol{\Gamma} \right] \\
&= \Upsilon_\Delta(t)' \boldsymbol{\Omega}^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) \Phi(t_j)' C
+ \Upsilon_\Delta(t)' \boldsymbol{\Omega}^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) \epsilon_{t_j}
+ O\!\left( Q^2 \lambda \right) \\
&= \Upsilon_\Delta(t)' C + \Upsilon_\Delta(t)' \boldsymbol{\Omega}^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) \epsilon_{t_j} + O\!\left( Q^2 \lambda \right),
\end{aligned}
\]
and thus
\[
\hat{X}_\omega(t+\Delta) - B_K(t+\Delta, X_\omega) = \Upsilon_\Delta(t)' \boldsymbol{\Omega}^{-1} \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) \epsilon_{t_j} + O\!\left( Q^2 \lambda \right).
\]
By Assumption 2(b) and [64], $\boldsymbol{\Omega} = \frac{1}{J} \sum_{j=1}^{J} \Phi(t_j) \Phi(t_j)' = \int_0^1 \Phi(t) \Phi(t)'\, dt + o\!\left( J^{-1} \right)$. Then, again by Lemma A3, we have
\[
\hat{X}_\omega(t+\Delta) - B_K(t+\Delta, X_\omega)
= \frac{1}{J} \sum_{j=1}^{J} \left[ \sum_{r=0}^{R} \frac{1}{r!} \Delta^r \Phi^{(r)}(t) \right]' \left[ \int_0^1 \Phi(s) \Phi(s)'\, ds \right]^{-1} \Phi(t_j)\, \epsilon_{t_j}
+ o\!\left( J^{-1} Q^2 \right),
\]
whence, with $A_{t_j}(t; \Delta, R, \Phi) := \left[ \sum_{r=0}^{R} \frac{1}{r!} \Delta^r \Phi^{(r)}(t) \right]' \left[ \int_0^1 \Phi(s) \Phi(s)'\, ds \right]^{-1} \Phi(t_j)$, $\sigma_j^2 := \operatorname{Var}\!\left( \epsilon_{t_j} \right)$, and $V(t; \Delta, R, \Phi) := J^{-2} \sum_{j=1}^{J} \sigma_j^2 A_{t_j}^2(t; \Delta, R, \Phi)$ for all $j$, by Assumption 2(a) and the Lyapunov CLT, it follows that
\[
V(t; \Delta, R, \Phi)^{-1/2} \left[ \hat{X}_\omega(t+\Delta) - B_K(t+\Delta, X_\omega) \right]
= \left[ \sum_{j=1}^{J} \sigma_j^2 A_{t_j}^2(t; \Delta, R, \Phi) \right]^{-1/2} \sum_{j=1}^{J} A_{t_j}(t; \Delta, R, \Phi)\, \epsilon_{t_j} + o_p(1)
\xrightarrow{d} N(0, 1).
\]
Hence, under Lemma 1, the desired results follow as $K, R \to \infty$. □
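The penalized coefficient estimator appearing in this proof, $\tilde{C}_\omega = (\boldsymbol{\Omega} + \lambda\boldsymbol{\Gamma})^{-1} \frac{1}{J}\sum_{j} \Phi(t_j) X_{t_j}$, can be computed with a direct solve; a sketch with illustrative names, where with $\lambda = 0$ and a full-rank design it reduces to ordinary least squares:

```python
import numpy as np

def penalized_coefficients(Phi, x, Gamma, lam):
    """Roughness-penalized least-squares coefficients
    C~ = (Omega + lam * Gamma)^{-1} (1/J) sum_j Phi(t_j) x_j,
    where Phi is the J-by-Q matrix of basis evaluations at t_1..t_J and
    Gamma is the Q-by-Q curvature penalty matrix.
    """
    J = Phi.shape[0]
    Omega = Phi.T @ Phi / J          # (1/J) sum_j Phi(t_j) Phi(t_j)'
    b = Phi.T @ x / J                # (1/J) sum_j Phi(t_j) x_j
    return np.linalg.solve(Omega + lam * Gamma, b)
```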

Appendix A.3. Proof of Lemma 1

Proof of Lemma 1.
For any given state $\omega \in \Omega$, the continuous sample path $X_\omega$ is bounded on the compact set $[0, 1]$, so that $M := \sup_{t \in [0,1]} |X_\omega(t)| < \infty$. Then, the result can be justified by the proof of Theorem 5.14 from [51]. □

Appendix A.4. Proof of Lemma A1

Proof of Lemma A1.
Lemma A1(a) can be justified by the results from previous studies: for example, under Assumptions 1 to 3 in [53], which state conditions for the choice of basis functions and the distribution of the sampling points.
For part (b), with properly selected basis functions that are continuously differentiable up to a desired order, applying the mean-value theorem yields the Lipschitz condition that, for all $t_1 < t_2 \in T_{\Delta,\zeta}$ with $t_2 - t_1 < \zeta$, $\left| \tilde{X}_{\omega,J}(t_1) - \tilde{X}_{\omega,J}(t_2) \right| \le \sup_{s \in T_{\Delta,\zeta}} \left| \tilde{X}_{\omega,J}^{(1)}(s) \right| \, |t_2 - t_1|$, where $\sup_{s \in T_{\Delta,\zeta}} \left| \tilde{X}_{\omega,J}^{(1)}(s) \right| = O_p(1)$. Then, based on the fact that $\lim_{\zeta \to 0} \sup_{|t_1 - t_2| < \zeta} |t_1 - t_2| = 0$ for all $t_1, t_2 \in T_{\Delta,\zeta}$, the asymptotic stochastic equicontinuity of $\{\tilde{X}_{\omega,J}\}$ on $T_{\Delta,\zeta}$ follows.
For part (c), note that the convergence of the functional estimators is achieved through the convergence of the estimated basis coefficients, and the derivatives of these functional estimators are obtained through the derivatives of the non-stochastic basis functions; hence, the convergence of the higher-order derivatives of the estimated functions can be easily justified by choosing the proper basis functions that are continuously differentiable up to a desired order.
For part (d), similarly to the justification for (b), with properly selected basis functions, one can determine that for all $t_1 < t_2 \in N_0(\zeta)$, $\left| \tilde{X}_{\omega,J}^{(r)}(t_1) - \tilde{X}_{\omega,J}^{(r)}(t_2) \right| \le \sup_{s \in [t_1, t_2]} \left| \tilde{X}_{\omega,J}^{(r+1)}(s) \right| \, |t_2 - t_1|$, where $\sup_{s \in [t_1, t_2]} \left| \tilde{X}_{\omega,J}^{(r+1)}(s) \right| = O_p(1)$ and $\lim_{\zeta \to 0} \sup_{t_1 < t_2 \in N_0(\zeta)} |t_2 - t_1| = 0$. Therefore, the asymptotic stochastic equicontinuity of $\{\tilde{X}_{\omega,J}^{(r)}\}$ on $N_0(\zeta)$ for $r = 1, \ldots, K$ follows. The same justification holds for $N_{1-\Delta}(\zeta)$. □

Appendix A.5. Proof of Lemma A2

Proof of Lemma A2.
To verify Lemma A2(a), note that
\[
\left| \lim_{t \to 0} \tilde{X}_{\omega,J}^{(r)}(t) - B_K^{(r)}(0, X_\omega) \right|
\le \left| \lim_{s \to 0} \tilde{X}_{\omega,J}^{(r)}(s) - \tilde{X}_{\omega,J}^{(r)}(t) \right|
+ \left| \tilde{X}_{\omega,J}^{(r)}(t) - B_K^{(r)}(t, X_\omega) \right|
+ \left| B_K^{(r)}(t, X_\omega) - B_K^{(r)}(0, X_\omega) \right|.
\]
Given any $\varrho > 0$, there exists an $\eta > 0$ such that one can find a $t$ for which there is a $\bar{J}$ such that
\[
P\left\{ \max\left( \left| \lim_{s \to 0} \tilde{X}_{\omega,J}^{(r)}(s) - \tilde{X}_{\omega,J}^{(r)}(t) \right|,\; \left| \tilde{X}_{\omega,J}^{(r)}(t) - B_K^{(r)}(t, X_\omega) \right|,\; \left| B_K^{(r)}(t, X_\omega) - B_K^{(r)}(0, X_\omega) \right| \right) < \varrho \right\} > 1 - \eta, \quad \forall J > \bar{J},
\]
implying that $P\left\{ \left| \lim_{t \to 0} \tilde{X}_{\omega,J}^{(r)}(t) - B_K^{(r)}(0, X_\omega) \right| < 3\varrho \right\} > 1 - \eta$. Therefore, with Lemma A1(c), we can state that $\operatorname{plim}_{J\to\infty} \lim_{t \to 0} \tilde{X}_{\omega,J}^{(r)}(t) = \lim_{t \to 0} B_K^{(r)}(t, X_\omega) = \lim_{t \to 0} \operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r)}(t)$.
For Lemma A2(b), applying Theorem 21.9 from [65], with $\tilde{X}_{\omega,J}(t) - B_K(t, X_\omega) \xrightarrow{p} 0$ for each $t \in T_{\Delta,\zeta}$ according to Lemma A1(a), as well as the asymptotic stochastic equicontinuity of $\{\tilde{X}_{\omega,J}\}$ on $T_{\Delta,\zeta}$ according to Lemma A1(b), the uniform convergence in probability of $\tilde{X}_{\omega,J}$ on $T_{\Delta,\zeta}$, such that $\sup_{t \in T_{\Delta,\zeta}} \left| \tilde{X}_{\omega,J}(t) - B_K(t, X_\omega) \right| \xrightarrow{p} 0$, follows. Hence, Lemma A2(b) is verified.
For Lemma A2(c), since the proofs for the two convergences follow the same idea, we focus only on $t \to 0^+$ and omit the proof for $t \to (1-\Delta)^-$. Let $B_\zeta := N_0(\zeta) \cap [0, 1-\Delta]$. Then, Lemma A2(c) can be proved by induction: we show that $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(1)}(0) = B_K^{(1)}(0, X_\omega)$, and we justify that $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r)}(0) = B_K^{(r)}(0, X_\omega)$ implies $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r+1)}(0) = B_K^{(r+1)}(0, X_\omega)$ for $r = 1, \ldots, K-1$.
Similarly to the verification of Lemma A2(b) based on pointwise convergence and asymptotic stochastic equicontinuity, by Lemmas A1(c) and (d), as well as Theorem 21.9 from [65], we can show that $\tilde{X}_{\omega,J}^{(1)}$ converges uniformly in probability on $B_\zeta$, such that given any $\varrho > 0$, there exists an $\eta > 0$ for which one can find a $\bar{J}$ such that
\[
P\left\{ \sup_{t \in B_\zeta} \left| \tilde{X}_{J_1}^{(1)}(t) - \tilde{X}_{J_2}^{(1)}(t) \right| < \varrho \right\} > 1 - \eta, \quad \forall J_1, J_2 > \bar{J}. \tag{A1}
\]
Under the same $\varrho$, $\eta$, $J_1$, and $J_2$, for all $\tau \ne 0 \in B_\zeta$, applying the mean-value theorem yields
\[
\left| \frac{\tilde{X}_{J_1}(\tau) - \tilde{X}_{J_2}(\tau) - \tilde{X}_{J_1}(0) + \tilde{X}_{J_2}(0)}{\tau - 0} \right|
\le \sup_{s \in [0, \tau] \cap B_\zeta} \left| \tilde{X}_{J_1}^{(1)}(s) - \tilde{X}_{J_2}^{(1)}(s) \right|
\le \sup_{t \in B_\zeta} \left| \tilde{X}_{J_1}^{(1)}(t) - \tilde{X}_{J_2}^{(1)}(t) \right|,
\]
and with (A1), we have
\[
P\left\{ \sup_{t \ne 0 \in B_\zeta} \left| \frac{\tilde{X}_{J_1}(t) - \tilde{X}_{J_2}(t) - \tilde{X}_{J_1}(0) + \tilde{X}_{J_2}(0)}{t} \right| < \varrho \right\} > 1 - \eta. \tag{A2}
\]
We define the following two functions for $t \ne 0 \in B_\zeta$:
\[
g_J(t) = \frac{\tilde{X}_{\omega,J}(t) - \tilde{X}_{\omega,J}(0)}{t} \quad \text{and} \quad g(t) = \frac{B_K(t, X_\omega) - B_K(0, X_\omega)}{t};
\]
then, (A2) implies that $g_J$ converges uniformly in probability on $B_\zeta \setminus \{0\}$. Since $\tilde{X}_{\omega,J}$ converges uniformly to $B_K(\cdot, X_\omega)$ in probability on $B_\zeta$, it follows that
\[
\operatorname{plim}_{J\to\infty} g_J(t) = g(t), \quad \forall t \ne 0 \in B_\zeta. \tag{A3}
\]
Meanwhile, given the differentiability of $\tilde{X}_{\omega,J}(t)$ and $B_K(t, X_\omega)$, we have
\[
\lim_{t \to 0} g_J(t) = \tilde{X}_{\omega,J}^{(1)}(0) \quad \text{and} \quad \lim_{t \to 0} g(t) = B_K^{(1)}(0, X_\omega). \tag{A4}
\]
Then, applying Lemma A2(a) to (A3) and (A4) indicates that $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(1)}(0) = B_K^{(1)}(0, X_\omega)$.
Now suppose that, for a given $r$, where $r = 1, \ldots, K-1$, we have $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r)}(0) = B_K^{(r)}(0, X_\omega)$. Then, Lemmas A1(c) and (d), as well as Theorem 21.9 from [65], imply that $\tilde{X}_{\omega,J}^{(r+1)}$ converges uniformly in probability on $B_\zeta$, such that given any $\varrho > 0$, there exists an $\eta > 0$ for which one can find a $\bar{J}$ such that $P\left\{ \sup_{t \in B_\zeta} \left| \tilde{X}_{J_1}^{(r+1)}(t) - \tilde{X}_{J_2}^{(r+1)}(t) \right| < \varrho \right\} > 1 - \eta$ for all $J_1, J_2 > \bar{J}$. Similarly to the previous proof, one can obtain
\[
P\left\{ \sup_{t \ne 0 \in B_\zeta} \left| \frac{\tilde{X}_{J_1}^{(r)}(t) - \tilde{X}_{J_2}^{(r)}(t) - \tilde{X}_{J_1}^{(r)}(0) + \tilde{X}_{J_2}^{(r)}(0)}{t} \right| < \varrho \right\} > 1 - \eta.
\]
Note that the pointwise consistency of $\tilde{X}_{\omega,J}^{(1)}$ on $B_\zeta$ can be shown by the same means as for $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(1)}(0) = B_K^{(1)}(0, X_\omega)$, replacing $0$ with any point in the domain. Then, we re-define the following two functions for $t \ne 0 \in B_\zeta$:
\[
g_J(t) = \frac{\tilde{X}_{\omega,J}^{(r)}(t) - \tilde{X}_{\omega,J}^{(r)}(0)}{t} \quad \text{and} \quad g(t) = \frac{B_K^{(r)}(t, X_\omega) - B_K^{(r)}(0, X_\omega)}{t}.
\]
It is implied by the uniform convergence and the pointwise consistency that $\tilde{X}_{\omega,J}^{(r)}$ converges uniformly to $B_K^{(r)}(\cdot, X_\omega)$ in probability on $B_\zeta$. It follows that
\[
\operatorname{plim}_{J\to\infty} g_J(t) = g(t), \quad \forall t \ne 0 \in B_\zeta.
\]
Meanwhile, given the differentiability of $\tilde{X}_{\omega,J}^{(r+1)}(t)$ and $B_K^{(r+1)}(t, X_\omega)$, we have
\[
\lim_{t \to 0} g_J(t) = \tilde{X}_{\omega,J}^{(r+1)}(0) \quad \text{and} \quad \lim_{t \to 0} g(t) = B_K^{(r+1)}(0, X_\omega).
\]
Then, again applying Lemma A2(a) indicates that $\operatorname{plim}_{J\to\infty} \tilde{X}_{\omega,J}^{(r+1)}(0) = B_K^{(r+1)}(0, X_\omega)$. □

Appendix A.6. Proof of Lemma A3

Proof of Lemma A3.
First, let $\psi(q) := f\left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)$ for $q \in [0, 1]$. Then, taking the first-order derivative of $\psi(q)$ with respect to $q$ through the matrix argument of the function $f$ yields
\[
\psi^{(1)}(q)
= \operatorname{tr}\!\left[ \left( \frac{\partial f\left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)}{\partial \left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)} \right)' \frac{\partial \left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)}{\partial q} \right]
= \operatorname{tr}\!\left[ \left( \frac{\partial f\left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)}{\partial \left( \mathbf{B} + q(\mathbf{A} - \mathbf{B}) \right)} \right)' (\mathbf{A} - \mathbf{B}) \right].
\]
By the mean-value theorem, there exists some $q^* \in (0, 1)$ such that $\psi(1) - \psi(0) = \psi^{(1)}(q^*)$, which, with $\mathbf{Z} := \mathbf{B} + q^*(\mathbf{A} - \mathbf{B})$, is equivalent to
\[
f(\mathbf{A}) - f(\mathbf{B}) = \operatorname{tr}\!\left[ \left( \frac{\partial f(\mathbf{Z})}{\partial \mathbf{Z}} \right)' (\mathbf{A} - \mathbf{B}) \right]. \qquad \square
\]
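The identity can be checked numerically for $f(\mathbf{M}) = a'\mathbf{M}^{-1}b$, the functional used in the proof of Theorem 2. For a small perturbation $\mathbf{A} - \mathbf{B}$, the intermediate point $\mathbf{Z}$ is approximately $\mathbf{B}$, so the first-order expansion evaluated at $\mathbf{B}$ should match to high accuracy; a sketch with randomly generated matrices:

```python
import numpy as np

def f(M, a, b):
    """f(M) = a' M^{-1} b."""
    return float(a @ np.linalg.solve(M, b))

def grad_f(M, a, b):
    """Matrix gradient of f in the convention df = tr(grad' dM):
    grad = -M^{-T} a b' M^{-T}."""
    Minv = np.linalg.inv(M)
    return -Minv.T @ np.outer(a, b) @ Minv.T

# First-order check of the Lemma A3 expansion for a tiny perturbation;
# Z is approximated by B since A - B is of order 1e-6.
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
B = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # full rank w.h.p.
A = B + 1e-6 * rng.normal(size=(3, 3))
lhs = f(A, a, b) - f(B, a, b)
rhs = np.trace(grad_f(B, a, b).T @ (A - B))
```

The residual `lhs - rhs` is of second order in the perturbation, consistent with the exact mean-value form of the lemma.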

Appendix B

Appendix B.1. FDA Results

This appendix presents the functional data predictions of return and volatility, the corresponding K-S test results with different sample sizes, and the distribution of the comparisons in relative MSFE (RMSFE). In particular, we adjusted our simulation by either altering the observation frequency to daily or changing the rolling window to one month. The prediction results are graphed in Figure A1, Figure A2, Figure A5, Figure A6, Figure A9 and Figure A10. All the plots of the one-step-ahead predictions are consistent with the scenario presented in the main text, where our forecast results could correctly trace the trends of the underlying processes. Further, Figure A4, Figure A8 and Figure A12 show that the predictions shared the same distribution as the true underlying process at (almost) every time point, while Figure A3, Figure A7 and Figure A11 imply that the null hypothesis of distribution equality is rejected where the underlying processes show significant differences in the cross-comparison. In summary, the K-S test results indicate that the FDA-based method appeared to correctly distinguish between the processes with different limits in out-of-sample prediction and various sample sizes.
Figure A1. Functional data prediction, one-month rolling window, continuous returns.
Figure A2. Functional data prediction, one-month rolling window, continuous volatility.
Figure A3. K-S test, one-month rolling window, continuous observations, cross-comparison.
Figure A4. K-S test, one-month rolling window, continuous observations, comparison with the true underlying process.
Figure A5. Functional data prediction, eight-month rolling window, daily returns.
Figure A6. Functional data prediction, eight-month rolling window, daily volatility.
Figure A7. K-S test, eight-month rolling window, daily observations, cross-comparison.
Figure A8. K-S test, eight-month rolling window, daily observations, comparison with the true underlying process.
Figure A9. Functional data prediction, one-month rolling window, daily returns.
Figure A10. Functional data prediction, one-month rolling window, daily volatility.
Figure A11. K-S test, one-month rolling window, daily observations, cross-comparison.
Figure A12. K-S test, one-month rolling window, daily observations, comparison with the true underlying process.

Appendix B.2. MLE Results

This section presents detailed information on the 420 rolling-window estimates. Figure A14, Figure A15, Figure A16, Figure A17 and Figure A18 present the estimation results for each parameter. The three consecutive plots, from left to right, illustrate the distribution of the estimated values (eliminating the 2.5% tail on each side to avoid plot distortion due to extreme values), the estimation bias, and the rejection rate, respectively. Within each plot, the label “SS” corresponds to the use of the SV model for estimating SV underlying processes, “SD” to the use of the SV model for DV underlying processes, “DD” to the use of the DV model for DV underlying processes, and “DS” to the use of the DV model for SV underlying processes. These figures suggest that MLE offered dependable estimations under accurate specifications. However, its performance lacked consistency when the underlying process was misspecified.
Figure A13. Unaffected MLE vs. FDA.
Figure A14. Estimation of $\hat{a}$.
Figure A15. Estimation of $\hat{\alpha}$.
Figure A16. Estimation of $\hat{\beta}$.
Figure A17. Estimation of $\hat{\sigma}$.
Figure A18. Estimation of $\hat{\rho}$.

Figure 1. Functional data prediction, eight-month rolling window, continuous returns.
Figure 2. Functional data prediction, eight-month rolling window, continuous volatility.
Figure 3. K–S test, eight-month rolling window, cross-comparison.
Figure 4. K–S test, eight-month rolling window, comparison with the true underlying process.
Figure 5. RMSFE distributions for different underlying processes.
Figure 6. rRMSFEs of FDA vs. MLE.
Figure 7. Functional data prediction, hourly S&P 500 log-returns.
Figure 8. Functional data prediction, daily S&P 500 log-returns.
Figure 9. Functional data prediction, hourly S&P 500 log-returns, distribution of % forecast error.
Figure 10. Functional data prediction, daily S&P 500 log-returns, distribution of % forecast error.
Table 1. Estimating SV parameters with MLE.

| Parameter | a | α | β | σ | ρ |
| --- | --- | --- | --- | --- | --- |
| True value | 0.1 | 0.2 | −8.5 | 2.7 | −0.8 |
| SV process fitted by an SV model (correct specification) | | | | | |
| Estimate | 0.004 | 0.289 | −9.546 | 2.699 | −0.800 |
| Bias | −0.096 | 0.089 | −1.046 | −0.001 | 0.000 |
| Rejection rate | 0.066 | 0.080 | 0.047 | 0.056 | 0.062 |
| SV process fitted by a DV model (misspecification) | | | | | |
| Estimate | 0.101 | 232.294 | −259.989 | | |
| Bias | 0.001 | 232.094 | −251.489 | | |
| Rejection rate | 0.060 | 0.789 | 0.902 | | |

Note: with 1000 simulation replications, the 5% rejection rate under correct specification carries a margin of error of ±2√(0.95 × 0.05/1000) ≈ 0.014.
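The error band quoted in the note is the two-standard-error binomial margin for a 5% rejection rate estimated from 1000 replications, which can be verified directly:

```python
import math

# Two-standard-error margin for a binomial proportion p over n replications:
# +/- 2 * sqrt(p * (1 - p) / n). Here p = 0.05 (nominal rejection rate),
# n = 1000 (Monte Carlo replications).
p, n = 0.05, 1000
margin = 2 * math.sqrt(p * (1 - p) / n)  # approximately 0.014
```

So empirical rejection rates within roughly [0.036, 0.064] are consistent with the nominal 5% level, which the correctly specified rows of Tables 1 and 2 satisfy.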
Table 2. Estimating DV parameters with MLE.

| Parameter | a | α | β | σ | ρ |
| --- | --- | --- | --- | --- | --- |
| True value | 0.1 | 0.2 | −8.5 | 0 | 0 |
| DV process fitted by a DV model (correct specification) | | | | | |
| Estimate | 0.107 | 0.200 | −8.531 | | |
| Bias | 0.007 | 0.000 | −0.031 | | |
| Rejection rate | 0.048 | 0.051 | 0.053 | | |
| DV process fitted by an SV model (misspecification) | | | | | |
| Estimate | 1.033 | 0.200 | −8.500 | 0.000 | 0.154 |
| Bias | 0.933 | 0.000 | 0.000 | 0.000 | 0.154 |
| Rejection rate | 0.346 | 1.000 | 1.000 | 1.000 | 0.877 |

Note: with 1000 simulation replications, the 5% rejection rate under correct specification carries a margin of error of ±2√(0.95 × 0.05/1000) ≈ 0.014.
Chen, T.; Li, Y.; Tian, R. A Functional Data Approach for Continuous-Time Analysis Subject to Modeling Discrepancy under Infill Asymptotics. Mathematics 2023, 11, 4386. https://doi.org/10.3390/math11204386
