Stochastic Conditional Duration Model with Intraday Seasonality and Limit Order Book Information

Toyabe, Tomoki; Nakatsuma, Teruo

doi:10.3390/jrfm15100470

by Tomoki Toyabe ¹,* and Teruo Nakatsuma ²

¹ Graduate School of Economics, Keio University, Tokyo 108-8345, Japan

² Faculty of Economics, Keio University, Tokyo 108-8345, Japan

* Author to whom correspondence should be addressed.

J. Risk Financial Manag. 2022, 15(10), 470; https://doi.org/10.3390/jrfm15100470

Submission received: 26 July 2022 / Revised: 11 October 2022 / Accepted: 11 October 2022 / Published: 17 October 2022

(This article belongs to the Special Issue Innovative Financial Econometrics and Machine Learning)

Abstract

It is widely known that trading intervals for financial assets such as stocks follow an intraday seasonal pattern: they are short at the beginning of business hours and long in the middle of the day. In this paper, we extend the stochastic conditional duration (SCD) model to capture this pattern and propose a new Markov chain Monte Carlo (MCMC) method that estimates the intraday seasonality simultaneously with the other model parameters. To generate the Monte Carlo sample efficiently, we use a hybrid Gibbs/Metropolis–Hastings (MH) sampling scheme together with generalized Gibbs sampling. In addition to capturing intraday seasonality, the model incorporates limit order book information. Tick data for three stocks on three trading days, obtained from Nikkei NEEDS, are used for estimation, and model selection is performed over the smoothing parameters and the choice between the Weibull and Gamma distributions. The typical intraday regularity of frequent trading immediately after the start of trading is confirmed, and the spread in the limit order book is also found to affect the trading time interval.

Keywords:

Bayesian inference; Markov chain Monte Carlo; Metropolis–Hastings algorithm; state space model; block sampler

1. Introduction

Recently, intraday transaction data have come to attract more and more researcher attention with the increased availability of high frequency data, development in computer technology and the increasing market share of high-frequency trading (HFT). According to Hosaka (2014), in the Tokyo Stock Exchange (TSE), HFT accounted for about 55% of total orders in May 2013, compared with about 23% in September 2012. In such a situation, the importance of the analysis of intraday data has been increasing.

The first characteristic of these intraday transaction data is that they are irregularly spaced. Thus, the conventional models for regularly spaced time series, such as the autoregressive conditional heteroskedasticity (ARCH) model by Engle (1982), the generalized autoregressive conditional heteroskedasticity (GARCH) model by Bollerslev (1986) and the stochastic volatility (SV) model by Taylor (1982), cannot be applied directly to the transaction data. As an alternative to these models for fixed interval data, Engle and Russell (1998) proposed the autoregressive conditional duration (ACD) model. The word “duration” means the interval between two consecutive transaction times. In the ACD model, the duration is treated as the product of two factors: the conditional duration, which depends on past durations, and a noise term.

Bauwens and Veredas (2004) proposed the stochastic conditional duration (SCD) model and assumed this duration to be driven by a latent variable. The state equation of the basic SCD model is a stationary first order autoregressive process (AR(1)) on the logarithm of the latent variable, and the observation equation is a product of the latent variable and noise, which follows either a Gamma or a Weibull distribution. The similarity of the forms between the SCD model and the SV model motivates some researchers to use an MCMC method for estimation. Strickland et al. (2006) developed a Bayesian MCMC method which employs a hybrid Gibbs/Metropolis–Hastings (MH) sampling scheme for estimating the SCD model. In this method, as proposed in Shephard and Pitt (1997), the state variables are divided into several blocks and are drawn in each block. Watanabe and Omori (2004) suggested that the blocking scheme proposed by Shephard and Pitt (1997) yields an estimation bias and Omori and Watanabe (2008) proposed a new efficient method for asymmetric stochastic volatility models.

The second characteristic of intraday transaction data is that they have an intraday seasonality. Trading frequency is known to be higher just after the start and just before the end of the market (i.e., shorter trading time intervals), which is called intraday seasonality. Engle and Russell (1998) estimated the ACD model using “diurnal adjusted data” in which diurnal seasonality was removed in advance by cubic spline. The log-ACD model proposed by Bauwens and Giot (2000) also estimates in two steps using cubic spline. Veredas et al. (2002) estimated the representation of intraday seasonality with a quartic kernel simultaneously in the log-ACD model. Brownlees and Vannucci (2013) proposed the mixed ACD model that incorporates the ACD model, intraday seasonality and daily random effect. The parameters of the mixed ACD model are estimated jointly by MCMC, and intraday seasonality is represented by a cubic B-Spline. In the SCD model, Bauwens and Veredas (2004) used a quartic kernel, and Feng et al. (2004) used a piecewise cubic spline to remove intraday seasonality in advance. Men et al. (2019) proposed the threshold stochastic conditional duration (TSCD) model. The model assumes that the AR(1) process switches between the two regimes, allowing us to consider the behavior of two types of traders with different behavioral criteria. Gingras and McCausland (2020) proposed the all-duration flexible SCD (FSCD) model. This is an extension of the SCD model that attempts to model transactions occurring at the same second with a mixture distribution and uses the B-Spline to capture intraday seasonality.

The main contributions of this study are twofold. First, we extend the simultaneous intraday seasonality estimation algorithm using MCMC within the SCD model. In the previous study by Strickland et al. (2006), the intraday seasonality was estimated simultaneously within the SCD model by MCMC; however, because a cubic smoothing spline was used, the MCMC loop contained no seasonality parameters to be estimated, and the roughness parameter was chosen by generalized cross-validation. In this study, intraday seasonality is estimated simultaneously by including the coefficients of the basis spline (B-Spline) in the MCMC loop. A B-Spline is a piecewise polynomial function driven by control points. This property allows smooth curves to be expressed without higher-order polynomials. In addition, since the curve is expressed as a linear combination of basis functions, it can be estimated in the same framework as the effect of external information, such as the limit order information. The second contribution is the treatment of the limit order information in the SCD model. Although the limit order information plays a very important role in capturing recent trends in financial markets, there have been only a few analyses using limit order information in the SCD model, such as Sugiura et al. (2015). In this paper, we estimate the effect of the limit order book information on the trading time interval using three variables: the limit order price difference (spread), the order volume, and the balance between buy and sell orders.

The structure of this paper is as follows. In Section 2, we introduce the SCD model. Section 3 describes an MCMC sampling method for this model. Section 4 provides estimates for three stocks listed on the TSE. Section 5 presents the conclusions.

2. Stochastic Conditional Duration Model

2.1. A Proposed Model

Defining $τ_{i}$ as the time of the $i$th transaction, the $i$th duration $y_{i}$ can be written as $y_{i} = τ_{i + 1} - τ_{i}$. The SCD model is a non-Gaussian state space model that treats the duration $y_{i}$ as the product of a positive latent factor, whose logarithm is the latent variable $α_{i}$, and a positive random variable $ε_{i}$. The state equation is a stationary AR(1) process for $α_{i}$ with noise $η_{i}$. The noise terms $ε_{i}$ and $η_{i}$ are mutually and serially independent in this model. In this paper, we consider the SCD model as

\begin{matrix} y_{i} & = exp (x_{i}^{'} β + α_{i}) ε_{i}, ε_{i} > 0, i \in {1, \dots, N}, \end{matrix}

(1)

\begin{matrix} \begin{matrix} α_{i} & = ϕ α_{i - 1} + η_{i}, η_{i} \sim N (0, σ^{2}), i \in {2, \dots, N}, \\ α_{1} & \sim N (0, σ^{2} / (1 - ϕ^{2})), | ϕ | < 1, \end{matrix} \end{matrix}

(2)

where the $(1 \times k)$ vector $x_{i}^{'}$ is the $i$th row of the $(N \times k)$ basis spline (B-Spline) matrix $X$, and the $(k \times 1)$ vector $β$ contains the control points of the B-Spline. The random variable $ε_{i}$ is assumed to follow a positive-valued continuous distribution. In the literature,

\begin{matrix} Weibull (γ, 1) : & p (ε_{i} | γ) = γ ε_{i}^{γ - 1} exp (- ε_{i}^{γ}), \\ Gamma (γ, γ) : & p (ε_{i} | γ) = \frac{γ^{γ}}{Γ (γ)} ε_{i}^{γ - 1} exp (- γ ε_{i}), \end{matrix}

are often used as the distribution of $ε_{i}$, and we follow this precedent. Thus, the probability density function (pdf) of $y_{i}$ is

p (y_{i} | α_{i}, β, γ) = \{\begin{matrix} γ y_{i}^{γ - 1} exp (- γ (x_{i}^{'} β + α_{i}) - y_{i}^{γ} exp (- γ (x_{i}^{'} β + α_{i}))), \\ (Weibull distribution), \\ \frac{γ^{γ}}{Γ (γ)} y_{i}^{γ - 1} exp (- γ (x_{i}^{'} β + α_{i} + y_{i} exp (- x_{i}^{'} β - α_{i}))), \\ (gamma distribution) . \end{matrix}

(3)
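As a concrete reading of (1) and (3), the following Python sketch evaluates the log density of a single duration under either error distribution. The function name `scd_log_density` and the argument `mu`, which stands for $x_{i}^{'} β + α_{i}$, are illustrative choices and do not come from the paper.

```python
import math


def scd_log_density(y, mu, gamma, dist="weibull"):
    """Log of p(y | alpha, beta, gamma) in (3), where mu = x'beta + alpha."""
    u = math.log(y) - mu  # log of the standardized duration y / exp(mu)
    if dist == "weibull":
        return math.log(gamma) - math.log(y) + gamma * u - math.exp(gamma * u)
    if dist == "gamma":
        return (gamma * math.log(gamma) - math.lgamma(gamma) - math.log(y)
                + gamma * (u - math.exp(u)))
    raise ValueError("dist must be 'weibull' or 'gamma'")
```

With $γ = 1$ both cases reduce to an exponential density with mean $\exp(\text{mu})$, which gives a quick sanity check of the two branches.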

2.2. Basis Spline (B-Spline)

To capture the intraday seasonality within the model, we follow Eilers and Marx (1996), Lang and Brezger (2004), Brownlees and Vannucci (2013) and Gingras and McCausland (2020) and use the B-Spline in the observation equation. A B-Spline is a piecewise polynomial function driven by control points. Owing to this property, a smooth curve can be expressed without using higher-order polynomials. In our model, the B-Spline is implemented as

x_{i}^{B^{'}} = (B_{1, n} (τ_{i}), \dots, B_{k, n} (τ_{i})), k = m - n - 1

(4)

where $m$ is the number of knots $t_{1} \leq t_{2} \leq \dots \leq t_{m}$, and $n$ is the order of the B-Spline. $B_{j, n} (τ_{i})$ is the basis function defined by the Cox–de Boor recursion formula (De Boor 1978) as

\begin{matrix} B_{j, 0} (τ_{i}) & = \{\begin{matrix} 1 if t_{j} \leq τ_{i} < t_{j + 1} \\ 0 otherwise, \end{matrix} \end{matrix}

(5)

\begin{matrix} B_{j, n} (τ_{i}) & = \frac{τ_{i} - t_{j}}{t_{j + n} - t_{j}} B_{j, n - 1} (τ_{i}) + \frac{t_{j + n + 1} - τ_{i}}{t_{j + n + 1} - t_{j + 1}} B_{j + 1, n - 1} (τ_{i}) \end{matrix}

(6)
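The recursion (5)–(6) translates almost line by line into code. The sketch below is a plain recursive version for illustration, not an efficient implementation; the names `bspline_basis` and `design_row`, the 0-based index `j`, and the convention of treating a term with a zero denominator as zero (needed when knots repeat) are assumptions of this example.

```python
def bspline_basis(j, n, t, knots):
    """B_{j,n}(t) by the Cox-de Boor recursion (5)-(6); j is 0-based here."""
    if n == 0:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = right = 0.0
    width_left = knots[j + n] - knots[j]
    if width_left > 0:  # assumed convention: a 0/0 term counts as 0
        left = (t - knots[j]) / width_left * bspline_basis(j, n - 1, t, knots)
    width_right = knots[j + n + 1] - knots[j + 1]
    if width_right > 0:
        right = ((knots[j + n + 1] - t) / width_right
                 * bspline_basis(j + 1, n - 1, t, knots))
    return left + right


def design_row(t, knots, n=3):
    """Row (B_{1,n}(t), ..., B_{k,n}(t)) of the design matrix, k = m - n - 1."""
    k = len(knots) - n - 1
    return [bspline_basis(j, n, t, knots) for j in range(k)]
```

For equally spaced knots, the entries of `design_row(t, knots)` are non-negative and sum to one for any `t` between the $(n+1)$th knot and the $(m-n)$th knot, which is the range where the basis is complete.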

2.3. Limit Order Book Information

In the same way, we add the order book information as extra regressors:

x_{i}^{'} = (x_{i}^{B^{'}}, d_{i 1}, d_{i 2}, d_{i 3})

(7)

where $d_{i 1}$ is the price spread, $d_{i 2}$ is the sum of the volumes at the best bid and ask prices, and $d_{i 3}$ is the bid–ask volume ratio, defined as the best bid volume divided by the best ask volume, minus one for centering. This design allows us to simultaneously estimate the regression coefficients $β$ on the diurnal seasonality and the effect of external information in the SCD model.
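A minimal sketch of how the three covariates in (7) could be computed from a snapshot of the best quotes is given below. The function and argument names are hypothetical, and the reading of the centered ratio as (bid volume / ask volume) − 1 is an interpretation of the verbal definition above.

```python
def order_book_covariates(best_bid, best_ask, bid_volume, ask_volume):
    """Return (d1, d2, d3): spread, volume at the best quotes, centered ratio."""
    spread = best_ask - best_bid              # d1: price spread
    depth = bid_volume + ask_volume           # d2: volume at best bid plus best ask
    ratio = bid_volume / ask_volume - 1.0     # d3: zero when both sides are balanced
    return spread, depth, ratio
```

For example, a book with best bid 100, best ask 101 and volumes 300 and 200 gives a spread of 1, a total volume of 500 and a centered ratio of 0.5.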

3. Estimation Method

3.1. Joint Posterior Distribution

The SCD model (1)–(2) is a non-linear non-Gaussian state space model where (1) is the measurement equation, (2) is the transition equation, and $α_{i}$, $i \in \{1, \dots, n\}$, are the state variables. Here $n$ denotes the number of observed durations (written $N$ in (1)–(2)). Since (2) is a stationary AR(1) model, the joint probability distribution of $α = [α_{1}; \dots; α_{n}]$ is $N (0, σ^{2} V^{- 1})$, where

V = [\begin{matrix} 1 & - ϕ & 0 & \dots & 0 \\ - ϕ & 1 + ϕ^{2} & - ϕ & ⋱ & ⋮ \\ 0 & ⋱ & ⋱ & ⋱ & 0 \\ ⋮ & ⋱ & - ϕ & 1 + ϕ^{2} & - ϕ \\ 0 & \dots & 0 & - ϕ & 1 \end{matrix}],

(8)

is a tridiagonal matrix that is positive definite as long as $| ϕ | < 1$. The priors of $β$, $γ$, $ϕ$ and $σ^{2}$ in our study are

\begin{matrix} β \sim N ({\bar{μ}}_{β}, {\bar{A}}_{β}^{- 1}), γ \sim Gamma (a_{γ}, b_{γ}), \\ \frac{ϕ + 1}{2} \sim Beta (a_{ϕ}, b_{ϕ}), σ^{2} \sim Inv . Gamma (a_{σ}, b_{σ}), \end{matrix}

(9)

where

\begin{matrix} N (μ, σ^{2}) : & p (x | μ, σ^{2}) = \frac{1}{\sqrt{2 π σ^{2}}} exp \{- \frac{{(x - μ)}^{2}}{2 σ^{2}}\}, \\ Gamma (α, β) : & p (x | α, β) = \frac{β^{α}}{Γ (α)} x^{α - 1} exp (- β x), x > 0, α, β > 0, \\ Beta (α, β) : & p (x | α, β) = \frac{Γ (α + β)}{Γ (α) Γ (β)} x^{α - 1} {(1 - x)}^{β - 1}, 0 < x < 1, α, β > 0, \\ Inv . Gamma (α, β) : & p (x | α, β) = \frac{β^{α}}{Γ (α)} x^{- α - 1} exp (- \frac{β}{x}), x > 0, α, β > 0 . \end{matrix}
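The claim that $σ^{2} V^{- 1}$ with $V$ from (8) is the covariance matrix of a stationary AR(1) process can be checked numerically. The sketch below, with illustrative helper names, builds $V$ and the familiar AR(1) covariance $ϕ^{| i - j |} / (1 - ϕ^{2})$ (for $σ^{2} = 1$) so that their product can be compared with the identity matrix.

```python
def ar1_precision(n, phi):
    """Tridiagonal matrix V in (8) as a list of rows (requires n >= 2)."""
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        V[i][i] = 1.0 if i in (0, n - 1) else 1.0 + phi ** 2
        if i + 1 < n:
            V[i][i + 1] = V[i + 1][i] = -phi
    return V


def ar1_covariance(n, phi):
    """Stationary AR(1) covariance with unit innovation variance, |phi| < 1."""
    return [[phi ** abs(i - j) / (1.0 - phi ** 2) for j in range(n)]
            for i in range(n)]
```

Multiplying the two matrices for, say, `n = 5` and `phi = 0.9` returns the identity up to rounding error, which is why the sparse matrix $V$ can replace the dense covariance matrix in the sampler.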

Therefore, the joint posterior density of $(α, β, γ, ϕ, σ^{2})$ for the SCD model (1)–(2) is

p (α, β, γ, ϕ, σ^{2} | y) \propto \prod_{i = 1}^{n} p (y_{i} | α_{i}, β, γ) \cdot p (α | ϕ, σ^{2}) \cdot p (β) p (γ) p (ϕ) p (σ^{2}),

(10)

where $p (y_{i} | α_{i}, β, γ)$ is the pdf of the Weibull or Gamma distribution in (3), $p (α | ϕ, σ^{2})$ is the pdf of $N (0, σ^{2} V^{- 1})$, and $p (β), \dots, p (σ^{2})$ are the pdfs of the priors in (9). Since analytical evaluation of the joint posterior distribution (10) is impractical, we apply an MCMC method.

In the MCMC method, we generate a random sample of $(α, β, γ, ϕ, σ^{2})$, say $\{(α^{(r)}, β^{(r)}, γ^{(r)}, ϕ^{(r)}, σ^{2 (r)})\}_{r = 1}^{m}$, from the joint posterior distribution (10), and numerically evaluate the posterior statistics necessary for Bayesian inference by Monte Carlo integration. The outline of the sampling scheme is as follows:

Step 1: Initialize $(α^{(0)}, β^{(0)}, γ^{(0)}, ϕ^{(0)}, σ^{2 (0)})$ and set the counter $r = 1$.
Step 2: Generate $α^{(r)}$ from $p (α | β^{(r - 1)}, γ^{(r - 1)}, ϕ^{(r - 1)}, σ^{2 (r - 1)}, y)$.
Step 3: Generate $β^{(r)}$ from $p (β | α^{(r)}, γ^{(r - 1)}, ϕ^{(r - 1)}, σ^{2 (r - 1)}, y)$.
Step 4: Generate $γ^{(r)}$ from $p (γ | α^{(r)}, β^{(r)}, ϕ^{(r - 1)}, σ^{2 (r - 1)}, y)$.
Step 5: Generate $ϕ^{(r)}$ from $p (ϕ | α^{(r)}, β^{(r)}, γ^{(r)}, σ^{2 (r - 1)}, y)$.
Step 6: Generate $σ^{2 (r)}$ from $p (σ^{2} | α^{(r)}, β^{(r)}, ϕ^{(r)}, y)$.
Step 7: Let $r = r + 1$, and go to Step 2 until the burn-in iterations are completed.
Step 8: Reset the counter $r = 1$, and repeat Steps 2–6 $m$ times to obtain $\{(α^{(r)}, β^{(r)}, γ^{(r)}, ϕ^{(r)}, σ^{2 (r)})\}_{r = 1}^{m}$.
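The control flow of Steps 1–8 can be sketched as a generic loop. This is only a skeleton under stated assumptions: `run_sampler` and its arguments are hypothetical names, and each update function stands in for one of the conditional draws in Steps 2–6, which are described in the following subsections.

```python
def run_sampler(state, updates, burn_in, m):
    """Cycle through the block updates and keep the m post-burn-in states.

    state   -- dict holding the current values, e.g. keys 'alpha', 'beta', ...
    updates -- ordered list of (name, fn); fn(state) returns a new value for
               state[name] given the current values of all other blocks
    """
    draws = []
    for r in range(burn_in + m):
        for name, update in updates:      # Steps 2-6, in order
            state[name] = update(state)
        if r >= burn_in:                  # Step 8: store draws after burn-in
            draws.append(dict(state))     # shallow copy of the current state
    return draws
```

Because only a shallow copy is stored, each update function should return a new object rather than modify the current value in place.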

3.2. State Variables

The conditional posterior density of the state variables $α$ is

p (α | β, γ, ϕ, σ^{2}, y) \propto \prod_{i = 1}^{n} p (y_{i} | α_{i}, β, γ) \cdot p (α | ϕ, σ^{2}) .

(11)

First we consider the second-order Taylor expansion of the log likelihood

\begin{matrix} ℓ (α) \equiv \sum_{i = 1}^{n} log p (y_{i} | α_{i}, β, γ), \\ log p (y_{i} | α_{i}, β, γ) = \\ \{\begin{matrix} log γ - log y_{i} + γ (log y_{i} - x_{i}^{'} β - α_{i}) - exp (γ (log y_{i} - x_{i}^{'} β - α_{i})), \\ (Weibull distribution), \\ γ log γ - log Γ (γ) - log y_{i} + γ (log y_{i} - x_{i}^{'} β - α_{i} - exp (log y_{i} - x_{i}^{'} β - α_{i})), \\ (Gamma distribution), \end{matrix} \end{matrix}

(12)

in the neighborhood of $α^{*} = [α_{1}^{*}; \dots; α_{n}^{*}] \in R^{n}$:

ℓ (α) \approx ℓ (α^{*}) + g {(α^{*})}^{'} (α - α^{*}) - \frac{1}{2} {(α - α^{*})}^{'} Q (α^{*}) (α - α^{*}),

(13)

where

\begin{matrix} g (α^{*}) & \equiv \nabla_{α} ℓ (α^{*}) = [g_{1} (α^{*}); \dots; g_{n} (α^{*})], \\ Q (α^{*}) & \equiv - \nabla_{α} \nabla_{α}^{'} ℓ (α^{*}) = diag {q_{1} (α^{*}), \dots, q_{n} (α^{*})}, \end{matrix}

and

\begin{matrix} g_{i} (α^{*}) & = \{\begin{matrix} - γ + γ exp (γ (log y_{i} - x_{i}^{'} β - α_{i}^{*})), & (Weibull distribution), \\ - γ + γ exp (log y_{i} - x_{i}^{'} β - α_{i}^{*}), & (Gamma distribution), \end{matrix} \\ q_{i} (α^{*}) & = \{\begin{matrix} γ^{2} exp (γ (log y_{i} - x_{i}^{'} β - α_{i}^{*})), & (Weibull distribution), \\ γ exp (log y_{i} - x_{i}^{'} β - α_{i}^{*}), & (Gamma distribution), \end{matrix} \\ i \in {1, \dots, n} . \end{matrix}

Note that $Q (α^{*})$ is always positive definite, since it is diagonal with strictly positive entries.

Since the log prior density of $α$ is

log p (α) = - \frac{n}{2} log (2 π σ^{2}) + \frac{1}{2} log (1 - ϕ^{2}) - \frac{1}{2 σ^{2}} α^{'} V α,

(14)

the conditional posterior density of $α$ can be approximated by

\begin{matrix} p (α | β, γ, ϕ, σ^{2}, y) \\ = C exp [ℓ (α) + log p (α)] \\ \approx C exp [ℓ (α^{*}) + g {(α^{*})}^{'} (α - α^{*}) - \frac{1}{2} {(α - α^{*})}^{'} Q (α^{*}) (α - α^{*}) + log p (α)] \\ = C exp [ℓ (α^{*}) - \frac{n}{2} log (2 π σ^{2}) + \frac{1}{2} log (1 - ϕ^{2})] \\ \times exp [g {(α^{*})}^{'} (α - α^{*}) - \frac{1}{2} {(α - α^{*})}^{'} Q (α^{*}) (α - α^{*}) - \frac{1}{2 σ^{2}} α^{'} V α], \end{matrix}

(15)

where $C$ is the normalizing constant. By completing the square in (15), we have

\begin{matrix} g {(α^{*})}^{'} (α - α^{*}) - \frac{1}{2} {(α - α^{*})}^{'} Q (α^{*}) (α - α^{*}) - \frac{1}{2 σ^{2}} α^{'} V α \\ = - \frac{1}{2} {(α - μ_{α} (α^{*}))}^{'} Σ_{α} {(α^{*})}^{- 1} (α - μ_{α} (α^{*})) \\ + \frac{1}{2} g {(α^{*})}^{'} Q {(α^{*})}^{- 1} g (α^{*}) \\ - \frac{1}{2} {(g (α^{*}) + Q (α^{*}) α^{*})}^{'} (Q {(α^{*})}^{- 1} - Σ_{α} (α^{*})) (g (α^{*}) + Q (α^{*}) α^{*}), \end{matrix}

(16)

where

Σ_{α} (α^{*}) = {(\frac{1}{σ^{2}} V + Q (α^{*}))}^{- 1}, μ_{α} (α^{*}) = Σ_{α} (α^{*}) (g (α^{*}) + Q (α^{*}) α^{*}) .

Only the first term on the right-hand side of (16) depends on $α$. Therefore, the right-hand side of (15) is proportional to the pdf of the following normal distribution:

α \sim N (μ_{α} (α^{*}), Σ_{α} (α^{*})) .

(17)

Provided that the approximation (15) is sufficiently accurate, we apply the Metropolis–Hastings (MH) algorithm to generate $α$ from the conditional posterior distribution (11) by using (17) as the proposal distribution.

In practice, however, we need to address two issues:

(i) The choice of $α^{*}$ is crucial to make the approximation (15) workable.
(ii) The acceptance rate of the MH algorithm tends to be low when $α$ is a high-dimensional vector.

We address the former issue by using the mode of the conditional posterior density as $α^{*}$. The search for the mode is performed by the following recursion:

Step 1: Initialize $α^{* (0)}$, and set the counter $r = 1$.
Step 2: Update $α^{* (r)}$ by $α^{* (r)} = μ_{α} (α^{* (r - 1)})$.
Step 3: Let $r = r + 1$ and go to Step 2 unless ${max}_{i = 1, \dots, n} | α_{i}^{* (r)} - α_{i}^{* (r - 1)} |$ is less than the preset tolerance level.

It turns out that the above algorithm is equivalent to the Newton–Raphson method, and in our experience, it mostly attains convergence in a few iterations.
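The mode search can be sketched for the Weibull case as follows. Because $\frac{1}{σ^{2}} V + Q (α^{*})$ is tridiagonal, each update $μ_{α} (α^{*})$ only needs a tridiagonal solve, for which this example uses the Thomas algorithm. The function names, the zero starting value and the argument `mu` (holding $x_{i}^{'} β$) are assumptions of the sketch rather than details given in the paper.

```python
import math


def solve_tridiagonal(diag, off, rhs):
    """Solve A x = rhs for symmetric tridiagonal A with main diagonal `diag`
    and constant off-diagonal `off` (Thomas algorithm)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def newton_update(alpha, y, mu, gamma, phi, sigma2):
    """One step alpha* -> mu_alpha(alpha*) for the Weibull case (n >= 2)."""
    n = len(y)
    e = [math.exp(gamma * (math.log(y[i]) - mu[i] - alpha[i])) for i in range(n)]
    g = [gamma * (e[i] - 1.0) for i in range(n)]       # g_i(alpha*)
    q = [gamma ** 2 * e[i] for i in range(n)]          # q_i(alpha*)
    v_diag = [1.0 if i in (0, n - 1) else 1.0 + phi ** 2 for i in range(n)]
    diag = [v_diag[i] / sigma2 + q[i] for i in range(n)]
    rhs = [g[i] + q[i] * alpha[i] for i in range(n)]
    return solve_tridiagonal(diag, -phi / sigma2, rhs)


def find_mode(y, mu, gamma, phi, sigma2, tol=1e-8, max_iter=100):
    """Iterate Steps 1-3 until the largest change falls below `tol`."""
    alpha = [0.0] * len(y)
    for _ in range(max_iter):
        new = newton_update(alpha, y, mu, gamma, phi, sigma2)
        step = max(abs(a - b) for a, b in zip(new, alpha))
        alpha = new
        if step < tol:
            break
    return alpha
```

At the returned value, one further call to `newton_update` leaves the vector essentially unchanged, which is the fixed-point property $α^{*} = μ_{α} (α^{*})$ that characterizes the mode.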

We address the latter issue by applying a so-called block sampler. In the block sampler, we randomly partition $α$ into several sub-vectors (blocks), generate each block $α_{b}$ from its conditional distribution given the rest of the blocks $α_{r}$, and apply the MH algorithm to each generated block. For a given block, we partition $α$, $α^{*}$ and the precision matrix conformably as

\begin{matrix} α = (\begin{matrix} α_{b} \\ α_{r} \end{matrix}), α^{*} = (\begin{matrix} α_{b}^{*} \\ α_{r}^{*} \end{matrix}), Σ_{α} {(α^{*})}^{- 1} = (\begin{matrix} Ω_{b b} & Ω_{b r} \\ Ω_{r b} & Ω_{r r} \end{matrix}) \end{matrix}

(18)

Using (17), and recalling that $μ_{α} (α^{*}) = α^{*}$ when $α^{*}$ is the mode, the kernel of the conditional distribution $p (α_{b} ∣ α_{r})$ is obtained from the quadratic form

\begin{matrix} {(α - α^{*})}^{'} Σ_{α} {(α^{*})}^{- 1} (α - α^{*}) \\ = {(\begin{matrix} α_{b} - α_{b}^{*} \\ α_{r} - α_{r}^{*} \end{matrix})}^{'} (\begin{matrix} Ω_{b b} & Ω_{b r} \\ Ω_{r b} & Ω_{r r} \end{matrix}) (\begin{matrix} α_{b} - α_{b}^{*} \\ α_{r} - α_{r}^{*} \end{matrix}) \\ = {(α_{b} - α_{b}^{*})}^{'} Ω_{b b} (α_{b} - α_{b}^{*}) + 2 {(α_{r} - α_{r}^{*})}^{'} Ω_{r b} (α_{b} - α_{b}^{*}) \\ + {(α_{r} - α_{r}^{*})}^{'} Ω_{r r} (α_{r} - α_{r}^{*}) . \end{matrix}

(19)

By completing the square in (19) with respect to $α_{b}$, we have

\begin{matrix} {(α_{b} - α_{b}^{*})}^{'} Ω_{b b} (α_{b} - α_{b}^{*}) + 2 {(α_{r} - α_{r}^{*})}^{'} Ω_{r b} (α_{b} - α_{b}^{*}) + {(α_{r} - α_{r}^{*})}^{'} Ω_{r r} (α_{r} - α_{r}^{*}) \\ = {(α_{b} - μ_{α_{b}} (α^{*}))}^{'} Σ_{α_{b}} {(α^{*})}^{- 1} (α_{b} - μ_{α_{b}} (α^{*})) \\ + {(α_{r} - α_{r}^{*})}^{'} (Ω_{r r} - Ω_{r b} Ω_{b b}^{- 1} Ω_{b r}) (α_{r} - α_{r}^{*}), \end{matrix}

(20)

where

μ_{α_{b}} (α^{*}) = α_{b}^{*} - Ω_{b b}^{- 1} Ω_{b r} (α_{r} - α_{r}^{*}), Σ_{α_{b}} (α^{*}) = Ω_{b b}^{- 1} .

The last term in (20) does not depend on $α_{b}$. Therefore, as a function of $α_{b}$, the right-hand side of (20) is the kernel of the pdf of the following normal distribution:

α_{b} \sim N (μ_{α_{b}} (α^{*}), Σ_{α_{b}} (α^{*})) .

(21)

3.3. Regression Coefficients

The simulation strategy for the regression coefficients $β$ is almost identical to the one for the state variables $α$. The second-order Taylor expansion of the log likelihood (12) with respect to $β$ in the neighborhood of $β^{*} \in R^{k}$ is

ℓ (β) \approx ℓ (β^{*}) + g {(β^{*})}^{'} (β - β^{*}) - \frac{1}{2} {(β - β^{*})}^{'} Q (β^{*}) (β - β^{*}),

(22)

where

\begin{matrix} g (β^{*}) & \equiv \nabla_{β} ℓ (β^{*}) = \sum_{i = 1}^{n} g_{i} (β^{*}) x_{i}, \\ Q (β^{*}) & \equiv - \nabla_{β} \nabla_{β}^{'} ℓ (β^{*}) = \sum_{i = 1}^{n} q_{i} (β^{*}) x_{i} x_{i}^{'}, \end{matrix}

and

\begin{matrix} g_{i} (β^{*}) & = \{\begin{matrix} - γ + γ exp (γ (log y_{i} - x_{i}^{'} β^{*} - α_{i})), & (Weibull distribution), \\ - γ + γ exp (log y_{i} - x_{i}^{'} β^{*} - α_{i}), & (Gamma distribution), \end{matrix} \\ q_{i} (β^{*}) & = \{\begin{matrix} γ^{2} exp (γ (log y_{i} - x_{i}^{'} β^{*} - α_{i})), & (Weibull distribution), \\ γ exp (log y_{i} - x_{i}^{'} β^{*} - α_{i}), & (Gamma distribution), \end{matrix} \\ i \in {1, \dots, n} . \end{matrix}

Note that $Q (β^{*})$ is positive definite whenever $X$ has full column rank, because every $q_{i} (β^{*})$ is strictly positive.

With the prior $β \sim N ({\bar{μ}}_{β}, {\bar{A}}_{β}^{- 1})$, the conditional posterior density of $β$ can be approximated by

\begin{matrix} p (β | α, γ, ϕ, σ^{2}, y) \\ = C exp [ℓ (β) + log p (β)] \\ \approx C exp [ℓ (β^{*}) - \frac{k}{2} log (2 π) + \frac{1}{2} log | {\bar{A}}_{β} |] \\ \times exp [g {(β^{*})}^{'} (β - β^{*}) - \frac{1}{2} {(β - β^{*})}^{'} Q (β^{*}) (β - β^{*}) - \frac{1}{2} {(β - {\bar{μ}}_{β})}^{'} {\bar{A}}_{β} (β - {\bar{μ}}_{β})] . \end{matrix}

(23)

By completing the square as in (16), the proposal distribution for the MH algorithm is derived as

β \sim N (μ_{β} (β^{*}), Σ_{β} (β^{*})),

(24)

where

Σ_{β} (β^{*}) = {({\bar{A}}_{β} + Q (β^{*}))}^{- 1}, μ_{β} (β^{*}) = Σ_{β} (β^{*}) ({\bar{A}}_{β} {\bar{μ}}_{β} + g (β^{*}) + Q (β^{*}) β^{*}) .

The search algorithm for $β^{*}$ is the same as that for $α^{*}$. Since the dimension of $β$ is considerably smaller than that of $α$, in our experience it is not necessary to apply the block sampler.

3.4. Shape Parameter

The sampling strategy for the shape parameter $γ$ is also the same as that for $α$ and $β$. Since the prior of $γ$ is not Gaussian, instead of the log likelihood (12), we consider the second-order Taylor expansion of the log conditional posterior density of $γ$:

f (γ) \equiv \sum_{i = 1}^{n} log p (y_{i} | α_{i}, β, γ) + log p (γ) + constant,

with respect to $γ$ in the neighborhood of $γ^{*} > 0$, i.e.,

f (γ) \approx f (γ^{*}) + g (γ^{*}) (γ - γ^{*}) - \frac{1}{2} q (γ^{*}) {(γ - γ^{*})}^{2},

(25)

where

\begin{matrix} g (γ^{*}) & \equiv \nabla_{γ} f (γ^{*}) \\ = \{\begin{matrix} \frac{n}{γ^{*}} + \sum_{i = 1}^{n} (u_{i} - u_{i} e^{γ^{*} u_{i}}) + \frac{a_{γ} - 1}{γ^{*}} - b_{γ}, \\ (Weibull distribution), \\ n + n log γ^{*} - n ψ^{(0)} (γ^{*}) + \sum_{i = 1}^{n} (u_{i} - e^{u_{i}}) + \frac{a_{γ} - 1}{γ^{*}} - b_{γ}, \\ (Gamma distribution), \end{matrix} \\ q (γ^{*}) & \equiv - \nabla_{γ}^{2} f (γ^{*}) \\ = \{\begin{matrix} \frac{n}{γ^{* 2}} + \sum_{i = 1}^{n} u_{i}^{2} e^{γ^{*} u_{i}} + \frac{a_{γ} - 1}{γ^{* 2}}, & (Weibull distribution), \\ - \frac{n}{γ^{*}} + n ψ^{(1)} (γ^{*}) + \frac{a_{γ} - 1}{γ^{* 2}}, & (Gamma distribution), \end{matrix} \\ u_{i} & \equiv log y_{i} - x_{i}^{'} β - α_{i}, i \in {1, \dots, n}, \end{matrix}

and $ψ^{(s)}$ is the polygamma function of order $s$. For the Weibull distribution, $q (γ^{*})$ is positive for any $γ^{*} > 0$ if $n + a_{γ} > 1$, which is satisfied in any application. To prove $q (γ^{*}) > 0$ for the Gamma distribution, we use the following formula (Equation (6.1.38) in Abramowitz et al. (1988)):

log Γ (γ) = \frac{1}{2} log (2 π) + (γ - \frac{1}{2}) log γ - γ + \frac{θ}{12 γ}, γ > 0, 0 < θ < 1 .

(26)

Differentiating both sides of (26) twice with respect to $γ$, we have

ψ^{(1)} (γ) = \nabla_{γ}^{2} log Γ (γ) = \frac{1}{γ} + \frac{1}{2 γ^{2}} + \frac{θ}{6 γ^{3}} .

(27)

Replacing $ψ^{(1)}$ in $q (γ^{*})$ with (27), we have

q (γ^{*}) = \frac{n + 2 a_{γ} - 2}{2 γ^{* 2}} + \frac{n θ}{6 γ^{* 3}},

(28)

which is positive if $n + 2 a_{γ} > 2$. This condition is always satisfied in practice.

We directly apply the completing-the-square technique to (25) and obtain the proposal distribution for the MH algorithm as

γ \sim N (μ_{γ} (γ^{*}), σ_{γ}^{2} (γ^{*})),

(29)

where

σ_{γ}^{2} (γ^{*}) = \frac{1}{q (γ^{*})}, μ_{γ} (γ^{*}) = γ^{*} + \frac{g (γ^{*})}{q (γ^{*})} .

If we use the mode of $f (γ)$ as $γ^{*}$, then $g (γ^{*}) = 0$ holds by the first-order condition, and the mode is unique because $f (γ)$ is globally concave. Thus $μ_{γ} (γ^{*})$ is effectively identical to $γ^{*}$. The search algorithm for $γ^{*}$ is identical to those for $α^{*}$ and $β^{*}$ except that $γ^{*}$ is a scalar. The implementation of the MH algorithm with the proposal distribution (29) is straightforward.
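For the Weibull case, the mode search and the resulting proposal variance can be sketched as a scalar Newton iteration built from $g (γ^{*})$ and $q (γ^{*})$ above. The function names, the starting value and the halving safeguard that keeps the iterate positive are assumptions of this example; `u` holds the values $u_{i} = log y_{i} - x_{i}^{'} β - α_{i}$.

```python
import math


def g_weibull(gam, u, a_gamma, b_gamma):
    """First derivative of f(gamma) for the Weibull case."""
    n = len(u)
    return (n / gam + sum(ui - ui * math.exp(gam * ui) for ui in u)
            + (a_gamma - 1.0) / gam - b_gamma)


def q_weibull(gam, u, a_gamma):
    """Negative second derivative of f(gamma) for the Weibull case."""
    n = len(u)
    return (n / gam ** 2 + sum(ui ** 2 * math.exp(gam * ui) for ui in u)
            + (a_gamma - 1.0) / gam ** 2)


def gamma_proposal(u, a_gamma=1.0, b_gamma=1.0, start=1.0, tol=1e-10, max_iter=200):
    """Return (mode, variance) of the normal proposal (29)."""
    gam = start
    for _ in range(max_iter):
        new = gam + g_weibull(gam, u, a_gamma, b_gamma) / q_weibull(gam, u, a_gamma)
        if new <= 0.0:          # safeguard (assumption): stay in gamma > 0
            new = gam / 2.0
        converged = abs(new - gam) < tol
        gam = new
        if converged:
            break
    return gam, 1.0 / q_weibull(gam, u, a_gamma)
```

The returned pair gives the mean and variance of the normal proposal; the gradient evaluated at the returned mode should be numerically zero.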

3.5. AR(1) Coefficient and Variance

Once the state variables $α$ are generated, the conditional posterior density of $ϕ$ is given by

\begin{matrix} p (ϕ | y, α, σ^{2}) & \propto \sqrt{1 - ϕ^{2}} exp [- \frac{(1 - ϕ^{2}) α_{1}^{2} + \sum_{i = 2}^{n} {(α_{i} - ϕ α_{i - 1})}^{2}}{2 σ^{2}}] \\ \times {(1 + ϕ)}^{a_{ϕ} - 1} {(1 - ϕ)}^{b_{ϕ} - 1} . \end{matrix}

(30)

By expanding the quadratic terms and completing the square in $ϕ$, we have

\begin{matrix} (1 - ϕ^{2}) α_{1}^{2} + \sum_{i = 2}^{n} {(α_{i} - ϕ α_{i - 1})}^{2} \\ = (1 - ϕ^{2}) α_{1}^{2} + \sum_{i = 2}^{n} α_{i}^{2} - 2 ϕ \sum_{i = 2}^{n} α_{i} α_{i - 1} + ϕ^{2} \sum_{i = 2}^{n} α_{i - 1}^{2} \\ = α_{1}^{2} + \sum_{i = 2}^{n} α_{i}^{2} - 2 ϕ \sum_{i = 2}^{n} α_{i} α_{i - 1} + ϕ^{2} \sum_{i = 2}^{n - 1} α_{i}^{2} \\ = \sum_{i = 2}^{n - 1} α_{i}^{2} {(ϕ - \frac{\sum_{i = 2}^{n} α_{i} α_{i - 1}}{\sum_{i = 2}^{n - 1} α_{i}^{2}})}^{2} + \sum_{i = 1}^{n} α_{i}^{2} - \frac{{(\sum_{i = 2}^{n} α_{i} α_{i - 1})}^{2}}{\sum_{i = 2}^{n - 1} α_{i}^{2}} . \end{matrix}

With the above expression in mind, we use the following normal distribution:

ϕ \sim N (\frac{\sum_{i = 2}^{n} α_{i} α_{i - 1}}{\sum_{i = 2}^{n - 1} α_{i}^{2}}, \frac{σ^{2}}{\sum_{i = 2}^{n - 1} α_{i}^{2}}),

(31)

as the proposal distribution in the MH algorithm for $ϕ$.
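One MH update of $ϕ$ with the proposal (31) can be sketched as an independence sampler: the candidate is drawn from (31) and accepted with probability determined by the ratio of target (30) to proposal density at the candidate and at the current value. All function names here are illustrative, and the current value of `phi` is assumed to lie strictly inside $(-1, 1)$.

```python
import math
import random


def phi_proposal_moments(alpha, sigma2):
    """Mean and variance of the normal proposal (31)."""
    n = len(alpha)
    cross = sum(alpha[i] * alpha[i - 1] for i in range(1, n))
    inner = sum(alpha[i] ** 2 for i in range(1, n - 1))  # alpha_2 .. alpha_{n-1}
    return cross / inner, sigma2 / inner


def log_target_phi(phi, alpha, sigma2, a_phi, b_phi):
    """Log of (30) up to a constant; minus infinity outside (-1, 1)."""
    if not -1.0 < phi < 1.0:
        return -math.inf
    n = len(alpha)
    quad = ((1.0 - phi ** 2) * alpha[0] ** 2
            + sum((alpha[i] - phi * alpha[i - 1]) ** 2 for i in range(1, n)))
    return (0.5 * math.log(1.0 - phi ** 2) - quad / (2.0 * sigma2)
            + (a_phi - 1.0) * math.log(1.0 + phi)
            + (b_phi - 1.0) * math.log(1.0 - phi))


def mh_step_phi(phi, alpha, sigma2, a_phi, b_phi, rng=random):
    """Return the next value of phi (the candidate if accepted, else phi)."""
    mean, var = phi_proposal_moments(alpha, sigma2)
    cand = rng.gauss(mean, math.sqrt(var))

    def weight(x):  # log target minus log proposal kernel
        return (log_target_phi(x, alpha, sigma2, a_phi, b_phi)
                + (x - mean) ** 2 / (2.0 * var))

    log_ratio = weight(cand) - weight(phi)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return cand
    return phi
```

Candidates outside $(-1, 1)$ receive zero target density and are therefore always rejected, so the stationarity restriction is preserved without truncating the proposal.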

For the variance $σ^{2}$, the exact expression of the conditional posterior distribution is available. With the inverse Gamma prior, it is derived as

σ^{2} | y, α, ϕ \sim Inv . Gamma (a_{σ} + \frac{n}{2}, b_{σ} + \frac{1}{2} α^{'} V α) .

(32)
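Since (32) is a standard distribution, $σ^{2}$ can be drawn directly. The sketch below uses the fact that the reciprocal of a Gamma variate with shape $a$ and rate $b$ follows the inverse Gamma distribution with the same parameters. The function names are illustrative, and the quadratic form $α^{'} V α$ is computed without forming $V$.

```python
import random


def quad_form_V(alpha, phi):
    """alpha' V alpha for the tridiagonal V in (8)."""
    n = len(alpha)
    return ((1.0 - phi ** 2) * alpha[0] ** 2
            + sum((alpha[i] - phi * alpha[i - 1]) ** 2 for i in range(1, n)))


def draw_sigma2(alpha, phi, a_sigma, b_sigma, rng=random):
    """Draw sigma^2 from the inverse Gamma distribution in (32)."""
    shape = a_sigma + len(alpha) / 2.0
    rate = b_sigma + 0.5 * quad_form_V(alpha, phi)
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

Writing $α^{'} V α$ as $(1 - ϕ^{2}) α_{1}^{2} + \sum_{i = 2}^{n} {(α_{i} - ϕ α_{i - 1})}^{2}$ keeps the cost linear in $n$.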

3.6. Generalized Gibbs Sampler

To facilitate convergence of the Markov chain sampling path, we utilize the generalized Gibbs sampler of Liu and Sabatti (2000). To do so, we first apply the following reparameterization:

ξ = \frac{1}{γ}, τ = \sqrt{σ^{2}},

to the posterior distribution of $(α, β, γ, ϕ, σ^{2})$. Then, the pdf of the reparameterized posterior distribution is given by

f (α, β, ξ, ϕ, τ | y) = p (α, β, ξ^{- 1}, ϕ, τ^{2} | y) | J |,

(33)

where $p (α, β, γ, ϕ, σ^{2} | y)$ is the pdf of the original posterior distribution, and $| J | = ξ^{- 2} \times 2 τ$ is the Jacobian. Next we introduce the scale transformation $(c α, c β, c ξ, c τ)$, $c > 0$. In the generalized Gibbs sampler, we generate a random scale $c$ from its conditional posterior distribution and apply the scale transformation to the parameters so that they are updated more frequently and move more freely around the parameter space. The conditional posterior density of $c$ is proportional to

{| c |}^{n + k + 1} f (c α, c β, c ξ, ϕ, c τ | y),

(34)

where $n + k + 1$ is the number of transformed parameters minus 1. The outline of the generalized Gibbs sampler is summarized as follows.

Step 1: Obtain $(α^{(r)}, β^{(r)}, γ^{(r)}, σ^{2 (r)})$ at the $r$-th run of the sampling scheme.
Step 2: Generate $c$ from (34) with the MH algorithm.
Step 3: Transform the parameters as $(c α^{(r)}, c β^{(r)}, γ^{(r)} / c, c^{2} σ^{2 (r)})$.

At the end of every cycle in the sampling scheme, we execute Steps 1–3. Since the pdf of $c$ is non-standard, we use the MH algorithm to generate $c$ in Step 2 in the same manner as for $(α, β, γ)$, and we execute Step 3 only if the newly generated $c$ is accepted in the MH algorithm.
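Step 3 is a deterministic rescaling once $c$ has been accepted. The small sketch below, with the hypothetical name `rescale`, shows how the transformation acts on the original parameters: scaling $ξ = 1 / γ$ by $c$ divides $γ$ by $c$, and scaling $τ = \sqrt{σ^{2}}$ by $c$ multiplies $σ^{2}$ by $c^{2}$.

```python
import math


def rescale(alpha, beta, gamma, sigma2, c):
    """Apply the scale transformation of Step 3 for an accepted c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    new_alpha = [c * a for a in alpha]
    new_beta = [c * b for b in beta]
    return new_alpha, new_beta, gamma / c, c ** 2 * sigma2
```

The AR(1) coefficient $ϕ$ is left untouched, consistent with (34), where it is the only argument not multiplied by $c$.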

4. Empirical Applications

4.1. Data description

The proposed SCD model is estimated with tick data of Toyota Motor Corporation (TMC) (Aichi, Japan), Asahi Group Holdings, Ltd. (AGH) (Tokyo, Japan) and Japan Airlines Co., Ltd. (JAL) (Tokyo, Japan) for 4 January 2016, 23 June 2016 and 27 July 2016 obtained from Nikkei NEEDS. The dates are chosen because 4 January was the first trading day of the year, 23 June was the day before the Brexit vote and 27 July was the day before the Bank of Japan’s policy meeting.

The daily trading session on the TSE, where these three companies are listed, is divided into two parts: the morning session from 9:00 to 11:30 and the afternoon session from 12:30 to 15:00. In this study, only the morning session is analyzed, with 9:00 normalized to $t = 0$ and 11:30 to $t = 1$. As in the study by Bauwens and Veredas (2004), we exclude trades recorded at the same time as a preceding trade. This is based on the assumption that simultaneously recorded trades arise when one large order is executed against several small orders on the book, i.e., they are caused by a single event. Since the TSE tick data are recorded in microseconds, the percentage of null durations is very small, as shown in Table 1. By comparison, Veredas et al. (2002) find that 26.5% of the durations in their data are null.
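The preprocessing described above can be sketched as follows. The function name, the use of seconds since midnight as the time unit, and the decision to keep one event per distinct timestamp are assumptions of this example; the actual Nikkei NEEDS record layout is not specified here.

```python
def morning_durations(timestamps, open_s=9 * 3600, close_s=11.5 * 3600):
    """Return (tau, y): normalized trade times in [0, 1] and positive durations.

    timestamps -- trade times in seconds since midnight (assumed unit)
    """
    ts = sorted(t for t in timestamps if open_s <= t <= close_s)
    tau, y = [], []
    for t0, t1 in zip(ts, ts[1:]):
        if t1 > t0:  # drop null durations from simultaneously recorded trades
            tau.append((t0 - open_s) / (close_s - open_s))
            y.append(t1 - t0)
    return tau, y
```

Here `tau` can be fed to the B-Spline basis to build the seasonal regressors, while `y` contains the durations $y_{i} = τ_{i + 1} - τ_{i}$ entering the observation equation.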

4.2. Empirical Results

The results in this section are produced with 20,000 iterations and a burn-in of 10,000 iterations. The hyperparameters are set as

(a_{ϕ}, b_{ϕ}) = (245, 5), (a_{σ}, b_{σ}) = (5, 0.04), (a_{γ}, b_{γ}) = (1, 1) .

Since our proposed models include the parameters of the B-Spline, namely the smoothing parameter $λ$ and the number of knots $k$, we need to choose a model from the candidate values $λ \in \{5, 10, 100\}$ and $k \in \{150, 30, 15, 10\}$. These values of $k$ correspond to placing knots every 1, 5, 10 and 15 minutes, respectively, over the 150-minute morning session. As shown in Table 1, the trading volume of TMC is several times larger than that of the other two companies, AGH and JAL; thus we use $k \in \{150, 30, 15\}$ for TMC and $k \in \{30, 15, 10\}$ for AGH and JAL as candidates. We use the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002) as the criterion for model selection.
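For reference, the DIC of Spiegelhalter et al. (2002) combines the posterior mean deviance $\bar{D}$ with the effective number of parameters $p_{D} = \bar{D} - D (\bar{θ})$, where $D (\bar{θ})$ is the deviance evaluated at the posterior mean of the parameters. A minimal sketch, assuming the deviance (minus twice the log likelihood) has already been evaluated at each MCMC draw, is shown below; the function and argument names are illustrative.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from deviances evaluated at the MCMC draws."""
    d_bar = sum(deviance_draws) / len(deviance_draws)   # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean            # effective number of parameters
    return d_bar + p_d, p_d
```

Smaller values of the DIC indicate a preferred model, which is how the candidate settings of $λ$, $k$ and the error distribution are compared.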

Tables 2–7 show the DIC for each setting of TMC on 4 January 2016, 23 June 2016 and 27 July 2016. The best setting for each distribution is underlined, and the best overall is in bold. Using the settings in bold, we report the MCMC results for the limit order book information and plots of intraday seasonality. The estimated parameters for the limit order book information are reported in Tables 8–10. The spread parameter is positive, which is intuitive: the larger the spread, the longer the trading interval. The effect of trading volume is negative only on 27 July. The effect of the bid–ask ratio is positive only on 23 June. The 95% credible intervals for the other parameters include 0; thus no impact can be claimed. Figures 1–3 display the curve of intraday seasonality on 4 January, 23 June and 27 July 2016. Figure 1 takes the typical inverted U-shape; however, Figure 3 shows that trading becomes sparse toward noon. This is likely due to a change in trading behavior ahead of the Bank of Japan’s policy meeting.

Table 11, Table 12, Table 13, Table 14, Table 15 and Table 16 show the DIC for each setting for AGH on 4 January 2016, 23 June 2016 and 27 July 2016. In Table 17, Table 18 and Table 19, as in the case of TMC, the effect of the spread is positive on all dates. In other words, the larger the spread, the longer the interval until the next transaction. The trading volume parameters are negative on 4 January and 27 July, while the bid–ask ratio does not appear to affect the interval on any date. The negative impact of trading volume indicates that traders may tend to hesitate or wait and see when large-volume limit orders are placed. Figure 4, Figure 5 and Figure 6 display the intraday seasonality curves on 4 January, 23 June and 27 July 2016. The credible intervals on 4 January and 23 June are wider, most likely because of the small sample sizes shown in Table 1.

Table 20, Table 21, Table 22, Table 23, Table 24 and Table 25 show the DIC for each setting for JAL on 4 January 2016, 23 June 2016 and 27 July 2016. The estimates of the limit order book information parameters in Table 26, Table 27 and Table 28 are broadly similar to those for TMC and AGH. Specifically, the spread parameters are positive on all dates, the volume parameters are negative on 23 June and 27 July, and the bid–ask ratio parameter is negative only on 27 July (Table 28). Figure 7, Figure 8 and Figure 9 display the intraday seasonality curves on 4 January, 23 June and 27 July 2016. The credible interval on 23 June is wider, most likely because of the small sample size shown in Table 1.

To summarize the estimates of the limit order book parameters for the three stocks, the spread has a positive impact in all nine patterns (three dates for each of three stocks). That trades are more likely to occur when the spread is narrow is nearly obvious from the perspective of the order generation mechanism, but it is nonetheless a useful result for analyzing and modeling the time interval between trades based on the limit order book information. As for volume, five of the nine patterns show a negative impact, indicating that large volume at the best quote may not lead to active trading. Unlike the spread, there are some patterns in which the effect of volume is not pronounced. Identifying the conditions under which the effect of volume changes is left for future work. The bid–ask ratio is positive in one of the nine patterns and negative in one pattern. The ratio of buy to sell order volume does not seem to affect the time interval.
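The classification used in this summary (positive, negative, or no claim when the 95% interval contains zero) can be reproduced from a stored chain with a sketch like the one below. Several points are assumptions made for illustration: the helper name `summarize`, the use of an equal-tailed interval, the reading that the first 10,000 of 20,000 iterations are discarded, and the synthetic chains, which are independent normal draws that only mimic the posterior mean and SD reported in Table 8 and are not actual MCMC output.

```python
import numpy as np

def summarize(chain, burn_in=10_000, level=0.95):
    """Posterior mean, SD, equal-tailed interval and sign verdict for one parameter."""
    kept = np.asarray(chain, dtype=float)[burn_in:]   # discard burn-in draws
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(kept, [tail, 1.0 - tail])
    if lower > 0:
        sign = "positive"
    elif upper < 0:
        sign = "negative"
    else:
        sign = "inconclusive"                         # interval contains zero
    return {"mean": kept.mean(), "sd": kept.std(ddof=1),
            "ci": (lower, upper), "sign": sign, "n_kept": kept.size}

# Synthetic stand-ins for two chains (hypothetical, shaped after Table 8).
rng = np.random.default_rng(0)
chains = {
    "Spread": rng.normal(0.8862, 0.0390, size=20_000),
    "Bid-ask ratio": rng.normal(-0.1706, 0.1067, size=20_000),
}
results = {name: summarize(chain) for name, chain in chains.items()}
for name, res in results.items():
    print(f"{name}: mean = {res['mean']:.4f}, sd = {res['sd']:.4f}, "
          f"95% CI = [{res['ci'][0]:.4f} {res['ci'][1]:.4f}], {res['sign']}")
```

If a highest posterior density interval were used instead of an equal-tailed one, the endpoints would differ slightly for skewed posteriors, so the verdict for borderline parameters could change.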

4.3. Model Comparison

For the models selected in Section 4.2, we compare their performance with that of an existing method. The benchmark is the SCD model of Bauwens and Veredas (2004) (the original SCD). Following that paper, intraday seasonality is removed in advance using the Nadaraya–Watson estimator. The kernel is the quartic kernel, and the bandwidth h is set to 2.78 s N^{-1/5}, where s is the standard deviation of the data.
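A minimal Python sketch of this pre-adjustment step is given below: a Nadaraya–Watson regression of durations on time of day with the quartic kernel and bandwidth 2.78 s N^{-1/5}, followed by division of each duration by the fitted seasonal factor. The sketch rests on assumptions not stated in this section: s is taken as the standard deviation of the time-of-day regressor, N as the number of observations, time is rescaled to [0, 1], and the simulated durations and function names are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel: 15/16 * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def nadaraya_watson(t_eval, t_obs, d_obs):
    """Kernel regression of durations d_obs on times t_obs, evaluated at t_eval."""
    t_eval = np.asarray(t_eval, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)
    # Bandwidth rule from the text; s = SD of t_obs and N = len(t_obs) are assumptions.
    h = 2.78 * t_obs.std(ddof=1) * t_obs.size ** (-1.0 / 5.0)
    weights = quartic_kernel((t_eval[:, None] - t_obs[None, :]) / h)
    return weights @ d_obs / weights.sum(axis=1), h

# Simulated example (hypothetical): inverted-U seasonal factor times unit-mean noise.
rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0.0, 1.0, size=2000))
true_seasonal = 0.5 + 2.0 * t_obs * (1.0 - t_obs)
d_obs = true_seasonal * rng.exponential(1.0, size=t_obs.size)

seasonal_hat, h = nadaraya_watson(t_obs, t_obs, d_obs)
adjusted = d_obs / seasonal_hat   # seasonally adjusted durations
print(f"bandwidth h = {h:.4f}, mean adjusted duration = {adjusted.mean():.3f}")
```

Because the estimator is evaluated at the observed times, every point has at least its own kernel weight, so the denominator is never zero; evaluating on a separate grid would need a guard for points with no observation within the bandwidth.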

The DIC and the widely applicable information criterion (WAIC) proposed by Watanabe (2013) are used as the model selection criteria.
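For reference, the WAIC can be computed from a matrix of pointwise log-likelihood values evaluated at each posterior draw, as in the sketch below. The sketch reports the WAIC on the deviance scale, −2 (lppd − p_WAIC), with p_WAIC taken as the sum of the posterior variances of the pointwise log-likelihoods; this scaling, the function name `waic`, and the toy normal-mean data are assumptions for illustration and need not match the exact convention used for the tables in this paper.

```python
import numpy as np

def waic(pointwise_loglik):
    """Return (WAIC, p_WAIC) from an array of shape (n_draws, n_obs)."""
    ll = np.asarray(pointwise_loglik, dtype=float)
    # Log pointwise predictive density via a numerically stable log-mean-exp.
    col_max = ll.max(axis=0)
    lppd = np.sum(col_max + np.log(np.mean(np.exp(ll - col_max), axis=0)))
    # Effective number of parameters: sum of per-observation posterior variances.
    p_waic = np.sum(ll.var(axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

# Toy example (hypothetical data): y_i ~ N(mu, 1) with a flat prior on mu.
rng = np.random.default_rng(2)
y = rng.normal(loc=0.0, scale=1.0, size=200)
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=4000)

# Pointwise log-likelihood matrix: one row per draw, one column per observation.
pointwise = -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
waic_value, p_waic = waic(pointwise)
print(f"WAIC = {waic_value:.1f}, p_WAIC = {p_waic:.2f}")
```

As with the DIC, a lower value indicates the preferred model, and in this one-parameter example p_WAIC should be close to one.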

Across the eight patterns shown in Table 29, Table 30 and Table 31, the proposed model is preferred by the WAIC in six patterns, whereas the original model with prior removal of intraday seasonality is preferred by the DIC in five patterns. This may reflect the better generalization performance of the proposed model with simultaneous estimation, since the DIC tends to select overfitted models, as indicated by Chan and Grant (2016). The original model for JAL on 23 June could not be estimated because the matrix was not positive definite.

Table 32, Table 33, Table 34, Table 35, Table 36, Table 37, Table 38, Table 39 and Table 40 compare the parameter estimates of the proposed model and the original model for each of the three companies. In all eight patterns, the AR(1) coefficient ϕ of the original model exceeds that of the proposed model, which suggests that removing intraday seasonality in advance does not successfully remove the persistence of the state variable. The shape parameter γ changes significantly in Table 37, Table 38 and Table 40, where the Gamma distribution is selected, while it changes little in the remaining five patterns, where the Weibull distribution is selected. The variance σ of the proposed model is larger than that of the original model in six of the eight patterns. This may be because the extreme rigidity of the state variable α in the original model is eliminated once ϕ no longer takes values close to one. The two patterns with a smaller σ in the proposed model (Table 35 and Table 36) are also the two with the highest ϕ among the proposed models (0.9116 and 0.9650), supporting this interpretation.
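To give a sense of what these values of ϕ imply, the sketch below converts an AR(1) coefficient into the half-life of a shock to the state variable, that is, the number of transactions after which the effect of a shock has decayed by half. It assumes the standard stationary AR(1) recursion for the state variable, under which a shock decays as ϕ^j after j steps; the helper `half_life` is hypothetical, and the inputs are the posterior means for TMC on 4 January 2016 from Table 32.

```python
import math

def half_life(phi):
    """Number of steps j solving phi**j = 0.5 for an AR(1) coefficient 0 < phi < 1."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(phi)

# Posterior means of phi from Table 32 (TMC, 4 January 2016).
phi_proposed = 0.8215
phi_original = 0.9995

hl_proposed = half_life(phi_proposed)
hl_original = half_life(phi_original)
print(f"proposed: half-life of about {hl_proposed:.1f} transactions")
print(f"original: half-life of about {hl_original:.0f} transactions")
```

Under this assumption a shock in the original model persists over more than a thousand transactions, whereas in the proposed model it fades within a handful, which is one way to read the "rigidity" of the state variable discussed above.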

5. Conclusions

This paper proposed an extension of the SCD model that estimates the intraday seasonality and duration clustering simultaneously, whereas previous studies remove the intraday seasonality in a separate preliminary step. The intraday seasonality is represented by a B-Spline, and its coefficients are simulated along with the model parameters in the MCMC loop. This has the advantage not only of allowing simultaneous estimation of the intraday seasonality, but also of facilitating the extension to estimating the effect of external information such as the limit order book information. We also use block sampling for the state variables and modify the parameterization of the Gamma distribution to stabilize sampling. One limitation of the method is that the number of knots and the smoothing parameter must be selected based on an information criterion, which is computationally expensive. Future research should address this issue and reduce the computational cost.

In the empirical analysis, the proposed SCD model is applied to three days of microsecond tick data for three stocks listed on the TSE. The proposed method outperforms the existing method in terms of the WAIC in many of the estimation results for the three stocks with different trading frequencies, consistent with previous studies. Since very few studies have incorporated limit order book information into SCD models, the use of this information and the discussion of the corresponding parameter estimates are also contributions of this paper. We find that the spread has a positive impact on the trading time interval for all patterns, while order size has a negative impact for about half of the patterns. Confirming the superiority of simultaneous estimation in forecasting with a wider range of data is left for future studies.

Author Contributions

T.T.: validation, data curation, writing-original draft preparation and visualization; T.N.: resources, writing—review and editing, supervision, project administration and funding acquisition. All authors: conceptualization, methodology, software, formal analysis and investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 19K01592.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Note

1	Instead of $Gamma (γ, γ)$ , many previous studies have used $Gamma (γ, 1)$ . In our experience, however, the former is more stable in the posterior simulation than the latter.

References

Abramowitz, Milton, Irene A. Stegun, and Robert H. Romer. 1988. Handbook of mathematical functions with formulas, graphs, and mathematical tables. American Journal of Physics 56: 958.
Bauwens, Luc, and Pierre Giot. 2000. The logarithmic ACD model: An application to the bid-ask quote process of three NYSE stocks. Annales d’Economie et de Statistique 60: 117–49.
Bauwens, Luc, and David Veredas. 2004. The stochastic conditional duration model: A latent variable model for the analysis of financial durations. Journal of Econometrics 119: 381–412.
Bollerslev, Tim. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–27.
Brownlees, Christian T., and Marina Vannucci. 2013. A Bayesian approach for capturing daily heterogeneity in intra-daily durations time series. Studies in Nonlinear Dynamics and Econometrics 17: 21–46.
Chan, Joshua C. C., and Angelia L. Grant. 2016. On the observed-data deviance information criterion for volatility modeling. Journal of Financial Econometrics 14: 772–802.
De Boor, Carl. 1978. A Practical Guide to Splines. New York: Springer, vol. 27.
Eilers, Paul H. C., and Brian D. Marx. 1996. Flexible smoothing with B-splines and penalties. Statistical Science 11: 89–121. Available online: https://projecteuclid.org/journals/statistical-science/volume-11/issue-2/Flexible-smoothing-with-B-splines-and-penalties/10.1214/ss/1038425655.full (accessed on 10 October 2022).
Engle, Robert F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society 50: 987–1007.
Engle, Robert F., and Jeffrey R. Russell. 1998. Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica 66: 1127–62.
Feng, Dingan, George J. Jiang, and Peter X.-K. Song. 2004. Stochastic conditional duration models with “leverage effect” for financial transaction data. Journal of Financial Econometrics 2: 390–421.
Gingras, Samuel, and William J. McCausland. 2020. A flexible stochastic conditional duration model. arXiv:2005.09166. Available online: https://arxiv.org/abs/2005.09166 (accessed on 10 October 2022).
Hosaka, Go. 2014. Analysis of high-frequency trading at Tokyo Stock Exchange. Paper presented at the Japan Exchange Group, JPX Working Papers, Tokyo, Japan, May. Available online: https://www.jpx.co.jp/english/corporate/research-study/working-paper/b5b4pj000000i468-att/Summary_JPX_wp_No4.pdf (accessed on 10 October 2022).
Lang, Stefan, and Andreas Brezger. 2004. Bayesian P-splines. Journal of Computational and Graphical Statistics 13: 183–212.
Liu, Jun S., and Chiara Sabatti. 2000. Generalised Gibbs sampler and multigrid Monte Carlo for Bayesian computation. Biometrika 87: 353–69.
Men, Zhongxian, Adam W. Kolkiewicz, and Tony S. Wirjanto. 2019. Threshold stochastic conditional duration model for financial transaction data. Journal of Risk and Financial Management 12: 88. Available online: https://www.mdpi.com/1911-8074/12/2/88 (accessed on 10 October 2022).
Omori, Yasuhiro, and Toshiaki Watanabe. 2008. Block sampler and posterior mode estimation for asymmetric stochastic volatility models. Computational Statistics & Data Analysis 52: 2892–910.
Shephard, Neil, and Michael K. Pitt. 1997. Likelihood analysis of non-Gaussian measurement time series. Biometrika 84: 653–67.
Spiegelhalter, David J., Nicola G. Best, Bradley P. Carlin, and Angelika Van Der Linde. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64: 583–639. Available online: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1467-9868.00353 (accessed on 10 October 2022).
Strickland, Chris M., Catherine S. Forbes, and Gael M. Martin. 2006. Bayesian analysis of the stochastic conditional duration model. Computational Statistics & Data Analysis 50: 2247–67.
Sugiura, Ko, Teruo Nakatsuma, and Kenichiro McAlinn. 2015. Predicting the next executions using high-frequency data. Data Analytics 2015: 95–100. Available online: https://www.thinkmind.org/index.php?view=instance&instance=DATA+ANALYTICS+2015 (accessed on 10 October 2022).
Taylor, Stephen John. 1982. Financial returns modelled by the product of two stochastic processes-a study of the daily sugar prices 1961–75. Time Series Analysis: Theory and Practice 1: 203–26.
Veredas, David, Juan M. Rodríguez Poo, and Antoni Espasa. 2002. On the (Intradaily) Seasonality and Dynamics of a Financial Point Process: A Semiparametric Approach. Technical Report, Working Paper, CORE DP 2002/23. Louvain-la-Neuve: Université Catholique de Louvain. Available online: https://dial.uclouvain.be/pr/boreal/object/boreal:4267 (accessed on 10 October 2022).
Watanabe, Sumio. 2013. A widely applicable Bayesian information criterion. Journal of Machine Learning Research 14: 867–97. Available online: https://www.jmlr.org/papers/volume14/watanabe13a/watanabe13a.pdf (accessed on 10 October 2022).
Watanabe, Toshiaki, and Yasuhiro Omori. 2004. A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997). Biometrika 91: 246–48.

Figure 1. Intraday seasonality of time interval of TMC, Aichi, Japan (4 January 2016).

Figure 2. Intraday seasonality of time interval of TMC, Aichi, Japan (23 June 2016).

Figure 3. Intraday seasonality of time interval of TMC, Aichi, Japan (27 July 2016).

Figure 4. Intraday seasonality of time interval of AGH, Tokyo, Japan (4 January 2016).

Figure 5. Intraday seasonality of time interval of AGH, Tokyo, Japan (23 June 2016).

Figure 6. Intraday seasonality of time interval of AGH, Tokyo, Japan (27 July 2016).

Figure 7. Intraday seasonality of time interval of JAL, Tokyo, Japan (4 January 2016).

Figure 8. Intraday seasonality of time interval of JAL, Tokyo, Japan (23 June 2016).

Figure 9. Intraday seasonality of time interval of JAL, Tokyo, Japan (27 July 2016).

Table 1. Basic information comparison for Toyota Motor Corporation (TMC) (Aichi, Japan), Asahi Group Holdings, Ltd. (AGH) (Tokyo, Japan) and Japan Airlines Co., Ltd. (JAL) (Tokyo, Japan).

	Date	No. Durations > 0	No. Durations = 0
TMC	4 January 2016	8086	579
	23 June 2016	8263	291
	27 July 2016	8359	457
AGH	4 January 2016	1385	50
	23 June 2016	1092	33
	27 July 2016	3581	151
JAL	4 January 2016	2755	107
	23 June 2016	1308	38
	27 July 2016	1814	21

Table 2. DIC for Weibull distribution (4 January 2016, TMC, Aichi, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
150	−146,550	−146,558	−146,578
30	−146,562	−146,571	−146,560
15	−146,562	−146,546	−146,535

The best of each distribution is underlined, and the best overall is in bold.

Table 3. DIC for Gamma distribution (4 January 2016, TMC, Aichi, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
150	−146,387	−146,455	−146,437
30	−146,417	−146,434	−146,423
15	−146,454	−146,422	−146,389

The best of each distribution is underlined, and the best overall is in bold.

Table 4. DIC for Weibull distribution (23 June 2016, TMC, Aichi, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
150	−152,178	−152,175	−152,148
30	−152,114	−152,102	−152,103
15	−152,093	−152,099	−152,041

The best of each distribution is underlined, and the best overall is in bold.

Table 5. DIC for Gamma distribution (23 June 2016, TMC, Aichi, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
150	−151,919	−150,961	−151,948
30	−151,938	−151,950	−151,896
15	−151,953	−151,961	−151,898

The best of each distribution is underlined, and the best overall is in bold.

Table 6. DIC for Weibull distribution (27 July 2016, TMC, Aichi, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
150	−153,018	−153,018	−153,021
30	−153,024	−153,028	−153,027
15	−153,001	−153,012	−153,001

The best of each distribution is underlined, and the best overall is in bold.

Table 7. DIC for Gamma distribution (27 July 2016, TMC, Aichi, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
150	−152,932	−152,927	−152,921
30	−152,924	−152,925	−152,950
15	−152,901	−152,946	−152,935

The best of each distribution is underlined, and the best overall is in bold.

Table 8. Estimated parameters about limit order book (4 January 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	0.8862	0.0390	[0.8098 0.9623]
Volume	$-3.2627 \times 10^{-6}$	$2.4334 \times 10^{-6}$	[$-8.0522 \times 10^{-6}$ $1.5150 \times 10^{-6}$]
Bid-ask ratio	−0.1706	0.1067	[−0.3799 0.0404]

Table 9. Estimated parameters about limit order book (23 June 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.4483	0.0560	[1.3389 1.5553]
Volume	$-5.8621 \times 10^{-6}$	$4.5002 \times 10^{-2}$	[$-1.4395 \times 10^{-5}$ $2.6732 \times 10^{-6}$]
Bid-ask ratio	0.8105	0.1151	[0.5866 1.0511]

Table 10. Estimated parameters about limit order book (27 July 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.2863	0.0480	[1.1951 1.3798]
Volume	$-1.8708 \times 10^{-5}$	$3.6379 \times 10^{-6}$	[$-2.5667 \times 10^{-5}$ $-1.1645 \times 10^{-5}$]
Bid-ask ratio	0.0093	0.1152	[−0.2199 0.2328]

Table 11. DIC for Weibull distribution (4 January 2016, AGH, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−22,290	−22,200	−22,289
15	−22,112	−22,135	−22,273
10	−22,059	−22,120	−22,038

The best of each distribution is underlined, and the best overall is in bold.

Table 12. DIC for Gamma distribution (4 January 2016, AGH, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−22,084	−22,146	−22,111
15	−22,129	−22,140	−22,059
10	−22,125	−22,111	−22,092

The best of each distribution is underlined, and the best overall is in bold.

Table 13. DIC for Weibull distribution (23 June 2016, AGH, Tokyo, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
30	−16,633	−16,637	−16,632
15	−16,632	−16,636	−16,635
10	−16,638	−16,639	−16,628

The best of each distribution is underlined, and the best overall is in bold.

Table 14. DIC for Gamma distribution (23 June 2016, AGH, Tokyo, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
30	−16,604	−16,606	−16,604
15	−16,607	−16,613	−16,608
10	−16,609	−16,606	−16,601

The best of each distribution is underlined, and the best overall is in bold.

Table 15. DIC for Weibull distribution (27 July 2016, AGH, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−62,228	−62,231	−62,234
15	−62,244	−62,218	−62,235
10	−62,235	−62,238	−62,216

The best of each distribution is underlined, and the best overall is in bold.

Table 16. DIC for Gamma distribution (27 July 2016, AGH, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−62,267	−62,274	−62,254
15	−62,265	−62,275	−62,260
10	−62,273	−62,252	−62,261

The best of each distribution is underlined, and the best overall is in bold.

Table 17. Estimated parameters about limit order book (4 January 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	0.9452	0.1127	[0.7121 1.1289]
Volume	$-2.6640 \times 10^{-4}$	$1.0761 \times 10^{-4}$	[$-4.6876 \times 10^{-4}$ $-5.6517 \times 10^{-5}$]
Bid-ask ratio	−0.4060	0.4020	[−1.2842 0.3290]

Table 18. Estimated parameters about limit order book (23 June 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	0.8802	0.1100	[0.6645 1.0960]
Volume	$-3.3787 \times 10^{-4}$	$1.4132 \times 10^{-4}$	[$-6.0885 \times 10^{-4}$ $-6.1209 \times 10^{-5}$]
Bid-ask ratio	0.2303	0.3793	[−0.5134 0.9776]

Table 19. Estimated parameters about limit order book (27 July 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.0511	0.0791	[0.9201 1.1741]
Volume	$-9.9515 \times 10^{-5}$	$3.4337 \times 10^{-5}$	[$-1.6464 \times 10^{-4}$ $2.8444 \times 10^{-5}$]
Bid-ask ratio	−0.2928	0.1532	[−0.5582 0.0043]

Table 20. DIC for Weibull distribution (4 January 2016, JAL, Tokyo, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
30	−47,826	−47,808	−47,803
15	−47,840	−47,823	−47,813
10	−47,817	−47,804	−47,765

The best of each distribution is underlined, and the best overall is in bold.

Table 21. DIC for Gamma distribution (4 January 2016, JAL, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−47,845	−47,861	−47,843
15	−47,850	−47,806	−47,827
10	−47,831	−47,820	−47,791

The best of each distribution is underlined, and the best overall is in bold.

Table 22. DIC for Weibull distribution (23 June 2016, JAL, Tokyo, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
30	−16,633	−16,637	−16,632
15	−16,632	−16,636	−16,635
10	−16,638	−16,639	−16,628

The best of each distribution is underlined, and the best overall is in bold.

Table 23. DIC for Gamma distribution (23 June 2016, JAL, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−16,604	−16,606	−16,604
15	−16,607	−16,613	−16,608
10	−16,609	−16,606	−16,601

The best of each distribution is underlined, and the best overall is in bold.

Table 24. DIC for Weibull distribution (27 July 2016, JAL, Tokyo, Japan).

	Smoothing Parameter
Num of Knots	5	10	100
30	−29,421	−29,422	−29,422
15	−29,418	−29,420	−29,416
10	−29,419	−29,418	−29,406

The best of each distribution is underlined, and the best overall is in bold.

Table 25. DIC for Gamma distribution (27 July 2016, JAL, Tokyo, Japan).

	Smoothing Parameters
Num of Knots	5	10	100
30	−29,481	−29,499	−29,482
15	−29,524	−29,483	−29,461
10	−29,491	−29,477	−29,468

The best of each distribution is underlined, and the best overall is in bold.

Table 26. Estimated parameters about limit order book (4 January 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.1827	0.0529	[1.0809 1.2932]
Volume	$-6.5667 \times 10^{-5}$	$3.4338 \times 10^{-5}$	[$-1.3631 \times 10^{-4}$ $1.9506 \times 10^{-7}$]
Bid-ask ratio	0.0616	0.2112	[−0.3058 0.5194]

Table 27. Estimated parameters about limit order book (23 June 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.4040	0.1372	[1.1384 1.6733]
Volume	$-3.0252 \times 10^{-5}$	$1.2035 \times 10^{-5}$	[$-5.3968 \times 10^{-5}$ $-7.0058 \times 10^{-6}$]
Bid-ask ratio	0.0729	0.3496	[−0.6238 0.7474]

Table 28. Estimated parameters about limit order book (27 July 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
Spread	1.0004	0.0469	[0.9136 1.0864]
Volume	$-5.9807 \times 10^{-5}$	$2.8536 \times 10^{-5}$	[$-1.1086 \times 10^{-4}$ $-3.9883 \times 10^{-6}$]
Bid-ask ratio	−0.4280	0.1643	[−0.7945 −0.1419]

Table 29. Comparison of information criteria (TMC, Aichi, Japan).

	DIC	WAIC
proposed-SCD (4 January 2016)	−146,578	−143,175
original-SCD (4 January 2016)	−137,586	−137,275
proposed-SCD (23 June 2016)	−152,178	−156,533
original-SCD (23 June 2016)	−151,017	−152,319
proposed-SCD (27 July 2016)	−153,028	−144,322
original-SCD (27 July 2016)	−143,312	−143,084

Table 30. Comparison of information criteria (AGH, Tokyo, Japan).

	DIC	WAIC
proposed-SCD (4 January 2016)	−22,290	−22,132
original-SCD (4 January 2016)	−25,160	−25,239
proposed-SCD (23 June 2016)	−16,639	−16,647
original-SCD (23 June 2016)	−20,711	−20,675
proposed-SCD (27 July 2016)	−62,275	−62,025
original-SCD (27 July 2016)	−62,786	−62,015

Table 31. Comparison of information criteria (JAL, Tokyo, Japan).

	DIC	WAIC
proposed-SCD (4 January 2016)	−47,861	−49,786
original-SCD (4 January 2016)	−49,472	−49,284
proposed-SCD (23 June 2016)	−16,639	−21,825
original-SCD (23 June 2016)	-	-
proposed-SCD (27 July 2016)	−29,524	−32,512
original-SCD (27 July 2016)	−32,613	−32,402

Table 32. Comparison of estimated parameters (4 January 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.8215	0.0211	[0.7772 0.8600]
$γ_{\mathrm{proposed}}$	0.5391	0.0105	[0.5202 0.5614]
$σ_{\mathrm{proposed}}$	0.8994	0.0706	[0.7708 1.0492]
$ϕ_{\mathrm{original}}$	0.9995	0.0001	[0.9993 0.9998]
$γ_{\mathrm{original}}$	0.5216	0.0048	[0.5122 0.5309]
$σ_{\mathrm{original}}$	0.1883	0.0088	[0.1717 0.2055]

Table 33. Comparison of estimated parameters (23 June 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.5812	0.0188	[0.5475 0.6195]
$γ_{\mathrm{proposed}}$	0.7672	0.0231	[0.7238 0.8093]
$σ_{\mathrm{proposed}}$	1.8313	0.0516	[1.7316 1.9234]
$ϕ_{\mathrm{original}}$	0.9970	0.0007	[0.9955 0.9982]
$γ_{\mathrm{original}}$	0.5088	0.0065	[0.4963 0.5219]
$σ_{\mathrm{original}}$	0.8340	0.0318	[0.7706 0.8988]

Table 34. Comparison of estimated parameters (27 July 2016, TMC, Aichi, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.7213	0.0233	[0.6754 0.7620]
$γ_{\mathrm{proposed}}$	0.6168	0.0172	[0.5863 0.6506]
$σ_{\mathrm{proposed}}$	1.3649	0.0755	[1.2317 1.5081]
$ϕ_{\mathrm{original}}$	0.9993	0.0002	[0.9988 0.9996]
$γ_{\mathrm{original}}$	0.5083	0.0047	[0.4991 0.5176]
$σ_{\mathrm{original}}$	0.2815	0.0125	[0.2587 0.3080]

Table 35. Comparison of estimated parameters (4 January 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.9116	0.0264	[0.8566 0.9579]
$γ_{\mathrm{proposed}}$	0.3706	0.0110	[0.3497 0.3930]
$σ_{\mathrm{proposed}}$	0.5502	0.1285	[0.3093 0.8022]
$ϕ_{\mathrm{original}}$	0.9875	0.0033	[0.9877 0.9934]
$γ_{\mathrm{original}}$	0.3776	0.0115	[0.3557 0.4006]
$σ_{\mathrm{original}}$	1.3037	0.1065	[1.1071 1.5176]

Table 36. Comparison of estimated parameters (23 June 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.9650	0.0150	[0.9291 0.9874]
$γ_{\mathrm{proposed}}$	0.3513	0.0087	[0.3345 0.3685]
$σ_{\mathrm{proposed}}$	0.1179	0.0366	[0.0647 0.2062]
$ϕ_{\mathrm{original}}$	0.9916	0.0025	[0.9862 0.9960]
$γ_{\mathrm{original}}$	0.3362	0.0099	[0.3173 0.3559]
$σ_{\mathrm{original}}$	1.0119	0.0888	[0.8462 1.1915]

Table 37. Comparison of estimated parameters (27 July 2016, AGH, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.2824	0.0153	[0.2494 0.3066]
$γ_{\mathrm{proposed}}$	1.3463	0.0952	[1.1919 1.5324]
$σ_{\mathrm{proposed}}$	2.6611	0.0395	[2.5852 2.7386]
$ϕ_{\mathrm{original}}$	0.9958	0.0021	[0.9904 0.9983]
$γ_{\mathrm{original}}$	0.3944	0.0115	[0.3692 0.4114]
$σ_{\mathrm{original}}$	0.6704	0.1291	[0.5394 0.9674]

Table 38. Comparison of estimated parameters (4 January 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.3760	0.0180	[0.3402 0.4128]
$γ_{\mathrm{proposed}}$	1.9805	0.1875	[1.6585 2.3511]
$σ_{\mathrm{proposed}}$	2.9829	0.0447	[2.8972 3.0719]
$ϕ_{\mathrm{original}}$	0.9900	0.0024	[0.9850 0.9944]
$γ_{\mathrm{original}}$	0.3727	0.0092	[0.3553 0.3909]
$σ_{\mathrm{original}}$	1.1598	0.0732	[1.0233 1.3094]

Table 39. Comparison of estimated parameters (23 June 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.9140	0.0311	[0.8452 0.9664]
$γ_{\mathrm{proposed}}$	0.3455	0.0121	[0.3239 0.3714]
$σ_{\mathrm{proposed}}$	0.6072	0.1817	[0.2827 0.9946]
$ϕ_{\mathrm{original}}$	-	-	[- -]
$γ_{\mathrm{original}}$	-	-	[- -]
$σ_{\mathrm{original}}$	-	-	[- -]

Table 40. Comparison of estimated parameters (27 July 2016, JAL, Tokyo, Japan).

Parameters	Posterior Mean	Posterior SD	95%CI
$ϕ_{\mathrm{proposed}}$	0.3277	0.0155	[0.2911 0.3577]
$γ_{\mathrm{proposed}}$	1.1510	0.1310	[0.8505 1.3794]
$σ_{\mathrm{proposed}}$	3.1048	0.0667	[2.9681 3.2322]
$ϕ_{\mathrm{original}}$	0.9977	0.0007	[0.9963 0.9990]
$γ_{\mathrm{original}}$	0.2459	0.0066	[0.2332 0.2590]
$σ_{\mathrm{original}}$	0.3712	0.0248	[0.3228 0.4210]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
