1. Introduction
Stock markets are among the most important financial markets in every country, but their immaturity and institutional weaknesses can cause their development to diverge seriously from macroeconomic conditions. This divergence shows up mainly as high stock price volatility, which makes asset pricing and portfolio construction highly uncertain. Studying stock market volatility, and estimating and predicting it more accurately, therefore plays an important role in reducing stock market risk, maintaining the safe and stable development of the stock market and ensuring the healthy operation of the macro economy; refer to
Brooks and Persand (
2003);
Giot and Laurent (
2004). With the rapid advancement of computer technology, accessing high-frequency financial data has become easier. Using high-frequency data, we can estimate realized volatility; refer to
Andersen et al. (
2003);
Barndorff-Nielsen and Shephard (
2003a,
2003b);
Jacod et al. (
2009), among others. Because realized volatility incorporates high-frequency financial data, it provides a more accurate measure of market volatility than traditional methods.
The original stochastic volatility (SV) model was proposed by
Taylor (
1986) and others.
Taylor (
1986) proposed a discrete-time SV model,
White (
1984) proposed a continuous-time SV model, and
Harvey and Shephard (
1996) estimated an asymmetric SV model, with a leverage effect between the return process and the stochastic volatility process, using the quasi-maximum likelihood estimation method.
Han et al. (
2016) described an asymmetric stochastic volatility model using Gaussian regression with parameter estimation using the sequential Monte Carlo method.
Financial return volatility is defined as the standard deviation of returns and plays a central role in modern finance. Realized volatility is the sum of the squares of intra-day returns over an interval and is used by modern financial economists and econometricians as a measure of true volatility.
Andersen and Bollerslev (
1997) argued that realized volatility provides a stable estimate of the latent volatility under the assumption of an ideal market. However, in real markets, measuring daily realized volatility from high-frequency return data raises problems related to the microstructure noise of the trading market. Many noise-robust estimators of realized volatility have been proposed (see
Zhang et al. (
2005);
Barndorff-Nielsen et al. (
2008);
Xiu (
2010);
Jacod et al. (
2009) and references therein). We apply the pre-averaging method (
Jacod et al. (
2009)) to estimate the realized volatility using high-frequency data.
Realized volatility carries important information about volatility; incorporating it into a traditional volatility model can improve forecasting performance.
Hansen et al. (
2012,
2014) incorporated realized volatility into a generalized autoregressive conditional heteroscedasticity (GARCH) model.
Takahashi et al. (
2009) explored a stochastic volatility model with realized volatility, adopting a new sampling method and using the Markov chain Monte Carlo method for parameter estimation.
Chaussé and Xu (
2018) applied four generalized method of moments (GMM) procedures to the generalized asymmetric stochastic volatility model with realized volatility (GASV-RV) and concluded that the efficiency of GMM was improved by automatic moment selection through the principal component GMM and regularized GMM procedures.
This paper adds realized volatility constructed from high-frequency data to the stochastic volatility model to improve volatility prediction; the resulting model is called the realized stochastic volatility model. We employ the GMM method to estimate its parameters. The paper presents the theoretical moment conditions of the realized stochastic volatility model; its main contribution is to provide the moment conditions for realized volatility. Furthermore, we assess the accuracy of GMM by comparing it with two other methods for estimating the parameters of the realized stochastic volatility model, Markov chain Monte Carlo (MCMC) and quasi-maximum likelihood (QML).
We introduce the realized stochastic volatility model in
Section 2. Estimation of realized volatility is given in
Section 3. Three parameter estimation methods are introduced in
Section 4.
Section 5 provides an empirical illustration and demonstrates the effectiveness of three different parameter estimation methods.
Section 6 contains the conclusion.
2. Realized Stochastic Volatility Model
By adding a realized volatility measure to the traditional SV model, the realized stochastic volatility (RSV) model was constructed by
Takahashi et al. (
2009). Compared with the traditional SV model, the RSV model contains more intra-day information, which helps improve both the in-sample and out-of-sample prediction performance of the model. The RSV model is specified as follows:
In the return equation
, the volatility
plays the role of a constant scale factor, and
is the unobserved latent volatility. To ensure the strict stationarity and iterative nature of the stochastic process, the persistence parameter
is assumed in the logarithmic volatility equation
and set
.
and
are random error terms. Theoretically, when the error term
obeys the standard normal distribution,
is a stationary AR(1) process that follows a normal distribution with mean
and variance
. The RSV model is obtained by adding the measurement Equation (
2) to the return equation and the state equation of the SV model. Here,
is the realized volatility at time
t, the pre-averaging method can be chosen for the estimation of realized volatility,
is the variance of the innovation
, the smaller the
, the better the fit of the model,
is the bias correction term of the realized volatility measure. We use the pre-averaging method to estimate the realized volatility, because this method can handle the microstructure noise problem when using high-frequency data.
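As an illustration, the following Python sketch simulates one path from an RSV-type model. It assumes the common specification r_t = σ exp(h_t/2) ε_t, log RV_t = ξ + h_t + u_t and h_{t+1} = φ h_t + η_t with mutually independent Gaussian errors. This follows the verbal description above but may differ in detail from Equations (1)–(3), and the function names and parameter values are hypothetical.

```python
import math
import random


def stationary_variance(phi, sigma_eta):
    """Variance of the stationary AR(1) process h_{t+1} = phi * h_t + eta_t."""
    if abs(phi) >= 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    return sigma_eta ** 2 / (1.0 - phi ** 2)


def simulate_rsv(n, sigma, phi, sigma_eta, xi, sigma_u, seed=0):
    """Simulate returns, log realized volatility and latent log-volatility.

    Assumed (illustrative) specification:
        r_t       = sigma * exp(h_t / 2) * eps_t
        log RV_t  = xi + h_t + u_t
        h_{t+1}   = phi * h_t + eta_t
    """
    rng = random.Random(seed)
    # Start h from its stationary distribution N(0, sigma_eta^2 / (1 - phi^2)).
    h = rng.gauss(0.0, math.sqrt(stationary_variance(phi, sigma_eta)))
    returns, log_rv, log_vol = [], [], []
    for _ in range(n):
        returns.append(sigma * math.exp(h / 2.0) * rng.gauss(0.0, 1.0))
        log_rv.append(xi + h + rng.gauss(0.0, sigma_u))
        log_vol.append(h)
        h = phi * h + rng.gauss(0.0, sigma_eta)
    return returns, log_rv, log_vol
```

Simulated paths of this kind can be used to check an estimation routine against known parameter values before it is applied to real data.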
3. Realized Volatility
Volatility is a statistical indicator of the degree of price fluctuation and reflects the extent to which prices deviate from their average value. Realized volatility is an estimator of integrated volatility. When high-frequency data are used, the traditional realized volatility estimator is dominated by noise and does not converge to the integrated volatility. In this work, realized volatility is calculated with the pre-averaging method, which was proposed by
Jacod et al. (
2009). This method can reduce the effect of microstructure noise, and the estimator is a consistent estimator for the integrated volatility. Precisely, the latent price is
,
, the noise price is
, and the observed contaminated data are represented by
,
We choose a sequence and a number that satisfy . Let be a function defined on which satisfies the following conditions: g is continuous and piecewise with a piecewise Lipschitz derivative . Denote for ; then, we define the pre-averaged increments: .
The pre-averaged estimator is
When using Equation (
2), we employ the value of
to replace
, which represents the realized volatility.
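The pre-averaging idea can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this paper. It assumes the weight function g(x) = min(x, 1 - x), for which the constants are ψ1 = 1 and ψ2 = 1/12, and it normalizes one trading day to length 1. The function names and the default value of theta are hypothetical.

```python
import math


def g(x):
    """Weight function g(x) = min(x, 1 - x) on [0, 1] (an assumed choice)."""
    return min(x, 1.0 - x)


def pre_averaged_increments(returns, k):
    """Weighted sums sum_{j=1}^{k-1} g(j/k) * r_{i+j} over every full window."""
    weights = [g(j / k) for j in range(1, k)]
    n = len(returns)
    return [
        sum(w * returns[i + j] for j, w in enumerate(weights))
        for i in range(n - k + 2)
    ]


def pre_averaged_variance(returns, theta=1.0):
    """Pre-averaging estimate of integrated variance over one period.

    The first term sums squared pre-averaged increments; the second term
    subtracts the remaining bias caused by microstructure noise.
    """
    n = len(returns)
    delta = 1.0 / n                               # spacing of observations
    k = max(2, round(theta / math.sqrt(delta)))   # window length k_n
    theta_eff = k * math.sqrt(delta)              # theta implied by integer k
    psi1, psi2 = 1.0, 1.0 / 12.0                  # constants for g = min(x, 1-x)
    ybar = pre_averaged_increments(returns, k)
    main = math.sqrt(delta) / (theta_eff * psi2) * sum(y * y for y in ybar)
    bias = psi1 * delta / (2.0 * theta_eff ** 2 * psi2) * sum(r * r for r in returns)
    return main - bias
```

With 48 five-minute returns per day the window is short, so the finite-sample behavior of such a sketch should be checked by simulation before its values are used as realized volatility.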
4. Parameter Estimation Methods
4.1. GMM Method Based on RSV Model
The GMM method was first proposed by
Hansen (
1982). It is a generic method for estimating parameters in semiparametric models. The method requires a certain number of moment conditions to be specified for the model. In this work, we refer to the method used in
Jacquier et al. (
2002) to construct moment conditions for the rate of return and obtain the following theorem.
Theorem 1. For the RSV model given in Equations (1)–(3), for , the first four moments and the cross-moment expressions for and are
where , , .
Referring to the proof of the moment condition in
Chaussé and Xu (
2018), this paper derives the moment conditions for the realized volatility term of the RSV model, which are proved as follows.
Proposition 1. For the RSV model specified in Equations (1)–(3), the first two moments and the cross-moment expressions for and are
Proof of Proposition 1. Given
and
specified in (
2),
□
Let
be a
vector with typical element
,
for some
and
, and let
be the theoretical moments of the RSV model. Let
; then, the GMM estimator
of the true vector of coefficients
is based on the following moment conditions:
and is the solution to:
where
is the admissible parameter space implied by the model,
and
is a consistent estimate of the covariance matrix of
.
Therefore, the estimator defined by Equation (
21) is a two-step GMM estimator, with the heteroskedasticity and autocorrelation consistent (HAC) matrix estimate given by:
where
is a kernel, and
h is the bandwidth, which can be chosen using the procedures proposed by
Newey and West (
1986) and
Andrews (
1991),
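A HAC matrix of this type can be sketched in Python as follows. The sketch assumes the Bartlett kernel with weights 1 - s/(h + 1), as in the Newey–West estimator, and takes the moment vectors as already evaluated at a parameter value. The function name and argument names are hypothetical, and other kernels would change only the weight line.

```python
def bartlett_hac(moments, bandwidth):
    """HAC covariance estimate from a list of T moment vectors (each of length q).

    Omega = Gamma_0 + sum_{s=1}^{h} w_s * (Gamma_s + Gamma_s'),
    with Bartlett weights w_s = 1 - s / (h + 1) and
    Gamma_s = (1/T) * sum_t g_t g_{t-s}'.
    """
    T = len(moments)
    q = len(moments[0])

    def gamma(lag):
        # Sample autocovariance matrix of the moment vectors at the given lag.
        out = [[0.0] * q for _ in range(q)]
        for t in range(lag, T):
            for a in range(q):
                for b in range(q):
                    out[a][b] += moments[t][a] * moments[t - lag][b] / T
        return out

    omega = gamma(0)
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)
        g_lag = gamma(lag)
        for a in range(q):
            for b in range(q):
                omega[a][b] += weight * (g_lag[a][b] + g_lag[b][a])
    return omega
```

With bandwidth 0 the function reduces to the uncentred sample covariance of the moment vectors, which is a convenient sanity check.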
In order to improve the properties of the two-step GMM,
Hansen (
1982) suggested two other methods. The first one is the iterative version of the two-step GMM and can be computed as follows:
Step 1: Compute ;
Step 2: Compute the HAC matrix ;
Step 3: Compute the ;
Step 4: If , stop; otherwise, set and go to step 2;
Step 5: Define the iterative GMM estimator as ;
where can be set as small as we want to increase the precision.
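The iteration above can be illustrated with a deliberately small toy problem, not the RSV moment conditions. The Python sketch below estimates a common mean θ of two series from the two moment conditions E[x_t - θ] = 0 and E[y_t - θ] = 0. Because these moments are linear in θ, each minimization has a closed form. The weighting matrix is the inverse of the moment covariance with zero lags, which is a simplifying assumption, and all names are hypothetical.

```python
def sample_moments(data, theta):
    """Moment vectors g_t(theta) = (x_t - theta, y_t - theta)."""
    return [(x - theta, y - theta) for x, y in data]


def moment_covariance(moments):
    """Entries (a, b, c) of the uncentred 2x2 covariance [[a, c], [c, b]]."""
    n = len(moments)
    a = sum(m[0] * m[0] for m in moments) / n
    b = sum(m[1] * m[1] for m in moments) / n
    c = sum(m[0] * m[1] for m in moments) / n
    return a, b, c


def gmm_step(data, weight):
    """Minimise gbar' W gbar in closed form; weight = (w11, w22, w12)."""
    w11, w22, w12 = weight
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    ybar = sum(y for _, y in data) / n
    num = (w11 + w12) * xbar + (w22 + w12) * ybar
    return num / (w11 + 2.0 * w12 + w22)


def iterated_gmm(data, tol=1e-10, max_iter=100):
    """Iterate weighting-matrix and parameter updates until theta stabilises."""
    theta = gmm_step(data, (1.0, 1.0, 0.0))        # step 1: identity weighting
    for iteration in range(1, max_iter + 1):
        a, b, c = moment_covariance(sample_moments(data, theta))   # step 2
        det = a * b - c * c
        weight = (b / det, a / det, -c / det)      # inverse of [[a, c], [c, b]]
        new_theta = gmm_step(data, weight)         # step 3
        if abs(new_theta - theta) < tol:           # step 4: convergence check
            return new_theta, iteration            # step 5
        theta = new_theta
    return theta, max_iter
```

For a nonlinear model such as the RSV model, the closed-form step would be replaced by a numerical optimizer, and the covariance by a HAC estimate with a positive bandwidth.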
4.2. MCMC Method Based on RSV Model
In this paper, the MCMC method is also used to estimate the parameters of the RSV model, as a comparison with the GMM method. Prior distributions are first specified for the parameters and combined with the likelihood of the sample; the posterior distribution of the parameters is then computed, and the parameters are estimated using the WinBUGS 1.4.3 software package.
Consider the RSV model described in (
1)–(
4). When given
, referring to
Takahashi et al. (
2009), we can compute the conditional likelihood of the RSV model as:
where
denotes the parameters. Therefore, we use a Bayesian approach to estimate the posterior distribution of the parameters of the RSV model, considering
h as an additional latent variable. In this setup, the most important thing is how to sample
h efficiently. Therefore, we first describe the sampling algorithm for
h.
When selecting the prior distributions of the parameters, we refer to the setting in
Yu (
2005), and we set priors as:
,
,
,
,
. Then, denoting
and
, the posterior density for
and
h becomes
To implement the Markov chain Monte Carlo simulation, we sample from the posterior distribution as follows:
Simulate h from .
Simulate from .
Simulate from .
Simulate from .
Simulate from .
Simulate from .
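The cycle of drawing each block from its full conditional distribution can be illustrated with a toy Gibbs sampler. The Python sketch below is not the RSV sampler and not what WinBUGS does internally. It assumes a normal model with unknown mean and variance, a normal prior on the mean and an inverse-gamma prior on the variance, and all names and prior values are hypothetical.

```python
import math
import random


def gibbs_normal(data, n_draws=2000, burn_in=500, seed=0,
                 m0=0.0, s0_sq=100.0, a0=2.0, b0=2.0):
    """Gibbs sampler for y_i ~ N(mu, sigma_sq).

    Priors (illustrative): mu ~ N(m0, s0_sq), sigma_sq ~ InvGamma(a0, b0).
    Each sweep draws mu given sigma_sq, then sigma_sq given mu.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu, sigma_sq = 0.0, 1.0
    draws = []
    for it in range(n_draws + burn_in):
        # Full conditional of mu: normal with precision-weighted mean.
        precision = 1.0 / s0_sq + n / sigma_sq
        mean = (m0 / s0_sq + total / sigma_sq) / precision
        mu = rng.gauss(mean, math.sqrt(1.0 / precision))
        # Full conditional of sigma_sq: inverse gamma, drawn as 1 / Gamma.
        rate = b0 + 0.5 * sum((y - mu) ** 2 for y in data)
        sigma_sq = 1.0 / rng.gammavariate(a0 + n / 2.0, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, sigma_sq))
    return draws
```

Posterior means are then approximated by averaging the retained draws. In the RSV setting the latent series h would be one more block in the same cycle.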
4.3. QML Method Based on RSV Model
In this work, QML estimation is also performed for the RSV model. Due to the nonlinear relationship between the daily returns and the log of latent volatility in Equations (
1)–(
3), we cannot compute the likelihood of this model directly with the Kalman filter. However, given the parameter vector of the RSV model
, and the log latent volatility
, and following
Takahashi et al. (
2009), we can compute the conditional likelihood of the RSV model as:
Then, the RSV model log-likelihood function can be written as:
The log-likelihood obtained from the above equation is a continuous function of the RSV model parameter
. Then, the parameter
of this model can be estimated by the classical maximum likelihood method as follows:
where
is the admissible parameter space implied by the model.
5. Empirical Research
In this section, an empirical study is conducted using data on the Shanghai Stock Exchange (SSE) Composite Index from 4 January 2005 to 15 December 2022. The GMM method is used to estimate the parameters of the RSV model. The QML and MCMC methods are also applied to the RSV model for comparison.
5.1. Loss Functions
To measure the estimation and prediction performance of the different methods, loss functions, also known as objective functions, are needed to quantify the errors between the actual and predicted volatility. For regression data, the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are often used. The mean square error is the average of the squared differences between the predicted and true values. The root mean square error is the square root of the mean square error, which expresses the error in the same units as the predicted value. The mean absolute error is the average absolute difference between the predicted and actual values and is less sensitive to large individual errors than the MSE. The mean absolute percentage error is a measure of relative error that uses absolute values to prevent positive and negative errors from canceling each other. Because it is relative, it can be used to compare the prediction accuracy of various time-series models. The loss functions mentioned above are defined as follows:
where
is the predicted volatility at time
t,
T is the number of forecasts, and
is the realized volatility estimated by the pre-averaged estimator.
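The four loss functions can be computed as in the following Python sketch. It uses the standard textbook definitions, with the realized volatility as the reference value, and reports MAPE as a fraction rather than a percentage. The function and argument names are hypothetical, and the scaling may differ from the definitions used in this paper.

```python
import math


def mse(predicted, realized):
    """Mean square error between predicted and realized volatility."""
    return sum((p - r) ** 2 for p, r in zip(predicted, realized)) / len(realized)


def rmse(predicted, realized):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(predicted, realized))


def mae(predicted, realized):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(predicted, realized)) / len(realized)


def mape(predicted, realized):
    """Mean absolute percentage error as a fraction of the realized value."""
    return sum(abs((r - p) / r) for p, r in zip(predicted, realized)) / len(realized)
```

Because MAPE divides by the realized value, it is undefined when a realized volatility equals zero, which should not occur for a positive volatility measure.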
5.2. Data Selection and Processing
We used high-frequency data for the SSE Composite Index for the period from 4 January 2005 to 15 December 2022. The sample contains 4363 trading days, of which the first 3963 are used for in-sample fitting and the last 400 for out-of-sample prediction. The data are observed at five-minute intervals, which gives 48 observations on a normal trading day. The data used in the empirical analysis are sourced from the Oxford-Man Institute of Quantitative Finance Realized Library and the Wind database. Prior to conducting the empirical analysis, certain processing steps are required for the return variable, .
The logarithm of the stock index closing price data for each stock market trading day is , , forming a logarithmic price series ;
The logarithmic price series are differenced to obtain the return , for the t-th trading day and then constitute the return sequence .
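The two processing steps can be sketched in Python as follows. The function name and the example prices are hypothetical.

```python
import math


def log_returns(closing_prices):
    """Daily log returns: first difference of the log closing prices."""
    log_prices = [math.log(p) for p in closing_prices]
    return [later - earlier for earlier, later in zip(log_prices, log_prices[1:])]


# Hypothetical closing prices for three consecutive trading days.
example_returns = log_returns([100.0, 110.0, 99.0])
```

A series of n closing prices yields n - 1 returns, and the returns sum to the log of the ratio of the last price to the first.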
Figure 1 shows the returns of the SSE Composite Index. The return volatility is irregular and clustered. In the three phases 2007–2009, 2015–2016 and 2018–2020, the return volatility of the SSE Composite Index is large, and extreme values are more prominent. Stock prices fluctuated strongly during these three periods because of the financial crisis and economic market downturns.
5.3. Model Parameter Estimation
We use the daily return series of the SSE Composite Index to represent
in Equation (
1). In addition, we use the five-minute high-frequency return series of the SSE Composite Index to estimate the pre-averaged realized volatility in (
6), and we use it as
in Equation (
2). The GMM and QML methods are used to estimate the parameters of the RSV model in R 4.1.3. The MCMC method is used to estimate the parameters of the RSV model using WinBUGS software. WinBUGS samples from the full conditional probability distributions through Gibbs sampling and the Metropolis algorithm, so as to generate a Markov chain, and finally estimates the model parameters through iteration. The obtained parameter estimation results are shown in
Table 1. The advantage of Gibbs sampling within MCMC is that it avoids computing the complete joint posterior probability distribution, which involves a high-dimensional integral, and instead computes the univariate conditional probability distribution of each estimated parameter.
The estimated persistence parameter of the RSV model for the SSE index is close to 1, indicating that the volatility of the SSE index is highly persistent. Next, the estimated bias correction term of the RSV model is positive, indicating that the effect of market microstructure noise still persists.
The results of the parameters of the GMM method do not differ much from those of the QML method. The values are still close to 1 and the persistence of volatility is still high.
From
Figure 2,
Figure 3 and
Figure 4, it is evident that the GMM method exhibits a notable ability to identify significant changes in volatility, particularly when volatility levels are high. The GMM method outperforms the MCMC and QML methods in accurately predicting large volatility. The MCMC method performs well in forecasting, as its predicted volatility closely tracks the actual volatility. The predictive performance of the QML method is satisfactory, yet it does not match that of the GMM and MCMC approaches. The four loss functions are used to test the accuracy of the forecasting results.
The efficiency of three parameter estimation methods was investigated, and the results presented in
Table 2 demonstrate that, under the RSV model, parameters obtained from the MCMC method yield the most accurate volatility predictions, followed by the GMM method, while the QML method performs relatively worse. The prediction error obtained with the GMM parameter estimates is almost the same as that obtained with the MCMC estimates. It is worth mentioning that the MCMC method requires more computation time than the GMM method, yet the predictive performance remains comparable. This finding supports the effectiveness and utility of the GMM method for the RSV model proposed in this study.
6. Conclusions
As research on parameter estimation for stochastic volatility-type models deepens, new parameter estimation methods are bound to appear. In this paper, the GMM, MCMC and QML methods are used to estimate the parameters of the realized stochastic volatility model, and the estimated parameters are used with the model to predict volatility. In the empirical analysis, we use the five-minute high-frequency return series of the SSE Composite Index and apply the pre-averaging method to estimate the realized volatility. The prediction results illustrate that the GMM method is effective and computationally fast, while the MCMC method is also effective, and the QML method is less accurate.
Although realized volatility is introduced into the stochastic volatility model, this paper still assumes that the disturbance term follows a normal distribution. Research in recent years suggests that assuming a generalized hyperbolic distribution for the model disturbance term may improve the prediction performance of the model. For the improved model, we can consider using the efficient generalized method of moments to estimate the unknown parameters.