We consider two series of returns related to the S&P 500, namely low-frequency and high-frequency returns. The low-frequency data consist of 22,000 daily return observations for the period from 13 August 1928 to 30 December 2011. Among these daily returns, the observations from 13 August 1928 to 30 October 2002 were kindly provided by William Schwert. The source of the data for the period 4 January 1928 through 2 July 1962 is Schwert (1990). From 3 July 1962 to 30 October 2002 the data are from the CRSP daily returns file, and the returns for the period after 30 October 2002 were obtained from the Yahoo Finance website. Because of the need to construct various aggregate measures, the effective initial date for estimation is 13 August 1928. The high-frequency data pertain to S&P 500 futures and include 1-min returns from 7 October 1986 to 2 March 2007, amounting to 5000 trading days in total. These were purchased from http://www.grainmarketresearch.com/. These futures contracts expire within one year after their inception. Specifically, contracts initiated in January, April, July, and October expire in March, June, September, and December, respectively. The cleaned version was provided by Shin Ikeda as described in Appendix A of Ikeda (2015). The span of the data was mostly dictated by data availability, though it conveniently avoids the turbulent period of the Great Recession. To reduce the effect of outliers in the data, we use a logarithmic transformation of the observations. Since there are some zeros in the original high-frequency and daily data, and the logarithm of a zero squared return is undefined, we first demean the data, as in Deo et al. (2006). Other methods have been proposed in the literature; e.g., Perron and Qu (2010) and Lu and Perron (2010) add a small value to the squared returns.
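The sketch below shows why demeaning comes before the log transformation. It assumes NumPy and a one-dimensional array of returns, and the function name `log_squared_demeaned` is hypothetical. A raw zero return gives log(0) = -inf. After demeaning, every value is nonzero unless an observation happens to equal the sample mean exactly.

```python
import numpy as np


def log_squared_demeaned(returns):
    """Return log((r_t - mean(r))**2) for a 1-D array of returns.

    Demeaning first turns exact zero returns into nonzero values,
    so the logarithm of the squared series is finite.
    """
    r = np.asarray(returns, dtype=float)
    centered = r - r.mean()
    # An observation equal to the sample mean would still give log(0) = -inf,
    # so we fail loudly instead of silently patching the value.
    if np.any(centered == 0.0):
        raise ValueError("an observation equals the sample mean; log is undefined")
    return np.log(centered ** 2)


# Toy returns containing exact zeros (illustrative values only).
raw = np.array([0.0, 0.012, -0.007, 0.0, 0.004])
with np.errstate(divide="ignore"):
    naive = np.log(raw ** 2)               # -inf at the two zero returns
transformed = log_squared_demeaned(raw)    # finite everywhere
```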
4.1. Low Frequency Data
We start our analysis with the low-frequency data, i.e., the daily return series.
Table 1 shows the LP estimates for the log realized S-day return series, obtained by summing S adjacent, non-overlapping squared daily returns. More specifically, S = 1, 5, 10, and 20 correspond to the squared daily returns and the realized weekly (every five business days), biweekly (every 10 business days), and monthly (every 20 business days) volatilities, respectively. The columns labelled
,
and
refer to the S-period aggregation of the logarithmic transformation of squared daily returns, the logarithmic transformation of the original squared daily returns, and the logarithmic transformation of the S-period aggregated squared daily returns, respectively, with S denoting the aggregation level. Both the standard LP (SLP) and trimmed LP (TLP) estimates are presented for comparison. For each series, the standard LP estimate is computed using
, while the trimmed one is constructed using
, which performs relatively well according to McCloskey and Perron (2013). In all cases,
. Note that for the original daily return series we consider LP estimates constructed with different bandwidths to highlight the importance of bandwidth selection. As the results will show, the empirical estimates are very similar whether we use the S-period aggregation with bandwidth
or the original daily series with the same bandwidth
. This is an implication of Lemmas 1–2, so it should hold whether the process is an RLS or a pure long-memory process.
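The sketch below assumes NumPy and illustrates the two ingredients behind Table 1. The first is non-overlapping S-period aggregation. The second is a log-periodogram (LP) regression of log I(λ_j) on -2 log λ_j over the first m Fourier frequencies. The function names are hypothetical. The trimming argument `trim` drops the lowest frequencies; it is a simplified stand-in for the trimmed LP estimator of McCloskey and Perron (2013), not their exact procedure. The bandwidth in the usage lines is illustrative, not the one used in the paper.

```python
import numpy as np


def aggregate(x, S):
    """Sum S adjacent, non-overlapping observations; an incomplete tail is dropped."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // S
    return x[: n_blocks * S].reshape(n_blocks, S).sum(axis=1)


def periodogram(x, m):
    """Fourier frequencies lambda_j = 2*pi*j/n and I(lambda_j) = |DFT_j|^2 / (2*pi*n), j = 1..m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, m + 1)
    dft = np.fft.fft(x)
    return 2.0 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2.0 * np.pi * n)


def lp_estimate(x, m, trim=0):
    """LP estimate of d from log I(lambda_j) = c - 2 d log(lambda_j) + error.

    trim=0 gives the standard LP regression over j = 1..m; trim > 0 drops the
    `trim` lowest frequencies (a simplified, hypothetical trimming rule).
    """
    lam, I = periodogram(x, m)
    lam, I = lam[trim:], I[trim:]
    slope, _intercept = np.polyfit(-2.0 * np.log(lam), np.log(I), 1)
    return slope


# Illustrative use on simulated white noise, for which d = 0.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20_000)
d_standard = lp_estimate(noise, m=200)
d_trimmed = lp_estimate(noise, m=200, trim=10)
weekly = aggregate(noise ** 2, S=5)  # S = 5: non-overlapping 5-day sums
```

The same `lp_estimate` call can be applied to `np.log(weekly)` or to the log squared daily returns with any user-chosen bandwidth m.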
Remark 7. Our theory applies only to the cases and but not to . However, unreported simulations show that all three measures behave similarly as the aggregation changes. Hence, we conjecture that it would be possible to extend our results to cover the case of .
Several results in Table 1 are worth noting. First, with the same bandwidth (
), the estimates for the log realized return series, i.e., the log of the aggregated squared daily returns, are approximately equal to, and always slightly larger than, the estimates for the log squared original returns. This finding is in line with Corollary 1: the same long-memory parameter estimate is obtained for the aggregated and the original series when the same bandwidth is used.
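The following is a heuristic sketch of why the same bandwidth gives nearly the same estimate for the aggregated and original series. It is our own back-of-the-envelope argument, not the paper's proof of Lemmas 1–2 or Corollary 1. It assumes non-overlapping sums over blocks of S observations. The periodogram is normalized as I(λ) = |Σ_t x_t e^{-itλ}|² / (2πn), and the LP regression is on -2 log λ_j.

```latex
% Aggregation: n = SM observations X_0, ..., X_{n-1}
Y_\tau = \sum_{k=0}^{S-1} X_{S\tau+k}, \quad \tau = 0,\dots,M-1,
\qquad \lambda_j = \frac{2\pi j}{n}, \qquad \omega_j = \frac{2\pi j}{M} = S\lambda_j .

% Writing t = S\tau + k_t with k_t = t \bmod S:
\sum_{\tau=0}^{M-1} Y_\tau e^{-i\tau\omega_j}
  = \sum_{t=0}^{n-1} X_t\, e^{-it\lambda_j}\, e^{ik_t\lambda_j},
  \qquad 0 \le k_t\lambda_j < \frac{2\pi j}{M}.

% For j much smaller than M the extra phase e^{ik_t\lambda_j} is close to 1, hence
I_Y(\omega_j) = \frac{1}{2\pi M}\Big|\sum_{\tau} Y_\tau e^{-i\tau\omega_j}\Big|^2
  \approx \frac{n}{M}\cdot\frac{1}{2\pi n}\Big|\sum_{t} X_t e^{-it\lambda_j}\Big|^2
  = S\, I_X(\lambda_j).

% LP regression over j = 1, ..., m:  \log I(\lambda_j) = c - 2d\log\lambda_j + u_j.
% For the aggregate, \log I_Y(\omega_j) \approx \log S + \log I_X(\lambda_j) and
% -2\log\omega_j = -2\log S - 2\log\lambda_j: regressand and regressor shift by
% constants, so the OLS slope, and hence \hat d, is approximately unchanged for the same m.
```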
Figure 1 shows the periodograms of the squared daily return series (left), the five-period aggregation of the squared daily return series divided by 5 (middle), and the 20-period aggregation of the squared daily return series divided by 20 (right), for frequency indices up to 550. The 550th frequency index corresponds to the frequencies
,
, and
for the squared daily returns, the five-period aggregation of the squared daily returns, and the 20-period aggregation of the squared daily returns, respectively. The three periodograms display almost identical patterns near frequency zero. This finding is again in line with Lemmas 1–2 and implies the same long-memory parameter estimate for the aggregated and original series when the same bandwidth is used.
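The small numerical check below assumes NumPy and uses a hypothetical series made of random level shifts plus noise. All parameters are illustrative, not estimates from the data. The j-th Fourier frequency of a series with n observations is 2πj/n. The same index j therefore corresponds to an S times higher frequency for an S-period aggregate with n/S observations, and at low indices the aggregate's periodogram is expected to be roughly S times the original one.

```python
import numpy as np


def periodogram(x, m):
    """Fourier frequencies lambda_j = 2*pi*j/n and I(lambda_j) = |DFT_j|^2 / (2*pi*n), j = 1..m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, m + 1)
    dft = np.fft.fft(x)
    return 2.0 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2.0 * np.pi * n)


def aggregate(x, S):
    """Sum S adjacent, non-overlapping observations (an incomplete tail is dropped)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // S
    return x[: n_blocks * S].reshape(n_blocks, S).sum(axis=1)


# Hypothetical level-shift-plus-noise series; parameters are illustrative only.
rng = np.random.default_rng(42)
n, S, p = 20_000, 10, 0.01
shifts = np.where(rng.random(n) < p, rng.standard_normal(n), 0.0)
x = np.cumsum(shifts) + rng.standard_normal(n)

lam_x, I_x = periodogram(x, 5)                # original series, n observations
lam_y, I_y = periodogram(aggregate(x, S), 5)  # aggregate, n / S observations

# Same index j: the aggregate's Fourier frequency is S times larger, and its
# periodogram is expected to be close to S times the original one near zero.
ratios = I_y / (S * I_x)
```

At these low indices `ratios` should sit near 1. The approximation is expected to deteriorate as j grows toward the number of aggregated observations.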
Second, when S = 1, i.e., for the daily return series, the standard and trimmed LP estimates are very different. In particular, the trimmed LP estimates are close to zero when
, indicating the (near) absence of long memory. On the other hand, the standard LP estimate is
. However, for the realized 5-day return series, the standard and trimmed LP estimates are
and
, respectively. In addition, both the standard and trimmed LP estimates increase as S increases. As shown in Remark 5, the same estimate of the long-memory parameter should be obtained for the aggregated series when using the same bandwidth as for the original series. Therefore, the apparent difference could simply be caused by bandwidth selection: we use a relatively small bandwidth for the aggregated series, compared with the original series, because the aggregated series has fewer observations. The issue of interest is whether the documented feature is more likely to arise under an RLS model or under a pure long-memory process. As discussed in Perron and Qu (2010), under the RLS-plus-white-noise model the LP estimate increases as the bandwidth m decreases. Hence, a larger LP estimate is expected for the aggregated series under RLS; in particular, it is expected to exceed 0.5, i.e., to lie in the non-stationary region. With a pure long-memory process, no such drift toward the non-stationary region is expected as the bandwidth decreases. These results are therefore more consistent with an RLS being the data-generating process.
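The sketch below illustrates the mechanism attributed above to Perron and Qu (2010). It simulates a hypothetical process made of rare random level shifts plus white noise and computes standard LP estimates for a decreasing sequence of bandwidths m = n^α. The shift probability, noise scale, and exponents are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np


def lp_estimate(x, m):
    """Standard LP estimate of d: regress log I(lambda_j) on -2 log(lambda_j), j = 1..m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2.0 * np.pi * n)
    slope, _intercept = np.polyfit(-2.0 * np.log(lam), np.log(I), 1)
    return slope


# Hypothetical RLS-plus-noise process: rare level shifts buried in white noise.
rng = np.random.default_rng(7)
n, p = 20_000, 0.005
level = np.cumsum(np.where(rng.random(n) < p, rng.standard_normal(n), 0.0))
x = level + 2.0 * rng.standard_normal(n)

# LP estimates for a decreasing sequence of bandwidths m = n**alpha.
estimates = {alpha: lp_estimate(x, int(n ** alpha)) for alpha in (0.8, 0.7, 0.6, 0.5, 0.4)}
for alpha, d_hat in estimates.items():
    print(f"m = n^{alpha}: d_hat = {d_hat:.2f}")
```

In this setup the largest bandwidth is dominated by the noise and should give a small estimate. The smallest bandwidth mostly picks up the level shifts, which pushes the estimate toward the non-stationary region.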
4.2. High Frequency Data
We now consider high-frequency data, for which the unit of time is one minute.
Table 2 shows the LP estimates for the log realized daily volatility constructed from k-min high-frequency returns, and for the log squared original returns. Here, k = 1, 5, 30, and 330 correspond to 1-min, 5-min, 30-min, and daily returns, respectively. The columns labelled
,
and
refer to the S-period aggregation of the logarithmic transformation of squared k-min returns, the logarithmic transformation of squared k-min returns, and the logarithmic transformation of the realized daily volatility obtained by summing squared k-min returns over each day, respectively. Here S denotes the number of k-min returns per day, so that S = 330/k. Both the standard and trimmed LP estimates are presented for comparison. For each series, the standard LP estimate is constructed using
, and the trimmed one using
. For the log realized return volatility series, we let
, so that the number of return observations equals the number of days on which prices are available. However, we let
for the log squared return series, so that the total number of return observations equals the number of days multiplied by the number of observations per day. For comparison, we also include the estimates for the log squared original returns with
.
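The sketch below assumes NumPy and stores 1-min log returns as a (days × 330) array, 330 being the number of 1-min returns per day implied by k = 330 for daily returns. It builds the k-min returns, the daily realized volatility (one observation per day), and the pooled log squared k-min return series (days × S observations). The function names are hypothetical.

```python
import numpy as np

MINUTES_PER_DAY = 330  # 1-min returns per day, so that k = 330 gives daily returns


def k_min_returns(one_min, k):
    """Sum consecutive 1-min log returns into k-min returns; shape (days, S), S = 330 // k."""
    days, minutes = one_min.shape
    if minutes % k != 0:
        raise ValueError("k must divide the number of 1-min returns per day")
    return one_min.reshape(days, minutes // k, k).sum(axis=2)


def realized_volatility(one_min, k):
    """Daily realized volatility: sum of the S squared k-min returns within each day."""
    return (k_min_returns(one_min, k) ** 2).sum(axis=1)


# Illustrative data: 5 days of simulated 1-min log returns.
rng = np.random.default_rng(1)
one_min = 0.001 * rng.standard_normal((5, MINUTES_PER_DAY))

rv_5min = realized_volatility(one_min, k=5)   # one observation per day
# Pooled series: days * S observations (here 5 * 11 for k = 30).
log_sq_30min = np.log(k_min_returns(one_min, k=30) ** 2).ravel()
```

With k = 330 the realized volatility reduces to the squared daily return. With k = 1 it is the sum of the 330 squared 1-min returns.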
Several results in Table 2 are worth noting. First, as in Table 1, with the same bandwidth set to
, the estimates for the log realized volatility series, i.e., the log of the aggregated squared returns, are approximately equal to the estimates for the log squared original returns. For instance, when
and
, the trimmed LP estimator for the log realized return volatility is
while the corresponding estimate for the log squared original returns is
.
Figure 2 shows the periodogram of the realized volatility obtained from the 1-min return series (left), S times the periodogram of the squared 1-min return series (middle), and the difference between them (right), for the first 2500 frequency indices. The 2500th frequency index corresponds to the frequencies
and
for the realized volatility and the squared 1-min returns, respectively. The periodogram of the realized volatility (left) and that of the squared 1-min returns (middle) take very similar values for frequency indices below 1000, especially close to zero. The difference between them (right) is small. These results can be explained by the fact that the realized volatility series is here the 330-period non-overlapping aggregation of the squared 1-min return series, so their periodograms take approximately the same values at small frequency indices, as stated in Lemmas 1–2.
Second, for the log realized volatility series, the LP estimates decrease as k increases, i.e., they are smaller when the return interval is longer. With daily returns (k = 330), so that the log realized volatility series is the log squared daily return series, the standard and trimmed LP estimates are and . In particular, the trimmed LP estimate is close to zero, indicating the (near) absence of long memory, while the standard LP estimate is large, consistent with an RLS process. Of interest is the fact that for all estimates are similar when the same bandwidth is used. This accords with the theoretical result that the estimates are invariant to the aggregation level. When k = 330, i.e., daily data, the estimates are somewhat smaller. This feature can be explained by the fact that, as the aggregation level increases, the spectral density function becomes contaminated by noise, which in turn reduces the estimate (see Remark 2).
Combining Equation (2) for the squared daily returns with Equation (4) for the realized volatility, we know, as discussed in Section 3, that the realized volatility and the squared daily returns contain the same information about long memory. However, the squared daily returns contain a larger noise component than the realized volatility.
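The sketch below does not reproduce Equations (2) and (4). It instead assumes a simple hypothetical model in which 1-min log returns are conditionally normal with a persistent daily variance, given by an AR(1) log-variance with illustrative parameters. NumPy is assumed. Under this model, the gap between the log squared daily return and the log realized volatility depends only on the standardized intraday shocks, not on the volatility level. The gap therefore behaves like independent noise across days and inflates the variance of the log squared daily returns.

```python
import numpy as np

rng = np.random.default_rng(3)
days, minutes = 2000, 330

# Hypothetical persistent log-variance: AR(1) with coefficient 0.98.
log_var = np.zeros(days)
for t in range(1, days):
    log_var[t] = 0.98 * log_var[t - 1] + 0.2 * rng.standard_normal()
sigma = np.exp(0.5 * log_var)

# 1-min returns whose conditional variances add up to sigma**2 over the day.
z = rng.standard_normal((days, minutes))
one_min = (sigma / np.sqrt(minutes))[:, None] * z

log_r2 = np.log(one_min.sum(axis=1) ** 2)      # log squared daily return
log_rv = np.log((one_min ** 2).sum(axis=1))    # log realized volatility (1-min)
gap = log_r2 - log_rv                          # equals log((sum z)^2 / sum z^2)

# Lag-1 autocorrelation of the gap: close to zero, since it does not depend on sigma.
acf1 = np.corrcoef(gap[:-1], gap[1:])[0, 1]
```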
Figure 3 shows the log squared daily return series (left), the log realized volatility obtained from the 1-min return series (middle), and the difference between them (right). The log squared daily return series (left) exhibits a larger variance than the log realized volatility series (middle). The difference (right) shows no discernible pattern and appears to be pure noise.
Figure 4 shows the periodograms of the log squared daily returns (left) and of the log realized volatility obtained from 1-min returns (middle), as well as the difference between them (right). The periodogram of the log squared daily returns (left) is much larger than that of the log realized volatility constructed from 1-min returns (middle), except at the first few frequencies near zero, where both periodograms take very large values. These results are consistent with Equation (5) and, given the very large values near frequency zero, with the presence of RLS. As in Figure 3, the difference (right) appears to be generated by a white noise process, and this white noise component dominates the periodogram of the log squared daily return series (left). As shown in Figure 5, similar results hold for the periodograms of the log realized volatility obtained from 110-min returns (left), 30-min returns (middle), and 5-min returns (right). The periodograms take smaller values when higher-frequency data are used, except at the first few frequency indices near zero. At these frequencies, the periodograms are likely driven by low-frequency contamination, for example, random level shifts (Perron and Qu 2010 and McCloskey and Perron 2013). In general, the periodograms are much larger at frequency indices near zero, which can be explained by the impact of the random level shifts dominating that of the noise process.
Third, when larger bandwidths
are used to estimate the memory parameter for the log squared original returns, the trimmed LP estimates are close to zero, indicating the (near) absence of long memory, while the standard LP estimates are near the non-stationary region, regardless of the length of the return intervals. These results are consistent with those of Perron and Qu (2010), Lu and Perron (2010), McCloskey and Perron (2013), and Varneskov and Perron (2018). This feature underscores the importance of bandwidth selection, in particular of choosing a sufficiently large bandwidth. Using the aggregated squared k-min returns allows much more flexibility in the choice of bandwidth, so that we can always use
, which was suggested by Souza (2008) as leading to improved estimates. Such choices are not possible with the log realized volatility, which has only one observation per day. Its estimator is therefore much more influenced by the low frequencies, which induces larger estimates regardless of whether the true process is RLS or pure long memory. Hence, we view the estimates obtained with the aggregated squared k-min returns and a large bandwidth as the most reliable; they indicate the near absence of long memory. This conclusion is reinforced by the fact that these estimates are (very nearly) the same across aggregation levels, showing robustness to the level of aggregation.