1. Introduction
Throughout his career, Peter Phillips has made important contributions to knowledge across the broad spectrum of econometrics and statistics, providing inspiration to many other researchers along the way. This paper builds on two strands of Peter's research, namely jackknife bias reduction and the analysis of nonstationary time series. Indeed, our own work on the jackknife (Chambers 2013, 2015; Chambers and Kyriacou 2013) was inspired by Peter's work on this topic with Jun Yu, published as Phillips and Yu (2005), and the current contribution also extends the results on moment generating functions (MGFs) contained in Phillips (1987a).
The jackknife has proven to be an easy-to-implement method of eliminating first-order estimation bias in a wide variety of applications in statistics and econometrics. Its genesis can be traced to Quenouille (1956) and Tukey (1958) in the case of independently and identically distributed (iid) samples, and it has more recently been adapted to accommodate more general time series settings. Within the class of stationary autoregressive time series models, Phillips and Yu (2005) show that the jackknife can effectively reduce bias in the pricing of bond options in finance, while Chambers (2013) analyses the performance of jackknife methods based on a variety of sub-sampling procedures. In subsequent work, Chambers and Kyriacou (2013) demonstrate that the usual jackknife construction in the time series case has to be amended when a unit root is present, while Chen and Yu (2015) show that a variance-minimising jackknife can be constructed in a unit root setting that also retains its bias reduction properties. In addition, Kruse and Kaufmann (2015) compare bootstrap, jackknife and indirect inference estimators in mildly explosive autoregressions, finding that the indirect inference estimator dominates in terms of root mean squared error but that the jackknife excels at bias reduction in stationary and unit root situations.
The usual motivation for a jackknife estimator relies on the existence of a Nagar-type expansion of the original full-sample estimator's bias. Its construction proceeds by finding a set of weights that, when applied to a full-sample estimator and a set of sub-sample estimators, eliminates fully the first-order term in the resulting jackknife estimator's bias expansion. In stationary time series settings, the bias expansions are common to both the full-sample and sub-sample estimators, but Chambers and Kyriacou (2013) point out that this property no longer holds in the case of a unit root. This is because the initial values in the sub-samples are no longer negligible in the asymptotics and therefore affect the bias expansions, and hence the optimal weights. Construction of a fully-effective jackknife estimator relies, therefore, on knowledge of the presence (or otherwise) of a unit root.
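To fix ideas, a minimal sketch of the generic construction is given below (Python; function names are ours, not from the paper). The default weights are the standard stationary-case choices m/(m-1) and -1/(m-1); under a (near-)unit root they must be replaced by the amended weights discussed later in the paper.

```python
import numpy as np

def ols_ar1(x):
    """Least squares estimate of rho in x_t = rho * x_{t-1} + error."""
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def jackknife_ar1(y, m=2, weights=None):
    """Sub-sample jackknife: combine the full-sample estimate with the
    average of m non-overlapping sub-sample estimates.  The default
    weights (m/(m-1), -1/(m-1)) are the standard stationary-case
    choices; under a (near-)unit root they must be replaced."""
    n = len(y)
    ell = n // m                      # sub-sample length
    full = ols_ar1(y)
    subs = [ols_ar1(y[j * ell:(j + 1) * ell]) for j in range(m)]
    w1, w2 = weights if weights is not None else (m / (m - 1), -1.0 / (m - 1))
    return w1 * full + w2 * np.mean(subs)
```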
In this paper, we explore the construction of jackknife estimators that eliminate fully the first-order bias in the near-unit root setting. Near-unit root models have attracted a great deal of interest in time series owing, amongst other things, to their ability to capture better the effects of sample size in the vicinity of a unit root, to allow analytical exploration of the power properties of unit root tests and to permit the development of an integrated asymptotic theory for both stationary and non-stationary autoregressions; see Phillips (1987a) and Chan and Wei (1987) for details. We find that jackknife estimators can indeed be constructed in the presence of a near-unit root to achieve this aim of bias reduction. Jackknife estimators have the advantage of incurring only a very slight additional computational burden, unlike alternative resampling and simulation-based methods such as the bootstrap and indirect inference. Furthermore, they are applicable in a wide variety of estimation frameworks and work well in finite sample situations in which the prime objective is bias reduction. Although the bootstrap is often a viable candidate for bias reduction, Park (2006) showed that it is inconsistent in the presence of a near-unit root, and hence jackknife methods offer a useful alternative in these circumstances.
The development of a jackknife estimator that achieves bias reduction in the near-unit root case is not simply a straightforward application of previous results. Chambers and Kyriacou (2013) first pointed out that, under unit root non-stationarity, the effect of the sub-sample initial conditions does not vanish asymptotically, thereby affecting the asymptotic expansions of sub-sample estimator bias and the resulting jackknife weights as compared to the stationary case; the extension of these results to a local-to-unity setting is, however, not obvious. With a near-unit root, the autoregressive parameter plays an important role, and it is therefore necessary to derive the appropriate asymptotic expansion of sub-sample estimator bias for this more general case, as well as the MGFs of the relevant limiting distributions that can be used to construct the appropriate jackknife weights. The derivation of such results is challenging in itself and is a major reason why we focus on the bias-minimising jackknife rather than attempting to derive results for the variance-minimising jackknife of Chen and Yu (2015).
The paper is organised as follows. Section 2 defines the near-unit root model of interest and focuses on the limit distributions of sub-sample estimators, demonstrating that these limit distributions are sub-sample dependent. An asymptotic expansion of these limit distributions reveals the source of the failure of the standard jackknife weights in a near-unit root setting by showing that the bias expansion is also sub-sample dependent. In order to define a successful jackknife estimator, it is necessary to compute the mean of these limit distributions, and so Section 3 derives the joint moment generating function of two random variables that determine the limit distributions over an arbitrary sub-interval of the unit interval. Expressions for the computation of the mean of the ratio of the two random variables are derived using the MGF. Various properties of the MGF are established, and it is shown that results obtained in Phillips (1987a) arise as a special case, including those that emerge as the near-unit root parameter tends to minus infinity. Based on the results in Sections 2 and 3, the optimal weights for the jackknife estimator are defined in Section 4, which then goes on to explore, via simulations, the performance of the proposed estimator in finite samples. Consideration is given to the choice of the appropriate number of sub-samples to use when either bias reduction or root mean squared error (RMSE) minimisation is the objective. It is found that the greatest bias reduction can be achieved using just two sub-samples, while minimisation of RMSE, which, it should be stressed, is not the objective of the jackknife estimator, requires a larger number of sub-samples that increases with sample size. Section 5 contains some concluding comments, and all proofs are contained in Appendix A.
The following notation will be used throughout the paper. The symbol $=_d$ denotes equality in distribution; $\to_d$ denotes convergence in distribution; $\to_p$ denotes convergence in probability; $\Rightarrow$ denotes weak convergence of the relevant probability measures; $W(r)$ denotes a Wiener process on $C[0,1]$, the space of continuous real-valued functions on the unit interval; and $J_c(r)$ denotes the Ornstein–Uhlenbeck process, which satisfies $dJ_c(r) = cJ_c(r)\,dr + dW(r)$ for some constant parameter c. Functionals of $W(r)$ and $J_c(r)$, such as $\int_0^1 J_c(r)\,dr$, are denoted $\int_0^1 J_c$ for notational convenience where appropriate, and in stochastic integrals of the form $\int_a^b J_c\,dW$, it is to be understood that integration is carried out with respect to r. Finally, L denotes the lag operator such that $Lx_t = x_{t-1}$ for a random variable $x_t$.
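The Ornstein–Uhlenbeck process and the functionals of it used below can be simulated directly, which is often useful for checking the analytical results numerically. The following sketch (Python; the function names and discretisation choices are ours, not from the paper) uses a simple Euler scheme for $dJ_c(r) = cJ_c(r)\,dr + dW(r)$ with $J_c(0) = 0$.

```python
import numpy as np

def simulate_ou(c, n_steps=1000, rng=None):
    """Euler discretisation of dJ_c(r) = c*J_c(r) dr + dW(r) on [0, 1]
    with J_c(0) = 0; returns the simulated path on a grid of n_steps."""
    rng = rng or np.random.default_rng()
    dr = 1.0 / n_steps
    dW = rng.standard_normal(n_steps) * np.sqrt(dr)
    J = np.zeros(n_steps + 1)
    for i in range(n_steps):
        J[i + 1] = J[i] + c * J[i] * dr + dW[i]
    return J

# Functionals such as \int_0^1 J_c^2 dr can then be approximated:
J = simulate_ou(c=-5.0, rng=np.random.default_rng(1))
D = np.mean(J[:-1] ** 2)   # Riemann approximation of \int_0^1 J_c(r)^2 dr
```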
3. A Moment Generating Function and Its Properties
The following result provides the joint moment generating function (MGF) of two relevant functionals of the Ornstein–Uhlenbeck process defined over a sub-interval where . Although our focus is on sub-intervals of , we leave b unconstrained for greater generality than is required for our specific purposes because the results may have more widespread use beyond our particular application.
Theorem 3. Let and , where is an Ornstein–Uhlenbeck process on with parameter c, and . Then:
- (a) The joint MGF of and is given by: where, defining and ,
- (b) The individual MGFs for and are given by, respectively: where and .
- (c) The expectation of is given by: where:
The MGFs for the two functionals in Theorem 3 have potential applications in a wide range of sub-sampling problems with near-unit root processes. A potential application of the joint MGF in Part (a) of Theorem 3 is in the computation of the cumulative distribution and probability density functions of the distributions that arise when setting and . For example, the probability density function of is given by (with ): see, for example, Perron (1991, p. 221), who performs this type of calculation for the distribution , while Abadir (1993) derives a representation for the density function of in terms of a parabolic cylinder function.
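The expectation in Part (c) rests on a standard device for moments of ratios: for a positive denominator D, $\mathrm{E}(N/D) = \int_0^\infty \mathrm{E}(N e^{-sD})\,ds$, and the integrand is the partial derivative of the joint MGF with respect to its first argument, evaluated at $(0, -s)$. The sketch below implements this numerically for a user-supplied joint MGF; the MGF used in the check is a simple stand-in, not the one from Theorem 3.

```python
import numpy as np
from scipy.integrate import quad

def mean_ratio_from_mgf(joint_mgf, h=1e-6):
    """Approximate E(N/D) for D > 0 from the joint MGF
    M(t1, t2) = E[exp(t1*N + t2*D)], using the identity
    E(N/D) = int_0^inf (d/dt1) M(t1, -s) |_{t1=0} ds;
    the t1-derivative is taken by central finite differences."""
    def integrand(s):
        return (joint_mgf(h, -s) - joint_mgf(-h, -s)) / (2.0 * h)
    value, _ = quad(integrand, 0.0, np.inf)
    return value

# Sanity check with independent N ~ N(0.3, 1) and D ~ Gamma(2, 1),
# for which E(N/D) = 0.3 * E(1/D) = 0.3:
M = lambda t1, t2: np.exp(0.3 * t1 + 0.5 * t1**2) * (1.0 - t2) ** -2
print(mean_ratio_from_mgf(M))   # ~ 0.3
```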
The result in Part (c) of Theorem 3 is obtained by differentiating the MGF and constructing the appropriate integrals. When a = 0 and b = 1, the usual (full-sample) result, where and , can be obtained as a special case. Noting that $J_c(0) = 0$ in this case and making the substitution results in: these expressions can be found in, for example, Gonzalo and Pitarakis (1998, Lemma 3.1). Some further special cases of interest that follow from Theorem 3 are presented below.
Corollary to Theorem 3. (a) Let so that and . Then: while taking the limit as yields: where . (b) Let so that and . Then: where . Taking the limit as results in:
The results in Part (a) of the corollary are relevant in the full-sample case, and the result for goes back to White (1958). The results in Part (b) of the corollary are pertinent to the sub-sampling issues being investigated here in the case of a near-unit root, with the unit root (c = 0) result for having been first derived by Chambers and Kyriacou (2013).
It is also possible to use the above results to explore the relationship between the sub-sample distributions and the full-sample distribution. For example, it is possible to show that on is equal to for in the sub-samples, while on is equal to for in the sub-samples; an implication of this is that:
Furthermore, this implies that the limit distribution of the first sub-sample estimator, , when , is the same as that of the full-sample estimator, , when .
The sub-sample results with a near-unit root can be related to the full-sample results of Phillips (1987a). For example, the MGF in Theorem 3 has the equivalent representation:
where and are defined in the theorem, and . When , , it follows that , and the above expression nests the MGF in Phillips (1987a), i.e., this follows straightforwardly from (14). It is also of interest to examine what happens when the local-to-unity parameter $c \to -\infty$, as in Phillips (1987a) and other recent work on autoregression, e.g., Phillips (2012). We present the results in Theorem 4 below.
Theorem 4. Let denote an Ornstein–Uhlenbeck process on with parameter c, and let . Furthermore, define the functional: where and: Then, as $c \to -\infty$:
- (a) ;
- (b) ;
- (c) if (and hence ) and diverges otherwise.
The functional in Theorem 4 represents the limit distribution of the normalised estimator , where ℓ denotes the number of observations in the sub-sample (so that in this case) and is the corresponding estimator. However, as pointed out by Phillips (1987a), the sequential limits (large sample for fixed c, followed by $c \to -\infty$) are only indicative of the results one might expect in the stationary case and do not constitute a rigorous demonstration. The results in Theorem 4 also encompass the related results in Phillips (1987a) obtained when and .
4. An Optimal Jackknife Estimator
The discussion following Theorem 2 indicates that the weights defining an optimal jackknife estimator, which removes first-order bias in the local-to-unity setting, depend on the quantities:
where , and:
In particular, Part (c) of Theorem 3 can be used to evaluate the quantities:
where we have defined:
The relevant MGFs for evaluating and are given in the corollary to Theorem 3.
Table 1 contains the values of for values of m and c as follows: and . The entries for correspond to in view of the distributional equivalence of and discussed following the corollary. For a given combination of j and m, it can be seen that the expectations increase as c increases, while for given c and j, the expectations increase with m. A simple explanation for the different properties of the sub-samples beyond the first is that the initial values are of the same order of magnitude as the partial sums of the innovations. The values of the sub-sample expectations when c = 0 are seen from Table 1 to be independent of m and to increase with j. Note that corresponds to the expected value of the limit distribution of the full-sample estimator under a unit root; see, for example, (10) and the associated commentary. The values of can be used to define jackknife weights under a unit root for different values of m; see, for example, Chambers and Kyriacou (2013). More generally, the values of can be used to define optimal weights for the jackknife estimator that achieve the aim of first-order bias removal in the presence of a near-unit root. The result is presented in Theorem 5.
Theorem 5. Let . Then, under Assumption 1, an optimal jackknife estimator is given by: where and .
Theorem 5 shows the form of the optimal weights for the jackknife estimator when the process (1) has a near-unit root. It can be seen that the weights depend not only on the value of c, but also on the value of , both of which are unknown in practice. Chambers and Kyriacou (2013) and Chen and Yu (2015) have emphasised the case and , and have reported simulation results highlighting the good bias-reduction properties of appropriate jackknife estimators in that case. When and , the optimal weights in Theorem 5 simplify to:
The values of in Table 1 can be utilised to derive these optimal weights for the jackknife estimator in this case; they are reported in Table 2 for the values of m and c used in Table 1, along with the values of the standard weights that are applicable in stationary autoregressions. The entries in Table 2 show that the optimal weights are larger in (absolute) value than the standard weights that would apply if all the sub-sample distributions were the same, and that they increase with c for given m. The optimal weights also converge towards the standard weights as c becomes more negative; this could presumably be demonstrated analytically, using the properties of the MGF in constructing the , by examining the appropriate limits as $c \to -\infty$, although we do not pursue such an investigation here.
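To illustrate how sub-sample expectations translate into weights, the sketch below solves the two conditions underlying the construction (the weights sum to one, and the weighted first-order bias terms cancel) for the generic two-weight jackknife. The function name and the bias-expansion scaling are our own illustrative rendering of the construction described in the text, not code from the paper.

```python
import numpy as np

def optimal_weights(mu_full, mu_subs):
    """Jackknife weights (w1, w2) for rho_J = w1*rho_hat + w2*mean(rho_hat_j).

    Assumes bias(rho_hat) ~ mu_full/n and bias(rho_hat_j) ~ m*mu_subs[j]/n
    (sub-samples of length n/m), as in the expansions discussed in the text.
    Consistency requires w1 + w2 = 1; cancelling the first-order bias
    requires w1*mu_full + w2*sum(mu_subs) = 0."""
    mu_subs = np.asarray(mu_subs, dtype=float)
    s = mu_subs.sum()
    w1 = s / (s - mu_full)
    w2 = -mu_full / (s - mu_full)
    return w1, w2

# If all sub-sample expectations equal the full-sample one (the stationary
# case), the standard weights m/(m-1) and -1/(m-1) are recovered:
print(optimal_weights(-1.78, [-1.78, -1.78]))   # (2.0, -1.0) for m = 2
```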
The relationship between the optimal weights when and when is not straightforward. Noting that:
and that:
we find that:
This expression can be manipulated to write explicitly in terms of as follows:
The second weight is obtained simply as . In situations where , which essentially reflects cases where does not have a white noise structure, the optimal weights can be obtained from the entries in Table 2 (at least for the relevant values of c) using (15), but knowledge is still required not only of c and , but also of the expectations of the inverses of and . The latter can be computed numerically: Equation (2.3) of Meng (2005) shows that $\mathrm{E}(X^{-1}) = \int_0^\infty M_X(-s)\,ds$, where $M_X$ denotes the MGF of a positive random variable $X$; the MGFs of the two relevant random variables can be obtained from the corollary to Theorem 3.
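A numerical rendering of this identity is straightforward; in the sketch below, the Gamma MGF is used only as a check, not as one of the MGFs from the corollary.

```python
import numpy as np
from scipy.integrate import quad

def mean_inverse_from_mgf(mgf):
    """E(1/X) for a positive random variable X via the identity
    E(1/X) = int_0^inf M_X(-s) ds, where M_X(t) = E[exp(t*X)]."""
    value, _ = quad(lambda s: mgf(-s), 0.0, np.inf)
    return value

# Check with X ~ Gamma(shape=3, scale=1): M_X(t) = (1 - t)^{-3} and
# E(1/X) = 1/(shape - 1) = 0.5.
print(mean_inverse_from_mgf(lambda t: (1.0 - t) ** -3))   # ~ 0.5
```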
In practice, however, the values of c and are still required in order to construct the optimal estimator. Although the localisation parameter c from the model defined under Assumption 1 is identifiable, it is not possible to estimate it consistently, and attempts to do so require a completely different formulation of the model; see, for example, Phillips et al. (2001), who propose a block local-to-unity framework to estimate c consistently, although this approach does not appear to have been pursued subsequently. Furthermore, depends on an estimator of the long-run variance , which is a notoriously difficult quantity to estimate in finite samples. In view of these unresolved challenges, and following earlier work on jackknife estimation of autoregressive models with a (near-)unit root, we focus on the case , but allow , with particular attention paid to unit root and locally-stationary processes, i.e., .
Our simulations examine the performance of five estimators of the parameter . The baseline estimator is the OLS estimator in (4), the bias of which the jackknife estimators aim to reduce. Three jackknife estimators of the generic form:
are also considered, each differing in the choice of weights ; in all cases, . The standard jackknife sets ; the optimal jackknife sets ; and the unit root jackknife sets . The standard jackknife removes fully the first-order bias in stationary autoregressions, but does not do so in the near-unit root framework, in which the optimal estimator achieves this goal. However, the optimal estimator is infeasible because it relies on the unknown parameter c. We therefore also consider the feasible unit root jackknife obtained by setting . In addition, we consider the jackknife estimator of Chen and Yu (2015), which is of the form:
The weights are chosen so as to minimise the variance of the estimator, in addition to providing bias reduction, in the case . Because the choice of weights is a more complex problem for this type of jackknife estimator, Chen and Yu only provide results for the cases and , in which case the weights are , , and , , , , respectively; see Table 1 of Chen and Yu (2015).
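A condensed sketch of this kind of bias experiment is given below (assuming $\rho_n = e^{c/n}$, iid standard normal innovations and a zero initial value, which are our assumptions for illustration). The standard weights are used here; the optimal or unit root weights from Table 2 would be substituted in practice.

```python
import numpy as np

def ols_ar1(x):
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def bias_experiment(n=96, c=-1.0, m=2, reps=2000, seed=0):
    """Monte Carlo bias of OLS and of a two-weight jackknife for the
    near-unit-root AR(1) y_t = rho_n*y_{t-1} + u_t with rho_n = exp(c/n),
    u_t iid N(0,1) and y_0 = 0.  The weights below are the standard
    (stationary-case) ones; the optimal or unit root weights would
    replace them."""
    rng = np.random.default_rng(seed)
    rho_n = np.exp(c / n)
    w1, w2 = m / (m - 1), -1.0 / (m - 1)
    ell = n // m
    bias_ols = bias_jack = 0.0
    for _ in range(reps):
        u = rng.standard_normal(n)
        y = np.empty(n)
        prev = 0.0
        for t in range(n):
            prev = rho_n * prev + u[t]
            y[t] = prev
        full = ols_ar1(y)
        subs = np.mean([ols_ar1(y[j * ell:(j + 1) * ell]) for j in range(m)])
        bias_ols += full - rho_n
        bias_jack += w1 * full + w2 * subs - rho_n
    return bias_ols / reps, bias_jack / reps

print(bias_experiment())
```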
Table 3 reports the bias of the five estimators obtained from 100,000 replications of the model in Assumption 1 with and , using m = 2 for each of the jackknife estimators; this value of m has been found to provide particularly good bias reduction in a number of studies, including Phillips and Yu (2005), Chambers (2013), Chambers and Kyriacou (2013) and Chen and Yu (2015). The particular values of c are , which focus on the pure unit root case as well as locally stationary processes, and four sample sizes are considered, these being n = 24, 48, 96 and 192. The corresponding values of are: ; ; ; and $\rho = 1$ for all values of n when c = 0. The values of when are some way from unity for the smaller sample sizes, which suggests that the standard jackknife might perform well in these cases.
The value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold in Table 3. The results show the substantial reduction in bias that can be achieved with jackknife estimators, the superiority of the optimal estimator being apparent as c becomes more negative, although the unit root jackknife also performs well in terms of bias reduction.
Table 4 contains the corresponding RMSE values for the jackknife estimators using , as well as the RMSE corresponding to the RMSE-minimising values of m, which are typically larger than and are also reported in the table. The RMSE value of the estimator producing the minimum RMSE for each c and n is highlighted in bold. In fact, the optimal jackknife estimator, although constructed to eliminate first-order bias, manages to reduce the OLS estimator's RMSE and outperforms the Chen and Yu (2015) jackknife estimator in both bias and RMSE reduction, although the latter occurs at a larger number of sub-samples. The results show that use of larger values of m tends to produce smaller RMSE than when , and, again, the optimal jackknife performs particularly well when c becomes more negative. The performance of the unit root jackknife is also impressive, suggesting that it is a feasible alternative to the optimal estimator when the value of c is unknown.
Although important in itself, bias is not the only feature of a distribution that is of interest, and hence the RMSE values in Table 4 should also be taken into account when assessing the performance of the estimators. The substantial bias reductions obtained with the bias-minimising value of come at the cost of a larger variance that ultimately feeds through into a larger RMSE compared with the OLS estimator . This can be offset, however, by using the larger RMSE-minimising values of m which, despite having a larger bias than when , are nevertheless able to reduce the variance sufficiently to result in a smaller RMSE than .
In order to assess the robustness of the jackknife estimators, some additional bias results are presented in Table 5 that correspond to values of , while the estimators are based on the assumption that , as in the preceding simulations. The results correspond to two different specifications for that enable data to be generated that are consistent with different values of . The first specifies to be a first-order moving average (MA(1)) process, so that , where ; in this case, . The second specification is a first-order autoregressive (AR(1)) process of the form , in which case . In the MA(1) case, we have chosen in order to give an intermediate value of , while in the AR(1) case, we have chosen to give a small value of . As in Table 3, the value of the bias of the estimator producing the minimum (absolute) bias for each c and n is highlighted in bold.
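For reference, the variance and long-run variance implied by these two specifications follow from standard formulas, namely $\sigma_u^2 = \sigma_\varepsilon^2(1+\theta^2)$ and $\omega^2 = \sigma_\varepsilon^2(1+\theta)^2$ for the MA(1), and $\sigma_u^2 = \sigma_\varepsilon^2/(1-\phi^2)$ and $\omega^2 = \sigma_\varepsilon^2/(1-\phi)^2$ for the AR(1). The sketch below computes them for illustrative parameter values (theta = 0.5 and phi = 0.5 are our choices, not necessarily those used in Table 5, and the printed variance ratio is one illustrative summary of the departure from white noise).

```python
def ma1_variances(theta, sigma_eps2=1.0):
    """u_t = eps_t + theta*eps_{t-1}: returns (variance, long-run variance)."""
    return sigma_eps2 * (1.0 + theta**2), sigma_eps2 * (1.0 + theta)**2

def ar1_variances(phi, sigma_eps2=1.0):
    """u_t = phi*u_{t-1} + eps_t: returns (variance, long-run variance)."""
    return sigma_eps2 / (1.0 - phi**2), sigma_eps2 / (1.0 - phi)**2

for label, (var, lrv) in [("MA(1), theta=0.5", ma1_variances(0.5)),
                          ("AR(1), phi=0.5", ar1_variances(0.5))]:
    print(label, "variance / long-run variance =", var / lrv)
```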
Table 5 shows, in the MA case, that the jackknife estimators are able to reduce bias when , but none of them is able to do so when or . In the AR case, with a smaller value of , the jackknife estimators are still able to deliver bias reduction, albeit to a lesser extent than when , and it is the unit root jackknife of Chambers and Kyriacou (2013) that achieves the greatest bias reduction in this case. These results are indicative of the importance of knowing and suggest that developing methods to allow for is important from an empirical viewpoint.
5. Conclusions
This paper has analysed the specification and performance of jackknife estimators of the autoregressive coefficient in a model with a near-unit root. The limit distributions of the sub-sample estimators that are used in the construction of the jackknife estimator are derived, and the joint MGF of two components of these distributions is obtained and its properties explored. The MGF is then used to derive the weights for an optimal jackknife estimator that removes fully the first-order finite sample bias from the OLS estimator. The resulting jackknife estimator is shown, in a simulation study, to perform well at finite sample bias reduction and, with a suitable choice of the number of sub-samples, to be able to reduce the overall finite sample RMSE as well.
The theoretical findings in Sections 3 and 4 show how first-order approximations for sub-sample estimators can be used, along with the well-known full-sample results of Phillips (1987a), to obtain finite-sample refinements. The jackknife uses analytical (rather than simulation-based) results to achieve bias reduction at minimal computational cost, along the same lines as the indirect inference methods based on analytical approximations in Phillips (2012) and Kyriacou et al. (2017). Apart from computational simplicity, an evident advantage of analytically-based methods over simulation-based alternatives such as the bootstrap or (traditional, simulation-based) indirect inference is that they require no distributional assumptions on the error term.
Despite its success in achieving substantial bias reduction in finite samples, as shown in the simulations, a shortcoming of the jackknife estimator, and an impediment to its use in practice, is the dependence of the optimal weights on the unknown near-unit root parameter, as well as on a quantity related to the long-run variance of the disturbances. However, our theoretical results in Sections 3 and 4 reveal precisely how these quantities affect the optimal weights and can therefore, in principle, be used to guide further research into the development of a feasible data-driven version of the jackknife within this framework. Such further work is potentially useful in view of the simulations in Tables 3 and 4, which highlight that (feasible) jackknife estimators are an effective bias and RMSE reduction tool in a local unit root setting, even if they do not fully remove first-order bias. Moreover, the results obtained in Theorems 1–4 can be utilised in a wide range of sub-sampling situations beyond jackknife estimation itself.
The results in this paper could be utilised and extended in a number of directions. An obvious application would be the use of jackknife estimators as the basis for developing unit root test statistics, the local-to-unity framework being particularly well suited to the analysis of the power functions of such tests. It would also be possible to develop fully a variance-minimising jackknife estimator along the lines of Chen and Yu (2015), who derived analytic results for c = 0 and m = 2 or 3, although extending their approach to arbitrary c and m represents a challenging task. Considerable progress in this direction has, however, been made by Stoykov (2017), who builds upon our results and also proposes a two-step jackknife estimator that incorporates an estimate of c to determine the jackknife weights. The estimation model could also be extended to include an intercept and/or a time trend. The presence of an intercept will affect the limit distributions by replacing the Ornstein–Uhlenbeck processes with demeaned versions thereof, which will also have an effect on the finite sample biases. Such effects have been investigated by Stoykov (2017), who shows that substantial reductions in bias can still be achieved by jackknife methods. Applications of jackknife methods in multivariate time series settings are also possible, a recent example being Chambers (2015) in the case of a cointegrated system, but other multivariate possibilities could be envisaged.