1. Introduction
Granger and Newbold (
1974) and others show that regression of independent (nearly) nonstationary time series could result in spurious outcomes,
Pesaran et al. (
1999) and others find that series with mixed orders of integration, that is, I(0) or I(1), could be cointegrated with a stationary residual, and
Westerlund (
2008) documents that many studies commit a Type II error by failing to reject the no-cointegration hypothesis. On the other hand,
Engle and Granger (
1987) establish the relationship between cointegration and error correction models that was first suggested in
Granger (
1981) and develop estimation procedures and tests for the cointegration model. In addition,
Phillips (
1986) develops an asymptotic theory for regressions of integrated random processes, including the spurious regressions discovered by
Granger and Newbold (
1974) and the cointegrating regressions developed by
Engle and Granger (
1987).
Entorf (
1997) analyses the regression of two independent random walks with drifts and shows that the convergence to pseudo true values applies to the estimation of spurious fixed-effects models. Readers may refer to
Ventosa-Santaulária (
2009) for an overview of spurious regression.
Is it possible that the regression of two independent and nearly nonstationary series does not have any spurious problem? In this paper, we explore this issue. We first conjecture that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all. To check whether the conjecture holds, we generate two independent and nearly nonstationary AR(1) processes, regress one on the other, and check the proportion of rejections of the null hypothesis that the slope coefficient (beta) is zero. We first find that under some situations, consistent with the literature, regressing two independent and (nearly) nonstationary time series could be spurious. Nonetheless, we also find that under some other situations, different from the literature, the rejection rates are much smaller than the 5% level of significance, implying that in those situations, regressing one nearly nonstationary series on another independent and nearly nonstationary series does not produce any spurious problem in any of the cases simulated in our paper.
The rest of the paper is organized as follows. In
Section 2, we state the basic models for the regression and the regression with a spurious problem. In
Section 3, we state our model setup and construct the algorithm for the simulation. In
Section 4, we discuss our findings from our simulation and the last section concludes.
3. Model Setup and Algorithm
In this section, we first state the model setting of generating two purely independent and nearly nonstationary time series, regressing one of them onto the other, and examining whether the corresponding regression is spurious. We then construct the algorithm for the simulation and discuss our simulation result in the next section.
3.1. Model Setup
We consider the simple linear regression in (
1) between two unrelated nearly nonstationary AR(1) series
and
such that
in which
. For simplicity, we assume that both
and
follow:
where
and
. We note that
,
, and
follows a Student’s
t distribution with
degrees of freedom (df). For
,
k is equated to 1 and in this case,
in (
8) is simply a scale parameter. When
, it becomes a Cauchy distribution and when
, it becomes a normal distribution. Readers may refer to
Pötzelberger (
1990),
Tiku and Wong (
1998),
Tiku et al. (
1999,
2000),
Wong and Bian (
2005),
Fu and Fu (
2015), and others to know more properties of AR(1) series.
To simulate the two series properly, we consider different factors that could affect the behavior of the time series. First, we consider the distribution of the error terms. We use the following four iid error distributions in our study:
Situation 1. We assume that both error terms defined in (7) follow one of the following distributions:
1. a standard normal distribution, N(0,1);
2. a t-distribution with df = 5, t(5);
3. a t-distribution with df = 2, t(2); and
4. a t-distribution with df = 1, that is, the standard Cauchy distribution.
Second, we vary the lengths of the time series and simulate time series with the following four different lengths, as stated in the following situation:
Situation 2. We consider the lengths of the two time series defined in (7) to be: (i) T = 100; (ii) T = 200; (iii) T = 400; and (iv) T = 800.
After deciding the error distributions and the lengths of the AR(1) processes, we now consider the different values of
and
. In our model, since both
and
are nearly nonstationary, we choose
and, in particular, we define
and
1 and consider the following values for both
and
as stated in Situations 3 and 4:
Situation 3. We consider that the values of both and such that and .
Situation 4. We consider that the values of both and such that and .
We consider Situations 3 and 4 because two autoregressive processes, one associated with the zero frequency (the AR(1) with a positive coefficient in our paper) and the other associated with the Nyquist frequency π (the AR(1) with a negative coefficient in our paper, which has power at frequency π and completes a cycle every 2 observations), are independent or even asymptotically orthogonal. Readers may refer to
Johansen and Schaumburg (
1999),
Ghysels and Osborn (
2001), and
del Barrio Castro et al. (
2018,
2019) for more information. Readers may also refer to seasonal unit root tests, see, for example,
del Barrio Castro et al. (
2012) and
Smith et al. (
2009), and cointegration for processes integrated at different frequencies, see, for example,
del Barrio Castro et al. (
2020) with properties that are related to the series we are using in our paper.
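The link between a negative AR(1) coefficient and the Nyquist frequency can be seen in a short derivation. This is only an illustrative sketch: the symbols $y_t$, $z_t$, $\rho$, and $e_t$ are assumptions introduced here and are not the notation of the paper's equations.

```latex
\begin{aligned}
y_t &= -\rho\, y_{t-1} + e_t, \qquad 0 < \rho < 1, \\
z_t &:= (-1)^t y_t
      = (-1)^t \left( -\rho\, y_{t-1} + e_t \right)
      = \rho\, (-1)^{t-1} y_{t-1} + (-1)^t e_t
      = \rho\, z_{t-1} + (-1)^t e_t .
\end{aligned}
```

Hence $y_t = (-1)^t z_t = \cos(\pi t)\, z_t$, where $z_t$ is an AR(1) with the positive coefficient $\rho$. If $e_t$ is iid and symmetric about zero, as is the case for the normal and Student's t errors in Situation 1, then $(-1)^t e_t$ has the same distribution as $e_t$, so the negative-coefficient series is a positive-coefficient series modulated at frequency $\pi$.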
With four different error distributions, four different time series lengths, and the above 50 combinations of coefficient values as stated in Situations 1, 2, 3, and 4, there are in total 800 simulation cases in our study in which the two autoregressive coefficients have different signs.
Nevertheless, in this paper, we also study the cases in which the two autoregressive coefficients are of the same sign, either positive or negative. Thus, we include the following situations in our study:
Situation 5. We consider that the values of both and such that and .
Situation 6. We consider that the values of both and such that and .
3.2. Algorithm
The two series
and
are generated from independent error terms, and thus, they are expected not to be related. However, Granger and Newbold (1974) and others have shown that regression of independent nonstationary time series could result in spurious outcomes. In this paper, we believe that it is possible that the regression of independent and nearly nonstationary
and
as shown in Equation (
1) may not be spurious under some situations as we stated in Conjecture 1. To check whether Conjecture 1 could hold under some situations, we set the following algorithm for each situation (different error distributions, different time series lengths, different combinations of
and
) as described in
Section 3.1:
Algorithm 1: For each situation (different error distributions, different time series lengths, different combinations of the autoregressive coefficients) as described in Section 3.1, we conduct the following steps in our simulation:
For each situation (different error distributions, different time series lengths, different combinations of
and
) as described in
Section 3.1, we will conduct simulation as described in Algorithm 1 and discuss the results in the next section.
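A simulation of this kind can be sketched as follows. The individual steps of Algorithm 1 are not reproduced in this text, so the sketch is an assumption based on the description in the Introduction: generate two independent AR(1) series, regress one on the other, and record how often the slope t-test rejects at the 5% level. The function names, the coefficient values 0.99 and -0.99, the zero starting value, the number of replications, and the large-sample critical value 1.96 are all hypothetical choices, not the authors' settings.

```python
import numpy as np


def simulate_ar1(coef, length, rng, df=None):
    """Generate x_t = coef * x_{t-1} + e_t, started at zero (an assumption).

    Errors are standard normal when df is None, otherwise Student's t with
    df degrees of freedom (df=1 gives the standard Cauchy distribution).
    """
    if df is None:
        errors = rng.standard_normal(length)
    else:
        errors = rng.standard_t(df, size=length)
    series = np.empty(length)
    previous = 0.0
    for t in range(length):
        previous = coef * previous + errors[t]
        series[t] = previous
    return series


def slope_t_statistic(y, x):
    """OLS of y on a constant and x; return the usual t ratio of the slope."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    sxx = np.sum(x_centered ** 2)
    slope = np.sum(x_centered * y_centered) / sxx
    residuals = y_centered - slope * x_centered
    sigma2 = np.sum(residuals ** 2) / (len(y) - 2)
    return slope / np.sqrt(sigma2 / sxx)


def rejection_rate(coef_y, coef_x, length, replications, df=None,
                   seed=0, critical=1.96):
    """Share of replications in which |t| exceeds the critical value.

    critical=1.96 is the large-sample 5% two-sided value, used here as an
    approximation to the Student's t critical value.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(replications):
        y = simulate_ar1(coef_y, length, rng, df)
        x = simulate_ar1(coef_x, length, rng, df)
        if abs(slope_t_statistic(y, x)) > critical:
            rejections += 1
    return rejections / replications


if __name__ == "__main__":
    # Hypothetical settings: T = 200, normal errors, 200 replications.
    print("opposite signs:", rejection_rate(0.99, -0.99, 200, 200))
    print("same sign:     ", rejection_rate(0.99, 0.99, 200, 200))
```

Running the sketch over the error distributions of Situation 1 (via `df`) and the lengths of Situation 2 (via `length`) would give a grid of rejection rates comparable in layout, though not necessarily in value, to the tables discussed in the next section.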
4. Simulation
We follow Algorithm 1 to conduct simulation for each situation (different error distributions, different time series lengths, different combinations of
and
) as described in
Section 3.1. The simulation helps us to examine whether the
T statistic as shown in Equation (
3) for the model as shown in Equation (
1) follows a Student's t-distribution. If
and
are unrelated, the true null hypothesis that all
coefficients are zero should be rejected around 5% of the time at the significance level of 5%. If the T test is good, that is,
’s follow a Student's t-distribution, the rejection rate should be close to 5%. If the rejection rate is significantly greater than 5%, then we conclude that a spurious regression problem exists. In addition, we believe that it is possible that the regression of independent and nearly nonstationary
and
as shown in (
1) may not be spurious under some situations as we hypothesized in Conjecture 1. We examine in this section whether Conjecture 1 could hold under some situations. We first discuss the results of the simulation for the cases when
and
are of different signs in the next subsection.
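One way to judge whether a simulated rejection rate differs significantly from 5% is to use the binomial standard error. The following is a sketch under stated assumptions: $N$ denotes the number of independent replications, which is not given in this text, and $N = 10{,}000$ is a hypothetical value used only for illustration.

```latex
R \sim \mathrm{Binomial}(N,\, 0.05), \qquad
\hat{p} = \frac{R}{N}, \qquad
\mathrm{se}(\hat{p}) = \sqrt{\frac{0.05 \times 0.95}{N}}
\approx 0.0022 \quad \text{for } N = 10{,}000 .
```

Under these assumptions, a correctly sized test would give rejection rates within roughly $0.05 \pm 2 \times 0.0022$, so rates near zero or above 30% lie far outside the range attributable to simulation noise.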
4.1. Simulation for the Cases When the Autoregressive Coefficients Are of Different Signs
We first analyze cases as stated in Situation 3 and exhibit the results in
Table A1,
Table A2,
Table A3 and
Table A4, displayed in
Appendix A, which report the rejection frequency of the
T test when
. From
Table A1,
Table A2,
Table A3 and
Table A4, one can observe that when the values of both
and
as stated in Situation 3 range from
to
, the rejection rate is about 0.0000 for any
n and for any error distribution studied in our paper, except the situation when the error term follows a
in which the rejection rates are close to 0.0004.
We then analyze the cases as stated in Situation 4 and show the results in
Table A5,
Table A6,
Table A7 and
Table A8, displayed in
Appendix B, which report the rejection frequency of the
T test when
. Similarly, from
Table A5,
Table A6,
Table A7 and
Table A8, one can observe that when the values of both
and
as stated in Situation 4 are between
and
, the rejection rate is zero or close to zero for any
n and any error distribution studied in our paper.
Our analysis shows that for all the cases with values of
chosen as stated in Situations 3 and 4 and with both
and
between
and
, the rejection rates are much smaller than the 5% level of significance, implying that when
follow Situations 3 and 4 and when both
and
are between
and
, none of the corresponding regressions encounters a spurious problem in the cases simulated in our paper, which supports Conjecture 1 for these cases. In other words, our analysis shows that when independent
and
follow nearly nonstationary AR(1) models and the autoregressive coefficients
and
have opposite signs, there is no spurious problem in the regression stated in Equation (
1) and Conjecture 1 holds.
4.2. Simulation for the Cases When the Autoregressive Coefficients Are of the Same Sign
We turn to examine whether the regression shown in Equation (
1) is spurious for the cases when both
and
are of the same sign; that is, both
and
are positive or both are negative. To do so, we follow Algorithm 1 to conduct simulations for the cases when both
and
are positive and when both are negative, as stated in Situations 5 and 6, and exhibit the results in
Table A9,
Table A10,
Table A11,
Table A12,
Table A13,
Table A14,
Table A15 and
Table A16, displayed in
Appendix C and
Appendix D, respectively.
We first discuss the cases when both
and
are positive as stated in Situation 5. Compared with the results in
Table A1,
Table A2,
Table A3,
Table A4,
Table A5,
Table A6,
Table A7 and
Table A8, all of the rejection rates in
Table A9,
Table A10,
Table A11 and
Table A12 are significantly higher than 5% and the rejection frequency of the
T test is higher than 49% for any
n and any error distribution studied in our paper, except the situation when the error term follows
in which the rejection rate is higher than 32%. In addition, as
n increases, or either
or
increases, or as the error distribution moves further away from the normal distribution, the rejection rate increases even further.
We turn to discuss the cases when both autoregressive coefficients are negative as stated in Situation 6. Similar to the cases when both coefficients are positive, when both are negative, the rejection frequency of the T test is higher than 50% for any n and for any error distribution studied in our paper, except for one error distribution, in which the rejection rate is higher than 31%. In addition, similar to the cases when both coefficients are positive, as n increases, or as either autoregressive coefficient increases, or as the error distribution moves further away from the normal distribution, the rejection rate increases even further.
Our analysis shows that, different from the cases in which the two autoregressive coefficients are of different signs, for all the cases in which they are of the same sign, either positive or negative, as stated in Situations 5 and 6, respectively, the rejection rate is much higher than the 5% level of significance and could be higher than 49%. This implies that under Situations 5 and 6, the chance of the regression being spurious is very high for all the cases simulated in our paper, which, in turn, rejects Conjecture 1 for all the cases in Situations 5 and 6.
5. Concluding Remarks
In this paper, we conjecture that under some situations, the regression of two independent and nearly nonstationary series does not have any spurious problem at all. To check whether our conjecture holds, we generate two independent and nearly nonstationary AR(1) processes, regress one on the other, and check the proportion of rejections of the null hypothesis that the slope coefficient (beta) is zero. Consistent with the literature, which shows that regressing two independent and (nearly) nonstationary time series could be spurious, we first find that when the two autoregressive coefficients are of the same sign, either positive or negative, and within the ranges examined in our simulation, the rejection rate is much higher than the 5% level of significance in all the cases examined and could be higher than 49% in many cases, implying that the chance of the regression being spurious is very high whenever the two coefficients are of the same sign.
Nonetheless, for all the cases in which the two autoregressive coefficients are of different signs and within the ranges examined in our simulation, our results, different from the literature, show that the rejection rates are much smaller than the 5% level of significance, implying that regressing one nearly nonstationary series on another independent and nearly nonstationary series does not produce any spurious problem in any of the cases simulated in our paper.
The literature shows that the regression of independent and (nearly) nonstationary time series could result in spurious outcomes. In this paper, we conjecture that under some situations such a regression does not have any spurious problem at all, and we aim to find situations in which the conjecture holds. We find that the conjecture holds when the two autoregressive coefficients have opposite signs. This does not imply that these are the only situations in which the conjecture holds; there could be others, and we leave it to future studies to find them. The purpose of our paper is to tell readers that when one finds that a regression of two or more time series does not have any spurious problem, this does not necessarily imply that the series are not independent. Thus, academics and practitioners should conduct proper tests to show whether the series are independent.
Some academics may wonder whether there are financial or economic time series that exhibit extreme negative autocorrelations. We believe that some financial or economic time series could exhibit positive autocorrelations and some could exhibit negative autocorrelations. We note that the time period used in our paper need not be daily or monthly; it should be set to fit the nature of the time series. It is well known that stock returns could overreact or underreact, which means that they could be positively or negatively autocorrelated, and so could the true unobserved stock returns. Whether the autocorrelation is extremely positive or extremely negative will depend on the particular stock. In addition, as mentioned above, the fact that our conjecture holds when the two autoregressive coefficients have opposite signs does not imply that these are the only situations in which it could hold. Some financial or economic time series could follow other situations that are yet to be discovered, and thus, the conjecture could be important not only for statistics, but also for economics and finance. We also note that in our paper, we only cover nearly non-stationary series but do not cover the situations and . We do not cover the situation because this has been well-studied in the literature. On the other hand, we do not cover the situation because this situation, we believe, is of no practical relevance.
As far as we know, this paper is the first to discover that under some situations, the regression of two independent and nearly non-stationary series does not have any spurious problem at all. We follow
Granger and Newbold (
1974) and others to provide simulation results to show our discovery. Academics could follow
Phillips (
1986),
Johansen and Schaumburg (
1999), and others to provide a formal proof of the finding in our paper by replacing Brownian motions with OU processes with
to approximate
.
We leave it to further research to develop the theoretical results that explain the phenomena discovered in this paper. We also note that in this paper, we get very good results by using
. One may get good results by using the near-integrated approach. We will leave this to future studies.
Another limitation of our study is that the test under-rejects seriously. Further study could examine this problem and correct the test properly.