Article

Identifying the Most Relevant Lag with Runs

1 Departamento de Métodos Cuantitativos para la Economía y la Empresa, Universidad de Murcia, Espinardo 30100, Spain
2 Departamento de Economía A. Cuantitativa I, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain
3 Department of Quantitative Methods, Universidad Politécnica de Cartagena, Cartagena 30203, Spain
* Author to whom correspondence should be addressed.
Entropy 2015, 17(5), 2706-2722; https://doi.org/10.3390/e17052706
Submission received: 19 February 2015 / Revised: 19 April 2015 / Accepted: 23 April 2015 / Published: 28 April 2015
(This article belongs to the Special Issue Inductive Statistical Methods)

Abstract

In this paper, we propose a nonparametric statistical tool to identify the most relevant lag in the model description of a time series. It is also shown that it can be used for model identification. The statistic is based on the number of runs, when the time series is symbolized depending on the empirical quantiles of the time series. With a Monte Carlo simulation, we show the size and power performance of our new test statistic under linear and nonlinear data generating processes. From the theoretical point of view, it is the first time that symbolic analysis and runs are proposed to identify characteristic lags and also to help in the identification of univariate time series models. From a more applied point of view, the results show the power and competitiveness of the proposed tool with respect to other techniques without presuming or specifying a model.

1. Introduction

In this paper, we are particularly interested in providing new statistical tools that help in the process of modelling univariate time series processes. We focus on the selection of the appropriate time lags when the data faced by the modeller might potentially come from either a linear or a nonlinear dynamic time process. Needless to say, a correct estimate of the lag is essential in forecasting, basically because the introduction of delayed information into dynamic models significantly changes their asymptotic properties. Traditionally, autocorrelation and partial autocorrelation coefficients have been utilized by empirical modellers to specify the appropriate delays. However, it is well established [1] that processes with zero autocorrelation could still exhibit higher-order or nonlinear time dependence. This is the case for some bilinear processes and even for purely deterministic chaotic models, among others. In general, autocorrelation-based procedures may be misleading for nonlinear models and may fail to detect important nonlinear relationships present in the data; they are therefore of limited utility in detecting appropriate time delays (lags), especially in those scenarios where nonlinear phenomena are more the rule than the exception. The relevance of nonlinear time-dependent processes in science and social sciences, as well as in macroeconomics and finance, is well established. However, statistical tools that help to specify which lag(s) to use in a nonlinear description of an observed time series are scarce.
From a statistical point of view, this situation has motivated the development of tests for serial independence (see [2] for a review) with statistical power against alternative hypotheses that exhibit general types of dependence. The vast majority of these statistical tests are of a nonparametric nature, hence trying to avoid restrictive assumptions on the marginal distributions of the model. However, these tests are not designed for selecting relevant lags. This partly explains the relative scarcity of nonparametric techniques for investigating lag dependence regardless of the linear or nonlinear nature of the process, which is an aspect that is unknown in most practical cases. Some notable exceptions to this relative scarcity are [3–6]. A common characteristic of most of these techniques is the use of entropy-based measures to identify the correct lag. In particular, in [5,6] the use of permutation entropy, evaluated at several delay times, is theoretically motivated and then applied to identify from a time series the characteristic lag of the generating system. Interesting physical applications of this approach are [7,8]. To complete the paper, the newly proposed technique is compared with the widely applied autocorrelation function and with recent techniques based on permutation entropy.
In this paper, we construct a new nonparametric runs statistic, based on symbolic analysis, which estimates the lag that best describes a time series sample. The versatile nature of runs tests is well known in the statistical literature, as they have been used for analyzing independence, symmetries, randomness, etc. Furthermore, symbolic analysis is a field of increasing interest in several scientific disciplines (see [9]). It has foundations in information theory and in the theory of dynamical systems. For example, properties of symbolic encodings are central to the theory of communication [10]. Furthermore, there is a well-established mathematical discipline, namely symbolic dynamics, that studies the behavior of dynamical systems. This discipline started in 1898 with the pioneering work of Hadamard, who developed a symbolic description of sequences of geodesic flows, and was later extended by [11], who coined the name symbolic dynamics. [12] showed that a complete description of the behavior of a dynamical system can be captured in terms of symbols. This observation is crucial for understanding this paper, because important characteristics of a random variable can also be captured by analyzing the symbols derived from it.
The paper finally shows that the new approach can be useful for model identification, and it is applied to a real time series, namely the New York Stock Exchange index.

2. Definitions and Notation

Let {X_i}_{i∈I} be a time series. Assume first that {X_i}_{i∈I} is a sequence of categorical data with q categories. Denote by n_k the number of elements of the k-th category. Therefore,

$$\sum_{k=1}^{q} n_k = n,$$

where n = |I| is the cardinality of the set of time indexes I. Under this setting, we define a run as a maximal sequence of consecutive categories of the same type.

In the case of quantitative (continuous or discrete) data, we encode the sequence {X_i}_{i∈I} into q different categories, for q a positive integer, in the following manner. Denote by Q_k the k/q-quantile of {X_i}_{i∈I} for k = 1, 2,…, q − 1. The sequence {X_i}_{i∈I} is then encoded as the symbol sequence {Z_i}_{i∈I}, where

$$Z_i = \begin{cases} 1 & \text{if } \min\{X_i\} \le X_i \le Q_1 \\ k & \text{if } Q_{k-1} < X_i \le Q_k \\ q & \text{if } Q_{q-1} < X_i \le \max\{X_i\} \end{cases} \qquad (1)$$

In general, the time series {X_i}_{i∈I} is encoded with a finite set of symbols Γ, and a maximal sequence of symbols of the same type is called a run. For the symbolization given in Equation (1), the set of symbols is Γ = {1, 2,…, q}.
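The symbolization of Equation (1) and the associated run count are straightforward to compute. The following sketch (using NumPy; the function names are our own choices for illustration) shows both steps:

```python
import numpy as np

def symbolize(x, q=3):
    """Encode a series into symbols 1..q according to its empirical
    quantiles Q_1, ..., Q_{q-1}, as in Equation (1)."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, [k / q for k in range(1, q)])
    # searchsorted finds the first quantile >= X_i, so values with
    # X_i <= Q_1 get symbol 1, ..., values with X_i > Q_{q-1} get symbol q
    return np.searchsorted(cuts, x, side="left") + 1

def count_runs(symbols):
    """A run is a maximal block of identical consecutive symbols, so the
    number of runs is one plus the number of symbol changes."""
    s = np.asarray(symbols)
    return 1 + int(np.sum(s[1:] != s[:-1]))
```

For instance, the symbol sequence 1, 1, 2, 2, 2, 1 contains three runs.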

3. Constructing the Statistic

The classical runs test is defined for q = 2, where only two categories or symbols are used. This can be extended to a multinomial scenario with q > 2. For completeness, we give the construction of the runs test for i.i.d. (independently and identically distributed) sequences in the multinomial case, as developed in [13].
Let MR be the random variable counting the number of runs in {X_i}_{i∈I} (if this sequence is not categorical, we use the symbolization procedure given by Equation (1)). Define the following indicator function:

$$I_j = \begin{cases} 1 & \text{if } X_{j-1} \neq X_j \\ 0 & \text{otherwise.} \end{cases}$$
The variable I_j is a Bernoulli variable B(p) with probability of success

$$p = \frac{\sum_{i=1}^{q} n_i (n - n_i)}{n(n-1)}$$
for all j = 2, 3,…, n. Then the statistic MR_q is

$$MR_q = 1 + \sum_{j=2}^{n} I_j$$

and its expected value is

$$E(MR_q) = 1 + \frac{\sum_{i=1}^{q} n_i (n - n_i)}{n}.$$
The estimation of the variance of MR_q is more involved and can be computed as follows (see [13] for a detailed explanation of the computation):

$$\sigma^2_{MR_q} = (n-1)\,p\,\bigl(1-(n-1)p\bigr) + \frac{2\sum_{i\neq j} n_i n_j (2n - n_i - n_j - 2)}{n(n-1)} + \frac{\sum_{i\neq j} n_i n_j \Bigl[(n_i-1)(n-n_i-1) + (n_j-1)(n-n_j-1) + \sum_{k\neq i,j} n_k (n-n_k-2)\Bigr]}{n(n-1)}$$
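The first-moment formulas above can be evaluated directly from the category counts n_1, …, n_q. A small sketch (`runs_expectation` is our own illustrative name):

```python
import numpy as np

def runs_expectation(counts):
    """Success probability p and E(MR_q) from the category counts
    n_1, ..., n_q, following the formulas of Section 3."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    s = np.sum(counts * (n - counts))
    p = s / (n * (n - 1))
    mean = 1 + s / n          # equals 1 + (n - 1) p
    return p, mean
```

For a perfectly balanced symbolization with q = 3 (n_i = n/3), the expected number of runs is 1 + 2n/3, i.e., roughly two thirds of the sample length.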
When the lag order of the underlying process is known, say p, one can consider the following p time series:

$$\Gamma_1 = \{X_1, X_{1+p}, X_{1+2p}, \ldots, X_{1+tp}, \ldots\}$$
$$\Gamma_2 = \{X_2, X_{2+p}, X_{2+2p}, \ldots, X_{2+tp}, \ldots\}$$
$$\vdots$$
$$\Gamma_p = \{X_p, X_{p+p}, X_{p+2p}, \ldots, X_{p+tp}, \ldots\}$$
For each of the time series Γ_j one can compute the normalized statistic

$$\Upsilon_q^j = \frac{MR_q - E(MR_q)}{\sigma_{MR_q}}.$$

Under the null of i.i.d., the statistic Υ_q^j is asymptotically normally distributed (see [13]).
Notice that if p is the most relevant lag in the underlying data generating process, then the runs statistic MR measured on each Γ_j will differ from its expected value more than for any other lag. We therefore define

$$\Lambda_q^p = \left| \frac{\sum_{j=1}^{p} \Upsilon_q^j}{\sqrt{p}} \right|,$$

that is, the absolute value of the sum of the statistics Υ_q^j for j = 1, 2,…, p divided by the square root of p. If the time series {X_i}_{i∈I} is i.i.d., and therefore no relevant lags are present, the statistic Λ_q^p follows the folded standard normal distribution, and hence its expected value is E(Λ_q^p) = √(2/π) ≈ 0.7979.
Hence, if p_0 is the most relevant lag describing the dynamics of the underlying data generating process, then

$$\Lambda_q^{p_0} = \max\{\Lambda_q^p \mid p\}$$

and Λ_q^{p_0} has to be greater than √(2/π).
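Putting the pieces together, a minimal end-to-end sketch of Λ_q^p might look as follows. For brevity, the null moments E(MR_q) and σ_{MR_q} of each subseries are estimated here by random permutation rather than by the closed-form expressions above; the function name and defaults are our own choices:

```python
import numpy as np

def lambda_stat(x, p, q=3, n_perm=500, seed=0):
    """Lambda_q^p: symbolize the series, split it into the p strided
    subseries Gamma_1..Gamma_p, normalize the runs count of each under
    the i.i.d. null, and aggregate as |sum_j Upsilon_q^j| / sqrt(p)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, [k / q for k in range(1, q)])
    z = np.searchsorted(cuts, x, side="left") + 1      # Equation (1)

    def runs(s):
        return 1 + int(np.sum(s[1:] != s[:-1]))

    upsilon = []
    for j in range(p):
        sub = z[j::p]                                  # Gamma_{j+1}
        null = [runs(rng.permutation(sub)) for _ in range(n_perm)]
        upsilon.append((runs(sub) - np.mean(null)) / np.std(null))
    return abs(sum(upsilon)) / np.sqrt(p)
```

On an AR(1) sample, Λ_q^1 is then expected to stand out sharply against the folded-normal benchmark √(2/π) ≈ 0.80.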

4. Monte Carlo Simulation Experiments

In order to show the statistical power performance of the Λ_q^{p_0} statistic under different scenarios, we have considered (and therefore simulated) the following data generating processes (DGPs), chosen for their rich linear and nonlinear variety. The models are the following:
  • DGP 1: X_t = 0.3X_{t−1} + ε_t,
  • DGP 2: X_t = |0.5X_{t−1}|^{0.8} + ε_t,
  • DGP 3: X_t = 0.8ε²_{t−2} + ε_t,
  • DGP 4: X_t = 0.7ε_{t−1}X_{t−2} + ε_t,
  • DGP 5: X_t = ε_t√h_t, with h_t = 1 + 0.8X²_{t−1},
  • DGP 6: X_t = 4X_{t−1}(1 − X_{t−1}),
  • DGP 7: X_t = ε_t ∼ N(0, 1),
where ε_t denotes an i.i.d. standard normal innovation.
We have considered mainly nonlinear models, but also some linear ones, in order to study the procedure and compare it with other statistical tools commonly used for model specification. Model 1 is a linear process, while Models 2–6 are nonlinear. DGP 1 is an AR(1) autoregressive process with decaying memory at lag order 1, so the procedure should detect (select) p = 1. DGP 2 is a nonlinear autoregressive model of order 1. DGP 3 is a nonlinear moving average process of order 2, and the statistical procedure is therefore expected to select p = 2. DGP 4 is bilinear with a white noise characteristic of orders 2 and 1. Conditional heteroskedastic models (i.e., those with structure in the conditional variance) are commonly employed in financial applications (for example, for time series that show periods of high and low market uncertainty); accordingly, it is interesting to know the behavior of the procedure under this kind of nonlinearity in the conditional variance, so we have included DGP 5. Finally, a purely deterministic model (DGP 6) and an independent and identically distributed stochastic process (DGP 7) are incorporated, as they represent two models of opposite nature.
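For concreteness, a few of these DGPs can be simulated as follows (a sketch; the function name and the burn-in length are our own choices, and ε_t is drawn as standard normal noise):

```python
import numpy as np

def simulate_dgp(name, T, rng):
    """One realization of length T of selected DGPs from Section 4,
    after discarding a short burn-in."""
    burn = 100
    e = rng.standard_normal(T + burn)
    x = np.zeros(T + burn)
    if name == "DGP1":                # AR(1): X_t = 0.3 X_{t-1} + e_t
        for t in range(1, T + burn):
            x[t] = 0.3 * x[t - 1] + e[t]
    elif name == "DGP4":              # bilinear: X_t = 0.7 e_{t-1} X_{t-2} + e_t
        for t in range(2, T + burn):
            x[t] = 0.7 * e[t - 1] * x[t - 2] + e[t]
    elif name == "DGP6":              # deterministic logistic map
        x[0] = 0.2
        for t in range(1, T + burn):
            x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])
    elif name == "DGP7":              # i.i.d. N(0, 1)
        x = e
    else:
        raise ValueError(name)
    return x[burn:]
```

Note that the logistic map (DGP 6) stays inside the unit interval, while the stochastic DGPs are unbounded.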
To evaluate the performance of the nonparametric method in finite samples, we compute 1000 Monte Carlo replications of each model, and we consider 6 lags (p). In general, experiments using large data sets are desirable; however, situations do occur in which the number of available observations is small. Statistical techniques, especially model-free ones such as this, are very sensitive to the number of observations. For this reason, in the Monte Carlo experiment we study the effect of small sample sizes on the outcome of the statistical procedure. We present the results for several sample sizes, namely T = 120, 360, 500, 1000, 5000 and 10,000. In order to estimate Λ_q^p it is necessary to fix the number of quantiles used to symbolize the time series under consideration; in this case we select q = 3, so that only 3 symbols are used to obtain a conclusion about the dynamic structure of the time series under study.
As mentioned in the introduction, we also compare the proposed method with the widely applied autocorrelation functions and with the permutation entropy based technique, as used in [5], which is related to [6] in the sense that both papers look for the lag that minimizes the permutation entropy. In order to apply the procedure we also fix the embedding reconstruction dimension at m = 3, and we will refer to it as h3.
Tables 1–7 show the percentage of times that the Λ_3^p statistic, the autocorrelation function (ACF) and the partial autocorrelation function (PAF) estimate the lag parameter p in 1000 Monte Carlo replications. For sample sizes T = 1000 (or larger), Λ_3^p always selects the correct lag for the linear autoregressive process (DGP 1), and the same can be said for ACF and PAF. As the sample size is reduced (to T = 360), the power of the Λ_q^p statistic drops to 93.2%, while that of the autocorrelation functions only falls to 99.7%. This is to be expected because autocorrelation functions are ideal for linear processes: even for the smallest sample size their empirical performance is high (around 90%). According to these results, if the underlying process is linear, the researcher may use either the autocorrelation functions or Λ_q^p for sample sizes above 500; it is not advisable to use Λ_q^p if the sample is below 360 observations. Similarly, for DGP 2, which is a nonlinear variation of the autoregressive DGP 1, both approaches perform extremely well for sample sizes larger than 1000 observations. For these two processes (DGP 1 and DGP 2), it can be observed that Λ_3^p outperforms h3.
On the other hand, if the lag dependence comes from a nonlinear moving average process, we observe clear differences in favour of the symbolic-runs proposal: the results for DGP 3 show, for large data sets (5000 and 10,000 observations), that Λ_3^p is ideal, as it detects the correct lag 100% of the time, while the autocorrelation functions correctly estimate the lag only about 30% of the time regardless of the sample size, although a better performance is obtained by h3. With DGP 4 we study a combination of lag dependence in an autoregressive component of (dominant) order 2 and moving average lag dependence of order 1. Clearly, the results for this bilinear process show that Λ_3^p captures the correct lag, while ACF and PAF do not. The empirical evidence is again in favor of Λ_3^p when compared with h3.
As commented earlier, if the delay structure is introduced via the second conditional moment of the stochastic process (the variance), a practitioner would like a statistical procedure that can also detect the correct lag. This is what we study in DGP 5. The proposed statistic is very effective in detecting the correct lag for T = 1000 or higher. It is remarkable, however, that the autocorrelation functions correctly estimate the lag less than 50% of the time for all sample sizes. In comparison with the permutation-entropy-based technique h3, Λ_3^p again outperforms it.
Finally, the last two models (DGP 6 and DGP 7) are also illuminating. The first is a purely deterministic logistic model, so no stochastic terms are added to it; the second is a pure normal distribution. Autocorrelation-based approaches perform poorly in detecting the correct lag (i.e., lag 1) for the logistic model. Further, the results of ACF and PAF for the normal samples are statistically indistinguishable from those obtained for the logistic equation. In contrast, the Λ_3^p procedure detects, even for small sample sizes, that there is a dependence structure and that it comes from lag 1. Interestingly, for this purely deterministic process, the entropy-based procedure is superior in these terms, hinting that permutation entropy is very effective when there are no noise terms.
The results provided for DGP 7 allow us to see that failing to detect the most relevant lag parameter(s) is equivalent to finding that all considered lags are equally important, that is to say, δ = 1/τ, where τ is the number of lags that the user has considered in the study. In our Monte Carlo experiment τ = 6 and thus δ = 16.66667%. Therefore, for a lag parameter to be considered detected, the percentage of times that Λ_3^p identifies that lag should be above γ = δ + z_α √(δ(1 − δ)/n) for a nominal level α, where n is the number of Monte Carlo replications and z_α is the 1 − α quantile of the standard normal distribution N(0, 1). In our experiment γ = 18.599% for a significance level of α = 0.05.
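The detection threshold γ is easy to reproduce with the normal approximation to a binomial proportion. A sketch (the function name is our own; we obtain ≈ 18.6%, in line with the value reported above):

```python
from math import sqrt
from statistics import NormalDist

def detection_threshold(tau=6, n_rep=1000, alpha=0.05):
    """gamma = delta + z_alpha * sqrt(delta (1 - delta) / n_rep), where
    delta = 1/tau is the detection frequency expected by pure chance."""
    delta = 1.0 / tau
    z = NormalDist().inv_cdf(1 - alpha)    # one-sided critical value
    return delta + z * sqrt(delta * (1 - delta) / n_rep)
```

With the experiment's settings (τ = 6, n = 1000, α = 0.05) this gives roughly 0.186, i.e., a lag must be selected in more than about 18.6% of the replications to count as detected.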

5. Model Identification

In the previous sections we have shown, first theoretically and then empirically, that the proposed method correctly estimates the lag that a data analyst should use when modelling time series data. We are now concerned with identifying the appropriate generating model with the help of Λ_q^p evaluated at several lags. For instance, we are particularly interested in the behavior of Λ_q^p when the data generating process is a linear autoregressive model (AR(p)) and in how it behaves for a moving average model (MA(p)). If it is able to discriminate between models, the statistic would be useful not only for selecting lags, but also for distinguishing between models of very different nature. To evaluate the performance of Λ_q^p for identifying models, the following stochastic models have been studied:
  • AR(1): X_t = 0.5X_{t−1} + ε_t,
  • MA(1): X_t = 0.5ε_{t−1} + ε_t,
  • AR(2): X_t = 0.5X_{t−2} + ε_t,
  • MA(2): X_t = 0.5ε_{t−2} + ε_t,
  • ARMA(1,1): X_t = 0.4X_{t−1} + 0.4ε_{t−1} + ε_t,
  • ARMA(1,2): X_t = 0.4X_{t−1} + 0.4ε_{t−2} + ε_t,
  • MA(2;4): X_t = 0.6ε_{t−2} + 0.3ε_{t−4} + ε_t.
The shared characteristic of these models is that they all have a linear conditional mean. However, some are autoregressive, some are moving averages of external shocks, and some are mixtures of linear processes. These models are well known in univariate time series analysis, so for the statistic to be of some utility a clear detection of both the lag and the model is expected. Autoregressive models have memory, while moving average models do not; accordingly, this essential difference should be detected. We compute the average value of the Λ_3^p statistic over 1000 Monte Carlo simulations for sample sizes n = 120, 360, 500, 1000, 5000 and 10,000 for the seven models. We also do the same for a benchmark normal N(0, 1) model, which is an i.i.d. process and allows us to check whether the expected value E(Λ_q^p) = √(2/π) ≈ 0.7979 is achieved empirically. Averages are given in Figure 1, which reports the results for the largest sample size; the behavior for the remaining sample sizes is given in Figures 4–8, which can be found in the Appendix.
For models AR(1) and AR(2) the statistic clearly shows an exponential decrease in Λ_3^p, indicating (a) that the correct lag is 1 and 2, respectively; (b) that the data generating process has memory beyond the true lag, as each lag is less relevant than the preceding one; and (c) that this occurs for all sample sizes. Interestingly, observation (b) sharply contrasts with classical techniques, which do not capture this salient empirical fact. For models MA(1) and MA(2), our statistic also performs as expected for identifying the model: the true lag is detected, and the basic memoryless property is clearly observed for all lags, so these models cannot be confused with an AR(p) model. The results for mixed models are also of interest. Regarding the ARMA(1,1), the statistic reaches its maximum at the correct lag, namely 1, and then decays exponentially to zero; given the MA structure at lag 1, the statistic's decay from lag 1 to lag 2 is more pronounced than in the case of an AR(1). This can also be observed for the ARMA(1,2) model: the two relevant lags, namely 1 and 2, are clearly detected; the statistic does not decay very fast at lag 2, given its MA(2) structure, but the decay reappears for p > 2. Finally, in the MA(2;4) model we have considered a moving average linear process with different weights at the two relevant lags, so we expect, and observe, that the proposed statistic first estimates the correct lag and then determines its relevance. To complete the analysis, the results are compared with two nonlinear models studied previously (DGP 2 and DGP 3), which are nonlinear counterparts of AR and MA models. It can be observed that, for the autoregressive models, the exponentially decaying behavior is much faster in the nonlinear case than in the linear case. For the moving average models, the values of the statistic, while statistically significant, are lower than in the linear MA counterpart.
We now illustrate how our tools can help in modelling real data. To this end we have considered a well-known empirical time series, namely the returns obtained from closing values of the daily New York Stock Exchange (NYSE) index from 2000 to 2008 (Figure 2).
Given the series of returns, our proposal consists of using the tools previously presented. To this end we compute Λ_3^p at several lags for the selected time series. The results are given in Figure 3. According to these results, our statistical procedure identifies lags 2 and 4, so the modeller is recommended to use these two lags. Regarding the identification of the underlying model, in view of Figure 3 when compared with Figure 2, the model does not seem to be of an autoregressive linear nature; it seems closer to a moving average process in which lags 2 and 4 play a relevant role.

6. Conclusions

In this paper, we have presented a statistical procedure, based on the distribution of symbols (the number of runs), to estimate the relevant lag of a dynamic generating process from which the researcher has only one observed sample. This is the first time that runs have been used for detecting lag structure, and also the first time they have been used for model identification. The technique shows several appealing advantages: (1) it is model independent, so that the end user can easily apply it without assuming or estimating a model; (2) it can be used for stochastic processes of linear or nonlinear nature, because in both scenarios studied the procedure is very competitive and robust; (3) in the models studied, it detects the correct lag even with a relatively small number of observations, which facilitates its use in sciences where only small samples are available; (4) when compared with the standard autocorrelation and/or partial autocorrelation functions, the empirical results are undoubtedly in favor of the new statistical tool, and when compared with permutation-entropy-based procedures, it generally has more statistical power; (5) particularly interesting for financial data analysis, it shows an extraordinary empirical behavior when the lag structure is in the second conditional moment of the data generating process; and (6) it can be used for identifying not only lags, but also linear models. Points (1) to (6) make the new statistical tool general and widely applicable for data analysts.

Acknowledgments

The authors are grateful to the following research grants: EC02012-36032-C03-03, MTM2012-35240, from the Ministerio de Economía y Competitividad de España and the COST Action IS1104: The EU in the new economic complex geography: models, tools and policy evaluation.

Author Contributions

Manuel Ruiz, Mariano Matilla-García, Úrsula Faura and Matilde Lafuente conceived and designed the novel statistical test. Manuel Ruiz and Mariano Matilla-García developed the analysis tool. Manuel Ruiz implemented the software. Manuel Ruiz, Mariano Matilla-García, Matilde Lafuente and Úrsula Faura acquired and generated the datasets, analyzed the data and interpreted the results. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

A. Appendix

Figure 4. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 120. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark standard normal process.
Figure 5. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 360. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark standard normal process.
Figure 6. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 500. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark standard normal process.
Figure 7. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 1000. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark standard normal process.
Figure 8. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 5000. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark standard normal process.

References

  1. Granger, C.; Weiss, A. Time Series Analysis of Error-Correcting Models. In Studies in Econometrics, Time Series, and Multivariate Statistics; Academic Press: New York, NY, USA, 1983. [Google Scholar]
  2. Tjostheim, D. Measures and tests of independence: A survey. Statistics 1996, 28, 249–284. [Google Scholar]
  3. Granger, C.; Lin, J. Using the mutual information coefficient to identify lags in nonlinear models. J. Time Series Anal. 1994, 15, 371–384. [Google Scholar]
  4. Tjostheim, D.; Auestad, B. Nonparametric identification of nonlinear time series: Selecting significant lags. J. Am. Statist. Assoc. 1994, 89, 1410–1419. [Google Scholar]
  5. Matilla-Garcia, M.; Ruiz Marin, M. Detection of non-linear structure in time series. Econ. Lett. 2009, 105, 1–6. [Google Scholar]
  6. Zunino, L.; Soriano, M.C.; Fischer, I.; Rosso, O.A.; Mirasso, C.R. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Phys. Rev. E 2010, 82, 046212. [Google Scholar]
  7. Soriano, M.C.; Zunino, L.; Rosso, O.A.; Fischer, I.; Mirasso, C.R. Time Scales of a Chaotic Semiconductor Laser with Optical Feedback Under the Lens of a Permutation Information Analysis. IEEE J. Quant. Electr. 2011, 42, 242–261. [Google Scholar]
  8. Toomey, J.P.; Kane, D.M. Mapping the dynamic complexity of a semiconductor laser with optical feedback using permutation entropy. Opt. Express 2014, 22, 1713–1725. [Google Scholar]
  9. Amigo, J.M. Permutation Complexity in Dynamical Systems; Springer: Berlin, Germany, 2010. [Google Scholar]
  10. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949. [Google Scholar]
  11. Morse, M. Recurrent geodesics on a surface of negative curvature. Trans. Am. Math. Soc. 1921, 22, 84–100. [Google Scholar]
  12. Collet, P.; Eckmann, J.P. Iterated Maps on the Interval as Dynamical Systems; Birkhäuser: Basel, Switzerland, 1980. [Google Scholar]
  13. Ruiz Marin, M.; Faura, U.; Lafuente, M.; Dore, M. H. I. Nonparametric Tests for Serial Dependence Based on Runs. Dyn. Psychol. Life Sci. 2014, 18, 123–136. [Google Scholar]
Figure 1. Mean value of the Λ_3^p statistic as a function of the lag (p = 1, 2,…, 6). Sample size for each realization is fixed at n = 10,000. The number of Monte Carlo simulations is 1000 for each model. Blue bars show the average Λ_3^p for each model; red bars refer to the benchmark i.i.d. process.
Figure 2. NYSE Daily Returns (2000–2008).
Figure 3. Λ_3^p as a function of the lag for the NYSE daily returns (blue), together with the expected value of Λ_3^p when no relevant lag is present (red).
Table 1. Comparison of Λ_3^p against ACF, PAF and h3 for DGP 1: X_t = 0.3X_{t−1} + ε_t.

T        Statistic   p=1     p=2     p=3     p=4     p=5     p=6
120      Λ_3^p       57.2    11.0    7.6     8.2     8.6     7.4
         ACF         90.7    1.9     1.9     1.8     1.9     1.8
         PAF         91.4    2.0     1.2     1.9     1.7     1.8
         h3          26.4    14.1    12.8    15.2    15.3    16.2
360      Λ_3^p       93.2    1.7     1.2     1.5     1.3     1.1
         ACF         99.7    0.1     0.1     0.1     0       0
         PAF         99.8    0.1     0.1     0       0       0
         h3          50.5    12.0    8.9     8.9     9.2     10.5
500      Λ_3^p       97.8    0.9     0.5     0.3     0.4     0.1
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          63.0    9.7     6.6     6.1     5.8     8.8
1000     Λ_3^p       100     0       0       0       0       0
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          85.7    4.6     2.2     2.7     2.7     2.1
5000     Λ_3^p       100     0       0       0       0       0
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          100     0       0       0       0       0
10,000   Λ_3^p       100     0       0       0       0       0
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          100     0       0       0       0       0

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
Table 2. Comparison of Λ_3^p against ACF, PAF and h3 for DGP 2: X_t = |0.5X_{t−1}|^{0.8} + ε_t.

T        Statistic   p=1     p=2     p=3     p=4     p=5     p=6
120      Λ_3^p       40.9    13.7    11.6    11.6    12.9    9.3
         ACF         57.9    9.9     7.9     8.2     8.2     7.9
         PAF         57.5    9.1     7.3     9.0     9.1     8.0
         h3          26.0    13.5    12.6    18.0    13.8    16.1
360      Λ_3^p       73.9    5.9     4.2     5.9     5.1     5.0
         ACF         92.1    1.6     1.8     2.0     1.3     1.2
         PAF         92.7    1.6     1.3     1.8     1.3     1.3
         h3          47.1    9.4     12.0    8.9     11.1    11.5
500      Λ_3^p       83.1    3.8     3.1     3.1     1.9     3.0
         ACF         96.9    0.2     0.5     1.2     0.9     0.3
         PAF         97.3    0.2     0.6     0.9     0.8     0.2
         h3          57.8    9.6     7.4     8.5     8.5     8.2
1000     Λ_3^p       97.3    1.0     1.0     0.4     0.3     0
         ACF         99.8    0.1     0.1     0       0       0
         PAF         99.9    0.1     0       0       0       0
         h3          82.2    4.0     2.2     3.5     3.9     4.2
5000     Λ_3^p       100     0       0       0       0       0
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          100     0       0       0       0       0
10,000   Λ_3^p       100     0       0       0       0       0
         ACF         100     0       0       0       0       0
         PAF         100     0       0       0       0       0
         h3          100     0       0       0       0       0

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
Table 3. Comparison of Λ₃ᵖ against ACF, PAF and h3 for DGP 3: Xt = 0.8(εt−2)² + εt.

| T | Statistic | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| 120 | Λ₃ᵖ | 15.1 | 21.5 | 17.2 | 13.2 | 17.5 | 15.5 |
| | ACF | 14.9 | 30 | 12.9 | 15.2 | 14.3 | 12.7 |
| | PAF | 14.2 | 27.9 | 12.3 | 16.3 | 14.7 | 14.6 |
| | h3 | 11.4 | 30.6 | 11.8 | 15.0 | 15.7 | 15.5 |
| 360 | Λ₃ᵖ | 12.1 | 33.8 | 12.5 | 13.4 | 13.5 | 14.7 |
| | ACF | 14.4 | 30.3 | 12.7 | 12 | 14.6 | 16 |
| | PAF | 14.5 | 29.8 | 12.5 | 12.4 | 14.9 | 15.9 |
| | h3 | 12.3 | 55.4 | 8.4 | 8.0 | 8.2 | 7.7 |
| 500 | Λ₃ᵖ | 11.8 | 44.3 | 10.5 | 11.2 | 11.4 | 10.8 |
| | ACF | 14.7 | 26.2 | 13 | 14.7 | 15.5 | 15.9 |
| | PAF | 14.1 | 26.6 | 12.8 | 15 | 15.1 | 16.4 |
| | h3 | 12.2 | 68.2 | 4.9 | 5.9 | 4.8 | 4.0 |
| 1000 | Λ₃ᵖ | 8.4 | 62.7 | 7 | 8.1 | 6.5 | 7.3 |
| | ACF | 13.6 | 28.6 | 14.8 | 14.1 | 16.1 | 12.8 |
| | PAF | 14 | 29 | 14.5 | 13.9 | 15.9 | 12.7 |
| | h3 | 8.5 | 85.7 | 1.4 | 1.2 | 1.7 | 1.5 |
| 5000 | Λ₃ᵖ | 0.2 | 99.1 | 0.5 | 0 | 0.1 | 0.1 |
| | ACF | 13.4 | 28.9 | 15.1 | 13.7 | 14.4 | 14.5 |
| | PAF | 13.3 | 29.1 | 14.9 | 13.6 | 14.4 | 14.7 |
| | h3 | 0 | 100 | 0 | 0 | 0 | 0 |
| 10,000 | Λ₃ᵖ | 0 | 100 | 0 | 0 | 0 | 0 |
| | ACF | 13.6 | 30.9 | 12.2 | 15.4 | 14.1 | 13.8 |
| | PAF | 13.4 | 30.7 | 12.3 | 15.7 | 13.9 | 14.0 |
| | h3 | 0 | 100 | 0 | 0 | 0 | 0 |

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
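Table 3 illustrates why a nonlinear detector is needed: in DGP 3 the lag-2 dependence enters through a square, and since E[ε³] = 0 for Gaussian noise the population autocorrelation at lag 2 is zero, so the ACF picks p = 2 only about 30% of the time even at T = 10,000, while Λ₃ᵖ and h3 lock onto it. The dependence is plainly visible in the squared series, though. A quick check (illustrative code, our variable names):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10_000
eps = rng.standard_normal(n + 2)
# DGP 3: X_t = 0.8 * eps_{t-2}**2 + eps_t (nonlinear dependence at lag 2).
x = 0.8 * eps[:-2] ** 2 + eps[2:]

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Linear autocorrelation at lag 2 is near zero because E[eps^3] = 0 ...
acf2 = corr(x[:-2], x[2:])
# ... but the dependence shows up clearly in the squared series.
acf2_sq = corr(x[:-2] ** 2, x[2:] ** 2)
```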
Table 4. Comparison of Λ₃ᵖ against ACF, PAF and h3 for DGP 4: Xt = 0.7εt−1Xt−2 + εt.

| T | Statistic | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| 120 | Λ₃ᵖ | 19.2 | 21.5 | 14.3 | 15.7 | 15.8 | 13.5 |
| | ACF | 24.5 | 23.1 | 17.7 | 14 | 9.8 | 10.9 |
| | PAF | 23.2 | 25.3 | 17 | 13.3 | 9.3 | 11.4 |
| | h3 | 15.1 | 12.1 | 18.8 | 17.1 | 18.2 | 18.7 |
| 360 | Λ₃ᵖ | 17.3 | 36.4 | 12.4 | 11.9 | 11 | 11 |
| | ACF | 22.7 | 24.6 | 16.6 | 17 | 8.6 | 10.5 |
| | PAF | 22.5 | 25.1 | 15.6 | 17.2 | 9.4 | 10.2 |
| | h3 | 15.4 | 17.0 | 16.3 | 17.8 | 15.3 | 18.2 |
| 500 | Λ₃ᵖ | 17.6 | 43.6 | 10.3 | 9.8 | 10.1 | 8.6 |
| | ACF | 19.9 | 27.1 | 17.6 | 14.4 | 9.8 | 11.2 |
| | PAF | 20.7 | 26.5 | 17.4 | 14.2 | 9.8 | 11.4 |
| | h3 | 16.2 | 15.6 | 15.7 | 17.3 | 19.4 | 15.8 |
| 1000 | Λ₃ᵖ | 20.5 | 58.3 | 6.9 | 6.4 | 4.3 | 3.6 |
| | ACF | 22.3 | 25.2 | 18.3 | 15.4 | 7.8 | 11 |
| | PAF | 21.9 | 25.5 | 17.9 | 15.6 | 8.3 | 10.8 |
| | h3 | 16.6 | 18.5 | 14.8 | 16.7 | 18.1 | 15.3 |
| 5000 | Λ₃ᵖ | 8.6 | 90.7 | 0.3 | 0.4 | 0 | 0 |
| | ACF | 22.1 | 29.2 | 13.7 | 15.8 | 7.9 | 11.3 |
| | PAF | 21.9 | 29.4 | 13.8 | 15.8 | 7.7 | 9.1 |
| | h3 | 16.0 | 44.9 | 9.1 | 10.9 | 9.4 | 9.7 |
| 10,000 | Λ₃ᵖ | 2.2 | 97.8 | 0 | 0 | 0 | 0 |
| | ACF | 20.7 | 30.6 | 14.9 | 17.9 | 6.9 | 9 |
| | PAF | 20.2 | 30.6 | 15.1 | 17.9 | 7.1 | 9.1 |
| | h3 | 13.4 | 64.5 | 4.0 | 6.4 | 6.1 | 5.6 |

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
Table 5. Comparison of Λ₃ᵖ against ACF, PAF and h3 for DGP 5: Xt = √ht εt, with ht = 1 + 0.8(Xt−1)².

| T | Statistic | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| 120 | Λ₃ᵖ | 36.3 | 14.2 | 12.5 | 11.2 | 14 | 11.8 |
| | ACF | 36.5 | 21.1 | 14.6 | 9.9 | 9.5 | 8.4 |
| | PAF | 36.1 | 21.3 | 13.5 | 9.7 | 9.9 | 9.5 |
| | h3 | 16.8 | 16.9 | 15.1 | 16.6 | 17.4 | 17.2 |
| 360 | Λ₃ᵖ | 61.6 | 11.6 | 6.9 | 6.6 | 5.9 | 7.4 |
| | ACF | 39.7 | 22.2 | 14.7 | 8.4 | 9.7 | 5.3 |
| | PAF | 38.9 | 23.1 | 14 | 9.2 | 9.1 | 5.7 |
| | h3 | 20.2 | 16.3 | 14.6 | 16.1 | 15.2 | 17.6 |
| 500 | Λ₃ᵖ | 73.2 | 10.2 | 4.9 | 4 | 4.4 | 3.3 |
| | ACF | 39.7 | 21.7 | 15.3 | 11.2 | 6.8 | 5.3 |
| | PAF | 40 | 22.2 | 15.7 | 10 | 6.5 | 5.6 |
| | h3 | 24.7 | 18.0 | 12.9 | 14.4 | 15.4 | 14.6 |
| 1000 | Λ₃ᵖ | 90.5 | 5.6 | 0.9 | 0.8 | 1.5 | 0.7 |
| | ACF | 43.4 | 23.3 | 13.4 | 8.5 | 6.7 | 4.7 |
| | PAF | 43.1 | 24.5 | 12.4 | 8.4 | 6.7 | 4.9 |
| | h3 | 30.4 | 14.8 | 14.6 | 12.6 | 15.2 | 12.4 |
| 5000 | Λ₃ᵖ | 100 | 0 | 0 | 0 | 0 | 0 |
| | ACF | 43.6 | 24.8 | 14.4 | 8.6 | 5.4 | 3.2 |
| | PAF | 43.7 | 25.3 | 14.4 | 7.9 | 5.3 | 3.4 |
| | h3 | 76.5 | 9.4 | 4.3 | 3.6 | 3.2 | 3.0 |
| 10,000 | Λ₃ᵖ | 100 | 0 | 0 | 0 | 0 | 0 |
| | ACF | 45.7 | 25.2 | 11.9 | 8.1 | 6.3 | 2.8 |
| | PAF | 45 | 25.7 | 12.3 | 7.8 | 6.2 | 3 |
| | h3 | 93.5 | 4.2 | 0.6 | 0.4 | 0.7 | 0.6 |

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
Table 6. Comparison of Λ₃ᵖ against ACF, PAF and h3 for DGP 6: Xt = 4Xt−1(1 − Xt−1).

| T | Statistic | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| 120 | Λ₃ᵖ | 73.5 | 8.9 | 4.7 | 3.4 | 5.5 | 4 |
| | ACF | 18.7 | 15 | 17 | 17 | 17.2 | 15.1 |
| | PAF | 16.7 | 15 | 16.6 | 17.4 | 18.9 | 15.4 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |
| 360 | Λ₃ᵖ | 94.6 | 4.3 | 0.3 | 0.2 | 0.3 | 0.3 |
| | ACF | 15.4 | 14.7 | 17.9 | 16.5 | 16.9 | 18.6 |
| | PAF | 15.8 | 14.5 | 17.6 | 16.4 | 17 | 18.7 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |
| 500 | Λ₃ᵖ | 97.5 | 2.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| | ACF | 16.1 | 13.3 | 17.7 | 17.7 | 17.6 | 17.6 |
| | PAF | 15.2 | 13.8 | 18.3 | 17.8 | 17.6 | 17.3 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |
| 1000 | Λ₃ᵖ | 100 | 0 | 0 | 0 | 0 | 0 |
| | ACF | 17.2 | 15.1 | 15.2 | 18.5 | 15 | 17 |
| | PAF | 16.3 | 15.5 | 15.4 | 18.4 | 17.7 | 16.7 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |
| 5000 | Λ₃ᵖ | 100 | 0 | 0 | 0 | 0 | 0 |
| | ACF | 15.6 | 17.1 | 17.6 | 17.4 | 16.8 | 15.5 |
| | PAF | 15.8 | 16.9 | 17.6 | 17.4 | 16.7 | 15.6 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |
| 10,000 | Λ₃ᵖ | 100 | 0 | 0 | 0 | 0 | 0 |
| | ACF | 15.2 | 16.2 | 17.2 | 17.7 | 16 | 17.7 |
| | PAF | 15.5 | 16.1 | 17.3 | 17.6 | 16.1 | 17.4 |
| | h3 | 100 | 0 | 0 | 0 | 0 | 0 |

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.
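The logistic map of Table 6 makes the contrast starkest: each Xt is an exact deterministic function of Xt−1, yet the fully chaotic map at r = 4 has population autocorrelation zero at every positive lag, so ACF and PAF never rise above chance while Λ₃ᵖ and h3 find p = 1 almost immediately. A minimal check (our variable names):

```python
import numpy as np

# Logistic map at r = 4 (DGP 6): deterministic lag-1 dependence,
# yet the autocorrelation vanishes at every positive lag.
x = np.empty(10_000)
x[0] = 0.3
for t in range(1, len(x)):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])

xc = x - x.mean()
# Sample lag-1 autocorrelation: close to zero despite the exact dependence.
acf1 = float(np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc))
# Determinism check: every value is an exact function of its predecessor.
max_err = float(np.max(np.abs(x[1:] - 4.0 * x[:-1] * (1.0 - x[:-1]))))
```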
Table 7. Comparison of Λ₃ᵖ against ACF, PAF and h3 for DGP 7: Xt ~ N(0,1).

| T | Statistic | p = 1 | p = 2 | p = 3 | p = 4 | p = 5 | p = 6 |
|---|---|---|---|---|---|---|---|
| 120 | Λ₃ᵖ | 17.5 | 15.3 | 14.9 | 17.9 | 18.1 | 16.3 |
| | ACF | 18.8 | 16.2 | 16.4 | 15.5 | 15 | 18.1 |
| | PAF | 17.8 | 16.7 | 15 | 16.4 | 15.5 | 18.6 |
| | h3 | 11.3 | 14.8 | 19.1 | 16.1 | 21.2 | 17.5 |
| 360 | Λ₃ᵖ | 14.6 | 17.3 | 16.1 | 18 | 16.0 | 18 |
| | ACF | 17.7 | 17.4 | 17.3 | 17.3 | 15.2 | 15.1 |
| | PAF | 16.7 | 16.6 | 17.8 | 16.9 | 15.8 | 16.2 |
| | h3 | 16.0 | 17.0 | 16.9 | 16.7 | 18.0 | 15.4 |
| 500 | Λ₃ᵖ | 15.1 | 17.5 | 16.3 | 18.3 | 15.5 | 17.3 |
| | ACF | 16.4 | 16.3 | 16.3 | 16.6 | 17.1 | 17.3 |
| | PAF | 16.1 | 16.2 | 16.5 | 17.3 | 17 | 16.9 |
| | h3 | 15.8 | 14.7 | 16.7 | 17.0 | 17.8 | 18.0 |
| 1000 | Λ₃ᵖ | 15.5 | 16.6 | 18.2 | 16.9 | 17.7 | 15.1 |
| | ACF | 17.1 | 16.5 | 17 | 17.7 | 15.3 | 16.4 |
| | PAF | 17.5 | 16.3 | 17 | 17 | 15.3 | 16.9 |
| | h3 | 14.7 | 15.9 | 16.2 | 18.8 | 18.1 | 16.3 |
| 5000 | Λ₃ᵖ | 16.9 | 18.9 | 16.4 | 16 | 15.7 | 16.1 |
| | ACF | 16.5 | 15.5 | 16.8 | 16.8 | 15.3 | 19.1 |
| | PAF | 16.6 | 15.8 | 16.5 | 16.7 | 15.5 | 18.9 |
| | h3 | 16.6 | 16.1 | 16.1 | 15.6 | 17.7 | 17.9 |
| 10,000 | Λ₃ᵖ | 16.9 | 16 | 18.9 | 15.5 | 16.6 | 16.1 |
| | ACF | 14.3 | 16.8 | 18.5 | 16.6 | 15.6 | 18.2 |
| | PAF | 14.3 | 16.9 | 18.7 | 16.7 | 15.6 | 17.8 |
| | h3 | 16.8 | 16.5 | 15.5 | 15.8 | 18.3 | 17.1 |

Percentage of times that each lag parameter has been detected in 1000 Monte Carlo simulations. T stands for the sample size and p for the considered lags.

Share and Cite

Faura, Ú.; Lafuente, M.; Matilla-García, M.; Ruiz, M. Identifying the Most Relevant Lag with Runs. Entropy 2015, 17, 2706-2722. https://doi.org/10.3390/e17052706