Next Article in Journal
Celebrated Econometricians: Katarina Juselius and Søren Johansen
Previous Article in Journal
Algorithmic Modelling of Financial Conditions for Macro Predictive Purposes: Pilot Application to USA Data
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

An Alternative Estimation Method for Time-Varying Parameter Models

1
Faculty of Economics, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
2
School of Commerce, Meiji University, 1-1 Kanda-Surugadai, Chiyoda-ku, Tokyo 101-8301, Japan
3
Keio Economic Observatory, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
4
Faculty of Policy Management, Keio University, 5322 Endo, Fujisawa 252-0882, Kanagawa, Japan
*
Author to whom correspondence should be addressed.
Econometrics 2022, 10(2), 23; https://doi.org/10.3390/econometrics10020023
Submission received: 31 October 2021 / Revised: 14 March 2022 / Accepted: 20 April 2022 / Published: 27 April 2022

Abstract

:
A multivariate, non-Bayesian, regression-based, or feasible generalized least squares (GLS)-based approach is proposed to estimate time-varying VAR parameter models. Although it has been known that the Kalman-smoothed estimate can be alternatively estimated using GLS for univariate models, we assess the accuracy of the feasible GLS estimator compared with commonly used Bayesian estimators. Unlike the maximum likelihood estimator often used together with the Kalman filter, it is shown that the possibility of the pile-up problem occurring is negligible. In addition, this approach enables us to deal with stochastic volatility models, models with a time-dependent variance–covariance matrix, and models with non-Gaussian errors that allow us to deal with abrupt changes or structural breaks in time-varying parameters.

1. Introduction

Most macroeconomists recognize that time-varying parameter models are sufficiently flexible to capture the complex nature of a macroeconomic system, thereby fitting the data better than models with constant parameters. The instability of the parameters in econometric models has often been incorporated in Markov-switching models (e.g., Hamilton 1989) and structural change models (e.g., Perron 1989). However, time-varying models allow the parameters to change gradually over time, which is the main difference between time-varying models and Markov-switching or structural break models.
In the literature on the application of time-varying vector autoregressive (TV-VAR) models in macroeconomics, Bernanke and Mihov (1998) consider that the autoregressive parameters may be time-varying. However, after confirming the stability of the parameters using the parameter consistency test of Hansen (1992), they employ the time-invariant (i.e., usual) VAR model. Indeed, Cogley and Sargent (2005) find that Hansen’s (1992) test has low power and is unreliable and instead propose a TV-VAR model with stochastic volatility in the error term. Primiceri (2005) sheds light on a technical aspect of the time-varying model, namely, the Bayesian estimation technique used for the time-varying parameters. In general, difficulties in dealing with time-varying parameter models arise when free parameters and unobserved variables need to be estimated. Primiceri (2005) thus presents a clear estimation procedure based on the Bayesian Markov Chain Monte Carlo (MCMC) method.
Several studies, including Primiceri (2005), claim that the Bayesian method is preferred to the maximum likelihood (ML) method because the former (i) is less likely to suffer from the so-called pile-up problem (Sargan and Bhargava 1983); (ii) is less likely to have computation problems such as a degenerated likelihood function or multiple local minima; and (iii) helps find statistical inferences such as standard errors. However, both the Bayesian and the ML methods require Kalman filtering to estimate an unobservable state vector that includes the time-varying parameters.1
Duncan and Horn (1972), Sant (1977), Maddala and Kim (1998), more recently, Chan and Jeliazkov (2009) and Durbin and Koopman (2012) have attempted to understand Kalman filtering through the lens of conventional regression models.2 To the best of our knowledge, Duncan and Horn (1972) are the first to show that the generalized least squares (GLS) estimator for basic state-space models equivalently uncovers the unobserved state vector estimated through Kalman filtering. Sant (1977) proves the equivalence between the GLS estimator and Kalman-smoothed estimator. From a practical perspective, the series of papers by Ito et al. (2014, 2016, 2021) apply the TV-VAR, time-varying autoregressive (TV-AR) and time-varying vector error correction (TV-VEC) models to stock prices and exchange rates using the regression method as opposed to the Kalman filter.
In this study, in the spirit of Duncan and Horn (1972) and Sant (1977), we propose a multivariate, regression-based approach or GLS-based approach that utilizes ordinary least squares (OLS) or GLS in lieu of the Kalman smoother. More precisely, our GLS-based approach includes OLS as a variety of GLS. It is employed by Ito et al. (2014, 2016, 2021) to evaluate market efficiency in stock markets and foreign exchange markets.3 However, the main purpose of employing the Kalman filter (or smoother) is to avoid using a system of large matrices required by GLS—at least until computers became capable of handling large matrices.
The equivalence between GLS and the Kalman smoother leads us to the following question: if GLS yields Kalman-smoothed estimates, to what extent can the GLS-based approach recover time-varying parameters? This question is practical and important because the finite sample properties of the GLS estimator are, generally, unknown.4 Another question pertains to the seriousness of the pile-up problem, which occurs when the ML estimation of the variance of the state equation error is zero, even though its true value is non-zero (but small). While our proposed method is not identical to ML because we do not maximize the likelihood function with respect to the variances of the errors, whether our GLS-based approach suffers from the pile-up problem to the same degree as ML is not immediately obvious.
We also consider the possibility of non-independent and identically distributed (i.i.d.) or non-Gaussian errors in the model. The former are frequently used in this field because it is reasonable to assume that the variance of the errors may be time-varying. The latter are important in empirical studies because they allow us to model abrupt changes or structural breaks in time-varying parameters similar to the strategy employed by Perron and Wada (2009) among others.
To summarize, the contributions of this study are as follows: GLS estimates the true time-varying parameters fairly well even when the errors are not i.i.d. or not Gaussian, provided an appropriate way to implement FGLS is carefully chosen based primarily on the relative size of the variances of the errors or signal-to-noise ratio (SNR). The pile-up problem that is often cumbersome to ML is shown to be negligible. In addition, our GLS outperforms the commonly used Bayesian estimation method in recovering the time-varying parameter values.
The rest of this paper is organized as follows. Section 2 presents our model together with its likelihood function. We explain the GLS-based approach for the class of TV-AR models and steps used to implement FGLS in Section 3. Section 4 evaluates the GLS-based approach in a variety of conditions such as a small SNR, non-i.i.d. errors, and non-Gaussian errors.
An application to macroeconomic data, including a comparison with the Bayesian MCMC method, is demonstrated in Section 5. Section 6 concludes.

2. Model

Our model allows the class of TV-AR models and permits two different matrix forms. The first matrix form is that of Durbin and Koopman (2012), which they use as a device to find the Kalman-smoothed estimate of an unobserved state vector. The second matrix form is an extended version of Maddala and Kim (1998), which we employ in this study. As it will become clear, this form allows us to use GLS to estimate the time-varying parameters. We can then formally demonstrate that the Kalman-smoothed estimate of the first matrix form is equivalent to the GLS estimates of the second matrix form, showing that GLS estimates are an alternative estimation method to the Kalman smoother.

2.1. Basic State-Space Model of the Class of TV-AR Models

Our model is given by:
y t = Z t β t + ε t
β t = β t 1 + η t ,
where y t is a k × 1 vector of observable variables; Z t is a k × m matrix of observable variables; β t is an m × 1 vector of time-varying parameters; and ε t and η t are k × 1 and m × 1 vectors of normally distributed error terms with the variance–covariance matrices of H t and Q t , respectively:5
ε t η t N 0 0 , H t 0 0 Q t .
The variance–covariance matrices H t and Q t are allowed to be time-dependent, as in the stochastic volatility model. For the initial value of β t , we assume
β 0 N b 0 , P 0 .
If we assume b 0 and P 0 are known, it is reasonable to use the diffuse prior for P 0 because β t follows a non-stationary process. In this case, the diagonal elements of P 0 should be large numbers (e.g., Harvey 1989; Koopman 1997). Alternatively, we can simply ignore P 0 as zero when we assume β 0 is known and not stochastic.
Equations (1) and (2) can be used for a variety of TV-AR models. For example, when k = 1 , Z t = y t 1 yields a TV-AR(1) model. Similarly, the TV-VAR(1) model y t = A t y t 1 + ε t with A t = A t 1 + η t is expressed by setting Z t = y t 1 I k and β t = v e c A t . It is also possible to include intercepts that vary over time. For a TV-AR(1) model, for example, one can set Z t = 1 , y t 1 , meaning that the first element of β t is the time-varying intercept.
Below, we present two specifications of our model, (1) and (2). The first specification allows us to derive the Kalman-smoothed estimate as explained by Durbin and Koopman (2012). The second specification is in the spirit of Duncan and Horn (1972), leading us to the GLS-based approach. As we will see, both specifications yield the same smoothed estimate.

2.2. Model Matrix Formulation of the State-Space Model

Following Durbin and Koopman (2012), we employ the matrix formulation of Equations (1) and (2). For t = 1 , , T , we have a system of equations:
Y T = Z β + ε
β = C b 0 * + η
ε N 0 , H , η N 0 , Q ,
with
Y T = y p + 1 y p + 2 y T , Z = Z p + 1 0 Z p + 2 0 Z T , β = β p + 1 β p + 2 β T , ε = ε p + 1 ε p + 2 ε T H = H p + 1 0 H p + 2 0 H T , C = I 0 0 I I 0 I I I I , b 0 * = b 0 0 0 , P 0 * = P 0 0 0 0 0 0 0 0 , η = η p + 1 η p + 2 η T , Q = Q p + 1 0 Q p + 2 0 Q T .
Unlike a more general state-space model in which Equation (2) has a transition matrix that includes unknown parameters to be estimated, the matrix formulation of the time-varying parameter model is largely simplified. For example, matrix C is often called the random walk generating matrix (e.g., Tanaka 2017), which is non-singular, and there are no free parameters to be estimated in the matrix. In addition, if H t and Q t are time invariant (i.e., if no GARCH effects or stochastic volatility exists in the model), matrices H and Q are simplified substantially.
For simplicity, we assume b 0 is known and non-stochastic; hence, P 0 = 0 .6

2.3. Likelihood Function

Since we assume that ε and η are normally distributed, our matrix formulation of (1) and (2) allows us to write the log-likelihood function for Y T given the covariance matrices of the errors (H and Q), and initial value vector ( b 0 * ) as
log p Y T | H , Q , b 0 * = T p k 2 log 2 π 1 2 log Ω 1 2 Y T Z C b 0 * Ω 1 Y T Z C b 0 * ,
where
Ω = H + Z C Q C Z .
Interestingly, provided that H , Q , and b 0 * are known, the likelihood function does not involve our main parameter vector of interest, β .

3. Estimation of the TV-AR Models

3.1. Regression Lemma and Kalman Smoothing

Before showing the equivalence of our estimator and Kalman smoother, let us clarify the outcomes of the Kalman smoother when the model is described by Equations (1) and (2). According to Durbin and Koopman (2012), the Kalman-smoothed state of β is given by the expectation of β conditional on the information on all the observations of y t :
β ˜ = E β | Y T = E β + C o v β , Y T V a r Y T 1 Y T E Y T .
Note that we assume normal errors to derive Equation (6). The variance of β , given all the observations Y T , is then
V a r β | Y T = V a r β C o v β , Y T V a r Y T 1 C o v β , Y T .
The Kalman-smoothed estimate and its mean squared error (MSE) are given by (6) and (7), respectively. Durbin and Koopman (2012) call these equations the regression lemma, which derives the mean and variance of the distribution of β conditional on Y T , assuming the joint distribution of β and Y T is a multivariate normal distribution. It follows that for (3) and (4), the Kalman-smoothed estimate is
β ˜ = E β | Y T = C b 0 * + C Q C Z Ω 1 Y T Z C b 0 * ,
and the conditional variance (or MSE) of the smoothed estimate is
V a r β | Y T = C Q C C Q C Z Ω 1 Z C Q C ,
where Ω = H + Z C Q C Z . Equations (8) and (9) are obtained given that C o v β , Y T = V a r β Z = C Q C Z and V a r Y T = Z C Q C Z + H = Ω , and by substituting them into Equations (6) and (7). It is well known that (8) is a minimum-variance linear unbiased estimate of β , given Y T , even though we do not assume the errors are normally distributed (e.g., Durbin and Koopman 2012).

3.2. Equivalence of the GLS-Based Estimator and Kalman Smoother

Equations (3) and (4) can be written in another matrix form to apply conventional regression analysis:
Y T b 0 * = Z C 1 β + ε η .
This specification is similar to those of Duncan and Horn (1972) and Maddala and Kim (1998). The main difference between our specification and that of Duncan and Horn (1972) is that the former applies to a time-varying parameter model, while the latter is for a more general state-space model, which allows the transition equation to have a transition matrix F (i.e., when Equation (2) is β t = F β t 1 + η t ). Since we do not need to estimate the transition matrix, our regressors in Equation (10) are all known.7
As mentioned by Duncan and Horn (1972), the confusion around the similarities and differences between Kalman filtering (including smoothing) and the conventional regression model stems from the fact that the former is the expectation of β , conditional on the information on Y T , which is the linear projection of β onto the space spanned by Y T (provided that the errors are normally distributed). On the contrary, the latter is a linear projection of the dependent variable onto the space spanned by the regressor, which is the projection of the left hand side of Equation (10) onto the space spanned by Z C 1 . However, Duncan and Horn (1972) show that GLS for (10) until the time-t observation yields the Kalman-filtered estimate.
As shown by Sant (1977), a natural extension of Duncan and Horn (1972) is that we obtain the Kalman-smoothed estimate of β when GLS is applied to all the observations, Y T . We summarize the equivalence of the GLS estimator of model (10) and Kalman-smoothed estimator (8) and its MSE matrix (9) in Appendix A. We further reveal the equivalence when the time-varying model has time-invariant intercepts in Appendix B. With the likelihood function (5), GLS yields the ML estimator for the time-invariant intercepts and the Kalman-smoothed estimator for the rest (time-varying coefficients).

3.3. GLS in Practice

As shown in the previous subsections, under the condition that the variance–covariance matrices of the errors (H and Q) are known, the GLS estimator of β is identical to the Kalman-smoothed estimates. However, in practice, those variance–covariance matrices are generally unknown. The following two-step approach is often used to find the FGLS estimator. First, β can be estimated using OLS. Then, the OLS residuals are used to estimate H and Q, which are denoted as H ^ and Q ^ , respectively. In the second step, FGLS is applied to our model assuming H ^ and Q ^ are the variance–covariance matrices of ε and η , respectively.
However, FGLS suffers from two problems. First, H and Q may involve too many unknown parameters. For example, when a TV-VAR(p) model has many variables (i.e., when k is large), H has T p of k × k matrices, and Q has the same number of k k p + 1 × k k p + 1 matrices. The second problem is possible heteroskedasticity. Suppose that ε is much greater in magnitude than η . More precisely, when the average trace of H is much larger than the average trace of Q, our GLS-based approach has heteroskedasticity in regression Equation (10), potentially causing an imprecise estimation of β . This concern is largely mitigated when the average traces of H and Q are similar.
To solve these two problems, we propose the following FGLS procedure.
  • Step 1. We estimate model (10) by OLS and obtain the estimate of β by OLS, β ^ O . From the OLS residuals, ε ^ t and η ^ t , we construct the first-step estimates of H t and Q t :
    H ^ t = 1 T p t = p + 1 T ε ^ t ε ^ t and Q ^ t = 1 T p t = p + 1 T η ^ t η ^ t .
    Then, to construct the estimates of H and Q, denoted as H ^ O and Q ^ O , respectively, we set H ^ p + 1 = H ^ p + 2 = = H ^ T and Q ^ p + 1 = Q ^ p + 2 = = Q ^ T to assume that the variances of ε and η are time-invariant. This assumption is undesirable because a number of studies of TV-VAR models have focused on stochastic volatility models, which require Q ^ t Q ^ t + 1 , for example. However, thanks to this assumption, H and Q are always invertible, and those inverses are readily computed. The simulations in the next section will reveal how severely this assumption affects our estimation when stochastic volatility is present. With H ^ O and Q ^ O , the log-likelihood is computed by (A6) or (5).
  • Step 2 (1FGLS). Given H ^ O and Q ^ O , we apply FGLS to obtain β ^ G 1 , which is the FGLS or 1FGLS estimate of β . We also compute the estimates of H and Q, denoted as H ^ G 1 and Q ^ G 1 , respectively, in the same way as we computed H ^ O and Q ^ O in the first step. Then, the value of the log-likelihood function is computed.
  • Step 3 (2FGLS). We repeat Step 2, computing β ^ G 2 , which is the (second-time) FGLS or 2FGLS of β . More precisely, we use the FGLS residuals in Step 2 to construct H ^ t and Q ^ t to obtain β ^ G 2 . Then, the value of the log-likelihood function is computed. If the likelihood ratio (from OLS to 1FGLS or from 1FGLS to 2FGLS) cannot be computed or is extraordinarily large, such as greater than 1e+10, we disregard the 1FGLS and 2FGLS estimators because both indicate that the variance–covariance matrix is not precisely estimated (degenerated). In such a case, we only record OLS. In addition, we define 2FGLS’ as GLS using Y ^ T = Z β ^ 1 G 1 and b ^ 0 * = C 1 β ^ 2 G 1 in place of ε ^ t and η ^ t , respectively, to compute H ^ t and Q ^ t , where β ^ 1 G 1 and β ^ 2 G 1 are the corresponding elements of 1FGLS, β ^ G 1 . The reason why we use β ^ G 2 , which denotes 2FGLS’, is that it is expected to ameliorate the effects arising from poorly estimated β ^ G 1 . That is perhaps due to misspecified H and Q. When those matrices are not correctly estimated, β ^ G 1 may be far from its true value; hence, the residuals computed from β ^ G 1 should not be used for further FGLS because the repeated use of the wrong variance–covariance matrices may make the estimator worse. In such a case, it may make sense to obtain β ^ G 2 as it does not repeat the same type of misspecification.
In summary, our procedure is based on the assumptions that the error terms have time-invariant variances and that the heteroskedasticity arising from the different sizes of H and Q can be correctly handled by the repeated use of FGLS. To validate our assumptions and procedure, we investigate the degree to which our procedure precisely estimates the true β using simulations.

4. Simulations

Among some influential empirical studies of TV-VAR, both Cogley and Sargent (2001, 2005) and Primiceri (2005) employ a three-variable TV-VAR(2) model. Hence, in our simulation study, we adopt the same specification and use simulations to assess how well the GLS-based approach recovers the true time-varying parameters relative to the Bayesian approach. First, we compute the means and variances of the estimated time-varying parameters and compare them with those of the true time-varying parameters to evaluate the accuracy of both the GLS-based and the Bayesian8 approaches. While comparing the first and second moments of the estimates to those of the true process may be inadequate to determine whether the GLS-based approach yields precise estimates, it is a useful way to grasp the overall accuracy of the estimates.9
Second, we consider the possibility of the pile-up problem. According to Primiceri (2005), the Bayesian approach is preferred when estimating time-varying parameter models because, among other reasons, it can potentially avoid the pile-up problem. However, the extent to which this problem affects our estimate is not immediately obvious because the literature (e.g., Shephard and Harvey 1990) provides theoretical explanations only for limited (simple) cases. On the contrary, our model can have a vector of time-varying terms ( β t ), unlike prior studies that have analyzed scalar time-varying terms for simplicity. Therefore, it is reasonable to conduct a simulation study to reveal the extent to which our GLS-based approach suffers from the pile-up problem. Because the concern about the pile-up problem grows when the variance of the state equation error or the SNR is small, we study the performance of the GLS-based approach more comprehensively by altering the SNR in the data-generating process.
Third, we also evaluate the performance of the GLS-based approach when stochastic volatility and non-Gaussian errors are present. We investigate the effect of stochastic volatility on the GLS-based approach because macroeconomic research, including Cogley and Sargent (2005) and Primiceri (2005), has been allowing such shocks in the TV-VAR model. While the GLS-based approach does not require the assumption of i.i.d. errors to estimate β t , we are interested in the extent to which the accuracy of the GLS-based approach is affected by the stochastic volatility of the errors. For the non-Gaussian errors, we focus on the possible structural breaks in the time-varying coefficients, β t . By allowing a mixture of normal errors, as explained in the following subsection, we can model structural breaks or abrupt changes in β t , as opposed to the gradual changes that the time-varying model generally assumes. Our simulation study is thus expected to shed light on the performance of the GLS-based approach when such errors are present.
Finally, as mentioned in Section 1, since we consider OLS to be a component of the GLS-based approach, we study its performance using simulations to clarify the relative performance FGLS and OLS, especially for small samples.

4.1. Data-Generating Process

We generate pseudo data by the system of Equations (3) and (4) with T = 100 , 250 , H = 0.02 2 I , 0.2 2 I , 1 2 I and Q = 0.03 2 I . By changing the variance of the error to the observation equation, we consider the role of the SNR, which we define as the average trace of the variance–covariance matrix of η t relative to that of ε t : In our simulation, we consider the SNRs for 0.03 2 / 0.02 2 , 0.03 2 / 0.2 2 and 0.03 2 / 1 2 . The SNR is particularly important when we consider the possibility of the pile-up problem, which will be discussed in the next section. For the initial values, we set b 0 * = 0 .

4.1.1. Non-Gaussian Errors

The original motivation to employ time-varying models for macroeconomic research was to allow β t to change gradually. However, structural breaks or abrupt changes may exist in β t , which means that β t is almost constant over time until some point in the sample, for example, T b ; it then jumps to a different level afterward. One way to model such a break is to assume non-Gaussian errors for η t . In particular, we assume mixtures of normal distributions (e.g., Perron and Wada 2009 for each element of error vector η t :
η i t = λ t ζ 1 , t + 1 λ t ζ 2 , t
where
λ t i . i . d . B e r n o u l l i 0.95 ζ 1 , t N 0 , 0.03 2 , ζ 2 , t N 0 , 0.1 2 .
Intuitively, with a probability of 95%, η t is ζ 1 , t , which is drawn from a normal distribution with a small variance. This small η t keeps β t nearly constant over time. However, a large η t , which is ζ 2 , t , is drawn from a normal distribution with a (relatively) large variance. This η t causes β t to jump to a new level, with a 5% probability. Since we use the assumption of Gaussian errors to derive the equivalence between GLS and the Kalman-smoothed estimator, how non-Gaussian errors affect the accuracy of the GLS estimator when estimating β t should be evaluated using simulations.

4.1.2. Stochastic Volatility and Autoregressive Stochastic Volatility

As Cogley and Sargent (2005) argue, in response to the criticism of Cogley and Sargent (2001), it is more flexible and realistic to assume that the variance of the shock ε t is time-varying. Intuitively, not all shocks are generated from the same i.i.d. process. Since the GLS-based approach can handle the heteroskedasticity in H t and Q t , we can estimate the time-varying model with stochastic volatility, such as the one used by Primiceri (2005), at least theoretically. The error term ε t may also follow the autoregressive stochastic volatility process described by Taylor (2007) and elsewhere.
However, in general, FGLS is merely a remedy to more precisely estimate the coefficients (in our case, β t ) when heteroskedasticity is present and FGLS is not primarily designed to estimate the process that the error term (or its variance) follows.
Nevertheless, we use the following data-generating process to assess the performance of the GLS-based approach.
ε i t = h i , t ξ t log h i , t = ρ log h i , t 1 + e t
where ρ = 1 when stochastic volatility is considered and ρ = 0.9 when autoregressive stochastic volatility is considered; ε i t is the i-th element of ε t . We assume log h i , 0 = 0 , e t N 0 , 0.02 2 , and ξ t N 0 , 1 .

4.1.3. Eliminating Outliers

Since time-varying parameters mean that the generated series are generally non-stationary, some such series may be explosive and thus disregarded because they lack practical usefulness. In particular, if the standard deviation of the last 50 observations of a generated series is at least three times greater than that of the first 50 observations or if the inverse of Z Z does not exist, the series is discarded.

4.2. Mean and Variance of the Estimated β t and Likelihood

Since we simulate a TV-VAR(2) model with time-varying intercepts, β t is a 21 × 1 vector. Let β t , i , n denote the true (data-generating process) β t , i , n (i.e., the i-th element of vector β t , n ) and let β ^ t , i , n denote the estimate of β t , i , n . In the Bayesian MCMC case, we use the posterior mean for β ^ t , i , n .10 Since b 0 * is unknown in practice when estimating β t , i , n , we estimate b 0 * as the coefficient vector from a full-sample time-invariant (i.e., usual) VAR(2) model before estimating β t , i , n by OLS, GLS, or Bayesian. The sample means and sample standard deviations of the estimate over the sample period are then computed:
β ^ ¯ i , n = 1 T p t = p + 1 T β ^ t , i , n
s d β ^ i , n = 1 T p 1 t = p + 1 T β ^ t , i , n β ^ ¯ i , n 2 .
Similarly, we compute those of the true (data-generating) process:
β ¯ i , n = 1 T p t = p + 1 T β t , i , n
s d β i , n = 1 T p 1 t = p + 1 T β t , i , n β ¯ i , n 2 .
From (11) and (12) and their data-generating process counterparts, (13) and (14), we have 21 means and standard deviations for each replication. After N = 1000 replications, we compute the averages of β ^ ¯ i , n , β ¯ i , n , s d β ^ i , n , and s d β i , n over the replications. We then have 21 means of time-varying parameters and 21 means of standard deviations (i.e., i = 1 , 2 , , 21 ).
m ^ i = 1 N n = 1 N β ^ ¯ i , n ; m i = 1 N n = 1 N β ¯ i , n
s ^ i = 1 N n = 1 N s d β ^ i , n ; s i = 1 N n = 1 N s d β i , n
Since both m and s are aggregate means, a small difference between m and m ^ or between s and s ^ is only an indication that the estimator is close to what it is expected to estimate. Hence, we further investigate the similarities of β and β ^ . Comparing each element of β , we define the distance, “ d i s t ”, as follows:
d i s t i = 1 N T p n = 1 N t = p + 1 T β t , i , n β ^ t , i , n .
Similarly, we compare the standard deviations of each element of β as a ratio of the standard deviation of β ^ i , n to the standard deviation of the true process, β i , n :
r a t i = 1 N n = 1 N s d β ^ i , n s d β i , n .
In this simulation study, we focus on both d i s t i and r a t i . Our criteria for a good estimator are whether the d i s t i of an estimate is close to zero and whether the r a t i of that estimate is close to one.

4.3. Simulation Results 1: The SNR, Sample Size and Estimation Precision

Table 1 and Table 2 display the medians of d i s t i and r a t i as well as the medians of m i and s i for T = 100 and T = 250 , respectively.
We focus on the median values of the estimated parameters because we sometimes encounter outliers. Our view is that the non-stationary nature of the data-generating process together with the possibility of poorly estimated H and Q, especially when the SNR is low, creates those outliers in the estimated coefficients. OLS works relatively well when the SNR is relatively large because, as Table 1 and Table 2 show, the median distance of the estimate from the true process (i.e., d i s t i ) is small, and the median sample variance of the estimated β t is closer to that of the true process compared with other approaches (i.e., r a t i is closer to one).11 On the contrary, 2FGLS’ works relatively well when the SNR is small. General tendencies from Table 1 can be summarized as follows. First, OLS, 1FGLS and 2FGLS share largely the same characteristics. However, the volatility of β t estimated by 2FGLS’ is much smaller than those estimated by OLS, 1FGLS and 2FGLS. Second, OLS, 1FGLS, 2FGLS and 2FGLS’ all tend to have larger r a t i as the SNR increases. More precisely, OLS, 1FGLS, 2FGLS and 2FGLS’ overestimate (underestimate) the volatility of β t when the SNR is very small (large). Third, for the median distance of the estimate from the true process (i.e., d i s t i ) for T = 250, the best case for OLS and 1FGLS is when the SNR is 2.25. This phenomenon is easy to understand because an SNR that is either too small or too large make the estimation of β t difficult, since an SNR far from one means the degree of heterogeneity is quite serious. In such a situation, it is easy to imagine that OLS cannot recover β t well and 1FGLS may thus be unsuitable for implementing FGLS.
What is the effect of increasing the sample size? A comparison of Table 1 and Table 2 show that the degrees to which the volatility of β t is over- or underestimated is largely mitigated for OLS, 1FGLS and 2FGLS when the sample size increases from 100 to 250. At the same time, the median distances of the estimate from the true process for OLS, 1FGLS and 2FGLS generally shorten as the sample size increases, showing that the accuracy of OLS and 1FGLS improves with the sample size. However, such effects of an increased sample size do not clearly hold for 2FGLS’. Furthermore, Primiceri’s (2005) Bayesian estimation is inaccurate at recovering the true parameter values throughout the simulation. As Primiceri (2005) explains in detail, one caveat is that the prior for Q is crucial for the volatility of the posterior mean of β t . Hence, it may not be appropriate to use the same set of priors as Primiceri (2005) for this simulation study because more suitable prior values may improve the results. However, we must note that the other estimates do not necessitate such a choice depending on the SNR.
Additionally, instead of computing the median value of each measure over i, we can look closely at d i s t i and r a t i for each i. There are some parameters (i) where Primiceri’s (2005) method outperforms the others. Yet, our focus here is to see the overall performance of each method.

4.4. Simulation Results 2: The Effects of Non-i.i.d. and Non-Gaussian Errors

Table 3 presents the effects of non-Gaussian errors as well as stochastic volatility and stochastic autoregressive errors.
The general tendencies that appear in the Gaussian error case (Table 1 and Table 2) are maintained. While both OLS, 1FGLS and 2FGLS overestimate the volatility of β t , the degree of overestimation is largely mitigated when the sample size increases. Moreover, 2FGLS’ underestimates the volatility of β t , and increasing the sample size helps 2FGLS’ estimate β t more accurately, and Primiceri’s (2005) Bayesian approach cannot estimate β t well. Remarkably, given the value of the autoregressive parameter ρ , there is also a negligible difference between the stochastic volatility and autoregressive stochastic volatility cases.
When only the non-Gaussian error is considered, as Table 4 and Table 5 show, we obtain mostly the same results as those presented in Table 1 and Table 2. Once again, except for the Bayesian estimator, the degree of overestimation (underestimation) depends on the SNR. Similar to the results in Table 1 and Table 2, a larger sample size generally improves the estimation by OLS, 1FGLS and 2FGLS in that the degree of over- or underestimation is largely reduced when the sample size increases. In addition, for OLS, 1FGLS and 2FGLS, the median distance between the true and estimated β t shortens with the sample size. However, this tendency does not apply to 2FGLS’.
What is the effect of scholastic volatility or autoregressive volatility in the observation equation error ( ε t ) on our estimation? Table 6 shows that except for Primiceri’s (2005) Bayesian approach, the results arising from such errors are similar to the small SNR cases in Table 1 and Table 2.
This is because the observation error ( ε t ) has a variance larger than one due to the stochastic volatility ( h t ) term. The results of the stochastic volatility and autoregressive stochastic volatility cases are therefore similar.

4.5. Discussion: The Pile-Up Problem

Our results suggest that the GLS-based approach does not suffer from the pile-up problem and that lower SNRs often lead to the overestimation of the volatility of β t , especially when OLS, 1FGLS or 2FGLS is used (Table 1 and Table 2). Moreover, the degree of overestimation of the sample variance of β t becomes more severe when the sample size is small. This may be puzzling given the fact that OLS and ML are generally equivalent and that GLS and ML are equivalent if the errors are not i.i.d. (i.e., the errors heteroskedastic or autocorrelated). However, this statement is not true if FGLS fails to deal with non-i.i.d. errors appropriately. As OLS, 1FGLS and 2FGLS would then be unable to estimate an equivalent β t to that under ML, this explains why the GLS-based approach does not suffer from the pile-up problem. Interestingly, our simulations reveal that the Bayesian estimator yields much smaller variations over time.
Therefore, our simulations seem to suggest both that the use of 2FGLS’ is recommended when the sample size is small, and that OLS (1FGLS and 2FGLS as well) can recover the time-varying parameters fairly well when the sample size is large.

5. Application to the TV-VAR(2) Model with Interest Rates, Inflation, and Unemployment

A number of studies that employ TV-VAR models, including Cogley and Sargent (2005) and Primiceri (2005), focus on recovering the structural parameters from the estimated reduced form. Although we do not aim to identify fundamental shocks or compute impulse responses, we present the estimated TV-VAR(2) parameter using OLS (Figure 1), FGLS (Figure 2 for 1FGLS, Figure 3 for 2FGLS’, and Figure 4 for 2FGLS), and the posterior mean of the time-varying approach using Primiceri’s (2005) method (Figure 5).12
While Primiceri’s (2005) Bayesian MCMC posterior means are virtually time-invariant and the estimates by 2FGLS’ are slightly more volatile, the estimates provided by OLS, 1FGLS and 2FGLS have much larger volatility.
Interestingly, as detailed in the supplementary appendix (available upon request), the coefficients on the interest rate vary noticeably over time, exhibiting distinct patterns in the early 1980s (dip), late 1990s (up) and early 2000s (down). Similar to Primiceri’s (2005) Bayesian posterior means, the intercepts (three time-varying coefficients) are largely stable over time. This finding is consistent with our simulation results. Overall, Primiceri’s (2005) Bayesian estimator for the time-varying parameter ( β t ) tends to have very small variation over time, resulting in virtually time-invariant parameters.13

6. Conclusions

The multivariate non-Bayesian regression-based or GLS-based approach for the time-varying parameter model is presented and assessed from a simulation aspect. Although this approach has already been theoretically justified and (at least partly) used by Ito et al. (2014 2016, 2021), it is shown that using the GLS-based approach has at least four advantages. First, this approach can produce equivalent estimates without needing Kalman filtering or smoothing; it is also applicable to a wide range of time-varying parameter models (e.g., TV-AR, TV-VAR, and TV-VEC models) by adjusting the regression matrix accordingly. Second, the GLS-based approach works reasonably well in practice in that it can estimate the time-varying parameters even when non-i.i.d. errors or non-Gaussian errors exist in the model. In particular, we find that GLS outperforms Primiceri’s (2005) Bayesian approach in recovering the true parameter values because it can take into account generally heteroskedastic error terms. The ability to deal with non-Gaussian errors is particularly important in empirical studies because it allows us to consider possible abrupt changes in time-varying parameters instead of gradual changes due to Gaussian errors. Remarkably, GLS can even outperform Primiceri’s (2005) Bayesian approach that is often employed to estimate the TV-VAR models. However, in practice, the most appropriate method (OLS, 1FGLS, 2FGLS or 2FGLS’) should be chosen depending on the sample size and SNR. More precisely, OLS is acceptable when the SNR is not far from one or the sample size is not small; otherwise, 2FGLS’ is recommended. The reason why the sample size and SNR are important for choosing the preferred of the three methods is that 1FGLS, 2FGLS and 2FGLS’ are not ideal GLS; hence, they cannot fully deal with the heterogeneity arising from our regression equation that includes both observation equation errors and state equation errors. However, because we do not maximize the unconditional likelihood function with respect to the variances of the errors and because 1FGLS, 2FGLS and 2FGLS’ are not ideal GLS, the true variances are imprecisely estimated, and our GLS-based approach does not suffer from the pile-up problem that often occurs with ML.
While our focus in this paper is the estimation method of relatively simple TV models, more flexible models, such as a TV-VAR with time-varying variances of the structural shocks, have a higher necessity for macroeconomic analysis. Extending our approach to such complex models is of great importance to both econometricians and macroeconomists.

Author Contributions

Conceptualization, M.I., A.N. and T.W.; methodology, M.I. and T.W.; software, M.I., A.N. and T.W.; validation, A.N. and T.W.; formal analysis, T.W.; investigation, A.N. and T.W.; resources, T.W.; data curation, T.W.; writing—original draft preparation, T.W.; writing—review and editing, A.N. and T.W.; visualization, A.N. and T.W.; supervision, T.W.; project administration, T.W.; funding acquisition, M.I., A.N. and T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Japan Society for the Promotion of Science Grant in Aid for Scientific Research (Nos.17K03809, 19K13747, and 20K01775), Murata Science Foundation Research Grant, and Okawa Foundation Research Grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the Thomson Reuters Datastream and are available from the authors with the permission of the Thomson Reuters Datastream.

Acknowledgments

We would like to thank the editor, three anonymous referees, James Morley, Daniel Rees, Yunjong Eo, Yohei Yamamoto, Eiji Kurozumi, and conference participants at the 91th Annual Conference of the Western Economic Association International, First International Conference on Econometrics and Statistics, and Macro Reading Group Workshop at the Reserve Bank of Australia for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FGLSFeasible generalized least squares
GLSGeneralized least squares
MCMCMarkov Chain Monte Carlo
MLMaximum likelihood
MSEMean squared error
OLSOrdinary least squares
SNRSignal-to-noise ratio
TV-ARtime-varying autoregressive
TV-VARtime-varying vector autoregressive
TV-VECtime-varying vector error correction
VARVector autoregressive

Appendix A. The Summary of GLS-Kalman Smoother Equivalence

Proposition A1.
(Sant (1977); and else) The GLS estimator of model (10) yields the Kalman-smoothed estimates (8) and its mean squared error matrix (9).
Proof of Proposition A1.
Lemma A1.
S T U 1 V 1 = S 1 + S 1 T U V S 1 T 1 V S 1 provided S 1 exists.
The GLS estimate of β is
β ^ G L S = Z H 1 / 2 C 1 Q 1 / 2 H 1 / 2 Z Q 1 / 2 C 1 1 × Z H 1 / 2 C 1 Q 1 / 2 H 1 / 2 Y T Q 1 / 2 b 0 * = Z H 1 Z + C 1 Q 1 C 1 1 Z H 1 Y T + C 1 Q 1 b 0 * = C Q C C Q C Z Ω 1 Z C Q C Z H 1 Y T + C 1 Q 1 b 0 * = C Q C Z Ω 1 Y T + C C Q C Z Ω 1 Z C b 0 * = C b 0 * + C Q C Z Ω 1 Y T Z C b 0 * .
Here, we used Lemma,
Z H 1 Z + C 1 Q 1 C 1 1 = C Q C C Q C Z H + Z * C Q C Z 1 Z C Q C .
From (A2), the conditional variance of β ^ G L S is
V a r β ^ G L S | Y T = Z H 1 Z + C 1 Q 1 C 1 1 = C Q C C Q C Z Ω 1 Z C Q C .

Appendix B. Model with Time-Invariant Intercepts

While our model, (1) and (2), and its matrix formulation, (3) and (3), are flexible enough to admit time-varying coefficients, it is sometimes assumed that the class of TV-AR models has time-invariant intercepts. For the purpose of deriving the likelihood function, here, we modify our model to admit time-invariant intercepts. Suppose we have a k × k vector of time-invariant intercepts, v, in our model. Then, (1) and (2) become
y t = v + Z t β t + ε t
β t = β t 1 + η t .
In this case, it is convenient to use a matrix form to derive the likelihood function. With the vector of intercepts, our model in matrix form, (3) and (4), is then modified to
Y T = I v + Z β + ε
β = C b 0 * + η ,
where
I = I k I k I k ,
and I k is a k × k identity matrix. Similar to our assumption that time-varying intercepts, if they exist, are unknown, we assume that the vector of time-invariant intercepts, v, is the unknown parameter vector.
From our matrix formulation of (A4) and (A5), the log-likelihood function for Y T given the covariance matrices of the errors (H and Q), intercept (v), and initial value vector ( b 0 * ) is
log p Y T | H , Q , v , b 0 * = T p k 2 log 2 π 1 2 log Ω 1 2 Y T Z C b 0 * I v Ω 1 Y T Z C b 0 * I v .

Appendix B.1. The GLS Estimator under the Presence of Time-Invariant Intercepts

As we discuss in the previous section, our model admits time-invariant intercepts. Therefore, it is straightforward to define the GLS estimator for such models. To do so, assuming that the time-invariant intercepts are unknown, let us define the vector of unknown parameters, β * = v β . Then, the matrix form for regression that is analogous to (10) is
Y T b 0 * = I Z 0 C 1 β * + ε η .
Here, one of the advantages of utilizing the regression approach (A7) over Kalman smoothing (A5) is that the unknown intercept vector v is estimated simultaneously with β . Then, it can be shown that the GLS estimate v ^ is indeed the maximum likelihood estimate.
Proposition A2.
The GLS estimate v ^ of model (A7) is the maximum likelihood estimate (MLE) of (A5), v ^ M L conditional on H , Q , and b 0 * .
Proof. 
From the likelihood function, (A6), the normal equations pertaining to v are
I Ω 1 Y T Z C b 0 * I v ^ M L = 0 .
Therefore, the MLE for v is
v ^ M L = I Ω 1 I 1 I Ω 1 Y T Z C b 0 * .
Now, the GLS estimates for β * in model (A7) are
β ^ * = v ^ β ^ = I H 1 I I H 1 Z Z H 1 I Z H 1 Z + C 1 Q 1 C 1 1 I H 1 Z H 1 Y T + O C 1 Q 1 b 0 * .
Using the Lemma, we arrive at the following (see Appendix B.2 “Detailed Proof of Proposition A2” for details):
v ^ = I Ω 1 I 1 I Ω 1 Y T Z C b 0 * .
This proves v ^ M L = v ^ . □
Proposition A3.
The GLS estimate β ^ of model (A7) is the Kalman-smoothed estimate of model (A5).
Proof. 
Thanks to the intercept, the Kalman-smoothed estimate is now
β ˜ = C b 0 * + C Q C Z Ω 1 Y T I v Z C b 0 * .
From (A9), it follows that
β ^ = C b 0 * + C Q C Z Ω 1 Y T I v ^ Z C b 0 * .
We prove the equivalence. □
It is clear that the GLS-based approach can compute the Kalman-smoothed β and estimate the unknown intercepts, v, simultaneously. The next question is how we can obtain the statistical inference about β ^ . More precisely, at issue is whether the GLS-based approach yields the same MSE as the Kalman smoother. The answer to this question is negative for β ^ .
Proposition A4.
The mean squared error of the Kalman smoothed estimate is
V a r β | Y T = C Q C C Q C Z Ω 1 Z C Q C ,
whereas the variance estimated from the GLS-based approach (A9) is
V a r β ^ = C Q C C Q C Z Ω 1 Z C Q C + C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C Q C .
Proof. 
See Appendix B.2 “Detailed Proof of Proposition A2” below. □
The difference between the Kalman-smoothed V a r β | Y T and the GLS-based variance V a r β ^ is C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C Q C , which pertains to the estimation of v. If we did not have to estimate v (as we assume for the Kalman-smoothed estimate), V a r β | Y T and V a r β ^ would be the same. In other words, if v is known, the MSE of the Kalman-smoothed estimate is the same as the variance of the GLS-based estimate. As a matter of fact, if V a r v ^ = I Ω 1 I 1 = 0 , the two estimates would be identical. This result reflects that the two approaches yield the same estimate and MSE, as in Proposition A1. Nevertheless, what is important here is that we can obtain (A11) by utilizing the estimated variance of β ^ * of (A9). More specifically, we can estimate the MSE of the Kalman-smoothed estimate by
V a r β | Y T = V a r β ^ C o v β ^ , v ^ V a r v ^ 1 C o v β ^ , v ^ .

Appendix B.2. Detailed Proof of Propositions A2

Lemma A2.
If G 1 and the inverse of F = A B G 1 E exist,
A B E G 1 = F 1 F 1 B G 1 G 1 E F 1 G 1 + G 1 E F 1 B G 1 .
In our case,
A = I H 1 I , B = I H 1 Z , G = Z H 1 Z + C 1 Q 1 C 1 , E = Z H 1 I ;
and
F = I H 1 I I H 1 Z Z H 1 Z + C 1 Q 1 C 1 1 Z H 1 I = I H 1 H 1 Z Z H 1 Z + C 1 Q 1 C 1 1 Z H 1 I ,
whose inverse is
F 1 = I H 1 H 1 Z Z H 1 Z + C 1 Q 1 C 1 1 Z H 1 I 1 = I H + Z C Q C Z 1 I 1 = I Ω 1 I 1 .
Other useful equations are
G 1 = Z H 1 Z + C 1 Q 1 C 1 1 = C Q C C Q C Z H + Z C Q C Z 1 Z C Q C = C Q C C Q C Z Ω 1 Z C Q C ;
Ω 1 = H + Z C Q C Z 1 = H 1 H 1 Z Z H 1 Z + C 1 Q 1 C 1 1 Z H 1 = H 1 H 1 Z G 1 Z H 1 .
Then, for (A9), we arrive at
v ^ = F 1 I H 1 F 1 B G 1 Z H 1 ( i ) Y T F 1 B G 1 C 1 Q 1 ( ii ) b 0 *
β ^ = G 1 E F 1 I H 1 + G 1 + G 1 E F 1 B G 1 Z H 1 ( i i i ) Y T + G 1 + G 1 E F 1 B G 1 C 1 Q 1 ( i v ) b 0 * .
(i)
F 1 I H 1 F 1 B G 1 Z H 1 = I Ω 1 I 1 I H 1 × I Z C Q C Z H 1 + Z C Q C Z Ω 1 Z C Q C Z H 1 = I Ω 1 I 1 I H 1 I Ω H H 1 + Ω H Ω 1 Ω H H 1 = I Ω 1 I 1 I H 1 I Ω H 1 + I + Ω H 1 I I + H Ω 1 = I Ω 1 I 1 I Ω 1 .
(ii)
F 1 B G 1 C 1 Q 1 = I Ω 1 I 1 I H 1 Z C Q C C Q C Z Ω 1 Z C Q C C 1 Q 1 = I Ω 1 I 1 I H 1 Z C C Q C Z Ω 1 Z C = I Ω 1 I 1 I H 1 Z C Z C Q C Z Ω 1 Z C = I Ω 1 I 1 I H 1 I Z C Q C Z Ω 1 Z C = I Ω 1 I 1 I H 1 I Ω H Ω 1 Z C = I Ω 1 I 1 I Ω 1 Z C .
Therefore,
v ^ = I Ω 1 I 1 I Ω 1 Y T I Ω 1 I 1 I Ω 1 Z C b 0 * = I Ω 1 I 1 I Ω 1 Y T Z C b 0 * .
(iii)
G 1 E F 1 I H 1 + G 1 + G 1 E F 1 B G 1 Z H 1 = G 1 E F 1 I H 1 B G 1 Z H 1 + G 1 Z H 1 = G 1 E F 1 I H 1 H 1 Z G 1 Z H 1 + G 1 Z H 1 = G 1 E F 1 I Ω 1 + G 1 Z H 1 ( from ( A13 ) ) = G 1 Z H 1 I F 1 I Ω 1 + G 1 Z H 1 = G 1 Z H 1 I I F 1 I Ω 1 = C Q C C Q C Z Ω 1 Z C Q C Z H 1 I I F 1 I Ω 1 = C Q C Z H 1 I I F 1 I Ω 1 + C Q C Z Ω 1 Ω H H 1 I I F 1 I Ω 1 = C Q C Z Ω 1 I I F 1 I Ω 1 = C Q C Z Ω 1 C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 = C Q C Z Ω 1 I I I Ω 1 I 1 I Ω 1
(iv)
G 1 + G 1 E F 1 B G 1 C 1 Q 1 = G 1 C 1 Q 1 + G 1 E F 1 B G 1 C 1 Q 1 = C Q C C Q C Z Ω 1 Z C Q C C 1 Q 1 + G 1 E I Ω 1 I 1 I Ω 1 Z C ( from ( ii ) ) = C C Q C Z Ω 1 Z C + C Q C C Q C Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 I Ω 1 Z C = C C Q C Z Ω 1 Z C + C Q C Z H 1 Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 I Ω 1 Z C = C C Q C Z Ω 1 Z C + C Q C Z H 1 Z Ω 1 Ω H H 1 I I Ω 1 I 1 I Ω 1 Z C = C C Q C Z Ω 1 Z C + C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C = I C Q C Z Ω 1 Z + C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C .
Therefore,
β ^ = C Q C Z Ω 1 I I I Ω 1 I 1 I Ω 1 Y T + I C Q C Z Ω 1 Z + C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C b 0 * = C b 0 * + C Q C Z Ω 1 Y T I v ^ Z C b 0 * .
The mean squared error matrix
V a r β | Y T = C Q C C Q C Z Ω 1 Z C Q C .
From (A9) and Lemma, we can show
V a r β ^ = G 1 + G 1 E F 1 B G 1 = C Q C C Q C Z Ω 1 Z C Q C + G 1 Z H 1 I I Ω 1 I 1 I H 1 Z G 1 = C Q C C Q C Z Ω 1 Z C Q C + C Q C C Q C Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 I H 1 Z × C Q C C Q C Z Ω 1 Z C Q C = C Q C C Q C Z Ω 1 Z C Q C + C Q C Z H 1 C Q C Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 I × H 1 Z C Q C H 1 Z C Q C Z Ω 1 Z C Q C = C Q C C Q C Z Ω 1 Z C Q C + C Q C Z H 1 C Q C Z Ω 1 Ω H H 1 I I Ω 1 I 1 I × H 1 Z C Q C H 1 Ω H Ω 1 Z C Q C = C Q C C Q C Z Ω 1 Z C Q C + C Q C Z Ω 1 I I Ω 1 I 1 I Ω 1 Z C Q C .
Note also that
F 1 B G 1 = I Ω 1 I 1 I H 1 Z C Q C C Q C Z Ω 1 Z C Q C = I Ω 1 I 1 I H 1 Z C Q C H 1 Z C Q C Z Ω 1 Z C Q C = I Ω 1 I 1 I H 1 Z C Q C H 1 Ω H Ω 1 Z C Q C = I Ω 1 I 1 I Ω 1 Z C Q C ,
and
G 1 E F 1 = C Q C C Q C Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 = C Q C Z H 1 C Q C Z Ω 1 Z C Q C Z H 1 I I Ω 1 I 1 = C Q C Z H 1 C Q C Z Ω 1 Ω H H 1 I I Ω 1 I 1 = C Q C Z Ω 1 I I Ω 1 I 1 .
Therefore,
V a r β | Y T = V a r β ^ C o v β ^ , v ^ V a r v ^ 1 C o v β ^ , v ^ .

Appendix C. TV-VAR(2) with Time-Varying Intercepts

VAR(2) Case: p = 2 (i.e., 2 Lags) and k = 3 (i.e., 3 Variables)

To make the matrix Z, first define
Z t = 1 , y t 1 , y t 2 I k k × p k + 1 k = 1 , y t 1 , y t 2 I k .
Then,
Z k T p × p k + 1 k T p = Z 3 0 Z 4 0 Z T = 1 , y 2 , y 1 I k 0 1 , y 3 , y 2 I k 0 1 , y T 1 , y T 2 I k = z I k
where
z T p × k p + 1 T p = 1 , y 2 , y 1 0 1 , y 3 , y 2 0 1 , y T 1 , y T 2 .
For the regression:
Y T b 0 * k T p k p + 2 × 1 = Z C 1 k T p k p + 2 × k p + 1 k T p β * T p k k p + 1 × 1 + ε η k T p k p + 2 × 1 ,
one needs to define
X = Z C 1 .
Then,
X X k p + 1 k T p × k p + 1 k T p = Z C 1 Z C 1 = Z Z + C 1 C 1 ,
where
C k p + 1 k T p × k p + 1 k T p = I 0 0 I I 0 I I I I = 1 0 0 1 1 0 1 1 1 1 T p × T p I k p + 1 k = c I k p + 1 k .
Here,
c T p × T p = 1 0 0 1 1 0 1 1 1 1 .
The rest of the matrices needed for GLS are:
Y T T p k × 1 = y p + 1 y p + 2 y T , Z k T p × p k + 1 k T p = Z p + 1 0 Z p + 2 0 Z T , β T p k k p + 1 × 1 = β p + 1 β p + 2 β T , ε T p k × 1 = ε p + 1 ε p + 2 ε T , η T p k k p + 1 × 1 = η p + 1 η p + 2 η T , C k p + 1 k p T p × k p + 1 k T p = I 0 0 I I 0 I I I I , Q T p k k p + 1 × T p k k p + 1 = Q p + 1 0 Q p + 2 0 Q T ,
H T p k × T p k = H p + 1 0 H p + 2 0 H T ,
b 0 * T p k k p + 1 × 1 = b 0 0 0 , P 0 * = P 0 0 0 0 0 0 0 0 .

Notes

1
An alternative to those two methods is the approach presented by Cooley and Prescott (1976), who use the likelihood method to estimate the unknown parameters rather than Kalman filtering.
2
Related to our approach of not using Kalman filtering, McCausland et al. (2011) develop and propose a new simulation smoothing approach which is more computationally efficient than the approach based on Kalman filtering. While we pay little attention to computational efficiency in this paper, evaluating computation costs along with estimation accuracy should be further investigated in later studies.
3
Ito et al. (2014, 2016) do not formally prove that their regression-based approach generates estimates that are equivalent to Kalman-smoothed estimates.
4
As our model include unknown parameters such as the variances of the error terms, we must rely on feasible GLS (FGLS), which may not be equivalent to GLS.
5
In this paper, we focus on the case where ε t and η t are mutually uncorrelated. Relaxing this assumption poses a great challenge.
6
This assumption does not change our conclusions below. The main difference is that V a r β = C P 0 * + Q C and V a r Y T = Z C P 0 * + Q C Z + H = Ω . An exception is when the diffuse prior is used and the likelihood function is computed excluding the first few observations. In such a case, the estimates of the unknown intercept parameters under the two approaches would differ.
7
By contrast, Duncan and Horn (1972) assume that matrix F is known, which renders their estimation impractical. The original form of Maddala and Kim (1998, pp. 469–70) is similar to ours, but it is a general form for a scalar y t . Hence, it does not aim to deal with the autoregressive part of time-varying parameter models nor consider vector processes.
8
For the Bayesian approach, we focus on the posterior mean from MCMC. Since our simulations are based on Primiceri’s (2005) model, we use the same priors as his. The Matlab codes provided by D. Korobilis are used, which can be downloaded from: https://drive.google.com/file/d/1pYNP96FeGgBH1KpnDEEdXGqZ62ZPw_PQ/view, accessed on 14 March 2022.
9
In addition, we can compute the values of the log-likelihood function to evaluate whether the repeated use of FGLS improves estimation accuracy. Our simulation tends to show that 2FGLS has a higher likelihood value than 1FGLS.
10
After computing β ^ t , i , n , we discard first 40 ( t = 1 , , 40 ) of them in accordance with Korobilis’s Matlab codes for the Bayesian MCMC method. Hence, we use T = l e n g t h o f t h e d a t a 40 to compute the sample moments of β ^ by (11) through (18).
11
Throughout this simulation study, we use bold numbers to highlight the best (the smallest median d i s t and the median r a t closest to one) estimation method of the four approaches (OLS, 1FGLS, 2FGLS, 2FGLS’ and Primiceri).
12
We use the data and MATLAB codes provided by Koop and Korobilis (2010).
13
Note that the impulse responses of Primiceri’s (2005) VAR vary largely over time. This is not because the time-varying parameters ( β t ) are very volatile over time, but mainly because the variance of the shocks are time-dependent and vary greatly, as shown in Figure 1 of Primiceri (2005, p. 832) and as discussed in the conclusion thereof.

References

  1. Bernanke, Ben S., and Ilian Mihov. 1998. Measuring monetary policy. Quarterly Journal of Economics 113: 869–902. [Google Scholar] [CrossRef] [Green Version]
  2. Chan, Joshua C. C., and Ivan Jeliazkov. 2009. Efficient simulation and integrated likelihood estimationin state space models. International Journal of Mathematical Modelling and Numerical Optimisation 1: 101–20. [Google Scholar] [CrossRef]
  3. Cogley, Timothy F., and Thomas J. Sargent. 2001. Evolving post-world war II u.s. inflation dynamics. NBER Macroeconomics Annual 16: 331–73. [Google Scholar] [CrossRef]
  4. Cogley, Timothy F., and Thomas J. Sargent. 2005. Drifts and volatilities: Monetary policies and outcomes in the post WWII US. Review of Economic Dynamics 8: 262–302. [Google Scholar] [CrossRef] [Green Version]
  5. Cogley, Timothy F., and Edward C. Prescott. 1976. Estimation in the presence of stochastic parameter variation. Econometrica 44: 167–84. [Google Scholar]
  6. Duncan, David B., and Susan D. Horn. 1972. Linear dynamic recursive estimation from the viewpoint of regression analysis. Journal of the American Statistical Association 67: 815–21. [Google Scholar] [CrossRef]
  7. Durbin, James, and Siem J. Koopman. 2012. Time Series Analysis by State Space Methods, 2nd ed. Oxford: Oxford University Press. [Google Scholar]
  8. Hamilton, James D. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57: 357–84. [Google Scholar] [CrossRef]
  9. Hansen, Bruce E. 1992. Testing for parameter instability in linear models. Journal of Policy Modeling 14: 517–33. [Google Scholar] [CrossRef]
  10. Harvey, Andrew C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge and New York: Cambridge University Press. [Google Scholar]
  11. Ito, Mikio, Akihiko Noda, and Tatsuma Wada. 2014. International stock market efficiency: A non-bayesian time-varying model approach. Applied Economics 46: 2744–754. [Google Scholar] [CrossRef] [Green Version]
  12. Ito, Mikio, Akihiko Noda, and Tatsuma Wada. 2016. The evolution of stock market efficiency in the us: A non-bayesian time-varying model approach. Applied Economics 48: 621–35. [Google Scholar] [CrossRef] [Green Version]
  13. Ito, Mikio, Akihiko Noda, and Tatsuma Wada. 2021. Time-varying comovement of foreign exchange markets: A GLS-based time-varying model approach. Mathematics 9: 849. [Google Scholar] [CrossRef]
  14. Koop, Gary, and Dimitris Korobilis. 2010. Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics 3: 267–358. [Google Scholar] [CrossRef]
  15. Koopman, Siem J. 1997. Exact initial kalman filtering and smoothing for nonstationary time series models. Journal of the American Statistical Association 92: 1630–638. [Google Scholar] [CrossRef]
  16. Maddala, Gangadharrao S., and In-Moo Kim. 1998. Unit Roots, Cointegration, and Structural Change. Cambridge, New York: Cambridge University Press. [Google Scholar]
  17. McCausland, William J., Shirley Miller, and Denis Pelletier. 2011. Simulation smoothing for state-space models: A computational efficiency analysis. Computational Statistics & Data Analysis 55: 199–212. [Google Scholar]
  18. Perron, Pierre. 1989. The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57: 1361–401. [Google Scholar] [CrossRef]
  19. Perron, Pierre, and Tatsuma Wada. 2009. Let’s take a break: Trends and cycles in us real gdp. Journal of Monetary Economics 56: 749–65. [Google Scholar] [CrossRef]
  20. Primiceri, Giorgio E. 2005. Time varying structural vector autoregressions and monetary policy. Review of Economic Studies 72: 821–52. [Google Scholar] [CrossRef]
  21. Sant, Donald T. 1977. Generalized least squares applied to time varying parameter models. Annals of Economic and Social Measurement 6: 301–14. [Google Scholar]
  22. Sargan, Dennis J., and Alok Bhargava. 1983. Maximum likelihood estimation of regression models with first order moving average errors when the root lies on the unit circle. Econometrica 51: 799–820. [Google Scholar] [CrossRef]
  23. Shephard, Neil G., and Andrew C. Harvey. 1990. On the probability of estimating a deterministic component in the local level model. Journal of Time Series Analysis 11: 339–347. [Google Scholar] [CrossRef]
  24. Tanaka, Katsuto. 2017. Time Series Analysis: Nonstationary and Noninvertible Distribution Theory, 2nd ed. Hoboken: John Wiley & Suns, Inc. [Google Scholar]
  25. Taylor, Stephen J. 2007. Modelling Financial Time Series, 2nd ed. Hackensack: World Scientific. [Google Scholar]
Figure 1. The Estimated Time-Varying Parameters: OLS.
Figure 1. The Estimated Time-Varying Parameters: OLS.
Econometrics 10 00023 g001
Figure 2. The Estimated Time-Varying Parameters: 1FGLS.
Figure 2. The Estimated Time-Varying Parameters: 1FGLS.
Econometrics 10 00023 g002
Figure 3. The Estimated Time-Varying Parameters: 2GLS’.
Figure 3. The Estimated Time-Varying Parameters: 2GLS’.
Econometrics 10 00023 g003
Figure 4. The Estimated Time-Varying Parameters: 2GLS.
Figure 4. The Estimated Time-Varying Parameters: 2GLS.
Econometrics 10 00023 g004
Figure 5. Replication of Primiceri (2005).
Figure 5. Replication of Primiceri (2005).
Econometrics 10 00023 g005
Table 1. Simulation Results T = 100 .
Table 1. Simulation Results T = 100 .
HQ TrueOLS1FGLS2FGLS2FGLS’Primiceri
0.02 2 0.03 2 median m−0.006−0.0010.000 0.0010.0010.000
median s0.0860.0330.0160.019 0.0100.000
median d i s t 0.129 0.1410.1640.1930.273
S N R = 2.25 median r a t 0.4160.2060.2520.1240.006
0.2 2 0.03 2 median m−0.002−0.003−0.003−0.0030.0010.002
median s0.0870.1350.0840.048 0.0310.000
median d i s t 0.1640.1310.1200.1220.273
S N R = 0.0225 median r a t 1.7591.0780.6090.3970.006
1 0.03 2 median m−0.001−0.009−0.009−0.006−0.005−0.003
median s0.0870.2870.2910.297 0.1040.000
median d i s t 0.2720.2770.2780.1350.273
S N R = 0.0009 median r a t 3.7613.8123.9041.3390.005
Notes: (1) The numbers in the column under “True”are computed from the data-generating process described in Section 4.2. (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Table 2. Simulation Results T = 250 .
Table 2. Simulation Results T = 250 .
HQ TrueOLS1FGLS2FGLS2FGLS’Primiceri
0.02 2 0.03 2 median m−0.007−0.003−0.002−0.0010.0020.002
median s0.1560.1100.0760.059 0.0220.024
median d i s t 0.1030.1260.1500.2630.348
S N R = 2.25 median r a t 0.7180.4940.3920.1590.173
0.2 2 0.03 2 median m−0.004−0.006−0.006−0.0050.0020.003
median s0.1560.2010.1500.105 0.0610.048
median d i s t 0.1530.1280.120 0.1440.341
S N R = 0.0225 median r a t 1.3791.0130.6920.4080.337
1 0.03 2 median m−0.002−0.003−0.004−0.005−0.0020.003
median s0.1560.3210.3200.317 0.1350.030
median d i s t 0.2430.2430.2430.1140.341
S N R = 0.0009 median r a t 2.2852.2822.2650.9220.208
Notes: (1) The numbers in the column under “True”are computed from the data-generating process described in Section 4.1. (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Table 3. Stochastic Volatility and Autoregressive Stochastic Volatility.
Table 3. Stochastic Volatility and Autoregressive Stochastic Volatility.
TQ TrueOLS1FGLS2FGLS2FGLS’Primiceri
100 0.03 2 median m−0.002−0.002−0.003−0.002−0.005−0.001
RWmedian s0.0860.2890.2950.298 0.1030.000
median d i s t 0.2730.2780.2790.1350.273
median r a t 3.8373.9163.9861.3690.005
100 0.03 2 median m−0.002−0.002−0.003−0.002−0.005−0.002
ARmedian s0.0860.2890.2950.298 0.1030.000
median d i s t 0.2730.2780.2790.1350.273
median r a t 3.8373.9163.9861.3690.005
250 0.03 2 median m−0.002−0.006−0.005−0.005−0.0040.001
RWmedian s0.1540.3210.3200.318 0.1360.030
median d i s t 0.2440.2440.2440.1140.343
median r a t 2.2992.2912.2770.9280.211
250 0.03 2 median m−0.002−0.008−0.008 -0.009−0.0050.002
ARmedian s0.1540.3220.3210.318 0.1370.030
median d i s t 0.2430.2440.2440.1140.338
median r a t 2.3102.3032.2920.9370.213
Notes: (1) The numbers in the column under “True”are computed from the data-generating process: η i t = λ t ζ t 1 + ( 1 λ t ) ζ t 2 where λ t i . i . d . B e r n o u l l i ( 0.95 ) , ζ 1 , t N ( 0 , 0.03 2 ) , ζ 2 , t N ( 0 , 0.1 2 ) and ε i t = h i , t ξ t , log h i , t = ρ log h i , t 1 + e t where ρ = 1 when stochastic volatility is considered (labeled as RW), ρ = 0.9 when autoregressive stochastic volatility is considered (labeled as AR), ε i t is the i-th element of ε t ; log h i , 0 = 0 , e t N 0 , 0.02 2 , and ξ t N 0 , 1 . (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Table 4. Mixtures of Normals T = 100 .
Table 4. Mixtures of Normals T = 100 .
HQ TrueOLS1FGLS2FGLS2FGLS’Primiceri
0.02 2 median m−0.002 0.0020.0040.0040.0040.003
median s0.1020.0420.0250.021 0.0120.001
median d i s t 0.1370.1510.1710.2290.328
median r a t 0.4770.2660.229 0.1390.006
0.2 2 median m−0.001−0.004−0.003−0.0030.0060.007
median s0.1030.1380.0920.059 0.0330.000
median d i s t 0.1640.1360.1290.1350.310
median r a t 1.4970.9870.6330.3620.006
1 median m−0.002−0.007−0.007−0.007−0.0070.000
median s0.1050.2770.2810.284 0.0990.000
median d i s t 0.2590.2640.2650.1350.319
median r a t 3.0683.1123.1391.0660.005
Notes: (1) The numbers in the column under “True”are computed from the data-generating process: η i t = λ t ζ t 1 + ( 1 λ t ) ζ t 2 where λ t i . i . d . B e r n o u l l i ( 0.95 ) , ζ 1 , t N ( 0 , 0.03 2 ) , ζ 2 , t N ( 0 , 0.1 2 ) . (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Table 5. Mixtures of Normals T = 250 .
Table 5. Mixtures of Normals T = 250 .
HQ TrueOLS1FGLS2FGLS2FGLS’Primiceri
0.02 2 median m−0.006−0.001−0.002−0.0030.0040.002
median s0.1810.1320.1010.083 0.0300.045
median d i s t 0.1090.1300.1530.2910.388
median r a t 0.7380.5540.460 0.1830.268
0.2 2 median m−0.004−0.005−0.004−0.0030.0030.002
median s0.1820.2120.1690.128 0.0640.074
median d i s t 0.1500.1310.1290.1770.387
median r a t 1.2350.9690.7260.3730.450
1 median m−0.002−0.006−0.006−0.006−0.0070.000
median s0.1810.3180.3170.329 0.1380.063
median d i s t 0.2280.2290.2390.1270.385
median r a t 1.9181.9061.9610.7950.373
Notes: (1) The numbers in the column under “True”are computed from the data-generating process: η i t = λ t ζ t 1 + 1 λ t ζ t 2 where λ t i . i . d . B e r n o u l l i ( 0.95 ) , ζ 1 , t N ( 0 , 0.03 2 ) , ζ 2 , t N ( 0 , 0.1 2 ) . (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Table 6. Stochastic Volatility, Autoregressive Stochastic Volatility, and Mixtures of Normals.
Table 6. Stochastic Volatility, Autoregressive Stochastic Volatility, and Mixtures of Normals.
TRW/AR TrueOLS1FGLS2FGLS2FGLS’Primiceri
100RWmedian m−0.002−0.010−0.010−0.008−0.008−0.004
median s0.1040.2750.2780.282 0.0980.000
median d i s t 0.2580.2610.2620.1340.317
median r a t 3.0703.1313.1751.0780.005
ARmedian m−0.002−0.010−0.010−0.008−0.008−0.004
median s0.1040.2750.2780.282 0.0980.000
median d i s t 0.2580.2610.2620.1340.317
median r a t 3.0703.1303.1771.0780.005
250RWmedian m−0.001−0.005−0.005−0.006−0.004−0.001
median s0.1800.3170.3150.314 0.1350.060
median d i s t 0.2280.2280.2290.1270.381
median r a t 1.9241.9131.9050.7850.361
ARmedian m−0.001−0.005−0.006−0.007−0.004−0.001
median s0.1800.3170.3150.314 0.1350.060
median d i s t 0.2280.2280.2290.1270.382
median r a t 1.9231.9161.9110.7850.362
Notes: (1) The numbers in the column under “True”are computed from the data-generating process: ε i t = h i , t ξ t , log h i , t = ρ log h i , t 1 + e t where ρ = 1 when stochastic volatility is considered (labeled as RW), ρ = 0.9 when autoregressive stochastic volatility is considered (labeled as AR), ε i t is the i-th element of ε t ; log h i , 0 = 0 , e t N 0 , 0.02 2 , and ξ t N 0 , 1 . (2) “m”, “s”, “ d i s t ”, and “ r a t ”stand for the mean, the standard deviation, the distance from the true values, and the ratio of the standard deviation of the estimates to that of the true values of β . (3) The bold numbers are the smallest (for median “ d i s t ”) and the closest to one (for median “ r a t ”), indicating the best method out of the four (OLS, 1FGLS, 2FGLS, 2FGLS’, Primiceri).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ito, M.; Noda, A.; Wada, T. An Alternative Estimation Method for Time-Varying Parameter Models. Econometrics 2022, 10, 23. https://doi.org/10.3390/econometrics10020023

AMA Style

Ito M, Noda A, Wada T. An Alternative Estimation Method for Time-Varying Parameter Models. Econometrics. 2022; 10(2):23. https://doi.org/10.3390/econometrics10020023

Chicago/Turabian Style

Ito, Mikio, Akihiko Noda, and Tatsuma Wada. 2022. "An Alternative Estimation Method for Time-Varying Parameter Models" Econometrics 10, no. 2: 23. https://doi.org/10.3390/econometrics10020023

APA Style

Ito, M., Noda, A., & Wada, T. (2022). An Alternative Estimation Method for Time-Varying Parameter Models. Econometrics, 10(2), 23. https://doi.org/10.3390/econometrics10020023

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop